Data Science for Web3

A comprehensive guide to decoding blockchain data with data analysis basics and machine learning cases

Product type Paperback
Published in Dec 2023
Publisher Packt
ISBN-13 9781837637546
Length 344 pages
Edition 1st Edition
Author: Gabriela Castillo Areco

Data standards challenges

Because Web3 is a young industry, there is still no complete consensus on the meaning of many concepts. Let’s examine a few examples.

Retail

Within the cryptocurrency space, there is a complete aquatic ecosystem used to categorize addresses based on the amount of cryptocurrency they hold. Larger addresses are often referred to as “whales,” while smaller addresses have their own names, as illustrated in Figure 1.5:

Figure 1.5 – Sizes of crypto holdings and their aquatic equivalent


While there is a consensus on the aquatic equivalents used for categorization, there is no unified agreement on the specific number of Bitcoins that each category represents. A quick Google search will reveal varying criteria for what constitutes an address in one category or another.

Another classification, which is particularly valuable for analysts, distinguishes between retail addresses (small investors) and professional addresses. The challenge lies in determining the threshold that separates one from the other. Various approaches are in use: we can follow the aquatic equivalents mentioned previously or opt for the definition proposed by the forensics company Chainalysis, which states: “Retail traders (…) deposit less than $10,000 USD worth of Bitcoin at a time on exchanges.”
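To make the threshold idea concrete, here is a minimal sketch of the Chainalysis-style rule applied to individual exchange deposits. The function name, the sample deposits, and the BTC price are hypothetical and serve only as illustration; a different threshold would produce different labels, which is the point of this section.

```python
from decimal import Decimal

# Threshold taken from the Chainalysis definition quoted above (USD per deposit).
RETAIL_THRESHOLD_USD = Decimal("10000")


def classify_deposit(btc_amount, btc_price_usd, threshold_usd=RETAIL_THRESHOLD_USD):
    """Label a single exchange deposit as 'retail' or 'professional'.

    Hypothetical helper: a deposit worth strictly less than the threshold
    is treated as retail, anything else as professional.
    """
    usd_value = Decimal(str(btc_amount)) * Decimal(str(btc_price_usd))
    return "retail" if usd_value < threshold_usd else "professional"


# Hypothetical deposits (in BTC) valued at an assumed price of 30,000 USD per BTC.
deposits = [0.05, 0.5, 2.0]
labels = [classify_deposit(amount, 30000) for amount in deposits]
print(labels)
```

Passing a different `threshold_usd` shows how sensitive the retail/professional split is to the chosen definition.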

Confirmations

In a traditional bank or centralized organization, when a user sends a transaction, it is confirmed once received and can be considered complete. In the blockchain space, the decentralized nature of the network introduces a different dynamic. Consequently, it is common for a transaction to need a certain number of confirmations, which varies with the amount transferred, before it is considered valid.

Within a decentralized network, it is entirely possible for two blocks to be mined simultaneously in different parts of the world. The protocol waits for the next block to be mined and, depending on where it attaches, determines which chain is the longest (the longest chain is deemed the valid one). A block in Bitcoin that doesn’t become part of the longest chain is referred to as a stale block, while in Ethereum (prior to the merge), such a block was known as an uncle block. Once a transaction is incorporated into a block, it is assigned one confirmation. If our transaction finds its way into a stale block, it will be reversed and return to the mempool in Bitcoin or be added to another block in Ethereum, restarting the count of confirmations.

Given the possibility of reversal, however slim, it has become customary for transaction counterparties to request a certain number of confirmations before accepting that their transactions are irreversible. The longer the chain grows after the block that included our transaction, the less likely it is to be reversed. Following the merge in Ethereum (which took place at block 15537394), uncle blocks ceased to be generated, but some of these practices persist among market participants.

There are no universal standards for the number of confirmations required. Recommendations can vary, with some suggesting six confirmations for Bitcoin and only two for small transfer amounts. For Ethereum, the range was typically between 20 and 40 confirmations. Notably, centralized exchanges such as Coinbase may still require two confirmations for Bitcoin and 35 for Ethereum.
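The counting described in this section can be sketched as follows. This is a minimal illustration, assuming we already know the height of the block that included the transaction and the height of the current chain tip; the function names and block heights are hypothetical, and the required number of confirmations is a policy choice rather than a protocol rule.

```python
def confirmations(tx_block_height, chain_tip_height):
    """Count confirmations for a transaction included at tx_block_height.

    The including block counts as the first confirmation, and every block
    mined on top of it adds one more. A transaction that is still in the
    mempool (tx_block_height is None) has zero confirmations.
    """
    if tx_block_height is None:
        return 0
    if tx_block_height > chain_tip_height:
        raise ValueError("transaction block is ahead of the chain tip")
    return chain_tip_height - tx_block_height + 1


def is_accepted(tx_block_height, chain_tip_height, required):
    """Apply a counterparty's own policy: accept once enough blocks have piled up."""
    return confirmations(tx_block_height, chain_tip_height) >= required


# Hypothetical heights: the transaction was mined at 800000, the tip is at 800005.
print(confirmations(800000, 800005))            # six confirmations
print(is_accepted(800000, 800005, required=6))  # meets a six-confirmation policy
print(is_accepted(800000, 800001, required=6))  # only two so far, not yet accepted
```

If the including block turns out to be stale, the transaction's block height changes (or becomes unknown again), so the count has to be recomputed from the new position.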

Figure 1.6 – Stale and valid transactions

NFT Floor Price

The NFT Floor Price serves as a metric for determining the minimum price at which any NFT within a collection can be sold, providing market participants with valuable insights into a project’s fair pricing.

There is no universally accepted method for its calculation. One basic approach involves finding the minimum price at which an NFT within a collection has previously been sold. However, due to the presence of multiple marketplaces, each with its unique pricing structure, an alternative approach is to consider prices from the most prominent art marketplaces or to aggregate prices from various sources, giving more weight to the significant marketplaces.
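The two approaches above can be compared with a short sketch. The marketplace names, prices, and weights below are hypothetical, and the weighting scheme (a weighted average of each marketplace's lowest sale) is only one possible way to give more weight to the significant marketplaces.

```python
def floor_min_sale(sales):
    """Basic approach: the lowest price at which any NFT in the collection sold."""
    return min(price for _, price in sales)


def floor_weighted(sales, weights):
    """Alternative: per-marketplace minimum, averaged using marketplace weights."""
    per_market = {}
    for market, price in sales:
        per_market[market] = min(price, per_market.get(market, price))
    total_weight = sum(weights[market] for market in per_market)
    return sum(per_market[m] * weights[m] for m in per_market) / total_weight


# Hypothetical past sales as (marketplace, price in ETH) pairs.
sales = [
    ("market_a", 1.25),
    ("market_a", 1.5),
    ("market_b", 1.0),
    ("market_b", 2.0),
]
# Hypothetical weights: market_a is assumed to be three times as significant.
weights = {"market_a": 3, "market_b": 1}

print(floor_min_sale(sales))           # lowest sale across all marketplaces
print(floor_weighted(sales, weights))  # leans toward market_a's minimum
```

With the same sales data, the two functions return different floor prices, which is why it matters to know which method a dashboard or data provider uses.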

Furthermore, it is crucial to account for practices such as wash trading, which artificially inflates the metric under analysis. We will examine this concept further in Chapter 4.

The concept of “lost”

Suppose we need to calculate the circulating supply of Bitcoins for the next five years. For such a calculation, we must take into account not only how much will be mined but also how many Bitcoins can be considered “lost.” How do we determine that a certain amount of crypto is lost?

To move assets on the blockchain, we need to sign the transfer with our private key. If that private key is lost, we cannot access those assets, and therefore, those assets have to be counted as lost. With this in mind, it is safe to assume that some of the Bitcoin supply existing today is already lost or will be. When reading the blockchain, we can see those funds in possession of a certain address, but it is entirely possible that the owner of such an address is unable to dispose of them. Since this is a pseudo-anonymous system, we cannot contact the Bitcoin holders and ask them to verify whether they have access to their funds; there is no centralized way to do it.

Chainalysis, the forensics company mentioned earlier, proposed the definition that “Bitcoin that has not moved for five years now is considered lost.” The consequence of such a definition is that 20% of the mined Bitcoins would be lost. This is a proposed concept, and it is yet to be seen whether it becomes a standard.
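As a rough sketch of how the five-year rule could be applied, the following example computes the share of coins that would count as lost from a list of holdings. The records, dates, and function names are hypothetical; in practice, the amounts and last-moved dates would come from on-chain data.

```python
from datetime import date


def cutoff_date(as_of, years=5):
    """Return the date `years` years before `as_of`."""
    try:
        return as_of.replace(year=as_of.year - years)
    except ValueError:
        # as_of is February 29 and the target year is not a leap year.
        return as_of.replace(year=as_of.year - years, day=28)


def lost_share(holdings, as_of, years=5):
    """Fraction of the listed coins that has not moved for `years` years or more."""
    cutoff = cutoff_date(as_of, years)
    total = sum(amount for amount, _ in holdings)
    lost = sum(amount for amount, last_moved in holdings if last_moved <= cutoff)
    return lost / total


# Hypothetical records: (amount in BTC, date the coins last moved).
holdings = [
    (2.0, date(2015, 6, 1)),
    (5.0, date(2022, 3, 15)),
    (3.0, date(2018, 1, 10)),
]

share = lost_share(holdings, as_of=date(2023, 12, 31))
print(f"{share:.0%} of these coins would count as lost")
```

Changing `years` shows how strongly the estimate of lost supply depends on the chosen definition.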

In conclusion, we can agree on three ideal ways of approaching data in Web3:

  • Deep dive into the metrics that will be available in our dashboards or the data that will be consumed by our model. Read the concepts and documentation thoroughly.
  • Be open to finding different approaches to the same market subject. Since the industry is growing, there is no established way of doing some things.
  • Be prepared to witness concepts change as the industry matures and fine-tunes its best practices.

To understand the technical aspects of smart contracts, the OpenZeppelin documentation is a valuable reference. Similarly, for market-related concepts, as mentioned previously, Chainalysis defines many concepts and can help as a starting point.
