Data standards challenges
Because Web3 is a young industry, there is still no complete consensus on the meaning of many of its concepts. Let’s examine a few examples.
Retail
Within the cryptocurrency space, an entire aquatic vocabulary is used to categorize addresses by the amount of cryptocurrency they hold. Larger addresses are often referred to as “whales,” while smaller addresses have their own names, as the following illustration shows:
Figure 1.5 – Sizes of crypto holdings and their aquatic equivalent
While there is a consensus on the aquatic equivalents used for categorization, there is no unified agreement on the specific number of Bitcoins that each category represents. A quick Google search will reveal varying criteria for what constitutes an address in one category or another.
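To make the ambiguity concrete, the following minimal sketch classifies an address by its balance. The tier labels and the BTC thresholds are hypothetical placeholders chosen for illustration, not values taken from the figure or from any published standard; swapping in a different threshold table changes the result for the same address, which is exactly the problem described above.

```python
from bisect import bisect_right

# Hypothetical tiers: (lower bound in BTC, label), sorted by lower bound.
# These numbers are illustrative assumptions; published sources disagree on them.
TIERS = [
    (0.0, "shrimp"),
    (1.0, "crab"),
    (10.0, "fish"),
    (100.0, "shark"),
    (1000.0, "whale"),
]


def classify_address(balance_btc, tiers=TIERS):
    """Return the label of the highest tier whose lower bound is <= balance."""
    if balance_btc < 0:
        raise ValueError("balance cannot be negative")
    lower_bounds = [bound for bound, _ in tiers]
    # bisect_right counts how many lower bounds are <= balance_btc.
    index = bisect_right(lower_bounds, balance_btc) - 1
    return tiers[index][1]


if __name__ == "__main__":
    for balance in (0.5, 1.0, 99.99, 5000):
        print(balance, classify_address(balance))
```

Passing a different `tiers` list as the second argument is enough to reproduce another source's criteria.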
Another classification, which is particularly valuable for analysts, distinguishes between retail addresses (small investors) and professional addresses. The challenge lies in determining the threshold that separates one from the other. Various approaches are in use: we can follow the aquatic categories mentioned previously or opt for the definition proposed by Chainalysis, a blockchain forensics company, which states: “Retail traders (…) deposit less than $10,000 USD worth of Bitcoin at a time on exchanges.”
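The quoted retail definition can be sketched as a simple check on the USD value of a single deposit. The function name, the parameters, and the idea of passing the Bitcoin price in explicitly are assumptions made for this example; only the $10,000 threshold comes from the quote.

```python
# Threshold taken from the Chainalysis quote above.
RETAIL_DEPOSIT_LIMIT_USD = 10_000


def is_retail_deposit(amount_btc, btc_price_usd, limit_usd=RETAIL_DEPOSIT_LIMIT_USD):
    """Return True when the deposit is worth less than the retail threshold.

    amount_btc and btc_price_usd are hypothetical inputs: the deposit size
    and the BTC/USD price at the time of the deposit.
    """
    return amount_btc * btc_price_usd < limit_usd


if __name__ == "__main__":
    print(is_retail_deposit(0.1, 30_000))  # deposit worth about 3,000 USD
    print(is_retail_deposit(1.0, 30_000))  # deposit worth 30,000 USD
```

Because the threshold is expressed in USD, the same Bitcoin amount can switch categories as the price moves, so the price used for the conversion should be documented alongside the metric.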
Confirmations
In a traditional bank or other centralized organization, a transaction is confirmed once it is received and can then be considered complete. In the blockchain space, the decentralized nature of the network introduces a different dynamic: it is common to require a certain number of confirmations, often depending on the amount transferred, before a transaction is considered final.
Within a decentralized network, it is entirely possible for two blocks to be mined simultaneously in different parts of the world. The protocol waits for the next block to be mined and, depending on which of the two it attaches to, determines which chain is the longest (the longest chain is deemed the valid one). In Bitcoin, a block that doesn’t become part of the longest chain is referred to as a stale block; in Ethereum (prior to the merge), such blocks were known as uncle blocks. Once a transaction is incorporated into a block, it has one confirmation, and each block subsequently mined on top of it adds another. If our transaction ends up in a stale block, it is reversed and either returns to the mempool (in Bitcoin) or is added to another block (in Ethereum), and the confirmation count starts again.
Given the possibility of reversal, however slim, it has become customary for transaction counterparties to request a certain number of confirmations before accepting that their transactions are irreversible. The longer the chain grows after the block that included our transaction, the less likely it is to be reversed. Following the merge in Ethereum (which took place at block 15537394), uncle blocks ceased to be generated, but some of these practices persist among market participants.
There are no universal standards for the number of confirmations required. Recommendations vary: some suggest six confirmations for Bitcoin, and only two for small transfer amounts. For Ethereum, the range was typically between 20 and 40 confirmations. Notably, centralized exchanges such as Coinbase may still require two confirmations for Bitcoin and 35 for Ethereum.
Figure 1.6 – Stale and valid transactions
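The confirmation count described in this section can be sketched as simple arithmetic on block heights: the block that includes the transaction counts as the first confirmation, and every block on top of it adds one more. The function names and the policy table below are hypothetical; the per-asset figures reuse the exchange example mentioned above and are not a recommendation.

```python
# Hypothetical policy table: confirmations required per asset.
# The figures mirror the exchange example in the text, not a standard.
REQUIRED_CONFIRMATIONS = {"BTC": 2, "ETH": 35}


def confirmations(tx_block_height, chain_tip_height):
    """Count confirmations for a transaction given the current chain tip.

    tx_block_height is None while the transaction is still in the mempool.
    """
    if tx_block_height is None:
        return 0
    if tx_block_height > chain_tip_height:
        raise ValueError("transaction block cannot be above the chain tip")
    # The including block is confirmation 1; each later block adds one.
    return chain_tip_height - tx_block_height + 1


def is_settled(asset, tx_block_height, chain_tip_height, policy=REQUIRED_CONFIRMATIONS):
    """Return True when the transaction meets the policy for the asset."""
    return confirmations(tx_block_height, chain_tip_height) >= policy[asset]


if __name__ == "__main__":
    print(confirmations(100, 105))
    print(is_settled("BTC", 100, 101))
    print(is_settled("ETH", 100, 120))
```

If the including block turns out to be stale, the transaction's block height changes (or becomes `None` again), and the count restarts from whatever the new height implies.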
NFT floor price
The NFT floor price is a metric that estimates the minimum price at which any NFT within a collection can be sold, giving market participants valuable insight into a project’s fair pricing.
There is no universally accepted method for its calculation. One basic approach involves finding the minimum price at which an NFT within a collection has previously been sold. However, due to the presence of multiple marketplaces, each with its unique pricing structure, an alternative approach is to consider prices from the most prominent art marketplaces or to aggregate prices from various sources, giving more weight to the significant marketplaces.
Furthermore, it is crucial to account for practices such as wash trading, which artificially inflates the metric under analysis. We will examine this concept further in Chapter 4.
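The two calculation approaches described above can be compared in a short sketch: the basic one takes the lowest price seen anywhere, while the aggregated one averages each marketplace's lowest price using weights. The marketplace names, prices, and weights are made up for illustration, and the sketch does not attempt to filter out wash trades.

```python
def floor_price(prices_by_marketplace, weights=None):
    """Estimate a collection's floor price.

    prices_by_marketplace: dict mapping a marketplace name to a list of
        prices in the same unit (for example, ETH).
    weights: optional dict mapping a marketplace name to its weight.
        Without weights, the overall minimum is returned (basic approach).
        With weights, the per-marketplace minimums are averaged by weight.
    """
    floors = {
        name: min(prices)
        for name, prices in prices_by_marketplace.items()
        if prices
    }
    if not floors:
        raise ValueError("no prices available")
    if weights is None:
        return min(floors.values())
    total_weight = sum(weights.get(name, 0.0) for name in floors)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted_sum = sum(price * weights.get(name, 0.0) for name, price in floors.items())
    return weighted_sum / total_weight


if __name__ == "__main__":
    # Hypothetical data: two marketplaces with prices in ETH.
    prices = {"market_a": [1.2, 1.0, 3.0], "market_b": [0.8, 2.0]}
    print(floor_price(prices))
    print(floor_price(prices, weights={"market_a": 3, "market_b": 1}))
```

The two calls return different numbers for the same data, so a dashboard that reports a floor price should state which method and which sources it uses.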
The concept of “lost”
Suppose we need to calculate the circulating supply of Bitcoins for the next five years. For such a calculation, we must take into account not only how much will be mined but also how many Bitcoins can be considered “lost.” How do we determine that a certain amount of crypto is lost?
To move assets on the blockchain, we need to sign the transfer with our private key. If that private key is lost, we cannot access those assets, and therefore, those assets have to be counted as lost. With this in mind, it is safe to assume that some of the Bitcoin supply as of today is already lost or will be. When reading the blockchain, we can see those funds in the possession of a certain address, but it is entirely possible that the owner of that address is unable to move them. Since this is a pseudonymous system, we cannot contact Bitcoin holders and ask them to verify whether they still have access to their funds; there is no centralized way to do it.
Chainalysis has proposed the following definition: “Bitcoin that has not moved for five years now is considered lost.” The consequence of such a definition is that 20% of the mined bitcoins would be counted as lost. This is a proposed concept, and it remains to be seen whether it becomes a standard.
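As a rough sketch of how the five-year rule could be applied, the function below sums the holdings whose last movement is older than the dormancy window. The input format (a list of amount and last-moved timestamp pairs) and the function name are assumptions for this example, and a year is approximated as 365 days.

```python
from datetime import datetime, timedelta, timezone

DORMANCY_YEARS = 5  # from the quoted definition


def lost_supply(holdings, as_of, dormancy_years=DORMANCY_YEARS):
    """Sum the amounts that have not moved within the dormancy window.

    holdings: iterable of (amount_btc, last_moved) pairs, where last_moved
        is a timezone-aware datetime (hypothetical input format).
    as_of: the reference date for the calculation.
    A year is approximated as 365 days in this sketch.
    """
    cutoff = as_of - timedelta(days=365 * dormancy_years)
    return sum(amount for amount, last_moved in holdings if last_moved <= cutoff)


if __name__ == "__main__":
    utc = timezone.utc
    # Made-up holdings for illustration.
    holdings = [
        (10.0, datetime(2015, 1, 1, tzinfo=utc)),
        (2.5, datetime(2023, 6, 1, tzinfo=utc)),
        (1.0, datetime(2018, 1, 1, tzinfo=utc)),
    ]
    as_of = datetime(2024, 1, 1, tzinfo=utc)
    lost = lost_supply(holdings, as_of)
    total = sum(amount for amount, _ in holdings)
    print(f"lost: {lost} BTC ({lost / total:.0%} of the sample)")
```

Changing `dormancy_years` shows how sensitive the "lost" figure is to the chosen definition, which is why the threshold should be reported together with the result.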
In conclusion, we can agree on three ideal ways of approaching data in Web3:
- Dive deep into the metrics that will be available in our dashboards or the data that will be consumed by our model. Read the concepts and documentation thoroughly.
- Be open to finding different approaches to the same market subject. Since the industry is growing, there is no established way of doing some things.
- Be prepared to witness concepts change as the industry matures and fine-tunes its best practices.
To understand the technical aspects of smart contracts, the OpenZeppelin documentation is a valuable reference. Similarly, for market-related concepts, as mentioned previously, Chainalysis defines many concepts and can help as a starting point.