Data Science for Web3

A comprehensive guide to decoding blockchain data with data analysis basics and machine learning cases

Product type Paperback
Published in Dec 2023
Publisher Packt
ISBN-13 9781837637546
Length 344 pages
Edition 1st Edition
Author: Gabriela Castillo Areco

Data quality challenges

In this section, we will discuss the challenges of data quality, which are not unique to Web3 but are relevant to all professionals who make decisions based on data. These challenges range from incomplete, inaccurate, or inconsistent data to matters of data security, privacy, and governance. However, one of the most important challenges that a Web3 data analyst will face is the reliability of sources.

For instance, the market cap is the result of a simple multiplication of two inputs: the total supply of tokens in circulation, which comes from blockchain data, and the market price. However, the result of this multiplication varies depending on the source. Let’s take the market cap of USDT as an example. In one source, the following information appears:

Figure 1.4 – USDT market cap information (source: https://etherscan.io/token/0xdac17f958d2ee523a2206206994597c13d831ec7)


On the CoinMarketCap website, for the same token, the fully diluted market cap is $70,158,658,274 (https://coinmarketcap.com/currencies/tether/).

As we see from the example, the same concept is shown differently depending on the source we review. So, how do we choose when we have multiple sources of information?
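To make the discrepancy concrete, the following minimal sketch computes a market cap as supply times price for two sources and measures how far apart the results are. The source names and the supply and price figures are hypothetical placeholders chosen for illustration, not values taken from Etherscan or CoinMarketCap.

```python
from decimal import Decimal


def market_cap(supply: Decimal, price: Decimal) -> Decimal:
    """Market cap is the token supply multiplied by the unit price."""
    return supply * price


def relative_spread(values: list) -> Decimal:
    """Return (max - min) / min, the relative gap between the sources."""
    lowest, highest = min(values), max(values)
    return (highest - lowest) / lowest


# Hypothetical inputs: each source reports its own supply and price.
sources = {
    "source_a": {"supply": Decimal("70000000000"), "price": Decimal("1.0000")},
    "source_b": {"supply": Decimal("70150000000"), "price": Decimal("1.0001")},
}

caps = {name: market_cap(s["supply"], s["price"]) for name, s in sources.items()}
spread = relative_spread(list(caps.values()))

for name, cap in caps.items():
    print(f"{name}: {cap:,.2f} USD")
print(f"relative spread between sources: {spread:.4%}")
```

Even small differences in either input produce a visible gap in the product, which is why it helps to record which supply and which price each figure was built from.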

The most trustworthy and comprehensive source of truth regarding blockchain activity is a full node, that is, a complete copy of the blockchain. Accessing a node ensures that we always have the latest version of the chain. Some services, such as Google BigQuery, Covalent, or Dune, index the blockchain to facilitate access and querying, and continuously update their copies. These copies, however, are centralized and controlled by the provider.
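As an illustration of reading supply data directly from a node, the sketch below builds a standard Ethereum JSON-RPC `eth_call` request for the ERC-20 `totalSupply()` function and decodes the hex answer. It makes no network call: sending the request to a node endpoint is left out, the helper names are hypothetical, and the 6-decimal scaling for USDT is an assumption that should be confirmed against the token contract.

```python
import json
from decimal import Decimal

# Contract address taken from the Etherscan URL cited in the text.
USDT_ADDRESS = "0xdac17f958d2ee523a2206206994597c13d831ec7"
# Function selector for the ERC-20 method totalSupply().
TOTAL_SUPPLY_SELECTOR = "0x18160ddd"
# Assumption: USDT uses 6 decimals; verify with the contract's decimals().
USDT_DECIMALS = 6


def build_total_supply_request(token_address: str, request_id: int = 1) -> str:
    """Return the JSON body of an eth_call asking for totalSupply()."""
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "eth_call",
        "params": [{"to": token_address, "data": TOTAL_SUPPLY_SELECTOR}, "latest"],
    }
    return json.dumps(payload)


def decode_supply(hex_result: str, decimals: int) -> Decimal:
    """Convert the 32-byte hex word returned by the node into token units."""
    raw_units = int(hex_result, 16)
    return Decimal(raw_units) / (Decimal(10) ** decimals)


body = build_total_supply_request(USDT_ADDRESS)
print(body)

# Hypothetical node response encoding 1,500,000 base units.
sample_result = "0x" + format(1_500_000, "064x")
print(decode_supply(sample_result, USDT_DECIMALS))
```

The same request body can be sent to a self-hosted node or to a hosted provider; the trade-off described above is about who controls the copy being queried, not about the request format.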

When it comes to prices, there are numerous sources. A common approach is to connect to online marketplaces for cryptocurrencies, commonly known as exchanges, such as Binance or Kraken, and extract their market prices. However, trading on these markets can be halted for various reasons. For example, during the well-known TerraUSD (UST) de-peg incident, when the stablecoin lost its 1:1 peg to the US dollar, many exchanges suspended trading, citing consumer protection concerns. If our workflow relies on such data, it can be disrupted or show outdated, inaccurate prices. To mitigate this issue, it is advisable to use providers that average the prices from multiple exchanges, which yields more robust information.
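The idea of combining several exchanges can be sketched as follows. This is a minimal illustration, not how any particular price provider works: it discards quotes older than a freshness window (so a halted market cannot feed in an old price) and then takes the median of what remains, a robust variant of the averaging mentioned above. The `Quote` type, the exchange names, the prices, and the 60-second window are all hypothetical.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Quote:
    exchange: str
    price: float
    timestamp: float  # Unix seconds of the last observed trade


def aggregate_price(quotes, now, max_age_seconds=60.0, min_sources=2):
    """Median price over quotes that are recent enough to be trusted."""
    fresh = [q.price for q in quotes if now - q.timestamp <= max_age_seconds]
    if len(fresh) < min_sources:
        raise ValueError("not enough fresh quotes to aggregate")
    return median(fresh)


# Hypothetical quotes; exchange_d stopped trading long ago and is stale.
quotes = [
    Quote("exchange_a", 1.000, 1000.0),
    Quote("exchange_b", 0.998, 995.0),
    Quote("exchange_c", 0.999, 990.0),
    Quote("exchange_d", 0.650, 100.0),
]

price = aggregate_price(quotes, now=1000.0)
print(price)
```

Raising an error when too few fresh quotes remain is a deliberate choice: it is usually safer for a workflow to stop than to keep reporting a price from a market that is no longer trading.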

At this stage, it is crucial to understand what constitutes quality for our company. Do we prioritize fast and readily available information updated by the second, or do we value highly precise information with relatively slower access? While it may not be necessary to consider this for every project, deciding on certain sources and standardizing processes will save us time in the future.

Once we have determined the quality of the information we will consume, we need to agree on the concepts we want to analyze.
