Data Science for Web3

You're reading from Data Science for Web3: A comprehensive guide to decoding blockchain data with data analysis basics and machine learning cases

Product type Paperback
Published in Dec 2023
Publisher Packt
ISBN-13 9781837637546
Length 344 pages
Edition 1st Edition
Author: Gabriela Castillo Areco

Understanding the blockchain ingredients

If you have a background in blockchain development, you may skip this section. Web3 represents a new generation of the World Wide Web that is based on decentralized databases, permissionless and trustless interactions, and native payments. This new concept of the internet opens up various business possibilities, some of which are still in their early stages.

Figure 1.1 – Evolution of the web


Currently, we are in the Web2 stage, where centralized companies store significant amounts of data sourced from our interactions with apps. The promise of Web3 is that we will interact with Decentralized Apps (dApps) that store only the relevant information on the blockchain, accessible to everyone.

As of the time of writing, Web3 has some limitations recognized by the Ethereum organization:

  • Velocity: The speed at which the blockchain is updated poses a scalability challenge. Multiple initiatives are being tested to try to solve this issue.
  • Intuition: Interacting with Web3 is still difficult to understand. The logic and user experience are not as intuitive as in Web2 and a lot of education will be necessary before users can start utilizing it on a massive scale.
  • Cost: Recording an entire business process on the chain is expensive. Having multiple smart contracts as part of a dApp costs a lot for the developer and the user.

Blockchain technology is a foundational technology that underpins Web3. It is based on Distributed Ledger Technology (DLT), which stores information once it is cryptographically verified. Once reflected on the ledger, each transaction cannot be modified and multiple parties have a complete copy of it.

Two structural characteristics of the technology are the following:

  • It is structured as a set of blocks, where each block contains the cryptographic hash of the previous block (we will learn more about hashing in this chapter), making it practically impossible to alter a block at a later stage without invalidating every block that follows. Each block is chained to the previous one by this cryptographic hashing mechanism.

Figure 1.2 – Representation of a set of blocks

  • It is decentralized. The copy of the entire ledger is distributed among several servers, which we will call nodes. Each node has a complete copy of the ledger and verifies consistency every time it adds a new block on top of the blockchain.
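
The chaining described in the first characteristic can be illustrated with a minimal sketch. It assumes SHA-256 as the hash function and a simplified block made of an index, a list of transactions, and the previous block's hash; the function names and the sample transactions are hypothetical and do not reproduce any real blockchain's block format.

```python
import hashlib
import json


def block_hash(block: dict) -> str:
    """Hash a block's full contents, including the previous block's hash."""
    payload = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()


def build_chain(batches: list) -> list:
    """Create one block per batch of transactions, each pointing to its predecessor."""
    chain = []
    prev_hash = "0" * 64  # placeholder hash for the first block
    for index, transactions in enumerate(batches):
        block = {"index": index, "transactions": transactions, "prev_hash": prev_hash}
        chain.append(block)
        prev_hash = block_hash(block)
    return chain


def is_valid(chain: list) -> bool:
    """Check that every block stores the hash of the block before it."""
    for previous, current in zip(chain, chain[1:]):
        if current["prev_hash"] != block_hash(previous):
            return False
    return True


chain = build_chain([["alice->bob:5"], ["bob->carol:2"], ["carol->dave:1"]])
chain_ok_before = is_valid(chain)

# Tampering with an old block changes its hash, so the next block no longer matches.
chain[0]["transactions"][0] = "alice->bob:500"
chain_ok_after = is_valid(chain)

print(chain_ok_before, chain_ok_after)
```

In this toy version, a node that recomputes the hashes detects the change immediately, which is the property that makes rewriting history impractical once many blocks have been added on top.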

This structure provides the solution to double spending, enabling for the first time the decentralized transfer of value through the internet. This is why Web3 is known as the internet of value.

Since the complete version of the ledger is distributed among all the participants of the blockchain, any new transaction that contradicts previously stored information will not be successfully processed (there will be no consensus to add it). This characteristic facilitates transactions among parties that do not know each other without the need for an intermediary acting as a guarantor between them, which is why this technology is known as trustless.
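
As a rough illustration of why a contradictory transaction is rejected, the following sketch keeps a single in-memory ledger and refuses any transfer that spends funds the sender no longer has. The `Ledger` class and the account names are hypothetical, and a real network reaches this decision through consensus among many nodes rather than inside one object.

```python
class Ledger:
    """A toy ledger that only accepts transfers consistent with its history."""

    def __init__(self, balances: dict):
        self.balances = dict(balances)
        self.history = []

    def apply(self, sender: str, recipient: str, amount: int) -> bool:
        # Reject transfers that contradict what is already recorded.
        if amount <= 0 or self.balances.get(sender, 0) < amount:
            return False
        self.balances[sender] -= amount
        self.balances[recipient] = self.balances.get(recipient, 0) + amount
        self.history.append((sender, recipient, amount))
        return True


ledger = Ledger({"alice": 10})
first = ledger.apply("alice", "bob", 10)     # accepted
second = ledger.apply("alice", "carol", 10)  # same coins again: rejected

print(first, second, ledger.balances)
```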

The decentralized storage also takes control away from any single server and, thus, there is no sole authority with sufficient power to change any data point once the transaction is added to the blockchain. Since taking down one node does not affect the network, a hacker who wanted to attack the database would require such high computing power that the attempt would be economically unfeasible. This adds a level of security that centralized servers do not have.

Three generations of blockchain

The first-generation blockchain is Bitcoin, which is based on Satoshi Nakamoto’s paper Bitcoin: A Peer-to-Peer Electronic Cash System. The primary use case of this blockchain is financial. Although the technology was initially seen as a way to bypass intermediaries such as banks, currently, traditional financial systems and the crypto world are starting to work together, especially with regard to Bitcoin because it is now considered a digital store of value, a sort of digital gold. Notwithstanding the preceding, there are still many regulatory and practical barriers to the integration of the two systems.

The second-generation blockchain added the concept of smart contracts to the database structure described previously, and Ethereum was the first to introduce this. With Ethereum, users can agree on terms and conditions before a transaction is carried out. This chain started the smart contracts era, and as Nick Szabo describes it, the smart contract logic is that of a vending machine that can execute code autonomously, including the management of digital assets, which is a real revolution. To achieve this, the network has an Ethereum Virtual Machine (EVM) that can execute arbitrary code.

Lastly, the third-generation blockchain builds upon the previous generations and aims to solve scalability and interoperability problems. When referring to on-chain data in this book, we will be talking about data generated by the second- and third-generation blockchains that are EVM compatible, as this is where most development is being carried out at the time of writing (e.g., Ethereum, BSC, or Rootstock). Consequently, Bitcoin data and non-EVM structures are not covered.

Introducing the blockchain ingredients

Now, let’s understand some important additional concepts regarding blockchain.

Gas

In order to make a car move forward, we use gas as fuel. This enables us to reach our desired destination, but it comes at a cost. The price of gas fluctuates based on various factors, such as oil prices and transportation costs. The same concept applies to blockchain technology. To save a transaction on a chain, it is necessary to pay for gas. In short, gas is the cost paid to the network for the instructions required to carry out our transactions. The purpose of establishing a cost is twofold: the proceeds of the gas payment go to the miners/validators as payment for their services and as an incentive to keep participating in the blockchain; it also sets a price for users to be mindful of when using resources, encouraging the use of the blockchain to record only what is worth more than the gas value paid. This concept is universal to all networks we will study in this book.

Gas has several cost implications. Because gas is paid in the network’s native coin, if the price of that coin increases, using the network can become excessively expensive, discouraging adoption. This is what happened with Ethereum, which led to multiple changes to its internal rules to solve this issue.

As mentioned earlier, each interaction with the blockchain incurs a cost. Therefore, not everything needs to be stored in it and the adoption of such a database as blockchain needs to be validated by business requirements.

Cryptocurrencies can be divided into smaller units, just as a dollar can be divided into cents. The smallest unit of a Bitcoin is a Satoshi and the smallest denomination of an Ether is a Wei. The following is a table with the denominations, which will be useful for tracking gas costs.

Unit name     Wei                          Ether
Wei           1                            10^-18
Kwei          1,000                        10^-15
Mwei          1,000,000                    10^-12
Gwei          1,000,000,000                10^-9
Microether    1,000,000,000,000            10^-6
Milliether    1,000,000,000,000,000        10^-3
Ether         1,000,000,000,000,000,000    1

Table 1.1 – Unit denominations and their values
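
To make the denominations concrete, here is a small sketch that converts between units and estimates the fee of a transaction as gas used multiplied by gas price. The helper names are hypothetical, and the gas price of 20 Gwei is an assumed value for illustration; 21,000 is the gas consumed by a plain ether transfer on Ethereum.

```python
from decimal import Decimal

# Number of wei in one unit of each denomination from Table 1.1.
WEI_PER_UNIT = {
    "wei": 10**0,
    "kwei": 10**3,
    "mwei": 10**6,
    "gwei": 10**9,
    "microether": 10**12,
    "milliether": 10**15,
    "ether": 10**18,
}


def to_wei(amount, unit: str) -> int:
    """Convert an amount expressed in `unit` to an integer number of wei."""
    return int(Decimal(str(amount)) * WEI_PER_UNIT[unit])


def from_wei(amount_wei: int, unit: str) -> Decimal:
    """Convert an integer number of wei to the given unit."""
    return Decimal(amount_wei) / WEI_PER_UNIT[unit]


def transaction_fee_wei(gas_used: int, gas_price_gwei) -> int:
    """Fee = gas used x gas price, with the gas price quoted in Gwei."""
    return gas_used * to_wei(gas_price_gwei, "gwei")


# Assumed example: a plain transfer (21,000 gas) at a gas price of 20 Gwei.
fee = transaction_fee_wei(21_000, 20)
print(fee, "wei =", from_wei(fee, "ether"), "ether")
```

Amounts are kept as integers in wei and only converted for display, which avoids floating-point rounding errors when tracking costs.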

Address

When we use a payment method other than cash, we transmit a sequence of letters or numbers, or a combination of both, to transfer our funds. This sequence of characters is essential for identifying the country, bank, and account of the recipient, for the entity that holds our funds. Similarly, an address performs a comparable function and serves as an identification number on the blockchain. It is a string of letters and numbers that can send or receive cryptocurrency. For example, Ethereum addresses consist of 42 characters: the 0x prefix followed by 40 hexadecimal characters. An address is the public key hash of an asymmetric key pair, and it is all the information a third party requires to transfer cryptocurrency to us. This public key is derived from a private key, but the reverse process (deriving the private key from a public key) cannot be performed. The private key is required to authorize/sign transactions or access the funds stored in the account.
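
A quick way to sanity-check the shape of an Ethereum address is to confirm that it is a 42-character string made of the `0x` prefix plus 40 hexadecimal digits. The sketch below does only that; the function name is hypothetical, and it deliberately does not verify the mixed-case checksum defined in EIP-55 or whether the address is actually in use.

```python
import re

# "0x" followed by exactly 40 hexadecimal digits (42 characters in total).
ADDRESS_PATTERN = re.compile(r"0x[0-9a-fA-F]{40}")


def looks_like_address(value: str) -> bool:
    """Return True if `value` has the format of an Ethereum address."""
    return ADDRESS_PATTERN.fullmatch(value) is not None


print(looks_like_address("0xd5eAc5e5f45ddFCC698b0aD27C52Ba55b98F5653"))
print(looks_like_address("0x1234"))
```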

Addresses can be classified into two categories: Externally Owned Accounts (EOAs) and contract accounts. Both of them can receive, hold, and send funds and interact with smart contracts. EOAs are owned by users who hold the private key, and users can create as many as they need. Contract accounts are those where smart contracts are deployed and are controlled by their contract code. Another difference between them is the cost of creating them. Creating an EOA does not cost gas, but deploying a smart contract to create a contract account does. Only EOAs can initiate transactions.

There is another product in the market known as smart accounts, which leverage the account abstraction concept. The idea behind this development is to let users program more security and better user experiences into their accounts, such as setting rules on daily spending limits or selecting the token to pay for gas. These accounts are programmable smart contracts.

Although the terms “wallet” and “address” are often used interchangeably, there is a technical distinction between them. As mentioned before, an address is the public key hash of an asymmetric key pair. On the other hand, a wallet is the abstract location where the public and private keys are stored together. It is a software interface or application that simplifies interacting with the network and facilitates querying our accounts, transaction signing, and more.

Consensus protocols

When multiple parties work together, especially if they do not know each other, it is necessary to agree on a set of rules to work sustainably. In the blockchain case, it is necessary to determine how to add transactions to a block and alter its state. This is where the consensus protocol comes into play. Consensus refers to the agreement reached by all nodes of the blockchain to change the state of the chain by adding a new block to it. The protocol comprises a set of rules for participation, rewards/penalties to align incentives, and more. The more nodes participate, the more decentralized the network becomes, making it more secure.

Consensus can be reached in several ways, but two main concepts exist in open networks.

Proof of Work (PoW)

This is the consensus protocol used by Bitcoin. It involves solving a computational puzzle whose difficulty is adjusted periodically according to the total computing power of the network, so that blocks keep being produced at a roughly constant rate.

Solving these puzzles consumes a lot of energy, resulting in a hardware-intensive competition. Parties trying to solve the puzzle are known as miners.

The winning party finds an integer (known as a nonce) that satisfies the rules of the puzzle and informs the other nodes of the answer. The other parties verify that the answer is correct and add the block to their copy of the blockchain. The winning party gets the reward for solving the puzzle, which is a predefined amount of cryptocurrency. This is how the system issues Bitcoin that has never been spent, through what is known as a coinbase transaction.
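
The search-and-verify cycle can be sketched with a toy puzzle: find an integer that, appended to the block data, produces a SHA-256 hash starting with a given number of zeros. This is a simplification with hypothetical function names and sample data; Bitcoin hashes a structured block header twice and compares the result against a numeric target rather than counting leading zero characters.

```python
import hashlib


def puzzle_hash(block_data: str, nonce: int) -> str:
    """Hash the block data together with a candidate integer."""
    return hashlib.sha256(f"{block_data}{nonce}".encode()).hexdigest()


def mine(block_data: str, difficulty: int):
    """Try integers one by one until the hash starts with `difficulty` zeros."""
    prefix = "0" * difficulty
    nonce = 0
    while True:
        digest = puzzle_hash(block_data, nonce)
        if digest.startswith(prefix):
            return nonce, digest
        nonce += 1


def verify(block_data: str, nonce: int, difficulty: int) -> bool:
    """Checking an answer takes a single hash, however long the search took."""
    return puzzle_hash(block_data, nonce).startswith("0" * difficulty)


nonce, digest = mine("block 1: alice->bob:5", difficulty=4)
print(nonce, digest)
print(verify("block 1: alice->bob:5", nonce, difficulty=4))
```

The asymmetry is the point of the design: finding the integer requires many attempts, while any other node can confirm it with one hash. Raising `difficulty` by one makes the search roughly 16 times longer in this toy version.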

In the Bitcoin protocol, the reward is halved every 210,000 blocks.

Proof of Stake (PoS)

This is the current protocol used by the Ethereum blockchain (up to September 15, 2022, the consensus protocol was PoW) and many others, such as Cardano.

The rationale behind PoS is that parties become validators in the blockchain by staking their own cryptocurrency in exchange for the chance to validate transactions, update the blockchain, and earn rewards. Generally, there is a minimum amount of cryptocurrency that must be staked to become a validator. It is “at stake” because the rules include potential penalties, or “slashing”, of the deposited cryptocurrency if the validator (the node that processes transactions and adds new blocks to the chain) goes offline or behaves poorly. Slashing means losing a percentage of the deposited cryptocurrency.

As we can see, there are rewards and penalties to align the incentives of all participants toward a single version of the blockchain.
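
The incentive mechanism can be illustrated with a toy model in which the chance of being picked to propose a block is proportional to the amount staked, and a misbehaving validator loses a percentage of its deposit. The stake-weighted selection, the validator names, and the 5% penalty are assumptions made for this sketch; actual protocols define their own selection and slashing rules.

```python
import random


def pick_validator(stakes: dict, rng: random.Random) -> str:
    """Pick one validator with probability proportional to its stake."""
    validators = sorted(stakes)
    weights = [stakes[name] for name in validators]
    return rng.choices(validators, weights=weights, k=1)[0]


def slash(stakes: dict, validator: str, percent: int) -> int:
    """Remove `percent` of a validator's stake and return the penalty."""
    penalty = stakes[validator] * percent // 100
    stakes[validator] -= penalty
    return penalty


# Hypothetical validators and stakes, in arbitrary units.
stakes = {"val_a": 3200, "val_b": 6400, "val_c": 3200}
rng = random.Random(42)  # seeded only to make the example repeatable

chosen = pick_validator(stakes, rng)
penalty = slash(stakes, "val_b", 5)  # assumed 5% penalty for misbehaving

print(chosen, penalty, stakes["val_b"])
```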

The list of consensus protocols is continuously evolving, reflecting the ongoing search to solve some of the limitations identified in Web3, such as speed or cost. Some alternative consensus protocols include proof of authority – where a small number of nodes have the power to validate transactions and add blocks to the chain – and proof of space – which uses disk space to validate transactions.

Making the first transaction

With these concepts in mind, we will now carry out a transaction in our local environment, using test ether on a local Ethereum blockchain provided by Ganache.

To get started, let’s open a local Jupyter notebook and a quick-start version of Ganache.

Here is the information we need:

Figure 1.3 – Ganache main page and relevant information to connect


Let’s look at the code:

  1. Import the Web3.py library:
    from web3 import Web3
  2. Connect to the blockchain running on the port described in our Ganache page (item 1):
    ganache_url = "http://127.0.0.1:8545"
    web3 = Web3(Web3.HTTPProvider(ganache_url))
  3. Define the sending and receiving addresses (item 2):
    from_account = "0xd5eAc5e5f45ddFCC698b0aD27C52Ba55b98F5653"
    to_account = "0xFfd597CE52103B287Efa55a6e6e0933dff314C63"
  4. Define the transaction. In this case, we are transferring 30 ether between the accounts defined previously:
    # The snake_case names used here are from web3.py v6+; v5 used toWei, fromWei, and getBalance
    transaction = web3.eth.send_transaction({
      'to': to_account,
      'from': from_account,
      'value': web3.to_wei(30, "ether")
    })
  5. We can review the account balances before and after the transaction with the following code snippet:
    print(web3.from_wei(web3.eth.get_balance(from_account), 'ether'))
    print(web3.from_wei(web3.eth.get_balance(to_account), 'ether'))

Congratulations! If you have never before transferred value on a blockchain, you have achieved your first milestone. The complete code can be found in Chapter01/First_transaction.

A word on CBDC

What is CBDC? The acronym stands for Central Bank Digital Currency. It is a new form of electronic money issued by the central banks of countries.

Many countries are at different stages in this roadmap. On January 20, 2022, the Federal Reserve Board issued discussion papers about CBDC, and prior to the COVID-19 pandemic, it had also reported ongoing research into the benefits that a CBDC could bring to its system. As of July 2022, there were 100 CBDCs in research and development. Countries are looking for the best infrastructure, studying the impact on their communities, and are mindful of the new range of risks that this way of transferring value will pose to financial systems that may be reluctant to change.

Some of the concepts that we have covered in this chapter will be useful for the CBDC era, but depending on the project and its characteristics, not all of them will be present. It will be especially interesting to see how they solve centralization issues. A very informative tracker on the status of the projects is available at the following link: https://cbdctracker.org/.

In this section, we analyzed the fundamentals of blockchain technology, including key concepts such as gas, addresses, and consensus protocols, and explored the evolution of Web3. We also executed a transaction using Ganache and Web3.py.

With this basic understanding of the transaction flow, we will now shift our focus toward analyzing initial metrics and gaining a better understanding of the data challenges in this industry.
