Laying the foundation with the building blocks of blockchains
In this section, let's learn the most basic concept of blockchains—what a blockchain is made up of.
A blockchain can be imagined as a series of connected blocks, with each block containing a finite amount of information.
The following diagram demonstrates this clearly with multiple connected blocks.
Figure 1.1 – Representation of a blockchain
Just as a traditional database stores data as records across multiple tables, a blockchain stores data across multiple blocks, each holding a limited number of transactions.
The following diagram demonstrates blocks as a store for multiple transactions:
Figure 1.2 – Blocks with transaction data
The question now is, why not just use databases? Why do we even need blockchains? The main difference is that a blockchain has no admin; nobody is in charge. Another significant difference is that most blockchains are engineered to be permissionless at the core (even though permissioned blockchains exist and have specific use cases at the enterprise level). This makes them accessible to everyone, rather than only to people who have been granted access.
Another equally substantial difference is that blockchains support only insert (append) and read operations, whereas databases support the full set of CRUD (create, read, update, and delete) operations. This makes blockchains inherently immutable: once a record has been written, you cannot go back and update or delete it, whereas a database lets you modify existing records at any time.
This is a complete shift in how we approach data storage compared to traditional databases. Then there is decentralization, which we will learn about shortly; it is what makes blockchains such a powerful tool.
Web 3.0, another confusing and mysterious term, can, at a very basic level, be defined as the internet of blockchains. Until now, the internet has been made up of applications built on a client-server architecture, connected to each other. That was Web 2.0. With the help of blockchains, we can build a more decentralized internet. Even if most of this does not make sense right now, do not despair, for we have plenty to cover.
In the following subsections, we will learn about blocks, hashes, transactions, security, and decentralized storage and computing.
Blocks
The smallest, atomic unit of any blockchain is a block. We learned in the previous section that blocks contain transactions, but that's not all; they also store some additional information. Let's peel back the layers.
Let's look at a visual representation of the inner workings of a block:
Figure 1.3 – Connected blocks of a blockchain
In the preceding diagram, the first block is called the Genesis Block, which is the industry-standard term for the first block of a chain. Apart from transaction data, each block also has a hash. In the next section, Hashes, we will learn how this hash is created and why it is required. For now, think of it as a random-looking identifier for the block. You will also notice that each block stores a previous hash, which is the same as the hash of the block before it.
The previous hash field is critical because it is what connects the blocks to each other. Nothing else links the blocks into a chain; it is simply the fact that each block holds the hash of the block before it.
We also notice a field called nonce, which stands for number only used once. For now, we need to understand that a block's hash is calculated from its contents, including the nonce, so the nonce must be consistent with the hash for the block to be valid. If they're not consistent (for example, because someone altered the block's data), the block becomes invalid. Every following block stores the hash of its predecessor, so all of them go completely out of sync as well. This reinforces the immutability of blockchains, which we will learn about in detail in the Forking section. As we go further, we will uncover more layers to this, but we're at a great starting point and have a broad overview.
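To make the linking concrete, here is a minimal Python sketch. The Block class, its field names, the JSON serialization, and the all-zeros previous hash used for the genesis block are illustrative assumptions, not any real blockchain's format. For simplicity, the hash is recomputed on demand rather than stored.

```python
import hashlib
import json
import time
from dataclasses import dataclass, field


@dataclass
class Block:
    """Hypothetical, simplified block mirroring the fields in Figure 1.3."""
    index: int
    transactions: list
    previous_hash: str
    nonce: int = 0
    timestamp: float = field(default_factory=time.time)

    def compute_hash(self) -> str:
        # Serialize every field deterministically (sorted keys), then apply SHA-256.
        payload = json.dumps(
            {
                "index": self.index,
                "transactions": self.transactions,
                "previous_hash": self.previous_hash,
                "nonce": self.nonce,
                "timestamp": self.timestamp,
            },
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_chain(batches: list) -> list:
    """Start with a genesis block, then link each new block to its predecessor's hash."""
    genesis = Block(index=0, transactions=[], previous_hash="0" * 64)  # assumed placeholder
    chain = [genesis]
    for transactions in batches:
        prev = chain[-1]
        chain.append(
            Block(index=prev.index + 1, transactions=transactions,
                  previous_hash=prev.compute_hash())
        )
    return chain


chain = build_chain([["Alice pays Bob 5"], ["Bob pays Carol 2"]])
for block in chain:
    print(block.index, block.previous_hash[:12], block.compute_hash()[:12])
```

In the output, each block's previous-hash prefix matches the hash prefix printed on the line above it. That single field is what turns a list of blocks into a chain.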
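One common way to make the nonce and the hash depend on each other is Bitcoin-style proof-of-work. In this scheme, the nonce is incremented until the block's hash meets a difficulty target. This is a technique the text has not introduced yet, so treat the following as a toy sketch. The three-leading-zeros target and the "previous_hash|data|nonce" layout are assumptions chosen only to keep it fast.

```python
import hashlib

DIFFICULTY = 3  # hypothetical target: hash must start with this many hex zeros


def block_hash(previous_hash: str, data: str, nonce: int) -> str:
    # The nonce is hashed together with the block's contents.
    return hashlib.sha256(f"{previous_hash}|{data}|{nonce}".encode()).hexdigest()


def mine(previous_hash: str, data: str) -> tuple[int, str]:
    """Try nonces 0, 1, 2, ... until the hash meets the difficulty target."""
    nonce = 0
    while True:
        h = block_hash(previous_hash, data, nonce)
        if h.startswith("0" * DIFFICULTY):
            return nonce, h
        nonce += 1


def is_valid(previous_hash: str, data: str, nonce: int, stored_hash: str) -> bool:
    """Valid only if the nonce reproduces the stored hash and that hash meets the target."""
    h = block_hash(previous_hash, data, nonce)
    return h == stored_hash and h.startswith("0" * DIFFICULTY)


nonce, h = mine("0" * 64, "Alice pays Bob 5")
print(nonce, h)
print(is_valid("0" * 64, "Alice pays Bob 5", nonce, h))    # True
print(is_valid("0" * 64, "Alice pays Bob 500", nonce, h))  # False: data no longer matches
```

Changing the data invalidates the stored nonce and hash pair. An attacker would have to search for a new nonce, and would then also break the previous hash stored in the next block.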
Hashes
Hashes are a core feature of the blockchain and are what hold the blocks together. Recall that each block stores its own hash and the previous block's hash. A hash is created by combining all of a block's data, such as transactions and timestamps, and passing it through a hashing algorithm. One example is the SHA-256 algorithm.
The following diagram shows a visual representation of data being passed to the SHA-256 algorithm and being converted into a usable hash:
Figure 1.4 – Data to SHA-256 hash
A hash is a fixed-length string that can be used to identify or represent a piece of data. A hash algorithm, such as SHA-256, is a function that maps data of any size to such a hash. In practice, different inputs produce different hashes, so a hash acts as a unique fingerprint of its data.
While several other SHA algorithms are available (such as SHA-512), SHA-256 is one of the most widely used in blockchains; Bitcoin, for example, relies on it. It owes this popularity to its robust security properties and to the fact that no practical attack against it has been found to date.
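Python's standard `hashlib` module exposes SHA-256 directly. This short sketch shows the fixed-length property: a four-character input and an input thousands of characters long both produce a 64-character hexadecimal digest (256 bits). The helper name `sha256_hex` is our own.

```python
import hashlib


def sha256_hex(data: str) -> str:
    """Return the SHA-256 digest of a UTF-8 string as 64 hexadecimal characters."""
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


short_hash = sha256_hex("abcd")
long_hash = sha256_hex("Alice pays Bob 5 BTC; " * 1000)

print(short_hash)
print(len(short_hash), len(long_hash))  # 64 64
```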
There are four important properties of the SHA-256 algorithm:
- One-way: The hash generated by SHA-256 is 256 bits (32 bytes) long and is irreversible. If you want to get the plaintext back (plaintext being the data that we passed through SHA-256), you will not be able to do so.
- Deterministic: Every time you pass the same data through the algorithm, you get the same predictable result. This means that the hash doesn't change for the same data.
- Avalanche effect: Changing even one character of the data completely changes the hash and makes it unrecognizable.
  - For example, the hash for abcd is 88d4266fd4e6338d13b845fcf289579d209c897823b9217da3e161936f031589, but the hash for abce is 84e73dc50f2be9000ab2a87f8026c1f45e1fec954af502e9904031645b190d4f.
  - Apart from both starting with 8, the two hashes show no recognizable relationship. You can't predict how the algorithm represents a, b, or c. Nor can you work your way backward, either to recover the plaintext data or to predict the hash of some other data.
- Collision resistance: A collision in hashing means the algorithm produces the same hash for two different values. SHA-256 has an extremely low probability of collision, and this is why it's so heavily used.
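The determinism and avalanche properties are easy to check yourself. This sketch hashes the abcd and abce example from the list above. It then counts how many of the 256 output bits differ; for a well-behaved hash function, roughly half of them typically flip. The `differing_bits` helper is our own illustration, not a standard function.

```python
import hashlib


def sha256_hex(data: str) -> str:
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


def differing_bits(hex_a: str, hex_b: str) -> int:
    """Count how many of the 256 output bits differ between two hex digests."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


a = sha256_hex("abcd")
b = sha256_hex("abce")

# Deterministic: hashing the same input again yields exactly the same digest.
print(a == sha256_hex("abcd"))  # True

# Avalanche effect: a one-character change alters a large share of the output bits.
print(a)
print(b)
print(differing_bits(a, b), "of 256 bits differ")
```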
All of these properties of SHA-256 are the reason blockchains behave the way they do.
Let’s understand the effect that these properties have by going over the following few points:
- Irreversibility translates into immutability in blockchains (transaction data, once recorded, can’t be changed)
- Determinism translates into a unique, identifiable hash that can identify a user, wallet, transaction, token, or account on the blockchain (all of these have a hash)
- The avalanche effect translates into security, making the system extremely difficult to hack. The hashed information can't be predicted by brute force, that is, by systematically trying one candidate input after another until the output matches.
- Collision resistance means each ID is effectively unique, and the number of possible hashes is astronomically large (2^256 for a 256-bit hash). Since we need hashes to represent many types of information on the blockchain, this is an important property.
In this section, we have seen how many of the properties of blockchains actually come from the hashing algorithm, and we can safely say that hashing is the heart and soul of a blockchain.
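Here is how these properties combine into immutability in practice. The following sketch stores each block's hash and checks every link; editing an old transaction is then detected immediately. Recomputing the tampered block's hash only moves the break to the next block. The dictionary layout and the `is_chain_valid` helper are hypothetical simplifications, not a real client's validation logic.

```python
import hashlib
import json


def block_hash(block: dict) -> str:
    # Hash every field except the stored hash itself, serialized deterministically.
    body = {k: v for k, v in block.items() if k != "hash"}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def make_block(transactions: list, previous_hash: str) -> dict:
    block = {"transactions": transactions, "previous_hash": previous_hash}
    block["hash"] = block_hash(block)
    return block


def is_chain_valid(chain: list) -> bool:
    """Check each block's stored hash and its link to the previous block."""
    for i, block in enumerate(chain):
        if block["hash"] != block_hash(block):
            return False  # contents changed after the hash was recorded
        if i > 0 and block["previous_hash"] != chain[i - 1]["hash"]:
            return False  # link to the previous block is broken
    return True


genesis = make_block([], "0" * 64)
b1 = make_block(["Alice pays Bob 5"], genesis["hash"])
b2 = make_block(["Bob pays Carol 2"], b1["hash"])
chain = [genesis, b1, b2]
valid_before = is_chain_valid(chain)

# Tamper with an old transaction: b1's stored hash no longer matches its contents.
b1["transactions"][0] = "Alice pays Bob 500"
valid_after_edit = is_chain_valid(chain)

# Re-hashing b1 does not help: b2 still points at b1's old hash.
b1["hash"] = block_hash(b1)
valid_after_rehash = is_chain_valid(chain)

print(valid_before, valid_after_edit, valid_after_rehash)  # True False False
```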
Transactions
Because of the properties mentioned previously, blockchains are well suited to financial data, which has advanced security requirements. Storing financial data is one of their biggest use cases.
In Bitcoin, a transaction is built on unspent transaction outputs (UTXOs). A UTXO is an amount of cryptocurrency that an earlier transaction sent to someone and that has not been spent yet. A new transaction spends existing UTXOs as its inputs and creates new UTXOs as its outputs, and a user's balance is the sum of the UTXOs they own. All of this is logged on the blockchain for transparency. It's essential to recognize that while the UTXO model is a key element of certain blockchains such as Bitcoin, it's not universal; Ethereum, for example, uses an account-based model instead.
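A minimal sketch of the UTXO idea follows. It assumes a hypothetical UTXO record that names its owner directly. Real Bitcoin outputs are locked by scripts, and spending them requires signatures; both are omitted here. Spending removes the consumed outputs from the unspent set and adds the new ones, and a balance is just a sum.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UTXO:
    """Hypothetical, simplified unspent output."""
    txid: str    # transaction that created this output
    index: int   # position of the output within that transaction
    owner: str   # stand-in for Bitcoin's locking script
    amount: int  # smallest unit (satoshis, in Bitcoin's case)


def balance(utxo_set: set, owner: str) -> int:
    # A balance is simply the sum of the unspent outputs someone owns.
    return sum(u.amount for u in utxo_set if u.owner == owner)


def spend(utxo_set: set, inputs: list, outputs: list) -> set:
    """Consume existing UTXOs (inputs) and create new ones (outputs)."""
    if len(set(inputs)) != len(inputs):
        raise ValueError("duplicate input")
    if not all(u in utxo_set for u in inputs):
        raise ValueError("input is not an unspent output")
    if sum(u.amount for u in outputs) > sum(u.amount for u in inputs):
        raise ValueError("outputs exceed inputs")
    return (utxo_set - set(inputs)) | set(outputs)


coin = UTXO("tx0", 0, "alice", 50)
utxos = {coin}
# Alice pays Bob 30 and sends the remaining 20 back to herself as change.
utxos = spend(utxos, [coin], [UTXO("tx1", 0, "bob", 30), UTXO("tx1", 1, "alice", 20)])
print(balance(utxos, "alice"), balance(utxos, "bob"))  # 20 30
```

Because the spent output is removed from the set, trying to spend `coin` a second time raises an error. This is the UTXO model's built-in protection against double spending.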
The following diagram helps us visualize all the fields in a transaction:
Figure 1.5 – The contents of a blockchain transaction
Let’s go through all the fields that form a Bitcoin transaction:
- Version: This specifies which rules the transaction follows
- Input counter: This is the number of inputs in the transaction (this is just a count)
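- Inputs: This is the actual input data, which references the earlier unspent outputs that the transaction spends
- Output counter: This is similar to the input counter, but it keeps a count of the transaction's outputs
- Output: This is the actual output data from the transaction
- Blocktime: This is simply a Unix timestamp that records when the transaction happened. Note that in Bitcoin's raw transaction format, the final field is locktime, which sets the earliest time or block height at which the transaction can be added to a block.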
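The fields listed above can be modeled as a small data structure. This is a hypothetical sketch, not Bitcoin's binary serialization. The input and output counters are derived from the lists themselves, so they can never disagree with them, and the `recipient` field stands in for a real locking script.

```python
from dataclasses import dataclass, field


@dataclass
class TxInput:
    prev_txid: str   # transaction whose output is being spent
    prev_index: int  # which output of that transaction


@dataclass
class TxOutput:
    amount: int      # value in satoshis
    recipient: str   # hypothetical stand-in for a locking script


@dataclass
class Transaction:
    """Simplified model of the fields in Figure 1.5."""
    version: int
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)
    blocktime: int = 0  # Unix timestamp

    @property
    def input_counter(self) -> int:
        return len(self.inputs)

    @property
    def output_counter(self) -> int:
        return len(self.outputs)


tx = Transaction(
    version=2,
    inputs=[TxInput("tx0", 0)],
    outputs=[TxOutput(30, "bob"), TxOutput(20, "alice")],
    blocktime=1700000000,
)
print(tx.input_counter, tx.output_counter)  # 1 2
```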
Initially, blockchains were primarily designed to record financial transactions within the realm of cryptocurrencies. However, as they evolved, blockchains demonstrated their versatility by finding applications beyond this initial purpose. Soon, we’ll delve into these additional uses.
But for now, it is important to understand that when we mention transactions, it does not strictly mean financial or currency-related transactions. Rather, in modern blockchains, a transaction is anything that changes the state of the blockchain, so any program that runs or any information that’s stored is simply a transaction.
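To illustrate "a transaction is anything that changes the state", here is a hypothetical state-transition function in the spirit of account-based chains. A token transfer and a plain data write are both just transactions that produce a new state. Earlier states are kept untouched, mirroring the append-only history. The state layout and transaction types are assumptions made for the example.

```python
import copy


def apply_transaction(state: dict, tx: dict) -> dict:
    """Hypothetical state-transition function: returns the state after one transaction."""
    new_state = copy.deepcopy(state)  # never mutate the previous state
    if tx["type"] == "transfer":
        balances = new_state["balances"]
        if balances.get(tx["from"], 0) < tx["amount"]:
            raise ValueError("insufficient funds")
        balances[tx["from"]] -= tx["amount"]
        balances[tx["to"]] = balances.get(tx["to"], 0) + tx["amount"]
    elif tx["type"] == "store":
        # Storing arbitrary data changes the state too, so it is also a transaction.
        new_state["storage"][tx["key"]] = tx["value"]
    else:
        raise ValueError(f"unknown transaction type: {tx['type']}")
    return new_state


genesis_state = {"balances": {"alice": 50}, "storage": {}}
txs = [
    {"type": "transfer", "from": "alice", "to": "bob", "amount": 30},
    {"type": "store", "key": "greeting", "value": "hello"},
]

history = [genesis_state]
for tx in txs:
    history.append(apply_transaction(history[-1], tx))

print(history[-1])
# {'balances': {'alice': 20, 'bob': 30}, 'storage': {'greeting': 'hello'}}
```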
Security
So, the main selling point for blockchains is that they’re extremely secure. Now, let’s understand why this is so:
- All the records are secured with cryptography thanks to the SHA-256 algorithm.
- The records and other blockchain data are copied to multiple nodes; we will learn about this in the Peers, nodes, validators, and collators section. Even if the data gets deleted in one node, it doesn’t mean that it’s deleted from the blockchain.
- Participating in the blockchain network requires owning private keys. A private key is a secret code known only to you; it grants control over your cryptocurrency holdings, lets you sign transactions, and keeps them secure. Possessing your private keys safeguards your digital assets and enables you to engage in network activities.
- Nodes need to come to a consensus on new data before it is added to the blockchain. This means bogus or corrupted data, which could compromise the entire chain, cannot be added.
- Data cannot be edited on the blockchain. This means the information you have stored cannot be tampered with.
- They’re decentralized and don’t have a single point of failure. The bigger the network or the more decentralized the network, the lower the probability of failure.
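The replication and consensus points can be sketched together. In the toy model below, every hypothetical `Node` keeps its own copy of the chain and validates a proposed block independently. The block is appended everywhere only if a strict majority accepts it. All nodes here are honest and the voting rule is deliberately simplistic; real consensus protocols, which must cope with faulty or malicious nodes, are covered later in the book.

```python
import hashlib


def block_hash(previous_hash: str, data: str) -> str:
    return hashlib.sha256(f"{previous_hash}|{data}".encode()).hexdigest()


class Node:
    """A hypothetical node holding its own full copy of the chain."""

    def __init__(self):
        self.chain = [{"data": "genesis", "previous_hash": "0" * 64,
                       "hash": block_hash("0" * 64, "genesis")}]

    def validate(self, block: dict) -> bool:
        # Each node checks the proposal against its own copy of the chain.
        tip = self.chain[-1]
        return (block["previous_hash"] == tip["hash"]
                and block["hash"] == block_hash(block["previous_hash"], block["data"]))


def propose(nodes: list, block: dict) -> bool:
    """Append the block on every node only if a strict majority accepts it."""
    votes = sum(node.validate(block) for node in nodes)
    if votes * 2 > len(nodes):
        for node in nodes:
            node.chain.append(dict(block))  # each node stores its own copy
        return True
    return False


nodes = [Node() for _ in range(5)]
tip = nodes[0].chain[-1]["hash"]

good = {"data": "Alice pays Bob 5", "previous_hash": tip,
        "hash": block_hash(tip, "Alice pays Bob 5")}
bogus = {"data": "Mallory mints 1000", "previous_hash": tip, "hash": "f" * 64}

accepted = propose(nodes, good)
rejected = propose(nodes, bogus)
print(accepted, rejected)  # True False

# Losing the data on one node does not remove it from the network.
nodes[0].chain.clear()
print(sum(len(n.chain) for n in nodes[1:]))  # 8 (4 nodes x 2 blocks)
```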
We will learn about nodes, decentralization, validation, and consensus later on in this book, and all these points will be clearer.
Storage versus compute
Bitcoin introduced blockchain for the storage of financial transactions, but Ethereum took things a step further and helped us imagine what it could be like if you could run programs on a blockchain. Hence, the concept of smart contracts was created (we will dig deeper into smart contracts later in this chapter, but you can think of them as code that runs in a decentralized way on the blockchain).
Independent nodes can join the blockchain's network and pool their processing power.
According to Ethereum, they're building the biggest supercomputer in the world. There are two ways to build the biggest supercomputer. You can build it centrally, with all the machines in one location. Or you can build a decentralized version, where thousands of machines are connected over the internet and divide tasks among themselves.
Ethereum enables you to process programs on the blockchain. This means anyone on the internet can build a smart contract and publish it on the blockchain where anyone else across the world can interact with the program.
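The key idea behind running programs on a blockchain is determinism: every node executes the same code on the same ordered list of calls, so every node ends up in the same state. The following Python sketch models a hypothetical counter "contract". It is not Solidity and not how the Ethereum Virtual Machine actually executes code; it only illustrates why replicated, deterministic execution lets anyone interact with a shared program.

```python
class CounterContract:
    """A hypothetical 'smart contract': deterministic code plus the state it owns."""

    def __init__(self):
        self.count = 0
        self.last_caller = None

    def increment(self, caller: str, amount: int) -> int:
        if amount <= 0:
            raise ValueError("amount must be positive")
        self.count += amount
        self.last_caller = caller
        return self.count


# Every node runs its own copy of the contract...
nodes = [CounterContract() for _ in range(3)]

# ...and executes the same ordered list of calls (transactions) from different users.
calls = [("alice", 2), ("bob", 5), ("carol", 1)]
for contract in nodes:
    for caller, amount in calls:
        contract.increment(caller, amount)

# Because the code is deterministic, all nodes agree on the resulting state.
print([c.count for c in nodes])  # [8, 8, 8]
```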
This is the reason we see so many startups building their products on the Ethereum chain. After Ethereum, blockchains such as Solana, NEAR, and Polkadot have taken this idea much further and brought many new concepts by improving on Ethereum. This book is going to deal with all three of these blockchains.