You're reading from Mastering Blockchain, Fourth Edition Inner workings of blockchain, from cryptography and decentralized identities, to DeFi, NFTs and Web3

Product type Paperback

Published in Mar 2023

Publisher Packt

ISBN-13 9781803241067

Length 818 pages

Edition 4th Edition

Languages

Solidity

Tools

Blockchain

Concepts

Blockchain

Author (1):

Imran Bashir

View More author details

Table of Contents (24) Chapters

Preface

1. Blockchain 101 FREE CHAPTER

2. Decentralization

3. Symmetric Cryptography

4. Asymmetric Cryptography

5. Consensus Algorithms

6. Bitcoin Architecture

7. Bitcoin in Practice

8. Smart Contracts

9. Ethereum Architecture

10. Ethereum in Practice

11. Tools, Languages, and Frameworks for Ethereum Developers

12. Web3 Development Using Ethereum

13. The Merge and Beyond

14. Hyperledger

15. Tokenization

16. Enterprise Blockchain

17. Scalability

18. Blockchain Privacy

19. Blockchain Security

20. Decentralized Identity

21. Decentralized Finance

22. Blockchain Applications and What’s Next

23. Index

Distributed systems

Understanding distributed systems is essential to our understanding of blockchain, as blockchain is a distributed system at its core. It is a distributed ledger that can be centralized or decentralized. A blockchain is originally intended to be and is usually used as a decentralized platform. It can be thought of as a system that has properties of both decentralized and distributed paradigms. It is a decentralized-distributed system.

A distributed system is a computing paradigm whereby two or more nodes work with each other in a coordinated fashion to achieve a common outcome. It is modeled in such a way that end users see it as a single logical platform. For example, Google’s search engine is based on a large distributed system; however, to a user, it looks like a single, coherent platform. It is composed of processes (nodes) and channels (communication channels) where nodes communicate by passing messages. A blockchain is a message-passing distributed system.

A node is an individual player (a computer) in a distributed system. All nodes can send and receive messages to and from each other. Nodes can be honest, faulty, or malicious, and they have memory and a processor. A node that exhibits arbitrary behavior is known as a Byzantine node after the Byzantine Generals problem.

The Byzantine Generals problem

In 1982, a thought experiment was proposed by Lamport et al. in their research paper, The Byzantine Generals Problem, which is available here: https://www.microsoft.com/en-us/research/publication/byzantine-generals-problem/

In this problem, a group of army generals who lead different parts of the Byzantine army is planning to attack or retreat from a city. The only way of communicating with them is via a messenger. They need to agree to strike at the same time to win. The issue is that one or more generals might be traitors who could send a misleading message. Moreover, the messenger could be captured by the city, resulting in no message delivery. Therefore, there is a need for a viable mechanism that allows agreement among the generals, even in the presence of the treacherous ones, and message loss, so that the attack can still take place at the same time. As an analogy for distributed systems, the generals can be considered as honest nodes, the traitors as Byzantine nodes (that is, nodes with arbitrary behavior), the messenger can be thought of as a channel of communication with the generals, and a captured messenger as a delayed or lost message. Several solutions were presented to this problem in the paper by Lamport et al. in 1982.

This type of inconsistent behavior of Byzantine nodes can be intentionally malicious, which is detrimental to the operation of the network. Any unexpected behavior by a node on the network, whether malicious or not, can be categorized as Byzantine.

A small-scale example of a distributed system is shown in the following diagram. This distributed system has six nodes, of which one (N4) is a Byzantine node, leading to possible data inconsistency. L2 is a link that is broken or slow, and this can lead to a partition in the network:

Diagram Description automatically generated

Figure 1.1: Design of a distributed system: N4 is a Byzantine node and L2 is broken or a slow network link

Two key challenges of a distributed system design are the coordination between nodes and fault tolerance. Even if some (a certain threshold dictated by the consensus protocol) of the nodes become faulty or network links break, the distributed system should be able to tolerate this and continue to work to achieve the desired result. This problem has been an active area of distributed system design research for many years, and several algorithms and mechanisms have been proposed to overcome these issues.

Distributed systems are challenging to design. It has been proven that a distributed system cannot have all three of the much-desired properties of consistency, availability, and partition tolerance simultaneously. This principle is known as the CAP theorem.

CAP theorem

The CAP theorem, also known as Brewer’s theorem, was introduced by Eric Brewer in 1998 as a conjecture. In 2002, it was proven as a theorem by Seth Gilbert and Nancy Lynch. The theorem states that any distributed system cannot have consistency, availability, and partition tolerance simultaneously:

Consistency is a property that ensures that all nodes in a distributed system have a single, current, and identical copy of the data. Consistency is achieved using consensus algorithms to ensure that all nodes have the same copy of the data. This is also called state machine replication (SMR). The blockchain is a means of achieving state machine replication.
Availability means that the nodes in the system are up, accessible, and are accepting incoming requests and responding with data without any failures as and when required. In other words, data is available at each node and the nodes are responding to requests.
Partition tolerance ensures that if a group of nodes is unable to communicate with other nodes due to network failures, the distributed system continues to operate correctly. This can occur due to network and node failures.

A Venn diagram is commonly used to visualize the CAP theorem, as shown below:

Diagram, venn diagram Description automatically generated

Figure 1.2: CAP theorem

The preceding diagram depicts that only two properties at a time can be attained; either AP, CA, or CP.

In summary:

If we opt for CP (consistency and partition tolerance), we sacrifice availability.
If we opt for AP (availability and partition tolerance), we sacrifice consistency.
If we opt for AC (availability and consistency), we sacrifice partition tolerance.

Note that AC does not really exist. The CAP theorem in practice means that in the case of a network partition, a distributed system is either available or consistent. As network partitions cannot be ignored, the choice is between consistency or availability when a network partition occurs.

We can explain this concept with the following example.

Let’s imagine that there is a distributed system with two nodes. Now, let’s apply the three theorem properties to this smallest of possible distributed systems with only two nodes:

Consistency is achieved if both nodes have the same shared state; that is, they have the same up-to-date copy of the data.
Availability is achieved if both nodes are up and running and responding with the latest copy of data.
Partition tolerance is achieved if, despite communication failure or delay between nodes, the network (distributed system) continues to operate.

Now think of a scenario where a partition occurs and nodes can no longer communicate with each other. If newly updated data comes in now, it can only be updated on one node. If the node accepts the update, then only one node in the network is updated, and consistency is lost. If the node rejects the update, that will result in a loss of availability. This means that either availability or consistency is unachievable due to the network partition. This is strange because somehow, blockchain manages to achieve all these properties, violating the theorem (especially in its most successful implementation, Bitcoin)—or does it?

In blockchains, consistency is sacrificed in favor of availability and partition tolerance. In this scenario, consistency (C) in the blockchain is not achieved simultaneously with partition tolerance (P) and availability (A), but it is achieved over time. This is called eventual consistency, where consistency is achieved due to validation from multiple nodes over time. There can be a temporary disagreement between nodes on the final state, but it is eventually agreed upon. For example, in Bitcoin, multiple transaction confirmations are required to achieve a good confidence level that transactions may not be rolled back in the future. Eventually, a consistent view of transaction history is available in all nodes. Multiple confirmations of a transaction over time provide eventual consistency in Bitcoin. For this purpose, the process of mining was introduced in Bitcoin. Mining is a process that facilitates the achievement of consensus by using the Proof of Work (PoW) algorithm. At a higher level, mining can be defined as a process that’s used to add more blocks to the blockchain. We will cover more on this later in Chapter 6, Bitcoin Architecture.

PACELC theorem

An extension of the CAP theorem called PACELC was first proposed by Daniel J. Abadi from Yale University. It states that, in addition to the three properties proposed by CAP, there are also tradeoffs between latency and consistency. It states that tradeoffs between consistency and latency always exist in replicated systems, whereas CAP is only applicable when there are network partitions. In other words, it means that even if no network partitions occur, under normal operation the tradeoff between consistency and latency exists. For example, some databases may choose to give up consistency for lower latency, and some databases could pay the availability and latency costs to achieve consistency. This is true for replicated systems and presents a more inclusive picture of consistency tradeoffs in distributed systems.

PACELC was formally proved in a paper available here: https://dl.acm.org/doi/10.1145/3197406.3197420

With a better understanding of distributed systems, let’s now talk about blockchain itself. First, we’ll begin with a brief rundown of the history of blockchain and Bitcoin.

You're reading from Mastering Blockchain, Fourth Edition Inner workings of blockchain, from cryptography and decentralized identities, to DeFi, NFTs and Web3

Table of Contents (24) Chapters

Distributed systems

CAP theorem

PACELC theorem

Authors (1)

Personalised recommendations for you