Now that we are convinced the relational model is not a good fit for big data, let's try to figure out ways to handle big data. These are the solutions that paved the way for various NoSQL databases:
- Clustering: The data should be spread across different nodes in a cluster. The data should be replicated across multiple nodes in order to sustain node failures. This helps spread the data across the cluster, and different nodes contain different subsets of data. This improves performance and provides fault tolerance.
A node is an instance of database software running on a server. Multiple instances of the same database could be running on the same server.
- Flexible schema: Schemas should be flexible unlike the relational model and should evolve with the data.
- Relax consistency: We should embrace the concept of eventual consistency, which means data will eventually be propagated to all the nodes in the cluster (in case of replication). Eventual consistency allows data replication across nodes with minimum overhead. This allows for fast writes with the need for distributed locking.
- Denormalization of data: Denormalize data to optimize queries. This has to be done at the cost of writing and maintaining multiple copies of the same data.