Understanding big data clusters
An important part of working with modern data solutions is scalability. Scalability determines how well a system keeps functioning as it experiences growth. Growth can mean any or all of the following:
- The system needs to handle more concurrent users.
- The volume of the data we need to store increases.
- The compute power needed increases because the query complexity increases.
The last two points are about being able to use more hardware resources. The main resources we need to consider are compute and storage. Compute refers to the number of CPU cores being used. Storage can mean keeping data on actual hard drives or keeping it in memory. In the end, though, data must always be persisted to durable storage such as hard drives.
Hardware scalability is about adding more hardware resources to our database. The second part of scalability concerns whether our database will actually benefit from the extra hardware. This is...