System and data architecting
This section covers strategies for improving overall system performance, speeding up data indexing, and making the most of storage space.
Hot-Warm architecture
For time-series data, including Twitter and other social media data as well as data from Logstash, Elastic.co recommends what it calls a Hot-Warm architecture. This setup divides nodes into three groups: master nodes, hot nodes, and warm nodes.
Master nodes
Ideally, dedicate three nodes as master nodes that do not store data or fulfill queries. These machines don't need to be very powerful; they just perform cluster management operations.
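As a rough illustration of what a dedicated master node looks like in configuration, the sketch below builds the relevant elasticsearch.yml entries. It assumes the setting names used in Elasticsearch 5.x and 6.x (node.master, node.data, node.ingest, discovery.zen.minimum_master_nodes); newer releases express the same idea through node.roles instead. The helper names quorum, master_node_settings, and to_yaml_lines are hypothetical and exist only for this example.

```python
def quorum(master_eligible: int) -> int:
    """Return the smallest majority of the master-eligible nodes."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


def master_node_settings(master_eligible: int = 3) -> dict:
    """Settings for a node that only manages the cluster.

    Assumption: Elasticsearch 5.x/6.x setting names.
    """
    return {
        "node.master": True,    # eligible to be elected master
        "node.data": False,     # stores no index data
        "node.ingest": False,   # runs no ingest pipelines
        "discovery.zen.minimum_master_nodes": quorum(master_eligible),
    }


def to_yaml_lines(settings: dict) -> str:
    """Render flat settings as simple 'key: value' lines."""
    lines = []
    for key, value in settings.items():
        if isinstance(value, bool):
            value = str(value).lower()  # YAML booleans are lowercase
        lines.append(f"{key}: {value}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_yaml_lines(master_node_settings()))
```

With three master-eligible nodes the majority is two, so the cluster can lose one master node and still elect a leader.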
Hot nodes
Hot nodes hold the most recent data indices. All data writes are directed at these machines, and they are likely to be the most frequently queried nodes. Elastic.co recommends equipping hot nodes with solid-state drives (SSDs) for better I/O performance.
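One common way to direct new indices to hot nodes is to tag each node with a custom attribute and then require that attribute in the index settings. The sketch below assumes Elasticsearch 5.x or later, where custom attributes are set as node.attr.<name>. The attribute name box_type is only a convention and could be anything. The function names are hypothetical.

```python
import json


def hot_node_settings(attr: str = "box_type") -> dict:
    """elasticsearch.yml entries for a data node tagged as hot.

    Assumption: the custom attribute name (default 'box_type') is arbitrary.
    """
    return {
        "node.master": False,        # leave cluster management to the masters
        "node.data": True,           # this node stores index data
        f"node.attr.{attr}": "hot",  # custom tag used for shard allocation
    }


def hot_index_settings(attr: str = "box_type") -> dict:
    """Index settings body that pins an index's shards to hot nodes."""
    return {
        "settings": {
            f"index.routing.allocation.require.{attr}": "hot",
        }
    }


if __name__ == "__main__":
    # Body that could be sent when creating a new time-based index.
    print(json.dumps(hot_index_settings(), indent=2))
```

Because the allocation filter lives in the index settings, the same attribute can be used to place an index on a different group of nodes without changing any node configuration.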
Warm nodes
In this architecture, no data is written to warm nodes; instead, they hold historical time-based data. For example, if we create a new...