Recommended architecture principles and considerations
Certain principles that ensure architectures, once realized, are scalable, modifiable, robust, and fault-tolerant are especially relevant to IoT. Let’s take a look at some of these:
- Built on open communication protocols to support diverse device communication needs: IoT is an amalgamation of the real (hardware) and virtual (software) realms, each of which evolves at its own pace. Robust IoT architectures should be flexible enough to support current and possible future enhancements in both realms – for example, connectivity and power capabilities advance continually on the device/hardware side, while analytics and AI/ML capabilities advance on the central server side. Hence, there is an inherent impedance mismatch between the real and virtual worlds (in both the rate and the nature of these enhancements). IoT architects should not only be aware of this mismatch but also incorporate the considerations required to support the use case requirements over a longer time frame. These requirements are partially addressed by adhering to a layered architecture, whereby the components in a specific layer can be plugged in or out with minimal impact on the overall architecture, as illustrated in the sketch below.
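To make the plug-in/plug-out idea concrete, here is a minimal Python sketch; the `ConnectivityAdapter` interface and the MQTT/LoRa class names are illustrative assumptions rather than part of any specific product. The connectivity layer can swap implementations without the layers above it noticing:

```python
from abc import ABC, abstractmethod


class ConnectivityAdapter(ABC):
    """Abstraction for the connectivity layer; upper layers depend only on this interface."""

    @abstractmethod
    def send(self, payload: bytes) -> None:
        ...


class MqttAdapter(ConnectivityAdapter):
    def send(self, payload: bytes) -> None:
        # Publish over MQTT (transport details omitted in this sketch).
        print(f"MQTT publish: {payload!r}")


class LoRaAdapter(ConnectivityAdapter):
    def send(self, payload: bytes) -> None:
        # Transmit over a LoRa radio (transport details omitted in this sketch).
        print(f"LoRa uplink: {payload!r}")


def report_telemetry(adapter: ConnectivityAdapter, reading: bytes) -> None:
    # The application layer is unaware of which transport is plugged in.
    adapter.send(reading)


report_telemetry(MqttAdapter(), b'{"temp": 21.5}')  # swap in LoRaAdapter() with no other change
```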
- Designed for “end-to-end” security: Security is an important consideration for any software system, especially where data or commands are communicated over public communication channels. However, in IoT, security requires deeper consideration, primarily for two reasons:
- Actions initiated in the real/physical world can’t be rescinded, unlike actions in the virtual/software world: An irrigation pump that is (maliciously) instructed to start pumping water into an agricultural field will have pumped a considerable amount of water before someone detects the anomaly and initiates corrective action. This contrasts with the software world, where an update or rollback instruction is often sufficient to undo database changes. Scenarios can be even more disastrous in domains such as healthcare, where IoT systems often control life-critical equipment (for example, an oxygen ventilator controlled by an IoT system).
- The attack surface is considerably broader than in pure software systems: This is because the complete data pipeline (end device > gateway > communication channel > central server > application) needs to be secured, and each entity in the pipeline has different security requirements – end devices (with their inherently constrained compute/storage capabilities) can’t support the same security rigor as the central server, so each component’s security vulnerabilities and the relevant security guardrails need to be analyzed independently. Similarly, data should be protected in transit as well as at rest at all times; a small illustration follows.
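As one small, hedged illustration of protecting data in transit on a constrained device, the following Python sketch signs each payload with an HMAC so the gateway or central server can detect tampering. The shared `DEVICE_KEY` and the verification flow are illustrative assumptions; a production design would also encrypt the payload and provision keys securely:

```python
import hashlib
import hmac
import json

# Illustrative pre-shared key; in practice this would be provisioned securely per device.
DEVICE_KEY = b"example-device-key"


def sign_payload(reading: dict) -> dict:
    body = json.dumps(reading, sort_keys=True).encode()
    signature = hmac.new(DEVICE_KEY, body, hashlib.sha256).hexdigest()
    return {"body": body.decode(), "signature": signature}


def verify_payload(message: dict) -> bool:
    expected = hmac.new(DEVICE_KEY, message["body"].encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, message["signature"])


msg = sign_payload({"device_id": "pump-01", "command": "stop"})
print(verify_payload(msg))  # True; any modification of the body breaks verification
```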
- Enterprise integration enabled by the “API-first” approach: Any production-grade IoT system will typically be integrated with other external systems to deliver its full value. Real-world data collated by IoT systems is fed (data push) into external systems to enable richer use cases. Similarly, data from external systems (data pull) is used to enrich the collated data. This type of integration is not possible unless the IoT system has been architected with API-first as one of its core architectural tenets, whereby IoT data can be consumed by enterprise applications. These APIs also enable workflows that span both IoT and non-IoT (that is, external) systems, as in the sketch that follows.
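As a hedged sketch of what such an API could look like (the endpoint path, field names, and the choice of Flask are illustrative assumptions, not a prescribed design), a read-only telemetry endpoint lets enterprise applications pull IoT data over plain HTTP/JSON:

```python
from flask import Flask, jsonify

app = Flask(__name__)

# Illustrative in-memory store standing in for the IoT platform's data layer.
latest_telemetry = {"pump-01": {"flow_lpm": 42.7, "status": "running"}}


@app.route("/api/v1/devices/<device_id>/telemetry", methods=["GET"])
def get_telemetry(device_id: str):
    reading = latest_telemetry.get(device_id)
    if reading is None:
        return jsonify({"error": "unknown device"}), 404
    return jsonify({"device_id": device_id, "telemetry": reading})


if __name__ == "__main__":
    app.run(port=8080)
```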
- Satisfy diverse data needs: IoT systems are leveraged by a diverse set of users, each with different backgrounds and information needs. Accordingly, it is important to capture the raw data needs of all (current and future) stakeholders and to present the data in a way that each persona can easily assimilate. Role-based access control (RBAC) is one mechanism that shows stakeholders the information they need while hiding non-relevant information. Also, some stakeholders have real-time data needs (operators who want real-time notifications for emergency alarms), whereas others want insights from consolidated data (batch processing). Decoupling data ingestion from data processing is one principle that enables us to meet this need (see the sketch after this list). Some of the other data collation/manipulation requirements are listed as follows:
- Diverse (structured, semi-structured, and unstructured) operational data from sources such as Manufacturing Execution Systems (MESs) and Laboratory Information Management Systems (LIMSs) should be consolidated in a common data store (data lake), either at the edge, in the cloud, or both.
- Streaming, batch, and right-time data pipelines should be separated for scalability, efficiency, and cost optimization. Decoupling data producers from consumers ensures a robust architecture as well as flexibility in technology and implementation choices.
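The following is a minimal sketch of decoupling ingestion from processing, assuming a simple in-process queue stands in for whatever broker (Kafka, MQTT, or a cloud service) a real deployment would use. The producer only publishes raw readings, while an independent consumer reacts in real time and also collects the same data for batch processing:

```python
import queue
import threading
import time

readings = queue.Queue()  # stands in for a message broker in this sketch


def ingest(n: int = 5) -> None:
    # Producer: pushes raw readings without knowing who consumes them.
    for i in range(n):
        readings.put({"device_id": "sensor-7", "value": 20 + i, "ts": time.time()})
    readings.put(None)  # sentinel to stop the consumer


def process_stream() -> None:
    # Consumer: reacts to each reading as it arrives (e.g., raise an alarm).
    batch = []
    while (reading := readings.get()) is not None:
        if reading["value"] > 23:
            print("ALERT:", reading)
        batch.append(reading)  # the same data could also feed a batch/data lake pipeline
    print(f"Collected {len(batch)} readings for batch processing")


consumer = threading.Thread(target=process_stream)
consumer.start()
ingest()
consumer.join()
```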
- Technology-neutral architecture providing deployment flexibility: IoT systems can be deployed in different configurations, such as on-premises, public cloud, private cloud, and/or hybrid multi-cloud, depending on the customer’s sensitivity to security as well as governance and regulatory needs. Considering this, the architecture should be generic enough that it can cater to diverse deployment needs and be supported by multiple technology stacks. This is generally achieved by creating an IoT reference architecture (devoid of specific technology choices) and then transitioning to a technical architecture (where generic architectural components are replaced by specific technology components), as in the sketch that follows.
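To make the reference-to-technical transition concrete, the following hedged Python sketch (the `TelemetryStore` interface and both backends are illustrative assumptions) keeps the architectural component generic while the concrete technology is bound per deployment:

```python
from abc import ABC, abstractmethod


class TelemetryStore(ABC):
    """Reference-architecture component: 'a place to persist telemetry'."""

    @abstractmethod
    def save(self, record: dict) -> None:
        ...


class OnPremStore(TelemetryStore):
    def save(self, record: dict) -> None:
        # e.g., write to a local time-series database in an on-premises deployment
        print("on-prem write:", record)


class CloudObjectStore(TelemetryStore):
    def save(self, record: dict) -> None:
        # e.g., land the record in a cloud data lake in a public cloud deployment
        print("cloud write:", record)


def build_store(deployment: str) -> TelemetryStore:
    # The technical architecture binds the generic component to a concrete technology.
    return OnPremStore() if deployment == "on-prem" else CloudObjectStore()


build_store("on-prem").save({"device_id": "sensor-7", "value": 21.4})
```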
- Design for high availability: The need for high availability varies widely from one IoT use case to another; some use cases are mission-critical with near-zero downtime expectations, whereas others can accommodate a considerable downtime period. The central server architecture should match the uptime expectations, as less downtime typically translates into higher cost. In the context of IoT, high availability must be considered from an overall system perspective. For example, in scenarios where longer central server downtime is acceptable, end devices need higher data buffering capabilities (that is, greater storage space) to minimize data loss, as estimated in the sketch below.
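A hedged back-of-the-envelope sketch of that trade-off (the sampling interval, payload size, and tolerated outage are illustrative assumptions) shows how the acceptable central server downtime drives the storage a device must set aside for buffering:

```python
# Illustrative assumptions for one device
sample_interval_s = 10        # one reading every 10 seconds
payload_bytes = 200           # size of each buffered reading
tolerated_outage_h = 24       # acceptable central server downtime

readings_to_buffer = int(tolerated_outage_h * 3600 / sample_interval_s)
buffer_bytes = readings_to_buffer * payload_bytes

print(f"{readings_to_buffer} readings -> {buffer_bytes / 1024:.0f} KiB of device-side buffer")
# 8640 readings -> 1688 KiB: the longer the acceptable downtime, the more storage per device
```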
- Support for “unlimited scalability”: IoT deployments often start small, with a few end devices, but tend to scale to large numbers in a short duration. As a result, horizontal scalability is generally preferred over vertical scalability in IoT solutions.
- Device communication considerations: Data is communicated over a bi-directional communication channel between the gateway and the central server. This channel can be supported by multiple communication technologies (some of the common ones being cellular, Wi-Fi, LoRa, and Sigfox). Considerations such as range (physical distance from the central server), payload size, battery life, and ambient noise play a role in finalizing the ideal communication technology for a particular IoT implementation. Other device-side considerations include the ability to store/buffer data in case of connectivity loss to the central server, sleep/wake-up logic for conserving battery power, and data aggregation/filtering needs, as in the store-and-forward sketch below.
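The buffering and aggregation considerations can be combined into a small store-and-forward sketch; the sensor read, the `send_to_server` stub, and the buffer size are illustrative assumptions. Readings are aggregated on the device, buffered when the uplink is down, and flushed once connectivity returns:

```python
import random
from collections import deque

buffer = deque(maxlen=1000)  # bounded store-and-forward buffer; oldest readings drop first


def read_sensor() -> float:
    return 20 + random.random() * 5  # stand-in for a real sensor read


def send_to_server(payload: list) -> bool:
    return random.random() > 0.3  # stand-in for an uplink that sometimes fails


def report_cycle(samples_per_cycle: int = 6) -> None:
    # Aggregate several raw samples into one average to reduce payload size.
    samples = [read_sensor() for _ in range(samples_per_cycle)]
    buffer.append({"avg": sum(samples) / len(samples)})

    # Try to flush everything buffered so far; keep buffering if the uplink is down.
    if send_to_server(list(buffer)):
        buffer.clear()
    else:
        print(f"Uplink down, {len(buffer)} aggregated readings buffered")


for _ in range(5):
    report_cycle()
```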
The following diagram summarizes the key architectural principles/considerations discussed in this section:
Figure 1.4 – Architectural considerations for developing IoT solutions