When an architect, designer, or developer starts thinking about solution architecture for a business problem, they usually start thinking about a standard reference model or reference architecture that has been used and tested, with proof of success, by other experts in solving the same business problem they have in hand.
In traditional IT systems, you go with well-known architecture paradigms such as two-tier and three-tier architecture, Service-Oriented Architecture (SOA), and more modern paradigms such as microservice-based architecture and serverless architecture. Even at the software code level, there are established software design patterns.
Like traditional IT solutions, IoT solutions need a standard solution reference architecture or set of solution building blocks. However, IoT solutions differ in the following ways:
- IoT solutions, by default, are End-to-End (E2E) solutions, from devices/sensors to web and mobile apps for end user use.
- IoT solutions combine many architecture paradigms. In the IoT solution application layer, you could leverage one of the paradigms we mentioned before, for example, three-tier architecture or serverless, while the IoT solution analytics layer uses different paradigms, such as Lambda and Kappa architecture, and other IoT solution layers differ again. We will explain those architecture paradigms later in this book.
- The footprint of skills and technologies required for IoT solutions is large compared to traditional IT solutions.
In Figure 1.2, we have tried to capture, to some extent, the standard or commonly agreed upon IoT reference architecture, or IoT solution building blocks, that can be used to address IoT solutions for different business problems in different business domains:
Figure 1.2 – IoT solution reference architecture
Let's examine each layer of the IoT solution reference architecture in some detail.
IoT devices layer
This layer of the solution focuses on the IoT devices and their ecosystems. IoT devices can be classified into two categories:
- IoT endpoint devices: This kind of device is usually cheap and low-power or battery-based – for example, a constrained microcontroller device with sensors attached to sense objects in the physical world. It uses a wireless communication module with short-range connectivity options (Wi-Fi, Zigbee, Bluetooth, and so on) to connect to an IoT edge or gateway device, or long-range connectivity options such as cellular, Ethernet, or satellite to connect to the IoT backend cloud directly. These devices also run a real-time operating system (RTOS) and different software stacks, Software Development Kits (SDKs), and embedded software.
- IoT gateway devices: This kind of device is more powerful or bigger in terms of resources (compute, networking, and storage) compared to resource-constrained and limited IoT endpoint devices. Such types of devices usually run gateway services and have features such as the following:
a) They act as a communication hub/router, that is, helping IoT endpoint device-to-device communication or IoT endpoint device-to-IoT backend cloud communication.
b) Data caching, cleansing, buffering, and aggregation locally at the edge.
c) They manage IoT device security: IoT endpoint devices go through the gateway to get internet access rather than connecting to the internet directly.
d) They play a role in the edge computing paradigm, as they are sometimes used for local data processing and analytics.
e) The IoT gateway plays a critical role in supporting legacy devices and protocol conversion. Older IoT devices typically run legacy connectivity protocols; the gateway can convert those protocols to the modern, supported IoT protocols that run in the IoT edge layer or the IoT Cloud.
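To make the gateway's role more concrete, here is a minimal, purely illustrative Python sketch of the caching, cleansing, and aggregation behavior described in point b). All class and field names are hypothetical, and a real gateway would of course speak actual network protocols rather than in-memory calls:

```python
class IoTGateway:
    """Toy model of an IoT gateway that buffers endpoint readings,
    drops invalid ones, and forwards one aggregate to the cloud."""

    def __init__(self, batch_size=3):
        self.batch_size = batch_size
        self.buffer = []
        self.forwarded = []  # stands in for the cloud uplink

    def receive(self, device_id, value):
        # Cleansing: discard obviously invalid readings at the edge
        if value is None:
            return
        self.buffer.append((device_id, value))
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        # Aggregation: send one averaged record instead of N raw ones
        avg = sum(v for _, v in self.buffer) / len(self.buffer)
        self.forwarded.append({"count": len(self.buffer), "avg": avg})
        self.buffer.clear()

gw = IoTGateway(batch_size=3)
for reading in (21.0, None, 22.0, 23.0):
    gw.receive("sensor-1", reading)
```

The invalid `None` reading never reaches the cloud, and the three valid readings travel as a single aggregate record.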
We will cover this layer in detail later in the book.
IoT edge layer
The edge or fog computing paradigm, in short, is running data center workloads very close to the end user or devices. You can move a small data center workload or an entire workload to the edge; it all depends on the edge location facility size and the data center's supported capabilities.
Let's look at the three main drivers behind the need for the edge computing paradigm.
Latency
We can't beat the speed of light, right? This is physics. In other words, whatever the quality and strength of the fiber optic cable used for data transmission in a packet data network, the transmission of data will never take zero time.
Think about an IoT device running in a car park in Singapore, with the IoT analytics and applications running in the IoT backend cloud in the USA. We should expect – and we can't do anything about it – roughly 250 ms of latency added to the overall application request-response time. So, a request raised from the IoT device in the car park might take roughly 1 or 2 seconds in total (250 ms of network latency + application processing time + database processing time, and so on) to get a response, assuming the workload running in the IoT Cloud is well designed and implemented and does not add any further unnecessary latency to the response time.
In some applications, getting an answer or response from the IoT backend cloud in 1 or 2 seconds could be fine, but in other real-time or near-real-time applications, that time could be a big problem. Regarding the example we just discussed, imagine there is a fire in the car park and the drivers need to get out or escape as soon as possible. The car park gate is closed and waiting for a response from the IoT backend cloud to open. Getting the response in 1 or 2 seconds in this case (note: we are not discussing here a no-response scenario as that is a completely different case altogether) might be too late to save people's lives.
In a network topology, data traffic goes through multiple nodes or hops until it reaches its destination. Besides the speed-of-light latency we discussed above, you could also run into network issues, such as a torn cable or congestion at one of the nodes along the path. All such factors can make the latency of getting a response from the IoT backend cloud even worse.
Edge computing solves these issues and challenges by running the IoT Cloud workload required (for example, Apps, Analytics, ML, and Control) locally, that is, as close as possible to IoT devices.
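The arithmetic behind the car-park example can be written out as a simple latency budget. All the numbers below are illustrative, not measurements:

```python
# Rough request-response budget for the car-park example (illustrative numbers)
network_rtt_ms = 250      # Singapore <-> USA round trip, dominated by distance
app_processing_ms = 400   # application logic in the IoT Cloud
db_processing_ms = 350    # database work in the IoT Cloud

cloud_total_ms = network_rtt_ms + app_processing_ms + db_processing_ms

# Moving the same workload to a nearby edge site mostly removes the WAN hop;
# the processing cost stays the same, only the network term shrinks
edge_rtt_ms = 5
edge_total_ms = edge_rtt_ms + app_processing_ms + db_processing_ms
```

With these assumed numbers, the cloud round trip lands at 1,000 ms while the edge round trip lands at 755 ms: the only term edge computing can remove is the network one, which is exactly why it matters for latency-critical actions such as opening the gate during a fire.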
Cost savings
Data is important and, in fact, the goal of any IoT solution is to acquire data, gain some insights, and finally, act upon those insights.
IoT devices generate, or can generate, massive amounts of data, often at a very fine time resolution – for example, a reading every second or less. Domains such as analytics and ML usually require such big data to give proper analytic outputs or excellent ML model accuracy, but processing and storing that amount of data comes at a cost.
Edge computing solves that challenge by processing all (or part of) such IoT-generated data at the edge and storing only the relevant and required data needed for further analysis and applications in the IoT backend cloud.
In the edge cloud, you can aggregate data before sending it (or a batch of it) to the IoT backend cloud. You can run advanced near-real-time or streaming analytics on the edge and store just the results of that analysis in the IoT backend cloud for historical analytics (for example, trends). For example, analytics on the last 10 minutes of smart parking usage could easily be done on the edge, with data stored and processed there for that 10-minute window. After that, the data could be deleted or aggregated into a coarser time resolution (for example, 1 hour) and the aggregated data sent to the IoT Cloud for historical analytics.
Edge computing solves the preceding issues and challenges by reducing the cost of storing and processing such massive amounts of generated IoT data in the IoT Cloud.
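The rollup idea above can be sketched in a few lines of Python. This is a toy illustration with made-up readings: 600 per-second raw points collapse into one aggregate record that is all the IoT Cloud ever needs to store:

```python
from statistics import mean

# One reading per second for a 10-minute window (600 raw points, fake values)
raw = [20 + (i % 5) for i in range(600)]

# Edge rollup: keep one aggregate record for the whole window; the raw
# points can then be deleted locally instead of being shipped to the cloud
rollup = {
    "window_minutes": 10,
    "count": len(raw),
    "min": min(raw),
    "max": max(raw),
    "avg": mean(raw),
}
```

Sending one small record instead of 600 raw points is the cost saving in miniature: the cloud keeps enough information for trends (count, min, max, average) while the bulk of the data never leaves the edge.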
Data locality and privacy
Due to certain regulations or data privacy compliance requirements, you might have to store sensitive data such as personal data generated from IoT devices within the country or region where those IoT devices are deployed and operating. A problem may arise if you have a kind of centralized IoT backend cloud in one or more regions that differs from the highly regulated regions the IoT devices are deployed and operated in.
Edge computing solves that issue by storing and processing such sensitive data locally and might anonymize such data and push it later to the central IoT Cloud in an anonymized or masked form.
We will cover this layer in further detail later in the book.
IoT backend and application layer (IoT Cloud)
This layer is very important in any IoT solution as it covers so many solutions and applications. Let's look at these in detail.
Provisioning layer
Provisioning, in general, means setting up and configuring backend systems with all the required information and configurations the solution's upstream and downstream systems require to operate as expected.
In IoT solutions, IoT device provisioning can be done in many backend systems depending on the final IoT solution architecture, but here are common systems required in IoT device provisioning:
- Thing or IoT device provisioning: This system or platform is responsible for storing IoT device metadata in the IoT Cloud database – usually, it is called an IoT device registry. Metadata such as the device ID, device description, and much more can be stored about the IoT device and will help you later in the solution; for example, storing the device's location (which floor of the building or which parking lot the IoT device is installed in) might help in an end user's journey when searching within a smart parking solution.
Also, IoT device identity details such as device credentials and/or X.509 certificates can be securely stored and provisioned in that layer.
- Connectivity provisioning: The IoT device might have one or more communication modules, for example, one communication module for Zigbee connectivity and another one for cellular (mobile) connectivity.
In the case of cellular connectivity, for example, you must configure or provision a Subscriber Identification Module (SIM) with the mobile network operator, otherwise such connectivity will not work.
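A device registry like the one described above boils down to a keyed store of metadata and identity references. Here is a minimal, hypothetical sketch (in a real solution this would be a managed cloud service backed by a database, not an in-memory dictionary):

```python
class DeviceRegistry:
    """Toy IoT device registry: stores device metadata and a reference
    to the device's identity (e.g. an X.509 certificate)."""

    def __init__(self):
        self._devices = {}

    def provision(self, device_id, description, location, cert_id=None):
        self._devices[device_id] = {
            "description": description,
            "location": location,   # e.g. building floor / parking lot
            "cert_id": cert_id,     # reference to securely stored credentials
        }

    def lookup(self, device_id):
        return self._devices.get(device_id)

registry = DeviceRegistry()
registry.provision("cam-042", "parking camera",
                   {"floor": 2, "lot": "B"}, cert_id="x509-7f3a")
```

Storing the location alongside the device ID is what later lets an application answer end-user questions such as "which parking lot is this camera watching?" without querying the device itself.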
Ingestion layer
This layer is the front-door layer of backend IoT Cloud solutions and services. In other words, this is the first layer that receives data from the IoT endpoint devices or IoT edge and ingests such data into the proper IoT Cloud storage layer. In that layer, you can have the following components:
- MQTT Message Broker: We will cover the Message Queuing Telemetry Transport (MQTT) protocol in greater detail later in this book. For now, MQTT is a lightweight publish-subscribe (or Pub-Sub) network protocol that transports messages between devices: one device publishes a message to a specific topic, and other devices or applications subscribe to that topic to receive the published message. MQTT is considered one of the best IoT application communication protocols for a wide variety of reasons, which will be discussed later. The most obvious is MQTT's light weight: IoT endpoint devices are usually constrained in terms of computing resources, so running heavy protocols such as HTTP might be a problem or not supported at all by the endpoint device's operating system.
A complete IoT solution should have a scalable, reliable, resilient, and secure MQTT message broker, or we can call it an MQTT server since the IoT endpoint, IoT edge devices, and IoT applications usually act as MQTT clients.
- Streaming Processing Engine: In the case of powerful IoT devices, data could come from the devices directly as data streams over the HTTP(S) protocol. More typically, in large-scale and production-grade IoT solution architectures, an MQTT message broker sits in front and forwards the incoming IoT data (arriving over MQTT) to the streaming processing engine for further processing if required.
A complete IoT solution should have – if needed – a scalable, reliable, resilient, and secure streaming processing engine such as Kafka to support IoT data streams.
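A big part of MQTT's flexibility lies in its topic filters: a subscriber can use `+` to match exactly one topic level and `#` to match all remaining levels. The following is a simplified sketch of that matching logic, not a real broker implementation (for one thing, real MQTT only allows `#` as the final level of a filter):

```python
def topic_matches(pattern, topic):
    """Simplified MQTT topic matching: '+' matches exactly one level,
    '#' matches the rest of the topic (real MQTT restricts '#' to the end)."""
    p_levels, t_levels = pattern.split("/"), topic.split("/")
    for i, level in enumerate(p_levels):
        if level == "#":
            return True  # multi-level wildcard swallows everything after it
        if i >= len(t_levels) or (level != "+" and level != t_levels[i]):
            return False
    return len(p_levels) == len(t_levels)
```

For example, a monitoring application subscribed to `carpark/+/status` would receive status messages from every gate, while `carpark/#` would receive every message under the `carpark` hierarchy.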
IoT rule engine
This component is critical in any IoT solution. Why? Because it is the glue between IoT devices on the ground and the IoT backend cloud. In other words, it is responsible for directing incoming data from the IoT devices to its destination in the IoT backend cloud. Possible destinations include databases (SQL, NoSQL, and so on), message queues/buses, data lakes or object storage such as Amazon S3, Hadoop, and streaming engines for further processing or real-time analytics use cases.
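At its core, a rule engine evaluates each incoming message against a set of rules and fans it out to the matching destinations. Here is a deliberately tiny, hypothetical sketch of that idea (real rule engines typically use declarative, SQL-like rule syntax rather than Python lambdas):

```python
def route(message, rules):
    """Return the destination names whose predicate matches the message."""
    return [dest for predicate, dest in rules if predicate(message)]

# Each rule pairs a condition with a destination (names are illustrative)
rules = [
    (lambda m: m["type"] == "telemetry", "data_lake"),
    (lambda m: m["type"] == "telemetry" and m["value"] > 80, "alert_queue"),
    (lambda m: m["type"] == "audit", "sql_db"),
]

hot = {"type": "telemetry", "device": "t-9", "value": 95}
```

A single hot-temperature reading lands in both the data lake (for history) and the alert queue (for immediate action), which is exactly the one-message, many-destinations behavior the rule engine exists to provide.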
Storage layer
In this layer of an IoT solution, there are so many different storage options, as we will discuss later in the book, but the most common one used in large-scale and production-grade IoT solutions is object storage solutions such as the famous Amazon S3 object storage service.
The concept of the data lake – usually built on top of object storage solutions such as Amazon S3 – is the recommended IoT design pattern where all data in whatever format coming from IoT devices will be ingested and stored durably and securely in that data lake storage for subsequent processing.
Further down the line of IoT solutions, you could have another process or system read such raw IoT data from the data lake for further processing and storage. For example, you could perform some data cleansing, preparation, and processing, and store the data in another data store such as a SQL database (for reporting) or a NoSQL database (for a real-time dashboard application).
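The cleansing-and-preparation step described above can be sketched as a small transformation from raw lake records to rows fit for a SQL store. The field names and rules below are invented for illustration:

```python
def clean(raw_records):
    """Drop malformed raw records and normalize fields for a SQL-style store."""
    rows = []
    for rec in raw_records:
        # Cleansing: a usable row needs both a device ID and a reading
        if "device_id" not in rec or rec.get("temp_c") is None:
            continue
        rows.append({
            "device_id": rec["device_id"].strip().lower(),  # normalize IDs
            "temp_c": round(float(rec["temp_c"]), 1),       # normalize units
        })
    return rows

# Raw data in the lake arrives in whatever shape the devices sent it
raw = [
    {"device_id": " Sensor-1 ", "temp_c": "21.456"},
    {"device_id": "sensor-2"},   # missing reading -> dropped
    {"temp_c": 19.0},            # missing device ID -> dropped
]
```

The data lake keeps everything as it arrived; only the cleaned, normalized subset moves on to the reporting database.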
Analytics and machine learning layer
This layer of the IoT solution covers all systems and components used in building a big data and analytics standard pipeline (collect->process->store->analyze->visualize).
It also contains systems and components used in building an ML pipeline (Data Extraction -> Model Training -> Model Evaluation -> Model Deployment).
This layer is so important because, as explained earlier, IoT is all about getting the data for analytics and insights.
We will cover this layer in further detail later in the book.
IoT applications layer
In this layer of the solution, there are many components. Let's briefly discuss them.
Compute services
You will write application code and you will need to deploy or run that code, so you will require compute services to host the application code. These are the typical options:
- Bare-metal or physical host: This option is expensive and rarely used in the era of public cloud and managed hosting services. It is typically chosen only when legacy application code has special hardware specifications or software restrictions, for example, licenses.
- Virtual Machines: This option is the most used compute option whether applications are deployed in a traditional data center, a private cloud, or a public cloud.
Virtualization technologies have been a game-changer in the computing service domain in recent decades and are still valuable options in terms of modern application deployment.
- Containers: Container technologies are the most recent mainstream compute option and offer significant benefits. We'll cover them later in the book.
Container technologies help a lot in achieving the desired benefits of new application architecture paradigms such as microservice architecture. Microservice architecture concepts are not new, but container technologies have made them far more practical.
Container technology introduces a need for a container orchestration platform to manage container deployment on a large scale. Kubernetes is an open source container orchestration engine designed to deploy and manage containerized services on a large scale.
There are other container orchestration platforms on the market, offered either as commercial solutions, such as AWS Elastic Container Service (ECS), or as open source, such as Apache Mesos or Docker Swarm. However, Kubernetes has proven to be the market-leading container orchestration platform, with huge technical community support.
- Serverless: Serverless means there is a server, but the less part of serverless means you do not manage that server at all.
In the serverless paradigm, the application developer will focus on the code only, be it Java, Python, C#, and so on. Then, when it comes to deploying that code onto a server-side platform, the developer simply uploads the code artifacts to the serverless provider (whether it is a public cloud provider or a private cloud provider). The serverless provider behind the scenes will deploy that code to a server managed by that provider.
Serverless technologies, also called Function as a Service (FaaS), typically run and manage containers at scale behind the scenes to deploy and run the user's uploaded code.
Serverless usually follows an event-driven architecture paradigm. To execute the uploaded code, an event must be triggered to notify the serverless service to execute your code and return its result, or it can be triggered in a scheduled manner.
Examples of serverless offerings are AWS Lambda, Azure Functions, and Google Cloud Functions.
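The event-driven shape of serverless code can be illustrated with a handler function in the style of AWS Lambda. This is a simplified, hypothetical sketch of the car-park gate scenario, not a deployable function; the event fields are invented:

```python
import json

def handler(event, context=None):
    """FaaS-style handler: the platform invokes it with an event and we
    return a response -- no server management anywhere in the code."""
    gate = event.get("gate", "unknown")
    # Event-driven logic: a fire alarm event means the gate must open
    action = "open" if event.get("fire_alarm") else "keep_closed"
    return {
        "statusCode": 200,
        "body": json.dumps({"gate": gate, "action": action}),
    }

# In production, the serverless provider -- not your code -- performs this call
# whenever a matching event is triggered:
response = handler({"gate": "gate1", "fire_alarm": True})
```

Notice that the function holds no state and owns no server: it simply reacts to the event it is handed, which is what lets the provider scale instances up and down behind the scenes.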
As part of the IoT solution design phase, you have to choose the compute service you require for your solution. You might prefer containers over virtual machines or vice versa, you might choose both for different applications' requirements, or you may prefer serverless for a quick start and to avoid server maintenance, and so on.
Database services
In modern applications, no one database engine or type fits all data and applications' purposes. Currently, there are lots of database engines for different uses and purposes, such as the following:
- Relational databases or SQL databases: This is the oldest and most well-known type, with RDBMSs such as Oracle DB, Microsoft SQL Server, MySQL, MariaDB, PostgreSQL, and many more. These database engines are usually used in traditional applications, ERP, and e-commerce solutions.
- Non-relational (or NoSQL) databases: As the name suggests, this other type of database engine emerged to solve some of the limitations associated with relational database engines, including scalability and availability.
There are many forms of NoSQL databases, including the following:
a. Key-value databases are commonly used in online gaming apps and high-traffic web apps.
b. Document databases are commonly used in content management systems and user profiles.
c. Graph databases are commonly used in social networking, recommendation, and fraud detection apps.
d. In-memory databases are commonly used in caching, session management, and gaming.
e. Time series databases are commonly used in IoT and industrial telemetry apps.
f. Ledger databases are commonly used in supply chain, registration, and banking transactions.
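Since time series databases are the NoSQL variety most relevant to IoT telemetry, here is a toy sketch of their central idea: keeping points ordered by timestamp so that time-range queries stay cheap. Real engines add compression, retention policies, and rollups on top; everything here is illustrative:

```python
from bisect import insort

class TinyTimeSeries:
    """Toy time series store: keeps (timestamp, value) pairs sorted by
    timestamp -- the core trick that makes range queries efficient."""

    def __init__(self):
        self._points = []  # always sorted by timestamp

    def insert(self, ts, value):
        insort(self._points, (ts, value))  # maintain order on insert

    def range(self, start, end):
        """Return all points with start <= timestamp <= end."""
        return [(t, v) for t, v in self._points if start <= t <= end]

series = TinyTimeSeries()
for ts, v in [(30, 22.1), (10, 21.5), (20, 21.8)]:
    series.insert(ts, v)   # arrival order doesn't matter
```

A dashboard asking "what did this sensor report between t=10 and t=20?" maps directly onto the `range` call, which is the query shape IoT telemetry workloads issue constantly.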
We will discuss these databases in more detail later and shed light on which one to choose during the design phase of an IoT solution. The IoT solution architect or designer should evaluate the different database options for the IoT solution based on the requirements at hand.
Middleware and integration services
This layer of the solution provides integration or middleware systems to connect IoT solution systems and integrate third-party and external systems as well.
There are many integration systems available, as per the following:
- Legacy middleware platforms: Such platforms were introduced early on to support the integration of Service-Oriented Architecture (SOA)-based applications. They host many integration services and features, such as a service bus and orchestration (the Business Process Execution Language (BPEL) engine); with so many integration features built in, the middleware platform becomes heavy, or monolithic.
You might need only one or two of the many built-in features of a single middleware platform. This makes decision makers think hard before choosing between such platforms and the modern option, that is, an API gateway.
It is worth mentioning that companies behind such legacy middleware platforms have taken serious steps to modernize them to fit with new microservice and cloud-native architecture paradigms.
- API Gateways: With the new microservice-based architecture and API-first architecture paradigms, the need for a lightweight middleware-like component increased to avoid microservices' direct communication and to provide gateway features in terms of security, routing, and tracing. The API gateway was the component introduced into modern application architectures to offer those features. An API gateway can also be classified as a strong or a smart gateway.
- Event Bus / Message Queues: Message queues introduce significant benefits in terms of decoupling and scaling application components.
- Stream Processing Engine: The Pub-Sub (or publish-subscribe) pattern is one of the microservice-based application communication patterns. The other is the Sync pattern, where microservices call each other's exposed APIs directly over the HTTP(S) protocol.
Pub-sub fits very well with microservice-based applications, as microservices are usually developed, deployed, and scaled independently. Each microservice can trigger or broadcast events related to its operations, and other microservices interested in those events can subscribe, listen, and act accordingly. For example, an order-created event triggered by a cart microservice can be picked up by the payment, inventory, and credit-checking microservices, which then start their own functionality on the created order.
Apache Kafka is the most well-known streaming processing engine used on the market.
- Service Mesh: A newer technology introduced to help with service-to-service communication. The service mesh works side by side with API gateways in a complete integration architecture pattern: the API gateway handles external integration, while the service mesh handles internal service-to-service communication. We will explain microservices, service meshes, and API gateways later in the book.
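Tying the message-queue and pub-sub points above together, here is a minimal, hypothetical sketch of the order-created scenario on an in-memory event bus. Real deployments would use a broker such as Kafka or a managed message queue; the service names are illustrative:

```python
from collections import defaultdict

class EventBus:
    """Toy event bus: services subscribe to event types and react
    independently, fully decoupled from the publisher."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        # Fan out: every interested subscriber gets its own copy
        for handler in self._handlers[event_type]:
            handler(payload)

bus = EventBus()
log = []
# The payment and inventory "microservices" each register their own reaction
bus.subscribe("order-created", lambda o: log.append(("payment", o["id"])))
bus.subscribe("order-created", lambda o: log.append(("inventory", o["id"])))
# The cart "microservice" publishes without knowing who is listening
bus.publish("order-created", {"id": "ord-1"})
```

The cart service never references payment or inventory by name, which is precisely the decoupling benefit: new subscribers (say, a credit checker) can be added later without touching the publisher.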
IoT applications and visualization
This layer hosts IoT applications, visualizations, and dashboards. In other words, it is responsible for building what the end user will see and interact with, making it an important layer in the whole IoT solution.
An E2E IoT solution is complex and requires many systems and solutions in order to be delivered. But if you think about IoT end users or consumers who will use the IoT solution in the end, you'll realize that those end users will interact mainly with the IoT applications of the solution in the form of a mobile or web app. Hence, the excellent customer experience that every enterprise or organization looks for will be driven mainly by that IoT application layer's Key Performance Indicators (KPIs).
To give an example, in your E2E IoT solution, you might have problems in the connectivity layer. In that case, you could offer the end user what is called a digital twin or device shadow to let them keep interacting with the IoT solution as normal, as if the IoT device were still connected. When the device reconnects, it will read the instructions sent while it was offline from the device shadow service and apply what is needed. Or, let's say you have a problem in the device layer; for example, devices are not reachable at all and no telemetry is received from them within the configured time.
In this instance, the entire IoT solution shouldn't go into stop or failure mode; you could still offer IoT application layer components to the end user, such as mobile and web apps, until you fix the device issue, but in that case, they will be in read-only mode, that is, the dashboards and reporting could still be offered to the end user. Yes, it will show out-of-date data, but that's better than shutting down the whole solution and losing end user engagement. The user might just want to check some historical reports or something.
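The device shadow idea mentioned above can be sketched as a pair of state documents: what the application wants (desired) and what the device last confirmed (reported). This is a toy illustration with invented names; cloud providers offer managed shadow/twin services with richer semantics:

```python
class DeviceShadow:
    """Toy device shadow / digital twin: the app writes desired state even
    while the device is offline; the device catches up on reconnect."""

    def __init__(self):
        self.desired = {}    # what the application wants the device to do
        self.reported = {}   # what the device last confirmed

    def set_desired(self, key, value):
        # Works even when the device is unreachable -- the shadow absorbs it
        self.desired[key] = value

    def device_sync(self):
        """Called by the device on reconnect: apply pending instructions."""
        self.reported.update(self.desired)
        return self.reported

shadow = DeviceShadow()
shadow.set_desired("gate", "open")   # device currently offline; user sees "ok"
state = shadow.device_sync()         # device reconnects and catches up
```

From the end user's point of view, the "open gate" request succeeded immediately; the shadow quietly bridges the gap until the physical device reconnects and applies it.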
In this layer, there are so many solutions and systems to be considered in building and delivering IoT solution applications. Let's briefly go through such components.
Dashboards and visualizations
There are many solutions when it comes to building IoT solution dashboards and visualization. Here, we'll mention a few:
- Ready-made IoT dashboard solutions: In this kind of application, IoT developers will be given a platform that has many built-in controls and widgets and they just need to configure or customize those controls and widgets with the IoT data source. This is a drag-and-drop kind of development.
There are commercial and open source IoT platforms that provide such features – for example, ThingWorx (commercial) and ThingsBoard (open source) – and many others.
- Low-code solutions: Another option is to use generic low-code platforms to build the IoT solution dashboards and visualizations. Platforms include Microsoft PowerApps, Mendix, Appian, and many others.
Usually, the two options above, that is, ready-made dashboards or an application built with a low-code platform, are used to quickly start the development cycle of IoT applications without the need for highly skilled backend or full stack software developers to build the IoT applications from scratch.
- Do It Yourself (DIY) Apps: In this option, the company or the software vendor's developers will build the IoT solution dashboards, visualization, and apps from scratch.
Here, the IoT developers will use different frameworks and solutions to build IoT apps, such as the following:
a) Web and mobile frameworks – frontend.
b) Technologies and frameworks such as JavaScript, React, Angular, Vue, React Native, Ionic, Flutter, and so many others.
c) Microservices and backend APIs.
d) Technologies and frameworks such as Spring Boot with Spring Cloud, Flask, NodeJS, .NET, and so many other frameworks and technologies that are available on the market to build microservice-based applications.
e) GraphQL (read-only) works side by side with API gateways (read/write). It makes it easy for applications to get the data they need efficiently without chattiness between a client (for example, the browser) and backend server or backend API.
Now that we've discussed IoT applications and visualizations, let's move on to exploring IoT device management applications.
IoT device management applications
Managing an IoT device in terms of device provisioning, configuring, maintaining, authenticating, and monitoring is one of the mandatory IoT solution requirements. It is rare to find an IoT solution without a device management component in its architecture and ecosystems.
There are two options for IoT device management solutions:
- The buy option: There are lots of IoT device management solutions available on the market that cover most of the device management requirements and features.
- The build option: You can build an IoT device management solution in-house or with a software partner.
Without connectivity, there's no IoT solution. Connectivity is everywhere in an IoT solution and it acts as a glue between all IoT solution layers. Let's look at this in the next section.
IoT connectivity layer
Solution architects and designers should cover different connectivity options required in the IoT solution and how to connect IoT endpoint devices to edge devices and edge devices to the IoT backend cloud, which should cover which wireless (or wired) technology and communication protocols are to be used.
This layer will be detailed later in the book.
Security and identity and access control
Like connectivity, security and access control is a must-have requirement in any IoT solution. Different IoT solution components must incorporate security requirements into their design and delivery from day one (or day zero). An IoT security breach can be massive and dangerous in its impact. Think about the hacking of connected cars and what could happen if a hacker has full control of a car while you are driving. What about switching off smart city streetlights or electric grids? It is serious, isn't it? We are now talking about systems that affect people's lives directly, not just traditional websites and online services.
Solution architects, designers, and developers should include all the required security and access controls in all IoT solution components. Topics such as authentication, authorization, malware protection, auditing, access control policies, and data protection in transit and at rest should be fully covered in the IoT solution.
Those are the five layers or the solution building blocks of any IoT solution. There are some other additional technological frameworks and tools used across all those layers, but they are mainly part of, or driven by, the delivery process used in the delivery of the IoT solution. For example, if the organization you are working in follows DevOps or DevSecOps practices, then developers and/or DevOps engineers will use things such as the following.
Infrastructure as code
Infrastructure as code means treating the infrastructure of the IoT solution the same way you treat the solution's source code. In other words, the infrastructure code or scripts are maintained in version control the same as the solution's source code, which will give us lots of benefits, such as faster time to production/faster time to market, improved consistency, less configuration drift, and modern app deployment methods. And finally, it improves the automation of the entire solution deployment.
There are many tools used for infrastructure as code. The most famous infrastructure- and cloud-agnostic open source tool is HashiCorp Terraform.
CI/CD (Continuous Integration / Continuous Delivery) pipelines
A solution or project's CI/CD pipeline tools usually include the following:
- Source code repository or version control: This is where the project or solution's code artifacts are stored and managed. Infrastructure code or scripts can also be hosted in this repository. There are many source code repositories available on the market; the most common and famous are those built on Git, such as GitHub or GitLab.
- Build and test tool: When it comes to the build stage of the CI/CD pipeline, we have so many options and tools available to build the solution code and other solution artifacts, such as Maven, Ant, Docker, Gradle, Packer, and many more tools available on the market.
Usually, unit test scripts and integration scripts run as part of the building stage.
- Continuous Integration tool: This tool is the brain or orchestrator of the whole CI/CD pipeline. It triggers the pipeline when developers commit code to the Git repository, or on a specific schedule, and then executes the build and testing scripts. It then deploys the project binaries and artifacts generated by the build stage to the testing, staging, and finally production environments if configured to do so, that is, without manual reviews. There are many tools, such as Jenkins, CircleCI, Bamboo, TeamCity, and others available on the market to do that job.
- Artifact repository: This is a tool to store and manage project or solution artifact deliverables. Usually, they are packaged in binary format. Such packages are basically the output of the build stage of the CI/CD pipeline. Examples include Docker images and the binary of the software. There are many tools on the market, such as Artifactory, Nexus, and many others in that domain.
In the next section, we will discuss IoT solution design patterns.