What is real-time anyway?
Any system that has a deterministic response to a given event can be considered "real-time." If a system is considered to have failed when it doesn't meet a timing requirement, it must be real-time. How failure is defined, and how serious its consequences are, differs from system to system. Real-time requirements vary widely in both the speed of the timing requirement and the severity of the consequences when the required deadlines are not met.
The ranges of timing requirements
To illustrate the range of timing requirements that can be encountered, let's consider a few different systems that acquire readings from analog-to-digital converters (ADCs).
The first system we'll look at is a control system that is set up to control the temperature of a soldering iron (as seen in the following diagram). The parts of the system we're concerned with are the MCU, ADC, sensor, and heater.
The MCU is responsible for the following:
- Taking readings from a temperature sensor via the ADC
- Running a closed-loop control algorithm (to maintain a constant temperature at the soldering iron tip)
- Adjusting the output of the heater as needed
These can be seen in the following diagram:
Since the temperature of the tip doesn't change incredibly quickly, the MCU may only need to acquire 50 ADC samples per second (50 Hz). The control algorithm responsible for adjusting the heater (to maintain a constant temperature) runs at an even slower pace, 5 Hz:
The ADC will assert a hardware line, signaling a conversion has been completed and is ready for the MCU to transfer the reading to its internal memory. The MCU reading the ADC has up to 20 ms to transfer data from the ADC to internal memory before a new reading needs to be taken (as seen in the following diagram). The MCU also needs to be running the control algorithm to calculate the updated values for the heater output at 5 Hz (200 ms). Both of these cases (although not particularly fast) are examples of real-time requirements:
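The deadlines quoted above follow directly from the rates: the period is the reciprocal of the frequency. As a minimal sketch of that arithmetic (the helper names `period_us_from_hz` and `deadline_met` are illustrative assumptions, not part of any library), it could be written in C as:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Convert a repetition rate in Hz to its period in microseconds.
 * Hypothetical helper for illustration; rate_hz must be non-zero. */
uint32_t period_us_from_hz(uint32_t rate_hz)
{
    assert(rate_hz != 0u);
    return 1000000u / rate_hz;
}

/* A deadline is considered met when the work finishes within one period. */
bool deadline_met(uint32_t elapsed_us, uint32_t rate_hz)
{
    return elapsed_us <= period_us_from_hz(rate_hz);
}
```

With the soldering iron's numbers, `period_us_from_hz(50)` gives 20,000 µs (20 ms) for the ADC transfer and `period_us_from_hz(5)` gives 200,000 µs (200 ms) for the control algorithm, so the control algorithm runs once for every ten ADC samples.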
Now, on the other end of the ADC reading spectrum, we could have a high-bandwidth network analyzer or oscilloscope that reads an ADC at a rate of tens of gigasamples per second (tens of GHz)! The raw ADC readings will likely be converted into the frequency domain and graphically displayed on a high-resolution front panel dozens of times a second. A system like this requires huge amounts of processing and must adhere to extremely tight timing requirements if it is to function properly.
Somewhere in the middle of the spectrum, you'll find systems such as closed-loop motion controllers, which typically need to execute their PID control loops at rates from hundreds of Hz to tens of kHz in order to keep a fast-moving system stable. So, how fast is real-time? Well, as you can see from the ADC examples alone, it depends.
In some of the previous cases, such as the oscilloscope or soldering iron, failure to meet a timing requirement results in poor performance or incorrect data being reported. In the case of the soldering iron, this might be poor temperature control (which could damage components). For the test equipment, missing deadlines could cause erroneous readings, which is a failure. This may not seem like a big deal to some people, but it matters a great deal to users who rely on the accuracy of the reported data. Some laboratory equipment is used in standard verification to check product conformance. If an undetected malfunction in that equipment produces an inaccurate measurement, an incorrect value could be reported. A suspect test can sometimes be rerun. However, if retesting is required too often and readings can't be relied on, the test equipment will come to be viewed as unreliable and sales will decline, all because a real-time requirement wasn't being consistently met.
In other systems, such as the flight control of a UAV or motion control in industrial process control, failing to run the control algorithm in a timely manner could result in something more physically catastrophic, such as a crash. In this case, the consequences are potentially life-threatening.
Thankfully, there are steps that can be taken to avoid all of these failure scenarios.
The ways of guaranteeing real-time behavior
One of the easiest ways to ensure a system does what it is meant to do is to make sure it is as simple as possible while still meeting the requirements. This means resisting the urge to over-complicate a simple task. If a toaster is meant to toast a slice of bread, don't put a display on it and make it tell you the weather too; just have it turn on a heating element for the right amount of time. This simple task has been accomplished for years without requiring any code or programmable devices whatsoever.
As programmers, when we come across a problem, we tend to immediately reach for the nearest MCU and start coding. However, some functions of a product (especially one with electro-mechanical components) are best handled without code at all. A car window doesn't really need an MCU running a polling loop, turning on motors through drivers and watching sensors for feedback to shut them off. This task can be handled by a few mechanical switches and diodes. If a feedback-reporting mechanism is required for a given system, such as an error that needs to be asserted when a window is stuck, then there may be no choice but to use a more complex solution. However, our goal as engineers should always be the same: solve the problem as simply as possible, without adding unnecessary complexity.
If a problem can be solved by hardware alone, then explore that possibility with the team first, before breaking out the MCU. If a problem can be handled by using a simple while loop to poll the sensor status, then simply poll the sensor; there may be no need to start coding interrupt service routines (ISRs). If the device is single-purpose, there are many cases where a full-blown RTOS can simply get in the way, so don't use one!
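To make the "simply poll the sensor" option concrete, here is a minimal sketch of a bounded polling loop. The callback type `sensor_ready_fn`, the function `poll_until_ready`, and the stand-in `fake_sensor_ready` are all hypothetical names introduced for illustration; on real hardware the status check would typically read a device-specific register or GPIO pin.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical status-read callback: returns true when the sensor is ready.
 * On real hardware this would typically read a status register or GPIO pin. */
typedef bool (*sensor_ready_fn)(void);

/* Poll the sensor until it reports ready or max_polls attempts are used up.
 * Returns true if the sensor became ready, false on timeout. */
bool poll_until_ready(sensor_ready_fn is_ready, uint32_t max_polls)
{
    while (max_polls-- > 0u) {
        if (is_ready()) {
            return true;
        }
    }
    return false;
}

/* Stand-in sensor for host-side experiments: reports ready from the
 * third read onward. */
static uint32_t fake_reads = 0u;

bool fake_sensor_ready(void)
{
    fake_reads++;
    return fake_reads >= 3u;
}
```

Bounding the number of polls is a design choice in this sketch: it keeps the loop's worst-case execution time known, which is what makes such a simple approach easy to reason about against a deadline.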
Types of real-time systems
There are many different ways of achieving real-time behavior. The following section is a discussion on the various types of real-time systems you might encounter. Also note that it is possible to have combinations of the following systems working together as subsystems. These different subsystems can occur at a product, board, or even chip level (this approach is discussed in Chapter 16, Multi-Processor and Multi-Core Systems).
Hardware
The original real-time system, hardware, is still the go-to for extremely tight tolerance and/or fast timing requirements. It can be implemented with discrete digital logic, analog components, programmable logic, or an application-specific integrated circuit (ASIC). Programmable logic devices (PLDs), complex programmable logic devices (CPLDs), and field-programmable gate arrays (FPGAs) make up the programmable logic portion of this solution. Hardware-based real-time systems can cover anything from analog filters, closed-loop control, and simple state machines to complex video codecs. When implemented with power saving in mind, ASICs can be made to consume less power than an MCU-based solution. In general, hardware has the advantage of performing operations in parallel and instantly (this is, of course, an over-simplification), as opposed to a single-core MCU, which only gives the illusion of parallel processing.
The downsides for real-time hardware development generally include the following:
- The inflexibility of non-programmable devices.
- The required expertise is generally harder to find than software/firmware development expertise.
- The cost of full-featured programmable devices (for example, large FPGAs).
- The high cost of developing a custom ASIC.
Bare-metal firmware
Bare-metal firmware is considered (for our purposes) to be any firmware that isn't built on top of a preexisting kernel/scheduler of some type. Some engineers take this a step further, arguing that true bare-metal firmware can't use any preexisting libraries (such as vendor-supplied hardware abstraction libraries); there is some merit to this view as well. A bare-metal implementation has the advantage that the user's code has total control of all aspects of the hardware. The only way for the main loop code execution to be interrupted is if an interrupt fires. Once an ISR is running, the only way for anything else to take control of the CPU is for that ISR to finish or for another higher-priority interrupt to fire.
Bare-metal firmware solutions excel when there is a small number of relatively simple tasks to perform—or one monolithic task. If the firmware is kept focused and best practices are followed, deterministic performance is generally easy to measure and guarantee due to the relatively small number of interactions between ISRs (or in some cases, a lack of ISRs). In some extreme cases for heavily loaded MCUs (or MCUs that are highly constrained in ROM/RAM), bare-metal is the only option.
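As a rough illustration of this style, the sketch below shows a bare-metal main loop paced by a timer interrupt, using the 50 Hz and 5 Hz rates from the soldering iron example. Every name here (`timer_tick_isr`, `superloop_iteration`, `run_for_ms`, and the counters) is a hypothetical stand-in; how a timer ISR is named and registered is device-specific, and `run_for_ms` exists only so the logic can be exercised on a host.

```c
#include <stdint.h>

/* Millisecond tick counter, incremented from a periodic timer interrupt.
 * volatile because it is written in interrupt context and read in the
 * main loop. */
static volatile uint32_t tick_ms = 0u;

/* Counters standing in for the real work, so the sketch can be checked
 * on a host. */
uint32_t adc_reads = 0u;
uint32_t control_runs = 0u;

/* Hypothetical 1 ms timer ISR. It does the bare minimum: count time. */
void timer_tick_isr(void)
{
    tick_ms++;
}

/* One pass of the main loop. Firmware would call this forever from main().
 * Unsigned subtraction keeps the comparisons correct across counter wrap. */
void superloop_iteration(void)
{
    static uint32_t last_adc_ms = 0u;
    static uint32_t last_control_ms = 0u;
    uint32_t now = tick_ms;

    if ((now - last_adc_ms) >= 20u) {       /* 50 Hz ADC transfer */
        last_adc_ms += 20u;
        adc_reads++;
    }
    if ((now - last_control_ms) >= 200u) {  /* 5 Hz control update */
        last_control_ms += 200u;
        control_runs++;
    }
}

/* Host-only helper: pretend the timer fired once per millisecond and the
 * main loop ran once after each tick. */
void run_for_ms(uint32_t duration_ms)
{
    for (uint32_t i = 0u; i < duration_ms; i++) {
        timer_tick_isr();
        superloop_iteration();
    }
}
```

Because only one ISR touches shared state and the main loop has two fixed-rate jobs, the worst-case timing here can be reasoned about by inspection, which is the property the paragraph above describes.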
As bare-metal implementations become more elaborate in dealing with events asynchronously, they start to overlap with functionality provided by an RTOS. An important consideration to keep in mind is that by using an RTOS, rather than attempting to roll your own thread-safe system, you automatically benefit from all of the testing the RTOS provider has put in. You'll also have the opportunity to use code that has the power of hindsight behind it: the widely used RTOSes available today have been around for years, and their authors have been adapting and adding functionality the entire time to make them robust and flexible for different applications.
RTOS-based firmware
Firmware that runs a scheduling kernel on an MCU is RTOS-based firmware. The introduction of the scheduler and some RTOS primitives allows tasks to operate under the illusion that they have the processor to themselves (discussed in detail in Chapter 2, Understanding RTOS Tasks). Using an RTOS enables the system to remain responsive to the most important events while performing other complex tasks in the background.
There are a few downsides to all of these tasks running. Inter-dependencies can arise between tasks sharing data; if not handled properly, such a dependency will cause a task to block unexpectedly. Although there are provisions for handling this, it does add complexity to the code. ISRs will generally use task signaling so that the interrupt itself is serviced as quickly as possible and as much processing as possible is deferred to a task. If handled properly, this solution is excellent for keeping complex systems responsive, despite many complex interactions. However, if handled improperly, this design paradigm can lead to more timing jitter and less determinism.
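The idea of keeping the interrupt short and deferring the heavy work to a task can be sketched without committing to any particular RTOS. The following is a host-side simulation using a plain single-producer, single-consumer ring buffer; it is not the API of any real kernel, and all names (`adc_isr`, `processing_task_drain`, `sample_queue`) are assumptions made for illustration. In an actual RTOS the hand-off would normally use a queue or semaphore primitive that also wakes the waiting task.

```c
#include <stdbool.h>
#include <stdint.h>

#define QUEUE_LEN 8u  /* must be a power of two for the index mask below */

/* Single-producer (ISR) / single-consumer (task) ring buffer. */
static volatile uint16_t sample_queue[QUEUE_LEN];
static volatile uint32_t queue_head = 0u;  /* written only by the ISR  */
static volatile uint32_t queue_tail = 0u;  /* written only by the task */

/* Hypothetical ADC ISR: store the raw sample and return immediately.
 * Returns false if the queue is full and the sample had to be dropped. */
bool adc_isr(uint16_t raw_sample)
{
    if ((queue_head - queue_tail) == QUEUE_LEN) {
        return false;
    }
    sample_queue[queue_head & (QUEUE_LEN - 1u)] = raw_sample;
    queue_head++;
    return true;
}

/* Task-side processing: drain all pending samples and return their sum
 * (a stand-in for heavier work such as filtering). */
uint32_t processing_task_drain(void)
{
    uint32_t sum = 0u;
    while (queue_tail != queue_head) {
        sum += sample_queue[queue_tail & (QUEUE_LEN - 1u)];
        queue_tail++;
    }
    return sum;
}
```

A dropped sample (the `false` return) is exactly the kind of missed deadline discussed earlier: it means the task did not drain the queue quickly enough. On a real target, the atomicity and memory ordering of the index updates would also need to be checked for the specific core.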
RTOS-based software
Software running on a full real-time OS, on hardware that includes a memory management unit (MMU) alongside the central processing unit (CPU), is considered RTOS-based software. Applications that are implemented with this approach can be highly complex, requiring many different interactions between various internal and external systems. The advantage of using a full OS is all of the capability that comes along with it, both hardware and software.
On the hardware side, there are generally more CPU cores available running at higher clock rates. There can be gigabytes of RAM and persistent memory available. Adding peripheral hardware can be as simple as the addition of a card (provided there are pre-existing drivers).
On the software side, there is a plethora of open source and vendor proprietary solutions for networking stacks, UI development, file handling, and so on. Underneath all of these capabilities and options, the kernel is still implemented in such a way that critical tasks won't be blocked for an indefinite period of time (something that can happen with a traditional OS). Because of this, getting deterministic performance is still within reach, just like with RTOS firmware.
Carefully crafted OS software
Similar to RTOS-based software, a standard OS has all of the libraries and features a developer could ask for. What's missing, however, is a strict focus on meeting timing requirements. Generally speaking, systems implemented with a traditional OS are going to have much less deterministic behavior (and none that can be truly counted on in a safety-critical situation). If the real-time requirement is lax and there are no catastrophic consequences when a wishy-washy deadline isn't met on time, a standard OS can be made to work, as long as care is taken in choosing what software stacks are running and their resource use is kept in check. The Linux kernel with PREEMPT_RT patches is a good example of this type of real-time system.
So, now that all of the options for achieving a real-time system have been laid out, it's time to define exactly what we mean when we say RTOS, specifically an MCU-based RTOS.