Now that we understand what serverless architectures and pipelines look like, how they can be introduced into existing architectures, and how microservices help keep architectures lean and boost developer productivity, we shall look at the pros and cons of serverless systems in detail, so that software developers and architects can decide when to bring the serverless paradigm into their systems and when not to.
The positives of serverless systems are:
- Lower infrastructure costs: By deploying serverless systems, infrastructure costs can be greatly optimized, as servers no longer need to run around the clock. Because a server starts only when the function is triggered and stops once the function has executed successfully, billing covers only the brief period for which the function was actually running.
- Less maintenance needed: For the same reason, there is no need for continuous monitoring and maintenance of servers. As the functions and triggers are automated, serverless systems require almost zero maintenance.
- Higher developer productivity: As developers don't need to worry about downtime and server maintenance, they can focus on more interesting software challenges, such as scaling and designing functionality.
The remaining part of the book will show you how serverless systems are changing the way software is built. As this chapter is intended to help architects decide whether serverless systems are a good choice for their architecture, we shall now look at the disadvantages of serverless systems.
The disadvantages of serverless systems are:
- Time limit on the function: Every function, whether executed on AWS's Lambda or GCP's Cloud Functions, has an upper time limit of 5 minutes, which makes heavy computations impossible to execute directly. This limit can be worked around by running a provisioning tool's playbook in nohup mode, which will be covered in detail later in the chapter. However, preparing the playbook, setting up the container, and everything else must complete within the 5-minute limit, as the container is automatically killed once that limit is exceeded.
- No control over the container environment: The developer has no control over the environment of the container that is created to execute the function. The operating system, the filesystem, and so on are all decided by the cloud provider. For example, AWS's Lambda functions are executed inside containers that run the Amazon Linux operating system.
- Monitoring containers: Apart from the basic monitoring capabilities provided by the cloud provider via their in-house monitoring tools, there is no mechanism for detailed monitoring of the container that executes the serverless function. This becomes even more difficult when scaling serverless systems up to accommodate distributed systems.
- No control over security: There is no control over how the security of the data flow is ensured, as there is very little control over the container's environment. The container can, however, be run in the VPC and subnets of the developer's choice, which helps work around this disadvantage, as sketched below.
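To illustrate the VPC workaround, here is a minimal sketch of deploying a Lambda function into a specific VPC using boto3. The function name, role ARN, subnet ID, security group ID, and deployment package are all placeholders introduced for this example:

```python
import boto3

client = boto3.client("lambda")

# Placing the function inside a VPC gives some control over the network
# path the data takes, even though the container itself remains opaque.
# All identifiers below are placeholders.
with open("function.zip", "rb") as package:
    client.create_function(
        FunctionName="data-processor",                          # hypothetical
        Runtime="python3.6",
        Role="arn:aws:iam::123456789012:role/lambda-vpc-role",  # placeholder
        Handler="handler.main",
        Code={"ZipFile": package.read()},
        VpcConfig={
            "SubnetIds": ["subnet-0abc1234"],                   # placeholder
            "SecurityGroupIds": ["sg-0def5678"],                # placeholder
        },
    )
```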
However, serverless systems can be scaled up into distributed systems for large-scale computations where the developer need not worry about the time limit. As already mentioned, this will be discussed in detail in the upcoming chapters. To build an intuition for when to choose serverless systems over monolithic ones for large-scale computations, let us go through some important pointers that need to be kept in mind when taking that architectural decision.
The pointers to be kept in mind when scaling serverless systems to distributed systems are:
- To scale serverless systems up into serverless distributed systems, one must understand how nohup works. It is a POSIX command that allows programs and processes to keep running in the background, detached from the terminal; a minimal sketch of launching a playbook this way appears after this list.
- Nohup processes should be properly logged, including both the output and the error logs, as the log file is the only source of information about the process.
- A provisioning tool, such as Ansible or Chef, needs to be leveraged to create a master-workers architecture, spawned via a playbook running in nohup mode in the container where the serverless function executes.
- It is good practice to ensure that all tasks executed by the provisioning tool via the master server are properly monitored and logged, as there is no way to retrieve the logs once the entire setup finishes executing and the servers terminate.
- Proper security needs to be ensured by using the temporary credential facilities available from the cloud providers; a sketch of this follows the list.
- Proper closure should be ensured for the system: the workers and the master should terminate themselves immediately after the pipeline of tasks finishes executing (see the self-termination sketch after this list). This is very important, as it is what makes the system serverless.
- Generally, temporary credentials come with an expiry time, which is 3,600 seconds in most environments. So, if the developer is using temporary credentials to execute a task that is expected to take longer than the expiry time, there is a danger of the credentials expiring midway; they must then be refreshed before expiry.
- Debugging distributed serverless systems is an extremely difficult task for the following reasons:
- Monitoring and debugging a nohup process is extremely difficult. Whenever you want to debug one, you have to either refer to the log file created by the process or kill the nohup process by its process ID and then run the scripts manually for debugging (see the sketch after this list).
- As the complete list of tasks executes sequentially in the provisioning tool, there is a danger that the instances will be terminated mid-debugging if the developer has forgotten to kill the nohup process before starting to debug, since the running playbook will eventually reach its teardown step.
- As this is a distributed system, it goes without saying that the architecture should be able to self-heal in the case of a failure or disaster. An example scenario is one of the workers going down while performing an operation on a batch of files; without safeguards, that entire batch is lost, with no means of recovery. One possible mitigation is sketched after this list.
- Another, more advanced disaster scenario is two worker servers going down while performing operations on a batch of files. In this case, the developer has no record of which files were processed successfully and which weren't.
- It is good practice to ensure that all the worker instances receive an equal share of the load, so that the load across the distributed system stays even and time and resources are well optimized; a simple round-robin split is sketched below.
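As a concrete illustration of the first few pointers, here is a minimal sketch of launching a provisioning playbook in nohup mode from Python, with both the output and error streams captured in a log file. The playbook name `cluster.yml` is hypothetical, standing in for whatever playbook spawns the master and workers:

```python
import subprocess

# nohup detaches the playbook run from the function's terminal session,
# and the log file captures stdout and stderr together, since it is the
# only source of information once the process is detached.
with open("/tmp/provision.log", "w") as log:
    process = subprocess.Popen(
        ["nohup", "ansible-playbook", "cluster.yml"],  # hypothetical playbook
        stdout=log,
        stderr=subprocess.STDOUT,
    )

# Record the PID; it is the only handle left for inspecting or
# killing the detached process later.
print("provisioning started with PID", process.pid)
```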
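For the temporary-credentials pointer, a minimal sketch using AWS STS via boto3 follows; the role ARN and session name are placeholders. Note the Expiration field, which matters for tasks that may outlive the 3,600-second default:

```python
import boto3

sts = boto3.client("sts")

# Assume a role to obtain short-lived credentials. DurationSeconds is
# capped by the role's configured maximum, commonly 3,600 seconds.
response = sts.assume_role(
    RoleArn="arn:aws:iam::123456789012:role/worker-role",  # placeholder
    RoleSessionName="pipeline-run",
    DurationSeconds=3600,
)

creds = response["Credentials"]
print("credentials expire at", creds["Expiration"])

# Use the temporary keys instead of long-lived ones; a long-running
# task should re-assume the role before this expiry is reached.
s3 = boto3.client(
    "s3",
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)
```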
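The self-termination pointer can be sketched as follows, assuming EC2 workers whose instance role is allowed to call TerminateInstances on themselves; the instance ID is read from the standard EC2 instance metadata endpoint:

```python
import urllib.request

import boto3

def self_terminate():
    """Terminate the current EC2 instance once its share of the
    pipeline has finished, keeping the system serverless."""
    instance_id = urllib.request.urlopen(
        "http://169.254.169.254/latest/meta-data/instance-id"
    ).read().decode()
    boto3.client("ec2").terminate_instances(InstanceIds=[instance_id])
```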
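For the debugging pointer, stopping a detached process before re-running its scripts by hand might look like this; the PID is assumed to have been recorded when the process was launched, as in the first sketch:

```python
import os
import signal

pid = 12345  # placeholder: the PID recorded when the nohup process started

# Stop the detached process cleanly before debugging, so the playbook
# cannot reach its teardown step and terminate the instances mid-session.
os.kill(pid, signal.SIGTERM)
```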
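One possible way to address the worker-failure scenarios is to keep a per-file status manifest in shared storage, so that files claimed by a dead worker can be identified and re-queued. The sketch below uses a local JSON file purely as a stand-in for a shared store such as S3 or DynamoDB:

```python
import json

MANIFEST = "manifest.json"  # stand-in for a shared store

def _load():
    with open(MANIFEST) as f:
        return json.load(f)

def _save(manifest):
    with open(MANIFEST, "w") as f:
        json.dump(manifest, f)

def claim(filename, worker_id):
    # Mark a file as claimed before the worker starts processing it.
    manifest = _load()
    manifest[filename] = {"status": "in_progress", "worker": worker_id}
    _save(manifest)

def mark_done(filename):
    # Mark a file as safely processed.
    manifest = _load()
    manifest[filename]["status"] = "done"
    _save(manifest)

def files_to_requeue(dead_worker_id):
    # Everything the dead worker claimed but never finished.
    manifest = _load()
    return [
        name for name, entry in manifest.items()
        if entry["worker"] == dead_worker_id and entry["status"] != "done"
    ]
```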
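Finally, the even-load pointer amounts to splitting the file list into near-equal shares, for example round-robin:

```python
def split_evenly(files, num_workers):
    """Distribute files across workers round-robin so that each worker
    receives an (almost) equal share of the load."""
    chunks = [[] for _ in range(num_workers)]
    for index, filename in enumerate(files):
        chunks[index % num_workers].append(filename)
    return chunks

# Example: 10 files across 3 workers yields shares of 4, 3, and 3.
assignments = split_evenly(["file_%d.csv" % i for i in range(10)], 3)
for worker, share in enumerate(assignments):
    print("worker", worker, "->", share)
```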