Much of Go's performance stance is gained from concurrency and parallelism. Goroutines and channels are often used to perform many requests in parallel. The tools available for Go help to achieve near C-like performance, with very readable semantics. This is one of the many reasons that Go is commonly used by developers in large-scale solutions.
The ideology behind Go performance
Goroutines – performance from the start
When Go was conceived, multi-core processors were beginning to become more and more commonplace in commercially available commodity hardware. The authors of the Go language recognized a need for concurrency within their new language. Go makes concurrent programming easy with goroutines and channels (which we will discuss in Chapter 3, Understanding Concurrency). Goroutines, lightweight computation threads that are distinct from OS threads, are often described as one of the best features of the language. Goroutines execute their code in parallel and complete when their work is done. The startup time for a goroutine is faster than the startup time for a thread, which allows a lot more concurrent work to occur within your program. Compared to a language such as Java that relies on OS threads, Go can be much more efficient with its multiprocessing model. Go is also intelligent about blocking operations with respect to goroutines. This helps Go to be more performant in memory utilization, garbage collection, and latency. Go's runtime uses the GOMAXPROCS variable to multiplex goroutines onto real OS threads. We will learn more about goroutines in Chapter 2, Data Structures and Algorithms.
Channels – a typed conduit
Channels provide a model to send and receive data between goroutines, whilst skipping past synchronization primitives provided by the underlying platform. With properly thought-out goroutines and channels, we can achieve high performance. Channels can be both buffered and unbuffered, so the developer can pass a dynamic amount of data through an open channel until the value has been received by the receiver, at which time the channel is unblocked by the sender. If the channel is buffered, the sender blocks for the given size of the buffer. Once the buffer has been filled, the sender will unblock the channel. Lastly, the close() function can be invoked to indicate that the channel will not receive any more values. We will learn more about channels in Chapter 3, Understanding Concurrency.
C-comparable performance
Another initial goal was to approach the performance of C for comparable programs. Go also has extensive profiling and tracing tools baked into the language that we'll learn about in Chapter 12, Profiling Go Code, and Chapter 13, Tracing Go Code. Go gives developers the ability to see a breakdown of goroutine usage, channels, memory and CPU utilization, and function calls as they pertain to individual calls. This is valuable because Go makes it easy to troubleshoot performance problems with data and visualizations.
Large-scale distributed systems
Go is often used in large-scale distributed systems due to its operational simplicity and its built-in network primitives in the standard library. Being able to rapidly iterate whilst developing is an essential part of building a robust, scalable system. High network latency is often an issue in distributed systems, and the Go team has worked to try and alleviate this concern on their platform. From standard library network implementations to making gRPC a first-class citizen for passing buffered messaging between clients and servers on a distributed platform, the Go language developers have put distributed systems problems at the forefront of the problem space for their language and have come up with some elegant solutions for these complex problems.