Go

The level of scale at Google is unprecedented. There are millions of lines of code and thousands of engineers working on it. In such an environment where there are a lot of changes done by different people, a lot of software engineering challenges will crop up—in particular, the following:

Code becomes hard to read and poorly documented. Contracts between components cannot be easily inferred.
Builds are slow. The code-compile-test development cycle becomes harder, and modeling concurrent systems is inefficient, because writing efficient code with synchronization primitives is tough.
Manual memory management often leads to bugs.
There are uncontrolled dependencies.
There is a variety of programming styles due to multiple ways of doing something, leading to difficulty in code reviews, among other things.

The Go programming language was conceived in late 2007 by Robert Griesemer, Rob Pike, and Ken Thompson, as an open source programming language that aims to simplify programming and make it fun again. It's sponsored by Google, but is a true open source project: commits from Google are made first to the open source project, and then the public repository is imported internally.

The language was designed by and for people who write, read, debug, and maintain large software systems. It's a statically-typed, compiled language with built-in concurrency and garbage collection as first-class citizens. Some developers, including myself, find beauty in its minimalistic and expressive design. Others cringe at things such as a lack of generics (which the language only gained later, in Go 1.18).

Since its inception, Go has been in constant development, and already has a considerable amount of industry support. It's used in real systems in multiple web-scale applications (image source: https://madnight.github.io/githut/):

For a quick summary of what has made Go popular, you can refer to the WHY GO? section at https://smartbear.com/blog/develop/an-introduction-to-the-go-language-boldly-going-wh/.

We will now quickly recap the individual features of the language, before we start looking at how to utilize them to architect and engineer software in the rest of this book.

The following sections do not cover Go's syntax exhaustively; they are just meant as a recap. If you're very new to Go, you can take a tour of Go, available at https://tour.golang.org/welcome/1, while reading the following sections.

Hello World!

No introduction to any language is complete without the canonical Hello World program (http://en.wikipedia.org/wiki/Hello_world). This program starts off by defining a package called main, then imports the standard Go input/output formatting package (fmt), and lastly, defines the main function, which is the standard entry point for every Go program. The main function here just outputs Hello World!:

package main

import "fmt"

func main() {
    fmt.Println("Hello World!")
}

Go was designed with the explicit objective of having clean, minimal code. Hence, compared to other languages in the C family, its grammar is modest in size, with 25 keywords.

"Less is EXPONENTIALLY more."

- Robert Pike

Go statements are generally C-like, and most of the primitives should feel familiar to programmers accustomed to languages such as C and Java. This makes it easy for non-Go developers to pick up things quickly. That said, Go makes many changes to C semantics, mostly to avoid the reliability pitfalls associated with low-level resource management (that is, memory allocation, pointer arithmetic, and implicit conversions), with the aim of increasing robustness. Also, despite syntactical similarity, Go introduces many modern constructs, including concurrency and garbage collection.

Data types and structures

Go supports many elementary data types, including int, bool, int32, and float64. One of the most obvious points where the language specification diverges from the familiar C/Java syntax is where, in the declaration syntax, the declared name appears before the type. For example, consider the following snippet:

var count int

It declares a count variable of the integer type (int). When the type of a variable is unambiguous from the initial value, Go offers a short variable declaration syntax, for example, pi := 3.14.

It's important to note the language is strongly typed, so the following code, for example, would not compile:

var a int = 10
var b int32 = 20
c := a + b // compile error: mismatched types int and int32
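The usual way around this is an explicit conversion. The following is a minimal sketch of that idea; the helper name addMixed is hypothetical and is used only for illustration:

```go
package main

import "fmt"

// addMixed is a hypothetical helper: it adds an int and an int32
// by converting the int32 operand to int explicitly.
func addMixed(a int, b int32) int {
	return a + int(b) // without int(b), this line would not compile
}

func main() {
	var a int = 10
	var b int32 = 20
	fmt.Println(addMixed(a, b)) // prints 30
}
```

Go never converts between numeric types implicitly, so the conversion always shows up in the source code.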

One unique data type in Go is the error type. It's used to store errors, and there is a helpful package called errors for working with the variables of this type:

err := errors.New("Some Error")
if err != nil {
   fmt.Print(err)
}

Go, like C, gives the programmer control over pointers. For example, the following code denotes the layout of a Point structure and a pointer to a Point struct:

type Point struct {
    X, Y int
}

var p *Point = &Point{X: 1, Y: 2} // p is a pointer to a Point value

Go also supports compound data structures, such as string, map, array, and slice natively. The language runtime handles the details of memory management and provides the programmer with native types to work with:

var a [10]int // an array of type [10]int
a[0] = 1      // array is 0-based
a[1] = 2      // assign value to element

var aSlice []int // slice is like an array, but without upfront sizing

var ranks map[string]int = make(map[string]int) // make allocates the map
ranks["Joe"] = 1  // set
ranks["Jane"] = 2
rankOfJoe := ranks["Joe"] // get

s := "something" // the type (string) is inferred from the value
suff := "new"
fullString := s + suff // + is concatenation for string

Go has two built-in functions, make() and new(), which can be confusing. new(T) just allocates zeroed memory for a value of type T and returns a pointer to it, whereas make() initializes the internal structure of types such as map and returns a ready-to-use value. make() is hence used only with maps, slices, or channels.
Slices are internally handled as a struct, with fields defining a pointer to the current start of the underlying memory, the current length, and the capacity.

Functions and methods

As in the C/C++ world, there are code blocks called functions. They are defined by the func keyword. They have a name, some parameters, the main body of code, and optionally, a list of results. The following code block defines a function to calculate the area of a circle:

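func area(radius int) float64 {
    var pi float64 = 3.14
    // radius is an int, so it must be converted before multiplying with a float64
    return pi * float64(radius) * float64(radius)
}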

It accepts a single variable, radius, of the int type, and returns a single float64 value. Within the function, a variable called pi of the float64 type is declared.

Functions in Go can return multiple values. A common case is to return the function result and an error value as a pair, as seen in the following example:

func GetX() (x X, err error)

myX, err := GetX()
if err != nil {
     ... 
}
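A runnable sketch of this result-and-error pattern is shown below. The divide function is a hypothetical example and is not part of any library:

```go
package main

import (
	"errors"
	"fmt"
)

// divide is a hypothetical example function: it returns the quotient,
// or a non-nil error when the divisor is zero.
func divide(a, b float64) (float64, error) {
	if b == 0 {
		return 0, errors.New("division by zero")
	}
	return a / b, nil
}

func main() {
	q, err := divide(10, 4)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(q) // prints 2.5

	// The result can be discarded with _ when only the error matters.
	if _, err := divide(1, 0); err != nil {
		fmt.Println("error:", err) // prints error: division by zero
	}
}
```

By convention, the error is the last return value, and callers check it before using the result.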

Go is an object-oriented language and has concepts of structures and methods. A struct is analogous to a class and encapsulates data and related operations. For example, consider the following snippet:

type Circle struct {
    Radius int
    color  string
}

It defines a Circle structure with two fields:

Radius, which is of the int type and is public (exported)
color, which is of the string type and is private (unexported)

We shall look at class design and public/private visibility in more detail in Chapter 3, Design Patterns.

A method is a function with a special parameter (called a receiver), which can be passed to the function using the standard dot notation. This receiver is analogous to the self or this keyword in other languages.

Method declaration syntax places the receiver in parentheses before the function name. Here is the preceding Area function declared as a method:

func (c Circle) Area() float64 {
    var pi float64 = 3.14
    // Radius is an int field, so it is converted explicitly to float64
    return pi * float64(c.Radius) * float64(c.Radius)
}

Receivers can either be pointers (reference) or non-pointers (value). Pointer receivers are useful in the same way as normal pass-by-reference variables: when you want the method to modify the struct, or when the struct is large and copying it would be costly. In the previous example of Area(), the c Circle receiver is passed by value. If we declared it as c *Circle, it would be passed by reference.
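The difference between the two kinds of receivers can be seen in a minimal sketch. The Grow and GrowInPlace methods are hypothetical names introduced only for this illustration, and the Circle type is reduced to its Radius field:

```go
package main

import "fmt"

// Circle mirrors the struct from the text (only Radius is needed here).
type Circle struct {
	Radius int
}

// Grow has a value receiver: it changes only its own copy of the Circle.
func (c Circle) Grow(by int) {
	c.Radius += by
}

// GrowInPlace has a pointer receiver: it changes the caller's Circle.
func (c *Circle) GrowInPlace(by int) {
	c.Radius += by
}

func main() {
	c := Circle{Radius: 1}

	c.Grow(5)
	fmt.Println(c.Radius) // prints 1: only the copy was modified

	c.GrowInPlace(5)      // Go takes the address of c automatically
	fmt.Println(c.Radius) // prints 6
}
```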

Finally, on the subject of functions, it's important to note that Go has first-class functions and closures:

areaSquared := func(radius int) float64 {
    a := area(radius) // calls the area function defined earlier
    return a * a
}

There is one design decision in the function syntax that points to one of my favorite design idioms in Go: keep things explicit. Go supports neither default argument values nor function overloading. With such features, it becomes easy to patch API contracts and overload functions. This allows for easy wins in the short term, but leads to complicated, entangled code in the long run. Go encourages developers to use separate functions, with clear names, for each such requirement. This makes the code a lot more readable. If we really need a single function that accepts a varied number of arguments, then we can utilize Go's type-safe variadic functions.
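A variadic function can be sketched as follows. The sum function is a hypothetical example; inside its body, the nums parameter behaves like a regular []int slice:

```go
package main

import "fmt"

// sum is a hypothetical variadic function: it accepts any number of ints.
func sum(nums ...int) int {
	total := 0
	for _, n := range nums {
		total += n
	}
	return total
}

func main() {
	fmt.Println(sum())        // prints 0
	fmt.Println(sum(1, 2, 3)) // prints 6

	values := []int{10, 20}
	fmt.Println(sum(values...)) // prints 30: a slice is expanded with ...
}
```

All of the variadic arguments must be of the declared type, which is what keeps this mechanism type-safe.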

Flow control

The mainstay of flow control in code is the familiar if statement. Go has the if statement, but does not mandate parentheses for conditions. Consider the following example:

if val > 100 {
   fmt.Println("val is greater than 100")
} else {
   fmt.Println("val is less than or equal to 100")
}

To define loops, there is only one iteration keyword, for. There are no while or do...while keywords that we see in other languages, such as C or Java. This is in line with the Golang design principles of minimalism and simplicity—whatever we can do with a while loop, the same can be achieved with a for loop, so why have two constructs? The syntax is as follows:

func naiveSum(n int) int {
   sum := 0
   for i := 0; i < n; i++ {
       sum += i
   }
   return sum
}

As you can see, again, there are no parentheses around the loop conditions. Also, the i variable is defined for the scope of the loop (with i:= 0). This syntax will be familiar to C++ or Java programmers.

Note that the for loop need not strictly follow the three-part form (initialization, condition check, increment). It can simply be a check, as with a while loop in other languages:

i:= 0
for i <= 2 {
    fmt.Println(i)
    i = i + 1
}

And finally, a while(true) statement looks like this:

for {
    // forever
}

There is a range operator that allows iteration over arrays, slices, and maps. The operator is seen in action for maps here:

// range over the keys (k) and values (v) of myMap
for k,v := range myMap {
   fmt.Println("key:",k)
   fmt.Println("val:",v)
}

// just range over keys
for key := range myMap {
    fmt.Println("Got Key :", key)
}

The same operator works in an intuitive fashion for arrays and slices:

    input := []int{100, 200, 300}
    
    // iterate the array and get both the index and the element
    for i, n := range input {
        if n == 200 {
            fmt.Println("200 is at index : ", i)
        }
    }

    sum := 0
    // in this iteration, the index is skipped, it's not needed
    for _, n := range input {
        sum += n
    }
    fmt.Println("sum:", sum)

Packaging

In Go, code is binned into packages. These packages provide namespaces for code. Every Go source file, for instance, encoding/json/json.go, starts with a package clause, like this:

package json

Here, json is the package name, a simple identifier. Package names are usually concise.

Packages are rarely in isolation; they have dependencies. If code in one package wants to use something from a different package, then the dependency needs to be called out explicitly. The dependent packages can be other packages from the same project, a Golang standard package, or from a third-party package on GitHub. To declare dependent packages, after the package clause, each source file may have one or more import statements, comprising the import keyword and the package identifier:

import "encoding/json"

One important design decision in Go, dependency-wise, is that the language specification treats unused dependencies as a compile-time error (not a warning, as in most other build systems). If the source file imports a package it doesn't use, the program will not compile. This was done to speed up build times by making the compiler work on only those packages that are needed. For programmers, it also means that code tends to be cleaner, with fewer unused imports piling up. The flip side is that, if you're experimenting with different packages while coding, you may find the compiler errors irritating!

Once a package has been imported, the package name qualifies items from that package in the importing source file:

var dec = json.NewDecoder(reader)

Go takes an unusual approach to defining the visibility of identifiers (functions/variables) inside a package. Unlike languages that use private and public keywords, in Go, the name itself carries the visibility definition. The case of the initial letter of the identifier determines the visibility. If the initial character is an uppercase letter, then the identifier is public and is exported out of the package. Such identifiers can be used outside of the package. All other identifiers are not visible (and hence not usable) outside of the host package. Consider the following snippet:

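package circles

// AreaOf is exported: its name starts with an uppercase letter.
func AreaOf(c Circle) float64 {
    return 3.14 * float64(c.Radius) * float64(c.Radius)
}

// colorOf is unexported: its name starts with a lowercase letter.
func colorOf(c Circle) string {
    return c.color
}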

In the preceding code block, the AreaOf function is exported and visible outside of the circles package, but colorOf is visible only within the package.

We shall look at packaging Go code in greater detail in Chapter 3, Design Patterns.

Concurrency

Real life is concurrent. With API-driven interactions and multi-core machines, any non-trivial program written today needs to be able to schedule multiple operations in parallel, and these need to happen concurrently using the available cores. Languages such as C++ or Java did not have language-level support for concurrency for a long time. Recently, Java 8 has added support for parallelism with stream processing, but it still follows an inefficient fork-join process, and communication between parallel streams is difficult to engineer.

Communicating Sequential Processes (CSP) is a formal language for describing patterns of interaction in concurrent systems. It was first described in a 1978 paper by Tony Hoare. The key concept in CSP is that of a process. Essentially, code inside a process is sequential. At some point in time, this code can start another process. Many times, these processes need to communicate. CSP promotes the message-passing paradigm of communication, as compared to the shared memory and locks paradigm for communication. Shared memory models, like the one depicted in the following diagram, are fraught with risks:

It's easy to get deadlock and corruption if a process misbehaves or crashes inside a critical section. Such systems also experience difficulty in recovering from failure.

In contrast, CSP promotes message passing using the concept of channels, which are essentially queues with a simple logical interface of send() and recv(). These operations can be blocking. This model is depicted in the following diagram:

Go uses a variant of CSP with first-class channels. The CSP processes are called goroutines in Go. Go code is mostly regular procedural code, but it allows concurrent composition using independently executing functions (goroutines). In procedural programming, we can just call a function inline; however, with Go, we can also spawn a goroutine out of the function and have it execute independently.

Channels are also first-class Go primitives. Sharing is legal and passing a pointer over a channel is idiomatic (and efficient).
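As a minimal sketch of this message-passing style, the following program has one goroutine send values over a channel while another receives them. The produce and collect functions are hypothetical names used only for illustration; the go keyword, which starts a goroutine, is the only concurrency syntax needed:

```go
package main

import "fmt"

// produce sends the squares of 1..n on out, then closes the channel
// to signal that no more values will arrive.
func produce(n int, out chan<- int) {
	for i := 1; i <= n; i++ {
		out <- i * i // on an unbuffered channel, send blocks until received
	}
	close(out)
}

// collect runs produce in its own goroutine and gathers the results.
func collect(n int) []int {
	ch := make(chan int) // unbuffered channel of int
	go produce(n, ch)

	var got []int
	for v := range ch { // receive until the channel is closed
		got = append(got, v)
	}
	return got
}

func main() {
	fmt.Println(collect(3)) // prints [1 4 9]
}
```

No locks are needed here: the two goroutines share nothing except the channel, and each value is handed over from the sender to the receiver.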

The main() function itself is a goroutine, and a new goroutine can be spawned using the go keyword. For example, the snippet below modifies the Hello World program to spawn a goroutine:

package main

import (
    "fmt"
    "time"
)

func say(what string){
    fmt.Println(what)
}

func main() {
    message := "Hello world!"
    go say(message)
    time.Sleep(5*time.Second)
}

Note that, after the go say(message) statement is executed, the main() goroutine immediately proceeds to the next statement. The time.Sleep() call is important here: without it, main() could return, and the program exit, before the say goroutine has had a chance to run. An illustration of goroutines is shown in the following diagram:

We shall look at channels and more concurrency constructs in Chapter 4, Scaling Applications.

Garbage collection

Go has no explicit memory-freeing operation: the only way allocated memory can be returned to the pool is via garbage collection. In a concurrent system, this is a must-have feature, because the ownership of an object might change (with multiple references) in non-obvious ways. This allows programmers to concentrate on modeling and coding the concurrent aspects of the system, without having to bother about pesky resource management details. Of course, garbage collection brings in implementation complexity and latency. Nonetheless, ultimately, the language is much easier to use because of garbage collection.

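Not everything is freed on the programmer's behalf. Sometimes, the programmer has to make explicit calls to enable the freeing of an object's memory, for example, by deleting entries from a long-lived map or clearing references so that the object becomes unreachable.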

Object-orientation

The Go authors felt that the normal type-hierarchy model of software development is easy to abuse. For example, consider the following class and the related description:

Coding in such large class hierarchies usually generates brittle code. Early decisions become very hard to change, and base class changes can have devastating consequences further down the line. However, the irony is that, early on, all of the requirements might not be clear, nor the system well understood enough, to allow for great base class design.

The Go way of object-orientation is: composition over inheritance.
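One way Go supports composition is struct embedding. The following is a minimal sketch; the Engine and Car types are hypothetical and exist only to illustrate the idea:

```go
package main

import "fmt"

// Engine is a hypothetical component type.
type Engine struct {
	Power int
}

// Start is a method on Engine.
func (e Engine) Start() string {
	return fmt.Sprintf("engine started with %d hp", e.Power)
}

// Car is composed of an Engine by embedding it; there is no base class.
type Car struct {
	Engine // embedded: Engine's fields and methods are promoted to Car
	Name   string
}

func main() {
	c := Car{Engine: Engine{Power: 150}, Name: "demo"}
	fmt.Println(c.Start()) // promoted method; same as c.Engine.Start()
	fmt.Println(c.Power)   // promoted field; prints 150
}
```

Car reuses the behavior of Engine without being a subtype of it, so a change in Engine does not ripple through a class hierarchy.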

For polymorphic behavior, Go uses interfaces and duck typing:

"If it looks like a duck and quacks like a duck, it's a duck."

Duck typing implies that any type that has all of the methods that an interface advertises can be said to implement the said interface.
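The following sketch illustrates this implicit satisfaction of interfaces. The Shape interface and the Square and Rect types are hypothetical names chosen for the example:

```go
package main

import "fmt"

// Shape is a hypothetical interface with a single method.
type Shape interface {
	Area() float64
}

// Square and Rect never mention Shape, yet both satisfy it,
// because each has an Area() float64 method.
type Square struct{ Side float64 }

func (s Square) Area() float64 { return s.Side * s.Side }

type Rect struct{ W, H float64 }

func (r Rect) Area() float64 { return r.W * r.H }

// totalArea works with any mix of types that implement Shape.
func totalArea(shapes ...Shape) float64 {
	total := 0.0
	for _, s := range shapes {
		total += s.Area()
	}
	return total
}

func main() {
	fmt.Println(totalArea(Square{Side: 2}, Rect{W: 3, H: 4})) // prints 16
}
```

There is no implements keyword: the compiler checks at the call site that each argument has the methods that Shape requires.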

We shall look at more detail on object-orientation in Go later on in Chapter 3, Design Patterns.