Protobuf is an IDL
Protobuf is a language. More precisely, it is an IDL. It is important to make this distinction because, as we will see in more detail later, in Protobuf we do not write logic the way we do in a programming language. Instead, we write data schemas, which are contracts that serialization must follow and that deserialization relies on. So, before explaining all the rules that we need to follow when writing a .proto file and going through all the details about serialization and deserialization, we first need to get a sense of what an IDL is and what the goal of such a language is.
IDL, as we saw earlier, stands for Interface Description Language, and as we can see, the name contains three parts. The first part, Interface, describes a piece of code that sits in between two or more applications and hides the complexity of implementation. As such, we do not make any assumptions about the hardware on which an application is running, the OS on which it runs, or the programming language in which it is written. This interface is, by design, hardware-, OS-, and language-agnostic. This is important for Protobuf and several other data schema languages used for serialization because it lets developers write the schema once and reuse it across different projects.
The second part is Description, and this sits on top of the concept of Interface. Our interface describes what the two applications can expect to receive and what they are expected to send to each other. This includes describing some types and their properties, the relationship between these types, and the way these types are serialized and deserialized. As this may be a bit abstract, let us look at an example in Protobuf. If we wanted to create a type called Account that contains an ID, a username, and the rights this account has, we could write the following:
    syntax = "proto3";

    enum AccountRight {
      ACCOUNT_RIGHT_UNSPECIFIED = 0;
      ACCOUNT_RIGHT_READ = 1;
      ACCOUNT_RIGHT_READ_WRITE = 2;
      ACCOUNT_RIGHT_ADMIN = 3;
    }

    message Account {
      uint64 id = 1;
      string username = 2;
      AccountRight right = 3;
    }
If we skip some of the details that are not important at this stage, we can see that we define the following:
- An enumeration listing all the possible rights, plus an extra value called ACCOUNT_RIGHT_UNSPECIFIED
- A message (equivalent to a class or struct) listing the three properties that an Account type should have
Again, without looking at the details, it is readable, and the relationship between Account and AccountRight is easy to understand.
Finally, the last part is Language. This is here to say that, as with every language, whether a computer language or not, we have rules that we need to follow so that another human, or a compiler, can understand our intent. In Protobuf, we write our code to please the compiler (protoc), and then it does all the heavy lifting for us. It reads our code and generates code in the language that we need for our application, and then our user code interacts with the generated code. Let us look at a simplified version of the Go code that would be generated for the Account type defined previously:
    type AccountRight int32

    const (
        AccountRight_ACCOUNT_RIGHT_UNSPECIFIED AccountRight = 0
        AccountRight_ACCOUNT_RIGHT_READ        AccountRight = 1
        AccountRight_ACCOUNT_RIGHT_READ_WRITE  AccountRight = 2
        AccountRight_ACCOUNT_RIGHT_ADMIN       AccountRight = 3
    )

    type Account struct {
        Id       uint64       `protobuf:"varint,1,…`
        Username string       `protobuf:"bytes,2,…`
        Right    AccountRight `protobuf:"varint,3,…`
    }
In this code, there are important things to notice. Let us break this code into pieces:
    type AccountRight int32

    const (
        AccountRight_ACCOUNT_RIGHT_UNSPECIFIED AccountRight = 0
        AccountRight_ACCOUNT_RIGHT_READ        AccountRight = 1
        AccountRight_ACCOUNT_RIGHT_READ_WRITE  AccountRight = 2
        AccountRight_ACCOUNT_RIGHT_ADMIN       AccountRight = 3
    )
Our AccountRight enum is defined as a named type based on int32, together with one constant per enum value. Each enum variant's name is prefixed with the name of the enum, and each constant has the value that we set after the equals sign in the Protobuf code. These are the numeric values of the enum. They should not be confused with the field tags used inside messages, which we will introduce later in this chapter.
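To make the role of the zero value more concrete, here is a minimal, self-contained sketch. It is not the code that protoc generates: the accountRightNames table and the rightName helper are hypothetical names written by hand for illustration. It shows that a Go variable of type AccountRight that was never set holds 0, which corresponds to ACCOUNT_RIGHT_UNSPECIFIED.

```go
package main

import "fmt"

// AccountRight mirrors the enum type shown in the text.
type AccountRight int32

const (
	AccountRight_ACCOUNT_RIGHT_UNSPECIFIED AccountRight = 0
	AccountRight_ACCOUNT_RIGHT_READ        AccountRight = 1
	AccountRight_ACCOUNT_RIGHT_READ_WRITE  AccountRight = 2
	AccountRight_ACCOUNT_RIGHT_ADMIN       AccountRight = 3
)

// accountRightNames is a hypothetical, hand-written lookup table
// (an assumption of this sketch, not generated code).
var accountRightNames = map[AccountRight]string{
	AccountRight_ACCOUNT_RIGHT_UNSPECIFIED: "ACCOUNT_RIGHT_UNSPECIFIED",
	AccountRight_ACCOUNT_RIGHT_READ:        "ACCOUNT_RIGHT_READ",
	AccountRight_ACCOUNT_RIGHT_READ_WRITE:  "ACCOUNT_RIGHT_READ_WRITE",
	AccountRight_ACCOUNT_RIGHT_ADMIN:       "ACCOUNT_RIGHT_ADMIN",
}

// rightName returns the Protobuf name of a right, or a numeric
// fallback when the value is not one of the declared variants.
func rightName(r AccountRight) string {
	if name, ok := accountRightNames[r]; ok {
		return name
	}
	return fmt.Sprintf("AccountRight(%d)", int32(r))
}

func main() {
	var unset AccountRight // never assigned, so it holds the zero value
	fmt.Println(rightName(unset))                            // prints ACCOUNT_RIGHT_UNSPECIFIED
	fmt.Println(rightName(AccountRight_ACCOUNT_RIGHT_ADMIN)) // prints ACCOUNT_RIGHT_ADMIN
}
```

This is one reason the schema reserves 0 for an "unspecified" variant: an unset right can be told apart from a deliberately chosen one.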
Now, take a look at the following code:
    type Account struct {
        Id       uint64       `protobuf:"varint,1,…`
        Username string       `protobuf:"bytes,2,…`
        Right    AccountRight `protobuf:"varint,3,…`
    }
Here, we have our Account message translated to a struct with the exported fields Id, Username, and Right. Each of these fields has a type that is converted from a Protobuf type to a Go type. In our example, the scalar types uint64 and string have exactly the same names in Go and in Protobuf, but it is important to know that in some cases, the types translate differently. One such example is double in Protobuf, which translates to float64 in Go. Finally, we have the field tags, referenced in the metadata following each field. Once again, their meaning will be explained later in this chapter.
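The metadata following each field is an ordinary Go struct tag, so it can be inspected with the standard reflect package. The following sketch makes two assumptions: the tag strings are shortened to only the two parts visible in the listing above (the elided remainder is left out on purpose), and describeFields is a hypothetical helper that exists only for this illustration.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

type AccountRight int32

// Account uses shortened, illustrative tags: "<wire type>,<field number>".
// The generated tags contain more options, which are elided in the text.
type Account struct {
	Id       uint64       `protobuf:"varint,1"`
	Username string       `protobuf:"bytes,2"`
	Right    AccountRight `protobuf:"varint,3"`
}

// describeFields lists, for each field of a struct value, the wire type
// and the field number found in its "protobuf" struct tag.
// It expects a struct value, not a pointer (an assumption of this sketch).
func describeFields(v interface{}) []string {
	var out []string
	t := reflect.TypeOf(v)
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		tag, ok := f.Tag.Lookup("protobuf")
		if !ok {
			continue // field carries no protobuf metadata
		}
		parts := strings.Split(tag, ",")
		if len(parts) < 2 {
			continue // tag does not have the expected shape
		}
		out = append(out, fmt.Sprintf("%s: wire type %s, field number %s", f.Name, parts[0], parts[1]))
	}
	return out
}

func main() {
	for _, line := range describeFields(Account{}) {
		fmt.Println(line)
	}
}
```

The numbers read back from the tags (1, 2, and 3) are the same ones written after the equals signs in the Account message.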
So, to recap, an IDL is a piece of code sitting between different applications and describing objects and their relationships by following certain defined rules. In the case of Protobuf, the compiler reads this IDL and uses it to generate code in another language. After that, the user code relies on this generated code to serialize and deserialize data.
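As a rough preview of what "serialize" means here, the sketch below encodes an Account by hand, using only the Go standard library. It is not the generated code and not the Protobuf library API: encodeAccount, appendKey, and appendUvarint are hypothetical helpers. It assumes the Protobuf wire format, in which each field is written as a key, computed as (field number << 3) | wire type, followed by its value, and in which fields holding their zero value are skipped.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

type AccountRight int32

const AccountRight_ACCOUNT_RIGHT_ADMIN AccountRight = 3

// Account is a hand-written stand-in for the generated struct.
type Account struct {
	Id       uint64
	Username string
	Right    AccountRight
}

// Wire types used by this sketch.
const (
	wireVarint = 0 // integers and enums
	wireBytes  = 2 // length-delimited data such as strings
)

// appendUvarint appends v in base-128 varint form.
func appendUvarint(buf []byte, v uint64) []byte {
	var tmp [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(tmp[:], v)
	return append(buf, tmp[:n]...)
}

// appendKey appends the key that precedes every field value.
func appendKey(buf []byte, fieldNumber, wireType uint64) []byte {
	return appendUvarint(buf, fieldNumber<<3|wireType)
}

// encodeAccount writes the non-zero fields in field-number order.
func encodeAccount(a Account) []byte {
	var buf []byte
	if a.Id != 0 {
		buf = appendKey(buf, 1, wireVarint)
		buf = appendUvarint(buf, a.Id)
	}
	if a.Username != "" {
		buf = appendKey(buf, 2, wireBytes)
		buf = appendUvarint(buf, uint64(len(a.Username)))
		buf = append(buf, a.Username...)
	}
	if a.Right != 0 {
		buf = appendKey(buf, 3, wireVarint)
		buf = appendUvarint(buf, uint64(a.Right))
	}
	return buf
}

func main() {
	a := Account{Id: 1, Username: "al", Right: AccountRight_ACCOUNT_RIGHT_ADMIN}
	fmt.Printf("% x\n", encodeAccount(a)) // prints 08 01 12 02 61 6c 18 03
}
```

In practice, none of this is written by hand. The generated code and the Protobuf runtime take care of it, which is precisely the heavy lifting that the schema makes possible.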