Understand Protocol Buffer basics

Riteek Srivastav
3 min read · Sep 23, 2017

In software development we often transfer data between servers, or between servers and clients, and we use different mechanisms and data formats to do so. Protocol Buffers is one of them. It serves the same purpose as JSON and XML, but its binary encoding is usually smaller and faster to parse than either. Here I will talk about what it is, why it is faster than the others (JSON/XML), and why people are shifting towards it in many cases.

Protocol Buffers (protobuf) was initially developed at Google in early 2001 for its internal server request/response protocol. Prior to this they were hand-writing the marshalling and unmarshalling of requests and responses. (If you are curious about marshalling and unmarshalling, it is worth reading up on, though it’s not very relevant here.) It was made public on 7 July 2008.

Protocol Buffers is a mechanism for serialising structured data in a faster and simpler way. It includes an interface description language for describing the structure of the data. We define our data structures, called messages, in a .proto file and compile it with the protocol buffer compiler (protoc) for our preferred language. This compilation generates code that is invoked by the sender or the recipient. The generated code defines a class for each message and service that the .proto file defines. A schema associates data types with field names, using integers to identify each field. A .proto file looks something like this.

// This file follows the proto2 version
syntax = "proto2";

message Point {
  required int32 x = 1;
  required int32 y = 2;
  optional string label = 3;
}

message Line {
  required Point start = 1;
  required Point end = 2;
  optional string label = 3;
}

message Polyline {
  repeated Point point = 1;
  optional string label = 2;
}

In the above snippet the “Point” message has two mandatory data items, x and y, and an optional label. Like Point, Line and Polyline have data items and a label. Both contain Point, which demonstrates how composition works in Protocol Buffers. Here I have used three field rules:

  1. required : used for mandatory data items,
  2. optional : used for an optional data item whose value may or may not be present,
  3. repeated : used for a field that can appear zero or more times; it behaves like a vector.
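To get a feel for the three rules, here is a rough analogy in plain C++. This is only a hand-written sketch, not the code that the protocol buffer compiler generates; the use of std::optional and std::vector and the helper make_unit_segment() are my own assumptions for illustration (it needs C++17).

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hand-written analogy only; this is NOT what protoc generates.
struct Point {
    std::int32_t x = 0;                // required int32 x = 1;
    std::int32_t y = 0;                // required int32 y = 2;
    std::optional<std::string> label;  // optional string label = 3;
};

struct Polyline {
    std::vector<Point> point;          // repeated Point point = 1;
    std::optional<std::string> label;  // optional string label = 2;
};

// Hypothetical helper: builds a two-point polyline with no label of its own.
inline Polyline make_unit_segment() {
    Polyline line;
    line.point.push_back(Point{0, 0, std::nullopt});         // label left unset
    line.point.push_back(Point{1, 0, std::string("end")});   // label present
    return line;
}
```

The optional label can be absent, while the repeated field simply grows as points are added.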

After this we run the protocol buffer compiler for the application’s language on the above .proto file to generate data access classes. These provide simple accessors for each field (like x() and set_x()). Compiling the above .proto file generates three classes: Point, Line and Polyline. We can then use these classes in our application to populate, serialise, and retrieve protocol buffer messages. In C++ we can set the data like this.

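// Assumes the header generated from the .proto file is included
// and that `using namespace std;` is in effect.
Point point;
point.set_x(1);
point.set_y(0);
point.set_label("Unit length line from origin on x axis");
// Write the serialised message to a binary file.
fstream output("myfile", ios::out | ios::binary);
point.SerializeToOstream(&output);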

Later on we can read our message back in using the code below.

fstream input("myfile", ios::in | ios::binary);
Point point;
point.ParseFromIstream(&input);
cout << "x_coordinate: " << point.x() << endl;
cout << "y_coordinate: " << point.y() << endl;
// label() is the accessor generated for the `label` field.
cout << "Description: " << point.label() << endl;

We can add new fields to a message format without breaking backwards compatibility: old binaries simply ignore the new fields when parsing. New code can also read messages written by old code, as long as the added fields are optional or repeated. This way the format is both backward and forward compatible. The flow of protocol buffer looks something like this:

 [Msg]                                       [Msg]
   |                                           |
   |                                           |
   v                                           v
[Encoding]--------------------------------->[Decoding]
   ^                                           ^
   |                                           |
   |___________________________________________|
                         |
                         |
                      [Proto]

If you want to know how the encoding and decoding work, the wire format is worth reading about; it’s interesting.
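The compatibility point above (old binaries ignore fields they do not know) can be sketched by hand. The code below is not the real protobuf parser; it is a minimal C++ sketch, assuming an "old" reader that only knows fields 1 (x) and 2 (y) of Point. OldPoint, read_varint() and parse_old_point() are hypothetical names of my own. It relies on two facts of the wire format: every field starts with a key equal to (field_number << 3) | wire_type, and integers are stored as base-128 varints.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical "old binary" view of Point: it only knows fields 1 and 2.
struct OldPoint {
    std::int32_t x = 0;
    std::int32_t y = 0;
};

// Reads one base-128 varint starting at pos. Returns false on truncated input.
inline bool read_varint(const std::string& in, std::size_t& pos, std::uint64_t& value) {
    value = 0;
    for (int shift = 0; shift < 64; shift += 7) {
        if (pos >= in.size()) return false;
        const std::uint8_t byte = static_cast<std::uint8_t>(in[pos++]);
        value |= static_cast<std::uint64_t>(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) return true;  // high bit clear: last byte
    }
    return false;  // longer than 10 bytes: not a valid varint
}

// Parses the fields it knows and skips everything else.
// Only wire types 0 (varint) and 2 (length-delimited) are handled in this sketch.
inline bool parse_old_point(const std::string& in, OldPoint& out) {
    std::size_t pos = 0;
    while (pos < in.size()) {
        std::uint64_t key = 0;
        if (!read_varint(in, pos, key)) return false;
        const std::uint64_t field_number = key >> 3;
        const std::uint64_t wire_type = key & 0x7;
        if (wire_type == 0) {
            std::uint64_t v = 0;
            if (!read_varint(in, pos, v)) return false;
            if (field_number == 1) out.x = static_cast<std::int32_t>(v);
            else if (field_number == 2) out.y = static_cast<std::int32_t>(v);
            // Any other varint field is unknown here: its value is dropped.
        } else if (wire_type == 2) {
            std::uint64_t len = 0;
            if (!read_varint(in, pos, len)) return false;
            if (len > in.size() - pos) return false;
            pos += static_cast<std::size_t>(len);  // skip the unknown payload
        } else {
            return false;  // other wire types are left out of this sketch
        }
    }
    return true;
}
```

Because the key carries the wire type, the reader knows how many bytes to skip even for a field number it has never seen, which is what lets old code survive new fields.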

Protocol Buffers has two versions, proto2 and proto3. You might be wondering why they started with version 2. The reason is that the initial version of protocol buffers (“Proto1”) was developed at Google starting in early 2001 for internal use, so it was never made public.

It is much faster largely because it doesn’t send field names on the wire; instead it sends the integers assigned to those fields in the schema. On the other side, the receiver uses the same schema to decode the data it receives. One disadvantage I observed is that with protocol buffers both the sender and the receiver must know the schema. If you want to know more about why protocol buffers is better than JSON or XML, simply google it; you will find a lot of links and answers, including good comparisons between JSON, XML and Protocol Buffers.
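To see what "sending integers instead of field names" means, here is a small C++ sketch that hand-encodes a Point the way the wire format describes. It is an illustration under my own assumptions, not the generated serialiser: append_varint(), append_key() and encode_point() are hypothetical helpers, and encode_point() always writes the label, whereas a real message would leave out an unset optional field.

```cpp
#include <cstdint>
#include <string>

// Appends an unsigned integer as a base-128 varint: 7 bits per byte,
// with the high bit set on every byte except the last.
inline void append_varint(std::string& out, std::uint64_t value) {
    while (value >= 0x80) {
        out.push_back(static_cast<char>((value & 0x7F) | 0x80));
        value >>= 7;
    }
    out.push_back(static_cast<char>(value));
}

// A field key is (field_number << 3) | wire_type.
// Wire type 0 = varint, wire type 2 = length-delimited (strings, nested messages).
inline void append_key(std::string& out, std::uint32_t field_number, std::uint32_t wire_type) {
    append_varint(out, (static_cast<std::uint64_t>(field_number) << 3) | wire_type);
}

// Hypothetical helper: hand-encodes Point{x, y, label} from the .proto above.
inline std::string encode_point(std::int32_t x, std::int32_t y, const std::string& label) {
    std::string out;
    append_key(out, 1, 0);  // field 1 (x), varint
    // int32 values are sign-extended to 64 bits before varint encoding.
    append_varint(out, static_cast<std::uint64_t>(static_cast<std::int64_t>(x)));
    append_key(out, 2, 0);  // field 2 (y), varint
    append_varint(out, static_cast<std::uint64_t>(static_cast<std::int64_t>(y)));
    append_key(out, 3, 2);  // field 3 (label), length-delimited
    append_varint(out, label.size());
    out += label;
    return out;
}
```

With this sketch, encode_point(1, 0, "hi") yields the 8 bytes 08 01 10 00 1A 02 68 69, whereas the compact JSON {"x":1,"y":0,"label":"hi"} takes 26 characters. The names x, y and label never appear in the output; only the field numbers 1, 2 and 3 do.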
