A complete journey with Goroutines

Riteek Srivastav
Apr 16, 2018 · 7 min read

When we want things to be done concurrently in Golang, we use Goroutines.

But wait. What is Concurrency?

Here’s an example: I’m writing this article and I feel thirsty, so I stop typing, drink some water, and then start typing again. I’m now dealing with two jobs (typing and drinking water) by splitting my time between them, which makes them concurrent jobs. The point to note here is that these two tasks (writing and drinking) are not being done at the same time. When things are being done at the same time, it’s called parallelism. (Think checking your mobile AND eating chips.)

So concurrency is dealing with multiple things at once (they do not need to be done at the same time) on some schedule, and parallelism (doing multiple things at the same time) is a subset of this. If you are keen to know more about the difference between concurrency and parallelism, I highly recommend this video.

Overview

In this blog, I will be talking about:

  • What are Goroutines?
  • How are they different from threads?
  • Scheduling Goroutines
  • Common mistakes while using Goroutines and how to avoid them
  • Keywords

What are Goroutines?

Goroutines are a way of doing tasks concurrently in Golang. They allow us to create and run multiple methods or functions concurrently in the same address space, inexpensively. The idea of Goroutines was inspired by coroutines. In my opinion, the main difference is that coroutines support an explicit means of transferring control to other coroutines, while Goroutines do it implicitly (this point will get clearer in the scheduling section of this blog).

Goroutines are lightweight abstractions over threads: their creation and destruction are very cheap compared to threads, and they are scheduled over OS threads. Executing a function in the background is as easy as prepending the keyword go to the function call. Here’s a simple example:
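A minimal sketch of such a program (the function name hello and the one-second sleep are illustrative, not necessarily the original snippet):

package main

import (
	"fmt"
	"time"
)

// hello is the function we run as a goroutine.
func hello() {
	fmt.Println("My first goroutine")
}

func main() {
	go hello()                  // control returns here immediately; main does not wait for hello
	time.Sleep(1 * time.Second) // give the goroutine time to run (a hack, explained below)
	fmt.Println("main function")
}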


The output of the above program will be:

My first goroutine
main function

Remove time.Sleep and run the program again. You will notice this output:

main function

To understand this we need to know how Goroutines are executed. In Golang, unlike a normal function call, control does not wait for a Goroutine to finish executing. Control immediately returns to the next line of code after the Goroutine call. The main Goroutine (the main function is also a Goroutine) must be running for any other Goroutine to run; if the main Goroutine terminates, the program terminates and no other Goroutine will run.

Hence, when we are not using time.Sleep(), control immediately executes the line fmt.Println("main function") just after the Goroutine call. After this the main program terminates, and we do not get the output of the Goroutine on the terminal.

Using sleep in the main Goroutine is a hack we are using here just to understand how Goroutines work. In real code we use channels to block the main Goroutine until all other Goroutines finish execution.
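For instance, a minimal sketch of the channel approach (the channel name done is illustrative):

package main

import "fmt"

func main() {
	done := make(chan bool) // unbuffered channel used purely as a completion signal
	go func() {
		fmt.Println("My first goroutine")
		done <- true // signal that the goroutine has finished
	}()
	<-done // main blocks here until the signal arrives
	fmt.Println("main function")
}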

How are they different from threads?

Many people think Goroutines are faster than threads. This is not entirely correct. They are not any faster; they let you do things concurrently. And if conditions are right (for example, some processors are free), the runtime will also split the work across multiple processors, so you may get parallelism out of it too. But think of them primarily as a way of doing things concurrently. If task A is blocked on something (say, waiting on I/O), the scheduler knows that another Goroutine that is ready to run can be executed while waiting for the I/O to return.

Goroutines have the following benefits over threads (inspired from here, I am just quoting):

Memory consumption: Creating a Goroutine requires much less memory than creating a thread. A Goroutine starts with about 2 KB of stack, while a thread typically reserves about 1 MB (roughly 500 times more). Goroutine stacks are designed to grow and shrink according to the needs of the application. There might be only one thread in a program running thousands of Goroutines.
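As a rough illustration of how cheap they are (a sketch, not a benchmark; the count of 100,000 is arbitrary), the following program starts 100,000 Goroutines and waits for them with a sync.WaitGroup:

package main

import (
	"fmt"
	"sync"
)

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 100000; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// each goroutine starts with a stack of only about 2 KB
		}()
	}
	wg.Wait()
	fmt.Println("all goroutines finished")
}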

Setup and teardown cost: Threads have significant setup and teardown costs because they have to request resources from the OS and return them once done. Goroutines, by contrast, are created and destroyed by the Go runtime (which manages scheduling, garbage collection, and the runtime environment for Goroutines), and those operations are pretty cheap.

From: Analysis of Go Runtime scheduler

Switch cost: This difference comes mainly from how Goroutines and threads are scheduled. Threads are scheduled preemptively (if a process runs for longer than its scheduler time slice, the scheduler preempts it and schedules another runnable process on the same CPU), so the scheduler needs to save and restore all registers, i.e. 16 general-purpose registers, the PC (program counter), the SP (stack pointer), segment registers and so on.

Goroutines, on the other hand, are scheduled cooperatively (explained in the scheduling section) and do not directly talk to the OS kernel. When a Goroutine switch occurs, only a few registers (say 3), such as the program counter and stack pointer, need to be saved and restored. For more details refer to this.

Scheduling of Goroutines

As I mentioned in the last paragraph, Goroutines are cooperatively scheduled. In cooperative scheduling there is no concept of a scheduler time slice. Instead, Goroutines yield control when they are idle or logically blocked, which is what allows multiple Goroutines to run concurrently. A switch between Goroutines happens only at well-defined points, namely when an explicit call is made into the Go runtime scheduler. Those well-defined points are:

  • Channel send and receive operations, if those operations would block (a small sketch follows this list).
  • The go statement, although there is no guarantee that the new Goroutine will be scheduled immediately.
  • Blocking syscalls like file and network operations.
  • After being stopped for a garbage collection cycle.
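Here is a small sketch of the first point (the channel name messages and the three iterations are illustrative): control passes between the two Goroutines at every blocking send and receive on the unbuffered channel.

package main

import "fmt"

func main() {
	messages := make(chan string) // unbuffered: every send blocks until someone receives
	go func() {
		for i := 0; i < 3; i++ {
			messages <- fmt.Sprintf("ping %d", i) // the blocking send is a switch point
		}
		close(messages)
	}()
	for msg := range messages { // a receive that has to wait is also a switch point
		fmt.Println(msg)
	}
}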

Now let’s see how they are scheduled internally. Go uses three entities to describe Goroutine scheduling:

  • Processor (P)
  • OS thread (M)
  • Goroutine (G)

In any particular Go application, the number of threads available for Goroutines to run on is equal to GOMAXPROCS, which by default is the number of cores available to that application. We can use the runtime package to change this number at runtime as well (see the sketch below). OS threads are scheduled over processors and Goroutines are scheduled over OS threads, as explained in Fig 2.
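For example, a sketch of reading and changing this setting through the runtime package (the value 4 is arbitrary):

package main

import (
	"fmt"
	"runtime"
)

func main() {
	fmt.Println("cores available:", runtime.NumCPU())
	fmt.Println("current GOMAXPROCS:", runtime.GOMAXPROCS(0)) // an argument of 0 only reads the value
	runtime.GOMAXPROCS(4)                                     // from now on at most 4 Ps run Go code at once
	fmt.Println("new GOMAXPROCS:", runtime.GOMAXPROCS(0))
}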

Golang has an M:N scheduler that can also utilize multiple processors. At any time, M Goroutines need to be scheduled on N OS threads that run on at most GOMAXPROCS processors (N <= GOMAXPROCS). The Go scheduler distributes runnable Goroutines over multiple worker OS threads that run on one or more processors.

Fig 2 (inspired from here)

Every P has a local queue of runnable Goroutines. There is also a global queue that contains runnable Goroutines. Each M must be assigned to a P in order to run Go code; Ms may have no P when they are blocked or in a system call. At any time there are at most GOMAXPROCS Ps, and only one M can run per P. More Ms can be created by the scheduler if required.

Fig 3 (inspired from here)

In each round of scheduling, the scheduler finds a runnable G and executes it. Once a runnable G is found, it is executed until it blocks (at one of the points explained above). The search for a runnable G goes roughly like this:

search in the local run queue of the current P
if not found
    try to steal from other Ps' local run queues // see Fig 3
if not found
    search in the global queue
(the global queue is also checked periodically, roughly once every 61 scheduling rounds, so that Goroutines there are not starved)

Common mistakes while using Goroutines and how to avoid them

Not checking the FD limit and memory limit: You should always check the resource limits (such as the file descriptor and memory limits) required by your Go program when you are creating a huge number of Goroutines. Otherwise, if the memory limit is exceeded a memory leak may occur, and if the FD limit is exceeded Goroutines may get stuck.
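One way to avoid this is to bound how many Goroutines touch such resources at once, for example with a buffered channel used as a semaphore. A sketch (the limit of 100 and the helper process are hypothetical):

package main

import "sync"

// process stands in for work that opens a file or socket (hypothetical helper).
func process(id int) {}

func main() {
	const limit = 100 // keep this well below the process's file-descriptor limit
	sem := make(chan struct{}, limit)
	var wg sync.WaitGroup
	for i := 0; i < 10000; i++ {
		wg.Add(1)
		sem <- struct{}{} // blocks once 'limit' goroutines are already in flight
		go func(id int) {
			defer wg.Done()
			defer func() { <-sem }() // free the slot when done
			process(id)
		}(i)
	}
	wg.Wait()
}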

Running Goroutines with infinite loops: Try executing the following Go code snippet.
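A sketch of the kind of program being described (the counter variable is illustrative, and the hang assumes a Go release from around the time this article was written, before Go 1.14 added asynchronous preemption):

package main

import (
	"fmt"
	"runtime"
	"time"
)

func main() {
	var counter int64
	processors := runtime.GOMAXPROCS(0) // change this to runtime.GOMAXPROCS(0) - 1 and the program terminates
	for i := 0; i < processors; i++ {
		go func() {
			for { // tight loop with no function calls or channel operations: no yield point
				counter++ // racy on purpose; it only exists to keep the P busy
			}
		}()
	}
	time.Sleep(time.Second)
	fmt.Println("counter:", counter) // never reached while every P is stuck in a busy loop
}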

Let’s execute the program:

$ GOMAXPROCS=8 go run test.go

We can see that the program never terminates. If we wrote the same program in C/C++, we would never observe such an issue. Now let’s rerun the program after changing the line to processors = runtime.GOMAXPROCS(0) - 1. This time the program terminates properly and prints the result. Surprising, isn’t it? You can read this blog to learn more about it.

Keywords

GOMAXPROCS

In the current version of Go, GOMAXPROCS controls the number of threads available for Goroutine execution in a particular Go program. The hard limit is still the number of CPU cores presented to the OS; the GOMAXPROCS option allows you to tune it down. By default, as of Go 1.5, GOMAXPROCS is set to runtime.NumCPU(). Fun trivia: in older versions of Go it was set to 1, because the scheduler wasn’t as smart and GOMAXPROCS > 1 was extremely detrimental to performance.

Any other hacks you can think of? Please do leave a comment.

And… We’re hiring. Check out gojek.jobs for more.
