Dawn of the Thread

The zombies are coming!
11.02.2015
Jasper Timm
Tags: zombie

In my last blog post I gave a general overview of our use of Go in a project. I thought I’d take the time in this follow-up post to discuss one particular issue a little more thoroughly.

Concurrency comes up a lot when people discuss Go. Indeed, it seems to be one of the primary motivators for choosing the language in the first place - it handles concurrency in such a simple fashion.

To be clear, Go actually handles concurrency using what it calls goroutines, not threads - multiple goroutines can run on the same thread. Of course I wasn’t going to let the facts get in the way of a good title. ‘Dawn of the goroutine’ doesn’t quite have the same ring to it now, does it?

Incidentally, the idea of multiple goroutines running on a single thread (which can therefore run on a single-core CPU) is a good example of Concurrency as opposed to Parallelism - people often confuse the two. A good explanation from Rob Pike himself:

In programming, concurrency is the composition of independently executing processes, while parallelism is the simultaneous execution of (possibly related) computations. Concurrency is about dealing with lots of things at once. Parallelism is about doing lots of things at once.

If you’re interested, Rob gave a great talk on it: Concurrency vs. Parallelism.

Channels (Go’s plumbing)

Channels are Go’s way of communicating safely between goroutines. They allow one goroutine to send a value down a channel and a concurrent goroutine to receive that value from the channel.

A lot of people use the analogy of a pipe to explain Go’s channels. I think this is a fairly apt analogy, with one clarification: the size of a channel (or the length of the pipe) should not be considered an afterthought or an advanced way of using channels - it is an integral property of a channel. The size of a channel dictates how many values it can hold, or buffer. For this reason, a channel with a size greater than 0 is referred to as a buffered channel. When the channel is full the sender will block until room is made; when the channel is empty the receiver will block until there is a value in the channel.

Let’s look at an example:

myChannel := make(chan int, 1)

The above code creates a channel which passes integers as values and whose size is 1.

Using our pipe analogy then, what does this look like?
[Image: gopher pipe]

Initially our pipe is empty; the receiver is not currently listening but is busy with something else, and the sender would like to push 3 into the pipe. No problem - there’s no need to wait (block), as there is room for one value in the pipe, so it sends the value and then continues on with something else. Some time later the receiver arrives to pull a value from the pipe. Again there is no need to wait, as there is already a value in the pipe; it receives the value and continues, making room in the pipe for a value once again.
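
Here’s a minimal sketch of that sequence in code (my own example, not from the original scenario) - the buffered send completes immediately even though nobody is receiving yet:

package main

import (
	"fmt"
	"time"
)

func main() {
	myChannel := make(chan int, 1) // room for exactly one value in the pipe

	myChannel <- 3 // does not block - the value sits in the channel's buffer
	fmt.Println("sender: pushed 3 into the pipe, carrying on")

	time.Sleep(time.Second) // the receiver is busy with something else

	fmt.Println("receiver: pulled", <-myChannel, "from the pipe") // no waiting - a value is already there
}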


However, typically the first example you’ll see of Go’s channels would be something like this:

myChannel := make(chan int)

In an effort to simplify things for beginners to the language, the notion of channel size is initially ignored. This results in a channel of size 0, i.e. it is equivalent to the following:

myChannel := make(chan int, 0)

As its size is 0 it cannot buffer values at all and is referred to as an unbuffered channel. Oddly enough then, it is always both full and empty. Admittedly our pipe analogy struggles a little here, but I like to picture this as a ring (or portal, maybe?) through which values travel. As there is no room to store a value, when the sender wishes to send a value it must wait until the receiver is ready to receive. It therefore enforces synchronisation between the sender and receiver. Sometimes this is what you want. Sometimes not…
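
To illustrate that synchronisation with a small sketch of my own: the send below cannot complete until the receiver in main is ready, so the two goroutines effectively meet at the channel.

package main

import (
	"fmt"
	"time"
)

func main() {
	myChannel := make(chan int) // unbuffered - no room to store a value

	go func() {
		fmt.Println("sender: trying to send 3 (will block until someone receives)...")
		myChannel <- 3
	}()

	time.Sleep(time.Second) // main is busy with something else; the sender waits the whole time
	fmt.Println("receiver: got", <-myChannel)
}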

A goroutine left in limbo

Let’s say you write some code to send a request off to a web server and wait for a response. In an effort not to flood the server with requests, you also implement a pooling mechanism, so that only a finite number of requests can be sent simultaneously.

func webRequest(req *Request) *Response {
	resource := pool.Get()            // grab a resource from the pool
	resp := resource.sendRequest(req) // use it to send the request
	pool.Free(resource)               // free up the slot in the pool again

	return resp
}

You grab a resource from the resource pool and use it to send your request. Afterwards you free up the slot in the pool and send back the response. Simple enough and it seems to work fine.

Further down the road you start to get sick of waiting so long when the web server is down and decide to implement a timeout mechanism. You’ve just seen an introduction to Go’s channels and the time.After() function and come up with the following:

func webRequest(req *Request) *Response {
	respChannel := make(chan *Response)

	go func() {
		resource := pool.Get()
		respChannel <- resource.sendRequest(req)
		pool.Free(resource)
	}()

	select {
	case resp := <-respChannel:
		return resp
	case <-time.After(time.Second * 5):
		logger.Error("Timed out waiting for response!")
		return nil
	}
}

We create a channel to receive the response on, then launch a goroutine which will execute the request and send the response down the channel when complete. Using Go’s select statement we then wait for whichever channel delivers a value first: either we get a response from respChannel, or the channel returned by time.After fires first, indicating a timeout has occurred.

Initial tests seem fine - when the server is down the timeout kicks in and an error is logged. A few days later, the webserver goes down (classic webserver) but when it comes back up again for some reason no requests are being sent anymore…

See the problem? When a timeout occurs webRequest returns, but the goroutine for the request is still running. Eventually resource.sendRequest(req) returns, but when the goroutine attempts to send the response down the channel there is no longer anyone on the other end waiting to receive it. Unfortunately, when we declared our channel we neglected to specify a size, meaning it is an unbuffered channel - there is no room to store a response. So that goroutine will block on the channel forever, waiting for someone to take its response, questioning its existence in life. Zombie goroutine!

[Image: gopher portal]

In a lot of cases this wouldn’t be immediately obvious: you’d generate a new zombie goroutine with each timeout, but goroutines are quite cheap (a few kB each), so you can have tens of thousands of them without problems (it would eventually become a problem, of course). In this case, however, the line below the send on the channel is quite significant - it frees up a resource slot in our pool. So every time a timeout occurs we lose a resource slot we’ll never get back, which will quickly exhaust our resource pool.
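
If you’d like to watch the zombies pile up for yourself, here’s a small self-contained sketch of my own (hypothetical names, with a sleep standing in for the slow request) - every timed-out call leaves one goroutine parked on its unbuffered channel:

package main

import (
	"fmt"
	"runtime"
	"time"
)

// slowWork stands in for the slow web request - it always takes longer than our timeout.
func slowWork() int {
	time.Sleep(2 * time.Second)
	return 42
}

func requestWithTimeout() int {
	ch := make(chan int) // unbuffered - this is the bug

	go func() {
		ch <- slowWork() // blocks forever once the caller has given up
	}()

	select {
	case v := <-ch:
		return v
	case <-time.After(100 * time.Millisecond):
		return -1 // timed out
	}
}

func main() {
	for i := 0; i < 10; i++ {
		requestWithTimeout()
	}
	time.Sleep(3 * time.Second) // give the workers time to finish their "requests"
	fmt.Println("goroutines still alive:", runtime.NumGoroutine())
	// Prints 11: main plus 10 zombie goroutines blocked on their channel sends.
}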

Leave no gopher behind

Hopefully it’s clear then that in this case the solution is to buffer the channel, creating it with a size of 1.

respChannel := make(chan *Response, 1)

That way, when a timeout occurs, webRequest returns and forgets about the request, but there is still room in the channel for the request goroutine to place its response and afterwards free its slot in the resource pool.
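
For completeness, the fixed version of webRequest is otherwise identical - only the channel size changes (still assuming the same hypothetical pool, Request and Response types as before):

func webRequest(req *Request) *Response {
	respChannel := make(chan *Response, 1) // room for one response, even if nobody is listening anymore

	go func() {
		resource := pool.Get()
		respChannel <- resource.sendRequest(req) // never blocks now - at worst the response is simply discarded
		pool.Free(resource)                      // the resource always makes it back into the pool
	}()

	select {
	case resp := <-respChannel:
		return resp
	case <-time.After(time.Second * 5):
		logger.Error("Timed out waiting for response!")
		return nil
	}
}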


In closing then, this isn’t an attempt to say that you should always buffer your channels - sometimes you really do want an unbuffered channel to enforce synchronisation between goroutines. Rather, it’s that you should keep the size of a channel in mind when you’re using them; as you can see, it can be quite important. Don’t let the zombie scourge infect your coding!

Even given this small gotcha using channels, I still believe the creators of Go have done a marvelous job of simplifying concurrency for developers. At kreuzwerker we strive to find the right tool for the job, and I’ve come across my fair share of different languages. Where in other languages I may have groaned at the thought of debugging code involving threads, I actually look forward to the opportunity of using channels in Go.

How has your experience been using Go’s channels? Did this post help you remain zombie free?