Concurrency (10,000 ft view)

A concurrent process is a process that is capable of running alongside other processes on the same computer. Asynchronous programming allows us to use threads or forked processes to execute some task in the background while we continue doing other things in our main program thread.

Note: this is an attempt at a simple breakdown of traditional async programming, which is a very complex process.

Traditionally, programs with multiple threads leave the scheduling of those threads to the kernel scheduler, which is part of the operating system on the machine where your program is running. The kernel decides when each thread gets to run and keeps track of whether it is running, sleeping, or finished. Each time a new thread is spawned, a new runtime stack is created to track that thread's function calls and local values. These thread stacks are usually given a fixed maximum size (often on the order of megabytes) that is reserved up front when the thread is created. Threads in the same process share that process's memory, but the kernel schedules each one independently, and creating threads or switching between them is relatively expensive.

Now enter the Goroutine…

Goroutines

Now that you know some basics about threads, memory and the kernel scheduler, let’s talk about goroutines and how they differ from threads.

A goroutine is a function or task that is capable of running concurrently, much like a thread or concurrent process. Goroutines are lightweight and inexpensive to create. Similar to threads, each goroutine gets its own stack. However, goroutine stacks do not have a fixed size: they start out very small (a few kilobytes) and grow as needed while the program is running. A Go program can spawn thousands of goroutines, and because their stacks are so small and adaptive, they are far less memory intensive than you might expect. All goroutines in a single program execute in the same shared address space, and they can communicate with each other using something called channels (more about those later).

Something particularly interesting is that goroutines are managed by the Go runtime, not by the kernel, so Go itself decides which goroutines run and when. Kind of: the kernel still schedules the operating system threads that the Go runtime executes on, and the runtime multiplexes goroutines onto those threads.

NOTE: the main function of every Go program runs in a goroutine of its own, the main goroutine

Make a goroutine

To create a goroutine we use the go keyword followed by a function invocation. To run function f() as a goroutine, the code would be go f(); a new goroutine is then created to execute f() while the caller carries on immediately.

To witness goroutines in action, here is an example with a main function and a function f() that will be called from goroutines. The main function has a loop that starts a new goroutine to execute f() on every iteration. The goroutines all begin to execute f() independently of the main goroutine.

f() is a function that accepts an int value named n and runs a loop of 10 iterations. Inside that loop it prints n and which iteration of the loop we are on. When we call f() we pass the current iteration of the main loop as n, which identifies each goroutine in the output.

Important concept: since main creates all of these goroutines and then has nothing left to do, it would exit, causing all of the other goroutines to be terminated. We are going to prevent this by adding this little bit of code:

var input string
fmt.Scanln(&input)

The above halts the main goroutine while it waits for user input; we are using it to pause the main goroutine so we can see the other goroutines complete. The main goroutine is the most important task in the program: when it returns, the program exits and any goroutines still running are destroyed, whether or not they have completed.

package main

import (
  "fmt"
)

// this function will be called as a goroutine
// the goal is for every running goroutine to print something 10 times
func f(n int) {
  for i := 0; i < 10; i++ {
    fmt.Println("goroutine", n, ":", "iteration", i)
  }
}

// main is an implicit goroutine
func main() {
  // create 10 separate goroutines
  for i := 0; i < 10; i++ {
    // start f as a new goroutine, passing i to identify it
    go f(i)
  }

  // main goroutine halting
  // the below code is only to keep the main goroutine from ending
  var input string
  fmt.Scanln(&input)
}

Pay attention to the output: you can see that all the goroutines start at almost the same time and are scheduled to output their values. Notice that there is no distinct order to the goroutines' execution. You might think that goroutine 0 would run all of its iterations first, but that is not the case. Since goroutines run concurrently, whichever one gets scheduled first executes first: “first one there gets to do its task”. If you run this program multiple times, the order of the output will vary almost every time.

output:

goroutine 0 : iteration 0
goroutine 0 : iteration 1
goroutine 0 : iteration 2
goroutine 0 : iteration 3
goroutine 0 : iteration 4
goroutine 0 : iteration 5
goroutine 0 : iteration 6
goroutine 0 : iteration 7
goroutine 0 : iteration 8
goroutine 2 : iteration 0
goroutine 9 : iteration 0
goroutine 9 : iteration 1
goroutine 9 : iteration 2
goroutine 9 : iteration 3
goroutine 9 : iteration 4
goroutine 9 : iteration 5
goroutine 9 : iteration 6
goroutine 9 : iteration 7
goroutine 9 : iteration 8
goroutine 9 : iteration 9
goroutine 1 : iteration 0
goroutine 7 : iteration 0
goroutine 7 : iteration 1
goroutine 7 : iteration 2
goroutine 1 : iteration 1
goroutine 1 : iteration 2
goroutine 1 : iteration 3
goroutine 7 : iteration 3
goroutine 7 : iteration 4
goroutine 7 : iteration 5
goroutine 7 : iteration 6
goroutine 7 : iteration 7
goroutine 7 : iteration 8
goroutine 7 : iteration 9
goroutine 8 : iteration 0
goroutine 8 : iteration 1
goroutine 8 : iteration 2
goroutine 8 : iteration 3
goroutine 8 : iteration 4
goroutine 8 : iteration 5
goroutine 8 : iteration 6
// cut off for length, but you get the idea
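
Pausing on fmt.Scanln works for a demo, but a common alternative is sync.WaitGroup from the standard library, which lets the main goroutine wait until every other goroutine has reported that it is done. The following is a minimal sketch of that idea applied to the program above. The helper name runWorkers and the per-goroutine counter slice are illustrative assumptions, not part of the original example.

```go
package main

import (
	"fmt"
	"sync"
)

// runWorkers is a hypothetical helper: it starts `workers` goroutines,
// each of which prints `iterations` lines, and returns only after all of
// them have finished. It reports how many iterations each one completed.
func runWorkers(workers, iterations int) []int {
	// one slot per goroutine, so no two goroutines write the same element
	done := make([]int, workers)
	var wg sync.WaitGroup

	for n := 0; n < workers; n++ {
		wg.Add(1) // register the goroutine before starting it
		go func(n int) {
			defer wg.Done() // signal completion when this goroutine returns
			for i := 0; i < iterations; i++ {
				fmt.Println("goroutine", n, ":", "iteration", i)
				done[n]++
			}
		}(n)
	}

	wg.Wait() // block until every goroutine has called Done
	return done
}

func main() {
	counts := runWorkers(10, 10)
	fmt.Println("iterations completed per goroutine:", counts)
}
```

With this approach the program exits on its own once all goroutines finish, instead of waiting for someone to press enter. The printed lines still interleave in an unpredictable order.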

The goroutines execute whenever the Go runtime decides that they should, but what if one goroutine needs data or values from another before it can continue? We can use another concurrency concept in Go called channels to synchronize goroutine execution.

Synchronizing Goroutines Using Channels

Channels provide a way for two goroutines to communicate with one another and synchronize their execution. To roughly understand channels, think of a relay race: one runner passes a baton to the next runner, and the second runner cannot go anywhere until they get the baton from the previous runner. Channels are typed, so each channel carries values of whatever type you declare it with. That value is what our baton is made of if we are relay runners. One runner (goroutine) runs their leg of the race (completes some task) and hands off the baton (sends a value on the channel) to the next runner (the goroutine receiving from the channel). If the runner who is supposed to receive the baton is not present, the first runner has to wait; they cannot run the next leg themselves. By default, sends and receives on a channel block: when either the sending goroutine or the receiving goroutine is not ready, the other one waits and cannot advance to anything else.

We create channels with the make() function, so to make a channel that carries string values, type:

var c chan string = make(chan string)
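
As a minimal sketch of the baton handoff described above, the program below passes a single value from one goroutine to another over a channel created with make(). The function name handOff is made up for illustration.

```go
package main

import "fmt"

// handOff is a hypothetical helper: it starts a "first runner" goroutine
// that sends the baton on a channel, then acts as the "second runner"
// by receiving it.
func handOff(baton string) string {
	c := make(chan string) // a send on this channel waits for a matching receive

	go func() {
		// first runner: blocks here until the receiver below is ready
		c <- baton
	}()

	// second runner: blocks here until the first runner sends
	return <-c
}

func main() {
	fmt.Println(handOff("baton")) // prints "baton"
}
```

Neither side can get ahead of the other: the send and the receive complete together, which is what makes the channel useful for synchronization as well as communication.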

Here is an example program using channels:

There is a main function that will spawn 3 goroutines:

  1. pinger, which sends the value “ping” on a channel
  2. ponger, which sends the value “pong” on a channel
  3. printer, which listens to the channel and prints whatever string value arrives on it

Our main goroutine will create the string channel, and the functions will start sending on or listening to the channel.

package main

import (
  "fmt"
  "time"
)

// function to add content to the channel
func pinger(c chan string) {
  for i := 0; i < 5; i++ {
    // syntax to add content to the channel
    // this will wait until the receiver is ready (blocking)
    c <- "ping"
  }
}

// functionally the same as pinger(c chan string)
// we will put a different value on the channel to demonstrate
// which goroutine has used the channel
func ponger(c chan string) {
  for i := 0; i < 5; i++ {
    c <- "pong"
  }
}

// print data coming from the channel
func printer(c chan string) {
  for i := 0; i < 10; i++ {
    // pull content from the channel into a variable
    // this unblocks a goroutine that is waiting to send
    msg := <-c
    // print the value
    fmt.Println(msg)
    time.Sleep(time.Second * 1)
  }
}

func main() {
  // shared channel defined at the top level
  var c chan string = make(chan string)

  go pinger(c)
  go ponger(c)
  go printer(c)

  // pause the main goroutine
  var input string
  fmt.Scanln(&input)
}

output:

ping
pong
ping
pong
ping
pong
ping
pong
ping
pong
// cut short for length

Let’s step through the main function to follow one sequence of events that produces this output:

  1. channel is created at the top of main
  2. ping goroutine is created
  3. pong goroutine is created
  4. printer goroutine is created
  5. ping will be ready to put something on the channel, but has to wait for a receiver to take the value from the channel
  6. printer will hit the channel ready to receive
  7. ping will put "ping" on the channel, and the printer will receive that value, then print it out
  8. pong will be immediately ready to put something on the channel once ping has moved out of the way
  9. printer will go back to the channel ready to receive
  10. pong will put the value "pong" on the channel, and the printer will receive that value, then print it out
  11. ping will be waiting to write to the channel the instant pong is finished
  12. This will continue until the iterations are over, printing out
    1. ping
    2. pong
    3. ping
    4. pong
    5. …… etc, 5 times each

The Go runtime determines the goroutines' execution order, but what if we want to determine that too?

Goroutine priority?

What if we took the same example above and had multiple goroutines that are all blocked waiting to send to a channel? How is priority established, and can we be sure that the same goroutines will run in the same order every time?

The example will use the same concept as above, but with more functions.

package main

import (
  "fmt"
  "time"
)

// function to add content to the channel
func pinger(c chan string) {
  for i := 0; i < 5; i++ {
    // syntax to add content to the channel
    // this will wait until the receiver is ready (blocking)
    c <- "ping"
  }
}

func ponger(c chan string) {
  for i := 0; i < 5; i++ {
    c <- "pong"
  }
}

func blooper(c chan string) {
  for i := 0; i < 5; i++ {
    c <- "bloop"
  }
}

func boper(c chan string) {
  for i := 0; i < 5; i++ {
    c <- "bop"
  }
}

// print data coming from the channel
func printer(c chan string) {
  for i := 0; i < 20; i++ {
    // pull content from the channel into a variable
    msg := <-c
    // do something with the value
    fmt.Println(msg)
    time.Sleep(time.Second * 1)
  }
}

func main() {
  // shared channel defined at the top level
  var c chan string = make(chan string)

  go pinger(c)
  go ponger(c)
  go blooper(c)
  go boper(c)
  go printer(c)

  // pause the main goroutine
  var input string
  fmt.Scanln(&input)
}

output:

ping
pong
bop
bloop
ping
pong
bop
bloop
ping
pong
bop
bloop
ping
pong
bop
bloop
ping
pong
bop
bloop

Running the above code typically outputs the values in the same order; however, it is important to understand that this is not a guarantee. Only the Go runtime controls the order in which goroutines run, and since all of the above goroutines reach the channel at around the same time, no order is guaranteed. If we need events to happen in a particular order, we have to use channels to synchronize them ourselves. We can make more use of channels to help ensure we get the outcomes we need.
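
One way to get a guaranteed order is to give each goroutine an explicit "turn" that is passed around on extra channels. The sketch below is one possible approach, not the only one. The names player and orderedPingPong are hypothetical, and the turn channels are created with a capacity of 1 (buffered channels are covered further down) so that the final handoff does not block after the other goroutine has already finished.

```go
package main

import "fmt"

// player is a hypothetical worker: it waits for its turn, sends its word
// on out, and then passes the turn to the next player.
func player(word string, rounds int, myTurn <-chan struct{}, nextTurn chan<- struct{}, out chan<- string) {
	for i := 0; i < rounds; i++ {
		<-myTurn               // wait until it is our turn
		out <- word            // hand our word to the receiver
		nextTurn <- struct{}{} // let the other player go
	}
}

// orderedPingPong returns the words in the order they were received,
// which the turn channels force to be ping, pong, ping, pong, ...
func orderedPingPong(rounds int) []string {
	out := make(chan string)
	pingTurn := make(chan struct{}, 1) // capacity 1: a turn can be left waiting
	pongTurn := make(chan struct{}, 1)

	go player("ping", rounds, pingTurn, pongTurn, out)
	go player("pong", rounds, pongTurn, pingTurn, out)

	pingTurn <- struct{}{} // ping goes first

	msgs := make([]string, 0, 2*rounds)
	for i := 0; i < 2*rounds; i++ {
		msgs = append(msgs, <-out)
	}
	return msgs
}

func main() {
	for _, m := range orderedPingPong(5) {
		fmt.Println(m)
	}
}
```

Here the order no longer depends on which goroutine the runtime happens to schedule first, because pong cannot send until ping has explicitly handed over the turn, and vice versa.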

Channel Direction

We can specify a direction on a channel type, restricting it to either sending or receiving. For example, pinger’s function signature can be changed to this:

func pinger(c chan<- string)

Now c can only be sent to. Attempting to receive from c will result in a compiler error. Similarly, we can change printer to this:

func printer(c <-chan string)

A channel that doesn’t have these restrictions is known as bi-directional. A bi-directional channel can be passed to a function that takes send-only or receive-only channels, but the reverse is not true.
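
To see the direction restrictions working together, here is a small sketch with one send-only function and one receive-only function sharing a bi-directional channel. The names produce, collect and relay are invented for this example.

```go
package main

import "fmt"

// produce can only send on c; a receive such as <-c would not compile here.
func produce(c chan<- string, words []string) {
	for _, w := range words {
		c <- w
	}
}

// collect can only receive from c; a send such as c <- "x" would not compile here.
func collect(c <-chan string, n int) []string {
	got := make([]string, 0, n)
	for i := 0; i < n; i++ {
		got = append(got, <-c)
	}
	return got
}

// relay creates a bi-directional channel and hands each side the
// restricted view it needs.
func relay(words []string) []string {
	c := make(chan string)

	// c is converted to chan<- string automatically at the call site
	go produce(c, words)

	// c is converted to <-chan string automatically at the call site
	return collect(c, len(words))
}

func main() {
	fmt.Println(relay([]string{"ping", "pong"})) // prints [ping pong]
}
```

Declaring the direction in the signature documents who is allowed to do what with the channel, and lets the compiler catch a goroutine that accidentally sends on a channel it was only meant to read.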

Select statements

Go has a special statement called select which works like a switch but for channels:

package main

import (
  "fmt"
  "time"
)

func main() {
  // two channels
  c1 := make(chan string)
  c2 := make(chan string)

  // will send every 2 seconds
  go func() {
    for {
      c1 <- "from 1"
      time.Sleep(time.Second * 2)
    }
  }()

  // will send every 3 seconds
  go func() {
    for {
      c2 <- "from 2"
      time.Sleep(time.Second * 3)
    }
  }()

  // blocks (there is no default case) until one of the channels has a value to receive
  go func() {
    for {
      select {
      case msg1 := <-c1:
        fmt.Println(msg1)
      case msg2 := <-c2:
        fmt.Println(msg2)
      }
    }
  }()

  // pause the main goroutine
  var input string
  fmt.Scanln(&input)
}

Output:

from 1
from 2
from 1
from 2
from 1
from 2
from 1
from 1
from 2
from 1
from 2
from 1

This program prints “from 1” every 2 seconds and “from 2” every 3 seconds. Select picks the first channel that is ready and receives from it (or sends to it). If more than one of the channels is ready, it picks one of them at random. If none of the channels are ready, the statement blocks until one becomes available.

The select statement is often used to implement a timeout:

select {
case msg1 := <- c1:
  fmt.Println("Message 1", msg1)
case msg2 := <- c2:
  fmt.Println("Message 2", msg2)
case <- time.After(time.Second):
  fmt.Println("timeout")
}

Output:

Message 1 from 1
Message 2 from 2
timeout
Message 1 from 1
Message 2 from 2
timeout
Message 1 from 1
timeout

time.After creates a channel and, after the given duration, sends the current time on it. (We weren’t interested in the time, so we didn’t store it in a variable.) We can also specify a default case:

select {
case msg1 := <- c1:
  fmt.Println("Message 1", msg1)
case msg2 := <- c2:
  fmt.Println("Message 2", msg2)
case <- time.After(time.Second):
  fmt.Println("timeout")
default:
  fmt.Println("nothing ready")
}

Output:

Message 1 from 1
Message 2 from 2
nothing ready
nothing ready
Message 1 from 1
nothing ready
Message 2 from 2
nothing ready
Message 1 from 1
nothing ready
nothing ready
Message 1 from 1
Message 2 from 2
nothing ready
nothing ready
Message 1 from 1
nothing ready
nothing ready
Message 2 from 2
Message 1 from 1
nothing ready

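The default case runs immediately if none of the channels are ready. Because of that, the time.After case in this particular select never gets a chance to fire: the select never waits long enough for the timer to expire.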
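
The timeout and default patterns are often wrapped in small helper functions. The sketch below shows one way to do that. The names recvTimeout and tryRecv are hypothetical helpers written for this example, not part of the standard library.

```go
package main

import (
	"fmt"
	"time"
)

// recvTimeout waits up to d for a value on c.
// The second result is false if the timeout expired first.
func recvTimeout(c <-chan string, d time.Duration) (string, bool) {
	select {
	case msg := <-c:
		return msg, true
	case <-time.After(d):
		return "", false
	}
}

// tryRecv never waits: it returns a value only if one is ready right now.
func tryRecv(c <-chan string) (string, bool) {
	select {
	case msg := <-c:
		return msg, true
	default:
		return "", false
	}
}

func main() {
	c := make(chan string)

	// nobody is sending yet, so both helpers report that no value arrived
	_, ok := tryRecv(c)
	fmt.Println("tryRecv got a value:", ok)
	_, ok = recvTimeout(c, 50*time.Millisecond)
	fmt.Println("recvTimeout got a value:", ok)

	// start a sender, then wait (up to one second) for its value
	go func() { c <- "from 1" }()
	msg, ok := recvTimeout(c, time.Second)
	fmt.Println(msg, ok)
}
```

Note that each helper uses only one of the two techniques. Mixing a default case with a time.After case in the same select defeats the timeout, since default always wins when nothing is ready.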

Buffered Channels

It’s also possible to pass a second parameter to the make function when creating a channel: c := make(chan int, 1). This creates a buffered channel with a capacity of 1. Normally channels are synchronous; both sides of the channel wait until the other side is ready. A buffered channel is asynchronous: sending does not wait unless the buffer is already full, and receiving does not wait unless the buffer is empty.
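
As a small illustration of the difference, the sketch below sends into a buffered channel with no receiver running, something that would block forever on an unbuffered channel. The helper fillBuffer is a made-up name for this example. The built-in functions len and cap report how many values are currently buffered and the channel's capacity.

```go
package main

import "fmt"

// fillBuffer is a hypothetical helper: it sends every value into a
// buffered channel before anything receives, then drains the channel.
func fillBuffer(values []int) []int {
	c := make(chan int, len(values)) // capacity large enough for every value

	for _, v := range values {
		c <- v // does not block: the buffer still has room
	}

	out := make([]int, 0, len(values))
	for len(c) > 0 { // len reports how many values are waiting in the buffer
		out = append(out, <-c)
	}
	return out
}

func main() {
	c := make(chan int, 1)
	c <- 1 // succeeds immediately even though nobody is receiving yet
	fmt.Println("buffered:", len(c), "capacity:", cap(c)) // buffered: 1 capacity: 1
	fmt.Println(<-c)                                      // 1

	fmt.Println(fillBuffer([]int{1, 2, 3})) // [1 2 3]
}
```

A second send on the capacity-1 channel before the receive would block, because the buffer would already be full.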

Closing notes

There is still a lot more to cover; this is just scratching the surface of concurrency. For more info, check out this talk by Rob Pike, one of the Go creators: https://www.youtube.com/watch?v=cN_DpYBzKso