Benchmarking Go and C# async performance

One big problem with async/await-based coroutines is that they are slow compared with properly implemented fibers (a.k.a. “lightweight threads” or “goroutines”).

This is easy to demonstrate with a simple program:

package main

import (
	"log"
	"time"
)

// Calculate recurses val levels deep and adds 1 on the way back up,
// so Calculate(n) returns n.
func Calculate(val int) int {
	if val == 0 {
		return 0
	}
	res := Calculate(val - 1)
	res += 1
	return res
}

func main() {
	start := time.Now()
	var num int
	// One million calls, each 20 levels deep.
	for i := 0; i < 1000000; i++ {
		num += Calculate(20)
	}
	diff := time.Since(start)
	log.Printf("Val: %d, %d(ms)", num, diff.Milliseconds())
}

Its output:

cyberax@CybArm:~/simp/bench$ go run test.go
2023/03/16 17:14:47 Val: 20000000, 32(ms)

I'm going to use C# to implement the coroutine-based version:

using System;
using System.Diagnostics;
using System.Threading.Tasks;

namespace ChannelsTest
{
    class Program
    {
        // Async counterpart of the Go Calculate: recurses val levels deep,
        // awaiting the result of each nested call.
        static async Task<int> Calculate(int val) {
            if (val == 0) {
                return 0;
            }
            int res = await Calculate(val - 1);
            res += 1;
            return res;
        }

        static void Main(string[] args)
        {
            int num = 0;
            var sw = new Stopwatch();
            sw.Start();
            for (var i = 0; i < 1000000; i++) {
                // Calculate never suspends, so the task is already complete
                // here and .Result returns without blocking.
                num += Calculate(20).Result;
            }
            sw.Stop();
            Console.WriteLine($"Result is: {num}, {sw.Elapsed.TotalMilliseconds:0.000}ms");
        }
    }
}

Its output:

cyberax@CybArm:~/simp/bench$ dotnet run -c Release
Result is: 20000000, 492,313ms

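That is roughly 492 ms for C# against 32 ms for Go (the comma in the C# output is a locale decimal separator), so the async version is about 15 times slower. The main cause is that the C# version cannot keep each level of Calculate on a plain stack: every call goes through a compiler-generated state machine and hands back a Task<int>, which typically means a heap allocation per level. In contrast, Go simply uses a normal stack, which can be grown as needed.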
