A guide to writing real-life thread-safe code


When writing code that will be used by multiple threads or processes simultaneously, it is important to make sure that code is thread-safe so that your application functions properly. In this shot, I will explain thread safety by walking through a real-life piece of thread-unsafe code and showing ways to make it thread-safe.

How do we know if code is thread-safe?

We can tell that code is thread-safe if it only reads and updates shared resources in a way that guarantees correct behavior when multiple threads execute it at the same time. A shared resource can be a counter variable, a list, or any other state visible to more than one thread.

What does code look like when it’s not thread-safe?

In this example, I won’t show the conventional counter++ example shown in textbooks. Instead, I’ll show an example that is more relatable and can literally happen to anyone writing production code.

class Store
{
    private List<string> storeProducts;

    public Store()
    {
        storeProducts = new List<string>();
    }

    public async Task<string> GetOrAddProduct(Product product)
    {
        if (storeProducts.Contains(product.Name))
        {
            return product.Name;
        }
        var token = HelperClass.GenerateTokenOrSomething();
        await HelperClass.UploadProductDetails(token, product);
        storeProducts.Add(product.Name);
        return product.Name;
    }
}

This example is in C#, but regardless of the programming language you’re working with, the concept remains the same.

The code above looks fine when you’re in a single-threaded environment. However, in a multithreaded or distributed environment, where multiple processes call your code simultaneously, this could actually be very dangerous. Let me explain why.

In the case where we have 3 processes calling GetOrAddProduct simultaneously, the scenario described below could happen:

  • Process A & Process C want to get or add Product A to the list.
  • Process B wants to get or add Product B to the list.
  • All three processes are started simultaneously.
  • Process B reaches the Contains check and sees that Product B doesn’t exist. It then generates a token and starts uploading the product. The upload takes a long time, so, while Process B is still uploading…
  • Process A reaches the Contains check and sees that Product A doesn’t exist. It then generates a token and starts uploading the product. The upload takes a long time, so, while Process A is still uploading…
  • Process B is now done, adds Product B to the list, and exits the method.
  • Process C reaches the Contains check and sees that Product A doesn’t exist, because Process A hasn’t added it yet. It then generates a token and starts uploading the product. The upload takes a long time, so, while Process C is still uploading…
  • Process A is now done – it adds Product A to the list and exits the method.
  • Process C is now done – it adds Product A to the list and exits the method.

In this scenario, two things have gone wrong:

  • Product A has been uploaded twice (or the second upload threw an exception, depending on how your upload logic is set up).
  • Product A has been added to the list twice, so the list holds three entries instead of two.
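The bad interleaving above can even be reproduced deterministically in a small sketch, using Task.Delay as a stand-in for the slow upload (RaceDemo, GetOrAddAsync, and the specific delays are illustrative, not part of the original example):

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

class RaceDemo
{
    // Shared resource, mirroring storeProducts in the example above.
    public static readonly List<string> storeProducts = new List<string>();

    // Check, then a slow "upload", then add: the same check-then-act gap.
    static async Task GetOrAddAsync(string name, int uploadMs)
    {
        if (storeProducts.Contains(name))
        {
            return;
        }
        await Task.Delay(uploadMs); // stands in for the slow upload
        storeProducts.Add(name);
    }

    static async Task Main()
    {
        // Processes A and C both want "Product A"; B wants "Product B".
        // All three pass the Contains check before any of them adds.
        await Task.WhenAll(
            GetOrAddAsync("Product A", 100),
            GetOrAddAsync("Product B", 50),
            GetOrAddAsync("Product A", 150));

        Console.WriteLine(storeProducts.Count); // prints 3 instead of 2
    }
}
```

Because every call reaches the Contains check before any upload finishes, "Product A" is added twice, exactly as in the walkthrough above.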

This is called a race condition. In scenarios like this, we might be tempted to replace

storeProducts.Add(product.Name); with:

if (!storeProducts.Contains(product.Name))
{
    storeProducts.Add(product.Name);
}

However, this does not really scale: the extra check only works because this particular code happens to be adding to a list, where a duplicate is easy to detect. Imagine instead that we had something like the code snippet below:

class Store
{
    private double revenue;
    private List<string> storeProducts;

    public Store(RevenueGenerator generator)
    {
        revenue = generator.GenerateCurrentRevenue();
        storeProducts = new List<string>();
    }

    public async Task UpdateStoreRevenue(Product product)
    {
        if (!storeProducts.Contains(product.Name))
        {
            var token = HelperClass.GenerateTokenOrSomething();
            await HelperClass.UpdateStoreRevenue(token, product);
            revenue += product.Price;
            storeProducts.Add(product.Name);
        }
    }
}

In the code snippet above, we update the store revenue before adding the product to our list, and there is no direct way of checking whether a given price has already been added to the overall revenue. This could be a disaster – imagine a customer’s product worth $400,000,000.00 being added twice. Audio money? Now, that’s a problem.

A more scalable solution

The more scalable solution is to write thread-safe code by adding synchronization to the part of your code that isn’t thread-safe. This helps protect access to shared resources. If a process owns a lock, then it can access the protected shared resource. If a process does not own the lock, then it cannot access the shared resource.

In our previous example, since Process B reaches the unsafe section of the code first, it acquires the lock and keeps executing. When Process B is done executing, it releases the lock for other processes. If Process A or C tries to acquire the lock before Process B is done, it has to wait.
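For purely synchronous code, this acquire-and-release pattern can be sketched with C#’s built-in lock statement (SafeStore and TryAddProduct are illustrative names, not from the example above; note that a lock block cannot contain an await):

```csharp
using System.Collections.Generic;

class SafeStore
{
    private readonly object gate = new object();
    private readonly List<string> storeProducts = new List<string>();

    // Only one thread at a time can run the guarded check-then-add.
    public bool TryAddProduct(string name)
    {
        lock (gate) // acquired here, released automatically on exit
        {
            if (storeProducts.Contains(name))
            {
                return false; // some other thread already added it
            }
            storeProducts.Add(name);
            return true;
        }
    }
}
```

No matter how many threads call TryAddProduct with the same name at once, exactly one of them gets true, because the check and the add can no longer interleave.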

There are a bunch of lockable objects, but I will be explaining a mutex:

Mutex

Mutex is short for MUTual EXclusion. A mutex can be owned by only one thread at a time. If we had to use a mutex to fix our code, it would look like this:

class Store
{
    private static Mutex mutex = new Mutex();
    private List<string> storeProducts;

    public Store()
    {
        storeProducts = new List<string>();
    }

    public async Task<string> GetOrAddProduct(Product product)
    {
        mutex.WaitOne(); // controls access to code that isn't thread-safe
        try
        {
            if (storeProducts.Contains(product.Name))
            {
                return product.Name;
            }
            var token = HelperClass.GenerateTokenOrSomething();
            await HelperClass.UploadProductDetails(token, product);
            storeProducts.Add(product.Name);
            return product.Name;
        }
        finally
        {
            mutex.ReleaseMutex();
        }
    }
}

The code is wrapped in a try-finally block because, regardless of what happens in our code, we want the code in the finally block to execute. If we did not wrap it in a try-finally block and HelperClass.UploadProductDetails(token, product); threw an exception, the lock would never be released, which could cause a deadlock, another concurrency problem. In basic terms, a deadlock means that processes waiting for a particular resource are blocked indefinitely.
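One caveat worth flagging with the mutex approach: a Mutex has thread affinity, meaning it must be released by the same thread that acquired it, but an async method may resume on a different thread after an await, so ReleaseMutex can throw. For async methods like this one, SemaphoreSlim with WaitAsync is the usual awaitable alternative. Below is a sketch of the same method using it, with minimal stand-ins for Product and HelperClass (the real implementations are whatever your project defines):

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Minimal stand-ins for the article's Product and HelperClass.
class Product { public string Name = ""; }

static class HelperClass
{
    public static string GenerateTokenOrSomething() => "token";
    public static Task UploadProductDetails(string token, Product p) => Task.Delay(50);
}

class Store
{
    // A semaphore with one slot behaves like a lock, but is awaitable.
    private static readonly SemaphoreSlim gate = new SemaphoreSlim(1, 1);
    private readonly List<string> storeProducts = new List<string>();

    public int ProductCount => storeProducts.Count;

    public async Task<string> GetOrAddProduct(Product product)
    {
        await gate.WaitAsync(); // asynchronous, and not tied to one thread
        try
        {
            if (storeProducts.Contains(product.Name))
            {
                return product.Name;
            }
            var token = HelperClass.GenerateTokenOrSomething();
            await HelperClass.UploadProductDetails(token, product);
            storeProducts.Add(product.Name);
            return product.Name;
        }
        finally
        {
            gate.Release(); // runs even if the upload throws
        }
    }
}

static class Demo
{
    static async Task Main()
    {
        var store = new Store();
        var a = new Product { Name = "Product A" };
        // Three simultaneous calls, two of them for the same product.
        await Task.WhenAll(
            store.GetOrAddProduct(a),
            store.GetOrAddProduct(new Product { Name = "Product B" }),
            store.GetOrAddProduct(a));

        Console.WriteLine(store.ProductCount); // prints 2: no duplicate add
    }
}
```

Because the whole check-upload-add sequence runs under the semaphore, the second call for Product A finds it already in the list and returns without uploading again.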

Conclusion

There are other ways to write thread-safe code in distributed or multithreaded environments. If you’d like to know more, I found a tutorial series that talks extensively about concurrency problems and fixing thread-safety issues.
