Implementing your own synchronisation primitives is tricky

This blog post is about the folly of implementing your own synchronization primitives without thinking about what compilers are allowed to do. If you’re not into low-level C/x64 parallelism programming then you can safely skip it ;)

In the Insieme project, we use one double-ended work stealing queue per hardware thread which can be independently accessed (read and write) at both ends. It’s implemented as a circular buffer with a 64 bit control word.

The original code for adding a new item to this queue looked something like this:

Now, this generally worked fine in practice, but in unit tests around ever 21 millionth insertion failed. After chasing a few wrong leads I figured out that setting  newstate  to volatile  fixed the issue. The problem with this, of course, is that it makes no sense. It’s a local variable stored on the stack of the executing thread – it can not be accessed by any other thread.

In the end, to understand the issue, looking into the generated assembler code for both versions was required. Here’s what gcc does in the nonvolatile version:

And here’s the volatile one:

As you can see from the comments in the first version, we started interpreting the assembly from the top. That was a mistake. If you look at the last few lines, you can see the culprit. The line mov QWORD PTR [rdi+8+rax*8], rsi  corresponds to wb->items[newstate.top_update] = wi; . In the non-volatile version, gcc decides to move that line below the unlocking of the data structure. This is a perfectly valid transformation, since there are no dependencies between the two lines (gcc is obviously unaware of any parallelism going on).

There are many ways to fix the issue: add a memory barrier ( __sync_synchronize in gcc), do the assignment using an atomic exchange operation, or if you want to stay in pure C: (wb->items[newstate.top_update] = wi) && (wb->state.top_val = newstate.top_update); . Which is admittedly ugly, and only works since wi is never NULL . Sadly, all of these options have a slight performance penalty. If anyone knows any other portable way to enforce the ordering of operations in this case, I’d be happy to hear about it.

And that’s it, more or less. Lessons learned: take care when implementing your own synchronizations. If you think you are taking care, take more care. And when comparing assembly, look at the obvious differences before starting to interpret the code top down.

6 thoughts on “Implementing your own synchronisation primitives is tricky

  1. Hi, durante sorry for post this here, but there’s no way to contact with you, it is possible to make a mod for “The Walking Dead : Survival Instinct” like the Dark Souls mod? i mean, to edit textures and use our own textures, if we can retexture that game, it can be awesome, cause the main problem of that game is the texture work, but if we can make HD textures… It can be awesome, hope you see this and can answer me :(

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code class="" title="" data-url=""> <del datetime=""> <em> <i> <q cite=""> <strike> <strong> <pre class="" title="" data-url=""> <span class="" title="" data-url="">