New GeDoSaTo Version, Status Update

It’s somewhat traditional (well, since last year) to have a Final Fantasy-related Christmas update.

This time around, I fixed a variety of bugs and missing features which made GeDoSaTo incompatible with Lightning Returns. After doing that, I also quickly discovered a PSHash for hudless screenshots which seems to work pretty well so far. Due to the way the game handles downsampling resolutions, you need to take some care when trying to get hudless downsampled screenshots; look in the .ini for details.

[Image: FF13 LR screenshot]

Other than the strange insistence on having a fixed set of supported resolutions rather than querying the OS, Lightning Returns seems to be a pretty good port so far. At the very least, it doesn’t suffer from the random performance issues of the other two games in the trilogy. The gameplay seems fun as well so far.

DX11 / 64 bit status

People still frequently ask about these, so I thought I’d give a status update. For DX11, the vast majority of the relevant APIs are intercepted at this point. What’s missing is porting/reimplementing the support infrastructure required for downsampling and general operation (e.g. the text console, effect loading and application, rendertarget handling etc.) to the API. This is still a non-trivial amount of work.

64 bit support, on the other hand, would mostly just amount to compiling multiple versions of the .dlls, and setting the correct paths in the tool/loader. The only reason I haven’t done it yet is that it seems quite useless without DX11 support, given that 64 bit games are generally also DX11.


The latest version should also fix some other compatibility issues reported over the past few weeks (some, but by no means all of them!), and even a few older ones, including installer issues – see the GitHub tracker for details. I just hope it doesn’t introduce new ones, but if it does then do report them on GitHub.

You can get the latest version by updating from within GeDoSaTo or with the installer provided here. And, as always, you can donate to support continuing GeDoSaTo development here.

Happy Holidays everyone!

Implementing your own synchronisation primitives is tricky

This blog post is about the folly of implementing your own synchronization primitives without thinking about what compilers are allowed to do. If you’re not into low-level C/x64 parallel programming then you can safely skip it ;)

In the Insieme project, we use one double-ended work stealing queue per hardware thread which can be independently accessed (read and write) at both ends. It’s implemented as a circular buffer with a 64 bit control word.

The original code for adding a new item to this queue looked something like this:

Now, this generally worked fine in practice, but in unit tests around every 21 millionth insertion failed. After chasing a few wrong leads I figured out that declaring newstate as volatile fixed the issue. The problem with this, of course, is that it makes no sense. It’s a local variable stored on the stack of the executing thread – it cannot be accessed by any other thread.

In the end, understanding the issue required looking at the assembly code generated for both versions. Here’s what gcc does in the non-volatile version:

And here’s the volatile one:

As you can see from the comments in the first version, we started interpreting the assembly from the top. That was a mistake. If you look at the last few lines, you can see the culprit. The line mov QWORD PTR [rdi+8+rax*8], rsi  corresponds to wb->items[newstate.top_update] = wi; . In the non-volatile version, gcc decides to move that line below the unlocking of the data structure. This is a perfectly valid transformation, since there are no dependencies between the two lines (gcc is obviously unaware of any parallelism going on).

There are many ways to fix the issue: add a memory barrier (__sync_synchronize in gcc), do the assignment using an atomic exchange operation, or, if you want to stay in pure C, write (wb->items[newstate.top_update] = wi) && (wb->state.top_val = newstate.top_update); instead. That last option is admittedly ugly, and only works because wi is never NULL. Sadly, all of these options have a slight performance penalty. If anyone knows any other portable way to enforce the ordering of operations in this case, I’d be happy to hear about it.
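For illustration, here is a minimal sketch of one further option, assuming a C11 compiler with <stdatomic.h>: publish the unlock with a release store, which the compiler may not move the preceding item write past. This is not the Insieme code. The work_buffer type, BUFFER_SIZE, push_top and the packing of top_val and top_update into one 64-bit word are all hypothetical, and only the top end of the queue is modelled (no bottom index, no fullness check).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BUFFER_SIZE 16u /* hypothetical capacity */

/* Hypothetical, simplified queue: only the top end is modelled.
 * state packs top_val (low 32 bits) and top_update (high 32 bits).
 * top_val != top_update means some thread is in the middle of a push. */
typedef struct {
    _Atomic uint64_t state;
    void *items[BUFFER_SIZE];
} work_buffer;

static uint64_t pack(uint32_t top_val, uint32_t top_update) {
    return ((uint64_t)top_update << 32) | top_val;
}

/* Try to push wi at the top; returns false if another push is in flight. */
static bool push_top(work_buffer *wb, void *wi) {
    assert(wi != NULL);
    uint64_t old = atomic_load_explicit(&wb->state, memory_order_relaxed);
    uint32_t top_val = (uint32_t)old;
    uint32_t top_update = (uint32_t)(old >> 32);
    if (top_val != top_update) {
        return false; /* the top end is currently "locked" */
    }
    uint32_t new_top = (top_val + 1u) % BUFFER_SIZE;
    /* "Lock" the top end by announcing the pending update via CAS. */
    if (!atomic_compare_exchange_strong_explicit(
            &wb->state, &old, pack(top_val, new_top),
            memory_order_acquire, memory_order_relaxed)) {
        return false;
    }
    wb->items[new_top] = wi; /* plain store of the payload */
    /* "Unlock" with a release store: the item write above cannot be
     * reordered after this store by the compiler or the hardware. */
    atomic_store_explicit(&wb->state, pack(new_top, new_top),
                          memory_order_release);
    return true;
}
```

A consumer would need a matching acquire load of state before reading items. On x64, a release store is typically compiled to an ordinary mov, so the cost should mostly be the lost reordering freedom rather than an extra fence instruction.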

And that’s it, more or less. Lessons learned: take care when implementing your own synchronization primitives. If you think you are taking care, take more care. And when comparing assembly, look at the obvious differences before starting to interpret the code top down.