Hexer – A hex editor

Way back in July I was doing some simple binary data file reverse engineering, and got annoyed at the fact that I couldn’t find a free hex editor for Windows which offered all of the following:

  • Seeing all interpretations of data (e.g. various integer types, floats and text) while hovering over some binary data.
  • Searching data of a given type (e.g. uint32) quickly and conveniently in different numerical formats (e.g. decimal or hex).
  • Marking identified pieces of data with their type and some notes.

I thought “it can’t be that hard now, can it?” and set out to create my own. After some deliberation I chose C#/Winforms to implement it, simply because the tool support is second to none and I didn’t want to waste more time on UI stuff than strictly necessary.

After spending a few hours on it way back then, and finally a few more today, it has turned into quite a usable (but far from complete or user-friendly) program. I called it Hexer, which is both appropriate in English and also means “Warlock” (or even “Witcher”, literally) in German.

It has all the features which I was missing:

  • The pane on the left shows various interpretations of selected and hovered-over data.
  • You can easily enter numerical values written in any C-style literal format (e.g. “161” searches for the decimal number 161, “0xFF” searches for decimal 255 and “010” searches for decimal 8).
  • As seen in the screenshot, you can mark ranges of data with some data type, and see the in-line interpretation of that data in the file. These markers can be stored and loaded independently of the files they apply to.
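The C-style number parsing mentioned in the list above is easy to sketch. Hexer itself is written in C#, so the following C++ snippet is only an illustration of the idea, not Hexer’s code; the function name parse_c_style is made up for this example. It leans on the standard library’s base auto-detection (base 0).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Parse a search term the way C reads an integer literal:
// a "0x" prefix means hex, a leading "0" means octal, anything else is decimal.
// Returns false if the text is not a complete, valid number.
bool parse_c_style(const std::string& text, std::uint64_t& out) {
    try {
        std::size_t consumed = 0;
        // Base 0 asks the standard library to detect the base from the prefix.
        unsigned long long value = std::stoull(text, &consumed, 0);
        if (consumed != text.size()) return false;  // trailing garbage
        out = value;
        return true;
    } catch (const std::invalid_argument&) {
        return false;  // no digits at all
    } catch (const std::out_of_range&) {
        return false;  // does not fit into 64 bits
    }
}
```

With this, “161”, “0xFF” and “010” come out as 161, 255 and 8 respectively, while something like “12zz” is rejected.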

I put the source up on GitHub here, and here’s an executable if you want to give it a try. (You’ll need the right version of the .NET runtime, of course.)

Of the entire implementation I like this part best, which is a simple descriptive listing of all the data types, including their properties (such as name and size) as well as the ability to convert values of that type to and from raw binary and strings. It’s succinct, easy to extend (both with new data types and new meta-information about them), and many of the UI elements are generated directly from that list.
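To give a rough idea of what such a descriptive type listing can look like, here is a small C++ sketch. It is not Hexer’s C# code: TypeInfo, make_type, all_types and interpret_all are hypothetical names, only the “raw bytes to string” direction is shown, and values are read in host byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <string>
#include <vector>

// One entry per data type: its name, its size in bytes,
// and how to render raw bytes of that type as text.
struct TypeInfo {
    std::string name;
    std::size_t size;
    std::function<std::string(const unsigned char*)> to_string;
};

// Build a descriptor for any plain numeric type T.
template <typename T>
TypeInfo make_type(const std::string& name) {
    return TypeInfo{name, sizeof(T), [](const unsigned char* raw) -> std::string {
        T value;
        std::memcpy(&value, raw, sizeof(T));  // host byte order
        return std::to_string(value);
    }};
}

// The single list everything else iterates over.
std::vector<TypeInfo> all_types() {
    return {
        make_type<std::uint8_t>("uint8"),
        make_type<std::uint16_t>("uint16"),
        make_type<std::uint32_t>("uint32"),
        make_type<std::int32_t>("int32"),
        make_type<float>("float"),
    };
}

// Interpret the same bytes as every type that fits into the available data.
std::vector<std::string> interpret_all(const unsigned char* raw, std::size_t available) {
    assert(raw != nullptr);
    std::vector<std::string> lines;
    for (const TypeInfo& t : all_types()) {
        if (t.size <= available) lines.push_back(t.name + ": " + t.to_string(raw));
    }
    return lines;
}
```

Adding a new data type is then a single extra line in all_types(), and something like the interpretation pane can be filled by looping over the list.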

Hexer is far from fully-featured – I put up a short list of TODOs in the GitHub readme, but there’s a lot more which could (and should) be done.

Dark Dreams Don’t Die (D4) Alternative Launcher

D4 was recently released on PC (on Steam and elsewhere). The game uses UE3, so it works well on PC, and the mouse controls are also well done. The only problem is that the built-in launcher presents a fixed list of resolutions rather than all available ones, and also somehow messes up DPI scaling.

I wrote a new Launcher in C# which doesn’t have this issue. It’s available here.

I also made the source code available on GitHub. Not because I’m very proud of it or because it’s particularly interesting (it’s just a simple Winforms C# app written in an hour or so), but because I hope that putting it in the public domain could help improve this state of affairs in the future.

Well, that’s all.

Graveyard of Forgotten Projects (Part 2)

This is a direct continuation of the first part. Let this partial quote from there serve as an introduction:

This series (hopefully) of blog posts will gather and shortly describe some previously unreleased and/or forgotten projects I’ve worked on over the past decade and a half, starting from 2000. This is by no means a complete listing, I’m still skipping over the vast majority of unfinished ideas and half-baked projects, but I’m trying to include everything which worked to an extent or has some interesting features.

The primary purpose of this is to serve as an archive for myself, because, as you will notice when I go through the list, I’ve already lost a lot of somewhat interesting stuff and would rather not have that happen again. Perhaps one thing or another might also be useful to someone else, but I obviously won’t troubleshoot code I wrote more than a decade ago and haven’t touched since then! And, just to make that clear, it’s also obviously not indicative of my current skills.

I stopped the previous post around 2003-2004, a timeframe of abortive attempts at creating a 3D multiplayer physics-driven space ball game.


I wasn’t sure where to slot this in, as I started this project in ~2002, but did a lot of work on it in 2005. I chose to go with something closer to the latter than the former. This is an Arkanoid clone written in Ruby. It used the RUDL SDL wrapper (another amazing example of my naming sense really), and initially did software rendering. Later on I switched to using OpenGL for the rendering for performance reasons, which is why the final 2005 iteration was called RudloidGL.

What is interesting about this project is that it’s actually written rather well, in some ways at least. For example, I implemented features like collision detection, sound effects and sprite animation as Ruby mixins, and they are re-used in every object where it makes sense. So the balls, paddle, blocks etc. use the same basic collision code with distinct response events, and animated blocks, paddles and balls use the exact same sprite animation code just with slightly different parameters.

Of course, there are also some things where I was maybe going a bit too far with trying to be smart. For example, I used the “evil” library (yes, that’s actually its name) to implement different ball/paddle/etc. types and to change between them (when picking up a powerup pill) by actually changing the class type. It works, and in a way it’s elegant, but it’s also a bit wild.

Talking about powerups, that’s actually another thing where the code for this game is already a lot better than most of what I discussed in the first post in this series. I seem to have attained a decent understanding of the “once and only once” principle at this point, and of the value of setting up tools in the code base so that it can be used effectively. For example, this is the actual, entire definition code for all 12 powerup pills in the game:

While eval’ing some random strings would make my hair stand on end now, at least the heart for this code is in the right place, and I still appreciate how concise it is.

The game also features a level editor, which nicely reuses most of the actual game rendering and related code. It also allows for undo/redo and copy/paste operations.

On the whole, this isn’t too bad really. What was bad, no, terrible are the sound effects. I never spent much time on those and it really, really shows. Check it out for yourself in this Shadowplay video I took of the game (after spending literally 2 hours finding all the dependencies in compatible versions and getting them to work together).

I will not upload the .zip for this, since I don’t think I am actually allowed to distribute some of the music and sound effects I used. If anyone cares a lot, post a comment and I’ll try to create a working package without those.


The imaginatively-named “Wasser” (German for Water) is a very small program I hacked up in early 2005 in order to generate daily data fitting a decent-looking curve from monthly data about the electricity production from hydroelectric power plants. Until I made this, that was apparently a process carried out manually in Excel (I shit you not).

There are two interesting things about this one:

  • It uses both Tk for the UI window and SDL (via RUDL) for the visualization window. That’s insane, but I guess I just used what I knew, and it works.
  • I came up with the method of how to distribute data across the days of each month on my own, without really knowing how one would actually solve such a thing correctly. Nonetheless, what I do in the end isn’t too different from what an explicit iterative solver would look like.
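For the curious, here is one way such an iterative distribution can be sketched in C++. To be clear, this is not the method Wasser uses (which isn’t shown here) but a simple stand-in with the same flavour: start with a flat value per month, smooth across the month boundaries, then shift each month so that its days add up to the monthly total again, and repeat. The function name monthly_to_daily and the three-point moving average are choices made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Spread monthly totals over days so that the curve gets smoother
// while every month still adds up to its given total.
std::vector<double> monthly_to_daily(const std::vector<double>& totals,
                                     const std::vector<int>& days,
                                     int iterations) {
    assert(totals.size() == days.size());

    // Start from a step function: every day gets its month's average.
    std::vector<double> daily;
    for (std::size_t m = 0; m < totals.size(); ++m)
        daily.insert(daily.end(), days[m], totals[m] / days[m]);

    for (int it = 0; it < iterations; ++it) {
        // Smoothing pass: three-point moving average, edges clamped.
        std::vector<double> smooth(daily.size());
        for (std::size_t i = 0; i < daily.size(); ++i) {
            double prev = daily[i == 0 ? 0 : i - 1];
            double next = daily[i + 1 == daily.size() ? i : i + 1];
            smooth[i] = (prev + daily[i] + next) / 3.0;
        }
        // Correction pass: shift each month so its sum matches the total again.
        std::size_t start = 0;
        for (std::size_t m = 0; m < totals.size(); ++m) {
            double sum = 0.0;
            for (int d = 0; d < days[m]; ++d) sum += smooth[start + d];
            double shift = (totals[m] - sum) / days[m];
            for (int d = 0; d < days[m]; ++d) smooth[start + d] += shift;
            start += days[m];
        }
        daily.swap(smooth);
    }
    return daily;
}
```

Because every iteration ends with the correction pass, the monthly sums are preserved (up to floating point rounding) no matter how many iterations are run; the iteration count only controls how far the steps between months get smoothed out.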

You can get it here, though I have no idea why you’d want to.


Released (actually released!) Christmas 2005, Ragex is a tiny generator for XHTML. I don’t need to write much about it here, because the web page is still online, I’m just including it for completeness’ sake. The best part about this one was when I got an email from someone actually using it, in 2008 or so, when I had almost forgotten creating it.


After my 3D experiments in 2004, and other game projects that never quite reached completion, I really wanted to actually finish a game. That’s why I came up with a very simple action/puzzle principle, and Crystalise was born (also check out the help file for a visual explanation of its gameplay).

Here’s a bullet point summary:

  • It’s made for PSP, the hottest gaming device of 2005 ;)
  • The vast majority of the game is written in Lua, with some of the heavy lifting (rendering, collision detection) in C. Pretty modern in that regard!
  • As far as I can tell, I completely lost the entire C source code for this project, as well as all the original (non-distribution) assets. A valuable lesson about backups, and even more so about the value of releasing the source. The Lua code at least is included in the distribution.
  • It really is a complete game, including stuff like a main menu, difficulty levels and high score tracking.
  • The levels (including boss and special levels) are procedurally generated, probably one reason why I actually managed to finish this game.
  • If you hit a certain level (which probably no one except me ever did) the game increases the PSP clock speed to 300 MHz in order to keep up a solid 60 FPS:
  • I believe I invented the now commonly used paradigm for writing text with an analog gamepad in this game. That is, selecting a direction with the analog stick and then selecting a character from that direction using a button:

This is one of those projects where I’m really sad I lost (part of) the source. At least in this case most of the Lua is still available. And some of it is pretty neat, check e.g. circle.lua which (despite the name) keeps track of and handles updates to the current state of the level.


This is a C#-based tool for launching programs (and other stuff) that I’m still quite proud of. I made and released it in early 2006, and the web page is still online. I just downloaded and ran the 54 kB package, and it still works well on my current Windows 7 installation. The major reason I created this was that I was fed up with the Windows (pre-Vista) way of launching programs. Of course, Microsoft seemingly agreed, and the keyboard / string search driven way of using the start menu from Vista onwards pretty much made LuXr obsolete.

Still, there are a few things to like about this one:

  • The aesthetics for the startup effect were influenced by the OS in Serial Experiments Lain. Which was much more awesome of course.
  • It features a plugin system which dynamically loads plugins written in C#, which can have their own config dialogue pages. Plugins are addressed by adding a prefix to what you type, e.g. “g something” to google “something” with the google plugin, or “e” to use the eval plugin:
  • I still prefer how this resolves substring matches compared to the Windows Start menu search. For example, if I write “vis 13” in the start menu on my current PC, it finds no matches. LuXr finds “Visual Studio 2013”.
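Since the LuXr source is lost, I can only sketch the kind of matching described in the last point, not quote it. The following C++ snippet is a guess at one scheme that produces the “vis 13” result: split the query at whitespace and require every piece to occur somewhere in the program name, ignoring case. The names to_lower and matches are made up for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <sstream>
#include <string>

// ASCII lower-casing, good enough for a sketch.
std::string to_lower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// True if every whitespace-separated token of the query
// occurs somewhere in the name (case-insensitive).
bool matches(const std::string& query, const std::string& name) {
    const std::string haystack = to_lower(name);
    std::istringstream tokens(to_lower(query));
    std::string token;
    while (tokens >> token) {
        if (haystack.find(token) == std::string::npos) return false;
    }
    return true;
}
```

Here matches("vis 13", "Visual Studio 2013") is true, because “vis” is found in “visual” and “13” in “2013”, whereas a search that treats the whole query as one string finds nothing.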

What’s not to like is that I believe I lost the source for this one as well (except for the plugins, which are part of the distribution package). It seems that with the years progressing I got better about actually releasing stuff from time to time, but worse about keeping backups.


That’s it for today, next time around we’ll get to actual 3D graphics programming. WHOOOHOO!

Wrapper_gen, a wrapper generator for COM interfaces

DSfix was based on a Direct3D9 wrapper, which was mostly taken from an existing code base and extended manually.

Recently, I’ve needed to hook Direct3D9Ex, and came to the conclusion that the manual busy work of writing the initial wrapper is better left to a computer than a human. Therefore, I wrote a Ruby script which takes a Microsoft COM dll header interface specification, and generates the C++ code for a wrapper class for it.

Here’s the script (wrapper_gen.rb), it’s rather tiny:

To use it, you specify the interface name, input header file, output file base name, and optionally whether you want logging information to be generated for each wrapped method.

For example, ruby wrapper_gen.rb IDirect3DTexture9 d3d9.h d3d9tex true would generate a wrapper for the IDirect3DTexture9 interface, get the information from d3d9.h, and store the generated wrapper in d3d9tex.h and d3d9tex.cpp. The implementations in the latter would include logging.

Here are the generated files for this test case.



You can adjust the code generated for the logging in the Ruby script. As you can see, this can save you a lot of rote work, particularly if you want to intercept multiple large interfaces.
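To illustrate the general shape of such a wrapper without pulling in the DirectX headers, here is a hand-written, self-contained C++ sketch. It is not output of wrapper_gen.rb: ITexture is a hypothetical two-method stand-in for a COM interface, and TextureWrapper and FakeTexture are made up for this example. The pattern is the point: implement the same interface, log each call, forward it to the wrapped object.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for a COM interface such as IDirect3DTexture9.
struct ITexture {
    virtual ~ITexture() {}
    virtual unsigned GetLevelCount() = 0;
    virtual unsigned SetLOD(unsigned lod) = 0;
};

// The wrapper implements the same interface; every method logs and forwards.
class TextureWrapper : public ITexture {
public:
    TextureWrapper(ITexture* wrapped, std::ostream& log)
        : wrapped_(wrapped), log_(log) {
        assert(wrapped != nullptr);
    }
    unsigned GetLevelCount() override {
        log_ << "ITexture::GetLevelCount\n";
        return wrapped_->GetLevelCount();
    }
    unsigned SetLOD(unsigned lod) override {
        log_ << "ITexture::SetLOD(" << lod << ")\n";
        return wrapped_->SetLOD(lod);
    }
private:
    ITexture* wrapped_;
    std::ostream& log_;
};

// Trivial "real" object so the wrapper has something to forward to.
struct FakeTexture : ITexture {
    unsigned lod = 0;
    unsigned GetLevelCount() override { return 4; }
    unsigned SetLOD(unsigned new_lod) override {
        unsigned old = lod;
        lod = new_lod;
        return old;
    }
};
```

Multiply the two methods here by the dozens found in a real Direct3D interface and it becomes obvious why generating this by script is attractive.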


The original script didn’t deal with unnamed function parameters correctly. Now it should.


C++11 chrono timers

I’m a pretty big proponent of C++ as a language, and particularly enthused about C++11 and how that makes it even better. However, sadly reality still lags a bit behind specification in many areas.

One thing that was always troublesome in C++, particularly in high performance or realtime programming, was that there was no standard, platform independent way of getting a high performance timer. If you wanted cross-platform compatibility and a small timing period, you had to go with some external library, use OpenMP or roll your own on each supported platform.

In C++11, the chrono namespace was introduced. It, at least in theory, provides everything you always wanted in terms of timing, right there in the standard library. Three different types of clocks are offered for different use cases: system_clock, steady_clock and high_resolution_clock.

Yesterday I wrote a small program to query and test these clocks in practice on different platforms. Here are the results:

So, sadly everything is not as great as it could be, yet. For each platform, the first three blocks are the values reported for the clock, and the last block contains values determined by repeated measurements:

  • “period” is the tick period reported by each clock, in nanoseconds.
  • “unit” is the unit used by clock values, also in nanoseconds.
  • “steady” indicates whether the time between ticks is always constant for the given clock.
  • “time/iter, no clock” is the time per loop iteration for the measurement loop without the actual measurement. It’s just a reference value to better judge the overhead of the clock measurements.
  • “time/iter, clock” is the average time per iteration, with clock measurement.
  • “min time delta” is the minimum difference between two consecutive, non-identical time measurements.
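The quantities in this list can be obtained roughly as follows. This is a minimal sketch rather than the full test program: the helper names reported_period_ns, min_time_delta_ns and report are made up here, and the per-iteration overhead measurements are left out.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>

// Tick period a clock claims to have, converted to nanoseconds.
template <typename Clock>
double reported_period_ns() {
    typedef typename Clock::period Period;
    return 1e9 * static_cast<double>(Period::num) / static_cast<double>(Period::den);
}

// Smallest positive difference seen between two consecutive,
// non-identical readings of the clock, in nanoseconds.
template <typename Clock>
long long min_time_delta_ns(int samples) {
    using namespace std::chrono;
    assert(samples > 0);
    long long best = -1;
    typename Clock::time_point last = Clock::now();
    for (int i = 0; i < samples; ) {
        typename Clock::time_point now = Clock::now();
        if (now == last) continue;  // the clock has not ticked yet
        long long delta = duration_cast<nanoseconds>(now - last).count();
        // A non-steady clock may jump backwards; ignore such samples.
        if (delta > 0 && (best < 0 || delta < best)) best = delta;
        last = now;
        ++i;
    }
    return best;
}

// Print the reported and the measured values for one clock.
template <typename Clock>
void report(const char* name) {
    std::printf("%s: period %.1f ns, steady %s, min time delta %lld ns\n",
                name, reported_period_ns<Clock>(),
                Clock::is_steady ? "yes" : "no",
                min_time_delta_ns<Clock>(1000));
}
```

Calling report<std::chrono::steady_clock>("steady_clock") and the same for the other two clocks gives one line per clock. Comparing the reported period against the measured minimum delta is what exposes implementations that advertise a finer granularity than they deliver.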

On Linux with GCC 4.8.1, all clocks report a tick period of 1 nanosecond. There isn’t really a reason to doubt that, and it’s obviously a great granularity. However, the drawback is that it takes around 120 nanoseconds on average to get a clock measurement. This would be understandable for the system clock, but seems excessive in the other cases, and could cause significant perturbation when trying to measure/instrument small code areas.

On Windows with VS12, a clock period of 100 nanoseconds is reported, but the actual measured tick period is a whopping 1000000 ns (1 millisecond). That is obviously unusable for many of the use cases that would call for a “high resolution clock”. Windows is perfectly capable of supplying a true high resolution clock measurement, so this performance (or lack of it) is quite surprising. On the bright side, a measurement takes just 9 nanoseconds on average.

Clearly, both implementations tested here still have a way to go. If you want to test your own platform(s), here is the very simple program:


Implementing your own synchronisation primitives is tricky

This blog post is about the folly of implementing your own synchronization primitives without thinking about what compilers are allowed to do. If you’re not into low-level C/x64 parallelism programming then you can safely skip it ;)

In the Insieme project, we use one double-ended work stealing queue per hardware thread which can be independently accessed (read and write) at both ends. It’s implemented as a circular buffer with a 64 bit control word.

The original code for adding a new item to this queue looked something like this:

Now, this generally worked fine in practice, but in unit tests around every 21 millionth insertion failed. After chasing a few wrong leads I figured out that declaring newstate as volatile fixed the issue. The problem with this, of course, is that it makes no sense. It’s a local variable stored on the stack of the executing thread – it cannot be accessed by any other thread.

In the end, to understand the issue, looking into the generated assembler code for both versions was required. Here’s what gcc does in the nonvolatile version:

And here’s the volatile one:

As you can see from the comments in the first version, we started interpreting the assembly from the top. That was a mistake. If you look at the last few lines, you can see the culprit. The line mov QWORD PTR [rdi+8+rax*8], rsi corresponds to wb->items[newstate.top_update] = wi;. In the non-volatile version, gcc decides to move that line below the unlocking of the data structure. This is a perfectly valid transformation, since there are no dependencies between the two lines (gcc is obviously unaware of any parallelism going on).

There are many ways to fix the issue: add a memory barrier (__sync_synchronize in gcc), do the assignment using an atomic exchange operation, or, if you want to stay in pure C, write (wb->items[newstate.top_update] = wi) && (wb->state.top_val = newstate.top_update);, which is admittedly ugly and only works since wi is never NULL. Sadly, all of these options have a slight performance penalty. If anyone knows any other portable way to enforce the ordering of operations in this case, I’d be happy to hear about it.
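For code that can use C++11, the standard atomics let you state the required ordering directly. The sketch below is not the Insieme queue: WorkItem, WorkBuffer, push_top and peek_last are hypothetical, and it is reduced to a single producer with no locking in the control word. It only shows the “write the item first, then publish the new index” pattern with release/acquire ordering.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

struct WorkItem { int id; };

struct WorkBuffer {
    static const std::size_t kSize = 8;
    WorkItem* items[kSize];
    // Index of the next free slot. Storing a new value publishes the item below it.
    std::atomic<std::uint64_t> top;
    WorkBuffer() : items(), top(0) {}
};

// Producer side (single producer assumed): fill the slot, then publish it.
void push_top(WorkBuffer& wb, WorkItem* wi) {
    assert(wi != nullptr);
    std::uint64_t slot = wb.top.load(std::memory_order_relaxed);
    wb.items[slot % WorkBuffer::kSize] = wi;
    // Release: the item write above may not be moved below this store,
    // neither by the compiler nor by the hardware.
    wb.top.store(slot + 1, std::memory_order_release);
}

// Consumer side: an acquire load pairs with the release store, so a reader
// that sees the new index also sees the item written before it.
WorkItem* peek_last(const WorkBuffer& wb) {
    std::uint64_t top = wb.top.load(std::memory_order_acquire);
    if (top == 0) return nullptr;
    return wb.items[(top - 1) % WorkBuffer::kSize];
}
```

On x86-64 a release store is typically compiled to a plain mov, with the ordering enforced only against compiler reordering, so this tends to be cheaper than a full memory barrier. The same orderings are available in C11 via stdatomic.h.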

And that’s it, more or less. Lessons learned: take care when implementing your own synchronization primitives. If you think you are taking care, take more care. And when comparing assembly, look at the obvious differences before starting to interpret the code top down.