Search in sorted lists

It’s widely accepted/taught that binary search is the fastest way to search sorted lists. It has a worst-case complexity of log2(N), only touching 24 elements for N = 10000000.

However, I recently had the idea that on modern architectures the search time should be dominated by those 24 memory accesses, since they are “random” and will thus not be cached. Based on this, it could make sense to implement an n-ary search, that is one that reads n-1 elements at the same time and finds the interval (within n options) for the next recursive/iterative step. This way, 2-ary search would equal standard binary search, while 3-ary search reads 2 elements per step and decides between the 3 thirds of the search space.

My theory was that this could allow the chip / memory architecture to do the fetches in parallel and mitigate the latency. In the worst case, binary search would use 24 steps, with 24 total, consecutive memory accesses for 10 million elements, while “trinary” search would perform 15 steps, accessing 30 elements but always 2 in parallel. “Quadrary” would require 12 steps, access 36 elements with 3 in parallel, and so on.

I implemented all of those (both recursively and iteratively) and a small testbench which generates N random numbers, searches M other random numbers and repeats the whole measurement R times. Before each search, the cache is cleared by a streaming operation.

Here are the results on 4 architectures with N=10000000, M=1000 and R=100:

All benchmarks were compiled with GCC 4.6, using -O3 and -march=native. Sadly, the impact wasn’t as large as I would have hoped, but on the nehalem and power7 systems “trinary” search is repeatably 5+% faster than binary search.

Leave a comment if you’re interested in the benchmark code, it’s a bit of a mess right now.

Easy font rendering in OpenGL

It’s surprising to what degree you are expected to bloat your code base, executable size and/or dependencies when you just want some decent true type font rendering in a cross-platform OpenGL application. Thankfully, Sean Barrett wrote the wonderful stb_truetype.h (and released it to the public domain no less!), which solves the problem in a single (under 2000 line) header file you can just drop into your project.

If you don’t want Visual Studio 11′s debugger to throw you out due to uninitialized values, replace line 1038 in that file with

It should have previously been

By the way, his site also hosts a great and similarly easy to use image reading/writing library.

Windows console stuff

Here’s how to create a windows .exe without a console window, and without having to deal with the WinMain entry point:

  1. Just keep the “Console” subsystem selected in the project properties
  2. Add the following to the file containing your “int main(…)” entry point:

Also helpful: redirecting stdout/stderr using freopen: