Dark Souls internal rendering resolution fix

I’ll add more internal details here soon.

Check out a general overview at GAF.

Also, this is the donation link in case you want to donate.

Download Link

Important: turn off in-game AA (blur filter)

Known Issues

  • Message text is not rendered correctly. I will work on this tomorrow.
  • Does not work with the in-game AA (but that’s just a blur filter anyway)
  • Bad framerate reported on AMD cards, NV seems fine

Nvidia RGB Full/limited range toggler

I recently had a problem where I simply couldn’t get my NV GPU to supply full range RGB (0-255) over HDMI when the resolution was either 720p or 1080p at 59.94 Hz.
Apparently this has been a known problem for years, and the only reliable solution was to edit your driver .ini files before installation.

After some digging through obscure NV support posts, I managed to find the registry keys that control this behaviour, and I implemented a small tool to switch all graphics modes between full range and limited range. It’s not as convenient as a driver-level toggle, but it fixed my problem, and since NV hasn’t acted on this for years, I don’t expect them to do so any time soon.

Here’s the binary: NV_RGBFullRangeToggle

And here’s the source for anyone interested: NV_RGBFullRangeToggle_Source

It’s a very simple program; basically, it does this:

DLL injection and error 0xc000007b

I’m currently writing a DirectX wrapper DLL (before you ask, its purpose is to improve graphics, not to cheat) and had the issue that most programs I tried it with immediately failed with error 0xc000007b before even executing the DLL entry point.

Many posts on the internet will point to a 64/32 bit version mismatch, which is probably the case for most end users. But in my case, the reason was much simpler: I tried to load a .dll built in debug mode. As most executables are likely shipped in release mode, that leads to the incompatibility. Just something to keep in mind.

Search in sorted lists

It’s widely accepted/taught that binary search is the fastest way to search sorted lists. It has a worst-case complexity of O(log2(N)), touching only 24 elements for N = 10000000.

However, I recently had the idea that on modern architectures the search time should be dominated by those 24 memory accesses, since they are “random” and will thus not be cached. Based on this, it could make sense to implement an n-ary search, that is, one that reads n-1 elements at the same time and finds the interval (among n options) for the next recursive/iterative step. This way, 2-ary search equals standard binary search, while 3-ary search reads 2 elements per step and decides between the 3 thirds of the search space.

My theory was that this could allow the chip / memory architecture to do the fetches in parallel and mitigate the latency. In the worst case, binary search would use 24 steps, with 24 total, consecutive memory accesses for 10 million elements, while “trinary” search would perform 15 steps, accessing 30 elements but always 2 in parallel. “Quadrary” would require 12 steps, access 36 elements with 3 in parallel, and so on.
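
The idea can be sketched roughly as follows. This is a minimal illustration and not the benchmarked code: the name `kary_lower_bound`, the `std::vector<int>` input and the lower-bound return convention are all assumptions made for the example. The K-1 pivot loads in each step do not depend on each other, which is what would give the memory system a chance to overlap them. Whether it actually does depends on the compiler and the CPU.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the index of the first element >= key (data.size() if there is
// none). Each step loads K-1 pivots before deciding which of the K
// sub-intervals to continue with. K = 2 is ordinary binary search.
template <std::size_t K>
std::size_t kary_lower_bound(const std::vector<int>& data, int key) {
    static_assert(K >= 2, "need at least two sub-intervals per step");
    std::size_t lo = 0;
    std::size_t hi = data.size();  // the answer always lies in [lo, hi]

    while (hi - lo >= K) {
        const std::size_t len = hi - lo;
        std::size_t pos[K - 1];
        int val[K - 1];
        // Issue all pivot loads first; none depends on another.
        for (std::size_t i = 0; i + 1 < K; ++i) {
            pos[i] = lo + len * (i + 1) / K;
            val[i] = data[pos[i]];
        }
        // Then pick the sub-interval that must contain the answer.
        for (std::size_t i = 0; i + 1 < K; ++i) {
            if (val[i] < key) {
                lo = pos[i] + 1;
            } else {
                hi = pos[i];
                break;
            }
        }
    }
    // Fewer than K candidates remain: finish with a short linear scan.
    while (lo < hi && data[lo] < key) {
        ++lo;
    }
    assert(lo <= data.size());
    return lo;
}
```

With this sketch, `kary_lower_bound<2>(v, x)` corresponds to binary search and `kary_lower_bound<3>(v, x)` to the “trinary” variant described above.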

I implemented all of those (both recursively and iteratively) and a small testbench which generates N random numbers, searches for M other random numbers and repeats the whole measurement R times. Before each search, the cache is cleared by a streaming operation.
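
A testbench along these lines might look like the sketch below. It is only an approximation of the setup described above, under several assumptions: the names `flush_cache`, `run_benchmark` and `BenchResult` are made up, `std::binary_search` stands in for the search under test, and the scratch buffer is assumed to be larger than the last-level cache. Timing a single search with `std::chrono::steady_clock` is also quite noisy, so the accumulated time is only a rough indicator.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Streams over a scratch buffer to push the search data out of the caches.
// Assumption: the buffer is larger than the last-level cache.
std::uint64_t flush_cache(std::vector<std::uint64_t>& scratch) {
    std::uint64_t sum = 0;
    for (std::uint64_t& word : scratch) {
        word += 1;
        sum += word;
    }
    return sum;  // returned so the loop is not optimized away
}

struct BenchResult {
    double seconds;    // accumulated search time, flushes excluded
    std::size_t hits;  // number of searched keys that were found
};

// n sorted random values, m random keys, r repetitions of the measurement.
BenchResult run_benchmark(std::size_t n, std::size_t m, std::size_t r,
                          std::size_t scratch_words) {
    assert(n > 0 && m > 0 && r > 0);
    std::mt19937 rng(42);  // fixed seed so runs are repeatable
    std::uniform_int_distribution<int> dist(0, 1 << 30);

    std::vector<int> data(n);
    for (int& x : data) x = dist(rng);
    std::sort(data.begin(), data.end());

    std::vector<int> keys(m);
    for (int& k : keys) k = dist(rng);

    std::vector<std::uint64_t> scratch(scratch_words, 0);
    volatile std::uint64_t sink = 0;
    BenchResult result{0.0, 0};

    using timer = std::chrono::steady_clock;
    for (std::size_t rep = 0; rep < r; ++rep) {
        for (int key : keys) {
            sink = sink + flush_cache(scratch);  // not part of the timing
            const auto start = timer::now();
            const bool found =
                std::binary_search(data.begin(), data.end(), key);
            const auto stop = timer::now();
            result.seconds +=
                std::chrono::duration<double>(stop - start).count();
            result.hits += found ? 1 : 0;
        }
    }
    return result;
}
```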

Here are the results on 4 architectures with N=10000000, M=1000 and R=100:

All benchmarks were compiled with GCC 4.6, using -O3 and -march=native. Sadly, the impact wasn’t as large as I had hoped, but on the Nehalem and POWER7 systems “trinary” search is repeatably 5+% faster than binary search.

Leave a comment if you’re interested in the benchmark code; it’s a bit of a mess right now.

Easy font rendering in OpenGL

It’s surprising to what degree you are expected to bloat your code base, executable size and/or dependencies when you just want some decent TrueType font rendering in a cross-platform OpenGL application. Thankfully, Sean Barrett wrote the wonderful stb_truetype.h (and released it to the public domain no less!), which solves the problem in a single header file (under 2000 lines) you can just drop into your project.

If you don’t want Visual Studio 11’s debugger to throw you out due to uninitialized values, replace line 1038 in that file with

It should have previously been

By the way, his site also hosts a great and similarly easy to use image reading/writing library.

Windows console stuff

Here’s how to create a Windows .exe without a console window, and without having to deal with the WinMain entry point:

  1. Just keep the “Console” subsystem selected in the project properties
  2. Add the following to the file containing your “int main(…)” entry point:
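
A commonly used form of such a directive is sketched below, assuming the MSVC toolchain. It asks the linker for the Windows (GUI) subsystem while keeping the regular C runtime startup routine that calls main(). The `_MSC_VER` guard and the `run_application` helper are additions for this example, so that the file still builds with other compilers.

```cpp
// Assumption: MSVC toolchain. Other compilers skip the directive.
#ifdef _MSC_VER
// Link as a GUI program (no console window), but keep the normal CRT
// startup routine that calls main() instead of WinMain().
#pragma comment(linker, "/SUBSYSTEM:windows /ENTRY:mainCRTStartup")
#endif

#include <cassert>

// Hypothetical application body, kept separate so main() stays trivial.
int run_application(int argc, char** argv) {
    assert(argc >= 0);
    (void)argv;  // unused in this sketch
    return 0;
}

// The usual entry point then stays exactly as it was:
//   int main(int argc, char** argv) { return run_application(argc, argv); }
```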

Also helpful: redirecting stdout/stderr using freopen:
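
A minimal sketch of such a redirection, using only the standard `std::freopen` call. The helper name `redirect_stream` and the log file names in the usage comment are made up for this example.

```cpp
#include <cassert>
#include <cstdio>

// Reopens an already open stream on the given file, so that everything
// written to the stream afterwards ends up in that file.
// Returns false if the file could not be opened.
bool redirect_stream(std::FILE* stream, const char* path) {
    assert(stream != nullptr && path != nullptr);
    return std::freopen(path, "w", stream) != nullptr;
}

// Typical use at the start of main() (file names are placeholders):
//   redirect_stream(stdout, "stdout.log");
//   redirect_stream(stderr, "stderr.log");
```

Since a program without a console window has nowhere to print to, sending both streams to files keeps printf-style diagnostics available.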

Audio programming and DTS decoding

Since I recently gained the option of connecting digital audio to my capture card, and since some people requested it, I added audio handling to PtBi. This required less effort than I originally anticipated, partly due to the ease of use of the BASS audio library. It’s a tiny .dll and quite well documented.

One disappointment was discovering that the Blackmagic Intensity Pro doesn’t handle more than 2 audio channels. However, with some bit fiddling, it’s possible to reconstruct and decode DTS 5.1 encoded audio frames from the left/right channels supplied by the hardware. While this could mean that PtBi now may or may not violate any number of software patents in some countries, I doubt anyone cares.
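
To give a rough idea of the kind of bit fiddling involved, the sketch below shows only the first step: putting the left/right 16-bit samples back into one byte stream and locating the start of a DTS frame in it. It is not PtBi’s actual code. The function names are made up, and it assumes the stream is carried as 16-bit little-endian words, in which case the DTS core sync word 0x7FFE8001 shows up as the bytes FE 7F 01 80. The 14-bit packed variants are not handled, and the frames found this way still have to be passed to a real DTS decoder.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Appends one 16-bit sample in little-endian byte order.
void append_le16(std::vector<std::uint8_t>& bytes, std::int16_t sample) {
    const std::uint16_t u = static_cast<std::uint16_t>(sample);
    bytes.push_back(static_cast<std::uint8_t>(u & 0xFF));
    bytes.push_back(static_cast<std::uint8_t>(u >> 8));
}

// Rebuilds the original byte stream from separate left/right channels
// (assumed order: left sample, then right sample).
std::vector<std::uint8_t> interleave_to_bytes(
        const std::vector<std::int16_t>& left,
        const std::vector<std::int16_t>& right) {
    assert(left.size() == right.size());
    std::vector<std::uint8_t> bytes;
    bytes.reserve(left.size() * 4);
    for (std::size_t i = 0; i < left.size(); ++i) {
        append_le16(bytes, left[i]);
        append_le16(bytes, right[i]);
    }
    return bytes;
}

// Returns the byte offset of the first 16-bit little-endian DTS core
// sync word (FE 7F 01 80), or -1 if none is present.
std::ptrdiff_t find_dts_sync(const std::vector<std::uint8_t>& bytes) {
    for (std::size_t i = 0; i + 4 <= bytes.size(); i += 2) {
        if (bytes[i] == 0xFE && bytes[i + 1] == 0x7F &&
            bytes[i + 2] == 0x01 && bytes[i + 3] == 0x80) {
            return static_cast<std::ptrdiff_t>(i);
        }
    }
    return -1;
}
```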

Since I was getting into this audio stuff, I also added some quick features I could use, such as a general audio level boost and stereo to quadrophonic expansion.

More TPXAA Comparisons

What better way to compare staircase artifacts than with a staircase? Here we see that for classic aliasing artifacts, TPXAA smoothing performance is as good as FXAA. Looking at the grass texture in the upper left corner, the undesirable overall blur of the FXAA image is faintly visible.

In this purely 2D comparison, TPXAA does nothing while FXAA deforms the straight red geometric shape and diminishes the text outline.

Finally, this mixed comparison shows some HUD deformation with FXAA that is prevented by TPXAA.

Full source images: