Audio programming and DTS decoding

Since I recently gained the option of connecting digital audio to my capture card, and since some people requested it, I added audio handling to PtBi. This required less effort than I originally anticipated, partly due to the ease of use of the bass audio library. It’s a tiny .dll and quite well documented.

One disappointment was discovering that the Blackmagic Intensity Pro doesn’t handle more than 2 audio channels. However, with some bit fiddling, it’s possible to reconstruct and decode DTS 5.1 encoded audio frames from the left/right channels supplied by the hardware. While this could mean that PtBi now may or may not violate any number of software patents in some countries, I doubt anyone cares.

Since I was getting into this audio stuff, I also added some quick features I could use such as a general audio level boost and stereo to quadrophonic expansion.

PXAA in PtBi

PtBi has had the option of running FXAA in realtime on captured frames for a while. This has great results on edges (as long as they aren’t sub-pixel sized) and is very fast, but the disadvantage of also inadvertently affecting non-edge elements, particularly text and 2D GUI elements in general.

I’ve thought about how to prevent this for years, and finally implemented some of my ideas now. I call the result PXAA (Predicated FXAA), and it worked out rather well.

The general idea is this:

  1. From the original input frame luminosity, calculate 6 true/false values representing the following: horizontal rising edge start/end, horizontal edge, horizontal falling edge start/end, vertical rightward edge start/end, vertical edge, vertical leftward edge start/end
  2. Use these values to detect horizontal and vertical aliasing artifacts by following along an edge from one start/end to the next start/end, and store these edges in a predicate buffer
  3. (optional) Accumulate this predicate over several frames (I call this TPXAA)
  4. Use that predicate buffer to select whether or not to enable FXAA for each pixel

In terms of implementation, part (2) was challenging for a while since I could not think of a way to implement this efficiently without the ability to write multiple locations in a pixel shader. I toyed with the idea of implementing it in OpenCL, but that introduces more dependencies for my application and lots of boilerplate code. Luckily, the somewhat recent GL_ARB_shader_image_load_store extension does exactly what I needed.

Results

The images below show some results of applying the process:

Test image 1 - NOAA

Test image 1 - FXAA

Test image 1 - TPXAA

To clearly illustrate the differences, here’s the result of a subtraction of the results of FXAA and PXAA from the Non-AAed images:

Test image 1 - FXAA/NOAA difference image

Test image 1 - PXAA/NOAA difference image

This gallery shows more comparisons + difference images:

Performance

Currently, performance results look like this on my GTX460 (naive measurement, from/to a glFinish) :

  • noaa (this is a copy operation): 0.2 ms
  • FXAA: 1.3 ms
  • PXAA: 2.3-3.2 ms (depends on number/length of edges)
  • TPXAA: 2.8-3.6 ms (as above)

A prototype without the image_load_store extension takes > 8 ms.

Limitations

Because I (have to) use analog input for now, horizontal edges are slightly blurred, reducing the general effectiveness and accuracy of the method. For digital input, I could tighten the threshold and further reduce the amount of false positives.