PtBi has had the option of running FXAA in realtime on captured frames for a while. This gives great results on edges (as long as they aren’t sub-pixel sized) and is very fast, but it has the disadvantage of also inadvertently affecting non-edge elements, particularly text and 2D GUI elements in general.
I’ve thought about how to prevent this for years, and have now finally implemented some of my ideas. I call the result PXAA (Predicated FXAA), and it worked out rather well.
The general idea is this:
- From the original input frame luminosity, calculate 6 true/false values representing the following: horizontal rising edge start/end, horizontal edge, horizontal falling edge start/end, vertical rightward edge start/end, vertical edge, vertical leftward edge start/end
- Use these values to detect horizontal and vertical aliasing artifacts by following along an edge from one start/end to the next start/end, and store these edges in a predicate buffer
- (optional) Accumulate this predicate over several frames (I call this TPXAA)
- Use that predicate buffer to select whether or not to enable FXAA for each pixel
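To make the steps above more concrete, here is a minimal CPU sketch in Python. It is not the PtBi shader code: it handles only horizontal edges, collapses the rising/falling start/end flags into a single "stair step" test, and uses Rec. 601 luma weights and a threshold of 0.25 purely as assumptions. All function names (`luma`, `horizontal_edges`, `has_step`, `predicate_buffer`, `select`) are hypothetical.

```python
def luma(rgb):
    """Rec. 601 luma of an (r, g, b) triple with components in [0, 1] (assumed weights)."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def horizontal_edges(lum, threshold):
    """edge[y][x] is True where rows y and y+1 differ by more than threshold."""
    h, w = len(lum), len(lum[0])
    return [[abs(lum[y][x] - lum[y + 1][x]) > threshold for x in range(w)]
            for y in range(h - 1)]


def has_step(edge, y, x):
    """True if the edge continues one row up or down in column x (a stair step)."""
    if not 0 <= x < len(edge[0]):
        return False
    return any(0 <= yy < len(edge) and edge[yy][x] for yy in (y - 1, y + 1))


def predicate_buffer(lum, threshold=0.25):
    """Mark pixels on horizontal edge runs that begin and end at a stair step."""
    h, w = len(lum), len(lum[0])
    edge = horizontal_edges(lum, threshold)
    pred = [[False] * w for _ in range(h)]
    for y in range(h - 1):
        x = 0
        while x < w:
            if not edge[y][x]:
                x += 1
                continue
            x0 = x
            while x < w and edge[y][x]:   # follow the edge along the row
                x += 1
            x1 = x - 1                    # the run covers x0..x1 inclusive
            if has_step(edge, y, x0 - 1) and has_step(edge, y, x1 + 1):
                for xx in range(x0, x1 + 1):
                    pred[y][xx] = pred[y + 1][xx] = True  # both sides of the edge
    return pred


def select(original, fxaa, pred):
    """Take the FXAA result only where the predicate is set."""
    return [[f if p else o for o, f, p in zip(orow, frow, prow)]
            for orow, frow, prow in zip(original, fxaa, pred)]
```

In this sketch a staircase (an aliased, nearly horizontal edge) produces runs that are bounded by steps on both sides and therefore get marked, while the top and bottom of an axis-aligned box, as found in GUI elements, end in corners rather than steps and stay unmarked.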
In terms of implementation, the second step was challenging for a while, since I could not think of a way to implement it efficiently without the ability to write to multiple locations from a pixel shader. I toyed with the idea of implementing it in OpenCL, but that would introduce more dependencies for my application and lots of boilerplate code. Luckily, the somewhat recent GL_ARB_shader_image_load_store extension does exactly what I needed.
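Why writing to multiple locations helps can be seen in a one-dimensional sketch. The Python below is only an illustration, not the actual shader, and how the slower prototype works is not described here, so the "gather" variant is just one plausible alternative. The names `mark_runs_scatter`, `mark_runs_gather`, `edge` and `is_end` are hypothetical; each loop iteration stands in for one pixel shader invocation.

```python
def mark_runs_scatter(edge, is_end):
    """Only the invocation at a run start follows the edge, then writes the whole run."""
    n = len(edge)
    pred = [False] * n
    steps = 0
    for x in range(n):                        # one iteration = one invocation
        starts_run = edge[x] and (x == 0 or not edge[x - 1])
        if not (starts_run and is_end[x]):
            continue
        x1 = x
        while x1 + 1 < n and edge[x1 + 1]:    # follow the edge to its last pixel
            x1 += 1
            steps += 1
        if is_end[x1]:
            for xx in range(x, x1 + 1):       # several writes from one invocation
                pred[xx] = True
    return pred, steps


def mark_runs_gather(edge, is_end):
    """Every edge pixel searches for both run ends and writes only itself."""
    n = len(edge)
    pred = [False] * n
    steps = 0
    for x in range(n):
        if not edge[x]:
            continue
        x0 = x1 = x
        while x0 > 0 and edge[x0 - 1]:        # walk left to the run start
            x0 -= 1
            steps += 1
        while x1 + 1 < n and edge[x1 + 1]:    # walk right to the run end
            x1 += 1
            steps += 1
        pred[x] = is_end[x0] and is_end[x1]   # single write, to this pixel only
    return pred, steps
```

Both variants produce the same predicate, but the gather version repeats the walk for every pixel of a run, so its cost grows roughly with the square of the run length, while the scatter version walks each run once.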
The images below show some results of applying the process:
To clearly illustrate the differences, here are the difference images obtained by subtracting the FXAA and PXAA results from the non-AA images:
This gallery shows more comparisons + difference images:
Currently, performance results look like this on my GTX460 (naive measurement, from/to a glFinish):
- No AA (this is just a copy operation): 0.2 ms
- FXAA: 1.3 ms
- PXAA: 2.3-3.2 ms (depends on number/length of edges)
- TPXAA: 2.8-3.6 ms (as above)
A prototype without the image_load_store extension takes > 8 ms.
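For reference, a naive measurement of this kind can be sketched as follows. This is an assumption about the general approach, not the code used for the numbers above: `time_pass_ms`, `render_pass` and `finish` are hypothetical names, `finish` stands in for a blocking call such as glFinish, and averaging over several repeats is an addition of this sketch.

```python
import time


def time_pass_ms(render_pass, finish, repeats=100):
    """Average wall-clock milliseconds per call of render_pass.

    finish is a blocking sync call such as glFinish; it drains queued GPU
    work so the clock starts and stops with the queue empty.
    """
    finish()                          # exclude work queued before the measurement
    start = time.perf_counter()
    for _ in range(repeats):
        render_pass()
    finish()                          # wait for the measured passes to complete
    return (time.perf_counter() - start) * 1000.0 / repeats
```

Without the second sync the timer would mostly measure how long it takes to queue the commands rather than how long the GPU needs to execute them.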
Because I (have to) use analog input for now, horizontal edges are slightly blurred, which reduces the general effectiveness and accuracy of the method. With digital input, I could tighten the threshold and further reduce the number of false positives.
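The effect of blur on the threshold can be illustrated with a tiny example. The pixel values and the two thresholds below are made up for illustration and are not the values used in PtBi; `edge_flags` is a hypothetical helper.

```python
def edge_flags(row, threshold):
    """Flag each neighboring pixel pair whose luminosity differs by more than threshold."""
    return [abs(b - a) > threshold for a, b in zip(row, row[1:])]


sharp = [0.0, 0.0, 1.0, 1.0]       # clean digital step
blurred = [0.0, 0.25, 0.75, 1.0]   # the same step after a slight analog blur
texture = [0.25, 0.75, 0.25]       # non-edge detail that should be left alone

TIGHT, LOOSE = 0.6, 0.4            # illustrative thresholds only

sharp_tight = edge_flags(sharp, TIGHT)       # the step is found
blurred_tight = edge_flags(blurred, TIGHT)   # the blurred step is missed
blurred_loose = edge_flags(blurred, LOOSE)   # a looser threshold finds it again...
texture_loose = edge_flags(texture, LOOSE)   # ...but also flags the texture
texture_tight = edge_flags(texture, TIGHT)   # the tight threshold ignores it
```

With these numbers, the blurred step is only detected with the looser threshold, and that same threshold then also flags the texture, which is the kind of false positive a tighter threshold on digital input would avoid.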