Texture Scaling in Emulators

PPSSPP is a great PSP emulator for all kinds of platforms, including Windows and Android. I recently started using it to play some of my PSP games, and I was surprised how nice a few of them (particularly the stylized ones) can look with some AA and a higher rendering resolution.

However, the texture resolution in many of the games is a huge blemish on the visuals. Look at this example (from Fate: Extra):

Default

The hair in particular is absurdly pixelated, but the clothing and tree textures aren’t much better. In general, trying to make a higher resolution image from a lower resolution one is a fool’s errand, as the information just isn’t there. However, for stylized textures such as these I thought something might be done.

The first idea was to use HQ4x, an image scaling algorithm designed for pixel art. Hacking that into PPSSPP yielded the following result:

hq4x

As you can see, it was pretty effective on the hard transparency edges of the hair and tree textures, but only increased the pixelation on the soft, anti-aliased edges of the cloth.

Luckily, scaling of image art has advanced quite a bit since HQx was created, and I soon found an algorithm called xBR, created by Hyllian on the byuu.org message boards. The source code for xBRZ, a slightly improved and parallelizable implementation of xBR, is available as part of the HqMAME project. It deals much better with anti-aliased edges, and integrating it into PPSSPP ended up looking like this:

xbrz

It’s a generally great result, and better than HQ4x, with one drawback: the posterization of gradients. It’s not too apparent in the image above, but it can be very distracting in other scenes and games (e.g. it can look really bad in sky textures).

To circumvent that effect I had to take to Matlab. I came up with an algorithm that calculates a mask based on the local contrast of a texture, and then chooses between xBRZ and bilinear/bicubic texture scaling based on the mask value.

mask

Putting all of that together, and adding an additional deposterization step which improves the quality of compressed textures, I arrived at this:

hybrid

The initial version was very slow, particularly with bicubic scaling. So I also parallelized everything and added an SSE 4.1 version of the scaling function. You can try the final result in any recent build of PPSSPP.
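
The mask-based selection described above can be sketched roughly as follows. This is only an illustration under assumptions, not the code that went into PPSSPP: the post does not say how local contrast is measured, so the sketch uses the difference between the brightest and darkest texel in a 3x3 window of a grayscale image, and the names local_contrast_mask, select_by_mask and the threshold parameter are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clamp an index into [0, n-1], i.e. replicate the border texel. */
static int clamp_index(int i, int n) {
    if (i < 0) return 0;
    if (i >= n) return n - 1;
    return i;
}

/* Assumed contrast measure: max - min over each 3x3 neighbourhood of a
 * grayscale image. High mask values mark hard edges, low values mark
 * flat areas and gradients. */
void local_contrast_mask(const uint8_t *src, int w, int h, uint8_t *mask) {
    assert(w > 0 && h > 0);
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            uint8_t lo = 255, hi = 0;
            for (int dy = -1; dy <= 1; ++dy) {
                for (int dx = -1; dx <= 1; ++dx) {
                    int sy = clamp_index(y + dy, h);
                    int sx = clamp_index(x + dx, w);
                    uint8_t v = src[sy * w + sx];
                    if (v < lo) lo = v;
                    if (v > hi) hi = v;
                }
            }
            mask[y * w + x] = (uint8_t)(hi - lo);
        }
    }
}

/* Per-pixel choice between two already-upscaled images of n pixels:
 * the edge-preserving result where contrast is high, the smooth
 * (bilinear/bicubic) result elsewhere. */
void select_by_mask(const uint8_t *edge_scaled, const uint8_t *smooth_scaled,
                    const uint8_t *mask, size_t n, uint8_t threshold,
                    uint8_t *out) {
    for (size_t i = 0; i < n; ++i)
        out[i] = (mask[i] >= threshold) ? edge_scaled[i] : smooth_scaled[i];
}
```

In this sketch the mask is computed at the source resolution, so it has to be upscaled to the target size (nearest neighbour is enough) before it is passed to select_by_mask.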

There are still many things that could be explored for even better automatic texture scaling in emulators. One particular deficiency of xBR for texture scaling is how it deals with the borders of images. It simply assumes that the texture continues as on the border (i.e. replicates it). A better idea for textures could be to assume that the edge direction continues as it does on the border – this could reduce some tiling artifacts that appear when scaling.
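
To make the difference concrete, here is a deliberately simplified, one-dimensional sketch that contrasts border replication with linear extrapolation of the values at the border. It is only one possible reading of "continuing as it does on the border" (it continues a brightness slope, not a two-dimensional edge direction), it is not what xBR or PPSSPP do, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Replicate padding: samples outside the row repeat the border texel. */
int sample_replicate(const uint8_t *row, int n, int i) {
    assert(n >= 1);
    if (i < 0) i = 0;
    if (i >= n) i = n - 1;
    return row[i];
}

/* Linear extrapolation: samples outside the row continue the slope
 * between the two texels nearest the border, clamped to 0..255.
 * Falls back to replication when the row has fewer than two texels. */
int sample_extrapolate(const uint8_t *row, int n, int i) {
    assert(n >= 1);
    if (n < 2)
        return row[0];
    int v;
    if (i < 0)
        v = row[0] + (row[0] - row[1]) * (-i);
    else if (i >= n)
        v = row[n - 1] + (row[n - 1] - row[n - 2]) * (i - (n - 1));
    else
        v = row[i];
    if (v < 0) v = 0;
    if (v > 255) v = 255;
    return v;
}
```

For a row containing 10, 20, 30, replication yields 10 one texel to the left of the border, while extrapolation yields 0 and so keeps the gradient going.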

Another interesting topic would be the replication of noise or small-scale detail on an upscaled texture, but it would require some in-depth analysis of the texture images which might not be feasible in real-time.

Oculus Rift

Two days ago I received my Oculus Rift developer kit. If you’re unfamiliar with the Rift, it’s an affordable virtual reality headset that had a successful Kickstarter campaign for developer kits last year.

My kit had a pretty long journey, going to Australia first. I used to think that people (particularly in the US) mixing up Austria and Australia was just a myth, but it seems like it actually happens:

Mislabeled Package

Tracking Information for the UPS order

But hey, all is well that ends well. It’s a really nicely packaged kit, and includes adapters for anywhere on earth and 3 times as many video cables as you need:

Box

You can find much better pictures of exactly what’s inside (and the great box!) elsewhere on the web.

Sadly, I don’t have much time to do development for the Rift or even much testing right now, but here are my first impressions:

  • It works! When you first put it on and look around, it really feels like an entirely new experience. I had a few people at work try it today, and all were really impressed as well.
  • The resolution is low, but not as bad as I expected. I think with the consumer version’s planned 1080p resolution and really nicely anti-aliased rendering, we’ll be fine for a while.
  • The pixel switching time of the current display is too long. Ideally, I think it should use something like an OLED display, with instant response.
  • The head tracking is really fast; I didn’t notice any perceptible delay.

I just tested using the “Oculus World Demo” included with the SDK, and I noticed that the reaction speed and even the blur with head movement seemed significantly better in windowed fullscreen mode than in “real” fullscreen mode. I’m not sure why this is the case; it could be that I had VSync on in real fullscreen.

Anyway, I hope I get more time to play around with it this weekend.

Implementing your own synchronisation primitives is tricky

This blog post is about the folly of implementing your own synchronization primitives without thinking about what compilers are allowed to do. If you’re not into low-level parallel programming in C on x64, you can safely skip it ;)

In the Insieme project, we use one double-ended work stealing queue per hardware thread which can be independently accessed (read and write) at both ends. It’s implemented as a circular buffer with a 64 bit control word.

The original code for adding a new item to this queue looked something like this:

Now, this generally worked fine in practice, but in unit tests around every 21 millionth insertion failed. After chasing a few wrong leads I figured out that declaring newstate as volatile fixed the issue. The problem with this, of course, is that it makes no sense. It’s a local variable stored on the stack of the executing thread, so it cannot be accessed by any other thread.

In the end, understanding the issue required looking at the assembly code generated for both versions. Here’s what gcc does in the nonvolatile version:

And here’s the volatile one:

As you can see from the comments in the first version, we started interpreting the assembly from the top. That was a mistake. If you look at the last few lines, you can see the culprit. The line mov QWORD PTR [rdi+8+rax*8], rsi corresponds to the statement wb->items[newstate.top_update] = wi; in the C code. In the non-volatile version, gcc decides to move that store below the unlocking of the data structure. This is a perfectly valid transformation, since within a single thread there are no dependencies between the two lines (gcc is obviously unaware of any parallelism going on).

There are many ways to fix the issue: add a memory barrier (__sync_synchronize in gcc), do the assignment using an atomic exchange operation, or, if you want to stay in pure C, write (wb->items[newstate.top_update] = wi) && (wb->state.top_val = newstate.top_update); instead. That last option is admittedly ugly, and only works because wi is never NULL. Strictly speaking, the && only orders the two stores in the abstract machine, so it relies on the code gcc happens to generate rather than on a guarantee of the language. Sadly, all of these options have a slight performance penalty. If anyone knows any other portable way to enforce the ordering of operations in this case, I’d be happy to hear about it.
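
One portable candidate, for compilers that support C11, is the release/acquire ordering in <stdatomic.h>. The sketch below is not the Insieme queue: it is a heavily simplified single-producer buffer with no wrap-around, and work_buffer, wb_init, wb_push, wb_peek_last and WB_CAPACITY are names invented for the example. It only shows the ordering pattern: write the item first, then publish it with a release store that neither the compiler nor the CPU may move the item write past.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define WB_CAPACITY 8u  /* hypothetical fixed size for the sketch */

typedef struct {
    atomic_uint top;            /* number of published items */
    void *items[WB_CAPACITY];
} work_buffer;

void wb_init(work_buffer *wb) {
    atomic_init(&wb->top, 0u);
}

/* Single-producer push: returns false when the buffer is full. */
bool wb_push(work_buffer *wb, void *wi) {
    assert(wi != NULL);
    unsigned top = atomic_load_explicit(&wb->top, memory_order_relaxed);
    if (top >= WB_CAPACITY)
        return false;           /* full; the sketch never wraps around */
    wb->items[top] = wi;        /* step 1: write the item */
    /* Step 2: publish the item. The release store must stay after the
     * item write above, for the compiler as well as for the hardware. */
    atomic_store_explicit(&wb->top, top + 1u, memory_order_release);
    return true;
}

/* Reader side: the acquire load pairs with the release store, so an
 * item is visible whenever the index that covers it is visible. */
void *wb_peek_last(work_buffer *wb) {
    unsigned top = atomic_load_explicit(&wb->top, memory_order_acquire);
    return top == 0u ? NULL : wb->items[top - 1u];
}
```

On x86-64, compilers typically emit an ordinary store for a release store, so the constraint mostly restrains the compiler and should cost less than a full barrier such as __sync_synchronize. Whether that holds for a real double-ended queue would need to be measured.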

And that’s it, more or less. Lessons learned: take care when implementing your own synchronization primitives. If you think you are taking care, take more care. And when comparing assembly, look at the obvious differences before starting to interpret the code top down.

PtBi 4.1516

I just fixed the crash bug in PtBi introduced with the latest NVidia WHQL drivers.

If anyone from NV is reading this, I really don’t think having a:

Should cause the shader compiler to spit out this:

It works just fine without the “restrict”.

Anyway, if you’re using PtBi with a NV GPU then you can find an updated, working version on the PtBi homepage. Sorry for the delay in fixing this.

DSfix 2.0.1

Yesterday’s 2.0 release introduced an issue with the HUD modifications, which is now fixed. That’s all that has changed.

People are also reporting some stability problems and physics issues since the patch, but I’m not sure those are related to DSfix. On the bright side, it seems like in addition to fixing the stereo downmix, the patch also somewhat reduced the CPU load of the game.

As always, consider donating if you like the mod.

Get DSfix 2.0.1 here.

DSfix 2.0

Dark Souls was updated today, fixing the audio downmixing bug that had been present since launch (and maybe more?). Unfortunately, it also broke some features of DSfix, most significantly the FPS unlocking.

Well, with a lot of help from Clément Barnier, here is version 2.0 of DSfix which resolves these issues and adds a small new feature.

Changes:

  • Updated the framerate unlock feature to work with the patched version of the game (Nwks)
  • Updated post-processing AA to work with the patched version of the game
  • Fixed an issue where hudless screenshots would sometimes not correctly capture some effects
  • Added “presentWidth” and “presentHeight” to the .ini for full control over (windowed) downsampling. For example, if you want to downsample from 2560×1440 to 1080p, you would use renderWidth 2560, renderHeight 1440, presentWidth 1920 and presentHeight 1080. If none of that makes sense to you just leave these values at 0 ;)
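
As a concrete illustration of the downsampling example in the last item, the relevant settings might look like the fragment below. The values are the ones given above; the one-setting-per-line, key-then-value layout is an assumption here, so check the comments in the DSfix.ini shipped with the mod for the exact syntax.

```
renderWidth 2560
renderHeight 1440
presentWidth 1920
presentHeight 1080
```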

I hope this allows you to enjoy Dark Souls in its full glory again. Happy holidays!

As always, consider donating if you like the mod.

Get DSfix 2.0 here.

It’s 4 am here now so if I messed up anything in this release it will have to wait until tomorrow.

PtBi and GeForce 310 driver series

It has been brought to my attention that the latest beta GeForce drivers break compatibility with PtBi. Since the error message they report isn’t particularly helpful, I’d like to wait until there’s a WHQL release to investigate this further.

For now, I suggest sticking with the 306.97 driver release.

The next release of PtBi (which I’ll make after the final release of the new driver, and after I have fixed the issue if it still persists in that release) should also slightly decrease latency in some configurations.

Dark Souls online and Amazon deal

First of all, Dark Souls is currently 19.90 at Amazon.com. So, if for some reason you haven’t bought it yet (or know someone who hasn’t) then now is the time!

More importantly, I wanted to collect some hints for successful multiplayer gaming in the PC version of Dark Souls. Many people seem to have problems with this, and there is a lot of misinformation online, while some good hints are hard to find. Since I played the game for probably 80 hours or so in co-op, killed lots of people who defiled the forest and have tracked down dozens of the guilty, I feel like I should share what I learned:

General:

  • Enable UPnP on your router, and open these ports:
    TCP ports 80 and 443
    UDP and TCP ports 53, 88 and 3074
    More details here.
  • Don’t expect to play multiplayer during the first 5-15 minutes after you started the game and loaded your save. Spend this time doing something different. It may be a good idea, if you want to play MP later, to start the game early, load your save, and let the game idle a bit (in a safe zone of course). Alt-tabbing out is fine!

Coop:

  • Make sure you are at a good level range for the area you are in.
  • If you want to summon someone specific, and need to get their summoning sign to appear, try to run away and back to the spot a few times.
  • Replacing your sign is generally not a good idea, even if it is often suggested online. Only replace your sign if your coop partner absolutely couldn’t make it appear for 10 minutes or so.
  • Once you have connected to someone for the first time, during the rest of the session any summoning attempts will succeed immediately. Also, signs will appear much more rapidly. Clearly, it’s a good idea to play long sessions with the same people.

Ring-based PvP:

  • This applies to Forest defenders and Blades of the darkmoon.
  • Again, there is some “startup delay” here. Equip the ring and go do something else (e.g. farming). I thought when I first joined the Forest defenders that the ring wasn’t working, but I only kept it equipped for a short time. Once you get summoned for the first time, it will continue to happen every few minutes!

Other PvP:

  • Obviously, try not to overlevel.
  • Kill the guilty.

Taking all this into account, it usually doesn’t take us more than 10 minutes to get a multiplayer session going.

Hope this helps!

DSfix 1.9

I got the first external code patch for DSfix this week! I hope it’s a sign of more to come.

Changes:

  • Added 2 new ambient occlusion algorithms: HBAO and SCAO. However, the existing VSSAO is still the best option in terms of visuals/performance.
  • Added a scale option for ambient occlusion on lower-end systems. Also useful if you want to downscale from a high resolution but not compute the SSAO at that resolution. This option works for all AO algorithms.
  • Disabled hotkeys when Dark Souls is not active. Previously, you could e.g. toggle the frame rate limit or take screenshots inadvertently when DS was not the active window.
  • Reinstated the texture filtering option with a slightly better implementation.
  • Small bug fixes in the frame rate limit calculation.
  • Reworked WindowManager::resize to center the window along with resizing, and to be called on startup. Also fixed a small style issue with the borderless fullscreen toggle (wT)

The AO changes are rather big, the rest are small things that people have been asking for or that I wanted to do for a while.

As always, consider donating if you like the mod.

Get DSfix 1.9 at the Dark Souls nexus.
(alternative download location)

DSfix 1.8

It has been quite a while — I’ve been very busy at work for the past 3 weeks. I also haven’t had time to check/approve blog comments, so now there are 3700 in the moderation queue, and I’m sure at least 98% of those are spam. Sorry if you got stuck there!

Changes:

  • Added an FXAA option in addition to SMAA. FXAA is blurrier, but also deals with sub-pixel aliasing better because of that. It’s also very slightly cheaper in terms of performance.
  • Added the ability to bind a key to toggle a 30 FPS limit for a short time. By default it’s bound to backspace. This is useful to fix issues with the FPS unlock, see below.
  • Greatly improved depth usage in the SSAO shader. This means that the SSAO effect is now more stable at different distances from objects.

As always, consider donating if you like the mod.

Get DSfix 1.8 at the Dark Souls nexus.
(alternative download location)

In other news, I believe that all the possible issues with the FPS unlock are now known, and there is a way to deal with most of them as of version 1.8:

  • Sliding down ladders can make you fall through the ground. Workaround: don’t slide down ladders with unlocked FPS.
  • Jumping/rolling distance is slightly reduced at 60 FPS. Workaround: in case you need to do one of the 2 or 3 jumps in the game that need maximum distance, use the new toggle key to toggle a 30 FPS limit for the jump, then toggle back to 60 after it.
  • Slope interactions are slightly different at 60 FPS compared to 30, which may make you “stuck” at small obstructions from time to time. Workaround: roll if there is space, or walk back and forth a bit if there is not. In the worst case, toggle the 30 FPS limit for a second.

I have been playing for 40 hours or so with unlocked framerate, in single-player, coop and pvp, and I had no other issues.

In conclusion, I hope everyone had a nice couple of weeks and backed Project Eternity!