Wrapper_gen, a wrapper generator for COM interfaces

DSfix was based on a Direct3D9 wrapper, which was mostly taken from an existing code base and extended manually.

Recently, I needed to hook Direct3D9Ex, and came to the conclusion that the manual busywork of writing the initial wrapper is better left to a computer than a human. Therefore, I wrote a Ruby script which takes a COM interface specification from a Microsoft DLL header and generates the C++ code for a wrapper class for it.

Here’s the script (wrapper_gen.rb); it’s rather tiny:

To use it, you specify the interface name, input header file, output file base name, and optionally whether you want logging information to be generated for each wrapped method.

For example, ruby wrapper_gen.rb IDirect3DTexture9 d3d9.h d3d9tex true would generate a wrapper for the IDirect3DTexture9 interface, take the required information from d3d9.h, and store the generated wrapper in d3d9tex.h and d3d9tex.cpp. The implementations in the latter would include logging.

Here are the generated files for this test case.

d3d9tex.h:
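(A trimmed sketch of the kind of header the generator produces – the class name hkIDirect3DTexture9 is illustrative, not necessarily the script’s exact output.)

```cpp
// d3d9tex.h (abridged sketch) -- the real file declares a wrapper
// for every method of IDirect3DTexture9
#pragma once
#include <d3d9.h>

class hkIDirect3DTexture9 : public IDirect3DTexture9 {
    IDirect3DTexture9* pWrapped; // the real interface; all calls forward to it

public:
    hkIDirect3DTexture9(IDirect3DTexture9* pReal) : pWrapped(pReal) {}

    // IUnknown
    HRESULT APIENTRY QueryInterface(REFIID riid, void** ppvObj);
    ULONG APIENTRY AddRef();
    ULONG APIENTRY Release();

    // IDirect3DTexture9 (excerpt)
    HRESULT APIENTRY LockRect(UINT Level, D3DLOCKED_RECT* pLockedRect, CONST RECT* pRect, DWORD Flags);
    HRESULT APIENTRY UnlockRect(UINT Level);
    // ... one declaration per remaining interface method ...
};
```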

d3d9tex.cpp:
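(Again a sketch – every generated method follows this log-and-forward pattern; SDLOG stands in for whatever logging facility the generated code actually uses.)

```cpp
// d3d9tex.cpp (abridged sketch) -- each method logs its arguments,
// then forwards to the wrapped object
#include "d3d9tex.h"
#include <cstdio>

#define SDLOG(...) printf(__VA_ARGS__) // stand-in logging macro

HRESULT APIENTRY hkIDirect3DTexture9::LockRect(UINT Level, D3DLOCKED_RECT* pLockedRect, CONST RECT* pRect, DWORD Flags) {
    SDLOG("hkIDirect3DTexture9::LockRect(%u, %p, %p, %lu)\n", Level, (void*)pLockedRect, (void*)pRect, Flags);
    return pWrapped->LockRect(Level, pLockedRect, pRect, Flags);
}

HRESULT APIENTRY hkIDirect3DTexture9::UnlockRect(UINT Level) {
    SDLOG("hkIDirect3DTexture9::UnlockRect(%u)\n", Level);
    return pWrapped->UnlockRect(Level);
}
```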

You can adjust the code generated for the logging in the Ruby script. As you can see, this can save you a lot of rote work, particularly if you want to intercept multiple large interfaces.

Update:

The original script didn’t deal with unnamed function parameters correctly. Now it should.

 

PtBi update and source release

I just released a new version of PtBi (5.1729). It’s a minor update that adds a few small features people were asking for:

  • A nearest neighbour scaling mode.
  • The ability to bind keys to switch directly to a given AA or scaling mode (instead of going through the available modes step by step). See keys.ini for details and some examples.

More importantly, I also uploaded an initial commit of the PtBi source to GitHub. It’s probably a bit hard to get to build initially due to the dependencies, but I hope it is useful for someone.

 

C++11 chrono timers

I’m a pretty big proponent of C++ as a language, and particularly enthused about C++11 and how that makes it even better. However, sadly reality still lags a bit behind specification in many areas.

One thing that was always troublesome in C++, particularly in high-performance or realtime programming, was that there was no standard, platform-independent way of getting a high-resolution timer. If you wanted cross-platform compatibility and a small timing period, you had to go with an external library, use OpenMP, or roll your own on each supported platform.
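For reference, “rolling your own” typically meant something like this minimal sketch, with one code path per supported platform:

```cpp
// Pre-C++11 high-resolution timing: a separate implementation per platform.
#ifdef _WIN32
#include <windows.h>
double getTimeSeconds() {
    LARGE_INTEGER freq, now;
    QueryPerformanceFrequency(&freq); // ticks per second
    QueryPerformanceCounter(&now);
    return static_cast<double>(now.QuadPart) / freq.QuadPart;
}
#else
#include <time.h>
double getTimeSeconds() {
    timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts); // POSIX monotonic clock
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}
#endif
```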

In C++11, the chrono namespace was introduced. At least in theory, it provides everything you always wanted in terms of timing, right there in the standard library. Three different types of clocks are offered for different use cases: system_clock, steady_clock and high_resolution_clock.

Yesterday I wrote a small program to query and test these clocks in practice on different platforms. Here are the results:

So, sadly, everything is not yet as great as it could be. For each platform, the first three blocks are the values reported by each clock, and the last block contains values determined by repeated measurements:

  • “period” is the tick period reported by each clock, in nanoseconds.
  • “unit” is the unit used by clock values, also in nanoseconds.
  • “steady” indicates whether the time between ticks is always constant for the given clock.
  • “time/iter, no clock” is the time per loop iteration for the measurement loop without the actual measurement. It’s just a reference value to better judge the overhead of the clock measurements.
  • “time/iter, clock” is the average time per iteration, with clock measurement.
  • “min time delta” is the minimum difference between two consecutive, non-identical time measurements.

On Linux with GCC 4.8.1, all clocks report a tick period of 1 nanosecond. There isn’t really a reason to doubt that, and it’s obviously a great granularity. However, the drawback is that it takes around 120 nanoseconds on average to get a clock measurement. This would be understandable for the system clock, but seems excessive in the other cases, and could cause significant perturbation when trying to measure/instrument small code areas.

On Windows with VS12, a clock period of 100 nanoseconds is reported, but the actual measured tick period is a whopping 1000000 ns (1 millisecond). That is obviously unusable for many of the use cases that would call for a “high resolution clock”. Windows is perfectly capable of supplying true high-resolution time measurements, so this performance (or lack thereof) is quite surprising. On the bright side, a measurement takes just 9 nanoseconds on average.

Clearly, both implementations tested here still have a way to go. If you want to test your own platform(s), here is the very simple program:
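(The listing below is a reconstruction of the measurement logic, not the verbatim original; the output format differs slightly.)

```cpp
#include <chrono>
#include <cstdio>

// For each clock: print its reported traits, then measure the average cost
// of a clock read and the minimum nonzero delta between consecutive readings.
template<typename Clock>
void testClock(const char* name) {
    using namespace std::chrono;
    printf("%s\n  period: %.2f ns\n  steady: %d\n", name,
           1e9 * Clock::period::num / Clock::period::den, (int)Clock::is_steady);

    const int N = 1000000;
    typename Clock::duration minDelta = Clock::duration::max();
    auto start = Clock::now();
    auto prev = start;
    for(int i = 0; i < N; ++i) {
        auto t = Clock::now();
        if(t != prev && t - prev < minDelta) minDelta = t - prev;
        prev = t;
    }
    auto total = Clock::now() - start;
    printf("  time/iter, clock: %.2f ns\n  min time delta: %.2f ns\n",
           duration<double, std::nano>(total).count() / N,
           duration<double, std::nano>(minDelta).count());
}

int main() {
    testClock<std::chrono::system_clock>("system_clock");
    testClock<std::chrono::steady_clock>("steady_clock");
    testClock<std::chrono::high_resolution_clock>("high_resolution_clock");
}
```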

 

PtBi version 5

I just released a new major version of PtBi, with 2 new features.

Dolby Digital 5.1 decoding

PtBi can now decode audio streams transmitted in Dolby Digital 5.1 format. Together with the existing DTS 5.1 decoding, this should now allow for true surround sound from almost any source. I believe that PtBi is the only Blackmagic Intensity capture program with this type of audio support.
This was easier than I expected at first, because the decoding library functions very similarly to the one I used for DTS, but then I was stuck for hours without any progress. It turns out that someone thought it would be a good idea to standardize a bitstream format such that it can be either big-endian or little-endian. Ugh.
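For the curious: AC-3 frames start with the 16-bit sync word 0x0B77, so the byte order can be detected by peeking at the first bytes of a frame. A hypothetical check along these lines (not PtBi’s actual code):

```cpp
#include <cstdint>
#include <cstddef>

// An AC-3 frame begins with the sync word 0x0B77. In a byte-swapped stream
// the same word arrives as 0x77 0x0B, so the first two bytes tell us whether
// every 16-bit word needs swapping before decoding.
bool isByteSwappedAC3(const uint8_t* frame, size_t len) {
    return len >= 2 && frame[0] == 0x77 && frame[1] == 0x0B;
}
```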

SMAA integration

In addition to the existing FXAA, PXAA and TPXAA post-processing AA modes, PtBi now also supports SMAA1x. SMAA1x has slightly better edge quality and motion stability than FXAA. I’ll look into integrating SMAA with my predication filters at some point in the future.

 

Also, I plan to release the source code for PtBi soon-ish. I was always reluctant to do this, since some of it is based on code I wrote almost a decade ago which is pretty terrible, but I cleaned it up slightly now. And some parts of it, like how to integrate the AA modes in OpenGL or how to use the various libraries for audio decoding/playback might be useful to someone. Also, it could help people identify and solve problems with AMD cards, which are always very hard for me to test/debug without access to the hardware.

Texture Scaling in Emulators

PPSSPP is a great PSP emulator for all kinds of platforms, including Windows and Android. I recently started using it to play some of my PSP games, and I was surprised how nice a few of them (particularly the stylized ones) can look with some AA and a higher rendering resolution.

However, the texture resolution in many of the games is a huge blemish on the visuals. Look at this example (from Fate/Extra):

Default scaling

Particularly the hair is absurdly pixelized, but the clothing and tree textures aren’t much better. In general, trying to make a higher-resolution image from a lower-resolution one is a fool’s errand, as the information just isn’t there. However, for stylized textures such as these I thought something might be done.

The first idea was to use HQ4x, an image scaling algorithm designed for pixel art. Hacking that into PPSSPP yielded the following result:
HQ4x

As you can see, it was pretty effective on the hard transparency edges of the hair and tree textures, but only increased the pixelation on the soft, anti-aliased edges of the cloth.

Luckily, the scaling of image art has advanced quite a bit since HQx was created, and I soon found an algorithm called xBR, created by Hyllian on the byuu.org message boards. The source code for xBRZ, a slightly improved and parallelizable implementation of xBR, is available as part of the HqMAME project. It deals much better with anti-aliased edges, and integrating it into PPSSPP ended up looking like this:

xBRZ

It’s a generally great result, and better than HQ4x, with one drawback: the posterization of gradients. It’s not too apparent in the image above, but it can be very distracting in other scenes and games (e.g. it can look really bad in sky textures).

To circumvent that effect I had to take to Matlab. I came up with an algorithm that calculates a mask based on the local contrast of a texture, and then chooses between xBRZ and bilinear/bicubic texture scaling based on the mask value.
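In rough C++ terms, the per-pixel blend works like this (a sketch of the idea, not the code that shipped in PPSSPP):

```cpp
#include <cstdint>
#include <vector>

// Blend the xBRZ and bicubic scaling results per pixel, steered by a mask
// derived from the local contrast of the source texture: high-contrast
// (stylized edge) regions get xBRZ, smooth gradients get bicubic, which
// avoids xBRZ's posterization.
void hybridScale(const std::vector<uint32_t>& xbrz,
                 const std::vector<uint32_t>& bicubic,
                 const std::vector<float>& mask, // 1.0 = edge, 0.0 = smooth
                 std::vector<uint32_t>& out) {
    for(size_t i = 0; i < out.size(); ++i) {
        uint32_t result = 0;
        for(int shift = 0; shift <= 24; shift += 8) { // blend each 8-bit channel
            float a = (xbrz[i]    >> shift) & 0xFF;
            float b = (bicubic[i] >> shift) & 0xFF;
            result |= (uint32_t)(a * mask[i] + b * (1.0f - mask[i]) + 0.5f) << shift;
        }
        out[i] = result;
    }
}
```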
Contrast mask

Putting all of that together, and adding an additional deposterization step which improves the quality of compressed textures, I arrived at this:
Hybrid result

The initial version was very slow, particularly with bicubic scaling. So I also parallelized everything and added an SSE 4.1 version of the scaling function. You can try the final result in any recent build of PPSSPP.

There are still many things that could be explored for even better automatic texture scaling in emulators. One particular deficiency of xBR for texture scaling is how it deals with the borders of images. It simply assumes that the texture continues as on the border (i.e. replicates it). A better idea for textures could be to assume that the edge direction continues as it does on the border – this could reduce some tiling artifacts that appear when scaling.

Another interesting topic would be the replication of noise or small-scale detail on an upscaled texture, but it would require some in-depth analysis of the texture images which might not be feasible in real-time.

Oculus Rift

Two days ago I received my Oculus Rift developer kit. If you’re unfamiliar with the Rift, it’s an affordable Virtual Reality headset that had a successful kickstarter for developer kits last year.

My kit had a pretty long journey, going to Australia first. I used to think that people (particularly in the US) mixing up Austria and Australia was just a myth, but it seems like it actually happens:

Mislabeled Package

Tracking Information for the UPS order

But hey, all is well that ends well. It’s a really nicely packaged kit, and includes adapters for anywhere on earth and 3 times as many video cables as you need:

Box

You can find much better pictures of exactly what’s inside (and the great box!) elsewhere on the web.

Sadly, I don’t have much time to do development for the Rift or even much testing right now, but here are my first impressions:

  • It works! When you first put it on and look around, it really feels like an entirely new experience. I had a few people at work try it today, and all were really impressed as well.
  • The resolution is low, but not as bad as I expected. I think with the consumer version’s planned 1080p resolution and really nicely anti-aliased rendering, we’ll be fine for a while.
  • The pixel switching time of the current display is too long. Ideally, I think it should use something like an OLED display, with instant response.
  • The headtracking is really fast; I didn’t notice any perceptible delay.

I just tested using the “Oculus World Demo” included with the SDK, and I noticed that the reaction speed and even the blur with head movement seemed significantly better in windowed fullscreen mode than in “real” fullscreen mode. I’m not sure why this is the case; it could be that I had VSync on in real fullscreen.

Anyway, I hope I get more time to play around with it this weekend.

 

Implementing your own synchronisation primitives is tricky

This blog post is about the folly of implementing your own synchronization primitives without thinking about what compilers are allowed to do. If you’re not into low-level C/x64 parallelism programming then you can safely skip it ;)

In the Insieme project, we use one double-ended work stealing queue per hardware thread which can be independently accessed (read and write) at both ends. It’s implemented as a circular buffer with a 64 bit control word.

The original code for adding a new item to this queue looked something like this:
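(The listing below is a reconstruction based on the snippets discussed further down; the field names top_val and top_update come from those snippets, while the exact layout, WB_SIZE and the locking details are my guesses.)

```cpp
#include <stdint.h>

#define WB_SIZE 64
typedef struct work_item work_item;

// 64 bit control word: packs the queue indices together with
// "update in progress" markers, so the whole state can be swapped atomically.
typedef union {
    uint64_t all;
    struct {
        uint16_t top_val;     // index of the current top element
        uint16_t top_update;  // slot a push in progress is writing to
        uint16_t bottom_val;
        uint16_t bottom_update;
    };
} wb_state;

typedef struct {
    wb_state state;
    work_item* items[WB_SIZE];
} work_buffer;

// Push wi at the top end of the circular buffer.
int wb_push(work_buffer* wb, work_item* wi) {
    wb_state state, newstate;
    state.all = wb->state.all;
    newstate.all = state.all;
    newstate.top_update = (newstate.top_val + 1) % WB_SIZE; // claim the next slot
    // "lock" the top end by atomically publishing top_update
    if(!__sync_bool_compare_and_swap(&wb->state.all, state.all, newstate.all))
        return 0; // contention, let the caller retry
    wb->items[newstate.top_update] = wi;      // write the item ...
    wb->state.top_val = newstate.top_update;  // ... then unlock by committing it
    return 1;
}
```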

Now, this generally worked fine in practice, but in unit tests roughly every 21 millionth insertion failed. After chasing a few wrong leads, I figured out that declaring newstate as volatile fixed the issue. The problem with this, of course, is that it makes no sense: newstate is a local variable stored on the stack of the executing thread – it cannot be accessed by any other thread.

In the end, understanding the issue required looking at the assembly generated for both versions. Here’s what gcc does in the non-volatile version:

And here’s the volatile one:

As you can see from the comments in the first version, we started interpreting the assembly from the top. That was a mistake. If you look at the last few lines, you can see the culprit. The line mov QWORD PTR [rdi+8+rax*8], rsi corresponds to wb->items[newstate.top_update] = wi;. In the non-volatile version, gcc decides to move that line below the unlocking of the data structure. This is a perfectly valid transformation, since there are no dependencies between the two lines (gcc is obviously unaware of any parallelism going on).

There are many ways to fix the issue: add a memory barrier (__sync_synchronize in gcc), do the assignment using an atomic exchange operation, or, if you want to stay in pure C, write (wb->items[newstate.top_update] = wi) && (wb->state.top_val = newstate.top_update);, which is admittedly ugly and only works because wi is never NULL. Sadly, all of these options carry a slight performance penalty. If anyone knows another portable way to enforce the ordering of operations in this case, I’d be happy to hear about it.
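Concretely, the barrier variant just pins the order of the two stores (following the reconstruction above):

```cpp
wb->items[newstate.top_update] = wi;      // write the item
__sync_synchronize();                     // full barrier: the store above can no
                                          // longer sink below the unlock
wb->state.top_val = newstate.top_update;  // unlock
```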

And that’s it, more or less. Lessons learned: take care when implementing your own synchronization primitives. If you think you are taking care, take more care. And when comparing assembly, look at the obvious differences before starting to interpret the code top-down.

PtBi 4.1516

I just fixed the crash bug in PtBi introduced with the latest NVidia WHQL drivers.

If anyone from NV is reading this, I really don’t think having a:

should cause the shader compiler to spit out this:

It works just fine without the “restrict”.

Anyway, if you’re using PtBi with a NV GPU then you can find an updated, working version on the PtBi homepage. Sorry for the delay in fixing this.

DSfix 2.0.1

With yesterday’s 2.0 release I introduced an issue with the HUD modifications. It’s fixed now. That’s all that has changed.

People are also reporting some stability problems and physics issues since the patch, but I’m not sure those are related to DSfix. On the bright side, it seems like in addition to fixing the stereo downmix, the patch also somewhat reduced the CPU load of the game.

As always, consider donating if you like the mod.

Get DSfix 2.0.1 here.

Edit: Mediafire decided to take the file down for some reason, here is a mirror.

You can also always get DSfix at the Dark Souls Nexus.

DSfix 2.0

Dark Souls was updated today, fixing the audio downmixing bug that had been present since launch (and maybe more?). Unfortunately, it also broke some features of DSfix, most significantly the FPS unlocking.

Well, with a lot of help from Clément Barnier, here is version 2.0 of DSfix which resolves these issues and adds a small new feature.

Changes:

  • Updated the framerate unlock feature to work with the patched version of the game (Nwks)
  • Updated post-processing AA to work with the patched version of the game
  • Fixed an issue where hudless screenshots would sometimes not correctly capture some effects
  • Added “presentWidth” and “presentHeight” to the .ini for full control over (windowed) downsampling. For example, if you want to downsample from 2560×1440 to 1080p, you would use renderWidth 2560, renderHeight 1440, presentWidth 1920 and presentHeight 1080. If none of that makes sense to you just leave these values at 0 ;)
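In .ini terms, that example looks like this (assuming the same key value syntax as the other settings in DSfix.ini):

```ini
renderWidth 2560
renderHeight 1440
presentWidth 1920
presentHeight 1080
```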

I hope this allows you to enjoy Dark Souls in its full glory again. Happy holidays!

As always, consider donating if you like the mod.

Get DSfix 2.0 here.

It’s 4 am here now so if I messed up anything in this release it will have to wait until tomorrow.