I’m a pretty big proponent of C++ as a language, and particularly enthused about C++11 and how it makes the language even better. However, sadly, reality still lags a bit behind the specification in many areas.
One thing that was always troublesome in C++, particularly in high-performance or realtime programming, was that there was no standard, platform-independent way of getting a high-performance timer. If you wanted cross-platform compatibility and a small timing period, you had to go with some external library, use OpenMP, or roll your own on each supported platform.
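For illustration, the roll-your-own approach typically looked something like the sketch below (a hedged example of the general pattern, not code from any particular project): QueryPerformanceCounter on Windows and clock_gettime on POSIX systems, hidden behind a small wrapper function.

// Sketch of a pre-C++11 cross-platform timer: returns seconds since some
// arbitrary epoch, suitable only for measuring intervals.
#ifdef _WIN32
#include <windows.h>
double now_seconds() {
    LARGE_INTEGER freq, count;
    QueryPerformanceFrequency(&freq);  // ticks per second
    QueryPerformanceCounter(&count);   // ticks since boot
    return static_cast<double>(count.QuadPart) / static_cast<double>(freq.QuadPart);
}
#else
#include <time.h>
double now_seconds() {
    timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);  // POSIX monotonic clock
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}
#endif

(On older glibc versions the POSIX variant also required linking against librt.)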
In C++11, the chrono library was introduced. At least in theory, it provides everything you always wanted in terms of timing, right there in the standard library. Three different clock types are offered for different use cases: system_clock, steady_clock and high_resolution_clock.
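The intended usage is pleasantly simple; a minimal sketch (with an arbitrary busy loop standing in for the code being timed):

#include <chrono>
#include <iostream>

int main() {
    using namespace std::chrono;

    auto start = high_resolution_clock::now();

    // Work to be timed; volatile prevents the loop from being optimized away.
    volatile long long sum = 0;
    for (int i = 0; i < 1000000; ++i) sum += i;

    auto stop = high_resolution_clock::now();
    std::cout << duration_cast<nanoseconds>(stop - start).count() << " ns\n";
}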
Yesterday I wrote a small program to query and test these clocks in practice on different platforms. Here are the results:
============================================
Linux, GCC 4.8.1
--------------------------------------------
Clock info for High Resolution Clock:
period: 1 ns
unit: 1 ns
Steady: false

Clock info for Steady Clock:
period: 1 ns
unit: 1 ns
Steady: true

Clock info for System Clock:
period: 1 ns
unit: 1 ns
Steady: false

Time/iter, no clock: 1 ns
Time/iter, clock: 120 ns
Min time delta: 110 ns

============================================
Windows, Visual Studio 2012
--------------------------------------------
Clock info for High Resolution Clock:
period: 100 ns
unit: 100 ns
Steady: false

Clock info for Steady Clock:
period: 100 ns
unit: 100 ns
Steady: true

Clock info for System Clock:
period: 100 ns
unit: 100 ns
Steady: false

Time/iter, no clock: 2 ns
Time/iter, clock: 9 ns
Min time delta: 1000000 ns
So, sadly, everything is not as great as it could be yet. For each platform, the first three blocks show the values reported by each clock, and the last block contains values determined by repeated measurements:
- “period” is the tick period reported by each clock, in nanoseconds.
- “unit” is the unit used by clock values, also in nanoseconds.
- “steady” indicates whether the time between ticks is always constant for the given clock.
- “time/iter, no clock” is the time per loop iteration for the measurement loop without the actual measurement. It’s just a reference value to better judge the overhead of the clock measurements.
- “time/iter, clock” is the average time per iteration, with clock measurement.
- “min time delta” is the minimum difference between two consecutive, non-identical time measurements.
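That last value is determined roughly as in the condensed sketch below (the relevant part of the full program shown further down): collapse runs of identical samples, take adjacent differences, and keep the smallest one.

#include <algorithm>
#include <numeric>
#include <vector>

long long min_time_delta(std::vector<long long> samples) {
    // Collapse consecutive identical timestamps so only actual changes remain.
    auto last = std::unique(samples.begin(), samples.end());
    // Replace each element with the difference to its predecessor.
    std::adjacent_difference(samples.begin(), last, samples.begin());
    // The first element is not a difference, so skip it.
    return *std::min_element(samples.begin() + 1, last);
}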
On Linux with GCC 4.8.1, all clocks report a tick period of 1 nanosecond. There isn’t really a reason to doubt that, and it’s obviously a great granularity. However, the drawback is that it takes around 120 nanoseconds on average to get a clock measurement. This would be understandable for the system clock, but seems excessive in the other cases, and could cause significant perturbation when trying to measure/instrument small code areas.
On Windows with Visual Studio 2012, a clock period of 100 nanoseconds is reported, but the actual measured tick period is a whopping 1000000 ns (1 millisecond). That is obviously unusable for many of the use cases that would call for a “high resolution clock”. Windows is perfectly capable of supplying a true high-resolution clock measurement, so this performance (or lack of it) is quite surprising. On the bright side, a measurement takes just 9 nanoseconds on average.
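Until the library catches up, one workaround is to wrap QueryPerformanceCounter in a chrono-compatible clock yourself and use that in place of high_resolution_clock. The sketch below shows the general idea (qpc_clock is a made-up name for illustration; the conversion is the usual tick-to-nanosecond arithmetic):

#ifdef _WIN32
#include <windows.h>
#include <chrono>

// Hypothetical chrono-compatible clock built on QueryPerformanceCounter.
struct qpc_clock {
    typedef std::chrono::nanoseconds duration;
    typedef duration::rep rep;
    typedef duration::period period;
    typedef std::chrono::time_point<qpc_clock> time_point;
    static const bool is_steady = true;

    static time_point now() {
        static const long long freq = [] {
            LARGE_INTEGER f;
            QueryPerformanceFrequency(&f);  // ticks per second, fixed at boot
            return f.QuadPart;
        }();
        LARGE_INTEGER count;
        QueryPerformanceCounter(&count);
        // Split into whole seconds and remainder to avoid overflowing 64 bits.
        const long long secs = count.QuadPart / freq;
        const long long rem  = count.QuadPart % freq;
        return time_point(duration(secs * 1000000000ll + rem * 1000000000ll / freq));
    }
};
#endif

Since the test program below only touches the clock through now() and is_steady, a clock like this could be dropped in with minimal changes.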
Clearly, both implementations tested here still have a way to go. If you want to test your own platform(s), here is the very simple program:
#include <chrono>
#include <iostream>
#include <vector>
#include <algorithm>
#include <numeric>

using namespace std;

template<typename C>
void print_clock_info(const char* name, const C& c) {
    typename C::duration unit(1);
    typedef typename C::period period;
    cout << "Clock info for " << name << ":\n"
         << "period: " << period::num*1000000000ull/period::den << " ns \n"
         << "unit: " << chrono::duration_cast<chrono::nanoseconds>(unit).count() << " ns \n"
         << "Steady: " << (c.is_steady ? "true" : "false") << "\n\n";
}

int main(int argc, char** argv) {
    chrono::high_resolution_clock highc;
    chrono::steady_clock steadyc;
    chrono::system_clock sysc;
    print_clock_info("High Resolution Clock", highc);
    print_clock_info("Steady Clock", steadyc);
    print_clock_info("System Clock", sysc);

    const long long iters = 10000000;
    vector<long long> vec(iters);

    // Reference loop without any clock calls.
    auto ref_start = highc.now();
    for(int i = 0; i < iters; ++i) {
        vec[i] = i;
    }
    cout << "Time/iter, no clock: "
         << chrono::duration_cast<chrono::nanoseconds>(highc.now() - ref_start).count()/iters
         << " ns\n";

    // Same loop, but with a clock measurement in every iteration.
    auto start = highc.now();
    for(int i = 0; i < iters; ++i) {
        auto time = chrono::duration_cast<chrono::nanoseconds>(highc.now() - start).count();
        vec[i] = time;
    }
    cout << "Time/iter, clock: "
         << chrono::duration_cast<chrono::nanoseconds>(highc.now() - start).count()/iters
         << " ns\n";

    // Minimum difference between consecutive, non-identical measurements.
    auto end = unique(vec.begin(), vec.end());
    adjacent_difference(vec.begin(), end, vec.begin());
    auto min = *min_element(vec.begin() + 1, end);
    cout << "Min time delta: " << min << " ns\n";
}
I did some tests with this issue on Win32, as we have similar problems in one of our applications, and I found a partial solution.
It seems that on systems newer than XP you can use a trick that enables (or rather forces) the system to use the HPET. You need to run cmd in Administrator mode and issue the command:
bcdedit /set useplatformclock true
and reboot the system after that. Running a few tests shows a huge difference in timer performance after this, but it makes me wonder why HPET usage is not enabled by default. I understand it can cause problems with older systems without an HPET, or with some incompatible ones. Maybe it results in more problems that I’m not aware of at the moment.
That’s very interesting, I’ll try that the next time I reboot (which could be a while). Thanks for sharing.
For Windows 8 the correct syntax seems to be different and the above command fails; “bcdedit /set {default} USEPLATFORMCLOCK on” seems to work instead.
I tried building it with MinGW on Windows (it uses GCC 4.7.2).
It compiles but the results aren’t great, I guess:
Clock info for High Resolution Clock:
period: 1000 ns
unit: 1000 ns
Steady: false
Clock info for Steady Clock:
period: 1000 ns
unit: 1000 ns
Steady: false
Clock info for System Clock:
period: 1000 ns
unit: 1000 ns
Steady: false
Time/iter, no clock: 6 ns
Time/iter, clock: 155 ns
Min time delta: 1000000 ns
and with -O3 optimization:
Clock info for High Resolution Clock:
period: 1000 ns
unit: 1000 ns
Steady: false
Clock info for Steady Clock:
period: 1000 ns
unit: 1000 ns
Steady: false
Clock info for System Clock:
period: 1000 ns
unit: 1000 ns
Steady: false
Time/iter, no clock: 3 ns
Time/iter, clock: 99 ns
Min time delta: 1000000 ns
It would appear that they are using the same underlying timing mechanism as the Visual Studio version does, but lying less about the period. The steady clock not being steady is simply a violation of the standard though. Thanks for reporting the results.
I read that GCC has an --enable-libstdcxx-time=rt option when building, which should generate a better implementation. I’ll have a look at that soon.
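If I understand the libstdc++ documentation correctly, that flag is passed when configuring GCC’s own build, along the lines of the following (illustrative, untested on my side):

../gcc-4.8.1/configure --enable-libstdcxx-time=rt [other configure options]
make && make install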
I’d be really interested in the results of the Clang standard library on OS X, but I don’t have access to the OS.