I’m a pretty big proponent of C++ as a language, and particularly enthused about C++11 and how it makes the language even better. Sadly, however, reality still lags a bit behind the specification in many areas.
One thing that was always troublesome in C++, particularly in high performance or realtime programming, was that there was no standard, platform independent way of getting a high performance timer. If you wanted cross-platform compatibility and a fine timer resolution, you had to use some external library, rely on OpenMP, or roll your own on each supported platform.
C++11 introduced the std::chrono namespace (in the <chrono> header). It, at least in theory, provides everything you always wanted in terms of timing, right there in the standard library. Three different types of clocks are offered for different use cases: system_clock, steady_clock and high_resolution_clock.
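As a rough illustration of what this looks like in use, here is a minimal sketch that times a piece of code with steady_clock. The helper name `elapsed_ns` is made up for this example and is not part of the standard library.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper (not a standard function): runs `work` once and
// returns the elapsed time in nanoseconds as measured by steady_clock.
template <typename F>
long long elapsed_ns(F work) {
    typedef std::chrono::steady_clock clock;
    const clock::time_point start = clock::now();
    work();
    const clock::time_point stop = clock::now();
    // A steady clock never goes backwards, so this should always hold.
    assert(stop >= start);
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}
```

Calling `elapsed_ns([]{ /* code to time */ })` returns the duration of the lambda body. How meaningful that number is depends entirely on the actual resolution and overhead of the clock, which is what the rest of this post looks at.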
Yesterday I wrote a small program to query and test these clocks in practice on different platforms. Here are the results:
```
============================================
Linux, GCC 4.8.1
--------------------------------------------
Clock info for High Resolution Clock:
period: 1 ns
unit: 1 ns
Steady: false

Clock info for Steady Clock:
period: 1 ns
unit: 1 ns
Steady: true

Clock info for System Clock:
period: 1 ns
unit: 1 ns
Steady: false

Time/iter, no clock: 1 ns
Time/iter, clock: 120 ns
Min time delta: 110 ns

============================================
Windows, Visual Studio 2012
--------------------------------------------
Clock info for High Resolution Clock:
period: 100 ns
unit: 100 ns
Steady: false

Clock info for Steady Clock:
period: 100 ns
unit: 100 ns
Steady: true

Clock info for System Clock:
period: 100 ns
unit: 100 ns
Steady: false

Time/iter, no clock: 2 ns
Time/iter, clock: 9 ns
Min time delta: 1000000 ns
```
So, sadly, not everything is as great as it could be yet. For each platform, the first three blocks are the values reported by each clock, and the last block contains values determined by repeated measurements with high_resolution_clock:
- “period” is the tick period reported by each clock, in nanoseconds.
- “unit” is the length of one tick of the clock’s duration type (the type its time values are expressed in), converted to nanoseconds.
- “steady” indicates whether the clock never goes backwards and the time between ticks is always constant.
- “time/iter, no clock” is the time per loop iteration for the measurement loop without the actual measurement. It’s just a reference value to better judge the overhead of the clock measurements.
- “time/iter, clock” is the average time per iteration, with clock measurement.
- “min time delta” is the minimum difference between two consecutive, non-identical time measurements.
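The reported values in the first three items come straight from each clock's member types and can be queried without taking a single measurement. The following is a minimal sketch of how that might look; the helper names `period_ns`, `tick_period_ns` and `unit_ns` are made up for this example.

```cpp
#include <cassert>
#include <chrono>
#include <ratio>

// Hypothetical helper: converts a std::ratio expressed in seconds per tick
// into nanoseconds per tick.
template <typename Period>
double period_ns() {
    // std::ratio always has a positive denominator; this guards against
    // other period-like types being passed in.
    assert(Period::den > 0);
    return 1e9 * static_cast<double>(Period::num) / static_cast<double>(Period::den);
}

// "period": the tick period that clock type C reports, in nanoseconds.
template <typename C>
double tick_period_ns() {
    return period_ns<typename C::period>();
}

// "unit": one tick of C's duration type, converted to nanoseconds.
template <typename C>
double unit_ns() {
    const typename C::duration one_tick(1);
    return std::chrono::duration<double, std::nano>(one_tick).count();
}
```

For example, `tick_period_ns<std::chrono::steady_clock>()` gives the "period" value and `std::chrono::steady_clock::is_steady` the "steady" value. None of this says anything about how often the clock value actually changes, which is why the measured values are listed separately.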
On Linux with GCC 4.8.1, all clocks report a tick period of 1 nanosecond. There isn’t really a reason to doubt that, and it’s obviously a great granularity. However, the drawback is that it takes around 120 nanoseconds on average to get a clock measurement, which likely also explains the minimum delta of 110 ns: two consecutive readings cannot be closer together than the cost of taking one. This overhead would be understandable for the system clock, but seems excessive in the other cases, and could cause significant perturbation when trying to measure or instrument small sections of code.
On Windows with Visual Studio 2012, a clock period of 100 nanoseconds is reported, but the actual measured tick period is a whopping 1000000 ns (1 millisecond). That is obviously unusable for many of the use cases that would call for a “high resolution clock”. Windows is perfectly capable of supplying a true high resolution clock measurement, so this performance (or lack of it) is quite surprising. On the bright side, a measurement takes just 9 nanoseconds on average.
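Until the implementations catch up, one possible workaround is to wrap the platform's native timer in a type that looks like a chrono clock. The following is only a sketch under stated assumptions: the name `native_clock` is made up, the Windows branch assumes QueryPerformanceCounter is the timer you want, and the other branch assumes a POSIX system with CLOCK_MONOTONIC.

```cpp
#include <cassert>
#include <chrono>
#ifdef _WIN32
#include <windows.h>
#else
#include <time.h>
#endif

// Hypothetical clock type (not part of the standard library) that exposes
// the platform's native monotonic timer through a chrono-style interface.
struct native_clock {
    typedef std::chrono::nanoseconds duration;
    typedef duration::rep rep;
    typedef duration::period period;
    typedef std::chrono::time_point<native_clock, duration> time_point;
    static const bool is_steady = true;

    static time_point now() {
#ifdef _WIN32
        LARGE_INTEGER freq, count;
        QueryPerformanceFrequency(&freq);
        QueryPerformanceCounter(&count);
        // Convert whole seconds and the remainder separately so that the
        // multiplication by 1e9 does not overflow 64 bits.
        const long long whole = count.QuadPart / freq.QuadPart;
        const long long part = count.QuadPart % freq.QuadPart;
        return time_point(duration(whole * 1000000000LL
                                   + part * 1000000000LL / freq.QuadPart));
#else
        timespec ts;
        const int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
        assert(rc == 0);
        (void)rc;  // silence the unused-variable warning when NDEBUG is set
        return time_point(duration(static_cast<rep>(ts.tv_sec) * 1000000000LL
                                   + ts.tv_nsec));
#endif
    }
};
```

Because it provides the same member types and a static `now()`, such a type can be dropped into code that is templated on the clock. Whether it actually beats the standard clocks on a given platform is something to measure rather than assume.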
Clearly, both implementations tested here still have a way to go. If you want to test your own platform(s), here is the very simple program:
```cpp
#include <algorithm>
#include <chrono>
#include <iostream>
#include <numeric>
#include <vector>

using namespace std;

// Print the tick period, duration unit and steadiness reported by clock type C.
template<typename C>
void print_clock_info(const char* name, const C&) {
    typename C::duration unit(1);
    typedef typename C::period period;
    cout << "Clock info for " << name << ":\n"
         << "period: " << period::num*1000000000ull / period::den << " ns \n"
         << "unit: " << chrono::duration_cast<chrono::nanoseconds>(unit).count() << " ns \n"
         << "Steady: " << (C::is_steady?"true":"false") << "\n\n";
}

int main() {
    chrono::high_resolution_clock highc;
    chrono::steady_clock steadyc;
    chrono::system_clock sysc;
    print_clock_info("High Resolution Clock", highc);
    print_clock_info("Steady Clock", steadyc);
    print_clock_info("System Clock", sysc);

    const long long iters = 10000000;
    vector<long long> vec(iters);

    // Reference loop: the same work as below, but without reading the clock.
    auto ref_start = highc.now();
    for(long long i=0; i<iters; ++i) {
        vec[i] = i;
    }
    cout << "Time/iter, no clock: "
         << chrono::duration_cast<chrono::nanoseconds>(highc.now()-ref_start).count()/iters
         << " ns\n";

    // Measurement loop: store one clock reading per iteration.
    auto start = highc.now();
    for(long long i=0; i<iters; ++i) {
        auto time = chrono::duration_cast<chrono::nanoseconds>(highc.now()-start).count();
        vec[i] = time;
    }
    cout << "Time/iter, clock: "
         << chrono::duration_cast<chrono::nanoseconds>(highc.now()-start).count()/iters
         << " ns\n";

    // Drop consecutive identical readings, then find the smallest step
    // between two distinct readings.
    auto end = unique(vec.begin(), vec.end());
    adjacent_difference(vec.begin(), end, vec.begin());
    auto min_delta = *min_element(vec.begin()+1, end);
    cout << "Min time delta: " << min_delta << " ns\n";
}
```