The Benchmarking Trap Crap

Blessed were the days when every time the newest Pentium processor came out it meant a 30% performance increase and people weren’t questioning it. These days it’s like everyone goes trigger-happy and tries to measure infinitesimal differences between different Core i7 or Xeon processors, as well as their graphics card performance, with all sorts of weird benchmarking tools. The latest Cinebench is no exception and the “Mine is bigger…!” race is already on. Naturally, I couldn’t resist commenting on the pointlessness of such stuff and already clashed notably with someone on mograph, so let me reiterate and expand upon some of the points why I think this stuff is useless and meaningless, some of which I already made back then.

  • Synthetic benchmarks such as Cinebench or 3D Mark tell you very little in terms of real everyday performance. They count CPU/GPU clock cycles, internal loops/cycles/ticks, measure transfer bandwidths over certain data buses and whatnot. Other than producing a fancy bar chart with some abstract point score, this usually doesn’t tell you much beyond that computer A is faster than computer B. Naturally, these aforementioned computers could have totally different hardware configurations, one could be older than the other and so on. One would simply expect one of them to be faster no matter what.
  • Assuming the hardware outfit is the same, minor differences in software or configuration could have a huge influence. Differences in drivers, installed apps, screen configuration and auxiliary components like tablets and printers can affect timing of USB buses, screen refresh rates and so on, some of which may cause wait states, delayed screen refresh and an assortment of other oddities, which in turn might for instance affect your OpenGL performance.
  • As indicated by the previous point, your computer doesn’t run on thin air. Let’s be honest: Unless you do some serious digging, do you really know how many processes are running at any given time? I certainly don’t most of the time, yet there are a lot of things going on on my machine. My network may send out broadcasts looking for other computers in my workgroup and in turn hook up with other shared network resources, my system may get it in its head to check for updates at the most inconvenient time, my desktop apps may decide to check for their own updates, virus and security tools may become active at arbitrary intervals, etc. Normally you wouldn’t care about this stuff, but imagine how it messes with your meticulously counted cycles during your benchmarking test.
  • Even under ideal conditions, your benchmark will only give you a skewed result. At best it represents only a fraction of what is going on on your system and what it is capable of, even more so if you tailored your system to be a “gaming machine”, a “video edit box” or a “3D workstation” – you already have it configured specifically in some fashion, leaving some things out while emphasizing the others.
  • All the same, testing in the context of a specific program is not particularly telling and equally biased. I’ve been thinking about the ultimate After Effects benchmark for years, but it seems impossible. Once you try to figure in specifics like multiprocessing, disk I/O, computationally intensive effects or things like the EXR loader, you quickly arrive at a number of test cases that grows exponentially. Most people simply wouldn’t run those 120 or so test projects just to find out that their computer has been inadequate from the start.
  • There’s a difference between perceived performance and real technical performance. A poorly configured and maintained super-expensive high-end computer can still feel laggy and choppy, whereas a well-maintained and cared-for mid-range machine can feel smooth as butter. Of course what you will perceive as real performance will ultimately depend on what kind of work you do and how your own workflows are geared toward making efficient use of your technical resources. Anyone can make the best of computers bomb out with bad render settings in a 3D program. The trick is not to. Inevitably sometimes you won’t be able to escape some limits – swiveling around a 10 million polygon CAD scene is gonna feel slow at some point. When that happens, ask yourself: Is having just 20fps in the viewport still acceptable or would you throw 1000 bucks out of the window for a new graphics card just to get 30fps?
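
To get a feel for how much that background activity muddies a measurement, here is a minimal Python sketch. It is not how Cinebench or any real benchmark works; `workload`, `time_runs` and `summarize` are made-up names, and the loop is just a hypothetical stand-in for "some fixed amount of work". It simply times the same job several times and reports how far the slowest run strays from the fastest.

```python
import statistics
import time


def workload(n=200_000):
    """Hypothetical stand-in for a benchmark kernel: a fixed amount of arithmetic."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def time_runs(func, runs=15):
    """Call func() `runs` times and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Return min, median, max and the slowest-vs-fastest spread in percent."""
    fastest = min(durations)
    slowest = max(durations)
    return {
        "min": fastest,
        "median": statistics.median(durations),
        "max": slowest,
        "spread_pct": (slowest - fastest) / fastest * 100.0,
    }


if __name__ == "__main__":
    stats = summarize(time_runs(workload))
    for name, value in stats.items():
        print(f"{name}: {value:.4f}")
```

Run it a few times, once on an idle machine and once while something else is busy in the background, and compare the spread. If identical work on identical hardware already varies by a few percent from run to run, a forum post bragging about a difference of that size between two machines isn't telling you much.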

So what does that all mean? Mostly it’s just food for thought, but as may become clear, any benchmark should be taken with a grain, or even better, a spoonful of salt, and be measured against your own real needs. Benchmark scores and real needs are completely separate things, and while boasting about your score on forums may give you a certain satisfaction for a while, the harsh truth is that your top-performing machine now will be old news in half a year. So for what it’s worth, unless one is actively looking for a new machine or new components, such tests are not particularly relevant in any way. They certainly are not for me. I’m not a performance engineer in some software company, nor do I have the money to buy new hardware every month. I just need to get my work done, and whether my now 1.5 year old computer takes the lead in any list doesn’t really bake my noodle. Perhaps that will change when it gets “old” in 2 years, but until then I hope it will last me a while. And when it’s time for a replacement, I’m positive I will base any buying decision more on info from sites like Tom’s Hardware and advice from my favorite hardware geek Klaus rather than some benchmark…
