So I'm writing some code that needs to cache things in a map. The key in the map corresponds to a unique sender. I want my senders to be able to get guaranteed-unique IDs without any server, peer-to-peer or static config; what they have is a string and an int that in combination, is guaranteed unique.
But I think that's going to suck as a map key. Strings need memory accesses and lots of them. Ints can live in a register and compare in a single cycle.
I tried really hard to come up with a way of making them both merge into an int, without any luck. Mr. Boss Man suggested I should try actually measuring the difference between different types as map keys.
Here's the core of the test code I wrote:
You call this with a number of map items, number of lookups, and a pointer to a function that gets you a randomly generated key. (Random number generator collisions are probably an issue here but I chose to not bother with that.) It then fills up a vector with random keys, stuffs them into a map, and then looks them up. The times for filling the two collections are just data porn, the last one is what matters.
I then ran it with five different key types:
int
std::string
std::pair<int, int>
std::pair<int, std::string>
std::pair<std::string, int>
Here's the results in Excel-o-vision, for varying map sizes and averaged across 10,000,000 lookups:
They're roughly linear in log(n) as you might expect. Here they are as relative slowness compared to int
:
So, for small maps (as I'm expecting in my case):
string
is about 6x worse than int
pair<int, int>
is on a par with int
int
and string
, put the int
first (as it allows early exit from the comparison operator)
Update: These tests will of course have benefited hugely from the CPU caches. In real life, that doesn't always happen. So I added some code to nuke the CPU cache (just a large memcpy()
and being careful to exclude it from the timings) and got these:
A bit uglier, obviously a lot slower... the "cost" is less extreme, but it still produces similar conclusions, I think. Lookup is going to be on the microsecond level with strings and half with ints.