Back in May 2006, I needed to performance-tune a web application built on Plone. The symptom was an unacceptable 60 seconds delay upon creating a new user account on top of the already existing 10000 accounts. The source of the delay was very soon traced back by Zope’s built-in call profiler to GroupUserFolder, where the solution was instantly obvious.
The original method was checking for existing users by using linear list lookup, which is known to be algorithmically slower than dictionary key lookup. Knowing that, the performance patch was a no-brainer, and, as expected, it eliminated the reported delay. I’ve sent this patch to ingeniweb (the author of GroupUserFolder) in an email on 22 May 2006, but haven’t heard back from them since. Yet, the patch has been accepted, even got refined a bit and was applied on the code trunk on 13 Jun 2006. From the one side, I’m happy to see it at least being fixed, while on the other side, I could have been more happy receiving a feedback on my report.
Nevertheless, I was still wondering how exactly the two lookup methods compare, so I’ve put together a python script to check how much we can actually gain on using haystack_dict.has_key(needle) instead of needle in haystack_list. I’ve compared them with small to larger sample sizes in small to larger sets (and lists) of samples.
I’ve used TOPCAT, an excellent, yet easy to use visualization tool for tabular data, to graphically represent the results.

As you can see, even at the small sets of small samples region, has_key performs much faster, and with the growing set and sample sizes, it keeps performing equally well, while list lookup goes wild in the same time.
Can’t say I was surprised or learned anything by doing this, but thought I’d mention as it might turn out to be useful for someone out there, so let me know if you’ve found this useful.


Recent Comments