« Introducing ClickHeat for ColdFusion, a clicks heatmapping application | Main | "How do you pronounce URL?" Techrepublic, who cares?!? »

Yet another lesson on database indexing... (and solving "null null" and that-query-doesn't-exist errors)

Boy, am I in a much better mood today than in the last two days.

I had come into the office over the weekend to re-install ColdFusion MX 7 on our production server, thinking that switching to single-server mode from multiple-server mode might solve some of the strange, unpredictable errors that we were seeing on our live site. The re-install, which we had practiced on our dev server first, went without a hitch and the site seemed to perform quite well. But as Saturday progressed to Sunday, the site saw even more errors than before. The same happened as the week began on Monday, with our site becoming increasingly slow and unstable.

The errors that we saw varied, and happened in all different parts of the site, but had one major theme in common: they all had to do with queries. For instance, we saw errors saying that "qQuiz" wasn't defined when we tried to loop over a query of that name-- but the cfquery tag for qQuiz, which executed before the loop, didn't throw an error. We also saw errors stating that queries were timing out, and even the infamous "null null" errors, which you see if you're using cferror templates to provide nicely-formatted errors for your users. Our pages, especially the queries themselves, were taking an amazingly long time to execute-- anywhere from 2 to 15 seconds per query. Try as a might, I couldn't get a handle on what the problem was...

I tried everything I could to figure out what was going on... I tried caching more queries. I tried caching fewer queries. I monitored the performance of our web server and database server, looking at CPU utilization, RAM utilization, ColdFusion logs, and the MS-SQL query profiler. We have another separate web server (with different web applications running on it) and it didn't look to be nearly as slow as our production server was. I tried updating our ODBC drivers and JDBC drivers, but didn't manage to make a dent in the execution time of the site. I told my boss that I was ready to give up and that we should consider bringing in an expert consultant to troubleshoot the issue since it was beyond me. Goodness knows how much that would have cost us, but it would have been worth it to have the site back up and running normally.

So this morning, before contacting a consultant, I thought I would take one last look at the database performance on the site with SQL Profiler. I started watching through a trace of the queries going through the database server to see if the latency was occurring on the database, which would indicate a slow db server, or just on the web server, which might indicate network problems or issues with our ODBC drivers. It turned out that the bottleneck did exist on the database, so I started trying to identify which queries were taking the longest. I did notice that there were some really simple inserts, albeit on huge tables with a million or two records, which were taking a long, long time (over 10 seconds) with a huge number of disk reads (thousands) to process. What gives, I thought?

So I looked at the corresponding SQL I was running in my code. It was a simple SELECT on one ID column given a second known ID column. Even if it was on a table with a million records, it shouldn't have taken 10 seconds. And then it hit me.

Indexes. Did I have an index on the second ID that was in my WHERE clause?

Of course not.

Now I wasn't at all sure that adding an index on this one column would fix all of the performance problems for the site, but it was worth a shot. But after the index was created... silence. Not a single error from the site. Just the smooth, even hum of a web site performing its duties with quiet pride.

I don't know whether I'm satisfied with myself to have found the source of the problem after two days of frantic investigation, or whether I'm embarrassed that the cause was a database administration technique that I had blogged about a few months before.

Post a comment


Type the characters you see in the picture above.