
A Procedure to Identify and Fix Longrunning Queries

A "longrunning query" is one that takes too long to run. So how long is "too long"? That depends entirely on your specific environment. In my experience, longruns tend to fall into one of three categories:

This paper describes the method I currently use for detecting, analyzing, and fixing longrunning queries. The advantage of this procedure is that, with the proper scripting, longrunning query detection can be automated: problem queries are automatically e-mailed or written to a file to be addressed when the DBA has the time. While this is not a completely proactive approach, it does provide enough information to identify and address problems before they become severe. This article also covers what to look for once you have collected your longrunning queries, how to go about fixing the problems, and what to test. Hopefully this method will be as effective for you as it has been for me.

The System.

In order to improve query performance, you must first identify the longruns. This can be done with the Ingres utilities "iinamu" and "iimonitor". For example:
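A rough sketch of those manual steps (the server address is a placeholder, and the exact commands may vary slightly by Ingres release):

```
# List the DBMS servers registered with the name server:
$ echo "show ingres" | iinamu

# Attach to one of the reported server addresses and list its sessions:
$ iimonitor <server-address>
IIMONITOR> show all sessions
IIMONITOR> quit

# A session whose query is still shown as executing across several
# successive checks is a longrun candidate.
```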

Ideally, this process can be automated through the use of a script that generates e-mail to the DBA group (and/or writes to a file) as soon as a query is seen to pass over a given threshold. The script should be coded in such a way that the threshold value can be easily adjusted. This way, as you fix problem queries, you can force the acceptable threshold down to detect the next set of problems.
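Such a script might be structured along these lines. This is a minimal sketch, not a complete monitor: it assumes your site already has a way to parse each session's elapsed time (in seconds) out of iimonitor output, and the threshold, mail address, and log file name are all placeholders to adjust.

```shell
#!/bin/sh
# Sketch of an automated longrun detector. The elapsed times fed to
# over_threshold() would come from parsing iimonitor session output.

THRESHOLD=${THRESHOLD:-600}        # seconds; lower this as longruns get fixed
DBA_MAIL="dba-group@example.com"   # hypothetical DBA group address
LOGFILE=longruns.log

# Succeeds (exit 0) when an elapsed time crosses the threshold.
over_threshold() {
    [ "$1" -gt "$THRESHOLD" ]
}

# Record a problem query and (optionally) notify the DBA group.
report() {
    echo "$1" >> "$LOGFILE"
    # echo "$1" | mail -s "longrun detected" "$DBA_MAIL"   # enable at your site
}

# Example: a query seen running for 750 seconds would be reported.
if over_threshold 750; then
    report "750s: <query text captured from iimonitor>"
fi
```

Keeping the threshold in a single variable (or an environment override) is what lets you ratchet it down as each batch of problems is fixed.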

QEP Review.

This document assumes that the reader has a basic understanding of Ingres query execution plans, or QEPs, including how to generate and interpret them. For those of you requiring a "brush up" on the process, I recommend the Ingres "DBA’s Guide" or the Ingres tech support document "US: 38697, EN".

Here are some of the basic tips for reading QEPs.

What To Look For.

Once you have built your "hit list" of queries, you need to determine which ones you are going to spend your time on. The process of generating and analyzing a query can take from 15 minutes, to several hours, to several days depending on the complexity of the query, the amount of time it takes to run, and your familiarity with the data and the database. A common misconception is that you need to be intimately familiar with the environment prior to recommending changes. Not so! Knowing how to interpret a QEP and how to retrieve table and index information from Ingres are enough to solve most problems.

In order to start prioritizing the queries, my first step is to look at how frequently each query appears; queries that run more often will generally be addressed first. Next, place all of the queries into a script with the commands "set qep; set optimizeonly;" at the top and run it against the production database. Since you are only generating QEPs and not executing the queries, there should be no ill effects on the production machine. Remember that the QEP estimates are just that: estimates. If a query is in your longrun list but generates a QEP that looks fine, you will need to do further analysis. That analysis includes the use of the Ingres trace point "QE90" (more information on this is provided below). I also recommend assigning each query a number at this point, as it makes tracking these queries easier.
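The batch QEP generation step can be sketched as a small wrapper script. The input file, output names, and database name below are placeholders; "set optimizeonly" is what stops Ingres from actually executing the queries, which is what makes this safe against production.

```shell
#!/bin/sh
# Wrap the collected longrun queries so that only QEPs are produced.

QUERY_FILE=${QUERY_FILE:-longruns.sql}   # the collected problem queries
QEP_SCRIPT=qep_run.sql

{
    echo "set qep;"
    echo "set optimizeonly;"
    cat "$QUERY_FILE" 2>/dev/null
} > "$QEP_SCRIPT"

# Then feed it through the terminal monitor and keep the output:
#   sql proddb < qep_run.sql > qep_output.txt
```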

One thing to look for at this point is patterns in the queries. For example, is there one table (or group of tables) involved in several problem queries? If so, a change you make to one of these tables could fix the other queries. (Of course, it could also make them much worse.) Once you start capturing queries from the servers on a daily basis, you can determine which queries occur most frequently. These make it to the top of my list, as does anything where I feel I can make a "quick fix". (Problems solved here are usually the result of developer error: disjoint queries, use of functions on key columns in the where clause, etc.)

Now that you have identified which queries to focus on, get the table information ("help table xxxx") for each table involved in the query. If your site uses indexes that carry non-key data columns, you will also need a "help index xxxx" to see which data columns are in each index. Once you have that information handy, you can start looking for the following:
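In the terminal monitor, that information comes from statements like the following ("xxxx" is a placeholder for the actual table or index name):

```
* help table xxxx\g     -- storage structure, row count, columns, keys
* help index xxxx\g     -- key columns plus any non-key data columns
```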

How to Fix It.

There are many things you can do to improve a problem query. Unfortunately, there are tradeoffs involved in virtually all of them. What you need to do is to balance the improvements to the query in question versus the degradation that the change may cause in other areas. (This is usually referred to as an "impact analysis" of the change.)

Other possible solutions:

Testing Your Changes.

Once I begin testing changes intended to improve the performance of longruns, I print (or save to a file) a copy of the QEP from each change made. I annotate the QEP to indicate what the change was, why I made it, and whether it was helpful. This way I am able to go back and recreate any test situation I encountered; it also keeps me from running the same test twice. When performing this type of testing, it is important to change only one thing at a time. My personal preference is to make a change, test it, and then determine whether it worked. If it worked, I document it and move on to the next change. If the change does not produce the desired effect, I document that fact (including the relevant metrics) and back the change out.

A helpful trace point to use when working with QEPs is "QE90". Setting this trace point causes Ingres to print actual disk and CPU statistics for the current query. This lets you see how accurately the optimizer is estimating the disk requirements and row counts for a query. (If these numbers are way off, it usually points to a problem with statistics.) Usually I will use "QE90" as part of my final testing, unless I suspect a problem with the statistics during my initial QEP analysis.
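A typical QE90 session looks something like the following sketch (the database name and query are placeholders; note that the query does execute here, since QE90 reports actual statistics to compare against the QEP estimates):

```
$ sql yourdb
* set trace point QE90\g
* set qep\g
* select ... your longrun query ... \g
```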

An additional part of the final testing (prior to production implementation) should involve testing any other queries that use the table(s) being modified. You should test on a representative sample of data. If you have a 6 GB database and your test system is only 2 GB you are not going to be able to effectively test performance (unless it contains full copies of the tables being used). Performance problems usually do not follow a straight-line progression. (e.g., if a query runs in 20 minutes on the 2 GB database, it does not follow that it will run in 60 minutes on the 6 GB database). Additionally, when you test you should vary the selection criteria in the where clause of the query you are testing. This will cause the optimizer to read different cells in the histogram for the table in question, and may help to point out problems with statistics.
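Varying the selection criteria can itself be scripted. A sketch, in which the table name, key column, and value list are all hypothetical stand-ins for your own query:

```shell
#!/bin/sh
# Generate one QEP script per key value so the optimizer reads different
# histogram cells; compare the resulting estimates across the values.

for val in 100 5000 99999; do
    {
        echo "set qep; set optimizeonly;"
        echo "select * from orders where order_no = $val;"
    } > "qep_$val.sql"
    # sql testdb < "qep_$val.sql" > "qep_$val.out"    # run when ready
done
```

If the estimated row counts swing wildly (or stay suspiciously flat) as the values change, re-running optimizedb on the table is the usual next step.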

Sometimes what looks to be a problem query may in actuality be a contention problem. If a query is waiting on a lock, it will appear in the server each time you look in iimonitor until the lock is released or the query times out. The easiest way to identify these problems is to run the query (and observe it through IPM) prior to making any changes. Make sure that you are fixing the real problem.

Don’t be discouraged if your changes don’t work! There are times when you will spend all day working on a query only to determine that it’s inherently a bad query and there is nothing that you can do to fix it. Conversely, it is possible to look at a problem query and make a quick change that reduces the run time from 20 hours to 2 minutes. The more you practice this process the easier it becomes!

Feel free to send me an e-mail (mailto:JSchmidt@Comp-Soln.com) if you have any questions or comments about this article.

Jason Schmidt is a staff Consultant working in Akron, OH, and is also currently the NAIUA Product Directions Chairperson.

Copyright © 1999 Comprehensive Consulting Solutions, Inc., All Rights Reserved

Comprehensive Consulting Solutions, Inc.
P.O. Box 511276
New Berlin, WI 53151
U.S.A.

Phone: (262) 544-9954
FAX: (262) 544-1236

All product names referenced herein are trademarks of their respective companies

This document is provided for informational purposes only, and the information herein is subject to change without notice. Please report any errors herein to Comprehensive Consulting Solutions. Comprehensive Consulting Solutions, Inc. does not provide any warranties covering and specifically disclaims any liability in connection with this document.

 
