As a database developer/administrator,we will often come across queries that take long time to execute. Here is what I would do when I am told to optimize a slow running query:
Analyze the query execution plan
First step in tuning the performance of a badly performing query is to take a look at the query execution plan. If an index is used to retrieve the result set, you will see Index Seek; if an index is not used, you will see Table Scan for a heap or a Clustered Index Scan in the case of an index-based table. Try adding appropriate indexes to the table being scanned, if you are not much familiar with the data model and the business needs, leverage Database Tuning adviser for appropriate index recommendations. However, if indexes already exist on the tables, try to find out why the indexes are not being used. If you see Clustered Index Scan, try to replace it with Index Seek by creating index on the most restrictive condition used in the query. Also, make sure that the indexes are not fragmented and the statistics on the tables are up to date. If the indexes are heavily fragmented or if statistics on the tables are not up to date, the optimizer will ignore the existing indexes and will likely generate a less efficient execution plan. You can find out index fragmentation using the sys.dm_db_index_physical_stats Dynamic Management Function(DMF) and to analyze statistics on indexes, you can use DBCC SHOW_STATISTICS. If there are key lookup(also known as bookmark lookup) operations being performed, see if you can use a covering index to avoid key lookups. Key lookup requires access to both index pages and data pages and is a very expensive operation which might cause performance degradation, especially in the case of a large table.
Analyze and tune the query
Once you are done analyzing the query execution plan, the next step step will be to optimize the query itself. Try to rewrite the query in another way if you think it will help improve the performance. Here are some of the general guidelines that I follow:
- 1. Operate on small result sets – Don’t use “SELECT *”, instead limit the number of columns by including only those columns that are required. Also, try to use a highly restrictive condition in the WHERE clause to only include the required data. In short, retrieve only the rows and columns that are needed.
- 2. Avoid cursors – Avoid using cursors if you can, and try to use a Temp table with identity column to implement looping mechanism. I always create a temp table with an identity column and use a while loop to iterate over data sets.
- 3. Avoid using arithmetic operators or functions on WHERE clause column – Using an arithmetic operator or functions on a WHERE clause column prevents the optimizer from choosing the index on the column. Also, try to avoid using exclusion conditions(example !=, NOT EXISTS, NOT IN, OR) in the WHERE clause. Exclusion conditions can force the optimizer to
ignore indexes on the columns in the WHERE clause.
- 4. Fully qualify Database objects – Always fully qualify database objects with the owner. This will reduce the overhead of name resolution and might also avoid execution plan recompiles.
- 5. Avoid implicit datatype conversions in the query – Implicit conversions can prevent the indexes from being used by the optimizer and will also add overhead by costing extra CPU cycles for datatype conversions.
- 6. Don’t prefix stored procedure name with sp – Many Developers are used to prefixing stored procedure names with sp_. If a stored procedure having an sp_ prefix is executed, SQL Server always looks in the master database first to find the stored procedure. Also, let’s say you have a stored procedure named sp_Test, if Microsoft decides to use this name, all the references to this stored procedure will break, so never begin the name of a SP with sp_.
- 7. Use SET NOCOUNT – SQL Server sends messages(count of the number of rows affected) to the client after each T-SQL statement is executed. If you are using stored procedure, there is no need pass this information and using this option will turn off the messages that are sent back to the client. Though this is not a huge thing, it is definitely something to consider