The dreaded spinning mouse pointer and the progress bar moving at a snails pace, are frustrating for both website viewers and those responsible for maintaining a site. Many times a site will have run just great on a development server, but fail to scale appropriately in production. If the site’s problems could be database related, it may be time to check up on what exactly is happening on that back end.
There are several situations in which mysql slowness can originate. The first possibility, and the easiest one to check, is if the database contains slow queries. What are slow queries? To answer that, we first need to know what indexes are and how they are used.
What is an index?
An index is a sorted set of data that references the primary key of a table, similar to a phone book having a list of last names which make it easy to find a phone number. An index enables a query to very quickly find specific values in a column, for fast data retrieval from a table. In the absence of an index, a query would need to scan the entire table to check for any matches in the column. The larger the table without a proper index, the more of a performance hit a query could experience. Indexes are most helpful for direct comparisons, ranges, or where searching the beginning of a column’s value. Indexes are least useful when a query returns a large amount of the data in the table (as it is faster to read the whole table than perform large of disk seeks). They also don’t help at all for full text searching using LIKE comparisons.
So why not just add an index to every column? It might allow queries that retrieve data to be faster on all columns, however any queries that add, change or delete data would then have to update all of the indexes on the affected columns, slowing them down. Because of this it is important to be strategic about your use of indexes. Multiple column indexes help most where there are several WHERE direct comparisons in a query, but limiting the total number of indexes when possible is preferred.
What causes slow queries?
When a query cannot make use of an index, the MySQL server has to use more resources and time to process that query. If you are developing the software yourself, you may wish to check the queries against a development database with sample data as you are developing. To see interactions between queries and indexes, prefix the query you wish to check with the “explain” or “desc” keyword in MySQL shell or anywhere SQL can be directly entered. For example, I have a large table here with no index, and the output is:
The output is very detailed, but for a simple query the most important columns are possible_keys, key, and rows. In this case those columns tell us the query has no keys (indexes) it can use, isn’t using a key (index), and has to process 1.2 million records. More on how to fix this later.
If the software you are using is closed source, or you wish to generally see what is slow, then checking the MySQL slow log is a more direct way to see what the MySQL server is seeing. At DreamHost, all MySQL servers are configured to enable logging of slow queries. Getting the log depends on your hosting solution:
Shared – Contact support and we can get you your queries out of the log file. As this file contains queries from all customers on the service, we cannot provide the raw log file.
MySQL VPS – Contact support and we can copy you your entire slow log file.
Dedicated – Contact support and we can copy your entire slow log file, or if you have an admin user, the log is available at /dh/mysql/logs/mysql.mysql.slow.
Most of the queries that are logged are uninteresting as they include: queries processing a minuscule amount of data, queries processing data very quickly, or queries that are just not worth the effort to index. Some queries in the log may be interesting through, and there are various utilities to process the information found in the log.
The first tool is most useful for identifying individual queries that are slow, and it is called mysql_slow_log_filter. The code is publicly available at:
http://www.mysqlperformanceblog.com/files/utils/mysql_slow_log_filter
With this script, you can filter the log for queries taking more than X seconds with the -T flag, or X rows via the -R flag. In this way the utility can identify those queries that are slowest or that processed the greatest amount of data.
The other utility is called mysqldumpslow, and is part of the MySQL server source code package. This is installed on all DreamHost webservers by default. The manual page is located here:
http://dev.mysql.com/doc/refman/5.1/en/mysqldumpslow.html
It does something quite different from the previous script, in that it processes the entire slow log stripping out number and string data values in the queries themselves. This allows it to group together a query that has been run multiple times with different data values. From this information it will return the total number of times the query ran and cumulative run time. This utility can help you identify a query that may seem to be fast at first glance, but that is running so often it is still causing slowness. Here is an example of this command, returning queries ran by a pixelpost database.
Once you have found a slow query, from either of these scripts, the query can be ran through explain or desc to get the details of the query.
Fixing Slow Queries
In this section I will be detailing some examples of slow queries and how they can be fixed. For my examples, I will be using a table with 3 columns: “machine”, “service” and “dbname”, which use varying sizes of datatypes from small to large, and have no indexes. The table contains over 1.2 million records of example data.
Example 1
The output below examines a query using the explain keyword to determine if there is a index being used to speed up processing. The query uses the “machine” column in the WHERE clause, to try to find a match to the word “whisk”. The column “key” with a value of NULL indicates that there is not a index currently in use.
Depending on the datatype of the column, adding an index to fix this query can be attempted as a first step. For datatypes such as: int, double, smaller (<50) varchar, etc, you can index them as-is. In the case of the machine column the datatype varchar(25) is a good candidate for indexing. The text below shows the index being added.
The operation above indexed the entire 25 characters of the column. Using the explain keyword again returns the information that “machine” now has possible_keys including one named machine, is using the the key machine as its index, and is now only processing 1 row to run the query.
Example 2
For other datatypes, such as text, longtext, blog, or larger varchar columns, a prefix length will need to be provided. Indexing the full size of a large datatype column is not beneficial for searching, is slow to perform, and is not allowed by MySQL for some datatypes. It is best to examine some of your data to see how much of the beginning of a column needs to be indexed in order to reach a reasonably unique amount of data. For instance if a column contains first names, you would want to make the index long enough so that the index would see the difference between “Jack” and “Jackson”, so making it around 10 characters would achieve this. Occasionally an index will need to cover more or less data to get to a reasonable amount of uniqueness, which can be found out by examining the data, an educated guess, or trial and error. Once you know that info, you can index with the prefix length like so:
Once this is added, a search on a large varchar or text or other similar column will be much improved:
More Complex Queries
This is a pretty basic example, and queries can get enormous and complex. Using “explain” or “desc” to get some details about it can help, but fixing them may need the help of an expert or additional research. The MySQL documentation has some additional resources that can help located here:
http://dev.mysql.com/doc/refman/5.1/en/optimization-indexes.htm
What to Avoid
Here are a few examples of some practices to avoid.
Example 1
I am not a fan of “order by rand()”. It makes it easy to push off this work to the MySQL server, but the method it uses to get the result is often a lot more work than expected. ”Order by rand() limit 1″ is the equivalent of having a wall full of bookshelves, dumping all of the books onto the floor, putting them back in a random order, and then going to the first bookshelf, top shelf, and picking the left-most book. That first book is certainly a “random” book, but look at all the work you had to go through to get it. In this case there wasn’t a reason to spend the processing power to randomize all the records. Consider instead generating a random number in your website code and pulling that record number out of the table, or using an indexed column full of random numbers, which could then get you a random record by doing a greater than comparison.
Example 2
Another example, is using mysql_num_rows() when all you need is a row count. It may seem like this wouldn’t be that big of a deal:
But on the back end, what that ends up doing is sending the entire contents of the “largetable” table over to your software, so it can iterate through the records to get a count. If this table is 100MB then sending that data over the network can have a performance impact.
Consider instead letting the MySQL server do the work for you by making use of the count() function in SQL. Here the MySQL server counts the records, and sends back just the result to the webserver.
Get out there and fix some queries!
With these basics, you can begin to check on possible causes of slowness for your site. Slow queries happen, especially as data contained in a database or table grows and inefficiencies begin to be amplified. If you can get your queries optimized and needed indexes added, the spinning mount pointer and the sluggish page-load progress bar will be a less frequent occurrence.