PHP Advent Calendar Day 23

23 Dec 2007

Today's entry is provided by Jay Pipes.

Jay Pipes

Name
Jay Pipes
Blog
jpipes.com
Biography
Jay Pipes is the North American Community Relations Manager at MySQL. Coauthor of Pro MySQL (Apress, 2005), Jay regularly assists software developers in identifying how to make the most effective use of MySQL. He has given sessions on performance tuning at the MySQL Users Conference, RedHat Summit, NY PHP Conference, OSCON, SCALE, and Ohio LinuxFest, amongst others. He lives in Columbus, Ohio, with his wife, Julie, and his four animals. In his abundant free time, when not being pestered by his two needy cats and two noisy dogs, he daydreams in PHP code and ponders the ramifications of __clone().
Location
Columbus, Ohio

Recently, I've been busy getting the program for the MySQL Conference and Expo finalized (it's a big job!), and I wanted to take some time off from the incessant stream of emails to speakers and sponsors to send a little gift to the blogosphere. My gift for the PHP Advent Calendar is two completely random tips for you PHP and MySQL developers out there trying to squeeze performance out of your schemata and code. So, without further ado, here's my two random MySQL holiday tips.

Storing and Querying IPv4 Addresses

Many people aren't aware that any IPv4 address, commonly written using the dotted quad notation, are actually unsigned 32-bit integers. People are used to seeing the dotted quad notation, which contains 4 separate integers with values from 0 to 255 separated by dots. Together, the integers represent the class of the network within which the host machine is located. For instance, on my laptop, sitting here at home on my local (class C) network, I see that my IP address is 192.168.0.2.

When application developers want to store an IP address in a database, I typically see a column definition such as the following:

  1. CREATE TABLE users (
  2.     user_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  3.     /* More Columns */
  4.     ip_address CHAR(15) NOT NULL,
  5.     INDEX (ip_address)
  6. );

This makes sense, since the total amount of space possibly needed by a dotted quad notation is 15 characters (3 dots and 4 integers with a max of 3 characters per integer). However, the dotted quad notation is merely a textual representation of an unsigned integer that makes it easier for us to remember the IP address. When we store data in our schemata, though, we want to squeeze as many records into a single block of memory (or disk) as we possibly can. An INT UNSIGNED data type in MySQL needs 4 total bytes of storage, whereas the CHAR(15) needs 15 bytes of storage. If we store IPv4 addresses as unsigned integers, we can store four times as many records in an index block.

MySQL comes with two functions that translate between dotted quad and the internal unsigned integer representation of an IP address. The two functions are INET_ATON() and INET_NTOA(). The former takes the dotted quad notation and converts it into an unsigned integer. The latter does the reverse.

Using these two functions, you can both store and retrieve IPv4 addresses easily. To store, do the following:

  1. INSERT
  2. INTO users (user_id, ip_address)
  3. VALUES (NULL, INET_ATON('192.168.0.2'));

Now the IP address is stored as an unsigned integer. SELECTing from the table shows us this:

  1. mysql> SELECT *
  2.     -> FROM users;
  3. +---------+------------+
  4. | user_id | ip_address |
  5. +---------+------------+
  6. | 1 | 3232235522 |
  7. +---------+------------+
  8. 1 row in set (0.00 sec)

3232235522 isn't exactly friendly to the eyes. To convert, we use the INET_NTOA() function:

  1. mysql> SELECT user_id, INET_NTOA(ip_address) as ip_address
  2.     -> FROM users;
  3. +---------+-------------+
  4. | user_id | ip_address |
  5. +---------+-------------+
  6. | 1 | 192.168.0.2 |
  7. +---------+-------------+
  8. 1 row in set (0.00 sec)

To retrieve a range of users that have IP addresses in, my local network for instance, I can use:

  1. mysql> SELECT *
  2.     -> FROM users
  3.     -> WHERE ip_address
  4.     -> BETWEEN INET_ATON('192.168.0.1') AND INET_ATON('192.168.0.255');
  5. +---------+------------+
  6. | user_id | ip_address |
  7. +---------+------------+
  8. | 1 | 3232235522 |
  9. +---------+------------+
  10. 1 row in set (0.00 sec)

Tobias Asplundh and I did some benchmarks of storing IPv4 addresses as unsigned integers instead of CHAR(15) for last year's MySQL conference and found about an 8% performance difference searching small to medium-sized ranges on just one million records. So, it's worth the small effort to streamline your schemata and use the appropriate data types for IPv4 addresses.

The Worst-Named MySQL Status Variable Ever

Anyone who's ever seen me speak at conferences on performance tuning MySQL knows that I am fairly glib about my pet peeves with MySQL (and other things!). My biggest pet peeve used to be the old configuration variable log_long_format, which actually meant log any query not using indexes to the slow query log. Luckily, this configuration variable was renamed in MySQL 5.0.12 to log_queries_not_using_indexes. (Imagine that.)

Luckily for log_long_format, a new variable has taken its place on my pet peeves list: the status variable Qcache_free_blocks.

Before I get into why this little monster of a status variable is so, well, monstrous, here's a little background on the MySQL query cache.

Introduced way back in MySQL 4.0.1, the query cache stores a hash of the SELECT query issued against the server and the actual result set of data returned by the SELECT query. Barring any modifications to the underlying tables, a subsequent request for the exact same SELECT statement will not need to be optimized, analyzed, or even hit the storage engine layer. Instead, the query cache will simply return the pre-packaged result set directly to the requesting client. The way the query cache is structured is essentially a linked list of these result sets, stored in blocks. When an underlying table is modified, blocks containing queries that reference the modified table are marked as dirty, to be flushed out of the query cache at a later time.

For read-heavy applications, the query cache can be a significant performance boost. More on mixed and write-heavy applications later.

Now back to the little monster status variable.

The status variables which begin with Qcache represent counters that the MySQL query cache keeps in regard to the health and hit ratios of stored MYSQL_RESULTs. So, looking at the following output, what would you expect the Qcache_free_blocks variable to mean?

  1. mysql> SHOW STATUS LIKE 'Qcache%';
  2. +-------------------------+----------+
  3. | Variable_name | Value |
  4. +-------------------------+----------+
  5. | Qcache_free_blocks | 22087 |
  6. | Qcache_free_memory | 64887904 |
  7. | Qcache_hits | 23945162 |
  8. | Qcache_inserts | 8200434 |
  9. | Qcache_lowmem_prunes | 658819 |
  10. | Qcache_not_cached | 8052109 |
  11. | Qcache_queries_in_cache | 34818 |
  12. | Qcache_total_blocks | 91967 |
  13. +-------------------------+----------+
  14. 8 rows in set (0.01 sec)

If I were a betting man — and those of you whom I have played poker with know that I am — I would bet money that the Qcache_free_blocks meant the number of free blocks in the query cache available to store stuff in. Right?

Wrong. The Qcache_free_memory actually does mean what it looks like; it is the amount of bytes available to store results within the query cache. However, Qcache_free_blocks means the number of blocks within the query cache that are fragmented and need to be cleaned up. Read that again to make sure you understand it. Even the MySQL documentation doesn't understand its purpose.

What exactly do the above statistics actually tell you? (By the way, the above is from one of the MySQL.com domain's primary DB servers.)

It tells you:

  1. There are a total of 91,967 blocks in the query cache, of which 22,087 are marked as containing results that are no longer valid. In this case, more than 25% of the blocks contain invalid result sets. (That's bad.)

    How do you relieve the query cache of this fragmentation? You could issue a FLUSH QUERY CACHE and wait a bit. FLUSH TABLES would also do it. The point is to make you aware that the tricky Qcache_free_blocks variable makes it seem as if all is good with the query cache, when in fact, it isn't!

  2. The Qcache_lowmem_prunes counter variable is 685,819. This means that the query cache has had to prune dirty, old, or invalid blocks more than 680 thousand times. Take a look at how long this server has been up and running:

    1. mysql> SHOW STATUS LIKE 'Upt%';
    2. +---------------+---------+
    3. | Variable_name | Value |
    4. +---------------+---------+
    5. | Uptime | 6449178 |
    6. +---------------+---------+
    7. 1 row in set (0.00 sec)

    So, the server has been up and running for 6,449,178 seconds, or about 75 days. If you divide the seconds by the lowmem prunes, you see that the query cache is having prune itself about once every 9 seconds. (That's bad.)

Finally, find out the hit ratio for the MySQL query cache on this server. Only be concerned with the SELECT queries issued against the server, since those are the only ones that can be stored in the query cache. How do you get the number of SELECTs issued?

  1. mysql> SHOW GLOBAL STATUS LIKE 'Questions%';
  2. +---------------+-----------+
  3. | Variable_name | Value |
  4. +---------------+-----------+
  5. | Questions | 271117447 |
  6. +---------------+-----------+
  7. 1 row in set (0.00 sec)

There have been a total of 271,117,447 queries on this server since going online. Compare this number with the Qcache_hits status variable of 23,945,162. So, approximately 1 in 10 queries against the database is handled directly by the query cache. Is this good? Not particularly, but there is a catch. The server I have been showing is a master server, so it's handling the write load of the MySQL.com domain. The slave servers handle much of the read load.

In this scenario, it may be best to simply switch off the query cache on this master server due to the heavy fragmentation and lowmem pruning.

By contrast, the MySQL Forge 2.0 server, which is read-heavy, shows the following information:

  1. mysql> SHOW GLOBAL STATUS LIKE 'Qcache%';
  2. +-------------------------+-------+
  3. | Variable_name | Value |
  4. +-------------------------+-------+
  5. | Qcache_free_blocks | 45 |
  6. | Qcache_free_memory | 34744 |
  7. | Qcache_hits | 79095 |
  8. | Qcache_inserts | 43276 |
  9. | Qcache_lowmem_prunes | 7396 |
  10. | Qcache_not_cached | 20955 |
  11. | Qcache_queries_in_cache | 6539 |
  12. | Qcache_total_blocks | 13517 |
  13. +-------------------------+-------+
  14. 8 rows in set (0.00 sec)
  15.  
  16. mysql> SHOW GLOBAL STATUS LIKE 'Questions%';
  17. +---------------+--------+
  18. | Variable_name | Value |
  19. +---------------+--------+
  20. | Questions | 201050 |
  21. +---------------+--------+
  22. 1 row in set (0.00 sec)

Here you see about 1 in every 3 queries against the database being served directly from the query cache.

The bottom line is this: investigate the query cache status variables to ensure:

Happy holidays, y'all!