Is Your Database Dragging Down Your PHP Application?

Here’s a quiz from our last PHP application performance post: How much of a PHP application’s execution time is used up accessing the database? 50%? 75%? You may recall the real answer, since it is so (some might say surprisingly) big: some 90% of PHP execution time happens in the database layer. So it makes sense to look to the database to reap some big performance gains. Let’s take a look at how some of the decisions you make about your database can affect performance.

Is NoSQL a No-Brainer?

Whether it’s MongoDB, Cassandra, or one of the other flavors, developers often turn to NoSQL databases as a solution to database performance. But NoSQL databases aren’t a magic bullet. As with so many application questions, it all depends on the use case. NoSQL databases perform extremely well for certain workloads, but not all; the deciding factor is whether the dataset for your specific application is better represented by a non-relational model. If it is, then find a NoSQL solution that matches your data and offers the tuning and scalability you need. If not, keep reading.

Your SQL, MySQL, Everybody’s SQL

We’re going to focus this discussion on the MySQL database, since it’s the one most frequently used with PHP. But almost everything we say about it will apply, at least in broad strokes, to other relational databases. We’re not going to dive into database configuration either, as that’s a whole other topic. What follows are some of the top things a PHP developer should be looking at to make sure the database is performing at the highest possible level.

Why Be Normal?

The web is full of discussions about database normalization, but let’s go with the short explanation that these are formalized ways to structure a database to minimize redundancy and therefore keep the database cleaner. But neat and clean do not always equate to fast. There are times when denormalizing — combining tables together, even if it means having duplicate information — can be more efficient for your application and improve performance. It might seem counterintuitive, but sometimes being a little messy is the quickest way to get what you want out of your database.

Let’s look at a customer database application as a simplified example. This application needs to track what company each customer works for. This would typically be stored as two tables with a foreign key constraint between them, such as: 

CREATE TABLE `company` (
`id`    int(11) AUTO_INCREMENT NOT NULL,
`name`  varchar(512) NULL,
PRIMARY KEY(`id`)
);

CREATE TABLE `customer` (
`id`   int(11) AUTO_INCREMENT NOT NULL,
`fullName`  varchar(512) NOT NULL,
`email`   varchar(512) NULL,
`companyID` int(11) NULL,
PRIMARY KEY(`id`)
);

ALTER TABLE `customer`
ADD CONSTRAINT `fk_customer_company`
FOREIGN KEY(`companyID`)
REFERENCES `company`(`id`);

Searching for customers and including their company name in the results will now require either two queries — not efficient — or a single query with a join, such as:

SELECT cst.*, cpy.name as companyName FROM customer cst LEFT JOIN company cpy ON cst.companyID = cpy.id;

If the application will typically need to have a company name when looking up customers, then it’s obviously inefficient to make separate queries or use database joins each time. The better solution for performance is to combine these two tables into a single table, such as this:

CREATE TABLE customer (
id           int(11) AUTO_INCREMENT NOT NULL,
fullName     varchar(512) NOT NULL,
email        varchar(512) NULL,
companyName  varchar(512) NULL,
PRIMARY KEY(id)
);

This simpler structure allows for much faster lookups. But it comes at a cost: company names are no longer standardized, and the same company could be entered differently for two customers. If it is important to be exact (in a billing application, for example, that needs to know which company to bill), this lack of precision could be catastrophic. But in many cases, “company” is just another piece of data about the person. If the spelling of the company name, or any other descriptive information, is not critical, it’s worth it from a performance perspective to combine tables.

To Index Or Not To Index?

Indexes can speed lookups up. But too many indexes can slow things down. So a critical database performance question is whether to index a column or not.

The primary key of a table is already indexed; database indexes are lookup tables for anything else. A general rule of thumb, or starting point, is to identify the columns referenced in the “where” clauses of queries and seriously consider indexing those columns. Again, this is a general rule, and it needs to be balanced against how many indexes are being created overall and whether that number will negatively impact performance.

Again using the customer database example, if the application needs to search for a customer by email address, then index the email column of the table by executing the following SQL:

CREATE INDEX idx_customer_email ON customer(email);

This will significantly increase the speed of customer search by email address. Multiple columns can be included in the index as well — so if the application needs to look for customers by name within a particular company, an index like this would speed the search up:

CREATE INDEX idx_customer_dual ON customer(fullName, companyName);

Be Wary Of Your Queries

Queries themselves can be a source of slowdowns, for a variety of reasons. “Query tuning” is the process of finding slow queries and reworking them so they run faster.

What makes a query slower than it should be? Maybe an unnecessary subquery was used, or a “group by” clause is used that can be refactored out. Learning to use the EXPLAIN command can help one understand the performance of queries. The primary goal is to find queries that are slow and rework them, testing the performance at each iteration.
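
As a quick illustration, here’s one way EXPLAIN could be run from PHP against the join query from the customer example (a sketch assuming a PDO connection in $db); the “key” and “rows” columns of the output show which index each table used and roughly how many rows had to be examined:

// Run EXPLAIN on the earlier join query to see how MySQL plans to execute it
// (a sketch: assumes a PDO connection in $db; the query comes from the example above).
$plan = $db->query(
    'EXPLAIN SELECT cst.*, cpy.name AS companyName
     FROM customer cst LEFT JOIN company cpy ON cst.companyID = cpy.id'
)->fetchAll(PDO::FETCH_ASSOC);

// Inspect the "key" and "rows" columns to see index usage and scan size
print_r($plan);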

A common query pitfall involves SQL functions. Whenever possible, queries should avoid wrapping a column name inside a function in a “where” clause. Here’s an example of how not to look for all records created in the last month:

SELECT * FROM user WHERE NOW() < DATE_ADD(dateCreated, INTERVAL 1 MONTH);

Written this way, MySQL has to perform a full table scan, since it can’t benefit from any index that exists on the dateCreated column. Because the dateCreated column is wrapped in a function, the function has to be evaluated for every row in the table. Now look at this way of rewriting the query:

SELECT * FROM user WHERE dateCreated > DATE_SUB(NOW(), INTERVAL 1 MONTH);

Here, all the columns are moved to one side of the comparison and all the functions to the other side, so the comparison can now use an index. This way of writing the query delivers the same results, but performs much better than the first version.

Tuning even further, in this specific case performance can be improved a hair more by removing the NOW() function, which would otherwise be called for every comparison unless the database optimizes it away. Instead, the current date can be supplied by PHP, in a query that looks like this:

SELECT * FROM user WHERE dateCreated > DATE_SUB('2014-10-31', INTERVAL 1 MONTH);
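
In actual application code, that cutoff date would come from PHP rather than being hard-coded. Here is a minimal sketch of how it might be bound into the query with PDO (assuming a PDO connection in $db and the user table from the example; the variable names are illustrative):

// Supply the date from PHP and bind it, instead of calling NOW() inside the query
// (a sketch: assumes a PDO connection in $db and the `user` table from the example).
$stmt = $db->prepare(
    'SELECT * FROM user WHERE dateCreated > DATE_SUB(:today, INTERVAL 1 MONTH)'
);
$stmt->execute([':today' => date('Y-m-d')]);
$recentUsers = $stmt->fetchAll(PDO::FETCH_OBJ);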

These examples are a broad look at query tuning, but are a good starting point for looking for performance improvement opportunities.

Cash In On Caching

No discussion on database performance would be complete without looking at caching. Typically, the caching layer that PHP developers are concerned with is the ‘user cache.’ This is a programmable caching layer, usually provided by software such as Memcached, APC/APCu, or Redis.

The common thread with these caching mechanisms is simple, in-memory, key-value lookups, which are much faster than database queries. Therefore one of the most common techniques for speeding up a database-backed application is to cache the results of lookups. Once the data is retrieved, it’s stored in the cache for some period. Future requests retrieve the stored data and don’t mind that it may be slightly stale.

This can be done generically in code via a helper function, such as this one that assumes a SQL query that will only ever return one row: 

function cachedSingleRowQuery($query, array $bind, $expiry, PDO $db, Memcached $cache) {
    // Build a unique lookup key from a hash of the query and its bound values
    $key = 'SingleQuery' . md5($query) . md5(implode(',', $bind));
    if (!($obj = $cache->get($key))) {
        // Cache miss: run the query, then cache the resulting row for $expiry seconds
        $stmt = $db->prepare($query);
        $stmt->execute($bind);
        $obj = $stmt->fetchObject();
        $cache->set($key, $obj, $expiry);
    }
    return $obj;
}

This function first builds a unique lookup key from a hash of the query and its parameters, then checks whether a cached version of the data already exists. If it does, the cached copy is returned; if not, the query is executed and its output cached for the specified amount of time, so it will be there the next time it’s requested.

Another technique is to use a write-through cache, where the final copy of the data is stored in a database, but once the data is queried it is cached forever. To avoid the problem of stale data, the software proactively updates the cache any time the database layer is updated, or just deletes it, so it will be re-created the next time it’s requested.
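
Here’s a minimal sketch of that pattern, using the delete-on-update variant, the customer table from earlier, and the same kind of PDO connection and Memcached client assumed in the helper above (the function names are just for illustration):

// Read path: cache the customer row with no expiry once it has been queried
// (a sketch: assumes a PDO connection in $db and a Memcached client in $cache).
function getCustomer($id, PDO $db, Memcached $cache) {
    $key = 'customer' . $id;
    if (!($obj = $cache->get($key))) {
        $stmt = $db->prepare('SELECT * FROM customer WHERE id = ?');
        $stmt->execute([$id]);
        $obj = $stmt->fetchObject();
        $cache->set($key, $obj, 0);  // 0 = never expires
    }
    return $obj;
}

// Write path: update the database, then delete the cached copy so the next
// read rebuilds it from the fresh data
function updateCustomerEmail($id, $email, PDO $db, Memcached $cache) {
    $stmt = $db->prepare('UPDATE customer SET email = ? WHERE id = ?');
    $stmt->execute([$email, $id]);
    $cache->delete('customer' . $id);
}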

All that said, here’s one caching caveat to pay attention to: If you’re using any framework or application layer for the application, don’t reinvent the wheel. Almost all modern frameworks insert a caching layer over the top of the database access layer or the object model automatically, in the way that the framework expects. So you’ll already be getting the benefit of efficient caching.

Add It All Up For The Highest Performance

Even from this brief discussion, you can see that there’s not one thing that makes or breaks your database performance. It’s a number of things that all have to be done well. But as we said at the outset, given that 90% of application execution time is in the database, it’s well worth the time and effort to make sure everything about your database is working in the direction of top performance.
