In this article by Sohail Salehi, author of the book, Mastering Symfony, we are going to discuss performance improvement using cache. Caching is a vast subject and needs its own book to be covered properly. However, in our Symfony project, we are interested in two types of caches only: the gateway cache and the Doctrine cache.
We will see what caching facilities are provided in Symfony by default and how we can use them.
We are going to apply the caching techniques on some methods in our projects and watch the performance improvement.
By the end of this article, you will have a firm understanding about the usage of HTTP cache headers in the application layer and caching libraries.
A cache is a temporary place that stores contents so that they can be served faster when they are needed. Considering that we already have a permanent place on disk to store our web contents (templates, code, and database tables), a cache sounds like duplicate storage.
That is exactly what it is. Cached copies are duplicates, and we need them because, in return for consuming extra space to store the same data, they provide a very fast response to some requests. So this is a very good trade-off between storage and performance.
To give you an example about how good this deal can be, consider the following image. On the left side, we have a usual client/server request/response model and let's say the response latency is two seconds and there are only 100 users who hit the same content per hour:
On the right side, however, we have a cache layer that sits between the client and server. What it does basically is receive the same request and pass it to the server. The server sends a response to the cache and, because this response is new to the cache, it will save a copy (duplicate) of the response and then pass it back to the client. The latency is 2 + 0.2 seconds.
However, it doesn't add up, does it? The purpose of using cache was to improve the overall performance and reduce the latency. It has already added more delays to the cycle. With this result, how could it possibly be beneficial? The answer is in the following image:
Now, with the response being cached, imagine the same request comes through. (We have about 100 requests/hour for the same content, remember?) This time, the cache layer looks into its space, finds the response, and sends it back to the client, without bothering the server. The latency is 0.2 seconds.
Of course, these are only imaginary numbers and situations. However, in the simplest form, this is how cache works. It might not be very helpful on a low traffic website; however, when we are dealing with thousands of concurrent users on a high traffic website, then we can appreciate the value of caching.
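To make the trade-off concrete, here is a small back-of-the-envelope calculation using the imaginary numbers above. It assumes (our assumption, not a measurement) that the cached copy stays valid for the whole hour, so only the first of the 100 requests is a miss:

```php
<?php
// Illustrative numbers from the text; "one miss per hour" is an assumption.
$serverLatency = 2.0;  // seconds for the server to build a response
$cacheLatency  = 0.2;  // seconds spent in the cache layer
$requests      = 100;  // requests per hour for the same content

// Without a cache, every request pays the full server latency.
$withoutCache = $requests * $serverLatency;

// With a cache: one miss (server + cache), then hits served by the cache alone.
$misses    = 1;
$hits      = $requests - $misses;
$withCache = $misses * ($serverLatency + $cacheLatency) + $hits * $cacheLatency;

printf("Average without cache: %.2f s\n", $withoutCache / $requests);
printf("Average with cache:    %.2f s\n", $withCache / $requests);
```

Under these assumptions, the single slower first request is paid back many times over by the 99 requests that never reach the server.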
So, according to the previous images, we can define some terminology and use it in this article as we continue. In the first image, when a client asked for that page, it didn't exist in the cache yet and the cache layer had to store a copy of its contents for future reference. This is called a Cache Miss. However, in the second image, we already had a copy of the contents stored in the cache and we benefited from it. This is called a Cache Hit.
If you do a quick search, you will find that a good cache is defined as one that misses only once. In other words, a cache miss happens only if the content has not been requested before. This feature is necessary but it is not sufficient. To clarify the situation a little bit, let's add two more terms here. A cache can be in one of the following states: fresh (has the same contents as the original response) and stale (has the old response's contents that have now changed on the server).
The important question here is for how long should a cache be kept? We have the power to define the freshness of a cache by setting an expiration period. We will see how to do this in the coming sections. However, just because we have this power doesn't mean that we are right about the content's freshness. Consider the situation shown in the following image:
If we cache content for a long time, a cache miss won't happen again (which satisfies the preceding definition), but the content might lose its freshness because the dynamic resources behind it might change on the server. To give you an example, nobody likes to read the news of three months ago when they open the BBC website.
Now, we can modify the definition of a good cache as follows:
A cache strategy is considered to be good if cache miss for the same content happens only once, while the cached contents are still fresh.
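The terms miss, hit, and stale can be sketched with a tiny in-memory cache. The TinyCache class below is hypothetical and is not part of Symfony. It also simplifies things by treating an expired entry as stale, whereas in reality an entry is stale only if the server's content has actually changed:

```php
<?php
// Hypothetical in-memory cache, used only to illustrate miss, hit, and stale.
final class TinyCache
{
    /** @var array<string, array{value: string, expiresAt: int}> */
    private array $entries = [];

    public function store(string $key, string $value, int $ttl, int $now): void
    {
        $this->entries[$key] = ['value' => $value, 'expiresAt' => $now + $ttl];
    }

    /** Returns 'miss', 'hit' (unexpired copy), or 'stale' (expired copy). */
    public function lookup(string $key, int $now): string
    {
        if (!isset($this->entries[$key])) {
            return 'miss';
        }

        return $now < $this->entries[$key]['expiresAt'] ? 'hit' : 'stale';
    }
}

$cache = new TinyCache();
$t0 = 1000; // an arbitrary starting timestamp, in seconds

$first = $cache->lookup('/dashboard', $t0);        // nothing stored yet
$cache->store('/dashboard', '<html>...</html>', 600, $t0);
$second = $cache->lookup('/dashboard', $t0 + 60);  // within the 10 minutes
$third = $cache->lookup('/dashboard', $t0 + 601);  // past the expiration

echo "$first, $second, $third\n"; // prints "miss, hit, stale"
```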
This means that defining the cache expiry time won't be enough and we need another strategy to keep an eye on cache freshness. This happens via a cache validation strategy. When the server sends a response, we can set the validation rules on the basis of what really matters on the server side, and this way, we can keep the contents stored in the cache fresh, as shown in the following image. We will see how to do this in Symfony soon.
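As a rough sketch of what validation means, independent of Symfony, the server can fingerprint its current content and compare it with the fingerprint the client already holds. The respond() helper and the use of md5() as the fingerprint are illustrative assumptions:

```php
<?php
/**
 * Hypothetical helper: decides between a 304 and a full 200 response.
 * $clientEtag is the value the client sent in If-None-Match, or null.
 */
function respond(string $content, ?string $clientEtag): array
{
    // The ETag is a fingerprint of the current content on the server.
    $etag = '"' . md5($content) . '"';

    if ($clientEtag === $etag) {
        // The cached copy is still valid, so no body is sent.
        return ['status' => 304, 'etag' => $etag, 'body' => ''];
    }

    // First request, or the content has changed: send the full response.
    return ['status' => 200, 'etag' => $etag, 'body' => $content];
}

$first  = respond('3 open tasks', null);
$second = respond('3 open tasks', $first['etag']);
$third  = respond('4 open tasks', $first['etag']);

echo $first['status'], ' ', $second['status'], ' ', $third['status'], "\n"; // prints "200 304 200"
```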
In this article, we will focus on two types of caches: the gateway cache (which is also called the reverse proxy cache) and the Doctrine cache. As you might have guessed, the gateway cache deals with all of the HTTP cache headers. Symfony comes with a very strong gateway cache out of the box. All you need to do is activate it in your front controller and then start defining your cache expiration and validation strategies inside your controllers.
That said, it does not mean that you are forced or restricted to use the Symfony cache only. If you prefer another reverse proxy cache (for example, Varnish or Squid), you are welcome to use it. The caching configurations in Symfony are transparent such that you don't need to change a single line inside your controllers when you change your caching libraries. Just modify your config.yml file and you will be good to go.
However, we all know that caching is not for application layers and views only. Sometimes, we need to cache any database-related contents as well. For our Doctrine ORM, this includes metadata cache, query cache, and result cache.
Doctrine comes with its own bundle to handle these types of caches and it uses a wide range of libraries (APC, Memcached, Redis, and so on) to do the job. Again, we don't need to install anything to use this cache bundle. If we have Doctrine installed already, all we need to do is configure something and then all the Doctrine caching power will be at our disposal.
Putting these two caching types together, we will have a big picture to cache our Symfony project:
As you can see in this image, we might have a problem with the final cached page. Imagine that we have a static page that might change once a week, and in this page, there are some blocks that might change on a daily or even hourly basis, as shown in the following image. The User dashboard in our project is a good example.
Thus, if we set the expiration on the gateway cache to one week, we cannot reflect all of those rapid updates in our project and task controllers.
To solve this problem, we can leverage Edge Side Includes (ESI) inside Symfony. Basically, any part of the page that has been defined inside an ESI tag can tell its own cache story to the gateway cache. Thus, we can have multiple cache strategies living side by side inside a single page. With this solution, our big picture will look as follows:
Thus, we are going to use the default Symfony and Doctrine caching features for the application and model layers, and you can also use some popular third-party bundles for more advanced settings. If you completely understand the caching principles, moving to other caching bundles will be a breeze.
Before diving into the Symfony application cache, let's familiarize ourselves with the elements that we need to handle in our cache strategies. To do so, open https://www.wikipedia.org/ in your browser, inspect any resource with the 304 response code, and examine the request/response headers inside the Network tab:
Among the response elements, there are four cache headers that we are interested in the most: expires and cache-control, which will be used for an expiration model, and etag and last-modified, which will be used for a validation model.
Apart from these cache headers, we can have variations of the same cache (compressed/uncompressed) via the Vary header and we can define a cache as private (accessible by a specific user) or public (accessible by everyone).
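To see how the directives inside a Cache-Control header break down, here is a minimal parsing sketch. The parseCacheControl() function and the sample header value are made up for illustration. In a real Symfony project, the Response object handles these headers for you:

```php
<?php
/** Splits a Cache-Control header value into a directive => value map. */
function parseCacheControl(string $header): array
{
    $directives = [];
    foreach (explode(',', $header) as $part) {
        $part = trim($part);
        if ($part === '') {
            continue;
        }
        // Directives are either flags ("public") or pairs ("max-age=600").
        $pieces = explode('=', $part, 2);
        $name = strtolower(trim($pieces[0]));
        $directives[$name] = isset($pieces[1]) ? trim($pieces[1], " \"") : true;
    }

    return $directives;
}

// A made-up header: public, fresh for 10 minutes in private caches
// and for 30 minutes in shared (gateway) caches.
$parsed = parseCacheControl('public, max-age=600, s-maxage=1800');

var_dump($parsed['public']);         // bool(true)
var_dump((int) $parsed['s-maxage']); // int(1800)
```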
There is no complicated or lengthy procedure required to activate Symfony's gateway cache. Just open the front controller and uncomment the following lines:
// web/app.php
<?php
//...
require_once __DIR__.'/../app/AppKernel.php';
// uncomment this line
require_once __DIR__.'/../app/AppCache.php';
$kernel = new AppKernel('prod', false);
$kernel->loadClassCache();
// and this line
$kernel = new AppCache($kernel);
// ...
?>
Now, the kernel is wrapped by the AppCache layer, which means that any request coming from the client will pass through this layer first.
Log in to your project and click on the Request/Response section in the debug toolbar. Then, scroll down to Response Headers and check the contents:
As you can see, only cache-control is sitting there with some default values among the cache headers that we are interested in.
When you don't set any value for Cache-Control, Symfony considers the page contents as private to keep them safe.
Now, let's go to the Dashboard controller and add some gateway cache settings to the indexAction() method:
// src/AppBundle/Controller/DashboardController.php
<?php
namespace AppBundle\Controller;

use Symfony\Bundle\FrameworkBundle\Controller\Controller;
use Symfony\Component\HttpFoundation\Response;

class DashboardController extends Controller
{
    public function indexAction()
    {
        $uId = $this->getUser()->getId();
        $util = $this->get('mava_util');
        $userProjects = $util->getUserProjects($uId);
        $currentTasks = $util->getUserTasks($uId, 'in progress');

        // expire the cached response two days from now
        $response = new Response();
        $date = new \DateTime('+2 days');
        $response->setExpires($date);

        return $this->render(
            'AppBundle:Dashboard:index.html.twig',
            array(
                'currentTasks' => $currentTasks,
                'userProjects' => $userProjects
            ),
            $response
        );
    }
}
You might have noticed that we didn't change the render() method. Instead, we added the response settings as the third parameter of this method. This is a good solution because now we can keep the current template structure and adding new settings won't require any other changes in the code.
However, you might wonder what other options we have. We can save the result of the $this->render() call in a variable and apply the response setting to it as follows:
// src/AppBundle/Controller/DashboardController.php
<?php
// ...
$res = $this->render(
    'AppBundle:Dashboard:index.html.twig',
    array(
        'currentTasks' => $currentTasks,
        'userProjects' => $userProjects
    )
);
$res->setExpires($date);
return $res;
?>
This still looks like a lot of hard work for a simple response header setting. So let me introduce a better option. We can use the @Cache annotation as follows:
// src/AppBundle/Controller/DashboardController.php
<?php
namespace AppBundle\Controller;

use Symfony\Bundle\FrameworkBundle\Controller\Controller;
use Sensio\Bundle\FrameworkExtraBundle\Configuration\Cache;

class DashboardController extends Controller
{
    /**
     * @Cache(expires="next Friday")
     */
    public function indexAction()
    {
        $uId = $this->getUser()->getId();
        $util = $this->get('mava_util');
        $userProjects = $util->getUserProjects($uId);
        $currentTasks = $util->getUserTasks($uId, 'in progress');

        return $this->render(
            'AppBundle:Dashboard:index.html.twig',
            array(
                'currentTasks' => $currentTasks,
                'userProjects' => $userProjects
            )
        );
    }
}
Have you noticed that the response object is completely removed from the code? With an annotation, all response headers are sent internally, which helps keep the original code clean. Now that's what I call zero-fee maintenance. Let's check our response headers in Symfony's debug toolbar and see what it looks like:
The good thing about the @Cache annotation is that it can be set at the class level and overridden for individual actions. Imagine you have a controller full of actions. You want all of them to have a shared maximum age of half an hour except one that is supposed to be private and should expire in five minutes. This sounds like a lot of code if you are going to use the response objects directly, but with an annotation, it will be as simple as this:
<?php
//...
/**
 * @Cache(smaxage="1800", public=true)
 */
class DashboardController extends Controller
{
    public function firstAction() { /* ... */ }

    public function secondAction() { /* ... */ }

    /**
     * @Cache(expires="+5 minutes", public=false)
     */
    public function lastAction() { /* ... */ }
}
The annotation defined before the controller class will apply to every single action, unless we explicitly add a new annotation for an action.
In the previous example, we set the expiry period very long. This means that if a new task is assigned to the user, it won't show up in their dashboard until the cache expires. To fix this issue, we can validate the cache before using it.
There are two ways to validate a cache: with the ETag header or with the Last-Modified header. We are going to try both of them in the Dashboard controller and see them in action.
Choosing the right validation header depends entirely on the current code. In some actions, calculating the last modified date is much easier than creating a digital fingerprint, while in others, going through the date and time functions might look costly. Of course, there are situations where generating both headers is critical. So the choice depends on the code base and what you are trying to achieve.
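To illustrate the date-based alternative, here is a minimal sketch of a Last-Modified check in plain PHP. The isNotModifiedSince() helper and the sample timestamp are hypothetical. Symfony performs an equivalent comparison for you once the Last-Modified header is set on the response:

```php
<?php
/**
 * Hypothetical helper: true when the client's copy is still current.
 * $lastModified is a Unix timestamp; $ifModifiedSince is the raw
 * If-Modified-Since header value, or null if the client sent none.
 */
function isNotModifiedSince(int $lastModified, ?string $ifModifiedSince): bool
{
    if ($ifModifiedSince === null) {
        return false; // the client has no cached copy to validate
    }
    $since = strtotime($ifModifiedSince);

    return $since !== false && $lastModified <= $since;
}

// A made-up modification time, formatted as an HTTP date.
$lastModified = gmmktime(12, 0, 0, 1, 15, 2016);
$httpDate = gmdate('D, d M Y H:i:s', $lastModified) . ' GMT';

var_dump(isNotModifiedSince($lastModified, $httpDate));      // bool(true)
var_dump(isNotModifiedSince($lastModified + 60, $httpDate)); // bool(false)
var_dump(isNotModifiedSince($lastModified, null));           // bool(false)
```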
As you can see, we have two entities in the indexAction() method and, considering the current code, generating the ETag header looks practical. So the validation header will look as follows:
// src/AppBundle/Controller/DashboardController.php
<?php
//...
class DashboardController extends Controller
{
    /**
     * @Cache(ETag="userProjects ~ finishedTasks")
     */
    public function indexAction() { /* ... */ }
}
The next time a request arrives, the cache layer looks at the ETag value in the controller, compares it with its own ETag, and calls the indexAction() method only if there is a difference between the two.
Imagine that we want to keep the cache fresh for 10 minutes and simultaneously keep an eye on any changes over user projects or finished tasks. It is obvious that tasks won't finish every 10 minutes and it is far beyond reality to expect changes on project status during this period.
So, to make our caching strategy efficient, we can combine Expiration and Validation and apply them to the Dashboard controller as follows:
// src/AppBundle/Controller/DashboardController.php
<?php
//...
/**
 * @Cache(expires="+10 minutes")
 */
class DashboardController extends Controller
{
    /**
     * @Cache(ETag="userProjects ~ finishedTasks")
     */
    public function indexAction() { /* ... */ }
}
Keep in mind that Expiration has a higher priority than Validation. In other words, the cache is fresh for 10 minutes, regardless of the validation status. So when you visit your dashboard for the first time, the response is generated, a copy is stored in the cache, and you will hit the cache for the next 10 minutes.
However, what happens after 10 minutes is a little different. Now, the cached copy has expired; thus, the HTTP flow falls into the validation phase. If nothing has happened to the finished tasks or your project status, a 304 (Not Modified) response is returned, a new expiration period is generated, and you hit the cache again.
However, if there is any change in your tasks or project status, then you will hit the server to get the real response, and a new cache from the response's contents, a new expiration period, and a new ETag are generated and stored in the cache layer for future reference.
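The whole flow can be summarized in a few lines of plain PHP. The decide() function and the array layout of a cache entry are assumptions made for illustration. Symfony's gateway cache implements this logic internally:

```php
<?php
/**
 * Hypothetical decision function for a cached entry.
 * Returns 'hit', 'revalidated', or 'miss'.
 */
function decide(?array $entry, string $currentEtag, int $now): string
{
    if ($entry === null) {
        return 'miss'; // nothing cached yet
    }
    if ($now < $entry['expiresAt']) {
        return 'hit'; // expiration wins: no validation is performed
    }

    // Expired: fall back to validation by comparing ETags.
    return $entry['etag'] === $currentEtag ? 'revalidated' : 'miss';
}

// An entry cached at time 0 with a 10-minute (600-second) lifetime.
$entry = ['etag' => 'v1', 'expiresAt' => 600];

echo decide(null, 'v1', 0), "\n";     // miss
echo decide($entry, 'v2', 300), "\n"; // hit
echo decide($entry, 'v1', 700), "\n"; // revalidated
echo decide($entry, 'v2', 700), "\n"; // miss
```

Note the second call: the content has changed on the server (the ETag is now v2), but the entry has not expired yet, so the cached copy is still served. This is the price of giving Expiration priority over Validation.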
In this article, you learned about the basics of gateway and Doctrine caching. We saw how to set expiration and validation strategies using HTTP headers such as Cache-Control, Expires, Last-Modified, and ETag. You learned how to set public and private access levels for a cache and use an annotation to define cache rules in the controller.