Common mistakes in configuration
Reading default configuration files may turn out to be interesting and educational, but a more useful thing is, of course, looking at examples of configuration that is actually used in production. We will now look at some common mistakes that happen during the configuration of Nginx.
If you don't see something that has happened to you and you need help immediately, by all means skip ahead and browse the rest of the book. There are many more examples throughout the chapters, grouped by the nature of the problem or the effects it has.
Semicolons and newlines
One common feature of truly great software is being forgiving. Nginx will understand and autocorrect some syntax violations when the result is unambiguous. For example, if your hands insist on enclosing values in quotes, you can actually do this.
This is completely legal and works okay:
...
user "nobody" 'www-data';
worker_processes '2';
...
On the other hand, here is a case where Nginx will not allow you to leave a stray, unneeded semicolon, even though it does not introduce any ambiguity:
...
events {
    worker_connections 768;
    # multi_accept on;
};
...
% sudo nginx -t
nginx: [emerg] unexpected ";" in /etc/nginx/nginx.conf:13
nginx: configuration file /etc/nginx/nginx.conf test failed
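nginx -t remains the authoritative check, but this particular mistake is easy to spot mechanically. The Python sketch below uses a hypothetical helper, find_stray_semicolons, which is not part of any Nginx tooling. It is deliberately naive: it drops everything after a # on each line, ignoring the possibility of a # inside a quoted string. It then reports every closing brace that is directly followed by a semicolon.

```python
import re

# "}" followed (possibly after whitespace) by ";".
STRAY_SEMICOLON = re.compile(r"\}\s*;")


def find_stray_semicolons(text: str) -> list[int]:
    """Return 1-based line numbers of semicolons that directly follow a '}'.

    Naive on purpose: '#' comments are stripped line by line without
    handling '#' characters inside quoted strings.
    """
    stripped = "\n".join(line.split("#", 1)[0] for line in text.splitlines())
    hits = []
    for match in STRAY_SEMICOLON.finditer(stripped):
        # Count the newlines before the semicolon to get its line number.
        hits.append(stripped.count("\n", 0, match.end() - 1) + 1)
    return hits


config = """events {
    worker_connections 768;
    # multi_accept on;
};
"""
print(find_stray_semicolons(config))  # [4]
```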
The author once had a configuration file saved in the old Mac format, that is, with <CR> as the newline separator. This format was used on pre-OS X Apple operating systems. Text editors and pagers work around this rare curiosity, so you will have a hard time noticing anything unusual. Nginx, however, could not split the file into lines at all:
% sudo nginx -t
nginx: [emerg] unexpected end of file, expecting "}" in /etc/nginx/nginx.conf:1
nginx: configuration file /etc/nginx/nginx.conf test failed
The way to fix it is to convert newlines from <CR> to <LF> or to <CR><LF>. The easiest method, using tr from the Unix/Linux command line, looks like this:
% tr '\r' '\n' < /etc/nginx/nginx.conf > /tmp/nginx.conf
(After this, check it manually and replace the old file with mv.)
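tr '\r' '\n' replaces every <CR>, so a file that already contains some <CR><LF> pairs ends up with doubled newlines. That is harmless in nginx.conf, but it shows that tr does not distinguish the two cases. The Python sketch below first counts which line endings a file actually contains, which helps because editors hide them. It then converts <CR><LF> and bare <CR> to <LF>. The function names are illustrative, and like the tr command it writes to a separate file so that you can inspect the result before replacing the original.

```python
def detect_newlines(data: bytes) -> dict[str, int]:
    """Count CR LF pairs, bare CR and bare LF line endings in raw bytes."""
    crlf = data.count(b"\r\n")
    return {
        "CRLF": crlf,
        "CR": data.count(b"\r") - crlf,  # CRs not followed by LF
        "LF": data.count(b"\n") - crlf,  # LFs not preceded by CR
    }


def to_unix_newlines(data: bytes) -> bytes:
    """Convert CR LF and bare CR line endings to LF."""
    # Replace CR LF first so that each pair becomes one LF, not two.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def convert_file(src: str, dst: str) -> None:
    """Write a copy of src with Unix newlines to dst, leaving src untouched."""
    with open(src, "rb") as f:
        data = f.read()
    with open(dst, "wb") as f:
        f.write(to_unix_newlines(data))


sample = b"user nobody;\rworker_processes 2;\r"
print(detect_newlines(sample))   # {'CRLF': 0, 'CR': 2, 'LF': 0}
print(to_unix_newlines(sample))  # b'user nobody;\nworker_processes 2;\n'
# Example, mirroring the tr command above:
# convert_file("/etc/nginx/nginx.conf", "/tmp/nginx.conf")
```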
File permissions
Have you noticed that we run nginx -t with sudo? Let us try without sudo and see what happens:
It is actually quite interesting. Nginx reported that the syntax of the file is okay, but then it decided to dig deeper and check not only the syntax but also the availability of some features mentioned in the configuration. First, it complained about not being able to change the effective user under whose permissions all the worker processes should run. Do you remember the user directive? It also tried to open both the main server-wide error log and the pid file that is rewritten on each restart of Nginx. Neither of them is writable from your main working account (and neither should be, of course). That is why sudo is needed when running nginx -t.
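To see in advance which of those files the current account could write, a small Python sketch can check the paths with os.access. The two paths below are assumptions. They are common defaults on many Linux distributions, but the real ones come from the error_log and pid directives of your own nginx.conf. Note that os.access checks the real user ID, not the effective one.

```python
import os


def check_writable(paths: list[str]) -> dict[str, bool]:
    """Report whether the current (real) user could write to each path.

    For a file that does not exist yet, the parent directory is checked,
    because creating the file needs write and search access there.
    """
    result = {}
    for path in paths:
        if os.path.exists(path):
            result[path] = os.access(path, os.W_OK)
        else:
            parent = os.path.dirname(path) or "."
            result[path] = os.access(parent, os.W_OK | os.X_OK)
    return result


# Assumed default locations; take the real ones from your nginx.conf.
for path, ok in check_writable(["/var/log/nginx/error.log", "/run/nginx.pid"]).items():
    print(path, "writable" if ok else "NOT writable")
print("running as root:", hasattr(os, "geteuid") and os.geteuid() == 0)
```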
Variables
Here is another example of a simple syntax error that might bite you once or twice in your career. Do you remember the variables that we discussed several pages before? Nginx uses the $syntax that is very familiar to anyone with UNIX shell, awk, Perl, or PHP programming experience. Still, it is very easy to miss the dollar character, and Nginx will not notice, because the variable will just turn into a plain string constant.
When you set up your Nginx as a proxy for another web server (such a configuration is traditionally called a "reverse accelerator", although less and less often in recent times), you will quickly find that all client connections to your backend server come from the same IP address: the address of your Nginx host. The reason is obvious, since it is Nginx, not the client, that opens those connections. But once you have some backend logic that depends on the actual client address, you will need to work around this limitation of proxying. A common practice is to include an additional HTTP request header in all requests from Nginx to the backend. Here is how you do that:
...
proxy_set_header X-Real-IP $remote_addr;
...
The application will have to check for this header and, only in its absence, use the actual client IP address from the socket. Now, imagine losing that dollar sign at the beginning of $remote_addr. Suddenly, your Nginx will add a very strange header, X-Real-IP: remote_addr, to all requests. nginx -t won't be able to detect this. Your backend application might blow up if it has a strict IP address parser. Or, and this is ironically worse, it might skip the unparsable IP address remote_addr and fall back to the actual address of your Nginx host without ever reporting this in any log. You will end up with a working configuration that silently loses valuable information! Depending on your luck, this could stay in production for months before someone notices that some fresh "rate limiting by IP" feature of the application is affecting all users at once!
Ah, the horrors!
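Two small safeguards can be sketched in Python, one on each side of the proxy. Both are illustrative assumptions, not features of Nginx or of any framework.

The first, suspicious_headers, scans configuration text for proxy_set_header values that are a bare word matching a known variable name. The list of names is a deliberately small, hypothetical sample.

The second, client_ip, is the backend check described above, but it logs an unparsable X-Real-IP value instead of silently falling back. It assumes the headers arrive as a plain dict with the exact key X-Real-IP. In a real deployment you would also trust this header only on connections that come from your own proxy.

```python
import ipaddress
import logging
import re

log = logging.getLogger("client-ip")

# Hypothetical sample of nginx variable names to watch for; extend as needed.
KNOWN_VARIABLES = {"remote_addr", "host", "scheme", "request_uri", "proxy_add_x_forwarded_for"}

# "proxy_set_header <name> <bare-word value>;" (a value starting with "$" won't match).
HEADER_LINE = re.compile(r"^\s*proxy_set_header\s+(\S+)\s+([A-Za-z_]+)\s*;")


def suspicious_headers(config: str) -> list[tuple[int, str, str]]:
    """Find proxy_set_header values that look like a variable missing its '$'."""
    hits = []
    for line_no, line in enumerate(config.splitlines(), start=1):
        m = HEADER_LINE.match(line)
        if m and m.group(2) in KNOWN_VARIABLES:
            hits.append((line_no, m.group(1), m.group(2)))
    return hits


def client_ip(headers: dict[str, str], peer_addr: str) -> str:
    """Prefer X-Real-IP, but log instead of silently falling back."""
    value = headers.get("X-Real-IP")
    if value is None:
        return peer_addr
    try:
        return str(ipaddress.ip_address(value.strip()))
    except ValueError:
        log.warning("unparsable X-Real-IP %r from %s", value, peer_addr)
        return peer_addr


print(suspicious_headers("proxy_set_header X-Real-IP remote_addr;\n"))
# [(1, 'X-Real-IP', 'remote_addr')]
print(client_ip({"X-Real-IP": "remote_addr"}, "10.0.0.2"))
# 10.0.0.2 (and a warning is logged)
```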
Regular expressions
Let us get to something less destructive. Many Nginx directives make use of regular expressions. You should be familiar with them. If not, we would recommend stopping your work as soon as possible and heading to a bookstore. Regular expressions are considered by many IT practitioners to be the single most important technology or skill for everyday use, after fast typing.
Most often, you will see regexps in the multiline location directive. Besides this, they are very useful (and sometimes unavoidable) in URL rewriting and hostname matching. Regular expressions are a mini-language that uses several characters as metacharacters to construct patterns from strings. Those patterns cover sets of strings (very often infinite sets). The process of checking whether a particular string is contained in the set corresponding to a pattern is called matching. This is a simple regexp from the default nginx.conf file:
...
#location ~ \.php$ {
#    proxy_pass http://127.0.0.1;
#}
...
The tilde after the location command word means that what follows is a regular expression to match (case-sensitively) against incoming URLs. \.php$ covers the infinite set of all strings in the universe that end with these exact four characters: .php. The backslash before the dot cancels the metavalue of the dot, which is "any character". The dollar sign is a metacharacter that matches the very end of a string.
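Nginx uses PCRE for these patterns. As a stand-in, the Python sketch below uses the re module, which behaves the same way for a pattern this simple. re.search is used because a regex location matches anywhere in the URI unless the pattern itself is anchored. The URIs are made-up examples of request paths without query strings.

```python
import re

# Approximation of "location ~ \.php$": case-sensitive, unanchored search.
PHP_LOCATION = re.compile(r"\.php$")

for uri in [
    "/index.php",          # matches
    "/blog/wp-login.php",  # matches
    "/index.PHP",          # no match: "~" is case-sensitive
    "/index.php.bak",      # no match: ".php" is not at the end
    "/php/readme.txt",     # no match
]:
    print(uri, "->", "matches" if PHP_LOCATION.search(uri) else "no match")
```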
How many ways are there to make a mistake in that expression? A lot. A very big number. Will nginx -t point out those errors? Most probably not, unless you happen to make the whole directive invalid somehow. The mini-language is so expressive that almost any combination of characters is valid. Let's try some:
...
location ~ \.php {
...
Did you notice? Right, no dollar, just as in the variable example shown previously. This is perfectly valid. It will also pass most tests, because this regexp covers an even larger infinite set: all strings that contain .php as a substring, not necessarily at the end. What could possibly go wrong? Well, first, you could get a request for a URL that looks like /mirrors/www.phpworld.example.com/index.html and see it blow up. And second, matching by comparing the last four characters is logically much simpler than searching the whole buffer for the substring. This could have performance effects, however small.
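Here is the same comparison in a short Python sketch, again with re standing in for PCRE. The only difference between the two patterns is the missing dollar sign.

```python
import re

correct = re.compile(r"\.php$")
no_dollar = re.compile(r"\.php")  # the "location ~ \.php" variant

for uri in [
    "/index.php",                                   # both match
    "/mirrors/www.phpworld.example.com/index.html", # only no_dollar matches
]:
    print(uri, bool(correct.search(uri)), bool(no_dollar.search(uri)))
```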
Let's skip the backslash instead:
...
location ~ .php$ {
...
Evil. This will also pass the tests, but again, the set of matching strings has grown. Now the dot before php is not literal. It is a metadot meaning any character. It takes some bad luck to get a request like /download/version-for-php, but once you get one, the location will match. You have a time bomb.
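The following Python sketch, with re again standing in for PCRE, shows which character the unescaped dot actually consumes.

```python
import re

correct = re.compile(r"\.php$")
no_backslash = re.compile(r".php$")  # the "location ~ .php$" variant

uri = "/download/version-for-php"
print(bool(correct.search(uri)))     # False
print(no_backslash.search(uri).group())  # -php  (the dot matched "-")
```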
Now, let's drop the tilde:
...
location \.php$ {
...
Do you like our game, by the way? You should already be able to predict what will happen and how to fix it, that is, if you do like it and are starting to think like an Nginx instance.
The missing tilde will turn this location directive into its simplest form: no regular expressions whatsoever. The string \.php$ is interpreted as a literal prefix to compare against the beginning of every incoming URL, backslash and dollar included. Will this location block ever process a single request? We don't know. The important thing here is that nginx -t still has nothing to say about this directive. It is still syntactically valid.
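A rough Python model of both points follows.

The first part treats the tilde-less location as a literal prefix test. str.startswith is only an approximation, and nginx's choice of the longest matching prefix among several locations is not modelled.

The second part shows that every variant in this section is a valid pattern. Nginx compiles regular expressions when it loads the configuration, so a malformed pattern, such as one with an unbalanced parenthesis, would make nginx -t fail. The mistakes above are not malformed, so nothing complains. The helper name compiles is illustrative, and Python's re again stands in for PCRE.

```python
import re

# Without "~", the argument is compared literally against the URI's start.
PREFIX = r"\.php$"

for uri in ["/index.php", "/news/archive.php"]:
    # Ordinary URIs start with "/", not a backslash, so this prints False.
    print(uri, uri.startswith(PREFIX))


def compiles(pattern: str) -> bool:
    """Return True if the pattern is a syntactically valid regular expression."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


# The three "broken" variants are all valid; only the malformed one fails.
for pattern in [r"\.php$", r"\.php", r".php$", r"\.php("]:
    print(pattern, compiles(pattern))
```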