Pooling connection counts
The fundamental idea behind sizing a connection pool is that you should have enough connections to use all of the available resources, but not significantly more than that. In most cases, the size needed to saturate the server depends on the number of CPU cores, how much of the database is cached in memory, and the speed of the underlying disks. Once you've passed the point where the server is busy all the time, adding more connections only reduces efficiency, forcing the server to switch among multiple tasks when it would be better served by a smaller number.
It's hard to predict how many pooled connections to the database are necessary to keep it fully loaded without overloading it. Many users report optimal pool counts to be between two and three times the number of cores on the system, perhaps more on a system with a large number of drives. A standard iterative technique is to start with 100 connections and then tune the number down from there if the server load seems too high. Prepare to be surprised at how low the optimal count really is. Even though you might think that your server needs lots of connections to service a heavy load, in reality, once you've saturated all of the CPUs on the server, trying to execute more things at once just decreases efficiency. Almost all of the time, you'll be better off queuing the connections, so that they wait without causing contention and then execute on a lightly loaded server.
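One way to monitor this while tuning is to count how many backends are actually executing at any instant, then compare that figure against your core count. The following is a minimal sketch assuming PostgreSQL 9.2 or later, where pg_stat_activity exposes a state column; older releases report '<IDLE>' in the current_query column instead:

    -- Count the backends actively executing a query right now;
    -- compare this against the number of CPU cores.
    SELECT count(*) AS active_backends
    FROM pg_stat_activity
    WHERE state = 'active';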
There are a few rules of thumb for how many connections are too many. On typical UNIX-derived operating systems such as Linux, the point at which adding more connections becomes really ineffective is generally between 500 and 1000 active ones. If many of your connections spend a good chunk of their time IDLE (as shown by pg_stat_activity), you can discount their average overhead to some extent. Generally, having fewer connections than that is optimal, but you don't necessarily want to add the overhead of maintaining a pool until the server is really swamped. If your connection count is well into the thousands of active sessions, you definitely want to use a pool rather than direct connections.
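To estimate how large that idle discount is, a per-state breakdown helps. Again, this sketch assumes a server version with the state column in pg_stat_activity:

    -- Sessions grouped by state; a large 'idle' count means the
    -- raw connection total overstates the real load.
    SELECT state, count(*) AS sessions
    FROM pg_stat_activity
    GROUP BY state
    ORDER BY sessions DESC;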
Windows systems don't scale to as many connections in general, and there's a hard limit you'll run into in most cases too. If you are running the PostgreSQL server as a service, rather than directly from the command line, it will typically be assigned 512 KB of "Desktop Heap" space to work with. Since each connection takes approximately 3.2 KB of that space, expect your server to run out of heap and therefore stop accepting new connections after approximately 125 of them. It's possible to increase the heap size, but your system may fail to boot if you set it too high. See the "I cannot run with more than about 125 connections at once" entry at http://wiki.postgresql.org/wiki/Running_%26_Installing_PostgreSQL_On_Native_Windows for more information about this limitation and possible workarounds. Generally, the best workaround is to use a connection pooler that keeps the number of connections below this threshold, as this will improve server efficiency too.
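For example, with a pooler such as pgBouncer, capping the server-side connection count below that threshold takes only a few lines of configuration. The fragment below is a hypothetical sketch: the database name, file names, and limits are placeholder assumptions, not recommendations:

    ; Hypothetical pgbouncer.ini fragment: accept many clients, but
    ; open at most 100 connections to PostgreSQL itself, staying
    ; safely under the ~125-connection Desktop Heap limit.
    [databases]
    mydb = host=127.0.0.1 port=5432 dbname=mydb

    [pgbouncer]
    listen_port = 6432
    auth_type = md5
    auth_file = userlist.txt
    pool_mode = transaction
    max_client_conn = 1000
    default_pool_size = 100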
From a monitoring perspective, connection pooling is likely to help if you have hundreds of connections or more and you see that most of your system's processors are fully utilized. Connection overhead shows up as a mix of user and system/kernel time, so expect both to drop with a pooler in front of the database. If your system instead spends most of its time waiting for disk I/O, a pooler is unlikely to help; caching might, however.