When it comes to performance tuning an environment, often the first place to start is with the database. The reason for this is that most applications rely very heavily on a database of some sort.
Unfortunately, databases can be one of the most complex areas to tune. The reason is that tuning a database service properly often involves more than the database service itself; it often requires making hardware, OS, or even application modifications.
On top of requiring a diverse skill set, one of the biggest challenges with tuning a database is generating enough simulated traffic to stress the database service. That is why today's article explores pgbench, a benchmarking tool used to measure the performance of a PostgreSQL instance.
PostgreSQL is a highly popular open-source relational database. One of the nice things about PostgreSQL is that quite a few tools have been created to assist with its management; pgbench is one such tool.
While exploring pgbench, we will also use it to measure the performance gains or losses from a common PostgreSQL tunable.
Setting Up a PostgreSQL Instance
Before we can use pgbench to tune a database service, we must first stand up that database service. The steps below outline how to set up a basic PostgreSQL instance on an Ubuntu 16.04 server.
Installing with apt-get
Installing PostgreSQL on an Ubuntu system is fairly easy. The bulk of the work is accomplished by simply running the apt-get command.
# apt-get install postgresql postgresql-contrib
The above apt-get command installs both the postgresql and postgresql-contrib packages. The postgresql package installs the base PostgreSQL service.
The postgresql-contrib package installs additional contributed modules. These contributions have not yet been added to the core package but often provide quite a bit of useful functionality.
With the packages installed, we now have a running PostgreSQL instance. We can verify this by using the systemctl command to check the status of PostgreSQL.
# systemctl status postgresql
● postgresql.service - PostgreSQL RDBMS
   Loaded: loaded (/lib/systemd/system/postgresql.service; enabled; vendor preset: enabled)
   Active: active (exited) since Mon 2017-01-02 21:14:36 UTC; 7h ago
  Process: 16075 ExecStart=/bin/true (code=exited, status=0/SUCCESS)
 Main PID: 16075 (code=exited, status=0/SUCCESS)

Jan 02 21:14:36 ubuntu-xenial systemd[1]: Starting PostgreSQL RDBMS...
Jan 02 21:14:36 ubuntu-xenial systemd[1]: Started PostgreSQL RDBMS.
The above indicates our instance started without any issues. We can now move on to our next step, creating a database.
Creating a database
When we installed the postgresql package, a user named postgres was created. This user is the owner of the running instance and also serves as the admin user for the PostgreSQL service.
In order to create a database, we will need to log in as this user, which is accomplished by executing the su command.
# su - postgres
Once switched to the postgres user, we can log in to the running instance by using the PostgreSQL client, psql.
$ psql
psql (9.5.5)
Type "help" for help.

postgres=#
After executing the psql command, we are dropped into PostgreSQL's command-line environment. From here, we can issue SQL statements or use special client commands to perform actions.
As an example, we can list the current databases by issuing the \list command.
postgres-# \list
                                  List of databases
   Name    |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges
-----------+----------+----------+-------------+-------------+-----------------------
 postgres  | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 template0 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
 template1 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
(3 rows)
After issuing the \list command, three databases were returned. These are default databases that were set up during the initial installation process.
For our testing today, we will be creating a new database. Let's go ahead and create that database, naming it example. We can do so by issuing the following SQL statement:
CREATE DATABASE example;
Once executed, we can validate that the database has been created by issuing the \list command again.
postgres=# \list
                                  List of databases
   Name    |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges
-----------+----------+----------+-------------+-------------+-----------------------
 example   | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 postgres  | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 template0 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
 template1 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
(4 rows)
At this point, we have an empty database named example. We will now need to return to our bash shell to execute pgbench commands. We can do this by issuing the \q (quit) command.
postgres-# \q
Once logged out of the PostgreSQL command-line environment, we can get started using pgbench to benchmark our database instance's performance.
Using pgbench to Measure Performance
One of the most difficult things in measuring database performance is generating enough load. A popular option is simply to bombard test instances of the target application with test transactions. While this is a useful test that measures database performance in relation to the application, it can be problematic: application bottlenecks can limit how hard the database is actually pushed.
For situations such as this, tools like pgbench come in handy. With pgbench, you can either use a sample database provided with pgbench or have pgbench run custom queries against an application database.
In this article, we will be using the sample database that comes with pgbench.
Setting up the sample database
Setting up the sample database is quite easy and fairly quick. We can start this process by executing pgbench with the -i (initialize) option.
$ pgbench -i -s 50 example
creating tables...
5000000 of 5000000 tuples (100%) done (elapsed 5.33 s, remaining 0.00 s)
vacuum...
set primary keys...
done.
In the command above, we executed pgbench with the -i option and the -s option, followed by the database name (example).
The -i (initialize) option tells pgbench to initialize the specified database, meaning pgbench will create the following tables within the example database.
table              # of rows
---------------------------------
pgbench_branches           1
pgbench_tellers           10
pgbench_accounts      100000
pgbench_history            0
By default, pgbench will create the tables above with the number of rows shown, resulting in a simple 16MB database.
Since we will be using pgbench to measure changes in performance, a small 16MB database will not be enough to stress our instance. This is where the -s (scaling) option comes into play.
The -s option multiplies the number of rows entered into each table. In the command above, we entered a "scaling" option of 50, which told pgbench to create a database 50 times the default size.
What this means is our pgbench_accounts table now has 5,000,000 records, and our database size is now 800MB (50 x 16MB).
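The scaling math is easy to sanity-check with quick shell arithmetic, using the default row count and database size described above:

```shell
# pgbench defaults before scaling (as described above)
default_accounts_rows=100000
default_db_mb=16
scale=50   # the value passed to -s

# with -s 50, rows and size are simply multiplied by the scale factor
echo "pgbench_accounts rows: $((default_accounts_rows * scale))"
echo "database size: $((default_db_mb * scale)) MB"
```

Both values line up with the 5,000,000 tuples reported during initialization and the 800MB figure above.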
To verify that our tables have been created successfully, let's go ahead and run the psql client again.
$ psql -d example
psql (9.5.5)
Type "help" for help.

example=#
In the command above, we used the -d (database) flag to tell psql not only to connect to the PostgreSQL service but also to switch to the example database.
Since we are currently using the example database, we can issue the \dt command to list the tables available within that database.
example=# \dt
              List of relations
 Schema |       Name       | Type  |  Owner
--------+------------------+-------+----------
 public | pgbench_accounts | table | postgres
 public | pgbench_branches | table | postgres
 public | pgbench_history  | table | postgres
 public | pgbench_tellers  | table | postgres
(4 rows)
From the output above, we can see that pgbench created the four expected tables. This means our database is now populated and ready to be used to measure our database instance's performance.
Establishing a baseline
When doing any sort of performance tuning, it is best to first establish a performance baseline. This baseline serves as the measurement against which we can tell whether the changes we make have increased or decreased performance.
Let's go ahead and call pgbench to establish the baseline for our "out of the box" PostgreSQL instance.
$ pgbench -c 10 -j 2 -t 10000 example
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 50
query mode: simple
number of clients: 10
number of threads: 2
number of transactions per client: 10000
number of transactions actually processed: 100000/100000
latency average: 4.176 ms
tps = 2394.718707 (including connections establishing)
tps = 2394.874350 (excluding connections establishing)
When calling pgbench, we added quite a few options to the command. The first is -c (clients), which defines the number of clients to connect with. For this testing, I used 10 to tell pgbench to execute with 10 clients.
What this means is that when pgbench is executing tests, it opens 10 different sessions.
The next option is the -j (threads) flag, which defines the number of worker processes for pgbench. In the command above, I specified a value of 2, which tells pgbench to start two worker processes during the benchmarking.
The third option is -t (transactions), which specifies the number of transactions to execute. In the command above, I provided the value of 10,000. However, this doesn't mean that only 10,000 transactions will be executed against our database service; it means that each client session will execute 10,000 transactions.
To summarize, the baseline test run used two pgbench worker processes simulating 10,000 transactions from each of 10 clients, for a total of 100,000 transactions.
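The relationship between these flags can be expressed as a quick calculation; the total below matches the "transactions actually processed" figure in the pgbench output:

```shell
clients=10          # -c
threads=2           # -j (worker processes; does not change the transaction count)
tx_per_client=10000 # -t

# total transactions = clients * transactions per client
echo "total transactions: $((clients * tx_per_client))"
```

Note that the -j thread count only affects how the client sessions are divided among worker processes, not how many transactions run.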
With that understanding, let's take a look at the results of this first test.
$ pgbench -c 10 -j 2 -t 10000 example
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 50
query mode: simple
number of clients: 10
number of threads: 2
number of transactions per client: 10000
number of transactions actually processed: 100000/100000
latency average: 4.176 ms
tps = 2394.718707 (including connections establishing)
tps = 2394.874350 (excluding connections establishing)
The output of pgbench contains quite a bit of information, most of which describes the test scenario being executed. The part that we are most interested in is the following:
tps = 2394.718707 (including connections establishing)
tps = 2394.874350 (excluding connections establishing)
From these results, it seems our baseline is 2,394 database transactions per second. Let's go ahead and see if we can increase this number by modifying a simple configuration parameter within PostgreSQL.
Adding More Cache
One of the go-to parameters for anyone tuning PostgreSQL is shared_buffers. This parameter specifies the amount of memory the PostgreSQL service can use for caching, a mechanism used to store the contents of tables and indexes in memory.
To show how we can use pgbench for performance tuning, we will adjust this value and test the performance gains or losses.
By default, the shared_buffers value is set to 128MB, a fairly low value considering the amount of memory available on most servers today. We can see this setting for ourselves by looking at the /etc/postgresql/9.5/main/postgresql.conf file, where we should find the following:
# - Memory -

shared_buffers = 128MB            # min 128kB
                                  # (change requires restart)
Let's go ahead and switch this value to 256MB, effectively doubling our available cache space.
# - Memory -

shared_buffers = 256MB            # min 128kB
                                  # (change requires restart)
Once completed, we will need to restart the PostgreSQL service. We can do so by executing the systemctl command with the restart option.
# systemctl restart postgresql
Once the service is fully up and running, we can once again use pgbench to measure our performance.
$ pgbench -c 10 -j 2 -t 10000 example
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 50
query mode: simple
number of clients: 10
number of threads: 2
number of transactions per client: 10000
number of transactions actually processed: 100000/100000
latency average: 3.921 ms
tps = 2550.313477 (including connections establishing)
tps = 2550.480149 (excluding connections establishing)
In our earlier baseline test, we hit a rate of 2,394 transactions per second. In this last run, after updating the shared_buffers parameter, we achieved 2,550 transactions per second, an increase of 156. While this is not a bad start, we can still go further.
While the shared_buffers parameter starts off at 128MB, the commonly recommended value for this parameter is one-fourth of system memory. Our test system has 2GB of system memory, a value we can verify with the free command.
$ free -m
       total   used   free   shared  buff/cache  available
Mem:    2000     54    109      548        1836        1223
Swap:      0      0      0
In the output above, the total column on the Mem row shows a value of 2000MB, the total physical memory available to the system. The available column shows 1223MB, meaning we have up to 1.2GB of free memory to use for our tuning purposes.
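Applying the one-fourth guideline to this system is simple arithmetic; note that one-fourth of 2000MB is 500MB, which we round up to the nearby power-of-two value of 512MB used below:

```shell
total_mb=2000  # total system memory as reported by free -m

# recommended shared_buffers: one-fourth of system memory
echo "recommended shared_buffers: $((total_mb / 4)) MB"
```

On a different system, simply substitute the total from free -m to get a starting value for shared_buffers.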
If we change our shared_buffers parameter to the recommended value of one-fourth of system memory, we would need to set it to 512MB. Let's go ahead and make this change and rerun our pgbench test.
# - Memory -

shared_buffers = 512MB            # min 128kB
                                  # (change requires restart)
With the shared_buffers value updated in the /etc/postgresql/9.5/main/postgresql.conf file, we can go ahead and restart the PostgreSQL service.
# systemctl restart postgresql
After restarting, let's rerun our test.
$ pgbench -c 10 -j 2 -t 10000 example
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 50
query mode: simple
number of clients: 10
number of threads: 2
number of transactions per client: 10000
number of transactions actually processed: 100000/100000
latency average: 3.756 ms
tps = 2662.750932 (including connections establishing)
tps = 2663.066421 (excluding connections establishing)
This time, our system reached 2,662 transactions per second, an additional increase of 112 transactions per second. Since our transactions per second increased by at least 100 both times, let's go a step further and see what happens when we change this value to 1GB.
# - Memory -

shared_buffers = 1024MB           # min 128kB
                                  # (change requires restart)
After updating the value, we will need to once again restart the PostgreSQL service.
# systemctl restart postgresql
With the service restarted, we can now rerun our test.
$ pgbench -c 10 -j 2 -t 10000 example
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 50
query mode: simple
number of clients: 10
number of threads: 2
number of transactions per client: 10000
number of transactions actually processed: 100000/100000
latency average: 3.744 ms
tps = 2670.791865 (including connections establishing)
tps = 2671.079076 (excluding connections establishing)
This time, our transactions per second went from 2,662 to 2,671, an increase of only 9 transactions per second. This is a case of diminishing returns.
While it is feasible for many environments to increase the shared_buffers value beyond the one-fourth guideline, doing so does not return the same results for this test database.
Summary
Based on the results of our testing, we can see that changing the value of shared_buffers from 128MB to 512MB on our test system resulted in an increase of 268 transactions per second. Against our baseline results, that is roughly an 11 percent increase in performance.
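That percentage can be checked with a quick awk calculation against the tps figures from the two runs:

```shell
baseline=2394  # baseline tps
tuned=2662     # tps with shared_buffers = 512MB

# improvement over the baseline, as a percentage
awk -v b="$baseline" -v t="$tuned" 'BEGIN { printf "%.1f%%\n", (t - b) / b * 100 }'
```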
We did all of this on a base PostgreSQL instance using pgbench's sample database, meaning we did not have to load our application to get a baseline metric on how well PostgreSQL performs.
While we were able to increase our throughput by modifying the shared_buffers parameter, PostgreSQL has many more tuning parameters available. For anyone looking to tune a PostgreSQL instance, I would highly recommend checking out PostgreSQL's wiki.