Improving Linux System Performance with I/O Scheduler Tuning

Written by: Ben Cane


In a previous article, I wrote about using pgbench to tune PostgreSQL. While I covered a very common tunable, shared_buffers, there are many other tuning options that can be used to gain performance from PostgreSQL.

Today's article is going to cover one of those additional options. However, this tunable does not belong to PostgreSQL. Rather, it belongs to the Linux kernel.

In today's article, we will be adjusting the Linux I/O scheduler and measuring the impact of those changes with pgbench. We will do this using the same PostgreSQL environment we used in the previous article. All of the tuning parameters from the previous article have already been applied to the environment we will be using today.

What Is an I/O Scheduler?

The I/O Scheduler is an interesting subject; it's something that's rarely thought about unless you are trying to get the best performance out of your Linux systems. Before going too deep into how to change the I/O scheduler, let's take a moment to better familiarize ourselves with what I/O schedulers provide.

Disk access has always been considered the slowest method of accessing data. Even with the growing popularity of Flash and Solid State storage, accessing data from disk is considered slower when compared to accessing data from RAM. This is especially true when you have infrastructure that is using spinning disks.

This is because traditional spinning disks store data at physical locations on a rotating platter. To read data, the drive must move its read/write head to the right track and wait for the platter to rotate to the right position. This process is known as "seeking," and in computing terms it can take a long time.

I/O schedulers exist as a way to optimize disk access requests. They traditionally do this by merging I/O requests to similar locations on disk. By grouping requests located at similar sections of disk, the drive doesn't need to "seek" as often, improving the overall response time for disk operations.

On modern Linux implementations, there are several I/O scheduler options available. Each of these has its own method of scheduling disk access requests. In the rest of this article, we will break down how each of these schedulers prioritizes disk access and measure the performance changes from scheduler to scheduler.

Changing the I/O Scheduler

For today's article, we will be using an Ubuntu Linux server for our tests. With Ubuntu, changing the I/O Scheduler can be performed at both runtime and on bootup. The method for changing the scheduler at runtime is as simple as changing the value of a file located within /sys. Changing the value on bootup, which allows you to maintain the setting across reboots, will involve changing the Kernel parameters passed via the Grub boot loader.

Before we change the I/O scheduler, however, let's first identify our current I/O scheduler. This can be accomplished by reading the /sys/block/<disk device>/queue/scheduler file.

# cat /sys/block/sda/queue/scheduler
noop [deadline] cfq

The above shows that the I/O scheduler for disk sda is currently set to deadline.

One important item to remember is that I/O scheduling methods are defined at the Linux Kernel level, but they are applied on each disk device separately. If we were to change the value in the file above, this would mean that all filesystems on disk device sda will use the new I/O scheduler.
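Because the scheduler is set per device, it can be handy to check every disk at once. The following is a minimal sketch, assuming a POSIX shell and the sysfs layout shown above; the helper name active_scheduler is hypothetical, and it simply extracts the bracketed entry from a scheduler line.

```shell
# Print the active (bracketed) scheduler from a sysfs scheduler line.
# active_scheduler is an illustrative helper name, not a system command.
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Report the active scheduler for every block device that exposes one.
for f in /sys/block/*/queue/scheduler; do
    [ -r "$f" ] || continue          # skip if the glob matched nothing
    dev=${f#/sys/block/}             # strip the leading path
    dev=${dev%%/*}                   # keep only the device name
    printf '%s: %s\n' "$dev" "$(active_scheduler < "$f")"
done
```

Devices whose scheduler file contains no bracketed entry will print an empty value after the colon.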

As with anything performance-tuning related, it is important to understand what types of workloads exist for the environment being tuned. Each I/O scheduler has a unique way to prioritize disk operations. Understanding the workload required makes it easier to select the right scheduler.

However, like any other performance-tuning change, it is always best to test multiple options and choose based on the results. This is exactly what we will be doing in this article.

Runtime Modification of the I/O Scheduler

As I mentioned earlier, there are two ways to change the I/O scheduler. We can change the scheduler at runtime, which applies immediately to a running system, or we can modify the Grub boot loader's configuration to apply the scheduler on boot.

Since we will be performing benchmark tests to evaluate which scheduler provides the best results for our PostgreSQL instance, we will start off by changing the scheduler at runtime.

To accomplish this, we simply need to overwrite the /sys/block/<disk device>/queue/scheduler file with the new I/O scheduler selection.

# echo "cfq" > /sys/block/sda/queue/scheduler
# cat /sys/block/sda/queue/scheduler
noop deadline [cfq]

From the above, we can see that echoing cfq to the /sys/block/sda/queue/scheduler file changed our current I/O scheduler to CFQ. This change takes effect immediately. This means we can start testing the scheduler performance without having to restart PostgreSQL or any other service.
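When switching schedulers repeatedly, a small guard against typos can help. The sketch below is one possible approach, not a standard tool: set_scheduler is a hypothetical helper that checks the requested name against the list offered in the scheduler file before writing to it.

```shell
# set_scheduler <scheduler-file> <name>
# <scheduler-file> is normally /sys/block/<disk device>/queue/scheduler.
# set_scheduler is an illustrative helper name.
set_scheduler() {
    sched_file=$1
    wanted=$2
    # Strip the brackets around the active entry, split on spaces,
    # then require an exact match with one of the offered names.
    if ! sed 's/[][]//g' "$sched_file" | tr ' ' '\n' | grep -qxF -- "$wanted"; then
        echo "scheduler '$wanted' is not offered by $sched_file" >&2
        return 1
    fi
    echo "$wanted" > "$sched_file"
}
```

Run as root, usage would look like: set_scheduler /sys/block/sda/queue/scheduler cfq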

Testing PostgreSQL & I/O Scheduler Performance

Since we have already changed the I/O scheduler to CFQ, we will go ahead and start our testing with the CFQ I/O scheduler.

CFQ

The Completely Fair Queuing (CFQ) I/O scheduler works by creating a per-process I/O queue. The goal of this I/O scheduler is to provide a fair I/O priority to each process. While the CFQ algorithm is complex, the gist of this scheduler is that after ordering the queues to reduce disk seeking, it services these per-process I/O queues in a round-robin fashion.

What this means for performance is that the CFQ scheduler tries to provide each process with the same priority for disk access. However, this fairness makes the scheduler less suitable for environments that might need to prioritize one request type (such as reads) from a single process.

With that understanding of the CFQ scheduler, let's go ahead and establish a benchmark performance metric for our PostgreSQL database instance with pgbench.

# su - postgres

In order to run pgbench, we first need to switch to the postgres user. Once there, we can execute the same pgbench command executed in our previous article.

$ pgbench -c 100 -j 2 -t 1000 example
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 50
query mode: simple
number of clients: 100
number of threads: 2
number of transactions per client: 1000
number of transactions actually processed: 100000/100000
latency average: 60.823 ms
tps = 1644.104024 (including connections establishing)
tps = 1644.228715 (excluding connections establishing)

From the above, we can see that our tps reached roughly 1,644 transactions per second. While not a bad start, this is not the fastest scheduler for this workload.
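Since we will compare this number across several runs, it can help to pull out just the tps figure. The following is a small sketch, assuming the pgbench output format shown above (other pgbench versions may word these lines differently); pgbench_tps is a hypothetical helper name.

```shell
# Extract the "excluding connections establishing" tps value
# from pgbench output read on standard input.
# pgbench_tps is an illustrative helper name.
pgbench_tps() {
    awk '/^tps = / && /excluding connections establishing/ { print $3 }'
}
```

As the postgres user, usage would look like: pgbench -c 100 -j 2 -t 1000 example | pgbench_tps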

Deadline

The Deadline scheduler works by creating two queues: a read queue and a write queue. Each I/O request is given a timestamp that the kernel uses as an expiration time.

While this scheduler also attempts to service the queues based on the most efficient ordering possible, the timeout acts as a "deadline" for each I/O request. When an I/O request reaches its deadline, it is pushed to the highest priority.

While tunable, the default "deadline" values are 500 ms for Read operations and 5,000 ms for Write operations. Based on these values, we can see why the Deadline scheduler is considered an optimal scheduler for read-heavy workloads. With these timeout values, the Deadline scheduler may prioritize reads more than writes.
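These expiration values can be inspected through sysfs while the Deadline scheduler is active on a device. The sketch below assumes the read_expire and write_expire files (in milliseconds) under the device's queue/iosched directory; show_deadline_expiry is a hypothetical helper name.

```shell
# show_deadline_expiry <iosched-dir>
# <iosched-dir> is normally /sys/block/<disk device>/queue/iosched
# while the Deadline scheduler is selected for that device.
# show_deadline_expiry is an illustrative helper name.
show_deadline_expiry() {
    iosched_dir=$1
    for name in read_expire write_expire; do
        # Skip tunables that are absent (e.g. another scheduler is active).
        [ -r "$iosched_dir/$name" ] || continue
        printf '%s = %s ms\n' "$name" "$(cat "$iosched_dir/$name")"
    done
}
```

Usage would look like: show_deadline_expiry /sys/block/sda/queue/iosched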

Now that we understand the Deadline scheduler a bit better, let's go ahead and change to the Deadline scheduler and see how it holds up to our pgbench testing.

# echo deadline > /sys/block/sda/queue/scheduler
# cat /sys/block/sda/queue/scheduler
noop [deadline] cfq

With the above, we can see that our I/O scheduler is now the Deadline scheduler. Let's go ahead and run our pgbench test again.

# su - postgres
$ pgbench -c 100 -j 2 -t 1000 example
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 50
query mode: simple
number of clients: 100
number of threads: 2
number of transactions per client: 1000
number of transactions actually processed: 100000/100000
latency average: 46.700 ms
tps = 2141.318132 (including connections establishing)
tps = 2141.489076 (excluding connections establishing)

This time, pgbench reached roughly 2,141 transactions per second. That is an increase of about 497 transactions per second over CFQ, or roughly 30%, which is a sizable improvement.

What this tells us is that even though pgbench is creating a database workload that is both read and write heavy, the overall PostgreSQL instance benefits from a read-priority-based I/O scheduler.
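To put the difference between two runs in relative terms, the percentage change can be computed from the two tps values. This is a minimal sketch using awk for the floating-point arithmetic; pct_change is a hypothetical helper name.

```shell
# pct_change <old-tps> <new-tps>
# Prints the relative change from old to new as a percentage.
# pct_change is an illustrative helper name.
pct_change() {
    awk -v old="$1" -v new="$2" \
        'BEGIN { printf "%.1f\n", (new - old) / old * 100 }'
}

# CFQ versus Deadline, using the "excluding connections" figures above.
pct_change 1644.228715 2141.489076   # prints 30.2
```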

Noop

The Noop scheduler is a unique scheduler. Rather than prioritizing specific I/O operations, it simply places all I/O requests into a FIFO (First In, First Out) queue. While this scheduler does try to merge similar requests, that is the extent of the complexity of this scheduler.

This scheduler is optimized for systems that essentially do not need an I/O scheduler. It can be used in numerous scenarios, such as Virtual Machines, where the underlying disk infrastructure is already performing I/O scheduling.

Since a VM is running within a Host Server/OS, that host may already have an I/O scheduler in use. In this scenario, each disk operation is passing through two I/O schedulers: one for the VM and one for the VM Host.

Let's take a look at what kind of performance Noop has in our environment.

# echo noop > /sys/block/sda/queue/scheduler
# cat /sys/block/sda/queue/scheduler
[noop] deadline cfq

With the above, the scheduler has been changed to the Noop scheduler. We can now run pgbench to measure the impact of this I/O scheduler.

# su - postgres
$ pgbench -c 100 -j 2 -t 1000 example
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 50
query mode: simple
number of clients: 100
number of threads: 2
number of transactions per client: 1000
number of transactions actually processed: 100000/100000
latency average: 46.364 ms
tps = 2156.838618 (including connections establishing)
tps = 2157.102989 (excluding connections establishing)

From the above, we can see that we were able to reach roughly 2,156 transactions per second. This is only slightly better than the Deadline scheduler: about 15 transactions per second, or less than 1%. One reason this scheduler may perform better in our case is that the environment we are testing with is hosted within a VM.

This means that regardless of the changes being made within the VM, the I/O scheduler in use on the VM host will stay the same.
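The three runs above could also be repeated in a single pass. The following is a rough sketch under a few assumptions: bench_schedulers is a hypothetical helper, the scheduler names are the three offered on this system, and the benchmark command prints pgbench-style output in the format shown earlier.

```shell
# bench_schedulers <scheduler-file> <command...>
# Selects each scheduler in turn, runs the given command, and prints
# the "excluding connections establishing" tps value for each run.
# bench_schedulers is an illustrative helper name.
bench_schedulers() {
    sched_file=$1
    shift
    for sched in cfq deadline noop; do
        echo "$sched" > "$sched_file" || return 1
        printf '%s: ' "$sched"
        "$@" | awk '/^tps = / && /excluding/ { print $3 }'
    done
}
```

Run as root, usage would look like: bench_schedulers /sys/block/sda/queue/scheduler su - postgres -c 'pgbench -c 100 -j 2 -t 1000 example'. Note that the last scheduler in the loop stays selected afterwards.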

Changing the Scheduler on Boot

Since the Noop scheduler provided quite a bit of improvement over the CFQ scheduler, let's go ahead and make that change permanent. To do this, we will need to edit the /etc/default/grub configuration file.

# vi /etc/default/grub

The /etc/default/grub configuration file is used to configure the Grub boot loader. In this case, we will be looking for an option named GRUB_CMDLINE_LINUX. This option is used to add kernel boot parameters on startup.

The parameter we need to add is the elevator parameter. This is used to specify the desired I/O scheduler. Let's go ahead and add the parameter specifying the Noop scheduler.

GRUB_CMDLINE_LINUX="elevator=noop"

In the above, we added elevator=noop. This is used to define that the I/O scheduler on boot should be the Noop I/O scheduler. Once the changes have been made, we will need to run the update-grub2 command to apply the changed configurations.

# update-grub2
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-4.4.0-62-generic
Found initrd image: /boot/initrd.img-4.4.0-62-generic
Found linux image: /boot/vmlinuz-4.4.0-57-generic
Found initrd image: /boot/initrd.img-4.4.0-57-generic
done
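For repeatable setups, the manual edit to /etc/default/grub could be scripted. The sketch below assumes GNU sed (as shipped with Ubuntu) and a GRUB_CMDLINE_LINUX line with a double-quoted value; add_elevator_param is a hypothetical helper name, and update-grub2 still needs to be run afterwards as shown above.

```shell
# add_elevator_param <grub-file> <scheduler>
# Appends elevator=<scheduler> to GRUB_CMDLINE_LINUX unless an
# elevator= parameter is already present on that line.
# add_elevator_param is an illustrative helper name.
add_elevator_param() {
    grub_file=$1
    sched=$2
    if grep -q '^GRUB_CMDLINE_LINUX=.*elevator=' "$grub_file"; then
        return 0
    fi
    # First expression appends inside the quotes; the second removes the
    # leading space left behind when the original value was empty.
    sed -i \
        -e "s/^GRUB_CMDLINE_LINUX=\"\(.*\)\"/GRUB_CMDLINE_LINUX=\"\1 elevator=$sched\"/" \
        -e 's/^GRUB_CMDLINE_LINUX=" /GRUB_CMDLINE_LINUX="/' \
        "$grub_file"
}
```

Usage would look like: add_elevator_param /etc/default/grub noop. Keeping a backup copy of the file before editing it this way is a sensible precaution.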

With the grub configurations applied, we can now reboot the system and validate that the changes are still in effect.

# cat /sys/block/sda/queue/scheduler
[noop] deadline cfq
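To confirm that the setting came from the boot parameter rather than a leftover runtime change, the kernel command line can be checked as well. This is a minimal sketch; boot_elevator is a hypothetical helper name, and it reads /proc/cmdline unless another file is given.

```shell
# boot_elevator [cmdline-file]
# Prints the value of the elevator= kernel parameter, if one was passed.
# boot_elevator is an illustrative helper name.
boot_elevator() {
    cmdline_file=${1:-/proc/cmdline}
    tr ' ' '\n' < "$cmdline_file" | sed -n 's/^elevator=//p'
}
```

If the function prints nothing, no elevator parameter was passed at boot.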

Summary

In this article, we learned about the various I/O schedulers available on a typical Ubuntu Linux system. We also used pgbench to explore the effects these I/O schedulers have on our PostgreSQL instance.

While our testing showed the Noop scheduler was the most performant for our environment, each environment is different. The type of service being executed and the use of that service can change the performance profile of an environment greatly.
