A best practice for optimizing disk performance for the Cassandra database is to lower the default disk readahead for the drive or partition where your Cassandra data is stored. By default, the Linux kernel reads additional file data so that subsequent reads can be satisfied from the cache. The file access patterns of Cassandra queries leave most of this readahead data unused, which pollutes the cache, drives up I/O time, and results in excessive disk I/O levels.
Before you begin
You can view your current readahead settings with either of these commands:
lsblk --output NAME,KNAME,TYPE,MAJ:MIN,FSTYPE,SIZE,RA,MOUNTPOINT,LABEL
blockdev --report
Examples:
lsblk --output NAME,KNAME,TYPE,MAJ:MIN,FSTYPE,SIZE,RA,MOUNTPOINT,LABEL
NAME          KNAME TYPE MAJ:MIN FSTYPE      SIZE   RA MOUNTPOINT LABEL
fd0           fd0   disk 2:0                   4K  128
sda           sda   disk 8:0                  80G 4096
├─sda1        sda1  part 8:1     xfs           1G 4096 /boot
└─sda2        sda2  part 8:2     LVM2_member  79G 4096
  ├─rhel-root dm-0  lvm  253:0   xfs          75G 4096 /
  └─rhel-swap dm-1  lvm  253:1   swap           4G 4096 [SWAP]
sdb           sdb   disk 8:16    xfs         100G 4096 /docker
sdc           sdc   disk 8:32                  2T  128
blockdev --report
RO    RA   SSZ   BSZ   StartSec            Size   Device
rw   256   512  4096          0            4096   /dev/fd0
rw  8192   512  4096          0     85899345920   /dev/sda
rw  8192   512   512       2048      1073741824   /dev/sda1
rw  8192   512  4096    2099200     84824555520   /dev/sda2
rw  8192   512   512          0    107374182400   /dev/sdb
rw  8192   512   512          0     80530636800   /dev/dm-0
rw  8192   512  4096          0      4290772992   /dev/dm-1
rw   256   512  4096          0   2148557389824   /dev/sdc
Looking at the RA and SSZ columns of the blockdev report, the readahead of 8192 sectors combined with the sector size of 512 bytes results in a readahead of 4096 KB, which matches the RA value that lsblk reports in KB. That means any read on the / root drive can pull up to 4 MB of disk I/O into the system cache. Best practice is to use a separate drive for the Cassandra data, as well as for the other StatefulSet services requiring disk space.
About this task
You can set the readahead manually on an existing drive or volume, or modify the tuned.service disk settings to make the readahead settings persistent. These steps need to be performed on each VM running Cassandra.
Procedure
Manually set readahead on an existing drive or volume.
- To set the readahead, use the blockdev command with the internal kernel device name (KNAME):
  - To find the KNAME of the device to modify, run the following command:
lsblk --output NAME,KNAME,TYPE,MAJ:MIN,FSTYPE,SIZE,RA,MOUNTPOINT,LABEL
In this example, the KNAME of the Cassandra data volume is dm-2. Your KNAME might be different based on your system settings:
NAME                  KNAME TYPE MAJ:MIN FSTYPE      SIZE   RA MOUNTPOINT          LABEL
fd0                   fd0   disk 2:0                   4K  128
sda                   sda   disk 8:0                  80G 4096
├─sda1                sda1  part 8:1     xfs           1G 4096 /boot
└─sda2                sda2  part 8:2     LVM2_member  79G 4096
  ├─rhel-root         dm-0  lvm  253:0   xfs          75G 4096 /
  └─rhel-swap         dm-1  lvm  253:1   swap           4G 4096 [SWAP]
sdb                   sdb   disk 8:16    xfs         100G 4096 /docker
sdc                   sdc   disk 8:32    LVM2_member   2T 4096
├─vg_sdc-lv_cassandra dm-2  lvm  253:2   xfs           2T 4096 /k8s/data/cassandra cassandra
  - Enter the blockdev command with --setra, giving the readahead as a number of 512-byte sectors (for example, a readahead of 16 sectors of 512 bytes results in an 8 KB readahead):
blockdev --setra 16 device
For example, for the dm-2 device:
blockdev --setra 16 /dev/dm-2
  - Verify the readahead settings:
lsblk --output NAME,KNAME,TYPE,MAJ:MIN,FSTYPE,SIZE,RA,MOUNTPOINT,LABEL
The RA value is now 8 for dm-2:
NAME                  KNAME TYPE MAJ:MIN FSTYPE      SIZE   RA MOUNTPOINT          LABEL
fd0                   fd0   disk 2:0                   4K  128
sda                   sda   disk 8:0                  80G 4096
├─sda1                sda1  part 8:1     xfs           1G 4096 /boot
└─sda2                sda2  part 8:2     LVM2_member  79G 4096
  ├─rhel-root         dm-0  lvm  253:0   xfs          75G 4096 /
  └─rhel-swap         dm-1  lvm  253:1   swap           4G 4096 [SWAP]
sdb                   sdb   disk 8:16    xfs         100G 4096 /docker
sdc                   sdc   disk 8:32    LVM2_member   2T 4096
├─vg_sdc-lv_cassandra dm-2  lvm  253:2   xfs           2T    8 /k8s/data/cassandra cassandra
- If you are modifying a running environment, restart the Cassandra Docker container to use the new readahead values.
You can restart the Cassandra Docker container as an IBM® Cloud Private admin, either through the IBM Cloud Private UI or kubectl.
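The unit conversion behind the manual steps can be sketched in shell. This is a minimal illustration rather than part of the documented procedure: the function names and the SYSFS_BLOCK variable are hypothetical, and the sketch assumes that blockdev counts readahead in 512-byte sectors while lsblk and /sys/block/KNAME/queue/read_ahead_kb report KB for whole disks and device-mapper volumes.

```shell
# Readahead helpers: blockdev counts 512-byte sectors, lsblk and sysfs report KB.

# Convert a blockdev readahead value (512-byte sectors) to kilobytes.
ra_sectors_to_kb() {
  echo $(( $1 * 512 / 1024 ))
}

# Convert a readahead size in kilobytes to the sector count blockdev expects.
ra_kb_to_sectors() {
  echo $(( $1 * 1024 / 512 ))
}

# Print, without running it, the blockdev command for a KNAME and a target size in KB.
setra_command() {
  echo "blockdev --setra $(ra_kb_to_sectors "$2") /dev/$1"
}

# Read the current readahead in KB for a whole disk or device-mapper volume.
# SYSFS_BLOCK is a hypothetical override for testing; it defaults to /sys/block.
current_ra_kb() {
  cat "${SYSFS_BLOCK:-/sys/block}/$1/queue/read_ahead_kb"
}

ra_sectors_to_kb 8192    # prints 4096
setra_command dm-2 8     # prints: blockdev --setra 16 /dev/dm-2
```

Printing the command instead of running it lets you review the sector count before applying it as root. After applying it, current_ra_kb dm-2 should report the same value that lsblk shows in the RA column.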
Modify tuned.service disk settings to make readahead persistent.
The tuned service adjusts configuration settings, such as disk device readahead, to optimize system performance. Tuned profiles overwrite the smaller readahead setting used in the LVM setup. To prevent the overwrite, add the setting to your tuned profile. For more information, see Performance tuning with tuned and tuned-adm in the Red Hat Performance Tuning Guide.
- Format a blank drive for the Cassandra data to be stored as described in Configuring the disk drives for services.
- Use the tuned-adm command to see the current active profile:
tuned-adm active
Current active profile: virtual-guest
In this example, the active profile is virtual-guest. Note: Your profile and configuration may be different.
- Copy the profile to the /etc/tuned directory.
The default profile definitions are stored in /usr/lib/tuned/. In this example, the definitions are in /usr/lib/tuned/virtual-guest/tuned.conf. The definitions for virtual-guest contain include=throughput-performance, which means that virtual-guest inherits the settings of throughput-performance. The /usr/lib/tuned/throughput-performance/tuned.conf file is where readahead=>4096 is set, so that is the profile to copy.
cp -a /usr/lib/tuned/throughput-performance/ /etc/tuned/
- Add the following section to the /etc/tuned/throughput-performance/tuned.conf file, making sure that it is above the existing [disk] section:
[disk-cassandra]
type=disk
devices=dm-2
readahead=8
- Reload the tuned profile:
tuned-adm profile virtual-guest
- Verify the new readahead setting:
lsblk --output NAME,KNAME,TYPE,MAJ:MIN,FSTYPE,SIZE,RA,MOUNTPOINT,LABEL
The RA value is 8 for dm-2 in this example:
NAME                  KNAME TYPE MAJ:MIN FSTYPE      SIZE   RA MOUNTPOINT          LABEL
fd0                   fd0   disk 2:0                   4K  128
sda                   sda   disk 8:0                  80G 4096
├─sda1                sda1  part 8:1     xfs           1G 4096 /boot
└─sda2                sda2  part 8:2     LVM2_member  79G 4096
  ├─rhel-root         dm-0  lvm  253:0   xfs          75G 4096 /
  └─rhel-swap         dm-1  lvm  253:1   swap           4G 4096 [SWAP]
sdb                   sdb   disk 8:16    xfs         100G 4096 /docker
sdc                   sdc   disk 8:32    LVM2_member   2T 4096
├─vg_sdc-lv_cassandra dm-2  lvm  253:2   xfs           2T    8 /k8s/data/cassandra cassandra
- If you are modifying a running environment, restart the Cassandra Docker container to use the new readahead values.
You can restart the Cassandra Docker container as an IBM Cloud Private admin, either through the IBM Cloud Private UI or kubectl.
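The tuned.conf edit in the persistent procedure can also be scripted. The following is a minimal sketch and not part of the documented procedure: the function name add_cassandra_disk_section and its arguments are hypothetical, and it assumes that the copied profile already contains a [disk] section, as throughput-performance does in this example. It inserts the [disk-cassandra] section above the first [disk] header and leaves the file unchanged if the section already exists.

```shell
# Insert a [disk-cassandra] section above the first [disk] section of a tuned.conf file.
# Arguments: path to tuned.conf, device KNAME, readahead in KB.
add_cassandra_disk_section() {
  local conf="$1" kname="$2" ra_kb="$3" tmp status

  # Do nothing if the section is already present, so the function can be re-run safely.
  if grep -q '^\[disk-cassandra\]' "$conf"; then
    return 0
  fi

  tmp="$(mktemp)" || return 1

  # Print the new section just before the first [disk] header, then every original line.
  if awk -v dev="$kname" -v ra="$ra_kb" '
       /^\[disk\]/ && !inserted {
         print "[disk-cassandra]"
         print "type=disk"
         print "devices=" dev
         print "readahead=" ra
         print ""
         inserted = 1
       }
       { print }
     ' "$conf" > "$tmp"; then
    # Write through the existing file so its ownership and permissions are kept.
    cat "$tmp" > "$conf"
    status=$?
  else
    status=1
  fi

  rm -f "$tmp"
  return "$status"
}
```

For the values used in this example, the call would be add_cassandra_disk_section /etc/tuned/throughput-performance/tuned.conf dm-2 8, run as root and followed by tuned-adm profile virtual-guest to reload the profile.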