Thursday, May 28, 2009

Improving XFS unlink() performance

I have known for quite some time that XFS has abysmal performance on huge deletes - e.g. deleting a Linux kernel source tree can take several minutes. But when I had to build multiple images with kiwi yesterday on my relatively new server machine at home, it started to really annoy me, so I invested some time to find out whether there is a way to improve the situation.
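If you want to reproduce that kind of measurement yourself, something along these lines will do; the kernel version and paths are of course just examples, not what I used:

# unpack a kernel source tree onto the XFS filesystem, then time the delete
tar xjf linux-2.6.29.tar.bz2 -C /space3
time rm -rf /space3/linux-2.6.29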
This is what I started from (numbers are from bonnie++):
/dev/sda1 /space3 xfs rw,noatime,nodiratime,noquota 0 0

Version 1.01d       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
server           4G 69024  31 71616  10 32732   6 75462  39 77511   6 120.9   0
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16   341   1 +++++ +++   311   1   340   2 +++++ +++   249   1


The first hint I got from a colleague was to use the "logbufs=8" mount option. This increases the number of in-memory buffers that XFS uses for its log. Whatever that means. On the first try it did not change anything, until I realized that a "mount -o remount,..." was not enough: I had to unmount and freshly mount the filesystem for the option to take effect.
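In case you want to try it yourself, the sequence is roughly this (device and mountpoint as in my setup above):

# a plain "mount -o remount,..." is not enough for this option
umount /space3
mount -o noatime,nodiratime,logbufs=8 /dev/sda1 /space3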
This is the result. It is still not too impressive with regard to delete performance; the difference might be purely statistical noise:
/dev/sda1 /space3 xfs rw,noatime,nodiratime,logbufs=8,noquota 0 0

bonnie++:
Version 1.01d       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
server           4G 66413  31 68420  10 32467   5 76001  40 77143   6 107.2   0
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16   338   2 +++++ +++   313   1   332   2 +++++ +++   269   1


But the really good results came only after also increasing the log buffer size with the "logbsize=262144" option (262144 bytes = 256kB, which is why it shows up as "logbsize=256k" in the mount output below):
/dev/sda1 /space3 xfs rw,noatime,nodiratime,logbufs=8,logbsize=256k,noquota 0 0

Version 1.01d       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
server           4G 77932  33 85196  10 32134   6 59205  32 76792   6 184.5   0
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16  1495   5 +++++ +++   848   2  1468   7 +++++ +++   916   3


Now that's a real improvement and I'll keep it like that.
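To keep the options across reboots, they go into /etc/fstab; this is roughly what the entry looks like with my device and mountpoint:

/dev/sda1  /space3  xfs  noatime,nodiratime,logbufs=8,logbsize=256k  0 0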

But beware, there are some caveats:
  • The filesystem has to support the larger log buffer sizes, which means you need a sufficiently new kernel. As I am running openSUSE 11.1 on this machine, this is not really a problem, but I had to enable the "version 2 log format" with "xfs_admin -j /dev/sda1" (see the command sketch after this list) because this partition was created under SLES10, and I am not sure whether I would still be able to mount it on an old machine.

  • It will use more memory. I guess that it needs at least 2 MB (256kB * 8 buffers) per filesystem, probably more. But on a box with 2GB of RAM, I'll gladly spare that for better performance.
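Checking and converting the log format looks roughly like this (device and mountpoint from my setup; note that xfs_admin only works on an unmounted filesystem):

# logbsize larger than 32k requires a version 2 log - check what we have
xfs_info /space3 | grep version
# convert to the version 2 log format, then remount with the new options
umount /space3
xfs_admin -j /dev/sda1
mount -o noatime,nodiratime,logbufs=8,logbsize=256k /dev/sda1 /space3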




One note: those "measurements" were taken on a Core2 Duo machine with an abit IP35-E mainboard. The disk used was my old Maxtor STM3500630A 500GB IDE drive. I did not do any statistical corrections etc., so your mileage may vary.

3 comments:

  1. Note that "noatime" implies "nodiratime".

  2. Thank you, this really helped.

    I was originally referring to the article posted here:

    http://everything2.com/index.pl?node_id=1479435 but I did not see a whole lot of improvement on my system. However, mounting with the logbsize option I managed to get almost a 10-fold improvement in both the delete and random create performance.

    I'm running tests on a 250GB SATA drive on an Ubuntu 8.04 64-bit server with a 2.6.24-19 kernel.

  3. [...] http://seife.kernalert.de/blog/2009/...k-performance/ [...]
