Sunday, March 15, 2015

FITRIM/discard with qemu/kvm for thin provisioning

My notebook runs on an SSD, and I usually create logical volumes for the KVM VMs I install on it for testing purposes. On my normal file systems, I regularly run "fstrim" manually to tell the SSD firmware which blocks can be reused. The LVs of the virtual machines, however, usually stayed un-TRIMmed. I had heard that KVM/QEMU now supports discard commands, but had not yet gotten around to testing it.
I finally figured out how it works:

First, you need to switch the VM to using virtio-scsi instead of virtio-blk:

Before:
<disk type='block' device='disk'>
  <driver name='qemu' type='raw'/>
  <source dev='/dev/main/factory'/>
  <target dev='vda' bus='virtio'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
</disk>
After:
<disk type='block' device='disk'>
  <driver name='qemu' type='raw'/>
  <source dev='/dev/main/factory'/>
  <target dev='sda' bus='scsi'/>
  <address type='drive' controller='0' bus='0' target='0' unit='0'/>
</disk>
<controller type='scsi' index='0' model='virtio-scsi'/>
Note the added SCSI controller. The only things you need to change are "target" and "address"; it is fine if your source is different.
Now check that your VM still boots. If it does not, the initrd is probably missing the virtio-scsi driver. Reboot with the old configuration and build an initrd that includes all drivers, or at least the virtio-scsi driver. Another possible problem is the device name change from "/dev/vda1" to "/dev/sda1": check your fstab and use the UUID or filesystem label for booting. Neither problem occurred for me on a stock Factory install (it uses UUIDs by default and had all drivers in the initrd), but a hand-built kernel (built with "make localmodconfig"...) failed to boot, so be prepared.
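The fstab check can be done before switching the bus. The following is only a minimal sketch: the function name find_vd_entries is made up for illustration, and it looks only at the first field of each fstab line, so a "root=/dev/vda..." parameter in the bootloader configuration would still need a manual look.

```shell
# Print fstab lines whose device field is a virtio-blk name
# (/dev/vda, /dev/vdb1, ...). Such entries stop working once the
# disk shows up as /dev/sdX. The function name is hypothetical.
# $1: path to the fstab file (defaults to /etc/fstab)
find_vd_entries() {
    awk '$1 ~ /^\/dev\/vd[a-z]+[0-9]*$/' "${1:-/etc/fstab}"
}
```

Run inside the guest as "find_vd_entries /etc/fstab". No output means that no mount entry depends on the old device names.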

Now you are using virtio-scsi for your device, but fstrim will still give you an "operation not supported" message. You'll need another parameter in the "driver" element of your VM's disk configuration:
<driver name='qemu' type='raw' discard='unmap'/>
Restart the VM, and...
factory-vm:~ # fstrim -v /
/: 8,7 GiB (9374568448 bytes) trimmed
factory-vm:~ # 
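Instead of waiting for fstrim to fail, the guest can also be asked directly whether a disk advertises discard support. The kernel exports this in sysfs as queue/discard_max_bytes, where 0 means "no discard". A small sketch, assuming the disk is sda; the function name and the optional sysfs root argument are additions for illustration:

```shell
# Succeed if the block device advertises discard support.
# $1: device name without /dev/ (e.g. sda)
# $2: sysfs root, /sys by default (hypothetical extra argument,
#     only there so the check can be pointed at another directory)
supports_discard() {
    max_file="${2:-/sys}/block/$1/queue/discard_max_bytes"
    [ -r "$max_file" ] && [ "$(cat "$max_file")" -gt 0 ]
}
```

Inside the VM, "supports_discard sda && echo yes" should print "yes" only once discard='unmap' is active.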
Now what about thin-provisioning?
I converted the same VM from LV to a plain raw file.
This is the file on the host; it is sparse, so "ls" shows the full 20G while "du" shows only the space actually allocated:
susi:/local/libvirt-images # ls -lh factory.raw
-rw-r----- 1 qemu qemu 20G Mar 15 14:05 factory.raw
susi:/local/libvirt-images # du -sh factory.raw
12G     factory.raw
Now let's delete some stuff inside the VM and run fstrim:
factory-vm:~ # du -sh /home/seife/linux-2.6/
3.9G    /home/seife/linux-2.6/
factory-vm:~ # rm -rf /home/seife/linux-2.6/
factory-vm:~ # fstrim -v /
/: 12.7 GiB (13579157504 bytes) trimmed
Checking again on the host:
susi:/local/libvirt-images # ls -lh factory.raw
-rw-r----- 1 qemu qemu 20G Mar 15 14:08 factory.raw
susi:/local/libvirt-images # du -sh factory.raw
6.4G    factory.raw
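The difference between the "ls" and "du" numbers is the difference between the apparent file size and the blocks actually allocated. Both can be read in one go with GNU stat. This is just a sketch, the function name sparse_usage is made up, and it assumes GNU coreutils:

```shell
# Print "<apparent bytes> <allocated bytes>" for a file.
# GNU stat format: %s = apparent size, %b = allocated blocks,
# %B = size in bytes of each block counted by %b.
# The function name is hypothetical.
sparse_usage() {
    # Unquoted on purpose: split the three numbers into $1 $2 $3.
    set -- $(stat -c '%s %b %B' "$1")
    echo "$1 $(( $2 * $3 ))"
}
```

Running "sparse_usage factory.raw" before and after the fstrim should show the first number unchanged and the second one shrinking.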
So this is really neat: you can now free up space on the host after cleaning up in the VM. Maybe I should reconsider my "put all VMs into logical volumes" strategy, as it wastes quite a bit of valuable SSD space in my case.