Paolo Cozzi edited this page Dec 16, 2015 · 5 revisions

Introduction

What is a snapshot

A full backup of a large data set may take a long time to complete. On multi-tasking or multi-user systems, there may be writes to that data while it is being backed up. This prevents the backup from being atomic and introduces a version skew that may result in data corruption. For example, if a user moves a file into a directory that has already been backed up, then that file would be completely missing on the backup media, since the backup operation had already taken place before the addition of the file. Version skew may also cause corruption with files which change their size or contents underfoot while being read [1].
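The version-skew problem can be reproduced with a toy simulation. This is an illustration under simplifying assumptions, not a model of any real backup tool: the data set is a hypothetical dict mapping directory names to sets of file names, the backup copies one directory at a time, and a callback stands in for a concurrent user. The second run backs up a frozen copy taken before the backup starts, which is the idea behind snapshots.

```python
import copy


def backup(data, user_activity=None):
    """Copy one directory at a time (non-atomic), calling
    user_activity(step) after each directory to simulate concurrent writes."""
    result = {}
    for step, directory in enumerate(sorted(data)):
        result[directory] = set(data[directory])
        if user_activity is not None:
            user_activity(step)
    return result


def make_data():
    # Hypothetical data set: directory name -> set of file names.
    return {"a": {"notes.txt"}, "b": {"report.doc"}}


def move_report(live):
    """After directory "a" has been copied (step 0), move report.doc
    from "b" (not yet copied) into "a" (already copied)."""
    def activity(step):
        if step == 0:
            live["b"].discard("report.doc")
            live["a"].add("report.doc")
    return activity


# Backing up the live data: report.doc ends up in no directory at all.
live = make_data()
print(backup(live, move_report(live)))  # {'a': {'notes.txt'}, 'b': set()}

# Backing up a frozen point-in-time copy while the live data changes.
live = make_data()
frozen = copy.deepcopy(live)
print(backup(frozen, move_report(live)))  # {'a': {'notes.txt'}, 'b': {'report.doc'}}
```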

A simple solution is to temporarily disable write access to the data during the backup, by stopping the applications that access it, shutting down the system, or using a locking API to provide read-only access. Such solutions require lengthy service downtime while the backup is running. To avoid downtime, high-availability systems may instead perform the backup on a snapshot (a read-only copy of the data set frozen at a point in time) and allow applications to continue writing to their data.

Snapshotting is the ability to record the state of a storage device at any given moment and to preserve that record as a guide for restoring the device in the event that it fails. A snapshot primarily creates a point-in-time copy of the data. Point-in-time copies, or snapshots, are virtual or physical copies of data that capture the state of a data set's contents at a single instant. Both virtual (copy-on-write) and physical (full-copy) snapshots protect against corruption of the data's content; additionally, full-copy snapshots can protect against physical destruction. These copies can be used for backup, as a checkpoint to restore the state of an application, for data mining, as test data, and for other kinds of off-host processing. An important property of these copies is that, from the application's point of view, the copy appears to occur atomically: every update to the original data is applied either before or after the point-in-time copy. Typically, the snapshot copy is created instantly and made available to other applications such as data protection, data analysis and reporting, and data replication. The original copy of the data remains available to applications without interruption, while the snapshot copy is used to perform other functions on the data [2].
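A virtual (copy-on-write) snapshot can be sketched in a few lines. This is a minimal illustration under stated assumptions: the "device" is a Python list of blocks, only one snapshot is active at a time, and the class name `CowVolume` is hypothetical. Real implementations work on disk blocks and keep the preserved data on separate storage.

```python
class CowVolume:
    """Toy block device with a single copy-on-write snapshot."""

    def __init__(self, blocks):
        self.blocks = list(blocks)   # live data seen by applications
        self.preserved = None        # block index -> point-in-time contents

    def take_snapshot(self):
        # Instant: nothing is copied yet, we only start tracking writes.
        self.preserved = {}

    def write(self, index, data):
        # Before the first overwrite of a block, save its old contents
        # so the snapshot keeps seeing the point-in-time value.
        if self.preserved is not None and index not in self.preserved:
            self.preserved[index] = self.blocks[index]
        self.blocks[index] = data

    def read_snapshot(self, index):
        # Blocks never written since the snapshot are shared with the live volume.
        if self.preserved is None:
            raise RuntimeError("no active snapshot")
        return self.preserved.get(index, self.blocks[index])

    def snapshot_image(self):
        return [self.read_snapshot(i) for i in range(len(self.blocks))]


vol = CowVolume(["A0", "B0", "C0"])
vol.take_snapshot()
vol.write(1, "B1")   # applications keep writing after the snapshot
vol.write(1, "B2")   # second write: the old value is already preserved
print(vol.blocks)            # ['A0', 'B2', 'C0']  (live view)
print(vol.snapshot_image())  # ['A0', 'B0', 'C0']  (point-in-time view for the backup)
```

Only blocks that change after the snapshot consume extra space, which is why taking the snapshot is instant. It also shows why a virtual snapshot does not protect against physical destruction: unchanged blocks exist only once, on the live volume, whereas a full-copy snapshot would duplicate every block up front.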

Snapshot technology is becoming prevalent for data protection and for other tasks such as data mining and data cloning. Most leading storage hardware and software vendors provide snapshot support. Using snapshots for data protection offers critical business value, such as zero-impact backup with minimal or no application downtime, frequent backups (for example, hourly) to reduce recovery time, efficient backup of large volumes of data, reduced exposure to data loss, and instant recovery from a snapshot [3].

Snapshot in KVM

Snapshots in QEMU [4] are images that refer to an original (backing) image and use Redirect-on-Write [5] to avoid modifying it: after the snapshot is taken, new writes aimed at the original volume are redirected to a separate location set aside for the snapshot. The original location therefore keeps the point-in-time data of the guest, that is, the snapshot, while the changed data reside in the snapshot storage. While the snapshot is being taken, the QEMU Guest Agent [6] helps ensure that the disk is in a consistent state. The two main guest agent features of interest for live snapshots are:

  • File system freeze (fsfreeze/fsthaw): This puts the guest file systems into a consistent state, avoiding the need for fsck next time they are mounted.
  • Guest application notification: This allows guest applications to register and be notified before a snapshot is taken, so that they can flush their data to disk.
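The redirect-on-write behaviour described at the start of this section can be made concrete with a toy model. Assumptions: each image is a dict mapping block index to data, an overlay stores only the blocks written after it was created and falls back to its backing image for everything else (as a qcow2 overlay does), and the `Image` class and `commit` function are hypothetical names. `commit` illustrates merging an overlay back into its backing image; it is not QEMU's on-disk format or code.

```python
class Image:
    """Toy disk image with an optional backing image."""

    def __init__(self, blocks=None, backing=None):
        self.blocks = dict(blocks or {})
        self.backing = backing

    def read(self, index):
        image = self
        while image is not None:          # walk down the backing chain
            if index in image.blocks:
                return image.blocks[index]
            image = image.backing
        return None                       # unallocated block

    def write(self, index, data):
        # Redirect: new data lands in this (top) image only; the
        # backing image keeps its point-in-time contents.
        self.blocks[index] = data


def commit(top):
    """Merge the top image into its backing image and return the
    backing image, shortening the chain by one level."""
    base = top.backing
    base.blocks.update(top.blocks)
    return base


base = Image({0: "boot", 1: "data-v1"})
overlay = Image(backing=base)    # snapshot taken: the guest now writes here
overlay.write(1, "data-v2")

print(base.read(1))      # data-v1  (frozen copy, safe to back up)
print(overlay.read(1))   # data-v2  (what the guest sees)
print(overlay.read(0))   # boot     (unchanged blocks come from the base)

active = commit(overlay)       # after the backup, fold the changes back
print(active.read(1))    # data-v2
```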

Communication with the QEMU guest agent is performed via a virtio serial channel. Commands are sent over the channel encoded as QMP commands, and replies are encoded as QMP replies. There are plans to implement a passthrough mechanism for agent commands issued via QMP, which would make these commands accessible via the QMP monitor instead of an external agent socket on the host. Note that guest agent collaboration is also needed for snapshots taken with other methods, such as snapshots performed on btrfs, LVM, enterprise storage, etc. [7].

Once the snapshot has been taken, a backup can be made by copying the original image to another location. Once the backup is completed, the data in the snapshot storage must be merged back into the original volume before the snapshot is removed. This is done with virsh blockcommit, which reduces the length of a backing image chain by committing the changes at the top of the chain (snapshot or delta files) into the backing images [8]. These operations are performed as separate virsh steps, and all of them are necessary to end up with a functional image. Moreover, it is better to also dump the domain's XML configuration, so that the archived domain can easily be recreated.
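The virsh steps described above can be sketched as a small Python script. It is a hedged sketch, not a tested backup tool: the domain name `guest1`, the disk target `vda`, all paths and the snapshot name `backup-snap` are hypothetical, the guest agent is assumed to be running in the guest (required by `--quiesce`), and the script only prints the commands unless `dry_run=False` is passed. The flags are standard virsh options, but check them against the virsh manual for your libvirt version, since active blockcommit needs a reasonably recent libvirt and QEMU.

```python
import shlex
import subprocess

# Hypothetical values: adjust to your own domain and storage layout.
DOMAIN = "guest1"
DISK = "vda"
BASE = "/var/lib/libvirt/images/guest1.qcow2"
OVERLAY = "/var/lib/libvirt/images/guest1-overlay.qcow2"
BACKUP_DIR = "/backup"
XML_PATH = "/backup/guest1.xml"


def backup_plan(domain, disk, base, overlay, backup_dir):
    """Return the ordered virsh/shell commands for one live backup."""
    return [
        # Save the domain definition so the domain can be recreated.
        ["virsh", "dumpxml", domain],
        # Disk-only external snapshot: from now on guest writes go to
        # the overlay. --quiesce needs the guest agent (fsfreeze/thaw);
        # --no-metadata means libvirt keeps no snapshot record to delete.
        ["virsh", "snapshot-create-as", "--domain", domain,
         "--name", "backup-snap", "--disk-only", "--atomic",
         "--quiesce", "--no-metadata",
         "--diskspec", f"{disk},file={overlay}"],
        # The guest no longer writes to the base image: copy it.
        ["cp", base, backup_dir],
        # Merge the overlay back into the base image and pivot the
        # running guest back onto it.
        ["virsh", "blockcommit", domain, disk,
         "--active", "--verbose", "--pivot"],
        # The overlay is no longer referenced and can be deleted.
        ["rm", overlay],
    ]


def run(plan, xml_path, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in plan:
        print(shlex.join(cmd))
        if dry_run:
            continue
        # check=True stops the sequence at the first failing step.
        result = subprocess.run(cmd, check=True, capture_output=True, text=True)
        if cmd[:2] == ["virsh", "dumpxml"]:
            with open(xml_path, "w") as fh:
                fh.write(result.stdout)


if __name__ == "__main__":
    run(backup_plan(DOMAIN, DISK, BASE, OVERLAY, BACKUP_DIR), XML_PATH)
```

Stopping at the first failure is deliberate: if blockcommit fails, the guest is still running on the overlay, so the overlay must not be deleted until the problem has been investigated.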

References

[1]: https://en.wikipedia.org/wiki/Snapshot_(computer_storage)
[2]: http://www.veritas.com/community/blogs/storage-foundation-point-time-copy-or-snapshot
[3]: http://www.ibm.com/developerworks/tivoli/library/t-snaptsm1/index.html
[4]: http://wiki.qemu.org/Documentation/CreateSnapshot
[5]: http://www.ibm.com/developerworks/tivoli/library/t-snaptsm1/index.html
[6]: http://wiki.libvirt.org/page/Qemu_guest_agent
[7]: http://wiki.qemu.org/Features/Snapshots#Guest_Agent
[8]: http://linux.die.net/man/1/virsh