Paolo Cozzi edited this page Dec 16, 2015 · 5 revisions

Introduction

What is a snapshot

A full backup of a large data set may take a long time to complete. On multi-tasking or multi-user systems, there may be writes to that data while it is being backed up. This prevents the backup from being atomic and introduces a version skew that may result in data corruption. For example, if a user moves a file into a directory that has already been backed up, then that file would be completely missing on the backup media, since the backup operation had already taken place before the addition of the file. Version skew may also cause corruption with files which change their size or contents underfoot while being read [1].
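The missing-file scenario above can be reproduced with a small simulation. This is only an illustrative sketch: the directory names, the file names, and the function `backup_with_concurrent_move` are hypothetical, and the "concurrent" move is simulated inline rather than run in a separate process.

```python
def backup_with_concurrent_move(dirs):
    """Copy directories one at a time while a file is moved mid-backup."""
    backup = {}
    names = sorted(dirs)
    for i, name in enumerate(names):
        # Back up the current directory listing.
        backup[name] = list(dirs[name])
        if i == 0:
            # Simulated concurrent writer (assumption for this sketch):
            # once the first directory has been copied, a user moves
            # "report.txt" from the last directory into the first one.
            dirs[names[-1]].remove("report.txt")
            dirs[names[0]].append("report.txt")
    return backup


# Live data: two directories, backed up in alphabetical order.
live = {"a": ["notes.txt"], "b": ["report.txt"]}
backup = backup_with_concurrent_move(live)

# Collect every file name that made it onto the backup.
backed_up_files = {f for files in backup.values() for f in files}
print("report.txt" in live["a"])          # the file still exists in the live data
print("report.txt" in backed_up_files)    # but it is absent from the backup
```

The file is never lost from the live data, yet the backup contains no copy of it, because it was moved out of a directory that had not yet been read and into one that had already been copied.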

A simple solution is to temporarily disable write access to the data during the backup, by stopping the applications that access it, shutting down the system, or using a locking API to enforce read-only access. Such solutions require lengthy downtime of the services while the backup is running. To avoid downtime, high-availability systems may instead perform the backup on a snapshot (a read-only copy of the data set frozen at a point in time) and allow applications to continue writing to their data.

A snapshot records the state of a storage device at a given moment and preserves it as a reference for restoring the device if it fails. Point-in-time copies, or snapshots, are virtual or physical copies of data that capture the contents of a data set at a single instant. Both virtual (copy-on-write) and physical (full-copy) snapshots protect against corruption of the data's content. Additionally, full-copy snapshots can protect against physical destruction. These copies can be used for backups, as checkpoints to restore the state of an application, for data mining, as test data, and for other kinds of off-host processing. An important property of these copies is that, from the application's point of view, the copy appears to occur atomically: every update to the original data is applied either before or after the point-in-time copy. Typically, a snapshot is created instantly and made available to other applications, such as data protection, data analysis and reporting, and data replication applications. The original data remains available to applications without interruption while the snapshot copy is used to perform other functions on the data [2].
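To make the idea of a virtual (copy-on-write) snapshot more concrete, here is a minimal sketch in Python. It assumes a toy in-memory "volume" made of a fixed number of blocks; the class `CowVolume` and its methods are hypothetical names invented for this illustration and do not correspond to any real storage API.

```python
class CowVolume:
    """Toy fixed-size block device with copy-on-write snapshots."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        # One dict per snapshot: block index -> content preserved
        # from the moment the snapshot was taken.
        self.snapshots = []

    def snapshot(self):
        # Taking a snapshot copies no data, so it is effectively instant.
        self.snapshots.append({})
        return len(self.snapshots) - 1

    def write(self, index, data):
        # Copy-on-write: before overwriting a block, preserve its old
        # content for every snapshot that has not saved that block yet.
        for saved in self.snapshots:
            saved.setdefault(index, self.blocks[index])
        self.blocks[index] = data

    def read_snapshot(self, snap_id):
        # Preserved blocks come from the snapshot; blocks that were never
        # overwritten are still shared with the live volume.
        saved = self.snapshots[snap_id]
        return [saved.get(i, block) for i, block in enumerate(self.blocks)]


vol = CowVolume(["a", "b", "c"])
snap = vol.snapshot()          # point-in-time copy, nothing copied yet
vol.write(1, "B")              # the application keeps writing
frozen = vol.read_snapshot(snap)
print(frozen)      # state at snapshot time
print(vol.blocks)  # current live state
```

In this sketch the snapshot only stores blocks that changed after it was taken, which is why it depends on the original volume and, unlike a full copy, would not survive its physical destruction.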

Snapshot technology is increasingly used for data protection and for other tasks such as data mining and data cloning. Most leading storage hardware and software vendors provide snapshot support. Using snapshots for data protection offers critical business value, such as zero-impact backup with minimal or no application downtime, frequent backups (for example, hourly) to reduce recovery time, efficient backup of large volumes of data, reduced exposure to data loss, and instant recovery from snapshots [3].

References

[1]: https://en.wikipedia.org/wiki/Snapshot_(computer_storage)
[2]: http://www.veritas.com/community/blogs/storage-foundation-point-time-copy-or-snapshot
[3]: http://www.ibm.com/developerworks/tivoli/library/t-snaptsm1/index.html