Understanding the development of snapshot technology in one article

Invited columnist

2021-08-06 06:26


With the continuous development of computer and network technology, the level of informatization keeps rising. Since the start of the 21st century, often called the age of the information society, data-intensive applications such as digital communications, digital multimedia, e-commerce, search engines, digital libraries, weather forecasting, geological exploration, and scientific research have emerged, and information of all kinds is growing explosively. Storage has become the center of information computing. Applications keep demanding more of storage systems, and storage capacity keeps scaling up, from GB to TB, PB, and EB.

Turing Award winner Jim Gray proposed an empirical law: the amount of data produced every 18 months in a networked environment equals the total amount of data produced in all prior history. At the same time, modern enterprises have become heavily dependent on computers, and data has gradually become the basis of their survival. Hackers, viruses, hardware failures, and natural disasters such as fires and earthquakes can damage or even destroy systems and data, and if the data is not restored in time the enterprise suffers huge losses. Backup and disaster recovery technology is therefore particularly important. The catastrophic consequences of events such as 9/11 in particular have made people far more aware of the value of data and have drawn increasing attention to data protection.

Although computer technology has made great progress over the past 20 years, data backup technology has not kept pace. Backup operations are still costly and consume a lot of time and system resources, and the recovery time objective (RTO) and recovery point objective (RPO) achievable with backup are relatively long. Traditionally, important data has been protected with replication, backup, and recovery, with data backed up or replicated on a regular schedule. Because the backup process is time-consuming and degrades application performance, backups are usually scheduled for periods of light system load, such as at night. In addition, full and incremental backups are usually combined to save storage space.

Obviously, this approach has a significant deficiency: the backup window. During the backup, the business has to temporarily stop serving requests. As data volumes and growth rates accelerate, the window needs to be longer and longer, which is unacceptable for critical business systems. Institutions such as banks and telecom operators require 24x7 uninterrupted operation, and even brief downtime or the loss of a small amount of data causes huge losses. The backup window therefore has to be reduced as far as possible, ideally to zero. Snapshots and continuous data protection (CDP) are data protection technologies that emerged to meet this requirement.

What is a snapshot

A snapshot is an image of a data set at a specific moment, also known as an instant copy; it is a complete, usable copy of that data set. The Storage Networking Industry Association (SNIA) defines a snapshot as a fully usable copy of a defined collection of data that contains an image of the data as it appeared at the point in time at which the copy was initiated. A snapshot may be either a duplicate or a replicate of the data it represents.

Snapshots have a wide range of uses: as a source for backups, as a source for data mining, as a checkpoint that saves application state, or even as a simple means of data replication. There are also many ways to create them. According to SNIA's definitions, snapshot technologies fall into three main categories: split mirror, changed block, and concurrent. The latter two are usually implemented with pointer remapping and copy-on-write techniques. The flexibility of the changed-block method and its efficient use of storage space have made it the mainstream snapshot technology.

The first type is the split mirror. A mirror of the data is built before the instant copy is needed. Once a complete mirror is available, an instant copy is produced by "splitting" the mirror. The advantage of this technique is that it is fast and requires no extra work at the moment the snapshot is created. The disadvantages are equally clear. First, it is inflexible: snapshots cannot be taken at arbitrary times. Second, it requires a mirror volume with the same capacity as the data volume. Third, keeping the mirror continuously up to date affects the overall performance of the storage system.

The second type is the changed-block snapshot. After the snapshot is created, the source and target share the same physical copy of the data until data is written, at which point the source or the target is written to new storage space. The shared unit can be a block, a sector, or another level of granularity. To record and track block changes and copy information, a bitmap is required; it is used to locate the data that has actually been copied and to decide whether data should be read from the source or from the target.

The third type is the concurrent snapshot. It is very similar to the changed-block type, but it always physically copies the data. When the instant copy is taken, no data is copied; instead, a bitmap is created to record which data has been copied, and the actual physical copy is performed in the background.

Snapshot implementation at different storage levels


Figure 1 Storage system stack and snapshot implementation

The storage stack consists of a set of hardware and software components that provide physical storage media for application systems running on the host operating system, as shown in Figure 1. Snapshots can be implemented in many different ways, and can also be implemented at different levels in the storage stack. They can be roughly divided into two types: software layer and hardware layer, and can also be divided into controller-based snapshots and host-based snapshots.

Controller-based snapshots are implemented at the storage device layer or hardware layer, managed by the storage system hardware provider and integrated into the disk array. This snapshot is done at the LUN level (block level), independent of the operating system and file system. Host-based snapshots are implemented between the device driver and the file system level, usually performed by the file system, volume manager, or third-party software. This type of snapshot does not depend on storage hardware, but on file system and volume management software. This snapshot acts on the logical data view, unlike controller-based snapshots, which act on the physical data.

Among these storage levels, the physical storage layer and the volume manager are the two components best suited to implementing snapshots. They have direct access to physical storage and are currently the mainstream implementation levels. Implementing snapshots at the file system layer is also a viable option. However, applications such as databases that store their data directly on logical volumes cannot be covered by file-system-level snapshots, so they rely on volume-level snapshots instead. Generally speaking, there is no need to implement snapshots at the application layer. A backup mechanism can use the interfaces of the underlying file system or volume manager, although the application has to be paused briefly to guarantee the consistency of the snapshot data. In general, software-layer snapshots are easier to operate and offer better recoverability, while hardware-layer snapshots tend to offer higher performance and fault tolerance.

Snapshot Implementation Method and Technology

Snapshot technology produces a point-in-time image of data, and that image can be used for online backup. A full snapshot is a complete read-only copy of all the data. To reduce the storage space that snapshots occupy, copy-on-write (COW, Copy-On-Write) and redirect-on-write (ROW, Redirect-On-Write) snapshot technologies were proposed. There are also other implementations, such as logging and continuous data protection, which can improve snapshot performance.

1. Split mirror (Split Mirror)

Before the snapshot time point arrives, split-mirror snapshot technology creates and maintains a complete physical mirror volume for the source data volume, so that two copies of the same data are stored on the mirror pair formed by the source data volume and the mirror volume. When the snapshot time point arrives, mirroring is stopped and the mirror volume becomes the snapshot volume, which yields the data snapshot. Once the snapshot volume has served its purpose, such as a data backup, it is resynchronized with the source data volume and becomes a mirror volume again.

A source data volume that needs to keep several consecutive point-in-time snapshots at once must have several mirror volumes created for it in advance. When the first mirror volume is converted into a snapshot volume and used for backup, the second mirror volume is synchronized with the source data volume and forms a new mirror pair with it. The snapshot operation itself is very short: it takes only the time needed to break the mirror pair, usually a few milliseconds, and such a small backup window hardly affects upper-layer applications. However, this snapshot technology lacks flexibility and cannot create a snapshot of an arbitrary data volume at an arbitrary point in time. It also requires one or more mirror volumes with the same capacity as the source data volume, and mirror synchronization reduces the overall performance of the storage system.
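The split-mirror life cycle described above (mirror, split, resynchronize) can be sketched in a few lines of Python. This is only an illustrative model: a volume is a plain list of blocks, the names `MirroredVolume`, `split`, and `resync` are hypothetical, and tracking changed blocks so that resynchronization copies only those blocks is an assumption, not something the article specifies.

```python
class MirroredVolume:
    """Toy split-mirror model: a source volume plus one full-size mirror."""

    def __init__(self, blocks):
        self.source = list(blocks)
        self.mirror = list(blocks)   # full physical copy, same capacity as the source
        self.attached = True         # True while the mirror pair is in sync
        self.dirty = set()           # blocks changed while the mirror is split (assumption)

    def write(self, index, data):
        self.source[index] = data
        if self.attached:
            self.mirror[index] = data    # every write is doubled while mirroring
        else:
            self.dirty.add(index)

    def split(self):
        """Stop mirroring; the mirror becomes a frozen snapshot volume."""
        self.attached = False
        return self.mirror

    def resync(self):
        """Re-attach the mirror, copying the blocks changed since the split."""
        for index in self.dirty:
            self.mirror[index] = self.source[index]
        self.dirty.clear()
        self.attached = True


vol = MirroredVolume(["a0", "b0"])
vol.write(0, "a1")            # mirrored write
snapshot = vol.split()        # snapshot time point
vol.write(1, "b1")            # not visible in the snapshot
frozen = list(snapshot)       # what a backup job would read
vol.resync()                  # the mirror catches up and rejoins the pair
```

The sketch also makes the costs visible: the mirror is as large as the source, and every write is performed twice while the pair is attached.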

2. Copy-on-write (Copy-On-Write)

Figure 2 Copy-on-write snapshot

Copy-on-write snapshots use pre-allocated snapshot space. At the snapshot time point no physical data is copied; only the metadata describing the physical location of the original data is copied, so a snapshot can be created almost instantly. From then on, the snapshot tracks changes (writes) to the original volume. The first time a block of the original volume is updated, the original block is read and written to the snapshot volume, and only then is the original volume overwritten with the new data (Figure 2). Hence the name copy-on-write.

This technique creates a snapshot volume when the snapshot is taken, but only a relatively small amount of storage needs to be allocated to it, to hold the original contents of source blocks that are updated after the snapshot time point. Each source data volume has a data pointer table in which every record holds a pointer to the corresponding data block. When a snapshot is created, the storage subsystem makes a copy of the source volume's pointer table to serve as the snapshot volume's pointer table. Once the snapshot time point has passed, the snapshot is a logical copy that upper-layer applications can access, and the snapshot volume and the source data volume share the same physical data through their respective pointer tables. After that, whenever data in the source volume is about to be updated, copy-on-write is applied to preserve the integrity of the snapshot. A read of the snapshot volume looks up the data pointer table and uses the pointer for the requested block to find its physical location.

Copy-on-write ensures that the copy happens before the update, so data written after the snapshot time point never appears on the snapshot volume, which preserves the integrity of the snapshot. A copy-on-write snapshot occupies no storage resources and has no effect on system performance before the snapshot time point, and it is very flexible: a snapshot can be created for any data volume at any point in time. The "backup window" at the snapshot time point is linearly proportional to the capacity of the source data volume and is usually a few seconds, which has little impact on applications, while the storage space allocated to the snapshot volume is greatly reduced. Copying occurs only when the source data volume is updated, so the system overhead is small. However, because the snapshot volume stores only the original contents of blocks that have since been updated, this technique cannot produce a complete physical copy and is of no help to applications that need one. In addition, if the amount of updated data exceeds the reserved space, the snapshot becomes invalid.
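A minimal Python sketch may help make the copy-on-write mechanics concrete. It assumes a volume modeled as a list of blocks and uses hypothetical names (`CowVolume`, `snap_table`, `reserve_blocks`); the pointer table is reduced to a list of `(location, slot)` pairs, and the I/O counters exist only to show the cost of a first write. It is an illustration under these assumptions, not any vendor's implementation.

```python
class CowVolume:
    """Toy copy-on-write snapshot with a fixed amount of reserved snapshot space."""

    def __init__(self, blocks, reserve_blocks):
        self.source = list(blocks)          # physical blocks of the source volume
        self.reserve = []                   # pre-allocated snapshot space
        self.reserve_blocks = reserve_blocks
        self.snap_table = None              # snapshot pointer table (None: no valid snapshot)
        self.io = {"reads": 0, "writes": 0}

    def create_snapshot(self):
        # Only metadata is copied: every pointer refers to the shared source block.
        self.snap_table = [("source", i) for i in range(len(self.source))]
        self.reserve = []

    def write(self, index, data):
        table = self.snap_table
        if table is not None and table[index][0] == "source":
            if len(self.reserve) >= self.reserve_blocks:
                self.snap_table = None      # reserved space exhausted: snapshot is invalidated
            else:
                old = self.source[index]    # read the original block
                self.io["reads"] += 1
                self.reserve.append(old)    # write it to the snapshot space
                self.io["writes"] += 1
                table[index] = ("reserve", len(self.reserve) - 1)
        self.source[index] = data           # finally overwrite the source in place
        self.io["writes"] += 1

    def read_snapshot(self, index):
        if self.snap_table is None:
            raise RuntimeError("no valid snapshot")
        where, slot = self.snap_table[index]
        return self.source[slot] if where == "source" else self.reserve[slot]


vol = CowVolume(["a0", "b0", "c0"], reserve_blocks=1)
vol.create_snapshot()
vol.write(0, "a1")                                   # first update: 1 read + 2 writes
image = [vol.read_snapshot(i) for i in range(3)]     # still the point-in-time data
first_write_io = dict(vol.io)
vol.write(0, "a2")                                   # block already preserved: 1 write
vol.write(1, "b1")                                   # reserve is full: snapshot invalidated
```

Invalidating the snapshot once the reserve is full mirrors the limitation noted above; a real system might instead grow the reserve or refuse the write, which is a design choice outside the scope of this sketch.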

3. Pointer remapping (Redirect-On-Write)

Figure 3 pointer remapping snapshot

Pointer remapping, or redirect-on-write, is very similar to copy-on-write, except that the first write to a block of the original data volume is redirected to the reserved snapshot space. The snapshot maintains pointers to all source and copy data. When data is rewritten, a new location is chosen for the updated data and the pointer to that data is remapped to the new location. If the copy is read-only, the pointers to its data are never modified. Redirecting writes improves snapshot I/O performance: a single write puts the new data directly into the snapshot volume and updates the mapping pointer at the same time, whereas copy-on-write needs one read and two writes, that is, the original block is read and written to the snapshot volume, and then the updated data is written to the original volume.

As a result, the roles of the two volumes are reversed: the snapshot volume holds the current data of the original volume, while the original volume holds the point-in-time snapshot image. The data in the snapshot volume therefore has to be synchronized back to the original volume before the snapshot is deleted, and once multiple snapshots exist, accessing the original data, tracking data across the snapshot volumes and the original volume, and deleting snapshots all become extremely complicated. Furthermore, the snapshot copy depends on the original copy, and the original data set quickly becomes fragmented.
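The redirect-on-write behavior, including the merge that is needed before a snapshot can be deleted, can be sketched as follows. The model is deliberately simplified and all names (`RowVolume`, `redirected`, `delete_snapshot`) are hypothetical: a dictionary stands in for the remapped pointers and the reserved snapshot space, and only a single snapshot is supported.

```python
class RowVolume:
    """Toy redirect-on-write snapshot supporting a single snapshot."""

    def __init__(self, blocks):
        self.original = list(blocks)   # after a snapshot, this holds the frozen image
        self.redirected = {}           # block index -> new data in the snapshot space
        self.snapshot_active = False
        self.writes = 0                # physical writes performed

    def create_snapshot(self):
        self.snapshot_active = True

    def write(self, index, data):
        if self.snapshot_active:
            self.redirected[index] = data   # one write; the pointer now maps here
        else:
            self.original[index] = data
        self.writes += 1

    def read(self, index):
        """Read the live volume: follow the remapped pointer if there is one."""
        return self.redirected.get(index, self.original[index])

    def read_snapshot(self, index):
        """Read the point-in-time image, which stays on the original volume."""
        return self.original[index]

    def delete_snapshot(self):
        """Merge redirected blocks back into the original volume."""
        for index, data in self.redirected.items():
            self.original[index] = data
        self.redirected.clear()
        self.snapshot_active = False


vol = RowVolume(["a0", "b0"])
vol.create_snapshot()
vol.write(0, "a1")                                    # a single physical write
live = [vol.read(i) for i in range(2)]                # current data
frozen = [vol.read_snapshot(i) for i in range(2)]     # snapshot image
vol.delete_snapshot()                                 # merge before the snapshot goes away
```

Compared with the one read and two writes of copy-on-write, the write path here is a single write, but the cost reappears in `delete_snapshot`, which has to copy every redirected block back.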

4. Log-structured file architecture

This form of snapshot technology uses log files to record write operations on the original data volume. Every write to the original data volume is recorded in the log, which is equivalent to generating a snapshot for each data change. It therefore closely resembles database transaction logs or file system journals: data can be recovered from the log, or transactions can be rolled back to any reasonable state as needed. Strictly speaking, this method is not a snapshot, but it does achieve the goal of one. Many file systems implement this function, such as ZFS, JFS, EXT3, and NTFS.
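The idea that a write log is equivalent to a snapshot per change can be shown with a very small Python sketch. The function name `replay` and the log format (a list of `(block index, data)` pairs applied to a base image) are assumptions made for illustration only.

```python
def replay(base, log, upto):
    """Rebuild the volume state after the first `upto` logged writes."""
    state = list(base)
    for index, data in log[:upto]:
        state[index] = data
    return state


base = ["a0", "b0"]                        # volume contents when logging started
log = [(0, "a1"), (1, "b1"), (0, "a2")]    # every write, in order

before_any_write = replay(base, log, 0)
after_two_writes = replay(base, log, 2)
latest = replay(base, log, len(log))
```

Each prefix of the log corresponds to one recoverable state, which is why rolling back means replaying fewer entries.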

5. Clone snapshot (Copy-on-write with background copy)

Most of the snapshot technologies above do not produce a complete snapshot copy, so they cannot satisfy businesses that need a complete physical copy of the data. A clone snapshot produces a mirror snapshot identical to the source data volume by combining the advantages of copy-on-write and split-mirror snapshots. At the snapshot time point it first uses copy-on-write to create a snapshot copy quickly, and then starts a background process that copies data at block level from the source data volume to the snapshot volume. Once the copy is complete, the clone snapshot is obtained by splitting the mirror. Clone snapshots also inherit the disadvantages of split-mirror snapshots: besides requiring a snapshot volume equal in capacity to the source data volume, they affect the overall performance of the storage system to some extent.
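The combination of copy-on-write with a background copy can be sketched as below. The names (`CloneSnapshot`, `background_copy_step`) are hypothetical, the background process is modeled as a function called in a loop instead of a real thread, and a per-block flag list plays the role of the bitmap.

```python
class CloneSnapshot:
    """Toy clone snapshot: copy-on-write plus a block-by-block background copy."""

    def __init__(self, source):
        self.source = source                     # live volume (list of blocks)
        self.clone = [None] * len(source)        # full-size target volume
        self.copied = [False] * len(source)      # bitmap of blocks already copied

    def _copy(self, index):
        self.clone[index] = self.source[index]
        self.copied[index] = True

    def write_source(self, index, data):
        # Copy-on-write: preserve the point-in-time block before overwriting it.
        if not self.copied[index]:
            self._copy(index)
        self.source[index] = data

    def background_copy_step(self):
        """Copy one pending block; return False when nothing was left to copy."""
        for index, done in enumerate(self.copied):
            if not done:
                self._copy(index)
                return True
        return False

    def is_complete(self):
        return all(self.copied)


source = ["a0", "b0", "c0"]
snap = CloneSnapshot(source)
snap.write_source(2, "c1")            # foreground write during the background copy
while snap.background_copy_step():    # drain the background copy
    pass
```

When `is_complete()` returns true, the clone is a full physical copy of the point-in-time image and no longer depends on the source volume.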

6. Continuous data protection

The snapshot technologies above share a common disadvantage: they cannot create an arbitrary number of snapshots at arbitrary points in time. Log-based snapshots do not have this limitation, but they depend on a specific file system, cannot be applied directly to applications that use other file systems, and are of no help to applications whose data is not stored in a file system.

Continuous data protection, also known as continuous backup, automatically and continuously captures changes to the data blocks of the source volume and keeps a complete record of the versions of those blocks. Every block change is recorded, which effectively generates an instant snapshot per change, unlike other snapshot technologies that create snapshots only at chosen snapshot time points. Because all writes are recorded and kept, the data state at any point in time can be accessed dynamically, which provides fine-grained data recovery, enables near-instant recovery, and effectively shortens the recovery point objective. The advantages of block-level continuous data protection are loose coupling with applications, high performance and efficiency, continuous and uninterrupted system operation, and the absence of a snapshot window. Its disadvantage is the relatively large amount of storage space it requires, which is the fundamental reason block-level continuous data protection has not been widely adopted.
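A rough Python sketch of block-level continuous data protection is given below. It assumes integer timestamps that never decrease for a given block, and the names `CdpJournal`, `view_at`, and `stored_versions` are hypothetical. Every version of every block is kept, so any point in time can be reconstructed, at the price of storage that grows with every write.

```python
import bisect


class CdpJournal:
    """Toy block-level CDP store: every version of every block is retained."""

    def __init__(self, blocks):
        # Per block: parallel, time-ordered lists of timestamps and versions.
        self.times = [[0] for _ in blocks]
        self.versions = [[data] for data in blocks]

    def write(self, timestamp, index, data):
        # Assumption: timestamps never decrease for a given block.
        self.times[index].append(timestamp)
        self.versions[index].append(data)

    def view_at(self, timestamp):
        """Return the volume image as it was at `timestamp`."""
        if timestamp < 0:
            raise ValueError("timestamp precedes the initial image")
        image = []
        for times, versions in zip(self.times, self.versions):
            # Index of the last version written at or before `timestamp`.
            pos = bisect.bisect_right(times, timestamp) - 1
            image.append(versions[pos])
        return image

    def stored_versions(self):
        """Total number of block versions kept, a proxy for storage cost."""
        return sum(len(v) for v in self.versions)


cdp = CdpJournal(["a0", "b0"])
cdp.write(10, 0, "a1")
cdp.write(20, 1, "b1")
cdp.write(30, 0, "a2")

at_5 = cdp.view_at(5)
at_25 = cdp.view_at(25)
at_30 = cdp.view_at(30)
```

In this example two blocks already occupy five stored versions after three writes, which illustrates the storage overhead mentioned above.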

The following table analyzes and compares the above snapshot technologies from different perspectives.

Conclusion and Outlook

Snapshot technology is a major innovation over traditional data backup and replication. It solves the backup window problem, effectively shortens the recovery time objective and the recovery point objective, and has become a de facto standard in the storage industry.

Since snapshot technology was invented, it has been improved significantly. The snapshot window keeps shrinking, from a few seconds to an instant. Snapshots can be created at almost any time, at ever finer granularity and in ever greater numbers. Snapshot performance has improved greatly, and the impact on hosts and applications has become negligible. The flexibility, scalability, and manageability of snapshots keep improving. However, the demands placed on technology never end. Measured against current solutions, snapshot technology still has plenty of room to improve in overall performance, flexibility, and manageability, and the steady stream of new snapshot storage products and versions from storage vendors is the strongest evidence of this.

【References】

【1】Snapshot.

http://www.snia.org/education/dictionary/s/#snapshot

【2】Point in time copy.

http://www.snia.org.cn/dic.php?word=p

【3】Alain Azagury, Michael E. Factor, Julian Satran. Point-in-time copy: Yesterday, Today and Tomorrow[C]. College Park, USA: the 19th IEEE Symposium on Mass Storage Systems. 2002:259-270.

【4】Snapshot.

http://www.ibm.com/developerworks/tivoli/library/t-snaptsm1/index.html

【5】Yuan Xiaoming, Lin An. Analysis and comparison of several mainstream snapshot technologies. Microprocessor, No. 1, 2008.

【6】Wang Shupeng, Yun Xiaochun, Guo Li. A Review of the Development of Continuous Data Protection (CDP) Technology. Information Technology Letters, Volume 6, Issue 6, 2008.

【7】EMC TimeFinder.

http://china.emc.com/products/detail/software/timefinder.htm

【8】EMC TimeFinder.

http://china.emc.com/collateral/software/data-sheet/1700-timefinder.pdf

【9】HDS ShadowImage.

http://www.hds.com/cn/products/storage-software/shadowimage-in-system-replication.html

【10】NetApp Snapshot.

http://www.netapp.com/us/products/platform-os/snapshot.html

【11】Veritas Snapshot.

http://eval.symantec.com/mktginfo/enterprise/yellowbooks/using_local_copy_services_03_2006.en-us.pdf
