How we tested data deduplication
- 12 September, 2011 14:44
A Windows 2008R2 server was attached to a Compellent Storage Center S30 SAN with two volumes, both created from snapshots of a production server volume taken several months apart. NetBackup 7.0 was installed on the server.
Each appliance was installed, given its initial configuration, and attached to the server via a CIFS share, an OST share, or an iSCSI volume. A backup of the first volume was then completed. Where necessary, a manual deduplication was started as soon as the backup completed, or the deduplication schedule was set to run immediately after the backup. Time to complete the backup was recorded as the time to finish the backup plus the time to deduplicate.
The time to complete the backup was nearly identical across the four Ethernet-attached appliances, indicating that the backup system was fully saturating the link. The space used was calculated by taking the total space available before any backups and subtracting the space available after each backup. For example, if the total space available before any backups was 10.000TB, and after the first backup there was 9.438TB available, then the first backup used 562GB.
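The space-used calculation above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the tooling used in the test: the function name `cumulative_used_gb` is hypothetical, and decimal units (1TB = 1000GB) are assumed because that is what the 10.000TB example implies.

```python
from decimal import Decimal

# Assumption: decimal units, as implied by the 10.000TB -> 562GB example.
GB_PER_TB = 1000


def cumulative_used_gb(free_before_tb, free_after_tb_readings):
    """Return the total space used (in GB) after each backup.

    free_before_tb: free space before any backups, as a string in TB.
    free_after_tb_readings: free space after each successive backup, in TB.
    Strings are used so Decimal keeps the readings exact.
    """
    baseline = Decimal(free_before_tb)
    return [(baseline - Decimal(reading)) * GB_PER_TB
            for reading in free_after_tb_readings]


# The example from the text: 10.000TB free before, 9.438TB free after.
used = cumulative_used_gb("10.000", ["9.438"])
print(used[0])
```

Because every reading is subtracted from the same baseline, each result is the cumulative space used so far; the extra space consumed by a later backup is the difference between two consecutive results.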
The space used by the first backup was less than the actual size of the files, because there were duplicates among the files being backed up, though not a large number. The second backup was of the same volume as the first, and the total space used for both backups was generally around 1GB more than for the first one alone, indicating that deduplication was entirely effective: the only additional space used was for indices or pointers to the first backup. The third backup was mostly the same as the first, with about 32GB of new or changed files. Efficiency on the third backup was also very high, with less than 32GB of additional space used by the new files.
For the two on-line file appliances, testing was different, since the data was not being backed up. The first test consisted of copying the same volume used for backup testing to the appliance. Both appliances used less space than the actual data. An additional twelve 32GB .VMDK files (all Windows 2008R2 VMs) were then copied to the appliances. These VMDKs contained almost entirely the same data, but had different file names and time/date stamps. Finally, an additional twenty 560MB simulated user directories containing mostly the same files were copied to each appliance.
The Compellent SAN was not tested per se, but was used to illustrate another use of deduplication. A snapshot was taken of a production volume, and another was taken four months later. The two snapshots were both 589GB, but about 32GB of files had been added, deleted or modified between them. Each snapshot was then converted to a new volume and mounted on the test server. The three volumes (the original production volume and the two test volumes) each contained 589GB of files, so if each had been created separately, the total space used would have been 1767GB. Instead, the total space used for the production volume and the two snapshots was 621GB.
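The snapshot figures above work out as follows. This is a minimal sketch of the arithmetic only; the function name `snapshot_savings` and the ratio it reports are illustrative additions, not something the SAN itself outputs.

```python
def snapshot_savings(volume_gb, volume_count, actual_used_gb):
    """Compare full copies of a volume against the space actually used.

    Returns (space if each volume were a full copy, space saved, ratio of
    full-copy space to actual space), with sizes in GB.
    """
    full_copy_gb = volume_gb * volume_count
    saved_gb = full_copy_gb - actual_used_gb
    ratio = full_copy_gb / actual_used_gb
    return full_copy_gb, saved_gb, ratio


# Figures from the text: three 589GB volumes, 621GB actually used.
full_copy_gb, saved_gb, ratio = snapshot_savings(589, 3, 621)
print(full_copy_gb)  # 1767
print(saved_gb)      # 1146
print(round(ratio, 2))

# The 621GB total is one full 589GB volume plus the roughly 32GB of changes.
assert 589 + 32 == 621
```

In other words, the two snapshot-based volumes cost only the changed data, so the three volumes together use about 35 percent of what three separate copies would need.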
Read more about data center in Network World's Data Center section.