3. A collection is the set of files (or virtual tapes) and logical filesystem snapshots under which the
system identifies and eliminates duplicates inline, before writing compressed deduplicated segments to
physical disk. Segments are unique within the collection (not counting specific duplicates to enable
self-correction or recovery). A Data Domain system has a single collection that is stored in a log of
segment locality containers. For more about segment localities, see the white paper EMC Data Domain
SISL Scaling Architecture.
4. These collection containers layer over RAID-enabled disk drive blocks. Data Domain deduplication
storage systems use Data Domain RAID 6 internal disk and storage expansion shelves.
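The collection concept in item 3 can be illustrated with a minimal sketch. It assumes fixed-size segments, SHA-256 fingerprints, and zlib compression purely for illustration; none of these choices is taken from this paper, and the class and method names are hypothetical.

```python
import hashlib
import zlib


class Collection:
    """Toy model of a collection: unique segments kept in an append-only log."""

    def __init__(self, segment_size=8):
        self.segment_size = segment_size  # fixed-size segmenting is an assumption
        self.index = {}       # fingerprint -> position in the container log
        self.containers = []  # append-only log of compressed unique segments
        self.files = {}       # file name -> ordered list of fingerprints

    def write(self, name, data):
        """Deduplicate inline: only segments new to the collection reach the log."""
        recipe = []
        new_segments = 0
        for start in range(0, len(data), self.segment_size):
            segment = data[start:start + self.segment_size]
            fp = hashlib.sha256(segment).hexdigest()
            if fp not in self.index:
                # First time this segment is seen: compress and append it.
                self.index[fp] = len(self.containers)
                self.containers.append(zlib.compress(segment))
                new_segments += 1
            recipe.append(fp)
        self.files[name] = recipe
        return new_segments

    def read(self, name):
        """Rebuild a file from its list of fingerprints."""
        return b"".join(
            zlib.decompress(self.containers[self.index[fp]])
            for fp in self.files[name]
        )
```

Writing a second file that shares most of its content with the first adds only the segments that differ, which is the sense in which segments are unique within the collection.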
Leveraging logical storage layers to meet different replication requirements
Data Domain Replicator software offers three replication policies that leverage these different logical levels
of the system for different effects. With any of the three policies, the same applications that use data on
the originator can use it on the replica as soon as it arrives. This is critically important for optimizing
the utility of the replica for DR and archive access.
At a high level, the three replication policies are:
• Directory replication transfers deduplicated changes of any file or subdirectory within a Data Domain
filesystem directory that has been configured as a replication source to a directory configured as a
replication target on a different system. Directory replication offers the most flexible replication
topologies, including system mirroring, bi-directional, many-to-one, one-to-many, and cascaded,
enabling efficient cross-site deduplication. With the VTL namespace, specified pools of virtual
cartridges may be treated as a directory with respect to this policy.
• NetBackup optimized duplication is an application-specific variant of directory replication
mechanisms integrated with Symantec's NetBackup OpenStorage option. Optimized duplication
directly transfers one backup image at a time on request from NetBackup. NetBackup keeps track of
all copies, allowing easy monitoring of replication status and recovery from replica copies. In the Data
Domain implementation, this leverages the method's underlying directory replication, yielding the
same cross-site deduplication effects and flexible network deployment topologies.
• Collection replication performs whole-system mirroring in a one-to-one topology, continuously
transferring changes in the underlying collection, including all of the logical directories and files of the
Data Domain filesystem. Directory and collection replication modes support multi-protocol and
multi-application access to the data. While collection replication does not offer the flexibility of the
other two policies, it is very simple and lightweight, so it can support higher throughput and more
objects with less overhead, which makes it ideal in high-scale enterprise cases.
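The cross-site deduplication effect mentioned for directory replication can be sketched in a many-to-one topology. This is a generic illustration of sending only the segments a destination lacks, not the Data Domain wire protocol; the `Site` class, the `replicate` function, and the SHA-256 fingerprints are all assumptions made for the example.

```python
import hashlib


def fingerprint(segment: bytes) -> str:
    # SHA-256 is an assumed fingerprint function for this sketch.
    return hashlib.sha256(segment).hexdigest()


class Site:
    """Hypothetical system holding deduplicated segments keyed by fingerprint."""

    def __init__(self, name):
        self.name = name
        self.segments = {}  # fingerprint -> segment bytes

    def store(self, segments):
        for segment in segments:
            self.segments[fingerprint(segment)] = segment

    def missing(self, fingerprints):
        """Return the fingerprints this site does not hold yet."""
        return [fp for fp in fingerprints if fp not in self.segments]


def replicate(source, destination, fingerprints):
    """Transfer only segments the destination lacks; return how many were sent."""
    needed = destination.missing(fingerprints)
    for fp in needed:
        destination.segments[fp] = source.segments[fp]
    return len(needed)


# Two branch systems replicate to one hub (many-to-one).
hub, branch_a, branch_b = Site("hub"), Site("a"), Site("b")
branch_a.store([b"os-image", b"db-a"])
branch_b.store([b"os-image", b"db-b"])
sent_a = replicate(branch_a, hub, list(branch_a.segments))
sent_b = replicate(branch_b, hub, list(branch_b.segments))
```

In this sketch the second branch sends one segment instead of two, because the hub already received the shared segment from the first branch.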
By design, DD OS is tuned for backup, archive, and other nearline applications. A backup typically includes
all of the changes made over the previous 24 hours; synchronous replication of backups of such hours-old
data is therefore overkill. Replication must be timely, but it is more important to handle small, flaky
networks and to recover gracefully with very high data integrity and resilience. For this reason, all
replication approaches operate asynchronously. A detailed examination of each replication policy follows.
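The reasoning for asynchronous operation can be made concrete with a small sketch: local writes complete immediately, and queued changes are shipped later, with a failed transfer simply retried. The `AsyncReplicator` class, its retry budget, and the use of `ConnectionError` to signal a dropped link are hypothetical and are not a description of how DD OS implements replication.

```python
from collections import deque


class AsyncReplicator:
    """Toy asynchronous replicator: writes never wait on the network."""

    def __init__(self, send):
        # `send` is an assumed callable that raises ConnectionError on failure.
        self.send = send
        self.pending = deque()

    def write(self, change):
        """The local write completes at once; replication is only queued."""
        self.pending.append(change)

    def drain(self, max_attempts=10):
        """Ship queued changes in order; return how many remain queued."""
        attempts = 0
        while self.pending and attempts < max_attempts:
            attempts += 1
            try:
                self.send(self.pending[0])
            except ConnectionError:
                continue  # the change stays at the head of the queue and is retried
            self.pending.popleft()
        return len(self.pending)
```

Because a change leaves the queue only after a successful send, an unreliable link delays replication but does not lose or reorder changes in this model.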
Directory replication
With directory replication, a directory (and all files and directories below it) on a source system replicates
to a destination directory on a different system, as seen in Figure 2. The destination directory will be
read-only and it can coexist on the same system with other replication destination directories, replication
source directories, and other local directories, all of which will share deduplication in that system's
collection. As a result, directory replication offers a wide variety of topologies: simple system mirroring,
bi-directional, many-to-one, one-to-many, and cascading.
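As a rough sketch of how these topologies compose, each replication relationship can be modeled as a source and destination pair of (system, directory) tuples. The `Pair` class, the `read_only_dirs` helper, and the system names and paths are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pair:
    """One replication relationship between directories on two systems."""
    source: tuple       # (system, directory)
    destination: tuple  # (system, directory)

    def __post_init__(self):
        # Directory replication targets a directory on a different system.
        if self.source[0] == self.destination[0]:
            raise ValueError("source and destination must be on different systems")


def read_only_dirs(pairs, system):
    """Directories on `system` that are replication destinations, hence read-only."""
    return {p.destination[1] for p in pairs if p.destination[0] == system}


pairs = [
    Pair(("A", "/backup/a"), ("B", "/replica/a")),   # A replicates to B
    Pair(("B", "/backup/b"), ("A", "/replica/b")),   # B replicates to A (bi-directional)
    Pair(("B", "/replica/a"), ("C", "/replica/a")),  # B forwards to C (cascaded)
]
```

Here system B holds a read-only destination, a local source, and a cascade source side by side, which mirrors the coexistence described above.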
With directory replication, a replication context pairs a source directory (specified as a pathname) with a
destination directory on a different system. With DD OS version 4.8, an EMC Data Domain Global