Trying out deduplication

I moved to DragonFly 2.10 over the past few days, and I tried out deduplication, to see what kind of results I would get.  The procedure is outlined below.  I’m using /home here as an example, just to reduce the amount of text pasted in.

/pfs/@@-1:00004     966000640 566434576 399566064    59%    /home

Move my various Hammer pseudo-file systems to version 5, which supports deduplication.

# hammer version-upgrade /home 5

Issue a deduplication simulate command, to see what it guesses will be the savings:

# hammer dedup-simulate /home
Dedup-simulate /home: objspace 8000000000000000:0000 7fffffffffffffff:ffff pfs_id 4
Dedup-simulate /home succeeded
Simulated dedup ratio = 1.22

That ratio turned out to be pretty accurate for the actual deduplication.  I didn’t time it, unfortunately.  I don’t know if the time taken is proportional to the amount of deduplication or the total volume of data, though I suspect the latter.

# hammer dedup /home
Dedup /home: objspace 8000000000000000:0000 7fffffffffffffff:ffff pfs_id 4
Dedup /home succeeded
Dedup ratio = 1.22
462 GB referenced
378 GB allocated
14 MB skipped
6869 CRC collisions
0 SHA collisions
0 bigblock underflows

The end result?

/pfs/@@-1:00004     966000640 505887504 460113136    52%    /home

That data space is shared across all file systems, and it’s a 1TB disk, so it’s 7%, or 70GB. I was hoping for more, but I don’t have any obviously duplicated data (no local mail store, no on-disk backups), so perhaps this is normal. 70GB that I didn’t have before is no bad thing, though.

Incidentally, I was able to upgrade my installed software from pkgsrc-2009Q4 to pkgsrc-2011Q1 entirely using pkg_radd -u <pkgname>.  Remarkably quick and painless, though pkgin may have been able to do it even faster since it would pull from the same place.

Posted by     Categories: DragonFly, Google Code-In, Hammer, pkgsrc     1 Comment
1 Comment on Trying out deduplication

Respond | Trackback

  1. […] won’t surprise you if you’ve seen previous deduplication links here, like my previous results or some real-world tests.  It’s still great.  I’d suggest turning it on if you […]