Setting up the lab environment – Deduplication

The next step for the lab or so-called home data center: Installing and Configuring Deduplication

I was going to use a USB stick for the Windows Server 2016 OS.
The main reason for this: DEDUPLICATION.

I did start out with a USB stick, but due to performance issues this was changed – read the follow-up post (https://blog.thomasmarcussen.com/follow-up-on-the-home-datacenter-hardware/)

The reason for having the OS on a separate volume: Deduplication is not supported on system or boot volumes. Read more about Deduplication here: About Data Deduplication
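
Before enabling anything, it can help to see which volumes are even candidates. A minimal sketch, assuming the data volume is NTFS and already has a drive letter (it simply filters out the system drive):

    # List NTFS volumes other than the system drive - only these can be
    # enabled for deduplication
    Get-Volume |
        Where-Object { $_.DriveLetter -and $_.FileSystem -eq 'NTFS' -and "$($_.DriveLetter):" -ne $env:SystemDrive } |
        Select-Object DriveLetter, FileSystemLabel, Size, SizeRemaining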

Let’s get started

Installing and Configuring Deduplication

  1. Open an elevated PowerShell prompt
  2. Execute: Import-Module ServerManager
  3. Execute: Add-WindowsFeature -Name FS-Data-Deduplication
  4. Execute: Import-Module Deduplication

Installing Deduplication
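
If you prefer to run the steps as one block, here is a minimal sketch of the same commands, with a check so the feature is only installed when it is missing:

    # Load the Server Manager cmdlets
    Import-Module ServerManager

    # Install the Data Deduplication role service if it is not already present
    if (-not (Get-WindowsFeature -Name FS-Data-Deduplication).Installed) {
        Add-WindowsFeature -Name FS-Data-Deduplication
    }

    # Load the Deduplication cmdlets
    Import-Module Deduplication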

Data Deduplication is now installed and ready for configuration.

My RAID 0 volume is D:
The volume will primarily hold virtual machines (Hyper-V)
I'm going to execute the following command: Enable-DedupVolume D: -UsageType HyperV

Enable Deduplication for volume
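
To confirm the volume picked up the right settings, a quick sketch using Get-DedupVolume (D: is my RAID 0 data volume; adjust the drive letter for your own setup):

    # Enable deduplication with the Hyper-V usage type
    Enable-DedupVolume -Volume D: -UsageType HyperV

    # Confirm the volume is enabled and shows the expected usage type
    Get-DedupVolume -Volume D: | Format-List Volume, Enabled, UsageType, MinimumFileAgeDays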

You can read more about the different usage types here: Understanding Data Deduplication

Some quick info for the usage type Hyper-V:

  • Background optimization
  • Default optimization policy:
    • Minimum file age = 3 days
    • Optimize in-use files = Yes
    • Optimize partial files = Yes
  • “Under-the-hood” tweaks for Hyper-V interop
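
These defaults can be inspected and overridden per volume; a small sketch below (the zero-day minimum file age is only an example of an override, not something this setup uses):

    # Show the optimization policy currently applied to D:
    Get-DedupVolume -Volume D: |
        Format-List MinimumFileAgeDays, OptimizeInUseFiles, OptimizePartialFiles

    # Example override: optimize files regardless of their age
    Set-DedupVolume -Volume D: -MinimumFileAgeDays 0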

You can start the optimization job manually and, if needed, limit the amount of memory the process consumes: Start-DedupJob -Volume "D:" -Type Optimization -Memory 50
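
A short sketch of kicking off the job and checking on it afterwards (the -Memory value is the maximum percentage of system memory the job may use):

    # Start a manual optimization job, capped at roughly half the system memory
    Start-DedupJob -Volume "D:" -Type Optimization -Memory 50

    # Check the progress of running deduplication jobs on the volume
    Get-DedupJob -Volume "D:"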

You can get the deduplication status with the command: Get-DedupStatus
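
If you want the savings in a more readable form, something like this works (a sketch that just rounds the SavedSpace figure reported by Get-DedupStatus):

    # Report saved space per deduplicated volume in GB
    Get-DedupStatus |
        Select-Object Volume, OptimizedFilesCount,
            @{ Name = 'SavedSpaceGB'; Expression = { [math]::Round($_.SavedSpace / 1GB, 2) } }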

The currently saved space on my volume is 46.17 GB
That is for two ISO files and a Windows Server 2016 reference machine, with the reference disks copied to a separate folder.

More useful PowerShell cmdlets here: Deduplication Cmdlets in Windows PowerShell

I do love deduplication, especially for virtual machines, since most of the underlying data is the same.
The disks are also rather expensive, so getting the most out of them is preferred.