Optimizing NetApp Block IO Performance


Here is my guide to getting the most out of your NetApp storage, written for the experienced storage professional. It focuses on optimizing performance within NetApp Data ONTAP itself, not on how you connect to it.

Thin provisioning: a common belief is that thin provisioning, whether on the volume or the LUN, will lead to performance issues due to fragmentation over time, and that it is therefore best practice to fully reserve your volumes, LUNs, and snapshot reserve. This belief is false. WAFL (Write Anywhere File Layout) will fragment sequential blocks over time regardless of reservations; environments with large files, such as virtual disks or large database files, are more prone to this performance impact. This is by design: NetApp writes anywhere on the aggregate in an effort to reduce the latency of writing data to disk. The problem is that over time the performance gains of sequential reads are lost as data becomes fragmented and read-ahead benefits disappear. Given how data is written, there is no reason to fully reserve your volumes and LUNs. Next, let's look at how to easily fix this and other performance issues.

First we need to configure a NetApp option for de-duplication. By default, when you enable sis on a volume the schedule is set to automatic, and a de-duplication run starts when a certain percentage of change (delta) accumulates on the volume. This is fine as long as multiple volumes don't decide to de-dupe at the same time. You could put each volume on a fixed schedule, but then you end up de-duping volumes that don't need it. Alternatively, you can limit the number of volumes that can de-dupe concurrently. Use the following command to set the limit:

options sis.max_active_ops 1
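If you want finer control instead, each volume's de-duplication schedule can be set individually, and you can check which de-dupe operations are currently running before starting more. A minimal sketch, assuming the 7-Mode CLI and a placeholder volume name (adjust the schedule window to suit your environment):

sis config -s sun-sat@23 /vol/volname

sis status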

Example: thin provisioning block storage to a Hyper-V host for a virtualized SQL Server. The VM will need four virtual disks:

  • 60GB: Operating System Disk
  • 400GB: SQL Data Disk
  • 200GB: SQL TLog Disk
  • 100GB: SQL TempDB Disk

The first step is to perform a sizing calculation. We are going to allocate a lot of white space on disk, but it won't be lost because everything is thin provisioned.

144GB = 60GB Virtual Disk + 60GB for Hyper-V Config and Hyper-V Snapshots + 24GB (20% Reserve)

480GB = 400GB Virtual Disk + 80GB (20% Reserve)

240GB = 200GB Virtual Disk + 40GB (20% Reserve)

120GB = 100GB Virtual Disk + 20GB (20% Reserve)

Adding it up, we will provision 984GB in LUN allocations; if we add our NetApp 20% snapshot reserve (about 197GB), we end up creating a thin provisioned volume that is roughly 1181GB in size.

Note: if we were building two SQL servers for database mirroring, we would put the LUNs for both virtual SQL servers in the same volume so that de-duplication could reclaim additional storage.
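Once de-duplication has been running, the space it reclaims can be confirmed per volume; a quick check, assuming the volume name created later in this example, would be:

df -s volSQL1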

To create the volume and LUNs and map them to a Hyper-V host initiator group called HV1, we would do the following:

vol create volSQL1 -l en_US -s none aggr0 1181GB

vol options volSQL1 no_atime_update on

vol options volSQL1 snapshot_clone_dependency on

vol options volSQL1 read_realloc space_optimized

snap autodelete volSQL1 on

snap sched volSQL1 0 0 0

sis on /vol/volSQL1

sis config -s auto /vol/volSQL1

reallocate measure /vol/volSQL1

qtree create /vol/volSQL1/qtree

lun create -s 144g -t hyper_v -o noreserve /vol/volSQL1/qtree/SQL1_OS.lun

lun create -s 480g -t hyper_v -o noreserve /vol/volSQL1/qtree/SQL1_DATA.lun

lun create -s 240g -t hyper_v -o noreserve /vol/volSQL1/qtree/SQL1_TLOG.lun

lun create -s 120g -t hyper_v -o noreserve /vol/volSQL1/qtree/SQL1_TEMPDB.lun

lun map /vol/volSQL1/qtree/SQL1_OS.lun HV1

lun map /vol/volSQL1/qtree/SQL1_DATA.lun HV1

lun map /vol/volSQL1/qtree/SQL1_TLOG.lun HV1

lun map /vol/volSQL1/qtree/SQL1_TEMPDB.lun HV1

The above script does the following tasks:

  • Create a new thin provisioned volume in aggr0
  • Set the volume to not record last-access timestamps on the LUNs within the volume
  • Set the volume to allow deleting snapshots even if a busy snapshot is present
  • Set the volume to perform read reallocation (see below for more detail)
  • Set the volume to automatically delete snapshots if it is running out of space
  • Set the volume to take no scheduled snapshots, as snapshots should be taken with a NetApp SnapManager product
  • Enable de-duplication on the volume
  • Set de-duplication on the volume to the automatic schedule
  • Enable reallocation measurement on the volume
  • Create a qtree within the volume
  • Create multiple thin provisioned LUNs with a host type of hyper_v
  • Map the LUNs to an initiator group named HV1
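One assumption worth calling out: the lun map commands require that an initiator group named HV1 already exists on the controller. A minimal sketch of creating one for an iSCSI Hyper-V host, assuming your Data ONTAP version supports the hyper_v ostype (the IQN below is a placeholder for your host's actual initiator name):

igroup create -i -t hyper_v HV1 iqn.1991-05.com.microsoft:hv1.contoso.local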

To resolve the fragmentation created over time by WAFL we use reallocation (defragmentation). There are two ways to reallocate:

  1. Manual reallocation: if you have never performed a reallocation on an existing volume, you may want to run a single pass of a manual reallocation task (see the sketch after this list). Note that to run a manual reallocation you must remove your snapshots from the volume. This may take some time to complete and is an expensive operation on system resources.
  2. Read reallocation: as blocks are read, the NetApp automatically examines them and optimizes the data layout by moving the blocks to an optimized location on disk (a small, incremental defragmentation).
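A minimal sketch of that one-time manual pass on the example volume, using the -f -p form mentioned in the comments below (which, on recent Data ONTAP releases, also avoids having to delete snapshots first):

reallocate start -f -p /vol/volSQL1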

This is interesting: data is written to WAFL in a fragmented manner for write performance, yet when your application reads that data it automatically gets optimized for read performance. This has another advantage: if your IO pattern changes, the layout automatically adjusts to the new pattern. For example, if a SQL table was re-indexed/reorganized on disk, or the access pattern was changed by a modification to a stored procedure, the NetApp's read reallocation would automatically detect the change in the IO pattern and re-optimize the layout on disk.

Reallocation measurement was turned on so that you can check the state of fragmentation on your volumes with 'reallocate status'.
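For example, after the volume has been in service for a while, a quick check of its current layout might look like this (the optional -v flag gives verbose output):

reallocate status -v /vol/volSQL1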

Using this design you will obtain the cost benefits of thin provisioning and de-duplication while automatically maintaining optimal performance over time.


8 thoughts on “Optimizing NetApp Block IO Performance”

  1. Read reallocation, nice theory. But what happens when a backup agent takes an incremental backup during the week and a full backup during the weekend, on LUNs where it's not possible to take a consistent snapshot? I think the IO pattern will change a couple of times then? Not very interesting for the read_realloc functionality, I suppose?

  2. In that scenario I would recommend turning off read reallocation during your backup if that is a concern.

    But it's important to note that read reallocation isn't forced on a single pass. For example, one ad hoc single-pass read won't cause the data to get reallocated unless that is the only read on that block.

    Consider a scenario where you have an OLTP data warehouse storing 5 years of data on a LUN, and you have reports that run throughout the day on the last year of data. The block read patterns for the last year of data would get optimized for the read patterns of the report queries. If you then perform a daily full backup of that database, the block layout for the last year would remain unchanged, but data older than a year would get optimized for backup, because only the backup reads it. In this way you get the best of both: one year of data optimized for application reads and the rest optimized for backup reads. An additional benefit is that as data ages you don't have to manage the read patterns; over time they are optimized between application and backup read patterns automatically.

    • Thanks for the feedback. For us it isn't useful however, since we have a SyncMirror setup and then read_realloc isn't possible, I think (just like aggregate reallocation and volume reallocation with -p aren't possible). Very unpleasant to find out. Nobody tells you that when they promote SyncMirror configs!

  3. I don't run this in my environment, but I'm thinking I should after reading this blog. I don't know of a lot of other folks running this either (I've asked around).

    How are other folks doing this? Do you turn it on on everything or just particular volumes for specific workloads?

    Should you just leave it on or run it manually and turn it off?

    Trying to understand how I would implement this in a large environment.

    Thanks

  4. It depends on your environment. Start with a one-time reallocation measurement on the volumes. The results will tell you if (scheduled) reallocation could be interesting.

  5. Good article, but as this stuff is not very well documented there are a few finer points I thought I’d add.

    When accepting writes, Data ONTAP collects data from a given point in time and writes it out as a batch to disk. So data written at about the same point in time will be next to each other on disk. For fully sequential workloads, writes will be next to each other on disk, and subsequent reads of them will be fast, requiring few disk seeks. For fully random workloads the on-disk placement will have temporal locality, meaning data written around the same time will be near each other on disk. So in the worst case a fully random workload will still be fully random on disk; in a better case, where related data is written randomly and then read together later, it might perform better. The one workload that is not natively handled by Data ONTAP is random write followed by sequential read. In that case the sequential reads are going to force random reads at the disk layer. Typical workloads that do sequential reads after random writes are traditional streaming backups, database verification jobs, and database batch jobs.

    The way the read_realloc volume option works is whenever Data ONTAP notices that a sequential read from the client was not sequential on disk it re-writes that data out. The next time a sequential read comes in for that data it will also be sequential on disk.

    If you use the reallocate command to do it manually Data ONTAP will rewrite anything that isn’t sequential on disk. Because it’s running as a batch job it has to scan all the data, read it in, write it out. It may also be rewriting stuff that is never read sequentially; i.e. work that yields no benefit. Sometimes we have data that is never read sequentially, in which case who cares if it is sequential on disk!

    The read_realloc volume option, by contrast, just re-writes data it already has in memory and knows is read sequentially, so to me the better technique in 99% of cases will be read_realloc. The only downside is some additional write workload during normal operations (as opposed to within a scheduled window of low activity, as you could do with the reallocate command).

    You should use the volume option read_realloc or the reallocate command, not both. If you have an existing volume you might do a manual reallocate (reallocate start -f -p /vol/volname) as a one-time fix-up and then enable the volume option to keep it that way. If you are using the read_realloc volume option there is also no point in enabling reallocate scans as in the steps above.

    There is also a comment that a manual reallocate requires deleting snapshots; this is not true if you use the -p option (which has been around since at least Data ONTAP 7.3). Also, as of 8.1.1 you can use the reallocate -p option and the read_realloc volume option on SyncMirror / MetroCluster configs.

    Hope this helps!

    Christopher Madden
    Storage Architect, NetApp EMEA

  6. Just wanted to point out that you would never want to dedupe both the principal and mirror copies of the same SQL Server database. The whole point of database mirroring is to protect your data in case the primary copy becomes unavailable or corrupt. Having both on the same array is asking for trouble.
