Thursday, January 5, 2012

Storage

Steve Glowacki, Director Systems Engineering at OU, and I have had several interesting conversations since the holiday break. Steve and his team ended up doing troubleshooting and problem resolution over the break, so I thought I would interview him for this blog post. This follows on from my earlier impacts post, as we talk about data and content and the need to store ever-larger files. Nothing is ever deleted, it seems. What do back-up and disaster planning look like in this growing data environment? We had a discussion about problems and tactics.

Steve, can you describe some of the challenges experienced over the break?
We noticed that the back-up amounts had basically quadrupled, starting just before the holiday. We went from approximately 500GB differential per night to usually slightly more than 2TB per night.

Did you find out the source and stop it?
Finding the source wasn't terribly difficult, but we couldn't stop it. It was valid stuff to back up. There were a lot of changes going on. Over the break, we allocated an additional 3 TB of storage to back-ups, which was everything that remained within the architecture, and added 10 additional LTO-4 tapes, which the system promptly consumed. We placed an emergency order for 20 additional tapes while still on break.
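As a back-of-the-envelope illustration of why the tapes disappeared so quickly, here is a minimal sketch. It uses the nightly figures Steve quoted and assumes the LTO-4 native (uncompressed) capacity of 800 GB per tape. It ignores compression, retention, and the extra 3 TB of disk, so treat the results as rough orders of magnitude rather than what the back-up system actually did.

```python
import math

# Figures quoted in the interview
OLD_NIGHTLY_GB = 500
NEW_NIGHTLY_GB = 2000  # "slightly more than 2TB", rounded down

# Assumption: LTO-4 native capacity, with no compression
TAPE_NATIVE_GB = 800


def growth_factor(old_gb, new_gb):
    """How many times larger the nightly differential became."""
    return new_gb / old_gb


def nights_of_runway(tape_count, nightly_gb, tape_gb=TAPE_NATIVE_GB):
    """Nights of back-ups a batch of empty tapes can absorb."""
    return (tape_count * tape_gb) / nightly_gb


def tapes_needed(nights, nightly_gb, tape_gb=TAPE_NATIVE_GB):
    """Whole tapes required to cover a number of nights."""
    return math.ceil(nights * nightly_gb / tape_gb)


print(growth_factor(OLD_NIGHTLY_GB, NEW_NIGHTLY_GB))   # 4.0
print(nights_of_runway(10, NEW_NIGHTLY_GB))            # 4.0 nights from 10 tapes
print(nights_of_runway(20, NEW_NIGHTLY_GB))            # 8.0 nights from 20 tapes
print(tapes_needed(14, NEW_NIGHTLY_GB))                # 35 tapes for two weeks
```

Under these assumptions, ten tapes cover only about four nights at the new rate, which is consistent with the system consuming them "promptly."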

What next steps did you take when you returned?
We started reviewing how to consolidate data across tapes to optimize tape utilization. As a tape is written to, it is consumed until it is full. Then as data ages off, portions of the tape are freed, but the physical tape still counts as consumed. Out of 130+ tapes, a good portion have a low percentage of physical utilization.

What we are looking for now is a process or procedure that will consolidate the data in use, distributed across several tapes, to a single highly utilized tape.
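To make the consolidation idea concrete, here is a minimal planning sketch. It is not the procedure Steve's team uses or any vendor's reclamation feature. It assumes we can get a report of live (unexpired) data per tape, treats each tape's live data as one indivisible unit, and packs the lightly used tapes onto as few target tapes as possible with a first-fit-decreasing heuristic. The tape IDs, the 800 GB capacity, and the 50% threshold are all hypothetical.

```python
def plan_consolidation(live_gb_by_tape, capacity_gb=800.0, threshold=0.5):
    """Group lightly used tapes so their live data fits on fewer tapes.

    live_gb_by_tape: dict of tape id -> GB of data still in use.
    Returns a list of targets, each with the source tapes to copy
    onto it and the total GB it would hold.
    """
    # Only tapes below the utilization threshold are worth moving.
    # Tapes with no live data are skipped: they can simply be reused.
    candidates = {
        tape: gb
        for tape, gb in live_gb_by_tape.items()
        if 0 < gb < threshold * capacity_gb
    }

    targets = []
    # First-fit decreasing: place the largest chunks first.
    for tape, gb in sorted(candidates.items(), key=lambda kv: kv[1], reverse=True):
        for target in targets:
            if target["used_gb"] + gb <= capacity_gb:
                target["sources"].append(tape)
                target["used_gb"] += gb
                break
        else:
            # No existing target has room, so open a new one.
            targets.append({"sources": [tape], "used_gb": gb})
    return targets


# Hypothetical utilization report
tapes = {"T001": 120, "T002": 700, "T003": 300, "T004": 80, "T005": 350, "T006": 0}
for target in plan_consolidation(tapes):
    print(target)
```

In this made-up report, four lightly used tapes collapse onto two, freeing two physical tapes. A real plan would also have to respect retention policies and whatever the back-up software allows.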

The other thing we are looking at is migrating to a completely new architecture for backups, restoration, disaster recovery and data de-duplication.

What are you looking at?
The architecture is based on VMware / NetApp, so the backup environment needs to work very closely with that architecture.


So, a minute ago, you said you were trying to review something and said:

"In some obscure way you log into this thing and control backups, restorations, bare metal restores, or tape archiving....  The tech team is having discussions about how this all works."

Tell me more.
What I found funny was that it took two university engineers and a vendor sales engineer three days to find the compatibility list for tape libraries.  It is complicated.

What makes it complicated?
Tape is becoming very limited in use.

What should we be doing?
Remote site de-duplicated replication is one option. The remote site may be here or in the cloud. We are breaking the work into two phases because of timing and the procurement process. Phase one is implementing solutions to work with the VMware / NetApp architecture. Phase two is likely the remote site capabilities. De-duplication, and how it is done technically, is extremely important to consider.

So several purchases have been made over the past couple days.
We purchased another shelf of high speed disks. This will allow us to optimize server performance and through-put. We looked at the total input/output operations per second (IOPS) across the two types of shelves we have, and made a decision to improve performance by going with smaller, faster disks for specific services. For example, a virtual server may perform better when it is located on faster disks, while its connected storage may be on slower disks.
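One simple way to think about what goes on which shelf is to rank workloads by how much I/O they generate per GB and fill the fast shelf with the densest ones first. The sketch below is only an illustration of that idea, not the analysis Steve's team ran. The workload names, sizes, IOPS figures, and shelf capacity are invented.

```python
def place_workloads(workloads, fast_capacity_gb):
    """Assign each workload to the "fast" or "slow" shelf.

    workloads: list of (name, size_gb, iops) tuples.
    Workloads with the most IOPS per GB get the fast shelf first,
    until it runs out of space.
    """
    ranked = sorted(workloads, key=lambda w: w[2] / w[1], reverse=True)
    placement = {}
    remaining_gb = fast_capacity_gb
    for name, size_gb, _iops in ranked:
        if size_gb <= remaining_gb:
            placement[name] = "fast"
            remaining_gb -= size_gb
        else:
            placement[name] = "slow"
    return placement


# Hypothetical workloads: (name, size in GB, observed IOPS)
workloads = [
    ("vm-os-disk", 100, 3000),
    ("file-share", 2000, 1500),
    ("database", 300, 6000),
]
print(place_workloads(workloads, fast_capacity_gb=500))
```

With these invented numbers, the virtual server's disk and the database land on the fast shelf, and the large file share stays on the slower disks.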

A lot of analysis about what to put where...
It's an ongoing thing.  

And your second purchase?
Several software options, driven by recent requests. One request was video streaming, so one of the options is for turning on native shares for CIFS. Another software request was for NFS, to allow for UNIX-based mounts. This has the potential to further improve throughput. Another is SnapVault, but it is just for swarming snapshots, and that takes us back to where we started this conversation.

Is your head spinning?
A bit.  It's my job, though.  A lot of conversations will be needed with the Network team too.