Site Recovery Manager – A Proven Test

Okay, so the key thing with any Disaster Recovery solution is to test it to make sure that it works. Unfortunately, most companies will not allow you to power down the production systems to confirm that Disaster Recovery will work, which means that we have to come up with other ways to test the environments as best we can.

The company that I work for, on the other hand, said that the only way to prove that the company can survive a disaster was to perform a real disaster recovery test, in which the virtual machines in production would be powered down (as if affected by a real disaster) and powered up in the recovery location.  The recovery location would then run the virtual machines for a period of one week before failing them back to the production site.  The chosen software to provide this solution was VMware Site Recovery Manager.

Our preparation time for this disaster recovery test was short, as it had to coincide with some other systems being recovered, so we decided to perform this first test by recovering roughly 60 virtual machines to our recovery site.  We began this test just over 1 week ago, with all 60 virtual machines being recovered to our recovery site in 1hr 40 minutes (including some IP address changes).  In reality the recovery itself took only around 1hr; the rest was a long delay whilst the storage LUNs were removed from the production environment.  In a real disaster, where the protected site no longer exists, these steps would be skipped.  This is a great improvement on our previous disaster recovery tests, where we were only able to recover roughly 50 virtual machines and it would take 4 to 8 hrs.
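To put those figures side by side, here is a quick back-of-the-envelope sketch in Python. The VM counts and durations are the rounded numbers quoted above ("roughly 60", "roughly 50", "around 1hr"), and the function name is just something made up for illustration, so treat the results as approximate rather than as measured benchmarks.

```python
def minutes_per_vm(vm_count, total_minutes):
    """Average recovery time per virtual machine, in minutes."""
    return total_minutes / vm_count

# Figures taken from the post (all rounded / approximate).
srm_measured = minutes_per_vm(60, 100)      # 60 VMs in 1hr 40 minutes
srm_without_lun_delay = minutes_per_vm(60, 60)  # same run, ignoring the LUN removal delay
old_best = minutes_per_vm(50, 4 * 60)       # previous tests, 4 hr case
old_worst = minutes_per_vm(50, 8 * 60)      # previous tests, 8 hr case

print(f"SRM (measured):          {srm_measured:.2f} min per VM")
print(f"SRM (without LUN delay): {srm_without_lun_delay:.2f} min per VM")
print(f"Previous tests:          {old_best:.2f} to {old_worst:.2f} min per VM")
print(f"Per-VM improvement:      {old_best / srm_measured:.2f}x to {old_worst / srm_measured:.2f}x")
```

On these rough numbers the SRM run works out at well under 2 minutes per virtual machine, against somewhere between about 5 and 10 minutes per virtual machine in the earlier tests.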

Site Recovery Manager worked well for this recovery, although we did come across an unusual situation that we believe is linked to using Ephemeral Ports on a vDS with Site Recovery Manager.  Randomly, some virtual machines would be recovered but would not be connected to the network when they were powered up at the recovery site.  VMware did not have a clear answer when we questioned them about it, but we believe that the number of available ports does not increase quickly enough to cope with the extra virtual machines being recovered, leaving those machines with their network ports disconnected.  We are currently testing a switch back to Static port binding (where, under v5 of ESXi, the ports will auto expand as required) to see if this resolves the issue.  I will provide an update on this once we have completed the testing.

This morning we performed the failback of the environment after utilising the ‘Reprotect’ functionality of the Site Recovery Manager software.  This again worked really well.  The same delay was experienced when removing the storage, and the same issue with NICs on virtual machines being disconnected occurred, but the recovery time was roughly the same as the original failover.

Site Recovery Manager is a very capable piece of software and performs really well.  My only gripe is around the cost of the solution.  If you have the opportunity, purchase it as part of the vCloud Suite (the Enterprise Edition has Site Recovery Manager included), as this proves to be more cost effective when running a high density virtual environment.  Purchasing Site Recovery Manager on its own is actually very expensive compared to other solutions, but it does work.

About the Author


I have been in IT for the past 15 years and have been using virtualisation technologies for around the past 8 years. I started, as quite a lot of people do, working with PCs after playing with such iconic systems as the ZX81 and ZX Spectrum, then progressing through 386s, 486s, Pentiums etc.

After being headhunted at sixth form to work for a small company based around Hertfordshire, UK, I began working with small businesses and gaining a lot of hardware experience. Three years later, after helping to increase the size of the business, I needed to gain exposure to a larger environment to progress my own career. I joined a large manufacturing company in Electronic Test and Measurement, which progressed my skills onto more PC work, hardware work and then onto Server Operating Systems. I progressed again onto a consultancy company based in Reading, UK. Initially working as an engineer performing hardware / software installations for larger companies contracted out to the consultancy company, I moved up into a Consultant position, continuing my travel across the UK assisting and providing solutions to companies.

I finally moved on again to my current position, working back in Hertfordshire, UK, again for a large manufacturing company, this time with over 50,000 users worldwide. I am responsible for the datacenter hardware, the storage environment, the VMware environment and also implementing their new Citrix XenApp farm. My days are busy but also productive. It's a friendly environment, and in my four years of being with the company I have seen many changes in the technology and infrastructure in use within the company.

About the site

I started this site as I had been thinking of having more of a presence on the web for a while. On a daily basis, I perform tasks and use tools that others may not use or may not think to use, and therefore I thought that I would share some of these experiences and tips with others to help with their day to day work.
Currently, my main focus of work is around VMware and Veeam Backup & Replication but hopefully as my tasks progress, I’ll be able to share useful bits of information about other areas of IT as well.
