vSphere & Storage DR without SRM – Part 1: A Thought Process

I’m hoping to offer one possible answer to a problem that some people may come across in their environments: creating a workable Disaster Recovery solution without utilising SRM. The scenario I’m working through, however, is a little more complex than usual.

In a normal vSphere environment there are a number of ways to provide a Disaster Recovery solution, many of which I have used myself. With two data centres, you could utilise Site Recovery Manager for your DR option (as long as you have two vCenter servers as well). Another option is to utilise vSphere Replication on its own, which provides a level of Disaster Recovery with manual failover. Other options include tools such as Veeam Backup & Replication, which provide a replica of the virtual machines in the second site.
The scenario I’m investigating is one where array replication is already taking place between two data centres, and where the data centres are configured in vCenter as a metro cluster. There are three constraints. There is no desire to introduce a second vCenter server. There is no desire to switch away from the array replication process, as this is more efficient than other solutions. Finally, there is no desire to have a number of additional virtual machines showing in the environment, as happens with other forms of replication. Please note that although the storage is replicated, the LUNs are only actively presented to a single data centre at a time, and therefore a metro cluster failover is not an option.
As a side note, when it comes to the actual commands, they will be written for an IBM XIV environment.

So my plan is to create a process (which may utilise several scripts) that allows the environment to perform a test failover to confirm operation (and a failback as well), and that could also be run in a real disaster scenario.

With this in mind, we have to think about the obvious disconnect between the storage replication and the virtual machines themselves. Knowing where the virtual machine files reside (well, at least the vmx file), including the datastore name and path, is going to be a requirement for almost any solution in this type of scenario. So I think we should have a script that regularly extracts from the vCenter environment a list of the virtual machines, their datastores and their paths.
We may also want to take a regular export of the LUN ID mappings on the storage. This is all data that may prove useful later on.
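To make the inventory idea a little more concrete, here is a minimal Python sketch of what the extract could record. It assumes the vmx location has already been pulled from vCenter as a datastore path string of the form `[datastore] folder/vm.vmx`; how that string is fetched is deliberately left out. The names `VmRecord`, `parse_vmx_path`, `export_inventory` and `vms_on_datastores`, along with the sample VM and datastore names, are made up for illustration and are not part of any VMware or IBM tooling.

```python
import csv
import io
import re
from typing import Iterable, List, NamedTuple, TextIO

# Datastore paths look like "[datastore name] folder/file.vmx".
_VMX_PATH = re.compile(r"^\[(?P<datastore>[^\]]+)\]\s*(?P<path>.+\.vmx)$")


class VmRecord(NamedTuple):
    """One row of the inventory extract (hypothetical layout)."""
    name: str
    datastore: str
    vmx_path: str


def parse_vmx_path(vm_name: str, vmx_path_name: str) -> VmRecord:
    """Split a '[datastore] path/to/vm.vmx' string into its parts."""
    match = _VMX_PATH.match(vmx_path_name.strip())
    if match is None:
        raise ValueError(f"Unexpected vmx path for {vm_name}: {vmx_path_name!r}")
    return VmRecord(vm_name, match["datastore"], match["path"])


def export_inventory(records: Iterable[VmRecord], stream: TextIO) -> None:
    """Write the extract as CSV so it can be read back when vCenter is unavailable."""
    writer = csv.writer(stream)
    writer.writerow(["vm_name", "datastore", "vmx_path"])
    for record in sorted(records):
        writer.writerow(record)


def vms_on_datastores(records: Iterable[VmRecord], datastores: Iterable[str]) -> List[str]:
    """Return the names of every VM whose vmx file lives on the given datastores."""
    wanted = set(datastores)
    return sorted(r.name for r in records if r.datastore in wanted)


if __name__ == "__main__":
    # Sample data only; real values would come from the vCenter extract.
    sample = [
        parse_vmx_path("web01", "[XIV_DS01] web01/web01.vmx"),
        parse_vmx_path("db01", "[XIV_DS02] db01/db01.vmx"),
    ]
    buffer = io.StringIO()
    export_inventory(sample, buffer)
    print(buffer.getvalue())
    print(vms_on_datastores(sample, ["XIV_DS01"]))
```

Keeping the extract as a flat CSV is just one choice; the point is that the file has to be readable at the secondary site without relying on anything in the primary data centre.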

For a Disaster Recovery Test, my thought process says that the following steps will be required:
* Power down VMs in primary data centre
* Remove VMs from primary hosts – remove from inventory
* Unmount storage in primary data centre
* Remove storage mappings for LUNs on primary storage
* Refresh hosts in primary data centre

* Stop Storage Replication and wait for final replication

* Map storage LUNs to secondary hosts from secondary storage
* Mount storage in secondary data centre
* Register VMs into secondary data centre using secondary storage
* Power on VMs on secondary hosts
* Test communications

* Reverse Storage replication
* Start storage replication

The process can then be repeated for a failback.

In a real disaster, the process would probably begin at the secondary location section. This process also means that you are failing over ALL VMs on the particular LUNs involved; you can’t do a piecemeal recovery.
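As a rough illustration of how the step list above could drive a set of scripts, the sketch below holds the steps as an ordered runbook in Python and picks the starting point depending on whether this is a test or a real disaster. The phase labels (`primary`, `replication-stop`, `secondary`, `replication-reverse`) and the names `Step`, `RUNBOOK` and `plan` are my own for illustration. Nothing here talks to vCenter or the XIV; it only models the ordering.

```python
from typing import List, NamedTuple


class Step(NamedTuple):
    phase: str   # illustrative grouping label, not a vSphere or XIV term
    action: str  # the step text from the list above


RUNBOOK: List[Step] = [
    Step("primary", "Power down VMs in primary data centre"),
    Step("primary", "Remove VMs from primary hosts (remove from inventory)"),
    Step("primary", "Unmount storage in primary data centre"),
    Step("primary", "Remove storage mappings for LUNs on primary storage"),
    Step("primary", "Refresh hosts in primary data centre"),
    Step("replication-stop", "Stop storage replication and wait for final replication"),
    Step("secondary", "Map storage LUNs to secondary hosts from secondary storage"),
    Step("secondary", "Mount storage in secondary data centre"),
    Step("secondary", "Register VMs into secondary data centre using secondary storage"),
    Step("secondary", "Power on VMs on secondary hosts"),
    Step("secondary", "Test communications"),
    Step("replication-reverse", "Reverse storage replication"),
    Step("replication-reverse", "Start storage replication"),
]


def plan(real_disaster: bool = False) -> List[Step]:
    """Return the ordered steps for a DR test, or for a real disaster.

    A test runs every step. A real disaster starts at the first
    'secondary' step, because the primary site is assumed unreachable.
    """
    if not real_disaster:
        return list(RUNBOOK)
    start = next(i for i, step in enumerate(RUNBOOK) if step.phase == "secondary")
    return RUNBOOK[start:]


if __name__ == "__main__":
    for number, step in enumerate(plan(real_disaster=True), start=1):
        print(f"{number}. [{step.phase}] {step.action}")
```

Two things this sketch does not model: in a real disaster the array would presumably still need the replica volumes made writable before they can be mapped, and the two replication steps at the end could only happen once the primary site is available again.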

Over the next few parts, I hope to develop this solution further and start to work on the relevant scripts to piece it all together.

About the Author


I have been in IT for the past 15 years and have been using virtualisation technologies for around the past 8 years. I started, as quite a lot of people do, working with PCs after playing with such iconic systems as the ZX81 and ZX Spectrum, and then progressing through 386s, 486s, Pentiums etc. After being headhunted at sixth form to work for a small company based around Hertfordshire, UK, I began working with small businesses and gained a lot of hardware experience. Three years later, after helping to increase the size of the business, I needed to gain exposure to a larger environment to progress my own career. I joined a large manufacturing company in the Electronic Test and Measurement field, which progressed my skills onto more PC work, hardware work and then onto server operating systems.

I progressed again onto a consultancy company based in Reading, UK. Initially I worked as an engineer performing hardware and software installations for larger companies contracted out to the consultancy, then moved up into a Consultant position, continuing my travel across the UK assisting and providing solutions to companies. I finally moved on again to my current position, working back in Hertfordshire, UK, again for a large manufacturing company, this time with over 50,000 users worldwide. I am responsible for the data centre hardware, the storage environment, the VMware environment and also implementing their new Citrix XenApp farm. My days are busy but also productive. It’s a friendly environment, and in my four years with the company I have seen many changes in the technology and infrastructure in use.

About the site

I started this site as I had been thinking of having more of a presence on the web for a while. On a daily basis, I perform tasks and use tools that others may not use or may not think to use, and so I thought I would share some of these experiences and tips with others to help with their day-to-day work.
Currently, my main focus of work is around VMware and Veeam Backup & Replication, but hopefully as my tasks progress I’ll be able to share useful bits of information about other areas of IT as well.
