Disaster Recovery Testing

An Essential Component of the Ongoing IT Availability Lifecycle

 

Ensuring Recovery Goes As Planned

Even with the best technology and process, unexpected problems can arise when executing a recovery plan – especially under the pressure of a disaster. One of the best ways to uncover and mitigate these issues in a controlled environment is to build a routine of comprehensive disaster recovery testing.

 

“The most important aspect of a disaster recovery process implementation is testing and remediation.  Chances are organizations will not get it perfect the first time. That’s ok, what’s more important is the ability to identify the issues and correct them, on a regular basis.”

Matt Sprauge: Manager of Infrastructure Services, CDI Managed Services

 

Testing also provides an excellent opportunity to train employees, perform maintenance and set expectations across the organization on recovery capabilities and limitations. For companies serving clients with “always-on” business expectations, testing is an essential component of the ongoing DRaaS Availability Lifecycle.

Figure: DRaaS Availability Lifecycle (Recovery Assurance Lifecycle diagram)

Example Phases of DR Testing

Pretesting

During the pretesting process, a recovery team should review the entire plan to ensure testing runs as smoothly as possible. It’s important that each new recovery test challenges the technology and the team in ways that previous tests have not. Additionally, efforts should be made to plan for scenarios where key IT team members may not be available to execute the plan.

Pretesting activities may include the following:

  • Reviewing recovery objectives and targets
  • Reviewing process checklists and recovery runbook
  • Verifying that production and the recovery environment are in sync
  • Understanding test objectives to set the right expectations
  • Scripting recovery for all VMs to expedite recovery, especially if key team members are unavailable
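
The in-sync check in the list above lends itself to automation before each test. The following is a minimal Python sketch, assuming the time of the last successful replication for each VM is already available from your replication tooling. The VM names, the last_replicated mapping and the 15-minute recovery point objective (RPO) are hypothetical values chosen for illustration.

```python
from datetime import datetime, timedelta, timezone


def find_stale_replicas(last_replicated, rpo, now=None):
    """Return VMs whose most recent replication is older than the RPO.

    last_replicated: dict mapping VM name -> timezone-aware datetime of the
        last successful replication (hypothetical input; in practice this
        would be exported from the replication tooling).
    rpo: timedelta giving the recovery point objective.
    Returns a dict mapping VM name -> how far behind the replica is.
    """
    now = now or datetime.now(timezone.utc)
    return {
        vm: now - stamp
        for vm, stamp in last_replicated.items()
        if now - stamp > rpo
    }


if __name__ == "__main__":
    # Hypothetical sample data: one VM within the RPO, one outside it.
    now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    sample = {
        "app-01": now - timedelta(minutes=5),
        "db-01": now - timedelta(minutes=45),
    }
    stale = find_stale_replicas(sample, timedelta(minutes=15), now)
    for vm, lag in stale.items():
        print(f"{vm} is {lag} behind, exceeding the RPO")
```

Any VM reported by this check is a candidate for investigation before the test begins, since a test run against a stale replica says little about real recovery capability.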

Testing

If the pretesting phase has been thorough, the testing process should be straightforward. During this time, systems and processes should be closely monitored for unexpected changes or problems.

Testing activities may include the following:

  • Recovery of all physical and virtual machines
  • Complete review of network connectivity
  • Verification that production replication is not impacted by the testing
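
Part of the network connectivity review can be scripted. The sketch below is one possible approach in Python, assuming that a successful TCP connection is an acceptable first signal that a recovered service is reachable. The function names are hypothetical, and the probe is passed in as a parameter so that a stricter check can be substituted.

```python
import socket


def tcp_probe(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def unreachable_endpoints(endpoints, probe=tcp_probe):
    """Return the (host, port) pairs that the probe could not reach.

    endpoints: iterable of (host, port) pairs, for example taken from the
        recovery runbook (hypothetical input).
    probe: callable taking (host, port) and returning True when reachable.
    """
    return [(host, port) for host, port in endpoints if not probe(host, port)]
```

Calling unreachable_endpoints with the list of recovered services returns the ones that need attention. A successful connection only shows that a port is open, so application-level checks are still needed for a complete review.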

Debriefing

After the test is completed, the results should be reviewed and matched to the original test objectives. If issues are uncovered during the testing process, the recovery runbook should be updated to reflect the required changes.

Debriefing activities may include the following:

  • Matching results to documented recovery objectives
  • Discussion of how the process could have been streamlined for a faster recovery
  • Communication of results to key stakeholders
  • Documentation of uncovered issues in the runbook
  • High-level documentation of objectives for the next test, so that it covers a different testing scenario
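
Matching results to documented recovery objectives can be reduced to a simple comparison once recovery times have been recorded. The following Python sketch assumes each system has a recovery time objective (RTO) and a measured recovery time from the test. The system names and durations are hypothetical.

```python
from datetime import timedelta


def compare_to_objectives(measured, objectives):
    """Compare measured recovery times against RTO targets.

    measured: dict mapping system name -> timedelta taken to recover in the test.
    objectives: dict mapping system name -> timedelta RTO target.
    Returns a dict mapping system name -> "met", "missed" or "not tested".
    """
    report = {}
    for system, target in objectives.items():
        if system not in measured:
            report[system] = "not tested"
        elif measured[system] <= target:
            report[system] = "met"
        else:
            report[system] = "missed"
    return report


if __name__ == "__main__":
    # Hypothetical objectives and test measurements.
    objectives = {"web": timedelta(minutes=30), "db": timedelta(hours=2)}
    measured = {"web": timedelta(minutes=20), "db": timedelta(hours=3)}
    for system, status in compare_to_objectives(measured, objectives).items():
        print(f"{system}: {status}")
```

Systems reported as missed or not tested are natural inputs to the runbook updates and to the objectives for the next test.
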

Examples of Issues and Challenges Uncovered During Tests

  • Network configurations: Networking settings may not account for the transition from production to the recovery environment
  • Load balancing: Performance issues may arise if the recovery environment is not prepared to accommodate the load balancing requirements of production
  • Firewalls: Firewall rules in the recovery environment should mirror production so that unauthorized users are blocked while authorized users retain access
  • Technology co-dependencies: Issues may arise due to changing IPs, remote or proprietary technologies, and a lack of proper change management
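
Problems caused by changing IPs can sometimes be caught before a test by scanning configuration files for hard-coded production addresses. The Python sketch below is one way to do this, assuming production uses a known IPv4 range. The 10.10.0.0/16 network and the sample configuration are hypothetical.

```python
import ipaddress
import re

# Matches dotted-quad candidates; validity is checked separately below.
IPV4_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def find_production_ips(config_text, production_network):
    """Return (line_number, address) pairs for IPv4 addresses in config_text
    that fall inside production_network (a CIDR string such as "10.10.0.0/16").
    """
    network = ipaddress.ip_network(production_network)
    hits = []
    for number, line in enumerate(config_text.splitlines(), start=1):
        for candidate in IPV4_PATTERN.findall(line):
            try:
                address = ipaddress.ip_address(candidate)
            except ValueError:
                continue  # Matched the pattern but is not a valid address.
            if address in network:
                hits.append((number, candidate))
    return hits


if __name__ == "__main__":
    # Hypothetical configuration with one hard-coded production address.
    sample = "db_host = 10.10.4.7\ncache_host = cache.internal\n"
    for number, address in find_production_ips(sample, "10.10.0.0/16"):
        print(f"line {number}: hard-coded production address {address}")
```

Each hit marks a setting that may break once systems come up in the recovery environment, and it is worth recording in the runbook or replacing with a name that resolves correctly at both sites.
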

Choosing a Partner

With the daily pressures put on IT teams to meet increasing business demands, committing to ongoing disaster recovery testing can be challenging. Unfortunately, statistics show that most organizations fail to pass their own tests. To stay ahead, many organizations are turning to a partner to help them manage their availability with Disaster Recovery as a Service (DRaaS).

At InterVision, we pride ourselves on delivering confidence in IT systems availability through our Recovery Assurance™ program. We manage the recovery testing process end-to-end so that your team can focus on other, more immediate priorities. Contact us today if you’d like to learn more about our specific approach to recovery testing and how we help overburdened IT teams meet availability demands through Recovery Assurance.