Carnegie Mellon University

Disaster Recovery Services Management

Disaster Recovery Services coordinates, guides, and assists in the creation and ongoing management of:

  • Application Records
  • Disaster Recovery Planning
  • Disaster Recovery Solutions
  • Annual Disaster Recovery Exercises

Application Records is a data collection activity that draws on Technology System Descriptions and on interviews with the technology owners and administrators of an Application/IT Service to capture:

  • Description of the Application/IT Service and its service availability requirements
  • Dependencies that the Application/IT Service has in order to operate (e.g., other Applications/IT Services, Components – Servers, Network Connectivity, Vendor)
  • Potential risk impact that the University could experience if the Application/IT Service cannot be recovered and restored within its recovery objective(s).
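
As an illustration only, the information gathered for an Application Record could be modeled as a small data structure. The sketch below uses Python. Every class name, field name, category label, and sample value in it is hypothetical and is not the schema used by Disaster Recovery Services or the Fusion Framework. It also assumes that the recovery objective is expressed in hours.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Dependency:
    """Something the Application/IT Service needs in order to operate."""
    name: str
    kind: str  # hypothetical categories: "application", "server", "network", "vendor"


@dataclass
class ApplicationRecord:
    """Hypothetical shape of an Application Record (illustrative field names)."""
    name: str
    description: str
    recovery_objective_hours: float  # assumption: the objective is tracked in hours
    dependencies: List[Dependency] = field(default_factory=list)
    risk_impact: str = ""  # impact to the University if recovery misses the objective

    def dependencies_of_kind(self, kind: str) -> List[Dependency]:
        """Return only the dependencies in one category."""
        return [d for d in self.dependencies if d.kind == kind]


# Made-up sample data, for illustration only.
record = ApplicationRecord(
    name="Example Service",
    description="Sample service used to illustrate the record layout",
    recovery_objective_hours=24.0,
    dependencies=[
        Dependency("Example Database", "application"),
        Dependency("example-host-01", "server"),
        Dependency("Example Vendor", "vendor"),
    ],
    risk_impact="Users cannot reach the service until it is restored",
)

print([d.name for d in record.dependencies_of_kind("server")])
```

Grouping dependencies by category in this way makes it easy to see, for example, which servers or vendors an Application/IT Service relies on when its recovery is being planned.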

Disaster Recovery Planning documents the actions and activities that technology teams will execute to:

  • Assess and Respond to the impact of a disruption
  • Recover and Restore the Application/IT Service within the required recovery objective
  • Resume and Validate Application/IT Services for users, mitigating the business impact of a disruption on those who depend on the Application/IT Service.

Three Disaster Recovery Solutions are available:

  • High Availability – dedicated machines at both Primary and Secondary Data Centers with load balancing to enable automatic failover.
  • Active/Passive – dedicated machines at both Primary and Secondary Data Centers with manual intervention required to enable failover.
  • Bridged Net Image Restore – dedicated machines at the Primary Data Center and virtual machines on reserve at the Secondary Data Center, with manual intervention required to enable failover.

Disaster Recovery Exercising validates the viability of a Disaster Recovery Plan by enabling technology teams and their business partners to execute the actions and activities described in the Disaster Recovery Plan. Exercise information and results (e.g., Recovery Time Capability, or RTC) are maintained within the Fusion Framework, a third-party cloud solution from Fusion Risk Management that resides on a Salesforce platform.

The Recovery Time Capability (RTC) of an Application/IT Service is calculated from the length of time (in hours) that it took to execute the Disaster Recovery Plan. The RTC is recorded in the Application Record of the Application/IT Service so that dependent users can see whether the Application/IT Service meets their recovery requirements.
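
The RTC calculation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the calculation performed inside the Fusion Framework: it assumes the RTC is the elapsed time between the start and the end of plan execution, and that an Application/IT Service "meets" a requirement when its RTC is no greater than the recovery objective. The function names and sample times are hypothetical.

```python
from datetime import datetime


def recovery_time_capability_hours(started: datetime, finished: datetime) -> float:
    """Hours elapsed between the start and end of plan execution (assumed definition)."""
    if finished < started:
        raise ValueError("exercise cannot finish before it starts")
    return (finished - started).total_seconds() / 3600.0


def meets_recovery_objective(rtc_hours: float, objective_hours: float) -> bool:
    """True when the measured RTC is within the recovery objective (assumed rule)."""
    return rtc_hours <= objective_hours


# Made-up exercise times, for illustration only.
started = datetime(2024, 3, 1, 9, 0)
finished = datetime(2024, 3, 1, 15, 30)
rtc = recovery_time_capability_hours(started, finished)

print(rtc)                                  # 6.5
print(meets_recovery_objective(rtc, 8.0))   # True
```

In this example an exercise that ran from 09:00 to 15:30 yields an RTC of 6.5 hours, which would satisfy a hypothetical 8-hour recovery objective but not a 4-hour one.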