Throughout The Shortcut Guide to Availability, Continuity, and Disaster Recovery, we have explored how to address the business and technical requirements of data protection. Some of the requirements are obvious and apply to all organizations: restoring from isolated failures, for example, accidently deleting a file, and recovering from catastrophic failure, such as a natural disaster that destroys a data center. There are also less obvious technical and business needs. For example, server virtualization is widely adopted for its ability to improve server utilization and help control costs, but it introduces additional technical challenges with regards to backup and recovery. Business strategies can also influence recovery management objectives. A move to improve customer service by providing longer periods of access to online data directly affects the cost and required resources of recovery services.
These and other considerations have been woven into both the business strategy discussions and the technical assessments documented in earlier chapters. In this chapter, we take a different approach and consolidate key recovery management issues according to business types and the special case of failover recovery. We will consider five scenarios. Each scenario delves into typical business and technical issues faced by particular types of businesses or technology use cases; in particular, we will consider:
These scenarios are not mutually exclusive. Some of the discussion of small business backup and recovery services may be relevant to midsize businesses, especially those with remote offices. Similarly, virtual machine recovery management may be relevant to all types of businesses, regardless of size. Continuity and failover is such an important topic that we address it separately, although we will touch on failover in other sections when relevant. We will conclude this guide with a summary of best practices in availability, continuity, and disaster recovery.
When it comes to recovery management, one size does not fit all. Business requirements will vary by industry and company size. Consider how different the needs may be in the following examples that highlight different industries:
These examples show the range of needs with regard to recovery management. Perhaps the most telling aspect of these scenarios is the one common theme: ad hoc solutions may work for some period of time but sooner or later core systems must be recovered. Recovery management is not an option; it is a requirement in business. What type of recovery management solution you implement will vary according to your needs.
Figure 4.1: When data is lost or systems fail, it is only a matter of time before ad hoc solutions fail and business operations suffer significantly.
The notion of how long your business can operate without key systems is a helpful first step to understanding your recovery management needs. Thought experiments like this help get you into the right ballpark. To get the kinds of precise information you need to make business decisions, you have to delve into more detailed questions. In the next several sections, we will consider different business and technical scenarios and see how they influence a recovery management strategy. Across these scenarios, we will consider several key dimensions of recovery management, including:
How you answer these questions dictates the type of recovery management solutions you will put in place. There is no one set of answers for particular‐size businesses or for particular industries. A small health care provider may be subject to the same regulations as a large network of hospitals. A midsize construction company will likely have different recovery management needs than a similar‐size retailer. In the following scenarios, we consider a range of examples that show how to understand the dimensions of recovery management, ask relevant questions, and discern the information about one's business that will help you understand the requirements of your business.
Mention the term "small business," and you are likely to conjure up images of aspiring entrepreneurs filling niche markets while creating more jobs than their larger enterprise counterparts. Ask a small business owner what it is like running one of those companies and you'll probably get some of that positive imagery along with a healthily dose of reality. That reality includes a significant amount of time spent with accountants, lawyers, and bankers on top of the time spent on "real" work. Now add to the list the need to be the resident IT professional, and you can understand that recovery management might not get the attention it deserves.
One of the first requirements for small business recovery management is that it should not require a significant amount of time to manage backups. Small businesses, by definition, have limited staff. Automation and ease of use is a key factor. Backup software that provides agents for desktops, laptops, and servers can reduce the systems management overhead for small businesses.
In addition to easy‐to‐use software, small businesses should have easy‐to‐follow practices. If you have 10 desktop PCs and are installing backup agents on all of them, it is probably best to keep all 10 on the same backup schedule. That schedule should fit the needs of the most critical data. For example, the CFO's desktop should be backed up every night (at least). A good case could be made for not backing up as frequently the shared desktop used by part‐time interns. After all, the data on that device is less important than the company financial data. The problem is that this strategy optimizes for storage space but increases the complexity of maintaining backup scripts and procedures. You now have two scripts to maintain rather than one. The savings in storage space is hardly likely to offset the extra management overhead.
In small businesses, a single, standardized backup schedule is better than many different ones. Larger businesses should classify their data and establish recovery management strategies based on different levels of criticality. For the small business, it is safe to assume all data is critical. (The one exception to this rule regards archiving, more on that follows).
Small businesses should define their RTOs and RPOs. If core business operations cannot continue without particular systems, those systems require short RTOs; financial services and retailers are examples. For other types of businesses, an RTO of several hours may be acceptable with next‐day recovery conceivably sufficient for some low‐demand systems.
RPOs address the question of how much work and data can be lost without adversely affecting the business. Weekly backups may be sufficient for businesses with relatively slow rates of change and those with comprehensive paper trails. In those cases, employees could re‐enter transactions lost since the last backup. With advances in backup software, including ease of use and data deduplication technologies, it is probably a better idea to implement incremental nightly backups. There is not likely any marginal increase in the cost of backup software, additional storage costs are marginal, and the reduced risk is often worth those small, additional costs.
A fire, flood, or other natural disaster can wipe out small business. If buildings are damaged, inventory destroyed, and paper files are lost, a small business could still recover. New office and industrial space can be rented and insurance can help cover inventory losses. Offsite backups may be the only way to recover important business records and transaction data.
Disaster recovery for a small business does not have to entail a complex set of data replication services and high‐availability servers (although those are important for larger midsize and enterprise businesses). A simple strategy of keeping a full backup of all business data in a bank safe deposit box or other secure, offsite location can mean the difference between a disaster that sets back a business and one that breaks it. Cloud computing should also be considered. Cloud providers offer offsite storage that does not require you to transport, store, and manage physical media; as cloud services are available from any where with Internet access, your backups are readily accessible from virtually any location.
When it comes to keeping copies of data for long periods of time, you have to ask yourself what data is worth keeping. In larger businesses, the cost of data management warrants categorizing and prioritizing different types of data in the business. The high‐priority and more critical the data, the more protection it receives. We noted earlier that small business should treat all business data as equally business critical in order to reduce management overhead. This keep‐it‐simple strategy can run into problems when we archive data for long periods of time.
The difference is that when we backup up data, it is written to tapes or disks on site. Both can be reused according to some schedule that meets data protection requirements. For example, tapes or disk space used for incremental backups can be reused once a full backup is made (assuming an RPO does not require the ability to restore to that incremental point). The goal of archiving is to keep a long‐term copy of data, preferably in an offsite location in case of disaster. A practical consideration is how much storage media can be kept in an archive location, like a safe deposit box. Another question is, What is worth keeping long term? This is where the "treat all data as critical" rule of thumb breaks down for small businesses. Legal and financial professionals should be consulted on this question.
Archiving is a recovery management concern where we start to see some of the issues that midsize businesses have to address. Figure 4.2 shows a chart depicting the relative importance of various recovery management requirements to a typical small business. Of course, small businesses will vary in which topics they find important and this graphic is not meant to categorically describe all small business; however, it is useful for understanding the differences small businesses face when compared with midsize or emerging enterprises.
Figure 4.2: Top priorities for small businesses are easeofuse considerations and the ability to provide basic data protection features.
As companies grow in size, additional requirements move from the "nice to have" category to the "essential for business" category. Growing from small to midsize business tends to bring with it more demands on recovery management and data protection.
In that way, a midsize business is more like an emerging enterprise with more risks to manage. In other ways, a midsize company may be more like small businesses in that they cannot support a substantial internal IT staff with deep and broad experience in a range of IT management and technical issues. This results in a typical set of requirements that is a mix of both small business and emerging enterprise requirements.
Ease of use, the ability to meet somewhat more stringent RTOs and RPOs, and increasing need for disaster recovery and more formal life cycle management procedures will be found in midsize companies. Another area of concern that midsize and larger businesses face is the need to protect data in remote office locations.
Midsize and lager businesses often face the challenge of protecting data in remote office locations. Each of these remote offices can, in some ways, be considered like a small business. There are multiple devices requiring backup services but no dedicated staff to ensure data is properly protected. For the midsize business, there are a number of considerations with regard to remote office backups, including:
All of these considerations can be met with a backup system that allows for remotely managed backups. Backup agents can be installed on remote office desktops and servers. The agents then function with backup servers in a central office to perform backup, restore, and management reporting operations. There is minimal need for onsite tasks, except for steps such as powering on devices. (Although this is less of a problem with remote management software that takes advantage of hardware that allows for network‐based power‐on and other basic management tasks).
The scenario depicted in Figure 4.3 assumes fixed, remote offices. In addition to these, midsize businesses should consider data protection for mobile users. We are far less tethered to our offices than we were even 10 years ago. Businesses do not need a dedicated office in a region to have a presence there. The US map in Figure 4.3 could easily become a global map if we were to include regional representatives for a midsize business who might be located in Asia, Europe, Oceania, or other regions of the world.
Figure 4.3: Remote offices should back up to centralized backup servers for efficiency, reliability, and manageability.
Note, when selecting backup systems to support laptop users, consider how often those users will have slow or unreliable Internet connections. Ideally, backup software will be robust enough to reliably perform backup and restore operations in spite of less than ideal connectivity.
Figure 4.4: Top priorities for midsize businesses are similar to small business priorities but there is increasing emphasis on RPOs, RTOs, and amount of data to protect. Remote office backups become a concern for midsize companies.
Moving to larger businesses and emerging enterprises brings with it more complexities in recovery management and more diversity in requirements. Rather than try to capture a typical large business, we will consider three broad but common areas of concern:
Operations management in larger organizations includes virtually all the requirements found in small and midsize business but with different levels of importance. For example, there is less importance on ease of use when selecting backup software. This is not to say that ease of use is not important, it is; however, in large organizations, essential functionality must be in place even if it might not come with an easy‐to‐use interface. There are also features that become increasingly important as we move from small to midsize and emerging enterprises, such as:
These additional features are driven by a range of business requirements, including compliance, cost control, and the ability to meet service level agreements (SLAs).
Encryption usually comes up in discussions about security. Discussions about transmitting sensitive business data over the Internet will quickly focus on encryption technologies to protect the confidentiality of that data. When businesses need to ensure confidential information is protected on laptops, even if they are lost or stolen, full disk encryption should be considered. Backups can share important similarities with these use cases and for those reasons, encryption can be an important feature of a backup solution.
Backups are often moved or transmitted to locations other than where the data originated. Offsite storage mitigates the risk of losing both a server and a backup to damage to a facility. Backup tapes may be lost or stolen while in transit. Security procedures at an offsite facility may be more lax than one would expect. In both cases, you have confidential data outside the protection of a business' normal access controls and physical security. Encrypting data on backup tapes and disks can provide an additional level of protection for ensuring the confidentiality of private and sensitive data. As a general rule, all business data that is stored offsite or in the cloud should be encrypted.
Reducing the amount of storage required for backups and the time required to generate backups translate into cost savings. Deduplication will be useful for any size business, but the larger the organization, the greater the benefit.
Deduplication can occur on either the source or the target system. Deduplication on the target side has the advantage of reducing demand on the source device's CPU. Backup agents do not have to support deduplication functionality on the client side when the operation is performed on the target. This can reduce problems with increasing the footprint of the backup agent on the source system. In addition, the target backup server can be sized appropriately to handle the computing requirements for deduplicating data from multiple source systems.
Management features are important for any size business and any type of business. At minimum, you need to know that your backup processes run, your restores are successful, and the storage you have allocated for backups is sufficient. Management complexities increase quickly with the size of an organization, leading to several additional needs:
Centralized management affects how you perform backup operations, how you collect data about those operations, and how efficiently you can perform these operations. As Figure 4.5 shows, centralized management becomes one of the most important features of recovery management systems as the size of the organization grows.
Figure 4.5: Top priorities for emerging enterprises expand beyond the set found for smaller business. Encryption, deduplication, and centralized management are important functions for larger organizations.
Server virtualization can significantly increase server utilization and help control hardware costs. There are, however, some recovery management considerations that need to be taken into account, especially for midsize and larger companies that deploy large numbers of virtual servers. Some of the most important issues are:
Implementation details are important to understand when planning the use of backup software with virtual servers. Suppose you need to restore a file from a virtual server backup. Does the backup software support individual file restores from a virtual machine backup? If not, you will need to plan for additional time and storage space to accommodate the recovery operation. One way to handle the lack of selective file restores is to restore the entire image to a staging server and then copy the needed files to their target location. This process will increase recovery time as well as require additional storage space.
Figure 4.6: Backups should be staggered on virtual machines to minimize competing demands for shared resources.
Another implementation detail you need to watch carefully is the demand on CPUs during backup operations. Backups should not be scheduled to compete with each other or with other peak‐demand periods on other virtual machines sharing the same physical server. Virtual machines isolate many management concerns to a single virtual machine, but backups are one of the areas where due consideration has to be made for global operations on the server.
Another issue to consider is how to maximize backup performance and available features by leveraging the features offered by the virtual machine vendor and recovery management software vendors. Each has specialty expertise. The virtual machine vendors can optimize low‐level operations in the hypervisor, typically those close to the hardware, to increase performance. Recovery management software vendors are likely to offer more management functionality and provide a common feature set that hides some of the implementation details when dealing with virtual machines. Virtual machines are staples of day‐to‐day operations in midsize and larger organizations, but they can also play a crucial role in continuity and failover protection.
This chapter opened with a discussion of how different business requirements will drive different recovery management strategies. The question of how long and how well a business can function with systems down is a telling example of how wide ranging requirements can be. The ability to maintain continuous IT services increases in importance with the size and complexity of business operations.
Let's consider how a hypothetical midsize business might tackle the problem of disaster recovery. Our fictitious firm is based in the US with a headquarters in Chicago and regional offices in Atlanta, Boston, Houston, Denver, and Los Angeles. The executives in the business have determined that the company needs to be able to provide core business services even if the headquarters is hit by a natural disaster. The CIO and IT staff know five things they need to do to ensure a sound disaster recovery strategy and practice:
Small businesses can reasonably assume that all data in the business is equally valuable and should be protected at the same levels. This is not necessarily true, but it is a useful fiction because it reduces the management overhead. As data volumes grow, the advantages of simplified management no longer outweigh the cost of maintaining unnecessary RTOs and RPOs.
From a disaster recovery perspective, mission‐critical applications must have their data and servers protected in such a way that recovery times are short and recovery points are close to the time of failure. In our example business, management determines that the sales processing, financials, and customer relationship management (CRM) systems are critical. Other back‐office applications, such as human resources applications, inventory management, and decision support are designated second‐tier systems. They can be unavailable for up to 48 hours.
The sales processing system is accessible to customers through a self‐service Web application, so management does not want customers to experience degraded service; that application will run on a dedicated server as well. Management also determines that they can tolerate some drop in performance in disaster recovery situations for other applications. The systems designers take advantage of this by running the financials and CRM systems, which normally each run on their own dedicated servers, in virtual machines hosted on a single physical server. The second‐tier applications will run on a single physical server.
Designers decide to use the Atlanta and Denver offices as disaster recovery sites. The Atlanta office, like headquarters, has a dedicated IT staff, so it is chosen to host critical applications. Two servers are installed and dedicated to disaster recovery; one for the sales processing system and the other to host virtual machines for the other critical back‐office applications. Denver does not have a dedicated IT staff, but they do have excess server capacity that can accommodate virtual machines running the second‐tier applications.
To implement the disaster recovery plan, a high‐availability application is installed in
Chicago and Atlanta. Data is continuously replicated from Chicago to Atlanta to ensure RPOs and RTPs are met. High availability is not needed for secondary applications, so backups of those systems are copied from Chicago to Denver on a daily basis. In the event of a disaster, the Chicago IT staff will work with Denver staff to start virtual machine instances and restore the second‐tier applications from backups.
The design seems sound. RPOs and RTOs can be met, business objectives are accounted for, and staff is in place to implement the plan as needed. Good disaster management strategies, no matter how well designed, should be tested. Things are bound to go wrong. Disasters tend to be, well, disasters, so you need to ensure that plans cover as many decision points as possible. Businesses should at least test at the unit levels that data is replicated from primary to standby servers, backups are reliable, and staff understand the procedures to follow in the event of a disaster.
Continuity and failover services are important parts of a recovery management strategy. Backup software and well‐defined recovery procedures provide for basic disaster recovery services; larger businesses with more complex information management needs should consider high‐availability and data replication applications as well.
The Shortcut Guide to Availability, Continuity, and Disaster Recovery has covered a lot of ground in backup, recovery, and high availability. Sometimes we have delved into the technical challenges, other times we've looked at recovery management from a business perspective, and in some cases, we've tried to bridge the technical and business perspectives. As we conclude this shortcut guide, it is time to consolidate some of the topics we've examined and compile a summary of best practices in availability, continuity, and disaster recovery.
Best Practices: Making the Business Case for Recovery Management
Best Practices: Overcoming Technical Challenges
Best Practices: Recovery Management Practices
From small businesses to emerging enterprise, businesses are facing technical and business challenges to protect data and maintain continuous business services. Although there are not "one size fits all" solutions, there are sound practices for determining your business' particular needs. Backup and recovery software has continued to evolve with technology. Backup solutions have adapted to virtual environments and take advantage of low‐cost disk storage and changing business needs. These solutions provide centralized management consoles, consolidated reporting, and other tools to help IT staff keep up with increasing demands for data protection. Business practices have also matured as businesses leverage techniques such as remote office backups and practices such as managing data life cycle requirements to control costs without sacrificing data protection. Technology and business practices will no doubt continue to advance.