CCSP Domain 3 - Business Continuity Management (BCM)

Transcript

Introduction

Hey, I’m Rob Witcher from Destination Certification, and I’m here to help you pass the CCSP exam. We are going to go through a review of the major topics related to business continuity management in Domain 3, to understand how they interrelate, and to guide your studies.

This is the seventh of seven videos for Domain 3. I have included links to the other MindMap videos in the description below. These MindMaps are a minuscule part of our complete CCSP MasterClass.

Business Continuity Management (BCM)

Business continuity management (BCM) is the business process that drives the planning and preparation for disasters. It starts with the business impact analysis process, the BIA, and then uses the results of the BIA (the measurements of time: RPO, RTO, WRT and MTD) to create, test, train people for, and maintain business continuity plans, BCPs, and disaster recovery plans, DRPs. The point of all this planning, preparation and training within business continuity management is to ensure critical processes and systems continue to operate during a disaster, to ensure the survival of the business.

Business Impact Analysis

The first major process that we perform in business continuity management is the business impact analysis, the BIA.

Measurements of Time

The major output of the BIA process is four different measurements of time that have been approved by the process or system owner. The owner must approve these numbers because ultimately the owner must pay for the costs associated with achieving these numbers. And let me re-emphasize each of these numbers as a measurement of time: seconds, minutes, hours and days.

RPO

The recovery point objective, the RPO, is a measurement of how much data an organization is willing to lose as a result of a disaster. So, if the server explodes, what is the maximum tolerable data loss as a measurement of time? 5 seconds' worth of data? 10 minutes? 3 hours? 2 days?
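One way to make the RPO concrete is to compare it against how often backups run. The following is a minimal sketch, assuming periodic backups where the worst-case data loss is roughly one full interval between backups. The function name `meets_rpo` and the sample values are illustrative and not part of the exam material.

```python
from datetime import timedelta


def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """Return True if the worst-case data loss stays within the RPO.

    Assumption: with periodic backups, a failure just before the next
    backup loses about one full interval of data.
    """
    return backup_interval <= rpo


# Hypothetical values for illustration only.
nightly_ok = meets_rpo(timedelta(hours=24), timedelta(hours=3))
snapshot_ok = meets_rpo(timedelta(minutes=5), timedelta(minutes=10))
print("Nightly backup meets a 3-hour RPO:", nightly_ok)
print("5-minute snapshots meet a 10-minute RPO:", snapshot_ok)
```

A tighter RPO generally means more frequent backups or replication, which is part of why the owner has to approve the number and its cost.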

RTO

The recovery time objective, the RTO, is a measurement of the maximum tolerable time to recover systems to a defined service level. Typically, this means how long it takes to bring backup systems online.

WRT

The work recovery time, the WRT, is the maximum tolerable amount of time to verify system and/or data integrity as part of returning systems to normal operations.

MTD

The maximum tolerable downtime, the MTD, also sometimes referred to as the maximum allowable downtime, MAD, is the maximum time a critical process or system can be disrupted before there are unacceptable consequences to the business. The MTD is always going to be greater than or equal to the RTO plus the WRT.
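The relationship MTD ≥ RTO + WRT can be checked with simple arithmetic. Here is a minimal sketch in Python, assuming all three values are expressed as durations. The function names and the sample numbers are hypothetical and only there to illustrate the check.

```python
from datetime import timedelta


def fits_within_mtd(rto: timedelta, wrt: timedelta, mtd: timedelta) -> bool:
    """Return True if recovery (RTO) plus verification (WRT) fits inside the MTD."""
    return rto + wrt <= mtd


def remaining_buffer(rto: timedelta, wrt: timedelta, mtd: timedelta) -> timedelta:
    """Time left over in the MTD after RTO and WRT; negative means the plan overruns."""
    return mtd - (rto + wrt)


# Hypothetical numbers for a payroll system.
rto = timedelta(hours=4)   # bring backup systems online
wrt = timedelta(hours=2)   # verify system and data integrity
mtd = timedelta(hours=8)   # maximum tolerable downtime

print("Plan fits within MTD:", fits_within_mtd(rto, wrt, mtd))
print("Buffer:", remaining_buffer(rto, wrt, mtd))
```

If the check fails, either the recovery capability has to improve (a lower RTO or WRT) or the owner has to accept a longer MTD.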

Types of Plans

Now let’s talk about the two major types of plans that these numbers drive the creation of.

Business Continuity Plan (BCP)

Business continuity plans, BCPs, focus on critical business processes. For example, paying employees is typically considered to be a critical business process, so in the event of a disaster, like our automated payroll system blowing up, the BCP would focus on how to continue the business process of paying employees. BCPs essentially focus on the survival of the business.

Disaster Recovery Plan (DRP)

Disaster recovery plans, DRPs, focus on the recovery of critical technology, infrastructure and systems. So, in the example of the payroll system exploding, while the BCP is focused on keeping the payroll business process running, the DRP would be focused on recovering the actual payroll system.

Cloud Recovery

Lots of organizations are now using the cloud as a major part of their recovery plans. The cloud can be an extremely flexible and cost-effective recovery option because you only pay for what you use. You can have a bunch of virtual infrastructure prepared but suspended, ready and waiting, switch it on very quickly, and only start paying once the infrastructure is up and running. Let's go through a few options of how you can use the cloud for recovery.

Recovery to Cloud

The first is to have your primary systems on premises, and your recovery systems in the cloud.

Recovery within same CSP

The second option is to have your primary systems in the cloud, with your backup systems also hosted with the same cloud provider. If you choose this architecture, you will want to make sure that your backup is held in a different availability zone so that a disaster doesn’t take down both copies.

Recovery to alternate CSP

The third choice is to keep your primary systems in the cloud and your recovery systems in a separate provider’s cloud. Under this setup, if a disaster takes down your primary provider, you can switch to the backup one. This is technically challenging to implement but provides better resiliency than recovery within the same CSP.

Architect for failure

Failures are bound to happen, so it’s important for us to design our architectures to handle failures gracefully–we need to architect for failure. Some of the common techniques we use include using cloud services with multiple availability zones, having backup services that are geographically remote, and having automatic failovers in place.
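Automatic failover can be sketched in a few lines. The example below is a simplified illustration, assuming each availability zone is represented by a plain Python function and that an outage shows up as a `ConnectionError`. The names `call_with_failover`, `primary_zone` and `secondary_zone` are hypothetical; a real deployment would usually rely on the provider's load balancing or DNS failover features rather than application code like this.

```python
from typing import Callable, Optional, Sequence


def call_with_failover(endpoints: Sequence[Callable[[], str]]) -> str:
    """Try each endpoint in order and return the first successful response."""
    last_error: Optional[Exception] = None
    for endpoint in endpoints:
        try:
            return endpoint()
        except ConnectionError as exc:
            # This endpoint is unavailable, so remember why and try the next one.
            last_error = exc
    raise RuntimeError("all endpoints failed") from last_error


def primary_zone() -> str:
    # Simulates an outage in the primary availability zone.
    raise ConnectionError("zone A is down")


def secondary_zone() -> str:
    return "served from zone B"


print(call_with_failover([primary_zone, secondary_zone]))
```

The caller never needs to know which zone answered, which is the point of handling failures gracefully.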

Chaos Engineering

Chaos engineering is an approach that involves intentionally introducing faults into a system to test its resilience. By deliberately placing our systems under these stresses, we can test the stability under diverse conditions. When chaos engineering breaks something, this allows us to identify the weaknesses in our architectures, and we can improve systems to be more stable and resilient.

Basically, you intentionally and continually cause disasters in your infrastructure and you learn quickly to build and operate resilient systems. This is often way more effective than testing a disaster recovery plan every few years.
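To give a feel for fault injection, here is a toy sketch in Python. It assumes a made-up service function (`fetch_balance`), a wrapper that randomly injects a `ConnectionError` (`with_chaos`), and a simple retry helper (`resilient_call`). All of these names and the failure rate are illustrative assumptions. Real chaos engineering targets live infrastructure with dedicated tooling, not a wrapper like this.

```python
import random
from typing import Callable, Optional


def with_chaos(func: Callable[[], int], failure_rate: float,
               rng: random.Random) -> Callable[[], int]:
    """Wrap func so that a fraction of calls fail with an injected fault."""
    def wrapper() -> int:
        if rng.random() < failure_rate:
            raise ConnectionError("injected fault")
        return func()
    return wrapper


def fetch_balance() -> int:
    # Stand-in for a call to a real service.
    return 100


def resilient_call(func: Callable[[], int], attempts: int = 3) -> Optional[int]:
    """Retry func up to `attempts` times; return None if every attempt fails."""
    for _ in range(attempts):
        try:
            return func()
        except ConnectionError:
            continue
    return None


rng = random.Random(42)  # seeded so the experiment is repeatable
flaky_fetch = with_chaos(fetch_balance, failure_rate=0.3, rng=rng)
results = [resilient_call(flaky_fetch) for _ in range(1000)]
failures = results.count(None)
print(f"{failures} of 1000 calls failed even after retries")
```

Running an experiment like this shows how often the retry logic is not enough, which points to where the design needs to be more resilient.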

Vendor Lock-in

Vendor lock-in happens when it becomes too difficult or costly for you to switch your systems to another provider. It often happens if your provider doesn’t use interoperable formats, and if your systems aren’t portable. Vendor lock-in can be a major challenge from a recovery perspective. If you are locked into a specific vendor and they go down, you may have limited or no options for recovery. You’re entirely reliant on the provider to bring your system back up, hopefully in a timely manner.

That’s all for our MindMap on business continuity management within Domain 3. We’ve discussed many of the concepts you need to know for the exam.

If you found this video helpful you can hit the thumbs up button and if you want to be notified when we release additional videos in this MindMap series, then please subscribe and hit the bell icon to get notifications.

I will provide links to the other MindMap videos in the description below.

Thanks very much for watching! And all the best in your studies!
