CCSP Domain 5 - Cloud Infrastructure Maintenance MindMap
Download FREE Audio Files of all the MindMaps
and a FREE Printable PDF of all the MindMaps
Your information will remain 100% private. Unsubscribe with 1 click.
Transcript
Introduction
This is our second MindMap for Domain 5, and we're going to be discussing IT Service Management. Throughout this MindMap, we will be explaining how these ideas interrelate to help guide your studies. This MindMap is just a small fraction of our complete CCSP MasterClass.

This is the second of three videos for Domain 5. I have included links to the other MindMap videos in the description below. These MindMaps are a small part of our complete CCSP MasterClass.
IT Service Management
IT Service Management is focused on designing, delivering, managing, and improving the way services are used within an organization’s cloud environment. The most famous and well used frameworks in this space are based on ITIL (Information Technology Infrastructure Library). The ITIL frameworks do an excellent job of defining best practices for all the major processes. The ITIL processes are broken down in five major categories:
- Service strategy
- Service design
- Service transition
- Service operation
- Continual service improvement
We’re going to work our way through each of these categories now and drill down into each of them where needed.
Strategy
First up, we have strategy management, which focuses on aligning the IT services with the business goals and objectives. If our strategy doesn’t meet the overall business needs, then we are going in the wrong direction. Security and IT exist to help the business achieve its goals and objectives, so we had better be aligned with business goals and objectives.
Service strategy includes service portfolio management, demand management, financial management, and business relationship management. It’s not super important to know the details of each of these, so let’s move on!
Design
Service design is focused on ensuring services are designed with reliability, availability, scalability, and compliance in mind. Cloud-specific considerations include selecting cloud service types (IaaS, PaaS, SaaS), geographic regions, and configurations that best meet performance and regulatory requirements. Let’s delve deeper into a few process:
Availability
Availability management ensures cloud services are consistently available and reliable. This process includes designing redundancy, failover, and high-availability solutions specific to cloud environments.
Capacity
Capacity management involves planning to ensure that we have sufficient capacity to meet our needs. The cloud is great for capacity management, because the rapid elasticity and scalability means that we can meet demand spikes more easily than in an on-premises model where we would have to buy and set up the servers in advance. Capacity management in cloud environments involves optimizing auto-scaling, provisioning, and resource allocation.
Continuity
Continuity management involves planning for business continuity and disaster recovery, ensuring critical cloud services remain operational or that they can be quickly restored in case of a disruption.
Service Level
Service-level management focuses on having the right contracts in place. It is critical for a customer to ensure their cloud provider is going to be able to meet their requirements before they move to the cloud. So service-level management is focused on ensuring the right contract is put in place with the right terms, as well as monitoring and enforcing the contract provisions. Remember, a customer remains accountable for anything they move to the cloud so it’s ultimately up to the customer to ensure their requirements are met by the service provider.
Contract
There are a variety of different contracts that can be used in service-level management.
SLA, PLA, & OLA
A service-level agreement (SLA) is a formal contract between a cloud provider and the customer that defines the expected level of service and security, including uptime, performance, encryption, access controls, incident response, vulnerability management, and so on. An SLA also specifies measurable metrics, and outlines penalties or compensation if the provider fails to meet the agreed standards.
A PLA–privacy-level agreement–specifically focuses on how personal and sensitive data is handled in the cloud to meet privacy regulations and safeguard customer information.
An operational-level agreement (OLA) is an internal agreement within the cloud provider’s organization that defines responsibilities and processes for maintaining security standards to support the SLAs and PLAs. Although not directly seen by the customer, OLAs are critical for internal alignment of the cloud service provider to meet customer requirements.
Negotiable
Agreements may be either negotiable or non-negotiable. Negotiable agreements are generally only provided by smaller cloud providers or in special circumstances, and they basically mean that the provider will be willing to negotiate with you on the terms of the contract.
Non-negotiable
Non-negotiable agreements are, as expected, non-negotiable. The major cloud providers are generally unwilling to negotiate on their terms. Instead, they will just offer you a standard contract, and it's your choice whether you sign or not. Non-negotiable agreements are far more common.
Click-wrap Agreements
Click-wrap agreements are very common in the cloud. They are digital contracts that users accept by clicking an "I agree" or similar button, typically when signing up for or accessing a cloud service. These agreements outline the terms of service, including usage rights, data handling, and limitations of liability. They are legally binding and enforceable, provided they’re presented clearly and the user has had the opportunity to review them. But let's be honest: how often do you actually read one of these 400 page monstrosities of impenetrable legalese before you click “yeah, yeah, I agree”?
Transition
Alright, next major set of processes: Service transition focuses on moving services into the cloud or migrating from one cloud provider to another. Service transition covers change management, configuration management, and knowledge transfer to ensure smooth transitions, including preparing users and managing any risks associated with the change.
Change
First, there’s change management, which involves having specific processes and oversight when making changes. This helps to limit the risks of you making a change, and accidentally blowing up your systems.
Configuration
Next, we have configuration management, which is focused on ensuring we have an accurate inventory of all the assets and ensuring all those assets are properly configured.
Release & Deployment
When we are releasing new services and features, we need to be really careful with our release and deployment management. These processes involve things like coordination, documentation, and training, to ensure that releases go smoothly.
Operation
Moving on to the next major group of processes in IT service management, service operation, which focuses on maintaining and supporting cloud services once they are live.
Patching
Let’s start with discussing patch management, which is a proactive process for creating a consistently configured environment that is secure against known vulnerabilities. We need to be patching our software in a timely manner.
Patch levels
There are three different approaches for monitoring the patch levels of systems and ensuring they have the requisite patches installed in a timely manner.
Agent
Agent-based monitoring is an approach that involves installing a small software component (an agent) on each device to monitor and report on patch status. Agents can provide real-time data and deeper insights by running directly on the system, making this method ideal for detailed, frequent updates and complex environments.
Agentless
Agentless monitoring systems don’t require software installation on each device; instead, they use protocols like SSH (Secure Shell) or WMI (Windows Management Instrumentation) to scan systems from a central location. Agentless is quicker to deploy but may have limited access to certain data, depending on permissions and the system’s configuration. It's suitable for environments where minimal system intrusion is preferred–where you don’t want an agent running on a system.
Passive
Passive monitoring involves observing network traffic or system logs to infer patch status without actively scanning or installing software. It’s less intrusive but also less comprehensive, often used to identify obvious patching gaps rather than detailed compliance. This method is useful for monitoring in environments where active scanning might disrupt services.
Deploying
Lets now talk about deploying patches, we can do it either manually or through automation.
Manual
Manual patching requires IT administrators–people–to manually download, test, and install patches. This gives them full control over the timing and selection of updates, making it ideal for critical systems where patches must be thoroughly vetted to avoid disruptions. However, manual patching can be time-consuming and may result in delays, increasing the risk of vulnerabilities if patches are not applied promptly.
Automated
Automated patching uses software tools to apply patches without direct human intervention, often on a schedule or as soon as updates are available. This approach ensures faster, more consistent application of security patches, reducing the risk of unpatched vulnerabilities. While automated patching is efficient, it is much more likely to break things. Automated patching requires careful configuration to avoid issues like unintended downtime, and it may not suit all systems, especially those requiring custom updates or extensive testing before patching.
Considerations
There are a few considerations that you need to be wary of when it comes to patching
Time zones
The first is timezones. Historically, we’ve often patched important systems in the middle of the night–or whenever they are least used–just in case something breaks. This gives us plenty of time to fix the problem before most of our users wake up and log in. But how do you handle this in a globalized world with systems and users spread all around the planet? It makes planning a lot more complex. Often the deployment of patches has to be staggered around the clock so that patches are deployed at the ideal time in each timezone.
Instant-on gaps
Another concern is instant-on gaps. I discussed instant-on gaps in more detail back in the fourth MindMap of domain 3 - link below. To briefly summarize: if a VM has been dormant for six months, when you power it back up, it won’t have the last six months of patches, which means that it will likely not be protected against the latest threats. This patching gap is known as an instant-on gap. There are two major ways you can deal with this:
- Wake up all of your VMs during a patching cycle to ensure they are patched.
- When you wake up a dormant VM, immediately put it in quarantine, install the requisite patches, and then take it out of quarantine.
Incident Management
Incident management is the process of detecting, responding to, and resolving incidents—such as outages, performance issues, or security breaches—that impact cloud services. It involves identifying the root cause of an issue, mitigating its impact, and restoring normal service operations as quickly as possible. Cloud incident management often leverages automated monitoring, alerts, and predefined workflows to streamline response. A critical part of incident management is learning from an incident and figuring out what improvements can be made to prevent such incidents from happening again.
Problem Management
Problem management in the cloud focuses on identifying and addressing the root causes of recurring incidents to prevent them from happening in the future. Unlike incident management, which is concerned with immediate response and restoration, problem management is proactive and investigative. It involves analyzing patterns, conducting root cause analysis, and implementing permanent solutions or workarounds for issues within the cloud environment. Prevention is much better than a cure.
Continual Improvement
The final major group of processes in IT service management is continual improvement. As an organization, we want to be continually improving.
Continual Service Improvement
Continual service improvement management involves measuring service performance, identifying improvement opportunities, and implementing enhancements. It’s about constantly improving the effectiveness and efficiency of IT processes and services. If we aren’t always improving, then we aren’t doing our share to push the business forward into the future.

That’s it for our overview of IT service management within Domain 5, covering the most critical concepts you need to know for the exam.

If you found this video helpful you can hit the thumbs up button and if you want to be notified when we release additional videos in this MindMap series, then please subscribe and hit the bell icon to get notifications.
I will provide links to the other MindMap videos in the description below.
Thanks very much for watching! And all the best in your studies!