ServiceOps: Balancing Speed and Risk in DevOps
The CrowdStrike outage highlighted the interconnected nature of today’s business operations, where a single error can affect many organizations and people. What happened at CrowdStrike could happen to any company, even ones with good change control and release practices, particularly when different development teams in an organization adopt different release practices to accelerate software delivery. The lack of a single approach to releasing software increases the risk of introducing bugs or performance issues.
ServiceOps offers a new operating model for accelerating change while predicting and managing risk. By bridging IT service management (ITSM), IT operations (ITOps) and DevOps, ServiceOps enables organizations to standardize change control and release practices and elevate monitoring and automated remediation, as well as prevent CrowdStrike-like issues from happening.
ServiceOps Enables Change Risk Assessment
The most effective way to manage change and its impact is to proactively analyze risks to avoid making changes that are likely to cause an incident. ServiceOps combines the configuration management database (CMDB), release management and change management capabilities of ITSM with observability and AIOps solutions to make software delivery safer and more collaborative. By integrating workflows between ITSM and observability and using AIOps tools for change risk assessment, organizations can reduce outages and costs related to changes.
The integration between ITSM and AIOps tools automates identification of risky changes by analyzing risk information from the service history and operational data in a single pane of glass. AI models correlate past changes and determine their impact on operational variables such as service availability and health. This information decreases time spent on change requests by helping teams quickly understand the risk factors and the scope of impact by using powerful service dependency maps from AIOps tools. This AI-driven assessment also provides great feedback to DevOps and SRE teams, enabling them to deploy faster and with greater confidence.
What does this look like in practice? Here’s an example of a database upgrade request. Based on the risk assessment using past situations and the current service health status, you’ll know the risk of deploying this change or upgrade in your environment.
This visibility into the real-time deployment landscape, operations metrics and more within the change window helps DevOps teams know what’s being changed, what will be impacted, if there are any collisions and if those collisions need to be scheduled to avoid changing issues at the same time.
Having visibility into both service and operational data provides opportunities to automate some of these decisions. For example, if the change isn’t a risk and there are no infrastructure changes happening at the same time, you can let the developer release it to avoid slowing them down. If it’s risky, have someone look at it.
What’s Next? Unlocking the Potential of GenAI in ServiceOps
Generative AI (GenAI) plays a powerful role in ServiceOps. For example, using predictive and causal AI can complement other forms of AI to predict the risk of change, and a conversational user interface can enable shared assessment.
Using GenAI as the interface layer during change risk assessment offers many benefits, such as:
- It makes complex risk insights accessible to different cross-functional team members, including site reliability engineers (SREs), DevOps, developers, service managers and change managers who have different familiarity levels with the operational environment.
- It accelerates the change risk assessment process by enabling people to ask and answer questions to help make well-informed decisions.
- It increases organizational confidence in DevOps by equipping DevOps teams with recommendations for risk mitigation.
A conversational interface for change risk assessment can make risk insights understandable and actionable for teams tasked with delivering high-quality software rapidly.
Imagine giving teams tasked with approving software changes access to a chat-based interface for asking questions and getting answers tailored to the specific environments where their software will be deployed. They could get answers to questions like, “What are the risky changes?” and “Can I look at change collisions?”
The pace of change driven by DevOps presents significant challenges to IT service and IT operations teams. Both need to accelerate change without risking downtime. Regardless of differences in charter, scope and skill sets, more and more service and operations teams are coming together to redefine service excellence and business alignment.
ServiceOps is a smart way for teams to work together to deliver and maintain excellence in service delivery. Enabled by automation and AI, ServiceOps is a new service delivery operating model that cuts costs, reduces downtime risks and increases IT productivity and efficiency.