Observability Maturity Model

Modern systems and applications span numerous architectures and technologies, and they are becoming increasingly dynamic, distributed, and modular. To support the availability and performance of their systems, IT operations and SRE teams need advanced monitoring capabilities. This Refcard reviews the four distinct levels of observability maturity, key functionality at each stage, and next steps organizations should take to enhance their monitoring practices.

Truth and Proof: Building Trust in Machines Through AIOps

IT systems are only getting more complex, with greater pressures to solve issues faster and demonstrate value consistently. Issues within systems, which dev teams could once handle all on their own, sprout up too fast and too often for direct human intervention. Artificial intelligence for IT Operations (AIOps) tools exist today to deliver automated monitoring and solution development, “no humans required” — significantly easing dev teams’ many burdens.

Adopting AIOps should be simple enough, then. But one of the tougher sticking points has been trust. Can humans trust a machine to identify root causes of issues and create accurate and effective solutions? The stakes are high — if a machine gets it wrong, the burden on human teams compounds quickly.

What Does AIOps Mean for SREs? It’s Complicated

If you’re an SRE, you might view AIOps with great excitement. By automating complex workflows and troubleshooting processes, AIOps could make your life as an SRE much easier.

Alternatively, SREs may choose to view AIOps with disdain. They might think of AIOps as just a fancy buzzword that doesn’t live up to its promises, and that can become a distraction from the SRE tools that really matter.

How Monitoring and AIOps Delivers the Ultimate DevOps Platform

When it comes to delivering software through a DevOps model, the primacy of the platform is increasingly evident. DevOps platforms are multi-tenant, self-service oriented, developer-centric, and are an essential component of a multi-cloud strategy. They provide guide rails and standardized tools and technologies for developers to build, test, and iterate with ease. A core component that must not be neglected when operating a DevOps model, however, is resilience.  

DevOps breaks down monolithic products into smaller value streams that can be delivered as independent cloud-based services. Once teams are set up to deliver under this model, it will be formalized through service level agreements (SLAs). To deliver against these, robust monitoring and alerting practices must be put in place. As with any DevOps practice, automation is the ultimate goal — and when it comes to monitoring and alerting, an AIOps platform is the gold standard. 
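To make the link between an SLA and monitoring concrete, here is a minimal sketch of how an availability target translates into a downtime allowance that alerting can be checked against. The function names, the 30-day period, and the 99.9% target are illustrative assumptions, not values from the article.

```python
def allowed_downtime_minutes(sla_percent: float, period_days: int = 30) -> float:
    """Return the downtime (in minutes) an availability SLA permits over a period."""
    if not 0 <= sla_percent <= 100:
        raise ValueError("sla_percent must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    return total_minutes * (100 - sla_percent) / 100


def sla_met(observed_downtime_minutes: float, sla_percent: float,
            period_days: int = 30) -> bool:
    """Check whether the observed downtime stays within the SLA allowance."""
    return observed_downtime_minutes <= allowed_downtime_minutes(sla_percent, period_days)


if __name__ == "__main__":
    # Hypothetical target: 99.9% availability over a 30-day period.
    budget = allowed_downtime_minutes(99.9)
    print(f"Allowed downtime: {budget:.1f} minutes")
    print("SLA met with 30 minutes down:", sla_met(30, 99.9))
```

Under these assumptions, a 99.9% target leaves roughly 43 minutes of downtime in a 30-day period, which is the kind of threshold an alerting rule could track.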

Five Ways Developers Can Help SREs

It is not easy to be a Site Reliability Engineer (SRE). Monitoring system infrastructure and aligning it with the key reliability metrics is quite a daunting task. A software engineer's job, by contrast, is to deliver high-quality software.

Relationships between software engineers and site reliability engineers can sometimes be tricky. To begin with, developers are generally assigned to write code that goes into production. Then, there are SREs who are responsible for improving the product's reliability and performance. 

Understanding AI Ops: Part 2

Welcome, everyone, to the second part of my AIOps introduction. In Part 1, I talked about the challenges that exist today with digital transformation and how more and more automation is being used in software build and delivery areas. I also talked about the fact that there is more and more pressure on IT Operations to react quickly and deal with infra, software, config, connectivity, security, multi-cloud; the list goes on and on.

Today, I am going to consider AIOps from the vantage point of the tools used for Operations Management and how they leverage AI/ML to solve common problems in the IT management domain. I have also included a link to part two of the video I highlighted in my last blog, which imagines a future that AIOps could enable for IT Operations teams, allowing those teams to be true enablers of business outcomes.

Top 3 NLP Use Cases for ITSM

What is NLP

Natural Language Processing (NLP) is a specialized subdomain of Machine Learning concerned with interactions between humans and machines through spoken or written human language.

NLP helps process huge volumes of text that would otherwise take a human a significant amount of time to read and comprehend. Hence, many organizations use NLP to gain useful insights from their text and free-form data.
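As a very small illustration of pulling insight out of free-form text, the sketch below counts the most frequent terms across a handful of ticket descriptions. This is plain term-frequency counting, a basic text-processing step rather than a full NLP pipeline; the stop-word list, the function name, and the sample tickets are all made up for the example.

```python
import re
from collections import Counter

# A tiny, hand-picked stop-word list (an assumption for this sketch).
STOP_WORDS = frozenset({
    "the", "a", "an", "is", "to", "and", "of", "on", "in",
    "my", "i", "not", "for", "it", "after",
})


def top_terms(tickets, n=3):
    """Return the n most frequent non-stop-word terms across ticket texts."""
    counts = Counter()
    for text in tickets:
        # Lowercase the text and split it into alphanumeric tokens.
        words = re.findall(r"[a-z0-9]+", text.lower())
        counts.update(word for word in words if word not in STOP_WORDS)
    return counts.most_common(n)


if __name__ == "__main__":
    # Hypothetical ITSM ticket descriptions.
    sample_tickets = [
        "VPN is not connecting",
        "Cannot connect to VPN after password reset",
        "Password reset link expired",
    ]
    for term, count in top_terms(sample_tickets):
        print(term, count)
```

Even this crude count surfaces recurring themes in the sample (VPN problems and password resets), which hints at what more capable NLP tooling can do at scale.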

Understanding AI Ops: Part 1

Think about this question: how can you start to make use of AI/ML if you are not a developer or data scientist? How about applying these capabilities to the discipline of IT or Cloud Operations? In this two-part blog, I am going to explore the problem and the opportunity that exists in the emerging area known as AIOps, some of the tools that can help, and what I think the future looks like in this area.

My Background

My role at VMware is all about Cloud and Cloud Management. My conversations are usually with people who care about building, running, or managing applications in public and private clouds and care about everything that's required to do so. That led me to think about how AI is going to affect these people and what opportunities it creates for different roles, specifically in the IT and Cloud Ops space.

How AIOps Helps in Application Monitoring

There’s no one-size-fits-all approach regarding application monitoring, especially for companies using applications in various cloud environments. Companies are rapidly investing in microservices, mobile apps, data science programs, data ops, etc. Subsequently, they’re also integrating monitoring tools to improve domain-centric monitoring abilities.

AIOps tools help streamline the use of monitoring applications. They allow companies that need high application service levels to efficiently manage the complexities of IT workflows and monitoring tools. AIOps extends machine learning and automation capabilities to IT operations. These technologies aim to detect and resolve vulnerabilities and issues, determine operational trends, and simplify the remediation of problems that affect application performance and availability.

Common Use Cases for Observability With AIOps

“We can't build tomorrow using yesterday's tools.” - Scott McDonald

IT infrastructures have been evolving constantly and rapidly, along with Big Data. Businesses worldwide are moving from predictable and static physical systems to intuitive software resources that can reconfigure and adapt based on consumer behaviours.

Observability and AIOps: The Perfect Combination for Dynamic Environments

IT teams live in dynamic environments, and continuous integration/continuous delivery is in high demand. In these environments, DevOps and its underlying technologies, such as containers and microservices, continue to grow more dynamic and complex. Now, just like DevOps, observability has become a part of the software development life cycle.

With basic monitoring techniques, ITOps and DevOps teams lack the visibility to support the explosive growth in data volumes that arises in these modern environments, in part because manual processes cannot scale. Traditional monitoring systems focused on capturing, storing, and presenting data generated by underlying IT systems. Human operators were responsible for analyzing the resulting data sets and making necessary decisions, making the IT processes human-dependent.

What is AIOps or Artificial Intelligence for IT Operations? Top 10 AIOps Use Cases

What is AIOps

Artificial Intelligence for IT Operations (AIOps) involves using Artificial Intelligence and Machine Learning technologies along with big data, data integration, and automation technologies to help make IT operations smarter and more predictive. AIOps complements manual operations with machine-driven decisions.

Types of AIOps Solutions

At a high level, AIOps solutions are categorized into two areas: domain-centric and domain-agnostic, as defined by Gartner. Domain-centric solutions apply AIOps to a certain domain, like network monitoring, log monitoring, application monitoring, or log collection. You will often see monitoring vendors claim AIOps, but they are primarily domain-centric, bringing the power of AI to the domain they manage. Domain-agnostic solutions operate more broadly and work across domains (monitoring, logging, cloud, infrastructure, etc.); they take data from all domains and tools and learn from this data to establish patterns and inferences more accurately.

6 Best Practices to Improve Your Data Center Operations

In his article, ROI Valuation, The IT Productivity GAP, Erik Brynjolfsson states, “The critical question facing IT managers today is not, ‘Does IT pay off?’ but rather, ‘How can we best use computers?’” This is not a simple question for CTOs to answer because each data center and IT operation is unique, with a multitude of variables affecting the overall operation. Two different companies can have almost identical IT ecosystems, yet one might have a fraction of its competitor's productivity, argues Brynjolfsson. However, there are several best practices that CTOs can follow to ensure their IT operation is efficient, running within capacity, and executing as productively as possible.

1. Clean Up and Declutter 

“Cleanliness is next to godliness,” as the old saying goes, and it can also be stress-relieving when it comes to IT. Servers and networking equipment all have set lifespans, and old equipment should be decommissioned on a schedule defined by the manufacturers. Old equipment should be properly destroyed, recycled, or returned to the manufacturer, with all data wiped clean to ensure proper security.

Accelerate Incident Response and Incident Management With AIOps

Artificial Intelligence for ITOps (AIOps) can help accelerate incident response by bringing all the incident context, impact assessment, triage data, and collaboration and automation tools into one place. More specifically, AIOps can help automate root cause analysis, enrich the incident with full-stack context for impact analysis, present incident-relevant observability data (metrics, logs, and traces) in a single-pane triage dashboard, and provide built-in diagnostic commands and workflow or runbook automation that leverages integrations with RPA tools.

AIOps: What Is Incident Response?

It is standard practice in IT organizations to capture IT operational problems or issues as incidents in an IT Service Management (ITSM) system like ServiceNow, BMC Remedy, PagerDuty, Jira ServiceDesk, etc. A majority of these incidents are created directly by monitoring tools, or by an AIOps platform that the monitoring events are fed through for event correlation and alert noise reduction. IT users and stakeholders (like LoB users, managed service customers, etc.) can also report IT problems (via phone or portal), which get recorded as incidents in the ITSM system.
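The idea of event correlation and alert noise reduction can be sketched with a simple time-window rule: alerts for the same host and check that arrive close together are folded into one incident instead of opening a ticket each. This is only an illustrative heuristic, not how any particular AIOps platform or ITSM system works; the `Alert` and `Incident` classes, the `correlate` function, and the 300-second window are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    timestamp: float  # seconds since some reference point
    host: str
    check: str


@dataclass
class Incident:
    host: str
    check: str
    first_seen: float
    last_seen: float
    alert_count: int = 1


def correlate(alerts, window_seconds=300.0):
    """Fold alerts sharing (host, check) into one incident when each alert
    arrives within window_seconds of the previous alert for that key."""
    incidents = []
    latest_by_key = {}  # most recent incident per (host, check)
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        key = (alert.host, alert.check)
        current = latest_by_key.get(key)
        if current is not None and alert.timestamp - current.last_seen <= window_seconds:
            # Same problem still firing: extend the existing incident.
            current.last_seen = alert.timestamp
            current.alert_count += 1
        else:
            # First alert for this key, or the previous burst has gone quiet.
            current = Incident(alert.host, alert.check,
                               alert.timestamp, alert.timestamp)
            incidents.append(current)
            latest_by_key[key] = current
    return incidents


if __name__ == "__main__":
    # Hypothetical alert stream from a monitoring tool.
    stream = [
        Alert(0, "web-1", "cpu"),
        Alert(60, "web-1", "cpu"),
        Alert(120, "web-1", "cpu"),
        Alert(30, "db-1", "disk"),
        Alert(1000, "web-1", "cpu"),
    ]
    for incident in correlate(stream):
        print(incident)
```

In this sample, five raw alerts collapse into three incidents, since the three closely spaced `cpu` alerts on `web-1` are treated as one problem. Real platforms use far richer signals (topology, text similarity, learned patterns), so treat the fixed window as a stand-in.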

Trends Influencing DevOps/DevSecOps Adoptions

Essentially a set of practices coupling software development (Dev) and information technology operations (Ops), DevOps is the combination of employees, methods, and products to allow for perpetual, seamless delivery of quality and value to the clients.

Adding to a set of DevOps practices, a DevSecOps approach provides multiple layers of security and reliability to the clients by integrating highly secure, robust, and dependable processes and tools into the work cycle and the product.

Trends Predicted in 2020 to Shape Autonomous Digital Enterprises

DevOps Trends 2020

The DevOps market generated two and a half billion dollars in profit in 2017, and it is estimated to make a profit of nearly $7 billion by 2022. These statistics show the growth of this technology trend. According to Statista, a German online market portal, the DevOps market had grown into a 3.5 billion dollar industry by 2018. DevOps.com has predicted the key trends that will shape the Autonomous Digital Enterprise.

DevOps Services

The services include faster and more efficient management of software mechanisms. Teams first create a plan to develop the software, then implement the plan by building and testing the software, and finally operate the software through deployment and monitoring. Since its inception in the technological world in 2014, the DevOps market has expanded rapidly. According to the sources, many government organizations and private companies use DevOps services to speed up their services and improve their administrative agility. With the recent inclusion of Artificial Intelligence and IoT (Internet of Things), IMARC Group, one of the finest software groups, has predicted that the market value of DevOps services will be 10.5 billion US dollars by 2024.

AIOps: The Solution for Software Change Management to Save Millions a Year?

Problem, Legacy Change Management

We have all probably been in the situation where making the actual software change takes just a few hours, but getting approval to go live takes almost two weeks!

It is the story of a painful go-live experience for an engineering team that is ready to release its changes but must go through lengthy organisational red tape to get 10 approvals, filling in the same forms with information that may never even be used. Because change management is not fully aware of what is in the change, and because it receives so many of these requests, the tendency might be to decline some of them with no clear reason given to the team, until it is convinced that the change is safe after a lot of back and forth, and sometimes just by seeing how committed you are to going live!