How Is Machine Learning Used In AIOps?
This is a follow-up to my previous post, Intelligent Machine Monitoring.
Machine Learning isn’t perfect
When we think of computers, we typically think in terms of exactness. For example, if we ask a computer to do a numeric calculation and it gives us a result, we trust that the result is correct. And if we write an algorithm and it gives an incorrect result, we know we have coded it improperly and it needs to be corrected. This exactness, however, does not carry over to Machine Learning. In fact, it is par for the course that a Machine Learning model will be incorrect some percentage of the time.
When it comes to Machine Learning, words and phrases such as non-algorithmic processing, perception, fuzzy, imprecise knowledge, and reasoning have been used. Statistics is the basis for much of Machine Learning, and statistics deals in probabilities rather than certainties. Neural nets are also a big part of Machine Learning. Neural nets are layered networks of interconnected nodes, loosely modeled on the neurons and synapses found in the brain, that can recognize relationships in vast amounts of data. And just like a brain, when a neural net recognizes and uses those relationships to draw conclusions, it’s not always right.
So when implementing Machine Learning, a company’s goal is to take an inexact science and get as much exactness out of it as it can. Okay, so how do companies do this? How do companies implement Machine Learning, and how do they get it to reach a high degree of accuracy?
It’s all about “The Models”
Just like teaching a child, when working with Machine Learning we need to teach the computer. We do this by taking as much historical data as a company can muster and feeding it into a Machine Learning system. With this data, the system will build “models”.
Machine Learning models are software programs that have been trained on a set of data. Using techniques such as statistics and neural networks, models can recognize patterns in the data and draw conclusions (for example, predictions or forecasts) based on those patterns.
So the goal is to “build a model that is able to draw conclusions with a high degree of accuracy”.
Steps for building and maintaining high-accuracy models
To explain the steps used in building a high-accuracy model, we’ll use a weather example. In this example, we wish to predict extreme weather conditions: Drought, Hurricane, Tornado using: Temperature, Precipitation, Humidity, Atmospheric Pressure, Wind, Cloud Cover, and Date.
Business Understanding
We first need to understand the business and the problem we are trying to solve with the model.
Example: Please be a weather expert or consult one before attempting to build a weather model.
Data Understanding
We need a very clear understanding of the data that will be used to build the model. For instance, what pieces of data identify the problem we are solving? Let’s refer to this data as “target” data. What pieces of data affect the target data? Let’s refer to this data as “independent variables”. What pieces of data are we sure are irrelevant to the problem we are solving?
Example: Understand what affects extreme weather. Things like temperature, precipitation, and humidity affect it. Traffic does not. In our weather example, extreme weather is the target, with values of drought, hurricane, and tornado. Temperature, Precipitation, Humidity, Atmospheric Pressure, Wind, Cloud Cover, and Date are the independent variables that affect the target.
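To make the distinction between target, independent variables, and irrelevant data concrete, here is a minimal sketch in Python. The column names mirror the weather example; the sample record, the "traffic" column, and the helper split_record are made up for illustration.

```python
# Column names follow the weather example; the sample values are invented.
TARGET = "extreme_weather"
INDEPENDENT_VARIABLES = [
    "temperature", "precipitation", "humidity",
    "atmospheric_pressure", "wind", "cloud_cover", "date",
]


def split_record(record):
    """Return (features, target), keeping only the independent variables."""
    features = {name: record[name] for name in INDEPENDENT_VARIABLES}
    return features, record[TARGET]


record = {
    "date": "2022-08-09", "temperature": 31.5, "precipitation": 0.0,
    "humidity": 0.22, "atmospheric_pressure": 1012.0, "wind": 14.0,
    "cloud_cover": 0.1,
    "traffic": "heavy",            # irrelevant to the weather, so it is dropped
    "extreme_weather": "drought",  # the target we want to predict
}

features, target = split_record(record)
```

After the call, features holds only the seven independent variables and target holds the label, so the irrelevant traffic value never reaches the model.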
Data Preparation
This is usually the most challenging step in building a high-accuracy model. For instance, data may reside in several data sources. To build a model, the data must be combined into a single source. Also, it needs to be stored in such a way that the model can work with it. For instance, a model can’t be properly built when the independent variables that affect the target can’t be associated with the target.
Example: If temperature data is in one table, humidity in a second table, wind in a third table, and so on, then this data must be combined. Also, the wind and temperature from 8/9/22 should not be combined with the cloud cover and humidity from 8/10/22.
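One way to picture this step is as a join on the date. The sketch below assumes each source table is a plain dictionary keyed by date; the table contents and the helper join_by_date are hypothetical, and a real pipeline would more likely use a database join or a dataframe library.

```python
def join_by_date(*tables):
    """Inner-join tables shaped like {date: {column: value}} on the date key."""
    shared_dates = set(tables[0])
    for table in tables[1:]:
        shared_dates &= set(table)  # keep only dates present in every table

    joined = {}
    for date in sorted(shared_dates):
        row = {"date": date}
        for table in tables:
            row.update(table[date])  # values are merged only within one date
        joined[date] = row
    return joined


temperature = {"2022-08-09": {"temperature": 31.5},
               "2022-08-10": {"temperature": 29.0}}
humidity = {"2022-08-09": {"humidity": 0.22},
            "2022-08-10": {"humidity": 0.35}}
wind = {"2022-08-09": {"wind": 14.0}}  # no wind reading for 2022-08-10

combined = join_by_date(temperature, humidity, wind)
```

Because the join is keyed on the date, readings from different days can never end up in the same row, and a day with a missing source (here, wind on 2022-08-10) is left out rather than filled with another day's value.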
Modelling
Before getting into the details of this step, first understand that there are different “categories” of models: models with numeric targets (regression), models with targets that need to be classified (classification), and time-series models where we forecast a target into the future. Within each category there are several different “types” of models. To name a few: linear regression, support vector machine, random forest, neural networks (for example, built with TensorFlow), naive Bayes, Prophet, and extreme gradient boosting.
In this step, the data first gets split into a training set and a testing set. Typically, a 70/30 or 80/20 split is fine. Then we create one model of each type that suits the category of the target, using the same training and testing data for all of them.
Example: If we are predicting an extreme weather condition, the category of model is classification, and random forest, naive Bayes, extreme gradient boosting, and neural networks (for example, built with TensorFlow) are some of the types of models we can build.
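The split itself can be sketched in a few lines of Python. The function name train_test_split, the seed, and the placeholder rows are assumptions made for illustration; most Machine Learning libraries ship an equivalent helper.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and hold back test_fraction of them."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (training set, testing set)


rows = list(range(100))  # stand-in for 100 prepared data rows
train, test = train_test_split(rows, test_fraction=0.2)
```

With 100 rows and test_fraction=0.2 this gives an 80/20 split. For time-series models, a chronological split (train on the older data, test on the most recent) is usually a better fit than a random shuffle.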
Evaluation
In this step we analyze the results of the different types of models that we built in the last step. We pass the testing data through each model and see how well it performs. The type of model that has the highest accuracy is then chosen.
Example: The 20 to 30 percent of data that was held back for testing would be passed through the random forest, naive Bayes, extreme gradient boosting, and neural network classification models we built in the last step. The model that performs the best will be chosen.
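The comparison can be sketched as follows. The two "models" here are toy rule functions invented for illustration, not real trained models; the point is only the shape of the evaluation loop, which scores every candidate on the same held-back rows.

```python
def accuracy(model, test_rows):
    """Fraction of test rows for which the model predicts the true label."""
    correct = sum(1 for features, label in test_rows if model(features) == label)
    return correct / len(test_rows)


def pick_best(models, test_rows):
    """Score every candidate on the same test rows and return the winner."""
    scores = {name: accuracy(model, test_rows) for name, model in models.items()}
    best_name = max(scores, key=scores.get)
    return best_name, scores


# Toy stand-ins for trained models (hypothetical rules, for illustration only).
def always_drought(features):
    return "drought"


def wind_rule(features):
    return "hurricane" if features["wind"] > 100 else "drought"


test_rows = [
    ({"wind": 12}, "drought"),
    ({"wind": 150}, "hurricane"),
    ({"wind": 20}, "drought"),
    ({"wind": 130}, "hurricane"),
]

best_name, scores = pick_best(
    {"always_drought": always_drought, "wind_rule": wind_rule}, test_rows
)
```

Here wind_rule scores 1.0 and always_drought scores 0.5, so wind_rule is chosen. When the target is rare, as extreme weather is, accuracy alone can flatter a model that always predicts the common case, so it may be worth checking measures such as precision and recall as well.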
Deployment
After a model is chosen and put into production, it is very important to monitor it. Its performance may degrade, in which case it will need to be re-trained. Degradation can happen because data changes over time. An alternative to monitoring is simply to re-build the models on a schedule.
Example: Every month, we repeat the last two steps (Modelling and Evaluation) with the latest data. Most likely the same type of model will be chosen. However, if the data has changed significantly, it’s possible that a new type of model will be the most accurate. But whether or not we have a new type of model, the important thing is that we re-trained with a new month’s worth of data. The new model, trained with this additional data, should reflect current conditions better than the prior one.
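Monitoring for degradation can be as simple as tracking accuracy over the most recent predictions. The sketch below is one possible approach under assumed settings; the class name AccuracyMonitor, the window size, and the threshold are all hypothetical choices.

```python
from collections import deque


class AccuracyMonitor:
    """Track accuracy over the most recent predictions of a deployed model."""

    def __init__(self, window=100, threshold=0.9):
        self.outcomes = deque(maxlen=window)  # True/False per prediction
        self.threshold = threshold

    def record(self, predicted, actual):
        self.outcomes.append(predicted == actual)

    def accuracy(self):
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def needs_retraining(self):
        # Only judge the model once the window is full.
        window_full = len(self.outcomes) == self.outcomes.maxlen
        return window_full and self.accuracy() < self.threshold


monitor = AccuracyMonitor(window=4, threshold=0.75)
observations = [
    ("drought", "drought"),
    ("drought", "drought"),
    ("drought", "tornado"),
    ("drought", "hurricane"),
]
for predicted, actual in observations:
    monitor.record(predicted, actual)
```

In this run the model got two of the last four predictions right, so accuracy is 0.5, which is below the 0.75 threshold, and needs_retraining() returns True.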
How Nastel builds and maintains high-accuracy models
Nastel specializes in Integration Infrastructure Management (i2M). This makes us very good at monitoring middleware. So with regard to understanding the business and the problem we are trying to solve, we understand our clients’ challenges when it comes to proactively addressing middleware issues before the issues become customer-facing.
Nastel understands middleware monitoring metrics (data). We understand which metrics are most critical to our clients. We created a Machine Learning platform as part of Nastel XRay that builds timeseries, numeric and classification models for middleware metrics. The timeseries models can forecast how a metric should behave in the future based on its prior history. If the actual value of the metric differs from the forecasted value, we flag that event as anomalous and automatically notify personnel so that the problem can be addressed before escalating. We also build non-timeseries models that indicate how the middleware metrics are affecting overall system performance.
Nastel’s Machine Learning platform takes streamed middleware metrics and formats the data so that related values are associated with one another and multiple models can be built. We then evaluate the models and choose the one with the highest accuracy. All of this functionality is automatic. Our clients simply need to install the Machine Learning Solution and kick it off after enough historical streamed data has been collected. The solution will discover the metrics, format the data, build the models, run anomaly detection, and alert users to potential problems, all automatically.
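The general idea of flagging a metric that strays from its forecast can be illustrated generically. This post does not describe how Nastel XRay computes its forecasts or thresholds, so the sketch below is not that implementation: it assumes a deliberately naive forecast (the mean of recent history) and a tolerance of k standard deviations, and the metric name and values are invented.

```python
import statistics


def is_anomalous(history, actual, k=3.0):
    """Flag `actual` if it is more than k standard deviations from a naive forecast.

    The forecast is simply the mean of the history; a real timeseries model
    would also account for trend and seasonality. Needs at least two points.
    """
    forecast = statistics.mean(history)
    spread = statistics.stdev(history)
    return abs(actual - forecast) > k * spread


# Invented readings for a hypothetical queue-depth metric.
queue_depth_history = [100, 102, 98, 101, 99, 100, 103, 97]

normal = is_anomalous(queue_depth_history, 104)
spike = is_anomalous(queue_depth_history, 150)
```

The history has a mean of 100 and a sample standard deviation of 2, so with k=3.0 anything more than 6 away from 100 is flagged: a reading of 104 is not anomalous, while 150 is.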
The new release of Nastel XRay includes specific models focused on AIOps for Apache Kafka use cases and its derivatives.
Conclusion
Machine Learning is complex. Building highly accurate models takes considerable time and a specialized understanding of the problem domain. So the companies that do the best with Machine Learning are those that understand the problem domain and understand how to properly implement Machine Learning. Because Nastel is a leader in the middleware domain and because we have a solid understanding of how to properly work with Machine Learning, we can assist our clients in using this wonderful technology to proactively identify and resolve their middleware issues.