Intelligent Machine Monitoring
Artificial Intelligence (AI), and in particular its branch Machine Learning, is certainly making its way in the world. Technologies such as Voice Recognition, Face Recognition, Predictive Analytics, Self-driving cars, and Robotics are becoming embedded in our society. With the advent of big data, these technologies are growing ever more powerful and ever more a part of our everyday lives. There is, of course, much controversy over this, and many people consider it invasive. For instance, many people want to always drive their own cars. Will technology ever get to the place where it is common that we as human beings never drive our own cars, and our cars always drive us? Are we in essence giving up our humanness to be run by machines, much the way the movie Terminator depicted? Maybe we are, and maybe we aren't. Maybe it depends on how responsibly people use AI. I suppose time will tell.
I would, however, like to discuss a use of AI that I believe most people would agree benefits society and even has a freeing aspect to it. It has to do with machines monitoring other machines. You see, machines, especially computers, have become so complex that having human beings monitor them for issues is nearly impossible. And even if it were possible, doing so is in essence dehumanizing.
Let's take middleware (also called integration infrastructure) as an example. To tell whether middleware is behaving properly, metrics need to be monitored, and there are thousands of them. Who wants to sit around all day long watching a bunch of metrics? How boring is that? That is why middleware monitoring tools came into being. Historically, monitoring tools have watched metrics and alerted when they exceed specified thresholds, and they have done a very good job at this. AI, however, can take monitoring to an entirely new level, because AI can account for natural fluctuations in metrics. An AI system learns over time what normal fluctuations look like and adjusts for them. Conventional systems can account for fluctuations too, but only if a person programs the fluctuations into the system or creates parameters to describe them; and if the fluctuations ever change, that programming or those parameters need adjusting. AI systems learn on their own using statistical models, so no person ever has to account for the fluctuations.
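To make the contrast concrete, here is a minimal sketch of the conventional, fixed-threshold approach. The metric value and threshold are purely illustrative and not taken from any real monitoring product:

```python
# Conventional monitoring: a person picks a fixed threshold up front.
# Illustrative numbers only; real thresholds would come from a monitoring config.

STATIC_THRESHOLD = 50_000_000  # bytes/sec, set by an administrator

def check_metric(value: float) -> None:
    """Alert whenever the metric crosses the fixed threshold."""
    if value > STATIC_THRESHOLD:
        print(f"ALERT: metric at {value:,.0f} exceeds {STATIC_THRESHOLD:,}")

check_metric(62_000_000)  # fires, even if this is normal for peak hours
check_metric(1_500_000)   # silent, even if this is abnormal for quiet hours
```

The weakness shows immediately: the threshold knows nothing about time of day or workload, so it either fires on normal peaks or stays silent through quiet-hour anomalies.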
Let's take Kafka's 'BytesInPerSec' metric as an example. This metric tracks incoming network throughput on brokers and can be used to analyze network traffic and decide when additional resources are needed to increase throughput. BytesInPerSec fluctuates as network traffic changes during the day. For instance, at 2:00 PM there may be very high traffic, whereas at 2:00 AM traffic may be almost nothing. Using statistical models, AI would learn this behavior over time. Then, if something out of the ordinary occurs that does not fit the pattern, the AI can flag the occurrence as an 'anomaly'.
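Here is a minimal sketch of that idea, assuming a simple per-hour baseline and a 3-sigma cutoff. A production system would use far more sophisticated statistical models, and the training data below is simulated:

```python
from collections import defaultdict
from statistics import mean, stdev

history = defaultdict(list)  # hour of day -> metric values seen at that hour

def observe(hour: int, value: float) -> None:
    """Record a routine observation so the baseline keeps learning."""
    history[hour].append(value)

def is_anomaly(hour: int, value: float, sigmas: float = 3.0) -> bool:
    """Flag values more than `sigmas` standard deviations from that hour's mean."""
    samples = history[hour]
    if len(samples) < 2:
        return False  # not enough history to judge yet
    mu, sd = mean(samples), stdev(samples)
    return sd > 0 and abs(value - mu) > sigmas * sd

# Simulated training: heavy traffic at 2:00 PM, almost none at 2:00 AM.
for day in range(30):
    observe(14, 60_000_000 + day * 10_000)  # ~60 MB/s afternoons
    observe(2, 1_000_000 + day * 5_000)     # ~1 MB/s overnight

print(is_anomaly(14, 60_200_000))  # False: normal afternoon load
print(is_anomaly(2, 60_200_000))   # True: that volume at 2 AM is an anomaly
```

Notice that no one ever told the system what "normal" is; the same value is routine at 2:00 PM and an anomaly at 2:00 AM, purely because of what it learned.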
AI also has the ability, using statistics, to correlate metrics with one another. For example, it can tell which Kafka metrics are affecting overall system performance the most. This can in turn help System Administrators make adjustments to maximize system performance.
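As a sketch of what this could look like, the snippet below ranks a few broker metrics by their Pearson correlation with an overall latency series. The sample values are invented for illustration, and this is one simple statistical approach, not Nastel's actual method:

```python
from statistics import correlation  # available in Python 3.10+

# Overall performance indicator: end-to-end latency samples (ms).
latency = [12, 15, 31, 28, 45, 22, 18, 40]

# A few broker metrics sampled at the same moments (illustrative values).
metrics = {
    "BytesInPerSec":             [40, 45, 90, 80, 120, 60, 50, 110],
    "RequestHandlerAvgIdlePct":  [80, 78, 40, 45, 20, 60, 70, 25],
    "UnderReplicatedPartitions": [0, 0, 0, 1, 0, 0, 0, 0],
}

# Rank by absolute Pearson correlation: strongest relationships first.
ranked = sorted(
    ((name, correlation(series, latency)) for name, series in metrics.items()),
    key=lambda pair: abs(pair[1]),
    reverse=True,
)
for name, r in ranked:
    print(f"{name}: r = {r:+.2f}")
```

Correlation alone does not prove causation, of course, but it gives an Administrator a ranked list of where to look first.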
Nastel is soon to announce its newest AI solution, 'Machine Learning for Kafka Solution'. This i2M solution will monitor Kafka metrics, alert System Administrators when anomalies occur, and, where possible, automatically correct the issue. The alert will include a visual representation of the metric over time so that the extent of the anomaly can be determined. The solution will also provide additional visual representations that give insight into how the most significant Kafka metrics affect overall system performance.
Nastel's Machine Learning for Kafka Solution will give Administrators the peace of mind of knowing that Kafka is being monitored 24x7, and that if anything goes wrong they can act proactively, before their customers are impacted. It will also assist them in fine-tuning Kafka to help maximize system performance.