Why You Need APM, and How It Works
There’s a lot to consider when engineering and implementing software, whether as an update patch or a newly introduced product. End users have certain expectations of new or updated software. At the top of the list are aesthetics, ease of use, stability, and response time, and the last two can be significantly improved when you employ application performance management, or APM.
What Does APM Do?
The primary purpose of APM is to improve response time between a source application and its final destination and, in many cases, back to the source. The source and destination may vary. In one situation, the source and the destination may both be a mobile game app, such as Jem Junkies, a fictional puzzle game.
In another situation, this time with fictional online auction site BiddingBuddies.com, the initial source is a web page. The initial destination is the server on which all Bidding Buddies data resides. In this case, the source and destination change, depending on which entity is sending and receiving the data.
The two situations are dramatically different, but their ultimate objectives are very similar: to give end users the best possible experience so they’ll come back to play or bid again.
A key component of maintaining the end user’s satisfaction is response time. If an app is slow, or if an auction site continually hoses the bidder out of a final bid because the site is slow to respond, end users will move on. Nobody wants that. APM is put in place to make each system more agile and quicker to respond.
APMs can employ either passive or active monitoring, and the monitoring method used depends on the needs of the software.
APM Passive Monitoring Watches — But Doesn’t Fix
You’re playing Jem Junkies, your favorite puzzle game, on your phone. You’ve reached level 36, and you go to click on a gem—and nothing happens. Just as the gem finally lights up as if to say, “You can do what you want with me now!” your board blows up, because your timer ends a split second before you can move it. This isn’t the first time this has happened, and you’ve been stuck on level 36 for far too long.
Now, Jem Junkies doesn’t maintain a continuous endpoint connection with user devices. It’s a game that, once installed, remains resident on the user’s device and occasionally receives an update. For the player of this game, the response time depends entirely on the app’s interactions with the device and the end user.
The APM for this game is pretty much self-contained within the app itself, with only occasional interaction with Jem Junkies headquarters. When the user agrees to the EULA, the app receives permission to send data to the mothership for purposes of improving and updating the user’s gaming experience. Once the user gives their permission, an additional module is installed along with the game itself.
Each time you open Jem Junkies, the installed module tracks each game session, taking note of things like device specs, device location, and the username that’s tied to the game. The module monitors activity like how long you played, which level you reached, how many times you play each day, and what time of day you most often play. The module holds onto this information, and once you open the game again to play, it uploads the information of the last play session to Jem Junkies headquarters, where all of the data from all of the users who play Jem Junkies is aggregated. This information can be called up into a database and arranged into human-readable tables for QA techs to peruse.
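To make the hold-then-upload behavior concrete, here is a minimal sketch of how such a module might work. Everything in it is an assumption made for illustration: the class and field names are invented, the record is kept in a single local JSON file, and `upload` stands in for whatever transport the real app would use to reach headquarters.

```python
import json
from dataclasses import dataclass, asdict
from pathlib import Path
from typing import Callable, Optional


@dataclass
class SessionRecord:
    """One play session as the module might record it (illustrative fields)."""
    username: str
    device_model: str
    os_version: str
    started_at: float        # Unix timestamp when the session began
    duration_seconds: float
    highest_level: int


class PassiveSessionMonitor:
    """Keeps the last session on disk and uploads it on the next launch."""

    def __init__(self, storage_path: Path, upload: Callable[[dict], bool]):
        self.storage_path = storage_path
        self.upload = upload  # hypothetical transport; returns True on success

    def on_game_open(self) -> Optional[dict]:
        """Upload the previous session if one is waiting, and return it."""
        if not self.storage_path.exists():
            return None
        pending = json.loads(self.storage_path.read_text())
        if self.upload(pending):
            self.storage_path.unlink()  # discard only after a confirmed upload
            return pending
        return None

    def on_game_close(self, record: SessionRecord) -> None:
        """Hold on to the finished session until the next launch."""
        self.storage_path.write_text(json.dumps(asdict(record)))
```

Note that this sketch keeps only one pending session at a time, so a failed upload followed by another play session would overwrite the older record. A real module would more likely queue several sessions.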
Through painstaking scrutiny provided by a pattern recognition algorithm, QA can discern that an abnormally high percentage of users who play the game on a particular smartphone with a particular operating system seems to stop playing at level 36. QA also notices that these same users open the app repeatedly to level 36 but can’t seem to get past it.
To their dismay, QA further notices that a good 25 percent of these same users haven’t opened the app in three days or more. They seem to have given up and moved on to another game, which Jem Junkies cannot track. The marketing team is not going to like this.
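The article doesn’t say how the pattern recognition works, so the following is only a much simpler stand-in: a sketch that groups players by phone and operating system and measures what share of each group is parked on a given level. The function names, the tuple layout, and the threshold are all assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (username, device_model, os_version, highest_level_reached)
PlayerRow = Tuple[str, str, str, int]
DeviceKey = Tuple[str, str]


def stall_rates(rows: Iterable[PlayerRow], level: int) -> Dict[DeviceKey, float]:
    """Per (device, OS), the fraction of players whose highest level is `level`."""
    totals: Dict[DeviceKey, int] = defaultdict(int)
    stuck: Dict[DeviceKey, int] = defaultdict(int)
    for _username, device, os_version, highest_level in rows:
        key = (device, os_version)
        totals[key] += 1
        if highest_level == level:
            stuck[key] += 1
    return {key: stuck[key] / totals[key] for key in totals}


def flag_outliers(rates: Dict[DeviceKey, float], threshold: float) -> List[DeviceKey]:
    """Device/OS pairs whose stall rate meets or exceeds the chosen threshold."""
    return sorted(key for key, rate in rates.items() if rate >= threshold)
```

Calling `flag_outliers(stall_rates(rows, 36), 0.5)` would list every phone and operating system combination where at least half the players are sitting on level 36. The cutoff of 0.5 is arbitrary; QA would tune it against the rate seen on other devices.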
The company immediately gets game testers on it and soon finds out that when a player plays the game on one particular phone with one particular operating system, about midway through level 36, whatever gem lands in column 3, row 12 has a tendency to get stuck. The gem will light up, but it will not move when the user attempts to drag it across the screen, which is the whole point of the game!
It turns out that the issue wasn’t that the level was unwinnable. It was just that the level became unplayable.
Testers inform the development team of the issue with the particular phone brand and operating system. Developers scour through the code and find the culprit: a misplaced semicolon. Had the semicolon been placed one character to the right, this level would have been easily winnable.
A programmer fixes the bug, pushes an update to players on the affected phone and operating system, and you’re finally able to move that gem and get out of level 36.
This example highlights passive monitoring. The module collects data as the user plays the game, then sends the data to a location where the information is parsed and interpreted by someone else. Although the app provides information crucial to diagnosing and repairing issues, all diagnostics and fixes are left to the software’s QA and development teams to figure out.
Active Monitoring Can Diagnose and Fix Many Problems
We can almost feel the dismay and defeat that you experienced when, in the last two seconds of a Bidding Buddies war, you missed out on that oversized Elvis face pillow because your final click—which you made just a moment before the timer ended—didn’t register until two seconds after the timer was done.
You can blame your home network, which is having another bad hair day, or you could blame biddingbuddies.com for the lag they experience on the reg because they have a substandard or non-existent APM system.
Bidding Buddies gets wind of the frequent lagging problem and decides to employ an automated monitoring service that inserts virtual bots into its system. The bots immediately begin to follow the flow of the site’s traffic and report errors and anomalies back to a bank of monitors, attended by a systems administrator or some other tech-savvy interweb guru. This data can reveal any number of pain points within the auction site’s internal system.
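One way to picture a single bot is as a timed probe: it performs a scripted request, records how long the request took, and reports anything that failed or ran slow. The sketch below assumes exactly that and nothing more. `ProbeResult`, `run_probe`, and `is_anomaly` are made-up names, and the request itself is passed in as a plain callable so the example doesn’t depend on any real Bidding Buddies endpoint.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ProbeResult:
    name: str
    elapsed_seconds: float
    error: Optional[str]  # None when the request completed


def run_probe(name: str, request: Callable[[], object]) -> ProbeResult:
    """Time one scripted request and capture any exception it raises."""
    start = time.perf_counter()
    error = None
    try:
        request()
    except Exception as exc:
        error = f"{type(exc).__name__}: {exc}"
    elapsed = time.perf_counter() - start
    return ProbeResult(name, elapsed, error)


def is_anomaly(result: ProbeResult, max_seconds: float) -> bool:
    """A probe is worth reporting if it failed or exceeded the time budget."""
    return result.error is not None or result.elapsed_seconds > max_seconds
```

In practice the callable might wrap an HTTP request to a test auction page, and the results would be shipped to whatever dashboard the admins watch. Both of those pieces are left out here.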
Although many APM products have onboard tools integrated into the overall system that can execute improvement and repair functions with little to no human intervention, such systems can be quite spendy, so Bidding Buddies decides to go with just the monitoring function and leave it up to the systems administrators to come up with the solutions.
With APM systems that utilize this type of active monitoring structure, admins can quickly identify, locate, and diagnose issues that can affect the response time performance of their entire architecture. They can keep watch on the flow of information between their database and their customer-facing app and locate patterns in traffic flow.
For instance, Bidding Buddies admins find that they experience a midday increase in traffic as people across the country spend their lunch break engaging in a virtual battle for the right to possess random knick-knacks, baubles, and gewgaws. Admins are then able to identify a corresponding slowdown: the response times between end users’ work computers and the Bidding Buddies servers become longer and longer until around 1:00 PM Bidding Buddies time, at which point traffic begins to fall off and response times improve once more.
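Spotting a pattern like the lunchtime slowdown mostly comes down to grouping response times by time of day. Here is a rough sketch under the assumption that each measurement arrives as an hour of the day paired with a latency in milliseconds. The function names and that data shape are illustrative only.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def mean_latency_by_hour(samples: Iterable[Tuple[int, float]]) -> Dict[int, float]:
    """Average the latency samples (hour_of_day, latency_ms) for each hour."""
    by_hour: Dict[int, List[float]] = defaultdict(list)
    for hour, latency_ms in samples:
        by_hour[hour].append(latency_ms)
    return {hour: mean(values) for hour, values in sorted(by_hour.items())}


def slowest_hour(averages: Dict[int, float]) -> int:
    """Return the hour with the highest average latency."""
    return max(averages, key=averages.get)
```

An average is the simplest summary to show. Admins chasing occasional stalls might prefer a high percentile, since a handful of very slow bids can hide inside a healthy-looking mean.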
APM that employs active monitoring allows admins to quickly determine when certain load thresholds slow down the internal network. With this knowledge, they know that they may need to request additional servers to handle the high-traffic overflow. Alternatively, the admins may decide that their server capacity is just right and that the bottleneck is in their routing, in which case they can add physical switches or implement automated redirects when the traffic load gets too high.
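The threshold idea and the automated redirect can be sketched in a few lines. This is a simplified illustration, not how any particular APM product does it: the observations are assumed to be pairs of concurrent users and latency in milliseconds, and the pool names "primary" and "overflow" are invented.

```python
from typing import Iterable, Optional, Tuple


def load_threshold(
    observations: Iterable[Tuple[int, float]], max_latency_ms: float
) -> Optional[int]:
    """Smallest observed load whose latency exceeded the limit, or None."""
    over = [load for load, latency_ms in observations if latency_ms > max_latency_ms]
    return min(over) if over else None


def choose_pool(current_load: int, threshold: Optional[int]) -> str:
    """Redirect to the overflow pool once load reaches the known threshold."""
    if threshold is not None and current_load >= threshold:
        return "overflow"
    return "primary"
```

Taking the smallest offending load is deliberately cautious. A single noisy measurement could set the threshold too low, so a real system would likely want several slow readings at a given load before acting on it.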
The active monitoring system Bidding Buddies employs searches out patterns that can cause a breakdown in application performance. This type of active monitoring service can diagnose many problems—and with the right integration, it can also improve performance by actively repairing some of those breakdowns with little or no need for human intervention.
In the case of Bidding Buddies’ lunchtime slowdown, admins were able to pin down a temperature spike that approached dangerous thermal levels on a specific bank of servers at a certain time of day. This information helped them trace the rising heat levels to Phil from accounting, who naps in the server room at lunchtime and blocks a particularly critical AC vent with his sleeping bag.
It was Phil the whole time.
APM Saves the Day
While these are just two fairly rudimentary examples of APM, they highlight the necessity for vigilant monitoring and management of any given system’s performance. The experiences with Jem Junkies and Bidding Buddies—or very similar issues—aren’t all that uncommon. They hit mobile and PC gamers, online bidders, and purchasing agents every day.
Jem Junkies implemented their APM in short order and managed to woo back many of their lost players with a promise of 100 Jem Bucks (redeemable only in-app for powerups, of course).
Likewise, Bidding Buddies could use their APM system to finally rid themselves of that pesky Phil from accounting. Had biddingbuddies.com implemented the right APM much sooner, you might be snuggling with your Elvis pillow right now. We would offer you ours, but we won the bid fair and square.