Application monitoring principles – Why, when, what!

Valerio Barbera
Why, when, and what to monitor in your application

A lot of confusion has arisen in the world of monitoring, probably because so many different kinds of data can be used in so many different ways. At first, it's not easy for developers to find the combination that solves emergencies efficiently while also improving everyday work. In this article, I share what I learned while trying to clarify:

  • When, or in which situations, monitoring can be effective;
  • Why you should monitor certain parts of your system and not others, based on your stage of growth; and
  • What is the right tool for each specific monitoring problem.

Hi, I’m Valerio, a software engineer from Italy and chief technology officer (CTO) at Inspector.

Solving customers' critical problems can generate great business opportunities, but it also means you need to be ready for very high customer expectations.

To serve these customers and seize these opportunities, I quickly realized that I had to automate most of the daily activities that were eating up my time and hurting my productivity.

In more than ten years as a software engineer, I have spent a lot of time selecting the set of tools that would improve my productivity the most.

What application monitoring tools are

Application monitoring tools generally consist of two parts: the agent and the analytics platform.

The agent is a software package that developers install on their servers or in their applications (depending on how the agent is designed). Its goal is to collect relevant information about application behavior and performance.

The agent sends this information to a remote platform, which analyzes the data and generates visual charts that help developers easily understand what's happening in their system. The platform should also alert developers through convenient channels when something goes wrong.
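To make the agent/platform split concrete, here is a minimal sketch of the agent side in Python. The `Agent` class, its `record` and `flush` methods, the batch size, and the `transport` callable are hypothetical names chosen for illustration; a real agent would typically send each batch over the network to the platform's ingestion endpoint instead of handing it to an in-memory function.

```python
import json
import time
from typing import Any, Callable, Dict, List


class Agent:
    """Hypothetical monitoring agent: buffers observations and ships them in batches."""

    def __init__(self, transport: Callable[[str], None], batch_size: int = 50) -> None:
        self.transport = transport  # assumption: in a real agent, a network call to the platform
        self.batch_size = batch_size
        self.buffer: List[Dict[str, Any]] = []

    def record(self, name: str, duration_ms: float, error: bool = False) -> None:
        # Collect one observation about application behavior.
        self.buffer.append(
            {"name": name, "duration_ms": duration_ms, "error": error, "ts": time.time()}
        )
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        # Serialize everything buffered so far and send it in a single call.
        if not self.buffer:
            return
        self.transport(json.dumps(self.buffer))
        self.buffer = []


# Usage: collect the "sent" payloads in a list instead of posting them to a server.
sent: List[str] = []
agent = Agent(transport=sent.append, batch_size=2)
agent.record("GET /users", 120.5)
agent.record("GET /orders", 340.0, error=True)  # reaches batch_size, so the batch is flushed
```

Batching is a common design choice for agents because it avoids a network round trip for every single event, which keeps the monitoring overhead on the application low. The analysis, charts, and alerts all live on the platform side, which this sketch does not cover.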

What application monitoring tools are not

This is obviously a simplistic description that could cover a huge number of tools out there.

In fact, many tools look like application monitoring tools, but they have nothing to do with application monitoring. These similarities made it difficult for me to figure out which was the right tool to solve my productivity problems.

Here is what I learned on my journey.

Log management tools

A log management tool is often the first kind of tool we turn to. Since the very beginning of the development journey, reading application logs has been one of our most important daily activities for staying informed about what happens inside the most critical parts of the application.

I was no different. But when the application started to scale (running on multiple servers, with a more complex architecture, etc.), I realized how difficult it was to extract relevant performance information from logs, and to track how code changes affected stability and resource consumption over time.

When the car was invented, people initially looked for a faster horse, because horses were what they were used to. Only later did they realize that a different tool was needed to get to the next level.

Uptime monitoring tools

Uptime monitoring tools can be described as a more sophisticated “ping.”

The main purpose is simple: the tools ping your application endpoints from multiple regions to understand how well your application can be reached by users located in various parts of the world.

This information helps you understand how well the infrastructure that delivers your application to end users (load balancer, CDN, network, etc.) is performing, and whether any of these systems is causing issues. It does not provide any information about what is going on inside your application.

In my case, our application serves users all around the world, so external ping stats help us understand which regions suffer the highest latency, and decide in which regions we should place our servers to improve the customer experience.

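The downside is that these tools only monitor the external environment. If your database slows down, an uptime check may at most show higher response times; it will never tell you that the database is the cause.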
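Here is a minimal sketch of the kind of probe an uptime monitor runs, using only Python's standard library. `check_endpoint` and `UptimeResult` are hypothetical names; a real uptime service would run a probe like this on a schedule from several regions. Note what the result contains: reachability, status code, and round-trip latency, and nothing about what happened inside the application.

```python
import time
import urllib.error
import urllib.request
from typing import NamedTuple, Optional


class UptimeResult(NamedTuple):
    up: bool
    status: Optional[int]  # None when no HTTP response was received at all
    latency_ms: float


def check_endpoint(url: str, timeout: float = 5.0) -> UptimeResult:
    """Probe a URL from the outside, the way an uptime monitor does (sketch)."""
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (4xx/5xx).
        status = exc.code
    except (urllib.error.URLError, OSError):
        # DNS failure, connection refused, timeout, and similar network errors.
        latency = (time.perf_counter() - start) * 1000
        return UptimeResult(up=False, status=None, latency_ms=latency)
    latency = (time.perf_counter() - start) * 1000
    # Assumption for this sketch: any 2xx/3xx answer counts as "up".
    return UptimeResult(up=200 <= status < 400, status=status, latency_ms=latency)
```

For example, calling `check_endpoint("https://example.com/health")` from a few different regions and comparing `latency_ms` is essentially the regional latency comparison described above. Treating every 4xx/5xx response as "down" is a simplification chosen for this sketch; some services only count 5xx responses and timeouts as downtime.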

Server vs. application monitoring

This is the hardest distinction to understand, and I never found an article that helped me clarify the separation of duties.

The application runs on a server, so the two are obviously closely related components of the system. That's why it might be confusing at first.

But server and application monitoring accomplish two completely different tasks.

Server monitoring focuses on infrastructure, and any decent cloud provider offers it essentially for free.

Google Cloud (GCP), AWS, and DigitalOcean provide the most important metrics, such as CPU usage, storage, and bandwidth, by default and at no extra cost beyond running the virtual machine (VM) itself.

Figure: server monitoring offered for free by cloud providers.

Knowing when your VMs need to scale up (or down) is essential, but a CPU at 100 percent can mean everything and nothing:

  • If your application consumes too many resources, which part of it do you need to refactor?
  • How can you identify why a particular part of your app is slowing down and causing a negative experience for your users?
  • How can you find out when your application throws an exception, and why?

As mentioned earlier, server monitoring works by installing an agent at the server level, "outside" of your application. But it's really hard to look at your application from the outside and know what's going on inside your code.

Application monitoring finally puts the focus on the application 🙆🙆.

This class of tools provides a software library rather than a package to install in the operating system. Developers add the integration library to their application like any other dependency, without touching the server's configuration. The library automatically collects relevant information about code performance, errors, and trends, and alerts you when something goes wrong, like a sentinel.
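The difference from server monitoring is easiest to see in code. Below is a minimal Python sketch of what an application-level library does conceptually: it wraps your own functions, so every record names the piece of code that was slow or failed, which is exactly what a CPU chart cannot tell you. The `monitored` decorator, the `RECORDS` list, and `load_invoice` are hypothetical; real libraries typically hook into the framework automatically and expose their own APIs.

```python
import functools
import time
from typing import Any, Callable, Dict, List

# Hypothetical in-process store; a real library would batch these and send them to its platform.
RECORDS: List[Dict[str, Any]] = []


def monitored(func: Callable[..., Any]) -> Callable[..., Any]:
    """Record the duration and outcome of every call to `func`, from inside the app."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        start = time.perf_counter()
        error = None
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            error = f"{type(exc).__name__}: {exc}"
            raise  # never swallow the application's own exception
        finally:
            # Runs on success and on failure, so every call produces one record.
            RECORDS.append(
                {
                    "name": func.__qualname__,
                    "duration_ms": (time.perf_counter() - start) * 1000,
                    "error": error,
                }
            )

    return wrapper


@monitored
def load_invoice(invoice_id: int) -> dict:
    # Hypothetical application function standing in for real business logic.
    if invoice_id < 0:
        raise ValueError("invalid id")
    return {"id": invoice_id}


load_invoice(1)
try:
    load_invoice(-1)
except ValueError:
    pass  # the caller still sees the exception; the monitor only observes it
```

After these two calls, `RECORDS` holds one entry per call, and the second one carries the exception type and message alongside the function name. That answers the "which part" and "why" questions from the list above in a way server metrics alone cannot.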

Issues with all-in-one platforms

The monitoring tool market is currently dominated by gigantic, all-in-one platforms such as Dynatrace, Instana, AppDynamics, and Datadog, which bundle logs, server metrics, uptime metrics, application metrics, unstructured data, search indices, etc. into a single product.

During a business event, I had the opportunity to present Inspector to one of the largest utility companies in Italy (€5 billion in annual revenue), which had already signed an agreement with Dynatrace for €2 million per year.

I immediately thought that this could not work for the millions of software houses and software-as-a-service (SaaS) startups out there. Platforms like these often require a dedicated engineering team for configuration and maintenance, which makes them even harder for smaller companies to adopt.

What problem does an application monitoring tool solve?

Application monitoring tools provide metrics and alerts to identify bugs and bottlenecks in your application without waiting for the customers to report an issue.

It acts like a sentinel that lets you visually explore how your code runs, doing 90 percent of the analysis work autonomously.
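Alerting is what turns collected metrics into the sentinel behavior described above. Here is a minimal sketch of a threshold-based alert rule in Python. The 500 ms 95th-percentile limit, the 5 percent error-rate limit, and the names `TransactionStats`, `percentile`, and `check_alerts` are illustrative assumptions, not defaults of Inspector or any other product.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class TransactionStats:
    name: str
    durations_ms: Sequence[float]  # one duration per execution in the time window
    errors: int                    # how many of those executions failed


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile, with pct between 0 and 100."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def check_alerts(
    stats: TransactionStats, p95_limit_ms: float = 500.0, max_error_rate: float = 0.05
) -> List[str]:
    """Return human-readable alerts; an empty list means everything looks healthy."""
    alerts: List[str] = []
    if not stats.durations_ms:
        return alerts
    p95 = percentile(stats.durations_ms, 95)
    if p95 > p95_limit_ms:
        alerts.append(f"{stats.name}: p95 {p95:.0f} ms exceeds {p95_limit_ms:.0f} ms")
    error_rate = stats.errors / len(stats.durations_ms)
    if error_rate > max_error_rate:
        alerts.append(f"{stats.name}: error rate {error_rate:.1%} exceeds {max_error_rate:.1%}")
    return alerts


# Hypothetical window of 20 executions: most are fast, two are slow, two failed.
checkout = TransactionStats("GET /checkout", [120.0] * 18 + [900.0, 1200.0], errors=2)
alerts = check_alerts(checkout)
```

Alerting on a high percentile rather than the average is a deliberate choice here: a few very slow requests barely move the mean, but they are exactly the ones your customers notice before they report an issue.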

Why is application monitoring important?

Application monitoring is essential because happy customers are paying customers.

Having an application is the easy part, relatively speaking; anyone can do it.

The real work starts with building rapport with your customers and making them your number one priority.

If you put the customer first, they’ll remain loyal fans of your product. On the other hand, one of the worst things for your business is error-prone, buggy software.

Nothing will drive potential paying customers away faster than a site that takes forever to load, or one that is down altogether. So do whatever it takes to make them happy, and the revenue will follow.

What can you monitor in an application?

You should be able to see easily how long your application takes to fulfill HTTP requests or to complete background processes such as jobs and cron tasks, so you can understand which processes consume the most time and resources in your system.

Each execution cycle is typically called a "transaction." During a transaction, the application can perform many different tasks, such as running SQL queries, reading and writing files, calling external systems, executing algorithms, etc.
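A transaction and the tasks performed inside it map naturally onto a small data structure. The following Python sketch calls those tasks "segments"; `Transaction`, `Segment`, and the `segment` context manager are hypothetical names for illustration, not Inspector's SDK. It times each task and then answers the first question you ask about a slow request: which task took the longest?

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Segment:
    kind: str    # e.g. "sql", "file", "http"
    label: str   # e.g. the query text or the called URL
    duration_ms: float = 0.0


@dataclass
class Transaction:
    name: str  # e.g. "GET /orders" or the name of a cron task
    segments: List[Segment] = field(default_factory=list)
    duration_ms: float = 0.0
    _start: float = field(default_factory=time.perf_counter, repr=False)

    @contextmanager
    def segment(self, kind: str, label: str) -> Iterator[Segment]:
        """Time one task inside the transaction, even if it raises."""
        seg = Segment(kind, label)
        start = time.perf_counter()
        try:
            yield seg
        finally:
            seg.duration_ms = (time.perf_counter() - start) * 1000
            self.segments.append(seg)

    def end(self) -> None:
        self.duration_ms = (time.perf_counter() - self._start) * 1000

    def slowest(self) -> Segment:
        return max(self.segments, key=lambda s: s.duration_ms)


# Usage: time.sleep stands in for real work so the sketch runs anywhere.
tx = Transaction("GET /orders")
with tx.segment("sql", "SELECT * FROM orders WHERE user_id = ?"):
    time.sleep(0.05)   # stand-in for a slow query
with tx.segment("http", "GET payment provider status"):
    time.sleep(0.001)  # stand-in for a fast external call
tx.end()
top = tx.slowest()
print(f"{tx.name} took {tx.duration_ms:.1f} ms; slowest segment: {top.kind} {top.label}")
```

Aggregating many such transactions by `name` is what lets a tool rank the most expensive requests and background processes in your system.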

In Inspector, you can explore your running code visually, as shown in the image below:

Inspector collects all of this information automatically, without any tricky configuration on the developer's side.

Have you ever wished you could watch your code run, instead of just imagining it?

That's what Inspector is designed to do, and it's how Inspector is positioned in the monitoring market: it focuses your attention on the code.

Conclusion

I really believe that clear and simple information is the most important thing that can help you make better decisions.

Learning why, when, and how to use monitoring tools was one of the most confusing parts of my developer journey. I hope my experience building Inspector helps you become more aware of your needs and of the right tools to solve your problems and improve your productivity.

Thank you so much for reading. Share this article on your social accounts if you think it would be helpful to others.
