“You become responsible forever for what you’ve tamed,” says the Fox in The Little Prince. In software development, the famous quote can be paraphrased: you become responsible for what you’ve coded. By responsibility, we mean monitoring the software’s condition and functioning, and responding to issues before they affect users.
Health checks – where to check the pulse
A few words of introduction on the process. Since health checks are most commonly executed on the backend, this is what we’ll focus on.
Each backend application needs to communicate with an array of external resources in order to properly process incoming requests. The resources in question include everything apart from the application itself: a database to query, a REST API to call, a queue to push an item into, etc. Even if the application in question works locally, it uses a drive to read and save data, making the hard drive an external resource as well.
Without these resources being fully operational and accessible*, the application can’t perform properly. In some scenarios, the problem shows up immediately as an unambiguous error response from the service. In others, the gap between the fault occurring and the application visibly malfunctioning may be considerably longer, up to several days or even weeks, during which the issue remains unrecognized.
*Accessible and operational resource – what does it mean exactly? In the discussed context, the health check doesn’t end after assessing whether the resource is present, visible, and accessible. The check verifies whether the resource is capable of performing actions required by the application (as in suiting its needs).
For example, a database health check will also execute a simple, trivial query like “SELECT 1”. Such a step ensures the database can process queries and the application is authorized to execute them. As a result, the risk of app disruptions is reduced to an extent.
In short, it’s not merely the resource’s existence that is verified. A health check also confirms that access is granted, that the application is authorized to save data, and so on, according to its needs.
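The “SELECT 1” idea can be sketched in a few lines. This is a minimal illustration only: it assumes an in-memory SQLite database from Python’s standard library as a stand-in for a real database, and the function name check_database is hypothetical.

```python
import sqlite3


def check_database(conn: sqlite3.Connection) -> bool:
    """Return True if the database can actually execute a trivial query."""
    try:
        row = conn.execute("SELECT 1").fetchone()
        return row is not None and row[0] == 1
    except sqlite3.Error:
        # Covers closed connections, missing permissions, corrupted files, etc.
        return False


# Assumption: an in-memory SQLite database stands in for the real one.
conn = sqlite3.connect(":memory:")
print("database healthy:", check_database(conn))
conn.close()
```

The point of running a query, rather than only opening a connection, is that it exercises the same path the application relies on when serving requests.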
Liveness and readiness
Without diverging (at least this time) into topics like Kubernetes, containers, and other in-depth matters, we’ll just mention that during checks, we assess the architecture’s liveness and readiness.
The two concepts aren’t synonyms. Liveness tells whether the resource is running at all; readiness tells whether it can already handle requests. A check that treats a resource which is alive but slow to respond as broken may report a malfunction too hastily. The resource may be working properly and simply need more time to become ready (as long, of course, as that time fits the benchmarks set for the architecture in question).
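The difference between being alive and being ready might look like this in code. The sketch assumes a hypothetical Service class with a fixed warm-up period; real services usually derive readiness from finished initialization steps rather than from elapsed time.

```python
import time
from typing import Callable


class Service:
    """Hypothetical service that needs a warm-up period before serving traffic."""

    def __init__(self, warmup_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._started_at = clock()
        self._warmup_seconds = warmup_seconds

    def is_alive(self) -> bool:
        # Liveness: the process is running and able to answer at all.
        return True

    def is_ready(self) -> bool:
        # Readiness: the warm-up has finished, so requests can be handled.
        return self._clock() - self._started_at >= self._warmup_seconds


service = Service(warmup_seconds=30.0)
print("alive:", service.is_alive(), "ready:", service.is_ready())
```

A monitor that only asked “is it ready?” right after startup would wrongly conclude the service is broken, which is why the two questions are kept apart.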
Why do you need software health checks?
There’s no complicated explanation needed. Software architecture needs to be monitorable – allowing for the ongoing supervision of its condition. By ongoing, we don’t mean a dedicated admin, a site reliability engineer, or a DevOps professional to be assigned to constantly check the health of a service 8 hours a day, 365 days a year. Clearly, there is a lot of space for automation here, but the process itself always starts with some kind of indicator, triggering further actions.
Health checks can serve as such indicators since they show the general state of the microservices and their dependencies.
Once health checks are exposed, dedicated monitoring software can poll them periodically. As a result, the engineering team knows which elements of the solution’s architecture are experiencing downtime or malfunctions. Such findings should trigger automated reactions.
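What an exposed health check returns to monitoring software could be sketched as follows. The function health_response, the check names, and the JSON shape are assumptions made for illustration; the 200 and 503 status codes follow a common convention but are not required by any standard mentioned in this article.

```python
import json
from typing import Callable, Dict, Tuple


def health_response(checks: Dict[str, Callable[[], bool]]) -> Tuple[int, str]:
    """Run every check and build an HTTP-style (status code, JSON body) pair."""
    results: Dict[str, str] = {}
    for name, check in checks.items():
        try:
            results[name] = "up" if check() else "down"
        except Exception:
            # A check that raises is treated the same as a failed check.
            results[name] = "down"
    healthy = all(state == "up" for state in results.values())
    status_code = 200 if healthy else 503
    body = json.dumps({"status": "up" if healthy else "down", "checks": results})
    return status_code, body


# Hypothetical checks; real ones would query a database, a queue, etc.
code, body = health_response({"database": lambda: True, "queue": lambda: False})
print(code, body)
```

A web framework handler would typically call such a function and return its result, so that the monitoring tool only needs to look at the status code.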
What reactions can it spark?
For example, a load balancer redirecting traffic to another service, a container orchestrator attempting to restart the unresponsive database, or even emails sent to the people responsible when a problem can’t be easily and automatically diagnosed and recovered from.
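The order of these reactions, automated recovery first and people second, can be sketched as below. The function react and its parameters are hypothetical; in practice the recovery step would be performed by an orchestrator or load balancer, and the notification by an alerting tool.

```python
from typing import Callable


def react(name: str,
          check: Callable[[], bool],
          recover: Callable[[], None],
          notify: Callable[[str], None],
          max_attempts: int = 3) -> str:
    """Try automated recovery first; notify people only if it does not help."""
    if check():
        return "healthy"
    for _ in range(max_attempts):
        recover()  # e.g. restart the resource or reroute traffic
        if check():
            return "recovered"
    notify(f"{name} is still unhealthy after {max_attempts} recovery attempts")
    return "escalated"


# Hypothetical usage: a resource that never comes back, so people get notified.
outcome = react("database", check=lambda: False, recover=lambda: None, notify=print)
print(outcome)
```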
How often should you check the software’s health?
The most intuitive answer would be: often and regularly. But that’s a rather vague response. Regular could mean every single minute, every 15 minutes, or even once a day. It depends entirely on the use case, the application’s expected availability, and the risk at stake. Health checks shouldn’t be expensive operations by default, but in more complex scenarios they might be (e.g. if the health checks cascade across the app’s dependencies), and the frequency should be adjusted accordingly.
Again, in short – as often as possible, but without exaggeration.
One could say that the health status of the examined service relies on the health status of all its dependencies (as we hinted a few sentences above). Assuming this is the case in the given application, it might be a good idea to implement cascading health checks, where each external resource is verified by calling its own health check. Calling health checks on the roots of the architecture graph then gives information about the overall health.
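A cascading check over a dependency graph might be sketched like this. The service names, the graph, and the function cascade_health are made up for illustration, and the sketch assumes every service appears in own_checks. Results are cached so that a shared dependency is checked only once per run.

```python
from typing import Callable, Dict, List, Optional


def cascade_health(name: str,
                   dependencies: Dict[str, List[str]],
                   own_checks: Dict[str, Callable[[], bool]],
                   cache: Optional[Dict[str, bool]] = None) -> bool:
    """A service is healthy if its own check passes and all its dependencies are healthy."""
    if cache is None:
        cache = {}
    if name in cache:
        return cache[name]
    cache[name] = True  # provisional value guards against dependency cycles
    healthy = own_checks[name]() and all(
        cascade_health(dep, dependencies, own_checks, cache)
        for dep in dependencies.get(name, [])
    )
    cache[name] = healthy
    return healthy


# Hypothetical architecture: the API depends on a database and a queue.
dependencies = {"api": ["database", "queue"]}
own_checks = {
    "api": lambda: True,
    "database": lambda: True,
    "queue": lambda: False,  # simulated outage
}
print("api healthy:", cascade_health("api", dependencies, own_checks))
```

Calling the function on a root such as "api" answers the question about overall health, at the cost of touching every dependency, which is one reason cascading checks may need to run less often.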
Are health checks vital to software development and management?
We don’t mean to sound overly dramatic, but ensuring the application works properly is one of the keys to delivering stable and safe services to end-users.
What we’re saying is that health checks are not a whim, but a necessity. Maintaining functional, efficient, and safe software isn’t possible without constant monitoring of the builds’ health. Software developers need to ensure that all resources are available, the authorizations are up-to-date, and data saving is possible. As we’ve elaborated above, this reduces the chance of app disruptions to an extent.