kubernetes health check

The day before thanksgiving, I was pondering an issue I was having. I was pinning a package to a specific version in my Docker container and the repository I grabbed it from stopped offering this specific version. This resulted in a container that Jenkins responded as being built correctly, but missing an integral package that allowed my application to function properly. This led me to believe I had to implement Puppet’s Lumogon in my Jenkins build process. Curious if anyone had something like this already developed, I headed over to github.com which eventually led me to compose this tweet:

This readinessProbe communicates with kubeproxy to either allow a pod to be in service or out of service. At first I thought the readinessProbe was a once-and-done check, but I found out later this is not the case. When a pod gets launched, kubernetes waits until the container is in the ready state. We can define what consists of a ready container by the use of probes. Coupled with a kubernetes strategy, we can also define and ensure our application survives broken container updates.

Since the application I am supporting is already HTTP based, making an HTTP check to an endpoint that reports on connectivity to core services was the most trivial to implement. I created a script to verify connectivity to MariaDB, MongoDB, Memcached, Message Queue, and verified certain paths on the NFS share were present. All of these items are important to my application and most of them require certain configuration values in my containers to operate. Having kubernetes run this script every time there is a new pod verifies I will never an experience an outage due to a missing package again. As I mentioned before, I thought the readinessProbe was a once-and-done, however, I found that after implementing it, my metrics indicated the script was running every 10 seconds per every replica… this quickly added up!

After some chatting in the #kubernetes-users slack, I was able to get more understanding of the readinessProbe and how it was designed to communicate with kubeproxy so that you could “shut off” a container by taking it out of rotation. This was not the behavior I wanted so it was suggested that I create a state file. This state file is created after the check and, if it is present, it skips all checks. Due to the ephemeral nature of container storage, it can be assumed this file will never exist on a pod where this check has not been performed.