Self-Inflicted Wounds: The SSL Failure on the Linux Build Server

A nasty cousin of It Works On My Machine is the It Fails On That Machine. It is nasty because you know that there is something wrong, but you can’t reproduce it.

The machine in question was our Linux build agent, and the failure in question was a set of failing tests that failed to perform a certain operation when TLS was enabled. The problem? They were failing with I/O errors, but only with TLS, and the connection was using localhost. Further investigation showed that the most likely reason for the failure was a timeout. But how could that be? For fun, sometimes, the test passed. So it wasn’t an issue of a firewall of some kind. Testing using openssl s_server and connecting to it manually didn’t show any issues.