The Supervisor should refuse to start if it cannot bind its ports #7734
Labels
Team:Habitat
Type: Bug
The Supervisor's various subsystems spin up on different threads at startup. However, once the threads are spawned, they no longer communicate with the main thread of the Supervisor directly, so the main thread never learns whether a subsystem actually started. Suppose, for example, that some other process on the machine has already bound the incoming gossip port of the Butterfly server. The thread trying to bind that port panics and dies, logging a message saying as much, but the rest of the Supervisor carries on. The result is a Supervisor that appears to be running normally but is in fact completely oblivious to gossip coming from the outside world. The single log message about the failure can quickly be lost in a sea of other messages, which themselves contribute to the overall appearance of a properly functioning Supervisor.
We should monitor these threads more closely as they start up, and exit with an error if any of them fail. We should also look into graceful ways of shutting down, restarting, or otherwise handling crashes of such threads in a running Supervisor.
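One possible shape for the startup check, as a minimal sketch rather than the Supervisor's actual code: each subsystem thread reports the outcome of its bind over a channel, and the spawning thread waits for that report before continuing. The `spawn_listener` helper, the use of a UDP socket, and the five-second timeout are all assumptions made for illustration.

```rust
use std::io;
use std::net::{SocketAddr, UdpSocket};
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical helper (not part of Habitat): spawn a named worker thread that
/// binds `addr`, and block until the worker reports whether the bind succeeded.
fn spawn_listener(name: &str, addr: SocketAddr) -> io::Result<thread::JoinHandle<()>> {
    // Channel the worker uses to report its startup outcome to the caller.
    let (tx, rx) = mpsc::channel::<io::Result<()>>();

    let handle = thread::Builder::new()
        .name(name.to_string())
        .spawn(move || match UdpSocket::bind(addr) {
            Ok(socket) => {
                // Bind succeeded: tell the caller, then do the real work.
                let _ = tx.send(Ok(()));
                // A real subsystem would run its receive loop here.
                drop(socket);
            }
            Err(e) => {
                // Report the failure instead of panicking silently.
                let _ = tx.send(Err(e));
            }
        })?;

    // Assumed timeout; also covers a worker that dies before reporting,
    // because a dropped sender makes `recv_timeout` return an error.
    match rx.recv_timeout(Duration::from_secs(5)) {
        Ok(Ok(())) => Ok(handle),
        Ok(Err(e)) => Err(e),
        Err(_) => Err(io::Error::new(
            io::ErrorKind::Other,
            format!("{name} thread did not report a successful startup"),
        )),
    }
}
```

With something like this, the main thread could call `spawn_listener` once per subsystem and exit with a non-zero status on the first `Err`, so a port conflict stops startup instead of being buried in the log. Handling crashes after startup would need a separate mechanism, such as periodically checking the returned `JoinHandle`.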