Well… this is an old question, but I ended up here, so I thought I’d give my two cents:
It seems pretty clear that a 2xx should be returned if all is OK.
If health is not OK, I think it should return a 5xx result (a 4xx says the client was at fault in the request; 2xx and 3xx are all successful to some degree).
I think that a 5xx is correct because this is a special request that reports on the state of the whole service. It is also the practical choice: most load balancers offer liveness checks based on response codes, and not all of them offer a way to parse a more complex payload (other than perhaps a RegExp match, which can make the check brittle).
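To illustrate why the status code matters, here is a rough sketch of the kind of check a load balancer effectively performs. The names `status_is_healthy` and `probe` are made up for this example, and treating exactly the 2xx range as healthy is an assumption; real load balancers each have their own rules and configuration.

```python
import urllib.error
import urllib.request


def status_is_healthy(status_code: int) -> bool:
    """Assumed rule: any 2xx counts as healthy, everything else does not."""
    return 200 <= status_code < 300


def probe(url: str, timeout: float = 2.0) -> bool:
    """Hypothetical probe that looks only at the status code, never the body."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return status_is_healthy(resp.status)
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the status code is still available.
        return status_is_healthy(err.code)
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure: treat as unhealthy.
        return False
```

With a check like this, a health endpoint that answers 200 with `{"status": "down"}` in the body would still be counted as healthy, which is the problem a 5xx avoids.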
I agree with @Julien that a 500 (specifically) doesn’t seem appropriate, and we’ve decided on 503 Service Unavailable.
503 seems to fit for a couple of reasons:
- It’s a 5xx family result code, which indicates that something is wrong on the server side.
- It has a temporary nature to it, indicating that the service may recover.
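For what it’s worth, here is a minimal sketch of such an endpoint using only the Python standard library. The `/health` path, the `run_checks` function with its `database` and `cache` entries, the port, and the 30-second `Retry-After` value are all assumptions made up for illustration, not anything prescribed by the discussion above. `Retry-After` is a standard header that may accompany a 503 and fits the "temporary" reading.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def run_checks() -> dict:
    """Hypothetical dependency checks; replace with real ones."""
    return {"database": True, "cache": True}


def health_response(checks: dict) -> tuple:
    """Map check results to (status, headers, body): 200 if all pass, else 503."""
    healthy = all(checks.values())
    status = 200 if healthy else 503
    headers = {"Content-Type": "application/json"}
    if not healthy:
        # Assumed hint to callers that the outage is expected to be temporary.
        headers["Retry-After"] = "30"
    payload = {"status": "ok" if healthy else "unavailable", "checks": checks}
    return status, headers, json.dumps(payload).encode("utf-8")


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/health":
            self.send_error(404)
            return
        status, headers, body = health_response(run_checks())
        self.send_response(status)
        for name, value in headers.items():
            self.send_header(name, value)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    # Assumed local address and port for trying it out.
    HTTPServer(("127.0.0.1", 8080), HealthHandler).serve_forever()
```

The body still carries per-check detail for humans and dashboards, but a checker that only understands status codes gets the right answer from the 200 or 503 alone.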