Unfortunately this is a problem with django + psycopg2 + celery combo.
It’s an old and unsolved problem.Take a look on this thread to understand:
https://github.com/celery/django-celery/issues/121Basically, when celery starts a worker, it forks a database connection
from django.db framework. If this connection drops for some reason, it
doesn’t create a new one. Celery has nothing to do with this problem
once there is no way to detect when the database connection is dropped
using django.db libraries. Django doesn’t notifies when it happens,
because it just start a connection and it receives a wsgi call (no
connection pool). I had the same problem on a huge production
environment with a lot of machine workers, and sometimes, these
machines lost connectivity with postgres server.I solved it putting each celery master process under a linux
supervisord handler and a watcher and implemented a decorator that
handles the psycopg2.InterfaceError, and when it happens this function
dispatches a syscall to force supervisor restart gracefully with
SIGINT the celery process.
Edit:
Found a better solution. I implemented a celery task baseclass like this:
from django.db import connection
import celery
class FaultTolerantTask(celery.Task):
""" Implements after return hook to close the invalid connection.
This way, django is forced to serve a new connection for the next
task.
"""
abstract = True
def after_return(self, *args, **kwargs):
connection.close()
@celery.task(base=FaultTolerantTask)
def my_task():
# my database dependent code here
I believe it will fix your problem too.