These technologies aren’t as similar as they initially seem.
They address different portions of the application stack and are actually complementary.
Gunicorn is for scaling web request concurrency, while celery should be thought of as a worker queue. We’ll get to kubernetes soon.
Gunicorn
Web request concurrency is primarily limited by network I/O or “I/O bound”. These types of tasks can be scaled using cooperative scheduling provided by threads. If you find request concurrency is limiting your application, increasing gunicorn worker threads may well be the place to start.
Celery
Heavy lifting tasks e.g. compress an image, run some ML algo, are “CPU bound” tasks. They can’t benefit from threading as much as more CPUs. These tasks should be offloaded and parallelized by celery workers.
Kubernetes
Where Kubernetes comes in handy is by providing out-of-the-box horizontal scalability and fault tolerance.
Architecturally, I’d use two separate k8s deployments to represent the different scalablity concerns of your application.
One deployment for the Django app and another for the celery workers.
This allows you to independently scale request throughput vs. processing power.
I run celery workers pinned to a single core per container (-c 1) this vastly simplifies debugging and adheres to Docker’s “one process per container” mantra. It also gives you the added benefit of predictability, as you can scale the processing power on a per-core basis by incrementing the replica count.
Scaling the Django app deployment is where you’ll need to DYOR to find the best settings for your particular application.
Again stick to using --workers 1 so there is a single process per container but you should experiment with --threads to find the best solution. Again leave horizontal scaling to Kubernetes by simply changing the replica count.
HTH
It’s definitely something I had to wrap my head around when working on similar projects.