---
title: "RQ: Simple job queues for Python"
layout: docs
---
A worker is a Python process that typically runs in the background and exists
solely as a work horse to perform lengthy or blocking tasks that you don't want
to perform inside web processes.
## Starting workers
To start crunching work, simply start a worker from the root of your project
directory:
{% highlight console %}
$ rq worker high normal low
*** Listening for work on high, normal, low
Got send_newsletter('me@nvie.com') from default
Job ended normally without result
*** Listening for work on high, normal, low
...
{% endhighlight %}
Workers will read jobs from the given queues (the order is important) in an
endless loop, waiting for new work to arrive when all jobs are done.
Each worker will process a single job at a time. Within a worker, there is no
concurrent processing going on. If you want to perform jobs concurrently,
simply start more workers.
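The queue order determines priority: a worker started as above always drains
`high` completely before it even looks at `normal` or `low`. As a minimal
sketch of what that means when enqueueing (the `myapp.tasks` functions are
made-up placeholders):
{% highlight python %}
from redis import Redis
from rq import Queue

redis = Redis()
low = Queue('low', connection=redis)
high = Queue('high', connection=redis)

# Enqueue onto the low-priority queue first...
low.enqueue('myapp.tasks.send_newsletter', 'me@nvie.com')
# ...then onto the high-priority queue
high.enqueue('myapp.tasks.charge_card', 42)

# A worker listening on `high normal low` still runs charge_card
# first, because it always checks `high` before `low`.
{% endhighlight %}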
### Burst mode
By default, workers will start working immediately and will block and wait for
new work when they run out of work. Workers can also be started in _burst
mode_ to finish all currently available work and quit as soon as all given
queues are emptied.
{% highlight console %}
$ rq worker --burst high normal low
*** Listening for work on high, normal, low
Got send_newsletter('me@nvie.com') from default
Job ended normally without result
No more work, burst finished.
Registering death.
{% endhighlight %}
This can be useful for batch work that needs to be processed periodically, or
just to scale up your workers temporarily during peak periods.
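As a sketch, a crontab entry like the following would drain all three queues
every 15 minutes (assuming `rq` is on cron's `PATH` and your Redis server runs
with default settings):
{% highlight console %}
*/15 * * * * rq worker --burst high normal low
{% endhighlight %}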
### Worker arguments
In addition to `--burst`, `rq worker` also accepts these arguments:
* `--url` or `-u`: URL describing Redis connection details (e.g. `rq worker --url redis://:secrets@example.com:1234/9`)
* `--path` or `-P`: multiple import paths are supported (e.g. `rq worker --path foo --path bar`)
* `--config` or `-c`: path to module containing RQ settings.
* `--worker-class` or `-w`: RQ Worker class to use (e.g. `rq worker --worker-class 'foo.bar.MyWorker'`)
* `--job-class` or `-j`: RQ Job class to use.
* `--queue-class`: RQ Queue class to use.
* `--connection-class`: Redis connection class to use, defaults to `redis.StrictRedis`.
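These options can be combined into a single invocation. For example (the
worker class path here is hypothetical):
{% highlight console %}
$ rq worker --url redis://example.com:6379/0 --worker-class 'foo.bar.MyWorker' high normal low
{% endhighlight %}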
## Inside the worker
### The worker life-cycle
The life-cycle of a worker consists of a few phases:
1. _Boot_. Loading the Python environment.
2. _Birth registration_. The worker registers itself with the system, so the
system knows of this worker.
3. _Start listening_. A job is popped from any of the given Redis queues.
If all queues are empty and the worker is running in burst mode, quit now.
Else, wait until jobs arrive.
4. _Prepare job execution_. The worker tells the system that it will begin work
by setting its status to `busy` and registers the job in the `StartedJobRegistry`.
5. _Fork a child process._
A child process (the "work horse") is forked off to do the actual work in
a fail-safe context.
6. _Process work_. This performs the actual job work in the work horse.
7. _Cleanup job execution_. The worker sets its status to `idle` and sets both
the job and its result to expire based on `result_ttl`. The job is also removed
from `StartedJobRegistry` and added to `FinishedJobRegistry` in the case
of successful execution, or to `FailedQueue` in the case of failure.
8. _Loop_. Repeat from step 3.
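Steps 4 and 7 mean that you can watch jobs move between registries from any
other process. A minimal sketch of inspecting them (the queue name and
connection details are assumptions):
{% highlight python %}
from redis import Redis
from rq import Queue
from rq.registry import StartedJobRegistry, FinishedJobRegistry

redis = Redis()
queue = Queue('default', connection=redis)

# Jobs a work horse is currently executing (step 4 puts them here)
started = StartedJobRegistry(queue.name, connection=redis)
print(started.get_job_ids())

# Jobs that completed successfully (step 7 moves them here)
finished = FinishedJobRegistry(queue.name, connection=redis)
print(finished.get_job_ids())
{% endhighlight %}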
## Performance notes
Basically, the `rq worker` command implements a simple fetch-fork-execute loop.
When a lot of your jobs do lengthy setups, or they all depend on the same set
of modules, you pay this overhead each time you run a job (since you're doing
the import _after_ the moment of forking). This is clean, because RQ won't
ever leak memory this way, but it is also slow.
A pattern you can use to improve the throughput performance for these kinds of
jobs is to import the necessary modules _before_ the fork. There is no way
of telling RQ workers to perform this setup for you, but you can do it
yourself before starting the work loop.
To do this, provide your own worker script (instead of using `rq worker`).
A simple implementation example:
{% highlight python %}
#!/usr/bin/env python
import sys
from rq import Connection, Worker
# Preload libraries
import library_that_you_want_preloaded
# Provide queue names to listen to as arguments to this script,
# similar to rq worker
with Connection():
    qs = sys.argv[1:] or ['default']
    w = Worker(qs)
    w.work()
{% endhighlight %}
### Worker names
Workers are registered to the system under their names (see [monitoring][m]).
By default, the name of a worker is equal to the concatenation of the current
hostname and the current PID. To override this default, specify the name when
starting the worker, using the `--name` option.
[m]: /docs/monitoring/
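Worker names must be unique per Redis connection. For example, to run two
workers side by side on the same host (the names themselves are arbitrary):
{% highlight console %}
$ rq worker --name worker.a high normal low
$ rq worker --name worker.b high normal low
{% endhighlight %}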
### Retrieving worker information
`Worker` instances store their runtime information in Redis. Here's how to
retrieve them:
{% highlight python %}
from redis import Redis
from rq import Queue, Worker
# Returns all workers registered in this connection
redis = Redis()
workers = Worker.all(connection=redis)
# Returns all workers in this queue (new in version 0.10.0)
queue = Queue('queue_name')
workers = Worker.all(queue=queue)
{% endhighlight %}
_New in version 0.10.0._
If you only want to know the number of workers for monitoring purposes, using
`Worker.count()` is much more performant.
{% highlight python %}
from redis import Redis
from rq import Queue, Worker
redis = Redis()
# Count the number of workers in this Redis connection
workers = Worker.count(connection=redis)
# Count the number of workers for a specific queue
queue = Queue('queue_name', connection=redis)
workers = Worker.count(queue=queue)
{% endhighlight %}
### Worker statistics
_New in version 0.9.0._
If you want to check the utilization of your queues, `Worker` instances
store a few pieces of useful information:
{% highlight python %}
from rq.worker import Worker
worker = Worker.find_by_key('rq:worker:name')
worker.successful_job_count  # Number of jobs finished successfully
worker.failed_job_count      # Number of failed jobs processed by this worker
worker.total_working_time    # Amount of time spent executing jobs
{% endhighlight %}
## Taking down workers
If, at any time, the worker receives `SIGINT` (via Ctrl+C) or `SIGTERM` (via
`kill`), the worker waits until the currently running task is finished, stops
the work loop and gracefully registers its own death.
If, during this takedown phase, `SIGINT` or `SIGTERM` is received again, the
worker will forcefully terminate the child process (sending it `SIGKILL`), but
will still try to register its own death.
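For example, assuming the worker's PID is 12345 (a made-up value), a warm
shutdown from another shell looks like this:
{% highlight console %}
$ kill -TERM 12345    # finish the current job, then shut down cleanly
$ kill -TERM 12345    # sent again: forcefully kill the work horse
{% endhighlight %}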
## Using a config file
_New in version 0.3.2._
If you'd like to configure `rq worker` via a configuration file instead of
through command line arguments, you can do this by creating a Python file like
`settings.py`:
{% highlight python %}
REDIS_URL = 'redis://localhost:6379/1'
# You can also specify the Redis connection parameters individually
# REDIS_HOST = 'redis.example.com'
# REDIS_PORT = 6380
# REDIS_DB = 3
# REDIS_PASSWORD = 'very secret'
# Queues to listen on
QUEUES = ['high', 'normal', 'low']
# If you're using Sentry to collect your runtime exceptions, you can use this
# to configure RQ for it in a single step
# The 'sync+' prefix is required for raven: https://github.com/nvie/rq/issues/350#issuecomment-43592410
SENTRY_DSN = 'sync+http://public:secret@example.com/1'
{% endhighlight %}
The example above shows all the options that are currently supported.
_Note: The_ `QUEUES` _and_ `REDIS_PASSWORD` _settings are new since 0.3.3._
To specify which module to read settings from, use the `-c` option:
{% highlight console %}
$ rq worker -c settings
{% endhighlight %}
## Custom worker classes
_New in version 0.4.0._
There are times when you want to customize the worker's behavior. Some of the
more common requests so far are:
1. Managing database connectivity prior to running a job (see the sketch below).
2. Using a job execution model that does not require `os.fork`.
3. The ability to use different concurrency models such as
`multiprocessing` or `gevent`.
You can use the `-w` option to specify a different worker class to use:
{% highlight console %}
$ rq worker -w 'path.to.GeventWorker'
{% endhighlight %}
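As a sketch of the first use case above, a custom worker could make sure a
database connection is alive before each job runs; `setup_database_connection`
is a placeholder for your own application code, not part of RQ:
{% highlight python %}
from rq import Worker

class DatabaseAwareWorker(Worker):
    """Re-establish the application's database connection
    before each job is executed."""
    def execute_job(self, job, queue):
        setup_database_connection()  # hypothetical application helper
        super(DatabaseAwareWorker, self).execute_job(job, queue)
{% endhighlight %}
You would then start it with `rq worker -w 'path.to.DatabaseAwareWorker'`.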
## Custom Job and Queue classes
_Will be available in the next release._
You can tell the worker to use a custom class for jobs and queues using
`--job-class` and/or `--queue-class`.
{% highlight console %}
$ rq worker --job-class 'custom.JobClass' --queue-class 'custom.QueueClass'
{% endhighlight %}
Don't forget to use those same classes when enqueueing the jobs.
For example:
{% highlight python %}
from rq import Queue
from rq.job import Job
class CustomJob(Job):
pass
class CustomQueue(Queue):
job_class = CustomJob
queue = CustomQueue('default', connection=redis_conn)
queue.enqueue(some_func)
{% endhighlight %}
## Custom exception handlers
_New in version 0.5.5._
If you need to handle errors differently for different types of jobs, or simply want to customize
RQ's default error handling behavior, run `rq worker` using the `--exception-handler` option:
{% highlight console %}
$ rq worker --exception-handler 'path.to.my.ErrorHandler'
# Multiple exception handlers are also supported
$ rq worker --exception-handler 'path.to.my.ErrorHandler' --exception-handler 'another.ErrorHandler'
{% endhighlight %}
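An exception handler is simply a callable that receives the job and the
exception info. Returning `False` stops the handling chain; returning `True`
(or nothing) falls through to the next handler on the stack. A minimal sketch
(where you put the module is up to you):
{% highlight python %}
def my_error_handler(job, exc_type, exc_value, traceback):
    # Sketch: report the failure somewhere useful...
    print('Job %s failed with %s' % (job.id, exc_type.__name__))
    # ...then fall through to the next handler on the stack
    return True
{% endhighlight %}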