---
title: "RQ: Workers"
layout: docs
---

A worker is a Python process that typically runs in the background and exists
solely as a work horse to perform lengthy or blocking tasks that you don't want
to perform inside web processes.

## Starting Workers

To start crunching work, simply start a worker from the root of your project
directory:

```console
$ rq worker high default low
*** Listening for work on high, default, low
Got send_newsletter('me@nvie.com') from default
Job ended normally without result
*** Listening for work on high, default, low
...
```

Workers will read jobs from the given queues (the order is important) in an
endless loop, waiting for new work to arrive when all jobs are done.

Each worker will process a single job at a time. Within a worker, there is no
concurrent processing going on. If you want to perform jobs concurrently,
simply start more workers.

You should use process managers like [Supervisor](/patterns/supervisor/) or
[systemd](/patterns/systemd/) to run RQ workers in production.

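For a quick local experiment you can simply background a few workers from one
shell (a sketch; in production, leave this to the process manager):

```console
$ rq worker high default low &
$ rq worker high default low &
```
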
### Burst Mode

By default, workers will start working immediately and will block and wait for
new work when they run out of work. Workers can also be started in _burst
mode_ to finish all currently available work and quit as soon as all given
queues are emptied.

```console
$ rq worker --burst high default low
*** Listening for work on high, default, low
Got send_newsletter('me@nvie.com') from default
Job ended normally without result
No more work, burst finished.
Registering death.
```

This can be useful for batch work that needs to be processed periodically, or
just to scale up your workers temporarily during peak periods.

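Burst mode can also be entered programmatically via `work(burst=True)`, which
is convenient from cron jobs or one-off scripts. A minimal sketch, assuming a
Redis server on localhost:

```python
from redis import Redis
from rq import Queue, Worker

redis = Redis()
queue = Queue('default', connection=redis)

# work(burst=True) drains the given queues, then returns instead of blocking
worker = Worker([queue], connection=redis)
worker.work(burst=True)
```
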
### Worker Arguments

In addition to `--burst`, `rq worker` also accepts these arguments:

* `--url` or `-u`: URL describing Redis connection details (e.g. `rq worker --url redis://:secrets@example.com:1234/9` or `rq worker --url unix:///var/run/redis/redis.sock`)
* `--path` or `-P`: multiple import paths are supported (e.g. `rq worker --path foo --path bar`)
* `--config` or `-c`: path to module containing RQ settings
* `--results-ttl`: job results will be kept for this number of seconds (defaults to 500)
* `--worker-class` or `-w`: RQ Worker class to use (e.g. `rq worker --worker-class 'foo.bar.MyWorker'`)
* `--job-class` or `-j`: RQ Job class to use
* `--queue-class`: RQ Queue class to use
* `--connection-class`: Redis connection class to use, defaults to `redis.StrictRedis`
* `--log-format`: format for the worker logs, defaults to `'%(asctime)s %(message)s'`
* `--date-format`: datetime format for the worker logs, defaults to `'%H:%M:%S'`
* `--disable-job-desc-logging`: turn off job description logging
* `--max-jobs`: maximum number of jobs to execute

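These options can be combined freely. For example, a burst worker that reads
from a non-default Redis database and keeps results for an hour (the values
are illustrative):

```console
$ rq worker --burst --url redis://localhost:6379/2 --results-ttl 3600 high default low
```
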
## Inside the Worker

### The Worker Lifecycle

The life-cycle of a worker consists of a few phases:

1. _Boot_. Loading the Python environment.
2. _Birth registration_. The worker registers itself with the system so the
   system knows of this worker.
3. _Start listening_. A job is popped from any of the given Redis queues.
   If all queues are empty and the worker is running in burst mode, quit now.
   Else, wait until jobs arrive.
4. _Prepare job execution_. The worker tells the system that it will begin work
   by setting its status to `busy` and registers the job in the `StartedJobRegistry`.
5. _Fork a child process._
   A child process (the "work horse") is forked off to do the actual work in
   a fail-safe context.
6. _Process work_. This performs the actual job work in the work horse.
7. _Cleanup job execution_. The worker sets its status to `idle` and sets both
   the job and its result to expire based on `result_ttl`. The job is also removed
   from the `StartedJobRegistry` and added to the `FinishedJobRegistry` in the case
   of successful execution, or the `FailedJobRegistry` in the case of failure.
8. _Loop_. Repeat from step 3.

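The registries from steps 4 and 7 can be inspected directly, which is handy
when debugging this lifecycle. A quick sketch, assuming a queue named
`default`:

```python
from redis import Redis
from rq.registry import (FailedJobRegistry, FinishedJobRegistry,
                         StartedJobRegistry)

redis = Redis()

# Each registry is scoped to a single queue name
started = StartedJobRegistry('default', connection=redis)
finished = FinishedJobRegistry('default', connection=redis)
failed = FailedJobRegistry('default', connection=redis)

print(started.get_job_ids())   # jobs currently being worked on
print(finished.get_job_ids())  # finished jobs that haven't expired yet
print(failed.get_job_ids())    # jobs that raised an exception
```
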
### Performance Notes

Basically the `rq worker` shell script is a simple fetch-fork-execute loop.
When a lot of your jobs do lengthy setups, or they all depend on the same set
of modules, you pay this overhead each time you run a job (since you're doing
the import _after_ the moment of forking). This is clean, because RQ won't
ever leak memory this way, but also slow.

A pattern you can use to improve the throughput performance for these kinds of
jobs is to import the necessary modules _before_ the fork. There is no way
of telling RQ workers to perform this setup for you, but you can do it
yourself before starting the work loop.

To do this, provide your own worker script (instead of using `rq worker`).
A simple implementation example:

```python
#!/usr/bin/env python
import sys
from rq import Connection, Worker

# Preload libraries
import library_that_you_want_preloaded

# Provide queue names to listen to as arguments to this script,
# similar to rq worker
with Connection():
    qs = sys.argv[1:] or ['default']

    w = Worker(qs)
    w.work()
```

### Worker Names

Workers are registered to the system under their names, which are generated
randomly during instantiation (see [monitoring][m]). To override this default,
specify the name when starting the worker, or use the `--name` CLI option.

```python
from redis import Redis
from rq import Queue, Worker

redis = Redis()
queue = Queue('queue_name')

# Start a worker with a custom name
worker = Worker([queue], connection=redis, name='foo')
```

[m]: /docs/monitoring/

### Retrieving Worker Information

_Updated in version 0.10.0._

`Worker` instances store their runtime information in Redis. Here's how to
retrieve it:

```python
from redis import Redis
from rq import Queue, Worker

# Returns all workers registered in this connection
redis = Redis()
workers = Worker.all(connection=redis)

# Returns all workers in this queue (new in version 0.10.0)
queue = Queue('queue_name')
workers = Worker.all(queue=queue)
worker = workers[0]
print(worker.name)
```

Aside from `worker.name`, workers also have the following properties:

* `hostname` - the host where this worker is run
* `pid` - worker's process ID
* `queues` - queues on which this worker is listening for jobs
* `state` - possible states are `suspended`, `started`, `busy` and `idle`
* `current_job` - the job it's currently executing (if any)
* `last_heartbeat` - the last time this worker was seen
* `birth_date` - time of worker's instantiation
* `successful_job_count` - number of jobs finished successfully
* `failed_job_count` - number of failed jobs processed
* `total_working_time` - amount of time spent executing jobs, in seconds

_New in version 0.10.0._

If you only want to know the number of workers for monitoring purposes,
`Worker.count()` is much more performant.

```python
from redis import Redis
from rq import Queue, Worker

redis = Redis()

# Count the number of workers in this Redis connection
workers = Worker.count(connection=redis)

# Count the number of workers for a specific queue
queue = Queue('queue_name', connection=redis)
workers = Worker.count(queue=queue)
```

## Worker with Custom Serializer

When creating a worker, you can pass in a custom serializer that will be implicitly passed to the queue.
Serializers used should have at least `loads` and `dumps` methods. The default serializer is `pickle`.

```python
import json
from rq import Worker

worker = Worker('foo', serializer=json)
```

or when creating from a queue:

```python
import json
from rq import Queue, Worker

w = Worker(Queue('foo'), serializer=json)
```

Queues will now use the custom serializer.

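Any object that exposes compatible `dumps` and `loads` will do, so you can
also wrap a library behind your own class. A minimal sketch (the class name is
illustrative):

```python
import json

class JSONSerializer:
    """Illustrative custom serializer; RQ only needs dumps and loads."""

    @staticmethod
    def dumps(obj):
        # Job payloads are stored in Redis as bytes
        return json.dumps(obj).encode('utf-8')

    @staticmethod
    def loads(data):
        return json.loads(data)
```
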
### Worker Statistics

_New in version 0.9.0._

If you want to check the utilization of your queues, `Worker` instances
store a few useful pieces of information:

```python
from rq.worker import Worker
worker = Worker.find_by_key('rq:worker:name')

worker.successful_job_count  # Number of jobs finished successfully
worker.failed_job_count      # Number of failed jobs processed by this worker
worker.total_working_time    # Amount of time spent executing jobs (in seconds)
```

## Better Worker Process Title

Worker processes will have a better title (as displayed by system tools such
as `ps` and `top`) if you install the third-party package `setproctitle`:

```sh
pip install setproctitle
```

## Taking Down Workers

If, at any time, the worker receives `SIGINT` (via Ctrl+C) or `SIGTERM` (via
`kill`), the worker waits until the currently running task is finished, stops
the work loop and gracefully registers its own death.

If, during this takedown phase, `SIGINT` or `SIGTERM` is received again, the
worker will forcefully terminate the child process (sending it `SIGKILL`), but
will still try to register its own death.

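For example, to request a warm shutdown of all workers from the shell (a
sketch; adjust the match pattern to how you start your workers):

```console
$ pkill -TERM -f "rq worker"
```
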
## Using a Config File

If you'd like to configure `rq worker` via a configuration file instead of
through command line arguments, you can do this by creating a Python file like
`settings.py`:

```python
REDIS_URL = 'redis://localhost:6379/1'

# You can also specify the Redis connection details individually
# REDIS_HOST = 'redis.example.com'
# REDIS_PORT = 6380
# REDIS_DB = 3
# REDIS_PASSWORD = 'very secret'

# Queues to listen on
QUEUES = ['high', 'default', 'low']

# If you're using Sentry to collect your runtime exceptions, you can use this
# to configure RQ for it in a single step
# The 'sync+' prefix is required for raven: https://github.com/nvie/rq/issues/350#issuecomment-43592410
SENTRY_DSN = 'sync+http://public:secret@example.com/1'

# If you want a custom worker name
# NAME = 'worker-1024'
```

The example above shows all the options that are currently supported.

_Note: The_ `QUEUES` _and_ `REDIS_PASSWORD` _settings are new since 0.3.3._

To specify which module to read settings from, use the `-c` option:

```console
$ rq worker -c settings
```

## Custom Worker Classes

There are times when you want to customize the worker's behavior. Some of the
more common requests so far are:

1. Managing database connectivity prior to running a job.
2. Using a job execution model that does not require `os.fork`.
3. The ability to use different concurrency models such as
   `multiprocessing` or `gevent`.

You can use the `-w` option to specify a different worker class to use:

```console
$ rq worker -w 'path.to.GeventWorker'
```

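As a starting point, a custom worker class is simply a `Worker` subclass. A
sketch that sets up a shared resource before entering the work loop
(`connect_to_database` is a hypothetical helper in your application):

```python
from rq import Worker

class DatabaseAwareWorker(Worker):
    """Sketch: prepare shared resources before the fetch-fork-execute loop."""

    def work(self, *args, **kwargs):
        # connect_to_database() is a hypothetical application-level helper
        self.db = connect_to_database()
        return super().work(*args, **kwargs)
```
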
## Custom Job and Queue Classes

You can tell the worker to use a custom class for jobs and queues using
`--job-class` and/or `--queue-class`.

```console
$ rq worker --job-class 'custom.JobClass' --queue-class 'custom.QueueClass'
```

Don't forget to use those same classes when enqueueing the jobs.

For example:

```python
from redis import Redis

from rq import Queue
from rq.job import Job

class CustomJob(Job):
    pass

class CustomQueue(Queue):
    job_class = CustomJob

redis_conn = Redis()
queue = CustomQueue('default', connection=redis_conn)
queue.enqueue(some_func)
```

## Custom DeathPenalty Classes

When a Job times out, the worker will try to kill it using the supplied
`death_penalty_class` (default: `UnixSignalDeathPenalty`). This can be overridden
if you wish to attempt to kill jobs in an application-specific or 'cleaner' manner.

DeathPenalty classes are constructed with the following arguments:
`BaseDeathPenalty(timeout, JobTimeoutException, job_id=job.id)`.

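A subclass typically overrides the two hooks defined by `BaseDeathPenalty` in
`rq.timeouts`. A minimal sketch that simply enforces no timeout at all (only
sensible for fully trusted jobs):

```python
from rq.timeouts import BaseDeathPenalty

class NoOpDeathPenalty(BaseDeathPenalty):
    """Sketch: a death penalty that never kills the job."""

    def setup_death_penalty(self):
        # Called right before the job runs; install a timeout mechanism here
        pass

    def cancel_death_penalty(self):
        # Called once the job finishes; tear the mechanism down again
        pass
```
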
## Custom Exception Handlers

If you need to handle errors differently for different types of jobs, or simply want to customize
RQ's default error handling behavior, run `rq worker` using the `--exception-handler` option:

```console
$ rq worker --exception-handler 'path.to.my.ErrorHandler'

# Multiple exception handlers are also supported
$ rq worker --exception-handler 'path.to.my.ErrorHandler' --exception-handler 'another.ErrorHandler'
```

If you want to disable RQ's default exception handler, use the `--disable-default-exception-handler` option:

```console
$ rq worker --exception-handler 'path.to.my.ErrorHandler' --disable-default-exception-handler
```
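
An exception handler itself is a callable that receives the job and the
exception triple. A minimal sketch; returning `True` (or nothing) lets the
exception fall through to the next handler in the chain, while returning
`False` stops the chain:

```python
def my_handler(job, exc_type, exc_value, traceback):
    # Do custom error handling here, e.g. ship the failure to your own log
    print('Job %s failed: %s' % (job.id, exc_value))
    return True  # continue with the next handler (e.g. RQ's default one)
```

Handlers can also be attached when constructing a worker in code, e.g.
`Worker([queue], exception_handlers=[my_handler])`.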