* Enhanced Redis Connection Reliability
The Redis connection may fail for several reasons. Since the connection can be
(1) explicitly passed to the worker or (2) implicitly set, this improves the
connection configuration by setting a timeout on the socket and adding
ExponentialBackoff retry logic.
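The retry-with-exponential-backoff idea can be illustrated with a minimal, standalone sketch. This is not the code from this change: `call_with_backoff` and its parameters are hypothetical names, and the built-in `ConnectionError` stands in for the Redis client's connection error.

```python
import time


def call_with_backoff(func, retries=3, base_delay=0.1, max_delay=2.0,
                      exceptions=(ConnectionError,), sleep=time.sleep):
    """Call func(), retrying on the given exceptions with exponential backoff.

    Hypothetical helper for illustration only; not part of RQ or redis-py.
    """
    attempt = 0
    while True:
        try:
            return func()
        except exceptions:
            if attempt >= retries:
                raise  # out of retries: surface the last error
            # The delay doubles on every failed attempt, capped at max_delay.
            sleep(min(base_delay * (2 ** attempt), max_delay))
            attempt += 1
```

The `sleep` argument is injectable so the delays can be inspected in a test without actually waiting.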
* Simpler Connection logic
* Add simple retry logic to Redis Connection Error
* Make retry exponential, add keepalive & socket_connect_timeout
* Handles configuration on Redis' connection pool
* Simplifies timeout exception logic
* Fix burst bug, add test
* Add docs related to `socket_timeout`, improve compatibility with older RedisPy versions
* Fixes
* New timeout private method
* Fix timeout
* Remove unused code from compat module
* Remove unused dictconfig
* Remove total_ordering compat layer
* Remove compatibility layer
This completely removes the compat module. It moves the utility
functions (`as_text` and `decode_redis_hash`) to the `utils`
module and eliminates the use of the proxies `text_type` and
`string_types`, using `str` directly.
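For readers unfamiliar with these helpers, here is a rough sketch of what `as_text` and `decode_redis_hash` could look like once they rely on `str` directly. It is an approximation written for illustration; the bodies in the `utils` module may differ.

```python
def as_text(value):
    """Return value as str, decoding UTF-8 bytes; None passes through."""
    if value is None:
        return None
    if isinstance(value, bytes):
        return value.decode('utf-8')
    if isinstance(value, str):
        return value
    raise ValueError('Unknown type %r' % type(value))


def decode_redis_hash(h):
    """Decode the keys of a Redis hash to str (values are left untouched)."""
    return {as_text(key): value for key, value in h.items()}
```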
* Remove compat module
Finishes the cleaning of the compatibility module.
The last remaining function was `is_python_version`,
which was being used internally.
* Fix old import
* Fix Imports
* Remove Dummy (Force GH Actions)
* Fix Imports
* Organize Imports
* Persist worker_name after job is finished
Persisting the worker_name on the job object in Redis allows for debugging and
analyzing logs from the worker.
* Remove redundant job.save() method call
* Remove check for null worker
Now that the worker name is persisted after a job finishes or fails,
there is no need to assert that the worker name is None.
* Change github runner to Ubuntu 20.04
* WIP job results
* Result can now be saved
* Successfully saved and restored result
* result.save() should accept pipeline
* Successful results are saved
* Failures are now saved properly too.
* Added test for Result.get_latest()
* Checkpoint
* Got Result.all() to work
* Added Result.count(), Result.delete()
* Backward compatibility for job.result and job.exc_info
* Added some typing
* More typing stuff
* Fixed typing in job.py
* More typing updates
* Only keep the last 10 results
* Documented job.results()
* Got results test to pass
* Don't run test_results.py on Redis server < 5.0
* Fixed mock import on some Python versions
* Remove Redis 3 from test matrix
* Jobs should never use the new Result implementation if server is < 5.0
* Results should only be created if Redis streams are supported.
* Added back Redis 3 to test matrix
* Fixed job.supports_redis_streams
* Fixed worker test
* Updated docs.
* Move common flake8 options into config file
Currently --max-line-length is being specified in two places. Just use the
existing value in the config file as the source of truth.
Move --count and --statistics to config file as well.
* Fix some lints
* added Dependency class with allow_failures
* Requested changes
* Check type before setting `job.dependency_allow_fail` within `Job.create`
* Set `job.dependency_allow_fail` within `Job.create`
* Added test to ensure persistence of `dependency_allow_fail`
* Removed typing and allow mixed list of ints and Job objects
* Convert dependency_allow_fail boolean to integer during serialization to avoid redis DataError
* Updated `test_multiple_dependencies_are_accepted_and_persisted` test to include `Dependency` cases
* Adding placeholder test to test actual behavior of new `Dependency` usage in `depends_on`
* Updated `test_job_dependency` to include cases using `Dependency`
* Added dependency_allow_fail logic to `Job.restore`
* Renamed `dependency_allow_fail` to a simpler `allow_failure`
* Update docs to add section about the new `Dependency` class and use-case
* Updated `Job.dependencies_are_met` logic to take `FAILED` and `STOPPED` jobs into account when `allow_failure=True`
* Updated `test_job_dependency` test. Still failing with `Dependency` case.
* Fix `allow_failure` type coercion in `Job.restore`
* Re-arrange tests, so default `Dependency.allow_failure` is before explicit `allow_failure=True`
* Fixed Dependency, so it works correctly when allow_failure=True
* Attempt to execute pipeline prior to queueing a failed job's dependents. test_create_and_cancel_job_enqueue_dependents_in_registry test now passes.
* Added `Dependency` test utilizing multiple dependencies
* Removed irrelevant on_success and on_failure keyword arguments in example
* Replaced use of long_running_job
* Add test to verify `Dependency.jobs` constraints
* Suppress connection error in handle_job_failure
* test_dependencies have passed
* All tests pass if enqueue_dependents called without pipeline.watch()
* All tests now pass
* Removed print statements
* Cleanup Dependency implementation
* Renamed job.allow_failure to job.allow_dependency_failures
Co-authored-by: mattchan <mattchan@tencent.com>
Co-authored-by: Mike Hill <mhilluniversal@gmail.com>
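The dependency logic described in the entries above (treating `FAILED` and `STOPPED` dependencies as acceptable when failures are allowed, and storing the flag as an integer to avoid a Redis DataError) can be sketched in isolation. The names below are illustrative stand-ins, not the actual `Job` methods, and the status values are assumptions.

```python
from enum import Enum


class JobStatus(str, Enum):
    # Illustrative subset of job statuses.
    QUEUED = 'queued'
    FINISHED = 'finished'
    FAILED = 'failed'
    STOPPED = 'stopped'


def dependencies_are_met(dependency_statuses, allow_failure=False):
    """Return True when every dependency is in an acceptable final state."""
    accepted = {JobStatus.FINISHED}
    if allow_failure:
        # Failed or stopped dependencies no longer block the dependent job.
        accepted |= {JobStatus.FAILED, JobStatus.STOPPED}
    return all(status in accepted for status in dependency_statuses)


def serialize_allow_failure(flag):
    """Store the flag as an int, since a bool is not a valid Redis value."""
    return int(flag)


def restore_allow_failure(raw):
    """Coerce the stored value (bytes, str, int or None) back to a bool."""
    return bool(int(raw)) if raw is not None else False
```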
* rq.worker: remove useless set_state call in horse
The state should already have been set properly by the worker in
`execute_job`
`prepare_job_execution` is only called by `perform_job` which should only be
called by `main_work_horse`/`fork_work_horse` (themselves only called by `execute_job`).
Let `execute_job` do the bookkeeping.
* worker: update SimpleWorker's state in execute_job
* Fixes a bug that causes leftover job keys when result_ttl=0
* Fixed a buggy worker.maintain_heartbeats() behavior
* Fixed a bug in worker.maintain_heartbeats().
* adds unit test for a deserialization error
This tests that deserialization exceptions are properly logged, and fails in
the manner described in #1422.
* Catch deserializing errors in Worker.handle_exception()
This fixes #1422 and makes
tests/test_worker.py::TestWorker::test_deserializing_failure_is_handled
pass.
* made unit test less specific
This is required to get the test to pass under other serializers / other
python versions.
* Added generic DeserializationError
* switched ValueError to DeserializationError in a test
The changed test is creating an invalid job, which now raises
DeserializationError when data is accessed, as opposed to ValueError.
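The pattern behind a generic deserialization error is to wrap whatever the serializer raises in a single exception type, so callers do not depend on serializer-specific errors such as ValueError. A minimal sketch, in which the local `DeserializationError` class and the `deserialize` helper are stand-ins written for illustration:

```python
import json


class DeserializationError(Exception):
    """Stand-in for a generic 'could not load job data' error."""


def deserialize(payload, loads=json.loads):
    """Load payload with the given serializer, normalizing any failure."""
    try:
        return loads(payload)
    except Exception as exc:
        # Different serializers raise different errors; expose a single type.
        raise DeserializationError('Could not deserialize job data') from exc
```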
* cleanup jobs that are not really running due to zombie workers
* remove registry entries for zombie jobs
* return only the job ids on cleanup
* test zombie job cleanup
* format code
* rename variable to explain that second element in tuple is expiry, not score
* remove worker_key
* detect zombie jobs using old heartbeats
* reuse get_expired_job_ids
* set score using current_timestamp
* test idle jobs using stale heartbeats
* extract timeout into variable
* move heartbeats into StartedJobRegistry
* use registry.heartbeat in tests
* remove heartbeats when job removed from StartedJobRegistry
* remove idle and expired jobs from both wip and heartbeats set
* send heartbeat_ttl to registry.add
* typo
* revert everything 😶
* only keep job heartbeats as score (and get rid of job timeouts as scores)
* calculate heartbeat_ttl in an overrideable function + override it in SimpleWorker + move storing StartedJobRegistry scores to job.heartbeat()
* set heartbeat to monitoring interval for infinite timeouts
* track elapsed_execution_time as part of worker
* reset current job working time when work on a job is done
* persisting the job working time as part of monitoring
* implemented round-robin and random access to queues
* added tests for RoundRobinQueue
* reverted change in gitignore
* removed linebreak
* added tests for random queues
* added documentation for round robin and random queues
* moved round robin strategy to worker
* reverted changes to queue.py
* reverted changes to workers.md
* reverted changes to test_queue
* added tests for RoundRobinWorker and RandomWorker
* added doc for round robin and random workers
* removed f-strings for backward compatibility
* corrected a mistake
* minor changes (code style)
* now using _ordered_queues instead of queues for reordering queues
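The two queue-ordering strategies can be sketched as plain list operations. This is an illustration under assumptions, not the worker code itself: `round_robin` and `shuffled` are hypothetical helper names, and queues are represented by their names only.

```python
import random


def round_robin(queues, served):
    """Rotate the list so the queue after `served` is tried first next time."""
    pos = queues.index(served)
    return queues[pos + 1:] + queues[:pos + 1]


def shuffled(queues, rng=random):
    """Return a randomly ordered copy, leaving the original list untouched."""
    ordered = list(queues)
    rng.shuffle(ordered)
    return ordered
```

Returning a reordered copy instead of mutating the input mirrors the idea of keeping a separate ordered list (such as `_ordered_queues`) next to the original `queues`.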
* Ensure that the custom serializer defined is passed into the job fetch calls
* add serializer as argument to fetch_many and dequeue_any methods
* add worker test for custom serializer
* move json serializer to serializers.py
* feat: avoided "zombie" processes after killing work horse by setting work horse process group and killing this group
* fixed tests
* tests: added test to check that all workhorse subprocesses are killed
* tests: updated GitHub run tests dependencies since they are not using (dev-)requirements.txt
Co-authored-by: Ruslan Mullakhmetov <ruslan@twentythree.net>
* handled unhandled exceptions in horse to prevent a job from being silently dropped without going into FailedRegistry
* changes after review
* made sure that work_horse always terminates in a proper way with tests
* minor refactoring
* fix for failing test
* fixes for the other tests
- removed exception handling (done in monitor_work_horse)
- adjusted some tests for the checks that are not relevant anymore
* review suggested changes
* cleanup
Co-authored-by: Ruslan Mullakhmetov <ruslan@twentythree.net>
* Initial implementation of Retry class
* Fixes job.refresh() under Python 3.5
* Remove the use of text_type in job.py
* Retry can be scheduled
* monitor_work_horse() should call handle_job_failure() with queue argument.
* Flake8 fixes
* Added docs for job retries
* Add a hard kill from the parent process with a 10% increased timeout in case the forked process gets stuck and cannot stop itself.
* Added test for the force kill of the parent process.
* Changed 10% to +1 second, and other misc changes based on review comments.
* First RQScheduler prototype
* WIP job scheduling
* Fixed Python 2.7 tests
* Added ScheduledJobRegistry.get_scheduled_time(job)
* WIP on scheduler's threading mechanism
* Fixed test errors
* Changed scheduler.acquire_locks() to instance method
* Added scheduler.prepare_registries()
* Somewhat working implementation of RQ scheduler
* Only call stop_scheduler if there's a scheduler present
* Use OSError rather than ProcessLookupError for PyPy compatibility
* Added `auto_start` argument to scheduler.acquire_locks()
* Make RQScheduler play better with timezone
* Fixed test error
* Added --with-scheduler flag to rq worker CLI
* Fix tests on Python 2.x
* More Python 2 fixes
* Only call `scheduler.start` if worker is run in non burst mode
* Fixed an issue where running worker with scheduler would fail sometimes
* Make `worker.stop_scheduler()` more resilient to errors
* worker.dequeue_job_and_maintain_ttl() should also periodically run maintenance tasks
* Scheduler can now work with worker in both burst and non burst mode
* Fixed scheduler logging message
* Always log scheduler errors when running
* Improve scheduler error logging message
* Removed testing code
* Scheduler should periodically try to acquire locks for other queues it doesn't have
* Added tests for scheduler.should_reacquire_locks
* Added queue.enqueue_in()
* Fixes queue.enqueue_in() in Python 2.7
* First stab at documenting job scheduling
* Remove unused methods
* Remove Python 2.6 logging compatibility code
* Remove more unused imports
* Added convenience methods to access job registries from queue
* Added test for worker.run_maintenance_tasks()
* Simplify worker.queue_names() and worker.queue_keys()
* Updated changelog to mention RQ's new job scheduling mechanism.
* Added FailedJobRegistry.
* Added job.failure_ttl.
* queue.enqueue() now supports failure_ttl
* Added registry.get_queue().
* FailedJobRegistry.add() now assigns DEFAULT_FAILURE_TTL.
* StartedJobRegistry.cleanup() now moves expired jobs to FailedJobRegistry.
* Failed jobs are now added to FailedJobRegistry.
* Added FailedJobRegistry.requeue()
* Document the new `FailedJobRegistry` and changes in custom exception handler behavior.
* Added worker.disable_default_exception_handler.
* Document --disable-default-exception-handler option.
* Deleted worker.failed_queue.
* Deleted "move_to_failed_queue" exception handler.
* StartedJobRegistry should no longer move jobs to FailedQueue.
* Deleted requeue_job
* Fixed test error.
* Make requeue cli command work with FailedJobRegistry
* Added .pytest_cache to gitignore.
* Custom exception handlers are no longer run in reverse
* Restored requeue_job function
* Removed get_failed_queue
* Deleted FailedQueue
* Updated changelog.
* Document `failure_ttl`
* Updated docs.
* Remove job.status
* Fixed typo in test_registry.py
* Replaced _pipeline() with pipeline()
* FailedJobRegistry no longer fails on redis-py>=3
* Fixes test_clean_registries
* Worker names are now randomized
* Added a note about random worker names in CHANGES.md
* Worker will now stop working when encountering an unhandled exception.
* Worker should reraise SystemExit on cold shutdowns
* Added anchor.js to docs
* Support for Sentry-SDK (#1045)
* Updated RQ to support sentry-sdk
* Document Sentry integration
* Install sentry-sdk before running tests
* Improved rq info CLI command to be more efficient when displaying lar… (#1046)
* Improved rq info CLI command to be more efficient when displaying large number of workers
* Fixed an rq info --by-queue bug
* Fixed worker.total_working_time bug (#1047)
* queue.enqueue() no longer accepts `timeout` argument (#1055)
* Clean worker registry (#1056)
* queue.enqueue() no longer accepts `timeout` argument
* Added clean_worker_registry()
* Show worker hostname and PID on cli (#1058)
* Show worker hostname and PID on cli
* Improve test coverage
* Remove Redis version check when SSL is used
* Bump version to 1.0
* Removed pytest_cache/README.md
* Changed worker logging to use exc_info=True
* Removed unused queue.dequeue()
* Fixed typo in CHANGES.md
* setup_loghandlers() should always call logger.setLevel() if specified
* modify zadd calls for redis-py 3.0
redis-py 3.0 changes the zadd interface to accept a single
mapping argument, which is expected to be a dict.
https://github.com/andymccurdy/redis-py#mset-msetnx-and-zadd
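As an illustration of the call shape, the sketch below passes members and scores to `zadd` as one `{member: score}` dict. `FakeRedis` is a tiny in-memory stand-in so the example runs without a server, and `add_to_registry` and the key name are hypothetical.

```python
class FakeRedis:
    """In-memory stand-in exposing only a redis-py 3.0 style zadd()."""

    def __init__(self):
        self.sorted_sets = {}

    def zadd(self, name, mapping):
        zset = self.sorted_sets.setdefault(name, {})
        # Like Redis, report how many members were newly added.
        added = sum(1 for member in mapping if member not in zset)
        zset.update(mapping)
        return added


def add_to_registry(connection, key, job_id, score):
    # redis-py >= 3.0: members and scores go in one dict, {member: score}.
    return connection.zadd(key, {job_id: score})
```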
* change FailedQueue.push_job_id to always push a str
redis-py 3.0 does not attempt to cast values to str; this is left
to the user.
* remove Redis connection patching
Since in redis-py 3.0, Redis == StrictRedis class, we no longer
need to patch _zadd and other methods.
Ref: https://github.com/rq/rq/pull/1016#issuecomment-441010847