* Added send_stop_job_command().
* send_stop_job_command now accepts just connection and job_id
* Document send_stop_job_command
* Updated test coverage
* tests: updated GitHub workflow for tests to use requirements.txt and dev-requirements.txt
* build: updated dev-requirements.txt
Co-authored-by: Ruslan Mullakhmetov <ruslan@twentythree.net>
* feat: added job heartbeat to track whether job is actually executing
A heartbeat might be needed when the worker was hard-killed or the whole VM/Docker container was forcibly rebooted.
* fixed tests
* fixed test coverage issue
* chore: renamed job.heartbeat stuff according to review feedback
* chore: pipelined worker heartbeat and job heartbeat
* docs: documented job.heartbeat property
* fixes after review
* docs: updated last_heartbeat description
* chore: review
Co-authored-by: Ruslan Mullakhmetov <ruslan@twentythree.net>
* tests: added ability to run tests in Docker
Useful for running the full test suite on Mac.
* tests: minor improvement in dockerfile for tests
* tests: typo in Dockerfile
* tests: updated dev requirements
Co-authored-by: Ruslan Mullakhmetov <ruslan@twentythree.net>
* scheduler: now operates with chunks of jobs
* scheduler: set default chunk_size for ScheduledJobRegistry.get_jobs_to_schedule
* scheduler: fixed missing indent
* scheduler: added test for get_jobs_to_schedule() with chunk_size parameter
* scheduler: fixed test so it passes on Python 3.5 (no f-strings)
* scheduler: fixed chunk_size in test to make it lighter to run
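
A minimal sketch of the chunked scheduling idea, not the actual scheduler code. The registry is modelled as an in-memory list of (timestamp, job_id) pairs, and `due_job_ids` and `enqueue_due_jobs` are hypothetical names used only for illustration:

```python
from typing import Callable, List, Tuple

Registry = List[Tuple[float, str]]  # (scheduled timestamp, job id); stand-in for the real registry


def due_job_ids(registry: Registry, timestamp: float, chunk_size: int = 1000) -> List[str]:
    """Return at most chunk_size ids of jobs scheduled at or before timestamp."""
    due = sorted((score, job_id) for score, job_id in registry if score <= timestamp)
    return [job_id for _, job_id in due[:chunk_size]]


def enqueue_due_jobs(
    registry: Registry,
    timestamp: float,
    enqueue: Callable[[str], None],
    chunk_size: int = 1000,
) -> int:
    """Enqueue all due jobs one chunk at a time and return how many were enqueued."""
    total = 0
    while True:
        job_ids = due_job_ids(registry, timestamp, chunk_size)
        if not job_ids:
            break
        for job_id in job_ids:
            enqueue(job_id)
        # Remove the processed chunk so the next iteration sees the remaining jobs.
        taken = set(job_ids)
        registry[:] = [(score, job_id) for score, job_id in registry if job_id not in taken]
        total += len(job_ids)
    return total
```

Working in chunks bounds how many job ids are held in memory per iteration, which matters when many jobs become due at once.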
* feat: avoided "zombie" processes after killing work horse by setting work horse process group and killing this group
* fixed tests
* tests: added test to check that all workhorse subprocesses are killed
* tests: updated GitHub test-run dependencies, since the workflow does not use (dev-)requirements.txt
Co-authored-by: Ruslan Mullakhmetov <ruslan@twentythree.net>
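
The process-group technique named in the feat entry above can be sketched as follows. This is a POSIX-only illustration that uses `subprocess` in place of the worker's own fork logic; `spawn_in_new_group` and `kill_group` are hypothetical helper names, not part of the library:

```python
import os
import signal
import subprocess


def spawn_in_new_group(args, **popen_kwargs):
    """Start a child that leads its own session and process group (POSIX only)."""
    # start_new_session=True makes the child call setsid() before exec, so any
    # subprocess it starts later lands in the same process group by default.
    return subprocess.Popen(args, start_new_session=True, **popen_kwargs)


def kill_group(proc, sig=signal.SIGKILL):
    """Signal the child's whole process group, then reap the child itself."""
    try:
        os.killpg(os.getpgid(proc.pid), sig)
    except ProcessLookupError:
        # The group is already gone; nothing left to signal.
        pass
    proc.wait()
```

Signalling the group rather than the single pid means processes started by the child are signalled as well, instead of being left running after their parent is killed.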
* handled unhandled exceptions in the work horse to prevent a job from being silently dropped without going into FailedRegistry
* changes after review
* made sure that work_horse always terminates in a proper way with tests
* minor refactoring
* fix for failing test
* fixes for the other tests
- removed exception handling (done in monitor_work_horse)
- adjusted some tests for the checks that are not relevant anymore
* review suggested changes
* cleanup
Co-authored-by: Ruslan Mullakhmetov <ruslan@twentythree.net>
* Initial implementation of Retry class
* Fixes job.refresh() under Python 3.5
* Remove the use of text_type in job.py
* Retry can be scheduled
* monitor_work_horse() should call handle_job_failure() with queue argument.
* Flake8 fixes
* Added docs for job retries
In our systems, this bug seemed to be the cause of the disappearing workers: worker keys would get a very small TTL in Redis and would eventually expire, thus mysteriously "disappearing" from dashboards.
The variable contains the server version and makes it possible to
determine which features are available. This is relevant for API changes
like HSET mappings in version 4.0.0 or LPOS in version 6.0.6.
To keep the number of connection.info() calls low, the information is
*cached* once determined, as the server version is unlikely to change
while the connection stays up.
Signed-off-by: Paul Spooren <mail@aparcar.org>
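
A minimal sketch of the caching idea described above, assuming a redis-py style connection whose info() returns a dict with a "redis_version" key. The helper name `get_version` and the `_server_version` attribute are illustrative assumptions and may differ from the actual change:

```python
from typing import Tuple


def get_version(connection) -> Tuple[int, ...]:
    """Return the server version as a tuple of ints, querying the server only once."""
    # Assumption: the cache lives on the connection object under a made-up name.
    cached = getattr(connection, "_server_version", None)
    if cached is None:
        raw = connection.info("server")["redis_version"]  # e.g. "6.0.6"
        cached = tuple(int(part) for part in raw.split(".")[:3])
        connection._server_version = cached
    return cached
```

Callers could then gate features with a tuple comparison such as `get_version(connection) >= (4, 0, 0)`.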