# Changelog

All notable changes to this project will be documented in this file.

The format mostly follows [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).

## UNRELEASED

### Added

- New `enabled` option for all jobs. Set to `false` to disable a job without needing to remove it or comment it out (Requested in #625 by snowman, contributed in #785 by jamstah)
- New option `ignore_incomplete_reads` (Requested in #725 by wschoot, contributed in #787 by wfrisch)
- New option `wait_for` in browser jobs (Requested in #763 by yuis-ice, contributed in #810 by jamstah)
- Added tags to jobs and the ability to select them at the command line (#789 by jamstah)

### Changed

- Remove EOL'd Python 3.7 (new minimum requirement is Python 3.8), add Python 3.12 testing
- Add optional `reply_to` option for email reporters (#794 by trevorshannon)
- Replace the dead dependency `appdirs` with `platformdirs` (#811 by Maxime Werlen, #819 via e-dschungel)
- New concurrency test (#806 by jamstah)

### Fixed

- `email` reporter: Allow multiple recipients for `sendmail` method (#797, by monperrus)
- Fix documentation for watching GitHub tags and releases, again (#723)
- Fix `--test-reporter` command-line option so `separate` configuration option is no longer ignored when sending test notifications (#772, by marunjar)
- Fix line height and dark mode regression (#774 reported by kongomongo, PRs #777 and #778 by trevorshannon)
- Fix compatibility with lxml >= 5 which caused the CSS Selector filter to fail (#783 reported by jamesquilty, PR #786 by jamstah)
- Fix pep8 test to ignore files in the site-packages directory for cases where the venv is in the project directory (#788 by jamstah)
- Fix HTML diff table rendering for long line lengths (#793 by trevorshannon)
- Fix IndexError after failed edit (#801 by jwilk)
- Fix concurrency issue in Python 3.12 by upgrading to minidb 2.0.8 (fixes #779)

## [2.28] -- 2023-05-03

### Changed

- Browser jobs: Migrate from Pyppeteer to Playwright (#761, by Paul
Sattlegger, fixes #751)

## [2.27] -- 2023-05-03

### Added

- `css` and `xpath` filters now accept a `sort` subfilter to sort matched elements lexicographically

### Fixed

- Rework handling of running from a source checkout, fixes issues with example files when `urlwatch` was run as `/usr/sbin/urlwatch`, e.g. on Void Linux (fixes #755)
- Add support for docutils >= 0.18, which deprecated `frontend.OptionParser` (fixes #754)
- Browser jobs: Fix support for Python 3.11 with `@asyncio.coroutine` removal (#759, by Faster IT)

## [2.26] -- 2023-04-11

### Added

- `browser` job: Add support for specifying `useragent` (#700, by Francesco Versaci)
- Document how to ignore whitespace changes (PR#707, by Paulo Magalhaes)
- `shell` reporter: Call a script or program when changes are detected (fixes #650)
- New `separate` configuration option for reporters to split reports into one-per-job (contributed by Ryne Everett)
- `--change-location` option allowing job location to be changed without losing job history (#739, by trevorshannon)

### Changed

- Docs: Re-group diff-related topics and improve wording (PR#712, by neutric)
- Improved HTML e-mail diff style, including Dark Mode support (#730, by trevorshannon)
- Require Python >= 3.7, as Python 3.6 was EOL'd on 2021-12-23
- `Dockerfile`: Shrink image by switching to an Alpine-based Python 3.11 base image; this reduces the container size from 1 GiB to 151 MiB (#731, by Scott Edlund)
- `--gc-cache` can now take a parameter to keep more than 1 historical snapshot (#732, by trevorshannon)

### Fixed

- Limit e-mail header length to 78 characters to avoid issues with some SMTP servers (PR#703, fixes #702, by Julien Palard)
- Fix a ResourceWarning for unclosed files when running unit tests (PR#698, by Louis Sautier)
- Add support for html2text 2.1.1 and newer by feature-checking `-utf8` support via `-help` (fixes #718)
- html2text options were only applied to the first job when using `job_defaults` (PR#726, fixes #588, by trevorshannon)
- Update GitHub tags watch filter documentation with new XPath (fixes #723, by Luis Aranguren)
- Fix `--gc-cache` to clear unknown keys when using Redis storage (fixes #743, by scottmac)

## [2.25] -- 2022-03-15

### Added

- Add a `colored` setting for the Discord reporter, enabled by default (PR#683 by Michał Ciołek)
- Add a `splitlines` filter for trimming leading/trailing whitespace in each line (PR#693 by Lukas Anzinger)
- If a shell job fails, the job's `stdout` and `stderr` are added to the error message (fixes #689)
- `shell` job: Add optional `stderr` key to customize how output on `stderr` is treated
- Add `--dump-history JOB` command-line option to print historic job outputs (fixes #681)
- Add `display` / `empty-diff` configuration option to skip reports when diffs are empty due to `diff_filter` (fixes #692)
- New man pages: `urlwatch-intro(7)`, `urlwatch-deprecated(7)`, `urlwatch-cookbook(7)`, `urlwatch-jobs(5)`, `urlwatch-filters(5)`, `urlwatch-config(5)` and `urlwatch-reporters(5)`.

### Changed

- Require minidb 2.0.6; issue `VACUUM` only with `--gc-cache` (fixes #690)
- For shell jobs, `stderr` output isn't sent to urlwatch's stdout anymore; add `stdout: urlwatch` to your shell job definition if you depend on the old default behavior

### Fixed

- `pytest` command-line arguments are not wrongly interpreted by `CommandConfig` anymore (fixes #677)

### Packaging

- Man pages in `share/man/` are generated from `docs/source/` using Sphinx. In order to not require Sphinx for normal installation, `update-manpages.sh` is used to generate and fix up man pages stored in `shared/`. These man pages are stored in Git and in the release tarballs, so installations from source do not need to have Sphinx available for the manpages to be available.
- Packagers can customize the `manpages_url` setting in `docs/source/conf.py` to point to the distribution's web man pages for the generated HTML documentation (if Sphinx is used to generate HTML docs).

## [2.24] -- 2021-11-07

### Added

- The Telegram reporter has gained two new options:
  - `silent`: Receive message notification without sound
  - `monospace`: Format message in monospace style
- Support for running a subset of jobs by specifying their index on the command line

### Changed

- Migrated CI pipeline from Travis CI to GitHub Actions
- `user_visible_url` can now be specified for all job types (#654, by kongomongo)
- Added a `remove-duplicate-lines` filter.
- Added a `csv2text` filter.
- Set envelope from (`-f` option) when sending emails using `sendmail`
- It is now possible to override the HTTP `method` when `data` is set on a URL job

### Fixed

- Fix UnboundLocalError when SMTP auth is enabled, but keyring is not installed (#667)

## [2.23] -- 2021-04-10

### Added

- New filter: `pretty-xml` to indent/pretty-print XML documents
- New filter: `jq` to parse, transform, and extract JSON data
- New reporter: `prowl` (by nitz)

### Fixed

- Proper multi-line highlighting for wdiff (PR#615, by kongomongo)
- Fix command-line generation for html2text (PR#619, by Eloy Paris)

## [2.22] -- 2020-12-19

### Added

- Added `wait_until` option to browser jobs to configure how long the headless browser will wait for pages to load.
- Jobs now have an optional `treat_new_as_changed` (default `false`) key that can be set, and will treat newly-found pages as changed, and display a diff from the empty string (useful for `diff_tool` or `diff_filter` with side effects)
- New reporters: `discord`, `mattermost`
- New key `user_visible_url` for URL jobs that can be used to show a different URL in reports (useful if the watched URL is a REST API endpoint, but the report should link to the corresponding web page)
- The Markdown reporter now supports limiting the report length via the `max_length` parameter of the `submit` method.
The length limiting logic is smart in the sense that it will try trimming the details first, followed by omitting them completely, followed by omitting the summary. If a part of the report is omitted, a note about this is added to the report. (PR#572, by Denis Kasak)

### Changed

- Diff output is now generated more uniformly, independent of whether the input data has a trailing newline or not; if this behavior is not intended, use an external `diff_tool` (PR#550, by Adam Goldsmith)
- The `--test-diff-filter` output now properly reports timestamps from the history entry instead of the current date and time (Fixes #573)
- Unique GUIDs for jobs are now enforced at load time, append "#1", "#2", ... to the URLs to make them unique if you have multiple different jobs that share the same request URL (Fixes #586)
- When a config, urls file or hooks file does not exist and should be edited or inited, its parent folders will be created (previously only the urlwatch configuration folder was created; Fixes #594)
- Auto-matched filters now always get `None` supplied as subfilter; any custom filters must accept a `subfilter` parameter after the existing `data` parameter
- Drop support for Python 3.5

### Fixed

- Make imports thread-safe: This might increase startup times a bit, as dependencies are imported on bootup instead of when first used. Importing in Python is not (yet) thread-safe, so we cannot import new modules from the worker threads reliably (Fixes #559, #601)
- The Matrix reporter was improved in several ways (PR#572, by Denis Kasak):
  - The maximum length of the report was increased from 4096 to 16384.
  - The report length limiting is now implemented via the new length limiting functionality of the Markdown reporter. Previously, the report was simply trimmed at the end which could break the diff blocks and make them render incorrectly.
  - The diff code blocks are now tagged as diffs which will allow the diffs to be syntax highlighted as such.
This doesn't yet work in Element, pending the resolution of trentm/python-markdown2#370.

## [2.21] -- 2020-07-31

### Added

- Added `--test-reporter REPORTER` command-line option to send an example report using any configured notification service
- `JobBase` now has `main_thread_enter()` and `main_thread_exit()` functions that can be overridden by subclasses to run code in the main thread before and after processing of a job (based on an initial implementation by Chenfeng Bao)

### Removed

- The `--test-slack` command line option has been removed; you can test your Slack reporter configuration using `--test-reporter slack`

### Changed

- The `browser` job now uses Pyppeteer instead of Requests-HTML for rendering pages while executing JavaScript; this makes JavaScript execution work properly (based on code by Chenfeng Bao)

### Fixed

- Applying legacy `hooks.py` filters (broken since 2.19; reported by Maxime Werlen)

## [2.20] -- 2020-07-29

### Added

- A job can now have a `diff_filter` set, which works the same way as the normal `filter` (and has the same filters available), but applies to the `diff` output instead of the page content (can be tested with `--test-diff-filter`, needs 2 or more historic snapshots in the cache)
- Documentation now has a section on the configuration settings (`--edit-config`)
- New filter: ``ocr`` to convert text in images to plaintext (using Tesseract OCR)
- New reporters:
  - ``ifttt`` to send an event to If This Then That (ifttt.com) (#512, by Florian Gaultier)
  - ``xmpp`` to send a message using the XMPP (Jabber) protocol (#533, by Thorben Günther)

### Changed

- The `urlwatch` script (Git only) now works when run from different paths
- Chunking of strings (e.g. for Slack and Telegram) now adds numbering (e.g.
` (1/2)`) to the messages (only if a message is split into multiple parts)
- Unit tests have been migrated from `nose` to `pytest` and moved from `test/` to `lib/urlwatch/tests/`
- The ``css`` and ``xpath`` filters now accept ``skip`` and ``maxitems`` as subfilter
- The ``shellpipe`` filter now inherits all environment variables (e.g. ``$PATH``) of the ``urlwatch`` process

### Fixed

- The ``html2text`` method ``lynx`` now treats any subfilters with a non-``null`` value as command-line argument ``-key value`` (previously only the value ``true`` was treated like this, and any other values were silently dropped)

## [2.19] -- 2020-07-17

### Added

- Documentation is now available at [urlwatch.readthedocs.io](https://urlwatch.readthedocs.io) and shipped in the source tarball under `docs/`; filter examples in the docs are unit-tested
- New filters:
  - `reverse`: Reverse input items (default: line-based) with optional `separator`
  - `pdf2text`: Convert PDF files to plaintext (must be first filter in chain)
  - `shellpipe`: Filter text with arbitrary command-line utilities / shell scripts
- `FilterBase` API improvements for specifying subfilters:
  - Add `__supported_subfilters__` for sub filter checking and `--features` output
  - Add `__default_subfilter__` to map value-only parameters to dict parameters, for example the `grep` filter now has a default subfilter named `re`
- Support for using Redis as a cache backend via `--cache=redis://localhost:6379/`

### Fixed

- Declare updated Python 3.5 dependency in `setup.py` (already a requirement since urlwatch 2.18)

### Changed

- Filter improvements:
  - `sort`: Add `reverse` option to reverse the sorting order
  - `sort`: Add `separator` option to specify item separator (default is still line-based)
  - `beautify`: The `jsbeautifier` (for `