{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#hide\n", "from nbdev import *" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# fastlinkcheck\n", "\n", "> Check for broken external and internal links. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`fastlinkcheck` checks for broken links in HTML documents. Links are checked in parallel, so the check is fast. Both external and internal links are checked; internal links are verified against local files." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Install" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`pip install fastlinkcheck`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Usage" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/markdown": [ "\n", "\n", "> link_check(**`path`**:\"Root directory searched recursively for HTML files\", **`host`**:\"Host and path (without protocol) of web server\"=*`''`*, **`config_file`**:\"Location of file with urls to ignore\"=*`None`*, **`actions_output`**:\"Toggle GitHub Actions output on/off\"=*`False`*, **`exit_on_found`**:\"(CLI Only) Exit with status code 1 if broken links are found\"=*`False`*, **`print_logs`**:\"Toggle printing logs to stdout.\"=*`False`*)\n", "\n", "Check for broken links recursively in `path`." ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from fastlinkcheck import link_check\n", "show_doc(link_check)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The [_example/](https://github.com/fastai/fastlinkcheck/tree/master/_example) directory in this repo contains sample HTML files which we can use for demonstration. 
\n", "\n", "The `path` parameter specifies the directory that will be searched recursively for HTML files that you wish to check.\n", "\n", "Specifying the `host` parameter allows you to detect internal links by identifying links with that host name. External links are verified by making a request to the appropriate website, whereas internal links are verified by inspecting the presence and content of local files. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/html": [], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "- 'http://somecdn.com/doesntexist.html' was found in the following pages:\n", " - `/Users/hamelsmu/github/fastlinkcheck/_example/test.html`\n", "- Path('/Users/hamelsmu/github/fastlinkcheck/_example/test.js') was found in the following pages:\n", " - `/Users/hamelsmu/github/fastlinkcheck/_example/test.html`\n" ] } ], "source": [ "from fastlinkcheck import link_check\n", "\n", "broken_links = link_check(path='_example', host='fastlinkcheck.com')\n", "print(broken_links)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Print logs to stdout \n", "\n", "You can optionally print logs to stdout with the `print_logs` parameter. 
This can be useful for debugging:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/html": [], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "ERROR: The Following Broken Links or Paths were found:\n", "- 'http://somecdn.com/doesntexist.html' was found in the following pages:\n", " - `/Users/hamelsmu/github/fastlinkcheck/_example/test.html`\n", "- Path('/Users/hamelsmu/github/fastlinkcheck/_example/test.js') was found in the following pages:\n", " - `/Users/hamelsmu/github/fastlinkcheck/_example/test.html`\n" ] } ], "source": [ "broken_links = link_check(path='_example', host='fastlinkcheck.com', print_logs=True)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Number of broken links found 2\n" ] } ], "source": [ "print(f'Number of broken links found {len(broken_links)}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Ignore links with a configuration file" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can choose to ignore links with a plain-text file containing a list of urls to ignore. 
For example, the file `linkcheck.rc` contains a list of urls I want to ignore:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "test.js\n", "https://www.google.com\n", "\n" ] } ], "source": [ "with open('_example/linkcheck.rc', 'r') as f: print(f.read())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this case `_example/test.js` will be filtered out from the list:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/html": [], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "- 'http://somecdn.com/doesntexist.html' was found in the following pages:\n", " - `/Users/hamelsmu/github/fastlinkcheck/_example/test.html`\n" ] } ], "source": [ "broken_links = link_check(path='_example', host='fastlinkcheck.com', config_file='_example/linkcheck.rc')\n", "print(broken_links)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### CLI Function" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`link_check` can also be called from the command line. You can see the available options by passing the `--help` flag. 
These options correspond to the parameters of the `link_check` function described above.\n", "\n", "> link_check --help" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```\n", "usage: link_check [-h] [--host HOST] [--config_file CONFIG_FILE]\n", " [--actions_output] [--exit_on_found] [--print_logs] [--pdb]\n", " [--xtra XTRA]\n", " path\n", "\n", "Check for broken links recursively in `path`.\n", "\n", "positional arguments:\n", " path Root directory searched recursively for HTML files\n", "\n", "optional arguments:\n", " -h, --help show this help message and exit\n", " --host HOST Host and path (without protocol) of web server\n", " (default: )\n", " --config_file CONFIG_FILE\n", " Location of file with urls to ignore\n", " --actions_output Toggle GitHub Actions output on/off (default: False)\n", " --exit_on_found Exit with status code 1 if broken links are\n", " found (default: False)\n", " --print_logs Toggle printing logs to stdout. (default: False)\n", "```" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" } }, "nbformat": 4, "nbformat_minor": 2 }