{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#hide\n", "from nbdev import *" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# fastlinkcheck\n", "\n", "> Check for broken external and internal links. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Check for broken links in HTML documents. This occurs in parallel so performance is fast. Both external links and external links are checked, with intelligent behavior for internal links." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Install" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`pip install fastlinkcheck`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Usage" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/markdown": [ "\n", "\n", "> link_check(**`path`**:\"Root directory searched recursively for HTML files\", **`host`**:\"Host and path (without protocol) of web server\"=*`''`*, **`config_file`**:\"Location of file with urls to ignore\"=*`None`*, **`actions_output`**:\"Toggle GitHub Actions output on/off\"=*`False`*, **`exit_on_found`**:\"(CLI Only) Exit with status code 1 if broken links are found\"=*`False`*, **`print_logs`**:\"Toggle printing logs to stdout.\"=*`False`*)\n", "\n", "Check for broken links recursively in `path`." ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from fastlinkcheck import link_check\n", "show_doc(link_check)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The [_example/](https://github.com/fastai/fastlinkcheck/tree/master/_example) directory in this repo contains sample HTML files which we can use for demonstration:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/html": [], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "- 'http://somecdn.com/doesntexist.html' was found in the following pages:\n", " - `/Users/hamelsmu/github/fastlinkcheck/_example/test.html`\n", "- Path('/Users/hamelsmu/github/fastlinkcheck/_example/test.js') was found in the following pages:\n", " - `/Users/hamelsmu/github/fastlinkcheck/_example/test.html`\n" ] } ], "source": [ "from fastlinkcheck import link_check\n", "\n", "broken_links = link_check(path='_example', host='fastlinkcheck.com')\n", "print(broken_links)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Print logs to stdout \n", "\n", "You can optionally print logs to stdout with the `print_logs` parameter. This can be useful for debugging:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/html": [], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "ERROR: The Following Broken Links or Paths were found:\n", "- 'http://somecdn.com/doesntexist.html' was found in the following pages:\n", " - `/Users/hamelsmu/github/fastlinkcheck/_example/test.html`\n", "- Path('/Users/hamelsmu/github/fastlinkcheck/_example/test.js') was found in the following pages:\n", " - `/Users/hamelsmu/github/fastlinkcheck/_example/test.html`\n" ] } ], "source": [ "broken_links = link_check(path='_example', host='fastlinkcheck.com', print_logs=True)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Number of broken links found 2\n" ] } ], "source": [ "print(f'Number of broken links found {len(broken_links)}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Ignore links with a configuration file" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can choose to ignore files with a a plain-text file containing a list of urls to ignore. For example, the file `linkcheck.rc` contains a list of urls I want to ignore:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "test.js\n", "https://www.google.com\n", "\n" ] } ], "source": [ "with open('_example/linkcheck.rc', 'r') as f: print(f.read())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this case `example/test.js` will be filtered out from the list:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/html": [], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "- 'http://somecdn.com/doesntexist.html' was found in the following pages:\n", " - `/Users/hamelsmu/github/fastlinkcheck/_example/test.html`\n" ] } ], "source": [ "broken_links = link_check(path='_example', host='fastlinkcheck.com', config_file='_example/linkcheck.rc')\n", "print(broken_links)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### CLI Function" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "link_check can also be called use from the command line like this:\n", "\n", "The `-h` or `--help` flag will allow you to see the command line docs:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "usage: link_check [-h] [--host HOST] [--config_file CONFIG_FILE]\r\n", " [--actions_output] [--exit_on_found] [--print_logs] [--pdb]\r\n", " [--xtra XTRA]\r\n", " path\r\n", "\r\n", "Check for broken links recursively in `path`.\r\n", "\r\n", "positional arguments:\r\n", " path Root directory searched recursively for HTML files\r\n", "\r\n", "optional arguments:\r\n", " -h, --help show this help message and exit\r\n", " --host HOST Host and path (without protocol) of web server\r\n", " (default: )\r\n", " --config_file CONFIG_FILE\r\n", " Location of file with urls to ignore\r\n", " --actions_output Toggle GitHub Actions output on/off (default: False)\r\n", " --exit_on_found (CLI Only) Exit with status code 1 if broken links are\r\n", " found (default: False)\r\n", " --print_logs Toggle printing logs to stdout. (default: False)\r\n", " --pdb Run in pdb debugger (default: False)\r\n", " --xtra XTRA Parse for additional args (default: '')\r\n" ] } ], "source": [ "!link_check -h" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" } }, "nbformat": 4, "nbformat_minor": 2 }