{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#default_exp page" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Pagination\n", "\n", "> Parallel and serial pagination" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#export\n", "from fastcore.utils import *\n", "from fastcore.foundation import *\n", "from ghapi.core import *\n", "\n", "import re\n", "from urllib.parse import parse_qs,urlsplit" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Paged operations" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Some GitHub API operations return their results one page at a time. For instance, there are many thousands of [gists](https://docs.github.com/en/free-pro-team@latest/github/writing-on-github/creating-gists), but if we call `list_public` we only see the first 30:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "api = GhApi()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "30" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "gists = api.gists.list_public()\n", "len(gists)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "That's because this operation takes two optional parameters, `per_page`, and `page`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/markdown": [ "[gists.list_public](https://docs.github.com/v3/gists/#list-public-gists)(since, per_page, page): *List public gists*" ], "text/plain": [ "[gists.list_public](https://docs.github.com/v3/gists/#list-public-gists)(since, per_page, page): *List public gists*" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "api.gists.list_public" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is a common pattern for `list_*` operations in the GitHub API. One way to get more results is to increase `per_page`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "100" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(api.gists.list_public(per_page=100))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "However, `per_page` has a maximum of `100`, so if you want more, you'll have to pass `page=` to get pages beyond the first. An easy way to iterate through all pages is to use `paged`." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#export\n", "def paged(oper, *args, per_page=30, max_pages=9999, **kwargs):\n", " \"Convert operation `oper(*args,**kwargs)` into an iterator\"\n", " yield from itertools.takewhile(noop, (oper(*args, per_page=per_page, page=i, **kwargs) for i in range(1,max_pages+1)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We'll demonstrate this using the `repos.list_for_org` method:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/markdown": [ "[repos.list_for_org](https://docs.github.com/v3/repos/#list-organization-repositories)(org, type, sort, direction, per_page, page): *List organization repositories*" ], "text/plain": [ "[repos.list_for_org](https://docs.github.com/v3/repos/#list-organization-repositories)(org, type, sort, direction, per_page, page): *List organization repositories*" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "api.repos.list_for_org" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(30, 'fast-image')" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "repos = api.repos.list_for_org('fastai')\n", "len(repos),repos[0].name" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To convert this operation into a Python iterator, pass the operation itself, along with any arguments (either keyword or positional) to `paged`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "repos = paged(api.repos.list_for_org, 'fastai')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can now iterate through `repos` using Python, e.g:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "30 fast-image\n", "30 fastforest\n", "30 .github\n", "3 tweetrel\n" ] } ], "source": [ "for page in repos: print(len(page), page[0].name)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Link header (RFC 5988)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "GitHub tells us how many pages are available using the [link header](https://tools.ietf.org/html/rfc5988). Unfortunately the pypi [LinkHeader](https://pypi.org/project/LinkHeader/) library appears to no longer be maintained, so we've put a refactored version of it here." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#export\n", "class _Scanner:\n", " def __init__(self, buf): self.buf,self.match = buf,None\n", " def __getitem__(self, key): return self.match.group(key)\n", " def scan(self, pattern):\n", " self.match = re.compile(pattern).match(self.buf)\n", " if self.match: self.buf = self.buf[self.match.end():]\n", " return self.match\n", "\n", "_QUOTED = r'\"((?:[^\"\\\\]|\\\\.)*)\"'\n", "_TOKEN = r'([^()<>@,;:\\\"\\[\\]?={}\\s]+)'\n", "_RE_COMMA_HREF = r' *,? *< *([^>]*) *> *'\n", "_RE_ATTR = rf'{_TOKEN} *(?:= *({_TOKEN}|{_QUOTED}))? 
*'" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#export\n", "def _parse_link_hdr(header):\n", " \"Parse an RFC 5988 link header, returning a `list` of `tuple`s of URL and attr `dict`\"\n", " scanner,links = _Scanner(header),[]\n", " while scanner.scan(_RE_COMMA_HREF):\n", " href,attrs = scanner[1],[]\n", " while scanner.scan('; *'):\n", " if scanner.scan(_RE_ATTR):\n", " attr_name, token, quoted = scanner[1], scanner[3], scanner[4]\n", " if quoted is not None: attrs.append([attr_name, quoted.replace(r'\\\"', '\"')])\n", " elif token is not None: attrs.append([attr_name, token])\n", " else: attrs.append([attr_name, None])\n", " links.append((href,dict(attrs)))\n", " if scanner.buf: raise Exception(f\"parse() failed at {scanner.buf!r}\")\n", " return links" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#export\n", "def parse_link_hdr(header):\n", " \"Parse an RFC 5988 link header, returning a `dict` from rels to a `tuple` of URL and attrs `dict`\"\n", " return {a.pop('rel'):(u,a) for u,a in _parse_link_hdr(header)}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here's an example of a link header with just one link:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'foo bar': ('http://example.com', {'type': 'text/html'})}" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "parse_link_hdr('; rel=\"foo bar\"; type=text/html')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "links = parse_link_hdr('; rel=\"foo bar\"; type=text/html')\n", "link = links['foo bar']\n", "test_eq(link[0], 'http://example.com')\n", "test_eq(link[1]['type'], 'text/html')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's test it on the headers we received on our last call to GitHub. You can access the last call's headers in `recv_hdrs':" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'; rel=\"prev\", ; rel=\"last\", ; rel=\"first\"'" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "api.recv_hdrs['Link']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here's what happens when we parse that:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'prev': ('https://api.github.com/organizations/20547620/repos?per_page=30&page=4',\n", " {}),\n", " 'last': ('https://api.github.com/organizations/20547620/repos?per_page=30&page=4',\n", " {}),\n", " 'first': ('https://api.github.com/organizations/20547620/repos?per_page=30&page=1',\n", " {})}" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "parse_link_hdr(api.recv_hdrs['Link'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Getting pages in parallel" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Rather than requesting each page one at a time, we can save some time by getting all the pages we need in parallel." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#export\n", "@patch\n", "def last_page(self:GhApi):\n", " \"Parse RFC 5988 link header from most recent operation, and extract the last page\"\n", " header = self.recv_hdrs.get('Link', '')\n", " last = nested_idx(parse_link_hdr(header), 'last', 0) or ''\n", " qs = parse_qs(urlsplit(last).query)\n", " return int(nested_idx(qs,'page',0) or 0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To help us know the number of pages needed, we can use `last_page`, which uses the link header we just looked at to grab the last page from GitHub.\n", "\n", "We will need multiple pages to get all the repos in the `github` organization, even if we get 100 at a time:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "4" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "api.repos.list_for_org('github', per_page=100)\n", "api.last_page()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#export\n", "def _call_page(i, oper, args, kwargs, per_page):\n", " return oper(*args, per_page=per_page, page=i, **kwargs)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#export\n", "def pages(oper, n_pages, *args, n_workers=None, per_page=100, **kwargs):\n", " \"Get `n_pages` pages from `oper(*args,**kwargs)`\"\n", " return parallel(_call_page, range(1,n_pages+1), oper=oper, per_page=per_page, args=args, kwargs=kwargs,\n", " progress=False, n_workers=ifnone(n_workers,n_pages), threadpool=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`pages` by default passes `per_page=100` to the operation.\n", "\n", "Let's look at some examples. To get all the pages for the repos in the `github` organization in parallel, we can use this:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "367" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "gh_repos = pages(api.repos.list_for_org, api.last_page(), 'github').concat()\n", "len(gh_repos)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you already know ahead of time the number of pages required, there's no need to call `last_page`. For instance, the GitHub docs specify that we can get at most 3000 gists:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3000" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "gists = pages(api.gists.list_public, 30).concat()\n", "len(gists)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "GitHub ignores the `per_page` parameter for some API calls, such as listing public events, which it limits to 8 pages of 30 items per page. 
 ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "8" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "api.activity.list_public_events()\n", "api.last_page()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "232" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "evts = pages(api.activity.list_public_events, api.last_page(), per_page=30).concat()\n", "len(evts)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Export -" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Converted 00_core.ipynb.\n", "Converted 01_actions.ipynb.\n", "Converted 02_auth.ipynb.\n", "Converted 03_page.ipynb.\n", "Converted 04_event.ipynb.\n", "Converted 10_cli.ipynb.\n", "Converted 50_fullapi.ipynb.\n", "Converted 80_tutorial_actions.ipynb.\n", "Converted 90_build_lib.ipynb.\n", "Converted Untitled.ipynb.\n", "Converted ghapi demo.ipynb.\n", "Converted index.ipynb.\n" ] } ], "source": [ "#hide\n", "from nbdev.export import notebook2script\n", "notebook2script()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" } }, "nbformat": 4, "nbformat_minor": 4 }