{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"\n",
"\n",
"\n",
"---\n",
"Start with [convert](https://nbviewer.jupyter.org/github/annotation/banks/blob/master/programs/convert.ipynb)\n",
"\n",
"---"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Getting data from online repos\n",
"\n",
"We show the various automatic ways by which you can get data that is out there on GitHub to your computer.\n",
"\n",
"The work horse is the function `checkoutRepo()` in `tf.applib.repo`.\n",
"\n",
"Text-Fabric uses this function for all operations where data flows from GitHub to your computer.\n",
"\n",
"There are quite some options, and here we explain all the `checkout` options, i.e. the selection of\n",
"data from the history.\n",
"\n",
"See also the [documentation](https://annotation.github.io/text-fabric/tf/advanced/repo.html)."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"%load_ext autoreload\n",
"%autoreload 2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Leading example\n",
"\n",
"We use markdown display from IPython purely for presentation.\n",
"It is not needed to run `checkoutRepo()`."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"from tf.advanced.helpers import dm\n",
"from tf.advanced.repo import checkoutRepo"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We work with our tiny example TF app: `banks`."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"lines_to_next_cell": 2
},
"outputs": [],
"source": [
"ORG = \"annotation\"\n",
"REPO = \"banks\"\n",
"MAIN = \"tf\"\n",
"MOD = \"sim/tf\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`MAIN`points to the main data, `MOD` points to a module of data: the similarity feature."
]
},
{
"cell_type": "markdown",
"metadata": {
"lines_to_next_cell": 2
},
"source": [
"## Presenting the results\n",
"\n",
"The function `do()` just formats the results of a `checkoutRepo()` run.\n",
"\n",
"The result of such a run, after the progress messages, is a tuple.\n",
"For the explanation of the tuple, read the [docs](https://annotation.github.io/text-fabric/tf/advanced/repo.html)."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"def do(task):\n",
" md = f\"\"\"\n",
"commit | release | local | base | subdir\n",
"--- | --- | --- | --- | ---\n",
"`{task[0]}` | `{task[1]}` | `{task[2]}` | `{task[3]}` | `{task[4]}`\n",
"\"\"\"\n",
" dm(md)"
]
},
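{
"cell_type": "markdown",
"metadata": {},
"source": [
"The five positions that `do()` prints can be given names, which makes a failed checkout easy to spot.\n",
"The following is a minimal sketch, not part of Text-Fabric: the class `Checkout` and its property `ok` are hypothetical,\n",
"the field names are simply the column headers used by `do()`, and the sample tuples are copied from outputs further down in this notebook.\n",
"\n",
"```python\n",
"from typing import NamedTuple, Optional, Union\n",
"\n",
"\n",
"class Checkout(NamedTuple):\n",
"    # Hypothetical wrapper; field names follow the table header printed by do().\n",
"    commit: Optional[str]\n",
"    release: Optional[str]\n",
"    local: Union[str, bool, None]\n",
"    base: Union[str, bool]\n",
"    subdir: Optional[str]\n",
"\n",
"    @property\n",
"    def ok(self):\n",
"        # In the failed runs shown in this notebook, base comes back as False.\n",
"        return bool(self.base)\n",
"\n",
"\n",
"# A successful local lookup and a failed lookup, as reported in this notebook.\n",
"found = Checkout(\n",
"    '9713e71c18fd296cf1860d6411312f9127710ba7',\n",
"    'v2.0',\n",
"    'local',\n",
"    '~/text-fabric-data',\n",
"    'annotation/banks/tf',\n",
")\n",
"failed = Checkout(None, None, False, False, None)\n",
"\n",
"print(found.ok, found.release)\n",
"print(failed.ok)\n",
"```\n",
"\n",
"With a real run you could write `Checkout(*checkoutRepo(...))`, assuming the result has exactly the five elements that `do()` indexes."
]
},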
{
"cell_type": "markdown",
"metadata": {
"incorrectly_encoded_metadata": "toc-hr-collapsed=false"
},
"source": [
"## All the checkout options\n",
"\n",
"We discuss the meaning and effects of the values you can pass to the `checkout` option."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### `clone`\n",
"\n",
"> Look whether the appropriate folder exists under your `~/github` directory.\n",
"\n",
"This is merely a check whether your data exists in the expected location.\n",
"\n",
"* No online checks take place.\n",
"* No data is moved or copied.\n",
"\n",
"**NB**: you cannot select releases and commits in your *local* GitHub clone.\n",
"The data will be used as it is found on your file system.\n",
"\n",
"**When to use**\n",
"\n",
"> If you are developing new feature data.\n",
"\n",
"When you develop your data in a repository, your development is private as long as you\n",
"do not push to GitHub.\n",
"\n",
"You can test your data, even without locally committing your data.\n",
"\n",
"But, if you are ready to share your data, everything is in place, and you only\n",
"have to commit and push, and pass the location on GitHub to others, like\n",
"\n",
"```\n",
"myorg/myrepo/subfolder\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"data: ~/github/annotation/banks/tf/0.2"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/markdown": [
"\n",
"commit | release | local | base | subdir\n",
"--- | --- | --- | --- | ---\n",
"`None` | `None` | `clone` | `~/github` | `annotation/banks/tf`\n"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"do(checkoutRepo(org=ORG, repo=REPO, folder=MAIN, version=\"0.2\", checkout=\"clone\"))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We show what happens if you do not have a local GitHub clone in `~/github`."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"%%sh\n",
"\n",
"mv ~/github/annotation/banks/tf ~/github/annotation/banks/tfxxx"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"The requested data is not available offline\n"
]
},
{
"data": {
"text/markdown": [
"\n",
"commit | release | local | base | subdir\n",
"--- | --- | --- | --- | ---\n",
"`None` | `None` | `False` | `False` | `None`\n"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"do(checkoutRepo(org=ORG, repo=REPO, folder=MAIN, version=\"0.2\", checkout=\"clone\"))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that no attempt is made to retrieve online data."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"%%sh\n",
"\n",
"mv ~/github/annotation/banks/tfxxx ~/github/annotation/banks/tf"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### `local`\n",
"\n",
"> Look whether the appropriate folder exists under your `~/text-fabric-data` directory.\n",
"\n",
"This is merely a check whether your data exists in the expected location.\n",
"\n",
"* No online checks take place.\n",
"* No data is moved or copied.\n",
"\n",
"**When to use**\n",
"\n",
"> If you are using data created and shared by others, and if the data\n",
"is already on your system.\n",
"\n",
"You can be sure that no updates are downloaded, and that everything works the same as the last time\n",
"you ran your program.\n",
"\n",
"If you do not already have the data, you have to pass `latest` or `hot` or `''` which will be discussed below."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"data: ~/text-fabric-data/annotation/banks/tf/0.2"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/markdown": [
"\n",
"commit | release | local | base | subdir\n",
"--- | --- | --- | --- | ---\n",
"`9713e71c18fd296cf1860d6411312f9127710ba7` | `v2.0` | `local` | `~/text-fabric-data` | `annotation/banks/tf`\n"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"do(checkoutRepo(org=ORG, repo=REPO, folder=MAIN, version=\"0.2\", checkout=\"local\"))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You see this data because earlier I have downloaded release `v2.0`, which is a tag for\n",
"the commit with hash `9713e71c18fd296cf1860d6411312f9127710ba7`."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you do not have any corresponding data in your `~/text-fabric-data`, you get this:"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"%%sh\n",
"\n",
"mv ~/text-fabric-data/annotation/banks/tf ~/text-fabric-data/annotation/banks/tfxxx"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"The requested data is not available offline\n"
]
},
{
"data": {
"text/markdown": [
"\n",
"commit | release | local | base | subdir\n",
"--- | --- | --- | --- | ---\n",
"`None` | `None` | `False` | `False` | `None`\n"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"do(checkoutRepo(org=ORG, repo=REPO, folder=MAIN, version=\"0.2\", checkout=\"local\"))"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"%%sh\n",
"\n",
"mv ~/text-fabric-data/annotation/banks/tfxxx ~/text-fabric-data/annotation/banks/tf"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### `''` (default)\n",
"\n",
"This is about when you omit the `checkout` parameter, or pass `''` to it.\n",
"\n",
"The destination for local data is your `~/text-fabric-data` folder.\n",
"\n",
"If you have already a local copy of the data, that will be used.\n",
"\n",
"If not:\n",
"\n",
"> Note that if your local data is outdated, no new data will be downloaded.\n",
"You need `latest` or `hot` for that.\n",
"\n",
"But what is the latest online copy? In this case we mean:\n",
"\n",
"* the latest *release*, and from that release an appropriate attached zip file\n",
"* but if there is no such zip file, we take the files from the corresponding commit\n",
"* but if there is no release at all, we take the files from the *latest commit*.\n",
"\n",
"**When to use**\n",
"\n",
"> If you need data created/shared by other people and you want to be sure that you always have the\n",
"same copy that you initially downloaded.\n",
"\n",
"* If the data provider makes releases after important modifications, you will get those.\n",
"* If the data provider is experimenting after the latest release, and commits them to GitHub,\n",
" you do not get those.\n",
"\n",
"However, with `hot`, you `can` get the latest commit, to be discussed below."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"data: ~/text-fabric-data/annotation/banks/tf/0.2"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/markdown": [
"\n",
"commit | release | local | base | subdir\n",
"--- | --- | --- | --- | ---\n",
"`9713e71c18fd296cf1860d6411312f9127710ba7` | `v2.0` | `local` | `~/text-fabric-data` | `annotation/banks/tf`\n"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"do(checkoutRepo(org=ORG, repo=REPO, folder=MAIN, version=\"0.2\", checkout=\"\"))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that no data has been downloaded, because it has detected that there is already local data on your computer."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you do not have any checkout of this data on your computer, the data will be downloaded."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [],
"source": [
"%%sh\n",
"\n",
"rm -rf ~/text-fabric-data/annotation/banks/tf"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The requested data is not available offline\n",
"rate limit is 5000 requests per hour, with 4994 left for this hour\n",
"\tconnecting to online GitHub repo annotation/banks ... connected\n",
"\tdownloading https://github.com/annotation/banks/releases/download/v2.0/tf-0.2.zip ... \n",
"\tunzipping ... \n",
"\tsaving data\n"
]
},
{
"data": {
"text/html": [
"data: ~/text-fabric-data/annotation/banks/tf/0.2"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/markdown": [
"\n",
"commit | release | local | base | subdir\n",
"--- | --- | --- | --- | ---\n",
"`9713e71c18fd296cf1860d6411312f9127710ba7` | `v2.0` | `None` | `~/text-fabric-data` | `annotation/banks/tf`\n"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"do(checkoutRepo(org=ORG, repo=REPO, folder=MAIN, version=\"0.2\", checkout=\"\"))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Note about versions and releases\n",
"\n",
"The **version** of the data is not necessarily the same concept as the **release** of it.\n",
"\n",
"It is possible to keep the versions and the releases strictly parallel,\n",
"but in text conversion workflows it can be handy to make a distinction between them,\n",
"e.g. as follows:\n",
"\n",
"> the version is a property of the input data\n",
"> the release is a property of the output data\n",
"\n",
"When you create data from sources using conversion algorithms,\n",
"you want to increase the version if you get new input data, e.g. as a result of corrections\n",
"made by the author.\n",
"\n",
"But if you modify your conversion algorithm, while still running it on the same input data,\n",
"you may release the new output data as a **new release** of the **same version**.\n",
"\n",
"Likewise, when the input data stays the same, but you have corrected typos in the metadata,\n",
"you can make a **new release** of the **same version** of the data.\n",
"\n",
"The conversion delivers the features under a specific version,\n",
"and Text-Fabric supports those versions: users of TF can select the version they work with.\n",
"\n",
"Releases are made in the version control system (git and GitHub).\n",
"The part of Text-Fabric that auto-downloads data is aware of releases.\n",
"But once the data has been downloaded in place, there is no machinery in Text-Fabric to handle\n",
"different releases.\n",
"\n",
"Yet the release tag and commit hash are passed on to the point where it comes to recording\n",
"the provenance of the data."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Download a different version\n",
"\n",
"We download version `0.1` of the data."
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The requested data is not available offline\n",
"rate limit is 5000 requests per hour, with 4985 left for this hour\n",
"\tconnecting to online GitHub repo annotation/banks ... connected\n",
"\ttf/0.1/author.tf...downloaded\n",
"\ttf/0.1/gap.tf...downloaded\n",
"\ttf/0.1/letters.tf...downloaded\n",
"\ttf/0.1/number.tf...downloaded\n",
"\ttf/0.1/oslots.tf...downloaded\n",
"\ttf/0.1/otext.tf...downloaded\n",
"\ttf/0.1/otype.tf...downloaded\n",
"\ttf/0.1/punc.tf...downloaded\n",
"\ttf/0.1/terminator.tf...downloaded\n",
"\ttf/0.1/title.tf...downloaded\n",
"\tOK\n"
]
},
{
"data": {
"text/html": [
"data: ~/text-fabric-data/annotation/banks/tf/0.1"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/markdown": [
"\n",
"commit | release | local | base | subdir\n",
"--- | --- | --- | --- | ---\n",
"`9713e71c18fd296cf1860d6411312f9127710ba7` | `v2.0` | `None` | `~/text-fabric-data` | `annotation/banks/tf`\n"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"do(checkoutRepo(org=ORG, repo=REPO, folder=MAIN, version=\"0.1\", checkout=\"\"))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Several observations:\n",
"\n",
"* we obtained the older version from the *latest* release, which is still release `v2.0`\n",
"* the download looks different from when we downloaded version `0.2`;\n",
" this is because the data producer has zipped the `0.2` data and has attached it to release `v2.0`,\n",
" but he forgot, or deliberately refused, to attach version `0.1` to that release;\n",
" so it has been retrieved directly from the files in the corresponding commit, which is\n",
" `9713e71c18fd296cf1860d6411312f9127710ba7`."
]
},
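{
"cell_type": "markdown",
"metadata": {},
"source": [
"The two downloads above follow the fallback order listed under `checkout=''`.\n",
"Here is a minimal sketch of that order. It is an illustration only: the function `choose_download` and its parameters are hypothetical\n",
"and not part of Text-Fabric; the hashes are the ones reported in this notebook.\n",
"\n",
"```python\n",
"def choose_download(release, release_commit, zip_attached, latest_commit):\n",
"    '''Return (method, commit, release) for a fresh download with the default checkout.'''\n",
"    if release is None:\n",
"        # No release at all: take the files from the latest commit.\n",
"        return ('files', latest_commit, None)\n",
"    if zip_attached:\n",
"        # The latest release has a suitable zip file attached.\n",
"        return ('zip', release_commit, release)\n",
"    # A release exists, but without a suitable zip: take the files from its commit.\n",
"    return ('files', release_commit, release)\n",
"\n",
"\n",
"TAGGED = '9713e71c18fd296cf1860d6411312f9127710ba7'  # commit of release v2.0\n",
"NEWEST = '8d87675fd02ee96ad6f4c3a5ce99e0bda8277a54'  # commit reported by the hot runs\n",
"\n",
"print(choose_download('v2.0', TAGGED, True, NEWEST))   # like version 0.2: zip attached\n",
"print(choose_download('v2.0', TAGGED, False, NEWEST))  # like version 0.1: no zip attached\n",
"print(choose_download(None, None, False, NEWEST))      # a repo without any release\n",
"```\n",
"\n",
"The first two calls mirror the `0.2` and `0.1` downloads shown above; the third covers a repository that has no releases."
]
},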
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For the verification, an online check is needed. The verification consists of checking the release tag and/or commit hash.\n",
"\n",
"If there is no online connection, you get this:"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [],
"source": [
"%%sh\n",
"\n",
"networksetup -setairportpower en0 off"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"no internet\n",
"The offline data may not be the latest\n"
]
},
{
"data": {
"text/html": [
"data: ~/text-fabric-data/annotation/banks/tf/0.1"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/markdown": [
"\n",
"commit | release | local | base | subdir\n",
"--- | --- | --- | --- | ---\n",
"`9713e71c18fd296cf1860d6411312f9127710ba7` | `v2.0` | `None` | `~/text-fabric-data` | `annotation/banks/tf`\n"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"do(checkoutRepo(org=ORG, repo=REPO, folder=MAIN, version=\"0.1\", checkout=\"latest\"))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"or if you do not have local data:"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [],
"source": [
"%%sh\n",
"\n",
"mv ~/text-fabric-data/annotation/banks/tf/0.1 ~/text-fabric-data/annotation/banks/tf/0.1xxx"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"no internet\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"The requested data is not available offline\n"
]
},
{
"data": {
"text/markdown": [
"\n",
"commit | release | local | base | subdir\n",
"--- | --- | --- | --- | ---\n",
"`None` | `None` | `False` | `False` | `None`\n"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"do(checkoutRepo(org=ORG, repo=REPO, folder=MAIN, version=\"0.1\", checkout=\"latest\"))"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [],
"source": [
"%%sh\n",
"\n",
"mv ~/text-fabric-data/annotation/banks/tf/0.1xxx ~/text-fabric-data/annotation/banks/tf/0.1"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [],
"source": [
"%%sh\n",
"\n",
"networksetup -setairportpower en0 on"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### `latest`\n",
"\n",
"> The latest online release will be identified,\n",
"and if you do not have that copy locally, it will be downloaded.\n",
"\n",
"**When to use**\n",
"\n",
"> If you need data created/shared by other people and you want to be sure that you always have the\n",
"latest *stable* version of that data, unreleased data is not good enough.\n",
"\n",
"One of the difference with `checkout=''` is that if there are no releases, you will not get data."
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"rate limit is 5000 requests per hour, with 4963 left for this hour\n",
"\tconnecting to online GitHub repo annotation/banks ... connected\n"
]
},
{
"data": {
"text/html": [
"data: ~/text-fabric-data/annotation/banks/tf/0.2"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/markdown": [
"\n",
"commit | release | local | base | subdir\n",
"--- | --- | --- | --- | ---\n",
"`9713e71c18fd296cf1860d6411312f9127710ba7` | `v2.0` | `None` | `~/text-fabric-data` | `annotation/banks/tf`\n"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"do(checkoutRepo(org=ORG, repo=REPO, folder=MAIN, version=\"0.2\", checkout=\"latest\"))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There is no sim/tf data in any release commit, so if we look it up, it should fail."
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"rate limit is 5000 requests per hour, with 4960 left for this hour\n",
"\tconnecting to online GitHub repo annotation/banks ... connected\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"No directory sim/tf/0.2 in #9713e71c18fd296cf1860d6411312f9127710ba7\tFailed"
]
},
{
"data": {
"text/markdown": [
"\n",
"commit | release | local | base | subdir\n",
"--- | --- | --- | --- | ---\n",
"`None` | `None` | `False` | `False` | `None`\n"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"do(checkoutRepo(org=ORG, repo=REPO, folder=MOD, version=\"0.2\", checkout=\"latest\"))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"But with `checkout=''` it will only be found if you do not have local data already:"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"data: ~/text-fabric-data/annotation/banks/sim/tf/0.2"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/markdown": [
"\n",
"commit | release | local | base | subdir\n",
"--- | --- | --- | --- | ---\n",
"`8d87675fd02ee96ad6f4c3a5ce99e0bda8277a54` | `None` | `local` | `~/text-fabric-data` | `annotation/banks/sim/tf`\n"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"do(checkoutRepo(org=ORG, repo=REPO, folder=MOD, version=\"0.2\", checkout=\"\"))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In that case there is only one way: `hot`:"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"rate limit is 5000 requests per hour, with 4950 left for this hour\n",
"\tconnecting to online GitHub repo annotation/banks ... connected\n"
]
},
{
"data": {
"text/html": [
"data: ~/text-fabric-data/annotation/banks/sim/tf/0.2"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/markdown": [
"\n",
"commit | release | local | base | subdir\n",
"--- | --- | --- | --- | ---\n",
"`8d87675fd02ee96ad6f4c3a5ce99e0bda8277a54` | `None` | `None` | `~/text-fabric-data` | `annotation/banks/sim/tf`\n"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"do(checkoutRepo(org=ORG, repo=REPO, folder=MOD, version=\"0.2\", checkout=\"hot\"))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### `hot`\n",
"\n",
"> The latest online commit will be identified,\n",
"and if you do not have that copy locally, it will be downloaded.\n",
"\n",
"**When to use**\n",
"\n",
"> If you need data created/shared by other people and you want to be sure that you always have the\n",
"latest version of that data, whether released or not.\n",
"\n",
"The difference with `checkout=''` is that if there are releases,\n",
"you will now get data that may be newer than the latest release."
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"rate limit is 5000 requests per hour, with 4947 left for this hour\n",
"\tconnecting to online GitHub repo annotation/banks ... connected\n",
"\ttf/0.2/author.tf...downloaded\n",
"\ttf/0.2/gap.tf...downloaded\n",
"\ttf/0.2/letters.tf...downloaded\n",
"\ttf/0.2/number.tf...downloaded\n",
"\ttf/0.2/oslots.tf...downloaded\n",
"\ttf/0.2/otext.tf...downloaded\n",
"\ttf/0.2/otype.tf...downloaded\n",
"\ttf/0.2/punc.tf...downloaded\n",
"\ttf/0.2/terminator.tf...downloaded\n",
"\ttf/0.2/title.tf...downloaded\n",
"\tOK\n"
]
},
{
"data": {
"text/html": [
"data: ~/text-fabric-data/annotation/banks/tf/0.2"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/markdown": [
"\n",
"commit | release | local | base | subdir\n",
"--- | --- | --- | --- | ---\n",
"`8d87675fd02ee96ad6f4c3a5ce99e0bda8277a54` | `None` | `None` | `~/text-fabric-data` | `annotation/banks/tf`\n"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"do(checkoutRepo(org=ORG, repo=REPO, folder=MAIN, version=\"0.2\", checkout=\"hot\"))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Observe that data has been downloaded, and that we have now data corresponding to a different commit hash,\n",
"and not corresponding to a release.\n",
"\n",
"If we now ask for the latest *stable* data, the data will be downloaded anew."
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"rate limit is 5000 requests per hour, with 4931 left for this hour\n",
"\tconnecting to online GitHub repo annotation/banks ... connected\n",
"\tdownloading https://github.com/annotation/banks/releases/download/v2.0/tf-0.2.zip ... \n",
"\tunzipping ... \n",
"\tsaving data\n"
]
},
{
"data": {
"text/html": [
"data: ~/text-fabric-data/annotation/banks/tf/0.2"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/markdown": [
"\n",
"commit | release | local | base | subdir\n",
"--- | --- | --- | --- | ---\n",
"`9713e71c18fd296cf1860d6411312f9127710ba7` | `v2.0` | `None` | `~/text-fabric-data` | `annotation/banks/tf`\n"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"do(checkoutRepo(org=ORG, repo=REPO, folder=MAIN, version=\"0.2\", checkout=\"latest\"))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### `v1.0` a specific release\n",
"\n",
"> Look for a specific online release to get data from.\n",
"\n",
"**When to use**\n",
"\n",
"> When you want to replicate something, and need data from an earlier point in the history."
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"rate limit is 5000 requests per hour, with 4924 left for this hour\n",
"\tconnecting to online GitHub repo annotation/banks ... connected\n",
"\tdownloading https://github.com/annotation/banks/releases/download/v1.0/tf-0.1.zip ... \n",
"\tunzipping ... \n",
"\tsaving data\n"
]
},
{
"data": {
"text/html": [
"data: ~/text-fabric-data/annotation/banks/tf/0.1"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/markdown": [
"\n",
"commit | release | local | base | subdir\n",
"--- | --- | --- | --- | ---\n",
"`5b7dca212dd456e705f4c2cb1aa0f895ab5b2fc9` | `v1.0` | `None` | `~/text-fabric-data` | `annotation/banks/tf`\n"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"do(checkoutRepo(org=ORG, repo=REPO, folder=MAIN, version=\"0.1\", checkout=\"v1.0\"))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We might try to get version `0.2` from this release."
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"rate limit is 5000 requests per hour, with 4917 left for this hour\n",
"\tconnecting to online GitHub repo annotation/banks ... connected\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"No directory tf/0.2 in #5b7dca212dd456e705f4c2cb1aa0f895ab5b2fc9\tFailed"
]
},
{
"data": {
"text/markdown": [
"\n",
"commit | release | local | base | subdir\n",
"--- | --- | --- | --- | ---\n",
"`None` | `None` | `False` | `False` | `None`\n"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"do(checkoutRepo(org=ORG, repo=REPO, folder=MAIN, version=\"0.2\", checkout=\"v1.0\"))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"At that early point in the history there is not yet a version `0.2` of the data."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### `a81746c` a specific commit\n",
"\n",
"> Look for a specific online commit to get data from.\n",
"\n",
"**When to use**\n",
"\n",
"> When you want to replicate something, and need data from an earlier point in the history, and there is no\n",
"release for that commit."
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"rate limit is 5000 requests per hour, with 4907 left for this hour\n",
"\tconnecting to online GitHub repo annotation/banks ... connected\n",
"\ttf/0.1/author.tf...downloaded\n",
"\ttf/0.1/gap.tf...downloaded\n",
"\ttf/0.1/letters.tf...downloaded\n",
"\ttf/0.1/number.tf...downloaded\n",
"\ttf/0.1/oslots.tf...downloaded\n",
"\ttf/0.1/otext.tf...downloaded\n",
"\ttf/0.1/otype.tf...downloaded\n",
"\ttf/0.1/punc.tf...downloaded\n",
"\ttf/0.1/terminator.tf...downloaded\n",
"\ttf/0.1/title.tf...downloaded\n",
"\tOK\n"
]
},
{
"data": {
"text/html": [
"data: ~/text-fabric-data/annotation/banks/tf/0.1"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/markdown": [
"\n",
"commit | release | local | base | subdir\n",
"--- | --- | --- | --- | ---\n",
"`a81746c5f9627637db4dae04c2d5348bda9e511a` | `None` | `None` | `~/text-fabric-data` | `annotation/banks/tf`\n"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"do(\n",
" checkoutRepo(\n",
" org=ORG,\n",
" repo=REPO,\n",
" folder=MAIN,\n",
" version=\"0.1\",\n",
" checkout=\"a81746c5f9627637db4dae04c2d5348bda9e511a\",\n",
" )\n",
")"
]
},
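{
"cell_type": "markdown",
"metadata": {},
"source": [
"The *When to use* notes of the options above can be condensed into a small helper.\n",
"This is only a sketch under stated assumptions: `suggest_checkout` and its parameters are hypothetical and not part of Text-Fabric;\n",
"the returned strings are the `checkout` values discussed above.\n",
"\n",
"```python\n",
"def suggest_checkout(developing=False, pinned=None, frozen=False, unreleased=False, stable=False):\n",
"    '''Suggest a value for the checkout parameter.\n",
"\n",
"    pinned: a release tag or commit hash needed to replicate earlier work.\n",
"    '''\n",
"    if developing:\n",
"        return 'clone'   # work from your own clone, nothing is downloaded\n",
"    if pinned:\n",
"        return pinned    # a specific release or commit\n",
"    if frozen:\n",
"        return 'local'   # use what is on disk, never go online\n",
"    if unreleased:\n",
"        return 'hot'     # latest commit, released or not\n",
"    if stable:\n",
"        return 'latest'  # latest release only\n",
"    return ''            # default: download only if nothing is there yet\n",
"\n",
"\n",
"print(repr(suggest_checkout(developing=True)))\n",
"print(repr(suggest_checkout(pinned='v1.0')))\n",
"print(repr(suggest_checkout(unreleased=True)))\n",
"print(repr(suggest_checkout()))\n",
"```\n",
"\n",
"The order of the tests is a design choice of this sketch: developing your own data and replicating earlier work take precedence over freshness."
]
},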
{
"cell_type": "markdown",
"metadata": {
"incorrectly_encoded_metadata": "toc-hr-collapsed=false"
},
"source": [
"## `source` and `dest`: an alternative for `~/github` and `~/text-fabric-data`\n",
"\n",
"Everything so far uses the hard-wired `~/github` and `~/text-fabric-data` directories.\n",
"But you can change that:\n",
"\n",
"* pass `source` as a replacement for `~/github`.\n",
"* pass `dest` as a replacement for `~/text-fabric-data`.\n",
"\n",
"**When to use**\n",
"\n",
"> if you do not want to interfere with the `~/text-fabric-data` directory.\n",
"\n",
"Text-Fabric manages the `~/text-fabric-data` directory,\n",
"and if you are experimenting outside Text-Fabric\n",
"you may not want to touch its data directory.\n",
"\n",
"> if you want to clone data into your `~/github` directory.\n",
"\n",
"Normally, TF uses your `~/github` directory as a source of information,\n",
"and never writes into it.\n",
"But if you explicitly pass `dest=~/github`, things change: downloads will\n",
"arrive under `~/github`. Use this with care.\n",
"\n",
"> if you work with cloned data outside your `~/github` directory,\n",
"\n",
"you can let the system look in `source` instead of `~/github`."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We customize source and destination directories:\n",
"\n",
"* we put them both under `~/Downloads`\n",
"* we give them different names"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [],
"source": [
"MY_GH = \"~/Downloads/repoclones\"\n",
"MY_TFD = \"~/Downloads/textbase\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Download a fresh copy of the data to `~/Downloads/textbase` instead."
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The requested data is not available offline\n",
"rate limit is 5000 requests per hour, with 4891 left for this hour\n",
"\tconnecting to online GitHub repo annotation/banks ... connected\n",
"\tdownloading https://github.com/annotation/banks/releases/download/v2.0/tf-0.2.zip ... \n",
"\tunzipping ... \n",
"\tsaving data\n"
]
},
{
"data": {
"text/html": [
"data: ~/Downloads/textbase/annotation/banks/tf/0.2"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/markdown": [
"\n",
"commit | release | local | base | subdir\n",
"--- | --- | --- | --- | ---\n",
"`9713e71c18fd296cf1860d6411312f9127710ba7` | `v2.0` | `None` | `~/Downloads/textbase` | `annotation/banks/tf`\n"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"do(\n",
" checkoutRepo(\n",
" org=ORG,\n",
" repo=REPO,\n",
" folder=MAIN,\n",
" version=\"0.2\",\n",
" checkout=\"\",\n",
" source=MY_GH,\n",
" dest=MY_TFD,\n",
" )\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Lookup the same data locally."
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"data: ~/Downloads/textbase/annotation/banks/tf/0.2"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/markdown": [
"\n",
"commit | release | local | base | subdir\n",
"--- | --- | --- | --- | ---\n",
"`9713e71c18fd296cf1860d6411312f9127710ba7` | `v2.0` | `local` | `~/Downloads/textbase` | `annotation/banks/tf`\n"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"do(\n",
" checkoutRepo(\n",
" org=ORG,\n",
" repo=REPO,\n",
" folder=MAIN,\n",
" version=\"0.2\",\n",
" checkout=\"\",\n",
" source=MY_GH,\n",
" dest=MY_TFD,\n",
" )\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We copy the local GitHub data to the custom location:"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [],
"source": [
"%%sh\n",
"\n",
"mkdir -p ~/Downloads/repoclones/annotation\n",
"cp -R ~/github/annotation/banks ~/Downloads/repoclones/annotation/banks"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Lookup the data in this alternative directory."
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"data: ~/Downloads/repoclones/annotation/banks/tf/0.2"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/markdown": [
"\n",
"commit | release | local | base | subdir\n",
"--- | --- | --- | --- | ---\n",
"`None` | `None` | `clone` | `~/Downloads/repoclones` | `annotation/banks/tf`\n"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"do(\n",
" checkoutRepo(\n",
" org=ORG,\n",
" repo=REPO,\n",
" folder=MAIN,\n",
" version=\"0.2\",\n",
" checkout=\"clone\",\n",
" source=MY_GH,\n",
" dest=MY_TFD,\n",
" )\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that the directory trees under the customised `source` and `dest` locations have exactly the same shape as before."
]
},
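{
"cell_type": "markdown",
"metadata": {},
"source": [
"That shared shape can be read off from the `data:` lines in the outputs: base directory, then org, repo, folder, and version.\n",
"A minimal sketch of this composition, assuming the pattern holds in general; `expected_dir` is a hypothetical helper, not part of Text-Fabric:\n",
"\n",
"```python\n",
"import os\n",
"\n",
"\n",
"def expected_dir(org, repo, folder, version, checkout='', source='~/github', dest='~/text-fabric-data'):\n",
"    # In this notebook only checkout='clone' looks under source; the other values use dest.\n",
"    base = source if checkout == 'clone' else dest\n",
"    return os.path.expanduser(f'{base}/{org}/{repo}/{folder}/{version}')\n",
"\n",
"\n",
"print(expected_dir('annotation', 'banks', 'tf', '0.2', checkout='clone'))\n",
"print(expected_dir('annotation', 'banks', 'tf', '0.2', dest='~/Downloads/textbase'))\n",
"```\n",
"\n",
"The helper only builds the path; it does not check whether the directory exists."
]
},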
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Conclusion\n",
"\n",
"With the help of `checkoutRepo()` you will be able to make local copies of online data in an organized way.\n",
"\n",
"This will help you when\n",
"\n",
"* you use other people's data\n",
"* develop your own data\n",
"* share and publish your data\n",
"* go back in history."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"All chapters:\n",
"\n",
"* [use](use.ipynb)\n",
"* [share](share.ipynb)\n",
"* [app](app.ipynb)\n",
"* *repo*\n",
"* [compose](compose.ipynb)\n",
"\n",
"---"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.0"
},
"toc-autonumbering": true
},
"nbformat": 4,
"nbformat_minor": 4
}