{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Tutorial 2: Getting Dataframes (DataView)\n",
"\n",
"The WikiWho class also provides the `dv` attribute (`DataView` class), which returns the data in pandas dataFrames."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Create an instance of WikiWho"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from wikiwho_wrapper import WikiWho\n",
"ww = WikiWho(lng='en')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Retrieving pandas DataFrames\n",
"\n",
"The DataView has exactly the same functions (**and parameters**) as the WikiWhoAPI and they are mapped one to one:\n",
"\n",
"- `all_content()`\n",
"- `rev_ids_of_article()`\n",
"- `last_rev_content()`\n",
"- `specific_rev_content_by_rev_id()`\n",
"- `range_rev_content_by_article_title()`\n",
"\n",
"This is how you use them."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2.1 All Content"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# by article title\n",
"df = ww.dv.all_content(\n",
" article='Bioglass 45S5') \n",
"\n",
"# or, by article id\n",
"# ww.dv.all_content(2161298)\n",
"\n",
"df.head(3)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2.2 Revision IDS"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# by article title\n",
"df = ww.dv.rev_ids_of_article(\n",
" article='Bioglass 45S5') \n",
"\n",
"# or, by article id\n",
"# ww.dv.rev_ids_of_article(2161298)\n",
"df.head(3)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2.3 Revision Content"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# by article title\n",
"df = ww.dv.last_rev_content(\n",
" article='Bioglass 45S5') \n",
"\n",
"# or, by article id\n",
"# ww.dv.last_rev_content(2161298)\n",
"\n",
"df.head(5)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# by revision id directly\n",
"df = ww.dv.specific_rev_content_by_rev_id(\n",
" rev_id=189370281) \n",
"\n",
"# or by also specifying the article_title (it is not documented in Wikipedia that revisions ids are unique\n",
"# and don't depend on the article)\n",
"# df = ww.dv.specific_rev_content_by_rev_id(article='bioglass', rev_id=189370281)\n",
"\n",
"df.head(3)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# providing a range (two revision ids)\n",
"df = ww.dv.range_rev_content_by_article_title(\n",
" article='Bioglass 45S5', start_rev_id=18064039, end_rev_id=207995408)\n",
"df.head(3)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2.4 Editor Content"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# by page id\n",
"df = ww.dv.edit_persistence(page_id=2161298)\n",
"df.head(3)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# by editor id\n",
"df = ww.dv.edit_persistence(editor_id=101935)\n",
"df.head(3)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# by page id and editor id\n",
"df = ww.dv.edit_persistence(page_id=15745, editor_id=101935)\n",
"df.head(3)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Summary\n",
"\n",
"You just learn how to use the functions of the APIWrapper. The next tutorial will show simple examples how to extract some information of the `all_content()` function"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from utils.notebooks import get_next_notebook\n",
"from IPython.display import HTML\n",
"try:\n",
" display(HTML(f'Go to next workbook'))\n",
"except:\n",
" HTML('Go to next workbook')"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.5"
}
},
"nbformat": 4,
"nbformat_minor": 2
}