# Wikidata Training: the MediaWiki API


## Table of Contents

1. [Documentation](#Documentation)
1. [Data model](#Data-model)
1. [Use the API directly](#Use-the-API-directly)
1. [Use pywikibot](#Use-pywikibot)

## Documentation

Wikidata uses the Wikibase extension to MediaWiki (a wiki software), so you can find its documentation here:

* [Wikibase API](https://www.mediawiki.org/wiki/Wikibase/API/en)
* [MediaWiki API tutorial](https://www.mediawiki.org/wiki/API:Tutorial)
* [Wikidata API sandbox](https://www.wikidata.org/wiki/Special:ApiSandbox)

All Wikibase-specific API calls start with **`wb`**, e.g.:

* Documentation of [wbgetentities](https://www.wikidata.org/w/api.php?action=help&modules=wbgetentities)
* Documentation of [wbgetclaims](https://www.wikidata.org/w/api.php?action=help&modules=wbgetclaims)

Let's play around with the sandbox a bit to get a feeling for the API:

* [Douglas Adams Wikidata entry (Q42) _`wbgetentities`_](https://www.wikidata.org/w/api.php?action=wbgetentities&ids=Q42)
* [Douglas Adams Wikidata entry (Q42) _`wbgetclaims` for P31_](https://www.wikidata.org/w/api.php?action=wbgetclaims&entity=Q42&property=P31)
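
The two links above differ only in their query parameters. As a minimal sketch, such URLs could be assembled in Python as follows. The helper name `build_api_url` is made up for this example, and `format=json` is added (it is not part of the links above) so that the response is machine-readable.

```python
from urllib.parse import urlencode

API_ENDPOINT = 'https://www.wikidata.org/w/api.php'


def build_api_url(action, **params):
    """Return a Wikidata API URL for the given action (hypothetical helper)."""
    query = {'action': action, 'format': 'json'}
    query.update(params)
    return API_ENDPOINT + '?' + urlencode(query)


entities_url = build_api_url('wbgetentities', ids='Q42')
claims_url = build_api_url('wbgetclaims', entity='Q42', property='P31')
print(entities_url)
print(claims_url)
```

Keeping the parameters in a dictionary makes it easy to add or change one of them without editing a long URL string by hand.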


## Data model

The [Wikibase data model](https://www.mediawiki.org/wiki/Wikibase/DataModel) needs some explanation.

### Items

> Items are Entities that are typically represented by a Wikipage (at least in some Wikipedia languages). They can be viewed as "the thing that a Wikipage is about," which could be an individual thing (the person Albert Einstein), a general class of things (the class of all Physicists), and any other concept that is the subject of some Wikipedia page (including things like History of Berlin).

**Examples:**

* [Q42 (Douglas Adams)](https://www.wikidata.org/wiki/Q42)
* [Q72 (Zürich)](https://www.wikidata.org/wiki/Q72)

### Properties

> Properties are Entities that describe a relationship between Items (or other Entities) and Values of the property. Typical properties are population (using numbers as values), binomial name (using strings as values), but also has father and author of (both using Items as values).

**Examples:**

* [P31 (instance of)](https://www.wikidata.org/wiki/Property:P31)
* [P6 (head of government)](https://www.wikidata.org/wiki/Property:P6)

### Snaks

> Snaks are the basic information structures used to describe Entities in Wikidata. They are an integral part of each Statement (which can be viewed as collection of Snaks about an Entity, together with a list of references).

**Examples:**

* (Berlin has a) population (property) of 3499879 (value)
* (Zürich has) head of government (property) of Corine Mauch (value)

Note: Snaks do not mention the subject to which they refer (Berlin, Zürich); the subject is given by the context in which a Snak is used (typically as part of a Statement).
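
In the API's JSON, a snak shows up as a small dictionary. The sketch below uses the main snak of Q42's "instance of" (P31) statement as returned by `wbgetentities`; the helper `snak_value` is a name invented for this example. It assumes the three snak types `value`, `somevalue` and `novalue`, of which only `value` carries a `datavalue`.

```python
# A snak as it appears in the API's JSON (values taken from Q42's P31 statement).
snak = {
    'snaktype': 'value',
    'property': 'P31',
    'datavalue': {
        'value': {'entity-type': 'item', 'numeric-id': 5, 'id': 'Q5'},
        'type': 'wikibase-entityid',
    },
    'datatype': 'wikibase-item',
}


def snak_value(snak):
    """Return the raw value of a snak, or None for 'novalue'/'somevalue' snaks."""
    if snak['snaktype'] != 'value':
        return None
    return snak['datavalue']['value']


# The snak names a property and a value, but not the subject (Q42).
print(snak['property'], snak_value(snak)['id'])
```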

### Statements

> Statements describe the claim of a statement and list references for this claim. Every Statement refers to one particular Entity, called the subject of the Statement. There is always one main Snak that forms the most important part of the statement. Moreover, there can be zero or more additional PropertySnaks that describe the Statement in more detail. These qualifier Snaks (or "qualifiers" for short) store additional information that does not directly refer to the subject (e.g., the time at which the main part of the statement was valid). References are provided as a list (the order is significant in some contexts, especially for displaying a main reference).

**Examples:**

* Berlin (subject) has a population (property) of 3499879 (value)
* Zürich (subject) has head of government (property) of Corine Mauch (value)
  * Two _qualifier snaks_ are used:
    * start time: 1 May 2009
    * end time: no value
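
To make the Zürich example concrete, here is a hand-written sketch of how such a statement could look in the shape of the API's JSON. It is not copied from a real response: `'Q_MAYOR'` is a placeholder rather than a real item ID, the time value is simplified, and `qualifier_summary` is a helper invented for this example. P580 and P582 are the Wikidata properties for start time and end time.

```python
# Hand-written statement in the shape of the API's JSON.
# 'Q_MAYOR' is a placeholder, not a real item ID.
statement = {
    'type': 'statement',
    'rank': 'normal',
    'mainsnak': {
        'snaktype': 'value',
        'property': 'P6',  # head of government
        'datavalue': {'value': {'entity-type': 'item', 'id': 'Q_MAYOR'},
                      'type': 'wikibase-entityid'},
    },
    'qualifiers': {
        'P580': [{'snaktype': 'value', 'property': 'P580',  # start time
                  'datavalue': {'value': {'time': '+2009-05-01T00:00:00Z', 'precision': 11},
                                'type': 'time'}}],
        'P582': [{'snaktype': 'novalue', 'property': 'P582'}],  # end time: no value
    },
}


def qualifier_summary(statement):
    """Map each qualifier property to its raw values (or the snak type if there is no value)."""
    summary = {}
    for prop, snaks in statement.get('qualifiers', {}).items():
        summary[prop] = [
            snak['datavalue']['value'] if snak['snaktype'] == 'value' else snak['snaktype']
            for snak in snaks
        ]
    return summary


print(qualifier_summary(statement))
```

The main snak and the qualifier snaks share the same structure; only their position inside the statement tells them apart.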

In [32]:
import requests
import pandas as pd

SSL_VERIFY = True
# If the connection to https://www.wikidata.org fails (e.g. because of a proxy),
# uncomment the next line to disable SSL verification.
# SSL_VERIFY = False
if not SSL_VERIFY:
    import urllib3
    urllib3.disable_warnings()  # silence the warnings that unverified requests would trigger

## Use the API directly

In [33]:
get_douglas_adams = 'https://www.wikidata.org/w/api.php?action=wbgetentities&ids=Q42&format=json'
res = requests.get(get_douglas_adams, verify=SSL_VERIFY)
res.raise_for_status()  # fail early on HTTP errors instead of parsing an error page
result = res.json()
result

{'entities': {'Q42': {'pageid': 138,
   'ns': 0,
   'title': 'Q42',
   'lastrevid': 1039091876,
   'modified': '2019-10-26T10:05:13Z',
   'type': 'item',
   'id': 'Q42',
   'labels': {'fr': {'language': 'fr', 'value': 'Douglas Adams'},
    'ru': {'language': 'ru', 'value': 'Дуглас Адамс'},
    'pl': {'language': 'pl', 'value': 'Douglas Adams'},
    'it': {'language': 'it', 'value': 'Douglas Adams'},
    'en-gb': {'language': 'en-gb', 'value': 'Douglas Adams'},
    'nb': {'language': 'nb', 'value': 'Douglas Adams'},
    'es': {'language': 'es', 'value': 'Douglas Adams'},
    'en-ca': {'language': 'en-ca', 'value': 'Douglas Adams'},
    'hr': {'language': 'hr', 'value': 'Douglas Adams'},
    'pt': {'language': 'pt', 'value': 'Douglas Adams'},
    'ko': {'language': 'ko', 'value': '더글러스 애덤스'},
    'nl': {'language': 'nl', 'value': 'Douglas Adams'},
    'el': {'language': 'el', 'value': 'Ντάγκλας Άνταμς'},
    'ar': {'language': 'ar', 'value': 'دوغلاس آدمز'},
    'arz': {'language': 'arz',
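
The full response is large. A possible way to keep it manageable is to ask only for the labels in a few languages, using the `props` and `languages` parameters of `wbgetentities`, and to reduce the answer to a plain dictionary. The function names below are invented for this sketch; `fetch_labels` needs network access and the `requests` package.

```python
API_ENDPOINT = 'https://www.wikidata.org/w/api.php'


def label_params(entity_id, languages=('en',)):
    """Build query parameters that request only the labels of one entity."""
    return {
        'action': 'wbgetentities',
        'ids': entity_id,
        'props': 'labels',
        'languages': '|'.join(languages),  # multiple values are separated by '|'
        'format': 'json',
    }


def extract_labels(result, entity_id):
    """Reduce a wbgetentities response to {language code: label}."""
    labels = result['entities'][entity_id].get('labels', {})
    return {lang: entry['value'] for lang, entry in labels.items()}


def fetch_labels(entity_id, languages=('en',), verify=True):
    """Call the API and return the labels (requires network access)."""
    import requests  # imported here so the helpers above work without it
    res = requests.get(API_ENDPOINT, params=label_params(entity_id, languages), verify=verify)
    res.raise_for_status()
    return extract_labels(res.json(), entity_id)
```

With the setup from this notebook, a call could look like `fetch_labels('Q42', ('en', 'ko'), verify=SSL_VERIFY)`.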

In [27]:
pd.DataFrame(result['entities']['Q42'])

| | pageid | ns | title | lastrevid | modified | type | id | labels | descriptions | aliases | claims | sitelinks |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| P1005 | 138 | 0 | Q42 | 1039091876 | 2019-10-26T10:05:13Z | item | Q42 | | | | [{'mainsnak': {'snaktype': 'value', 'property'... | |
| P1006 | 138 | 0 | Q42 | 1039091876 | 2019-10-26T10:05:13Z | item | Q42 | | | | [{'mainsnak': {'snaktype': 'value', 'property'... | |
| P1015 | 138 | 0 | Q42 | 1039091876 | 2019-10-26T10:05:13Z | item | Q42 | | | | [{'mainsnak': {'snaktype': 'value', 'property'... | |
| P103 | 138 | 0 | Q42 | 1039091876 | 2019-10-26T10:05:13Z | item | Q42 | | | | [{'mainsnak': {'snaktype': 'value', 'property'... | |
| P106 | 138 | 0 | Q42 | 1039091876 | 2019-10-26T10:05:13Z | item | Q42 | | | | [{'mainsnak': {'snaktype': 'value', 'property'... | |
| P108 | 138 | 0 | Q42 | 1039091876 | 2019-10-26T10:05:13Z | item | Q42 | | | | [{'mainsnak': {'snaktype': 'value', 'property'... | |
| P109 | 138 | 0 | Q42 | 1039091876 | 2019-10-26T10:05:13Z | item | Q42 | | | | [{'mainsnak': {'snaktype': 'value', 'property'... | |
| P119 | 138 | 0 | Q42 | 1039091876 | 2019-10-26T10:05:13Z | item | Q42 | | | | [{'mainsnak': {'snaktype': 'value', 'property'... | |
| P1196 | 138 | 0 | Q42 | 1039091876 | 2019-10-26T10:05:13Z | item | Q42 | | | | [{'mainsnak': {'snaktype': 'value', 'property'... | |
| P1207 | 138 | 0 | Q42 | 1039091876 | 2019-10-26T10:05:13Z | item | Q42 | | | | [{'mainsnak': {'snaktype': 'value', 'property'... | |
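
Passing the whole entity to `pd.DataFrame` yields one row per claim property and repeats the scalar fields (`pageid`, `title`, ...) on every row. A sketch of an alternative is to flatten the claims into one row per statement first. The function `claims_to_rows` and the trimmed `sample_entity` are made up for this example.

```python
def claims_to_rows(entity):
    """Flatten an entity's claims into one dictionary per statement."""
    rows = []
    for prop, statements in entity.get('claims', {}).items():
        for statement in statements:
            mainsnak = statement['mainsnak']
            rows.append({
                'property': prop,
                'snaktype': mainsnak['snaktype'],
                'datatype': mainsnak.get('datatype'),
                # 'novalue'/'somevalue' snaks have no datavalue
                'value': mainsnak.get('datavalue', {}).get('value'),
                'rank': statement.get('rank'),
            })
    return rows


# Trimmed sample in the shape of the wbgetentities response for Q42.
sample_entity = {
    'id': 'Q42',
    'claims': {
        'P31': [{
            'mainsnak': {
                'snaktype': 'value',
                'property': 'P31',
                'datavalue': {'value': {'entity-type': 'item', 'numeric-id': 5, 'id': 'Q5'},
                              'type': 'wikibase-entityid'},
                'datatype': 'wikibase-item',
            },
            'type': 'statement',
            'rank': 'normal',
        }],
    },
}

rows = claims_to_rows(sample_entity)
print(rows)
```

In the notebook, `pd.DataFrame(claims_to_rows(result['entities']['Q42']))` should then give a table with one line per statement.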


In [28]:
result['entities']['Q42']['claims']['P31'][0]

{'mainsnak': {'snaktype': 'value',
  'property': 'P31',
  'hash': 'ad7d38a03cdd40cdc373de0dc4e7b7fcbccb31d9',
  'datavalue': {'value': {'entity-type': 'item', 'numeric-id': 5, 'id': 'Q5'},
   'type': 'wikibase-entityid'},
  'datatype': 'wikibase-item'},
 'type': 'statement',
 'id': 'Q42$F078E5B3-F9A8-480E-B7AC-D97778CBBEF9',
 'rank': 'normal',
 'references': [{'hash': '2b369d0a4f1d4b801e734fe84a0b217e13dd2930',
   'snaks': {'P248': [{'snaktype': 'value',
      'property': 'P248',
      'hash': '6b7d4330c4aac4caec4ede9de0311ce273f88ecd',
      'datavalue': {'value': {'entity-type': 'item',
        'numeric-id': 54919,
        'id': 'Q54919'},
       'type': 'wikibase-entityid'},
      'datatype': 'wikibase-item'}],
    'P214': [{'snaktype': 'value',
      'property': 'P214',
      'hash': '20e5c69fbf37b8b0402a52948a04f481028e819c',
      'datavalue': {'value': '113230702', 'type': 'string'},
      'datatype': 'external-id'}],
    'P813': [{'snaktype': 'value',
      'property': 'P813',
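
The references of a statement are nested several levels deep: a list of references, each with a `snaks` dictionary that maps property IDs to lists of snaks. A minimal sketch for pulling out the raw values follows. `reference_values` is a name invented here, and the sample is a trimmed copy of the output above that keeps only the P248 and P214 snaks.

```python
def reference_values(statement):
    """Return, for each reference, a dict mapping property IDs to raw values."""
    summaries = []
    for reference in statement.get('references', []):
        summary = {}
        for prop, snaks in reference['snaks'].items():
            summary[prop] = [snak['datavalue']['value']
                             for snak in snaks
                             if snak['snaktype'] == 'value']
        summaries.append(summary)
    return summaries


# Trimmed copy of the statement shown above (P248 and P214 snaks only).
sample_statement = {
    'references': [{
        'snaks': {
            'P248': [{'snaktype': 'value', 'property': 'P248',
                      'datavalue': {'value': {'entity-type': 'item',
                                              'numeric-id': 54919,
                                              'id': 'Q54919'},
                                    'type': 'wikibase-entityid'}}],
            'P214': [{'snaktype': 'value', 'property': 'P214',
                      'datavalue': {'value': '113230702', 'type': 'string'}}],
        },
    }],
}

print(reference_values(sample_statement))
```

In the notebook the same function could be applied to `result['entities']['Q42']['claims']['P31'][0]`.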


## Use pywikibot

[pywikibot](https://www.mediawiki.org/wiki/Manual:Pywikibot) is a Python library built on the MediaWiki API.
This section shows how to use the API from Python with pywikibot and lays the groundwork for developing a bot or tool for Wikidata later.

Use pywikibot for Wikidata:

- https://www.mediawiki.org/wiki/Manual:Pywikibot/Wikidata
- https://www.mediawiki.org/wiki/Manual:Pywikibot/Scripts#Wikidata

If you want to set up pywikibot on your computer, check this tutorial: https://www.wikidata.org/wiki/Wikidata:Pywikibot_-_Python_3_Tutorial/Setting_up_Shop

Quick steps:

1. Create a new directory for your project
1. Clone `pywikibot` in this directory: `git clone --recursive https://gerrit.wikimedia.org/r/pywikibot/core.git pywikibot`
1. In the cloned `pywikibot` directory, run `python generate_user_files.py` to create `user-config.py`
1. Run `python pwb.py login` to log in with your account

In [7]:
import pywikibot
site = pywikibot.Site('en', 'wikipedia')
repo = site.data_repository()

In [8]:
page = pywikibot.Page(site, 'Douglas Adams')
page

Page('Douglas Adams')

In [9]:
item = pywikibot.ItemPage.fromPage(page)  # this can be used for any page object
item

ItemPage('Q42')

In [10]:
item = pywikibot.ItemPage(repo, 'Q42')  # functionally the same as the item obtained from the page above
item.get()  # get() must be called before any data can be accessed

{'aliases': {'en': ['Douglas Noel Adams',
   'Douglas Noël Adams',
   'Douglas N. Adams'],
  'ru': ['Адамс, Дуглас'],
  'nb': ['Douglas Noël Adams', 'Douglas N. Adams'],
  'fr': ['Douglas Noel Adams', 'Douglas Noël Adams'],
  'de': ['Douglas Noel Adams', 'Douglas Noël Adams'],
  'pt-br': ['Douglas Noël Adams', 'Douglas Noel Adams'],
  'be-tarask': ['Дуглас Адамс'],
  'zh': ['亞當斯'],
  'es': ['Douglas Noel Adams', 'Douglas Noël Adams'],
  'it': ['Douglas Noel Adams', 'Douglas N. Adams'],
  'cs': ['Douglas Noël Adams', 'Douglas Noel Adams', 'Douglas N. Adams'],
  'hy': ['Ադամս, Դուգլաս'],
  'el': ['Ντάγκλας Νόελ Άνταμς'],
  'nl': ['Douglas Noel Adams', 'Douglas Noël Adams'],
  'pt': ['Douglas Noël Adams', 'Douglas Noel Adams'],
  'ja': ['ダグラス・アダムス'],
  'pa': ['ਡਗਲਸ ਨੋਏਲ ਐਡਮਜ਼'],
  'tl': ['Douglas Noël Adams', 'Douglas Noel Adams'],
  'eu': ['Douglas Noel Adams', 'Douglas Noël Adams'],
  'uk': ['Дуглас Ноел Адамс', 'Адамс Дуглас'],
  'hr': ['Douglas Noël Adams', 'Douglas N. Adams', 'Dougla

In [11]:
sitelinks = item.sitelinks
aliases = item.aliases
if 'en' in item.labels:
    print('The label in English is: ' + item.labels['en'])
if 'P31' in item.claims:  # instance of
    claim = item.claims['P31'][0]
    print(claim.getTarget())
    if claim.sources:  # not every claim has sources
        print(claim.sources[0])

The label in English is: Douglas Adams
[[wikidata:Q5]]
OrderedDict([('P248', [Claim.fromJSON(DataSite("wikidata", "wikidata"), {'snaktype': 'value', 'property': 'P248', 'datatype': 'wikibase-item', 'datavalue': {'value': {'entity-type': 'item', 'numeric-id': 54919}, 'type': 'wikibase-entityid'}, 'hash': '2b369d0a4f1d4b801e734fe84a0b217e13dd2930'})]), ('P214', [Claim.fromJSON(DataSite("wikidata", "wikidata"), {'snaktype': 'value', 'property': 'P214', 'datatype': 'external-id', 'datavalue': {'value': '113230702', 'type': 'string'}, 'hash': '2b369d0a4f1d4b801e734fe84a0b217e13dd2930'})]), ('P813', [Claim.fromJSON(DataSite("wikidata", "wikidata"), {'snaktype': 'value', 'property': 'P813', 'datatype': 'time', 'datavalue': {'value': {'time': '+00000002013-12-07T00:00:00Z', 'precision': 11, 'after': 0, 'before': 0, 'timezone': 0, 'calendarmodel': 'http://www.wikidata.org/entity/Q1985727'}, 'type': 'time'}, 'hash': '2b369d0a4f1d4b801e734fe84a0b217e13dd2930'})])])
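
A property can have several claims, and a claim can have no sources at all. As a sketch, the helper below (the name `summarize_claims` is invented for this example) walks over all claims of one property and reports the target and the number of sources for each. It relies only on `getTarget()` and `sources`, which the cell above already uses.

```python
def summarize_claims(item_claims, property_id):
    """Return (target, number_of_sources) for each claim of one property.

    item_claims is expected to behave like item.claims: a mapping from
    property IDs to lists of claim objects.
    """
    summary = []
    if property_id in item_claims:
        for claim in item_claims[property_id]:
            summary.append((claim.getTarget(), len(claim.sources)))
    return summary


# In the notebook, after item.get():
# summarize_claims(item.claims, 'P31')
```

A property that is missing from the item simply yields an empty list, so no `KeyError` or `IndexError` handling is needed at the call site.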
