{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "8f9f61bf",
   "metadata": {},
   "source": [
    "set [SDC](https://commons.wikimedia.org/wiki/Commons:Structured_data#:~:text=Structured%20data%20on%20Commons%20is,from%20Wikidata%2C%20Wikimedia's%20knowledge%20base.) depicts from files uploaded in [spa2Commons](https://commons.wikimedia.org/wiki/Category:Uploaded_with_spa2Commons)\n",
    "\n",
    "### Theory depicts\n",
    "* all pictures from [SPA](https://portrattarkiv.se/about) depicts one person\n",
    "* one category in WIkicommons is connected to the person that the picture depicts\n",
    "\n",
    "#### How to \n",
    "##### Using Minefield and a pile \n",
    "see below _1. get WD object the Picture depict_\n",
    "* a csv file is created from a pile using\n",
    "  * petscan creates the pile ex. [set output pagepile](https://petscan.wmflabs.org/?psid=20485153)--> [pagepile 39223](https://pagepile.toolforge.org/api.php?action=get_data&id=39223)\n",
    "     * example [petscan files modified after 20211103](https://petscan.wmflabs.org/?psid=20577669)\n",
    "  * [minefield](https://hay.toolforge.org/minefield/) creates the csv file  \n",
    "    * see [gist](https://gist.github.com/salgo60/b5d05fae5c865b678edb338b09e4b302)\n",
    "* [video](https://www.youtube.com/watch?v=FUoG0veIeMY&feature=youtu.be)     \n",
    "    \n",
    "### Theory SPA identifier\n",
    "* a picture uploaded to Wikicommons using [SPA2common javascript](https://github.com/salgo60/spa2Commons) will have the link to the SPA picture in the template Information and param **Source** ex. [File:Axel_Sammuli_SPA3.jpg](https://commons.wikimedia.org/wiki/File:Axel_Sammuli_SPA3.jpg) -->  SPA id = [SPA idYB0QHyfj0hAAAAAAAAAf8g](https://portrattarkiv.se/details/YB0QHyfj0hAAAAAAAAAf8g)\n",
    "#### How to\n",
    "* read all pictures in category  [spa2Commons](https://commons.wikimedia.org/wiki/Category:Uploaded_with_spa2Commons) and check param **Source**\n",
    "  * this is done in a [PAWS notebook](https://hub.paws.wmcloud.org/user/Salgo60/notebooks/Traverse%20category%20to%20find%20SPA%20id%20.ipynb)\n",
    "### Misc\n",
    "* [SPARQL mwapi](https://en.wikibooks.org/wiki/SPARQL/SERVICE_-_mwapi)\n",
    "* [API:Categories](https://www.mediawiki.org/wiki/API:Categories)\n",
    "* Test SPARQL\n",
    "   * [get files in Categories](https://wcqs-beta.wmflabs.org/#%23Wikidata%20items%20of%20files%20in%20Category%3AArtworks%20with%20structured%20data%20with%20redirected%20P6243%20property%0ASELECT%20%3Ffile%20%3Ftitle%20%20%3Fspa%20%0AWITH%0A%7B%0A%20%20SELECT%20%3Ffile%20%3Ftitle%0A%20%20WHERE%0A%20%20%7B%0A%20%20%20%20SERVICE%20wikibase%3Amwapi%0A%20%20%20%20%7B%0A%20%20%20%20%20%20bd%3AserviceParam%20wikibase%3Aapi%20%22Generator%22%20.%0A%20%20%20%20%20%20bd%3AserviceParam%20wikibase%3Aendpoint%20%22commons.wikimedia.org%22%20.%0A%20%20%20%20%20%20bd%3AserviceParam%20mwapi%3Agcmtitle%20%22Category%3AUploaded_with_spa2Commons%22%20.%0A%20%20%20%20%20%20bd%3AserviceParam%20mwapi%3Agenerator%20%22categorymembers%22%20.%0A%20%20%20%20%20%20bd%3AserviceParam%20mwapi%3Agcmtype%20%22file%22%20.%0A%20%20%20%20%20%20bd%3AserviceParam%20mwapi%3Agcmlimit%20%22max%22%20.%0A%20%20%20%20%20%20%3Ftitle%20wikibase%3AapiOutput%20mwapi%3Atitle%20.%0A%20%20%20%20%20%20%3Fpageid%20wikibase%3AapiOutput%20%22%40pageid%22%20.%0A%20%20%20%20%7D%0A%20%20%20%20BIND%20%28URI%28CONCAT%28%27https%3A%2F%2Fcommons.wikimedia.org%2Fentity%2FM%27%2C%20%3Fpageid%29%29%20AS%20%3Ffile%29%0A%20%20%7D%0A%7D%20AS%20%25get_files%0AWHERE%0A%7B%0A%20%20INCLUDE%20%25get_files%0A%20%20OPTIONAL%20%7B%3Ffile%20wdt%3AP4819%20%3Fspa%7D%0A%7D)\n",
    "* [PAWS Example Notebooks](https://wikitech.wikimedia.org/wiki/PAWS/PAWS_examples_and_recipes)   \n",
    "* [hub.toolforge.org](https://hub.toolforge.org)\n",
    "* [writeSDoCfromExcel](https://github.com/KBNLwikimedia/SDoC/blob/main/writeSDoCfromExcel/WriteSDoCfromExcel_nopasswd.py)"
   ]
  },
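  {
   "cell_type": "markdown",
   "id": "b7e1c2a4",
   "metadata": {},
   "source": [
    "The SPA identifier theory above can be sketched as a small parser. This is a minimal sketch, not the code of the PAWS notebook: the name `extract_spa_id` and the regular expression are assumptions, based only on the example link `https://portrattarkiv.se/details/YB0QHyfj0hAAAAAAAAAf8g`.\n",
    "\n",
    "```python\n",
    "import re\n",
    "\n",
    "# Assumed URL shape, taken from the single example link above\n",
    "SPA_URL_PATTERN = re.compile(r\"https?://(?:www\\.)?portrattarkiv\\.se/details/([A-Za-z0-9_-]+)\")\n",
    "\n",
    "def extract_spa_id(wikitext):\n",
    "    \"\"\"Return the first SPA id linked in the wikitext, or None if there is none.\"\"\"\n",
    "    match = SPA_URL_PATTERN.search(wikitext)\n",
    "    return match.group(1) if match else None\n",
    "\n",
    "# Hypothetical Source line of an Information template\n",
    "sample = \"|Source=https://portrattarkiv.se/details/YB0QHyfj0hAAAAAAAAAf8g\"\n",
    "print(extract_spa_id(sample))\n",
    "```\n",
    "\n",
    "Returning `None` keeps files without an SPA link easy to list for manual review."
   ]
  },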
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "605aa315",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Last run:  2021-11-26 02:50:01.339986\n"
     ]
    }
   ],
   "source": [
    "from datetime import datetime\n",
    "start_time  = datetime.now()\n",
    "print(\"Last run: \", start_time)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "97ddd12e",
   "metadata": {},
   "outputs": [],
   "source": [
    "import requests\n",
    "\n",
    "S = requests.Session()\n",
    "\n",
    "URL = \"https://commons.wikipedia.org/w/api.php\"\n",
    "\n",
    "def get_Category(pageName):\n",
    "    PARAMS = {\n",
    "        \"action\": \"query\",\n",
    "        \"format\": \"json\",\n",
    "        \"prop\": \"categories\",\n",
    "        \"titles\": pageName   \n",
    "    }\n",
    "    #print (PARAMS)\n",
    "    r = S.get(url=URL, params=PARAMS)\n",
    "    data = r.json()\n",
    "    # TODO dont get hidden categories\n",
    "    filtercat = {\n",
    "        \"Category:CC-BY-4.0\",\n",
    "        \"Category:Swedish Portrait Archive\",\n",
    "        \"Category:Uploaded with spa2Commons\",\n",
    "        \"Category:Template Unknown (author)\",\n",
    "        \"Category:Images with extracted images\",\n",
    "        \"Category:Extracted images\",\n",
    "        \"Category:Scanned with HP Deskjet F4200\",\n",
    "        \"Category:Pages using duplicate arguments in template calls\",\n",
    "        \"Category:Creative Commons Attribution-Share Alike missing SDC copyright status\",\n",
    "        \"Category:Creative Commons Attribution-Share Alike 4.0 missing SDC copyright license\",\n",
    "        \"Category:Creative Commons Attribution missing SDC copyright status\",\n",
    "        \"Category:Creative Commons Attribution 4.0 missing SDC copyright license\",\n",
    "        \"Category:Media requiring renaming - rationale 3\",\n",
    "        \"Media requiring renaming - target already exists\"\n",
    "        }\n",
    "    target_category = \"\"\n",
    "    PAGES = data[\"query\"][\"pages\"]\n",
    "    for k, v in PAGES.items():\n",
    "    #    print(k,v)\n",
    "        for cat in v['categories']:\n",
    "            if cat[\"title\"] not in filtercat:\n",
    "                target_category = cat[\"title\"]\n",
    "                #print(\"\\tTarget cat\" ,target_category)\n",
    "    return target_category"
   ]
  },
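  {
   "cell_type": "markdown",
   "id": "c4d8e6f0",
   "metadata": {},
   "source": [
    "The TODO in `get_Category` could probably be handled by the API itself. A minimal sketch, assuming the `clshow` and `cllimit` parameters of `prop=categories` described on [API:Categories](https://www.mediawiki.org/wiki/API:Categories); `build_category_params` is a hypothetical helper, and it is not verified here that every category in `filtercat` is marked as hidden on Commons.\n",
    "\n",
    "```python\n",
    "def build_category_params(page_name):\n",
    "    \"\"\"Build query parameters that ask only for non-hidden categories.\"\"\"\n",
    "    return {\n",
    "        \"action\": \"query\",\n",
    "        \"format\": \"json\",\n",
    "        \"prop\": \"categories\",\n",
    "        \"clshow\": \"!hidden\",  # leave out hidden categories\n",
    "        \"cllimit\": \"max\",     # the default limit is 10 categories per request\n",
    "        \"titles\": page_name,\n",
    "    }\n",
    "\n",
    "print(build_category_params(\"File:A_G_Ahlqvist_SPA10.jpg\"))\n",
    "```\n",
    "\n",
    "The dictionary can be passed as `params` to `S.get` in place of `PARAMS`; any visible categories that do not name the person would still need the filter set."
   ]
  },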
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "f84fc0da",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Category:Alfred Gustaf Ahlqvist'"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "common_name = \"File:A_G_Ahlqvist_SPA10.jpg\"\n",
    "get_Category(common_name)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "7c369175",
   "metadata": {},
   "outputs": [],
   "source": [
    "def getWD(commonsCategory):\n",
    "    urlHub = \"https://hub.toolforge.org/commons:\" + commonsCategory + \"?format=json&site=wd\"\n",
    "    #print(urlHub)\n",
    "    hub = S.get(url=urlHub)\n",
    "    data = hub.json()\n",
    "    try:\n",
    "        wd = data[\"destination\"][\"url\"].replace(\"https://www.wikidata.org/wiki/\",\"\")    \n",
    "    except:\n",
    "        print(\"Error\", data)\n",
    "        wd =\"\"\n",
    "    return wd"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "529fc13b",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Q4830349'"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "test_category =\"Category:Axel_Rappe_(1838%E2%80%931918)\"\n",
    "getWD(test_category)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0ea03dfe",
   "metadata": {},
   "source": [
    "### 1. get WD object the Picture depict"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "0bf6ba25",
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Error {'message': 'Not Found', 'context': {'text': 'commons:Category:Bernhard Lundgren 1843', 'lang': 'en'}}\n",
      "Error {'message': 'Not Found', 'context': {'text': 'commons:Category:Bernhard Lundgren 1843', 'lang': 'en'}}\n",
      "Error {'message': 'Not Found', 'context': {'text': 'commons:Category:Axel Wästfelt 1881', 'lang': 'en'}}\n",
      "Error {'message': 'Not Found', 'context': {'text': 'commons:Category:Axel Wästfelt 1881', 'lang': 'en'}}\n",
      "Error {'message': 'Not Found', 'context': {'text': 'commons:Category:Karl Salin 1890', 'lang': 'en'}}\n",
      "Error {'message': 'Not Found', 'context': {'text': 'commons:Category:Karl Salin 1890', 'lang': 'en'}}\n",
      "Error {'message': 'Not Found', 'context': {'text': 'commons:Category:Bernhard Lundgren 1843', 'lang': 'en'}}\n",
      "Error {'message': 'Not Found', 'context': {'text': 'commons:Category:Sven Trägårdh 1814', 'lang': 'en'}}\n",
      "Error {'message': 'Not Found', 'context': {'text': 'commons:Category:Sven Trägårdh 1814', 'lang': 'en'}}\n"
     ]
    }
   ],
   "source": [
    "# used tool xxxx to get csv files with pictures \n",
    "import csv\n",
    "\n",
    "mid_wd_list = []\n",
    "cat2commonsfiles = \"Cat_2commons.csv\"\n",
    "cat2commonsfiles = \"Cat_2commons_20211101.csv\"\n",
    "cat2commonsfiles = \"Cat_2commons_20211103.csv\"\n",
    "cat2commonsfiles = \"Cat_2commons_20211106.csv\"\n",
    "cat2commonsfiles = \"Cat_2commons_20211114.csv\"\n",
    "cat2commonsfiles = \"Cat_2commons_20211126.csv\"\n",
    "\n",
    "\n",
    "with open(cat2commonsfiles) as csvfile:\n",
    "    cat_reader = csv.DictReader(csvfile, delimiter=',', quotechar='\"')\n",
    "#    cat_reader = csv.DictReader(csvfile, delimiter=';', quotechar='\"')\n",
    "#    cat_reader = csv.DictReader(csvfile, delimiter=',', quotechar='\"')\n",
    "    for row in cat_reader:\n",
    "#        print(row)\n",
    "#        print(row[\"mid\"],get_Category(row[\"title\"]),row[\"url\"]) \n",
    "#        print(row[\"mid\"],getWD(get_Category(row[\"title\"]))) \n",
    "        mid_wd_list.append([row[\"mid\"],getWD(get_Category(row[\"title\"]))])\n",
    "        #print(mid_wd_list)\n",
    "        \n"
   ]
  },
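  {
   "cell_type": "markdown",
   "id": "d2a9f1b3",
   "metadata": {},
   "source": [
    "The output above repeats the same *Not Found* category, so the same lookup is made more than once when several files share a category. A minimal sketch of memoising the lookup with `functools.lru_cache`; `lookup_wd` is a hypothetical stand-in for `getWD` so that the sketch runs without network access.\n",
    "\n",
    "```python\n",
    "from functools import lru_cache\n",
    "\n",
    "calls = []\n",
    "\n",
    "def lookup_wd(category):\n",
    "    # Hypothetical stand-in for getWD(); records every real lookup\n",
    "    calls.append(category)\n",
    "    return \"\"\n",
    "\n",
    "@lru_cache(maxsize=None)\n",
    "def cached_lookup(category):\n",
    "    \"\"\"Look a category up once and reuse the result afterwards.\"\"\"\n",
    "    return lookup_wd(category)\n",
    "\n",
    "for cat in [\"Category:Karl Salin 1890\", \"Category:Karl Salin 1890\"]:\n",
    "    cached_lookup(cat)\n",
    "\n",
    "print(len(calls))  # number of real lookups made\n",
    "```\n",
    "\n",
    "Wrapping `getWD` the same way would also print each *Not Found* message only once per category."
   ]
  },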
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "7eb15827",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>M110465275</td>\n",
       "      <td>Q2824703</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>M111371972</td>\n",
       "      <td>Q2482314</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>M112375203</td>\n",
       "      <td>Q18238225</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>M112375216</td>\n",
       "      <td>Q18238225</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>M112392665</td>\n",
       "      <td>Q4943098</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>M112408085</td>\n",
       "      <td>Q29047398</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>M112427640</td>\n",
       "      <td>Q48709014</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>M112458663</td>\n",
       "      <td>Q6171709</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>M112489160</td>\n",
       "      <td>Q5771716</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>M112490909</td>\n",
       "      <td>Q109545314</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            0           1\n",
       "0  M110465275    Q2824703\n",
       "1  M111371972    Q2482314\n",
       "2  M112375203   Q18238225\n",
       "3  M112375216   Q18238225\n",
       "4  M112392665    Q4943098\n",
       "5  M112408085   Q29047398\n",
       "6  M112427640   Q48709014\n",
       "7  M112458663    Q6171709\n",
       "8  M112489160    Q5771716\n",
       "9  M112490909  Q109545314"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "df = pd.DataFrame(mid_wd_list)\n",
    "df.head(10)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "a936951b",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(1717, 2)"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.to_csv(\"SPACategories_Mid_WD.txt\")\n",
    "df.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "dc1c20ef",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Ended:  2021-11-26 03:03:52.587248\n",
      "Time elapsed (hh:mm:ss.ms) 0:13:51.248001\n"
     ]
    }
   ],
   "source": [
    "end = datetime.now()\n",
    "print(\"Ended: \", end) \n",
    "print('Time elapsed (hh:mm:ss.ms) {}'.format(datetime.now() - start_time))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0f5c0049",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}