{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## SWEPUB - ORCID\n",
    "version 0.8\n",
    "* This [notebook](https://github.com/salgo60/open-data-examples/blob/master/SWEPUB%20-%20ORCID.ipynb) \n",
    "* SWEPUB\n",
    "  * [Kundo question](https://kundo.se/org/swepub/d/api-for-amnesklassificering/#c3571837) were they recommend download the ZIP file to access data in SWEPUB --> JSON 10.81 Gbyte \n",
    "  * [datamodell/swepub-bibframe](https://www.kb.se/samverkan-och-utveckling/swepub/datamodell/swepub-bibframe.html)\n",
    "  * [Twitter SwePub](https://twitter.com/SwePub)\n",
    "  * [SPARQL SWEPUB](https://github.com/libris/swepub-sparql) feels SPARQL is better than download a zipfile ?!?!?\n",
    "    * [Finding nr records per schools ](http://virhp07.libris.kb.se/sparql/?default-graph-uri=&query=PREFIX+bmc%3A+%3Chttp%3A%2F%2Fswepub.kb.se%2Fbibliometric%2Fmodel%23%3E+%0D%0APREFIX+swpa_m%3A+%3Chttp%3A%2F%2Fswepub.kb.se%2FSwePubAnalysis%2Fmodel%23%3E%0D%0APREFIX+mods_m%3A+%3Chttp%3A%2F%2Fswepub.kb.se%2Fmods%2Fmodel%23%3E+%0D%0APREFIX+outt_m%3A+%3Chttp%3A%2F%2Fswepub.kb.se%2FSwePubAnalysis%2FOutputTypes%2Fmodel%23%3E%0D%0APREFIX+xlink%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F1999%2Fxlink%23%3E+%0D%0ASELECT+DISTINCT+xsd%3Astring%28%3F_orgName%29+COUNT%28DISTINCT+%3F_workID%29+as+%3Fc%0D%0AWHERE%0D%0A%7B%0D%0A%3FCreativeWork+bmc%3AlocalID+%3F_workID+.%0D%0A%3FPublication+bmc%3AlocalID+%3F_publicationID+.%0D%0A%0D%0A%3FOrganization+rdfs%3Alabel+%3F_orgName+.%0D%0AFILTER%28lang%28%3F_orgName%29+%3D+%27sv%27+%29%0D%0A%0D%0A%3FCreativeWork+bmc%3AreportedBy+%3FRecord+.%0D%0A%3FCreativeWork+a+bmc%3ACreativeWork+.%0D%0A%3FCreativeWork+bmc%3ApublishedAs+%3FPublication+.%0D%0A%0D%0A%23%3FCreativeWork+bmc%3ApublicationYearEarliest+%3F_pubYear+.%0D%0A%3FCreativeWork+bmc%3AhasCreatorShip+%3FCreatorShip+.%0D%0A%3FCreatorShip+bmc%3AhasAffiliation+%3FCreatorAffiliation+.%0D%0A%3FCreatorAffiliation+bmc%3AhasOrganization+%3FOrganization+.%0D%0A%0D%0A%3FCreatorShip+bmc%3AhasCreator+%3FCreator+.+%0D%0A%0D%0A%7D%0D%0AORDER+BY+xsd%3Astring%28%3F_orgName%29&format=text%2Fhtml&timeout=0&debug=on)\n",
    "    * [authorityRecords/organizations.sparql](https://github.com/libris/swepub-sparql/blob/master/sparqls/authorityRecords/organizations.sparql) --> [query](http://virhp07.libris.kb.se/sparql?default-graph-uri=&query=%23KB+Auktoritetslista+%C3%B6ver+organisationer%0D%0APREFIX+swpa_m%3A+%3Chttp%3A%2F%2Fswepub.kb.se%2FSwePubAnalysis%2Fmodel%23%3E%0D%0APREFIX+countries%3A+%3Chttp%3A%2F%2Fwww.bpiresearch.com%2FBPMO%2F2004%2F03%2F03%2Fcdl%2FCountries%23%3E%0D%0ASELECT+DISTINCT%0D%0Axsd%3Astring%28%3F_label%29+as+%3Forganization%0D%0Axsd%3Astring%28%3F_authority%29+as+%3Fauthority%0D%0Axsd%3Astring%28%3F_id%29+as+%3Fid%0D%0Axsd%3Astring%28%3F_nameLocal%29+as+%3Fcountry%0D%0Axsd%3Astring%28%3F_countryCodeISO3166Alpha3%29+as+%3FcountryCodeISO3166Alpha3%0D%0AWHERE%0D%0A%7B%0D%0A%3FResearchOrganization+a+swpa_m%3AResearchOrganization+.%0D%0A%3FResearchOrganization+rdfs%3Alabel+%3F_label+.%0D%0A%3FResearchOrganization+swpa_m%3AhasIdentity+%3FIdentity+.%0D%0A%3FResearchOrganization+swpa_m%3AlocatedIn+%3FISO3166DefinedCountry+.%0D%0A%3FIdentity+swpa_m%3Aauthority+%3F_authority+.%0D%0A%3FISO3166DefinedCountry+countries%3AcountryCodeISO3166Alpha3+%3F_countryCodeISO3166Alpha3+.%0D%0A%3FISO3166DefinedCountry+countries%3AreferencesCountry+%3FIndependentState+.%0D%0A%3FIndependentState+countries%3AnameLocal+%3F_nameLocal+.%0D%0A%3FIdentity+swpa_m%3Aid+%3F_id+.%0D%0A%3FIdentity+swpa_m%3Aauthority+%22kb.se%22%5E%5Exsd%3Astring+.%0D%0AFILTER%28%3F_authority+%3D+%22kb.se%22%5E%5Exsd%3Astring%29%0D%0A%7D&format=text%2Fhtml&timeout=0&debug=on)\n",
    "    * [publicationChannels](https://github.com/libris/swepub-sparql/blob/master/sparqls/authorityRecords/publicationChannels.sparql) --> [query](http://virhp07.libris.kb.se/sparql?default-graph-uri=&query=PREFIX+swpa_m%3A+%3Chttp%3A%2F%2Fswepub.kb.se%2FSwePubAnalysis%2Fmodel%23%3E%0D%0ASELECT+DISTINCT%0D%0Axsd%3Astring%28%3F_onetitle%29+as+%3Ftitle%0D%0Axsd%3Astring%28%3F_issn%29+as+%3FprintISN%0D%0Axsd%3Astring%28%3F_eissn%29+as+%3FelectronicISSN%0D%0Axsd%3Aint%28%3F_weight%29+as+%3FNorwegianLevel%0D%0Axsd%3Aint%28%3F_weight7%29+as+%3FFinnishLevel%0D%0Axsd%3Aint%28%3F_weight8%29+as+%3FDanishLevel%0D%0A%3FSwedishLevel%0D%0AWHERE%0D%0A%7B%0D%0A%3FJournal+a+swpa_m%3AJournal+.%0D%0A%3FJournal+swpa_m%3Aonetitle+%3F_onetitle+.%0D%0AOPTIONAL+%7B+%3FJournal+swpa_m%3Aeissn+%3F_eissn+.+%7D%0D%0AOPTIONAL+%7B+%3FJournal+swpa_m%3Aissn+%3F_issn+.+%7D%0D%0AFILTER+%28+bound%28%3F_issn%29+%7C%7C+bound%28%3F_eissn%29+%29%0D%0A%0D%0A%3FJournal+swpa_m%3AhasRank+%3FSwedishRank+.%0D%0A%3FSwedishRank+a+swpa_m%3ASwedishRank+.%0D%0A%3FSwedishRank+swpa_m%3Aweight+%3FSwedishLevel+.%0D%0A%0D%0AOPTIONAL%0D%0A%7B%0D%0A%3FJournal+swpa_m%3AhasRank+%3FNorwegianRank+.%0D%0A%3FNorwegianRank+a+swpa_m%3ANorwegianRank+.%0D%0A%3FNorwegianRank+swpa_m%3Aweight+%3F_weight+.%0D%0A%7D%0D%0AOPTIONAL%0D%0A%7B%0D%0A%3FJournal+swpa_m%3AhasRank+%3FFinnishRank+.%0D%0A%3FFinnishRank+a+swpa_m%3AFinnishRank+.%0D%0A%3FFinnishRank+swpa_m%3Aweight+%3F_weight7+.%0D%0A%7D%0D%0AOPTIONAL%0D%0A%7B%0D%0A%3FJournal+swpa_m%3AhasRank+%3FDanishRank+.%0D%0A%3FDanishRank+a+swpa_m%3ADanishRank+.%0D%0A%3FDanishRank+swpa_m%3Aweight+%3F_weight8+.%0D%0A%7D%0D%0A%7D%0D%0ALIMIT+100000&format=text%2Fhtml&timeout=0&debug=on)\n",
    "    * Finding persons with ORCID ???\n",
    "  \n",
    "see at the end where we find an ORCID in the JSON file looks like not everyone has an ORCID... \n",
    "* Magnus C Persson [0000-0003-1062-2789](https://orcid.org/0000-0003-1062-2789) he is same as [WD Q88134673](https://www.wikidata.org/wiki/Q88134673?uselang=sv) --> [Scholia](https://scholia.toolforge.org/author/Q88134673) ->\n",
    "  * [co-authors](https://tinyurl.com/yb65gf5m)\n",
    "  * [graf](https://tinyurl.com/ycf8zc69)  \n",
    "* Wikidata\n",
    "  * ORCID property [P496](https://www.wikidata.org/wiki/Property_talk:P496) on 1 546 332 objects\n",
    "    * 0000-0002-5494-8126 --> WD query [haswbstatement:\"P356=0000-0002-5494-8126\"](https://www.wikidata.org/w/index.php?sort=relevance&search=haswbstatement%3A%22P496%3D0000-0002-5494-8126%22&title=Special:Search&profile=advanced&fulltext=1&advancedSearch-current=%7B%7D&ns0=1&ns120=1)\n",
    "  * DOI property [P356](https://www.wikidata.org/wiki/Property_talk:P356) on 25 609 492 objects\n",
    "    * doi/10.1186/S13321-016-0161-3 --> WD query [haswbstatement:\"P356=10.1186/S13321-016-0161-3\"](https://www.wikidata.org/w/index.php?sort=relevance&search=haswbstatement%3A%22P356%3D10.1186%2FS13321-016-0161-3%22&title=Special:Search&profile=advanced&fulltext=1&advancedSearch-current=%7B%7D&ns0=1&ns120=1)\n",
    "   * query using [SPARQL](https://w.wiki/X3h)  \n",
    "   * [Scholia](https://scholia.toolforge.org/) - tool for citation graphs of data in Wikidata \n",
    "      * DOI link [doi/10.1186/S13321-016-0161-3](https://scholia.toolforge.org/doi/10.1186/S13321-016-0161-3)\n",
    "      * ORCID link [orcid/0000-0002-5494-8126](https://scholia.toolforge.org/orcid/0000-0002-5494-8126]\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Thu Jul 16 16:56:56 2020\n"
     ]
    }
   ],
   "source": [
    "# Try to get ORCID from SWEPUB\n",
    "# see https://kundo.se/org/swepub/d/api-for-amnesklassificering/#c3571837\n",
    "import time\n",
    "\n",
    "import pandas as pd\n",
    "\n",
    "start_time = time.time()\n",
    "filename = \"data/swepub-duplicated-2020-07-05.jsonl\"  # SWEPUB dump, one JSON record per line\n",
    "filestore = \"data/swepub-duplicated-2020-07-05_1.pd\"  # target for the optional pickle below\n",
    "\n",
    "print(time.ctime())\n",
    "# Read the large file in chunks of 10,000 records instead of parsing it in one call\n",
    "df_chunk = pd.read_json(filename, lines=True, chunksize=10000)\n",
    "chunk_list = []\n",
    "for chunk in df_chunk:\n",
    "    chunk_list.append(chunk)\n",
    "print(\"--- %s seconds ---\" % (time.time() - start_time))\n",
    "# Concatenate the chunks into a single DataFrame\n",
    "df_concat = pd.concat(chunk_list)\n",
    "print(\"--- %s seconds ---\" % (time.time() - start_time))\n",
    "df_concat.info()\n",
    "# Uncomment to cache the DataFrame as a pickle\n",
    "#df_concat.to_pickle(filestore)\n",
    "#print(\"--- %s seconds ---\" % (time.time() - start_time))\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Show every column when a DataFrame is displayed\n",
    "pd.set_option(\"display.max_columns\", None)\n",
    "df_concat[\"instanceOf\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pd.DataFrame(df_concat[\"instanceOf\"].tolist()) "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Try to find ORCID and DOI\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "instanceOfdf = pd.DataFrame(df_concat[\"instanceOf\"].tolist()) "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "instanceOfdf"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pd.DataFrame(instanceOfdf[\"genreForm\"].tolist()) "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "instanceOfdf.info()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pd.DataFrame(instanceOfdf[\"hasTitle\"][1:10].tolist()) "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pd.DataFrame(instanceOfdf[\"contribution\"][1:10].tolist())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pd.DataFrame(instanceOfdf[\"hasNote\"][1:10].tolist())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pd.options.display.width = 0\n",
    "pd.DataFrame(instanceOfdf[\"contribution\"][1:10].tolist())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pd.DataFrame(instanceOfdf[\"contribution\"][1:10].tolist()[0])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### We have 35 authors, but it looks like none of them has an ORCID"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pd.DataFrame(instanceOfdf[\"contribution\"][1:10].tolist()[0])[\"agent\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pd.DataFrame(instanceOfdf[\"contribution\"][1:10].tolist()[0])[\"agent\"].tolist()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pd.DataFrame(instanceOfdf[\"contribution\"][1:10].tolist()[1])[\"agent\"].tolist()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pd.options.display.width = 0  \n",
    "pd.DataFrame(instanceOfdf[\"hasTitle\"][1:10].tolist()[2]) "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pd.DataFrame(instanceOfdf[\"contribution\"][1:10].tolist()[0])[\"role\"].tolist()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pd.DataFrame(instanceOfdf[\"electronicLocator\"][1:10]) "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pd.DataFrame(instanceOfdf[\"contribution\"][1000:1010].tolist()[1])[\"agent\"].tolist()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pd.DataFrame(instanceOfdf[\"contribution\"][2000:2010].tolist()[1])[\"agent\"].tolist()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Looks we have an ORCID   \n",
    "* Magnus C Persson [0000-0003-1062-2789](https://orcid.org/0000-0003-1062-2789) he is same as [WD Q88134673](https://www.wikidata.org/wiki/Q88134673?uselang=sv) --> [Scholia](https://scholia.toolforge.org/author/Q88134673) ->\n",
    "  * [co-authors](https://tinyurl.com/yb65gf5m)\n",
    "  * [graf](https://tinyurl.com/ycf8zc69)  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Identifiers attached to the first contributor of the second record in rows 2000-2009\n",
    "pd.DataFrame(instanceOfdf[\"contribution\"][2000:2010].tolist()[1])[\"agent\"].tolist()[0][\"identifiedBy\"]\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Value of that contributor's first identifier, which looks like the ORCID noted above\n",
    "pd.DataFrame(instanceOfdf[\"contribution\"][2000:2010].tolist()[1])[\"agent\"].tolist()[0][\"identifiedBy\"][0][\"value\"]\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}