{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# How to work with scientific literature\n",
    "\n",
    "Nikolay Karelin, PhD\n",
    "\n",
    "Head of AI, [silkdata.ai](https://www.silkdata.ai/main)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Outline\n",
    "\n",
    "* Intro / Motivation\n",
    "* The way of a paper\n",
    "* Search: Google Scholar and beyond\n",
    "* Working through a paper\n",
    "* Storage and collaboration\n",
    "* A bit on economics: Why $15-45 a paper?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Slides and links\n",
    "\n",
    "![Slides](images/presntation_qr.png)\n",
    "\n",
    "https://github.com/karelin/sci_papers_talk"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Intro"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Papers, papers, PAPERS ....\n",
    "\n",
    "* ArXiv.org\n",
    "* NIPS\n",
    "* ACL\n",
    "* ACM\n",
    "* ..."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## How to select a paper for the next product/feature?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Goals\n",
    "\n",
    "* Academia\n",
    "  - New knowledge\n",
    "  - Grants\n",
    "* Industry\n",
    "  - New but settled approaches\n",
    "  - Algorithms / data / details\n",
    "* Science is driven by community\n",
    "  - Papers as the main communication medium"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Innovation risk profile\n",
    "\n",
    "<br>\n",
    "\n",
    "![Risk profile](images/risk_profile.png)\n",
    "\n",
    "<br>\n",
    "\n",
    "From https://www.slideshare.net/Verhaert/innovation-day-2012-11-luc-van-goethem-frederik-wouters-verhaert-risk-requirements-cooperating-or-counteracting-forces-in-the-process/8"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "source": [
    "## About me\n",
    "\n",
    "* In R&D since 1995\n",
    "  * Academia: optics (images), 1995-2008\n",
    "  * Industry: simulation / HPC, 2008-2017\n",
    "  * ML\n",
    "* PhD / optics"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Annual number of papers\n",
    "\n",
    "![Number of papers](images/number_of_papers.png)\n",
    "\n",
    "From: Istvan Daruka, *Publication dynamics and the proliferation of scientific journals*, [DOI:10.1051/epn/2014102](https://doi.org/10.1051/epn/2014102), 2014"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# The way of a paper"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Paper types\n",
    "\n",
    "* 'Classical'\n",
    "  * Letter\n",
    "  * (Regular / full) article\n",
    "  * Review\n",
    "  * Tutorial\n",
    "  * Conference (extended) abstract\n",
    "  * Conference proceedings\n",
    "  * Book chapter\n",
    "* 'New way'\n",
    "  * e-print\n",
    "* Status\n",
    "  * Invited"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "source": [
    "## Why????"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "source": [
    "* Community\n",
    "* Habits and traditions\n",
    "* Slow pace"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Peer-review\n",
    "\n",
    "![Peer-review diagram](images/dia_peer_review.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "Rejection ratio / Time / Bias"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Review: variants\n",
    "\n",
    "* Endorsement (arXiv.org)\n",
    "* Editor's choice (conferences / invited papers)\n",
    "* Single-round (conferences)\n",
    "* Post-publication (experimental)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "source": [
    "Experiments: https://distill.pub"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Review: NIPS\n",
    "\n",
    "* Double blind\n",
    "* Max 5-6 papers per reviewer\n",
    "* Criteria\n",
    "    * Novelty\n",
    "    * Not submitted elsewhere\n",
    "    * Quality\n",
    "    * Significance\n",
    "    * Clarity\n",
    "    * Subject area\n",
    "\n",
    "Details: https://nips.cc/Conferences/2016/PaperInformation/ReviewerInstructions"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## arXiv.org\n",
    "\n",
    "* Founded in 1991\n",
    "* Mostly physics\n",
    "* Initially: **pre-print** system\n",
    "* Endorsement system since January 17, 2004, https://arxiv.org/help/endorsement"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Stats:\n",
    "\n",
    "![arXiv.org statistics](images/arxiv-newsubs.png)\n",
    "\n",
    "From https://arxiv.org/help/stats/2017_by_area/index"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## arXiv.org: story\n",
    "\n",
    "Why was it banned back in the 1990s?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "### [xxx.lanl.gov](http://xxx.lanl.gov/)\n",
    "\n",
    "![arXiv.org 1994](images/ArXiv_1994.png)\n",
    "\n",
    "By Source (WP:NFCC#4), Fair use, https://en.wikipedia.org/w/index.php?curid=55801807"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Where to find papers?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## scholar.google.com\n",
    "\n",
    "![Google Scholar Example](images/google_scholar_example.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## scholar.google.com\n",
    "\n",
    "![Google scholar: citations example](images/google_scholar_citations.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## scholar.google.com\n",
    "\n",
    "* Non-commercial project\n",
    "* Citations\n",
    "* Free PDFs\n",
    "* Author profile\n",
    "* Recommendations (for authors!)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Author profile\n",
    "\n",
    "![Yann LeCun profile](images/Yann_LeCun_profile.jpg)\n",
    "\n",
    "Source: https://scholar.google.com/citations?user=WLN3QrAAAAAJ&hl=en&oi=ao"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Useful browser extensions\n",
    "\n",
    "* [Google scholar button](https://chrome.google.com/webstore/detail/google-scholar-button/ldipcbpaocekfooobnbcddclnhejkcpn)\n",
    "* Unpaywall, https://unpaywall.org/products/extension\n",
    "* Kopernio, https://kopernio.com/"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## DOI\n",
    "\n",
    "* Digital Object Identifier\n",
    "  - doi.org/10.ddd/xxxxxxx\n",
    "* Not only papers:\n",
    "  - Zenodo, https://zenodo.org/\n",
    "  - GitHub, https://guides.github.com/activities/citable-code/\n",
    "  - figshare, https://figshare.com/about"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "source": [
    "## [arxiv-sanity.com](http://www.arxiv-sanity.com/)\n",
    "\n",
    "<br>\n",
    "\n",
    "![Arxiv Sanity Preserver](images/Arxiv_Sanity_Preserver.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## researchgate.net\n",
    "\n",
    "<br>\n",
    "\n",
    "![ResearchGate](images/ResearchGate.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "source": [
    "## researchgate.net\n",
    "\n",
    "* Subscription\n",
    "* Reads\n",
    "* Citations\n",
    "* Legal status unclear"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Open data / APIs\n",
    "\n",
    "* arXiv.org: OAI-PMH, https://arxiv.org/help/oa/index\n",
    "* Crossref, https://www.crossref.org/services/metadata-delivery/\n",
    "* Unpaywall, https://unpaywall.org/products/api\n",
    "* OpenCitations, http://opencitations.net/\n",
    "* PubMed/PubChem, https://www.ncbi.nlm.nih.gov/home/develop/api/\n",
    "* ... many closed sources"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Working through a paper"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Possible order\n",
    "\n",
    "* Authors\n",
    "* References\n",
    "* Figures / tables\n",
    "* Discussion / conclusion\n",
    "* Code\n",
    "* Supplement\n",
    "* ... read all!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Author order\n",
    "\n",
    "<br>\n",
    "\n",
    "![Improving Palliative Care with Deep Learning](images/authors_1711.06402.jpg)\n",
    "\n",
    "<br>\n",
    "[arXiv:1711.06402](https://arxiv.org/abs/1711.06402), IEEE International Conference on Bioinformatics and Biomedicine 2017"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## References\n",
    "\n",
    "<br>\n",
    "\n",
    "![1711.06402 references](images/1711.06402_references.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## References - old style\n",
    "\n",
    "<br>\n",
    "\n",
    "![Mikolov ref](images/Mikolov_ref.png)\n",
    "\n",
    "Example: https://habrahabr.ru/company/ods/blog/329410/\n",
    "\n",
    "Abstract: http://naacl2013.naacl.org/abstracts/37.aspx <br/>\n",
    "Paper: https://www.aclweb.org/anthology/N13-1090"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Caveats\n",
    "\n",
    "* Style\n",
    "* Author's goal: new result\n",
    "* Reference relevance\n",
    "* No sentiments\n",
    "* Code updates, ..."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "![Article usage over time](images/article_usage_over_time.png)\n",
    "\n",
    "From https://www.stm-assoc.org/2009_10_13_MWC_STM_Report.pdf"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Storage and collaboration"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Why store papers?\n",
    "\n",
    "* Notes and comments\n",
    "* Collaboration\n",
    "* Access instability"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## BibTeX\n",
    "\n",
    "![BibTeX format example](images/BibTeX-Wikipedia.png)\n",
    "\n",
    "Source: https://en.wikipedia.org/wiki/BibTeX"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## BibTeX & JabRef\n",
    "\n",
    "![JabRef example](images/JabRef-4-0-DOI-handling.png)\n",
    "\n",
    "http://www.jabref.org/"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Zotero\n",
    "\n",
    "![Zotero example](images/Zotero.png)\n",
    "\n",
    "https://www.zotero.org/"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "source": [
    "## Zotero: features and issues\n",
    "\n",
    "* Not only scholarly papers (web pages too)\n",
    "* Export / import\n",
    "* Community\n",
    "* Extensions\n",
    "* Collaboration: paid!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Mendeley\n",
    "\n",
    "<br>\n",
    "\n",
    "![Mendeley desktop](images/Mendeley_Desktop.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "source": [
    "## Mendeley\n",
    "\n",
    "* Process raw collection\n",
    "* Mobile client\n",
    "* Proprietary\n",
    "* Collaboration: paid!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## I, Librarian\n",
    "\n",
    "![Demo account https://i-librarian.net/demo/](images/I-Librarian_demo.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "source": [
    "## I, Librarian\n",
    "\n",
    "* https://i-librarian.net/index.php\n",
    "* https://github.com/mkucej/i-librarian"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "source": [
    "## I, Librarian\n",
    "\n",
    "* Easy setup\n",
    "* Collaboration\n",
    "* Almost no sharing of comments\n",
    "* Low support / rare updates\n",
    "* Old-style PHP code"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# A bit on economics \n",
    "\n",
    "Why so expensive?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Publishing process\n",
    "\n",
    "<br>\n",
    "\n",
    "![Publishing process](images/publishing_process.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "source": [
    "## Publishing process\n",
    "\n",
    "- Small circulation\n",
    "- Complex typography\n",
    "- Peer-review process\n",
    "- No self-plagiarism"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "source": [
    "## Classical publishing process: Offprints\n",
    "\n",
    "![Offprints](images/offprints.jpg)\n",
    "\n",
    "Source: http://englishcoffeedrinker.blogspot.com.by/2009/12/offprints-and-odd-things.html"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "source": [
    "## Number of scientific journals\n",
    "\n",
    "![Number of journals](images/number_of_journals.png)\n",
    "\n",
    "From https://www.stm-assoc.org/2009_10_13_MWC_STM_Report.pdf"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Bell Syst. Tech. J.\n",
    "\n",
    "![Bell Syst. Tech. J](images/bell_systems_technical_journal_1978_Unix.jpg)\n",
    "\n",
    "http://www.alcatel-lucent.com/bstj/\n",
    "\n",
    "http://netmgt.blogspot.com.by/2012/10/available-in-pdf-bell-system-technical.html\n",
    "\n",
    "Now: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6731005 (paid!)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Future?\n",
    "\n",
    "* Copyright reform\n",
    "* New publishing models\n",
    "* Data science:\n",
    "  - Products\n",
    "  - Automatic analysis\n",
    "  - ???"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# We are hiring!\n",
    "\n",
    "* Junior / middle data scientist wanted\n",
    "* Text / document processing\n",
    "* Outsource / internal projects"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Thank you for your attention\n",
    "\n",
    "\n",
    "Nikolay Karelin\n",
    "\n",
    "Facebook: https://www.facebook.com/nikolay.karelin <br/>\n",
    "Twitter: [@nick_karelin](https://twitter.com/nick_karelin) <br/>\n",
    "LinkedIn: https://www.linkedin.com/in/nikolay-karelin/\n",
    "\n",
    "![Slides](images/presntation_qr.png)\n",
    "\n",
    "Slides: https://github.com/karelin/sci_papers_talk"
   ]
  }
 ],
 "metadata": {
  "celltoolbar": "Slideshow",
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}