{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# How to work with scientific literature\n",
"\n",
"Nikolay Karelin, PhD\n",
"\n",
"Head of AI, [silkdata.ai](https://www.silkdata.ai/main)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Outline\n",
"\n",
"* Intro / Motivation\n",
"* The way of paper\n",
"* Search: Scholar Google and beyond\n",
"* Working through paper\n",
"* Storage and collaboration\n",
"* A bit on economics: Why $15-45 a paper?"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Slides and links\n",
"\n",
"\n",
"\n",
"https://github.com/karelin/sci_papers_talk"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Intro"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Papers, papers, PAPERS ....\n",
"\n",
"* ArXiv.org\n",
"* NIPS\n",
"* ACL\n",
"* ACM\n",
"* ..."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## How to select a paper for next product/feature?"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Goals\n",
"\n",
"* Academia\n",
" - New knowledge\n",
" - Grants\n",
"* Industry\n",
" - New but settled approaches\n",
" - Algorithms / data / details\n",
"* Science is driven by community\n",
" - Papers as main communication media"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Innovation risk profile\n",
"\n",
"
\n",
"\n",
"\n",
"\n",
"
\n",
"\n",
"From https://www.slideshare.net/Verhaert/innovation-day-2012-11-luc-van-goethem-frederik-wouters-verhaert-risk-requirements-cooperating-or-counteracting-forces-in-the-process/8"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"## About me\n",
"\n",
"* In R&D since 1995\n",
" * Academy: Optics (images), 1995-2008\n",
" * Industry: simulation / HPC, 2008-2017\n",
" * ML\n",
"* PhD / optics"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Annual number of papers\n",
"\n",
"\n",
"\n",
"From: Istvan Daruka, *Publication dynamics and the proliferation of scientific journals*, [DOI:10.1051/epn/2014102](https://doi.org/10.1051/epn/2014102), 2014"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# The way of paper"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Paper types\n",
"\n",
"* 'Classical'\n",
" * Letter\n",
" * (Regular / full) article\n",
" * Review\n",
" * Tutorial\n",
" * Conference (extended) abstract\n",
" * Conference proceedings\n",
" * Book chapter\n",
"* 'New way'\n",
" * e-print\n",
"* Status\n",
" * Invited"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"## Why????"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"* Community\n",
"* Habits and traditions\n",
"* Slow pace"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Peer-review\n",
"\n",
""
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"Rejectrion ratio / Time / Bias"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Review: variants\n",
"\n",
"* Endorsement (arXiv.org)\n",
"* Editor's coice (coferences / invited papers)\n",
"* Single-round (conferences)\n",
"* Post-publication (experimental)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"Experiments: https://distill.pub"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Review: NIPS\n",
"\n",
"* Double blind\n",
"* Max 5-6 papers per reviewer\n",
"* Criteria\n",
" * Novelty\n",
" * Not submitted elsewhere\n",
" * Quality\n",
" * Significance\n",
" * Clarity\n",
" * Subject area\n",
"\n",
"Details: https://nips.cc/Conferences/2016/PaperInformation/ReviewerInstructions"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## arXiv.org\n",
"\n",
"* Founded in 1991\n",
"* Mostly physics\n",
"* Initially: **pre-print** system\n",
"* Endorsement system since January 17, 2004, https://arxiv.org/help/endorsement"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Stats:\n",
"\n",
"\n",
"\n",
"From https://arxiv.org/help/stats/2017_by_area/index"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## arXiv.org: story\n",
"\n",
"Why it was banned back in 1990s?"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"### [xxx.lanl.gov](http://xxx.lanl.gov/)\n",
"\n",
"\n",
"\n",
"By Source (WP:NFCC#4), Fair use, https://en.wikipedia.org/w/index.php?curid=55801807"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Where to find papers?"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## scholar.google.com\n",
"\n",
""
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## scholar.google.com\n",
"\n",
""
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## scholar.google.com\n",
"\n",
"* Non-commercial project\n",
"* Citations\n",
"* Free PDFs\n",
"* Author profile\n",
"* Recommendations (for authors!)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Author profile\n",
"\n",
"\n",
"\n",
"Source: https://scholar.google.com/citations?user=WLN3QrAAAAAJ&hl=en&oi=ao"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Useful browser extensions\n",
"\n",
"* [Google scholar button](https://chrome.google.com/webstore/detail/google-scholar-button/ldipcbpaocekfooobnbcddclnhejkcpn)\n",
"* Unpaywall, https://unpaywall.org/products/extension\n",
"* Kopernio, https://kopernio.com/"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## DOI\n",
"\n",
"* Digital Opject Identifier\n",
" - doi.org/10.ddd/xxxxxxx\n",
"* Not only papers:\n",
" - Zenodo, https://zenodo.org/\n",
" - GitHub, https://guides.github.com/activities/citable-code/\n",
" - figshare, https://figshare.com/about"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"## [arxiv-sanity.com](http://www.arxiv-sanity.com/)\n",
"\n",
"
\n",
"\n",
""
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## researchgate.net\n",
"\n",
"
\n",
"\n",
""
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"## researchgate.net\n",
"\n",
"* Subscription\n",
"* Reads\n",
"* Citations\n",
"* Legal status unclear"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Open data / APIs\n",
"\n",
"* arXiv.org: OAI-PMH, https://arxiv.org/help/oa/index\n",
"* Cross-ref, https://www.crossref.org/services/metadata-delivery/\n",
"* Unpaywall, https://unpaywall.org/products/api\n",
"* Open citations, http://opencitations.net/\n",
"* PubMed/PubChem, https://www.ncbi.nlm.nih.gov/home/develop/api/\n",
"* ... many closed sources"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Working through a paper"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Possible order\n",
"\n",
"* Authors\n",
"* Refences\n",
"* Figures / tables\n",
"* Discussion / conclusion\n",
"* Code\n",
"* Supplement\n",
"* ... read all!"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Author order\n",
"\n",
"
\n",
"\n",
"\n",
"\n",
"
\n",
"[arXiv:1711.06402](https://arxiv.org/abs/1711.06402), IEEE International Conference on Bioinformatics and Biomedicine 2017"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## References\n",
"\n",
"
\n",
"\n",
""
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## References - old style\n",
"\n",
"
\n",
"\n",
"\n",
"\n",
"Example: https://habrahabr.ru/company/ods/blog/329410/\n",
"\n",
"Abstract: http://naacl2013.naacl.org/abstracts/37.aspx
\n",
"Paper: https://www.aclweb.org/anthology/N13-1090"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Caveats\n",
"\n",
"* Style\n",
"* Author's goal: new result\n",
"* Reference relevance\n",
"* No sentiments\n",
"* Code updates, ..."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"\n",
"\n",
"From https://www.stm-assoc.org/2009_10_13_MWC_STM_Report.pdf"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Storing and collaborationg"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Why storing?\n",
"\n",
"* Notes and comments\n",
"* Collaboration\n",
"* Access instability"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## BibTeX\n",
"\n",
"\n",
"\n",
"- BibTeX-Wikipedia.png\n",
"- https://en.wikipedia.org/wiki/BibTeX"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## BibTeX & JabRef\n",
"\n",
"\n",
"\n",
"http://www.jabref.org/"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Zotero\n",
"\n",
"\n",
"\n",
"https://www.zotero.org/"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"## Zotero: features and issues\n",
"\n",
"* Not only scholar papers (web-pages)\n",
"* Export / import\n",
"* Community\n",
"* Extensions\n",
"* Collaborration: paid!"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Mendeley\n",
"\n",
"
\n",
"\n",
""
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"## Mendeley\n",
"\n",
"* Process raw collection\n",
"* Mobile client\n",
"* Proprietory\n",
"* Collaboration: paid!"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## I, Librarian\n",
"\n",
""
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"## I'Librarian\n",
"\n",
"* https://i-librarian.net/index.php\n",
"* https://github.com/mkucej/i-librarian"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"## I'Librarian\n",
"\n",
"* Easy setup\n",
"* Collaboration\n",
"* Almost no sharing of comments\n",
"* Low support / rare updates\n",
"* Old-style PHP code"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# A bit on economics \n",
"\n",
"Why so expensive?"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Publishing process\n",
"\n",
"
\n",
"\n",
""
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"## Publishing process\n",
"\n",
"- Small circulation\n",
"- Complex typography\n",
"- Peer-review process\n",
"- No self-plagiarism"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"## Classical publishing process: Offprints\n",
"\n",
"\n",
"\n",
"Source: http://englishcoffeedrinker.blogspot.com.by/2009/12/offprints-and-odd-things.html"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"## Number of scientific journals\n",
"\n",
"\n",
"\n",
"From https://www.stm-assoc.org/2009_10_13_MWC_STM_Report.pdf"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Bell Syst. Tech. J.\n",
"\n",
"\n",
"\n",
"http://www.alcatel-lucent.com/bstj/\n",
"\n",
"http://netmgt.blogspot.com.by/2012/10/available-in-pdf-bell-system-technical.html\n",
"\n",
"Now: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6731005 (paid!)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Future?\n",
"\n",
"* Copyright reform\n",
"* New publishing modles\n",
"* Data science:\n",
" - Products\n",
" - Automatic analysis\n",
" - ???"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# We are hiring!\n",
"\n",
"* Junior / middle data scientist wanted\n",
"* Text / document processing\n",
"* Outsource / internal projects"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Thank you for attention\n",
"\n",
"\n",
"Nikolay Karelin\n",
"\n",
"Facebook: https://www.facebook.com/nikolay.karelin
\n",
"Twitter: [@nick_karelin](https://twitter.com/nick_karelin)
\n",
"Linked: https://www.linkedin.com/in/nikolay-karelin/\n",
"\n",
"\n",
"\n",
"Slides: https://github.com/karelin/sci_papers_talk"
]
}
],
"metadata": {
"celltoolbar": "Slideshow",
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.5"
}
},
"nbformat": 4,
"nbformat_minor": 2
}