{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# How to work with scientific literature\n", "\n", "Nikolay Karelin, PhD\n", "\n", "Head of AI, [silkdata.ai](https://www.silkdata.ai/main)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Outline\n", "\n", "* Intro / Motivation\n", "* The way of paper\n", "* Search: Google Scholar and beyond\n", "* Working through a paper\n", "* Storage and collaboration\n", "* A bit on economics: Why $15-45 a paper?" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Slides and links\n", "\n", "\n", "\n", "https://github.com/karelin/sci_papers_talk" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Intro" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Papers, papers, PAPERS ....\n", "\n", "* arXiv.org\n", "* NIPS\n", "* ACL\n", "* ACM\n", "* ..." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## How to select a paper for the next product/feature?" 
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Goals\n", "\n", "* Academia\n", " - New knowledge\n", " - Grants\n", "* Industry\n", " - New but settled approaches\n", " - Algorithms / data / details\n", "* Science is driven by community\n", " - Papers as main communication media" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Innovation risk profile\n", "\n", "<br>\n", "\n", "\n", "\n", "<br>\n", "\n", "From https://www.slideshare.net/Verhaert/innovation-day-2012-11-luc-van-goethem-frederik-wouters-verhaert-risk-requirements-cooperating-or-counteracting-forces-in-the-process/8" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "## About me\n", "\n", "* In R&D since 1995\n", " * Academy: Optics (images), 1995-2008\n", " * Industry: simulation / HPC, 2008-2017\n", " * ML\n", "* PhD / optics" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Annual number of papers\n", "\n", "\n", "\n", "From: Istvan Daruka, *Publication dynamics and the proliferation of scientific journals*, [DOI:10.1051/epn/2014102](https://doi.org/10.1051/epn/2014102), 2014" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# The way of paper" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Paper types\n", "\n", "* 'Classical'\n", " * Letter\n", " * (Regular / full) article\n", " * Review\n", " * Tutorial\n", " * Conference (extended) abstract\n", " * Conference proceedings\n", " * Book chapter\n", "* 'New way'\n", " * e-print\n", "* Status\n", " * Invited" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "## Why????" 
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "* Community\n", "* Habits and traditions\n", "* Slow pace" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Peer-review\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Rejection rate / Time / Bias" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Review: variants\n", "\n", "* Endorsement (arXiv.org)\n", "* Editor's choice (conferences / invited papers)\n", "* Single-round (conferences)\n", "* Post-publication (experimental)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "Experiments: https://distill.pub" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Review: NIPS\n", "\n", "* Double blind\n", "* Max 5-6 papers per reviewer\n", "* Criteria\n", " * Novelty\n", " * Not submitted elsewhere\n", " * Quality\n", " * Significance\n", " * Clarity\n", " * Subject area\n", "\n", "Details: https://nips.cc/Conferences/2016/PaperInformation/ReviewerInstructions" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## arXiv.org\n", "\n", "* Founded in 1991\n", "* Mostly physics\n", "* Initially: **pre-print** system\n", "* Endorsement system since January 17, 2004, https://arxiv.org/help/endorsement" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Stats:\n", "\n", "\n", "\n", "From https://arxiv.org/help/stats/2017_by_area/index" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## arXiv.org: story\n", "\n", "Why was it banned back in the 1990s?" 
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "### [xxx.lanl.gov](http://xxx.lanl.gov/)\n", "\n", "\n", "\n", "By Source (WP:NFCC#4), Fair use, https://en.wikipedia.org/w/index.php?curid=55801807" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Where to find papers?" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## scholar.google.com\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## scholar.google.com\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## scholar.google.com\n", "\n", "* Non-commercial project\n", "* Citations\n", "* Free PDFs\n", "* Author profile\n", "* Recommendations (for authors!)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Author profile\n", "\n", "\n", "\n", "Source: https://scholar.google.com/citations?user=WLN3QrAAAAAJ&hl=en&oi=ao" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Useful browser extensions\n", "\n", "* [Google Scholar Button](https://chrome.google.com/webstore/detail/google-scholar-button/ldipcbpaocekfooobnbcddclnhejkcpn)\n", "* Unpaywall, https://unpaywall.org/products/extension\n", "* Kopernio, https://kopernio.com/" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## DOI\n", "\n", "* Digital Object Identifier\n", " - doi.org/10.ddd/xxxxxxx\n", "* Not only papers:\n", " - Zenodo, https://zenodo.org/\n", " - GitHub, https://guides.github.com/activities/citable-code/\n", " - figshare, https://figshare.com/about" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "## [arxiv-sanity.com](http://www.arxiv-sanity.com/)\n", 
"\n", "<br>\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## researchgate.net\n", "\n", "<br>\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "## researchgate.net\n", "\n", "* Subscription\n", "* Reads\n", "* Citations\n", "* Legal status unclear" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Open data / APIs\n", "\n", "* arXiv.org: OAI-PMH, https://arxiv.org/help/oa/index\n", "* Crossref, https://www.crossref.org/services/metadata-delivery/\n", "* Unpaywall, https://unpaywall.org/products/api\n", "* OpenCitations, http://opencitations.net/\n", "* PubMed/PubChem, https://www.ncbi.nlm.nih.gov/home/develop/api/\n", "* ... many closed sources" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Working through a paper" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Possible order\n", "\n", "* Authors\n", "* References\n", "* Figures / tables\n", "* Discussion / conclusion\n", "* Code\n", "* Supplement\n", "* ... read all!" 
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Author order\n", "\n", "<br>\n", "\n", "\n", "\n", "<br>\n", "[arXiv:1711.06402](https://arxiv.org/abs/1711.06402), IEEE International Conference on Bioinformatics and Biomedicine 2017" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## References\n", "\n", "<br>\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## References - old style\n", "\n", "<br>\n", "\n", "\n", "\n", "Example: https://habrahabr.ru/company/ods/blog/329410/\n", "\n", "Abstract: http://naacl2013.naacl.org/abstracts/37.aspx <br/>\n", "Paper: https://www.aclweb.org/anthology/N13-1090" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Caveats\n", "\n", "* Style\n", "* Author's goal: new result\n", "* Reference relevance\n", "* No sentiments\n", "* Code updates, ..." 
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "\n", "\n", "From https://www.stm-assoc.org/2009_10_13_MWC_STM_Report.pdf" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Storing and collaborating" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Why store papers?\n", "\n", "* Notes and comments\n", "* Collaboration\n", "* Access instability" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## BibTeX\n", "\n", "\n", "\n", "- BibTeX-Wikipedia.png\n", "- https://en.wikipedia.org/wiki/BibTeX" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## BibTeX & JabRef\n", "\n", "\n", "\n", "http://www.jabref.org/" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Zotero\n", "\n", "\n", "\n", "https://www.zotero.org/" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "## Zotero: features and issues\n", "\n", "* Not only scholarly papers (web pages too)\n", "* Export / import\n", "* Community\n", "* Extensions\n", "* Collaboration: paid!" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Mendeley\n", "\n", "<br>\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "## Mendeley\n", "\n", "* Process raw collection\n", "* Mobile client\n", "* Proprietary\n", "* Collaboration: paid!" 
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## I, Librarian\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "## I, Librarian\n", "\n", "* https://i-librarian.net/index.php\n", "* https://github.com/mkucej/i-librarian" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "## I, Librarian\n", "\n", "* Easy setup\n", "* Collaboration\n", "* Almost no sharing of comments\n", "* Low support / rare updates\n", "* Old-style PHP code" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# A bit on economics\n", "\n", "Why so expensive?" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Publishing process\n", "\n", "<br>\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "## Publishing process\n", "\n", "- Small circulation\n", "- Complex typography\n", "- Peer-review process\n", "- No self-plagiarism" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "## Classical publishing process: Offprints\n", "\n", "\n", "\n", "Source: http://englishcoffeedrinker.blogspot.com.by/2009/12/offprints-and-odd-things.html" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "## Number of scientific journals\n", "\n", "\n", "\n", "From https://www.stm-assoc.org/2009_10_13_MWC_STM_Report.pdf" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Bell Syst. Tech. 
J.\n", "\n", "\n", "\n", "http://www.alcatel-lucent.com/bstj/\n", "\n", "http://netmgt.blogspot.com.by/2012/10/available-in-pdf-bell-system-technical.html\n", "\n", "Now: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6731005 (paid!)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Future?\n", "\n", "* Copyright reform\n", "* New publishing models\n", "* Data science:\n", " - Products\n", " - Automatic analysis\n", " - ???" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# We are hiring!\n", "\n", "* Junior / mid-level data scientist wanted\n", "* Text / document processing\n", "* Outsourced / internal projects" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Thank you for your attention\n", "\n", "\n", "Nikolay Karelin\n", "\n", "Facebook: https://www.facebook.com/nikolay.karelin <br/>\n", "Twitter: [@nick_karelin](https://twitter.com/nick_karelin) <br/>\n", "LinkedIn: https://www.linkedin.com/in/nikolay-karelin/\n", "\n", "\n", "\n", "Slides: https://github.com/karelin/sci_papers_talk" ] } ], "metadata": { "celltoolbar": "Slideshow", "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.5" } }, "nbformat": 4, "nbformat_minor": 2 }