# How to work with scientific literature

Nikolay Karelin, PhD

Head of AI, [silkdata.ai](https://www.silkdata.ai/main)

# Outline

* Intro / Motivation
* The way of paper
* Search: Google Scholar and beyond
* Working through a paper
* Storage and collaboration
* A bit on economics: Why $15-45 a paper?

# Slides and links

![Slides](images/presntation_qr.png)

https://github.com/karelin/sci_papers_talk

# Intro

## Papers, papers, PAPERS ....

* ArXiv.org
* NIPS
* ACL
* ACM
* ...

## How to select a paper for the next product/feature?

## Goals

* Academia
  - New knowledge
  - Grants
* Industry
  - New but settled approaches
  - Algorithms / data / details
* Science is driven by community
  - Papers as the main communication medium

## Innovation risk profile

<br>

![Risk profile](images/risk_profile.png)

<br>

From https://www.slideshare.net/Verhaert/innovation-day-2012-11-luc-van-goethem-frederik-wouters-verhaert-risk-requirements-cooperating-or-counteracting-forces-in-the-process/8

## About me

* In R&D since 1995
  * Academia: optics (images), 1995-2008
  * Industry: simulation / HPC, 2008-2017
  * ML
* PhD / optics

## Annual number of papers

![Number of papers](images/number_of_papers.png)

From: Istvan Daruka, *Publication dynamics and the proliferation of scientific journals*, [DOI:10.1051/epn/2014102](https://doi.org/10.1051/epn/2014102), 2014

# The way of paper

## Paper types

* 'Classical'
  * Letter
  * (Regular / full) article
  * Review
  * Tutorial
  * Conference (extended) abstract
  * Conference proceedings
  * Book chapter
* 'New way'
  * e-print
* Status
  * Invited

## Why????

* Community
* Habits and traditions
* Slow pace

## Peer-review

![Peer-review diagram](images/dia_peer_review.png)

Rejection ratio / Time / Bias

## Review: variants

* Endorsement (arXiv.org)
* Editor's choice (conferences / invited papers)
* Single-round (conferences)
* Post-publication (experimental)

Experiments: https://distill.pub

## Review: NIPS

* Double blind
* Max 5-6 papers per reviewer
* Criteria
    * Novelty
    * Not submitted elsewhere
    * Quality
    * Significance
    * Clarity
    * Subject area

Details: https://nips.cc/Conferences/2016/PaperInformation/ReviewerInstructions

## arXiv.org

* Founded in 1991
* Mostly physics
* Initially: **pre-print** system
* Endorsement system since January 17, 2004, https://arxiv.org/help/endorsement

### Stats:

![arXiv.org statistics](images/arxiv-newsubs.png)

From https://arxiv.org/help/stats/2017_by_area/index

## arXiv.org: story

Why was it banned back in the 1990s?

### [xxx.lanl.gov](http://xxx.lanl.gov/)

![arXiv.org 1994](images/ArXiv_1994.png)

By Source (WP:NFCC#4), Fair use, https://en.wikipedia.org/w/index.php?curid=55801807

# Where to find papers?

## scholar.google.com

![Google Scholar Example](images/google_scholar_example.png)

## scholar.google.com

![Google scholar: citations example](images/google_scholar_citations.png)

## scholar.google.com

* Non-commercial project
* Citations
* Free PDFs
* Author profile
* Recommendations (for authors!)

## Author profile

![Yann LeCun profile](images/Yann_LeCun_profile.jpg)

Source: https://scholar.google.com/citations?user=WLN3QrAAAAAJ&hl=en&oi=ao

## Useful browser extensions

* [Google scholar button](https://chrome.google.com/webstore/detail/google-scholar-button/ldipcbpaocekfooobnbcddclnhejkcpn)
* Unpaywall, https://unpaywall.org/products/extension
* Kopernio, https://kopernio.com/

## DOI

* Digital Object Identifier
  - doi.org/10.ddd/xxxxxxx
* Not only papers:
  - Zenodo, https://zenodo.org/
  - GitHub, https://guides.github.com/activities/citable-code/
  - figshare, https://figshare.com/about

## [arxiv-sanity.com](http://www.arxiv-sanity.com/)

<br>

![Arxiv Sanity Preserver](images/Arxiv_Sanity_Preserver.png)

## researchgate.net

<br>

![ResearchGate](images/ResearchGate.png)

## researchgate.net

* Subscription
* Reads
* Citations
* Legal status unclear

## Open data / APIs

* arXiv.org: OAI-PMH, https://arxiv.org/help/oa/index
* Crossref, https://www.crossref.org/services/metadata-delivery/
* Unpaywall, https://unpaywall.org/products/api
* OpenCitations, http://opencitations.net/
* PubMed/PubChem, https://www.ncbi.nlm.nih.gov/home/develop/api/
* ... many closed sources
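
A rough sketch of how two of the services above can be queried by DOI. It builds lookup URLs for the Crossref REST API and the Unpaywall API and pulls a few fields out of a Crossref-style response. The function names are hypothetical, the email address is a placeholder, and `SAMPLE` is a hand-written stand-in for a response rather than fetched data.

```python
import json
from urllib.parse import quote, urlencode

CROSSREF_WORKS = "https://api.crossref.org/works/"
UNPAYWALL_V2 = "https://api.unpaywall.org/v2/"


def crossref_url(doi: str) -> str:
    """Metadata lookup for one DOI via the Crossref REST API."""
    return CROSSREF_WORKS + quote(doi, safe="/")


def unpaywall_url(doi: str, email: str) -> str:
    """Open-access lookup; Unpaywall asks callers to pass an email address."""
    return UNPAYWALL_V2 + quote(doi, safe="/") + "?" + urlencode({"email": email})


def summarize_crossref(payload: str) -> dict:
    """Pull a few fields out of a Crossref-style JSON response."""
    message = json.loads(payload)["message"]
    titles = message.get("title") or [""]  # Crossref returns titles as a list
    return {
        "doi": message.get("DOI"),
        "title": titles[0],
        # Citation count field; None when the response does not carry it.
        "cited_by": message.get("is-referenced-by-count"),
    }


# Hand-written sample in the shape of a Crossref response (assumption, not real output).
SAMPLE = json.dumps({
    "status": "ok",
    "message": {
        "DOI": "10.1051/epn/2014102",
        "title": ["Publication dynamics and the proliferation of scientific journals"],
    },
})

print(crossref_url("10.1051/epn/2014102"))
print(unpaywall_url("10.1051/epn/2014102", "you@example.org"))
print(summarize_crossref(SAMPLE))
```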

# Working through a paper

## Possible order

* Authors
* References
* Figures / tables
* Discussion / conclusion
* Code
* Supplement
* ... read all!

## Author order

<br>

![Improving Palliative Care with Deep Learning](images/authors_1711.06402.jpg)

<br>
[arXiv:1711.06402](https://arxiv.org/abs/1711.06402), IEEE International Conference on Bioinformatics and Biomedicine 2017

## References

<br>

![1711.06402 references](images/1711.06402_references.png)

## References - old style

<br>

![Mikolov ref](images/Mikolov_ref.png)

Example: https://habrahabr.ru/company/ods/blog/329410/

Abstract: http://naacl2013.naacl.org/abstracts/37.aspx <br/>
Paper: https://www.aclweb.org/anthology/N13-1090

## Caveats

* Style
* Author's goal: new result
* Reference relevance
* No sentiments
* Code updates, ...

![Article usage over time](images/article_usage_over_time.png)

From https://www.stm-assoc.org/2009_10_13_MWC_STM_Report.pdf

# Storing and collaborating

## Why store papers?

* Notes and comments
* Collaboration
* Access instability

## BibTeX

![BibTeX format example](images/BibTeX-Wikipedia.png)

Source: https://en.wikipedia.org/wiki/BibTeX
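
A small sketch of what a BibTeX entry looks like when generated from structured metadata. The helper `to_bibtex` and the citation key `daruka2014` are hypothetical. The fields come from the Daruka reference cited earlier in these slides, and `@misc` is used only because the journal name is not given here.

```python
def to_bibtex(entry_type: str, key: str, fields: dict) -> str:
    """Render one BibTeX entry; values are wrapped in braces, in insertion order."""
    lines = [f"@{entry_type}{{{key},"]
    for name, value in fields.items():
        lines.append(f"  {name} = {{{value}}},")
    lines.append("}")
    return "\n".join(lines)


entry = to_bibtex(
    "misc",
    "daruka2014",  # hypothetical citation key
    {
        "author": "Istvan Daruka",
        "title": "Publication dynamics and the proliferation of scientific journals",
        "doi": "10.1051/epn/2014102",
        "year": "2014",
    },
)
print(entry)
```

Tools such as JabRef and Zotero read and write this format, so an entry like this can be moved between them.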

## BibTeX & JabRef

![JabRef example](images/JabRef-4-0-DOI-handling.png)

http://www.jabref.org/

## Zotero

![Zotero example](images/Zotero.png)

https://www.zotero.org/

## Zotero: features and issues

* Not only scholarly papers (web pages too)
* Export / import
* Community
* Extensions
* Collaboration: paid!

## Mendeley

<br>

![Mendeley desktop](images/Mendeley_Desktop.png)

## Mendeley

* Process raw collection
* Mobile client
* Proprietary
* Collaboration: paid!

## I, Librarian

![Demo account https://i-librarian.net/demo/](images/I-Librarian_demo.png)

## I, Librarian

* https://i-librarian.net/index.php
* https://github.com/mkucej/i-librarian

## I, Librarian

* Easy setup
* Collaboration
* Almost no sharing of comments
* Low support / rare updates
* Old-style PHP code

# A bit on economics 

Why so expensive?

## Publishing process

<br>

![Publishing process](images/publishing_process.png)

## Publishing process

- Small circulation
- Complex typography
- Peer-review process
- No self-plagiarism

## Classical publishing process: Offprints

![Offprints](images/offprints.jpg)

Source: http://englishcoffeedrinker.blogspot.com.by/2009/12/offprints-and-odd-things.html

## Number of scientific journals

![Number of journals](images/number_of_journals.png)

From https://www.stm-assoc.org/2009_10_13_MWC_STM_Report.pdf

## Bell Syst. Tech. J.

![Bell Syst. Tech. J](images/bell_systems_technical_journal_1978_Unix.jpg)

http://www.alcatel-lucent.com/bstj/

http://netmgt.blogspot.com.by/2012/10/available-in-pdf-bell-system-technical.html

Now: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6731005 (paid!)

## Future?

* Copyright reform
* New publishing models
* Data science:
  - Products
  - Automatic analysis
  - ???

# We are hiring!

* Junior / middle data scientist wanted
* Text / document processing
* Outsource / internal projects

# Thank you for your attention


Nikolay Karelin

Facebook: https://www.facebook.com/nikolay.karelin <br/>
Twitter: [@nick_karelin](https://twitter.com/nick_karelin) <br/>
LinkedIn: https://www.linkedin.com/in/nikolay-karelin/

![Slides](images/presntation_qr.png)

Slides: https://github.com/karelin/sci_papers_talk