# OpenScienceWorkshop2019

Welcome to the Open Science Workshop GitHub repository! You can find the slides for the workshop at [https://zenodo.org/record/3550236](https://zenodo.org/record/3550236).

This repository contains some simple examples of how code, specifically *Jupyter Notebooks*, integrates with GitHub and can be run online. It has two demonstrations for hosting code online (see below).

## Structure of a Repository

___

In general, there are several key files you will see over and over in GitHub repositories:

* README.MD file
* requirements.txt
* LICENSE
* .gitignore

The `README.MD` file is a description of your repository (more on that below).

The `requirements.txt` file is typically a list of Python packages that the repository needs. These files are typically compatible with the pip package manager ([documentation here](https://pip.readthedocs.io/en/1.1/requirements.html)), but the important point is to make it clear, somewhere in the repository, exactly what is needed to run your code. The specifics of how you do that are less important.

The `LICENSE` file is common, but generally not necessary for smaller projects. I've chosen the MIT license, and GitHub makes it easy to choose among several. There is more information [here](https://help.github.com/en/github/creating-cloning-and-archiving-repositories/licensing-a-repository), but generally this is something you don't need to worry too much about unless you are building software.

The `.gitignore` file lists all of the files that you don't want to include in the repository. A lot of programs create temporary and/or hidden files (often starting with a '.') that can clutter a repository. The `.gitignore` file specifies which files to ignore for the purposes of version control.

### What can you host in GitHub?

(Almost) anything! GitHub recommends keeping repositories under 1 GB and limits individual files to a maximum of 100 MB.
It's a great place for non-sensitive data, analyses, simulations, presentations, websites, etc. Use your GitHub for whatever you want. It is an easy way to share files, with special tools for code, and specifically Python code.

### How is this different from git?

*GitHub* is an online host for *git repositories*, and a git repository is really just a project that uses *git* for version control. The basic features of both git and GitHub are very useful and simple to learn. Advanced users will find both to be very powerful and flexible programs, but don't let these complex features scare you off! The simplest use case for GitHub is as a place to store data and code. The simplest use case for git is as a way of tracking the changes you've made to a project.

There are a number of resources online about both. Here are a couple of helpful links to get you started:

* [Getting started with git](https://git-scm.com/book/en/v2/Getting-Started-What-is-Git%3F)
* [Getting started with GitHub](https://help.github.com/en/github/getting-started-with-github)

## Documenting your GitHub

___

What you are reading now is the `README.MD` file. This file is written in *Markdown*, a lightweight markup language that is converted to HTML and allows you to quickly and easily write a formatted web page. With Markdown, you can quickly add links, basic formatting, even code. GitHub will automatically render any file named README.MD in the body of the repository. You can see the raw Markdown for this README.MD file: [raw markdown](https://raw.githubusercontent.com/nicktfranklin/OpenScienceWorkshop2019/master/README.md).

Here is what code looks like rendered:

```
def my_fun(a, b):
    """Return the sum of a and b."""
    return a + b
```

and here's a fun photo of me and a llama:

[Image: Kitten]

The point is you can do a lot with Markdown very easily. Your `README.MD` file can be as expressive as you like, and having a well-documented and detailed `README.MD` file can really help make your work intelligible to someone else.
Here is a good guide on writing Markdown in README.MD files: [Mastering Markdown](https://guides.github.com/features/mastering-markdown/).

## Hosting code online

___

You have a couple of options for hosting the code in your repository online so that others can use it without going through a time-consuming installation process. Making sure that your code will run on someone else's computer can be a huge pain point, but luckily, there are two good and easy solutions for hosting Python code within Jupyter notebooks: [Binder](https://mybinder.org) and [Google Colab](https://colab.research.google.com/).

### Binder

[Binder](https://mybinder.org) is a resource for running Jupyter notebooks online. It integrates directly with GitHub and will install any packages listed in your `requirements.txt`. It creates a Docker image of your repository and allows you to run your code online.

We've created an example notebook to use with [Binder](https://mybinder.org). This example looks at the problem of change-points and how a simple reinforcement learning model handles them:

[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/nicktfranklin/OpenScienceWorkshop2019/master?filepath=JupyterNotebooks%2FChangePointDetection%20Demo.ipynb)

### Google Colab

[Google Colab](https://colab.research.google.com/) is similar, but more powerful. Colab has many packages pre-installed and comes with access to GPUs, making it ideal for sharing neural networks. Below, we have a tutorial on writing user-friendly code, along with some helpful functions in Colab.

[Open in Colab badge]

#### Deep Learning in Colab

One of the nice things about Colab is that you can use GPUs for free to run deep learning libraries like TensorFlow and PyTorch.
Google has a number of excellent tutorials for this on the main page ([here](https://colab.research.google.com/)), but here is a demonstration of a Variational Autoencoder processing handwritten digits: [link](https://colab.research.google.com/github/tensorflow/docs/blob/master/site/en/tutorials/generative/cvae.ipynb)
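
The Binder badge and the Colab links in this README both follow a simple URL pattern, so you can build launch links for your own notebooks. Here is a minimal sketch, assuming the patterns visible in the two links used in this README still hold. The helper names `binder_url` and `colab_url` are made up for illustration and are not part of either service, so check the Binder and Colab documentation before relying on them.

```python
from urllib.parse import quote


def binder_url(user, repo, ref, notebook_path):
    """Build a Binder launch link (hypothetical helper).

    Assumes the pattern seen in this README's Binder badge:
    https://mybinder.org/v2/gh/<user>/<repo>/<ref>?filepath=<encoded path>
    """
    # Encode every reserved character, including '/', as the badge link does.
    encoded_path = quote(notebook_path, safe="")
    return f"https://mybinder.org/v2/gh/{user}/{repo}/{ref}?filepath={encoded_path}"


def colab_url(user, repo, ref, notebook_path):
    """Build a Colab link to a notebook hosted on GitHub (hypothetical helper).

    Assumes the pattern seen in this README's Variational Autoencoder link:
    https://colab.research.google.com/github/<user>/<repo>/blob/<ref>/<path>
    """
    # Keep '/' as a path separator, but encode spaces and other characters.
    encoded_path = quote(notebook_path)
    return f"https://colab.research.google.com/github/{user}/{repo}/blob/{ref}/{encoded_path}"


if __name__ == "__main__":
    print(binder_url(
        "nicktfranklin",
        "OpenScienceWorkshop2019",
        "master",
        "JupyterNotebooks/ChangePointDetection Demo.ipynb",
    ))
    print(colab_url(
        "tensorflow",
        "docs",
        "master",
        "site/en/tutorials/generative/cvae.ipynb",
    ))
```

Run as a script, this prints the same Binder and Colab links that appear above. To use it for your own repository, swap in your GitHub user name, repository name, branch, and notebook path.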