{ "cells": [ { "cell_type": "markdown", "metadata": { "kernel": "SoS" }, "source": [ "# Programming concepts cheat sheet\n", "\n", "When trying to figure out what went into a program, look at:\n", " 1. documentation,\n", " 1. file names and subdirectories under which the source code has been organized,\n", " 1. imported libraries and their documentation,\n", " 1. function names and parameters,\n", " 1. function contents.\n", " \n", "Try to find the main function (the program's entry point), and start delving from there." ] }, { "cell_type": "markdown", "metadata": { "kernel": "SoS" }, "source": [ "## Libraries\n", "\n", "Contain functions and data types. Used to organize code and to package larger pieces of functionality into reusable units." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "inputHidden": false, "kernel": "python3", "outputHidden": false }, "outputs": [], "source": [ "# Python\n", "import re # regular expressions\n", "import requests # web requests\n", "import pandas as pd # tabular data (data frames)\n", "import numpy as np # numerical computation\n", "import matplotlib.pyplot as plt # plotting" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "inputHidden": false, "kernel": "ir", "outputHidden": false }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Loading required package: NLP\n", "\n", "Attaching package: ‘NLP’\n", "\n", "The following object is masked from ‘package:ggplot2’:\n", "\n", " annotate\n", "\n", "Loading required package: RColorBrewer\n" ] } ], "source": [ "# R\n", "library(ggplot2) # plotting\n", "library(tidyverse) # data wrangling\n", "library(cluster) # data clustering\n", "library(slam) # sparse matrices and arrays\n", "library(tm) # text mining \n", "library(SnowballC) # word stemming\n", "library(wordcloud) # word clouds\n" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "kernel": "scala" }, "outputs": [ { "data": { "text/plain": [ "\u001b[32mimport \u001b[39m\u001b[36m$ivy.$ // this one is just for 
the notebook, and not usual Scala\n", "\u001b[39m\n", "\u001b[32mimport \u001b[39m\u001b[36mcom.github.tototoshi.csv._\n", "\u001b[39m\n", "\u001b[32mimport \u001b[39m\u001b[36mscala.io.Source\n", "\u001b[39m\n", "\u001b[32mimport \u001b[39m\u001b[36mscala.collection.JavaConverters._\n", "\u001b[39m" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "// Scala\n", "import $ivy.`com.github.tototoshi::scala-csv:1.3.5` // this one is just for the notebook, and not usual Scala\n", "import com.github.tototoshi.csv._\n", "import scala.io.Source\n", "import scala.collection.JavaConverters._\n" ] }, { "cell_type": "markdown", "metadata": { "kernel": "SoS" }, "source": [ "## Functions\n", "\n", "Allow you to package code into reusable, named units. Used to organize a codebase. As many input parameters as you like, and zero or one return value (to return several things, return a collection such as a list or tuple)." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "inputHidden": false, "kernel": "python3", "outputHidden": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "where are we i dont know \n", "6\n" ] } ], "source": [ "# Python\n", "import re\n", "\n", "def standardize(text):\n", " text = text.replace(\".\",\" \").replace(\",\",\" \").replace(\"?\",\" \").replace(\"!\",\" \").replace(\"'\",\"\").lower()\n", " return re.sub(r\"\\s+\",\" \", text)\n", "\n", "print(standardize(\"Where are we? 
I don't know!\"))\n", "\n", "def sum(values):\n", " sum = 0\n", " for value in values:\n", " sum += value\n", " return sum\n", "\n", "print(sum([1,2,3]))" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "inputHidden": false, "kernel": "ir", "outputHidden": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1] \"where are we i dont know \"\n", "[1] 6\n" ] } ], "source": [ "# R\n", "standardize <- function(text) {\n", " return(tolower(gsub(\"\\\\s+\",\" \",gsub(\"\\\\.\",\" \",gsub(\",\",\" \",gsub(\"\\\\?\",\" \",gsub(\"!\",\" \",gsub(\"'\",\"\",text))))))))\n", "}\n", "\n", "print(standardize(\"Where are we? I don't know!\"))\n", "\n", "sum <- function(values) {\n", " sum <- 0\n", " for (value in values) {\n", " sum <- sum + value\n", " }\n", " return(sum)\n", "}\n", "\n", "print(sum(c(1,2,3)))" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "kernel": "scala" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "where are we i dont know \n", "6\n", "6\n" ] }, { "data": { "text/plain": [ "defined \u001b[32mfunction\u001b[39m \u001b[36mstandardize\u001b[39m\n", "defined \u001b[32mfunction\u001b[39m \u001b[36msum1\u001b[39m\n", "defined \u001b[32mfunction\u001b[39m \u001b[36msum2\u001b[39m" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "// Scala\n", "def standardize(text: String) =\n", " text.replace(\".\",\" \").replace(\",\",\" \").replace(\"?\",\" \").replace(\"!\",\" \").replace(\"'\",\"\").toLowerCase().replaceAll(\"\\\\s+\",\" \")\n", "\n", "println(standardize(\"Where are we? 
I don't know!\"))\n", "\n", "def sum1(values: Seq[Int]) = {\n", " var sum = 0\n", " for (value <- values) sum += value\n", " sum\n", "}\n", "// here's a functional variant for sum\n", "def sum2(values: Seq[Int]) = values.reduce(_+_)\n", "println(sum1(Seq(1,2,3)))\n", "println(sum2(Seq(1,2,3)))" ] }, { "cell_type": "markdown", "metadata": { "kernel": "SoS" }, "source": [ "In many programming languages, methods are functions associated with a data type. Instead of being passed in as a parameter, the value they operate on is written before the dot, which makes it easy to chain calls (note that the order of the calls matters):" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "inputHidden": false, "kernel": "python3", "outputHidden": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "XXXX\n", "bXbX\n" ] } ], "source": [ "# Python\n", "print(\"abab\".replace(\"a\",\"b\").replace(\"b\",\"X\"))\n", "print(\"abab\".replace(\"b\",\"X\").replace(\"a\",\"b\"))" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "kernel": "scala" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "XXXX\n", "bXbX\n" ] } ], "source": [ "// Scala\n", "println(\"abab\".replace(\"a\",\"b\").replace(\"b\",\"X\"))\n", "println(\"abab\".replace(\"b\",\"X\").replace(\"a\",\"b\"))" ] }, { "cell_type": "markdown", "metadata": { "kernel": "SoS" }, "source": [ "R mostly doesn't use this method syntax: its common object systems (S3 and S4) attach methods to generic functions, so you write `print(x)` rather than `x.print()`.\n", "\n", "Operators are yet another, shorter syntax for core functions. In Python and Scala they are syntactic sugar for method calls (`5 + 3` is roughly `(5).__add__(3)` in Python and `5.+(3)` in Scala). In R they are ordinary functions with special syntax: `5 + 3` is the same as `` `+`(5, 3) ``." 
] }, { "cell_type": "code", "execution_count": 9, "metadata": { "inputHidden": false, "kernel": "python3", "outputHidden": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "15\n", "15\n", "[1, 2, 3, 4]\n" ] } ], "source": [ "# Python\n", "print((5).__add__(3).__add__(7))\n", "print(5+3+7)\n", "\n", "values = [1,2]\n", "values.extend([3])\n", "values += [4]\n", "print(values)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "kernel": "scala" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "15\n", "15\n", "ArrayBuffer(1, 2, 3, 4)" ] }, { "data": { "text/plain": [ "\u001b[32mimport \u001b[39m\u001b[36mscala.collection.mutable.ArrayBuffer\n", "\n", "\u001b[39m\n", "\u001b[36mvalues\u001b[39m: \u001b[32mArrayBuffer\u001b[39m[\u001b[32mInt\u001b[39m] = \u001b[33mArrayBuffer\u001b[39m(\u001b[32m1\u001b[39m, \u001b[32m2\u001b[39m, \u001b[32m3\u001b[39m, \u001b[32m4\u001b[39m)\n", "\u001b[36mres3_4\u001b[39m: \u001b[32mArrayBuffer\u001b[39m[\u001b[32mInt\u001b[39m] = \u001b[33mArrayBuffer\u001b[39m(\u001b[32m1\u001b[39m, \u001b[32m2\u001b[39m, \u001b[32m3\u001b[39m, \u001b[32m4\u001b[39m)\n", "\u001b[36mres3_5\u001b[39m: \u001b[32mArrayBuffer\u001b[39m[\u001b[32mInt\u001b[39m] = \u001b[33mArrayBuffer\u001b[39m(\u001b[32m1\u001b[39m, \u001b[32m2\u001b[39m, \u001b[32m3\u001b[39m, \u001b[32m4\u001b[39m)" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "// Scala\n", "import scala.collection.mutable.ArrayBuffer\n", "\n", "println(5.+(3).+(7))\n", "println(5+3+7)\n", "\n", "val values = ArrayBuffer(1,2)\n", "values.+=(3)\n", "values += 4\n", "print(values)" ] }, { "cell_type": "markdown", "metadata": { "kernel": "SoS" }, "source": [ "## Variables\n", "\n", "Allow you to store data and refer to it using names you choose in your code." ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "inputHidden": false, "kernel": "python3", "outputHidden": false }, 
"outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Eetu is an adult\n" ] } ], "source": [ "# Python\n", "name = \"Eetu\"\n", "age = 18\n", "\n", "if age>=18:\n", " print(name + \" is an adult\")\n", "else:\n", " print(name + \" is a child\")" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "inputHidden": false, "kernel": "ir", "outputHidden": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1] \"Eetu is an adult\"\n" ] } ], "source": [ "# R\n", "name <- \"Eetu\"\n", "age <- 18\n", "\n", "if (age>=18) {\n", " print(paste(name,\" is an adult\",sep=\"\"))\n", "} else {\n", " print(paste(name, \" is a child\",sep=\"\"))\n", "}" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "kernel": "scala" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Eetu is an adult\n" ] }, { "data": { "text/plain": [ "\u001b[36mname\u001b[39m: \u001b[32mString\u001b[39m = \u001b[32m\"Eetu\"\u001b[39m\n", "\u001b[36mage\u001b[39m: \u001b[32mInt\u001b[39m = \u001b[32m18\u001b[39m" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "// Scala\n", "val name = \"Eetu\"\n", "val age = 18\n", "\n", "if (age>=18)\n", " println(name + \" is an adult\")\n", "else\n", " println(name + \" is a child\")" ] }, { "cell_type": "markdown", "metadata": { "kernel": "SoS" }, "source": [ "## If/else\n", "\n", "Program flow control statement that lets you choose between alternative courses of action based on data. The conditions are checked from top to bottom and only the first one that holds is acted on, so more specific conditions (such as age>100) must come before more general ones (such as age>65)." ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "inputHidden": false, "kernel": "python3", "outputHidden": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Eetu is an adult\n" ] } ], "source": [ "# Python\n", "name = \"Eetu\"\n", "age = 18\n", "\n", "if age<18:\n", " print(name + \" is a child\")\n", "elif age>100:\n", " print(name + \" is ancient\")\n", "elif age>65:\n", " print(name + \" is old\")\n", "else:\n", 
" print(name + \" is an adult\")\n", " \n" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "inputHidden": false, "kernel": "ir", "outputHidden": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1] \"Eetu is an adult\"\n" ] } ], "source": [ "# R\n", "name <- \"Eetu\"\n", "age <- 18\n", "\n", "if (age<18) {\n", " print(paste(name, \"is a child\"))\n", "} else if (age>100) {\n", " print(paste(name, \"is ancient\"))\n", "} else if (age>65) {\n", " print(paste(name, \"is old\"))\n", "} else {\n", " print(paste(name,\"is an adult\"))\n", "}" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "kernel": "scala" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Eetu is an adult\n" ] }, { "data": { "text/plain": [ "\u001b[36mname\u001b[39m: \u001b[32mString\u001b[39m = \u001b[32m\"Eetu\"\u001b[39m\n", "\u001b[36mage\u001b[39m: \u001b[32mInt\u001b[39m = \u001b[32m18\u001b[39m" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "// Scala\n", "val name = \"Eetu\"\n", "val age = 18\n", "\n", "if (age<18)\n", " println(name + \" is a child\")\n", "else if (age>100)\n", " println(name + \" is ancient\")\n", "else if (age>65)\n", " println(name + \" is old\")\n", "else \n", " println(name + \" is an adult\")" ] }, { "cell_type": "markdown", "metadata": { "kernel": "scala" }, "source": [ "Some languages, such as Scala and R, have constructs that make certain if/else chains easier to write by matching a single value against several cases:" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "kernel": "scala" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Hello Batman\n" ] }, { "data": { "text/plain": [ "\u001b[36mname\u001b[39m: \u001b[32mString\u001b[39m = \u001b[32m\"Batman\"\u001b[39m" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "// Scala\n", "val name = \"Batman\"\n", "name match {\n", " case \"John\" => println(\"Hello 
Johnny\")\n", " case \"Bruce Wayne\" => println(\"Hello Batman\")\n", " case anyname => println(\"Hello \"+anyname)\n", "}" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "kernel": "ir" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1] \"Hello Batman\"\n" ] } ], "source": [ "# R\n", "name <- \"Batman\"\n", "switch(name,\n", " \"John\" = print(\"Hello Johnny\"),\n", " \"Bruce Wayne\" = print(\"Hello Batman\"),\n", " print(paste(\"Hello\",name))\n", ")" ] }, { "cell_type": "markdown", "metadata": { "kernel": "SoS" }, "source": [ "## While\n", "\n", "General flow control structure for doing something as long as a condition holds." ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "inputHidden": false, "kernel": "python3", "outputHidden": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "First age over 18 (age nr. 3): 19\n", "Average age: 36.0\n" ] } ], "source": [ "# Python\n", "ages = [ 15, 17, 19, 20, 55, 90 ]\n", "\n", "i = 0\n", "while (ages[i]<18): i+=1\n", "\n", "print(\"First age over 18 (age nr. 
\"+str(i+1)+\"): \"+str(ages[i]))\n", "\n", "i = 0\n", "agesum = 0\n", "while i \u001b[32m\" \"\u001b[39m,\n", " \u001b[32m\"&\"\u001b[39m -> \u001b[32m\"and\"\u001b[39m,\n", " \u001b[32m\"!\"\u001b[39m -> \u001b[32m\" \"\u001b[39m,\n", " \u001b[32m\",\"\u001b[39m -> \u001b[32m\" \"\u001b[39m,\n", " \u001b[32m\"'\"\u001b[39m -> \u001b[32m\"\"\u001b[39m,\n", " \u001b[32m\"?\"\u001b[39m -> \u001b[32m\" \"\u001b[39m\n", ")\n", "\u001b[36mtext\u001b[39m: \u001b[32mString\u001b[39m = \u001b[32m\"Where are we and I dont know \"\u001b[39m" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "// Scala\n", "val replacements = Map(\n", " \".\" -> \" \",\n", " \",\" -> \" \",\n", " \"!\" -> \" \",\n", " \"?\" -> \" \",\n", " \"'\" -> \"\",\n", " \"&\" -> \"and\" \n", ")\n", "\n", "// Here we're going over all the keys in the replacement dictionary and acting on them\n", "var text = \"Where are we? & I don't know!\"\n", "for ((key,replacement) <- replacements)\n", " text = text.replace(key, replacement)\n", "println(text)\n", "\n", "// You can also explicitly refer to a particular slot in a list or a key in a dictionary using parentheses (where Python uses square brackets):\n", "println(replacements(\"&\"))" ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "inputHidden": false, "kernel": "python3", "outputHidden": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "!\n", "['?', '!']\n" ] } ], "source": [ "# Python\n", "# Note that a dictionary can only contain one value for each key\n", "replacements = {\n", " \".\" : \"?\",\n", " \".\" : \"!\"\n", "}\n", "print(replacements[\".\"])\n", "\n", "# Therefore, if you need multiple values, you have to combine dictionaries with lists:\n", "replacements = {\n", " \".\" : [\"?\",\"!\"]\n", "}\n", "\n", "print(replacements[\".\"])\n" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "kernel": "scala" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ 
"!\n", "List(?, !)\n" ] }, { "data": { "text/plain": [ "\u001b[36mreplacements\u001b[39m: \u001b[32mMap\u001b[39m[\u001b[32mString\u001b[39m, \u001b[32mString\u001b[39m] = \u001b[33mMap\u001b[39m(\u001b[32m\".\"\u001b[39m -> \u001b[32m\"!\"\u001b[39m)\n", "\u001b[36mreplacements2\u001b[39m: \u001b[32mMap\u001b[39m[\u001b[32mString\u001b[39m, \u001b[32mSeq\u001b[39m[\u001b[32mString\u001b[39m]] = \u001b[33mMap\u001b[39m(\u001b[32m\".\"\u001b[39m -> \u001b[33mList\u001b[39m(\u001b[32m\"?\"\u001b[39m, \u001b[32m\"!\"\u001b[39m))" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "// Scala\n", "// Note that a dictionary can only contain one value for each key\n", "val replacements = Map(\n", " \".\" -> \"?\",\n", " \".\" -> \"!\"\n", ")\n", "println(replacements(\".\"))\n", "\n", "// Therefore, if you need multiple values, you have to combine dictionaries with lists:\n", "val replacements2 = Map(\n", " \".\" -> Seq(\"?\",\"!\")\n", ")\n", "\n", "println(replacements2(\".\"))\n" ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "inputHidden": false, "kernel": "python3", "outputHidden": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['Batman', 'Philanthropist']\n" ] } ], "source": [ "# Python\n", "# Here's some structured data stored in a combination of arrays and dictionaries:\n", "people = [\n", " { \n", " \"name\": \"Eetu\",\n", " \"age\": 18,\n", " \"jobs\": [ \"Researcher\", \"Lecturer\"]\n", " },\n", " {\n", " \"name\": \"Bruce Wayne\",\n", " \"age\": 65,\n", " \"jobs\": [ \"Batman\", \"Philanthropist\"]\n", " }\n", "]\n", "\n", "for person in people: \n", " if person[\"name\"] == \"Bruce Wayne\": \n", " print(person[\"jobs\"])" ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "kernel": "scala" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "List(Batman, Philanthropist)\n" ] }, { "data": { "text/plain": [ "\u001b[36mpeople\u001b[39m: 
\u001b[32mSeq\u001b[39m[\u001b[32mMap\u001b[39m[\u001b[32mString\u001b[39m, \u001b[32mAny\u001b[39m]] = \u001b[33mList\u001b[39m(\n", " \u001b[33mMap\u001b[39m(\u001b[32m\"name\"\u001b[39m -> \u001b[32m\"Eetu\"\u001b[39m, \u001b[32m\"age\"\u001b[39m -> \u001b[32m18\u001b[39m, \u001b[32m\"jobs\"\u001b[39m -> \u001b[33mList\u001b[39m(\u001b[32m\"Researcher\"\u001b[39m, \u001b[32m\"Lecturer\"\u001b[39m)),\n", " \u001b[33mMap\u001b[39m(\n", " \u001b[32m\"name\"\u001b[39m -> \u001b[32m\"Bruce Wayne\"\u001b[39m,\n", " \u001b[32m\"age\"\u001b[39m -> \u001b[32m65\u001b[39m,\n", " \u001b[32m\"jobs\"\u001b[39m -> \u001b[33mList\u001b[39m(\u001b[32m\"Batman\"\u001b[39m, \u001b[32m\"Philanthropist\"\u001b[39m)\n", " )\n", ")" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "// Scala\n", "// Here's some structured data stored in a combination of arrays and dictionaries:\n", "val people = Seq(\n", " Map( \n", " \"name\" -> \"Eetu\",\n", " \"age\" -> 18,\n", " \"jobs\" -> Seq(\"Researcher\", \"Lecturer\")\n", " ),\n", " Map(\n", " \"name\" -> \"Bruce Wayne\",\n", " \"age\" -> 65,\n", " \"jobs\" -> Seq(\"Batman\", \"Philanthropist\")\n", " )\n", ")\n", "\n", "for (person <- people)\n", " if (person(\"name\") == \"Bruce Wayne\")\n", " println(person(\"jobs\"))" ] } ], "metadata": { "kernel_info": { "name": "python3" }, "kernelspec": { "display_name": "SoS", "language": "sos", "name": "sos" }, "language_info": { "codemirror_mode": "sos", "file_extension": ".sos", "mimetype": "text/x-sos", "name": "sos", "nbconvert_exporter": "sos_notebook.converter.SoS_Exporter", "pygments_lexer": "sos" }, "nteract": { "version": "0.2.0" }, "sos": { "kernels": [ [ "python3", "python3", 