{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.8.2"
    },
    "colab": {
      "name": "L-Files.ipynb",
      "provenance": []
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "uCq1VF5Rfrrq",
        "colab_type": "text"
      },
      "source": [
        "Files and Printing\n",
        "------------------\n",
        "\n",
        "**See also Examples 15, 16, and 17 from Learn Python the Hard Way**\n",
        "\n",
        "You will often read data from a file, or write the output of your Python scripts to a file. Python makes this easy: open the file in the appropriate mode with the `open` function, then read or write as needed. Here we pass `open` two arguments: the name of the file and the mode. The mode is a short string that specifies whether you are going to read from the file, write to it, or append to the end of an existing file; if you omit it, the file is opened for reading. The function returns a file object that you use to perform these tasks: `a_file = open(filename, mode)`. The most common modes are:\n",
        "\n",
        "+ `'r'`: open a file for reading\n",
        "+ `'w'`: open a file for writing. Caution: this will overwrite any previously existing file\n",
        "+ `'a'`: append. Write to the end of a file\n",
        "\n",
        "When reading, you typically iterate through the lines of a file using a `for` loop (`for line in a_file:`). Some other common methods of file objects are:\n",
        "\n",
        "+ `file.read()`: read the entire contents of a file into a string\n",
        "+ `file.write(some_string)`: write a string to the file. Note that this does not add a newline automatically. Also note that writes are sometimes buffered: Python may wait until several writes are pending and then perform them all at once\n",
        "+ `file.flush()`: force any buffered writes out to the file\n",
        "+ `file.close()`: close the open file. This flushes any pending writes and frees the system resources held by the open file.\n",
        "\n",
        "Here is an example using files:"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "nnToHSzPfrrr",
        "colab_type": "text"
      },
      "source": [
        "#### Writing a file to disk"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "XMJU0m8zfrrs",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# Create the file temp.txt, and get it ready for writing\n",
        "f = open(\"temp.txt\", \"w\")\n",
        "f.write(\"This is my first file! The end!\\n\")\n",
        "f.write(\"Oh wait, I wanted to say something else.\")\n",
        "f.close()"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "X708y4U3frrx",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# Let's check that we did everything as expected\n",
        "!cat temp.txt"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "K7EZ63llfrr2",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# Create a file numbers.txt and write the numbers from 0 to 24 there\n",
        "f = open(\"numbers.txt\", \"w\")\n",
        "for num in range(25):\n",
        "    f.write(str(num) + \"\\n\")\n",
        "f.close()"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "IVWmiM7hfrr7",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# Let's check that we did everything as expected\n",
        "!cat numbers.txt"
      ],
      "execution_count": null,
      "outputs": []
    },
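    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text"
      },
      "source": [
        "The two examples above both use the `'w'` mode. The cell below is a minimal sketch of the `'a'` (append) mode; it is an illustration rather than part of the original examples, and the filename `append_demo.txt` is a hypothetical name chosen for the demonstration. It writes one line, reopens the file in append mode to add a second line, and then reads the file back to show that the first line was kept."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "import os\n",
        "\n",
        "# Hypothetical filename, used only for this illustration\n",
        "filename = \"append_demo.txt\"\n",
        "\n",
        "# \"w\" creates the file (or overwrites an existing one)\n",
        "f = open(filename, \"w\")\n",
        "f.write(\"first line\\n\")\n",
        "f.close()\n",
        "\n",
        "# \"a\" keeps the existing content and writes at the end of the file\n",
        "f = open(filename, \"a\")\n",
        "f.write(\"second line\\n\")\n",
        "f.close()\n",
        "\n",
        "# Read the file back to confirm that both lines are there\n",
        "f = open(filename, \"r\")\n",
        "print(f.read())\n",
        "f.close()\n",
        "\n",
        "# Remove the demo file so that it does not linger on disk\n",
        "os.remove(filename)"
      ],
      "execution_count": null,
      "outputs": []
    },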
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "bNR9ei8Kfrr-",
        "colab_type": "text"
      },
      "source": [
        "#### Reading a file from disk"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "8G7tkoVSfrr_",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# We now open the file for reading\n",
        "f = open(\"temp.txt\", \"r\")\n",
        "# And we read the full content of the file in memory, as a big string\n",
        "content = f.read()\n",
        "f.close()"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "Zwu-lAFwfrsC",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "content"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "7EU23URLfrsG",
        "colab_type": "text"
      },
      "source": [
        "Once we read the file, we have the lines in a big string. Let's process that big string a little bit:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "XXKxGsPrfrsH",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# The file was read in the cell above; its content is in the variable content\n",
        "\n",
        "# Split the content of the file using the newline character \\n\n",
        "lines = content.split(\"\\n\")\n",
        "\n",
        "# Iterate through the lines variable (it is a list of strings)\n",
        "# and then print the length of each line\n",
        "for line in lines:\n",
        "    print(line, \" ===> \", len(line))"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "dQCnaOYEfrsL",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# We now open the file for reading\n",
        "f = open(\"numbers.txt\", \"r\")\n",
        "# And we read the full content of the file in memory, as a big string\n",
        "content = f.read()\n",
        "f.close()\n",
        "content"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "pe3Cu2ndfrsP",
        "colab_type": "text"
      },
      "source": [
        "Once we read the file, we have the lines in a big string. Let's process that big string a little bit:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "SVyFDWRFfrsQ",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "lines = content.split(\"\\n\")  # we get back a list of strings\n",
        "print(lines)"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "duemKWYnfrsU",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# here we convert the strings into integers, using a list comprehension\n",
        "# we have the conditional to avoid trying to parse the string '' that\n",
        "# is at the end of the list\n",
        "numbers = [int(line) for line in lines if len(line) > 0]\n",
        "print(numbers)"
      ],
      "execution_count": null,
      "outputs": []
    },
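    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text"
      },
      "source": [
        "The cells above call `f.close()` explicitly. A common alternative is the `with` statement, which closes the file automatically when the indented block ends, even if an error occurs inside it. The cell below is a minimal sketch of that pattern, not part of the original examples; the filename `with_demo.txt` is a hypothetical name chosen for illustration. It also reads the file one line at a time with a `for` loop, instead of loading everything with `read()` and splitting afterwards."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "import os\n",
        "\n",
        "# Hypothetical filename, used only for this illustration\n",
        "filename = \"with_demo.txt\"\n",
        "\n",
        "# Write three lines; the file is closed automatically at the end of the block\n",
        "with open(filename, \"w\") as f:\n",
        "    for num in range(3):\n",
        "        f.write(str(num) + \"\\n\")\n",
        "\n",
        "# Read the file back one line at a time\n",
        "with open(filename, \"r\") as f:\n",
        "    for line in f:\n",
        "        # Each line still ends with a newline, so we strip it before converting\n",
        "        print(int(line.strip()))\n",
        "\n",
        "# The file object is already closed once the with block has finished\n",
        "print(f.closed)\n",
        "\n",
        "# Remove the demo file so that it does not linger on disk\n",
        "os.remove(filename)"
      ],
      "execution_count": null,
      "outputs": []
    },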
    {
      "cell_type": "code",
      "metadata": {
        "id": "lhQ2YC7kfrsY",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# Let's clean up\n",
        "!rm temp.txt\n",
        "!rm numbers.txt"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "0rE8byENfrsb",
        "colab_type": "text"
      },
      "source": [
        "#### Exercise 1\n",
        "\n",
        "* Write a function that reads a file and returns its content as a list of strings (one string per line). Read the file with filename `restaurant-names.txt`. (The `curl` command below will download the file from the GitHub repository and store it locally. Please execute the `curl` command before proceeding with attempting to read the file.)"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "apEkHdjNft1U",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "!curl https://raw.githubusercontent.com/ipeirotis/introduction-to-python/master/data/restaurant-names.txt -o restaurant-names.txt"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "nLGCx9iQfrsc",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        ""
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "4ZSBpRn_frsf",
        "colab_type": "text"
      },
      "source": [
        "#### Exercise 2\n",
        "\n",
        "* Write a function that reads the n-th column of a CSV file and returns its contents. (Reuse the function that you wrote above.) Then use it to read the file `baseball.csv` and return the contents of the 5th column (`team`). (Again, remember to execute the `curl` command before proceeding.)"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "4i8BJuAXgaky",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "!curl https://raw.githubusercontent.com/ipeirotis/introduction-to-python/master/data/baseball.csv -o baseball.csv"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "WiMgoceYfrsg",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        ""
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "RNorYXOSfrsl",
        "colab_type": "text"
      },
      "source": [
        "#### Exercise 3\n",
        "\n",
        "Write code that:\n",
        "* Reads the file `phonetest.txt`\n",
        "* Defines a function that takes a string as input and removes any non-digit characters\n",
        "* Prints the \"clean\" version of each line, without any non-digit characters\n",
        "\n",
        "(Again, remember to execute the curl command before proceeding.)"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "TVapnArufrsl",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "!curl https://raw.githubusercontent.com/ipeirotis/introduction-to-python/master/data/phonetest.txt -o phonetest.txt"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "p8w8f6nWfrsp",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        ""
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "mmMP5cExfrss",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        ""
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "solution2": "hidden",
        "solution2_first": true,
        "id": "6HaVeqb_frsv",
        "colab_type": "text"
      },
      "source": [
        "#### Solution for exercise 3 (with a lot of comments)"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "solution2": "hidden",
        "id": "YWUj194xfrsw",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# This function takes as input a phone number (string variable)\n",
        "# and returns a string that contains only its digits\n",
        "def clean(phone):\n",
        "    # We initialize the result variable to be empty.\n",
        "    # We will append to this variable the digit characters\n",
        "    result = \"\"\n",
        "    # This is a set of digits (as **strings**) that will\n",
        "    # allow us to filter the characters\n",
        "    digits = {\"0\", \"1\", \"2\", \"3\", \"4\", \"5\", \"6\", \"7\", \"8\", \"9\"}\n",
        "    # We iterate over all the characters in the string \"phone\"\n",
        "    # which is a parameter of the function clean\n",
        "    for c in phone:\n",
        "        # We check if the character c is a digit\n",
        "        if c in digits:\n",
        "            # if it is, we append it to the result\n",
        "            result = result + c\n",
        "    # once we are done we return a string variable with the result\n",
        "    return result\n",
        "\n",
        "\n",
        "# This is an alternative, one-line solution that uses a list\n",
        "# comprehension to create the list of acceptable characters,\n",
        "# and then uses the join command to concatenate all the\n",
        "# characters in the list into a string. Notice that we use\n",
        "# the empty string \"\" as the connector\n",
        "def clean_oneline(phone):\n",
        "    digits = {\"0\", \"1\", \"2\", \"3\", \"4\", \"5\", \"6\", \"7\", \"8\", \"9\"}\n",
        "    return \"\".join([c for c in phone if c in digits])\n",
        "\n",
        "\n",
        "# We open the file that the curl command above saved locally\n",
        "f = open(\"phonetest.txt\", \"r\")\n",
        "# We read the content using the f.read() command\n",
        "content = f.read()\n",
        "# Close the file\n",
        "f.close()\n",
        "# We split the file into lines\n",
        "lines = content.split(\"\\n\")\n",
        "# We iterate over the lines, and we clean each one of them\n",
        "for line in lines:\n",
        "    print(line, \"==>\", clean(line))"
      ],
      "execution_count": null,
      "outputs": []
    }
  ]
}