{ "nbformat": 4, "nbformat_minor": 0, "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.2" }, "colab": { "name": "L-Files.ipynb", "provenance": [] } }, "cells": [ { "cell_type": "markdown", "metadata": { "id": "uCq1VF5Rfrrq", "colab_type": "text" }, "source": [ "Files and Printing\n", "------------------\n", "\n", "** See also Examples 15, 16, and 17 from Learn Python the Hard Way**\n", "\n", "You'll often be reading data from a file, or writing the output of your python scripts back into a file. Python makes this very easy. You need to open a file in the appropriate mode, using the `open` function, then you can read or write to accomplish your task. The `open` function takes two arguments, the name of the file, and the mode. The mode is a single letter string that specifies if you're going to be reading from a file, writing to a file, or appending to the end of an existing file. The function returns a file object that performs the various tasks you'll be performing: `a_file = open(filename, mode)`. The modes are:\n", "\n", "+ `'r'`: open a file for reading\n", "+ `'w'`: open a file for writing. Caution: this will overwrite any previously existing file\n", "+ `'a'`: append. Write to the end of a file. \n", "\n", "When reading, you typically want to iterate through the lines in a file using a for loop, as above. Some other common methods for dealing with files are: \n", "\n", "+ `file.read()`: read the entire contents of a file into a string\n", "+ `file.write(some_string)`: writes to the file, note this doesn't automatically include any new lines. Also note that sometimes writes are buffered- python will wait until you have several writes pending, and perform them all at once\n", "+ `file.flush()`: write out any buffered writes\n", "+ `file.close()`: close the open file. This will free up some computer resources occupied by keeping a file open.\n", "\n", "Here is an example using files:" ] }, { "cell_type": "markdown", "metadata": { "id": "nnToHSzPfrrr", "colab_type": "text" }, "source": [ "#### Writing a file to disk" ] }, { "cell_type": "code", "metadata": { "id": "XMJU0m8zfrrs", "colab_type": "code", "colab": {} }, "source": [ "# Create the file temp.txt, and get it ready for writing\n", "f = open(\"temp.txt\", \"w\")\n", "f.write(\"This is my first file! The end!\\n\")\n", "f.write(\"Oh wait, I wanted to say something else.\")\n", "f.close()" ], "execution_count": null, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "X708y4U3frrx", "colab_type": "code", "colab": {} }, "source": [ "# Let's check that we did everything as expected\n", "!cat temp.txt" ], "execution_count": null, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "K7EZ63llfrr2", "colab_type": "code", "colab": {} }, "source": [ "# Create a file numbers.txt and write the numbers from 0 to 24 there\n", "f = open(\"numbers.txt\", \"w\")\n", "for num in range(25):\n", " f.write(str(num) + \"\\n\")\n", "f.close()" ], "execution_count": null, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "IVWmiM7hfrr7", "colab_type": "code", "colab": {} }, "source": [ "# Let's check that we did everything as expected\n", "!cat numbers.txt" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "bNR9ei8Kfrr-", "colab_type": "text" }, "source": [ "#### Reading a file from disk" ] }, { "cell_type": "code", "metadata": { "id": "8G7tkoVSfrr_", "colab_type": "code", "colab": {} }, "source": [ "# We now open the file for reading\n", "f = open(\"temp.txt\", \"r\")\n", "# And we read the full content of the file in memory, as a big string\n", "content = f.read()\n", "f.close()" ], "execution_count": null, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "Zwu-lAFwfrsC", "colab_type": "code", "colab": {} }, "source": [ "content" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "7EU23URLfrsG", "colab_type": "text" }, "source": [ "Once we read the file, we have the lines in a big string. Let's process that big string a little bit:" ] }, { "cell_type": "code", "metadata": { "id": "XXKxGsPrfrsH", "colab_type": "code", "colab": {} }, "source": [ "# Read the file in the cell above, the content is in f2_content\n", "\n", "# Split the content of the file using the newline character \\n\n", "lines = content.split(\"\\n\")\n", "\n", "# Iterate through the line variable (it is a list of strings)\n", "# and then print the length of each line\n", "for line in lines:\n", " print(line, \" ===> \", len(line))" ], "execution_count": null, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "dQCnaOYEfrsL", "colab_type": "code", "colab": {} }, "source": [ "# We now open the file for reading\n", "f = open(\"numbers.txt\", \"r\")\n", "# And we read the full content of the file in memory, as a big string\n", "content = f.read()\n", "f.close()\n", "content" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "pe3Cu2ndfrsP", "colab_type": "text" }, "source": [ "Once we read the file, we have the lines in a big string. Let's process that big string a little bit:" ] }, { "cell_type": "code", "metadata": { "id": "SVyFDWRFfrsQ", "colab_type": "code", "colab": {} }, "source": [ "lines = content.split(\"\\n\") # we get back a list of strings\n", "print(lines)" ], "execution_count": null, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "duemKWYnfrsU", "colab_type": "code", "colab": {} }, "source": [ "# here we convert the strings into integers, using a list comprehension\n", "# we have the conditional to avoid trying to parse the string '' that\n", "# is at the end of the list\n", "numbers = [int(line) for line in lines if len(line) > 0]\n", "print(numbers)" ], "execution_count": null, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "lhQ2YC7kfrsY", "colab_type": "code", "colab": {} }, "source": [ "# Let's clean up\n", "!rm temp.txt\n", "!rm numbers.txt" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "0rE8byENfrsb", "colab_type": "text" }, "source": [ "#### Exercise 1\n", "\n", "* Write a function that reads a file and returns its content as a list of strings (one string per line). Read the file with filename `restaurant-names.txt`. (The `curl` command below will download the file from the GitHub repository and store it locally. Please execute the `curl` command before proceeding with attempting to read the file.)" ] }, { "cell_type": "code", "metadata": { "id": "apEkHdjNft1U", "colab_type": "code", "colab": {} }, "source": [ "!curl https://raw.githubusercontent.com/ipeirotis/introduction-to-python/master/data/restaurant-names.txt -o restaurant-names.txt" ], "execution_count": null, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "nLGCx9iQfrsc", "colab_type": "code", "colab": {} }, "source": [ "" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "4ZSBpRn_frsf", "colab_type": "text" }, "source": [ "#### Exercise 2\n", "\n", "* Write a function that reads the n-th column of a CSV file and returns its contents. (Reuse the function that you wrote above.) Then reads the file `baseball.csv` and return the content of the 5th column (`team`). (Again, remember to execute the `curl` command before proceeding.)" ] }, { "cell_type": "code", "metadata": { "id": "4i8BJuAXgaky", "colab_type": "code", "colab": {} }, "source": [ "!curl https://raw.githubusercontent.com/ipeirotis/introduction-to-python/master/data/baseball.csv -o baseball.csv" ], "execution_count": null, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "WiMgoceYfrsg", "colab_type": "code", "colab": {} }, "source": [ "" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "RNorYXOSfrsl", "colab_type": "text" }, "source": [ "#### Exercise 3 \n", "\n", "Write code that:\n", "* Reads the file `phonetest.txt`\n", "* Write a function that takes as input a string, and removes any non-digit characters\n", "* Print out the \"clean\" string, without any non-digit characters\n", "\n", "(Again, remember to execute the curl command before proceeding.)" ] }, { "cell_type": "code", "metadata": { "id": "TVapnArufrsl", "colab_type": "code", "colab": {} }, "source": [ "!curl https://raw.githubusercontent.com/ipeirotis/introduction-to-python/master/data/phonetest.txt -o phonetest.txt" ], "execution_count": null, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "p8w8f6nWfrsp", "colab_type": "code", "colab": {} }, "source": [ "" ], "execution_count": null, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "mmMP5cExfrss", "colab_type": "code", "colab": {} }, "source": [ "" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "solution2": "hidden", "solution2_first": true, "id": "6HaVeqb_frsv", "colab_type": "text" }, "source": [ "#### Solution for exercise 3 (with a lot of comments)" ] }, { "cell_type": "code", "metadata": { "solution2": "hidden", "id": "YWUj194xfrsw", "colab_type": "code", "colab": {} }, "source": [ "# this function takes as input a phone (string variable)\n", "# and prints only its digits\n", "def clean(phone):\n", " # We initialize the result variable to be empty.\n", " # We will append to this variable the digit characters\n", " result = \"\"\n", " # This is a set of digits (as **strings**) that will\n", " # allow us to filter the characters\n", " digits = {\"0\", \"1\", \"2\", \"3\", \"4\", \"5\", \"6\", \"7\", \"8\", \"9\"}\n", " # We iterate over all the characters in the string \"phone\"\n", " # which is a parameter of the function clean\n", " for c in phone:\n", " # We check if the character c is a digit\n", " if c in digits:\n", " # if it is, we append it to the result\n", " result = result + c\n", " # once we are done we return a string variable with the result\n", " return result\n", "\n", "\n", "# This is an alternative, one-line solution that uses a list\n", "# comprehension to create the list of acceptable characters,\n", "# and then uses the join command to concatenate all the\n", "# characters in the list into a string. Notice that we use\n", "# the empty string \"\" as the connector\n", "def clean_oneline(phone):\n", " digits = {\"0\", \"1\", \"2\", \"3\", \"4\", \"5\", \"6\", \"7\", \"8\", \"9\"}\n", " return \"\".join([c for c in phone if c in digits])\n", "\n", "\n", "# your code here\n", "# We open the file\n", "f = open(\"../data/phonetest.txt\", \"r\")\n", "# We read the content using the f.read() command\n", "content = f.read()\n", "# Close the file\n", "f.close()\n", "# We split the file into lines\n", "lines = content.split(\"\\n\")\n", "# We iterate over the lines, and we clean each one of them\n", "for line in lines:\n", " print(line, \"==>\", clean(line))" ], "execution_count": null, "outputs": [] } ] }