Files and Printing
------------------

**See also Examples 15, 16, and 17 from Learn Python the Hard Way**

You'll often read data from a file, or write the output of your Python scripts back to a file. Python makes this easy: open the file in the appropriate mode with the `open` function, then read or write to accomplish your task. In its basic form, the `open` function takes two arguments: the name of the file and the mode. The mode is a short string that specifies whether you are going to read from the file, write to it, or append to the end of an existing file (if you omit it, the file is opened for reading). The function returns a file object that you use for all subsequent operations on the file: `a_file = open(filename, mode)`. The modes are:

+ `'r'`: open a file for reading
+ `'w'`: open a file for writing. Caution: this will overwrite any previously existing file
+ `'a'`: append. Write to the end of a file. 
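
To make the difference between the three modes concrete, here is a minimal sketch. The file name `modes_demo.txt` and the temporary directory are assumptions made only for this illustration, so that nothing in your working directory is touched.

```python
import os
import tempfile

# Hypothetical scratch file in a temporary directory (illustration only)
path = os.path.join(tempfile.mkdtemp(), "modes_demo.txt")

f = open(path, "w")  # 'w' creates the file, or empties it if it already exists
f.write("first\n")
f.close()

f = open(path, "a")  # 'a' keeps the existing content and writes at the end
f.write("second\n")
f.close()

f = open(path, "r")  # 'r' opens the file for reading
after_append = f.read()
f.close()

f = open(path, "w")  # opening with 'w' again discards the earlier content
f.write("third\n")
f.close()

f = open(path, "r")
after_overwrite = f.read()
f.close()

print(repr(after_append))     # both lines are present after appending
print(repr(after_overwrite))  # only the last write survives the second 'w'
```

Notice that the second `open(path, "w")` silently discards the two earlier lines, which is why the caution for `'w'` matters.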

When reading, you typically want to iterate through the lines in a file using a for loop (`for line in file:`). Some other common methods for dealing with files are:

+ `file.read()`: read the entire contents of a file into a string
+ `file.write(some_string)`: write the string to the file. Note that this does not automatically add a newline. Also note that writes are sometimes buffered: Python may wait until several writes are pending and then perform them all at once
+ `file.flush()`: write out any buffered writes
+ `file.close()`: close the open file. This will free up some computer resources occupied by keeping a file open.
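
The for-loop style of reading mentioned above can be sketched as follows. The file name `lines_demo.txt` is a hypothetical placeholder, and the file is created in a temporary directory so the example is self-contained.

```python
import os
import tempfile

# Hypothetical scratch file (illustration only)
path = os.path.join(tempfile.mkdtemp(), "lines_demo.txt")

f = open(path, "w")
f.write("alpha\n")  # write() does not add a newline, so we add it ourselves
f.write("beta\n")
f.flush()           # push any buffered writes out now
f.close()           # close() also flushes, then releases the file

f = open(path, "r")
stripped = []
for line in f:      # a file object yields one line per iteration
    # each line still ends with "\n", so we strip it before storing
    stripped.append(line.rstrip("\n"))
f.close()

print(stripped)
```

Iterating over the file object reads one line at a time, so it also works for files that are too large to load comfortably with `file.read()`.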

Here is an example using files:

#### Writing a file to disk

In [None]:
# Create the file temp.txt, and get it ready for writing
f = open("temp.txt", "w")
f.write("This is my first file! The end!\n")
f.write("Oh wait, I wanted to say something else.")
f.close()

In [None]:
# Let's check that we did everything as expected
!cat temp.txt

In [None]:
# Create a file numbers.txt and write the numbers from 0 to 24 there
f = open("numbers.txt", "w")
for num in range(25):
    f.write(str(num) + "\n")
f.close()

In [None]:
# Let's check that we did everything as expected
!cat numbers.txt

#### Reading a file from disk

In [None]:
# We now open the file for reading
f = open("temp.txt", "r")
# And we read the full content of the file in memory, as a big string
content = f.read()
f.close()

In [None]:
content

Once we read the file, we have the lines in a big string. Let's process that big string a little bit:

In [None]:
# We read the file in the cell above; its content is in the variable content

# Split the content of the file using the newline character \n
lines = content.split("\n")

# Iterate through the lines variable (it is a list of strings)
# and print each line together with its length
for line in lines:
    print(line, " ===> ", len(line))

In [None]:
# We now open the file for reading
f = open("numbers.txt", "r")
# And we read the full content of the file in memory, as a big string
content = f.read()
f.close()
content

Once we read the file, we have the lines in a big string. Let's process that big string a little bit:

In [None]:
lines = content.split("\n") # we get back a list of strings
print(lines)

In [None]:
# here we convert the strings into integers, using a list comprehension
# we have the conditional to avoid trying to parse the string '' that
# is at the end of the list
numbers = [int(line) for line in lines if len(line) > 0]
print(numbers)
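
As a possible alternative to filtering out the empty string, the string method `splitlines()` does not produce a trailing empty entry when the text ends with a newline. The sketch below uses a short hard-coded string with the same shape as `numbers.txt` (an assumption made to keep the example self-contained) rather than reading the file.

```python
# Same shape as the content of numbers.txt, shortened for illustration
content = "0\n1\n2\n"

# split("\n") leaves an empty string after the final newline
with_split = content.split("\n")
print(with_split)

# splitlines() treats the final newline as a line ending, not a separator
with_splitlines = content.splitlines()
print(with_splitlines)

# No conditional is needed now, because there is no empty string to skip
numbers = [int(line) for line in with_splitlines]
print(numbers)
```

Either approach works here; the conditional in the list comprehension is the safer choice if the file may also contain blank lines in the middle.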

In [None]:
# Let's clean up
!rm temp.txt
!rm numbers.txt

#### Exercise 1

* Write a function that reads a file and returns its content as a list of strings (one string per line). Read the file with filename `restaurant-names.txt`. (The `curl` command below will download the file from the GitHub repository and store it locally. Please execute the `curl` command before proceeding with attempting to read the file.)

In [None]:
!curl https://raw.githubusercontent.com/ipeirotis/introduction-to-python/master/data/restaurant-names.txt -o restaurant-names.txt

#### Exercise 2

* Write a function that reads the n-th column of a CSV file and returns its contents. (Reuse the function that you wrote above.) Then read the file `baseball.csv` and return the contents of the 5th column (`team`). (Again, remember to execute the `curl` command before proceeding.)

In [None]:
!curl https://raw.githubusercontent.com/ipeirotis/introduction-to-python/master/data/baseball.csv -o baseball.csv

#### Exercise 3 

Write code that:
* Reads the file `phonetest.txt`
* Defines a function that takes a string as input and removes any non-digit characters
* Prints out the "clean" version of each line, without any non-digit characters

(Again, remember to execute the `curl` command before proceeding.)

In [None]:
!curl https://raw.githubusercontent.com/ipeirotis/introduction-to-python/master/data/phonetest.txt -o phonetest.txt

#### Solution for exercise 3 (with a lot of comments)

In [None]:
# This function takes as input a phone (string variable)
# and returns a string that contains only its digits
def clean(phone):
    # We initialize the result variable to be empty.
    # We will append to this variable the digit characters
    result = ""
    # This is a set of digits (as **strings**) that will
    # allow us to filter the characters
    digits = {"0", "1", "2", "3", "4", "5", "6", "7", "8", "9"}
    # We iterate over all the characters in the string "phone"
    # which is a parameter of the function clean
    for c in phone:
        # We check if the character c is a digit
        if c in digits:
            # if it is, we append it to the result
            result = result + c
    # once we are done we return a string variable with the result
    return result


# This is an alternative, one-line solution that uses a list
# comprehension to create the list of acceptable characters,
# and then uses the join command to concatenate all the
# characters in the list into a string. Notice that we use
# the empty string "" as the connector
def clean_oneline(phone):
    digits = {"0", "1", "2", "3", "4", "5", "6", "7", "8", "9"}
    return "".join([c for c in phone if c in digits])


# We open the file (the curl command above saved it as phonetest.txt
# in the current directory)
f = open("phonetest.txt", "r")
# We read the content using the f.read() command
content = f.read()
# Close the file
f.close()
# We split the file into lines
lines = content.split("\n")
# We iterate over the lines, and we clean each one of them
for line in lines:
    print(line, "==>", clean(line))