# Files Input/Output
<a href="https://colab.research.google.com/github/rambasnet/FDSPython-Notebooks/blob/master/Ch10-Files.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

- data is usually stored on a secondary storage medium such as a hard drive, flash drive, or CD-RW, in named locations called files
- files can be organized into folders
- programs often need to read data from files and save data back to files for long-term storage
- this chapter demonstrates how to read and write plain text files
- use the built-in open() function to work with files
```python
fileio = open(fileName, mode='r')
```
- open() lets you open a file in different modes: read (default), write, append, etc.
- see help(open) for details
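
As a rough sketch of the most common modes, the snippet below writes a small file, appends to it, and reads it back. The file name `modes_demo.txt` and the temporary folder are assumptions made for illustration; they are not part of the chapter's files.

```python
import os
import tempfile

# hypothetical file in a temporary folder, used only for this demo
path = os.path.join(tempfile.mkdtemp(), 'modes_demo.txt')

with open(path, 'w') as f:   # 'w' creates the file or truncates an existing one
    f.write('first\n')

with open(path, 'a') as f:   # 'a' keeps existing content and adds to the end
    f.write('second\n')

with open(path) as f:        # mode defaults to 'r' (read text)
    content = f.read()

print(content)

try:
    open(path, 'x')          # 'x' refuses to overwrite an existing file
except FileExistsError:
    print('file already exists')
```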

In [1]:
help(open)

Help on built-in function open in module io:

open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)
    Open file and return a stream.  Raise OSError upon failure.
    
    file is either a text or byte string giving the name (and the path
    if the file isn't in the current working directory) of the file to
    be opened or an integer file descriptor of the file to be
    wrapped. (If a file descriptor is given, it is closed when the
    returned I/O object is closed, unless closefd is set to False.)
    
    mode is an optional string that specifies the mode in which the file
    is opened. It defaults to 'r' which means open for reading in text
    mode.  Other common values are 'w' for writing (truncating the file if
    it already exists), 'x' for creating and writing to a new file, and
    'a' for appending (which on some Unix systems, means that all writes
    append to the end of the file regardless of the current seek position

## write text data to a file
3-step process

1. open the file by name in write ('w') or append ('a') mode
2. write data
3. close the file

In [2]:
# old school - not preferred!!
fw = open('test1.txt', 'w') # w is for write mode
fw.write('words\n=====\n')
fw.write('apple\nball\ncat\ndog\n')
print(fw.write('zebra\n'))
fw.close() # must close the file to make sure the data is actually written
# to the secondary storage

6


In [3]:
help(fw)

Help on TextIOWrapper object:

class TextIOWrapper(_TextIOBase)
 |  TextIOWrapper(buffer, encoding=None, errors=None, newline=None, line_buffering=False, write_through=False)
 |  
 |  Character and line based layer over a BufferedIOBase object, buffer.
 |  
 |  encoding gives the name of the encoding that the stream will be
 |  decoded or encoded with. It defaults to locale.getpreferredencoding(False).
 |  
 |  errors determines the strictness of encoding and decoding (see
 |  help(codecs.Codec) or the documentation for codecs.register) and
 |  defaults to "strict".
 |  
 |  newline controls how line endings are handled. It can be None, '',
 |  '\n', '\r', and '\r\n'.  It works as follows:
 |  
 |  * On input, if newline is None, universal newlines mode is
 |    enabled. Lines in the input can end in '\n', '\r', or '\r\n', and
 |    these are translated into '\n' before being returned to the
 |    caller. If it is '', universal newline mode is enabled, but line
 |    endings are return

In [4]:
# newer and better syntax - preferred way!!
alist = [1, 2, 3]
with open('words.txt', 'w') as fout:
    fout.write('apple\nball\ncat\ndog\n')
    fout.write('elephant\n')
    fout.write('zebra\n')
    fout.write(str(1))
    fout.write('\n')
    fout.write(str(alist))
    

# file will be automatically closed when with block is finished executing
# fout.write('test\n') # nothing is written because the file is closed; raises ValueError (I/O operation on closed file)
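
A minimal sketch of what happens once a with block ends: the file object reports itself as closed, and a later write fails with a ValueError. The file name `closed_demo.txt` is hypothetical.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'closed_demo.txt')  # hypothetical file

with open(path, 'w') as fout:
    fout.write('apple\n')
    print(fout.closed)   # False: still open inside the with block

print(fout.closed)       # True: closed automatically when the block ended

error = None
try:
    fout.write('test\n')
except ValueError as e:  # writing to a closed file raises ValueError
    error = e
    print('ValueError:', e)
```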

## read text data from a file
1. open the file by name; the path can be relative or absolute
2. read in various ways: one line at a time, all lines, bytes, the whole file, etc.
3. use the data
4. close the file

### various ways to read data
1. read(size=-1) : read at most size characters from the stream; if size is negative or omitted, read until EOF (End of File)
2. readline() : read until a newline or EOF
3. readlines() : read and return a list of lines from the input file
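
The three methods can be combined on one open file, since each call continues from where the previous one stopped. The sketch below assumes a small hypothetical file, `read_demo.txt`, created just for this demo.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'read_demo.txt')  # hypothetical file
with open(path, 'w') as f:
    f.write('apple\nball\ncat\n')

with open(path) as f:
    first_three = f.read(3)      # at most 3 characters: 'app'
    rest_of_line = f.readline()  # continues to the end of the line: 'le\n'
    remaining = f.readlines()    # the lines that are left: ['ball\n', 'cat\n']
    at_eof = f.read()            # nothing left, so an empty string: ''

print(repr(first_three), repr(rest_of_line), remaining, repr(at_eof))
```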

In [5]:
# read whole file as list of lines
fr = open('words.txt', 'r') # 'r' or read mode by default; file must exist
data = fr.readlines()
fr.close()

In [6]:
data[0].strip()

'apple'

In [7]:
with open('words.txt', 'r') as fr:
    data = fr.readlines()

In [8]:
help(fr)

Help on TextIOWrapper object:

class TextIOWrapper(_TextIOBase)
 |  TextIOWrapper(buffer, encoding=None, errors=None, newline=None, line_buffering=False, write_through=False)
 |  
 |  Character and line based layer over a BufferedIOBase object, buffer.
 |  
 |  encoding gives the name of the encoding that the stream will be
 |  decoded or encoded with. It defaults to locale.getpreferredencoding(False).
 |  
 |  errors determines the strictness of encoding and decoding (see
 |  help(codecs.Codec) or the documentation for codecs.register) and
 |  defaults to "strict".
 |  
 |  newline controls how line endings are handled. It can be None, '',
 |  '\n', '\r', and '\r\n'.  It works as follows:
 |  
 |  * On input, if newline is None, universal newlines mode is
 |    enabled. Lines in the input can end in '\n', '\r', or '\r\n', and
 |    these are translated into '\n' before being returned to the
 |    caller. If it is '', universal newline mode is enabled, but line
 |    endings are return

In [9]:
data

['apple\n',
 'ball\n',
 'cat\n',
 'dog\n',
 'elephant\n',
 'zebra\n',
 '1\n',
 '[1, 2, 3]']

In [10]:
for el in data:
    print(el.strip())

apple
ball
cat
dog
elephant
zebra
1
[1, 2, 3]


In [11]:
data.sort()
    

In [12]:
data

['1\n',
 '[1, 2, 3]',
 'apple\n',
 'ball\n',
 'cat\n',
 'dog\n',
 'elephant\n',
 'zebra\n']

In [13]:
with open('words1.txt', 'w') as newFile:
    for word in data:
        # '[1, 2, 3]' was read without a trailing newline, so normalize every line
        newFile.write(word.strip() + '\n')

## read data line by line
- let's create a file with about 10 integers one per line
- then, read the integers line by line into a list of integers

In [14]:
# create a file with 10 integers
# one integer per line
import random
with open('integers.txt', 'w') as fout: # 'w' starts fresh on every run; 'a' would add 10 more integers each time
    for i in range(10):
        num = random.randint(1, 1000)
        fout.write(str(num) + '\n')

In [15]:
# read the integers line by line into a list
intList = []
with open('integers.txt', 'r') as fin:
    while True:
        num = fin.readline()
        num = num.strip() # strip \n
        if not num:
            break
        print('num = ', num, type(num))
        intList.append(int(num))

num =  462 <class 'str'>
num =  298 <class 'str'>
num =  188 <class 'str'>
num =  560 <class 'str'>
num =  431 <class 'str'>
num =  279 <class 'str'>
num =  488 <class 'str'>
num =  173 <class 'str'>
num =  160 <class 'str'>
num =  502 <class 'str'>


In [16]:
print(intList)

[462, 298, 188, 560, 431, 279, 488, 173, 160, 502]
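
A file object can also be iterated directly, which yields one line per loop step and avoids the explicit readline() and break. This is a sketch of that alternative, not a replacement for the cell above; the file `integers_demo.txt` and its contents are hypothetical.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'integers_demo.txt')  # hypothetical file
with open(path, 'w') as fout:
    fout.write('462\n298\n\n188\n')   # includes one blank line on purpose

int_list = []
with open(path) as fin:
    for line in fin:            # the file object yields one line at a time
        line = line.strip()
        if line:                # skip blank lines instead of stopping at them
            int_list.append(int(line))

print(int_list)  # [462, 298, 188]
```

Unlike the while loop, this version keeps reading past a blank line in the middle of the file.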


## reading the whole file at once
- read /usr/share/dict/words file on linux/mac
- windows path might be "C:/temp/words.txt" or "c:\\temp\\words.txt"
- if the file doesn't exist, use provided words.txt file or create a text file with a bunch of words in it using an editor
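
One way to handle the missing-file case is to check a short list of candidate paths before opening anything. This is only a sketch: the candidate list and the `utf-8` encoding are assumptions, and the snippet prints a message instead of failing when neither file is present.

```python
import os

# candidate locations; the second entry is the fallback file mentioned above
candidates = ['/usr/share/dict/words', 'words.txt']

word_file = None
for candidate in candidates:
    if os.path.isfile(candidate):
        word_file = candidate
        break

if word_file is None:
    print('no word file found; create words.txt with an editor')
else:
    # utf-8 is an assumption about how the word file is encoded
    with open(word_file, encoding='utf-8') as f:
        data = f.read()
    print('read', len(data), 'characters from', word_file)
```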

In [17]:
# see last 10 lines using tail program
! tail /usr/share/dict/words

zymotoxic
zymurgy
Zyrenian
Zyrian
Zyryan
zythem
Zythia
zythum
Zyzomys
Zyzzogeton


In [18]:
! head /usr/share/dict/words

A
a
aa
aal
aalii
aam
Aani
aardvark
aardwolf
Aaron


In [19]:
file = '/usr/share/dict/words' # works on mac/linux
with open(file) as f:
    data = f.read()


In [20]:
data



In [21]:
words = data.split('\n')
print('There are {0} words in the file.'.format(len(words)))

There are 235887 words in the file.
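
A note on counting with split('\n'): when the text ends with a newline, the split produces one extra empty string at the end, so the count can be one higher than the number of words. Whether that applies to the count above depends on how the words file ends, which is an assumption here. A small sketch with a made-up string:

```python
text = 'A\na\naa\n'           # three words, ending with a newline like most text files

by_split = text.split('\n')   # ['A', 'a', 'aa', ''] -> trailing empty string
by_lines = text.splitlines()  # ['A', 'a', 'aa']

print(len(by_split), len(by_lines))  # 4 3
```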


In [22]:
data.find('needle')

831052

In [23]:
data[831052:831052+6]

'needle'

In [24]:
# let's print first 10 words
print(words[:10])

['A', 'a', 'aa', 'aal', 'aalii', 'aam', 'Aani', 'aardvark', 'aardwolf', 'Aaron']


In [25]:
help(list)

Help on class list in module builtins:

class list(object)
 |  list(iterable=(), /)
 |  
 |  Built-in mutable sequence.
 |  
 |  If no argument is given, the constructor creates a new empty list.
 |  The argument must be an iterable if specified.
 |  
 |  Methods defined here:
 |  
 |  __add__(self, value, /)
 |      Return self+value.
 |  
 |  __contains__(self, key, /)
 |      Return key in self.
 |  
 |  __delitem__(self, key, /)
 |      Delete self[key].
 |  
 |  __eq__(self, value, /)
 |      Return self==value.
 |  
 |  __ge__(self, value, /)
 |      Return self>=value.
 |  
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |  
 |  __getitem__(...)
 |      x.__getitem__(y) <==> x[y]
 |  
 |  __gt__(self, value, /)
 |      Return self>value.
 |  
 |  __iadd__(self, value, /)
 |      Implement self+=value.
 |  
 |  __imul__(self, value, /)
 |      Implement self*=value.
 |  
 |  __init__(self, /, *args, **kwargs)
 |      Initialize self.  See help(type(self))

In [26]:
words.index('needle')

123097

In [27]:
words[123097]

'needle'
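
The two searches used above behave differently when nothing matches: str.find returns -1, while list.index raises ValueError. The sketch below uses a short made-up string in place of the dictionary file.

```python
text = 'aardvark\nneedle\nzebra\n'
words = text.splitlines()

pos = text.find('needle')      # character offset in the string: 9
idx = words.index('needle')    # position in the list: 1

missing = text.find('snake')   # str.find returns -1 when nothing matches
try:
    words.index('snake')       # list.index raises ValueError instead
except ValueError:
    print('snake is not in the list')

print(pos, idx, missing)       # 9 1 -1
```

Note that find looks for a substring anywhere in the text, so it can also match inside a longer word, whereas index only matches a whole list element.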

## reading the whole file as list of lines

In [28]:
file = '/usr/share/dict/words'
with open(file) as f:
    lines = f.readlines()

# count the lines read in this cell; data still holds the whole-file string from In [19]
print('There are {0} words in the file.'.format(len(lines)))


In [29]:
lines[:2]

['A\n', 'a\n']

In [30]:
for word in lines[:10]:
    print(word.strip())

A
a
aa
aal
aalii
aam
Aani
aardvark
aardwolf
Aaron


In [31]:
for word in lines[-10:]:
    print(word.strip())

zymotoxic
zymurgy
Zyrenian
Zyrian
Zyryan
zythem
Zythia
zythum
Zyzomys
Zyzzogeton


## select a random word from list of words
- import random
- random.choice(wordList)

In [32]:
import random
word = random.choice(lines)
word = word.strip().lower() # strip the trailing \n left by readlines()
print(f'random word = {word}')

random word = multisegmentate
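
If several random words are needed, it can be simpler to clean the whole list once and then pick from it. In this sketch the short `lines` list is a made-up stand-in for the result of readlines(), and the fixed seed is only there to make the demo repeatable.

```python
import random

lines = ['Apple\n', 'ball\n', 'Cat\n']    # stand-in for the result of readlines()

words = [line.strip() for line in lines]  # drop '\n' from every entry once
random.seed(0)                            # fixed seed only to make the demo repeatable
word = random.choice(words).lower()
print(f'random word = {word}')
```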



## exercises
1. Write a program that reads a file and writes out a new file with the lines in reversed order (i.e. the first line in the old file becomes the last one in the new file.)
2. Write a program that reads a file and prints only those lines that contain the substring snake.
3. Write a program that reads a text file and produces an output file which is a copy of the file, except the first five columns of each line contain a four digit line number, followed by a space. Start numbering the first line in the output file at 1. Ensure that every line number is formatted to the same width in the output file. Use one of your Python programs as test data for this exercise: your output should be a printed and numbered listing of the Python program.
4. Write a program that undoes the numbering of the previous exercise: it should read a file with numbered lines and produce another file without line numbers.