{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Reading Files\n", "\n", "Python provides built-in functions for reading and writing files. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To read a file, we must know its path on disk. Python has a module called `os` with helper functions for working with the operating system. The advantage of using the `os` module is that the code you write will work without change on any supported operating system." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will now open and read the file `worldcities.csv` located in your data package. The data files are in the `data/` directory, so we can construct the relative path to the file using the `os.path.join()` function." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data_pkg_path = 'data'\n", "filename = 'worldcities.csv'\n", "path = os.path.join(data_pkg_path, filename)\n", "print(path)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To open the file, use the built-in `open()` function. We specify the *mode* as `r`, which means read-only. If we wanted to write a new file or overwrite an existing one, we would open it with `w` mode.\n", "\n", "Our input file contains Unicode characters, so we specify `UTF-8` as the encoding.\n", "\n", "The `open()` function returns a file object. We can call its `readline()` method to read the content of the file one line at a time.\n", "\n", "It is good practice to always close the file when you are done with it. To close the file, we call the `close()` method on the file object." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "f = open(path, 'r', encoding='utf-8')\n", "print(f.readline())\n", "print(f.readline())\n", "f.close()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Calling `readline()` for each line of the file is tedious. Ideally, we want to loop through all the lines in the file. A file object can be iterated over directly, one line per iteration.\n", "\n", "We can loop through each line of the file and increase the `count` variable by 1 on each iteration of the loop. At the end, the value of `count` will be equal to the number of lines in the file." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "f = open(path, 'r', encoding='utf-8')\n", "\n", "count = 0\n", "for line in f:\n", "    count += 1\n", "f.close()\n", "print(count)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exercise\n", "\n", "Print the first 5 lines of the file.\n", "\n", "- Hint: Use the `break` statement" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "data_pkg_path = 'data'\n", "filename = 'worldcities.csv'\n", "path = os.path.join(data_pkg_path, filename)\n", "\n", "# Add code to open the file and read the first 5 lines" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "----" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" } }, "nbformat": 4, "nbformat_minor": 4 }