{ "cells": [ { "cell_type": "markdown", "id": "4a87b5ef", "metadata": {}, "source": [ "--- \n", " \n", "\n", "

Department of Data Science

\n", "

Course: Tools and Techniques for Data Science

\n", "\n", "---\n", "

Instructor: Muhammad Arif Butt, Ph.D.

" ] }, { "cell_type": "markdown", "id": "ab0dc25c", "metadata": {}, "source": [ "

Lecture 2.16

" ] }, { "cell_type": "markdown", "id": "b7cbc000", "metadata": {}, "source": [ "\"Open" ] }, { "cell_type": "markdown", "id": "19f82705", "metadata": {}, "source": [ "## _file_handling.ipynb_" ] }, { "cell_type": "markdown", "id": "9727124d", "metadata": {}, "source": [ "## Learning agenda of this notebook\n", "\n", "1. What are Files, and Why we need them?\n", "2. Opening a text file in Python\n", "3. Reading contents of a file\n", "4. Writing and appending data in a file\n", "5. Closing a file\n", "6. Change File Offset using `fd.seek()` method\n", "7. Performing some Operations on File Contents\n", "8. Reading Attributes of a File using `os.stat()` method\n", "9. Identifying Type of File\n", "10.Bonus: Handling Image Files in Python" ] }, { "cell_type": "markdown", "id": "263d8d43", "metadata": {}, "source": [ "## 1. What are Files, and Why we need them?\n", "- An important component of an operating system is its files and directories. A file is a location on disk that stores related information and has a name. We use files to organize our data in different directories on a hard-disk.\n", "- The RAM (Random Access Memory) is volatile; it holds data only as long as it is up. So, we use files to store data permanently.\n", "\n", "### File Types\n", "- Python supports two types of files (Binary files and Text files):\n", " - **Binary Files:** All binary files follow a specific format. We can open some binary files in the normal text editor, but we cannot read the content present in the file. This is because the binary files are encoded in a specific format. So for handling binary files we need applications that understand the specific format of that binary file. Most of the files that we see in our computer systems are binary files. 
Some examples are:\n", "    - **Document Files:** .pdf, .doc, .xls, etc.\n", "    - **Image Files:** .png, .jpg, .gif, .bmp, etc.\n", "    - **Audio Files:** .mp3, .wav, .mka, .aac, etc.\n", "    - **Video Files:** .mp4, .3gp, .mkv, .avi, etc.\n", "    - **Database Files:** .mdb, .accde, .frm, .sqlite, etc.\n", "    - **Archive Files:** .zip, .gzip, .rar, .tar, .iso, .7z, etc.\n", "    - **Executable Files:** .exe, .dll, .elf, .class, etc.\n", " - **Text Files:** Text files contain only human-readable characters stored in a standard character encoding (such as ASCII or UTF-8), so they can be opened in any text editor. They are usually structured as a sequence of lines, where each line is a sequence of characters terminated by a special end-of-line (EOL) character. Some examples are:\n", "    - **Document Files:** .txt, .tex, .rtf, etc.\n", "    - **Source Codes:** .c, .cpp, .js, .py, .java, etc.\n", "    - **Web Standards:** .html, .xml, .css, .json, etc.\n", "    - **Tabular Data:** .csv, .tsv, etc.\n", "    - **Configuration Files:** .ini, .cfg, .reg, etc.\n", "\n", " \n", "### File Handling in Python\n", "Python supports four basic operations on files:\n", " - open\n", " - read\n", " - write\n", " - close\n", " \n", "There are other file operations as well, e.g., deleting, renaming, or copying a file, appending data to a file, and changing the properties of a file. However, open, read, write and close are the basic file handling operations." ] }, { "cell_type": "markdown", "id": "799564e5", "metadata": {}, "source": [ "## 2. Opening a File\n", "Files in Python can be opened with the built-in `open()` function. 
Typically, it takes two string arguments:\n", "- The `filepath`, including the name and extension of the file we want to open (passed as a string)\n", "- The `mode` in which we want to open the file (passed as a string); the default value is `'rt'`\n", "***\n", "**`open(file, mode='rt')`**\n", "***\n", "\n", "Where,\n", "- `file` is the only required argument: the path, including the name and extension, of the file we want to open (passed as a string)\n", "- `mode` is the file access mode (passed as a string)\n", "- For the other arguments, read the documentation using `help(open)`\n", "\n", "**FILE ACCESS MODES**\n", "\n", " * `Read Only ('r')`: It opens the file for reading. If the file does not exist, it raises `FileNotFoundError`.\n", " * `Read and Write ('r+')`: It opens the file for reading and writing. It raises `FileNotFoundError` if the file does not exist.\n", " * `Write Only ('w')`: It opens the file for writing only. If the file exists, its data is truncated. If the file does not exist, it is created.\n", " * `Write and Read ('w+')`: It opens the file for reading and writing. If the file exists, its data is truncated. If the file does not exist, it is created.\n", " * `Append Only ('a')`: It opens the file for writing, appending to the end of the file if it exists. The file is created if it does not exist.\n", " * `Append and Read ('a+')`: It opens the file for reading and writing. The file is created if it does not exist. The data being written is inserted at the end, after the existing data.\n", " * `Exclusive creation ('x')`: It opens a file for exclusive creation. 
If the file already exists, the operation fails.\n", " \n", "Along with the above file access modes, you can also specify whether the file should be handled as text or binary:\n", " * `Text file ('t')`: Opens a file in text mode\n", " * `Binary file ('b')`: Opens a file in binary mode\n" ] }, { "cell_type": "code", "execution_count": null, "id": "056aceec", "metadata": {}, "outputs": [], "source": [ "# Example 1: Open a text file named f1.txt, present in the current working directory, in read-write mode\n", "# On macOS, the absolute path may look like: /Users/arif/Documents/.../f1.txt\n", "# On Microsoft Windows, the absolute path may look like: \"C:\\\\Users\\\\Kakamanna\\\\f1.txt\"\n", "\n", "fd = open(\"f1.txt\", \"rt+\")\n", "fd" ] }, { "cell_type": "code", "execution_count": null, "id": "978ed48d", "metadata": {}, "outputs": [], "source": [ "# Example 2: Open a binary file named image.png, present in the current working directory, in read-write mode\n", "\n", "fd = open(\"image.png\", \"rb+\")\n", "fd" ] }, { "cell_type": "markdown", "id": "3a2e45f2", "metadata": {}, "source": [ "## 3. Reading Contents of a File\n", "- In Python, once you have a file opened, there are three ways to read the contents of that file:\n", "```\n", "fd.read(size=-1)\n", "fd.readline(size=-1)\n", "fd.readlines(sizehint=-1)\n", "```" ] }, { "cell_type": "markdown", "id": "70101033", "metadata": {}, "source": [ "### a. 
Using `fd.read(size=-1)` method\n", "- The `fd.read()` method reads and returns `size` characters from the file (if size is positive)\n", "- If `size` is negative or omitted, read until EOF" ] }, { "cell_type": "code", "execution_count": null, "id": "3023dbbc", "metadata": {}, "outputs": [], "source": [ "# Example: \n", "fd = open(\"f1.txt\",\"r\") \n", "fd = open(\"f1.txt\")\n", "fd = open(\"f1.txt\", \"rt\") # are all equivalent\n", "\n", "rv = fd.read(5)\n", "print(rv, type(rv))\n" ] }, { "cell_type": "code", "execution_count": null, "id": "6af297ed", "metadata": {}, "outputs": [], "source": [ "# Example: Reading the complete file till EOF character\n", "fd=open(\"f1.txt\")\n", "fd.read()" ] }, { "cell_type": "code", "execution_count": null, "id": "701d3dce", "metadata": {}, "outputs": [], "source": [ "# Example: Try to read again, and notice what will happen?\n", "fd.read()" ] }, { "cell_type": "markdown", "id": "99841da5", "metadata": {}, "source": [ ">- As you can see we got an empty string. \n", ">- The reason is `fd.read()` always reads from the current file offset, which in this situation has reached the end of file, therefore it returns an empty string" ] }, { "cell_type": "code", "execution_count": null, "id": "de8e5e43", "metadata": {}, "outputs": [], "source": [ "# close the file\n", "fd.close()" ] }, { "cell_type": "code", "execution_count": null, "id": "a262d64b", "metadata": {}, "outputs": [], "source": [ "# Let us open and read the file again\n", "fd = open(\"f1.txt\",\"r\")\n", "print(fd.read())\n", "fd.close()" ] }, { "cell_type": "code", "execution_count": null, "id": "99d9b088", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "91121e45", "metadata": {}, "source": [ "### b. 
Using `fd.readline(size=-1)` method\n", "- The `fd.readline()` method reads and returns one line at a time\n", "- If `size` is passed then reads and returns size characters of a line" ] }, { "cell_type": "code", "execution_count": null, "id": "d7c530f4", "metadata": {}, "outputs": [], "source": [ "# Example: If you want to read a file line by line use readline() method\n", "fd = open(\"f1.txt\",\"r\")\n", "print(fd.readline())\n", "print(fd.readline())\n", "print(fd.readline())\n", "print(fd.readline())\n", "print(fd.readline())\n", "\n", "# close the file\n", "fd.close()" ] }, { "cell_type": "code", "execution_count": null, "id": "45eec5d1", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "e0122d39", "metadata": {}, "outputs": [], "source": [ "# Example: If you want to read a file line by line use readline() method\n", "fd = open(\"f1.txt\",\"r\")\n", "#print(fd.readline())\n", "print(fd.readline(5))\n", "\n", "# close the file\n", "fd.close()" ] }, { "cell_type": "markdown", "id": "04a4b5ba", "metadata": {}, "source": [ ">So `fd.read()` method focuses on reading a file character by character, while the `fd.readline()` method focuses on reading a file line by line." ] }, { "cell_type": "markdown", "id": "5a9addca", "metadata": {}, "source": [ "### c. Using `fd.readlines(sizehint=-1)` method\n", "- The `fd.readlines()` method reads until end of file and returns a list object containing the lines. " ] }, { "cell_type": "code", "execution_count": null, "id": "5aa498e6", "metadata": {}, "outputs": [], "source": [ "fd = open(\"f1.txt\",\"r\")\n", "mylist = fd.readlines()\n", "print(mylist)\n", "print(type(mylist))\n", "\n", "# close the file\n", "fd.close()" ] }, { "cell_type": "markdown", "id": "721accde", "metadata": {}, "source": [ "- If the optional `sizehint` passed, instead of reading up to EOF, whole lines totalling approximately `sizehint` bytes are read. 
(possibly after rounding up to an internal buffer size) " ] }, { "cell_type": "code", "execution_count": null, "id": "0be99eb7", "metadata": {}, "outputs": [], "source": [ "fd = open(\"f1.txt\",\"r\")\n", "rd = fd.readlines(17)\n", "\n", "print(rd)\n", "\n", "# close the file\n", "fd.close()" ] }, { "cell_type": "code", "execution_count": null, "id": "976bbe5d", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "e5435465", "metadata": {}, "source": [ "## 4. Writing in a File" ] }, { "cell_type": "markdown", "id": "41789820", "metadata": {}, "source": [ "### a. The `fd.write()` method\n", "- To write to a file, one must open the file in write or append mode.\n", "- The `fd.write(text)` method is used to write a string to a file.\n", "- If the file is opened in write mode, the existing file data is overwritten.\n", "- If the file is opened in append mode, the new data is written at the end of the file.\n", "- If the file does not exist, a new file with that name is created.\n", "- The `fd.write(text)` method returns the number of characters written (which is always equal to the length of the string).\n", "\n", "\n", " * `Append Only ('a')`: It opens the file for writing, appending to the end of the file if it exists. The file is created if it does not exist.\n", " * `Append and Read ('a+')`: It opens the file for reading and writing. The file is created if it does not exist. The data being written is inserted at the end, after the existing data.\n", " " ] }, { "cell_type": "markdown", "id": "a4e41ce4", "metadata": {}, "source": [ "**Example 1:** Let us create a new file in the present working directory using `mode='w'` (`Write Only`). The file is created if it does not exist. If a file with that name already exists, its data is truncated."
] }, { "cell_type": "code", "execution_count": null, "id": "eb78e860", "metadata": {}, "outputs": [], "source": [ "fd1 = open('out.txt','w')\n", "rv = fd1.write('Python is Awesome!')\n", "print(\"Number of characters written in the file: \", rv)\n", "fd1.close()\n", "\n", "\n", "# Let us open the file in read mode and read its contents\n", "fd1 = open('out.txt')\n", "print(fd1.read())\n", "fd1.close()\n" ] }, { "cell_type": "markdown", "id": "39cd31b2", "metadata": {}, "source": [ "**Example 2:** Let us again open the `out.txt` file in the present working directory, this time in `mode='w+'` (Write and Read). Since the file already exists, it is opened and its data is truncated. If a file with that name does not exist, a new file is created." ] }, { "cell_type": "code", "execution_count": null, "id": "b961d227", "metadata": {}, "outputs": [], "source": [ "fd1 = open('out.txt','w+') # Due to w+ all the existing data is truncated\n", "\n", "print(\"Existing data in the file is: \", fd1.read())\n", "\n", "# Since the file is opened in read-write mode, we can write to the file\n", "fd1.write('Learning Python is Fun!')\n", "\n", "# Since the file is opened in read-write mode, we can also read from the file\n", "print(\"Data in out.txt after writing: \", fd1.read())\n", "\n", "# The read above returns an empty string because the file offset is at the end of the file.\n", "# Move the offset back to the beginning of the file and read again\n", "fd1.seek(0)\n", "print(\"Data read again after seek: \", fd1.read())\n", "\n", "fd1.close()" ] }, { "cell_type": "markdown", "id": "cfa382d7", "metadata": {}, "source": [ "**Example 3:** Let us now append some text to the file `out.txt` by opening it in append mode, `mode='a'`. It opens the file for writing, appending to the end of the file if it exists. The file is created if it does not exist."
] }, { "cell_type": "code", "execution_count": null, "id": "4b91b807", "metadata": {}, "outputs": [], "source": [ "# creating a list\n", "fruits = [\"\\nApple\",\"\\nBanana\",\"\\nOranges\"]\n", "\n", "# open a file in append mode\n", "fd = open(\"out.txt\", mode=\"a\")\n", "\n", "# copy the list contents to the file\n", "for fruit in fruits:\n", "    fd.write(fruit)\n", "\n", "fd.close()\n", "\n", "\n", "# open the file in read mode again\n", "fd = open(\"out.txt\")\n", "\n", "# read the data from the file\n", "for line in fd:\n", "    print(line)\n", " \n", "# close the file\n", "fd.close()" ] }, { "cell_type": "markdown", "id": "5e7315be", "metadata": {}, "source": [ "## 5. Closing a File" ] }, { "cell_type": "markdown", "id": "25f02832", "metadata": {}, "source": [ "### a. The `fd.close()` method\n", "Closing a file frees up the resources that were tied to the file. Python has a garbage collector to clean up unreferenced objects, but we must not rely on it to close the file." ] }, { "cell_type": "code", "execution_count": null, "id": "93262e79", "metadata": {}, "outputs": [], "source": [ "# open a file\n", "f = open(\"f1.txt\", \"r\")\n", "\n", "# perform some file operations\n", "\n", "# close the file\n", "f.close()" ] }, { "cell_type": "markdown", "id": "f7352ee2", "metadata": {}, "source": [ "### b. Use `fd.close()` in `try...finally` Block\n", "- Often one forgets to close an open file. An unclosed file keeps its file descriptor in use, and data written to it may remain in a buffer without reaching the disk.\n", "- Moreover, if an exception occurs while we are performing some operation on the file, the program exits without closing the file.\n", "\n", "> In such scenarios, the `try...finally` block comes to the rescue. We can keep the `fd.close()` call in the `finally` block, so that even if the program execution stops due to an exception, the file will get closed anyway."
] }, { "cell_type": "code", "execution_count": null, "id": "2fc7d4c3", "metadata": {}, "outputs": [], "source": [ "# Open the file first, so that fd is defined when the finally block runs\n", "fd = open(\"f1.txt\", \"r\")\n", "try:\n", "    # perform file operations\n", "    data = fd.read()\n", "finally:\n", "    # this runs whether or not an exception occurred above\n", "    fd.close()" ] }, { "cell_type": "markdown", "id": "80bd17db", "metadata": {}, "source": [ "### c. Use of `with` Keyword while opening a File\n", "- The best way to open a file in a Python script is by using the `with` keyword. \n", "- This guarantees that the file will automatically be closed when the block inside the `with` statement exits.\n", "- Even if an exception occurs before the end of the block, it will close the file before the exception is caught by an outer exception handler." ] }, { "cell_type": "code", "execution_count": null, "id": "dbb86693", "metadata": {}, "outputs": [], "source": [ "# open the file in read mode using with statement\n", "with open(\"f1.txt\", \"r\") as fd:\n", " \n", "    # perform file operations\n", "    print(fd.read())\n", " " ] }, { "cell_type": "markdown", "id": "066b787d", "metadata": {}, "source": [ "**Let us confirm if the file opened in the above code cell is closed or not**" ] }, { "cell_type": "code", "execution_count": null, "id": "4a5a9aad", "metadata": {}, "outputs": [], "source": [ "# Reading from a closed file raises ValueError, which confirms that the file is closed\n", "fd.read()" ] }, { "cell_type": "markdown", "id": "f1040bab", "metadata": {}, "source": [ "## 6. Change File Offset using `fd.seek()` method\n", " \n", "\n", "The `fd.seek()` method is used to move the current file offset (the position of the file handle) to a specific position, from where data will next be read or written. The method returns the new absolute position.\n", "```\n", "seek(offset, whence=0)\n", "```\n", "Where,\n", " - `offset` means the number of positions to move forward/backward. 
It is interpreted relative to the position indicated by `whence`\n", " - `whence` can take the following values: \n", "    - 0: start of stream (the default); offset should be zero or positive \n", "    - 1: current stream position; offset may be negative\n", "    - 2: end of stream; offset is usually negative\n", " \n", "**Note:** \n", "- In text mode, seeking relative to the current position or the end of file (`whence` of 1 or 2) is allowed only when the offset is 0.\n", "- The `seek()` method with a negative offset works only when the file is opened in binary mode." ] }, { "cell_type": "code", "execution_count": null, "id": "e77fd658", "metadata": {}, "outputs": [], "source": [ "# Example:\n", "fd = open(\"f1.txt\",\"r\")\n", "\n", "# check the position of file offset\n", "rv = fd.seek(0, 1) # fd.seek(0, 1) is equivalent to fd.tell()\n", "print(\"Cursor is pointing at the location: \", rv)\n", "\n", "\n", "# Let us read five characters and check the position of file offset\n", "fd.read(5)\n", "rv = fd.seek(0, 1)\n", "print(\"Cursor is pointing at the location: \", rv)\n", "\n", "\n", "# Let us read the remaining portion of the file and check the position of file offset\n", "fd.read()\n", "rv = fd.seek(0, 1)\n", "print(\"Cursor is pointing at the location: \", rv)\n", "\n", "\n", "fd.close()" ] }, { "cell_type": "code", "execution_count": null, "id": "f8055f38", "metadata": {}, "outputs": [], "source": [ "# Example: Let us do some more practice with the seek() function\n", "# open a file in append mode\n", "fd = open(\"f1.txt\",\"a\")\n", "print(\"Cursor is pointing at the location: \", fd.seek(0, 1))\n", "\n", "# set the cursor to beginning\n", "cur = fd.seek(0, 0) # equivalent to fd.seek(0)\n", "print(\"Cursor is pointing at the location: \", cur)\n", "\n", "# set the cursor to position 100 from the beginning\n", "cur = fd.seek(100) # equivalent to fd.seek(100, 0)\n", "print(\"Cursor is pointing at the location: \", cur)\n", "\n", "\n", "# let us move the cursor 
to position 50, i.e. 50 positions back from position 100 (text mode does not allow a relative seek)\n", "cur = fd.seek(50, 0) \n", "print(\"Cursor is pointing at the location: \", cur)\n", "\n", "# close the file\n", "fd.close()" ] }, { "cell_type": "code", "execution_count": null, "id": "42ef8418", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "e09f3def", "metadata": {}, "source": [ "## 7. Performing some Operations on File Contents" ] }, { "cell_type": "markdown", "id": "cd573f06", "metadata": {}, "source": [ "### a. Iterating Contents of a File (Line by Line)" ] }, { "cell_type": "markdown", "id": "9944983b", "metadata": {}, "source": [ "**Using a `while` loop**" ] }, { "cell_type": "code", "execution_count": null, "id": "e8b4ace5", "metadata": {}, "outputs": [], "source": [ "fd = open(\"hello.txt\",\"r\")\n", "\n", "while True:\n", "    line = fd.readline()\n", "    if not line:    # an empty string means end of file\n", "        break\n", "    print(line)\n", "\n", " \n", "fd.close()" ] }, { "cell_type": "markdown", "id": "5c9df4e2", "metadata": {}, "source": [ "**Using a `for` loop**" ] }, { "cell_type": "code", "execution_count": null, "id": "15dbedc2", "metadata": {}, "outputs": [], "source": [ "fd = open(\"hello.txt\",\"r\")\n", "\n", "for line in fd: # a file handle can be used as an iterator in the for loop\n", "    print(line)\n", "\n", " \n", "fd.close()" ] }, { "cell_type": "markdown", "id": "6e11a5d7", "metadata": {}, "source": [ "**A shorter way to iterate a file line by line using a `for` loop (the file is not closed explicitly)**" ] }, { "cell_type": "code", "execution_count": null, "id": "bf7f4dbd", "metadata": {}, "outputs": [], "source": [ "for line in open(\"hello.txt\", 'r'):\n", "    print(line)\n", "\n", "# Note: we do not have a file handle here, so we cannot close the file explicitly" ] }, { "cell_type": "markdown", "id": "a3d00522", "metadata": {}, "source": [ "**The best way to iterate a file line by line using a `for` loop**" ] }, { "cell_type": "code", "execution_count": null, "id": "bcf23e3e", "metadata": {}, "outputs": [], "source": [ "with open(\"hello.txt\", 'r') as fd:\n", "    for line in 
fd:\n", "        print(line)" ] }, { "cell_type": "markdown", "id": "fca239ed", "metadata": {}, "source": [ "### b. Count the words in the file using `str.split()` method" ] }, { "cell_type": "code", "execution_count": null, "id": "17270b6d", "metadata": {}, "outputs": [], "source": [ "# Example:\n", "\n", "totalwords = 0\n", "with open(\"f1.txt\", \"r\") as fd:\n", "    for line in fd:\n", "        # split() with no argument splits on any whitespace and ignores the trailing newline\n", "        listoftokens = line.split()\n", "        print(line.rstrip(), len(listoftokens))\n", "        totalwords = totalwords + len(listoftokens)\n", "\n", "print(\"\\nTotal words in this file are: \", totalwords)" ] }, { "cell_type": "markdown", "id": "fd9b1f81", "metadata": {}, "source": [ "## 8. Reading Attributes of a File using `os.stat()` method\n", "- The `os.stat(path)` method returns the attributes of a file, such as:\n", "    - size of file\n", "    - file type\n", "    - owner of file\n", "    - file time stamps" ] }, { "cell_type": "code", "execution_count": null, "id": "fa985552", "metadata": {}, "outputs": [], "source": [ "import os\n", "\n", "status = os.stat(\"out.txt\")\n", "\n", "status\n" ] }, { "cell_type": "code", "execution_count": null, "id": "ae20ffa8", "metadata": {}, "outputs": [], "source": [ "# You can extract the information individually\n", "import os\n", "\n", "status = os.stat(\"out.txt\")\n", "\n", "# size of the file in bytes\n", "print(\"File size: \", status.st_size)\n", "\n", "# extract the file type and file mode bits (permissions)\n", "print(\"File type and permissions: \", status.st_mode)\n", "\n", "# user identifier of the file owner\n", "print(\"File Owner: \", status.st_uid)\n", "\n", "# time of most recent access, in seconds since the epoch\n", "print(\"Last access time: \", status.st_atime)\n", "\n", "# time of most recent content modification, in seconds since the epoch\n", "print(\"Last modification time: \", status.st_mtime)\n", "\n", "# time of most recent metadata change on Unix and creation time on Windows\n", "print(\"Last status change time: \", status.st_ctime)" ] }, { "cell_type": "markdown", "id": "ff4037c7", "metadata": {}, "source": [ "## 9. 
Identifying Type of File" ] }, { "cell_type": "code", "execution_count": null, "id": "5b8ae957", "metadata": {}, "outputs": [], "source": [ "import os\n", "!ls\n", "\n", "name = input(\"Enter name of the file/directory: \")\n", "\n", "if os.path.isfile(name):\n", "    print(\"It is a file\")\n", "elif os.path.isdir(name):\n", "    print(\"It is a directory\")\n", "\n", "else:\n", "    print(\"Unknown file type or file does not exist\")" ] }, { "cell_type": "markdown", "id": "24cc2649", "metadata": {}, "source": [ "# Bonus: Reading CSV Files in Python" ] }, { "cell_type": "code", "execution_count": null, "id": "ea2342e3", "metadata": { "scrolled": true }, "outputs": [], "source": [ "import csv\n", "# newline='' lets the csv module handle line endings itself\n", "with open('file1.csv', 'r', newline='') as fd:\n", "    obj = csv.reader(fd)\n", "# obj is a reader object; its rows can only be read while the file is open\n", "obj" ] }, { "cell_type": "code", "execution_count": null, "id": "848f21b0", "metadata": {}, "outputs": [], "source": [ "import csv\n", "with open('file1.csv', 'r', newline='') as fd:\n", "    obj = csv.reader(fd)\n", "    # each row is returned as a list of strings\n", "    for line in obj:\n", "        print(line)" ] }, { "cell_type": "code", "execution_count": null, "id": "8d976f16", "metadata": {}, "outputs": [], "source": [ "import csv\n", "with open('file1.csv', 'r', newline='') as fd:\n", "    obj = csv.reader(fd) \n", "    # next() returns one row at a time\n", "    line = next(obj) \n", "    print(line)\n", "    line = next(obj)\n", "    print(line)" ] }, { "cell_type": "markdown", "id": "77d12efd", "metadata": {}, "source": [ "\n", "# Bonus: Handling Image Files in Python\n", "- Let us now try to open and read binary files in Python" ] }, { "cell_type": "code", "execution_count": null, "id": "3aed1e91", "metadata": {}, "outputs": [], "source": [ "fd = open(\"speech.jpg\", \"rb\")" ] }, { "cell_type": "code", "execution_count": null, "id": "c93bc732", "metadata": {}, "outputs": [], "source": [ "print(fd.read())" ] }, { "cell_type": "code", "execution_count": null, "id": "49acecb0", "metadata": {}, "outputs": [], "source": [ "fd.close()" ] }, { "cell_type": "code", "execution_count": null, "id": "df81393e", "metadata": {}, "outputs": [], "source": [ "from 
PIL import Image" ] }, { "cell_type": "code", "execution_count": null, "id": "710fd965", "metadata": {}, "outputs": [], "source": [ "img = Image.open(\"speech.jpg\")" ] }, { "cell_type": "code", "execution_count": null, "id": "d1e74196", "metadata": {}, "outputs": [], "source": [ "img.format" ] }, { "cell_type": "code", "execution_count": null, "id": "412777d9", "metadata": {}, "outputs": [], "source": [ "img.size" ] }, { "cell_type": "code", "execution_count": null, "id": "de383556", "metadata": {}, "outputs": [], "source": [ "img.mode" ] }, { "cell_type": "code", "execution_count": null, "id": "e42518f8", "metadata": {}, "outputs": [], "source": [ "img" ] }, { "cell_type": "code", "execution_count": null, "id": "8a36839c", "metadata": {}, "outputs": [], "source": [ "import numpy as np" ] }, { "cell_type": "code", "execution_count": null, "id": "310ac946", "metadata": {}, "outputs": [], "source": [ "img_array = np.array(img)" ] }, { "cell_type": "code", "execution_count": null, "id": "f3b2b620", "metadata": {}, "outputs": [], "source": [ "img_array.shape" ] }, { "cell_type": "code", "execution_count": null, "id": "635b9756", "metadata": {}, "outputs": [], "source": [ "img_array" ] }, { "cell_type": "code", "execution_count": null, "id": "9568d05f", "metadata": {}, "outputs": [], "source": [ "img_copy = img.copy()" ] }, { "cell_type": "code", "execution_count": null, "id": "630a1954", "metadata": {}, "outputs": [], "source": [ "img_copy" ] }, { "cell_type": "code", "execution_count": null, "id": "615e5f7a", "metadata": {}, "outputs": [], "source": [ "img_cropped = img.crop((250,0,500,300))" ] }, { "cell_type": "code", "execution_count": null, "id": "2833e1a3", "metadata": {}, "outputs": [], "source": [ "img_cropped" ] }, { "cell_type": "code", "execution_count": null, "id": "07ae4857", "metadata": {}, "outputs": [], "source": [ "img_cropped.save('smallimg.jpg')" ] }, { "cell_type": "code", "execution_count": null, "id": "e46a0ec7", "metadata": {}, "outputs": [], 
"source": [ "out = img_cropped.transpose(Image.ROTATE_90)" ] }, { "cell_type": "code", "execution_count": null, "id": "8f6c44a2", "metadata": {}, "outputs": [], "source": [ "out" ] }, { "cell_type": "code", "execution_count": null, "id": "ede7d6b8", "metadata": {}, "outputs": [], "source": [ "out.save('corpped_speech.jpg')" ] }, { "cell_type": "code", "execution_count": null, "id": "da6c89f0", "metadata": {}, "outputs": [], "source": [ "img2 = Image.open(\"corpped_speech.jpg\")" ] }, { "cell_type": "code", "execution_count": null, "id": "ed719fcf", "metadata": {}, "outputs": [], "source": [ "img2" ] }, { "cell_type": "code", "execution_count": null, "id": "7caeb92c", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "98e1e11e", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "3a463dde", "metadata": {}, "outputs": [], "source": [ "https://www.youtube.com/watch?v=K1xO8weArNA&t=5m56s" ] }, { "cell_type": "code", "execution_count": null, "id": "3838d643", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "28463c88", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "72be84de", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "719550ac", "metadata": {}, "source": [ "# Bonus: Accessing Files from GitHub Gists" ] }, { "cell_type": "markdown", "id": "c040707d", "metadata": {}, "source": [ "> While accessing Internet, if you get `SSL: CERTIFICATE_VERIFY_FAILED` error, set, the `_create_default_https_context` attribute of `ssl` to `_create_unverified_context`" ] }, { "cell_type": "code", "execution_count": null, "id": "29a5f041", "metadata": {}, "outputs": [], "source": [ "import ssl\n", "ssl._create_default_https_context = ssl._create_unverified_context" ] }, { "cell_type": "markdown", "id": "38075fc1", "metadata": {}, "source": [ "> Get the 
raw data URL of the file ('hellogist.txt') to be downloaded from your GitHub Gist. Pass that URL to the `urlretrieve()` function to download the file to disk\n", "```\n", "urllib.request.urlretrieve(url, filename=None)\n", "``` " ] }, { "cell_type": "markdown", "id": "2d1a335e", "metadata": {}, "source": [ "**Example 1: Download, Open and Read `hellogist.txt` from course Gist**" ] }, { "cell_type": "code", "execution_count": null, "id": "87491b77", "metadata": {}, "outputs": [], "source": [ "# import the request submodule explicitly; a plain `import urllib` does not guarantee it is loaded\n", "import urllib.request\n", "\n", "myurl = 'https://gist.githubusercontent.com/arifpucit/6e2d95002460db296506ec6f0cfb7008/raw/9dcad33321c01194dffe3586fc80c5a966a9494f/hellogist.txt'\n", "\n", "\n", "urllib.request.urlretrieve(myurl, 'hellogist.txt')\n" ] }, { "cell_type": "markdown", "id": "fd4d422c", "metadata": {}, "source": [ "Let us now open and read the file" ] }, { "cell_type": "code", "execution_count": null, "id": "2caf23ab", "metadata": {}, "outputs": [], "source": [ "fd = open(\"hellogist.txt\",\"r\")\n", "mylist = fd.readlines()\n", "print(mylist)\n", "fd.close()" ] }, { "cell_type": "code", "execution_count": null, "id": "9e26366a", "metadata": {}, "outputs": [], "source": [ "with open(\"hellogist.txt\", 'r') as fd:\n", "    for line in fd:\n", "        print(line)" ] }, { "cell_type": "markdown", "id": "9d17d7d4", "metadata": {}, "source": [ "**Example 2: Download, Open and Read `family.csv` from course Gist**" ] }, { "cell_type": "code", "execution_count": null, "id": "104771c4", "metadata": {}, "outputs": [], "source": [ "import urllib.request\n", "\n", "myurl = 'https://gist.githubusercontent.com/arifpucit/6e2d95002460db296506ec6f0cfb7008/raw/9dcad33321c01194dffe3586fc80c5a966a9494f/family.csv'\n", "\n", "\n", "urllib.request.urlretrieve(myurl, 'family.csv')\n" ] }, { "cell_type": "code", "execution_count": null, "id": "82eeed80", "metadata": {}, "outputs": [], "source": [ "fd = open(\"family.csv\",\"r\")\n", "mylist = fd.readlines()\n", "print(mylist)\n", "fd.close()" ] }, { "cell_type": 
"markdown", "id": "30562203", "metadata": {}, "source": [ "**Example 3: Download, Open and Read `myself.txt` from course Gist**" ] }, { "cell_type": "code", "execution_count": null, "id": "70927e3e", "metadata": {}, "outputs": [], "source": [ "import urllib.request  # import the submodule explicitly; a bare `import urllib` may not load it\n", "\n", "myurl = 'https://gist.githubusercontent.com/arifpucit/6e2d95002460db296506ec6f0cfb7008/raw/efee0050a52048215c8772063ba4ef47ecd2b514/myself.txt'\n", "\n", "# Download the file and save it as 'myself.txt' in the current directory\n", "urllib.request.urlretrieve(myurl, 'myself.txt')\n" ] }, { "cell_type": "code", "execution_count": null, "id": "07dc0fac", "metadata": {}, "outputs": [], "source": [ "fd = open(\"myself.txt\", \"r\")\n", "mylist = fd.readlines()  # read all lines into a list\n", "print(mylist)\n", "fd.close()" ] }, { "cell_type": "code", "execution_count": null, "id": "7b29612b", "metadata": {}, "outputs": [], "source": [ "with open(\"myself.txt\", 'r') as fd:\n", "    for line in fd:\n", "        print(line)" ] }, { "cell_type": "code", "execution_count": null, "id": "7909cce0", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "b492abe4", "metadata": {}, "outputs": [], "source": [ "# Display the contents of a file line by line\n", "def display(fname):\n", "    with open(fname, \"r\") as fd:\n", "        for line in fd:\n", "            print(line, end='')  # each line already ends with a newline\n", "\n", "display('longfile.txt')" ] }, { "cell_type": "markdown", "id": "14b48a0f", "metadata": {}, "source": [ "### Write a Python program to display lines of a file in reverse order" ] }, { "cell_type": "code", "execution_count": null, "id": "77ad2483", "metadata": {}, "outputs": [], "source": [ "def display(fname):\n", "    with open(fname, \"r\") as fd:\n", "        mylist = fd.readlines()\n", "    # Print from the last line to the first\n", "    for line in reversed(mylist):\n", "        print(line.rstrip())\n", "\n", "display('longfile.txt')" ] }, { "cell_type": "code", "execution_count": null, "id": "943611d1", "metadata": {}, "outputs": [], "source": [ "# Display the last `count` lines of a file in reverse order\n", "def display(fname, count):\n", "    with open(fname, \"r\") as fd:\n", "        mylist = fd.readlines()\n", "    a = 0  # number of lines printed so far\n", "    for line in reversed(mylist):\n", "        print(line.rstrip())\n", "        a = a + 1\n", "        if a == count:\n", "            break\n", "\n", "display('longfile.txt', 5)" ] }, { "cell_type": "code", "execution_count": null, "id": "67847a87", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "be4d2baa", "metadata": {}, "outputs": [], "source": [] },
{ "cell_type": "markdown", "id": "b6aad11e", "metadata": {}, "source": [ "### Write a Python program to count the number of lines in a file." ] }, { "cell_type": "code", "execution_count": null, "id": "b4049644", "metadata": {}, "outputs": [], "source": [ "def line_count(fname):\n", "    with open(fname, \"r\") as fd:\n", "        mylist = fd.readlines()\n", "    return len(mylist)\n", "\n", "line_count('longfile.txt')" ] }, { "cell_type": "markdown", "id": "8d6404d4", "metadata": {}, "source": [ "### Write a Python program to count the number of words in a file." ] }, { "cell_type": "code", "execution_count": null, "id": "8a025f48", "metadata": {}, "outputs": [], "source": [ "def word_count(fname):\n", "    with open(fname, \"r\") as fd:\n", "        mystring = fd.read()\n", "    # split() with no argument splits on any run of whitespace,\n", "    # so newlines and repeated spaces do not produce empty words\n", "    return len(mystring.split())\n", "\n", "word_count('f1.txt')" ] }, { "cell_type": "markdown", "id": "e6fb6b72", "metadata": {}, "source": [ "### Write a Python program to read a file and store its lines in a list (without the newline character at the end)" ] }, { "cell_type": "code", "execution_count": null, "id": "d63959e6", "metadata": {}, "outputs": [], "source": [ "def read_lines(fname):\n", "    mylist = []\n", "    with open(fname) as fd:\n", "        for myline in fd:\n", "            # strip only the trailing newline character\n", "            mylist.append(myline.rstrip('\\n'))\n", "    return mylist\n", "\n", "print(read_lines('longfile.txt'))\n" ] }, { "cell_type": "code", "execution_count": null, "id": "cd811fcb", "metadata": {}, "outputs": [], "source": [ "!cat f1.txt" ] }, { "cell_type": "markdown", "id": 
"5a63eb87", "metadata": {}, "source": [ "### Write a Python program to find all words of a given length in a file." ] }, { "cell_type": "code", "execution_count": null, "id": "67dee9d2", "metadata": {}, "outputs": [], "source": [ "def size_n_words(fname, n):\n", "    with open(fname, 'r') as fd:\n", "        mystring = fd.read()\n", "    mystring = mystring.replace('.', '')  # remove periods attached to words\n", "    mystring = mystring.replace(',', '')  # remove commas attached to words\n", "    mylist = mystring.split()  # split on any whitespace, including newlines\n", "    result_list = [word for word in mylist if len(word) == n]\n", "    return result_list\n", "\n", "size_n_words('f1.txt', 2)" ] }, { "cell_type": "code", "execution_count": null, "id": "0586b37c", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "8641469a", "metadata": {}, "outputs": [], "source": [ "# Count upper case letters, lower case letters and digits in a file\n", "def uld_count(fname):\n", "    with open(fname, \"r\") as fd:\n", "        data = fd.read()\n", "    ucase = lcase = digits = 0\n", "    for ch in data:\n", "        if ch.islower():\n", "            lcase += 1\n", "        elif ch.isupper():\n", "            ucase += 1\n", "        elif ch.isdigit():\n", "            digits += 1\n", "    return ucase, lcase, digits\n", "\n", "u, l, d = uld_count('longfile.txt')\n", "print(\"Upper Case:\", u)\n", "print(\"Lower Case:\", l)\n", "print(\"Digits count:\", d)" ] }, { "cell_type": "code", "execution_count": null, "id": "5f5bc6d3", "metadata": {}, "outputs": [], "source": [ "with open('longfile.txt', \"r\") as fd:\n", "    for line in fd:\n", "        print(line)" ] }, { "cell_type": "code", 
"execution_count": null, "id": "7af6cc28", "metadata": {}, "outputs": [], "source": [ "!cat f1.txt" ] }, { "cell_type": "code", "execution_count": null, "id": "e6df289a", "metadata": {}, "outputs": [], "source": [ "# Count how many times a word entered by the user occurs in a file\n", "def search_word(fname):\n", "    ctr = 0\n", "    word_search = input(\"Enter the word to search:\")\n", "    with open(fname, \"r\") as fd:\n", "        for line in fd:\n", "            words = line.split()\n", "            for word in words:\n", "                if word == word_search:\n", "                    ctr += 1\n", "    return ctr\n", "\n", "rv = search_word('f1.txt')\n", "print(\"Word found \", rv, \" times in the file\")" ] }, { "cell_type": "code", "execution_count": null, "id": "4f079825", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "9f585313", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "4fdad6a8", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "df03c6c5", "metadata": {}, "outputs": [], "source": [ "# Copy the contents of one text file to another\n", "def mycopy(src, dest):\n", "    with open(src, \"r\") as fd1:\n", "        data = fd1.read()\n", "    with open(dest, \"w\") as fd2:\n", "        fd2.write(data)\n", "\n", "mycopy('longfile.txt', 'temp.txt')\n", "!cat temp.txt" ] }, { "cell_type": "code", "execution_count": null, "id": "c752b0ee", "metadata": {}, "outputs": [], "source": [ "# Merge two text files into 'merge.txt'\n", "def mymerge(file1, file2):\n", "    with open(file1, \"r\") as fd1:\n", "        data1 = fd1.read()\n", "\n", "    with open(file2, \"r\") as fd2:\n", "        data2 = fd2.read()\n", "\n", "    with open('merge.txt', \"w\") as fd3:\n", "        fd3.write(data1)\n", "        fd3.write(data2)\n", "\n", "mymerge('hello.txt', 'f1.txt')\n", "!cat merge.txt" ] }, { "cell_type": "code", "execution_count": null, "id": "0657be30", "metadata": {}, "outputs": [], "source": [ "# Same merge as above, with the file names hard-coded\n", "def program3():\n", "    with open(\"hello.txt\", \"r\") as f1:\n", "        data = f1.read()\n", "    with open(\"f1.txt\", \"r\") as f2:\n", "        data1 = f2.read()\n", "    with open(\"merge.txt\", \"w\") as f3:\n", "        f3.write(data)\n", "        f3.write(data1)\n", "\n", "program3()\n", "!cat hello.txt\n", "!cat f1.txt\n", "!cat merge.txt" ] },
{ "cell_type": "code", "execution_count": null, "id": "ce791f32", "metadata": {}, "outputs": [], "source": [ "# Attributes and methods of a file object\n", "fd = open(\"hellogist.txt\", \"r\")\n", "print(fd.encoding)\n", "print(fd.name)\n", "print(fd.closed)\n", "print(fd.mode)\n", "\n", "print(fd.readable())\n", "print(fd.writable())\n", "print(fd.fileno())\n", "print(fd.isatty())\n", "print(next(fd))  # Python 3 file objects have no next() method; use the built-in next()\n", "fd.close()" ] }, { "cell_type": "code", "execution_count": null, "id": "a5d45bbe", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "bba48430", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "910cbfeb", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "d759f665", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "c24431e7", "metadata": {}, "source": [ "# Binary Files" ] }, { "cell_type": "code", "execution_count": null, "id": "e23ca08e", "metadata": { "scrolled": true }, "outputs": [], "source": [ "# Write a list of small integers as raw bytes, then read them back\n", "with open('numbers.bin', 'wb') as fd:\n", "    numbers = [1, 2, 3, 4]\n", "    fd.write(bytes(numbers))\n", "\n", "with open('numbers.bin', 'rb') as fd:\n", "    data = fd.read()\n", "    print(data)" ] }, { "cell_type": "code", "execution_count": null, "id": "3e35cb13", "metadata": {}, "outputs": [], "source": [ "# A string must be encoded to bytes before it is written in binary mode\n", "with open('string.bin', 'wb') as fd:\n", "    mystring = \"Python123\"\n", "    fd.write(mystring.encode('utf8'))\n", "\n", "with open('string.bin', 'rb') as fd:\n", "    data = fd.read()\n", "    print(data)" ] }, { "cell_type": "markdown", "id": "3da5dd43", "metadata": {}, "source": [ "# pickle module" ] }, { "cell_type": "code", "execution_count": null, "id": "fcd0c1e0", "metadata": {}, "outputs": [], "source": [ "import pickle\n", "\n", "# Each dump() call writes one object; each load() call reads one back, in the same order\n", "with open('temp.bin', 'wb') as fd:\n", "    pickle.dump(['Arif', 52, 2.3], fd)\n", "    pickle.dump(['Rauf', 53, 4.5], fd)\n", "\n", "with open('temp.bin', 'rb') as fd:\n", "    data = pickle.load(fd)\n", "    print(data)\n", "    data = pickle.load(fd)\n", "    print(data)" ] },
{ "cell_type": "code", "execution_count": null, "id": "5c859dad", "metadata": {}, "outputs": [], "source": [ "# Standard code to write data to a binary file\n", "import pickle\n", "with open('students.bin', 'wb') as fd:\n", "    while True:\n", "        op = int(input(\"Enter 1 to add data and 0 to quit...\"))\n", "        if op == 1:\n", "            name = input(\"Enter name:\")\n", "            rollno = input(\"Enter rollno:\")\n", "            marks = float(input(\"Enter marks:\"))\n", "            data = [name, rollno, marks]\n", "            pickle.dump(data, fd)\n", "        elif op == 0:\n", "            break" ] }, { "cell_type": "code", "execution_count": 17, "id": "929fe10f", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['arif', '1', 22.0]\n", "['rauf', '2', 33.0]\n", "['hadeed', '3', 72.3]\n", "['mujahid', '4', 88.0]\n" ] } ], "source": [ "# Standard code to read entire binary file\n", "import pickle\n", "with open('students.bin', 'rb') as fd:\n", "    while True:\n", "        try:\n", "            data = pickle.load(fd)\n", "            print(data)\n", "        except EOFError:  # raised when no more pickled objects remain\n", "            break" ] }, { "cell_type": "code", "execution_count": null, "id": "3372347e", "metadata": {}, "outputs": [], "source": [ "!cp students.bin backup.bin" ] }, { "cell_type": "code", "execution_count": 12, "id": "0c8e9130", "metadata": {}, "outputs": [], "source": [ "!cp backup.bin students.bin" ] }, { "cell_type": "code", "execution_count": 16, "id": "047b4daf", "metadata": {}, "outputs": [], "source": [ "# Update a specific record in place\n", "# (safe only when the re-pickled record has the same size as the original)\n", "import pickle\n", "with open('students.bin', 'rb+') as fd:\n", "    while True:\n", "        try:\n", "            posn = fd.tell()  # remember where this record starts\n", "            record = pickle.load(fd)\n", "            if record[0] == 'hadeed':\n", "                record[2] = 72.3\n", "                fd.seek(posn, 0)  # move back to the start of the record\n", "                pickle.dump(record, fd)\n", "        except EOFError:\n", "            break" ] }, { "cell_type": "code", "execution_count": null, "id": "c1c5e999", "metadata": {}, "outputs": [], "source": [] } ], 
"metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.7" } }, "nbformat": 4, "nbformat_minor": 5 }