{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## file-io demo \n", "\n", "Open and read text file, read first line & last line, also use binary read. Only binary read can use reverse order seek, faster for large file processing. To test on larger text files. \n", "\n", "Extends basic idea from Python Workout, by Reuven Lerner, c 2020 Manning, chapter 5 \n", "Start August 7, 2020 week, \n", " Updates 9/3/2020 week " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 0. Check python version, active conda env " ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Python 3.8.2\r\n" ] } ], "source": [ "!python --version" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "# conda environments:\r\n", "#\r\n", "base /home/jyoon/conda3\r\n", "dlpy * /home/jyoon/conda3/envs/dlpy\r\n", "fluentpy /home/jyoon/conda3/envs/fluentpy\r\n", "\r\n" ] } ], "source": [ "!conda env list \n", "# * is next to active conda env. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1. Start practice" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Hello! Welcome to demofile.txt\n", "This file is for testing purposes.\n", "Good Luck!\n", "\n" ] } ], "source": [ "f = open('demofile.txt', 'rt')\n", "blob = f.read() # read all \n", "print(blob)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "False\n" ] } ], "source": [ "print(f.closed) # check if f is closed at this point." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Hello! Welcome to demofile.txt\n", "\n" ] } ], "source": [ "f = open('demofile.txt') # Open again to go back to file start point. \n", "line = f.readline() # read first line, no S\n", "print(line)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Good Luck!\n", "\n" ] } ], "source": [ "f = open('demofile.txt')\n", "lines = f.readlines() # read all lines, iterable list. With S\n", "print(lines[-1]) # print last line\n", "f.close()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2. Binary format reads 'rb' " ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "b'Hello! Welcome to demofile.txt\\n' b'uck!\\n'\n" ] } ], "source": [ "# Try binary format reading. \n", "with open('demofile.txt', 'rb') as f: # binary format\n", " first = f.readline()\n", " f.seek(-5, 2) # Go to pointer 5th bytes from end of file \n", " #last = f.read() # does readline work? May not read all of the last line. \n", " last = f.readline() # readline does work. Still not all of the line. \n", " print(first, last)\n" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "b'Hello! Welcome to demofile.txt\\n'\n", " b'Good Luck!\\n'\n" ] } ], "source": [ "# Seek from end of file needs a loop. \n", "with open('demofile.txt', 'rb') as f: # binary format\n", " first = f.readline()\n", " \n", " f.seek(-2, 2) # Go to pointer 2 bytes relative to end \n", " while f.read(1) != b\"\\n\": # while read one byte is not equal to binary '\\n'\n", " f.seek(-2, 1) # go back 1 byte. \n", " # after while loop, this is beginning of last line \n", " last = f.read() # read all bytes from here. \n", " # last = f.readline() gives same output. \n", " \n", " print(str(first)+\"\\n\", str(last))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3. Good for short files, readlines() all at once, save, index [ ] into each line" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Hello! Welcome to demofile.txt\n", " Good Luck!\n", "\n", "True\n" ] } ], "source": [ "# Good for short text files. \n", "with open('demofile.txt', 'rt') as f: \n", " lines = f.readlines()\n", " first = lines[0]\n", " last = lines[-1]\n", " print(first, last)\n", "\n", "print(f.closed) # check file is cloased after \"with\" block end. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 4. seek(0) to go back to start, works with 'rt' " ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "lines: ['Hello! Welcome to demofile.txt\\n', 'This file is for testing purposes.\\n', 'Good Luck!\\n']\n", "blob: Hello! Welcome to demofile.txt\n", "This file is for testing purposes.\n", "Good Luck!\n", "\n", "one: ['Hello! Welcome to demofile.txt\\n']\n", "two: This file is for testing purposes.\n", "\n" ] } ], "source": [ "f2 = open('demofile.txt') \n", "lines = f2.readlines()\n", "print(\"lines: \", lines)\n", "\n", "f2.seek(0) # Goto beginning of file\n", "blob= f2.read()\n", "print(\"blob: \", blob)\n", "\n", "f2.seek(0)\n", "one = f2.readlines(1)\n", "print(\"one: \", one)\n", "\n", "f2.seek(0)\n", "two = f2.readlines()[1] # Works! prints 2nd item. nifty. \n", "print(\"two: \", two)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 5. Only binary read 'rb' can use relative indexing " ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "b'Hello! Welcome to demofile.txt\\n'\n", "b'Good Luck!\\n'\n", "77\n", "66\n", "b'Good Luck!\\n'\n" ] } ], "source": [ "# seek can be used with text format? \n", "# Yes, but can't do relative location other than 0, current point. \n", "# If the file is opened in text mode (without ‘b’), only offsets returned by tell() are legal. \n", "# none of the relative indexing from back/current location works without 'b' binary format. \n", "\n", "with open('demofile.txt', 'rb') as f: # need to be in binary format\n", " first = f.readline()\n", " print(first)\n", " \n", " f.seek(0, 0) \n", " last = f.readlines()[2]\n", " print(last)\n", " \n", " #f.seek(-2, 2) # not work, only 0 offset works with 2, 1 mode. \n", " f.seek(0, 2) # end of file work with text format file. \n", " print(f.tell())\n", " f.seek(-11, 1)\n", " print(f.tell())\n", " last = f.read()\n", " print(last) \n" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "77\n", "0\n", "Hello! Welcome to demofile.txt\n", " 31\n", "Good Luck!\n", " 77\n", "['This file is for testing purposes.\\n', 'Good Luck!\\n'] 77\n", "This file is for testing purposes.\n", " 66\n", "Good Luck!\n", "\n" ] } ], "source": [ "with open('demofile.txt') as f: # text format again\n", " f.seek(0, 2) # end of file \n", " loc = f.tell(); print(loc)\n", " \n", " f.seek(0, 0) # go back to beginning of tile \n", " print(f.tell())\n", " first = f.readline()\n", " print(first, f.tell())\n", " \n", " last = f.readlines()[-1]\n", " print(last, f.tell())\n", " \n", " f.seek(31, 0)\n", " two = f.readlines()\n", " print(two, f.tell())\n", " \n", " f.seek(31, 0)\n", " second = f.readline()\n", " print(second, f.tell())\n", " \n", " third = f.readline()\n", " print(third)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 6. Doen't solve large file last line problem, but another way to move around 'rt' format." ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "77\n", "11\n", "66\n", "Good Luck!\n", "\n" ] } ], "source": [ "with open('demofile.txt') as f: # text format again\n", " \n", " # num_ln = len(f.readlines()); print(num_ln)\n", " \n", " num_ch_all = len(f.read())\n", " print(num_ch_all)\n", " \n", " f.seek(0) # back to beginning\n", " num_ch_last = len(f.readlines()[-1]); \n", " print(num_ch_last)\n", " \n", " offset = num_ch_all - num_ch_last # offset from file beginning\n", " f.seek(offset, 0)\n", " print(f.tell())\n", " print(f.read())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 7. Read a slightly larger file, The Raven by Edgar Allen Poe " ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "first: The Raven\n", " \n", " last: Shall be lifted--nevermore!\n" ] } ], "source": [ "with open('TheRaven-Poe.txt') as raven: # read as default text format \n", " lines = raven.readlines()\n", " first = lines[0]\n", " last = lines[-1]\n", " print(\"first: \", first, \"\\n\", \"last: \", last)\n", " " ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "None\n", "['The Raven\\n', 'by\\n', 'Edgar Allan Poe\\n', '\\n', ' Once upon a midnight dreary, while I pondered, weak and weary,\\n', ' Over many a quaint and curious volume of forgotten lore--\\n', ' While I nodded, nearly napping, suddenly there came a tapping,\\n', ' As of some one gently rapping, rapping at my chamber door.\\n', ' \"\\'Tis some visitor,\" I muttered, \"tapping at my chamber door--\\n', ' Only this and nothing more.\"\\n']\n" ] } ], "source": [ "print(raven.close()) # check it it closed\n", "print(lines[0:10])" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Start of poem: \n", " Once upon a midnight dreary, while I pondered, weak and weary,\n", "\n" ] } ], "source": [ "print(\"Start of poem: \")\n", "print(lines[4])" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The Raven\n", "by\n", "Edgar Allan Poe\n", "\n", " Once upon a midnight dreary, while I pondered, weak and weary,\n", " Over many a quaint and curious volume of forgotten lore--\n", " While I nodded, nearly napping, suddenly there came a tapping,\n", " As of some one gently rapping, rapping at my chamber door.\n", " \"'Tis some visitor,\" I muttered, \"tapping at my chamber door--\n", " Only this and nothing more.\"\n", "\n", " Ah, distinctly I remember it was in the bleak December,\n", " And each separate dying ember wrought its ghost upon the floor.\n", " Eagerly I wished the morrow;--vainly I had sought to borrow\n", " From my books surcease of sorrow--sorrow for the lost Lenore--\n", " For the rare and radiant maiden whom the angels name Lenore--\n", " Nameless here for evermore.\n", "\n", " And the silken sad uncertain rustling of each purple curtain\n", " Thrilled me--filled me with fantastic terrors never felt before;\n", " So that now, to still the beating of my heart, I stood repeating\n", " \"'Tis some visiter entreating entrance at my chamber door--\n", " Some late visiter entreating entrance at my chamber door;\n", " This it is and nothing more.\"\n", "\n", " Presently my soul grew stronger; hesitating then no longer,\n", " \"Sir,\" said I, \"or Madam, truly your forgiveness I implore;\n", " But the fact is I was napping, and so gently you came rapping,\n", " And so faintly you came tapping, tapping at my chamber door,\n", " That I scarce was sure I heard you\"--here I opened wide the door--\n", " Darkness there and nothing more.\n", "\n", " Deep into that darkness peering, long I stood there wondering, fearing,\n", " Doubting, dreaming dreams no mortals ever dared to dream before;\n", " But the silence was unbroken, and the stillness gave no token,\n", " And the only word there spoken was the whispered word, \"Lenore?\"\n", " This I whispered, and an echo murmured back the word, \"Lenore!\"--\n", " Merely this and nothing more.\n", "\n", " Back into the chamber turning, all my soul within me burning,\n", " Soon again I heard a tapping something louder than before.\n", " \"Surely,\" said I, \"surely that is something at my window lattice;\n", " Let me see, then, what thereat is and this mystery explore--\n", " Let my heart be still a moment and this mystery explore;--\n", " 'Tis the wind and nothing more.\n", "\n", " Open here I flung the shutter, when, with many a flirt and flutter,\n", " In there stepped a stately Raven of the saintly days of yore.\n", " Not the least obeisance made he; not a minute stopped or stayed he,\n", " But, with mien of lord or lady, perched above my chamber door--\n", " Perched upon a bust of Pallas just above my chamber door--\n", " Perched, and sat, and nothing more.\n", "\n", " Then the ebony bird beguiling my sad fancy into smiling,\n", " By the grave and stern decorum of the countenance it wore,\n", " \"Though thy crest be shorn and shaven, thou,\" I said, \"art sure no craven,\n", " Ghastly grim and ancient Raven wandering from the Nightly shore--\n", " Tell me what thy lordly name is on the Night's Plutonian shore!\"\n", " Quoth the Raven, \"Nevermore.\"\n", "\n", " Much I marvelled this ungainly fowl to hear discourse so plainly,\n", " Though its answer little meaning--little relevancy bore;\n", " For we cannot help agreeing that no living human being\n", " Ever yet was blessed with seeing bird above his chamber door--\n", " Bird or beast upon the sculptured bust above his chamber door,\n", " With such name as \"Nevermore.\"\n", "\n", " But the Raven, sitting lonely on that placid bust, spoke only\n", " That one word, as if its soul in that one word he did outpour\n", " Nothing farther then he uttered; not a feather then he fluttered--\n", " Till I scarcely more than muttered: \"Other friends have flown before--\n", " On the morrow _he_ will leave me, as my Hopes have flown before.\"\n", " Then the bird said \"Nevermore.\"\n", "\n", " Startled at the stillness broken by reply so aptly spoken,\n", " \"Doubtless,\" said I, \"what it utters is its only stock and store,\n", " Caught from some unhappy master whom unmerciful Disaster\n", " Followed fast and followed faster till his songs one burden bore--\n", " Till the dirges of his Hope that melancholy burden bore\n", " Of 'Never--nevermore.'\"\n", "\n", " But the Raven still beguiling all my sad soul into smiling,\n", " Straight I wheeled a cushioned seat in front of bird and bust and door;\n", " Then, upon the velvet sinking, I betook myself to linking\n", " Fancy unto fancy, thinking what this ominous bird of yore--\n", " What this grim, ungainly, ghastly, gaunt, and ominous bird of yore\n", " Meant in croaking \"Nevermore.\"\n", "\n", " This I sat engaged in guessing, but no syllable expressing\n", " To the fowl whose fiery eyes now burned into my bosom's core;\n", " This and more I sat divining, with my head at ease reclining\n", " On the cushion's velvet lining that the lamp-light gloated o'er,\n", " But whose velvet violet lining with the lamp-light gloating o'er\n", " _She_ shall press, ah, nevermore!\n", "\n", " Then, methought, the air grew denser, perfumed from an unseen censer\n", " Swung by Seraphim whose foot-falls tinkled on the tufted floor.\n", " \"Wretch,\" I cried, \"thy God hath lent thee--by these angels he hath sent thee\n", " Respite--respite and nepenthe from thy memories of Lenore!\n", " Quaff, oh quaff this kind nepenthe and forget this lost Lenore!\"\n", " Quoth the Raven, \"Nevermore.\"\n", "\n", " \"Prophet!\" said I, \"thing of evil!--prophet still, if bird or devil!--\n", " Whether Tempter sent, or whether tempest tossed thee here ashore,\n", " Desolate, yet all undaunted, on this desert land enchanted--\n", " On this home by Horror haunted--tell me truly, I implore--\n", " Is there--_is_ there balm in Gilead?--tell me--tell me, I implore!\"\n", " Quoth the Raven, \"Nevermore.\"\n", "\n", " \"Prophet!\" said I, \"thing of evil!--prophet still, if bird or devil!\n", " By that Heaven that bends above us--by that God we both adore--\n", " Tell this soul with sorrow laden if, within the distant Aidenn,\n", " It shall clasp a sainted maiden whom the angels name Lenore--\n", " Clasp a rare and radiant maiden whom the angels name Lenore.\"\n", " Quoth the Raven, \"Nevermore.\"\n", "\n", " \"Be that our sign of parting, bird or fiend!\" I shrieked, upstarting--\n", " \"Get thee back into the tempest and the Night's Plutonian shore!\n", " Leave no black plume as a token of that lie thy soul has spoken!\n", " Leave my loneliness unbroken!--quit the bust above my door!\n", " Take thy beak from out my heart, and take thy form from off my door!\"\n", " Quoth the Raven, \"Nevermore.\"\n", "\n", " And the Raven, never flitting, still is sitting, still is sitting\n", " On the pallid bust of Pallas just above my chamber door;\n", " And his eyes have all the seeming of a demon's that is dreaming\n", " And the lamp-light o'er him streaming throws his shadows on the floor;\n", " And my soul from out that shadow that lies floating on the floor\n", " Shall be lifted--nevermore!\n" ] } ], "source": [ "with open('TheRaven-Poe.txt') as raven: # read again as text \n", " blob = raven.read() # Read whole thing \n", " print(blob)" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "b'The Raven\\r\\n'\n", "b' Once upon a midnight dreary, while I pondered, weak and weary,\\r\\n'\n", "b' Shall be lifted--nevermore!'\n" ] } ], "source": [ "with open('TheRaven-Poe.txt', 'rb') as f: # read as binary format \n", " first = f.readline()\n", " print(first)\n", " \n", " for c in (1,2,3): # Move pointer past 3 lines\n", " start = f.readline()\n", " start = f.readline() # Next line is start of poem \n", " print(start)\n", " \n", " f.seek(-2, 2) # Go to pointer 2 bytes relative to end \n", " while f.read(1) != b\"\\n\": # while read one byte is not equal to binary '\\n'\n", " f.seek(-2, 1) # go back 1 byte. \n", " # after while loop, this is beginning of last line \n", " last = f.read() # read all bytes from here. \n", " # last = f.readline() gives same output. \n", " print(last)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 8. Next, to test speed using 150 - 200 MB text file from Google Ngram dataset." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3.8.2 64-bit ('dlpy': conda)", "language": "python", "name": "python38264bitdlpycondaee83557e2f484524a67e008c475209c4" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.2" } }, "nbformat": 4, "nbformat_minor": 2 }