{ "cells": [ { "cell_type": "raw", "metadata": {}, "source": [ "---\n", "reference-location: margin\n", "citation-location: margin\n", "execute:\n", " freeze: auto\n", " cache: true\n", "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# File I/O (Input/Output)\n", "\n", "[Jupyter Notebook](https://lancejnelson.github.io/PH135/jupyter/io.ipynb)\n", "\n", "\n", "Up to now, we have been working with computer-generated or manually typed data sets. Often in a scientific setting your data will be stored in a file and you will need to read the contents of the file into Python so you can perform an analysis. Most data files are *text* files, but there is a large variety of these that differ mostly in the way the information is formatted. The file extension (i.e., the 3 or 4 letters after the period at the end of a file name) specifies the formatting of the file. For example, a *.csv* file (short for comma-separated values) has commas to separate the information.\n", "\n", "\n", "## The `os` module\n", "Before we learn how to read and write data files, we should first learn how to navigate our computer's file system from within Python. For this, we will use a module called `os` (short for operating system). The `os` module allows you to perform standard operations on the files and folders on your computer. When opening files from a Python script, file names are looked up relative to the current working directory, which is usually the directory where the current file is located. If you want to open files located in other directories, you may need to use the `os` module to navigate there first. The following functions are the most commonly used ones:\n", "\n", "\n", "|Function|Description|\n", "|--------|-----------|\n", "|`getcwd()`| Short for get current working directory. 
Returns a string of the current directory.|\n", "|`chdir(path)`| Change the current working directory to `path`.|\n", "|`listdir()`| List all of the files and folders in the current working directory.|\n", "|`mkdir(path)`| Make a new directory at location `path`.|\n", "\n", "In the cell below, we show the usage of these functions." ] }, { "cell_type": "code", "metadata": {}, "source": [ "#| eval: false\n", "from os import getcwd,chdir,listdir\n", "\n", "currentdir = getcwd() \n", "myfiles = listdir()\n", "#chdir(\"/path/to/new/directory\")" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ ">**_To Do:_**\n", "> \n", "> 1. Add print statements to the cell above and inspect the results to understand what each function does.\n", "> 2. Modify the third statement above (the one that uses `chdir`) to change the current working directory to one that actually exists on your machine.\n", "> 3. Everyone has a \"Downloads\" folder on their computer. Use the `chdir` function to change the current working directory to the Downloads folder.\n", "\n", "\n", "The `os` module has many, many more functions that do useful things, but you will mostly use the functions mentioned above. As you get more experience using your computer at the command line, you will be better equipped to understand the usefulness of the rest of this library. \n", "\n", "## Reading Files\n", "### Reading Line by Line\n", "\n", "The first way to read a file is using a `for` loop to iterate over the file line by line. Admittedly, this is not the most elegant or efficient way to read a file but we present it first because it always works. First, the file is opened using the `open` function. The file should be attached to a variable for later use. Next, the `readlines()` function is used to read the file in as **a list of strings**, one string for each line in the file. 
We then use a `for` loop to iterate over this list, effectively parsing the file line by line. Finally, the file should be closed when you're finished reading it. Let's see an example for reading in the following file, which will be named `squares.csv`. (You can download [squares.csv here](https://lancejnelson.github.io/PH135/files/squares.csv) if you want to execute the cells below without an error.)\n", "\n", "\n", "```\n", "1, 1\n", "2, 4\n", "3, 9\n", "4, 16\n", "5, 25\n", "6, 36\n", "7, 49\n", "```\n" ] }, { "cell_type": "code", "metadata": {}, "source": [ "file = open(\"squares.csv\")\n", "lines = file.readlines() # Read the file into a list of strings, one string for each line in the file.\n", "for line in lines: # Iterate over the list of strings.\n", "    print(line) # Do something with each line (here we just print it).\n", "\n", "file.close()" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can see how each line gets read separately and printed off. But this isn't super useful yet because we'd probably like to have the numbers stored in lists for our forthcoming analysis. We can fix this by creating some empty lists and appending the appropriate values as they are read in. \n" ] }, { "cell_type": "code", "metadata": {}, "source": [ "numbers = []\n", "squares = []\n", "\n", "file = open(\"squares.csv\")\n", "for line in file.readlines():\n", "    numbers.append( int(line.split(',')[0]) )\n", "    squares.append( int(line.split(',')[1]) )\n", "\n", "file.close()\n", "\n", "print(numbers)\n", "print(squares)" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now the data from the file is saved in Python lists and our analysis can proceed. \n", "\n", "One final note: if all you need to do is read the first few lines of a file, the `readline()` function can be used (not `readlines()`) and each call to this function will read the next line in the file."
] }, { "cell_type": "code", "metadata": {}, "source": [ "file = open(\"squares.csv\")\n", "lineOne = file.readline()\n", "lineTwo = file.readline()\n", "lineThree = file.readline()\n", "file.close()\n" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Using NumPy's `genfromtxt` function\n", "\n", "If the data file is highly structured (every line looks the same, separator character is consistent across the file, etc.), then NumPy's `genfromtxt` function can read the data very efficiently into an array. The `genfromtxt` function requires only one argument (the file name); an optional argument (`delimiter`) is typically included to specify the character used to separate the data. Below is an example of using this function to read in the .csv data we have been working with.\n" ] }, { "cell_type": "code", "metadata": {}, "source": [ "from numpy import genfromtxt\n", "\n", "data = genfromtxt('squares.csv',delimiter = \",\")\n", "\n", "print(data)" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice that the data was read into a NumPy array (2D in this case because there were two columns of data), which means that we can easily slice off the individual columns if needed." ] }, { "cell_type": "code", "metadata": {}, "source": [ "from numpy import genfromtxt\n", "\n", "data = genfromtxt('squares.csv',delimiter = \",\")\n", "\n", "numbers = data[:,0]\n", "squares = data[:,1]\n", "print(numbers)\n", "print(squares)\n" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Other optional arguments that can be used with the `genfromtxt` function are given in the table below:\n", "\n", "\n", "\n", "| Argument | Description |\n", "|--------|------------|\n", "| `delimiter` |The string used to separate values. 
By default, whitespace acts as the delimiter.|\n", "| `skip_header` |The number of lines to skip at the beginning of a file.|\n", "| `skip_footer` |The number of lines to skip at the end of a file.|\n", "| `usecols` |Specify which columns to read with 0 being the first. For example, `usecols = (0,2,5)` will read the 1st, 3rd, and 6th columns.|\n", "| `comments` |The character used to indicate the start of a comment. Everything on a line after this character is discarded.|\n", "\n", "\n", "\n", "\n", "\n", "## Writing Files\n", "Writing Python data to file is as simple as reading a file. Just like when you are reading a file, the method to use is determined by the type of data that you are writing. If your data is strictly numerical information stored in an array, NumPy has a function that will quickly save the data to a file. If your data contains text or other non-numerical entries, or is otherwise irregular, you'll have to use Python's native `write` function. \n", "\n", "### Writing Line by Line\n", "\n", "Sometimes the data file that you want to write includes some text or other non-numerical data. For example, what if you wanted to write the following data to a file:\n", "\n", "\n", "| Planet | Acceleration due to gravity (m/s$^2$)|\n", "|--------|------------|\n", "| Earth | 9.8|\n", "| Moon | 1.6|\n", "| Mars | 3.7|\n", "| Venus | 8.83|\n", "| Saturn | 11.2|\n", "| Uranus | 10.5|\n", "| Neptune | 13.3|\n", "| Pluto | 0.61|\n", "| Jupiter | 24.5|\n", "| Sun | 275|\n", "\n", "In this case you must open the file you want to write to and write each line of the file one by one. This requires that you use a loop to iterate over the data. An example is given below."
] }, { "cell_type": "code", "metadata": {}, "source": [ "planets = [\"Earth\",\"Moon\",\"Mars\",\"Venus\",\"Saturn\",\"Uranus\",\"Neptune\",\"Pluto\",\"Jupiter\",\"Sun\"]\n", "g=[9.8,1.6,3.7,8.83,11.2,10.5,13.3,0.61,24.5,275]\n", "\n", "f= open(\"planets.txt\",\"w\")\n", "\n", "f.write(\"Planet g (m/s^2)\\n\")\n", "f.write(\"------------------\\n\")\n", "\n", "for idx,planet in enumerate(planets):\n", "    f.write(f\"{planet:10s} {g[idx]:5.2f} \\n\" )\n", "\n", "f.close()" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The `writelines` function\n", "If the data you want to write to file is already in a list of strings, the `writelines()` function can be called once to write all of the lines to the file. Note that `writelines()` does not add line endings, so each string must end with its own newline character." ] }, { "cell_type": "code", "metadata": {}, "source": [ "planets = [\"Planet g (m/s^2)\",\"------------------\", \"Earth 9.8\",\"Moon 1.6\",\"Mars 3.7\",\"Venus 8.83\",\"Saturn 11.2\",\"Uranus 10.5\",\"Neptune 13.3\",\"Pluto 0.61\",\"Jupiter 24.5\",\"Sun 275\"]\n", "\n", "f= open(\"planets.txt\",\"w\")\n", "\n", "f.writelines(line + \"\\n\" for line in planets) # writelines() adds no line endings, so append one to each string\n", "\n", "f.close()" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The `with` statement\n", "Forgetting to close a file that you are writing to can be problematic, so you should always call the `close()` function when you are done. If you are worried about forgetting it, you can use a `with` block that will automatically close the file once the block terminates. 
An example is given below:\n" ] }, { "cell_type": "code", "metadata": {}, "source": [ "planets = [\"Earth\",\"Moon\",\"Mars\",\"Venus\",\"Saturn\",\"Uranus\",\"Neptune\",\"Pluto\",\"Jupiter\",\"Sun\"]\n", "g=[9.8,1.6,3.7,8.83,11.2,10.5,13.3,0.61,24.5,275]\n", "\n", "with open(\"planets.txt\",\"w\") as f:\n", "    f.write(\"Planet g (m/s^2)\\n\")\n", "    f.write(\"------------------\\n\")\n", "\n", "    for idx,planet in enumerate(planets):\n", "        f.write(f\"{planet:10s} {g[idx]:5.2f} \\n\" )" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Even though no `close()` function is called, the file `planets.txt` will be automatically closed once the `with` block is terminated. A `with` block can also be used when reading data files.\n", "\n", "### Using NumPy's `savetxt` function\n", "\n", "If the data is strictly numerical and contains no text, the `savetxt` function is a fast and efficient way to write the data to file. For example, maybe you have a two-dimensional array containing various columns of planetary data.\n", "\n", "\n", "\n", "| Radius ( $\\times 10^{6}$ meters ) | Mass ( $\\times 10^{23}$ kg ) | Acceleration due to gravity (m/s$^2$) |\n", "|--------|------|------------|\n", "| 6.37 | 59.8 | 9.8|\n", "| 1.74 | 0.736 | 1.6|\n", "| 3.38 | 6.42 | 3.7|\n", "| 6.07 | 48.8 | 8.83|\n", "| 58.2 | 5680 | 11.2|\n", "| 23.5 | 868 | 10.5|\n", "| 22.7 | 1030 | 13.3|\n", "| 1.15 | 0.131 | 0.61|\n", "| 69.8 | 19000 | 24.5|\n", "| 696 | 19890000 | 275|\n" ] }, { "cell_type": "code", "metadata": {}, "source": [ "from numpy import savetxt\n", "\n", "data =[[6.37,59.8,9.8],[1.74,0.736,1.6],[3.38,6.42,3.7],[6.07,48.8,8.83],[58.2,5680,11.2],[23.5,868,10.5],[22.7,1030,13.3],[1.15,.131,0.61],[69.8,19000,24.5],[696,19890000,275]]\n", "\n", "savetxt(\"planetsData.txt\",data,fmt = \"%5.2e\")\n" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `fmt` keyword argument can be added to 
specify how to format the data. If the data is stored in separate one-dimensional arrays, you can pack them into a single list (or tuple) when using the `savetxt` function and it will write each data set to its own line in the file.\n" ] }, { "cell_type": "code", "metadata": {}, "source": [ "from numpy import savetxt\n", "\n", "radius = [6.37,1.74,3.38,6.07,58.2,23.5,22.7,1.15,69.8,696]\n", "mass = [59.8,0.736,6.42,48.8,5680,868,1030,0.131,19000,19890000]\n", "g = [9.8,1.6,3.7,8.83,11.2,10.5,13.3,0.61,24.5,275]\n", "\n", "savetxt(\"planetsDataTwo.txt\",(radius,mass,g),fmt = \"%5.2e\")" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Other optional arguments that can be used with the `savetxt` function are given in the table below:\n", "\n", "\n", "\n", "| Argument | Description |\n", "|--------|------------|\n", "| `delimiter` |String or character to separate columns.|\n", "| `newline` |String or character to separate lines.|\n", "| `header` |String to be written at the beginning of the file.|\n", "| `footer` |String to be written at the end of the file.|\n", "| `comments` |String that will be prepended to the header and footer strings to mark them as comments.|\n", "\n", "\n", "\n", "## Flashcards\n", "1. What does `os.getcwd()` do?\n", "2. What does `os.chdir()` do?\n", "3. What does `os.mkdir()` do?\n", "4. What does `os.listdir()` do?\n", "5. How do you open a file and read all of the lines in that file using `readlines()`?\n", "6. How do you open a file and read all of the lines in that file using `numpy.genfromtxt()`?\n", "7. What is the `delimiter` keyword argument used for when reading a file with `numpy.genfromtxt()`?\n", "8. How do you open and write to a file using the `write` function?\n", "9. How do you write to a file using `numpy.savetxt`?\n", "10. Where are the sacrament prayers located? (They are found in more than one location)\n", "\n", "## Exercises\n", "1. Use all of the functions from the `os` library that we mentioned in the text. 
I'll let you choose where to change your directory to and the name of the directory to create. I just want you to use all of them until you feel comfortable. \n" ] }, { "cell_type": "code", "metadata": {}, "source": [ "# Python code here" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "2. In a previous homework problem you calculated the moments of inertia for a data set. In the cell below, you will find arrays containing 1000 values for mass, length, and radius, just as in the other homework problem.\n", "    1. Using a single line of code, calculate the moments of inertia for the entire data set using the familiar formula: $$ I = {1\\over 4} M R^2 + {1\\over 12} M L^2$$\n", "    2. Write this data to file first using the `open` and `write` functions. Mass values should be in column 1, lengths in column 2, radii in column 3, and moments of inertia in column 4. Add a header to the file to label the columns. Open the file and inspect it to verify that you did it right.\n", "    3. Repeat step 2 using the `numpy.savetxt` function. Open the file and inspect it to verify that you did it right.\n" ] }, { "cell_type": "code", "metadata": {}, "source": [ "#| eval: false\n", "from numpy.random import normal\n", "\n", "mass = normal(5,.5,1000)\n", "length = normal(2,.2,1000)\n", "radius = normal(1,.1,1000)" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "3. [This file](https://lancejnelson.github.io/PH135/files/planets.csv) contains planetary data for all of the planets in our solar system formatted as a \".csv\" file. \n", "    1. Download the file and move it to a location on your computer that you are familiar with. \n", "    2. Read the file using the `open` and `readlines` functions.\n", "    3. Repeat step 2 using `numpy.genfromtxt()`.\n", "    4. Using a single line of code, calculate the average eccentricity of the planets.\n", "    5. 
Using a single line of code each, determine the maximum and minimum acceleration due to gravity of the planets.\n", "\n", "    Tip: You'll need to use the `skip_header` and `usecols` keyword arguments to tell `genfromtxt` to skip the first row and column.\n" ] }, { "cell_type": "code", "metadata": {}, "source": [ "# Python code here" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "4. [This file](https://lancejnelson.github.io/PH135/files/OUTCAR.static) contains the output from a quantum mechanical calculation on a cobalt-based alloy. The file is pretty large and the data is not well-structured at all even though there is valuable information inside. Reading this file is definitely a job for `open` and `readlines` rather than `numpy.genfromtxt()`. Periodically in this file there are lines that look like this: \n", "` free energy TOTEN = -72.3092 eV` \n", "and you need to extract the numbers from all of these lines. Use the following steps to build a list containing these numbers: \n", "    1. Use the `open` and `readlines` functions to read this file into a list of strings.\n", "    2. Iterate over this list and look for lines that contain the string \"TOTEN\".\n", "    3. When you find a line that has this string in it, `split` the string and extract the number. Make sure you convert the number into a float.\n", "    4. Append this number to a list that you defined initially to be empty. \n", " \n", "    When you have built the list of energies, determine: \n", "    1. The number of energies in your list.\n", "    2. The average of the last 5 entries.\n", "    3. The minimum and maximum energies in your list. \n" ] }, { "cell_type": "code", "metadata": {}, "source": [ "# Python code here" ], "execution_count": null, "outputs": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" } }, "nbformat": 4, "nbformat_minor": 4 }