{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true, "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "from IPython.display import HTML\n", "HTML('')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Structures like these are encoded in \"PDB\" files" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "![pdb_header](https://raw.githubusercontent.com/harmsm/pythonic-science/master/chapters/03_dealing-with-files/data/pdb_header.png)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "![pdb_atoms](https://raw.githubusercontent.com/harmsm/pythonic-science/master/chapters/03_dealing-with-files/data/pdb_atoms.png)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### How can we parse a complicated file like this one?" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": false, "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "import pandas as pd\n", "pd.read_table(\"data/1stn.pdb\")" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### We can do better by manually *parsing* the file." 
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "### Our test file\n", "![test_file](https://raw.githubusercontent.com/harmsm/pythonic-science/master/chapters/03_dealing-with-files/data/test-file.png)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Predict what this will print" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true, "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "f = open(\"test-file.txt\")\n", "print(f.readlines())\n", "f.close()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Predict what this will print" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "f = open(\"test-file.txt\")\n", "for line in f.readlines():\n", " print(line)\n", "f.close()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Predict what this will print" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "f = open(\"test-file.txt\")\n", "for line in f.readlines():\n", " print(line,end=\"\")\n", "f.close()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Basic file reading operations: \n", "\n", "+ Open a file for reading: `f = open(SOME_FILE_NAME)` \n", "+ Read all lines of the file into a list: `f.readlines()`\n", "+ Read one line from the file: `f.readline()`\n", "+ Read the whole file into a string: `f.read()`\n", "+ Close the file: `f.close()`\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Now what do we do with each line?" 
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Predict what the following program will do" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "f = open(\"test-file.txt\")\n", "for line in f.readlines():\n", " print(line.split())\n", "f.close() " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Predict what the following program will do" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "f = open(\"test-file.txt\")\n", "for line in f.readlines():\n", " print(line.split(\"1\"))\n", "f.close() " ] }, { "cell_type": "markdown", "metadata": { "scrolled": true, "slideshow": { "slide_type": "slide" } }, "source": [ "### Splitting strings\n", "\n", "+ `SOME_STRING.split(CHAR_TO_SPLIT_ON)` allows you to split strings into a list. \n", "+ If `CHAR_TO_SPLIT_ON` is not defined, it will split on *all* whitespace (\" \",\"\\t\",\"\\n\",\"\\r\")\n", "+ \"\\t\" is TAB, \"\\n\" is NEWLINE, \"\\r\" is CARRIAGE_RETURN. 
" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Predict what the following will do\n", "![test_file](https://raw.githubusercontent.com/harmsm/pythonic-science/master/chapters/03_dealing-with-files/data/test-file.png)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "f = open(\"test-file.txt\")\n", "lines = f.readlines()\n", "f.close()\n", "\n", "line_of_interest = lines[-1]\n", "value = line_of_interest.split()[0]\n", "print(value)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Predict what will happen:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "print(value*5)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "`value` is a string of \"1.5\". You can't do math on it yet. " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### The solution is to *cast* it into a float" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "value_as_float = float(value) \n", "print(value_as_float*5)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Cast calls:\n", "\n", "`float`, `int`, `str`, `list`, `tuple`" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "list(\"1.5\")" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Write a program that grabs the \"1\" from the first line in the file and multiplies it by 75. 
" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "f = open(\"test-file.txt\")\n", "lines = f.readlines()\n", "f.close()\n", "\n", "value = lines[0].split(\" \")[1]\n", "value_as_int = int(value)\n", "print(value_as_int*75)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## What about *writing* to files?" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "### Basic file writing operations: \n", "\n", "+ Open a file for writing: `f = open(SOME_FILE_NAME,'w')` **will wipe out file immediately!**\n", "+ Open a file to append: `f = open(SOME_FILE_NAME,'a')`\n", "+ Write a string to a file: `f.write(SOME_STRING)`\n", "+ Write a list of strings: `f.writelines([STRING1,STRING2,...])`\n", "+ Close the file: `f.close()`" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "def file_printer(file_name):\n", " f = open(file_name)\n", " for line in f.readlines():\n", " print(line,end=\"\")\n", " f.close()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Predict what this code will do" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "a_list = [\"a\",\"b\",\"c\"]\n", "f = open(\"another-file.txt\",\"w\")\n", "for a in a_list:\n", " f.write(a)\n", "f.close()\n", "file_printer(\"another-file.txt\")" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Predict what this code will do" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "a_list = [\"a\",\"b\",\"c\"]\n", "f = open(\"another-file.txt\",\"w\")\n", "for a in a_list:\n", " 
f.write(a)\n", " f.write(\"\\n\")\n", "f.close()\n", "file_printer(\"another-file.txt\")" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Predict what this code will do" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "a_list = [\"a\",\"b\",\"ccat\"]\n", "f = open(\"another-file.txt\",\"w\")\n", "for a in a_list:\n", " f.write(\"A test {{}} {}\\n\".format(a))\n", "f.close()\n", "file_printer(\"another-file.txt\")" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true, "slideshow": { "slide_type": "slide" } }, "source": [ "### `format` lets you make pretty strings" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true, "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "print(\"The value is: {:}\".format(10.35151))\n", "print(\"The value is: {:.2f}\".format(10.35151))\n", "print(\"The value is: {:20.2f}\".format(10.35151))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "print(\"The value is: {:}\".format(10))\n", "print(\"The value is: {:20d}\".format(10))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### String formatting\n", "\n", "+ Pretty decimal printing: `\"{:LENGTH_OF_STRING.NUM_DECIMALSf}\".format(FLOAT)`\n", "+ Pretty integer printing: `\"{:LENGTH_OF_STRINGd}\".format(INT)`\n", "+ Pretty string printing: `\"{:LENGTH_OF_STRINGs}\".format(STRING)`" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Create a loop that prints 0 to 9 to a file. Each number should be on its own line, written to 3 decimal places. 
" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "f = open(\"junk\",\"w\")\n", "for i in range(10):\n", " f.write(\"{:.3f}\\n\".format(i))\n", "f.close()\n", "file_printer(\"junk\")" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Basic file reading operations: \n", "\n", "+ Open a file for reading: `f = open(SOME_FILE_NAME)` \n", "+ Read lines of file sequentially: `f.readlines()`\n", "+ Read one line from the file: `f.readline()`\n", "+ Read the whole file into a string: `f.read()`\n", "+ Close the file: `f.close()`\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Basic file writing operations: \n", "\n", "+ Open a file for writing: `f = open(SOME_FILE_NAME,'w')` **will wipe out file immediately!**\n", "+ Open a file to append: `f = open(SOME_FILE_NAME,'a')`\n", "+ Write a string to a file: `f.write(SOME_STRING)`\n", "+ Write a list of strings: `f.writeline([STRING1,STRING2,...])`\n", "+ Close the file: `f.close()`" ] }, { "cell_type": "markdown", "metadata": { "scrolled": true, "slideshow": { "slide_type": "slide" } }, "source": [ "### Splitting strings\n", "\n", "+ `SOME_STRING.split(CHAR_TO_SPLIT_ON)` allows you to split strings into a list. \n", "+ If `CHAR_TO_SPLIT_ON` is not defined, it will split on *all* whitespace (\" \",\"\\t\",\"\\n\",\"\\r\")\n", "+ \"\\t\" is TAB, \"\\n\" is NEWLINE, \"\\r\" is CARRIAGE_RETURN. 
" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### String formatting\n", "\n", "+ Pretty decimal printing: `\"{:LENGITH_OF_STRING.NUM_DECIMALSf}\".format(FLOAT)`\n", "+ Pretty integer printing: `\"{:LENGTH_OF_STRINGd}\".format(INT)`\n", "+ Pretty string printing: `\"{:LENGTH_OF_STRINGs}\".format(STRING)`" ] } ], "metadata": { "celltoolbar": "Slideshow", "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, "nbformat_minor": 1 }