{ "metadata": { "name": "special_topics_part_1" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": "# Running applications and data processing in Python\n\nThis tutorial provide useful tools in Python for: Script/code generation, running applicatios and operating system dependent functionalities, and working with files: reading, parsing, and writing.\n\nThe tutorial consists of three parts:\n\n* Introducing relevant Python modules and functions that are useful in the context of this tutorial\n* A complete example\n* Exercise" }, { "cell_type": "markdown", "metadata": {}, "source": "### Dealing with the file system and path\n\nThe `glob`, `os`, and `shutil` modules provide important cross-platform functionalities to deal with the file system. To see the full documentation you can refer to these links:\n\n[http://docs.python.org/2/library/glob.html](http://docs.python.org/2/library/glob.html)\n\n[http://docs.python.org/2/library/os.html](http://docs.python.org/2/library/os.html)\n\n[http://docs.python.org/2/library/shutil.html](http://docs.python.org/2/library/shutil.html)" }, { "cell_type": "code", "collapsed": false, "input": "import os\nfrom os.path import join as join_path\nhelp(os.getcwd)\nprint(\"----------------------------------------------------------\")\nhelp(os.chdir)\nprint(\"----------------------------------------------------------\")\nhelp(os.makedirs)\nprint(\"----------------------------------------------------------\")\nhelp(os.listdir)\nprint(\"----------------------------------------------------------\")\nhelp(os.remove)\nprint(\"----------------------------------------------------------\")\nhelp(join_path)\nprint(\"----------------------------------------------------------\")\nhelp(os.path.abspath)", "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": "# Change the curent directory to the location of the \"Special_topics_part_1\" directory\nos.chdir(\"/Users/malastm/new_ACM_tutorials/ACM-Python-Tutorials-KAUST-2014/Special_topics_part_1/\")\n\nprint(\"The current directory was : \" + os.getcwd())\n\ntutorial_example_dir = os.path.join(os.getcwd(), \"example\")\nos.chdir(tutorial_example_dir)\nprint(\"The current directory changed to: \" + os.getcwd())\n\nprint(\"Files in this directory: \", os.listdir('.'))", "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": "import glob\n\nhelp(glob.glob)", "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": "import shutil\n\nhelp(shutil.rmtree)", "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": "def ensure_dir(d):\n \"\"\" Create a directory and ignore the error if the directory already exist\"\"\"\n import os, errno\n try:\n os.makedirs(d)\n except OSError as exc:\n if exc.errno == errno.EEXIST:\n pass\n else: raise\n\ndef clean_all():\n \"\"\"This function removes any generated previous result at this tutorial's example\"\"\"\n import glob, shutil, os\n for f in glob.glob('results/*.*'):\n os.remove(f)\n shutil.rmtree('results/patients', ignore_errors=True)\n\nclean_all()", "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": "### working with files: reading and writing\n\n" }, { "cell_type": "code", "collapsed": false, "input": "help(open)\nprint(\"----------------------------------------------------------\")\nprint(\"----------------------------------------------------------\")\nprint(dir(file))\nprint(\"----------------------------------------------------------\")\nhelp(file.close)\nprint(\"----------------------------------------------------------\")\nhelp(file.read)\nprint(\"----------------------------------------------------------\")\nhelp(file.write)\nprint(\"----------------------------------------------------------\")\nhelp(file.readline)", "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": "#### A useful function to print the contents of a file given its name" }, { "cell_type": "code", "collapsed": false, "input": "def print_file(f_name):\n \"\"\"Print the contents of a file. The same operation of 'cat', the shell command\"\"\"\n f = open(f_name, 'r')\n print(f.read())\n f.close()", "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": "## Complete example\n\nIn this example we will:\n\n* generate a script/code from a template,\n* use Python to run the script over several inputs to generate the raw results files,\n* extract/parse the desired information from the raw files and place them in a dictionary,\n* sort the data,\n* write the results table in a CSV file, and finally\n* archive the code file and information for reproducibility" }, { "cell_type": "markdown", "metadata": {}, "source": "Before generating a script, it is a good practice to test the base script before using it as a template.\n\nThe following script takes person's information and write them in a file using human-readable text" }, { "cell_type": "code", "collapsed": false, "input": "print_file(join_path(\"scripts\", \"format_info.py\") )", "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": "To run an application, a python script in our example, we use the `subprocess` module. It contains functions to spawn new processes, work with their input/output/error pipes, and read their return code.\n\nHere we use `check_call` function at the module. It will raise an exception if the process does not terminate normally, that is, the return code is not zero.\n\nFor more information about the `subprocess` module see: [http://docs.python.org/2/library/subprocess.html](http://docs.python.org/2/library/subprocess.html)" }, { "cell_type": "code", "collapsed": false, "input": "import subprocess\n\npython_script = join_path(\"scripts\", \"format_info.py\")\ncommand = \"Python \" + python_script\narguments = \" Ahmad 1.77 80 25\"\n\nensure_dir('results')\n\nprint(\"We will run a process using this command: \" + command + arguments)\nsts = subprocess.check_call(command + arguments, shell=True)\nprint (\"The return status of the process: \" + str(sts))", "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": "Now read the resulting output file, then remove it" }, { "cell_type": "code", "collapsed": false, "input": "print(os.listdir('results'))", "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": "print_file(join_path(\"results\", \"patient_Ahmad_25.txt\"))\nos.remove(join_path(\"results\", \"patient_Ahmad_25.txt\"))", "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": "### Using the Template class for code generation\n\nThe following example uses the Template class to generate a python script from a template string. \n\nThe following code cell contains the template string of our target script. The string contains the constant parts that will remain the same and the template placeholders that will be replaced with the desired values. The `$` is used to indicate template placeholders that will be replaced with the desired string. In this string we have two placeholders: `formula` and `name`.\n\nThe goal of this code generation is to generate a script that will compute additional information and include it in the output file.\n\nNote: special care is required for special characters. For example the `\\n` is replaced with `\\\\n`. Also, if the desired script contains `$`, it should be replaced with `$$` in the template string." }, { "cell_type": "code", "collapsed": false, "input": "script_template_txt = \"\"\"#!/usr/bin/env python\n\nimport sys\n\nname = str(sys.argv[1])\nheight = float(sys.argv[2])\nweight = float(sys.argv[3])\nage = int(sys.argv[4])\n\nval = $formula\n\nf = open('results/patients/patient_' + name + '_' + str(age) + '.txt', 'w')\n\nf.write( \"patient's name: \" + name + \"\\\\n\")\nf.write( \"patient's age: \" + str(age) + \" Years\\\\n\")\nf.write( \"patient's Weight: \" + str(weight) + \" kgs\\\\n\")\nf.write( \"patient's height: \" + str(height) + \" Meters\\\\n\")\n\nf.write( \"patient's $name: %f \\\\n\" % val)\n\nf.close()\n\"\"\"", "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": "Now we create a template object, named `script_template`. Then, the template variables are substituted with the [Body Mass Index (BMI)](http://en.wikipedia.org/wiki/Body_mass_index) formula. The output string is written out to a script file named `calc_bmi.py`" }, { "cell_type": "code", "collapsed": false, "input": "from string import Template\n\nscript_template = Template(script_template_txt)\nformula = \"weight/height**2\"\nname = \"BMI\"\nscript_txt = script_template.substitute(formula=formula, name=name)\n\nwith open(join_path(\"results\",\"calc_bmi.py\"), 'w') as f:\n f.write(script_txt)\n\nprint_file(join_path(\"results\",\"calc_bmi.py\"))", "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": "Now create a directory for the results." }, { "cell_type": "code", "collapsed": false, "input": "ensure_dir(join_path(\"results\", \"patients\"))", "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": " Here we test our generated script." }, { "cell_type": "code", "collapsed": false, "input": "python_script = join_path(\"results\",\"calc_bmi.py\")\ncommand = \"Python \" + python_script\narguments = \" Ahmad 1.77 80 25\"\n\nprint(\"Execute: \" + command + arguments)\nsubprocess.check_call(command + arguments, shell=True)\n\nprint_file(join_path(\"results\", \"patients\", \"patient_Ahmad_25.txt\"))\nos.remove(join_path(\"results\", \"patients\", \"patient_Ahmad_25.txt\"))", "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": "Run the new script over several examples to generate the raw files in the `patients` directory" }, { "cell_type": "code", "collapsed": false, "input": "patients = [('Williams',19,84,1.74), ('Johnson',23,82, 1.65), \n ('Jones', 25, 70, 1.8), ('Jones', 29, 85, 1.66),\n ('Smith', 30, 120, 1.9), ('ahmad', 35, 50.5, 1.5)]\n\nfor n, a, w, h in patients:\n arguments = \" %s %f %f %d\" % (n, h, w, a)\n subprocess.check_call(command + arguments, shell=True)", "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": "os.listdir(join_path('results', 'patients'))", "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": "#### Parsing the results\n\nThe following function takes an object of a raw file, extracts/parses the desired values, and returns an ordered dictionary of the entries." }, { "cell_type": "code", "collapsed": false, "input": "from collections import OrderedDict\ndef parse_bmi(fp):\n fields = [(\"patient's name\", \"name\", str),(\"patient's age\", \"age\", int),\n (\"patient's Weight\", \"weight\", float),(\"patient's height\", \"height\", float),(\"patient's BMI\", \"bmi\", float)]\n record = OrderedDict()\n for f in fields:\n record[f[1]] = 0\n for ln in fp:\n for f in fields:\n if f[0] in ln:\n # Using naive string parsing:\n #val = ln.split(\":\")[1].strip().split(' ')[0]\n val = ln.split(\":\")[1]\n val = val.strip()\n val = val.split(' ')[0]\n val = map(f[2], [val])[0]\n \n record[f[1]] = val\n\n return record", "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": "In the following two code blocks the list of available raw files is prepared. Then, the files are parsed and placed in a list of dictionaries" }, { "cell_type": "code", "collapsed": false, "input": "files_list = glob.glob(join_path('results', 'patients', '*.txt') )\nprint (files_list)", "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": "entries =[]\nfor rf in files_list: \n with open(rf, 'r') as fp: \n record = parse_bmi(fp)\n entries.append(record)\n\nfor i in entries: print(i)", "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": "#### Sorting the results\n\nWe use the `sorted` built-in function in Python. It returns a sorted list of the given iterable, a list of dictionaries in our example. In a flat list, for example `[1,4,2]`, it is clear for sorting function what is greater/smaller to perform the sorting operation. Because we have a dictionary in each item of our list, the optional argument `key` is used to specify the value we would like to compare in the dictionary. `key` argument accepts a single argument function that returns the comparison value.\n\nFor more information see: [http://docs.python.org/2/library/functions.html#sorted](http://docs.python.org/2/library/functions.html#sorted)" }, { "cell_type": "code", "collapsed": false, "input": "# We will use the itemgetter function to specify our soring key from the dictionary\nfrom operator import itemgetter\n\nd0 = entries[0]\nprint(d0)\n\nig = itemgetter('age', 'name')\nprint (ig(d0))\n# This is equivalent to:\nprint(d0['age'], d0['name'])", "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": "Now we sort the records according to the age then the name" }, { "cell_type": "code", "collapsed": false, "input": "from operator import itemgetter\n\nentries = sorted(entries, key=itemgetter('age', 'name'))", "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": "for i in entries: print(i)", "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": "#### Writing to a CSV file\n\nIn the following code block, the list of dictionaries is stored in a CSV file. We use the `csv` module for this purpose. It has many powerful functionalities to write/read CSV files. Here we use the DictWriter/DictReader classes to write/read data to/from CSV files. These two classes are specialized to deal with dictionary data types in python.\n\nFor more information see [http://docs.python.org/2/library/csv.html](http://docs.python.org/2/library/csv.html)." }, { "cell_type": "code", "collapsed": false, "input": "from csv import DictWriter\nfields = entries[0].keys()\n\nwith open( join_path('results', 'records.csv'), 'w' ) as output_file:\n r = DictWriter(output_file,fieldnames=fields)\n r.writeheader()\n for k in entries:\n r.writerow(k)", "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": "print_file( join_path('results', 'records.csv') )", "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": "#### Loading data from CSV file and and extracting data from the table" }, { "cell_type": "code", "collapsed": false, "input": "from csv import DictReader\nwith open(join_path('results', 'records.csv'), 'rb') as output_file:\n data = DictReader(output_file)\n data = [k for k in data]\nprint(data)", "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": "Now extract the `age` and `bmi` values from the data" }, { "cell_type": "code", "collapsed": false, "input": "points = []\nfor d in data:\n points.append((d['age'], d['bmi']))\nprint(points)", "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": "#### Archiving the source files\n\nA good method to improve the reproducability of the experiment, is to create a tarball of all the source code and setup for each experiment. In a typical project, the tarball would contain all the source files, make files, build log file, and possibly the executable itself.\n\nHere we have most of the source files in the notebook, so we do not have much to place in the tarball. The results here are also stored in the tarball, although it is not a good idea in a real life example." }, { "cell_type": "code", "collapsed": false, "input": "import tarfile\nf_list = [join_path(\"results\", \"patients\", \"*.txt\"), join_path(\"results\", \"*.py\"), \n join_path(\"results\", \"patients\", \"*.csv\"), join_path(\"..\", \"*.ipynb\")]\nf_list = [glob.glob(n) for n in f_list]\nf_list = [n for nn in f_list for n in nn]\n\nf_name = join_path(\"results\", \"bmi.tar.gz\")\nprint \"Writing project files to:\" + f_name\nwith tarfile.open(f_name, \"w:gz\") as tar:\n for n in f_list:\n print \"Adding to the tar file: \" + n\n tar.add(n)", "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": "## Exercise\n\nIn this exercise you are asked to modify the example of this tutorial and generate additional script.\n\n\nThe new generated script will accept Cholesterol level measurements. The output of this script will save the patient's information in a file as we did at the BMI example. The line containing the Cholesterol value should also include the risk level as a string. For example, a Cholesterol value = 300 will be written to the file as follows:\n\n`patient's Cholesterol level: 300 mg/dL (High risk)`\n\nWe have three categories of risk for the Cholesterol level:\n\n| Level mg/dL | Interpretation |\n| ------------- |:--------------:|\n| < 200 | Low risk |\n| 200-240 | Borderline high risk |\n| > 240 | High risk |\n\n\n\nthe The following is required in this exercise:\n\n* Modify the script template to be able to generate both the BMI and Cholesterol level scripts. Because the script template will be different, you will have to change the arguments you pass to the template to generate the BMI script.\n* To further generalize the generated scripts (for both BMI and Cholesterol), the template will accept and use an argument containing the absolute path to the destination directory of the results file.\n* The BMI results will be written to `results/bmi` directory, instead of writing directly to `results` directory. Cholesterol results will similarly be written to `results/cholesterol` directory.\n* For each of the BMI and the Cholesterol, generate the scripts and run them to produce results files.\n* Create a CSV file for each of the BMI and Cholesterol experiments\n* Load the data from each of the CSV files, merge the data in one table, and write it again to a single CSV file" }, { "cell_type": "markdown", "metadata": {}, "source": "### Solution" }, { "cell_type": "code", "collapsed": false, "input": "# Repeating some functions definitions if only the exercise is executed\ndef ensure_dir(d):\n \"\"\" Create a directory and ignore the error if the directory already exist\"\"\"\n import os, errno\n try:\n os.makedirs(d)\n except OSError as exc:\n if exc.errno == errno.EEXIST:\n pass\n else: raise\ndef print_file(f_name):\n \"\"\"Print the contents of a file. The same operation of 'cat', the shell command\"\"\"\n f = open(f_name, 'r')\n print(f.read())\n f.close()", "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": "import os\nfrom os.path import join as join_path\nfrom string import Template\n\n# Full path to the special topics part 1 directory\nos.chdir(\"/Users/malastm/new_ACM_tutorials/ACM-Python-Tutorials-KAUST-2014/Special_topics_part_1/\")\n\nos.chdir(join_path(os.getcwd(), \"exercise\"))\nprint(\"The current directory changed to: \" + os.getcwd())\nprint(\"Files in this directory: \", os.listdir('.'))\n\nscript_template_txt = \"\"\"#!/usr/bin/env python\n\nimport sys\n\nname = str(sys.argv[1])\nheight = float(sys.argv[2])\nweight = float(sys.argv[3])\nage = int(sys.argv[4])\npath = str(sys.argv[5])\n\n$comp\n\nf = open( path + 'patient_' + name + '_' + str(age) + '.txt', 'w')\n\nf.write( \"patient's name: \" + name + \"\\\\n\")\nf.write( \"patient's age: \" + str(age) + \" Years\\\\n\")\nf.write( \"patient's Weight: \" + str(weight) + \" kgs\\\\n\")\nf.write( \"patient's height: \" + str(height) + \" Meters\\\\n\")\n\n$file_write\n\nf.close()\n\"\"\"", "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": "# First we generate the script for the BMI and test it\nscript_template = Template(script_template_txt)\ncomp = \"val = weight/height**2\"\nfile_write = 'f.write( \"patient\\'s BMI: \" + str(val) + \"\\\\n\")'\nscript_txt = script_template.substitute(comp=comp, file_write=file_write)\n\nensure_dir(join_path('results', 'bmi'))\nwith open(join_path(\"results\", \"bmi\",\"calc_bmi.py\"), 'w') as f:\n f.write(script_txt)\n\nprint_file(join_path(\"results\", \"bmi\",\"calc_bmi.py\"))", "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": "# Test the BMI script\n\nimport subprocess\nbmi_results_path = join_path(os.getcwd(), \"results\", 'bmi', 'patients', '') \ncommand = \"Python \" + join_path(\"results\", \"bmi\",\"calc_bmi.py\")\narguments = \" Ahmad 1.77 80 25 \" + bmi_results_path\nensure_dir(join_path(\"results\", 'bmi', 'patients'))\n\n\nprint(\"Execute: \" + command + arguments)\nsubprocess.check_call(command + arguments, shell=True)\n\nprint_file(join_path(\"results\", \"bmi\", \"patients\", \"patient_Ahmad_25.txt\"))\nos.remove(join_path(\"results\", \"bmi\", \"patients\", \"patient_Ahmad_25.txt\"))", "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": "# Generate the script for the Cholesterol and test it\ncomp = \"\"\"cholesterol = float(sys.argv[6])\nif cholesterol < 200:\n risk = 'Low risk'\nelif cholesterol < 240:\n risk = 'Borderline high risk'\nelse:\n risk = 'High risk'\n\"\"\"\nfile_write = 'f.write( \"patient\\'s Cholesterol: \" + str(cholesterol) + \" (\" + risk + \")\\\\n\")'\n\nscript_txt = script_template.substitute(comp=comp, file_write=file_write)\n\nensure_dir(join_path(\"results\", 'cholesterol'))\nwith open(join_path(\"results\", \"cholesterol\",\"calc_cholesterol.py\"), 'w') as f:\n f.write(script_txt)\n\nprint_file(join_path(\"results\", \"cholesterol\",\"calc_cholesterol.py\"))\n", "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": "# Test the Cholesterol script\nchol_results_path = join_path(os.getcwd(), \"results\", 'cholesterol', 'patients', '') \n\ncommand = \"Python \" + join_path(\"results\", \"cholesterol\",\"calc_cholesterol.py\")\narguments = \" Ahmad 1.77 80 25 \" + chol_results_path + \" 100\"\nensure_dir(join_path(\"results\", 'cholesterol', 'patients'))\n\n\nprint(\"Execute: \" + command + arguments)\nsubprocess.check_call(command + arguments, shell=True)\n\nprint_file(join_path(\"results\", \"cholesterol\", \"patients\", \"patient_Ahmad_25.txt\"))\nos.remove(join_path(\"results\", \"cholesterol\", \"patients\", \"patient_Ahmad_25.txt\"))", "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": "# Generate the BMI results\n\npatients = [('Williams',19,84,1.74), ('Johnson',23,82, 1.65), \n ('Jones', 25, 70, 1.8), ('Jones', 29, 85, 1.66),\n ('Smith', 30, 120, 1.9), ('ahmad', 35, 50.5, 1.5)]\n\ncommand = \"Python \" + join_path(\"results\", \"bmi\",\"calc_bmi.py\")\n\nfor n, a, w, h in patients:\n arguments = \" %s %f %f %d \" % (n, h, w, a) + bmi_results_path\n subprocess.check_call(command + arguments, shell=True)\n\nos.listdir(bmi_results_path)", "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": "# Generate the Cholesterol results\n\npatients = [('Williams',19,84,1.74, 150), ('Johnson',23,82, 1.65, 220), \n ('Jones', 25, 70, 1.8, 200), ('Jones', 29, 85, 1.66, 250),\n ('Smith', 30, 120, 1.9, 210), ('ahmad', 35, 50.5, 1.5, 260)]\n\ncommand = \"Python \" + join_path(\"results\", \"cholesterol\",\"calc_cholesterol.py\")\n\nfor n, a, w, h, c in patients:\n arguments = \" %s %f %f %d \" % (n, h, w, a) + chol_results_path + \" %f\" % (c)\n subprocess.check_call(command + arguments, shell=True)\n\nos.listdir(chol_results_path)", "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": "def write_to_csv(data, f_name):\n from csv import DictWriter\n fields = data[0].keys()\n \n with open( f_name, 'w' ) as output_file:\n r = DictWriter(output_file,fieldnames=fields)\n r.writeheader()\n for k in data:\n r.writerow(k)", "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": "# Repeating this function from the example\nfrom collections import OrderedDict\ndef parse_bmi(fp):\n fields = [(\"patient's name\", \"name\", str),(\"patient's age\", \"age\", int),\n (\"patient's Weight\", \"weight\", float),(\"patient's height\", \"height\", float),(\"patient's BMI\", \"bmi\", float)]\n record = OrderedDict()\n for f in fields:\n record[f[1]] = 0\n for ln in fp:\n for f in fields:\n if f[0] in ln:\n # Using naive string parsing:\n #val = ln.split(\":\")[1].strip().split(' ')[0]\n val = ln.split(\":\")[1]\n val = val.strip()\n val = val.split(' ')[0]\n val = map(f[2], [val])[0]\n \n record[f[1]] = val\n\n return record", "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": "# Parse the BMI results using the BMI parsing script then store to the CSV file\nimport glob\nfiles_list = glob.glob(join_path(\"results\", 'bmi', 'patients', '*.txt') )\nprint (files_list)\n\nentries =[]\nfor rf in files_list: \n with open(rf, 'r') as fp: \n record = parse_bmi(fp)\n entries.append(record)\n\nfor i in entries: print(i)\n \nwrite_to_csv(entries, join_path(\"results\", 'bmi', 'patients', 'records.csv'))\nprint_file(join_path(\"results\", 'bmi', 'patients', 'records.csv'))", "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": "# Create a parsing function for the cholesterol results\nfrom collections import OrderedDict\ndef parse_cholesterol(fp):\n fields = [(\"patient's name\", \"name\", str),(\"patient's age\", \"age\", int),\n (\"patient's Weight\", \"weight\", float),(\"patient's height\", \"height\", float),(\"patient's Cholesterol\", \"chol\", float)]\n record = OrderedDict()\n for f in fields:\n record[f[1]] = 0\n for ln in fp:\n for f in fields:\n if f[0] in ln:\n # Using naive string parsing:\n #val = ln.split(\":\")[1].strip().split(' ')[0]\n val = ln.split(\":\")[1]\n val = val.strip()\n val = val.split(' ')[0]\n val = map(f[2], [val])[0]\n record[f[1]] = val\n \n # Handle the special case that extract the risk string from the cholestrol line\n if \"patient's Cholesterol\" in ln:\n val = ln.split(\":\")[1]\n val = val.strip()\n val = val.split(' ',1)[1][1:-1]\n val = map(str, [val])[0]\n record['chol risk'] = val \n\n return record", "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": "# Parse the Cholesterol results using the Cholesterol parsing script\nimport glob\nfiles_list = glob.glob(join_path(\"results\", 'cholesterol', 'patients', '*.txt') )\nprint (files_list)\n\nentries =[]\nfor rf in files_list: \n with open(rf, 'r') as fp: \n record = parse_cholesterol(fp)\n entries.append(record)\n\nfor i in entries: print(i)\n\nwrite_to_csv(entries, join_path(\"results\", 'cholesterol', 'patients', 'records.csv'))\nprint_file(join_path(\"results\", 'cholesterol', 'patients', 'records.csv'))", "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": "# Read both CSV files, merge the tables, and store all the data in a single CSV file\nfrom csv import DictReader\nfrom operator import itemgetter\n\nwith open(join_path('results', 'bmi', 'patients', 'records.csv'), 'rb') as output_file:\n data = DictReader(output_file)\n bmi_data = [k for k in data]\n\nwith open(join_path('results', 'cholesterol', 'patients', 'records.csv'), 'rb') as output_file:\n data = DictReader(output_file)\n chol_data = [k for k in data]\n\nd0 = entries[0]\nprint(d0)\n\ncommons = itemgetter('age', 'name' ,'weight' , 'height')\n# for each record in the cholesterol results add the BMI field to it from the bmi record of the same patient\nall_data = []\nfor dc in chol_data:\n for db in bmi_data:\n if commons(db) == commons(dc):\n dc['bmi'] = db['bmi']\n all_data.append(dc)\n\nwrite_to_csv(all_data, join_path(\"results\", 'records_all_data.csv'))\n\nprint_file(\"results/records_all_data.csv\")", "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": "*Copyright 2014, Tareq Malas, ACM Student Member.*" } ], "metadata": {} } ] }