{
 "metadata": {
  "name": ""
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "#Table and Catalog Introduction\n",
      "#### Written by Chris Walter of Duke University for the LSST Dark Energy Science Collaboration. Last Updated 10/2013.\n",
      "\n",
      "This is a small aside to learn about tables and catalogs.  They are used differently than you might naively think, so if you don't know about them, you need to familiarize yourself with how they work.\n",
      "\n",
      "Start by importing the table module."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import lsst.afw.table as afwTable"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 1
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "A *Schema* is a special header in table that lists the content of every column in the record (name, type, etc). Routines can add new columns to the schema. Think of it as the top header of the table with the labels in it.\n",
      "\n",
      "*Tables* hold a *Schema* and an area of memory that holds records or entries.\n",
      "\n",
      "When you make a record in a table it produces a record in memory corresponding to what is in the schema and returns a pointer to the record.  You have to keep track of these yourself.  There is no way to iterate over the records in the table.  You need the pointers to do that.  So, mostly you won't use a table by itself unless it has just a few entries and you are tracking them yourself.\n",
      "\n",
      "*Catalogs* hold a table and also a list of pointers that that point at the records.  You can iterate over those.  So, for most of the things where you want a data structure where you put entries in and then later loop over them or get things out you should use a catalog.\n",
      "\n",
      "Note: you must make the record through the catalog interface or the catalog won't know about the record.  You add new records through the addNew() or append() functions."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Let's make a catalog to demostrate this and see how to use it\n",
      "\n",
      "# You always need a Schema for any table.\n",
      "schema  = afwTable.SourceTable.makeMinimalSchema()\n",
      "\n",
      "# Let's add some columns to our table by adding them to the Schema.\n",
      "k1 = schema.addField(\"f1\", type=int, doc=\"doc for f1\")\n",
      "k2 = schema.addField(\"f2\", type=numpy.float32, doc=\"doc for f2\", units=\"units for f2\")\n",
      "k3 = schema.addField(\"f3\", type=\"ArrayD\", doc=\"doc for f3\", units=\"units for f2\", size=5)\n",
      "\n",
      "# This way of producing a catalog takes a schema and makes the table internally\n",
      "# and stores it.  You can also make the table yourself and pass it to the \n",
      "# catalog.\n",
      "catalog = afwTable.SourceCatalog(schema)\n",
      "\n",
      "# Make a new record, and set the values of it.\n",
      "record = catalog.addNew()\n",
      "record.set(k1, 2)\n",
      "record.set(k2, 3.14)\n",
      "record[k3] = numpy.random.randn(5) # ndarray::Array == numpy.ndarray in Python\n",
      "\n",
      "record = catalog.addNew()\n",
      "record.set(k1, 3)\n",
      "record.set(k2, 4.14)\n",
      "record[k3] = numpy.random.randn(5) \n",
      "\n",
      "print schema.getNames() # printout our schema  \n",
      "print catalog[k1]       # Print the list of the k1 column entries\n",
      "print catalog['f1']     # The same thing but access through the name\n",
      "print catalog[0].get('f1') #The same thing but only for the 1st record\n",
      "\n",
      "# Now print out the whole catalog\n",
      "for i, item in enumerate(catalog):\n",
      "    print \"%d, %f, %s\" % (item.get('f1'), item.get('f2'), item.get('f3'))"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "('coord', 'f1', 'f2', 'f3', 'id', 'parent')\n",
        "[2 3]\n",
        "[2 3]\n",
        "2\n",
        "2, 3.140000, [ 0.29312333 -0.74252352 -0.32421778  2.46671887  1.30210567]\n",
        "3, 4.140000, [ 1.32837026 -0.59715475 -1.1031426   0.6457858  -0.76621458]\n"
       ]
      }
     ],
     "prompt_number": 3
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "One useful command is getSchema().  You can use it to see the expanded description of all the fields in the schema."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "print catalog.getSchema()"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "Schema(\n",
        "    (Field['L'](name=\"id\", doc=\"unique ID\"), Key<L>(offset=0, nElements=1)),\n",
        "    (Field['Coord'](name=\"coord\", doc=\"position in ra/dec\", units=\"IRCS; radians\"), Key<Coord>(offset=8, nElements=2)),\n",
        "    (Field['L'](name=\"parent\", doc=\"unique ID of parent source\"), Key<L>(offset=24, nElements=1)),\n",
        "    (Field['I'](name=\"f1\", doc=\"doc for f1\"), Key<I>(offset=32, nElements=1)),\n",
        "    (Field['F'](name=\"f2\", doc=\"doc for f2\", units=\"units for f2\"), Key<F>(offset=36, nElements=1)),\n",
        "    (Field['ArrayD'](name=\"f3\", doc=\"doc for f3\", units=\"units for f2\", size=5), Key<ArrayD>(offset=40, nElements=5)),\n",
        ")\n",
        "\n"
       ]
      }
     ],
     "prompt_number": 4
    }
   ],
   "metadata": {}
  }
 ]
}