{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "The information presented here is placed in the public domain, and was written by\n", "[Doug Burke](https://plus.google.com/+DougBurke). The \n", "notebook used to create this page is available\n", "([empty](https://github.com/DougBurke/astro-haskell/blob/master/notebooks/a%20FITSfull%20of%20ARF.ipynb)\n", "or \n", "[filled](https://github.com/DougBurke/astro-haskell/blob/master/finished/a%20FITSfull%20of%20ARF.ipynb)),\n", "and questions can be asked using the\n", "[GitHub issues page](https://github.com/DougBurke/astro-haskell/issues)\n", "or via Twitter: .\n", "\n", "It will probably make a lot-more sense if you have already read the\n", "[previous entries](https://github.com/DougBurke/astro-haskell/blob/master/README.md)\n", "in this \"series\"." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Reading in Astronomy data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Today I'm going to read in some Astronomy data and plot it. This will require\n", "manually parsing a \n", "[FITS (Flexible Image Transport System)](http://en.wikipedia.org/wiki/FITS)\n", "[binary table](https://archive.stsci.edu/fits/fits_standard/node67.html)\n", "file$^\\dagger$; it was going to be more automated, but the post was getting too long\n", "so I decided to leave that part for another day \n", "and just stick with the manual code below (one of my current pet ideas\n", "is to provide a simple FITS back end to the \n", "[Frames](http://acowley.github.io/Frames/)\n", "library for reading in data from FITS files, so maybe I'll write something up soon if I get time to work\n", "on it).\n", "\n", "I have updated to the development version of `IHaskell`, rather than the version on\n", "[Hackage](https://hackage.haskell.org/), to see if it fixed some issues I was seeing\n", "(which it did). One of the changes this has made is to jump to IPython 3.0 - a.k.a. the \n", "[Jupyter project](https://jupyter.org/) - which will hopefully not be a problem for\n", "people playing along at home.\n", "\n", "$^\\dagger$ We normally use the acronym - i.e. FITS - and forget about the fact that we are\n", "using what is ostensibly an image format to read in tabular data." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Last time the notebook was run" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import Data.Time\n", "getCurrentTime" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## What on earth is an ARF?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The file I want to display contains an \"effective area\"\n", "curve for a source observed by the \n", "[Chandra X-ray Observatory](http://chandra.harvard.edu/). This curve is known,\n", "to X-ray astronomers, as an \n", "[ARF (Auxiliary Response File)](http://cxc.harvard.edu/ciao/dictionary/arf.html)$^\\dagger$.\n", "I can use one of the tools that we provide as part of\n", "[CIAO](http://cxc.harvard.edu/ciao/)$^\\ast$ to display some important information about the\n", "file.\n", "\n", "$^\\dagger$ Wow, a [TLA](http://en.wikipedia.org/wiki/Three-letter_acronym) that isn't on Wikipedia!\n", "\n", "$^\\ast$ We in the Science Data Systems group in the CXC \n", "are particularly proud of this contrived acronym, even if \n", "[we can't distinguish between a language and a dialect](http://cxc.harvard.edu/ciao/ciao_note.html).\n", "\n", "The [`dmlist`](http://cxc.harvard.edu/ciao/ahelp/dmlist.html) tool is used to display the structure and contents of the file formats\n", "we use when analyzing X-ray data:\n", "\n", "```\n", "% dmlist src.arf blocks\n", " \n", "--------------------------------------------------------------------------------\n", "Dataset: src.arf\n", "--------------------------------------------------------------------------------\n", " \n", " Block Name Type Dimensions\n", "--------------------------------------------------------------------------------\n", "Block 1: PRIMARY Null \n", "Block 2: SPECRESP Table 3 cols x 1070 rows\n", "```\n", "\n", "So, there are two \"blocks\" of data in the file, with the first one being empty. This is actually\n", "a consequence of the I in FITS, where the first block$^\\ddagger$ can either be empty or contain\n", "image data. A result of this is that tabular data is always relagated to the second (or later)\n", "block. \n", "\n", "We can also see what columns are stored in the table:\n", "\n", "```\n", "% dmlist src.arf cols\n", " \n", "--------------------------------------------------------------------------------\n", "Columns for Table Block SPECRESP\n", "--------------------------------------------------------------------------------\n", " \n", "ColNo Name Unit Type Range\n", " 1 ENERG_LO keV Real4 -Inf:+Inf Min Energy\n", " 2 ENERG_HI keV Real4 -Inf:+Inf Max Energy\n", " 3 SPECRESP cm**2 Real4 -Inf:+Inf Effective Area\n", "```\n", "\n", "and even see some of the data (which will be useful later, when I want to check that I've\n", "actually read things in properly):\n", "\n", "```\n", "% dmlist \"src.arf[#row=1:5]\" data\n", " \n", "--------------------------------------------------------------------------------\n", "Data for Table Block SPECRESP\n", "--------------------------------------------------------------------------------\n", " \n", "ROW ENERG_LO ENERG_HI SPECRESP\n", " \n", " 1 0.30000001192093 0.31000000238419 4.8743429184\n", " 2 0.31000000238419 0.31999999284744 14.8292617798\n", " 3 0.31999999284744 0.33000001311302 21.3022918701\n", " 4 0.33000001311302 0.34000000357628 28.5149517059\n", " 5 0.34000000357628 0.34999999403954 35.3988304138\n", "```\n", "\n", "$^\\ddagger$ I really should use the term `HDU` - which means Header/Data Unit - rather than block,\n", "but in CIAO we tend to use the term block, and as the\n", "Chandra X-ray Center pay my bills I'm sticking with this terminology.\n", "\n", "So, how can I read in the data? Well, I happen to know that the data is stored in blocks of text (US ASCII) for the metadata, interspersed by binary blocks for the actual data. More detailed information is\n", "available at the\n", "[FITS primer](http://fits.gsfc.nasa.gov/fits_primer.html)\n", "and the\n", "[FITS standard](https://archive.stsci.edu/fits/fits_standard/fits_standard.html)\n", "will also be required reading (if you are having trouble sleeping). I'll just add a note\n", "to say that in Astronomy we use the FITS standard but then layer on more domain-specific standards\n", "(occasionally these are \"best practices\" rather than any real form of a standard) to enhance the\n", "data representation (in particular for the metadata). This isn't really important for what I'm about to\n", "do, but I thought I'd mention it. There should also be addendums to the standard, since it has \n", "changed over time, but they aren't important for today's work.\n", "\n", "I'll start by loading in the file as a [bytestring](https://hackage.haskell.org/package/bytestring),\n", "picking the `Char8` variant since the text is in US ASCII format. It would be interesting to do\n", "this via a streaming interface, but for this example that would be overkill." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import qualified Data.ByteString.Char8 as B8" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are several representations for text-like data in Haskell: the language standard defines\n", "the `String` type, then there are `ByteString` and `Text` values which are more efficient (and have\n", "subtly-different use cases). This means that we end up with a bunch of routines \n", "that do the same thing,\n", "and often have the same name, but work on different data types$^\\dagger$.\n", "For example, there's a `readFile` function in the standard Haskell prelude$^\\ddagger$, but \n", "this returns a `String`, so I need to use the ByteString version, which I indicate\n", "using the prefix I just defined: `B8.readFile`.\n", "\n", "$^\\dagger$ There's enough semantic differences in operations that a universal \"string-like\" typeclass hasn't \n", "become popular enough in the community.\n", "\n", "$^\\ddagger$ the prelude is the name for the \"default set of symbols\" you get to use in Haskell (as it's a module\n", "name I should refer to it as `Prelude`)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "cts <- B8.readFile \"../data/src.arf\"\n", ":type cts" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's see how large the file is\n", "(since this is the \n", "[`Char8`](https://hackage.haskell.org/package/bytestring-0.10.6.0/docs/Data-ByteString-Char8.html)\n", "version of a bytestring, we don't have to worry about non-eight-bit characters):" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "B8.length cts" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, FITS files are written in chunks of `2880` bytes - that is, the file is padded with\n", "trailing space characters (or null bytes) if needed - so how many chunks are there?" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "len = B8.length cts\n", "len / 2880" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Argh!!! This type-safe language has stopped me from making a type error and it's annoying! The length function returns an `Int` but the division operator `/` requires a \n", "[`Fractional` constraint](http://hackage.haskell.org/package/base-4.7.0.2/docs/Prelude.html#t:Fractional), and the\n", "error is telling me that integers (or, at least values with a type of `Int`), aren't fractional. What I need\n", "is to use functions from the\n", "[`Integral` typeclass](http://hackage.haskell.org/package/base-4.7.0.2/docs/Prelude.html#t:Integral): in this case\n", "I'm going to use\n", "[`divMod`](http://hackage.haskell.org/package/base-4.7.0.2/docs/Prelude.html#v:quotRem) to return both the number\n", "of blocks and any remainder." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "len `divMod` 2880" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So, the file contains 12 chunks. As a reminder, or perhaps I haven't mentioned it before, binary functions such as `divMod` can be written in\n", "\"infix\" form by \"quoting\" there name; that is, the above is equal to\n", "\n", "```\n", "divMod len 2880\n", "```\n", "\n", "I tend to use the infix form if it makes it clearer (to me) what the arguments mean. There are also those\n", "binary functions - such as `+` - which are defined as being \"infix\" so do not need the back ticks; \n", "however, when used in prefix form they must be surrounded by `()`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "1 + 2\n", "(+) 1 2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since the value `2880` is going to be used a few times below, let's \"name\" it:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "chunk :: Int\n", "chunk = 2880" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, FITS files start with a Header Unit, that is `2880*n` characters of ASCII text (where `n` is an integer),\n", "and these header units are broken up every 80 characters (known as a \"card\"). So, let's look at the \n", "contents of the first card:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "B8.take 80 cts" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Each card can be considered to be a keyword/value pair with an optional comment (or description). The\n", "values are typed, in the sense that there are strings, booleans, integers, or floats, but there's no\n", "syntactic contraint that a given key has a specific type. The layout is pretty simple\n", "\n", " - name of the keyword in the first 8 characters (which can only be `A-Z`, `0-9`, `-` or `_`; I was under\n", " the impression that the name had to start with `A-Z` but this may not be the case,\n", " or it's clarified in some later document)\n", " \n", " - characters 9 and 10 are `\"= \"` (actually, there's a few informational keywords that don't have\n", " this, but I'm just going to ignore those today, since they are metadata about the metadata)\n", "\n", " - a string representation of the value: a character string, boolean, integer, floating point,\n", " or complex number\n", " \n", " - an optional description (or comment) which is started with the `/` character.\n", " \n", "A more detailed description of these elements can be found in the\n", "[FITS standard](https://archive.stsci.edu/fits/fits_standard/node26.html).\n", "\n", "At this point you should be wondering why Astronomers use a format that requires\n", "them to write their own parser. Well, we do have standard parsers - a recent entrant is the\n", "[`astropy` Python package](http://www.astropy.org/) and the most-used compiled version\n", "is the\n", "[CFITSIO library](http://heasarc.gsfc.nasa.gov/fitsio/fitsio.html) - but I wanted to\n", "explore some simple parsing in Haskell.\n", "\n", "Let's start by splitting up the input into cards, that is, 80-character chunks, using\n", "[splitAt](http://hackage.haskell.org/package/bytestring-0.10.6.0/docs/Data-ByteString-Char8.html#v:splitAt):" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "-- As a reminder, type introduces a synonym which can be used to help document a\n", "-- function (in this case, which part of the return is the card and which the\n", "-- remaining input), but it provides no \"extra\" type safety, since it is\n", "-- perfectly fine to treat a Card as a ByteString (since that is what it is),\n", "-- or vice versa.\n", "--\n", "type Card = B8.ByteString\n", "\n", "-- | Given a string, split off the next 80 characters and return the remaining values.\n", "getCard :: B8.ByteString -> (Card, B8.ByteString)\n", "getCard = B8.splitAt 80" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here I've taken advantage of \n", "[partial application](http://en.wikipedia.org/wiki/Partial_application) to define `getCard`. I could\n", "have explicitly included the input variable, as in the following definition\n", "\n", "```\n", "getCard bs = B8.splitAt 80 bs\n", "```\n", "\n", "but it's common to see the partially-applied version (one argument for this form is that you focus more on the \n", "code because you aren't distracted by variables that aren't actually \"doing anything\"). This approach\n", "is so common that the \"standard\" [Haskell linting tool](http://en.wikipedia.org/wiki/Lint_%28software%29)\n", "will complain if it thinks you've not been eta-reducing enough. The notebook runs this linter by default, \n", "so we can see an example of this suggestion:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "test bs = B8.splitAt 80 bs" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "where the **Eta reduce** term comes from the lambda calculus,\n", "and is mentioned in this \n", "[StackOverflow question](http://stackoverflow.com/questions/5793843/what-does-eta-reduce-mean-in-the-context-of-hlint). Note that if I had given `test` a signature, as I did with `getCard`, then the suggestion would not\n", "have been created (and I'm too lazy to find out why).\n", "\n", "As a quick check of `getCard`, here's the first card, where I use\n", "[pattern matching](http://learnyouahaskell.com/syntax-in-functions#pattern-matching)\n", "to deconstruct the return value\n", "(i.e. separate out the pair), and the '_' syntax tells the compiler that I am not interested in the \n", "second element of this tuple." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "(a,_) = getCard cts\n", "a" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since pairs are\n", "a common form of tuple, there are standard routines to extract the first or second element\n", "of a pair: they are called\n", "[`fst`](http://hackage.haskell.org/package/base-4.8.0.0/docs/Prelude.html#v:fst)\n", "and \n", "[`snd`](http://hackage.haskell.org/package/base-4.8.0.0/docs/Prelude.html#v:snd)\n", "respective, since vowels are apparently in short supply!" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ ":type fst\n", ":type snd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Using the `fst` function, I can rewrite the above to avoid having to even talk about the second\n", "value returned by `getCard`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "fst (getCard cts)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `getCard` function isn't that useful on its own. What I really want is something that splits\n", "every 80 characters, returning an array of values$^\\dagger$. For this example I want to use a\n", "[sequence](http://hackage.haskell.org/package/containers-0.5.6.3/docs/Data-Sequence.html)\n", "to represent the array (rather than the standard Haskell list), since it should be more efficient\n", "for some of the data access patterns we would want when accessing the file metadata$\\ddagger$.\n", "As with string-like operations, containers (such as lists, sequences, and vectors) \n", "have many common names, so I use a qualified import of the `Data.Sequence` module\n", "to avoid name clashes. I then explicitly import several symbols that do not clash,\n", "to avoid having to prefix them with \"`Seq.`\", as `Seq.|>` \n", "and `Seq.><` aren't the easiest symbols to read.\n", "\n", "$^\\dagger$ There is the Haskell \n", "[split](https://hackage.haskell.org/package/split) \n", "package which provides many routines for splitting\n", "up lists, but it's also easy to write your own, as I do here.\n", "\n", "$^\\ddagger$ This is why I also import the symbols `ViewR`, `|>`, `><`, and `viewr`, which involve\n", "creating and breaking-up sequences." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import qualified Data.Sequence as Seq\n", "import Data.Sequence (ViewR(..), (|>), (><), viewr)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I want a function that will take a bytestring, split off the first 80 characters, and then call itself\n", "on the remaining string (in Haskell it's common to use recursion when you'd use the equivalent of a\n", "Python `for` loop). Here's an initial version which uses `getCard` to break down the input, stopping when it is empty. The syntax\n", "[`acc |> card`](http://hackage.haskell.org/package/containers-0.5.6.3/docs/Data-Sequence.html#v:-124--62-)\n", "adds$^\\dagger$ `card` onto the end of the sequence `acc`, so the `go` step adds on the\n", "current card to the end of the accumulator (the first argument) until the input string is\n", "empty, at which time the sequence is returned.\n", "\n", "$^\\dagger$ Actually, it creates a new sequence - formed by combining the inputs - rather than modifying\n", "the contents of the `card` sequence. That is, it is not Python's `acc.append(card)` but \n", "more like `newacc = acc[:]; newacc.append(card)`, but without the need to actually copy data, due to \n", "the way data is \"shared\" in Haskell." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "getCards1 :: B8.ByteString -> Seq.Seq Card\n", "getCards1 bs = go Seq.empty bs\n", " where\n", " go acc xs | B8.null xs = acc\n", " | otherwise = let (card,todo) = getCard xs\n", " newacc = acc |> card\n", " in go newacc todo" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The function above introduces [guards](http://learnyouahaskell.com/syntax-in-functions#guards-guards), that is\n", "the syntax\n", "\n", "```\n", "name args | conditional1 = value1\n", " | conditional2 = value2\n", " ...\n", " | otherwise = valueN\n", "```\n", "\n", "The terms to the right of the `|` character (i.e. `conditonal1`...) are the guards, and act like a giant if statement, in that they are checked for from top to bottom, with the first condition that evaluates to `True` \"winning\" (the `otherwise` symbol is just another name for `True`, so\n", "it acts like the `default` label in C-style case statements). \n", "\n", "This is an example of a routine for which \"eta reduction\" makes sense, in that the `bs` argument sent in to `getCards1` is only used in the initial call to\n", "`go`, but I left it in for clarity (and because I'm about to show a different version\n", "which uses the era-reduced form!).\n", "In this new version, I take advantage of the\n", "[unfold pattern](http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.42.1735)\n", "([postscript version](http://www.cs.ox.ac.uk/jeremy.gibbons/publications/unfold.ps.gz)) - which\n", "is available for most Haskell container data types - to\n", "deal with the building of the sequence.\n", "Here I use\n", "[`Seq.unfoldr`](http://hackage.haskell.org/package/containers-0.5.6.3/docs/Data-Sequence.html#v:unfoldr),\n", "which is given a routine that \n", "deconstructs the bytestring (i.e. it returns the next element and the remainder of the bytestring).\n", "This sounds a lot like `getCard`, although it's a slightly-modified version, as shown below, due\n", "to how `unfoldr` is defined:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "getCards :: B8.ByteString -> Seq.Seq Card\n", "getCards = Seq.unfoldr go\n", " where\n", " go bs | B8.null bs = Nothing\n", " | otherwise = Just (getCard bs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To explain this a bit further, let's look at `unfoldr`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ ":type Seq.unfoldr" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The first argument is a function which takes a value (of type `b`), and returns a value\n", "with a \n", "[`Maybe` type](http://hackage.haskell.org/package/base-4.8.0.0/docs/Prelude.html#t:Maybe).\n", "This is used in Haskell to represent optional values, since there are two cases:\n", "\n", " - `Nothing`, to represent the empty (or missing, or \"null\", or ...) case\n", " - `Just v`, which indicates that there is a result, with a value `v`\n", "\n", "In this case, the function uses `Nothing` to indicate that the recursion is to stop, otherwise\n", "it returns a pair of values as `Just (a,b)`, where `a` is the value to store and `b` is the\n", "new \"seed\" value (i.e. input to the next call of the function).\n", "\n", "Let's try with a simple example: a routine that is given an integer and returns the value converted to a string, along with the initial value reduced by 1. To indicate it has finished, the routine will stop (i.e. return `Nothing`) if the integer value is less than 1." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "silly :: Int -> Maybe (String, Int)\n", "silly a = if a < 1 then Nothing else Just (show a, a-1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here's several examples of its behavior:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "silly 5\n", "silly 1\n", "silly 0" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When used with `unfoldr`, we get a routine that takes an integer and returns a sequence of strings:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ ":type Seq.unfoldr silly" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Hopefully, if I've been able to explain this, the following output should come as no surprise (well, apart from\n", "the \n", "[`fromList` part](http://hackage.haskell.org/package/containers-0.5.6.3/docs/Data-Sequence.html#v:fromList), \n", "which is part of the \n", "[`Show`](http://hackage.haskell.org/package/base-4.8.0.0/docs/Prelude.html#t:Show)\n", "representation of a Sequence):" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "Seq.unfoldr silly 5" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With that digression out of the way, let's get back to some parsing. As the data comes in\n", "chunks of 2880 bytes, let's just try with the first chunk\n", "([take](http://hackage.haskell.org/package/bytestring-0.10.6.0/docs/Data-ByteString-Char8.html#v:take)\n", "returns the first `n` bytes of the string):" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "cards = getCards (B8.take chunk cts)\n", ":type cards" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "Seq.length cards" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I can use \n", "[array indexing](http://hackage.haskell.org/package/containers-0.5.6.3/docs/Data-Sequence.html#v:index)\n", "to access the contents:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "Seq.index cards 0" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "Seq.index cards 1" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "Seq.index cards 2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As `NAXIS=0`, there's no data associated with this block, just metadata. The question is, just how much\n", "metadata? Now, due to the way that the header units are defined, there are three cases for the last card\n", "in a chunk of 2880 bytes:\n", "\n", " - a normal keyword\n", " - the END keyword\n", " - 80 spaces as a \"filler\" card after the END keyword (the standard says that the space\n", " character should be used but occasionally you find files that don't match the standard;\n", " fortunately I don't have to worry about that complication today).\n", " \n", "This means that all I need to do is check the first four characters of the last card: if it's\n", "`\"END \"` or `\" \"` then it's the end of the header data. I can use the\n", "[`viewr`](http://hackage.haskell.org/package/containers-0.5.6.3/docs/Data-Sequence.html#v:viewr)\n", "function to extract the last card (this is one of the advantages of using a sequence\n", "over a Haskell list, namely efficient access to both ends of the data). The return value has the type\n", "[`ViewR`](http://hackage.haskell.org/package/containers-0.5.6.3/docs/Data-Sequence.html#t:ViewR)\n", "which can either be empty (`EmptyR`), or `lcard :> lelem`, where\n", "`lelem` is the last element and `lcard` are all the cards preceeding `lelem`." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "lcards :> lelem = viewr cards" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With this, the last card of this block is:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "lelem" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The above call to `viewr` - or, more particularly, the deconstruction\n", "of the return value using `:>` - is incomplete, because it does not deal\n", "with all-possible values. In this particular case we know that `cards`\n", "is not empty, so the answer can not be `EmptyR`. When given an arbitrary\n", "sequence, the empty case must also be accounted for, and I'll show\n", "code later on that shows how this can be done.\n", "\n", "This piece of code is a good example showing how Haskell's\n", "[laziness](http://en.wikipedia.org/wiki/Lazy_evaluation)\n", "can result in surprising behavior. It's not\n", "obvious that a function has actually been evaluated, even\n", "after a statement like\n", "\n", "```\n", "lcards :> lelem = viewr cards\n", "```\n", "\n", "It's a\n", "bit like poor-old Schrödinger's cat: we won't know the state until the compiler evaluates the\n", "answer (\"opens the box\"). In the code above, it was only when `lelem` was\n", "displayed that we could be sure that the `viewr` call had \"run\".\n", "As an example of this, note what happens when `viewr` is called with an empty sequence:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "xxx :> yyy = viewr Seq.empty" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There's no error at this point. It is only when we do something with one of the values that \n", "the error makes itself known:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "Seq.length xxx" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `Irrefutable pattern` failure means that\n", "it could not match the expected value of `xxx :> yyy` with the actual value, which in this case is `EmptyR`.\n", "\n", "Getting back to the header parsing, the fact that `lelem` is all blank indicates that this is the\n", "end of the first block. Let's write a quick function to encode this logic, on the off chance that I may\n", "want to use it again..." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "hasEndCard :: Seq.Seq Card -> Bool\n", "hasEndCard cards = case viewr cards of\n", " _ :> card -> any (`B8.isPrefixOf` card) [B8.pack \" \", B8.pack \"END \"]\n", " EmptyR -> True -- treat an empty sequence as indicating the end of a block " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I use the `case` syntax so that I can safely handle an empty input, rather than\n", "causing a run-time error (since they're somewhat frowned upon in Haskell!). If the sequence\n", "is not empty then I use pattern matching to \"deconstruct\" the return value; that is\n", "I use the syntax `_ :> card` to provide a name for the last card and to tell\n", "the compiler I don't care about the preceeding elements in the sequence \n", "(i.e. the \"`_`\" part).\n", "The check on the last card is performed using the `any` routine, which has the signature:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ ":type any" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As I want to check whether `card` starts with `\" \"` or `\"END \"`, then the function sent\n", "to `any` takes a prefix and checks if matches the start of `card`, and the array of values\n", "are the prefixes to use. Note that there are several ways this could have been written, for instance\n", "being explicit with the functions (rather than using partial application) and moving the\n", "[`pack`](https://hackage.haskell.org/package/bytestring-0.10.6.0/docs/Data-ByteString-Char8.html#v:pack)\n", "call into the anonymous function:\n", "\n", "```\n", "any (\\prefix -> (B8.pack prefix) `B8.isPrefixOf` card) [\" \", \"END \"]\n", "```\n", "\n", "The last check handles calls when the input sequence is empty, \n", "in which I somewhat-arbitrarily decided to return `True`.\n", "Here I was explicit in the case statement; that is, I wrote\n", "\n", "```\n", "EmptyR -> True\n", "```\n", "\n", "but you will often see the \"default\" case written as\n", "\n", "```\n", "_ -> True\n", "```\n", "\n", "A quick check that it works as expected:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "hasEndCard cards" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So, we can skip to the next chunk, which will be a header unit for the second block,\n", "but this time there's more than 2880 bytes of metadata\n", "(the [`drop`](http://hackage.haskell.org/package/bytestring-0.10.6.0/docs/Data-ByteString-Char8.html#v:drop)\n", "call ignores the first `chunk` bytes of `cts`, so the `take`/`drop` \n", "calls are equivalent to Python's [chunk:2*chunk] array slice syntax):" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "cards2 = getCards (B8.take chunk (B8.drop chunk cts))\n", "hasEndCard cards2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What I want is a routine that will strip off 2880-byte chunks, convert them to\n", "cards, continuing until an `END` card is found. This sounds like an extension\n", "to how `getCards` works, in that it requires repeated application of\n", "the routine. It would also be nice if it\n", "returned the unused data, since we will want that when extracting the data values." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "getUnits :: B8.ByteString -> (Seq.Seq Card, B8.ByteString)\n", "getUnits = go Seq.empty \n", " where\n", " -- splitAt will work if the input text is smaller than the chunk\n", " -- size, but I include the check to make the code a bit clearer\n", " -- in intent (also, if the file is a valid FITS file then this\n", " -- first check should never be fired).\n", " --\n", " go cards bs | B8.length bs < chunk = (cards, bs)\n", " | otherwise = let (ls,rs) = B8.splitAt chunk bs\n", " newCards = getCards ls\n", " -- the >< operator appends the two sequences together\n", " combined = cards >< newCards\n", " in if hasEndCard newCards\n", " then (combined, rs)\n", " else go combined rs\n", " \n", "(units, bdata) = getUnits (B8.drop chunk cts)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "Seq.length units\n", "Seq.index units 0\n", "Seq.index units 1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Looking at the data section, we can see it's binary encoded, which is no surprise since\n", "the `XTENSION` keyword was set to `BINTABLE`. I need access to the header information to\n", "be able to parse this data." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "B8.take 20 bdata" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So, how do we handle the metadata section? Well, the FITS header can be viewed as a glorified key-value map,\n", "so let's use this as an API (this means that I lose ordering information, and informational keywords\n", "such as `HISTORY` and `COMMENT`, but they aren't needed to find out how to decode the data section).\n", "First I load up some useful modules, which include the \n", "[`Map` API](https://hackage.haskell.org/package/containers-0.5.6.3/docs/Data-Map.html),\n", "the \n", "[`Foldable` module](http://hackage.haskell.org/package/base-4.8.0.0/docs/Data-Foldable.html),\n", "and the\n", "[`isSpace`](http://hackage.haskell.org/package/base-4.8.0.0/docs/Data-Char.html#v:isSpace)\n", "and\n", "[`catMaybes`](http://hackage.haskell.org/package/base-4.8.0.0/docs/Data-Maybe.html#v:catMaybes)\n", "functions." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import qualified Data.Map as Map\n", "import qualified Data.Foldable as Fold\n", "import Data.Char (isSpace)\n", "import Data.Maybe (catMaybes)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next up is a helper function - `rstrip` - and one to\n", "strip out the (key,value) pairs from a card (`splitCard`).\n", "\n", "The `rstrip` function takes advantage of the \n", "[`unsnoc` function](https://hackage.haskell.org/package/bytestring-0.10.6.0/docs/Data-ByteString-Char8.html#v:unsnoc)\n", "function that returns a bytestring containing all-but-the-last character,\n", "and the last character. The routine recursively calls\n", "itself until a non-whitespace character is found. I believe that the name \n", "`unsnoc` comes\n", "from a \"play\" on words, in that `cons` is used to describe adding an element to the\n", "start of a list ([Lisp heritage](http://www.gnu.org/software/emacs/manual/html_node/elisp/Cons-Cells.html)),\n", "so that the act of adding an element to the end of a list\n", "is the reverse of `cons`, namely `snoc`.\n", "The `un` part then comes from the fact that here we are deconstructing, rather than\n", "creating, the string.\n", "\n", "The `splitCard` function decides whether a card contains a key/value pair,\n", "returning `Nothing` if it doesn't or splitting up the card into\n", "its parts if it does (wrapping the result in `Just`). To\n", "make downstream processing easier, the key and value strings\n", "are \"cleaned up\" (trailing whitespace is removed, so it's fortuitous\n", "that I've just written a routine to do this) as well\n", "as converted from a `ByteString` to a `String`.\n", "\n", "Now, at first glance, `splitCard` is inefficent; \n", "for example, it looks like `key` and `val` are always going\n", "to be evaluated, even when `Nothing` is returned. However,\n", "there's three points to consider:\n", "\n", " 1. we shouldn't really be worrying about performance until\n", " it becomes obvious that it's a problem;\n", " \n", " 2. thanks to Haskell's laziness, code is evaluated \"on demand\",\n", " so routines can have very-different performance\n", " characteristics compared to the same code written in a language like\n", " Python;\n", " \n", " 3. the Haskell compiler can perform some rather impressive\n", " optimisations (although this is less relevant when running\n", " in an interactive environment like this IHaskell notebook). " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "-- | Remove trailing whitespace from a bytestring.\n", "--\n", "rstrip :: B8.ByteString -> B8.ByteString\n", "rstrip bs = case B8.unsnoc bs of\n", " Just (nbs, c) -> if isSpace c then rstrip nbs else bs\n", " _ -> bs\n", "\n", "-- | If a card has a keyword and value (i.e. is of the form\n", "-- 'keyword = value') return the two parts, after stripping\n", "-- out trailing whitespace.\n", "--\n", "splitCard :: \n", " Card \n", " -> Maybe (String, String) -- ^ (keyword, value)\n", "splitCard card = \n", " let (l,r) = B8.splitAt 8 card\n", " (cs,v) = B8.splitAt 2 r\n", " key = rstrip l\n", " val = rstrip (B8.dropWhile isSpace v)\n", " sep = B8.pack \"= \"\n", " in if cs == sep\n", " then Just (B8.unpack key, B8.unpack val)\n", " else Nothing " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can see what `splitCard` does by giving it a card:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "Seq.index units 1\n", "splitCard (Seq.index units 1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The reason for converting to a (key,value) pair is that this is the format\n", "used to populate maps. The `map1` map (or dictionary) - which we create\n", "below - contains one keyword,\n", "`\"testkey\"`, whose value is `\"value\"`. The\n", "[`Map.lookup` function](http://hackage.haskell.org/package/containers-0.5.6.3/docs/Data-Map-Lazy.html#v:lookup)\n", "is used to query the map for a keyword: if it doesn't exist then\n", "`Nothing` is returned, otherwise the value is returned wrapped inside a\n", "`Just`." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "map1 = Map.fromList [(\"testkey\", \"value\")]\n", "\n", "Map.lookup \"42\" map1\n", "Map.lookup \"testkey\" map1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The initial version of the conversion routine - i.e. to create the key/value map\n", "from a sequence of header cards - is given below, as `cardsToMap2`\n", "(the awkward numeric suffix is because I'm \"counting down\" to the final version \n", "of the routine):" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "cardsToMap2 :: Seq.Seq Card -> Map.Map String String\n", "cardsToMap2 cs = \n", " let -- step 1\n", " xs :: [Card]\n", " xs = Fold.toList cs\n", "\n", " -- step 2\n", " ys :: [Maybe (String,String)]\n", " ys = map splitCard xs\n", "\n", " -- step 3\n", " zs :: [(String,String)]\n", " zs = catMaybes ys\n", "\n", " in Map.fromList zs -- step 4" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The steps have been explicitly broken out and I have included the types of the\n", "terms for documentation. The four steps are:\n", "\n", " 1. convert from a sequence into a list using\n", " [`Fold.toList`](http://hackage.haskell.org/package/base-4.8.0.0/docs/Data-Foldable.html#v:toList)\n", " (the `Foldable` module provides a number of generic routines useful for\n", " sequences - i.e. it somewhat contradicts my earlier comment that there's no\n", " common abstraction - but it is quite an \"abstract\" abstraction, so I\n", " have stayed away from using it in this notebook where possible);\n", "\n", " 2. apply `splitCard` to each card;\n", " \n", " 3. filter out all the `Nothing` entries created by `splitCard`\n", " and extract the values from the `Just` elements using the\n", " [`catMaybes`](http://hackage.haskell.org/package/base-4.8.0.0/docs/Data-Maybe.html#v:catMaybes)\n", " function (which has a signature `[Maybe a] -> [a]`);\n", "\n", " 4. convert this list into a map using the\n", " [`fromList` function](https://hackage.haskell.org/package/containers-0.5.6.3/docs/Data-Map-Lazy.html#v:fromList).\n", " \n", "This function can also be written in one line:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "cardsToMap1 :: Seq.Seq Card -> Map.Map String String\n", "cardsToMap1 cs = Map.fromList (catMaybes (map splitCard (Fold.toList cs)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "which I only introduced as it will hopefully make it more obvious what the final version of the\n", "routine is doing (shown below). However, it also highlights a code simplification, since\n", "the pattern `catMaybes (map f xs)` can be replaced with `mapMaybe f xs`.\n", "The \n", "[`mapMaybe`](http://hackage.haskell.org/package/base-4.8.0.0/docs/Data-Maybe.html#v:mapMaybe)\n", "function has the signature" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import Data.Maybe (mapMaybe)\n", ":type mapMaybe" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "and will apply the function (here, `splitCard`) to every element of the list,\n", "throwing out all the `Nothing` values and extracting the successful values\n", "by removing the `Just` constructor. This modification was proffered by\n", "the Haskell linter ([`hlint`](http://community.haskell.org/~ndm/hlint/)),\n", "which also suggested the eta-reduction step earlier.\n", "\n", "The final version of the conversion routine is:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "cardsToMap :: Seq.Seq Card -> Map.Map String String\n", "cardsToMap = Map.fromList . mapMaybe splitCard . Fold.toList" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this version I have used\n", "[point-free syntax](http://en.wikipedia.org/wiki/Tacit_programming)\n", "(which - for the purpose of this notebook - means without including parameter names),\n", "[partial application](http://en.wikipedia.org/wiki/Partial_application),\n", "and\n", "[function composition](http://en.wikipedia.org/wiki/Function_composition_%28computer_science%29) \n", "(the \"`.`\" operator)\n", "to combine the steps. This is all a big digression, since the three versions of the\n", "routine all do the same thing, but I mention it here as this style is\n", "common in Haskell code.\n", "\n", "The `.` operator is used in two ways in the function `cardsToMap`:\n", "\n", " 1. to indicate the namespace of a symbol (e.g. that `filter` is taken from the\n", " `Seq` module);\n", " \n", " 2. to combine functions.\n", " \n", "In the second case, `.` is itself a function, with the following signature" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ ":type (.)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "That is, if `f` and `g` are single-argument functions, then `f . g` and `g . f` are both single argument functions,\n", "where `f . g` means the same as `f (g x)` and `g . f` means `g (f x)`. So, function composition is read right to left. I find that this syntax can help focus on the way information flows through a sequence of functions,\n", "but it can definitely be hard to read, and occasionally \n", "[taken way past sensible extremes](https://wiki.haskell.org/Pointfree#Problems_with_pointfree)!\n", "\n", "It's time to stop the digressions and actually use the function:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "hdr = cardsToMap units" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The \n", "[`size`](http://hackage.haskell.org/package/containers-0.5.6.3/docs/Data-Map-Lazy.html#v:size)\n", "function returns the number of keys in the map:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "Map.size hdr" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "and I can use `lookup` on it:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "Map.lookup \"XTENSION\" hdr\n", "Map.lookup \"DOESNOTEXIST\" hdr" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since I'm going to want to call `lookup` a few times - and I'm lazy - I want a helper routine\n", "that means I don't have to send in the header, and automatically removes the `Just` wrapper:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import Data.Maybe (fromJust)\n", "\n", "getKey1 key = fromJust (Map.lookup key hdr)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Following my point-free mania, this can be \"simplified\" to:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "getKey = fromJust . flip Map.lookup hdr" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This uses the \n", "[`flip`](http://hackage.haskell.org/package/base-4.8.0.0/docs/Prelude.html#v:flip)\n", "function to swap the argument order to `lookup`, so that I can use eta reduction to avoid\n", "naming the keyword variable, and\n", "[`fromJust`](http://hackage.haskell.org/package/base-4.8.0.0/docs/Data-Maybe.html#v:fromJust)\n", "to extract the value from the `Just` constructor. Now, `fromJust` is\n", "what is known as a\n", "[partial function](http://en.wikipedia.org/wiki/Partial_function) since it is not\n", "defined for all inputs: that is, if the input is a `Nothing` the routine\n", "will raise a run-time error.\n", "This is not normally considered \"good form\" in Haskell code, but I feel it's okay for\n", "this interactive exploration.\n", "\n", "Here's a run through of how the types work out when `flip` is used. As with\n", "many of the \"tips\" I've been showing here, it should be used with care!" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ ":type Map.lookup\n", ":type flip\n", ":type flip Map.lookup\n", ":type flip Map.lookup hdr\n", ":type fromJust . flip Map.lookup hdr" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In case you are interested, here is an example of `getKey` \"blowing up\" due to a\n", "missing keyword:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "getKey \"DOESNOTEXIST\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For a FITS binary table, the important \"structural\" keywords - i.e. those that define the layout and size\n", "of the binary data - are:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "getKey \"BITPIX\"\n", "getKey \"NAXIS\"\n", "getKey \"NAXIS1\"\n", "getKey \"NAXIS2\"\n", "getKey \"TFIELDS\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So, there are three fields - i.e. columns - in each row, each row is 12 bytes long, and there are\n", "1070 rows of data. The field (column) information is stored in the following keywords:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "getKey \"TTYPE1\"\n", "getKey \"TFORM1\"\n", "getKey \"TUNIT1\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "getKey \"TTYPE2\"\n", "getKey \"TFORM2\"\n", "getKey \"TUNIT2\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "getKey \"TTYPE3\"\n", "getKey \"TFORM3\"\n", "getKey \"TUNIT3\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So, the three columns are called \n", "`ENERG_LO`, `ENERG_HI`, and `SPECRESP` (the `TTYPEn` values); \n", "the data is represented as three 32-bit IEEE-754 floating-point numbers with no packing (the\n", "`NAXIS1` and `TFORMn` values) and is in big-endian format$^\\dagger$. The `TUNITn` keys give any units\n", "associated with the column, and are not needed to parse the data. It would be an interesting\n", "experiment to see if the unit value could be used to create a \"numerically-typed\" variable, \n", "as I used in my \n", "[first three notebooks](https://github.com/DougBurke/astro-haskell/blob/master/README.md),\n", "and I may come back to this idea at some unspecified point in the future!\n", "\n", "$^\\dagger$ I'm sure the standard specifies this, but I just tried the various \"endian\" conversion routines until I got the answers I expected!\n", "\n", "For this notebook I am going to use an \"array of structures\" representation - that is, create a record to represent each row - rather than a \"structure of arrays\". For this, I need a row type, such as\n", "\n", "```\n", "data Row = Row { energLo :: Float, energHi :: Float, specResp :: Float }\n", "```\n", "\n", "which uses the Haskell\n", "[record syntax](http://learnyouahaskell.com/making-our-own-types-and-typeclasses#record-syntax) \n", "to allow the fields of the structure to be named (it's a bit like a C-style `struct`\n", "or Python's [`namedtuple`](http://pymotw.com/2/collections/namedtuple.html)).\n", "\n", "However, I'm going to be a little-more general than this, and allow the row to be parameterised\n", "by the data type used to store the data; that is:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "data Row a = Row { energLo :: a, energHi :: a, specResp :: a } deriving (Eq, Show)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I've told the compiler to \n", "[automatically create](http://en.wikibooks.org/wiki/Haskell/Classes_and_types#Deriving)\n", "instances of the \n", "[`Eq`](http://hackage.haskell.org/package/base-4.8.0.0/docs/Prelude.html#t:Eq)\n", "and \n", "[`Show`](http://hackage.haskell.org/package/base-4.8.0.0/docs/Prelude.html#t:Show)\n", "typeclasses for this type: the latter because it'll be useful when checking that the\n", "parsing has worked and the former for a sanity check I make later on, once I've read in all the data." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "instance Functor Row where\n", " fmap f (Row elo ehi spec) = Row (f elo) (f ehi) (f spec)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I manually create\n", "a \n", "[`Functor` instance](http://hackage.haskell.org/package/base-4.8.0.0/docs/Prelude.html#t:Functor)\n", "because I'll want that when plotting up the results$^\\dagger$. \n", "I could have got the compiler to derive this\n", "code too, by turning on the `DeriveFunctor` extension, but I felt it useful to show the code.\n", "The functor instance lets you apply a function to all elements of the row (so just like how\n", "`map` lets you apply a function to all elements of a list and it is how I applied\n", "`splitCard` to every element in the sequence as part of the `cardsToMap` function).\n", "\n", "$^\\dagger$ This isn't strictly necessary, as it doesn't save much code in this particular example, but\n", "why not!\n", "\n", "As an example of this functor instance, let's create a row of float values:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "testRow :: Row Float\n", "testRow = Row 0.1 0.2 10.0" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "which can be displayed (taking advantage of the automatically-derived `Show` instance):" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "testRow" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The functor instance allows us to convert all the values to a string,\n", "by applying `show` to each field using \n", "[`fmap`](http://hackage.haskell.org/package/base-4.8.0.0/docs/Prelude.html#v:fmap):" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "fmap show testRow\n", ":type fmap show testRow" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "or even keep the values with a `Float` type but changing their\n", "value - e.g. multiplying by two:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "fmap (*2) testRow" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "although I have to admit that it's hard to see the use in this last example\n", "for this particular data structure!\n", "\n", "Now there's two \"main\" systems for converting to and from binary data: \n", "[binary](http://hackage.haskell.org/package/binary)\n", "and\n", "[cereal](http://hackage.haskell.org/package/cereal).\n", "I've chosen to use `cereal` because it has in-built support for\n", "IEEE-754 conversion. " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import Data.Serialize.Get\n", "import Data.Serialize.IEEE754" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's start simply, by just parsing the first value (so the `ENERG_LO` column of the first row)\n", "and checking that it has a value of `0.3` (which is what `dmlist` reported its value to be). The simplest\n", "way to do this is to use \n", "[`runGet`](https://hackage.haskell.org/package/cereal-0.4.1.1/docs/Data-Serialize-Get.html#v:runGet),\n", "which takes a parser and a bytestring, and returns the answer.\n", "For 32-bit floats in big-endian order, the parser is called\n", "[`getFloat32be`](http://hackage.haskell.org/package/cereal-0.4.1.1/docs/Data-Serialize-IEEE754.html#v:getFloat32be):" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "runGet getFloat32be bdata" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Success! By manually moving through the binary data, I can extract following elements, such\n", "as the `ENERG_HI`, and `SPECRESP` values from the first row,\n", "and the `ENERG_LO` value from the second row:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "runGet getFloat32be (B8.drop 4 bdata)\n", "runGet getFloat32be (B8.drop 8 bdata)\n", "runGet getFloat32be (B8.drop 12 bdata)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The values returned by `runGet` are preceeded by `Right`, which is because\n", "the return value is actually `Either String a`, where `a` is the type of the\n", "parser (`Float` in this case). \n", "The\n", "[`Either` type](http://hackage.haskell.org/package/base-4.8.0.0/docs/Prelude.html#t:Either), \n", "is - like the `Maybe` type - used to handle cases where two different values are needed.\n", "In the `Maybe` case this is `Nothing` (e.g. \"failure\") or `Just a`; in the `Either` case, \n", "the two values are `Left a` and `Right b`, where the types of `a` and `b` can be different.\n", "One of the common use cases for `Either` values is to handle calculations that can fail,\n", "where the \"left side\" has a value of `Left String`, for the error message, and the \"right\n", "side\" is used to indicate success, which is what we have here (`Right a`). So, we can ignore\n", "for now the `Right` prefixes, and concentrate on the numeric values. As a reminder, the\n", "`dmlist` output for the three rows was:\n", "\n", "```\n", "ROW ENERG_LO ENERG_HI SPECRESP\n", " \n", " 1 0.30000001192093 0.31000000238419 4.8743429184\n", " 2 0.31000000238419 0.31999999284744 14.8292617798\n", " 3 0.31999999284744 0.33000001311302 21.3022918701\n", "```\n", "\n", "which shows that I'm getting the right values (modulo differences in the display of\n", "floating-point values).\n", "\n", "So, I *could* use this to parse each element at a time, manually stepping\n", "through the bytestring, but that\n", "seems like a lot of work. The next thing to try is the\n", "[`runGetState`](https://hackage.haskell.org/package/cereal-0.4.1.1/docs/Data-Serialize-Get.html#v:runGetState)\n", "routine, which also returns the unparsed data. That is:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "-- The last argument is the index into the input string at which to start parsing,\n", "-- which is 0 here.\n", "--\n", "Right (v1, bdata1) = runGetState getFloat32be bdata 0\n", "Right (v2, bdata2) = runGetState getFloat32be bdata1 0\n", "Right (v3, bdata3) = runGetState getFloat32be bdata2 0\n", "Right (v4, _) = runGetState getFloat32be bdata3 0\n", "\n", "(v1, v2, v3, v4)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here I've assumed that the parsing can not fail, so it is safe to assume the return value is\n", "of the form `Right ...`. If it did fail then we would\n", "see the `Irrefutable pattern` error (shown earlier in the discussion of `viewr`) when\n", "one of the values was used (in this case, displaying the output of the `v1..v4` tuple).\n", "\n", "The code above can hardly be called elegant. One solution is to expand the \"language\" of the parser, so that\n", "it can parse things that we are interested in - in this case a `Row Float32` - so that I\n", "can leave `runGet` (or `runGetState`) to deal with the boring parts.\n", "So, let's look at the type of the `getFloat32be` parser:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ ":type getFloat32be" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Hopefully this signature will be somewhat surprising (to the non-Haskellers\n", "reading this, anyway) since\n", "\n", " * what is this `Get` structure?\n", " \n", " * it doesn't seem to have any way to input the bytestring,\n", " nor does it have a way of returning the unparsed data!\n", " \n", "Well, this is how Haskell deals with context-sensitive computations. In this case,\n", "\"context\" means \"the remaining bytestring to parse\", but it can also be\n", "a store of messages that log a computation, or some piece of state information\n", "that can be both read or changed (e.g. the arrangement of a plot)\n", "during a computation, or whether a computation has failed, or ...\n", "The most obvious context, which I've been using in these notebooks\n", "without mentioning it, is I/O (e.g. printing messages to the screen or reading\n", "the contents of the file). Haskell bases all these different sorts of \n", "computation on the same underlying mathematical abstraction. It can \n", "be thought of as a \"mini language\" inside Haskell, often recognizable\n", "by code written in the style\n", "\n", "```\n", "do\n", " a <- someFunc\n", " b <- anotherFunc a\n", " return (a+b)\n", "```\n", "\n", "which we have seen in the previous notebooks (and will see again\n", "below) when creating graphs.\n", "Note that `return` in Haskell can be confusing for people with experience in languages such as C, Fortran, or Python. For now, just be aware that `return` is one of those places that Haskell is different to other languages.\n", "\n", "What the above means is that I can write a parser for a single row of \n", "data - which consists of three floats with no padding -\n", "with the following code (not that I'd expect you to be able to work this out from\n", "my description!):" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "getRow1 :: Get (Row Float)\n", "getRow1 = do\n", " elo <- getFloat32be\n", " ehi <- getFloat32be\n", " spec <- getFloat32be\n", " return (Row elo ehi spec)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Checking this out with `runGet` gives:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "runGet getRow1 bdata" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "and" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "runGet getRow1 (B8.drop 12 bdata)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Out of interest, here's what happens when the parsing fails (which in this case\n", "means \"runs out of data\", which I simulate by sending in less-than twelve bytes):" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "runGet getRow1 (B8.take 4 bdata)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The fact that I labelled this routine `getRow1` might suggest to you that\n", "I'm going to re-write it, and you would not be wrong. Using yet-more\n", "mathematical abstractions - in this case\n", "[applicative functors](http://learnyouahaskell.com/functors-applicative-functors-and-monoids) -\n", "the routine can be written more compactly as:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import Control.Applicative ((<$>), (<*>))\n", "\n", "getRow :: Get (Row Float)\n", "getRow = Row <$> getFloat32be <*> getFloat32be <*> getFloat32be" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This version resembles the \"point-free style\" I described earlier, in that it avoids\n", "naming \"temporary things\". I mention it here in case you see these symbols (`<$>` and `<*>`) around and wonder what they are (plus, this is how I originally wrote the routine before deciding it was a *tad* cryptic\n", "to lead with). As an aside, GHC 7.10 came out whilst I was putting together this notebook. One of the changes in this release is\n", "known as the \"[AMP](https://wiki.haskell.org/Functor-Applicative-Monad_Proposal)\", which - for the purposes of\n", "this notebook - means that the code I've written may need small tweaks to compile without\n", "warning messages when using GHC version 7.10 or later. \n", "\n", "I can now build on `getRow` (or, equivalently, `getRow1`), to write a parser for\n", "multiple rows, by repeatedly calling `getRow` until the parser runs out of\n", "data. Fortunately there's a function - \n", "[`isEmpty`](http://hackage.haskell.org/package/cereal-0.4.1.1/docs/Data-Serialize-Get.html#v:isEmpty) -\n", "for this check. The resulting parser,\n", "which I based on code from the \n", "[binary module documentation](http://hackage.haskell.org/package/binary-0.7.4.0/docs/Data-Binary-Get.html),\n", "is" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "getRows :: Get [Row Float]\n", "getRows = do\n", " empty <- isEmpty\n", " if empty\n", " then return []\n", " else do\n", " row <- getRow\n", " rows <- getRows\n", " return (row:rows)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When run, it checks to see if there's any more data. If not, the result is the\n", "empty list, otherwise it parses the current row, then makes a recursive call\n", "(parsing the rest of the rows), before combining the two.\n", "\n", "This time, when running the parser, I explicitly deal with the error case by\n", "converting it to an\n", "[`error`](http://hackage.haskell.org/package/base-4.8.0.0/docs/Prelude.html#v:error), which will (hopefully) provide more useful information than an `Irrefutable pattern` error\n", "(although I don't plan on showing this feature off here ;-). In the success case\n", "I strip off the `Right` wrapper." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "arows = case runGet getRows bdata of\n", " Left msg -> error msg\n", " Right v -> v" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "head arows\n", "arows !! 1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So, yet-more success. Let's check that there are the correct number of rows (`NAXIS2` was 1070):" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "length arows" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Oops. This is because the data blocks are chunked up into units of 2880 bytes, just like the header blocks, so there\n", "are excess rows, as 1070 times 12 does not map to a chunk boundary:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "(1070 * 12) `divMod` chunk\n", "(1200 * 12) `divMod` chunk" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this particular case the extra values all decode to 0: " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "arows !! 1069\n", "arows !! 1070\n", "arows !! 1199" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If I were doing this properly, I'd send in the correct number of bytes to getRows (in this case `NAXIS1` times `NAXIS2` bytes) but for now I can just drop the invalid rows:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "rows = take 1070 arows\n", "\n", "head rows\n", "rows !! 1\n", "rows !! 2\n", "rows !! 3\n", "rows !! 4" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It looks like the bins are consecutive; that is, the `energHi` field of row `n` is the\n", "same as the `energLo` field of row `n+1`. To see whether this holds for all rows, I\n", "extract the contents of the \"low\" and \"high\" fields with calls to \n", "[`map`]((http://hackage.haskell.org/package/base-4.8.0.0/docs/Prelude.html#v:tail),\n", "remove the first element of the \"low\" array (with\n", "[`tail`](http://hackage.haskell.org/package/base-4.8.0.0/docs/Prelude.html#v:tail),\n", "which is like Python's `[1:]` array slice),\n", "the last element of the \"high\" array (with\n", "[`init`](http://hackage.haskell.org/package/base-4.8.0.0/docs/Prelude.html#v:init),\n", "which is Python's `[:-1]` slice),\n", "and then check that the resulting arrays are equal. This only\n", "works because of the\n", "[`Eq`](http://hackage.haskell.org/package/base-4.8.0.0/docs/Prelude.html#t:Eq)\n", "instance of `Row` that I earlier asked the compiler to derive for me (that is, \n", "`==` can deal with type `[a]` - a list of `a`'s - if it knows how to compare\n", "`a`'s)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "elos = map energLo rows\n", "ehis = map energHi rows\n", "\n", "tail elos == init ehis" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, it's time to make a plot of the data. I load in the necessary code to ensure that the plots can\n", "get displayed in-line\n", "(this just re-uses the code I created in \n", "[previous notebooks](http://htmlpreview.github.io/?https://raw.githubusercontent.com/DougBurke/astro-haskell/master/html/angular%20diameter%20distance.html)):" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import IHaskell.Display\n", "import Graphics.Rendering.Chart.Backend.Diagrams\n", "\n", "-- Apparently |> is also defined in the Easy module, so, as I have already imported\n", "-- it from Data.Sequence, I explicitly avoid importing it here.\n", "--\n", "import Graphics.Rendering.Chart.Easy hiding ((|>))\n", "\n", "instance IHaskellDisplay (Renderable a) where\n", " display renderable = renderableToSVG renderable 450 300 >>= display . fst" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now I create a function which takes a list of rows and creates a plot. Since the data\n", "is binned - in that the `SPECRESP` column is defined over an energy range - then \n", "I want to display it as a histogram. Now, there is support for drawing histograms in\n", "Chart (e.g. \n", "[`PlotBars`](https://hackage.haskell.org/package/Chart-1.3.3/docs/Graphics-Rendering-Chart-Plot-Bars.html))\n", "but I don't find the interface intuitive, so I am going to\n", "create a plot manually$^\\dagger$. I can do this by drawing a line that connects\n", "`(elo_0,specresp_0)`, `(ehi_0,specresp_0)`, \n", "`(elo_1,specresp_1)`, `(ehi_1,specresp_1)`, ... \n", "`(elo_1069,specresp_1069)`, `(ehi_1069,specresp_1069)`.\n", "Ideally the list would start at `(elo_0,0)` and end at `(ehi_1069,0)`, but I'll leave\n", "that for now. The code also takes advantage of the `Functor` instance of the\n", "`Row` type that I wrote earlier, to convert the storage type from `Float` to `Double`\n", "(the reason for this is given in the comments).\n", "\n", "Note that the plot is defined using \"do\" notation, although this time there are no\n", "obvious calls to \"set\" variables (i.e. nothing of the form `a <- function`). This is\n", "because the Chart API uses the \n", "[lens package](https://hackage.haskell.org/package/lens/)\n", "to greatly-reduce the amount of code needed to set the various fields it uses\n", "(such as the various \"titles\" which I change below). If you're interested in\n", "trying out Chart, I would\n", "use the \n", "[Chart examples](https://github.com/timbod7/haskell-chart/wiki) as a basis,\n", "before looking too deeply into how to use lens.\n", "\n", "$^\\dagger$ pointers to existing code that already does this are most welcome." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "drawARF ins = toRenderable (do\n", " layout_title .= \"ARF\"\n", " layout_x_axis . laxis_title .= \"E (keV)\"\n", " layout_y_axis . laxis_title .= \"cm^2\"\n", " plot (line \"\" [cs])\n", " )\n", " where\n", " -- At present there's no PlotValue instance for Float types, which means\n", " -- that I can't just send a list of Float values to line (I have a\n", " -- PR on GitHub to address this issue at\n", " -- https://github.com/timbod7/haskell-chart/pull/77,\n", " -- which has now been accepted, but rather than re-build with that\n", " -- version, the easiest solution is to convert the Float values\n", " -- to Double. I could do this manually, but as I have a Functor instance\n", " -- on the Row type, I can convert Row Float to Row Double just with a\n", " -- call to \"fmap realToFrac\".\n", " --\n", " rs = map (fmap realToFrac) ins\n", " \n", " (elos, ehis, arfs) = unzip3 (map (\\(Row elo ehi arf) -> (elo,ehi,arf)) rs)\n", " xs = concat (zipWith (\\a b -> [a,b]) elos ehis)\n", " ys = concatMap (\\a -> [a,a]) arfs\n", " cs = zip xs ys" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that there is a Functor instance of lists, and it is the same as `map`,\n", "which means that I could have said\n", "\n", "```\n", " rs = fmap (fmap realToFrac) ins\n", "```\n", "\n", "but while the compiler can work out which `fmap` it needs to use, for this \n", "example I decided to be explicit!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Displaying the ARF" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The moment of truth: what does the ARF look like?" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "drawARF rows" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When discussing the \n", "[k correction](http://htmlpreview.github.io/?https://raw.githubusercontent.com/DougBurke/astro-haskell/master/html/k%20corrections%20with%20a%20dimensional%20twist.html),\n", "the spectral models were in units of \n", "photon/cm$^2$/s/keV. This measures the flux of photons from a source (photon/s)\n", "per energy (the /keV term) and per unit area (the /cm$^2$ term).\n", "The ARF is given in units of area (cm$^2$), so if you multiply it by \n", "the spectral model you get a spectrum in units of photon/s/keV,\n", "which - when integrated up over an energy range - gives a flux in\n", "units of photon/s. A physical interpretation is that the ARF\n", "measures the sensitivity of the telescope as a function of energy.\n", "The fact that the ARF is not a constant, and in fact shows significant\n", "structure such as a large edge at 2 keV, is one reason why understanding\n", "X-ray data (that is, comparing models to data), is not simple.\n", "\n", "The Chandra mirrors are large -\n", "[\"human\" size](http://chandra.harvard.edu/about/telescope_system.html),\n", "with a mass of about 1500 kg - but, because this is an X-ray telescope,\n", "they are not arranged as they would be in an optical telescope (such as\n", "the \n", "[Ritchey–Chrétien](http://en.wikipedia.org/wiki/Ritchey%E2%80%93Chr%C3%A9tien_telescope)\n", "design used in the Hubble Space Telescope) which reflects light, \n", "but as four concentric cylinders\n", "(known as a \n", "[Wolter Type-I design](http://en.wikipedia.org/wiki/Wolter_telescope))\n", "which uses grazing-incidence reflection to focus the light:\n", "\n", "![Grazing-incidence reflection through the Chandra mirrors](http://chandra.harvard.edu/graphics/resources/illustrations/cxcmirrors-72.jpg)\n", "\n", "Image credits: NASA/CXC/D.Berry, available at \n", "http://chandra.harvard.edu/resources/illustrations/teleSchem.html" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The cylinders have diameters ranging from 65 cm to 1.23 m, but\n", "an area of 400 cm$^2$ is equivalent (when thinking about an area of sky \n", "measuring the flux) to a circle of radius 11 cm:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sqrt (400 / pi)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This can lead to jealousy when talking to our optical colleagues, who can talk about using\n", "telescopes with radii of 4 m and above, and \n", "[there are plans](http://en.wikipedia.org/wiki/List_of_largest_optical_reflecting_telescopes#Plans)\n", "for 15 m radii in the near future! In defense of the \n", "[people who build X-ray mirrors](http://en.wikipedia.org/wiki/Leon_Van_Speybroeck),\n", "optical telescopes do not use the full mirror area (M$_1$), since the \n", "[secondary mirror](http://en.wikipedia.org/wiki/Secondary_mirror)\n", "(M$_2$) obstructs some light, and there's generally a big hole in the center of the mirror,\n", "in part, because the secondary mirror occludes this area,\n", "but mainly because this is where the light is reflected down to the \n", "tertiary mirror and/or instruments ($F$ in the image below)!\n", "\n", "![Diagram of Ritchey-Chrétien reflector](http://upload.wikimedia.org/wikipedia/commons/thumb/1/10/Diagram_Reflector_RitcheyChretien.svg/500px-Diagram_Reflector_RitcheyChretien.svg.png)\n", "\n", "Image credits: By HHahn (Own work) [CC BY-SA 3.0 (http://creativecommons.org/licenses/by-sa/3.0) or GFDL (http://www.gnu.org/copyleft/fdl.html)], via Wikimedia Commons\n", "\n", "So, I guess the summary is: don't judge an Astronomer by the size of her mirror!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## A different way to display the ARF" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As I was writing this notebook, I got to thinking about other ways to display the data.\n", "One obvious choice was a HTML table, which gave me an excuse to play a bit more with\n", "the `IHaskell` display code. Fortunately, I have already installed the necessary Haskell packages - in this\n", "case \n", "[`blaze-html`](https://hackage.haskell.org/package/blaze-html) - so I can go straight\n", "to work. The routines from the \n", "[`Html5`](https://hackage.haskell.org/package/blaze-html-0.8.0.2/docs/Text-Blaze-Html5.html)\n", "module are going to do all the heavy lifting of creating the HTML output; all I\n", "need to do is put them together in the right way." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import Text.Blaze.Html5 hiding (head, map)\n", "import Data.Monoid ((<>), mconcat)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As well as the Html code, I have also loaded the `<>` operator from\n", "the \n", "[Data.Monoid](http://hackage.haskell.org/package/base-4.8.0.0/docs/Data-Monoid.html)\n", "module. The Monoid typeclass$^\\dagger$ represents things that can be combined together,\n", "that is - have an associative operator - \n", "using `<>`\n", "(well, it actually means a bit more than that, but all I need is the \"combine\"\n", "operator, which is also known as\n", "[mappend](http://hackage.haskell.org/package/base-4.8.0.0/docs/Data-Monoid.html#v:mappend)).\n", "The \n", "[mconcat](http://hackage.haskell.org/package/base-4.8.0.0/docs/Data-Monoid.html#v:mconcat)\n", "function lets you combine a list of values using `<>`; that is \n", "`mconcat [a_0, a_1, a_2, ... a_n]` is the same as\n", "`a_0 <> a_1 <> a_2 <> ... <> a_n`. \n", "\n", "$^\\dagger$ 10 points to the person in the back of the class - yes, you over there in the floral dress - who\n", "just mumbled \"oh what, Yet Another Mathematical Abstraction In Haskell!\".\n", "\n", "As an example, lists can be concatenated together with \n", "[`++`](http://hackage.haskell.org/package/base-4.8.0.0/docs/Prelude.html#v:-43--43-),\n", "but this form is specific to lists. The generic version - since a list\n", "has a Monoid instance - is `<>`; that is, the following two are the same:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "[1,3] ++ [2,4]\n", "[1,3] <> [2,4]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To start off, I want to try converting a single `Row` into a row of a table\n", "(i.e. create a `` element containing the column data in `` elements).\n", "To do this, each element of the row is\n", "\n", " - converted to Html using [`toHtml`](https://hackage.haskell.org/package/blaze-html-0.8.0.2/docs/Text-Blaze-Html.html#v:toHtml)\n", " \n", " - and then converted to a `` element, using the appropriately-named\n", " [`td`](https://hackage.haskell.org/package/blaze-html-0.8.0.2/docs/Text-Blaze-Html5.html#v:td)\n", " function.\n", " \n", "The three `` elements are combined together using `<>`, which \n", "appends them to form `.........`. These\n", "are then placed inside a `` element." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "rowToHtml (Row elo ehi sp) = \n", " let col val = td (toHtml val)\n", " in tr (col elo <> col ehi <> col sp)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, the `toHtml` function has an interesting signature:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ ":type toHtml" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "which says that it can be given any type, as long as it is an instance\n", "of the \n", "[`ToMarkup` typeclass](https://hackage.haskell.org/package/blaze-markup-0.7.0.2/docs/Text-Blaze.html#t:ToMarkup).\n", "This constraint - indicated by the text to the left of the `=>` in the signature,\n", "\"infects\" the `rowToHtml` signature, since we have:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ ":type rowToHtml" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This makes sense, since it says that `rowToHtml` can only be used on a `Row` whose\n", "type can be converted to Html. Fortunately the `Float` type is an instance of the\n", "`ToMarkup` class. You can find this out from the documentation, either of the\n", "`ToMarkup` class or `Float` - e.g. (the `:opt` line is just to tell the notebook\n", "to include the information inline, so it's visible in the HTML version of the page,\n", "but it appears that it may not always display particularly wel):" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ ":opt no-pager" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ ":info Float" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "However, the easiest way is to just try it and see if it compiles, or you get an error back\n", "(for example, earlier when I tried `len / 2880` and got an error about a missing\n", "`Fractional` constraint)!\n", "\n", "Applying it to the last row produces:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "rowToHtml (last rows)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since the `Html` type is an instance of the `IHaskellDisplay` type class, the\n", "notebook has automatically interpreted this for us, which is why we can't\n", "see the `` and `` elements.\n", "\n", "In fact, I can make the `Row` type an instance of the `ToMarkup` typeclass using this\n", "function:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "instance ToMarkup a => ToMarkup (Row a) where\n", " toMarkup = rowToHtml" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "which means that I can now say:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "toMarkup (last rows)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Due to the way the code is setup, if you can call `toMarkup` on a type you can \n", "also call `toHtml`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "toHtml (last rows)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What happens if I try to convert several rows? " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "toMarkup (take 3 rows)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In other words, there is no instance of `ToMarkup a => ToMarkup [a]`,\n", "which makes sense because just because I know how to convert a single\n", "element to Html, there's no single way to combine them. This suggests\n", "that, semantically, I want a data type that represents a \"table\", so that\n", "it makes sense to say \"oh, when converting this to Html I want to\n", "include a table header as well as the data\".\n", "\n", "This leads me to defining an `ARF` type, based on a list of `Row` values\n", "(I am also going to restrict the row type to be `Float`):" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "newtype ARF = ARF [Row Float]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A value with a type of `ARF` is created by using the `ARF` label (constuctor),\n", "but as shown below, the notebook doesn't know what to do with it (because I\n", "did not tell the compiler to create a `Show` instance for the type, as I\n", "did with the `Row` type):" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "ARF rows" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A `Show` instance could be added easily - for instance:\n", "\n", "```\n", "instance Show ARF where\n", " show (ARF rows) = \"ARF \" ++ show rows\n", "```\n", "\n", "but we're here to create a HTML table! Since there's a lot of rows - over a thousand of them - I want\n", "to be able to show only a subset of them in this table. One simple interface is to use a `Maybe Int`\n", "as an indicator:\n", "\n", " - `Nothing` means to display all rows\n", " - `Just n` means only show the first and last `n` rows\n", " \n", "although other options are possible. This suggests the following function, which \n", "splits an input list up into the start and end subsets, if a subset is asked\n", "for (i.e. the count is `Just n`) *and* the list is long enough. In this case I\n", "use the `Either` type to return a value, not because the `Left` case should\n", "be considered a failure, but because I want to return *either* the full list\n", "(`Left`) *or* the start and end subsets (`Right`). The reason for splitting the\n", "two cases will hopefully become clear soon$^\\dagger$.\n", "\n", "$^\\dagger$ If you can't wait, it's because I want to add a row of \"...\" characters\n", "to indicate when rows have been excluded." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "subsetRows :: \n", " Maybe Int -- ^ how many rows to select from start and end, if any \n", " -> [a] -- ^ the input list\n", " -> Either [a] ([a],[a])\n", "subsetRows mcount xs =\n", " let n = length xs\n", " in case mcount of\n", " Just nr | n > 2 * nr -> let start = take nr xs\n", " end = drop (n-nr) xs\n", " in Right (start, end)\n", " _ -> Left xs" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are actually two cases above that result in `Left xs` being returned:\n", "\n", " - `mcount` is `Just nr`, but `n <= 2 * nr`\n", " - `mcount` is `Nothing`\n", " \n", "I use the \"catch-all\" `_` case to handle both of these (this is in contrast\n", "to the earlier case when I called `viewr` but explicitly named both patterns\n", "of the `case` statement).\n", "\n", "Checking this gives the expected (or, hopefully you expect them) results:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "subsetRows Nothing [1..10]" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "subsetRows (Just 3) [1..10]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In `subsetRows` I did not assume anything about the contents of the list (in otherwords, the list type `a`\n", "was not constrained). This is often \"a good thing™\", since it means that\n", "\n", " - the code is more re-usable\n", " - you can be more certain what the code does (or, perhaps more-importantly, what it can *not* do).\n", "\n", "That is, with a type like `[a] -> [a]`, we know that the function can either return the empty list or values drawn\n", "from the input list (maybe repeated), but it can not return a value not in the input list (since, without knowing the type, it can not \"create\" a value). This can be compared to a function like `[Int] -> [Int]` which is free to replace every even value by 2 and odd value by the length of the list.\n", "\n", "Eventually I do need to write a function with a specific type! In this case, I want something that\n", "will create the `` elements for the HTML table, marking any excluded rows by using a\n", "row of ellipses - i.e. a row like" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "toHtml (Row \"...\" \"...\" \"...\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "but using the UTF-8 ellipsis character rather than three dots.\n", "\n", "Using `subsetRows` I can convert the\n", "output rows into HTML using `toHtml`, and insert a row of ellipsis if the return\n", "value was `Right`. I could do this just for `Row Float`, but let's keep trying to\n", "be general and allow any row with a `ToMarkup`-compatible type:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "htmlSubset :: ToMarkup a => Maybe Int -> [Row a] -> Html\n", "htmlSubset mcount xs = \n", " let out = case subsetRows mcount xs of\n", " Left ys -> map toHtml ys\n", " Right (ls,rs) -> let hellip = \"…\"\n", " spacer = toHtml (Row hellip hellip hellip)\n", " in map toHtml ls <> [spacer] <> map toHtml rs \n", " in mconcat out" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I used `<>` to combine the lists but I could have also used `++` here.\n", "Since each element of out is `Html`, and `Html` is a Monoid, I can use\n", "`mconcat` to combine them together, so that the return is `Html` rather than\n", "`[Html]`.\n", "\n", "There's a subtle point in the above for the newcomer to Haskell: why did I write\n", "\n", "```\n", "map toHtml ls <> [spacer] <> map toHtml rs\n", "```\n", "\n", "rather than \n", "\n", "```\n", "map toHtml (ls <> [spacer] <> rs)\n", "```\n", "\n", "that is, combine `ls`, `spacer`, and `rs` into a list first? The reason is\n", "because the types don't match up: `ls` and `rs` both have type `[Row a]`,\n", "but `spacer` is the output of a `toHtml` call, and so has type `Html`. The\n", "compiler knows this, and won't let you combine lists of different types.\n", "So, I convert the `ls` and `rs` values to `Html`, and then I can combine\n", "the three terms.\n", "\n", "As a check it works (although it isn't displayed particularly nicely, as there is no\n", "`` around the ``'s created by the routine):" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "htmlSubset (Just 2) rows" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The \"header\" row can be created manually (I'd probably not pull this out\n", "into a separate routine, other than in a notebook like this). It's somewhat\n", "ugly since there's a lot of type conversion going on (i.e. all the\n", "calls to `toHtml`). There are ways to avoid this - in particular the\n", "[`OverloadedStrings` GHC extension](https://ocharles.org.uk/blog/posts/2014-12-17-overloaded-strings.html),\n", "but there are times when it's worth being explicit (if only \n", "to point out where some of the \"warts\" in a lanugage are):" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "arfHeader :: Html\n", "arfHeader = \n", " let c1 = toHtml \"E\" <> sub (toHtml \"low\") <> toHtml \" (keV)\"\n", " c2 = toHtml \"E\" <> sub (toHtml \"high\") <> toHtml \" (keV)\"\n", " c3 = toHtml \"ARF (cm\" <> sup (toHtml \"2\") <> toHtml \")\"\n", " in tr (th c1 <> th c2 <> th c3)\n", " \n", "arfHeader" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "These can then be put together to create a table:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "-- | Convert a list of rows into a HTML table.\n", "--\n", "--\n", "arfToHtml ::\n", " Maybe Int -- ^ if `Nothing`, display all rows, otherwise just this number at the start and end\n", " -> ARF -- ^ data to display\n", " -> Html -- ^ HTML represenation\n", "arfToHtml mcount (ARF rows) =\n", " table (thead arfHeader <> tbody (htmlSubset mcount rows))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Testing on a subset of the rows gives (the output may not appear quite correctly,\n", "in that horizontal and vertical lines may be missing, depending on how you\n", "are viewing this notebook):" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "arfToHtml Nothing (ARF (take 5 rows))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "arfToHtml (Just 1) (ARF (take 5 rows))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With this routine, I can finally create an instance of `IHaskellDisplay` for\n", "the `ARF` type, where I choose to use a display of 3 rows:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "instance IHaskellDisplay ARF where\n", " display = display . arfToHtml (Just 3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "and use this to display the ARF in tabular form:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "ARF rows" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This could be extended to include metadata about the ARF, but that would require extracting\n", "the relevant information from the header (i.e. parsing the value strings to separate out the data from\n", "the comment string), and enhancing the `ARF` data type to store this information. Perhaps next time..." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The end" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There you go; I hope you enjoyed it. If you have any questions, then please use the\n", "[GitHub issues page](https://github.com/DougBurke/astro-haskell/issues) or\n", "contact me on Twitter at\n", "." ] } ], "metadata": { "kernelspec": { "display_name": "Haskell", "language": "haskell", "name": "haskell" }, "language_info": { "name": "haskell", "version": "7.8.4" } }, "nbformat": 4, "nbformat_minor": 0 }