{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "This section of the notes is intended for follow-up after the class and future reference. We will not go through this in detail in the teaching sessions.\n", "\n", "This material and the exercises are more advanced than in the main notes. You are not *required* to do these, but may like to do so if you want to stretch yourself. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## A2.1 Binary Operators" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### A2.1.1 Basic Logic" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Quite often (data masks in data products, being the most typical example), you will need to deal with binary numbers and binary operations, so we will introduce the concepts you need here.\n", "\n", "We came across *logical* or *Boolean* operations in the main session (with the data type `bool`). In logic, something can be `True` or `False`, and operations such as [`not`](http://en.wikipedia.org/wiki/Logical_NOT), [`and`](http://en.wikipedia.org/wiki/Logical_AND) and [`or`](http://en.wikipedia.org/wiki/Logical_OR) have quite obvious meanings. 
More generally in logic (and electronics) we may come across other logical operators such as [`nand`](http://en.wikipedia.org/wiki/Logical_NAND) (`not and`), [`nor`](http://en.wikipedia.org/wiki/Logical_NOR) (`not or`) and [`xor`](http://en.wikipedia.org/wiki/Logical_XOR) (exclusive `or`: *either but not both*), but these are not defined as logical operators in Python (they can of course be *derived*, though).\n", "\n", "You should first make sure that you understand the results of the `and`, `or` and `not` `bool` operators:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "False and False = False\n", "False and True = False\n", "True and False = False\n", "True and True = True\n" ] } ], "source": [ "print 'False and False =',False and False\n", "print 'False and True =',False and True\n", "print 'True and False =',True and False\n", "print 'True and True =',True and True" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "False or False = False\n", "False or True = True\n", "True or False = True\n", "True or True = True\n" ] } ], "source": [ "print 'False or False =',False or False\n", "print 'False or True =',False or True\n", "print 'True or False =',True or False\n", "print 'True or True =',True or True" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "not False = True\n", "not True = False\n" ] } ], "source": [ "print 'not False =',not False\n", "print 'not True =',not True" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise A2.1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As an exercise for this, you could see if you can simulate the logical combinations `xor`, `nor` and `nand`, e.g.:" ] }, { "cell_type": "code", 
"execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "False nand False = True\n", "False nand True = True\n", "True nand False = True\n", "True nand True = False\n" ] } ], "source": [ "# nand test:\n", "# see http://en.wikipedia.org/wiki/Logical_NAND\n", "# (A nand B) is not (A and B)\n", "\n", "ABList = [(False,False),(False,True),(True,False),(True,True)]\n", "for A,B in ABList:\n", " print '%s nand %s ='%(str(A),str(B)),not (A and B)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### A2.1.2 Binary" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Endianness" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It is of value to have some understanding of binary operations and representation(s):\n", "\n", "- you will come across these in encoded data products (such as the QA information in MODIS and other satellite products) as it is a more efficient way of encoding multiple sets of logical information \n", "- this is the form in which the computer ultimately stores and processes information, so it is useful to have some appreciation of that \n", "- you will sometimes need to consider how large a number representation is (e.g. byte or short integer or long integer) as this can impact computer memory and storage requirements \n", "- you may come across different binary representations for different datasets\n", "\n", "There are two main number representations used in computing, which depend on the interpretation of the [MSB](http://en.wikipedia.org/wiki/Most_significant_bit), Most Significant Bit or 'high bit order' (or Byte) and [LSB](http://en.wikipedia.org/wiki/Least_significant_bit), Least Significant Bit or 'low bit order' (or Byte). \n", "\n", "Which system is used is sometimes referred to as ['endianness'](http://en.wikipedia.org/wiki/Endianness) so we may refer to a 'big-endian' or 'little-endian' representation. 
In a big-endian representation, the left-most byte is the most significant (it carries the largest place value). In a little-endian system, the left-most byte is the least significant.\n", "\n", "It is probably easiest to understand this with decimal numbers:\n", "\n", "So, in a big-endian decimal representation:\n", "\n", " 152\n", " \n", "represents one hundred and fifty two (`(1 x 10^2) + (5 x 10^1) + (2 x 10^0)`)\n", "\n", "In a little-endian system, this is interpreted the other way around, so `152` is `(2 x 10^2) + (5 x 10^1) + (1 x 10^0)`, so is actually two hundred and fifty one.\n", "\n", "The term comes from Jonathan Swift's [Gulliver's Travels](http://www.gutenberg.org/files/829/829-h/829-h.htm), if you are interested, referring to which end of an egg the people of Lilliput and Blefuscu believe you should open:\n", "\n", "``\"(The people of Lilliput and Blefuscu have) been engaged in a most obstinate war for six-and-thirty moons past. It began upon the following occasion. It is allowed on all hands, that the primitive way of breaking eggs, before we eat them, was upon the larger end; but his present majesty’s grandfather, while he was a boy, going to eat an egg, and breaking it according to the ancient practice, happened to cut one of his fingers. Whereupon the emperor his father published an edict, commanding all his subjects, upon great penalties, to break the smaller end of their eggs. The people so highly resented this law, that our histories tell us, there have been six rebellions raised on that account; wherein one emperor lost his life, and another his crown. These civil commotions were constantly fomented by the monarchs of Blefuscu; and when they were quelled, the exiles always fled for refuge to that empire. It is computed that eleven thousand persons have at several times suffered death, rather than submit to break their eggs at the smaller end. 
Many hundred large volumes have been published upon this controversy: but the books of the Big-endians have been long forbidden, and the whole party rendered incapable by law of holding employments.\"``\n", "\n", "\n", "Endianness, then (like which end of an egg you should break), is ultimately arbitrary, but we must still have conventions so that information on one computer system can be interpreted by another.\n", "\n", "At present, probably most computers you use will be little-endian (the 'Intel convention' as it has come to be known), but you may also come across data stored in big-endian (so-called 'Motorola convention') format. \n", "\n", "Endianness applies mainly at the byte level (it concerns the order of the bytes), so if you are considering a single byte, which 'end you open the egg' has no consequence.\n", "\n", "For longer number representations though (e.g. a 2-byte or 4-byte integer) it can have significant consequences, and you need to be aware of the endianness of the data that you are trying to read. Many file formats will explicitly store the endianness that the data were written in, so a correct interpretation will be handled by low level reading routines. 
But if you get multi-byte binary data that look rather odd when you display them, one possible reason is that you have assumed a different endianness from the one the data were written in.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Binary numbers" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When we are working with *binary* numbers *for a single bit*, we represent `False` by `0` and `True` by `1`.\n", "\n", "In Python, these representations can essentially be used interchangeably with logical operators (but we use the function `bool()` to convert to a boolean representation)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "True\n" ] } ], "source": [ "print True or False" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1\n" ] } ], "source": [ "print 1 or 0" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "True\n" ] } ], "source": [ "print bool(1 or 0)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "it's true what they say ...\n" ] } ], "source": [ "whatTheySay = True\n", "\n", "if whatTheySay:\n", " print \"it's true what they say ...\"\n", "else:\n", " print \"it's not true what they say ...\"" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "it's true what they say ...\n" ] } ], "source": [ "whatTheySay = 1\n", "\n", "if whatTheySay:\n", " print \"it's true what they say ...\"\n", "else:\n", " print \"it's not true what they say ...\"" ] }, { "cell_type": "markdown", "metadata": {}, 
"source": [ "A binary representation of a number is a representation in [base 2](http://en.wikipedia.org/wiki/Binary_number), where we may use multiple [bits](http://en.wikipedia.org/wiki/Bit) to represent numbers.\n", "\n", "Some number representations (such as [floating point](http://en.wikipedia.org/wiki/IEEE_floating_point)) are stored in a more complicated manner, but [ASCII codes](http://www.asciitable.com/) for string representation and integers are in a more straightforward binary format.\n", "\n", "So, e.g. the (integer) **decimal** number 3 is `11` in binary (base 2):" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "3 == (1 * 2**1) + \\\n", " (1 * 2**0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can use the Python function `bin()` to get this more directly:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0b11\n" ] } ], "source": [ "print bin(3)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "3\n" ] } ], "source": [ "print 0b11" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Similarly, the decimal number `101` is `1100101` in binary:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "101 == (1 * 2**6) + \\\n", " (1 * 2**5) + \\\n", " (0 * 2**4) + \\\n", " (0 * 2**3) + \\\n", " (1 * 2**2) + \\\n", " (0 * 2**1) + \\\n", " (1 * 2**0)" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", 
"output_type": "stream", "text": [ "0b1100101\n" ] } ], "source": [ "print bin(101)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise A2.2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "a. Work out what the following decimal numbers are in binary:\n", "\n", " 7 \n", " 493 \n", " 127 \n", " 255 \n", " 1024 \n", "\n", " and check your result using the approach we took to confirming `101`:\n", "\n", " 101 == (1 * 2**6) + \\\n", " (1 * 2**5) + \\\n", " (0 * 2**4) + \\\n", " (0 * 2**3) + \\\n", " (1 * 2**2) + \\\n", " (0 * 2**1) + \\\n", " (1 * 2**0)\n", " \n", "b. How many bits are needed to represent each of these numbers?\n", "\n", "c. What is the largest number you could represent in: (a) a 32 bit representation; (b) a 64 bit representation?\n", "\n", "d. Recalling that there are 8 bits in a [byte](http://en.wikipedia.org/wiki/Byte), what is the largest number you could represent in: (a) a single byte; (b) two bytes? " ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0b111101101 is 493 in decimal\n", "0b111101101 is 0755 in octal\n", "0b111101101 is 0x1ed in hexadecimal\n" ] } ], "source": [ "'''\n", " Explorations in binary!\n", "\n", " We can represent:\n", " \n", " binary numbers, e.g. 0b010101\n", " octal numbers, e.g. 0o755\n", " hexadecimal, e.g. 
0x2A2FF\n", "\n", " but if we print these, \n", " by default they are printed as the decimal equivalents.\n", " \n", " We can convert to binary, octal or hex string with \n", " bin(), oct(), hex()\n", "'''\n", "x = 0b111101101\n", "print bin(x),'is',x,'in decimal'\n", "print bin(x),'is',oct(x),'in octal'\n", "print bin(x),'is',hex(x),'in hexadecimal'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### A2.1.3 Bitwise Operators" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In Python (and most other computer languages) you have access to bitwise operators. As you might expect, these are operators that are executed on individual bits in a binary representation of a number.\n", "\n", "The bitwise operators available in Python are:\n", "\n", "- `&` bitwise and\n", "- `|` bitwise or\n", "- `^` bitwise `xor` (exclusive or)\n", "- `~` bitwise ones complement\n", "- `<<` bitwise left shift\n", "- `>>` bitwise right shift\n", "\n", "\n", "The `&` operator performs a logical `and` operation on each pair of corresponding bits in two binary representations, so:\n", "\n", " 1 & 0 == 0\n", " \n", "is the same as the logical `True and False` operation that we saw above.\n", "\n", "The `|` operator performs a logical `or` operation on each pair of corresponding bits in two binary representations, so:\n", "\n", " 1 | 0 == 1\n", " \n", "is the same as the logical `True or False` operation that we saw above.\n", "\n", "Similarly, \n", "\n", " 1 ^ 0 == 1\n", " 1 ^ 1 == 0\n", " \n", "and, for a single bit,\n", "\n", " ~1 == 0\n", " ~0 == 1\n", " \n", "The same rules apply to every bit in a bit field:\n", "\n", " ~1010 == 0101\n", " \n", "etc. Note that Python integers are signed, so evaluating `~1` in Python gives `-2` rather than `0`; the expressions above describe what happens to the individual bits (e.g. 
`~1 & 1` is `0`).\n", "\n", "The shift operators are interesting: \n", "\n", "a left shift by 1, for example, is equivalent to multiplying by 2, and a right shift by 1 to an integer division by 2 (any remainder is discarded).\n", "\n", "They are also very useful in sorting out 'bit masks' for data products."
] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "521 \tA:\t0b1000001001\n", "523 \tB:\t0b1000001011\n", "\tA | B:\t0b1000001011 \t523\n", "\tA ^ B:\t0b10 \t\t2\n", "\tA & B:\t0b1000001001 \t521\n" ] } ], "source": [ "'''\n", " Bitwise operators: \n", "'''\n", "\n", "A = 521\n", "B = 523\n", "# print as binary:\n", "print A,'\\tA:\\t',bin(A)\n", "print B,'\\tB:\\t',bin(B)\n", "\n", "# some operations\n", "print '\\tA | B:\\t',bin(A|B),'\\t',A|B\n", "print '\\tA ^ B:\\t',bin(A^B),'\\t\\t',A^B\n", "print '\\tA & B:\\t',bin(A&B),'\\t',A&B" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\tA:\t0b1000010011 \t531\n", "\tA>>1:\t0b100001001 \t265\n", "\tA<<1:\t0b10000100110 \t1062\n", "\tA>>3:\t0b1000010 \t66\n", "\tA<<3:\t0b1000010011000 \t4248\n" ] } ], "source": [ "'''\n", " Bitwise shift operators: \n", "'''\n", "\n", "A = 531\n", "print '\\tA:\\t',bin(A),'\\t',A\n", "print '\\tA>>1:\\t',bin(A>>1),'\\t',A>>1\n", "print '\\tA<<1:\\t',bin(A<<1),'\\t',A<<1\n", "print '\\tA>>3:\\t',bin(A>>3),'\\t',A>>3\n", "print '\\tA<<3:\\t',bin(A<<3),'\\t',A<<3" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As an example of data masking, consider the QA mask in the [MODIS Leaf Area Index (LAI)](https://lpdaac.usgs.gov/products/modis_products_table/leaf_area_index_fraction_of_photosynthetically_active_radiation/8_day_l4_global_1km/mod15a2) product:\n", "\n", "\n", "
| Bit number | Parameter Name | Bit combination | Interpretation |\n", "| --- | --- | --- | --- |\n", "| 0 | MODLAND_QC bits | 0 | Good quality (main algorithm with or without saturation) |\n", "| | | 1 | Other Quality (back-up algorithm or fill values) |\n", "| 1 | Sensor | 0 | Terra |\n", "| | | 1 | Aqua |\n", "| 2 | DeadDetector | 0 | Detectors apparently fine for up to 50% of channels |\n", "| | | 1 | Dead detectors caused >50% adjacent detector retrieval |\n", "| 3-4 | CloudState | 00 | Significant clouds NOT present (clear) |\n", "| | | 01 | Significant clouds WERE present |\n", "| | | 10 | Mixed cloud present on pixel |\n", "| | | 11 | Cloud state not defined (assumed clear) |\n", "| 5-7 | CF_QC | 000 | Main (RT) method used (best result possible (no saturation)) |\n", "| | | 001 | Main (RT) method used with saturation. (usable) |\n", "| | | 010 | Main (RT) method failed due to bad geometry (empirical algorithm used) |\n", "| | | 011 | Main (RT) method failed due to problems other than geometry (empirical algorithm used) |\n", "| | | 100 | Pixel not produced at all. |\n", "