A NumPy array contains one or more elements of the same type. The `type` function will only tell you that a variable is a NumPy array but won't tell you the type of thing inside the array. We can find out the type of the data contained in the NumPy array.

    print(data.dtype)

    float64

This tells us that the NumPy array's elements are floating-point numbers.

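To see the two checks side by side, here is a minimal sketch. It uses a small made-up array in place of the lesson's `data` (an assumption, since the real data is loaded elsewhere), and `counts` is a hypothetical name introduced only for contrast.

```python
import numpy

# Stand-in array; the lesson's `data` is loaded from a file not shown here.
data = numpy.array([[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]])

print(type(data))   # the container: a NumPy array
print(data.dtype)   # the type of the elements stored inside it

# An array built from whole numbers reports an integer dtype instead.
counts = numpy.array([1, 2, 3])
print(counts.dtype.kind)  # 'i' marks a signed integer type
```

The exact integer dtype (for example `int64` or `int32`) depends on the platform, which is why the sketch only checks its kind.
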
Generally, a function uses inputs to produce outputs. However, some functions produce outputs without needing any input. For example, checking the current time doesn't require any input.

    import time
    print(time.ctime())

    Sat Mar 26 13:07:33 2016

For functions that don't take in any arguments, we still need parentheses (`()`) to tell Python to go and do something for us.

How did we know what functions NumPy has and how to use them? If you are working in the IPython/Jupyter Notebook, there is an easy way to find out. If you type the name of something followed by a dot, then you can use tab completion (e.g. type `numpy.` and then press tab) to see a list of all functions and attributes that you can use. After selecting one, you can also add a question mark (e.g. `numpy.cumprod?`), and IPython will return an explanation of the method! This is the same as doing `help(numpy.cumprod)`.

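Outside of IPython, tab completion and the `?` shortcut are not available, but plain Python offers rough equivalents. The sketch below uses the built-in `dir` function and a function's `__doc__` attribute; `public_names` and `first_line` are names chosen here for illustration.

```python
import numpy

# dir() lists the names a module exposes, similar to what tab completion shows.
public_names = [name for name in dir(numpy) if not name.startswith('_')]
print(len(public_names), 'public names, for example:', public_names[:5])

# The text shown by help(numpy.cumprod) comes from the function's docstring.
first_line = numpy.cumprod.__doc__.strip().splitlines()[0]
print(first_line)
```
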
We have just been importing NumPy and matplotlib using `import numpy` and `import matplotlib.pyplot`. From here on we are going to shorten these imports; you can either adopt this or continue using the long version. We are switching to the shortened versions because that is how the majority of open source scientific Python libraries write these imports, so we want to get you used to them now.

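As a sketch of what the shortened form looks like, the aliases below (`np` and `plt`) are the ones most commonly seen in the scientific Python community; they are an assumption here, since any alias would work. The long imports are repeated only to show that both names point at the same module.

```python
import numpy
import numpy as np
import matplotlib.pyplot
import matplotlib.pyplot as plt

# An alias is just a second name for the same module object.
print(np is numpy)
print(plt is matplotlib.pyplot)

# Either spelling calls the same function.
values = np.array([1.0, 2.0, 4.0])
print(np.max(values), numpy.max(values))
```
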
When working with other people, it is important to agree on a convention of how common libraries are imported.

What values do the variables `mass` and `age` have after each statement in the following program? Test your answers by executing the commands.

    mass = 47.5
    age = 122
    mass = mass * 2.0
    age = age - 20

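One way to test your answers is to print the variables after each statement. This sketch repeats the program above with `print` calls added; nothing else is changed.

```python
mass = 47.5
print('mass:', mass)                # age does not exist yet
age = 122
print('mass:', mass, 'age:', age)
mass = mass * 2.0
print('mass:', mass, 'age:', age)   # only mass has changed
age = age - 20
print('mass:', mass, 'age:', age)   # only age has changed
```
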
What does the following program print out?

    first, second = 'Grace', 'Hopper'
    third, fourth = second, first
    print(third, fourth)

    Hopper Grace

A section of an array is called a slice. We can take slices of character strings as well:

    element = 'oxygen'
    print('first three characters:', element[0:3])
    print('last three characters:', element[3:6])

    first three characters: oxy
    last three characters: gen

What is the value of `element[:4]`? What about `element[4:]`? Or `element[:]`?

    oxyg
    en
    oxygen

What is `element[-1]`? What is `element[-2]`?

    n
    e

Given those answers, explain what `element[1:-1]` does.

Creates a substring from index 1 up to (not including) the final index, effectively removing the first and last letters from 'oxygen'.

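A short sketch to confirm this, including the same slice written without a negative index (`middle` is a name introduced here for illustration):

```python
element = 'oxygen'

# Drop the first and last characters.
middle = element[1:-1]
print(middle)  # xyge

# The same slice with an explicit, non-negative end index.
print(element[1:len(element) - 1])
```
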
The expression `element[3:3]` produces an empty string, i.e., a string that contains no characters. If `data` holds our array of patient data, what does `data[3:3, 4:4]` produce? What about `data[3:3, :]`?

    []
    []

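Both results print as `[]`, which hides the fact that they have different shapes. The sketch below makes that visible; it assumes a 60 by 40 array of zeros as a stand-in for the patient data, and `empty_both` and `empty_rows` are hypothetical names.

```python
import numpy

# Stand-in for the patient data: 60 rows by 40 columns of zeros (assumed).
data = numpy.zeros((60, 40))

empty_both = data[3:3, 4:4]   # no rows and no columns selected
empty_rows = data[3:3, :]     # no rows selected, all columns kept

print(empty_both.shape)  # (0, 0)
print(empty_rows.shape)  # (0, 40)
```
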
Why do all of our plots stop just short of the upper end of our graph?

Because matplotlib normally sets x and y axes limits to the min and max of our data (depending on data range).

If we want to change this, we can use the `set_ylim(min, max)` method of each 'axes', for example:

    axes3.set_ylim(0, 6)

Update your plotting code to automatically set a more appropriate scale. (Hint: you can make use of the `max` and `min` methods to help.)

    # One method
    axes3.set_ylabel('min')
    axes3.plot(numpy.min(data, axis=0))
    axes3.set_ylim(0, 6)

    # A more automated approach
    min_data = numpy.min(data, axis=0)
    axes3.set_ylabel('min')
    axes3.plot(min_data)
    axes3.set_ylim(numpy.min(min_data), numpy.max(min_data) * 1.1)

In the center and right subplots above, we expect all lines to look like step functions because non-integer values are not realistic for the minimum and maximum values. However, you can see that the lines are not always vertical or horizontal, and in particular the step function in the subplot on the right looks slanted. Why is this?

Because matplotlib interpolates (draws a straight line) between the points. One way to avoid this is to use the Matplotlib `drawstyle` option.

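A minimal sketch of the `drawstyle` option follows. The readings are made up, the `'steps-mid'` value is one of several step styles Matplotlib accepts, and the `Agg` backend is selected only so the sketch runs without a display; in a notebook you would drop that line and let the figure render inline.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy

# Made-up integer readings standing in for the lesson's per-day minimum values.
min_data = numpy.array([0, 0, 1, 1, 2, 2, 1, 0])

fig, (axes_line, axes_step) = plt.subplots(1, 2, figsize=(8.0, 3.0))

# Default: straight segments between points, so changes look slanted.
axes_line.set_title('default')
axes_line.plot(min_data)

# Step style: only horizontal and vertical segments are drawn.
axes_step.set_title('steps-mid')
axes_step.plot(min_data, drawstyle='steps-mid')

fig.tight_layout()
```
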
Create a plot showing the standard deviation (`numpy.std`) of the inflammation data for each day across all patients.

Modify the program to display the three plots on top of one another instead of side by side.

Write some additional code that slices the first and last columns of `A`, and stacks them into a 3x2 array. Make sure to `print` the results to verify your solution.

A 'gotcha' with array indexing is that singleton dimensions are dropped by default. That means `A[:, 0]` is a one dimensional array, which won't stack as desired. To preserve singleton dimensions, the index itself can be a slice or array. For example, `A[:, :1]` returns a two dimensional array with one singleton dimension (i.e. a column vector).

An alternative way to achieve the same result is to use NumPy's `delete` function to remove the second column of `A`.

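Both approaches can be sketched as follows. The 3x3 contents of `A` are an assumption made for illustration, since the array itself is not defined in this section; `stacked` and `deleted` are hypothetical names.

```python
import numpy

# Assumed example array: 3 rows by 3 columns.
A = numpy.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9]])

print(A[:, 0].shape)   # (3,)   the singleton dimension is dropped
print(A[:, :1].shape)  # (3, 1) a slice index keeps the column dimension

# Stack the first and last columns side by side into a 3x2 array.
stacked = numpy.hstack((A[:, :1], A[:, -1:]))
print(stacked)

# Alternative: remove the column at index 1 along axis 1.
deleted = numpy.delete(A, 1, 1)
print(numpy.array_equal(stacked, deleted))
```
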
Which axis would it make sense to use the `numpy.diff()` function along?

Since the row axis (0) is patients, it does not make sense to get the difference between two arbitrary patients. The column axis (1) is in days, so the difference is the change in inflammation, a meaningful concept.

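To make the choice of axis concrete, here is a small sketch on made-up readings for two patients over four days (not the lesson's data; `readings` is a hypothetical name):

```python
import numpy

# Made-up readings: 2 patients (rows) by 4 days (columns).
readings = numpy.array([[0, 2, 5, 3],
                        [1, 1, 4, 0]])

# axis=1: change from one day to the next, for each patient.
print(numpy.diff(readings, axis=1))

# axis=0: second patient minus first patient on each day,
# which has no natural interpretation.
print(numpy.diff(readings, axis=0))
```
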
If the shape of an individual data file is `(60, 40)` (60 rows and 40 columns), what would the shape of the array be after you run the `diff()` function and why?

The shape will be `(60, 39)` because there is one fewer difference between columns than there are columns in the data.

How would you find the largest change in inflammation for each patient? Does it matter if the change in inflammation is an increase or a decrease?

By using the `numpy.max()` function after you apply the `numpy.diff()` function, you will get the largest difference between days.

If inflammation values decrease along an axis, then the difference from one element to the next will be negative. If you are interested in the magnitude of the change and not the direction, the `numpy.absolute()` function will provide that. Notice the difference if you get the largest absolute difference between readings.