" ], "text/plain": [ " A B C D\n", "0 6 9 2 6\n", "1 7 4 3 7\n", "2 7 2 5 4" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.DataFrame(rng.randint(0, 10, (3, 4)),\n", " columns=['A', 'B', 'C', 'D'])\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we apply a NumPy ufunc on either of these objects, the result will be another Pandas object *with the indices preserved:*" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "0 403.428793\n", "1 20.085537\n", "2 1096.633158\n", "3 54.598150\n", "dtype: float64" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.exp(ser)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Or, for a slightly more complex calculation:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
          A             B         C             D
0 -1.000000  7.071068e-01  1.000000 -1.000000e+00
1 -0.707107  1.224647e-16  0.707107 -7.071068e-01
2 -0.707107  1.000000e+00 -0.707107  1.224647e-16
   A   B
0  1  11
1  5   1
" ], "text/plain": [ " A B\n", "0 1 11\n", "1 5 1" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A = pd.DataFrame(rng.randint(0, 20, (2, 2)),\n", " columns=list('AB'))\n", "A" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " B A C\n", "0 4 0 9\n", "1 5 8 0\n", "2 9 2 6" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "B = pd.DataFrame(rng.randint(0, 10, (3, 3)),\n", " columns=list('BAC'))\n", "B" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " A B C\n", "0 1.0 15.0 NaN\n", "1 13.0 6.0 NaN\n", "2 NaN NaN NaN" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A + B" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice that indices are aligned correctly irrespective of their order in the two objects, and indices in the result are sorted.\n", "As was the case with ``Series``, we can use the associated object's arithmetic method and pass any desired ``fill_value`` to be used in place of missing entries.\n", "Here we'll fill with the mean of all values in ``A`` (computed by first stacking the rows of ``A``):" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " A B C\n", "0 1.0 15.0 13.5\n", "1 13.0 6.0 4.5\n", "2 6.5 13.5 10.5" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fill = A.stack().mean()\n", "A.add(B, fill_value=fill)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The following table lists Python operators and their equivalent Pandas object methods:\n", "\n", "| Python Operator | Pandas Method(s) |\n", "|-----------------|---------------------------------------|\n", "| ``+`` | ``add()`` |\n", "| ``-`` | ``sub()``, ``subtract()`` |\n", "| ``*`` | ``mul()``, ``multiply()`` |\n", "| ``/`` | ``truediv()``, ``div()``, ``divide()``|\n", "| ``//`` | ``floordiv()`` |\n", "| ``%`` | ``mod()`` |\n", "| ``**`` | ``pow()`` |\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Ufuncs: Operations Between DataFrame and Series\n", "\n", "When performing operations between a ``DataFrame`` and a ``Series``, the index and column alignment is similarly maintained.\n", "Operations between a ``DataFrame`` and a ``Series`` are similar to operations between a two-dimensional and one-dimensional NumPy array.\n", "Consider one common operation, where we find the difference of a two-dimensional array and one of its rows:" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "array([[3, 8, 2, 4],\n", " [2, 6, 4, 8],\n", " [6, 1, 3, 8]])" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A = rng.randint(10, size=(3, 4))\n", "A" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "array([[ 0, 0, 0, 0],\n", " [-1, -2, 2, 4],\n", " [ 3, -7, 1, 4]])" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A - A[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "According to NumPy's broadcasting rules 
(see [Computation on Arrays: Broadcasting](02.05-Computation-on-arrays-broadcasting.ipynb)), subtraction between a two-dimensional array and one of its rows is applied row-wise.\n", "\n", "Pandas follows the same convention and operates row-wise by default:" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Q R S T\n", "0 0 0 0 0\n", "1 -1 -2 2 4\n", "2 3 -7 1 4" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.DataFrame(A, columns=list('QRST'))\n", "df - df.iloc[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you would instead like to operate column-wise, you can use the object methods mentioned earlier, while specifying the ``axis`` keyword:" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Q R S T\n", "0 -5 0 -6 -4\n", "1 -4 0 -2 2\n", "2 5 0 2 7" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.subtract(df['R'], axis=0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that these ``DataFrame``/``Series`` operations, like the operations discussed above, will automatically align indices between the two elements:" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "Q 3\n", "S 2\n", "Name: 0, dtype: int64" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "halfrow = df.iloc[0, ::2]\n", "halfrow" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
     Q   R    S   T
0  0.0 NaN  0.0 NaN
1 -1.0 NaN  2.0 NaN
2  3.0 NaN  1.0 NaN
