{ "metadata": { "name": "", "signature": "sha256:0cf1f78b6ba6bfde5f16bbcf0ddcbc279f59d4df482c314f9160298ce0b4e6f4" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Sample Rows Of A Dataframe In Pandas\n", "\n", "- **Author:** [Chris Albon](http://www.chrisalbon.com/), [@ChrisAlbon](https://twitter.com/chrisalbon)\n", "- **Date:** -\n", "- **Repo:** [Python 3 code snippets for data science](https://github.com/chrisalbon/code_py)\n", "- **Note:**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Import required modules" ] }, { "cell_type": "code", "collapsed": false, "input": [ "import pandas as pd\n", "import numpy as np" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 31 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create a dataframe of test scores" ] }, { "cell_type": "code", "collapsed": false, "input": [ "df = pd.DataFrame(np.random.randn(100, 4), columns=['test1_score', 'test2_score', 'test3_score', 'test4_score'])" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 32 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### View the top five rows" ] }, { "cell_type": "code", "collapsed": false, "input": [ "df.head()" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>test1_score</th><th>test2_score</th><th>test3_score</th><th>test4_score</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>-0.562832</td><td>0.285719</td><td>0.937775</td><td>-1.638723</td></tr>\n",
"    <tr><th>1</th><td>0.298900</td><td>-1.215272</td><td>1.461132</td><td>0.866500</td></tr>\n",
"    <tr><th>2</th><td>-1.049831</td><td>1.767881</td><td>0.221468</td><td>-1.165039</td></tr>\n",
"    <tr><th>3</th><td>1.360927</td><td>0.846616</td><td>-1.559061</td><td>-1.340281</td></tr>\n",
"    <tr><th>4</th><td>-0.022707</td><td>0.946102</td><td>0.232905</td><td>0.615826</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"<p>5 rows \u00d7 4 columns</p>\n",
"</div>
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 33, "text": [ " test1_score test2_score test3_score test4_score\n", "0 -0.562832 0.285719 0.937775 -1.638723\n", "1 0.298900 -1.215272 1.461132 0.866500\n", "2 -1.049831 1.767881 0.221468 -1.165039\n", "3 1.360927 0.846616 -1.559061 -1.340281\n", "4 -0.022707 0.946102 0.232905 0.615826\n", "\n", "[5 rows x 4 columns]" ] } ], "prompt_number": 33 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Length of the dataframe" ] }, { "cell_type": "code", "collapsed": false, "input": [ "print(len(df))" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "100\n" ] } ], "prompt_number": 34 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Randomly choose 10 rows" ] }, { "cell_type": "code", "collapsed": false, "input": [ "rows = np.random.choice(df.index.values, 10)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 35 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Convert it into a dataframe" ] }, { "cell_type": "code", "collapsed": false, "input": [ "sampled_df = df.ix[rows]" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 36 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### View the dataframe" ] }, { "cell_type": "code", "collapsed": false, "input": [ "sampled_df" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>test1_score</th><th>test2_score</th><th>test3_score</th><th>test4_score</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>61</th><td>0.350195</td><td>-1.199999</td><td>-0.277451</td><td>-1.286770</td></tr>\n",
"    <tr><th>46</th><td>-0.310364</td><td>1.086771</td><td>-0.521381</td><td>0.607132</td></tr>\n",
"    <tr><th>78</th><td>-0.215014</td><td>0.464960</td><td>-0.369023</td><td>-2.332646</td></tr>\n",
"    <tr><th>9</th><td>-1.281638</td><td>-0.268482</td><td>-0.103900</td><td>1.559594</td></tr>\n",
"    <tr><th>78</th><td>-0.215014</td><td>0.464960</td><td>-0.369023</td><td>-2.332646</td></tr>\n",
"    <tr><th>48</th><td>0.239393</td><td>-0.090481</td><td>2.453789</td><td>-0.126449</td></tr>\n",
"    <tr><th>68</th><td>-1.078161</td><td>-0.712167</td><td>0.303397</td><td>0.444029</td></tr>\n",
"    <tr><th>68</th><td>-1.078161</td><td>-0.712167</td><td>0.303397</td><td>0.444029</td></tr>\n",
"    <tr><th>51</th><td>0.087971</td><td>0.397842</td><td>-0.086190</td><td>-0.903375</td></tr>\n",
"    <tr><th>80</th><td>-0.875859</td><td>-0.873104</td><td>2.316806</td><td>0.518988</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"<p>10 rows \u00d7 4 columns</p>\n",
"</div>
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 37, "text": [ " test1_score test2_score test3_score test4_score\n", "61 0.350195 -1.199999 -0.277451 -1.286770\n", "46 -0.310364 1.086771 -0.521381 0.607132\n", "78 -0.215014 0.464960 -0.369023 -2.332646\n", "9 -1.281638 -0.268482 -0.103900 1.559594\n", "78 -0.215014 0.464960 -0.369023 -2.332646\n", "48 0.239393 -0.090481 2.453789 -0.126449\n", "68 -1.078161 -0.712167 0.303397 0.444029\n", "68 -1.078161 -0.712167 0.303397 0.444029\n", "51 0.087971 0.397842 -0.086190 -0.903375\n", "80 -0.875859 -0.873104 2.316806 0.518988\n", "\n", "[10 rows x 4 columns]" ] } ], "prompt_number": 37 } ], "metadata": {} } ] }