{
"metadata": {
"celltoolbar": "Slideshow",
"name": "",
"signature": "sha256:e1f952a8600390364b6b97ee76ae75210e3615993b9f2ac1e6e271b023c664dd"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Me: Chris Hausler\n",
"# Today - Pandas and Scikit-Learn\n",
"\n",
"## And a lot of firsts\n",
" + first MPUG meeting... Hi \n",
" + first presentation using IPython Notebook"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Python Data Analysis Library (pandas)\n",
"\n",
"pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.\n",
"\n",
"It creates something similar to R data frames... but better.\n",
"\n",
"### I think it's great, but I'm still a bit clumsy with it... also the docs are still a little hit and miss\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"#### Some imports"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import warnings\n",
"warnings.filterwarnings(\"ignore\", category=DeprecationWarning)\n",
"warnings.filterwarnings(\"ignore\", category=UserWarning)"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [],
"prompt_number": 1
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import numpy as np\n",
"import pandas as pd\n",
"import pylab as plt\n",
"import matplotlib \n",
"%matplotlib inline\n",
"pd.__version__"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 2,
"text": [
"'0.13.1'"
]
}
],
"prompt_number": 2
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### `pandas` has two main data structures: `Series` and `DataFrame`"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Series - Like a one-dimensional array but better"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"values = [5,3,4,8,2,9]\n",
"vals = pd.Series(values)\n",
"vals"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 3,
"text": [
"0 5\n",
"1 3\n",
"2 4\n",
"3 8\n",
"4 2\n",
"5 9\n",
"dtype: int64"
]
}
],
"prompt_number": 3
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Each value is now associated with an _index_. The index itself is an object of class `Index` and can be manipulated directly."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"vals.index"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 4,
"text": [
"Int64Index([0, 1, 2, 3, 4, 5], dtype='int64')"
]
}
],
"prompt_number": 4
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"vals.values"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 5,
"text": [
"array([5, 3, 4, 8, 2, 9])"
]
}
],
"prompt_number": 5
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"vals * 2.5"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 6,
"text": [
"0 12.5\n",
"1 7.5\n",
"2 10.0\n",
"3 20.0\n",
"4 5.0\n",
"5 22.5\n",
"dtype: float64"
]
}
],
"prompt_number": 6
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"We can give the index named labels"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"vals2 = pd.Series(values, index=['tom','sally','jeff','george','pablo','florence'])\n",
"vals2"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 7,
"text": [
"tom 5\n",
"sally 3\n",
"jeff 4\n",
"george 8\n",
"pablo 2\n",
"florence 9\n",
"dtype: int64"
]
}
],
"prompt_number": 7
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"\n",
"And use these to get the data we want"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"vals2[['florence','tom']]"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 8,
"text": [
"florence 9\n",
"tom 5\n",
"dtype: int64"
]
}
],
"prompt_number": 8
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"vals2[['florence','tom','kate']]"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 9,
"text": [
"florence 9\n",
"tom 5\n",
"kate NaN\n",
"dtype: float64"
]
}
],
"prompt_number": 9
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Dealing with missing values"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"vals3 = vals2[['tom','sally','pablo','florence','ricky','katrin']]\n",
"vals3"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 10,
"text": [
"tom 5\n",
"sally 3\n",
"pablo 2\n",
"florence 9\n",
"ricky NaN\n",
"katrin NaN\n",
"dtype: float64"
]
}
],
"prompt_number": 10
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Get rid of them"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"vals3.dropna()"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 11,
"text": [
"tom 5\n",
"sally 3\n",
"pablo 2\n",
"florence 9\n",
"dtype: float64"
]
}
],
"prompt_number": 11
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Fill them with a value"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"\n",
"vals3.fillna(0)"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 12,
"text": [
"tom 5\n",
"sally 3\n",
"pablo 2\n",
"florence 9\n",
"ricky 0\n",
"katrin 0\n",
"dtype: float64"
]
}
],
"prompt_number": 12
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Fill them with a calculated value"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"vals3.fillna(vals3.mean())"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 13,
"text": [
"tom 5.00\n",
"sally 3.00\n",
"pablo 2.00\n",
"florence 9.00\n",
"ricky 4.75\n",
"katrin 4.75\n",
"dtype: float64"
]
}
],
"prompt_number": 13
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Use a function like forward fill"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"vals3.fillna(method='ffill')"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 14,
"text": [
"tom 5\n",
"sally 3\n",
"pablo 2\n",
"florence 9\n",
"ricky 9\n",
"katrin 9\n",
"dtype: float64"
]
}
],
"prompt_number": 14
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"A handy way to get a picture of our data"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"vals3.describe()"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 15,
"text": [
"count 4.000000\n",
"mean 4.750000\n",
"std 3.095696\n",
"min 2.000000\n",
"25% 2.750000\n",
"50% 4.000000\n",
"75% 6.000000\n",
"max 9.000000\n",
"dtype: float64"
]
}
],
"prompt_number": 15
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## DataFrame - Like a 2D array... with bells and whistles"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"vals.index=pd.Index(['tom','sally','pablo','florence','ricky','katrin'])\n",
"vals3=vals3[['tom','sally','pablo','florence','billy','katrin']]"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 16
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# create a dataframe\n",
"dat = pd.DataFrame({'orig':vals,'new':vals3})\n",
"dat"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"
\n", " | new | \n", "orig | \n", "
---|---|---|
billy | \n", "NaN | \n", "NaN | \n", "
florence | \n", "9 | \n", "8 | \n", "
katrin | \n", "NaN | \n", "9 | \n", "
pablo | \n", "2 | \n", "4 | \n", "
ricky | \n", "NaN | \n", "2 | \n", "
sally | \n", "3 | \n", "3 | \n", "
tom | \n", "5 | \n", "5 | \n", "
7 rows \u00d7 2 columns
\n", "\n", " | new | \n", "orig | \n", "
---|---|---|
billy | \n", "True | \n", "True | \n", "
florence | \n", "False | \n", "False | \n", "
katrin | \n", "True | \n", "False | \n", "
pablo | \n", "False | \n", "False | \n", "
ricky | \n", "True | \n", "False | \n", "
sally | \n", "False | \n", "False | \n", "
tom | \n", "False | \n", "False | \n", "
7 rows \u00d7 2 columns
\n", "\n", " | new | \n", "orig | \n", "
---|---|---|
florence | \n", "9 | \n", "8 | \n", "
pablo | \n", "2 | \n", "4 | \n", "
sally | \n", "3 | \n", "3 | \n", "
tom | \n", "5 | \n", "5 | \n", "
4 rows \u00d7 2 columns
\n", "\n", " | Date | \n", "hipster | \n", "modcloth | \n", "gumtree perth | \n", "
---|---|---|---|---|
0 | \n", "2004-01-04 | \n", "-0.976 | \n", "-0.817 | \n", "-0.844 | \n", "
1 | \n", "2004-01-11 | \n", "-0.816 | \n", "-0.817 | \n", "-0.844 | \n", "
2 | \n", "2004-01-18 | \n", "-0.837 | \n", "-0.817 | \n", "-0.844 | \n", "
3 | \n", "2004-01-25 | \n", "-0.976 | \n", "-0.817 | \n", "-0.844 | \n", "
4 | \n", "2004-02-01 | \n", "-0.722 | \n", "-0.817 | \n", "-0.844 | \n", "
5 | \n", "2004-02-08 | \n", "-0.795 | \n", "-0.817 | \n", "-0.844 | \n", "
6 | \n", "2004-02-15 | \n", "-0.723 | \n", "-0.817 | \n", "-0.844 | \n", "
7 | \n", "2004-02-22 | \n", "-0.713 | \n", "-0.817 | \n", "-0.844 | \n", "
8 | \n", "2004-02-29 | \n", "-0.786 | \n", "-0.817 | \n", "-0.844 | \n", "
9 | \n", "2004-03-07 | \n", "-0.675 | \n", "-0.817 | \n", "-0.844 | \n", "
10 rows \u00d7 4 columns
\n", "\n", " | hipster | \n", "modcloth | \n", "gumtree perth | \n", "
---|---|---|---|
2004-01-04 | \n", "-0.976 | \n", "-0.817 | \n", "-0.844 | \n", "
2004-01-11 | \n", "-0.816 | \n", "-0.817 | \n", "-0.844 | \n", "
2004-01-18 | \n", "-0.837 | \n", "-0.817 | \n", "-0.844 | \n", "
2004-01-25 | \n", "-0.976 | \n", "-0.817 | \n", "-0.844 | \n", "
2004-02-01 | \n", "-0.722 | \n", "-0.817 | \n", "-0.844 | \n", "
2004-02-08 | \n", "-0.795 | \n", "-0.817 | \n", "-0.844 | \n", "
2004-02-15 | \n", "-0.723 | \n", "-0.817 | \n", "-0.844 | \n", "
2004-02-22 | \n", "-0.713 | \n", "-0.817 | \n", "-0.844 | \n", "
2004-02-29 | \n", "-0.786 | \n", "-0.817 | \n", "-0.844 | \n", "
2004-03-07 | \n", "-0.675 | \n", "-0.817 | \n", "-0.844 | \n", "
10 rows \u00d7 3 columns
\n", "\n", " | yellow pages | \n", "windows installer | \n", "techno | \n", "
---|---|---|---|
2004-01-04 | \n", "1.341 | \n", "0.668 | \n", "0.871 | \n", "
2004-01-11 | \n", "1.239 | \n", "1.000 | \n", "1.122 | \n", "
2004-01-18 | \n", "1.022 | \n", "0.768 | \n", "1.053 | \n", "
2004-01-25 | \n", "0.923 | \n", "0.943 | \n", "0.807 | \n", "
2004-02-01 | \n", "0.904 | \n", "0.799 | \n", "0.612 | \n", "
2004-02-08 | \n", "0.786 | \n", "0.613 | \n", "0.614 | \n", "
2004-02-15 | \n", "0.729 | \n", "0.956 | \n", "0.391 | \n", "
2004-02-22 | \n", "0.537 | \n", "0.667 | \n", "1.124 | \n", "
2004-02-29 | \n", "0.534 | \n", "1.415 | \n", "1.078 | \n", "
2004-03-07 | \n", "0.229 | \n", "0.220 | \n", "1.918 | \n", "
10 rows \u00d7 3 columns
\n", "numpy.ndarray
"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"hipster['gumtree perth'].values[:20]"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 25,
"text": [
"array([-0.844, -0.844, -0.844, -0.844, -0.844, -0.844, -0.844, -0.844,\n",
" -0.844, -0.844, -0.844, -0.844, -0.844, -0.844, -0.844, -0.844,\n",
" -0.844, -0.844, -0.844, -0.844])"
]
}
],
"prompt_number": 25
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"View the data types; they don't need to be homogeneous"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"hipster.dtypes"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 26,
"text": [
"hipster float64\n",
"modcloth float64\n",
"gumtree perth float64\n",
"dtype: object"
]
}
],
"prompt_number": 26
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Joins on indexes are easy!"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"trend = hipster.join(not_hipster, how='inner')\n",
"trend.head()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"\n", " | hipster | \n", "modcloth | \n", "gumtree perth | \n", "yellow pages | \n", "windows installer | \n", "techno | \n", "
---|---|---|---|---|---|---|
2004-01-04 | \n", "-0.976 | \n", "-0.817 | \n", "-0.844 | \n", "1.341 | \n", "0.668 | \n", "0.871 | \n", "
2004-01-11 | \n", "-0.816 | \n", "-0.817 | \n", "-0.844 | \n", "1.239 | \n", "1.000 | \n", "1.122 | \n", "
2004-01-18 | \n", "-0.837 | \n", "-0.817 | \n", "-0.844 | \n", "1.022 | \n", "0.768 | \n", "1.053 | \n", "
2004-01-25 | \n", "-0.976 | \n", "-0.817 | \n", "-0.844 | \n", "0.923 | \n", "0.943 | \n", "0.807 | \n", "
2004-02-01 | \n", "-0.722 | \n", "-0.817 | \n", "-0.844 | \n", "0.904 | \n", "0.799 | \n", "0.612 | \n", "
5 rows \u00d7 6 columns
\n", "\n", " | hipster | \n", "modcloth | \n", "gumtree perth | \n", "yellow pages | \n", "windows installer | \n", "techno | \n", "
---|---|---|---|---|---|---|
2012-01-01 | \n", "1.411 | \n", "1.192 | \n", "1.774 | \n", "-1.077 | \n", "-1.134 | \n", "-1.285 | \n", "
2012-01-08 | \n", "1.513 | \n", "1.111 | \n", "1.579 | \n", "-0.995 | \n", "-1.183 | \n", "-1.189 | \n", "
2012-01-15 | \n", "1.523 | \n", "1.427 | \n", "1.613 | \n", "-1.027 | \n", "-1.161 | \n", "-1.337 | \n", "
2012-01-22 | \n", "1.600 | \n", "1.490 | \n", "1.514 | \n", "-1.140 | \n", "-1.177 | \n", "-1.345 | \n", "
2012-01-29 | \n", "1.459 | \n", "1.561 | \n", "1.511 | \n", "-1.046 | \n", "-1.224 | \n", "-1.233 | \n", "
5 rows \u00d7 6 columns
\n", "\n", " | hipster | \n", "modcloth | \n", "gumtree perth | \n", "yellow pages | \n", "windows installer | \n", "techno | \n", "
---|---|---|---|---|---|---|
2012-12-16 | \n", "1.645 | \n", "1.175 | \n", "1.407 | \n", "-1.433 | \n", "-1.515 | \n", "-1.687 | \n", "
2012-12-23 | \n", "1.591 | \n", "1.695 | \n", "1.625 | \n", "-1.698 | \n", "-1.655 | \n", "-1.504 | \n", "
2012-12-30 | \n", "1.596 | \n", "1.515 | \n", "1.868 | \n", "-1.515 | \n", "-1.598 | \n", "-1.674 | \n", "
3 rows \u00d7 6 columns
\n", "\n", " | hipster | \n", "modcloth | \n", "gumtree perth | \n", "yellow pages | \n", "windows installer | \n", "techno | \n", "
---|---|---|---|---|---|---|
2004-04-11 | \n", "-0.510 | \n", "-0.817 | \n", "-0.844 | \n", "0.521 | \n", "0.301 | \n", "-0.270 | \n", "
2006-01-29 | \n", "-0.838 | \n", "-0.817 | \n", "-0.844 | \n", "1.421 | \n", "1.309 | \n", "-0.081 | \n", "
2006-06-25 | \n", "-0.799 | \n", "-0.817 | \n", "-0.833 | \n", "1.142 | \n", "1.458 | \n", "-0.070 | \n", "
2010-01-24 | \n", "-0.454 | \n", "-0.183 | \n", "-0.107 | \n", "-0.017 | \n", "0.053 | \n", "-0.010 | \n", "
2010-01-31 | \n", "-0.381 | \n", "-0.276 | \n", "-0.142 | \n", "0.187 | \n", "0.116 | \n", "-0.044 | \n", "
5 rows \u00d7 6 columns
\n", "\n", " | hipster | \n", "modcloth | \n", "gumtree perth | \n", "yellow pages | \n", "windows installer | \n", "techno | \n", "
---|---|---|---|---|---|---|
2004-01-31 | \n", "-0.90125 | \n", "-0.817 | \n", "-0.844 | \n", "1.13125 | \n", "0.84475 | \n", "0.96325 | \n", "
2004-02-29 | \n", "-0.74780 | \n", "-0.817 | \n", "-0.844 | \n", "0.69800 | \n", "0.89000 | \n", "0.76380 | \n", "
2004-03-31 | \n", "-0.78950 | \n", "-0.817 | \n", "-0.844 | \n", "0.35650 | \n", "0.73125 | \n", "1.09175 | \n", "
2004-04-30 | \n", "-0.70400 | \n", "-0.817 | \n", "-0.844 | \n", "0.48125 | \n", "0.89125 | \n", "0.41950 | \n", "
2004-05-31 | \n", "-0.81820 | \n", "-0.817 | \n", "-0.844 | \n", "0.34780 | \n", "0.62040 | \n", "0.72860 | \n", "
5 rows \u00d7 6 columns
\n", "\n", " | PassengerId | \n", "Survived | \n", "Pclass | \n", "Name | \n", "Sex | \n", "Age | \n", "SibSp | \n", "Parch | \n", "Ticket | \n", "Fare | \n", "Cabin | \n", "Embarked | \n", "
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | \n", "1 | \n", "0 | \n", "3 | \n", "Braund, Mr. Owen Harris | \n", "male | \n", "22 | \n", "1 | \n", "0 | \n", "A/5 21171 | \n", "7.2500 | \n", "NaN | \n", "S | \n", "
1 | \n", "2 | \n", "1 | \n", "1 | \n", "Cumings, Mrs. John Bradley (Florence Briggs Th... | \n", "female | \n", "38 | \n", "1 | \n", "0 | \n", "PC 17599 | \n", "71.2833 | \n", "C85 | \n", "C | \n", "
2 | \n", "3 | \n", "1 | \n", "3 | \n", "Heikkinen, Miss. Laina | \n", "female | \n", "26 | \n", "0 | \n", "0 | \n", "STON/O2. 3101282 | \n", "7.9250 | \n", "NaN | \n", "S | \n", "
3 | \n", "4 | \n", "1 | \n", "1 | \n", "Futrelle, Mrs. Jacques Heath (Lily May Peel) | \n", "female | \n", "35 | \n", "1 | \n", "0 | \n", "113803 | \n", "53.1000 | \n", "C123 | \n", "S | \n", "
4 | \n", "5 | \n", "0 | \n", "3 | \n", "Allen, Mr. William Henry | \n", "male | \n", "35 | \n", "0 | \n", "0 | \n", "373450 | \n", "8.0500 | \n", "NaN | \n", "S | \n", "
5 rows \u00d7 12 columns
\n", "\n", " | \n", " | Age | \n", "Survived | \n", "
---|---|---|---|
Pclass | \n", "Sex | \n", "\n", " | \n", " |
1 | \n", "female | \n", "34.611765 | \n", "0.968085 | \n", "
male | \n", "41.281386 | \n", "0.368852 | \n", "|
2 | \n", "female | \n", "28.722973 | \n", "0.921053 | \n", "
male | \n", "30.740707 | \n", "0.157407 | \n", "|
3 | \n", "female | \n", "21.750000 | \n", "0.500000 | \n", "
male | \n", "26.507589 | \n", "0.135447 | \n", "
6 rows \u00d7 2 columns
\n", "\n", " | Sex | \n", "isFemale | \n", "
---|---|---|
0 | \n", "male | \n", "0 | \n", "
1 | \n", "female | \n", "1 | \n", "
2 | \n", "female | \n", "1 | \n", "
3 | \n", "female | \n", "1 | \n", "
4 | \n", "male | \n", "0 | \n", "
5 rows \u00d7 2 columns
\n", "