{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## DataFrame Creation\n",
"\n",
"In this notebook, we will learn to create new ```DataFrame``` object from other data structures( e.g.,numpy array and dictionary) and convert data frame to numpy array and dictionary. The defult setting for pandas ```DataFrame``` is \n",
"\n",
"```pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=False)```"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"import seaborn as sns\n",
"import matplotlib.pyplot as plt\n",
"%matplotlib inline\n",
"sns.set()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 1. To create new ```DataFrame``` from Numpy array. \n",
"\n",
"Let's create a random array of size(100,20) and random column names. We will use these array and column names to create the ```DataFrame``` in next step."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"import random as random\n",
"A = np.random.rand(100,10)\n",
"letter = ['A','B','C','D','E','F','G','H','X']\n",
"\n",
"def namer(n):\n",
" col_names = [ random.choice(letter)\\\n",
" +random.choice(letter)\\\n",
" +random.choice(letter)\\\n",
" +random.choice(letter) for i in range(n)]\n",
" return col_names"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['HHEE', 'FEGD', 'BFHC', 'HXFC', 'CBDF', 'DEDH', 'CBCX', 'XGXB', 'GCBC', 'FDEE']\n"
]
}
],
"source": [
"print(namer(A.shape[1]))"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" AAGF | \n",
" XBGA | \n",
" DXAC | \n",
" XEDB | \n",
" EDCG | \n",
" ABDH | \n",
" GFHX | \n",
" FAAB | \n",
" BBEC | \n",
" DGDC | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" 0.285058 | \n",
" 0.456733 | \n",
" 0.841292 | \n",
" 0.397957 | \n",
" 0.888982 | \n",
" 0.970782 | \n",
" 0.379687 | \n",
" 0.565177 | \n",
" 0.657939 | \n",
" 0.461385 | \n",
"
\n",
" \n",
" | 1 | \n",
" 0.725005 | \n",
" 0.432756 | \n",
" 0.037801 | \n",
" 0.020203 | \n",
" 0.590901 | \n",
" 0.590571 | \n",
" 0.623529 | \n",
" 0.166581 | \n",
" 0.179392 | \n",
" 0.454290 | \n",
"
\n",
" \n",
" | 2 | \n",
" 0.668743 | \n",
" 0.352332 | \n",
" 0.642905 | \n",
" 0.427461 | \n",
" 0.025124 | \n",
" 0.365414 | \n",
" 0.609983 | \n",
" 0.568686 | \n",
" 0.522738 | \n",
" 0.525048 | \n",
"
\n",
" \n",
" | 3 | \n",
" 0.807165 | \n",
" 0.827478 | \n",
" 0.542088 | \n",
" 0.628743 | \n",
" 0.616745 | \n",
" 0.386370 | \n",
" 0.632225 | \n",
" 0.387794 | \n",
" 0.618686 | \n",
" 0.786503 | \n",
"
\n",
" \n",
" | 4 | \n",
" 0.645945 | \n",
" 0.136078 | \n",
" 0.769546 | \n",
" 0.721885 | \n",
" 0.107891 | \n",
" 0.128859 | \n",
" 0.938451 | \n",
" 0.875492 | \n",
" 0.647702 | \n",
" 0.148635 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" AAGF XBGA DXAC XEDB EDCG ABDH GFHX \\\n",
"0 0.285058 0.456733 0.841292 0.397957 0.888982 0.970782 0.379687 \n",
"1 0.725005 0.432756 0.037801 0.020203 0.590901 0.590571 0.623529 \n",
"2 0.668743 0.352332 0.642905 0.427461 0.025124 0.365414 0.609983 \n",
"3 0.807165 0.827478 0.542088 0.628743 0.616745 0.386370 0.632225 \n",
"4 0.645945 0.136078 0.769546 0.721885 0.107891 0.128859 0.938451 \n",
"\n",
" FAAB BBEC DGDC \n",
"0 0.565177 0.657939 0.461385 \n",
"1 0.166581 0.179392 0.454290 \n",
"2 0.568686 0.522738 0.525048 \n",
"3 0.387794 0.618686 0.786503 \n",
"4 0.875492 0.647702 0.148635 "
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = pd.DataFrame(A, columns = col_names )\n",
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- To save data from ```new DataFrame``` to a file:"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [],
"source": [
"df.to_csv('data/test.csv')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 2. To create new ```DataFrame``` from list of dictionaries.\n",
"\n",
"Here we will create a list with collection of dictionaries. Each of the dictionary will have keys and values. Using this list of dictionaries, we will create another ```DataFrame```. The keys of the dictionary will serve as the column names."
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [],
"source": [
"LD = []\n",
"for i in range(100):\n",
" LD.append({'Player' : namer(1)[0],\\\n",
" 'game1' : random.uniform(0,1),\\\n",
" 'game2' : random.uniform(0,1),\\\n",
" 'game3' : random.uniform(0,1),\n",
" 'game4' : random.uniform(0,1),\n",
" 'game5' : random.uniform(0,1)})"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'Player': 'BGXB',\n",
" 'game1': 0.2965944756471328,\n",
" 'game2': 0.11334763879800447,\n",
" 'game3': 0.028543866127768824,\n",
" 'game4': 0.225405432495144,\n",
" 'game5': 0.05423542200055986}"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"LD[0]"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [],
"source": [
"DF = pd.DataFrame(LD)\n",
"DF=DF.set_index(\"Player\")"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" game1 | \n",
" game2 | \n",
" game3 | \n",
" game4 | \n",
" game5 | \n",
"
\n",
" \n",
" | Player | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" | BGXB | \n",
" 0.296594 | \n",
" 0.113348 | \n",
" 0.028544 | \n",
" 0.225405 | \n",
" 0.054235 | \n",
"
\n",
" \n",
" | DBDB | \n",
" 0.047226 | \n",
" 0.107065 | \n",
" 0.801571 | \n",
" 0.816877 | \n",
" 0.556934 | \n",
"
\n",
" \n",
" | AXXH | \n",
" 0.862611 | \n",
" 0.439051 | \n",
" 0.083341 | \n",
" 0.389785 | \n",
" 0.258748 | \n",
"
\n",
" \n",
" | BXED | \n",
" 0.643533 | \n",
" 0.082176 | \n",
" 0.167241 | \n",
" 0.405304 | \n",
" 0.088063 | \n",
"
\n",
" \n",
" | FXGE | \n",
" 0.279076 | \n",
" 0.000998 | \n",
" 0.949414 | \n",
" 0.303408 | \n",
" 0.009342 | \n",
"
\n",
" \n",
" | AHDC | \n",
" 0.617194 | \n",
" 0.272401 | \n",
" 0.252663 | \n",
" 0.788798 | \n",
" 0.130996 | \n",
"
\n",
" \n",
" | CXGA | \n",
" 0.104552 | \n",
" 0.895106 | \n",
" 0.414877 | \n",
" 0.167643 | \n",
" 0.454175 | \n",
"
\n",
" \n",
" | BDBA | \n",
" 0.045650 | \n",
" 0.926742 | \n",
" 0.454097 | \n",
" 0.055006 | \n",
" 0.939082 | \n",
"
\n",
" \n",
" | HBDC | \n",
" 0.069192 | \n",
" 0.797224 | \n",
" 0.943648 | \n",
" 0.567334 | \n",
" 0.044285 | \n",
"
\n",
" \n",
" | EBBX | \n",
" 0.383365 | \n",
" 0.852788 | \n",
" 0.679330 | \n",
" 0.418570 | \n",
" 0.817291 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" game1 game2 game3 game4 game5\n",
"Player \n",
"BGXB 0.296594 0.113348 0.028544 0.225405 0.054235\n",
"DBDB 0.047226 0.107065 0.801571 0.816877 0.556934\n",
"AXXH 0.862611 0.439051 0.083341 0.389785 0.258748\n",
"BXED 0.643533 0.082176 0.167241 0.405304 0.088063\n",
"FXGE 0.279076 0.000998 0.949414 0.303408 0.009342\n",
"AHDC 0.617194 0.272401 0.252663 0.788798 0.130996\n",
"CXGA 0.104552 0.895106 0.414877 0.167643 0.454175\n",
"BDBA 0.045650 0.926742 0.454097 0.055006 0.939082\n",
"HBDC 0.069192 0.797224 0.943648 0.567334 0.044285\n",
"EBBX 0.383365 0.852788 0.679330 0.418570 0.817291"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"DF.head(10)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 3. To create ```DataFrame``` from a List :"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" A | \n",
" B | \n",
" C | \n",
" D | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" 0.816389 | \n",
" 0.212530 | \n",
" 0.705185 | \n",
" 0.225743 | \n",
"
\n",
" \n",
" | 1 | \n",
" 0.646496 | \n",
" 0.876869 | \n",
" 0.648350 | \n",
" 0.788687 | \n",
"
\n",
" \n",
" | 2 | \n",
" 0.869173 | \n",
" 0.832004 | \n",
" 0.516009 | \n",
" 0.917783 | \n",
"
\n",
" \n",
" | 3 | \n",
" 0.668866 | \n",
" 0.778850 | \n",
" 0.834528 | \n",
" 0.243842 | \n",
"
\n",
" \n",
" | 4 | \n",
" 0.742311 | \n",
" 0.984313 | \n",
" 0.872512 | \n",
" 0.451476 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" A B C D\n",
"0 0.816389 0.212530 0.705185 0.225743\n",
"1 0.646496 0.876869 0.648350 0.788687\n",
"2 0.869173 0.832004 0.516009 0.917783\n",
"3 0.668866 0.778850 0.834528 0.243842\n",
"4 0.742311 0.984313 0.872512 0.451476"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"A = [random.uniform(0,1)for i in range(10)]\n",
"B = [random.uniform(0,1)for i in range(10)]\n",
"C = [random.uniform(0,1)for i in range(10)]\n",
"D = [random.uniform(0,1)for i in range(10)]\n",
"\n",
"df = pd.DataFrame()\n",
"df['A'],df['B'],df['C'],df['D'] = A,B,C,D\n",
"df.head()"
]
},
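{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 4. Converting a ```DataFrame``` back to a NumPy array or dictionary\n",
"\n",
"The introduction also mentions the reverse conversion. A minimal sketch follows, assuming pandas 0.24 or later (where ```DataFrame.to_numpy``` is available); the ```scores``` frame and its values are hypothetical toy data, not part of the examples above.\n",
"\n",
"```python\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Hypothetical toy data, used only for this illustration\n",
"scores = pd.DataFrame({'game1': [0.1, 0.2], 'game2': [0.3, 0.4]},\n",
"                      index=['AAAA', 'BBBB'])\n",
"\n",
"arr = scores.to_numpy()                    # 2-D NumPy array of the values\n",
"by_column = scores.to_dict()               # {column: {index label: value}}\n",
"by_row = scores.to_dict(orient='records')  # one dictionary per row\n",
"```\n",
"\n",
"With the default orientation, ```to_dict``` nests the values by column and then by index label, while ```orient='records'``` returns one dictionary per row and drops the index."
]
},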
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### References:\n",
"1. [Pydata document for Styling DataFrame visualization](https://pandas.pydata.org/docs/user_guide/style.html)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}