{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## DataFrame Creation\n", "\n", "In this notebook, we will learn to create new ```DataFrame``` object from other data structures( e.g.,numpy array and dictionary) and convert data frame to numpy array and dictionary. The defult setting for pandas ```DataFrame``` is \n", "\n", "```pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=False)```" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "import seaborn as sns\n", "import matplotlib.pyplot as plt\n", "%matplotlib inline\n", "sns.set()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 1. To create new ```DataFrame``` from Numpy array. \n", "\n", "Let's create a random array of size(100,20) and random column names. We will use these array and column names to create the ```DataFrame``` in next step." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "import random as random\n", "A = np.random.rand(100,10)\n", "letter = ['A','B','C','D','E','F','G','H','X']\n", "\n", "def namer(n):\n", " col_names = [ random.choice(letter)\\\n", " +random.choice(letter)\\\n", " +random.choice(letter)\\\n", " +random.choice(letter) for i in range(n)]\n", " return col_names" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['HHEE', 'FEGD', 'BFHC', 'HXFC', 'CBDF', 'DEDH', 'CBCX', 'XGXB', 'GCBC', 'FDEE']\n" ] } ], "source": [ "print(namer(A.shape[1]))" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>AAGF</th><th>XBGA</th><th>DXAC</th><th>XEDB</th><th>EDCG</th><th>ABDH</th><th>GFHX</th><th>FAAB</th><th>BBEC</th><th>DGDC</th></tr>\n",
"  </thead>\n", "  <tbody>\n",
"    <tr><th>0</th><td>0.285058</td><td>0.456733</td><td>0.841292</td><td>0.397957</td><td>0.888982</td><td>0.970782</td><td>0.379687</td><td>0.565177</td><td>0.657939</td><td>0.461385</td></tr>\n",
"    <tr><th>1</th><td>0.725005</td><td>0.432756</td><td>0.037801</td><td>0.020203</td><td>0.590901</td><td>0.590571</td><td>0.623529</td><td>0.166581</td><td>0.179392</td><td>0.454290</td></tr>\n",
"    <tr><th>2</th><td>0.668743</td><td>0.352332</td><td>0.642905</td><td>0.427461</td><td>0.025124</td><td>0.365414</td><td>0.609983</td><td>0.568686</td><td>0.522738</td><td>0.525048</td></tr>\n",
"    <tr><th>3</th><td>0.807165</td><td>0.827478</td><td>0.542088</td><td>0.628743</td><td>0.616745</td><td>0.386370</td><td>0.632225</td><td>0.387794</td><td>0.618686</td><td>0.786503</td></tr>\n",
"    <tr><th>4</th><td>0.645945</td><td>0.136078</td><td>0.769546</td><td>0.721885</td><td>0.107891</td><td>0.128859</td><td>0.938451</td><td>0.875492</td><td>0.647702</td><td>0.148635</td></tr>\n",
"  </tbody>\n", "</table>\n", "</div>
" ], "text/plain": [ " AAGF XBGA DXAC XEDB EDCG ABDH GFHX \\\n", "0 0.285058 0.456733 0.841292 0.397957 0.888982 0.970782 0.379687 \n", "1 0.725005 0.432756 0.037801 0.020203 0.590901 0.590571 0.623529 \n", "2 0.668743 0.352332 0.642905 0.427461 0.025124 0.365414 0.609983 \n", "3 0.807165 0.827478 0.542088 0.628743 0.616745 0.386370 0.632225 \n", "4 0.645945 0.136078 0.769546 0.721885 0.107891 0.128859 0.938451 \n", "\n", " FAAB BBEC DGDC \n", "0 0.565177 0.657939 0.461385 \n", "1 0.166581 0.179392 0.454290 \n", "2 0.568686 0.522738 0.525048 \n", "3 0.387794 0.618686 0.786503 \n", "4 0.875492 0.647702 0.148635 " ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.DataFrame(A, columns = col_names )\n", "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- To save data from ```new DataFrame``` to a file:" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "df.to_csv('data/test.csv')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 2. To create new ```DataFrame``` from list of dictionaries.\n", "\n", "Here we will create a list with collection of dictionaries. Each of the dictionary will have keys and values. Using this list of dictionaries, we will create another ```DataFrame```. The keys of the dictionary will serve as the column names." 
] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "# One dictionary per player: a random name plus five random scores in [0, 1]\n", "LD = []\n", "for i in range(100):\n", "    LD.append({'Player' : namer(1)[0],\n", "               'game1' : random.uniform(0,1),\n", "               'game2' : random.uniform(0,1),\n", "               'game3' : random.uniform(0,1),\n", "               'game4' : random.uniform(0,1),\n", "               'game5' : random.uniform(0,1)})" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'Player': 'BGXB',\n", " 'game1': 0.2965944756471328,\n", " 'game2': 0.11334763879800447,\n", " 'game3': 0.028543866127768824,\n", " 'game4': 0.225405432495144,\n", " 'game5': 0.05423542200055986}" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "LD[0]" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": [ "DF = pd.DataFrame(LD)\n", "# Use the player name as the row index\n", "DF = DF.set_index(\"Player\")" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>game1</th><th>game2</th><th>game3</th><th>game4</th><th>game5</th></tr>\n",
"    <tr><th>Player</th><th></th><th></th><th></th><th></th><th></th></tr>\n",
"  </thead>\n", "  <tbody>\n",
"    <tr><th>BGXB</th><td>0.296594</td><td>0.113348</td><td>0.028544</td><td>0.225405</td><td>0.054235</td></tr>\n",
"    <tr><th>DBDB</th><td>0.047226</td><td>0.107065</td><td>0.801571</td><td>0.816877</td><td>0.556934</td></tr>\n",
"    <tr><th>AXXH</th><td>0.862611</td><td>0.439051</td><td>0.083341</td><td>0.389785</td><td>0.258748</td></tr>\n",
"    <tr><th>BXED</th><td>0.643533</td><td>0.082176</td><td>0.167241</td><td>0.405304</td><td>0.088063</td></tr>\n",
"    <tr><th>FXGE</th><td>0.279076</td><td>0.000998</td><td>0.949414</td><td>0.303408</td><td>0.009342</td></tr>\n",
"    <tr><th>AHDC</th><td>0.617194</td><td>0.272401</td><td>0.252663</td><td>0.788798</td><td>0.130996</td></tr>\n",
"    <tr><th>CXGA</th><td>0.104552</td><td>0.895106</td><td>0.414877</td><td>0.167643</td><td>0.454175</td></tr>\n",
"    <tr><th>BDBA</th><td>0.045650</td><td>0.926742</td><td>0.454097</td><td>0.055006</td><td>0.939082</td></tr>\n",
"    <tr><th>HBDC</th><td>0.069192</td><td>0.797224</td><td>0.943648</td><td>0.567334</td><td>0.044285</td></tr>\n",
"    <tr><th>EBBX</th><td>0.383365</td><td>0.852788</td><td>0.679330</td><td>0.418570</td><td>0.817291</td></tr>\n",
"  </tbody>\n", "</table>\n", "</div>
" ], "text/plain": [ " game1 game2 game3 game4 game5\n", "Player \n", "BGXB 0.296594 0.113348 0.028544 0.225405 0.054235\n", "DBDB 0.047226 0.107065 0.801571 0.816877 0.556934\n", "AXXH 0.862611 0.439051 0.083341 0.389785 0.258748\n", "BXED 0.643533 0.082176 0.167241 0.405304 0.088063\n", "FXGE 0.279076 0.000998 0.949414 0.303408 0.009342\n", "AHDC 0.617194 0.272401 0.252663 0.788798 0.130996\n", "CXGA 0.104552 0.895106 0.414877 0.167643 0.454175\n", "BDBA 0.045650 0.926742 0.454097 0.055006 0.939082\n", "HBDC 0.069192 0.797224 0.943648 0.567334 0.044285\n", "EBBX 0.383365 0.852788 0.679330 0.418570 0.817291" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "DF.head(10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 3. To create ```DataFrame``` from a List :" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>A</th><th>B</th><th>C</th><th>D</th></tr>\n",
"  </thead>\n", "  <tbody>\n",
"    <tr><th>0</th><td>0.816389</td><td>0.212530</td><td>0.705185</td><td>0.225743</td></tr>\n",
"    <tr><th>1</th><td>0.646496</td><td>0.876869</td><td>0.648350</td><td>0.788687</td></tr>\n",
"    <tr><th>2</th><td>0.869173</td><td>0.832004</td><td>0.516009</td><td>0.917783</td></tr>\n",
"    <tr><th>3</th><td>0.668866</td><td>0.778850</td><td>0.834528</td><td>0.243842</td></tr>\n",
"    <tr><th>4</th><td>0.742311</td><td>0.984313</td><td>0.872512</td><td>0.451476</td></tr>\n",
"  </tbody>\n", "</table>\n", "</div>
" ], "text/plain": [ " A B C D\n", "0 0.816389 0.212530 0.705185 0.225743\n", "1 0.646496 0.876869 0.648350 0.788687\n", "2 0.869173 0.832004 0.516009 0.917783\n", "3 0.668866 0.778850 0.834528 0.243842\n", "4 0.742311 0.984313 0.872512 0.451476" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A = [random.uniform(0,1)for i in range(10)]\n", "B = [random.uniform(0,1)for i in range(10)]\n", "C = [random.uniform(0,1)for i in range(10)]\n", "D = [random.uniform(0,1)for i in range(10)]\n", "\n", "df = pd.DataFrame()\n", "df['A'],df['B'],df['C'],df['D'] = A,B,C,D\n", "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### References:\n", "1. [Pydata document for Styling DataFrame visualization](https://pandas.pydata.org/docs/user_guide/style.html)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, "nbformat_minor": 2 }