# Exploring tabular data
When working with tabular data, being able to get a quick overview of the data is key.

In [1]:
import pandas as pd 

## Loading CSV files from disk
To ensure compatibility between different software for processing tabular data, the [CSV file format](https://en.wikipedia.org/wiki/Comma-separated_values) is commonly used. We can open such files using [pandas.read_csv](https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html).

In [2]:
data = pd.read_csv('../../data/Results.csv', index_col=0, delimiter=';')
data

Unnamed: 0,Area,Mean,StdDev,Min,Max,X,Y,XM,YM,Major,Minor,Angle,%Area,Type
,,,,,,,,,,,,,,
1,18.0,730.389,103.354,592.0,948.0,435.000,4.722,434.962,4.697,5.987,3.828,168.425,100,A
2,126.0,718.333,90.367,556.0,1046.0,388.087,8.683,388.183,8.687,16.559,9.688,175.471,100,A
3,,,,608.0,964.0,,,,7.665,7.359,,101.121,100,A
4,68.0,686.985,61.169,571.0,880.0,126.147,8.809,126.192,8.811,15.136,5.720,168.133,100,A
5,,,69.438,566.0,792.0,348.500,7.500,,7.508,,3.088,,100,A
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
387,152.0,801.599,111.328,582.0,1263.0,348.487,497.632,348.451,497.675,17.773,10.889,11.829,100,A
388,17.0,742.706,69.624,620.0,884.0,420.500,496.382,420.513,,,3.663,49.457,100,A
389,60.0,758.033,77.309,601.0,947.0,259.000,499.300,258.990,499.289,9.476,8.062,90.000,100,A
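
To try out the `delimiter` and `index_col` arguments without the file above, here is a minimal sketch that reads semicolon-separated text from memory. The content of `csv_text` and the variable names are made up for illustration; only `pd.read_csv` and `io.StringIO` are real library calls.

```python
import io
import pandas as pd

# Hypothetical semicolon-separated content standing in for a CSV file on disk
csv_text = """;Area;Mean
1;18.0;730.389
2;126.0;718.333
3;;686.985
"""

# delimiter=';' sets the column separator,
# index_col=0 uses the first column as the row index
example = pd.read_csv(io.StringIO(csv_text), index_col=0, delimiter=';')
print(example)
```

Empty fields, such as the `Area` entry in the last row, are read as missing values (`NaN`).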


## Viewing the data
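Viewing data can be tricky, especially when working with large tables. Instead of printing the whole table, we can show only its first or last rows using `head()` and `tail()`.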

In [3]:
data.head(10) # top 10 rows

Unnamed: 0,Area,Mean,StdDev,Min,Max,X,Y,XM,YM,Major,Minor,Angle,%Area,Type
,,,,,,,,,,,,,,
1.0,18.0,730.389,103.354,592.0,948.0,435.0,4.722,434.962,4.697,5.987,3.828,168.425,100.0,A
2.0,126.0,718.333,90.367,556.0,1046.0,388.087,8.683,388.183,8.687,16.559,9.688,175.471,100.0,A
3.0,,,,608.0,964.0,,,,7.665,7.359,,101.121,100.0,A
4.0,68.0,686.985,61.169,571.0,880.0,126.147,8.809,126.192,8.811,15.136,5.72,168.133,100.0,A
5.0,,,69.438,566.0,792.0,348.5,7.5,,7.508,,3.088,,100.0,A
6.0,669.0,697.164,72.863,539.0,957.0,471.696,26.253,471.694,26.197,36.656,23.237,124.34,100.0,A
7.0,5.0,658.6,49.161,607.0,710.0,28.3,8.1,28.284,8.103,3.144,2.025,161.565,100.0,A
8.0,7.0,677.571,49.899,596.0,768.0,415.357,8.786,415.36,8.804,4.11,2.168,112.5,100.0,A
9.0,14.0,691.071,63.873,586.0,808.0,493.286,9.0,493.295,9.016,5.12,3.481,38.802,100.0,C


In [4]:
data.tail(10) # bottom 10 rows

Unnamed: 0,Area,Mean,StdDev,Min,Max,X,Y,XM,YM,Major,Minor,Angle,%Area,Type
,,,,,,,,,,,,,,
382.0,45.0,734.356,68.637,575.0,867.0,171.5,494.789,171.492,494.739,14.63,3.916,95.698,100.0,B
383.0,94.0,746.617,85.198,550.0,1021.0,194.032,498.223,194.014,498.239,17.295,6.92,52.72,100.0,B
384.0,35.0,776.257,74.746,611.0,961.0,268.957,493.586,268.977,,,5.99,111.193,100.0,A
385.0,35.0,739.286,,593.0,928.0,291.871,493.843,291.871,493.806,,5.352,79.368,100.0,A
386.0,14.0,736.143,81.533,646.0,902.0,315.0,493.0,314.989,493.003,,3.676,45.0,100.0,A
387.0,152.0,801.599,111.328,582.0,1263.0,348.487,497.632,348.451,497.675,17.773,10.889,11.829,100.0,A
388.0,17.0,742.706,69.624,620.0,884.0,420.5,496.382,420.513,,,3.663,49.457,100.0,A
389.0,60.0,758.033,77.309,601.0,947.0,259.0,499.3,258.99,499.289,9.476,8.062,90.0,100.0,A
390.0,12.0,714.833,67.294,551.0,785.0,240.167,498.167,240.179,498.148,4.606,3.317,168.69,100.0,A
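
Before printing rows at all, it can help to check how large a table is. The following sketch assumes a small made-up table named `table`; its column names only mimic the ones above.

```python
import pandas as pd

# Hypothetical stand-in table with 100 rows
table = pd.DataFrame({"Area": range(100), "Type": ["A", "B"] * 50})

print(table.shape)             # (number of rows, number of columns)
print(table.columns.tolist())  # column names
print(table.head(3))           # first 3 rows
print(table.tail(3))           # last 3 rows
```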


## Overview descriptive statistics
To get a glimpse of the range of values in the given table, we can ask the DataFrame to _describe_ itself using [`DataFrame.describe()`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.describe.html). It displays count, mean, standard deviation and other descriptive statistics for each numeric column in our table.

In [5]:
data.describe()

Unnamed: 0,Area,Mean,StdDev,Min,Max,X,Y,XM,YM,Major,Minor,Angle,%Area
count,389.0,386.0,388.0,388.0,388.0,389.0,388.0,388.0,386.0,383.0,388.0,390.0,391.0
mean,107.164524,743.455565,76.575309,610.414948,962.92268,256.419859,254.384088,256.183338,253.353005,12.481016,9.500662,86.598441,100.0
std,241.037082,42.25214,31.844864,57.156709,244.897224,152.261694,155.080074,152.380388,154.42625,11.979176,49.71428,60.593686,0.0
min,1.0,587.0,0.0,516.0,587.0,3.978,4.722,4.012,4.697,1.128,1.128,0.0,100.0
25%,15.0,717.06075,63.861,570.75,847.75,127.142,102.87525,126.92325,103.81375,5.098,3.63725,34.51725,100.0
50%,44.0,741.0775,74.727,599.0,917.5,243.3,271.49,242.288,271.272,9.374,5.886,89.7035,100.0
75%,116.0,767.26075,86.8265,633.25,1014.5,400.167,395.05825,400.3635,393.80075,16.283,9.01725,134.61725,100.0
max,2755.0,912.938,377.767,877.0,3880.0,508.214,503.022,508.169,502.979,144.475,981.0,568.0,100.0
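
The `count` row only includes non-missing entries, which is why it can differ between columns. As a minimal sketch, assuming a small made-up table named `table`, missing values can also be counted directly with `isna().sum()`:

```python
import numpy as np
import pandas as pd

# Hypothetical table with some missing measurements
table = pd.DataFrame({
    "Area": [18.0, 126.0, np.nan, 68.0],
    "Mean": [730.389, 718.333, np.nan, 686.985],
    "Min": [592.0, 556.0, 608.0, 571.0],
})

# Number of missing entries per column
missing_per_column = table.isna().sum()
print(missing_per_column)

# The count row of describe() reports the non-missing entries instead
counts = table.describe().loc["count"]
print(counts)
```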


## Sorting in tables
In many cases, we are interested in the table rows that contain the maximum value of a column. For example, sorting by the `Area` column in descending order puts the largest object first:

In [6]:
data.sort_values(by = "Area", ascending=False)

Unnamed: 0,Area,Mean,StdDev,Min,Max,X,Y,XM,YM,Major,Minor,Angle,%Area,Type
,,,,,,,,,,,,,,
190,2755.0,859.928,235.458,539.0,3880.0,108.710,302.158,110.999,300.247,144.475,24.280,39.318,100,C
81,2295.0,765.239,96.545,558.0,1431.0,375.003,134.888,374.982,135.359,65.769,44.429,127.247,100,B
209,1821.0,847.761,122.074,600.0,1510.0,287.795,321.115,288.074,321.824,55.879,41.492,112.124,100,A
252,1528.0,763.777,83.183,572.0,1172.0,191.969,385.944,192.487,385.697,63.150,30.808,34.424,100,B
265,1252.0,793.371,117.139,579.0,1668.0,262.071,394.497,262.268,394.326,60.154,26.500,50.147,100,A
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
113,1.0,587.000,0.000,587.0,587.0,399.500,117.500,399.500,117.500,1.128,1.128,0.000,100,A
310,1.0,866.000,0.000,866.0,866.0,343.500,408.500,343.500,408.500,1.128,1.128,0.000,100,A
219,1.0,763.000,0.000,763.0,763.0,411.500,296.500,411.500,296.500,1.128,1.128,0.000,100,A
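
If only the largest entries are of interest, sorting the whole table is not strictly necessary. The sketch below uses a small made-up table named `table` and shows two alternatives from pandas: `nlargest()` returns the rows with the largest values, and `idxmax()` returns the index label of the maximum.

```python
import pandas as pd

# Hypothetical table; the index mimics object labels
table = pd.DataFrame(
    {"Area": [18.0, 126.0, 68.0, 669.0], "Type": ["A", "A", "B", "C"]},
    index=[1, 2, 4, 6],
)

# The two rows with the largest Area, largest first
largest = table.nlargest(2, "Area")
print(largest)

# Index label of the row with the maximum Area, then the row itself
label_of_largest = table["Area"].idxmax()
print(table.loc[label_of_largest])
```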
