Overview

Dataset statistics

Number of variables: 4
Number of observations: 1393570
Missing cells: 459028
Missing cells (%): 8.2%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 42.5 MiB
Average record size in memory: 32.0 B
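
The percentages and the average record size follow directly from the counts in this table. As a quick sanity check, the sketch below recomputes them in plain Python; the variable names are illustrative and not part of the report.

```python
# Counts copied from the report's "Dataset statistics" table.
n_variables = 4
n_observations = 1_393_570
missing_cells = 459_028
total_size_mib = 42.5

# Share of all cells (rows x columns) that are missing.
missing_cells_pct = 100 * missing_cells / (n_variables * n_observations)

# The report lists all missing cells under one column (price),
# so the per-column share is four times larger.
missing_price_pct = 100 * missing_cells / n_observations

# Average bytes per record: total size in bytes divided by the row count.
avg_record_bytes = total_size_mib * 1024**2 / n_observations

print(f"{missing_cells_pct:.1f}% {missing_price_pct:.1f}% {avg_record_bytes:.1f} B")
```

The three results round to the 8.2%, 32.9% and 32.0 B reported here and in the Warnings section.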

Variable types

Numeric: 1
Categorical: 2
Boolean: 1

Warnings

date has a high cardinality: 365 distinct values (High cardinality)
price has a high cardinality: 669 distinct values (High cardinality)
price has 459028 (32.9%) missing values (Missing)
date is uniformly distributed (Uniform)
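
The high-cardinality warning for price arises because the column is stored as text (for example "$85.00") and is therefore profiled as categorical. A minimal sketch of converting such strings to numbers is shown below. It assumes that "$" is the only currency symbol and that "," is a thousands separator, which matches the characters listed for price later in this report but is not confirmed by it. The helper name parse_price and the value "$1,250.00" are hypothetical.

```python
from typing import Optional


def parse_price(text: Optional[str]) -> Optional[float]:
    """Convert a string such as '$1,250.00' to a float; keep missing values as None."""
    if text is None:
        return None
    # Assumption: '$' is a currency prefix and ',' a thousands separator.
    return float(text.replace("$", "").replace(",", ""))


prices = ["$85.00", None, "$1,250.00"]  # "$1,250.00" is a made-up example value
parsed = [parse_price(p) for p in prices]
print(parsed)
```

In pandas, the same idea is usually expressed with Series.str.replace followed by pandas.to_numeric, after which the profiler would treat the column as numeric.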

Reproduction

Analysis started: 2021-10-01 03:59:22.354937
Analysis finished: 2021-10-01 03:59:44.050034
Duration: 21.7 seconds
Software version: pandas-profiling v3.0.0
Download configuration: config.json

Variables

listing_id
Real number (ℝ≥0)

Distinct: 3818
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5550111.419
Minimum: 3335
Maximum: 10340165
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 10.6 MiB

Quantile statistics

Minimum: 3335
5-th percentile: 430453
Q1: 3258213
Median: 6118244.5
Q3: 8035212
95-th percentile: 9666446
Maximum: 10340165
Range: 10336830
Interquartile range (IQR): 4776999

Descriptive statistics

Standard deviation: 2962273.53
Coefficient of variation (CV): 0.5337322635
Kurtosis: -1.104322694
Mean: 5550111.419
Median Absolute Deviation (MAD): 2287820
Skewness: -0.3096605895
Sum: 7.73446877 × 10^12
Variance: 8.775064467 × 10^12
Monotonicity: Not monotonic
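
Several of these figures are derived from others in the report. The sketch below, in plain Python with illustrative variable names, recomputes the coefficient of variation, variance, sum, range and IQR from the reported mean, standard deviation, extremes and quartiles.

```python
import math

# Values copied from the listing_id statistics in the report.
n_observations = 1_393_570
mean = 5550111.419
std = 2962273.53
minimum, maximum = 3335, 10340165
q1, q3 = 3258213, 8035212

cv = std / mean                 # coefficient of variation
variance = std ** 2             # variance is the squared standard deviation
total = mean * n_observations   # sum of all values
value_range = maximum - minimum
iqr = q3 - q1                   # interquartile range

print(cv, variance, total, value_range, iqr)
```

Because the inputs are rounded as printed in the report, the recomputed values agree with the reported ones only up to that rounding.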
Histogram with fixed size bins (bins=50)
Value                Count    Frequency (%)
241032               365      < 0.1%
9299824              365      < 0.1%
8597687              365      < 0.1%
2309250              365      < 0.1%
7420339              365      < 0.1%
1742425              365      < 0.1%
4559985              365      < 0.1%
6304139              365      < 0.1%
2610187              365      < 0.1%
8508341              365      < 0.1%
Other values (3808)  1389920  99.7%

Value  Count  Frequency (%)
3335   365    < 0.1%
4291   365    < 0.1%
5682   365    < 0.1%
6606   365    < 0.1%
7369   365    < 0.1%
9419   365    < 0.1%
9460   365    < 0.1%
9531   365    < 0.1%
9534   365    < 0.1%
9596   365    < 0.1%

Value     Count  Frequency (%)
10340165  365    < 0.1%
10339145  365    < 0.1%
10339144  365    < 0.1%
10334184  365    < 0.1%
10332096  365    < 0.1%
10331249  365    < 0.1%
10319529  365    < 0.1%
10318171  365    < 0.1%
10310373  365    < 0.1%
10309898  365    < 0.1%

date
Categorical

HIGH CARDINALITY
UNIFORM

Distinct: 365
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 10.6 MiB

Value               Count
2016-01-04          3818
2016-09-11          3818
2016-09-09          3818
2016-09-08          3818
2016-09-07          3818
Other values (360)  1374480

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Characters and Unicode

Total characters: 13935700
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2016-01-04
2nd row: 2016-01-05
3rd row: 2016-01-06
4th row: 2016-01-07
5th row: 2016-01-08

Common Values

Value               Count    Frequency (%)
2016-01-04          3818     0.3%
2016-09-11          3818     0.3%
2016-09-09          3818     0.3%
2016-09-08          3818     0.3%
2016-09-07          3818     0.3%
2016-09-06          3818     0.3%
2016-09-05          3818     0.3%
2016-09-04          3818     0.3%
2016-09-03          3818     0.3%
2016-09-02          3818     0.3%
Other values (355)  1355390  97.3%

Length

Histogram of lengths of the category

Value               Count    Frequency (%)
2016-01-04          3818     0.3%
2016-09-11          3818     0.3%
2016-09-09          3818     0.3%
2016-09-08          3818     0.3%
2016-09-07          3818     0.3%
2016-09-06          3818     0.3%
2016-09-05          3818     0.3%
2016-09-04          3818     0.3%
2016-09-03          3818     0.3%
2016-09-02          3818     0.3%
Other values (355)  1355390  97.3%

Most occurring characters

Value  Count    Frequency (%)
0      3096398  22.2%
-      2787140  20.0%
1      2596240  18.6%
2      2218258  15.9%
6      1637922  11.8%
3      320712   2.3%
7      263442   1.9%
5      255806   1.8%
8      255806   1.8%
4      251988   1.8%

Most occurring categories

Value             Count     Frequency (%)
Decimal Number    11148560  80.0%
Dash Punctuation  2787140   20.0%

Most frequent character per category

Decimal Number
Value  Count    Frequency (%)
0      3096398  27.8%
1      2596240  23.3%
2      2218258  19.9%
6      1637922  14.7%
3      320712   2.9%
7      263442   2.4%
5      255806   2.3%
8      255806   2.3%
4      251988   2.3%
9      251988   2.3%

Dash Punctuation
Value  Count    Frequency (%)
-      2787140  100.0%

Most occurring scripts

Value   Count     Frequency (%)
Common  13935700  100.0%

Most frequent character per script

Common
Value  Count    Frequency (%)
0      3096398  22.2%
-      2787140  20.0%
1      2596240  18.6%
2      2218258  15.9%
6      1637922  11.8%
3      320712   2.3%
7      263442   1.9%
5      255806   1.8%
8      255806   1.8%
4      251988   1.8%

Most occurring blocks

Value  Count     Frequency (%)
ASCII  13935700  100.0%

Most frequent character per block

ASCII
Value  Count    Frequency (%)
0      3096398  22.2%
-      2787140  20.0%
1      2596240  18.6%
2      2218258  15.9%
6      1637922  11.8%
3      320712   2.3%
7      263442   1.9%
5      255806   1.8%
8      255806   1.8%
4      251988   1.8%

available
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.3 MiB

Value  Count   Frequency (%)
True   934542  67.1%
False  459028  32.9%

price
Categorical

HIGH CARDINALITY
MISSING

Distinct: 669
Distinct (%): 0.1%
Missing: 459028
Missing (%): 32.9%
Memory size: 10.6 MiB

Value               Count
$150.00             36646
$100.00             31755
$75.00              29820
$125.00             27538
$65.00              26415
Other values (664)  782368

Length

Max length: 9
Median length: 7
Mean length: 6.555124328
Min length: 6

Characters and Unicode

Total characters: 6126039
Distinct characters: 13
Distinct categories: 3
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 66
Unique (%): < 0.1%

Sample

1st row: $85.00
2nd row: $85.00
3rd row: $85.00
4th row: $85.00
5th row: $85.00

Common Values

Value               Count   Frequency (%)
$150.00             36646   2.6%
$100.00             31755   2.3%
$75.00              29820   2.1%
$125.00             27538   2.0%
$65.00              26415   1.9%
$90.00              24942   1.8%
$95.00              24327   1.7%
$99.00              23629   1.7%
$85.00              23455   1.7%
$80.00              19817   1.4%
Other values (659)  666198  47.8%
(Missing)           459028  32.9%

Length

Histogram of lengths of the category

Value               Count   Frequency (%)
150.00              36646   3.9%
100.00              31755   3.4%
75.00               29820   3.2%
125.00              27538   2.9%
65.00               26415   2.8%
90.00               24942   2.7%
95.00               24327   2.6%
99.00               23629   2.5%
85.00               23455   2.5%
80.00               19817   2.1%
Other values (659)  666198  71.3%

Most occurring characters

Value             Count    Frequency (%)
0                 2298464  37.5%
$                 934542   15.3%
.                 934542   15.3%
1                 431978   7.1%
5                 427863   7.0%
9                 263820   4.3%
2                 216940   3.5%
7                 138000   2.3%
8                 128196   2.1%
6                 119875   2.0%
Other values (3)  231819   3.8%

Most occurring categories

Value              Count    Frequency (%)
Decimal Number     4256258  69.5%
Other Punctuation  935239   15.3%
Currency Symbol    934542   15.3%

Most frequent character per category

Decimal Number
Value  Count    Frequency (%)
0      2298464  54.0%
1      431978   10.1%
5      427863   10.1%
9      263820   6.2%
2      216940   5.1%
7      138000   3.2%
8      128196   3.0%
6      119875   2.8%
4      119803   2.8%
3      111319   2.6%

Other Punctuation
Value  Count   Frequency (%)
.      934542  99.9%
,      697     0.1%

Currency Symbol
Value  Count   Frequency (%)
$      934542  100.0%

Most occurring scripts

Value   Count    Frequency (%)
Common  6126039  100.0%

Most frequent character per script

Common
Value             Count    Frequency (%)
0                 2298464  37.5%
$                 934542   15.3%
.                 934542   15.3%
1                 431978   7.1%
5                 427863   7.0%
9                 263820   4.3%
2                 216940   3.5%
7                 138000   2.3%
8                 128196   2.1%
6                 119875   2.0%
Other values (3)  231819   3.8%

Most occurring blocks

Value  Count    Frequency (%)
ASCII  6126039  100.0%

Most frequent character per block

ASCII
Value             Count    Frequency (%)
0                 2298464  37.5%
$                 934542   15.3%
.                 934542   15.3%
1                 431978   7.1%
5                 427863   7.0%
9                 263820   4.3%
2                 216940   3.5%
7                 138000   2.3%
8                 128196   2.1%
6                 119875   2.0%
Other values (3)  231819   3.8%

Interactions


Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that for a linear function the steepness of the line (its angle to the x-axis) does not affect the magnitude of r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
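
The calculation described above can be sketched in a few lines of plain Python. This is an illustrative implementation, not the code the report itself runs; the function name pearson_r and the toy data are assumptions.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The 1/n factors of the covariance and standard deviations cancel,
    # so plain sums of deviations are enough.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


x = [1.0, 2.0, 3.0, 4.0]
r_steep = pearson_r(x, [100 * v + 7 for v in x])    # steep increasing line
r_shallow = pearson_r(x, [0.01 * v - 3 for v in x])  # shallow increasing line
r_falling = pearson_r(x, [-2 * v for v in x])        # decreasing line
print(r_steep, r_shallow, r_falling)
```

The first two results coincide (up to floating-point error) even though the slopes differ by four orders of magnitude, which illustrates the location and scale invariance; only the direction of the line changes the sign.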

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at catching nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
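
As a sketch of this recipe, the plain-Python example below ranks both variables (ties receive the average of their positions, which is one common convention and an assumption here) and then applies the Pearson formula to the ranks. The function names and the toy data are illustrative.

```python
import math


def average_ranks(values):
    """Rank values from 1..n; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def spearman_rho(x, y):
    """Pearson correlation computed on the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic but clearly nonlinear
print(spearman_rho(x, y), pearson_r(x, y))
```

On this cubic relationship ρ reaches its maximum while r stays below it, which is the sense in which ρ is better at catching nonlinear monotonic association.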

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
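
The pair-counting definition above corresponds to the variant usually called tau-a. The sketch below implements exactly that definition in plain Python; tied pairs count as neither concordant nor discordant. Library implementations often default to a tie-corrected variant (tau-b), so results can differ when ties are present. The function name and the toy data are illustrative.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = 0
    discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        # Positive product: both variables move in the same direction.
        direction = (x[i] - x[j]) * (y[i] - y[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)


x = [1, 2, 3, 4]
y = [1, 3, 2, 4]  # one swapped pair among six
tau = kendall_tau_a(x, y)
print(tau)
```

With four observations there are six pairs; five are concordant and one is discordant, so τ = (5 - 1) / 6.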

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. There is extensive documentation available here.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
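
The per-column nullity count can be illustrated on the ten "First rows" shown in the Sample section of this report. The sketch below, in plain Python, counts missing values per column and checks whether price is missing exactly when available is "f". This holds for these ten rows and is consistent with the matching totals (459028 unavailable rows, 459028 missing prices), but it is not proven for the full dataset.

```python
columns = ("listing_id", "date", "available", "price")

# The ten "First rows" from the report; None stands for NaN.
rows = [
    (241032, "2016-01-04", "t", "$85.00"),
    (241032, "2016-01-05", "t", "$85.00"),
    (241032, "2016-01-06", "f", None),
    (241032, "2016-01-07", "f", None),
    (241032, "2016-01-08", "f", None),
    (241032, "2016-01-09", "f", None),
    (241032, "2016-01-10", "f", None),
    (241032, "2016-01-11", "f", None),
    (241032, "2016-01-12", "f", None),
    (241032, "2016-01-13", "t", "$85.00"),
]

# Nullity by column: how many values are missing in each column.
missing_per_column = {
    name: sum(row[i] is None for row in rows) for i, name in enumerate(columns)
}

# Does a missing price coincide with available == "f" in every row?
consistent = all(
    (price is None) == (available == "f") for _, _, available, price in rows
)

print(missing_per_column, consistent)
```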

Sample

First rows

   listing_id  date        available  price
0  241032      2016-01-04  t          $85.00
1  241032      2016-01-05  t          $85.00
2  241032      2016-01-06  f          NaN
3  241032      2016-01-07  f          NaN
4  241032      2016-01-08  f          NaN
5  241032      2016-01-09  f          NaN
6  241032      2016-01-10  f          NaN
7  241032      2016-01-11  f          NaN
8  241032      2016-01-12  f          NaN
9  241032      2016-01-13  t          $85.00

Last rows

         listing_id  date        available  price
1393560  10208623    2016-12-24  f          NaN
1393561  10208623    2016-12-25  f          NaN
1393562  10208623    2016-12-26  f          NaN
1393563  10208623    2016-12-27  f          NaN
1393564  10208623    2016-12-28  f          NaN
1393565  10208623    2016-12-29  f          NaN
1393566  10208623    2016-12-30  f          NaN
1393567  10208623    2016-12-31  f          NaN
1393568  10208623    2017-01-01  f          NaN
1393569  10208623    2017-01-02  f          NaN
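
The sample rows suggest that the dataset holds one row per listing per calendar day. The sketch below checks this reading against the reported totals. It assumes that 2016-01-04 and 2017-01-02 (the dates of the first and last sample rows) bound the date range, which the report does not state explicitly; the variable names are illustrative.

```python
from datetime import date

first_day = date.fromisoformat("2016-01-04")  # date in the first sample row
last_day = date.fromisoformat("2017-01-02")   # date in the last sample row
n_listings = 3818                             # distinct listing_id values in the report

n_days = (last_day - first_day).days + 1      # inclusive span in days
expected_rows = n_listings * n_days

print(n_days, expected_rows)
```

The span matches the 365 distinct dates, and the product matches the 1393570 observations, which is also why every date appears 3818 times and every listing_id 365 times in the frequency tables.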