Dataset statistics
Number of variables | 4 |
---|---|
Number of observations | 1393570 |
Missing cells | 459028 |
Missing cells (%) | 8.2% |
Duplicate rows | 0 |
Duplicate rows (%) | 0.0% |
Total size in memory | 42.5 MiB |
Average record size in memory | 32.0 B |
Variable types
Numeric | 1 |
---|---|
Categorical | 2 |
Boolean | 1 |
date has a high cardinality: 365 distinct values | High cardinality |
price has a high cardinality: 669 distinct values | High cardinality |
price has 459028 (32.9%) missing values | Missing |
date is uniformly distributed | Uniform |
Reproduction
Analysis started | 2021-10-01 03:59:22.354937 |
---|---|
Analysis finished | 2021-10-01 03:59:44.050034 |
Duration | 21.7 seconds |
Software version | pandas-profiling v3.0.0 |
Download configuration | config.json |
listing_id
Real number (ℝ≥0)
Distinct | 3818 |
---|---|
Distinct (%) | 0.3% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 5550111.419 |
Minimum | 3335 |
Maximum | 10340165 |
Zeros | 0 |
Zeros (%) | 0.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 10.6 MiB |
Quantile statistics
Minimum | 3335 |
---|---|
5-th percentile | 430453 |
Q1 | 3258213 |
median | 6118244.5 |
Q3 | 8035212 |
95-th percentile | 9666446 |
Maximum | 10340165 |
Range | 10336830 |
Interquartile range (IQR) | 4776999 |
Descriptive statistics
Standard deviation | 2962273.53 |
---|---|
Coefficient of variation (CV) | 0.5337322635 |
Kurtosis | -1.104322694 |
Mean | 5550111.419 |
Median Absolute Deviation (MAD) | 2287820 |
Skewness | -0.3096605895 |
Sum | 7.73446877 × 10^12 |
Variance | 8.775064467 × 10^12 |
Monotonicity | Not monotonic |
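The derived figures in the listing_id tables above follow from the primary ones. As a quick consistency check, the sketch below recomputes the range, IQR, coefficient of variation, variance, and sum from the reported values. The variable names are illustrative and not part of the report, and small differences are expected because the inputs are rounded.

```python
# Values copied from the report tables above (names are illustrative).
n_observations = 1393570
minimum, maximum = 3335, 10340165
q1, q3 = 3258213, 8035212
mean = 5550111.419
std_dev = 2962273.53

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range (IQR)
cv = std_dev / mean               # Coefficient of variation (CV)
variance = std_dev ** 2           # Variance
total = mean * n_observations     # Sum

print(value_range, iqr)
print(f"{cv:.10f} {variance:.9e} {total:.8e}")
```

Comparing the printed values with the Range, IQR, CV, Variance, and Sum rows is a cheap way to spot transcription errors in an extracted report.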
Common values
Value | Count | Frequency (%) |
241032 | 365 | < 0.1% |
9299824 | 365 | < 0.1% |
8597687 | 365 | < 0.1% |
2309250 | 365 | < 0.1% |
7420339 | 365 | < 0.1% |
1742425 | 365 | < 0.1% |
4559985 | 365 | < 0.1% |
6304139 | 365 | < 0.1% |
2610187 | 365 | < 0.1% |
8508341 | 365 | < 0.1% |
Other values (3808) | 1389920 |
Minimum 10 values
Value | Count | Frequency (%) |
3335 | 365 | |
4291 | 365 | |
5682 | 365 | |
6606 | 365 | |
7369 | 365 | |
9419 | 365 | |
9460 | 365 | |
9531 | 365 | |
9534 | 365 | |
9596 | 365 |
Maximum 10 values
Value | Count | Frequency (%) |
10340165 | 365 | |
10339145 | 365 | |
10339144 | 365 | |
10334184 | 365 | |
10332096 | 365 | |
10331249 | 365 | |
10319529 | 365 | |
10318171 | 365 | |
10310373 | 365 | |
10309898 | 365 |
date
Categorical
Distinct | 365 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 10.6 MiB |
Length
Max length | 10 |
---|---|
Median length | 10 |
Mean length | 10 |
Min length | 10 |
Characters and Unicode
Total characters | 13935700 |
---|---|
Distinct characters | 11 |
Distinct categories | 2 |
Distinct scripts | 1 |
Distinct blocks | 1 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | 2016-01-04 |
---|---|
2nd row | 2016-01-05 |
3rd row | 2016-01-06 |
4th row | 2016-01-07 |
5th row | 2016-01-08 |
Common Values
Value | Count | Frequency (%) |
2016-01-04 | 3818 | 0.3% |
2016-09-11 | 3818 | 0.3% |
2016-09-09 | 3818 | 0.3% |
2016-09-08 | 3818 | 0.3% |
2016-09-07 | 3818 | 0.3% |
2016-09-06 | 3818 | 0.3% |
2016-09-05 | 3818 | 0.3% |
2016-09-04 | 3818 | 0.3% |
2016-09-03 | 3818 | 0.3% |
2016-09-02 | 3818 | 0.3% |
Other values (355) | 1355390 |
Most occurring characters
Value | Count | Frequency (%) |
0 | 3096398 | |
- | 2787140 | |
1 | 2596240 | |
2 | 2218258 | |
6 | 1637922 | |
3 | 320712 | 2.3% |
7 | 263442 | 1.9% |
5 | 255806 | 1.8% |
8 | 255806 | 1.8% |
4 | 251988 | 1.8% |
Most occurring categories
Value | Count | Frequency (%) |
Decimal Number | 11148560 | |
Dash Punctuation | 2787140 | 20.0% |
Most frequent character per category
Decimal Number
Value | Count | Frequency (%) |
0 | 3096398 | |
1 | 2596240 | |
2 | 2218258 | |
6 | 1637922 | |
3 | 320712 | 2.9% |
7 | 263442 | 2.4% |
5 | 255806 | 2.3% |
8 | 255806 | 2.3% |
4 | 251988 | 2.3% |
9 | 251988 | 2.3% |
Dash Punctuation
Value | Count | Frequency (%) |
- | 2787140 |
Most occurring scripts
Value | Count | Frequency (%) |
Common | 13935700 |
Most frequent character per script
Common
Value | Count | Frequency (%) |
0 | 3096398 | |
- | 2787140 | |
1 | 2596240 | |
2 | 2218258 | |
6 | 1637922 | |
3 | 320712 | 2.3% |
7 | 263442 | 1.9% |
5 | 255806 | 1.8% |
8 | 255806 | 1.8% |
4 | 251988 | 1.8% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 13935700 |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
0 | 3096398 | |
- | 2787140 | |
1 | 2596240 | |
2 | 2218258 | |
6 | 1637922 | |
3 | 320712 | 2.3% |
7 | 263442 | 1.9% |
5 | 255806 | 1.8% |
8 | 255806 | 1.8% |
4 | 251988 | 1.8% |
available
Boolean
Distinct | 2 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 1.3 MiB |
Value | Count | Frequency (%) |
True | 934542 | 67.1% |
False | 459028 | 32.9% |
price
Categorical
Distinct | 669 |
---|---|
Distinct (%) | 0.1% |
Missing | 459028 |
Missing (%) | 32.9% |
Memory size | 10.6 MiB |
Common Values
Value | Count | Frequency (%) |
$150.00 | 36646 | 2.6% |
$100.00 | 31755 | 2.3% |
$75.00 | 29820 | 2.1% |
$125.00 | 27538 | 2.0% |
$65.00 | 26415 | 1.9% |
$90.00 | 24942 | 1.8% |
$95.00 | 24327 | 1.7% |
$99.00 | 23629 | 1.7% |
$85.00 | 23455 | 1.7% |
$80.00 | 19817 | 1.4% |
Other values (659) | 666198 | |
(Missing) | 459028 |
Length
Value | Count | Frequency (%) |
150.00 | 36646 | 3.9% |
100.00 | 31755 | 3.4% |
75.00 | 29820 | 3.2% |
125.00 | 27538 | 2.9% |
65.00 | 26415 | 2.8% |
90.00 | 24942 | 2.7% |
95.00 | 24327 | 2.6% |
99.00 | 23629 | 2.5% |
85.00 | 23455 | 2.5% |
80.00 | 19817 | 2.1% |
Other values (659) | 666198 |
Most occurring characters
Value | Count | Frequency (%) |
0 | 2298464 | |
$ | 934542 | |
. | 934542 | |
1 | 431978 | 7.1% |
5 | 427863 | 7.0% |
9 | 263820 | 4.3% |
2 | 216940 | 3.5% |
7 | 138000 | 2.3% |
8 | 128196 | 2.1% |
6 | 119875 | 2.0% |
Other values (3) | 231819 | 3.8% |
Most occurring categories
Value | Count | Frequency (%) |
Decimal Number | 4256258 | |
Other Punctuation | 935239 | 15.3% |
Currency Symbol | 934542 | 15.3% |
Most frequent character per category
Decimal Number
Value | Count | Frequency (%) |
0 | 2298464 | |
1 | 431978 | 10.1% |
5 | 427863 | 10.1% |
9 | 263820 | 6.2% |
2 | 216940 | 5.1% |
7 | 138000 | 3.2% |
8 | 128196 | 3.0% |
6 | 119875 | 2.8% |
4 | 119803 | 2.8% |
3 | 111319 | 2.6% |
Other Punctuation
Value | Count | Frequency (%) |
. | 934542 | |
, | 697 | 0.1% |
Currency Symbol
Value | Count | Frequency (%) |
$ | 934542 |
Most occurring scripts
Value | Count | Frequency (%) |
Common | 6126039 |
Most frequent character per script
Common
Value | Count | Frequency (%) |
0 | 2298464 | |
$ | 934542 | |
. | 934542 | |
1 | 431978 | 7.1% |
5 | 427863 | 7.0% |
9 | 263820 | 4.3% |
2 | 216940 | 3.5% |
7 | 138000 | 2.3% |
8 | 128196 | 2.1% |
6 | 119875 | 2.0% |
Other values (3) | 231819 | 3.8% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 6126039 |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
0 | 2298464 | |
$ | 934542 | |
. | 934542 | |
1 | 431978 | 7.1% |
5 | 427863 | 7.0% |
9 | 263820 | 4.3% |
2 | 216940 | 3.5% |
7 | 138000 | 2.3% |
8 | 128196 | 2.1% |
6 | 119875 | 2.0% |
Other values (3) | 231819 | 3.8% |
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
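The calculation described above can be sketched in plain Python. This is a minimal illustration, not the code the report itself runs; the function name `pearson_r` is hypothetical. The 1/n factors of the covariance and the standard deviations cancel, so they are omitted.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of centered products; the common 1/n factor cancels in the ratio.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
print(pearson_r(xs, ys))
# Shifting and rescaling one variable leaves r unchanged.
print(pearson_r(xs, [10 * v + 5 for v in ys]))
```

Both calls should print a value equal to 1 up to floating-point rounding, which illustrates the invariance under changes in location and scale.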
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
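The pair-counting definition above can be written out directly. This sketch implements the formula exactly as worded, which corresponds to the tau-a variant; statistical libraries often apply a tie correction (tau-b) instead, so results on tied data may differ from what a library reports. The function name `kendall_tau` is hypothetical.

```python
from itertools import combinations
from typing import Sequence


def kendall_tau(x: Sequence[float], y: Sequence[float]) -> float:
    """(concordant - discordant) / total pairs, with tied pairs counted as neither."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = 0
    discordant = 0
    for i, j in combinations(range(n), 2):
        # The sign of the product tells whether both variables move the same way.
        product = (x[i] - x[j]) * (y[i] - y[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


# One swapped pair out of six: 5 concordant, 1 discordant.
print(kendall_tau([1, 2, 3, 4], [1, 3, 2, 4]))
```

The loop is quadratic in the number of observations, which is fine for illustration but slow for a dataset of this size.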
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available for Phik.
First rows
index | listing_id | date | available | price |
---|---|---|---|---|
0 | 241032 | 2016-01-04 | t | $85.00 |
1 | 241032 | 2016-01-05 | t | $85.00 |
2 | 241032 | 2016-01-06 | f | NaN |
3 | 241032 | 2016-01-07 | f | NaN |
4 | 241032 | 2016-01-08 | f | NaN |
5 | 241032 | 2016-01-09 | f | NaN |
6 | 241032 | 2016-01-10 | f | NaN |
7 | 241032 | 2016-01-11 | f | NaN |
8 | 241032 | 2016-01-12 | f | NaN |
9 | 241032 | 2016-01-13 | t | $85.00 |
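The sample rows show why price is profiled as categorical: it is stored as a currency string such as `$85.00`, and the character tables also list a `,` separator. A minimal sketch of converting the raw fields to typed values follows; the helper names `parse_price` and `parse_available` are hypothetical, and the assumption that `,` is a thousands separator is not confirmed by the report.

```python
import math
from decimal import Decimal
from typing import Optional, Union


def parse_price(raw: Union[str, float, None]) -> Optional[Decimal]:
    """Convert '$1,250.00' to Decimal('1250.00'); missing values become None."""
    if raw is None:
        return None
    if isinstance(raw, float):
        # Missing prices typically arrive as float NaN after loading.
        return None if math.isnan(raw) else Decimal(str(raw))
    text = raw.strip()
    if not text or text.lower() == "nan":
        return None
    # Assumption: ',' is a thousands separator and '$' the only currency symbol.
    return Decimal(text.replace("$", "").replace(",", ""))


def parse_available(raw: str) -> bool:
    """Map the 't' / 'f' flags seen in the sample rows to booleans."""
    if raw not in ("t", "f"):
        raise ValueError(f"unexpected availability flag: {raw!r}")
    return raw == "t"


print(parse_price("$85.00"), parse_price("NaN"), parse_available("f"))
```

After such a conversion, price could be profiled as a numeric variable, which would make the quantile and descriptive statistics available for it as well.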
Last rows
index | listing_id | date | available | price |
---|---|---|---|---|
1393560 | 10208623 | 2016-12-24 | f | NaN |
1393561 | 10208623 | 2016-12-25 | f | NaN |
1393562 | 10208623 | 2016-12-26 | f | NaN |
1393563 | 10208623 | 2016-12-27 | f | NaN |
1393564 | 10208623 | 2016-12-28 | f | NaN |
1393565 | 10208623 | 2016-12-29 | f | NaN |
1393566 | 10208623 | 2016-12-30 | f | NaN |
1393567 | 10208623 | 2016-12-31 | f | NaN |
1393568 | 10208623 | 2017-01-01 | f | NaN |
1393569 | 10208623 | 2017-01-02 | f | NaN |