Dataset statistics
Number of variables | 7 |
---|---|
Number of observations | 95289 |
Missing cells | 31749 |
Missing cells (%) | 4.8% |
Duplicate rows | 0 |
Duplicate rows (%) | 0.0% |
Total size in memory | 5.1 MiB |
Average record size in memory | 56.0 B |
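Each percentage in this overview is a simple ratio. The sketch below recomputes the missing-cell share from the per-column missing counts reported in this profile. All other columns report 0 missing. The helper name `missing_share` is illustrative and is not part of pandas-profiling.

```python
# Per-column missing counts as reported in this profile (other columns: 0 missing).
MISSING_BY_COLUMN = {
    "daily_new_cases": 6469,
    "cumulative_total_deaths": 6090,
    "daily_new_deaths": 19190,
}
N_ROWS = 95289
N_COLUMNS = 7


def missing_share(missing_by_column: dict[str, int], n_rows: int, n_columns: int) -> tuple[int, float]:
    """Return (total missing cells, missing cells as a percentage of all cells)."""
    total_missing = sum(missing_by_column.values())
    total_cells = n_rows * n_columns
    return total_missing, 100.0 * total_missing / total_cells


total, pct = missing_share(MISSING_BY_COLUMN, N_ROWS, N_COLUMNS)
print(f"Missing cells: {total} ({pct:.1f}%)")  # Missing cells: 31749 (4.8%)
```

The same ratio, taken per column (for example 19190 / 95289 for daily_new_deaths), gives the percentages in the missing-value warnings.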
Variable types
Categorical | 2 |
---|---|
Numeric | 5 |
Warnings
date has a high cardinality: 459 distinct values | High cardinality |
country has a high cardinality: 219 distinct values | High cardinality |
cumulative_total_cases is highly correlated with cumulative_total_deaths | High correlation |
cumulative_total_deaths is highly correlated with cumulative_total_cases | High correlation |
daily_new_cases has 6469 (6.8%) missing values | Missing |
cumulative_total_deaths has 6090 (6.4%) missing values | Missing |
daily_new_deaths has 19190 (20.1%) missing values | Missing |
country is uniformly distributed | Uniform |
cumulative_total_cases has 6301 (6.6%) zeros | Zeros |
daily_new_cases has 23482 (24.6%) zeros | Zeros |
active_cases has 10657 (11.2%) zeros | Zeros |
cumulative_total_deaths has 13326 (14.0%) zeros | Zeros |
daily_new_deaths has 34986 (36.7%) zeros | Zeros |
Reproduction
Analysis started | 2021-04-29 10:48:08.428851 |
---|---|
Analysis finished | 2021-04-29 10:48:30.200240 |
Duration | 21.77 seconds |
Software version | pandas-profiling v2.11.0 |
Variables
date (Categorical)
Distinct | 459 |
---|---|
Distinct (%) | 0.5% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 744.6 KiB |
2020-6-13 | 219 |
---|---|
2021-3-10 | 219 |
2020-6-05 | 219 |
2020-9-05 | 219 |
2020-4-12 | 219 |
Other values (454) | 94194 |
Length
Max length | 10 |
---|---|
Median length | 9 |
Mean length | 9.211440985 |
Min length | 9 |
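The date strings use an unpadded month (for example `2020-6-13` vs `2020-10-11`). This is why their length varies between 9 and 10 characters. It is also why the column is profiled as a high-cardinality categorical rather than as a date. The sketch below parses the strings with the standard library and re-emits them in zero-padded ISO 8601 form. It assumes every value follows the year-month-day pattern seen in the sample rows. The function names are illustrative.

```python
from datetime import date, datetime


def parse_report_date(text: str) -> date:
    """Parse a 'YYYY-M-DD' string; strptime's %m also accepts an unpadded month."""
    return datetime.strptime(text, "%Y-%m-%d").date()


def to_iso(text: str) -> str:
    """Normalize a date string to zero-padded ISO 8601 (YYYY-MM-DD)."""
    return parse_report_date(text).isoformat()


for raw in ["2020-2-15", "2020-6-05", "2020-10-11"]:
    print(raw, "->", to_iso(raw), f"(length {len(raw)} -> {len(to_iso(raw))})")
```

Once the column holds real dates, each value has the same width. A profiler can then treat it as temporal data rather than as 459 unrelated categories.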
Characters and Unicode
Total characters | 877749 |
---|---|
Distinct characters | 11 |
Distinct categories | 2 |
Distinct scripts | 1 |
Distinct blocks | 1 |
Unique
Unique | 24 |
---|---|
Unique (%) | < 0.1% |
Sample
1st row | 2020-2-15 |
---|---|
2nd row | 2020-2-16 |
3rd row | 2020-2-17 |
4th row | 2020-2-18 |
5th row | 2020-2-19 |
Common values
Value | Count | Frequency (%) |
2020-6-13 | 219 | 0.2% |
2021-3-10 | 219 | 0.2% |
2020-6-05 | 219 | 0.2% |
2020-9-05 | 219 | 0.2% |
2020-4-12 | 219 | 0.2% |
2021-2-06 | 219 | 0.2% |
2020-4-19 | 219 | 0.2% |
2020-10-11 | 219 | 0.2% |
2020-9-20 | 219 | 0.2% |
2020-11-13 | 219 | 0.2% |
Other values (449) | 93099 | 97.7% |
Most occurring characters
Value | Count | Frequency (%) |
2 | 247762 | 28.2% |
0 | 208985 | 23.8% |
- | 190578 | 21.7% |
1 | 101415 | 11.6% |
3 | 27380 | 3.1% |
4 | 21246 | 2.4% |
5 | 16208 | 1.8% |
7 | 16208 | 1.8% |
8 | 16208 | 1.8% |
6 | 15989 | 1.8% |
Most occurring categories
Value | Count | Frequency (%) |
Decimal Number | 687171 | 78.3% |
Dash Punctuation | 190578 | 21.7% |
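The character statistics group characters by Unicode general category. The sketch below shows the idea using Python's `unicodedata.category`, which returns two-letter codes such as `Nd` (digits) and `Pd` (dashes). The long names in the mapping are the standard Unicode names for those codes. pandas-profiling may compute them through a different library, so treat this as an approximation of its behaviour.

```python
import unicodedata
from collections import Counter

# Standard Unicode long names for the general-category codes seen in this dataset.
CATEGORY_NAMES = {
    "Nd": "Decimal Number",
    "Pd": "Dash Punctuation",
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Zs": "Space Separator",
}


def category_counts(values: list[str]) -> Counter:
    """Count characters per Unicode general category over all strings."""
    counts: Counter = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)  # e.g. "Nd" for digits, "Pd" for "-"
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


print(category_counts(["2020-2-15", "2020-10-11"]))
```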
Most frequent character per category
Decimal Number
Value | Count | Frequency (%) |
2 | 247762 | 36.1% |
0 | 208985 | 30.4% |
1 | 101415 | 14.8% |
3 | 27380 | 4.0% |
4 | 21246 | 3.1% |
5 | 16208 | 2.4% |
7 | 16208 | 2.4% |
8 | 16208 | 2.4% |
6 | 15989 | 2.3% |
9 | 15770 | 2.3% |
Dash Punctuation
Value | Count | Frequency (%) |
- | 190578 | 100.0% |
Most occurring scripts
Value | Count | Frequency (%) |
Common | 877749 | 100.0% |
Most frequent character per script
Common
Value | Count | Frequency (%) |
2 | 247762 | 28.2% |
0 | 208985 | 23.8% |
- | 190578 | 21.7% |
1 | 101415 | 11.6% |
3 | 27380 | 3.1% |
4 | 21246 | 2.4% |
5 | 16208 | 1.8% |
7 | 16208 | 1.8% |
8 | 16208 | 1.8% |
6 | 15989 | 1.8% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 877749 | 100.0% |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
2 | 247762 | 28.2% |
0 | 208985 | 23.8% |
- | 190578 | 21.7% |
1 | 101415 | 11.6% |
3 | 27380 | 3.1% |
4 | 21246 | 2.4% |
5 | 16208 | 1.8% |
7 | 16208 | 1.8% |
8 | 16208 | 1.8% |
6 | 15989 | 1.8% |
country (Categorical)
Distinct | 219 |
---|---|
Distinct (%) | 0.2% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 744.6 KiB |
China | 459 |
---|---|
Democratic Republic Of The Congo | 435 |
Curacao | 435 |
Maldives | 435 |
Nepal | 435 |
Other values (214) | 93090 |
Length
Max length | 32 |
---|---|
Median length | 8 |
Mean length | 9.231810597 |
Min length | 2 |
Characters and Unicode
Total characters | 879690 |
---|---|
Distinct characters | 52 |
Distinct categories | 3 |
Distinct scripts | 2 |
Distinct blocks | 1 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | Afghanistan |
---|---|
2nd row | Afghanistan |
3rd row | Afghanistan |
4th row | Afghanistan |
5th row | Afghanistan |
Common values
Value | Count | Frequency (%) |
China | 459 | 0.5% |
Democratic Republic Of The Congo | 435 | 0.5% |
Curacao | 435 | 0.5% |
Maldives | 435 | 0.5% |
Nepal | 435 | 0.5% |
Qatar | 435 | 0.5% |
Belize | 435 | 0.5% |
Italy | 435 | 0.5% |
Isle Of Man | 435 | 0.5% |
Gabon | 435 | 0.5% |
Other values (209) | 90915 | 95.4% |
Most occurring words
Value | Count | Frequency (%) |
and | 3915 | 3.0% |
islands | 3915 | 3.0% |
saint | 2610 | 2.0% |
guinea | 1740 | 1.3% |
republic | 1740 | 1.3% |
china | 1329 | 1.0% |
south | 1305 | 1.0% |
of | 1305 | 1.0% |
new | 1305 | 1.0% |
congo | 870 | 0.7% |
Other values (252) | 111795 | 84.8% |
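The word table counts lowercased words across all rows, so a word in a country name is counted once for every row of that country. The sketch below assumes simple lowercasing and whitespace splitting. pandas-profiling's exact tokenization may differ.

```python
from collections import Counter


def word_counts(values: list[str]) -> Counter:
    """Count lowercased, whitespace-separated words over all values."""
    counts: Counter = Counter()
    for value in values:
        counts.update(value.lower().split())
    return counts


# Hypothetical sample: three rows for one country, one row for another.
rows = ["Isle Of Man"] * 3 + ["Democratic Republic Of The Congo"]
print(word_counts(rows).most_common(3))
```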
Most occurring characters
Value | Count | Frequency (%) |
a | 130524 | 14.8% |
n | 73974 | 8.4% |
i | 71799 | 8.2% |
e | 61335 | 7.0% |
r | 49155 | 5.6% |
o | 42630 | 4.8% |
(space) | 36540 | 4.2% |
l | 35670 | 4.1% |
s | 32625 | 3.7% |
t | 32190 | 3.7% |
Other values (42) | 313248 | 35.6% |
Most occurring categories
Value | Count | Frequency (%) |
Lowercase Letter | 710016 | 80.7% |
Uppercase Letter | 133134 | 15.1% |
Space Separator | 36540 | 4.2% |
Most frequent character per category
Lowercase Letter
Value | Count | Frequency (%) |
a | 130524 | 18.4% |
n | 73974 | 10.4% |
i | 71799 | 10.1% |
e | 61335 | 8.6% |
r | 49155 | 6.9% |
o | 42630 | 6.0% |
l | 35670 | 5.0% |
s | 32625 | 4.6% |
t | 32190 | 4.5% |
u | 31755 | 4.5% |
Other values (16) | 148359 | 20.9% |
Uppercase Letter
Value | Count | Frequency (%) |
S | 16965 | 12.7% |
M | 12615 | 9.5% |
A | 11745 | 8.8% |
C | 11334 | 8.5% |
B | 9570 | 7.2% |
I | 8265 | 6.2% |
G | 7830 | 5.9% |
T | 6090 | 4.6% |
N | 5655 | 4.2% |
L | 5655 | 4.2% |
Other values (15) | 37410 | 28.1% |
Space Separator
Value | Count | Frequency (%) |
(space) | 36540 | 100.0% |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 843150 | 95.8% |
Common | 36540 | 4.2% |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
a | 130524 | 15.5% |
n | 73974 | 8.8% |
i | 71799 | 8.5% |
e | 61335 | 7.3% |
r | 49155 | 5.8% |
o | 42630 | 5.1% |
l | 35670 | 4.2% |
s | 32625 | 3.9% |
t | 32190 | 3.8% |
u | 31755 | 3.8% |
Other values (41) | 281493 | 33.4% |
Common
Value | Count | Frequency (%) |
(space) | 36540 | 100.0% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 879690 | 100.0% |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
a | 130524 | 14.8% |
n | 73974 | 8.4% |
i | 71799 | 8.2% |
e | 61335 | 7.0% |
r | 49155 | 5.6% |
o | 42630 | 4.8% |
(space) | 36540 | 4.2% |
l | 35670 | 4.1% |
s | 32625 | 3.7% |
t | 32190 | 3.7% |
Other values (42) | 313248 | 35.6% |
cumulative_total_cases (Numeric)
Distinct | 42100 |
---|---|
Distinct (%) | 44.2% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 219233.893 |
---|---|
Minimum | 0 |
Maximum | 32789653 |
Zeros | 6301 |
Zeros (%) | 6.6% |
Memory size | 744.6 KiB |
Quantile statistics
Minimum | 0 |
---|---|
5th percentile | 0 |
Q1 | 166 |
Median | 4184 |
Q3 | 55770 |
95th percentile | 760028.2 |
Maximum | 32789653 |
Range | 32789653 |
Interquartile range (IQR) | 55604 |
Descriptive statistics
Standard deviation | 1324354.257 |
---|---|
Coefficient of variation (CV) | 6.040828081 |
Kurtosis | 291.3701441 |
Mean | 219233.893 |
Median Absolute Deviation (MAD) | 4183 |
Skewness | 15.18149629 |
Sum | 2.089057843 × 10¹⁰ |
Variance | 1.753914198 × 10¹² |
Monotonicity | Not monotonic |
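Several derived statistics above follow directly from the others. The coefficient of variation is the standard deviation divided by the mean. The IQR is Q3 - Q1, and the range is maximum - minimum. The sketch below computes these for a plain list of numbers. It rests on three assumptions:

- quantiles use linear interpolation between order statistics (the pandas and NumPy default),
- the standard deviation is the sample version (ddof=1),
- MAD is the median absolute deviation from the median, as the table label states.

```python
import math
import statistics


def quantile_linear(sorted_values: list[float], q: float) -> float:
    """q-th quantile (0 <= q <= 1) by linear interpolation between order statistics."""
    position = q * (len(sorted_values) - 1)
    lower, upper = math.floor(position), math.ceil(position)
    fraction = position - lower
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * fraction


def describe(values: list[float]) -> dict[str, float]:
    """Derived statistics similar to the report's quantile and descriptive tables."""
    data = sorted(values)
    mean = statistics.fmean(data)
    std = statistics.stdev(data)  # sample standard deviation (ddof=1), assumed
    median = statistics.median(data)
    q1, q3 = quantile_linear(data, 0.25), quantile_linear(data, 0.75)
    return {
        "mean": mean,
        "std": std,
        "variance": statistics.variance(data),
        "cv": std / mean,  # coefficient of variation; undefined if mean == 0
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": q3 - q1,
        "range": data[-1] - data[0],
        "mad": statistics.median(abs(x - median) for x in data),  # median absolute deviation
    }


print(describe([0, 1, 2, 3, 4, 10]))
```

The reported cumulative_total_cases figures follow the same relations:

- 1324354.257 / 219233.893 ≈ 6.0408 reproduces the listed CV.
- 55770 - 166 = 55604 reproduces the IQR.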
Common values
Value | Count | Frequency (%) |
0 | 6301 | 6.6% |
1 | 1101 | 1.2% |
3 | 701 | 0.7% |
13 | 576 | 0.6% |
10 | 526 | 0.6% |
18 | 504 | 0.5% |
4 | 448 | 0.5% |
2 | 427 | 0.4% |
11 | 407 | 0.4% |
27 | 380 | 0.4% |
Other values (42090) | 83918 | 88.1% |
Minimum 5 values
Value | Count | Frequency (%) |
0 | 6301 | 6.6% |
1 | 1101 | 1.2% |
2 | 427 | 0.4% |
3 | 701 | 0.7% |
4 | 448 | 0.5% |
Maximum 5 values
Value | Count | Frequency (%) |
32789653 | 1 | < 0.1% |
32736373 | 1 | < 0.1% |
32669279 | 1 | < 0.1% |
32602224 | 1 | < 0.1% |
32536920 | 1 | < 0.1% |
daily_new_cases (Numeric)
Distinct | 9027 |
---|---|
Distinct (%) | 10.2% |
Missing | 6469 |
Missing (%) | 6.8% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 1655.598086 |
---|---|
Minimum | -1417 |
Maximum | 349313 |
Zeros | 23482 |
Zeros (%) | 24.6% |
Memory size | 744.6 KiB |
Quantile statistics
Minimum | -1417 |
---|---|
5th percentile | 0 |
Q1 | 0 |
Median | 34 |
Q3 | 483 |
95th percentile | 6444 |
Maximum | 349313 |
Range | 350730 |
Interquartile range (IQR) | 483 |
Descriptive statistics
Standard deviation | 9271.846342 |
---|---|
Coefficient of variation (CV) | 5.60030023 |
Kurtosis | 369.3821058 |
Mean | 1655.598086 |
Median Absolute Deviation (MAD) | 34 |
Skewness | 16.28283398 |
Sum | 147050222 |
Variance | 85967134.58 |
Monotonicity | Not monotonic |
Common values
Value | Count | Frequency (%) |
0 | 23482 | 24.6% |
1 | 2838 | 3.0% |
2 | 1824 | 1.9% |
3 | 1434 | 1.5% |
4 | 1198 | 1.3% |
5 | 1059 | 1.1% |
6 | 963 | 1.0% |
7 | 832 | 0.9% |
8 | 794 | 0.8% |
9 | 670 | 0.7% |
Other values (9017) | 53726 | 56.4% |
(Missing) | 6469 | 6.8% |
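Minimum 5 values
Value | Count | Frequency (%) |
-1417 | 1 | < 0.1% |
-766 | 1 | < 0.1% |
-322 | 1 | < 0.1% |
-209 | 1 | < 0.1% |
-110 | 1 | < 0.1% |
Maximum 5 values
Value | Count | Frequency (%) |
349313 | 1 | < 0.1% |
345147 | 1 | < 0.1% |
332503 | 1 | < 0.1% |
315802 | 1 | < 0.1% |
307570 | 1 | < 0.1% |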
active_cases (Numeric)
Distinct | 27213 |
---|---|
Distinct (%) | 28.6% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 39663.68171 |
---|---|
Minimum | -826 |
Maximum | 9154882 |
Zeros | 10657 |
Zeros (%) | 11.2% |
Memory size | 744.6 KiB |
Quantile statistics
Minimum | -826 |
---|---|
5th percentile | 0 |
Q1 | 24 |
Median | 647 |
Q3 | 8158 |
95th percentile | 109446.6 |
Maximum | 9154882 |
Range | 9155708 |
Interquartile range (IQR) | 8134 |
Descriptive statistics
Standard deviation | 332459.7629 |
---|---|
Coefficient of variation (CV) | 8.381969312 |
Kurtosis | 450.6820369 |
Mean | 39663.68171 |
Median Absolute Deviation (MAD) | 647 |
Skewness | 19.90214437 |
Sum | 3779512566 |
Variance | 1.105294939 × 10¹¹ |
Monotonicity | Not monotonic |
Common values
Value | Count | Frequency (%) |
0 | 10657 | 11.2% |
1 | 2403 | 2.5% |
2 | 1499 | 1.6% |
3 | 1224 | 1.3% |
4 | 871 | 0.9% |
6 | 697 | 0.7% |
5 | 689 | 0.7% |
7 | 586 | 0.6% |
8 | 566 | 0.6% |
12 | 507 | 0.5% |
Other values (27203) | 75590 | 79.3% |
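Minimum 5 values
Value | Count | Frequency (%) |
-826 | 1 | < 0.1% |
-780 | 1 | < 0.1% |
-727 | 1 | < 0.1% |
-703 | 1 | < 0.1% |
-667 | 1 | < 0.1% |
Maximum 5 values
Value | Count | Frequency (%) |
9154882 | 1 | < 0.1% |
9111576 | 1 | < 0.1% |
9098494 | 1 | < 0.1% |
9093090 | 1 | < 0.1% |
9090105 | 1 | < 0.1% |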
cumulative_total_deaths (Numeric)
Distinct | 14602 |
---|---|
Distinct (%) | 16.4% |
Missing | 6090 |
Missing (%) | 6.4% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 5782.699885 |
---|---|
Minimum | 0 |
Maximum | 585880 |
Zeros | 13326 |
Zeros (%) | 14.0% |
Memory size | 744.6 KiB |
Quantile statistics
Minimum | 0 |
---|---|
5th percentile | 0 |
Q1 | 6 |
Median | 87 |
Q3 | 1151 |
95th percentile | 25615.8 |
Maximum | 585880 |
Range | 585880 |
Interquartile range (IQR) | 1145 |
Descriptive statistics
Standard deviation | 28184.39561 |
---|---|
Coefficient of variation (CV) | 4.873916366 |
Kurtosis | 166.6417973 |
Mean | 5782.699885 |
Median Absolute Deviation (MAD) | 87 |
Skewness | 11.14736912 |
Sum | 515811047 |
Variance | 794360155.7 |
Monotonicity | Not monotonic |
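Monotonicity here is evaluated over the whole column. The rows are stacked country by country: the first rows are Afghanistan and the last are Zimbabwe. A cumulative column therefore typically drops at each country boundary. As a result, the column as a whole can be non-monotonic even when each country's own series never decreases. The sketch below checks each contiguous group separately. The toy data and function names are hypothetical.

```python
from itertools import groupby


def is_non_decreasing(values: list[float]) -> bool:
    """True if every value is >= the one before it."""
    return all(a <= b for a, b in zip(values, values[1:]))


def non_decreasing_by_group(rows: list[tuple[str, float]]) -> dict[str, bool]:
    """Check each contiguous group separately (rows must already be grouped)."""
    return {
        group: is_non_decreasing([value for _, value in items])
        for group, items in groupby(rows, key=lambda row: row[0])
    }


# Hypothetical toy data in the same stacked layout: country A, then country B.
rows = [("A", 0), ("A", 2), ("A", 5), ("B", 0), ("B", 1)]
print(is_non_decreasing([value for _, value in rows]))  # False (whole column)
print(non_decreasing_by_group(rows))  # {'A': True, 'B': True}
```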
Common values
Value | Count | Frequency (%) |
0 | 13326 | 14.0% |
1 | 4225 | 4.4% |
3 | 1682 | 1.8% |
2 | 1647 | 1.7% |
7 | 1248 | 1.3% |
10 | 1057 | 1.1% |
9 | 851 | 0.9% |
6 | 785 | 0.8% |
4 | 645 | 0.7% |
12 | 635 | 0.7% |
Other values (14592) | 63098 | 66.2% |
(Missing) | 6090 | 6.4% |
Minimum 5 values
Value | Count | Frequency (%) |
0 | 13326 | 14.0% |
1 | 4225 | 4.4% |
2 | 1647 | 1.7% |
3 | 1682 | 1.8% |
4 | 645 | 0.7% |
Maximum 5 values
Value | Count | Frequency (%) |
585880 | 1 | < 0.1% |
585138 | 1 | < 0.1% |
584327 | 1 | < 0.1% |
583393 | 1 | < 0.1% |
582478 | 1 | < 0.1% |
daily_new_deaths (Numeric)
Distinct | 1337 |
---|---|
Distinct (%) | 1.8% |
Missing | 19190 |
Missing (%) | 20.1% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 40.89822468 |
---|---|
Minimum | -31 |
Maximum | 4493 |
Zeros | 34986 |
Zeros (%) | 36.7% |
Memory size | 744.6 KiB |
Quantile statistics
Minimum | -31 |
---|---|
5th percentile | 0 |
Q1 | 0 |
Median | 1 |
Q3 | 11 |
95th percentile | 184 |
Maximum | 4493 |
Range | 4524 |
Interquartile range (IQR) | 11 |
Descriptive statistics
Standard deviation | 182.5526912 |
---|---|
Coefficient of variation (CV) | 4.463584731 |
Kurtosis | 169.0738813 |
Mean | 40.89822468 |
Median Absolute Deviation (MAD) | 1 |
Skewness | 10.92298739 |
Sum | 3112314 |
Variance | 33325.48507 |
Monotonicity | Not monotonic |
Common values
Value | Count | Frequency (%) |
0 | 34986 | 36.7% |
1 | 6213 | 6.5% |
2 | 3608 | 3.8% |
3 | 2628 | 2.8% |
4 | 2134 | 2.2% |
5 | 1697 | 1.8% |
6 | 1377 | 1.4% |
8 | 1167 | 1.2% |
7 | 1153 | 1.2% |
9 | 966 | 1.0% |
Other values (1327) | 20170 | 21.2% |
(Missing) | 19190 | 20.1% |
Minimum 5 values
Value | Count | Frequency (%) |
-31 | 1 | < 0.1% |
-2 | 2 | < 0.1% |
-1 | 4 | < 0.1% |
0 | 34986 | 36.7% |
1 | 6213 | 6.5% |
Maximum 5 values
Value | Count | Frequency (%) |
4493 | 1 | < 0.1% |
4442 | 1 | < 0.1% |
4389 | 1 | < 0.1% |
4245 | 1 | < 0.1% |
4211 | 1 | < 0.1% |
Correlations
Pearson's r
The Pearson correlation coefficient (r) measures linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate shifts and positive rescalings of the two variables. For a linear relationship, this means the slope (the angle to the x-axis) does not affect r; only its sign does. To calculate r for two variables X and Y, divide the covariance of X and Y by the product of their standard deviations.
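The definition above translates directly into code. The sketch below uses only the standard library. The 1/(n - 1) factors in the covariance and the standard deviations cancel, so plain sums of products are enough. The example also shows that changing the slope of a linear transform leaves r unchanged.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson's r: covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    # The 1/(n - 1) normalisation cancels between numerator and denominator.
    return cov / math.sqrt(ss_x * ss_y)


x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
print(pearson_r(x, y), pearson_r(x, [10 * v - 3 for v in y]))  # both 0.8
```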
Spearman's ρ
The Spearman rank correlation coefficient (ρ) measures monotonic correlation between two variables. It is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, divide the covariance of the rank variables of X and Y by the product of their standard deviations. In other words, ρ is Pearson's r computed on the ranks.
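The sketch below ranks each variable and then applies Pearson's formula to the ranks. It assumes the common convention that tied values share the average of the ranks they occupy. The function names are illustrative.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson's r: covariance divided by the product of the standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def average_ranks(values: list[float]) -> list[float]:
    """Return 1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def spearman_rho(x: list[float], y: list[float]) -> float:
    """Spearman's rho: Pearson's r computed on the ranks of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [v ** 2 for v in x]  # monotonic but not linear
print(round(pearson_r(x, y), 3), spearman_rho(x, y))  # 0.981 1.0
```

In this monotonic but nonlinear example, Pearson's r falls below 1 while ρ is exactly 1.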
Kendall's τ
Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, first count the concordant and discordant pairs of observations. A pair is concordant if X and Y order its two observations the same way, and discordant if they order them oppositely. τ is then the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n - 1)/2.
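The description above corresponds to the τ-a variant, in which tied pairs count as neither concordant nor discordant. Library implementations often adjust the denominator for ties instead; `scipy.stats.kendalltau`, for example, defaults to τ-b. The sketch below follows the text literally and checks every pair, so it runs in O(n²).

```python
from itertools import combinations


def kendall_tau_a(x: list[float], y: list[float]) -> float:
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        # Positive product: both variables order the pair the same way.
        product = (x[i] - x[j]) * (y[i] - y[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
        # product == 0 means a tie in x or y; tau-a counts it as neither.
    return (concordant - discordant) / (n * (n - 1) / 2)


print(kendall_tau_a([1, 2, 3], [1, 3, 2]))  # 2 concordant, 1 discordant, 3 pairs
```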
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables. It captures non-linear dependency and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for the phik package.
First rows
index | date | country | cumulative_total_cases | daily_new_cases | active_cases | cumulative_total_deaths | daily_new_deaths |
---|---|---|---|---|---|---|---|
0 | 2020-2-15 | Afghanistan | 0.0 | NaN | 0.0 | 0.0 | NaN |
1 | 2020-2-16 | Afghanistan | 0.0 | NaN | 0.0 | 0.0 | NaN |
2 | 2020-2-17 | Afghanistan | 0.0 | NaN | 0.0 | 0.0 | NaN |
3 | 2020-2-18 | Afghanistan | 0.0 | NaN | 0.0 | 0.0 | NaN |
4 | 2020-2-19 | Afghanistan | 0.0 | NaN | 0.0 | 0.0 | NaN |
5 | 2020-2-20 | Afghanistan | 0.0 | NaN | 0.0 | 0.0 | NaN |
6 | 2020-2-21 | Afghanistan | 0.0 | NaN | 0.0 | 0.0 | NaN |
7 | 2020-2-22 | Afghanistan | 0.0 | NaN | 0.0 | 0.0 | NaN |
8 | 2020-2-23 | Afghanistan | 0.0 | NaN | 0.0 | 0.0 | NaN |
9 | 2020-2-24 | Afghanistan | 1.0 | NaN | 1.0 | 0.0 | NaN |
Last rows
index | date | country | cumulative_total_cases | daily_new_cases | active_cases | cumulative_total_deaths | daily_new_deaths |
---|---|---|---|---|---|---|---|
95279 | 2021-4-15 | Zimbabwe | 37422.0 | 53.0 | 911.0 | 1550.0 | 2.0 |
95280 | 2021-4-16 | Zimbabwe | 37534.0 | 112.0 | 1002.0 | 1551.0 | 1.0 |
95281 | 2021-4-17 | Zimbabwe | 37699.0 | 165.0 | 1142.0 | 1552.0 | 1.0 |
95282 | 2021-4-18 | Zimbabwe | 37751.0 | 52.0 | 1179.0 | 1553.0 | 1.0 |
95283 | 2021-4-19 | Zimbabwe | 37859.0 | 108.0 | 1275.0 | 1553.0 | 0.0 |
95284 | 2021-4-20 | Zimbabwe | 37875.0 | 16.0 | 1263.0 | 1554.0 | 1.0 |
95285 | 2021-4-21 | Zimbabwe | 37980.0 | 105.0 | 1360.0 | 1555.0 | 1.0 |
95286 | 2021-4-22 | Zimbabwe | 38018.0 | 38.0 | 1390.0 | 1555.0 | 0.0 |
95287 | 2021-4-23 | Zimbabwe | 38045.0 | 27.0 | 1395.0 | 1556.0 | 1.0 |
95288 | 2021-4-24 | Zimbabwe | 38064.0 | 19.0 | 1407.0 | 1556.0 | 0.0 |