Overview

Dataset statistics

Number of variables: 7
Number of observations: 95289
Missing cells: 31749
Missing cells (%): 4.8%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 5.1 MiB
Average record size in memory: 56.0 B

Variable types

Categorical: 2
Numeric: 5

Warnings

date has a high cardinality: 459 distinct values (High cardinality)
country has a high cardinality: 219 distinct values (High cardinality)
cumulative_total_cases is highly correlated with cumulative_total_deaths (High correlation)
cumulative_total_deaths is highly correlated with cumulative_total_cases (High correlation)
daily_new_cases has 6469 (6.8%) missing values (Missing)
cumulative_total_deaths has 6090 (6.4%) missing values (Missing)
daily_new_deaths has 19190 (20.1%) missing values (Missing)
country is uniformly distributed (Uniform)
cumulative_total_cases has 6301 (6.6%) zeros (Zeros)
daily_new_cases has 23482 (24.6%) zeros (Zeros)
active_cases has 10657 (11.2%) zeros (Zeros)
cumulative_total_deaths has 13326 (14.0%) zeros (Zeros)
daily_new_deaths has 34986 (36.7%) zeros (Zeros)
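
The three Missing warnings can be cross-checked against the "Missing cells" figures in the dataset statistics. The following is a minimal sketch that uses only numbers printed in this report; the variable names are illustrative and not part of pandas-profiling.

```python
# Figures copied from the report above.
n_rows, n_cols = 95289, 7
missing_per_column = {
    "daily_new_cases": 6469,
    "cumulative_total_deaths": 6090,
    "daily_new_deaths": 19190,
}

# Total missing cells is the sum over the columns that have gaps.
missing_cells = sum(missing_per_column.values())

# Share of missing cells over the whole rows x columns grid.
missing_cells_pct = 100 * missing_cells / (n_rows * n_cols)

# Per-column share, relative to the number of observations.
per_column_pct = {
    name: round(100 * count / n_rows, 1)
    for name, count in missing_per_column.items()
}

print(missing_cells, round(missing_cells_pct, 1), per_column_pct)
```

The per-column counts add up to the reported 31749 missing cells, and dividing by the 95289 x 7 grid reproduces the reported 4.8%.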

Reproduction

Analysis started: 2021-04-29 10:48:08.428851
Analysis finished: 2021-04-29 10:48:30.200240
Duration: 21.77 seconds
Software version: pandas-profiling v2.11.0
Download configuration: config.yaml
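
As a quick consistency check, the reported duration can be recomputed from the two timestamps. This is only a sketch using the Python standard library; it does not show how pandas-profiling itself measures the run.

```python
from datetime import datetime

# Timestamps copied from the Reproduction section.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S.%f"
started = datetime.strptime("2021-04-29 10:48:08.428851", TIMESTAMP_FORMAT)
finished = datetime.strptime("2021-04-29 10:48:30.200240", TIMESTAMP_FORMAT)

# Elapsed wall-clock time in seconds.
duration_seconds = (finished - started).total_seconds()
print(round(duration_seconds, 2))
```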

Variables

date
Categorical

HIGH CARDINALITY

Distinct: 459
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Memory size: 744.6 KiB

Value               Count
2020-6-13           219
2021-3-10           219
2020-6-05           219
2020-9-05           219
2020-4-12           219
Other values (454)  94194

Length

Max length: 10
Median length: 9
Mean length: 9.211440985
Min length: 9

Characters and Unicode

Total characters: 877749
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
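
To illustrate what such a character property looks like, the sketch below counts Unicode general categories for the five sample values of date shown in this report. It uses the standard-library unicodedata module; the helper name category_counts is hypothetical, and this is not necessarily how pandas-profiling computes its figures.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count Unicode general categories over all characters in the given strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            # For example "Nd" is Decimal Number and "Pd" is Dash Punctuation.
            counts[unicodedata.category(ch)] += 1
    return counts


# The first five rows of the date column, as listed under Sample.
sample_dates = ["2020-2-15", "2020-2-16", "2020-2-17", "2020-2-18", "2020-2-19"]
counts = category_counts(sample_dates)
print(counts)
```

Each sample value holds seven digits and two hyphens, so only the two categories reported for this variable appear.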

Unique

Unique: 24
Unique (%): < 0.1%

Sample

1st row: 2020-2-15
2nd row: 2020-2-16
3rd row: 2020-2-17
4th row: 2020-2-18
5th row: 2020-2-19
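
The date values are stored as strings with an unpadded month (for example 2020-6-05), which is why the column is profiled as Categorical with lengths of 9 or 10. A minimal sketch of parsing them into real dates with the standard library follows, assuming the year-month-day order seen in the samples; the helper name parse_report_date is illustrative.

```python
from datetime import datetime


def parse_report_date(text):
    """Parse a 'YYYY-M-DD' style string (month may be unpadded) into a date."""
    # strptime accepts one- or two-digit months and days for %m and %d.
    return datetime.strptime(text, "%Y-%m-%d").date()


samples = ["2020-2-15", "2020-6-05", "2020-10-11"]
parsed = [parse_report_date(s) for s in samples]

# ISO formatting pads the month, so every value has the same length.
iso_strings = [d.isoformat() for d in parsed]
print(iso_strings)
```

Once parsed, the column can be treated as a date rather than a high-cardinality category.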
Value               Count  Frequency (%)
2020-6-13           219    0.2%
2021-3-10           219    0.2%
2020-6-05           219    0.2%
2020-9-05           219    0.2%
2020-4-12           219    0.2%
2021-2-06           219    0.2%
2020-4-19           219    0.2%
2020-10-11          219    0.2%
2020-9-20           219    0.2%
2020-11-13          219    0.2%
Other values (449)  93099  97.7%
[Figure: Histogram of lengths of the category]

Most occurring characters

Value  Count   Frequency (%)
2      247762  28.2%
0      208985  23.8%
-      190578  21.7%
1      101415  11.6%
3      27380   3.1%
4      21246   2.4%
5      16208   1.8%
7      16208   1.8%
8      16208   1.8%
6      15989   1.8%

Most occurring categories

Value             Count   Frequency (%)
Decimal Number    687171  78.3%
Dash Punctuation  190578  21.7%

Most frequent character per category

Decimal Number

Value  Count   Frequency (%)
2      247762  36.1%
0      208985  30.4%
1      101415  14.8%
3      27380   4.0%
4      21246   3.1%
5      16208   2.4%
7      16208   2.4%
8      16208   2.4%
6      15989   2.3%
9      15770   2.3%

Dash Punctuation

Value  Count   Frequency (%)
-      190578  100.0%

Most occurring scripts

Value   Count   Frequency (%)
Common  877749  100.0%

Most frequent character per script

Common

Value  Count   Frequency (%)
2      247762  28.2%
0      208985  23.8%
-      190578  21.7%
1      101415  11.6%
3      27380   3.1%
4      21246   2.4%
5      16208   1.8%
7      16208   1.8%
8      16208   1.8%
6      15989   1.8%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  877749  100.0%

Most frequent character per block

ASCII

Value  Count   Frequency (%)
2      247762  28.2%
0      208985  23.8%
-      190578  21.7%
1      101415  11.6%
3      27380   3.1%
4      21246   2.4%
5      16208   1.8%
7      16208   1.8%
8      16208   1.8%
6      15989   1.8%

country
Categorical

HIGH CARDINALITY
UNIFORM

Distinct: 219
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 744.6 KiB

Value                             Count
China                             459
Democratic Republic Of The Congo  435
Curacao                           435
Maldives                          435
Nepal                             435
Other values (214)                93090

Length

Max length: 32
Median length: 8
Mean length: 9.231810597
Min length: 2

Characters and Unicode

Total characters: 879690
Distinct characters: 52
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Afghanistan
2nd row: Afghanistan
3rd row: Afghanistan
4th row: Afghanistan
5th row: Afghanistan
Value                             Count  Frequency (%)
China                             459    0.5%
Democratic Republic Of The Congo  435    0.5%
Curacao                           435    0.5%
Maldives                          435    0.5%
Nepal                             435    0.5%
Qatar                             435    0.5%
Belize                            435    0.5%
Italy                             435    0.5%
Isle Of Man                       435    0.5%
Gabon                             435    0.5%
Other values (209)                90915  95.4%
[Figure: Histogram of lengths of the category]
Value               Count   Frequency (%)
and                 3915    3.0%
islands             3915    3.0%
saint               2610    2.0%
guinea              1740    1.3%
republic            1740    1.3%
china               1329    1.0%
south               1305    1.0%
of                  1305    1.0%
new                 1305    1.0%
congo               870     0.7%
Other values (252)  111795  84.8%

Most occurring characters

Value              Count   Frequency (%)
a                  130524  14.8%
n                  73974   8.4%
i                  71799   8.2%
e                  61335   7.0%
r                  49155   5.6%
o                  42630   4.8%
(space)            36540   4.2%
l                  35670   4.1%
s                  32625   3.7%
t                  32190   3.7%
Other values (42)  313248  35.6%

Most occurring categories

Value             Count   Frequency (%)
Lowercase Letter  710016  80.7%
Uppercase Letter  133134  15.1%
Space Separator   36540   4.2%

Most frequent character per category

Lowercase Letter

Value              Count   Frequency (%)
a                  130524  18.4%
n                  73974   10.4%
i                  71799   10.1%
e                  61335   8.6%
r                  49155   6.9%
o                  42630   6.0%
l                  35670   5.0%
s                  32625   4.6%
t                  32190   4.5%
u                  31755   4.5%
Other values (16)  148359  20.9%

Uppercase Letter

Value              Count  Frequency (%)
S                  16965  12.7%
M                  12615  9.5%
A                  11745  8.8%
C                  11334  8.5%
B                  9570   7.2%
I                  8265   6.2%
G                  7830   5.9%
T                  6090   4.6%
N                  5655   4.2%
L                  5655   4.2%
Other values (15)  37410  28.1%

Space Separator

Value    Count  Frequency (%)
(space)  36540  100.0%

Most occurring scripts

Value   Count   Frequency (%)
Latin   843150  95.8%
Common  36540   4.2%

Most frequent character per script

Latin

Value              Count   Frequency (%)
a                  130524  15.5%
n                  73974   8.8%
i                  71799   8.5%
e                  61335   7.3%
r                  49155   5.8%
o                  42630   5.1%
l                  35670   4.2%
s                  32625   3.9%
t                  32190   3.8%
u                  31755   3.8%
Other values (41)  281493  33.4%

Common

Value    Count  Frequency (%)
(space)  36540  100.0%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  879690  100.0%

Most frequent character per block

ASCII

Value              Count   Frequency (%)
a                  130524  14.8%
n                  73974   8.4%
i                  71799   8.2%
e                  61335   7.0%
r                  49155   5.6%
o                  42630   4.8%
(space)            36540   4.2%
l                  35670   4.1%
s                  32625   3.7%
t                  32190   3.7%
Other values (42)  313248  35.6%

cumulative_total_cases
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 42100
Distinct (%): 44.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 219233.893
Minimum: 0
Maximum: 32789653
Zeros: 6301
Zeros (%): 6.6%
Memory size: 744.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 166
median: 4184
Q3: 55770
95-th percentile: 760028.2
Maximum: 32789653
Range: 32789653
Interquartile range (IQR): 55604

Descriptive statistics

Standard deviation: 1324354.257
Coefficient of variation (CV): 6.040828081
Kurtosis: 291.3701441
Mean: 219233.893
Median Absolute Deviation (MAD): 4183
Skewness: 15.18149629
Sum: 2.089057843 × 10^10
Variance: 1.753914198 × 10^12
Monotonicity: Not monotonic
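
Several of these figures are derived from the others: the range and IQR from the quantiles, the CV and variance from the mean and standard deviation, and the sum from the mean and the number of non-missing observations. The sketch below recomputes them for cumulative_total_cases from the values printed above; it is an arithmetic check under those definitions, not the library's own code.

```python
import math

# Values copied from the statistics for cumulative_total_cases.
q1, q3 = 166, 55770
minimum, maximum = 0, 32789653
mean, std = 219233.893, 1324354.257
n_obs = 95289  # this variable has no missing values

iqr = q3 - q1                  # interquartile range
value_range = maximum - minimum
cv = std / mean                # coefficient of variation
variance = std ** 2            # variance is the squared standard deviation
total = mean * n_obs           # sum of all observations

print(iqr, value_range, round(cv, 6), f"{variance:.6e}", f"{total:.6e}")
print(math.isclose(cv, 6.040828081, rel_tol=1e-6))
```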
[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value                 Count  Frequency (%)
0                     6301   6.6%
1                     1101   1.2%
3                     701    0.7%
13                    576    0.6%
10                    526    0.6%
18                    504    0.5%
4                     448    0.5%
2                     427    0.4%
11                    407    0.4%
27                    380    0.4%
Other values (42090)  83918  88.1%

Minimum 5 values

Value  Count  Frequency (%)
0      6301   6.6%
1      1101   1.2%
2      427    0.4%
3      701    0.7%
4      448    0.5%

Maximum 5 values

Value     Count  Frequency (%)
32789653  1      < 0.1%
32736373  1      < 0.1%
32669279  1      < 0.1%
32602224  1      < 0.1%
32536920  1      < 0.1%

daily_new_cases
Real number (ℝ)

MISSING
ZEROS

Distinct: 9027
Distinct (%): 10.2%
Missing: 6469
Missing (%): 6.8%
Infinite: 0
Infinite (%): 0.0%
Mean: 1655.598086
Minimum: -1417
Maximum: 349313
Zeros: 23482
Zeros (%): 24.6%
Memory size: 744.6 KiB

Quantile statistics

Minimum: -1417
5-th percentile: 0
Q1: 0
median: 34
Q3: 483
95-th percentile: 6444
Maximum: 349313
Range: 350730
Interquartile range (IQR): 483

Descriptive statistics

Standard deviation: 9271.846342
Coefficient of variation (CV): 5.60030023
Kurtosis: 369.3821058
Mean: 1655.598086
Median Absolute Deviation (MAD): 34
Skewness: 16.28283398
Sum: 147050222
Variance: 85967134.58
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value                Count  Frequency (%)
0                    23482  24.6%
1                    2838   3.0%
2                    1824   1.9%
3                    1434   1.5%
4                    1198   1.3%
5                    1059   1.1%
6                    963    1.0%
7                    832    0.9%
8                    794    0.8%
9                    670    0.7%
Other values (9017)  53726  56.4%
(Missing)            6469   6.8%

Minimum 5 values

Value  Count  Frequency (%)
-1417  1      < 0.1%
-766   1      < 0.1%
-322   1      < 0.1%
-209   1      < 0.1%
-110   1      < 0.1%

Maximum 5 values

Value   Count  Frequency (%)
349313  1      < 0.1%
345147  1      < 0.1%
332503  1      < 0.1%
315802  1      < 0.1%
307570  1      < 0.1%

active_cases
Real number (ℝ)

ZEROS

Distinct: 27213
Distinct (%): 28.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 39663.68171
Minimum: -826
Maximum: 9154882
Zeros: 10657
Zeros (%): 11.2%
Memory size: 744.6 KiB

Quantile statistics

Minimum: -826
5-th percentile: 0
Q1: 24
median: 647
Q3: 8158
95-th percentile: 109446.6
Maximum: 9154882
Range: 9155708
Interquartile range (IQR): 8134

Descriptive statistics

Standard deviation: 332459.7629
Coefficient of variation (CV): 8.381969312
Kurtosis: 450.6820369
Mean: 39663.68171
Median Absolute Deviation (MAD): 647
Skewness: 19.90214437
Sum: 3779512566
Variance: 1.105294939 × 10^11
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value                 Count  Frequency (%)
0                     10657  11.2%
1                     2403   2.5%
2                     1499   1.6%
3                     1224   1.3%
4                     871    0.9%
6                     697    0.7%
5                     689    0.7%
7                     586    0.6%
8                     566    0.6%
12                    507    0.5%
Other values (27203)  75590  79.3%

Minimum 5 values

Value  Count  Frequency (%)
-826   1      < 0.1%
-780   1      < 0.1%
-727   1      < 0.1%
-703   1      < 0.1%
-667   1      < 0.1%

Maximum 5 values

Value    Count  Frequency (%)
9154882  1      < 0.1%
9111576  1      < 0.1%
9098494  1      < 0.1%
9093090  1      < 0.1%
9090105  1      < 0.1%

cumulative_total_deaths
Real number (ℝ≥0)

HIGH CORRELATION
MISSING
ZEROS

Distinct: 14602
Distinct (%): 16.4%
Missing: 6090
Missing (%): 6.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 5782.699885
Minimum: 0
Maximum: 585880
Zeros: 13326
Zeros (%): 14.0%
Memory size: 744.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 6
median: 87
Q3: 1151
95-th percentile: 25615.8
Maximum: 585880
Range: 585880
Interquartile range (IQR): 1145

Descriptive statistics

Standard deviation: 28184.39561
Coefficient of variation (CV): 4.873916366
Kurtosis: 166.6417973
Mean: 5782.699885
Median Absolute Deviation (MAD): 87
Skewness: 11.14736912
Sum: 515811047
Variance: 794360155.7
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value                 Count  Frequency (%)
0                     13326  14.0%
1                     4225   4.4%
3                     1682   1.8%
2                     1647   1.7%
7                     1248   1.3%
10                    1057   1.1%
9                     851    0.9%
6                     785    0.8%
4                     645    0.7%
12                    635    0.7%
Other values (14592)  63098  66.2%
(Missing)             6090   6.4%

Minimum 5 values

Value  Count  Frequency (%)
0      13326  14.0%
1      4225   4.4%
2      1647   1.7%
3      1682   1.8%
4      645    0.7%

Maximum 5 values

Value   Count  Frequency (%)
585880  1      < 0.1%
585138  1      < 0.1%
584327  1      < 0.1%
583393  1      < 0.1%
582478  1      < 0.1%

daily_new_deaths
Real number (ℝ)

MISSING
ZEROS

Distinct: 1337
Distinct (%): 1.8%
Missing: 19190
Missing (%): 20.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 40.89822468
Minimum: -31
Maximum: 4493
Zeros: 34986
Zeros (%): 36.7%
Memory size: 744.6 KiB

Quantile statistics

Minimum: -31
5-th percentile: 0
Q1: 0
median: 1
Q3: 11
95-th percentile: 184
Maximum: 4493
Range: 4524
Interquartile range (IQR): 11

Descriptive statistics

Standard deviation: 182.5526912
Coefficient of variation (CV): 4.463584731
Kurtosis: 169.0738813
Mean: 40.89822468
Median Absolute Deviation (MAD): 1
Skewness: 10.92298739
Sum: 3112314
Variance: 33325.48507
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value                Count  Frequency (%)
0                    34986  36.7%
1                    6213   6.5%
2                    3608   3.8%
3                    2628   2.8%
4                    2134   2.2%
5                    1697   1.8%
6                    1377   1.4%
8                    1167   1.2%
7                    1153   1.2%
9                    966    1.0%
Other values (1327)  20170  21.2%
(Missing)            19190  20.1%

Minimum 5 values

Value  Count  Frequency (%)
-31    1      < 0.1%
-2     2      < 0.1%
-1     4      < 0.1%
0      34986  36.7%
1      6213   6.5%

Maximum 5 values

Value  Count  Frequency (%)
4493   1      < 0.1%
4442   1      < 0.1%
4389   1      < 0.1%
4245   1      < 0.1%
4211   1      < 0.1%

Interactions

[Figures: interaction plots]

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear relationship the steepness of the slope does not affect the magnitude of r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
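
The calculation described above can be sketched in plain Python as follows. The function name pearson_r and the toy data are illustrative; this is not the code pandas-profiling runs internally.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator,
    # so plain sums of deviations are enough.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (dev_x * dev_y)


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 6.0]

r = pearson_r(xs, ys)
# Shifting and rescaling one variable (positive scale) leaves r unchanged.
r_rescaled = pearson_r([3 * x + 7 for x in xs], ys)
print(r, r_rescaled)
```

The second call illustrates the invariance under changes in location and scale mentioned above.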

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
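
A minimal sketch of that recipe: rank each variable (ties receive the average rank, which is an assumption made here), then apply the Pearson formula to the ranks. The helper names and the toy data are illustrative.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (dev_x * dev_y)


def spearman_rho(xs, ys):
    """Pearson correlation of the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# A monotonic but nonlinear relationship: y = x cubed.
xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]
print(spearman_rho(xs, ys), pearson_r(xs, ys))
```

On this monotonic but nonlinear example the rank-based coefficient reaches 1 while Pearson's r stays below it.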

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
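
The pair-counting definition above can be written out directly. This sketch implements exactly that formula (often called tau-a); statistical libraries commonly apply a tie correction instead, so their results can differ when the data contain ties. The function name and the toy data are illustrative.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs, with no tie correction."""
    concordant = 0
    discordant = 0
    pairs = list(combinations(range(len(xs)), 2))
    for i, j in pairs:
        # Positive product: both variables move in the same direction.
        direction = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
        # A product of zero is a tie and counts as neither.
    return (concordant - discordant) / len(pairs)


xs = [1, 2, 3, 4]
ys = [1, 3, 2, 4]  # one swapped pair out of six
tau = kendall_tau_a(xs, ys)
print(tau)
```

With five concordant pairs and one discordant pair out of six, the result is (5 - 1) / 6.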

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

[Figure] A simple visualization of nullity by column.
[Figure] The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
[Figure] The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
[Figure] The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
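
Nullity correlation can be illustrated by correlating presence indicators (1 if a value is present, 0 if missing). The sketch below is an assumption-laden toy: it uses only the 20 rows printed under Sample in this report (the first ten and last ten), so its result says nothing about the full dataset, and the function name is hypothetical.

```python
from math import sqrt


def nullity_correlation(col_a, col_b):
    """Pearson correlation of presence indicators; None if a column never varies."""
    a = [0.0 if v is None else 1.0 for v in col_a]
    b = [0.0 if v is None else 1.0 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((y - mean_b) ** 2 for y in b)
    if ss_a == 0 or ss_b == 0:
        # A fully present (or fully missing) column has no nullity variation.
        return None
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    return cov / sqrt(ss_a * ss_b)


# First ten rows (Afghanistan) have NaN in both daily columns;
# the last ten rows (Zimbabwe) are complete. None stands for NaN.
daily_new_cases = [None] * 10 + [53.0, 112.0, 165.0, 52.0, 108.0, 16.0, 105.0, 38.0, 27.0, 19.0]
daily_new_deaths = [None] * 10 + [2.0, 1.0, 1.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0]
active_cases = [0.0] * 9 + [1.0] + [911.0, 1002.0, 1142.0, 1179.0, 1275.0, 1263.0, 1360.0, 1390.0, 1395.0, 1407.0]

both_daily = nullity_correlation(daily_new_cases, daily_new_deaths)
with_complete = nullity_correlation(daily_new_cases, active_cases)
print(both_daily, with_complete)
```

In these sample rows the two daily columns are missing together, so their nullity correlation is 1, while a column with no missing values has no defined nullity correlation.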

Sample

First rows

   date       country      cumulative_total_cases  daily_new_cases  active_cases  cumulative_total_deaths  daily_new_deaths
0  2020-2-15  Afghanistan  0.0                     NaN              0.0           0.0                      NaN
1  2020-2-16  Afghanistan  0.0                     NaN              0.0           0.0                      NaN
2  2020-2-17  Afghanistan  0.0                     NaN              0.0           0.0                      NaN
3  2020-2-18  Afghanistan  0.0                     NaN              0.0           0.0                      NaN
4  2020-2-19  Afghanistan  0.0                     NaN              0.0           0.0                      NaN
5  2020-2-20  Afghanistan  0.0                     NaN              0.0           0.0                      NaN
6  2020-2-21  Afghanistan  0.0                     NaN              0.0           0.0                      NaN
7  2020-2-22  Afghanistan  0.0                     NaN              0.0           0.0                      NaN
8  2020-2-23  Afghanistan  0.0                     NaN              0.0           0.0                      NaN
9  2020-2-24  Afghanistan  1.0                     NaN              1.0           0.0                      NaN

Last rows

       date       country   cumulative_total_cases  daily_new_cases  active_cases  cumulative_total_deaths  daily_new_deaths
95279  2021-4-15  Zimbabwe  37422.0                 53.0             911.0         1550.0                   2.0
95280  2021-4-16  Zimbabwe  37534.0                 112.0            1002.0        1551.0                   1.0
95281  2021-4-17  Zimbabwe  37699.0                 165.0            1142.0        1552.0                   1.0
95282  2021-4-18  Zimbabwe  37751.0                 52.0             1179.0        1553.0                   1.0
95283  2021-4-19  Zimbabwe  37859.0                 108.0            1275.0        1553.0                   0.0
95284  2021-4-20  Zimbabwe  37875.0                 16.0             1263.0        1554.0                   1.0
95285  2021-4-21  Zimbabwe  37980.0                 105.0            1360.0        1555.0                   1.0
95286  2021-4-22  Zimbabwe  38018.0                 38.0             1390.0        1555.0                   0.0
95287  2021-4-23  Zimbabwe  38045.0                 27.0             1395.0        1556.0                   1.0
95288  2021-4-24  Zimbabwe  38064.0                 19.0             1407.0        1556.0                   0.0
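
In the last rows, each daily_new_cases value equals the day-over-day increase in cumulative_total_cases. The sketch below checks this for the ten Zimbabwe rows shown above; whether the relationship holds across the whole dataset is an assumption this sample cannot confirm, and the helper name is illustrative.

```python
# Values copied from the "Last rows" sample (Zimbabwe, 2021-4-15 to 2021-4-24).
cumulative_total_cases = [37422.0, 37534.0, 37699.0, 37751.0, 37859.0,
                          37875.0, 37980.0, 38018.0, 38045.0, 38064.0]
daily_new_cases = [53.0, 112.0, 165.0, 52.0, 108.0, 16.0, 105.0, 38.0, 27.0, 19.0]


def first_differences(values):
    """Day-over-day change of a cumulative series."""
    return [later - earlier for earlier, later in zip(values, values[1:])]


diffs = first_differences(cumulative_total_cases)

# The first row has no predecessor in this sample, so compare from the second row on.
matches = diffs == daily_new_cases[1:]

# A negative difference would indicate a downward revision of the cumulative total.
negative_days = [i + 1 for i, d in enumerate(diffs) if d < 0]
print(matches, negative_days)
```

If the same relationship holds elsewhere, the negative minima reported for daily_new_cases and daily_new_deaths would correspond to days on which a cumulative total was revised downward.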