Ecommerce Data Profile Report

Pandas Profiling Report

Overview

Dataset statistics

Number of variables: 6
Number of observations: 656473
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 72.6 MiB
Average record size in memory: 116.0 B
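
The average record size appears to be the total memory footprint divided by the number of observations. The short check below uses only the figures from the table above and assumes 1 MiB = 1024² bytes; the variable names are illustrative.

```python
# Figures copied from the "Dataset statistics" table.
total_size_mib = 72.6
n_observations = 656473

# Assumption: MiB means 1024 * 1024 bytes.
total_bytes = total_size_mib * 1024 ** 2
avg_record_bytes = total_bytes / n_observations

print(f"{avg_record_bytes:.1f} B per record")
```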

Variable types

Numeric: 5
Categorical: 1

Alerts

ORDER_CREATION_DATE has a high cardinality: 600619 distinct values (High cardinality)
ORDER_PRICE is highly overall correlated with DISCOUNT and 1 other field (High correlation)
DISCOUNT is highly overall correlated with ORDER_PRICE and 1 other field (High correlation)
ORDER_PRICE_AFTER_DISCOUNT is highly overall correlated with ORDER_PRICE and 1 other field (High correlation)
ORDER_CREATION_DATE is uniformly distributed (Uniform)
ORDER_ID has unique values (Unique)
DISCOUNT has 328611 (50.1%) zeros (Zeros)
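
The percentages behind these alerts can be recomputed from the raw counts and the 656473 observations. This is a small sketch of that arithmetic only; it does not reproduce the thresholds the profiler uses to decide when an alert fires, and the dictionary keys are illustrative labels.

```python
n_rows = 656473

# Counts taken from the alerts and the per-variable summaries in this report.
counts = {
    "DISCOUNT zeros": 328611,
    "ORDER_CREATION_DATE distinct": 600619,
    "ORDER_ID distinct": 656473,
}

# Share of all rows, in percent.
shares = {name: 100 * count / n_rows for name, count in counts.items()}

for name, share in shares.items():
    print(f"{name}: {share:.1f}%")
```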

Reproduction

Analysis started: 2023-04-06 15:15:38.565323
Analysis finished: 2023-04-06 15:15:47.996260
Duration: 9.43 seconds
Software version: ydata-profiling v4.1.2
Download configuration: config.json
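
The reported duration follows directly from the two timestamps. A minimal check with the standard library, using the values exactly as printed above:

```python
from datetime import datetime

# Timestamps copied from the "Reproduction" section.
started = datetime.fromisoformat("2023-04-06 15:15:38.565323")
finished = datetime.fromisoformat("2023-04-06 15:15:47.996260")

duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")
```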

Variables

ORDER_ID
Real number (ℝ)

Distinct: 656473
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2095147.4
Minimum: 132
Maximum: 4698794
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 5.0 MiB

Quantile statistics

Minimum: 132
5-th percentile: 267792.6
Q1: 973787
median: 1948335
Q3: 3157082
95-th percentile: 4325716.2
Maximum: 4698794
Range: 4698662
Interquartile range (IQR): 2183295

Descriptive statistics

Standard deviation: 1291805.1
Coefficient of variation (CV): 0.61657003
Kurtosis: -1.0856383
Mean: 2095147.4
Median Absolute Deviation (MAD): 1065057
Skewness: 0.26530588
Sum: 1.3754077 × 10^12
Variance: 1.6687603 × 10^12
Monotonicity: Not monotonic
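
Several of these figures are derived from others, so they can be cross-checked. The sketch below recomputes range, IQR, coefficient of variation, variance and sum for ORDER_ID from the reported minimum, maximum, quartiles, mean and standard deviation. Because the inputs are rounded as printed, the comparisons use a relative tolerance.

```python
import math

# Values copied from the ORDER_ID summary.
n = 656473
mean, std = 2095147.4, 1291805.1
minimum, maximum = 132, 4698794
q1, q3 = 973787, 3157082

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range
cv = std / mean                   # Coefficient of variation
variance = std ** 2               # Variance
total = mean * n                  # Sum

print(value_range, iqr)
print(f"CV={cv:.8f} variance={variance:.7e} sum={total:.7e}")
```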

Histogram with fixed size bins (bins=50)

Common values

Value                  Count   Frequency (%)
2561005                1       < 0.1%
2144160                1       < 0.1%
1377808                1       < 0.1%
1377841                1       < 0.1%
1377846                1       < 0.1%
1377842                1       < 0.1%
1372715                1       < 0.1%
1372539                1       < 0.1%
1372581                1       < 0.1%
1373810                1       < 0.1%
Other values (656463)  656463  > 99.9%

Minimum 10 values

Value  Count  Frequency (%)
132    1      < 0.1%
159    1      < 0.1%
194    1      < 0.1%
212    1      < 0.1%
234    1      < 0.1%
262    1      < 0.1%
263    1      < 0.1%
282    1      < 0.1%
285    1      < 0.1%
290    1      < 0.1%

Maximum 10 values

Value    Count  Frequency (%)
4698794  1      < 0.1%
4698578  1      < 0.1%
4698437  1      < 0.1%
4698417  1      < 0.1%
4698415  1      < 0.1%
4698314  1      < 0.1%
4698300  1      < 0.1%
4698294  1      < 0.1%
4698264  1      < 0.1%
4698204  1      < 0.1%

MAIN_SYSTEM_ID
Real number (ℝ)

Distinct: 4821
Distinct (%): 0.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 43319.027
Minimum: 115
Maximum: 156199
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 5.0 MiB

Quantile statistics

Minimum: 115
5-th percentile: 5362
Q1: 17189
median: 32203
Q3: 62824
95-th percentile: 113801
Maximum: 156199
Range: 156084
Interquartile range (IQR): 45635

Descriptive statistics

Standard deviation: 34589.094
Coefficient of variation (CV): 0.79847347
Kurtosis: 0.74901653
Mean: 43319.027
Median Absolute Deviation (MAD): 18457
Skewness: 1.1596619
Sum: 2.8437772 × 10^10
Variance: 1.1964054 × 10^9
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value                Count   Frequency (%)
26004                1088    0.2%
20591                1073    0.2%
34956                1061    0.2%
40696                1058    0.2%
45995                1057    0.2%
20819                1009    0.2%
43865                919     0.1%
33971                917     0.1%
36935                898     0.1%
26188                876     0.1%
Other values (4811)  646517  98.5%

Minimum 10 values

Value  Count  Frequency (%)
115    53     < 0.1%
122    5      < 0.1%
193    15     < 0.1%
247    377    0.1%
248    46     < 0.1%
275    51     < 0.1%
296    4      < 0.1%
300    76     < 0.1%
306    417    0.1%
340    147    < 0.1%

Maximum 10 values

Value   Count  Frequency (%)
156199  98     < 0.1%
156063  22     < 0.1%
155983  22     < 0.1%
155952  31     < 0.1%
155936  23     < 0.1%
155917  36     < 0.1%
155894  81     < 0.1%
155803  32     < 0.1%
155673  14     < 0.1%
155667  73     < 0.1%

ORDER_PRICE
Real number (ℝ)

Distinct: 209964
Distinct (%): 32.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2681.1221
Minimum: 0
Maximum: 109444.29
Zeros: 7
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 5.0 MiB

Quantile statistics

Minimum: 0
5-th percentile: 246
Q1: 984.75
median: 1804.35
Q3: 3354.75
95-th percentile: 7832.19
Maximum: 109444.29
Range: 109444.29
Interquartile range (IQR): 2370

Descriptive statistics

Standard deviation: 2913.3652
Coefficient of variation (CV): 1.0866216
Kurtosis: 35.128816
Mean: 2681.1221
Median Absolute Deviation (MAD): 965.35
Skewness: 4.082226
Sum: 1.7600843 × 10^9
Variance: 8487697
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value                  Count   Frequency (%)
305                    666     0.1%
213.5                  568     0.1%
225                    521     0.1%
370                    467     0.1%
217                    452     0.1%
222                    427     0.1%
174                    411     0.1%
375                    402     0.1%
210                    398     0.1%
208.75                 379     0.1%
Other values (209954)  651782  99.3%

Minimum 10 values

Value   Count  Frequency (%)
0       7      < 0.1%
5.55    1      < 0.1%
7.75    1      < 0.1%
8.18    2      < 0.1%
8.5     1      < 0.1%
8.52    1      < 0.1%
10.5    1      < 0.1%
10.625  1      < 0.1%
10.84   1      < 0.1%
11      1      < 0.1%

Maximum 10 values

Value      Count  Frequency (%)
109444.29  1      < 0.1%
72223.45   1      < 0.1%
68004.5    1      < 0.1%
63193.75   1      < 0.1%
57956.5    1      < 0.1%
56419.62   1      < 0.1%
56245.3    1      < 0.1%
55547.75   1      < 0.1%
55323.5    1      < 0.1%
54911.25   1      < 0.1%

DISCOUNT
Real number (ℝ)

HIGH CORRELATION  ZEROS 

Distinct: 140779
Distinct (%): 21.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 27.176098
Minimum: 0
Maximum: 4499.9325
Zeros: 328611
Zeros (%): 50.1%
Negative: 0
Negative (%): 0.0%
Memory size: 5.0 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 37.585
95-th percentile: 113.282
Maximum: 4499.9325
Range: 4499.9325
Interquartile range (IQR): 37.585

Descriptive statistics

Standard deviation: 50.718178
Coefficient of variation (CV): 1.8662789
Kurtosis: 159.65226
Mean: 27.176098
Median Absolute Deviation (MAD): 0
Skewness: 6.3589622
Sum: 17840375
Variance: 2572.3336
Monotonicity: Not monotonic
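
A MAD of 0 next to a standard deviation of about 50.7 can look contradictory, but it follows from 50.1% of the values being zero: the median is 0, and more than half of the absolute deviations from it are 0 as well. The sketch below illustrates this on a small made-up list. It assumes the report's MAD is the unscaled median of absolute deviations from the median; `median_absolute_deviation` and `toy_discounts` are hypothetical names and data, not taken from the dataset.

```python
from statistics import median, pstdev

def median_absolute_deviation(values):
    """Unscaled MAD: median of |x - median(x)|."""
    values = list(values)
    center = median(values)
    return median(abs(v - center) for v in values)

# Hypothetical toy data: 6 of 10 values are zero, as in a zero-inflated column.
toy_discounts = [0, 0, 0, 0, 0, 0, 12.5, 20, 37.585, 113.282]

mad = median_absolute_deviation(toy_discounts)
spread = pstdev(toy_discounts)

print(f"MAD={mad}, standard deviation={spread:.2f}")
```

For zero-inflated columns like this one, the IQR or the standard deviation is a more informative measure of spread than the MAD.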

Histogram with fixed size bins (bins=50)

Common values

Value                  Count   Frequency (%)
0                      328611  50.1%
0.1                    3918    0.6%
20                     256     < 0.1%
15                     180     < 0.1%
30                     162     < 0.1%
25                     146     < 0.1%
45                     117     < 0.1%
35                     105     < 0.1%
40                     97      < 0.1%
12                     95      < 0.1%
Other values (140769)  322786  49.2%

Minimum 10 values

Value    Count   Frequency (%)
0        328611  50.1%
0.0002   1       < 0.1%
0.09     1       < 0.1%
0.1      3918    0.6%
0.11875  2       < 0.1%
0.12     3       < 0.1%
0.125    1       < 0.1%
0.13     1       < 0.1%
0.14     6       < 0.1%
0.14775  1       < 0.1%

Maximum 10 values

Value       Count  Frequency (%)
4499.9325   1      < 0.1%
2392.056    1      < 0.1%
1770        1      < 0.1%
1641.66     1      < 0.1%
1575        1      < 0.1%
1498.64     1      < 0.1%
1469.505    1      < 0.1%
1385.145    1      < 0.1%
1360.9605   1      < 0.1%
1343.23125  1      < 0.1%

ORDER_PRICE_AFTER_DISCOUNT
Real number (ℝ)

Distinct: 382213
Distinct (%): 58.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2652.7534
Minimum: -24
Maximum: 107802.63
Zeros: 166
Zeros (%): < 0.1%
Negative: 2
Negative (%): < 0.1%
Memory size: 5.0 MiB

Quantile statistics

Minimum: -24
5-th percentile: 238.25
Q1: 979.06125
median: 1791.5387
Q3: 3322.33
95-th percentile: 7740.8396
Maximum: 107802.63
Range: 107826.63
Interquartile range (IQR): 2343.2688

Descriptive statistics

Standard deviation: 2874.8466
Coefficient of variation (CV): 1.0837218
Kurtosis: 34.953531
Mean: 2652.7534
Median Absolute Deviation (MAD): 955.84875
Skewness: 4.0672273
Sum: 1.741461 × 10^9
Variance: 8264743.1
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value                  Count   Frequency (%)
305                    509     0.1%
213.5                  440     0.1%
225                    396     0.1%
370                    386     0.1%
222                    372     0.1%
217                    368     0.1%
174                    342     0.1%
208.75                 314     < 0.1%
210                    314     < 0.1%
1475                   304     < 0.1%
Other values (382203)  652728  99.4%

Minimum 10 values

Value       Count  Frequency (%)
-24         1      < 0.1%
-9          1      < 0.1%
0           166    < 0.1%
0.0864      1      < 0.1%
0.225       1      < 0.1%
0.34925     1      < 0.1%
0.4239      1      < 0.1%
0.5025      1      < 0.1%
0.8425      1      < 0.1%
1.33718735  1      < 0.1%

Maximum 10 values

Value        Count  Frequency (%)
107802.63    1      < 0.1%
72223.45     1      < 0.1%
68004.5      1      < 0.1%
61850.51875  1      < 0.1%
56878.11     1      < 0.1%
55807.452    1      < 0.1%
55677.4185   1      < 0.1%
55223.6      1      < 0.1%
54911.25     1      < 0.1%
54895.915    1      < 0.1%

ORDER_CREATION_DATE
Categorical

HIGH CARDINALITY  UNIFORM 

Distinct: 600619
Distinct (%): 91.5%
Missing: 0
Missing (%): 0.0%
Memory size: 47.6 MiB

Value                  Count
2022-08-05T15:36:39    7
2022-07-30T19:05:04    6
2022-03-06T20:18:24    6
2022-07-04T14:38:57    6
2022-10-20T18:48:58    6
Other values (600614)  656442

Length

Max length: 19
Median length: 19
Mean length: 18.999918
Min length: 10

Characters and Unicode

Total characters: 12472933
Distinct characters: 13
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
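
The length and character totals above suggest that ORDER_CREATION_DATE is a timestamp stored as text, with a handful of values shorter than the usual 19 characters. The sketch below works that out from the reported totals and then parses both shapes with the standard library. It assumes every short value has exactly the minimum length of 10 (a date-only `YYYY-MM-DD` string); the date-only example and the helper name `parse_order_date` are hypothetical, since the report does not list the short values.

```python
from datetime import datetime

# Figures copied from the "Length" and "Characters and Unicode" sections.
n_rows = 656473
total_chars = 12472933
max_len, min_len = 19, 10

# Characters "missing" relative to every value having the full 19 characters.
missing_chars = n_rows * max_len - total_chars
# Assumption: all short values have length 10, so each lacks 9 characters.
short_rows = missing_chars // (max_len - min_len)
mean_len = total_chars / n_rows

def parse_order_date(text):
    """Parse 'YYYY-MM-DDTHH:MM:SS' or date-only 'YYYY-MM-DD' strings."""
    return datetime.fromisoformat(text)

print(missing_chars, short_rows, f"{mean_len:.6f}")
print(parse_order_date("2022-05-11T18:49:20"))  # 1st sample row of the report
print(parse_order_date("2022-05-11"))           # hypothetical date-only value
```

Converting the column to a datetime type before profiling would likely avoid the High cardinality and Uniform alerts, which arise here because the timestamps are treated as categories.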

Unique

Unique: 553607
Unique (%): 84.3%

Sample

1st row: 2022-05-11T18:49:20
2nd row: 2021-04-05T12:33:12
3rd row: 2022-04-26T18:14:48
4th row: 2022-04-26T18:16:59
5th row: 2022-01-30T22:38:03

Common Values

Value                  Count   Frequency (%)
2022-08-05T15:36:39    7       < 0.1%
2022-07-30T19:05:04    6       < 0.1%
2022-03-06T20:18:24    6       < 0.1%
2022-07-04T14:38:57    6       < 0.1%
2022-10-20T18:48:58    6       < 0.1%
2022-08-10T20:23:31    6       < 0.1%
2022-09-29T19:51:46    6       < 0.1%
2022-03-20T19:48:20    5       < 0.1%
2022-09-23T17:39:29    5       < 0.1%
2022-08-17T20:21:23    5       < 0.1%
Other values (600609)  656415  > 99.9%

Length

Histogram of lengths of the category

Words

Value                  Count   Frequency (%)
2022-08-05t15:36:39    7       < 0.1%
2022-03-06t20:18:24    6       < 0.1%
2022-07-04t14:38:57    6       < 0.1%
2022-10-20t18:48:58    6       < 0.1%
2022-08-10t20:23:31    6       < 0.1%
2022-09-29t19:51:46    6       < 0.1%
2022-07-30t19:05:04    6       < 0.1%
2022-08-30t19:10:03    5       < 0.1%
2022-06-18t20:33:43    5       < 0.1%
2022-03-02t16:26:33    5       < 0.1%
Other values (600609)  656415  > 99.9%

Most occurring characters

Value             Count    Frequency (%)
2                 2577947  20.7%
0                 2080689  16.7%
1                 1648014  13.2%
-                 1312946  10.5%
:                 1312934  10.5%
T                 656467   5.3%
3                 606919   4.9%
5                 517020   4.1%
4                 516163   4.1%
9                 326684   2.6%
Other values (3)  917150   7.4%

Most occurring categories

Value              Count    Frequency (%)
Decimal Number     9190586  73.7%
Dash Punctuation   1312946  10.5%
Other Punctuation  1312934  10.5%
Uppercase Letter   656467   5.3%

Most frequent character per category

Decimal Number
Value  Count    Frequency (%)
2      2577947  28.0%
0      2080689  22.6%
1      1648014  17.9%
3      606919   6.6%
5      517020   5.6%
4      516163   5.6%
9      326684   3.6%
8      312693   3.4%
7      303177   3.3%
6      301280   3.3%

Dash Punctuation
Value  Count    Frequency (%)
-      1312946  100.0%

Other Punctuation
Value  Count    Frequency (%)
:      1312934  100.0%

Uppercase Letter
Value  Count   Frequency (%)
T      656467  100.0%

Most occurring scripts

Value   Count     Frequency (%)
Common  11816466  94.7%
Latin   656467    5.3%

Most frequent character per script

Common
Value             Count    Frequency (%)
2                 2577947  21.8%
0                 2080689  17.6%
1                 1648014  13.9%
-                 1312946  11.1%
:                 1312934  11.1%
3                 606919   5.1%
5                 517020   4.4%
4                 516163   4.4%
9                 326684   2.8%
8                 312693   2.6%
Other values (2)  604457   5.1%

Latin
Value  Count   Frequency (%)
T      656467  100.0%

Most occurring blocks

Value  Count     Frequency (%)
ASCII  12472933  100.0%

Most frequent character per block

ASCII
Value             Count    Frequency (%)
2                 2577947  20.7%
0                 2080689  16.7%
1                 1648014  13.2%
-                 1312946  10.5%
:                 1312934  10.5%
T                 656467   5.3%
3                 606919   4.9%
5                 517020   4.1%
4                 516163   4.1%
9                 326684   2.6%
Other values (3)  917150   7.4%

Correlations

Variable                    ORDER_ID  MAIN_SYSTEM_ID  ORDER_PRICE  DISCOUNT  ORDER_PRICE_AFTER_DISCOUNT
ORDER_ID                    1.000     0.211           -0.132       -0.214    -0.130
MAIN_SYSTEM_ID              0.211     1.000           -0.103       -0.129    -0.102
ORDER_PRICE                 -0.132    -0.103          1.000        0.560     1.000
DISCOUNT                    -0.214    -0.129          0.560        1.000     0.550
ORDER_PRICE_AFTER_DISCOUNT  -0.130    -0.102          1.000        0.550     1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows

Row     ORDER_ID  MAIN_SYSTEM_ID  ORDER_PRICE   DISCOUNT  ORDER_PRICE_AFTER_DISCOUNT  ORDER_CREATION_DATE
0       2561005   40591           415.250000    0.000     415.250000                  2022-05-11T18:49:20
1       854370    51636           2533.700000   39.444    2494.256000                 2021-04-05T12:33:12
2       2465225   47135           68.800000     0.000     68.800000                   2022-04-26T18:14:48
3       2465240   57031           2140.000000   32.100    2107.900000                 2022-04-26T18:16:59
4       1897557   26695           2186.416667   0.000     2186.416667                 2022-01-30T22:38:03
5       1898569   29923           5230.816667   0.000     5230.816667                 2022-01-31T06:23:59
6       1898970   12268           3128.250000   0.000     3128.250000                 2022-01-31T11:27:46
7       1928233   20521           2250.000000   0.000     2250.000000                 2022-02-05T17:10:54
8       1925730   66563           15041.750000  0.000     15041.750000                2022-02-05T08:00:30
9       1925729   66563           4125.700000   360.097   3765.603000                 2022-02-05T08:00:30

Last rows

Row     ORDER_ID  MAIN_SYSTEM_ID  ORDER_PRICE  DISCOUNT  ORDER_PRICE_AFTER_DISCOUNT  ORDER_CREATION_DATE
656463  2753109   69765           2073.250     31.09875  2042.15125                  2022-06-10T22:39:21
656464  2737963   22807           305.000      0.00000   305.00000                   2022-06-08T00:40:39
656465  2737964   22807           34.750       0.00000   34.75000                    2022-06-08T00:40:39
656466  2737962   22807           1846.500     26.14875  1820.35125                  2022-06-08T00:40:39
656467  2737961   155328          4013.250     0.00000   4013.25000                  2022-06-08T00:40:19
656468  2737969   81727           847.250      0.00000   847.25000                   2022-06-08T00:41:01
656469  2737739   6318            2034.500     26.70750  2007.79250                  2022-06-07T23:53:02
656470  2737985   33971           3051.775     0.00000   3051.77500                  2022-06-08T00:44:16
656471  2764885   13185           3368.200     31.07250  3337.12750                  2022-06-12T18:47:40
656472  2770381   17533           1815.000     17.64000  1797.36000                  2022-06-13T16:24:37
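
In every sample row that carries a discount, ORDER_PRICE_AFTER_DISCOUNT equals ORDER_PRICE minus DISCOUNT, which would explain the 1.000 correlation between the two price columns. The report does not state this relationship, so the sketch below only checks it on rows copied from the sample; whether it holds for the full dataset (including the two negative values) is an open question.

```python
import math

# (ORDER_PRICE, DISCOUNT, ORDER_PRICE_AFTER_DISCOUNT) copied from the sample rows.
sample_rows = [
    (2533.7, 39.444, 2494.256),       # row 1
    (2140.0, 32.1, 2107.9),           # row 3
    (4125.7, 360.097, 3765.603),      # row 9
    (2073.25, 31.09875, 2042.15125),  # row 656463
    (1815.0, 17.64, 1797.36),         # row 656472
]

# Compare with a small absolute tolerance to absorb floating-point error.
mismatches = [
    row for row in sample_rows
    if not math.isclose(row[0] - row[1], row[2], abs_tol=1e-6)
]
all_consistent = not mismatches

print("all sample rows consistent:", all_consistent)
```

Running the same comparison over the whole table, and listing the rows where it fails or where the result is negative, would be a natural follow-up data-quality check.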