Ecommerce Data Profile Report
Dataset statistics
| Number of variables | 6 |
|---|---|
| Number of observations | 656473 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 72.6 MiB |
| Average record size in memory | 116.0 B |
Variable types
| Numeric | 5 |
|---|---|
| Categorical | 1 |
Alerts
| Alert | Type |
|---|---|
| ORDER_CREATION_DATE has a high cardinality: 600619 distinct values | High cardinality |
| ORDER_PRICE is highly overall correlated with DISCOUNT and 1 other field | High correlation |
| DISCOUNT is highly overall correlated with ORDER_PRICE and 1 other field | High correlation |
| ORDER_PRICE_AFTER_DISCOUNT is highly overall correlated with ORDER_PRICE and 1 other field | High correlation |
| ORDER_CREATION_DATE is uniformly distributed | Uniform |
| ORDER_ID has unique values | Unique |
| DISCOUNT has 328611 (50.1%) zeros | Zeros |
Reproduction
| Analysis started | 2023-04-06 15:15:38.565323 |
|---|---|
| Analysis finished | 2023-04-06 15:15:47.996260 |
| Duration | 9.43 seconds |
| Software version | ydata-profiling v4.1.2 |
| Download configuration | config.json |
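A report like this one can be regenerated with ydata-profiling. The following is a minimal sketch, assuming the orders are available as a CSV file; the function name `profile_orders`, the input path, and the output file name are hypothetical and do not come from this report.

```python
def profile_orders(csv_path, out_path="ecommerce_profile.html"):
    """Build an HTML profile for an orders CSV (paths are hypothetical)."""
    # Imported lazily so the module loads even where the packages are absent.
    import pandas as pd
    from ydata_profiling import ProfileReport

    df = pd.read_csv(csv_path)
    report = ProfileReport(df, title="Ecommerce Data Profile Report")
    report.to_file(out_path)  # writes the HTML report to disk
    return out_path
```

Calling `profile_orders("orders.csv")` would write the HTML file and return its path. Timings and memory figures will differ from the values in the Reproduction table.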
ORDER_ID
Real number (ℝ)
| Distinct | 656473 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2095147.4 |
| Minimum | 132 |
| Maximum | 4698794 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 5.0 MiB |
Quantile statistics
| Minimum | 132 |
|---|---|
| 5-th percentile | 267792.6 |
| Q1 | 973787 |
| median | 1948335 |
| Q3 | 3157082 |
| 95-th percentile | 4325716.2 |
| Maximum | 4698794 |
| Range | 4698662 |
| Interquartile range (IQR) | 2183295 |
Descriptive statistics
| Standard deviation | 1291805.1 |
|---|---|
| Coefficient of variation (CV) | 0.61657003 |
| Kurtosis | -1.0856383 |
| Mean | 2095147.4 |
| Median Absolute Deviation (MAD) | 1065057 |
| Skewness | 0.26530588 |
| Sum | 1.3754077 × 10^12 |
| Variance | 1.6687603 × 10^12 |
| Monotonicity | Not monotonic |
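Several of the ORDER_ID figures above are derived from others in the same tables. As a quick sanity check, the sketch below recomputes the range, the interquartile range, and the coefficient of variation from the reported minimum, maximum, quartiles, mean, and standard deviation. The variable names are illustrative only.

```python
# Summary values copied from the ORDER_ID tables above.
minimum, maximum = 132, 4698794
q1, q3 = 973787, 3157082
mean, std = 2095147.4, 1291805.1

value_range = maximum - minimum  # spread between the extremes
iqr = q3 - q1                    # spread of the middle 50% of values
cv = std / mean                  # standard deviation relative to the mean

print(value_range, iqr, round(cv, 4))
```

The recomputed values agree with the Range, Interquartile range (IQR), and Coefficient of variation (CV) rows, up to the rounding of the inputs.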
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 2561005 | 1 | < 0.1% |
| 2144160 | 1 | < 0.1% |
| 1377808 | 1 | < 0.1% |
| 1377841 | 1 | < 0.1% |
| 1377846 | 1 | < 0.1% |
| 1377842 | 1 | < 0.1% |
| 1372715 | 1 | < 0.1% |
| 1372539 | 1 | < 0.1% |
| 1372581 | 1 | < 0.1% |
| 1373810 | 1 | < 0.1% |
| Other values (656463) | 656463 | > 99.9% |
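Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 132 | 1 | < 0.1% |
| 159 | 1 | < 0.1% |
| 194 | 1 | < 0.1% |
| 212 | 1 | < 0.1% |
| 234 | 1 | < 0.1% |
| 262 | 1 | < 0.1% |
| 263 | 1 | < 0.1% |
| 282 | 1 | < 0.1% |
| 285 | 1 | < 0.1% |
| 290 | 1 | < 0.1% |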
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 4698794 | 1 | < 0.1% |
| 4698578 | 1 | < 0.1% |
| 4698437 | 1 | < 0.1% |
| 4698417 | 1 | < 0.1% |
| 4698415 | 1 | < 0.1% |
| 4698314 | 1 | < 0.1% |
| 4698300 | 1 | < 0.1% |
| 4698294 | 1 | < 0.1% |
| 4698264 | 1 | < 0.1% |
| 4698204 | 1 | < 0.1% |
MAIN_SYSTEM_ID
Real number (ℝ)
| Distinct | 4821 |
|---|---|
| Distinct (%) | 0.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 43319.027 |
| Minimum | 115 |
| Maximum | 156199 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 5.0 MiB |
Quantile statistics
| Minimum | 115 |
|---|---|
| 5-th percentile | 5362 |
| Q1 | 17189 |
| median | 32203 |
| Q3 | 62824 |
| 95-th percentile | 113801 |
| Maximum | 156199 |
| Range | 156084 |
| Interquartile range (IQR) | 45635 |
Descriptive statistics
| Standard deviation | 34589.094 |
|---|---|
| Coefficient of variation (CV) | 0.79847347 |
| Kurtosis | 0.74901653 |
| Mean | 43319.027 |
| Median Absolute Deviation (MAD) | 18457 |
| Skewness | 1.1596619 |
| Sum | 2.8437772 × 10^10 |
| Variance | 1.1964054 × 10^9 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 26004 | 1088 | 0.2% |
| 20591 | 1073 | 0.2% |
| 34956 | 1061 | 0.2% |
| 40696 | 1058 | 0.2% |
| 45995 | 1057 | 0.2% |
| 20819 | 1009 | 0.2% |
| 43865 | 919 | 0.1% |
| 33971 | 917 | 0.1% |
| 36935 | 898 | 0.1% |
| 26188 | 876 | 0.1% |
| Other values (4811) | 646517 | 98.5% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 115 | 53 | < 0.1% |
| 122 | 5 | < 0.1% |
| 193 | 15 | < 0.1% |
| 247 | 377 | 0.1% |
| 248 | 46 | < 0.1% |
| 275 | 51 | < 0.1% |
| 296 | 4 | < 0.1% |
| 300 | 76 | < 0.1% |
| 306 | 417 | 0.1% |
| 340 | 147 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 156199 | 98 | < 0.1% |
| 156063 | 22 | < 0.1% |
| 155983 | 22 | < 0.1% |
| 155952 | 31 | < 0.1% |
| 155936 | 23 | < 0.1% |
| 155917 | 36 | < 0.1% |
| 155894 | 81 | < 0.1% |
| 155803 | 32 | < 0.1% |
| 155673 | 14 | < 0.1% |
| 155667 | 73 | < 0.1% |
ORDER_PRICE
Real number (ℝ)
| Distinct | 209964 |
|---|---|
| Distinct (%) | 32.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2681.1221 |
| Minimum | 0 |
| Maximum | 109444.29 |
| Zeros | 7 |
| Zeros (%) | < 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 5.0 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 246 |
| Q1 | 984.75 |
| median | 1804.35 |
| Q3 | 3354.75 |
| 95-th percentile | 7832.19 |
| Maximum | 109444.29 |
| Range | 109444.29 |
| Interquartile range (IQR) | 2370 |
Descriptive statistics
| Standard deviation | 2913.3652 |
|---|---|
| Coefficient of variation (CV) | 1.0866216 |
| Kurtosis | 35.128816 |
| Mean | 2681.1221 |
| Median Absolute Deviation (MAD) | 965.35 |
| Skewness | 4.082226 |
| Sum | 1.7600843 × 10^9 |
| Variance | 8487697 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 305 | 666 | 0.1% |
| 213.5 | 568 | 0.1% |
| 225 | 521 | 0.1% |
| 370 | 467 | 0.1% |
| 217 | 452 | 0.1% |
| 222 | 427 | 0.1% |
| 174 | 411 | 0.1% |
| 375 | 402 | 0.1% |
| 210 | 398 | 0.1% |
| 208.75 | 379 | 0.1% |
| Other values (209954) | 651782 | 99.3% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 7 | < 0.1% |
| 5.55 | 1 | < 0.1% |
| 7.75 | 1 | < 0.1% |
| 8.18 | 2 | < 0.1% |
| 8.5 | 1 | < 0.1% |
| 8.52 | 1 | < 0.1% |
| 10.5 | 1 | < 0.1% |
| 10.625 | 1 | < 0.1% |
| 10.84 | 1 | < 0.1% |
| 11 | 1 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 109444.29 | 1 | < 0.1% |
| 72223.45 | 1 | < 0.1% |
| 68004.5 | 1 | < 0.1% |
| 63193.75 | 1 | < 0.1% |
| 57956.5 | 1 | < 0.1% |
| 56419.62 | 1 | < 0.1% |
| 56245.3 | 1 | < 0.1% |
| 55547.75 | 1 | < 0.1% |
| 55323.5 | 1 | < 0.1% |
| 54911.25 | 1 | < 0.1% |
DISCOUNT
Real number (ℝ)
HIGH CORRELATION  ZEROS 
| Distinct | 140779 |
|---|---|
| Distinct (%) | 21.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 27.176098 |
| Minimum | 0 |
| Maximum | 4499.9325 |
| Zeros | 328611 |
| Zeros (%) | 50.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 5.0 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 37.585 |
| 95-th percentile | 113.282 |
| Maximum | 4499.9325 |
| Range | 4499.9325 |
| Interquartile range (IQR) | 37.585 |
Descriptive statistics
| Standard deviation | 50.718178 |
|---|---|
| Coefficient of variation (CV) | 1.8662789 |
| Kurtosis | 159.65226 |
| Mean | 27.176098 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 6.3589622 |
| Sum | 17840375 |
| Variance | 2572.3336 |
| Monotonicity | Not monotonic |
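The Median Absolute Deviation of 0 follows from the zero count: with 50.1% of DISCOUNT values equal to zero, the median is 0, and more than half of the absolute deviations from that median are also 0. The sketch below illustrates the effect on a small toy sample. The sample is made up for illustration and is not drawn from the dataset, and `mad` is a hypothetical helper.

```python
from statistics import median

def mad(values):
    """Median absolute deviation: the median of |x - median(x)|."""
    values = list(values)
    center = median(values)
    return median(abs(v - center) for v in values)

# Toy sample (not from the dataset): more than half of the values are zero.
sample = [0, 0, 0, 0, 0, 0, 12.5, 37.585, 113.282, 4499.9325]

print(median(sample), mad(sample))
```

Because MAD collapses to zero here, the standard deviation and the upper quantiles are more informative measures of spread for this column.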
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 328611 | 50.1% |
| 0.1 | 3918 | 0.6% |
| 20 | 256 | < 0.1% |
| 15 | 180 | < 0.1% |
| 30 | 162 | < 0.1% |
| 25 | 146 | < 0.1% |
| 45 | 117 | < 0.1% |
| 35 | 105 | < 0.1% |
| 40 | 97 | < 0.1% |
| 12 | 95 | < 0.1% |
| Other values (140769) | 322786 | 49.2% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 328611 | 50.1% |
| 0.0002 | 1 | < 0.1% |
| 0.09 | 1 | < 0.1% |
| 0.1 | 3918 | 0.6% |
| 0.11875 | 2 | < 0.1% |
| 0.12 | 3 | < 0.1% |
| 0.125 | 1 | < 0.1% |
| 0.13 | 1 | < 0.1% |
| 0.14 | 6 | < 0.1% |
| 0.14775 | 1 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 4499.9325 | 1 | < 0.1% |
| 2392.056 | 1 | < 0.1% |
| 1770 | 1 | < 0.1% |
| 1641.66 | 1 | < 0.1% |
| 1575 | 1 | < 0.1% |
| 1498.64 | 1 | < 0.1% |
| 1469.505 | 1 | < 0.1% |
| 1385.145 | 1 | < 0.1% |
| 1360.9605 | 1 | < 0.1% |
| 1343.23125 | 1 | < 0.1% |
ORDER_PRICE_AFTER_DISCOUNT
Real number (ℝ)
| Distinct | 382213 |
|---|---|
| Distinct (%) | 58.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2652.7534 |
| Minimum | -24 |
| Maximum | 107802.63 |
| Zeros | 166 |
| Zeros (%) | < 0.1% |
| Negative | 2 |
| Negative (%) | < 0.1% |
| Memory size | 5.0 MiB |
Quantile statistics
| Minimum | -24 |
|---|---|
| 5-th percentile | 238.25 |
| Q1 | 979.06125 |
| median | 1791.5387 |
| Q3 | 3322.33 |
| 95-th percentile | 7740.8396 |
| Maximum | 107802.63 |
| Range | 107826.63 |
| Interquartile range (IQR) | 2343.2688 |
Descriptive statistics
| Standard deviation | 2874.8466 |
|---|---|
| Coefficient of variation (CV) | 1.0837218 |
| Kurtosis | 34.953531 |
| Mean | 2652.7534 |
| Median Absolute Deviation (MAD) | 955.84875 |
| Skewness | 4.0672273 |
| Sum | 1.741461 × 10^9 |
| Variance | 8264743.1 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 305 | 509 | 0.1% |
| 213.5 | 440 | 0.1% |
| 225 | 396 | 0.1% |
| 370 | 386 | 0.1% |
| 222 | 372 | 0.1% |
| 217 | 368 | 0.1% |
| 174 | 342 | 0.1% |
| 208.75 | 314 | < 0.1% |
| 210 | 314 | < 0.1% |
| 1475 | 304 | < 0.1% |
| Other values (382203) | 652728 | 99.4% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| -24 | 1 | < 0.1% |
| -9 | 1 | < 0.1% |
| 0 | 166 | < 0.1% |
| 0.0864 | 1 | < 0.1% |
| 0.225 | 1 | < 0.1% |
| 0.34925 | 1 | < 0.1% |
| 0.4239 | 1 | < 0.1% |
| 0.5025 | 1 | < 0.1% |
| 0.8425 | 1 | < 0.1% |
| 1.33718735 | 1 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 107802.63 | 1 | < 0.1% |
| 72223.45 | 1 | < 0.1% |
| 68004.5 | 1 | < 0.1% |
| 61850.51875 | 1 | < 0.1% |
| 56878.11 | 1 | < 0.1% |
| 55807.452 | 1 | < 0.1% |
| 55677.4185 | 1 | < 0.1% |
| 55223.6 | 1 | < 0.1% |
| 54911.25 | 1 | < 0.1% |
| 54895.915 | 1 | < 0.1% |
ORDER_CREATION_DATE
Categorical
HIGH CARDINALITY  UNIFORM 
| Distinct | 600619 |
|---|---|
| Distinct (%) | 91.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 47.6 MiB |
| Value | Count |
|---|---|
| 2022-08-05T15:36:39 | 7 |
| 2022-07-30T19:05:04 | 6 |
| 2022-03-06T20:18:24 | 6 |
| 2022-07-04T14:38:57 | 6 |
| 2022-10-20T18:48:58 | 6 |
| Other values (600614) | 656442 |
Length
| Max length | 19 |
|---|---|
| Median length | 19 |
| Mean length | 18.999918 |
| Min length | 10 |
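The length statistics hint at two string formats in ORDER_CREATION_DATE. A maximum length of 19 matches `YYYY-MM-DDTHH:MM:SS`, while a minimum length of 10 suggests that a few values are bare dates. The sketch below checks this reading against the character counts reported further down (656467 occurrences of `T` and 12472933 characters in total) and shows one way to parse both forms. The date-only example string and the helper `parse_order_date` are hypothetical.

```python
from datetime import datetime

rows = 656473          # observations reported in the dataset statistics
with_t = 656467        # occurrences of "T" in the character counts
short = rows - with_t  # values that appear to lack a time component

# If every short value is a bare 10-character date, the totals should match.
total_chars = with_t * 19 + short * 10
mean_length = total_chars / rows

def parse_order_date(text):
    """Parse 'YYYY-MM-DD' or 'YYYY-MM-DDTHH:MM:SS' into a datetime."""
    return datetime.fromisoformat(text)

print(short, total_chars, round(mean_length, 6))
print(parse_order_date("2022-05-11T18:49:20"))
print(parse_order_date("2022-05-11"))  # hypothetical date-only value
```

Under this assumption the computed total and mean length agree with the reported figures, so converting the column to a datetime type before profiling would likely remove the high-cardinality categorical treatment.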
Characters and Unicode
| Total characters | 12472933 |
|---|---|
| Distinct characters | 13 |
| Distinct categories | 4 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 553607 |
|---|---|
| Unique (%) | 84.3% |
Sample
| 1st row | 2022-05-11T18:49:20 |
|---|---|
| 2nd row | 2021-04-05T12:33:12 |
| 3rd row | 2022-04-26T18:14:48 |
| 4th row | 2022-04-26T18:16:59 |
| 5th row | 2022-01-30T22:38:03 |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 2022-08-05T15:36:39 | 7 | < 0.1% |
| 2022-07-30T19:05:04 | 6 | < 0.1% |
| 2022-03-06T20:18:24 | 6 | < 0.1% |
| 2022-07-04T14:38:57 | 6 | < 0.1% |
| 2022-10-20T18:48:58 | 6 | < 0.1% |
| 2022-08-10T20:23:31 | 6 | < 0.1% |
| 2022-09-29T19:51:46 | 6 | < 0.1% |
| 2022-03-20T19:48:20 | 5 | < 0.1% |
| 2022-09-23T17:39:29 | 5 | < 0.1% |
| 2022-08-17T20:21:23 | 5 | < 0.1% |
| Other values (600609) | 656415 | > 99.9% |
Length
Histogram of lengths of the category
Words
| Value | Count | Frequency (%) |
|---|---|---|
| 2022-08-05t15:36:39 | 7 | < 0.1% |
| 2022-03-06t20:18:24 | 6 | < 0.1% |
| 2022-07-04t14:38:57 | 6 | < 0.1% |
| 2022-10-20t18:48:58 | 6 | < 0.1% |
| 2022-08-10t20:23:31 | 6 | < 0.1% |
| 2022-09-29t19:51:46 | 6 | < 0.1% |
| 2022-07-30t19:05:04 | 6 | < 0.1% |
| 2022-08-30t19:10:03 | 5 | < 0.1% |
| 2022-06-18t20:33:43 | 5 | < 0.1% |
| 2022-03-02t16:26:33 | 5 | < 0.1% |
| Other values (600609) | 656415 | > 99.9% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| 2 | 2577947 | 20.7% |
| 0 | 2080689 | 16.7% |
| 1 | 1648014 | 13.2% |
| - | 1312946 | 10.5% |
| : | 1312934 | 10.5% |
| T | 656467 | 5.3% |
| 3 | 606919 | 4.9% |
| 5 | 517020 | 4.1% |
| 4 | 516163 | 4.1% |
| 9 | 326684 | 2.6% |
| Other values (3) | 917150 | 7.4% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Decimal Number | 9190586 | 73.7% |
| Dash Punctuation | 1312946 | 10.5% |
| Other Punctuation | 1312934 | 10.5% |
| Uppercase Letter | 656467 | 5.3% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
|---|---|---|
| 2 | 2577947 | 28.0% |
| 0 | 2080689 | 22.6% |
| 1 | 1648014 | 17.9% |
| 3 | 606919 | 6.6% |
| 5 | 517020 | 5.6% |
| 4 | 516163 | 5.6% |
| 9 | 326684 | 3.6% |
| 8 | 312693 | 3.4% |
| 7 | 303177 | 3.3% |
| 6 | 301280 | 3.3% |
Dash Punctuation
| Value | Count | Frequency (%) |
|---|---|---|
| - | 1312946 | 100.0% |
Other Punctuation
| Value | Count | Frequency (%) |
|---|---|---|
| : | 1312934 | 100.0% |
Uppercase Letter
| Value | Count | Frequency (%) |
|---|---|---|
| T | 656467 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Common | 11816466 | 94.7% |
| Latin | 656467 | 5.3% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
|---|---|---|
| 2 | 2577947 | 21.8% |
| 0 | 2080689 | 17.6% |
| 1 | 1648014 | 13.9% |
| - | 1312946 | 11.1% |
| : | 1312934 | 11.1% |
| 3 | 606919 | 5.1% |
| 5 | 517020 | 4.4% |
| 4 | 516163 | 4.4% |
| 9 | 326684 | 2.8% |
| 8 | 312693 | 2.6% |
| Other values (2) | 604457 | 5.1% |
Latin
| Value | Count | Frequency (%) |
|---|---|---|
| T | 656467 | 100.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 12472933 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
|---|---|---|
| 2 | 2577947 | 20.7% |
| 0 | 2080689 | 16.7% |
| 1 | 1648014 | 13.2% |
| - | 1312946 | 10.5% |
| : | 1312934 | 10.5% |
| T | 656467 | 5.3% |
| 3 | 606919 | 4.9% |
| 5 | 517020 | 4.1% |
| 4 | 516163 | 4.1% |
| 9 | 326684 | 2.6% |
| Other values (3) | 917150 | 7.4% |
Correlations
| | ORDER_ID | MAIN_SYSTEM_ID | ORDER_PRICE | DISCOUNT | ORDER_PRICE_AFTER_DISCOUNT |
|---|---|---|---|---|---|
| ORDER_ID | 1.000 | 0.211 | -0.132 | -0.214 | -0.130 |
| MAIN_SYSTEM_ID | 0.211 | 1.000 | -0.103 | -0.129 | -0.102 |
| ORDER_PRICE | -0.132 | -0.103 | 1.000 | 0.560 | 1.000 |
| DISCOUNT | -0.214 | -0.129 | 0.560 | 1.000 | 0.550 |
| ORDER_PRICE_AFTER_DISCOUNT | -0.130 | -0.102 | 1.000 | 0.550 | 1.000 |
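The coefficient of 1.000 between ORDER_PRICE and ORDER_PRICE_AFTER_DISCOUNT is plausible because discounts are small relative to prices (mean discount 27.18 against a mean price of 2681.12). As an illustration only, the sketch below computes a Pearson coefficient on the ten first rows listed in this report. Pearson is an assumption made here for simplicity; the report does not state which coefficient the matrix uses, and ten rows are not representative of the full dataset.

```python
from math import sqrt

# ORDER_PRICE and ORDER_PRICE_AFTER_DISCOUNT from the first ten rows of the report.
price = [415.25, 2533.7, 68.8, 2140.0, 2186.416667, 5230.816667,
         3128.25, 2250.0, 15041.75, 4125.7]
after = [415.25, 2494.256, 68.8, 2107.9, 2186.416667, 5230.816667,
         3128.25, 2250.0, 15041.75, 3765.603]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

r = pearson(price, after)
print(round(r, 3))
```

Since one column is almost a copy of the other, keeping both in a downstream model would add little information.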
A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
First rows
| | ORDER_ID | MAIN_SYSTEM_ID | ORDER_PRICE | DISCOUNT | ORDER_PRICE_AFTER_DISCOUNT | ORDER_CREATION_DATE |
|---|---|---|---|---|---|---|
| 0 | 2561005 | 40591 | 415.250000 | 0.000 | 415.250000 | 2022-05-11T18:49:20 |
| 1 | 854370 | 51636 | 2533.700000 | 39.444 | 2494.256000 | 2021-04-05T12:33:12 |
| 2 | 2465225 | 47135 | 68.800000 | 0.000 | 68.800000 | 2022-04-26T18:14:48 |
| 3 | 2465240 | 57031 | 2140.000000 | 32.100 | 2107.900000 | 2022-04-26T18:16:59 |
| 4 | 1897557 | 26695 | 2186.416667 | 0.000 | 2186.416667 | 2022-01-30T22:38:03 |
| 5 | 1898569 | 29923 | 5230.816667 | 0.000 | 5230.816667 | 2022-01-31T06:23:59 |
| 6 | 1898970 | 12268 | 3128.250000 | 0.000 | 3128.250000 | 2022-01-31T11:27:46 |
| 7 | 1928233 | 20521 | 2250.000000 | 0.000 | 2250.000000 | 2022-02-05T17:10:54 |
| 8 | 1925730 | 66563 | 15041.750000 | 0.000 | 15041.750000 | 2022-02-05T08:00:30 |
| 9 | 1925729 | 66563 | 4125.700000 | 360.097 | 3765.603000 | 2022-02-05T08:00:30 |
Last rows
| | ORDER_ID | MAIN_SYSTEM_ID | ORDER_PRICE | DISCOUNT | ORDER_PRICE_AFTER_DISCOUNT | ORDER_CREATION_DATE |
|---|---|---|---|---|---|---|
| 656463 | 2753109 | 69765 | 2073.250 | 31.09875 | 2042.15125 | 2022-06-10T22:39:21 |
| 656464 | 2737963 | 22807 | 305.000 | 0.00000 | 305.00000 | 2022-06-08T00:40:39 |
| 656465 | 2737964 | 22807 | 34.750 | 0.00000 | 34.75000 | 2022-06-08T00:40:39 |
| 656466 | 2737962 | 22807 | 1846.500 | 26.14875 | 1820.35125 | 2022-06-08T00:40:39 |
| 656467 | 2737961 | 155328 | 4013.250 | 0.00000 | 4013.25000 | 2022-06-08T00:40:19 |
| 656468 | 2737969 | 81727 | 847.250 | 0.00000 | 847.25000 | 2022-06-08T00:41:01 |
| 656469 | 2737739 | 63187 | 2034.500 | 26.70750 | 2007.79250 | 2022-06-07T23:53:02 |
| 656470 | 2737985 | 33971 | 3051.775 | 0.00000 | 3051.77500 | 2022-06-08T00:44:16 |
| 656471 | 2764885 | 13185 | 3368.200 | 31.07250 | 3337.12750 | 2022-06-12T18:47:40 |
| 656472 | 2770381 | 17533 | 1815.000 | 17.64000 | 1797.36000 | 2022-06-13T16:24:37 |
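In the sample rows, ORDER_PRICE_AFTER_DISCOUNT appears to equal ORDER_PRICE minus DISCOUNT. The sketch below checks that relation on the last ten rows shown above; the helper `is_consistent` and its tolerance are illustrative assumptions. If the relation holds across the dataset, the two negative ORDER_PRICE_AFTER_DISCOUNT values (-24 and -9) would correspond to orders where the discount exceeds the price, though the report alone does not confirm this.

```python
from math import isclose

# (ORDER_PRICE, DISCOUNT, ORDER_PRICE_AFTER_DISCOUNT) from the last rows above.
rows = [
    (2073.250, 31.09875, 2042.15125),
    (305.000, 0.0, 305.0),
    (34.750, 0.0, 34.75),
    (1846.500, 26.14875, 1820.35125),
    (4013.250, 0.0, 4013.25),
    (847.250, 0.0, 847.25),
    (2034.500, 26.70750, 2007.79250),
    (3051.775, 0.0, 3051.775),
    (3368.200, 31.07250, 3337.12750),
    (1815.000, 17.64000, 1797.36000),
]

def is_consistent(price, discount, after, tol=1e-6):
    """True when price - discount matches the reported discounted price."""
    return isclose(price - discount, after, abs_tol=tol)

mismatches = [row for row in rows if not is_consistent(*row)]
print(len(mismatches))
```

Running the same check over the full table would be a cheap data-quality rule, and any mismatching rows would be worth inspecting by hand.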