Test Hypothesis

7. Test Hypothesis#

Before anything else, you must remeber that the objective of testing hypothesis is to force us to clearly state out beliefs (null hypothesis) and try our best to prove we are wrong (a.k.a reject the null hypothesis). So, in very simple terms, we are using evidence to try refute our own beliefs because we want only the strongest of them to last. After all, if you are your most delligent critic to yourself and you are a good critic, you are unlikely to hold misconceptions for long.

I choose to remove from the Loans dataset all observations where retailers never spent their funds. I will dismiss them as discovery of the product rather than intention to use it. Note that all positive observations (defaults) are still in the dataset after the filter.

Show code cell content Hide code cell content

import pandas as pd
import numpy as np


loans_df = (
    pd.read_excel("../../data/Loans_Data.xlsx")
    .assign(
        MAIN_SYSTEM_ID=lambda x: x["MAIN_SYSTEM_ID"].astype("int64"),
        LOAN_ID=lambda x: x["LOAN_ID"].astype("int64"),
        LOAN_ISSUANCE_DATE=lambda x: x["LOAN_ISSUANCE_DATE"].astype("<M8[ns]"),
        LOAN_AMOUNT=lambda x: x["LOAN_AMOUNT"].astype("float64"),
        TOTAL_INITIAL_AMOUNT=lambda x: x["TOTAL_INITIAL_AMOUNT"].astype("float64"),
        INITIAL_DATE=lambda x: x["INITIAL_DATE"].astype("<M8[ns]"),
    )
    .query("SPENT > 0")
    .sort_values("LOAN_ISSUANCE_DATE")
)
loans_df.info()
loans_df.value_counts("PAYMENT_STATUS")

<class 'pandas.core.frame.DataFrame'>
Int64Index: 57621 entries, 15567 to 68798
Data columns (total 21 columns):
 #   Column                   Non-Null Count  Dtype         
---  ------                   --------------  -----         
 INDEX                    57621 non-null  int64         
 MAIN_SYSTEM_ID           57621 non-null  int64         
 RETAILER_ID              57621 non-null  int64         
 LOAN_ID                  57621 non-null  int64         
 LOAN_ISSUANCE_DATE       57621 non-null  datetime64[ns]
 LOAN_AMOUNT              57621 non-null  float64       
 INITIAL_COST             57621 non-null  int64         
 TOTAL_INITIAL_AMOUNT     57621 non-null  float64       
 INITIAL_DATE             57621 non-null  datetime64[ns]
 LOAN_PAYMENT_DATE        57621 non-null  datetime64[ns]
PAYMENT_AMOUNT           57621 non-null  float64       
FIRST_TRAIL_DELAYS       57621 non-null  int64         
FIRST_TRIAL_BALANCE      57621 non-null  float64       
SPENT                    57621 non-null  float64       
FINAL_COST               57621 non-null  int64         
TOTAL_FINAL_AMOUNT       57621 non-null  float64       
REPAYMENT_ID             57621 non-null  int64         
REPAYMENT_AMOUNT         57621 non-null  float64       
REPAYMENT_UPDATED        57621 non-null  datetime64[ns]
CUMMULATIVE_OUTSTANDING  57621 non-null  float64       
PAYMENT_STATUS           57621 non-null  object        
dtypes: datetime64[ns](4), float64(8), int64(8), object(1)
memory usage: 9.7+ MB

PAYMENT_STATUS
Paid             57612
Unpaid               6
Partialy paid        3
dtype: int64

ecommerce_df = (
    pd.read_csv(
        "../../data/Ecommerce_orders_Data.csv",
        header=0,
        dtype={
            "ORDER_ID": np.dtype("int64"),
            "MAIN_SYSTEM_ID": np.dtype("int64"),
            "ORDER_PRICE": np.dtype("float64"),
            "DISCOUNT": np.dtype("float64"),
            "ORDER_PRICE_AFTER_DISCOUNT": np.dtype("float64"),
            "ORDER_CREATION_DATE": np.dtype("O"),
        },
    )
    .assign(
        ORDER_CREATION_DATE=lambda x: pd.to_datetime(
            x["ORDER_CREATION_DATE"], infer_datetime_format=True
        ),
    )
    .drop(["DISCOUNT", "ORDER_PRICE_AFTER_DISCOUNT"], axis=1)
    .sort_values("ORDER_CREATION_DATE")
)
ecommerce_df.info()

Compute the rolling mean of amount retailer spent in Ecommerce over 7 and 30 days.

ecommerce_df = (
    ecommerce_df.assign(
        ROLLINGMEAN_7DAYS__ORDER_PRICE__=(
            lambda x: x.groupby("MAIN_SYSTEM_ID")
            .rolling("7D", on="ORDER_CREATION_DATE")["ORDER_PRICE"]
            .mean()
            .reset_index(drop=True)
        ),
        ROLLINGMEAN_30DAYS__ORDER_PRICE__=(
            lambda x: x.groupby("MAIN_SYSTEM_ID")
            .rolling("30D", on="ORDER_CREATION_DATE")["ORDER_PRICE"]
            .mean()
            .reset_index(drop=True)
        ),
        ROLLINGSTDDEV_7DAYS__ORDER_PRICE__=(
            lambda x: x.groupby("MAIN_SYSTEM_ID")
            .rolling("7D", on="ORDER_CREATION_DATE")["ORDER_PRICE"]
            .std()
            .reset_index(drop=True)
        ),
        ROLLINGSTDDEV_30DAYS__ORDER_PRICE__=(
            lambda x: x.groupby("MAIN_SYSTEM_ID")
            .rolling("30D", on="ORDER_CREATION_DATE")["ORDER_PRICE"]
            .std()
            .reset_index(drop=True)
        ),
    )[
        [
            "MAIN_SYSTEM_ID",
            "ORDER_CREATION_DATE",
            "ORDER_PRICE",
            "ROLLINGMEAN_7DAYS__ORDER_PRICE__",
            "ROLLINGMEAN_30DAYS__ORDER_PRICE__",
            "ROLLINGSTDDEV_7DAYS__ORDER_PRICE__",
            "ROLLINGSTDDEV_30DAYS__ORDER_PRICE__",
        ]
    ]
    .sort_values("ORDER_CREATION_DATE")
    .reset_index(drop=True)
)

Perform point-in-time correct join between datasets, accept at max 7-day-old stale rolling_mean.

merged_df = pd.merge_asof(
    loans_df,
    ecommerce_df,
    left_on="LOAN_ISSUANCE_DATE",
    right_on="ORDER_CREATION_DATE",
    by="MAIN_SYSTEM_ID",
    direction="backward",
    tolerance=pd.Timedelta("7d"),
).query("LOAN_ISSUANCE_DATE > ORDER_CREATION_DATE")

merged_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 34437 entries, 1 to 57618
Data columns (total 27 columns):
 #   Column                               Non-Null Count  Dtype         
---  ------                               --------------  -----         
 INDEX                                34437 non-null  int64         
 MAIN_SYSTEM_ID                       34437 non-null  int64         
 RETAILER_ID                          34437 non-null  int64         
 LOAN_ID                              34437 non-null  int64         
 LOAN_ISSUANCE_DATE                   34437 non-null  datetime64[ns]
 LOAN_AMOUNT                          34437 non-null  float64       
 INITIAL_COST                         34437 non-null  int64         
 TOTAL_INITIAL_AMOUNT                 34437 non-null  float64       
 INITIAL_DATE                         34437 non-null  datetime64[ns]
 LOAN_PAYMENT_DATE                    34437 non-null  datetime64[ns]
PAYMENT_AMOUNT                       34437 non-null  float64       
FIRST_TRAIL_DELAYS                   34437 non-null  int64         
FIRST_TRIAL_BALANCE                  34437 non-null  float64       
SPENT                                34437 non-null  float64       
FINAL_COST                           34437 non-null  int64         
TOTAL_FINAL_AMOUNT                   34437 non-null  float64       
REPAYMENT_ID                         34437 non-null  int64         
REPAYMENT_AMOUNT                     34437 non-null  float64       
REPAYMENT_UPDATED                    34437 non-null  datetime64[ns]
CUMMULATIVE_OUTSTANDING              34437 non-null  float64       
PAYMENT_STATUS                       34437 non-null  object        
ORDER_CREATION_DATE                  34437 non-null  datetime64[ns]
ORDER_PRICE                          34437 non-null  float64       
ROLLINGMEAN_7DAYS__ORDER_PRICE__     34437 non-null  float64       
ROLLINGMEAN_30DAYS__ORDER_PRICE__    34437 non-null  float64       
ROLLINGSTDDEV_7DAYS__ORDER_PRICE__   27689 non-null  float64       
ROLLINGSTDDEV_30DAYS__ORDER_PRICE__  33171 non-null  float64       
dtypes: datetime64[ns](5), float64(13), int64(8), object(1)
memory usage: 7.4+ MB

7.1. Missing observations#

We can’t retrieve rolling_mean for 25% of observations (from 57,621 to 43,225), which means that this approach is inviable as it stands. Before pivoting, let’s check the sanity of this result by looking at Ecommerce dataset for a couple of cases.

retailers_leftout = (
    pd.merge(
        loans_df[["MAIN_SYSTEM_ID"]].drop_duplicates(),
        merged_df[["MAIN_SYSTEM_ID"]].drop_duplicates(),
        on="MAIN_SYSTEM_ID",
        how="left",
        indicator=True,
    )
    .query("_merge == 'left_only'")
    .drop("_merge", axis=1)
    .drop_duplicates()
)

retailers_leftout.count()

MAIN_SYSTEM_ID    1058
dtype: int64

pd.merge(ecommerce_df, retailers_leftout, on="MAIN_SYSTEM_ID")[
    "MAIN_SYSTEM_ID"
].drop_duplicates().count()

They are in fact all present. So it must mean their loan request came after 7 days of their last Ecommerce transaction. Let’s expand the accepted staleness to 30 days and see how the number changes.

merged_df = (
    pd.merge_asof(
        loans_df,
        ecommerce_df,
        left_on="LOAN_ISSUANCE_DATE",
        right_on="ORDER_CREATION_DATE",
        by="MAIN_SYSTEM_ID",
        direction="backward",
        tolerance=pd.Timedelta("30d"),
    )
    .query("LOAN_ISSUANCE_DATE > ORDER_CREATION_DATE")
    .drop_duplicates()
)

merged_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 46632 entries, 1 to 57620
Data columns (total 27 columns):
 #   Column                               Non-Null Count  Dtype         
---  ------                               --------------  -----         
 INDEX                                46632 non-null  int64         
 MAIN_SYSTEM_ID                       46632 non-null  int64         
 RETAILER_ID                          46632 non-null  int64         
 LOAN_ID                              46632 non-null  int64         
 LOAN_ISSUANCE_DATE                   46632 non-null  datetime64[ns]
 LOAN_AMOUNT                          46632 non-null  float64       
 INITIAL_COST                         46632 non-null  int64         
 TOTAL_INITIAL_AMOUNT                 46632 non-null  float64       
 INITIAL_DATE                         46632 non-null  datetime64[ns]
 LOAN_PAYMENT_DATE                    46632 non-null  datetime64[ns]
PAYMENT_AMOUNT                       46632 non-null  float64       
FIRST_TRAIL_DELAYS                   46632 non-null  int64         
FIRST_TRIAL_BALANCE                  46632 non-null  float64       
SPENT                                46632 non-null  float64       
FINAL_COST                           46632 non-null  int64         
TOTAL_FINAL_AMOUNT                   46632 non-null  float64       
REPAYMENT_ID                         46632 non-null  int64         
REPAYMENT_AMOUNT                     46632 non-null  float64       
REPAYMENT_UPDATED                    46632 non-null  datetime64[ns]
CUMMULATIVE_OUTSTANDING              46632 non-null  float64       
PAYMENT_STATUS                       46632 non-null  object        
ORDER_CREATION_DATE                  46632 non-null  datetime64[ns]
ORDER_PRICE                          46632 non-null  float64       
ROLLINGMEAN_7DAYS__ORDER_PRICE__     46632 non-null  float64       
ROLLINGMEAN_30DAYS__ORDER_PRICE__    46632 non-null  float64       
ROLLINGSTDDEV_7DAYS__ORDER_PRICE__   37523 non-null  float64       
ROLLINGSTDDEV_30DAYS__ORDER_PRICE__  44892 non-null  float64       
dtypes: datetime64[ns](5), float64(13), int64(8), object(1)
memory usage: 10.0+ MB

We are still missing 20% of observations, and increasing this window means less stable estimations and more uncertainty. Let’s add a longer window rolling_mean to adjust for that.

merged_df = merged_df.assign(
    ROLLINGMEAN_120DAYS__ORDER_PRICE__=(
        lambda x: x.groupby("MAIN_SYSTEM_ID")
        .rolling("120D", on="ORDER_CREATION_DATE")["ORDER_PRICE"]
        .mean()
        .reset_index(drop=True)
    ),
    ROLLINGMEAN_360DAYS__ORDER_PRICE__=(
        lambda x: x.groupby("MAIN_SYSTEM_ID")
        .rolling("360D", on="ORDER_CREATION_DATE")["ORDER_PRICE"]
        .mean()
        .reset_index(drop=True)
    ),
    ROLLINGSTDDEV_120DAYS__ORDER_PRICE__=(
        lambda x: x.groupby("MAIN_SYSTEM_ID")
        .rolling("120D", on="ORDER_CREATION_DATE")["ORDER_PRICE"]
        .std()
        .reset_index(drop=True)
    ),
    ROLLINGSTDDEV_360DAYS__ORDER_PRICE__=(
        lambda x: x.groupby("MAIN_SYSTEM_ID")
        .rolling("360D", on="ORDER_CREATION_DATE")["ORDER_PRICE"]
        .std()
        .reset_index(drop=True)
    ),
).sort_values("ORDER_CREATION_DATE")

merged_df.value_counts("PAYMENT_STATUS")

PAYMENT_STATUS
Paid             46628
Partialy paid        2
Unpaid               2
dtype: int64

For the sake of time, I will finish this experiment with 20% of missing observations. Unfortunately, 5 out of 9 “not fully paid” observations are left out because of staleness. That would be an interesting tangent for another set of experiments. Something in the shape of “inactive-retailer” segment for credit risk.

Note

I was not able to perform Power Analysis mentioned in the next paragraph due to limited time.

However, this leaves us with a smaller section of an already very minor class. It doesn’t mean we cannot test our hypothesis, but the confidence in results will be diminished. Not all is lost, though. We can then use Power Analysis to find out how many more observations we would require to achieve 95% confidence intervals. But first things first, let’s test the first set of hypotheses:

null hypothesis 1: retailers who default (fully or partially) on their loan are requesting to borrow amounts above 1 standard deviation of their usual ecommerce volume
alternate hypothesis 1: retailers who default (fully or partially) on their loan are requesting to borrow amounts within 1 standard deviation of their usual ecommerce volume

To be very honest this first set of hypotheses will have very little significance due to us having only 4 observations. I judge it wiser to focus on the second set due to (again) limited time.

The second set of hypothesis is based on the assumption that responsible retailers (the vast majority in the dataset) would be more cautious and tend to request LOAN_AMOUNT in the neighborhood of cash flow they are used to; in other words, they won’t use us for irresponsible leverage:

null hypothesis 2: retailers who pay their loans in full are requesting to borrow amounts within 1 standard deviation of their usual ecommerce volume
alternate hypothesis 2: retailers who pay their loans in full are requesting to borrow amounts above 1 standard deviation of their usual ecommerce volume

If our tests accept the null hypothesis, we will have more confidence that credit risk assessment based on ecommerce volume wouldn’t jeopardize our best customers in volume, safeguarding customer experience. But the fact that we can’t test set 1 of hypotheses means we would still be opened to financial exposure.

7.2. Results#

I’m going to run 1000 trials of the hypothesis test for each aggregation window (7, 30, 120, 360 days) and use random sampling with replacement at each trial. This is a technique known as Bootstrapping and it’s great for estimating confidence intervals for tests.

# Define the number of bootstrap rounds
n_bootstrap_rounds = 1000

for w_size in [7, 30, 120, 360]:
    # Calculate the mean and standard deviation of the usual ecommerce volume
    mean_usual_ecommerce_volume = merged_df[
        f"ROLLINGMEAN_{w_size}DAYS__ORDER_PRICE__"
    ].mean()
    std_usual_ecommerce_volume = merged_df[
        f"ROLLINGSTDDEV_{w_size}DAYS__ORDER_PRICE__"
    ].std()

    # Calculate the 1 standard deviation limit
    one_std_limit = mean_usual_ecommerce_volume + std_usual_ecommerce_volume

    # Define function to calculate the proportion of loan_amount greater than one_std_limit
    def proportion_above_limit(data):
        return np.sum(data > one_std_limit) / len(data)

    # Initialize an empty array to store bootstrap sample proportions
    bootstrap_proportions = np.zeros(n_bootstrap_rounds)

    # Perform bootstrapping
    for i in range(n_bootstrap_rounds):
        bootstrap_sample = merged_df["LOAN_AMOUNT"].sample(frac=0.3, replace=True)
        bootstrap_proportions[i] = proportion_above_limit(bootstrap_sample)

    # Calculate the 95% confidence intervals
    lower_bound = np.percentile(bootstrap_proportions, 2.5)
    upper_bound = np.percentile(bootstrap_proportions, 97.5)

    # Check if the confidence interval contains 0.5 (equal proportions)
    print(f"Results on rolling_mean(ORDER_PRICE over {w_size} days)")
    if lower_bound > 0.2:
        print(
            "\tReject null hypothesis 2",
            f"\tRetailers are requesting to borrow amounts **above 1 standard deviation** of their mean volume in the previous {w_size} days",
            sep="\n",
        )
    else:
        print(
            "\tFail to reject null hypothesis 2",
            f"\tRetailers are requesting to borrow amounts **within 1 standard deviation** of their mean volume in the previous {w_size} days",
            sep="\n",
        )
    print(
        f"\tWe are 95% confident that between {lower_bound:.1%} and {upper_bound:.1%} of retailers "
        + "borrow above their mean ecommerce volume"
    )
    print(f"\t95% Confidence Interval: [{lower_bound:.4f}, {upper_bound:.4f}]")
    print("\n")

Results on rolling_mean(ORDER_PRICE over 7 days)
	Fail to reject null hypothesis 2
	Retailers are requesting to borrow amounts **within 1 standard deviation** of their mean volume in the previous 7 days
	We are 95% confident that between 0.9% and 1.2% of retailers borrow above their mean ecommerce volume
	95% Confidence Interval: [0.0089, 0.0121]

Results on rolling_mean(ORDER_PRICE over 30 days)
	Fail to reject null hypothesis 2
	Retailers are requesting to borrow amounts **within 1 standard deviation** of their mean volume in the previous 30 days
	We are 95% confident that between 1.6% and 2.0% of retailers borrow above their mean ecommerce volume
	95% Confidence Interval: [0.0160, 0.0202]

Results on rolling_mean(ORDER_PRICE over 120 days)
	Fail to reject null hypothesis 2
	Retailers are requesting to borrow amounts **within 1 standard deviation** of their mean volume in the previous 120 days
	We are 95% confident that between 2.5% and 3.1% of retailers borrow above their mean ecommerce volume
	95% Confidence Interval: [0.0254, 0.0311]

Results on rolling_mean(ORDER_PRICE over 360 days)
	Fail to reject null hypothesis 2
	Retailers are requesting to borrow amounts **within 1 standard deviation** of their mean volume in the previous 360 days
	We are 95% confident that between 2.6% and 3.1% of retailers borrow above their mean ecommerce volume
	95% Confidence Interval: [0.0257, 0.0310]

That’s an encoraging but expected result. Most retailers are responsible businees people who tend to be better at managing money. However, I was not expecting the percentage of retailer who go beyond this “safety zone” to bo so low. And here I can point out that not making that statement previously – what I expect to see – is something that sould always be avoided in experiments. I should have stated that I expected to see it around the 10% mark before commiting to this experiment. Maybe it’s the hurry that’s getting the best of me.

Anyhow, there is much more to be explored and proposed, but we can confidently add the rolling features. And because there was little difference between 7 and 30 days, and because of this limitation I will take the 30-day window.

Test Hypothesis

Contents

7. Test Hypothesis#

7.1. Missing observations#

7.2. Results#