Deposit stability analysis lies at the heart of Asset Liability Management (ALM) and treasury functions in a bank. Here is a Python solution that walks through each of your required steps.
The solution is structured like a Jupyter Notebook for clarity, with explanations for each step.
Setup and Environment
First, let's set up the environment by importing the necessary libraries (pandas, numpy, matplotlib, seaborn, lifelines and prophet must be installed). We'll also generate synthetic data that mimics the structure of the real datasets, since we don't have the actual CSV files.
```python
# --- 0. Environment Setup ---
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime, timedelta

# For statistical models
from lifelines import KaplanMeierFitter
from prophet import Prophet

# Set some display options for pandas
pd.set_option('display.max_columns', 50)
pd.set_option('display.width', 150)

# Fix the random seed so the synthetic data and all downstream results are reproducible
np.random.seed(42)


# --- Data Generation (for demonstration) ---
# This part creates synthetic data that mimics the real datasets.
def create_synthetic_data():
    """Generates synthetic transaction and interest rate data."""
    print("Generating synthetic data...")

    # --- Interest Rates ---
    # '6MS' = every 6 months at month start: the first rate is effective from 2020-01-01
    # (the earliest possible transaction date), giving exactly 8 dates for the 8 rates
    rate_dates = pd.date_range(start='2020-01-01', end='2023-12-31', freq='6MS')
    interest_rates_df = pd.DataFrame({
        'effective_date': rate_dates,
        'policy_rate': [0.00, -0.0025, 0.00, 0.005, 0.0175, 0.025, 0.035, 0.04]
    })

    # --- Transactions Data ---
    num_customers = 500
    num_accounts = 600
    customer_ids = [f'C{1000+i}' for i in range(num_customers)]
    account_keys = [f'A{10000+i}' for i in range(num_accounts)]

    transactions = []
    customer_info = {}

    for acc_key in account_keys:
        cust_id = np.random.choice(customer_ids)
        if cust_id not in customer_info:
            # Use the column names that the data preparation step expects
            customer_info[cust_id] = {
                'customer_birth_year': np.random.randint(1940, 2005),
                'customer_zip_code': np.random.randint(10000, 98000)
            }

        # Account opening date is the date of the first transaction
        acc_open_date = pd.to_datetime('2020-01-01') + timedelta(days=np.random.randint(0, 1200))
        is_volatile = np.random.rand() > 0.7  # 30% of accounts are volatile
        balance = np.random.uniform(5000, 100000)

        # Record the opening deposit, so balances rebuilt from transactions start at this amount
        transactions.append({
            'deposit_account_key': acc_key,
            'transaction_date_key': acc_open_date,
            'transaction_sign': 'credit',
            'transaction_type': 'opening_deposit',
            'transaction_amt_sek': balance,
            'customer_id': cust_id
        })

        current_date = acc_open_date + timedelta(days=1)
        while current_date < datetime(2024, 1, 1):
            if np.random.rand() < 0.3:  # 30% chance of a transaction on any given day
                # Credits: 40% of transactions for volatile accounts, 60% for stable ones
                is_credit = np.random.rand() > (0.6 if is_volatile else 0.4)
                if is_credit:
                    amount = np.random.uniform(500, 5000)
                    balance += amount
                    sign = 'credit'
                    trans_type = 'salary' if np.random.rand() > 0.8 else 'transfer'
                else:  # Debit
                    # Keep the upper bound >= the 100 SEK lower bound, even for small balances
                    max_debit = max(100.0, balance * (0.5 if is_volatile else 0.1))
                    amount = np.random.uniform(100, max_debit)
                    balance -= amount
                    sign = 'debit'
                    trans_type = 'payment' if np.random.rand() > 0.5 else 'withdrawal'
                transactions.append({
                    'deposit_account_key': acc_key,
                    'transaction_date_key': current_date,
                    'transaction_sign': sign,
                    'transaction_type': trans_type,
                    'transaction_amt_sek': amount,
                    'customer_id': cust_id
                })
            current_date += timedelta(days=1)

    transactions_df = pd.DataFrame(transactions)
    transactions_df['transaction_id'] = [f'T{1000000+i}' for i in range(len(transactions_df))]

    # Add customer info from the dictionary
    cust_info_df = (pd.DataFrame.from_dict(customer_info, orient='index')
                    .reset_index()
                    .rename(columns={'index': 'customer_id'}))
    transactions_df = transactions_df.merge(cust_info_df, on='customer_id', how='left')

    transactions_df.to_csv('transaction_data.csv', index=False)
    interest_rates_df.to_csv('interest_rates.csv', index=False)
    print("Synthetic data saved to 'transaction_data.csv' and 'interest_rates.csv'.")


# Generate the data
create_synthetic_data()
```
1. Data Preparation
This section handles loading, merging, and transforming the raw data into a usable format. The most critical step here is reconstructing the daily balance for each account.
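The two core transformations of this step can be sketched in plain Python on a single toy account. This is a minimal illustration, not a replacement for the pandas code. `asof_rate` applies the backward as-of rule that `pd.merge_asof(..., direction='backward')` uses: take the latest rate effective on or before the date. `daily_balances` shows why a running sum of signed daily changes, with 0 on days without transactions, yields the end-of-day balance. Both function names and all sample values are hypothetical.

```python
from bisect import bisect_right
from datetime import date, timedelta


def asof_rate(rate_dates, rates, day):
    """Latest rate whose effective date is on or before `day` (backward as-of rule).

    `rate_dates` must be sorted ascending; returns None before the first effective date."""
    i = bisect_right(rate_dates, day)
    return rates[i - 1] if i > 0 else None


def daily_balances(transactions, end):
    """Rebuild end-of-day balances from (date, sign, amount) tuples up to `end`.

    Credits add and debits subtract; amounts on the same day are summed, and
    days without transactions contribute 0, so the running total is the balance."""
    net = {}
    for day, sign, amount in transactions:
        net[day] = net.get(day, 0.0) + (amount if sign == 'credit' else -amount)
    day, balance, rows = min(net), 0.0, []
    while day <= end:
        balance += net.get(day, 0.0)
        rows.append((day, balance))
        day += timedelta(days=1)
    return rows


# Hypothetical toy data: two rate periods and three transactions on one account
rate_dates = [date(2020, 1, 1), date(2020, 7, 1)]
rates = [0.00, -0.0025]
txns = [
    (date(2020, 6, 29), 'credit', 1000.0),  # opening deposit
    (date(2020, 7, 1), 'debit', 200.0),
    (date(2020, 7, 1), 'credit', 50.0),
]

for day, balance in daily_balances(txns, end=date(2020, 7, 2)):
    print(day, balance, asof_rate(rate_dates, rates, day))
```

The rate switches on 2020-07-01 itself because an exact date match counts as "on or before". This also matches the default `allow_exact_matches=True` in `pd.merge_asof`.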
```python
# --- 1. Data Preparation ---
# Load datasets
try:
    trans_df = pd.read_csv('transaction_data.csv')
    rates_df = pd.read_csv('interest_rates.csv')
except FileNotFoundError as err:
    # Raise instead of calling exit(), which would shut down a notebook kernel
    raise FileNotFoundError("Please run the data generation cell first.") from err

# Convert dates to datetime objects
trans_df['transaction_date_key'] = pd.to_datetime(trans_df['transaction_date_key'])
rates_df['effective_date'] = pd.to_datetime(rates_df['effective_date'])

# Merge interest rates with transactions.
# `merge_asof` matches each transaction to the most recent rate whose effective_date is on or
# before the transaction date (direction='backward'). Both frames must be sorted on their keys.
trans_df = trans_df.sort_values('transaction_date_key')
rates_df = rates_df.sort_values('effective_date')
merged_df = pd.merge_asof(
    trans_df,
    rates_df,
    left_on='transaction_date_key',
    right_on='effective_date',
    direction='backward'
)

# --- Reconstruct Daily Balance ---
print("\nReconstructing daily balances (this may take a moment)...")
# Signed amount: credits add to the balance, debits subtract from it
merged_df['net_change'] = merged_df['transaction_amt_sek'] * np.where(merged_df['transaction_sign'] == 'credit', 1, -1)
# Group by account and date to get total daily change
daily_changes = merged_df.groupby(['deposit_account_key', 'transaction_date_key'])['net_change'].sum().reset_index()

# Static customer attributes, one row per account (looked up once instead of filtering per account)
static_info = (merged_df.drop_duplicates('deposit_account_key')
               .set_index('deposit_account_key')[['customer_id', 'customer_birth_year', 'customer_zip_code']])

# Build a complete daily calendar for each account, from its first transaction to the end date
end_date = merged_df['transaction_date_key'].max()
daily_balances_list = []
for account_key, group in daily_changes.groupby('deposit_account_key'):
    start_date = group['transaction_date_key'].min()
    account_dates = pd.date_range(start=start_date, end=end_date, freq='D', name='date')
    # Days without transactions get a net change of 0; the running sum is the end-of-day balance
    reindexed = (group.set_index('transaction_date_key')[['net_change']]
                 .reindex(account_dates, fill_value=0)
                 .reset_index())
    reindexed['deposit_account_key'] = account_key
    reindexed['balance'] = reindexed['net_change'].cumsum()
    # Add customer and account info
    for col in static_info.columns:
        reindexed[col] = static_info.at[account_key, col]
    daily_balances_list.append(reindexed)

# Concatenate all account data into a single DataFrame
daily_balances_df = pd.concat(daily_balances_list, ignore_index=True)

# Merge with rates again so every day (not only transaction days) carries the prevailing rate
daily_balances_df = pd.merge_asof(
    daily_balances_df.sort_values('date'),
    rates_df,
    left_on='date',
    right_on='effective_date',
    direction='backward'
).drop(columns=['net_change'])
print("Daily balances reconstructed successfully.")
print(daily_balances_df.head())

# --- Calculate Account and Customer Age ---
# Measure ages at the data snapshot (end_date) rather than the run date, so results are reproducible
snapshot_year = end_date.year
account_open_dates = daily_balances_df.groupby('deposit_account_key')['date'].min().reset_index()
account_open_dates = account_open_dates.rename(columns={'date': 'opening_date'})

# Create a final summary DataFrame for each account
account_summary = account_open_dates.copy()
# Account Tenure (in months, using an average month length of 30.44 days)
account_summary['tenure_months'] = ((end_date - account_summary['opening_date']).dt.days / 30.44).astype(int)
# Merge customer info (the zip code is needed later for the regional segmentation)
customer_info = daily_balances_df[['deposit_account_key', 'customer_id', 'customer_birth_year', 'customer_zip_code']].drop_duplicates()
account_summary = pd.merge(account_summary, customer_info, on='deposit_account_key')
# Customer Age
account_summary['customer_age'] = snapshot_year - account_summary['customer_birth_year']
print("\nAccount Summary with Age/Tenure:")
print(account_summary.head())
```
2. & 3. Key Metrics Calculation & Core/Non-Core Classification
Now we'll calculate the required stability metrics and then use them to classify each account.
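The per-account metrics and the classification rule come down to a few formulas. The plain-Python sketch below applies them to a hypothetical 12-point balance history; the pipeline itself uses daily balances. The helper names are illustrative. `classify` copies the rule order and thresholds of `classify_deposit` in the code that follows.

```python
from statistics import mean, stdev


def stability_ratio(window_balances, current):
    """Minimum balance in the window divided by the current balance, clipped to [0, 1]."""
    if current == 0:
        return 0.0
    return min(max(min(window_balances) / current, 0.0), 1.0)


def coefficient_of_variation(balances):
    """Balance volatility: sample standard deviation / mean (pandas .std() also uses ddof=1)."""
    return stdev(balances) / mean(balances)


def decay_rate(start, current):
    """Share of the balance from 12 months ago that has since flowed out."""
    return (start - current) / start


def classify(tenure_months, volatility, stability):
    """Same rule order and thresholds as classify_deposit."""
    if tenure_months < 6 or volatility > 0.75:
        return 'Non-Core'
    if tenure_months > 24 and volatility < 0.2 and stability > 0.8:
        return 'Core - Highly Stable'
    if tenure_months > 12 and volatility < 0.4 and stability > 0.6:
        return 'Core - Stable'
    return 'Semi-Core'


# Hypothetical month-end balances (SEK) for one account over the last 12 months
balances = [100_000, 96_000, 98_000, 95_000, 97_000, 99_000,
            94_000, 96_000, 98_000, 100_000, 97_000, 95_000]
current = balances[-1]

sr = stability_ratio(balances, current)
cv = coefficient_of_variation(balances)
print(f"stability={sr:.3f} volatility={cv:.3f} decay={decay_rate(balances[0], current):.3f}")
print(classify(tenure_months=30, volatility=cv, stability=sr))
```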
```python
# --- 2/3. Calculate Key Metrics and Classify Deposits ---
print("\nCalculating stability metrics for each account...")

# Get current balance (balance on each account's last available date)
current_balances = daily_balances_df.loc[daily_balances_df.groupby('deposit_account_key')['date'].idxmax(), ['deposit_account_key', 'balance']]
current_balances = current_balances.rename(columns={'balance': 'current_balance_sek'})
account_summary = pd.merge(account_summary, current_balances, on='deposit_account_key')

# --- Metric Calculations ---
one_year_ago = end_date - timedelta(days=365)
recent_balances = daily_balances_df[daily_balances_df['date'] >= one_year_ago]

# Stability Ratio (Min balance in last 12m / Current balance)
min_balance_12m = recent_balances.groupby('deposit_account_key')['balance'].min().reset_index()
min_balance_12m = min_balance_12m.rename(columns={'balance': 'min_balance_12m'})
account_summary = pd.merge(account_summary, min_balance_12m, on='deposit_account_key', how='left')
account_summary['stability_ratio'] = ((account_summary['min_balance_12m'] / account_summary['current_balance_sek'])
                                      .replace([np.inf, -np.inf], np.nan).fillna(0))
# The 12m window includes the current day, so for positive balances the ratio cannot exceed 1;
# clipping also maps ratios from negative (overdrawn) balances into [0, 1]
account_summary['stability_ratio'] = account_summary['stability_ratio'].clip(0, 1)

# Balance Volatility (coefficient of variation: Std Dev / Mean over the full history)
volatility = daily_balances_df.groupby('deposit_account_key')['balance'].agg(['std', 'mean']).reset_index()
volatility['balance_volatility'] = (volatility['std'] / volatility['mean']).fillna(0)
account_summary = pd.merge(account_summary, volatility[['deposit_account_key', 'balance_volatility']], on='deposit_account_key')

# Transaction Frequency (Avg transactions per month)
trans_counts = merged_df.groupby('deposit_account_key').size().reset_index(name='total_transactions')
account_summary = pd.merge(account_summary, trans_counts, on='deposit_account_key')
# Accounts younger than one month have tenure 0; report 0 instead of inf for them
account_summary['transaction_frequency_monthly'] = (account_summary['total_transactions'] / account_summary['tenure_months']).replace([np.inf, -np.inf], 0).fillna(0)

# Average Balances (various windows)
for days in [30, 90, 180, 365]:
    window_start = end_date - timedelta(days=days)
    avg_bal = daily_balances_df[daily_balances_df['date'] >= window_start].groupby('deposit_account_key')['balance'].mean().reset_index()
    avg_bal = avg_bal.rename(columns={'balance': f'avg_balance_{days}d'})
    account_summary = pd.merge(account_summary, avg_bal, on='deposit_account_key', how='left')

# Deposit Decay Rate (Simplified: net outflow over the last 12 months as % of the balance 12 months ago)
# Accounts opened within the last 12 months have no starting balance and get a decay rate of 0
start_bal_12m = daily_balances_df.loc[daily_balances_df['date'] == one_year_ago, ['deposit_account_key', 'balance']]
start_bal_12m = start_bal_12m.rename(columns={'balance': 'start_balance_12m'})
decay_df = pd.merge(start_bal_12m, current_balances, on='deposit_account_key')
decay_df['decay_rate_12m'] = (decay_df['start_balance_12m'] - decay_df['current_balance_sek']) / decay_df['start_balance_12m']
account_summary = pd.merge(account_summary, decay_df[['deposit_account_key', 'decay_rate_12m']], on='deposit_account_key', how='left')
# Fill only the decay column, so missing values in other columns are not silently masked
account_summary['decay_rate_12m'] = account_summary['decay_rate_12m'].fillna(0)

# Rate Sensitivity (Simplified: Placeholder - a full model is complex)
# A real implementation would regress balance changes against rate changes over time, often at a segment level.
# PLACEHOLDER: random scores for illustration only; not a measurement of anything
account_summary['rate_sensitivity_score'] = np.random.uniform(0.1, 1.0, size=len(account_summary))

# --- Core vs Non-Core Classification ---
# Rule-based thresholds (illustrative; calibrate them to the bank's own data and policy)
def classify_deposit(row):
    tenure = row['tenure_months']
    volatility = row['balance_volatility']
    stability = row['stability_ratio']
    if tenure < 6 or volatility > 0.75:
        return 'Non-Core'
    if tenure > 24 and volatility < 0.2 and stability > 0.8:
        return 'Core - Highly Stable'
    if tenure > 12 and volatility < 0.4 and stability > 0.6:
        return 'Core - Stable'
    return 'Semi-Core'


account_summary['deposit_class'] = account_summary.apply(classify_deposit, axis=1)
print("\nFinal Account Summary with Metrics and Classification:")
print(account_summary.head())

# Display classification breakdown (% of accounts)
print("\nDeposit Classification Breakdown (% of accounts):")
print(account_summary['deposit_class'].value_counts(normalize=True) * 100)
```
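The random `rate_sensitivity_score` above is only a placeholder. Its code comment suggests replacing it by regressing balance changes on rate changes. The sketch below illustrates that idea under loud assumptions; it is not a calibrated model. It fits an ordinary least-squares slope of per-period balance growth on changes in the synthetic `policy_rate` series. The rate changes are the step changes of that series. The balance-growth figures are hypothetical and are constructed as 2% minus 1.5 times the rate change, so the fit should recover a slope of -1.5. The helper `ols_fit` is illustrative.

```python
def ols_fit(x, y):
    """Ordinary least squares for y = intercept + slope * x; returns (slope, intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Policy rates from the synthetic interest_rates.csv (one value per 6-month step)
policy_rate = [0.00, -0.0025, 0.00, 0.005, 0.0175, 0.025, 0.035, 0.04]
rate_changes = [b - a for a, b in zip(policy_rate, policy_rate[1:])]

# HYPOTHETICAL segment balance growth per step, constructed as 2% - 1.5 x rate change (no noise)
balance_growth = [0.02 - 1.5 * dr for dr in rate_changes]

slope, intercept = ols_fit(rate_changes, balance_growth)
# A negative slope means balances grow more slowly (or shrink) when rates rise
print(f"rate sensitivity (slope): {slope:.3f}, intercept: {intercept:.3f}")
```

Real data would need a much longer history, segment-level balances, and controls for other drivers of balance growth. Seven rate changes are far too few for a reliable estimate.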
4. Behavioral Maturity Analysis
This section tracks how deposit balances evolve with account age by comparing cohorts (vintages) of accounts opened in the same month.
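You can trace the normalization behind the decay curves, the floor estimate and the simplified average life on a single cohort curve. This plain-Python sketch mirrors the pandas steps: `vintage_pivot.iloc[2:6].mean()` gives the reference balance, the curve minimum gives the floor, and the simplified life is a count of months above 50%. The curve values are hypothetical.

```python
from statistics import mean

# Hypothetical average balance (thousand SEK) of one opening cohort by months since opening
cohort_curve = [95, 100, 100, 100, 100, 100, 90, 70, 55, 48, 45, 45]

# Reference ("peak") balance: mean of months 2-5 after opening, i.e. rows iloc[2:6]
reference = mean(cohort_curve[2:6])
normalized = [b / reference for b in cohort_curve]

# This cohort's minimum normalized balance; the pipeline averages these minima across cohorts
floor = min(normalized)

# Simplified average life: number of months with more than 50% of the reference balance remaining
months_above_half = sum(v > 0.5 for v in normalized)

# Alternative definition: first month in which the balance is below 50% (None if it never is)
first_month_below_half = next((m for m, v in enumerate(normalized) if v < 0.5), None)

print(f"floor={floor:.2f}, months above 50%={months_above_half}, "
      f"first month below 50%={first_month_below_half}")
```

The two life measures agree here (both are 9) only because this curve never recovers after dropping below 50%. For a cohort whose balance dips and then rebuilds, the count of months above the threshold can be larger than the first crossing month.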
```python
# --- 4. Behavioral Maturity Analysis ---
print("\nPerforming Behavioral Maturity Analysis...")

# --- Vintage Curves ---
# We need the opening month for each account
account_summary['opening_month'] = account_summary['opening_date'].dt.to_period('M')
# Add opening month to daily balances
vintage_data = pd.merge(daily_balances_df, account_summary[['deposit_account_key', 'opening_month']], on='deposit_account_key')
# Whole calendar months between the balance date and the opening month
vintage_data['months_since_opening'] = (vintage_data['date'].dt.to_period('M') - vintage_data['opening_month']).apply(lambda x: x.n)

# Calculate average balance by vintage and months since opening
vintage_analysis = vintage_data.groupby(['opening_month', 'months_since_opening'])['balance'].mean().reset_index()
# Pivot for plotting: one column per opening cohort
vintage_pivot = vintage_analysis.pivot(index='months_since_opening', columns='opening_month', values='balance')
# Use plain string cohort labels (e.g. '2021-03') for the plot legends
vintage_pivot.columns = vintage_pivot.columns.astype(str)

# Plotting Vintage Curves
plt.figure(figsize=(14, 7))
sns.lineplot(data=vintage_pivot)
plt.title('Vintage Analysis: Average Deposit Balance by Opening Cohort')
plt.xlabel('Months Since Account Opening')
plt.ylabel('Average Balance (SEK)')
plt.legend(title='Opening Cohort', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.grid(True)
plt.tight_layout()
plt.show()

# --- Deposit Decay Curves & Permanent Balance Floor ---
# Normalize each cohort's curve by its average balance in months_since_opening 2-5
# (the 3rd to 6th month, rows iloc[2:6]), used here as the reference "peak" balance
peak_balance = vintage_pivot.iloc[2:6].mean()
normalized_vintage = vintage_pivot / peak_balance
floor_estimate = normalized_vintage.min().mean()

plt.figure(figsize=(14, 7))
sns.lineplot(data=normalized_vintage)
plt.title('Deposit Decay Curves (Normalized)')
plt.xlabel('Months Since Account Opening')
plt.ylabel('Proportion of Reference Balance Remaining')
plt.axhline(y=floor_estimate, color='r', linestyle='--', label=f'Estimated Permanent Floor (~{floor_estimate:.2f})')
plt.legend(title='Opening Cohort', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.grid(True)
plt.tight_layout()
plt.show()

# Simplified average life: number of observed months in which a cohort stays above 50% of its
# reference balance. Recent cohorts are observed for fewer months, so their values are censored.
avg_life = (normalized_vintage > 0.5).sum()
print(f"\nSimplified Average Life of Deposits (months above 50% of reference balance):\n{avg_life.describe()}")
```
5. Statistical Models
Here we implement three techniques: Survival Analysis, Time Series Forecasting, and a Monte Carlo simulation for stress testing.
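The `KaplanMeierFitter` in Model A computes the Kaplan-Meier (product-limit) estimator. A minimal plain-Python version on six hypothetical accounts shows how it handles censored accounts, meaning those with no large withdrawal before the end date. These stay in the risk set up to their censoring time without counting as events. This is an illustrative sketch, not lifelines' implementation; the function name and data are hypothetical.

```python
def kaplan_meier(durations, events):
    """Kaplan-Meier estimate S(t) = product over event times t_i <= t of (1 - d_i / n_i).

    durations: days until the event or until censoring; events: 1 = event observed, 0 = censored.
    Returns (time, survival probability) pairs at each distinct event time."""
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(set(durations)):
        # d_i: events at time t; accounts censored at t still count in the risk set n_i
        d = sum(1 for dur, e in zip(durations, events) if dur == t and e == 1)
        if d:
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        # Every account whose duration is t (event or censored) leaves the risk set afterwards
        at_risk -= sum(1 for dur in durations if dur == t)
    return curve


# Hypothetical accounts: days until a large withdrawal (event=1) or until the end date (event=0)
durations = [30, 60, 60, 90, 120, 150]
events = [1, 1, 0, 1, 0, 0]

for t, s in kaplan_meier(durations, events):
    print(f"day {t}: S = {s:.4f}")
# day 30: S = 0.8333, day 60: S = 0.6667, day 90: S = 0.4444
```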
```python
# --- 5. Statistical Models ---

# --- A) Survival Analysis: Time to Significant Withdrawal ---
print("\n--- Model A: Survival Analysis ---")
# Event: the balance falls more than 25% below its highest level over the previous 30 days.
survival_data = []
for account_key, group in daily_balances_df.groupby('deposit_account_key'):
    group = group.sort_values('date')
    opening_date = group['date'].min()
    # Default: no event, so the account is right-censored at the end of the observation window
    duration = (end_date - opening_date).days
    event_observed = 0
    # Highest balance over the previous 30 days (rows are consecutive days; shift(1) excludes today)
    prior_30d_max = group['balance'].rolling(window=30, min_periods=1).max().shift(1)
    event_dates = group.loc[group['balance'] < prior_30d_max * 0.75, 'date']
    if not event_dates.empty:
        duration = (event_dates.min() - opening_date).days
        event_observed = 1
    survival_data.append({
        'deposit_account_key': account_key,
        'duration': duration,
        'event_observed': event_observed
    })

survival_df = pd.DataFrame(survival_data)
survival_df = pd.merge(survival_df, account_summary[['deposit_account_key', 'deposit_class']], on='deposit_account_key')

# Fit a Kaplan-Meier model for each deposit class
plt.figure(figsize=(12, 7))
ax = plt.subplot(111)
for d_class in survival_df['deposit_class'].unique():
    subset = survival_df[survival_df['deposit_class'] == d_class]
    kmf = KaplanMeierFitter()
    kmf.fit(subset['duration'], event_observed=subset['event_observed'], label=d_class)
    kmf.plot_survival_function(ax=ax)
plt.title('Survival Function: Time to Significant Withdrawal (>25% within 30 days)')
plt.xlabel('Days Since Account Opening')
plt.ylabel('Probability of "Survival" (No large withdrawal)')
plt.grid(True)
plt.show()

# --- B) Time Series Forecasting: Total Deposit Balance ---
print("\n--- Model B: Time Series Forecasting with Prophet ---")
# Aggregate total deposits by day
total_deposits_ts = daily_balances_df.groupby('date')['balance'].sum().reset_index()
total_deposits_ts = total_deposits_ts.rename(columns={'date': 'ds', 'balance': 'y'})

# Fit Prophet model
model = Prophet(yearly_seasonality=True, weekly_seasonality=True, daily_seasonality=False)
model.fit(total_deposits_ts)

# Create future dataframe and predict
future = model.make_future_dataframe(periods=180)  # Forecast 180 days (~6 months)
forecast = model.predict(future)

# Plot forecast
fig = model.plot(forecast)
plt.title('Forecast of Total Deposit Balances')
plt.xlabel('Date')
plt.ylabel('Total Balance (SEK)')
plt.show()

# --- C) Monte Carlo Simulation: Stress Testing Outflows ---
print("\n--- Model C: Monte Carlo Simulation for Stress Testing ---")
# Calculate daily percentage change in total deposits
total_deposits_ts['daily_pct_change'] = total_deposits_ts['y'].pct_change().fillna(0)

# Historical parameters
mean_change = total_deposits_ts['daily_pct_change'].mean()
std_change = total_deposits_ts['daily_pct_change'].std()

# Simulation parameters
num_simulations = 1000
num_days = 90
last_balance = total_deposits_ts['y'].iloc[-1]

# Stress parameters: shift the mean daily change down by 2 historical standard deviations and
# scale volatility by 1.5. The shifted mean applies on every simulated day, so it compounds.
stress_mean_change = mean_change - (2 * std_change)
stress_std_change = std_change * 1.5

# Draw all daily shocks at once from a normal distribution with the STRESSED parameters;
# each path's balance is the starting balance times the cumulative product of (1 + shock)
shocks = np.random.normal(stress_mean_change, stress_std_change, size=(num_simulations, num_days))
simulation_results = last_balance * np.cumprod(1 + shocks, axis=1)

# Plot results
plt.figure(figsize=(12, 7))
plt.plot(simulation_results.T, color='grey', alpha=0.1)
plt.plot(simulation_results.mean(axis=0), color='red', linewidth=2, label='Mean Stressed Path')
plt.axhline(y=last_balance, color='blue', linestyle='--', label='Starting Balance')
plt.title(f'Monte Carlo Simulation of Deposit Outflows under Stress ({num_days} days)')
plt.xlabel('Days from Today')
plt.ylabel('Projected Total Deposit Balance (SEK)')
plt.legend()
plt.grid(True)
plt.show()

# Quantify stressed outflow
final_stressed_balances = simulation_results[:, -1]
percentile_5 = np.percentile(final_stressed_balances, 5)
stressed_outflow = last_balance - percentile_5
print(f"Starting Balance: {last_balance:,.0f} SEK")
print(f"5th Percentile Stressed Balance after {num_days} days: {percentile_5:,.0f} SEK")
print(f"Stressed Outflow at the 95% level: {stressed_outflow:,.0f} SEK ({stressed_outflow/last_balance:.2%})")
```
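Model C works in two steps: compound one normally distributed daily change per simulated day, then read off the 5th percentile of the final balances. Both steps can be reproduced without numpy. In this sketch the starting balance and the historical mean and standard deviation are hypothetical placeholders for the values computed from `total_deposits_ts`. `percentile` reimplements the linear interpolation that `np.percentile` uses by default. Both helper names are illustrative.

```python
import random


def simulate_final_balances(start, mu, sigma, days, n_paths, seed=0):
    """Compound `days` independent N(mu, sigma) daily % changes per path; return final balances."""
    rng = random.Random(seed)
    finals = []
    for _ in range(n_paths):
        balance = start
        for _ in range(days):
            balance *= 1 + rng.gauss(mu, sigma)
        finals.append(balance)
    return finals


def percentile(values, q):
    """q-th percentile with linear interpolation between sorted values."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


start = 1_000_000_000.0                  # hypothetical total deposits (SEK)
mean_change, std_change = 0.0002, 0.004  # hypothetical historical daily % change statistics

# Same stress as Model C: mean shifted down by 2 std devs, volatility scaled by 1.5
finals = simulate_final_balances(start, mean_change - 2 * std_change, 1.5 * std_change,
                                 days=90, n_paths=1000)
p5 = percentile(finals, 5)
print(f"5th-percentile balance: {p5:,.0f} SEK; stressed outflow: {(start - p5) / start:.1%}")
```

With these inputs the stressed mean alone compounds to (1 - 0.0078)^90 ≈ 0.49, roughly half the starting balance after 90 days. This shows how severe a 2-standard-deviation shift becomes when it is applied on every simulated day.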
6. Segmentation Analysis
Finally, we aggregate the account-level metrics by customer age group, balance tier, region and deposit class to compare stability across segments.
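Segment assignment depends on which side of each bin edge is closed. With its default `right=True`, `pd.cut` puts a 30-year-old into the (0, 30] bin, labeled '<30'. With `right=False` the bins are left-closed, such as [30, 45), so the same customer falls in '30-45'. The plain-Python sketch below shows left-closed binning with `bisect`, followed by a per-segment aggregation similar to the `groupby(...).agg(...)` call. The helper name and the four accounts are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def assign_bin(value, edges, labels):
    """Left-closed bins [edges[i], edges[i+1]), like pd.cut(..., right=False); None if outside."""
    i = bisect_right(edges, value) - 1
    if i < 0 or i >= len(labels):
        return None
    return labels[i]


age_edges = [0, 30, 45, 60, 150]
age_labels = ['<30', '30-45', '45-60', '60+']

# Hypothetical accounts: (customer_age, current_balance_sek, stability_ratio)
accounts = [(25, 8_000, 0.40), (30, 60_000, 0.90), (44, 20_000, 0.70), (67, 300_000, 0.95)]

# Per-segment count, total balance and sum of stability ratios (for the average)
segments = defaultdict(lambda: {'num_accounts': 0, 'total_balance_sek': 0, 'stability_sum': 0.0})
for age, balance, stability in accounts:
    seg = segments[assign_bin(age, age_edges, age_labels)]
    seg['num_accounts'] += 1
    seg['total_balance_sek'] += balance
    seg['stability_sum'] += stability

for label in age_labels:
    if label in segments:
        s = segments[label]
        avg_stability = s['stability_sum'] / s['num_accounts']
        print(label, s['num_accounts'], s['total_balance_sek'], f"{avg_stability:.2f}")
```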
```python
# --- 6. Segmentation Analysis ---
print("\n--- Segmentation Analysis ---")
# Create segments in the summary dataframe
# Customer Age Bins (right=False makes the bins left-closed, e.g. [30, 45), to match the labels)
age_bins = [0, 30, 45, 60, 150]
age_labels = ['<30', '30-45', '45-60', '60+']
account_summary['customer_age_group'] = pd.cut(account_summary['customer_age'], bins=age_bins, labels=age_labels, right=False)
# Balance Size Tiers (left-closed for the same reason)
balance_bins = [-np.inf, 10000, 50000, 250000, np.inf]
balance_labels = ['<10k', '10k-50k', '50k-250k', '250k+']
account_summary['balance_tier'] = pd.cut(account_summary['current_balance_sek'], bins=balance_bins, labels=balance_labels, right=False)
# Geographic Regions (Simplified using the first digit of the Swedish postal code)
account_summary['region'] = 'Other'
account_summary.loc[account_summary['customer_zip_code'].astype(str).str.startswith('1'), 'region'] = 'Stockholm'
account_summary.loc[account_summary['customer_zip_code'].astype(str).str.startswith('4'), 'region'] = 'Göteborg'
account_summary.loc[account_summary['customer_zip_code'].astype(str).str.startswith('2'), 'region'] = 'Malmö'

# Perform aggregation by segments
segmentation_cols = ['customer_age_group', 'balance_tier', 'region', 'deposit_class']
for segment in segmentation_cols:
    print(f"\n--- Analysis by {segment} ---")
    # observed=True keeps only the categories that actually occur in the data
    segment_analysis = account_summary.groupby(segment, observed=True).agg(
        num_accounts=('deposit_account_key', 'count'),
        total_balance_sek=('current_balance_sek', 'sum'),
        avg_stability_ratio=('stability_ratio', 'mean'),
        avg_volatility=('balance_volatility', 'mean'),
        avg_tenure_months=('tenure_months', 'mean')
    ).sort_values('total_balance_sek', ascending=False)
    # Format for readability (after sorting, because formatting turns the numbers into strings)
    segment_analysis['total_balance_sek'] = segment_analysis['total_balance_sek'].apply(lambda x: f'{x:,.0f}')
    segment_analysis['avg_stability_ratio'] = segment_analysis['avg_stability_ratio'].apply(lambda x: f'{x:.2f}')
    segment_analysis['avg_volatility'] = segment_analysis['avg_volatility'].apply(lambda x: f'{x:.2f}')
    segment_analysis['avg_tenure_months'] = segment_analysis['avg_tenure_months'].apply(lambda x: f'{x:.1f}')
    print(segment_analysis)
```
Conclusion and How to Use the Model
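This pipeline gives the bank a quantitative view of the stability of its retail deposit base.
Core vs. Non-Core Classification: The deposit_class label is the main output and feeds the uses below.
Regulatory Reporting: The classification can inform liquidity metrics such as the Liquidity Coverage Ratio (LCR) and Net Stable Funding Ratio (NSFR). In both, more stable retail deposits receive more favorable treatment: lower run-off rates in the LCR and higher available stable funding factors in the NSFR. The regulatory buckets (stable vs. less stable retail deposits) have their own eligibility criteria, such as deposit-insurance coverage and an established customer relationship. The internal classes therefore have to be mapped to these buckets rather than used directly.
Balance Sheet Optimization: Knowing the size and stability of the core deposit base lets the bank fund more longer-term lending with these deposits, which can improve its net interest margin (NIM).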
Behavioral Maturity:
The vintage curves and decay analysis quantify how long deposits "stick" with the bank. This behavioral view is more informative than contractual maturity: retail demand deposits can be withdrawn at any time, so their contractual maturity is effectively overnight.
The "permanent" balance floor estimate provides a conservative base level of funding the bank can rely on, even during periods of deposit runoff.
Statistical Models:
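Survival Analysis: Estimates, for each deposit class, the probability that an account has not yet had a large withdrawal at a given account age. Comparing these curves shows which types of accounts are most prone to large withdrawals, which can inform targeted customer retention strategies.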
Time Series Forecasting: Provides a baseline forecast for deposit levels, essential for short-term liquidity planning and cash management.
Monte Carlo Simulation: This is the key tool for stress testing. By simulating stressed scenarios, the bank can quantify potential deposit outflows and check that its liquidity buffer (e.g., its holdings of High-Quality Liquid Assets, HQLA) is sufficient to withstand a crisis.
Segmentation Analysis:
This analysis reveals which customer segments are the most stable. For instance, the model might show that older customers in the Stockholm region with mid-sized balances are the most stable.
Marketing & Product Development: The bank can target marketing campaigns to attract more customers from these stable segments or develop products that appeal to them. Conversely, it can be cautious about becoming overly reliant on funding from less stable segments.
By integrating these outputs, the bank's Treasury and ALM departments can make more informed decisions, optimize their balance sheet structure, reduce liquidity risk, and satisfy regulatory requirements more effectively.
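To make the LCR point concrete: the Basel III LCR standard assigns retail deposits in the stable bucket a minimum 30-day run-off rate of 5%, and less stable retail deposits at least 10%. National rules can be stricter. The sketch below computes a retail outflow from balances per deposit_class. The mapping from this notebook's classes to the regulatory buckets is a hypothetical assumption for illustration only. Actual eligibility for the stable bucket depends on criteria such as deposit-insurance coverage and the customer relationship, not on these behavioral metrics. The balances and helper names are also hypothetical.

```python
# Minimum Basel III LCR run-off rates for retail deposits (national rules may set higher rates;
# a 3% stable rate is allowed only when extra deposit-insurance conditions are met)
RUN_OFF = {'stable': 0.05, 'less_stable': 0.10}

# HYPOTHETICAL mapping from deposit_class to regulatory buckets, for illustration only
CLASS_TO_BUCKET = {
    'Core - Highly Stable': 'stable',
    'Core - Stable': 'stable',
    'Semi-Core': 'less_stable',
    'Non-Core': 'less_stable',
}


def retail_outflow(balances_by_class):
    """Stressed 30-day outflow: sum of balance x run-off rate of each class's bucket."""
    return sum(balance * RUN_OFF[CLASS_TO_BUCKET[cls]] for cls, balance in balances_by_class.items())


# Hypothetical total balances (SEK) per deposit_class
balances = {
    'Core - Highly Stable': 400e6,
    'Core - Stable': 300e6,
    'Semi-Core': 200e6,
    'Non-Core': 100e6,
}
outflow = retail_outflow(balances)
print(f"Retail deposit outflow: {outflow:,.0f} SEK ({outflow / sum(balances.values()):.1%} of balances)")
```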