Data Science
Platform: Analytics Vidhya
Loan Predictions

1. Context & Objective
Dream Housing Finance wants to automate its loan eligibility process based on customer details. This project builds a classifier that predicts loan approval in real time.
2. Methodology
1. Imputed missing values using mode/median strategies.
2. Applied log transformation to handle income skewness.
3. Used SMOTE to handle class imbalance.
4. Trained an XGBoost classifier for final predictions.
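Steps 2 and 3 above can be illustrated with a minimal pure-Python sketch. It is not the project's code: the income values are made up, and `skewness` and `smote_point` are hypothetical helper names. The first helper measures how lopsided a distribution is; the second shows only the interpolation idea behind SMOTE, where a synthetic minority sample is placed somewhere on the line between a real minority sample and one of its neighbours (the real algorithm also performs the nearest-neighbour search).

```python
import math
import random


def skewness(values):
    """Population skewness: mean cubed deviation divided by std cubed."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return sum((v - mean) ** 3 for v in values) / n / var ** 1.5


def smote_point(x, neighbor, rng):
    """Interpolate a synthetic sample between a minority point and a neighbor."""
    lam = rng.random()  # random position along the segment, in [0, 1)
    return [a + lam * (b - a) for a, b in zip(x, neighbor)]


# Illustrative incomes (made up): mostly modest values with a long right tail
incomes = [1500, 2500, 3000, 3500, 4000, 5000, 6500, 9000, 15000, 40000]
log_incomes = [math.log1p(v) for v in incomes]  # log1p also tolerates zeros

skew_raw = skewness(incomes)
skew_log = skewness(log_incomes)
print(f"skew before: {skew_raw:.2f}, after log: {skew_log:.2f}")

# One synthetic point between two hypothetical minority-class samples
rng = random.Random(42)
synthetic = smote_point([1.0, 10.0], [3.0, 20.0], rng)
print("synthetic sample:", synthetic)
```

On this toy data the log transform pulls the long right tail in, so the skewness drops noticeably. The synthetic point always lies between the two inputs on every feature.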
In [1]:
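import numpy as np
import pandas as pd
from imblearn.over_sampling import SMOTE
from xgboost import XGBClassifier
# Drop the applicant ID if present: it is an identifier, not a predictive feature
df = pd.read_csv('loan_train.csv').drop(columns='Loan_ID', errors='ignore')
# Impute missing values: median for numeric columns, mode for categorical ones
for col in df.columns:
    fill = df[col].median() if pd.api.types.is_numeric_dtype(df[col]) else df[col].mode()[0]
    df[col] = df[col].fillna(fill)
# Log-transform skewed amounts (income column names assume the standard dataset schema)
for col in ['ApplicantIncome', 'CoapplicantIncome', 'LoanAmount']:
    df[col + '_log'] = np.log1p(df[col])  # log1p is safe for zero incomes
X = pd.get_dummies(df.drop('Loan_Status', axis=1), dtype=int)
y = df['Loan_Status'].map({'Y': 1, 'N': 0})  # XGBoost expects numeric labels
X_res, y_res = SMOTE(random_state=42).fit_resample(X, y)
model = XGBClassifier(eval_metric='logloss')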
model.fit(X_res, y_res)
3. Final Learnings
Log-transforming the skewed income data improved generalization. SMOTE kept the model from simply predicting 'Approve' for every applicant, and the final model reached a 78% F1-score.
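To see why an "approve everyone" model is worth guarding against, and why it matters which F1 is reported, here is a small pure-Python sketch. The labels are made up, `f1_for_class` is a hypothetical helper, and the write-up does not say which class or averaging the 78% figure refers to, so this only illustrates the metric and does not reproduce that result.

```python
def f1_for_class(y_true, y_pred, positive):
    """F1 score treating `positive` as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no correct positive predictions, so F1 is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels (made up): 7 approvals ('Y') and 3 rejections ('N')
y_true = ['Y'] * 7 + ['N'] * 3
always_approve = ['Y'] * 10  # a model that approves every applicant

f1_approve = f1_for_class(y_true, always_approve, 'Y')
f1_reject = f1_for_class(y_true, always_approve, 'N')
macro_f1 = (f1_approve + f1_reject) / 2

print(f"F1 (approve class): {f1_approve:.3f}")
print(f"F1 (reject class):  {f1_reject:.3f}")
print(f"Macro-averaged F1:  {macro_f1:.3f}")
```

The always-approve baseline scores well on the majority class while never identifying a rejection, so a per-class or macro-averaged F1 is the more informative check that resampling actually helped.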
Dataset details
Language: Python
Size: 614 rows (Training)
Libraries Used: Pandas, Scikit-Learn, XGBoost