Completed, 2025

Nepal Earthquake Building Damage Predictor

Tech Stack

Python, LightGBM, Scikit-learn, Streamlit, ML, Pandas

Key Results

72.5% accuracy

0.880 ROC AUC

260k+ records

Published paper

About This Project

This project predicts earthquake-induced building damage in Nepal from data on the 2015 Gorkha earthquake. It combines a dataset of more than 260,000 building records from the DrivenData competition with earthquake parameters and evaluates several classification models: Logistic Regression, LightGBM, Random Forest, and Support Vector Machines.
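The model comparison described above can be sketched roughly as follows. This is a minimal illustration rather than the project's actual code: the data is synthetic, the three classes stand in for three assumed damage grades, and the one-vs-rest ROC AUC averaging is an assumption, since the averaging scheme behind the reported score is not stated here. LightGBM is left out so the sketch runs with scikit-learn alone; `lightgbm.LGBMClassifier` follows the same `fit`/`predict_proba` interface and could be added to the `models` dictionary.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the building records (assumption: 3 damage grades).
X, y = make_classification(
    n_samples=1500, n_features=12, n_informative=6, n_classes=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Candidate models; hyperparameters here are illustrative defaults.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "SVM": SVC(probability=True, random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    proba = model.predict_proba(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, model.predict(X_test)),
        # Multi-class ROC AUC, macro-averaged over one-vs-rest problems.
        "roc_auc": roc_auc_score(y_test, proba, multi_class="ovr"),
    }

for name, scores in results.items():
    print(f"{name}: accuracy={scores['accuracy']:.3f}, roc_auc={scores['roc_auc']:.3f}")
```

Scoring every model on the same held-out split keeps the accuracy and ROC AUC figures directly comparable.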

The preprocessing pipeline scaled numerical features and encoded categorical features with both One-Hot Encoding and Ordinal Encoding. Hyperparameter tuning with RandomizedSearchCV identified an optimized LightGBM model as the top performer.
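A pipeline of this shape might look like the sketch below. Everything specific in it is an assumption made for illustration: the column names, the split of categorical columns between the two encoders, the category order, the search space, and the scoring metric are all hypothetical. `GradientBoostingClassifier` is used as a stand-in so the example runs with scikit-learn alone; it is not the project's LightGBM model.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder, StandardScaler

# Small synthetic frame; column names and values are hypothetical.
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "age": rng.integers(0, 100, n),
    "area_percentage": rng.integers(1, 30, n),
    "roof_type": rng.choice(["n", "q", "x"], n),
    "condition": rng.choice(["poor", "fair", "good"], n),
})
y = rng.integers(1, 4, n)  # assumed damage grades 1..3

numeric_cols = ["age", "area_percentage"]
nominal_cols = ["roof_type"]   # unordered categories: one-hot encoded
ordinal_cols = ["condition"]   # ordered categories: ordinal encoded

preprocessor = ColumnTransformer(
    [
        ("num", StandardScaler(), numeric_cols),
        ("onehot", OneHotEncoder(handle_unknown="ignore"), nominal_cols),
        ("ordinal", OrdinalEncoder(categories=[["poor", "fair", "good"]]), ordinal_cols),
    ],
    sparse_threshold=0,  # always return a dense array
)

pipeline = Pipeline([
    ("prep", preprocessor),
    ("model", GradientBoostingClassifier(random_state=0)),  # stand-in estimator
])

# Hypothetical search space; keys are prefixed with the pipeline step name.
param_distributions = {
    "model__n_estimators": [25, 50, 100],
    "model__learning_rate": [0.01, 0.05, 0.1, 0.2],
    "model__max_depth": [2, 3, 4],
}

search = RandomizedSearchCV(
    pipeline, param_distributions, n_iter=4, cv=3, scoring="accuracy", random_state=0
)
search.fit(df, y)
print(search.best_params_, round(search.best_score_, 3))
```

Placing the preprocessor inside the pipeline means the scaler and encoders are refit on each cross-validation training fold, so the search does not leak statistics from the validation fold. To tune LightGBM instead, the `model` step could be replaced with `lightgbm.LGBMClassifier()` and the search space adjusted to its parameters, such as `model__num_leaves`.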

Feature analysis consistently identified the geographic location identifiers (geo_level_id) as the dominant predictors. An interactive Streamlit application lets users explore predictions by adjusting building features, compare model performance, and view feature importances.
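One common way to obtain such a ranking is the `feature_importances_` attribute of a fitted tree ensemble, sketched below. This is an illustration under assumptions, not the project's analysis: the data is synthetic, the column names (including `geo_level_1_id`) are hypothetical, and the damage grade is made to depend on the region identifier by construction, so that identifier ranking first is built in. A Random Forest is used here; LightGBM's scikit-learn wrapper exposes an attribute of the same name.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Synthetic features; all column names are hypothetical.
rng = np.random.default_rng(0)
n = 2000
X = pd.DataFrame({
    "geo_level_1_id": rng.integers(0, 30, n),
    "age": rng.integers(0, 100, n),
    "count_floors": rng.integers(1, 6, n),
    "area_percentage": rng.integers(1, 30, n),
})

# Synthetic rule: the grade (1..3) depends only on which band the region id falls in.
y = np.digitize(X["geo_level_1_id"].to_numpy(), bins=[10, 20]) + 1

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X, y)

# Impurity-based importances, normalized by scikit-learn to sum to 1.
importances = (
    pd.Series(model.feature_importances_, index=X.columns)
    .sort_values(ascending=False)
)
print(importances)
```

Impurity-based importances tend to favor high-cardinality features, so a check with `sklearn.inspection.permutation_importance` on held-out data can help confirm a ranking like this on real records.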

A published research paper documents the methodology, results, and findings.