#ML, #personal, #school

NBA Machine Learning Project

Published May 22, 2025 by Noah Cauchi


NBA High-Scorer Prediction - ML Pipeline

*Timeline: Spring 2025 (1 month) | Solo Project | Madrid, Spain*

🎯 Project Overview

What: Built an ML pipeline that classifies high-scoring NBA players with an F1 score of 95% or higher, using 2023-2024 season data.

Why: Wanted to build a foundation in machine learning through a real-world application, combining my interest in basketball with structured statistical data.

Impact: Successfully predicted both who the high scorers are (classification) and how many points players score (regression), while gaining hands-on experience with ML models and classification techniques.

Figure: Model Error (lower is better)

Figure: Model F1 Score (higher is better)
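For readers new to the metric: the F1 score is the harmonic mean of precision and recall, so it only gets high when the model both finds most high scorers and rarely mislabels everyone else. Here is a minimal sketch of the calculation in plain Python. The labels are made up for illustration and are not the project's data.

```python
def f1_score(y_true, y_pred):
    """F1 for the positive class (label 1): harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    if tp == 0:
        return 0.0  # no correct positive predictions, so F1 is 0 by convention
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up labels: 1 = high scorer, 0 = everyone else.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
score = f1_score(y_true, y_pred)
print(f"F1 = {score:.2f}")
```

In this toy case there are 3 true positives, 1 false positive, and 1 false negative, so precision and recall are both 0.75 and so is the F1 score.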


🔧 Technical Implementation

Logistic regression performed best among the classifiers, offering both high accuracy and interpretability. Its coefficients revealed interesting patterns, such as a negative weight on blocks, suggesting a split between offensive and defensive player archetypes.

Figure: Regularized Coefficient Analysis
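To show how this kind of coefficient reading works, here is a small sketch using scikit-learn. It runs on synthetic data that I deliberately constructed so that the blocks feature pushes the label down, so it illustrates how to read the signs rather than reproducing the project's result. The feature names (`FGA`, `AST`, `BLK`) and the data-generating weights are assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_players = 500
feature_names = ["FGA", "AST", "BLK"]  # hypothetical feature set

# Synthetic per-player stats (already roughly standardized).
X = rng.normal(size=(n_players, len(feature_names)))

# Synthetic ground truth: shot attempts help, blocks hurt (by construction).
logit = 2.0 * X[:, 0] + 0.5 * X[:, 1] - 1.0 * X[:, 2]
y = (logit + rng.normal(scale=0.5, size=n_players) > 0).astype(int)

# Standardize first so coefficient magnitudes are comparable across features.
# LogisticRegression applies L2 regularization by default; C controls its strength.
model = make_pipeline(StandardScaler(), LogisticRegression(C=1.0))
model.fit(X, y)

coefs = dict(zip(feature_names, model.named_steps["logisticregression"].coef_[0]))
for name, weight in sorted(coefs.items(), key=lambda kv: kv[1]):
    print(f"{name:>4}: {weight:+.2f}")
```

A negative coefficient means that, holding the other features fixed, a higher value lowers the predicted odds of being a high scorer. That is a statement about the fitted model, not proof of a causal effect.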


🚧 Key Challenges

Data Quality: The NBA Stats API returns comprehensive but noisy data. Built a preprocessing pipeline to filter outliers (players with minimal playing time) and remove irrelevant features that could introduce bias.
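A filtering step along these lines can be sketched in a few lines of plain Python. The thresholds, the field names (`GP`, `MIN`, `PTS`, `BLK`), and the dropped columns below are assumptions for illustration, not the exact cutoffs or schema used in the project.

```python
# Hypothetical cutoffs: the actual values used in the project are not stated.
MIN_GAMES_PLAYED = 20
MIN_MINUTES_PER_GAME = 10.0

# Identifier-style columns that carry no predictive signal (assumed names).
DROP_FIELDS = {"PLAYER_ID", "TEAM_ABBREVIATION"}


def preprocess(rows):
    """Drop low-playing-time players and strip irrelevant fields."""
    cleaned = []
    for row in rows:
        # Tiny samples make per-game stats unreliable, so skip these players.
        if row["GP"] < MIN_GAMES_PLAYED or row["MIN"] < MIN_MINUTES_PER_GAME:
            continue
        cleaned.append({k: v for k, v in row.items() if k not in DROP_FIELDS})
    return cleaned


# Made-up rows shaped like per-game season stats.
raw_rows = [
    {"PLAYER_ID": 1, "TEAM_ABBREVIATION": "AAA", "GP": 70, "MIN": 34.1, "PTS": 27.3, "BLK": 0.4},
    {"PLAYER_ID": 2, "TEAM_ABBREVIATION": "BBB", "GP": 5, "MIN": 3.2, "PTS": 1.0, "BLK": 0.0},
    {"PLAYER_ID": 3, "TEAM_ABBREVIATION": "CCC", "GP": 62, "MIN": 8.5, "PTS": 3.1, "BLK": 0.9},
    {"PLAYER_ID": 4, "TEAM_ABBREVIATION": "DDD", "GP": 58, "MIN": 25.0, "PTS": 9.8, "BLK": 1.7},
]

clean_rows = preprocess(raw_rows)
print(len(clean_rows), "of", len(raw_rows), "players kept")
```

Filtering before training matters because a player with three minutes per game can post extreme per-game numbers that say little about scoring ability.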

Model Selection: Systematically compared three approaches using cross-validation, for both classification and regression. Logistic regression and elastic net regression emerged as the best classifier and regressor, respectively, balancing accuracy with interpretability.
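The comparison loop can be sketched with scikit-learn's `cross_val_score`. The data here is synthetic (from `make_classification`), and the decision tree and k-nearest-neighbors candidates are stand-ins I picked for illustration, since only logistic regression is named above as a classifier.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the player feature matrix and high-scorer labels.
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=4, random_state=0
)

# Three candidate classifiers (the last two are assumed, not from the project).
candidates = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "decision_tree": DecisionTreeClassifier(max_depth=4, random_state=0),
    "k_nearest_neighbors": make_pipeline(
        StandardScaler(), KNeighborsClassifier(n_neighbors=5)
    ),
}

# Same stratified folds for every model so the scores are comparable.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

mean_f1 = {
    name: cross_val_score(estimator, X, y, cv=cv, scoring="f1").mean()
    for name, estimator in candidates.items()
}

best_model = max(mean_f1, key=mean_f1.get)
for name, score in mean_f1.items():
    print(f"{name:>20}: mean F1 = {score:.3f}")
print("best:", best_model)
```

Putting the scaler inside each pipeline keeps it fitted only on the training folds, which avoids leaking test-fold statistics into the model. The same loop works for regressors by swapping in estimators such as `ElasticNet` and a regression metric such as `scoring="neg_mean_squared_error"`.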


📊 Results & Impact


💡 Key Takeaways

Technical Skills: End-to-end ML pipeline development, classification algorithms, feature engineering, dimensionality reduction, API integration, data preprocessing

Professional Skills: Self-directed learning, systematic experimentation, model evaluation methodology, completing technical projects independently



*****

© 2025, Noah Cauchi