Published May 22, 2025 by Noah Cauchi
| *Timeline: Spring 2025 (1 month) | Solo Project | Madrid, Spain* |
What: Built an ML pipeline that predicts high-scoring NBA players with an F1 score above 95%, using 2023-2024 season data.
Why: Wanted to build a foundation in machine learning through a real-world application, combining my interest in basketball with structured statistical data.
Impact: Successfully predicted points scored while gaining hands-on experience with ML models and classification techniques.

Logistic regression performed best, offering both high accuracy and interpretability. The model also revealed interesting patterns, such as a negative association between blocks and high scoring, suggesting a split between offensive and defensive player archetypes.
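To illustrate the interpretability point, here is a minimal sketch of how logistic regression coefficients can be read. It does not use the project's data: the values below are synthetic, the feature names (`FGA`, `BLK`, `AST`) are assumptions, and the label is deliberately constructed so that blocks count against being a high scorer.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 400

# Synthetic per-game stats for illustration only (NOT real NBA data).
fga = rng.normal(10.0, 4.0, n)   # field goal attempts
blk = rng.normal(0.6, 0.4, n)    # blocks
ast = rng.normal(3.0, 1.5, n)    # assists (unrelated to the label here)

# Hypothetical label: more shot attempts and fewer blocks make
# "high scorer" more likely, plus some noise.
latent = 0.8 * (fga - 10.0) - 3.0 * (blk - 0.6) + rng.normal(0.0, 1.0, n)
y = (latent > 0).astype(int)

X = np.column_stack([fga, blk, ast])
feature_names = ["FGA", "BLK", "AST"]

# Standardizing first puts the coefficients on a comparable scale.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X, y)

coefs = dict(zip(feature_names, model.named_steps["logisticregression"].coef_[0]))
for name, value in coefs.items():
    print(f"{name}: {value:+.2f}")
```

A negative coefficient on a standardized feature means that higher values of that feature lower the predicted odds of the positive class, holding the other features fixed.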

Data Quality: The NBA Stats API returns comprehensive but noisy data. Built a preprocessing pipeline to filter outliers (players with minimal playing time) and remove irrelevant features that could introduce bias.
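A preprocessing step along these lines could look like the sketch below. The column names (`GP`, `MIN`, `PLAYER_ID`, `PLAYER_NAME`), the thresholds, and the sample rows are all assumptions made for illustration, not the values used in the project.

```python
import pandas as pd

def preprocess(df, min_games=10, min_minutes=10.0,
               drop_cols=("PLAYER_ID", "PLAYER_NAME")):
    """Keep players with meaningful playing time and drop identifier columns.

    Thresholds and column names are hypothetical placeholders.
    """
    enough_time = (df["GP"] >= min_games) & (df["MIN"] >= min_minutes)
    filtered = df.loc[enough_time]
    # Identifiers carry no predictive signal and can leak player identity.
    present = [c for c in drop_cols if c in filtered.columns]
    return filtered.drop(columns=present).reset_index(drop=True)

# Made-up rows standing in for an API response.
raw = pd.DataFrame({
    "PLAYER_ID": [1, 2, 3, 4],
    "PLAYER_NAME": ["Player A", "Player B", "Player C", "Player D"],
    "GP": [70, 3, 55, 40],           # games played
    "MIN": [34.1, 28.0, 4.2, 22.5],  # minutes per game
    "PTS": [25.3, 18.0, 1.1, 9.8],   # points per game
})

clean = preprocess(raw)
print(clean)
```

Here Player B is dropped for too few games and Player C for too few minutes, so a handful of unrepresentative appearances cannot skew the model.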
Model Selection: Systematically compared three approaches using cross-validation for both classification and regression. Logistic regression and elastic net regression emerged as the best classifier and regressor, respectively, balancing accuracy with interpretability.
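The comparison described above can be sketched with scikit-learn's `cross_val_score`. This is an illustration under assumptions: the data comes from `make_classification` rather than NBA stats, and the k-nearest-neighbors and random forest candidates are stand-ins, since only logistic regression is named among the classifiers compared.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary classification data standing in for player stats.
X, y = make_classification(n_samples=300, n_features=8,
                           n_informative=4, random_state=0)

# Candidate models; the last two are illustrative stand-ins.
candidates = {
    "logistic_regression": make_pipeline(StandardScaler(),
                                         LogisticRegression(max_iter=1000)),
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Same stratified folds for every model, so scores are comparable.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

scores = {
    name: cross_val_score(model, X, y, cv=cv, scoring="f1").mean()
    for name, model in candidates.items()
}
best = max(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: mean F1 = {score:.3f}")
print("best:", best)
```

The regression side could follow the same pattern with `ElasticNet` from `sklearn.linear_model`, a plain `KFold` splitter, and a regression metric such as `scoring="r2"`.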
Technical Skills: End-to-end ML pipeline development, classification algorithms, feature engineering, dimensionality reduction, API integration, data preprocessing
Professional Skills: Self-directed learning, systematic experimentation, model evaluation methodology, completing technical projects independently