I implemented the models and compared their evaluation metrics.

REGRESSION MODELS

A machine learning foundations project comparing Linear Regression, Decision Tree, and Random Forest with proper evaluation instead of button-click ML.

Project Type

Machine learning foundations project

Stack

Python, scikit-learn, NumPy, Pandas, Matplotlib

Models

Linear Regression, Decision Tree, Random Forest

Timeline

Built in 2026

Case Study

Engineering Notes

Project Overview

Regression Models is a machine learning foundations project where I compare Linear Regression, Decision Tree, and Random Forest on regression tasks. The audience is mostly me as a learner, but the project also shows how I approach ML beyond surface-level imports.

Problem / Motivation

It is easy to run scikit-learn examples and feel productive without understanding bias, variance, residuals, or why metrics disagree. I wanted a project that forced me to compare model behavior and not treat a single score like the only truth in the room.

Architecture / System Design

The workflow uses Python with NumPy, Pandas, scikit-learn, and Matplotlib. Data is loaded and cleaned, features are prepared, models are trained, predictions are generated, and evaluation metrics are compared. Visualizations help inspect prediction quality instead of relying only on terminal output.

The structure stays intentionally simple: dataset preparation, model training, evaluation, and comparison. This keeps the focus on the ML fundamentals rather than hiding everything behind an oversized app shell.
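The four stages above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the dataset is a synthetic stand-in from make_regression, and the split ratio, seeds, and n_estimators value are assumptions chosen only to make the example run.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Dataset preparation: synthetic data standing in for the real dataset (assumption).
X, y = make_regression(n_samples=500, n_features=5, noise=15.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Model training: the three models named in this project.
models = {
    "Linear Regression": LinearRegression(),
    "Decision Tree": DecisionTreeRegressor(random_state=0),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

# Evaluation: score every model on the same held-out test set.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "MSE": mean_squared_error(y_test, pred),
        "MAE": mean_absolute_error(y_test, pred),
        "R2": r2_score(y_test, pred),
    }

# Comparison: one line per model so the metrics can be read side by side.
for name, scores in results.items():
    print(
        f"{name:18s} MSE={scores['MSE']:.1f} "
        f"MAE={scores['MAE']:.1f} R2={scores['R2']:.3f}"
    )
```

Because the synthetic target is linear by construction, Linear Regression has an unfair advantage here; on a real dataset the ranking may look quite different.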

Key Features

The project is small, but it is built to make the fundamentals visible.

Linear Regression baseline implementation.
Decision Tree and Random Forest comparison.
MSE, MAE, and R² evaluation.
Data inspection and visualization with Pandas and Matplotlib.
Notes on OLS, loss geometry, and gradient descent concepts.
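To make the OLS and gradient descent item concrete, here is a small NumPy sketch that fits the same linear model two ways: once with a least-squares solve and once by gradient descent on the mean squared error. The data, true weights, learning rate, and step count are all assumptions picked for illustration. Both routes should land on nearly the same parameters, since the MSE loss for a linear model is a convex bowl with a single minimum.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Synthetic data with known parameters (assumption, for illustration only).
X = rng.normal(size=(n, 2))
true_w = np.array([3.0, -2.0])
true_b = 0.5
y = X @ true_w + true_b + rng.normal(scale=0.1, size=n)

# Prepend a column of ones so the intercept is learned as one more weight.
A = np.column_stack([np.ones(n), X])

# OLS: least-squares solution of A @ theta ≈ y.
theta_ols, *_ = np.linalg.lstsq(A, y, rcond=None)

# Gradient descent on the MSE loss L(theta) = mean((A @ theta - y) ** 2).
theta_gd = np.zeros(A.shape[1])
learning_rate = 0.1
for _ in range(1000):
    residuals = A @ theta_gd - y
    gradient = (2.0 / n) * (A.T @ residuals)  # dL/dtheta
    theta_gd -= learning_rate * gradient

print("OLS:             ", np.round(theta_ols, 3))
print("Gradient descent:", np.round(theta_gd, 3))
```

The least-squares solve gets there in one step; gradient descent takes many small steps but generalizes to losses that have no closed-form solution.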

Technical Challenges

The challenge is not just running the models; it is interpreting them honestly. Different metrics can tell different stories, and tree-based models can look better while hiding overfitting risk. ML has enough traps. No need to add self-delusion on top.
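One way to see the overfitting risk is to score the same tree on the data it was trained on and on held-out data. The sketch below is an assumption-laden illustration: a noisy sine curve as data, an unpruned DecisionTreeRegressor, and a depth-3 tree for contrast. None of these settings come from the project itself.

```python
import numpy as np
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Noisy sine curve: the noise is something a model should NOT memorize.
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.5, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Unpruned tree: free to grow until every training point has its own leaf.
deep_tree = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)
train_r2 = r2_score(y_train, deep_tree.predict(X_train))
test_r2 = r2_score(y_test, deep_tree.predict(X_test))

# Depth-limited tree for contrast (max_depth=3 is an arbitrary choice).
shallow_tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X_train, y_train)
shallow_train_r2 = r2_score(y_train, shallow_tree.predict(X_train))
shallow_test_r2 = r2_score(y_test, shallow_tree.predict(X_test))

print(f"unpruned tree: train R2={train_r2:.3f} test R2={test_r2:.3f}")
print(f"depth-3 tree:  train R2={shallow_train_r2:.3f} test R2={shallow_test_r2:.3f}")
```

A near-perfect training score paired with a much lower test score is the signature to watch for; reporting only the training number would make the unpruned tree look better than it is.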

Solutions / Engineering Decisions

I used a direct notebook/script style because this is a learning project. I kept the pipeline readable and metric-focused, then compared models through actual outputs instead of dressing the project up as a fake production system.

Outcome / Final State

The project gives me a clearer base for regression workflows and model evaluation. It is not pretending to be a deployed ML product; it is a disciplined foundation for understanding the models I will later use in bigger systems.

Machine Learning, Python, scikit-learn, NumPy, Pandas, Matplotlib

Key Capabilities

Compared Linear Regression, Decision Tree, and Random Forest models.

Evaluated models using MSE, MAE, and R².

Studied OLS derivation, loss geometry, and the path toward gradient descent.
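For reference, the three metrics named above can be written out by hand in a few lines of NumPy. The toy numbers below are invented for illustration: one set of predictions is off by 1 everywhere, the other is exact except for a single miss of 4. The helper names mse, mae, and r2 are hypothetical, not taken from the project.

```python
import numpy as np

y_true = np.array([10.0, 12.0, 14.0, 16.0, 18.0])
pred_a = np.array([11.0, 13.0, 15.0, 17.0, 19.0])  # off by 1 on every point
pred_b = np.array([10.0, 12.0, 14.0, 16.0, 22.0])  # exact except one miss of 4


def mse(y, p):
    """Mean squared error: squaring makes large misses dominate."""
    return float(np.mean((y - p) ** 2))


def mae(y, p):
    """Mean absolute error: every unit of error counts the same."""
    return float(np.mean(np.abs(y - p)))


def r2(y, p):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y - p) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return float(1.0 - ss_res / ss_tot)


for label, pred in [("A", pred_a), ("B", pred_b)]:
    print(
        f"{label}: MSE={mse(y_true, pred):.2f} "
        f"MAE={mae(y_true, pred):.2f} R2={r2(y_true, pred):.3f}"
    )
```

On these numbers MAE prefers B (0.8 vs 1.0) while MSE and R² prefer A (MSE 1.0 vs 3.2, R² 0.875 vs 0.6). Which one is "right" depends on whether a single large miss is worse than many small ones for the problem at hand.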