Wine Quality

— What makes a wine "premium"?

Which physicochemical properties are most strongly related to wine quality?

full version available at GitHub

Executive Summary

What makes a wine "premium"? This project explores the chemical landscape of 6,497 red and white wine samples (UCI Machine Learning Repository). Using linear regression with heteroscedasticity-robust standard errors and wine-type interaction terms, I identified the key chemical drivers of quality and examined how these drivers behave differently across wine types.


Key Insight: Alcohol content is the strongest predictor of quality, while the impact of volatile acidity and sulfur dioxide is significantly moderated by the wine type (Red vs. White).

Exploratory Data Analysis

Before modeling, I investigated the distribution of chemical attributes. Initial observations revealed significant right-skewness in variables like Residual Sugar and Chlorides, necessitating logarithmic transformations to stabilize variance and linearize relationships.
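The effect of a log transform on a right-skewed attribute can be illustrated with a minimal sketch. It assumes NumPy and uses synthetic lognormal draws as a stand-in for a variable such as residual sugar; the numbers are not taken from the wine dataset, and the helper `skewness` is a hypothetical name introduced only for this illustration.

```python
import numpy as np

def skewness(x):
    """Sample skewness: the third standardized moment (population form)."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    return (centered**3).mean() / (centered**2).mean() ** 1.5

# Synthetic stand-in for a right-skewed attribute (NOT real wine data).
rng = np.random.default_rng(0)
residual_sugar = rng.lognormal(mean=1.0, sigma=0.8, size=5000)

# The log is only defined for strictly positive values, which holds here.
log_residual_sugar = np.log(residual_sugar)

skew_before = skewness(residual_sugar)
skew_after = skewness(log_residual_sugar)
print(f"skewness before: {skew_before:.2f}, after log: {skew_after:.2f}")
```

A skewness near zero after the transform suggests the variable is closer to symmetric, which is the property the transformation is meant to deliver before fitting a linear model.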

Model & Feature Engineering

The modeling process progressed from a baseline pooled OLS, which fits a single set of coefficients to red and white wines together, to an interaction model (Model D).

  • Variable Transformation: Applied log(x) to residual sugar, chlorides, and free sulfur dioxide to address non-linearity.

  • Interaction Modeling: I included interaction terms between wine type and several chemical properties to account for the systematic differences between red and white wines.
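An interaction term lets the slope of a chemical property differ by wine type. The sketch below shows the idea under stated assumptions: it uses pandas and the statsmodels formula API (the tooling behind the original analysis is not stated), the column names `alcohol`, `volatile_acidity`, `is_red`, and `quality` are hypothetical, and the data and coefficients are synthetic rather than results from this project.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data (NOT the wine dataset); is_red is a 0/1 wine-type indicator.
rng = np.random.default_rng(42)
n = 2000
df = pd.DataFrame({
    "alcohol": rng.normal(10.5, 1.2, n),
    "volatile_acidity": rng.uniform(0.1, 1.0, n),
    "is_red": rng.integers(0, 2, n),
})

# Made-up generating process: the volatile acidity slope is -1 for white
# wines and -3 for red wines, so the true interaction coefficient is -2.
df["quality"] = (
    2.5
    + 0.3 * df["alcohol"]
    - 1.0 * df["volatile_acidity"]
    - 2.0 * df["volatile_acidity"] * df["is_red"]
    + rng.normal(0, 0.5, n)
)

# "a * b" expands to a + b + a:b, i.e. both main effects plus the interaction.
model = smf.ols("quality ~ alcohol + volatile_acidity * is_red", data=df).fit()

slope_white = model.params["volatile_acidity"]
slope_red = slope_white + model.params["volatile_acidity:is_red"]
print(f"volatile acidity slope, white: {slope_white:.2f}, red: {slope_red:.2f}")
```

The main-effect coefficient is the slope for the reference group (white, `is_red = 0`), and adding the interaction coefficient gives the slope for red wines. This is how a single pooled model can still report type-specific effects.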

Advanced Diagnostics & Robustness

To ensure the integrity of the inferences, I performed rigorous diagnostics.

  • Heteroscedasticity: Detected using the Breusch-Pagan test. Because non-constant error variance biases the conventional OLS standard errors, I reported HC3 robust standard errors, so that p-values and confidence intervals remain valid. The coefficient estimates themselves are unchanged by this correction.

  • Influence Analysis: Used Cook’s Distance and Leverage plots to check that no single observation disproportionately skewed the results.

Findings

  • Alcohol: The most influential positive factor. Higher alcohol consistently correlates with higher quality ratings.

  • Volatile Acidity: Generally a "quality killer," but its negative impact is significantly more pronounced in Red Wine than in White Wine.

  • Sulfur Dioxide: The interaction model indicated that the association between sulfur dioxide and quality differs by wine type: White Wines are sensitive to sulfur levels, while Red Wines respond differently to these preservatives.

This project demonstrates that while chemistry provides the building blocks of wine quality, the relationship is not "one size fits all." Allowing the chemical effects to differ by wine type reveals the nuanced balance required for a high-quality vintage.

