Unmasking Model Behavior: How ICE Plots Reveal Individual Prediction Variations

Introduction

In the world of machine learning, we often celebrate global model performance—the accuracy, the F1-score, or the mean squared error. But for practitioners working in high-stakes environments like healthcare, finance, or credit lending, a “good model” is not enough. You need to understand how the model behaves on a granular level. When a loan application is rejected or a medical diagnosis is issued, the question isn’t just “how accurate is the model?” but “why did the model reach this specific conclusion for this specific person?”

This is where Partial Dependence Plots (PDPs) often fall short. While PDPs show the average effect of a feature on a prediction, they can hide critical nuances. Individual Conditional Expectation (ICE) plots serve as a powerful diagnostic tool, breaking down those averages to show how a single feature influences the prediction for every individual instance in your dataset. By visualizing these variations, you can uncover hidden patterns, model biases, and complex interactions that aggregate statistics simply cannot reveal.

Key Concepts

To understand ICE plots, we must first recognize the limitation of their predecessor: the Partial Dependence Plot (PDP). A PDP shows the marginal effect of one or two features on the predicted outcome of a machine learning model. For each value on a grid of the feature of interest, it sets that feature to the value in every observation, leaves the other features as they are, and averages the resulting predictions.

The problem? By taking that average, the PDP effectively “flattens” the data. If a feature has a positive effect on one segment of your population but a negative effect on another, the two effects can cancel out and the PDP may show a flat line, suggesting the feature has no influence. It masks the heterogeneity in how the model responds to that feature.
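A tiny synthetic example makes the cancellation concrete. The “model” below is a hypothetical hand-written function, not a trained model: feature x1 raises the prediction when a binary feature x2 is +1 and lowers it when x2 is -1. Averaging over rows gives a perfectly flat PDP, even though every individual row responds strongly to x1. A minimal NumPy sketch:

```python
import numpy as np

# Hypothetical stand-in for model.predict: the effect of x1 is +1 per unit
# when x2 == +1 and -1 per unit when x2 == -1 (a pure interaction).
def predict(X: np.ndarray) -> np.ndarray:
    return X[:, 0] * X[:, 1]

# Toy data: x1 uniform on [0, 10]; the first half of the rows has x2 = +1,
# the second half x2 = -1.
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([rng.uniform(0, 10, n), np.repeat([1.0, -1.0], n // 2)])

grid = np.linspace(0, 10, 11)

# For each grid value, set x1 to that value in every row and predict.
curves = np.empty((n, grid.size))   # one row of predictions per observation
for j, value in enumerate(grid):
    X_mod = X.copy()
    X_mod[:, 0] = value
    curves[:, j] = predict(X_mod)

pdp = curves.mean(axis=0)           # the PDP: average over observations

print("PDP:", pdp)                                          # zero at every grid value
print("row 0 change:", curves[0, -1] - curves[0, 0])        # +10 (x2 = +1)
print("last row change:", curves[-1, -1] - curves[-1, 0])   # -10 (x2 = -1)
```

Replacing `predict` with a fitted model's `predict` method runs the same comparison on real data.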

Individual Conditional Expectation (ICE) plots solve this by displaying one line per observation. Instead of aggregating, an ICE plot isolates a single feature and varies its value for one instance at a time, keeping all other features constant. You are essentially asking the model: “If I change this input for Person X, how does their specific prediction change?” By stacking these individual lines, you get a clear view of the distribution of effects across your entire dataset.
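Written out, with notation chosen here for illustration: let $\hat f$ be the fitted model, $x_S$ the feature being varied, and $x_C^{(i)}$ the values of all other features for observation $i$ out of $n$. The ICE curve of observation $i$ and the PDP are then

$$
\hat f^{(i)}_S(x_S) = \hat f\left(x_S,\; x_C^{(i)}\right),
\qquad
\hat f_S(x_S) = \frac{1}{n}\sum_{i=1}^{n} \hat f^{(i)}_S(x_S),
$$

so the PDP is the pointwise average of the ICE curves, which is why curves with opposite slopes can cancel into a flat line.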

Step-by-Step Guide: Implementing ICE Plots

1. Select your target feature: Identify the feature you suspect has a complex relationship with your target variable. This should be a continuous or ordinal variable, as ICE plots are most effective when demonstrating how a prediction evolves across a spectrum.
2. Isolate the instance: Take a single data point from your training or test set. Keep all feature values fixed except for the feature you selected in Step 1.
3. Vary the feature range: Create a grid of values for the selected feature (e.g., if you are looking at “Income,” create a sequence from the minimum to the maximum observed income in your data).
4. Generate predictions: Run the model for every value in that grid for the chosen instance. This creates the “individual” line for that person.
5. Repeat for the dataset: Perform steps 2 through 4 for every instance in your dataset (or a representative sample) to build the complete ICE plot.
6. Layer and visualize: Overlay these lines on a single coordinate plane. If you wish to see the average effect, you can superimpose the PDP (the average of all lines) as a bold, contrasting line.
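The steps translate almost line for line into NumPy. The sketch below assumes only that the model exposes a predict function taking a 2-D feature array (as scikit-learn estimators do); `ice_curves` and `toy_predict` are hypothetical names used for illustration. Instead of looping over instances, it builds every modified row at once, which is equivalent to repeating steps 2 through 4 for each instance.

```python
import numpy as np
from typing import Callable

def ice_curves(predict: Callable[[np.ndarray], np.ndarray],
               X: np.ndarray, feature: int, grid: np.ndarray) -> np.ndarray:
    """Return an (n_instances, n_grid) array of ICE curves.

    Row i holds the predictions for instance i as column `feature` is swept
    over `grid`, with every other feature fixed at instance i's own values.
    """
    X = np.asarray(X, dtype=float)
    n, g = X.shape[0], len(grid)
    # Steps 2-4 for all instances at once: repeat each row g times...
    X_rep = np.repeat(X, g, axis=0)          # rows: x0, x0, ..., x1, x1, ...
    # ...then overwrite the chosen column with the grid, once per instance.
    X_rep[:, feature] = np.tile(grid, n)
    return np.asarray(predict(X_rep)).reshape(n, g)


# Hypothetical stand-in for a fitted model's predict method; the effect of
# feature 0 depends on feature 1, so the ICE lines are not parallel.
def toy_predict(X: np.ndarray) -> np.ndarray:
    return 2.0 * X[:, 0] + X[:, 0] * X[:, 1]

rng = np.random.default_rng(42)
X = rng.normal(size=(1_000, 3))

# Step 1: feature 0.  Step 5: a random sample of 200 instances.
sample = X[rng.choice(len(X), size=200, replace=False)]
# Step 3: an evenly spaced grid from the observed minimum to maximum.
grid = np.linspace(X[:, 0].min(), X[:, 0].max(), num=25)

ice = ice_curves(toy_predict, sample, feature=0, grid=grid)
pdp = ice.mean(axis=0)   # Step 6: the PDP is the average ICE curve
print(ice.shape, pdp.shape)
```

To draw step 6, plot each row of `ice` against `grid` as a thin, semi-transparent line and overlay `pdp` in a bold color. If you use scikit-learn, `sklearn.inspection.PartialDependenceDisplay.from_estimator(model, X, features=[0], kind="both")` produces the same ICE-plus-PDP picture without a hand-written helper.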

Examples and Real-World Applications

Case Study: Credit Risk Assessment

Imagine a bank using a Random Forest model to set interest rates. A PDP might suggest that as a borrower’s “Debt-to-Income Ratio” (DTI) increases, the predicted interest rate rises only modestly on average. However, an ICE plot reveals a different story. For younger, first-time borrowers, an increase in DTI leads to a sharp spike in the predicted rate. For older, high-net-worth clients, the same increase in DTI results in almost no change. This reveals that the model has learned an interaction effect: DTI is risky for some borrowers, but not for others. If the bank fails to notice this, it risks unintentionally discriminatory lending practices.

Case Study: Healthcare Diagnostics

In medical research, we might use a gradient boosting model to predict the probability of a chronic illness from features that include “Daily Exercise Hours.” A PDP might show that exercise generally lowers the predicted probability of disease. An ICE plot, however, shows that for individuals with a specific genetic marker, the lines are essentially flat: the model’s predictions for this subgroup do not depend on exercise. That is a hint, not proof, that exercise may not be the primary lever for preventing the disease in this subset of the population, and it can prompt a search for other influencing variables.
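Spotting flat lines by eye works for a handful of curves; with hundreds, it helps to flag them numerically. A minimal sketch, assuming you already have an ICE matrix (one row per instance, one column per grid value) and a per-instance group label. The curves and the `marker` flag below are synthetic, made up to mirror the scenario above; `flat_curve_mask` is a hypothetical helper, and the tolerance is an arbitrary choice you would tune to the scale of your predictions.

```python
import numpy as np

def flat_curve_mask(ice: np.ndarray, tol: float) -> np.ndarray:
    """True where an instance's ICE curve varies by less than `tol` across the grid."""
    return (ice.max(axis=1) - ice.min(axis=1)) < tol

# Synthetic example: predicted risk vs. hypothetical daily exercise hours.
grid = np.linspace(0, 3, 7)                      # 0 to 3 hours per day
marker = np.array([0, 0, 1, 1, 0, 1])            # hypothetical genetic-marker flag
baseline = np.array([0.60, 0.50, 0.70, 0.40, 0.55, 0.65])
slope = np.where(marker == 1, 0.0, -0.1)         # carriers: no response to exercise
ice = baseline[:, None] + slope[:, None] * grid[None, :]

flat = flat_curve_mask(ice, tol=1e-3)
for group in (0, 1):
    share = flat[marker == group].mean()
    print(f"marker={group}: {share:.0%} of curves are flat")
```

A large gap between the groups' shares of flat curves is the numerical counterpart of the visual pattern described above.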

Common Mistakes to Avoid

Ignoring Feature Correlation: If two features are highly correlated, ICE plots can be misleading. When you vary one feature while holding the other constant, you may be creating “impossible” data points (e.g., a 10-year-old with an income of $100k), and the model’s predictions for such points say little about real cases. Always check for strong correlations (multicollinearity) between the feature of interest and the other features before interpreting ICE plots.
Overcrowding the Plot: If your dataset contains 100,000 rows, plotting every single line will result in an unreadable “hairball” of data. Use a random sample of 100–500 instances to keep the plot clean while still capturing the diversity of the relationships.
Confusing Correlation with Causation: ICE plots demonstrate how your model behaves, not necessarily how the real world works. If your model is biased or trained on poor-quality data, the ICE plot will show you the pattern of that bias. It is a diagnostic for the model, not a definitive map of physical causality.
Ignoring Centered ICE Plots: If the lines start at very different heights, it can be hard to see how much each prediction changes. Consider using “Centered ICE plots” (c-ICE), which subtract each line’s prediction at a chosen anchor value of the feature (often the smallest grid value), so that every line passes through zero at that point. This lets you focus on the slope and the divergence of the lines rather than their absolute position.
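Three of these checks are easy to script. The sketch below assumes a NumPy feature matrix and an ICE matrix with one row per instance and one column per grid value; the helper names are hypothetical, the 0.7 correlation cutoff is an arbitrary assumption (Pearson's r only detects linear relationships), and all data are synthetic. `correlated_with` lists features strongly correlated with the one you plan to vary, `sample_instances` draws a reproducible random subset of rows without replacement, and `center_ice` produces centered ICE curves by subtracting each curve's value at an anchor grid point (the first, smallest value by default).

```python
import numpy as np

def correlated_with(X: np.ndarray, feature: int,
                    threshold: float = 0.7) -> list[tuple[int, float]]:
    """(column, r) pairs whose Pearson r with `feature` exceeds `threshold` in magnitude."""
    corr = np.corrcoef(X, rowvar=False)          # feature-by-feature correlation matrix
    return [(j, float(corr[feature, j]))
            for j in range(X.shape[1])
            if j != feature and abs(corr[feature, j]) > threshold]

def sample_instances(X: np.ndarray, k: int = 300, seed: int = 0) -> np.ndarray:
    """Up to k distinct rows drawn uniformly at random (all rows if X has fewer)."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(X.shape[0], size=min(k, X.shape[0]), replace=False)
    return X[idx]

def center_ice(ice: np.ndarray, anchor_index: int = 0) -> np.ndarray:
    """Centered ICE: subtract each curve's value at the anchor grid point."""
    return ice - ice[:, [anchor_index]]          # column slice broadcasts row by row

# 1) Correlation check on synthetic data in which income tracks age.
rng = np.random.default_rng(1)
age = rng.uniform(18, 70, 500)
income = 1_000 * age + rng.normal(0, 5_000, 500)
noise = rng.normal(size=500)
X = np.column_stack([age, income, noise])
print(correlated_with(X, feature=0))   # flags column 1 (income) only

# 2) Sampling: 300 of 100,000 rows keeps the plot readable.
big = np.arange(100_000 * 3, dtype=float).reshape(100_000, 3)
print(sample_instances(big, k=300).shape)

# 3) Centering: two parallel curves at different heights collapse onto one.
grid = np.linspace(0.0, 1.0, 5)
ice = np.vstack([10 + 2 * grid, 50 + 2 * grid, 30 - 1 * grid])
centered = center_ice(ice)
print(centered[:, -1])                 # change from the anchor: 2, 2 and -1
```

If you plot with scikit-learn, `PartialDependenceDisplay.from_estimator` offers a `subsample` parameter for the ICE lines and, in recent versions, a `centered` parameter, which cover the last two checks at plot time.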

Advanced Tips for Better Insights

To extract the most value from ICE plots, look for heterogeneity: regions where the lines diverge significantly. This is your model telling you that it is processing specific data points very differently from others. If the lines are parallel, the feature has a consistent effect across your population. If they cross, converge, or diverge, the model has learned an interaction between this feature and others, and the wider the spread, the stronger that interaction.
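One simple way to put a number on heterogeneity (a heuristic chosen for illustration, not a standard statistic) is to center the curves and measure their spread at each grid value: parallel curves give zero everywhere, while crossing or diverging curves give a spread that grows along the grid. `ice_spread` is a hypothetical helper, and both ICE matrices below are synthetic.

```python
import numpy as np

def ice_spread(ice: np.ndarray) -> np.ndarray:
    """Per-grid-point standard deviation of the centered ICE curves.

    All zeros when every curve is a vertical shift of the others (parallel);
    larger values where the curves fan out, cross or converge.
    """
    centered = ice - ice[:, [0]]    # anchor every curve at the first grid value
    return centered.std(axis=0)

grid = np.linspace(0.0, 1.0, 5)
offsets = np.array([[1.0], [3.0], [5.0]])
parallel = offsets + 2.0 * grid                                 # same slope, different heights
crossing = offsets + np.array([[2.0], [0.0], [-2.0]]) * grid    # slopes differ

print(ice_spread(parallel))   # zero at every grid value
print(ice_spread(crossing))   # grows from 0 at the anchor to about 1.63 at the end
```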

The power of an ICE plot lies not in the trends that are consistent, but in the anomalies. When lines in an ICE plot deviate from the average, you have found the specific edge cases where your model’s logic changes.

Additionally, color-coding your ICE plots adds another dimension of information. By coloring the lines based on a categorical variable (e.g., gender, region, or customer tier), you can instantly see how the model’s logic differs between groups. This is a valuable step in fairness auditing for machine learning, helping you detect whether your model treats different demographic groups using different underlying logic.
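Alongside the colored lines, it can help to summarize each group by its own average curve (a per-group PDP), so that differences in the model's logic show up as differently shaped lines. A minimal sketch, assuming an ICE matrix and one categorical label per instance; `group_mean_curves`, the labels, and the slopes are hypothetical. For the plot itself, you would draw each instance's line in its group's color and overlay these group means.

```python
import numpy as np

def group_mean_curves(ice: np.ndarray, groups: np.ndarray) -> dict:
    """Average ICE curve for each category in `groups` (one label per row of `ice`)."""
    return {label: ice[groups == label].mean(axis=0) for label in np.unique(groups)}

grid = np.linspace(0.0, 1.0, 3)
groups = np.array(["A", "A", "B", "B"])       # e.g. two regions
slopes = np.array([1.0, 1.2, -0.5, -0.7])     # group A rises, group B falls
ice = slopes[:, None] * grid[None, :]

for label, curve in group_mean_curves(ice, groups).items():
    print(label, curve)
```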

Conclusion

ICE plots are an indispensable tool for moving beyond the “black box” nature of modern machine learning. By breaking the average effect shown in a PDP into one curve per instance, they provide the transparency necessary for auditing, debugging, and refining complex models. While they require careful handling of correlated features and sample size, the insights they offer into model behavior are well worth the effort.

Whether you are trying to explain a model to non-technical stakeholders or attempting to understand why your model is making errors in specific segments, ICE plots offer the clarity you need. They remind us that behind every aggregate statistic is a collection of individual experiences—and understanding those differences is the key to building models that are not only performant but also fair, reliable, and interpretable.