Beyond Global Averages: Using Individual Conditional Expectation (ICE) Plots for Model Transparency

Introduction

In the world of machine learning, we often fall into the trap of obsessing over aggregate metrics. We look at F1-scores, R-squared values, and RMSE to determine if a model is “good.” But a model that performs well on average can still be wildly inaccurate—or even biased—when it comes to specific, high-stakes decisions. If you are using a black-box model to approve loans, diagnose medical conditions, or set insurance premiums, knowing the “global” behavior isn’t enough. You need to know how the model behaves for each specific person.

This is where Individual Conditional Expectation (ICE) plots come in. While Partial Dependence Plots (PDPs) show you the average effect of a feature on the model’s predictions across your entire dataset, ICE plots disaggregate that average into one curve per instance. They reveal how predictions vary for individual instances, allowing you to see where your model might be breaking down. By moving from the “average” to the “individual,” you gain the per-instance view that accountable, ethical AI deployment depends on.

Key Concepts

To understand ICE plots, we must first briefly revisit the Partial Dependence Plot (PDP). A PDP shows the marginal effect of a feature on the predicted outcome by averaging the model’s predictions over the observed values of all other features. The limitation is that averaging can hide interactions. If a feature has a positive effect for half your data and an equally strong negative effect for the other half, the two cancel out and the PDP shows a flat line, suggesting the feature has no effect at all.

An ICE plot solves this by plotting one line for each observation in your dataset. For each individual, the ICE algorithm varies the value of the feature of interest while holding all other features constant for that specific row. This results in a “spaghetti plot” where each strand represents the model’s prediction for a single entity as that specific feature changes.
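Written out in symbols (the notation is introduced here for illustration and is not used elsewhere in this article): let f be the fitted model, x_S the feature of interest, and x_C^(i) the remaining features of row i out of N rows. Each ICE curve and the PDP are then:

```latex
\hat{f}^{(i)}_S(x_S) = \hat{f}\bigl(x_S,\; x^{(i)}_C\bigr), \qquad i = 1, \dots, N,
\qquad\qquad
\bar{f}_S(x_S) = \frac{1}{N} \sum_{i=1}^{N} \hat{f}^{(i)}_S(x_S).
```

The PDP is therefore the pointwise average of the ICE curves, which is why curves that slope in opposite directions can cancel into a flat PDP.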

Key benefits of this approach include:

Uncovering Heterogeneity: It exposes cases where the model reacts differently to the same variable based on other contextual factors.
Detecting Interactions: If the lines in your ICE plot are not parallel, it is a visual confirmation that the feature of interest interacts with at least one other feature in your model.
Model Debugging: It allows you to identify outliers where the model’s behavior deviates significantly from the rest of the cohort.
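A minimal sketch of the cancellation described above, using a hypothetical two-feature model f(x0, x1) = x0 * x1, where x1 is a ±1 “context” flag. Both the model and the data are made up for illustration, and because the model is this simple, its ICE curves can be written down directly instead of by re-running predictions.

```python
import numpy as np

# Hypothetical model: prediction = x0 * x1, where x1 is a +1/-1 context flag.
# Half the (synthetic) rows have context +1, half have -1.
context = np.array([1.0, 1.0, -1.0, -1.0])
grid = np.linspace(0.0, 1.0, 5)  # values swept for the feature of interest, x0

# ICE curves: row i, column j holds f(grid[j], context[i]).
ice = context[:, None] * grid[None, :]

# The PDP is the average of the ICE curves at each grid value.
pdp = ice.mean(axis=0)

print("PDP:", pdp)                                # flat: every entry is 0
print("Per-row change:", ice[:, -1] - ice[:, 0])  # +1 for two rows, -1 for the other two
```

The PDP is flat, yet no individual curve is: the non-parallel ICE lines are exactly the interaction between x0 and x1 that averaging hid.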

Step-by-Step Guide

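Implementing ICE plots is straightforward if you are using standard libraries like scikit-learn or PyInterpret; in scikit-learn, for instance, PartialDependenceDisplay.from_estimator draws ICE curves when called with kind="individual" (or kind="both" to overlay the PDP). The procedure itself is also simple enough to write by hand. Follow these steps to generate and interpret your own: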

Select a Feature: Choose a feature you suspect is highly influential or one that is subject to regulatory scrutiny.
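Define the Range: Determine the range of values for this feature (e.g., if analyzing “Age,” look at the range from 18 to 80). Staying within the range actually observed in your data keeps the model from being asked to extrapolate.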
Build Modified Copies: For each observation in your dataset, create a copy of that observation and systematically replace the feature value with each value in a grid spanning your chosen range, leaving every other feature unchanged.
Generate Predictions: Run these “modified” observations through your trained machine learning model.
Visualize: Plot the resulting predictions on a graph where the x-axis is the feature value and the y-axis is the predicted outcome. Each row of your data will be one line on this graph.
Centering (Optional): If the lines sit at very different heights and their shapes are hard to compare, apply “Centered ICE” (c-ICE). This involves subtracting each instance’s prediction at the lowest feature value (the anchor point) from all of that instance’s predictions, forcing every line to start at zero. This highlights the slope, or relative change, rather than the absolute value.
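The steps above can be sketched in a few lines of NumPy. This is an illustrative implementation, not any particular library’s API: predict stands for your trained model’s prediction function (for a scikit-learn estimator, model.predict), X is assumed to be a numeric NumPy array, and additive_model is a hypothetical stand-in used only for the usage example.

```python
import numpy as np

def ice_curves(predict, X, feature, grid, center=False):
    """Return an (n_rows, len(grid)) array holding one ICE curve per row of X.

    predict: callable mapping an (n_rows, n_features) array to n_rows predictions
    feature: column index of the feature of interest
    grid:    values to sweep that feature over (ascending)
    center:  if True, return centered ICE (c-ICE) anchored at grid[0]
    """
    X = np.asarray(X, dtype=float)
    curves = np.empty((X.shape[0], len(grid)))
    for j, value in enumerate(grid):
        X_mod = X.copy()            # leave the original data untouched
        X_mod[:, feature] = value   # same value for every row; other columns unchanged
        curves[:, j] = predict(X_mod)
    if center:
        # Subtract each row's prediction at the first grid value: every curve starts at 0.
        curves = curves - curves[:, [0]]
    return curves


# Usage with synthetic data and a hypothetical model that has no interactions.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 10.0, size=(200, 2))

def additive_model(X):
    return 2.0 * X[:, 0] + 3.0 * X[:, 1]

grid = np.linspace(X[:, 0].min(), X[:, 0].max(), 25)  # observed range of feature 0
ice = ice_curves(additive_model, X, feature=0, grid=grid)
c_ice = ice_curves(additive_model, X, feature=0, grid=grid, center=True)

# To visualize (requires matplotlib): plt.plot(grid, ice.T, color="grey", alpha=0.3)
```

Because additive_model has no interactions, its raw ICE curves are parallel and its centered curves coincide; spread that remains after centering is a sign of interaction. The grid here runs from the observed minimum to maximum; percentiles (for example the 5th to 95th) are a common, more outlier-robust alternative.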

Examples and Real-World Applications

Example 1: Credit Scoring and Gender Bias

Imagine a bank model that predicts credit risk. A global PDP might show that as “Annual Income” increases, “Default Probability” decreases. However, an ICE plot might reveal that for a specific subgroup (e.g., applicants with a short credit history), increasing income has almost no effect on the prediction, whereas for others it is highly impactful. If you color the lines by a sensitive attribute like gender and they split drastically, the ICE plot provides immediate visual evidence of potential disparate impact that a standard performance report would hide.
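One way to make such a split measurable is to summarize ICE curves by group. The sketch below assumes you already have an ICE array (one row per applicant, one column per grid value) and one label per row for the attribute you are auditing; group_ice_summary is a hypothetical helper, and the numbers in the usage example are synthetic.

```python
import numpy as np

def group_ice_summary(ice, groups):
    """Average ICE curve per group label.

    ice:    (n_rows, n_grid) array of ICE predictions
    groups: length-n_rows sequence of group labels (e.g., an audited attribute)
    """
    groups = np.asarray(groups)
    return {label: ice[groups == label].mean(axis=0) for label in np.unique(groups)}


# Synthetic example: income lowers predicted default risk for group "A" only.
grid = np.linspace(20_000, 120_000, 6)       # hypothetical income grid
groups = np.array(["A", "A", "B", "B"])
base = np.array([0.40, 0.35, 0.45, 0.50])    # each row's prediction at the lowest income
slope = np.array([-3e-6, -3e-6, 0.0, 0.0])   # change in predicted risk per unit of income
ice = base[:, None] + slope[:, None] * (grid - grid[0])[None, :]

summary = group_ice_summary(ice, groups)
for label, curve in summary.items():
    print(label, "change across the income grid:", curve[-1] - curve[0])
```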

Example 2: Healthcare Diagnostic Models

Consider a model predicting the risk of a post-operative complication based on “Dosage of Medication X.” A standard PDP might suggest that increasing the dosage is generally safe. However, an ICE plot might show a sudden, sharp spike in risk for a subset of patients who have a specific, pre-existing condition (which the model learned as an interaction). Detecting this during development can prevent the deployment of a model that could harm a vulnerable minority of patients.

Common Mistakes

Ignoring Interaction Effects: If you only look at the average (the PDP) and ignore the spread of the ICE lines, you may miss the riskiest parts of your model’s behavior. Always check for non-parallel lines.
Scaling Overload: If your dataset has 100,000 rows, plotting 100,000 lines will result in an illegible mess. Actionable tip: Use a random sample of 100–500 instances to visualize the ICE plot. This captures the variance without overwhelming the viewer.
Ignoring Correlated Features: ICE plots assume that you can change one feature while keeping others constant. If two features are highly correlated (e.g., “Years of Experience” and “Age”), changing one while holding the other constant creates “impossible” data points (e.g., a 20-year-old with 40 years of experience). This can lead to misleading extrapolations.
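Both of the last two pitfalls can be partly addressed in code. The sketch below (hypothetical helper names, NumPy only) draws a random subsample of rows for plotting and flags modified rows that lie far from any real observation. The nearest-neighbor distance check and its threshold of 0.5 standard deviations are one simple heuristic chosen for this example, not a standard test; tune or replace them for your data.

```python
import numpy as np

def subsample_rows(X, n, seed=0):
    """Return at most n rows of X, drawn at random without replacement."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(X.shape[0], size=min(n, X.shape[0]), replace=False)
    return X[idx]

def off_support_mask(X_train, X_query, threshold=0.5):
    """Flag query rows whose nearest training row is farther than `threshold`.

    Distances are Euclidean on standardized columns. This brute-force version
    builds an (n_query, n_train) distance matrix, so run it on a sample.
    """
    mu = X_train.mean(axis=0)
    sigma = X_train.std(axis=0)
    sigma = np.where(sigma == 0, 1.0, sigma)  # avoid dividing by zero for constant columns
    A = (X_train - mu) / sigma
    B = (X_query - mu) / sigma
    dist = np.sqrt(((B[:, None, :] - A[None, :, :]) ** 2).sum(axis=-1))
    return dist.min(axis=1) > threshold


# Synthetic data in which experience is tied to age: columns are [age, experience].
age = np.arange(20.0, 66.0)
X_train = np.column_stack([age, age - 20.0])
sample = subsample_rows(X_train, n=10)

# Sweep age for one row (age 40, experience 20) while holding experience fixed, as ICE does.
row = X_train[20]
age_grid = np.linspace(20.0, 65.0, 10)
X_mod = np.tile(row, (age_grid.size, 1))
X_mod[:, 0] = age_grid
flags = off_support_mask(X_train, X_mod)
print(flags)  # grid ages far from 40 are flagged as implausible combinations
```

Flagged grid points can be dropped or drawn differently (for example, dashed) so readers do not mistake extrapolation for learned behavior.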

Advanced Tips

To extract the most value from ICE plots, consider these advanced strategies:

Generate Hypotheses: The power of an ICE plot isn’t just in the visualization; it’s in the hypothesis generation. When you see a cluster of lines behaving differently, query those specific rows. Are they all from the same geographic region? Do they all have missing data in a specific field? Treat the plot as a diagnostic map for your feature engineering pipeline.
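A minimal sketch of that querying step: summarize each ICE curve by its total change across the grid, then flag rows that are outliers on that summary. The helper name, the choice of summary, and the modified z-score rule (median and MAD, with the conventional 0.6745 scale factor and 3.5 cutoff) are assumptions made for this example; the ICE array is synthetic.

```python
import numpy as np

def unusual_curves(ice, z_cut=3.5):
    """Return indices of rows whose ICE curve changes unlike most others.

    Each curve is summarized by (last value - first value); rows whose
    modified z-score on that summary exceeds z_cut are returned.
    """
    change = ice[:, -1] - ice[:, 0]
    med = np.median(change)
    mad = np.median(np.abs(change - med))
    if mad == 0:
        # Most curves change by exactly the same amount: flag any that differ.
        return np.flatnonzero(change != med)
    robust_z = 0.6745 * (change - med) / mad
    return np.flatnonzero(np.abs(robust_z) > z_cut)


# Synthetic ICE curves: nine rise by about 1, one falls sharply.
grid = np.linspace(0.0, 1.0, 5)
slopes = np.array([1.0, 1.1, 0.9, 1.05, 0.95, 1.0, 1.02, 0.98, 1.0, -5.0])
ice = slopes[:, None] * grid[None, :]

suspects = unusual_curves(ice)
print(suspects)  # row indices to pull from your original data and inspect
```

With a pandas DataFrame df whose rows align with the ICE array, df.iloc[suspects] returns the flagged rows, so you can check whether they share a region, a missing field, or another attribute.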

Use Derivative ICE Plots: If you are working with non-linear models like Gradient Boosting or Neural Networks, the “slope” of the lines in the ICE plot tells you the marginal change in prediction. Plotting the derivative of these lines can help you identify threshold points where the model’s “decision logic” suddenly switches, providing insight into the model’s sensitivity at specific intervals.
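Derivative ICE (d-ICE) curves can be approximated by finite differences along the ICE grid. This sketch uses NumPy’s np.gradient on synthetic logistic curves standing in for a model’s ICE output; the helper names are illustrative. The grid value with the steepest slope in each curve marks where the model’s response switches most sharply.

```python
import numpy as np

def derivative_ice(ice, grid):
    """Finite-difference slope of each ICE curve with respect to the grid values."""
    return np.gradient(ice, grid, axis=1)

def steepest_points(ice, grid):
    """Grid value where each curve's slope has the largest magnitude."""
    d_ice = derivative_ice(ice, grid)
    return grid[np.argmax(np.abs(d_ice), axis=1)]


# Synthetic ICE curves: smooth steps whose thresholds differ by row (3 and 7).
grid = np.linspace(0.0, 10.0, 101)
thresholds = np.array([3.0, 7.0])
ice = 1.0 / (1.0 + np.exp(-5.0 * (grid[None, :] - thresholds[:, None])))

print(steepest_points(ice, grid))  # close to each row's threshold
```

For tree ensembles such as gradient boosting, ICE curves are piecewise constant, so the finite-difference slope is zero almost everywhere and spikes at the split points; those spikes are themselves the thresholds worth examining.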

Combine with Clustering: If you have thousands of observations, use clustering (like K-Means) on the ICE lines themselves. This groups similar “model behaviors” together, allowing you to summarize the variance into a few distinct “profiles” rather than trying to interpret hundreds of individual lines at once.
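A sketch of that idea with a small k-means (Lloyd’s algorithm) written in NumPy so the example has no other dependencies; in practice, sklearn.cluster.KMeans does the same job. Centering the curves before clustering, as in c-ICE, is a choice made here so that clusters reflect shape rather than overall level. cluster_ice and the synthetic curves are illustrative.

```python
import numpy as np

def cluster_ice(ice, k, n_iter=100, seed=0):
    """Cluster ICE curves with k-means. Returns (labels, centroid_curves)."""
    curves = ice - ice[:, [0]]  # center each curve at the first grid value
    rng = np.random.default_rng(seed)
    centroids = curves[rng.choice(len(curves), size=k, replace=False)]

    def nearest(c):
        # Squared Euclidean distance from every curve to every centroid.
        dist = ((curves[:, None, :] - c[None, :, :]) ** 2).sum(axis=-1)
        return dist.argmin(axis=1)

    for _ in range(n_iter):
        labels = nearest(centroids)
        # Move each centroid to the mean of its curves (keep it if its cluster is empty).
        updated = np.array([
            curves[labels == c].mean(axis=0) if np.any(labels == c) else centroids[c]
            for c in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return nearest(centroids), centroids


# Synthetic curves: four rising, four falling, all shifted up by the same amount.
grid = np.linspace(0.0, 1.0, 10)
slopes = np.array([1.0, 1.1, 0.9, 1.05, -1.0, -1.1, -0.9, -0.95])
ice = 5.0 + slopes[:, None] * grid[None, :]

labels, profiles = cluster_ice(ice, k=2)
print(labels)  # rising and falling curves land in different clusters
```

Each row of profiles is a representative centered curve, so plotting those k curves summarizes the distinct behaviors in place of the full spaghetti plot.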

Conclusion

In an era where “black-box” models are increasingly scrutinized, transparency is not a luxury—it is a functional requirement. Relying on average model behavior is a shortcut that leaves your organization vulnerable to hidden biases, overlooked interactions, and poor decision-making at the edge cases of your data.

ICE plots bridge the gap between model performance and model accountability. They force us to move beyond the comfort of the average and look directly at how our algorithms treat individuals. By implementing ICE plots, you don’t just build models that score well on average; you build systems that you can actually explain, defend, and trust. Start by sampling your data, visualizing the variations, and questioning the anomalies. The insights you find may just change the way you build your next model.