Explainable AI is a key concept in machine learning: it aims to explain why a model makes the predictions it does, which in turn helps us judge how good the model is. In this blog, we cover how you can use a game-theory-based method called Shapley values to explain what's happening inside an ML model.

Let's assume you are tasked with developing an ML model to predict credit card defaulters. You clean and transform the data at your disposal and properly cross-validate your results. However, the C-suite isn't impressed: although your scores (precision/recall/F1) are great, they have no way of figuring out why the model flags someone as a potential defaulter. You can show them the feature importance scores, but those are on a global level and something more is desired. In other words, to convince stakeholders you need to explain why the ML model makes each of its predictions. This would increase their trust in the model, and this process of providing explanations is called model explainability.

Explainability is especially important in regulated sectors like healthcare, trade, etc. In such domains, data science teams not only work on understanding the data and building models but also have to explain why the models made their decisions.

So, what can you do now? To have higher interpretability, you can use some variants of linear models. This would enable you to explain individual predictions as well. But it comes at the cost of performance. You still want to retain similar performance with as little sacrifice of interpretability as possible. This is where certain concepts are borrowed from the field of game theory. Let’s understand it in the following sections.

Imagine you have differently skilled workers collaborating for some collective reward. How should the reward be divided fairly among them? This is the question cooperative game theory tries to answer. One possible solution is to calculate the marginal contribution of every worker.

Before diving into the mathematics of these marginal contributions, let's consider three workers A, B and C working together on a project to develop a web application. Our task is to find the marginal contribution of every worker in order to compensate everyone fairly. The fair compensation, aka the payoff, for a worker i is given by the Shapley value formula:

$$\phi_i(v) = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(|N| - |S| - 1)!}{|N|!} \left( v(S \cup \{i\}) - v(S) \right)$$

Let’s break down the formula first. Here, the game (collaborative) is the development of the web app and N is the set of the workers i.e. {A,B,C}. The payoff function is defined by v(S) which gives us the payoff for any subset of workers. For now, we want to understand how much A should be paid. To do it, you decide to use the formula above.

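Therefore, N = {A,B,C} and i = A. The Shapley value formula can be rearranged into the following form (an explanation is given below):

$$\phi_i(v) = \frac{1}{|N|} \sum_{S \subseteq N \setminus \{i\}} \binom{|N| - 1}{|S|}^{-1} \left( v(S \cup \{i\}) - v(S) \right)$$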

Now let's turn our attention to the term v(S∪{i}) − v(S) on the right-hand side of the formula (don't forget the summation over it). What this is telling us is:

- Consider all possible subsets with and without player A.
- Calculate their payoffs with A i.e. v(S∪{i}) and without A i.e. v(S). Their difference represents the marginal value of A.
- Add up all the marginal values and we get the marginal contribution of A.

Possible subsets without A = {∅, {B}, {C}, {B,C}}. As a reminder, a set with n elements has 2^n possible subsets. It might look like we are ignoring order here, i.e. we are not concerned with the order in which B and C started their work. However, that does not matter: from A's perspective it is irrelevant whether B or C started working first. So, you can evaluate the payoff function once with and once without A, and track how much was contributed once A came into the picture.

The payoff function v(S) is nothing but the function learned by the model from data. The difference v(S∪{i}) − v(S) can be represented as Δv. We will have four such values, one for each of the four subsets. Considering the subset {B}, we get Δv_{B,A}, which tells us how much A contributes to the work given that only B has worked on it so far.

The next step is to add these marginal values after scaling each one. The scaling term is the inverse of the binomial coefficient, 1/C(|N|−1, |S|), that multiplies each difference.

Why is scaling needed, you might wonder? It averages out the effect of the rest of the team members for every subset size while getting the marginal value of A within each subset. The binomial coefficient counts the subsets of each size that can be formed from the set excluding worker A. For the subset {B}, there are two subsets of size one ({B} and {C}), so Δv_{B,A} gets a scale value of 1/2.
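As a worked illustration of the scaling for N = {A,B,C} and i = A: the set without A has two members, so the subset counts per size are C(2,0) = 1, C(2,1) = 2 and C(2,2) = 1. Using the Δv notation from above (the subscripts for ∅, C and BC simply extend the Δv_{B,A} naming), the scaled sum works out to:

$$\frac{\Delta v_{\varnothing,A}}{\binom{2}{0}} + \frac{\Delta v_{B,A}}{\binom{2}{1}} + \frac{\Delta v_{C,A}}{\binom{2}{1}} + \frac{\Delta v_{BC,A}}{\binom{2}{2}} = \Delta v_{\varnothing,A} + \tfrac{1}{2}\,\Delta v_{B,A} + \tfrac{1}{2}\,\Delta v_{C,A} + \Delta v_{BC,A}$$

Each subset size therefore contributes with equal total weight, regardless of how many subsets of that size exist.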

There is one final scaling factor, and that is 1/|N|, where |N| is the total number of workers, i.e. 3. It averages out the effect of the group size (number of workers). In this way, you finally get the marginal contribution, or the Shapley value, for worker A.
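To make the procedure concrete, here is a minimal Python sketch that computes Shapley values for the three workers by brute force. The payoff numbers are made up purely for illustration (they are not from any real project), and the function name `shapley_value` is our own.

```python
from itertools import combinations
from math import comb

# Hypothetical payoff v(S) for every subset of workers (illustrative numbers only).
payoff = {
    frozenset(): 0,
    frozenset("A"): 10,
    frozenset("B"): 20,
    frozenset("C"): 30,
    frozenset("AB"): 40,
    frozenset("AC"): 50,
    frozenset("BC"): 60,
    frozenset("ABC"): 90,
}

def shapley_value(player, players, payoff):
    """Average the marginal contributions of `player` over all subsets without it."""
    others = [p for p in players if p != player]
    n = len(players)
    total = 0.0
    for size in range(len(others) + 1):
        for subset in combinations(others, size):
            s = frozenset(subset)
            marginal = payoff[s | {player}] - payoff[s]  # v(S ∪ {i}) - v(S)
            total += marginal / comb(n - 1, size)        # scale by subset count
    return total / n                                     # scale by group size

workers = ["A", "B", "C"]
for w in workers:
    print(w, shapley_value(w, workers, payoff))
# A 20.0
# B 30.0
# C 40.0
```

Note that the three values add up to 90, the payoff of the full team, so the whole reward is distributed among the workers.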

How is this all transferable to the ML domain? It turns out the workers are nothing but the features one feeds to the model. And the payoff function calculated for every subset is nothing but the function learned by the model from data during the training phase. To understand it better, let us take the Adult Census Income dataset from the UCI repository. All the information about attributes is explained on their website.
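The sketch below shows one way this mapping can look in code. It is a toy illustration, not the SHAP library's algorithm: the "model" is a hypothetical fixed linear scorer, and we assume that a feature outside the subset is replaced by a background (average) value, which is one common convention for defining v(S).

```python
from itertools import combinations
from math import comb

def model(x):
    """Hypothetical stand-in for a trained model: a fixed linear scorer."""
    weights = [2.0, -1.0, 0.5]
    return sum(w * xi for w, xi in zip(weights, x))

def payoff(subset, x, background):
    """v(S): model output when only the features in S keep the record's values.

    Assumption: features outside S are replaced by their background value.
    """
    mixed = [x[j] if j in subset else background[j] for j in range(len(x))]
    return model(mixed)

def shapley_values(x, background):
    """Brute-force Shapley value of every feature for one record x."""
    n = len(x)
    values = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            for subset in combinations(others, size):
                s = set(subset)
                gain = payoff(s | {i}, x, background) - payoff(s, x, background)
                total += gain / comb(n - 1, size)
        values.append(total / n)
    return values

x = [3.0, 1.0, 4.0]           # the record to explain (made-up values)
background = [1.0, 2.0, 2.0]  # assumed average feature values
phi = shapley_values(x, background)
print(phi)
```

For this toy setup, the score at the background point plus the sum of the Shapley values equals the model's score for the record, which is the additive property that SHAP plots rely on. Brute force needs 2^n payoff evaluations per record, so it is only practical for a handful of features.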

The task is to predict whether income exceeds $50k/year based on the attributes (a binary classification scenario). If income exceeds this threshold, the record is labeled 1, else 0.

We have fitted a gradient boosting model using the LightGBM library. Results (precision, recall & F1 score) are shown below for both the classes.

Now, using the SHAP library, we will see how to generate Shapley values and explain the model predictions for our problem. It is important to remember that this library will, in general, give us *approximate values* and not *exact values*. Since we have used a tree-based model, we will be using the TreeSHAP implementation for our purpose.

We start by initializing an explainer object with TreeSHAP over the model and then generating Shapley values for our target set.
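A minimal sketch of this step might look as follows. It assumes `model` is the fitted LightGBM classifier and `X` is a pandas DataFrame such as `X_test`; the helper name `explain_with_treeshap` is our own. The shape of the returned values differs between SHAP versions for binary classifiers, so the helper normalizes the common cases; treat that handling as an assumption to verify against your installed version.

```python
def explain_with_treeshap(model, X):
    """Return (base_value, shap_values) for the positive class.

    `model` is assumed to be a fitted tree-based model (e.g. LightGBM)
    and `X` the feature rows to explain.
    """
    # Third-party dependencies, imported lazily so that this module still
    # loads when the optional packages are not installed.
    import numpy as np
    import shap

    explainer = shap.TreeExplainer(model)
    shap_values = explainer.shap_values(X)
    base_value = explainer.expected_value

    # Some SHAP versions return one array per class for binary classifiers.
    if isinstance(shap_values, list):
        shap_values = shap_values[1]
    shap_values = np.asarray(shap_values)
    # Other versions return a (rows, features, classes) array.
    if shap_values.ndim == 3:
        shap_values = shap_values[:, :, 1]

    # expected_value may be a scalar or hold one entry per class.
    base_value = float(np.ravel(base_value)[-1])
    return base_value, shap_values
```

With the outputs in hand, a call such as `shap.force_plot(base_value, shap_values[0, :], X_test.iloc[0, :])` draws the explanation for a single record, and `shap.summary_plot(shap_values, X_test)` gives the overview across all records.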

The *force_plot()* method helps us visualize the impact of different features on a prediction. We will look at one record (**X_test.iloc[0,:]**) and its corresponding Shapley values.

Here, the value in bold (**-2.28**) is the model's prediction on the log-odds scale. It is important to keep in mind that LightGBM trees are built on the log-odds scale and only transformed to probabilities for *predict_proba()*. A negative value simply means that we are more likely to receive a 0 than a 1. Features important in making the prediction are colored red and blue, with red ones pushing the model score higher and blue ones pushing it lower. The features located closest to the boundary between red and blue are the ones with the highest impact, and the size of each colored bar is proportional to that impact.
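To relate the log-odds value to a probability, you can apply the logistic (sigmoid) function. The short sketch below does this for the value from the plot; the function name is our own.

```python
from math import exp

def log_odds_to_probability(log_odds):
    """Convert a log-odds score to a probability with the logistic function."""
    return 1.0 / (1.0 + exp(-log_odds))

p = log_odds_to_probability(-2.28)
print(f"{p:.3f}")  # 0.093
```

So a score of -2.28 corresponds to roughly a 9% probability of the positive class, which is why this record is predicted as 0.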

Now, let's check how Shapley values are distributed across different feature values. Consider the image below. It shows the summary plot, where features (Relationship, Age, etc.) are represented on the Y-axis with their values color coded (red = high and blue = low) and their respective Shapley values on the X-axis. A high Shapley value means the feature is contributing more towards our event of interest, and vice versa. If we consider the feature *Capital Gain*, we can infer that high values for it are generally associated with instances of positive classification. Also, you might spot a bias in the feature *Sex*, where the value of 1 (Male) corresponds more towards positive events.

Like anything, Shapley values aren't perfect. Some of their noteworthy shortcomings are discussed below:

**Handling of missing feature(s) values**: There is one significant drawback of Shapley values. What does it even mean if any feature(s) go missing? Replacing the missing entries with 0 won't make practical sense. To get around it, such feature values are replaced with the expected value over the whole data. But in reality, the expected value might not be realistic. At best it is an educated guess for the missing feature(s).

**Handling of correlated features**: Another prominent drawback is the handling of correlated features. In many cases, one might observe the attributions of certain correlated features to be close to zero. This happens due to a phenomenon called 'correlation bias'. For example, while training with a highly correlated set of features {a,b,c}, the ML model might wrongly assign high importance to an arbitrary selection, let's say feature 'a'. This can happen because Shapley values try to make sense of the ML model and not the data.

Overall, Shapley values are immensely valuable when trying to explain ML models, as long as their limitations are kept in mind. They are not perfect, but they work great when applied correctly.
