Unoffical Google Data Science Blog: Causality in Machine Learning



Given recent advances and interest in machine learning, those of us with traditional statistical training have had occasion to ponder the similarities and differences between the fields. Many of the distinctions are due to culture and tooling, but there are also differences in thinking which run deeper. Take, for instance, how each field views the provenance of the training data when building predictive models.

For most of ML, the training data is a given, often presumed to be representative of the data against which the prediction model will be deployed, but not much else. With a few notable exceptions, ML abstracts away from the data generating mechanism, and hence sees the data as raw material from which predictions are to be extracted. Indeed, machine learning generally lacks the vocabulary to capture the distinction between observational data and randomized data that statistics finds crucial.

To contrast machine learning with statistics is not the object of this post (we can do such a post if there is sufficient interest). Rather, the focus of this post is on combining observational data with randomized data in model training, especially in a machine learning setting. The method we describe is applicable to prediction systems employed to make decisions when choosing between uncertain alternatives.

Predicting and intervening

Most of the prediction literature assumes that predictions are made by a passive observer who has no influence on the phenomenon. On the other hand, most prediction systems are used to make decisions about how to intervene in a phenomenon. Often, the assumption of non-influence is quite reasonable — say if we predict whether or not it will rain in order to determine if we should carry an umbrella. In this case, whether or not we decide to carry an umbrella clearly doesn’t affect the weather. But at other times, matters are less clear.

For instance, if the predictions are used to decide between uncertain alternative scenarios then we observe only the outcomes which were realized. In this framing, the decisions we make influence our future training data. Depending on how the model is structured, we typically use the information we gain from realized factual scenarios to assess probabilities associated with unrealized counterfactual scenarios. But this involves extrapolation and hence the counterfactual prediction might be less accurate.

Some branches of machine learning (e.g. multi-arm bandits and reinforcement learning) adopt this framing of choice between alternative scenarios in order to study optimal tradeoffs between exploration and exploitation. Our goal here is specifically to evaluate and improve counterfactual predictions.

Read the original post at The Unofficial Google Data Science Blog.