How Can Causal Machine Learning Improve Business Decisions?
In this post, Margarita and Martin Huber explain why understanding the causal impact of particular business activities like marketing campaigns or pricing policies is necessary when making decisions about the appropriate design of those actions.
Improving decision making
Buzzwords like digitization, big data and artificial intelligence are on everyone’s lips, and many companies and organizations have recognized the value of machine or deep learning for forecasting uncertain outcomes such as sales, production, or customer churn. However, deciding how to design specific business activities like marketing campaigns or pricing policies requires knowledge about the causal effects of those activities, which conventional machine-learning-based forecasts cannot deliver. This shortcoming has recently been addressed by the rise of causal machine learning, an adaptation of machine learning that is suitable for assessing the causal effects of business activities. As the following discussion shows, causal machine learning has great potential to increase competitiveness by improving decision making in a range of business domains, provided that the data and the chosen method are appropriate for the decision problem at hand.
Predictive machine learning
To date, companies and organizations use machine learning algorithms predominantly for forecasting, i.e. to accurately predict some outcome of interest. Put simply, such predictive machine learning systematically screens data to learn whether and how various characteristics, e.g. those of potential customers, the general economic situation or the day of the week, are related to an outcome like sales. In contrast to classical statistics, such algorithms learn autonomously which characteristics are relevant and which are not for predicting sales optimally, i.e. with the smallest possible error. To this end, several alternative models for forecasting sales are developed or “trained” in one part of the available data. In the other part of the data, these models are “validated” by checking how close their sales forecasts are to the sales actually observed. Finally, the model with the lowest prediction error is used for future sales forecasts. This so-called artificial “intelligence” is thus based on the rather dull trial-and-error testing of alternative methods. Yet this approach has proven very effective, in many domains yielding better predictions than humans could make, provided that the data used for learning are sufficiently informative. In companies, there are indeed many potential applications of predictive machine learning, as documented for instance in Nosratabadi et al. (2020). For online sales portals as well as for brick-and-mortar retail, digitized (online or scanner) data allow predicting customers’ buying behavior as a function of the price of a product and of competing products, as well as of customer characteristics, if available. In production, technical indicators, for example regarding machine utilization, may serve as a basis for predicting production downtimes due to technical problems.
In finance, machine learning and so-called “deep learning” (based on complex network models, so-called neural networks) are increasingly used for portfolio optimization and forecasting share prices.
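The train-then-validate procedure described above can be sketched in a few lines. This is a minimal illustration on simulated data, not a production workflow: the sales-generating process, the hypothetical “footfall” variable and the use of polynomial regressions of increasing degree as the “alternative models” are all assumptions made for the example. In practice, the candidates would typically be different machine learning algorithms (e.g. random forests, boosting or neural networks).

```python
import numpy as np

# Hypothetical data: daily sales driven non-linearly by a footfall index
rng = np.random.default_rng(0)
n = 400
footfall = rng.uniform(0, 10, n)
sales = 50 + 8 * footfall - 0.5 * footfall**2 + rng.normal(0, 5, n)

# One part of the data for training, the other part for validation
idx = rng.permutation(n)
train, valid = idx[:300], idx[300:]

def validation_mse(coefs, x, y):
    """Mean squared prediction error of a fitted polynomial on held-out data."""
    return float(np.mean((np.polyval(coefs, x) - y) ** 2))

# Alternative models: polynomials of increasing flexibility, fit on training data only
candidates = {}
for degree in range(1, 6):
    coefs = np.polyfit(footfall[train], sales[train], degree)
    candidates[degree] = (coefs, validation_mse(coefs, footfall[valid], sales[valid]))

# Keep the model with the lowest validation error and use it for new forecasts
best_degree = min(candidates, key=lambda d: candidates[d][1])
best_coefs = candidates[best_degree][0]
forecast = np.polyval(best_coefs, np.array([2.0, 5.0, 8.0]))

for d, (_, mse) in candidates.items():
    print(f"degree {d}: validation MSE = {mse:.1f}")
print("selected degree:", best_degree, "| forecasts:", forecast.round(1))
```

Note that the validation data never enter the fitting step; they serve only to rank the candidates, which is what protects the selected model against overfitting the training data.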
However, far from all questions in companies can be answered with predictions. In many cases, the effect of a specific business activity is of interest, for example the causal effect of an advertising campaign, a discount or a customer card on sales. Unfortunately, predictive machine learning cannot answer such cause-and-effect questions, although answering them is typically what decision support about whether to implement a specific activity requires. For instance, predictive machine learning may predict customer churn, i.e. the likelihood that a customer is lost because they switch provider (e.g. in telecommunications), but it does not tell us for which customers specific activities (like discounts) are most effective at preventing churn, see e.g. Ascarza (2018).
Comparing apples to apples
To further illustrate the difference between prediction and causality, suppose that by analyzing sales data, a retailer finds that loyalty cardholders generate more sales than customers without loyalty cards. Card possession therefore helps make customer-specific sales forecasts. However, this does not automatically imply a causal effect, which would only exist if customers made more purchases precisely because they had received a loyalty card. In principle, it could also be the other way round: only customers who purchased a lot in earlier periods receive a loyalty card, which per se has no effect on sales at all. Therefore, to measure the causal effect of a particular action such as a loyalty card, it is necessary to compare “apples to apples”: one should only compare the turnover of customers with and without loyalty cards who are similar in their previous purchasing behavior and in other characteristics that could influence turnover (such as age or income). Only in this way can the causal effect of the loyalty card on sales be isolated from the influence of other characteristics (previous purchasing behavior, age, income, …).
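The loyalty-card thought experiment can be reproduced with simulated data. In this hypothetical setup, the card has no effect on sales by construction, but heavy buyers from earlier periods are far more likely to hold one. The naive comparison of cardholders and non-holders then suggests a large “effect”, whereas comparing customers within the same past-purchase group and averaging the group differences recovers a value close to zero. The single binary characteristic, the probabilities and the sales figures are all illustrative assumptions; this is a simple stratification estimator, not causal machine learning itself, which becomes necessary once there are many characteristics to balance.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# Hypothetical customers: 30% were heavy buyers in earlier periods
heavy_buyer = rng.random(n) < 0.3
# Heavy buyers are much more likely to have received a loyalty card
card = rng.random(n) < np.where(heavy_buyer, 0.8, 0.2)
# Current sales depend on past behaviour only: by construction, the card has NO effect
sales = 100 + 50 * heavy_buyer + rng.normal(0, 10, n)

# Naive comparison of cardholders and non-holders
naive = sales[card].mean() - sales[~card].mean()

# Apples to apples: compare within each past-purchase group,
# then average the group differences weighted by group size
diffs, weights = [], []
for group in (False, True):
    in_group = heavy_buyer == group
    diffs.append(sales[in_group & card].mean() - sales[in_group & ~card].mean())
    weights.append(in_group.mean())
adjusted = np.average(diffs, weights=weights)

print(f"naive difference:    {naive:.1f}")     # large, purely due to selection
print(f"adjusted difference: {adjusted:.1f}")  # close to the true effect of 0
```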
Causal machine learning, a further development of predictive machine learning for performing causal analyses, builds on precisely this idea. Put simply, such algorithms (see e.g. Chernozhukov et al., 2018) identify, in a data-driven way, the characteristics that are relevant for both loyalty card receipt and turnover, in order to make cardholders and non-holders comparable in terms of these characteristics and ultimately determine the effect of the loyalty card on turnover. Better still, causal machine learning may also identify customer groups for which the sales effects of the loyalty card are particularly large or small as a function of their observed characteristics. Assume, for example, that the loyalty card increases sales particularly strongly among customers who had previously bought either a lot or very little. Causal machine learning can detect such heterogeneity in causal effects (see e.g. Athey et al., 2019) and use it as the basis for an optimal, purely data-driven segmentation of customers into groups to whom the loyalty card should or should not be offered. This so-called optimal policy learning approach, as outlined in Athey and Wager (2019), makes it possible to maximize the effectiveness of loyalty cards while also taking cost-benefit considerations into account. Causal machine learning, effect heterogeneity analysis, and optimal policy learning can be applied to the evaluation of any business activity (e.g. pricing policies, marketing campaigns, further training, quality assurance measures), provided that the available data plausibly satisfy the stringent conditions required for a valid causal analysis. These conditions are not always easy to meet, and it must therefore be emphasized that without sufficiently informative data, even the best method cannot provide adequate decision support.
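The three steps of the loyalty-card example (average effect, effect heterogeneity, policy learning) can be sketched with cross-fitted doubly robust (AIPW) scores, in the spirit of the double machine learning approach of Chernozhukov et al. (2018). This is a minimal sketch under strong assumptions, not the authors’ method or any library’s implementation: the data are simulated with a single observed characteristic `x` (standardized past purchases) that is assumed to capture all confounding, the true effect is built to be U-shaped, simple polynomial and logistic regressions stand in for the flexible machine learning learners one would normally use, and the card `cost` of 4.0 is hypothetical. The policy step maximizes the average estimated net gain over rules of the form “offer the card if |x| > t”, a simplified version of the policy learning idea.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000

# Hypothetical data: x = standardized past purchases, the only confounder here
x = rng.normal(0, 1, n)
# Customers who bought more in the past are more likely to hold a card
d = (rng.random(n) < 1 / (1 + np.exp(-(0.5 + x)))).astype(float)
# True card effect is U-shaped: largest for customers who bought a lot or very little
y = 100 + 10 * x + (2 + 3 * x**2) * d + rng.normal(0, 5, n)

def features(v, degree=3):
    """Polynomial basis for the (deliberately simple) nuisance models."""
    return np.column_stack([v**k for k in range(degree + 1)])

def fit_ols(X, t):
    return np.linalg.lstsq(X, t, rcond=None)[0]

def fit_logit(X, t, iters=25):
    """Logistic regression fitted by Newton-Raphson."""
    b = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-X @ b))
        hess = X.T @ (X * (p * (1 - p))[:, None])
        b += np.linalg.solve(hess, X.T @ (t - p))
    return b

# Cross-fitting: nuisance models are fit on one half and evaluated on the other
folds = rng.permutation(n) % 2
psi = np.empty(n)
for k in (0, 1):
    tr, te = folds != k, folds == k
    Xte = features(x[te])
    e = np.clip(1 / (1 + np.exp(-Xte @ fit_logit(features(x[tr]), d[tr]))), 0.01, 0.99)
    tr1, tr0 = tr & (d == 1), tr & (d == 0)
    mu1 = Xte @ fit_ols(features(x[tr1]), y[tr1])  # predicted sales with card
    mu0 = Xte @ fit_ols(features(x[tr0]), y[tr0])  # predicted sales without card
    # Doubly robust (AIPW) score of the card effect for each held-out customer
    psi[te] = mu1 - mu0 + d[te] * (y[te] - mu1) / e - (1 - d[te]) * (y[te] - mu0) / (1 - e)

# 1) Average effect of the card
ate = psi.mean()
# 2) Effect heterogeneity: extreme past purchasers vs. the rest
extreme = np.abs(x) > 1
group_high, group_low = psi[extreme].mean(), psi[~extreme].mean()
# 3) Policy learning over rules "offer a card iff |x| > t", net of a hypothetical cost
cost = 4.0
thresholds = np.linspace(0, 2.5, 51)
net_gain = [np.mean((np.abs(x) > t) * (psi - cost)) for t in thresholds]
best_t = thresholds[int(np.argmax(net_gain))]

print(f"average effect: {ate:.2f}")
print(f"effect if |x| > 1: {group_high:.2f}, otherwise: {group_low:.2f}")
print(f"offer the card if |x| > {best_t:.2f}")
```

The same scores serve all three purposes: averaging them over everyone gives the average effect, averaging within customer groups gives group-specific effects, and summing them net of cost over the customers a rule would target scores that rule. In this simulation, the true effect exceeds the cost when |x| is above roughly 0.82, so the learned threshold should land near that value.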
Large tech companies such as Amazon, Google or Microsoft and many online portals have been relying on the power of algorithms for years, and increasingly so for causally assessing their activities (like advertising campaigns). Even small and medium-sized businesses have considerable potential for optimizing processes and decisions based on improved data analysis, and causal analysis in particular. Nothing should therefore stand in the way of a further democratization of artificial intelligence in the corporate world (and society in general), including for answering causal questions.