In developing his views on variational induction, Pietsch relies on a difference-making account of causation. More specifically, he defines causal relevance and causal irrelevance as three-place relations between two variables and a given context or background:
In a context C, in which a condition A and a phenomenon B occur, A is causally relevant to B, in short A ⇒ B, iff the following counterfactual holds: if A had not occurred, B would also not have occurred. In a context C, in which a condition A and a phenomenon B occur, A is causally irrelevant to B, in short A ⇏ B, iff the following counterfactual holds: if A had not occurred, B would still have occurred. (Pietsch 2016b, 5)
The truth value of the defining counterfactual statement is assessed
in terms of difference-making, taking into account instances that are or
were realized in our world (Pietsch 2016a, 11).1
Methodologically, this assessment rests on the framework of variational
induction, which stands in the tradition of Mill’s (1889, 253ff) methods and comprises two
key methods, namely, the method of difference and the strict method of
agreement. To determine if a circumstance
If two instances with the same background
are observed, one instance, in which circumstance is present and phenomenon is present, and another instance, in which circumstance is absent and phenomenon is absent, then is causally relevant to with respect to background , iff guarantees homogeneity. (Pietsch 2021, 33)2
Simply put, the homogeneity of the background ensures that all the
circumstances that are potentially causally relevant for phenomenon B are held fixed across the two instances.3
In contrast, the strict method of agreement allows us to identify relations of causal irrelevance:
If two instances with the same background C are observed, one instance, in which circumstance A is present and phenomenon B is present, and another instance, in which circumstance A is absent and phenomenon B is still present, then A is causally irrelevant to B with respect to background C, iff C guarantees homogeneity. (Pietsch 2021, 33)
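For concreteness, the two methods can be given a minimal implementation for binary variables. The following sketch is mine, not Pietsch's, and it simplifies the homogeneity condition by demanding that all remaining variables be identical across the two instances (cf. note 3); later examples reuse these helpers.

```python
# Minimal sketch of variational induction for binary variables.
# Observations are dicts mapping variable names to 0/1.

def same_background(obs1, obs2, circumstance, phenomenon):
    """Simplified homogeneity: every variable other than the circumstance
    and the phenomenon takes the same value in both instances."""
    rest = set(obs1) - {circumstance, phenomenon}
    return all(obs1[v] == obs2[v] for v in rest)

def method_of_difference(observations, circumstance, phenomenon):
    """Causal relevance: a pair of same-background instances, one with
    circumstance and phenomenon present, one with both absent."""
    return any(
        same_background(o1, o2, circumstance, phenomenon)
        and o1[circumstance] == 1 and o1[phenomenon] == 1
        and o2[circumstance] == 0 and o2[phenomenon] == 0
        for o1 in observations for o2 in observations)

def strict_method_of_agreement(observations, circumstance, phenomenon):
    """Causal irrelevance: a pair of same-background instances in which
    the phenomenon is present whether or not the circumstance is."""
    return any(
        same_background(o1, o2, circumstance, phenomenon)
        and o1[circumstance] == 1 and o1[phenomenon] == 1
        and o2[circumstance] == 0 and o2[phenomenon] == 1
        for o1 in observations for o2 in observations)
```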
For variational induction to yield reliable results, several conditions must be fulfilled. Most importantly, (i) all variables that are potentially causally relevant for the phenomenon of interest must be known, and (ii) the dataset must contain a sufficiently large number of observations covering all relevant constellations of the variable values. Pietsch acknowledges that, because of these presuppositions, his account is (what he calls) externally theory-laden. However, he contends that his account avoids internal theory-ladenness, that is, assumptions about causal connections between the variables considered. In other words, he claims to avoid the kind of theory-ladenness that is distinctive of hypothesis-driven approaches.4 With the framework of variational induction, by contrast, the causal structure of the phenomenon of interest is supposed to be elaborated from the data alone.5
1 Causal Inference by Variational Induction and the Underdetermination Problem6
To bring out the fundamental difficulty of inferring causal relationships by variational induction, I first focus on more complex causal structures, namely, (i) symmetric overdetermination and preemption and (ii) epiphenomena. Then, I evaluate whether (iii) the directionality of the relation of causal relevance can be established. For this assessment, I take for granted that the above-mentioned conditions for variational induction are met. In particular, I assume that every possible constellation of variable values that could have been generated by the causal structure in question is indeed observed and, moreover, that the set of observations involves neither measurement errors nor accidentally correlating variables.
(i) Let us consider the following dataset consisting of observations 1–4, which all share the same background C and record two circumstances, A₁ and A₂, as well as a phenomenon B:

Observation 1: A₁ present, A₂ present, B present
Observation 2: A₁ present, A₂ absent, B present
Observation 3: A₁ absent, A₂ present, B present
Observation 4: A₁ absent, A₂ absent, B absent
Further, let us suppose that the causal relationships that do, in fact, underlie these observations are those depicted by the model in figure 1a. In this model, we have two potential causes of B, namely, A₁ and A₂, which stand to each other in the relation of alternative causes:

A₂ is an ‘alternative cause’ to A₁ with respect to background C, iff there exists a phenomenon B such that A₂ is causally relevant to B with respect to a background C, but causally irrelevant to B with respect to a background C′ (i.e., A₁ is always present in C′). (Pietsch 2021, 34)
Since observations 2 and 4 share a background in which A₂ is absent, the method of difference establishes A₁ as causally relevant to B with respect to that background, while observations 1 and 3 show, by the strict method of agreement, that A₁ is causally irrelevant to B whenever A₂ is present; by symmetry, the same holds for A₂. Variational induction thus classifies A₁ and A₂ as alternative causes of B. However, the very same observations could equally have been generated by the models of preemption depicted in figures 1b and 1c,7 which means that variational induction cannot discriminate between symmetric overdetermination and preemption.
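The diagnosis can be replayed with the sketch given earlier; the variable names are mine:

```python
# Observations 1-4 with the candidate causes A1, A2 and the phenomenon B.
observations = [
    {"A1": 1, "A2": 1, "B": 1},  # observation 1
    {"A1": 1, "A2": 0, "B": 1},  # observation 2
    {"A1": 0, "A2": 1, "B": 1},  # observation 3
    {"A1": 0, "A2": 0, "B": 0},  # observation 4
]

# A1 comes out causally relevant to B when A2 is absent (observations 2, 4)
print(method_of_difference(observations, "A1", "B"))        # True
# ... yet causally irrelevant when A2 is present (observations 1, 3):
print(strict_method_of_agreement(observations, "A1", "B"))  # True
# By symmetry, the same two verdicts hold for A2: the signature of
# alternative causes. Nothing in these verdicts separates symmetric
# overdetermination from preemption.
```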
Now, a common feature of big data is its high dimensionality, meaning that each observation includes numerous different variables. So, it could be put forward that this problem only arises because the dataset is not sufficiently complex and not enough variables were regarded. For example, introducing a further variable D, which is potentially causally relevant for B as well, extends the dataset as follows:

Observation 5: A₁ absent, A₂ present, D absent, B present
Observation 6: A₁ present, A₂ present, D present, B present
Observation 7: A₁ present, A₂ absent, D present, B present
Observation 8: A₁ absent, A₂ absent, D absent, B absent
Observation 5 is indeed not compatible with the model of symmetric overdetermination in figure 1d because, according to that model, the variable D would have to be present whenever B is present. Yet several of the remaining models are still equally compatible with observations 5–8; in particular, since D strictly covaries with A₁, the connection between these two variables remains undetermined. Enlarging the set of observed variables thus shifts the underdetermination rather than removing it.
(ii) In connection with epiphenomena, similar problems arise. Epiphenomena, such as the reading of a barometer relative to an approaching storm, are effects of a cause of the phenomenon of interest without making a difference to that phenomenon themselves. Let us consider a dataset consisting of observations 9–10, which share the background C and record a circumstance A, an epiphenomenon E, and a phenomenon B:

Observation 9: A present, E present, B present
Observation 10: A absent, E absent, B absent
Then, let us suppose that these observations, which are compatible with all the models depicted in figure 2a–f, were generated by the causal model in 2c, in which A is causally relevant to both B and E. Since E strictly covaries with A, it cannot be held fixed when the relationship between A and B is assessed, so that, strictly speaking, C does not guarantee homogeneity.8 If homogeneity is nevertheless granted,9 the method of difference identifies not only A but also E, and equally the conjunction or disjunction of A and E,10 as causally relevant to B with respect to C, although E, as an epiphenomenon, makes no difference to B.

Pietsch acknowledges the issue that algorithms employing variational induction may mistakenly single out epiphenomena as causally relevant for a phenomenon of interest, although they lack the difference-making character of a cause (Pietsch 2021, 55; 2016a, 153–154).11 However, he attributes it to the fact that either the dataset is incomplete or the algorithm does not fully implement variational induction. By contrast, this example demonstrates that the erroneous conclusion is not due to missing observations, because it is drawn despite considering all observations compatible with the given causal model. Nor can it be ascribed to the algorithmic implementation, as the manual, non-algorithmic application of variational induction does not deal satisfactorily with epiphenomena either.12
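The same sketch illustrates the predicament; the 'ignorable' device below is my way of modeling the move of granting homogeneity for presumed effects (cf. notes 8 and 9):

```python
# Observations 9-10 with circumstance A, epiphenomenon E, and phenomenon B,
# as generated by the model in figure 2c (A causes both B and E).
observations = [
    {"A": 1, "E": 1, "B": 1},  # observation 9
    {"A": 0, "E": 0, "B": 0},  # observation 10
]

def relaxed_difference(observations, circumstance, phenomenon, ignorable):
    """Method of difference that tolerates changes in 'ignorable'
    background variables instead of requiring them to be held fixed."""
    for o1 in observations:
        for o2 in observations:
            fixed = set(o1) - {circumstance, phenomenon} - set(ignorable)
            if (all(o1[v] == o2[v] for v in fixed)
                    and o1[circumstance] == 1 and o1[phenomenon] == 1
                    and o2[circumstance] == 0 and o2[phenomenon] == 0):
                return True
    return False

# Read strictly, homogeneity fails for both variables, because the
# respective other one covaries and cannot be held fixed:
print(method_of_difference(observations, "A", "B"))  # False
print(method_of_difference(observations, "E", "B"))  # False
# Granting homogeneity renders the cause A and the epiphenomenon E equally
# 'causally relevant'; the observations cannot tell them apart:
print(relaxed_difference(observations, "A", "B", {"E"}))  # True
print(relaxed_difference(observations, "E", "B", {"A"}))  # True
```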
(iii) Finally, the fact that observations 9–10 could equally have been generated by a causal structure with inverted arrows, in which B is causally relevant to A and E, shows that the directionality of the relation of causal relevance cannot be established by variational induction either.14
In light of these difficulties, Pietsch’s assertion that it is
possible “to determine the true causal relationships by means of
variational induction” seems to be unwarranted (2021, 61). In the causal discovery
literature, it is well established that, given the causal Markov
condition and the causal faithfulness condition, certain features of the
underlying causal structure can be deduced from the probability
distribution in the data. But, aside from special cases, it is not
possible to uniquely determine the true causal structure.13
In my view, variational induction similarly faces the problem of underdetermination, as it ultimately rests on the analysis of patterns of (conditional) dependencies and independencies in the data. That is to say, variational induction aims at identifying the constellations of variable values in which the phenomenon of interest is present and those in which it is absent; but since several causal structures can generate exactly the same constellations, as the examples above illustrate, the set of observations alone cannot settle which of these structures is the true one.15
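The point can be made vivid with a toy enumeration. Under the simplifying assumptions, which are mine, that mechanisms are deterministic and that every non-root variable is simply the disjunction of its parents, the following sketch counts the directed acyclic graphs over three variables A, E, and B that generate exactly observations 9–10; several distinct structures qualify, among them ones in which B is the root cause:

```python
from itertools import product

NODES = ("A", "E", "B")
PAIRS = (("A", "E"), ("A", "B"), ("E", "B"))
OBSERVED = {(1, 1, 1), (0, 0, 0)}  # observations 9 and 10 as (A, E, B)

def constellations(edges):
    """All (A, E, B) rows a graph can generate, assuming each non-root
    variable is the OR of its parents; None if the graph is cyclic."""
    parents = {n: [u for (u, v) in edges if v == n] for n in NODES}
    roots = [n for n in NODES if not parents[n]]
    rows = set()
    for values in product((0, 1), repeat=len(roots)):
        state = dict(zip(roots, values))
        progress = True
        while progress and len(state) < len(NODES):
            progress = False
            for n in NODES:
                if n not in state and all(p in state for p in parents[n]):
                    state[n] = max(state[p] for p in parents[n])  # OR
                    progress = True
        if len(state) < len(NODES):
            return None  # a cycle blocked the propagation
        rows.add(tuple(state[n] for n in NODES))
    return rows

# For each pair of variables: no edge, or an edge in either direction.
compatible = []
for choice in product((None, 0, 1), repeat=len(PAIRS)):
    edges = [(u, v) if d == 0 else (v, u)
             for (u, v), d in zip(PAIRS, choice) if d is not None]
    if constellations(edges) == OBSERVED:
        compatible.append(edges)

print(f"{len(compatible)} structures generate exactly observations 9-10:")
for edges in compatible:
    print("  " + ", ".join(f"{u}->{v}" for u, v in edges))
```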
2 Objectives of Big Data Analysis and Causal Knowledge16
Pietsch distinguishes two central functions of big data approaches,
namely, prediction and intervention, and claims that
the exertion of both requires some access to causal knowledge.17 Arguably, the view that causal
knowledge is indispensable for effectively manipulating a phenomenon of
interest is hardly contested. However, causal knowledge is usually not
considered a prerequisite for predictive success.18
In that regard, it is useful to touch upon Pietsch's notion of causal knowledge, which bears on his distinction between direct and indirect causal connections:19 If a certain variable is causally relevant for another variable in a given context, as is the case for A and B in the examples above, the two variables are directly causally connected. If, by contrast, two variables are merely connected via a common cause, as an epiphenomenon and the phenomenon of interest are, Pietsch speaks of an indirect causal connection. Knowledge of connections of either kind counts, on his account, as causal knowledge. This wide notion of causal knowledge raises at least three difficulties.
(i) First of all, as a cause (usually) correlates with the phenomenon of interest, so does an epiphenomenon of this cause. The first correlation is indicative of a (direct) causal connection, whereas the second is indicative of a common cause. As discussed for epiphenomena, variational induction does not allow us to distinguish between a correlation ascribable to a direct causal connection and a correlation ascribable to a common cause. It follows that not only variational induction but also the analysis of (conditional) correlations yields causal knowledge in this wide sense. Accordingly, it does not seem consistent to specify correlation as a contrasting notion for causation. Furthermore, since the procedure of variational induction makes use of the pattern of dependencies in the data, it does not even allow for a distinction between correlations that are indicative of some sort of causal connection and purely accidental correlations. Therefore, it remains unclear in what sense big data algorithms are capable of delimiting causation from correlation, as Pietsch maintains.20
(ii) This broad notion of causal knowledge stands in tension with Pietsch’s claim that the primary function of causal knowledge is to guide us on how to effectively intervene in the world (2021, 54). If the knowledge of an indirect causal connection between two variables is regarded as causal knowledge as well, having access to causal knowledge in this wide sense does not help to discriminate between effective and ineffective strategies to manipulate a phenomenon of interest.
(iii) And, finally, it risks obscuring the distinction between questions of prediction and questions of intervention, which are addressed in scientific practice: Causal knowledge in the strict sense, that is to say, knowledge about relations of causal relevance and irrelevance in a given set of variables, is no precondition for predictions. Thus, the fact that an algorithm implementing variational induction yields accurate predictions cannot be cited in support of the view that variational induction is capable of establishing causal relationships. Conversely, interventions do depend on causal knowledge in the strict sense. If such an algorithm truly enabled us to intervene efficiently in the world, this could speak in favor of Pietsch's view that variational induction is capable of inferring direct causal connections. As an example, he refers to “algorithms [that] are designed to determine the best medicine to cure a certain cancer” (Pietsch 2021, 54).21 In fact, there are a number of studies that have relied on machine learning to predict the response to a given drug. In a recently published work, the tumor tissue of patients with breast cancer was analyzed with different methods at diagnosis (Sammut et al. 2022). Patients were subsequently treated with chemotherapy, and treatment response was evaluated. Using a machine learning algorithm, the authors built a model, based on the molecular profile of the tumor as well as on clinicopathological features, to predict the response to chemotherapy, and model performance was successfully validated on a different dataset. Amongst other things, they drew the conclusion that patients predicted to show a poor response to standard-of-care chemotherapy should be enrolled in clinical trials investigating novel therapies. The results of this big data approach may therefore allow for a better stratification of patients who will or will not benefit from conventional chemotherapy and are to that extent action-guiding. However, Pietsch maintains that such algorithms are, moreover, designed to determine the best treatment for a given cancer, in this way allowing us to effectively intervene upon the phenomenon of interest, namely, tumor growth.

For the sake of argument, let us suppose the algorithm revealed that three signaling pathways are hyperactive in tumors responding poorly to chemotherapy compared to tumors displaying a good treatment response. But, as outlined above, it is impossible to determine whether (and, if so, which of) these three pathways are indeed driving tumor growth and which are rather an epiphenomenon of the cause of excessive tumor growth or even a consequence thereof. Accordingly, the question whether one of these hyperactive signaling pathways truly constitutes a promising therapeutic target cannot be answered on the basis of the observational data alone but requires experimentation. Besides, in order to successfully intervene in the world, it is essential not only to identify the causes of the phenomenon but also to understand how these causes can be manipulated, specifically which drug effectively targets a given pathway (compare figure 3). This kind of causal knowledge may be generated in randomized controlled trials or in vitro studies but, again, cannot be derived from observational data.
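The asymmetry between the two questions can also be made concrete in a toy simulation; the setup is entirely schematic and mine, not an account of the Sammut et al. study. A marker that is a mere epiphenomenon of a hidden cause predicts the outcome perfectly, yet setting the marker by intervention leaves the outcome untouched:

```python
import random

random.seed(0)

def sample(intervene_on_marker=None):
    """A hidden cause H drives both a measured marker M (an epiphenomenon)
    and the outcome Y; an intervention sets M without touching H."""
    h = random.random() < 0.5
    m = h if intervene_on_marker is None else intervene_on_marker
    y = h  # the outcome depends on the hidden cause only
    return m, y

# Prediction: the marker is a perfect predictor of the outcome ...
data = [sample() for _ in range(10_000)]
accuracy = sum(m == y for m, y in data) / len(data)
print(f"predictive accuracy of M: {accuracy:.2f}")  # 1.00

# ... yet intervening on the marker leaves the outcome untouched:
treated = [sample(intervene_on_marker=False) for _ in range(10_000)]
rate = sum(y for _, y in treated) / len(treated)
print(f"outcome rate after setting M to 0: {rate:.2f}")  # ~0.50
```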
Tracing the practical benefit of big data approaches back to the generation of causal knowledge by the algorithms employed thus does not accurately reflect scientific practice, which draws on different sources of knowledge to determine effective interventions.
3 Conclusions
In my view, variational induction fails to elucidate causal structures involving preemption, symmetric overdetermination, or epiphenomena: it establishes relations of causal relevance between variables that in fact stand in a relation of causal irrelevance, and vice versa. Furthermore, the direction of the relation of causal relevance cannot be specified by variational induction, which poses a problem even for the simplest possible causal models. These shortcomings are neither specific to the method of variational induction nor ascribable to an imperfect dataset with missing observations, an insufficient number of observed variables, or measurement errors. Rather, the attempt to infer causal relationships from observational data (including big data) itself faces important limitations: since, for a given set of observations, multiple underlying causal structures are usually conceivable, it is generally impossible to uniquely determine the true causal model from this set of observations alone, without endorsing background assumptions about, or having prior knowledge of, the causal relationships between the variables involved.
Pietsch’s notion of causal knowledge explains, at least partially, why he reaches a different assessment of variational induction as a method for generating causal knowledge. Presupposing a broader notion of causal knowledge, he seems to have in mind a less strict success criterion: it suffices for variational induction to approximate causal relationships, namely, to determine whether there is any causal connection, direct or indirect, between two variables. This could create the appearance that the conflicting assessments of variational induction as a means to infer causal relationships are merely due to two divergent, equally plausible notions of causal knowledge. However, in my opinion, Pietsch’s broad notion of causal knowledge is problematic because it blurs the distinction between causation and correlation and between the prerequisites for prediction and those for intervention.
If the practical benefit of big data approaches cannot be attributed to the elucidation of causal relationships, an alternative explanation is needed. The identification of predictive markers may indeed improve patient care by sparing patients who are unlikely to respond the adverse reactions of an ineffective treatment. Randomized controlled trials can yield negative results merely because patients were not selected appropriately; this could be obviated by a more adequate patient stratification based on reliable predictor variables. The analysis of the molecular profile of a tumor can generate promising hypotheses about chemoresistance in a relatively unbiased way, which may then prove true in experimental assays. Undoubtedly, the results of machine learning algorithms contain very valuable information, which, in conjunction with knowledge derived from other sources, provides reasons to act in a certain way. In this sense, Pietsch is right in stating that precisely data-rich sciences such as medicine are fundamentally concerned with difference-making relationships and that the correlations unveiled by machine learning algorithms certainly do not replace causation. But, although such results of big data analysis can be action-guiding and aid in singling out potentially effective interventions, this should not be taken as a confirmation of the claim that machine learning algorithms indeed elucidate the causal structure underlying the observational dataset.
Serena Galli
https://orcid.org/0000-0002-9630-8645
University of Zurich
serenacarolina.galli@uzh.ch
Acknowledgements
I thank Peter Schulte and two anonymous reviewers for their helpful comments on an earlier version of this paper.
1. This conception of counterfactual statements differs fundamentally from traditional counterfactual approaches to causation, such as those advanced by Lewis, who analyzes the truth conditions of counterfactual statements by referring to possible worlds (1973, 560–561).↩︎

2. In principle, causal relationships between continuous variables can be established likewise by extending the framework of variational induction by the method of concomitant variation (Pietsch 2021, 34). In the following, I will be concerned with binary variables exclusively.↩︎

3. I examine the homogeneity condition more closely in the context of epiphenomena. For a detailed discussion, cf. Pietsch (2021, 33–34; 2016b, 11–13).↩︎

4. A prominent advocate of such an approach is Pearl, who maintains that “causal questions can never be answered from data alone” and that answering those questions “require[s] us to formulate a model of the process that generates the data, or at least some aspects of that process,” also in the context of big data (Pearl and Mackenzie 2018, 351).↩︎

5. If the requirements for variational induction are met, “then there are enough data to avoid spurious correlations and to map the causal structure of the phenomenon without further internal theoretical assumptions about the phenomenon” (Pietsch 2015, 910–911). See also Pietsch (2021, 65–66).↩︎

6. Woodward uses the term underdetermination problem to refer to the circumstance that, given a set of variables, different causal structures encompassing these same variables can generate an identical pattern of correlations and conditional correlations (2003, 106–107).↩︎

7. According to Lewis’ terminology, the causal model displayed in figure 1c is an example of early preemption (1986b, 200).↩︎
8. E, which is potentially causally relevant for B as well, cannot be held fixed, as it strictly covaries with A. Therefore, C does not guarantee homogeneity.↩︎

9. C guarantees homogeneity with respect to the relationship between A and B if “only circumstances that are causally irrelevant to B can change” or that “lie on a causal chain through A to B or that are effects of circumstances that lie on this causal chain” (Pietsch 2021, 33–34). Since E is an effect of A, C does guarantee homogeneity, although E cannot be held fixed. However, presuming that E is connected to A in this way contradicts Pietsch’s claim that his account avoids assumptions about causal connections between the variables considered.↩︎

10. Such Boolean expressions are, as Pietsch maintains, a possible result of variational induction (2021, 50).↩︎
11. Strictly speaking, he refers to proxies, which I take to be the equivalent of epiphenomena.↩︎
12. Needless to say, some such wrong conclusions can be traced back to issues regarding data acquisition. For instance, a sampling error can result in an accidental correlation between variables that is not present in the population from which the sample was drawn. Let us suppose that observations 9–10 were generated by the causal structure displayed in figure 2g. In this case, the observed correlation between A and E, on one side, and B, on the other side, must have occurred by chance. Yet, by employing variational induction, the conjunction or disjunction of A and E is mistakenly identified as causally relevant for B with respect to C, and this misattribution can be recognized as such and corrected only when analyzing another, possibly larger dataset devoid of this accidental correlation.↩︎

13. These two conditions are so-called bridge principles, which are required to connect the observations of a given set of variables to the underlying causal model that generated these observations. More specifically, the causal Markov condition allows the inference from a probabilistic dependence between two variables to a causal connection, whereas the causal faithfulness condition allows the inference from a probabilistic independence to causal separation. Cf. Eberhardt (2017, 82–85). For a discussion of underdetermination in causal inference in relation to different success criteria and background assumptions, see Zhang (2009).↩︎
14. Since the configurations in which A is present while B is absent and in which A is absent while B is present do not occur in a purely observational setting, these two possibilities cannot be distinguished.↩︎

15. As an anonymous reviewer pointed out, a promising way of dealing with this problem of underdetermination is the appeal to theoretical virtues such as parsimony. For example, Forster et al. (2018) introduce the principle of frugality, which favors those causal structures with the fewest causal connections. I fully agree that, technically, the procedure of variational induction could be combined with an algorithm that ranks the possible causal structures in terms of simplicity. Yet, this constraint regarding the total number of causal connections involves an assumption about the causal connections between the variables, since causal models with more numerous connections, such as those in figures 2b or 2e, are dismissed in favor of models with fewer connections, although they are perfectly compatible with the data. Therefore, this strategy is internally theory-laden and not reconcilable with the concept of variational induction as a purely data-driven approach. An alternative strategy to determine the true causal structure is experimentation. For a detailed discussion of experimentation as a means for resolving underdetermination, cf. Eberhardt (2013).↩︎
16. In this section, which is concerned with variational induction as a means of causal inference from big data specifically, I accept without further examination Pietsch’s claim that the most successful algorithms are based on variational induction.↩︎

17. Rather than distinguishing between different functions, I would propose to differentiate between two questions that are to be answered by big data analysis. To specify intervention as a function of big data approaches presupposes precisely what is under consideration. Besides, it remains unclear how to discern which function, prediction or intervention, is exerted in a given case.↩︎

18. For example, Woodward maintains that accurate predictions can be made based on correlations solely; furthermore, he points out that “inferences from effect to cause are often more reliable than inferences from cause to effect” (2003, 31–32).↩︎

19. Pietsch’s distinction of direct and indirect causal connections differs from the conventional view, according to which the difference between a direct and an indirect causal connection results from the absence or presence of a mediating variable. See, for example, Woodward (2003, 55).↩︎

20. “By relying on variational induction, big data approaches are to some extent able to distinguish causation from correlation” (Pietsch 2021, 57).↩︎

21. He does not cite any specific publication to underpin his assertion.↩︎