One of the problematic aspects of ML is that most of it is purely correlational, not causal. ML algorithms work primarily by applying statistical inference and reasoning to a series of variables and a target. In essence, they try to identify whether variable $A$ occurs with target $B$. If so, it may be that “$A$ predicts $B$.” However, there are many possible reasons why $A$ is predictive of $B$. One particularly dangerous case is when there’s a lurking variable present in the data. A lurking variable is an unreported variable ($C$) that makes it seem like $A \rightarrow B$ when in fact $A$ is only related to $C$, and it is $C$ that influences $B$.

As an example of this phenomenon, let’s consider an education context with standardized testing. Our target ($Y$) is the test score and our variable ($X_{variable}$) is the age of the test-taker. There’s also a lurking variable ($X_{lurking}$), which is the annual income of the test-taker. When we run an ML model or statistical analysis on age and test score alone, we might get a high positive correlation between $X_{variable}$ and $Y$. However, it would be incorrect to assume that the older a test-taker is, the higher their expected score is. Annual income is the lurking variable here: those with higher annual incomes might be able to afford more practice tests, tutors, textbooks, etc., and therefore have a higher score. If older test-takers also tend to earn more, age appears predictive of the score even though the relationship that actually drives scores is between annual income and test scores.
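To make the effect concrete, here is a minimal simulation sketch. The data-generating process, the variable names, and every number in it are assumptions made up for illustration; they do not come from real test data. The score is built from income only, yet age still correlates with it. Once income is controlled for (a partial correlation), the apparent link between age and score should largely vanish.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000

# Hypothetical data-generating process (illustrative assumption):
# income is the lurking variable, age tracks income,
# and the score depends on income only, never on age.
income = rng.normal(50.0, 15.0, n)                        # annual income, in thousands
age = 20.0 + 0.3 * income + rng.normal(0.0, 3.0, n)
score = 400.0 + 4.0 * income + rng.normal(0.0, 30.0, n)


def residualize(v, control):
    """Return the part of v that a straight-line fit on control cannot explain."""
    slope, intercept = np.polyfit(control, v, 1)
    return v - (slope * control + intercept)


# Raw correlation between age and score, ignoring income.
raw_corr = np.corrcoef(age, score)[0, 1]

# Partial correlation: correlate what is left of age and score
# after the linear effect of income has been removed from both.
partial_corr = np.corrcoef(
    residualize(age, income), residualize(score, income)
)[0, 1]

print(f"corr(age, score)          = {raw_corr:.2f}")
print(f"corr(age, score | income) = {partial_corr:.2f}")
```

In practice the lurking variable is often unrecorded, so this check is only possible once someone suspects it and collects it.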

Rather than focusing on these correlations (which can be highly misleading in the presence of lurking variables), causal AI extends traditional ML by trying to establish cause-effect relationships between variables and targets.

Causal models

The actual modeling process is very similar to traditional ML and has three steps. Variable identification involves selecting the variables involved in the causal linking. Model specification is choosing a specific model that relates those variables. Parameter estimation is the actual calculation of the strength of the causal linkage.
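The three steps can be sketched on simulated data. This is a simplified illustration rather than a full causal workflow: the variables, the linear model, and all numbers are hypothetical, and ordinary least squares only recovers a causal effect under the assumptions that the model is correctly specified and no relevant variable has been left out.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000

# Step 1: variable identification (all names and numbers are hypothetical).
income = rng.normal(50.0, 15.0, n)                        # suspected common driver
age = 20.0 + 0.3 * income + rng.normal(0.0, 3.0, n)       # candidate cause
score = 400.0 + 4.0 * income + rng.normal(0.0, 30.0, n)   # target

# Step 2: model specification. Assume a linear model:
#   score = b0 + b_age * age + b_income * income + noise
design = np.column_stack([np.ones(n), age, income])

# Step 3: parameter estimation by ordinary least squares.
coef, *_ = np.linalg.lstsq(design, score, rcond=None)
b0, b_age, b_income = coef

# For contrast, fit a model that leaves income out.
naive_design = np.column_stack([np.ones(n), age])
naive_coef, *_ = np.linalg.lstsq(naive_design, score, rcond=None)
naive_b_age = naive_coef[1]

print(f"age coefficient, income included: {b_age:.2f}")
print(f"income coefficient:               {b_income:.2f}")
print(f"age coefficient, income omitted:  {naive_b_age:.2f}")
```

With income in the model, the age coefficient should sit near zero and the income coefficient near the value used to generate the data. Leaving income out shifts the whole effect onto age, which shows why the variable identification step matters.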

Structural equation models

Structural equation models (SEM) are two-part models meant to analyze and measure the relationships between unobserved (latent) and observed variables. The first submodel, the measurement model, specifies the relationships between latent and observed variables. The second submodel, the structural model, specifies the relationships among the latent variables themselves. These models are typically estimated with a method called maximum likelihood estimation, which essentially asks the question “What parameters maximize the likelihood that the data was generated from a model with these parameters?”
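The two submodels and the estimation question can be written compactly. The notation below is one common convention (LISREL-style) and is an assumption here, since the text above does not fix any symbols: $x$ and $y$ are observed variables, $\xi$ and $\eta$ are latent variables, $\Lambda_x$ and $\Lambda_y$ are loading matrices, $B$ and $\Gamma$ hold the structural coefficients, $\delta$, $\varepsilon$, and $\zeta$ are error terms, and $\theta$ collects all free parameters.

```latex
\begin{align*}
x &= \Lambda_x \xi + \delta && \text{(measurement model)} \\
y &= \Lambda_y \eta + \varepsilon && \text{(measurement model)} \\
\eta &= B\eta + \Gamma\xi + \zeta && \text{(structural model)} \\
\hat{\theta} &= \arg\max_{\theta} \; L(\theta \mid \text{data}) && \text{(maximum likelihood estimate)}
\end{align*}
```

The first two lines link each observed variable to the latent variable it is meant to measure, the third line links the latent variables to one another, and the last line states the maximum likelihood question as an optimization over $\theta$.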
