

Causal Modeling

 Causal
modeling is a data modeling technique that is known
by several names, including structural modeling, path
modeling, and analysis of covariance structures. This
sophisticated extension of linear regression analysis
offers two primary advantages. First, it can solve
multiequation models that simulate complex systems
or processes. Second, it gets around some of the assumptions
and limitations of standard regression modeling.

 As
an example, suppose you wanted to make a better soft
drink. You might start by measuring the impact of
product performance attributes (e.g., sweetness, amount
of carbonation, and number of calories) on the overall
rating of leading soft drinks. One typical way to
do this is to regress the overall rating on the attribute
ratings. This is very easy to do in a variety of statistical
programs or even spreadsheets, but the results they
produce are based on several assumptions. These are
usually summarized as BLUE (Best Linear Unbiased Estimator,
the property the estimates have when the assumptions hold)
or "all things being equal," if they are
mentioned at all. Regression actually makes quite
a few assumptions about the data and the model being
solved, including that the model is ‘correctly
specified’ and that the independent variables
are not correlated with one another.
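
The regression described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the ratings are synthetic, and the sample size, attribute names, and "true" weights are invented for the example rather than taken from any real soft drink study.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500  # hypothetical number of respondents

# Synthetic attribute ratings (illustration only, not real survey data).
sweetness = rng.normal(size=n)
carbonation = rng.normal(size=n)
calories = rng.normal(size=n)
X = np.column_stack([sweetness, carbonation, calories])

# Assumed weights, used only to generate a synthetic overall rating.
true_weights = np.array([0.6, 0.3, -0.2])
overall = X @ true_weights + rng.normal(scale=0.5, size=n)

# Ordinary least squares: prepend an intercept column and solve.
design = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(design, overall, rcond=None)
intercept, weights = coef[0], coef[1:]

for name, w in zip(["sweetness", "carbonation", "calories"], weights):
    print(f"{name}: {w:.2f}")
```

Because the synthetic attributes here are generated independently, the estimated weights land close to the assumed ones. Real questionnaire data rarely behaves this well.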

 Virtually
every set of attributes ever put on a questionnaire
has had some degree of correlation between the individual
attributes. Usually there are several that are at
least moderately correlated. There are statistical
procedures (e.g., factor analysis) for dealing with
correlated independent variables, though often
the correlated attributes are used as inputs to the
regression model. Suppose the soft drink model created
using standard regression showed that both sweetness and the
number of calories were related to the overall rating
of a soft drink. Then the regression coefficients
would indicate the impact, ‘all things being
equal’, that changing the perceived sweetness
level would have on the overall acceptance. But since
the sweetness level and the number of calories are
correlated, all things are definitely not equal, and
there is a bias in the model.
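
One common way to check how far "all things" are from equal is the variance inflation factor (VIF), which grows as an attribute becomes more predictable from the others. The sketch below is a diagnostic illustration, not part of causal modeling itself. It uses synthetic data, and the 0.8 correlation between sweetness and calories is an assumption chosen for the example.

```python
import numpy as np

def variance_inflation_factors(X):
    """Return one VIF per column of X (rows = respondents, columns = attributes).

    Uses the identity that the VIFs are the diagonal of the inverse
    of the predictors' correlation matrix.
    """
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))

rng = np.random.default_rng(1)
n = 1000  # hypothetical number of respondents

# Synthetic ratings: sweetness is built to correlate about 0.8 with calories.
calories = rng.normal(size=n)
sweetness = 0.8 * calories + 0.6 * rng.normal(size=n)
carbonation = rng.normal(size=n)  # generated independently of the others

X = np.column_stack([sweetness, calories, carbonation])
vif = variance_inflation_factors(X)

for name, v in zip(["sweetness", "calories", "carbonation"], vif):
    print(f"{name}: VIF = {v:.2f}")
```

A VIF near 1 means the attribute is nearly independent of the others. Values well above 1, as for sweetness and calories here, signal that the individual coefficients are hard to interpret on their own.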

 The
potential ‘model specification error’ is
harder to deal with. Regression assumes that the model
(i.e., the equation it was asked to solve) is an accurate
representation of the problem or system being studied
– with nothing added and nothing left out. Getting
back to the soft drinks, if the brands are identified
to the respondents, then the image of the brands will
have a significant impact on their ratings. (Anyone
who doubts this has never seen ratings of the same
products rated blind, identified, and misidentified.)

 Using
typical regression modeling you could add some image
attributes to the model, but the model would probably
still be misspecified because it is nearly impossible
to capture every nuance of a product’s image
and performance. Some parts of these are almost always
‘left out’ or otherwise impossible to quantify.
A more accurate way to specify the model would be
to assume that there is a series of performance
attributes that drive overall ‘Product Performance’
and a series of image attributes that drive overall
‘Product Image,’ and these in turn drive
the overall product rating.

 Measuring
the overall performance and image of a product is
similar to measuring a person’s IQ. Neither can
be measured directly, but both can be derived from a series
of indicators. Causal modeling will derive the measures
(called ‘unobserved exogenous variables’),
and parcel out the impact of each on the overall rating.
And since image has an impact on taste, the direct
effect of image, and the indirect effect of image
(through its impact on product performance)
on the overall rating can be computed. Further, if
taste in turn has an impact on image, that effect
can be quantified as well. Graphically, this would
appear as follows:


 The
arrows or paths in the diagram represent the flow
of 'causality' (i.e., effect) in the model. Each arrow
indicates a statistically significant
relationship between the variables it connects. Sometimes the
path coefficients (i.e., regression coefficients)
are included on the arrows to indicate the impact
one variable has on the next. They have been omitted
in this example.
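
The direct and indirect effects discussed above can be illustrated with a simplified path analysis. The sketch below makes several assumptions that the full causal model does not. It treats image and performance as directly observed scores rather than derived measures, it assumes effects flow one way only (image to performance, not back), it estimates each path by ordinary regression, and it uses synthetic data with invented path coefficients.

```python
import numpy as np

def ols_slopes(y, *predictors):
    """Least-squares slopes of y on the predictors (intercept fitted, then dropped)."""
    design = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1:]

rng = np.random.default_rng(2)
n = 2000  # hypothetical number of respondents

# Assumed path coefficients, used only to generate the synthetic data.
a_true, b_true, c_true = 0.5, 0.7, 0.3
image = rng.normal(size=n)
performance = a_true * image + rng.normal(scale=0.5, size=n)
overall = c_true * image + b_true * performance + rng.normal(scale=0.5, size=n)

# Path image -> performance.
(a,) = ols_slopes(performance, image)
# Paths image -> overall (c) and performance -> overall (b).
c, b = ols_slopes(overall, image, performance)

direct = c          # effect of image on overall, holding performance fixed
indirect = a * b    # effect of image that flows through performance
total = direct + indirect

print(f"direct={direct:.2f} indirect={indirect:.2f} total={total:.2f}")
```

In this simplified setting the total effect equals the slope from regressing the overall rating on image alone. Splitting it into direct and indirect parts is what the path coefficients on the arrows make possible.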



