Causal Modeling

  • Causal modeling is a data modeling technique that is known by several names, including structural modeling, path modeling, and analysis of covariance structures. This sophisticated extension of linear regression analysis offers two primary advantages. First, it can solve multi-equation models that simulate complex systems or processes. Second, it gets around some of the assumptions and limitations of standard regression modeling.
  • As an example, suppose you wanted to make a better soft drink. You might start by measuring the impact of product performance attributes (e.g., sweetness, amount of carbonation, number of calories) on the overall rating of leading soft drinks. One typical way to do this is to regress the overall rating on the attribute ratings. This is very easy to do in a variety of statistical programs or even spreadsheets, but the results they produce are based on several assumptions. These are usually referred to as the BLUE (Best Linear Unbiased Estimator) assumptions, or glossed as ‘all things being equal,’ if they are mentioned at all. Regression actually makes quite a few assumptions about the data and the model being solved, including that the model is ‘correctly specified’ and that the independent variables are not correlated.
  • Virtually every set of attributes ever put on a questionnaire has had some degree of correlation between the individual attributes. Usually there are several that are at least moderately correlated. There are statistical procedures (e.g., factor analysis) for dealing with correlated independent variables, though oftentimes the correlated attributes are used directly as inputs to the regression model. Suppose the soft drink model created using standard regression showed that both sweetness and the number of calories were related to the overall rating of a soft drink. Then the regression coefficients would indicate the impact, ‘all things being equal,’ that changing the perceived sweetness level would have on overall acceptance. But since the sweetness level and the number of calories are correlated, all things are definitely not equal, and there is a bias in the model.
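To see why correlated predictors make ‘all things being equal’ interpretations risky, here is a small hand-rolled sketch (all numbers are hypothetical, not from a real study). Two predictors that track each other closely cause the fitted coefficients to swing far more than the change in the data would suggest:

```python
def two_predictor_regression(x1, x2, y):
    """Solve a two-predictor least-squares regression by hand
    (normal equations on mean-centered data)."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    c1 = [v - m1 for v in x1]
    c2 = [v - m2 for v in x2]
    cy = [v - my for v in y]
    s11 = sum(a * a for a in c1)
    s22 = sum(a * a for a in c2)
    s12 = sum(a * b for a, b in zip(c1, c2))
    s1y = sum(a * b for a, b in zip(c1, cy))
    s2y = sum(a * b for a, b in zip(c2, cy))
    det = s11 * s22 - s12 ** 2   # small when the predictors are correlated
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2

sweetness = [1, 2, 3, 4]
calories = [1.1, 2.0, 2.9, 4.2]   # tracks sweetness almost perfectly

# Rating depends only on sweetness: sweetness gets all the credit...
print(two_predictor_regression(sweetness, calories, [1, 2, 3, 4.0]))
# ...but nudge one rating by just 0.1 and the credit is redistributed
# between the two correlated predictors far more than 0.1 would suggest.
print(two_predictor_regression(sweetness, calories, [1, 2, 3, 4.1]))
```

The coefficients shift from (1.0, 0.0) to (0.775, 0.25): the two correlated attributes trade credit, which is exactly the bias described above.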
  • The potential ‘model specification error’ is harder to deal with. Regression assumes that the model (i.e., the equation it was asked to solve) is an accurate representation of the problem or system being studied – with nothing added and nothing left out. Getting back to the soft drinks, if the brands are identified to the respondents, then the image of the brands will have a significant impact on their ratings. (Anyone who doubts this has never seen ratings of the same products rated blind, identified, and misidentified.)
  • Using typical regression modeling you could add some image attributes to the model, but the model would probably still be misspecified because it is nearly impossible to capture every nuance of a product’s image and performance. Some parts of these are almost always ‘left out’ or otherwise impossible to quantify. A more accurate way to specify the model would be to conclude that there are a series of performance attributes that drive overall ‘Product Performance’ and a series of image attributes that drive overall ‘Product Image,’ and these in turn drive the overall product rating.
  • Measuring the overall performance and image of a product is similar to measuring a person’s IQ. They can’t be measured directly, but can be derived from a series of indicators. Causal modeling will derive these measures (called latent or ‘unobserved’ variables) and parcel out the impact of each on the overall rating. And since image has an impact on taste, the direct effect of image, and the indirect effect of image (through its impact on product performance) on the overall rating can be computed. Further, if taste in turn has an impact on image, that effect can be quantified as well. Graphically, this would appear as follows:
  • The arrows or paths in the diagram represent the flow of 'causality' (i.e., effect) in the model. These indicate that there is a statistically significant relationship between the variables. Sometimes the path coefficients (i.e., regression coefficients) are included on the arrows to indicate the impact one variable has on the next. They have been omitted in this example.
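The arithmetic behind direct and indirect effects is simple once the path coefficients have been estimated. The sketch below uses made-up coefficients for the soft-drink example (none of these numbers come from a real model): the indirect effect of image is the product of the coefficients along the image → performance → rating path, and the total effect is the sum of the direct and indirect effects.

```python
# Hypothetical path coefficients for the soft-drink example (illustrative only).
image_to_rating = 0.30   # direct effect: Product Image -> overall rating
image_to_perf = 0.50     # Product Image -> Product Performance
perf_to_rating = 0.60    # Product Performance -> overall rating

# The indirect effect of image flows through performance:
# multiply the coefficients along that path.
indirect_effect = image_to_perf * perf_to_rating

# Total effect of image on the overall rating = direct + indirect.
total_effect = image_to_rating + indirect_effect

print(indirect_effect, total_effect)
```

In a real application the coefficients come from the fitted model, and feedback paths (taste affecting image) contribute additional effects that are combined the same way.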


Cluster Analysis

  • Cluster analysis is a statistical procedure that attempts to classify a group into like or ‘homogeneous’ sub-groups. It is usually used as a segmentation tool, where people are grouped into segments based on their attitudes, behaviors, demographics, or some combination of these. However, cluster analysis can also be used to cluster variables (instead of cases) into like groups. The task is analogous to a coder developing a code list, in that individual responses are read and classified into groups that capture the common meaning.
  • Cluster analysis is often considered to be more of an art than a science. Of all the common statistical procedures, cluster analysis gives the least statistical guidance as to whether the solution it generates is meaningful or not. The cluster analysis algorithm does not tell the researcher the ‘correct’ number of clusters in a data set. Instead, the researcher has to produce and examine a number of different cluster solutions and decide which solution is the best. So the analyst may generate cluster solutions for two clusters, three, four and so on up to 10 or more clusters. Between different clustering algorithms, number of clusters produced, and options for how the data is processed, a considerable number of cluster solutions can be generated.
  • To evaluate the solutions, the researcher generally compares the individual groups for each solution (i.e., starting with the groups in the two-cluster solution, then the groups in the three-cluster solution, and so on) on a series of demographic, attitudinal, or other measures. Other statistical procedures can be used in the evaluation process, but oftentimes the analyst interprets each solution by how it fits with the other variables, and chooses the solution that seems to fit best.
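A minimal sketch of the most common clustering algorithm, K-Means, on a single variable (the usage data is hypothetical): assign each case to the nearest cluster center, move each center to the mean of its assigned cases, and repeat until the assignments settle.

```python
import random

def kmeans_1d(values, k, iters=50, seed=0):
    """Minimal 1-D K-Means: assign each value to the nearest center,
    then move each center to the mean of its assigned values."""
    rng = random.Random(seed)
    centers = rng.sample(values, k)   # start from k random cases
    for _ in range(iters):
        groups = {i: [] for i in range(k)}
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        centers = [sum(g) / len(g) if g else centers[i]
                   for i, g in groups.items()]
    return sorted(centers)

# Two obvious segments: light users (~2 units) and heavy users (~10 units).
usage = [1.0, 2.0, 2.5, 1.5, 9.0, 10.0, 11.0, 9.5]
print(kmeans_1d(usage, k=2))  # -> [1.75, 9.875]
```

In practice the analyst would run this for several values of k (and often several algorithms) and compare the resulting groups, as described above; the algorithm itself never says which k is ‘correct.’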
  • Kohonen Self-Organizing Maps (SOMs) are a form of Neural Network (an Artificial Intelligence technology) that also ‘clusters’ cases into like groups, using a different mathematical approach. To the researcher, SOMs do just what a K-Means cluster program does, but in a different way. However, if a SOM and a K-Means cluster program are programmed to produce the same number of cluster groups, the cases will be assigned somewhat differently. Oftentimes the SOM solution will be superior.


Conjoint Analysis

  • Conjoint analysis is a useful tool in predicting choice behavior. It is a versatile marketing research technique that can provide valuable information for new product development and forecasting, market segmentation and pricing decisions. This technique can be used to address numerous questions including:
  • Which new products will be successful?
  • Which features or attributes of a product or service drive the purchase decision?
  • Do specific market segments exist for a product?
  • What advertising appeals will be most successful with these segments?
  • Will changes in product design increase consumer preference and sales?
  • What is the optimal price to charge consumers for a product or service?
  • Can price be increased without a significant loss in sales?
  • Conjoint analysis provides insight and understanding as to how individuals value features (or "attributes") of products or services by determining their tradeoffs between different "levels" of those features. Conjoint analysis examines these tradeoffs to determine the combination of attributes that will be most satisfying to the consumer. In other words, by using conjoint analysis a company can determine the optimal features for their product or service.
  • In addition to providing information on the importance of product features, conjoint analysis provides the opportunity to conduct computer choice simulations. Since conjoint quantifies the value of each product feature, it is possible to perform various "what if" scenarios and estimate preference levels for hypothetical products. Simulations such as these are very useful in determining the potential market share of products or services before they are introduced to the market.
  • In sum, the value of conjoint analysis is that it predicts what products or services people will choose and assesses the weight people give to the various factors that underlie their decisions. As such, it is one of the more powerful, versatile and strategically important research techniques available.
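The simulation logic reduces to simple arithmetic once part-worth utilities have been estimated. The sketch below uses invented part-worths (the attributes, levels, and values are illustrative, not from a real study): a product profile's utility is the sum of the part-worths of its levels, and hypothetical profiles can then be compared in "what if" fashion.

```python
# Hypothetical part-worth utilities from a conjoint study (illustrative values).
partworths = {
    "brand":  {"BrandA": 0.8, "BrandB": 0.3},
    "price":  {"$1.99": 0.9, "$2.49": 0.4, "$2.99": 0.0},
    "flavor": {"cherry": 0.6, "lime": 0.2},
}

def total_utility(profile):
    """A product profile's utility is the sum of the part-worths of its levels."""
    return sum(partworths[attr][level] for attr, level in profile.items())

# A 'what if' simulation: compare two hypothetical product profiles.
product_1 = {"brand": "BrandA", "price": "$2.49", "flavor": "lime"}
product_2 = {"brand": "BrandB", "price": "$1.99", "flavor": "cherry"}

u1, u2 = total_utility(product_1), total_utility(product_2)
print(f"{u1:.1f} vs {u2:.1f}")  # -> "1.4 vs 1.8": product 2 is preferred
```

Here the weaker brand wins on the strength of its price and flavor, which is exactly the kind of trade-off insight described above.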

Correlation and Regression

  • Correlation is a statistical measure that quantifies the strength and direction of the linear relationship between two variables. If you look at a scatter plot of two variables that have each been standardized (expressed in units of standard deviations), their correlation is the slope of the ‘best fitting’ straight line that can be drawn through the points. If the line rises (traveling left to right) the slope is positive, which means that as one variable increases, the other also increases. If the line falls, the opposite is true: the slope is negative, and as one variable increases, the other decreases. Further, the size of the correlation measures the size of the resulting rise or fall. So a correlation of .5 means that for each standard deviation one variable increases, the other increases by half a standard deviation. A correlation of -.75 means that for each standard deviation one variable increases, the other decreases by three-quarters of a standard deviation.
  • Regression is an extension of correlation analysis that predicts the value of one variable (the dependent variable) based on the values of one or more predictor or ‘independent’ variables. In a bi-variate regression (i.e., the dependent variable and one independent variable), the main practical difference between regression and correlation is that regression works in the variables’ original units and adds an ‘intercept’ term. Thinking of the line, the intercept is the point where the line crosses the Y-axis. A bi-variate regression produces the general formula for a line:
  • y = a + bx, where:
  • y is the predicted value of the dependent variable
  • a is the intercept
  • b is the slope of the line
  • x is the value of the independent variable
  • A multiple regression analysis adds more independent variables, and extends the equation above to include additional independent variables, each having their own slope.
  • Regression is typically used whenever a prediction is required. Typical uses of regression in market research include predicting market share, coupon redemption rates, product acceptance scores, customer satisfaction or awareness and so on.
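These quantities are straightforward to compute by hand. The sketch below fits a bivariate regression to made-up data and also reports the correlation; it illustrates that the slope and the correlation are linked by b = r × (sd_y / sd_x), so they coincide only when both variables have the same spread.

```python
from statistics import mean

x = [1, 2, 3, 4, 5]             # e.g. advertising spend (hypothetical)
y = [2.1, 3.9, 6.2, 7.8, 10.1]  # e.g. sales (hypothetical)

mx, my = mean(x), mean(y)
sxx = sum((xi - mx) ** 2 for xi in x)
syy = sum((yi - my) ** 2 for yi in y)
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))

r = sxy / (sxx * syy) ** 0.5   # correlation
b = sxy / sxx                  # regression slope
a = my - b * mx                # intercept

# The fitted line y = a + bx can now predict sales for any spend level.
print(round(r, 4), round(a, 3), round(b, 3))
```

With this data the slope is 1.99 and the intercept is 0.05, while the correlation is just under 1.0; the slope is in the original units (sales per unit of spend), the correlation is unitless.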

Data Mining

  • Data Mining is the name given to a class of analytical techniques used to discover patterns, trends, and relationships in customer databases and other business information. This process can be an invaluable aid by providing a better understanding of customers and markets, and can ultimately lead to increased revenues and customer satisfaction.
  • The techniques that constitute Data Mining, and their applications, are quite broad. While the individual applications of Data Mining technology tend to differ from one problem to the next, there are several steps common to most analyses. The process starts with compiling available information, which usually exists in one or more corporate databases. This data is frequently augmented, or ‘overlaid,’ with additional information such as attitudinal, demographic, or lifestyle data. Once the data has been cleaned and combined, it is ready for analysis. The analysis is generally done in two steps: knowledge discovery and validation. Ultimately, the results of the analysis are used to aid in the development of marketing programs, pricing plans, new products, and so on.


Compiling Information

  • Most organizations have a wealth of information about their customers, products or services. But this information tends to be distributed across numerous departments, databases, and computer systems. While any one database can be the starting point for Data Mining, there is often considerable synergy in combining information from multiple sources. For example, by mining sales data combined with attitudinal measures and overall category usage, detailed buyer profiles, and models of the purchase dynamics can be generated.

Overlay Files

  • It is often desirable to supplement the in-house information with data from an outside source. This information is usually either existing information sold by a service bureau, or new information collected specifically for the project. The existing information, often called ‘secondary’ data, is generally demographic (income, assets), lifestyle information (cluster codes), or market statistics (size, sales). These overlay files are often used to provide information to use in predictive models or customer segmentation.
  • It can also be very useful to collect new information about customers or markets to add data that would otherwise be unavailable for the analysis. This is frequently done to quantify the link between attitudes and behaviors, to find leverage points for marketing programs, or to better understand market dynamics. Telephone surveys are frequently used to collect attitudinal data, product category usage, loyalty measures, advertising recall, and a host of other measures.

Cleaning/Combining the Data

  • Before the data can be analyzed it needs to be cleaned and/or combined. Cleaning the data generally involves the removal of impossible or out of range values, and implementing a strategy to handle missing information. Combining the data often requires additional steps and foresight to convert the data into a common analytical framework. For example, transaction level data might need to be summarized into time periods before being combined with household demographics.
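A minimal sketch of both steps on toy transaction records (the records and field layout are hypothetical): drop impossible or out-of-range values, then roll transactions up to the household/month level so they can be merged with household demographics.

```python
from collections import defaultdict

# Toy transaction records: (household_id, month, amount).
transactions = [
    ("H1", "2024-01", 25.0),
    ("H1", "2024-01", 40.0),
    ("H2", "2024-02", -5.0),   # impossible amount -> dropped
    ("H2", "2024-02", 30.0),
    ("H1", "2024-13", 10.0),   # impossible month -> dropped
]

valid_months = {f"2024-{m:02d}" for m in range(1, 13)}

# Cleaning: remove impossible or out-of-range values.
clean = [t for t in transactions if t[2] > 0 and t[1] in valid_months]

# Combining: summarize to household/month level, the common
# analytical framework for a merge with demographics.
summary = defaultdict(float)
for hh, month, amount in clean:
    summary[(hh, month)] += amount

print(dict(summary))  # {('H1', '2024-01'): 65.0, ('H2', '2024-02'): 30.0}
```

A real pipeline would also need a strategy for missing values (imputation or exclusion) rather than silently dropping records.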


Data Analysis

  • Once the data are ready, the analysis phase generally consists of two different steps with a reporting period in between. The first step in the analysis is called knowledge discovery. In this phase, ‘smart’ algorithms search through the data looking for patterns or relationships. These algorithms are typically CHAID (Chi-Square Automatic Interaction Detection) or CART (Classification And Regression Trees) procedures, though Neural Nets, Genetic Algorithms, and other hybrid systems are also used. They generally take one user-specified variable, called the ‘dependent variable,’ and try to relate every other variable in the file to that variable. Some algorithms can look for linear and non-linear relationships, as well as transform the variables in a variety of ways to maximize their relationships. Relationships are generally reported as decision trees, which are an easily understood way of presenting information.
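The core move of a CART-style discovery step can be sketched in a few lines (the data is a toy; real implementations add stopping rules, pruning, and CHAID's multi-way chi-square splits): at each node the algorithm tries every candidate split of a variable and keeps the one that most reduces impurity in the dependent variable.

```python
def gini(labels):
    """Gini impurity of a set of binary class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = labels.count(1) / n
    return 1.0 - p ** 2 - (1 - p) ** 2

def best_split(values, labels):
    """Try every threshold on one variable and keep the split that most
    reduces weighted impurity -- the greedy step CART repeats at each node."""
    best = (None, gini(labels))
    for threshold in sorted(set(values)):
        left = [lab for v, lab in zip(values, labels) if v <= threshold]
        right = [lab for v, lab in zip(values, labels) if v > threshold]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if weighted < best[1]:
            best = (threshold, weighted)
    return best

# Toy data: age vs. whether the customer redeemed a coupon (1 = yes).
age = [22, 25, 30, 41, 45, 52]
redeemed = [1, 1, 1, 0, 0, 0]
print(best_split(age, redeemed))  # -> (30, 0.0): age <= 30 separates perfectly
```

A full tree repeats this search on every variable at every node, which is why these procedures can relate hundreds of variables to the dependent variable automatically.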
  • Data Mining analyses typically relate hundreds or even thousands of variables to several dependent variables of key interest. Since many algorithms are free to manipulate the variables to maximize their relationships, it is not uncommon for an analysis to yield hundreds of ‘significant’ relationships. These relationships are simply measures of statistical association, and are often spurious or otherwise of little importance, so they are treated as hypotheses about relationships in the data that need to be studied further.
  • This information is generally discussed with the ‘hands on’ users or other researchers, and the number of hypotheses is filtered down to focus on the most promising avenues for further analysis. This second step in the analysis is generally called validation, and usually relies on common statistical techniques like regression, discriminant analysis, and cluster analysis. This step usually includes some form of quantification of trends or market opportunities, prediction, segmentation, or response modeling. The ultimate goal of the analysis is generally to either increase revenues through a better understanding of the customer, or else to develop better predictive models to use as forecasting tools.


Discriminant Analysis

  • Discriminant Analysis is used to relate a categorical dependent variable to a series of independent variables. It is similar to Regression Analysis, except that instead of predicting the value of the dependent variable, Discriminant predicts the category of the dependent variable. It does this by constructing linear combinations of the predictor variables that best distinguish between the groups of the dependent variable.

  • This technique has a wide range of applications. For example, it can identify the factors that distinguish satisfied customers from dissatisfied ones, concept acceptors from rejecters, or your customers from your competitor’s customers.
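With a single predictor and equal group variances, the discriminant rule collapses to a cutoff halfway between the two group means, which makes the idea easy to sketch (the satisfaction scores below are hypothetical):

```python
from statistics import mean

# Satisfaction scores for two known groups (hypothetical data).
satisfied = [8.0, 9.0, 7.5, 8.5]
dissatisfied = [3.0, 4.0, 2.5, 3.5]

# One predictor + equal group variances: the discriminant rule reduces
# to classifying by which group mean a new score is closer to,
# i.e. a cutoff at the midpoint of the two means.
cutoff = (mean(satisfied) + mean(dissatisfied)) / 2

def classify(score):
    return "satisfied" if score > cutoff else "dissatisfied"

print(cutoff)            # 5.75
print(classify(7.0))     # satisfied
print(classify(4.5))     # dissatisfied
```

With several predictors, Discriminant Analysis finds the weighted combination of them that separates the groups most cleanly, but the classify-by-nearest-group logic is the same.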

Factor Analysis

  • Factor analysis is a data reduction technique that tries to reduce a list of attributes or other measures to their essence; that is, a smaller set of ‘factors’ that capture the patterns seen in the data. Marketers and researchers who study a product, service, or industry professionally sometimes perceive many more distinctions within their category than do their consumers. This can lead to questionnaires containing attribute lists that consumers see as somewhat or largely synonymous. Factor analysis tells you how many different core factors the consumers perceived out of the list of attributes they rated.
  • The main benefits of factor analysis are that it lets the analyst focus on the unique core elements instead of the redundant attributes, and that it can serve as a data ‘pre-processor’ for regression models by replacing correlated attributes with a smaller set of factors.
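The "how many core factors" question is usually answered from the eigenvalues of the attributes' correlation matrix. For just two attributes the eigenvalues have a closed form, which makes the idea easy to sketch (the r = 0.9 correlation between the two ratings is hypothetical):

```python
# Two attribute ratings, e.g. "sweet" and "sugary", that consumers
# treat as nearly synonymous: hypothetical correlation r = 0.9.
r = 0.9

# For a 2x2 correlation matrix [[1, r], [r, 1]] the eigenvalues
# are 1 + r and 1 - r; factor methods use the eigenvalues to judge
# how many distinct factors the attributes really contain.
eigenvalues = [1 + r, 1 - r]

variance_explained = eigenvalues[0] / sum(eigenvalues)
print(variance_explained)  # the first "factor" carries 95% of the variance
```

With the two attributes this highly correlated, a single factor captures almost everything, telling the analyst the pair is effectively one core element rather than two.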


TURF Analysis

  • TURF is an acronym that stands for Total Unduplicated Reach and Frequency. This research technique originated in advertising and media research as a tool to maximize the number of people (i.e., Reach) who would be exposed to an advertisement per unit of cost. By analyzing the overlap between mailing or subscription lists, those lists with the lowest percent of overlap are identified. By comparing the number of non-duplicated people (i.e., Total Unduplicated) to the list costs, the most economical method of reaching the largest number of people can be calculated.
  • TURF Analysis is very useful for market research as well, especially when used to optimize potential product or promotional offerings. Instead of examining duplication across lists or other media sources, purchase intent scores are analyzed for a series of promotional offers or product elements (flavors, sizes, etc). By optimizing the unduplicated purchase intent of potential products or line extensions, the largest number of consumers can be appealed to with the fewest number of products or offers. TURF Analysis can also take into account different cost structures to produce the products, and help to optimize the profitability of a line extension or brand family.

  • To contrast TURF Analysis with typical methods, consider an example with three possible flavors (A, B and C) of a product, where the "best" two flavors will be brought to market. A typical analysis might look at the % Top Two Box purchase intent for each flavor and conclude that the best two flavors to market are the ones with the two highest scores. If flavors A, B, and C receive % Top Two Box scores of 80%, 75%, and 40% respectively, you could conclude that A and B are the best two flavors. But if the vast majority of people who would buy flavor B would also buy A, the incremental gain from offering B is small. If the overlap between A and C is fairly small, then even though C appeals to the fewest people in total, the combination of marketing flavors A and C will appeal to more people than the combination of A and B.
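The reach computation itself is just set arithmetic. The sketch below scores every two-flavor combination on a hypothetical panel of ten respondents, loosely matching the A/B/C example (the buyer sets are invented):

```python
from itertools import combinations

# Each flavor maps to the set of respondents who would buy it
# (hypothetical panel of 10; B's buyers overlap A almost completely,
# while C's buyers are mostly people A misses).
buyers = {
    "A": {1, 2, 3, 4, 5, 6, 7, 8},   # 80% of the panel
    "B": {1, 2, 3, 4, 5, 6, 7},      # 70%, all duplicated with A
    "C": {1, 2, 9, 10},              # 40%, mostly new people
}

def reach(combo):
    """Total Unduplicated Reach: distinct people any flavor in the
    combo would capture."""
    covered = set().union(*(buyers[f] for f in combo))
    return len(covered)

best = max(combinations(buyers, 2), key=reach)
print(best, reach(best))  # ('A', 'C') 10 -- versus only 8 for ('A', 'B')
```

Even though C scores lowest on its own, the A + C pairing covers the whole panel, which is the counter-intuitive result TURF is designed to surface; cost weights can be layered onto the same computation.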