

Data Analysis

 Once the data are ready, the analysis phase generally consists
of two distinct steps with a reporting period in
between. The first step in the analysis is called
knowledge discovery. In this step, ‘smart’
algorithms search through the data looking for patterns
or relationships. These algorithms are typically CHAID
(Chi-squared Automatic Interaction Detection) or CART
(Classification And Regression Trees) procedures,
though neural nets, genetic algorithms, and other
hybrid systems are also used. They generally take
one user-specified variable, called the ‘dependent
variable’, and try to relate every other variable in
the file to that variable. Some algorithms can look
for both linear and nonlinear relationships, as well
as transform the variables in a variety of ways to
maximize those relationships. Relationships are generally
reported as decision trees, which are an easily understood
way of presenting information.
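
As a rough illustration of relating every other variable to the dependent
variable, the sketch below scores each categorical predictor with a Pearson
chi-square statistic and ranks them, which is roughly how a CHAID-style
procedure might choose its first split. It is a simplification: real CHAID
also merges categories, adjusts significance levels, and splits recursively.
The records, the field names (`region`, `owns_home`, `responded`), and the
helper `rank_predictors` are hypothetical and exist only for this example.

```python
from collections import Counter


def chi_square(xs, ys):
    """Pearson chi-square statistic for two categorical sequences."""
    n = len(xs)
    x_counts = Counter(xs)
    y_counts = Counter(ys)
    observed = Counter(zip(xs, ys))
    stat = 0.0
    for x, nx in x_counts.items():
        for y, ny in y_counts.items():
            expected = nx * ny / n  # cell count expected under independence
            stat += (observed[(x, y)] - expected) ** 2 / expected
    return stat


def rank_predictors(records, dependent):
    """Score every other variable against the dependent one, strongest first."""
    target = [r[dependent] for r in records]
    scores = {}
    for name in records[0]:
        if name != dependent:
            scores[name] = chi_square([r[name] for r in records], target)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy file; 'responded' plays the role of the dependent variable.
records = [
    {"region": "north", "owns_home": "yes", "responded": "yes"},
    {"region": "south", "owns_home": "yes", "responded": "yes"},
    {"region": "north", "owns_home": "yes", "responded": "yes"},
    {"region": "south", "owns_home": "yes", "responded": "yes"},
    {"region": "north", "owns_home": "no", "responded": "no"},
    {"region": "south", "owns_home": "no", "responded": "no"},
    {"region": "north", "owns_home": "no", "responded": "no"},
    {"region": "south", "owns_home": "no", "responded": "no"},
]

ranking = rank_predictors(records, "responded")
for name, score in ranking:
    print(f"{name}: chi-square = {score:.2f}")
```

In this toy file `owns_home` separates responders from non-responders
completely, so it ranks first and would form the top split of the tree, while
`region` carries no information about the response.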

 Data mining analyses typically relate hundreds or even
thousands of variables to several dependent variables
of key interest. Since many algorithms are free to
manipulate the variables to maximize their relationships,
it is not uncommon for an analysis to yield hundreds
of ‘significant’ relationships. These relationships
are simply measures of statistical association, and
many are spurious or otherwise of little importance.
They are therefore treated as hypotheses
about relationships in the data, which need to be
studied further.
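
To see why so many ‘significant’ relationships can be spurious, the following
sketch relates 1,000 variables of pure random noise to a random dependent
variable and counts how many pass a chi-square test at the 5% level. The
sample size, the number of variables, the seed, and the function names are all
assumptions chosen for illustration; the only fixed constant is the standard
critical value 3.841 for one degree of freedom.

```python
import random

CRITICAL_CHI2 = 3.841  # chi-square critical value, 1 degree of freedom, alpha = 0.05


def chi_square_2x2(xs, ys):
    """Pearson chi-square statistic for two binary (0/1) sequences."""
    n = len(xs)
    a = sum(1 for x, y in zip(xs, ys) if x == 1 and y == 1)
    b = sum(1 for x, y in zip(xs, ys) if x == 1 and y == 0)
    c = sum(1 for x, y in zip(xs, ys) if x == 0 and y == 1)
    d = n - a - b - c
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # a constant variable cannot be associated with anything
    return n * (a * d - b * c) ** 2 / denom


def count_spurious(n_rows=200, n_vars=1000, seed=0):
    """Count noise variables that look 'significant' against a noise target."""
    rng = random.Random(seed)
    dependent = [rng.randint(0, 1) for _ in range(n_rows)]
    hits = 0
    for _ in range(n_vars):
        noise = [rng.randint(0, 1) for _ in range(n_rows)]
        if chi_square_2x2(noise, dependent) > CRITICAL_CHI2:
            hits += 1
    return hits


spurious = count_spurious()
print(f"{spurious} of 1000 pure-noise variables look 'significant' at the 5% level")
```

Because a 5% significance level lets through roughly one unrelated variable
in twenty, a run like this should flag on the order of fifty variables even
though none of them has any real relationship with the dependent variable.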

 These hypotheses are generally discussed with the ‘hands-on’
users or other researchers during the reporting period, and the
list is filtered down to focus on the most
promising avenues for further analysis. The second
step in the analysis is generally called validation,
and usually relies on common statistical techniques
like regression, discriminant analysis, and cluster
analysis. This step usually includes some form of
quantification of trends or market opportunities,
prediction, segmentation, or response modeling. The
ultimate goal of the analysis is generally to either
increase revenues through a better understanding of
the customer, or else to develop better predictive
models to use as forecasting tools.
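
As one small example of the validation step, the sketch below fits an
ordinary least-squares regression line to quantify a single trend that
survived the filtering. The data points, the variable names, and the helper
`fit_line` are hypothetical; a real validation would use the organization's
own data and would usually involve several predictors.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)  # share of variance explained
    return slope, intercept, r_squared


# Hypothetical figures: mailings per customer versus orders per customer.
mailings = [1, 2, 3, 4, 5]
orders = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept, r_squared = fit_line(mailings, orders)
print(f"orders = {intercept:.2f} + {slope:.2f} * mailings (R^2 = {r_squared:.3f})")

# The fitted line can then serve as a simple forecasting tool.
forecast = intercept + slope * 6
print(f"forecast for 6 mailings: {forecast:.2f}")
```

The slope gives the quantified trend, and the R-squared value indicates how
much of the variation the relationship actually explains, which helps decide
whether a hypothesis from the discovery step deserves to be kept.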



