A. Essay Questions
1. Explain the difference between univariate
statistical methods and multivariate statistical methods.
Univariate statistical methods have only one
dependent variable, whereas multivariate statistical methods have multiple
dependent variables.
2. Explain the difference between
factorial statistical methods and multivariate statistical methods. Can
statistical methods be both factorial and also multivariate? Explain.
Factorial statistical methods have
multiple independent variables, whereas multivariate statistical methods have
multiple dependent variables. For example, a univariate two-way ANOVA is factorial because it
has two "factors" (independent variables), but it is univariate
because it has a single dependent variable. A one-way MANOVA is multivariate
because it has multiple dependent variables, but it is not factorial because it
has only one factor (independent variable). A statistical method can be both
factorial and also multivariate. For example, a two-way MANOVA is both
factorial and also multivariate because it has two factors (independent
variables) and multiple dependent variables.
3. Discuss the statement that
"most multivariate techniques were developed for use in nonexperimental research."
The earliest multivariate techniques,
such as factor analysis, are indeed nonexperimental.
Multivariate analysis of variance models are the primary multivariate methods
to be developed specifically for use in experimental research. They were
developed rather late in the 20th
century, long after other multivariate methods such as factor analysis and
principal components. At least a third of the methods in this book are directly
applicable to experimental research data, but even those that are not can be
used to good advantage in a true experimental setting as ways of visualizing
the results of an experiment.
4. Summarize the major kinds of data
that are possible using the "four kinds of measurement scale"
hypothesized by Stevens.
The four kinds of measurement scale
identified by Stevens are nominal, ordinal, interval, and ratio. In fact, there
are almost no examples of interval data that are not also ratio, so we often
combine the two into what is called an interval/ratio scale. So, effectively,
we only have three kinds of data: those that are categorical (nominal), those
that are ordinal, and those that are fully quantitative. As we investigate the
methods of this book, we will discover that ordinal is not a meaningful
category of data for multivariate methods. Therefore, from the standpoint of
data, the major distinction will be between those methods that apply to fully
quantitative data (interval/ratio methods), and those that apply to categorical
data. There is a third category, those that apply to complex data sets that
have both quantitative and also categorical data in them. MANOVA (chapters
eight and ten) is an example of this third category. It has categorical
independent variables and quantitative dependent variables. Factor analysis
(chapter three) is an example of a method that has only quantitative variables,
as is multiple regression. Loglinear models (chapter twelve) are an example
of a multivariate method that deals with data that are completely categorical.
5. Explain the distinction between
continuous and discrete data. Can data be both discrete and also
interval/ratio? Explain. Can data be both continuous and also categorical?
Unfortunately, some use the term
"discrete" to refer to categorical data. However, data can be fully
quantitative (interval/ratio) and yet be discrete. For example, the number
of persons in various families is fully quantitative (interval/ratio) and yet
discrete. Obviously, therefore, data can be both discrete and also
interval/ratio. All categorical data, however, are discrete. It is not possible
for data to be both continuous and also categorical since continuous
implies that the data are quantitative (interval/ratio).
6. There is a major difference
between experimental and correlational research. Explain how research designs
differ for these two. How do the statistical methods differ? How is
randomization applied in each kind of research?
Experimental and correlational studies
differ both in the research designs employed and also in the kind of statistics
that are used to analyze the data. They also differ in the kinds of questions
that can be answered, and in the way they use random processes. Correlational
research designs are usually based on random selection of
subjects, often in a naturalistic setting where there is little control over
the variables. Experimental research designs, on the other hand, usually
involve tight experimental controls and random assignment of
subjects to treatment groups. The critical distinction between experimental and
non-experimental designs is that in true experimental designs the experimenter
manipulates the independent variable and uses random assignment to treatment
groups and the control group. Experimental designs enable the researcher to
make more definitive conclusions and to attribute causality, whereas the
inferences in correlational research are much more tenuous. The multivariate
methods introduced in chapters seven (Hotelling’s T-Squared), eight (MANOVA), nine (ANCOVA
and MANCOVA), and ten (repeated measures MANOVA, hierarchical linear models,
etc.) are applicable to the data obtained from true experimental designs. The
methods in the remainder of the chapters are used primarily with data from
correlational studies and therefore provide less definitive conclusions.
7. Evaluate the concept that although ANOVA methods were developed for
experimental research, they can be applied to correlational data, that
"the statistical methods 'work' whether or not the researcher manipulated
the IV." You may wish to bring Winer's
point (1971, p. 162) about the assumption of independence of treatment effects
and error into the discussion.
Many seem to believe that the only real
disadvantage of nonexperimental
studies is that one cannot attribute causality with a high degree of
confidence. While this is indeed a serious problem with nonexperimental
designs, there are other issues. On page 162 of his 1971 book, Winer
makes the very important point that ANOVA was originally developed to be used
within the setting of a true experiment where one has control over extraneous
variables, and subjects are assigned to treatment groups at random. The logic
underlying the significance tests and the determination of probabilities of the
Type I error is based upon the assumption that treatment effects and error
effects are independent of one another. The only assurance one can have that
the two are indeed independent is that subjects are assigned at random to
treatment groups. In other words, when ANOVA is used to analyze data from a nonexperimental
design, there is no assurance that treatment and error effects are independent
of one another, and the logic underlying the determination of the probability
of the Type I error breaks down. One is, in this case, using ANOVA
metaphorically. As Winer
says, "hence the importance of randomization in design problems."
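Random assignment of subjects to groups, as Winer's point requires, can be sketched in a few lines of Python. This fragment is not from the text; NumPy and the subject pool of twenty-four are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical pool of 24 subjects to be split evenly, at random,
# between a treatment group and a control group.
subjects = np.arange(24)
shuffled = rng.permutation(subjects)
treatment, control = shuffled[:12], shuffled[12:]
```

Because assignment depends only on the random shuffle, any systematic subject differences are equally likely to land in either group, which is what justifies treating treatment effects and error as independent.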
8. Chapter one states that the
mathematical prerequisite for understanding this book is matrix algebra. Why do
you suppose matrix algebra is crucial to doing multivariate statistics?
As explained in the chapter, matrix
algebra simplifies the calculations for multivariate data. Also, in the same
way that covariance is
the underlying concept of which variance is a
special case (as is the PPMCC), matrix multiplication is also a general case of
which covariance is a special case. That is, when two matrices are multiplied,
the elements of the resultant matrix consist of sums of products of
the rows of the premultiplying
matrix and the columns of the postmultiplying
matrix. When the entries in the two matrices being multiplied are deviations
from means, the resultant product matrix is an SSCP matrix, which becomes
a covariance matrix when it is divided by its degrees of freedom. In other words, matrix multiplication
is uniquely fitted to creating covariance matrices and is an elegant and
efficient method for doing so.
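The relationship described above can be shown numerically. The following Python sketch (not part of the text; NumPy and the small data set are illustrative assumptions) builds an SSCP matrix by premultiplying a deviation-score matrix by its transpose, then divides by degrees of freedom to obtain the covariance matrix.

```python
import numpy as np

# Hypothetical data: 5 observations on 3 variables.
X = np.array([
    [2.0, 4.0, 1.0],
    [3.0, 5.0, 2.0],
    [5.0, 7.0, 2.0],
    [6.0, 8.0, 4.0],
    [9.0, 11.0, 6.0],
])
n = X.shape[0]

# Deviation-score matrix: subtract each column's mean.
D = X - X.mean(axis=0)

# SSCP matrix: sums of products of rows of D-transpose
# with columns of D.
sscp = D.T @ D

# Dividing the SSCP matrix by its degrees of freedom (n - 1)
# yields the covariance matrix.
cov = sscp / (n - 1)
```

As a check, `cov` matches NumPy's built-in `np.cov(X, rowvar=False)`, which uses the same n - 1 divisor.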
9. Discuss the taxonomy of the four
basic data structures that are amenable to multivariate analysis, give examples
of each, and of multivariate methods that can be applied to each.
In chapter one a simplified scheme was
presented for categorizing the types of data to which multivariate methods can
be applied. Four basic data types were identified. The first is a
single sample with multiple variables measured on each sampling unit. An
example of this kind of data set would be the scores of three hundred people on
seven psychological tests. Multivariate methods that apply to this kind of data
would include principal components analysis, factor analysis, and confirmatory factor analysis
(which is a particular method within structural equation modeling). These
methods provide answers to the question, "What is the covariance structure
of this set of multiple variables?"
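A minimal sketch of how that covariance-structure question is approached, using principal components: the components are the eigenvectors of the covariance matrix, and the eigenvalues give the variance each component accounts for. This Python fragment is illustrative only; NumPy and the small stand-in data set (six people on three tests, in place of the 300-by-7 example) are assumptions.

```python
import numpy as np

# Hypothetical scores of 6 people on 3 tests.
X = np.array([
    [10.0, 20.0, 30.0],
    [12.0, 24.0, 28.0],
    [ 9.0, 18.0, 33.0],
    [14.0, 27.0, 25.0],
    [11.0, 22.0, 31.0],
    [13.0, 25.0, 27.0],
])

# Covariance matrix of the variables.
S = np.cov(X, rowvar=False)

# Principal components: eigenvectors of S, ordered so the
# largest-variance component comes first.
eigvals, eigvecs = np.linalg.eigh(S)                 # ascending order
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]   # largest first

# Proportion of total variance each component explains.
explained = eigvals / eigvals.sum()
```

The eigenvalues sum to the trace of S (the total variance), so `explained` partitions the covariance structure of the variable set.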
The second data type is a
single sample with two sets of multiple variables (an X set and a Y set)
measured on each unit.
An example of data of this kind would be a linked data set of six hundred and
twenty trading days, with the X set of variables consisting of seven stock
market indices (the closing value for each of those days) and the Y set of
variables consisting of nine mutual funds (the closing value for each day).
Multivariate methods that can be applied to this kind of data include canonical
correlation (chapter five) and multivariate multiple regression (chapter
six). These methods provide answers to the question, "What is the
covariance structure in each set of multiple variables that is maximally
predictive of the other set?" Another method that can be used with a
single sample with two sets of multiple variables would be SEM, structural
equation modeling (chapter fourteen). However, SEM can also be applied when there are more than
two sets of multiple variables. In fact, it can handle any number of sets of
multiple variables. It is the general case of which these other methods are
special cases, and as such it has a great deal of potential analytical power.
SEM is introduced in the final chapter of the book.
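The canonical correlation question for two linked variable sets can be sketched as follows: the canonical correlations are the singular values of the "whitened" cross-covariance matrix. This Python fragment is illustrative only; NumPy, the simulated data (three X variables and two Y variables standing in for the stock-index and mutual-fund sets), and the helper name `inv_sqrt` are all assumptions, not from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linked data: 100 "days", an X set of 3 series and
# a Y set of 2 series, constructed so the two sets are related.
n = 100
X = rng.normal(size=(n, 3))
Y = 0.5 * X[:, :2] + rng.normal(size=(n, 2))

def inv_sqrt(M):
    """Inverse symmetric square root via eigendecomposition."""
    vals, vecs = np.linalg.eigh(M)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T

# Partition the joint covariance matrix into within- and
# between-set blocks.
S = np.cov(np.hstack([X, Y]), rowvar=False)
Sxx, Syy, Sxy = S[:3, :3], S[3:, 3:], S[:3, 3:]

# Canonical correlations: singular values of the whitened
# cross-covariance matrix.
K = inv_sqrt(Sxx) @ Sxy @ inv_sqrt(Syy)
canon_corrs = np.linalg.svd(K, compute_uv=False)
```

Each singular value is the correlation between one pair of optimally weighted linear combinations, one from each set, which is exactly the "maximally predictive" structure described above.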
The third data type is two
samples with multiple variables measured on each unit. An
example would be a simple experiment with an experimental group and a control
group, and with two or more dependent variables measured on each observation
unit. For example, in an agricultural experiment, the effects of a fertilizer
could be assessed by applying it to twelve tomato plants selected at random
(the experimental group) and not applying it to the other twelve tomato plants
(the control group), using multiple dependent variable measurements (such as
number of tomatoes produced, average size, etc.). Multivariate methods that can be applied to
this kind of data are Hotelling's T-squared test, discriminant
analysis, and classification analysis. The
T-squared test is just the multivariate analogue of the ordinary t test, which
can, of course, be applied to two-sample data when there is only one dependent
variable. When there are multiple dependent variables, the T-squared test can
be used to test for significance. The T-squared test answers the question,
"Are the mean vectors for these two samples significantly different from
one another?" Discriminant analysis and classification analysis can be
used to find the optimal linear combination of the multiple dependent variables
to best separate the two groups from one another.
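The two-sample T-squared statistic compares the difference between mean vectors against the pooled within-group covariance matrix. The following Python sketch is illustrative only; NumPy and the made-up tomato data (six plants per group, two dependent variables) are assumptions, not from the text.

```python
import numpy as np

# Hypothetical two-group tomato experiment: two dependent
# variables (tomato count, average size) per plant.
treat = np.array([[24, 6.1], [27, 6.4], [22, 5.9],
                  [26, 6.3], [25, 6.0], [28, 6.5]], dtype=float)
control = np.array([[19, 5.2], [21, 5.5], [18, 5.1],
                    [20, 5.3], [22, 5.6], [17, 5.0]], dtype=float)

n1, n2 = len(treat), len(control)
diff = treat.mean(axis=0) - control.mean(axis=0)

# Pooled within-group covariance matrix.
S1 = np.cov(treat, rowvar=False)
S2 = np.cov(control, rowvar=False)
Sp = ((n1 - 1) * S1 + (n2 - 1) * S2) / (n1 + n2 - 2)

# Hotelling's two-sample T-squared statistic:
# T2 = (n1*n2 / (n1+n2)) * diff' * inv(Sp) * diff
T2 = (n1 * n2) / (n1 + n2) * diff @ np.linalg.solve(Sp, diff)
```

With a single dependent variable this expression reduces to the square of the ordinary two-sample t statistic, which is the sense in which T-squared is the multivariate analogue of the t test.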
The fourth data type is the same as the
third, but extended to three or more samples (with multiple dependent variables
measured on each of the units of observation). For example, the same test of
the effects of fertilizer on tomatoes could be carried out with two types of
fertilizer plus the control group, making three groups to be compared multivariately. The
major method here is the MANOVA, or multivariate analysis of variance,
which is the multivariate analog of ANOVA. In fact, for every ANOVA model
(two-way, three-way, repeated measures, etc.) there exists a corresponding
MANOVA model. MANOVA models answer all the same kinds of questions that ANOVA
models do (significance of main effects and interactions), but within
multivariate spaces rather than just for a single dependent variable. Discriminant
analysis and classification
methods can also be applied to multivariate data having three or more groups.
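For the three-group case just described, the most common MANOVA test statistic, Wilks' lambda, compares the within-groups SSCP matrix W to the total SSCP matrix W + B; small values indicate group separation. The following Python sketch is illustrative only; NumPy and the made-up three-group tomato data are assumptions, and Wilks' lambda is only one of several MANOVA test statistics.

```python
import numpy as np

# Hypothetical three groups (control plus two fertilizers), with
# two dependent variables measured on each of four plants.
groups = [
    np.array([[17, 5.0], [19, 5.4], [18, 5.1], [20, 5.2]], dtype=float),
    np.array([[22, 5.8], [24, 6.1], [23, 6.0], [25, 5.9]], dtype=float),
    np.array([[26, 6.4], [28, 6.5], [27, 6.8], [29, 6.6]], dtype=float),
]

all_data = np.vstack(groups)
grand_mean = all_data.mean(axis=0)

# W: within-groups SSCP matrix; B: between-groups SSCP matrix.
W = np.zeros((2, 2))
B = np.zeros((2, 2))
for g in groups:
    d = g - g.mean(axis=0)
    W += d.T @ d
    m = (g.mean(axis=0) - grand_mean).reshape(-1, 1)
    B += len(g) * (m @ m.T)

# Wilks' lambda: ratio of generalized within variance to
# generalized total variance; values near 0 mean strong separation.
wilks = np.linalg.det(W) / np.linalg.det(W + B)
```

Note that W + B equals the total SSCP matrix of all observations about the grand mean, the multivariate counterpart of the ANOVA identity that total sums of squares partition into within and between parts.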