Essay 1

A. Essay Questions

1. Explain the difference between univariate statistical methods and multivariate statistical methods.

Univariate statistical methods have only one dependent variable, whereas multivariate statistical methods have multiple dependent variables.

2. Explain the difference between factorial statistical methods and multivariate statistical methods. Can statistical methods be both factorial and also multivariate? Explain.

Factorial statistical methods have multiple independent variables, whereas multivariate statistical methods have multiple dependent variables. For example, a univariate two-way ANOVA is factorial because it has two "factors" (independent variables), but it is univariate because it has a single dependent variable. A one-way MANOVA is multivariate because it has multiple dependent variables, but it is not factorial because it has only one factor (independent variable). A statistical method can be both factorial and also multivariate. For example, a two-way MANOVA is both factorial and also multivariate because it has two factors (independent variables) and multiple dependent variables.

3. Discuss the statement that "most multivariate techniques were developed for use in nonexperimental research."

The earliest multivariate techniques, such as factor analysis, are indeed nonexperimental. Multivariate analysis of variance models are the primary multivariate methods to be developed specifically for use in experimental research. They were developed rather late in the 20th century, long after other multivariate methods such as factor analysis and principal components. At least a third of the methods in this book are directly applicable to experimental research data, but even those that are not can be used to good advantage in a true experimental setting as ways of visualizing the results of an experiment.

4. Summarize the major kinds of data that are possible using the "four kinds of measurement scale" hypothesized by Stevens.

The four kinds of measurement scale identified by Stevens are nominal, ordinal, interval, and ratio. In fact, there are almost no examples of interval data that are not also ratio, so we often combine the two into what is called an interval/ratio scale. So, effectively, we only have three kinds of data: those that are categorical (nominal), those that are ordinal, and those that are fully quantitative. As we investigate the methods of this book, we will discover that ordinal is not a meaningful category of data for multivariate methods. Therefore, from the standpoint of data, the major distinction will be between those methods that apply to fully quantitative data (interval/ratio methods), and those that apply to categorical data. There is a third category, those that apply to complex data sets that have both quantitative and also categorical data in them. MANOVA (chapters eight and ten) is an example of this third category. It has categorical independent variables and quantitative dependent variables. Factor analysis (chapter three) is an example of a method that has only quantitative variables, as is multiple regression. Loglinear models (chapter twelve) are an example of a multivariate method that deals with data that are completely categorical.

5. Explain the distinction between continuous and discrete data. Can data be both discrete and also interval/ratio? Explain. Can data be both continuous and also categorical? Explain.

Unfortunately, some use the term "discrete" to refer to categorical data. However, data can be fully quantitative (interval/ratio) and yet be discrete. For example, a count of the number of persons in each of several families is fully quantitative (interval/ratio) and yet is discrete. Obviously, therefore, data can be both discrete and also interval/ratio. All categorical data, however, are discrete. It is not possible for data to be both continuous and also categorical, since continuous implies that the data are quantitative (interval/ratio).

6. There is a major difference between experimental and correlational research. Explain how research designs differ for these two. How do the statistical methods differ? How is randomization applied in each kind of research?

Experimental and correlational studies differ both in the research designs employed and also in the kind of statistics that are used to analyze the data. They also differ in the kinds of questions that can be answered, and in the way they use random processes. Correlational research designs are usually based on random selection of subjects, often in a naturalistic setting where there is little control over the variables. Experimental research designs, on the other hand, usually involve tight experimental controls and random assignment of subjects to treatment groups. The critical distinction between experimental and nonexperimental designs is that in true experimental designs the experimenter manipulates the independent variable and uses random assignment to treatment groups and the control group. Experimental designs enable the researcher to make more definitive conclusions and to attribute causality, whereas the inferences in correlational research are much more tenuous. The multivariate methods introduced in chapters seven (Hotelling’s T-Squared), eight (MANOVA), nine (ANCOVA and MANCOVA), and ten (repeated measures MANOVA, hierarchical linear models, etc.) are applicable to the data obtained from true experimental designs. The methods in the remainder of the chapters are used primarily with data from correlational studies and therefore provide less definitive conclusions.

7. Evaluate the concept that although ANOVA methods were developed for experimental research, they can be applied to correlational data, that "the statistical methods 'work' whether or not the researcher manipulated the IV." You may wish to bring Winer's point (1971, p. 162) about the assumption of independence of treatment effects and error into the discussion.

Many seem to believe that the only real disadvantage of nonexperimental studies is that one cannot attribute causality with a high degree of confidence. While this is indeed a serious problem with nonexperimental designs, there are other issues. On page 162 of his 1971 book, Winer makes the very important point that ANOVA was originally developed to be used within the setting of a true experiment where one has control over extraneous variables, and subjects are assigned to treatment groups at random. The logic underlying the significance tests and the determination of probabilities of the Type I error is based upon the assumption that treatment effects and error effects are independent of one another. The only assurance one can have that the two are indeed independent is that subjects are assigned at random to treatment groups. In other words, when ANOVA is used to analyze data from a nonexperimental design, there is no assurance that treatment and error effects are independent of one another, and the logic underlying the determination of the probability of the Type I error breaks down. One is, in this case, using ANOVA metaphorically. As Winer says, "hence the importance of randomization in design problems."

8. Chapter one states that the mathematical prerequisite for understanding this book is matrix algebra. Why do you suppose matrix algebra is crucial to doing multivariate statistics?

As explained in the chapter, matrix algebra simplifies the calculations for multivariate data. Also, in the same way that covariance is the underlying concept of which variance is a special case (as is the PPMCC), matrix multiplication is also a general case of which covariance is a special case. That is, when two matrices are multiplied, the elements of the resultant matrix consist of sums of products of the rows of the premultiplying matrix and the columns of the postmultiplying matrix. When the entries in these two matrices to be multiplied are deviations from means, then the resultant product matrix is an SSCP matrix, which becomes a covariance matrix when it is divided by degrees of freedom. In other words, matrix multiplication is uniquely fitted to creating covariance matrices and is an elegant and efficient method for doing so.
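The route from deviation scores to an SSCP matrix to a covariance matrix can be illustrated with a short NumPy sketch. The data values below are invented purely for illustration, and the variable names (X, D, sscp, cov) are assumptions of this example rather than notation from the chapter.

```python
import numpy as np

# Hypothetical data: 5 observations (rows) on 3 variables (columns).
X = np.array([
    [2.0, 4.0, 1.0],
    [3.0, 6.0, 0.0],
    [5.0, 7.0, 3.0],
    [6.0, 9.0, 2.0],
    [9.0, 14.0, 4.0],
])

n = X.shape[0]

# Deviation scores: subtract each column's mean from that column.
D = X - X.mean(axis=0)

# Premultiplying D by its transpose sums the products of deviations,
# giving the sums of squares and cross products (SSCP) matrix.
sscp = D.T @ D

# Dividing by the degrees of freedom (n - 1) gives the covariance matrix.
cov = sscp / (n - 1)

print(cov)
```

The diagonal of cov holds the ordinary variances of the three variables, which is the sense in which variance is a special case of covariance. The result should agree with NumPy's built-in np.cov(X, rowvar=False).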

9. Discuss the taxonomy of the four basic data structures that are amenable to multivariate analysis, give examples of each, and of multivariate methods that can be applied to each.

In chapter one a simplified scheme was presented for categorizing the types of data to which multivariate methods can be applied. Four basic data types were identified. The first is a single sample with multiple variables measured on each sampling unit. An example of this kind of data set would be the scores of three hundred people on seven psychological tests. Multivariate methods that apply to this kind of data would include principal components analysis, factor analysis, and confirmatory factor analysis (which is a particular method within structural equations modeling). These methods provide answers to the question, "What is the covariance structure of this set of multiple variables?"
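As a rough illustration of how the covariance structure of a single sample can be examined, the following sketch carries out a principal components analysis by eigendecomposition of the covariance matrix. The scores are simulated (three hundred cases on seven variables, generated from two made-up latent traits), so every number and name here is an assumption of the example, not data from the book.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated scores: 300 people on 7 tests, driven by 2 hypothetical latent traits.
latent = rng.normal(size=(300, 2))
loadings = rng.normal(size=(2, 7))
scores = latent @ loadings + 0.5 * rng.normal(size=(300, 7))

# Covariance matrix from deviation scores.
D = scores - scores.mean(axis=0)
cov = D.T @ D / (scores.shape[0] - 1)

# eigh is intended for symmetric matrices and returns eigenvalues in ascending order.
eigvals, eigvecs = np.linalg.eigh(cov)

# Reorder so the component with the largest variance comes first.
order = np.argsort(eigvals)[::-1]
eigvals = eigvals[order]
eigvecs = eigvecs[:, order]

# Proportion of total variance accounted for by each component.
explained = eigvals / eigvals.sum()

# Component scores: deviation scores projected onto the eigenvectors.
components = D @ eigvecs

print(np.round(explained, 3))
```

Because the simulated data were built from two latent traits, the first two components would be expected to account for most of the variance. Factor analysis and confirmatory factor analysis address the same question with different models and are not shown here.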

The second data type is a single sample with two sets of multiple variables (an X set and a Y set) measured on each unit. An example of data of this kind would be a linked data set of six hundred and twenty trading days, with the X set of variables consisting of seven stock market indices (the closing value for each of those days) and the Y set of variables consisting of nine mutual funds (the closing value for each day). Multivariate methods that can be applied to this kind of data include canonical correlation (chapter five) and multivariate multiple regression (chapter six). These methods provide answers to the question, "What is the covariance structure in each set of multiple variables that is maximally predictive of the other set?" Another method that can be used with a single sample with two sets of multiple variables would be SEM, structural equations modeling (chapter fourteen). However, SEM can also be applied when there are more than two sets of multiple variables. In fact, it can handle any number of sets of multiple variables. It is the general case of which these other methods are special cases, and as such it has a great deal of potential analytical power. SEM is introduced in the final chapter of the book.

The third data type is two samples with multiple variables measured on each unit. An example would be a simple experiment with an experimental group and a control group, and with two or more dependent variables measured on each observation unit. For example, in an agricultural experiment, the effects of a fertilizer could be assessed by applying it to twelve tomato plants selected at random (the experimental group) and not applying it to the other twelve tomato plants (the control group), using multiple dependent variable measurements (such as number of tomatoes produced, average size, etc.). Multivariate methods that can be applied to this kind of data are Hotelling's T-squared test, discriminant analysis, and classification analysis. Hotelling's T-squared test is just the multivariate analogue of the ordinary t test, which can, of course, be applied to two-sample data when there is only one dependent variable. When there are multiple dependent variables, the T-squared test can be used to test for significance. The T-squared test answers the question, "Are the mean vectors for these two samples significantly different from one another?" Discriminant analysis and classification analysis can be used to find the optimal linear combination of the multiple dependent variables to best separate the two groups from one another.
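A minimal sketch of the two-sample T-squared computation follows, using the standard formula based on the pooled within-group covariance matrix and its usual conversion to an F statistic. The tomato measurements are simulated, and the function names (pooled_covariance, hotelling_t2) are hypothetical helpers written for this example, not routines from any particular package.

```python
import numpy as np

def pooled_covariance(x1, x2):
    """Pooled within-group covariance matrix of two samples (rows = units)."""
    d1 = x1 - x1.mean(axis=0)
    d2 = x2 - x2.mean(axis=0)
    return (d1.T @ d1 + d2.T @ d2) / (x1.shape[0] + x2.shape[0] - 2)

def hotelling_t2(x1, x2):
    """Return T-squared, the equivalent F statistic, and its degrees of freedom."""
    n1, p = x1.shape
    n2 = x2.shape[0]
    diff = x1.mean(axis=0) - x2.mean(axis=0)
    s_pooled = pooled_covariance(x1, x2)
    # Quadratic form diff' S^-1 diff, computed without forming the inverse.
    quad_form = diff @ np.linalg.solve(s_pooled, diff)
    t2 = (n1 * n2) / (n1 + n2) * quad_form
    f_stat = (n1 + n2 - p - 1) / (p * (n1 + n2 - 2)) * t2
    return t2, f_stat, (p, n1 + n2 - p - 1)

# Simulated data: 12 fertilized and 12 control plants, 2 dependent variables
# (number of tomatoes, average size). All values are invented.
rng = np.random.default_rng(1)
fertilized = rng.normal(loc=[20.0, 150.0], scale=[4.0, 20.0], size=(12, 2))
control = rng.normal(loc=[16.0, 140.0], scale=[4.0, 20.0], size=(12, 2))

t2, f_stat, df = hotelling_t2(fertilized, control)
print(t2, f_stat, df)
```

When only one dependent variable is supplied, the T-squared value returned by this sketch equals the square of the ordinary pooled-variance t statistic, which is the sense in which the test is the multivariate analogue of the t test.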

The fourth data type is the same as the third, but extended to three or more samples (with multiple dependent variables measured on each of the units of observation). For example, the same test of the effects of fertilizer on tomatoes could be carried out with two types of fertilizer plus the control group, making three groups to be compared multivariately. The major method here is the MANOVA, or multivariate analysis of variance, which is the multivariate analog of ANOVA. In fact, for every ANOVA model (two-way, three-way, repeated measures, etc.) there exists a corresponding MANOVA model. MANOVA models answer all the same kinds of questions that ANOVA models do (significance of main effects and interactions), but within multivariate spaces rather than just for a single dependent variable. Discriminant analysis and classification analysis methods can also be applied to multivariate data having three or more groups.
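To make the parallel with ANOVA concrete, here is a sketch of a one-way MANOVA computation for three groups. It partitions the total SSCP matrix into a between-groups (hypothesis) matrix H and a within-groups (error) matrix E, then forms Wilks' lambda, one common MANOVA test statistic that is not named in the answer above and is chosen here only for illustration. The data are simulated and the function name wilks_lambda is hypothetical.

```python
import numpy as np

def wilks_lambda(groups):
    """Return Wilks' lambda plus the between (H) and within (E) SSCP matrices."""
    all_data = np.vstack(groups)
    grand_mean = all_data.mean(axis=0)
    p = all_data.shape[1]
    H = np.zeros((p, p))
    E = np.zeros((p, p))
    for g in groups:
        group_mean = g.mean(axis=0)
        # Within-groups SSCP: deviations of units from their own group mean.
        d = g - group_mean
        E += d.T @ d
        # Between-groups SSCP: group mean deviations from the grand mean,
        # weighted by group size.
        b = (group_mean - grand_mean)[:, None]
        H += g.shape[0] * (b @ b.T)
    lam = np.linalg.det(E) / np.linalg.det(H + E)
    return lam, H, E

# Simulated data: control plus two fertilizers, 12 plants each, 2 dependent variables.
rng = np.random.default_rng(2)
control = rng.normal(loc=[16.0, 140.0], scale=[4.0, 20.0], size=(12, 2))
fert_a = rng.normal(loc=[20.0, 150.0], scale=[4.0, 20.0], size=(12, 2))
fert_b = rng.normal(loc=[22.0, 145.0], scale=[4.0, 20.0], size=(12, 2))

lam, H, E = wilks_lambda([control, fert_a, fert_b])
print(lam)
```

Values of lambda near 1 indicate little separation among the group mean vectors, and values near 0 indicate strong separation. Just as the ANOVA sums of squares add up to the total, H + E equals the total SSCP matrix computed around the grand mean.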