.BG
.FN ca
.TL
Correspondence Analysis
.DN
Finds a new coordinate system for multivariate data such that the first
coordinate has maximal inertia, the second coordinate has maximal inertia
subject to being orthogonal to the first, and so on.
Compared to Principal Components Analysis, each row and column point has an
associated mass (related to the row or column totals), and the chi-squared
distance takes the place of the Euclidean distance.
The issue of how to code the input data is important: this takes the place
of input data transformation in PCA.
.CS
ca(a, nf=7)
.RA
.AG a
data matrix to be decomposed, the rows representing observations and the
columns variables.
.OA
.AG nf
number of factors or axes to be sought; default 7.
.RT
list of class `"reddim"' describing the correspondence analysis:
.RC rproj
projections of the row points on the factors.
.RC cproj
projections of the column points on the factors.
.RC evals
eigenvalues associated with the new factors.
These provide figures of merit for the "inertia explained" by the factors.
They are usually quoted as percentages of the total, or as cumulative
percentages of the total.
.RC evecs
definition of the factors in terms of the original variables.
The first column is the linear combination of the columns of `a' defining
the first factor, and so on.
.RC rcntr
contributions of the observations to the factors.
A contribution is the mass times the squared projection on the factor.
Since contributions take account of the mass, they indicate influential
observations for the interpretation of a factor more accurately than the
projections alone.
.RC ccntr
contributions of the variables to the factors.
See the above remark concerning row contributions.
.SH NOTE
Very small negative eigenvalues, if they arise, are an artifact of the SVD
algorithm used, and should be treated as zero.
.SH METHOD
A singular value decomposition is carried out.
.SH BACKGROUND
Correspondence analysis determines the axis which provides the best fit to
both the row points and the column points.
A second axis is determined which best fits the data subject to being
orthogonal to the first.
Third and subsequent axes are found similarly.
Best fit is in the least-squares sense, relative to the chi-squared
distance, which can be viewed as a weighted Euclidean distance between
`profiles'.
.PP
The question of `coding' the input data is an important one.
For instance, in a matrix of scores, one might wish to adjoin extra columns
to the input matrix so that both the initial score and the maximum score
minus it are included in each observation's set of values.
This has the effect that all row masses are equal, so that only the
variables are differentially weighted.
This is known as `doubling' the observations.
In the case of binary data, such coding is known as `complete disjunctive
form'.
.PP
Other forms of input data for which correspondence analysis can be used
include frequencies, or contingency-type data.
In this case, the total of the chi-squared distances of all (row or column)
points from the origin is the familiar chi-squared statistic.
Hence the graphical output of correspondence analysis allows assessment of
departure from a null hypothesis of no dependence between rows and columns.
.PP
Supplementary rows or columns are projected into the factor space after a
correspondence analysis has been carried out.
That is to say, such row or column profiles are assumed to have zero mass,
and their projections are found under this assumption.
Functions `supplr' and `supplc' may be used for this purpose.
Supplementary rows or columns are of a different nature from the main body
of data analyzed (e.g. sex in the context of a questionnaire); or they are
rows or columns which, one suspects, would unduly influence the definition
of the factors.
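.PP
The singular value decomposition mentioned under METHOD can be sketched in
S as follows.
This is an illustrative outline of the usual definition of correspondence
analysis, not necessarily the computation performed internally by `ca';
the matrix `tab' is a hypothetical table of nonnegative entries with no
all-zero rows or columns.
.nf
# illustrative sketch only
p <- tab / sum(tab)                  # correspondence matrix
rmass <- apply(p, 1, sum)            # row masses
cmass <- apply(p, 2, sum)            # column masses
# matrix of standardized residuals; its SVD yields the factors
s <- (p - outer(rmass, cmass)) / sqrt(outer(rmass, cmass))
dec <- svd(s)
# the smallest singular value is essentially zero (the trivial factor)
evals <- dec$d^2                     # inertias ("eigenvalues") of the factors
rproj <- (dec$u / sqrt(rmass)) %*% diag(dec$d)   # row projections
cproj <- (dec$v / sqrt(cmass)) %*% diag(dec$d)   # column projections
rcntr <- rmass * rproj^2             # contributions: mass times squared projection
.fi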
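.PP
Projecting a supplementary row, as described above, amounts to applying the
transition formula to the row's profile without letting it contribute any
mass.
The outline below is again only a sketch: `suprow' is a hypothetical vector
of counts conformable with the columns of `tab', and `cproj' and `dec' are
the quantities from the sketch of the decomposition; in practice `supplr'
(and `supplc' for columns) should be used.
.nf
# illustrative sketch only -- see `supplr' and `supplc'
suppro <- suprow / sum(suprow)       # profile of the supplementary row
nf <- 2                              # number of factors retained, for example
# transition formula: barycentre of the column points, rescaled by 1/d
supproj <- suppro %*% cproj[, 1:nf] %*% diag(1 / dec$d[1:nf])
.fi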
.SH REFERENCES
Extensive works of J.-P. Benzecri, including
.ul
Correspondence Analysis Handbook,
Marcel Dekker, Basel, 1992.
.sp
M.J. Greenacre,
.ul
Theory and Applications of Correspondence Analysis,
Academic Press, New York, 1984.
.sp
L. Lebart, A. Morineau and K.M. Warwick,
.ul
Multivariate Descriptive Statistical Analysis,
Wiley, New York, 1984.
.sp
S. Nishisato,
.ul
Analysis of Categorical Data: Dual Scaling and Its Applications,
University of Toronto Press, Toronto, 1980.
.sp
(An extensive annotated bibliography is to be found in Greenacre.)
.SA
Supplementary rows and columns: `supplr', `supplc'.
Initial data coding: `flou', `logique'.
Other functions producing objects of class "reddim": `pca', `sammon'.
Other related functions: `prcomp', `cancor', `cmdscale'.
Plotting tool: `plaxes'.
.EX
# correspondence analysis of the breakfast cereal data,
# in complete disjunctive form:
bfpos <- t(cereal.attitude)
bfneg <- max(bfpos) - bfpos
bfposneg <- cbind(bfpos, bfneg)
corr <- ca(bfposneg)
# plot of first and second factors
plot(corr$rproj[,1], corr$rproj[,2], type="n")
text(corr$rproj[,1], corr$rproj[,2], labels=dimnames(bfposneg)[[1]])
# place additional axes through x=0 and y=0:
plaxes(corr$rproj[,1], corr$rproj[,2])
# check of row contributions
corr$rcntr
#
# fuzzy coding of input variables `a', `b', `c':
a.fuzz <- flou(a)
b.fuzz <- flou(b)
c.fuzz <- flou(c)
newdata <- cbind(a.fuzz, b.fuzz, c.fuzz)
ca.newdata <- ca(newdata)
.KW multivariate
.KW algebra
.WR