FAS Info | Factor Analysis

The aim of this software, in common with conventional factor analysis, is to condense the information contained in a large number of related variables into a smaller number of variables (the factors), to exhibit the mutual relationships of the factors, and to generate numerical values for them.

Unlike conventional factor analysis software, this software accepts as input categorical variables, variables that are not normally distributed, and variables with some missing values. It also does not require statistical independence of the rows (the cases) of the input data matrix, so that, for example, multivariate time series observations can be analysed.

The model fitting proceeds by multiplying the input data cases (rows) by a repeatedly updated matrix whose columns are orthogonal to the rows of the current estimate of the loadings matrix, until that estimate converges to an estimate of the factor pattern.

The following is a justification for the analysis method employed:

The data to be factor-analysed consist of n q-dimensional row vectors {yi}, i = 1 to n, n and q being positive integers. The {yi} have real-valued components which are either continuous or dichotomous (0 or 1), no further distributional assumptions being made. The absence of further assumptions is justified by the fact (see below) that only linear combinations of all the components of a yi are modelled in this approach, and by the central limit theorem such combinations approach normality. The components of a yi are not individually assumed to be normally distributed, as they are in conventional factor analysis.

The factor analysis model is yi = zi.M + ei (1)

Here the {zi}, the factor scores, are f-dimensional row vectors of real-valued factors, f being a positive integer with 1 < f < q. M, the matrix of factor loadings, is an f by q, rank f, matrix of real-valued continuous model parameters. The {ei} are q-dimensional real-valued error row vectors such that, for unequal integers j and k, ej and ek are uncorrelated. The {ei} are assumed to have a mean of the zero q-vector, and the components of each ei (equivalently, of each yi conditional on zi) are assumed to be distributed with a non-singular diagonal variance-covariance matrix. The set {ei} and the set {zi} are assumed to be uncorrelated. The model (1) is assumed to hold, given the {yi}, with the above assumptions on the {zi} and the {ei} being satisfied, for at least one value of M.
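As an illustration of model (1), the following minimal sketch draws synthetic data with the stated structure. The function name simulate_factor_data, the example loadings, the error standard deviations and the choice of normally distributed factor scores are all assumptions made for this example only; the model itself does not require normal factor scores.

```python
import numpy as np

def simulate_factor_data(n, M, sigma, rng):
    """Draw n rows y_i = z_i M + e_i (hypothetical helper, for illustration only)."""
    f, q = M.shape
    # Factor scores: normal here purely for convenience (an assumption of this sketch).
    z = rng.standard_normal((n, f))
    # Errors: mean zero, uncorrelated components, standard deviations sigma (diagonal covariance).
    e = rng.standard_normal((n, q)) * sigma
    return z @ M + e, z

rng = np.random.default_rng(0)
# Example loadings with f = 2 factors and q = 5 observed variables (made-up values).
M = np.array([[1.0, 0.8, 0.6, 0.0, 0.0],
              [0.0, 0.0, 0.5, 0.9, 0.7]])
sigma = np.full(5, 0.3)
y, z = simulate_factor_data(1000, M, sigma, rng)
```

Each row of y is one case; the residual y - z @ M recovers the simulated errors, whose components have standard deviation close to 0.3.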

The model is fitted by varying, in an optimisation procedure, the components of trial values of M and of the positive real-valued σ1, ..., σq (the estimates of the standard deviations of the components of the {ei}) so as to maximise the likelihood of the model (see below) for the n vectors {yi.L}. By appeal to the central limit theorem, these vectors are modelled as independent normal with mean zero and a common variance-covariance matrix. Here L is a q by (q-f), rank (q-f), real-valued matrix, calculated from the trial value of M, such that M.L = O, the zero f by (q-f) matrix, and such that LT.L = I, the (q-f) by (q-f) unit matrix, where LT denotes the transpose of L. L is not uniquely defined: if L satisfies these conditions, so does L.Q for any (q-f) by (q-f) orthogonal matrix Q.
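The quantity being maximised can be sketched as follows. This is not the software's own code: it is a minimal illustration, assuming L is obtained from a singular value decomposition (one of many valid choices) and assuming no missing values. The names null_space_basis and projected_log_likelihood and the example data are hypothetical.

```python
import numpy as np

def null_space_basis(M):
    """Return a q x (q - f) matrix L with M @ L = 0 and L.T @ L = I (M has full row rank f)."""
    f, q = M.shape
    # In the full SVD, the last q - f rows of Vt are orthonormal and orthogonal to the rows of M.
    _, _, Vt = np.linalg.svd(M, full_matrices=True)
    return Vt[f:].T

def projected_log_likelihood(y, M, sigma):
    """Log-likelihood of the rows of y @ L, modelled as independent N(0, L^T diag(sigma^2) L)."""
    L = null_space_basis(M)
    w = y @ L                                   # n x (q - f) projected cases
    S = L.T @ (sigma[:, None] ** 2 * L)         # L^T diag(sigma^2) L
    _, logdet = np.linalg.slogdet(S)
    quad = np.sum(w * np.linalg.solve(S, w.T).T)  # sum over cases of w_i S^{-1} w_i^T
    n, k = w.shape
    return -0.5 * (n * k * np.log(2.0 * np.pi) + n * logdet + quad)

# Illustrative data from model (1); all numerical values are made up for this sketch.
rng = np.random.default_rng(0)
M_true = np.array([[1.0, 0.8, 0.6, 0.0, 0.0],
                   [0.0, 0.0, 0.5, 0.9, 0.7]])
sigma = np.full(5, 0.3)
y = rng.standard_normal((2000, 2)) @ M_true + rng.standard_normal((2000, 5)) * sigma

# A deliberately poor trial value of M, for comparison.
M_wrong = np.array([[1.0, 0.0, 0.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0, 0.0, 0.0]])
ll_true = projected_log_likelihood(y, M_true, sigma)
ll_wrong = projected_log_likelihood(y, M_wrong, sigma)
```

With the σ's held fixed at the values used to generate the data, ll_true exceeds ll_wrong, because projecting with the wrong L leaves factor variation in y @ L that the zero-mean model cannot account for. An optimiser of the kind described above would vary both M and the σ's; that step is not shown here.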

Suppose that, for each trial value of M, the matrix L is calculated by any one arbitrarily chosen procedure among those that yield a valid L for that M. If the model (1) holds for at least one value of M, referred to as ‘true’ M, then, for any i (from 1 to n), the value of yi.L will, conditional on the (unknown) zi, by the central limit theorem be approximately distributed as N(zi.‘true’M.L, LT.diag(‘true’σ1², ..., ‘true’σq²).L), where the ‘true’ σ's are the positive standard deviations of the components of yi corresponding to ‘true’ M. Because the model (1) holds and zi is assumed uncorrelated with ei, and because each yi.L is modelled as N(O, LT.diag(σ1², ..., σq²).L), where O is the (q-f)-dimensional row vector of zeroes and the σ's are modelling parameters, it follows that, as the maximum likelihood optimisation proceeds (varying M and the σ's), the values of zi.‘true’M.L converge approximately, asymptotically in probability as n increases, to O. It follows that ‘true’M.L converges similarly to the zero f by (q-f) matrix. Thus, by the definition of L, the trial values of M will converge approximately, asymptotically in probability, to an f by q matrix whose rows span the space spanned by the rows of ‘true’ M. That is, the trial value of M will approach a matrix M satisfying (1).
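The steps of this argument can be written out compactly. The notation M^{*} and \sigma^{*}_{k} for the ‘true’ values is introduced here only for this sketch, and the block assumes the amsmath package. The final equivalence assumes that the z_i span all f dimensions, and uses the facts that M and M^{*} both have rank f and that L has rank q-f with M.L = O.

```latex
% Notation assumed for this sketch: M^{*}, \sigma^{*}_{k} denote the 'true' values;
% L is the matrix computed from the trial value M.
\begin{align*}
y_i L &= z_i M^{*} L + e_i L
  && \text{(model (1) multiplied on the right by } L\text{)} \\
\mathrm{E}[\, y_i L \mid z_i \,] &= z_i M^{*} L,
  \qquad
  \mathrm{Var}[\, y_i L \mid z_i \,]
  = L^{T} \operatorname{diag}\!\big(\sigma_1^{*2}, \dots, \sigma_q^{*2}\big) L \\
z_i M^{*} L = O \ \text{for all } i
  &\iff M^{*} L = O
  \iff \operatorname{rowspace}(M^{*}) = \operatorname{rowspace}(M)
\end{align*}
```

The fitted model assigns mean O to every yi.L, so the fit can be good only when the mean term in the second line vanishes, which is the condition in the last line.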

As there are no distributional restrictions on the {yi}, dichotomous components, and hence categorical variables, can be accommodated.

With the model fitted as above, only the relative values of the components (loadings) of the matrix M, not their individual values, are meaningful.
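One way to see this is that the fitting criterion depends on M only through the set of vectors orthogonal to its rows. The sketch below, with a hypothetical helper null_space_projector and made-up matrices, checks numerically that M and A.M give the same orthogonal complement for an invertible f by f matrix A; rescaling M by a constant is the special case where A is a multiple of the unit matrix.

```python
import numpy as np

def null_space_projector(M):
    """Orthogonal projector onto the vectors orthogonal to the rows of M (hypothetical helper)."""
    f, q = M.shape
    _, _, Vt = np.linalg.svd(M, full_matrices=True)
    L = Vt[f:].T          # one valid choice of L for this M
    return L @ L.T        # the same for every valid choice of L

# Made-up loadings (f = 2, q = 5) and an arbitrary invertible 2 x 2 matrix A.
M = np.array([[1.0, 0.8, 0.6, 0.0, 0.0],
              [0.0, 0.0, 0.5, 0.9, 0.7]])
A = np.array([[2.0, 1.0],
              [0.5, 3.0]])

P_M = null_space_projector(M)
P_AM = null_space_projector(A @ M)
same_complement = np.allclose(P_M, P_AM)
```

Because the projector L.LT is identical for M and A.M, any criterion built from yi.L and LT.diag(σ1², ..., σq²).L, as described above, cannot distinguish between them.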

The author wishes to express his gratitude to Professor Simon Skene for making a Visiting Researcher position available to the author and for his valued advice at the Clinical Trials Unit of the University of Surrey, Guildford, UK.