Astron. Astrophys. 332, 459-478 (1998)

## 3. PCA and χ² test: the formalism

A detailed description of the PCA technique can be found in Murtagh & Heck (1987) and in Kendall (1980). Here we summarize its main characteristics. The PCA is applied to a data set of N vectors with M coordinates. In this M-dimensional space, each object is a point and the sample forms a cloud of points. The central problem solved by the PCA is the description of this cloud of points by a set of P vectors of a new orthonormal base, with $P \le M$, such that the sum of the squared Euclidean distances from each point to the axes defined by the new base is minimal. The eigenvectors of this new base are called principal components (PC). Because the squared norm of each spectrum is the sum of its squared projection onto the axes and its squared distance to them (Pythagoras' theorem), minimizing the sum of squared distances between spectra and axes is equivalent to maximizing the sum of squared projections onto the axes, i.e., maximizing the variance of the spectra when projected onto these new axes.
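As a numerical illustration of this equivalence, the sketch below (Python with NumPy) computes the principal axes as eigenvectors of $X^{T}X$ without centering and checks that, for any orthonormal basis, the total squared norm splits into squared projections plus squared distances. The random data set, its spreads and the choice P = 2 are arbitrary assumptions made only for the demonstration.

```python
import numpy as np

def squared_projections(X, basis):
    """Sum over objects of the squared projections onto the orthonormal columns of `basis`."""
    return float(np.sum((X @ basis) ** 2))

def squared_distances(X, basis):
    """Sum over objects of the squared Euclidean distance to the span of `basis`."""
    residual = X - (X @ basis) @ basis.T
    return float(np.sum(residual ** 2))

rng = np.random.default_rng(0)
# Hypothetical data: N = 200 objects with M = 5 coordinates of very different spread
X = rng.normal(size=(200, 5)) * np.array([5.0, 3.0, 1.0, 0.5, 0.1])

# Principal axes: eigenvectors of X^T X (no centering), sorted by decreasing eigenvalue
evals, evecs = np.linalg.eigh(X.T @ X)
pcs = evecs[:, np.argsort(evals)[::-1]]

P = 2
pc_basis = pcs[:, :P]
random_basis, _ = np.linalg.qr(rng.normal(size=(5, P)))  # some other orthonormal P-basis

total = float(np.sum(X ** 2))
for basis in (pc_basis, random_basis):
    # Pythagoras: |x|^2 = (projection)^2 + (distance)^2, summed over all objects
    assert np.isclose(total, squared_projections(X, basis) + squared_distances(X, basis))
```

Since the total is fixed, the basis formed by the leading eigenvectors, which maximizes the squared projections, necessarily minimizes the squared distances.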

The input for the PCA is an N × M matrix of N spectra and M variables, which in our case are spectral elements with a sampling of 2 to 10 Å/pix, depending on the resolution of the data. Each spectrum $S_i$ is normalized by its norm (the square root of its scalar product with itself), yielding the N normalized spectra which serve as input to the PCA:

$$\hat{S}_i = \frac{S_i}{\left(S_i \cdot S_i\right)^{1/2}}, \qquad i = 1, \dots, N. \qquad (1)$$

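The normalization of Eq. (1) amounts to dividing each row of the data matrix by its Euclidean norm. A minimal NumPy sketch, using made-up toy fluxes rather than survey data:

```python
import numpy as np

def normalize_by_norm(spectra):
    """Divide each spectrum (row) by its norm sqrt(S_i . S_i), as in Eq. (1)."""
    spectra = np.asarray(spectra, dtype=float)
    norms = np.sqrt(np.einsum("ij,ij->i", spectra, spectra))  # row-wise scalar products
    return spectra / norms[:, None]

# Toy input: N = 3 "spectra" with M = 4 spectral elements each (hypothetical values)
S = np.array([[1.0, 2.0, 2.0, 4.0],
              [3.0, 0.0, 4.0, 0.0],
              [0.5, 0.5, 0.5, 0.5]])
S_hat = normalize_by_norm(S)
```

Every row of `S_hat` has unit norm, so the normalized spectra lie on the unit hyper-sphere of the M-dimensional space.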
Other normalizations can be used (for example, a flux normalization), but Connolly et al. (1995) showed that the details of the normalization applied to the input matrix do not strongly influence the PCA results. However, the interpretation of the principal components does depend on the technique used to reduce the input matrix. Because our input vectors are normalized by their norm, we can apply the PCA to the sum of squares and cross products (intermediate) matrix (SSCP method), which neither rescales the data nor centers the data cloud. The normalized spectra then lie on the surface of an M-dimensional hyper-sphere of radius 1, and the first PC has approximately the same direction as the average spectrum, but with norm equal to 1. Two other procedures are based on the variance-covariance matrix (VC method) and on the correlation matrix (C method), respectively. The VC method places the new origin at the centroid of the sample, and the C method additionally rescales the data so that the distance between variables is a direct function of the correlation between them. For the VC method, the average spectrum must be added back in order to reconstruct individual spectra. We emphasize that neither the PC's nor the projections given by the SSCP method, used in this paper, are the same as those given by the VC method. However, if the normalized cloud of points is concentrated in a small portion of the hyper-sphere, then the first PC of the VC method has almost the same direction as the second PC of the SSCP method (see Francis et al. 1992; Folkes et al. 1996). Although these different methods give different PC's, once the underlying transformations explained above are taken into account, the physical interpretation of the PC's and of the projections does not change, and the final result always satisfies the maximization conditions and the orthonormality of the principal components.
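The two geometric statements above (first SSCP PC close to the average spectrum; first VC PC close to the second SSCP PC for a concentrated cloud) can be checked on synthetic data. The sketch below assumes hypothetical toy "spectra" built from a common continuum `b` plus a small random amount of one orthogonal pattern `v` and white noise; it illustrates the geometry and does not reproduce the survey spectra.

```python
import numpy as np

def sorted_eigvecs(matrix):
    """Eigenvalues and eigenvectors (columns) of a symmetric matrix, by decreasing eigenvalue."""
    evals, evecs = np.linalg.eigh(matrix)
    order = np.argsort(evals)[::-1]
    return evals[order], evecs[:, order]

def unit(vec):
    return vec / np.linalg.norm(vec)

rng = np.random.default_rng(1)
N, M = 500, 50
x = np.linspace(0.0, 1.0, M)

# Hypothetical continuum b and orthogonal pattern v (unit vectors)
b = unit(1.0 + x)
v = np.cos(2.0 * np.pi * x)
v = unit(v - (v @ b) * b)                       # remove the component along b
c = rng.normal(0.0, 0.1, size=N)                # small spread: concentrated cloud
X = b + c[:, None] * v + rng.normal(0.0, 0.005, size=(N, M))
X /= np.linalg.norm(X, axis=1, keepdims=True)   # Eq. (1): points on the unit hyper-sphere

# SSCP method: no centering, no rescaling
_, sscp_pcs = sorted_eigvecs(X.T @ X)

# VC method: origin moved to the centroid of the cloud
mean_spec = X.mean(axis=0)
Xc = X - mean_spec
_, vc_pcs = sorted_eigvecs(Xc.T @ Xc / (N - 1))

# Eigenvectors are defined up to a sign, hence the absolute values
cos_pc1_mean = abs(sscp_pcs[:, 0] @ unit(mean_spec))
cos_vc1_sscp2 = abs(vc_pcs[:, 0] @ sscp_pcs[:, 1])
```

Both cosines come out close to 1 for this concentrated cloud; they would degrade if the spread `c` were made large enough to spread the points over a wide region of the hyper-sphere.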

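After application of the PCA using the SSCP method, we can write each spectrum as

$$\hat{S}_i^{\rm rec} = \sum_{k=1}^{n} \alpha_{ik}\, E_k, \qquad \alpha_{ik} = \hat{S}_i \cdot E_k, \qquad (2)$$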

where $\hat{S}_i^{\rm rec}$ is the reconstructed spectrum of $\hat{S}_i$, $\alpha_{ik}$ is the projection of spectrum $\hat{S}_i$ onto the eigenspectrum $E_k$, and $n$ is the number of PC's taken into account for the reconstruction. In Eq. (2), the PC's are ordered by decreasing contribution to the total variance.
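Eq. (2) and the fraction of the signal recovered with the first $n$ PC's can be sketched as follows. The eigenspectra are obtained here from the singular value decomposition of the normalized data matrix, whose right singular vectors are the eigenvectors of the SSCP matrix; this route, the function names and the random toy data are assumptions chosen for illustration.

```python
import numpy as np

def sscp_eigenspectra(X_hat):
    """Eigenspectra E_k (as columns) of the SSCP matrix X_hat^T X_hat, by decreasing
    eigenvalue. NumPy returns the singular values in decreasing order."""
    _, s, vt = np.linalg.svd(X_hat, full_matrices=False)
    return vt.T, s ** 2

def reconstruct(X_hat, E, n):
    """Eq. (2): S_rec_i = sum_{k=1..n} alpha_ik E_k, with alpha_ik = S_hat_i . E_k."""
    alpha = X_hat @ E[:, :n]           # projections, shape (N, n)
    return alpha @ E[:, :n].T, alpha   # reconstructed spectra, shape (N, M)

def signal_fraction(X_hat, E, n):
    """Fraction of each spectrum's squared norm recovered with the first n PC's."""
    alpha = X_hat @ E[:, :n]
    return np.sum(alpha ** 2, axis=1) / np.sum(X_hat ** 2, axis=1)

# Hypothetical toy data: N = 100 positive "spectra" of M = 20 elements, normalized as in Eq. (1)
rng = np.random.default_rng(2)
X = rng.uniform(0.5, 1.5, size=(100, 20))
X_hat = X / np.linalg.norm(X, axis=1, keepdims=True)

E, eigenvalues = sscp_eigenspectra(X_hat)
mean_fraction = [float(signal_fraction(X_hat, E, n).mean()) for n in range(1, 21)]
```

Because the eigenspectra form a complete orthonormal basis, the recovered fraction grows monotonically with $n$ and reaches 1 when all M components are used; noisy spectra spread their norm over more components, which is why more PC's are needed at low S/N.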

We show in Sect. 5 below that if the S/N is high enough (i.e., $S/N \gtrsim 8$), then we can take $n = 3$ or 4 to reconstruct 97 or 98% of the signal, respectively. If $S/N \lesssim 8$, a higher number of PC's is required to reproduce the initial spectrum to such high accuracy, because of the noise pattern. Therefore, the first 2 or 3 components carry most of the signal in each spectrum, which leads us to use $\alpha_1$, $\alpha_2$ and $\alpha_3$ to describe the spectral sequence. We choose to reduce these 3 parameters to the radius $r$ and the angles $\delta$ and $\theta$ defined by the spectrum (as in Connolly et al. 1995) in spherical coordinates ($\delta$ being the azimuth and $\theta$ the polar angle measured from the equator),

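$$\alpha_1 = r \cos\theta \cos\delta \qquad (3a)$$
$$\alpha_2 = r \cos\theta \sin\delta \qquad (3b)$$
$$\alpha_3 = r \sin\theta \qquad (3c)$$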
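Eq. (3) maps the spherical coordinates back to the three projections. A minimal NumPy sketch, assuming angles in radians and using illustrative values that are not taken from the survey:

```python
import numpy as np

def alphas_from_spherical(r, delta, theta):
    """Eq. (3): delta is the azimuth in the (alpha_1, alpha_2) plane and theta the
    polar angle measured from that plane (the 'equator'); angles in radians."""
    a1 = r * np.cos(theta) * np.cos(delta)
    a2 = r * np.cos(theta) * np.sin(delta)
    a3 = r * np.sin(theta)
    return a1, a2, a3

# Illustrative values only: r = 0.98, delta = 10 deg, theta = -5 deg
a1, a2, a3 = alphas_from_spherical(0.98, np.radians(10.0), np.radians(-5.0))
```

By construction, $\alpha_1^2 + \alpha_2^2 + \alpha_3^2 = r^2$, and the sign of $\alpha_3$ follows the sign of $\theta$.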

We express the values of $\delta$ and $\theta$ independently of the value of $r$:

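$$\delta = \arctan\left(\frac{\alpha_2}{\alpha_1}\right) \qquad (4a)$$
$$\theta = \arctan\left(\frac{\alpha_3}{\sqrt{\alpha_1^2 + \alpha_2^2}}\right) \qquad (4b)$$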
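The inverse relations of Eq. (4) depend only on ratios of the projections, so any common scale factor (i.e., the value of $r$) cancels. The sketch below uses `np.arctan2`, which coincides with $\arctan(\alpha_2/\alpha_1)$ as long as $\alpha_1 > 0$; that positivity is an assumption here, motivated by the first PC pointing roughly along the average spectrum.

```python
import numpy as np

def spectral_angles(a1, a2, a3):
    """Eq. (4): delta = arctan(a2 / a1), theta = arctan(a3 / sqrt(a1^2 + a2^2)), in radians.
    arctan2(a2, a1) equals arctan(a2 / a1) when a1 > 0 (assumed); since
    sqrt(a1^2 + a2^2) > 0, arctan2 for theta equals the plain arctan of the ratio."""
    delta = np.arctan2(a2, a1)
    theta = np.arctan2(a3, np.hypot(a1, a2))
    return delta, theta

# Illustrative projections (hypothetical), and the same vector scaled by an arbitrary factor
a = np.array([0.9, 0.3, -0.1])
angles = spectral_angles(*a)
angles_scaled = spectral_angles(*(3.7 * a))
```

The two calls return the same angles, which is the sense in which $\delta$ and $\theta$ are independent of $r$.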

Note that we prefer to use $\delta$ and $\theta$ (rather than the ratios $\alpha_2/\alpha_1$ and $\alpha_3/\alpha_1$) to define the spectral sequence because they have a geometrical meaning. In the next section, we show that $\delta$ physically measures the relative contribution of the red (or early) and the blue (or late) stellar populations within a galaxy. Note also that if $r \simeq 1$, then from Eq. (3c), Eq. (4b) reduces to $\theta \simeq \arcsin\alpha_3$.
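How good is the approximation $\theta \simeq \arcsin\alpha_3$? From Eq. (3c), the exact value is $\theta = \arcsin(\alpha_3/r)$; writing $r = 1 - \epsilon$ and expanding to first order gives an error of about $\epsilon\,\alpha_3/\sqrt{1-\alpha_3^2}$. The sketch below checks this with assumed illustrative numbers ($\theta = 0.1$ rad, $r = 0.99$), not survey values.

```python
import numpy as np

def theta_exact(a3, r):
    """From Eq. (3c): sin(theta) = a3 / r."""
    return np.arcsin(a3 / r)

def theta_approx(a3):
    """Approximation theta ~ arcsin(a3), valid when r is close to 1."""
    return np.arcsin(a3)

# Illustrative numbers (assumed): theta = 0.1 rad and r = 0.99
r, theta = 0.99, 0.1
a3 = r * np.sin(theta)
error = theta_exact(a3, r) - theta_approx(a3)

# First-order estimate for r = 1 - eps: error ~ eps * a3 / sqrt(1 - a3**2)
eps = 1.0 - r
error_first_order = eps * a3 / np.sqrt(1.0 - a3 ** 2)
```

With these numbers the approximation underestimates $\theta$ by about $10^{-3}$ rad, in agreement with the first-order estimate.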

For comparison with the PCA, we have implemented a simple χ² test between the galaxies of the ESS sample and a set of templates derived from the Kennicutt sample (Kennicutt 1992a; see Sects. 5 and 6). In contrast to the PCA, the χ² test depends on the set of templates used and can therefore only provide a constrained classification procedure. The χ² between an observed spectrum $S$ and a template $T$ can be written as

$$\chi^2 = \sum_{j=1}^{N_\lambda} \frac{\left(S_j - T_j\right)^2}{S_j + T_j}, \qquad (5)$$

where $S_j$ and $T_j$ are the values in spectral element (bin) $j$ of the flux-calibrated spectrum and of the template, respectively, and $N_\lambda$ is the total number of wavelength bins for both the spectrum and the template (we take the largest wavelength interval common to the spectrum and the template, and rebin both to a common wavelength step of 5 Å/pix). The denominator is the variance of $S_j - T_j$, assuming that the noise in both the spectrum and the template is Poissonian. Because $N_\lambda$ for a given observed spectrum is the same for all the comparison templates, the χ² value does not need to be normalized. Therefore, given a set of P templates $T_l$, the closest template $k$ to the spectrum $S$ is the one which satisfies

$$\chi^2(S, T_k) = \min_{l = 1, \dots, P} \chi^2(S, T_l). \qquad (6)$$

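Eqs. (5) and (6) translate directly into a few lines of NumPy. This is a minimal sketch under stated assumptions: the spectrum and templates are already on the same wavelength bins and flux scale, and every bin has $S_j + T_j > 0$; the fluxes and templates below are made up.

```python
import numpy as np

def chi2(spectrum, template):
    """Eq. (5): sum_j (S_j - T_j)^2 / (S_j + T_j). With Poisson noise, the variance of
    S_j - T_j is S_j + T_j. Assumes identical bins, a common flux scale and
    S_j + T_j > 0 in every bin."""
    S = np.asarray(spectrum, dtype=float)
    T = np.asarray(template, dtype=float)
    return float(np.sum((S - T) ** 2 / (S + T)))

def closest_template(spectrum, templates):
    """Eq. (6): index k of the template minimizing chi^2, together with all chi^2 values."""
    values = [chi2(spectrum, t) for t in templates]
    return int(np.argmin(values)), values

# Made-up fluxes in three bins and two hypothetical templates
S = [10.0, 20.0, 30.0]
templates = [[12.0, 18.0, 33.0], [10.0, 20.0, 30.0]]
k, values = closest_template(S, templates)
```

Here the second template reproduces the spectrum exactly, so it has $\chi^2 = 0$ and is selected ($k = 1$ with zero-based indexing).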
Note that in the PCA treatment the wavelength interval of all input spectra must be identical. For the χ² test, the wavelength interval can be larger than the one used for the PCA, and it varies from spectrum to spectrum. This difference will allow us to check the dependence of the PCA classification on the wavelength interval (cf. Sect. 6).
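The preparation step for the χ² test (largest common wavelength interval, common step of 5 Å/pix) can be sketched as below. The resampling uses plain linear interpolation with `np.interp`, which is a simplifying assumption and not a flux-conserving rebinning; the wavelength ranges and samplings in the example are hypothetical.

```python
import numpy as np

def common_grid(wave_a, wave_b, step=5.0):
    """Largest wavelength interval shared by both spectra, sampled every `step` Å."""
    lo = max(wave_a.min(), wave_b.min())
    hi = min(wave_a.max(), wave_b.max())
    n = int(np.floor((hi - lo) / step + 1e-9)) + 1   # small tolerance for rounding
    return lo + step * np.arange(n)

def rebin_pair(wave_s, flux_s, wave_t, flux_t, step=5.0):
    """Put a spectrum and a template on the same grid by linear interpolation
    (np.interp requires increasing wavelengths). Not flux-conserving."""
    grid = common_grid(wave_s, wave_t, step)
    return grid, np.interp(grid, wave_s, flux_s), np.interp(grid, wave_t, flux_t)

# Hypothetical samplings: 2 Å/pix spectrum over 3700-5300 Å, 10 Å/pix template over 3500-5000 Å
wave_s = np.arange(3700.0, 5301.0, 2.0)
wave_t = np.arange(3500.0, 5000.1, 10.0)
grid, fs, ft = rebin_pair(wave_s, 2.0 * wave_s, wave_t, 3.0 * wave_t - 1.0)
```

The common interval is 3700 to 5000 Å, giving 261 bins; because the interval is recomputed for each spectrum and template pair, $N_\lambda$ varies from spectrum to spectrum, as noted above.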

© European Southern Observatory (ESO) 1998

Online publication: March 23, 1998