
Astron. Astrophys. 332, 459-478 (1998)


3. PCA and χ² test: the formalism

A detailed description of the PCA technique can be found in Murtagh & Heck (1987) and in Kendall (1980). Here we summarize its main characteristics. The PCA is applied to a data set of N vectors with M coordinates. In this M-dimensional space, each object is a point and the sample forms a cloud of points. The central problem solved by the PCA is the description of this cloud of points by a set of P vectors of a new orthonormal basis, with P [FORMULA], [FORMULA] and with a minimal Euclidean distance from each point to the axes defined by the new basis. The vectors of this new basis, which are eigenvectors of the matrix on which the PCA operates, are called principal components (PC). Minimizing the sum of squared distances between spectra and axes is equivalent to maximizing the sum of squared projections onto the axes, i.e., maximizing the variance of the spectra when projected onto these new axes.
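The equivalence between minimizing distances and maximizing projections follows from the Pythagorean theorem. A short sketch for a single axis, in notation assumed here for illustration only (x_i is the i-th data vector, e a unit vector along the axis, d_i the distance from the point to the axis):

```latex
% Assumed notation: x_i = i-th data vector, e = unit vector of one axis,
% d_i = Euclidean distance from point x_i to that axis.
d_i^2 = \lVert x_i \rVert^2 - (x_i \cdot e)^2
\quad\Longrightarrow\quad
\sum_{i=1}^{N} d_i^2
  = \sum_{i=1}^{N} \lVert x_i \rVert^2 - \sum_{i=1}^{N} (x_i \cdot e)^2 .
```

The first sum on the right does not depend on e, so the axis that minimizes the summed squared distances is the one that maximizes the summed squared projections.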

The input for the PCA is a matrix of N spectra × M variables, which in our case are spectral elements with 2 to 10 Å/pix, depending on the resolution of the data. Each spectrum [FORMULA] is normalized by its norm (the square root of its scalar product with itself), yielding the N normalized spectra [FORMULA] which serve as input to the PCA:


Other normalizations can be used (for example a flux normalization), but it was shown by Connolly et al. (1995) that the details of the normalization applied to the input [FORMULA] matrix do not have a strong influence on the PCA results. However, the interpretation of the principal components does depend on the technique used to reduce the input matrix. Because our input vectors are normalized by their norm, we can apply the PCA to the sum of squares and cross products (intermediate) matrix (SSCP method), which neither rescales the data nor centers the data cloud. The normalized spectra then lie on the surface of an M-dimensional hyper-sphere of radius 1, and the first PC has the same direction as the average spectrum, but with norm equal to 1. Two other procedures are based on the variance-covariance matrix (VC method) and the correlation matrix (C method), respectively. The VC method places the new origin at the centroid of the sample, and the C method also re-scales the data in such a way that the distance between variables is directly proportional to the correlation between them. For the VC method the average spectrum has to be used in order to reconstruct individual spectra. We emphasize that neither the PC's nor the projections given by the SSCP method, used in this paper, are the same as those given by the VC method. However, if the normalized cloud of points is concentrated in a small portion of the hyper-sphere, then the first PC of the VC method will have almost the same direction as the second PC given by the SSCP method (see Francis et al. 1992, Folkes et al. 1996). Although these different methods give different PC's, if we take into account the underlying transformations explained above, the physical interpretation of the PC's and the projections does not change, and the final result always satisfies the maximization conditions and the orthonormality among the different principal components.
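The relation between the SSCP and VC methods can be checked numerically. The following is a minimal sketch on synthetic data, not the code used for this paper: the toy "spectra", the function names and the use of NumPy's `eigh` are all assumptions made for illustration.

```python
import numpy as np

def normalize_rows(spectra):
    """Scale each spectrum (row) to unit Euclidean norm."""
    norms = np.sqrt(np.sum(spectra * spectra, axis=1, keepdims=True))
    return spectra / norms

def leading_eigenvectors(matrix):
    """Eigenvectors of a symmetric matrix as rows, largest eigenvalue first."""
    eigvals, eigvecs = np.linalg.eigh(matrix)  # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    return eigvecs[:, order].T

def pca_sscp(x):
    """SSCP method: no centering and no rescaling of the data."""
    return leading_eigenvectors(x.T @ x)

def pca_vc(x):
    """VC method: the origin is moved to the centroid of the sample."""
    centered = x - x.mean(axis=0)
    return leading_eigenvectors(centered.T @ centered / (x.shape[0] - 1))

# Synthetic "spectra" (illustrative only): one continuum shape whose slope
# varies from object to object, plus a small amount of noise.
rng = np.random.default_rng(0)
n_spec, n_pix = 50, 20
continuum = np.linspace(1.0, 2.0, n_pix)
tilt = np.linspace(-1.0, 1.0, n_pix)
slopes = rng.uniform(-0.3, 0.3, size=(n_spec, 1))
raw = continuum + slopes * tilt + rng.normal(0.0, 0.005, size=(n_spec, n_pix))

x = normalize_rows(raw)
pcs_sscp = pca_sscp(x)
pcs_vc = pca_vc(x)

mean_dir = x.mean(axis=0)
mean_dir = mean_dir / np.linalg.norm(mean_dir)

# Eigenvector signs are arbitrary, so directions are compared with |cosine|.
cos_pc1_mean = abs(pcs_sscp[0] @ mean_dir)
cos_vc1_sscp2 = abs(pcs_vc[0] @ pcs_sscp[1])
print(f"|cos(SSCP PC1, mean spectrum)| = {cos_pc1_mean:.4f}")
print(f"|cos(VC PC1, SSCP PC2)|        = {cos_vc1_sscp2:.4f}")
```

For a cloud this concentrated on the hyper-sphere, both cosines should come out close to 1. A more dispersed sample would be expected to weaken the second agreement.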

After application of the PCA using the SSCP method, we can write each spectrum [FORMULA] as


where [FORMULA] is the reconstructed spectrum of [FORMULA], [FORMULA] is the projection of spectrum [FORMULA] onto the eigenspectrum [FORMULA] and [FORMULA] is the number of PC's taken into account for the reconstruction. In Eq. (2), the PC's are in decreasing order of their contribution to the total variance.
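The reconstruction step can be sketched as follows: project a normalized spectrum onto the first few eigenspectra and sum the eigenspectra weighted by those projections. This is an illustration under stated assumptions, not the paper's implementation: the data are synthetic, the names `sscp_components` and `reconstruct` are hypothetical, and the eigenspectra are obtained from a singular value decomposition of the uncentered data matrix, whose right singular vectors are the eigenvectors of the SSCP matrix.

```python
import numpy as np

def sscp_components(normalized):
    """Principal components (rows) of the uncentered data matrix.

    The right singular vectors of the data matrix are the eigenvectors of
    its SSCP matrix; NumPy returns them sorted by decreasing singular value.
    """
    _, _, vt = np.linalg.svd(normalized, full_matrices=False)
    return vt

def reconstruct(spectrum, components, n_pc):
    """Project onto the first n_pc components and sum the weighted components."""
    projections = components[:n_pc] @ spectrum
    return projections @ components[:n_pc], projections

# Synthetic unit-norm "spectra" (illustrative only): positive mixtures of
# three smooth shapes plus noise.
rng = np.random.default_rng(1)
n_spec, n_pix = 60, 30
grid = np.linspace(0.0, 1.0, n_pix)
shapes = np.vstack([
    1.0 + grid,
    2.0 - grid,
    1.0 + np.exp(-((grid - 0.5) / 0.05) ** 2),
])
weights = rng.uniform(0.2, 1.0, size=(n_spec, 3))
raw = weights @ shapes + rng.normal(0.0, 0.02, size=(n_spec, n_pix))
spectra = raw / np.linalg.norm(raw, axis=1, keepdims=True)

components = sscp_components(spectra)
target = spectra[0]

# Residual norm of the reconstruction as more components are included.
residuals = []
for n_pc in range(1, n_pix + 1):
    approx, _ = reconstruct(target, components, n_pc)
    residuals.append(float(np.linalg.norm(target - approx)))

for n_pc in (1, 2, 3, 4):
    print(f"n_pc = {n_pc}: residual norm = {residuals[n_pc - 1]:.4f}")
```

Because the components are orthonormal, the residual can only decrease as components are added, and it vanishes when all M components are used.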

We show in Sect. 5 below that, if the S/N is high enough (i.e., [FORMULA] 8), we can take [FORMULA] = 3 or 4 to reconstruct [FORMULA] 97 to 98% of the signal, respectively. If the S/N [FORMULA] 8, a higher number of PC's is required to reproduce the initial spectrum to such high accuracy because of the noise pattern. Therefore, the first 2 or 3 components carry most of the signal in each spectrum, which leads us to use [FORMULA], [FORMULA] and [FORMULA] to describe the spectral sequence. We choose to reduce these 3 parameters to the radius r and the angles [FORMULA] and [FORMULA] defined by the spectrum [FORMULA] (as in Connolly et al. 1995) in spherical coordinates ([FORMULA] the azimuth and [FORMULA] the polar angle taken from the equator),

[FORMULA] (3a)
[FORMULA] (3b)
[FORMULA] (3c)

We express the values of [FORMULA] and [FORMULA] independently of the value of r:

[FORMULA] (4a)
[FORMULA] (4b)

Note that we prefer the use of [FORMULA] and [FORMULA] (rather than the ratios [FORMULA] / [FORMULA] and [FORMULA] / [FORMULA]) for defining the spectral sequence because they have a geometrical meaning. In the next section, we show that the physical meaning of [FORMULA] is the relative contribution of the red (or early) and the blue (or late) stellar populations within a galaxy. Note that if [FORMULA], then from Eq. (3c), Eq. (4b) approximates to [FORMULA] arcsin [FORMULA].
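The change of variables from the first three projections to a radius and two angles can be sketched as below. The conventions are assumptions made for illustration: the function and variable names are hypothetical, the azimuth is measured in the plane of the first two projections, the second angle is an elevation measured from that plane (the "equator"), and `atan2` is used instead of the arctangent of a ratio so that the result stays defined when the first projection is zero or negative.

```python
import math

def projections_to_angles(a1, a2, a3):
    """Return (r, azimuth, elevation) in radians for three projections."""
    r = math.sqrt(a1 * a1 + a2 * a2 + a3 * a3)
    azimuth = math.atan2(a2, a1)                    # angle in the (a1, a2) plane
    elevation = math.atan2(a3, math.hypot(a1, a2))  # angle above that plane
    return r, azimuth, elevation

def angles_to_projections(r, azimuth, elevation):
    """Inverse transformation: spherical coordinates back to projections."""
    a1 = r * math.cos(elevation) * math.cos(azimuth)
    a2 = r * math.cos(elevation) * math.sin(azimuth)
    a3 = r * math.sin(elevation)
    return a1, a2, a3

# Illustrative projections of one normalized spectrum (made-up values).
a1, a2, a3 = 0.97, -0.20, 0.05
r, azimuth, elevation = projections_to_angles(a1, a2, a3)
print(f"r = {r:.4f}")
print(f"azimuth   = {math.degrees(azimuth):.2f} deg")
print(f"elevation = {math.degrees(elevation):.2f} deg")
```

With these conventions the two angles do not change when the three projections are multiplied by a common positive factor, and the elevation equals arcsin(a3 / r), which reduces to arcsin(a3) when r is close to 1.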

For comparison with the PCA, we have implemented a simple χ² test between the galaxies of the ESS sample and a set of templates derived from the Kennicutt sample (Kennicutt 1992a, see Sect. 5 and Sect. 6). In contrast to the PCA, the χ² test is dependent on the set of templates used and can only provide a constrained classification procedure. The χ² between an observed spectrum and a template can be written as


where [FORMULA] and [FORMULA] are the values in the spectral element or bin j of the flux-calibrated spectrum and the template, respectively. [FORMULA] is the total number of wavelength bins for both the spectrum and the template (we take the largest wavelength interval common to the spectrum and template, and rebin both to a common wavelength step of 5 Å/pix). The denominator measures the variance of the spectrum and the template, assuming that the noise is Poissonian. Because for a given observed spectrum [FORMULA] is the same for all the comparison templates, the χ² value does not need to be normalized. Therefore, if we have a set of P templates, then the closest template k to the spectrum S is the one which satisfies


Note that in the PCA treatment, the wavelength interval of all input spectra must be identical. For the χ² test, the wavelength interval can be larger than the one used for the PCA and varies from spectrum to spectrum. This difference will allow us to check the dependence of the PCA classification on the wavelength interval (cf. Sect. 6).
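A template comparison of this kind can be sketched in a few lines. The exact form of the statistic is an assumption here: the variance in each bin is taken as the sum of the spectrum and template values, which is one reading of a Poissonian denominator. The sketch also assumes that the spectrum and the templates are already rebinned to the same wavelength grid and placed on a common flux scale. The template names and flux values are made up.

```python
def chi2_poisson(spectrum, template):
    """Sum over bins of (S_j - T_j)^2 / (S_j + T_j).

    ASSUMPTION: the per-bin variance is S_j + T_j (Poisson noise on both).
    Bins with a non-positive variance are skipped.
    """
    if len(spectrum) != len(template):
        raise ValueError("spectrum and template must share the same bins")
    total = 0.0
    for s, t in zip(spectrum, template):
        variance = s + t
        if variance > 0:
            total += (s - t) ** 2 / variance
    return total

def closest_template(spectrum, templates):
    """Return the name of the template with the smallest chi-square, plus all scores."""
    scores = {name: chi2_poisson(spectrum, flux) for name, flux in templates.items()}
    best = min(scores, key=scores.get)
    return best, scores

# Hypothetical templates and observed spectrum on a common 4-bin grid.
templates = {
    "red": [10.0, 20.0, 30.0, 40.0],
    "blue": [40.0, 30.0, 20.0, 10.0],
}
observed = [12.0, 19.0, 33.0, 38.0]

best, scores = closest_template(observed, templates)
print("closest template:", best)
for name, value in scores.items():
    print(f"  chi2({name}) = {value:.3f}")
```

Only the ranking of the scores matters for the classification, which is why no normalization by the number of bins is applied when every template is compared over the same bins.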


© European Southern Observatory (ESO) 1998

Online publication: March 23, 1998