4.1. Segregations in PCA-space
The PCA is completely `self-propelled', i.e., it does not need to be tuned or trained. Therefore it is interesting to look at the results of the PCA for the galaxies with morphology from D80, as well as for the galaxies with and without emission lines, to see in what way the PCs and the morphology or emission-line character correlate.
In Fig. 4 we show, for all 270 galaxies with morphology from D80, the distribution with respect to the first and second PCs for the 3 classes E, S0 and S+I. All galaxies have values of the first PC between -35 and 35, while the second PC is almost always in the range [-10,10]. Yet, galaxies of different morphological type have (slightly) different distributions in the plane of the first two PCs. The ellipticals have predominantly negative values of the first PC, the lenticulars are more evenly spread in the first PC, while the spirals and irregulars have more positive values of the first PC than negative ones. These differences between the distributions are quantified in Table 2, which gives the fraction of galaxies with positive values of the first and second PC for the three morphological classes. It is clear that the fraction of galaxies with a positive first PC increases towards later type. There is a tendency for the second PC to be slightly negative on average, although the effect is not very significant in view of the statistics. As the information content of the PCs e_i decreases with increasing i, the higher-order PCs will not have, by themselves, more discriminating power than .
Table 2. Distribution with respect to the first and second Principal Components. In the upper half of the table the galaxies are grouped according to morphology (from D80). In the lower half, galaxies are grouped according to the presence (ELG) or absence (non-ELG) of emission lines.
The effect visible in Fig. 4 and Table 2 is qualitatively similar to that found by Lahav et al. (1996), who used 13 galaxy parameters (e.g. blue minus red colour, central surface brightness) and found that different morphological types occupy distinct regions in the plane of the first two PCs. They even detected a slight separation between E and S0 galaxies, although the regions occupied by the two morphological types had considerable overlap.
In Fig. 5 we show the distribution with respect to the first and second PC for the ELG (left-hand panel) and the non-ELG (right-hand panel). There is a clear difference between the two distributions, with almost 90% of the ELG having a positive value of the first PC, while the non-ELG have, on average, a slightly negative value of the first PC (see also Table 2). Qualitatively, Figs. 4 and 5 are quite consistent, in view of the fact that almost all ELG are spirals (see Paper III). It is interesting to note that apparently the ELG represent those spirals that have essentially only positive values of the first PC.
Note also that the difference between ELG and non-ELG persists if we do not include in the PCA the spectral ranges where the main emission lines can occur. This shows that it is not only the emission lines themselves which distinguish ELG from non-ELG, but that more global properties of the spectrum, such as continuum slope (see Sect. 3.1.3), correlate with the presence of emission lines.
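The fractions tabulated in Table 2 amount to counting, per class, how many galaxies have a positive projection on a given PC. A minimal sketch of that bookkeeping is given below; the function name and the toy data are hypothetical and are not taken from the ENACS sample.

```python
from collections import defaultdict


def positive_fraction_by_class(labels, pc_values):
    """Return {class: fraction of its members with pc_value > 0}."""
    counts = defaultdict(int)
    positives = defaultdict(int)
    for label, value in zip(labels, pc_values):
        counts[label] += 1
        if value > 0:
            positives[label] += 1
    return {label: positives[label] / counts[label] for label in counts}


# Hypothetical toy data (NOT values from the paper): four galaxies per class.
labels = ["E"] * 4 + ["S0"] * 4 + ["S+I"] * 4
pc1 = [-12.0, -5.0, -1.5, 3.0,   # E
       -8.0, -2.0, 4.0, 9.0,     # S0
       -3.0, 6.0, 11.0, 20.0]    # S+I

fractions = positive_fraction_by_class(labels, pc1)
for label, fraction in fractions.items():
    print(f"{label}: {fraction:.2f}")
```

The same function applies unchanged to the ELG/non-ELG grouping if the labels are replaced accordingly.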
4.2. Success rates
In Table 3 we give the results of our morphological classification using the ANN operating on the 15 most significant PCs. The percentages quoted are averages (± r.m.s. values around these averages) over 10 realizations of the ANN. For each realization, different sets of galaxies (which are thus partly correlated) are used to train the ANN. We give this information for the test set as well as for the training set. We consider two cases: the three-class ANN classification (top), and its compressed pseudo two-class version obtained by combining the E and S0 classes of the three-class classification (see Sect. 3.2.1).
Table 3. Success rates (in percentages) for classifying galaxies with the ANN. The numbers are averages (± r.m.s. values around these averages) over 10 realizations of the ANN. For each realization, different sets of galaxies (which are partly correlated) are used to train the ANN. The first column gives the galaxy type as classified by D80. The second column gives the number of galaxies of this type that are used in the training set. The third, fourth and fifth columns give the fraction of galaxies (per morphological type) that is labeled as E, S0 and S+I, respectively, in the training set by the ANN. Column 6 gives the number of galaxies in the test set. Columns 7 to 9 give the fraction of galaxies (per morphological type) that is labeled as E, S0 and S+I, respectively, in the test set by the ANN. In the lower half of the table, the E and S0 galaxies are combined.
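The quoted averages and r.m.s. scatter over the ANN realizations can be computed as sketched below. This is only an illustration: the ten success rates are hypothetical, and the r.m.s. is taken here as the population standard deviation around the mean, which is an assumption about the convention used.

```python
from statistics import mean, pstdev

# Hypothetical success rates (in percent) for 10 ANN realizations;
# these are NOT the values behind Table 3.
rates = [70.0, 72.0, 74.0, 76.0, 78.0, 70.0, 72.0, 74.0, 76.0, 78.0]

average = mean(rates)
# r.m.s. deviation around the average (population standard deviation)
rms = pstdev(rates)

print(f"{average:.1f} +/- {rms:.1f}")
```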
The overall success rates for the training and test sets are and , respectively, for the three-class system, giving each of the three classes equal weight. The success rate for the training set is larger than for the test set, which must be due to the fact that the ANN weights are calculated using the galaxies in the training set only. The success rate for the test set, however, is the one that should be applied to the entire set of galaxies to be classified.
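An overall success rate that gives each class equal weight is the unweighted mean of the per-class success rates, rather than the fraction of all galaxies classified correctly. The sketch below contrasts the two for a hypothetical matrix of counts (rows: D80 type, columns: ANN label); the numbers are illustrative only and are not those of Table 3.

```python
def per_class_success(confusion):
    """Fraction of each true class (row) that receives the correct label."""
    return [row[i] / sum(row) for i, row in enumerate(confusion)]


def equal_weight_success_rate(confusion):
    """Mean of the per-class success rates; every class counts equally."""
    rates = per_class_success(confusion)
    return sum(rates) / len(rates)


def count_weighted_success_rate(confusion):
    """Fraction of all galaxies that receive the correct label."""
    correct = sum(row[i] for i, row in enumerate(confusion))
    total = sum(sum(row) for row in confusion)
    return correct / total


# Hypothetical counts; rows and columns are ordered E, S0, S+I.
confusion = [
    [30, 15, 5],    # true E
    [20, 60, 20],   # true S0
    [4, 8, 28],     # true S+I
]

equal_weight = equal_weight_success_rate(confusion)
count_weighted = count_weighted_success_rate(confusion)
print(f"equal weight: {equal_weight:.3f}, count weighted: {count_weighted:.3f}")
```

The two rates differ whenever the classes are unequally populated, as is the case for the D80 subsample.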
If one uses the two-class system, viz. separating between early- and late-types only, the success rate for the training set is and for the test set . Obviously, these success rates are larger than for the three-class system because one has fewer categories into which to classify the galaxies, and because a large fraction of the classification `failures' occurs between E's and S0's. The fact that in the two-class system the success rate of the early-type galaxies is higher than that of the late-type galaxies may be due, at least partly, to the asymmetry between early- and late-type galaxies in our spectral misclassifications, as discussed in Sect. 3.3.
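The pseudo two-class result follows from the three-class one by merging the E and S0 rows and columns, so that E/S0 confusions no longer count as failures. A minimal sketch, using a hypothetical matrix of counts (not the values of Table 3):

```python
def merge_early_types(confusion):
    """Collapse a 3x3 (E, S0, S+I) count matrix into 2x2 (early, late)."""
    (ee, es, el), (se, ss, sl), (le, ls, ll) = confusion
    return [
        [ee + es + se + ss, el + sl],  # true early: labeled early, labeled late
        [le + ls, ll],                 # true late:  labeled early, labeled late
    ]


# Hypothetical counts; rows = D80 type, columns = ANN label (E, S0, S+I).
three_class = [
    [30, 15, 5],
    [20, 60, 20],
    [4, 8, 28],
]

two_class = merge_early_types(three_class)
early_rate = two_class[0][0] / sum(two_class[0])
late_rate = two_class[1][1] / sum(two_class[1])
print(two_class, round(early_rate, 3), round(late_rate, 3))
```

In this toy example the early-type success rate rises well above the individual E and S0 rates, because the 35 E/S0 confusions are absorbed into the merged class, while the late-type rate is unchanged.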
Using the galaxies classified by Dressler, we determined how the spirals that are incorrectly classified are distributed between early- and late-type spirals. For the training set, only 30% of the spiral galaxies that are classified as S0 by our ANN are of type Sb or later, according to D80. Sodré & Cuevas (1997) obtained a similar result, namely that the spectral variation, as measured by the first PC, is slow from E to Sab and increases strongly for later types. So one expects few Sb or later-type spirals to be classified as early-type. For the test set, all spirals classified as E by the ANN are of type Sa. For the training set 30% of the spirals of type Sa, 10% of type Sb, 13% of type Sc and 0% of type Sd+I are classified incorrectly. For the test set, these numbers are 26% for Sa, 13% for Sb and 0% for Sc or later. The early-type spirals are thus more often misclassified as E or S0 than the late-type spirals.
The fraction of the 808 ELG in the ENACS sample, used in the present analysis, that is classified as E, S0 or S+I is , and , respectively. Biviano et al. found that out of the 71 ELG with a morphological type available, 86% are of type S+I, 11% are S0 and 3% are elliptical. The fraction of ELG that is classified as spiral (80%) is higher than would be expected from the individual success rates for the E, S0 and S+I subsamples (Table 3) and the distribution of ELG over morphological type (Paper III). These would imply that of the ELG would be classified as a spiral. Apparently, the success rate for the ELG is larger than for the entire data set containing both ELG and non-ELG, which could imply that ELG are preferentially late spirals, for which the classification is more reliable than it is for early spirals.
Based on the results of Table 3 one expects E's, S0's and S+I galaxies in the set of 270 ENACS spectra with a classification by D80. These numbers agree very well with the actual numbers in the D80 set, viz. 62 E's, 118 S0's and 90 S+I. However, this is not too surprising, as the correspondence between both sets of numbers was one of our criteria to set the output ranges (Sect. 3.2.3).
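The expected numbers of galaxies per ANN label follow from multiplying the true numbers per type by the per-type labeling fractions. The sketch below uses the D80 counts quoted above (62, 118, 90) together with a hypothetical matrix of fractions; the fractions are illustrative and are not those of Table 3.

```python
def expected_label_counts(true_counts, fractions):
    """Expected number of galaxies per ANN label.

    fractions[i][j] is the fraction of true type i that is labeled as j;
    each row must sum to 1.
    """
    n_labels = len(fractions[0])
    return [
        sum(count * row[j] for count, row in zip(true_counts, fractions))
        for j in range(n_labels)
    ]


true_counts = [62, 118, 90]  # E, S0, S+I in the D80 subsample (from the text)

# Hypothetical labeling fractions; rows = D80 type, columns = ANN label.
fractions = [
    [0.6, 0.3, 0.1],
    [0.2, 0.6, 0.2],
    [0.1, 0.2, 0.7],
]

expected = expected_label_counts(true_counts, fractions)
print([round(n, 1) for n in expected])
```

Because each row of fractions sums to one, the expected counts always add up to the total of 270; only their distribution over the labels depends on the success rates.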
The distribution of galaxy types for the entire sample of 3798 spectra in our final sample is: E, S0 and S+I.
Of all AGN in our sample, is classified as early-type. This is significantly more than the of all ELG that is classified as early-type. Apparently, there are significantly more early-type galaxies among AGN than there are among the non-AGN ELG.
In Table 4 we give the success rates for the galaxy classification if one uses different numbers of PCs. It appears that the results obtained with 10 PCs in the ANN may be marginally worse than those with 15 or 20 PCs, but the differences are not very significant.
Table 4. Success rates (in percentages) for classifying galaxies with the ANN using different numbers of Principal Components (PCs). The numbers are averages (± r.m.s. values around these averages) over 10 realizations of the ANN. For each realization, different sets of galaxies (which are partly correlated) are used to train the ANN. The results are given for the training and test sets separately and both for the three-class and the two-class classification systems.
We have also run the PCA and ANN with the spectra of the September 1992 period included as well. The classification results then are (three-class) and (two-class) for the training set, and (three-class) and (two-class) for the test set. These success rates are slightly lower than those if the spectra from the September 1992 period are not included, and they justify our choice not to include those galaxies in the analysis.
Finally, we have investigated if the success rates depend clearly on the S/N-ratio of the galaxy spectrum. This is not the case, as is expected because, by construction, the first PCs will contain relatively little noise.
© European Southern Observatory (ESO) 1999
Online publication: December 4, 1998