## 3. Uncertainties on the FP coefficients

Two kinds of methods are usually adopted to estimate the uncertainties on fit coefficients: theoretical methods and re-sampling techniques. The `theoretical uncertainties' are obtained from the analytical expressions for the variances of the estimators (e.g. IFA90 and Feigelson & Babu 1992). By their nature, these estimates are valid only asymptotically, i.e. for large sample sizes. When analytical formulae are not available, or when the sample is small, re-sampling procedures are adopted. The statistic of interest is calculated for various `pseudo-samples' drawn from the original data set, and the uncertainties are then derived from the distribution of pseudo-values. The main re-sampling procedures are known as `jackknife' (see Quenouille 1949 and Tukey 1958) and `bootstrap' (see Efron 1979 and Efron & Tibshirani 1986). In the jackknife, one point at a time is removed from the original data set, so that a number of pseudo-samples equal to the sample size is constructed. In the bootstrap, random samples are drawn with replacement from the actual data set.

Fig. 2 shows the relative uncertainties on the FP coefficients
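The two re-sampling schemes described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names `jackknife_se` and `bootstrap_se` are hypothetical, NumPy is assumed to be available, and the statistic used in the example (an ordinary least-squares slope) is a stand-in, not one of the FP fitting methods discussed in this paper.

```python
import numpy as np

def jackknife_se(data, stat):
    """Jackknife standard error: recompute `stat` on the n pseudo-samples
    obtained by removing one point at a time."""
    n = len(data)
    pseudo = np.array([stat(np.delete(data, i, axis=0)) for i in range(n)])
    dev = pseudo - pseudo.mean(axis=0)
    return np.sqrt((n - 1) / n * np.sum(dev ** 2, axis=0))

def bootstrap_se(data, stat, n_boot=1000, rng=None):
    """Bootstrap standard error: recompute `stat` on pseudo-samples of size n
    drawn with replacement from the data."""
    rng = np.random.default_rng(rng)
    n = len(data)
    pseudo = np.array([stat(data[rng.integers(0, n, size=n)])
                       for _ in range(n_boot)])
    return pseudo.std(axis=0, ddof=1)

def ols_slope(xy):
    """Illustrative statistic (assumption): least-squares slope of y on x."""
    return np.polyfit(xy[:, 0], xy[:, 1], 1)[0]

# Toy data, not taken from the paper: a noisy straight line with 40 points.
demo_rng = np.random.default_rng(0)
x_demo = demo_rng.uniform(0.0, 1.0, size=40)
y_demo = 1.2 * x_demo + demo_rng.normal(0.0, 0.1, size=40)
xy_demo = np.column_stack([x_demo, y_demo])

slope_se_jack = jackknife_se(xy_demo, ols_slope)
slope_se_boot = bootstrap_se(xy_demo, ols_slope, n_boot=1000, rng=1)
```

For the sample mean, the jackknife formula above reduces exactly to the familiar standard error of the mean, which gives a convenient sanity check of the implementation.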
In the following we analyze the performance of the methods used to estimate the FP uncertainties, with special regard to the rôle of the sample size. The analysis is performed by numerical simulations, which are described in the next section.

## 3.1. The simulation algorithm

The simulations consist of distributions of points extracted from a common `parent distribution'. To derive the parent population we relied on the distribution in the parameter space of the galaxies in the Coma cluster (hereafter the `template' sample). This choice is mandatory since Coma is the only cluster with FP parameters available for a large number of galaxies. The simulation algorithm consists of the following steps.

- First, we derived a parent distribution of E galaxies in the plane (,), using the photometric data in the Gunn band by JFK95a for 146 galaxies in Coma. The sample is complete out to Gunn mag. The distribution with respect to was described by a normal RV. The interval of the template was binned, and the mean value (MV) and standard deviation (SD) of the distribution were derived for each bin. The MVs and the SDs were then fitted with respect to the central values of by polynomials of suitable order, and the best-fit curves were used to interpolate a value (and a scatter) of for each value of .
- Using by Jorgensen et al. (1995b, hereafter JFK95b) for 75 galaxies of the photometric sample (see Sect. 4), we determined the FP coefficients and the root mean square (rms) of residuals, , by the fit. We found , , and . These quantities were used to assign to the points a value and a scatter with respect to the variable .
One of the simulated samples is compared to the template in Fig. 3, which shows that the simulations closely reproduce the distribution of Coma galaxies.
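The two steps of the simulation algorithm can be sketched as follows. This is a hedged illustration, not the authors' code. The generic names `p1`, `p2`, `p3` stand for the three FP parameters. The number of bins, the polynomial order, the plane coefficients `a`, `b`, `c`, the scatter `rms`, and the toy template are all placeholder assumptions. Drawing `p1` by re-sampling the template values, and treating `p3` as the variable that receives the scatter, are further assumptions made only for this example.

```python
import numpy as np

def fit_conditional_moments(p1, p2, n_bins=6, order=2):
    """Bin the template in p1, take the mean and SD of p2 in each bin, and fit
    polynomials to both as functions of the bin centres."""
    edges = np.linspace(p1.min(), p1.max(), n_bins + 1)
    centers = 0.5 * (edges[:-1] + edges[1:])
    which = np.clip(np.digitize(p1, edges) - 1, 0, n_bins - 1)
    keep, means, sds = [], [], []
    for k in range(n_bins):
        vals = p2[which == k]
        if vals.size >= 2:  # skip bins too sparse to give an SD
            keep.append(centers[k])
            means.append(vals.mean())
            sds.append(vals.std(ddof=1))
    return np.polyfit(keep, means, order), np.polyfit(keep, sds, order)

def simulate_sample(n, p1_template, mean_poly, sd_poly, a, b, c, rms, rng):
    """Draw one simulated sample of n points from the parent distribution."""
    # Assumption: p1 is drawn by re-sampling the template values.
    p1 = rng.choice(p1_template, size=n, replace=True)
    # p2 is normal, with mean and SD interpolated from the best-fit curves.
    sd = np.clip(np.polyval(sd_poly, p1), 1e-6, None)
    p2 = rng.normal(np.polyval(mean_poly, p1), sd)
    # p3 is placed on the plane and scattered with the rms of the residuals.
    p3 = a * p1 + b * p2 + c + rng.normal(0.0, rms, size=n)
    return p1, p2, p3

# Toy template (NOT the Coma data): 146 points with a linear trend plus noise.
rng = np.random.default_rng(42)
t1 = rng.uniform(0.0, 1.0, size=146)
t2 = 19.0 + 3.0 * t1 + rng.normal(0.0, 0.3, size=146)
mean_poly, sd_poly = fit_conditional_moments(t1, t2)

# One simulated sample of 30 points with placeholder plane parameters.
p1, p2, p3 = simulate_sample(30, t1, mean_poly, sd_poly,
                             a=1.0, b=-0.3, c=2.0, rms=0.05, rng=rng)
```

Repeating the call to `simulate_sample` for a fixed `n` gives the ensemble of samples on which a fitting method can then be tested.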
## 3.2. Estimating the uncertainties

We derived the `true uncertainties' on the FP coefficients as a function of the sample size in the following way. FP simulations of fixed size
It is apparent that the true intervals depend on the fitting method and that the most effective fit (i.e. lower values of and for fixed

Fig. 4 can be used as a ready tool to determine the number of galaxies necessary to achieve a given accuracy on the FP.

Concerning the zero point of the FP, it is important to remark that the uncertainties plotted in Fig. 4 do not represent the estimates of usual interest. For all the applications of the FP zero point (i.e. distance determinations, constraints on cosmological and evolutionary parameters), the uncertainties on

For this reason, in the following we will focus our analysis on the uncertainties of the FP slopes.

Although the comparison of Fig. 2 and Fig. 4 does not show an evident disagreement, for () the values reported in the literature appear almost as a scatter diagram. Starting from this remark, we now analyze the performance of the different methods used to estimate the uncertainties.

## 3.2.1. Theoretical methods

Although statistical theory allows one to prove the asymptotic validity of variance estimators, it does not furnish an estimate of the `minimum sample size' for the theoretical formulae to be valid. Since such an estimate will generally depend on the `shape' of the parent population, it should be obtained each time by using simulation methods.

To test the performance of theoretical variance estimators for the FP coefficients we apply the results of Sect. 2. FP simulations of fixed size
To discuss point II in more detail, we calculated for each sample the coefficient a, the theoretical uncertainty , and the `discrepancy' , where the actual value, , of the coefficient

In Fig. 6, we plot against for two different values of . We chose that, for a normal distribution, defines a confidence level (CL) of , and , corresponding to a CL. If Eqs. (14-16) worked well, on the average, for every value of

Fig. 6 suggests that in order to obtain a given confidence level, the uncertainties on the FP coefficients of small samples should be estimated using a interval dependent on

We conclude that for the theoretical formulae are not reliable. Although the desired confidence intervals can be roughly obtained by using effective, suitably tested, standard intervals, the individual estimates can be significantly different, up to for

## 3.2.2. Re-sampling procedures

The hypothesis underlying the use of re-sampling methods is that the available data set furnishes a good approximation to the parent population. The statistic of interest is calculated for various pseudo-samples drawn from the actual data set. If this `sampling hypothesis' holds, the distribution of pseudo-values coincides with the `true' one and the confidence intervals can be accurately estimated at the cost of some computing time (see e.g. Efron & Tibshirani 1986). However, the smaller the sample size, the larger the probability that the actual sample gives a poor representation of the parent population. In order to derive a `minimum sample size' for the re-sampling methods to be reliable, numerical simulations have to be employed.

On the basis of the analysis of the previous section, we studied the performance of re-sampling uncertainties on the FP coefficients by testing the use of the bootstrap method. The results that follow were found to be largely independent of the simulation parameters, of the actual FP coefficients, and of the fitting method. FP simulations of fixed size
As shown in the figure, the bootstrap method gives a good measure of the average uncertainties, but a very poor, widely scattered representation of the actual errors. Only for () do the MVs of the re-sampling standard errors appear to differ from the true uncertainties. In fact, this difference turns out to be due to the use of 1σ standard intervals to estimate the bootstrap confidence intervals. For small samples, the distribution of pseudo-values differs significantly from a normal one, so that the desired CLs must be obtained by non-parametric estimates

On the other hand, Fig. 7 shows that for small samples the bootstrap uncertainties have a very large dispersion with respect to the true values. The scatter varies from for up to for . Below the FP parent population is poorly represented, so that the single bootstrap estimates can be very unsatisfactory.

To compare with the theoretical methods, we compared the width of the error bars shown in Fig. 5 and Fig. 7. In Fig. 9 we plot the difference of the error bars against the logarithm of the sample size.
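The distinction made above between standard intervals and non-parametric estimates of the bootstrap confidence intervals can be illustrated with a short sketch. It assumes NumPy, uses an ordinary least-squares slope as a stand-in for the FP fits, and runs on toy data. The function names and the sample size of 15 are hypothetical choices made for illustration only.

```python
import numpy as np

def bootstrap_slopes(x, y, n_boot=2000, rng=None):
    """Pseudo-values of a straight-line slope from bootstrap re-sampling."""
    rng = np.random.default_rng(rng)
    n = len(x)
    slopes = np.empty(n_boot)
    for k in range(n_boot):
        i = rng.integers(0, n, size=n)  # indices drawn with replacement
        slopes[k] = np.polyfit(x[i], y[i], 1)[0]
    return slopes

def standard_interval(pseudo):
    """Mean -/+ one SD: matches the nominal CL only for normal pseudo-values."""
    m, s = pseudo.mean(), pseudo.std(ddof=1)
    return m - s, m + s

def percentile_interval(pseudo, cl=0.6827):
    """Non-parametric interval: central fraction `cl` of the pseudo-values."""
    lo, hi = np.percentile(pseudo, [50.0 * (1.0 - cl), 50.0 * (1.0 + cl)])
    return lo, hi

# Toy small sample (not FP data): 15 points on a noisy straight line.
demo_rng = np.random.default_rng(3)
x_small = demo_rng.uniform(0.0, 1.0, size=15)
y_small = 1.2 * x_small + demo_rng.normal(0.0, 0.2, size=15)

pseudo_slopes = bootstrap_slopes(x_small, y_small, n_boot=2000, rng=4)
interval_std = standard_interval(pseudo_slopes)
interval_pct = percentile_interval(pseudo_slopes)
```

When the pseudo-values are close to normal the two intervals nearly coincide; a skewed or heavy-tailed distribution of pseudo-values shows up as a difference between them.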
While for large samples theoretical and bootstrap uncertainties have a similar scatter, for () the bootstrap errors become increasingly less accurate. For , the scatter increases up to for

The conclusions are thus the following. For large samples, both theoretical and bootstrap methods give accurate estimates of the uncertainties. For , both methods give estimates that can differ significantly from the true values. The bootstrap is accurate on average, but the uncertainties have a very large scatter. The theoretical methods give values that are more precise, but systematically underestimated.

© European Southern Observatory (ESO) 2000

Online publication: October 30, 2000