4. Monte Carlo simulations of a test case
To investigate properties of the ML estimator we use the Hyades cluster as a test case. Based on kinematic criteria, including spectroscopic radial velocities, Perryman et al. (1998) identified 197 stars from the Hipparcos Catalogue as probable members of the Hyades (stars with `1' in column x of their Table 2). This sample is subsequently referred to as `Hy0'. The positions and distances of these stars, as determined by Hipparcos, define the three-dimensional structure of the cluster assumed in all our simulations. The astrometric accuracies () are also taken directly from the Hipparcos Catalogue. In equatorial coordinates the centroid velocity of the cluster is taken to be km s-1, as found by Perryman et al. (1998) for the inner 20 pc of the cluster, and from the same authors we take the centroid position to be pc.
4.1. Solutions based on consistent cluster models
In this section we examine the performance of the ML estimator in the case when the model assumed in the solution correctly describes the actual kinematics of the cluster. The purpose is to determine the possible biases that are intrinsic to the method, and its behaviour under ideal circumstances. In Sect. 4.2, we then consider how deviations from the assumed model will affect the results.
4.1.1. Precision and bias of estimates
Results from 5000 experiments with the basic cluster model applied to the 197 stars in sample Hy0 are summarised in Table 1 (simulation a). A value of 0.30 km s-1 was taken for the true velocity dispersion (Perryman et al. 1998). It is seen that, averaged over the many experiments, the parallaxes are within mas of their true values, and the estimated centroid velocity vector within km s-1 of the true vector. As a consequence, the bias in the radial component of the centroid velocity is only 0.01 km s-1. However, the velocity dispersion is significantly underestimated (0.15 versus the true value 0.30 km s-1).
Table 1. Results of Monte Carlo simulations for the Hyades sample Hy0 with 197 stars. Each simulation (a to c) comprises 5000 experiments and the results are described by the following quantities: assumed (`true') parameter value; mean estimated parameter value; mean estimated standard error of the parameter; rms scatter of the estimates. The columns correspond to the model parameters and through and, in the final column, the radial component of the centroid velocity (). For the parallaxes, the statistics refer to the deviations from the true values ().
In simulation a, all the model parameters are thus recovered without any significant bias, except the velocity dispersion, which is severely underestimated. On the other hand, the uncertainties of all the estimated parameters are significantly underestimated (roughly by 25 per cent, i.e. ). This latter circumstance is probably a consequence of the underestimated velocity dispersion, which through Eq. (16) gives too low a value for the total variance of the observables. Even if we are (here) not primarily interested in estimating the velocity dispersion per se, it is thus important to understand why it is so strongly underestimated in the solutions. This problem is further investigated in Sect. 4.3.
The simulations described above all used the basic cluster model, in which no systematic velocity field is included. In simulation b of Table 1 we added a pure rotation (), and in c a rotation plus the remaining terms () due to non-isotropic dilation. In both cases the ML estimation was made with exactly the same model as was used to generate the data, i.e. with a total of model parameters in b and in c. No expansion rate () was assumed. The non-zero components of and were all arbitrarily set to +20 km s-1 kpc-1. It is seen that the assumed systematic velocity terms are approximately recovered in the solutions, albeit with biases that are of the same order as the formal uncertainties. An important observation is that the centroid radial velocity () remains practically unbiased, even if the velocity field terms are not. The inclusion of more parameters in the solution naturally increases both the formal uncertainties and the actual scatter of the estimates. We conclude that a linear velocity field (without general expansion), if it exists, can be determined with the present method. The resulting astrometric radial velocities are unbiased, but the statistical uncertainties are larger than for the basic cluster model. If we include an isotropic expansion in the simulation of data, the estimated centroid radial velocity is biased according to Eq. (10) in Paper I. The other basic model parameters remain unbiased.
4.1.2. Kinematically improved parallaxes
Fig. 2 shows the distribution of parallax errors in simulation a. The estimation errors in the parallaxes are significantly smaller than the observational errors assumed in the input data for the solutions. As discussed in Sect. 6.3 of Paper I, this can be understood as resulting from a combination of the original trigonometric parallaxes with the kinematic parallaxes derived from the observed proper motions and the fitted kinematic parameters. We call these `kinematically improved parallaxes', as they are neither trigonometric nor (purely) kinematic. The kinematically improved parallaxes, here seen as a by-product of the determination of astrometric radial velocities, are by themselves highly interesting, e.g. for a better definition of the Hertzsprung-Russell diagram for several clusters and associations (Dravins et al. 1997; Madsen 1999; de Bruijne 1999).
The rms error of 0.43 mas for the kinematically improved parallaxes indicated in Fig. 2 is in reasonable agreement with the theoretical Eq. (11) of Paper I. Assuming mas, mas yr-1 (representative for ), km s-1, and other data as in Table 4 of Paper I, the latter equation gives mas.
The standard deviation of the normalised estimation error is 1.28. The formal errors of the estimated parallaxes thus need to be increased by 28 per cent to be consistent with the scatter found in the Monte Carlo experiments.
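The meaning of the normalised estimation error can be illustrated with a minimal sketch. It assumes Gaussian estimation errors with an actual scatter of 0.43 mas (the rms error quoted above) and a hypothetical formal error smaller by the factor 1.28; the function name and sample size are illustrative only and are not part of the original analysis.

```python
import random
import statistics


def normalised_error_std(actual_scatter, formal_error, n=20000, seed=1):
    """Standard deviation of (estimate - true) / formal_error.

    Assumes Gaussian estimation errors with standard deviation
    `actual_scatter`; a result above 1 means the formal error is too small.
    """
    rng = random.Random(seed)
    z = [rng.gauss(0.0, actual_scatter) / formal_error for _ in range(n)]
    return statistics.pstdev(z)


# Hypothetical numbers: actual scatter 0.43 mas, formal error 0.43 / 1.28 mas.
actual = 0.43
formal = 0.43 / 1.28
ratio = normalised_error_std(actual, formal)

# Multiplying the formal error by this ratio restores consistency
# with the scatter of the simulated estimates.
corrected_formal = formal * ratio
```

In this sketch the ratio comes out close to 1.28, so the corrected formal error is close to the actual scatter, which is the sense in which the formal errors "need to be increased by 28 per cent".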
4.1.3. Distribution of the goodness-of-fit values
Fig. 3 shows the cumulative distribution of the goodness-of-fit values from simulation a in Table 1 (solid curve). For comparison, the thick gray line shows the distribution (with slope -0.5 in the lin-log diagram) expected on theoretical grounds as explained in Sect. 3.7. The empirical distribution is indeed very nearly exponential, but with a slope of . The scaling factor in Eq. (21) is, therefore, in this simulation. This is in rough quantitative agreement with the previous conclusion that the total covariances are too small as a consequence of the underestimation of the velocity dispersion.
The distribution of the goodness-of-fit values is of course modified by the rejection procedure described in Sect. 3.7, whereby an upper limit is introduced. As shown by the dashed curves in Fig. 3, this procedure produces a gently truncated exponential distribution of the values, without much affecting the distribution of the small values.
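The link between an exponential distribution with slope -0.5 and underestimated covariances can be sketched numerically. The sketch assumes that the ideal goodness-of-fit values follow a chi-square distribution with two degrees of freedom, which is one distribution whose tail probability is exactly exp(-g/2), and that the slope is measured in natural logarithms. The variance ratio 0.7 is a hypothetical value chosen for illustration; it is not the scaling factor of Eq. (21).

```python
import math
import random


def survival_slope(samples, lo=1.0, hi=8.0):
    """Slope of ln P(g > x) between x = lo and x = hi."""
    n = len(samples)
    p_lo = sum(g > lo for g in samples) / n
    p_hi = sum(g > hi for g in samples) / n
    return (math.log(p_hi) - math.log(p_lo)) / (hi - lo)


rng = random.Random(2)

# Chi-square with 2 degrees of freedom: sum of two squared unit Gaussians.
g_ideal = [rng.gauss(0, 1) ** 2 + rng.gauss(0, 1) ** 2 for _ in range(50000)]

# Hypothetical: the assumed total variance is only 0.7 of the true variance,
# so every computed goodness-of-fit value is inflated by 1 / 0.7.
variance_ratio = 0.7
g_inflated = [g / variance_ratio for g in g_ideal]

slope_ideal = survival_slope(g_ideal)        # close to -0.5
slope_inflated = survival_slope(g_inflated)  # close to -0.5 * variance_ratio
```

Under these assumptions the distribution stays exponential but becomes shallower, and the ratio of the empirical slope to -0.5 estimates the factor by which the variances were underestimated.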
4.2. Robustness of the solution
Robustness refers to the desirable property of an estimator that the results are relatively insensitive to deviations from the model assumptions. In the context of the basic cluster model two types of deviation have been considered: systematic velocity patterns, and deviations from a Gaussian velocity distribution. The latter may be caused either by individual contaminating field stars or astrometric binaries, or by a more general shape of velocity distribution (e.g. a mixture of different dispersions).
The existence of systematic velocity patterns can at least partly be dealt with by solving for the components of a linear velocity field as discussed in Sect. 3.3. However, in practice this solution would not be accepted unless it gave a significant result for the linear velocity terms. We should therefore consider the possible biases in the basic solution produced by velocity fields that are weak enough to remain undetected. Given the formal errors and scatters in simulations b and c (Table 1), it is clear that components in and (or equivalently in ) of the order of 10 km s-1 kpc-1 would generally remain undetected. Simulations were made in which the components of and were randomly assigned values with a uniform distribution in the interval ±10 km s-1 kpc-1. In each experiment the centroid velocity was estimated by means of the basic cluster model, i.e. without solving for the linear velocity field. The resulting rms scatter in the astrometric radial velocity of the cluster centroid was 0.43 km s-1. This should be compared with the scatter of 0.34 km s-1 obtained in simulation a without linear velocity fields, but with otherwise identical assumptions. The increased scatter (by 0.26 km s-1 in quadrature) is in very good agreement with the theoretically expected 0.27 km s-1 derived from Eq. (A6) in Paper I.
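The quadrature difference quoted above can be checked directly from the two scatters given in the text. The variable names are illustrative.

```python
import math

scatter_with_field = 0.43  # km/s, basic model applied to data with a weak velocity field
scatter_basic = 0.34       # km/s, simulation a, no velocity field

# Independent error sources add in quadrature, so the extra contribution is
# the square root of the difference of the variances.
extra_scatter = math.sqrt(scatter_with_field**2 - scatter_basic**2)

print(f"{extra_scatter:.2f} km/s")  # prints 0.26 km/s
```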
The use of (primarily) kinematic criteria to identify probable cluster members, e.g. for the Hyades by Perryman et al. (1998), makes it unlikely that the input sample for the ML estimation contains a large number of field stars. Member stars with strongly deviating proper motions (due to binarity) are also rejected a priori. However, a small number of stars could still have peculiar velocities exceeding several times the combined standard deviation of observational errors and the internal velocity dispersion. The rejection procedure described in Sect. 3.7 is intended to eliminate such outliers. To test the effectiveness of the procedure, we made a simulation in which the peculiar velocity of an individual star, with probability 0.05, was multiplied by a factor 10. The centroid astrometric accuracy was estimated with different rejection limits . The rms scatter was 0.56 km s-1 for the full sample () and reached a shallow minimum of 0.35 km s-1 for , with a mean rejection rate of about 0.05. Since this minimum is very close to what is obtained for the full, uncontaminated sample, we conclude that the rejection procedure is very efficient for this type of outlier.
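The contamination experiment can be mimicked in a strongly simplified form. The sketch below is not the procedure of Sect. 3.7: it uses a single velocity component per star, no observational errors, a known centroid (zero), and a squared normalised residual as a stand-in for the goodness-of-fit value. The dispersion 0.30 km s-1, the outlier probability 0.05 and the factor 10 follow the text; the rejection limit 15 and all function names are assumptions made for illustration.

```python
import math
import random


def centroid_scatter(g_lim, n_stars=197, n_exp=400, sigma=0.30,
                     p_out=0.05, factor=10.0, seed=3):
    """RMS scatter of the sample mean velocity over n_exp experiments.

    Each star gets a Gaussian peculiar velocity with dispersion sigma; with
    probability p_out it is multiplied by `factor`. Stars whose squared
    normalised velocity exceeds g_lim are rejected before averaging.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(n_exp):
        kept = []
        for _ in range(n_stars):
            v = rng.gauss(0.0, sigma)
            if rng.random() < p_out:
                v *= factor                # contaminated star
            if (v / sigma) ** 2 <= g_lim:  # simplified rejection test
                kept.append(v)
        means.append(sum(kept) / len(kept))
    return math.sqrt(sum(m * m for m in means) / len(means))


scatter_all = centroid_scatter(float("inf"))  # no rejection
scatter_clipped = centroid_scatter(15.0)      # hypothetical limit of 15
scatter_ideal = 0.30 / math.sqrt(197)         # uncontaminated expectation
```

In this toy setting the clipped scatter falls close to the uncontaminated expectation, which qualitatively mirrors the behaviour reported above; the absolute numbers are not comparable with those of the full ML solution.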
In reality we do not expect such a clear-cut distinction between well-behaved member stars and outliers. A more likely situation is that of a continuous blend of populations with different kinematic characteristics. A simple model for this is to assume that the velocity dispersion itself is a random variable. Since the dispersion must be positive, a convenient assumption is that it follows a log-normal distribution with median value and logarithmic standard width (thus is Gaussian with mean value and standard deviation ). With km s-1 and we found that the scatter in the centroid radial velocity was 0.44, 0.42, 0.40, 0.38, 0.40 km s-1 for , 25, 20, 15 and 10, respectively. A cut-off limit around 15 thus appears optimal also in this case. The non-Gaussian nature of the internal velocities is clearly revealed in the statistics of the residuals, as shown by the distribution of values in Fig. 4. This may be a useful diagnostic for the study of real cluster data (Sect. 5).
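A log-normal distribution of dispersions is straightforward to simulate. In the sketch below the median 0.30 km s-1 and the logarithmic width 0.5 are hypothetical placeholders (the values used in the text are not reproduced here), the width is assumed to refer to the natural logarithm, and the function name is illustrative.

```python
import math
import random
import statistics


def draw_dispersions(median, log_width, n, rng):
    """Draw n dispersions such that ln(sigma) is Gaussian with mean
    ln(median) and standard deviation log_width."""
    return [median * math.exp(rng.gauss(0.0, log_width)) for _ in range(n)]


rng = random.Random(4)
sigmas = draw_dispersions(median=0.30, log_width=0.5, n=20000, rng=rng)

# One peculiar velocity component per star, each with its own dispersion.
velocities = [rng.gauss(0.0, s) for s in sigmas]

sample_median = statistics.median(sigmas)

# Kurtosis of the mixture; a pure Gaussian would give 3.
m2 = sum(v**2 for v in velocities) / len(velocities)
m4 = sum(v**4 for v in velocities) / len(velocities)
kurtosis = m4 / m2**2
```

The resulting velocity distribution is symmetric but heavier-tailed than a Gaussian, which is the kind of non-Gaussian signature that would show up in the statistics of the residuals.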
A rejection procedure using a cut-off limit of about 15 thus appears to provide excellent protection against outliers and also works very well in more general cases of non-Gaussian velocity dispersions. The accuracy of the resulting centroid velocity is only marginally degraded compared with the nominal case of a Gaussian velocity dispersion.
4.3. Unbiased estimation of the velocity dispersion
It was noted in Sect. 4.1.1 that the internal velocity dispersion is strongly underestimated in the ML solutions based on simulated observations. We now turn to investigating this effect more closely, and to finding a remedy for it.
The bias in the ML estimate of the velocity dispersion apparent in Table 1 is probably related to the fact that we assume an isotropic three-dimensional dispersion of the peculiar velocities, while in practice only one component can be measured astrometrically, viz. the one perpendicular to the plane containing the line of sight and the centroid velocity vector. The radial component of the peculiar velocities is obviously not determined at all, since that would require spectroscopic velocities. The remaining tangential component, parallel to the plane containing the line of sight and the centroid velocity, is largely absorbed by the individual distance estimates (provided that , as is the case for the Hyades cluster). Thus the variance measured in one direction () is effectively `spread out' over all three directions, causing the estimated dispersion to come out much too small. The effect is compounded by the observational errors in the proper motions, which are implicitly taken into account in the ML estimation by reducing the estimated dispersion even further. This explains, at least qualitatively, why we obtain in simulations where the true dispersion is less than a certain value ( km s-1 for the Hyades case).
It should be remarked here that the existence of this bias does not imply that the present formulation or implementation of the ML method is invalid. While the ML method is known to perform well in many practical situations, there is no guarantee that it provides unbiased estimates. In the present case the bias seems to be the consequence of an intrinsic anisotropy of the astrometric observations with respect to the mathematical cluster model.
The referee has drawn our attention to a practical way in which this difficulty could be avoided, using a variant of the procedure described by Narayanan & Gould (1999a). Instead of forcing a solution with the (physically well motivated) isotropic dispersion, let us assume a triaxial velocity ellipsoid with dispersions , , along the previously defined axes. Applying the ML estimation to this model will of course give much too small a value for the tangential component parallel to the plane containing the line of sight and the centroid velocity, and leave the radial component undetermined. However, the perpendicular component may be obtained without bias, and that value could then be adopted as the actual dispersion along all three axes. (Narayanan & Gould also use spectroscopic radial velocities and photometric distances, and so are able to obtain information on all three components; they then impose an isotropic dispersion using the best combined estimate.)
We have not adopted that method, mainly because there is a mathematical inconsistency involved in forcing the other two components to equal the independently estimated perpendicular component. However, along a similar line of thought we found another way to deal with the problem. From the proper-motion residuals of the ML solution we compute estimates of the peculiar velocity components perpendicular to the cluster motion and of their observational uncertainties. A posteriori analysis of these data provides an estimate of the perpendicular dispersion, and hence of the velocity dispersion under the hypothesis of isotropic dispersion. Details of the procedure are given in Appendix A.4. Table 2 shows that the dispersion calculated in this way is a practically unbiased estimate of the true velocity dispersion even for very small dispersions. This value can then be used in Monte Carlo simulations to derive the true uncertainties in all the estimated parameters.
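The idea of recovering a dispersion from residuals and their known uncertainties can be illustrated with a simple moment estimator. This is a sketch under stated assumptions and not the procedure of Appendix A.4, whose details are not reproduced here: it assumes that each residual is the sum of an intrinsic Gaussian component and an independent Gaussian observational error of known size, and subtracts the mean error variance from the mean squared residual. The error range 0.2 to 0.6 km s-1, the sample size and the function name are hypothetical.

```python
import math
import random


def dispersion_from_residuals(eta, eps):
    """Moment estimate of the intrinsic dispersion.

    eta: velocity residuals (intrinsic scatter plus observational error)
    eps: standard errors of the residuals
    Returns sqrt(mean(eta^2) - mean(eps^2)), floored at zero.
    """
    excess = sum(e * e for e in eta) / len(eta) - sum(s * s for s in eps) / len(eps)
    return math.sqrt(max(excess, 0.0))


rng = random.Random(5)
true_sigma = 0.30  # km/s, as assumed in simulation a

# Hypothetical observational uncertainties, different for every star.
eps = [rng.uniform(0.2, 0.6) for _ in range(5000)]

# Each residual has total variance true_sigma^2 + eps_i^2.
eta = [rng.gauss(0.0, math.hypot(true_sigma, e)) for e in eps]

estimate = dispersion_from_residuals(eta, eps)
```

A large sample is used here only to keep the illustration stable; with 197 stars the same estimator would scatter considerably more around the true value.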
Table 2. Results of Monte Carlo simulations corresponding to case a in Table 1 (the Hyades sample Hy0 with 197 stars), but for different values of the assumed velocity dispersion and including results from the analysis of proper motion residuals. assumed (`true') velocity dispersion; velocity dispersion obtained by the standard ML estimation method of Sect. 3; velocity dispersion normal to the cluster motion, calculated according to Sect. 4.3 and Appendix A.4. The table gives mean values and rms scatter S from simulations with 200 experiments each. All values are in km s-1.
© European Southern Observatory (ESO) 2000
Online publication: April 17, 2000