Astron. Astrophys. 356, 1119-1135 (2000)
4. Monte Carlo simulations of a test case
To investigate properties of the ML estimator we use the Hyades
cluster as a test case. Based on kinematic criteria, including
spectroscopic radial velocities, Perryman et al. (1998) identified
197 stars from the Hipparcos Catalogue as probable members of the
Hyades (stars with `1' in column x of their Table 2). This sample
is subsequently referred to as `Hy0'. The positions and distances of
these stars, as determined by Hipparcos, define the three-dimensional
structure of the cluster assumed in all our simulations. The
astrometric accuracies ( ) are also
taken directly from the Hipparcos Catalogue. In equatorial coordinates
the centroid velocity of the cluster is taken to be
km s-1, as
found by Perryman et al. (1998) for the inner 20 pc of the
cluster, and from the same authors we take the centroid position to be
pc.
4.1. Solutions based on consistent cluster models
In this section we examine the performance of the ML estimator in
the case when the model assumed in the solution correctly describes
the actual kinematics of the cluster. The purpose is to determine the
possible biases that are intrinsic to the method, and its behaviour
under ideal circumstances. In Sect. 4.2, we then consider how
deviations from the assumed model will affect the results.
4.1.1. Precision and bias of estimates
Results from 5000 experiments with the basic cluster model applied
to the 197 stars in sample Hy0 are summarised in Table 1
(simulation a). A value of 0.30 km s-1
was taken for the true velocity dispersion (Perryman et al.
1998). It is seen that, as a mean of the many experiments, the
parallaxes are within mas of
their true values, and the estimated centroid velocity vector within
km s-1 of the
true vector. As a consequence, the bias in the radial component of the
centroid velocity is only 0.01 km s-1. However,
the velocity dispersion is significantly underestimated (0.15 versus
the true value 0.30 km s-1).
Table 1. Results of Monte Carlo simulations for the Hyades sample Hy0 with 197 stars. Each simulation (a to c) comprises 5000 experiments and the results are described by the following quantities: assumed (`true') parameter value; mean estimated parameter value; mean estimated standard error of the parameter; rms scatter of the estimates. The columns correspond to the model parameters and through and, in the final column, the radial component of the centroid velocity ( ). For the parallaxes, the statistics refer to the deviations from the true values ( ).
In simulation a, all the model parameters are thus
recovered without any significant bias, except the velocity
dispersion, which is severely underestimated.
On the other hand, the uncertainties of all the estimated
parameters are significantly underestimated (by roughly 25 per
cent). This latter
circumstance is probably a consequence of the underestimated
velocity dispersion, which through Eq. (16) gives too low
a value for the total variance of the observables. Even if we
are (here) not primarily interested in estimating the velocity
dispersion per se, it is thus important to understand why it is
so strongly underestimated in the solutions. This problem is further
investigated in Sect. 4.3.
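The summary quantities discussed above (mean estimate, mean formal standard error, rms scatter) can be computed from a set of Monte Carlo experiments roughly as in the following minimal sketch. The function name `summarise` and its dictionary keys are hypothetical, and taking the rms scatter about the mean estimate (rather than about the true value) is an assumption of this sketch, not something stated in the text.

```python
import math


def summarise(estimates, formal_errors, true_value):
    """Summarise one parameter over many Monte Carlo experiments.

    estimates     -- estimated parameter value from each experiment
    formal_errors -- formal standard error reported by each solution
    true_value    -- value assumed when generating the simulated data
    """
    n = len(estimates)
    mean_est = sum(estimates) / n
    mean_formal = sum(formal_errors) / n
    # rms scatter about the mean estimate (assumption of this sketch)
    rms = math.sqrt(sum((e - mean_est) ** 2 for e in estimates) / n)
    return {
        "bias": mean_est - true_value,
        "mean_formal_error": mean_formal,
        "rms_scatter": rms,
        # a ratio clearly above 1 means the formal errors are too small
        "scatter_to_formal": rms / mean_formal,
    }


# Toy input, not data from the paper
print(summarise([1.0, 3.0], [0.5, 0.5], true_value=2.0))
```

A ratio `scatter_to_formal` well above unity corresponds to the situation described above, in which the formal uncertainties underestimate the actual scatter.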
The simulations described above all used the basic cluster model,
in which no systematic velocity field is included. In simulation
b of Table 1 we added a pure rotation
( ), and in c a rotation plus
the remaining terms ( ) due to
non-isotropic dilation. In both cases the ML estimation was made with
exactly the same model as was used to generate the data, i.e. with a
total of model parameters in
b and in c . No
expansion rate ( ) was assumed. The
non-zero components of and
were all arbitrarily set to
+20 km s-1 kpc-1. It is seen that
the assumed systematic velocity terms are approximately recovered in
the solutions, albeit with biases that are of the same order as the
formal uncertainties . An important
observation is that the centroid radial velocity
( ) remains practically unbiased,
even if the velocity field terms are not. The inclusion of more
parameters in the solution naturally increases both the formal
uncertainties and the actual scatter of the estimates. We conclude
that a linear velocity field (without general expansion), if it
exists, can be determined with the present method. The resulting
astrometric radial velocities are unbiased, but the statistical
uncertainties are larger than for the basic cluster model. If we
include an isotropic expansion in the simulation of data, the
estimated centroid radial velocity is biased according to
Eq. (10) in Paper I. The other basic model parameters remain
unbiased.
4.1.2. Kinematically improved parallaxes
Fig. 2 shows the distribution of parallax errors in simulation
a . The estimation errors in the parallaxes are significantly
smaller than the observational errors assumed in the input data for
the solutions. As discussed in Sect. 6.3 of Paper I this can
be understood as resulting from a combination of the original
trigonometric parallaxes with the kinematic parallaxes derived from
the observed proper motions and the fitted kinematic parameters. We
call these `kinematically improved parallaxes', as they are neither
trigonometric, nor (purely) kinematic. The kinematically improved
parallaxes, here seen as a by-product of the determination of
astrometric radial velocities, are by themselves highly interesting
e.g. for a better definition of the Hertzsprung-Russell diagram for
several clusters and associations (Dravins et al. 1997; Madsen
1999; de Bruijne 1999).
![[FIGURE]](img168.gif)
Fig. 2. Distribution of parallax errors in simulation a (Table 1): the histogram labelled `obs' is for the observational errors ( , standard deviation 1.76 mas), while `est' is for the estimation errors ( , standard deviation 0.43 mas).
The rms error of 0.43 mas for the kinematically improved
parallaxes indicated in Fig. 2 is in reasonable agreement with
the theoretical Eq. (11) of Paper I. Assuming
mas,
mas yr-1
(representative for ),
km s-1, and
other data as in Table 4 of Paper I, the latter equation
gives mas.
The standard deviation of the normalised estimation error
is 1.28. The formal errors of the
estimated parallaxes thus need to be increased by 28 per cent to
be consistent with the scatter found in the Monte Carlo
experiments.
4.1.3. Distribution of the goodness-of-fit values
Fig. 3 shows the cumulative distribution of the
values from simulation a in
Table 1 (solid curve). For comparison, the thick gray line shows
the distribution (with slope -0.5 in
the lin-log diagram) expected on theoretical grounds as explained in
Sect. 3.7. The empirical distribution is indeed very nearly
exponential, but with a slope of .
The scaling factor in Eq. (21) is, therefore,
in this simulation. This is in
rough quantitative agreement with the previous conclusion that the
total covariances are too small as a
consequence of the underestimation of
.
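The slope of -0.5 quoted for the reference line follows from the survival function of a chi-square variable with 2 degrees of freedom, P(G > g) = exp(-g/2), whose natural logarithm is linear in g. The sketch below checks this numerically. It uses the sum of two squared standard normal deviates as a stand-in for the goodness-of-fit values; it does not reproduce the actual goodness-of-fit statistic of Sect. 3.7, and all function names are hypothetical.

```python
import math
import random


def chi2_2dof_survival(g):
    """P(G > g) for a chi-square variable with 2 degrees of freedom."""
    return math.exp(-0.5 * g)


def simulate_g(n, seed=0):
    """Stand-in goodness-of-fit values: sum of two squared N(0, 1) deviates."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) ** 2 + rng.gauss(0.0, 1.0) ** 2 for _ in range(n)]


def empirical_survival(samples, g):
    """Fraction of samples exceeding g."""
    return sum(1 for s in samples if s > g) / len(samples)


samples = simulate_g(20000)
for g in (2.0, 4.0, 8.0):
    # ln of the theoretical survival function is exactly -0.5 * g
    print(g, empirical_survival(samples, g), chi2_2dof_survival(g))
```

An empirical distribution that is exponential but steeper than this reference line is what the text describes for simulation a.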
![[FIGURE]](img183.gif)
Fig. 3. Cumulative distribution of the goodness-of-fit values obtained in simulation a of Table 1 (solid curve) and with cut-off values , 25, 20, 15, and 10 (dashed curves). The thick gray line is for a chi-square distribution with 2 degrees of freedom. In these simulations, the internal velocities follow a Gaussian distribution with dispersion km s-1.
The distribution of the values is
of course modified by the rejection procedure described in
Sect. 3.7, whereby an upper limit
is introduced. As shown by the
dashed curves in Fig. 3, this procedure produces a gently
truncated exponential distribution of the
values, without much affecting the
distribution of the small values.
4.2. Robustness
Robustness refers to the desirable property of an estimator that
the results are relatively insensitive to deviations from the model
assumptions. In the context of the basic cluster model two types of
deviation have been considered: systematic velocity patterns, and
deviations from a Gaussian velocity distribution. The latter may be
caused either by individual contaminating field stars or astrometric
binaries, or by a more general shape of velocity distribution (e.g. a
mixture of different dispersions).
The existence of systematic velocity patterns can at least partly
be dealt with by solving components of a linear velocity field as
discussed in Sect. 3.3. However, in practice this solution would
not be accepted unless it gave a significant result for the linear
velocity terms. We should therefore consider the possible biases in
the basic solution produced by velocity fields that are weak enough to
remain undetected. Given the formal errors and scatters in simulations
b and c (Table 1), it is clear that components in
and
(or equivalently in
) of the order of
10 km s-1 kpc-1 would generally
remain undetected. Simulations were made, in which the components of
and
were randomly assigned values with a
uniform distribution in the interval
10 km s-1 kpc-1. In each experiment the centroid velocity was estimated by
means of the basic cluster model, i.e. without solving for the linear
velocity field. The resulting rms scatter in the astrometric radial
velocity of the cluster centroid was 0.43 km s-1.
This should be compared with the scatter of
0.34 km s-1 obtained in simulation a
without linear velocity fields, but with otherwise identical
assumptions. The increased scatter (by
0.26 km s-1 in quadrature) is in very good
agreement with the theoretically expected
0.27 km s-1 derived from Eq. (A6) in
Paper I.
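The quadrature difference quoted above can be verified directly from the two rms scatters given in the text. The helper name `excess_in_quadrature` is hypothetical.

```python
import math


def excess_in_quadrature(total, baseline):
    """Extra scatter that, added in quadrature to `baseline`, gives `total`."""
    return math.sqrt(total ** 2 - baseline ** 2)


# 0.43 km/s with undetected linear velocity fields, 0.34 km/s without
extra = excess_in_quadrature(0.43, 0.34)
print(f"{extra:.2f} km/s")  # prints 0.26 km/s
```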
The use of (primarily) kinematic criteria to identify probable
cluster members, e.g. for the Hyades by Perryman et al. (1998),
ensures that the input sample for the ML estimation does not contain a large
number of field stars. Member stars with strongly deviating
proper motions (due to binarity) are also rejected a priori. However, a
small number of stars could still have peculiar velocities exceeding
several times the combined standard deviation of observational errors
and . The rejection procedure
described in Sect. 3.7 is intended to eliminate such outliers. To
test the effectiveness of the procedure, we made a simulation in which
the peculiar velocity of an individual star, with probability 0.05,
was multiplied by a factor of 10. The centroid astrometric accuracy was
estimated with different rejection limits
. The rms scatter was
0.56 km s-1 for the full sample
( ) and reached a shallow minimum of
0.35 km s-1 for
, with a mean rejection rate about
0.05. Since this minimum is very close to what is obtained for the
full, uncontaminated sample, we conclude that the rejection procedure
is very efficient for this type of outlier.
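The contamination model used in this test can be sketched as follows. This is an illustration only: the function name and arguments are hypothetical, and the underlying isotropic Gaussian dispersion of 0.30 km s-1 is assumed here to be the same as in simulation a. The rejection step of Sect. 3.7 is not reproduced.

```python
import random


def contaminated_velocities(n, sigma=0.30, p_outlier=0.05, factor=10.0, seed=1):
    """Draw n three-dimensional peculiar velocities (km/s) with outliers.

    Each star gets an isotropic Gaussian peculiar velocity; with
    probability p_outlier the whole vector is multiplied by `factor`.
    Returns the velocities and a list of outlier flags.
    """
    rng = random.Random(seed)
    velocities, flags = [], []
    for _ in range(n):
        v = [rng.gauss(0.0, sigma) for _ in range(3)]
        is_outlier = rng.random() < p_outlier
        if is_outlier:
            v = [factor * c for c in v]
        velocities.append(tuple(v))
        flags.append(is_outlier)
    return velocities, flags


velocities, flags = contaminated_velocities(197)
print(sum(flags), "of", len(flags), "stars flagged as outliers")
```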
In reality we do not expect such a clear-cut distinction between
well-behaved member stars and outliers. A more likely situation is
that of a continuous blend of populations with different kinematic
characteristics. A simple model for this is to assume that the
velocity dispersion itself is a
random variable. Since the dispersion must be positive, a convenient
assumption is that it follows a log-normal distribution with median
value and logarithmic standard
width (thus
is Gaussian with mean value
and standard deviation
). With
km s-1 and
we found that the scatter in the
centroid radial velocity was 0.44, 0.42, 0.40, 0.38,
0.40 km s-1 for
, 25, 20, 15 and 10, respectively. A
cut-off limit around 15 thus appears optimal also in this case. The
non-Gaussian nature of the internal velocities is clearly revealed in
the statistics of the residuals, as shown by the distribution of
values in Fig. 4. This may be a
useful diagnostic for the study of real cluster data
(Sect. 5).
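A log-normal model for the dispersion of this kind can be sketched with the standard library alone. The median of 0.3 km s-1 is taken from the text; the logarithmic width of 1.0 is the value given in the caption of Fig. 4, and interpreting it as a standard deviation in the natural logarithm is an assumption of this sketch. Function names are hypothetical.

```python
import math
import random
import statistics


def draw_dispersions(n, median=0.30, log_sd=1.0, seed=2):
    """Draw n positive dispersions (km/s) from a log-normal distribution.

    ln(sigma) is Gaussian with mean ln(median) and standard deviation log_sd.
    """
    rng = random.Random(seed)
    mu = math.log(median)
    return [rng.lognormvariate(mu, log_sd) for _ in range(n)]


def draw_peculiar_velocity(sigma, rng):
    """Isotropic Gaussian peculiar velocity for a star with dispersion sigma."""
    return tuple(rng.gauss(0.0, sigma) for _ in range(3))


sigmas = draw_dispersions(197)
rng = random.Random(3)
velocities = [draw_peculiar_velocity(s, rng) for s in sigmas]
print("sample median dispersion:", statistics.median(sigmas))
```

The resulting velocity distribution is a scale mixture of Gaussians, with heavier tails than a single Gaussian of dispersion 0.3 km s-1, which is the kind of non-Gaussian behaviour the text refers to.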
![[FIGURE]](img203.gif)
Fig. 4. Cumulative distribution of the goodness-of-fit values obtained in simulations with a non-Gaussian velocity distribution. As in Fig. 3, the different curves are for different cut-off values (solid curve) and 30, 25, 20, 15, 10 (dashed curves). In these experiments, was modelled as a log-normal random variable with median 0.3 km s-1 and standard deviation 1.0 in .
A rejection procedure using thus
appears to provide excellent protection against outliers and works
very well also in more general cases of non-Gaussian velocity
dispersions. The accuracy of the resulting centroid velocity is only
marginally degraded compared with the nominal case of a Gaussian
velocity dispersion.
4.3. Unbiased estimation of the velocity dispersion
It was noted in Sect. 4.1.1 that the internal velocity
dispersion is strongly
underestimated in the ML solutions based on simulated observations. We
now turn to investigating this effect more closely, and to finding a
remedy for it.
The bias in the ML estimate of
apparent in Table 1 is probably related to the circumstance that
we assume an isotropic three-dimensional dispersion of the peculiar
velocities , while in practice only
one component can be measured astrometrically, viz.
perpendicular to the plane
containing the line of sight and the centroid velocity vector. The
radial component of the peculiar
velocities is obviously not determined at all, since that would
require spectroscopic velocities. The remaining tangential component
, parallel to the plane containing
the line of sight and the centroid velocity, is largely absorbed by
the individual distance estimates (provided that
, as is the case for the Hyades
cluster). Thus the measured variance in one direction
( ) is effectively `spread out' in
all three directions, causing the estimated
to come out much too small. The
effect is compounded by the observational errors in the proper
motions, which are implicitly taken into account in the ML estimation
by reducing even further. This
explains, at least qualitatively, why we obtain
in simulations where the true
dispersion is less than a certain value
( km s-1 for
the Hyades case).
It should be remarked here that the existence of this bias does not
imply that the present formulation or implementation of the ML method
is invalid. While the ML method is known to perform well in
many practical situations, there is no guarantee that it provides
unbiased estimates. In the present case the bias seems to be the
consequence of an intrinsic anisotropy of the astrometric observations
with respect to the mathematical cluster model.
The referee has drawn our attention to a practical way in which
this difficulty could be avoided, using a variant of the procedure
described by Narayanan & Gould (1999a). Instead of forcing a
solution with the - physically well motivated - isotropic dispersion,
let us assume a triaxial velocity ellipsoid with dispersions
,
,
along the previously defined axes. Applying the ML estimation to this
model will of course give much too small a value for
and an undetermined
. However,
may be obtained without bias, and
that value could then be adopted as the actual dispersion in all three
axes. (Narayanan & Gould also use spectroscopic radial velocities
and photometric distances, and so are able to obtain information on
all three components; they then impose an isotropic dispersion using
the best combined estimate.)
We have not adopted that method, mainly because there is a
mathematical inconsistency involved in forcing
and
to equal the independently
estimated . However, along a similar
line of thought we found another way to deal with the problem. From
the proper-motion residuals of the ML solution we compute estimates of
the peculiar velocity components
and of their observational uncertainties
. A posteriori analysis of
these data provides an estimate of ,
and hence of under the hypothesis of
isotropic dispersion. Details of the procedure are given in
Appendix A.4. Table 2 shows that
calculated in this way is a
practically unbiased estimate of
even for very small dispersions. This value can then be used in Monte
Carlo simulations to derive the true uncertainties in all the
estimated parameters.
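The idea of separating intrinsic dispersion from observational noise in the residuals can be illustrated with a simple moment estimator: subtract the mean squared observational uncertainty from the mean squared residual velocity. This is a generic sketch under stated assumptions (zero-mean, independent Gaussian errors) and is not necessarily the procedure of Appendix A.4, which is not reproduced here. The function and variable names are hypothetical.

```python
import math
import random


def dispersion_from_residuals(v_perp, eps):
    """Moment estimate of the intrinsic dispersion in one component.

    v_perp -- estimated peculiar velocity components (km/s)
    eps    -- their observational standard uncertainties (km/s)
    """
    n = len(v_perp)
    mean_sq_velocity = sum(v * v for v in v_perp) / n
    mean_sq_error = sum(e * e for e in eps) / n
    # Clamp at zero: noise can make the difference slightly negative
    return math.sqrt(max(mean_sq_velocity - mean_sq_error, 0.0))


# Toy check with simulated data (not values from the paper)
rng = random.Random(4)
true_sigma, obs_error = 0.30, 0.50
eps = [obs_error] * 5000
v_perp = [rng.gauss(0.0, true_sigma) + rng.gauss(0.0, e) for e in eps]
print("estimated dispersion:", dispersion_from_residuals(v_perp, eps))
```

Under the hypothesis of isotropic dispersion, the one-component estimate would then be adopted as the dispersion along each axis.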
![[TABLE]](img226.gif)
Table 2. Results of Monte Carlo simulations corresponding to case a in Table 1 (the Hyades sample Hy0 with 197 stars), but for different values of the assumed velocity dispersion and including results from the analysis of proper motion residuals. assumed (`true') velocity dispersion; velocity dispersion obtained by the standard ML estimation method of Sect. 3; velocity dispersion normal to the cluster motion, calculated according to Sect. 4.3 and Appendix A.4. The table gives mean values and rms scatter S from simulations with 200 experiments each. All values are in km s-1.
© European Southern Observatory (ESO) 2000
Online publication: April 17, 2000