Astron. Astrophys. 356, 1119-1135 (2000)

## 3. Mathematical formulation

### 3.1. The precise definition of `radial velocity'

When aiming at sub-km s$^{-1}$ accuracy levels it becomes necessary to consider the exact meaning of the term `radial velocity'. Spectroscopically, the measured quantity is the Doppler shift $z = (\lambda_{\rm obs} - \lambda_0)/\lambda_0$, where $\lambda_{\rm obs}$ is the observed wavelength and $\lambda_0$ the rest-frame wavelength. We assume that this shift has been corrected for local effects such as the observer's motion relative to the Solar System Barycentre. Even then, the quantity z depends not only on the radial component of the star's velocity, but also on the transverse Doppler shift, gravitational redshift, gas motions in the stellar atmosphere, conventions concerning the adopted reference frames, and possibly other effects unknown to the observer. Since the precise interpretation of z in terms of stellar motion is thus model-dependent, we have proposed (Lindegren et al. 1999) that the term `radial-velocity measure' be used for the quantity cz, and that this (well-defined) quantity should be regarded as the proper result of a spectroscopic measurement.

The above approach to the interpretation of lineshift measurements is conceptually rather different from the traditional way in which radial velocities are determined. The aim of the latter is (usually) to determine in some sense the `true' radial components of the space motions of the stars, by removing all other sources of spectroscopic shifts. This can be done, at least for solar-type stars and up to a certain degree, through comparison with the solar spectrum, directly or indirectly via minor planets. The pioneering efforts by Griffin et al. (1988) and Gunn et al. (1988) to obtain accurate radial velocities for stars in the Hyades cluster may be cited as an example of this classical approach. We think, however, that current and future spectroscopic measurements of much higher accuracy will require a more stringent definition, in which the observable quantity (represented by the radial-velocity measure cz) is clearly separated from its physical interpretation.

The determination of astrometric radial velocities is not affected by factors such as the transverse Doppler effect and gravitational redshift. Nevertheless we need to state explicitly what we mean by radial velocity, in order to compare our astrometric results with spectroscopic determinations. The main point to consider is the light-time effect due to the finite speed of light. As the star moves through space, the time interval from light emission at the object ($t_*$) to the arrival at the solar system barycentre ($t_{\rm B}$) changes, and the corresponding stretching or compression of the time scale naturally affects the observations. A rigorous treatment of this problem is beyond the scope of the present paper. The residual effect is however very small provided a single time scale (such as $t_{\rm B}$) is consistently used to describe the phenomena. Since proper motions are defined as the time derivatives of direction with respect to $t_{\rm B}$ ($\boldsymbol{\mu} = {\rm d}\mathbf{r}/{\rm d}t_{\rm B}$, where $\mathbf{r}$ is the barycentric direction to the star) we adopt the convention that the (astrometric) radial velocity is defined as $v_r = {\rm d}b/{\rm d}t_{\rm B}$, where $b$ is the barycentric distance of the star. In the absence of relativistic effects this is related to the Doppler shift by $v_r = cz/(1+z)$.
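To give a feeling for the size of the light-time effect, the following minimal sketch compares the radial-velocity measure $cz$ with the rate of change of distance per unit barycentric arrival time. It assumes the purely classical relation $t_{\rm B} = t_* + b/c$, so that $1 + z = {\rm d}t_{\rm B}/{\rm d}t_*$; the function name `vr_from_doppler` and the example value of 40 km s$^{-1}$ are illustrative choices, not taken from the paper.

```python
C_KM_S = 299792.458  # speed of light in km/s

def vr_from_doppler(z: float) -> float:
    """Classical db/dt_B for a Doppler shift z (assumption: t_B = t_* + b/c).

    Then db/dt_* = c*z and dt_B/dt_* = 1 + z, hence db/dt_B = c*z / (1 + z).
    """
    return C_KM_S * z / (1.0 + z)

# Hypothetical star whose radial-velocity measure is cz = 40 km/s
z = 40.0 / C_KM_S
cz = C_KM_S * z
vr = vr_from_doppler(z)
print(f"cz = {cz:.5f} km/s, db/dt_B = {vr:.5f} km/s, "
      f"difference = {(cz - vr) * 1e3:.2f} m/s")
```

Under these assumptions the two quantities differ by a few m s$^{-1}$ at this velocity, which is why the choice of time scale matters at the accuracy levels discussed here.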

### 3.2. The maximum-likelihood method

The maximum-likelihood (ML) method is a well-known technique for parameter estimation described in most textbooks on probability theory and statistics (Kendall & Stuart 1979; Casella & Berger 1990). The following brief introduction provides some of the mathematical framework, notations and terminology required for subsequent sections.

Application of the ML estimation method requires that the observed quantities (observables; in our case the astrometric data) are modelled as random variables whose probability density function (pdf) depends on a finite set of model parameters. The model describes both the physical object itself, including for instance random motions in the cluster, and the process of observation, i.e. measurement noise. The complete set of observables in a given estimation problem may be represented by the multidimensional variable (vector) $\mathbf{a}$. Similarly, the set of model parameters makes up the vector $\boldsymbol{\theta}$. Mathematically speaking, the model is fully specified by the function $p(\mathbf{a}|\boldsymbol{\theta})$, which is the pdf of the random variable $\mathbf{a}$ for given model parameters $\boldsymbol{\theta}$.

In the following we use diacritic marks to distinguish between different realisations (or versions) of the same random variable. For the generic variable x we use to denote the true value, the observed (or simulated) value, and the estimated value. A summary of notations is given in Appendix B.

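The observations provide a unique realisation $\tilde{\mathbf{a}}$ of the random variable $\mathbf{a}$. The problem is to find the `best' estimate of the parameter vector $\boldsymbol{\theta}$ consistent with the observed data. We use the principle of maximum likelihood to obtain this estimate. The likelihood function is defined as $\mathcal{L}(\boldsymbol{\theta}) = p(\tilde{\mathbf{a}}|\boldsymbol{\theta})$ and the ML estimate $\hat{\boldsymbol{\theta}}$ is the set of parameters maximising the likelihood or, equivalently, the log-likelihood function $L(\boldsymbol{\theta}) = \ln \mathcal{L}(\boldsymbol{\theta})$.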

The curvature of the log-likelihood function $L(\boldsymbol{\theta})$ in the vicinity of its maximum is a measure of the sharpness (precision) of the ML estimate. Statistical theory provides an estimate of the covariance of the estimate $\hat{\boldsymbol{\theta}}$ in the form of a lower bound, known as the Cramér-Rao inequality (Kendall & Stuart 1979). Subject to regularity conditions this bound can be written, for the vector-valued parameter $\boldsymbol{\theta}$,

$$\mathrm{Cov}(\hat{\boldsymbol{\theta}}) \ge \left( \mathrm{E}\left[ -\frac{\partial^2 L}{\partial\boldsymbol{\theta}\,\partial\boldsymbol{\theta}'} \right] \right)^{-1} \qquad (1)$$

(Silvey 1970). Here E is the statistical expectation operator and the prime denotes matrix transposition. Although Eq. (1) formally only provides a lower bound to the covariance, it is in practice often quite accurate. However, it is recommended that its validity always be checked by means of Monte Carlo simulations (Sect. 4).
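The kind of check recommended above can be illustrated on a deliberately simple toy problem, which is not the moving-cluster model itself: estimating the mean of a Gaussian with known standard deviation from m samples. In that case the ML estimate is the sample mean and the bound of Eq. (1) reduces to $\sigma^2/m$. All function names and numerical values below are assumptions made for the illustration.

```python
import random
import statistics

def ml_mean(sample):
    """ML estimate of the mean of a Gaussian with known sigma: the sample mean."""
    return sum(sample) / len(sample)

def cramer_rao_variance(sigma, m):
    """Inverse Fisher information for the mean: -E[d2L/dmu2] = m / sigma**2."""
    return sigma ** 2 / m

def monte_carlo_variance(mu, sigma, m, n_experiments, seed=1):
    """Empirical variance of the ML estimate over repeated simulated experiments."""
    rng = random.Random(seed)
    estimates = [ml_mean([rng.gauss(mu, sigma) for _ in range(m)])
                 for _ in range(n_experiments)]
    return statistics.pvariance(estimates)

sigma, m = 2.0, 25
bound = cramer_rao_variance(sigma, m)
scatter = monte_carlo_variance(0.0, sigma, m, n_experiments=4000)
print(f"Cramer-Rao bound: {bound:.4f}, Monte Carlo variance: {scatter:.4f}")
```

For this toy case the bound is attained exactly, so the two numbers should agree to within the sampling noise of the simulation; for non-linear models such as the one treated here, the comparison indicates how tight the bound is in practice.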

A complete formulation of the moving-cluster problem thus requires specification of the model parameters ($\boldsymbol{\theta}$), the observables ($\mathbf{a}$), and the probability density function $p(\mathbf{a}|\boldsymbol{\theta})$. Additionally, the mathematical formulation includes auxiliary data which are regarded as fixed, i.e. known a priori, or with sufficient accuracy that their uncertainties need not be taken into account in the estimation process. These include in our case the positions of the stars, the standard errors of the observed proper motions and parallaxes, and the correlation coefficients.

### 3.3. Cluster model parameters

The stars in a cluster are distinguished by the subscript i running from 1 to n, the number of stars considered. The kinematic state of the cluster is completely specified by the position $\mathbf{b}_i$ and velocity $\mathbf{v}_i$ of each star relative to the solar system barycentre. The cluster model provides a parametrised statistical description of $\mathbf{b}_i$ and $\mathbf{v}_i$.

The three-dimensional position (in pc) of a star can be written $\mathbf{b}_i = 1000\,\mathbf{r}_i/\pi_i$, where $\mathbf{r}_i$ is the direction (unit vector) towards the star and $\pi_i$ the parallax in mas. We regard $\mathbf{r}_i$ as error-free, i.e. belonging to the category of auxiliary data, and $\pi_i$ as a parameter of the model. The n parallaxes of the cluster constitute a vector $\boldsymbol{\pi}$ which is part of the general parameter vector $\boldsymbol{\theta}$.

Let $\mathbf{v}_0$ be the centroid velocity, i.e. the mean velocity of the cluster member stars. The equatorial Cartesian components of $\mathbf{v}_0$ constitute three more elements of the parameter vector $\boldsymbol{\theta}$.

The astrometric measurements are accurate enough to detect the deviations of the individual velocities from the centroid velocity, i.e. the tangential components of the peculiar velocities $\mathbf{v}_i - \mathbf{v}_0$. Thus, a statistical description of the peculiar velocities is needed, in the form of a parametrised pdf for $\mathbf{v}_i$. We assume that the peculiar motions are Maxwellian (i.e., Gaussian in the rectangular components), and thus fully described by a dispersion tensor, the covariance of $\mathbf{v}_i$. We take this to be isotropic and independent of stellar mass and position in the cluster:

$$\mathrm{Cov}(\mathbf{v}_i) = \sigma_v^2\,\mathbf{I} \qquad (2)$$

The internal velocity dispersion $\sigma_v$ is thus another element of the parameter vector $\boldsymbol{\theta}$.

However, we shall not a priori exclude the possibility of systematic velocity patterns in the cluster such as rotation and (non-isotropic) dilation. To a first approximation such patterns may be described as a linear velocity field, represented by the tensor $\mathbf{T}$ introduced in Appendix A of Paper I. The expected space velocity of a star at position $\mathbf{b}_i$ is then

$$\mathbf{u}_i = \mathbf{v}_0 + \mathbf{T}\,(\mathbf{b}_i - \mathbf{b}_0) \qquad (3)$$

where $\mathbf{v}_0$ is the centroid velocity and $\mathbf{b}_0$ is the centroid position. In equatorial coordinates the components of $\mathbf{T}$ are the nine partial derivatives $\partial u_j/\partial b_k$ for $j, k = x, y, z$. However, as was shown in Paper I (Appendix A), only eight independent components of $\mathbf{T}$ can in principle be determined by the present method. The ninth component $\kappa = (T_{xx} + T_{yy} + T_{zz})/3$, representing an isotropic expansion or contraction of the cluster, cannot be separated from a change in $\mathbf{v}_0$ based on only astrometric data. To avoid a singularity in the ML equations it is therefore necessary to apply the constraint $\kappa = 0$ on the form of $\mathbf{T}$, or more generally to assume a fixed value for $\kappa$ (e.g. the inverse age for an expanding association). Some other linear combinations of the tensor components have a simple physical interpretation. In particular, the anti-symmetric part of the tensor represents a rigid-body rotation about the centroid. We therefore use the following eight linearly independent components of $\mathbf{T}$ to represent the internal systematic velocity field:

The components of are uniquely determined by Eq. (4) and the assumed expansion rate , since . is the angular velocity of the cluster, while represents (non-isotropic) dilation. The components of and are additional elements of the general parameter vector .

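The complete parameter vector therefore comprises the n parallaxes, the three components of the centroid velocity, the internal velocity dispersion, and the eight components of the internal systematic velocity field. The total number of parameters is $n + 12$. Although this is our most general cluster model, we shall normally assume that the internal systematic velocities are negligible, in which case only the $n + 4$ parameters consisting of the parallaxes $\boldsymbol{\pi}$, the centroid velocity $\mathbf{v}_0$ and the velocity dispersion $\sigma_v$ are estimated. We refer to this restricted parameter set as the basic cluster model.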

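In summary, the pdf for the space velocity $\mathbf{v}_i$ of star i is assumed to be Gaussian with mean value $\mathbf{u}_i$ [Eq. (3)] and covariance $\sigma_v^2\,\mathbf{I}$ [Eq. (2)]. The explicit form of the pdf is

$$p_v(\mathbf{v}_i|\boldsymbol{\theta}) = (2\pi\sigma_v^2)^{-3/2} \exp\left[ -\frac{|\mathbf{v}_i - \mathbf{u}_i|^2}{2\sigma_v^2} \right] \qquad (5)$$

where $\pi$ in the normalisation factor is the mathematical constant, not a parallax.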

### 3.4. Observation model

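For each star the observables are the trigonometric parallax ($\pi_i$) and the proper motion components in right ascension ($\mu_{\alpha i}$) and declination ($\mu_{\delta i}$). These are collected in arrays

$$\mathbf{a}_i = (\pi_i,\ \mu_{\alpha i},\ \mu_{\delta i})' \qquad (6)$$

The actually observed values are in the arrays $\tilde{\mathbf{a}}_i$. It is assumed that the observations are unbiased,

$$\mathrm{E}(\tilde{\mathbf{a}}_i) = \mathbf{a}_i \qquad (7)$$

with known covariance matrices

$$\mathbf{C}_i = \mathrm{E}\left[ (\tilde{\mathbf{a}}_i - \mathbf{a}_i)(\tilde{\mathbf{a}}_i - \mathbf{a}_i)' \right] \qquad (8)$$

The observational errors for the different stars, on the other hand, are assumed to be uncorrelated:

$$\mathrm{E}\left[ (\tilde{\mathbf{a}}_i - \mathbf{a}_i)(\tilde{\mathbf{a}}_j - \mathbf{a}_j)' \right] = \mathbf{0} \quad (i \ne j) \qquad (9)$$

[This assumption does not hold strictly e.g. for Hipparcos data. We discuss this further in Sect. 5.4.] Gaussian error distributions are assumed. The pdf for the observables, conditional upon their true values, is then

$$p_a(\tilde{\mathbf{a}}_i|\mathbf{a}_i) = (2\pi)^{-3/2}\,|\mathbf{C}_i|^{-1/2} \exp\left[ -\tfrac{1}{2} (\tilde{\mathbf{a}}_i - \mathbf{a}_i)'\,\mathbf{C}_i^{-1}\,(\tilde{\mathbf{a}}_i - \mathbf{a}_i) \right] \qquad (10)$$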

Astrometry also provides the barycentric right ascension ($\alpha_i$) and declination ($\delta_i$) of each star for a certain epoch. For the present purpose the positional data can be regarded as error-free and defining the unit vector $\mathbf{r}_i$ from the solar system barycentre towards the star. Two more auxiliary unit vectors, tangent to the unit sphere at $\mathbf{r}_i$, are needed: $\mathbf{p}_i$ in the direction of increasing right ascension (local `East'), and $\mathbf{q}_i$ in the direction of increasing declination (local `North'). $\mathbf{p}_i$, $\mathbf{q}_i$ and $\mathbf{r}_i$ form a right-handed orthogonal coordinate frame known as the `normal triad' at $\mathbf{r}_i$ with respect to the equatorial frame (Murray 1983). The explicit formulae for these vectors are given in Eq. (A.2).
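Eq. (A.2) is not reproduced in this section. As an illustration, the sketch below uses the standard textbook expressions for the normal triad in equatorial Cartesian coordinates, which we assume to be equivalent to those of Appendix A. The function name `normal_triad` and the example position are hypothetical.

```python
import math

def normal_triad(alpha_deg, delta_deg):
    """Return the unit vectors (p, q, r) for right ascension and declination in degrees.

    p points towards increasing right ascension (local East),
    q towards increasing declination (local North),
    r from the barycentre towards the star.
    """
    a = math.radians(alpha_deg)
    d = math.radians(delta_deg)
    p = (-math.sin(a), math.cos(a), 0.0)
    q = (-math.sin(d) * math.cos(a), -math.sin(d) * math.sin(a), math.cos(d))
    r = (math.cos(d) * math.cos(a), math.cos(d) * math.sin(a), math.sin(d))
    return p, q, r

def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

def cross(x, y):
    return (x[1] * y[2] - x[2] * y[1],
            x[2] * y[0] - x[0] * y[2],
            x[0] * y[1] - x[1] * y[0])

# Illustrative position (roughly towards the Hyades; values are assumptions)
p, q, r = normal_triad(67.0, 16.0)
print("p.q =", dot(p, q), " p.r =", dot(p, r), " q.r =", dot(q, r))
print("p x q - r =", tuple(c - ri for c, ri in zip(cross(p, q), r)))
```

The printed dot products and the difference $\mathbf{p} \times \mathbf{q} - \mathbf{r}$ should vanish to rounding precision, confirming that the triad is orthonormal and right-handed.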

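Given the position and velocity of a star, the `true' observables are calculated as

$$\mathbf{a}_i = \begin{pmatrix} \pi_i \\ \mathbf{p}_i'\mathbf{v}_i\,\pi_i/A \\ \mathbf{q}_i'\mathbf{v}_i\,\pi_i/A \end{pmatrix} \qquad (11)$$

where $A = 4.74047$ km yr s$^{-1}$ is the astronomical unit, $\pi_i$ is the parallax, $\mathbf{v}_i$ the space velocity, and $\mathbf{p}_i$, $\mathbf{q}_i$ are the unit vectors towards local East and North. If there were no velocity dispersion ($\sigma_v = 0$), then the expected velocity $\mathbf{u}_i$ from Eq. (3) could be substituted for $\mathbf{v}_i$ in Eq. (11) and the pdf for the observables in Eq. (14) could immediately be written conditional upon the model parameters, as required for the ML estimation.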
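The mapping from space velocity to proper motion can be sketched numerically as follows. The value 4.74047 km yr s$^{-1}$ is the standard conversion for one astronomical unit per year; the function name, the star's direction (right ascension and declination both zero, so that East and North coincide with the y and z axes) and the velocity are assumptions chosen to keep the arithmetic transparent.

```python
A_KM_YR_S = 4.74047  # astronomical unit expressed in km yr s^-1

def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

def true_observables(parallax_mas, v_kms, p, q):
    """Return (parallax, mu_alpha, mu_delta) with proper motions in mas/yr.

    parallax_mas : parallax in mas
    v_kms        : space velocity (vx, vy, vz) in km/s, equatorial axes
    p, q         : unit vectors towards local East and North
    """
    scale = parallax_mas / A_KM_YR_S
    return parallax_mas, dot(p, v_kms) * scale, dot(q, v_kms) * scale

# Star at alpha = delta = 0: r = (1, 0, 0), p = (0, 1, 0), q = (0, 0, 1)
p = (0.0, 1.0, 0.0)
q = (0.0, 0.0, 1.0)
plx, mu_alpha, mu_delta = true_observables(21.5, (40.0, 20.0, 5.0), p, q)
print(f"parallax = {plx} mas, mu_alpha = {mu_alpha:.3f} mas/yr, "
      f"mu_delta = {mu_delta:.3f} mas/yr")
```

Note that the radial component of the velocity (here 40 km s$^{-1}$ along $\mathbf{r}$) does not enter the observables of a single star at all; it becomes accessible only through the common motion of many stars in different directions.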

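In the presence of a non-zero velocity dispersion, however, $\mathbf{v}_i$ is itself a random variable with pdf $p_v$ according to Eq. (5). The joint pdf of the observables with the velocity is

$$p(\tilde{\mathbf{a}}_i, \mathbf{v}_i|\boldsymbol{\theta}) = p_a(\tilde{\mathbf{a}}_i|\mathbf{a}_i)\,p_v(\mathbf{v}_i|\boldsymbol{\theta}) \qquad (12)$$

with $\mathbf{a}_i$ given by Eq. (11), since the observational errors are assumed to be independent of the random velocities. The pdf of the observables is then obtained as the marginal density

$$p(\tilde{\mathbf{a}}_i|\boldsymbol{\theta}) = \int p(\tilde{\mathbf{a}}_i, \mathbf{v}_i|\boldsymbol{\theta})\,{\rm d}^3\mathbf{v}_i \qquad (13)$$

This integral can be evaluated analytically after insertion of $p_v$ and $p_a$ from Eqs. (5) and (10) in Eq. (12). Since the product of two normal probability density functions is normal, and the marginal density of a normal pdf is also normal, it follows that p is normal and can be written

$$p(\tilde{\mathbf{a}}_i|\boldsymbol{\theta}) = (2\pi)^{-3/2}\,|\mathbf{D}_i|^{-1/2} \exp\left[ -\tfrac{1}{2} (\tilde{\mathbf{a}}_i - \mathbf{c}_i)'\,\mathbf{D}_i^{-1}\,(\tilde{\mathbf{a}}_i - \mathbf{c}_i) \right] \qquad (14)$$

We find that

$$\mathbf{c}_i = \begin{pmatrix} \pi_i \\ \mathbf{p}_i'\mathbf{u}_i\,\pi_i/A \\ \mathbf{q}_i'\mathbf{u}_i\,\pi_i/A \end{pmatrix} \qquad (15)$$

and, using the isotropic dispersion tensor from Eq. (2),

$$\mathbf{D}_i = \mathbf{C}_i + \left( \frac{\pi_i \sigma_v}{A} \right)^2 \begin{pmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \qquad (16)$$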

### 3.5. The likelihood function

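Since the observational errors and random velocities of the individual stars are assumed to be statistically independent, the pdf of the whole set of observables equals the product of the individual pdfs. The log-likelihood function is, therefore,

$$L(\boldsymbol{\theta}) = \sum_{i=1}^{n} \ln p(\tilde{\mathbf{a}}_i|\boldsymbol{\theta}) = -\frac{1}{2} \sum_{i=1}^{n} \left[ 3\ln 2\pi + \ln|\mathbf{D}_i| + (\tilde{\mathbf{a}}_i - \mathbf{c}_i)'\,\mathbf{D}_i^{-1}\,(\tilde{\mathbf{a}}_i - \mathbf{c}_i) \right] \qquad (17)$$

where the mean $\mathbf{c}_i$ and covariance $\mathbf{D}_i$ of the observables depend on $\boldsymbol{\theta}$ through Eqs. (15) and (16). The ML estimate $\hat{\boldsymbol{\theta}}$ is obtained by finding the maximum of $L(\boldsymbol{\theta})$, or, equivalently, the minimum of

$$U(\boldsymbol{\theta}) = \sum_{i=1}^{n} \left[ \ln|\mathbf{D}_i| + g_i(\boldsymbol{\theta}) \right] \qquad (18)$$

where

$$g_i(\boldsymbol{\theta}) = (\tilde{\mathbf{a}}_i - \mathbf{c}_i)'\,\mathbf{D}_i^{-1}\,(\tilde{\mathbf{a}}_i - \mathbf{c}_i) \qquad (19)$$

The practical algorithm to find the maximum of $L(\boldsymbol{\theta})$ is discussed in Appendix A.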
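The quantities entering the minimisation can be sketched for the basic cluster model as follows. The forms used here are this example's reading of the Gaussian model described above and should be treated as assumptions: the expected observables are obtained by projecting the centroid velocity onto East and North, the covariance is the observational covariance plus $(\pi_i\sigma_v/A)^2$ on the two proper-motion diagonal elements, and the objective is the sum of $\ln|\mathbf{D}_i|$ and the quadratic form over all stars. All names and numerical values are hypothetical, and the actual minimisation algorithm of Appendix A is not reproduced.

```python
import math

A = 4.74047  # astronomical unit in km yr s^-1

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def solve3(m, b):
    """Solve m x = b by Cramer's rule (adequate for a well-conditioned 3x3)."""
    d = det3(m)
    x = []
    for k in range(3):
        mk = [[b[i] if j == k else m[i][j] for j in range(3)] for i in range(3)]
        x.append(det3(mk) / d)
    return x

def star_terms(obs, C, parallax, p, q, v0, sigma_v):
    """Return (g_i, ln|D_i|) for one star in the basic model (no velocity field)."""
    f = parallax / A
    c = [parallax, f * dot(p, v0), f * dot(q, v0)]   # expected observables
    D = [row[:] for row in C]                        # copy of observational covariance
    D[1][1] += (f * sigma_v) ** 2                    # dispersion adds to proper motions only
    D[2][2] += (f * sigma_v) ** 2
    resid = [obs[k] - c[k] for k in range(3)]
    g = dot(resid, solve3(D, resid))                 # resid' D^-1 resid
    return g, math.log(det3(D))

def objective(stars, v0, sigma_v):
    """Sum of ln|D_i| + g_i; each star is (obs, C, parallax, p, q)."""
    total = 0.0
    for obs, C, parallax, p, q in stars:
        g, lndet = star_terms(obs, C, parallax, p, q, v0, sigma_v)
        total += lndet + g
    return total

# One star at alpha = delta = 0 with unit observational errors
p, q = (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)
C = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
v0 = (40.0, 10.0, 0.0)
obs = (21.0, 10.0 * 20.0 / A + 1.0, 1.0)   # prediction offset by one sigma per component
g0, lndet0 = star_terms(obs, C, 20.0, p, q, v0, 0.0)
g1, lndet1 = star_terms(obs, C, 20.0, p, q, v0, 0.3)
print(g0, g1, objective([(obs, C, 20.0, p, q)], v0, 0.3))
```

The example shows the trade-off that the minimisation must balance: increasing the velocity dispersion lowers the quadratic term but raises the determinant term.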

Given the set of estimated parameters $\hat{\boldsymbol{\theta}}$ the astrometric radial velocity of star i is computed as the line-of-sight projection of the estimated (non-random) stellar space velocity $\hat{\mathbf{u}}_i$ in Eq. (3). For the basic cluster model (with $\mathbf{T} = \mathbf{0}$) this reduces to

$$\hat{v}_{ri} = \mathbf{r}_i'\,\hat{\mathbf{v}}_0 \qquad (20)$$

It should be noted that the error in this quantity is the sum of two statistically independent components: (1) the radial component of the estimation error in $\hat{\mathbf{v}}_0$, and (2) the radial component of the peculiar velocity of the star, $\mathbf{r}_i'(\mathbf{v}_i - \mathbf{v}_0)$; see Eq. (A.18).
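A minimal sketch of this projection and of the resulting error budget is given below. It assumes that the two error components add in variance (they are stated to be independent) and that the radial component of the peculiar velocity has variance $\sigma_v^2$, as follows from an isotropic dispersion. Eq. (A.18) itself is not reproduced here, and all names and numbers are hypothetical.

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def radial_velocity_estimate(r, v0_hat):
    """Line-of-sight projection of the estimated centroid velocity (km/s)."""
    return dot(r, v0_hat)

def radial_velocity_sigma(r, cov_v0, sigma_v):
    """Standard error combining both independent components.

    cov_v0  : 3x3 covariance of the estimated centroid velocity (nested lists)
    sigma_v : internal velocity dispersion in km/s (isotropic, by assumption)
    """
    var_centroid = dot(r, [dot(row, r) for row in cov_v0])   # r' Cov r
    return math.sqrt(var_centroid + sigma_v ** 2)

r = (1.0, 0.0, 0.0)
v0_hat = (40.0, 10.0, 0.0)
cov_v0 = [[0.04, 0.0, 0.0], [0.0, 0.09, 0.0], [0.0, 0.0, 0.09]]
print(radial_velocity_estimate(r, v0_hat), radial_velocity_sigma(r, cov_v0, 0.3))
```

In this example the velocity dispersion (0.3 km s$^{-1}$) dominates over the uncertainty of the centroid velocity along the line of sight (0.2 km s$^{-1}$), which illustrates why the dispersion limits the accuracy attainable for individual stars.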

### 3.7. Rejection of outliers

Our formulation of the cluster model does not take into account that the observational material may include field stars which do not share the common (mean) space velocity of the cluster. The ML method requires that the model provide a statistically correct description of the data. In particular, it must only be applied to the actual members of the cluster, or rather to members whose mean space motion during the observing period agrees with the model. In practice this rules out also a number of close binaries, even if they are members of the cluster, since the short-term motions of their photocentres may deviate significantly from the motion of their centres of mass. Because of the high frequency of duplicity and the wide distribution of separations and periods, the subset of astrometrically detectable binaries blends continuously with the non-perturbed, single member stars. The elimination of outliers, whether members or not, is therefore an equally important and delicate part of the application of the moving-cluster method.

Outliers can be detected by computing a suitable goodness-of-fit statistic for each star in the solution. The quantity $g_i(\hat{\boldsymbol{\theta}})$, where $g_i$ is defined by Eq. (19), is a quadratic measure of the distance between the observed and fitted vector $\mathbf{a}_i$, weighted by the inverse of the expected covariance of the difference. Therefore, $g_i$ can be used for detection of outliers. In order to define a suitable rejection criterion it is desirable to know, at least approximately, the distribution of $g_i$ in the nominal case when the data behave according to the model.

The quadratic form in Eq. (19) and the assumption of Gaussian errors suggest that $\sum_i g_i$ should approximately follow a chi-square distribution. There are $3n$ observables and $n + 4$ parameters in the basic cluster model, and consequently $2n - 4$ degrees of freedom, or $\simeq 2$ degrees of freedom for each $g_i$ (if n is not small). In simulations using Gaussian distributions for the peculiar velocities and observation errors we find that a scaled version of $g_i$ is very nearly distributed as $\chi^2_2$. That is,

where is a scaling factor to be determined by the simulations (Sect. 3.8). For a given level of significance the star should therefore be rejected if , where

In simulations of Hyades data (Sect. 4.1.3) we find , so that a 1 per cent significance level requires . As discussed in Sect. 4.2, it is possible to derive an optimal value for if the distribution of peculiar velocities can be properly modelled.
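The relation between significance level and rejection limit can be made concrete, because the chi-square distribution with two degrees of freedom has the closed-form survival function $\exp(-x/2)$. In the sketch below the parameter `scale` is a hypothetical stand-in for the scaling factor determined by simulation, defined here by the assumption that the statistic divided by `scale` follows the chi-square distribution; the value 1.0 in the example is a placeholder, not the value found for the Hyades.

```python
import math

def chi2_2_upper_quantile(alpha):
    """Value x with P(chi-square with 2 d.o.f. > x) = alpha, from exp(-x/2) = alpha."""
    return -2.0 * math.log(alpha)

def rejection_limit(alpha, scale):
    """Limit on the goodness-of-fit statistic, assuming statistic/scale ~ chi-square(2)."""
    return scale * chi2_2_upper_quantile(alpha)

for alpha in (0.05, 0.01, 0.001):
    print(f"alpha = {alpha}: unscaled limit = {rejection_limit(alpha, 1.0):.2f}")
```

With a scaling factor of unity, a 1 per cent significance level corresponds to a limit of about 9.2; the limit actually adopted depends on the scaling factor obtained from the simulations.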

A complication with this rejection procedure is that the goodness-of-fit statistics depend on the estimated velocity dispersion $\hat{\sigma}_v$ through Eq. (16). Eliminating outliers will however decrease the estimated velocity dispersion and consequently increase the $g_i$ values. This, in turn, will in general cause other stars to fall beyond the adopted acceptance limit. It is not obvious how to find the maximum subset of stars for which all $g_i \le g_{\rm lim}$, or if this subset is unique or even exists. Testing each of the $2^n$ possible subsets is obviously not a viable method for .

As a practical (if not necessarily optimal) solution we have adopted a sequential rejection procedure, in which the one star with the largest $g_i$ ($> g_{\rm lim}$) is removed from the sample. A new solution is then computed, including new $g_i$ values. The process is repeated, removing the star with the largest $g_i$ and computing a new solution, until all $g_i \le g_{\rm lim}$. In some cases it may happen that the solution becomes unstable before this criterion is satisfied. In those cases where this happens when $\hat{\sigma}_v$ has been reduced to practically zero, the number of model parameters could be reduced, e.g. by assuming $\sigma_v = 0$.
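The control flow of the sequential rejection can be sketched with a deliberately simple stand-in for the cluster solution: a one-dimensional sample whose `solution' is just its mean and variance, with the squared normalised deviation playing the role of the goodness-of-fit statistic. This toy model, the function names and the data are assumptions for illustration only. It nevertheless reproduces the behaviour described above, in that removing one outlier shrinks the estimated dispersion and thereby exposes the next.

```python
import statistics

def fit(values):
    """Toy 'solution': mean and (population) variance of the retained values."""
    mean = statistics.fmean(values)
    return mean, statistics.pvariance(values, mu=mean)

def sequential_rejection(values, g_lim):
    """Remove the worst-fitting value one at a time, re-fitting after each removal."""
    kept = list(values)
    rejected = []
    while len(kept) > 2:
        mean, var = fit(kept)
        if var == 0.0:              # degenerate solution: nothing left to test
            break
        g = [(x - mean) ** 2 / var for x in kept]
        worst = max(range(len(kept)), key=g.__getitem__)
        if g[worst] <= g_lim:       # all statistics within the limit: done
            break
        rejected.append(kept.pop(worst))
    return kept, rejected

data = [10.1, 9.8, 10.0, 10.3, 9.9, 10.2, 9.7, 10.0, 25.0, 10.1, 9.9, 17.0]
kept, rejected = sequential_rejection(data, g_lim=6.0)
print("rejected:", rejected)
print("retained:", len(kept), "values")
```

In this example the value 17.0 is flagged more strongly after 25.0 has been removed than before, because the variance of the retained sample has decreased in the meantime.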

### 3.8. Use of numerical simulations

Monte Carlo simulation of the ML estimation problem is essential for studying the efficiency and convergence of the adopted procedure, as well as the precision and possible bias of the resulting estimates. In a Monte Carlo simulation, a set of `true' parameters is assumed and from this, many realisations of hypothetical (`observed') data are generated according to the adopted model. Random observational errors and other variations, in our case due to the internal velocity dispersion, are simulated by means of a random number generator. Applying the estimation algorithm to each hypothetical data set results in an estimated parameter vector $\hat{\boldsymbol{\theta}}$. From the assembly of these vectors one can determine various statistics, in particular the bias and rms scatter of the individual parameter estimates.

Synthetic cluster data are generated according to the following general recipe. First, the overall characteristics of the cluster and observations are specified: the number of stars n and their positions relative to the Sun, the centroid velocity $\mathbf{v}_0$, descriptions of systematic and random internal motions, and the observational accuracies. These data define the `true' parameter vector $\boldsymbol{\theta}$ as well as the (error-free) auxiliary quantities $\mathbf{r}_i$ and $\mathbf{C}_i$. Next, the true velocities of the individual stars are computed as $\mathbf{v}_i = \mathbf{u}_i + \boldsymbol{\eta}_i$, where $\mathbf{u}_i$ follows from Eq. (4) and the Cartesian components of the peculiar velocity $\boldsymbol{\eta}_i$ are drawn from independent Gaussian distributions with mean value zero and standard deviation $\sigma_v$. The true observables $\mathbf{a}_i$ are then computed from Eq. (11). Finally, observation noise with covariance $\mathbf{C}_i$ is added to give the hypothetical data set $\tilde{\mathbf{a}}$.
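The recipe can be sketched as follows for the simplest case. The simplifications are assumptions of this example and not of the paper: no systematic velocity field (so the expected velocity of every star equals the centroid velocity), uncorrelated observational errors with the same standard deviations for all stars, and standard expressions for the East and North unit vectors. All names and numerical values are hypothetical.

```python
import math
import random

A = 4.74047  # astronomical unit in km yr s^-1

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def east_north(alpha, delta):
    """Unit vectors towards local East and North (angles in radians)."""
    p = (-math.sin(alpha), math.cos(alpha), 0.0)
    q = (-math.sin(delta) * math.cos(alpha),
         -math.sin(delta) * math.sin(alpha),
         math.cos(delta))
    return p, q

def simulate_star(alpha, delta, parallax, v0, sigma_v, sigma_obs, rng):
    """One realisation of the observed (parallax, mu_alpha, mu_delta) of a star."""
    p, q = east_north(alpha, delta)
    # True velocity: centroid velocity plus an isotropic Gaussian peculiar velocity
    v = [v0[k] + rng.gauss(0.0, sigma_v) for k in range(3)]
    # True observables: parallax in mas, proper motions in mas/yr
    true_obs = (parallax, dot(p, v) * parallax / A, dot(q, v) * parallax / A)
    # Add (here uncorrelated) Gaussian observation noise
    return tuple(t + rng.gauss(0.0, s) for t, s in zip(true_obs, sigma_obs))

def simulate_cluster(stars, v0, sigma_v, sigma_obs, seed=0):
    """stars: list of (alpha, delta, parallax); returns one simulated data set."""
    rng = random.Random(seed)
    return [simulate_star(a, d, plx, v0, sigma_v, sigma_obs, rng)
            for a, d, plx in stars]

stars = [(0.0, 0.0, 20.0), (0.1, 0.05, 22.0), (-0.1, -0.05, 21.0)]
for row in simulate_cluster(stars, (40.0, 10.0, 5.0), 0.3, (1.0, 1.0, 1.0), seed=42):
    print(tuple(round(x, 3) for x in row))
```

Repeating the call with different seeds yields the independent realisations needed for the experiments described below; a correlated error model would replace the three independent noise draws by a draw from the full covariance matrix.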

We use the following terminology: an experiment is a single realisation of `observed' data, plus the subsequent estimation of model parameters. A simulation is the assembly of N such experiments based on a fixed set of model parameters and auxiliary data, but with different realisations of the random variables. An important question is how big N needs to be. The precision by which the bias can be estimated is $S N^{-1/2}$, where S is the rms scatter of the estimated values. To ascertain the level of biases, and compute the corresponding corrections, it is usually sufficient to know them to within 10 per cent of the scatter, which requires $N \gtrsim 100$. However, one additional purpose of the simulations is to check the formal standard errors based on the Cramér-Rao bound, Eq. (1). A relative precision of the order of 1 per cent in the scatter is then desirable. For a normal distribution the relative precision of the scatter (sample standard deviation) is $(2N)^{-1/2}$. To reach a precision of 1 per cent, the simulations should consequently consist of $N \simeq 5000$ experiments each.
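The two sample-size requirements follow from simple arithmetic, spelled out in the sketch below. It relies on the standard large-sample results that the mean of N independent values has standard error $S/\sqrt{N}$ and that the sample standard deviation of a normal variate has relative standard error $1/\sqrt{2N}$; the function names are hypothetical.

```python
def n_for_bias(rel_precision):
    """N such that S / sqrt(N) equals rel_precision * S."""
    return 1.0 / rel_precision ** 2

def n_for_scatter(rel_precision):
    """N such that (2N)**-0.5 equals rel_precision (normal distribution)."""
    return 1.0 / (2.0 * rel_precision ** 2)

print("bias known to 10% of the scatter: N =", round(n_for_bias(0.10)))
print("scatter known to 1%:              N =", round(n_for_scatter(0.01)))
```

The second requirement is the more demanding one by a factor of 50, which is why verifying the formal standard errors, rather than estimating the biases, sets the size of the simulations.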

© European Southern Observatory (ESO) 2000

Online publication: April 17, 2000