In this section we briefly review the classical Richardson-Lucy algorithm and its features in order to provide a basic understanding of the multiple-data Richardson-Lucy algorithm.
2.1. Richardson-Lucy algorithm
The RL algorithm is designed to invert integral equations of the form

$\phi(x) = \int \psi(\xi)\, P(x|\xi)\, d\xi$,   (1)

where $\psi(\xi)$ is the function of interest, $\phi(x)$ is the function accessible through observation, and the integral kernel $P(x|\xi)$ reflects the measurement process. In general, $\psi$ and $\phi$ represent probability density functions, which implies that they and the kernel obey normalization and non-negativity constraints. $P(x|\xi)\,dx$ is the probability - presumed to be known - that $x$ will fall in the interval $(x, x+dx)$ when it is known that the true value of the argument of $\psi$ is $\xi$.
The iterative RL scheme then reads

$\phi^r(x) = \int \psi^r(\xi)\, P(x|\xi)\, d\xi$,   (2a)

$\psi^{r+1}(\xi) = \psi^r(\xi) \int \frac{\tilde{\phi}(x)}{\phi^r(x)}\, P(x|\xi)\, dx$,   (2b)

where the tilde denotes the observed distribution, while $\tilde{\phi}(x)$ is the measured quantity having sampling errors. A derivation of the iterative algorithm based on Bayes' theorem for conditional probabilities can be found in Lucy (1974).
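As a concrete illustration, the two coupled steps can be sketched for discretized data as follows; this is a minimal sketch, and the grids, the Gaussian kernel, and all function names are illustrative assumptions, not taken from this paper:

```python
import numpy as np

def richardson_lucy(phi_obs, P, psi0, n_iter=50):
    """Discrete Richardson-Lucy iteration after Lucy (1974).

    phi_obs : observed distribution (phi-tilde), shape (nx,)
    P       : kernel matrix P[x, xi], each column summing to 1
    psi0    : non-negative starting guess psi^0, shape (nxi,)
    """
    psi = psi0.copy()
    for _ in range(n_iter):
        phi = P @ psi                        # Eq. (2a): predicted observation phi^r
        ratio = np.where(phi > 0.0, phi_obs / phi, 0.0)
        psi = psi * (P.T @ ratio)            # Eq. (2b): multiplicative correction
    return psi
```

Because the update is multiplicative, the estimate stays non-negative, and summing the update over the bins shows that the total probability of the estimate equals that of the observed distribution after the first iteration.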
The iteratively constructed functions $\psi^r$ satisfy the non-negativity constraint: from Eq. (2b) it follows that $\psi^{r+1} \geq 0$ if $\psi^r \geq 0$. The normalization constraint is fulfilled as well, as one can prove by integrating Eq. (2b) with respect to $\xi$ and using the normalizations of the probabilities $P(x|\xi)$ and $\tilde{\phi}(x)$.
Ideally we would like an iterative algorithm to converge to the exact solution, and from Eq. 2 it can be seen that the above scheme converges if $\phi^r(x)$ is sufficiently close to $\tilde{\phi}(x)$ for all points $x$, except for those in a set of zero measure. However, this inherent convergence criterion is much too strong for practical purposes, where $\tilde{\phi}$ may be contaminated by non-negligible measurement errors. Looking again at the two coupled Eqs. 2a and 2b, we see that deviations of $\tilde{\phi}/\phi^r$ from unity on a length scale large compared to that of the kernel $P(x|\xi)$ are removed in essentially one iteration, whereas deviations on a small length scale are mostly averaged out when convolved with $P$, and result only in small corrections to $\psi^{r+1}$. Thus the algorithm has the property of first fitting the large-scale differences between the given initial guess and the true solution, while it fits the small-scale fluctuations only in later iteration steps. Under the reasonable assumption that the small-scale fluctuations are more likely to be caused by statistical errors in $\tilde{\phi}$, this behaviour of the RL algorithm is indeed highly desirable. It ensures that the RL procedure very quickly results in an approximate solution in which most of the significant information in the observed $\tilde{\phi}$ is already recovered. But one has to keep in mind that there is no obvious criterion for stopping after the large-scale fluctuations have been recovered; in general it is not easy to say when one starts to fit small-scale statistical fluctuations. Thus, for the algorithm to work, one has to know and to control the errors in the measured data very well.
2.2. Richardson-Lucy algorithm: the axisymmetric case
After deriving the basic RL-algorithm and discussing its features, we need to specify a cluster model in order to derive the kernel and formulate the RL-scheme for this specific model.
Consider a cluster of galaxies covered by a system of cartesian coordinates $(X, Y, Z)$. We are interested in recovering the distribution of some physical quantity $F$, which we assume to have axial symmetry with respect to the $Z$-axis of the cluster coordinate system:

$F = F(R, Z)$,  with $R = \sqrt{X^2 + Y^2}$.   (3)
Furthermore, we assume that the projection of $F$ is observed as some quantity $g(x, y)$, where the observer's coordinate system $(x, y, z)$ is inclined by an angle $i$, and in which $(x, y)$ spans the plane of the sky and $z$ is the LOS (see Fig. 1). The transformation between cluster coordinates and observer's coordinates is thus given by

$X = x$,   (4a)
$Y = y \cos i - z \sin i$,   (4b)
$Z = y \sin i + z \cos i$.   (4c)
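The transformation between the two coordinate systems is a rotation by the inclination $i$ about the shared $X = x$ axis. A sketch of this rotation and its inverse follows; the sign convention (which depends on the exact geometry of Fig. 1) and all names are assumptions:

```python
import numpy as np

def cluster_to_observer(X, Y, Z, i):
    """Rotate cluster coordinates (X, Y, Z) into observer coordinates (x, y, z).

    i is the inclination angle; z is the line of sight.
    Sign convention assumed: rotation about the common X = x axis.
    """
    x = X
    y = Y * np.cos(i) + Z * np.sin(i)
    z = -Y * np.sin(i) + Z * np.cos(i)
    return x, y, z

def observer_to_cluster(x, y, z, i):
    """Inverse rotation, recovering (X, Y, Z) from (x, y, z)."""
    X = x
    Y = y * np.cos(i) - z * np.sin(i)
    Z = y * np.sin(i) + z * np.cos(i)
    return X, Y, Z
```

Note that the inverse expresses the cluster coordinate $Z$ along the LOS as a function of $z$, which is the relation needed later to carry out LOS integrations.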
In analogy to Eq. (1), the observed $g(x, y)$ is an integral of $F(R, Z)$ over the probability kernel, where $F$ has to be normalized to unity. The derivation follows closely the derivation for elliptical galaxies given in Binney et al. (1990) and is reproduced in Appendix A. We can identify the kernel

$P(x, y \mid R, Z) = \frac{1}{2\pi \sqrt{R^2 - x^2}} \left[ \delta\!\left(y - Z \sin i - \sqrt{R^2 - x^2}\, \cos i\right) + \delta\!\left(y - Z \sin i + \sqrt{R^2 - x^2}\, \cos i\right) \right]$ for $|x| < R$, and $P = 0$ otherwise,   (7)
and find that it is properly normalized:

$\int\!\!\int P(x, y \mid R, Z)\, dx\, dy = 1$.   (8)
Having the explicit expression (7) for the kernel $P(x, y \mid R, Z)$, we could now start to apply the RL-scheme for recovering $F$ from an observed $g$ by means of Eqs. (2a, b). However, since the probability kernel contains a $\delta$-function, which is notoriously difficult to deal with in the context of discretized grid-data, we have chosen to reformulate the main Eqs. (2a, b) for this special axisymmetric case.
For the first integral (2a) of the iterative scheme the formulation (A.5) using the probability kernel is not necessary. Instead, this integral can be evaluated as a simple integral along the LOS,

$g^r(x, y) = \int_{-\infty}^{\infty} F^r(R, Z)\, dz$,  with $R = \sqrt{x^2 + (y \cos i - z \sin i)^2}$ and $Z = y \sin i + z \cos i$,   (9)
where the coordinate transformation (4c) is used to evaluate the integration along $z$. The integral (9) is analytically equivalent to the integral (A.5), but not numerically. The approach using the probability kernel as in the integral (2a), which was used in Binney et al. (1990), involves the $\delta$-functions. For our purposes, we found the approach given in (A.5) to be numerically extremely unstable, and therefore we employed the direct approach stated in (9).
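A direct evaluation of the LOS integral (9) on a grid can be sketched as follows; the trapezoidal quadrature, the truncation of the integration range, and all names are illustrative assumptions:

```python
import numpy as np

def project_los(F, x, y, i, z_grid):
    """Sketch of Eq. (9): g(x, y) as a line-of-sight integral of F(R, Z).

    F is a callable F(R, Z); x, y are sky coordinates, i the inclination,
    and z_grid a uniform discretization of the LOS coordinate z.
    """
    Y = y * np.cos(i) - z_grid * np.sin(i)   # cluster Y along the LOS, cf. (4b)
    Z = y * np.sin(i) + z_grid * np.cos(i)   # cluster Z along the LOS, cf. (4c)
    R = np.sqrt(x ** 2 + Y ** 2)
    vals = F(R, Z)
    dz = z_grid[1] - z_grid[0]
    # trapezoidal rule over the truncated LOS
    return dz * (0.5 * vals[0] + vals[1:-1].sum() + 0.5 * vals[-1])
```

For a spherically symmetric test profile the result must be independent of the inclination, which provides a convenient consistency check of the coordinate handling.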
The second step (2b) in the iterative RL-scheme reads in our case

$F^{r+1}(R, Z) = F^r(R, Z) \int\!\!\int \frac{\tilde{g}(x, y)}{g^r(x, y)}\, P(x, y \mid R, Z)\, dx\, dy$.   (10)
For this second integral the evaluation of the probability kernel cannot be avoided. However, it is possible to eliminate the $\delta$-function by again applying to the kernel the same rule as reproduced in the Appendix, and by subsequently integrating over $y$, as is again demonstrated in Appendix A. The calculation leads to our final result, namely a formulation for the second integral in the RL-scheme without $\delta$-functions:

$F^{r+1}(R, Z) = F^r(R, Z)\, \frac{1}{2\pi} \int_{-R}^{R} \left[ \frac{\tilde{g}(x, y_+)}{g^r(x, y_+)} + \frac{\tilde{g}(x, y_-)}{g^r(x, y_-)} \right] \frac{dx}{\sqrt{R^2 - x^2}}$,  with $y_\pm = Z \sin i \pm \cos i \sqrt{R^2 - x^2}$.   (11)
Please note that the integral (11) extends over a full ellipse on the sky. The difference between Eq. (11) and the Appendix of Binney et al. (1990) is that Eq. (11) uses the full information on the ellipse, while Binney et al. (1990) use only half of it. This formulation has the advantage that no assumptions about the data are made, i.e. the latter are not assumed to be symmetric about any of the projection axes.
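Numerically, the kernel-weighted average of the ratio over the full projected ellipse can be sketched by sampling the azimuth of the ring, which sidesteps both the $\delta$-functions and the integrable endpoint singularities of the $x$-integral; the parametrization, the uniform quadrature, and all names are illustrative assumptions:

```python
import numpy as np

def rl_correction(F_r, g_obs, g_r, R, Z, i, n_theta=360):
    """Sketch of a delta-function-free second RL step at one point (R, Z).

    The ring (R, Z) projects onto the ellipse
        x = R cos(t),  y = Z sin(i) + R cos(i) sin(t),  t in [0, 2*pi),
    covering both the upper and lower branch (the full ellipse), and the
    kernel-weighted integral reduces to a uniform average in the azimuth t.
    g_obs and g_r are callables g(x, y) on the sky.
    """
    t = np.linspace(0.0, 2.0 * np.pi, n_theta, endpoint=False)
    x = R * np.cos(t)
    y = Z * np.sin(i) + R * np.cos(i) * np.sin(t)
    return F_r * np.mean(g_obs(x, y) / g_r(x, y))
```

When the observed and the currently predicted projections agree on the ellipse, the correction factor is unity, so the current estimate is a fixed point of the iteration.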
© European Southern Observatory (ESO) 2000
Online publication: January 29, 2001