## 2. Bayesian statistics

In the Bayesian approach to statistics, probability is interpreted as a measure of credibility, rather than as the frequency of occurrence of the classical approach (see e.g. O'Hagan 1994). This permits one to assign a probability to the validity of a physical theory, in our case of a certain law of star formation.

## 2.1. Probability of a theory

Let us consider a set of mutually exclusive hypotheses $H_i$. Given data $D$, Bayes' theorem relates their posterior probabilities to their prior probabilities:
$$p(H_i \mid D) = \frac{p(D \mid H_i)\, p(H_i)}{\sum_j p(D \mid H_j)\, p(H_j)} . \quad (1)$$
The prior probabilities $p(H_i)$ represent the investigator's degree of belief or his knowledge from previous measurements. The likelihood $p(D \mid H_i)$ is a measure of how well the predictions of $H_i$ match the data.
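A toy numerical sketch of this updating rule may help; the hypothesis names, prior values, and likelihood values below are invented for illustration and are not taken from the text:

```python
# Bayes' theorem for two mutually exclusive hypotheses:
# posterior is proportional to prior times likelihood, normalised over all hypotheses.
priors = {"H1": 0.5, "H2": 0.5}       # equal prior degrees of belief (invented)
likelihoods = {"H1": 0.8, "H2": 0.2}  # how well each hypothesis predicts the datum (invented)

evidence = sum(priors[h] * likelihoods[h] for h in priors)
posteriors = {h: priors[h] * likelihoods[h] / evidence for h in priors}

# Sequential updating: for a second, independent measurement the old
# posterior serves as the new prior.
evidence_2 = sum(posteriors[h] * likelihoods[h] for h in posteriors)
posteriors_2 = {h: posteriors[h] * likelihoods[h] / evidence_2 for h in posteriors}

print(posteriors)     # H1 favoured 0.8 : 0.2 after one datum
print(posteriors_2)   # H1 favoured even more strongly after two
```

Note how each application of the theorem sharpens the credibility assignment while the probabilities always sum to unity.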
Thus, Eq. (1) describes how a measurement or observation improves our knowledge: it states that the posterior probability of a hypothesis is proportional to its prior probability multiplied by the likelihood of the data under that hypothesis.

In case a hypothesis contains free parameters $\vec{a}$, the above likelihood is replaced by the mean (marginal) likelihood
$$p(D \mid H_i) = \int p(D \mid \vec{a}, H_i)\, p(\vec{a} \mid H_i)\, \mathrm{d}\vec{a} ,$$
where $p(\vec{a} \mid H_i)$ is the (joint) prior probability density. Since this prior is normalised over the parameter space, the mean likelihood automatically penalises hypotheses whose parameter space is larger than the data require.

When further data become available, they can easily be incorporated by applying Bayes' theorem sequentially, using the old posterior probability distributions as the new priors. If one has only weak prior information about a parameter, there are rules for constructing the prior distribution (Jeffreys 1983). Such a rule ensures, for instance, that for parameter estimation the confidence region drawn from the posterior density distribution reproduces the 'classical' confidence region: if an experiment could be performed very often, the true value of the parameter would indeed lie within the interval as often as the chosen confidence level states. In practice, especially for good data, the actual prior may be quite unimportant: the truth then has a good chance against our prejudice (as long as an inappropriate prior does not prevent it altogether). Only if the data were so poor or scant that they do not add to our knowledge would the posterior merely reflect our prejudice.

Often one has 'nuisance parameters', i.e. formal parameters in a hypothesis whose true values are of no interest or relevance, such as offsets and factors of proportionality. The likelihood is then also integrated over the space of these nuisance parameters.

## 2.2. Simulations with artificial data

As a demonstration of how Occam's razor is at work in the outlined method, and to show some general features, we apply the method to a simpler problem: we wish to decide whether a measured profile of the optical surface brightness of a galaxy is better represented by an exponential law with a scale length (law 4), or by a generalised law that contains the exponent as an additional free parameter (law 5). The factors of proportionality are considered to be nuisance parameters. The 'observational' data are random realizations of the more general law (5), sampled at a set of radial points. The noise applied is Gaussian with zero mean and variance $\sigma^2$.
Then the likelihood is:
$$p(D \mid \vec{a}) = \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{\bigl(d_i - f(r_i; \vec{a})\bigr)^2}{2\sigma^2}\right) , \quad (7)$$
where the $d_i$ are the data values and $f(r_i; \vec{a})$ is the model profile at the sampled radii. For each of the two laws (4) and (5) we compute the mean likelihood $\bar{L}$ by integrating Eq. (7) over the parameter space. Following Jeffreys, the parameter priors are assumed to be uniform over the adopted parameter ranges.

In Fig. 1 we compare the values of $\bar{L}$ for the two fitting laws (4) and (5), as obtained from the individual data configurations for several values of the noise level, indicated by different symbols. For very low noise (the crowded cloud of open circles), the probability for the more complex law is much larger than that for the simple law; hence the true law underlying the data is unambiguously found. As the noise level increases, the cloud of points shows a larger scatter (triangles), and the mean probability for the complex law also drops. At an intermediate noise level the small dots are almost evenly clustered around the diagonal, i.e. on average both laws are equally probable. It then depends on the particular configuration of the observational data which fitting law happens to come out as the better one. At a higher noise level the cloud of crosses has become elongated along the diagonal, but most points are found in the regime with a higher probability for the simpler law. This is the action of Occam's razor. At still higher noise levels, the cloud of black squares is more strongly concentrated towards the diagonal, and in the limit of infinite noise both laws are equally probable: the likelihood is then constant over the whole parameter space.
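The experiment described above can be sketched numerically. In the sketch below, the radial grid, parameter ranges, grid resolutions, and noise levels are invented stand-ins rather than the values used in the text, and both the amplitude and the model parameters are marginalised on coarse uniform grids:

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative setup (all numbers invented, not from the paper).
r = np.linspace(0.5, 5.0, 30)            # radial sampling points
true = np.exp(-(r / 2.0) ** 1.5)         # data generated from the complex law (5)

r0_grid = np.linspace(0.5, 5.0, 25)      # scale-length prior range
n_grid = np.linspace(0.5, 3.0, 25)       # exponent prior range (law 5 only)
A_grid = np.linspace(0.2, 2.0, 25)       # amplitude: nuisance parameter

def log_mean_likelihood(data, sigma, free_exponent):
    """Log of the Gaussian likelihood averaged over a uniform parameter grid."""
    logs = []
    exps = n_grid if free_exponent else np.array([1.0])
    for r0 in r0_grid:
        for n in exps:
            model = np.exp(-(r / r0) ** n)
            for A in A_grid:
                chi2 = np.sum((data - A * model) ** 2) / (2.0 * sigma ** 2)
                logs.append(-chi2 - r.size * np.log(sigma))
    logs = np.asarray(logs)
    m = logs.max()                        # log-sum-exp trick for stability
    return m + np.log(np.mean(np.exp(logs - m)))

gaps = {}
for sigma in (0.01, 0.5):
    data = true + rng.normal(0.0, sigma, r.size)
    simple = log_mean_likelihood(data, sigma, free_exponent=False)
    complex_ = log_mean_likelihood(data, sigma, free_exponent=True)
    gaps[sigma] = complex_ - simple
    print(f"sigma={sigma}: log-mean-likelihood gap (complex - simple) = {gaps[sigma]:.1f}")
```

At the low noise level the complex (true) law wins by a very large margin; at the high noise level the gap collapses, reproducing the qualitative behaviour of the clouds of points in Fig. 1.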
Application of Bayes' theorem requires that both laws be mutually exclusive; one can then assign to the two laws probabilities which add up to unity. Points lying above the 99 per cent line mean that the probability of law (5) is higher than 0.99. That there are no points lying below the 1 per cent line is due to the small parameter space. Enlarging this range would make law (5) more unreasonable a priori, and all points would move down.

This example shows the workings of Occam's razor quite well. It also demonstrates that the scatter in $\bar{L}$ due to different realizations can be quite appreciable; in practice, one often has but one set of data.

## 2.3. Application to our problem

Now we apply this method to assess models of the SFR. A nested hierarchy of SFR laws is considered, with the most general one (Eq. 8) depending on the gas surface density and on the galactocentric distance.
Apart from the most general law (8), several simpler SFR prescriptions, as well as combinations of them, are considered, in order to cover SF hypotheses that have already been proposed. Since we shall assume that we have no preference for any one of these laws, we assign equal prior probabilities to all models, irrespective of the number of free parameters.

In what follows, the random noise due to observational errors, as well as any intrinsic scatter, is taken to be normally distributed with zero mean and variance $\sigma^2$. Hence, as suggested by published error bars, it is assumed that the same error level applies to all data points. Assuming that the data points are independent of each other, the likelihood is given by:
$$L(\vec{a}, \sigma) \propto \sigma^{-N} \exp\!\left(-\sum_{i=1}^{N} \frac{\bigl(y_i - m_i(\vec{a})\bigr)^2}{2\sigma^2}\right) , \quad (10)$$
where $y_i$ is the observed SFR indicator, i.e. the H$\alpha$ surface brightness, $m_i(\vec{a})$ the model prediction, and the summation is over all $N$ data points.

Unfortunately, information about the error bars of the surface brightnesses at the various wavelengths is not only rather scarce and difficult to compare, owing to the different angular resolutions used, but one must also keep in mind that there is possibly quite a large genuine scatter which is averaged out by the azimuthal integration. In principle, a thorough discussion of all the scatter involved should enable one to construct a more informative prior. Since the necessary information is not available, only weak prior information about the true error level is assumed. Thus, we treat $\sigma$ as an additional nuisance parameter and apply a prior uniform in $\log\sigma$; the use of a prior uniform in $\sigma$ itself does not affect the posterior inference markedly.

The factor of proportionality is a nuisance parameter as well; hence the likelihood is also integrated over it. The normalization integral for this prior diverges for an unbounded range, but since we consider the ratios of mean likelihoods, and since the same nuisance parameter with the same range is present in all the laws considered, the normalisation factors cancel. Integrating (10) afterwards with respect to $\sigma$ results in a likelihood that depends only on the model parameters. Finally, integration over the parameter space gives the mean likelihood $\bar{L}$, which is a measure of the posterior probability of a SFR law.
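The $\sigma$-integration mentioned above can be carried out in closed form. The following standard result (not spelled out in the text, but consistent with a Gaussian likelihood and a prior proportional to $1/\sigma$, i.e. uniform in $\log\sigma$) shows what it yields, with $S$ the sum of squared residuals and $N$ the number of data points:

```latex
\int_0^\infty \frac{\mathrm{d}\sigma}{\sigma}\,
  \frac{1}{(2\pi)^{N/2}\,\sigma^{N}}
  \exp\!\left(-\frac{S}{2\sigma^{2}}\right)
  = \frac{\Gamma(N/2)}{2\,(2\pi)^{N/2}}
    \left(\frac{2}{S}\right)^{N/2}
  \;\propto\; S^{-N/2},
\qquad
S = \sum_{i=1}^{N} \bigl(y_i - m_i(\vec{a})\bigr)^2 .
```

The marginalised likelihood thus depends on the model parameters only through the sum of squared residuals, raised to the power $-N/2$.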
Since $\bar{L}$ depends on the volume of the parameter space, the ranges of the parameters are chosen so as to contain all reasonable situations without being too restrictive. Within these boundaries the prior density distribution is assumed uniform. If the likelihood 'mountain' occupies only a small area well within this region, any further increase of the parameter space decreases $\bar{L}$ by the factor with which the volume grows.

This analysis is performed for each galaxy separately, yielding for each object $\bar{L}$-values for each of the different SFR prescriptions. The values are suitably normalised to facilitate comparison of the SFR laws, and these Bayes factors are collected in Table 2.

To assess how well the various laws reproduce the data of the
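The volume penalty just described admits a one-dimensional numerical check (the width and ranges below are arbitrary illustrative choices): if the likelihood is negligible outside a small region, enlarging a uniform prior range by some factor divides the grid-averaged likelihood by that same factor.

```python
import numpy as np

# Likelihood "mountain" of width 0.01, well inside both prior ranges.
sigma = 0.01
x_narrow = np.linspace(-1.0, 1.0, 2001)    # uniform prior over [-1, 1]
x_wide = np.linspace(-10.0, 10.0, 20001)   # range enlarged tenfold, same grid spacing

mean_narrow = np.exp(-x_narrow**2 / (2 * sigma**2)).mean()
mean_wide = np.exp(-x_wide**2 / (2 * sigma**2)).mean()

ratio = mean_narrow / mean_wide
print(f"mean-likelihood ratio narrow/wide = {ratio:.3f}")   # close to 10
```

The mean likelihood drops by the factor with which the prior volume grows, exactly as stated for $\bar{L}$ above.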
complete sample of galaxies, there are two possibilities. The probability for any law, allowing the best combination of its parameters to vary from object to object, is obtained simply by multiplying the Bayes factors from Table 2. On the other hand, the density distribution for the joint mean likelihood can be computed from all objects by taking the product of the individual likelihoods over all objects.

© European Southern Observatory (ESO) 1997

Online publication: April 28, 1998