3. The Fourier transform
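In this section we discuss the effects of data sampling on the Fourier transform. The Fourier transform of a continuous function $x(t)$ is defined by

$$a(\nu) = \int_{-\infty}^{\infty} x(t)\, e^{2\pi i \nu t}\, dt \qquad (1)$$

and its inverse by

$$x(t) = \int_{-\infty}^{\infty} a(\nu)\, e^{-2\pi i \nu t}\, d\nu \qquad (2)$$

When the function consists of N discrete measurements $x_k$ ($k = 0, \ldots, N-1$), which constitute an equidistant time series of length T, then the discrete Fourier transform results in N values $a_j$ ($j = -N/2, \ldots, N/2-1$) for the Fourier components

$$a_j = \sum_{k=0}^{N-1} x_k\, e^{2\pi i jk/N} \qquad (3)$$

The inverse discrete transform is given by

$$x_k = \frac{1}{N} \sum_{j=-N/2}^{N/2-1} a_j\, e^{-2\pi i jk/N} \qquad (4)$$

In these expressions $x_k = x(t_k)$ with $t_k = kT/N$, so that the time steps are given by $\delta t = T/N$. The coefficients $a_j$ are related to the frequencies $\nu_j = j/T$, so that the resolution in frequency is given by $\delta\nu = 1/T$. The highest frequency, corresponding to the Nyquist frequency, is given by $\nu_{N/2} = N/(2T)$. Note that $a_0 = \sum_k x_k$ corresponds in our case to the total number of photons $N_{\rm ph}$ detected during time T. We define the power spectrum as

$$P_j = \frac{2}{N_{\rm ph}}\, |a_j|^2, \qquad j = 0, \ldots, N/2 \qquad (5)$$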
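As an illustration of the definitions in this paragraph, the following minimal sketch evaluates the discrete transform directly and forms a power spectrum. It assumes an unnormalised forward transform and a normalisation of 2 divided by the total number of counts (the Leahy et al. 1983 normalisation). The function names, the test signal and the 21.5 s sampling interval (half of the 43 s Nyquist period quoted for these data) are illustrative assumptions, not values taken from the text.

```python
import cmath
import math


def fourier_coefficient(x, j_bin):
    """a_j = sum_k x_k exp(2 pi i j k / N), evaluated directly (assumed sign convention)."""
    n = len(x)
    return sum(xk * cmath.exp(2j * math.pi * j_bin * k / n) for k, xk in enumerate(x))


def power_spectrum(x):
    """Powers 2 |a_j|^2 / N_ph at j = 0 .. N/2 (N assumed even)."""
    n_ph = sum(x)  # total counts; identical to a_0
    return [2.0 * abs(fourier_coefficient(x, j)) ** 2 / n_ph
            for j in range(len(x) // 2 + 1)]


n_points = 122
sampling_interval = 21.5  # seconds; assumed, half of the 43 s Nyquist period
total_time = n_points * sampling_interval

# Hypothetical counts: a constant level plus 10 full cycles of a sinusoid.
counts = [350 + round(40 * math.sin(2 * math.pi * 10 * k / n_points))
          for k in range(n_points)]

powers = power_spectrum(counts)
frequencies = [j / total_time for j in range(n_points // 2 + 1)]
peak_bin = max(range(1, len(powers)), key=lambda j: powers[j])

print(len(powers))                # number of frequencies, zero frequency included
print(1.0 / frequencies[-1])      # Nyquist period in seconds
print(peak_bin, frequencies[peak_bin])
```

With 122 points the spectrum has 62 frequencies, the last one being the Nyquist frequency, and the zero-frequency power equals twice the total number of counts under this normalisation.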
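The relation between the continuous and the discrete Fourier transform for the datasets considered in this paper can be derived as follows. Each dataset consists of N measurements of the photon counts in a specific pixel. Each individual measurement consists of integrating the photon counts over a short time interval of length $\tau$. The total length of the time series is T. Let $x(t)$ be the number of photons arriving from the source at time t. Because of the finite integration time the actual photon flux which is sampled corresponds to a function $\bar{x}(t) = [x * b](t)$, with $*$ indicating a (Fourier) convolution and with the binning function

$$b(t) = \begin{cases} 1/\tau, & -\tau/2 < t < \tau/2 \\ 0, & \text{otherwise} \end{cases} \qquad (6)$$

The convolution of $x$ and $b$ corresponds to averaging the actual photon counts over a bin of width $\tau$ around time t, so that $\bar{x}(t)$ is given by

$$\bar{x}(t) = \frac{1}{\tau} \int_{t-\tau/2}^{t+\tau/2} x(t')\, dt' \qquad (7)$$

The discrete sampling of the data amounts to multiplying the function $\bar{x}(t)$ with a window function $w(t)$ and with a sampling function $s(t)$. The window function is given by

$$w(t) = \begin{cases} 1, & 0 \le t < T \\ 0, & \text{otherwise} \end{cases} \qquad (8)$$

and the sampling function by

$$s(t) = \sum_{k=-\infty}^{\infty} \delta\!\left(t - \frac{kT}{N}\right) \qquad (9)$$

The sampling function indicates that function $\bar{x}(t)$ is discretely sampled at times $t_k = kT/N$, while the window function accounts for the finite duration of the time series. Let $x_m(t)$ indicate a measured time series. From the discussion above it follows that $x_m(t) = \bar{x}(t)\, w(t)\, s(t)$. Let $\mathcal{F}$ denote the Fourier transform and let $B = \mathcal{F}(b)$, $W = \mathcal{F}(w)$ and $S = \mathcal{F}(s)$. Then the Fourier transform of the measured time series is given by

$$\mathcal{F}(x_m) = \left[ \mathcal{F}(x)\, B \right] * W * S \qquad (10)$$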
From Eq. (9) we find that
The relation of the continuous Fourier transform and the discrete transform is established by taking so that
The components of the power spectrum are then given by
For the observations described in this paper , , and . A sample of 122 data points results in a power spectrum at 62 discrete frequencies (one at zero frequency). The highest frequency, the Nyquist frequency, corresponds to approximately 0.023 Hz, or a period of 43 s. Due to the discrete sampling, any period in the signal shorter than 43 seconds will be aliased. In general the process of measuring the data points automatically results in a suppression of high frequencies, so that aliasing is not too serious a problem. However, in our case the integration time $\tau$ is only 0.128 seconds, so that aliasing will not be suppressed by the finite time a measurement takes. This can also be seen in Eq. (15). At the Nyquist frequency $\pi\nu\tau \approx 0.009$, so that the factor $\sin(\pi\nu\tau)/(\pi\nu\tau)$, the Fourier transform of a bin of width $\tau$, is very close to unity over the whole frequency range considered, and hence no suppression of high frequencies occurs. Care therefore has to be exercised regarding the possibility of aliasing.
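The two statements in this paragraph, that the short integration time gives no high-frequency suppression and that periods shorter than 43 s are aliased, can be checked numerically. The sketch below assumes that averaging over a bin of width tau multiplies the amplitude at frequency nu by sin(pi nu tau)/(pi nu tau), and it takes a sampling interval of 21.5 s, which is inferred here from the 43 s Nyquist period rather than stated in the text. The function names and the 30 s test period are hypothetical.

```python
import math


def binning_attenuation(freq_hz, integration_time_s):
    """Amplitude factor |sin(pi nu tau) / (pi nu tau)| of a bin of width tau."""
    arg = math.pi * freq_hz * integration_time_s
    return 1.0 if arg == 0.0 else abs(math.sin(arg) / arg)


def aliased_frequency(freq_hz, sampling_interval_s):
    """Frequency in [0, Nyquist] at which a sinusoid of freq_hz shows up after sampling."""
    sampling_rate = 1.0 / sampling_interval_s
    folded = freq_hz % sampling_rate
    return min(folded, sampling_rate - folded)


sampling_interval = 21.5   # seconds; inferred from the 43 s Nyquist period
integration_time = 0.128   # seconds; from the text
nyquist = 1.0 / (2.0 * sampling_interval)

# Attenuation at the Nyquist frequency for the actual integration time ...
short_bin = binning_attenuation(nyquist, integration_time)
# ... and, for comparison, if each measurement filled the whole sampling interval.
full_bin = binning_attenuation(nyquist, sampling_interval)

# A hypothetical 30 s oscillation lies above the Nyquist frequency and is folded back.
apparent_period = 1.0 / aliased_frequency(1.0 / 30.0, sampling_interval)

print(short_bin, full_bin, apparent_period)
```

In this sketch a measurement that filled the whole sampling interval would reduce the amplitude at the Nyquist frequency to 2/pi, whereas the 0.128 s integration leaves it essentially unchanged, and the 30 s oscillation would appear at a period of about 76 s.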
A second effect is caused by the data windowing. The window function is a boxcar function of length T. The transform of the signal is convolved with the transform of the window function, which has a central peak with a width of order $1/T$ and side lobes. The effect of windowing is that the power at a given frequency is distributed over neighbouring frequency bins.
A third effect is due to the use of the discrete Fourier transform. When looking for a periodic signature at a given frequency, the associated power is only fully recovered when that frequency corresponds exactly with one of the frequencies at which the power spectrum is evaluated. When it lies exactly between two frequency bins, the power is distributed over the neighbouring frequency bins and even some bins further away. So the spread of power over adjacent bins is caused by two effects: 1) the finite length of the time series, which results in windowing; 2) the use of the discrete Fourier transform. The effect of windowing can be suppressed by using a window function for the time series which differs from the boxcar (e.g. Welch, Hanning, Parzen). However, we decided not to use any of these window functions, as such a function broadens the secular variations due to a slow increase/decrease in the counts and therefore affects the determination of the variations we seek to analyse.
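The spread of power over neighbouring bins can be made concrete with a pure sinusoid. The sketch below, which uses a hypothetical 122-point noiseless cosine and illustrative function names, compares a signal that completes exactly 10 cycles in the series (on a frequency bin) with one that completes 10.5 cycles (midway between two bins) and reports the fraction of the power that ends up in the strongest bin.

```python
import cmath
import math


def one_sided_powers(x):
    """|a_j|^2 for j = 1 .. N/2, with a_j evaluated directly (zero frequency omitted)."""
    n = len(x)
    return [abs(sum(xk * cmath.exp(2j * math.pi * j_bin * k / n)
                    for k, xk in enumerate(x))) ** 2
            for j_bin in range(1, n // 2 + 1)]


def peak_fraction(cycles, n_points=122):
    """Fraction of the one-sided power found in the single strongest bin."""
    signal = [math.cos(2 * math.pi * cycles * k / n_points) for k in range(n_points)]
    powers = one_sided_powers(signal)
    return max(powers) / sum(powers)


on_bin = peak_fraction(10.0)        # frequency coincides with a bin
between_bins = peak_fraction(10.5)  # frequency midway between two bins

print(on_bin, between_bins)
```

For the on-bin case essentially all of the power is in one bin, while for the midway case the strongest bin holds well under half of it, the remainder being shared with the adjacent bin and the side lobes.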
The time series can be characterized as follows. The average number of photon counts per 0.128 seconds amounts to a few hundred, 300-400 in most datasets, although, as we will see later, some parts of a dataset can have counts in the 1000-1500 range while other parts are in the 50-100 range. In each dataset, there are secular variations due to a slow decrease or a slow increase of the counts. This occurs on time scales in the range . Superimposed on these slow variations are faster variations. Simply looking at the time series already suggests that some variations are (quasi-)periodic. The amplitudes of the variations are larger than expected from pure Poisson statistics (e.g. $\sqrt{300} \approx 17$ or $\sqrt{400} = 20$). Because of the high average counting level and the presence of secular variations it can be anticipated that there will be significant power at low frequencies, say . This power can be reduced by subtracting some 'average' from the observed counts, e.g., a first-order polynomial fit. However, there is little to be gained by this procedure.
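For reference, subtracting a first-order polynomial fit, as mentioned at the end of this paragraph, could look like the minimal sketch below. It uses an ordinary least-squares straight line against the sample index; the function name and the test series are hypothetical, and the sketch does not reproduce any processing actually applied to the data.

```python
def linear_detrend(counts):
    """Subtract the least-squares straight line (fitted against the sample index)."""
    n = len(counts)
    index_mean = (n - 1) / 2.0
    count_mean = sum(counts) / n
    sxx = sum((k - index_mean) ** 2 for k in range(n))
    sxy = sum((k - index_mean) * (c - count_mean) for k, c in enumerate(counts))
    slope = sxy / sxx
    intercept = count_mean - slope * index_mean
    residuals = [c - (intercept + slope * k) for k, c in enumerate(counts)]
    return residuals, slope, intercept


# Hypothetical series: counts rising slowly from 300 by 0.5 per sample.
trend_only = [300.0 + 0.5 * k for k in range(122)]
residuals, slope, intercept = linear_detrend(trend_only)

print(slope, intercept, max(abs(r) for r in residuals))
```

The residuals have zero mean, so if a power spectrum is formed from them the total number of counts used for the normalisation has to be taken from the original series.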
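A description of the statistical properties of the power spectrum can be found in Jenkins and Watts (1968) and Leahy et al. (1983), and is comprehensively summarized in van der Klis (1989). The normalization of the power spectrum (Eq. (5)) is chosen in such a way that if the noise in the data is (only) Poissonian, then the distribution of the noise powers is given by the $\chi^2$ distribution with two degrees of freedom (dof). The probability that a noise power $P_j$ exceeds a threshold power level $P_{\rm threshold}$ is

$$\mathrm{Prob}(P_j > P_{\rm threshold}) = Q(P_{\rm threshold} \,|\, 2)$$

with Q the integral probability of the $\chi^2$ distribution with $n$ dof

$$Q(\chi^2 \,|\, n) = \left[ 2^{n/2}\, \Gamma(n/2) \right]^{-1} \int_{\chi^2}^{\infty} t^{n/2-1}\, e^{-t/2}\, dt$$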
For two dof the standard deviation of the noise powers is equal to their mean value, $\sigma_P = \langle P \rangle = 2$. This implies that in the power spectrum the magnitude of the noise component is not well defined. There are basically two methods to decrease the noise in the power spectrum. One method is to rebin the power spectrum by averaging W consecutive frequency bins, at the expense of a reduced frequency resolution. The other method, which can be used in combination with the first, is to divide the data into M segments of equal length. For each of the data segments the power spectrum is determined and the resulting power spectra are then averaged. The resulting power distribution of the noise then corresponds to a $\chi^2$-distribution with $2MW$ dof which is scaled with a factor $1/MW$. In this case we have that $\mathrm{Prob}(P_j > P_{\rm threshold}) = Q(MW P_{\rm threshold} \,|\, 2MW)$. The mean of the distribution is still equal to 2 but the standard deviation has been reduced to $2/\sqrt{MW}$. We note that the noise in the power spectrum can, effectively, only be reduced at the expense of frequency resolution. Increasing the observing time T will not change the mean and the standard deviation of the noise distribution but, in the end, longer observing times, in combination with segmenting and/or rebinning, do permit a reduction of the standard deviation of the noise while achieving an improved frequency resolution.
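The effect of averaging over segments can be checked with a small simulation. The sketch below draws pure Poisson noise, forms power spectra with the normalisation 2 |a_j|^2 / N_ph for each segment, and compares the scatter of the individual powers with that of the segment-averaged powers. The segment length, number of segments, mean count level, random seed and all function names are hypothetical choices made only for illustration; the Poisson generator is Knuth's multiplication method, used here to avoid external libraries.

```python
import cmath
import math
import random


def poisson(rng, mean):
    """Poisson deviate by Knuth's multiplication method (fine for moderate means)."""
    limit = math.exp(-mean)
    k, product = 0, 1.0
    while True:
        product *= rng.random()
        if product <= limit:
            return k
        k += 1


def noise_powers(x):
    """Powers 2 |a_j|^2 / N_ph for j = 1 .. N/2 - 1 (zero and Nyquist bins dropped)."""
    n, n_ph = len(x), sum(x)
    return [2.0 * abs(sum(xk * cmath.exp(2j * math.pi * j_bin * k / n)
                          for k, xk in enumerate(x))) ** 2 / n_ph
            for j_bin in range(1, n // 2)]


def mean_and_sd(values):
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
    return mean, math.sqrt(variance)


rng = random.Random(12345)
n_points, n_segments, mean_counts = 64, 32, 100.0  # hypothetical values

spectra = [noise_powers([poisson(rng, mean_counts) for _ in range(n_points)])
           for _ in range(n_segments)]

raw = [p for spectrum in spectra for p in spectrum]              # M = 1 powers
averaged = [sum(column) / n_segments for column in zip(*spectra)]  # M = 32 average

raw_mean, raw_sd = mean_and_sd(raw)
avg_mean, avg_sd = mean_and_sd(averaged)
print(raw_mean, raw_sd)
print(avg_mean, avg_sd, 2.0 / math.sqrt(n_segments))
```

The individual powers should scatter with a standard deviation close to their mean of 2, while the averaged powers keep the same mean and scatter by roughly 2 divided by the square root of the number of segments, up to sampling fluctuations.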
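Suppose that we have a power spectrum at N frequencies and want to establish which powers have a low probability of being caused by noise. The power at each of the frequencies can be considered as an independent trial. Define $1-\epsilon$ as the confidence level, i.e. the probability that none of the N noise powers exceeds the detection level $P_{\rm det}$. For N independent powers the corresponding probability per power is $(1-\epsilon)^{1/N}$, so that the chance for a single power to exceed $P_{\rm det}$ and to be caused by noise is $1-(1-\epsilon)^{1/N} \approx \epsilon/N$ for $\epsilon \ll 1$. From this it follows that the detection level is given by

$$\frac{\epsilon}{N} = Q(MW P_{\rm det} \,|\, 2MW)$$

In this paper we use a confidence level of 99.9% ($\epsilon = 10^{-3}$) to determine $P_{\rm det}$. For $MW = 1$, the detection level is given by $P_{\rm det} = 2 \ln(N/\epsilon)$. For $MW > 1$, Eq. (17) results in an implicit relation for $P_{\rm det}$.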
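The detection level can be evaluated numerically as sketched below. The sketch assumes that the condition to be solved is that the chi-squared tail probability with 2MW dof, evaluated at MW times the detection level, equals the false-alarm probability per trial, epsilon divided by the number of trials. It uses the closed form of the tail probability for an even number of dof and a simple bisection. The function names and the choice of 61 trials (the 62 frequencies of a 122-point series minus the zero frequency) are assumptions made for the example.

```python
import math


def chi2_tail_even(x, dof):
    """Q(x | dof) for even dof: exp(-x/2) * sum_{k < dof/2} (x/2)^k / k!."""
    term, total = 1.0, 1.0
    for k in range(1, dof // 2):
        term *= (x / 2.0) / k
        total += term
    return math.exp(-x / 2.0) * total


def detection_level(n_trials, epsilon, mw=1):
    """Solve Q(MW * P_det | 2 MW) = epsilon / n_trials for P_det by bisection."""
    target = epsilon / n_trials
    low, high = 0.0, 2.0
    while chi2_tail_even(mw * high, 2 * mw) > target:  # bracket the root
        high *= 2.0
    for _ in range(200):
        middle = 0.5 * (low + high)
        if chi2_tail_even(mw * middle, 2 * mw) > target:
            low = middle
        else:
            high = middle
    return 0.5 * (low + high)


n_trials, epsilon = 61, 1e-3  # hypothetical number of trials; 99.9% confidence

single = detection_level(n_trials, epsilon, mw=1)
averaged = detection_level(n_trials, epsilon, mw=4)

print(single, 2.0 * math.log(n_trials / epsilon))  # numerical and closed form for MW = 1
print(averaged)                                    # lower level for MW = 4
```

For MW = 1 the bisection reproduces the closed form 2 ln(N/epsilon), and for MW > 1 it returns a lower detection level, reflecting the reduced scatter of the averaged noise powers.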
© European Southern Observatory (ESO) 1997
Online publication: April 8, 1998