
Smoothing

Histogramming is the least expensive and most popular density estimator, but it has several statistical drawbacks. To name only two, it fails to identify structures that are much narrower than the bin size, and it exhibits sharp discontinuities (statistical fluctuations) between adjacent low-population bins.

The first problem is usually solved by adapting the bin width to the experimental resolution, or by re-binning after looking at the histogram. To filter out the statistical fluctuations, smoothing algorithms can be applied.

Three such techniques are implemented in HBOOK: the so-called 353QH algorithm (HSMOOF), the method of B-splines (HSPLI1, HSPLI2, HSPFUN), and multiquadric smoothing (HQUAD). Before trying them out, the relevant references should be consulted, and the results should be taken with care.

CALL HSMOOF (ID,ICASE,CHI2*)

Action: Routine to smooth a 1-dimensional histogram according to algorithm 353QH, TWICE (see [13]).

Input parameters:
ID: Histogram identifier
ICASE: 0 and 1 replace the original histogram by the smoothed version;
2 superimposes the smoothed version as a function when editing.
Output Parameter:
CHI2: Chi-square $\chi^2$ between the original and smoothed histograms.

Remarks:

The mean value and standard deviation are recalculated if ICASE=1.
The routine can be called several times for the same histogram identifier ID, for ICASE=1 or 2.
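
As an illustration only, the idea behind the algorithm can be sketched in Python. This is not the HSMOOF implementation. It is a simplified variant that applies running medians of 3, 5 and 3 followed by Hanning (weights 1/4, 1/2, 1/4), omits the quadratic interpolation step (the Q), and uses assumed end-point rules (median windows shrink near the ends, and Hanning leaves the end bins unchanged). "Twice" is taken here to mean that the residuals are smoothed with the same smoother and added back. All function names are hypothetical.

```python
from statistics import median


def running_median(values, width):
    """Running median of odd width; the window shrinks symmetrically near the ends."""
    n = len(values)
    out = []
    for i in range(n):
        half = min(width // 2, i, n - 1 - i)
        out.append(median(values[i - half:i + half + 1]))
    return out


def hanning(values):
    """Weighted average with weights 1/4, 1/2, 1/4; end points are left unchanged."""
    out = list(values)
    for i in range(1, len(values) - 1):
        out[i] = 0.25 * values[i - 1] + 0.5 * values[i] + 0.25 * values[i + 1]
    return out


def smooth_353h(values):
    """Running medians of 3, 5, 3, then Hanning (the Q step is omitted)."""
    s = running_median(values, 3)
    s = running_median(s, 5)
    s = running_median(s, 3)
    return hanning(s)


def smooth_353h_twice(values):
    """Smooth once, then smooth the residuals and add them back ("twice")."""
    first = smooth_353h(values)
    residuals = [v - f for v, f in zip(values, first)]
    correction = smooth_353h(residuals)
    return [f + c for f, c in zip(first, correction)]


if __name__ == "__main__":
    contents = [10, 10, 10, 50, 10, 10, 10]  # one isolated spike
    print(smooth_353h_twice(contents))
```

In this sketch an isolated single-bin spike is removed entirely by the running medians, which shows why narrow genuine structures can be lost and why the results should be taken with care.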

CALL HSPLI1 (ID,IC,N,K,CHI2*)

Action: B-splines smoothing of a 1-dimensional histogram.

Input parameters:
ID: Identifier of an existing 1-dimensional histogram
IC: Superimposition flag (IC=0 is identical to IC=1)
1 Replaces original contents by the value of the spline;
2 Superimposes the spline function when editing.
N: Number of knots (when N $≤$ 0 then N=13).
K: Degree of the splines (when K $≥$ 3 then K=3).
Output Parameter:
CHI2: Chi-square $\chi^2$ between the original and smoothed histograms.

Remarks:

HSPLI1 can be called several times for the same histogram identifier ID, for any value of the parameters.
If the distribution to be smoothed exhibits NP statistically relevant peaks, then a rule of thumb for the number of knots is N = 4*NP+6 for a spline of degree 3.
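
As a worked example of the rule of thumb above, the following short Python sketch evaluates N = 4*NP+6 for a few peak counts. The helper name `suggested_knots` is hypothetical and is not part of HBOOK. The rule is only stated for splines of degree 3, so the sketch refuses other degrees.

```python
def suggested_knots(n_peaks, degree=3):
    """Rule of thumb from the text: N = 4*NP + 6, stated for degree-3 splines only."""
    if degree != 3:
        raise ValueError("the rule of thumb is only stated for splines of degree 3")
    if n_peaks < 0:
        raise ValueError("the number of peaks cannot be negative")
    return 4 * n_peaks + 6


if __name__ == "__main__":
    for n_peaks in (1, 2, 3):
        print(n_peaks, "peak(s) ->", suggested_knots(n_peaks), "knots")
```

For a distribution with two statistically relevant peaks, for instance, the rule gives N = 4*2+6 = 14 knots.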

CALL HSPLI2 (ID,NX,NY,KX,KY)

Action: B-splines smoothing of a 2-dimensional histogram.

Input parameters:
ID: Identifier of an existing 2-dimensional histogram
NX: Number of knots in the X interval (when NX $≤$ 0 then NX=13).
NY: Number of knots in the Y interval (when NY $≤$ 0 then NY=13).
KX: Degree of the spline in X (when KX $≥$ 3 then KX=3).
KY: Degree of the spline in Y (when KY $≥$ 3 then KY=3).

Remarks:

The original contents of the histogram are replaced by the value of the spline approximation.
See the remark about the number of knots for routine HSPLI1.

HSPFUNS = HSPFUN (ID,X,N,K)

Action: Performs a B-spline smoothing of a 1-dimensional histogram and returns the value at a given abscissa point.

Input parameters:
ID: Identifier of an existing 1-dimensional histogram
X: Abscissa
N: Number of knots (when N $≤$ 0 then N=13).
K: Degree of the splines (when K $≥$ 3 then K=3).

CALL HQUAD (ID,CHOPT,MODE,SENSIT,SMOOTH,NSIG*,CHISQ*,NDF*,FMIN*,FMAX*,IERR*)

Action: This routine fits multiquadric radial basis functions to the bin contents of a histogram or the event density of an Ntuple. (For Ntuples this is currently limited to "simple" ones, i.e., with 1, 2 or 3 variables; all events are used, and no selection mechanism is implemented. Thus the recommended practice at the moment is to create a "simple" Ntuple and fill it from your "master" Ntuple with the NTUPLE/LOOP command and an appropriate SELECT.FOR function.) Routine HQUAD is called automatically in PAW by the existing command SMOOTH. For a complete description of the method see reference [16].

Input parameters:

ID: Histogram or Ntuple ID.
CHOPT: Character variable containing option characters:
0: Replace the original histogram by the smoothed one.
2: Do not replace the original histogram, but store the values of the smoothed function and its parameters. (The fitted function is regenerated from the values or the parameters with the FUNC option in HISTOGRAM/PLOT for histograms or with NTUPLE/DRAW for Ntuples.)
V: Verbose.
MODE: Mode of operation:
0: Same as MODE = 3 (see below).
3: Find significant points and perform an unconstrained fit. If the histogram or Ntuple is unweighted, perform a Poisson likelihood fit; otherwise perform a least squares fit (see MODE = 4).
4: Force an unconstrained least squares fit in all cases. (This is a linear least squares problem and therefore the most efficient possible, since it allows a single-step calculation of the best fit and covariances. Note, however, that it assumes Gaussian errors, even for low statistics, including the error on zero being 1.)
SENSIT: Sensitivity parameter. It controls the sensitivity to statistical fluctuations (see Remarks). SENSIT = 0. is equivalent to SENSIT = 1.
SMOOTH: Smoothness parameter. It controls the (radius of) curvature of the multiquadric basis functions. SMOOTH = 0. is equivalent to SMOOTH = 1.

Output parameters:

NSIG: Number of significant points or centres found, i.e., the number of basis functions used.
CHISQ: Chi-squared (see Remarks).
NDF: Number of degrees of freedom.
FMIN: Minimum function value.
FMAX: Maximum function value.
IERR: Error flag, 0 if all is OK. (Hopefully helpful error messages are printed where possible.)

Remarks:

Empty bins are taken into account. (Poisson statistics are used for the unweighted case.)
The multiquadric basis functions are $\sqrt{r^2+\Delta^2}$, where $r$ is the radial distance from the "centre" of the function, and $\Delta$ is a scale parameter and also the radius of curvature at the "centre". "Centres", also referred to as "significant points", are located at points where the 2nd differential or Laplacian of the event density is statistically significant.
The data must be statistically independent, i.e., events (weighted or unweighted) drawn randomly from a parent probability distribution or differential cross-section, e.g., you cannot further smooth a previously smoothed distribution.
For histograms, the chi-squared (CHISQ) is that of the fit to the original histogram, assuming Gaussian errors on the original histogram even for low statistics, including the error on zero being 1. It is calculated in this way even for a Poisson likelihood fit; in that case the maximum likelihood may not correspond to the minimum chi-squared, but CHISQ can still be used, together with NDF (the number of degrees of freedom), as a goodness-of-fit estimator. For Ntuples, an internally generated temporary histogram is used to calculate CHISQ in the same way.
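
The two quantities described in these remarks can be made concrete with a minimal Python sketch. It is not the HQUAD code. It assumes the usual multiquadric form sqrt(r^2 + Delta^2) for a single basis function. For the chi-squared it assumes an unweighted histogram with Gaussian errors sigma = sqrt(n) per bin and sigma = 1 for empty bins. The function names are hypothetical.

```python
import math


def multiquadric(r, delta):
    """Single multiquadric basis function sqrt(r^2 + delta^2) at radial distance r."""
    return math.sqrt(r * r + delta * delta)


def chi_squared(contents, fitted):
    """Chi-squared of fitted values against unweighted bin contents.

    Assumption: the variance of a bin is its content n (sigma = sqrt(n)),
    and an empty bin is given an error of 1, as described in the remarks.
    """
    total = 0.0
    for n, f in zip(contents, fitted):
        variance = n if n > 0 else 1.0
        total += (n - f) ** 2 / variance
    return total


if __name__ == "__main__":
    delta = 2.0
    h = 1.0e-3
    # Numerical second derivative at the centre; analytically it is 1/delta,
    # so the radius of curvature at the centre is delta.
    second = (multiquadric(h, delta) - 2.0 * multiquadric(0.0, delta)
              + multiquadric(-h, delta)) / (h * h)
    print("second derivative at centre:", second, "expected:", 1.0 / delta)
    print("chi-squared:", chi_squared([0, 4, 9], [1.0, 2.0, 9.0]))
```

In the small example the three bins contribute (0-1)^2/1 + (4-2)^2/4 + (9-9)^2/9 to the chi-squared, which shows how an empty bin still enters the sum with unit error.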


Last update: Tue May 16 09:09:27 METDST 1995