Histogramming is the least expensive and most popular density
estimator, but it has several statistical drawbacks.
To name only two, it
fails to identify structures that are much narrower than the bin size,
and it exhibits sharp discontinuities (statistical fluctuations) between
adjacent low-population bins.
The first problem is usually solved by adapting the bin width to the
experimental resolution, or by re-binning after looking at the histogram.
To filter out the statistical fluctuations, smoothing algorithms can be
applied.
Three such techniques are implemented in HBOOK: the so-called
353QH (HSMOOF), the method of B-splines
(HSPLI1, HSPLI2, HSPFUN), and multiquadric
smoothing (HQUAD).
Before trying them out, the references
should be consulted, and the results taken with care.
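The effect of the bin width on narrow structures can be illustrated with a minimal Python sketch. It is not part of HBOOK: the helper names `fill_histogram` and `rebin` and the sample events are hypothetical, chosen only to show two narrow clusters that are resolved with fine bins and merged into a single bin after re-binning.

```python
def fill_histogram(values, nbins, xmin, xmax):
    """Count values into nbins equal-width bins on [xmin, xmax)."""
    counts = [0] * nbins
    width = (xmax - xmin) / nbins
    for v in values:
        if xmin <= v < xmax:
            counts[int((v - xmin) / width)] += 1
    return counts


def rebin(counts, factor):
    """Merge each group of `factor` adjacent bins into one wider bin."""
    if len(counts) % factor != 0:
        raise ValueError("number of bins must be a multiple of factor")
    return [sum(counts[i:i + factor]) for i in range(0, len(counts), factor)]


# Hypothetical data: two narrow clusters of events about 0.075 apart.
events = [0.405, 0.41, 0.415, 0.48, 0.485, 0.49]

fine = fill_histogram(events, 40, 0.0, 1.0)  # bin width 0.025
coarse = rebin(fine, 4)                      # bin width 0.1

print(fine[16:20])  # the two clusters sit in separate fine bins
print(coarse[4])    # after re-binning they share one coarse bin
```

With the fine binning the two clusters are separated by empty bins; with the coarse binning the structure is no longer visible.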
CALL HSMOOF (ID,ICASE,CHI2*)
Action:
Routine to smooth a 1-dimensional histogram according to algorithm
353QH, TWICE (see [13]).
- Input parameters:
- ID
- Histogram identifier
- ICASE
- 0 and 1 replace original histogram by smoothed version;
2 superimposes as a function when editing.
- Output parameter:
- CHI2
- Chi-square between the original and the smoothed histogram.
Remarks:
- The mean value and standard deviation are recalculated if ICASE=1.
- The routine can be called several times for the same histogram
identifier ID, for ICASE=1 or 2.
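The name 353QH, TWICE describes a chain of simple operations: running medians of 3, 5 and 3, a quadratic interpolation of flat regions (Q), Hanning (H), and a second pass over the residuals (TWICE). The following Python sketch shows the idea under loud assumptions: it omits the Q step, uses a simplified end-point rule, and is in no way the HSMOOF implementation. All function names are hypothetical.

```python
from statistics import median


def running_median(y, window):
    """Running median with an odd window; the window shrinks near the ends."""
    half = window // 2
    n = len(y)
    out = []
    for i in range(n):
        h = min(half, i, n - 1 - i)  # largest symmetric window that fits
        out.append(median(y[i - h:i + h + 1]))
    return out


def hanning(y):
    """Weighted mean 1/4, 1/2, 1/4; end points are left unchanged (assumption)."""
    out = list(y)
    for i in range(1, len(y) - 1):
        out[i] = 0.25 * y[i - 1] + 0.5 * y[i] + 0.25 * y[i + 1]
    return out


def smooth_353h(y):
    """Medians of 3, 5, 3 followed by Hanning (the Q step is omitted here)."""
    z = running_median(y, 3)
    z = running_median(z, 5)
    z = running_median(z, 3)
    return hanning(z)


def smooth_353h_twice(y):
    """TWICE: smooth the residuals of the first pass and add them back."""
    first = smooth_353h(y)
    rough = [a - b for a, b in zip(y, first)]
    return [s + r for s, r in zip(first, smooth_353h(rough))]


# Hypothetical bin contents with one isolated fluctuation.
spike = [10, 10, 10, 10, 50, 10, 10, 10, 10]
print(smooth_353h_twice(spike))
```

Because medians ignore isolated outliers, the single high bin is removed while a flat or linearly rising distribution is left unchanged by this sketch.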
CALL HSPLI1 (ID,IC,N,K,CHI2*)
Action:
B-splines smoothing of a 1-dimensional histogram.
- Input parameters:
- ID
- Identifier of an existing 1-dimensional histogram
- IC
- Superimposition flag (IC=0 is identical to IC=1):
1 replaces the original contents by the value of the spline;
2 superimposes the spline function when editing.
- N
- Number of knots (when N ≤ 0 then N=13).
- K
- Degree of the splines (when K ≤ 3 then K=3).
- Output parameter:
- CHI2
- Chi-square between the original and the smoothed histogram.
Remarks:
- HSPLI1 can be called several times for the same histogram
identifier ID, for any value of the parameters.
- If the distribution to be smoothed exhibits NP statistically
relevant peaks, then a rule of thumb to define the number of knots
is N = 4*NP+6 for a spline of degree 3.
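The rule of thumb above is simple arithmetic; a small Python helper makes it explicit. The function name `suggested_knots` is hypothetical and the rule is applied only for degree 3, as stated in the remark.

```python
def suggested_knots(n_peaks, degree=3):
    """Rule of thumb from the remark: N = 4*NP + 6 knots for a degree-3 spline."""
    if degree != 3:
        raise ValueError("the rule of thumb is stated only for degree 3")
    if n_peaks < 0:
        raise ValueError("number of peaks cannot be negative")
    return 4 * n_peaks + 6


for n_peaks in (1, 2, 3):
    print(n_peaks, suggested_knots(n_peaks))
```

For a distribution with two statistically relevant peaks this gives 14 knots.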
CALL HSPLI2 (ID,NX,NY,KX,KY)
Action:
B-splines smoothing of a 2-dimensional histogram.
- Input parameters:
- ID
- Identifier of an existing 2-dimensional histogram
- NX
- Number of knots in the X interval (when NX ≤ 0 then NX=13).
- NY
- Number of knots in the Y interval (when NY ≤ 0 then NY=13).
- KX
- Degree of the spline in X (when KX ≤ 3 then KX=3).
- KY
- Degree of the spline in Y (when KY ≤ 3 then KY=3).
Remarks:
- The original contents of the histogram are replaced by the value of the
spline approximation.
- See the remark about the number of knots for routine HSPLI1.
HSPFUNS = HSPFUN (ID,X,N,K)
Action:
Performs a B-spline smoothing of a 1-dimensional histogram
and returns the value at a given abscissa point.
- Input parameters:
- ID
- Identifier of an existing 1-dimensional histogram
- X
- Abscissa
- N
- Number of knots (when N ≤ 0 then N=13).
- K
- Degree of the splines (when K ≤ 3 then K=3).
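Evaluating a B-spline at a given abscissa is usually done with the Cox-de Boor recursion. The Python sketch below illustrates that general technique only; it is not the HSPFUN code. The knot vector, the coefficients and the function names are hypothetical, and how HBOOK places its N knots is not specified here.

```python
def bspline_basis(i, k, knots, x):
    """Value at x of the i-th B-spline basis function of degree k (Cox-de Boor)."""
    if k == 0:
        # Half-open intervals: the right end point of the range is excluded.
        return 1.0 if knots[i] <= x < knots[i + 1] else 0.0
    left = right = 0.0
    d1 = knots[i + k] - knots[i]
    if d1 > 0:
        left = (x - knots[i]) / d1 * bspline_basis(i, k - 1, knots, x)
    d2 = knots[i + k + 1] - knots[i + 1]
    if d2 > 0:
        right = (knots[i + k + 1] - x) / d2 * bspline_basis(i + 1, k - 1, knots, x)
    return left + right


def spline_value(coeffs, k, knots, x):
    """Spline value at x: sum of coefficients times basis functions."""
    return sum(c * bspline_basis(i, k, knots, x) for i, c in enumerate(coeffs))


# Hypothetical cubic spline on [0, 3) with a clamped knot vector.
knots = [0, 0, 0, 0, 1, 2, 3, 3, 3, 3]
coeffs = [1.0] * (len(knots) - 3 - 1)  # six basis functions of degree 3

print(spline_value(coeffs, 3, knots, 1.5))
```

With all coefficients equal to one the basis functions sum to one everywhere inside the range, which is a convenient check of the recursion.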
CALL HQUAD (ID,CHOPT,MODE,SENSIT,SMOOTH,NSIG*,CHISQ*,NDF*,FMIN*,FMAX*, IERR*)
Action:
This routine fits multiquadric radial basis functions to the bin contents of a
histogram or the event density of an Ntuple.
(For Ntuples this is currently limited to "simple" ones, i.e., with 1, 2 or 3
variables; all events are used, and no selection mechanism is implemented. Thus
the recommended practice at the moment is to create a "simple" Ntuple and
fill it from your "master" Ntuple with the NTUPLE/LOOP command and an
appropriate SELECT.FOR function.)
Routine HQUAD is called automatically in PAW by the existing command SMOOTH.
For a complete description of the method see reference [16].
- Input parameters:
- ID
- Histogram or Ntuple ID.
- CHOPT
- Character variable containing option characters:
- 0
- Replace the original histogram by the smoothed version.
- 2
- Do not replace the original histogram but store the values of the
smoothed function and its parameters. (The fitted function is regenerated
from the values or the parameters with the FUNC option in
HISTOGRAM/PLOT for histograms or with NTUPLE/DRAW for Ntuples.)
- V
- Verbose.
- MODE
- Mode of operation:
- 0
- Same as MODE = 3 (see below).
- 3
- Find significant points and perform an unconstrained fit. If
the histogram or Ntuple is unweighted, perform a Poisson likelihood
fit, otherwise a least squares fit (see MODE = 4).
- 4
- Force an unconstrained least squares fit in all cases.
(This is a linear least squares problem and therefore the most
efficient possible, since it allows a single-step calculation of the
best fit and covariances. But note that it assumes gaussian errors,
even for low statistics, including the error on zero being 1.)
- SENSIT
- Sensitivity parameter.
It controls the sensitivity to statistical fluctuations (see Remarks).
SENSIT = 0. is equivalent to SENSIT = 1.
- SMOOTH
- Smoothness parameter.
It controls the (radius of) curvature of the multiquadric basis functions.
SMOOTH = 0. is equivalent to SMOOTH = 1.
- Output parameters:
- NSIG
- Number of significant points or centres found, i.e., the number of
basis functions used.
- CHISQ
- Chi-squared (see Remarks).
- NDF
- Number of degrees of freedom.
- FMIN
- Minimum function value.
- FMAX
- Maximum function value.
- IERR
- Error flag, 0 if all is OK. (Hopefully helpful error messages are
printed where possible.)
Remarks:
- Empty bins are taken into account. (Poisson statistics are used for the
unweighted case.)
- The multiquadric basis functions are $\sqrt{r^2 + \Delta^2}$, where r is
the radial distance from the function's "centre", and $\Delta$ is a scale
parameter and also the (radius of) curvature at the "centre". "Centres", also
referred to as "significant points", are located at points where the
2nd differential or Laplacian of the event density is statistically
significant.
- The data must be statistically independent, i.e., events (weighted or
unweighted) drawn randomly from a parent probability distribution or
differential cross-section. For example, you cannot further smooth a
previously smoothed distribution.
- For histograms, the chi-squared (CHISQ) is that of the fit to the
original histogram assuming gaussian errors on the original histogram
even for low statistics, including the error on zero being 1. It is
calculated like this even for a Poisson likelihood fit; in that case the
maximum likelihood may not correspond to the minimum chi-squared, but
CHISQ can still be used, with NDF (the number of degrees of freedom), as a
goodness-of-fit estimator. For Ntuples, an internally generated and
temporary histogram is used to calculate CHISQ in the same way.
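The quantities in these remarks can be sketched in a few lines of Python. This is an illustration under assumptions, not the HQUAD code: the basis function is taken to have the usual multiquadric form sqrt(r^2 + Delta^2), the chi-squared is written for an unweighted histogram with variance equal to the bin content (and 1 for an empty bin, as described above), and the function names, centres and coefficients are hypothetical.

```python
import math


def multiquadric(r, delta):
    """Assumed multiquadric form: sqrt(r^2 + delta^2)."""
    return math.sqrt(r * r + delta * delta)


def smoothed_value(x, centres, coeffs, delta):
    """1-D sum of multiquadrics: sum_j c_j * sqrt((x - x_j)^2 + delta^2)."""
    return sum(c * multiquadric(x - xc, delta) for xc, c in zip(centres, coeffs))


def chisq_gaussian(contents, fitted):
    """Chi-squared with gaussian errors sqrt(n); the error on an empty bin is 1.

    Unweighted histograms only (assumption of this sketch).
    """
    total = 0.0
    for n, f in zip(contents, fitted):
        variance = n if n > 0 else 1.0
        total += (n - f) ** 2 / variance
    return total


# Hypothetical bin contents and fitted values.
contents = [0, 4, 9]
fitted = [1.0, 2.0, 9.0]
print(chisq_gaussian(contents, fitted))

# Hypothetical centres and coefficients for a two-term smoothed function.
print(smoothed_value(1.0, [1.0, 4.0], [2.0, 1.0], 4.0))
```

With the assumed form, the second derivative of the basis function at r = 0 is 1/Delta, so Delta is the radius of curvature at the centre: a larger Delta gives a flatter, smoother basis function.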
Last update:
Tue May 16 09:09:27 METDST 1995