[HPARAMET]
In analyzing experimental data, it is sometimes very important to be able
to parametrize relationships among data by means of linear combinations of
elementary functions. One of these expansions is the well known Taylor
series which is an expansion of an analytic function using as elementary
functions (regressors) the powers of the independent variables. In the
case of the Taylor series, the coefficients of the linear combination are
a well known expression involving the derivatives of the function to be
represented at a given (initial) point. Even if the Taylor series is very
powerful for its noticeable features and its analytical simplicity, from
the point of view of the numerical analist it might be not the best
expansion in terms of elementary functions. Ideally one would like to use
the smallest possible number of regressors, in order to describe the
relationship among data with a given precision, and this is not always
guaranteed using the Taylor series. Another feature of the monomials is
that they do not form a complete orthonormal set of functions, in the
sense of the Hilbert spaces, while other functions, which possess this
feature, may be more computationally convenient to use.
The entries HPARAM (for histograms and scatter plots) and HPARMN (for
n-dimensional distributions) perform regressions on the data in order to
find the best parametrization in terms of elementary functions
(regressors).
Let us suppose that we have collected NP sets of NDIM+1
th
data, being (y ,x . ..x ) with 1< =i< =NP the i set.
i 1,i NDIM,i
Here y is the value observed at the coordinates x .. .x .
i 1,i NDIM,i
NDIM must be greater or equal to 1 and it is the dimensionality
of our parametrization problem, that is the number of independent
variables that we want to use in order to parametrize our data. In the
case of a histogram NDIM is clearly 1, x is the centre of a
histogram channel, y is the contents of the corresponding channel and
NP is the number of channels of the histogram. In the case of a
scatter plot NDIM is 2, x and x are the ordinate and abscissa
1 2
of the centre of a cell, y is the contents of the cell and NP is
NX*NY, the total number of cells. In case of histograms or
scatter plots the entry HPARAM should be used. For data which are not
contained in an HBOOK histogram or scatter plot and for higher
dimensionality sets (NDIM <=10) the entry HPARMN has to be used.
We could express a possible expansion of our data in elementary functions
in the following way:
y = c *F (x ,. .., x )+ ...
1 1 1 1,1 NDIM,1
+c * F (x ,. .., x )+ u
NCO NCO 1,1 NDIM,1 1
.
.
y = c *F (x ,. .., x )+ ...
NP 1 1 1,NP NDIM,NP
+c * F (x ,. .., x )+ u
NCO NCO 1,NP NDIM,NP NP
where we are summing NCO elementary functions each one multiplied
by its own coefficient. In other words we are pretending that our data
points are fitted by the given expansion but for a residual, namely u .
i
The expansion itself is the linear combination of the regressors
th
F computed at the j data point. The strategy of the two entries is to
j
minimize the residual variance from fitting various possible regressors
out of a set which is either system or user-defined. The previous
expression can be rewritten in a more synthetical notation:
y= Fc+u
y array of NP data points
F matrix of regressors (NP lines
times NCO columns)
c array of NCO coefficients
u array for the NP residuals
As already said, we want to use the smallest possible number of regressors
for a given set of data points, which yields the desired fit (in the terms
explained below). That is to say that the rank of the matrix F should be
NCO. If it were smaller, then some (at least one) of the columns
could be represented as a linear combination of others, and so,
rearranging the coefficients, we could get rid of these regressors.
The residual variance is minimized under the following constraints:
- The NP values of the residuals should have a mean of 0.
- The residuals should be independently distributed. This means that
their covariance matrix has the form of a positive diagonal matrix (we
call it D).
- The regressors should be linearly independent.
t -1 t -1
Hypothesis 2 implies that (F F) exists, where D(F F) is the
covariance matrix of the coefficients.
The coefficients and regressors are determined iteratively until the
convergence criteria for the fit are reached. The user has the choice to
specify that the criteria are met either when the residual variance stops
decreasing or when the multiple correlation coefficient reaches the value
specified via the input parameter R2MIN (see later). Once the
list of all possible regressors has been established, at each step the one
which gives the highest drop in the residual sum of squares is added to
the expansion. In addition, all previously chosen regressors are tested
for possible rejection, using standard test statistics (F-tests), as it
can happen that a regressor becomes superfluous at a later stage of the
fitting process.
The routines offer three choices for the regressors: standard elementary
functions, user-given elementary functions or user-given regressors. In
the first two cases each regressor is the product of NDIM
elementary functions, one for each variable (the constant function may be
included in the set). This means that in the first two cases we will have
a matrix of NDIM*NCO elementary functions, and the array of
regressors will be the result of the scalar product of the NDIM
columns of this matrix. In the last case each regressor is user-defined,
and we have just NCO regressors to be specified as user functions
(see below). The NCO regressors have to be linearly combined with
the NCO coefficients contained in the output array COEFF
to obtain the parametrization. Each elementary function, either standard
or user-defined is identified by a three digit code, let's say
nnm. The last digit, m, is a one digit code specifying
the kind of regressor used, with the following meaning:
- 0
- Standard elementary function
- 1
- User-provided elementary function
- 2
- User-provided regressor
The first two digits define the function number. In the first two cases,
we have NDIM*NCO of these codes arranged in an array, so that the
first NDIM codes are the code of the elementary functions to be
multiplied to obtain the first regressor. For system defined elementary
functions nn is the degree of the polynomial. For user-provided
regressors, the meaning of this code is user-defined, in the sense that
the user is free to give any meaning he wishes to this code in the routine
he has to supply. In the case of user-provided regressors, we will have
just NCO regressor codes in the array ITERM at the end
of the fitting process. Only one code is needed to define a user-given
regressor, whereas NDIM codes are needed to specify a regressor
composed of elementary functions.
The parametrization routines will select the most efficient subset of
regressors out of the set selected by the user, in the sense of the
criteria specified above.
Once the initial set of regressors has been specified, the user still has
the opportunity to tell the system how to make a further selection among
those regressors, before the fitting process actually starts:
- The maximum degrees of the standard polynomials for each independent
variable are given by MAXPOW (array of dimension NDIM).
This input parameter is mandatory and there is no default provided for it.
- The total degree of a regressor made up from the product of standard
polynomials, may be limited by setting PSEL (via HSETPR) to a
positive value . PSEL controls the exclusion of the
combinations of standard polynomials which give the higher order
regressors (see the description of this parameter).
- Each initial regressor, whose code is passed to the logical function
HSELBF, written by the user, may be accepted or rejected according to user
criteria
- The argument ITERM (under control of IC below) may
be used as input to the routines to contain the list of the codes of the
acceptable regressors.
As soon as the best parametrization is obtained, the user may call the
DOUBLE PRECISION function HRVAL(X), which returns the value of
the parametrization, computed at point X, where X is an
array of dimension NDIM.
The program can handle up to 10 dimensions (i.e.
NDIM<=10) Up to 50 regressors may appear in the final
expression. This is controlled by the PNCX parameter set by the
routine HSETPR (see below).
Memory requirements: a 1-dimensional histogram with 100 channels and with
the maximum number of regressors PNCX set to 10, requires in
DOUBLE PRECISION version, less than 5000 words; for large values
of PNCX and a large number of points, memory behaves roughly as
2*NP*PNCX (DOUBLE PRECISION).
Histograms and plots
+----------------------------------------------------+
|CALL HPARAM (ID,IC,R2MIN,MAXPOW,COEFF*,ITERM*,NCO*) |
+----------------------------------------------------+
- Input parameters:
-
- ID
- Histogram or plot identifier.
- IC
- Control word (see below)
- R2MIN
- Maximum required value of multiple correlation
coefficient
- MAXPOW
- Maximum degrees of standard polynomials
(INTEGER array of size NDIM)
- ITERM
- Acceptable function codes (see explanation
above) (INTEGER array of size <=PNBF, see explanation
above)
- Output parameters:
-
- NCO
- number of regressors in final expression.
- COEFF
- Coefficients of terms
(DOUBLE PRECISION array of size NCO)
- ITERM
- Accepted regressors codes (INTEGER
array of size NDIM*NCO or NCO, see explanation above)
IC is a coded integer with the following fields:
7 6 5 4 3 2
IC= N*10 +R* 10 + C*10 +B*10 +T* 10 + W*10 + P*10 +S
- S
- This switch controls the superimposition of the
result when printing the histogram, it is effective only for 1-dimensional
histograms
- 1
- Resulting parametric fit superimposed on histogram
- 0
- No superposition
- P
- This switch controls the amount of information
printed during the fitting process
- 0
- Minimal output: the residual sum of squares is
printed
- 1
- Normal output: in addition, the problem
characteristics and options are printed; also the standard deviations and
confidence intervals of the coefficients
- 2
- extensive output: the results of each iteration
are printed with the normal output
- W
- This switch controls the weights to applied to the
data during the fit
- 0
- weights on histogram contents are already defined
via HBARX or HPAKE. If not they are taken to be equal to the square-root
of the contents
- 1
- weights are equal to 1
- T
- This switch controls the system-defined elementary
functions set to use
- 0
- Monomials will be selected as the standard
elementary functions
- 1
- Chebyshev polynomials with a definition region:
[-1,1]
- 2
- Legendre polynomials with a definition region:
[-1,1]
- 3
- shifted Chebyshev polynomials with a definition
region: [0,1]
- 4
- Laguerre polynomials with a definition region:
[0,+1]
- 5
- Hermite polynomials with a definition region:
[-1,+1]
- B
- This switch controls the kind of regressor used in
the fit
- 0
- Regressors are products of standard polynomials
(see preceeding switch)
- 1
- Regressors are products of user-defined elementary
functions. The user should write a DOUBLE PRECISION function
HELEFT(IEF,X) where IEF is the elementary function code in the
sense explained above: 10 times the function code plus 1. The function
code number can vary from 1 to PNEF, number of user supplied
elementary functions. PNEF has to be specified by the user before
calling HPARAM, via a call to the routine HSETPR described below. The
seconf parameter X is the point where the function should be
calculated.
- 2
- Regressors are user-defined, in this case the
variable PNBF, number of basic functions, must be set by the user
via a call to HSETPR (see below). User regressors values are to be
returned by the user-supplied DOUBLE PRECISION function
HBASFT(IBF,X) where IBF is the user-defined regressor code in the
sense explained above: 10 times the regressor code plus 2. The regressor
code number can vary from 1 to PNBF, number of basic functions
used for the parametrization. PNBF has to be specified by the
user before calling HPARAM, via a call to the routine HSETPR described
below, in the case that the switches B or C of the
control parameter IC have the value 2. The parameter X
is an array of length NDIM defining the coordinate where the
regressor should be calculated.
- C
- This switch controls the selection of the
regressors
- 0
- Regressors are selected by the system. The
parameter PSEL and the array MAXPOW help the user in
controlling the total degree of each regressor (see below).
- 1
- for each elementary function or for each
regressor, the logical function HSELBF(ELEF) gets called once before the
beginning of the actual fitting procedure to set up the table of available
elementary functions or regressors. IELEF is the code of the
regressor or of the function being tested for acceptance. A value of
.TRUE. returned will cause the regressor/elementary function to
be accepted. The default version of this function which is supplied in
HBOOK always returns the value .TRUE., while the user
may want to write his own version of this function to exclude some of the
regressors or of the elementary functions (according to the value of the B
switch he has selected).
- 2
- The input array ITERM contains a list of
selected regressors or elementary functions according to the value of the
switch B. The array has to be at least NDIM*PNBF words
and the variable PNBF, maximum number of elementary functions or
regressors, should have been set via a call to HSETPR. In the case of
elementary functions, the element ITERM(IDIM+(IBF-1)*NDIM) is the
function of the variable x to be multiplied by the coefficent number
IDIM
IBF, given a maximum expected number of PNBF
coefficents. This can be of course reduced by the fitting program under
considerations of linear dependency, as stated above, returning only
NCO coefficents, thus after the fit ITERM will identify
which regressors were actually used. In the case of user-defined
regressors, only the first PNBF position of the ITERM
array need to be filled, and the array will contain directly the code of
the selected user-defined regressors.
- R
- This switch controls the system selection
mechanism for the regressors:
- 0
- Stepwise regression, the regression is calculated
as explained before, and selected regressors may be eliminated in a
further step of the procedure
- 1
- Forward regression. The backward stage is
suppressed: a regressor included at one time will always remain in the
regression later on
- 2
- Fixed-term regression: all selected regressors
will appear in the final expression
- N
- This switch controls the normalization of the
X range during the computation.
- 0
- No normalization of X-range
- 1
- X scaled in the range [-1,1]
- 2
- X scaled in the range [0,1]
- 3
- X scaled in the range [0,+1]
The value of R2MIN is used to determine the satisfactory end of
the fitting process. If 0<R2MIN<1, this value will be compared to
the current sum of the squares of the residuals to determine whether the
fit should be stopped or continued. If R2MIN>=1, the fitting
process will be halted when the residual variance is minimal.
Various parameters which are relevant for the parametrization can be set
via the routine HSETPR.
+----------------------------+
| CALL HSETPR (CHNAME,VALUE) |
+----------------------------+
- Input parameters:
-
- CHNAME
- Name of the parameter to be set (of type
CHARACTER)
- VALUE
- Value assigned to parameter CHNAME
(of type REAL)
Possible values for CHNAME are:
- 'PNEF'
- This parameter specifies to the system the
number of elementary functions that are supplied by the user, in case the
switch B of the IC parameter is set to 1. The system
will build PNBF basic regressors made up by products of
NDIM elementary functions, possibly user selected, as specified
by the switch C of the parameter IC.
- 'PNBF'
- This parameter must be specified in case user
regressors are used (B switch of IC set to 2), or in
case the ITERM contains on input the selected regressors or basic
elementary functions (C switch of IC parameter set to
2). The parameter is ignored in the other cases. This is the total number
of user-defined regressors or elementary functions which the system will
use for the parametrization process. Please note that in case of user
given regressors, if the C switch of the IC parameter is
either 0 or 1, the logical function HSELBF(ELEF) will always be called in
order to verify the inclusion of the regressor IELEF in the list
for the current fitting operation.
- 'PSEL'
- This parameter is used only in case of
system-supplied elementary functions. It is the maximum limit for the sum
of the terms POW(I)/MAXPOW(I) in each regressor, where
POW(I) is the power of the system-supplied elementary function
th
for the I variable (1<=I<NDIM) and MAXPOW(I) is the
user supplied maximum value for the degree of the system-supplied
th
elementary functions for the I variable. This value must be
0. Note that setting PSEL to 0 selects the
constant term for all the regressors. Setting PSEL to a value
>=NDIM has the effect of removing any limitation on the total
degree of the regressors, leaving MAXPOW(I) as the only effective
limitation on the degree of the the elementary functions. This means that
the total degree of the regressors can be equal to the sum of the
NDIM elements of MAXPOW(I). The system supplied default
is 1.
- 'PFLV'
- This parameter determines the F-test
significance level for rejection of already included regressors; by
default it is set to 1. A higher value makes it more difficult for
regressors to remain in the regression. The value of PFLV has to
be >0.
- 'PLUN'
- This parameter indicates the logical unit for
writing the Fortran code of the function FPARAM(X). which gives the value
of the parametrization at point X. This Fortran code can be
compiled and used in subsequent jobs. By default no code is written.
- 'PNBX'
- Maximum number of regressors that may be
entered as input to the fitting process (after selection). By default
PNBX is set to 500; setting PNBX to an
appropriate value will avoid wasting space in dynamic area
/PAWC/. This parameter may be set to the same value of
PNBF or PNEF whichever is appropriate to save space.
- 'PNCX'
- Maximum number of regressors that may appear
in the final expression of the parametrization (this number has to be
<=50). Default setting is 50. The same remark applies as for
PNBX.
Distributions
+----------------------------------------------------------------+
|CALL HPARMN (X,Y,EY,NP,NVAR,IC,R2MIN,MAXPOW,COEFF*,ITERM*,NCO*) |
+----------------------------------------------------------------+
- Input parameters:
-
- X
- Coordinates of data points (array of size
NP*NVAR)
- Y
- Data to fit (array of length NP)
- EY
- Errors on data (array of length NP)
- NP
- Number of points to fit
- NVAR
- Dimension of X-space
- MAXPOW
- see HPARAM
- IC
- see HPARAM
- R2MIN
- see HPARAM
- Output parameters:
-
-
- See HPARAM.
Remark:
- See HPARAM
- The only difference concerns option W of IC
- W = 0 errors are taken from EY
- W = 1 errors are set to 1