More general Ntuples: Column-Wise-Ntuples (CWN)

A CWN supports the storage of the following data types: floating point numbers (REAL*4 and REAL*8), integers, bit patterns (unsigned integers), booleans and character strings.

Data Compression

Floating point numbers, integers and bit patterns can be packed by specifying a range of values or by explicitly specifying the number of bits that should be used to store the data. Booleans are always stored using one bit. Unused trailing array elements will not be stored when an array depends on an index variable. In that case only as many array elements will be stored as specified by the index variable.

For example, the array definition NHITS(NTRACK) defines NHITS to depend on the index variable NTRACK. When NTRACK is 16, the elements NHITS(1..16) are stored; when NTRACK is 3, only elements NHITS(1..3) are stored, and so on.

Storage Model

Column-wise storage allows direct access to any column in the Ntuple. Histogramming one column from a 300 column CWN requires reading only about 1/300 of the total data set. However, this storage scheme requires one memory buffer per column as opposed to only one buffer in total for an RWN. By default the buffer length is 1024 words, in which case a 100 column Ntuple requires 409600 bytes of buffer space. In general, performance increases with buffer size. Therefore, one should tune the buffer size (using routine HBSET) as a function of the number of columns and the amount of available memory. Highest efficiency is obtained when setting the buffer size equal to the record length of the RZ HBOOK file (as specified in the call to HROPEN). A further advantage of column-wise storage is that a CWN can easily be extended with one or more new columns.
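
The buffer-space figure quoted above can be checked with a few lines of Fortran. This is a stand-alone sketch rather than HBOOK code: it assumes 4-byte machine words, and the program and variable names (BUFMEM, NCOL, NWBUF, NBYTEW) are chosen here purely for illustration.

```fortran
      PROGRAM BUFMEM
*     Stand-alone arithmetic check; no HBOOK routines are called.
*     Assumption: one machine word occupies 4 bytes.
      INTEGER NCOL, NWBUF, NBYTEW, NBYTES
      PARAMETER (NBYTEW = 4)
*     Number of columns and buffer length in words (HBSET default)
      NCOL  = 100
      NWBUF = 1024
*     One buffer per column, NWBUF words each
      NBYTES = NCOL*NWBUF*NBYTEW
      PRINT *, 'Buffer space (bytes):', NBYTES
      END
```

With the values shown the program prints 409600, the figure given in the text. Changing NCOL or NWBUF gives a quick estimate of the memory needed for other layouts.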

Columns are logically grouped into blocks (physically, however, all columns are independent). Blocks allow users to extend a CWN with private columns or to group relevant columns together. New blocks can even be defined after a CWN has been filled. The newly created blocks can be filled using routine HFNTB. For example, a given experiment might define a number of standard Ntuples. These would be booked in a section of the code that would not normally be touched by an individual physicist. However, with a CWN a user may easily add one or more blocks of information as required for their particular analysis.

Note that arrays are treated as a single column. This means that a CWN will behave like an RWN, with the addition of data typing and compression, if only one array of NVAR elements is declared. This is not recommended, as one thereby loses the direct column access capabilities of a CWN.

Performance

Accessing only a small fraction of the defined columns is much faster with a CWN than with an RWN. However, reading a complete CWN will take slightly longer than reading an RWN, due to the overhead introduced by the type checking and compression mechanisms and because the data is not stored sequentially on disk. The performance increase with a CWN will most clearly show up when using PAW, where one typically histograms one column with cuts on a few other columns. The advantages of having different data types and data compression generally outweigh the performance penalty incurred when reading a complete CWN.

Booking a CWN

The booking is performed in two stages. Firstly, a call to HBNT is made to declare the Ntuple identifier and title. Secondly, one or more calls are made to HBNAME or HBNAMC to describe the variables that are to be stored in the Ntuple. Routine HBNAMC is used to define CHARACTER variables, while all other variable types are defined with routine HBNAME.

                      +----------------------------+
                      |CALL  HBNT (ID,CHTITL,CHOPT) |
                      +----------------------------+
                                  

Action: Books a CWN.

Input parameters:
ID
Identifier of the Ntuple.
CHTITL
Character variable specifying the title associated to the Ntuple.
CHOPT
Character variable specifying the desired options.
' '
for disk resident Ntuples (default).
'D'
same as ' '.
'M'
for memory resident Ntuples.

The CWN will be stored in the current HBOOK directory. The variables to be stored in the Ntuple will be specified with routine HBNAME or HBNAMC described below.

When the CWN is filled with HFNT, the memory buffers associated with each column are written to the file and directory that was the current working directory when HBNT was called. Remember that when routine HROPEN is called, the current working directory is automatically set to the top directory of that file. It is therefore convenient to call HBNT immediately after HROPEN. If this is not the case, routine HCDIR must be called prior to HBNT to set the current working directory. When the Ntuple has been filled (via calls to HFNT), the resident buffers in memory as well as the Ntuple header must be written to the file with a call to HROUT. Before calling HROUT, the current working directory must be set to the directory that was current when HBNT was called.
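
The directory handling described above might look as follows when HBNT is not called immediately after HROPEN. This is a minimal sketch that needs the HBOOK library to link; the file name example.hbook, the identifier 10, the block name EVT and the common block /CEVT/ are illustrative assumptions, and the switch to //PAWC merely stands in for any intervening code that changes the current directory.

```fortran
      PROGRAM CWNDIR
*     Sketch only: needs the HBOOK library (CERNLIB) to link and run.
      PARAMETER (NWPAWC = 200000)
      COMMON /PAWC/ H(NWPAWC)
      COMMON /CEVT/ IEVENT
      CALL HLIMIT(NWPAWC)
*     HROPEN sets the current directory to //MYFILE
      CALL HROPEN(1,'MYFILE','example.hbook','N',1024,ISTAT)
*     Stand-in for other code that changes the current directory
      CALL HCDIR('//PAWC',' ')
*     Return to the file directory before booking the CWN
      CALL HCDIR('//MYFILE',' ')
      CALL HBNT(10,'Example',' ')
      CALL HBNAME(10,'EVT',IEVENT,'IEVENT')
      DO 10 I = 1, 100
         IEVENT = I
         CALL HFNT(10)
   10 CONTINUE
*     Same directory as at booking time before writing the header
      CALL HCDIR('//MYFILE',' ')
      CALL HROUT(10,ICYCLE,' ')
      CALL HREND('MYFILE')
      CLOSE(1)
      END
```

The second HCDIR call is redundant in this short program, but it shows where the directory would have to be restored if other code had changed it between filling and the call to HROUT.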

                     +------------------------------+
                     |CALL  HBSET (CHOPT,IVAL,IERR*) |
                     +------------------------------+
                                  

Action: Set global Ntuple options.

Input parameters:
CHOPT
Character variable specifying the parameter to set.
'BSIZE'
Set the buffer size. For each variable (i.e. column) a buffer of BSIZE words is created in memory. The default for BSIZE is 1024.
IVAL
Value for the parameter specified with CHOPT.
Output parameters:
IERR
Error return code (=0 means no errors).

If the total memory in /PAWC/, allocated via HLIMIT, is not sufficient to accommodate all the column buffers, HBNT will automatically reduce the buffer size so that all buffers fit into memory. It is strongly recommended to allocate enough memory to /PAWC/ so that each column buffer is at least equal to the block size of the file. A simple rule of thumb in the case of no data compression is to have NWPAWC>NCOL*LREC, where NWPAWC is the total number of words allocated by HLIMIT, LREC is the block size of the file in machine words as given in the call to HROPEN and NCOL is the number of columns.
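
The rule of thumb can be built into the declarations of a job, as in the sketch below. It needs the HBOOK library to link. The values of NCOL and LRECW, the factor 2 of headroom, the file name and the single booked variable are illustrative assumptions, and the sketch assumes that HBSET has to be called before HBNT for the buffer size to apply to that Ntuple.

```fortran
      PROGRAM CWNBUF
*     Sketch only: needs the HBOOK library (CERNLIB) to link and run.
*     NCOL, LRECW and the file name are illustrative assumptions.
      PARAMETER (NCOL = 100, LRECW = 1024)
*     Rule of thumb: NWPAWC > NCOL*LREC; factor 2 leaves headroom
      PARAMETER (NWPAWC = 2*NCOL*LRECW)
      COMMON /PAWC/ H(NWPAWC)
      COMMON /CEVT/ IEVENT
      CALL HLIMIT(NWPAWC)
      LREC = LRECW
      CALL HROPEN(1,'MYFILE','example.hbook','N',LREC,ISTAT)
*     Assumed ordering: set the buffer size before booking
      CALL HBSET('BSIZE',LREC,IERR)
      IF (IERR.NE.0) PRINT *, 'HBSET error, IERR =', IERR
      CALL HBNT(10,'Example',' ')
*     Further HBNAME calls would describe the remaining columns
      CALL HBNAME(10,'EVT',IEVENT,'IEVENT')
      CALL HROUT(10,ICYCLE,' ')
      CALL HREND('MYFILE')
      CLOSE(1)
      END
```

Setting BSIZE equal to the record length follows the recommendation given in the storage model discussion; a smaller /PAWC/ would still work, since HBNT reduces the buffer size when memory is short.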

Describing the columns of a CWN

              +--------------------------------------------+
              | CALL  HBNAME (ID, CHBLOK, VARIABLE, CHFORM) |
              +--------------------------------------------+
                                  

              +--------------------------------------------+
              | CALL  HBNAMC (ID, CHBLOK, VARIABLE, CHFORM) |
              +--------------------------------------------+
                                  

Action: Describe the variables to be stored in a CWN (non-character and character variables, respectively).

Input parameters:
ID
Identifier of the Ntuple as in the call to HBNT.
CHBLOK
Character variable of maximum length 8 characters specifying the name by which the block of variables described by CHFORM is identified.
VARIABLE
The first variable that is described in CHFORM. Variables must be in common blocks but may not be in a ZEBRA bank. For example, given the common block CEXAM described below, one would call HBNAME with the argument IEVENT. In the case of character variables, the routine HBNAMC must be used. In all other cases one should use HBNAME.
CHFORM
Can be either a character string describing the variables to be stored in block CHBLOK or:
'$CLEAR'
To clear the addresses of all variables in the Ntuple.
'$SET'
To set the addresses in which to write the values of all variables in block CHBLOK.
The last two forms are used before reading back the Ntuple data using HGNT, HGNTB, HGNTV or HGNTF. See also HUWFUN.

With CHFORM the variables, their type, size and, possibly, range (or packing bits) can all be specified at the same time. Note, however, that variable names should be unique, even when they are in different blocks. In the simplest case CHFORM corresponds to the COMMON declaration. For example:

       COMMON /CEXAM/ IEVENT, ISWIT(10), IFINIT(20), NEVENT, NRNDM(2)

can be described by the following CHFORM:

       CHFORM = 'IEVENT, ISWIT(10), IFINIT(20), NEVENT, NRNDM(2)'

In this case the Fortran type conventions are followed, the default sizes are taken and no packing is done. Note that to get a one-to-one correspondence between the COMMON and the CHFORM statements, the dimensions of the variables are specified in the COMMON and not in a DIMENSION statement.

The default type and size of a variable can be overridden by extending the variable name with :<t>*<s>, where <t> is the type and <s> the size in bytes:

     <t>   type of variable    values of <s>                     default   routine
      R    floating-point      4, 8                              4         HBNAME
      I    integer             4, 8                              4         HBNAME
      U    unsigned integer    4, 8                              4         HBNAME
      L    logical             4                                 4         HBNAME
      C    character           4<=s<=32 (multiple of 4)          4         HBNAMC

When the range of a type U, I or R variable is known, its storage size (number of packing bits) may be added behind the :<t>*<s> specification using :<b> for types U and I and :<b>:[<min>,<max>] for type R. Floating-point numbers are packed into an integer using:

IPACK = ((R - <min>)/(<max> - <min>)) * (2**<b> - 1) + 0.5

When :<b>... is not specified, a variable is stored using the number of bytes given by <s> or the default. If the default type and size of a variable should be used, the packing bits can be specified as ::<b>... The value <b> must be in the range 1<=b<=8*<s>. Automatic bit packing will happen, for types U and I, when a range is specified as in ICNT[-100,100]. In this case ICNT will be packed in 8 bits (7 bits for the integer part and 1 bit for the sign). If CNT is an integer ranging from -100 to 100, it can be specified either as CNT[-100,100]:I or as CNT:I::[-100,100]. Logical variables will always be stored in 1 bit. All variables must be aligned on a word boundary and character variables must have a length that is a multiple of 4. The maximum length of a variable name is 32 characters.
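
The packing rule for type R variables can be tried out in a stand-alone program. The sketch below assumes that the formula scales R linearly from the range [RMIN,RMAX] onto the integers 0 to 2**NBITS-1 and rounds by adding 0.5 before truncation. The unpacking step is an assumed inverse added for illustration, not taken from HBOOK, and all names (PACKEX, RMIN, RMAX, NBITS, RBACK) are hypothetical.

```fortran
      PROGRAM PACKEX
*     Stand-alone illustration; no HBOOK routines are called.
      REAL R, RMIN, RMAX, RBACK
      INTEGER NBITS, IPACK
*     Declared range and number of packing bits
      RMIN  = 0.0
      RMAX  = 10.0
      NBITS = 8
      R     = 2.5
*     Pack: scale onto 0 .. 2**NBITS-1, round to nearest integer
      IPACK = INT((R - RMIN)/(RMAX - RMIN)*(2**NBITS - 1) + 0.5)
*     Unpack: assumed inverse of the scaling above
      RBACK = RMIN + REAL(IPACK)*(RMAX - RMIN)/REAL(2**NBITS - 1)
      PRINT *, 'IPACK =', IPACK
      PRINT *, 'RBACK =', RBACK
      END
```

For these values IPACK is 64 and the unpacked value is about 2.51, so the price of packing into 8 bits is a step size of (RMAX-RMIN)/255, roughly 0.04 here. Choosing more bits or a narrower range reduces the step accordingly.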

Variable-length Ntuple rows and looping over array components are also supported to optimize Ntuple storage and Ntuple plotting. Variable row length can occur when arrays in the Ntuple depend on an index variable.

          Example of a variable row length CWN definition
                                  

      PARAMETER (MAXTRK = 100, MAXHIT = 300)
      COMMON /CMTRK/ NTRACK, NHITS(MAXTRK), PX(MAXTRK), PY(MAXTRK),
     +               PZ(MAXTRK), XHITS(MAXHIT,MAXTRK),
     +               YHITS(MAXHIT,MAXTRK), ZHITS(MAXHIT,MAXTRK)
      CALL HBNAME(ID,'VARBLOK2',NTRACK,
     +            'NTRACK[0,100], NHITS(NTRACK)[0,300],'//
     +            'PX(NTRACK), PY(NTRACK), PZ(NTRACK),'//
     +            'XHITS(300,NTRACK), YHITS(300,NTRACK),'//
     +            'ZHITS(300,NTRACK)')

In this example the number of elements to store in one Ntuple row depends on the number of tracks, NTRACK. The call to HBNAME declares NTRACK to be an index variable and makes the size of the Ntuple row depend on the value of this index variable. The range of an index variable is specified using [<l>,<u>], where <l> is the lower and <u> the upper limit of the arrays using this index variable. In the above example the lower limit of NTRACK is 0 and the upper limit is 100 (= MAXTRK). While filling a CWN, HBOOK can also easily test for array out-of-bound errors since it knows the range of NTRACK. Only the last dimension of a multi-dimensional array may be variable and the index variable must be specified in the block where it is used. Array lower bounds must, just like the lower range of the index variable, be 0.
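
Filling such a variable row length CWN could be sketched as follows. The subroutine needs the HBOOK library to link and assumes that the block has been booked as in the example above. The routine name FILTRK and the argument NFOUND are hypothetical, and the zero values merely stand in for real track data.

```fortran
      SUBROUTINE FILTRK(ID, NFOUND)
*     Sketch only: needs the HBOOK library (CERNLIB) to link.
*     FILTRK and NFOUND are hypothetical names.
      PARAMETER (MAXTRK = 100, MAXHIT = 300)
      COMMON /CMTRK/ NTRACK, NHITS(MAXTRK), PX(MAXTRK), PY(MAXTRK),
     +               PZ(MAXTRK), XHITS(MAXHIT,MAXTRK),
     +               YHITS(MAXHIT,MAXTRK), ZHITS(MAXHIT,MAXTRK)
*     Keep the index variable inside its declared range [0,100]
      NTRACK = MAX(0, MIN(NFOUND, MAXTRK))
*     Only elements 1..NTRACK need to be set for this row;
*     real code would copy reconstructed track data here
      DO 10 I = 1, NTRACK
         NHITS(I) = 0
         PX(I)    = 0.0
         PY(I)    = 0.0
         PZ(I)    = 0.0
   10 CONTINUE
*     One call stores the row; its length follows NTRACK
      CALL HFNT(ID)
      END
```

Clamping NTRACK before the call is a defensive choice made for this sketch; the text above notes that HBOOK itself can test the index variable against its declared range.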

HBNAME may be called more than once per block as long as no data has been stored in the block. New blocks can be added to an Ntuple at any time, even after filling has started, whereas existing blocks may only be extended before filling has started.
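
The '$CLEAR' and '$SET' forms of CHFORM come into play when an Ntuple is read back. The sketch below needs the HBOOK library to link and assumes a file geant.ntup containing a CWN with identifier 10 and a block RUN holding the integers IDRUN, IDEVT and IEORUN. The common block /CRUN/ is a hypothetical name, and the routines HRIN and HNOENT are used here on the assumption that they load the Ntuple header and return the number of rows.

```fortran
      PROGRAM CWNRD
*     Sketch only: needs the HBOOK library (CERNLIB) to link and run.
*     File, identifier and block layout are assumptions.
      PARAMETER (NWPAWC = 200000)
      COMMON /PAWC/ H(NWPAWC)
      COMMON /CRUN/ IDRUN, IDEVT, IEORUN
      CALL HLIMIT(NWPAWC)
      LREC = 1024
      CALL HROPEN(1,'MYFILE','geant.ntup',' ',LREC,ISTAT)
*     Load the Ntuple header (highest cycle) into memory
      CALL HRIN(10,9999,0)
*     Forget all addresses, then attach block RUN to /CRUN/
      CALL HBNAME(10,' ',0,'$CLEAR')
      CALL HBNAME(10,'RUN',IDRUN,'$SET')
      CALL HNOENT(10,NROWS)
      DO 10 IROW = 1, NROWS
         CALL HGNT(10,IROW,IERR)
         IF (IERR.NE.0) GO TO 20
*        Show the first few rows only
         IF (IROW.LE.5) PRINT *, IDRUN, IDEVT, IEORUN
   10 CONTINUE
   20 CALL HREND('MYFILE')
      CLOSE(1)
      END
```

The variables in /CRUN/ are declared in the same order as in the block description used when the Ntuple was written, so that the values of the block land in the intended variables.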

Filling a CWN

                            +----------------+
                            | CALL  HFNT (ID) |
                            +----------------+
                                  

Action: Fill a CWN.

Input parameter:
ID
Identifier of the CWN.

    Example of saving contents of common variables in an Ntuple
                                  

        COMMON/GCFLAG/IDEBUG,IDEMIN,IDEMAX,ITEST,IDRUN,IDEVT,IEORUN
       +        ,IEOTRI,IEVENT,ISWIT(10),IFINIT(20),NEVENT,NRNDM(2)
        COMMON/GCTRAK/VECT(7),GETOT,GEKIN,VOUT(7),NMEC,LMEC(MAXMEC)
       + ,NAMEC(MAXMEC),NSTEP ,MAXNST,DESTEP,DESTEL,SAFETY,SLENG
       + ,STEP  ,SNEXT ,SFIELD,TOFG  ,GEKRAT,UPWGHT,IGNEXT,INWVOL
       + ,ISTOP ,IGAUTO,IEKBIN, ILOSL, IMULL,INGOTO,NLDOWN,NLEVIN
       + ,NLVSAV,ISTORY
        COMMON/GCTMED/NUMED,NATMED(5),ISVOL,IFIELD,FIELDM,TMAXFD,DMAXMS
       +      ,DEEMAX,EPSIL,STMIN,CFIELD,PREC,IUPD,ISTPAR,NUMOLD
        CHARACTER*4 TYPE
        COMMON/CMCC/TYPE
*     The code to book and fill the Ntuple would look like this:
*
*   Initialisation phase.
*   Note that the calls to HROPEN, HBNT and HBNAME
*   may be placed in different initialisation routines.
*   In this case the Ntuple will be stored in directory //MYFILE.
*

        CALL HROPEN(1,'MYFILE','geant.ntup','N',1024,ISTAT)
        CALL HBNT(10,'Geant Ntuple',' ')
*
        CALL HBNAME(10, 'RUN',    IDRUN,  'IDRUN::16,IDEVT::16')
        CALL HBNAME(10, 'RUN',    IEORUN, 'IEORUN::16')
        CALL HBNAME(10, 'VECT',   VECT,   'VECT(6)')
        CALL HBNAME(10, 'GEKIN',  GEKIN,  'GEKIN')
        CALL HBNAME(10, 'INWVOL', INWVOL, 'INWVOL[1,7],ISTOP[1,7]')
        CALL HBNAME(10, 'NUMED',  NUMED,  'NUMED::10')
        CALL HBNAME(10, 'NSTEP',  NSTEP,  'NSTEP::16')
        CALL HBNAMC(10, 'TYPE',   TYPE,   'TYPE:C')
*
*    to fill the Ntuple, when the common blocks are filled just invoke
*    routine HFNT which knows the addresses and the number of variables.
*
        DO 10 I = 1, 1000000
           ...
           CALL HFNT(10)
10      CONTINUE
*
*    at the end of the job, proceed as usual
*
        CALL HROUT(10, ICYCLE, ' ')
        CALL HREND('MYFILE')
 
                        +------------------------+
                        | CALL  HFNTB (ID,CHBLOK) |
                        +------------------------+
                                  

Action: Fill the named block CHBLOK in a CWN.

Input parameters:
ID
Identifier of the Ntuple.
CHBLOK
Character variable specifying the block that is to be filled.
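
A CWN with a standard block and a private user block might be filled block by block as sketched below. The code needs the HBOOK library to link; the file name, the identifier 10, the block names STD and USER and the variables IEVENT and XMINE are illustrative assumptions. The sketch fills every block once per event so that all blocks hold the same number of rows; whether that is strictly required is not stated here.

```fortran
      PROGRAM CWNBLK
*     Sketch only: needs the HBOOK library (CERNLIB) to link and run.
*     File, identifier, block and variable names are assumptions.
      PARAMETER (NWPAWC = 200000)
      COMMON /PAWC/ H(NWPAWC)
      COMMON /CSTD/ IEVENT
      COMMON /CUSR/ XMINE
      CALL HLIMIT(NWPAWC)
      CALL HROPEN(1,'MYFILE','example.hbook','N',1024,ISTAT)
      CALL HBNT(10,'Example',' ')
*     Standard block, booked by common experiment code
      CALL HBNAME(10,'STD',IEVENT,'IEVENT')
*     Private block added by an individual user
      CALL HBNAME(10,'USER',XMINE,'XMINE:R')
      DO 10 I = 1, 100
         IEVENT = I
         XMINE  = 0.5*I
*        Fill each block explicitly by name
         CALL HFNTB(10,'STD')
         CALL HFNTB(10,'USER')
   10 CONTINUE
      CALL HROUT(10,ICYCLE,' ')
      CALL HREND('MYFILE')
      CLOSE(1)
      END
```

Calling HFNT(10) inside the loop instead of the two HFNTB calls would fill all blocks at once; HFNTB is of interest when the blocks are filled from different parts of the code.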

                            +----------------+
                            |CALL  HPRNT (ID) |
                            +----------------+

                                  

Action: Print the definition of the CWN ID as defined by the calls to HBNAME and/or HBNAMC.

Input parameter:
ID
Identifier of the CWN.

Recovery procedure

The Ntuple header, containing the essential definitions associated with an Ntuple, is now written to the output file when the first buffer is written. If the job producing the Ntuple does not terminate cleanly (i.e. the job crashes or HROUT is never called), it is now possible to rebuild the Ntuple header from the information available in the file. Note, however, that the events corresponding to the last Ntuple buffer in memory are lost.

                        +------------------------+
                        | CALL  HRECOV (ID,CHOPT) |
                        +------------------------+
                                  

Action: Recover the information associated with a CWN.

Input parameters:
ID
Identifier of the CWN.
CHOPT
Character variable specifying the option desired. Not used at present; ' ' should be specified.
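
A recovery job could be as short as the sketch below. It needs the HBOOK library to link and rests on several assumptions that are not confirmed by the text above: that the damaged file is called geant.ntup and holds a CWN with identifier 10, that the file has to be opened for update with option 'U' of HROPEN, and that no further call is needed after HRECOV to store the rebuilt header.

```fortran
      PROGRAM CWNREC
*     Sketch only: needs the HBOOK library (CERNLIB) to link and run.
*     File name, identifier and update mode are assumptions.
      PARAMETER (NWPAWC = 200000)
      COMMON /PAWC/ H(NWPAWC)
      CALL HLIMIT(NWPAWC)
      LREC = 1024
*     Assumed: 'U' opens the existing file for update
      CALL HROPEN(1,'MYFILE','geant.ntup','U',LREC,ISTAT)
      IF (ISTAT.NE.0) STOP 'cannot open file'
*     Rebuild the Ntuple header from the buffers on file
      CALL HRECOV(10,' ')
      CALL HREND('MYFILE')
      CLOSE(1)
      END
```

It is prudent to run such a job on a copy of the file, since the rows held in the last memory buffer of the crashed job cannot be recovered in any case.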