+----------------------------------+
| CALL HDIFF (ID1,ID2,PROB*,CHOPT) |
+----------------------------------+

Action: Statistical test of compatibility in shape between two histograms, using the Kolmogorov test. The histograms are compared and the probability that they could come from the same parent distribution is calculated.
The comparison may be made between two 1-dimensional histograms or between two 2-dimensional histograms. Further details on the method are given below.
Remark:
It is possible to compare weighted with weighted histograms, and weighted with unweighted histograms, but only if HBOOK has been instructed to maintain the necessary information by appropriate calls (before filling) to HBARX. However it is not possible to take into account underflow or overflow bins if the events are weighted.
If there is saturation (more than the maximum allowed contents in one or more bins), the probability PROB is calculated as if the bin contents were exactly at their maximum value, ignoring the saturation. This usually, but not always, results in a higher value of PROB than would be obtained if memory allowed the full contents to be stored. The results of HDIFF are therefore not accurate when there is saturation, and it is the user's responsibility to avoid this condition.
Routine HDIFF cannot compare 2-dimensional histograms if the events are weighted, since, in the current version of HBOOK, the necessary information is not maintained in that case. HDIFF will also refuse to compare 2-dimensional histograms if there is saturation, since here too it does not have enough information.
The calculations in routine HDIFF are based on the Kolmogorov Test (see, e.g., [bib-EADIE], pages 269-270). It is usually superior to the better-known Chisquare Test, for reasons discussed below.
In discussing the Kolmogorov test, we must distinguish between the two most important properties of any test: its power and the calculation of its confidence level.
The job of a statistical test is to distinguish between a null hypothesis (in this case: that the two histograms are compatible) and the alternative hypothesis (in this case: that the two are not compatible). The power of a test is defined as the probability of rejecting the null hypothesis when the alternative is true. In our case, the alternative is not well-defined (it is simply the ensemble of all hypotheses except the null) so it is not possible to tell whether one test is more powerful than another in general, but only with respect to certain particular deviations from the null hypothesis.

Based on considerations such as those given above, as well as considerable computational experience, it is generally believed that tests like the Kolmogorov or Smirnov-Cramer-Von-Mises (which is similar but more complicated to calculate) are probably the most powerful for the kinds of phenomena generally of interest to high-energy physicists. This is especially true for two-dimensional data, where the Chisquare Test is of little practical use since it requires either enormous amounts of data or very big bins.
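The comparison described above can be sketched as follows for two binned, unweighted 1-dimensional histograms: form the two normalised cumulative distributions, take the maximum distance D between them, and convert sqrt(Neff)*D to a probability with the asymptotic Kolmogorov distribution. This is an illustrative Python sketch under those assumptions; the names hdiff_prob and kolmogorov_prob are hypothetical and this is not the HBOOK source code.

```python
import math

def kolmogorov_prob(lam):
    """Asymptotic Kolmogorov probability:
    Q(lambda) = 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 lambda^2)."""
    if lam < 0.2:
        return 1.0  # series converges poorly here and Q is effectively 1
    total = 0.0
    for k in range(1, 101):
        term = 2.0 * (-1.0) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
        total += term
        if abs(term) < 1e-10:
            break
    return max(0.0, min(1.0, total))

def hdiff_prob(h1, h2):
    """Probability that two histograms (lists of bin contents with
    identical binning) come from the same parent distribution."""
    n1, n2 = float(sum(h1)), float(sum(h2))
    c1 = c2 = 0.0
    dmax = 0.0
    for b1, b2 in zip(h1, h2):
        c1 += b1 / n1                  # cumulative fraction, histogram 1
        c2 += b2 / n2                  # cumulative fraction, histogram 2
        dmax = max(dmax, abs(c1 - c2)) # Kolmogorov distance D
    neff = math.sqrt(n1 * n2 / (n1 + n2))  # effective number of events
    return kolmogorov_prob(neff * dmax)
```

Identical histograms give D = 0 and hence a probability of 1; histograms with disjoint populated bins give a probability close to 0.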
Using the terms introduced above, the confidence level is just the probability of rejecting the null hypothesis when it is in fact true. That is, if you accept the two histograms as compatible whenever the value of PROB is greater than 0.05, then truly compatible histograms should fail the test exactly 5% of the time. The value of PROB returned by HDIFF is calculated such that it will be uniformly distributed between zero and one for compatible histograms, provided the data are not binned (or the number of bins is very large compared with the number of events).

Users who have access to unbinned data and wish exact confidence levels should therefore not put their data into histograms, but should save them in ordinary Fortran arrays and call the routine TKOLMO which is being introduced into the Program Library. On the other hand, since HBOOK is a convenient way of collecting data and saving space, the routine HDIFF has been provided, and we believe it is the best test for comparison even on binned data.

However, the values of PROB for binned data will be shifted slightly higher than expected, depending on the effects of the binning. For example, when comparing two uniform distributions of 500 events in 100 bins, the values of PROB, instead of being exactly uniformly distributed between zero and one, have a mean value of about 0.56.

Since we are physicists, we can apply a useful rule: as long as the bin width is small compared with any significant physical effect (for example the experimental resolution), the binning cannot have an important effect. Therefore, we believe that for all practical purposes, the probability value PROB is calculated correctly provided the user is aware that:
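The unbinned comparison recommended above can be sketched as follows: merge the two sorted samples, track the maximum distance between the two empirical cumulative distributions, and convert it to a probability with the asymptotic Kolmogorov formula. The function tkolmo_prob below is an illustrative Python stand-in for that procedure, not the Program Library routine TKOLMO itself.

```python
import math
import random

def kolmogorov_prob(lam):
    """Asymptotic Kolmogorov probability Q(lambda)."""
    if lam < 0.2:
        return 1.0
    total = 0.0
    for k in range(1, 101):
        term = 2.0 * (-1.0) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
        total += term
        if abs(term) < 1e-10:
            break
    return max(0.0, min(1.0, total))

def tkolmo_prob(a, b):
    """Two-sample Kolmogorov test on unbinned data."""
    a, b = sorted(a), sorted(b)
    na, nb = len(a), len(b)
    ia = ib = 0
    dmax = 0.0
    while ia < na and ib < nb:
        if a[ia] <= b[ib]:     # advance whichever empirical CDF steps next
            ia += 1
        else:
            ib += 1
        dmax = max(dmax, abs(ia / na - ib / nb))
    return kolmogorov_prob(math.sqrt(na * nb / (na + nb)) * dmax)

random.seed(1)
x = [random.gauss(0.0, 1.0) for _ in range(500)]
y = [random.gauss(0.0, 1.0) for _ in range(500)]
p_same = tkolmo_prob(x, y)   # compatible samples: PROB roughly uniform in (0,1)
z = [random.gauss(1.0, 1.0) for _ in range(500)]
p_diff = tkolmo_prob(x, z)   # mean shifted by one sigma: PROB essentially zero
```

Because no binning is involved, the resulting probability is free of the upward shift described above for binned data.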