PopTools logo

Numerical algorithms used in PopTools

Early versions of Excel (before 2003) used a pseudo-random number generator (PRNG) that had relatively poor statistical properties, and some of the statistical functions could return inaccurate results (Kn├╝sel 1998, McCullough & Wilson 1999). These issues are unlikely to affect most users, and appear to have been fixed in more recent versions (see, for example, Microsoft’s knowledgebase articles on the PRNG, and the Analysis Toolpack).

I have tried to use robust algorithms in PopTools. If you need to know about specific algorithm, e-mail me, and I may be able to send the code. If you would like PopTools to become completely open source, I will need to be inundated with requests :-)

Here are some general references for the algorithms used in PopTools:

  1. All PopTools functions and procedures that depend on random variables use the Mersenne Twister algorithm for generation of pseudo-random numbers. This PRNG has excellent statistical properties, and has passed George Marsaglia’s battery of tests called DIEHARD. All routines for generating random variables, and for shuffling and resampling call this routine rather than Excel’s built-in PRNG. The PopTools function dRandReal can be used as a replacement for Excel’s RAND() function.

  2. The REGRESSION and REGRESS routines are replacements for Excel’s LINEST function, which apparently had problems with near singular data matrices in Excel (before the version of 2003). The PopTools routines have been tested with the statistical reference datasets provided by the National Institutes of Standards in Technology (NIST). You can download a zipped workbook showing the results of the tests – and methods for using the functions – here. Main features of the function “Regression” are better performance than LINEST with highly collinear problems, and correct R2 with forced intercept. The function can be entered using the menu system.

  3. The new routines for parameter estimation using ML (PopTools version 2.0 and above) use open-source code in the TPMath library. Some routines for function minimisation in PopTools depend on Solver. Apparently, Solver can report solutions that are not local minima, especially with the default options (Kn├╝sel 1998, McCullough & Wilson 1999), although they may no longer be the case. Solver is an add-in made by Frontline Systems.

  4. Some routines in PopTools use Excels’ built-in statistical tables (eg, CHIINV, FINV and TTEST). If you see a P-value in the output, it probably depends on these routines, although the regression routines use the TPMath functions.

  5. None of the matrix routines depend on Excel’s built-in features; all use routines available in the open-source TPMath library.

  6. The likelihood routines do not use any of Excel’s calculation features. They should produce more reliable results at extreme values than Excel’s native formulas.

  7. I have tested the routines dPoisson and dNormDist: dPoisson gives accurate results; dNormDist gives slightly more accurate results than Excel’s NORMDIST function, but depends on a polynomial approximation that loses accuracy in the tails of the distribution.

  8. The ANOVA function in PopTools uses the simple algorithm discussed in various places (Sokal & Rohlf 1995, Kn├╝sel 1998, McCullough & Wilson 1999). It is useful – in the sense that it returns a dynamic result which can used, say, to simulate an experiment – but it should not be used for a formal statistical test, since it has the same inherent problems as Excel’s algorithm, which are described in the literature (McCullough & Wilson 1999, Kn├╝sel 1998, McCullough & Wilson 1999).

  9. The functions dVariance and dVarianceP introduced in PopTools 1.3.7 are replacements for Excel’s formulas (VAR and VARP). They use the usual formula: image rather than the hand computational formula image. I have tested then with Numacc3 and Numacc4. As stated in McCullough & Wilson (1999), they return several accurate digits with these data sets.


Footer image - matrix analysis