In a subsequent test on some 10,740 formulas from the same source
[Gradshteyn and Ryzhik,
A Table of Integrals, Series, and Products]
(extracted from the file GRAD.DAT on the CD-ROM), a slightly
modified version of the recognizer announced 5906 errors and presumed
success for the remaining 4834. Again, the success rate is actually
lower, since the presumed successes include some 170 formulas with
unhandled derivative forms, as well as numerous unflagged,
semantically questionable forms. Of the 5906 reported errors, some
1878 are due to unrecognized control sequences that we have not yet
considered, including matrix constructions, equation alignment
sequences, and macros for many special function names. An additional
804 errors are due to \hbox constructions with unrecognized
contents, including more special function names and embedded narrative
comments. We suspect, therefore, that simply handling more special
function names (and their more complicated super/sub-scripting) would
allow the engine to recognize several thousand additional formulas.
Other errors are due to formula forms not handled by the grammar,
including some 300 ellipsis constructions (\dots, \cdots, \ldots)
and forms with unexpected punctuation or bracing. (We do not count
the 300 ellipses as "unrecognized control sequences" because we do
recognize them; we just do not know what to do with them!) Note that
the stated error counts are in fact estimates derived from tallying
parser error diagnostics. They may therefore be inaccurate, since a
single underlying problem can cascade into several diagnostics.
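
The arithmetic behind these figures can be checked with a short sketch. The counts below are the ones stated above; the category labels, the function name summarize, and the treatment of the three named categories as disjoint are our own assumptions for illustration, and the residual is only as reliable as the estimated counts themselves.

```python
# Counts as stated in the text. The dictionary keys are informal labels,
# not names used by the recognizer.
TOTAL_FORMULAS = 10_740
REPORTED_ERRORS = 5_906
PRESUMED_SUCCESSES = 4_834
UNHANDLED_DERIVATIVES = 170  # presumed successes that are not real successes

NAMED_ERROR_CATEGORIES = {
    "unrecognized control sequences": 1_878,
    "hbox with unrecognized contents": 804,
    "ellipsis constructions": 300,
}


def summarize():
    """Derive rates and the unattributed error residual from the stated counts."""
    # Every formula is either a reported error or a presumed success.
    assert REPORTED_ERRORS + PRESUMED_SUCCESSES == TOTAL_FORMULAS

    # Upper bound on real successes: drop the unhandled derivative forms.
    # The unflagged questionable forms are not counted in the text, so the
    # true figure is lower still.
    adjusted_successes = PRESUMED_SUCCESSES - UNHANDLED_DERIVATIVES

    # Assumption: the three named categories do not overlap.
    named_errors = sum(NAMED_ERROR_CATEGORIES.values())

    return {
        "presumed_rate": PRESUMED_SUCCESSES / TOTAL_FORMULAS,
        "adjusted_successes": adjusted_successes,
        "adjusted_rate": adjusted_successes / TOTAL_FORMULAS,
        "named_errors": named_errors,
        "other_errors": REPORTED_ERRORS - named_errors,
    }


if __name__ == "__main__":
    for key, value in summarize().items():
        if isinstance(value, float):
            print(f"{key}: {value:.1%}")
        else:
            print(f"{key}: {value}")
```

Under these assumptions the presumed success rate is about 45.0%, falling to at most about 43.4% once the 170 derivative forms are excluded, and roughly 2924 of the reported errors fall outside the three named categories.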