Calculation of the P-Value

Protein length L
Repeat length r
Number of repeats n


The probability P of finding at least n repeats of length r in a random sequence of length L equals the number of possible sequences that contain the desired repeat divided by the total number of possible sequences.

Formula 1


P* is an overestimate because sequences with more than n repeats are counted more than once. The inclusion-exclusion principle corrects for this over-counting. The corrected formula for P is:

Formula 2


This can be rewritten as follows:

Formula 3
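
The over-counting and its correction can be illustrated on a toy problem. The sketch below is not REPTILE's formula: it assumes a two-letter alphabet, a fixed word standing in for the repeat, and sequences short enough to enumerate. All function names are hypothetical. It compares the naive count (one contribution per start position), the inclusion-exclusion count, and the exact count obtained by direct enumeration.

```python
from itertools import combinations, product


def occurrence_sets(alphabet, length, word):
    """For each start position, the set of sequences containing `word` there."""
    sequences = ["".join(p) for p in product(alphabet, repeat=length)]
    return [
        {s for s in sequences if s.startswith(word, i)}
        for i in range(length - len(word) + 1)
    ]


def naive_count(sets):
    """Sum the per-position counts; sequences with several hits count several times."""
    return sum(len(s) for s in sets)


def inclusion_exclusion_count(sets):
    """Alternately subtract and add the sizes of all k-fold intersections."""
    total = 0
    for k in range(1, len(sets) + 1):
        sign = 1 if k % 2 else -1
        for combo in combinations(sets, k):
            total += sign * len(set.intersection(*combo))
    return total


def exact_count(sets):
    """Count each sequence once by taking the union of all position sets."""
    return len(set().union(*sets))


# Toy assumption: alphabet "AB", sequence length 4, word "AA".
sets = occurrence_sets("AB", 4, "AA")
total_sequences = 2 ** 4
print("naive estimate     :", naive_count(sets) / total_sequences)
print("inclusion-exclusion:", inclusion_exclusion_count(sets) / total_sequences)
print("exact              :", exact_count(sets) / total_sequences)
```

In this toy case the naive estimate is larger than the exact probability, while the inclusion-exclusion count agrees with the enumeration.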

REPTILE stops evaluating the sum as soon as the first six digits of the mantissa have converged. This speeds up the computation because the contributions of the summands to P decrease very rapidly as the number of repeats increases.
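
A stopping rule of this kind can be sketched as follows. This is not REPTILE's source code: the function names are hypothetical, the summands come from a toy alternating series rather than from the formula above, and "converged" is taken to mean that two consecutive partial sums agree when rounded to six significant digits.

```python
import math


def truncated_sum(terms, digits=6):
    """Add terms until the partial sum is unchanged at `digits` significant digits.

    Returns the partial sum and the number of terms that were used.
    """
    total = 0.0
    previous = None
    used = 0
    for used, term in enumerate(terms, start=1):
        total += term
        # Scientific notation with digits-1 decimals keeps `digits` significant digits.
        current = f"{total:.{digits - 1}e}"
        if current == previous:
            break
        previous = current
    return total, used


def toy_terms(x):
    """Toy alternating series x - x^2/2! + x^3/3! - ..., which sums to 1 - exp(-x)."""
    term = x
    k = 1
    while True:
        yield term
        k += 1
        term *= -x / k


total, used = truncated_sum(toy_terms(0.01))
print(f"partial sum {total:.6e} after {used} terms")
print(f"reference   {-math.expm1(-0.01):.6e}")
```

Because the toy summands shrink quickly, the loop ends after only a few terms, which mirrors the speed-up described above.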


