Calculation of the P-Value
Protein length | L
Repeat length | r
Number of repeats | n
The probability P of finding at least n repeats of length r in a random
sequence of length L equals the number of possible sequences that contain
the desired repeat, divided by the total number of possible sequences.
|
|
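The counting definition of P can be checked by brute force for very small cases. The following is only a minimal sketch, not REPTILE's method: it assumes that a "repeat" is a word of length r occurring at least n times without overlap, and it uses a hypothetical two-letter alphabet so that all sequences can be enumerated. The names `has_repeat` and `exact_p` are illustrative.

```python
from fractions import Fraction
from itertools import product


def has_repeat(seq, r, n):
    """Return True if some word of length r occurs at least n times in seq.

    Assumption: occurrences must not overlap. str.count counts
    non-overlapping occurrences from left to right.
    """
    words = {seq[i:i + r] for i in range(len(seq) - r + 1)}
    return any(seq.count(word) >= n for word in words)


def exact_p(L, r, n, alphabet="AB"):
    """Exact P by enumeration: sequences with the repeat / all sequences."""
    hits = 0
    total = 0
    for letters in product(alphabet, repeat=L):
        total += 1
        if has_repeat("".join(letters), r, n):
            hits += 1
    return Fraction(hits, total)


if __name__ == "__main__":
    # Only AAAA, ABAB, BABA and BBBB qualify among the 16 sequences.
    print(exact_p(L=4, r=2, n=2))
```

Enumeration grows as the alphabet size to the power L, so it is only useful as a sanity check on tiny inputs, which is why a closed formula is needed for real protein lengths.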
P* is an overestimate because sequences with more than n repeats are
counted more than once. The inclusion-exclusion principle corrects for
this. The corrected formula for P is:
|
|
This can be rewritten as follows:
|
|
REPTILE stops calculating the sum as soon as the first six digits of the
mantissa have converged. This speeds up the computation, since the
contributions of the summands to P decrease very rapidly with increasing
number of repeats.
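The stopping rule can be sketched as follows. This is an illustration under stated assumptions, not REPTILE's actual code: convergence is tested by comparing consecutive partial sums rounded to six significant digits, and the summands come from a hypothetical, rapidly decreasing alternating series (not the P-value formula itself). The names `sum_until_converged` and `demo_terms` are made up for this example.

```python
from itertools import count
from math import exp, factorial


def sum_until_converged(terms, digits=6):
    """Add terms until the first `digits` mantissa digits stop changing.

    Returns the partial sum and the number of terms used.
    """
    total = 0.0
    previous = None
    used = 0
    for term in terms:
        total += term
        used += 1
        # Scientific notation with digits-1 decimals = `digits` significant digits.
        current = f"{total:.{digits - 1}e}"
        if current == previous:
            break
        previous = current
    return total, used


def demo_terms():
    """Hypothetical summands: (-1)^(k+1) / k!, whose sum is 1 - 1/e."""
    for k in count(1):
        yield (-1) ** (k + 1) / factorial(k)


if __name__ == "__main__":
    value, used = sum_until_converged(demo_terms())
    print(value, used, 1 - exp(-1))
```

Comparing only consecutive partial sums is safe when the summands shrink quickly, as described above; for a slowly decaying series the same test could stop too early.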