| |
 |
|
| Science Forum Index » Mathematics Forum » Probability to find a substring in a string |
|
Page 1 of 1 |
|
| Author |
Message |
| Robert Israel |
Posted: Fri Jul 14, 2006 4:55 pm |
|
|
|
Guest
|
In article <1152878479.687483.57580@b28g2000cwb.googlegroups.com>,
Eighty <eightyx@gmail.com> wrote:
[quote:b564c00dc5]How many strings of length n are there that contain a substring of
length d (at least once), if every character can assume r different
values?
(or the equivalent problem: what's the probability to find a substring
of length d in a string of length n?)
[/quote:b564c00dc5]
It depends on the substring. For example, for r = 2, n = 4 and d = 3
there are four strings containing 101, but only three containing 000.
Robert Israel israel@math.ubc.ca
Department of Mathematics http://www.math.ubc.ca/~israel
University of British Columbia Vancouver, BC, Canada |
|
|
| Back to top |
|
|
|
| Eighty |
Posted: Fri Jul 14, 2006 7:22 pm |
|
|
|
Guest
|
[quote:e3c0d276c6]It depends on the substring. For example, for r = 2, n = 4 and d = 3
there are four strings containing 101, but only three containing 000.
Right. Let's assume the substring doesn't end like it starts (like[/quote:e3c0d276c6]
"bob" and "coco" do).
My above calculation was incorrect, correct one here:
Assuming the above, we can write
f(n)=(r^0-f(0))r^(n-d) + ... + (r^(n-d)-f(n-d))r^0
which gives us the recurrence
f(n)-rf(n-1)+f(n-d)=r^(n-d) where f(n) = 0 for n<d
and substituting f(n)=r^n (1+h(n)), we get the simpler
h(n)-h(n-1)+h(n-d)/r^d=0 where h(n) = -1 for n<d.
However, this recurrence's characteristic polynomial is in general not
solvable, so I guess neither is the problem (for d>4).
But if you know if something's been written on this, I'd like to read
it. |
|
|
| Back to top |
|
|
|
| Robert Israel |
Posted: Fri Jul 14, 2006 8:28 pm |
|
|
|
Guest
|
Eighty wrote:
[quote:842e0a20fb]It depends on the substring. For example, for r = 2, n = 4 and d = 3
there are four strings containing 101, but only three containing 000.
Right. Let's assume the substring doesn't end like it starts (like
"bob" and "coco" do).
[/quote:842e0a20fb]
Suppose f(n) is the number you want, where there are n letters, the
string has length n and the substring has length d, and "the substring
doesn't end like it starts", i.e. there is no j in {1,...,d-1) such
that the substring's initial j letters form its final j letters.
Then f(n) - r f(n-1) + f(n-d) = r^(n-d), with f(n) = 0 for n < d.
And the solution is f(n) = r^n - sum_z z^(n+1)/(r - d z^(1-d))
where the sum is over the roots z of the polynomial X^d - r X^(d-1) + 1
(I assume there are no multiple roots).
Robert Israel israel@math.ubc.ca
Department of Mathematics http://www.math.ubc.ca/~israel
University of British Columbia Vancouver, BC, Canada |
|
|
| Back to top |
|
|
|
| Eighty |
Posted: Fri Jul 14, 2006 9:16 pm |
|
|
|
Guest
|
[quote:b5e59656ed]And the solution is f(n) = r^n - sum_z z^(n+1)/(r - d z^(1-d))
where the sum is over the roots z of the polynomial X^d - r X^(d-1) + 1
(I assume there are no multiple roots).
Thanks! How did you arrive at that? That is, how did you get the[/quote:b5e59656ed]
explicit solution for the homogeneous equation f(n) - r f(n-1) + f(n-d)
= 0 with f(n)=0, n < d, without all the annoying gaussian elimination?
It's probably a standard method, so the name of it, or a link to a
paper would be appreciated.  |
|
|
| Back to top |
|
|
|
| Robert Israel |
Posted: Sun Jul 16, 2006 6:12 am |
|
|
|
Guest
|
Eighty wrote:
[quote:afdde2cf0a]And the solution is f(n) = r^n - sum_z z^(n+1)/(r - d z^(1-d))
where the sum is over the roots z of the polynomial X^d - r X^(d-1) + 1
(I assume there are no multiple roots).
Thanks! How did you arrive at that? That is, how did you get the
explicit solution for the homogeneous equation f(n) - r f(n-1) + f(n-d)
= 0 with f(n)=0, n < d, without all the annoying gaussian elimination?
It's probably a standard method, so the name of it, or a link to a
paper would be appreciated.
[/quote:afdde2cf0a]
Actually I cheated and used Maple, but it should be fairly
straightforward
to check that this works.
BTW, you may be interested to know that when r is large, the
polynomial X^d - r X^(d-1) + 1 has one root very close to r - r^(1-d)
while
the others are close to the (d-1)'th roots of 1/r.
Robert Israel israel@math.ubc.ca
Department of Mathematics http://www.math.ubc.ca/~israel
University of British Columbia Vancouver, BC, Canada |
|
|
| Back to top |
|
|
|
| Robert Israel |
Posted: Sun Jul 16, 2006 6:12 am |
|
|
|
Guest
|
Eighty wrote:
[quote:8035cb4da1]And the solution is f(n) = r^n - sum_z z^(n+1)/(r - d z^(1-d))
where the sum is over the roots z of the polynomial X^d - r X^(d-1) + 1
(I assume there are no multiple roots).
Thanks! How did you arrive at that? That is, how did you get the
explicit solution for the homogeneous equation f(n) - r f(n-1) + f(n-d)
= 0 with f(n)=0, n < d, without all the annoying gaussian elimination?
It's probably a standard method, so the name of it, or a link to a
paper would be appreciated.
[/quote:8035cb4da1]
Actually I cheated and used Maple, but it should be fairly
straightforward
to check that this works.
BTW, you may be interested to know that when r is large, the
polynomial X^d - r X^(d-1) + 1 has one root very close to r - r^(1-d)
while
the others are close to the (d-1)'th roots of 1/r.
Robert Israel israel@math.ubc.ca
Department of Mathematics http://www.math.ubc.ca/~israel
University of British Columbia Vancouver, BC, Canada |
|
|
| Back to top |
|
|
|
| Robert Israel |
Posted: Sun Jul 16, 2006 3:17 pm |
|
|
|
Guest
|
In article <1153041139.092376.109100@b28g2000cwb.googlegroups.com>,
Robert Israel <israel@math.ubc.ca> wrote:
[quote:027f9b4603]
Eighty wrote:
And the solution is f(n) = r^n - sum_z z^(n+1)/(r - d z^(1-d))
where the sum is over the roots z of the polynomial X^d - r X^(d-1) + 1
(I assume there are no multiple roots).
Thanks! How did you arrive at that? That is, how did you get the
explicit solution for the homogeneous equation f(n) - r f(n-1) + f(n-d)
= 0 with f(n)=0, n < d, without all the annoying gaussian elimination?
It's probably a standard method, so the name of it, or a link to a
paper would be appreciated. :)
Actually I cheated and used Maple, but it should be fairly
straightforward
to check that this works.
[/quote:027f9b4603]
I think "Z transform" is the name to look up.
[quote:027f9b4603]BTW, you may be interested to know that when r is large, the
polynomial X^d - r X^(d-1) + 1 has one root very close to r - r^(1-d)
while
the others are close to the (d-1)'th roots of 1/r.
[/quote:027f9b4603]
Since the contribution of the other roots to f(n) is small,
a good approximation should be obtained using only the root close
to r. If we write X = rY, we have
(rY)^d - r (rY)^(d-1) + 1 = r^d (Y^d - Y^(d-1) + r^(-d))
so our approximation to f(n) is
r^n - r^(n+1) y^(n+1)/(r - d r^(1-d) y^(1-d))
= r^n (1 - y^(n+1)/(1 - d r^(-d) y^(1-d)))
where y = g(r^(-d)), if g(z) is the root near 1 of
Y^d - Y^(d-1) + z.
The Lagrange Inversion Formula says g(z) has a series expansion
g(z) = 1 - sum_{m=1}^infty (product_{j=2}^m (md-j)) z^m/m!
which should converge for |z| < 1/(d e) according to the Ratio
Test.
Thus for d = 7 and r = 26, taking the sum up to m=5 yields
g(26^(-7)) ~ 4178084238098676561481061015118275629148483030241
/ 4178084238618868666397164711796811655402510876672
and the approximation
f(n) ~ 26^n (1 - y^(n+1)/(1-7/(26^7 y^6)))
using this value for y appears to have a relative error of less than
2*10^(-9) for all n >= 7, and less than 10^(-40) for n >= 28.
Robert Israel israel@math.ubc.ca
Department of Mathematics http://www.math.ubc.ca/~israel
University of British Columbia Vancouver, BC, Canada |
|
|
| Back to top |
|
|
|
|
|
All times are GMT - 5 Hours
The time now is Sat Nov 07, 2009 11:31 pm
|
|