Deal of the Month: 50% Discount on Windows 7 (Limited Amazon.com offer) Main Page | Report this Page
Science Forum Index  »  Mathematics Forum  »  Probability to find a substring in a string
Page 1 of 1    

Probability to find a substring in a string

Author Message
Robert Israel
Posted: Fri Jul 14, 2006 4:55 pm
Guest
In article <1152878479.687483.57580@b28g2000cwb.googlegroups.com>,
Eighty <eightyx@gmail.com> wrote:
[quote:b564c00dc5]How many strings of length n are there that contain a substring of
length d (at least once), if every character can assume r different
values?

(or the equivalent problem: what's the probability to find a substring
of length d in a string of length n?)
[/quote:b564c00dc5]
It depends on the substring. For example, for r = 2, n = 4 and d = 3
there are four strings containing 101, but only three containing 000.

Robert Israel israel@math.ubc.ca
Department of Mathematics http://www.math.ubc.ca/~israel
University of British Columbia Vancouver, BC, Canada
 
Eighty
Posted: Fri Jul 14, 2006 7:22 pm
Guest
[quote:e3c0d276c6]It depends on the substring. For example, for r = 2, n = 4 and d = 3
there are four strings containing 101, but only three containing 000.
Right. Let's assume the substring doesn't end like it starts (like[/quote:e3c0d276c6]
"bob" and "coco" do).

My above calculation was incorrect, correct one here:

Assuming the above, we can write
f(n)=(r^0-f(0))r^(n-d) + ... + (r^(n-d)-f(n-d))r^0
which gives us the recurrence
f(n)-rf(n-1)+f(n-d)=r^(n-d) where f(n) = 0 for n<d
and substituting f(n)=r^n (1+h(n)), we get the simpler
h(n)-h(n-1)+h(n-d)/r^d=0 where h(n) = -1 for n<d.

However, this recurrence's characteristic polynomial is in general not
solvable, so I guess neither is the problem (for d>4).

But if you know if something's been written on this, I'd like to read
it.
 
Robert Israel
Posted: Fri Jul 14, 2006 8:28 pm
Guest
Eighty wrote:
[quote:842e0a20fb]It depends on the substring. For example, for r = 2, n = 4 and d = 3
there are four strings containing 101, but only three containing 000.
Right. Let's assume the substring doesn't end like it starts (like
"bob" and "coco" do).
[/quote:842e0a20fb]
Suppose f(n) is the number you want, where there are n letters, the
string has length n and the substring has length d, and "the substring
doesn't end like it starts", i.e. there is no j in {1,...,d-1) such
that the substring's initial j letters form its final j letters.
Then f(n) - r f(n-1) + f(n-d) = r^(n-d), with f(n) = 0 for n < d.
And the solution is f(n) = r^n - sum_z z^(n+1)/(r - d z^(1-d))
where the sum is over the roots z of the polynomial X^d - r X^(d-1) + 1
(I assume there are no multiple roots).

Robert Israel israel@math.ubc.ca
Department of Mathematics http://www.math.ubc.ca/~israel
University of British Columbia Vancouver, BC, Canada
 
Eighty
Posted: Fri Jul 14, 2006 9:16 pm
Guest
[quote:b5e59656ed]And the solution is f(n) = r^n - sum_z z^(n+1)/(r - d z^(1-d))
where the sum is over the roots z of the polynomial X^d - r X^(d-1) + 1
(I assume there are no multiple roots).
Thanks! How did you arrive at that? That is, how did you get the[/quote:b5e59656ed]
explicit solution for the homogeneous equation f(n) - r f(n-1) + f(n-d)
= 0 with f(n)=0, n < d, without all the annoying gaussian elimination?
It's probably a standard method, so the name of it, or a link to a
paper would be appreciated. Smile
 
Robert Israel
Posted: Sun Jul 16, 2006 6:12 am
Guest
Eighty wrote:
[quote:afdde2cf0a]And the solution is f(n) = r^n - sum_z z^(n+1)/(r - d z^(1-d))
where the sum is over the roots z of the polynomial X^d - r X^(d-1) + 1
(I assume there are no multiple roots).
Thanks! How did you arrive at that? That is, how did you get the
explicit solution for the homogeneous equation f(n) - r f(n-1) + f(n-d)
= 0 with f(n)=0, n < d, without all the annoying gaussian elimination?
It's probably a standard method, so the name of it, or a link to a
paper would be appreciated. Smile
[/quote:afdde2cf0a]
Actually I cheated and used Maple, but it should be fairly
straightforward
to check that this works.

BTW, you may be interested to know that when r is large, the
polynomial X^d - r X^(d-1) + 1 has one root very close to r - r^(1-d)
while
the others are close to the (d-1)'th roots of 1/r.

Robert Israel israel@math.ubc.ca
Department of Mathematics http://www.math.ubc.ca/~israel
University of British Columbia Vancouver, BC, Canada
 
Robert Israel
Posted: Sun Jul 16, 2006 6:12 am
Guest
Eighty wrote:
[quote:8035cb4da1]And the solution is f(n) = r^n - sum_z z^(n+1)/(r - d z^(1-d))
where the sum is over the roots z of the polynomial X^d - r X^(d-1) + 1
(I assume there are no multiple roots).
Thanks! How did you arrive at that? That is, how did you get the
explicit solution for the homogeneous equation f(n) - r f(n-1) + f(n-d)
= 0 with f(n)=0, n < d, without all the annoying gaussian elimination?
It's probably a standard method, so the name of it, or a link to a
paper would be appreciated. Smile
[/quote:8035cb4da1]
Actually I cheated and used Maple, but it should be fairly
straightforward
to check that this works.

BTW, you may be interested to know that when r is large, the
polynomial X^d - r X^(d-1) + 1 has one root very close to r - r^(1-d)
while
the others are close to the (d-1)'th roots of 1/r.

Robert Israel israel@math.ubc.ca
Department of Mathematics http://www.math.ubc.ca/~israel
University of British Columbia Vancouver, BC, Canada
 
Robert Israel
Posted: Sun Jul 16, 2006 3:17 pm
Guest
In article <1153041139.092376.109100@b28g2000cwb.googlegroups.com>,
Robert Israel <israel@math.ubc.ca> wrote:
[quote:027f9b4603]
Eighty wrote:
And the solution is f(n) = r^n - sum_z z^(n+1)/(r - d z^(1-d))
where the sum is over the roots z of the polynomial X^d - r X^(d-1) + 1
(I assume there are no multiple roots).
Thanks! How did you arrive at that? That is, how did you get the
explicit solution for the homogeneous equation f(n) - r f(n-1) + f(n-d)
= 0 with f(n)=0, n < d, without all the annoying gaussian elimination?
It's probably a standard method, so the name of it, or a link to a
paper would be appreciated. :)

Actually I cheated and used Maple, but it should be fairly
straightforward
to check that this works.
[/quote:027f9b4603]
I think "Z transform" is the name to look up.

[quote:027f9b4603]BTW, you may be interested to know that when r is large, the
polynomial X^d - r X^(d-1) + 1 has one root very close to r - r^(1-d)
while
the others are close to the (d-1)'th roots of 1/r.
[/quote:027f9b4603]
Since the contribution of the other roots to f(n) is small,
a good approximation should be obtained using only the root close
to r. If we write X = rY, we have
(rY)^d - r (rY)^(d-1) + 1 = r^d (Y^d - Y^(d-1) + r^(-d))
so our approximation to f(n) is
r^n - r^(n+1) y^(n+1)/(r - d r^(1-d) y^(1-d))
= r^n (1 - y^(n+1)/(1 - d r^(-d) y^(1-d)))
where y = g(r^(-d)), if g(z) is the root near 1 of
Y^d - Y^(d-1) + z.

The Lagrange Inversion Formula says g(z) has a series expansion

g(z) = 1 - sum_{m=1}^infty (product_{j=2}^m (md-j)) z^m/m!

which should converge for |z| < 1/(d e) according to the Ratio
Test.

Thus for d = 7 and r = 26, taking the sum up to m=5 yields
g(26^(-7)) ~ 4178084238098676561481061015118275629148483030241
/ 4178084238618868666397164711796811655402510876672

and the approximation
f(n) ~ 26^n (1 - y^(n+1)/(1-7/(26^7 y^6)))
using this value for y appears to have a relative error of less than
2*10^(-9) for all n >= 7, and less than 10^(-40) for n >= 28.

Robert Israel israel@math.ubc.ca
Department of Mathematics http://www.math.ubc.ca/~israel
University of British Columbia Vancouver, BC, Canada
 
 
Page 1 of 1    
All times are GMT - 5 Hours
The time now is Sat Nov 07, 2009 11:31 pm