Science Forum Index » Statistics - Math Forum » interaction of two normalized distributions...
zikester... (Guest)
Posted: Wed Jun 25, 2008 9:13 am
I'm a stats newbie, so I have a basic question here.
I'm trying to model a sports simulation. Given a batter who has an X%
history of performing some event (like striking out) in a single at
bat, and a pitcher who has a Y% history of performing the flip side of
that event (like striking someone out), I want to determine the odds of
that batter performing that event when facing that pitcher in a single
at bat. One basic approach I've thought of is to treat striking out and
striking someone out as traits that follow a normal distribution. So I
first take league-wide data and calculate the league mean and standard
deviation for both traits. Then I establish what percentile the batter
and the pitcher each occupy for their respective traits.
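A minimal Python sketch of this first step, assuming the league-wide data is a plain list of per-player strikeout rates and that a normal fit is adequate. The rates below are made-up illustrative numbers, not real league data, and `percentile_in_league` is a hypothetical helper name.

```python
from statistics import NormalDist, mean, stdev

def percentile_in_league(rate, league_rates):
    """Return (z-score, percentile) of `rate` under a normal curve fitted
    to `league_rates` via the sample mean and sample standard deviation."""
    dist = NormalDist(mean(league_rates), stdev(league_rates))
    z = (rate - dist.mean) / dist.stdev
    return z, 100 * dist.cdf(rate)

# Hypothetical per-player strikeout rates (strikeouts per at bat).
league = [0.12, 0.15, 0.18, 0.20, 0.22, 0.25, 0.28]

z, pct = percentile_in_league(0.25, league)
print(f"z = {z:.2f}, percentile = {pct:.1f}")
```

The same function would be applied once with the batters' data and once with the pitchers' data (or once against a single shared curve, as proposed below).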
What I'm wondering is how these should interact in a mathematically
plausible way. My assumption is that the pitcher has a Y% chance of
striking someone out when facing a batter at the mean of the curve (and
similarly for the batter). Also, for simplicity, I think I want to use
just one normal distribution per event type (the same one for both
batters and pitchers), so now we have one normal distribution called
"strikeout tendency". Say a pitcher is at the 75th percentile at
striking guys out and a batter is at the 50th; then the result should be
whatever the strikeout rate is at the 75th percentile. What about
pitcher = 75th percentile (strikes people out more than average) and
batter = 25th percentile (strikes out less than average)? Intuitively,
this should even out somewhere around the mean. But pitcher = 75th
percentile and batter = 75th percentile should push the result further
away from the mean. Any ideas would be appreciated!
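One rule that reproduces all three intuitions is to convert each percentile to a z-score on the shared "strikeout tendency" curve, add the two z-scores, and read the rate off the curve at the sum; adding z-scores is also one of the options raised in the reply below. This is a sketch, not an established method: the league mean of 0.20 and standard deviation of 0.05 are assumed illustrative numbers, and `matchup_rate` is a hypothetical name.

```python
from statistics import NormalDist

# Assumed shared "strikeout tendency" curve (illustrative numbers only).
LEAGUE = NormalDist(mu=0.20, sigma=0.05)
STD = NormalDist()  # standard normal, used to turn percentiles into z-scores

def matchup_rate(pitcher_pct, batter_pct, league=LEAGUE):
    """Estimated strikeout rate for one pitcher/batter matchup.

    Each percentile (strictly between 0 and 100) becomes a z-score and the
    two are added: a player at the 50th percentile (z = 0) leaves the other
    player's rate unchanged, and mirror-image percentiles cancel out.
    """
    z = STD.inv_cdf(pitcher_pct / 100) + STD.inv_cdf(batter_pct / 100)
    return league.mean + z * league.stdev

for p, b in [(75, 50), (75, 25), (75, 75)]:
    print(p, b, round(matchup_rate(p, b), 4))
```

For the 75th/75th case the summed z is about 1.35, roughly the 91st percentile of the curve. Nothing here keeps the result inside [0, 1], so extreme percentiles can produce impossible rates on the normal scale.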
RichUlrich... (Guest)
Posted: Thu Jun 26, 2008 3:35 pm
On Wed, 25 Jun 2008 12:13:30 -0700 (PDT), zikester
<isaacyho at (no spam) gmail.com> wrote:
Quote: ... Also, for simplicity, I think I want to use just one normal
distribution per event type ( the same one for both batters and
pitchers ): so now we have one normal distribution called "strikeout
tendency".
Something like this has been asked before, and I don't recall that
there was ever any good reference on the problem. A version where it
is more natural to have the same distribution is a model of league
competition: two teams each have a winning percentage drawn from the
same distribution, and what are the odds that the better team will
win, based on their separate records?
Two comments right away: a logistic distribution is more likely to fit
than a normal one, IMHO; and pitchers versus batters are a poor fit for
the assumption that both share the same distribution.
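If the logistic is used in place of the normal, a natural analogue of adding z-scores (which this reply goes on to suggest) is adding each side's deviation from the league rate in log-odds. A minimal sketch under that assumption; `logit`, `inv_logit` and `combine_logistic` are hypothetical helper names and all rates are illustrative. For the league-competition version, team B's record enters as its losing rate and the league average is 0.5.

```python
import math

def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1 - p))

def inv_logit(x):
    """Inverse of logit: maps any real number back into (0, 1)."""
    return 1 / (1 + math.exp(-x))

def combine_logistic(rate_a, rate_b, league_rate):
    """Add both sides' log-odds deviations from the league rate.

    A side exactly at the league rate contributes nothing, and the result
    always stays inside (0, 1), unlike adding z-scores on a normal scale.
    """
    return inv_logit(logit(rate_a) + logit(rate_b) - logit(league_rate))

# Pitcher and batter both above a hypothetical league strikeout rate of 0.20.
print(round(combine_logistic(0.25, 0.25, 0.20), 3))

# League competition: a .600 team against a .450 team. Team B enters
# through its losing rate (1 - 0.450), with a league average of 0.5.
print(round(combine_logistic(0.600, 1 - 0.450, 0.5), 3))
```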
Quote: So say a pitcher is at the 75th percentile at striking guys out,
and a batter at the 50th; then the result should be whatever the
strikeout rate is at the 75th percentile. ... But pitcher=75th %ile,
batter=75th %ile should push the result further away from the mean.
Any ideas would be appreciated!
One determining factor would seem to be the overall "variance", or
extremeness, of winning and losing. That is, for a league of teams:
the outcome of any game remains pretty random when the teams' winning
percentages fall in a narrow range, say, 40% to 60%. If, on the other
hand, over a long schedule teams range from "practically undefeated"
to "almost no wins", then outcomes are largely determined by
superiority.
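This notion of extremeness can be made quantitative with a standard variance split, sketched here under stated assumptions: every team plays the same number of games, the average team wins half of them, and games are independent. Under pure chance each winning percentage would have variance 0.25/n, so only spread beyond that reflects real differences between teams. `excess_spread` is a hypothetical helper and both leagues are made-up examples.

```python
from statistics import pvariance

def excess_spread(win_pcts, games_per_team):
    """Split the spread of winning percentages into chance and skill parts.

    If every game were a coin flip, each winning percentage would vary by
    binomial noise alone, with variance 0.25 / games_per_team. Returns
    (observed variance, chance variance, share of observed variance that
    exceeds chance), with the share floored at zero.
    """
    observed = pvariance(win_pcts)
    chance = 0.25 / games_per_team
    share = max(0.0, 1 - chance / observed) if observed > 0 else 0.0
    return observed, chance, share

narrow = [0.45, 0.48, 0.50, 0.52, 0.55]  # hypothetical tight league
wide = [0.15, 0.30, 0.50, 0.70, 0.85]    # hypothetical lopsided league
for name, league in [("narrow", narrow), ("wide", wide)]:
    obs, ch, share = excess_spread(league, games_per_team=100)
    print(name, round(obs, 5), round(ch, 5), round(share, 3))
```

With 100 games per team, the narrow league spreads less than coin flips alone would produce, while the wide one is almost entirely beyond chance; whether a 40% to 60% range counts as "pretty random" therefore also depends on schedule length.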
- So, if you can combine the "extremeness" of the two sides, it will
be a pragmatic matter of whether you can (say) add the z-scores to get
a z-score for the estimated outcome and get a useful result. You might
then take that number and "regress it to the mean" to get a pragmatic
estimate. I would think of using Monte Carlo simulation to find what
works. But maybe it could be set up as a regression problem using real
data.
- One awkwardness of looking at pitchers and batters, I suspect, will
arise from the curves being skewed in opposite directions. Some
pitchers specialize in strikeouts (15% or more); some batters
specialize in avoiding strikeouts (under 2%); and the opposite
extremes are not as likely.
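The first point above (add the z-scores, regress the sum toward the mean, and find the amount of shrinkage by Monte Carlo or by regression on real data) can be sketched as follows. The shrinkage factor `k`, the helpers `shrunk_z` and `fit_shrink_factor`, and the synthetic data are all assumptions for illustration: the fake matchups are generated from a chosen "true" k, so the fit only checks that the procedure recovers what was put in, not anything about baseball.

```python
import random

def shrunk_z(z_batter, z_pitcher, k):
    """Add the two z-scores, then regress the sum toward the league mean
    (z = 0) by a factor k, where k = 1 means no shrinkage."""
    return k * (z_batter + z_pitcher)

def fit_shrink_factor(triples):
    """Least-squares estimate of k in observed_z = k * (z_batter + z_pitcher).

    `triples` holds (z_batter, z_pitcher, observed_z) for each matchup.
    This is ordinary least squares through the origin on the summed z.
    """
    num = sum((zb + zp) * obs for zb, zp, obs in triples)
    den = sum((zb + zp) ** 2 for zb, zp, _ in triples)
    return num / den

# Monte Carlo check on synthetic data: generate matchups from an assumed
# "true" k plus noise, then see whether the fit recovers it.
random.seed(1)
TRUE_K = 0.6  # assumption used only to generate the fake data
fake = []
for _ in range(20_000):
    zb, zp = random.gauss(0, 1), random.gauss(0, 1)
    fake.append((zb, zp, TRUE_K * (zb + zp) + random.gauss(0, 0.5)))

k_hat = fit_shrink_factor(fake)
print(f"estimated k = {k_hat:.3f}")
print(f"shrunk z for two 75th-percentile players: {shrunk_z(0.674, 0.674, k_hat):.3f}")
```

With real data, `observed_z` would come from standardizing actual matchup results the same way; a fitted k well below 1 would indicate that summing raw z-scores overstates the combined effect.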
--
Rich Ulrich
All times are GMT - 5 Hours