Luis A. Afonso...
Posted: Wed May 07, 2008 10:39 am
Guest
Chi-square GOF new (?) procedure


This matter is trivial, but not without consequences.

__First of all, the test statistic is intrinsically DISCRETE (discontinuous): approximating it by a continuous distribution always introduces errors, which are negligible only in the most favorable cases.
Its domain is formed by isolated single values that do not have regular frequencies: a sparsely populated value (possibly after many absent ones, the holes) can be immediately followed by a heavily populated one, and vice versa.
(It is not hard to see that ALL SAMPLE STATISTICS based on RANKS are discrete too.)
__Classically there has been discussion about the minimum number of values per class for which the Chi-squared distribution fits the real test statistic: in the literature one finds advice to expect at least 5 (or 10) values per class, while others say that the number of empty classes should not exceed 20% of the total number of classes.
**********************************
_______________[ 1 ]

NOTE: Here are the cut points that give k equiprobable intervals for the standard normal distribution:

_k=4    +/-0.67449    0
_k=5    +/-0.84162    +/-0.25335
_k=6    +/-0.96742    +/-0.43073    0
_k=7    +/-1.06757    +/-0.56595    +/-0.18001
_k=8    +/-1.15035    +/-0.67449    +/-0.31864    0
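
The cut points in this table can be reproduced with a few lines of Python. This is only a minimal sketch using the standard library; the function name `equiprobable_cut_points` is made up for illustration and is not part of the original post.

```python
from statistics import NormalDist


def equiprobable_cut_points(k, dist=NormalDist()):
    """Return the k-1 interior cut points that split `dist` into
    k classes, each with probability 1/k (hypothetical helper)."""
    return [dist.inv_cdf(i / k) for i in range(1, k)]


if __name__ == "__main__":
    # Print the cut points for k = 4 .. 8 classes of a standard normal.
    for k in range(4, 9):
        cuts = equiprobable_cut_points(k)
        print(f"k={k}", " ".join(f"{c:+.5f}" for c in cuts))
```

Passing another `NormalDist(mu, sigma)` as `dist` gives the cut points for a non-standard normal model.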

For example, for k = 4 (four classes) one has the intervals
(-infinity, -0.67449], (-0.67449, 0], (0, 0.67449], (0.67449, +infinity), EACH ONE HAVING PROBABILITY = 0.25.
This note suggests a new (?) way to approach the GOF test: instead of dividing the data into classes of arbitrary (constant) width, we propose classes that each enclose a CONSTANT probability 1/k of the hypothesized distribution.
We see several advantages:
__1) Equalization of the EXPECTED number of values per class: n/k (n being the sample size).
__2) Using Monte Carlo simulation one can get rid of the Chi-squared distribution approximation once and for all.
__3) The sample statistic (the usual Chi-squared) has the same critical values no matter which distribution is being fitted.
__4) In order to obtain CRITICAL VALUES one can rely on samples obtained from random numbers.
__5) The critical values of the sample statistic are EXACT (but, the statistic being discrete, they hardly ever correspond to the conventional 0.95, 0.99, etc. probabilities).
__6) Contrary to the conventional approach, pooling classes (so that each of them has at least 5 values) is not required.
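
The procedure listed above can be sketched as follows. This is a hedged illustration, not the author's program: the function names are hypothetical, and it relies on the fact that applying the hypothesized CDF to the data maps them to U(0, 1) under H0, so the k equiprobable classes become the intervals [0, 1/k), [1/k, 2/k), and so on. The sketch assumes that the tabulated probability is P(statistic <= c).

```python
import random
from collections import Counter


def chi2_equiprobable(u_values, k):
    """Pearson statistic for values already mapped to (0, 1) by the
    hypothesized CDF, using k classes of probability 1/k each."""
    n = len(u_values)
    expected = n / k
    counts = [0] * k
    for u in u_values:
        # Index of the equiprobable class that contains u.
        counts[min(int(u * k), k - 1)] += 1
    return sum((c - expected) ** 2 for c in counts) / expected


def simulate_null(n, k, n_sims, rng):
    """Tally the statistic over n_sims samples of n U(0, 1) numbers."""
    tally = Counter()
    for _ in range(n_sims):
        sample = [rng.random() for _ in range(n)]
        tally[round(chi2_equiprobable(sample, k), 9)] += 1
    return tally


def cumulative_prob(tally, c):
    """Estimate P(statistic <= c) from the simulated tally."""
    total = sum(tally.values())
    return sum(m for value, m in tally.items() if value <= c + 1e-9) / total


if __name__ == "__main__":
    rng = random.Random(2008)  # arbitrary seed, for reproducibility only
    tally = simulate_null(n=10, k=5, n_sims=100_000, rng=rng)
    for c in (7, 8, 9, 11):
        print(c, round(cumulative_prob(tally, c), 3))
```

Because only U(0, 1) numbers are used, the same simulated table serves for any fully specified continuous model, which is what points 3) and 4) state.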

**********************************
_______________[ 2 ] Numerical Example

_Model: N(0,1); sample size n = 10; classes k = 5

(number of simulated samples = 400,000)

__a) Using U(0, 1) samples (2 attempts)

Crit. val.   Prob. (first)   Prob. (second)
     7           0.904           0.905
     8           0.945           0.946
     9           0.960           0.961
    11           0.986           0.986

__b) Using N(0, 1) samples (2 attempts)

Crit. val.   Prob. (first)   Prob. (second)
     7           0.904           0.904
     8           0.944           0.946
     9           0.960           0.960
    11           0.986           0.986


Note that the Chi2(4) probabilities are rather different:

Crit. val.   Prob.
     7       0.864
     8       0.908
     9       0.939
    11       0.973

[Chi2(4) critical value at the 5% significance level = 9.488]
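
For reference, the Chi2(4) probabilities can be checked without tables, since for 4 degrees of freedom the CDF has the closed form F(x) = 1 - exp(-x/2) (1 + x/2). A minimal sketch (the function name is only illustrative):

```python
import math


def chi2_cdf_4df(x):
    """CDF of the Chi-squared distribution with 4 degrees of freedom,
    using the closed form valid for this particular case."""
    return 1.0 - math.exp(-x / 2.0) * (1.0 + x / 2.0)


if __name__ == "__main__":
    # Evaluate at the tabulated values and at the 5% critical value.
    for c in (7, 8, 9, 11, 9.488):
        print(c, round(chi2_cdf_4df(c), 3))
```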



Luis Amaral Afonso (The Moderator Destroyer)
pamela fluente...
Posted: Wed May 07, 2008 11:25 am
Guest
If you want to reduce every test to pure simulation, I have the
impression that there is no point in using such a test statistic, which
"loses" information by grouping (and isn't the easiest to simulate).

I would go for some measure of "difference" between the cdf's. You
can imagine one which does not "lose" information and is simple to
compute. Since you do not care about distribution derivation or
asymptotics, I would go for the way which is computationally more
convenient.
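
One possible reading of this suggestion, sketched here only as an assumption since the post does not name a specific measure, is the largest gap between the empirical cdf and the hypothesized cdf (the Kolmogorov-Smirnov distance), with its critical value obtained by simulation from U(0, 1) numbers. The function names are hypothetical.

```python
import random
from statistics import NormalDist


def ks_distance(sample, cdf):
    """Largest gap between the empirical CDF of `sample` and `cdf`."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def simulated_critical_value(n, level, n_sims, rng):
    """Monte Carlo quantile of the distance under H0, from U(0, 1) samples."""
    stats = sorted(
        ks_distance([rng.random() for _ in range(n)], lambda u: u)
        for _ in range(n_sims)
    )
    return stats[min(int(level * n_sims), n_sims - 1)]


if __name__ == "__main__":
    rng = random.Random(2008)  # arbitrary seed
    crit = simulated_critical_value(n=10, level=0.95, n_sims=20_000, rng=rng)
    data = [rng.gauss(0.0, 1.0) for _ in range(10)]
    d = ks_distance(data, NormalDist().cdf)
    print("distance:", round(d, 3), "simulated 95% critical value:", round(crit, 3))
```

No grouping into classes is needed here, which is the sense in which such a measure does not "lose" information.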

Anyway, I don't think statisticians are much interested in such an
approach. It's too trivial that by computer one can approximate the
distribution of any quantity as accurately as one wishes. Even a
child can do that, and it requires no knowledge of statistical
science. Statisticians, I believe, are mostly concerned with the
theoretical study of statistics and their distributions.

-P
Luis A. Afonso...
Posted: Wed May 07, 2008 12:22 pm
Guest
Perneta said:

If you want to reduce every test to pure simulation, I have the impression that there is no point in using such a test statistic, which "loses" information by grouping (and isn't the easiest to simulate).
I would go for some measure of "difference" between the cdf's. You can imagine one which does not "lose" information and is simple to compute. Since you do not care about distribution derivation or asymptotics, I would go for the way which is computationally more convenient.
Anyway, I don't think statisticians are much interested in such an approach. It's too trivial that by computer one can approximate the distribution of any quantity as accurately as one wishes. Even a child can do that, and it requires no knowledge of statistical science. Statisticians, I believe, are mostly concerned with the theoretical study of statistics and their distributions. P
****************
__1__I never, ever, thought of reducing ALL tests to simulation, PERNETA.
__2__My goal was much less ambitious: to find exact critical values for the Chi-squared one-sample test.
__3__Everyone knows that one loses information by grouping data (WHAT A NEW THING you are telling us!). One can be sure that the test has low power: the Type II error is so large that we easily FAIL TO REJECT H0 even if the data do not follow the proposed distribution.
__4__What you say about tests using the difference between the two distributions is correct, but somewhat OUT OF CONTEXT, aren’t it, PERNETA?

Luis Amaral Afonso
pamela fluente...
Posted: Wed May 07, 2008 10:15 pm
Guest
On 8 May, 00:22, "Luis A. Afonso" <lic... at (no spam) hotmail.com> wrote:
Quote:
aren’t it ?

You can call me PERNACCHIA, which is the appropriate way to respond
to your posts.

See http://www.youtube.com/watch?v=gkrnK0igAP0 for an explanation.

-P
 