Effect size with no control group?
Guest
Posted: Fri Nov 24, 2006 8:41 am
scientia@ipotesi.net wrote:
Quote:
Thanks to everybody for the explanations and the suggestions.
I understand all the math now.
About the experiment: I think we will start with a simple one,
with a pre-test and a post-test and no control group, just to
see whether the effect size is as large as we expect (d = -0.7 or so).
Then we can move on to a full experiment with
a control group (more than 10 subjects in each group).
Fabrizio Coppola


Remember that the effect size you see in a small experiment can be quite
different from what you'll see in a larger one. People often forget
that a small sample size is associated not only with low power but also
with unstable effect size estimates.
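
A quick simulation can make this instability concrete. The sketch below assumes normally distributed pre/post differences with a true standardized effect of 0.7 in magnitude (the figure the OP mentioned); the function name `simulate_d`, the seed, and the number of replications are illustrative choices, not anything taken from the thread.

```python
import random
import statistics


def simulate_d(n, true_d=0.7, reps=2000, seed=0):
    """Return `reps` estimates of d = mean / sd from samples of size n.

    Assumption: the differences are Normal(true_d, 1), so the true
    standardized effect equals true_d.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        sample = [rng.gauss(true_d, 1.0) for _ in range(n)]
        estimates.append(statistics.mean(sample) / statistics.stdev(sample))
    return estimates


# Spread (standard deviation) of the estimated effect size across replications
spread_small = statistics.stdev(simulate_d(10))
spread_large = statistics.stdev(simulate_d(100))

print(f"n = 10:  spread of estimated d = {spread_small:.3f}")
print(f"n = 100: spread of estimated d = {spread_large:.3f}")
```

Under these assumptions the estimates from 10 subjects scatter far more widely around 0.7 than those from 100 subjects, which is the point being made about small pilot studies.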
David A. Heiser
Posted: Sun Nov 26, 2006 2:38 pm
Guest
"Marc Schwartz" <marc_schwartz@comcast.net> wrote in message
news:BNOdna-gfKynlv7YnZ2dnUVZ_ridnZ2d@comcast.com...
Quote:
scientia@ipotesi.net wrote:
First of all, thanks to everybody who answered.
Later I will post something about the topics explained
by Bruce Weaver and Marc Schwartz.
+++++++++++++++++++++++++++++++++++++


++++++++++++++++++++++++++++++++++++++++
Quote:
A final question: I see that tables of Student's t distribution
never report alpha beyond 0.001: is this considered extreme?

The particular cutoff can be, to some extent, community standard specific.
These days, most folks use software (such as R[1], which is what I use)
and these will usually calculate p values to 15 or 16 significant digits.
However, save for special circumstances, most will report extreme values
as:

p < 0.0001

Four decimal places is common.

HTH,
From a historical view (post R.A. Fisher), the 4 decimal approach to p
values is what has been used. However this creates an unsolvable problem.
Does this mean then that all of the millions of software routines developed
since 1950 in all conceivable computer languages and instruction objects to
calculate p values only need to be accurate to 4 decimals? There are still
people out there who are writing new distribution p value routines (such
as Alfonso). How then can we evaluate stat programs for correctness when the
only criterion is 4 decimal places?

Or is this issue of correctness moot?

David Heiser
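
One way to keep the two issues apart is to treat "p < 0.0001" purely as a display convention applied to whatever value the routine computes. The helper below is a minimal sketch of that convention; the name `format_p` and the default floor of 0.0001 are assumptions for illustration, and the function says nothing about how accurate the underlying p-value routine is.

```python
def format_p(p, floor=1e-4):
    """Format a p-value for a report, flooring extreme values at `floor`.

    The computed value keeps its full precision; only the printed form
    is limited to four decimal places.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if p < floor:
        return f"p < {floor:.4f}"
    return f"p = {p:.4f}"


print(format_p(0.0312))
print(format_p(3.2e-9))
```

With this separation, a routine can still be checked against reference values to many digits even though reports only ever show four decimals.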
Richard Ulrich
Posted: Mon Nov 27, 2006 12:07 am
Guest
On Sun, 26 Nov 2006 10:38:59 -0800, "David A. Heiser"
<dah_box1@innercite.com> wrote:

Quote:

"Marc Schwartz" <marc_schwartz@comcast.net> wrote in message
news:BNOdna-gfKynlv7YnZ2dnUVZ_ridnZ2d@comcast.com...
scientia@ipotesi.net wrote:
First of all, thanks to everybody who answered.
Later I will post something about the topics explained
by Bruce Weaver and Marc Schwartz.
+++++++++++++++++++++++++++++++++++++

++++++++++++++++++++++++++++++++++++++++
A final question: I see that tables of Student's t distribution
never report alpha beyond 0.001: is this considered extreme?

The particular cutoff can be, to some extent, community standard specific.
These days, most folks use software (such as R[1], which is what I use)
and these will usually calculate p values to 15 or 16 significant digits.
However, save for special circumstances, most will report extreme values
as:

p < 0.0001

Four decimal places is common.

HTH,
From a historical view (post R.A. Fisher), the 4 decimal approach to p
values is what has been used. However this creates an unsolvable problem.
Does this mean then that all of the millions of software routines developed
since 1950 in all conceivable computer languages and instruction objects to
calculate p values only need to be accurate to 4 decimals? There are still
people out there who are writing new distribution p value routines (such
as Alfonso). How then can we evaluate stat programs for correctness when the
only criterion is 4 decimal places?

Or is this issue of correctness moot?

In some research, folks use more precise values, when they
need to correct for multiplicity of testing.

I think that the issue for a lot of research is that the
"correctness" is very poor beyond 0.01, since the practical
inaccuracy of parametric test statistics increases in tails.
How bad is the violation of assumptions? It shows up in
the precision of the test. In clinical research, for the
usual scaled data, etc., there is also advice that Bonferroni
testing should be limited to no more than 10 or 15 variables.
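
To see why multiplicity pushes people toward more decimals, here is a minimal sketch of the Bonferroni per-test threshold. The function name `bonferroni_threshold` and the family-wise level of 0.05 are illustrative assumptions.

```python
def bonferroni_threshold(alpha, m):
    """Per-test significance level that keeps the family-wise level at alpha
    when m tests are carried out (Bonferroni correction)."""
    if m < 1:
        raise ValueError("m must be at least 1")
    return alpha / m


for m in (1, 10, 15):
    print(f"{m:2d} tests: compare each p-value with {bonferroni_threshold(0.05, m):.5f}")
```

With 10 or 15 variables the comparison value already sits in the third decimal place, so p-values rounded to two decimals would no longer be enough to make the call.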

Back to the accuracy --
You can fairly safely assume that the true size of your nominal "0.05"
t-test is within 10% of that, say, 0.045 to 0.055. But your
nominal 0.01 test might really be twice that 0.01, or half that size,
and the 0.001 test is worse. Also, with a single test at 0.001,
you ought to be pretty positive that your result is not 'random',
and you should become a lot more concerned with the actual
size and what it denotes.


--
Rich Ulrich, wpilib@pitt.edu
http://www.pitt.edu/~wpilib/index.html
Reef Fish
Posted: Mon Nov 27, 2006 1:18 am
Guest
Old Mac User wrote:
Quote:
Subject Initial After Differences
1 50 38 12
2 39 30 9
3 36 28 8
4 52 41 11
5 54 44 10
6 40 31 9
7 35 27 8
8 49 40 9
9 55 45 10
10 53 42 11
Avgs 46.3 36.6 9.7

If I had been shown this "data" and asked about my
assessment of the "before and after" effect, I would
have suspected that the data was "manufactured"
by someone who had never seen any real data to fake
some results to make a publication, as has often been
done.

I don't need to check any arithmetic. I don't need to check
any statistical significance. One glance at the ALL positive
(and LARGE) differences made it unmistakable that the
result is "significant" in the sense of OBVIOUSLY there.

Anything else that is done, is merely pedantic, for THIS
data set and this one only.

While there are worthwhile discussions IN GENERAL,
without specific data and CONTEXT of the physics
experiment, about tests and controls, etc., this is a
case where any discussion of a "significant" test or
p-value is of value ONLY to the drunks looking at the
lamp post for support.

What attracted me to this thread in the first place was the
mention of "Cohen's effect size", and then one of the Cohen
Quacks in these groups, Richard Ulrich.

I have yet to see one single idea of Cohen that is worth
consideration for his artificially contrived measures and
terms outside of the mainstream of statistics.

In this particular case, once the DATA was given, the rest of
the discussants (except perhaps Old Mac User, who
wanted to see the DATA in the first place and merely
completed the routine arithmetic to come to the obvious
conclusion) seemed to be
BLIND to the OBVIOUS, poking all around in search of
some objects for "support" as a drunk seeks support to
keep from falling off his feet.

What DIFFERENCE does it make what the corresponding
p-value is, to two decimals or 4 decimals?

The fact that all 10 differences are positive has one
chance in 2^10 or 1024, even if there is no parametric
information to get support on some T-lamppost or
Z-lamppost.

OMU's t value of 22.85 for 9 d.f. merely gives a
pedantic and worthless p-value of less than .000000001.

"So what? " as that question had been asked in other
contexts in this thread, but irrelevant to the actual DATA
and experiment.

Does anyone really NEED that number before they can
see the VERY OBVIOUS, from the raw data, without
any calculation, except perhaps my mental note of 1
in 1024?

-- Reef Fish Bob.
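
The "1 in 1024" figure is the one-sided sign test for the ten differences in the table. A minimal sketch of that arithmetic follows, assuming each difference is equally likely to be positive or negative under the null hypothesis and that there are no zero differences; the variable names are illustrative.

```python
from math import comb

# Differences (Initial - After) from Old Mac User's table
diffs = [12, 9, 8, 11, 10, 9, 8, 9, 10, 11]

n = len(diffs)
k = sum(1 for d in diffs if d > 0)  # number of positive differences

# Exact binomial tail: P(X >= k) when X ~ Binomial(n, 1/2)
p_one_sided = sum(comb(n, j) for j in range(k, n + 1)) / 2 ** n
# Two-sided version: "all positive" or "all negative" are equally extreme
p_two_sided = min(1.0, 2 * p_one_sided)

print(f"{k} of {n} differences positive")
print(f"one-sided sign test p = {p_one_sided}")
print(f"two-sided sign test p = {p_two_sided}")
```

No distributional assumption about the size of the differences is used here, only their signs.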
Anon.
Posted: Mon Nov 27, 2006 2:30 am
Guest
Reef Fish wrote:
Quote:
Old Mac User wrote:
Subject Initial After Differences
1 50 38 12
2 39 30 9
3 36 28 8
4 52 41 11
5 54 44 10
6 40 31 9
7 35 27 8
8 49 40 9
9 55 45 10
10 53 42 11
Avgs 46.3 36.6 9.7

If I had been shown this "data" and asked about my
assessment of the "before and after" effect, I would
have suspected that the data was "manufactured"
by someone who had never seen any real data to fake
some results to make a publication, as has often been
done.

I don't need to check any arithmetic. I don't need to check
any statistical significance. One glance at the ALL positive
(and LARGE) differences made it unmistakable that the
result is "significant" in the sense of OBVIOUSLY there.

I don't see the logic here: you seem to be suggesting that the data was
faked purely because all of the differences were in the same direction.
Do I understand you correctly?

<snip>

Bob

--
Bob O'Hara
Department of Mathematics and Statistics
P.O. Box 68 (Gustaf Hällströmin katu 2b)
FIN-00014 University of Helsinki
Finland

Telephone: +358-9-191 51479
Mobile: +358 50 599 0540
Fax: +358-9-191 51400
WWW: http://www.RNI.Helsinki.FI/~boh/
Journal of Negative Results - EEB: www.jnr-eeb.org
Reef Fish
Posted: Mon Nov 27, 2006 2:49 am
Guest
Anon. wrote:
Quote:
Reef Fish wrote:
Old Mac User wrote:
Subject Initial After Differences
1 50 38 12
2 39 30 9
3 36 28 8
4 52 41 11
5 54 44 10
6 40 31 9
7 35 27 8
8 49 40 9
9 55 45 10
10 53 42 11
Avgs 46.3 36.6 9.7

If I had been shown this "data" and asked about my
assessment of the "before and after" effect, I would
have suspected that the data was "manufactured"
by someone who had never seen any real data to fake
some results to make a publication, as has often been
done.

I don't need to check any arithmetic. I don't need to check
any statistical significance. One glance at the ALL positive
(and LARGE) differences made it unmistakable that the
result is "significant" in the sense of OBVIOUSLY there.

I don't see the logic here: you seem to be suggesting that the data was
faked purely because all of the differences were in the same direction.
Do I understand you correctly?

snip

You completely MISUNDERSTOOD my point which I elaborated
in the rest of my post, as you always misunderstood everything
I have ever posted.

There was no sign or indication, from what the OP said or others
said, about the data being faked.

My point was that the CONCLUSION of "significant effect" was SO
OBVIOUS from the data that it borders on something that MIGHT
make people think the data was faked. It was that "extremely obviously
significant" in the statistical sense that it hardly required any
pedantic support.

The point was that everyone was so busy seeking SUPPORT, like a drunk
seeking a lamp post, in statistical significance or p-values and other
academic mumbo jumbo that they seemed to have overlooked the
perfectly OBVIOUS fact that the result was significant.
Anon.
Posted: Mon Nov 27, 2006 3:38 am
Guest
Reef Fish wrote:
Quote:
Anon. wrote:
Reef Fish wrote:
Old Mac User wrote:
Subject Initial After Differences
1 50 38 12
2 39 30 9
3 36 28 8
4 52 41 11
5 54 44 10
6 40 31 9
7 35 27 8
8 49 40 9
9 55 45 10
10 53 42 11
Avgs 46.3 36.6 9.7
If I had been shown this "data" and asked about my
assessment of the "before and after" effect, I would
have suspected that the data was "manufactured"
by someone who had never seen any real data to fake
some results to make a publication, as has often been
done.

I don't need to check any arithmetic. I don't need to check
any statistical significance. One glance at the ALL positive
(and LARGE) differences made it unmistakable that the
result is "significant" in the sense of OBVIOUSLY there.

I don't see the logic here: you seem to be suggesting that the data was
faked purely because all of the differences were in the same direction.
Do I understand you correctly?

snip

You completely MISUNDERSTOOD my point which I elaborated
in the rest of my post, as you always misunderstood everything
I have ever posted.

There was no sign or indication, from what the OP said or others
said, about the data being faked.

My point was that the CONCLUSION of "significant effect" was SO
OBVIOUS from the data, that it borders something that people
MIGHT think the data was faked. It was that "extremely obviously
significant" in the statistical sense that hardly required any
pedantic support.

The point was that everyone was seeking SUPPORT like a drunk
seeks a lamp post, by statistical significance or p-value and other
academic mumbo jumbo that they seemed to have overlooked the
perfectly OBVIOUS result that the result was significant.

Ah, OK, but it wasn't clear: it would have helped if you had made your main
point clear at the start, not several paragraphs in. Your argument reads
as "I think the data might be fake, because of the low p-value".

Incidentally, the OP was asking about the effect size, not a test of the
null hypothesis anyway!

Bob

--
Bob O'Hara
Department of Mathematics and Statistics
P.O. Box 68 (Gustaf Hällströmin katu 2b)
FIN-00014 University of Helsinki
Finland

Telephone: +358-9-191 51479
Mobile: +358 50 599 0540
Fax: +358-9-191 51400
WWW: http://www.RNI.Helsinki.FI/~boh/
Journal of Negative Results - EEB: www.jnr-eeb.org
Reef Fish
Posted: Mon Nov 27, 2006 7:29 am
Guest
Anon. wrote:
Quote:
Reef Fish wrote:
Anon. wrote:
Reef Fish wrote:
Old Mac User wrote:
Subject Initial After Differences
1 50 38 12
2 39 30 9
3 36 28 8
4 52 41 11
5 54 44 10
6 40 31 9
7 35 27 8
8 49 40 9
9 55 45 10
10 53 42 11
Avgs 46.3 36.6 9.7
If I had been shown this "data" and asked about my
assessment of the "before and after" effect, I would
have suspected that the data was "manufactured"
by someone who had never seen any real data to fake
some results to make a publication, as has often been
done.

I don't need to check any arithmetic. I don't need to check
any statistical significance. One glance at the ALL positive
(and LARGE) differences made it unmistakable that the
result is "significant" in the sense of OBVIOUSLY there.

I don't see the logic here: you seem to be suggesting that the data was
faked purely because all of the differences were in the same direction.
Do I understand you correctly?

snip

You completely MISUNDERSTOOD my point which I elaborated
in the rest of my post, as you always misunderstood everything
I have ever posted.

There was no sign or indication, from what the OP said or others
said, about the data being faked.

My point was that the CONCLUSION of "significant effect" was SO
OBVIOUS from the data, that it borders something that people
MIGHT think the data was faked. It was that "extremely obviously
significant" in the statistical sense that hardly required any
pedantic support.

The point was that everyone was seeking SUPPORT like a drunk
seeks a lamp post, by statistical significance or p-value and other
academic mumbo jumbo that they seemed to have overlooked the
perfectly OBVIOUS result that the result was significant.

Ah, OK, but it wasn't clear: it would have helped if you make your main
point clear at the start, not several paragraphs in.

It was clear in my first sentence.

Quote:
Your argument reads
as "I think the data might be fake, because of the low p-value".

Why would I need any p-value to know that the effect was OBVIOUS?


Quote:

Incidentally, the OP was asking about the effect size, not a test of the
null hypothesis anyway!

No, Anon Bob. You're the one who ALWAYS came late. NEVER
read any part of the thread, and immediately jumped into where I
entered and immediately made your NOISE showing that you hadn't
read anything and didn't understand anything I said.

In this particular case, the OP had already shown what he meant on
Nov 21, that is Day 2 of the thread, after he had already seen OMU's
paired t-test analysis, when he wrote:

scientia> I felt that it had to be much more significant than z = 3
scientia> but I was not able to understand how to calculate
scientia> the real t.

which was essentially saying, I had quite a few drinks of vodka, I
wasn't quite sure how strong the lamp post has to be to support me
from falling.

Other drunks followed, Schwartz, Heiser, and Ulrich, DAYS later,
until I read it for the first time on Nov 27 and couldn't stand it any
longer without making my comment.

On Nov 27, I DID make it a point to go back to read from the
start.

Anon Bob O'Hara, on the other hand, jumped in ONLY because I
posted, making a fool of himself (as usual), and continued to show
his foolishness by showing that he hadn't read ANY of the earlier
posts when he continued to dance with his foot in his mouth.

Anon Bob> > Incidentally, the OP was asking about the effect size,
Anon Bob> > not a test of the null hypothesis anyway!

You are the ONLY one in this thread who failed to recognize that
he was involved with a statistical test of hypothesis while all the
others were talking about designs and control, completely oblivious
to looking at the DATA he had shown OMU.

That was my POINT that you had missed completely, Anon Bob.

You never THINK about the infinitesimal bit of statistics you picked
up from your biological reading. Your missed training,
coupled with your practice of
never reading the relevant portion of ANY thread before opening
your mouth, has made you an untrainable OLD DOG who cannot be
taught any new trick.

Just learn to stop barking, Anon Bob. That's the latest advice I
have for you.

The previous standing comment that your FREE TUITION had
been taken away was my mistake of over-estimation of your
capacity to learn, to think that if you had the free tuition you
could possibly learn. Sorry. That was one of the rare mistakes
I had made in this group. :-)

You are a completely untrainable old dog that barks with feet
in the mouth every time you post.

Hope that helps, as Greg Heath would tell you, in his fashion of
trying to teach Stepen when he was in the same training school
as you. LOL.

-- Reef Fish Bob.
Guest
Posted: Mon Nov 27, 2006 8:58 am
Reef Fish wrote:

Quote:
[snip]

The matter is very simple: since I was drunk, or simply ignorant,
or both, I confused the "Effect size" with the t-score, and I could
not understand why the t-score was so low (around 1.3).

In fact, I understand now that 1.3 is huge as an "Effect size", but
I did not know that before posting my question: I only noticed that
1.3 is not very significant as a t-score: this is what puzzled me.

Actually, the real value was t=22: I felt that it had to be very, very
large, but I was confused by the concept of Effect size d (that was
new for me) so I thought that the t-score had to be replaced
by this d, in cases like this!

Sorry for the misunderstanding.
And thanks to all those who helped me to understand the matter.
By the way, nobody who replied seems drunk to me.

Fabrizio Coppola
Istituto Scientia
Italy
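
The two numbers that got mixed up here can be put side by side for the table posted earlier in the thread. The sketch below assumes the effect size was computed as the mean difference divided by the pooled standard deviation of the raw Initial and After scores; the thread does not state which formula was used, but this choice gives a value close to the 1.3 mentioned above. Variable names are illustrative.

```python
import math
import statistics

# Old Mac User's table
initial = [50, 39, 36, 52, 54, 40, 35, 49, 55, 53]
after = [38, 30, 28, 41, 44, 31, 27, 40, 45, 42]

diffs = [a - b for a, b in zip(initial, after)]
n = len(diffs)
mean_diff = statistics.mean(diffs)

# Effect size d: mean difference in units of the pooled SD of the raw scores
# (assumed formula, see the note above)
pooled_sd = math.sqrt((statistics.variance(initial) + statistics.variance(after)) / 2)
d = mean_diff / pooled_sd

# Paired t: mean difference in units of its own standard error
t = mean_diff / (statistics.stdev(diffs) / math.sqrt(n))

print(f"mean difference = {mean_diff:.1f}")
print(f"effect size d   = {d:.2f}")
print(f"paired t ({n - 1} d.f.) = {t:.2f}")
```

The effect size describes how big the change is relative to the spread between subjects, while t describes how precisely the change is estimated, so t grows with the number of subjects and d does not. The t computed this way comes out close to the 22.85 quoted in the thread.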
Anon.
Posted: Mon Nov 27, 2006 9:11 am
Guest
Reef Fish wrote:
Quote:
Anon. wrote:
Reef Fish wrote:
Anon. wrote:
Reef Fish wrote:
Old Mac User wrote:
Subject Initial After Differences
1 50 38 12
2 39 30 9
3 36 28 8
4 52 41 11
5 54 44 10
6 40 31 9
7 35 27 8
8 49 40 9
9 55 45 10
10 53 42 11
Avgs 46.3 36.6 9.7
If I had been shown this "data" and asked about my
assessment of the "before and after" effect, I would
have suspected that the data was "manufactured"
by someone who had never seen any real data to fake
some results to make a publication, as has often been
done.

I don't need to check any arithmetic. I don't need to check
any statistical significance. One glance at the ALL positive
(and LARGE) differences made it unmistakable that the
result is "significant" in the sense of OBVIOUSLY there.

I don't see the logic here: you seem to be suggesting that the data was
faked purely because all of the differences were in the same direction.
Do I understand you correctly?

snip
You completely MISUNDERSTOOD my point which I elaborated
in the rest of my post, as you always misunderstood everything
I have ever posted.

There was no sign or indication, from what the OP said or others
said, about the data being faked.

My point was that the CONCLUSION of "significant effect" was SO
OBVIOUS from the data, that it borders something that people
MIGHT think the data was faked. It was that "extremely obviously
significant" in the statistical sense that hardly required any
pedantic support.

The point was that everyone was seeking SUPPORT like a drunk
seeks a lamp post, by statistical significance or p-value and other
academic mumbo jumbo that they seemed to have overlooked the
perfectly OBVIOUS result that the result was significant.

Ah, OK, but it wasn't clear: it would have helped if you make your main
point clear at the start, not several paragraphs in.

It was clear in my first sentence.

A sentence which makes no mention of significance tests, p-values etc.


Quote:
Your argument reads
as "I think the data might be fake, because of the low p-value".

Why would I need any p-value to know that the effect was OBVIOUS?

Indeed: I totally agree.

Incidentally, the OP was asking about the effect size, not a test of the
null hypothesis anyway!

No, Anon Bob. You're the one who ALWAYS came late. NEVER
read any part of the thread, and immediately jumped into where I
entered and immediately made your NOISE showing that you hadn't
read anything and didn't understand anything I said.

I certainly read the title, and the original post, in which the OP wrote

"Suppose I want to estimate the effect of a treatment, with no control
group." Sounds like estimation, not hypothesis testing to me.

Oh, and the reason I didn't participate in the discussion last week was
because I was at a statistical workshop. With proper statisticians :-)

Bob

--
Bob O'Hara

Dept. of Mathematics and Statistics
P.O. Box 68 (Gustaf Hällströmin katu 2b)
FIN-00014 University of Helsinki
Finland

Telephone: +358-9-191 51479
Mobile: +358 50 599 0540
Fax: +358-9-191 51400
WWW: http://www.RNI.Helsinki.FI/~boh/
Journal of Negative Results - EEB: http://www.jnr-eeb.org
Old Mac User
Posted: Mon Nov 27, 2006 10:22 am
Guest
Reef Fish...

I basically agree with you. My only reason for doing the paired t-test
was to illustrate to the OP the difference between "paired" and "not
paired" and the 1-column t-test. Plainly, with 10 out of 10 in one
direction, there is no justification for doing these calculations
merely to cite the value of a t-test or its p-value.
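
For readers who want the paired versus not-paired contrast spelled out, here is a minimal sketch using the table from this thread. The unpaired statistic below uses the unequal-variance (Welch) standard error, which with equal group sizes coincides with the pooled one; that choice and the variable names are assumptions for illustration.

```python
import math
import statistics

# Old Mac User's table
initial = [50, 39, 36, 52, 54, 40, 35, 49, 55, 53]
after = [38, 30, 28, 41, 44, 31, 27, 40, 45, 42]
n = len(initial)

# Paired analysis: a one-sample ("1-column") t-test on the differences
diffs = [a - b for a, b in zip(initial, after)]
t_paired = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))

# Unpaired analysis: treats the two columns as independent groups
se_unpaired = math.sqrt(statistics.variance(initial) / n + statistics.variance(after) / n)
t_unpaired = (statistics.mean(initial) - statistics.mean(after)) / se_unpaired

print(f"paired t   = {t_paired:.2f}")
print(f"unpaired t = {t_unpaired:.2f}")
```

The mean difference is identical in both analyses. Pairing removes the large subject-to-subject variation from the standard error, which is why the paired statistic is so much larger for these data.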

The first thing that struck me (concerning these data) is the rather
small amount of variation in the differences. If this were really data
from one of the "hard sciences" (physics, etc.) then I might expect
that. But now that we've learned these are hypothetical data from
experiments on humans, the modest amount of variation in the
differences looks artificial.
If I had known the source of these data (hypothetical or "real") I would
have raised a question about the small variation in the differences.

Regardless of "all the above", I'm now astonished that this thread has
turned into "discussions" about the number of significant figures in
reported p-values. Are those "discussants" programmed to claim that
(say) p = 0.049 is "significant" and p = 0.051 is "not significant"?

Sidebar comment: I have major problems with using expressions such as
"significant" and "not significant". I would write a publishable paper
on the reasons for this. I'll cite just one. I sometimes serve as an
expert witness in certain legal actions concerned with the validity of
patents. When I see a patent that proclaims that certain data is
"significant" or "not significant" then I know there's a good chance we
can give the patent owner great pain and probably render the patent
invalid. There are other reasons, but this is one card I delight in
playing.

I am horrified at how so many cite outcomes from "Cohen's Test" and a
multiplicity of others. It would be helpful if they would say "I have
this much data structure in the following manner... coming from work
done in (chemistry, human experiments, etc.) and I'm trying to
determine whether changing (independent variables) actually made a
difference in (fill in the blanks). What test or tests are appropriate
here?"

Every discipline (if they can be called that) seems to have its own
homebrewed "tests"... most trying to accomplish the same as other
homebrewed tests from other disciplines.
The characteristics of many of these tests (Type I Error Rates, power,
etc.) are unknown.
In fact, the essential characteristics of many of these tests are
actually very poor when compared to well-established tests found in
generic statistical textbooks.

So the education types have their own recipes. The psychologists have
their own. When a novice finds one of these in software or in a filthy
gutter, they just assume they have found a little miracle. This is a
sad state of affairs. OMU


Reef Fish
Posted: Mon Nov 27, 2006 10:46 am
Guest
Old Mac User wrote:
Quote:
Reef Fish...

I basically agree with you. My only reason for doing the paired t-test
was to illustrate to the OP the difference between "paired" and "not
paired" and the 1-column t-test.

Understood and acknowledged in my post.


Quote:
Plainly, with 10 out of 10 in one
direction, there is no justification for doing these calculations
merely to cite the value of a t-test or its p-value.

Agreed, especially in view of the OBVIOUS result.

Quote:

The first thing that struck me (concerning these data) is the rather
small amount of variation in the differences. If this were really data
from one of the "hard sciences" (physics, etc.) then I might expect
that. But now that we've learned these are hypothetical data from
experiments on humans, the modest amount of variation in the
differences looks artificial.
If I had known the source of these data (hypothetical or "real") I would
have raised a question about the small variation in the differences.

It wasn't my intent to suggest that the data was faked. Nor would
I ever reject such a POSSIBILITY no matter how real the data looked.
:-)

Quote:

Regardless of "all the above", I'm now astonished that this thread has
turned into "discussions" about the number of significant figures in
reported p-values. Are those "discussants" programmed to claim that
(say) p = 0.049 is "significant" and p = 0.051 is "not significant"?

Yes. In light of the DATA, that discussion was especially pedantic,
IMO.

Quote:

Sidebar comment: I have major problems with using expressions such as
"significant" and "not significant". I would write a publishable paper
on the reasons for this. I'll cite just one. I sometimes serve as an
expert witness in certain legal actions concerned with the validity of
patents. When I see a patent that proclaims that certain data is
"significant" or "not significant" then I know there's a good chance we
can give the patent owner great pain and probably render the patent
invalid. There are other reasons, but this is one card I delight in
playing.

That is something we have to LIVE with. The term statistical significance
has a well-defined meaning in the N-P (Neyman-Pearson) school of
hypothesis testing.
The ordinary usage of the word "significant" is neither quantitative nor
quantifiable. I simply treat "statistically significant" and
"practically useful" as two completely separate and DIFFERENT
concepts. There are threads in the archives about those terms.

Quote:

I am horrified at how so many cite outcomes from "Cohen's Test" and a
multiplicity of others. It would be helpful if they would say "I have
this much data structure in the following manner... coming from work
done in (chemistry, human experiments, etc.) and I'm trying to
determine whether changing (independent variables) actually made a
difference in (fill in the blanks). What test or tests are appropriate
here?"

Unfortunately, Cohen is one of those folk heroes in certain areas of
the social sciences who seem to have acquired the status of Maharishi
Mahesh Yogi. Whoever can quote Cohen gets a few automatic
brownie points from a certain sect of the non-statistical society, while
I am sure they get NEGATIVE points from the mainstream statisticians
for valid reasons.

Quote:

Every discipline (if they can be called that) seems to have its own
homebrewed "tests"... most trying to accomplish the same as other
homebrewed tests from other disciplines.
The characteristics of many of these tests (Type I Error Rates, power,
etc.) are unknown.
In fact, the essential characteristics of many of these tests are
actually very poor when compared to well-established tests found in
generic statistical textbooks.

So the education types have their own recipes. The psychologists have
their own. When a novice finds one of these in software or in a filthy
gutter, they just assume they have found a little miracle. This is a
sad state of affairs. OMU

Yes, but somewhere along the line, when something COMPLETELY
OBVIOUS shows itself in the form of presumed "real DATA", one
ought to recognize the effect immediately and not go into the
auto-pilot of pedantic significance tests and the
how-many-decimal-digits kind of nonsense. The god-given BRAIN
cells (even if there are only a few of them for some) are to be used
for THINKING, and not for memorizing formulas blindly or resorting
to something Cohen had written for whatever purpose he had.

The preceding paragraph was my POINT, and I believe it very
strongly, and that was all I was trying to convey in this thread.

-- Reef Fish Bob.
Quote:


Old Mac User wrote:
Subject Initial After Differences
1 50 38 12
2 39 30 9
3 36 28 8
4 52 41 11
5 54 44 10
6 40 31 9
7 35 27 8
8 49 40 9
9 55 45 10
10 53 42 11
Avgs 46.3 36.6 9.7

If I had been shown this "data" and asked about my
assessment of the "before and after" effect, I would
have suspected that the data was "manufactured"
by someone who had never seen any real data, to fake
some results for a publication, as has often been
done.

I don't need to check any arithmetic. I don't need to check
any statistical significance. One glance at the ALL positive
(and LARGE) differences made it unmistakable that the
result is "significant" in the sense of OBVIOUSLY there.

Anything else that is done, is merely pedantic, for THIS
data set and this one only.

While there are worthwhile discussions IN GENERAL,
without specific data and CONTEXT of the physics
experiment, about tests and controls, etc., this is a
case where any discussion of a "significant" test or
p-value is of value ONLY to the drunks looking at the
lamp post for support.

What attracted me to this thread in the first place was the
mention of "Cohen's effect size", and then one of the Cohen
Quacks in these groups, Richard Ulrich.

I have yet to see one single idea of Cohen that is worth
consideration for his artificially contrived measures and
terms outside of the mainstream of statistics.

In this particular case, once the DATA is given, the
discussants (except perhaps Old Mac User, who
wanted to see the DATA in the first place and merely
completed the routine arithmetic to come to the obvious
conclusion) seemed to be
BLIND to the OBVIOUS, poking all around in search of
some objects for "support" as a drunk seeks support to
keep from rolling off his feet.

What DIFFERENCE does it make what the corresponding
p-value is, to two decimals or 4 decimals?

The fact that all 10 differences are positive has one
chance in 2^10 or 1024, even if there is no parametric
information to get support on some T-lamppost or
Z-lamppost.
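
For anyone who wants to check the 1-in-1024 figure: it is the one-sided sign-test probability, i.e. the chance that all 10 differences come out positive if each were equally likely to be positive or negative. A minimal sketch in Python, assuming no zero differences (ties); the function name `sign_test_p` is made up for illustration:

```python
from math import comb

def sign_test_p(n_positive, n, two_sided=False):
    """Exact sign-test p-value under X ~ Binomial(n, 1/2)."""
    # For the two-sided version, use the more extreme of the two counts.
    k = max(n_positive, n - n_positive) if two_sided else n_positive
    tail = sum(comb(n, j) for j in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail) if two_sided else tail

differences = [12, 9, 8, 11, 10, 9, 8, 9, 10, 11]  # from the table in this thread
n_pos = sum(d > 0 for d in differences)

print(sign_test_p(n_pos, len(differences)))                  # one-sided: 1/1024
print(sign_test_p(n_pos, len(differences), two_sided=True))  # two-sided: 2/1024
```

If the direction of the effect was not predicted in advance, the two-sided figure of 2 in 1024 is the one to quote.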

OMU's t value of 22.85 for 9 d.f. merely gives a
pedantic and worthless p-value of less than .000000001.
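
The routine arithmetic behind that t value can be sketched in a few lines of Python using only the standard library. This is a plain paired t-ratio on the table as posted, not necessarily the exact steps OMU followed; the helper name `paired_t` is hypothetical:

```python
from math import sqrt
from statistics import mean, stdev

initial = [50, 39, 36, 52, 54, 40, 35, 49, 55, 53]
after = [38, 30, 28, 41, 44, 31, 27, 40, 45, 42]

def paired_t(x, y):
    """Return (t, degrees of freedom) for the paired differences x - y."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    standard_error = stdev(diffs) / sqrt(n)  # stdev uses the n - 1 denominator
    return mean(diffs) / standard_error, n - 1

t, df = paired_t(initial, after)
print(f"t = {t:.2f} on {df} d.f.")
```

With the numbers as typed here the ratio comes out at roughly 22.9 on 9 d.f., in line with the 22.85 quoted; the small gap is presumably rounding somewhere along the way.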

"So what? " as that question had been asked in other
contexts in this thread, but irrelevant to the actual DATA
and experiment.

Does anyone really NEED that number before they can
see the VERY OBVIOUS, from the raw data, without
any calculation, except perhaps my mental note of 1
in 1024?

-- Reef Fish Bob.
David A. Heiser
Posted: Tue Nov 28, 2006 4:21 pm
Guest
Some very quick comments here to Reeffish:
Quote:
I don't need to check any arithmetic. I don't need to check
any statistical significance. One glance at the ALL positive
(and LARGE) differences made it unmistakable that the
result is "significant" in the sense of OBVIOUSLY there.
++++++++++++++++

Right on. The problem is all those journal editors and the vast cattle herd
of reviewers who require the obvious to be stated in statistical terms.
DAH




Quote:

Anything else that is done, is merely pedantic, for THIS
data set and this one only.

While there are worthwhile discussions IN GENERAL,
without specific data and CONTEXT of the physics
experiment, about tests and controls, etc., this is a
case where any discussion of a "significant" test or
p-value is of value ONLY to the drunks looking at the
lamp post for support.
++++++++++++++++++++++++++

Well we know now that the "drunks" are the editors and reviewers of
publications and the standards for publication established by the APA.
It still is a useful screen to find the "file drawer" papers. Remember that
each of the "drunks" looks at "statistical significance" differently. One
has to satisfy the lowest common denominator.
DAH
Quote:

What attracted me to this thread in the first place was the
mention of "Cohen's effect size", and then one of the Cohen
Quacks in these groups, Richard Ulrich.

I have yet to see one single idea of Cohen that is worth
consideration for his artificially contrived measures and
terms outside of the mainstream of statistics.
++++++++++++++++++++++++++++++++++++++

When a reviewer specifically asks for "effect sizes", will you put it in or
just tell the editor to "go-fish"?
Tenure in the University of California system requires a "significant
number" of publications. Would you give up "tenure" to avoid using "effect
sizes"?
DAH


Quote:

In this particular case, once the DATA is given, the
discussants (except perhaps Old Mac User, who
wanted to see the DATA in the first place and merely
completed the routine arithmetic to come to the obvious
conclusion) seemed to be
BLIND to the OBVIOUS, poking all around in search of
some objects for "support" as a drunk seeks support to
keep from rolling off his feet.

What DIFFERENCE does it make what the corresponding
p-value is, to two decimals or 4 decimals?

Depends on what the standards are for the "publication" and publication
reviewers. Knowing that software package "X" calculates p values to a
minimum accuracy (used in Kahan's context) of 15 decimals (including leading
zeros) is a sort of "support" to the internal process of evaluating the
claims. How can you trust the results as being accurate when only 2 digits
are reported?

There are authors who made their reputation on the fact that software
packages calculate p values incorrectly.

There are great "gobs" of new software for statistics being invented and
being marketed, that are incompletely tested and are appearing to be used in
current publications. Commercial software advertising supports AMSTAT News.
There are a lot of bad algorithms out there.

If you have problems with Microsoft's Windows giving strange responses and
getting trapped in "loops" why then do you take your statistical software as
being "perfect"?

DAH

Quote:

The fact that all 10 differences are positive has one
chance in 2^10 or 1024, even if there is no parametric
information to get support on some T-lamppost or
Z-lamppost.

OMU's t value of 22.85 for 9 d.f. merely gives a
pedantic and worthless p-value of less than .000000001.

"So what? " as that question had been asked in other
contexts in this thread, but irrelevant to the actual DATA
and experiment.

Does anyone really NEED that number before they can
see the VERY OBVIOUS, from the raw data, without
any calculation, except perhaps my mental note of 1
in 1024?

Yes, the editors and reviewers want to see it.
DAH



Quote:

-- Reef Fish Bob.
Richard Ulrich
Posted: Wed Nov 29, 2006 2:50 am
Guest
On 27 Nov 2006 06:22:02 -0800, "Old Mac User"
<chendrixstats@yahoo.com> wrote:

Quote:
Reef Fish...

I basically agree with you. My only reason for doing the paired t-test

[snip]
Quote:

Regardless of "all the above", I'm now astonished that this thread has
turned into "discussions" about the number of significant figures in
reported p-values. Are those "discussants" programmed to claim that
(say) p = 0.049 is "significant" and p = 0.051 is "not significant"?

I'm surprised that anyone is "astonished" that a thread has
wandered to issues irrelevant to the original post. Your post
takes the wandering even further astray, and that is fine.

No one claimed anything about the exact 5% test, but my
own post warned that p-values are not typically
that precise. Further, I expect that there are many readers
who did not know the information that I passed along, to the
effect that p-levels (from parametric tests) smaller than 0.001
are increasingly bad estimates, merely on narrow statistical grounds.
(After the statistical grounds, there are "inferential" grounds.
That's a bigger topic.)

Quote:

Sidebar comment: I have major problems with using expressions such as
"significant" and "not significant". I could write a publishable paper
on the reasons for this. I'll cite just one. I sometimes serve as an
expert witness in certain legal actions concerned with the validity of
patents. When I see a patent that proclaims that certain data is
"significant" or "not significant" then I know there's a good chance we
can give the patent owner great pain and probably render the patent
invalid. There are other reasons, but this is one card I delight in
playing.

I am horrified at how so many cite outcomes from "Cohen's Test" and a
multiplicity of others. It would be helpful if they would say "I have

This slap at amateurs would be more impressive if there were
such a test as "Cohen's test." Googling the net gives me only 289
hits, and the first 20 are about Bernie Cohen and Baron-Cohen.
Neither of them is the Jacob Cohen who advocated "effect sizes"
as an alternative to reporting p-levels. (Which the original poster
confused with t-tests, in ways my post clarified. As the OP has
recently validated.)

Quote:
this much data structure in the following manner... coming from work
done in (chemistry, human experiments, etc.) and I'm trying to
determine whether changing (independent variables) actually made a
difference in (fill in the blanks). What test or tests are appropriate
here?"

Or, beyond testing, "How *big* is the effect, after you are sure
that there is one?"
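
Without a control group there is more than one way to standardize the change, and the choices can disagree a lot. A rough sketch in Python, using the before/after table posted earlier in this thread; the two standardizers (SD of the differences, SD of the initial scores) and the names `d_z` and `d_pre` are assumptions for illustration, not something anyone in the thread specified:

```python
from statistics import mean, stdev

initial = [50, 39, 36, 52, 54, 40, 35, 49, 55, 53]
after = [38, 30, 28, 41, 44, 31, 27, 40, 45, 42]
diffs = [a - b for a, b in zip(initial, after)]  # initial minus after, as in the table

# Mean change in units of the SD of the change scores.
d_z = mean(diffs) / stdev(diffs)

# Mean change in units of the SD of the initial (pre-test) scores.
d_pre = mean(diffs) / stdev(initial)

print(f"d_z   = {d_z:.2f}")
print(f"d_pre = {d_pre:.2f}")
```

On these data the first comes out near 7.3 and the second near 1.2, so a reported "effect size" means little unless the standardizer is stated along with it.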

Quote:

Every discipline (if they can be called that) seems to have their own
homebrewed "tests"... most trying to accomplish the same as other
homebrewed tests from other disciplines.
The characteristics of many of these tests (Type I Error Rates, power,
etc.) are unknown.
In fact, the essential characteristics of many of these tests are
actually very poor when compared to well-established tests found in
generic statistical textbooks.

So the education types have their own recipes. The psychologists have
their own. When a novice finds one of these in software or in a filthy
gutter, they just assume they have found a little miracle. This is a
sad state of affairs. OMU

I've seen posts that document how bad *industry* has been
with its "Six Sigma" testing, and so on, but I don't remember
*widespread* errors by psychologists or "education types" that
are comparable by being inherently wrong. Oh, Reef Fish has
posted claims to that end, but Reef Fish knows so little about
social sciences that he probably believed that there's a popular
"Cohen's test."

I *do* see folks doing a poor job of applying tests that
are well-established, but that is not the problem that you
described -- "their own recipes" where they assume "they
have found a little miracle". Is that a flight of rhetoric, or,
maybe, do you have a more concrete example?

[snip, rest]

--
Rich Ulrich, wpilib@pitt.edu
http://www.pitt.edu/~wpilib/index.html
Old Mac User
Posted: Wed Nov 29, 2006 10:56 am
Guest
RU...

You wrote...

"I've seen posts that document how bad *industry* has been
with its "Six Sigma" testing, and so on, but I don't remember
*widespread* errors by psychologists or "education types" that
are comparable by being inherently wrong. Oh, Reef Fish has
posted claims to that end, but Reef Fish knows so little about
social sciences that he probably believed that there's a popular
"Cohen's test."

I *do* see folks doing a poor job of applying tests that
are well-established, but that is not the problem that you
described -- "their own recipes" where they assume "they
have found a little miracle". Is that a flight of rhetoric, or,
maybe, do you have a more concrete example?"

End of a clip from your post and the beginning of my comments.

At the risk of prolonging the agony, IMHO there's nothing worse
than the ill-trained (but often egotistical) "Black Belts" and
"Belts of Many Colors" in industry. To be honest about it, my
thinking... and some of my rants... stem from watching "Belts"
do their thing.

My education and experience are in engineering and also statistics.
However, after I "retired" and began consulting full-time I've been
involved with a wide range of subjects. Let's begin with educators
and "No Child Left Behind". If you want to experience unbounded
misery, delve into that one. 49 states have their own tests to
"prove" they are meeting requirements. Data are collected and
evaluated in diverse ways so that comparisons among states are
virtually impossible. The whole point is to "prove" that NCLB is
working. Please send more money so we can make it work better.

The NY Times... "the newspaper of record" (they wish) selects appropriate
articles for publication in their Sunday edition. Articles about sports,
Iraq, science, medicine, etc. They do set standards because other
newspapers pick up those articles and re-publish them... usually in the
following week. Careful reading of many of those articles suggests flaws
in the designs of the trials from which there came forth data... medical
data being one of the most troublesome. Time passes... we get more
details... and other articles (not necessarily in the NY Times) refute
the early "findings". No wonder the public is confused about "hormones",
ImClone, etc. We all know and will admit that getting valid data re:
medical treatments is difficult and sometimes iffy. But the NY Times
publishes those articles with such force... and they do influence what is
published across the country the next week... that they often confuse
more than they enlighten. I've been in communication with a high-level
person at the NY Times concerning this, and at times I believe my
communications may have helped them think before they jump. But then my
ego may be working overtime because I don't see consistent improvement.

And the point is... everybody who has the price of a "software package"
has now become a statistician. "Belts of Many Colors", starving graduate
students, medical doctors, educators, and anyone who has access to data.
While I don't always agree with Reef Fish (I get annoyed with his negative
attitude) I agree with him on this point. Much of the "statistical work"
that is done today is of very low quality because the "analysts" have
little insight into statistical principles. I would also add this...
doing the presentation in multiple colors (maybe animation) in PowerPoint
makes bad designs and poor analyses look better. Just like putting
lipstick on a pig.

Do I have concrete examples? Oh, yes. I have a large folder (papers) of
homebrewed examples. Some of these would take the enamel off your teeth
and make your fingernails fall off. Most are from contacts with real
people over the last 45 years. Do you want one or two? How about a study
done on coal samples in which the author began with 10 samples of coal
from diverse sources (different mines)... measured certain physical and
chemical properties of the coals... ultimately building a model that is
supposed to predict a certain property from the other properties. That
model has 56 coefficients and an R-sq somewhat less than 1.00. Amazing!!
Did you see that... all of this from just ten samples of coal. How did
this genius do that? With computers and software, of course. A sort of
regression on principal components... but truly a homemade recipe.
Fortunately that one was funded by the Canadian government. If you want a
copy of this one I can snail mail it to you. There are actually two
published articles... I've only cited the worse of the two.
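
Why a near-perfect R-sq from ten samples proves nothing can be shown with pure noise. The sketch below is not the coal paper's procedure (which I can't reproduce here); it simply assumes 10 made-up samples, 9 random predictors plus an intercept, and a response unrelated to any of them. Once there are as many coefficients as samples, the fit is exact:

```python
import random

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

random.seed(1)
n = 10
# Intercept plus 9 predictors of pure noise: as many coefficients as samples.
X = [[1.0] + [random.gauss(0, 1) for _ in range(n - 1)] for _ in range(n)]
y = [random.gauss(0, 1) for _ in range(n)]  # response unrelated to X

coef = solve(X, y)
fitted = [sum(c * v for c, v in zip(coef, row)) for row in X]
y_bar = sum(y) / n
ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
ss_tot = sum((yi - y_bar) ** 2 for yi in y)
r_squared = 1 - ss_res / ss_tot
print(f"R-sq = {r_squared:.6f}")
```

The R-sq is 1 up to rounding error even though there is nothing to find, and a model with more coefficients than samples has even more room to do the same.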

Here's one I've seen several times... thanks to the "miracle" of
spreadsheets. Begin with paired data (2 columns)... then rank the numbers
in each column from smallest to largest... then take the paired
differences and calculate the t-ratio for a 1-column t-test. Bingo...
another miracle!! Whereas the t-ratio for the data in its native form was
"weak", it is greatly enhanced by this procedure. It's bad enough to do
this. Worse when the "inventor" of this bastardized method publishes it
and touts it as a wonderful discovery. Ranking is soooo easy with a
spreadsheet. This one should have been published in the Journal of
Irreproducible Results.
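
A tiny demonstration of why that recipe "works". I am reading "rank the numbers in each column" as sorting each column on its own, which is an assumption about what the spreadsheet users did; the six data pairs are invented for illustration:

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(diffs):
    """t-ratio for testing whether the mean of the differences is zero."""
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))

# Hypothetical paired data: a small average shift with noisy differences.
x = [1, 2, 3, 4, 5, 6]
y = [4, 1, 6, 3, 8, 5]

t_native = one_sample_t([b - a for a, b in zip(x, y)])

# The "recipe": sort each column on its own, which destroys the pairing.
t_sorted = one_sample_t([b - a for a, b in zip(sorted(x), sorted(y))])

print(f"native pairing: t = {t_native:.2f}")
print(f"sorted columns: t = {t_sorted:.2f}")
```

The mean difference is 1 either way, but sorting lines up small with small and large with large, so the spread of the differences collapses and the t-ratio goes from about 1.1 to about 3.9 with no new information in the data.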

There's an entire folder devoted to "Taguchi" who confused and sucked a
lot of money out of American industry (notably Ford, later GM) with his
"Taguchi Designs" and warped methods of analysis. The "designs", of
course, came from ancient Japanese games. Foolish Americans!!! Taguchi
developed a cult that followed his teachings until the quality of
American cars finally caused the folks in Detroit to turn to more serious
matters... like survival.

I'll not name an American-grown collection of strange statistical wonders
because I'll surely get sued if I do. But there's another cult that
follows this one... all created by one person.

Oh, yes, then there's the icon of the Global Warmers... the "Hockey
Stick". This saga had its roots in a large data file (a spreadsheet)
which, on inspection, has some numbers that simply make no sense at all.
These errors are visible to anyone who just looks at them. But apparently
the authors of the "Hockey Stick curve" didn't bother to look... just
bulled their way ahead with exotic analyses of "the data". That is, until
someone finally got their attention. I believe I still have links to this
stuff if you'd care to join the fray.

How about a dose of Neural Nets and some demos in which we fit random
numbers (from a deck of cards) to a "Net" in an attempt to stop foolish
people from damaging their careers?

Do I have many more? I sure do. My "chamber of horrors" continues to
expand monthly. Aside from accidents in handling data (copy/paste errors
abound) the majority of these now come from homemade recipes and/or a
complete lack of understanding of statistical principles.

So... "Is that a flight of rhetoric, or, maybe, do you have a more
concrete example?" The problem is that I have too many concrete examples.
The human mind is creative. Armed with piles of data... software (data
mining, anyone?)... perhaps some low-paid grad students... and a dash of
statistical know-how... we can "analyze" in ways our forefathers never
imagined. OMU









 
All times are GMT - 5 Hours