Basic stats question
S-boy
Posted: Thu May 01, 2008 3:35 am
Guest
I'm a journalist and non-stats guy but I have a problem I think could
be easily solved with stats.

I have a database of inspection reports for a certain type of
machinery. The inspections measure the degree of error found in the
machines. If an error is within the tolerance zone of -99 to 99, no
error is reported and the machine is recorded as having passed inspection.
But if it falls outside the tolerance, an error is recorded and grouped
with others in the same range. So an error of 261 will be counted
within the 200 to 299 range.

The majority of machines pass their inspections and fall within the
-99 to 99 tolerance zone. But those that fall outside the tolerance
skew towards the higher end of the scale. That is, there are about 3
times as many machines that score 100 or above as machines that score
-100 or below.

My question: How can I predict an average value for the machines that
fall within the tolerance, based on the pattern of those outside the
tolerance? Could I put them on a curve?

Given that those outside tolerance skew high, I expect those within
the tolerance would show a similar pattern. There should be more
machines scoring between 0 and 99 than those between 0 and -99. I need
to find out the average value of these within the tolerance. I don't
have to be bang-on accurate. Just looking for a ballpark figure.

Any suggestions on how to do this in Excel would be greatly
appreciated.

This is my data set if anyone wants to take a crack.

ERROR RANGE       # OF MACHINES
900 and above               132
800 to 899                   20
700 to 799                   24
600 to 699                   36
500 to 599                   92
400 to 499                  185
300 to 399                  328
200 to 299                 1522
100 to 199                 7819
-99 to 99                191561  (WITHIN TOLERANCE)
-100 to -199               2260
-200 to -299                653
-300 to -399                251
-400 to -499                119
-500 to -599                 90
-600 to -699                 34
-700 to -799                 32
-800 to -899                 19
-900 and below              150

Thanks,

Glen McGregor
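
As a quick check on the figures in the post, the counts in the table can be tallied directly. This is only a sketch in Python rather than Excel; the dictionary layout and variable names are assumptions made for illustration, and the counts themselves are copied from the table above.

```python
# Counts copied from the table in the post; keys are the bin edge nearest zero.
above = {100: 7819, 200: 1522, 300: 328, 400: 185, 500: 92,
         600: 36, 700: 24, 800: 20, 900: 132}      # 900 means "900 and above"
below = {-100: 2260, -200: 653, -300: 251, -400: 119, -500: 90,
         -600: 34, -700: 32, -800: 19, -900: 150}  # -900 means "-900 and below"
within = 191561                                    # the -99 to 99 tolerance zone

n_above = sum(above.values())
n_below = sum(below.values())
total = within + n_above + n_below

share_within = within / total   # fraction of machines inside the tolerance
ratio = n_above / n_below       # how many times more high errors than low errors

print(f"total machines:   {total}")
print(f"within tolerance: {share_within:.1%}")
print(f"above vs. below:  {ratio:.2f}x")
```

On these counts the tolerance zone holds roughly 93% of all machines, and the high side outnumbers the low side by a little under 3 to 1, which is consistent with the "about 3 times" figure in the post.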
S-boy
Posted: Thu May 01, 2008 4:12 pm
Guest
Unfortunately, no. I don't have data with more detailed breakdowns.
Thanks for the feedback, anyway.

For future reference, how would I do this if I did have better data?
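
One common approach when a finer breakdown is available is the grouped-data mean: treat every machine in a bin as if it sat at the bin's midpoint and take the count-weighted average. The sketch below assumes that approach; the function name `grouped_mean` and the bin counts are invented purely for illustration and are not real inspection data.

```python
def grouped_mean(bins):
    """Approximate the mean from (low, high, count) bins using bin midpoints."""
    total = sum(count for _, _, count in bins)
    weighted = sum((low + high) / 2 * count for low, high, count in bins)
    return weighted / total

# Hypothetical finer breakdown of the -99 to 99 zone; these counts are made up.
hypothetical_bins = [
    (-99, -50, 10),
    (-49, 0, 30),
    (1, 50, 40),
    (51, 99, 20),
]

estimate = grouped_mean(hypothetical_bins)
print(f"estimated mean within tolerance: {estimate:.1f}")
```

The narrower the bins, the less the midpoint assumption matters, which is why a single bin spanning -99 to 99 gives so little to work with.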


On May 1, 7:09 pm, "Phil Holman" <piholmanc@yourservice> wrote:
Quote:
About 93% of your distribution (almost +/- 2 standard deviations) is contained
in one central interval of your data, which makes it impossible to
comment on the actual shape. Relying on only the roughly 7% of your data at the
extremes is not very reliable.

Is it possible to divide all of the data into intervals of 25 (0-25,
26-50, etc.)? Then enter the data into Excel on a single row, highlight
it, click the Chart Wizard icon, and view it as a column
chart.

Phil H
Phil Holman
Posted: Thu May 01, 2008 6:09 pm
Guest
About 93% of your distribution (almost +/- 2 standard deviations) is contained
in one central interval of your data, which makes it impossible to
comment on the actual shape. Relying on only the roughly 7% of your data at the
extremes is not very reliable.

Is it possible to divide all of the data into intervals of 25 (0-25,
26-50, etc.)? Then enter the data into Excel on a single row, highlight
it, click the Chart Wizard icon, and view it as a column
chart.

Phil H
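
The binning step suggested above can also be sketched outside Excel. The Python below is a minimal illustration, assuming the raw readings are available as whole numbers; the readings listed are hypothetical, the helper names are made up, and the intervals are taken as 0 to 24, 25 to 49 and so on, which differs slightly from the 0-25, 26-50 labels in the post.

```python
from collections import Counter

def bin_counts(readings, width=25):
    """Count readings per interval [lo, lo + width), keyed by the lower edge lo."""
    return Counter((r // width) * width for r in readings)

def text_column_chart(counts, width=25):
    """Render the counts as a simple text chart, one row per interval."""
    rows = []
    for lo in sorted(counts):
        rows.append(f"{lo:>5} to {lo + width - 1:>5} | {'#' * counts[lo]}")
    return "\n".join(rows)

# Hypothetical readings, invented purely for illustration.
readings = [-60, -30, -10, 5, 12, 20, 31, 40, 44, 58, 63, 90]

counts = bin_counts(readings)
print(text_column_chart(counts))
```

With real data, a lopsided chart (taller bars on the positive side) would show directly whether the within-tolerance readings skew high in the way the out-of-tolerance ones do.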
 