non random missing data
Stephen
Posted: Thu Jan 25, 2007 9:31 pm
Guest
Folks - There has been much progress made on dealing with missing data when
it is MAR or MCAR, but what is the 'best practice' when missingness is
neither of these? More specifically, what is the best thing to do when
particular sections of a questionnaire are not completed, but this does not
appear to be a random event (due to specific parts not having been
completed), although there appear to be no identifiable characteristics
that differentiate responders from non-responders? The percentage of
non-responders for any particular section is quite small (between 1% and
5%).

Ideally I would like to use one consistent sample for the entire analysis,
not have a slightly different sample for each section.

Imputation would seem to be making quite heroic assumptions. Listwise deletion
is also making heroic assumptions, but reduces power (marginally), making
significance testing slightly more conservative. Of course, I would not know
how my estimates are being biased.
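
To put rough numbers on the cost of listwise deletion across sections, here is a minimal sketch. The per-section rates are hypothetical values chosen inside the 1% to 5% range mentioned above, and the function name is made up for illustration. It brackets the share of respondents who would survive listwise deletion: the best case is when the same people skip every section, the worst case is when the skippers never overlap, and the middle figure assumes sections are skipped independently, which may well not hold here.

```python
def complete_case_bounds(section_missing_rates):
    """Return (best, independent, worst) fractions of complete cases.

    best:        skippers overlap fully, so the loss is the largest single rate
    independent: sections are skipped independently of one another
    worst:       skippers never overlap, so the losses add up
    """
    best_loss = max(section_missing_rates)
    worst_loss = min(1.0, sum(section_missing_rates))
    independent_keep = 1.0
    for rate in section_missing_rates:
        independent_keep *= 1.0 - rate
    return 1.0 - best_loss, independent_keep, 1.0 - worst_loss


# Hypothetical missing rates for five questionnaire sections.
rates = [0.01, 0.03, 0.05, 0.02, 0.04]
best, independent, worst = complete_case_bounds(rates)
print(f"complete cases, best case:        {best:.3f}")
print(f"complete cases, if independent:   {independent:.3f}")
print(f"complete cases, worst case:       {worst:.3f}")
```

With only a handful of sections the loss stays modest, but it grows with every extra section included in the same analysis.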

I am aware there are other complex issues at work here, but any general
comments about how best to proceed with current approaches will be most
appreciated.

:)

Stephen
Gordon Sande
Posted: Fri Jan 26, 2007 10:17 am
Guest
On 2007-01-25 21:31:13 -0400, "Stephen" <stephenc@powerup.com.au> said:

Quote:
Folks - There has been much progress made on dealing with missing data when
it is MAR or MCAR, but what is the 'best practice' when missingness is
neither of these? More specifically, what is the best thing to do when
particular sections of a questionnaire are not completed, but this does not
appear to be a random event (due to specific parts not having been
completed), although there appear to be no identifiable characteristics
that differentiate responders from non-responders? The percentage of
non-responders for any particular section is quite small (between 1% and
5%).

Welcome to the real world where bias is the problem.

Some would suggest that you try to do some followup. Real followup with
folks who think like anthropologists rather than just hourly paid
telemarketers. You might then discover the characteristics that make
folks more prone to skip sections. Not cheap but then bad data may be
no bargain either.

Bill H
Posted: Fri Jan 26, 2007 11:38 am
Guest
I believe the best practice is multiple imputation, which gives you
good estimates of means and variances and only assumes MAR, which means
that missingness for a given item may be conditional on measured
variables in the model, e.g. from other sections. One to 5% missing in
a section is not that bad. I assume the problem comes when you analyze
data across sections and the overall missingness rises to over 10 or
20%. If it is only 1-5% overall, then I don't think missingness is a
problem; use listwise deletion. With multiple imputation, you run the
imputation model once, get your datasets, then analyze to your heart's
content. As far as making heroic assumptions, sure, if the missing
data depend on variables not measured and thus not in the model, you
have a problem. You may try a sensitivity analysis to judge the extent
of the problem, e.g. artificially inflate or deflate the imputed values
and see how it affects your estimates of interest. But it sounds like
you have no hypothesis about the missing data mechanism, only a hunch
that it might be NMAR. Remember that MCAR is a stronger assumption
than MAR. MCAR is the "random event" type of missing data. MAR is
missingness conditional on observed data in the model. Maybe your data
are MAR? Maybe the glass is half full?

Bill H, MS Epi
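
The "get your datasets, then analyze" step ends with combining the per-dataset results. A minimal sketch of that pooling step follows, using the standard combining rules usually attributed to Rubin. The function name and the three estimates are hypothetical, and the sketch assumes the imputed datasets have already been produced and analyzed by whatever software is in use.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one parameter across m imputed datasets.

    estimates: the point estimate from each imputed dataset
    variances: the squared standard error from each imputed dataset
    Returns (pooled estimate, total variance, pooled standard error).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")
    pooled = mean(estimates)        # average of the m estimates
    within = mean(variances)        # average sampling variance
    between = variance(estimates)   # spread across imputations (m - 1 denominator)
    total = within + (1.0 + 1.0 / m) * between
    return pooled, total, math.sqrt(total)


# Hypothetical results for one regression coefficient from three imputed datasets.
pooled, total, se = pool_rubin([10.0, 12.0, 11.0], [1.0, 1.0, 1.0])
print(f"pooled estimate {pooled:.2f}, total variance {total:.3f}, SE {se:.3f}")
```

The between-imputation term is what carries the extra uncertainty due to the missing values; analyzing a single imputed dataset would leave it out.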



Stephen
Posted: Sun Jan 28, 2007 4:30 am
Guest
"Bill H" <whowells@yahoo.com> wrote in message
news:1169825914.938347.189630@q2g2000cwa.googlegroups.com...
Quote:
I believe the best practice is multiple imputation, which gives you
good estimates of means and variances and only assumes MAR, which means
that missingness for a given item may be conditional on measured
variables in the model, e.g. from other sections. One to 5% missing in
a section is not that bad. I assume the problem comes when you analyze
data across sections and the overall missingness rises to over 10 or
20%. If it is only 1-5% overall, then I don't think missingness is a
problem; use listwise deletion. With multiple imputation, you run the
imputation model once, get your datasets, then analyze to your heart's
content. As far as making heroic assumptions, sure, if the missing
data depend on variables not measured and thus not in the model, you
have a problem. You may try a sensitivity analysis to judge the extent
of the problem, e.g. artificially inflate or deflate the imputed values
and see how it affects your estimates of interest. But it sounds like
you have no hypothesis about the missing data mechanism, only a hunch
that it might be NMAR. Remember that MCAR is a stronger assumption
than MAR. MCAR is the "random event" type of missing data. MAR is
missingness conditional on observed data in the model. Maybe your data
are MAR? Maybe the glass is half full? Bill H, MS Epi

Thanks for the response.

I do have a hypothesis about why the data are missing, but it is not easily
tested: it would require follow up, or more precisely, other sets of data to
be collected. (Current data is deidentified, so the 'conditional' variable
can't be retro-fitted.)

I am aware MI (and ML imputation) both work reasonably well in my
circumstance IF I have MAR, but I would only be assuming that is the case. I
had read that work was progressing on attempts to model other forms of
missingness that are not MAR, but have not seen anything published. It was
for that reason I posted - in case I had missed something new in the
literature.
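
Short of a full model for non-MAR missingness, the sensitivity analysis suggested earlier in the thread can be sketched very simply. The sketch below is a deliberately crude, single-imputation version: it fills each missing value with the observed mean shifted by an offset `delta`, then reports how the overall mean moves as `delta` varies. The data, the function name, and the range of offsets are all hypothetical, and a fuller version would apply the shift inside a multiple imputation procedure rather than to a single filled-in mean.

```python
from statistics import mean


def delta_adjusted_mean(values, delta):
    """Mean of `values` after filling each None with (observed mean + delta).

    delta = 0 reproduces plain mean imputation; a non-zero delta encodes the
    assumption that non-responders differ from responders by that amount.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to anchor the imputation")
    fill = mean(observed) + delta
    completed = [fill if v is None else v for v in values]
    return mean(completed)


# Hypothetical item scores for 20 respondents, one of them (5%) missing.
scores = [4.0, 5.0, 3.0, 4.0, 2.0, 5.0, 4.0, 3.0, 4.0, 5.0,
          3.0, 4.0, 4.0, 2.0, 5.0, 3.0, 4.0, 4.0, 3.0, None]

for delta in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"delta {delta:+.1f}: overall mean {delta_adjusted_mean(scores, delta):.3f}")
```

For a simple mean, the shift in the estimate equals the missing fraction times `delta`, so with 1% to 5% missing the estimate moves only slightly unless non-responders are assumed to be very different. The same is not guaranteed for regression coefficients or other estimates, which is why rerunning the actual analysis over a range of offsets is the useful exercise.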


Stephen
Posted: Sun Jan 28, 2007 4:39 am
Guest
"Gordon Sande" <g.sande@worldnet.att.net> wrote in message
news:2007012610170875249-gsande@worldnetattnet...
Quote:
Welcome to the real world where bias is the problem.

Some would suggest that you try to do some followup. Real followup with
folks who think like anthropologists rather than just hourly paid
telemarketers. You might then discover the characteristics that make
folks more prone to skip sections. Not cheap but then bad data may be
no bargain either.

Thanks for the (quite belated) welcome. I agree with you about follow up,
and the cost, and value of bad data. Unfortunately cost is part of the real
world also (even for us non-telemarketers). :)



 
All times are GMT - 5 Hours