University of Nottingham (UK) lecture slides: How to Estimate Random Effect Models.pptx

MCMC Estimation for Random Effect Modelling: The MLwiN Experience
Dr William J. Browne
School of Mathematical Sciences, University of Nottingham

Contents
Random effect modelling, MCMC and MLwiN.
Methods comparison: Guatemalan child health example.
Extendibility of MCMC algorithms: cross-classified and multiple membership models.
Artificial insemination and Danish chicken examples.
Further extensions.

Random effect models
Models that account for the underlying structure in the dataset.
Originally developed for nested structures (multilevel models), for example in education, pupils nested within schools.
An extension of linear modelling with the inclusion of random effects.
A typical 2-level model is [equation not preserved in the source]. Here i indexes pupils and j indexes schools.

MLwiN
Software package designed specifically for fitting multilevel models.
Developed by a team led by Harvey Goldstein and Jon Rasbash at the Institute of Education (IOE) in London over the past 15 years or so.
Earlier incarnations: ML2, ML3, MLN.
Originally contained classical IGLS estimation methods for fitting models.
MLwiN, launched in 1998, also included MCMC estimation.
My role in the team was as developer of the MCMC functionality in MLwiN during 4.5 years at the IOE.

Estimation Methods for Multilevel Models
Due to the additional random effects, no simple matrix formulae exist for finding estimates in multilevel models.
Two alternative approaches exist:
1. Iterative algorithms, e.g. IGLS, RIGLS, and EM in HLM, that alternate between estimating fixed and random effects until convergence. These can produce ML and REML estimates.
2. Simulation-based Bayesian methods, e.g. MCMC, that attempt to draw samples from the posterior distribution of the model.

MCMC Algorithm
Consider the 2-level model [equation not preserved in the source].
MCMC algorithms work in a Bayesian framework, so we need to add prior distributions for the unknown parameters.
Here there are 4 sets of unknown parameters: [symbols not preserved in the source].
We will add prior distributions [not preserved in the source].

MCMC Algorithm (2)
The algorithm for this model then involves simulating in turn from the 4 sets of conditional distributions. Such an algorithm is known as Gibbs sampling. MLwiN uses Gibbs sampling for all normal response models.
Firstly we set starting values for each group of unknown parameters. Then we sample from the following conditional distributions, firstly [conditional distribution not preserved in the source] to get a new value for the first set of parameters.

MCMC Algorithm (3)
We next sample in turn from the three remaining conditional distributions [not preserved in the source] to get new values for the other three sets of parameters.
We have then updated all of the unknowns in the model. The process is then simply repeated many times, each time using the previously generated parameter values to generate the next set.

Burn-in and estimates
Burn-in: It is general practice to throw away the first n values to allow the Markov chain to approach its equilibrium distribution, namely the joint posterior distribution of interest. These iterations are known as the burn-in.
Finding estimates: We continue generating values at the end of the burn-in for another m iterations. These m values are then averaged to give point estimates of the parameter of interest. Posterior standard deviations and other summary measures can also be obtained from the chains.

Methods for non-normal responses
When the response variable is Binomial or Poisson, different algorithms are required.
IGLS/RIGLS methods give quasi-likelihood estimates, e.g. MQL, PQL.
MCMC algorithms including Metropolis-Hastings sampling and adaptive rejection sampling are possible.
Numerical quadrature can give ML estimates but is not without problems.

So why use MCMC?
Often gives better estimates for non-normal responses.
Gives the full posterior distribution, so interval estimates for derived quantities are easy to produce.
Can easily be extended to more complex problems.
Potential downside 1: Prior distributions are required for all unknown parameters.
Potential downside 2: MCMC estimation is much slower than the IGLS algorithm.

The Guatemalan Child Health dataset
This consists of a subsample of 2,449 respondents from the 1987 National Survey of Maternal and Child Health, with a 3-level structure of births within mothers within communities.
The subsample consists of all women from the chosen communities who had some form of prenatal care during pregnancy. The response variable is whether this prenatal care was modern (physician or trained nurse) or not.
Rodriguez and Goldman (1995) use the structure of this dataset to consider how well quasi-likelihood methods compare with ignoring the multilevel structure and fitting a standard logistic regression.
They do this by constructing simulated datasets based on the original structure but with known true values for the fixed effects and variance parameters.
They consider the MQL method and show that the estimates of the fixed effects produced by MQL are worse than the estimates produced by standard logistic regression disregarding the multilevel structure!

The Guatemalan Child Health dataset
Goldstein and Rasbash (1996) consider the same problem but use the PQL method. They show that the results produced by 2nd order PQL estimation are far better than for MQL but still biased.
The model in this situation is [equation not preserved in the source]. In this formulation i, j and k index the level 1, 2 and 3 units respectively.
The variables x1, x2 and x3 are composite scales at each level because the original model contained many covariates at each level.
Browne and Draper (2004) considered the hybrid Metropolis-Gibbs method in MLwiN and two possible variance priors (Gamma^-1(ε, ε) and Uniform).

Simulation Results
The following gives point estimates (MCSE) for 4 methods and 500 simulated datasets.

Parameter (True)   MQL1           PQL2           Gamma          Uniform
β0 (0.65)          0.474 (0.01)   0.612 (0.01)   0.638 (0.01)   0.655 (0.01)
β1 (1.00)          0.741 (0.01)   0.945 (0.01)   0.991 (0.01)   1.015 (0.01)
β2 (1.00)          0.753 (0.01)   0.958 (0.01)   1.006 (0.01)   1.031 (0.01)
β3 (1.00)          0.727 (0.01)   0.942 (0.01)   0.982 (0.01)   1.007 (0.01)
σ²v (1.00)         0.550 (0.01)   0.888 (0.01)   1.023 (0.01)   1.108 (0.01)
σ²u (1.00)         0.026 (0.01)   0.568 (0.01)   0.964 (0.02)   1.130 (0.02)

Simulation Results
The following gives interval coverage probabilities (90%/95%) for 4 methods and 500 simulated datasets.

Parameter (True)   MQL1        PQL2        Gamma       Uniform
β0 (0.65)          67.6/76.8   86.2/92.0   86.8/93.2   88.6/93.6
β1 (1.00)          56.2/68.6   90.4/96.2   92.8/96.4   92.2/96.4
β2 (1.00)          13.2/17.6   84.6/90.8   88.4/92.6   88.6/92.8
β3 (1.00)          59.0/69.6   85.2/89.8   86.2/92.2   88.6/93.6
σ²v (1.00)         0.6/2.4     70.2/77.6   89.4/94.4   87.8/92.2
σ²u (1.00)         0.0/0.0     21.2/26.8   84.2/88.6   88.0/93.0

Summary of simulations
The Bayesian approach yields excellent bias and coverage results.
For the fixed effects, MQL performs badly but the other 3 methods all do well.
For the random effects, MQL and PQL both perform badly but MCMC with both priors is much better.
Note that this is an extreme scenario, with few level 1 units per level 2 unit yet high level 2 variance; in other examples MQL/PQL will not be so bad.

Extension 1: Cross-classified models
For example, schools by neighbourhoods. Schools will draw pupils from many different neighbourhoods and the pupils of a neighbourhood will go to several schools. No pure hierarchy can be found and pupils are said to be contained within a cross-classification of schools by neighbourhoods:
[Table: schools (rows, School 1 to School 4) by neighbourhoods (columns, nbhd 1 to nbhd 3), with an x marking each pupil in a cell; the cell positions are not recoverable from the source.]
[Unit diagram: schools S1 to S4, pupils P1 to P12, neighbourhoods N1 to N3.]

Notation
With hierarchical models we use a subscript notation that has one subscript per level, and nesting is implied reading from the left. For example, subscript pattern ijk denotes the ith level 1 unit within the jth level 2 unit within the kth level 3 unit.
If models become cross-classified we use the term classification instead of level. With notation that has one subscript per classification and that captures the relationship between classifications, notation can become very cumbersome. We propose an alternative notation, introduced in Browne et al. (2001), that only has a single subscript no matter how many classifications are in the model.

Single subscript notation
[Unit diagram: schools S1 to S4, pupils P1 to P12, neighbourhoods N1 to N3.]

i    nbhd(i)   sch(i)
1    1         1
2    2         1
3    1         1
4    2         2
5    1         2
6    2         2
7    2         3
8    3         3
9    3         4
10   2         4
11   3         4
12   3         4

We write the model as [equation (1) not preserved in the source], where classification 2 is neighbourhood and classification 3 is school. Classification 1 always corresponds to the classification at which the response measurements are made, in this case pupils.
For pupils 1 and 11 equation (1) becomes: [equations not preserved in the source].

Classification diagrams
[Two diagrams, each with nodes School, Pupil and Neighbourhood.]
First diagram: nested structure where schools are contained within neighbourhoods.
Second diagram: cross-classified structure where pupils from a school come from many neighbourhoods and pupils from a neighbourhood attend several schools.
In the single subscript notation we lose information about the relationship (crossed or nested) between classifications. A useful way of conveying this information is with the classification diagram, which has one node per classification: nodes linked by arrows have a nested relationship and unlinked nodes have a crossed relationship.

Example: Artificial insemination by donor
1901 women
279 donors
1328 donations
12100 ovulatory cycles
The response is whether conception occurs in a given cycle.
In terms of a unit diagram: [diagram with units Donor, Woman, Cycle, Donation].
Or a classification diagram: [diagram not preserved in the source].

Model for artificial insemination data
We can write the model as [equation not preserved in the source].
Results:

Parameter description     Estimate (se)
intercept                 -4.04 (2.30)
azoospermia*              0.22 (0.11)
semen quality             0.19 (0.03)
women's age > 35          -0.30 (0.14)
sperm count               0.20 (0.07)
sperm motility            0.02 (0.06)
insemination too early    -0.72 (0.19)
insemination too late     -0.27 (0.10)
women variance            1.02 (0.21)
donation variance         0.644 (0.21)
donor variance            0.338 (0.07)

Note: cross-classified models can be fitted in IGLS but are far easier to fit using MCMC estimation.

Extension 2: Multiple membership models
When level 1 units are members of more than one higher level unit, we describe a model for such data as a multiple membership model.
For example:
Pupils change schools/classes and each school/class has an effect on pupil outcomes.
Patients are seen by more than one nurse during the course of their treatment.

Notation
[Equation (2) not preserved in the source.]
Note that nurse(i) now indexes the set of nurses that treat patient i, and w^(2)_{i,j} is a weighting factor relating patient i to nurse j. For example, with four patients and three nurses, we may have the following weights:

           n1 (j=1)   n2 (j=2)   n3 (j=3)
p1 (i=1)   0.5        0          0.5
p2 (i=2)   1          0          0
p3 (i=3)   0          0.5        0.5
p4 (i=4)   0.5        0.5        0

Here patient 1 was seen by nurses 1 and 3 but not nurse 2, and so on. If we substitute the values of w^(2)_{i,j}, i and j from the table into (2) we get the series of equations: [equations not preserved in the source].

Classification diagrams for multiple membership relationships
Double arrows indicate a multiple membership relationship between classifications.
[Diagram: patient, nurse.]
We can mix multiple membership, crossed and hierarchical structures in a single model.
[Diagram: patient, nurse, hospital, GP practice.]
Here patients are multiple members of nurses, nurses are nested within hospitals, and GP practice is crossed with both nurse and hospital.

Example involving nesting, crossing and multiple membership: Danish chickens
Production hierarchy: 10,127 child flocks, 725 houses, 304 farms.
Breeding hierarchy: 10,127 child flocks, 200 parent flocks.
As a unit diagram: [diagram with units Child flock, House, Farm, Parent flock].
As a classification diagram: [diagram not preserved in the source].

Model and results
Results:

Parameter description    Estimate (se)
intercept                -2.322 (0.213)
1996                     -1.239 (0.162)
1997                     -1.165 (0.187)
hatchery 2               -1.733 (0.255)
hatchery 3               -0.211 (0.252)
hatchery 4               -1.062 (0.388)
parent flock variance    0.895 (0.179)
house variance           0.208 (0.108)
farm variance            0.927 (0.197)

Note: multiple membership models can be fitted in IGLS, and this model/dataset represents roughly the most complex model that the method can handle. Such models are far easier to fit using MCMC estimation.

Further Extensions / Work in progress
1. Multilevel factor models
2. Response variables at different levels
3. Missing data and multiple imputation
4. ESRC grant: Sample size calculations, MCMC efficiency & Model identifiability
5. Wellcome Fellowship grant for Martin Green

Multilevel factor analysis modelling
In sample surveys there are often many responses for each individual. Techniques like factor analysis are often used to identify underlying latent traits amongst these responses.
Multilevel factor analysis allows factor analysis modelling to identify factors at various levels/classifications in the dataset, so we can identify shared latent traits as well as individual level traits.
Due to the nature of MCMC algorithms, by adding a step to allow for multilevel factor models in MLwiN, cross-classified models can also be fitted without any additional programming!
See Goldstein and Browne (2002, 2005) for more detail.

Responses at different levels
In a medical survey some responses may refer to patients in a hospital while others may refer to the hospital itself.
Models that combine these responses can be fitted using the IGLS algorithm in MLwiN and shouldn't pose any problems to MCMC estimation.
The Centre for Multilevel Modelling in Bristol are investigating such models as part of their LEMMA node in the ESRC research methods programme. I am a named collaborator for the LEMMA project.
They are also looking at MCMC algorithms for latent growth models.

Missing data and multiple imputation
Missing data are widespread in survey research.
One popular approach to dealing with missing data is multiple imputation (Rubin 1987), where several imputed datasets are created, the model of interest is fitted to each dataset, and the estimates are combined.
Using a multivariate normal response multilevel model to generate the imputations using MCMC in MLwiN is described in chapter 17 of Browne (2003).
James Carpenter (LSHTM) has begun work on macros in MLwiN that automate the multiple imputation procedure.

Sample size calculations
Another issue in data collection is how big a sample we need to collect.
Such sample size calculations have simple formulae if we can assume that an independent sample can be generated. If however we wish to account for the data structure in the calculation, then things are more complex.
One possibility is a simulation-based approach, similar to that used in the model comparisons described earlier, where many datasets are simulated to look at the power for a fixed sample size.
Mousa Golalizadeh Lehi will be joining me in February on an ESRC grant looking at such an approach.
A 4th year MMath student (Lynda Leese) is looking at the approach for nested models.

Efficient MCMC algorithms
In MLwiN we have tended to use the simplest, most generally applicable MCMC algorithms for multilevel models.
For particular models there are many approaches that may improve the performance/mixing of the MCMC algorithm. We will also investigate some of these methods in the ESRC grant.
Browne (2004) looked at some reparameterisation methods for cross-classified models in a bird nesting dataset.
A second 4th year MMath student (Francis Bourchier) is looking at MCMC methods based around the IGLS representation of nested models, which are interesting.

Model Identifiability
The final part of the ESRC grant is to look at whether a model is identifiable/estimable given a particular set of data.
Cross-classified datasets where there are few level 1 units per higher level unit can result in each observation being factored into several random effects, with very few observations being used to estimate each random effect.
We are interested in establishing whether we can really estimate all parameters in such models.
An example where we can't would be a dataset of patients who are attended by doctors in wards. If there is only one doctor per ward and likewise one ward per doctor, then we cannot tease out doctor and ward effects.
Again this work was motivated by a bird nesting dataset.

Wellcome Fellowship
Martin Green has been successful in obtaining 4 years of funding from Wellcome to come and work with me.
The project is entitled Use of Bayesi
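
Returning to the Gibbs sampling scheme from the MCMC Algorithm and Burn-in slides: the following is a minimal sketch in Python, not MLwiN's implementation. Because the model equation and priors on the slides were not preserved, the sketch assumes the simplest 2-level variance components model, y_ij = beta0 + u_j + e_ij with u_j ~ N(0, s2u) and e_ij ~ N(0, s2e), a flat prior on beta0 and Gamma^-1(eps, eps) priors on both variances. The function name, argument names and the simulated data are all hypothetical.

```python
import math
import random


def gibbs_two_level(y, group, n_groups, n_iter=2000, burn_in=500, eps=1e-3, seed=0):
    """Gibbs sampler for the ASSUMED model y_ij = beta0 + u_j + e_ij.

    Returns posterior means of beta0, s2u and s2e after discarding the burn-in.
    """
    rng = random.Random(seed)
    n = len(y)
    n_j = [0] * n_groups
    for g in group:
        n_j[g] += 1

    # Starting values for the 4 sets of unknowns.
    beta0 = sum(y) / n
    u = [0.0] * n_groups
    s2u, s2e = 1.0, 1.0
    totals = {"beta0": 0.0, "s2u": 0.0, "s2e": 0.0}

    for t in range(n_iter):
        # Step 1: beta0 | y, u, s2e (flat prior, so a normal around the mean residual).
        mean_b = sum(yi - u[g] for yi, g in zip(y, group)) / n
        beta0 = rng.gauss(mean_b, math.sqrt(s2e / n))

        # Step 2: each u_j | y, beta0, s2u, s2e (normal, shrunk towards zero).
        resid = [0.0] * n_groups
        for yi, g in zip(y, group):
            resid[g] += yi - beta0
        for j in range(n_groups):
            prec = n_j[j] / s2e + 1.0 / s2u
            u[j] = rng.gauss(resid[j] / s2e / prec, math.sqrt(1.0 / prec))

        # Step 3: s2u | u is inverse Gamma; draw a Gamma variate and invert it.
        ssu = sum(uj * uj for uj in u)
        s2u = 1.0 / rng.gammavariate(eps + n_groups / 2, 1.0 / (eps + ssu / 2))

        # Step 4: s2e | y, beta0, u is inverse Gamma as well.
        sse = sum((yi - beta0 - u[g]) ** 2 for yi, g in zip(y, group))
        s2e = 1.0 / rng.gammavariate(eps + n / 2, 1.0 / (eps + sse / 2))

        # Keep only the iterations after the burn-in.
        if t >= burn_in:
            totals["beta0"] += beta0
            totals["s2u"] += s2u
            totals["s2e"] += s2e

    kept = n_iter - burn_in
    return {name: total / kept for name, total in totals.items()}


# Simulated data: 100 groups of 5, true beta0 = 2, s2u = 1, s2e = 1.
sim = random.Random(1)
group = [j for j in range(100) for _ in range(5)]
true_u = [sim.gauss(0.0, 1.0) for _ in range(100)]
y = [2.0 + true_u[g] + sim.gauss(0.0, 1.0) for g in group]

estimates = gibbs_two_level(y, group, n_groups=100)
print(estimates)
```

The posterior means should land near the values used to simulate the data. Storing the full chains instead of running totals would also give posterior standard deviations and interval estimates, as the Burn-in and estimates slide describes.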
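
To make the weights table from the multiple membership Notation slide concrete, the sketch below computes each patient's combined nurse contribution as the weighted sum of nurse effects over nurse(i). Equation (2) was not preserved in the source, so this assumes the usual multiple membership form, in which the weights for each patient sum to 1. The nurse effect values and the function name are hypothetical and invented purely for illustration.

```python
# Weights w_ij from the slide's table; nurses with weight 0 are simply omitted.
WEIGHTS = {
    "p1": {"n1": 0.5, "n3": 0.5},
    "p2": {"n1": 1.0},
    "p3": {"n2": 0.5, "n3": 0.5},
    "p4": {"n1": 0.5, "n2": 0.5},
}

# Hypothetical nurse random effects u_j (not from the source).
NURSE_EFFECTS = {"n1": 0.2, "n2": -0.4, "n3": 0.6}


def membership_effect(patient_weights, nurse_effects):
    """Weighted sum of nurse effects over the nurses that treated one patient."""
    total = sum(patient_weights.values())
    if abs(total - 1.0) > 1e-9:
        # Assumed convention: each patient's weights sum to 1.
        raise ValueError(f"weights sum to {total}, expected 1")
    return sum(w * nurse_effects[nurse] for nurse, w in patient_weights.items())


effects = {p: membership_effect(w, NURSE_EFFECTS) for p, w in WEIGHTS.items()}
for patient, effect in effects.items():
    print(patient, round(effect, 3))
```

Patient 2, seen only by nurse 1, receives that nurse's full effect, while the other patients receive an equal blend of two nurses' effects.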
