
Covariate Selection under Nonignorable Nonresponse.pdf

Chinese Journal of Applied Probability and Statistics, Vol. 40, No. 2, April 2024, pp. 287-297
doi: 10.3969/j.issn.1001-4268.2024.02.005

Covariate Selection under Nonignorable Nonresponse

SHAO Jun
(School of Statistics, East China Normal University, Shanghai, 200062, China; Department of Statistics, University of Wisconsin-Madison, Madison, WI 53706, USA)

WANG Lei
(School of Statistics and Data Science & LPMC, Nankai University, Tianjin, 300071, China)

Abstract: This paper aims at developing a covariate selection approach for high-dimensional covariate vector in the presence of nonignorable nonresponse. Because of nonignorable missing responses, a novel covariate selection method has to be developed to eliminate covariates associated with neither the response variable nor the nonresponse mechanism. Once the redundant covariates are removed, existing methods for propensity estimation and other analyses by inverse propensity weighting can be applied. We provide some simulation results to show the effectiveness of our approach.

Keywords: created responses; high dimensionality; missing not at random; propensity; semi-parametric method

2020 Mathematics Subject Classification: primary 62D19; secondary 62F07; 62G20

Citation: SHAO J, WANG L. Covariate selection under nonignorable nonresponse [J]. Chinese J Appl Probab Statist, 2024, 40(2): 287-297.

Lei Wang's research was supported by the Fundamental Research Funds for the Central Universities and the National Natural Science Foundation of China (Grant No. 12271272). Corresponding author, E-mail: shao@stat.wisc.edu. Received November 27, 2023. Revised January 29, 2024.

1 Introduction

High-dimensional covariate vectors are often encountered in many fields of modern scientific research, such as signal processing, biomedical and functional magnetic resonance imaging, and finance, where only a small number of covariates are actually related to the response of interest. Various covariate selection procedures have been developed to reduce the dimensionality of the covariate vector [1-14]. The problem becomes more challenging when the response of interest has nonresponse that is nonignorable in the sense that the probability of nonresponse conditioned on the response and covariates depends on the value of the response.

In the presence of a high-dimensional covariate vector and a response having nonignorable nonresponse, this paper aims at developing a covariate selection method to select useful covariates for predicting the response and handling nonresponse. The main difficulty in this problem is how to carry out variable selection in the presence of nonignorable missing response values. We apply a roundabout approach based on a decomposition of the conditional density of the response and its indicator of observing, given covariates, in terms of other densities that can be used to perform variable selection in the presence of nonignorable missing responses. Our approach can be implemented using any existing covariate selection procedure applied to a created "response" (a function of the response and its indicator of observing) without any missing value. Once covariates that are related to neither the response variable nor the nonresponse mechanism are eliminated, analysis can be carried out using the approach in either [15] or [16]. After we develop the methodology in Section 2, we carry out some simulations to illustrate our proposed procedure.

2 Methodology

Let $Y$ be the response variable of interest and $X$ be the associated covariate vector with possibly high dimension. Suppose that the value of $X$ is always observed but the value of $Y$ may be missing (nonresponse). Let $\delta$ be the observed binary indicator of whether $Y$ is observed or not. Throughout we assume that

$$P(\delta = 1 \mid Y, X) = \frac{1}{1 + \exp\{g(X) + \gamma Y\}}, \tag{1}$$

where $\gamma$ is an unknown parameter and $g$ is an unknown and unspecified function. The probability function in (1) is referred to as the nonresponse propensity or simply the propensity. If $\gamma = 0$, the propensity does not depend on $Y$, which may be missing, and thus the nonresponse of $Y$ is ignorable. When $\gamma \neq 0$, the propensity depends on $Y$ regardless of what $g(X)$ is, and nonresponse is nonignorable. Finally, the propensity model (1) is parametric in $Y$ and nonparametric in $X$, as $g$ is unspecified.

2.1 Covariate Selection

Without nonresponse, covariate selection amounts to finding the smallest subset $X_Y$ of $X$ such that $Y \perp X \mid X_Y$, i.e., $Y$ and $X$ are independent conditional on $X_Y$, since covariates in $X$ but not in $X_Y$ are useless. When $Y$ has ignorable nonresponse, i.e., the distribution of $Y \mid X$ is the same as that of $Y \mid X, \delta = 1$, any covariate selection method for the case without nonresponse can be applied using data observed from $(Y, X, \delta = 1)$, and we do not need to find out which covariates are related to $\delta$. In the presence of nonignorable nonresponse, however, finding $X_Y$ is not enough for analysis [17], because a covariate not related to $Y$ but related to $\delta$ is still useful. Hence, we need to find the smallest subset $X_{Y,\delta}$ of $X$ such that

$$(\delta, Y) \perp X \mid X_{Y,\delta}. \tag{2}$$

Covariates in $X$ but not in $X_{Y,\delta}$ are related to neither the response $Y$ nor the nonresponse indicator $\delta$ and, hence, should be eliminated because they are useless.

However, $X_{Y,\delta}$ cannot be directly obtained through existing covariate selection methods in the case of nonignorable nonresponse, because $Y$ is not available when $\delta = 0$ and the distributions of $Y \mid X$ and $Y \mid X, \delta = 1$ are different. To overcome this difficulty, we utilize the identity

$$f(\delta, Y \mid X) = f(\delta \mid Y, X)\, f(Y \mid X) = f(\delta \mid Y, X)\, \frac{P(\delta = 1 \mid X)\, f(Y \mid X, \delta = 1)}{P(\delta = 1 \mid Y, X)},$$

where $f(\cdot \mid \cdot)$ is a generic notation for conditional probability density. This expression and assumption (1) imply that $X_{Y,\delta} \subset X_{Y|\delta=1} \cup X_\delta \cup g(X)$, where $g(X)$ is given in (1) and $X_{Y|\delta=1}$ and $X_\delta$ are the smallest subsets of $X$ such that

$$Y \perp X \mid X_{Y|\delta=1}, \delta = 1 \quad \text{and} \quad \delta \perp X \mid X_\delta. \tag{3}$$
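The second equality in the identity above is Bayes' rule applied to the density of $Y$ given $X$. As a sketch in the same notation, and assuming only that $P(\delta = 1 \mid Y, X) > 0$ (which holds under (1)), the step can be written out as:

```latex
\begin{aligned}
f(Y \mid X, \delta = 1)
  &= \frac{P(\delta = 1 \mid Y, X)\, f(Y \mid X)}{P(\delta = 1 \mid X)} \\[4pt]
\Longrightarrow \quad
f(Y \mid X)
  &= \frac{P(\delta = 1 \mid X)\, f(Y \mid X, \delta = 1)}{P(\delta = 1 \mid Y, X)} .
\end{aligned}
```

Multiplying both sides by $f(\delta \mid Y, X)$ gives the identity. On the right-hand side, $P(\delta = 1 \mid X)$ depends on $X$ only through $X_\delta$, $f(Y \mid X, \delta = 1)$ only through $X_{Y|\delta=1}$, and, under (1), $f(\delta \mid Y, X)$ and $P(\delta = 1 \mid Y, X)$ only through $g(X)$, which is where the stated inclusion comes from.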

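It follows from (1) that

$$\exp\{g(X)\} = \frac{E(1 - \delta \mid X)}{E(\delta e^{\gamma Y} \mid X)}.$$

The key of our approach is to create a new "response" $W = \delta e^{Y}$, or $W = \delta Y$ if $P(Y = 0 \mid X) = 0$ almost surely. Note that $W = \delta e^{Y} = e^{Y}$ (or $Y$) if $Y$ is observed, and $W = \delta e^{Y} = 0$ if $Y$ is missing, regardless of what the value of $Y$ is. Hence, the newly created response $W$ does not have any missing value. Let $X_W$ be the smallest subset of $X$ such that

$$W \perp X \mid X_W. \tag{4}$$

Then $g(X) \subset X_\delta \cup X_W$, because $\delta e^{\gamma Y} = W^{\gamma} I(W > 0)$ is a function of $W$ when $W = \delta e^{Y}$, and, hence, $X_{Y,\delta} \subset X_{Y|\delta=1} \cup X_\delta \cup X_W$. Since $W$ is a function of $(\delta, Y)$, $X_W \subset X_{Y,\delta}$.

We now establish the following result, which is useful for covariate selection to find the set $X_{Y,\delta}$.

Theorem 1  Let $X_{Y,\delta}$, $X_{Y|\delta=1}$, $X_\delta$, and $X_W$ be defined in (2)-(4). Then $X_W = X_{Y|\delta=1} \cup X_\delta$ and, consequently, $X_{Y,\delta} = X_W$.

Proof  We prove the result for $W = \delta e^{Y}$, since the proof for $W = \delta Y$ is almost the same. For any $t \geq 0$,

$$P(W \leq t \mid X) = P(\delta = 1, e^{Y} \leq t \mid X) + P(\delta = 0 \mid X) = P(Y \leq \ln t \mid X, \delta = 1)\, P(\delta = 1 \mid X) + P(\delta = 0 \mid X),$$

where $\ln t$ is defined to be $-\infty$ when $t = 0$. Hence, $X_W \subset X_{Y|\delta=1} \cup X_\delta$. On the other hand, if (4) holds, then

$$P(\delta = 0 \mid X) = P(W = 0 \mid X) = P(W = 0 \mid X_W) = P(\delta = 0 \mid X_W),$$

which implies that $X_\delta \subset X_W$. This implies that

$$P(W \leq t \mid X_W) = P(Y \leq \ln t \mid X_W, \delta = 1)\, P(\delta = 1 \mid X_W) + P(\delta = 0 \mid X_W) = P(Y \leq \ln t \mid X_W, \delta = 1)\, P(\delta = 1 \mid X) + P(\delta = 0 \mid X).$$

Moreover, if $X_W$ satisfies (4), then

$$P(W \leq t \mid X_W) = P(W \leq t \mid X) = P(Y \leq \ln t \mid X, \delta = 1)\, P(\delta = 1 \mid X) + P(\delta = 0 \mid X).$$

This shows that $P(Y \leq \ln t \mid X_W, \delta = 1) = P(Y \leq \ln t \mid X, \delta = 1)$, i.e., $X_{Y|\delta=1} \subset X_W$. Thus, $X_{Y|\delta=1} \cup X_\delta \subset X_W$ and, hence, $X_W = X_{Y|\delta=1} \cup X_\delta$. Since we previously showed that $X_W \subset X_{Y,\delta} \subset X_{Y|\delta=1} \cup X_\delta \cup X_W$, we conclude that $X_{Y,\delta} = X_W$. This completes the proof. $\square$

Similar to our Theorem 1, Zheng et al. [18] derived a result for finding the central dimension reduction linear space containing linear combinations of $X_{Y,\delta}$. Their conditions are the same as ours, except that they consider $W = \delta Y$ and must assume $P(Y = 0 \mid X) = 0$. The results on dimension reduction (in [18]) and covariate selection (in our paper) are both useful in applications; the former finds linear combinations of $X_{Y,\delta}$, whereas the latter selects variables in $X_{Y,\delta}$. One advantage of covariate selection is that the result is easier to interpret.

The only condition for Theorem 1 is the semi-parametric propensity model (1). The result in Theorem 1 is not robust against violation of condition (1), although (1) is not a serious assumption.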
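To make the created response concrete, the following is a minimal sketch of how $W$ could be built from data. It assumes, purely for illustration, that missing responses are stored as NaN; the function name `created_response` and its `use_exp` flag are hypothetical and not from the paper.

```python
import numpy as np

def created_response(y, use_exp=True):
    """Build the created response W from a response vector with NaN for missing values.

    Returns (W, delta), where delta[i] = 1 if y[i] is observed and 0 otherwise,
    and W = delta * exp(y) (use_exp=True) or W = delta * y (use_exp=False).
    """
    y = np.asarray(y, dtype=float)
    delta = (~np.isnan(y)).astype(int)
    # Replace missing entries by a placeholder; they are zeroed out by delta below.
    y_filled = np.where(delta == 1, y, 0.0)
    base = np.exp(y_filled) if use_exp else y_filled
    return delta * base, delta

# Toy response vector (hypothetical values); NaN marks nonresponse.
y = np.array([0.5, np.nan, -1.2, np.nan, 2.0])
w_exp, delta = created_response(y)                 # W = delta * exp(Y)
w_lin, _ = created_response(y, use_exp=False)      # W = delta * Y
```

Either version of `W` is fully observed, so it can be passed to any screening routine written for complete data. The `delta * y` version is only appropriate when $Y = 0$ has probability zero, since otherwise an observed zero cannot be told apart from a missing value.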

2.2 Method of Finding $X_W$

According to Theorem 1, any existing model-free covariate selection method using fully observed data can be directly applied to find $X_W$ based on the observed data $\{W_i, X_i, i = 1, 2, \ldots, n\}$ without any missing value, where $\{\delta_i, Y_i, X_i, i = 1, 2, \ldots, n\}$ is a random sample from the population of $(\delta, Y, X)$, $W_i = \delta_i e^{Y_i} = e^{Y_i}$ (or $Y_i$) if $Y_i$ is observed, and $W_i = \delta_i e^{Y_i} = 0$ if $Y_i$ is missing.

To complete our step of finding $X_W$, we propose to apply the distance correlation based sure independence screening (DC-SIS) [9,14]. For $X_k$ = the $k$th component of $X$, $k = 1, 2, \ldots, p$, the marginal distance correlation between $X_k$ and $W$ is defined as

$$\omega_k = \frac{\mathrm{dcov}(X_k, W)}{\sqrt{\mathrm{dcov}(X_k, X_k)\, \mathrm{dcov}(W, W)}},$$

where $\mathrm{dcov}(u, v)$ represents the distance covariance between two random variables $u$ and $v$, defined as

$$\mathrm{dcov}(u, v) = E\{|u - \tilde u|\,|v - \tilde v|\} + E|u - \tilde u|\; E|v - \tilde v| - 2E\{E(|u - \tilde u| \mid u)\, E(|v - \tilde v| \mid v)\},$$

with $(\tilde u, \tilde v)$ being an independent copy of $(u, v)$. It can be estimated by the following sample distance correlation between $X_k$ and $W$:

$$\hat\omega_k = \frac{\widehat{\mathrm{dcov}}(X_k, W)}{\sqrt{\widehat{\mathrm{dcov}}(X_k, X_k)\, \widehat{\mathrm{dcov}}(W, W)}},$$

where

$$\widehat{\mathrm{dcov}}(X_k, W) = \frac{1}{n^2} \sum_{i=1}^{n} \sum_{j=1}^{n} |X_{ik} - X_{jk}|\,|W_i - W_j| + \frac{1}{n^4} \sum_{i=1}^{n} \sum_{j=1}^{n} |X_{ik} - X_{jk}| \sum_{i=1}^{n} \sum_{j=1}^{n} |W_i - W_j| - \frac{2}{n^3} \sum_{l=1}^{n} \sum_{i=1}^{n} \sum_{j=1}^{n} |X_{ik} - X_{lk}|\,|W_j - W_l|,$$

$X_{ik}$ is the $k$th component of $X_i$, $\widehat{\mathrm{dcov}}(X_k, X_k)$ is defined as $\widehat{\mathrm{dcov}}(X_k, W)$ with $W$ replaced by $X_k$, and $\widehat{\mathrm{dcov}}(W, W)$ is defined as $\widehat{\mathrm{dcov}}(X_k, W)$ with $X_k$ replaced by $W$. We rank the importance of $X_k$ according to $\hat\omega_k$ and estimate $X_W$ by

$$\hat X_W = \{k : \hat\omega_k \text{ is among the top } \hat d \text{ largest of all}\},$$

where $\hat d$ is an estimated dimension of $X_W$.
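The screening step can be sketched in a few lines of NumPy. This is an illustrative implementation of the sample distance covariance in its usual three-term form ($S_1 + S_2 - 2S_3$) and of the resulting marginal scores, not the authors' code; the function names `dcov2` and `dc_sis_scores` and the toy data-generating values are assumptions made for the example.

```python
import numpy as np

def dcov2(u, v):
    """Sample distance covariance S1 + S2 - 2*S3 for two 1-d samples of equal length."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    a = np.abs(u[:, None] - u[None, :])   # a[i, j] = |u_i - u_j|
    b = np.abs(v[:, None] - v[None, :])   # b[i, j] = |v_i - v_j|
    s1 = (a * b).mean()                   # (1/n^2) sum_ij a_ij * b_ij
    s2 = a.mean() * b.mean()              # (1/n^4) sum_ij a_ij * sum_ij b_ij
    s3 = (a.mean(axis=1) * b.mean(axis=1)).mean()  # (1/n^3) sum_l sum_i a_il sum_j b_jl
    return s1 + s2 - 2.0 * s3

def dc_sis_scores(X, w):
    """Marginal score dcov(X_k, W) / sqrt(dcov(X_k, X_k) * dcov(W, W)) for each column k."""
    X = np.asarray(X, dtype=float)
    d_ww = dcov2(w, w)
    scores = np.empty(X.shape[1])
    for k in range(X.shape[1]):
        xk = X[:, k]
        scores[k] = dcov2(xk, w) / np.sqrt(dcov2(xk, xk) * d_ww)
    return scores

# Toy data (hypothetical): only column 0 drives Y, and Y is missing not at random.
rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.normal(size=(n, p))
Y = 3.0 * X[:, 0] + 0.1 * rng.normal(size=n)
delta = rng.binomial(1, 1.0 / (1.0 + np.exp(-1.0 + 0.3 * Y)))
W = delta * Y                          # created response, delta * Y version
scores = dc_sis_scores(X, W)
ranking = np.argsort(scores)[::-1]     # covariate indices, most important first
```

The pairwise-distance matrices take $O(n^2)$ memory, which is fine for sample sizes in the hundreds but would need a different strategy for much larger $n$.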

Huang et al. [19] proposed an approach to determine $\hat d$ based on the maximum ratio criterion, but their method may lead to a $\hat d$ larger than the dimension of $X_W$. We modify the approach of [19] and propose to use

$$\hat d = \mathop{\arg\max}_{k = 1, 2, \ldots, d_{\max}} \frac{\hat\omega_{(k)} + \hat\omega_{(k+1)} + \hat\omega_{(k+2)}}{\hat\omega_{(k+1)} + \hat\omega_{(k+2)} + \hat\omega_{(k+3)}},$$

where $\hat\omega_{(1)} \geq \hat\omega_{(2)} \geq \cdots \geq \hat\omega_{(p)}$ are the ordered values of the $\hat\omega_k$'s and $d_{\max}$ is a user-specified positive integer. In applications, we may take $d_{\max}$ as n or ln(n), which is a commonly used value in the feature screening literature. Assuming that $d$ = the dimension of $X_W$ does not vary with $n$, it can be shown that $\hat d \to d$ as the sample size $n \to \infty$, following the argument in [19]. The main argument is that

$$\frac{\hat\omega_{(k)} + \hat\omega_{(k+1)} + \hat\omega_{(k+2)}}{\hat\omega_{(k+1)} + \hat\omega_{(k+2)} + \hat\omega_{(k+3)}} = O_p(1) \text{ for } k \neq d \quad \text{and} \quad \frac{\hat\omega_{(k)} + \hat\omega_{(k+1)} + \hat\omega_{(k+2)}}{\hat\omega_{(k+1)} + \hat\omega_{(k+2)} + \hat\omega_{(k+3)}} \to \infty \text{ for } k = d.$$

Our numerical experiments suggest that this works fairly well for selecting $X_W$.
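The three-term ratio rule is easy to apply once the scores are sorted. Below is a small sketch; the function name `estimate_dimension`, the capping of `d_max` at the number of scores minus 3 (so that the term with index $k+3$ always exists), and the toy score values are assumptions for illustration only.

```python
import numpy as np

def estimate_dimension(scores, d_max):
    """Return (d_hat, ratios) for the three-term maximum ratio criterion.

    ratios[k-1] = (s_(k) + s_(k+1) + s_(k+2)) / (s_(k+1) + s_(k+2) + s_(k+3)),
    where s_(1) >= s_(2) >= ... are the sorted scores.
    """
    s = np.sort(np.asarray(scores, dtype=float))[::-1]
    d_max = min(int(d_max), len(s) - 3)    # the ratio at k uses s_(k+3)
    if d_max < 1:
        raise ValueError("need at least 4 scores and d_max >= 1")
    window = s[:-2] + s[1:-1] + s[2:]      # window[j] = s[j] + s[j+1] + s[j+2], 0-based j
    ratios = window[:d_max] / window[1:d_max + 1]
    d_hat = int(np.argmax(ratios)) + 1     # convert 0-based position to k
    return d_hat, ratios

# Toy scores (hypothetical): three clearly active covariates at positions 1, 3, 5.
scores = np.array([0.02, 0.50, 0.015, 0.45, 0.012, 0.40, 0.011, 0.01, 0.009, 0.008])
d_hat, ratios = estimate_dimension(scores, d_max=5)
selected = np.argsort(scores)[::-1][:d_hat]   # indices of the selected covariates
```

Summing three consecutive ordered scores smooths the ratio, so a single pair of nearly tied noise scores is less likely to produce a spurious maximum than with a one-term ratio.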

2.3 Analysis after Covariate Selection

Typically, covariate selection is not the only purpose of statistical analysis. After we reduce the covariate set from $X$ to $X_W$ with dimension $d \leq p$, we need to carry out some analysis with data $\{W_i, X_i, i = 1, 2, \ldots, n\}$, where $W_i = \delta_i e^{Y_i}$ or $\delta_i Y_i$.

When nonresponse is nonignorable, the distribution of $(\delta, Y, X_W)$ may not be identifiable [20,21]. Two sufficient conditions for the identifiability of the distribution of $(\delta, Y, X_W)$ are:

(I) $X_W$ can be split into two sub-vectors, $X_W = (U, Z)$ with $Z \neq \emptyset$, such that the propensity $P(\delta = 1 \mid Y, X_W) = P(\delta = 1 \mid Y, U)$ and $f(Y \mid X_W)$ depends on $Z$.
(II) There is a parametric component in either $f(Y \mid X_W)$ or $P(\delta = 1 \mid Y, U)$.

Condition (I) means that, when $Y$ cannot be excluded from the propensity, a subset $Z$ of $X_W$ can be excluded, and $Z$ is still a useful covariate for $Y$ since $f(Y \mid X_W)$ depends on $Z$. Wang et al. [21] refer to such a $Z$ as a nonresponse instrument. Excluding $Y$ or $Z$ simplifies the form of the propensity and enables us to identify it. Although (I) and (II) are only sufficient, dropping either of them can lead to a nonidentifiable distribution of $(\delta, Y, X_W)$; see [21] for (I) and [20] for (II).

For (I), we assume that $X_W = (U, Z)$, $Z \neq \emptyset$, and (1) holds with $g(X)$ replaced by $g(U)$. Since all components in $X_W$ are related to $(\delta, Y)$ after the covariate selection in Section 2, it is automatically true that $f(Y \mid X_W)$ depends on $Z$, i.e., (I) holds as long as $Z$ exists and $Z \neq \emptyset$.

For (II), if $g(U)$ in (1) follows a parametric model, then the model and instrument selection approach in [15] can be applied to find $U$ and $Z$, estimate the propensity, and estimate parameters in $f(Y \mid X_W)$ using the estimated propensity. If $g(U)$ in (1) is nonparametric, then the semi-parametric approach in [22] can be applied. Alternatively, if $f(Y \mid X_W)$ follows a parametric model, then the model and instrument selection approach in [16] can be used.

3 Simulation Studies

We conduct a simulation study to examine the proposed method of selecting relevant covariates, the selection of the instrument $Z$, and the estimation of $E(Y)$.

The population of $(\delta, Y, X)$ is given as follows. First, the covariate vector $X = (X_1, X_2, \ldots, X_p)$ is generated from a $p$-dimensional normal distribution with all means equal to 1, all variances equal to 2, lag-one covariance $\mathrm{Cov}(X_j, X_{j-1}) = 2/3$, lag-two covariance $\mathrm{Cov}(X_j, X_{j-2}) = 1/3$, and all other covariances equal to 0. Second, the response conditional on $X$ is generated as

$$Y \sim N(X_1^2 + X_2^2 + X_3^2,\ 2),$$

i.e., only the first three components of $X$ are related to $Y$. Finally, given $(Y, X)$, the response indicator $\delta$ is from a Bernoulli distribution with propensity $\pi(Y, U) = P(\delta = 1 \mid Y, X_W)$, $X_W = (X_1, X_2, X_3)$, from one of the following three different cases:

(i) $\pi(Y, U) = [1 + \exp(\alpha + \gamma Y)]^{-1}$ with $\alpha = 0.4$, $\gamma = 0.3$, the best instrument $Z = (X_1, X_2, X_3)$, and $U = \emptyset$.
(ii) $\pi(Y, U) = [1 + \exp(\alpha + \beta_2 X_2 + \gamma Y)]^{-1}$ with $\alpha = 0.8$, $\beta_2 = 1.2$, $\gamma = 0.3$, the best instrument $Z = (X_1, X_3)$, and $U = X_2$.
(iii) $\pi(Y, U) = [1 + \exp(\alpha + \beta_1 X_1 + \beta_2 X_2 + \gamma Y)]^{-1}$ with $\alpha = 1.2$, $\beta_1 = 0.6$, $\beta_2 = 0.6$, $\gamma = 0.3$, the best instrument $Z = X_3$, and $U = (X_1, X_2)$.

The coefficients in $\pi(Y, U)$ are chosen such that the unconditional rates of missing responses are between 20% and 40%.
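A data generator along these lines can be sketched as follows. It assumes the conditional mean of $Y$ is $X_1^2 + X_2^2 + X_3^2$ and that the second argument of $N(\cdot, \cdot)$ is a variance; the function names `banded_cov` and `simulate` are hypothetical, and the coefficient values in the example call are placeholders chosen for illustration, not the values used in the paper.

```python
import numpy as np

def banded_cov(p, var=2.0, lag1=2.0 / 3.0, lag2=1.0 / 3.0):
    """Covariance matrix with the given variance, lag-one and lag-two covariances, zero elsewhere."""
    idx = np.arange(p)
    lag = np.abs(idx[:, None] - idx[None, :])
    cov = np.zeros((p, p))
    cov[lag == 0] = var
    cov[lag == 1] = lag1
    cov[lag == 2] = lag2
    return cov

def simulate(n, p, alpha, gamma, beta=None, seed=0):
    """Draw (X, Y, delta, prop) with propensity 1 / (1 + exp(alpha + sum_j beta_j X_j + gamma Y)).

    beta is an optional dict mapping a 0-based column index of X to its coefficient.
    """
    rng = np.random.default_rng(seed)
    X = rng.multivariate_normal(np.ones(p), banded_cov(p), size=n)
    mean_y = (X[:, :3] ** 2).sum(axis=1)       # assumed mean: X1^2 + X2^2 + X3^2
    Y = rng.normal(mean_y, np.sqrt(2.0))       # assumed variance 2
    lin = alpha + gamma * Y
    if beta:
        for j, b in beta.items():
            lin = lin + b * X[:, j]
    prop = 1.0 / (1.0 + np.exp(lin))
    delta = rng.binomial(1, prop)
    return X, Y, delta, prop

# Placeholder coefficients (hypothetical), in the shape of a case with U = X2.
X, Y, delta, prop = simulate(n=300, p=20, alpha=2.0, gamma=-0.3, beta={1: 0.5})
missing_rate = 1.0 - delta.mean()
```

In practice one would tune the coefficients until `missing_rate` falls in the target range, since the rate depends jointly on the intercept, the slopes, and the distribution of $Y$.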

After we apply the procedure in Sections 2.1-2.2 to obtain $\hat X_W$, we apply the instrument selection method PVC in [15] to select $U$ and $Z$, assuming that we do not know the models in (i)-(iii), and estimate the propensity by $\hat\pi(Y, U) = [1 + \exp(\hat\alpha + \hat\gamma Y)]^{-1}$, $[1 + \exp(\hat\alpha + \hat\beta_2 X_2 + \hat\gamma Y)]^{-1}$, or $[1 + \exp(\hat\alpha + \hat\beta_1 X_1 + \hat\beta_2 X_2 + \hat\gamma Y)]^{-1}$, for cases (i)-(iii), respectively. Then, the parameter $E(Y)$ is estimated as

$$\hat E(Y) = \frac{1}{n} \sum_{i=1}^{n} \frac{\delta_i Y_i}{\hat\pi(Y_i, U_i)}. \tag{5}$$

We consider sample sizes $n = 300$ and $500$, and dimensions $p = 100$, $500$, and $1000$. Thus, for all six combinations of $n$ and $p$, $n > p$ in two cases, $n = p$ in one case, and $n < p$ in three cases.

We evaluate (a) the finite sample performance of the proposed method for covariate selection under nonignorable nonresponse by the following criteria as in [9]: $P_j$ = the proportion that the active covariate $X_j$ is selected, $j = 1, 2, 3$, and $P_A$ = the proportion that all active covariates $X_1$, $X_2$, and $X_3$ are selected; (b) the finite sample performance of the PVC in [15] to select the instrument $Z$ after selecting $X_W$, with $P_C$ = the proportion of correctly selecting an instrument and $P_B$ = the proportion of selecting the best $Z$; and (c) the bias, standard deviation (SD), and root mean squared error (RMSE) of the estimator of $E(Y)$ in (5). For comparison, we also include the performance of the naive estimator of $E(Y)$, the sample mean of observed data, and the oracle estimator of $E(Y)$, assuming we know exactly which covariates are useful and which of the propensity models (i)-(iii) generates the data.

Results based on 1000 simulation replications are given in Table 1 for $P_1$, $P_2$, $P_3$, $P_A$, $P_C$, and $P_B$, and in Table 2 for the bias, SD, and RMSE of the estimation of $E(Y)$.

For any given $p$, it can be seen that both the proposed method of selecting $X_W$ and the PVC method of selecting the instrument work well with high values of the $P$'s, many of which are equal to 1000 out of 1000 simulations. In terms of SD, our proposed estimator (5) performs comparably with the oracle estimator. Estimator (5) has larger bias than the oracle estimator, but all biases are insignificant compared with the SD. The sample mean of observed data, on the other hand, is seriously biased due to the nonignorable nonresponse ...
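The contrast between the inverse propensity weighted estimator (5) and the naive sample mean can be illustrated with a toy simulation. This sketch plugs in the true propensity as a stand-in for the estimated $\hat\pi$, which is an assumption made to keep the example short; the function name `ipw_mean` and the values of `alpha`, `gamma`, and the distribution of $Y$ are hypothetical and unrelated to the paper's design.

```python
import numpy as np

def ipw_mean(y, delta, prop):
    """Inverse propensity weighted mean: (1/n) * sum_i delta_i * y_i / prop_i."""
    y = np.asarray(y, dtype=float)
    delta = np.asarray(delta)
    y_obs = np.where(delta == 1, y, 0.0)   # missing Y never enters the sum
    return float(np.mean(y_obs / prop))

rng = np.random.default_rng(1)
n = 200_000
Y = rng.normal(1.0, 1.0, size=n)           # true E(Y) = 1 in this toy setup
alpha, gamma = -1.0, 1.0                   # gamma != 0, so nonresponse is nonignorable
prop = 1.0 / (1.0 + np.exp(alpha + gamma * Y))
delta = rng.binomial(1, prop)
Y_seen = np.where(delta == 1, Y, np.nan)   # what the analyst actually observes

ipw = ipw_mean(Y_seen, delta, prop)        # weighting corrects for selective missingness
naive = float(np.nanmean(Y_seen))          # sample mean of observed responses only
```

Because larger values of $Y$ are less likely to be observed in this setup, the naive mean is pulled downward, while the weighted estimator stays close to the true mean when the propensity is correct.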
