Covariate Selection under Nonignorable Nonresponse

Chinese Journal of Applied Probability and Statistics, Vol. 40, No. 2, pp. 287-297, Apr. 2024
doi: 10.3969/j.issn.1001-4268.2024.02.005

Covariate Selection under Nonignorable Nonresponse

SHAO Jun
(School of Statistics, East China Normal University, Shanghai, 200062, China; Department of Statistics, University of Wisconsin-Madison, Madison, WI 53706, USA)

WANG Lei
(School of Statistics and Data Science & LPMC, Nankai University, Tianjin, 300071, China)

Abstract: This paper develops a covariate selection approach for a high-dimensional covariate vector in the presence of nonignorable nonresponse. Because the missing responses are nonignorable, a novel covariate selection method has to be developed to eliminate covariates associated with neither the response variable nor the nonresponse mechanism. Once the redundant covariates are removed, existing methods for propensity estimation and other analyses by inverse propensity weighting can be applied. We provide some simulation results to show the effectiveness of our approach.

Keywords: created responses; high dimensionality; missing not at random; propensity; semi-parametric method

2020 Mathematics Subject Classification: primary 62D19; secondary 62F07; 62G20

Citation: SHAO J, WANG L. Covariate selection under nonignorable nonresponse [J]. Chinese J Appl Probab Statist, 2024, 40(2): 287-297.

Lei Wang's research was supported by the Fundamental Research Funds for the Central Universities and the National Natural Science Foundation of China (Grant No. 12271272).
Corresponding author, E-mail: shao@stat.wisc.edu.
Received November 27, 2023. Revised January 29, 2024.

1 Introduction

A high-dimensional covariate vector is often encountered in many fields of modern scientific research, such as signal processing, biomedical and functional magnetic resonance imaging, and finance, where only a small number of covariates are actually related to the response of interest. Various covariate selection procedures have been developed to reduce the dimensionality of the covariate vector [1-14]. The problem becomes more challenging when the response of interest has nonresponse that is nonignorable, in the sense that the probability of nonresponse conditioned on the response and covariates depends on the value of the response.

In the presence of a high-dimensional covariate vector and a response having nonignorable nonresponse, this paper aims at

developing a covariate selection method to select useful covariates for predicting the response and handling nonresponse. The main difficulty in this problem is how to carry out variable selection in the presence of nonignorable missing response values. We apply a roundabout approach based on a decomposition of the conditional density of the response and its indicator of observing, given the covariates, in terms of other densities that can be used to perform variable selection in the presence of nonignorable missing responses. Our approach can be implemented using any existing covariate selection procedure applied to a created "response" (a function of the response and its indicator of observing) without any missing value. Once covariates that are related to neither the response variable nor the nonresponse mechanism are eliminated, analysis can be carried out using the approach in either [15] or [16]. After we develop

the methodology in Section 2, we carry out some simulations to illustrate our proposed procedure.

2 Methodology

Let $Y$ be the response variable of interest and $X$ be the associated covariate vector with possibly high dimension. Suppose that the value of $X$ is always observed but the value of $Y$ may be missing (nonresponse). Let $\delta$ be the observed binary indicator of whether $Y$ is observed or not. Throughout we assume that

$$P(\delta = 1 \mid Y, X) = \frac{1}{1 + \exp\{g(X) + \gamma Y\}}, \tag{1}$$

where $\gamma$ is an unknown parameter and $g$ is an unknown and unspecified function. The probability function in (1) is referred to as the nonresponse propensity, or simply the propensity. If $\gamma = 0$, the propensity does not depend on $Y$, which may be missing, and thus the nonresponse of $Y$ is ignorable. When $\gamma \neq 0$, the propensity depends on $Y$ regardless of what $g(X)$ is, and the nonresponse is nonignorable. Finally, the propensity model (1) is parametric in $Y$ and nonparametric in $X$, as $g$ is unspecified.

2.1 Covariate Selection

Without nonresponse, covariate selection amounts to finding the smallest subset $X_Y$ of $X$ such that $Y \perp X \mid X_Y$, i.e., $Y$ and $X$ are independent conditional on $X_Y$, since covariates in $X$ but not in $X_Y$ are useless. When $Y$ has ignorable nonresponse, i.e., the distribution of $Y \mid X$ is the same as that

of $Y \mid X, \delta = 1$, any covariate selection method for the case without nonresponse can be applied using data observed from $(Y, X, \delta = 1)$, and we do not need to find out which covariates are related to $\delta$. In the presence of nonignorable nonresponse, however, finding $X_Y$ is not enough for analysis [17], because a covariate not related to $Y$ but related to $\delta$ is still useful. Hence, we need to find the smallest subset $X_{Y,\delta}$ of $X$ such that

$$(\delta, Y) \perp X \mid X_{Y,\delta}. \tag{2}$$

Covariates in $X$ but not in $X_{Y,\delta}$ are related to neither the response $Y$ nor the nonresponse indicator $\delta$ and, hence, should be eliminated because they are useless.

However, $X_{Y,\delta}$ cannot be directly obtained through existing covariate selection methods in the case of nonignorable nonresponse, because $Y$ is not available when $\delta = 0$ and the distributions of $Y \mid X$ and $Y \mid X, \delta = 1$ are different. To overcome this difficulty, we utilize the identity

$$f(\delta, Y \mid X) = f(\delta \mid Y, X)\, f(Y \mid X) = f(\delta \mid Y, X)\, \frac{P(\delta = 1 \mid X)\, f(Y \mid X, \delta = 1)}{P(\delta = 1 \mid Y, X)},$$

where $f(\cdot \mid \cdot)$ is a generic notation for conditional probability density. This expression and assumption (1) imply that $X_{Y,\delta} \subseteq X_{Y \mid \delta = 1} \cup X_\delta \cup g(X)$, where $g(X)$ is given in (1) and $X_{Y \mid \delta = 1}$ and $X_\delta$ are the smallest subsets of $X$ such that

$$Y \perp X \mid X_{Y \mid \delta = 1}, \delta = 1 \quad \text{and} \quad \delta \perp X \mid X_\delta. \tag{3}$$

It follows from

(1) that

$$\exp\{g(X)\} = \frac{E(1 - \delta \mid X)}{E(\delta e^{\gamma Y} \mid X)}.$$

The key of our approach is to create a new "response" $W = \delta e^{Y}$, or $W = \delta Y$ if $P(Y = 0 \mid X) = 0$ almost surely. Note that $W = \delta e^{Y} = e^{Y}$ (or $Y$) if $Y$ is observed, and $W = \delta e^{Y} = 0$ if $Y$ is missing, regardless of the value of $Y$. Hence, the newly created response $W$ does not have any missing value. Let $X_W$ be the smallest subset of $X$ such that

$$W \perp X \mid X_W. \tag{4}$$

Then $g(X) \subseteq X_\delta \cup X_W$ and, hence, $X_{Y,\delta} \subseteq X_{Y \mid \delta = 1} \cup X_\delta \cup X_W$. Since $W$ is a function of $(\delta, Y)$, $X_W \subseteq X_{Y,\delta}$.

We now establish the following result, which is useful for covariate selection to find the set $X_{Y,\delta}$.

Theorem 1 Let $X_{Y,\delta}$, $X_{Y \mid \delta = 1}$, $X_\delta$, and $X_W$ be defined

$\cdots \mid X) = P(W = 0 \mid X_W) = P(\delta = 0 \mid X_W)$, which implies that $X_\delta \subseteq X_W$. This implies that

$$P(W \le t \mid X_W) = P(Y \le \ln t \mid X_W, \delta = 1)\, P(\delta = 1 \mid X_W) + P(\delta = 0 \mid X_W) = P(Y \le \ln t \mid X_W, \delta = 1)\, P(\delta = 1 \mid X) + P(\delta = 0 \mid X).$$

On the other hand, if $X_W$ satisfies (4), then

$$P(W \le t \mid X_W) = P(W \le t \mid X) = P(Y \le \ln t \mid X, \delta = 1)\, P(\delta = 1 \mid X) + P(\delta = 0 \mid X).$$

This shows that $P(Y \le \ln t \mid X_W, \delta = 1) = P(Y \le \ln t \mid X, \delta = 1)$, i.e., $X_{Y \mid \delta = 1} \subseteq X_W$. Thus, $X_{Y \mid \delta = 1} \cup X_\delta \subseteq X_W$ and, hence, $X_W = X_{Y \mid \delta = 1} \cup X_\delta$. Since we previously showed that $X_W \subseteq X_{Y,\delta} \subseteq X_{Y \mid \delta = 1} \cup X_\delta \cup X_W$, we conclude that $X_{Y,\delta} = X_W$. This completes the proof. $\square$

Similar to our Theorem 1, Zheng et al. [18] derived a result for finding the central dimension reduction linear space containing linear combinations of $X_{Y,\delta}$. Their conditions are the same as ours, except that they consider $W = \delta Y$ and must assume $P(Y = 0 \mid X) = 0$. The results on dimension reduction (in [18]) and covariate selection (in our paper) are both useful in applications; the former finds linear combinations of $X_{Y,\delta}$, whereas the latter selects variables in $X_{Y,\delta}$. One advantage of covariate selection is that the result is easier to interpret.

The only condition for Theorem 1 is the semi-parametric propensity model (1). The result in Theorem 1 is not robust against violation of condition (1), although (1) is not a restrictive assumption.

2.2 Method of Finding $X_W$

According to Theorem 1, any existing model-free covariate selection method using fully observed data can be directly applied to find $X_W$ based on the observed data $\{W_i, X_i, i = 1, 2, \ldots, n\}$ without any missing value, where $\{\delta_i, Y_i, X_i, i = 1, 2, \ldots, n\}$ is a random sample from the population of $(\delta, Y, X)$, $W_i = \delta_i e^{Y_i} = e^{Y_i}$ (or $Y_i$) if $Y_i$ is observed, and $W_i = \delta_i e^{Y_i} = 0$ if $Y_i$ is missing.

To complete our step of finding $X_W$, we propose to apply the distance correlation based sure independence screening (DC-SIS) [9,14]. For $X_k$ = the $k$th component of $X$, $k = 1, 2, \ldots, p$, the marginal distance correlation between $X_k$ and $W$ is defined as

$$\omega_k = \frac{\mathrm{dcov}(X_k, W)}{\sqrt{\mathrm{dcov}(X_k, X_k)\, \mathrm{dcov}(W, W)}},$$

where $\mathrm{dcov}(u, v)$ represents the distance covariance between two random variables $u$ and $v$, defined as

$$\mathrm{dcov}(u, v) = E|u - \tilde{u}||v - \tilde{v}| + E|u - \tilde{u}|\, E|v - \tilde{v}| - 2E\{E(|u - \tilde{u}| \mid u)\, E(|v - \tilde{v}| \mid v)\},$$

with $(\tilde{u}, \tilde{v})$ being an independent copy of $(u, v)$. It can be estimated by the following sample distance correlation between $X_k$ and $W$: $\hat{\omega}_k = \widehat{\mathrm{dcov}} \cdots$

$\cdots$ importance of $X_k$ according to $\hat{\omega}_k$ and estimate $X_W$ by

$$\hat{X}_W = \{k : \hat{\omega}_k \text{ is among the top } \hat{d} \text{ largest of all}\},$$

where $\hat{d}$ is an estimated dimension of $X_W$. Huang et al. [19] proposed an approach to determine $\hat{d}$ based on the maximum ratio criterion, but their method may lead to a $\hat{d}$ larger than the dimension of $X_W$. We modify the approach of [19] and propose to use

$$\hat{d} = \mathop{\arg\max}_{k = 1, 2, \ldots, d_{\max}} \frac{\hat{\omega}_{(k)} + \hat{\omega}_{(k+1)} + \hat{\omega}_{(k+2)}}{\hat{\omega}_{(k+1)} + \hat{\omega}_{(k+2)} + \hat{\omega}_{(k+3)}},$$

where $\hat{\omega}_{(1)} \ge \hat{\omega}_{(2)} \ge \cdots \ge \hat{\omega}_{(p)}$ are the ordered values of the $\hat{\omega}_k$'s and $d_{\max}$ is a user-specified positive integer. In applications, we may take $d_{\max}$ as $n$ or $\ln(n)$, which is a commonly used value in the feature screening literature.

Assuming that $d$ = the dimension of $X_W$ does not vary with $n$, it can be shown that $\hat{d} \to d$ as the sample size $n \to \infty$, following the argument in [19]. The main argument is that $(\hat{\omega}_{(k)} + \hat{\omega}_{(k+1)} + \hat{\omega}_{(k+2)}) / (\hat{\omega}_{(k+1)} + \hat{\omega}_{(k+2)} + \hat{\omega}_{(k+3)}) = O_p(1)$ for $k \neq d$ and $(\hat{\omega}_{(k)} + \hat{\omega}_{(k+1)} + \hat{\omega}_{(k+2)}) / (\hat{\omega}_{(k+1)} + \hat{\omega}_{(k+2)} + \hat{\omega}_{(k+3)}) \to \infty$ for $k = d$. Our numerical experiments suggest that this works fairly well for selecting $X_W$.

2.3 Analysis after Covariate Selection

Typically, covariate selection is not the only purpose of statistical analysis. After we reduce the covariate set from $X$ to $X_W$ with dimension $d \le p$, we need to carry out some analysis with data $\{W_i, X_i, i = 1, 2, \ldots, n\}$, where $W_i = \delta_i e^{Y_i}$ or $\delta_i Y_i$.

When nonresponse is nonignorable, the distribution of $(\delta, Y, X_W)$ may not be identifiable [20,21]. Two sufficient conditions for the identifiability of the distribution of $(\delta, Y, X_W)$ are:

(I) $X_W$ can be split into two sub-vectors, $X_W = (U, Z)$ with $Z \neq \emptyset$, such that the propensity $P(\delta = 1 \mid Y, X_W) = P(\delta = 1 \mid Y, U)$ and $f(Y \mid X_W)$ depends on $Z$.

(II) There is a parametric component in either $f(Y \mid X_W)$ or $P(\delta = 1 \mid Y, U)$.

Condition (I) means that, when $Y$ cannot be excluded from the propensity, a subset $Z$ of $X_W$ can be excluded, and $Z$ is still a useful covariate for $Y$ since $f(Y \mid X_W)$ depends on $Z$. Wang et al. [21] refer to such

a $Z$ as a nonresponse instrument. Excluding $Y$ or $Z$ simplifies the form of the propensity and enables us to identify it. Although (I) and (II) are sufficient, the absence of either one leads to a nonidentifiable distribution of $(\delta, Y, X_W)$; see [21] for (I) and [20] for (II).

For (I), we assume that $X_W = (U, Z)$, $Z \neq \emptyset$, and (1) holds with

$g(X)$ replaced by $g(U)$. Since all components in $X_W$ are related to $(\delta, Y)$ after the covariate selection in Section 2, it is automatically true that $f(Y \mid X_W)$ depends on $Z$, i.e., (I) holds as long as $Z$ exists and $Z \neq \emptyset$.

For (II), if $g(U)$ in (1) follows a parametric model, then the model and instrument selection approach in [15] can be applied to find $U$ and $Z$, estimate the propensity, and estimate parameters in $f(Y \mid X_W)$ using the estimated propensity. If $g(U)$ in (1) is nonparametric, then the semi-parametric approach in [22] can be applied. Alternatively, if $f(Y \mid X_W)$ follows a parametric model, then the model and instrument selection approach in [16] can be used.

3 Simulation Studies

We conduct a simulation study to examine the proposed method of selecting relevant covariates, the selection of the instrument $Z$, and the estimation of $E(Y)$.

The population of $(\delta, Y, X)$ is given as follows. First,

the covariate vector $X = (X_1, X_2, \ldots, X_p)$ is generated from a $p$-dimensional normal distribution with all means equal to 1, all variances equal to 2, lag-one covariance $\mathrm{Cov}(X_j, X_{j-1}) = 2/3$, lag-two covariance $\mathrm{Cov}(X_j, X_{j-2}) = 1/3$, and all other covariances equal to 0. Second, the response conditional on $X$ is generated as

$$Y \sim N(X_1^2 + X_2^2 + X_3^2,\, 2),$$

i.e., only the first three components of $X$ are related to $Y$. Finally, given $(Y, X)$, the response indicator $\delta$ is from a Bernoulli distribution with propensity $\pi(Y, U) = P(\delta = 1 \mid Y, X_W)$, $X_W = (X_1, X_2, X_3)$, from one of the following three cases:

(i) $\pi(Y, U) = [1 + \exp(\alpha + \gamma Y)]^{-1}$ with $\alpha = 0.4$, $\gamma = 0.3$, the best instrument $Z = (X_1, X_2, X_3)$, and $U = \emptyset$.

(ii) $\pi(Y, U) = [1 + \exp(\alpha + \beta_2 X_2 + \gamma Y)]^{-1}$ with $\alpha = 0.8$, $\beta_2 = 1.2$, $\gamma = 0.3$, the best instrument $Z = (X_1, X_3)$, and $U = X_2$.

(iii) $\pi(Y, U) = [1 + \exp(\alpha + \beta_1 X_1 + \beta_2 X_2 + \gamma Y)]^{-1}$ with $\alpha = 1.2$, $\beta_1 = 0.6$, $\beta_2 = 0.6$, $\gamma = 0.3$, the best instrument $Z = X_3$, and $U = (X_1, X_2)$.

The coefficients in $\pi(Y, U)$ are chosen such that the unconditional rates of missing responses are between 20% and 40%.

After we apply the procedure in Sections 2.1-2.2 to obtain $X_W$, we apply the instrument selection method PVC in [15] to select $U$ and $Z$, assuming that we do not know the models in (i)-(iii), and estimate the propensity by $\hat{\pi}(Y, U) = [1 + \exp(\hat{\alpha} + \hat{\gamma} Y)]^{-1}$, $[1 + \exp(\hat{\alpha} + \hat{\beta}_2 X_2 + \hat{\gamma} Y)]^{-1}$, or $[1 + \exp(\hat{\alpha} + \hat{\beta}_1 X_1 + \hat{\beta}_2 X_2 + \hat{\gamma} Y)]^{-1}$ for cases (i)-(iii), respectively. Then, the parameter $E(Y)$ is estimated as

$$\hat{E}(Y) = \frac{1}{n} \sum_{i=1}^{n} \frac{\delta_i Y_i}{\hat{\pi}(Y_i, U_i)}. \tag{5}$$

We consider sample sizes $n = 300$ and $500$, and dimensions $p = 100$, $500$, and $1000$. Thus, among the six combinations of $n$ and $p$, $n > p$ in two cases, $n = p$ in one case, and $n < p$ in three cases.

We evaluate (a) the finite sample performance of the proposed method for covariate selection under nonignorable nonresponse by the following criteria as in [9]: $P_j$ = the proportion that the active covariate $X_j$ is selected, $j = 1, 2, 3$, and $P_A$ = the proportion that all active covariates $X_1$, $X_2$, and $X_3$ are selected; (b) the finite sample performance of the PVC in [15] to select the instrument $Z$ after selecting $X_W$, with $P_C$ = the proportion of correctly selecting an instrument and $P_B$ = the proportion of selecting the best $Z$; and (c) the bias, standard deviation (SD), and root mean squared error (RMSE) of the estimator of $E(Y)$ in (5). For comparison, we also include the performance of the naive estimator of $E(Y)$, the sample mean of observed data, and the oracle estimator of $E(Y)$, which assumes that we know exactly which covariates are useful and which of the propensity models (i)-(iii) generates the data.

Results based on 1000 simulation replications are given in Table 1 for $P_1$, $P_2$, $P_3$, $P_A$, $P_C$, and $P_B$, and in Table 2 for the bias, SD, and RMSE of the estimation of $E(Y)$.

For any given $p$, it can be seen that both the proposed method of selecting $X_W$ and the PVC method of selecting the instrument work well, with high values of the $P$'s, many of which equal 1000 out of 1000 simulations. In terms of SD, our proposed estimator (5) performs comparably with the oracle estimator. Estimator (5) has larger bias than the oracle estimator, but all biases are small relative to the SD. The sample mean of observed data, on the other hand, is seriously biased due to the nonignorable nonresponse.
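
The screening step of Section 2.2 (create $W$, rank covariates by marginal distance correlation, then choose the cut-off by the modified maximum ratio criterion) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `dcov`, `dcor`, and `estimate_dimension` are hypothetical, the sample distance covariance is taken to be the usual plug-in (V-statistic) version of the population formula, and the toy data (independent standard normal covariates, the propensity coefficients, `n`, `p`, and `d_max`) are illustrative values that do not reproduce the simulation design of Section 3.

```python
import numpy as np


def dcov(u, v):
    """Plug-in estimate of dcov(u, v) = S1 + S2 - 2*S3 for two 1-D samples."""
    a = np.abs(u[:, None] - u[None, :])  # pairwise distances |u_i - u_j|
    b = np.abs(v[:, None] - v[None, :])  # pairwise distances |v_i - v_j|
    s1 = np.mean(a * b)
    s2 = np.mean(a) * np.mean(b)
    s3 = np.mean(a.mean(axis=1) * b.mean(axis=1))
    return max(s1 + s2 - 2.0 * s3, 0.0)  # clip tiny negative rounding error


def dcor(u, v):
    """Marginal distance correlation dcov(u,v) / sqrt(dcov(u,u) * dcov(v,v))."""
    denom = np.sqrt(dcov(u, u) * dcov(v, v))
    return dcov(u, v) / denom if denom > 0 else 0.0


def estimate_dimension(omega_sorted, d_max):
    """Modified maximum ratio criterion over k = 1, ..., d_max.

    omega_sorted must be in decreasing order and have length >= d_max + 3.
    """
    k = np.arange(d_max)  # zero-based positions for k = 1, ..., d_max
    num = omega_sorted[k] + omega_sorted[k + 1] + omega_sorted[k + 2]
    den = omega_sorted[k + 1] + omega_sorted[k + 2] + omega_sorted[k + 3]
    return int(np.argmax(num / den)) + 1


# Toy data: hypothetical values chosen only for illustration.
rng = np.random.default_rng(0)
n, p, d_max = 200, 30, 10
X = rng.normal(size=(n, p))
Y = X[:, 0] ** 2 + X[:, 1] ** 2 + X[:, 2] ** 2 + rng.normal(size=n)
propensity = 1.0 / (1.0 + np.exp(-1.5 + 0.3 * Y))  # depends on Y: nonignorable
delta = rng.binomial(1, propensity)

# Created response W = delta * exp(Y): it is 0 whenever Y is missing,
# so it can be computed from the observed data alone.
W = np.zeros(n)
W[delta == 1] = np.exp(Y[delta == 1])

# Rank covariates by distance correlation with W and pick the top d_hat.
omega = np.array([dcor(X[:, k], W) for k in range(p)])
order = np.argsort(-omega)
d_hat = estimate_dimension(omega[order], d_max)
selected = np.sort(order[:d_hat])
print("estimated dimension:", d_hat, "selected columns:", selected)
```

Each call to `dcov` builds two $n \times n$ distance matrices, so the cost of screening is of order $p n^2$; for the sample sizes and dimensions considered in Section 3 this remains feasible, but a blockwise computation may be preferable for much larger $n$.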
