
Chinese Journal of Applied Probability and Statistics
Apr., 2024, Vol. 40, No. 2, pp. 343-363
doi: 10.3969/j.issn.1001-4268.2024.02.008

An Adaptive Multivariate EWMA Control Chart for Monitoring Missing Data

PU Xiaolong  XIANG Dongdong  CHEN Xinyan
(KLATASDS-MOE, School of Statistics, East China Normal University, Shanghai, 200062, China)

Abstract: With the increasing complexity of production processes, there has been a growing focus on online algorithms within the domain of multivariate statistical process control (SPC). Nonetheless, conventional methods, based on the assumption of complete data obtained at uniform time intervals, exhibit suboptimal performance in the presence of missing data. In our pursuit of maximizing available information, we propose an adaptive exponentially weighted moving average (EWMA) control chart employing a weighted imputation approach that leverages the relationships between complete and incomplete data. Specifically, we introduce two recovery methods: an improved K-Nearest Neighbors imputing value and the conventional univariate EWMA statistic. We then formulate an adaptive weighting function to amalgamate these methods, assigning a diminished weight to the EWMA statistic when the sample information suggests an increased likelihood of the process being out of control, and vice versa. The robustness and sensitivity of the proposed scheme are shown through simulation results and an illustrative example.

Keywords: online monitoring; completely random missing; weighted imputing values; EWMA; improved K-nearest neighbors

2020 Mathematics Subject Classification: 62P30

Citation: PU X L, XIANG D D, CHEN X Y. An adaptive multivariate EWMA control chart for monitoring missing data[J]. Chinese J Appl Probab Statist, 2024, 40(2): 343-363.

Corresponding author, E-mail: . Received December 18, 2023.

1 Introduction

The field of multivariate statistical process control (MSPC) has undergone significant advancements over the past decade, propelled by the continuous augmentation of computational capabilities and the evolution of real-time information capture systems [1-6]. MSPC finds applications across diverse domains, ranging from solar flare monitoring [7] to intelligent decision-making in business [8] and fault diagnosis in industrial scenes [9-11], among others. Notably, conventional approaches within this domain mainly rely on the presupposition of complete data acquired at uniform time intervals. However, there is a conspicuous scarcity of methodologies specifically tailored to handle missing data. This paper directs its attention to the enhancement of control schemes to effectively monitor mean shifts while accounting for the challenges posed by missing data.

The issue of monitoring missing data is common in practical applications [12,13]. A pertinent example is found in the realm of Intelligent Transportation Systems (ITS), where the detection system plays a crucial role in alleviating traffic congestion and enhancing driver assistance systems. In this system, diverse characteristics are gathered through various sensors like surveillance cameras and automobile data recorders, and subsequently transmitted to a central processing unit. The efficacy of transportation systems is contingent on high-quality traffic data for ITS applications. However, missing data issues persist within the ITS due to hardware limitations and subjective factors [14]. Firstly, sampling intervals differ among sensors, with data pertaining to individual details (e.g., walk speed, access and egress times) being more frequently acquired than information on environmental factors (e.g., train load, tap-in and tap-out times). Consequently, the amalgamation of these diverse characteristics for monitoring may result in missing data in variables with lower sampling frequencies. Secondly, variables become unavailable at random due to transmission distortion and loss of vehicle identification [15,16]. As noted by Chen et al. [17], incomplete data significantly impacts the accuracy of traffic estimation, inference, and prediction. Similar challenges are encountered in various domains, including telehealth systems [18], patient records [19], and chemical control processes [20].

To date, the statistical quality control literature has addressed the aforementioned problem only sparingly, particularly within the domain of multivariate control charts, an instrumental tool for simultaneous detection of multiple characteristics and signaling abnormal conditions. Within this context, a straightforward yet ineffective approach involves removing observations or variables with missing data and proceeding to detect the subsequent observation. This method, as elucidated by Mason and Young [21], is valid only if the remaining data continues to be representative of the process and if the cause of the missing data is independent of the values themselves. When shifts coincide with missing data, the control system typically exhibits a slower response than in the complete data scenario. In response to this limitation, Mason and Young [21] proposed a predictive value to estimate the mean of the missing variable, relying on regression with the other remaining variables. While this approach was applied in the $T^2$ chart, two adjusted weights for the exponentially weighted moving average (EWMA) chart were introduced, with the method based on the previous value of the EWMA statistic for a variable exhibiting the best overall performance [22]. However, this method's drawback lies in potential challenges to estimation robustness when missing values occur near shift points.

To address this, advanced imputation methods have been employed. Madbuly et al. [23] explored mean substitution, regression, stochastic regression, and the expectation maximization algorithm for handling missing values in constructing the multivariate EWMA control chart. Their findings indicated that regression-based imputation methods performed well. Nevertheless, such methods, reliant on fixed relationships, run counter to the core objective of MSPC, which is to promptly monitor changes, thus compromising algorithmic efficiency. Furthermore, even in instances where missing data problems arise within an in-control process, inaccuracies in imputed values may occur due to the absence of large-scale complete historical data. Importantly, simple recovery methods prove insufficient in enhancing data quality within monitoring systems, given the inherent characteristics of data heterogeneity and spatial-temporal correlation [24]. Consequently, the imperative arises to develop a robust and sensitive monitoring scheme capable of effectively detecting online processes with intermittent missing data, applicable across diverse scenarios.

This paper introduces a novel charting scheme designed for the robust and sensitive detection of incomplete online data. Specifically, our approach addresses scenarios where a subset of variables is missing at random, allowing for the utilization of relationships between complete and incomplete variables to recover missing data dynamically. The proposed method employs two distinct imputing values to enhance the performance of the EWMA control chart in the context of missing data.

Firstly, we introduce an improved K-Nearest Neighbors (KNN) imputing value, considering the correlativity of variables. This method identifies previous samples that closely resemble the missing point through complete variables, eliminating the need for assumptions regarding normal distribution and large-scale samples, as required by regression-based imputing values. Secondly, the conventional univariate EWMA statistic of the missing variable is employed, serving as an unbiased estimation of the in-control (IC) mean. Both algorithms are computationally straightforward and fulfill the prerequisites for online monitoring. To optimize the combination of these imputing values, we introduce an adaptive weighting function that dynamically assigns a smaller weight to the EWMA statistic when sample information indicates a higher likelihood of the process being out-of-control (OC) and vice versa. Subsequently, we present an advanced EWMA control chart based on the adaptive weighted imputing values. The paper discusses the performance of this chart under varying missing ratios in both Phase I and Phase II, illustrating its broad applicability and efficacy in diverse scenarios.

The remainder of this paper is organized as follows. In Section 2, the proposed EWMA charting scheme for handling missing data is introduced in detail. In Section 3, the performance of the proposed chart is assessed in comparison to two types of EWMA control charts: one that directly eliminates missing data and another that recovers missing data solely based on the EWMA statistic. Section 4 provides a real-world example to elucidate the practical feasibility of the new chart. The paper concludes with our findings and implications in the last section.

2 Methodology

Consider a scenario where, at time point $t$, a $p$-dimensional random vector $X_t = (X_{1,t}, X_{2,t}, \dots, X_{p,t})^{\top}$ is collected. The observations are assumed to be independent and identically distributed (i.i.d.). Detection algorithms are developed in two distinct phases, namely Phase I and Phase II:

In Phase I, $m$ samples, denoted as $X_{-m+1}, X_{-m+2}, \dots, X_0$, are obtained from an IC process. These samples are utilized to estimate certain unknown parameters, such as location and scale parameters.

In Phase II, the primary objective is to promptly detect an out-of-control situation.

The EWMA control chart is a common scheme for achieving this purpose and can be formulated as follows [25]:

$$E_t = \lambda (X_t - \mu_0) + (1-\lambda) E_{t-1} \quad \text{and} \quad \Sigma_{E_t} = \frac{\lambda}{2-\lambda}\,\Sigma_0, \qquad V_t^2 = E_t^{\top}\,\Sigma_{E_t}^{-1}\,E_t, \tag{1}$$

where $\lambda$ is a weighting parameter, $E_0 = 0$, and $\mu_0$ and $\Sigma_0$ are the IC mean vector and covariance matrix, respectively. The corresponding chart, with the charting statistic $E_t$ in Equation (1), signals a process mean shift at the $t$-th time point if $V_t^2 > h$, where $h > 0$ is the control limit.

In the presence of missing data, the estimation of parameters in Phase I may exhibit increased bias, and the calculation of $V_t^2$ becomes impracticable. In this section, our proposed scheme addresses these challenges through a novel recovery method. Three key issues are considered: (i) how to recover missing data using fast and robust algorithms that meet the requirements of online detection, (ii) how to develop a combining method that balances the gains and losses of various imputing values in IC and OC cases, and (iii) how to design a sensitive scheme for detecting the OC process that fully utilizes all relevant information, including that derived from missing data. To facilitate comprehension, the remainder of this section is divided into three subsections, each focusing on one of these components.

2.1 Imputation of Missing Data

In the context of online monitoring, we assume that the dataset from time 1 to time $t-1$ is complete and that missing data arise at time $t$ in $\nu$ variables, denoted as $X_{i_1,t}, X_{i_2,t}, \dots, X_{i_\nu,t}$. Let $S = \{i_1, i_2, \dots, i_\nu\}$ represent the subset of incomplete variables and $\bar{S} = \{1, 2, \dots, p\} \setminus S = \{c_1, c_2, \dots, c_{p-\nu}\}$ represent the subset of complete variables. To recover the missing data at time $t$, we present two unconditional mean imputation methods.

Firstly, since the EWMA statistic is recognized as an unbiased estimate of the mean when there are no changes in the process, its previous value for the missing variable is taken as one of the imputing values. Specifically, the $j$-th incomplete variable $X_{j,t}$ ($j \in S$) can be replaced by $E_{j,t-1}$ in Equation (2):

$$E_{j,0} = 0, \qquad E_{j,t-1} = \lambda X_{j,t-1} + (1-\lambda) E_{j,t-2}. \tag{2}$$

Secondly, taking into consideration ease of calculation and information utilization efficiency, we develop an improved KNN method as another imputing value. In contrast to conventional statistical analysis methods such as mean methods [26], expectation maximization [27], regression analysis [28], hot deck imputation [29], etc., the KNN algorithm relies solely on correlation relationships and does not necessitate large-scale samples. Consequently, it exhibits robust performance in detection systems. To elaborate, we choose the $K$ ($K \le m$) data points closest to time $t$, and from these, we select the $k$ most relevant points based on information from the complete variables, denoted as $X_{l_1}, X_{l_2}, \dots, X_{l_k}$. The $j$-th incomplete variable $X_{j,t}$ ($j \in S$) can then be replaced by the mean of these data points:

$$\bar{X}_{j,t} = \frac{1}{k} \sum_{s=1}^{k} X_{j,l_s}. \tag{3}$$

It is important to note that the Mahalanobis distance of the complete variables is employed here to measure the correlation between the data at time $t$ and the data at other times. In summary, the improved KNN algorithm is presented in Algorithm 1.
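As a rough illustration of the two imputing values described in this subsection, the following Python sketch computes one step of the EWMA recursion in Equation (2) and the KNN average in Equation (3), using the squared Mahalanobis distance on the complete variables. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `ewma_step` and `knn_impute`, the array layout (rows are time points, columns are variables), the toy data, and the use of NumPy are all choices made here for illustration.

```python
import numpy as np


def ewma_step(prev_ewma, prev_obs, lam):
    """One step of the univariate EWMA recursion (Equation (2))."""
    return lam * prev_obs + (1.0 - lam) * prev_ewma


def knn_impute(history, x_t, missing, sigma0, k):
    """Impute the missing entries of x_t from the k nearest past samples.

    history : (K, p) array holding the K most recent complete samples
    x_t     : (p,) current sample; entries listed in `missing` are ignored
    missing : list of column indices of the incomplete variables
    sigma0  : (p, p) in-control covariance matrix
    k       : number of neighbours whose values are averaged
    """
    p = history.shape[1]
    miss = set(missing)
    complete = [i for i in range(p) if i not in miss]
    # Covariance sub-matrix restricted to the complete variables.
    sigma_c = sigma0[np.ix_(complete, complete)]
    diff = history[:, complete] - x_t[complete]
    # Squared Mahalanobis distance of every past sample to the current one.
    dist = np.einsum("ij,ij->i", diff @ np.linalg.inv(sigma_c), diff)
    nearest = np.argsort(dist)[:k]
    # Average the missing variables over the k selected time points.
    return history[np.ix_(nearest, missing)].mean(axis=0)


# Hypothetical toy data: three variables, the third one missing at time t.
history = np.array([
    [0.0, 0.0, 1.0],
    [1.0, 1.0, 2.0],
    [5.0, 5.0, 10.0],
    [0.1, 0.1, 1.2],
])
x_t = np.array([0.0, 0.1, np.nan])
imputed = knn_impute(history, x_t, missing=[2], sigma0=np.eye(3), k=2)
smoothed = ewma_step(prev_ewma=1.0, prev_obs=2.0, lam=0.2)
```

In the toy call, the first and last rows of `history` are the closest to `x_t` on the two complete variables, so their third entries are the ones averaged.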

Algorithm 1 Improved KNN Algorithm

Input: Complete data sets $X_{t-1}, X_{t-2}, \dots, X_{t-K}$, incomplete data $X_t$, IC covariance matrix $\Sigma_0$.
Denote the vector of complete variables at time $t$ as $X_{\bar{S},t}$. Then its corresponding covariance matrix and values at time $\tau$ are $\Sigma_{0,\bar{S}}$ and $X_{\bar{S},\tau}$, respectively.
for $\tau \in \{t-1, t-2, \dots, t-K\}$ do
    Compute the Mahalanobis distance between $X_{\bar{S},\tau}$ and $X_{\bar{S},t}$: $D_\tau = (X_{\bar{S},\tau} - X_{\bar{S},t})^{\top} (\Sigma_{0,\bar{S}})^{-1} (X_{\bar{S},\tau} - X_{\bar{S},t})$.
end for
Choose the $k$ smallest Mahalanobis distances and record the corresponding time points as $l_1, l_2, \dots, l_k$.
for $j \in S$ do
    Compute the mean of the selected data: $\bar{X}_{j,t} = \frac{1}{k} \sum_{s=1}^{k} X_{j,l_s}$.
    Use $\bar{X}_{j,t}$ to recover the $j$-th variable at time $t$.
end for
return Imputing values $\bar{X}_{i_1,t}, \bar{X}_{i_2,t}, \dots, \bar{X}_{i_\nu,t}$.

2.2 Adaptive Weighted Combination of the Imputing Values

Each of the above imputation schemes has its own drawbacks. On the one hand, although the EWMA statistic is robust, it may be insensitive because it solely utilizes information about the missing variable itself. On the other hand, the KNN method signals OC conditions quickly but yields unstable estimates, particularly when shifts occur in the complete variables. To leverage the strengths of both algorithms effectively, we propose an adaptive weighting function capable of combining the two imputing values in various case settings.

At time $t$, the local EWMA statistic of the $i$-th complete variable ($i \in \bar{S}$) can be computed as:

$$E_{i,0} = 0, \qquad E_{i,t} = \lambda X_{i,t} + (1-\lambda) E_{i,t-1}. \tag{4}$$

The proximity of the local EWMA statistic to the local mean indicates a higher likelihood of an IC process. Consequently, smaller weights are assigned to the KNN statistic. The following parameter can quantify the importance of the KNN statistic:

$$\delta_1 = (\tilde{E}_t - \tilde{\mu}_0)^{\top} (\tilde{\Sigma}_0)^{-1} (\tilde{E}_t - \tilde{\mu}_0), \tag{5}$$

where $\tilde{E}_t = (E_{c_1,t}, E_{c_2,t}, \dots, E_{c_{p-\nu},t})$ is the local EWMA statistic vector of the complete variables at time $t$, and $\tilde{\mu}_0$ and $\tilde{\Sigma}_0$ are the IC mean vector and covariance matrix of these complete variables.

Moreover, since the local EWMA statistic of the incomplete variables at time $t-1$ represents the potential shift trend of the missing data, larger weights should be assigned to the EWMA imputing values. The following parameter estimates its significance:

$$\delta_2 = \frac{|E_{j,t-1} - \mu_{j,0}|}{\sigma_{j,t-1}}, \tag{6}$$

where $\sigma_{j,t-1}^2 = \lambda (2-\lambda)^{-1} \sigma_{j,0}^2$, and $\mu_{j,0}$ and $\sigma_{j,0}^2$ are the IC mean and variance of the $j$-th variable, respectively.

Additionally, $\delta_1$ and $\delta_2$ indicate the importance of the KNN and EWMA statistics, respectively, with both standard deviations equal to 1. To obtain the nor
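The quantities in Equations (4) to (6) are simple to compute, and a small numerical sketch may make them concrete. The Python code below is illustrative only: the function names `ewma_update`, `knn_importance`, and `ewma_importance`, the argument layout, and the toy numbers are assumptions introduced here, not part of the paper. The sketch stops at the two importance measures, because the way they are normalized and combined into a weight is not covered by the text available here.

```python
import numpy as np


def ewma_update(prev_ewma, obs, lam):
    """Equation (4): one EWMA step, applied element-wise to the complete variables."""
    return lam * np.asarray(obs, dtype=float) + (1.0 - lam) * np.asarray(prev_ewma, dtype=float)


def knn_importance(ewma_c, mu_c, sigma_c):
    """Equation (5): quadratic form of the complete-variable EWMA vector
    around the in-control mean, scaled by the in-control covariance."""
    d = np.asarray(ewma_c, dtype=float) - np.asarray(mu_c, dtype=float)
    return float(d @ np.linalg.solve(sigma_c, d))


def ewma_importance(ewma_j_prev, mu_j, var_j, lam):
    """Equation (6): absolute deviation of the previous EWMA value of a
    missing variable from its in-control mean, in units of the EWMA
    standard deviation sqrt(lam / (2 - lam) * var_j)."""
    sd = np.sqrt(lam / (2.0 - lam) * var_j)
    return abs(ewma_j_prev - mu_j) / sd


# Hypothetical numbers: two complete variables and one missing variable.
lam = 0.2
e_complete = ewma_update(prev_ewma=[0.0, 0.0], obs=[1.0, 2.0], lam=lam)
first = knn_importance(ewma_c=[1.0, 2.0], mu_c=[0.0, 0.0], sigma_c=np.eye(2))
second = ewma_importance(ewma_j_prev=0.5, mu_j=0.0, var_j=1.0, lam=lam)
```

Using `np.linalg.solve` rather than forming the inverse explicitly is a numerical convenience of this sketch; the value computed is the same quadratic form as in Equation (5).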
