Chinese Journal of Applied Probability and Statistics (应用概率统计), Vol. 40, No. 2, Apr. 2024, pp. 229-263
doi: 10.3969/j.issn.1001-4268.2024.02.002

Selective Review of Biased Sampling Problems with Applications in Modern Statistics

QIN Jing
(National Institute of Allergy and Infectious Diseases, NIH, Bethesda, MD 20892, USA)

Abstract: Biased sampling is a pervasive issue that transcends various disciplines, impacting fields such as econometrics, epidemiology, medicine, survey research, and more recently, machine learning and artificial intelligence (AI). This ubiquitous challenge arises when the selection of data points for analysis or research introduces systematic biases, potentially compromising the accuracy and reliability of research outcomes. In this paper, our objective is to provide a comprehensive overview of the foundational concepts related to biased sampling problems and the methods of inference. Furthermore, we aim to establish a connection between biased sampling issues and the more recent discussions in machine learning regarding distribution shift problems. Additionally, we will delve into the latest advancements in biased sampling, particularly within the context of transfer learning and conformal inference for predictive confidence intervals. Our ultimate goal is to present this material in a manner that is accessible to graduate students, enabling them to identify applications of biased sampling problems within their own research endeavors. It is with deep respect and gratitude that we dedicate this paper to the memory of the late Professor Shisong Mao, whose guidance and wisdom have been invaluable throughout the years.

Keywords: biased sampling problems; causal inference; conformal predictive interval; distributional shift; transfer learning; in memory of Professor Shisong Mao

2020 Mathematics
Subject Classification: 62D20; 62G20; 62G05

Citation: QIN J. Selective review of biased sampling problems with applications in modern statistics [J]. Chinese J Appl Probab Statist, 2024, 40(2): 229-263.

E-mail: jingqin@niaid.nih.gov. Received November 9, 2023. Revised January 29, 2024.

1 Interactions with Professor Shisong Mao

In the world of academia, the passing of a distinguished professor is a poignant moment, marked not only by the loss of an esteemed scholar but also by the legacy he or she leaves behind. It is a time when we come together to remember, honor, and celebrate the profound contributions of a remarkable individual whose work has indelibly shaped the field of statistics. This paper is a tribute to the enduring influence of Professor Shisong Mao, whose wisdom, dedication, and innovative thinking continue to inspire us. Professor Mao was not merely a
statistician but a luminary whose work transcended the boundaries of traditional statistical analysis. With a passion for both the art and science of statistics, he broke new ground in research, teaching, and mentorship. His career was characterized by a relentless pursuit of knowledge, a commitment to statistical rigor, and an unwavering belief in the power of data to illuminate the mysteries of the world.

In the traditional academic setting, it is customary for each professor to mentor only a limited number of graduate students due to various constraints. Recognizing the critical shortage of statistics educators, the Chinese Department of Education took a proactive step by introducing a two-year program in 1984 to enroll college graduates for the study of statistics. This initiative aimed to address the growing demand for skilled statisticians in various sectors. Under the visionary leadership of Professor Mao, the Department of Statistics at East China Normal University embraced this educational challenge. In an exemplary move, the department opened its doors to a cohort of 24 special graduate students specializing in statistics. This significant addition complemented Professor Mao's ongoing mentorship of
his two regular three-year graduate students. This bold and forward-thinking decision not only expanded the horizons for aspiring statisticians but also underscored Professor Mao's commitment to nurturing future talent in the field of statistics. It reflected his dedication to bridging the gap in statistics education and inspiring the next generation of statisticians in China.

My initial encounter with Professor Mao occurred during the spring of 1984. I had just received the news that I had successfully passed the preliminary graduate student entry examination, which marked the beginning of a transformative journey. My destination was East China Normal University, where I was to face the second round of oral examinations. This momentous occasion held great significance for me, not only in terms of my academic aspirations but also because it marked my first venture from a remote and relatively small town in Sichuan Province. The town, known as Wanzhou, would later become part of Chongqing Special District due to the construction of the Three Gorges Dam. It was a place where life moved at a slower pace, far removed from the bustling metropolis that awaited me in Shanghai, China's largest city. The transition from a tranquil and close-knit community to the dynamic and sprawling urban landscape of Shanghai was a monumental shift. As an unsophisticated young middle school math teacher stationed in an isolated township, I embarked on my first-ever solo journey to a dynamic urban center. The mix of excitement and nervousness was palpable. The prospect of studying in Shanghai and pursuing my academic dreams was exhilarating, but it was also accompanied by a sense of the unknown. The city's vastness and the anonymity of its
busy streets were both thrilling and intimidating. It was a stark contrast to the closely bonded community I was accustomed to. The cacophony of traffic, the towering skyscrapers, and the neon lights painted a picture of a world entirely different from what I had known. My situation indeed resonates with the description in the famous Chinese novel “Dream of the Red Chamber”, where Grandma Liu's entry into the grand house brings her face-to-face with an entirely unfamiliar and overwhelming environment. In my case, the parallel may be even more pronounced, given the added challenge of grappling with a different dialect, the Shanghai accent. Much like the character in the story, my experience of joining the platform created by Professor Mao, with its innovative approach and a diverse group of graduate students, felt like stepping into a world filled with novel experiences and opportunities. Unquestionably, this adventure was a vital step in my personal and professional growth. It symbolized not only a geographical transition but a leap into the uncharted waters of higher education and self-discovery. The small-town math teacher was on the brink of a new chapter, eager to embrace the challenges and opportunities that the big city had to offer.

Professor Mao's reputation extends far and wide, earning him the respect and recognition of colleagues from universities across the academic landscape, especially within the field of statistics. As we had the privilege of meeting Professor Mao, I observed that many of
my fellow students, representing various universities apart from East China Normal University, could extend warm regards from their own professors. However, when it came to my turn to exchange handshakes with Professor Mao, I found myself in a unique position. None of my professors had a prior connection with him, primarily because my undergraduate college was unfamiliar to him. In light of this, I chose to express my personal admiration and warm regards, addressing Professor Mao with sincerity, “Professor Mao, please accept my heartfelt greetings.” In response, Professor Mao warmly reciprocated with a smile, creating a brief yet meaningful connection that overcame any prior lack of familiarity.

During my tenure at East China Normal University from 1984 to 1988, I frequently heard Professor Mao underscore the significance of nurturing a profound passion for the art of data collection. He advocated treating data with the same care and devotion one might reserve for a loved one. Moreover, he encouraged individuals to delve deeply into the data, allowing the information to organically weave its own narrative. In the autumn of 1987, just like every other department, the statistics department was bustling with preparations for the school's upcoming anniversary celebration. As tradition dictated, a compelling lecture based on one's own research was the chosen way to contribute to this special occasion. However, during this particular period, I found myself immersed in a different endeavor: completing the rigorous application process for graduate school in North America. My days were filled with the arduous task of completing numerous application forms and preparing for the TOEFL (Test of English as a Foreign Language) examination. One day, while I was engrossed in these preparations, Professor Mao, a respected figure in the department, approached me with a question about my research. I couldn't help but feel a wave of embarrassment wash over me, for I had no research work to speak of at that point. It was an awkward moment as I explained my current circumstances to Professor Mao. In response, Professor Mao, a seasoned and understanding mentor, offered a reassuring gesture. “I can excuse you this time,” he said with a kind smile, “but I hope this is the last time.” This encounter with Professor Mao served as a pivotal moment in my academic journey. It was a gentle nudge, a reminder of the importance of research and the academic commitment that lay ahead. From that day forward, I embarked on a quest to delve into the world of statistical research, determined to ensure that it would indeed be the last time I found myself unprepared in the presence of my academic peers and mentors. Little did I know that this experience would serve as the catalyst for a rich and rewarding academic journey, one that would ultimately lead me to make significant contributions to the field of statistics. In hindsight, I am grateful for Professor Mao's guidance and understanding, as it ignited a passion for research that continues to shape my career in the world of statistics.

In late 1991 and early 1992, as I was deeply immersed in my Ph.D. journey at the University of Waterloo, a special and cherished connection blossomed in my academic life: a close bond with Professor Mao and Professor Jixiang Zhou, another
distinguished professor from East China Normal University. Their visit to our university during that period marked a pivotal and memorable chapter in my academic and personal development. Professors Mao, Zhou, and I engaged in numerous discussions that covered a wide spectrum of subjects, ranging from the intricacies of statistical methodologies to the broader tapestry of life itself. These conversations were not confined to lecture halls or meeting rooms; we extended our discussions beyond the academic sphere. In fact, many evenings found us teaming up in the kitchen to prepare dinner together. These shared culinary experiences transcended mere meal preparation; they served as an extension of our intellectual and personal connection. As we chopped, stirred, and simmered, we continued our exchanges on topics that spanned from academic challenges to the joys and complexities of life.

This period of close interaction with Professor Mao left an indelible mark on my academic and personal journey. It was more than just a professor-student relationship; it was a mentorship filled with profound insights, camaraderie, and shared experiences. The wisdom I gained from these discussions, both academic and personal, has continued to shape my path in profound ways. As I reflect on those shared dinners and conversations, I am reminded of the lasting impact of those moments and the invaluable guidance that Professor Mao provided during my formative years in academia. His visit to the University of Waterloo was not just an academic event; it was a transformative experience that continues to inspire and influence my academic and personal pursuits to this day.

Professor Mao emphasized the paramount importance of mastering the art of data collection in the realm of scientific research. To advance our understanding and insights, we must dedicate our utmost efforts to this fundamental aspect. However, the reality of practical applications often brings forth an inescapable challenge: the specter of selection bias. In this paper, we embark on a comprehensive exploration of the concept of biased sampling. We delve into the nuances of this topic, dissecting its implications and tracing its impact on the fields of modern statistics and machine learning. Our objective is to unravel the multifaceted nature of biased sampling, shed light on its real-world consequences, and investigate its relevance in the contemporary landscape of data-driven disciplines. Through this discourse, we aim to foster a deeper appreciation for the intricacies of data collection, the challenges it presents, and the innovative solutions that arise in the face of selection bias.

2 Introduction on Biased Sampling and Distribution Shift in Machine Learning

Biased sampling is a phenomenon that arises when an investigator collects samples from a population in a way that the resulting sampling distribution differs from the characteristics of the target population. This
disparity in distribution occurs because, under the chosen sampling method, not all units within the population have an equal opportunity to be included in the sample. In other words, the natural sampling plan, while convenient and often intuitive, inadvertently introduces disparities in the representation of different segments of the population, thus leading to a biased sample. This occurrence can significantly impact the validity and generalizability of the research findings, underscoring the importance of understanding and mitigating bias in the sampling process.

Biased sampling issues are indeed widespread, transcending various domains such as survey sampling, epidemiology studies, econometrics, and recently the machine learning literature. These challenges are not limited to one specific field but have the potential to affect the quality and validity of research in a wide array of disciplines. As pointed out by Professor James Heckman [1], the 2000 Nobel Laureate in Economics, “Sample selection bias may arise in practice for two reasons. First, there may be self selection by the individuals or data units being investigated. Second, sample selection decisions by analysts or data processors operate in much the same fashion as self selection”.

In the ever-evolving landscape of machine learning, data is the lifeblood that fuels the algorithms driving everything from recommendation systems to image recognition and natural language processing. Data, ho