How to Teach a Data Mining Course Well (award-winning courseware).pptx

Uploaded by: w****g | Document ID: 9947087 | Upload date: 2025-04-14 | Format: PPTX | Pages: 43 | Size: 3.61 MB
<p>Slide 1
Data Mining, a Fast-Expanding Frontier: How to Teach a Data Mining Class?
In celebration of the publication of the Chinese translation of the 3rd edition of Data Mining: Concepts and Techniques, by Jiawei Han, Micheline Kamber, and Jian Pei, Morgan Kaufmann, 2011

Slide 2: Why Is Data Mining a New Science?
  The explosive growth of data: from terabytes to petabytes
    Data collection and data availability
      Automated data collection tools, database systems, the Web, computerized society
    Major sources of abundant data
      Business: Web, e-commerce, transactions, stocks
      Science: remote sensing, bioinformatics, scientific simulation
      Society and everyone: news, digital cameras, YouTube
  We are drowning in data, but starving for knowledge!
  "Necessity is the mother of invention": data mining, the automated analysis of massive data sets

Slide 3: Evolution of Sciences: New Data Science Era
  Before 1600: Empirical science
  1600-1950s: Theoretical science
    Each discipline has grown a theoretical component. Theoretical models often motivate experiments and generalize our understanding.
  1950s-1990s: Computational science
    Over the last 50 years, most disciplines have grown a third, computational branch (e.g., empirical, theoretical, and computational ecology, or physics, or linguistics).
    Computational science traditionally meant simulation. It grew out of our inability to find closed-form solutions for complex mathematical models.
  1990-now: Data science
    The flood of data from new scientific instruments and simulations
    The ability to economically store and manage petabytes of data online
    The Internet and computing Grid that makes all these archives universally accessible
    Scientific information management, acquisition, organization, query, and visualization tasks scale almost linearly with data volumes
    Data mining is a major new challenge!
  Source: Jim Gray and Alex Szalay, The World Wide Telescope: An Archetype for Online Science, Comm. ACM, 45(11): 50-54, Nov. 2002

Slide 4: A Brief History of Data Mining Society
  1989 IJCAI Workshop on Knowledge Discovery in Databases
    Knowledge Discovery in Databases (G. Piatetsky-Shapiro and W. Frawley, 1991)
  1991-1994 Workshops on Knowledge Discovery in Databases
    Advances in Knowledge Discovery and Data Mining (U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, 1996)
  1995-1998 International Conferences on Knowledge Discovery in Databases and Data Mining (KDD'95-98)
    Journal of Data Mining and Knowledge Discovery (1997)
  ACM SIGKDD conferences since 1998 and SIGKDD Explorations
  More conferences on data mining
    PAKDD (1997), PKDD (1997), SIAM-Data Mining (2001), (IEEE) ICDM (2001), WSDM (2008), etc.
  ACM Transactions on KDD (2007)

Slide 5: A Brief History of This Data Mining Book
  My first paper on data mining: 1989 IJCAI Workshop on Knowledge Discovery in Databases
  PC co-chairman, 1996 (2nd) Int. Conf. on Knowledge Discovery and Data Mining (KDD'96)
  Discussion of the book: 1996 ACM-SIGMOD Conference Tutorial, "Data Mining Techniques", Montreal, Canada, June 1996
  1st edition: Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques (Foreword by Jim Gray), Morgan Kaufmann, 2000
  2nd edition: Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques (Foreword by Jim Gray), 2nd ed., Morgan Kaufmann, 2006
  3rd edition: Jiawei Han, Micheline Kamber, and Jian Pei, Data Mining: Concepts and Techniques (Foreword by Christos Faloutsos), 3rd ed., Morgan Kaufmann, 2011

Slide 6: Data and Information Systems (DAIS): Course Structures at CS/UIUC
  Coverage: database, data mining, text information systems, Web, and bioinformatics
  Data mining
    Intro. to data warehousing and mining (CS412: Han, Fall)
    Data mining: Principles and algorithms (CS512: Han, Spring)
    Seminar: Advanced Topics in Data Mining (CS591 Han, Fall and Spring, 1 credit unit)
    Independent study: only if you seriously plan to do your Ph.D./M.S. on data mining and want to demonstrate your ability
  Database systems
    Intro. to database systems (CS411: Kevin Chang + others, Fall and Spring)
    Advanced database systems (CS511: Kevin Chang, Fall 11)
  Text information systems
    Text information systems (CS410: ChengXiang Zhai, Spring)
  Bioinformatics
    Introduction to Bioinformatics (Saurabh Sinha)
    CS591 Seminar on Bioinformatics (Sinha, 1 credit unit)
  Yahoo!-DAIS seminar (CS591 DAIS, Fall and Spring, 1 credit unit)

Slide 7: CS412 Coverage (Chapters 1-7 of the Textbook)
  CS412 coverage (BK2: 2nd ed.)
    Introduction
    Data Preprocessing
    Data Warehouse and OLAP Technology: An Introduction
    Advanced Data Cube Technology and Data Generalization
    Mining Frequent Patterns, Association and Correlations
    Classification and Prediction
    Cluster Analysis
  CS412 coverage (BK3: 3rd ed.)
    Introduction
    Getting to Know Your Data
    Data Preprocessing
    Data Warehouse and OLAP Technology: An Introduction
    Advanced Data Cube Technology
    Mining Frequent Patterns and Association: Basic Concepts
    Mining Frequent Patterns and Association: Advanced Methods
    Classification: Basic Concepts
    Classification: Advanced Methods
    Cluster Analysis: Basic Concepts
  The textbook will be covered in two courses at CS, UIUC
    CS412: Introduction to Data Mining (Fall), Chapters 1-10
    CS512: Data Mining: Principles and Algorithms (Spring), Chapters 11-13

Slide 8: CS512 Coverage (Chapters 11, 12, 13 + More Advanced Topics)
  Cluster Analysis: Advanced Methods (Chapter 11)
  Outlier Analysis (Chapter 12)
  Mining data streams, time-series, and sequence data
  Mining graph data
  Mining social and information networks
  Mining object, spatial, multimedia, text, and Web data
    Mining complex data objects
    Spatial and spatiotemporal data mining
    Multimedia data mining
    Text and Web mining
  Additional (often current) themes if time permits

Slides 9-12: (no extractable text)

Slide 13: CS 412 Course Page and Class Schedule
  Class homepage: https://wiki.engr.illinois.edu/display/cs412
  Wiki course outline
    Course Information
    Course Schedule
    Lecture Media
    Assignments
    Resources and Reading Lists
    Staff
    Project (only for students taking 4 credits for the course)
    Comments and Suggestions
      Textbook, Slides, Class Presentation, and Teaching
      Class-Related Questions and Answers

Slide 14: CS 412: Course Project (4th Credit)
  Survey: Writing a comprehensive survey on a focused topic, e.g., clustering heterogeneous information networks
  Quiz maker: Making excellent quiz questions and answers for selected chapters of the course (Chapters 1-10)
  Software maker: Implementing one high-performance, fully documented, open-source data mining function from those taught in the book, in Java/C++, including user interfaces and a visualization package. Note: No plagiarism!
  Answer book maker: Enhancing and writing answers for exercise questions in the book. Assigned to three students already.

Slide 15: Chapter 1. Introduction
  Why Data Mining?
  What Is Data Mining?
  A Multi-Dimensional View of Data Mining
  What Kind of Data Can Be Mined?
  What Kinds of Patterns Can Be Mined?
  What Technologies Are Used?
  What Kinds of Applications Are Targeted?
  Major Issues in Data Mining
  A Brief History of Data Mining and Data Mining Society
  Summary

Slide 16: Data Mining: Confluence of Multiple Disciplines
  (Diagram) Data Mining at the center, surrounded by: Machine Learning, Statistics, Applications, Algorithm, Pattern Recognition, High-Performance Computing, Visualization, Database Technology

Slide 17: Conferences and Journals on Data Mining
  KDD conferences
    ACM SIGKDD Int. Conf. on Knowledge Discovery in Databases and Data Mining (KDD)
    SIAM Data Mining Conf. (SDM)
    (IEEE) Int. Conf. on Data Mining (ICDM)
    European Conf. on Machine Learning and Principles and Practices of Knowledge Discovery and Data Mining (ECML-PKDD)
    Pacific-Asia Conf. on Knowledge Discovery and Data Mining (PAKDD)
    Int. Conf. on Web Search and Data Mining (WSDM)
  Other related conferences
    DB conferences: ACM SIGMOD, VLDB, ICDE, EDBT, ICDT
    Web and IR conferences: WWW, SIGIR, WSDM
    ML conferences: ICML, NIPS
    PR conferences: CVPR
  Journals
    Data Mining and Knowledge Discovery (DAMI or DMKD)
    IEEE Trans. on Knowledge and Data Eng. (TKDE)
    KDD Explorations
    ACM Trans. on KDD

Slide 18: Where to Find References? DBLP, CiteSeer, Google
  Data mining and KDD (SIGKDD: CD-ROM)
    Conferences: ACM-SIGKDD, IEEE-ICDM, SIAM-DM, PKDD, PAKDD, etc.
    Journals: Data Mining and Knowledge Discovery, KDD Explorations, ACM TKDD
  Database systems (SIGMOD: ACM SIGMOD Anthology, CD-ROM)
    Conferences: ACM-SIGMOD, ACM-PODS, VLDB, IEEE-ICDE, EDBT, ICDT, DASFAA
    Journals: IEEE-TKDE, ACM-TODS/TOIS, JIIS, J. ACM, VLDB J., Info. Sys., etc.
  AI and machine learning
    Conferences: Machine Learning (ML), AAAI, IJCAI, COLT (Learning Theory), CVPR, NIPS, etc.
    Journals: Machine Learning, Artificial Intelligence, Knowledge and Information Systems, IEEE-PAMI, etc.
  Web and IR
    Conferences: SIGIR, WWW, CIKM, etc.
    Journals: WWW: Internet and Web Information Systems
  Statistics
    Conferences: Joint Stat. Meeting, etc.
    Journals: Annals of Statistics, etc.
  Visualization
    Conference proceedings: CHI, ACM SIGGRAPH, etc.
    Journals: IEEE Trans. Visualization and Computer Graphics, etc.

Slide 19: Recommended Reference Books
  E. Alpaydin. Introduction to Machine Learning, 2nd ed., MIT Press, 2010
  S. Chakrabarti. Mining the Web: Statistical Analysis of Hypertext and Semi-Structured Data. Morgan Kaufmann, 2002
  R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification, 2nd ed., Wiley-Interscience, 2000
  T. Dasu and T. Johnson. Exploratory Data Mining and Data Cleaning. John Wiley &amp; Sons, 2003
  U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy. Advances in Knowledge Discovery and Data Mining. AAAI/MIT Press, 1996
  U. Fayyad, G. Grinstein, and A. Wierse. Information Visualization in Data Mining and Knowledge Discovery. Morgan Kaufmann, 2001
  J. Han and M. Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann, 2nd ed., 2006 (3rd ed., 2011)
  T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed., Springer-Verlag, 2009
  B. Liu. Web Data Mining. Springer, 2007
  T. M. Mitchell. Machine Learning. McGraw Hill, 1997
  P.-N. Tan, M. Steinbach, and V. Kumar. Introduction to Data Mining. Addison-Wesley, 2005
  S. M. Weiss and N. Indurkhya. Predictive Data Mining. Morgan Kaufmann, 1998
  I. H. Witten and E. Frank. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann, 2nd ed., 2005

Slide 20: Chapter 2. Getting to Know Your Data
  Data Objects and Attribute Types
  Basic Statistical Descriptions of Data
  Data Visualization
  Measuring Data Similarity and Dissimilarity
  Summary

Slide 21: Chapter 3. Data Preprocessing
  Data Preprocessing: An Overview
    Data Quality
    Major Tasks in Data Preprocessing
  Data Cleaning
  Data Integration
  Data Reduction
  Data Transformation and Data Discretization
  Summary

Slide 22: Chapter 4. Data Warehousing and On-line Analytical Processing
  Data Warehouse: Basic Concepts
  Data Warehouse Modeling: Data Cube and OLAP
  Data Warehouse Design and Usage
  Data Warehouse Implementation
  Data Generalization by Attribute-Oriented Induction
  Summary

Slide 23: Chapter 5. Data Cube Technology
  Data Cube Computation: Preliminary Concepts
  Data Cube Computation Methods
  Processing Advanced Queries by Exploring Data Cube Technology
  Multidimensional Data Analysis in Cube Space
  Summary

Slide 24: Chapter 6. Mining Frequent Patterns, Association and Correlations: Basic Concepts and Methods
  Basic Concepts
  Frequent Itemset Mining Methods
  Which Patterns Are Interesting? Pattern Evaluation Methods
  Summary

Slide 25: Chapter 7. Advanced Frequent Pattern Mining
  Pattern Mining: A Road Map
  Pattern Mining in Multi-Level, Multi-Dimensional Space
  Constraint-Based Frequent Pattern Mining
  Mining High-Dimensional Data and Colossal Patterns
  Mining Compressed or Approximate Patterns
  Pattern Exploration and Application
  Summary

Slide 26: Chapter 8. Classification: Basic Concepts
  Classification: Basic Concepts
  Decision Tree Induction
  Bayes Classification Methods
  Rule-Based Classification
  Model Evaluation and Selection
  Techniques to Improve Classification Accuracy: Ensemble Methods
  Summary

Slide 27: Chapter 9. Classification: Advanced Methods
  Bayesian Belief Networks
  Classification by Backpropagation
  Support Vector Machines
  Classification by Using Frequent Patterns
  Lazy Learners (or Learning from Your Neighbors)
  Other Classification Methods
  Additional Topics Regarding Classification
  Summary

Slide 28: Chapter 10. Cluster Analysis: Basic Concepts and Methods
  Cluster Analysis: Basic Concepts
  Partitioning Methods
  Hierarchical Methods
  Density-Based Methods
  Grid-Based Methods
  Evaluation of Clustering
  Summary

Slide 29: How to Teach a Data Mining Undergraduate Class Using This Book?
  Select only part of the materials in the book to teach
    For a machine-learning-flavored class
      Omit in-depth materials on data warehouse and data cube technology
      Treat association and correlation mining lightly
    For a database-flavored class
      Omit advanced clustering, outlier analysis, etc.
    For both classes
      Leave advanced clustering and mining of complex data types to the second class on data mining
  Select materials based on the preparation and background of students
  Regular assignments and exams will be important for digesting the materials
  Programming assignments will help
  Motivate students based on your data and application needs

Slide 30: Chapter 11. Cluster Analysis: Advanced Methods
  Probability Model-Based Clustering
  Clustering High-Dimensional Data
  Clustering Graphs and Network Data
  Clustering with Constraints
  Summary

Slide 31: Chapter 12. Outlier Analysis
  Outlier and Outlier Analysis
  Outlier Detection Methods
  Statistical Approaches
  Proximity-Based Approaches
  Clustering-Based Approaches
  Classification Approaches
  Mining Contextual and Collective Outliers
  Outlier Detection in High-Dimensional Data
  Summary

Slide 32: Topic Coverage of CS512
  Textbook: Han, Kamber, Pei. Data Mining: Concepts and Techniques. Morgan Kaufmann, 3rd ed., 2011
    Chaps. 1-10: covered in CS412
    Chaps. 11-12: CS512 (Chap. 13: self reading)
      Chap. 11: Advanced Clustering Methods
      Chap. 12: Outlier Analysis
  Additional themes to be covered in 2023 Spring
    Introduction to network analysis (ref: Newman, 2010 textbook)
    Mining information networks (ref: research papers + slides)
    Mining data streams (ref: 2nd ed. textbook (BK2), Chap. 8)
    Mining sequence and time-series patterns (ref: BK2, Chap. 8)
    Graph mining: patterns and classifications (ref: BK2, Chap. 9)
    Spatiotemporal and moving object data mining (ref: BK2, Chap. 10)
  Not covered: Text/Web mining, etc. (ref: BK2, Chap. 10; Prof. Zhai's classes)

Slide 33: Course Work: Assignments, Exams and Course Project
  Assignments: 10% (2 assignments)
  Two midterm exams: 40% in total (20% each)
  Survey and research project proposals: (0%) A 1-2 page proposal on survey + research project will be due at the end of the 5th week
  Survey report: 20%
    You are encouraged to choose a topic similar to your research topic
    Hand in together with a set of companion presentation slides. Hand in Monday, 11th week: right after the Spring break!
    Selected surveys will be presented in the 12th week of class
  Final course project: 30% (due at the end of semester)
    The final project will be evaluated based on (1) technical innovation, (2) thoroughness of the work, and (3) clarity of presentation
    The final project will need to hand in: (1) project report (length will be similar to a typical 8-12 page double-column conference paper), and (2) project pr</p>