Data Mining: A Fast Expanding Frontier
How to Teach a Data Mining Class?
In celebration of the publication of the Chinese translation of the 3rd edition of
Data Mining: Concepts and Techniques, by Jiawei Han, Micheline Kamber, and Jian Pei, Morgan Kaufmann, 2011
Wednesday, October 19, 2022

Why Is Data Mining a New Science?
- The explosive growth of data: from terabytes to petabytes
  - Data collection and data availability
    - Automated data collection tools, database systems, Web, computerized society
  - Major sources of abundant data
    - Business: Web, e-commerce, transactions, stocks
    - Science: remote sensing, bioinformatics, scientific simulation
    - Society and everyone: news, digital cameras, YouTube
- We are drowning in data, but starving for knowledge!
- “Necessity is the mother of invention”: data mining, the automated analysis of massive data sets

Evolution of Sciences: New Data Science Era
- Before 1600: Empirical science
- 1600-1950s: Theoretical science
  - Each discipline has grown a theoretical component. Theoretical models often motivate experiments and generalize our understanding.
- 1950s-1990s: Computational science
  - Over the last 50 years, most disciplines have grown a third, computational branch (e.g., empirical, theoretical, and computational ecology, or physics, or linguistics).
  - Computational science traditionally meant simulation. It grew out of our inability to find closed-form solutions for complex mathematical models.
- 1990-now: Data science
  - The flood of data from new scientific instruments and simulations
  - The ability to economically store and manage petabytes of data online
  - The Internet and computing Grid that makes all these archives universally accessible
  - Scientific info. management, acquisition, organization, query, and visualization tasks scale almost linearly with data volumes
  - Data mining is a major new challenge!
- Jim Gray and Alex Szalay, The World Wide Telescope: An Archetype for Online Science, Comm. ACM, 45(11): 50-54, Nov. 2002

A Brief History of Data Mining Society
- 1989 IJCAI Workshop on Knowledge Discovery in Databases
  - Knowledge Discovery in Databases (G. Piatetsky-Shapiro and W. Frawley, 1991)
- 1991-1994 Workshops on Knowledge Discovery in Databases
  - Advances in Knowledge Discovery and Data Mining (U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, 1996)
- 1995-1998 International Conferences on Knowledge Discovery in Databases and Data Mining (KDD95-98)
  - Journal of Data Mining and Knowledge Discovery (1997)
- ACM SIGKDD conferences since 1998 and SIGKDD Explorations
- More conferences on data mining
  - PAKDD (1997), PKDD (1997), SIAM-Data Mining (2001), (IEEE) ICDM (2001), WSDM (2008), etc.
- ACM Transactions on KDD (2007)

A Brief History of This Data Mining Book
- My first paper on data mining: 1989 IJCAI Workshop on Knowledge Discovery in Databases
- PC co-chairman, 1996 (2nd) Int. Conf. on Knowledge Discovery and Data Mining (KDD96)
- Discussion of the book: 1996 ACM-SIGMOD Conference Tutorial: “Data Mining Techniques”, Montreal, Canada, June 1996
- 1st edition of the book: Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques (Foreword by Jim Gray), Morgan Kaufmann, 2001
- 2nd edition: Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques (Foreword by Jim Gray), 2nd ed., Morgan Kaufmann, 2006
- 3rd edition: Jiawei Han, Micheline Kamber, and Jian Pei, Data Mining: Concepts and Techniques (Foreword by Christos Faloutsos), 3rd ed., Morgan Kaufmann, 2011

Data and Information Systems (DAIS) Course Structures at CS/UIUC
- Coverage: Database, data mining, text information systems, Web and bioinformatics
- Data mining
  - Intro. to data warehousing and mining (CS412: Han, Fall)
  - Data mining: Principles and algorithms (CS512: Han, Spring)
  - Seminar: Advanced Topics in Data Mining (CS591 Han, Fall and Spring. 1 credit unit)
  - Independent Study: only if you seriously plan
    to do your Ph.D./M.S. on data mining and try to demonstrate your ability
- Database systems
  - Intro. to database systems (CS411: Kevin Chang + others, Fall and Spring)
  - Advanced database systems (CS511: Kevin Chang, Fall '11)
- Text information systems
  - Text information system (CS410: ChengXiang Zhai, Spring)
- Bioinformatics
  - Introduction to Bioinformatics (Saurabh Sinha)
  - CS591 Seminar on Bioinformatics (Sinha: 1 credit unit)
- Yahoo!-DAIS seminar (CS591DAIS, Fall and Spring. 1 credit unit)

CS412 Coverage (Chapters 1-7 of the Textbook)
- CS412 Coverage (BK2: 2nd ed.)
  - Introduction
  - Data Preprocessing
  - Data Warehouse and OLAP Technology: An Introduction
  - Advanced Data Cube Technology and Data Generalization
  - Mining Frequent Patterns, Association and Correlations
  - Classification and Prediction
  - Cluster Analysis
- CS412 Coverage (BK3: 3rd ed.)
  - Introduction
  - Getting to Know Your Data
  - Data Preprocessing
  - Data Warehouse and OLAP Technology: An Introduction
  - Advanced Data Cube Technology
  - Mining Frequent Patterns & Association: Basic Concepts
  - Mining Frequent Patterns & Association: Advanced Methods
  - Classification: Basic Concepts
  - Classification: Advanced Methods
  - Cluster Analysis: Basic Concepts
- The textbook will be covered
  in two courses at CS, UIUC
  - CS412: Introduction to Data Mining (Fall), Chapters 1-10
  - CS512: Data Mining: Principles and Algorithms (Spring), Chaps. 11-13

CS512 Coverage (Chapters 11, 12, 13 + More Advanced Topics)
- Cluster Analysis: Advanced Methods (Chapter 11)
- Outlier Analysis (Chapter 12)
- Mining data streams, time-series, and sequence data
- Mining graph data
- Mining social and information networks
- Mining object, spatial, multimedia, text and Web data
  - Mining complex data objects
  - Spatial and spatiotemporal data mining
  - Multimedia data mining
  - Text and Web mining
- Additional (often current) themes if time permits

CS 412. Course Page & Class Schedule
- Class homepage: https://wiki.engr.illinois.edu/display/cs412
- Wiki course outline
  - Course Information
  - Course Schedule
  - Lecture media
  - Assignments
  - Resources and Reading Lists
  - Staff
  - Project: only for students taking 4 credits
    for the course
  - Comments and Suggestions: Textbook, Slides, Class Presentation, and Teaching
  - Class-Related Questions and Answers

CS 412: Course Project (4th credit)
- Survey: Writing a comprehensive survey on a focused topic, e.g., clustering heterogeneous information networks
- Quiz maker: Making excellent quiz questions and answers for selected chapters of the course (Chapters 1-10)
- Software maker: Implementing one high-performance, fully documented open source data mining function for those taught in the book, in Java/C++, including user interfaces and a visualization package. Note: No plagiarism!
- Answer book maker: Enhancing and making answers for exercise questions in the book. Assigned to three students already.

Chapter 1. Introduction
- Why Data Mining?
- What Is Data Mining?
- A Multi-Dimensional View of Data Mining
- What Kind of Data Can Be Mined?
- What Kinds of Patterns Can Be Mined?
- What Technologies Are Used?
- What Kinds of Applications Are Targeted?
- Major Issues in Data Mining
- A Brief History of Data Mining and Data Mining Society
- Summary

Data Mining: Confluence of Multiple Disciplines
[Figure: Data Mining at the center, surrounded by Machine Learning, Statistics, Applications, Algorithm, Pattern Recognition, High-Performance Computing, Visualization, and Database Technology]

Conferences and Journals on Data Mining
- KDD conferences
  - ACM SIGKDD Int. Conf. on Knowledge Discovery in Databases and Data Mining (KDD)
  - SIAM Data Mining Conf. (SDM)
  - (IEEE) Int. Conf. on Data Mining (ICDM)
  - European Conf. on Machine Learning and Principles and Practices of Knowledge Discovery and Data Mining (ECML-PKDD)
  - Pacific-Asia Conf. on Knowledge Discovery and Data Mining (PAKDD)
  - Int. Conf. on Web Search and Data Mining (WSDM)
- Other related conferences
  - DB conferences: ACM SIGMOD, VLDB, ICDE, EDBT, ICDT
  - Web and IR conferences: WWW, SIGIR, WSDM
  - ML conferences: ICML, NIPS
  - PR conferences: CVPR
- Journals
  - Data Mining and Knowledge Discovery (DAMI or DMKD)
  - IEEE Trans. on Knowledge and Data Eng. (TKDE)
  - KDD Explorations
  - ACM Trans. on KDD

Where to Find References? DBLP, CiteSeer, Google
- Data mining and KDD (SIGKDD: CDROM)
  - Conferences: ACM-SIGKDD, IEEE-ICDM,
    SIAM-DM, PKDD, PAKDD, etc.
  - Journal: Data Mining and Knowledge Discovery, KDD Explorations, ACM TKDD
- Database systems (SIGMOD: ACM SIGMOD Anthology, CD ROM)
  - Conferences: ACM-SIGMOD, ACM-PODS, VLDB, IEEE-ICDE, EDBT, ICDT, DASFAA
  - Journals: IEEE-TKDE, ACM-TODS/TOIS, JIIS, J. ACM, VLDB J., Info. Sys., etc.
- AI & Machine Learning
  - Conferences: Machine learning (ML), AAAI, IJCAI, COLT (Learning Theory), CVPR, NIPS, etc.
  - Journals: Machine Learning, Artificial Intelligence, Knowledge and Information Systems, IEEE-PAMI, etc.
- Web and IR
  - Conferences: SIGIR, WWW, CIKM, etc.
  - Journals: WWW: Internet and Web Information Systems
- Statistics
  - Conferences: Joint Stat. Meeting, etc.
  - Journals: Annals of Statistics, etc.
- Visualization
  - Conference proceedings: CHI, ACM-SIGGraph, etc.
  - Journals: IEEE Trans. Visualization and Computer Graphics, etc.

Recommended Reference Books
- E. Alpaydin. Introduction to Machine Learning, 2nd ed., MIT Press, 2010
- S. Chakrabarti. Mining the Web: Statistical Analysis of Hypertext and Semi-Structured Data. Morgan Kaufmann, 2002
- R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification, 2nd ed., Wiley-Interscience, 2000
- T. Dasu and T. Johnson. Exploratory Data Mining and Data Cleaning. John Wiley & Sons, 2003
- U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy. Advances in Knowledge Discovery and Data Mining. AAAI/MIT Press, 1996
- U. Fayyad, G. Grinstein, and A. Wierse. Information Visualization in Data Mining and Knowledge Discovery. Morgan Kaufmann, 2001
- J. Han and M. Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann, 2nd ed., 2006 (3rd ed., 2011)
- T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed., Springer-Verlag, 2009
- B. Liu. Web Data Mining. Springer, 2006
- T. M. Mitchell. Machine Learning. McGraw Hill, 1997
- P.-N. Tan, M. Steinbach, and V. Kumar. Introduction to Data Mining. Wiley, 2005
- S. M. Weiss and N. Indurkhya. Predictive Data Mining. Morgan Kaufmann, 1998
- I. H. Witten and E. Frank. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann, 2nd ed., 2005

Chapter 2: Getting to Know Your Data
- Data Objects and Attribute Types
- Basic Statistical Descriptions of Data
- Data Visualization
- Measuring Data Similarity and Dissimilarity
- Summary

Chapter 3: Data Preprocessing
- Data Preprocessing: An Overview
  - Data Quality
  - Major Tasks in Data Preprocessing
- Data Cleaning
- Data Integration
- Data Reduction
- Data Transformation and Data Discretization
- Summary

Chapter 4: Data Warehousing and On-line Analytical Processing
- Data Warehouse: Basic Concepts
- Data Warehouse Modeling: Data Cube and OLAP
- Data Warehouse Design and Usage
- Data Warehouse Implementation
- Data Generalization by Attribute-Oriented Induction
- Summary

Chapter 5: Data Cube Technology
- Data Cube Computation: Preliminary Concepts
- Data Cube Computation Methods
- Processing Advanced Queries by Exploring Data Cube Technology
- Multidimensional Data Analysis in Cube Space
- Summary

Chapter 6: Mining Frequent Patterns, Association and Correlations: Basic Concepts and Methods
- Basic Concepts
- Frequent Itemset Mining Methods
- Which Patterns Are Interesting? Pattern Evaluation Methods
- Summary

Chapter 7: Advanced Frequent Pattern Mining
- Pattern Mining: A Road Map
- Pattern Mining in Multi-Level, Multi-Dimensional Space
- Constraint-Based Frequent Pattern Mining
- Mining High-Dimensional Data and Colossal Patterns
- Mining Compressed or Approximate Patterns
- Pattern Exploration and Application
- Summary

Chapter 8. Classification: Basic Concepts
- Classification: Basic Concepts
- Decision Tree Induction
- Bayes Classification Methods
- Rule-Based Classification
- Model Evaluation and
  Selection
- Techniques to Improve Classification Accuracy: Ensemble Methods
- Summary

Chapter 9. Classification: Advanced Methods
- Bayesian Belief Networks
- Classification by Backpropagation
- Support Vector Machines
- Classification by Using Frequent Patterns
- Lazy Learners (or Learning from Your Neighbors)
- Other Classification Methods
- Additional Topics Regarding Classification
- Summary

Chapter 10. Cluster Analysis: Basic Concepts and Methods
- Cluster Analysis: Basic Concepts
- Partitioning Methods
- Hierarchical Methods
- Density-Based Methods
- Grid-Based Methods
- Evaluation of Clustering
- Summary

How to Teach a Data Mining Undergraduate Class Using This Book?
- Select only part of the materials in the book to teach
  - For a machine learning flavored class
    - Omit in-depth materials on data warehouse + data cube technology
    - Treat association and correlation mining lightly
  - For a database flavored class
    - Omit advanced clustering, outlier analysis, etc.
  - For both classes
    - Leave advanced clustering and mining complex data types to the second class on data mining
- Select materials based on the preparation and background of students
- Regular assignments and exams will be important to digest materials
  - Programming assignments will help
- Motivate students based on your data and application needs

Chapter 11. Cluster Analysis: Advanced Methods
- Probability Model-Based Clustering
- Clustering High-Dimensional Data
- Clustering Graphs and Network Data
- Clustering with Constraints
- Summary

Chapter 12. Outlier Analysis
- Outlier and Outlier Analysis
- Outlier Detection Methods
- Statistical Approaches
- Proximity-Based Approaches
- Clustering-Based Approaches
- Classification Approaches
- Mining Contextual and Collective Outliers
- Outlier Detection in High Dimensional Data
- Summary

Topic Coverage of CS512
- Textbook: Han, Kamber, Pei. Data Mining: Concepts and Techniques. Morgan Kaufmann, 3rd ed., 2011
  - Chaps. 1-10: covered in CS412
  - Chaps. 11-12: CS512 (Chap. 13: self reading)
    - Chap. 11: Advanced Clustering Methods
    - Chap. 12: Outlier Analysis
- Additional themes to be covered in 2023 Spring
  - Introduction to network analysis (ref: Newman, 2010 textbook)
  - Mining information networks (ref: research papers + slides)
  - Mining data streams (ref. 2nd ed. textbook (BK2): Chap. 8)
  - Mining sequence and time-series patterns (ref. BK2: Chap. 8)
  - Graph mining: patterns & classifications (ref. BK2: Chap. 9)
  - Spatiotemporal and moving object data mining (ref: BK2: Chap. 10)
- Not covered: Text/Web mining, etc. (ref: BK2: Chap. 10, Prof. Zhai's classes)

Course Work: Assignments, Exams and Course Project
- Assignments: 10% (2 assignments)
- Two midterm exams: 40% in total (20% each)
- Survey and research project proposals: (0%)
  - A 1-2 page proposal on survey + research
    project will be due at the end of 5th week
- Survey report: 20%
  - Encouraged to have a similar topic as your research topic
  - Hand in together with a set of companion presentation slides
  - Hand in Monday, 11th week: right after the Spring break!
  - Selected surveys will be presented at the 12th week of class
- Final course project: 30% (due at the end of semester)
  - The final project will be evaluated based on (1) technical innovation, (2) thoroughness of the work, and (3) clarity of presentation
  - The final project will need to hand in: (1) project report (length will be similar to a typical 8-12 page double-column conference paper), and (2) project pr