Data Warehouse and Data Mining Technology (Peng Hong): first-semester final examination paper

South China University of Technology, School of Computer Science and Engineering
Final Examination, First Semester, Academic Year 2005-2006
Data Warehouse and Data Mining Technology

Major: Bilingual class    Grade: 2002    Name:            Student ID:

Notes:
1. This paper has four major questions and is worth 100 points; the examination time is 120 minutes.
2. Write all answers directly on the examination paper.

Question   I   II   III   IV   Total
Score

I. Fill in the following blanks. (1 point per blank; total: 20 points)

1. A data warehouse is a ____, ____, ____ and ____ collection of data in support of management's decision-making process.
2. The most popular data model for a data warehouse is a multidimensional model. Such a model can exist in the form of a ____ schema, a ____ schema, or a ____ schema.
3. List four OLAP operations: ____, ____, ____, and ____.
4. Measures can be organized into the following three categories, based on the kind of aggregate functions used: ____, ____, and ____.
5. For interestingness measures of a pattern, there are four objective measures: ____, ____, ____ and novelty.
6. List three knowledge types to be mined: ____, ____, and ____.

II. Miscellaneous questions. (8 points per question; total: 40 points)

1. Suppose that the data for analysis include the attribute age. The age values for the data tuples are: 13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33, 35, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70.
(a) Use min-max normalization to transform the value 35 for age onto the range [0.0, 1.0].
(b) Use z-score normalization to transform the value 35 for age, where the standard deviation of age is 12.94 years.
(c) Use normalization by decimal scaling to transform the value 35 for age.

2. Consider association rule (1) below, which was mined from a university database:
major(X, "science") ⇒ status(X, "undergraduate").
Suppose that the number of students at the university (that is, the number of task-relevant data tuples) is 5000, that 56% of undergraduates at the university major in science, that 64% of the students are registered in programs leading to undergraduate degrees, and that 70% of the students are majoring in science.
(a) Compute the confidence and support of rule (1).
(b) Consider rule (2) below:
major(X, "biology") ⇒ status(X, "undergraduate") [17%, 80%].
Suppose that 30% of science students are majoring in biology. Would you consider rule (2) to be novel with respect to rule (1)? Explain.

3. Given the following table (Table 1):

Table 1
Location \ Item   DVD   TV    Computer
Guangzhou         280   180   340
Beijing           260   220   320
Shanghai          360   200   240

(1) Map the class Beijing (target class) into a (bi-directional) quantitative descriptive rule. For example: ∀X, Guangzhou(X) ⇔ TV(X) [t: x%, d: y%].
(2) Map the class Computer (target class) into a (bi-directional) quantitative descriptive rule.

4. A partitioning variation of Apriori subdivides the transactions of a database D into n nonoverlapping partitions. Prove that any itemset that is frequent in D must be frequent in at least one partition of D.

5. Prove that the constraint sum(S) ≥ v (∀a ∈ S, a ≥ 0) is monotone, and that sum(S) ≤ v (∀a ∈ S, a ≥ 0) is antimonotone.

III. Problems. (Total: 30 points)

1. Given the following transaction database (Table 2), with a minimum support of 60% and a minimum confidence of 80%:
(1) Find all frequent patterns using the Apriori algorithm, and generate strong association rules from L3 (i.e., the frequent 3-patterns). Assume the support count is 2 and the confidence is 80%. (12 points)
(2) Draw the frequent pattern tree. (6 points)

Table 2
TID   Items
T1    I1, I2, I6
T2    I1, I3, I5, I6
T3    I1, I2, I6
T4    I1, I3, I4
T5    I1, I2, I4, I6

2. Table 3 presents a training set of data tuples about whether to play basketball. Given a tuple (Outlook = sunny, Temperature = cool, Humidity = high, Wind = strong), decide whether the target class Playbasketball is yes or no using the naive Bayesian classifier. (18 points)

Table 3
No.   Outlook    Temperature   Humidity   Wind     Playbasketball
1     Overcast   Hot           High       Weak     Yes
2     Sunny      Hot           High       Weak     No
3     Sunny      Hot           High       Strong   No
4     Overcast   Hot           Normal     Weak     Yes
5     Rain       Mild          High       Weak     Yes
6     Sunny      Cool          Normal     Weak     Yes
7     Rain       Cool          Normal     Weak     Yes
8     Rain       Mild          Normal     Weak     Yes
9     Rain       Cool          Normal     Strong   No
10    Overcast   Cool          Normal     Strong   Yes
11    Sunny      Mild          High       Weak     No
12    Overcast   Mild          High       Strong   Yes

3. Table 4 presents the distances between any two objects, e.g. the distance between objects 1 and 2 is 2.5. Assume the distance between two clusters d(C1, C2) is defined as follows:
d(C1, C2) = Max{d_ij | i ∈ C1, j ∈ C2},
where C1, C2 are two clusters, d_ij is the distance between objects i and j, and Max is used to compute the minimum value of a set. Cluster the objects using the agglomerative hierarchical clustering method and draw the dendrogram (i.e., show how the clusters are merged hierarchically). (10 points)

Table 4
      1     2     3     4     5
1     0
2     9     0
3     4     6     0
4     8     5     2     0
5     10    7     3     5     0

South China University of Technology, School of Computer Science and Engineering
Final Examination, First Semester, Academic Year 2005-2006
Data Warehouse and Data Mining Technology: Answers

(I) Omitted.

(II).1
Sum: 541; mean: 541/20 = 27.05.
Sum of squared deviations: (13-27.05)^2 + (15-27.05)^2 + (16-27.05)^2 + (16-27.05)^2 + (19-27.05)^2 + (20-27.05)^2 + (20-27.05)^2 + (21-27.05)^2 + (22-27.05)^2 + (25-27.05)^2 + (25-27.05)^2 + (30-27.05)^2 + (33-27.05)^2 + (33-27.05)^2 + (35-27.05)^2 + (35-27.05)^2 + (35-27.05)^2 + (36-27.05)^2 + (40-27.05)^2 + (52-27.05)^2 = 1960.95
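The arithmetic above can be checked with a short Python sketch. It assumes the data set is exactly the 20 values that appear in the expanded sum of this answer key (not the longer list printed in the question), and it uses the divisor n - 1 for the standard deviation. The name `summarize` is illustrative only.

```python
import math

# The 20 age values that appear in the expanded sum of the answer key.
ages = [13, 15, 16, 16, 19, 20, 20, 21, 22, 25,
        25, 30, 33, 33, 35, 35, 35, 36, 40, 52]


def summarize(values):
    """Return (total, mean, sum of squared deviations, sample std)."""
    n = len(values)
    total = sum(values)
    mean = total / n
    # Sum of squared deviations from the mean.
    ss = sum((v - mean) ** 2 for v in values)
    # Sample standard deviation: divide by n - 1, then take the square root.
    std = math.sqrt(ss / (n - 1))
    return total, mean, ss, std


total, mean, ss, std = summarize(ages)
print(total, round(mean, 2), round(ss, 2), round(std, 2))
```

Under these assumptions the script should reproduce the sum, the mean, and the sum of squared deviations stated above.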

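Standard deviation: (1960.95/(20-1))^(1/2) = 10.16
(a) 0 + (30-13)/(52-13) × (1.0-0) = 0.44
(b) (30-27.05)/10.16 = 0.29
(c) 30/100 = 0.3

(II).2
(a) Confidence: (55% × 10000)/(70% × 10000) = 55/70 = 78.57%
    Support: (55% × 10000)/10000 = 55%
(b) Because 55% × 33% = 18.15%, rule (2) is not very meaningful.

(II).3
(a) ∀X, Guangzhou(X) ⇔ DVD(X) [t = 280/1000, d = 280/900] ∨ TV(X) [t = 380/1000, d = 380/1100] ∨ Computer(X) [t = 340/1000, d = 340/1200].
(b) ∀X, Computer(X) ⇔ Guangzhou(X) [t = 340/1200, d = 340/1000] ∨ Beijing(X) [t = 320/1200, d = 320/800] ∨ Shanghai(X) [t = 540/1200, d = 540/1400].

(III).1
Item counts: I1:5, I2:3, I3:2, I4:2, I5:1, I6:4
Frequent 2-patterns: I1I2:3, I1I6:4, I2I6:3
Frequent 3-pattern: I1I2I6:3

(IV)
P(Outlook = sunny | yes) = 1/7        P(Outlook = sunny | no) = 3/5
P(Temperature = cool | yes) = 3/7     P(Temperature = cool | no) = 1/5
P(Humidity = high | yes) = 2/7        P(Humidity = high | no) = 4/5
P(Wind = strong | yes) = 2/7          P(Wind = strong | no) = 3/5
P(yes) = 7/12                         P(no) = 5/12
P(X | yes) P(yes) = 1/7 × 3/7 × 2/7 × 2/7 × 7/12 = 0.00292
P(X | no) P(no) = 3/5 × 1/5 × 4/5 × 3/5 × 5/12 = 0.024
Since 0.024 > 0.00292, the predicted class is no.

(V)
Frequent pattern tree (a single path): Null - I1:5 - I6:4 - I2:3

Step 1: merge 3 and 4 to obtain {3,4}.
  Distance between {3,4} and 1: min{4, 8} = 4
  Distance between {3,4} and 2: min{5, 9} = 5
  Distance between {3,4} and 5: min{5, 6} = 5
Step 2: merge 1 and 2 to obtain {1,2}.
  Distance between {1,2} and {3,4}: min{4, 5} = 4
  Distance between {1,2} and 5: min{10, 7} = 7
Step 3: merge {1,2} and {3,4} to obtain {1,2,3,4}.

Initial distances:
      1     2     3     4     5
1     0
2     3     0
3     4     9     0
4     8     5     2     0
5     10    7     6     5     0

After step 1:
      1     2     3,4   5
1     0
2     3     0
3,4   4     5     0
5     10    7     5     0

After step 2:
      1,2   3,4   5
1,2   0
3,4   4     0
5     7     5     0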
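The merge steps above can be traced with a minimal Python sketch of agglomerative clustering. It assumes the initial distance matrix used in this answer key and takes the cluster distance to be the minimum pairwise object distance, as the key's steps do (the formula in the question is written with Max). The names `DIST`, `d` and `single_linkage` are illustrative only.

```python
from itertools import combinations

# Pairwise distances from the initial distance matrix of the answer key.
DIST = {
    (1, 2): 3, (1, 3): 4, (2, 3): 9, (1, 4): 8, (2, 4): 5,
    (3, 4): 2, (1, 5): 10, (2, 5): 7, (3, 5): 6, (4, 5): 5,
}


def d(i, j):
    """Distance between two objects (symmetric lookup)."""
    return DIST[(min(i, j), max(i, j))]


def cluster_distance(a, b):
    """Minimum pairwise distance between members of clusters a and b."""
    return min(d(i, j) for i in a for j in b)


def single_linkage(objects):
    """Merge the two closest clusters until one remains; return the merge log."""
    clusters = [frozenset([o]) for o in objects]
    merges = []
    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest cluster distance.
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: cluster_distance(*pair))
        merges.append((sorted(a), sorted(b), cluster_distance(a, b)))
        # Replace the two merged clusters by their union.
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges


merges = single_linkage([1, 2, 3, 4, 5])
for left, right, dist in merges:
    print(left, "+", right, "at distance", dist)
```

The merge log doubles as the dendrogram data: each entry gives the two clusters joined and the height at which they join. The last entry, joining object 5 to the rest, follows from the same rule.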

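For the Apriori answer in (III).1, the counts can be reproduced with a small Python sketch. It assumes the five transactions of Table 2 and a minimum support count of 3 (60% of 5 transactions), with a minimum confidence of 80% for the rules. The function names `support_count`, `apriori` and `rules_from` are illustrative, and this is a plain level-wise sketch rather than an optimized implementation.

```python
from itertools import combinations

# Transactions from Table 2 of the question paper.
TRANSACTIONS = [
    {"I1", "I2", "I6"},
    {"I1", "I3", "I5", "I6"},
    {"I1", "I2", "I6"},
    {"I1", "I3", "I4"},
    {"I1", "I2", "I4", "I6"},
]


def support_count(itemset, transactions):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)


def apriori(transactions, min_count):
    """Return {frozenset: count} for every itemset with count >= min_count."""
    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items]
    frequent = {}
    k = 1
    while level:
        counts = {c: support_count(c, transactions) for c in level}
        current = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(current)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that have size k.
        candidates = {a | b for a in current for b in current
                      if len(a | b) == k}
        # Prune step: every (k-1)-subset must itself be frequent.
        level = [c for c in candidates
                 if all(frozenset(s) in current
                        for s in combinations(c, k - 1))]
    return frequent


def rules_from(itemset, frequent, min_conf):
    """Yield (antecedent, consequent, confidence) for rules meeting min_conf."""
    for r in range(1, len(itemset)):
        for lhs in combinations(sorted(itemset), r):
            lhs = frozenset(lhs)
            conf = frequent[itemset] / frequent[lhs]
            if conf >= min_conf:
                yield sorted(lhs), sorted(itemset - lhs), conf


frequent = apriori(TRANSACTIONS, min_count=3)
rules = list(rules_from(frozenset({"I1", "I2", "I6"}), frequent, 0.8))
for lhs, rhs, conf in rules:
    print(lhs, "=>", rhs, round(conf, 2))
```

Because every subset of a frequent itemset is itself frequent, `rules_from` can look up each antecedent's count directly in the `frequent` dictionary.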