Course: Intelligent Data Mining
Major and class: Intelligent Science and Technology 19-1
Name: 王炳乾
Term: 2021-2022 (1)
Instructor: 白文江
Date: 2021

        # Body of the loop over the 2016-17 schedule
        pred = predict_winner(team1, team2, model)
        prob = pred[0][0]
        if prob > 0.5:
            winner = team1
            loser = team2
            result.append([winner, loser, prob])
        else:
            winner = team2
            loser = team1
            result.append([winner, loser, 1 - prob])

    # The output file name is lost in the source; '...' marks it
    with open('...', 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['win', 'lose', 'probability'])
        writer.writerows(result)
    print('done.')

    # Preview the generated prediction file with pandas
    try:
        pd.read_csv('...', header=0)
    except Exception:
        pass

    # Inspect the generated csv file
    import csv
    with open('...') as f:
        f_csv = csv.reader(f)
        headers = next(f_csv)
        for row in f_csv:
            print(row)

V. Results and Analysis

1. Results:

    Building data set..
    X [[1700, 27.2, 32.0, 50.0, 33.0, 49.0, -2.73, 0.0, -2.74, 104.6, 107.6, 93.4, 0.255, 0.256, 0.527, 0.483, 12.6, 23.7, 0.205, 0.48700000000000004, 10.5, 75.8, 0.204, 812292.0, 38.0, 85.8, 0.44299999999999995, 7.6, 22.4, 0.341, 30.3, 63.4, 0.479, 17.5, 23.2, 0.754, 10.9, 33.3, 44.2, 20.8, 7.2, 4.2, 11.3, 18.5, 101.1, 36.9, 84.0, 0.439, 7.4, 21.5, 0.34600000000000003, 29.4, 62.5, 17.2, 21.4, 0.805, 10.4, 34.0, 44.4, 20.0, 5.7, 5.7, 13.4, 19.7, 98.4, 1600, 28.2, 48.0, 34.0, 51.0, 31.0, 3.61, -0.12, 3.49, 105.1, 1...
    Fitting on 1316 game samples.
    Doing cross-validation..
    Predicting on new schedule..
    done.
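The `X` row printed in the log above is one game's feature vector: one team's Elo rating followed by its statistics, then the same for the other team. The leading 1700 and the later 1600 are consistent with the base rating of 1600 plus the 100-point home bonus that `predict_winner` applies. The following is only a minimal sketch of how such a row can be assembled. The helper `build_row`, the team names and the two toy statistics per team are assumptions made for illustration and are not the report's code.

```python
import random

BASE_ELO = 1600      # base rating used in the report
HOME_BONUS = 100     # home-court bonus used in predict_winner

# Hypothetical per-team statistics (toy values, two features per team)
team_stats = {
    "Home": [27.2, 32.0],
    "Away": [28.2, 48.0],
}

def build_row(winner, loser, winner_is_home, rng):
    """Return (features, label) for one game.

    label 0 means the winner's features come first, 1 means they come second.
    """
    w_elo = BASE_ELO + (HOME_BONUS if winner_is_home else 0)
    l_elo = BASE_ELO + (0 if winner_is_home else HOME_BONUS)
    w_feat = [w_elo] + team_stats[winner]
    l_feat = [l_elo] + team_stats[loser]
    # Randomly swap sides so the label is not the same for every game
    if rng.random() > 0.5:
        return w_feat + l_feat, 0
    return l_feat + w_feat, 1

features, label = build_row("Home", "Away", True, random.Random(0))
print(features, label)
```

Placing the winner on a random side keeps the classifier from seeing a single label for every training row, which is what the comparison against 0.5 in the report's listing appears to do.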
[Figure: Result 1 and Result 2, screenshots of the generated prediction file with the columns win, lose, probability. Legible rows:]

    win, lose, probability
    Charlotte Hornets, Milwaukee Bucks, 0.7690952655343486
    Sacramento Kings, Phoenix Suns, 0.7258686751709502
    Toronto Raptors, Detroit Pistons, 0.7535364327866106
    Atlanta Hawks, Washington Wizards, 0.630574956949663
    Dallas Mavericks, Houston Rockets, 0.5626069950589027
    Detroit Pistons, Orlando Magic, 0.6856948252207253

2. Analysis:

Using part of the available statistics, we compute an Elo score for every NBA team and use these basic statistics to assess each team's past performance. The Elo score, an internationally used rating method, scores each team's current strength. Combining these features, we judge which team has the advantage in a given game. Unlike a plain win/lose call, our prediction reports the probability that the favored team beats the other one. The amount of data used to evaluate a team is still small (only the 2015-16 season); a more accurate and systematic judgment would need more data. This course project taught me the analysis methods of data mining, made me more fluent with the key points of data analysis, and strengthened my analytical thinking.

3. Reflections:

I have now studied data mining for one semester. Over these ten-plus weeks I gained some understanding of the techniques covered in the course and cleared up several concepts that are easy to confuse. Put simply, data mining is the process of extracting latent, valuable knowledge, models, or rules from large amounts of data. I will keep studying this subject and work to lay a solid foundation for future research projects or papers.
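    import csv
    import math
    import random

    import numpy as np
    import pandas as pd
    from sklearn import linear_model
    from sklearn.model_selection import cross_val_score

    # Base Elo rating given to a team that has no rating yet
    base_elo = 1600
    team_elos = {}
    team_stats = {}
    X = []
    y = []
    # Directory that holds the data
    folder = 'data'


    # Initialize from the Miscellaneous, Opponent and Team statistics csv files
    def initialize_data(Mstat, Ostat, Tstat):
        new_Mstat = Mstat.drop(['Rk', 'Arena'], axis=1)
        new_Ostat = Ostat.drop(['Rk', 'G', 'MP'], axis=1)
        new_Tstat = Tstat.drop(['Rk', 'G', 'MP'], axis=1)
        team_stats1 = pd.merge(new_Mstat, new_Ostat, how='left', on='Team')
        team_stats1 = pd.merge(team_stats1, new_Tstat, how='left', on='Team')
        return team_stats1.set_index('Team', inplace=False, drop=True)


    def get_elo(team):
        try:
            return team_elos[team]
        except KeyError:
            # A team without a rating starts at base_elo
            team_elos[team] = base_elo
            return team_elos[team]


    # Compute the new Elo ratings of both teams after one game
    def calc_elo(win_team, lose_team):
        winner_rank = get_elo(win_team)
        loser_rank = get_elo(lose_team)
        rank_diff = winner_rank - loser_rank
        exp = (rank_diff * -1) / 400
        odds = 1 / (1 + math.pow(10, exp))
        # From here to the random swap in build_dataSet the source listing is
        # cut off. These lines are reconstructed from the formulas in
        # Section III and from predict_winner; the column names 'WTeam',
        # 'LTeam' and 'WLoc' are assumptions.
        # Choose K according to the rating level
        if winner_rank < 2100:
            k = 32
        elif winner_rank >= 2100 and winner_rank < 2400:
            k = 24
        else:
            k = 16
        new_winner_rank = round(winner_rank + k * (1 - odds))
        new_rank_diff = new_winner_rank - winner_rank
        new_loser_rank = loser_rank - new_rank_diff
        return new_winner_rank, new_loser_rank


    def build_dataSet(all_data):
        print('Building data set..')
        skip = 0
        for index, row in all_data.iterrows():
            Wteam = row['WTeam']
            Lteam = row['LTeam']
            team1_elo = get_elo(Wteam)
            team2_elo = get_elo(Lteam)
            # The home team gets 100 extra Elo points
            if row['WLoc'] == 'H':
                team1_elo += 100
            else:
                team2_elo += 100
            team1_features = [team1_elo]
            team2_features = [team2_elo]
            for key, value in team_stats.loc[Wteam].items():
                team1_features.append(value)
            for key, value in team_stats.loc[Lteam].items():
                team2_features.append(value)
            # Put the two teams on random sides and label the row accordingly
            if random.random() > 0.5:
                X.append(team1_features + team2_features)
                y.append(0)
            else:
                X.append(team2_features + team1_features)
                y.append(1)
            if skip == 0:
                print('X', X)
                skip = 1
            # Update both teams' Elo ratings with the result of this game
            new_winner_rank, new_loser_rank = calc_elo(Wteam, Lteam)
            team_elos[Wteam] = new_winner_rank
            team_elos[Lteam] = new_loser_rank
        return np.nan_to_num(X), y


    def predict_winner(team_1, team_2, model):
        features = []
        # team 1: away team
        features.append(get_elo(team_1))
        for key, value in team_stats.loc[team_1].items():
            features.append(value)
        # team 2: home team
        features.append(get_elo(team_2) + 100)
        for key, value in team_stats.loc[team_2].items():
            features.append(value)
        features = np.nan_to_num(features)
        return model.predict_proba([features])


    if __name__ == '__main__':
        # The file names are cut off in the source listing; '...' marks the missing part
        Mstat = pd.read_csv(folder + '/15-16Miscellaneous_...')
        Ostat = pd.read_csv(folder + '/15-16Opponent_Per_Game_...')
        Tstat = pd.read_csv(folder + '/15-16Team_Per_Game_...')
        team_stats = initialize_data(Mstat, Ostat, Tstat)
        result_data = pd.read_csv(folder + '/2015-2016_...')
        X, y = build_dataSet(result_data)

        # Train the model
        print('Fitting on %d game samples..' % len(X))
        model = linear_model.LogisticRegression()
        model.fit(X, y)

        # Estimate the training accuracy with 10-fold cross-validation
        print('Doing cross-validation..')
        print(cross_val_score(model, X, y, cv=10, scoring='accuracy', n_jobs=-1).mean())

        # Use the trained model to predict the 16-17 games
        print('Predicting on new schedule..')
        schedule1617 = pd.read_csv(folder + '/...')
        result = []
        for index, row in schedule1617.iterrows():
            team1 = row['Vteam']
            team2 = row['Hteam']
            pred = predict_winner(team1, team2, model)
            prob = pred[0][0]
            if prob > 0.5:
                winner = team1
                loser = team2
                result.append([winner, loser, prob])
            else:
                winner = team2
                loser = team1
                result.append([winner, loser, 1 - prob])

        with open('...', 'w', newline='') as f:
            writer = csv.writer(f)
            writer.writerow(['win', 'lose', 'probability'])
            writer.writerows(result)
        print('done.')

        try:
            pd.read_csv('...', header=0)
        except Exception:
            pass

        with open('...') as f:
            f_csv = csv.reader(f)
            headers = next(f_csv)
            for row in f_csv:
                print(row)

Course design topic: NBA Game Result Prediction

Contents
I. Project Background and Goals
II. Data Overview
III. Data Analysis
IV. Code Implementation
V. Results and Analysis
References

I. Project Background and Goals

1.1 Project background

Has your WeChat Moments feed ever been flooded with updates on the progress or result of an NBA game? Or perhaps you are a die-hard NBA fan, and every basket, steal, or comeback buzzer-beater gets your blood pumping.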
Beyond enjoying the game itself, we are just as curious about how it will turn out. This course therefore shows how to use past NBA statistics to assess the strength of each team and to predict the result of a given game.

1.2 Project goal

By analyzing game data, obtain a feature representation of each team's state in each game, and use the relationship between each game and its winning team to predict the 2016-2017 games.

II. Data Overview

2.1 Obtaining NBA game statistics

Taking the Team Per Game Stats table as an example, we show how to obtain the three sets of statistics:

1. Go to Basketball, choose Season in the navigation bar, and select Summary under the 2015-2016 season.
2. On the 2015-2016 Summary page, scroll down to find the Team Per Game Stats table, click Share & more at its upper left, and choose Get table as CSV (for Excel) from the drop-down menu:

[Figure: screenshot of the Team Per Game Stats table on the Summary page]

3. Copy the CSV-format data generated on the page, paste it into a text editor, and save it as a csv file:

[Figure: the saved Miscellaneous Stats csv file opened in a spreadsheet, one row per team (30 rows). Columns: Rk, Team, Age, W, L, PW, PL, MOV, SOS, SRS, ORtg, DRtg, Pace, FTr, 3PAr, TS%, eFG%, TOV%, ORB%, FT/FGA, eFG%, TOV%, DRB%, FT/FGA, Arena, Attendance. First row: 1, Golden St, 27.4, 73, 9, 65, 17, 10.76, -0.38, 10.38, 114.5, 103.8, 99.3, 0.25, 0.362, 0.593, 0.563, 13.5, 23.5, 0.191, 0.479, 12.6, 76, 0.208, Oracle Ai, 803436]

III. Data Analysis
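Here we give a brief introduction to the Elo rating system, taking chess as the example. The formula that Eduardo writes on the window in the picture above is the one that, based on the logistic distribution, gives the expected win rate of each of the two competitors (A and B). Suppose the current ratings of A and B are $R_A$ and $R_B$. Then the expected win rate of A against B is

$$E_A = \frac{1}{1 + 10^{(R_B - R_A)/400}}$$

and the expected win rate of B against A is

$$E_B = \frac{1}{1 + 10^{(R_A - R_B)/400}}$$

If player A's actual score $S_A$ in a game (1 for a win, 0.5 for a draw, 0 for a loss) differs from the expected win rate $E_A$, the rating is adjusted with the following formula:

$$R_A^{new} = R_A^{old} + K(S_A - E_A)$$

In chess, K is adjusted according to the rating:

Rating at or above 2400: K = 16
Rating between 2100 and 2400: K = 24
Rating at or below 2100: K = 32

We therefore represent the data of one game by the following feature vector (say team A plays team B):

[team A's Elo score, team A's statistics from tables T, O and M, team B's Elo score, team B's statistics from tables T, O and M]

IV. Code Implementation

Import the modules needed for the experiment:

    import pandas as pd
    import math
    import csv
    import random
    import numpy as np
    from sklearn import linear_model
    from sklearn.model_selection import cross_val_score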
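Before the implementation continues, the Elo formulas from Section III (expected win rate and rating update) can be checked numerically. The sketch below is a minimal illustration, not the report's code: the function names `expected_score`, `k_factor` and `update` are chosen here for clarity, and the band boundaries follow the `< 2100` / `< 2400` comparisons used in the report's `calc_elo`, since the text leaves the exact boundary values ambiguous.

```python
import math

def expected_score(r_a, r_b):
    """Expected win rate of A against B under the Elo model."""
    return 1 / (1 + math.pow(10, (r_b - r_a) / 400))

def k_factor(rating):
    """K by rating band, following the chess convention quoted in the text."""
    if rating < 2100:
        return 32
    if rating < 2400:
        return 24
    return 16

def update(r_a, r_b, s_a):
    """Return A's new rating after scoring s_a (1 win, 0.5 draw, 0 loss) against B."""
    return r_a + k_factor(r_a) * (s_a - expected_score(r_a, r_b))

# A home team rated 1600 + 100 against a visitor rated 1600
e_a = expected_score(1700, 1600)
e_b = expected_score(1600, 1700)
print(round(e_a, 3), round(e_b, 3))

# Two equally rated teams: the winner gains K * (1 - 0.5)
print(round(update(1600, 1600, 1), 1))  # 1616.0
```

The two expected win rates always sum to 1, so the rating points gained by the winner equal the points lost by the loser whenever both sides use the same K.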
Set the parameter variables needed for regression training:

    # Base Elo rating given to a team that has no rating yet
    base_elo = 1600
    team_elos = {}
    team_stats = {}
    X = []
    y = []
    # Directory that holds the data
    folder = 'data'

At the very start the data must be initialized: read the T, O and M tables, drop some irrelevant columns, and join the three tables on the Team column.

    # Initialize from the Miscellaneous, Opponent and Team statistics csv files
    def initialize_data(Mstat, Ostat, Tstat):
        new_Mstat = Mstat.drop(['Rk', 'Arena'], axis=1)
        new_Ostat = Ostat.drop(['Rk', 'G', 'MP'], axis=1)
        new_Tstat = Tstat.drop(['Rk', 'G', 'MP'], axis=1)
        team_stats1 = pd.merge(new_Mstat, new_Ostat, how='left', on='Team')
        team_stats1 = pd.merge(team_stats1, new_Tstat, how='left', on='Team')
        return team_stats1.set_index('Team', inplace=False, drop=True)

    def get_elo(team):
        try:
            return team_elos[team]
        except KeyError:
            # A team without a rating starts at base_elo
            team_elos[team] = base_elo
            return team_elos[team]

    # Compute the new Elo ratings of both teams after one game
    def calc_elo(win_team, lose_team):
        winner_rank = get_elo(win_team)
        loser_rank = get_elo(lose_team)
        rank_diff = winner_rank - loser_rank
        exp = (rank_diff * -1) / 400
        odds = 1 / (1 + math.pow(10, exp))
        # Choose K according to the rating level
        if winner_rank < 2100:
            k = 32
        elif winner_rank >= 2100 and winner_rank < 2400:
            k = 24
        else: