Format: DOC, pages: 11, size: 297.74 KB
Computer Architecture: After-Class Exercises (计算机体系结构课后习题.doc)

1.1 Three enhancements with the following speedups are proposed for a new architecture:

    Speedup_1 = 30
    Speedup_2 = 20
    Speedup_3 = 15

Only one enhancement is usable at a time.

(1) If enhancements 1 and 2 are each usable for 25% of the time, what fraction of the time must enhancement 3 be used to achieve an overall speedup of 10?

(2) Assume the enhancements can be used 25%, 35% and 10% of the time for enhancements 1, 2 and 3, respectively. For what fraction of the reduced execution time is no enhancement in use?

(3) Assume, for some benchmark, the possible fraction of use is 15% for each of enhancements 1 and 2 and 70% for enhancement 3. We want to maximize performance. If only one enhancement can be implemented, which should it be? If two enhancements can be implemented, which should be chosen?

Answer:

(1) Let x be the fraction of the time enhancement 3 must be used to achieve an overall speedup of 10. By Amdahl's law,

$$\text{Speedup}_{overall} = \frac{1}{(1 - \text{Fraction}_{enhanced}) + \dfrac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}}}$$

and, with three mutually exclusive enhancements,

$$10 = \frac{1}{(1 - 25\% - 25\% - x) + \dfrac{25\%}{30} + \dfrac{25\%}{20} + \dfrac{x}{15}}$$

So x ≈ 45%.

(2) Let $\text{Time}_{before}$ be the total execution time before the three enhancements are used, and $\text{Time}_{no}$ the execution time during which no enhancement is in use:

$$\text{Time}_{no} = (1 - 25\% - 35\% - 10\%)\,\text{Time}_{before}$$

The total execution time after the three enhancements are used is $\text{Time}_{after}$:

$$\text{Time}_{after} = \text{Time}_{no} + \frac{25\%}{30}\,\text{Time}_{before} + \frac{35\%}{20}\,\text{Time}_{before} + \frac{10\%}{15}\,\text{Time}_{before}$$

So

$$\frac{\text{Time}_{no}}{\text{Time}_{after}} = \frac{0.3}{0.3325} = 90.2\%$$

(3) Apply Amdahl's law to each candidate. If only one enhancement can be implemented:

$$\text{Speedup}_{overall,1} = \frac{1}{(1 - 15\%) + \dfrac{15\%}{30}} = 1.17$$

$$\text{Speedup}_{overall,2} = \frac{1}{(1 - 15\%) + \dfrac{15\%}{20}} = 1.166$$

$$\text{Speedup}_{overall,3} = \frac{1}{(1 - 70\%) + \dfrac{70\%}{15}} = 2.88$$

So enhancement 3 should be selected to maximize performance.

If two enhancements can be implemented:

$$\text{Speedup}_{overall,12} = \frac{1}{(1 - 15\% - 15\%) + \dfrac{15\%}{30} + \dfrac{15\%}{20}} = 1.40$$

$$\text{Speedup}_{overall,13} = \frac{1}{(1 - 15\% - 70\%) + \dfrac{15\%}{30} + \dfrac{70\%}{15}} = 4.96$$

$$\text{Speedup}_{overall,23} = \frac{1}{(1 - 15\% - 70\%) + \dfrac{15\%}{20} + \dfrac{70\%}{15}} = 4.90$$

So enhancements 1 and 3 should be selected to maximize performance.

1.2 Suppose there is a graphics operation that accounts for 10% of execution time in an application, and by adding special hardware we can speed this up by a factor of 18. Furthermore, we could use twice as much hardware and make the graphics operation run 36 times faster. Is it worth exploring such a further architectural change? Give the reason.

Answer:

$$\text{Speedup}_{overall} = \frac{1}{(1 - \text{Fraction}_{enhanced}) + \dfrac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}}}$$

$$\text{Speedup}_{overall,1} = \frac{1}{(1 - 10\%) + \dfrac{10\%}{18}} = \frac{1}{0.9 + 0.0055555} = 1.104$$

$$\text{Speedup}_{overall,2} = \frac{1}{(1 - 10\%) + \dfrac{10\%}{36}} = \frac{1}{0.9 + 0.0027777} = 1.108$$

Doubling the hardware raises the overall speedup only from 1.104 to 1.108.
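The arithmetic in exercises 1.1 and 1.2 can be cross-checked with a short script. The sketch below is a minimal Python version of Amdahl's law for several mutually exclusive enhancements. The function name `overall_speedup` and the `(fraction, speedup)` pair representation are illustrative choices, not part of the exercises.

```python
def overall_speedup(parts):
    """Amdahl's law for mutually exclusive enhancements.

    parts: iterable of (fraction_of_original_time, speedup) pairs.
    The pair layout is an assumption made for this sketch.
    """
    parts = list(parts)
    enhanced = sum(fraction for fraction, _ in parts)
    if enhanced > 1.0 + 1e-12:
        raise ValueError("fractions must not sum to more than 1")
    # New execution time, normalized to an original time of 1.
    new_time = (1.0 - enhanced) + sum(f / s for f, s in parts)
    return 1.0 / new_time


if __name__ == "__main__":
    # Exercise 1.2: the graphics operation is 10% of execution time.
    print(round(overall_speedup([(0.10, 18)]), 3))  # 1.104
    print(round(overall_speedup([(0.10, 36)]), 3))  # 1.108
    # Exercise 1.1 (3): enhancements 1 and 3 together.
    print(round(overall_speedup([(0.15, 30), (0.70, 15)]), 2))  # 4.96
```

Passing a single pair reproduces the single-enhancement cases, and passing two or three pairs reproduces the combined cases.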

Conclusion for 1.2: it is not worth exploring such a further architectural change.

1.3 In many practical applications that demand a real-time response, the computational workload W is often fixed. As the number of processors increases in a parallel computer, the fixed workload is distributed to more processors for parallel execution. Assume 20 percent of W must be executed sequentially, and 80 percent can be executed by 4 nodes simultaneously. What is the fixed-load speedup?

Answer:

$$\text{Speedup}_{overall} = \frac{1}{(1 - \text{Fraction}_{enhanced}) + \dfrac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}}}$$

$$\text{Speedup}_{overall} = \frac{W}{W \times 20\% + \dfrac{W \times 80\%}{4}} = \frac{1}{0.2 + 0.2} = 2.5$$

So the fixed-load speedup is 2.5.

2.1 There is a model machine with nine instructions, whose frequencies are ADD (0.3), SUB (0.24), JOM (0.06), STO (0.07), JMP (0.07), SHR (0.02), CIL (0.03), CLA (0.2) and STP (0.01), respectively. There are several GPRs in the machine. Memory is byte addressable, with accessed addresses aligned, and the memory word width is 16 bits. Suppose the nine instructions have the following characteristics:

- Two-operand instructions
- Two instruction lengths
- Extended coding
- Shorter instruction operand format: R (register) - R (register)
- Longer instruction operand format: R (register) - M (memory)
- Displacement memory addressing mode

A. Encode the nine instructions with Huffman coding, and give the average code length.
B. Design the practical instruction codes, and give the average code length.
C. Write the two instruction word formats in detail.
D. What is the maximum offset for accessing a memory address?

Answer:

(A) Huffman coding by Huffman tree:

    Instruction   Frequency   Code
    ADD           30%         01
    SUB           24%         11
    CLA           20%         10
    JOM           6%          0001
    STO           7%          0011
    JMP           7%          0010
    SHR           2%          000001
    CIL           3%          00001
    STP           1%          000000

So the average code length is

$$\sum_{i=1}^{9} p_i l_i = 2.61 \text{ bits}$$

(B) Extended coding with two instruction lengths. The three most frequent instructions get 2-bit opcodes, and the remaining six get 5-bit opcodes that start with the unused 2-bit prefix 11:

    Instruction   Frequency   Code
    ADD           30%         01
    SUB           24%         00
    CLA           20%         10
    JOM           6%          11000
    STO           7%          11001
    JMP           7%          11010
    SHR           2%          11011
    CIL           3%          11100
    STP           1%          11101

So the average code length is

$$\sum_{i=1}^{9} p_i l_i = 2 \times 0.74 + 5 \times 0.26 = 2.78 \text{ bits}$$

(C) Shorter instruction format (8 bits):

    Opcode   Register   Register
    2 bits   3 bits     3 bits

Longer instruction format (16 bits):

    Opcode   Register   Register   Offset
    5 bits   3 bits     3 bits     5 bits

(D) The 5-bit offset field covers a range of $2^5 = 32$ bytes, so the maximum offset for accessing a memory address is 32 bytes.

3.1 Identify all of the data dependences in the following code.

Which dependences are data hazards that will be resolved via forwarding?

    ADD R2,R5,R4
    ADD R4,R2,R5
    SW  R5,100(R2)
    ADD R3,R2,R4

Answer: (not provided in the source document)

3.2 How could we modify the following code to make use of a delayed branch slot?

    Loop: LW   R2,100(R3)
          ADDI R3,R3,#4
          BEQ  R3,R4,Loop

Answer:

          LW   R2,100(R3)
    Loop: ADDI R3,R3,#4
          BEQ  R3,R4,Loop
          LW   R2,100(R3)     ; delayed branch slot

3.3 Consider the following reservation table for a four-stage pipeline with a clock cycle t = 20 ns.

A. What are the forbidden latencies and the initial collision vector?
B. Draw the state transition diagram for scheduling the pipeline.
C. Determine the MAL associated with the shortest greedy cycle.
D. Determine the maximum pipeline throughput corresponding to the MAL and the given t.

[Reservation table: stages S1 to S4 (rows) against clock cycles 1 to 6 (columns); the cell marks are not recoverable]

Answer:

A. The forbidden latencies are F = {1, 2, 5}. The initial collision vector is C = (10011).
B. The state transition diagram: (figure not recoverable)
C. MAL (Minimal Average Latency) = 3 clock cycles.
D. The maximum pipeline throughput is one task per MAL, that is, one task every 3 clock cycles.
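The collision-vector reasoning in exercise 3.3 can be checked mechanically. The sketch below is a minimal Python model, assuming the usual convention that bit k (counted from the right, starting at 1) of the collision vector is set when latency k is forbidden. The function names `collision_vector` and `greedy_cycle` are hypothetical, and the routine follows only the greedy path from the initial state, not every cycle in the state diagram.

```python
def collision_vector(forbidden):
    """Initial collision vector as an int; bit k-1 is set if latency k is forbidden."""
    cv = 0
    for k in forbidden:
        cv |= 1 << (k - 1)
    return cv


def greedy_cycle(forbidden):
    """Always take the smallest permissible latency until a state repeats.

    Assumes `forbidden` is a non-empty set of positive integers.
    Returns (latency_cycle, average_latency).
    """
    initial = collision_vector(forbidden)
    width = max(forbidden)          # number of bits in the collision vector
    state = initial
    seen = {}                       # state -> position in `latencies`
    latencies = []
    while state not in seen:
        seen[state] = len(latencies)
        k = 1
        # Skip forbidden latencies; width + 1 is always permissible.
        while k <= width and state & (1 << (k - 1)):
            k += 1
        latencies.append(k)
        # Next state: shift right by the latency, then OR with the initial vector.
        state = (state >> k) | initial
    cycle = latencies[seen[state]:]
    return cycle, sum(cycle) / len(cycle)


if __name__ == "__main__":
    print(format(collision_vector({1, 2, 5}), "b"))  # 10011
    print(greedy_cycle({1, 2, 5}))                   # ([3], 3.0)
```

For the forbidden latencies {1, 2, 5} the greedy path stays in the initial state with a constant latency of 3, which agrees with the MAL of 3 clock cycles.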

For exercise 3.3 D, t = 20 ns gives

$$H_k = \frac{1}{3 \times 20\ \text{ns}}$$

3.4 Use the following code fragment:

    Loop: LW   R1,0(R2)     ; load R1 from address 0+R2
          ADDI R1,R1,#1     ; R1 = R1+1
          SW   0(R2),R1     ; store R1 at address 0+R2
          ADDI R2,R2,#4     ; R2 = R2+4
          SUB  R4,R3,R2     ; R4 = R3-R2
          BNEZ R4,Loop      ; branch to Loop if R4 != 0

Assume that the initial value of R3 is R2+396. Throughout this exercise use the classic RISC five-stage integer pipeline and assume all memory accesses take 1 clock cycle.

A. Show the timing of this instruction sequence for the RISC pipeline without any forwarding or bypassing hardware, but assuming a register read and a write in the same clock cycle "forwards" through the register file. Assume that the branch is handled by flushing the pipeline. If all memory references take 1 cycle, how many cycles does this loop take to execute?

B. Show the timing of this instruction sequence for the RISC pipeline with normal forwarding and bypassing hardware. Assume that the branch is handled by predicting it as not taken. If all memory references take 1 cycle, how many cycles does this loop take to execute?

C. Assume the RISC pipeline with a single-cycle delayed branch and normal forwarding and bypassing hardware. Schedule the instructions in the loop including the branch delay slot. You may reorder instructions and modify the individual instruction operands, but do not undertake other loop transformations that change the number or opcode of the instructions in the loop. Show a pipeline timing diagram and compute the number of cycles needed to execute the entire loop.

Answer:

A. The loop iterates 396/4 = 99 times. Go through one complete iteration of the loop and the first instruction in the next iteration. Total length = the length of iterations 0 through 97 (the first 98 iterations should be of the same length) + the length of the last iteration. We have assumed the version of DLX described in Figure 3.21 (page 97) of the book, which resolves branches in MEM. From this figure, the second iteration begins 17 clocks after the first iteration and the last iteration takes 18 cycles to complete.

Total length = 17 × 98 + 18 = 1684 clock cycles

B. From the timing figure, the second iteration begins 10 clocks after the first iteration and the last iteration takes 11 cycles to complete.

Total length = 10 × 98 + 11 = 991 clock cycles

C. Reorder the instructions of the loop to:

    Loop: LW   R1,0(R2)     ; load R1 from address 0+R2
          ADDI R2,R2,#4     ; R2 = R2+4
          SUB  R4,R3,R2     ; R4 = R3-R2
          ADDI R1,R1,#1     ; R1 = R1+1
          BNEZ R4,Loop      ; branch to Loop if R4 != 0
          SW   -4(R2),R1    ; store R1 at address -4+R2 (branch delay slot)

From the timing figure, the second iteration begins 6 clocks after the first iteration and the last iteration takes 10 cycles to complete.

Total length = 6 × 98 + 10 = 598 clock cycles

The source then lists the loop three times with the stall positions marked, progressing from the original order to the reordered loop (the cycle columns of the timing diagrams are not recoverable):

    Loop: LW   R1,0(R2)     ; load R1 from address 0+R2
          stall
          ADDI R1,R1,#1     ; R1 = R1+1
          SW   0(R2),R1     ; store R1 at address 0+R2
          ADDI R2,R2,#4     ; R2 = R2+4
          SUB  R4,R3,R2     ; R4 = R3-R2
          stall
          BNEZ R4,Loop      ; branch to Loop if R4 != 0
          stall

    Loop: LW   R1,0(R2)     ; load R1 from address 0+R2
          (stall)
          ADDI R2,R2,#4     ; R2 = R2+4
          ADDI R1,R1,#1     ; R1 = R1+1
          SW   -4(R2),R1    ; store R1 at address -4+R2
          SUB  R4,R3,R2     ; R4 = R3-R2
          stall
          BNEZ R4,Loop      ; branch to Loop if R4 != 0
          stall

    Loop: LW   R1,0(R2)     ; load R1 from address 0+R2
          (stall)
          ADDI R2,R2,#4     ; R2 = R2+4
          SUB  R4,R3,R2     ; R4 = R3-R2
          (stall)
          ADDI R1,R1,#1     ; R1 = R1+1
          BNEZ R4,Loop      ; branch to Loop if R4 != 0
          (stall)
          SW   -4(R2),R1    ; store R1 at address -4+R2

3.5 Consider the following reservation table for a four-stage pipeline.

A. What are the forbidden latencies and the initial collision vector?
B. Draw the state transition diagram for scheduling the pipeline.
C. Determine the MAL associated with the shortest greedy cycle.
D. Determine the maximum pipeline throughput corresponding to the MAL.
E. According to the shortest greedy cycle, put six tasks into the pipeline and determine the actual pipeline throughput.

[Reservation table: stages S1 to S4 (rows) against clock cycles 1 to 7 (columns); the cell marks are not recoverable]

Answer:

A. The forbidden latencies are 2, 4 and 6. The initial collision vector is C = (101010).

B. The state transition diagram: (figure not recoverable)

C. The MAL associated with the shortest greedy cycle is 4 cycles.

    Scheduling   Average latency
    (1,7)        4
    (3,5)        4
    (5,3)        4
    (5)          5
    (3,7)        5
    (5,7)        6
    (7)          7

D. The maximum pipeline throughput corresponding to the MAL is $H_k = 1/(4 \text{ clock cycles})$.

E. Put six tasks into the pipeline according to the shortest greedy cycle. The best scheduling is the greedy cycle (1,7), because:

- with (1,7) scheduling, the actual throughput is $H_k = 6/(1+7+1+7+1+7) = 6/(24 \text{ cycles})$
- with (3,5) scheduling, the actual throughput is $H_k = 6/(3+5+3+5+3+7) = 6/(26 \text{ cycles})$
- with (5,3) scheduling, the actual throughput is $H_k = 6/(5+3+5+3+5+7) = 6/(28 \text{ cycles})$

4.1 The following C program is run (with no optimizations) on a machine with a cache that has four-word (16-byte) blocks and holds 256 bytes of data:

    int i, j, c, stride, array[256];
    for (i = 0; i < 10000; i++)
        for (j = 0; j < 256; j = j + stride)
            c = array[j] + 5;

If we consider only the cache activity generated by references to the array and we assume that integers are words, what is the expected miss rate when the cache is direct-mapped and stride = 132? How about if stride = 131? Would either of these change if the cache were two-way set associative?

Answer:

If stride = 132 and the cache is direct-mapped (textbook pp. 201, 211):

- The number of blocks in the cache is 256/16 = 16.
- The block address of array[0] is ⌊0/16⌋ = 0.
- array[0] maps to cache block 0 mod 16 = 0.
- The block address of array[132] is ⌊132 × 4/16⌋ = 33.
- array[132] maps to cache block 33 mod 16 = 1.

The two blocks do not conflict, so only the first access to each one misses: miss rate = 2/(2 × 10000) = 1/10000.

If stride = 131 and the cache is direct-mapped (textbook pp. 201, 211):

- The number of blocks in the cache is 256/16 = 16.
- The block address of array[0] is ⌊0/16⌋ = 0.
- array[0] maps to cache block 0 mod 16 = 0.
- The block address of array[131] is ⌊131 × 4/16⌋ = 32.
- array[131] maps to cache block 32 mod 16 = 0.

The two blocks evict each other on every access: miss rate = (2 × 10000)/(2 × 10000) = 1.

If stride = 132 and the cache is two-way set associative (textbook pp. 224-227, 211):

- The number of blocks in the cache is 256/16 = 16.
- The number of sets in the cache is 16/2 = 8.
- The block address of array[0] is ⌊0/16⌋ = 0.
- array[0] maps to set 0 mod 8 = 0.
- The block address of array[132] is ⌊132 × 4/16⌋ = 33.
- array[132] maps to set 33 mod 8 = 1.

So miss rate = 2/(2 × 10000) = 1/10000.

If stride = 131 and the cache is two-way set associative (textbook pp. 224-227, 211):

- The number of blocks in the cache is 256/16 = 16.
- The number of sets in the cache is 16/2 = 8.
- The block address of array[0] is ⌊0/16⌋ = 0.
- array[0] maps to set 0 mod 8 = 0.
- The block address of array[131] is ⌊131 × 4/16⌋ = 32.
- array[131] maps to set 32 mod 8 = 0.

Both blocks fit in the two ways of set 0, so miss rate = 2/(2 × 10000) = 1/10000.

4.2 Consider a virtual memory system with the following properties:

- 40-bit virtual byte address
- 16-KB pages
- 36-bit physical byte address

(1) What is the total size of the page table for each process on this machine, assuming that the valid, protection, dirty, and use bits take a total of 4 bits and that all the virtual pages are in use? (Assume that disk addresses are not stored in the page table.)

(2) Assume that the virtual memory system is implemented with a two-way set-associative TLB with a total of 256 TLB entries. Show the virtual-to-physical mapping with a figure. Make sure to label the width of all fields and signals.

Answer:

(1) A 16-KB page has a 14-bit page offset, so the total size of the page table for each process on this machine is:

$$2^{40-14} \times \bigl(4 + (36 - 14)\bigr) \text{ bits} = 2^{26} \times 26 \text{ bits} = 208 \text{ MB}$$
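The results for exercises 4.1 and 4.2 can be cross-checked numerically. The sketch below is a minimal Python model with two helpers whose names (`miss_rate`, `page_table_bits`) are made up for this illustration. The cache model assumes the array starts at byte address 0, as the answer to 4.1 does, and assumes LRU replacement within a set, which the exercise does not specify.

```python
def miss_rate(stride, ways, outer=10000, n=256,
              cache_bytes=256, block_bytes=16, word_bytes=4):
    """Simulate the array references of exercise 4.1 and return the miss rate.

    Assumptions of this sketch: array[0] is at byte address 0 and each
    set uses LRU replacement.
    """
    num_sets = cache_bytes // block_bytes // ways
    sets = [[] for _ in range(num_sets)]   # per set: block addresses, LRU first
    misses = accesses = 0
    for _ in range(outer):
        for j in range(0, n, stride):
            block = (j * word_bytes) // block_bytes
            lines = sets[block % num_sets]
            accesses += 1
            if block in lines:
                lines.remove(block)        # hit: re-appended below as most recent
            else:
                misses += 1
                if len(lines) == ways:
                    lines.pop(0)           # evict the least recently used block
            lines.append(block)
    return misses / accesses


def page_table_bits(virtual_bits, physical_bits, page_bytes, flag_bits):
    """Size in bits of a flat page table with one entry per virtual page."""
    if page_bytes & (page_bytes - 1):
        raise ValueError("page size must be a power of two")
    offset_bits = page_bytes.bit_length() - 1
    entries = 1 << (virtual_bits - offset_bits)
    entry_bits = flag_bits + (physical_bits - offset_bits)
    return entries * entry_bits


if __name__ == "__main__":
    for stride in (132, 131):
        for ways in (1, 2):
            print(stride, ways, miss_rate(stride, ways))
    bits = page_table_bits(40, 36, 16 * 1024, 4)
    print(bits / 8 / 2**20, "MB")
```

With `ways=1` the model is direct-mapped, and with `ways=2` it is two-way set associative, so the four cases of exercise 4.1 correspond to the four `(stride, ways)` combinations in the loop.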
