Computer Architecture: Key Points Explained (file: Computer-Architecture计算机系统结构-知识点详解.docx)

Uploaded by: 人****来 | Document ID: 4857026 | Upload date: 2024-10-15 | Format: DOCX | Pages: 27 | Size: 66.94 KB
Computer Architecture

1. Fundamentals of Computer Architecture

1.1 Layers of a Computer System

A computer system can be viewed as a hierarchy of machine levels:

M5 Application Language Machine
M4 High-Level Language Machine
M3 Assembly Language Machine
M2 Operating System Machine
M1 Conventional Machine
M0 Microprogram Machine

1. Each level performs a related subset of functions.
2. Each level relies on the next lower level to perform more primitive functions.
3. This decomposes the problem into more manageable subproblems.
4. Levels M2 through M5 are virtual machines.
5. Instructions on the conventional machine (arithmetic, logic, and so on) are implemented by programs at the microprogram level. Such a program acts as an interpreter that understands a small set of simple operations, called the microinstruction set.

1.2 Computer Architecture and Implementation

Computer Architecture
Refers to those attributes of a system that are visible to a programmer, or that have a direct impact on the logical execution of a program.

Implementation
Has two components: organization and hardware.
1. Organization: the high-level aspects of a computer's design, such as the memory system, the bus structure, and the internal design of the CPU.
2. Hardware: the specifics of a machine, including the detailed logic design and the packaging technology.

Architectural Attributes
The instruction set, the I/O mechanisms, the techniques for addressing memory, and the number of bits used to represent the various data types (numbers, characters).

Organizational Attributes
Hardware details that are transparent to the programmer, such as control signals, computer/peripheral interfaces, and memory technology.

Hardware Attributes
Packaging technology, power, and cooling.

Architectural Design Issue
Whether a computer will have a multiply instruction.

Organizational Issue
1. Whether the instruction will be implemented by a special multiply unit or by repeated use of the add unit.
2. The decision may be based on the anticipated frequency of use of the multiply instruction, the relative speed of the two approaches, and the cost and physical size of a special multiply unit.

1.3 The Task of a Computer Designer

Determine what attributes are important for a new machine.
Design a machine to maximize performance while staying within cost and power constraints. This includes instruction set design, functional organization, logic design, and implementation (IC design, packaging, cooling).

Designers have to determine the functional requirements, which is a major task:
1. The requirements may be specific features inspired by the market.
2. Application software often drives the choice of certain functional requirements.
3. The presence of a large market for a particular class of applications might encourage the designers to incorporate requirements.

1.4 Measuring and Reporting Performance

When we say one computer is faster than another, what do we mean?
The user may say a computer is faster when a program runs in less time. The computer center manager may say a computer is faster when it completes more jobs in an hour.

1. Response time: the quantity the computer user is interested in reducing.
2. Execution time: the time between the start and the completion of an event (another name for response time).
3. Throughput: the total amount of work done in a given time.

X is n times faster than Y:

$$n = \frac{\text{Execution time}_Y}{\text{Execution time}_X} = \frac{\text{Performance}_X}{\text{Performance}_Y}$$

The second equality holds because execution time is the reciprocal of performance:

$$\text{Execution time} = \frac{1}{\text{Performance}}$$

Measuring Performance
Even execution time can be defined in different ways:
Wall-clock time, response time, or elapsed time: the latency to complete a task, including disk accesses, memory accesses, input/output activities, and operating system overhead.
CPU time: the time the CPU is computing, not including the time spent waiting for I/O or running other programs. CPU time can be further divided into:
1. User CPU time: the CPU time spent in the program.
2. System CPU time: the CPU time spent in the operating system performing tasks requested by the program.

Choosing Programs to Evaluate Performance
Four levels of programs are listed below in decreasing order of accuracy of prediction:
1. Real applications: they have input, output, and options.
2. Kernels: key pieces of programs, best suited to isolating the performance of individual features of a machine.
3. Toy benchmarks.
4. Synthetic benchmarks: constructed to match the average frequency of operations and operands in programs.

Benchmark Suites
1. Collections of benchmarks are put together to measure the performance of processors with a variety of applications.
2. A key advantage of such suites is that the weakness of any one benchmark is lessened by the presence of the other benchmarks.
3. Benchmark suites are made of collections of programs, some of which may be kernels, but many of which are typically real programs.

Reporting Performance Results
1. The guiding principle of reporting performance measurements should be reproducibility.
2. A report requires a fairly complete description of the machine and the compiler flags, as well as the publication of both the baseline and the optimized results.
3. A report contains the actual performance times, shown both in tabular form and as a graph.

Comparing and Summarizing Performance
Battles are fought over what is the fair way to summarize the relative performance of a collection of programs.

Total Execution Time: A Consistent Summary Measure
This summary tracks execution time, our final measure of performance. An average of the execution times is the arithmetic mean:

$$\frac{1}{n}\sum_{i=1}^{n}\text{Time}_i$$

Weighted Execution Time
First approach: assign a weight to each program. The weighted arithmetic mean is:

$$\sum_{i=1}^{n}\text{Weight}_i \times \text{Time}_i$$

Normalized Execution Time and the Pros and Cons of Geometric Means
Second approach: use execution times normalized to a reference machine, so that actual performance = normalized value × performance of the reference machine.
Average normalized execution time can be expressed as either an arithmetic or a geometric mean. The geometric mean is:

$$\sqrt[n]{\prod_{i=1}^{n}\text{Execution time ratio}_i}$$

A useful property of the geometric mean is that the ratio of the geometric means equals the geometric mean of the ratios:

$$\frac{\text{Geometric mean}(X_i)}{\text{Geometric mean}(Y_i)} = \text{Geometric mean}\left(\frac{X_i}{Y_i}\right)$$

In contrast to arithmetic means, geometric means of normalized execution times are consistent no matter which machine is the reference. Hence, the arithmetic mean should not be used to average normalized execution times.

For positive values: harmonic mean ≤ geometric mean ≤ arithmetic mean.

Advantage: the geometric mean is independent of the running times of the individual programs, and it does not matter which machine is used to normalize.
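The independence from the reference machine can be checked numerically. The following is a minimal Python sketch, not taken from the source notes: the two machines, the two programs, their execution times, and the helper names are all hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math


def arithmetic_mean(values):
    """Plain average: (1/n) * sum of the values."""
    return sum(values) / len(values)


def weighted_arithmetic_mean(values, weights):
    """Sum of weight_i * value_i; the weights are assumed to sum to 1."""
    return sum(w * v for w, v in zip(weights, values))


def geometric_mean(values):
    """n-th root of the product, computed through logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical execution times (seconds) of two programs on machines A and B.
times_a = [1.0, 1000.0]
times_b = [10.0, 100.0]

# Execution times of B normalized to A, and of A normalized to B.
b_over_a = [b / a for a, b in zip(times_a, times_b)]
a_over_b = [a / b for a, b in zip(times_a, times_b)]

am_ref_a = arithmetic_mean(b_over_a)  # B looks slower than A
am_ref_b = arithmetic_mean(a_over_b)  # A looks slower than B
gm_ref_a = geometric_mean(b_over_a)   # close to 1
gm_ref_b = geometric_mean(a_over_b)   # close to 1

print("arithmetic mean, reference A:", am_ref_a)
print("arithmetic mean, reference B:", am_ref_b)
print("geometric mean, reference A:", gm_ref_a)
print("geometric mean, reference B:", gm_ref_b)
print("total time A:", sum(times_a), "total time B:", sum(times_b))
print("equal weights:", weighted_arithmetic_mean(times_a, [0.5, 0.5]))
```

With these numbers the arithmetic mean of the normalized times makes each machine look about five times slower than the other, depending on which one is the reference, while the geometric mean reports a ratio of about 1 in both directions. The total execution times (1001 s versus 110 s) still differ, so the geometric mean alone says nothing about which machine finishes the workload sooner.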

Drawback: geometric means violate our fundamental principle of performance measurement, because they do not predict execution time.

1.5 Quantitative Principles of Computer Design

Make the Common Case Fast
Favor the case that occurs frequently.

Amdahl's Law
Use: the performance gain obtained by improving some portion of a computer can be calculated using Amdahl's Law.
Definition: Amdahl's Law states that the performance improvement to be gained from using some faster mode of execution is limited by the fraction of the time the faster mode can be used.
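As a rough numerical sketch of the law just stated, the Python snippet below uses its standard formulation, overall speedup = 1 / ((1 - fraction) + fraction / speedup of the enhanced part). The function names and the example values (40% of the time enhanced, by a factor of 10) are assumptions made for illustration only.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when `fraction_enhanced` of the original execution
    time is accelerated by a factor of `speedup_enhanced`."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must lie in [0, 1]")
    if speedup_enhanced <= 0.0:
        raise ValueError("speedup_enhanced must be positive")
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


def amdahl_upper_bound(fraction_enhanced):
    """Limit of the overall speedup as the enhanced part becomes infinitely fast."""
    if fraction_enhanced >= 1.0:
        return float("inf")
    return 1.0 / (1.0 - fraction_enhanced)


# Hypothetical case: 40% of the time can use a mode that is 10 times faster.
print(amdahl_speedup(0.4, 10.0))   # 1 / (0.6 + 0.04)
print(amdahl_upper_bound(0.4))     # 1 / 0.6
```

Even an arbitrarily fast enhancement cannot push the overall speedup past the bound returned by `amdahl_upper_bound`, because the unenhanced 60% of the time is untouched.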

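Speedup:

$$\text{Speedup} = \frac{\text{Performance}_{\text{new}}}{\text{Performance}_{\text{old}}} = \frac{\text{Execution time}_{\text{old}}}{\text{Execution time}_{\text{new}}}$$

The speedup depends on two factors:
1. The fraction of the execution time that can be enhanced: $\text{Fraction}_{\text{enhanced}} \le 1$.
2. The improvement gained on that fraction: $\text{Speedup}_{\text{enhanced}} > 1$.

$$\text{Execution time}_{\text{new}} = \text{Execution time}_{\text{old}} \times \left[(1 - \text{Fraction}_{\text{enhanced}}) + \frac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}\right]$$

$$\text{Speedup}_{\text{overall}} = \frac{\text{Execution time}_{\text{old}}}{\text{Execution time}_{\text{new}}} = \frac{1}{(1 - \text{Fraction}_{\text{enhanced}}) + \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}}$$

Law of diminishing returns: Amdahl's Law expresses the law of diminishing returns. The incremental improvement in speedup gained by an additional improvement in just a portion of the computation diminishes as improvements are added.

Corollary: an important corollary of Amdahl's Law is that if an enhancement is only usable for a fraction of a task, we can't speed up the task by more than the reciprocal of 1 minus that fraction. The overall speedup therefore has an upper bound.

The CPU Performance Equation
CPU time can be expressed in two ways:

$$\text{CPU time} = \text{CPU clock cycles} \times \text{Clock cycle time} = \frac{\text{CPU clock cycles}}{\text{Clock rate}}$$

$$\text{CPI (clock cycles per instruction)} = \frac{\text{CPU clock cycles}}{\text{IC (instruction count)}}$$

Execution time formula:

$$\text{CPU time} = \text{CPI} \times \text{IC} \times \text{Clock cycle time} = \frac{\text{CPI} \times \text{IC}}{\text{Clock rate}}$$

CPU performance depends on the clock cycle time (or clock rate), the CPI, and the IC. It is difficult to change one of these parameters without affecting the others:
1. Clock cycle time: hardware technology and organization.
2. CPI: organization and instruction set architecture (ISA).
3. Instruction count: ISA and compiler technology.

Measuring the Components of CPU Performance
Determining the clock cycle:
1. For an existing CPU, this is easy.
2. For a completed design, low-level tools called timing estimators or timing verifiers are used.
3. For a design that is not completed, the clock cycle is estimated by examining the critical paths in the design.

Measuring the instruction count: a compiler is used together with tools that measure instruction set behavior.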
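Once the three components have been measured, they combine through the CPU performance equation, CPU time = CPI × IC / clock rate. The Python sketch below works one case through; the instruction mix, the per-class cycle counts, the 1 GHz clock, and the function names are hypothetical values picked for illustration, not measurements from any real machine.

```python
def average_cpi(instruction_mix):
    """Average CPI from (instruction_count, cycles_per_instruction) pairs,
    one pair per instruction class."""
    total_cycles = sum(count * cycles for count, cycles in instruction_mix)
    total_instructions = sum(count for count, _ in instruction_mix)
    return total_cycles / total_instructions


def cpu_time(instruction_count, cpi, clock_rate_hz):
    """CPU time in seconds: CPI * IC / clock rate."""
    return instruction_count * cpi / clock_rate_hz


# Hypothetical program: three instruction classes with different costs.
mix = [
    (6_000_000, 1),  # e.g. ALU operations, 1 cycle each
    (3_000_000, 2),  # e.g. loads and stores, 2 cycles each
    (1_000_000, 4),  # e.g. branches, 4 cycles each
]

ic = sum(count for count, _ in mix)   # 10 million instructions
cpi = average_cpi(mix)                # 16 million cycles / 10 million instructions
seconds = cpu_time(ic, cpi, 1e9)      # at an assumed 1 GHz clock

print("IC:", ic, "CPI:", cpi, "CPU time (s):", seconds)
```

Halving the clock cycle time, the CPI, or the instruction count would each halve the CPU time in this model, which is why the text stresses that the three are hard to change independently.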

There are two ways to measure the instruction count:
1. First way: use an instruction set simulator that interprets the instructions. This is slow, but it can measure almost any aspect of instruction set behavior accurately.
2. Second way: use execution-based monitoring. The binary program is modified to include instrumentation code. This is very fast, since the program is executed rather than interpreted.

Measuring the CPI

Locality of Reference
Programs tend to reuse data and instructions they have used recently.

1.6 Classification of Computer Architecture

1. SISD (single instruction stream over a single data stream)
2. SIMD (single instruction stream over multiple data streams)
3. MIMD (multiple instruction streams over multiple data streams)
4. MISD (multiple instruction streams over a single data stream)

Most parallel computers built in the past assumed the MIMD model for general-purpose computations. The SIMD and MISD models are more suitable for special-purpose computations.

2. Instruction Set

2.1 Classifying Instruction Set Architecture

The type of internal storage in the CPU is the most basic differentiation. The major choices for the storage that holds operands inside the CPU are a stack, an accumulator, or a set of registers.

Operands may be named explicitly or implicitly, which gives three kinds of architecture:
1. Stack architecture: the operands are implicitly on the top of the stack.
2. Accumulator architecture: one operand is implicitly the accumulator.
3. General-purpose register (GPR) architecture: has only explicit operands, which are either registers or memory locations.

Register machines are divided into the following classes:
1. Register-memory architecture.
2. Load-store, or register-register, architecture.
3. Memory-memory architecture (not found in current machines).
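To make the classes concrete, the Python sketch below evaluates C = A + B on four toy machines, one per style. The mnemonics (`push`, `load`, `add`, `store`), the register names, and the interpreter functions are invented for this illustration and do not correspond to any real instruction set.

```python
def run_stack(memory):
    """Stack architecture: operands are implicitly on top of the stack."""
    program = [("push", "A"), ("push", "B"), ("add",), ("pop", "C")]
    mem, stack = dict(memory), []
    for op, *args in program:
        if op == "push":
            stack.append(mem[args[0]])
        elif op == "add":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif op == "pop":
            mem[args[0]] = stack.pop()
    return mem


def run_accumulator(memory):
    """Accumulator architecture: one operand is implicitly the accumulator."""
    program = [("load", "A"), ("add", "B"), ("store", "C")]
    mem, acc = dict(memory), 0
    for op, addr in program:
        if op == "load":
            acc = mem[addr]
        elif op == "add":
            acc += mem[addr]
        elif op == "store":
            mem[addr] = acc
    return mem


def run_register_memory(memory):
    """Register-memory: an ALU instruction may take one operand from memory."""
    program = [("load", "R1", "A"), ("add", "R1", "B"), ("store", "R1", "C")]
    mem, regs = dict(memory), {}
    for op, reg, addr in program:
        if op == "load":
            regs[reg] = mem[addr]
        elif op == "add":
            regs[reg] += mem[addr]
        elif op == "store":
            mem[addr] = regs[reg]
    return mem


def run_load_store(memory):
    """Load-store: only loads and stores touch memory; ALU ops use registers."""
    program = [("load", "R1", "A"), ("load", "R2", "B"),
               ("add", "R3", "R1", "R2"), ("store", "R3", "C")]
    mem, regs = dict(memory), {}
    for op, *args in program:
        if op == "load":
            regs[args[0]] = mem[args[1]]
        elif op == "add":
            regs[args[0]] = regs[args[1]] + regs[args[2]]
        elif op == "store":
            mem[args[1]] = regs[args[0]]
    return mem


initial = {"A": 3, "B": 4, "C": 0}
for machine in (run_stack, run_accumulator, run_register_memory, run_load_store):
    print(machine.__name__, machine(initial)["C"])
```

All four programs compute the same result; they differ in how many operands each instruction names explicitly and in how many instructions touch memory.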

Advantages of General-Purpose Register Computers
1. First, registers are faster than memory.
2. Second, registers are easier for a compiler to use and can be used more effectively.
3. More importantly, registers can hold variables. Memory traffic is then reduced, the program speeds up (registers are faster than memory), and code density improves (a register can be named with fewer bits than a memory location).

Two major characteristics divide GPR architectures:
1. Whether an ALU instruction has two or three operands. In the three-operand format, the instruction contains a result and two source operands. In the two-operand format, one of the operands is both a source and a result for the operation.
2. How many of the operands may be memory addresses. Typically this ranges from none to three.

2.2 Interpreting Memory Address

How is a memory address interpreted? Machines are byte addressed and provide access for bytes (8 bits), half words (16 bits), words (32 bits), and double words (64 bits).

There are two different conventions for ordering the bytes within a word:
1. Little Endian byte order puts the byte whose address is "x...x00" at the least-significant position in the word (the little end). The lowest address holds the least significant byte.
2. Big Endian byte order puts the byte whose address is "x...x00" at the most-significant position in the word (the big end). The lowest address holds the most significant byte.

Byte order is a problem when exchanging data among machines with different orderings.

2.3 Addressing Modes

How does a computer specify the address of an operand? An addressing mode can specify a constant, a register, or a location in memory. The actual memory address specified is called the effective address.

Addressing modes:
1. have the ability to significantly reduce instruction counts;
2. also add to the complexity of building a machine;
3. may increase the average CPI of computers that implement those modes.
4. Measurements of how the various addressing modes are used are therefore quite important in helping the architect choose what to include.

Displacement Addressing Mode
The main question is the range of displacements used. Choosing the displacement field sizes is important because they directly affect the instruction length.

Immediate or Literal Addressing Mode
Immediates can be used in arithmetic operations, in comparisons (primarily for branches), and in moves where a constant is wanted in a register. Not all operations support immediates.
A second question is the range of values for immediates:
1. Like displacement values, the sizes of immediate values affect instruction lengths.
2. As the following figure shows, small immediate values are the most heavily used.
3. Large immediates are sometimes used, however, most likely in addressing calculations.

Summary: Memory Addressing
1. A new architecture should support at least displacement, immediate, and register deferred (register indirect) addressing. These represent 75% to 99% of the addressing modes used.
2. The size of the address for displacement mode should be at least 12 to 16 bits. These sizes would capture 75% to 99% of the displacements.
3. The size of the immediate field should be at least 8 to 16 bits. These sizes would capture 50% to 80% of the immediates.

2.4 Optimizing Instruction Formats

Instruction Format Length
1. This decision affects, and is affected by, memory size, memory organization, bus structure, CPU complexity, and CPU speed.
2. The decision determines the richness and flexibility of the machine as seen by the assembly-language programmer.
3. For a given instruction length, there is clearly a trade-off between the number of opcodes and the power of the addressing capability.

2.4.1 Opcode Representation

Fixed-Length Opcodes
Fixed-length opcodes have the advantages of simple hardware decoding and regularity. But it wastes of s
