Computer Architecture: Detailed Explanation of Key Concepts


Computer Architecture

1. Fundamentals of Computer Architecture

1.1 Layers of a Computer System
M5  Application Language Machine
M4  High-Level Language Machine
M3  Assembly Language Machine
M2  Operating System Machine
M1  Conventional Machine
M0  Microprogram Machine
1. Each layer performs a related subset of functions.
2. Each layer relies on the next lower layer to perform more primitive functions.
3. This decomposes a problem into more manageable subproblems.
4. Layers M2 through M5 are virtual machines.
5. The instructions of the conventional machine (arithmetic, logic, etc.) are implemented by a program at the microprogram level. That program acts as an interpreter that understands a small set of simple operations called the microinstruction set.

1.2 Computer Architecture and Implementation
- Computer architecture: the attributes of a system that are visible to a programmer, or that have a direct impact on the logical execution of a program.
- Implementation: has two components, organization and hardware.
   1. Organization: the high-level aspects of a computer's design, such as the memory system, the bus structure, and the internal CPU design.
   2. Hardware: the specifics of a machine, including the detailed logic design and the packaging technology.
- Architectural attributes: the instruction set; the I/O mechanisms; the techniques for addressing memory; the number of bits used to represent the various data types (numbers, characters).
- Organizational attributes: hardware details that are transparent to the programmer, such as control signals, computer/peripheral interfaces, and memory technology.
- Hardware attributes: packaging technology, power, cooling.
- Architectural design issue: whether a computer will have a multiply instruction.
- Organizational design issue:
   1. Whether that instruction will be implemented by a special multiply unit or by repeated use of the add unit.
   The decision may be based on the anticipated frequency of use of the multiply instruction, the relative speed of the two approaches, and the cost and physical size of a special multiply unit.

1.3 The Task of a Computer Designer
- Determine which attributes are important for a new machine, then design a machine that maximizes performance while staying within cost and power constraints. This covers instruction set design, functional organization, logic design, and implementation (IC design, packaging, cooling).
- Determining the functional requirements is a major task:
   1. The requirements may be specific features inspired by the market.
   2. Application software often drives the choice of certain functional requirements.
   3. The presence of a large market for a particular class of applications might encourage the designers to incorporate requirements that serve those applications.

1.4 Measuring and Reporting Performance
- When we say one computer is faster than another, what do we mean? A user may say a computer is faster when a program runs in less time; a computer center manager may say a computer is faster when it completes more jobs in an hour.
   1. Response time: the quantity the individual computer user is interested in reducing.
   2. Execution time: the time between the start and the completion of an event.
   3. Throughput: the total amount of work done in a given time.
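The user's and the manager's notions of "faster" can disagree. The sketch below uses a hypothetical scenario (the job counts, job times, and processor counts are made-up values, not measurements): machine X runs one job in 2 s on a single processor, while machine Y needs 6 s per job but has four processors. The helper names `finish_times` and `summarize` are illustrative only.

```python
from statistics import mean


def finish_times(num_jobs: int, job_time: float, processors: int) -> list[float]:
    """Finish times of identical jobs that all arrive at t = 0 and are
    dispatched in order to whichever processor becomes free first."""
    free_at = [0.0] * processors
    finishes = []
    for _ in range(num_jobs):
        i = free_at.index(min(free_at))  # earliest-available processor
        free_at[i] += job_time
        finishes.append(free_at[i])
    return finishes


def summarize(finishes: list[float]) -> tuple[float, float]:
    """Return (mean response time in s, throughput in jobs per hour).
    All jobs arrive at t = 0, so each job's response time is its finish time."""
    avg_response = mean(finishes)
    jobs_per_hour = len(finishes) * 3600 / max(finishes)
    return avg_response, jobs_per_hour


# Hypothetical machines: X = one fast processor, Y = four slower processors.
x = summarize(finish_times(num_jobs=4, job_time=2.0, processors=1))
y = summarize(finish_times(num_jobs=4, job_time=6.0, processors=4))
print("X: mean response %.1f s, %.0f jobs/hour" % x)
print("Y: mean response %.1f s, %.0f jobs/hour" % y)
```

In this toy case X has the shorter single-job time (2 s versus 6 s) and the lower mean response time (5.0 s versus 6.0 s), yet Y completes 2400 jobs per hour against X's 1800, so which machine is "faster" depends on the metric being asked about.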
- X is n times faster than Y when
   $$n = \frac{\text{Execution time}_Y}{\text{Execution time}_X} = \frac{\text{Performance}_X}{\text{Performance}_Y}, \qquad \text{Execution time} = \frac{1}{\text{Performance}}$$
- Measuring performance. Even execution time can be defined in different ways:
   1. Wall-clock time, response time, or elapsed time: the latency to complete a task, including disk accesses, memory accesses, input/output activities, and operating system overhead.
   2. CPU time: the time the CPU is computing, not including time spent waiting for I/O or running other programs. CPU time can be further divided into:
      a. User CPU time: the CPU time spent in the program itself.
      b. System CPU time: the CPU time spent in the operating system performing tasks requested by the program.
- Choosing programs to evaluate performance. Four levels of programs, listed in decreasing order of accuracy of prediction:
   1. Real applications: have inputs, outputs, and options that the user can select.
   2. Kernels: small, key pieces extracted from real programs; they are the best way to isolate the performance of individual features of a machine.
   3. Toy benchmarks.
   4. Synthetic benchmarks: try to match the average frequency of operations and operands found in real programs.
- Benchmark suites:
   1. Collections of benchmarks are put together to measure the performance of processors across a variety of applications.
   2. A key advantage of such suites is that the weakness of any one benchmark is lessened by the presence of the other benchmarks.
   3. Benchmark suites are made of collections of programs; some may be kernels, but many are typically real programs.
- Reporting performance results:
   1. The guiding principle of reporting performance measurements should be reproducibility.
   2. This requires a fairly complete description of the machine and the compiler flags, as well as the publication of both the baseline and the optimized results.
   The report should also contain the actual performance times, shown both in tabular form and as a graph.
- Comparing and summarizing performance: there is much debate over what is a fair way to summarize the relative performance of a collection of programs.
- Total execution time, a consistent summary measure: this summary tracks execution time, our final measure of performance. The average execution time is the arithmetic mean
   $$\frac{1}{n}\sum_{i=1}^{n} \text{Time}_i$$
- Weighted execution time (first approach): assign a weight to each program and use the weighted arithmetic mean
   $$\sum_{i=1}^{n} \text{Weight}_i \times \text{Time}_i$$
- Normalized execution time and the pros and cons of geometric means (second approach): normalize the execution times to a reference machine, so that actual performance = normalized value × performance of the reference machine. The average normalized execution time can be expressed as either an arithmetic or a geometric mean; the geometric mean is
   $$\sqrt[n]{\prod_{i=1}^{n} \text{Execution time ratio}_i}$$
   A useful property of the geometric mean is that the ratio of the geometric means equals the geometric mean of the ratios:
   $$\frac{\text{Geometric mean}(X_i)}{\text{Geometric mean}(Y_i)} = \text{Geometric mean}\left(\frac{X_i}{Y_i}\right)$$
   In contrast to arithmetic means, geometric means of normalized execution times are consistent no matter which machine is the reference. Hence, the arithmetic mean should not be used to average normalized execution times. For positive values, harmonic mean ≤ geometric mean ≤ arithmetic mean.
   1. Advantage: the geometric mean is independent of the running times of the individual programs, and it does not matter which machine is used for normalization.
   2. Drawback: geometric means violate our fundamental principle of performance measurement, because they do not predict execution time.

1.5 Quantitative Principles of Computer Design
- Make the common case fast: favor the frequently occurring case.
- Amdahl's Law.
   Use: the performance gain obtained by improving some portion of a computer can be calculated using Amdahl's Law.
   Definition: Amdahl's Law states that the performance improvement to be gained from using some faster mode of execution is limited by the fraction of the time the faster mode can be used.
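As a minimal sketch, the functions below compute Amdahl's Law in its usual form, overall speedup = 1 / ((1 - F) + F / S), where F is the fraction of time that can use the enhancement and S is the speedup of the enhanced portion. The function names and the example values F = 0.4, S = 10 are hypothetical, chosen only for illustration.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup = 1 / ((1 - F) + F / S)."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must lie in [0, 1]")
    if speedup_enhanced <= 0.0:
        raise ValueError("speedup_enhanced must be positive")
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


def amdahl_limit(fraction_enhanced: float) -> float:
    """Upper bound on the overall speedup as S grows without limit: 1 / (1 - F)."""
    if fraction_enhanced >= 1.0:
        return float("inf")
    return 1.0 / (1.0 - fraction_enhanced)


# Hypothetical case: 40% of the time can use a mode that is 10 times faster.
print(round(amdahl_speedup(0.4, 10.0), 4))    # 1.5625
# Diminishing returns: raising S repeatedly only creeps toward the bound.
for s in (2, 4, 8, 16, 1000):
    print(s, round(amdahl_speedup(0.4, s), 4))
print("limit:", round(amdahl_limit(0.4), 4))  # 1.6667
```

With F = 0.4 and S = 10 the overall speedup is only about 1.56, and no value of S can push it past 1 / (1 - 0.4) ≈ 1.67.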
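   Meaning: the performance improvement obtained by speeding up part of a system is limited by the fraction of time that part is used.
   Speedup:
   $$\text{Speedup} = \frac{\text{Performance after enhancement}}{\text{Performance before enhancement}} = \frac{\text{Total execution time before enhancement}}{\text{Total execution time after enhancement}}$$
   The speedup depends on two factors:
   1. The fraction of the computation that can be enhanced, $\text{Fraction}_{\text{enhanced}} \le 1$.
   2. How much that fraction is sped up, $\text{Speedup}_{\text{enhanced}} > 1$.
   $$\text{Execution time}_{\text{new}} = \text{Execution time}_{\text{old}} \times \left[(1 - \text{Fraction}_{\text{enhanced}}) + \frac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}\right]$$
   $$\text{Speedup}_{\text{overall}} = \frac{\text{Execution time}_{\text{old}}}{\text{Execution time}_{\text{new}}} = \frac{1}{(1 - \text{Fraction}_{\text{enhanced}}) + \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}}$$
   Law of diminishing returns: Amdahl's Law expresses the law of diminishing returns. The incremental improvement in speedup gained by an additional improvement in just a portion of the computation diminishes as improvements are added.
   Corollary: if an enhancement is only usable for a fraction of a task, we cannot speed up the task by more than the reciprocal of 1 minus that fraction, so the overall speedup has an upper bound.
- The CPU performance equation. CPU time can be expressed in two ways:
   $$\text{CPU time} = \text{CPU clock cycles} \times \text{Clock cycle time} = \frac{\text{CPU clock cycles}}{\text{Clock rate}}$$
   The number of clock cycles per instruction is
   $$\text{CPI} = \frac{\text{CPU clock cycles}}{\text{IC}}$$
   where IC is the instruction count. Combining the two gives
   $$\text{CPU time} = \text{IC} \times \text{CPI} \times \text{Clock cycle time} = \frac{\text{IC} \times \text{CPI}}{\text{Clock rate}}$$
   CPU performance therefore depends on the clock cycle time (or clock rate), the CPI, and the IC. It is difficult to change one parameter without affecting the others:
   1. Clock cycle time: hardware technology and organization.
   2. CPI: organization and instruction set architecture (ISA).
   3. Instruction count: ISA and compiler technology.
- Measuring the components of CPU performance.
   Determining the clock cycle time:
   1. For an existing CPU, this is easy.
   2. For a completed design, low-level tools called timing estimators or timing verifiers are used.
   3. For a design that is not yet completed, by examining the critical paths in the design.
   Measuring the instruction count: use the compiler together with tools that measure instruction set behavior.
   1. First way: an instruction set simulator that interprets the instructions. It is slow, but it can measure almost any aspect of instruction set behavior accurately.
   2. Second way: execution-based monitoring.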
      In this approach the binary program is modified to include instrumentation code. It is very fast, since the program is executed rather than interpreted.
   Measuring the CPI.
- Locality of reference: programs tend to reuse data and instructions they have used recently.

1.6 Classification of Computer Architecture
1. SISD (single instruction stream over a single data stream)
2. SIMD (single instruction stream over multiple data streams)
3. MIMD (multiple instruction streams over multiple data streams)
4. MISD (multiple instruction streams over a single data stream)
Most parallel computers built in the past assumed the MIMD model for general-purpose computations. The SIMD and MISD models are more suitable for special-purpose computations.

2. Instruction Set

2.1 Classifying Instruction Set Architectures
- The type of internal storage in the CPU is the most basic differentiation. The major choices for holding operands in the CPU are a stack, an accumulator, or a set of registers. Operands may be named explicitly or implicitly.
- Three architecture styles:
   1. Stack architecture: the operands are implicitly on the top of the stack.
   2. Accumulator architecture: one operand is implicitly the accumulator.
   3. General-purpose register (GPR) architecture: has only explicit operands, either registers or memory locations.
- Classes of GPR machines:
   1. Register-memory architecture.
   2. Load-store (register-register) architecture.
   3. Memory-memory architecture (no longer found in current machines).
- Advantages of GPR machines:
   1. First, registers are faster than memory.
   2. Second, registers are easier for a compiler to use and can be used more effectively.
   3. More importantly, registers can hold variables. This reduces memory traffic, speeds up the program, and improves code density, because a register can be named with fewer bits than a memory location.
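To illustrate implicit versus explicit operands, and why registers that hold variables cut memory traffic, the toy model below evaluates the hypothetical statements C = A + B; D = C + A in each style and counts data-memory accesses. The instruction sequences in the comments are textbook-style assumptions rather than the code of any real machine, the stack version deliberately uses only Push, Pop, and Add, and the class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Memory:
    """Data memory that counts every load and store."""
    cells: dict
    accesses: int = 0

    def load(self, name: str) -> int:
        self.accesses += 1
        return self.cells[name]

    def store(self, name: str, value: int) -> None:
        self.accesses += 1
        self.cells[name] = value


def run_stack(m: Memory) -> None:
    # Push A; Push B; Add; Pop C; Push C; Push A; Add; Pop D
    stack = []
    for dest, srcs in (("C", ("A", "B")), ("D", ("C", "A"))):
        for src in srcs:
            stack.append(m.load(src))           # Push: explicit memory operand
        right, left = stack.pop(), stack.pop()  # Add: both operands implicit (top of stack)
        stack.append(left + right)
        m.store(dest, stack.pop())              # Pop: explicit memory operand


def run_accumulator(m: Memory) -> None:
    # Load A; Add B; Store C; Add A; Store D
    acc = m.load("A")   # the accumulator is the implicit operand and result
    acc += m.load("B")
    m.store("C", acc)
    acc += m.load("A")  # C is still in the accumulator, but A must be reloaded
    m.store("D", acc)


def run_load_store(m: Memory) -> None:
    # Load R1,A; Load R2,B; Add R3,R1,R2; Store R3,C; Add R4,R3,R1; Store R4,D
    r = {}
    r["R1"] = m.load("A")
    r["R2"] = m.load("B")
    r["R3"] = r["R1"] + r["R2"]  # ALU operations name registers explicitly
    m.store("C", r["R3"])
    r["R4"] = r["R3"] + r["R1"]  # A and C are still held in registers
    m.store("D", r["R4"])


for name, run in (("stack", run_stack), ("accumulator", run_accumulator),
                  ("load-store", run_load_store)):
    mem = Memory({"A": 2, "B": 3})
    run(mem)
    print(f"{name:12s} C={mem.cells['C']} D={mem.cells['D']} memory accesses={mem.accesses}")
```

All three versions produce C = 5 and D = 7, with 6, 5, and 4 memory accesses respectively; the load-store version needs the fewest because A and C stay in registers. Instruction encoding, and therefore real code density, is ignored in this sketch.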
- Two major characteristics divide GPR architectures:
   1. Whether an ALU instruction has two or three operands. In the three-operand format, the instruction contains a result and two source operands; in the two-operand format, one of the operands is both a source and the result of the operation.
   2. How many of the operands may be memory addresses, typically from none to three.

2.2 Interpreting Memory Addresses
- How is a memory address interpreted? Memory is byte addressed, with access provided to bytes (8 bits), half words (16 bits), words (32 bits), and double words (64 bits).
- There are two different conventions for ordering the bytes within a word:
   1. Little Endian byte order puts the byte whose address is "x...x00" at the least-significant position in the word (the little end), so the lowest address holds the least-significant byte.
   2. Big Endian byte order puts the byte whose address is "x...x00" at the most-significant position in the word (the big end), so the lowest address holds the most-significant byte.
   Byte order is a problem when exchanging data among machines with different orderings.

2.3 Addressing Modes
- How does a computer specify an operand? It can be a constant, a register, or a location in memory. The actual memory address specified is called the effective address. Addressing modes:
   1. have the ability to significantly reduce instruction counts;
   2. also add to the complexity of building a machine;
   3. may increase the average CPI of computers that implement those modes;
   4. so measuring how the various addressing modes are used is quite important in helping the architect choose which ones to include.
- Displacement addressing mode: the key question is the range of displacements used. Choosing the displacement field size is important because it directly affects the instruction length.
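Whether a displacement fits a field of a given width is a simple range check. The sketch below assumes a signed two's-complement displacement field (an assumption; the text does not fix the encoding) and reports what fraction of a made-up list of displacements each field size would capture; the list is hypothetical, not measured data, and the function names are illustrative.

```python
def fits_signed(value: int, bits: int) -> bool:
    """True if value fits in a `bits`-wide two's-complement field."""
    low = -(1 << (bits - 1))
    high = (1 << (bits - 1)) - 1
    return low <= value <= high


def coverage(displacements: list[int], bits: int) -> float:
    """Fraction of displacements representable in a signed field of `bits` bits."""
    if not displacements:
        return 0.0
    return sum(fits_signed(d, bits) for d in displacements) / len(displacements)


# Hypothetical displacements, for illustration only.
sample = [0, 4, -8, 16, 120, -2040, 2047, 4096, 40000, -70000]
for bits in (8, 12, 16):
    print(f"{bits:2d}-bit field captures {coverage(sample, bits):.0%} of the sample")
```

For this sample the 8-, 12-, and 16-bit fields capture 50%, 70%, and 80% of the values. Each extra bit of coverage costs one more bit in every instruction that carries a displacement, which is the instruction-length trade-off noted above.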
- Immediate or literal addressing mode: immediates can be used in arithmetic operations, in comparisons (primarily for branches), and in moves where a constant is wanted in a register. Not all operations support immediates.
- The range of values for immediates:
   1. Like displacement values, the sizes of immediate values affect instruction lengths.
   2. As the following figure shows, small immediate values are the most heavily used.
   3. Large immediates are sometimes used, however, most likely in addressing calculations.
- Summary: memory addressing
   1. A new architecture should support at least the displacement, immediate, and register deferred (register indirect) addressing modes; together they represent 75% to 99% of the addressing modes used.
   2. The address field for displacement mode should be at least 12 to 16 bits; these sizes capture 75% to 99% of the displacements.
   3. The immediate field should be at least 8 to 16 bits; these sizes capture 50% to 80% of the immediates.

2.4 Optimizing Instruction Formats
- Instruction format length:
   1. This decision affects, and is affected by, memory size, memory organization, bus structure, CPU complexity, and CPU speed.
   2. The decision determines the richness and flexibility of the machine as seen by the assembly-language programmer.
   3. For a given instruction length, there is clearly a trade-off between the number of opcodes and the power of the addressing capability.

2.4.1 Opcode Representation
- Fixed-length opcodes: have the advantages of simple hardware decoding and regularity. But it wastes of s
