DSP Lecture Notes (C6K)


Chapter 1 Introduction

Learning Objectives
• Why process signals digitally?
• Definition of a real-time application.
• Why use Digital Signal Processing processors?
• What are the typical DSP algorithms?
• Parameters to consider when choosing a DSP processor.
• Programmable vs. ASIC DSP.
• Texas Instruments' TMS320 family.

1.1 Why go digital?
• Digital signal processing techniques are now so powerful that it is sometimes extremely difficult, if not impossible, for analogue signal processing to achieve similar performance.
• Examples:
  - FIR filter with linear phase.
  - Adaptive filters.
• Analogue signal processing is achieved by using analogue components such as:
  - Resistors.
  - Capacitors.
  - Inductors.
• The inherent tolerances of these components, together with temperature, voltage changes and mechanical vibrations, can dramatically affect the effectiveness of analogue circuitry.
• With DSP it is easy to:
  - Change applications.
  - Correct applications.
  - Update applications.
• Additionally, DSP reduces:
  - Noise susceptibility.
  - Chip count.
  - Development time.
  - Cost.
  - Power consumption.

1.2 Why NOT go digital?
• Very high-frequency signals cannot always be processed digitally, for two reasons:
  - Analog-to-digital converters (ADCs) cannot work fast enough.
  - The application can be too complex to be performed in real time.

1.3 Real-time processing
• DSP processors have to perform tasks in real time, so how do we define real time?
• The definition of real time depends on the application.
• Example: a 100-tap FIR filter is performed in real time if the DSP can perform and complete the following operation between two samples:
    y(n) = Σ a(k) * x(n - k), for k = 0 to 99
• The time between two consecutive samples is made up of the processing time and the waiting time, i.e. Waiting Time = sample period - processing time. We can say that we have a real-time application if:
  - Waiting Time ≥ 0

1.3 Why do we need DSP processors?
• Why not use a General Purpose Processor (GPP) such as a Pentium instead of a DSP processor?
  - What is the power consumption of a Pentium and of a DSP processor?
  - What is the cost of a Pentium and of a DSP processor?
• Use a DSP processor when the following are required:
  - Cost saving.
  - Smaller size.
  - Low power consumption.
  - Processing of many "high" frequency signals in real time.
• Use a GPP when the following are required:
  - Large memory.
  - Advanced operating systems.

1.4 What are the typical DSP algorithms?
The Sum of Products (SOP) is the key element in most DSP algorithms:
    Y = Σ (an * xn), for n = 1 to N

1.4.1 Hardware vs. Microcode multiplication
• DSP processors are optimized to perform multiplication and addition operations.
• Multiplication and addition are done in hardware and in one cycle.
• Example: 4-bit multiply (unsigned).

1.5 Parameters to consider when choosing a DSP processor
• C6711 Datasheet: \Links\TMS320C6711.pdf
• C6211 Datasheet: \Links\TMS320C6211.pdf

1.6 Floating vs. Fixed point processors
• Applications that require:
  - High precision.
  - Wide dynamic range.
  - High signal-to-noise ratio.
  - Ease of use.
  need a floating-point processor.
• Drawbacks of floating-point processors:
  - Higher power consumption.
  - Can be more expensive.
  - Can be slower than fixed-point counterparts and larger in size.
• It is the application that dictates which device and platform to use in order to achieve optimum performance at a low cost.
• For educational purposes, use the floating-point device (C6711), as it supports both fixed-point and floating-point operations.

1.7 General Purpose DSP vs. DSP in ASIC
• Application Specific Integrated Circuits (ASICs) are semiconductors designed for dedicated functions.
• The advantages and disadvantages of using ASICs are listed below:

1.8 Texas Instruments' TMS320 family
• Different families and sub-families exist to support different markets.

TMS320C64x: The C64x fixed-point DSPs offer the industry's highest level of performance to address the demands of the digital age. At clock rates of up to 1 GHz, C64x DSPs can process information at rates of up to 8000 MIPS, with costs as low as $19.95.
In addition to a high clock rate, C64x DSPs can do more work each cycle with built-in extensions. These extensions include new instructions to accelerate performance in key application areas such as digital communications infrastructure and video and image processing.

TMS320C62x: These first-generation fixed-point DSPs represent breakthrough technology that enables new equipment and energizes existing implementations for multi-channel, multi-function applications, such as wireless base stations, remote access servers (RAS), digital subscriber loop (xDSL) systems, personalized home security systems, advanced imaging/biometrics, industrial scanners, precision instrumentation and multi-channel telephony systems.

TMS320C67x: For designers of high-precision applications, C67x floating-point DSPs offer the speed, precision, power savings and dynamic range to meet a wide variety of design needs. These dynamic DSPs are the ideal solution for demanding applications like audio, medical imaging, instrumentation and automotive.

1.9 C6000 Roadmap

Useful Links
• Selection Guide:
  - \Links\DSP Selection Guide.pdf
  - \Links\DSP Selection Guide.pdf (3Q 2004)
  - \Links\DSP Selection Guide.pdf (4Q 2004)

Chapter 2 TMS320C6000 Architectural Overview

Learning Objectives
• Describe the C6000 CPU architecture.
• Introduce some basic instructions.
• Describe the C6000 memory map.
• Provide an overview of the peripherals.

2.1 General DSP System Block Diagram

2.2 Implementation of Sum of Products (SOP)
It was shown in Chapter 1 that SOP is the key element for most DSP algorithms. So let's write the code for this algorithm and, at the same time, discover the C6000 architecture.
Let's implement the SOP algorithm! (The implementation in this module will be done in assembly.)
Two basic operations are required for this algorithm:
(1) Multiplication
(2) Addition
Therefore two basic instructions are required.
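Before moving to assembly, it may help to see the same algorithm in C. The sketch below is a minimal reference version. It assumes 16-bit coefficients and samples and a 32-bit accumulator, which matches the 16-bit by 16-bit multiply with a 32-bit result used in the following sections. The function name `sop` is hypothetical and is not part of any TI library.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference sum of products:
   Y = a[0]*x[0] + a[1]*x[1] + ... + a[n-1]*x[n-1].
   Each 16-bit x 16-bit product is formed as a 32-bit value (the MPY step)
   and added into a 32-bit accumulator (the ADD step).
   Assumption: n is small enough that the running sum fits in 32 bits. */
int32_t sop(const int16_t *a, const int16_t *x, int n)
{
    assert(n >= 0);
    int32_t y = 0;                            /* accumulator starts at zero */
    for (int i = 0; i < n; i++) {
        int32_t prod = (int32_t)a[i] * x[i];  /* multiplication */
        y += prod;                            /* addition */
    }
    return y;
}
```

For example, with a = {1, 2, 3, 4} and x = {5, 6, 7, 8}, `sop(a, x, 4)` returns 1*5 + 2*6 + 3*7 + 4*8 = 70. The loop body is the multiply/add pair that the rest of this section builds in assembly.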
2.2.1 Multiply (MPY)
The multiplication of a1 by x1 is done in assembly by the following instruction:
    MPY a1, x1, Y
This instruction is performed by a multiplier unit called ".M".

Multiply (.M unit)
The .M unit performs multiplications in hardware:
    MPY .M a1, x1, Y
Note: a 16-bit by 16-bit multiplier provides a 32-bit result; a 32-bit by 32-bit multiplier provides a 64-bit result.

2.2.2 Addition (.?)
    MPY .M a1, x1, prod
    ADD .?  Y, prod, Y

Add (.L unit)
    MPY .M a1, x1, prod
    ADD .L  Y, prod, Y
RISC processors such as the C6000 use registers to hold the operands, so let's change this code.

2.2.3 Register File - A
Let us correct this by replacing a1, x1, prod and Y with registers from register file A (A0 = a1, A1 = x1, A3 = prod, A4 = Y).

Specifying Register Names
    MPY .M A0, A1, A3
    ADD .L A4, A3, A4
The registers A0, A1, A3 and A4 contain the values to be used by the instructions. Register file A contains 16 registers (A0-A15), each 32 bits wide.

2.2.4 Data loading
Q: How do we load the operands into the registers?
A: The operands are loaded into the registers from memory using the .D unit.

Load Unit ".D"
It is worth noting at this stage that the only way to access memory is through the .D unit.

2.2.5 Load Instruction
Q: Which instruction(s) can be used for loading operands from memory into the registers?
A: The load instructions.

Load Instructions (LDB, LDH, LDW, LDDW)

2.2.6 Using the Load Instructions
Before using the load unit, you have to be aware that this processor is byte addressable, which means that each byte is represented by a unique address. Also, the addresses are 32 bits wide.
The syntax for the load instruction is:
    LD *Rn, Rm
where Rn is a register that contains the address of the operand to be loaded, and Rm is the destination register.
The question now is: how many bytes are going to be loaded into the destination register?
The answer is that it depends on the instruction you choose:
• LDB: loads a byte (8-bit)
• LDH: loads a half word (16-bit)
• LDW: loads a word (32-bit)
• LDDW: loads a double word (64-bit)
Note: LD on its own does not exist.

Example: assume that A5 = 0x4 and that memory addresses 0x4 to 0xB hold the bytes 0x01 to 0x08 (little-endian byte order). Then:
    (1) LDB  *A5, A7      ; gives A7 = 0x00000001
    (2) LDH  *A5, A7      ; gives A7 = 0x00000201
    (3) LDW  *A5, A7      ; gives A7 = 0x04030201
    (4) LDDW *A5, A7:A6   ; gives A7:A6 = 0x0807060504030201

Question: if data can only be accessed by the load instruction and the .D unit, how can we load the register pointer Rn in the first place?

2.2.7 Loading the Pointer Rn
• The instruction MVKL allows a move of a 16-bit constant into a register, as shown below:
    MVKL .? a, A5    ; 'a' is a constant or label
• How many bits represent a full address? 32 bits.
• So why does the instruction not allow a 32-bit move? Because all instructions are 32 bits wide (see the instruction opcode), so a 32-bit constant cannot fit inside the instruction.
• To solve this problem another instruction is available: MVKH, e.g.
    MVKH .? a, A5    ; 'a' is a constant or label
  MVKH copies the upper 16 bits of a (ah) into the upper 16 bits of A5 and leaves the lower 16 bits of A5 (x) unchanged: if a = ah:al, then after MVKH, A5 = ah:x.
• Finally, to move a 32-bit address into a register we can use:
    MVKL a, A5
    MVKH a, A5
Always use MVKL first, then MVKH, as the following examples show (A5 = 0x87654321 initially in both):

Example 1
    MVKL 0x1234FABC, A5   ; A5 = 0xFFFFFABC (sign extension)
    MVKH 0x1234FABC, A5   ; A5 = 0x1234FABC  OK

Example 2
    MVKH 0x1234FABC, A5   ; A5 = 0x12344321
    MVKL 0x1234FABC, A5   ; A5 = 0xFFFFFABC  Wrong

2.2.8 LDH, MVKL and MVKH
    MVKL pt1, A5
    MVKH pt1, A5
    MVKL pt2, A6
    MVKH pt2, A6
    LDH .D *A5, A0
    LDH .D *A6, A1
    MPY .M A0, A1, A3
    ADD .L A4, A3, A4
pt1 and pt2 point to some locations in the data memory.
So far we have only implemented the SOP for one tap, i.e. Y = a1 * x1. So let's create a loop so that we can implement the SOP for N taps.

2.2.9 Creating a loop
With the C6000 processors there are no dedicated loop instructions such as block repeat. The loop is created using the B (branch) instruction.

2.2.10 What are the steps for creating a loop?
1. Create a label to branch to.
2. Add a branch instruction, B. (Which unit is used by the B instruction?)
3. Create a loop counter.
4. Add an instruction to decrement the loop counter.
5. Make the branch conditional based on the value in the loop counter.
• What is the syntax for making an instruction conditional?
    [condition] Instruction Label
  e.g.
    [B1] B loop
  (1) The condition can be one of the following registers: A1, A2, B0, B1, B2.
  (2) Any instruction can be conditional.
• The condition can be inverted by adding the exclamation symbol "!" as follows:
    [!condition] Instruction Label
  e.g.
    [!B0] B loop    ; branch if B0 = 0
    [B0]  B loop    ; branch if B0 != 0

2.2.11 More on the Branch Instruction
• With this processor all instructions are encoded in 32 bits.
• Therefore a label cannot be encoded as a full 32-bit address: it must fit, as an offset, into the bits of the B instruction left over after the opcode.
• Case 1: B .S1 label
  - Relative branch.
  - Label limited to a +/- 2^20 offset.
• By specifying a register as an operand instead of a label, it is possible to have an absolute branch.
  - This allows a dynamic range of 2^32.
• Case 2: B .S2 register
  - Absolute branch.
  - Operates on .S2 ONLY!

2.2.12 Testing the code
This code performs the following operations:
    a0*x0 + a0*x0 + a0*x0 + ... + a0*x0
However, we would like to perform:
    a0*x0 + a1*x1 + a2*x2 + ... + aN*xN

Modifying the pointers
The solution is to modify the pointers A5 and A6, so that they advance to the next coefficient and the next sample on each pass through the loop.

2.2.13 Indexing Pointers
    Syntax        Description       Pointer Modified
    *R            Pointer           No
    *+R[disp]     + Pre-offset      No
    *-R[disp]     - Pre-offset      No
    *++R[disp]    Pre-increment     Yes
    *--R[disp]    Pre-decrement     Yes
    *R++[disp]    Post-increment    Yes
    *R--[disp]    Post-decrement    Yes
• *R: the pointer is used but not modified.
• *+R[disp], *-R[disp]: the pointer is modified BEFORE being used and RESTORED to its previous value.
• *++R[disp], *--R[disp]: the pointer is modified BEFORE being used and NOT restored to its previous value.
• *R++[disp], *R--[disp]: the pointer is modified AFTER being used and NOT restored to its previous value.
• [disp] specifies a number of elements, each of size DW (64-bit), W (32-bit), H (16-bit) or B (8-bit).
• disp = R or a 5-bit constant.
• R can be any register.

2.2.14 Modifying and testing the code
This code now performs the following operations:
    a0*x0 + a1*x1 + a2*x2 + ... + aN*xN
The final result can now be stored, but the pointer A7 has not been initialized.
What is the initial value of A4? A4 is used as an accumulator, so it needs to be reset to zero.

2.2.15 Increasing the processing power!
How can we add more processing power to this processor?
(1) Increase the clock frequency.
(2) Increase the number of processing units.
To increase the processing power, this processor has two sides (A and B, or 1 and 2).
Can the two sides exchange operands in order to increase performance? The answer is YES, but there are limitations.
• To exchange operands between the two sides, some cross paths or links are required.
What is a cross path?
• A cross path links one side of the CPU to the other.
• There are two types of cross paths:
  - Data cross paths.
  - Address cross paths.

2.2.16 Data Cross Paths
• Data cross paths can also be referred to as register file cross paths.
• These cross paths allow operands from one side to be used by the other side.
• There are only two cross paths:
  - one path which conveys data from side B to side A, 1X.
  - one path which conveys data from side A to side B, 2X.
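The naming of the two data cross paths can be captured in a few lines of C. This is a minimal sketch that assumes a simplified model in which each functional unit and each register is labelled only by its side (A = 1, B = 2). The enums and the function `cross_path_needed` are hypothetical and are not part of any TI tool. Given the side of the unit and the side of a source register, the function reports which cross path, if any, the operand has to travel over.

```c
#include <assert.h>

/* Simplified, hypothetical model of the two datapath sides:
   side 1 = A (units .L1, .S1, .M1, .D1 and registers A0-A15),
   side 2 = B (units .L2, .S2, .M2, .D2 and registers B0-B15). */
enum side { SIDE_A = 1, SIDE_B = 2 };

/* Which data cross path a source operand needs. */
enum cross_path {
    XPATH_NONE, /* register is on the unit's own side */
    XPATH_1X,   /* side B register read by a side A unit */
    XPATH_2X    /* side A register read by a side B unit */
};

/* Returns the cross path needed for a unit on side `unit` to read a
   source register held on side `reg`. */
enum cross_path cross_path_needed(enum side unit, enum side reg)
{
    assert(unit == SIDE_A || unit == SIDE_B);
    assert(reg == SIDE_A || reg == SIDE_B);
    if (unit == reg)
        return XPATH_NONE;
    return (unit == SIDE_A) ? XPATH_1X : XPATH_2X;
}
```

For instance, a multiply on .M1 that reads B6 corresponds to `cross_path_needed(SIDE_A, SIDE_B)`, which returns `XPATH_1X`. The model deliberately ignores which units may use a cross path and how many cross paths may be used at the same time.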
[Figure: TMS320C67x Data-Path (Data Cross Paths)]
• Data cross paths only apply to the .L, .S and .M units.
• The data cross paths are very useful; however, there are some limitations on their use.

2.2.17 Data Cross Path Limitations
(1) The destination register must be on the same side as the unit.
(2) Source registers: up to one cross path per execute packet per side.
Execute packet: a group of instructions that execute simultaneously.
e.g.
    ADD .L1x A0,A1,B2     ; NOT VALID: .L1 is on side A, but the destination B2 is on side B
    MPY .M1x A0,B6,A9     ; valid: B6 is read over the 1X cross path, A9 is on side A
    SUB .S1x A8,B2,A8
 || ADD .L1x A0,B0,A2     ; NOT VALID: SUB and ADD both use the 1X cross path in the same execute packet
"||" means that the SUB and the ADD belong to the same execute packet and therefore execute simultaneously.

[Figure: Data Cross Paths for both sides]

2.2.18 Address cross paths
(1) Th
