收藏 分销(赏)

程序的循环优化方法.pptx

上传人:胜**** 文档编号:1722636 上传时间:2024-05-08 格式:PPTX 页数:19 大小:379.84KB
下载 相关 举报
程序的循环优化方法.pptx_第1页
第1页 / 共19页
程序的循环优化方法.pptx_第2页
第2页 / 共19页
程序的循环优化方法.pptx_第3页
第3页 / 共19页
程序的循环优化方法.pptx_第4页
第4页 / 共19页
程序的循环优化方法.pptx_第5页
第5页 / 共19页
点击查看更多>>
资源描述

1、AgendaIntroductionWho Cares?DefinitionLoop Dependence and RemovalDependency Identification LabSummaryIntroductionLoops must meet certain criteriaIteration IndependenceIteration IndependenceMemory DisambiguationMemory DisambiguationHigh Loop CountHigh Loop CountEtcEtcWho Cares实现真正的并行实现真正的并行:OpenMPOpe

2、nMP Auto ParallelizationAuto Parallelization显式的指令级并行显式的指令级并行 ILP(Instruction Level ILP(Instruction Level Parallelism)Parallelism)Streaming SIMD(MMX,SSE,SSE2,)Streaming SIMD(MMX,SSE,SSE2,)Software Pipelining on Intel Itanium ProcessorSoftware Pipelining on Intel Itanium Processor Remove Dependencies

3、for the Out-of-Order CoreRemove Dependencies for the Out-of-Order Core More Instructions run in parallel on Intel Itanium-ProcessorMore Instructions run in parallel on Intel Itanium-Processor自动编译器并行自动编译器并行 High Level OptimizationsHigh Level OptimizationsDefinitionint aMAX;for(J=0;JMAX;J+)aJ=bJ;Loop

4、Independence:Loop Independence:Iteration Y of a loop is Iteration Y of a loop is independent of when independent of when or whether iteration X or whether iteration X happenshappens图例图例OpenMP:True ParallelismSIMD:VectorizationSWP:Software PipeliningOOO:Out-of-Order Core ILP:Instruction Level Paralle

5、lism Green:Benefits from concept Yellow:Some Benefits from Concept Red:No Benefit from Concept AgendaDefinitionWho Cares?Loop Dependence and RemovalData DependenciesData DependenciesRemoving DependenciesRemoving DependenciesData Ambiguity and the CompilerData Ambiguity and the CompilerDependency Rem

6、oval LabSummaryFlow DependencyRead After WriteCross-Iteration Flow Dependence:Variables written then read in different iterationsfor(J=1;JMAX;J+)AJ=AJ-1;A1=A0;A2=A1;Anti-DependencyWrite After ReadCross-Iteration Anti-Dependence:Variables written then read in different iterationsfor(J=1;JMAX;J+)AJ=AJ

7、+1;A1=A2;A2=A3;Output DependencyWrite After WriteCross-Iteration Output Dependence:Variables written then written again in a different iterationfor(J=1;JMAX;J+)AJ=BJ;AJ+1=CJ;A1=B1;A2=C1;A2=B1;A3=C1;IntraIteration DependencyDependency within an iterationHurts ILPMay be automatically removed by compil

8、erK=1;for(J=1;JMAX;J+)AJ=AJ+1;BK=AK+1;K=K+2;A1=A1+1;B1=A1+1;for(J=1;JMAX;J+)AJ=A0+J;Remove DependenciesBest ChoiceBest ChoiceRequirement for Requirement for true Parallelismtrue ParallelismNot all Not all dependencies can dependencies can be removedbe removedfor(J=1;JMAX;J+)AJ=AJ-1+1;for(J=1;JMAX;J+

9、=2)AJ=AJ-1+BJ;AJ+1=AJ-1+(BJ+BJ+1);Increasing ILP,without removing dependencies Good:Unroll LoopGood:Unroll Loop Make sure the compiler Make sure the compiler cant or didnt do this for cant or didnt do this for youyou Compiler should not Compiler should not apply common sub-apply common sub-expressio

10、n elimination expression elimination Also notice that if this is Also notice that if this is floating point data-floating point data-precision could be precision could be alteredalteredfor(J=1;JMAX;J+)AJ=AJ-1+BJ;Induction VariablesInduction variables are Induction variables are incremented on each i

11、ncremented on each trip through the looptrip through the loopFix by replacing Fix by replacing increment expressions increment expressions with pure function of with pure function of loop indexloop indexi1=0;i2=0;for(J=0,JMAX,J+)i1=i1+1;B(i1)=i2=i2+J;A(i2)=for(J=0,JMAX,J+)B(J)=.A(J*2+J)/2)=.Reductio

12、ns Reductions collapse Reductions collapse array data to scalar data array data to scalar data via associative via associative operations:operations:Take advantage of Take advantage of associativity and associativity and compute partial compute partial sums or local sums or local maximum in private

13、maximum in private storagestorageNext,combine partial Next,combine partial results into shared results into shared result,taking care to result,taking care to synchronize accesssynchronize accessfor(J=0;JMAX;J+)sum=sum+cJ;Data Ambiguity and the Compilervoid func(int*a,int*b)for(J=0;JMAX;J+)aJ=bJ;Are

14、 the loop Are the loop iterations iterations independent?independent?The C+compiler has The C+compiler has no ideano ideaNo chance for No chance for optimization-In order optimization-In order to run error free the to run error free the compiler assumes compiler assumes that a and b overlapthat a an

15、d b overlapFunction Callsfor(J=0;J=0;J-)compute(J,.)A Simple Test1.1.Reverse the loop order Reverse the loop order and rerun in serialand rerun in serial2.2.If results are If results are unchanged,the loop is unchanged,the loop is Independent*Independent*for(J=0;JMAX;J+)compute(J,.)*Exception:Loops

16、with induction variablesReverseSummaryLoop Independence:Loop Iterations are Loop Independence:Loop Iterations are independent of each other.independent of each other.Explained its importanceExplained its importance ILP and ParallelismILP and ParallelismIdentified common causes of loop dependenceIdentified common causes of loop dependence Flow Dependency,Anti-Dependency,Output Flow Dependency,Anti-Dependency,Output DependencyDependencyTaught some methods of fixing loop Taught some methods of fixing loop dependencedependenceReinforced concepts through labReinforced concepts through lab

展开阅读全文
相似文档                                   自信AI助手自信AI助手
猜你喜欢                                   自信AI导航自信AI导航
搜索标签

当前位置:首页 > 包罗万象 > 大杂烩

移动网页_全站_页脚广告1

关于我们      便捷服务       自信AI       AI导航        获赠5币

©2010-2024 宁波自信网络信息技术有限公司  版权所有

客服电话:4008-655-100  投诉/维权电话:4009-655-100

gongan.png浙公网安备33021202000488号   

icp.png浙ICP备2021020529号-1  |  浙B2-20240490  

关注我们 :gzh.png    weibo.png    LOFTER.png 

客服