Cloud Computing and Cloud Data Management (云计算与云数据管理.ppt)

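Cloud Computing and Cloud Data Management
Lu Jiaheng (陆嘉恒), Renmin University of China
Advanced Data Management Frontier Workshop
2025/4/5 (Saturday)

Outline
- Overview of cloud computing
- Google cloud computing techniques: GFS, Bigtable and MapReduce
- Yahoo cloud computing techniques and Hadoop
- Challenges of cloud data management

Renmin University's new course: Distributed Systems and Cloud Computing
- Overview of distributed systems
- Survey of distributed cloud computing techniques
- Distributed cloud computing platforms
- Distributed cloud computing program development

Part I: Overview of Distributed Systems
- Chapter 1: Introduction to distributed systems
- Chapter 2: Client-server architecture
- Chapter 3: Distributed objects
- Chapter 4: Common Object Request Broker Architecture (CORBA)

Part II: Survey of Cloud Computing
- Chapter 5: Introduction to cloud computing
- Chapter 6: Cloud services
- Chapter 7: Comparison of cloud-related technologies
  - 7.1 Grid computing and cloud computing
  - 7.2 Utility computing and cloud computing
  - 7.3 Parallel and distributed computing and cloud computing
  - 7.4 Cluster computing and cloud computing

Part III: Cloud Computing Platforms
- Chapter 8: The three core technologies of the Google cloud platform
- Chapter 9: Technologies of the Yahoo cloud platform
- Chapter 10: Technologies of the Aneka cloud platform
- Chapter 11: Technologies of the Greenplum cloud platform
- Chapter 12: Technologies of the Amazon Dynamo cloud platform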

Part IV: Cloud Computing Platform Development
- Chapter 13: Development on Hadoop
- Chapter 14: Development on HBase
- Chapter 15: Development on Google Apps
- Chapter 16: Development on MS Azure
- Chapter 17: Development on Amazon EC2

Cloud computing

Why do we use cloud computing?
- Case 1: Write a file, save it, the computer goes down, and the file is lost. Files stored in the cloud are never lost.
- Case 2:
  - Use IE: download, install, use
  - Use QQ: download, install, use
  - Use C++: download, install, use
  - Instead, get the service from the cloud.

What is cloud and cloud computing?
- Cloud: on-demand resources or services over the Internet, with the scale and reliability of a data center.
- Cloud computing is a style of computing in which dynamically scalable and often virtualized resources are provided as a service over the Internet.
- Users need not have knowledge of, expertise in, or control over the technology infrastructure in the cloud that supports them.

Characteristics of cloud computing
- Virtual: software, databases, Web servers, operating systems, storage and networking as virtual servers.
- On demand: add and subtract processors, memory, network bandwidth, storage.

Types of cloud service
- IaaS: Infrastructure as a Service
- PaaS: Platform as a Service
- SaaS: Software as a Service

SaaS
- Software delivery model
- No hardware or software to manage
- Service delivered through a browser
- Customers use the service on demand
- Instant scalability

SaaS: Examples
- Your current CRM package is not managing the load, or you simply don't want to host it in-house. Use a SaaS provider such as S
- Your email is hosted on an Exchange server in your office and it is very slow. Outsource this using Hosted Exchange.

PaaS
- Platform delivery model
- Platforms are built upon infrastructure, which is expensive
- Estimating demand is not a science!
- Platform management is not fun!

PaaS: Examples
- You need to host a large file (5 MB) on your website and make it available to 35,000 users for only two months. Use CloudFront from Amazon.
- You want to start storage services on your network for a large number of files and you do not have the storage capacity. Use Amazon S3.

IaaS
- Computer infrastructure delivery model
- A platform virtualization environment
- Computing resources, such as storage and processing capacity
- Virtualization taken a step further

IaaS: Examples
- You want to run a batch job but you don't have the infrastructure necessary to run it in a timely manner. Use Amazon EC2.
- You want to host a website, but only for a few days. Use Flexiscale.

Cloud computing and other computing techniques

The 21st Century Vision of Computing
- Leonard Kleinrock, one of the chief scientists of the original Advanced Research Projects Agency Network (ARPANET) project which seeded the Internet, said: "As of now, computer networks are still in their infancy, but as they grow up and become sophisticated, we will probably see the spread of computer utilities which, like present electric and telephone utilities, will service individual homes and offices across the country."
- Sun Microsystems co-founder Bill Joy also indicated: "It would take time until these markets mature to generate this kind of value. Predicting now which companies will capture the value is impossible. Many of them have not even been created yet."

Definitions: Cloud, Grid, Cluster, Utility
- Utility computing is the packaging of computing resources, such as computation and storage, as a metered service similar to a traditional public utility.
- A computer cluster is a group of linked computers, working together closely so that in many respects they form a single computer.
- Grid computing is the application of several computers to a single problem at the same time, usually a scientific or technical problem that requires a great number of computer processing cycles or access to large amounts of data.
- Cloud computing is a style of computing in which dynamically scalable and often virtualized resources are provided as a service over the Internet.

Grid Computing and Cloud Computing
- They share a lot in common: intention, architecture and technology.
- They differ in: programming model, business model, compute model, applications, and virtualization.

Grid Computing and Cloud Computing: the problems are mostly the same
- Manage large facilities;
- Define methods by which consumers discover, request and use resources provided by the central facilities;
- Implement the often highly parallel computations that execute on those resources.

Grid Computing and Cloud Computing: Virtualization
- Grid: does not rely on virtualization as much as clouds do; each individual organization maintains full control of its resources.
- Cloud: virtualization is an indispensable ingredient for almost every cloud.

Any questions or comments?

Google Cloud computing techniques

The Google File System (GFS)
- A scalable distributed file system for large distributed data-intensive applications.
- Multiple GFS clusters are currently deployed. The largest ones have:
  - 1000+ storage nodes
  - 300+ terabytes of disk storage
  - heavy access by hundreds of clients on distinct machines

Introduction
- Shares many of the same goals as previous distributed file systems: performance, scalability, reliability, etc.
- The GFS design has been driven by four key observations of Google's application workloads and technological environment.

Intro: Observations 1
1. Component failures are the norm.
   - Constant monitoring, error detection, fault tolerance and automatic recovery are integral to the system.
2. Files are huge (by traditional standards).
   - Multi-GB files are common.
   - I/O operations and block sizes must be revisited.

Intro: Observations 2
3. Most files are mutated by appending new data.
   - This is the focus of performance optimization and atomicity guarantees.
4. Co-designing the applications and APIs benefits the overall system by increasing flexibility.

The Design
- A cluster consists of a single master and multiple chunkservers, and is accessed by multiple clients.

The Master
- Maintains all file system metadata: namespace, access control info, file-to-chunk mappings, chunk (including replica) locations, etc.
- Periodically communicates with chunkservers in HeartBeat messages to give instructions and check state.
- Helps make sophisticated chunk placement and replication decisions, using global knowledge.
- For reading and writing, the client contacts the master to get chunk locations, then deals directly with chunkservers.
  - The master is not a bottleneck for reads/writes.

Chunkservers
- Files are broken into chunks. Each chunk has an immutable, globally unique 64-bit chunk handle.
  - The handle is assigned by the master at chunk creation.
- Chunk size is 64 MB.
- Each chunk is replicated on 3 (default) servers.

Clients
- Linked to apps using the file system API.
- Communicate with the master and chunkservers for reading and writing.
  - Master interactions only for metadata.
  - Chunkserver interactions for data.
- Only cache metadata information.
  - Data is too large to cache.

Chunk Locations
- The master does not keep a persistent record of the locations of chunks and replicas.
- It polls chunkservers for this at startup, and when chunkservers join or leave.
- It stays up to date by controlling placement of new chunks and through HeartBeat messages (when monitoring chunkservers).

Operation Log
- Record of all critical metadata changes.
- Stored on the master and replicated on other machines.
- Defines the order of concurrent operations.
- Also used to recover the file system state.

System Interactions: Leases and Mutation Order
- Leases maintain a mutation order across all chunk replicas.
- The master grants a lease to a replica, called the primary.
- The primary chooses the serial mutation order, and all replicas follow this order.
- Minimizes management overhead for the master.

Atomic Record Append
- The client specifies the data to write; GFS chooses and returns the offset it writes to, and appends the data to each replica at least once.
- Heavily used by Google's distributed applications.
- No need for a distributed lock manager.
  - GFS chooses the offset, not the client.

Atomic Record Append: How?
- Follows a similar control flow to other mutations.
- The primary tells secondary replicas to append at the same offset as the primary.
- If a record append fails at any replica, it is retried by the client.
  - So replicas of the same chunk may contain different data, including duplicates, whole or in part, of the same record.
- GFS does not guarantee that all replicas are bitwise identical.
  - It only guarantees that data is written at least once as an atomic unit.
  - Data must be written at the same offset on all chunk replicas for success to be reported.

Detecting Stale Replicas
- The master keeps a chunk version number to distinguish up-to-date and stale replicas.
- The version is increased when granting a lease.
- If a replica is not available, its version is not increased.
- The master detects stale replicas when chunkservers report their chunks and versions.
- Stale replicas are removed during garbage collection.

Garbage collection
- When a client deletes a file, the master logs it like other changes and renames the file to a hidden name.
- The master removes files hidden for longer than 3 days when scanning the file system namespace.
  - Their metadata is also erased.
- During HeartBeat messages, each chunkserver sends the master a subset of its chunks, and the master tells it which of them no longer have metadata.
  - The chunkserver removes these chunks on its own.

Fault Tolerance: High Availability
- Fast recovery
  - Master and chunkservers can restart in seconds.
- Chunk replication
- Master replication
  - "Shadow" masters provide read-only access when the primary master is down.
  - Mutations are not done until recorded on all master replicas.
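The chunk version rule from "Detecting Stale Replicas" above can be illustrated with a toy model. This is only a sketch under stated assumptions: the class `ToyMaster`, its methods, and the chunk and server names are hypothetical and do not correspond to any real GFS interface. It shows a version being bumped on each lease grant, and a replica that missed a grant being reported as stale.

```python
class ToyMaster:
    """Toy model of the master's chunk-version bookkeeping (hypothetical names, not a GFS API)."""

    def __init__(self):
        # chunk handle -> latest version number known to the master
        self.versions = {}

    def grant_lease(self, handle, replicas, available):
        """Increase the chunk version and record it on every reachable replica.

        replicas:  dict mapping chunkserver name -> version that server holds
        available: set of chunkserver names that are reachable right now
        """
        new_version = self.versions.get(handle, 0) + 1
        self.versions[handle] = new_version
        for server in available:
            replicas[server] = new_version
        return new_version

    def stale_replicas(self, handle, replicas):
        """Return servers whose reported version is older than the master's."""
        current = self.versions.get(handle, 0)
        return sorted(s for s, v in replicas.items() if v < current)


master = ToyMaster()
replicas = {"cs1": 0, "cs2": 0, "cs3": 0}

# First lease: all three chunkservers are reachable.
master.grant_lease("chunk-42", replicas, {"cs1", "cs2", "cs3"})
# Second lease: cs3 is down, so its version is not increased.
master.grant_lease("chunk-42", replicas, {"cs1", "cs2"})

stale = master.stale_replicas("chunk-42", replicas)
```

In this toy run `stale` contains only `cs3`, the replica that was unavailable during the second lease grant; in the slides' description such a replica would later be removed during garbage collection.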

Fault Tolerance: Data Integrity
- Chunkservers use checksums to detect corrupt data.
  - Since replicas are not bitwise identical, chunkservers maintain their own checksums.
- For reads, the chunkserver verifies the checksum before sending the chunk.
- Checksums are updated during writes.

Introduction to MapReduce

MapReduce: Insight
- "Consider the problem of counting the number of occurrences of each word in a large collection of documents."
- How would you do it in parallel?

MapReduce Programming Model
- Inspired by the map and reduce operations commonly used in functional programming languages like Lisp.
- Users implement an interface of two primary methods:
  1. Map: (key1, val1) -> (key2, val2)
  2. Reduce: (key2, [val2]) -> [val3]

Map operation
- Map, a pure function written by the user, takes an input key/value pair and produces a set of intermediate key/value pairs.
  - e.g. (docid, doc-content)
- Drawing an analogy to SQL, map can be visualized as the group-by clause of an aggregate query.

Reduce operation
- On completion of the map phase, all the intermediate values for a given output key are combined together into a list and given to a reducer.
- Can be visualized as an aggregate function (e.g. average) that is computed over all the rows with the same group-by attribute.

Pseudo-code

    map(String input_key, String input_value):
        // input_key: document name
        // input_value: document contents
        for each word w in input_value:
            EmitIntermediate(w, 1);

    reduce(String output_key, Iterator intermediate_values):
        // output_key: a word
        // intermediate_values: a list of counts
        int result = 0;
        for each v in intermediate_values:
            result += ParseInt(v);
        Emit(AsString(result));

MapReduce: Execution overview (figure)

MapReduce: Example (figure)

MapReduce in Parallel: Example (figure)

MapReduce: Fault Tolerance
- Handled via re-execution of tasks.
  - Task completion is committed through the master.
- What happens if a mapper fails?
  - Re-execute completed and in-progress map tasks.
- What happens if a reducer fails?
  - Re-execute in-progress reduce tasks.
- What happens if the master fails?
  - Potential trouble!
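The word-count pseudo-code above can be tried out on a single machine. The following is a minimal sketch, not Google's or Hadoop's implementation: the function names `map_fn`, `reduce_fn` and `run_word_count` are assumptions made for illustration, and the "shuffle" step is simulated with an in-memory dictionary instead of a distributed sort.

```python
from collections import defaultdict


def map_fn(doc_name, contents):
    """Map: emit an intermediate (word, 1) pair for every word in the document."""
    for word in contents.split():
        yield word, 1


def reduce_fn(word, counts):
    """Reduce: sum all the counts collected for one word."""
    return word, sum(counts)


def run_word_count(documents):
    """Single-process stand-in for the framework: map, group by key, reduce."""
    grouped = defaultdict(list)
    # Map phase followed by a simulated shuffle: group intermediate values by key.
    for doc_name, contents in documents.items():
        for key, value in map_fn(doc_name, contents):
            grouped[key].append(value)
    # Reduce phase: one reduce call per distinct intermediate key.
    return dict(reduce_fn(key, values) for key, values in grouped.items())


counts = run_word_count({
    "doc1": "the cat the dog",
    "doc2": "the end",
})
```

Here `counts` maps each word to its total number of occurrences across both documents. Because every `map_fn` call depends only on its own document and every `reduce_fn` call only on its own key, a real framework is free to run them on different machines and to re-execute any one of them after a failure.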

MapReduce: Walk-through of One More Application

MapReduce: PageRank
- PageRank models the behavior of a "random surfer".
- C(t) is the out-degree of t, and (1-d) is a damping factor (random jump).
- The "random surfer" keeps clicking on successive links at random, not taking content into consideration.
- Each page distributes its PageRank equally among all pages it links to.
- The damping factor models the surfer "getting bored" and typing an arbitrary URL.

PageRank: Key Insights
- The effects of each iteration are local: the (i+1)th iteration depends only on the ith iteration.
- At iteration i, the PageRank of individual nodes can be computed independently.

PageRank using MapReduce
- Use a sparse matrix representation (M).
- Map each row of M to a list of PageRank "credit" to assign to out-link neighbours.
- These prestige scores are reduced to a single PageRank value for a page by aggregating over them.

PageRank using MapReduce
- Map: distribute PageRank "credit" to link targets.
- Reduce: gather up PageRank "credit" from multiple sources to compute the new PageRank value.
- Iterate until convergence.
- Source of image: Lin 2008

Phase 1: Process HTML
- The map task takes (URL, page-content) pairs and maps them to (URL, (PR_init, list-of-urls)).
  - PR_init is the "seed" PageRank for URL.
  - list-of-urls contains all pages pointed to by URL.
- The reduce task is just the identity function.

Phase 2: PageRank Distribution
- The reduce task gets (URL, url_list) and many (URL, val) values.
  - Sum the vals and fix up with d to get the new PR.
  - Emit (URL, (new_rank, url_list)).
- Check for convergence using a non-parallel component.

MapReduce: Some More Apps
- Distributed grep
- Count of URL access frequency
- Clustering (K-means)
- Graph algorithms
- Indexing systems

MapReduce Programs in Google Source Tree (figure)
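The map, reduce and convergence steps of the PageRank slides can be sketched in a few lines of Python. The slides' own formula did not survive, so this sketch assumes the commonly cited form PR(x) = (1 - d) + d * sum(PR(t) / C(t)) over all pages t linking to x, which matches the slides' use of C(t) and d. The function names, the default d = 0.85, the seed value 1.0, and the requirement that every link target is itself a key of `graph` are all assumptions made for illustration.

```python
from collections import defaultdict


def pagerank_step(graph, ranks, d=0.85):
    """One map/reduce round.

    graph: dict mapping url -> list of urls it links to (every target is a key too)
    ranks: dict mapping url -> current PageRank
    """
    credit = defaultdict(float)
    # Map: each page distributes its rank equally among its out-links.
    for url, out_links in graph.items():
        for target in out_links:
            credit[target] += ranks[url] / len(out_links)
    # Reduce: sum the credit received by each page and fix up with d.
    return {url: (1 - d) + d * credit[url] for url in graph}


def pagerank(graph, d=0.85, tol=1e-6, max_iter=100):
    """Driver loop: iterate until ranks stop changing (the non-parallel check)."""
    ranks = {url: 1.0 for url in graph}  # seed PageRank for every page
    for _ in range(max_iter):
        new_ranks = pagerank_step(graph, ranks, d)
        if max(abs(new_ranks[u] - ranks[u]) for u in graph) < tol:
            return new_ranks
        ranks = new_ranks
    return ranks


graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(graph)
```

In this three-page graph, page `c` receives credit from both `a` and `b`, so it ends up with a higher rank than `b`, which is linked only from `a`. Each call to `pagerank_step` reads only the previous iteration's ranks, which is the locality property the "Key Insights" slide relies on.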
