
Cloud Computing and Cloud Data Management (云计算与云数据管理.ppt)

Uploaded by: 人****来 | Document ID: 9778681 | Uploaded: 2025-04-07 | Format: PPT | Pages: 171 | Size: 5.01 MB
Cloud Computing and Cloud Data Management
Jiaheng Lu (陆嘉恒), Renmin University of China
Advanced Data Management Frontier Workshop
2025/4/5 (Saturday)

Outline
  - Overview of cloud computing
  - Google cloud computing technologies: GFS, Bigtable and MapReduce
  - Yahoo cloud computing technologies and Hadoop
  - Challenges of cloud data management

New course at Renmin University: "Distributed Systems and Cloud Computing"
  - Overview of distributed systems
  - Survey of distributed cloud computing technologies
  - Distributed cloud computing platforms
  - Distributed cloud computing program development

Part 1: Overview of Distributed Systems
  - Chapter 1: Introduction to distributed systems
  - Chapter 2: Client-server architecture
  - Chapter 3: Distributed objects
  - Chapter 4: Common Object Request Broker Architecture (CORBA)

Part 2: Survey of Cloud Computing
  - Chapter 5: Introduction to cloud computing
  - Chapter 6: Cloud services
  - Chapter 7: Comparison of cloud-related technologies
      7.1 Grid computing and cloud computing
      7.2 Utility computing and cloud computing
      7.3 Parallel and distributed computing and cloud computing
      7.4 Cluster computing and cloud computing

Part 3: Cloud Computing Platforms
  - Chapter 8: The three core technologies of the Google cloud platform
  - Chapter 9: Technologies of the Yahoo cloud platform
  - Chapter 10: Technologies of the Aneka cloud platform
  - Chapter 11: Technologies of the Greenplum cloud platform
  - Chapter 12: Technologies of the Amazon Dynamo cloud platform

Part 4: Cloud Computing Platform Development
  - Chapter 13: Development on Hadoop
  - Chapter 14: Development on HBase
  - Chapter 15: Development on Google Apps
  - Chapter 16: Development on MS Azure
  - Chapter 17: Development on Amazon EC2

Cloud computing

Why we use cloud computing?
  Case 1:
  - Write a file, save it, the computer goes down, and the file is lost.
  - Files are always stored in the cloud and never lost.
  Case 2:
  - Use IE: download, install, use
  - Use QQ: download, install, use
  - Use C++: download, install, use
  - Get the service from the cloud instead.

What is cloud and cloud computing?
  - Cloud: on-demand resources or services over the Internet, with the scale and reliability of a data center.
  - Cloud computing is a style of computing in which dynamically scalable and often virtualized resources are provided as a service over the Internet.
  - Users need not have knowledge of, expertise in, or control over the technology infrastructure in the cloud that supports them.

Characteristics of cloud computing
  - Virtual: software, databases, Web servers, operating systems, storage and networking are provided as virtual servers.
  - On demand: add and subtract processors, memory, network bandwidth and storage.

Types of cloud service
  - IaaS: Infrastructure as a Service
  - PaaS: Platform as a Service
  - SaaS: Software as a Service

SaaS: Software delivery model
  - No hardware or software to manage
  - Service delivered through a browser
  - Customers use the service on demand
  - Instant scalability

SaaS: Examples
  - Your current CRM package is not managing the load, or you simply don't want to host it in-house. Use a SaaS provider such as S
  - Your email is hosted on an Exchange server in your office and it is very slow. Outsource this using Hosted Exchange.

PaaS: Platform delivery model
  - Platforms are built upon infrastructure, which is expensive
  - Estimating demand is not a science!
  - Platform management is not fun!

PaaS: Examples
  - You need to host a large file (5 Mb) on your website and make it available to 35,000 users for only two months. Use CloudFront from Amazon.
  - You want to start storage services on your network for a large number of files and you do not have the storage capacity. Use Amazon S3.

IaaS: Computer infrastructure delivery model
  - A platform virtualization environment
  - Computing resources, such as storage and processing capacity
  - Virtualization taken a step further

IaaS: Examples
  - You want to run a batch job but you don't have the infrastructure necessary to run it in a timely manner. Use Amazon EC2.
  - You want to host a website, but only for a few days. Use Flexiscale.

Cloud computing and other computing techniques

The 21st Century Vision of Computing
  - Leonard Kleinrock, one of the chief scientists of the original Advanced Research Projects Agency Network (ARPANET) project which seeded the Internet, said: "As of now, computer networks are still in their infancy, but as they grow up and become sophisticated, we will probably see the spread of computer utilities which, like present electric and telephone utilities, will service individual homes and offices across the country."
  - Sun Microsystems co-founder Bill Joy also indicated: "It would take time until these markets mature to generate this kind of value. Predicting now which companies will capture the value is impossible. Many of them have not even been created yet."

Definitions: Cloud, Grid, Cluster, Utility
  - Utility computing is the packaging of computing resources, such as computation and storage, as a metered service similar to a traditional public utility.
  - A computer cluster is a group of linked computers working together so closely that in many respects they form a single computer.
  - Grid computing is the application of several computers to a single problem at the same time, usually a scientific or technical problem that requires a great number of computer processing cycles or access to large amounts of data.
  - Cloud computing is a style of computing in which dynamically scalable and often virtualized resources are provided as a service over the Internet.

Grid Computing & Cloud Computing
  - They share a lot in common: intention, architecture and technology.
  - They differ in programming model, business model, compute model, applications and virtualization.

Grid Computing & Cloud Computing: the problems are mostly the same
  - Manage large facilities.
  - Define methods by which consumers discover, request and use resources provided by the central facilities.
  - Implement the often highly parallel computations that execute on those resources.

Grid Computing & Cloud Computing: Virtualization
  - Grid: does not rely on virtualization as much as Clouds do; each individual organization maintains full control of its resources.
  - Cloud: virtualization is an indispensable ingredient for almost every Cloud.

Any questions or comments?

Google Cloud computing techniques

The Google File System (GFS)
  - A scalable distributed file system for large, distributed, data-intensive applications
  - Multiple GFS clusters are currently deployed. The largest ones have:
      1000+ storage nodes
      300+ terabytes of disk storage
      heavy access by hundreds of clients on distinct machines

Introduction
  - Shares many of the same goals as previous distributed file systems: performance, scalability, reliability, etc.
  - The GFS design has been driven by four key observations of Google's application workloads and technological environment.

Intro: Observations 1
  1. Component failures are the norm: constant monitoring, error detection, fault tolerance and automatic recovery are integral to the system.
  2. Files are huge by traditional standards: multi-GB files are common, so I/O operations and block sizes must be revisited.

Intro: Observations 2
  3. Most files are mutated by appending new data: appends are the focus of performance optimization and atomicity guarantees.
  4. Co-designing the applications and the API benefits the overall system by increasing flexibility.

The Design
  - A cluster consists of a single master and multiple chunkservers, and is accessed by multiple clients.

The Master
  - Maintains all file system metadata: namespace, access control information, file-to-chunk mappings, chunk (including replica) locations, etc.
  - Periodically communicates with chunkservers in HeartBeat messages to give instructions and check state.
  - Makes sophisticated chunk placement and replication decisions using global knowledge.
  - For reading and writing, a client contacts the master to get chunk locations, then deals directly with chunkservers, so the master is not a bottleneck for reads and writes.

Chunkservers
  - Files are broken into chunks. Each chunk has an immutable, globally unique 64-bit chunk handle, assigned by the master at chunk creation.
  - Chunk size is 64 MB.
  - Each chunk is replicated on 3 servers by default.

Clients
  - Linked into applications using the file system API.
  - Communicate with the master and chunkservers for reading and writing: master interactions only for metadata, chunkserver interactions for data.
  - Cache only metadata; file data is too large to cache.

Chunk Locations
  - The master does not keep a persistent record of the locations of chunks and replicas.
  - It polls chunkservers for this information at startup and whenever chunkservers join or leave.
  - It stays up to date by controlling the placement of new chunks and through HeartBeat messages (when monitoring chunkservers).

Operation Log
  - Record of all critical metadata changes
  - Stored on the master and replicated on other machines
  - Defines the order of concurrent operations
  - Also used to recover the file system state

System Interactions: Leases and Mutation Order
  - Leases maintain a consistent mutation order across all chunk replicas.
  - The master grants a lease to one replica, called the primary.
  - The primary chooses the serial mutation order, and all replicas follow this order.
  - This minimizes management overhead for the master.

Atomic Record Append
  - The client specifies only the data to write; GFS chooses the offset, appends the data to each replica at least once, and returns the offset to the client.
  - Heavily used by Google's distributed applications.
  - No need for a distributed lock manager, because GFS chooses the offset, not the client.

Atomic Record Append: How?
  - Follows a control flow similar to other mutations.
  - The primary tells the secondary replicas to append at the same offset as the primary.
  - If a record append fails at any replica, the client retries it.
  - As a result, replicas of the same chunk may contain different data, including duplicates, whole or in part, of the same record.
  - GFS does not guarantee that all replicas are bitwise identical; it only guarantees that the data is written at least once as an atomic unit.
  - Data must be written at the same offset on all chunk replicas for success to be reported.

Detecting Stale Replicas
  - The master keeps a chunk version number to distinguish up-to-date replicas from stale ones.
  - The version is increased when a lease is granted.
  - If a replica is not available, its version is not increased.
  - The master detects stale replicas when chunkservers report their chunks and versions.
  - Stale replicas are removed during garbage collection.

Garbage collection
  - When a client deletes a file, the master logs the deletion like other changes and renames the file to a hidden name.
  - While scanning the file system namespace, the master removes files that have been hidden for longer than 3 days; their metadata is also erased.
  - In HeartBeat messages, each chunkserver sends the master a subset of its chunks, and the master tells it which of them no longer have metadata.
  - The chunkserver removes those chunks on its own.

Fault Tolerance: High Availability
  - Fast recovery: the master and chunkservers can restart in seconds.
  - Chunk replication
  - Master replication: "shadow" masters provide read-only access when the primary master is down; a mutation is not considered done until it is recorded on all master replicas.

Fault Tolerance: Data Integrity
  - Chunkservers use checksums to detect corrupt data.
  - Since replicas are not bitwise identical, each chunkserver maintains its own checksums.
  - For reads, the chunkserver verifies the checksum before sending the chunk.
  - Checksums are updated during writes.

Introduction to MapReduce

MapReduce: Insight
  - "Consider the problem of counting the number of occurrences of each word in a large collection of documents."
  - How would you do it in parallel?

MapReduce Programming Model
  - Inspired by the map and reduce operations commonly used in functional programming languages like Lisp.
  - Users implement an interface of two primary methods:
      1. Map: (key1, val1) → (key2, val2)
      2. Reduce: (key2, [val2]) → [val3]

Map operation
  - Map, a pure function written by the user, takes an input key/value pair and produces a set of intermediate key/value pairs, e.g. (docid, doc-content).
  - By analogy with SQL, map can be seen as the group-by clause of an aggregate query.

Reduce operation
  - On completion of the map phase, all the intermediate values for a given output key are combined into a list and given to a reducer.
  - By analogy with SQL, reduce can be seen as the aggregate function (e.g., average) computed over all rows with the same group-by attribute.

Pseudo-code
    map(String input_key, String input_value):
        // input_key: document name
        // input_value: document contents
        for each word w in input_value:
            EmitIntermediate(w, "1");

    reduce(String output_key, Iterator intermediate_values):
        // output_key: a word
        // intermediate_values: a list of counts
        int result = 0;
        for each v in intermediate_values:
            result += ParseInt(v);
        Emit(AsString(result));

MapReduce: Execution overview

MapReduce: Example

MapReduce in Parallel: Example

MapReduce: Fault Tolerance
  - Handled via re-execution of tasks; task completion is committed through the master.
  - What happens if a mapper fails? Re-execute its completed and in-progress map tasks.
  - What happens if a reducer fails? Re-execute its in-progress reduce tasks.
  - What happens if the master fails? Potential trouble!

MapReduce: Walk-through of one more application

MapReduce: PageRank
  - PageRank models the behavior of a "random surfer".
  - C(t) is the out-degree of t, and (1-d) is a damping factor (random jump).
  - The "random surfer" keeps clicking on successive links at random, not taking content into consideration.
  - Each page distributes its PageRank equally among all the pages it links to.
  - The damping factor models the surfer "getting bored" and typing an arbitrary URL.

PageRank: Key Insights
  - The effect of each iteration is local: iteration i+1 depends only on iteration i.
  - At iteration i, the PageRank of individual nodes can be computed independently.

PageRank using MapReduce
  - Use a sparse matrix representation (M).
  - Map each row of M to a list of PageRank "credit" to assign to its out-link neighbours.
  - These prestige scores are reduced to a single PageRank value for a page by aggregating over them.
  - Map: distribute PageRank "credit" to link targets.
  - Reduce: gather up PageRank "credit" from multiple sources to compute the new PageRank value.
  - Iterate until convergence.
  - Source of image: Lin 2008

Phase 1: Process HTML
  - The map task takes (URL, page-content) pairs and maps them to (URL, (PR_init, list-of-urls)).
  - PR_init is the "seed" PageRank for URL.
  - list-of-urls contains all pages pointed to by URL.
  - The reduce task is just the identity function.

Phase 2: PageRank Distribution
  - The reduce task gets (URL, url_list) and many (URL, val) values.
  - Sum the vals and fix up with d to get the new PageRank.
  - Emit (URL, (new_rank, url_list)).
  - Check for convergence using a non-parallel component.

MapReduce: Some More Apps
  - Distributed grep
  - Count of URL access frequency
  - Clustering (K-means)
  - Graph algorithms
  - Indexing systems
  - MapReduce programs in the Google source tree
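
The PageRank distribution phase described above can be sketched as a single-process Python simulation. This is only an illustration under stated assumptions, not the deck's own code: the map step that splits a page's rank evenly over its links is inferred from the "distribute credit to link targets" bullet, the update rule PR = (1 - d) + d * sum(credit) is the commonly cited form and may differ from the formula on the original slide, and the names `map_page`, `reduce_page`, `pagerank_iteration`, `pagerank` and the value d = 0.85 are all hypothetical.

```python
from collections import defaultdict

D = 0.85  # assumed damping value; the slides do not state one


def map_page(url, value):
    """Map: emit the page's link list and an equal share of rank per target."""
    rank, url_list = value
    yield url, ("links", url_list)  # carry the graph structure to the reducer
    for target in url_list:
        yield target, ("credit", rank / len(url_list))


def reduce_page(url, values, d=D):
    """Reduce: sum incoming credit and fix up with d to get the new rank."""
    url_list, total = [], 0.0
    for kind, payload in values:
        if kind == "links":
            url_list = payload
        else:
            total += payload
    # Assumed update rule: PR = (1 - d) + d * sum of incoming credit.
    return url, ((1 - d) + d * total, url_list)


def pagerank_iteration(pages, d=D):
    """One map -> shuffle -> reduce pass over {url: (rank, url_list)}."""
    grouped = defaultdict(list)  # stands in for the shuffle phase
    for url, value in pages.items():
        for key, out in map_page(url, value):
            grouped[key].append(out)
    return dict(reduce_page(url, vals, d) for url, vals in grouped.items())


def pagerank(pages, d=D, tol=1e-6, max_iter=100):
    """Non-parallel driver: iterate until the largest rank change is below tol."""
    for _ in range(max_iter):
        new_pages = pagerank_iteration(pages, d)
        delta = max(abs(new_pages[u][0] - pages[u][0]) for u in pages)
        pages = new_pages
        if delta < tol:
            break
    return pages


if __name__ == "__main__":
    # Seed ranks of 1.0 play the role of PR_init from Phase 1.
    graph = {"a": (1.0, ["b", "c"]), "b": (1.0, ["c"]), "c": (1.0, ["a"])}
    for url, (rank, links) in sorted(pagerank(graph).items()):
        print(url, round(rank, 4), links)
```

The `pagerank` driver plays the role of the non-parallel convergence check: each call to `pagerank_iteration` corresponds to one MapReduce job, and the link list travels with every record so that the next job can redistribute credit without re-parsing the pages.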