Graduation Thesis - English Translation: File System Virtualization and Service for Grid Data Management.doc

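File System Virtualization and Service for Grid Data Management

Abstract: Computations are becoming increasingly large in scale, in terms of both size and geographic and administrative distribution. Examples include scientific grids, which harness resources among several institutions for coordinated problem solving, and enterprise information systems, which aggregate efforts from multiple sites for collaborative development. Common to these systems is that applications and data are distributed on resources across administrative boundaries and wide-area networks. Such environments can be referred to as "grid-style" environments.

Keywords: data; applications; programs

1. Introduction
Such environments have the following distinctive features.

1.1 Characteristics:
• Heterogeneity: There is a wide variety of applications and resources in a grid-style environment. The resources typically have different hardware configurations (e.g., CPU speed and architecture, memory size, disk bandwidth and capacity) and software setups (e.g., operating systems and libraries), while the applications also have diverse characteristics (e.g., data access patterns) and needs (e.g., desired data access performance, security, and reliability).
• Dynamism: Systems deployed in a grid-style environment are highly dynamic. Failures of machines and networks can happen at any time, and non-dedicated resources may dynamically join and leave the system. On the other hand, applications are started and terminated on demand, and their workloads also vary over time.
• Scale: Large numbers of resources can be aggregated in a grid-style environment. They are distributed across different institutions and connected by wide-area networks, providing the computing power and storage capacity to support the execution of many applications.

1.2 The focus is on two specific aspects of data management in distributed systems
The first is data provisioning, i.e., providing applications running on computing resources with remote access to their data stored on storage resources. The second is the management of data provisioning, i.e., the establishment, configuration, and termination of remote data access. Computing in a grid-style environment poses unique challenges to these tasks because of the heterogeneous, dynamic, and large-scale nature of applications and resources described above. First, the diversity of applications and resources motivates a data provisioning solution that can be transparently deployed, without changing the existing operating systems (O/Ss) or modifying application source code or binaries. Second, wide-area, cross-domain environments necessitate application-tailored optimizations for data access to address the inefficiency (long network delay, limited network bandwidth), insecurity (insecure resources, limited mutual trust between different domains), and unreliability (unreliable machines and networks) that are typical in such environments. Last but not least, managing data provisioning in a large, dynamic system also calls for flexible control and automatic optimization of remote data access, in order to deal with the complexity of providing data to many applications, to adapt agilely to changing environments, and to deliver application-desired performance, security, and reliability.

To address these challenges, this dissertation presents a two-level data management system in which file system virtualization provides application-tailored grid-wide data access, and service-based middleware enables autonomic management of the data provisioning. In particular, this system has made the following contributions:
• It provides on-demand, cross-domain data access transparently for unmodified applications and O/Ss, based on user-level virtualization of widely available O/S-level distributed file systems (DFSs).
• It supports application-tailored enhancements designed for grid-style environments on several important aspects of remote data access, including performance, consistency, security, and reliability.
• It employs middleware services to achieve flexible and interoperable management of grid-scale data provisioning, capable of controlling the lifecycles and configurations of dynamic data sessions based on application needs.
• It develops autonomic functions to automatically optimize data management according to high-level objectives, in order to reduce the complexity of managing data sessions and adapt them promptly to changing environments.
• Finally, thorough experimental evaluation has demonstrated that the proposed system is effective and can significantly outperform conventional DFS-based approaches in grid-style environments. It has also been successfully deployed in a production grid system for several years, supporting scientific tools and users from many disciplines.

2. The data management system proposed in this dissertation is architected to address three important questions

2.1 Application-Transparent Grid-Wide Data Access
The first question is how to provide application-transparent grid-wide data access. Grids differ from traditional distributed computing environments because of their distinct characteristics, e.g., wide-area networking, heterogeneous end systems, and disjoint administrative domains. These differences bring new challenges to data management systems, and technologies that are successful in local-area networks (LANs), e.g., LAN file systems, cannot be directly applied in a grid environment. Instead, grid data management needs to specifically address these unique issues. Existing solutions allow applications to access grid data through specialized APIs or libraries. However, the required modifications to application sources or binaries often place a burden on end users and developers, and present a hurdle for applications that cannot be easily modified. Therefore, application transparency is desirable to facilitate the deployment of a wide range of applications on grids, where grid-enabling should be the responsibility of the grid middleware rather than of application users or developers.

This dissertation presents a user-level DFS virtualization, namely the Grid Virtual File System (GVFS), for application-transparent grid data access. Because the well-known DFS interface is preserved by GVFS and presented to applications, no modifications are required to their source code, libraries, or binaries. In addition, the approach is based on user-level virtualization techniques, which require no changes to existing O/Ss and can be conveniently deployed on grid resources. Furthermore, user-level enhancements designed for grid-style environments are built upon the virtualization layer to enable data provisioning with application-desired characteristics. In short, the proposed GVFS approach answers the first question by providing transparent grid-wide data access for unmodified applications and O/Ss through user-level DFS virtualization.

2.2 Application-Tailored Grid Data Provisioning
The second question is how to provide data with application-tailored optimizations. Typical O/Ss are designed to support general-purpose applications, but it is often the case that "one size does not fit all". Applications have diverse characteristics and requirements, for example in their data access patterns, acceptable caching and consistency policies, security concerns, and fault tolerance requirements. To provide the desired performance, security, and reliability to a grid application, data provisioning needs to be optimized according to the application's behavior and needs. Because an optimization tailored for one application (e.g., aggressive prefetching of file contents) may degrade performance for several others (e.g., sparse files, databases), application-tailored features are typically not implemented in general-purpose O/S kernels. In addition, kernel-level modifications are difficult to port and deploy, notably in shared environments. Toolkit-based solutions typically give users powerful APIs to program remote data access with desired behaviors, but few programmers have the skills to make effective use of such APIs.

To solve this problem, user-level DFS customizations are proposed to support application-tailored GVFS data sessions. In particular, enhancements designed for grid-style environments are provided upon the virtualization layer in GVFS. These include customizable disk caching and multithreading for high-performance data access, efficient consistency protocols for application-desired data coherence, strong and grid-compatible security for secure grid-wide data access, and reliability protocols supporting application-transparent failure detection and recovery. Based on GVFS, data sessions can be created on demand on a per-application basis, where each session can apply and configure these enhancements independently to address its application's needs. Therefore, the answer to the second question is to use the application-tailored enhancements enabled by GVFS to provide grid-wide data sessions with application-desired performance, consistency, security, and reliability.

2.3 Service-Based Autonomic Data Management
The third question is how to manage data provisioning in a grid-scale system with dynamically changing environments. Based on the GVFS approach, data sessions can be started on demand and independently customized for applications. However, in a large-scale system, managing many dynamic data sessions is another challenging task because of its complexity. Data sessions need to be dynamically established and destroyed based on the lifecycles of applications and the locations of their instantiations and data storage. Customizing data sessions also implies considering various relevant factors and tuning many parameters, in accordance with the desired behaviors and the surrounding environments. Dynamically changing application workloads and resource availability further require continuous monitoring of data sessions and timely adaptation of their configurations. These requirements are often beyond the capability of end users and even system administrators. Yet the goals of users or administrators are rather simple and explicit. For example, from an application user's point of view, job execution should be fast, secure, and reliable; from a resource provider's point of view, resource usage should be healthy and profitable. Therefore, this dissertation presents a novel service-based autonomic data management approach to automatically manage and optimize data provisioning according to such high-level objectives.

This dissertation proposes a set of data management services to manage the per-application GVFS sessions, enforce isolation among the independent sessions, and apply the desired customization to each session. They support flexible control over the lifecycles and configurations of data sessions, and can exploit application knowledge (e.g., data access patterns, data sharing scenarios, and quality-of-service requirements) to customize the data sessions with the performance, consistency, security, and reliability enhancements. These services also provide interoperable interfaces that allow direct interaction with other grid middleware services and automated execution of data provisioning tasks. To further reduce human intervention in managing data sessions and to adapt them promptly to changing environments, autonomic functions are built into the data management services, giving them the ability to automatically monitor, analyze, and optimize grid-wide data sessions, with the distributed entities cooperating to achieve the desired data provisioning and resource usage goals. This autonomic management is applied to several important aspects, including data session cache configuration, data replication, and session redirection. In short, the GVFS-based data management system answers the last question by employing autonomic services that automatically manage and optimize data sessions according to application needs and changing environments.

FILE SYSTEM VIRTUALIZATION AND SERVICE FOR GRID DATA MANAGEMENT

1. INTRODUCTION
Computations are becoming increasingly large in scale, in terms of both size and geographic and administrative distribution. Examples include scientific grids [1], which harness resources among several institutions for coordinated problem solving, and enterprise information systems, which aggregate efforts from multiple sites for collaborative development. Common to these systems is that applications and data are distributed on resources across administrative boundaries and wide-area networks.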
Such environments can be referred to as "grid-style" environments, which have the following distinctive characteristics:
• Heterogeneity: There is a wide variety of applications and resources in a grid-style environment. The resources typically have different hardware configurations (e.g., CPU speed and architecture, memory size, disk bandwidth and capacity) and software setups (e.g., operating systems and libraries); the applications also have diverse characteristics (e.g., data access patterns) and needs (e.g., desired data access performance, security, and reliability).
• Dynamism: Systems deployed in a grid-style environment are highly dynamic. Failures of machines and networks can happen at any time, and non-dedicated resources may dynamically join and leave the system. On the other hand, applications are started and terminated on demand, and their workloads also vary over time.
• Scale: Large numbers of resources can be aggregated in a grid-style environment. They are distributed across different institutions and connected by wide-area networks, providing the computing power and storage capacity to support the execution of many applications.

This dissertation focuses on two specific aspects of data management in distributed systems. The first is data provisioning, i.e., providing applications running on computing resources with remote access to their data stored on storage resources. The second is the management of data provisioning, i.e., the establishment, configuration, and termination of remote data access. Computing in a grid-style environment poses unique challenges to these tasks because of the heterogeneous, dynamic, and large-scale nature of applications and resources described above. First, the diversity of applications and resources motivates a data provisioning solution that can be transparently deployed, without modifying the existing operating systems (O/Ss) or changing application source code or binaries.
Second, wide-area, cross-domain environments necessitate application-tailored optimizations for data access to address the inefficiency (long network delay, limited network bandwidth), insecurity (insecure resources, limited mutual trust between different domains), and unreliability (unreliable machines and networks) that are typical in such environments. Last but not least, managing data provisioning in a large, dynamic system also calls for flexible control and automatic optimization of remote data access, in order to deal with the complexity of providing data to many applications, to adapt agilely to changing environments, and to deliver application-desired performance, security, and reliability.

To address these challenges, this dissertation presents a two-level data management system in which file system virtualization provides application-tailored grid-wide data access, and service-based middleware enables autonomic management of the data provisioning. In particular, this system has made the following contributions:
• It provides on-demand, cross-domain data access transparently for unmodified applications and O/Ss, based on user-level virtualization of widely available O/S-level distributed file systems (DFSs).
• It supports application-tailored enhancements designed for grid-style environments on several important aspects of remote data access, including performance, consistency, security, and reliability.
• It employs middleware services to achieve flexible and interoperable management of grid-scale data provisioning, capable of controlling the lifecycles and configurations of dynamic data sessions based on application needs.
• It develops autonomic functions to automatically optimize data management according to high-level objectives, in order to reduce the complexity of managing data sessions and adapt them promptly to changing environments.
Finally, thorough experimental evaluation has demonstrated that the proposed system is effective and can significantly outperform conventional DFS-based approaches in grid-style environments; it has also been successfully deployed in a production grid system [2][3] for several years, supporting scientific tools and users from many disciplines.

The data management system proposed in this dissertation is architected to address three important questions, which are discussed in the following subsections.

2.1 Application-Transparent Grid-Wide Data Access
The first question is how to provide application-transparent grid-wide data access. Grids differ from traditional distributed computing environments because of their distinct characteristics, e.g., wide-area networking, heterogeneous end systems, and disjoint administrative domains. These differences bring new challenges to data management systems, and technologies that are successful in local-area networks (LANs), e.g., LAN file systems, cannot be directly applied in a grid environment. Instead, grid data management needs to specifically address these unique issues. Existing solutions allow applications to access grid data through specialized APIs or libraries. However, the required modifications to application sources or binaries often place a burden on end users and developers, and present a hurdle for applications that cannot be easily modified. Therefore, application transparency is desirable to facilitate the deployment of a wide range of applications on grids, where grid-enabling should be the responsibility of the grid middleware rather than of application users or developers.

This dissertation presents a user-level DFS virtualization, namely the Grid Virtual File System (GVFS), for application-transparent grid data access.
Because the well-known DFS interface is preserved by GVFS and presented to applications, no modifications are required to their source code, libraries, or binaries. In addition, the proposed approach is based on user-level virtualization techniques, which require no changes to existing O/Ss and can be conveniently deployed on grid resources. Furthermore, user-level enhancements designed for grid-style environments are built upon the virtualization layer to enable data provisioning with application-desired characteristics. In short, the proposed GVFS approach answers the first question by providing transparent grid-wide data access for unmodified applications and O/Ss through user-level DFS virtualization.

2.2 Application-Tailored Grid Data Provisioning
The second question is how to provide data with application-tailored optimizations. Typical O/Ss are designed to support general-purpose applications, but it is often the case that "one size does not fit all". Applications have diverse characteristics and requirements, for example in their data access patterns, acceptable caching and consistency policies, security concerns, and fault tolerance requirements. To provide the desired performance, security, and reliability to a grid application, data provisioning needs to be optimized according to the application's behavior and needs. Because an optimization tailored for one application (e.g., aggressive prefetching of file contents) may degrade performance for several others (e.g., sparse files, databases), application-tailored features are typically not implemented in general-purpose O/S kernels. In addition, kernel-level modifications are difficult to port and deploy, notably in shared environments. Toolkit-based solutions typically give users powerful APIs to program remote data access with desired behaviors, but few programmers have the skills to make effective use of such APIs.
To solve this problem, user-level DFS customizations are proposed to support application-tailored GVFS data sessions. In particular, enhancements designed for grid-style environments are provided upon the virtualization layer in GVFS. These include customizable disk caching and multithreading for high-performance data access, efficient consistency protocols for application-desired data coherence, strong and grid-compatible security for secure grid-wide data access, and reliability protocols supporting application-transparent failure detection and recovery. Based on GVFS, data sessions can be created on demand on a per-application basis, where each session can apply and configure these enhancements independently to address its application's needs. Therefore, the answer to the second question is to use the application-tailored enhancements enabled by GVFS to provide grid-wide data sessions with application-desired performance, consistency, security, and reliability.

2.3 Service-Based Autonomic Data Management
The third question is how to manage data provisioning in a grid-scale system with dynamically changing environments. Based on the GVFS approach, data sessions can be started on demand and independently customized for applications. However, in a large-scale system, managing many dynamic data sessions is another challenging task because of its complexity. Data sessions need to be dynamically established and destroyed based on the lifecycles of applications and the locations of their instantiations and data storage. Customizing data sessions also implies considering various relevant factors and tuning many parameters, in accordance with the desired behaviors and the surrounding environments. Dynamically changing application workloads and resource availability further require continuous monitoring of data sessions and timely adaptation of their configurations.
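The session lifecycle and per-session tuning described above can be illustrated with a minimal Python sketch. It is not the GVFS implementation: the class names, configuration fields, default values, and the cache-doubling rule are all hypothetical, chosen only to show independent per-application sessions that are established, adapted from a monitored metric, and destroyed.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SessionConfig:
    """Per-session settings. Every field name and default is hypothetical."""
    cache_mb: int = 256            # size of the session's disk cache
    consistency: str = "relaxed"   # e.g. "relaxed" or "strong"
    encrypted: bool = True         # whether the data channel is secured
    replicas: int = 1              # data copies kept for failure recovery


@dataclass
class DataSession:
    app_id: str
    server: str
    config: SessionConfig


class SessionManager:
    """Tracks one independently configured data session per application."""

    def __init__(self) -> None:
        self._sessions: Dict[str, DataSession] = {}

    def establish(self, app_id: str, server: str,
                  config: Optional[SessionConfig] = None) -> DataSession:
        if app_id in self._sessions:
            raise ValueError(f"session already exists for {app_id}")
        # A fresh SessionConfig per session keeps sessions isolated.
        session = DataSession(app_id, server, config or SessionConfig())
        self._sessions[app_id] = session
        return session

    def destroy(self, app_id: str) -> None:
        del self._sessions[app_id]

    def adapt(self, app_id: str, hit_ratio: float,
              low: float = 0.5, max_cache_mb: int = 4096) -> int:
        """One adaptation step (assumed heuristic): double the cache,
        up to a cap, when the monitored cache hit ratio is low."""
        cfg = self._sessions[app_id].config
        if hit_ratio < low:
            cfg.cache_mb = min(cfg.cache_mb * 2, max_cache_mb)
        return cfg.cache_mb

    def active(self) -> List[str]:
        return sorted(self._sessions)
```

In this sketch, calling `adapt` for one application changes only that application's session, which mirrors the idea that each session is customized independently; a real system would base such decisions on far richer monitoring data than a single hit ratio.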
These requirements are often beyond the capability of end users and even system administrators. Yet the goals of users or administrators are rather simple and explicit. For example, from an application user's point of view, job execution should be fast, secure, and reliable; from a resource provider's point of view, resource usage should be healthy and profitable. Therefore, this dissertation presents a novel service-based autonomic data management approach to automatically manage and optimize data provisioning according to such high-level objectives.

This dissertation proposes a set of data management services to manage the per-application GVFS sessions, enforce isolation among the independent sessions, and apply the desired customization to each session. They su
