Dalian Jiaotong University School of Information Engineering
Graduation Design (Thesis) Task Book

Title: Game Auction House System

Tasks and requirements:

1. Design (research) content and requirements

Tasks:
1. Survey the current state of technology for game auction house systems and write an internship report of no fewer than 3,000 characters; submit it to the supervisor in week 3.
2. Plan the schedule around the internship, fill in the progress plan form, have the supervisor sign it by the end of week 2, and follow it strictly.
3. Following software engineering practice, independently complete the design and development of the system; the code is estimated at about 2,000 lines.
4. Implement the functions of the game auction house system using JSP technology.
5. The program should be concise, the algorithms feasible, and the system should run well.

Requirements:
1. Meet with the supervisor at least once a week to report progress on the topic and answer the supervisor's questions.
2. After receiving the task book, find and translate foreign-language material related to the topic and the major: no fewer than 10,000 foreign-language characters, yielding a translation of no fewer than 3,000 Chinese characters, to be submitted to the supervisor for review in week 4.
3. Complete the binding of the graduation thesis in week 13 of the graduation design, to be reviewed by the supervisor. The thesis must be at least 12,000 characters and include an overview, the overall system design, the system implementation, a performance analysis, and conclusions.
4. Pass the software acceptance check organized through 中软 (ChinaSoft) and the teaching and research section in teaching week 13; a software user manual must be provided at acceptance.
5. Submit and sign the graduation defence application in week 13.
6. Defend in week 14; presentation slides (PPT) are required.

2. Original basis
Through several years of university study, covering fundamental and professional knowledge such as software engineering, database principles and applications, data structures, networking, and several programming languages including C++, Visual Basic, and Java, the student is able to independently complete the design and development of small and medium-sized projects. The school's existing equipment and environment are available for the student's internship and computer work, and professional teachers are available for guidance.

3. References
[1] 王诚梅. JSP案例开发集锦 [M]. 北京: 电子工业出版社, 2005.
[2] 吴晓松. 国际电子商务发展状况及我国应对策略 [J]. 云南财贸学院学报, 2001.
[3] 军征, 闰众. 电子商务应用与重构案例分析 [M]. 北京: 高等教育出版社, 2003.
[4] 唐有明. JSP动态网站开发基础练习·典型案例 [M]. 北京: 清华大学出版社, 2006.
[5] 陈兵. 网络安全与电子商务 [M]. 北京: 北京大学出版社, 2002.
[6] 池雅庆. JSP项目开发实践 [M]. 北京: 中国铁道出版社, 2006.
[7] 黄明. JSP信息系统设计与开发实例 [M]. 上海: 机械工业出版社, 2004.
[8] 萨师煊, 王珊. 数据库系统概论 [M]. 北京: 高等教育出版社, 2000.
[9] 陈旭东, 刘迪仁. JSP 2.0应用教程 [M]. 北京: 清华大学出版社, 2006.
[10] 叶乃沂. 电子商务信息时代的管理与战略 [M]. 上海: 上海交通大学出版社, 2002.
[11] Juan Lipson Vuong. A semantics-based routing scheme for grid resource discovery [C]. E-Science: First International Conference on E-Science and Grid Computing, 2005.
[12] Cay S. Horstmann, Gary Cornell. Core Java 2, Volume I: Fundamentals [M]. Pearson Education, 2005.

Supervisor's signature: ________    Programme (direction) head's signature: ________    March 26, 2012

Dalian Jiaotong University School of Information Engineering
Graduation Design (Thesis) Progress Plan and Assessment Form

Student name: 李青霖    Class: Software Engineering 08-1    Supervisors: 常敬岩, 史原    Other personnel on this topic: ________    Title: Game Auction House System

Planned content by week (date, planned content, completion status, and the supervisor's check signature are recorded for each week):
Week 1: complete the task book; submit the progress plan
Week 2: complete the survey report; complete the English translation
Week 3: carry out market research; requirements analysis
Week 4: preliminary analysis and design of the system
Week 5: detailed system design; start coding
Week 6: implement the system code; complete the first draft of the thesis
Week 7: finish the system code; start debugging
Week 8: debug the system code; submit the first draft of the thesis
Week 9: finish coding and debugging; polish the thesis
Week 10: finish writing the thesis and testing the code
Week 11: complete the final version of the thesis; prepare it for printing and binding
Week 12: submit the final thesis and the code
Week 13: submit the thesis deliverables
Week 14: thesis defence

Supervisor's signature: ________    Date: ________
Note: the "planned content" column is filled in carefully by the student; the other columns are filled in by the supervisor at assessment time.

Dalian Jiaotong University School of Information Engineering
Graduation Design (Thesis) Foreign Literature Translation

Student name: 李青霖    Class: Software Engineering 08-1    Supervisors: 常敬岩 (senior engineer), 史原 (lecturer)    Unit: Software Engineering Teaching and Research Section, Department of Information Science    Section head: 刘瑞杰    Completion date: April 13, 2012
A clustering method to distribute a database on a grid
ScienceDirect: Future Generation Computer Systems 23 (2007) 997–1002

Summary: Clusters and grids of workstations provide available resources for data mining processes. To exploit these resources, new distributed algorithms are necessary, particularly concerning the way to distribute data and to use this partition. We present a clustering algorithm dubbed Progressive Clustering that provides an “intelligent” distribution of data on grids. The usefulness of this algorithm is shown for several distributed data mining tasks.

Keywords: Grid and parallel computing; Data mining; Clustering
Introduction
Knowledge discovery in databases, also called data mining, is a valuable engineering tool that serves to extract useful information from very large databases. This tool usually needs high computing capabilities, which can be provided by parallelism and distribution. The work developed here is part of the DisDaMin project, which deals with data mining issues (such as association rules, clustering, ...) using distributed computing. DisDaMin's aim is to develop parallel and distributed solutions for data mining problems. It achieves two gains in execution time: a gain from the use of parallelism and a gain from decreased computation (by using an intelligent distribution of data and computation). In parallel and distributed environments such as grids or clusters, constraints inherent to the execution platform must be taken into account in algorithms. The non-existence of a central memory forces us to distribute the database into fragments and to handle these fragments using parallelism. Because of the high communication cost in this kind of environment, parallel computations must be as autonomous as possible to avoid costly communications (or at least synchronizations). However, existing grid data mining projects (e.g. Discovery Net, GridMiner, DMGA [7], or Knowledge Grid [11]) provide mechanisms for the integration and deployment of classical algorithms on grids, but not new grid-specific algorithms. The DisDaMin project, on the other hand, intends to tackle data mining tasks considering data mining specifics as well as grid computing specifics. For data mining problems, it is necessary to obtain an intelligent data partition in order to compute more independent data fragments. The main problem is how to obtain this intelligent partition. For the association rules problem, for example, the main criterion for an intelligent partition is that data rows within a fragment are as similar as possible (according to the values of each attribute), while data rows in different fragments are as dissimilar as possible. This criterion allows us to parallelize this problem, which normally needs to access the whole database, and to decrease complexity (see [2]). As this distribution criterion appears similar to the objective of clustering algorithms, the partition could be produced by a clustering treatment. The usefulness of the intelligent partition obtained from clustering for the association rules problem has already been studied (see [2]). Clearly the clustering phase itself has to be distributed, and it needs to be fast in order not to slow down the global execution time. Clustering methods will be described before introducing the Distributed Progressive Clustering algorithm for execution on grids.
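The partition criterion above (high intra-fragment similarity, high inter-fragment dissimilarity) can be made concrete with a small scoring function. The following is a minimal sketch, not taken from the paper: the class name PartitionScoreSketch, the Euclidean distance, and the difference-of-means score are illustrative assumptions; DisDaMin may use a different measure.

```java
// Illustrative scoring of an "intelligent" partition: rows inside a fragment
// should be close together, rows in different fragments far apart.
public class PartitionScoreSketch {

    static double euclidean(double[] a, double[] b) {
        double d = 0;
        for (int j = 0; j < a.length; j++) d += (a[j] - b[j]) * (a[j] - b[j]);
        return Math.sqrt(d);
    }

    /** Mean intra-fragment distance minus mean inter-fragment distance; lower is better. */
    static double score(double[][] rows, int[] fragmentOf) {
        double intra = 0, inter = 0;
        int nIntra = 0, nInter = 0;
        for (int a = 0; a < rows.length; a++) {
            for (int b = a + 1; b < rows.length; b++) {
                if (fragmentOf[a] == fragmentOf[b]) { intra += euclidean(rows[a], rows[b]); nIntra++; }
                else { inter += euclidean(rows[a], rows[b]); nInter++; }
            }
        }
        return intra / Math.max(nIntra, 1) - inter / Math.max(nInter, 1);
    }

    public static void main(String[] args) {
        double[][] rows = { {1, 1}, {1.1, 0.9}, {9, 9}, {9.2, 8.8} };
        // Grouping the two nearby pairs gives a strongly negative (good) score.
        System.out.println(score(rows, new int[] {0, 0, 1, 1}));
    }
}
```

A partition with a strongly negative score keeps similar rows together, which is what allows each fragment to be mined largely independently.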
Fig. 1. KMeans and agglomerative clustering principle.
KMeans clustering. Input: the data and the number of clusters to compute (k); output: a clustering of the data. (1) Initialize k objects as initial centres; (2) repeat: (3) assign each object to the nearest cluster; (4) update the cluster centres; (5) until no object changes cluster; (6) return the k defined clusters.
Agglomerative clustering. Input: the data and a termination criterion; output: a clustering of the data. (1) Consider each data instance as a cluster; (2) repeat: (3) merge the two nearest clusters; (4) update the inter-cluster distances; (5) until the termination criterion is met; (6) return the defined clusters.

Clustering
Clustering is the process of partitioning data into distinct groups (clusters) so that objects within the same cluster are similar, but dissimilar from objects in other clusters. Clustering methods can be separated according to two kinds of leading principles: hierarchical methods and partitioning methods. Hierarchical methods comprise agglomerative ones (which initially consider a partition with one cluster per data instance and merge neighbouring clusters until a termination criterion is met) and divisive ones (which initially consider a partition with a single cluster containing all data instances and cut clusters iteratively until termination). Partitioning methods comprise distance-based methods (such as KMeans [8]), density-based methods, and probability-based methods. Other criteria permit us to distinguish between clustering methods (see [10]): methods based on the membership degree of data instances to clusters (hard, as cited before, or fuzzy (see [4])); incremental methods, for which data instances are considered as they become available instead of all at a time (see [5]); methods based on neighbourhood search (k-nearest neighbours); etc. Two well-known clustering algorithms are the partitioning KMeans (see [8]) (which yields approximate results and has an acceptable time complexity) and agglomerative methods (see [12]) (which yield relatively good quality results, but are limited by time complexity).
Principle of KMeans: KMeans is an iterative algorithm that constructs an initial k-partition of the data instances. An iterative relocation technique then attempts to improve the partitioning by moving data from one group to another until a termination criterion is met (see Fig. 1, left part). KMeans produces a locally optimal result.
Principle of agglomerative clustering: Hierarchical agglomerative clustering is a bottom-up approach that initially considers every data instance as a separate cluster and merges the two nearest clusters at each iteration until a termination condition is met (see Fig. 1, right part). This method uses a similarity measure matrix, which makes it unsuitable for huge datasets (because of the storage cost).
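The KMeans principle of Fig. 1 is easy to state in code. The following is a minimal sketch assuming numeric attributes, Euclidean distance, and mean centroids; the class name KMeansSketch, the fixed random seed, and the maxIterations cap are illustrative assumptions, not the paper's implementation.

```java
import java.util.Arrays;
import java.util.Random;

/** Minimal sketch of the KMeans principle from Fig. 1 (illustrative only). */
public class KMeansSketch {

    /** Returns, for each row of data, the index of the cluster it was assigned to. */
    static int[] kmeans(double[][] data, int k, int maxIterations) {
        Random rnd = new Random(42);
        double[][] centres = new double[k][];
        // (1) initialise k objects as initial centres (here: random rows)
        for (int c = 0; c < k; c++) centres[c] = data[rnd.nextInt(data.length)].clone();

        int[] assignment = new int[data.length];
        for (int iter = 0; iter < maxIterations; iter++) {   // (2) repeat
            boolean changed = false;
            for (int i = 0; i < data.length; i++) {          // (3) assign each object to the nearest cluster
                int best = nearest(data[i], centres);
                if (best != assignment[i]) { assignment[i] = best; changed = true; }
            }
            // (4) update each cluster centre as the mean of its members
            double[][] sums = new double[k][data[0].length];
            int[] counts = new int[k];
            for (int i = 0; i < data.length; i++) {
                counts[assignment[i]]++;
                for (int j = 0; j < data[i].length; j++) sums[assignment[i]][j] += data[i][j];
            }
            for (int c = 0; c < k; c++)
                if (counts[c] > 0)
                    for (int j = 0; j < sums[c].length; j++) centres[c][j] = sums[c][j] / counts[c];
            if (!changed) break;                             // (5) until no instance changes cluster
        }
        return assignment;                                   // (6) return the k defined clusters
    }

    static int nearest(double[] row, double[][] centres) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int c = 0; c < centres.length; c++) {
            double d = 0;
            for (int j = 0; j < row.length; j++) d += (row[j] - centres[c][j]) * (row[j] - centres[c][j]);
            if (d < bestDist) { bestDist = d; best = c; }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] data = { {1, 1}, {1.2, 0.9}, {8, 8}, {8.1, 7.9} };
        System.out.println(Arrays.toString(kmeans(data, 2, 100)));  // e.g. [0, 0, 1, 1]
    }
}
```

KMeans converges to a local optimum that depends on the initial centres, which is why step (1) matters. An agglomerative version of Fig. 1 would instead maintain the full pairwise distance matrix and repeatedly merge the two closest clusters, which is exactly the storage cost the text warns about.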
Parallel algorithms: The two previous methods need to access the whole database, or to communicate between iterations, in order to obtain a correct solution. Parallel methods exist for KMeans (see [3]) and for agglomerative clustering; parallel versions also exist for the other algorithms cited before (see [6]). For parallel clustering to achieve clusters of the same quality as sequential clustering, a great deal of communication is required. Those methods are suited to supercomputers such as CC-NUMA or SMP machines, which use a common memory and fast internal interconnection networks (Parallel Data Miner for the IBM SP3, for example). The huge number of communications in existing parallel methods yields performance problems in the context of grids.
The classical methods need to be revisited to take into account the constraints of grid architectures (no common memory, slow communications). The Distributed Progressive Clustering (DPC) method presented in the next section considers these constraints.

Fig. 2. Database B and associated matrix V.

Progressive clustering
The distributed progressive clustering method deals with attributes in an incremental manner (this differs from existing incremental methods, which deal with an increasing number of data instances, whereas DPC deals with an increasing number of attributes). The method is suitable for distributed execution, using local computation to construct global results without synchronization. DPC is inspired by the sequential clustering algorithm called CLIQUE (see [1]), which consists in clustering data by projections in each dimension and identifying dense clusters of data projections. That method assumes that the whole database can be reached for projections. In the context of a grid, it is assumed instead that the database is distributed by vertical splits (multibase). DPC works in a bottom-up way over the attributes of the database: it first computes clusters on vertical fragments containing few attributes and then combines these clusters to obtain clusters in higher dimensions. Both steps (i.e. the clustering of vertical fragments and the combination of these clusters) are executed in a distributed way, benefiting from distributed execution. The distributed progressive clustering method is explained in the next sections. Three steps can be identified: an initial clustering step, a crossing step, and a merging-optimizing step.
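As a rough illustration of the bottom-up combination of per-fragment clusters, the sketch below groups together the rows whose cluster labels agree on every vertical fragment. This is a simplified stand-in, not the paper's algorithm: the crossing and merging-optimizing steps are only named above, and the signature-matching rule, the class name ProgressiveClusteringSketch, and the labelsPerFragment layout are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch of a bottom-up combination of per-fragment clusters (simplified). */
public class ProgressiveClusteringSketch {

    /**
     * Given the cluster label of each row on each vertical fragment (produced
     * independently, e.g. by the KMeans sketch above), groups together the rows
     * that received the same label on every fragment.
     */
    static Map<List<Integer>, List<Integer>> combine(int[][] labelsPerFragment, int nRows) {
        Map<List<Integer>, List<Integer>> global = new HashMap<>();
        for (int row = 0; row < nRows; row++) {
            List<Integer> signature = new ArrayList<>();      // one label per vertical fragment
            for (int[] fragmentLabels : labelsPerFragment) signature.add(fragmentLabels[row]);
            global.computeIfAbsent(signature, s -> new ArrayList<>()).add(row);
        }
        return global;                                        // candidate clusters in higher dimensions
    }

    public static void main(String[] args) {
        // Labels produced independently on two vertical fragments of a 4-row database.
        int[][] labelsPerFragment = { {0, 0, 1, 1}, {0, 0, 1, 0} };
        // Rows 0 and 1 agree on every fragment and form one candidate global cluster.
        System.out.println(combine(labelsPerFragment, 4));
    }
}
```

Because each fragment is clustered locally and only the compact label vectors need to be exchanged, this style of combination avoids the per-iteration synchronization that makes classical parallel clustering expensive on grids.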
Definitions
A database with m attributes and n rows (instances) is represented by B = (A, K, V), where:
- A = {A1, A2, ..., Am} is a finite set of attributes;
- K = {K1, K2, ..., Kn} is the set of keys of the database rows;
- V is the associated matrix (see Fig. 2), where v(i, j) (with 1 ≤ i ≤ m and 1 ≤ j ≤ n) is the ith coordinate of the jth row.

Let U be a partition based on keys, such that U = {U1, ..., Up}, with Ui ⊆ K, ∪i Ui = K and Ui ∩ Uj = ∅ for i ≠ j. Let A be an attribute-partition, such that A = {X1, ..., Xq}, with Xj ⊆ A, ∪j Xj = A and Xj ∩ Xk = ∅ for j ≠ k.

Let PX be the projection of database B on an attribute-subset X (X ∈ A). Given X = {Ak, ..., Ar}, the matrix associated to PX has n rows (one for each instance of B) and one column for each attribute of X. The jth column of PX is associated to the jth column of B (see Fig. 3).

Given an instance partition U (p elements) of database B (m columns), (U, B) can be associated to a reduced matrix R (a p × m matrix, see Fig. 3). Each row of R is associated to a subset of instances Ui of B and represents the mean of the rows of B associated to Ui. From R it is possible to obtain a matrix R′ (n × m) by duplicating, for each Ui, the row of R associated to Ui according to the cardinality of Ui. It is also possible to obtain a matrix associated to the database B by replacing, for each Ui, the row of R associated to Ui by the rows associated to Ui in B. Replacing the n rows of B by the p rows of R permits decreasing the size of the data to treat. Let RX be the reduced matrix associated to a projection PX of B.

Given X, a subset of attributes of database B from an attribute-partition A, M is the operation of projection defined by M : (B, X) → PX; M associates to the subset X the projection PX of B. PX is obtained by applying a mask MX to the matrix V. The mask MX is an n × m matrix such that MX(i, j) = 1 for all i and all j with Aj ∈ X, and MX(i, j) = 0 for all i and all j with Aj ∉ X. The operation of projection M is then defined by the elementwise product M(B, X) = MX ⊙ V = PX.

Partition F: a partition of a database is a row partition of the associated matrix, with computation of the rows. This operation is achieved by the use of a classical clustering algorithm as a step of algorithm DPC.
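Under the definitions above, the mask MX and the reduced matrix R translate directly into array operations. The following is a minimal sketch assuming the matrix is stored row-major as n × m with one row per instance; the class name ProjectionSketch and the method names are illustrative, not from the paper.

```java
/** Sketch of the projection mask MX and the reduced matrix R from the definitions above. */
public class ProjectionSketch {

    /** M(B, X): zero out every column whose attribute is not in X (the n x m shape is kept). */
    static double[][] project(double[][] v, boolean[] inX) {
        double[][] px = new double[v.length][v[0].length];
        for (int i = 0; i < v.length; i++)
            for (int j = 0; j < v[0].length; j++)
                px[i][j] = inX[j] ? v[i][j] : 0.0;   // mask entry is 1 iff attribute Aj is in X
        return px;
    }

    /** R: one row per subset Ui, holding the mean of the rows of B that belong to Ui. */
    static double[][] reduce(double[][] v, int[] partition, int p) {
        int m = v[0].length;
        double[][] r = new double[p][m];
        int[] card = new int[p];                      // cardinality of each Ui
        for (int row = 0; row < v.length; row++) {
            card[partition[row]]++;
            for (int j = 0; j < m; j++) r[partition[row]][j] += v[row][j];
        }
        for (int i = 0; i < p; i++)
            for (int j = 0; j < m; j++) r[i][j] /= Math.max(card[i], 1);
        return r;
    }

    public static void main(String[] args) {
        double[][] v = { {1, 10, 5}, {3, 14, 7}, {8, 2, 0} };
        double[][] px = project(v, new boolean[] {true, false, true}); // X = {A1, A3}
        double[][] r  = reduce(v, new int[] {0, 0, 1}, 2);             // U1 = {rows 0, 1}, U2 = {row 2}
        System.out.println(java.util.Arrays.deepToString(px));
        System.out.println(java.util.Arrays.deepToString(r));          // [[2, 12, 6], [8, 2, 0]]
    }
}
```

Keeping the mask the same shape as V (rather than dropping columns) mirrors the definition of MX and keeps row indices aligned across fragments, which is what the reduced matrix R relies on.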