Graduation Project (Thesis): Foreign-Language Translation

Title: A Data Mining Framework for Optimal Product Selection in Retail Supermarket Data: The Generalized PROFSET Model
Major: Network Engineering

Appendix: English Original Text

A Data Mining Framework for Optimal Product Selection in Retail Supermarket Data: The Generalized PROFSET Model

1 Introduction

Since almost all mid- to large-size retailers today possess electronic sales transaction systems, retailers realize that competitive advantage will no longer be achieved by the mere use of these systems for inventory management or for facilitating customer check-out. Instead, competitive advantage will be gained by those retailers who are able to extract the knowledge hidden in the data generated by those systems and use it to optimize their marketing decision making. In this context, knowledge about how customers are using the retail store is of critical importance, and distinctive competencies will be built by those retailers who best succeed in extracting actionable
knowledge from these data. Association rule mining [2] can help retailers to efficiently extract this knowledge from large retail databases. We assume some familiarity with the basic notions of association rule mining.

In recent years, a lot of effort in retail market basket analysis has been invested in developing techniques to increase the interestingness of association rules. Currently, three different research tracks studying the interestingness of association rules can be distinguished. First, a number of objective measures of interestingness have been developed to filter out non-interesting association rules based on statistical properties of the rules, such as support and confidence [2], interest [14], intensity of implication [7], J-measure [15], and correlation [12]. Other measures are based on the syntactic properties of the rules [11], or they are used to discover the least-redundant set of rules [4]. Second, it was recognized that domain knowledge may also play an important role in determining the interestingness of association rules. Therefore, a number of subjective measures of interestingness have been put forward, such as unexpectedness [13], actionability [1], and rule templates [10]. Finally, the most recent stream of research advocates evaluating the interestingness of associations in the light of the retailer's micro-economic framework [9]. More specifically, a pattern in the data is considered interesting only to the extent that it can be used in the enterprise's decision-making process to increase its utility.

It is in this latter stream of research that the authors previously developed a model for product selection called PROFSET [3]. It takes into account both quantitative and qualitative elements of retail domain knowledge in order to determine the set of products that yields maximum cross-selling profits. The key idea of the model is that products should not be selected based on their individual profitability, but rather on the total profitability that they generate, including profits from cross-selling. However, in its previous form, one major drawback of the model was its inability to deal with supermarket data (i.e., large baskets). To overcome this limitation, in this paper we propose an important generalization of the existing PROFSET model that deals effectively with large baskets. Furthermore, we generalize the model to include category management principles specified by the retailer, in order to make the output of the model even more realistic.

The remainder of the paper is organized as follows. Section 2 focuses on the limitations of the previous PROFSET model for product selection. Section 3 introduces the generalized PROFSET model. Section 4 is devoted to the empirical implementation of the model and its results on real-world supermarket data. Finally, Section 5 presents conclusions and directions for further research.

2 The PROFSET Model

The key idea of the
PROFSET model is that, when evaluating the business value of a product, one should not only look at the individual profits generated by that product (the naive approach), but must also take into account the profits due to cross-selling effects with other products in the assortment. Therefore, to evaluate product profitability, it is essential to look at frequent sets rather than at individual product items, since the former represent frequently co-occurring product combinations in customers' market baskets. As was also stressed by Cabena et al. [5], one disadvantage of association discovery
is that it makes no provision for taking the business value of an association into account. The PROFSET model was a first attempt to solve this problem. Indeed, in terms of the associations discovered, the sale of an expensive bottle of wine with oysters counts for as much as the sale of a carton of milk with cereal. This example illustrates that the retailer's micro-economic framework should be incorporated when evaluating the interestingness of associations. PROFSET was developed to maximize cross-selling opportunities by evaluating the profit margin generated per frequent set of products, rather than per product. In the next section we discuss the limitations of the previous PROFSET model. More details can be found elsewhere [3].

2.1 Limitations

The previous PROFSET model was specifically developed for market basket data from automated convenience stores. Data sets of this origin are characterized by small market baskets (size 2 or 3), because customers typically do not purchase many items during a single shopping visit. Therefore, the profit margin generated per frequent purchase combination $X$ could accurately be approximated by adding the profit margins of the market baskets $T_j$ containing exactly the same set of items, i.e., $X = T_j$.

However, for supermarket data, the existing formulation of the PROFSET model poses significant problems, since the size of market baskets typically exceeds the size of frequent item sets. Indeed, in supermarket data, frequent item sets mostly contain no more than 7 different products, whereas the average market basket typically contains 10 to 15 items. As a result, the existing profit allocation heuristic can no longer be used. A basket of 10 to 15 items rarely, if ever, equals a frequent set exactly, so its margin would go unallocated, and the model would heavily underestimate the profit potential from cross-selling effects between products. However, getting rid of this heuristic is not trivial; it is discussed in detail in Section 3.1.

A second limitation of the existing PROFSET model relates to principles of category management. Indeed, there is an increasing trend in retailing to manage product categories as separate strategic business units [6]. In other words, because of the trend to offer more products, retailers can no longer evaluate and manage each product individually. Instead, they define product categories and define marketing actions (such as promotions or store layout) at the level of these categories. The generalized PROFSET model takes this domain knowledge into account and therefore offers the retailer the ability to specify product categories and place restrictions on them.

3 The Generalized PROFSET Model

In this section, we highlight the improvements made to the previous PROFSET model [3].

3.1 Profit Allocation

Dropping the equality constraint $X = T_j$ opens up several possible profit allocation systems. Indeed, it is important to recognize that the margin of transaction $T_j$ can potentially be allocated to different frequent subsets of that transaction. In other words, how should the margin $m(T_j)$ be allocated to
one or more different frequent subsets of $T_j$? The idea here is that we would like to know the purchase intentions of the customer who bought $T_j$. Unfortunately, since the customer has already left the store, we do not possess this information. However, if we can assume that some items occur together more frequently than others because customers consider them complementary, then frequent item sets may be interpreted as customers' purchase intentions. This raises the additional problem of finding out which purchase intentions, and how many, are represented in a particular transaction $T_j$. Indeed, a transaction may contain several frequent subsets of different sizes, so it is not straightforward to determine which frequent sets represent the customer's underlying purchase intentions at the time of shopping. Before proposing a solution to this problem, we first define the concept of a maximal frequent subset of a transaction.

Definition 1. Let $F$ be the collection of all frequent subsets of a sales transaction $T_j$. Then $X \in F$ is called maximal, denoted $X_{\max}$, if and only if $\forall Y \in F : |Y| \le |X|$.

Using this definition, we adopt the following rationale to allocate the margin $m(T_j)$ of a sales transaction $T_j$. If there exists a frequent set $X = T_j$, then we allocate $m(T_j)$ to $M(X)$, just as in the previous PROFSET model. However, if there is no such frequent set, then one maximal frequent subset $X$ is drawn from all maximal frequent subsets of $T_j$ according to the probability distribution $\Theta_{T_j}$, with

$$\Theta_{T_j}(X_{\max}) = \frac{\operatorname{support}(X_{\max})}{\sum_{Y_{\max} \subseteq T_j} \operatorname{support}(Y_{\max})}.$$

After this, the margin $m(X)$ is assigned to $M(X)$ and the process is repeated for $T_j \setminus X$. In summary:

$$M(X) = \sum_{j} \begin{cases} m(T_j) & \text{if } X = T_j, \\ m(X) & \text{if } X \subset T_j \text{ and } X \text{ is drawn during the allocation of } T_j, \\ 0 & \text{otherwise.} \end{cases}$$

Table 1 contains all frequent subsets of the transaction $T = \{\text{cola}, \text{peanuts}, \text{cheese}\}$ for a particular transaction database. In this example, there is no unique maximal frequent subset of $T$. Indeed, there are two maximal frequent subsets of $T$, namely
$\{\text{cola}, \text{peanuts}\}$ and $\{\text{peanuts}, \text{cheese}\}$. Consequently, it is not obvious to which maximal frequent subset the profit margin $m(T)$ should be allocated. Moreover, we would not allocate the entire profit margin $m(T)$ to the selected item set, but rather only the proportion $m(X)$ that corresponds to the items contained in the
selected maximal subset. Now, how can one determine to which of the two frequent subsets of $T$ this margin should be allocated? As discussed above, the crucial idea is that this depends on the purchase intentions of the customer who purchased $T$. Unfortunately, one can never know them exactly, since the customer was not asked at the time of purchase. However, the support of the frequent subsets of $T$ may provide a probabilistic estimate. Indeed, if the support of a frequent subset indicates the probability of occurrence of that purchase combination, then according to the data, customers buy the maximal subset $\{\text{cola}, \text{peanuts}\}$ twice as frequently as the maximal subset $\{\text{peanuts}, \text{cheese}\}$. Consequently, it is more likely that the customer's purchase intention was $\{\text{cola}, \text{peanuts}\}$ rather than $\{\text{peanuts}, \text{cheese}\}$. This information is used to construct the probability distribution $\Theta_{T_j}$, reflecting the relative frequencies of the maximal frequent subsets of $T$ (here $2/3$ and $1/3$). Now, each time the sales transaction $\{\text{cola}, \text{peanuts}, \text{cheese}\}$ is encountered in the data, a random draw from $\Theta_{T_j}$ selects a purchase intention (i.e., a frequent subset) for that transaction, with probability proportional to its support. Consequently, on average in two out of every three occurrences of this transaction, the maximal subset $\{\text{cola}, \text{peanuts}\}$ will be selected and $m(\{\text{cola}, \text{peanuts}\})$ will be allocated to $M(\{\text{cola}, \text{peanuts}\})$. After this, $T$ is reduced to $T := T \setminus \{\text{cola}, \text{peanuts}\}$ (here leaving $\{\text{cheese}\}$), and the process of assigning the remaining margin is repeated as if the new $T$ were a separate transaction, until $T$ no longer contains a frequent set.

3.2 Category Management Restrictions

As pointed out in Section 2.1, a second limitation of the previous PROFSET model is its inability to include category management restrictions. As a result, the model sometimes excludes all products from one or more categories altogether, because they do not contribute enough to the overall profitability of the optimal set. This often contradicts the retailer's mission to offer customers a wide range of products, even if some of those categories or products are not profitable enough. Indeed, customers expect supermarkets to carry a wide variety of products, and cutting away categories or departments would go against customers' expectations of the supermarket and harm the store's image. Therefore, we want to offer the retailer the ability to include category restrictions in the generalized PROFSET model. This can be accomplished by adding an index $k$ to the product variable $Q_i$ to account for category membership, and by adding constraints at the category level. Several kinds of category restrictions can be introduced: which categories, and how many, should be included in the optimal set, or how many products from each category should be included. The relevance of these restrictions can be illustrated by the following common practices in retailing. First, when composing a promotion leaflet, there is only limited space to display
products, and therefore it is important to optimize the product composition in order to maximize cross-selling effects between products and avoid product cannibalization. Moreover, depending on the particular retail environment, the retailer will include or exclude specific products or product categories in the leaflet. For example, the supermarket in this study attempts to differentiate itself from the competition through the following image components: fresh, profitable, and friendly. Therefore, the retailer's promotion leaflet emphasizes product categories that support this image, such as fresh vegetables and meat, freshly baked bread, ready-made meals, and others. Second, product category constraints may reflect shelf space allocations to products. For instance, large categories have more product facings than smaller categories. These kinds of constraints can easily be included in the generalized PROFSET model
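as will be discussed hereafter.

Chinese Translation

A Data Mining Framework for Optimal Product Selection in Retail Supermarket Data: The Generalized PROFSET Model

Chapter 1 Introduction

Today almost all mid- to large-size retailers have electronic sales transaction systems, and retailers have realized that competitive advantage will no longer come merely from using these systems for inventory management or to make customer check-out more convenient. On the contrary, whoever can extract the knowledge hidden in the data generated by these systems and use it to optimize marketing decisions will gain a competitive advantage. In this context, the actionable knowledge that retailers succeed in extracting from these data is of critical importance in retailing, and the retailers who extract it most successfully will build distinctive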
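competitive advantages. Assuming some familiarity with the basic concepts of association rule mining, applying association rule mining [2] to large retail databases can help retailers efficiently extract this knowledge. In recent years, much effort in retail market basket analysis has been invested in developing techniques that increase the interestingness of association rules. First, a number of objective measures of interestingness have been developed to filter out non-interesting rules based on statistical properties of the rules, for example support and confidence, interest, intensity of implication, the J-measure, and correlation. Other measures are based on the syntactic properties of the rules. Second, it has been recognized that domain knowledge plays an important role in determining the interestingness of these rules, and therefore subjective measures of interestingness such as unexpectedness, actionability, and rule templates have been proposed. Finally, drawing on the retailer's micro-economic framework, the mainstream research direction today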
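has turned to evaluating the interestingness of associations. More specifically, a pattern is considered interesting only to the extent that it can be used in the enterprise's decision-making process to increase its utility. Within this latter stream of research, the authors previously introduced a product selection model called PROFSET. It takes into account both quantitative and qualitative elements of retail domain knowledge, with the aim of determining the set of products that yields the maximum cross-selling profit. The key point of this model is that products should not be selected on the basis of their individual profitability, but on the total profitability they generate, including profits from cross-selling. However, in its original form the model could not deal with supermarket data (large baskets). To solve this problem, this paper introduces an important generalization of the existing PROFSET model that can deal effectively with large baskets. Furthermore, we generalize the model to include category management principles specified by the retailer, in order to make the model's output more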
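realistic. The rest of this paper is organized as follows: Chapter 2 describes the limitations of the previous PROFSET model; Chapter 3 introduces the generalized PROFSET model; Chapter 4 presents the empirical implementation of the generalized PROFSET model and its results on real-world supermarket data; finally, Chapter 5 concludes the paper and outlines directions for further research.

Chapter 2 The PROFSET Model

The key idea of the PROFSET model is that, when evaluating the business value of a product, one should look not only at the individual profit of the product itself (the naive approach) but also at the profit it generates in combination with other products through cross-selling. Therefore, when evaluating the profitability of a product, one must look at frequent sets rather than at individual items, because the former better reflect the product combinations that customers frequently purchase together. As Cabena et al.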
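pointed out, one disadvantage of association discovery is that it makes no provision for the business value of an association, and the PROFSET model was designed precisely to address this problem. Indeed, in terms of the discovered associations, the sale of an expensive bottle of wine with oysters counts as much as the sale of a carton of milk with cereal. This example shows that the retailer's micro-economic framework must be taken into account when evaluating the interestingness of associations. PROFSET maximizes cross-selling opportunities by evaluating the profit margin generated per frequent set of products, rather than per individual product. The next section discusses the limitations of the previous PROFSET model; more details can be found elsewhere [3].

2.1 Limitations

The previous PROFSET model was developed specifically for market basket data from automated convenience stores. Since in a single shopping visit customers generally do not buy a large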