Undergraduate Graduation Project (Thesis): Translation of Foreign Literature

Project title: Intelligent Traffic Light Control System
School: School of Information Science and Engineering
Major and class: Measurement and Control Technology and Instruments, Class 0703
Student: Wang Xin
Supervisor: Sang Haifeng
19 March 2011

Original Foreign Text

Intelligent Traffic Light Control

Marco Wiering, Jelle van Veenen, Jilles Vreeken, and Arne Koopman
Intelligent Systems Group
Institute of Information and Computing Sciences
Utrecht University
Padualaan 14, 3508TB Utrecht, The Netherlands
email: marco@cs.uu.nl
July 9, 2004

Abstract

Vehicular travel is increasing throughout the world, particularly in large urban areas. Therefore the need arises for simulating and optimizing traffic control algorithms to better accommodate this increasing demand. In this paper we study the simulation and optimization of traffic light controllers in a city and present an adaptive optimization algorithm based on reinforcement learning. We have implemented a traffic light simulator, Green Light District, that allows us to experiment with different infrastructures and to compare different traffic light controllers. Experimental results indicate that our adaptive traffic light controllers outperform other fixed controllers on all studied infrastructures.

Keywords: Intelligent Traffic Light Control, Reinforcement Learning, Multi-Agent Systems (MAS), Smart Infrastructures, Transportation Research

1 Introduction

Transportation research has the goal of optimizing the transportation flow of people and goods. As the number of road users constantly increases, and the resources provided by current infrastructures are limited, intelligent control of traffic will become a very important issue in the future. However, some limitations to the usage of intelligent traffic control exist. Avoiding traffic jams, for example, is thought to be beneficial to both the environment and the economy, but improved traffic flow may also lead to an increase in demand [Levinson, 2003].

There are several models for traffic simulation. In our research we focus on microscopic models that model the behavior of individual vehicles, and thereby can simulate the dynamics of groups of vehicles. Research has shown that such models yield realistic behavior [Nagel and Schreckenberg, 1992, Wahle and Schreckenberg, 2001].

Cars in urban traffic can experience long travel times due to inefficient traffic light control. Optimal control of traffic lights using sophisticated sensors and intelligent optimization algorithms might therefore be very beneficial. Optimization of traffic light switching increases road capacity and traffic flow, and can prevent traffic congestion. Traffic light control is a complex optimization problem, and several intelligent algorithms, such as fuzzy logic, evolutionary algorithms, and reinforcement learning (RL), have already been used in attempts to solve it. In this paper we describe a model-based, multi-agent reinforcement learning algorithm for controlling traffic lights.

In our approach, reinforcement learning [Sutton and Barto, 1998, Kaelbling et al., 1996] with road-user-based value functions [Wiering, 2000] is used to determine optimal decisions for each traffic light. The decision is based on a cumulative vote of all road users standing for a traffic junction, where each car votes using its estimated advantage (or gain) of setting its light to green. The gain-value is the difference between the total time it expects to wait during the rest of its trip if the light for
which it is currently standing is red, and if it is green. The waiting time until cars arrive at their destination is estimated by monitoring cars flowing through the infrastructure and using reinforcement learning (RL) algorithms.

We compare the performance of our model-based RL method to that of other controllers using the Green Light District simulator (GLD). GLD is a traffic simulator that allows us to design arbitrary infrastructures and traffic patterns, monitor traffic flow statistics such as average waiting times, and test different traffic light controllers. The experimental results show that in crowded traffic, the RL controllers outperform all other tested non-adaptive controllers. We also test the use of the learned average waiting times for choosing routes of cars through the city (co-learning), and show that by using co-learning road users can avoid bottlenecks.

This paper is
organized as follows. Section 2 describes how traffic can be modelled, predicted, and controlled. In section 3 reinforcement learning is explained and some of its applications are shown. Section 4 surveys several previous approaches to traffic light control, and introduces our new algorithm. Section 5 describes the simulator we used for our experiments, and in section 6 our experiments and their results are given. We conclude in section 7.

2 Modelling and Controlling Traffic

In this section, we focus on the use of information technology in transportation. A lot of ground can be gained in this area, and Intelligent Transportation Systems (ITS) have gained the interest of several governments and commercial companies [Ten-T expert group on ITS, 2002, White Paper, 2001, EPA98, 1998].

ITS research includes in-car safety systems, simulating effects of infrastructural changes, route planning, optimization of transport, and smart infrastructures. Its main goals are improving safety, minimizing travel time, and increasing the capacity of infrastructures. Such improvements are beneficial to health, the economy, and the environment, and this shows in the budget allocated for ITS.

In this paper we are mainly interested in the optimization of traffic flow, thus effectively minimizing average traveling (or waiting) times for cars. A common tool for analyzing traffic is the traffic simulator. In this section we will first describe two techniques commonly used to model traffic. We will then describe how models can be used to obtain real-time traffic information or predict traffic conditions. Afterwards we describe how information can be communicated as a means of controlling traffic, and what the effect of this communication on traffic conditions will be. Finally, we describe research in which all cars
are controlled using computers.

2.1 Modelling Traffic

Traffic dynamics bear resemblance to, for example, the dynamics of fluids and those of sand in a pipe. Different approaches to modelling traffic flow can be used to explain phenomena specific to traffic, like the spontaneous formation of traffic jams. There are two common approaches for modelling traffic: macroscopic and microscopic models.

2.1.1 Macroscopic models

Macroscopic traffic models are based on gas-kinetic models and use equations relating traffic density to velocity [Lighthill and Whitham, 1955, Helbing et al., 2002]. These equations can be extended with terms for build-up and relaxation of pressure to account for phenomena like stop-and-go traffic and spontaneous congestion [Helbing et al., 2002, Jin and Zhang, 2003, Broucke and Varaiya, 1996]. Although macroscopic models can be tuned to simulate certain driver behaviors, they do not offer a direct, flexible way of modelling and optimizing them, making them less suited for our research.

2.1.2 Microscopic models

In contrast to macroscopic models, microscopic traffic models offer a way of simulating various driver behaviors. A microscopic model consists of an infrastructure that is occupied by a set of vehicles. Each vehicle interacts with its environment according to its own rules. Depending on these rules, different kinds of behavior emerge when groups of vehicles interact.

Cellular Automata. One specific way of designing and simulating (simple) driving rules of cars on an infrastructure is by using cellular automata (CA). CA use discrete, partially connected cells that can each be in a specific state. For example, a road cell can contain a car or be empty. Local transition rules determine the dynamics of the system, and even simple rules can lead to chaotic dynamics. Nagel and Schreckenberg (1992) describe a CA model for traffic simulation. At each discrete time step, vehicles increase their speed by a certain amount until they reach their maximum velocity. If there is a slower-moving vehicle ahead, the speed is decreased to avoid a collision. Some randomness is introduced by giving each vehicle a small chance of slowing down. Experiments showed realistic behavior of this CA model on a single road, with emergent behaviors like the formation of start-stop waves when traffic density increases.

Cognitive Multi-Agent Systems. A more advanced approach
to traffic simulation and optimization is the Cognitive Multi-Agent System approach (CMAS), in which agents interact and communicate with each other and the infrastructure. A cognitive agent is an entity that autonomously tries to reach some goal state using minimal effort. It receives information from the environment using its sensors, believes certain things about its environment, and uses these beliefs and inputs to select an action. Because each agent is a single entity, it can optimize its way of selecting actions (e.g., by using learning capabilities). Furthermore, using heterogeneous multi-agent systems, different agents can have different sensors, goals, behaviors, and learning capabilities, thus allowing us to experiment with a very wide range of (microscopic) traffic models.

Dia (2002) used a CMAS based on a study of real drivers to model the drivers' response to travel information. In a survey taken at a congested corridor, factors influencing the choice of route and departure time were studied. The results were used to model a driver population, where drivers respond to presented travel information differently. Using this population, the effect of different information systems on the area where the survey was taken could be simulated. The research seems promising, though no results were presented.

A traffic prediction model that has been applied to a real-life situation is described in [Wahle and Schreckenberg, 2001]. The model is a multi-agent system (MAS) where driving agents occupy a simulated infrastructure similar to a real one. Each agent has two layers of control: one for the (simple) driving decision, and one for tactical decisions like route choice. The real-world situation was modelled by using detection devices already installed. From these devices, information about the number of cars entering and leaving a stretch of road is obtained. Using this information, the number of vehicles that take a certain turn at each junction can be inferred. By instantiating this information in a faster-than-real-time simulator, predictions on actual traffic can be made. A system installed in Duisburg uses information from the existing traffic control center and produces real-time information on the Internet. Another system was installed on the freeway system of North Rhine-Westphalia, using data from about 2,500 inductive loops to predict traffic on 6000
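km of roads.

Chinese Translation

Intelligent Traffic Light Control

Marco Wiering, Jelle van Veenen, Jilles Vreeken, and Arne Koopman
Intelligent Systems Group
Institute of Information and Computing Sciences, Utrecht University
Padualaan 14, Utrecht, The Netherlands
email: marco@cs.uu.nl
July 9, 2004

Abstract

Vehicle travel is increasing all over the world, especially in large urban areas. This creates a need for simulating and optimizing traffic control algorithms so that the growing demand can be better accommodated. In this paper we study the simulation and optimization of traffic light controllers in a city and present an adaptive optimization algorithm based on reinforcement learning. We have implemented a traffic light simulator, Green Light District, which allows us to experiment with different infrastructures and to compare different traffic light controllers. The experimental results show that our adaptive traffic light controllers outperform other fixed controllers on all of the infrastructures studied.

Keywords: intelligent traffic light control, reinforcement learning, multi-agent systems (MAS), smart infrastructures, transportation research

1 Introduction

The goal of transportation research is to optimize the flow of people and goods. As the number of road users keeps rising while the resources offered by current infrastructures are limited, intelligent traffic control will become a very important issue in the future. However, there are some limitations to the use of intelligent traffic control. Avoiding traffic jams, for example, is considered beneficial to both the environment and the economy, but improved traffic flow may also lead to increased demand [Levinson, 2003].

There are several models for traffic simulation. In our research we concentrate on microscopic models, which model the behavior of individual vehicles and can thereby simulate the dynamics of groups of vehicles. Research has shown that such models produce realistic behavior [Nagel and Schreckenberg, 1992, Wahle and Schreckenberg, 2001].

Cars in urban traffic can experience long travel times because of inefficient traffic light control. Optimal control of traffic lights using sophisticated sensors and intelligent optimization algorithms may therefore be very beneficial. Optimized switching of traffic lights increases road capacity and traffic flow and can prevent congestion. Traffic light control is a complex optimization problem, and several intelligent algorithms, such as fuzzy logic, evolutionary algorithms, and reinforcement learning (RL), have already been applied in attempts to solve it. In this paper we describe a model-based, multi-agent reinforcement learning algorithm for controlling traffic lights.

In our approach, reinforcement learning [Sutton and Barto, 1998, Kaelbling et al., 1996] with road-user-based value functions [Wiering, 2000] is used to determine the optimal decision for each traffic light. The decision is based on a cumulative vote of all road users waiting at a junction, in which each car votes with its estimated advantage (or gain) of having its light set to green. The gain is the difference between the total time the car expects to wait during the rest of its trip if the light it is currently waiting at is red, and if it is green. The waiting time until cars reach their destination is estimated by monitoring the cars flowing through the infrastructure and applying reinforcement learning (RL) algorithms.

This paper is organized as follows. Section 2 describes how traffic can be modelled, predicted, and controlled. Section 3 explains reinforcement learning and some of its applications. Section 4 surveys several previous approaches to traffic light control and introduces our new algorithm. Section 5 describes the simulator used in our experiments, and section 6 gives our experiments and their results. We conclude in section 7.

2 Modelling and Controlling Traffic

In this section we focus on the use of information technology in transportation. Much ground can be gained in this area, and Intelligent Transportation Systems (ITS) have attracted the interest of several governments and commercial companies [Ten-T expert group on ITS, 2002, White Paper, 2001, EPA98, 1998].

ITS research includes in-car safety systems, simulation of the effects of infrastructural changes, route planning, optimization of transport, and smart infrastructures. Its main goals are improving safety, reducing travel time, and increasing the capacity of infrastructures. Such improvements benefit health, the economy, and the environment, which is reflected in the budgets allocated to ITS.

In this paper we are mainly interested in optimizing traffic flow, thereby effectively reducing the average travel (or waiting) time of cars. A common tool for analyzing traffic is the traffic simulator. In this section we first describe two techniques commonly used to model traffic. We then describe how models can be used to obtain real-time traffic information or to predict traffic conditions. After that, we describe how information can be communicated as a means of controlling traffic, and what effect this communication has on traffic conditions. Finally, we describe research in which all cars are controlled by computers.

2.1 Modelling Traffic

Traffic dynamics resemble, for example, the dynamics of fluids and of sand in a pipe. Different approaches to modelling traffic flow can be used to explain phenomena specific to traffic, such as the spontaneous formation of traffic jams. There are two common approaches to modelling traffic: macroscopic and microscopic models.

2.1.1 Macroscopic models

Macroscopic traffic models are based on gas-kinetic models and use equations relating traffic density to velocity [Lighthill and Whitham, 1955, Helbing et al., 2002]. These equations can be extended with terms for the build-up and relaxation of pressure to account for phenomena such as stop-and-go traffic and spontaneous congestion [Helbing et al., 2002, Jin and Zhang, 2003, Broucke and Varaiya, 1996]. Although macroscopic models can be tuned to simulate certain driver behaviors, they do not offer a direct, flexible way of modelling and optimizing them, which makes them less suitable for our research.

2.1.2 Microscopic models

In contrast to macroscopic models, microscopic traffic models offer a way of simulating a variety of driver behaviors. A microscopic model consists of an infrastructure occupied by a set of vehicles. Each vehicle interacts with its environment according to its own rules. Depending on these rules, different kinds of behavior emerge when many vehicles interact.

Cellular automata. One specific way of designing and simulating (simple) driving rules for cars on an infrastructure is to use cellular automata (CA). CA use discrete, partially connected cells, each of which can be in a specific state. For example, a road cell can contain a car or be empty. Local transition rules determine the dynamics of the system, and even simple rules can lead to chaotic dynamics. Nagel and Schreckenberg (1992) describe a CA model for traffic simulation. At each discrete time step, vehicles increase their speed by a certain amount until they reach their maximum velocity. If a slower vehicle is driving ahead, the speed is reduced to avoid a collision. Some randomness is introduced by giving each vehicle a small chance of slowing down. Experiments showed that this CA model behaves realistically on a single road, with emergent behaviors such as the formation of start-stop waves when traffic density increases.

Cognitive multi-agent systems. A more advanced approach to traffic simulation and optimization is the Cognitive Multi-Agent System approach (CMAS), in which agents interact and communicate with each other and with the infrastructure. A cognitive agent is an entity that autonomously tries to reach some goal state with minimal effort. It receives information from the environment through its sensors, holds certain beliefs about its environment, and uses these beliefs and inputs to select an action. Because each agent is a single entity, it can optimize its way of selecting actions (for example, by using learning capabilities). Furthermore, in heterogeneous multi-agent systems different agents can have different sensors, goals, behaviors, and learning capabilities, which allows us to experiment with a very wide range of (microscopic) traffic models.

Dia (2002) used a CMAS based on a study of real drivers to model drivers' responses to travel information. In a survey taken at a congested corridor, the factors influencing the choice of route and departure time were studied. The results were used to model a driver population in which drivers respond differently to the travel information presented. Using this population, the effect of different information systems on the area where the survey was taken could be simulated. The research seems promising, although no results were presented.

2.2 Predicting Traffic

The ability to predict traffic conditions is important for optimal control. For example, if we knew from the current conditions which roads will become congested in the future, this information could be passed on to road users, who could then avoid those roads, relieving congestion in the whole system. In addition, if we can accurately predict the consequences of different driving strategies, an optimal decision (or at least one that is optimal for the predicted interval) can be made by comparing the predicted results.

The simplest form of traffic prediction at a junction is to measure traffic over a certain period and assume that traffic in the next period will be the same. In [Ledoux, 1996], neural networks are used to make long-term predictions of the queue length at a traffic light. A multi-layer perceptron [Rumelhart et al., 1986] was trained to predict the queue length for the next time step, and long-term predictions can be made by iterating the one-step prediction. The resulting network was quite accurate when predicting ten steps ahead, but it has not yet been integrated into a controller.

A traffic prediction model that has been applied to a real-life situation is described in [Wahle and Schreckenberg, 2001]. The model is a multi-agent system (MAS) in which driving agents occupy a simulated infrastructure similar to a real one. Each agent has two layers of control: one for the (simple) driving decision and one for tactical decisions such as route choice. The real-world situation was modelled using detection devices that were already installed. From these devices, information about the number of cars entering and leaving a stretch of road is obtained. Using this information, the number of vehicles taking a certain turn at each junction can be inferred. By instantiating this information in a faster-than-real-time simulator, predictions of actual traffic can be made. A system installed in Duisburg uses information from the existing traffic control center and produces real-time information on the Internet. Another system was installed on the freeway system of North Rhine-Westphalia, using data from about 2,500 inductive loops to predict traffic on 6000 km of roads.

Supervisor's comments:

Supervisor's signature:
Date (year / month / day):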
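
As a rough illustration of the cellular automaton rules of Nagel and Schreckenberg (1992) summarized in Section 2.1.2 (accelerate, brake for the vehicle ahead, slow down at random), the following Python sketch advances all cars on a single-lane road by one time step. It is not taken from the paper or from the GLD simulator. The ring-shaped road, the acceleration of one cell per step, the default parameter values, and the names nasch_step, v_max, and p_slow are all assumptions made for this example.

```python
import random


def nasch_step(positions, speeds, road_length, v_max=5, p_slow=0.3, rng=random):
    """Advance every car on a circular single-lane road by one time step.

    positions -- cell index of each car, listed in driving order around the ring
    speeds    -- current speed of each car, in cells per time step
    Assumption for this sketch: the road is a ring of `road_length` cells,
    so the car after the last one in the list is the first one.
    """
    n = len(positions)
    new_speeds = []
    for i in range(n):
        # Number of empty cells between this car and the car ahead of it.
        ahead = positions[(i + 1) % n]
        gap = (ahead - positions[i] - 1) % road_length

        v = min(speeds[i] + 1, v_max)   # 1. accelerate towards the maximum speed
        v = min(v, gap)                 # 2. brake so the car ahead is not hit
        if v > 0 and rng.random() < p_slow:
            v -= 1                      # 3. slow down with a small probability
        new_speeds.append(v)

    # 4. move all cars simultaneously using their new speeds
    new_positions = [(x + v) % road_length for x, v in zip(positions, new_speeds)]
    return new_positions, new_speeds


if __name__ == "__main__":
    rng = random.Random(1)
    pos, vel = [0, 3, 4, 10], [0, 0, 0, 0]
    for _ in range(10):
        pos, vel = nasch_step(pos, vel, road_length=20, rng=rng)
        print(pos, vel)
```

All speeds are computed from the old positions before any car moves, which mirrors the parallel update usually used for this kind of model. Because each car's speed is capped by the gap in front of it, two cars can never end up in the same cell.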