Collision Avoidance System

Introduction
Project Development
1. Environment
2. Agent
3. State
4. Policy
5. Actions
6. Models Used
7. Reward Function
8. Implementation
Results
1. Deep Q-network
2. Double Deep Q-network
3. Duelling Double Deep Q-network
4. Comparison between models
Installation

Introduction

This report documents the thesis project developed at Tongji University (同济大学) for the Double Degree project Politong.
The thesis project aims to develop a machine learning algorithm that allows an autonomous vehicle to learn to drive and avoid obstacles. The research focuses on Reinforcement Learning methods.

Project Development

The project is developed in Unreal Engine 4 (4.27.0) for the graphics and the environment, and in Python for the Machine Learning part.

As is common in a Reinforcement Learning context, the agent is placed in an environment it does not know and learns, by trial and error, a good policy for avoiding the walls. The agent receives positive or negative rewards depending on the actions it performs.

Environment

There are 2 main environments: a simple track and a complex one.

Simple Track & Complex Track

The simple track is used for training and validation. Because it is used for training, it has no long straightaways or sharp turns (left image).
The complex track is used for testing. It has more long straightaways and sharp turns (right image).

Agent

The agent is a car available on the Vehicle Variety pack asset of Unreal Engine 4.

State

The agent senses the environment around it using Light Detection and Ranging (LiDAR) sensors. The number of LiDAR sensors is an important hyper-parameter: in this case, good results were found using 32 sensors. These 32 sensors are equally spaced over an angular range of 170°, which is another important hyper-parameter.

The beam's colour depends on whether the ray has hit an obstacle: red means an object is near enough to be hit, green means no hit. There are also two other beams (yellow and blue), which are drawn for visual purposes, in particular to help humans understand how the agent learns; these are used in the Reward Function.
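As a rough illustration of the sensor layout, the sketch below computes 32 equally spaced beam angles over 170°. It assumes the fan is symmetric about the car's heading and that the two outer beams sit exactly on the edges of the range; neither detail is stated above. The `normalise` helper and its `max_range` parameter are hypothetical and only show one common way to scale distances before feeding them to a network.

```python
import numpy as np

NUM_BEAMS = 32        # number of LiDAR beams, from the text
FOV_DEGREES = 170.0   # total angular range, from the text


def beam_angles(num_beams=NUM_BEAMS, fov_degrees=FOV_DEGREES):
    """Return beam angles in degrees, relative to the car's heading.

    Assumption: the fan is symmetric about the heading (0 degrees) and
    both end beams lie exactly on the edges of the angular range.
    """
    half = fov_degrees / 2.0
    return np.linspace(-half, half, num_beams)


def normalise(distances, max_range):
    """Clip raw distances to [0, max_range] and scale them to [0, 1].

    Hypothetical helper: max_range is a placeholder, not a project value.
    """
    d = np.clip(np.asarray(distances, dtype=np.float32), 0.0, max_range)
    return d / max_range
```

With these assumptions the beams are about 5.5° apart (170 / 31), and the resulting 32-element vector of scaled distances could serve as the state.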

Policy

The policy, as mentioned earlier, is a mapping from state to action: it determines how the agent chooses an action in a given state. The goal is to improve the policy over time in order to increase the cumulative reward.
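For Q-learning models, a policy is often derived from the predicted Q-values with an epsilon-greedy rule. The sketch below shows that rule; it is a common choice, not necessarily the exploration strategy used in this project, and the function name and parameters are illustrative.

```python
import numpy as np


def epsilon_greedy(q_values, epsilon, rng):
    """Pick an action index from a vector of Q-values.

    With probability epsilon a random action is explored; otherwise the
    action with the highest Q-value is exploited.
    """
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))


# Example call: one Q-value per discrete action.
rng = np.random.default_rng(0)
action = epsilon_greedy(np.array([0.1, 0.9, 0.3, 0.2, 0.0]), epsilon=0.1, rng=rng)
```

Epsilon is typically decayed during training so that the agent explores early and exploits later.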

Actions

The agent has 2 controls: throttle and steer. These are discretised into 5 actions:

a₁, steer to the left
a₂, slightly steer to the left
a₃, go fully forward
a₄, slightly steer to the right
a₅, steer to the right
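One way to represent this discretisation in code is a lookup table from action index to a (throttle, steer) pair. The numeric values below are placeholders chosen only for illustration; the actual throttle and steering values used in the project are not given here.

```python
# Hypothetical (throttle, steer) pairs. Steer is negative to the left and
# positive to the right. These numbers are NOT the project's values.
ACTIONS = {
    0: (0.5, -1.0),   # a1: steer to the left
    1: (0.75, -0.5),  # a2: slightly steer to the left
    2: (1.0, 0.0),    # a3: go fully forward
    3: (0.75, 0.5),   # a4: slightly steer to the right
    4: (0.5, 1.0),    # a5: steer to the right
}


def decode_action(index):
    """Translate a network output index into a (throttle, steer) command."""
    if index not in ACTIONS:
        raise ValueError(f"unknown action index: {index}")
    return ACTIONS[index]
```

Keeping the table in one place means the network only ever deals with indices 0 to 4, and the control values can be tuned without retraining the output layer shape.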

Models Used

Deep Q-Network
Double Deep Q-Network
Duelling Double Deep Q-Network
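The three models differ mainly in how the learning target and the Q-values are computed. The sketch below shows the standard textbook formulations for a single transition, using NumPy arrays in place of network outputs; function names and the default `gamma` are illustrative and do not reflect the project's actual code or hyper-parameters.

```python
import numpy as np


def dqn_target(reward, done, q_next_target, gamma=0.99):
    """DQN: the target network both selects and evaluates the next action."""
    return reward + gamma * (1.0 - done) * np.max(q_next_target)


def double_dqn_target(reward, done, q_next_online, q_next_target, gamma=0.99):
    """Double DQN: the online network selects the next action and the
    target network evaluates it, which reduces over-estimation."""
    best = int(np.argmax(q_next_online))
    return reward + gamma * (1.0 - done) * q_next_target[best]


def dueling_q_values(value, advantages):
    """Duelling head: Q(s, a) = V(s) + A(s, a) - mean over a of A(s, a)."""
    advantages = np.asarray(advantages, dtype=np.float64)
    return value + advantages - advantages.mean()
```

The Duelling Double variant combines the last two ideas: the network uses the duelling head to produce Q-values, and training uses the Double DQN target.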

Reward Function

The reward function tells the agent what is correct and what is wrong by means of rewards and punishments. Popular methods need some external information, for example where the middle of the road is, where the next checkpoint is and so on. The reward function proposed in this project uses only information from the car's sensors.
In short, the reward is proportional to the direction of the car and its speed (more information in the report, section 3.6).

Implementation

The whole system can be divided into 2 parts: the environment part and the machine learning part.

The environment part is developed in Unreal Engine 4 and includes the environment itself and the physical agent; in this part, the agent senses the world, performs actions and gets the rewards.
The machine learning part is developed in Python and is where the neural networks run.
The 2 parts communicate through HTTP requests: Unreal Engine is the client and Python is the server. There are two reasons for this split and for using HTTP: Python tools (TensorFlow, etc.) are convenient for machine learning, whereas writing everything in C++ would be time-consuming and error-prone; and Unreal Engine does not allow native communication with Python scripts.
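To make the client/server split concrete, here is a minimal sketch of a Python HTTP server that receives a state and answers with an action, using only the standard library. The JSON field names (`state`, `action`), the host, the port and the placeholder policy are all assumptions made for illustration; the project's real endpoints and message format may differ.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

NUM_ACTIONS = 5  # a1..a5 from the Actions section


def choose_action(state):
    """Placeholder policy: a real server would query the trained network."""
    return NUM_ACTIONS // 2  # always index 2, "go fully forward"


def handle_payload(body):
    """Decode a JSON request body and build the JSON reply as bytes."""
    message = json.loads(body)
    action = choose_action(message["state"])
    return json.dumps({"action": action}).encode("utf-8")


class AgentHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly the number of bytes announced by the client.
        length = int(self.headers.get("Content-Length", 0))
        reply = handle_payload(self.rfile.read(length))
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(reply)))
        self.end_headers()
        self.wfile.write(reply)


def run_server(host="127.0.0.1", port=8000):
    """Serve until interrupted; host and port are placeholder values."""
    HTTPServer((host, port), AgentHandler).serve_forever()
```

Calling `run_server()` starts the loop; the Unreal Engine client would then POST the LiDAR readings at each step and apply the returned action.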

Results

Deep Q-network

Double Deep Q-network

Duelling Double Deep Q-network

Comparison between models

From these plots it is possible to see some differences between the learning paths of the models and that they were able to reach the goal and drive correctly.

More details are in the report, chapter 4.

Installation

Clone the repository

Change the IP address of the server here

Start the Unreal Engine client

Start the Python server

Start the simulation
