Pacman

A Deep Convolutional Q-Network (DCQN) was developed and trained to play the Pacman environment from Gymnasium (the maintained fork of OpenAI Gym). The project applies reinforcement learning methods so that an agent learns to navigate the maze and play Pacman on its own.

Deep Convolutional Q-Learning

Q-Learning

Q-learning is a model-free reinforcement learning algorithm used to learn the quality of actions in a given state. It operates by learning a Q-function, which represents the expected cumulative reward of taking a particular action in a particular state and then following a policy to maximize cumulative rewards thereafter.
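
As a minimal sketch of the idea (the state/action sizes and hyperparameters below are arbitrary and not taken from this project), a single tabular Q-learning update looks like this:

```python
import numpy as np

# Hypothetical sizes and hyperparameters, for illustration only.
n_states, n_actions = 10, 5
alpha, gamma = 0.1, 0.99            # learning rate and discount factor

Q = np.zeros((n_states, n_actions))  # Q-table: expected return per (state, action)

def q_learning_update(state, action, reward, next_state, done):
    # Target is the reward plus the discounted best Q-value of the next state.
    target = reward + (0.0 if done else gamma * Q[next_state].max())
    # Move the current estimate a small step toward the target.
    Q[state, action] += alpha * (target - Q[state, action])
```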

Deep Learning

Deep learning involves using neural networks with multiple layers to learn representations of data. Convolutional neural networks (CNNs) are a specific type of neural network commonly used for analyzing visual imagery. They are composed of layers such as convolutional layers, pooling layers, and fully connected layers, which are designed to extract features from images.

Combining Q-Learning with Deep Learning

In traditional Q-learning, a Q-table is used to store the Q-values for all state-action pairs. However, for complex environments with large state spaces, maintaining such a table becomes infeasible due to memory constraints. Deep Q-Networks address this issue by using a neural network to approximate the Q-function, mapping states to Q-values directly from raw input pixels.

Deep Convolutional Q-Learning

Deep Convolutional Q-Learning specifically utilizes CNNs as function approximators within the Q-learning framework. It's particularly effective in environments where the state space is represented as visual input, such as playing Atari games from raw pixel inputs.

The general steps involved in training a Deep Convolutional Q-Learning agent are as follows:

Observation

The agent observes the environment, typically represented as images or raw sensory data.

Action Selection

Based on the observed state, the agent selects an action according to an exploration strategy, such as ε-greedy.
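
A minimal sketch of ε-greedy selection (the function name and signature are illustrative, not taken from the project code):

```python
import random
import numpy as np

def epsilon_greedy(q_values, epsilon):
    """Pick a random action with probability epsilon, else the greedy action.

    q_values: 1-D array of Q-values for the current state (one entry per action).
    """
    if random.random() < epsilon:
        return random.randrange(len(q_values))   # explore
    return int(np.argmax(q_values))              # exploit
```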

Reward

The agent receives a reward from the environment based on the action taken.

Experience Replay

The agent stores its experiences (state, action, reward, next state) in a replay buffer.
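
A common way to implement such a buffer is a fixed-size deque; the sketch below is illustrative rather than the project's exact implementation:

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size buffer of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity=100_000):
        self.memory = deque(maxlen=capacity)  # oldest experiences are dropped automatically

    def push(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size=64):
        batch = random.sample(self.memory, batch_size)
        return map(list, zip(*batch))  # experiences regrouped by field

    def __len__(self):
        return len(self.memory)
```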

Training

Periodically, the agent samples experiences from the replay buffer and uses them to update the network's parameters with an optimizer such as stochastic gradient descent (SGD) or a variant like RMSprop or Adam.
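
A sketch of one such update step, assuming PyTorch, a replay buffer like the one above, and networks named q_network and target_network (all of these names are assumptions, not the project's code):

```python
import numpy as np
import torch
import torch.nn.functional as F

def train_step(q_network, target_network, optimizer, buffer, batch_size=64, gamma=0.99):
    # Sample a minibatch of stored experiences from the replay buffer.
    states, actions, rewards, next_states, dones = buffer.sample(batch_size)

    states      = torch.as_tensor(np.array(states), dtype=torch.float32)
    next_states = torch.as_tensor(np.array(next_states), dtype=torch.float32)
    actions     = torch.as_tensor(actions, dtype=torch.int64).unsqueeze(1)
    rewards     = torch.as_tensor(rewards, dtype=torch.float32).unsqueeze(1)
    dones       = torch.as_tensor(dones, dtype=torch.float32).unsqueeze(1)

    # Q-values the network currently predicts for the actions that were taken.
    q_pred = q_network(states).gather(1, actions)

    # Bootstrapped targets computed from the (frozen) target network.
    with torch.no_grad():
        q_next = target_network(next_states).max(dim=1, keepdim=True)[0]
        q_target = rewards + gamma * q_next * (1.0 - dones)

    # One optimizer step on the error between prediction and target.
    loss = F.mse_loss(q_pred, q_target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```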

Target Network

To stabilize training, a separate target network may be used to calculate target Q-values during updates. This network is periodically updated with the parameters from the primary network.
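
The text above only says the target network is updated periodically; two common schemes are a periodic hard copy and a soft (Polyak) update, both sketched here for illustration:

```python
def hard_update(q_network, target_network):
    # Copy the online network's weights wholesale every N training steps.
    target_network.load_state_dict(q_network.state_dict())

def soft_update(q_network, target_network, tau=1e-3):
    # Blend target parameters slowly toward the online network (Polyak averaging).
    for target_param, param in zip(target_network.parameters(), q_network.parameters()):
        target_param.data.copy_(tau * param.data + (1.0 - tau) * target_param.data)
```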

Iteration

The process of observation, action selection, and training continues iteratively until convergence.

By leveraging deep convolutional neural networks, Deep Convolutional Q-Learning has demonstrated remarkable success in learning effective control policies directly from high-dimensional sensory input, making it a powerful technique for complex reinforcement learning problems, particularly visual tasks such as playing video games or robotic control.

Overview of Pacman Environment

[Video demonstration of the trained agent]

Description

The Pacman environment is a classic arcade game where the player controls Pacman, a character moving around a maze, eating food pellets while avoiding ghosts. When Pacman consumes a Power Pellet, it gains the ability to eat the ghosts.

Action Space

The action space for Pacman is Discrete(5), meaning there are five possible actions Pacman can take. These actions include moving up, down, left, right, or taking no action (NOOP). Optionally, with full_action_space=True, all 18 possible actions that can be performed on an Atari 2600 can be enabled.

Observation Space

The observations in the Pacman environment can be of three types: "rgb", "grayscale", and "ram". If "rgb" is chosen, the observation space is Box(0, 255, (210, 160, 3)), representing a color image with three color channels (RGB). If "ram" is chosen, the observation space is Box(0, 255, (128,)), representing the RAM (Random Access Memory) of the Atari 2600. If "grayscale" is chosen, the observation space is Box(0, 255, (210, 160)), representing a grayscale version of the RGB observation.
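
These spaces can be inspected directly with Gymnasium. The environment id ALE/Pacman-v5 and the extra Atari dependencies shown below are assumptions, not taken from the project files:

```python
import gymnasium as gym

# Assumes the Atari extras are installed, e.g.
#   pip install "gymnasium[atari,accept-rom-license]"
# and that the environment id is ALE/Pacman-v5.
env = gym.make("ALE/Pacman-v5", obs_type="rgb")

print(env.action_space)        # Discrete(5)
print(env.observation_space)   # Box(0, 255, (210, 160, 3), uint8)

state, info = env.reset()
state, reward, terminated, truncated, info = env.step(env.action_space.sample())
env.close()
```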

The Architecture of the Neural Network

Convolutional Layers

The network consists of four convolutional layers (conv1, conv2, conv3, and conv4). These layers are responsible for learning hierarchical features from the input image (state). Each convolutional layer applies a set of filters to the input image to extract relevant features. The output channels for each convolutional layer are 32, 64, 64, and 128, respectively. The kernel sizes for the convolutional layers are 8x8, 4x4, 3x3, and 3x3, and the strides are 4, 2, 1, and 1, respectively.

Batch Normalization Layers

Batch normalization layers (bn1, bn2, bn3, and bn4) are applied after each convolutional layer to normalize the activations. This helps in stabilizing and accelerating the training process by reducing internal covariate shift.

Fully Connected Layers

Following the convolutional layers, there are three fully connected layers (fc1, fc2, and fc3). These layers are responsible for transforming the high-dimensional feature maps from the convolutional layers into Q-values for each action. The first fully connected layer has 512 output features, the second has 256 output features, and the third has action_size output features, which corresponds to the dimension of the action space.

Activation Function

The ReLU (Rectified Linear Unit) activation function is applied after every layer except the output layer. ReLU introduces non-linearity into the network, allowing it to learn complex relationships in the data.

Forward Pass

In the forward method, the input state is passed through the convolutional layers with ReLU activation and batch normalization applied at each step. Then, the output is flattened (view) to be fed into the fully connected layers. Finally, the Q-values for each action are obtained as the output of the last fully connected layer (fc3).
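
A sketch of a network matching the layers described above. The input resolution (and therefore the flattened size feeding fc1) is an assumption, here 128×128 RGB frames, and is computed dynamically rather than hard-coded:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Network(nn.Module):
    """DCQN with the layer sizes described above (input resolution is an assumption)."""

    def __init__(self, action_size, input_shape=(3, 128, 128)):
        super().__init__()
        self.conv1 = nn.Conv2d(input_shape[0], 32, kernel_size=8, stride=4)
        self.bn1 = nn.BatchNorm2d(32)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=4, stride=2)
        self.bn2 = nn.BatchNorm2d(64)
        self.conv3 = nn.Conv2d(64, 64, kernel_size=3, stride=1)
        self.bn3 = nn.BatchNorm2d(64)
        self.conv4 = nn.Conv2d(64, 128, kernel_size=3, stride=1)
        self.bn4 = nn.BatchNorm2d(128)

        # Work out the flattened feature size for the chosen input resolution.
        with torch.no_grad():
            n_flat = self._features(torch.zeros(1, *input_shape)).shape[1]

        self.fc1 = nn.Linear(n_flat, 512)
        self.fc2 = nn.Linear(512, 256)
        self.fc3 = nn.Linear(256, action_size)

    def _features(self, x):
        # Convolution -> batch norm -> ReLU, four times, then flatten.
        x = F.relu(self.bn1(self.conv1(x)))
        x = F.relu(self.bn2(self.conv2(x)))
        x = F.relu(self.bn3(self.conv3(x)))
        x = F.relu(self.bn4(self.conv4(x)))
        return x.view(x.size(0), -1)

    def forward(self, state):
        x = self._features(state)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)   # one Q-value per action
```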

License

This project is licensed under the MIT License - see the LICENSE file for details.
