godka / pensieve-ppo

The simplest implementation of Pensieve (SIGCOMM '17) via state-of-the-art RL algorithms, including PPO, DQN, and SAC, with support for both TensorFlow and PyTorch.

Home Page: https://godka.github.io/Pensieve-PPO/

License: BSD 2-Clause "Simplified" License

pensieve reinforcement-learning a2c ppo dqn tensorflow deep-learning pytorch

pensieve-ppo's Introduction

Pensieve PPO

Updates

May 4, 2024: We removed Elastic, revised BOLA, and added the new baselines Comyco [3] and Genet [2].

Jan. 26, 2024: We are excited to announce significant updates to Pensieve-PPO! We have replaced TensorFlow with PyTorch while achieving similar training speed and comparable model performance.

For the TensorFlow version, please check Pensieve-PPO TF Branch.

Dec. 28, 2021: In a previous update, we enhanced Pensieve-PPO with several state-of-the-art technologies, including Dual-Clip PPO and adaptive entropy decay.
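
For reference, a minimal PyTorch sketch of the dual-clip surrogate objective mentioned above (the function name, default hyperparameters, and tensor shapes are illustrative, not the repository's exact code):

import torch

def dual_clip_ppo_loss(log_prob, old_log_prob, adv, clip_eps=0.2, dual_clip=3.0):
    # probability ratio between the updated policy and the old (behaviour) policy
    ratio = torch.exp(log_prob - old_log_prob)
    # standard PPO clipped surrogate objective
    surr = torch.min(ratio * adv, torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * adv)
    # dual clip: for negative advantages, bound the objective from below by dual_clip * adv
    surr = torch.where(adv < 0, torch.max(surr, dual_clip * adv), surr)
    return -surr.mean()  # negate because optimizers minimize

The extra max(., dual_clip * adv) term only activates for negative advantages, so the clipped objective cannot push the new policy arbitrarily far when the ratio becomes very large.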

About Pensieve-PPO

Pensieve-PPO is a user-friendly PyTorch implementation of Pensieve [1], a neural adaptive video streaming system. Unlike the original A3C-based implementation, it is trained with the Proximal Policy Optimization (PPO) algorithm.

This stable version of Pensieve-PPO includes both the training and test datasets.

You can start training by executing the following command:

python train.py

The model is evaluated on the test set (HSDPA traces) every 300 epochs.

TensorBoard Integration

To monitor the training process in real time, you can use TensorBoard. Simply run the following command:

tensorboard --logdir=./
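
If you add your own metrics, here is a minimal sketch of how scalars can be written so that TensorBoard picks them up, assuming the PyTorch SummaryWriter (the tag name and log directory are illustrative):

from torch.utils.tensorboard import SummaryWriter

# the log directory must sit under whatever you pass to `tensorboard --logdir=...`
writer = SummaryWriter(log_dir='./runs/demo')
for epoch in range(10):
    avg_reward = 0.1 * epoch  # placeholder value; log your real training metric here
    writer.add_scalar('train/avg_reward', avg_reward, epoch)
writer.close()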

Pretrained Model

We have also added a pretrained model, which can be found at this link. It demonstrates a substantial improvement of 7.03% (from 0.924 to 0.989) in average Quality of Experience (QoE) over the original Pensieve model [1]. For a more detailed performance analysis, refer to the comparison figures in the repository.

If you have any questions or require further assistance, please don't hesitate to reach out.

Additional Reinforcement Learning Algorithms

For more implementations of reinforcement learning algorithms (e.g., DQN and SAC), please visit the corresponding branches of this repository.

[1] Mao, Hongzi, Ravi Netravali, and Mohammad Alizadeh. "Neural adaptive video streaming with Pensieve." Proceedings of the Conference of the ACM Special Interest Group on Data Communication (SIGCOMM). 2017: 197-210.

[2] Xia, Zhengxu, et al. "Genet: Automatic curriculum generation for learning adaptation in networking." Proceedings of the ACM SIGCOMM 2022 Conference. 2022.

[3] Huang, Tianchi, et al. "Comyco: Quality-aware adaptive video streaming via imitation learning." Proceedings of the 27th ACM International Conference on Multimedia. 2019.

pensieve-ppo's People

Contributors

dependabot[bot], godka, kasimte, ruixiaozhang, shuffle0412


pensieve-ppo's Issues

The time to train the model

Could you give me a hint on how long it takes to train the model for 1,000,000 epochs? Several days, or weeks? Many thanks.

TQL

tql ("太强了", i.e. "this is awesome")

a question about compute_v

Hi godka,
thanks for your work, but I'm confused by your calculation of the advantage.

Pensieve-PPO/src/ppo2.py

Lines 150 to 151 in f310bf1

for t in reversed(range(ba_size - 1)):
    R_batch[t, 0] = r_batch[t] + GAMMA * R_batch[t + 1, 0]

You define R_t = r_t + GAMMA * R_{t+1}, but the loop iterates in reverse. Does that mean it actually computes R_t = r_t + GAMMA * R_{t-1}?
Is this a typo?
Looking forward to your reply, thanks.
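
For context, a small NumPy sketch (not the repository's code) of the reversed recursion under discussion: iterating t backwards means R[t + 1] is already final when R[t] is filled in, so the update really is R_t = r_t + GAMMA * R_{t+1}:

import numpy as np

GAMMA = 0.99
r = np.array([1.0, 0.5, 0.2, 0.0])   # example rewards
R = np.zeros_like(r)
R[-1] = r[-1]                        # last entry acts as the bootstrap value
for t in reversed(range(len(r) - 1)):
    R[t] = r[t] + GAMMA * R[t + 1]   # R[t + 1] is already final because we iterate backwards
print(R)                             # discounted returns R_t = r_t + GAMMA * R_{t+1}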

setting entropy to TD_loss summary vars

In train.py, why did you feed actor._entropy into summary_vars[0], which is the TD_loss summary?
You already compute avg_entropy and feed it into summary_vars[2].

 summary_str = sess.run(summary_ops, feed_dict={
     summary_vars[0]: actor._entropy,
     summary_vars[1]: avg_reward,
     summary_vars[2]: avg_entropy
 })
 writer.add_summary(summary_str, epoch)
 writer.flush()

Could you please clarify?
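
For readers unfamiliar with this TF1-style summary setup, here is a hedged sketch of how such summary_vars are typically wired up (placeholder and tag names are illustrative, not the repository's exact code); whichever value is fed to summary_vars[0] ends up plotted under the first tag:

import tensorflow as tf  # TensorFlow 1.x style, matching the snippet above

td_loss_ph = tf.placeholder(tf.float32)
avg_reward_ph = tf.placeholder(tf.float32)
avg_entropy_ph = tf.placeholder(tf.float32)
tf.summary.scalar('TD_loss', td_loss_ph)
tf.summary.scalar('Avg_reward', avg_reward_ph)
tf.summary.scalar('Avg_entropy', avg_entropy_ph)
summary_ops = tf.summary.merge_all()
summary_vars = [td_loss_ph, avg_reward_ph, avg_entropy_ph]
# whatever value is fed into summary_vars[0] is plotted under the 'TD_loss' tag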

Monitor cross-validation curve

I was trying to monitor the TD_loss and rewards in TensorBoard by launching it with this command:

tensorboard --logdir=./results

However, I couldn't see the validation curve from rl-test.py; I can only see the TD_loss, entropy, and rewards from train.py.
How can I enable the cross-validation curve so that I can monitor model convergence?
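
One common pattern for getting a separate validation curve into the same TensorBoard view is to write the test-set metric with its own writer whose log directory also sits under --logdir; a minimal sketch, assuming torch.utils.tensorboard and illustrative tags and directories (not the repository's setup):

from torch.utils.tensorboard import SummaryWriter

train_writer = SummaryWriter(log_dir='./results/train')   # both dirs sit under --logdir=./results
valid_writer = SummaryWriter(log_dir='./results/valid')

for epoch in range(30):
    train_writer.add_scalar('reward', 0.10 * epoch, epoch)       # placeholder training metric
    if epoch % 10 == 0:                                          # periodic test-set evaluation
        valid_writer.add_scalar('reward', 0.09 * epoch, epoch)   # placeholder validation metric

train_writer.close()
valid_writer.close()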

SAC import error

In train_sac.py there is an import, from sac import Network, which raises an error.

How did you install Soft Actor-Critic (SAC)?
The default installation via pip install sac leads to a "Network not found" error.

How is your baseline method implemented?

I can see from your experimental results comparison chart that there are many algorithms, including BOLA, HYB, RB, etc. Could the implementation details of these algorithms be made public?

Way to replicate baseline results for different data sets

Hey, I am exploring GitHub for solutions similar to or based on Pensieve by Hongzimao. I am looking for a way to create a benchmark platform for Pensieve vs. non-RL ABR implementations popular in research, like BB, RB, BOLA, MPC, and the others included here in the baseline directory. I have already created tools for generating different datasets based on https://dash.itec.aau.at/dash-dataset/. This is part of my master's degree work, so it would be very helpful to receive some hints.

Training with Multiple videos with random number of bitrates masked.

We are trying to train this model on multiple videos using PPO with different masks (a variable number of bitrates masked). For example, there is currently a maximum of 12 bitrates, and some of them are randomly masked in different videos: sometimes only 9 of them are available, sometimes only 6, and so on.

So far we have been using the same dataset as for A3C, and we have modified abr.py and abrenv.py accordingly. But with our approach there seem to be issues in experience-batch creation and its further processing. Can you share a reference on how to create a video dataset for PPO with different masks?
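
One common way to handle a variable number of available bitrates is to mask the logits of the unavailable actions before the softmax so that they receive zero probability; a minimal PyTorch sketch under that assumption (the shapes and the example mask are illustrative, not the repository's code):

import torch
import torch.nn.functional as F

logits = torch.randn(4, 12)                       # batch of 4 states, 12 candidate bitrates
mask = torch.zeros(4, 12)
mask[:, :6] = 1.0                                 # e.g. only the first 6 bitrates exist for this video
masked_logits = logits.masked_fill(mask == 0, float('-inf'))
probs = F.softmax(masked_logits, dim=-1)          # masked actions get probability 0
action = torch.multinomial(probs, num_samples=1)  # samples only among the available bitrates

If you go this route, the mask typically has to be stored alongside each transition so that the same masked distribution is used when the batch is replayed during the PPO update.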

a2c vs ppo NN architecture

According to the code, PPO uses the same network architecture (actor-critic), state space, and reward design as A2C.
The difference between A2C and PPO resides in clipping the policy update, right?

 # adaptive entropy weight
 # https://arxiv.org/abs/2003.13590
 p_batch = np.clip(p_batch, ACTION_EPS, 1. - ACTION_EPS)
 _H = np.mean(np.sum(-np.log(p_batch) * p_batch, axis=1))
 _g = _H - self.H_target
 self._entropy_weight -= self.lr_rate * _g * 0.1

Therefore, the NN architecture of Pensieve-PPO looks like Pensieve's.

Is that so, or does the architecture have another design?
Could you please share the design and the paper?

Thanks
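
To make the contrast in this question concrete, here is a schematic PyTorch sketch of the two policy losses (names are illustrative, not the repository's code); the value loss, network, state, and reward can stay identical, and the PPO variant only changes how the policy-gradient term is weighted and clipped:

import torch

def a2c_policy_loss(log_prob, adv):
    # vanilla policy gradient, weighted by the advantage
    return -(log_prob * adv).mean()

def ppo_policy_loss(log_prob, old_log_prob, adv, clip_eps=0.2):
    # importance ratio between the new and old policies, clipped to limit each update
    ratio = torch.exp(log_prob - old_log_prob)
    surr = torch.min(ratio * adv, torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * adv)
    return -surr.mean()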

How to improve exploration?

Which parameter enables better exploration in PPO? Is it EPS?
How can I control the exploration-exploitation trade-off in PPO?
Do you recommend a good approach for this purpose?
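
As background for this kind of question: in on-policy setups like this one, exploration is usually governed mainly by the entropy bonus weight (adapted automatically in the snippet above) rather than by the clip range; a schematic sketch of where that weight enters the loss (illustrative names, not the repository's code):

import torch
import torch.nn.functional as F

def pg_loss_with_entropy_bonus(logits, actions, adv, entropy_weight=0.02):
    # actions: LongTensor of shape (batch,), adv: FloatTensor of shape (batch,)
    log_probs = F.log_softmax(logits, dim=-1)
    entropy = -(log_probs.exp() * log_probs).sum(dim=-1).mean()    # higher entropy = more exploration
    action_log_prob = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)
    # a larger entropy_weight keeps the policy closer to uniform, i.e. more exploratory
    return -(action_log_prob * adv).mean() - entropy_weight * entropy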
