prosysscience / JSSEnv
An OpenAI Gym environment for the Job Shop Scheduling problem.
License: MIT License
When I run the main.py program in the JSS project, the Gantt chart is never displayed on screen, even though the results in wandb indicate that the program is running well. I am quite confused about why this happens. Could you please give me some suggestions? Thanks in advance! :)
Hello!
I am a complete beginner with OpenAI Gym and I am not entirely sure how to use the repository.
What would be the quickest way to run the implementation (e.g. on Google Colab) on an example dataset (like the ones under JSS > instances), and then get a nice Gantt chart like the one in your example?
Hello there,
I am working in a very similar area, namely using reinforcement learning to optimize the semiconductor manufacturing process, and your implementation is easy to follow and has been very helpful. I just wanted to ask a question here: how are you preventing overfitting?
I can see that you are training the model on just one instance from your instances folder and iterating on it for a while, so how are you making sure the model does not overfit? I believe it should be trained on multiple instances to make it more stable. If you are already training it on multiple instances, ignore my question, but please help me by pointing out how you achieved that in your implementation.
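One common way to reduce this kind of overfitting (not something the repo does out of the box, as far as I can tell — it trains on a single instance) is to resample the instance file at every episode reset so the agent sees many layouts. A minimal sketch with a hypothetical `MultiInstanceSampler` helper; the class name and usage are my own, not part of the repo:

```python
import random

class MultiInstanceSampler:
    """Hypothetical reset-time helper: draw a training instance uniformly at
    random so each episode runs on a different problem layout."""

    def __init__(self, instance_paths, seed=0):
        self.instance_paths = list(instance_paths)
        self.rng = random.Random(seed)

    def next_instance(self):
        # Called once per env.reset(); the returned path would be fed into the
        # env's instance-loading logic.
        return self.rng.choice(self.instance_paths)
```

A curriculum variant (cycling instances in order, or weighting harder instances more) drops in the same place.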
Cheers,
Smita
Hello there,
Thank you for the environment implementation, it has been a great help for me.
While going through your paper, I read that you want to prioritize the non-final operations, but I did not understand how your implementation achieves that:
```python
if len(non_final_job) > 0:
    for job in final_job:
        current_time_step_final = self.todo_time_step_job[job]
        time_needed_legal = self.instance_matrix[job][current_time_step_final][1]
        if time_needed_legal > min_non_final:
            self.legal_actions[job] = False
            self.nb_legal_actions -= 1
```
In the implementation of the function dedicated to this non-final-operation prioritization, there are a few things that I found quite confusing.
Any explanation will be highly appreciated.
Thanks in advance!
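For what it's worth, the masking logic in the snippet above can be restated as a standalone sketch. The names `final_jobs`, `time_needed`, `min_non_final`, and `legal` are hypothetical stand-ins for the env's attributes: a job whose pending final operation takes longer than the shortest legal non-final operation is made illegal, which steers the agent toward non-final work first.

```python
def mask_long_final_jobs(final_jobs, time_needed, min_non_final, legal):
    # Restated sketch of the masking above: a job whose pending FINAL operation
    # takes longer than the shortest legal NON-final operation is marked
    # illegal, so the agent must pick a non-final operation instead.
    masked = dict(legal)
    for job in final_jobs:
        if time_needed[job] > min_non_final:
            masked[job] = False
    return masked
```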
```
JSSEnv.py, line 367, in increase_time_step
    next_time_step_to_pick = self.next_time_step.pop(0)
IndexError: pop from empty list
```
Most of the conditions there should never happen, yet sometimes this error occurs, and I haven't found any logic error. The result above is from the ta01 dataset. Could you help me, please?
In your paper "A Reinforcement Learning Environment For Job-Shop Scheduling", I saw that you used linear schedulers for the learning rate and the entropy coefficient, varying each parameter linearly from 0.0006861 to 0.00007783 and from 0.0002458 to 0.002042, respectively.
I am confused about the learning rate and the entropy coefficient. Is this learning rate the Actor-Net's learning rate? Is the entropy coefficient the Critic-Net's learning rate?
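For context: in most PPO implementations there is a single learning rate shared by the whole network (actor and critic heads), and the entropy coefficient is a weight on the entropy bonus in the loss, not a learning rate. A linear scheduler of the kind the paper describes can be sketched as follows; the function name and signature are hypothetical:

```python
def linear_schedule(start, end, step, total_steps):
    # Linearly interpolates from `start` at step 0 to `end` at `total_steps`,
    # clamping afterwards; works both for decaying values (start > end, e.g.
    # the learning rate) and growing ones (start < end).
    frac = min(max(step / total_steps, 0.0), 1.0)
    return start + frac * (end - start)
```

For example, `linear_schedule(0.0006861, 0.00007783, t, T)` would give the paper's learning-rate range over a training run of T steps.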
Hi there,
You did a great job, and thanks for sharing your work. I was wondering whether additional logic, such as a constraint on job selection or job availability implemented by changing the legal_job array, makes sense. For me, it is important to prioritize jobs and define more constraints on a single activity of a job. I know the environment is not built around this logic, but can you give me a hint on how I could implement it? Just by changing the legal-job status?
I use JSSEnv project as environment
![圖片1](https://user-images.githubusercontent.com/88368771/168755281-015ba7bb-d5fb-43c7-bf35-1c64193069a6.png)
(here is my code)
I used the render() function to render the image, but it didn't work; the GIF doesn't appear. I would like to ask how I can make the Gantt chart appear successfully when using the JSSEnv project.
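If render() produces nothing (common in headless sessions or notebooks where no figure window can open), one workaround is to draw the schedule yourself and save it to a file. Below is a minimal Gantt sketch with matplotlib; the `schedule` layout (machine id mapped to a list of (start, duration) bars) is my own assumption for illustration, not the env's internal format:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend: render to a file, no display needed
import matplotlib.pyplot as plt

def save_gantt(schedule, path="gantt.png"):
    # Draw one horizontal bar per scheduled operation and save the figure.
    # `schedule` maps machine id -> list of (start_time, duration) tuples.
    fig, ax = plt.subplots()
    for machine, bars in schedule.items():
        ax.broken_barh(bars, (machine - 0.4, 0.8))
    ax.set_xlabel("time")
    ax.set_ylabel("machine")
    fig.savefig(path)
    plt.close(fig)
    return path
```

In a notebook you could drop the "Agg" backend and call `plt.show()` instead of saving.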
Dear Prosysscience Team,
I am using your code and your paper for my Masterthesis. It has been a great help, and I want to thank you for that!
Since I am trying to modify your env for my purposes, I would like to understand how the method check_no_op works.
I don't understand the purpose of line 259, that is:
```python
if len(machine_next) == self.nb_machine_legal:
```
It would be a great help if you could explain this in more detail. Thank you!
It seems like the requirements and tests are no longer up to date and are now breaking.
First of all, congratulations! This is an interesting work for DRL solving JSSP.
I have read through the paper and quite enjoyed it. I have a question regarding the reward design; if you could elaborate a bit, I would much appreciate it. Thanks in advance.
According to the paper, the reward is designed as:
My question is the following:
I hypothesize that the processing-time term makes no difference.
No matter what decision the agent makes, the total cumulated reward will always contain the same total processing time over all operations, i.e., it is permutation-invariant. For example, p11 + p12 + p13 + ... = p13 + p12 + p11 + ... = p12 + p13 + p11 + ...
Therefore it makes no difference to the agent, which will collect the same amount of this reward in any case. The total idle time, however, makes a lot of sense to me. So I am wondering: have you ever tested a reward based purely on the total idle time, i.e., with the processing-time term removed?
I understand that including the processing time makes the reward denser than using the total idle time alone.
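The permutation-invariance claim above is easy to sanity-check numerically: summing a fixed multiset of processing times gives the same total in every order, so the processing-time term contributes an identical cumulated amount to every schedule that completes all operations.

```python
import itertools

# Tiny numerical check: the sum of a fixed set of processing times is the same
# under every permutation, i.e. under every completed schedule ordering.
times = [3, 5, 2, 7]
totals = {sum(perm) for perm in itertools.permutations(times)}
print(totals)  # a single value: {17}
```

This only says the *cumulated* reward is schedule-independent; the per-step shaping (reward density) can still differ between schedules, which is presumably why the term is kept.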
Occasionally my environment (and thus my ray workers) crash at the beginning of training.
I observed two cases so far:
Obviously the random actions steer the environment into a bad place.
How did you handle this during your own training? Currently I can't train my agents, because they crash when the env crashes. (I'm using Ray[rllib].)
Code to reproduce the error:
```python
import gym
import JSSEnv  # importing the package should register 'JSSEnv-v1' with Gym

env_config = {"instance_path": "\\instances\\ta20"}
env = gym.make('JSSEnv-v1', env_config=env_config)
obs = env.reset()
while True:
    action = env.action_space.sample()
    obs, reward, done, _ = env.step(action)
    env.render()
    if done:
        print("Episode ended")
        break
env.close()
```
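One way to avoid crashes from random exploration is to sample only among legal actions instead of calling `env.action_space.sample()`, which can pick illegal ones. A sketch, assuming the observation exposes a 0/1 legal-action mask (adjust the key to whatever your env actually returns; the helper name is mine):

```python
import numpy as np

def sample_legal_action(action_mask, rng=None):
    # Pick uniformly among actions whose mask entry is nonzero, and fail loudly
    # instead of silently stepping with an illegal action.
    rng = rng or np.random.default_rng()
    legal = np.flatnonzero(action_mask)
    if legal.size == 0:
        raise RuntimeError("no legal action available")
    return int(rng.choice(legal))
```

In the reproduction loop above, `action = sample_legal_action(obs["action_mask"])` would replace the unmasked `sample()` call, assuming the observation dict carries such a mask.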
You have done great work.
But I don't understand how we can apply your model to our own files and get a beautiful chart like yours. Can you describe a step-by-step method for that? It would make the project more accessible.
Suppose I have a Taillard specification file; what do I do next?
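As a starting point, the instance files in the repo follow the common Taillard-style layout: a first line with the number of jobs and machines, then one line per job of alternating (machine id, processing time) pairs. A minimal parser sketch (my own helper, not part of the repo) to inspect such a file before feeding its path to the env:

```python
def parse_taillard(path):
    # Parses the Taillard-style layout: first line "<n_jobs> <n_machines>",
    # then one line per job with alternating (machine_id, processing_time)
    # pairs describing the job's operations in order.
    with open(path) as f:
        n_jobs, n_machines = map(int, f.readline().split())
        jobs = []
        for _ in range(n_jobs):
            vals = list(map(int, f.readline().split()))
            jobs.append([(vals[i], vals[i + 1]) for i in range(0, len(vals), 2)])
    return n_jobs, n_machines, jobs
```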
Still investigating the legal and illegal actions.. (:
I noticed that the following code in test_solutions.py does not check for illegal actions:
```python
if no_op and not done:
    self.assertTrue(len(env.next_time_step) > 0, "step {}".format(step_nb))
    previous_time_step = env.current_time_step
    state, reward, done, _ = env.step(env.jobs)
    self.assertTrue(env.current_time_step > previous_time_step, "we increase the time step")
```
But sometimes the no-op action is not legal when it gets used.
I am not sure whether this is an issue with the test design or whether there is a risk that the agent can't reproduce the solutions on its own.
How can I get the makespan curve and the training reward over the course of the training process?
Hello,
While trying to execute the related RL project in my development environment,
I encountered two very basic issues.
This prompted me to add proper Gym registration (see my fork of JSSEnv) and corresponding adjustments to the RLlib execution files (see my fork of RL-Job-Shop-Scheduling).
Kindly review both branches and feel free to merge and adopt.
BR Philipp
Hello, after studying your code, I would like to ask how to get the state representation and action-distribution updates at each time step from your encapsulated environment. I can't print the intermediate state of each step.