pyjobshop / pyjobshop Goto Github PK

Solving job shop scheduling problems with constraint programming in Python.

Home Page: https://pyjobshop.readthedocs.io/

License: MIT License

Python 100.00%

constraint-programming scheduling flow-shop job-shop makespan operations-research optimization parallel-machines flexible-job-shop permutation-flow-shop

pyjobshop's Introduction

Note

This package is under development. Expect things to break significantly during the v0.0.x phase.

PyJobShop

PyJobShop is a package for implementing scheduling models in Python. It supports the classically known flexible job shop problem (FJSP) and many of its extensions such as arbitrary precedence relations, sequence-dependent setup times, and many more! The implementation is currently work in progress - feel free to open an issue if you have any questions.

Installation

First, clone this repository to your local setup and cd into it. Then, make sure you have Poetry version 1.2+ installed, and run the following command:

poetry install

This will set-up a virtual environment and install all necessary dependencies.

Installing CPLEX

Running the CP model and solving it requires the optimization engine from IBM CPLEX. See their documentation for more details. If you install the free community edition, you can only solve models with up to 1000 variables and 1000 constraints. Models beyond that size require a paid version or an academic version.

pyjobshop's People

Contributors

Stargazers

Watchers

pyjobshop's Issues

Optional operations and processing plans

It’s currently assumed that all operations must be scheduled. This means that don’t allow for prize-collecting variants of the problem. It may be interesting to look into this at some point and find related literature to see if it’s worth it to include it in our models.

Document CPLEX installation

https://www.ibm.com/academic/home

Rename machine to resource?

Advantage:

Used in Dauzere-Perez et al. (2023).
Generalizes machines to arbitrary resources.

Disadvantage:

The term machine is much more explicit.

Contradiction when using interval size and size_of constraint

This leads to a contradiction in a very simple flexible job-shop problem:

    for op in data.operations:
        m.add_interval_var(name=f"O_{op.id}")

        for idx, machine in enumerate(op.machines):
            var = m.add_interval_var(
                optional=True,
                name=f"A_{op.id}_{machine.id}",
                size=op.durations[idx],
            )
            # The duration of the operation on the machine is at least the
            # duration of the operation; it could be longer due to blocking.
            m.add(m.size_of(var) == op.durations[idx])  # TODO contradiction

Job deadlines

Then we can also model tardiness.

This addition is very straightforward. We can add a deadline attribute to jobs.

Auxilliary resources

Sometimes the processing of an operation requires an auxilliary resource (e.g., worker that oversees the production).

This can be modeled as a RFJSP. The auxilliary resource is just another machine.
For an operation that requires both machine and another resource, we can make two operations.
The first one is simply the processing operation like defined before.
The second one has eligible machines corresponding to its auxilliary resources.
The first and second operation must be synced together.
This means that we need a Synchronize timing constraint.

Fix transport precedence

https://github.com/leonlan/fjsp/blob/fd937421ce2443deb7912fcbb17daf8753707b03/cp.py#L106-L118

Set fixed interval start/end/duration for operations

Some applications may require re-optimizing schedules, where some part of the schedule is already fixed. We should allow users to pass start, end and duration values when constructing Operations, so they can fix those decisions already. We don't need to do it for machines, because that can already be done with the current interface where a user only passes one machine.

If we implement #39, then we should do the same for that.

Documentation

Philosophy: stick as much to PyVRP's documentation styling so we can reuse code, and newly learned things can be applied back.

Fix transportation constraints

https://github.com/leonlan/fjsp/blob/28dafbce8791386c7b7e47db72ec1622731c40bc/fjsp/CpModel.py#L137-L157

This constraint is not correct yet.

To recap:

If there are two operations $i$ and $j$, let $M_i$ denote the eligible machine set for $i$ and define $M_j$ similarly.
Assume that there is an edge $i \rightarrow j$, implying that $i$ must come before $j$. (This is not always true!)
Let $m_i \in M_i$ and $m_j \in M_j$ denote some machine pair.

Nowe we distinguish two cases:

There is a machine edge between $m_i$ and $m_j$. Then there are no restrictions: we can schedule $i$ on $m_i$ or not, and we can schedule $j$ on $m_j$ or not, independent of the other choice.
There is no machine edge between $m_i$ and $m_j$. Then that means that if I schedule on $i$ on $m_i$, then we cannot schedule the operations on $m_j$, because there is an accessibility restriction. Therefore, the presence of $i$ on $m_i$ forces the non-presence of $j$ on $m_j$.

The correct constraint is therefore:
$$\text{PresenceOf}(A) + \text{PresenceOf}(B) \leq 1.$$

Processing modes

Kis (2003) generalizes operations to operate under processing modes instead of a set of eligible machines.
Let $M$ denote a mode, which consists of a set of machine-consumption tuples $(m_1, r_1), (m_2, r_2), \dots$, and takes $p_{iM}$ time.
Let $\mathcal{M}$ denote a set of modes. Processing an operation means that exactly one processing mode must be selected. Note that this generalizes the processing times, which are modes of a single machine with load 1.

The proposed model interface is something like

def add_processing_mode(self, duration: int, *modes: tuple[Machine, int]):
    modes_ = ((self._machine2id(machine), load) for machine, load in modes)
    self._processing_modes[modes_] = duration

FJSP entities and attributes

The survey by

Xiong, H., Shi, S., Ren, D., & Hu, J. (2022). A survey of job shop scheduling problem: The types and models. Computers & Operations Research, 142, 105731. https://doi.org/10.1016/j.cor.2022.105731

has a convenient way of classifying entities and their attributes for the JSP. This will be useful to to design/extend our ProblemData model, and this is also relevant for PyJobShop/FJSPLIB#4 when we want to design a standard benchmark format.

Batching constraints

Batching looks very similar to the contrasets. It's slightly more complicated because it must be decided what to batch together.

Thoughts:

EndToEnd if operations are batched together.
May overlap on a machine for "batched" operations.
Only question is how to decide which operations to overlap together. Introduce a spanning interval variable?

Matheuristic

CP matheuristic not tried before.

https://www.youtube.com/watch?v=4Mkcf-t7VS4&embeds_referring_euri=https%3A%2F%2Fschedulingseminar.com%2F&source_ve_path=MjM4NTE&feature=emb_title

Solution evaluation based on sequences/assignments

#7 introduces a Solution class that asssumes a solution in terms of its scheduled operations and timing decisions. This is easy to get from a solved CP model, but when using heuristics, obtaining the timing constraints is not as easy. We should provide another way to construct a Solution purely based on operation to machine assignments, and then evaluate the timing decisions ourselves on the operations graph.

Custom objective functions

It would be nice to define a class that describes an objective function.

Makepsan: $\max_{i \in O} C_i$
Total completion time: $\sum_{j \in J} \max_{i \in O_j} C_i$
Total tardiness: $\sum_{j \in J} \max ( \max_{i \in O_j} C_i- d_j, 0)$

While at the same time, users should be able to define their own custom objective functions. One use case is when some operations don't contribute to the objective.

Feasible solutions in process plan are incorrect

The newly added constraint admits infeasible solutions (per the OR-graph definition). We need to add precense_of equality constraints among each operation group.

Upload package to PyPI

After #26 is a good moment to make this package available.

`ProblemData.replace()` method

Tests

Release times

Make CPLEX optional, OR-tools default

After #26, we should make OR-tools default and CPLEX optional because the latter is not free.

Job interval variable

The job interval variable should span the operation interval variables and can be used to define objectives or constraints.

Machine availability constraints

Extensions

From Dauzère-Pérès et al. (2023):

(5.1) Time lag constraints
#28
#27
(5.4) Setup time constraints
(5.5) Blocking constraints
(5.6) Transportation constraints
(5.7) Overlapping constraints
(6.1) Complex/non-linear routes
(6.2) Multiple resources

References

Just a list of relevant references for this work:

Dauzère-Pérès, S., Ding, J., Shen, L., & Tamssaouet, K. (2023). The flexible job shop scheduling problem: A review. European Journal of Operational Research. https://doi.org/10.1016/j.ejor.2023.05.017
Tamssaouet, K., & Dauzère-Pérès, S. (2023). A General Efficient Neighborhood Structure Framework for the Job-Shop and Flexible Job-Shop Scheduling Problems. European Journal of Operational Research. https://doi.org/10.1016/j.ejor.2023.05.018
Framinan, J. M., Perez-Gonzalez, P., & Fernandez-Viagas, V. (2019). Deterministic assembly scheduling problems: A review and classification of concurrent-type scheduling models and solution procedures. European Journal of Operational Research, 273(2), 401–417. https://doi.org/10.1016/j.ejor.2018.04.033
Da Col, G., & Teppan, E. C. (2022). Industrial-size job shop scheduling with constraint programming. Operations Research Perspectives, 9, 100249. https://doi.org/10.1016/j.orp.2022.100249
Xiong, H., Shi, S., Ren, D., & Hu, J. (2022). A survey of job shop scheduling problem: The types and models. Computers & Operations Research, 142, 105731. https://doi.org/10.1016/j.cor.2022.105731

Rename operation to task?

Task is more clear.
Task is shorter.
Task is used in CP terminology.

vs.

Operation is commonly used in FJSP.

Precedence types with delays

Precedence types with delays are needed to model transportation times between constraints with end_at_end precedence types.

Support for OR-Tools

The current CP models are written in CPLEX CP Optimizer. This is proprietary software and expensive if you don't have an academic license. OR-Tools is free and it should be possible to support as well.

Use processing times as machine eligiblity?

Right now the user has to pass in both machine2ops and processing_times in order to set the machine-operation eligibility and processing times, respectively.

    model = Model()

    job = model.add_job()
    machine = model.add_machine()
    operation = model.add_operation(earliest_start=1)

    model.assign_job_operations(job, [operation])
    model.assign_machine_operations(machine, [operation]) # redundant?
    model.add_processing_time(operation, machine, 1) # redundant?

There is clearly some repetition here. When setting processing times, it's (almost?) always the case that the corresponding machine operation pair is eligible.

The reasons why to set the eligiblities explicitly is to minimize the number of assignment variables. From a processing time matrix, it's not easy to infer which (machine, operation) pairs are eligible, so this would result in all pairs to be assignment variables, which is undesired.

Instead, having a processing times dictionary that maps pairs to processing times could work.

Code coverage

Would be nice to add code coverage to see which parts of the codes are untested.

Steps:

Add codecov secret token
Add codecov to GH actions

Update README

Update text
Add badges

Mixed-integer linear programming support

At some point it will be nice to include a MILP model. The nice thing about MILPs is that most solvers can read a general MPS or LP file, so we only need to define the MILP model once.

Solvers to support:

HiGHS: open-source
Gurobi: better than CPLEX according to Naderi et al. (2023).
CPLEX (maybe?)
OR tools (maybe?)

Throughput

Defined as the number of jobs completed before the planning horizon (Ostermeier 2023).

Some things to think about:

Jobs must become optional now.
When is a job "fully" scheduled in combination with process plans? (Without process plans, simply all operations must become present.)
Objective: sense needs to maximize as well.

Machine connection as an assignment precedence constraint

Introduce a new precedence constraint called "transport restriction" for $i \rightarrow j$ meaning that the selected machine for $i$ must be able to access the selected machine in $j$.

Notes on graph representation

The issue serves to discuss how the graph representation used in the codebase (operations graph, machine graph) relate to those discussed in the literature.

(This slide deck has been very helpful.)

In FJSP, it's common to talk about the disjunctive graph. This is a directed graph in which the nodes represent operations and the arcs represent precedence relations. There are two types of arcs:

Conjunctive arcs represent the processing order of the operations for a single job.
Disjunctive arcs connect two operations (belonging to different jobs) that are to be processed on the same machine. Specifically, disjunctive arcs form a clique for each machine.

One way of looking at the FJSP problem is selecting a subset of disjunctive arcs, such that the union of the conjunctive and disjunctive arcs forms a directed acyclic graph (DAG). Cycles correspond to conflicts in the schedule that lead to infeasibility (not respecting precedence).

Rename package name to to `pyjobshop`.

How to enforce that one operation must come after the next one?

This is relevant on press lines. In this case, we have that two batches of the same production order must be processed sequentially (business rule).

Setup times

Setup times are in all different kinds of flavors. The easiest one to tackle first is $st_{ikl}$, which is the setup time when scheduling operation $i$ before $k$ on machine $l$. This can simply be included in the NoOverlap constraint.

It remains a question where to store this information. Machines could have a setup_times matrix attribute, but this requires knowledge of the operation indices. I think it's better to have it in ProblemData.

TODOS

~~There's one more: initial setup times (which are used in the SMRDSDST instances).~~
Setup times matrix dimensions are awkward (op, op, machine). Change to (machine, op, op).

Plot operations and machines graph

Validate processing plans

Processing plans are added in #66. Currently, ProblemData does not validate whether the processing plans are valid, but we should do that for better user experience.

Given a plan $P$, consisting of operation sets $O_1, O_2$, we should check that:

At most one operation set contains a "required" operation.

Install CPLEX in CI environment

It's currently not possible to run all tests from the CI environment because CP Optimizer is not installed.

Format with ruff-format

Model.solve() interface

For now, the solve interface should take a time_limit parameter.

When OR-Tools is supported, it should also take a solver parameter. (#26)

When MIP is supported, it should also take a CP/MIP parameter. (#40)

When matheuristics, it should take a heuristic flag boolean parameter.

Due windows and earliness

We currently support deadlines, which are the latest time at which a job must be completed without incurring penalties. In practice though the deadlines are not a fixed point in time, but rather a time interval during which the job is still considered to be in time

Deadlines generalize to due windows, where deadlines are the "late" due window and we introduce a new "early" due window. This generalization only holds when we only consider tardiness and disregard earliness.

Make operation independent of jobs/machines

Operation depends on the job and machine index. I think this can be made cleaner by separating its dependency on Job and Machine entirely. The relationship between operations and jobs and machines should be passed to ProblemData explicitly.

Can we remove the same sequence constraint?

In c599729 I added same sequence constraints so that the current model can be used to solve permutation flow shop problems.

But can we do without? I think doing it with (general) precedence constraints is possible. The propagation is probably much weaker, but I don't necessarily want the best performance for flow shops here.

Consider two subsequent operations $i$ and $j$ of a single job on consecutive machines. Then $f(i) = s(j)$ forces the same sequence constraint. No, actually, this is much stronger: it's no blocking.

Same sequence is actually doesn't hold up in practice: it assumes unlimited buffers between machines. More often than not, a batch occupies (blocks) the machine until it can move on to the next stage.

EDIT: The above constraint is not blocking, it's no-wait.
Blocking is when we also relax the interval variables to be of size at least $p_{ik}$ rather than be exactly that size.

Test Model

Refactor ScheduledOperation

https://github.com/leonlan/fjsp/blob/ce855f88ecf0c755e92f9c39b2374544a3d5127b/plot.py#L10-L15

Change to

 class ScheduledOperation(NamedTuple): 
     op: Operation 
     assigned_machine: int 
     start: int 
     duration: int

DifferentUnits constraints

$A_{i} \neq A_{j}$ - operations $i$ and $j$ must be assigned to differents machines. This may be relevant in the context of contamination.