Giter VIP home page Giter VIP logo

cp-sat-log-analyzer's Introduction

I am an experienced algorithm engineer with a background in theoretical computer science. I have a knack for tackling complex, especially NP-hard, problems by fusing theoretical rigor with practical implementation. My work has been recognized with best paper (CIAC, ALENEX) and student awards.


Skills

Technical Skills ๐Ÿ’ก

  • Combinatorial Optimization: Proficient in solving NP-hard problems to optimality using techniques like Mixed Integer Programming, Constraint Programming, custom branch-and-bound algorithms, SAT-solvers, and Second Order Cone Programming.
  • Approximation & Meta-heuristics: Skilled in finding near-optimal solutions via approximation algorithms, meta-heuristics, and LNS-variants.
  • Algorithmic Foundations: Strong background in theoretical computer science, with a comprehensive understanding of algorithmic concepts and their practical applications, such as complexity, approximation, and graph theory.
  • Programming & Performance: Adept in writing maintainable Python and C++ code for complex algorithms, with expertise in performance tuning and modularization. Check out my repositories for examples.
  • Data Analysis & Visualization: Capable of managing, evaluating, and quality-checking complex data, as well as visualizing data sets for insights and decision-making. Refer to my dissertation for exemplary empirical evaluations.
  • Machine Learning: Familiar with machine learning techniques and have applied them successfully in research projects. Eager to explore their potential to augment classical algorithms, although this is neither a primary focus nor an area of expertise.

Research Skills ๐Ÿ“š

  • Theory-Practice Bridge: Skillful in bridging the gap between theoretical computer science and practical implementation, highlighted by interdisciplinary collaborations and consultancy.
  • Interdisciplinary Collaboration: Extensive involvement in projects across diverse fields such as robotics, bioinformatics, automotive, and satellite management.
  • Creativity & Curiosity: Demonstrated curiosity and creativity in learning and combining new techniques, as evidenced by the diverse techniques used in my dissertation and the variety of projects undertaken.

Soft Skills ๐Ÿค

  • Teaching & Presentation: Proficient in teaching and presenting complex topics, as proven by the positive evaluation of my lecture on algorithm engineering and the popularity of my online material.
  • Project Management: Experienced in managing multiple projects and student teams concurrently, as evidenced by the number of successfully completed projects.

๐Ÿ”ฌ Research ๐Ÿ“– Publications ๐ŸŽ“ Dissertation
Explore my ongoing research projects. Discover my published works. My dissertation for a coherrent sample of my work (2017-2022).

๐Ÿค Let's Connect!

I am open to research stays and collaborative opportunities to broaden my expertise and contribute to groundbreaking research. Let's work together to solve challenging problems and make a lasting impact.

cp-sat-log-analyzer's People

Contributors

d-krupke avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar

cp-sat-log-analyzer's Issues

Parser errors when `num_workers` parameter is set

Problem

Logs from solver runs in which the num_workers parameter is set cannot be parsed.

Example code
# Example from https://d-krupke.github.io/cpsat-primer/04_modelling.html

from ortools.sat.python import cp_model

# Weighted, directed graph as instance
# (source, destination) -> cost
dgraph = {
    (0, 1): 13,
    (1, 0): 17,
    (1, 2): 16,
    (2, 1): 19,
    (0, 2): 22,
    (2, 0): 14,
    (3, 0): 15,
    (3, 1): 28,
    (3, 2): 25,
    (0, 3): 24,
    (1, 3): 11,
    (2, 3): 27,
}
model = cp_model.CpModel()
# Variables: Binary decision variables for the edges
edge_vars = {(u, v): model.NewBoolVar(f"e_{u}_{v}") for (u, v) in dgraph.keys()}
# Constraints: Add Circuit constraint
# We need to tell CP-SAT which variable corresponds to which edge.
# This is done by passing a list of tuples (u,v,var) to AddCircuit.
circuit = [
    (u, v, var) for (u, v), var in edge_vars.items()  # (source, destination, variable)
]
model.AddCircuit(circuit)

# Objective: minimize the total cost of edges
obj = sum(dgraph[(u, v)] * x for (u, v), x in edge_vars.items())
model.Minimize(obj)

# Solve
solver = cp_model.CpSolver()
solver.parameters.num_workers = 2
# solver.parameters.num_search_workers = 1
solver.parameters.log_search_progress = True
solver.log_callback = print  # (str)->None


status = solver.Solve(model)
assert status in (cp_model.OPTIMAL, cp_model.FEASIBLE)
tour = [(u, v) for (u, v), x in edge_vars.items() if solver.Value(x)]
print("Tour:", tour)
Example log:
Starting CP-SAT solver v9.10.4067
Parameters: log_search_progress: true num_workers: 2

Initial optimization model '': (model_fingerprint: 0xc8df407b243296f5)
#Variables: 12 (#bools: 12 in objective)
  - 12 Booleans in [0,1]
#kCircuit: 1

Starting presolve at 0.00s
  3.77e-05s  0.00e+00d  [DetectDominanceRelations] 
  6.72e-04s  0.00e+00d  [PresolveToFixPoint] #num_loops=2 #num_dual_strengthening=1 
  6.19e-06s  0.00e+00d  [ExtractEncodingFromLinear] 
  3.11e-04s  1.42e-05d  [Probe] #probed=24 #new_binary_clauses=6 
  8.63e-06s  0.00e+00d  [MaxClique] 
  2.76e-05s  0.00e+00d  [DetectDominanceRelations] 
  5.42e-04s  0.00e+00d  [PresolveToFixPoint] #num_loops=1 #num_dual_strengthening=1 
  4.71e-06s  0.00e+00d  [ProcessAtMostOneAndLinear] 
  7.22e-05s  0.00e+00d  [DetectDuplicateConstraints] 
  2.00e-06s  0.00e+00d  [DetectDominatedLinearConstraints] 
  1.37e-06s  0.00e+00d  [DetectDifferentVariables] 
  2.23e-06s  0.00e+00d  [ProcessSetPPC] 
  2.36e-06s  0.00e+00d  [FindAlmostIdenticalLinearConstraints] 
  4.16e-06s  0.00e+00d  [FindBigAtMostOneAndLinearOverlap] 
  2.70e-06s  1.20e-07d  [FindBigVerticalLinearOverlap] 
  1.62e-06s  0.00e+00d  [FindBigHorizontalLinearOverlap] 
  2.15e-06s  0.00e+00d  [MergeClauses] 
  2.50e-05s  0.00e+00d  [DetectDominanceRelations] 
  9.11e-05s  0.00e+00d  [PresolveToFixPoint] #num_loops=1 #num_dual_strengthening=1 
  2.78e-05s  0.00e+00d  [ExpandObjective] 

Presolve summary:
  - 0 affine relations were detected.
  - rule 'TODO dual: only one blocking constraint?' was applied 36 times.
  - rule 'presolve: 0 unused variables removed.' was applied 1 time.
  - rule 'presolve: iteration' was applied 1 time.

Presolved optimization model '': (model_fingerprint: 0x6ab67e65d25f3b4d)
#Variables: 12 (#bools: 12 in objective)
  - 12 Booleans in [0,1]
#kCircuit: 1

Preloading model.
#Bound   0.01s best:inf   next:[0,231]    initial_domain
#Model   0.01s var:12/12 constraints:1/1

Starting search at 0.01s with 2 workers.
1 full problem subsolver: [default_lp]
11 incomplete subsolvers: [feasibility_pump, graph_arc_lns, graph_cst_lns, graph_dec_lns, graph_var_lns, rins/rens, rnd_cst_lns, rnd_var_lns, routing_path_lns, routing_random_lns, violation_ls]
3 helper subsolvers: [neighborhood_helper, synchronization_agent, update_gap_integral]
#Bound   0.02s best:inf   next:[63,231]   default_lp
#1       0.02s best:63    next:[]         default_lp (fixed_bools=1/13)
#Done    0.02s default_lp

Task timing                     n [     min,      max]      avg      dev     time         n [     min,      max]      avg      dev    dtime
          'default_lp':         1 [  7.03ms,   7.03ms]   7.03ms   0.00ns   7.03ms         1 [  2.97us,   2.97us]   2.97us   0.00ns   2.97us
    'feasibility_pump':         2 [294.74us, 323.73us] 309.23us  14.49us 618.46us         1 [  7.05us,   7.05us]   7.05us   0.00ns   7.05us
       'graph_arc_lns':         0 [  0.00ns,   0.00ns]   0.00ns   0.00ns   0.00ns         0 [  0.00ns,   0.00ns]   0.00ns   0.00ns   0.00ns
       'graph_cst_lns':         0 [  0.00ns,   0.00ns]   0.00ns   0.00ns   0.00ns         0 [  0.00ns,   0.00ns]   0.00ns   0.00ns   0.00ns
       'graph_dec_lns':         0 [  0.00ns,   0.00ns]   0.00ns   0.00ns   0.00ns         0 [  0.00ns,   0.00ns]   0.00ns   0.00ns   0.00ns
       'graph_var_lns':         0 [  0.00ns,   0.00ns]   0.00ns   0.00ns   0.00ns         0 [  0.00ns,   0.00ns]   0.00ns   0.00ns   0.00ns
           'rins/rens':         1 [331.75us, 331.75us] 331.75us   0.00ns 331.75us         0 [  0.00ns,   0.00ns]   0.00ns   0.00ns   0.00ns
         'rnd_cst_lns':         0 [  0.00ns,   0.00ns]   0.00ns   0.00ns   0.00ns         0 [  0.00ns,   0.00ns]   0.00ns   0.00ns   0.00ns
         'rnd_var_lns':         0 [  0.00ns,   0.00ns]   0.00ns   0.00ns   0.00ns         0 [  0.00ns,   0.00ns]   0.00ns   0.00ns   0.00ns
    'routing_path_lns':         0 [  0.00ns,   0.00ns]   0.00ns   0.00ns   0.00ns         0 [  0.00ns,   0.00ns]   0.00ns   0.00ns   0.00ns
  'routing_random_lns':         0 [  0.00ns,   0.00ns]   0.00ns   0.00ns   0.00ns         0 [  0.00ns,   0.00ns]   0.00ns   0.00ns   0.00ns
        'violation_ls':         0 [  0.00ns,   0.00ns]   0.00ns   0.00ns   0.00ns         0 [  0.00ns,   0.00ns]   0.00ns   0.00ns   0.00ns

Search stats     Bools  Conflicts  Branches  Restarts  BoolPropag  IntegerPropag
  'default_lp':     13          0        26        25          71             98

SAT stats        ClassicMinim  LitRemoved  LitLearned  LitForgotten  Subsumed  MClauses  MDecisions  MLitTrue  MSubsumed  MLitRemoved  MReused
  'default_lp':             0           0           0             0         0         0           0         0          0            0        0

Lp stats         Component  Iterations  AddedCuts  OPTIMAL  DUAL_F.  DUAL_U.
  'default_lp':          1           7          0        5        0        0

Lp dimension                                                Final dimension of first component
  'default_lp':  8 rows, 12 columns, 24 entries with magnitude in [1.000000e+00, 1.000000e+00]

Lp debug         CutPropag  CutEqPropag  Adjust  Overflow  Bad  BadScaling
  'default_lp':          0            0       4         0    0           0

Lp pool          Constraints  Updates  Simplif  Merged  Shortened  Split  Strenghtened  Cuts/Call
  'default_lp':            8        0        0       8          0      0             0        0/0

LNS stats                Improv/Calls  Closed  Difficulty  TimeLimit
       'graph_arc_lns':           0/0      0%        0.50       0.10
       'graph_cst_lns':           0/0      0%        0.50       0.10
       'graph_dec_lns':           0/0      0%        0.50       0.10
       'graph_var_lns':           0/0      0%        0.50       0.10
           'rins/rens':           0/0      0%        0.50       0.10
         'rnd_cst_lns':           0/0      0%        0.50       0.10
         'rnd_var_lns':           0/0      0%        0.50       0.10
    'routing_path_lns':           0/0      0%        0.50       0.10
  'routing_random_lns':           0/0      0%        0.50       0.10

LS stats           Batches  Restarts  LinMoves  GenMoves  CompoundMoves  WeightUpdates
  'violation_ls':        0         0         0         0              0              0

Solutions (1)    Num   Rank
  'default_lp':    1  [1,1]

Objective bounds     Num
      'default_lp':    1
  'initial_domain':    1

Solution repositories    Added  Queried  Ignored  Synchro
  'feasible solutions':      1        0        0        1
        'lp solutions':      0        0        0        0
                'pump':      1        0

CpSolverResponse summary:
status: OPTIMAL
objective: 63
best_bound: 63
integers: 13
booleans: 13
conflicts: 0
branches: 26
propagations: 71
integer_propagations: 98
restarts: 25
lp_iterations: 7
walltime: 0.0252102
usertime: 0.0252104
deterministic_time: 2.4303e-05
gap_integral: 5.13657e-05
solution_fingerprint: 0xabc7fcb3d5a11d39

Tour: [(0, 1), (2, 0), (3, 2), (1, 3)]

Submitting this log file to CP-SAT-Log-Analyzer's streamlit interface gives the following error:

File "/home/adminuser/venv/lib/python3.9/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 600, in _run_script
    exec(code, module.__dict__)
File "/mount/src/cp-sat-log-analyzer/app.py", line 34, in <module>
    show_overview(parser)
File "/mount/src/cp-sat-log-analyzer/_app/overview.py", line 44, in show_overview
    value=solver_block.get_number_of_workers(),
File "/mount/src/cp-sat-log-analyzer/cpsat_log_parser/blocks/solver.py", line 59, in get_number_of_workers
    raise ValueError("No number of workers found")

Investigation

CP-SAT-Log-Analyzer checks for two things:

def get_number_of_workers(self) -> int:
# the line looks like this: "Setting number of workers to 24"
for line in self.lines:
if line.startswith("Setting number of workers to"):
return int(line.strip().split(" ")[-1])
# If `num_search_workers` is set, the number of workers is not shown in the log.
if "num_search_workers" in self.get_parameters():
return int(self.get_parameters()["num_search_workers"])
raise ValueError("No number of workers found")

Turns out, I'm setting num_workers parameter instead of num_search_workers.

The parameter file from Google OR-Tools says that num_search_workers is deprecated in favor of num_workers:

https://github.com/google/or-tools/blob/43e400c4f2749268d6a53d39be8f019906fd19c9/ortools/sat/sat_parameters.proto#L560-L568

Proposed solution

Since num_search_workers is still supported as fallback whenever num_workers is not set, I propose adding another if statement like this:

def get_number_of_workers(self) -> int:
    # the line looks like this: "Setting number of workers to 24"
    for line in self.lines:
        if line.startswith("Setting number of workers to"):
            return int(line.strip().split(" ")[-1])

    # If `num_workers` is set, the number of workers is not shown in the log.
    if "num_workers" in self.get_parameters():
        return int(self.get_parameters()["num_workers"])

    # `num_search_workers` is deprecated in favor `num_workers`, but if the
    # latter is not set, `num_search_workers` is still used.
    if "num_search_workers" in self.get_parameters():
        return int(self.get_parameters()["num_search_workers"])

    raise ValueError("No number of workers found")

Add a first estimation of the costs of constraints

It is relatively easy to read from the log if some expensive constraints, such as Multiplication are used.
Maybe it would be nice to directly annotate those, e.g., by ๐ŸŒ๐ŸŒ๐ŸŒ/๐ŸŒ๐ŸŒ, to indicate that these can slow down the model quite a bit.

On a same note, it may be good to display the danger of creating infeasible models due to the integer arithmetic. For example Division can be super dangerous. This part is more difficult, as this problem is quite difficult to detect.

Add a function to aggregate logs

A single log does not always tell you the truth and you may want to aggregate multiple logs to see the "average behavior", like how quickly the optimality gap falls on average.

This of course requires that the logs are relatively similar as otherwise there is no use in this. If you try to aggregate the logs of different problems, you cannot expect to get anything useful out of it. However, imagine you are solving a scheduling problem every day and want to see the average behavior of CP-SAT or detect certain outliers.

One can start by overlaying the plots, but could also do boxplots of the other metrics, such as runtime, number of variables, etc. Could require to extend the parser first to allow to access more of the values.

Integrate model name and fingerprints into the overview

While the model names are given by the user itself, and shouldn't be a surprise, the fingerprint of it can be a useful indicator if the model building is deterministic. This of course requires the fingerprint to actually be deterministic on different machines etc. AFAIK, the Gurobi support team uses such a feature to verify that they actually recreated the same model (in Gurobi). I have to check if the fingerprint of CP-SAT is actually reliable, too.

There is also the solution fingerprint, which can be used to quickly check if the solution of two different solve processes is identical. This may require the models to have the same fingerprints, too, to be useful.

Add a function to print to logs side by side

This should be relatively easy, but I am not sure about how to keep the user interface simple.
Could be very useful if you want to check if your changes actually lead to an improvement.

Right now, I simply open two windows, but this makes it difficult to align the entries.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.