meelgroup / arjun Goto Github PK

CNF minimizer and minimal independent set calculator

License: Other

CMake 13.65% Shell 2.88% C++ 81.54% Python 1.62% Makefile 0.31%

arjun's Introduction

Arjun

A minimal independent set calculator and CNF minimizer that uses a combination of steps to simplify a CNF such that its count stays the same. Name is taken from the Hindu myth where Arjun is known for being the "one who concentrates the most". This system is also used as a preprocessor for our tool ApproxMC and should be used in front of GANAK. For the paper, see here.

Note that the simplification part of Arjun contains code from SharpSAT-td by Tuukka Korhonen and Matti Jarvisalo, see this PDF and this code for details. Note that treewidth-decomposition is not part of Arjun.

How to Build

To build on Linux, you will need the following:

sudo apt-get install build-essential cmake
sudo apt-get install zlib1g-dev

Then, build our version of CaDiCaL, CadiBack, CryptoMiniSat, and Arjun:

# not required but very useful
sudo apt-get install zlib1g-dev

git clone https://github.com/meelgroup/cadical
cd cadical
git checkout mate-only-libraries-1.8.0
./configure
make
cd ..

git clone https://github.com/meelgroup/cadiback
cd cadiback
git checkout mate
./configure
make
cd ..

git clone https://github.com/msoos/cryptominisat
cd cryptominisat
mkdir build && cd build
cmake ..
make
cd ../..

git clone https://github.com/meelgroup/sbva
cd sbva
mkdir build && cd build
cmake ..
make
cd ../..

git clone https://github.com/meelgroup/arjun
cd arjun
mkdir build && cd build
cmake ..
make
sudo make install
sudo ldconfig

How to Use

Run it on your instance and it will give you a reduced independent set:

./arjun input.cnf output.cnf
c [arjun] original sampling set size: 500
c ind 1 4 5 20 31 0
[...]
c [arjun] Done dumping. T: 1.0406

This means that your input independent set of your input formula input.cnf, which had a size of 500 has been reduced to 5, which is ony 1% of the original set. The simplified formula with the smaller independent set has been output to output.cnf. The final simplified will contain a line such as:

c MUST MULTIPLY BY 1024

This means that the final count of the CNF must be multiplied by 2^10 (i.e. 1024) in order to get the correct count. Note that if you forget to multiply, the count will be wrong. So you must multiply.

Only extracting independent set

In case you are only interested in a reduced independent set, use:

./arjun input.cnf
c [arjun] original sampling set size: 500
c ind 1 4 5 20 31 0

This will not write an output file, but only display the reduced independent set.

arjun's People

Contributors

Stargazers

Watchers

Forkers

meelgroup

arjun's Issues

How to run example instance

Documentation example says

./arjun input.cnf output.cnf
c [arjun] original sampling set size: 500
c ind 1 4 5 20 31 0
[...]
c [arjun] Done dumping. T: 1.0406

My Question ???

Is it a command line? If so, where is located the ./arjun file after successfully completed the build process?

Cheers!

Mate

Licence missing

I came to the repo following dependencies: unigen -> approxmc -> arjun

All other tools are MIT, can you please provide a LICENCE file to clarify.

Looking for an Option to minimize without deleting variables.

Documentation

The docs says by setting --renumber to 0, it avoid renumbering of variables so I can keep track of a deleted variable if it ever happened.

Question

Is there an option or a way to avoid searching for independent set by deleting variables ?

Cheers!

Mate

[Feature Request] Mapping between variables of original and new formulas

Hello,

Along with the smaller CNF, it will be helpful to get a mapping to the variables of the original formula, which can potentially help with sampling, weighted counting etc. Similarly, even for empty occurrences, it would be useful to know which variables they were (for eg for weighted counting when the weights don't sum to 1).

Thanks,
Aditya

static compilation

Would it be possible to provide a tutorial about static compilation?

BTW, it seems that some "cd .." are missed in the building script.

In case there are many independent variables the --backbone 0 doesn't help

In particular, ./arjun min_broken.cnf --xor 0 gives:

c [mis] Start unknown size: 4833
c [mis] iter:   160 T/F/U: 161/0/0    by:   1 U:    4672 I:     161 N:       0 backb avg:    0.0 backb max:      0 T: 0.19
c [mis] iter:   321 T/F/U: 161/0/0    by:   1 U:    4511 I:     322 N:       0 backb avg:    0.0 backb max:      0 T: 0.18
c [mis] iter:   482 T/F/U: 161/0/0    by:   1 U:    4350 I:     483 N:       0 backb avg:    0.0 backb max:      0 T: 0.18
c [mis] iter:   643 T/F/U: 161/0/0    by:   1 U:    4189 I:     644 N:       0 backb avg:    0.0 backb max:      0 T: 0.18
c [mis] iter:   804 T/F/U: 161/0/0    by:   1 U:    4028 I:     805 N:       0 backb avg:    0.0 backb max:      0 T: 0.19
c [mis] iter:   965 T/F/U: 161/0/0    by:   1 U:    3867 I:     966 N:       0 backb avg:    0.0 backb max:      0 T: 0.19
c [mis] iter:  1126 T/F/U: 161/0/0    by:   1 U:    3706 I:    1127 N:       0 backb avg:    0.0 backb max:      0 T: 0.19
c [mis] iter:  1287 T/F/U: 161/0/0    by:   1 U:    3545 I:    1288 N:       0 backb avg:    0.0 backb max:      0 T: 0.18
c [mis] iter:  1448 T/F/U: 161/0/0    by:   1 U:    3384 I:    1449 N:       0 backb avg:    0.0 backb max:      0 T: 0.19
c [mis] iter:  1609 T/F/U: 161/0/0    by:   1 U:    3223 I:    1610 N:       0 backb avg:    0.0 backb max:      0 T: 0.18
c [mis] iter:  1770 T/F/U: 161/0/0    by:   1 U:    3062 I:    1771 N:       0 backb avg:    0.0 backb max:      0 T: 0.19
c [mis] iter:  1931 T/F/U: 161/0/0    by:   1 U:    2901 I:    1932 N:       0 backb avg:    0.0 backb max:      0 T: 0.19
c [mis] iter:  2092 T/F/U: 161/0/0    by:   1 U:    2740 I:    2093 N:       0 backb avg:    0.0 backb max:      0 T: 0.19
c [mis] iter:  2253 T/F/U: 161/0/0    by:   1 U:    2579 I:    2254 N:       0 backb avg:    0.0 backb max:      0 T: 0.19
c [mis] iter:  2414 T/F/U: 161/0/0    by:   1 U:    2418 I:    2415 N:       0 backb avg:    0.0 backb max:      0 T: 0.20
c [mis] iter:  2575 T/F/U: 161/0/0    by:   1 U:    2257 I:    2576 N:       0 backb avg:    0.0 backb max:      0 T: 0.19
c [mis] iter:  2736 T/F/U: 161/0/0    by:   1 U:    2096 I:    2737 N:       0 backb avg:    0.0 backb max:      0 T: 0.18
c [mis] iter:  2897 T/F/U: 161/0/0    by:   1 U:    1935 I:    2898 N:       0 backb avg:    0.0 backb max:      0 T: 0.19
c [mis] iter:  3058 T/F/U: 161/0/0    by:   1 U:    1774 I:    3059 N:       0 backb avg:    0.0 backb max:      0 T: 0.19
c [mis] iter:  3219 T/F/U: 161/0/0    by:   1 U:    1613 I:    3220 N:       0 backb avg:    0.0 backb max:      0 T: 0.19
c [mis] iter:  3380 T/F/U: 161/0/0    by:   1 U:    1452 I:    3381 N:       0 backb avg:    0.0 backb max:      0 T: 0.19
c [mis] iter:  3541 T/F/U: 161/0/0    by:   1 U:    1291 I:    3542 N:       0 backb avg:    0.0 backb max:      0 T: 0.19
c [mis] iter:  3702 T/F/U: 161/0/0    by:   1 U:    1130 I:    3703 N:       0 backb avg:    0.0 backb max:      0 T: 0.20
c [mis] iter:  3863 T/F/U: 161/0/0    by:   1 U:     483 I:    3864 N:     486 backb avg:    3.0 backb max:     98 T: 0.23
c [mis] iter:  4024 T/F/U: 161/0/0    by:   1 U:      26 I:    4025 N:     782 backb avg:    1.8 backb max:     99 T: 0.23
c [mis] backward round finished T: 4.81

Check out the backb max being 0 for a long while. That makes backb ineffective, and forces us to pass a huge vector every time, copying it, etc. That slows us down by a factor of 1.5x

Sometimes, we produce worse output than B+E

Without --xorgate 0 --orgate 0 --itegate 0 --irreggate 0 --sort 10 we get:

./bpluse-compare.sh faster_with_be/wage_circuit_div.t1.i18.777adaa9.stp.cnf
Running on CNF file faster_with_be/wage_circuit_div.t1.i18.777adaa9.stp.cnf
Running Arjun...
1.88user 0.01system 0:01.89elapsed 99%CPU (0avgtext+0avgdata 28372maxresident)k
0inputs+168outputs (0major+6320minor)pagefaults 0swaps
Running BPE (new, compiled)
0.89user 0.00system 0:00.89elapsed 99%CPU (0avgtext+0avgdata 16364maxresident)k
0inputs+944outputs (0major+2603minor)pagefaults 0swaps
ARJUN vs BPE (new, compiled)
num (non-set) vars      839     num (non-set) vars      774
num cls                4365     num cls                3810
num bin cls             285     num bin cls             184
max cl size              20     max cl size              36
avg non-bin cl sz      4.1      avg non-bin cl sz      4.5
num (non-unit) lits   21600     num (non-unit) lits   20342

With --xorgate 0 --orgate 0 --itegate 0 --irreggate 0 --sort 10 we get:

 ./bpluse-compare.sh faster_with_be/wage_circuit_div.t1.i18.777adaa9.stp.cnf
Running on CNF file faster_with_be/wage_circuit_div.t1.i18.777adaa9.stp.cnf
Running Arjun...
1.30user 0.00system 0:01.30elapsed 99%CPU (0avgtext+0avgdata 21156maxresident)k
0inputs+136outputs (0major+3534minor)pagefaults 0swaps
Running BPE (new, compiled)
0.90user 0.00system 0:00.90elapsed 100%CPU (0avgtext+0avgdata 14556maxresident)k
0inputs+944outputs (0major+3344minor)pagefaults 0swaps
ARJUN vs BPE (new, compiled)
num (non-set) vars      765     num (non-set) vars      774
num cls                3723     num cls                3810
num bin cls             202     num bin cls             184
max cl size              20     max cl size              36
avg non-bin cl sz      3.8      avg non-bin cl sz      4.5
num (non-unit) lits   17669     num (non-unit) lits   20342

Pinging @arijitsh since he reported the issue. This issue on GitHub is to make sure the issue is not lost.

Likely fix:

make --sort 10 the default
Turn off gate-based system for small CNFs where it doesn't help that much in terms of time.

CNF with 0 clauses output

Hello,

I ran arjun with default parameters on the file mc2022_track1_001.cnf and got the file output.cnf with 0 clauses as output. I am not sure what the issue is or if I am running the tool incorrectly. The command I used was
./arjun --input mc2022_track1_001.cnf output.cnf

I have also attached the output of the tool.

Thanks,
Aditya
issue_files.zip

Independent Set contains variables greater than numVars

Hi,

Arjun sometimes generates CNFs with independent set containing variable numbers that are greater than the number of variables in the header. I was running arjun without the "renumber 0" feature. In one example (mc21_track1_122.cnf) this behavior showed up for the latest commit but not the static release version. In second example (mc2022_track1_023.cnf) this behavior showed up for both latest commit and static release. Please see attached files.

Thanks,
Aditya
ind-set_oob_upload-files.zip

Arjun on Raspberry Pie

Hi Mate,

First I tried to port Arjun to Windows.
I couldn't get around the problem that "find_equiv_subformula()" is no known method of CMSat::SATSolver.
Something must have confused my compiler (MS VC++ 2022).
(update 03-Mar-2023: Turned out to be my fault: I used cms 5.11.2 rather than 5.11.4)
For the time being, I gave up the porting and resorted to my Raspberry Pie.

Building and testing Arjun on a Raspberry Pie went smoothly overall.
However, I’ve encountered the following issues:
(I hope that you don't mind to get this "bundle" of issues rather than several separate issues)

To circumvent a compile error (known from my Windows port), I had to change the following return type in Arjun.h (line 145)

       //  long unsigned get_backbone_simpl_max_confl() const;
       uint64_t get_backbone_simpl_max_confl() const;

As found by @ale-depi, I had to execute the following before running arjun

sudo ldconfig -v

Otherwise, the shared libaries like liblouvain_communities.so.1.0 are not found.

During make, I got a lot of warnings. I hope and assume that these can be ignored.

For a CNF example with interspersed xcnf/XOR clauses, I got an assertion failure:

arjun: /home/ak/cppsrc/arjun/src/arjun.cpp:103: bool ArjunNS::Arjun::add_xor_clause(const std::vector<unsigned int>&, bool):
Assertion `false && "Funnily enough this does NOT work. The XORs would generate a BVA variable, and that would then not be
returned as part of the simplified CNF. We could calculate a smaller independent set, but that's all."' failed.

I would be interested to contribute a mapping tool to get the original literal IDs back.
Do you have any hints, where I could start/look to get this accomplished?

I tried two examples and got a massive reduction in variables and clauses.
964 variables in 3088 clauses shrunk to 452 variables in 2704 clauses
3248 variables in 10808 clauses where simplified to 1520 variables in 9594 clauses.

Now I am curious, if the resulting SAT solution is correct and easy/easier to find.

Greetings,

Axel

missing method to map back the solution

Hi @meelgroup

There is no way to map back the solution found by a SAT solver to the original problem. The tool is very useful because simplifies a lot CNFs but sometimes it is needed the interpretation rather than just the satisfiability.

This could help since CryptoMiniSat do not support anymore the preprocessing as one could see from this.

Remarks.

In the Arjun README at the end of "How to Build" commands it could be useful an ldconfig.
Since you are suggesting sudo make install, also ./arjun ... can be just arjun ...
I see that you are also the developer of ApproxMC. The above suggestion for ldconfig works as well.
Note that in the section How to use the Python interface, the double ** is get as a Markdown delimiter messing the text.

Thank you for your work.

Alessandro