jordagaman / zddr Goto Github PK

View Code? Open in Web Editor NEW

3.0 3.0 0.0 1.85 MB

A zero-suppressed binary decision diagram implementation for R

Home Page: https://jordagaman.github.io/zddr/

License: Other

R 99.45% Dockerfile 0.55%

zddr's People

Contributors

Stargazers

Watchers

zddr's Issues

Optimize anot

zdd_anot may not be optimized. It is not fully self-recursive. The last IF statement includes an OR operation P1\(Q0|Q1).

Evaluate if there are merits in speed or memory in changing this to (P1\Q0)\Q1. Initial intuition suggests it may have benefits because it doesn't involve creating the larger Q0|Q1 ZDD in the process.

Ignore Dockerfile in the build

Bug resulting in non minimal cutsets

The expected result from

zdd_and(1,3,4,5,6,7) | zdd_and(1,2,4,6,7,23) | zdd_and(1,2)

{ 1 2 } 
{ 1 3 4 5 6 7 }

Resolve R-CMD-Check Warnings

Causing CMD-check to fail:

Determine the performance impact of minimizing on every operation

Right now, ZDDs are guaranteed to be minimal because their construction includes the appropriate AND NOT operation. It's pleasant to live in the reality where if a ZDD is encountered, it is reliably minimal.

However, for large problems, this may be slowing down the performance. Determine if it is faster or more efficient to minimize a ZDD after a major operation, but not at every intermediate step.

Add as_zdd for boolean

as_zdd.logical should go to either zdd0() or zdd1(). Duh.

While you're at it, maybe think about overflows for isTrue, etc.

Function to covert cutset results to tibble/matrix

Couldn't figure out how to do this
May be useful to convert the results to other objects such as tibble or matrix

Apply as_zdd to vectors and lists

While the as_zdd function already applies to numeric arrays, it only handles the case where there is a single element. Add an IF statement to this function to account for multiple elements, which should return a zdd_and of the elements.

Next, add the as_zdd function for lists. This should return a zdd_or of the elements.

Identify why ubuntu-devel fails checks but other OSs do not

In the attached logs, 4_ubuntu-16.04 (devel) fails on a test that passes with all other operating systems. Why!

First observed after commit e95664d

Logs saved here: logs_140.zip

Silence the protect function

The protect() function returns the ZDD it is protecting. This is not really necessary for the user.

Example:

a <- seq(10, 40, 1) %>% as.list %>% as_zdd
protect(a)

Memory of ZDD Store:     0.167 MB, item count: 147 ( 1.14 KB/item)
Memory of ZDD Functions: 0.056 MB, item count: 271 ( 0.21 KB/item)
b602bb12ba87c94de49bbf707c4a1505 : 31 cutsets, min order: 1 max order: 1 
1-order 
     31 
{10}, {11}, {12}, {13}, {14}, {15}, {16}, {17}, {18}, {19}, {20}, {21}, {22},
{23}, {24}, {25}, {26}, {27}, {28}, {29}, {30}, {31}, {32}, {33}, {34}, {35},
{36}, {37}, {38}, {39}, {40}

Optimize crossproduct

Crossproduct is ripe for optimization. Right now, P and Q are fully cross multiplied in this function without regard for the dependencies between them. We can cut down on some of the waste if we break up each P*Q by breaking P into cutsets that are not and are non minimal to Q.

P = (P\Q) | PinQ

Q = (Q\P) | QinP

These lines may do the work:

  PnotQ <- P %% Q
  QnotP <- Q %% P
  if( (PnotQ != P) | (QnotP != Q) ) {
    Prem  <- P - PnotQ
    Qrem  <- Q - QnotP
    return( (PnotQ * QnotP) | Prem | Qrem )
  }

Set up codespaces

https://github.com/features/codespaces

This sounds like something that @warner-pra would love to do.

zddr must be loaded via library()

May want to make a note in the readme that zddr functions cannot be called independently without first loading the packing via library("zddr"). See reproducible example below:

> zddr::zdd_and(1,2)
Error in register_zdd(res) : object 'zdd_store' not found
> library(zddr)
> zddr::zdd_and(1,2)
Registered S3 method overwritten by 'pryr':
  method      from
  print.bytes Rcpp
Memory of ZDD Store:     0.005 MB, item count: 5 ( 1 KB/item)
Memory of ZDD Functions: 0.002 MB, item count: 5 ( 0.4 KB/item)
abeefcadb70a482be32fc1017bad384b : 1 cutsets, min order: 2 max order: 2 
{ 1 2 }

Report the distribution of cutsets in a ZDD by order

The order of a cutsets is the number of basic events. It is useful when printing a ZDD to identify the distribution of cutset orders.

Example:

> zdd_and(1,2,3) | zdd(4) | zdd_and(5,6) | zdd_and(7,8,9,10,11,12,13)

should return

Memory of ZDD Store:     0.03 MB, item count: 27 ( 1.11 KB/item)
Memory of ZDD Functions: 0.012 MB, item count: 61 ( 0.2 KB/item)
bf350eb8aff1a3680663c631027bec65 : 4 cutsets, min order: 1 max order: 7 
1-order 2-order 3-order 7-order 
      1       1       1       1 
{1,2,3}, {4}, {5,6}, {7,8,9,10,11,12,13}

Add limit to number of cutsets returned by a print statement

If there are 64 millions sets, maybe don't return them all. Consider two changes:

A standard cutoff for cutsets followed by a recommendation to the user to use another function to print them to a file instead
Right now, one cutset prints per line. Perhaps multiple small ones could print per line instead for more compact output.

Add ZDD node count to the standard print

Each ZDD represents the total tree referenced underneath. When a node prints, it reports on the status of ZDD and function stores, but it's not clear how many of the nodes actually live beneath the one in question.

Increase test coverage of intersection function

https://codecov.io/gh/jordagaman/zddr/src/main/R/zdd_intersection.R

Add method for protecting some ZDDs from deletion

Large fault tree problems are going to create memory challenges in the ZDD store. If we're not careful, performance will suffer.

What grows the store environment

Every ZDD that is touched is kept in zdd_store. For intermediate calculation, this is a good thing because it works in concert with the function results store to ensure that expensive tree operations are not performed more than once.

Unmitigated, this results in a store environment that includes the result trees (which we want to keep) and all intermediate trees that helped us get there (which we shouldn't need once a solution is returned).

What is the consequence of a large store environment

This package works by (ab)using R Environments as hash tables. This is great, but the performance degrades with the number of entries. For very large problems, carrying every solution AND intermediate result forward preserves a bloated ZDD store. This means every time you ask for a ZDD from the store (basically the most repeated operation), the result takes longer.

Obviously, in addition to the actual calc speed, there is also a concern with the total memory used as well. It's hard to imagine right now feeling the memory pain on a personal machine, but surely it'll hurt in CI/CD space.

Proposed solution

A two step solution is proposed. First, develop a recursive function to mark some ZDD nodes as protected. Second, add an option to the store reset function to preserve protected nodes.

Mark protected

Preserve protected

The default behavior of the store reset should assume the user wants to preserve everything she marked as protected.

reset_zdd_store <- function(preserve_protected = TRUE)
  For each node
    IF node is protected, NEXT
    ELSE remove

Namespace for zdd functions

zddr/R/zdd.R

Line 68 in 7248abe

zdd_store[[zdd]] <- zdd

A namespace is needed for the line above (zddr::zdd_store). This is causing an error in a package I am developing that uses zddr::zdd():

Error: Problem with `mutate()` column `zobj`.
ℹ `zobj = list(zddr::zdd(eventID))`.
✖ object 'zdd_store' not found