glenkpeterson / paguro Goto Github PK

Generic, Null-safe, Immutable Collections and Functional Transformations for the JVM

License: Other

Java 97.86% HTML 2.14%

immutable-collections java clojure kotlin jvm transformations stream rrb-tree functional-transformations persistentvector

paguro's People

Contributors

Stargazers

Watchers

paguro's Issues

ImList, MutList - example for best practice of inserting an element before an existing element

Is there an example for best practice of inserting an element before an existing element in ImList, MutList?

Do I have to

copy all elements below the "insert point" into a new mutList
insert the new element at "insert point" in the mutList
copy all elements greater equal the "insert point" into the mutList
call mutList.immutable() at the end

Confusing comment in RrbTree implementation

This comment in the source file RrbTree.java:

    // (MIN_NODE_LENGTH + MAX_NODE_LENGTH) / 2 should equal
    // STRICT_NODE_LENGTH so that they have the same average node size
    // to make the index interpolation easier.

confuses me. The values I see for these constants are:

 NODE_LENGTH_POW_2 5 
 STRICT_NODE_LENGTH 32 
 HALF_STRICT_NODE_LENGTH 16 
 MIN_NODE_LENGTH 22 
 MAX_NODE_LENGTH 44

Thus (MIN_NODE_LENGTH + MAX_NODE_LENGTH) / 2 is equal to (22 + 44) / 2 = 33, 1 more than STRICT_NODE_LENGTH. Perhaps this is just an out of date comment? i.e. maybe off by one is acceptable for the implementation?

Support split(0) and split(size)

Hello,

First, thank you very much for providing theses highly performant persistent datastructures.

However, I noticed that RrbTree does not support split at index 0 or equals to the size of the tree. And I don't really understand why.

If the index is zero then the split is easy:

left part: empty tree
right part: original tree

And if the index equals the size of the tree, then the split is easy again:

left part: original tree
right part: empty tree

ImUnsortedMap should refine return type of ImMap.assoc(Map.Entry)

It does so for assoc(K, V), but not for assoc(Map.Entry<K, V>).

Make map creation less verbose

It would be great to have a way to create a map without wrapping the keys and values in a tuple, similar to Java Map.of, as @ericnormand suggested in Clojureverse.

For example, instead of creating a nested map like this:

map(tup("booksByIsbn",
        map(tup("978-1779501127",
                map(tup("isbn", "978-1779501127"),
                    tup("title", "Watchmen"),
                    tup("publicationYear", 1987),
                    tup("authorIds", vec("alan-moore",
                                         "dave-gibbons")))))),
    tup("authorsById",
        map(tup("alan-moore",
                map(tup("name", "Alan Moore"),
                    tup("bookIsbns", vec("978-1779501127")))),
            tup("dave-gibbons",
                map(tup("name", "Dave Gibbons"),
                    tup("bookIsbns", vec("978-1779501127")))))))

we would be able to create it like that (a bit less verbose):

map("booksByIsbn",
    map("978-1779501127",
        map("isbn", "978-1779501127",
            "title", "Watchmen",
            "publicationYear", 1987,
            "authorIds", vec("alan-moore",
                             "dave-gibbons"))),
    "authorsById",
    map("alan-moore",
        map("name", "Alan Moore",
            ("bookIsbns", vec("978-1779501127"))),
        "dave-gibbons",
        map("name", "Dave Gibbons",
            "bookIsbns", vec("978-1779501127"))))

Findbugs reports lots of issues

@cprice404 says (in issue #24):

Our build system runs findbugs automatically. With (as far as I know) a pretty basic findbugs setup, there are 4 or 5 "high priority" warnings on the project, and a few dozen "medium priority" ones. Based on your unit test coverage i somewhat doubt that the things findbugs is calling out are critical bugs, so that won't prevent me from evaluating the lib or anything... But I thought you might be interested to know. The kind of stuff that findbugs picks up in general are spiritually similar to what you are doing with equality tests in your testutils lib, so you might find them interesting.

Thanks for reporting this @cprice404! One quick thought is that these collections come from Clojure and are designed to be fast. They break many rules about type-safety internally, but are tested very carefully to enforce all relevant rules externally. So without seeing the report, I'm concerned that there may be some issues I just can't fix.

On the other hand, actual bugs are not allowed and I hope to be able to fix any soon!

Remove usages of Array.newInstance() to enable native compilation

Currently, Paguro uses java.lang.reflect.Array.newInstance() in a few places. It would be great if these reflective array instantiations could be removed so that applications that depend on Paguro can be compiled to native code with GraalVM. (Currently, this is only possible if all potential element types are statically known and the corresponding array types are registered with GraalVM.)

toArray(T[]) - ClassCastException

String[] array = vec("A", "B", "C").toArray(new String[0]); -> java.lang.ClassCastException: [Ljava.lang.Object; cannot be cast to [Ljava.lang.String;

String[] array = vec("A", "B", "C").toArray(new String[3]); -> This works.

UnmodCollection.toArray(T[] as):

if (as.length < size()) {
    as = (T[]) new Object[size()];
}

if (as.length < size()) {
    as = (T[]) java.lang.reflect.Array.newInstance(as.getClass().getComponentType(), size());
}

PersistentVector can't handle reverse iteration starting from last element

If index == size(), then array = null, and previous() will throw NPE.

Also, listIterator() doesn't do any bounds checks (unlike UnmodList.listIterator()).

Names conflict with Kotlin Standard Library - what to do?

Thoughts:

Rename List, Map, and Set to something like Vector, Dictionary, and Bag. Rename corresponding interfaces Vec, Dict, Bag and StaticImports functions mutableVec, mutableDict, mutableBag.
Provide a separate version of Paguro for Kotlin. Maybe all unit tests would be written in Kotlin?
Provide a pure Kotlin version with the above changes.

ImListTrans.concat()

I think ImListTrans should override concat, refining the return type to ImListTrans.

PS: Thanks for transient collections. Big improvement for my use case.

Concatenating RrbTree's with different element types gives ArrayStoreException

... when iterating over the resulting tree.

I suspect (but haven't verified) that the culprit is RrbTree:1418 and RrbTree:1377, which choose items[0].getClass() as the array element type to be passed to Array.newInstance(). This only seems safe if the array is known to be homogeneous.

It would be great if Paguro could do without Array.newInstance(), as reflective array instantiation makes Paguro difficult to impossible to compile to native code with GraalVM.

Paguro 3.1.0, Oracle JDK 1.8.0_191.

Reproducer:

import org.junit.jupiter.api.Test;
import org.organicdesign.fp.StaticImports;
import org.organicdesign.fp.collections.RrbTree;

public class RrbTreeBug {
  @Test
  public void test() {
    RrbTree<Object> rrb =
        // first tree needs to contain at least 32 elements for the bug to surface
        StaticImports.<Object>rrb(1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2)
            .concat(StaticImports.<Object>rrb("a"));

    // iterating throws ArrayStoreException
    for (Object elem : rrb) {
      System.out.println(elem);
    }
  }
}

Stack trace:

java.lang.ArrayStoreException
	at java.lang.System.arraycopy(Native Method)
	at org.organicdesign.fp.collections.Cowry.spliceIntoArrayAt(Cowry.java:126)
	at org.organicdesign.fp.collections.RrbTree$Leaf.pushFocus(RrbTree.java:1417)
	at org.organicdesign.fp.collections.RrbTree$ImRrbt.pushFocus(RrbTree.java:693)
	at org.organicdesign.fp.collections.RrbTree$ImRrbt.iterator(RrbTree.java:686)
	at RrbTreeBug.test(RrbTreeBug.java:13)

Support Request: Creating a PersistentVector

Hi there,

I was wondering what would be the most concise and efficient way to create a PersistentVector from a single element?
Is it
PersistentVector.ofIter(Collections.singletonList(elem));
or
PersistentVector.empty().append(elem)
or is there some simpler way?

I was looking for a factory method with a varargs signature, so that something like
PersistentVector.of(elem1, elem2, elem3)
becomes possible.

I apologize if opening an issue is not the right way to ask for support.
Best regards,
Sebastian

Support Request: Threadsafety of PersistentHashMap

Hello there,
I have not been able to determine whether PersistentHashMap is thread-safe. The Javadoc and usage examples don't seem to say. More specifically, say that I have a PersistentHashMap<K, V> map and concurrently execute the following code on it with different key-value combinations, would that be OK or do I need to synchronize?

V oldValue = map.get(key);
V newValue = (oldValue == null) ? value : remappingFunction.apply(oldValue, value);
if (newValue == null) {
      return map.without(key);
} else {
      return map.assoc(key, newValue);
}

The reference to map never changes, and the map itself is never modified, but can I concurrently execute multiple calls to without and assoc and get consistent results?

Memory optimizations for tupled keys/values

Moin,

when migrating to immutable data structures i too often face the challenge to have composite keys or values that are to be combined just for the current map usage.
Paying the price of extra tuple objects hurt. For my own RB-Tree I generated extra variants.
I don't consider this a good solution for a library.

What do you think about an AbtractImmutableHash/TreeMap with customizable Entry-Classes and Comparator?
That weakens the immutable contract but with having custom values and comparators already IMHO
thats OK.

Cal

Best practice for "inplace" sort() method on MutRrbt

Sorting a MutRrbt "inplace" is a very common process in my project.
What is the "best practice" for sorting a MutRrbt data structure (before making it immutable())?

Should something simple like the following snippet directly be included in MutRrbt or can this be done in a more "clever" way?

  public static MutRrbt<Object> sort(MutRrbt<Object> rrbTree, int fromIndex, int toIndex,
      Comparator<Object> comparator) {
    int size = rrbTree.size();
    Object[] a = new Object[size];
    rrbTree.toArray(a);
    Arrays.sort(a, fromIndex, toIndex, comparator);
    return StaticImports.mutableRrb(a);
  }

MutVector<F>#replace() problem

MutVector<F>#replace()does not replace the value at index 1 in this case:

Also the question is the replace() method the right way to replace a value in a vector?

Eliminate unmodMap/Set/List/SortedMap/SortedSet methods and Unmod classes in FunctionUtils?

I'm talking about the wrapper classes, empty unmodifiable constructors and wrapper methods in FunctionalUtils. They seemed like a step in the right direction before incorporating the persistent collections from Clojure.

A year or two after adding the Clojure collections, I have removed all usages from client code. Instead I always use the data definition shortcuts in StaticImports like vec(...), set(...), etc., or xform(oldCollection).

People like that Paguro is small (less than 250K). I don't want to keep code that no-one is using when there are better ways of doing those things. Without that code, Paguro currently goes from 229,935 to 218,672 bytes which is 11K or about 5% difference. What do you think about eliminating those for Paguro 3.0?

Collector implementations of the Paguro collections

As suggested by @viebel in #43. Make Paguro easier to use with Streams, even if it's actually easier to use without or instead of Streams.

Incorrect error message for Xform.drop()

is: Can't drop less than one item.
sb: Can't drop less than zero items.

Lazy/non-lazy concat() in PersistentHashSet versus PersistentVector

I'm using Paguro in a project that deals with commutative and non-commutative operations, and I use PersistentHashSet for commutative ones and a PersistentVector for non-commutative ones, to distinguish between ordered and unordered elements. Elements are added to these data structures via the concat(Iterable) method. I noticed that for a PersistentVector this results (as expected) in a new PersistentVector object, but the PersistentHashSet I receive an intermediate transformation operator (Xform$AppendIterDesc) that (lazily, I assume) represents the concatenation. Notably, the result is not a PersistentHashSet.

Is this intentionally designed like this? If so, it might be helpful to include the rationale for this divergent behavior somewhere in the documentation. I'm using concat because it seems to be the only API method that allows me to append to either data structure without knowing what the underlying implementation is. Is there some other method that would allow for that while returning an object of the same type as the original data structure?

get fails for some indexes after joining two RrbTree vectors of certain sizes

New failing test cases given in this PR: #35

I don't have a more full analysis or proposed fix yet, but may spend some time looking for one.

Support maps and sets that preserve insertion order

Would be great to have maps and sets that preserve insertion order, a la https://github.com/frankiesardo/linked or https://github.com/amalloy/ordered.

My main motivation for maintaining insertion order is to have/provide a more deterministic user experience (cf. ES6 maps).

Separate persistent collections from the other features of Paguro

I think this will facilitate more use of the collections in other projects. Requested by @nornagon and the kategory folks.

PersistentSetMultimap?

Guava's ImmutableSetMultimap is a convenient wrapper around ImmutableMap<K, ImmutableSet<V>>. It would be awesome to have a similar convenience in Paguro!

tag for 3.0.16?

Hello,

I'm a former Clojure developer who is now back in Java land. This library sounds really interesting and seems like it would fill in some gaps that I'm sorely missing based on my time in Clojure. I am going to play around with it but need to build from source; I don't see a tag in the repo that corresponds with the 3.0.16 maven artifact. Am I correct in assuming that this is the appropriate commit?:

61f61c7

Also - it seems like you're gaining a lot of momentum in the direction of Kotlin. Kotlin sounds fantastic but I doubt I'll be able to adopt it at work any time in the near future. Do you expect that the library will continue to be seamlessly consumable for Java 8 as your Kotlin changes continue to come in?

Thanks for writing a really interesting library!

Possible bug in immutable RRB tree implementation

The reproduction of the sequence I have below in Clojure, since it is adapting some Clojure test for another library, core.rrb-vector, that I found this behavior. Let me know if you have any questions or would prefer it written in Java.

user=> (def empty-vector (org.organicdesign.fp.collections.RrbTree/empty))
#'user/empty-vector
user=> (class empty-vector)
org.organicdesign.fp.collections.RrbTree$ImRrbt
(defn append-elems-to-vec [v s]
  (reduce (fn [v elem]
            (.append v elem))
          v
          s))
#'user/append-elems-to-vec
user=> (def v1 (append-elems-to-vec empty-vector (range 44)))
#'user/v1
user=> (class v1)
org.organicdesign.fp.collections.RrbTree$ImRrbt
user=> v1
(0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43)
user=> (def v2 (append-elems-to-vec empty-vector (range 45)))
#'user/v2
user=> (def v3 (.join v1 v2))
#'user/v3
user=> (class v3)
org.organicdesign.fp.collections.RrbTree$ImRrbt
user=> v3
(0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44)
user=> (def v4 (append-elems-to-vec v3 '(-1)))
#'user/v4
user=> v4
(-1 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44)
user=> (.get v4 0)
-1

It appears that the element -1 is put at the beginning of the vector v4, whereas it should have been the last element.

This does not happen if I make v1 or v2 shorter, but it does also happen for some longer lengths of v1 and v2 as well.

I am using Paguro version 3.1.2

if map contains key then error else add key/value to map

Moin,

while implementing a primary index with ImMap i fell fack into mutable habits:

if(contains) then error else add

in my case the bias is on keys being unique and so i can just add the new key and see if the map
instance changes.

That assumes that the following holds true:

map.assoc(existingKey, v) == map

If that holds true i wonder if assocIfAbsent is useful to add.
May take a value or a supplier.

Opinions?

Cal

No way to iterate over maps without allocation

Once any of the map types are created, the only way to inspect the keys or values ends up going through the iterator. This causes Tuple2 creation for each map entry only to possibly extract the key or value from that tuple immediately. Ideally, once a map is created, there would be a way to loop through the whole thing with minimal allocation.

The Clojure collections have previously added a keyIterator and valIterator to deal with this common occurrence. I've attached a patch with a similar implementation.
key_val_iterators.patch.txt

ClassCastException after updating from 3.6.0 to 3.7.1

According to its type signature, RrbTree#ImRrbt.split() returns Tuple2<ImRrbt<E>,ImRrbt<E>>. But after updating to 3.7.1, it actually returns Tuple2<MutRrbt<E>,ImRrbt<E>> (at least in some cases), resulting in the following ClassCastException:

class org.organicdesign.fp.collections.RrbTree$MutRrbt cannot be cast to class org.organicdesign.fp.collections.RrbTree$ImRrbt (org.organicdesign.fp.collections.RrbTree$MutRrbt and org.organicdesign.fp.collections.RrbTree$ImRrbt are in unnamed module of loader 'app')

Same with 3.7.0.

More Benchmarks vs. pCollections (was: incorrect comparison with pCollections)

In the wiki you explain the difference between the two implementation comparing O(log2 n) and O(log32 n). I'm afraid, you don't understand the idea behind the Big O notation. By definition, it ignores any constant multiplier, so that O(k*f(n)) = O(f(n)) for any k <> 0. See "Multiplication by a constant" in the Wikipedia article. From elementary math, log(2, n) = 5 * log(32, n) :

log(32, n)= log(2^5, n) = log(2, n) / log(2, 32) = log(2, n) / 5

It means that
O(log(2, n)) = O(log(32, n))
the same way as
O(3 * n) = O(n)

This means that both UncleJim and PCollections belong to the same big O class, which mathematicians denote simply as O(log(n)).

Of course the constant factor is important for real-world use but it depends on all the implementation details, not just one array length constant. Also, the big O notation describes the asymptotic behaviour of an algorithm when n tends towards infinity and might not reflect its behaviour with smaller values.

RRB-Tree (was: Best way to create PersistentVector with multiple elements?)

What's the preferred way to create a PersistentVector with multiple elements? Would you recommend to first create a list or array, then use concat to benefit from the mutable vector optimization? Would it make sense to have a builder for this? A builder could perhaps perform additional optimizations, such as using a different representation for vectors with a small number of elements.

Return of StaticImports.xform

StaticImports:

public static <T> Transformable<T> xform(Iterable iterable);
public static <T> Transformable<T> xformArray(T... items);

public static <T> Xform<T> xform(Iterable iterable);
public static <T> Xform<T> xformArray(T... items);

Backport to Java 7 (used to include, "through RxJava")

I would like to use this library in Java 6.5 on Android devices, but it's only ready for Java 8 usage.

The replacement consists on swapping the java.lang.func imports for rx.functions ones, moving the interfaces into abstract classes, and refactoring lambdas and method references into anonymous classes.

For the Either types I already have my own implementation based off Scala's you can leverage: https://github.com/pakoito/JavaSealedUnions and https://github.com/pakoito/RxSealedUnions

For Tuples I've been using JavaTuples: https://github.com/pakoito/RxTuples

In case this proves too much work, would you be open to accept a PR, branch it on the same repository, or have a new separate backport project?

Wrong bounds check in UnmodList.listIterator

is: if ( (index < 0) || (index >= size()) )
sb: is: if ( (index < 0) || (index > size()) )

Are you using Unmod interfaces or just the Im (Immutable) ones?

This Might be another leftover from before the Clojure collections. The theory was to add Unmod... versions of the major java.util collections that only deprecate the mutate-in-place methods. Then separately add the Im... (Immutable) interfaces that add back change-methods that return a new changed collection. The thought was that people might want to create a different inheritance hierarchy that still fits into the Java hierarchy without adding anything.

A few years later, I'm less enamored with this idea, basically because I haven't used the Unmod interfaces directly, preferring instead to use the Im interfaces. Also, the PersistentVector can go from mutable to immutable. I think that's a much more useful distinction to represent in a hierarchy. In the interest of still keeping the hierarchy simple, I'd like to put all the change methods from the Im interfaces into the Unmod interfaces. Then make Mutable-List/Set/Map extend Unmod-List/Set/Map instead of Im-List/Set/Map.

I know, there's a RangeOfInt. I plan to add an UnmodSizedIterable and have RangeOfInt extend UnmodSizedIterable. I've only ever used RangeOfInt once...

How to use Paguro with Java stream?

Is there a simpler way to collect a stream into a Paguro data structure, than to collect in a Java collection and to convert to Paguro?

For the moment, here is how I do it:

var vec1 = PersistentVector.ofIter(List.of(10, 2, 3));
var vec2 = PersistentVector.ofIter(vec1.stream().sorted().map(x -> x + 1).collect(Collectors.toList()));

Convert to Kotlin? Feedback from Android and Kotlin devs appreciated...

Instead of converting UncleJim to Java 7 or Java 6, we could convert the whole thing to Kotlin and compile it to Java 6 from there.

"Kotlin generates bytecode which is compatible with Java 6 or newer. This ensures that Kotlin can be used in environments such as Android, where Java 6 is the latest supported version."

I started my new project at work this week using Kotlin. I've only used it for 2 days, but so far it looks amazing! They have shaved off all the backwards-compatibility warts of Java (like primitives and arrays) and added a few of the safer parts of Scala. About the only syntax I really even had to look up was the lambda syntax which is different from either Java or Scala, but certainly no worse.

One alternative for Android developers is a back-port to Java 6 or 7, but that's going to be pretty difficult due to the default methods in interfaces. What if we convert the whole thing to Kotlin and compile it to Java 6 from there? Or maybe Android developers would rather use Kotlin?

Does Kotlin require a runtime library on Android? If so, how big is it? Other considerations? If you work in Android and/or Kotlin or have any relevant experience, your feedback would be appreciated.

Make Collections Implement Serializable

First of all, awesome library! I played with it for a while and really like it. We currently have a library just like yours also inspired by Clojure, PCollections, Scala and of course ADHD Spiewak. We would like to freeze that library and take another route because of our upgrade to Java8.

Maybe be a strange question but are there any plans to make the collections Serializable?

glenkpeterson / paguro Goto Github PK

paguro's People

Contributors

Stargazers

Watchers

Forkers

paguro's Issues

Recommend Projects

Recommend Topics

Recommend Org