glenkpeterson / paguro Goto Github PK
View Code? Open in Web Editor NEWGeneric, Null-safe, Immutable Collections and Functional Transformations for the JVM
License: Other
Generic, Null-safe, Immutable Collections and Functional Transformations for the JVM
License: Other
Is there an example for best practice of inserting an element before an existing element in ImList, MutList?
Do I have to
This comment in the source file RrbTree.java:
// (MIN_NODE_LENGTH + MAX_NODE_LENGTH) / 2 should equal
// STRICT_NODE_LENGTH so that they have the same average node size
// to make the index interpolation easier.
confuses me. The values I see for these constants are:
NODE_LENGTH_POW_2 5
STRICT_NODE_LENGTH 32
HALF_STRICT_NODE_LENGTH 16
MIN_NODE_LENGTH 22
MAX_NODE_LENGTH 44
Thus (MIN_NODE_LENGTH + MAX_NODE_LENGTH) / 2
is equal to (22 + 44) / 2 = 33, 1 more than STRICT_NODE_LENGTH. Perhaps this is just an out of date comment? i.e. maybe off by one is acceptable for the implementation?
Hello,
First, thank you very much for providing theses highly performant persistent datastructures.
However, I noticed that RrbTree does not support split at index 0 or equals to the size of the tree. And I don't really understand why.
If the index is zero then the split is easy:
And if the index equals the size of the tree, then the split is easy again:
It does so for assoc(K, V)
, but not for assoc(Map.Entry<K, V>)
.
It would be great to have a way to create a map without wrapping the keys and values in a tuple, similar to Java Map.of
, as @ericnormand suggested in Clojureverse.
For example, instead of creating a nested map like this:
map(tup("booksByIsbn",
map(tup("978-1779501127",
map(tup("isbn", "978-1779501127"),
tup("title", "Watchmen"),
tup("publicationYear", 1987),
tup("authorIds", vec("alan-moore",
"dave-gibbons")))))),
tup("authorsById",
map(tup("alan-moore",
map(tup("name", "Alan Moore"),
tup("bookIsbns", vec("978-1779501127")))),
tup("dave-gibbons",
map(tup("name", "Dave Gibbons"),
tup("bookIsbns", vec("978-1779501127")))))))
we would be able to create it like that (a bit less verbose):
map("booksByIsbn",
map("978-1779501127",
map("isbn", "978-1779501127",
"title", "Watchmen",
"publicationYear", 1987,
"authorIds", vec("alan-moore",
"dave-gibbons"))),
"authorsById",
map("alan-moore",
map("name", "Alan Moore",
("bookIsbns", vec("978-1779501127"))),
"dave-gibbons",
map("name", "Dave Gibbons",
"bookIsbns", vec("978-1779501127"))))
@cprice404 says (in issue #24):
Our build system runs findbugs automatically. With (as far as I know) a pretty basic findbugs setup, there are 4 or 5 "high priority" warnings on the project, and a few dozen "medium priority" ones. Based on your unit test coverage i somewhat doubt that the things findbugs is calling out are critical bugs, so that won't prevent me from evaluating the lib or anything... But I thought you might be interested to know. The kind of stuff that findbugs picks up in general are spiritually similar to what you are doing with equality tests in your testutils lib, so you might find them interesting.
Thanks for reporting this @cprice404! One quick thought is that these collections come from Clojure and are designed to be fast. They break many rules about type-safety internally, but are tested very carefully to enforce all relevant rules externally. So without seeing the report, I'm concerned that there may be some issues I just can't fix.
On the other hand, actual bugs are not allowed and I hope to be able to fix any soon!
Currently, Paguro uses java.lang.reflect.Array.newInstance()
in a few places. It would be great if these reflective array instantiations could be removed so that applications that depend on Paguro can be compiled to native code with GraalVM. (Currently, this is only possible if all potential element types are statically known and the corresponding array types are registered with GraalVM.)
String[] array = vec("A", "B", "C").toArray(new String[0]); -> java.lang.ClassCastException: [Ljava.lang.Object; cannot be cast to [Ljava.lang.String;
String[] array = vec("A", "B", "C").toArray(new String[3]); -> This works.
UnmodCollection.toArray(T[] as):
if (as.length < size()) {
as = (T[]) new Object[size()];
}
to
if (as.length < size()) {
as = (T[]) java.lang.reflect.Array.newInstance(as.getClass().getComponentType(), size());
}
If index == size()
, then array = null
, and previous()
will throw NPE.
Also, listIterator()
doesn't do any bounds checks (unlike UnmodList.listIterator()
).
Thoughts:
I think ImListTrans
should override concat
, refining the return type to ImListTrans
.
PS: Thanks for transient collections. Big improvement for my use case.
... when iterating over the resulting tree.
I suspect (but haven't verified) that the culprit is RrbTree:1418 and RrbTree:1377, which choose items[0].getClass()
as the array element type to be passed to Array.newInstance()
. This only seems safe if the array is known to be homogeneous.
It would be great if Paguro could do without Array.newInstance()
, as reflective array instantiation makes Paguro difficult to impossible to compile to native code with GraalVM.
Paguro 3.1.0, Oracle JDK 1.8.0_191.
Reproducer:
import org.junit.jupiter.api.Test;
import org.organicdesign.fp.StaticImports;
import org.organicdesign.fp.collections.RrbTree;
public class RrbTreeBug {
@Test
public void test() {
RrbTree<Object> rrb =
// first tree needs to contain at least 32 elements for the bug to surface
StaticImports.<Object>rrb(1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2)
.concat(StaticImports.<Object>rrb("a"));
// iterating throws ArrayStoreException
for (Object elem : rrb) {
System.out.println(elem);
}
}
}
Stack trace:
java.lang.ArrayStoreException
at java.lang.System.arraycopy(Native Method)
at org.organicdesign.fp.collections.Cowry.spliceIntoArrayAt(Cowry.java:126)
at org.organicdesign.fp.collections.RrbTree$Leaf.pushFocus(RrbTree.java:1417)
at org.organicdesign.fp.collections.RrbTree$ImRrbt.pushFocus(RrbTree.java:693)
at org.organicdesign.fp.collections.RrbTree$ImRrbt.iterator(RrbTree.java:686)
at RrbTreeBug.test(RrbTreeBug.java:13)
Hi there,
I was wondering what would be the most concise and efficient way to create a PersistentVector from a single element?
Is it
PersistentVector.ofIter(Collections.singletonList(elem));
or
PersistentVector.empty().append(elem)
or is there some simpler way?
I was looking for a factory method with a varargs signature, so that something like
PersistentVector.of(elem1, elem2, elem3)
becomes possible.
I apologize if opening an issue is not the right way to ask for support.
Best regards,
Sebastian
Hello there,
I have not been able to determine whether PersistentHashMap is thread-safe. The Javadoc and usage examples don't seem to say. More specifically, say that I have a PersistentHashMap<K, V> map
and concurrently execute the following code on it with different key-value combinations, would that be OK or do I need to synchronize?
V oldValue = map.get(key);
V newValue = (oldValue == null) ? value : remappingFunction.apply(oldValue, value);
if (newValue == null) {
return map.without(key);
} else {
return map.assoc(key, newValue);
}
The reference to map never changes, and the map itself is never modified, but can I concurrently execute multiple calls to without
and assoc
and get consistent results?
Moin,
when migrating to immutable data structures i too often face the challenge to have composite keys or values that are to be combined just for the current map usage.
Paying the price of extra tuple objects hurt. For my own RB-Tree I generated extra variants.
I don't consider this a good solution for a library.
What do you think about an AbtractImmutableHash/TreeMap with customizable Entry-Classes and Comparator?
That weakens the immutable contract but with having custom values and comparators already IMHO
thats OK.
Cal
Sorting a MutRrbt
"inplace" is a very common process in my project.
What is the "best practice" for sorting a MutRrbt
data structure (before making it immutable()
)?
Should something simple like the following snippet directly be included in MutRrbt
or can this be done in a more "clever" way?
public static MutRrbt<Object> sort(MutRrbt<Object> rrbTree, int fromIndex, int toIndex,
Comparator<Object> comparator) {
int size = rrbTree.size();
Object[] a = new Object[size];
rrbTree.toArray(a);
Arrays.sort(a, fromIndex, toIndex, comparator);
return StaticImports.mutableRrb(a);
}
I'm talking about the wrapper classes, empty unmodifiable constructors and wrapper methods in FunctionalUtils. They seemed like a step in the right direction before incorporating the persistent collections from Clojure.
A year or two after adding the Clojure collections, I have removed all usages from client code. Instead I always use the data definition shortcuts in StaticImports like vec(...)
, set(...)
, etc., or xform(oldCollection)
.
People like that Paguro is small (less than 250K). I don't want to keep code that no-one is using when there are better ways of doing those things. Without that code, Paguro currently goes from 229,935 to 218,672 bytes which is 11K or about 5% difference. What do you think about eliminating those for Paguro 3.0?
is: Can't drop less than one item.
sb: Can't drop less than zero items.
I'm using Paguro in a project that deals with commutative and non-commutative operations, and I use PersistentHashSet
for commutative ones and a PersistentVector
for non-commutative ones, to distinguish between ordered and unordered elements. Elements are added to these data structures via the concat(Iterable)
method. I noticed that for a PersistentVector
this results (as expected) in a new PersistentVector
object, but the PersistentHashSet
I receive an intermediate transformation operator (Xform$AppendIterDesc
) that (lazily, I assume) represents the concatenation. Notably, the result is not a PersistentHashSet
.
Is this intentionally designed like this? If so, it might be helpful to include the rationale for this divergent behavior somewhere in the documentation. I'm using concat
because it seems to be the only API method that allows me to append to either data structure without knowing what the underlying implementation is. Is there some other method that would allow for that while returning an object of the same type as the original data structure?
New failing test cases given in this PR: #35
I don't have a more full analysis or proposed fix yet, but may spend some time looking for one.
Would be great to have maps and sets that preserve insertion order, a la https://github.com/frankiesardo/linked or https://github.com/amalloy/ordered.
My main motivation for maintaining insertion order is to have/provide a more deterministic user experience (cf. ES6 maps).
Guava's ImmutableSetMultimap is a convenient wrapper around ImmutableMap<K, ImmutableSet<V>>
. It would be awesome to have a similar convenience in Paguro!
Hello,
I'm a former Clojure developer who is now back in Java land. This library sounds really interesting and seems like it would fill in some gaps that I'm sorely missing based on my time in Clojure. I am going to play around with it but need to build from source; I don't see a tag in the repo that corresponds with the 3.0.16
maven artifact. Am I correct in assuming that this is the appropriate commit?:
Also - it seems like you're gaining a lot of momentum in the direction of Kotlin. Kotlin sounds fantastic but I doubt I'll be able to adopt it at work any time in the near future. Do you expect that the library will continue to be seamlessly consumable for Java 8 as your Kotlin changes continue to come in?
Thanks for writing a really interesting library!
The reproduction of the sequence I have below in Clojure, since it is adapting some Clojure test for another library, core.rrb-vector, that I found this behavior. Let me know if you have any questions or would prefer it written in Java.
user=> (def empty-vector (org.organicdesign.fp.collections.RrbTree/empty))
#'user/empty-vector
user=> (class empty-vector)
org.organicdesign.fp.collections.RrbTree$ImRrbt
(defn append-elems-to-vec [v s]
(reduce (fn [v elem]
(.append v elem))
v
s))
#'user/append-elems-to-vec
user=> (def v1 (append-elems-to-vec empty-vector (range 44)))
#'user/v1
user=> (class v1)
org.organicdesign.fp.collections.RrbTree$ImRrbt
user=> v1
(0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43)
user=> (def v2 (append-elems-to-vec empty-vector (range 45)))
#'user/v2
user=> (def v3 (.join v1 v2))
#'user/v3
user=> (class v3)
org.organicdesign.fp.collections.RrbTree$ImRrbt
user=> v3
(0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44)
user=> (def v4 (append-elems-to-vec v3 '(-1)))
#'user/v4
user=> v4
(-1 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44)
user=> (.get v4 0)
-1
It appears that the element -1 is put at the beginning of the vector v4, whereas it should have been the last element.
This does not happen if I make v1 or v2 shorter, but it does also happen for some longer lengths of v1 and v2 as well.
I am using Paguro version 3.1.2
Moin,
while implementing a primary index with ImMap i fell fack into mutable habits:
if(contains) then error else add
in my case the bias is on keys being unique and so i can just add the new key and see if the map
instance changes.
That assumes that the following holds true:
map.assoc(existingKey, v) == map
If that holds true i wonder if assocIfAbsent is useful to add.
May take a value or a supplier.
Opinions?
Cal
Once any of the map types are created, the only way to inspect the keys or values ends up going through the iterator. This causes Tuple2 creation for each map entry only to possibly extract the key or value from that tuple immediately. Ideally, once a map is created, there would be a way to loop through the whole thing with minimal allocation.
The Clojure collections have previously added a keyIterator and valIterator to deal with this common occurrence. I've attached a patch with a similar implementation.
key_val_iterators.patch.txt
According to its type signature, RrbTree#ImRrbt.split()
returns Tuple2<ImRrbt<E>,ImRrbt<E>>
. But after updating to 3.7.1, it actually returns Tuple2<MutRrbt<E>,ImRrbt<E>>
(at least in some cases), resulting in the following ClassCastException:
class org.organicdesign.fp.collections.RrbTree$MutRrbt cannot be cast to class org.organicdesign.fp.collections.RrbTree$ImRrbt (org.organicdesign.fp.collections.RrbTree$MutRrbt and org.organicdesign.fp.collections.RrbTree$ImRrbt are in unnamed module of loader 'app')
Same with 3.7.0.
In the wiki you explain the difference between the two implementation comparing O(log2 n) and O(log32 n). I'm afraid, you don't understand the idea behind the Big O notation. By definition, it ignores any constant multiplier, so that O(k*f(n)) = O(f(n)) for any k <> 0. See "Multiplication by a constant" in the Wikipedia article. From elementary math, log(2, n) = 5 * log(32, n) :
log(32, n)= log(2^5, n) = log(2, n) / log(2, 32) = log(2, n) / 5
It means that
O(log(2, n)) = O(log(32, n))
the same way as
O(3 * n) = O(n)
This means that both UncleJim and PCollections belong to the same big O class, which mathematicians denote simply as O(log(n)).
Of course the constant factor is important for real-world use but it depends on all the implementation details, not just one array length constant. Also, the big O notation describes the asymptotic behaviour of an algorithm when n tends towards infinity and might not reflect its behaviour with smaller values.
What's the preferred way to create a PersistentVector with multiple elements? Would you recommend to first create a list or array, then use concat
to benefit from the mutable vector optimization? Would it make sense to have a builder for this? A builder could perhaps perform additional optimizations, such as using a different representation for vectors with a small number of elements.
StaticImports:
public static <T> Transformable<T> xform(Iterable iterable);
public static <T> Transformable<T> xformArray(T... items);
To
public static <T> Xform<T> xform(Iterable iterable);
public static <T> Xform<T> xformArray(T... items);
I would like to use this library in Java 6.5 on Android devices, but it's only ready for Java 8 usage.
The replacement consists on swapping the java.lang.func imports for rx.functions ones, moving the interfaces into abstract classes, and refactoring lambdas and method references into anonymous classes.
For the Either types I already have my own implementation based off Scala's you can leverage: https://github.com/pakoito/JavaSealedUnions and https://github.com/pakoito/RxSealedUnions
For Tuples I've been using JavaTuples: https://github.com/pakoito/RxTuples
In case this proves too much work, would you be open to accept a PR, branch it on the same repository, or have a new separate backport project?
is: if ( (index < 0) || (index >= size()) )
sb: is: if ( (index < 0) || (index > size()) )
This Might be another leftover from before the Clojure collections. The theory was to add Unmod... versions of the major java.util collections that only deprecate the mutate-in-place methods. Then separately add the Im... (Immutable) interfaces that add back change-methods that return a new changed collection. The thought was that people might want to create a different inheritance hierarchy that still fits into the Java hierarchy without adding anything.
A few years later, I'm less enamored with this idea, basically because I haven't used the Unmod interfaces directly, preferring instead to use the Im interfaces. Also, the PersistentVector can go from mutable to immutable. I think that's a much more useful distinction to represent in a hierarchy. In the interest of still keeping the hierarchy simple, I'd like to put all the change methods from the Im interfaces into the Unmod interfaces. Then make Mutable-List/Set/Map extend Unmod-List/Set/Map instead of Im-List/Set/Map.
I know, there's a RangeOfInt. I plan to add an UnmodSizedIterable and have RangeOfInt extend UnmodSizedIterable. I've only ever used RangeOfInt once...
Is there a simpler way to collect a stream into a Paguro data structure, than to collect in a Java collection and to convert to Paguro?
For the moment, here is how I do it:
var vec1 = PersistentVector.ofIter(List.of(10, 2, 3));
var vec2 = PersistentVector.ofIter(vec1.stream().sorted().map(x -> x + 1).collect(Collectors.toList()));
Instead of converting UncleJim to Java 7 or Java 6, we could convert the whole thing to Kotlin and compile it to Java 6 from there.
"Kotlin generates bytecode which is compatible with Java 6 or newer. This ensures that Kotlin can be used in environments such as Android, where Java 6 is the latest supported version."
I started my new project at work this week using Kotlin. I've only used it for 2 days, but so far it looks amazing! They have shaved off all the backwards-compatibility warts of Java (like primitives and arrays) and added a few of the safer parts of Scala. About the only syntax I really even had to look up was the lambda syntax which is different from either Java or Scala, but certainly no worse.
One alternative for Android developers is a back-port to Java 6 or 7, but that's going to be pretty difficult due to the default methods in interfaces. What if we convert the whole thing to Kotlin and compile it to Java 6 from there? Or maybe Android developers would rather use Kotlin?
Does Kotlin require a runtime library on Android? If so, how big is it? Other considerations? If you work in Android and/or Kotlin or have any relevant experience, your feedback would be appreciated.
First of all, awesome library! I played with it for a while and really like it. We currently have a library just like yours also inspired by Clojure, PCollections, Scala and of course ADHD Spiewak. We would like to freeze that library and take another route because of our upgrade to Java8.
Maybe be a strange question but are there any plans to make the collections Serializable?
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.