amaembo / streamex Goto Github PK

View Code? Open in Web Editor NEW

2.2K 79.0 248.0 4.92 MB

Enhancing Java Stream API

License: Apache License 2.0

Java 100.00%

java java8 collections streams-api

streamex's Introduction

StreamEx 0.8.2

Enhancing Java Stream API.

This library defines four classes: StreamEx, IntStreamEx, LongStreamEx, DoubleStreamEx which are fully compatible with Java 8 stream classes and provide many additional useful methods. Also EntryStream class is provided which represents the stream of map entries and provides additional functionality for this case. Finally, there are some new useful collectors defined in MoreCollectors class as well as primitive collectors concept.

Full API documentation is available here.

Take a look at the Cheatsheet for brief introduction to the StreamEx!

Before updating StreamEx check the migration notes and full list of changes.

StreamEx library main points are following:

Shorter and convenient ways to do the common tasks.
Better interoperability with older code.
100% compatibility with original JDK streams.
Friendliness for parallel processing: any new feature takes the advantage on parallel streams as much as possible.
Performance and minimal overhead. If StreamEx allows to solve the task using less code compared to standard Stream, it should not be significantly slower than the standard way (and sometimes it's even faster).

Examples

Collector shortcut methods (toList, toSet, groupingBy, joining, etc.)

List<String> userNames = StreamEx.of(users).map(User::getName).toList();
Map<Role, List<User>> role2users = StreamEx.of(users).groupingBy(User::getRole);
StreamEx.of(1,2,3).joining("; "); // "1; 2; 3"

Selecting stream elements of specific type

public List<Element> elementsOf(NodeList nodeList) {
    return IntStreamEx.range(nodeList.getLength())
      .mapToObj(nodeList::item).select(Element.class).toList();
}

Adding elements to stream

public List<String> getDropDownOptions() {
    return StreamEx.of(users).map(User::getName).prepend("(none)").toList();
}

public int[] addValue(int[] arr, int value) {
    return IntStreamEx.of(arr).append(value).toArray();
}

Removing unwanted elements and using the stream as Iterable:

public void copyNonEmptyLines(Reader reader, Writer writer) throws IOException {
    for(String line : StreamEx.ofLines(reader).remove(String::isEmpty)) {
        writer.write(line);
        writer.write(System.lineSeparator());
    }
}

Selecting map keys by value predicate:

Map<String, Role> nameToRole;

public Set<String> getEnabledRoleNames() {
    return StreamEx.ofKeys(nameToRole, Role::isEnabled).toSet();
}

Operating on key-value pairs:

public Map<String, List<String>> invert(Map<String, List<String>> map) {
    return EntryStream.of(map).flatMapValues(List::stream).invert().grouping();
}

public Map<String, String> stringMap(Map<Object, Object> map) {
    return EntryStream.of(map).mapKeys(String::valueOf)
        .mapValues(String::valueOf).toMap();
}

Map<String, Group> nameToGroup;

public Map<String, List<User>> getGroupMembers(Collection<String> groupNames) {
    return StreamEx.of(groupNames).mapToEntry(nameToGroup::get)
        .nonNullValues().mapValues(Group::getMembers).toMap();
}

Pairwise differences:

DoubleStreamEx.of(input).pairMap((a, b) -> b-a).toArray();

Support of byte/char/short/float types:

short[] multiply(short[] src, short multiplier) {
    return IntStreamEx.of(src).map(x -> x*multiplier).toShortArray(); 
}

Define custom lazy intermediate operation recursively:

static <T> StreamEx<T> scanLeft(StreamEx<T> input, BinaryOperator<T> operator) {
        return input.headTail((head, tail) -> scanLeft(tail.mapFirst(cur -> operator.apply(head, cur)), operator)
                .prepend(head));
}

And more!

License

This project is licensed under Apache License, version 2.0

Installation

Releases are available in Maven Central

Before updating StreamEx check the migration notes and full list of changes.

Maven

Add this snippet to the pom.xml dependencies section:

<dependency>
  <groupId>one.util</groupId>
  <artifactId>streamex</artifactId>
  <version>0.8.2</version>
</dependency>

Gradle

Add this snippet to the build.gradle dependencies section:

implementation 'one.util:streamex:0.8.2'

Pull requests are welcome.

streamex's People

Contributors

Stargazers

Watchers

Forkers

grossws manikitos abbadon123 vishalzanzrukia antonykapustin grmpcerber luigicortese valery1707 mgivney ravikumarmaddi archie-lab alexeylevin volkov golovnin newday1 harisez stankevichevg agackovka x0ddf tspannhw cybernetics simple555a geniusgeek ekosusil cahillie40 man4ester skapral drozd0 nbardiuk isopov edrb2409 loganathan001 alainlompo hiteshtara aeding isatimur rybalkinsd madalasa venkatesha huantt tommy-kwon ljj038 onexuan frankenjoy123 rf5860 vaznoe levincai mdaniyalkhann andyglick nku915 xlj44400 epkugelmass landawn shettyrgs dzhinn restmad seanf ernestwwchin maximvegorov sharsu wuxinyun achuprin mwylaoma aditya-agrawal bunsoy bravo2017 radistao archer-christ rainbow954 grmendes sham7god canthonyl gestatech hashjang ogregoire zeng08 harrisyang americanstone optyfr xiaoying1990 msteinbeck colinger loordgek gm2211 minhnhut0602 libinjareo bigkahuna1uk maksymus itavgur dilippanwar1 vyacheslav-lapin vsoltys jxsd0084 aballaci lyleshixiong gl5 blacktree itfly diegoeliascosta xuxingquan

streamex's Issues

New collector which collapses nested items

A new collector could be developed which is able to collect input elements to the sorted list removing elements which are considered to be the parts of another element. Elements have well-defined order (either natural order or specified by user-supplied comparator). Also user must supply a BiPredicate<E, E> isPartOf which accepts (child, parent) pair and returns true when child is a part of parent. The following invariants must hold:

for every A, B: B isPartOf A => B > A (every parent precedes its children according to the elements order)
for every A, B, C: C isPartOf A && C > B && B > A => B isPartOf A (all children immediately follow their parent according to the elements order)

The following signatures are proposed:

class MoreCollectors {
    public static <E> Collector<E, ?, List<E>> collapseNested(
                        Comparator<? super E> comparator,
                        BiPredicate<? super E, ? super E> isPartOf) {...}
    public static <E extends Comparable<? super E>> Collector<E, ?, List<E>> collapseNested(
                        BiPredicate<? super E, ? super E> isPartOf) {...}
}

Final solution is to drop the comparator/comparable (assuming the stream is already properly sorted), rename the method to dominators() and swap the BiPredicate arguments. The BiPredicate transitivity is the only non-trivial requirement which must be held for correct processing in both sequential and parallel modes. Non-sorted inputs also could be used to solve some interesting problems. For example, this core leaves only numbers which are bigger than any preceding number:

streamOfIntegers.collect(MoreCollectors.dominators((a, b) -> a >= b));

Related to #29

Draft implementation
Investigate possible ready solutions (if any) in other libraries (Apache Commons, Guava) to check whether there are best practices for such problem (nothing found which exactly matches given problem; Guava RangeSet is probably the closest thing, but still different).
Unit tests
Benchmark, compare with naive solution which sorts the whole input according to comparator, then processes the adjacent elements.
JavaDoc
CHANGES
CHEATSHEET

toMap with merge function and grouping with ordered downstream: allow concurrent collection for unordered source

Methods StreamEx.toMap(keyMapper, valMapper, mergeFunction), StreamEx.toSortedMap(keyMapper, valMapper, mergeFunction), EntryStream.toMap(mergeFunction), EntryStream.toSortedMap(mergeFunction), EntryStream.toCustomMap(mergeFunction, mapSupplier) do not use concurrent collection for parallel stream as mergeFunction is not guaranteed to be commutative. However if stream is unordered it's still possible to use concurrent collection, so it should be implemented. This way user may explicitly call .unordered() to speed up the results.

The same applies for EntryStream.grouping(downstream), EntryStream.grouping(supplier, downstream), StreamEx.groupingBy(classifier, downstream), StreamEx.groupingBy(classifier, supplier, downstream): currently parallel concurrent collection is used only for unordered downstream collector. It could be used also for unordered source.

Added IntStreamEx.of(Integer[] array), etc.

Sometimes input data is located in boxed array like Integer[], Long[] or Double[]. It would be nice to unbox it automatically:

Integer[] input = ...
IntStream stream = IntStreamEx.of(input);

The same for LongStreamEx, DoubleStreamEx.

Add StreamEx.ofLines(Path)

Simple implementation may wrap Files.lines(Path). More complex implementation could be added later which reports an estimated size based on the input file length.

Implementation
Test (with temp file)
JavaDoc
Changes

Add StreamEx.without(T...) and primitive friends

Currently there are StreamEx.without(T), IntStreamEx.without(int) and LongStreamEx.without(long) methods. It would be nice addition to allow specifying several values with var-args:

StreamEx#without(T...)
IntStreamEx#without(int...)
LongStreamEx#without(long...)

There are some open questions though as discussed in #40 (copied):

The main problem here is the computational difficulty. The simplest solution is to perform the linear scan of the supplied array for every stream element which will make the overall difficulty as O(n*m) where n is the number of stream elements and m is the array length. Having such method might encourage users to use it instead of manually converting the input to the efficient data structure. For example, use might have long String[] array. Currently user might write:

String[] toExclude = ...
Set<String> excludeSet = new HashSet<>(Arrays.asList(toExclude));
result = StreamEx.of(something).remove(excludeSet::contains).toList();

After the proposed change user will be able to write:

String[] toExclude = ...
StreamEx.of(something).without(toExclude).toList();

While this is shorter, it would be less efficient. Or should we create a HashSet internally (making the method early-binding)? This way we should require the consistent hashCode/equals implementation (and compareTo if it's comparable as HashSet may use it!). Or should we create a HashSet only if array size is bigger than some threshold? And what about primitive case? If user has some library with primitive sets (like Trove or Koloboke), he may create an efficient set and use it with remove. The StreamEx library cannot use such collections (and it probably would be too much to implement IntSet/LongSet internally, though this could be discussed). So should we use HashSet<Integer> and HashSet<Long> boxed sets in these cases?

Limited joining collector

Implement collector which joins stream of CharSequences with given delimiter which may short-circuit when the given length is reached adding the optional ellipsis. Possible features and settings include:

Roadmap:

Add more overloads for EntryStream.of(K, V, ...)

Currently it's possible to create an EntryStream providing up to three concrete key/value pairs. Such feature is very useful (at least until JEP-269 is coming) and should be extended for more parameters. Probably up to 10 key-value pairs should be supported to be in sync with JEP-269.

Currently it seems unnecessary to extend correspondingly EntryStream.append and EntryStream.prepend: it would be possible to use EntryStream.append(EntryStream.of(...)) if it's necessary to add more than 3 pairs.

Implementation
Tests
JavaDoc
Changes

Improve CollapseSpliterator.forEachRemaining

Current JDK Stream.spliterator() implementation works in a way that after at least one tryAdvance call the forEachRemaining falls to the slower implementation. The CollapseSpliterator.forEachRemaining currently extracts one element from the source before calling source.forEachRemaining. It could be rewritten to avoid this first tryAdvance call which may greatly improve speed in some cases. Also memory footprint may be improved when upstream intermediate operations involve flatMap as buffering will become unnecessary.

draft implementation (45a53a4)
all tests passed
benchmark
update CHANGES.md (0d6d2a9)

Joining collector: possibility to specify custom delimiter used before the last stream element

A custom delimiter could be specified to format more user-friendly messages like this:

StreamEx.of("Red", "Green", "Blue", "Black").collect(
         Joining.with(", ").lastDelimiter(" or "));
// Produces "Red, Green, Blue or Black"

Possible names for new method: lastDelimiter, beforeLast. An open question is whether it should be used when short-circuit occurs.

Defer JDK stream creation until necessary and merge Custom* streams with normal ones

A big refactoring effort could be made to defer creation of JDK stream until it's really necessary. This would allow to stick with spliterators for most of sources and quasi-intermediate operations and sometimes don't create Stream at all. For example StreamEx.of(array).pairMap(blahblah).forEach(blahblah).

This also assumes merging CustomStreamEx and friends into their parents eliminating several classes (and actually reducing the library size a little bit).

skipLast(long n)

Please add StreamEx.skipLast(long n). This method is similar to skip(long n) except this method returns a stream consisting of the initial elements of this stream after discarding the last n elements of the stream. If this stream contains fewer than n elements then an empty stream will be returned.

Here's an implementation.

public static <T> Stream<T> skipLast(Stream<T> stream, int n)
{
   Spliterator<T> iter;
   Stream<T> result;

   iter   = stream.spliterator();
   iter   = new SkipLast<>(iter, n);
   result = StreamSupport.stream(iter, false);

   return(result);
}

private static class SkipLast<T> extends Spliterators.AbstractSpliterator<T> implements Consumer<T>
{
   private final Spliterator<T> m_input;
   private final ArrayDeque<T>  m_queue;
   private final int            m_n;

   public AllButLastN(Spliterator<T> input, int n)
   {
      super(Math.max(input.estimateSize() - n, 0), 0);

      m_input = input;
      m_n     = n;
      m_queue = new ArrayDeque<>(n + 1);
   }

   @Override
   public boolean tryAdvance(Consumer<? super T> action)
   {
      T value;

      while (m_queue.size() <= m_n)
         if (!m_input.tryAdvance(this))
            return(false);

      value = m_queue.pollFirst();

      action.accept(value);

      return(true);
   }

   @Override
   public void accept(T value)
   {
      m_queue.addLast(value);
   }
}

Remove generic parameter restriction on StreamEx.select, EntryStream.selectKeys/selectValues

Currently StreamEx.select is declared this way:

public <TT extends T> StreamEx<TT> select(Class<TT> clazz) {...}

Sometimes people want to select objects by interface which is not subtype of stream type. It would be good to remove TT type restriction and declare simply as

public <TT> StreamEx<TT> select(Class<TT> clazz) {...}

Similarly for EntryStream.selectKeys/selectValues

implementation
tests
CHANGES

Erased signature will not change, so no backward compatibility issues should appear both on source or binary level.

Replace PrependSpliterator with TailConcatSpliterator

Currently only StreamEx.prepend(T...) is TSO-optimized which is inconvenient sometimes. It would be better to reimplement the standard .concat operation fully with our own TailConcatSpliterator which will concatenate two streams and work as TSO for the second one.

Other improvements which could be implemented here:

Do not create unnecessary close handlers.
Do not concatenate with SIZED/empty source: just wrap the original stream.
Use TSO internally, so the long series of prepend calls would not lead to StackOverflow error (though it would not work with append, but we can also consider "rotating" or even truly balancing the concatenation tree).

Primitive specializations could be postponed as we have no primitive headTail yet.

Related to #54.

Implementation
Additional tests
Documentation updates
Changes

Poor `StreamEx.ofLines` parallel performance

BufferedReader's spliterator (taken from Spliterators.spliteratorUnknownSize which returns an IteratorSpliterator) has very large granularity (1024 lines minimum), which makes it impractical for many tasks. See this thread: http://stackoverflow.com/questions/22569040/readerlines-parallelizes-badly-due-to-nonconfigurable-batch-size-policy-in-it

I propose that StreamEx implements its own spliterator. There's a number of ways to do it. I think the most flexible would be to read ahead a certain number of bytes (eg, 256KB). If the file is smaller than that then we can know how many lines it has and return a real (ie, known-size) spliterator. If the file is larger than that, then we return something similar to an IteratorSpliterator, but I think it should still have a much smaller granularity than 1k.

I wrote in the comments to my harshly down-voted answer that the root problem is the spliterator. It's meant to work well when the data is in memory (and is either in a randomly-accessible collection or a tree). It works poorly when we're, you know, actually STREAMING data. The irony. What are your thoughts on that?

Rename package javax.util.streamex

Using of javax package name is not recommended and actually against the Oracle Binary Code License Agreement, SUPPLEMENTAL LICENSE TERMS, F. Thus probably it would be good to change the package name in 0.5.0 version.

The package will be renamed to one.util.streamex, so existing users upon update will need to search-and-replace javax.util.streamex with one.util.streamex across the sources. Also the OSGi Bundle-SymbolicName will be changed to one.util.streamex, thus probably manifests should be changed as well. ~~No changes in Maven group ID/artifact ID will be applied~~. Maven group ID will be changed to one.util to stay in sync with package/OSGi name.

Register util.one
Update codebase (rename packages in main and test)
Update pom.xml to properly create OSGi manifest.
Request access to the one.util group ID on Maven Central.
Update pom.xml group ID.
Update README.md to warn about the change; probably add short migration guide.
Update README.md after 0.5.0 release to specify new group ID.
Add redirects from old documentation URLs (like http://amaembo.github.io/streamex/javadoc/javax/util/streamex/StreamEx.html) to new ones (like http://amaembo.github.io/streamex/javadoc/one/util/streamex/StreamEx.html) preserving the anchor.

StreamEx.parallel(fjp).runLengths() fails to run the task in the specified pool

runLengths() method creates normal EntryStream instead of using the appropriate strategy.

Implement withFirst(): extract first stream element

Sometimes it's useful to extract the first stream element and use it to process other elements. Some solutions are proposed and discussed here. In StreamEx we must make such operation lazy: it should not consume any stream elements until the terminal operation is performed.

Two possible versions are proposed:

class StreamEx<T> {
    // Return EntryStream where keys are the same and represent the first stream element
    // while values are the rest elements (the stream length is one element less than 
    // the original stream)
    public EntryStream<T, T> withFirst() {...};

    // Return StreamEx which elements are the result of applying the mapper function 
    // to the first stream element and every other element
    <U> StreamEx<U> withFirst(BiFunction<T, T, U> mapper) { ... };
}

The biggest problem is the current flatMap bug: this code must produce 1, 2, 3, 4, 5

StreamEx.of(0, 2, 4).flatMap(x -> Stream.of(x, x + 1)).parallel()
              .withFirst().values().toList();

Issue #54 extracted from this one as it's substantially different.

Add scanLeft for primitive streams

The scanLeft operation could be added to primitive streams which returns an array:

int[] IntStreamEx.scanLeft(int seed, IntBinaryOperator accumulator);
int[] IntStreamEx.scanLeft(IntBinaryOperator accumulator);

And the corresponding for LongStreamEx/DoubleStreamEx. An implementation may take advantage on the SIZED stream characterstic (even in parallel) to preallocate the array.

Fix return types in generated StreamEx/EntryStream documentation

Due to JavaDoc bug generic return type of some methods in StreamEx and EntryStream documentation are depicted like S. Example: StreamEx.append. It's expected to be StreamEx<T> in StreamEx.html and EntryStream<K, V> in EntryStream.html. It can be potentially fixed by applying search-and-replace after building the JavaDoc.

mapFirst/mapLast for all stream types

The mapFirst method should map the first stream element while leaving other elements intact. The mapLast method should map the last stream element while leaving other elements intact. The returned stream type is the same as the original stream type.

proof of concept implementation (46efc88)
documentation (8111d56)
testcases (0e55e86)

Zip Streams

Please add a method of the signature: EntryStream<K, V> EntryStream.zip(Stream keys, Stream values). This will allow for zipping two streams together. See this Stack Overflow question for more details.

http://stackoverflow.com/questions/17640754/zipping-streams-using-jdk8-with-lambda-java-util-stream-streams-zip

append/prepend(Collection) should not optimize isEmpty() for concurrent collections

Currently StreamEx.append/prepend(Collection) and EntryStream.append/prepend(Map) optimize the case when supplied collection/map is empty. This might be undesired if the collection/map is concurrent as it may be legally modified during the terminal operation from the another thread. It seems that more correct optimization would be checking collection.spliterator().getExactSizeIfKnown() == 0 instead. This will return true for empty non-concurrent collections and return false for concurrent collection. With #55 we don't even need to create the extra spliterator() for this check: if the check fails, we can use the same spliterator further.

Implementation
Tests
JavaDoc update
Changes

Further improvement of UnknownSizeSpliterator parallel performance

Investigation shows that tweaking the estimated size of USS array part may greatly improve the splitting quality making it comparable with ArrayList-like source (provided that getting the item from the source is relatively cheap compared to downstream processing). Thus such spliterator might become really useful for the parallel processing of unknown size sources (including StreamEx.ofLines, see #3).

Implementation
Benchmarks
Changes

Add EntryStream.flatMapToKey, flatMapToValue

Similarly to EntryStream.mapToKey, EntryStream.mapToValue sometimes it's desired to flatten values only, but use keys as well (or vice versa).

Collapse not working

I didn't found where it came from but the following code does not work:

streamEx
    .sorted()
    .collapse(C::isNested,C::merge)
    .collect(toList());
    ...
    //items are not collapsed

    public static final boolean isNested(T a, T b) {
        return a.isParent(b) || b.isParent(a);
    };

    public static final T merge(T a, T b) {
        return a.isParent(b) ? a : b;
    };

"POJO" is working :

List<T> tmp = 
    stream
        .sorted()
        .collect(toList());
Iterator<T> it = tmp.iterator();
T curr, last;
curr = last = null;
while (it.hasNext()) {
    T oldLast = last;
    last = curr;
    curr = it.next();
    if (last != null && last.isParent(curr)) {
        it.remove();
        curr = last;
        last = oldLast;
    }
}
tmp.stream().collect(toList());
//items are sorted

Create EntryStream.ofTree() attaching the depth value

Currently there are StreamEx.ofTree() methods family which allow to construct the stream of tree-like structure. Sometimes it's desired to track the current tree depth. This could be resolved adding new static methods to EntryStream:

<T> EntryStream<Integer, T> ofTree(T root, BiFunction<Integer, T, Stream<T>> mapper) {}
<T, TT extends T> EntryStream<Integer, T> ofTree(T root, Class<TT> collectionClass,
        BiFunction<Integer, TT, Stream<T>> mapper) {}

The resulting stream contains elements in the values and the tree depth in the keys (0 = root, 1 = immediate root children and so on). Also mapper function receives depth now which allows, for example, to limit tree expansion by given depth:

EntryStream.ofTree(root, (depth, e) -> depth > MAX_DEPTH ? null : e.children());

In this case depth could be easily dropped using EntryStream.values() if it's unnecessary after that.

Simple implementation may use flatMap. Later improvement using custom spliterator is possible (see #17).

Optimize mapFirst/mapLast

Current implementation of mapFirst/mapLast is poor: it concatenates the source stream with single-element stream, then uses pairMap on it. It's even worse for primitive streams where it additionally boxes and unboxes the stream. Research shows that it's relatively simple to add direct support of mapFirst/mapLast to the PairSpliterator including its primitive friends.

Implementation
Additional tests
Benchmarks
Changes

Improve PairSpliterator.forEachRemaining

Current JDK Stream.spliterator() implementation works in a way that after at least one tryAdvance call the forEachRemaining falls to the slower implementation. The PairSpliterator.forEachRemaining (all four implementations ref/int/long/double) currently extracts one element from the source before calling source.forEachRemaining. It could be rewritten to avoid this first tryAdvance call which may greatly improve speed in some cases. Also memory footprint may be improved when upstream intermediate operations involve flatMap as buffering will become unnecessary.

Add MoreCollectors.flatMapping

This should work like newly appeared JDK9 Collectors.flatMapping (see JDK-8071600), but should produce short-circuiting collector for short-circuiting downstream. Short-circuiting should be supported even in the middle of created stream traversal (likely with throwing CancelException), so it could be useful even if outside collector does not support short-circuit:

stream.collect(Collectors.groupingBy(Element::getName, 
                          MoreCollectors.flatMapping(Element::children,
                               MoreCollectors.head(20))));

Such code should not collect more than 20 children and do not call Element::children at all if 20 children are already collected.

initial implementation (a50c48d)
simple unit tests (4da67f3)
unit tests for short-circuiting version (4da67f3)
documentation (d9a2932)
changes (d9a2932)

Support custom thread-pools

Inspired by http://blog.krecan.net/2014/03/18/how-to-specify-thread-pool-for-java-8-parallel-streams/

Add MoreCollectors.toBatches collector

There's repeating requests to add a partitioning of the stream to the fixed size batches. See, for example, this question. Some problems can be solved using StreamEx.ofSubLists, but sometimes the source is not a random access list. It's hardly possible to create efficient parallel implementation for ordered batches, but if we don't guarantee an order (any subset of input elements may be collected into single batch), then implementation becomes quite straightforward. Probably it's a good idea to add such collector.

JavaDoc

Почему бы не написать

/** {@inheritDoc} */

перед каждым методом из Stream?

Backport Java-9 fix for Pattern.splitAsStream

In Java-8 Pattern.compile(anything).splitAsStream("") returns empty stream while in Java-9 it returns a stream of single empty element (see JDK-8069325). The implementation of StreamEx.split(CharSequence, String) and StreamEx.split(CharSequence, Pattern) should be fixed to return single empty element in Java-8 as well (also to conform with #13).

Implementation
Tests
Changes

Add commonPrefix and commonSuffix Collectors

Create new collectors in MoreCollectors:

Collector<CharSequence, ?, String> commonPrefix();
Collector<CharSequence, ?, String> commonSuffix();

Both should find the longest common prefix/suffix of the stream of CharSequence objects. Both are unordered and can short-circuit if there's no common prefix/suffix. Both should process surrogate pairs properly. Both should return empty string for empty input. Probably it's good not to store substring, but some element and length of prefix/suffix to reduce substring operations.

Add to readme all extra feature provided by streamex.

For example, I will need to "merge" two streams. Look to https://github.com/amaembo/streamex - nothing :(

So I found out the operation name on Google ("zip"), in parallel I found several other libraries provide such functionality (protonpack, Lazy-Seq) and only after it in Eclipse I realized that such method is exists in StreamEx.

MoreCollectors.mapping: do not call mapper if short-circuiting downstream reduction is finished

In MoreCollector.mapping the mapper function call can be skipped when downstream reduction is finished even if the whole collection is not short-circuited (for example, Collectors.groupingBy(classifier, MoreCollector.mapping(mapper, shortCircuitingDownstream)) is used). Thus the cascaded reduction may still benefit from short-circuiting if the mapper function is expensive.

Restore PrependSpliterator

While TailConcatSpliterator is implemented (see #55) and useful, the PrependSpliterator which allows adding exactly one element at the beginning of the stream is very important use case for headTail scenarios (see #54). As recursively defined operations may use prepend for every single element, it's significant allocation pressure to create TailConcatSpliterator and ArraySpliterator every time. Replacing them both with simple PrependSpliterator reduces allocation and improves overall performance of recursive headTail operations.

Add StreamEx.of(Enumeration)

Sometimes people still use java.util.Enumeration. Currently there's no easy way to create a Stream of it (though Java-9 will add asIterator() default method which will make things easier — see JDK-8072726). Nevertheless (even in Java-9) it would be useful to have

StreamEx<T> StreamEx.of(Enumeration<T>)

This method would create an Iterator and use StreamEx.of(Iterator<T>) (see #35).

Add StreamEx.split(CharSequence, char) method

Splitting by single char might be more efficient than currently available regex-based splitting. Also it's possible to estimate the size and truly parallelize it jumping to the second part of the string.

Two methods will be added:

StreamEx.split(CharSequence, char); // = StreamEx.split(CharSequence, char, true);
StreamEx.split(CharSequence, char, boolean); // true = String.split(delim, 0); 
                                             // false = String.split(delim, -1).

Add MoreCollectors.toMap methods accepting Entry<K,V>

It's very common to use Collectors.toMap collector supplying Map.Entry as the input. For example searching on StackOverflow this query yields 37 results and this query yields 30 results. Some of the cases are covered by EntryStream.toMap() method, but when complex downstream reduction is desired having special collector which collects only entries would be nice. The following signatures are proposed:

class MoreCollectors {
    public static <K, V> Collector<Entry<K, V>, ?, Map<K, V>> toMap() {};
    public static <K, V, VV> Collector<Entry<K, V>, ?, Map<K, VV>> toMap(Function<V, VV> valueMapper) {};
    public static <K, V> Collector<Entry<K, V>, ?, Map<K, V>> toMap(BinaryOperator<V> combiner) {};
    public static <K, V, VV> Collector<Entry<K, V>, ?, Map<K, VV>> toMap(Function<V, VV> valueMapper, BinaryOperator<VV> combiner) {};

    public static <K, V, M extends Map<K, V>> Collector<Entry<K, V>, ?, M> toCustomMap(Supplier<M> mapSupplier) {};
    public static <K, V, VV, M extends Map<K, VV>> Collector<Entry<K, V>, ?, M> toCustomMap(Function<V, VV> valueMapper, Supplier<M> mapSupplier) {};
    public static <K, V, M extends Map<K, V>> Collector<Entry<K, V>, ?, M> toCustomMap(BinaryOperator<V> combiner, Supplier<M> mapSupplier) {};
    public static <K, V, VV, M extends Map<K, VV>> Collector<Entry<K, V>, ?, M> toCustomMap(Function<V, VV> valueMapper, BinaryOperator<VV> combiner, Supplier<M> mapSupplier) {};

}

Need to investigate when unwanted signature clashes are possible (using method references) and possibly remove some overloads or rename collectors.

Implement headTail(): map (head, tailStream) to another stream

Extracted from #50 as it's substantially different feature.

The following method is proposed:

class StreamEx<T> {
    // Return StreamEx which is the result of applying the mapper function to the first
    // stream element and the stream of the rest elements; the mapper function is applied 
    // at most once during the terminal operation execution
    <U> StreamEx<U> headTail(BiFunction<T, StreamEx<T>, Stream<U>> mapper) { ... };

    // Supplier is called if this stream is empty
    <U> StreamEx<U> headTail(BiFunction<T, StreamEx<T>, Stream<U>> mapper, Supplier<Stream<U>> supplier) { ... };
}

Such method is really flexible (it allows to implement most of other intermediate operations). The drawback is that it cannot be parallelized well (only poor-man buffered parallelization could be performed) and recursive usage eats the stack (which could be partially solved if TSO can be implemented for some spliterators).

Usage examples.

Prime numbers:

static StreamEx<Integer> sieve(StreamEx<Integer> input) {
    return input.withFirst((head, tail) -> sieve(tail.filter(n -> n % head != 0))
                      .prepend(head));
}

sieve(StreamEx.iterate(2, x -> x+1)).takeWhile(x -> x < 10000).forEach(System.out::println);

Lazy scanLeft:

static <T> StreamEx<T> scanLeft(StreamEx<T> input, BinaryOperator<T> operator) {
    return input.headTail((head, tail) -> 
        scanLeft(tail.mapFirst(cur -> operator.apply(head, cur)), operator)
            .prepend(head));
}

Revert stream:

static <T> StreamEx<T> reverse(StreamEx<T> input) {
    return input.headTail((head, tail) -> reverse(tail).append(head));
}

takeWhile op implementation:

static <T> StreamEx<T> takeWhile(StreamEx<T> input, Predicate<T> predicate) {
    return input.headTail((head, tail) -> predicate.test(head) ? 
                 takeWhile(tail, predicate).prepend(head) : null );
}

takeWhileClosed (including the first violating element):

static <T> StreamEx<T> takeWhileClosed(StreamEx<T> input, Predicate<T> predicate) {
    return input.headTail((head, tail) -> predicate.test(head) ? 
            ? takeWhileClosed(tail, predicate).prepend(head)
            : Stream.of(head));
}

cycle - infinitely cycles the input, lazy

static <T> StreamEx<T> cycle(StreamEx<T> input) {
    return input.headTail((head, tail) -> cycle(tail.append(head)).prepend(head));
}

mirror - this stream, then reversed stream, also lazy

static <T> StreamEx<T> mirror(StreamEx<T> input) {
    return input.headTail((head, tail) -> mirror(tail).append(head).prepend(head));
}

every - take every nth element

static <T> StreamEx<T> every(StreamEx<T> input, int n) {
    return input.headTail((head, tail) -> every(tail.skip(n-1), n).prepend(head) );
}

the first stream element if nothing matches the predicate; the first matching otherwise

static <T> T firstMatchingOrFirst(StreamEx<T> stream, Predicate<T> predicate) {
    return stream.headTail(
          (head, tail) -> tail.prepend(head).filter(predicate).append(head))
                    .findFirst().get();
}

stream of singleton lists to stream of fixed-size batches

static <T> StreamEx<List<T>> batches(StreamEx<List<T>> input, int size) {
    return input.headTail((head, tail) -> head.size() >= size ? 
        batches(tail, size).prepend(head) : 
        batches(tail.mapFirst(next -> StreamEx.of(head, next).toFlatList(l -> l)), size));
}

stream of singleton lists to stream of sliding windows

static <T> StreamEx<List<T>> sliding(StreamEx<List<T>> input, int size) {
    return input.headTail((head, tail) -> head.size() == size ? 
        sliding(tail.mapFirst(next -> StreamEx.of(head.subList(1, size), next)
              .toFlatList(l -> l)), size).prepend(head) : 
        sliding(tail.mapFirst(next -> StreamEx.of(head, next).toFlatList(l -> l)), size));
}

Primitive specializations also could be implemented using BiFunction<Integer, IntStreamEx, IntStream> mapper, etc. (postponed)

Declare StreamEx.append(T...)/prepend(T...) as SafeVarargs

It's perfectly ok to use @SafeVarargs with append/prepend. The only problem is that these methods must be declared final now.

Implementation
Changes

Add short-circuiting reducingWithZero() collector

It's possible to add a short-circuiting reducing collectors like this:

public static <T> Collector<T, ?, Optional<T>> reducingWithZero(T zero, BinaryOperator<T> op);
public static <T> Collector<T, ?, T> reducingWithZero(T zero, T identity, BinaryOperator<T> op);

Here zero is the element which satisfies the following property: for any t op.apply(zero, t) is equal to op.apply(t, zero) and equal to zero. So when zero is reached, further reduction could be stopped.

Also, a shortcut StreamEx.reduceWithZero == collect(reducingWithZero) should be created

Examples:

StreamEx.of(integers).reduceWithZero(0, 1, (a, b) -> a*b));
StreamEx.of(integers).reduceWithZero(Integer.MAX_VALUE, Math::max));
StreamEx.of(integers).reduceWithZero(0, (a, b) -> a & b)); // like andingInt()
StreamEx.of(integers).reduceWithZero(0xFFFFFFFF, (a, b) -> a | b));
StreamEx.of(sets).reduceWithZero(Collections.emptySet(), 
  (set1, set2) -> StreamEx.of(set1).filter(set2::contains).toSet())); // like intersecting()

Re-implement Stream.skip to have reasonable parallel behavior

Currently, something like StreamEx.ofLines(...).skip(1).parallel().... doesn't do anything reasonable or useful. Instead of skipping the file's header, it skips a random line in the file. I think it should be possible to change its behavior to actually skip the first item if the source stream is ordered.

Enhance Joining collector

The following enhancements could be implemented for Joining collector in future versions:

Limit length by custom user function (which probably computes the rendered string width in the UI): restrictions on function definition should be clearly specified
Possibility to cut using custom BreakIterator instance or BreakIterator factory
Possibility to cut using custom user function which can adapt java.text.BreakIterator or possibly ICU4J BreakIterator
Possibility to specify the non-default Locale for BreakIterator in cutSymbols/cutWords/maxSymbols modes (need to investigate when Locale differences matter).

Add chain() method for all stream types

As suggested in JDK-8140283 a chain method could be added to all stream types which maps the current stream to the result of the specified function:

<U> chain(Function<? super StreamEx<T>,U> func) {
    return func.apply(this);
}

This would allow enhancing the streams with custom processing method while keeping the chaining syntax.

It's better to stay in sync with the JDK-8140283 in order not to produce name/signature clash with future Java versions.

Support StreamEx.of(Iterator) (and primitives) and optimize it for parallel processing

New static methods could be added:

StreamEx<T> StreamEx.of(Iterator<T>)
EntryStream<K, V> EntryStream.of(Iterator<Entry<K, V>>)
IntStreamEx IntStreamEx.of(PrimitiveIterator.OfInt)
LongStreamEx LongStreamEx.of(PrimitiveIterator.OfLong)
DoubleStreamEx DoubleStreamEx.of(PrimitiveIterator.OfDouble)

JavaDoc should mention that use of such methods is discouraged as iterator has poor characteristics and unknown size which results in poor parallel processing. However parallel processing quality might be improved using custom UnknownSizeSpliterator implementation (see the discussion in #3). So this spliterator should be used internally instead of JDK Spliterators.spliteratorUnknownSize.

Probably StreamEx.ofLines could also be switched to this spliterator, but additional investigation is necessary.

Flat mapping of collection with optional

Hi,

Take a look on http://short-coding-notes.blogspot.cz/2014/09/flatten-list-of-optional-values-in-java.html

Is it implemented somehow in streamex?

Best,
Stas

StreamEx.ofTree: rewrite with custom spliterator

Currently StreamEx.ofTree implementation is flatMap-based and inherits all the flatMap issues: poor short-circuiting and buffering when external iteration used. It's possible to write custom spliterator which would be free of these disadvantages and probably could split better.

StreamEx.toSet() for parallel stream may use concurrent collection

StreamEx/EntryStream.toSet() now is a simple shortcut to collect(Collectors.toSet()). For parallel stream it can collect into single ConcurrentHashMap.newKeySet() avoiding the combine step which should work faster.

The problem is handling null element for parallel stream as ConcurrentHashMap.newKeySet() does not support such key.

Implementation
Tests
Benchmark (compare with current version for parallel stream)
Changes.MD