juliacollections / itertools.jl Goto Github PK

View Code? Open in Web Editor NEW

153.0 16.0 28.0 822 KB

Common functional iterator patterns

License: Other

Julia 100.00%

itertools chain julia julia-language julialang iterators functional-iterators

itertools.jl's Introduction

IterTools.jl

Common functional iterator patterns.

Installation

Install this package with Pkg.add("IterTools")

itertools.jl's People

Contributors

Stargazers

Watchers

itertools.jl's Issues

New Iteration Protocol

Keno:

The recommendation is to either define iteration conditionally (if VERSION, then define one or the other) or just drop support for 0.6

I propose we have one file for 0.6 iterators and one for 0.7 iterators. I doubt there will be much shared code. Then once 1.0 is out we can just delete the 0.6 stuff.

I will probably start working on this soon so please post if you have any thoughts.

Fix all the failing doctests

There are lots, but I think most of them are due to the way jldoctest handles @show. Will require more investigation.

a proposal for a sort of partition by

Could it be convenient to have an iterator that is somewhere between groupby and partition?
The application of the function refers to the case in which we want to take some consecutive slices of variable dimensions (steps) from an iterator

julia> itr=10:-1:1
10:-1:1

julia> steps=[1,2,3,2]
4-element Vector{Int64}:
 1
 2
 3
 2

julia> collect(partby(itr,steps))
4-element Vector{Tuple{Vararg{Int64}}}:
 (10,)
 (9, 8)
 (7, 6, 5)
 (4, 3)

julia> steps=[1,2,3,5]
4-element Vector{Int64}:
 1
 2
 3
 5

julia> collect(partby(itr,steps))
3-element Vector{Tuple{Vararg{Int64}}}:
 (10,)
 (9, 8)
 (7, 6, 5)

julia> steps=[4,2,3,5]
4-element Vector{Int64}:
 4
 2
 3
 5

julia> collect(partby(itr,steps))
3-element Vector{Tuple{Vararg{Int64}}}:
 (10, 9, 8, 7)
 (6, 5)
 (4, 3, 2)

julia> steps=[4,2,3,1, 5]
5-element Vector{Int64}:
 4
 2
 3
 1
 5

julia> collect(partby(itr,steps))
4-element Vector{Tuple{Vararg{Int64}}}:
 (10, 9, 8, 7)
 (6, 5)
 (4, 3, 2)
 (1,)

julia> steps=[2,3,1, 2,7]
5-element Vector{Int64}:
 2
 3
 1
 2
 7

julia> collect(partby(itr,steps))
4-element Vector{Tuple{Vararg{Int64}}}:
 (10, 9)
 (8, 7, 6)
 (5,)
 (4, 3)

julia> collect(partby(partition(itr,2,1),steps))
4-element Vector{Tuple{Vararg{Tuple{Int64, Int64}}}}:    
 ((10, 9), (9, 8))
 ((8, 7), (7, 6), (6, 5))
 ((5, 4),)
 ((4, 3), (3, 2))

#-------------


struct PartBy{I, S}
    xs::I
    steps::S
end
_length_partby(i,s)= findlast(<=(length(i)), accumulate(+, s))
eltype(::Type{<:PartBy{I,S}}) where {I,S} = Tuple{Vararg{eltype(I)}}# Tuple{eltype(I),Vararg{eltype(I)}} #Vector{eltype(I)}
IteratorSize(::Type{<:PartBy{I,S}}) where {I,S} = HasLength()
length(it::PartBy{I,S}) where {I,S} = _length_partby(it.xs, it.steps)


function partby(xs::I, steps::S) where {I, S}
    if any(<=(0),steps)
        throw(ArgumentError("all steps must be positives."))
    end
    PartBy{I, S}(xs, steps)
end

macro ifsomething(ex)
    quote
        result = $(esc(ex))
        result === nothing && return nothing
        result
    end
end

function iterate(it::PartBy{I, S}, state=nothing) where {I, S}
    if state === nothing
        xs_val, xs_state = @ifsomething iterate(it.xs)
        step_val, step_state = @ifsomething iterate(it.steps)
        result = Vector{eltype(I)}(undef, step_val)
        result[1]=xs_val
        kgo = true
        for i in 2:step_val
            result[i], xs_state = @ifsomething iterate(it.xs, xs_state)
        end
       step_iter = iterate(it.steps, step_state)
        if isnothing(step_iter)
            return (tuple(result...),(false, xs_val, xs_state, step_val, step_state))
        else
            step_val, step_state = step_iter
        end step_val, step_state = @ifsomething iterate(it.steps, step_state)
    else
        (kgo, xs_val, xs_state, step_val, step_state) = state
        kgo || return nothing
        result = Vector{eltype(I)}(undef, step_val)       
        for i in 1:step_val
            result[i], xs_state = @ifsomething iterate(it.xs, xs_state)
        end
        step_iter = iterate(it.steps, step_state)
        if isnothing(step_iter)
            return (tuple(result...),(false, xs_val, xs_state, step_val, step_state))
        else
            step_val, step_state = step_iter
        end
    end
    return (tuple(result...), (kgo,xs_val, xs_state, step_val, step_state))
end

Would takewhile be a natural candidate for IterTools.jl?

Would a takewhile function be a natural function to add to IterTools? It consumes a given iterator as long as a given condition is met.

So collect(TakeWhile(Base.Iterators.countfrom(1), x-> x^2 <= 50)) would yield [1, 2, 3, 4, 5, 6].

My own (simplistic?) implementation looks like this:

struct TakeWhile{I}
    xs::I
    cond::Function
end

Base.iterate(it::TakeWhile) = iterate(it.xs)

function Base.iterate(it::TakeWhile, state)
    (val, state) = iterate(it.xs, state)
    it.cond(state) || return nothing
    val, state
end

Base.IteratorSize(it::TakeWhile) = Base.SizeUnknown()

imap doesn’t preserve type

Hello,

I don't really know if that a bug or a limitation but imap from IterTools.jl doesn't preserve type

julia> itr = imap(x->2*x, 1:10)
julia> collect(itr)
10-element Array{Any,1}:
  2
  4
  6
  8
 10
 12
 14
 16
 18
 20

I was expecting same result (ie Vector of Int) as

julia> 2*collect(1:10)
10-element Array{Int64,1}:
  2
  4
  6
  8
 10
 12
 14
 16
 18
 20

julia> map(x->2*x, 1:10)
10-element Array{Int64,1}:
  2
  4
  6
  8
 10
 12
 14
 16
 18
 20

My goal is to simply apply a function to values of an iterator and get a new iterator (preserving type)

Kind regards

PS : crossposted on https://discourse.julialang.org/t/imap-from-itertools-jl-doesnt-preserve-type/7683

documentation please

it would be awesome to have an updated documentation similar to the one in Iterators.jl

Exponential time complexity for recursive chains with increasing NTuple length

Hi all,

there is some strange exponential time and memory behavior that I don't understand:
I have a tree with some additional data (Vector{Foo}) on every node, and eventually I want to iterate through all the Foo objects without unnecessary allocations. At first I thought I ran into an infinite loop, because CPU got stuck at 100% and nothing else would happen, but I was able to track down the problem to the following minimal example. I'll just use a linear list instead of a wider tree here, and run length() instead of collecting items. That is enough to show the exponential behavior:

versioninfo()

using IterTools

struct Foo end

"""
Creates a Chain{Tuple{Chain{Tuple{ ... NTuple{n,Vector{Foo}} ... }}}} of depth `d`,
with an NTuple of length `n` on the lowest level, containing empty Vectors of `Foo`.
"""
function g(d::Int, n::Int)
  if d == 0
    return [Foo()]
  elseif d == 1
    return chain([g(d - 1, n) for i in 1:n]...)
  else
    return chain(g(d - 1, n))
  end
end

for i in 1:19
  @time length(g(5, i))
end

Result for the above example (d=5, n=19):

Julia Version 0.6.2
Commit d386e40c17 (2017-12-13 18:08 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, haswell)

  0.184802 seconds (123.40 k allocations: 6.813 MiB, 26.74% gc time)
  0.115444 seconds (123.95 k allocations: 6.828 MiB)
  0.115224 seconds (129.93 k allocations: 7.195 MiB)
  0.175451 seconds (136.42 k allocations: 7.585 MiB, 31.63% gc time)
  0.120656 seconds (143.94 k allocations: 8.109 MiB)
  0.124801 seconds (153.71 k allocations: 8.956 MiB)
  0.176149 seconds (168.00 k allocations: 10.415 MiB, 28.02% gc time)
  0.141948 seconds (191.84 k allocations: 13.889 MiB, 2.80% gc time)
  0.144715 seconds (235.74 k allocations: 20.443 MiB)
  0.167659 seconds (321.68 k allocations: 37.108 MiB, 3.56% gc time)
  0.208135 seconds (495.73 k allocations: 69.814 MiB, 3.25% gc time)
  0.300323 seconds (854.18 k allocations: 138.819 MiB, 3.42% gc time)
  0.487694 seconds (1.60 M allocations: 307.199 MiB, 3.67% gc time)
  0.874714 seconds (3.14 M allocations: 639.010 MiB, 3.37% gc time)
  1.708150 seconds (6.36 M allocations: 1.311 GiB, 3.37% gc time)
  3.567857 seconds (13.05 M allocations: 3.163 GiB, 3.62% gc time)
  7.628431 seconds (26.95 M allocations: 6.670 GiB, 3.52% gc time)
 16.260810 seconds (55.80 M allocations: 14.041 GiB, 3.78% gc time)
 34.626086 seconds (115.57 M allocations: 29.456 GiB, 4.18% gc time)

If I run the same loop again, the times are suddenly what I would've expected in the first place:

  0.000087 seconds (27 allocations: 992 bytes)
  0.000060 seconds (29 allocations: 1.094 KiB)
  0.000045 seconds (31 allocations: 1.219 KiB)
  0.000047 seconds (33 allocations: 1.344 KiB)
  0.000049 seconds (35 allocations: 1.469 KiB)
  0.000047 seconds (37 allocations: 1.594 KiB)
  0.000042 seconds (39 allocations: 1.719 KiB)
  0.000041 seconds (41 allocations: 1.844 KiB)
  0.000045 seconds (43 allocations: 1.969 KiB)
  0.000042 seconds (45 allocations: 2.094 KiB)
  0.000040 seconds (47 allocations: 2.219 KiB)
  0.000041 seconds (49 allocations: 2.344 KiB)
  0.000040 seconds (51 allocations: 2.469 KiB)
  0.000039 seconds (53 allocations: 2.594 KiB)
  0.000042 seconds (55 allocations: 2.719 KiB)
  0.000046 seconds (57 allocations: 2.844 KiB)
  0.000048 seconds (59 allocations: 2.969 KiB)
  0.000044 seconds (61 allocations: 3.094 KiB)
  0.000045 seconds (63 allocations: 3.219 KiB)

I guess my question is: Why is that? And how can I fix it efficiently? I could change chain to vcat, but I was hoping to avoid any unnecessary copies/allocations.

eager vs lazy evaluation

Maybe we could begin unifying the iterator interface to a lazy only version to get people used to materializing manually where needed. Then we should add some Cache iterator and a materializing function though. The latter one would probably just call collect. The Cache iterator would be needed to prevent the situation that originally made Base use eager evaluation as the default.

subsets() not working on Set() objects

julia> for s in subsets(Set([1,2,3]))
           println(s)
       end
ERROR: MethodError: no method matching getindex(::Set{Int64}, ::Array{Bool,1})
Stacktrace:
 [1] next(::IterTools.Subsets{Set{Int64}}, ::Array{Bool,1}) at /home/robert/.julia/v0.6/IterTools/src/IterTools.jl:640
 [2] anonymous at ./<missing>:?

The docs say that subsets should work on collections. For me, it works for vectors, but not for sets. Am I doing something wrong?

reverse iterators for Julia 0.7

Now that 0.7 has reverse iteration (JuliaLang/julia#24187), it would be nice to support that in the IterTools iterators where it makes sense:

define Iterators.reverse(itr::SomeIterType) in cases where the reverse iterator is easily written in terms of some other iterator type. (e.g. if itr "wraps" around another iterator, just call Iterators.reverse on the "inner" iterator).
define start/next/done methods for Iterators.Reverse{SomeIterType} in cases where you need a specialized iteration protocol.

For examples, see https://github.com/JuliaLang/julia/blob/master/base/iterators.jl

Iterate over rows (or columns) of a matrix

Hello,

following JuliaLang/julia#14491
we should be able to iterate easily over columns (or over rows) of a matrix
but I'm not sure if this is in the scope of this package

Kind regards

Tag new release

Could a maintainer please tag a new release? Here's a diff since the last release:

v1.3.0...master

Deploy Documenter.jl docs

We currently can generate them but don't deploy them. We should do that and then remove the duplicate README docs.

Proposal: Add zip_longest

Julia's Base have an often-used zip function, which terminates as soon as any of the iterators are finished. Stopping when any of the iterators is finished, is one of the two natural stopping conditions, with the other being stopping when all the iterators are finished, and the finished iterators producing a sentinel value until this happens.

I propose adding this function to IterTools.jl, perhaps along the lines of:

zip_longest(iters..., default=nothing)
For one or more iterable objects, return an iterable of tuples, where the `i`th tuple
contains the `i`th component of each input iterable if it is not finished, and `default`
otherwise.

This proposed zip_longest is included in the Python base itertools library, which otherwise contain only a few functions, indicating that at least the Python developers agree this iteration function is particularly useful.

`distinct` version allowing arbitrary equality function

Would it make sense to add a version of distinct that allows users to pass a definition of equality as the first argument? I'm interested in using distinct on an iterator over floats, where I'm less interested in exact equality.

chain iteratorsize

This looks like a bug. The underlying iterator doesn't have length() defined.

julia> using IterTools
julia> type Iterable x::Float64 end;
julia> Base.done(x::Iterable, state) = state == 1;
julia> Base.next(x::Iterable, state) = (0, state+1);
julia> Base.start(x::Iterable) = 0;
julia> Base.iteratorsize(::Type{Iterable}) = Base.SizeUnknown()
julia> collect(chain(Iterable(0), Iterable(0)))
ERROR: MethodError: no method matching length(::Iterable)
julia> Base.iteratorsize{T}(::Type{IterTools.Chain{T}}) = Base.SizeUnknown()
WARNING: Method definition iteratorsize(Type{IterTools.Chain{T}}) in module IterTools at /Users/xxx/.julia/IterTools/src/IterTools.jl:1158 overwritten in module Main at REPL[14]:1.

julia> collect(chain(Iterable(0), Iterable(0)))
2-element Array{Any,1}:
 0
 0

count() function deprecated

I updated to IterTools v0.1.0 today, and got the following warning messages when loading DataFrames package. I am on Julia v0.6.0 windows 10.

WARNING: Method definition count() in module IterTools at deprecated.jl:56 overwritten in module Iterators at deprecated.jl:56.
WARNING: Method definition count(Number) in module IterTools at deprecated.jl:56 overwritten in module Iterators at deprecated.jl:56.
WARNING: Method definition count(Number, Number) in module IterTools at deprecated.jl:56 overwritten in module Iterators at deprecated.jl:56.

New method for subsets of fixed size?

I want to iterate over all unordered pairs in a collection A. Thanks to this package, the easiest way to do this is to use subsets(A,2), but this seems rather wasteful as it does not exploit that I know at compile time the size of the set. So I suggest to add a method subsets(xs,::Type{Val{k}}) which can do so.

I'd be happy to provide a PR, but I first wanted to check that there's a chance it will make it into this package. Also, it would be my first PR ever, so I would likely need some help along the way.

chain an iterator of iterators

Python's itertools has that - chain.from_iterable.

There was a pull request for Iterators.jl for that here: JuliaCollections/Iterators.jl#34

It is useful for a couple of reasons:

chain(xs...) splatting can be memory expensive
chain(xs...) dispatch can be extremely slow (but it seems this is fixed now, given that chain(xs...)=Chain(xs) - either way, there is still unnecessary splatting)

Here are some timing results:

to_be_chained = (x:x+1000 for x in 1:1000)

# The current chain (splatting, but maybe? no slow dispatch)
@time collect(chain(to_be_chained...));
  4.676737 seconds (5.24 M allocations: 15.082 GiB, 41.03% gc time)

# Lazy chain implemented with a generator comprehension
@time collect((el for iter in to_be_chained for el in iter));
 0.141397 seconds (29.84 k allocations: 10.547 MiB, 3.31% gc time)

Naively I was thinking I can use Chain directly to skip the splatting, but Chain explicitly requires a Tuple of iterables, not a generic iterator of iterables:

@time collect(IterTools.Chain(to_be_chained))
MethodError: Cannot `convert` an object of type Base.Generator{UnitRange{Int64},##193#194} to an object of type IterTools.Chain
This may have arisen from a call to the constructor IterTools.Chain(...),
since type constructors fall back to convert methods.

`iterated` with no seed

Would be nice to have a version of iterated with no seed that just returns the iterated function f composed n times. Leaving this here so I don't forget to make a PR when I have the time.

Collect subsets of tuple

Is this a bug in IterTools?

julia> using IterTools
julia> collect(subsets((0,2,6), 2))

ERROR: MethodError: Cannot `convert` an object of type Tuple{Int64,Int64} to an object of type Array{Int64,1}
...
Stacktrace:
 [1] setindex!(::Array{Array{Int64,1},1}, ::Tuple{Int64,Int64}, ::Int64) at ./array.jl:767
 [2] copyto!(::Array{Array{Int64,1},1}, ::IterTools.Binomial{Tuple{Int64,Int64,Int64}}) at ./abstractarray.jl:671
 [3] _collect at ./array.jl:550 [inlined]
 [4] collect(::IterTools.Binomial{Tuple{Int64,Int64,Int64}}) at ./array.jl:544
 [5] top-level scope at none:0

This works though:

julia> collect(x for x in subsets((0,2,6), 2))
3-element Array{Tuple{Int64,Int64},1}:
 (0, 2)
 (0, 6)
 (2, 6)

Julia 1.1
IterTools 1.3

partition function type stability for constant arguments

I frequently like to use partition for a sliding window over a fixed array, like partition(arr, 2, 1) to get a set of pairs. The issue is that this code is type unstable with the current version of the code. kristoffer.carlsson
gave a solution that inlines pieces, but mentions it might not be optimal in general (https://discourse.julialang.org/t/itertools-jl-partition-type-stability/13847)

If there is a way to fix this common use case would be super awesome!

`eltype` of partition not inferred, mention in documentation

I had assumed constant propagation would make these two functions f and g generate identical code.

function f(a)
    s = 0
    for (a,b) in IterTools.partition(a, 2, 1)
        s += sum(abs.(b .- a))
    end
    return s
end

function g(a)
    s = 0
    for (a,b) in IterTools.Partition{typeof(a), 2}(a, 1)
        s += sum(abs.(b .- a))
    end
    return s
end

However

julia> @code_warntype f([1,2,3])
Variables
  #self#::Core.Compiler.Const(f, false)
  a@_2::Array{Int64,1}
  s::Int64
  @_4::Union{Nothing, Tuple{Tuple{Vararg{Int64,N} where N},Tuple{Int64,Array{Int64,1}}}}
  b::Int64
  @_6::Int64
  a@_7::Int64

Body::Int64
1 ─       (s = 0)
│   %2  = IterTools.partition::Core.Compiler.Const(IterTools.partition, false)
│   %3  = (%2)(a@_2, 2, 1)::IterTools.Partition{Array{Int64,1},_A} where _A
│         (@_4 = Base.iterate(%3))
│   %5  = (@_4 === nothing)::Bool
│   %6  = Base.not_int(%5)::Bool
└──       goto #4 if not %6
2 ┄ %8  = @_4::Tuple{Tuple{Vararg{Int64,N} where N},Tuple{Int64,Array{Int64,1}}}::Tuple{Tuple{Vararg{Int64,N} where N},Tuple{Int64,Array{Int64,1}}}
│   %9  = Core.getfield(%8, 1)::Tuple{Vararg{Int64,N} where N}

[...]

where as @code_warntype g([1,2,3]) shows no red.

I tried this on Julia 1.3 and 1.4. It seems like something which Julia could fix, but until then I'm wondering if it's beneficial to include a note in the documentation for partition.

fix doctests

#104 disabled doctests
just to get docs working again.

But it would be good to go and renable them

Docstring for iterate is misleading

The docstring for iterate(f, x) shows the sequence starting with f(x) but the sequence actually starts with x

[Feature Request] Sliding Windows

Given a sequence c_1, c_2, ..., c_n, ... which are outputs of a (possibly infinite) iterator. The windows(c, k) should return an iterator with first entry [c_1, c_2, ..., c_k], second entry [c_2, c_3, ..., c_{k+1}] and so on.

The name is from a common technique called sliding window, mentioned here https://en.cppreference.com/w/cpp/ranges/slide_view .

Consider adding more iterators inspired by PyToolz

http://toolz.readthedocs.io/en/latest/api.html

[proposal] new ijoin function , an iterable join

Hi. This is a proposal for a new feature based on the declension of the well known join function with an iterable arg.

import Base.Iterators as _I

struct _IJoin
    s
    delim
end

ijoin(s, delim) = _isijoinable(s) ? _IJoin(s, delim) : throw(ArgumentError("s = $s"))

_isijoinable(s) = _isijoinable(Base.IteratorSize(s))
_isijoinable(::Base.HasLength) = true
_isijoinable(::Base.HasShape{N}) where {N} = N==1
_isijoinable(::Base.IteratorSize) = true

Base.IteratorSize(a::_IJoin) = Base.IteratorSize(a.s)
Base.length(a::_IJoin) = 2*length(a.s)-1
Base.size(a::_IJoin) = let (s1,)=size(a.s); 2*s1-1 end

Base.iterate(a::_IJoin) =
    let y=iterate(a.s); y!==nothing ? (y[1], (:yld2,y[2])) : nothing end
Base.iterate(a::_IJoin, (st,sst)) = _iterate(a, Val(st), sst)
_iterate(a::_IJoin, ::Val{:yld2}, sst) =
    let y=iterate(a.s, sst); y!==nothing ? (a.delim, (:yld3,sst)) : nothing end
_iterate(a::_IJoin, ::Val{:yld3}, sst) =
    let y=iterate(a.s, sst); y!==nothing ? (y[1], (:yld2,y[2])) : nothing end

using Test
import Base.Iterators as _I

@test ijoin(["foo", "bar", "baz"], ".") |> collect == ["foo",  ".", "bar",  ".",  "baz"]
@test ijoin([1,2,3],0) |> collect == [1,0,2,0,3]
@test ijoin((1,2,3),0) |> collect == [1,0,2,0,3]
@test ijoin(1:5,0) |> collect == [1,0,2,0,3,0,4,0,5]
@test _I.dropwhile(<(3),1:5) |> s->ijoin(s,0) |> collect == [3,0,4,0,5]
@test map(ijoin(1:5, 0)) do e; 2e end == [2,0,4,0,6,0,8,0,10]
@test_throws ArgumentError ijoin(fill(1,2,3),0)

# this composes nicely
@test map(ijoin(_I.dropwhile(<(3),1:5), 0)) do e; 2e end |> collect == [6, 0, 8, 0, 10]

# more readably
@test 1:5 |>
    s -> _I.dropwhile(<(3),s) |>        # == [3, 4, 5]
    s -> ijoin(s, 0) |>                 # == [3, 0, 4, 0, 5]
    s -> map(s) do e; 2e end |>         # == [6, 0, 8, 0, 10]
    collect ==
    [6, 0, 8, 0, 10]

This will be useful to work on collections

and i dont like s -> _I.append!([first(s)], _I.flatten(zip(_I.cycle(0), _I.tail(s)))) . :)

IteratorSize(::TakeWhile)

I guess it should be IteratorSize(::Type{TakeWhile}) there https://github.com/JuliaCollections/IterTools.jl/blob/master/src/IterTools.jl#L910

MethodError using IterTools.product

in v0.6:

julia> using IterTools

julia> R = Iterators.repeated([1, -1], 2)
Base.Iterators.Take{Base.Iterators.Repeated{Array{Int64,1}}}(Base.Iterators.Repeated{Array{Int64,1}}([1, -1]), 2)

julia> collect(R)
2-element Array{Array{Int64,1},1}:
 [1, -1]
 [1, -1]

julia> IterTools.product(R)
IterTools.Product{Tuple{Base.Iterators.Take{Base.Iterators.Repeated{Array{Int64,1}}}}}((Base.Iterators.Take{Base.Iterators.Repeated{Array{
Int64,1}}}(Base.Iterators.Repeated{Array{Int64,1}}([1, -1]), 2),))

julia> collect(IterTools.product(R))
2-element Array{Tuple{Array{Int64,1}},1}:
 ([1, -1],)
 ([1, -1],)

in v0.7.0-beta2.0

julia> using IterTools

julia> R = Iterators.repeated([1, -1], 2)
Base.Iterators.Take{Base.Iterators.Repeated{Array{Int64,1}}}(Base.Iterators.Repeated{Array{Int64,1}}([1, -1]), 2)

julia>  collect(R)
2-element Array{Array{Int64,1},1}:
 [1, -1]
 [1, -1]

julia>  IterTools.product(R)
┌ Warning: `product(xss...)` is deprecated, use `Iterators.product(xss...)` instead.
│   caller = top-level scope at none:0
└ @ Core none:0
Base.Iterators.ProductIterator{Tuple{Base.Iterators.Take{Base.Iterators.Repeated{Array{Int64,1}}}}}((Base.Iterators.Take{Base.Iterators.Re
peated{Array{Int64,1}}}(Base.Iterators.Repeated{Array{Int64,1}}([1, -1]), 2),))

julia>  collect(IterTools.product(R))
┌ Warning: `product(xss...)` is deprecated, use `Iterators.product(xss...)` instead.
│   caller = top-level scope at none:0
└ @ Core none:0
ERROR: MethodError: no method matching isless(::Array{Int64,1}, ::Int64)
Closest candidates are:

  isless(::Missing, ::Any) at missing.jl:66
  isless(::AbstractFloat, ::Real) at operators.jl:150
  isless(::Real, ::Real) at operators.jl:338
  ...
Stacktrace:
 [1] <(::Array{Int64,1}, ::Int64) at ./operators.jl:260
 [2] <=(::Array{Int64,1}, ::Int64) at ./operators.jl:309
 [3] isdone(::Base.Iterators.Take{Base.Iterators.Repeated{Array{Int64,1}}}, ::Tuple{Array{Int64,1},Tuple{Int64,Nothing}}) at ./iterators.j
l:587
 [4] _pisdone at ./iterators.jl:805 [inlined]
 [5] isdone at ./iterators.jl:811 [inlined]
 [6] iterate at ./iterators.jl:843 [inlined]
 [7] copyto! at ./abstractarray.jl:658 [inlined]
 [8] _collect(::UnitRange{Int64}, ::Base.Iterators.ProductIterator{Tuple{Base.Iterators.Take{Base.Iterators.Repeated{Array{Int64,1}}}}}, :
:Base.HasEltype, ::Base.HasShape{1}) at ./array.jl:538
 [9] collect(::Base.Iterators.ProductIterator{Tuple{Base.Iterators.Take{Base.Iterators.Repeated{Array{Int64,1}}}}}) at ./array.jl:532
 [10] top-level scope at none:0

could you confirm that this is a unresolved issue? or i need to upgrade some of the packages in my workflow? thanks.

Proposal: filter_map, flat_map, findfirst_map, inspect, fuse

It would be better if we have some major iterator functions which Rust has.
Rust std::iter::Iterator

Especially, I'm eager for filter_map and flat_map.
(I personally think that filter_map and flat_map should better be in Base.Iterators...)

Enumerable module in ruby 2.7 will have flat_map and filter_map.

Use a project for building docs

see https://discourse.julialang.org/t/psa-use-a-project-for-building-your-docs/14974

Import Base.Iterators when IterTools is used

If possible it would be nice if Base.Iterators-functions were imported automatically for cleaner code. Since lines like collect(Iterators.flatten(chain(1:9,reverse(11:20),10) would look cleaner if it was just flatten rather than Iterators.flatten.
When people import IterTools it seems to be clear that they intend to use them (alot) or like to go for functional programming and thus simplifying it sounds like a good idea to me.

`iterated` performance

The iterated iterator is very slow in simple cases due to type instability, and inability to inline the function being iterated.

For example @time IterTools.nth(IterTools.iterated(x -> x + 1, 0.), 10_000_000) gives 2.791875 seconds (50.01 M allocations: 916.153 MiB, 4.42% gc time) on my machine.

One possible fix is to parametrise the Iterated struct by the function being used, so that iterate is specialised for every such function (although I'm new to Julia, so unsure if this would be considered idiomatic or an ugly hack).

For example

struct MyIterated{T, f}
    seed::T
    function MyIterated(f::Function, seed::T) where T
        new{T, f}(seed)
    end
end
Base.IteratorSize(::Type{<:MyIterated}) = Base.IsInfinite()
Base.IteratorEltype(::Type{<:MyIterated}) = Base.EltypeUnknown()

my_iterated(f, seed) = MyIterated(f, seed)

Base.iterate(it::MyIterated) = (it.seed, it.seed)
function Base.iterate(it::MyIterated{T, f}, state) where {T, f}
    newval = f(state)
    return (newval, newval)
end

Then @time IterTools.nth(my_iterated(x -> x + 1, 0.), 10_000_000) gives 0.013682 seconds, which is as fast as a hand-written loop.

Proposal: Add ShiftIterator to offset a given iterator with defined length

Would you be interested in a PR with the iterator below? It accepts both positive as well as negative offsets:

struct ShiftIterator{I}
 itr::I
 offset::Int
 length::Int
 start::Int
end

function ShiftIterator(itr, offset)
 _, s = iterate(itr)
 start = s - 1
 len = length(itr)
 off = offset ≥ 0 ? offset : len + offset
 ShiftIterator{typeof(itr)}(itr, off, len, start)
end

function Base.iterate(itr::ShiftIterator, state=1)
 if state > itr.length
   nothing
 else
   s = ((state + itr.offset - 1) % itr.length) + itr.start
   item, _ = iterate(itr.itr, s)
   item, state + 1
 end
end

Base.length(itr::ShiftIterator) = itr.length

If yes, where should it go?

IterTools.pop should use Base.front instead of implementing it inline in a hackish way

This line reimplements Base.front in a presumably suboptimal way:

IterTools.jl/src/IterTools.jl

Line 589 in 831bbd2

pop(t::NTuple) = reverse(tail(reverse(t))), t[end]

I guess this would matter for the performance of bigger tuples, that is, bigger values of k in subsets(collection, Val{k}()).

The fix is simple, just use Base.front. However, this package seems to still support Julia 1.0, and even Julia 0.7, and Base.front may have been introduced after that, because I can only find it on https://docs.julialang.org starting with Julia 1.2.

Remove at-itr macro

Reasons:

There are exactly three uses in the wild, only one of them is a package, and it's not registered
It's hard to maintain and I don't know how to fix it for 0.7
The only reason to use it is performance and I can't see a performance benefit on 0.6

julia> using IterTools
INFO: Recompiling stale cache file /Users/ericdavies/.julia/lib/v0.6/IterTools.ji for module IterTools.

julia> function f1(it)
           sum = zero(typeof(first(it)))
           @itr for (i, el) in enumerate(it)
               sum += i * el
           end
           return sum
       end
f1 (generic function with 1 method)

julia> function f2(it)
           sum = zero(typeof(first(it)))
           for (i, el) in enumerate(it)
               sum += i * el
           end
           return sum
       end
f2 (generic function with 1 method)

julia> x = rand(10000);

julia> using BenchmarkTools

julia> @benchmark f1($x)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     11.155 μs (0.00% GC)
  median time:      12.556 μs (0.00% GC)
  mean time:        13.182 μs (0.00% GC)
  maximum time:     113.474 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

julia> @benchmark f2($x)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     11.157 μs (0.00% GC)
  median time:      11.194 μs (0.00% GC)
  mean time:        13.010 μs (0.00% GC)
  maximum time:     2.441 ms (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

julia> x = rand(10);

julia> @benchmark f1($x)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     9.205 ns (0.00% GC)
  median time:      10.359 ns (0.00% GC)
  mean time:        10.824 ns (0.00% GC)
  maximum time:     74.991 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     998

julia> @benchmark f2($x)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     9.205 ns (0.00% GC)
  median time:      10.374 ns (0.00% GC)
  mean time:        11.055 ns (0.00% GC)
  maximum time:     136.026 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     998

Support random `partition` ?

Like:

random_partition(xs, n)

a=[1,2,3,4,5,6]
for i in random_partition(a)
    @show i
end
# result
# i = (3, 2)
# i = (5, 1)
# i = (4, 6)

Iterator slicing/indexing

Is there a way to slice/index arbitrary iterator in Julia? Can you add a function ("slice") that can do that? It's like Python's itertools.islice and it does

silce(finite_iterator, indices) == collect(finite_iterator)[indices]

but it also works with infinite iterator and indices, provided that indices are strictly increasing (otherwise you need to cache the whole history). I actually coded up a quick implementation:

struct Slice{I, S}
    iter::I
    slice::S
end
const slice = Slice

mutable struct SliceState
    index::Int
    iter_state
    slice_state
    iter_value
end

Base.start(it::Slice) =
    SliceState(0, start(it.iter), start(it.slice), nothing)

function Base.done(it::Slice, state)
    done(it.slice, state.slice_state) && return true

    local x
    i = state.index
    n, ss = next(it.slice, state.slice_state)
    state.slice_state = ss
    si = state.iter_state
    @assert i < n
    while i < n
        if done(it.iter, si)
            return true
        end
        x, si = next(it.iter, si)
        i += 1
    end
    state.iter_state = si
    state.iter_value = x
    state.index = i

    return false
end

Base.next(it::Slice, state) = (state.iter_value, state)
Base.iteratorsize(::Type{<: Slice}) = Base.SizeUnknown()

using Base.Test
# @show collect(slice(1:100, 1:2:5))
@test collect(slice(1:100, 1:2:5)) == collect(1:2:5)
@test collect(slice(1:5, 1:2:10)) == collect(1:2:5)
@test collect(slice(1:10, [1, 3, 7, 11])) == [1, 3, 7]

Side notes: Besides obvious lack of documentation and good error message (rather than @assert), the most smelly part of the code is the mutable SliceState and the calls to the next of the underlying iterators. But it seems that this is the usual way to solve this kind of problem for Julia <= 0.6. I guess it's going to be easier with the new iteration protocol for Julia 1.0.

Can IterTools.jl and/or Base already do it? Am I missing something? If not, do you think it is a good API to add in IterTools.jl? I can make a PR with a bit more refined code if you like.

A proposal for `takeuntil`

The current takewhile is very convient to use. But for cases in which I also want to take the last value which fails the condition, it cannot do what I want. Hence I'm proposing to add a function called takeuntil, which baiscally do something similar but also include the value which first makes the function false. This is the code snippet for what I propose.

struct TakeUntil{I}
    cond::Function
    xs::I
end

"""
    takeuntil(cond, xs)
An iterator that yields values from the iterator `xs` as long as the
predicate `cond` is true. Unlike `takewhile`, it also take the last 
value for which the predicate `cond` is false.
'''jldoctest
julia> collect(takeuntil(x-> x^2 < 10, 1:100))
3-element Array{Int64,1}:
 1
 2
 3
 4
'''
"""
takeuntil(cond, xs) = TakeUntil(cond, xs)

function Base.iterate(it::TakeUntil, state=(false, nothing))
    is_cond, state_xs = state
    is_cond && return nothing
    (val, state_xs) = 
        @ifsomething (state_xs === nothing ? iterate(it.xs) : iterate(it.xs, state_xs))
    val, (it.cond(val), state_xs)
end

Base.IteratorSize(it::TakeUntil) = Base.SizeUnknown()
Base.eltype(::Type{TakeUntil{I}}) where {I} = eltype(I)
IteratorEltype(::Type{TakeUntil{I}}) where {I} = IteratorEltype(I)

If this looks good, I can fire a PR for this.

Doc - difference between Base.Iterators.product and IterTools.product

Hello,

Maybe we should add in doc an example showing difference between Base.Iterators.product and IterTools.product

julia> xss = 1:3, 5:7, 10:11
(1:3, 5:7, 10:11)

julia> collect(Base.Iterators.product(xss...))
3×3×2 Array{Tuple{Int64,Int64,Int64},3}:
[:, :, 1] =
 (1, 5, 10)  (1, 6, 10)  (1, 7, 10)
 (2, 5, 10)  (2, 6, 10)  (2, 7, 10)
 (3, 5, 10)  (3, 6, 10)  (3, 7, 10)

[:, :, 2] =
 (1, 5, 11)  (1, 6, 11)  (1, 7, 11)
 (2, 5, 11)  (2, 6, 11)  (2, 7, 11)
 (3, 5, 11)  (3, 6, 11)  (3, 7, 11)

julia> collect(IterTools.product(xss...))
18-element Array{Tuple{Int64,Int64,Int64},1}:
 (1, 5, 10)
 (2, 5, 10)
 (3, 5, 10)
 (1, 6, 10)
 (2, 6, 10)
 (3, 6, 10)
 (1, 7, 10)
 (2, 7, 10)
 (3, 7, 10)
 (1, 5, 11)
 (2, 5, 11)
 (3, 5, 11)
 (1, 6, 11)
 (2, 6, 11)
 (3, 6, 11)
 (1, 7, 11)
 (2, 7, 11)
 (3, 7, 11)

Kind regards

Differences in `Base.Iterators.partition` and `IterTools.partition`

There are some differences between Base.Iterators.partition and IterTools.partition. I brought it up on Slack and @oxinabox asked me to file an issue.

Base.Iterators.partition returns each chunk as an Array, whereas IterTools.partition returns a Tuple
In the case that the last chunk is smaller than the requested partition size, Base.Iterators.partition returns a shorter chunk, whereas IterTools.partition drops the last values.
IterTools.partition accepts a step argument that determines the hop size for each chunk.

TagBot trigger issue

This issue is used to trigger TagBot; feel free to unsubscribe.

If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.

If you'd like for me to do this for you, comment TagBot fix on this issue.
I'll open a PR within a few hours, please be patient!

Feature request: "row-major" product()

Currently product() iterates over its parameters in a "column-major" order (I guess, to be consistent with the order of array indices). I believe it would be useful to be able to iterate in a row-major order as well (that is, the first iterable changes the slowest). Of course, one can piggy-back on product, passing it reverse()ed arguments, and reverse()ing back the results of next(), but it makes the iteration several times slower.

Logic-wise, the only thing that will need to change is the iteration in next() — from for i in 1:n to for i in n:-1:1. I am not sure how to handle the eltype() properly. Perhaps a parametrization of Product on a value type is needed?

juliacollections / itertools.jl Goto Github PK

itertools.jl's Introduction

IterTools.jl

Installation

itertools.jl's People

Contributors

Stargazers

Watchers

Forkers

itertools.jl's Issues

Recommend Projects

Recommend Topics

Recommend Org