Comments (16)

9il avatar 9il commented on August 28, 2024

auto b = 6.iota.array.sliced(2, 3);

The current version is already clear: auto b = 6.iota.sliced(2, 3).slice;

from mir.

9il avatar 9il commented on August 28, 2024

For the ndslice module, the common range is Slice ;)

wilzbach avatar wilzbach commented on August 28, 2024

The current version is already clear: auto b = 6.iota.sliced(2, 3).slice;

... but not efficient, as 6.iota.sliced(2, 3).slice gets translated to 6.iota.sliced(2, 3).byElement.array.sliced!replaceArrayWithPointer(slice.shape). It would be nice if we could avoid the duplicate allocation. slice already allocates, we just want to fill it.
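
To make the concern concrete, here is a minimal sketch using plain D arrays and Phobos only. It does not reproduce mir's actual implementation; `viaTemporary` and `fillInPlace` are hypothetical names that merely contrast "materialise a temporary array, then copy it" with "allocate once and fill".

```d
import std.array : array;
import std.range : iota;

// Hypothetical: materialise the lazy range into a temporary array,
// then copy it into a second, separately allocated buffer.
size_t[] viaTemporary(size_t k)
{
    auto tmp = iota(k).array;   // allocation 1: temporary copy of the range
    auto dst = new size_t[k];   // allocation 2: the final buffer
    dst[] = tmp[];              // element-wise copy
    return dst;
}

// Hypothetical: allocate the final buffer once and fill it from the range.
size_t[] fillInPlace(size_t k)
{
    auto dst = new size_t[k];   // the only allocation
    size_t i;
    foreach (v; iota(k))
        dst[i++] = v;
    return dst;
}
```

Both functions return the same contents; only the number of allocations differs.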

9il avatar 9il commented on August 28, 2024

No, it is already fast

9il avatar 9il commented on August 28, 2024

see implementation

wilzbach avatar wilzbach commented on August 28, 2024

would be nice if we could avoid the duplicate allocation. slice already allocates, we just want to fill it.

Yeah, just saw it. Ignore my complaints for now, sorry.

wilzbach avatar wilzbach commented on August 28, 2024

It does make a huge difference: m1 and m3 take roughly twice as long as m2 (I guess the 20% is the overhead created by byElement). Or am I missing something here?

fastest: 1
rel. diff for 0: 1.871
rel. diff for 1: 1.000
rel. diff for 2: 1.855
All unit tests have been run successfully.
dub test  21.39s user 0.38s system 99% cpu 21.772 total

// m1: build the result directly from the lazy slice via .slice
auto m1()
{
    size_t k = 100, s1 = 10, s2 = 10;

    import std.range: iota;
    import mir.ndslice.slice: slice, sliced;
    auto r = iota(k).sliced(s1, s2).slice;
    assert(r[9, 9] == k - 1);
}

// m2: allocate the slice first, then fill it element by element via byElement
auto m2()
{
    size_t k = 100, s1 = 10, s2 = 10;

    import std.range: iota;
    import mir.ndslice.slice: slice;
    import mir.ndslice.selection: byElement;
    auto r = slice!ulong(s1, s2);
    auto range = iota(k);

    foreach(ref el; r.byElement)
    {
        el = range.front;
        range.popFront;
    }

    assert(r[9, 9] == k - 1);
}

// m3: allocate the slice first, then assign the lazy slice to it with r[] = ...
auto m3()
{
    size_t k = 100, s1 = 10, s2 = 10;

    import std.range: iota;
    import mir.ndslice.slice: slice, sliced;
    auto r = slice!ulong(s1, s2);
    auto range = iota(k);

    r[] = range.sliced(s1, s2);

    assert(r[9, 9] == k - 1);
}

unittest
{
    auto n = 1_000_000;

    import std.datetime: benchmark;
    auto result = benchmark!(m1, m2, m3)(n);

    import std.stdio;
    import std.range: enumerate;
    import std.algorithm: map, minPos;

    auto rs = [result[0], result[1], result[2]].map!`a.length`;
    auto minEl = rs.enumerate.minPos!`a.value < b.value`.front;

    writeln("fastest: ", minEl.index);

    foreach (i, el; rs.enumerate)
        writefln("rel. diff for %d: %.3f", i, el * 1. / minEl.value);
}
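
The benchmark above relies on `std.datetime: benchmark` and the `.length` field of its results, which belong to the older TickDuration-based API. Assuming a newer Phobos, where `std.datetime.stopwatch.benchmark` returns `Duration` values, the relative-difference step could be sketched as follows. `relativeToFastest` is a hypothetical helper, not part of mir or Phobos.

```d
import core.time : Duration;
import std.algorithm.iteration : map;
import std.algorithm.searching : minElement;
import std.array : array;

// Hypothetical helper: divide every duration by the smallest one,
// so the fastest entry becomes 1.0. Expects a non-empty array.
double[] relativeToFastest(const(Duration)[] times)
{
    // Smallest duration, expressed in hecto-nanoseconds (100 ns units).
    immutable fastest = times.minElement.total!"hnsecs";
    return times.map!(t => t.total!"hnsecs" / cast(double) fastest).array;
}
```

For durations of 20, 10 and 30 microseconds this yields 2.0, 1.0 and 3.0, which matches the "rel. diff" convention used in the output above.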

9il avatar 9il commented on August 28, 2024

LDC results?

wilzbach avatar wilzbach commented on August 28, 2024

LDC results?

Nope, that was dmd, but with ldc it's still measurable!

fastest: 1
rel. diff for 0: 1.225
rel. diff for 1: 1.000
rel. diff for 2: 1.273
All unit tests have been run successfully.
dub test --compiler=ldc  21.21s user 0.88s system 99% cpu 22.208 total

9il avatar 9il commented on August 28, 2024

add --build=unittest-nobounds

9il avatar 9il commented on August 28, 2024

add --build=unittest-nobounds

add --build=release-nobounds

wilzbach avatar wilzbach commented on August 28, 2024

add --build=release-nobounds

ldc: Unknown command line argument '-disable-boundscheck'. Try: 'ldc -help'
ldc: Did you mean '-disable-dfa-sched'?

9il avatar 9il commented on August 28, 2024

OK, let's just use --build=release

wilzbach avatar wilzbach commented on August 28, 2024

Done. I was a bit shocked initially because the runtime went down from 20s to less than a second, so I added a global variable (which is printed) that collects the result of the verification assert.

fastest: 2
rel. diff for 0: 2.079
rel. diff for 1: 1.780
rel. diff for 2: 1.000
true
dub --compiler=ldc --build=release  28.88s user 0.09s system 100% cpu 28.958 total

I verified the output by running in multiple orders.
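
In release builds `assert` is compiled out, so a benchmarked function whose only observable effect is an assert may be optimised away. Here is a minimal sketch of the workaround described above, using a plain array instead of ndslice. `allOk` and `fillAndCheck` are hypothetical names, and the linked source may do this differently.

```d
import std.range : iota;

// Hypothetical global sink. __gshared makes it a single plain global
// rather than a thread-local variable.
__gshared bool allOk = true;

void fillAndCheck()
{
    enum size_t k = 100;
    auto buf = new size_t[k];
    size_t i;
    foreach (v; iota(k))
        buf[i++] = v;

    // Record the verification result instead of asserting it,
    // so the check survives a release build.
    if (buf[k - 1] != k - 1)
        allOk = false;
}
```

Printing `allOk` after the benchmark loop keeps the result observable. This makes it less likely that the compiler discards the work, though it is not a hard guarantee.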

Source

9il avatar 9il commented on August 28, 2024

Oh, I thought the implementation was different. Thanks, this will be fixed.

9il avatar 9il commented on August 28, 2024

Fixed
