
Comments (19)

dot-asm commented on June 18, 2024

The problem is two-fold: a) initialization of SWIG private structures is not MT-safe; b) it's not sufficient to make init context-aware, it has to be isolate-aware as well. The first problem manifests as crashes at require time and can be mitigated by require-ing blst before creating workers. I have a solution for it. The second problem customarily manifests as a crash when a thread or the whole script finishes. This one is being worked on. Duplicating the .node file should indeed do the trick for both, but it's obviously suboptimal.


dot-asm commented on June 18, 2024

@dot-asm Is your branch swig/swig#1973 safe to use in production?

I personally would say so, yes.


dot-asm commented on June 18, 2024

Can you run this locally to confirm I'm not crazy?

I can run it, but I can't confirm you're not crazy:-):-):-) Not qualified, you know...:-):-):-)
But on a serious note! For starters, I don't actually see 1ms:

0 main -> worker 0.7323275699999999 ms
0 worker -> main 0.11640998600000001 ms
1 main -> worker 0.429437391 ms
1 worker -> main 0.123062587 ms
2 main -> worker 0.338589356 ms
2 worker -> main 0.13179326133333333 ms
3 main -> worker 0.295062732 ms
3 worker -> main 0.1400632265 ms
...

However, I reckon that these are not actually representative numbers. First, it's appropriate to disregard the first results, because you have to give the JIT compiler a chance to kick in. Secondly, as you use setTimeout, and with an increasing timeout value, you allow the processor to ramp down its operating frequency, which manifests as operations appearing slower. This is apparently why main-to-worker averages slower than worker-to-main: setTimeout falls asleep prior to the main-to-worker leg, so the frequency drops, but by the worker-to-main leg it's ramping back up... Either way, if I fix the frequency and discount the results from the first iteration of the outer loop, I get the following:

1 main -> worker 0.057152096 ms
1 worker -> main 0.060167767999999996 ms
2 main -> worker 0.06846348 ms
2 worker -> main 0.062043496 ms
3 main -> worker 0.07300487333333333 ms
3 worker -> main 0.06437548666666666 ms
...

In other words, messages are passed in ~60µs, "µs" as in "microseconds," at the nominal processor frequency. It's still arguably higher than one would expect, but I wouldn't be surprised if it's an adequate result for the snippet in question. I mean, I would imagine that the process.hrtime() output gets serialized to JSON, which would take its toll...
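
For what it's worth, the gist of such a measurement can be sketched as follows (not the actual benchmark script; the echo worker and iteration counts are made up):

const { Worker } = require("worker_threads");

// echoWorker.js (hypothetical) just bounces every message back:
//   const { parentPort } = require("worker_threads");
//   parentPort.on("message", (m) => parentPort.postMessage(m));
const worker = new Worker("./echoWorker.js");

const WARMUP = 100, RUNS = 1000;   // discard the first rounds, then average
let count = 0, totalNs = 0n, sentAt;

function ping() {
  sentAt = process.hrtime.bigint();
  worker.postMessage(null);
}

worker.on("message", () => {
  const dt = process.hrtime.bigint() - sentAt;
  if (++count > WARMUP) totalNs += dt;     // skip JIT warm-up / frequency ramp
  if (count < WARMUP + RUNS) ping();
  else {
    console.log("avg round trip:", Number(totalNs / BigInt(RUNS)) / 1e6, "ms");
    worker.terminate();
  }
});
ping();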


dot-asm commented on June 18, 2024

Just in case, for reference: I used node v14.16. That doesn't mean I know whether there were relevant improvements, or whether prior versions were "bad." It was just a convenient choice.


dot-asm commented on June 18, 2024

As I said yesterday: "There is a remaining issue with Workers, and it's being worked on." In other words, the problem is recognized and even understood, so you don't have to spend time on a PoC, just be a little bit patient:-) Meanwhile, require("blst") in the main script before you create workers, even if the main script doesn't use it; this minimizes the chances of a crash at startup...
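
A minimal sketch of that load order (paths are hypothetical):

// main.js: load the addon on the main thread first...
const blst = require("blst");

// ...and only then spawn the workers
const { Worker } = require("worker_threads");
const worker = new Worker("./worker.js");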


dapplion commented on June 18, 2024

Thanks! Would you mind sharing a bit about your understanding of the problem?

Meanwhile, require("blst") in the main script before you create workers, even if the main script doesn't use it; this minimizes the chances of a crash at startup...

A dirty workaround that seems to work consistently is to duplicate the .node file so that each copy is required by only one thread.
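
Roughly, that duplication looks like this (a sketch; the paths are made up):

const fs = require("fs");
const { Worker } = require("worker_threads");

const addon = "./blst.node";                 // hypothetical addon location

for (let i = 0; i < 4; i++) {
  const copy = `./blst.worker${i}.node`;
  fs.copyFileSync(addon, copy);              // private copy per thread
  new Worker("./worker.js", { workerData: { addonPath: copy } });
}

// inside worker.js (hypothetical):
//   const { workerData } = require("worker_threads");
//   const blst = require(workerData.addonPath);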


dot-asm commented on June 18, 2024

@dapplion, could you test swig/swig#1973?


dapplion commented on June 18, 2024

@dapplion, could you test swig/swig#1973?

Haven't had a chance to test yet. Will update when I do; let me know if you have any updates too. Thanks!


dapplion commented on June 18, 2024

@dapplion, could you test swig/swig#1973?

Early tests seem to work fine! Thank you so much for pushing this 🎉 Will work on a PoC inside Lodestar to confirm it's all good.


dapplion commented on June 18, 2024

@dot-asm Now that we have unlocked the gates of parallelization, do you think it would be possible to send a pairing context between Javascript workers? The current SWIG-produced Pairing class doesn't expose any way to get a clone-able data structure that is sendable in a Worker message.

  • Is the Pairing context "serializable" in some way? i.e. can we get some bytes as a Buffer?


dot-asm commented on June 18, 2024

This is what I customarily call a "wrong-attitude" question:-) When it comes to software, the question shouldn't be whether something is possible, but how to achieve the goal. So let's start with the goal. Or rather, take the next step from the stated goal, to send the pairing context between workers. What is possible to send? I mean, I don't actually know. Is it possible to send ArrayBuffer-s? To express it in a more programmer-friendly manner, would the following be useful/usable as a TypeScript prototype:

--- a/bindings/node.js/blst.hpp.ts
+++ b/bindings/node.js/blst.hpp.ts
@@ -190,7 +190,8 @@ export interface Pairing {
     aug?: bytes
   ): BLST_ERROR;
   commit(): void;
-  merge(ctx: Pairing): BLST_ERROR;
+  asArrayBuffer(): ArrayBuffer;
+  merge(ctx: Pairing | ArrayBuffer): BLST_ERROR;
   finalverify(sig?: PT): boolean;
 }
 

One can probably even say that merge could accept only an ArrayBuffer...
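
Assuming such an extension lands, usage could look roughly like this (a sketch; mainPairing, worker and the aggregation step are placeholders):

// worker side: commit a partial Pairing, then ship its raw bytes
const { parentPort } = require("worker_threads");
// ... aggregate this worker's share of signature sets into `pairing` ...
pairing.commit();
const ab = pairing.asArrayBuffer();   // the proposed add-on interface
parentPort.postMessage(ab, [ab]);     // transfer the buffer, don't clone it

// main side: fold each worker's context into one, verify once at the end
worker.on("message", (ab) => {
  mainPairing.merge(ab);              // merge(ctx: Pairing | ArrayBuffer)
});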


dapplion commented on June 18, 2024

What is possible to send?

Primitive types, objects and arrays of primitive types, and TypedArray instances (Buffer, ArrayBuffer, Uint8Array, etc.). A stricter definition: whatever is consumable by this function: https://nodejs.org/api/v8.html#v8_v8_serialize_value
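
For instance, the round trip can be exercised directly:

const v8 = require("v8");

const msg = { n: 1, bytes: new Uint8Array([1, 2, 3]), ab: new ArrayBuffer(8) };
const clone = v8.deserialize(v8.serialize(msg)); // same shape, copied data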

Is it possible to send ArrayBuffer-s?

Yes!

To express it in a more programmer-friendly manner, would the following be useful/usable as a TypeScript prototype

This API sketch would work well, yes.


Some questions to study the viability:

  • Do you know the cost, in CPU cycles (or some other metric), of turning the Pairing into an ArrayBuffer and vice versa?
  • What is the expected size in bytes of a Pairing context (as a function of the number of signatures aggregated, if variable)?

To preface my question: I'm wondering what the optimal strategy is to validate a large enough number of signatures in Javascript / Typescript. A naive approach is to just slice the signature set, divide it among workers, then combine the resulting booleans.

However, seeing that Go, Rust and Nim use the same strategy, it would be worth studying its viability here: thus my request to get the Pairing context as raw bytes.

It's important to note that in Javascript the communication between threads has significant lag, between 0.05 and 0.2 ms one way. If the size of the message is big enough, serialization is slow and becomes a bottleneck too. However, a SharedArrayBuffer can help for big payloads.
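
To illustrate the SharedArrayBuffer route (a sketch; the worker script is hypothetical):

const { Worker } = require("worker_threads");

const sab = new SharedArrayBuffer(3 * 1024); // payload shared by both threads
const worker = new Worker("./worker.js");
worker.postMessage(sab);                     // only a reference crosses threads
// both sides can now read/write through views, e.g. new Uint8Array(sab),
// and coordinate with Atomics.wait/Atomics.notify on an Int32Array view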

Maybe a better strategy would be to do the multi-threading in C land, inside the binding, where inter-thread communication is orders of magnitude faster and sharing memory is okay. So, do you know if it's possible to generate native bindings for Node with SWIG that are capable of doing multi-threading themselves?


dot-asm commented on June 18, 2024

This API sketch would work well yes.

Then add this at the end of the SWIGJAVASCRIPT_V8 section in blst.swg.

// allow Pairing context to be passed and merged as ArrayBuffer...
%extend blst::Pairing {
    blst::Pairing* asArrayBuffer() { return $self; }
}
%typemap(out) blst::Pairing* asArrayBuffer() {
    // copy the pairing context into a freshly allocated ArrayBuffer
    size_t sz = blst_pairing_sizeof();
    auto ab = v8::ArrayBuffer::New(v8::Isolate::GetCurrent(), sz);
    memcpy(ab->GetData(), $1, sz);
    $result = ab;
}
%typemap(in) (const blst::Pairing* ctx) %{
    if ($input->IsArrayBuffer()) {
        // raw context bytes, e.g. received from another worker
        auto ab = v8::Local<v8::ArrayBuffer>::Cast($input);
        if (ab->ByteLength() < blst_pairing_sizeof())
            SWIG_exception_fail(SWIG_TypeError, "in method '$symname', "
                                                "<ArrayBuffer> is too short");
        $1 = reinterpret_cast<$1_ltype>(ab->GetData());
    } else {
        // based on SWIGTYPE* in $SWIG_LIB/typemaps/swigtype.swg
        void *argp = nullptr;
        int res = SWIG_ConvertPtr($input, &argp, $descriptor, $disown);
        if (!SWIG_IsOK(res)) {
            SWIG_exception_fail(SWIG_ArgError(res), "in method '$symname', "
                                                    "argument $argnum of type '$type'");
        }
        $1 = reinterpret_cast<$1_ltype>(argp);
    }
%}

If you confirm that it's working as expected, I'll push to the repository...

Do you know the cost, in CPU cycles (or some other metric), of turning the Pairing into an ArrayBuffer and vice versa?

I don't quite follow. If you want to be in the business of counting CPU cycles, how come you ended up on the Javascript side? :-) :-) :-) But on a serious note, I can't really afford delving into this to the depth you might expect. As you can see [above], the overhead cost is allocating some memory from the V8 heap plus a memcpy, and checking the type and casting on the receiving side. And that's about as far as I'd go.

What is the expected size in bytes of a Pairing context (as a function of the number of signatures aggregated, if variable)?

It's always ~3K. Formally speaking, one can truncate a commit-ted Pairing context to less than 1K, but I'd rather not go there, not without a good reason.


dapplion commented on June 18, 2024

If you want to be in the business of counting CPU cycles, how come you ended up on the Javascript side?

🙃🙃 I'll just say that I'm learning Rust...

On a serious note: our node spends between 50% and 80% of total CPU time doing BLS signature verification, so if we have to spend time optimizing something, this is a great target. Throwing all that work onto a thread will be a massive improvement too.

the overhead cost is allocating some memory from the V8 heap plus a memcpy, and checking the type and casting on the receiving side. And that's about as far as I'd go.

That's good enough, thanks!

It's always ~3K.

Thanks, I don't think it would be a problem at all then, no need to truncate.


I'll report back when I can do a test with your SWIG snippet.

@dot-asm Is your branch swig/swig#1973 safe to use in production?


dot-asm commented on June 18, 2024

What's the standing with the asArrayBuffer add-on interface?

On a related note, my impression is that passing an ArrayBuffer between threads doesn't actually involve any data duplication or serialization. Or rather, I see no reason why it would have to. The ArrayBuffer data storage appears to be allocated off the V8 pool, and it can be detached and reattached. So the transfer of an ArrayBuffer can be achieved by passing a pointer to the off-pool storage...
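
Indeed, listing the ArrayBuffer as transferable detaches it on the sending side, which is consistent with a pointer handover rather than a copy (a sketch; the worker script is hypothetical):

const { Worker } = require("worker_threads");

const worker = new Worker("./worker.js");
const ab = new ArrayBuffer(3 * 1024);
worker.postMessage(ab, [ab]);  // transfer list: moved, not copied
console.log(ab.byteLength);    // 0, the buffer is now detached on this side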


dapplion commented on June 18, 2024

What's the standing with the asArrayBuffer add-on interface?

On a related note, my impression is that passing an ArrayBuffer between threads doesn't actually involve any data duplication or serialization. Or rather, I see no reason why it would have to. The ArrayBuffer data storage appears to be allocated off the V8 pool, and it can be detached and reattached. So the transfer of an ArrayBuffer can be achieved by passing a pointer to the off-pool storage...

Yeah, that sounds right. I haven't had time to implement the transfer-Pairing strategy, so I can't comment on that front yet. I did some benchmarks, and even though transferring an ArrayBuffer to threads is done with pointers, there's still a time cost in message passing, which is around 1ms per one-way transfer on a few machines I've tested.

Because of that, the current strategy is to bundle as many signature sets as possible, to reduce the communication between workers and the main thread without under-utilizing the workers.
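
The bundling itself is straightforward (a sketch with made-up names; the point is one message per chunk rather than per signature set):

const CHUNK = 128; // tune so workers stay busy between messages
for (let i = 0; i < signatureSets.length; i += CHUNK) {
  worker.postMessage(signatureSets.slice(i, i + CHUNK)); // one hop, many sets
}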


dot-asm commented on June 18, 2024

there's still a time cost in message passing, which is around 1ms per one-way transfer on a few machines I've tested.

"1ms" as in "millisecond"? If so, then it doesn't sound right. LAN is faster than that. If real, then node should be notified...


dapplion commented on June 18, 2024

there's still a time cost in message passing, which is around 1ms per one-way transfer on a few machines I've tested.

"1ms" as in "millisecond"? If so, then it doesn't sound right. LAN is faster than that. If real, then node should be notified...

😆 Yeah, it's ridiculous, but that's the state of the art. Can you run this locally to confirm I'm not crazy? https://github.com/ChainSafe/blst-ts/blob/master/benchmark/workerMessagePassing.js


dapplion commented on June 18, 2024

I can run it, but I can't confirm you're not crazy:-):-):-) Not qualified, you know...:-):-):-)

Actually, you are super qualified to disregard my initial results! Thanks for taking the time to analyze it; things definitely make sense now. Happy to know the latency is low in practice.

