
Comments (8)

ealmloff commented on May 13, 2024

> Somewhat terse response... I presume you are saying that converting from UTF-8 to UTF-16 and vice versa within JavaScript is sufficiently performant. That may be the case, and indeed my suggestion might amount to premature optimization; nonetheless, avoiding the need for conversion in the first place seems inherently more performant. 😀

Sorry, I was trying to get something out in between meetings before I forgot, and I should have read it again before sending. I hadn't tried that before; I meant it doesn't appear to make things faster. It seemed promising, so I created a benchmark. I would have expected converting a UTF-16 array to a JavaScript string to be faster because you don't have to change encodings, but if you run the benchmark in my last message it is actually slower. I'm not entirely sure why; maybe the UTF-16 version is just copying more memory.

Part of this is due to the types of string operations we do. I suspect that concatenating two strings already in JavaScript would be faster than concatenating them in Rust and then converting the result to a JavaScript string, but most of Dioxus' code requires the strings to start in Rust code, so the conversion is difficult to avoid.
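To make the comparison concrete, the two decode paths being benchmarked can be sketched as follows. This is only an illustration, not the linked benchmark itself; the sample string and the use of `TextDecoder` for both encodings are assumptions:

```javascript
// Path 1: UTF-8 bytes (what a Rust String hands over) decoded to a JS string.
const utf8Bytes = new TextEncoder().encode("héllo");
const utf8Str = new TextDecoder("utf-8").decode(utf8Bytes);

// Path 2: UTF-16 code units (what a hypothetical utf16string-backed buffer
// would hold) decoded to a JS string. Note: a Uint16Array's backing buffer is
// in host byte order, which is little-endian on typical wasm hosts.
const units = Array.from("héllo", (c) => c.charCodeAt(0));
const utf16Bytes = new Uint8Array(new Uint16Array(units).buffer);
const utf16Str = new TextDecoder("utf-16le").decode(utf16Bytes);

// Both paths produce the same JS string; the question is only which decode
// (plus the extra copying involved) is cheaper in practice.
```

Even though path 2 avoids a change of encoding, the engine still has to copy the bytes out of wasm memory and may re-encode internally, which is one plausible reason the benchmark shows it slower.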

We do a few things to speed up string conversion:

  1. We dynamically cache the most recently used string for elements, attributes, etc., by hashing its pointer (as opposed to a prebaked list like the one this PR adds)
  2. We combine all strings on the Rust side into one long string and then call decode only once on the JavaScript side. This is the main thing sledgehammer-bindgen does to be faster
  3. We use heuristics to choose a string decoding algorithm
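A minimal sketch of technique 1, assuming the wasm memory is exposed as an `ArrayBuffer` and simplifying the pointer hash down to a plain `Map` keyed by pointer (all names here are hypothetical, not Dioxus' actual API):

```javascript
const decoder = new TextDecoder("utf-8");
// Last string decoded for each Rust-side pointer. The real scheme hashes the
// pointer; a Map keyed by the raw pointer keeps the sketch simple.
const lastStr = new Map(); // ptr -> { len, value }

function getCachedString(memory, ptr, len) {
  const hit = lastStr.get(ptr);
  // Same pointer and length as last time: reuse the JS string, skip decoding.
  // (A real implementation also has to invalidate when the bytes change.)
  if (hit && hit.len === len) return hit.value;
  const value = decoder.decode(new Uint8Array(memory, ptr, len));
  lastStr.set(ptr, { len, value });
  return value;
}
```

Repeated element and attribute names ("div", "class", ...) tend to sit at stable addresses, so this avoids re-decoding the hot strings without needing a prebaked list.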

from dioxus.

dranikpg commented on May 13, 2024

This is an interesting issue to research. It's important to define 'good' performance and the goals ahead of time: do we care about the average case or the worst case?

I've traced the example projects and some of my own pet projects. Almost all good candidates for interning (by frequency of appearance) were already part of the built-in predefined cache. This is generally because applications have little repetitive content (text and class names). I assume that a smart automatic interning algorithm would reduce overall performance.

However, the rules above don't apply to applications with large lists, tables, and highly repetitive content. Given that we'll be tracing DomEdits, we can tell tags and attributes apart from user text and values. The former are probably already interned; the latter require more analysis. This is an opportunity for improving worst-case performance (no predictions about the average), so the main goal would be high adaptiveness and flexibility: for example, detecting that we're in the middle (or, better, at the start) of rendering a large table with repetitive text and styles.

web-sys already looks up every string in a hashmap if string interning is enabled (if only we could supply it with pre-calculated hashes 😢).

A good approach with high adaptiveness would be a sliding window that counts the most frequent strings among the last X requests. As soon as a string appears more than 2 or 3 times, it should be interned, and it can easily be dropped again once it leaves the window. A bloom filter is great for tracking large amounts of data compactly, but it doesn't let you detect rapid changes or delete elements from it. I'll experiment around before giving any detailed results.
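The sliding-window idea could be sketched roughly like this. The window size, threshold, and all names are hypothetical; a real design would need tuning and measurement:

```javascript
const WINDOW = 8;    // assumed: how many recent strings we track
const THRESHOLD = 3; // assumed: "appears more than 2 or 3 times"

const recent = [];          // the last WINDOW strings seen, in order
const counts = new Map();   // string -> occurrences within the window
const interned = new Set(); // strings currently considered worth interning

// Record one string request; returns whether it should be interned now.
function observe(s) {
  recent.push(s);
  counts.set(s, (counts.get(s) ?? 0) + 1);
  if (recent.length > WINDOW) {
    // Evict the oldest string; if it falls out of the window entirely,
    // it also loses its interned status (the "easily replaced" property).
    const old = recent.shift();
    const n = counts.get(old) - 1;
    if (n === 0) {
      counts.delete(old);
      interned.delete(old);
    } else {
      counts.set(old, n);
    }
  }
  if (counts.get(s) >= THRESHOLD) interned.add(s);
  return interned.has(s);
}
```

Unlike a bloom filter, the explicit window supports exact deletion, so a burst of repetitive table cells gets interned quickly and then ages out once rendering moves on.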

jkelleyrtp commented on May 13, 2024

> This is an interesting issue to research. It's important to define 'good' performance and the goals ahead of time: do we care about the average case or the worst case?
>
> I've traced the example projects and some of my own pet projects. Almost all good candidates for interning (by frequency of appearance) were already part of the built-in predefined cache. This is generally because applications have little repetitive content (text and class names). I assume that a smart automatic interning algorithm would reduce overall performance.
>
> However, the rules above don't apply to applications with large lists, tables, and highly repetitive content. Given that we'll be tracing DomEdits, we can tell tags and attributes apart from user text and values. The former are probably already interned; the latter require more analysis. This is an opportunity for improving worst-case performance (no predictions about the average), so the main goal would be high adaptiveness and flexibility: for example, detecting that we're in the middle (or, better, at the start) of rendering a large table with repetitive text and styles.
>
> web-sys already looks up every string in a hashmap if string interning is enabled (if only we could supply it with pre-calculated hashes 😢).
>
> A good approach with high adaptiveness would be a sliding window that counts the most frequent strings among the last X requests. As soon as a string appears more than 2 or 3 times, it should be interned, and it can easily be dropped again once it leaves the window. A bloom filter is great for tracking large amounts of data compactly, but it doesn't let you detect rapid changes or delete elements from it. I'll experiment around before giving any detailed results.

It's a very exciting problem! We can do a lot of work to bridge the gap between Rust and JS frameworks until WASI types are supported and UTF-8 strings are accepted by JS naturally.

Our DomEdits are a tad naive in recognizing lists, which is why template support would be interesting.

I would love to see some prototypes of different heuristics. It might be worth wiring up the few benchmarks we have into CI/CD to get a sense of performance on every push and plot that over time.

Pre-hashing, bloom filters, tries, and other techniques would be very interesting to explore and I'd love to see any prototypes you might come up with. Hopefully the string cache code is simple enough to root around in.

jkelleyrtp commented on May 13, 2024

This is a somewhat solved problem with templates now: we have a fixed list of strings, and each template is only created once.

gdennie commented on May 13, 2024

Any reason why you didn't simply create or use a UTF-16 crate such as utf16string?

Granted, the non-web platforms use UTF-8, but could that difference be resolved via a trait or type alias?

ealmloff commented on May 13, 2024

> Any reason why you didn't simply create or use a UTF-16 crate such as utf16string?
>
> Granted, the non-web platforms use UTF-8, but could that difference be resolved via a trait or type alias?

Converting from a UTF-16 string to a JavaScript string doesn't appear to be faster: https://jsbench.me/ptllv2ek5d/1. Sledgehammer-bindgen helps with string conversion.

gdennie commented on May 13, 2024

Somewhat terse response... I presume you are saying that converting from UTF-8 to UTF-16 and vice versa within JavaScript is sufficiently performant. That may be the case, and indeed my suggestion might amount to premature optimization; nonetheless, avoiding the need for conversion in the first place seems inherently more performant. 😀

It would also be a great demonstration and exploitation of Rust's fantastic trait-based abstraction mechanism.

In any event, you know best. I am really a novice. Thanks for your attention 🙂.

gdennie commented on May 13, 2024

Perhaps I was naive in thinking that the JavaScript runtime could accept the byte array as a proper reference to a UTF-16 string without decoding or other gymnastics. If that were possible and it didn't perform any subsequent reallocation, then I would expect performance to be better. Of course, in the general case we would be creating JavaScript types in the interface buffer and perhaps bumping the WASM specification, something that's out of scope.

Thanks again for your attention.
