
Comments (30)

trusktr avatar trusktr commented on August 18, 2024

Out of curiosity, what are you guys planning to make (or have made) with asm.js + StringView?

from validator.

michaelficarra avatar michaelficarra commented on August 18, 2024
  1. Why is the length member signed? How can we create a negative-length string?
  2. How do we guarantee charCodeAt resolves to the original String.prototype.charCodeAt?

from validator.

espadrine avatar espadrine commented on August 18, 2024

Why is the length member signed? How can we create a negative-length string?

Thanks! Fixed.

How do we guarantee charCodeAt resolves to the original String.prototype.charCodeAt?

Similarly to the check for a bogus global, if String.prototype.charCodeAt is altered, it would default to interpreting the asm.js module, instead of using the compiled version.

from validator.

michaelficarra avatar michaelficarra commented on August 18, 2024

@espadrine: How do you statically check that String.prototype.charCodeAt is altered?

from validator.

espadrine avatar espadrine commented on August 18, 2024

@michaelficarra How do you statically check that window.Math.sqrt is altered?

Those are part of the runtime checks at linking time.

from validator.

michaelficarra avatar michaelficarra commented on August 18, 2024

So you'd like to add it to this list?

from validator.

espadrine avatar espadrine commented on August 18, 2024

So you'd like to add it to this list?

No. It isn't meant to be a function call in the standard library.
However, it would add a field in that list.

from validator.

kripken avatar kripken commented on August 18, 2024

The problem is that the prototype can change after linking. We avoid that with standard library stuff by saving them in the asm closure. But if we call "string".charCodeAt later on, the String prototype might have been changed in the meantime.

This isn't the only challenge here - adding this means support in the + operator, presumably. And also it means we can have GC'd objects in asm.js.

None of which is impossible, but the question is the motivation. If it's just efficient string processing, we should measure that first.

Regarding string efficiency, there is an idea to do a StringView for typed arrays. Basically a typed array is a view into an ArrayBuffer, and a StringView would view the same buffer but present it as string data (C-style null-terminated). If string performance is a concern, this might be worth investigating too.
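A minimal sketch of the "view the same buffer as string data" idea, assuming one byte per character (ASCII) and a zero terminator. The helper name `readCString` is hypothetical and is not part of any StringView proposal:

```javascript
// Hypothetical helper: reads a C-style, null-terminated ASCII string
// out of a byte view, starting at the given offset.
function readCString(bytes, offset) {
  var out = "";
  for (var i = offset; i < bytes.length && bytes[i] !== 0; i++) {
    out += String.fromCharCode(bytes[i]); // assumes one byte per character
  }
  return out;
}

// A shared buffer, as an asm.js module would see its heap.
var buffer = new ArrayBuffer(16);
var heap = new Uint8Array(buffer);

// Write "hello" at offset 4, followed by a terminator.
var text = "hello";
for (var j = 0; j < text.length; j++) {
  heap[4 + j] = text.charCodeAt(j);
}
heap[4 + text.length] = 0; // already zero; written for clarity

var decoded = readCString(heap, 4);
```

The numeric code only ever touches `heap`; turning the bytes back into a JS string happens outside it.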

from validator.

espadrine avatar espadrine commented on August 18, 2024

The problem is that the prototype can change after linking. We avoid that with standard library stuff by saving them in the asm closure.

Hmm, I see. Can we add it to the standard library, then, like @michaelficarra suggested?

The call can look like stdlib.String.prototype.charCodeAt.call(str, index).
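A rough sketch of why capturing the function matters, written as plain JavaScript rather than valid asm.js. The wrapper `makeModule` and both function names are assumptions made for illustration:

```javascript
// Hypothetical module shape; only illustrates the closure idea.
function makeModule(stdlib) {
  // Captured once, at "link" time.
  var charCodeAt = stdlib.String.prototype.charCodeAt;

  function viaClosure(str, i) {
    return charCodeAt.call(str, i) | 0;
  }

  function viaPrototype(str, i) {
    // Looks the method up on every call, so later patches are visible.
    return str.charCodeAt(i) | 0;
  }

  return { viaClosure: viaClosure, viaPrototype: viaPrototype };
}

var mod = makeModule({ String: String });

// Simulate a script altering the prototype after linking.
var original = String.prototype.charCodeAt;
String.prototype.charCodeAt = function () { return -1; };

var closureResult = mod.viaClosure("A", 0);     // uses the captured original
var prototypeResult = mod.viaPrototype("A", 0); // sees the patched method

String.prototype.charCodeAt = original; // restore the real method
```

The captured reference keeps returning the real code unit, while the direct method call picks up whatever was installed on the prototype later.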

This isn't the only challenge here - adding this means support in the + operator, presumably.

I would view such a construct as immutable. The length of the string doesn't change, its content doesn't either.
Heavy-lifting parsing operations on huge strings usually don't involve string concatenation.

I am not sure how I feel about StringView for two reasons:

  1. It doesn't exist yet,
  2. We can already easily convert a string into a Uint16Array(str.length) and pass it to asm.js code. However, this conversion cannot be optimized. Flattening a normal JS string into an efficient form intuitively sounds like it can be optimized to be faster.
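For reference, the conversion mentioned in point 2 can be sketched as follows. The function name `toCodeUnits` is made up for this example:

```javascript
// Copies the UTF-16 code units of a string into a typed array.
function toCodeUnits(str) {
  var units = new Uint16Array(str.length);
  for (var i = 0; i < str.length; i++) {
    units[i] = str.charCodeAt(i);
  }
  return units;
}

var sample = "asm.js";
var units = toCodeUnits(sample);

// Indexing the typed array gives the same value as charCodeAt.
var sameAtEveryIndex = true;
for (var k = 0; k < sample.length; k++) {
  if (units[k] !== sample.charCodeAt(k)) sameAtEveryIndex = false;
}
```

The copy costs one pass over the string, which is the step that cannot be skipped with this approach.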

That said, supporting a wider collection of built-in string operations than simply reading a character at a given index can be nice. I'm just really not sure we can be faster than normal JS there.

from validator.

ScatteredRay avatar ScatteredRay commented on August 18, 2024

We can already easily convert a string into an Uint16Array(str.length) and pass it to asmjs code. However, this conversion cannot be optimized. Flattening a normal JS string into an efficient form intuitively sounds like it can be optimized to be faster.

Actually, let's think about this for a bit: is it possible to optimize specific instances of Uint16Array conversion (and back) so that they work efficiently, without an actual conversion? I mean, intArray[i] looks awfully similar to str.charCodeAt(i).

from validator.

timmutton avatar timmutton commented on August 18, 2024

I believe introducing string support would be beneficial, at least for lljs. As it stands, even a basic "hello, world" fails to validate using James Long's lljs fork. Furthermore, any WebGL code written in lljs would also fail to validate, because the shaders require strings.

If there were support for fixed-length strings, and possibly support in stdlib for string generics (which could be done as a shim in other browsers, like Math.imul), that would cover a lot of basic use cases.

from validator.

jlongster avatar jlongster commented on August 18, 2024

Tim, while string support would be nice, you don't need it to use WebGL. You can load your shaders in js land and only do the computationally expensive stuff in asm.js. My cloth demo uses WebGL (http://jlongster.com/s/lljs-cloth/), you can see the whole program here: https://github.com/jlongster/lljs-cloth/blob/master/verlet.ljs

from validator.

timmutton avatar timmutton commented on August 18, 2024

You're completely right. I would like to be able to do a whole app in LLJS, but that would require lljs/asm supporting strings, or possibly being able to mark whether an lljs function/struct should use asm or not (which would likely introduce a whole host of other complications). For the time being your solution works very well, though.

from validator.

cscott avatar cscott commented on August 18, 2024

Note that strings are not usually implemented as a flat array of u16 under the hood. In order to support efficient string append, a linked data structure (such as "ropes") is usually used. So it's not necessarily straightforward to provide a view of the UTF16 data backing a string.

from validator.

martingala avatar martingala commented on August 18, 2024

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Typed_arrays/StringView

from validator.

timmutton avatar timmutton commented on August 18, 2024

Awesome, that looks really good

from validator.

martingala avatar martingala commented on August 18, 2024

@timmutton

Awesome, that looks really good

Thank you ;)
(I'm User:fusionchess, the author of that library... StringView is in alpha test for now!!)

from validator.

martingala avatar martingala commented on August 18, 2024

you can help me to find bugs ;)

from validator.

timmutton avatar timmutton commented on August 18, 2024

haha yeah I'd be more than happy to do so, does that mean that it's in nightly now?

from validator.

martingala avatar martingala commented on August 18, 2024

does that mean that it's in nightly now?

Of course! I completed it on June 6... ;)

EDIT: I fixed a bug just now!

from validator.

timmutton avatar timmutton commented on August 18, 2024

Fantastic! I'll give it a crack after work

from validator.

timmutton avatar timmutton commented on August 18, 2024

Sorry, when you say it's in nightly, do you mean I need to copy stringview.js from the page you linked and then use it, or do you mean I can just call a StringView from my code? The reason I ask is that I've updated nightly and I'm getting a reference error.

from validator.

martingala avatar martingala commented on August 18, 2024

You need to copy stringview.js from the page I linked...!
(sorry)

P.S. Look at the revision... when I change something I update the revision number...:

StringView - Mozilla Developer Network - revision #3

Bye :)

from validator.

timmutton avatar timmutton commented on August 18, 2024

Fantastic. Been playing with it for a little, looks like it has potential. Will have to wait until it works with asm to know for sure (or if it does, an example would be great because I can't get it to validate)

from validator.

martingala avatar martingala commented on August 18, 2024

Great :)
I think I will change the method StringView.prototype.makeIndex() soon ( https://developer.mozilla.org/en-US/docs/Web/JavaScript/Typed_arrays/StringView#StringView.prototype.makeIndex%28%29 ):

  1. renaming it to "StringView.prototype.getLength()"
  2. or changing its return value to an index from zero rather than a raw length from skipOffsetIndex...

Send your suggestions if you have any ;)

Good hacks ;)

P.S. I also don't much like the name of the property "stringView.bufferView". Do you have a suggestion for an alternative name to "bufferView"? :)

from validator.

cscott avatar cscott commented on August 18, 2024

Quick review:
a) this discussion is veering offtopic for asm.js; you should open a bugzilla for StringView first, and then once consensus on that has been reached, you can raise the issue of StringView in asm.js. That said, there's no apparent issue with invoking StringView methods as foreign functions and/or using asm.js to access the backing storage directly.
b) you use the word 'characters' often in your documentation, which is rather misleading. You should try to make it very clear when you are referring to elements of the backing array (whether Uint8, Uint16, Uint32, etc) and when you mean codepoints (a collection of 1-6 Uint8 elements for UTF8, 1-2 Uint16 elements for UTF16, 1 Uint32 element for UCS4, 1 Uint8 for ASCII, or something else).
c) Similarly, the methods called (eg) "toBase64" make the conversion unclear. Do you mean to return the base64-encoded string corresponding to the UTF8 encoding of the codepoints stored in the stringview? Or the base64 encoding of the UTF16 encoding of the codepoints? Or the base64 encoding of the "natural" contents of the backing array, in which case you need to specify whether little-endian or big-endian encoding of the backing array is expected.
d) In the introduction you claim the library is "highly scalable". I think you mean "extensible".
e) I think you'd be better off creating a family of StringView subclasses, in the same way that Uint8, Uint16, etc are subclasses of ArrayBufferView. You'd then have UTF8StringView, UTF16StringView, UCS4StringView, etc. This would allow better optimization of the string view methods, instead of having to select one of a number of different implementations based on the underlying encoding.
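To make point b) concrete, here is a small sketch that counts the same text three ways. It uses only standard JavaScript (`Array.from` iterates a string by code point, and `TextEncoder` produces UTF-8 bytes); the sample string is arbitrary:

```javascript
// "a", "é" (U+00E9), and U+1D4B3, which lies outside the Basic Multilingual Plane.
var s = "a\u00E9\uD835\uDCB3";

var utf16Units = s.length;                          // UTF-16 code units
var codePoints = Array.from(s).length;              // Unicode code points
var utf8Bytes = new TextEncoder().encode(s).length; // UTF-8 bytes
```

Here the three counts are 4, 3, and 7 respectively, so "length in characters" is ambiguous unless the documentation says which unit it means.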

from validator.

martingala avatar martingala commented on August 18, 2024

@cscott
Thank you for your review ;)
briefly...

a) this discussion is veering offtopic for asm.js; you should open a bugzilla for StringView first, and then once consensus on that has been reached, you can raise the issue of StringView in asm.js. That said, there's no apparent issue with invoking StringView methods as foreign functions and/or using asm.js to access the backing storage directly.

I made StringView as a generic API... asm.js is only one of its possible uses, I think...

b) you use the word 'characters' often in your documentation, which is rather misleading. You should try to make it very clear when you are referring to elements of the backing array (whether Uint8, Uint16, Uint32, etc) and when you mean codepoints (a collection of 1-6 Uint8 elements for UTF8, 1-2 Uint16 elements for UTF16, 1 Uint32 element for UCS4, 1 Uint8 for ASCII, or something else).

Yes, sorry for my poor English, I'm Italian ;) When I use the word "character" I mean "codepoint".

c) Similarly, the methods called (eg) "toBase64" make the conversion unclear. Do you mean to return the base64-encoded string corresponding to the UTF8 encoding of the codepoints stored in the stringview? Or the base64 encoding of the UTF16 encoding of the codepoints? Or the base64 encoding of the "natural" contents of the backing array, in which case you need to specify whether little-endian or big-endian encoding of the backing array is expected.

The return of toBase64() corresponds to the bytes of the stringView encoded into a base64 string, even when the stringView is UTF-16/UTF-32 encoded.
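A sketch of "base64 of the raw bytes", under the assumption that the conversion simply walks the backing buffer byte by byte. The helper `bytesToBase64` is hypothetical and is not the library's implementation; it relies on the standard `btoa` global (available in browsers and in recent Node.js versions):

```javascript
// Hypothetical helper: base64-encodes the bytes behind any typed array view.
function bytesToBase64(view) {
  var bytes = new Uint8Array(view.buffer, view.byteOffset, view.byteLength);
  var binary = "";
  for (var i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return btoa(binary);
}

// Two views holding the same values, 72 and 105 ("H" and "i").
var asBytes = new Uint8Array([72, 105]);
var asUnits = new Uint16Array([72, 105]);

var base64OfBytes = bytesToBase64(asBytes); // 2 bytes in, "SGk=" out
var base64OfUnits = bytesToBase64(asUnits); // 4 bytes in; byte order is platform-dependent
```

The 16-bit view encodes four bytes instead of two, and their order depends on the platform, which is why the result differs from the 8-bit case.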

d) In the introduction you claim the library is "highly scalable". I think you mean "extensible".

Yes ;) I'll try to improve the English of that page...

e) I think you'd be better off creating a family of StringView subclasses, in the same way that Uint8, Uint16, etc are subclasses of ArrayBufferView. You'd then have UTF8StringView, UTF16StringView, UCS4StringView, etc. This would allow better optimization of the string view methods, instead of having to select one of a number of different implementations based on the underlying encoding.

It is an idea. But it would be only an "aesthetical" idea I think, because the only thing which would change would be some "if" statements...: instead of 'if (stringView.encoding === "UTF-8")' there will be something like 'if (stringView.constructor === UTF8StringView)'... etc... during conversions. And in some cases the encoding chosen is not important, so I don't know if it would be a good idea to split the StringView constructor...

from validator.

cscott avatar cscott commented on August 18, 2024

I made StringView as a generic API... asm.js is only one of its possible uses, I think...

That's why I'm surprised that this discussion is taking place in the asm.js bugtracker.

The return of toBase64() corresponds to the bytes of the stringView

You need to specify endianness, then; the underlying ArrayBufferView leaves this undefined.
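The endianness point can be checked with standard typed-array APIs. A multi-byte typed array uses the platform's byte order, while `DataView` lets the caller state the order explicitly. A minimal sketch:

```javascript
var buf = new ArrayBuffer(2);
var u16 = new Uint16Array(buf);
var u8 = new Uint8Array(buf);

// Typed arrays store multi-byte values in platform byte order.
u16[0] = 0x0102;
var platformIsLittleEndian = u8[0] === 0x02;

// DataView takes an explicit flag: false = big-endian, true = little-endian.
var dv = new DataView(buf);

dv.setUint16(0, 0x0102, false);
var bigEndianBytes = [u8[0], u8[1]];    // [1, 2]

dv.setUint16(0, 0x0102, true);
var littleEndianBytes = [u8[0], u8[1]]; // [2, 1]
```

Any byte-level output derived from a Uint16Array or Uint32Array therefore needs either a documented byte order or an explicit `DataView` pass.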

the only thing which would change would be some "if" statements

Virtual method dispatch is your friend.
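A sketch of what dispatch through subclasses could look like. The class names below are hypothetical and only loosely follow the naming suggested earlier in the thread; this is not the StringView library's design:

```javascript
// Hypothetical base class: shared code never checks the encoding.
class BaseStringView {
  constructor(units) {
    this.units = units;
  }

  // Subclasses return [codePoint, numberOfUnitsConsumed].
  decodeAt(index) {
    throw new Error("decodeAt is not implemented");
  }

  toString() {
    let out = "";
    let i = 0;
    while (i < this.units.length) {
      const [codePoint, consumed] = this.decodeAt(i);
      out += String.fromCodePoint(codePoint);
      i += consumed;
    }
    return out;
  }
}

class ASCIIStringView extends BaseStringView {
  decodeAt(index) {
    return [this.units[index], 1];
  }
}

class UTF16StringView extends BaseStringView {
  decodeAt(index) {
    const hi = this.units[index];
    const lo = this.units[index + 1];
    const isPair = hi >= 0xd800 && hi <= 0xdbff && lo >= 0xdc00 && lo <= 0xdfff;
    if (isPair) {
      // Combine a surrogate pair into one code point.
      return [((hi - 0xd800) << 10) + (lo - 0xdc00) + 0x10000, 2];
    }
    return [hi, 1];
  }
}

const asciiText = new ASCIIStringView(new Uint8Array([72, 105])).toString();
const utf16Text = new UTF16StringView(new Uint16Array([0x48, 0xd835, 0xdcb3])).toString();
```

`toString` contains no encoding checks; each subclass supplies its own `decodeAt`, and the engine picks the right one per instance.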

from validator.

martingala avatar martingala commented on August 18, 2024

@cscott

That's why I'm surprised that this discussion is taking place in the asm.js bugtracker.

I haven't published that library elsewhere, so I haven't a bugtracker. I think that a good idea would be to move this discussion to my MDN discussion page...: https://developer.mozilla.org/en-US/docs/User_talk:fusionchess

You need to specify endianness, then; the underlying ArrayBufferView leaves this undefined.

Only one endianness is supported: the one chosen by the JavaScript engine!

Virtual method dispatch is your friend.

My syntax comes from tradition... like in Java...: new OutputStreamWriter(System.out, "UTF-16");
I like it :P

from validator.

martingala avatar martingala commented on August 18, 2024

@cscott
P.S. I have updated the StringView page on MDN with your suggestions. I have also updated the makeIndex() method ( https://developer.mozilla.org/en-US/docs/Web/JavaScript/Typed_arrays/StringView#StringView.prototype.makeIndex%28%29 )
Bye :)

from validator.
