Giter VIP home page Giter VIP logo

Comments (12)

PureFox48 avatar PureFox48 commented on June 20, 2024 2

Well, we can do something like that if you want to keep the line count down but it's hardly making the code easier to read.

Personally, I'll be surprised if String.compareTo is used directly very much anyway. I think people will just use the standard comparison operators instead which, of course, will call it under the hood.

Going off topic slightly, I think a better use for a new operator such as <=> would be as a 'swap' function. So we'd have:

var a = 1
var b = 2
a <=> b
System.print([a, b]) // [2, 1]

Currently, we have nothing like this in the language for scalars and it's impossible to write one as we don't have parameter passing by reference. You either have to use a temporary variable (3 lines) or (for numbers) a one line hack such as::

var a = 1
var b = 2
b = a + b - (a = b)
System.print([a, b]) // [2, 1]

I noticed recently that the supercomputer language, Chapel, was using <=> for this purpose so we'd be in good company :)

Python, Ruby and Lua use multiple assignment for swapping but I don't like that solution much as it introduces problems of its own.

from wren.

mhermier avatar mhermier commented on June 20, 2024 1

You can implement compareTo in C for better performance and to handle handle BOM prelude.

But I would go optimize even furzer, and directly use strcmp. Since regular UTF-8 encoding is compatible with strcmp (ignoring some edge cases like BOM, characters with multiple encoding, sub-optimal encoding...), it should provide a pretty reliable and fast comparison.

from wren.

mhermier avatar mhermier commented on June 20, 2024 1

Thinking at it again we might need to use memcmp instead. Since Strings known their size and since \0 is a legal UTF-8 char, I think it is a better solution. That way, we can also compare strings holding binary data.

from wren.

mhermier avatar mhermier commented on June 20, 2024 1

We can use something more compact, by doing something like:

// Use a variation of https://graphics.stanford.edu/~seander/bithacks.html#CopyIntegerSign
#define intcmp(a, b) (((a) > (b)) - ((a) < (b)))

RETURN_NUM(res != 0 ? res : intcmp(len1, len2));

Not sure if it should be a macro or specialized function since we might want to have at least Num to be comparable.

Speaking of which maybe we might want to use <=> operator instead of compareTo or any other name.

from wren.

PureFox48 avatar PureFox48 commented on June 20, 2024 1

OK, I'll change the type of the lengths to size_t and leave the rest of it alone.

The way I envisaged a swap operator working is analogous to = except, yes, you'd need a variable on both sides. You could still use it as expression but it wouldn't be overloadable.

But you're probably right that it wouldn't be worth the effort of implementing all this just to save two lines of code. Anyway it's just an idea I had :)

from wren.

PureFox48 avatar PureFox48 commented on June 20, 2024

I hadn't really thought about it before, but given the way UTF-8 is encoded, I think you're right that strcmp/memcmp will give the correct results when comparing two UTF-8 strings lexicographically.

I wouldn't have thought that we'd need to worry about the UTF-8 BOM, because if we've picked that up when reading a string from a file (say), then we should have already stripped it off from the Wren side before doing any comparisons.

Anyway, if we're going to implement compareTo in C, then we'll need to add something like the following in the appropriate places in wren_core.c:

DEF_PRIMITIVE(string_compareTo)
{
  if (!validateString(vm, args[1], "Argument")) return false;

  ObjString* str1 = AS_STRING(args[0]);
  ObjString* str2 = AS_STRING(args[1]);

  uint32_t len1 = str1->length;
  uint32_t len2 = str2->length;

  // If strings have equal length, return memcmp result.
  if (len1 == len2) RETURN_NUM(memcmp(str1->value, str2->value, len1));

  // Get minimum length for comparison.
  uint32_t len3 = (len1 <= len2) ? len1 : len2;
  int res = memcmp(str1->value, str2->value, len3);
   
  // If result is non-zero, just return that.
  if (res) RETURN_NUM(res);

  // Otherwise the shorter string will come first.
  res = (len1 < len2) ? -1 : 1;
  RETURN_NUM(res);
}

...

PRIMITIVE(vm->stringClass, "compareTo(_)", string_compareTo);

from wren.

mhermier avatar mhermier commented on June 20, 2024

from wren.

PureFox48 avatar PureFox48 commented on June 20, 2024

OK, I've incorporated your first two suggestions.

I'd already dealt with the third as I realized just after I'd posted it that we could only guarantee the sign.

DEF_PRIMITIVE(string_compareTo)
{
  if (!validateString(vm, args[1], "Argument")) return false;

  ObjString* str1 = AS_STRING(args[0]);
  ObjString* str2 = AS_STRING(args[1]);

  uint32_t len1 = str1->length;
  uint32_t len2 = str2->length;

  // Get minimum length for comparison.
  uint32_t minLen = (len1 <= len2) ? len1 : len2;
  int res = memcmp(str1->value, str2->value, minLen);
   
  // If result is non-zero, just return that.
  if (res) RETURN_NUM(res);

  // If the lengths are the same, the strings must be equal
  if (len1 == len2) RETURN_NUM(0);

  // Otherwise the shorter string will come first.
  res = (len1 < len2) ? -1 : 1;
  RETURN_NUM(res);
}

...

PRIMITIVE(vm->stringClass, "compareTo(_)", string_compareTo);

from wren.

mhermier avatar mhermier commented on June 20, 2024

About the ease of read, it is subjective. My solution at least express the intent in code, so I find it simpler than reading comments. But it is a minor detail, as long as the code working, it is fine.
For future proofing, I would use size_t instead of uint32_t, while overkill it should have only a limited impact (since it is stack/register memory and not allocated memory), and if we need to change string size to size_t it will be one site less to update...

While I find the idea interesting for <=> to be swap, I think it is too much limited in the case of the wren language. Since we only simulated left values (via operators (...)=(value) and [...]=(value)) it would only be applicable on bare variable names. So it seems to be too much restricted to avoid the 3 lines boiler plate required to manually implement swap, and could be taking advantage of our simulated left values (at a non negligible or avoidable cost).

from wren.

PureFox48 avatar PureFox48 commented on June 20, 2024

Just to finish up on the 'swap' thing, the nearest I can get to a generic swap function, using what we currently have, is this:

import "meta" for Meta

var Swap = Fn.new { |a, b|
  var s = "{\n"
  s = s + "  var t = %(a)\n"
  s = s + "  %(a) = %(b)\n"
  s = s + "  %(b) = t\n"
  s = s + "}\n"
  Meta.compile(s).call()
}

var a = 1
var b = 2
Swap.call("a", "b")
System.print([a, b])

var c = 3
var d = 4
Swap.call("c", "d")
System.print([c, d])

var e = [5, 6]
Swap.call("e[0]", "e[1]")
System.print(e)

/* output:
[2, 1]
[4, 3]
[6, 5]
*/

The generated code is placed in its own block to ensure the temporary variable 't' is destroyed after each invocation.

Note though that you have to pass the variable names as strings. Although it works for indexed variables, we don't really need this as we have list.swap.

from wren.

PureFox48 avatar PureFox48 commented on June 20, 2024

Incidentally, it struck me later that the Swap() function in my previous post would not be a great solution as it relies on capturing the swapped variables and will only work at top level if defined in the same module.

Nor will it work if called from within a function or method - not much does with Meta.compile and friends :(

from wren.

PureFox48 avatar PureFox48 commented on June 20, 2024

I've now opened a PR to implement string comparisons on the basis agreed above.

from wren.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.