typelevel / case-insensitive
A case-insensitive string for Scala
License: Apache License 2.0
The Unicode standard provides quite a few different ways to do case folding (the operation which yields a caseless string), with different trade-offs between space usage and strictness. In general, we would like the default behavior to be a full case folded string using Canonical Equivalence between characters. This is modeled as CanonicalFullCaseFoldedString in the WIP PR #232.
In 1.x.x of case-insensitive we have a globbing matcher. The current implementation is based on the 1.x.x default case folded string, which I think (though I am not 100% sure) is the same as a simple canonical case folded string.
The distinction here between "simple" and "full" is that a simple case fold will not change the number of char values needed to represent the string, but a full case fold may change the number of char values needed.
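To make the simple/full distinction concrete, here is a minimal sketch using only the JDK. The JDK does not expose Unicode case folding directly, so this uses upper-case mapping as a stand-in, which shows the same length behavior. The object and method names are hypothetical and are not part of case-insensitive.

```scala
import java.util.Locale

object CaseMappingLength {
  // Simple mapping: one char in, one char out, so the length never changes.
  // (Works per char, so it ignores surrogate pairs; fine for a sketch.)
  def simpleUpper(s: String): String =
    s.map(c => Character.toUpperCase(c))

  // Full mapping: String.toUpperCase applies the Unicode special casing
  // rules, so a single char may expand to several chars.
  def fullUpper(s: String): String =
    s.toUpperCase(Locale.ROOT)
}
```

With the input "straße" (6 chars), simpleUpper leaves the ß in place and returns 6 chars, while fullUpper expands ß to SS and returns 7 chars.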
In 2.x.x we'd like all the default code paths to use full case folded operations, as they are the most correct (where incorrectness can introduce security issues and runtime failures for certain RFCs). However, I'm not 100% sure we can implement globbing safely for a full case folded string due to cases where a glyph may be represented by both N and N+M (where M is usually 1, 2, or 3) characters. See combining sequences.
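As a rough illustration of the combining sequence problem, the sketch below uses java.text.Normalizer to show one glyph with two representations of different lengths. The object and helper names are hypothetical, chosen only for this example.

```scala
import java.text.Normalizer

object CombiningSequences {
  // The same glyph, "é", in two canonically equivalent forms.
  val precomposed: String = "\u00e9"  // LATIN SMALL LETTER E WITH ACUTE
  val decomposed: String  = "e\u0301" // e + COMBINING ACUTE ACCENT

  // Canonical composition and decomposition.
  def nfc(s: String): String = Normalizer.normalize(s, Normalizer.Form.NFC)
  def nfd(s: String): String = Normalizer.normalize(s, Normalizer.Form.NFD)
}
```

The two strings are not equal as char sequences (lengths 1 and 2), yet they normalize to each other. A glob wildcard that consumes exactly one char would therefore treat them differently, which is the kind of mismatch that makes globbing over a canonically equivalent, full case folded string hard to define.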
I will follow up with some more concrete examples shortly.
If we can't adapt this for a full case folded string, we will need to deprecate it or leave it as a simple case folded implementation if we want to avoid a bincompat break.
I'm considering targeting #232 to a series/2.x.x branch.
@rossabaker @armanbilge any objections to me creating that branch and moving the code there?
Run gem install bundler
Successfully installed bundler-2.2.5
Parsing documentation for bundler-2.2.5
Installing ri documentation for bundler-2.2.5
Done installing documentation for bundler after 5 seconds
1 gem installed
Fetching gem metadata from http://rubygems.org/.........
listen-3.2.1 requires ruby version >= 2.2.7, ~> 2.2, which is incompatible with
the current version, ruby 3.0.0p0
Error: Process completed with exit code 5.
Builds that worked last week are now all failing like that.
We already have the plugin. We just haven't done the rest of the work.
Would we want to add some of the standard utility string functions, as in StringLike?
Maybe a .transform function could be useful. The usual "naming things is hard" warning applies; I would be interested in other ideas.
def trim: CIString = transform(_.trim)
def transform(f: String => String): CIString = CIString(f(toString))
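To show how the proposed transform would behave, here is a self-contained sketch built on a hypothetical stand-in class. SketchCI is not the library's CIString; its equality is an assumption made for this example and mirrors the per-char comparison of String.equalsIgnoreCase.

```scala
object TransformSketch {
  // Hypothetical stand-in for CIString, for illustration only.
  final class SketchCI(val value: String) {
    // Per-char key: lower(upper(c)), the comparison equalsIgnoreCase uses.
    private val key: String =
      value.map(c => Character.toLowerCase(Character.toUpperCase(c)))

    // Apply f to the original string and re-wrap the result.
    def transform(f: String => String): SketchCI = new SketchCI(f(value))

    // Utility functions then become one-liners on top of transform.
    def trim: SketchCI = transform(_.trim)

    override def equals(that: Any): Boolean = that match {
      case other: SketchCI => key == other.key
      case _               => false
    }
    override def hashCode: Int = key.hashCode
    override def toString: String = value
  }
}
```

For example, new SketchCI("  Content-Type ").trim compares equal to new SketchCI("content-type") and still prints as Content-Type, since transform works on the original string and not on the folded key.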
I'd like to give munit a try. Now that scalameta/munit#124 is fixed, we can.
While working on cats-uri, I ran into an issue with how CIString was handling certain Unicode values, which led me to notice it wasn't respecting caseless matching from the Unicode standard. As it turns out, neither does String.equalsIgnoreCase.
I'd just about completed a branch to implement full case folding as defined by the Unicode standard when I ran across this test.
test("character based equality") {
  assert(CIString("ß") != CIString("SS"))
}
Since under the Unicode standard's caseless matching these two strings would compare equal, I'm beginning to think we are intentionally not following the standard here. Is that the case? If so, why? Is it to maintain parity with what the Java standard library is doing with methods like equalsIgnoreCase?
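The JDK behavior referred to here can be checked directly. The sketch below only calls String.equalsIgnoreCase; the object and value names are hypothetical.

```scala
object EqualsIgnoreCaseGap {
  val sharpS: String = "\u00df" // ß

  // equalsIgnoreCase compares char by char, so strings of different
  // lengths can never match, whatever their case mappings are.
  val matchesUpperSS: Boolean = sharpS.equalsIgnoreCase("SS")
  val matchesLowerSS: Boolean = sharpS.equalsIgnoreCase("ss")
}
```

Both values are false, so the existing test agrees with the JDK, while full case folding under the Unicode standard would treat ß and SS as a caseless match.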
It would be nice if this had CI for the dev shell. It would look a lot like the one in typelevel-nix, and catch issues like #221.