case-insensitive's Issues

Globbing

https://github.com/typelevel/case-insensitive/blob/main/core/src/main/scala/org/typelevel/ci/package.scala#L34

The Unicode standard provides quite a few different ways to do case folding (the operation which yields a caseless string), with different trade-offs between space usage and strictness. In general, we would like the default behavior to be a full case folded string using Canonical Equivalence between characters. This is modeled as CanonicalFullCaseFoldedString in the WIP PR #232.

In 1.x.x of case-insensitive we have a globbing matcher. The current implementation is based on the 1.x.x default case folded string, which I think (though I am not 100% sure) is the same as a simple canonical case folded string.

The distinction here between "simple" and "full" is that a simple case fold will not change the number of char values needed to represent the string, but a full case fold may change the number of char values needed.
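
To make the length difference concrete, here is a minimal sketch. It assumes the JDK's case mapping can stand in for case folding (the JDK has no case folding API), so it illustrates the simple/full distinction rather than the library's behavior. The object name SimpleVsFullMapping is made up for this example.

```scala
import java.util.Locale

object SimpleVsFullMapping {
  val sharpS: String = "\u00DF" // ß, a single char

  // Simple (char-to-char) mapping: ß has no single-char uppercase form,
  // so the string is unchanged and keeps its length.
  val simple: String = sharpS.map(c => Character.toUpperCase(c))

  // Full mapping: String.toUpperCase applies the multi-char special
  // casing rules, so one char becomes two.
  val full: String = sharpS.toUpperCase(Locale.ROOT)

  def main(args: Array[String]): Unit =
    println(s"simple: ${simple.length} char(s), full: ${full.length} char(s)")
}
```

The same asymmetry is what a full case fold introduces: any code that assumes one input char maps to one output char no longer holds.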

In 2.x.x we'd like all the default code paths to use full case folded operations, as they are the most correct (where incorrectness can introduce security issues and runtime failures for certain RFCs). However, I'm not 100% sure we can implement globbing safely for a full case folded string due to cases where a glyph may be represented by both N and N+M (where M is usually 1, 2, or 3) characters. See combining sequences.
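
Until the concrete examples arrive, here is a rough sketch of the kind of problem I mean. It is an assumption-laden illustration: a regex "caf." stands in for a glob like "caf?" (this is not the library's glob matcher), and java.text.Normalizer is used to produce the two representations of the same glyph. The object name CombiningSequenceGlob is made up.

```scala
import java.text.Normalizer

object CombiningSequenceGlob {
  // "café" with é as one precomposed code point (U+00E9): 4 chars.
  val composed: String = "caf\u00E9"

  // The same text with é as e + combining acute accent (U+0301): 5 chars.
  val decomposed: String = Normalizer.normalize(composed, Normalizer.Form.NFD)

  // "." matches exactly one code point, like a single-character wildcard.
  val composedMatches: Boolean = composed.matches("caf.")
  val decomposedMatches: Boolean = decomposed.matches("caf.")

  def main(args: Array[String]): Unit =
    println(s"composed: $composedMatches, decomposed: $decomposedMatches")
}
```

Both strings render as the same glyphs, yet a single-character wildcard matches only one of them. A glob over a full case folded, canonically equivalent string would need to decide what "one character" means.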

I will follow up with some more concrete examples shortly.

If we can't adapt this for a full case folded string, we will need to deprecate it or leave it as a simple case folded implementation if we want to avoid a bincompat break.

Jekyll is once again emitting a putrescent stench

Run gem install bundler
Successfully installed bundler-2.2.5
Parsing documentation for bundler-2.2.5
Installing ri documentation for bundler-2.2.5
Done installing documentation for bundler after 5 seconds
1 gem installed
Fetching gem metadata from http://rubygems.org/.........
listen-3.2.1 requires ruby version >= 2.2.7, ~> 2.2, which is incompatible with
the current version, ruby 3.0.0p0
Error: Process completed with exit code 5.

Builds that worked last week are now all failing like that.

String operators

Would we want to add some of the standard utility string functions, as in StringLike?

Maybe a .transform function could be useful. The usual "naming things is hard" warning applies; I would be interested in other ideas.

  def trim: CIString = transform(_.trim)
  def transform(f: String => String): CIString = CIString(f(toString))
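
One design point worth weighing, sketched below under loud assumptions: Caseless is a hypothetical stand-in for CIString (equality on a lower-cased key only), not the library's implementation. It suggests that transform preserves equality for case-agnostic functions like trim, but not for case-sensitive ones.

```scala
import java.util.Locale

object TransformSketch {
  // Hypothetical stand-in for CIString, for illustration only.
  final class Caseless(val value: String) {
    private val key = value.toLowerCase(Locale.ROOT)
    override def equals(o: Any): Boolean = o match {
      case that: Caseless => this.key == that.key
      case _              => false
    }
    override def hashCode: Int = key.hashCode
    override def toString: String = value

    def transform(f: String => String): Caseless = new Caseless(f(value))
    def trim: Caseless = transform(_.trim)
  }

  val a = new Caseless(" Content-Type ")
  val b = new Caseless(" content-type ")

  // trim ignores case, so equal inputs stay equal.
  val trimmedEqual: Boolean = a.trim == b.trim

  // A case-sensitive function can map equal inputs to unequal outputs.
  val replacedEqual: Boolean =
    a.transform(_.replace("C", "X")) == b.transform(_.replace("C", "X"))
}
```

If that matters, one option would be to document that the function passed to transform should not depend on case.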

Unicode Case Folding

While working on cats-uri, I ran into an issue with how CIString was handling certain Unicode values, which led me to notice that it wasn't respecting caseless matching as defined by the Unicode standard. As it turns out, neither does String.equalsIgnoreCase.

I'd just about completed a branch to implement full case folding as defined by the Unicode standard when I ran across this test.

  test("character based equality") {
    assert(CIString("ß") != CIString("SS"))
  }

Since under the Unicode standard's caseless matching these two strings would compare equal, I'm beginning to think we are intentionally not following the standard here. Is that the case? If so, why? Is it to maintain parity with what the Java standard library does in methods like equalsIgnoreCase?
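
For reference, a small sketch of the JDK behavior in question. The helper roughFold is an assumption made for illustration: upper-casing then lower-casing only approximates a full case fold for this example and is not the Unicode case folding algorithm.

```scala
import java.util.Locale

object CaselessMatchSketch {
  // Rough approximation of a full case fold, NOT the Unicode algorithm.
  def roughFold(s: String): String =
    s.toUpperCase(Locale.ROOT).toLowerCase(Locale.ROOT)

  // equalsIgnoreCase compares char by char, so strings of different
  // lengths can never be equal.
  val jdkEqual: Boolean = "\u00DF".equalsIgnoreCase("SS")

  // Both sides become "ss" under the rough fold.
  val foldedEqual: Boolean = roughFold("\u00DF") == roughFold("SS")

  def main(args: Array[String]): Unit =
    println(s"equalsIgnoreCase: $jdkEqual, rough fold: $foldedEqual")
}
```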
