babosa's Introduction

Babosa

Babosa is a library for creating human-friendly identifiers, aka "slugs". It can also be useful for normalizing and sanitizing data.

It is an extraction and improvement of the string code from FriendlyId. I have released this as a separate library to help developers who want to create libraries similar to FriendlyId.

Features / Usage

Transliterate UTF-8 characters to ASCII

"Gölcük, Turkey".to_slug.transliterate.to_s #=> "Golcuk, Turkey"

Locale sensitive transliteration, with support for many languages

"Jürgen Müller".to_slug.transliterate.to_s           #=> "Jurgen Muller"
"Jürgen Müller".to_slug.transliterate(:german).to_s  #=> "Juergen Mueller"

Currently supported languages include:

  • Bulgarian
  • Danish
  • German
  • Greek
  • Hindi
  • Macedonian
  • Norwegian
  • Romanian
  • Russian
  • Serbian
  • Spanish
  • Swedish
  • Turkish
  • Ukrainian
  • Vietnamese

Additionally there are generic transliterators for transliterating from the Cyrillic alphabet and Latin alphabet with diacritics. The Latin transliterator can be used, for example, with Czech. There is also a transliterator named "Hindi" which may be sufficient for other Indic languages using Devanagari, but I do not know enough to say whether the transliterations would make sense.

I'll gladly accept contributions from fluent speakers to support more languages.

Strip non-ASCII characters

"Gölcük, Turkey".to_slug.to_ascii.to_s #=> "Glck, Turkey"

Truncate by characters

"üüü".to_slug.truncate(2).to_s #=> "üü"

Truncate by bytes

This can be useful to ensure the generated slug will fit in a database column whose length is limited by bytes rather than UTF-8 characters.

"üüü".to_slug.truncate_bytes(2).to_s #=> "ü"
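
To make the difference between character and byte truncation concrete, here is a minimal plain-Ruby sketch that does not use Babosa. The helper name `truncate_to_bytes` is hypothetical, and Babosa's own implementation may well differ; the sketch only shows one way to cut a string at a byte limit without splitting a multibyte character.

```ruby
# Hypothetical helper, not part of Babosa: append whole characters
# for as long as the result stays within max_bytes.
def truncate_to_bytes(string, max_bytes)
  result = +""
  string.each_char do |char|
    break if result.bytesize + char.bytesize > max_bytes
    result << char
  end
  result
end

puts truncate_to_bytes("üüü", 2) # prints "ü" ("ü" takes 2 bytes in UTF-8)
puts truncate_to_bytes("üüü", 3) # prints "ü" (a third byte would split a character)
puts "üüü"[0, 2]                 # prints "üü" (character-based truncation)
```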

Remove punctuation chars

"this is, um, **really** cool, huh?".to_slug.word_chars.to_s #=> "this is um really cool huh"

All-in-one

"Gölcük, Turkey".to_slug.normalize.to_s #=> "golcuk-turkey"

Other stuff

Using Babosa With FriendlyId 4+

require "babosa"

class Person < ActiveRecord::Base
  friendly_id :name, use: :slugged

  def normalize_friendly_id(input)
    input.to_s.to_slug.normalize(transliterations: :russian).to_s
  end
end

UTF-8 support

Babosa normalizes all input strings to NFC.
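
As a quick illustration of what NFC normalization means, the following sketch uses Ruby's built-in `String#unicode_normalize` rather than Babosa itself. A decomposed "e" followed by a combining acute accent becomes the single precomposed character "é".

```ruby
decomposed = "e\u0301" # "e" followed by U+0301 COMBINING ACUTE ACCENT
composed   = decomposed.unicode_normalize(:nfc)

puts decomposed.length    # prints 2
puts composed.length      # prints 1
puts composed == "\u00e9" # prints true
```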

Ruby Method Names

Babosa can generate strings for Ruby method names. (Yes, Ruby 1.9+ can use UTF-8 chars in method names, but you may not want to):

"this is a method".to_slug.to_ruby_method! #=> this_is_a_method
"über cool stuff!".to_slug.to_ruby_method! #=> uber_cool_stuff!

# You can also disallow trailing punctuation chars
"über cool stuff!".to_slug.to_ruby_method(allow_bangs: false) #=> uber_cool_stuff

Easy to Extend

You can add custom transliterators for your language with very little code. For example here's the transliterator for German:

module Babosa
  module Transliterator
    class German < Latin
      APPROXIMATIONS = {
        "ä" => "ae",
        "ö" => "oe",
        "ü" => "ue",
        "Ä" => "Ae",
        "Ö" => "Oe",
        "Ü" => "Ue"
      }
    end
  end
end

And a spec (you can use this as a template):

require "spec_helper"

describe Babosa::Transliterator::German do
  let(:t) { described_class.instance }
  it_behaves_like "a latin transliterator"

  it "should transliterate Eszett" do
    t.transliterate("ß").should eql("ss")
  end

  it "should transliterate vowels with umlauts" do
    t.transliterate("üöä").should eql("ueoeae")
  end
end

Rails 3.x and higher

Some of Babosa's functionality was added to Active Support 3.0.0.

Babosa now differs from ActiveSupport primarily in that it supports non-Latin strings by default, and has per-locale ASCII transliterations already baked-in. If you are considering using Babosa with Rails, you may want to first take a look at Active Support's transliterate and parameterize to see if they suit your needs.

Please see the API docs and source code for more info.

Getting it

Babosa can be installed via RubyGems:

gem install babosa

You can get the source code from its GitHub repository.

Reporting bugs

Please use Babosa's GitHub issue tracker.

Misc

"Babosa" means "slug" in Spanish.

Copyright

Copyright (c) 2010-2021 Norman Clarke and contributors

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

babosa's People

Contributors

aried3r, balasankarc, basex, burningtyger, cattekin, denys281, dmitry-ilyashevich, dru, ebeigarts, gogainda, hamdiakoguz, himynameisjonas, huchenme, janlelis, jarosan, kimjoar, norman, parndt, reiz, srushe, tct, tiwi, vortex, vrodokanakis

babosa's Issues

Chrome : undefined method downcase for nil nilclass nomethoderror

My code works in Firefox, but when I switch it to Chrome, the Chrome browser does not open. I downloaded chromedriver and put it in the %HOMEPATH%\Local Settings\Application Data\Google\Chrome\Application\chrome.exe folder, and I did the same in the ruby/bin folder.

But I am still getting the error "undefined method downcase for nil nilclass nomethoderror":

@browser=Watir::Browser.new :chrome

It works in Firefox:

@browser=Watir::Browser.new :ff

Not getting installed with ruby1.9.3p194

Gem::InstallError: babosa requires Ruby version >= 2.0.0.
An error occurred while installing babosa (1.0.2), and Bundler cannot continue.
Make sure that gem install babosa -v '1.0.2' succeeds before bundling.

Add Ruby 2.7 support

On Ruby 2.7, running the tests yields the following failure:

$ ruby --version
ruby 2.7.0p0 (2019-12-25 revision 647ee6f091) [x86_64-linux]

$ bundle exec rspec
................................F...............................................

Failures:

  1) Babosa::Identifier#to_ruby_method should raise an error when it would generate an impossible method name
     Failure/Error: expect {"1".to_identifier.to_ruby_method}.to raise_error(Babosa::Identifier::Error)
     
       expected Babosa::Identifier::Error, got #<FrozenError: can't modify frozen String: ""> with backtrace:
         # ./lib/babosa/identifier.rb:172:in `downcase!'
         # ./lib/babosa/identifier.rb:172:in `to_ruby_method!'
         # ./lib/babosa/identifier.rb:289:in `send_to_new_instance'
         # ./lib/babosa/identifier.rb:263:in `to_ruby_method'
         # ./spec/babosa_spec.rb:151:in `block (4 levels) in <top (required)>'
         # ./spec/babosa_spec.rb:151:in `block (3 levels) in <top (required)>'
     # ./spec/babosa_spec.rb:151:in `block (3 levels) in <top (required)>'

Finished in 0.06649 seconds (files took 0.20925 seconds to load)
80 examples, 1 failure

Failed examples:

rspec ./spec/babosa_spec.rb:149 # Babosa::Identifier#to_ruby_method should raise an error when it would generate an impossible method name

In Ruby 2.7, casting a nil object to String yields a frozen object, thanks to https://bugs.ruby-lang.org/issues/16150?next_issue_id=16184.

So if leader and trailer in the to_ruby_method method are nil, as is the case in the failing test, to_s returns frozen strings, and calling downcase! on them fails.
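
The failure mode can be reproduced in isolation. The sketch below is not the library's code; the variable `leader` merely stands in for the nil value described above, and the two workarounds shown are assumptions about how such a call could be made safe.

```ruby
leader = nil

frozen = leader.to_s # "" (frozen on Ruby 2.7 and later)
# frozen.downcase!   # would raise FrozenError on Ruby 2.7 and later

# Option 1: duplicate first; dup always returns an unfrozen copy.
mutable = leader.to_s.dup
mutable.downcase!    # safe; returns nil because nothing changed

# Option 2: use the non-mutating variant, which returns a new String.
lowered = leader.to_s.downcase

puts mutable.frozen? # prints false
puts lowered.inspect # prints ""
```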

Recommendation for slug validation?

First off - Thanks for this gem!

For data consistency reasons, it would be nice to validate the format of the slug attribute in some cases.

Any suggestions for a regexp or other approach for such a validation?
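
One possible approach, offered only as a sketch: assuming the slugs were produced with transliteration, so that they contain only lowercase ASCII letters, digits, and single hyphens, a format check could look like the following. The constant `SLUG_FORMAT` and the helper `valid_slug?` are hypothetical names, not part of Babosa. Slugs that keep non-ASCII letters would need a broader pattern.

```ruby
# Hypothetical format: runs of lowercase ASCII letters or digits,
# separated by single hyphens, with no leading or trailing hyphen.
SLUG_FORMAT = /\A[a-z0-9]+(?:-[a-z0-9]+)*\z/

def valid_slug?(value)
  SLUG_FORMAT.match?(value.to_s)
end

puts valid_slug?("golcuk-turkey")  # prints true
puts valid_slug?("Golcuk, Turkey") # prints false
puts valid_slug?("trailing-")      # prints false
```

In an Active Record model, the same pattern could be passed to a format validation, for example validates :slug, format: { with: SLUG_FORMAT }.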

`word_chars!` is stripping numbers

According to the docs for word_chars! it is not supposed to strip numbers from the string, but it is doing so as of 11e0f33

# Remove any non-word characters. For this library's purposes, this means
# anything other than letters, numbers, spaces, underscores, dashes,
# newlines, and linefeeds.
#
# @return String
def word_chars!
  # `^\p{letter}` = Any non-Unicode letter
  # `&&` = add the following character class
  # `[^ _\n\r]` = Anything other than space, underscore, newline or linefeed
  gsub!(/[[^\p{letter}]&&[^ _\-\n\r]]/, "")
  to_s
end
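
The behavior can be reproduced without the gem. The sketch below applies the same regular expression to a sample string and compares it with a hypothetical variant that also preserves Unicode numbers via \p{N}. The variant and both constant names are assumptions for illustration, not necessarily the fix adopted by the library.

```ruby
# Same pattern as in word_chars! above: digits are not letters, so they are stripped.
CURRENT_PATTERN = /[[^\p{letter}]&&[^ _\-\n\r]]/

# Hypothetical variant: also exclude Unicode numbers from the stripped set.
PROPOSED_PATTERN = /[[^\p{letter}\p{N}]&&[^ _\-\n\r]]/

sample = "route 66, really!"

puts sample.gsub(CURRENT_PATTERN, "")  # prints "route  really"
puts sample.gsub(PROPOSED_PATTERN, "") # prints "route 66 really"
```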

Emoji support

In babosa 1.0.4, emojis were kept when using normalize; now they seem to be removed.

Example:
In 1.0.4 'Emojis on slug 😋'.to_slug.normalize.to_s # => 'emojis-on-slug-😋'
In 2.0.0 'Emojis on slug 😋'.to_slug.normalize.to_s # => 'emojis-on-slug'

The change seems related to the function word_chars!: it previously matched specific characters to strip out, but it is now much broader because it uses [^\p{letter}].

Is this intended? Would it be possible to keep support for the emojis?

Should Japanese be supported?

While Chinese works well enough with 1:1 substitutions to produce something meaningful, this is impossible for Japanese. Consider the following three sentences:

  • 生まれた子供たち。
  • 生きる意味を見失う。
  • 生意気な小僧。

Notice that they all begin with 生. But one possible transliteration is:

  • umareta kodomo-tachi.
  • ikiru imi o miushinau.
  • namaiki na kozoo.

So depending on the context, 生 can be either u, i, or nama. And it gets worse: there are more possibilities (see the file below). Admittedly, there are fewer possibilities for other characters, but the principle stays the same.

Thus transliterating Japanese is almost as hard as translating it: it can be done in software only approximately (like automated translation), and it isn't going to be perfect.

Fortunately, Ruby is quite popular in Japan, and there are Ruby ports of some morphological analyzers, which do pretty much all the hard work. I made a simple script (mainly for private use) that illustrates how Japanese transliteration might be achieved.

Run this ruby script (more notes/comments and links to the gems at the beginning)
http://pastebin.com/hgJjqYcb

with

$ ruby japanese_to_ascii.rb < test_text > output

on this file
http://pastebin.com/CGTHiE58

to get this output
http://pastebin.com/aknZyH0Q

As you can see it isn't perfect, but for the most part correct and readable. So this raises the question, should Japanese be supported in the first place, seeing how hard and possibly open-ended it is...?

travis builds failing

Some open pull requests for Ukrainian from last summer fail the Travis CI build, so it seems there's a problem with CI. @norman, could you take a look?

Error while working with 2 characters together

Hello, while working with the library, some words (like Yáñez) cause the program to raise an error. This happens when two "special" characters are next to each other.

Thanks in advance

NoMethodError: undefined method `downcase' for nil:NilClass

1.9.3-p0 :011 > "1".to_slug.to_ruby_method
NoMethodError: undefined method `downcase' for nil:NilClass
  from /home/ekhatko/.rvm/gems/ruby-1.9.3-p0/gems/babosa-0.3.9/lib/babosa/identifier.rb:172:in `to_ruby_method!'
  from /home/ekhatko/.rvm/gems/ruby-1.9.3-p0/gems/babosa-0.3.9/lib/babosa/identifier.rb:284:in `send_to_new_instance'
  from /home/ekhatko/.rvm/gems/ruby-1.9.3-p0/gems/babosa-0.3.9/lib/babosa/identifier.rb:258:in `to_ruby_method'
  from (irb):11
  from /home/ekhatko/.rvm/rubies/ruby-1.9.3-p0/bin/irb:16:in `<main>'

Updating slug column in Friendly_id for existing records

I am using babosa to create correct slugs from Chinese text. babosa works perfectly for new records, but I can't update existing records using babosa.

I ran Product.find_each(&:save), the usual way to update the slug column in Friendly_id for existing records, but it doesn't update the slug column.

Error adding custom transliterators

When I added custom transliterators, for example:

module Babosa
  module Transliterator
    class German < Latin
      APPROXIMATIONS = {
        "ä" => "ae",
        "ö" => "oe",
        "ü" => "ue",
        "Ä" => "Ae",
        "Ö" => "Oe",
        "Ü" => "Ue"
      }
    end
  end
end

I received the following warning:

...config/initializers/babosa.rb:5: warning: already initialized constant Babosa::Transliterator::German::APPROXIMATIONS
...gems/babosa-0.3.11/lib/babosa/transliterator/german.rb:5: warning: previous definition of APPROXIMATIONS was here

How can I add custom transliterators?

Automating Scrollbar for clicking a button

I have a button at the bottom of the page. I need to scroll down before clicking it, and I need to automate this.
I tried different things, but without success.

These are the snippets I tried:

@ff.goto("javascript:window.moveTo(10,20)") # getting timeout error
@ff.div(:id,"right-pane-footer").document.scrollIntoView
@ff.button(:id, "save").scrollIntoView

button = @ff.button(:id,"save")
button.locate
button.ole_object.invoke('scrollIntoView')
@ff.button(:id,"save").ole_object.invoke('scrollIntoView')

@ff.element(:id,"save").scrollIntoView

Turkish support

The default Latin transliterator is enough for Turkish support.
Maybe the README should mention Turkish as supported via Latin, or a Turkish class inheriting directly from Latin could be added.
I tested with the Turkish special characters (Ç,ç,Ü,ü,Ö,ö,İ,ı,Ş,ş,Ğ,ğ) and characters with a circumflex (â,î,û,ê,ô):

"Ç,ç,Ü,ü,Ö,ö,İ,ı,Ş,ş,Ğ,ğ,â,î,û,ê,ô,Â,Î,Û,Ê,Ô".to_slug.transliterate.to_s
=> "C,c,U,u,O,o,I,i,S,s,G,g,a,i,u,e,o,A,I,U,E,O"

ASCII serialization of Danish

In the Danish language, the three letters æ, ø, å are usually serialized as ae, oe, aa. One might consider adding this as a language option...
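
The mapping described here can be sketched in plain Ruby with a lookup table. The constant and method names below are hypothetical, and the uppercase entries are an assumption modeled on the German example in the README; this is not Babosa's Danish transliterator.

```ruby
# Hypothetical table; the uppercase forms are an assumption, not taken from the issue.
DANISH_APPROXIMATIONS = {
  "æ" => "ae", "ø" => "oe", "å" => "aa",
  "Æ" => "Ae", "Ø" => "Oe", "Å" => "Aa"
}.freeze

# Replace each Danish letter with its ASCII approximation from the table.
def danish_to_ascii(text)
  text.gsub(/[æøåÆØÅ]/, DANISH_APPROXIMATIONS)
end

puts danish_to_ascii("Rødgrød med fløde") # prints "Roedgroed med floede"
puts danish_to_ascii("Århus")             # prints "Aarhus"
```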

Don't use autoload

Hello,

Firstly, thank you for Babosa! :)

Now, to business: I've seen this issue in production when Babosa is used within a threaded Sidekiq worker:

NameError: uninitialized constant Babosa::Transliterator::Latin

It's a very sporadic error and cannot be reproduced easily.

I'm fairly sure that it's happening because Babosa uses autoload and autoload has thread-safety problems.

I would suggest switching to just using require_relative.
