Comments (12)

Hugo-ter-Doest commented on July 25, 2024

Hi, I could try to do a simple base implementation on top of the Gist I provided, but I have to check the license first.

I checked the license:

/*
 * Original author: Joder Illi
 * 
 * Copyright (c) 2010, FormBlitz AG
 * All rights reserved.
 * Implementation of the stemming algorithm from http://snowball.tartarus.org/algorithms/german/stemmer.html
 * Copyright of the algorithm is: Copyright (c) 2001, Dr Martin Porter and can be found at http://snowball.tartarus.org/license.php
 *
 * Redistribution and use in source and binary forms, with or without 
 * modification, is covered by the standard BSD license. 
 * 
 */

As I see it, BSD-licensed code can be integrated into an MIT-licensed code base as long as the added code keeps its original (BSD) license.

from natural.

Hugo-ter-Doest commented on July 25, 2024

#663 is merged.

chrisumbel commented on July 25, 2024

i agree! one of my highest priorities for natural before fall 2012 is non-English stemmers. i personally was going to look into doing French as I can likely handle that completely, but was hoping to get native speakers to help me at least verify my work with other languages.

would you be able to either handle the implementation or at least help me verify its accuracy?

the algorithm you've attached, have you played with it much? are you aware if there are any licensing restrictions with it?

thomasfr commented on July 25, 2024

Hi,
I could try to do a simple base implementation on top of the Gist I provided, but I have to check the license first. Otherwise I can help you test yours.

But great to hear that this is on your top priorities list. :)

chrisumbel commented on July 25, 2024

Feel free to take a stab at it!

chrisumbel commented on July 25, 2024

oops! i did not mean to close this.

alfredwesterveld commented on July 25, 2024

+1 for Dutch stemming. Hopefully I can help out in some sort of way in the future.

joscha commented on July 25, 2024

You can use the JS Snowball port to do so:

https://github.com/fortnightlabs/snowball-js

It does change the capital letter U to lowercase though: http://code.google.com/p/urim/issues/detail?id=3

Hugo-ter-Doest commented on July 25, 2024

Added a Porter stemmer for Dutch. I should say that the Porter algorithm makes mistakes in Dutch, and that my implementation fails in 305 of the 45669 cases in the Snowball file. That is a failure rate of about 0.67%. The Snowball file also contains wrong examples: for instance, afvalstortplaats is stemmed as afvalstortplat, which is wrong; it should be afvalstortplaats.
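
For anyone who wants to reproduce this kind of count, here is a minimal sketch of how a stemmer could be checked against pairs of words and expected stems. Everything named here is an assumption for illustration: `evaluateStemmer`, `toyStem` and the sample pairs are hypothetical, the toy stemmer is not the Dutch Porter algorithm, and the pairs are not taken from the Snowball file. Only the figures 305 and 45669 come from the comment above.

```javascript
// Hypothetical helper: run a stemmer over [word, expectedStem] pairs
// and collect every case where the output differs from the expectation.
function evaluateStemmer(stem, pairs) {
  const failures = [];
  for (const [word, expected] of pairs) {
    const actual = stem(word);
    if (actual !== expected) {
      failures.push({ word, expected, actual });
    }
  }
  return {
    total: pairs.length,
    failed: failures.length,
    failureRate: pairs.length === 0 ? 0 : failures.length / pairs.length,
    failures,
  };
}

// Toy stand-in stemmer (NOT the Dutch Porter algorithm):
// it only strips a trailing "en".
function toyStem(word) {
  return word.endsWith('en') ? word.slice(0, -2) : word;
}

// Made-up sample pairs for illustration only.
const samplePairs = [
  ['werken', 'werk'],
  ['boeken', 'boek'],
  ['huis', 'huis'],
  ['katten', 'kat'], // toyStem yields "katt", so this pair is a failure
];

const report = evaluateStemmer(toyStem, samplePairs);
console.log(`${report.failed} of ${report.total} sample pairs failed`);

// Failure rate for the figures quoted above: 305 failures in 45669 cases.
const quotedRate = 305 / 45669;
console.log(`${(quotedRate * 100).toFixed(2)}%`);
```

To test a real stemmer, its stem function would be passed in place of `toyStem`, and the pairs would be built from a vocabulary file and its expected output. Listing the individual failures, rather than only counting them, makes it easier to spot cases where the reference data itself looks wrong.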

Hugo

webia1 commented on July 25, 2024

News?

Hugo-ter-Doest commented on July 25, 2024

I am also considering jsSnowball, which is transpiled from the Java sources. It is licensed under BSD 3.0, which can be combined with the MIT license as well.

Source can be found here:
https://github.com/mazko/jssnowball

Hugo-ter-Doest commented on July 25, 2024

See #663 for progress
