
ml_for_sla's People

Contributors

jonathanlanemcdonald

Forkers

codymdawson

ml_for_sla's Issues

Consider retraining with other morphological analyzers

Hey, first I just want to thank you for the very detailed README writeup. It was a great read!
Since it seems like you are rewriting/polishing some parts of the project, would it be possible to retrain the model with other, more accurate morphological analyzers?
I suggest taking a look at Juman++, a morphological analyser that considers the semantic plausibility of word sequences by using a recurrent neural network language model (RNNLM).
Or using MeCab with mecab-ipadic-NEologd for more accurate results.
If you decide to retrain the model, some comparison details would be greatly appreciated!

hi Jonathan your tool is brilliant

I always had the same ideas, but I am not a programmer.
I am thinking about doing the exact same thing for Chinese, and for other languages as well.
How hard is it for a non-programmer to actually modify your tool?
Here is what I ideally want:
1- Scrape a large set of Chinese text from different corpora: a large collection of books (txt) in different fields, news websites, magazines, movie subs (good since they are already chunked manually), and social media posts. I also once saw a dataset of Weibo private messages, real people's private messages, which would be cool to add because messages and public posts are mostly simple, important language.
I used to use Anki a lot but I don't anymore. For me it's just time-consuming and a never-ending game. Been there: RTH-ed 4000 Hanzi (you guys call it RTK-ed kanji). So I'm done with SRS. I feel a system like yours is even better than SRS for me, because I've reached a level where, as you said, if I see an i+1 word in multiple well-chosen sentences picked uniquely for me, the sentence and the word stick without much effort.

my language setup now:

I used to use MorphMan. It's good (if you don't answer me, maybe I will just go back to MorphMan),
because I think all I need is a sentence sorter by i+1.
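
For reference, a sentence sorter by i+1 could look roughly like the sketch below. It is not how ml_for_sla or MorphMan works; it is a minimal greedy illustration that assumes sentences are already segmented into word lists (Chinese text would need a word segmenter first) and that the learner's known words are available as a set. The function names `unknown_words` and `sort_by_i_plus_1` are made up for this example.

```python
def unknown_words(tokens, known):
    """Return the distinct tokens not in `known`, in order of first appearance."""
    seen = []
    for tok in tokens:
        if tok not in known and tok not in seen:
            seen.append(tok)
    return seen


def sort_by_i_plus_1(sentences, known):
    """Greedily order sentences so each one introduces exactly one new word.

    `sentences` is a list of token lists (assumed pre-segmented).
    Returns (ordered, remaining): `ordered` is a list of
    (tokens, new_word) pairs; `remaining` holds sentences that never
    came down to a single unknown word. Sentences with no unknown
    words are dropped, since they teach nothing new.
    """
    known = set(known)          # copy so the caller's set is not modified
    remaining = list(sentences)
    ordered = []
    progress = True
    while progress:
        progress = False
        still_too_hard = []
        for tokens in remaining:
            unk = unknown_words(tokens, known)
            if len(unk) == 1:
                ordered.append((tokens, unk[0]))
                known.add(unk[0])   # treat the new word as learned
                progress = True
            elif len(unk) > 1:
                still_too_hard.append(tokens)
        remaining = still_too_hard
    return ordered, remaining


if __name__ == "__main__":
    sentences = [
        ["我", "喜欢", "猫"],
        ["我", "喜欢", "黑", "猫", "和", "狗"],
        ["我", "喜欢", "黑", "猫"],
    ]
    ordered, remaining = sort_by_i_plus_1(sentences, {"我", "喜欢"})
    for tokens, new_word in ordered:
        print("".join(tokens), "(new:", new_word + ")")
```

The loop repeats because a sentence that is too hard now can become i+1 once earlier sentences have taught some of its words.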

I bought Migaku, if you've heard of it (Yoga went and made it), but the same problem always remains. Yes, it made card creation easier, but that's not what I need.

A ready-made, sorted sentence database seems better, which is something we have in common.

After that I bought an old tool called Chinese Text Analyser, haha. It turned out better than Migaku and way cheaper. https://www.chinesetextanalyser.com/

You can read more details there. Simply put, it tracks your knowledge and shows unknown words in red and known words in white. It also has a built-in dictionary for Chinese. (The logic behind the dictionary popup is brilliant: if you click to define a word, it automatically marks the word as unknown. There is no gray area; either you know it or you don't. I believe in that. Anki has 3 or 4 buttons, but long before I used this tool I only ever clicked green or red, and that really helped memorization quality over the long run.)

It also has analytics: you can give it any piece of text and it tells you how many words you know and which ones you don't.
So I go out to hunt for some text or subs, copy and paste them in, and if I find a text with fewer than 100 unknown words I consider it.
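
The kind of check described above (count known and unknown words in a text, then keep the text if it has fewer than 100 unknown words) can be sketched as follows. This is not Chinese Text Analyser's actual logic; it is an illustration that assumes the text is already split into words and that counts distinct words. The names `coverage_report` and `worth_reading` are hypothetical.

```python
from collections import Counter


def coverage_report(tokens, known):
    """Summarise how many distinct words in `tokens` are known or unknown."""
    counts = Counter(tokens)
    unknown = {w: c for w, c in counts.items() if w not in known}
    return {
        "distinct_words": len(counts),
        "known": len(counts) - len(unknown),
        "unknown": len(unknown),
        # Most frequent unknown words first; ties broken alphabetically.
        "unknown_by_frequency": sorted(unknown, key=lambda w: (-unknown[w], w)),
    }


def worth_reading(tokens, known, max_unknown=100):
    """Apply the 'fewer than 100 unknown words' rule of thumb from the post."""
    return coverage_report(tokens, known)["unknown"] < max_unknown


if __name__ == "__main__":
    text = ["我", "喜欢", "黑", "猫", "我", "喜欢", "狗", "狗"]
    report = coverage_report(text, known={"我", "喜欢"})
    print(report)
    print(worth_reading(text, known={"我", "喜欢"}))
```

Listing the unknown words by frequency is an extra design choice here: the words that occur most often in a text are usually the ones worth learning first.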

Recently I discovered a Chinese deck called Spoon-Fed. The guy who made it
ordered it from easy Chinese sentences to difficult ones, so he already did the sorting manually.

So when I took that deck and pasted its text into this text analyzer, I was really surprised at how fast one can learn when truly comprehensible data is presented. It's like starting with a pregnant sheep, haha: before you realize it, you have thousands of sheep, naturally. You just need to be available.

By that point, immersion (extensive reading and listening) becomes the real SRS. Personally I don't even need an SRS. I only need a huge number of sentences,
not very long ones, sorted the way you describe and preferably linked to a TTS (text-to-speech) engine so you can hear the sentence if needed.

I have a way to pronounce any sentence using a keyboard shortcut. I can share it with you if you want.

I think you and I can make a great team.

I have been studying Chinese 10 hours per day for a long time. I need to learn Chinese ASAP; it's going to solve my financial problems.

And I think together we can do some great experiments in a field we both love.

If you like my write-up, please drop your contact details so we can reach each other faster,
or send me a WhatsApp at: +212637709291
I'm from Morocco. Sorry if my English doesn't make sense.
And wish us good luck: tomorrow Morocco has a semi-final football match against France, and then we might win the World Cup lol -_-

I really need your help this time, it's an emergency. Thanks infinitely, Jonathan.

God bless you.
