
semantle-he's People

Contributors

eladheller, eyalgr, giladgo, iddoyadlin, ishefi, itamar-scala, splintor, yantiparazi4567

semantle-he's Issues

Change refresh time

The game used to refresh (new secret word) at 2:00am (Israel time, GMT‎+3).
Since the switch to DST (Daylight Saving Time / שעון קיץ) it now refreshes at 3:00am.
If possible (and acceptable), I suggest changing the refresh time to midnight (12:00am), in line with the Wordle variations, or at least back to 2:00am.
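
One way to make the refresh land on local midnight regardless of DST is to compute midnight in the local time zone and convert it to UTC, rather than refreshing at a fixed UTC hour. The sketch below assumes, without having checked the game's code, that the shift from 2:00am to 3:00am comes from a fixed UTC refresh time; `next_local_midnight` is a hypothetical helper, not a function from the repository.

```python
from datetime import datetime, time, timedelta, timezone, tzinfo


def next_local_midnight(now_utc: datetime, tz: tzinfo) -> datetime:
    """Return the next midnight in `tz`, expressed in UTC."""
    local_now = now_utc.astimezone(tz)
    next_day = local_now.date() + timedelta(days=1)
    # Build midnight in local time so the offset in force on that day is used.
    local_midnight = datetime.combine(next_day, time(0, 0), tzinfo=tz)
    return local_midnight.astimezone(timezone.utc)


# Fixed offsets stand in for Israel standard time (UTC+2) and summer time (UTC+3).
winter = timezone(timedelta(hours=2))
summer = timezone(timedelta(hours=3))
now = datetime(2022, 3, 26, 12, 0, tzinfo=timezone.utc)
print(next_local_midnight(now, winter))  # 2022-03-26 22:00:00+00:00
print(next_local_midnight(now, summer))  # 2022-03-26 21:00:00+00:00
```

In practice `tz` would likely be `zoneinfo.ZoneInfo("Asia/Jerusalem")` (Python 3.9+), which switches between the two offsets automatically.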

Suggestion for improvement

If someone solved Semantle in fewer than 20 attempts, it was probably a lucky guess, so the game could give anyone who succeeds too quickly a new puzzle with a random word.

marking progression milestones (WARNING: #31 spoiler)

Hi,
I suggest marking breakthrough words, that is, guesses that were the closest to the target at the time they were first tried.
This might help in advancing toward the goal, and once the goal is reached, it would provide a nice view of the road to victory.
I'm attaching an example; be aware that it might spoil today's riddle (#31, although so far I haven't solved the damn thing).
[image: progressions]

Collaborate across the web (feature request)

It would be nice to be able to share guesses with one or more collaborators on the web, so that each one can guess words and see the results of the others.
I do not have experience with such interfaces, so I only have a vague idea of how this could be implemented (which may not be realistic), and I realize that it would require numerous additions. I think it would be better not to keep any data on the server that scales with the number of players. If each player who connects to the server has a unique id, the web page in the player's browser can keep a list of the player ids with which the guesses will be shared.

Share incomplete guess (feature request)

Great application, thank you!
Would be nice to be able to share the results of incomplete guesses in an analogous format to that of a complete guess.
Often the secret word is difficult, and I would have liked to share how close I got. For example, when the word was גלגלת (pulley), my top guess was ידית (handle) at 999/1000. A possible text could be:
לא פתרתי היום את סמנטעל #99. לאחר 731 ניחושים הגעתי ל-999/1000:
(In English: "I didn't solve Semantle #99 today. After 731 guesses I reached 999/1000:")
https://semantle-he.herokuapp.com
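
A minimal sketch of how such a share text could be assembled, following the wording suggested above. The function name and its parameters are hypothetical and do not come from the repository.

```python
SHARE_URL = "https://semantle-he.herokuapp.com"


def incomplete_share_text(puzzle_number: int, guess_count: int, best_rank: int) -> str:
    """Build the share text for a puzzle that was not solved."""
    return (
        f"לא פתרתי היום את סמנטעל #{puzzle_number}. "
        f"לאחר {guess_count} ניחושים הגעתי ל-{best_rank}/1000:\n"
        f"{SHARE_URL}"
    )


print(incomplete_share_text(99, 731, 999))
```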

Personalized Word Embeddings from Game Statistics

Hi Itamar,

My name is Itay Nakash, and I'm an MSc student studying natural language processing at Technion. I find the game you developed very intriguing, and I believe its statistics could offer valuable insights for creating personalized word embeddings.

If you're interested in utilizing this platform and people's responses from the game, I'd love to collaborate with you on this project, at any scale you prefer.

It will require some changes in the code to collect the statistics, and some NLP work to try to fit the new word embeddings.
In addition, I believe that framing this game as an NLP task, with a significant dataset and a new task/goal that utilizes this data, could be a great contribution to the community.

Before I begin implementing and developing the idea, I would like to check with you whether you are open to integrating it into the platform, given your background as an NLP researcher.

Thank you,
Itay

Add positional encodings for better consistency?

Something like what was done in BERT. The standard gensim Word2Vec does not take the order of words in the sentence into consideration.
Adding positional encodings might improve consistency by giving some weight to order.

Missing word2vec.db

The README says the db (word2vec.db) is part of the repo, but it's not there.

Apostrophes in words make them distinct when they aren't

A recognizable word with an apostrophe (or several apostrophes) added anywhere is treated as a distinct word, but gets the same closeness value as the word without the apostrophes.

For example, all of the following words were accepted as distinct words, and they all had the exact same closeness value:
צבע
צבע'
'צבע
צ'בע
צב'ע
צב'ע'
צ''''בע

More correct behavior would probably be to either reject those words or not count them as distinct from the original.

Augment data using the English language

Just a thought on how you could improve precision and help your model better capture semantic relationships between words: why not just use English? You could add a translation layer before generating the embedding of each guess. Assuming that Hebrew-to-English translation is reliable, you would benefit from the abundance of work that has been done on English word2vec or any other word-embedding technique. :)

Dealing with plene/deficient spelling

A couple of days ago the solution was "דעה". I guessed "דיעה" and it got only 996/1000 (66.54).

  • The same word in plene spelling (כתיב מלא) and in deficient spelling (כתיב חסר) should get the same similarity ranking.
  • I thought of 2 possible solutions for this:
    1. Standardize the words (guesses): convert all plene-spelled words to deficient spelling or the other way around (just as the English version of the game automatically converts all words to lower case and British spelling to American).
    2. Reject one form of spelling (plene / deficient).
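
The first option could be sketched as a lookup applied to each guess before it is scored. The mapping table here is hypothetical and holds only the one pair from this report; a real table would have to be built from a dictionary or a morphological analyzer, which this sketch does not attempt.

```python
# Hypothetical mapping from plene spellings to the deficient form used by the model.
PLENE_TO_DEFICIENT = {
    "דיעה": "דעה",
}


def canonical_spelling(guess: str) -> str:
    """Map a guess to its canonical spelling; unknown words pass through unchanged."""
    return PLENE_TO_DEFICIENT.get(guess, guess)


# Both spellings now resolve to the same word, so they would get the same ranking.
print(canonical_spelling("דיעה") == canonical_spelling("דעה"))
```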

Deal with apostrophe

Right now, words such as ג'קט (jacket) are not accepted by the algorithm, while גקט is.

We should either:

  1. Migrate the data properly to include apostrophes
  2. Sanitize user input and remove apostrophes before querying the db
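
A minimal sketch of the second option, stripping apostrophe-like characters before the lookup. The set of characters treated as apostrophes and the name `sanitize_guess` are assumptions, not taken from the repository.

```python
# ASCII apostrophe, right single quotation mark (U+2019), and Hebrew geresh (U+05F3).
APOSTROPHES = "'\u2019\u05f3"
_STRIP_TABLE = str.maketrans("", "", APOSTROPHES)


def sanitize_guess(raw: str) -> str:
    """Trim surrounding whitespace and drop apostrophe-like characters from a guess."""
    return raw.strip().translate(_STRIP_TABLE)


print(sanitize_guess("ג'קט"))      # apostrophe removed before querying the db
print(sanitize_guess("צ''''בע"))   # all variants collapse to the same word
```

Because every apostrophe variant maps to the same string, the sanitized form can also serve as the key for detecting repeated guesses.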

Phrases with more than one word are never accepted

The How To Play page says that a guess can be "מילה או ביטוי קצר" ("a word or a short phrase"). To date, I have not found a guess with more than one word that was accepted. Examples: ראש ממשלה (prime minister), עמוד שדרה (spine), קרוב משפחה (relative).

I haven't looked at the code but I suspect there is no mechanism to ever add these kinds of words to the database. In that case, the How To Play text should be updated.

Generating the word2vec db

Hi,
Could you provide more detailed instructions on how to generate the word2vec db? E.g., at which stage and how to use the HebPipe you mentioned in the FAQ.
Thanks

Statistics

Any chance of getting some statistics about the game?
How often do people succeed, and with what number of guesses?
How does the score change over the course of guessing?

Unable to reproduce model

Hi,

I've been playing around with Word2Vec and the model linked here, and I can't seem to reproduce the same distances.

For example:

Python 3.11.2 (main, Feb 12 2023, 00:48:52) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import gensim
>>> model = gensim.models.Word2Vec.load('./wiki_tokenized_model/model.mdl')
>>> model.wv.similar_by_word('אשליה')
[('אשליית', 0.7949888110160828), ('אשלייתי', 0.7358855605125427), ('תחושה', 0.7196317911148071), ('סימולקרה', 0.7147767543792725), ('מתעתעת', 0.7013854384422302), ('השתקפות', 0.6864952445030212), ('אסטרלית', 0.6836147308349609), ('אשלייתית', 0.6831943392753601), ('אילוזיה', 0.6829365491867065), ('סיראנית', 0.6813762784004211)]

Note the distances.

However, the distance Semantle gives is different:

[image: Screenshot from 2023-03-07 08-26-36]

Am I doing anything wrong? I'd love some feedback!

Missing word

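The game said it does not recognize the word וניל (vanilla). My guess is that it may have been split into ו + ניל.
In Wikipedia, words are written with niqqud (vowel marks), if I'm not mistaken, so this could be taken into account either in the word-splitting process or in general.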

"Give Up" button

The text on the button should be ״נכנעתי״ ("I gave up").

The button should appear after GIVEUP_THRESH (env var) valid guesses (i.e., guesses that are not nonexistent words).
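
A sketch of the visibility check, assuming the threshold is read from the environment as an integer. The function name and the fallback value of 50 are placeholders for illustration; only the variable name GIVEUP_THRESH comes from the issue.

```python
import os
from typing import Optional


def should_show_give_up(valid_guess_count: int, threshold: Optional[int] = None) -> bool:
    """Return True once the player has made at least `threshold` valid guesses."""
    if threshold is None:
        # The default of 50 is a placeholder, not a value from the project.
        threshold = int(os.environ.get("GIVEUP_THRESH", "50"))
    return valid_guess_count >= threshold


print(should_show_give_up(9, threshold=10))   # False
print(should_show_give_up(10, threshold=10))  # True
```

Guesses for nonexistent words would simply not be counted in `valid_guess_count`.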

Share to telegram with spoiler

It would be nice to share my solve story with the actual guesses I tried. To avoid spoilers, the "spoiler" markdown in Telegram can be used. Since this feature is available only on Telegram, a separate share button should be added.

The message should look like this:
[image]

and after clicking a spoiler:
[image]

Telegram markdown for a spoiler is a leading and trailing '||'.
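
A small sketch of how the Telegram text could be built with that markup. The message layout (a header line followed by numbered guesses) is a guess, since the attached images are not available here, and both function names are hypothetical. Guesses that themselves contain '|' are not handled.

```python
def spoiler(text: str) -> str:
    """Wrap text in Telegram spoiler markup."""
    return f"||{text}||"


def telegram_share_text(header: str, guesses: list[str]) -> str:
    """Return a header line followed by one hidden guess per line."""
    lines = [header]
    for number, guess in enumerate(guesses, start=1):
        lines.append(f"{number}. {spoiler(guess)}")
    return "\n".join(lines)


print(telegram_share_text("Semantle #31", ["בננה", "חולי"]))
```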

proper names overrunning top word list

For the word of 20220320, which was "קרקס" (circus), what seemed to be the majority of the close words were proper names of people and fictional characters. Such words should generally not appear in the word list in the first place. Removing them may be an annoying task (although I suspect there is an easy way to filter them reasonably well in the pipeline), so it is at least worth verifying that the list is not overrun by such words when selecting the daily words, as guessing such words can be very frustrating.

Negative mark?

Yesterday (the word was "Joke"), the word "GILUY" got a negative mark.
Is it a bug or a feature?
Thanks,
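
If the score is a cosine similarity scaled by 100 (an assumption based on how the original Semantle works, not verified against this repository), negative values are expected whenever two word vectors point in roughly opposite directions. The toy vectors below are made up purely to illustrate the arithmetic.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; ranges from -1 to 1."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def score(a: list[float], b: list[float]) -> float:
    """Similarity scaled to the -100..100 range and rounded to two decimals."""
    return round(100 * cosine_similarity(a, b), 2)


print(score([1.0, 0.0], [1.0, 0.0]))   # identical direction
print(score([1.0, 0.0], [-0.2, 1.0]))  # partly opposite direction gives a negative score
```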

XSS Vulnerability

There is an unlikely but possible XSS vulnerability. If someone can be convinced to paste a crafted guess, an attacker can execute arbitrary JS in the victim's browser.

How to reproduce:

  1. Go to Semantle
  2. Paste the following in the text input (with the quotes in the end): היי&"><iframe src="/" onload="alert('PWND')" width="0px" height="0px" />"
  3. Press the button and observe the result:
    [image]

There are two parts to exploiting the vulnerability:

  1. First, we have to make the server return a response that recognizes the guessed word, because if the server returns an empty response, the guess row element is not generated and we get an error.

To do that, we can write an actual word (e.g. היי), add the & character to make the server treat what follows as a different parameter, and append arbitrary text.
When we send היי&Malicious code here, we get a response from the server as if we had sent only היי.
Sanitizing the input before executing the following lines of code should solve this problem.

const url = "/api/distance" + '?word=' + word;
const response = await fetch(url);

  2. The use of innerHTML in function guessRow, specifically here:
return `<tr><td>${guessNumber}</td>
<td style="color:${color}" onclick="select('${oldGuess}', secretVec);">${oldGuess}</td>
<td align="right" dir="ltr">${similarity.toFixed(2)}</td>
<td class="${cls}">${percentileText}${progress}
</td></tr>`;

Using an alternative to innerHTML or escaping the input should also help prevent this attack.
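
The two mitigations named above can be sketched as follows. The original code is browser JavaScript, where the equivalents would be `encodeURIComponent` for the query string and `textContent` (or explicit escaping) for the table cell; Python is used here only to illustrate the idea, and `distance_url` and `guess_cell` are hypothetical helpers.

```python
import html
from urllib.parse import urlencode


def distance_url(word: str) -> str:
    """Build the request URL with the guess percent-encoded as a single parameter."""
    return "/api/distance?" + urlencode({"word": word})


def guess_cell(word: str) -> str:
    """Render the guess as text inside a table cell, with HTML metacharacters escaped."""
    return f"<td>{html.escape(word, quote=True)}</td>"


payload = "היי&\"><iframe src=\"/\" onload=\"alert('PWND')\" width=\"0px\" height=\"0px\" />\""
print(distance_url(payload))  # the '&' is encoded, so no second parameter is created
print(guess_cell(payload))    # the markup is shown as text instead of being parsed
```

Note that HTML escaping alone does not make it safe to interpolate the guess into the inline `onclick` handler shown above, since the browser decodes attribute values before running them as JavaScript; attaching the handler from code avoids that context altogether.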

Combining these two vulnerabilities, when we enter the malicious input, the following dangerous HTML is generated:

<tr>
   <td>1</td>
   <td style="color:#c0c" onclick="select('היי&amp;">
      <iframe src="/" onload="alert('PWND')" width="0px" height="0px">"', secretVec);">היי&">
      <iframe src="/" onload="alert('PWND')" width="0px" height="0px" />
         "
   </td>
   <td align="right" dir="ltr">24.84</td>
   <td class="">(רחוק)
   </td>
</tr>
<tr><td colspan=4><hr></td></tr></iframe></td></tr>

Guess number issue upon page reload

Reloading the page subtracts 1 from the guess-numbering index.
When solving, the total number of guesses is correct, but the guess number of the found word is smaller by 1.
To reproduce:

  1. Guess a word
  2. Reload the page
  3. Guess another word

Both words will have 1 as their guess number.
An additional reload does not affect the guess counter any further.

Add option to sort guesses by guess order & similarity

I'd like to have an option to view my progress while guessing.
For example, to see the greatest breakthroughs, when I was stuck the most, etc.

I suggest adding a progress button named "התקדמות" ("Progress").
The button would present the guesses in the order you guessed them, but only those guesses that got closer to the secret word than any earlier guess.

For example, the secret word is "צרידות"
and I guessed:

  1. בננה 26.64
  2. חולי 55.6
  3. תפוח 23.6
  4. סמוראי 16.81
  5. צרידות 100

So it will present the following words in the following order:

  1. בננה 26.64
  2. חולי 55.6
  3. צרידות 100

That way one can understand one's progress and breakthroughs.
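
The filtering described here amounts to keeping each guess that beats the best similarity seen so far. A minimal sketch, using the example above as input; `breakthroughs` is a hypothetical name, and the game's actual data structures may differ.

```python
def breakthroughs(guesses: list[tuple[str, float]]) -> list[tuple[str, float]]:
    """Keep, in guess order, only the guesses that improved on the best similarity so far."""
    best = float("-inf")
    progress = []
    for word, similarity in guesses:
        if similarity > best:
            best = similarity
            progress.append((word, similarity))
    return progress


history = [
    ("בננה", 26.64),
    ("חולי", 55.6),
    ("תפוח", 23.6),
    ("סמוראי", 16.81),
    ("צרידות", 100),
]
for word, similarity in breakthroughs(history):
    print(word, similarity)
```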

Strange result

In Semantle #74, the score of the word "כרבול" (cuddling) is 99.99, and an icon of two people hugging appears.
