Comments (4)

raffaem commented on May 19, 2024

Yes, WEAT returns the following:

{'query_name':  [MY QUERY NAME], 'result': nan, 'weat': nan, 'effect_size': nan}

from wefe.

raffaem commented on May 19, 2024

Is it possible to know why it is not returning a result?

pbadillatorrealba commented on May 19, 2024

Hello

Based on what you are describing (the query returns values with some models but not with others), I suspect the problem occurs when the query word sets are transformed into embedding sets: at least one word set is losing more than 20% of its words. In that case, WEFE by default invalidates the query and returns nan, as in your output.
This can happen when the model you are using has no capitalized words or no accented words in its vocabulary, or when the words simply do not exist in its vocabulary.
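
The threshold rule described above can be sketched in plain Python. This is only an illustration under stated assumptions, not WEFE's actual code: the function names `lost_fraction` and `query_is_valid` are hypothetical, the vocabulary is a toy set, and treating the limit as "strictly more than 20%" is an assumption.

```python
def lost_fraction(words, vocabulary):
    """Return the fraction of `words` that are absent from `vocabulary`."""
    if not words:
        return 0.0
    missing = [w for w in words if w not in vocabulary]
    return len(missing) / len(words)


def query_is_valid(word_sets, vocabulary, threshold=0.2):
    """Assumed rule: the query is invalid if any set loses more than `threshold`."""
    return all(lost_fraction(ws, vocabulary) <= threshold for ws in word_sets)


# Toy vocabulary that only contains lowercase words.
vocabulary = {"she", "woman", "girl", "he", "man", "boy"}

capitalized = ["She", "Woman", "girl"]  # two of the three words are not found
lowercase = ["he", "man", "boy"]        # every word is found

print(lost_fraction(capitalized, vocabulary))
print(query_is_valid([capitalized, lowercase], vocabulary))
print(query_is_valid([lowercase], vocabulary))
```

Here a single set with too many missing words is enough to invalidate the whole query, which would match a model returning nan for one query while other models return values.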

The behavior of queries invalidated by missing many words is detailed in the warning of this subsection:
https://wefe.readthedocs.io/en/latest/user_guide.html#word-preprocessors

You can use the parameter warn_not_found_words=True to see which words are being lost when converting the query to embeddings.

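from wefe.metrics import WEAT
from wefe.query import Query
from wefe.word_embedding_model import WordEmbeddingModel

weat = WEAT()
wefemodel = WordEmbeddingModel(wv, model_name)
query = Query(target_sets, attribute_sets, target_sets_names, attribute_sets_names)
result_weat = weat.run_query(
    query, wefemodel, calculate_p_value=True, warn_not_found_words=True,
)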

A possible solution would be to use a word preprocessor (specified in the run_query parameter preprocessor_args or secondary_preprocessor_args).

wefemodel = WordEmbeddingModel(wv, model_name)
query = Query(target_sets, attribute_sets, target_sets_names, attribute_sets_names)
result_weat = weat.run_query(
    query,
    wefemodel,
    calculate_p_value=True,
    secondary_preprocessor_args={"lowercase": True, "strip_accents": True},
    warn_not_found_words=True,
)

In practical terms, this parameter tells run_query to look up each word of each set in the model vocabulary in its original form first and, if it is not found, to preprocess the word (lowercase it and strip its accents) and search again.
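
The two-step lookup just described can be sketched as follows. This is a minimal illustration and not how WEFE implements it internally: `strip_accents` and `find_word` are hypothetical helper names, and the vocabulary is a toy set standing in for the model's.

```python
import unicodedata


def strip_accents(word):
    """Remove combining accent marks, e.g. 'café' becomes 'cafe'."""
    decomposed = unicodedata.normalize("NFKD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def find_word(word, vocabulary):
    """Try the original word first, then its lowercased, accent-free form."""
    if word in vocabulary:
        return word
    fallback = strip_accents(word.lower())
    if fallback in vocabulary:
        return fallback
    return None  # the word is lost in both forms


# Toy vocabulary standing in for the embedding model's vocabulary.
vocabulary = {"cafe", "Paris", "woman"}

found = [find_word(w, vocabulary) for w in ["Café", "Paris", "WOMAN", "Zürich"]]
print(found)
```

Because the original form is tried first, a cased model still matches "Paris" exactly, while words that exist only in lowercase or without accents are recovered by the second attempt.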

Pablo.

raffaem commented on May 19, 2024

Hello,

Thank you for your support and your prompt and detailed answer.

I'm making sure that all the words of the word sets are present in the embedding before running the query. So I don't think that's the problem.

In any case, I think WEFE should raise an exception by default instead of silently returning nan.

I will try again next week.

Thank you again
