Comments (4)
Yes, WEAT returns the following:
{'query_name': [MY QUERY NAME], 'result': nan, 'weat': nan, 'effect_size': nan}
from wefe.
Is it possible to know why it is not returning a result?
Hello
Based on what you describe (the query returns values with some models but not with others), I suspect the problem arises when the query's word sets are converted to embedding sets: at least one word set is losing more than 20% of its words. In that case, WEFE invalidates the query by default and returns NaN values, as in your output.
This can happen when the model's vocabulary has no capitalized words, has no accented words, or simply does not contain the words.
The behavior of queries invalidated by missing many words is detailed in the warning of this subsection:
https://wefe.readthedocs.io/en/latest/user_guide.html#word-preprocessors
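As a rough way to check this outside WEFE, the sketch below computes the fraction of a word set that is missing from a vocabulary and compares it with the 20% figure mentioned above. The helper `lost_fraction`, the toy vocabulary, and the strict `>` comparison are assumptions made for illustration; this is not WEFE's own code.

```python
def lost_fraction(words, vocabulary):
    """Return (fraction of words missing from vocabulary, list of missing words)."""
    if not words:
        raise ValueError("word set is empty")
    missing = [w for w in words if w not in vocabulary]
    return len(missing) / len(words), missing


# Hypothetical toy vocabulary (lowercase, no accents) and word set.
vocabulary = {"she", "he", "woman", "man", "career"}
word_set = ["She", "woman", "José", "career", "man"]

fraction, missing = lost_fraction(word_set, vocabulary)

# Assumed threshold of 0.2, taken from the 20% figure in this thread.
threshold = 0.2
is_invalidated = fraction > threshold

print(f"missing={missing} fraction={fraction:.2f} invalidated={is_invalidated}")
```

Any object that supports the `in` operator can stand in for `vocabulary`, so the same check can be pointed at the model's real vocabulary.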
You can use the parameter warn_not_found_words=True to see which words are being lost when converting the query to embeddings.
wefemodel = WordEmbeddingModel(wv, model_name)
query = Query(target_sets, attribute_sets, target_sets_names, attribute_sets_names)
# warn_not_found_words=True reports the words that are missing from the model.
result_weat = weat.run_query(
    query,
    wefemodel,
    calculate_p_value=True,
    warn_not_found_words=True,
)
A possible solution would be to use a word preprocessor, specified through the run_query parameter preprocessor_args or secondary_preprocessor_args.
wefemodel = WordEmbeddingModel(wv, model_name)
query = Query(target_sets, attribute_sets, target_sets_names, attribute_sets_names)
# Words not found as-is are retried in lowercase and without accents.
result_weat = weat.run_query(
    query,
    wefemodel,
    calculate_p_value=True,
    secondary_preprocessor_args={"lowercase": True, "strip_accents": True},
    warn_not_found_words=True,
)
In practical terms, this parameter tells run_query to first look up each word of each set in the model vocabulary in its original form; if the word is not found, it is preprocessed (lowercased and stripped of accents) and looked up again.
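The lookup order described above can be sketched in plain Python. This only illustrates the "original form first, preprocessed form second" idea; the helpers `strip_accents` and `lookup` and the toy vocabulary are hypothetical and are not part of WEFE.

```python
import unicodedata


def strip_accents(word):
    """Remove combining accent marks, e.g. 'José' -> 'Jose'."""
    decomposed = unicodedata.normalize("NFKD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def lookup(word, vocabulary):
    """Try the original word first, then its lowercase, accent-free form."""
    if word in vocabulary:
        return word
    fallback = strip_accents(word.lower())
    if fallback in vocabulary:
        return fallback
    return None  # the word is lost


# Hypothetical toy vocabulary, for illustration only.
vocabulary = {"she", "jose", "Paris"}
found = [lookup(w, vocabulary) for w in ["She", "José", "Paris", "Zürich"]]
print(found)
```

Here "She" and "José" are recovered only through the fallback, "Paris" matches in its original form, and "Zürich" is still lost because neither form is in the vocabulary.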
Pablo.
Hello,
Thank you for your support and your prompt and detailed answer.
I'm making sure that all the words of the word sets are present in the embedding before running the query. So I don't think that's the problem.
Anyway, I think WEFE should raise an exception by default instead of returning nothing.
I will try again next week.
Thank you again
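On the point about raising an exception: until the library does so itself, a caller can guard the returned dictionary. The sketch below assumes the result has the shape shown at the top of this thread (a dict with a 'result' key holding NaN when the query is invalidated); `ensure_valid_result` is a hypothetical helper, not a WEFE function.

```python
import math


def ensure_valid_result(result, key="result"):
    """Raise ValueError if the metric value under `key` is missing or NaN."""
    value = result.get(key)
    if value is None or (isinstance(value, float) and math.isnan(value)):
        name = result.get("query_name", "<unnamed query>")
        raise ValueError(
            f"Query {name!r} returned no valid {key!r}; "
            "too many words may be missing from the model vocabulary."
        )
    return result


# Example dict mimicking the invalidated output shown earlier in the thread.
invalid = {
    "query_name": "example query",
    "result": float("nan"),
    "weat": float("nan"),
    "effect_size": float("nan"),
}

try:
    ensure_valid_result(invalid)
except ValueError as error:
    print(f"Caught: {error}")
```

Wrapping the output of run_query in such a check turns a silent NaN into an explicit failure at the call site.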