
Comments (19)

victle commented on May 26, 2024

Cool! I can do the PR for it.

And I agree, I don't feel strongly either way yet, so leaving it for now is fine.


andrewtavis commented on May 26, 2024

Just did a commit that adds a combined BERT/TFIDF output to examples/rec_ratings :) I moved The Hobbit up to a 6 from a 5 (full disclosure, I love A Wizard of Earthsea and wanted it in the top 20 😄). Thanks for the initial commit on all this! Happy we have two full examples at this point 😊


victle commented on May 26, 2024

Hello! Cool project! I'm sort of new to NLP in general, but I was thinking that for multiple inputs, maybe an approach could be aggregating the sims from different methods (e.g., median, weighted averages) and picking the sims with the highest score?


andrewtavis commented on May 26, 2024

Hi, and thanks for the praise! Quick aside: being sort of new to NLP in general is all good, so no worries :)

Combined recommendations will definitely continue to be some kind of averaging of the scores and then selecting from an ordered list. Your idea of taking the median is an interesting one though - say taking the median value of the similarities for the titles that have already been selected?
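
As a toy sketch of the two aggregation options (numbers made up, not wikirec output):

import numpy as np

# each row: similarities of three candidates to one already-selected title
sims_per_title = np.array([
    [0.91, 0.40, 0.12],
    [0.88, 0.35, 0.15],
    [0.10, 0.38, 0.95],  # one title with skewed sims
])

mean_sims = np.mean(sims_per_title, axis=0)      # mean-style combination
median_sims = np.median(sims_per_title, axis=0)  # less affected by the skewed row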

With regards to weighted averages, this is actually very interesting to me as I think it'd be great if people could indicate that one item in a list is more important than the others. As of now I'm thinking about how to do this in an intuitive fashion where a user would be able to pass titles of interest and their weights and then get results.

Would be great to get further opinions from you on all this :)


victle commented on May 26, 2024

> Hi, and thanks for the praise! Quick aside: being sort of new to NLP in general is all good, so no worries :)

Yep! I'll try my best haha.

Median would be an option, especially if the distribution of similarities across different titles is skewed. Though, I'm not sure if this would actually improve "performance".

Off the top of my head, perhaps people could pass an optional argument or keyword argument with a list of "ratings" for each of the titles passed in? At least that way it won't break anything?

Either way, I'd love to know if you have any suggestions for how I can start contributing :). Perhaps a start could be implementing different aggregation methods for this snippet in the wikirec.model.recommend function?


andrewtavis commented on May 26, 2024

If you figure out a way to effectively get ratings into and out of this thing, then you and I can start ourselves an open source organization :D Sarcasm that's actually serious aside, I'd be interested in exploring this more, and it's something I've already been working on a bit :)

The simplest thing that could be done is, as you said, to add a ratings keyword argument to wikirec.model.recommend, with the relative importance of similarities then being weighted by the proportion of the corresponding score to the sum of the passed scores. I've tried things similar to (but not exactly like) this before, and the results are good, but the thing to note is that you're not getting valid scores back out of it. All the sims are on [0, 1), so you can't expect to get a 9/10 out when the 9s you're passing are constantly multiplied by a number < 1. To me the best thing to do would then be to turn off the numerical part of the output so you just get the list, but you could also experiment with scaling the numerical part to be on [0, 1] or even [0, 10]. The latter would give too many 9.9/10s to be sensible, but to me it's worth a try.
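
As a rough sketch of what I mean (the numbers and the input_sims matrix are made up, and none of this is in wikirec yet):

import numpy as np

ratings = [9, 7, 4]                         # user ratings for three input titles
weights = np.divide(ratings, sum(ratings))  # proportion of each rating to the sum

# made-up similarities: rows are the rated inputs, columns are candidate titles
input_sims = np.array([
    [0.82, 0.40, 0.11],
    [0.35, 0.77, 0.20],
    [0.15, 0.25, 0.90],
])

combined = weights @ input_sims  # rating-weighted average per candidate, still on [0, 1)
recs = np.argsort(-combined)     # candidate indices ordered by combined sim, best first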

If you wanted to implement a method and/or ratings kwarg for wikirec.model.recommend, then the effort would be much appreciated! For models, I'd focus on BERT, TFIDF, and LDA, as Doc2vec is just so time intensive and so meh as of now... You'd be more than welcome to create your own notebook in the examples to test these arguments, as for now I'd prefer to keep the current examples as they are. Your implementations and results would then be welcome additions to the readme :) (thinking out loud, if the results get long we should look to add dropdowns for comparative results to the readme, as I've done with causeinfer's readme).

And speaking of examples, getting examples/rec_movies run all the way through would also be helpful if you have access to a server that could handle the file sizes (no biggie if not, as I expect someone will come along who can run these larger examples).

Great to have you!


victle commented on May 26, 2024

Thanks for the informative answer! An open source org is not out of the question 😆. I'll take a crack at putting together a basic framework that takes ratings in and then manipulates the sims, and then put together an example to test them. I do have to refresh myself on BERT, TFIDF, and LDA (I'd appreciate any resources on that), but I think the methods themselves should be pretty intuitive based on your examples.


andrewtavis commented on May 26, 2024

I'll put together some resources for the models and send them along tomorrow :) Maybe trying the Wiki feature of repos would be useful in this regard 🤔

For the Jupyter notebook, in case you don't use them already, check out the notebook extensions, which will make working on the example a lot easier. An auto ToC with Table of Contents (2), Jupyter Black (as the code format is Black, but don't stress about this!), Highlight Selected Word, Spellchecker (for markdown cells), and Execute Time are the main ones I use :)

Also, if you're using VS Code for your editor and are looking for extension suggestions to similarly make it all easier, I'd be happy to shoot some along!


andrewtavis commented on May 26, 2024

@victle, the package Wiki now has a page Resources for Models that has what to me were the best descriptions and videos for the models, as well as links to the documentation for the Python implementations that wikirec sources :)


victle commented on May 26, 2024

Thanks, I'll definitely look them over. I've been trying to reproduce some of the examples in the repository and have been running across a few issues that I've personally noted down. One main problem is that I haven't been able to get the same output from wikirec.model.recommend(), though I've only tested LDA and TFIDF so far. For reference, this is what I'm getting in my notebook using LDA; TFIDF I'm able to reproduce. Not sure if it's a version issue with some packages or anything.
[screenshot of the LDA recommendation output from the notebook]


andrewtavis commented on May 26, 2024

For LDA you have a random_state argument that's going to be a factor in all of this and will change the results :) See the LDA multicore docs. That argument should be able to be passed as a kwarg to model.gen_embeddings, which will then produce the same results each time. The same is true for BERT and Doc2vec, although it might be referred to as a random seed instead. You can see np.random.seed(42) in use in the testing base file for wikirec, although looking at that now I really do need to alter the tests so they're checking the models for explicit return values (will do that this weekend).
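
In code that would look something like this (just a sketch - it assumes the extra kwarg gets forwarded through to gensim's LdaMulticore):

import numpy as np
from wikirec import model

np.random.seed(42)  # same idea as in the testing base file

# sketch: assumes random_state is forwarded to gensim's LdaMulticore
lda_embeddings = model.gen_embeddings(
    method="lda",
    corpus=text_corpus,  # the prepared corpus from the readme/notebook examples
    random_state=42,     # fixes LDA's initial state so reruns give the same sims
)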

Machine learning algorithms all (generally) have an aspect of a pseudorandom number generator that's being used to set the initial state(s) that are then optimized in the modeling process, which is why not setting the random state/seed will lead to slightly different results. The path that the optimization takes will also be different given the state/seed.

Those results for LDA look fine to me :D Most of the other HP books, something that makes sense in The Marvelous Land of Oz, and then some stuff that's just ???????? Fine tuning the LDA model given the multicore parameters would be something for later; MVP-wise I was just thinking to get things out and see what works best regardless of optimization.

TFIDF isn't a machine learning algorithm (it's roughly just counts and division), and thus will produce the same results each time :)


victle commented on May 26, 2024

I had a gut feeling there was a random seed happening underneath 🤦. I guess without a seed in the notebook examples I assumed there wasn't one. Either way, thanks for clearing that up! I was scratching my head a lot trying to figure out why my notebooks weren't generating the same recommendations 😅 I'll keep on tackling a method for taking multiple inputs 👍


andrewtavis commented on May 26, 2024

Maybe it would be best if there was a seed set in the notebooks so that it's explicit and the results are reproducible? Makes sense that it would be a bit confusing with one thing generating the same results and one thing not 😊 And I was 99.9% positive that the full seed explanation was overkill, but just wanted to check :)

Looking forward to the results!


andrewtavis commented on May 26, 2024

Thanks for the work in #38 :) As I said in the review, I'll go through soon and format the example, and will maybe add in BERT too as I have the sim matrix saved.

The only thing that jumped out was that the weights are generated by dividing by 10 rather than by the sum of the ratings.

if ratings:
    if any(True for k in ratings if (k > 10) | (k < 0)):  # this didn't need the list comp within :)
        raise ValueError("Ratings must be between 0 and 10.")
    weights = np.divide(ratings, 10)

To me the sum would give a better representation of the relative weight of each, so weights = np.divide(ratings, sum(ratings)), but let me know on it 😊
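
Just to make the difference concrete with a quick (made-up) check:

import numpy as np

ratings = [10, 10]
print(np.divide(ratings, 10))            # [1. 1.]   -> each weight can reach 1 on its own
print(np.divide(ratings, sum(ratings)))  # [0.5 0.5] -> weights are relative and sum to 1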

I just did a quick commit/push that added the following to the project description, btw:

Along with NLP based similarity recommendations, user ratings can also be leveraged to weight inputs and indicate preferences.

The ratings kwarg has further been added to the readme model.recommend example, and there's a reference to examples/rec_ratings as well :)

Thanks again!


andrewtavis commented on May 26, 2024

Also, do you think we should add dropdowns to the readme for the results to make it a bit more succinct? I'm also thinking that the methods section might benefit from having the model subsections compartmentalized. Lemme know what your thoughts are; for example:

Methods (would include all 4):

BERT

Bidirectional Encoder Representations from Transformers derives representations of words based on NLP models run over open source Wikipedia data. These representations are leveraged to derive article similarities that are then used to deliver recommendations.

wikirec uses sentence-transformers pretrained models. See their GitHub and documentation for the available models.

from wikirec import model

# Remove n-grams for BERT training
corpus_no_ngrams = [
    " ".join([t for t in text.split(" ") if "_" not in t]) for text in text_corpus
]

# We can pass kwargs for sentence_transformers.SentenceTransformer.encode
bert_embeddings = model.gen_embeddings(
    method="bert",
    corpus=corpus_no_ngrams,
    bert_st_model="xlm-r-bert-base-nli-stsb-mean-tokens",
    show_progress_bar=True,
    batch_size=32,
)

Results:

Baseline NLP Models

Outputs

Weighted NLP Approach

tfidf_weight = 0.35
bert_weight = 1.0 - tfidf_weight
bert_tfidf_sim_matrix = tfidf_weight * tfidf_sim_matrix + bert_weight * bert_sim_matrix
Outputs

Adding User Ratings

ratings = [1, 2, 3]
Outputs


victle commented on May 26, 2024

Thanks for the feedback regarding #38! Adding BERT would definitely be cool.

I thought about dividing by 10 versus dividing by the sum too. I was finding that dividing by the sum was pushing similarity scores really low, and I wasn't sure if we would want that. Either way, I believe the results should be relatively the same? I went with 10 because of the following situation. Imagine that some books all had a similarity score of 1. If someone gave a 10 to two of the books, the recommendations should come back with a weighted similarity score of 1 as well, which is what happens when you divide by 10 (each book gets a weight of 1). On the other hand, if you divided by the sum of the ratings, the max similarity score you could get is 0.5 in this scenario. Hope that made sense 🤔 I'm sure there are cases for both though!

Also, one issue I was thinking about is that for more than 2 inputs, I don't think the current method weighs them properly. The reason is that the averages are taken iteratively rather than all at once. I think this stackoverflow link helps explain my issue. As we iterate through more than 2 inputs, we are taking the mean between 1) the previous mean and 2) the singular similarity value of the latest input, which doesn't reflect the grand mean across all the inputs. Is my understanding of that correct? I think we'd have to adjust with something like below:

sims = [np.mean([ (r+1) * s, sim_matrix[i][j]]) for j, s in enumerate(sims)]
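
And a toy check of the averaging issue I mean (made-up numbers):

import numpy as np

input_sims = [0.9, 0.5, 0.1]  # one candidate's sims to three inputs

# iterative approach: keep averaging the running mean with the newest input's sim
running = input_sims[0]
for s in input_sims[1:]:
    running = np.mean([running, s])
print(running)              # 0.4 -> effective weights of 0.25, 0.25, 0.5

# grand mean across all inputs at once
print(np.mean(input_sims))  # 0.5 -> each input weighted equally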

Let me know what you think! And yes, I think having dropdowns would help clean things up 😄


andrewtavis commented on May 26, 2024

So that's the reason the damn recommendations are always similar to the most recent input! I've been confused 😄 The toying around that I do with this myself involves iterating the inputs, and it's been super annoying that the most recent one has just been dominating the results. Your adjustment of how sims is calculated is more than welcome if you'd like to do a PR for it :) Seems like the way to fix it to me. This is great 😊

I'm thinking that the sum would be the best option for the ratings in that their relationship to one another would be maintained, but I do see what you mean. It's exactly what I was writing about before - the similarities just get super small and meaningless... I'm fine with leaving it as is for now and we can readjust as we go :) For now a sensible output is likely better.

If you want to do that PR, lemme know and then I'll get to the example and readme updates after! Also fine with doing it myself as it's a quick one. You're further welcome to do the readme dropdowns if you feel like toying around with that too :)


andrewtavis commented on May 26, 2024

Thanks for #39! Makes sense that it's r * s, as the first index is covered by first_input. I updated the readme as we discussed and also did the same for the components of the data section so that it's also not an uninviting wall of text 😄

I'll update the example, and then I'm thinking that the work you've put in warrants a version update and a release 😊 Will do all that tomorrow at the latest!

I guess the next thing I'm thinking about is #33 - allowing disinterest in the recommendations. Now that we have ratings this is less important, and maybe even worthless as I'm thinking that results for ["Harry Potter and the Philosopher's Stone", "!Twilight"] could be really really bad... Then again it might still be useful to allow a user to express disinterest based on similarity scores alone? Your thoughts on this would be much appreciated, if you're so inclined :) I'm honestly thinking now that it's kind of funneling usage towards a less intuitive case, so maybe it should just be closed?


victle commented on May 26, 2024

Yea, I was wondering why (r+1) * s wasn't working! 😅 A version update sounds exciting! I was thinking about #33 as well while I was working on this. I was thinking that giving a particular input a low rating (say a 1 or 0) would be sufficient for expressing disinterest. However, in the computations, it still means that the input is averaged in (with a weight of 1/10), which is not desirable.

In one approach, I imagine that by "disinterest" we could completely remove recommendations that are within a certain distance from the item of disinterest in the n-dimensional space? Like, say if the user expressed disinterest in Twilight, we could remove the 10 most similar items to Twilight from the similarity matrix? I guess another approach might be to add similarity values that are in the opposite direction of Twilight in the n-space? It'd be like we're pushing the recommendations away from Twilight as efficiently as possible? This might get complicated 😆 Anyway, I'm not sure if I'm explaining this the best way, but I do think there are many ways to go about this! 🤔
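
As a rough sketch of that first idea, just to make it concrete (the function name and arguments are hypothetical, not existing wikirec API):

import numpy as np

def indices_to_exclude(sim_matrix, titles, disliked_title, n_remove=10):
    """Indices of the n_remove titles most similar to the disliked one."""
    i = titles.index(disliked_title)
    ranked = np.argsort(-np.asarray(sim_matrix[i]))  # most similar first
    return [j for j in ranked if j != i][:n_remove]

# e.g. something like: to_skip = indices_to_exclude(bert_sim_matrix, titles, "Twilight")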

