activist is an open-source, non-profit political action network. The current goal is the creation of activist.org, a platform to find and discover political events and organizations.
Scribe creates keyboard apps for language learners that include translation, verb conjugation and word annotation for confident communication without leaving the keyboard.
The WikilinkNN model currently best supports book recommendations in wikirec, as there are preset links that are removed via the following in wikirec.model._wikilink_nn:
It would be best if this could be adapted for other kinds of recommendation inputs. The style of input could potentially be passed to wikirec.model.gen_embeddings, but a discussion could also be had about other ways of deriving which links should be removed.
A potential addition to wikirec would be allowing a user to change the recommendations based on topics. As of now this is only a sketch, but the general idea would be that topic coherences could be returned to the user along with the words that define each topic, and the user could then say that they want results more in line with a topic by passing a word or n-gram along with a general score. A score of 0.5 could mean that topics including the passed word are not re-weighted, with numbers below or above implying that topic importances should be shifted down or up based on the word's importance in them.
This would allow a user to express interest in genres, or simply say that results should be more similar to those focused on a given topic keyword. kwx could be looked to for topic keyword derivation in this case.
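To make the scoring idea above concrete, here is a minimal sketch of how such a re-weighting could work (the function name, arguments, and weighting formula are all hypothetical, not existing wikirec code):

```python
import numpy as np

# Hypothetical sketch: shift topic importances based on a user-passed
# (word, score) pair. A score of 0.5 leaves topics containing the word
# unweighted; scores above/below shift their importances up/down.
def reweight_topics(topic_importances, word_weight_per_topic, score):
    # word_weight_per_topic: the passed word's importance in each topic
    shift = 2 * (score - 0.5)  # maps 0.5 -> 0 (no change)
    reweighted = topic_importances * (1 + shift * word_weight_per_topic)
    return reweighted / reweighted.sum()  # renormalize to a distribution
```

A neutral score of 0.5 leaves the distribution untouched, while a score of 1.0 fully boosts topics in proportion to how strongly they feature the word.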
This issue is for discussing and eventually implementing an update for gensim implementations of LDA in wikirec. The package was originally written with 3.x versions of gensim, and 4.x versions reportedly bring dramatic improvements in modeling options, efficiency, and n-gram creation (for wikirec.data_utils.clean). Changes would need to be made in wikirec.data_utils and wikirec.model.
Documenting what would need to happen for the switch and then working towards implementing it would be very much appreciated :)
This issue is to discuss ways to best combine vector embeddings so that a wikirec user can optimally pass more than one argument to wikirec.model.recommend.
The current way of combining recommendations for more than one input is to simply take the arithmetic mean of the similarity matrix rows for each passed title, as depicted in the following snippet from wikirec.model.recommend:
A discussion of whether this is the best way to do this would be much appreciated! Furthermore, how could the above be changed to allow a user to express disinterest (as discussed in #33)?
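As a point of reference for the discussion, the arithmetic-mean combination could be sketched roughly as follows (a minimal illustration with hypothetical names, not the actual wikirec code):

```python
import numpy as np

# Illustrative sketch of combining recommendations by averaging
# similarity-matrix rows (names are hypothetical, not wikirec's code).
def combine_by_mean(sim_matrix, title_indices):
    rows = sim_matrix[title_indices]  # one similarity row per passed title
    return rows.mean(axis=0)          # arithmetic mean across the titles
```

Alternatives worth weighing against this baseline include geometric means (rewarding items similar to all inputs) or a weighted mean reflecting how strongly the user feels about each title.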
This issue is for discussing and potentially implementing a way for users to express disinterest in a title when calling wikirec.model.recommend. The general idea now would be to allow users to pass a title with a negation indicator of some kind (e.g. "!title"), in which case the selections given the similarity matrix for the given item would be reversed.
It would be great to know if the above would be an intuitive UI, and an implementation would be welcome!
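A minimal sketch of the "!" convention could look like the following (the function and its handling of negation are assumptions for illustration, not wikirec's implementation):

```python
import numpy as np

# Hypothetical sketch: a leading "!" marks disinterest, in which case the
# similarity row for that title is negated before being combined with others.
def similarity_row(sim_matrix, titles, query):
    disinterest = query.startswith("!")
    title = query[1:] if disinterest else query
    row = sim_matrix[titles.index(title)]
    return -row if disinterest else row
```

Negating the row pushes items similar to the disliked title to the bottom of the ranking once rows are combined, which is one simple interpretation of "reversing" the selections.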
One way to provide more data for wikirec would be to add metadata for a given article via its Wikidata page. This would change the manner in which the data is extracted, but article texts could still be derived via the Wikipedia pages that are linked to the Wikidata entity. Whether or not the project should shift to focus on Wikidata as the main data source could also be discussed, with tools like WikidataIntegrator being used to derive article categories and query the needed information.
This issue is for creating concise versions of requirements.txt and environment.yml for wikirec. It would be great if these files were either written by hand with specific version numbers or generated in a way that sub-dependencies don't always need to be updated.
As of now both files are being created with the following commands in the package's conda virtual environment:
wikirec, en-core-web-sm (a spacy package that breaks tests), and other obviously unneeded packages are then removed from these files before they are uploaded.
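For illustration, files like these are commonly produced from within an activated conda environment roughly as follows (these are typical invocations, not necessarily the project's exact commands):

```shell
# Typical dependency exports from an activated conda environment
# (illustrative; not necessarily wikirec's exact commands).
pip list --format=freeze > requirements.txt
conda env export --no-builds > environment.yml
```

Both of these pin every installed package and sub-dependency, which is exactly why a hand-curated or post-processed version is desirable.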
It would be helpful to be able to visualize the embeddings created by wikirec models, and one way to achieve this is t-SNE. This would allow model results to be visually compared to see how relationships are being derived.
The Python package kwx has an implementation of t-SNE that could be adapted for this package, with another reference being the blog post that this package was originally based on, which is found here. Ideally this would be put into a visuals.py module, which would then be added to the documentation and tested using pytest's monkeypatch feature (see the tests for kwx for an example). Partial implementations are more than welcome though!
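A minimal sketch of what the core of such a visuals.py could look like using scikit-learn (the function name and signature are suggestions, not existing wikirec code):

```python
import numpy as np
from sklearn.manifold import TSNE

# Illustrative sketch for a potential visuals.py: project model embeddings
# down to 2D with t-SNE so they can be plotted and compared.
def tsne_embeddings(embeddings, perplexity=30.0, random_state=42):
    tsne = TSNE(
        n_components=2, perplexity=perplexity, random_state=random_state
    )
    return tsne.fit_transform(np.asarray(embeddings))
```

The 2D output could then be scattered with matplotlib, coloring points by genre or topic to compare how different models group titles.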
Please first indicate your interest in working on this, as it is a feature implementation :)
This issue is for adding an embeddings neural network implementation to wikirec. This package was originally based on the linked blog post, but the original model implementation has not yet been included. That original work and the provided code could serve as the basis for adding such a model to wikirec, which ideally would also be included in the documentation and tested. That model was based on analyzing the links between pages, which could serve as a basis for the wikirec version with modifications to wikirec.data_utils, or the model could instead focus on the article texts. Partial implementations are more than welcome though :)
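To give a rough sense of the link-based idea, here is a tiny numpy-only sketch of learning page embeddings by predicting whether a (page, link) pair occurs, with pairs scored via a dot product (all names and details here are illustrative assumptions, not the blog post's actual code):

```python
import numpy as np

# Minimal sketch of link-based page embeddings: logistic loss on
# (page, link, label) triples, where label 1 means the page contains
# the link and label 0 is a negative sample. Purely illustrative.
def train_link_embeddings(pairs, n_pages, n_links, dim=8, lr=0.1,
                          epochs=100, seed=0):
    rng = np.random.default_rng(seed)
    page_emb = rng.normal(scale=0.1, size=(n_pages, dim))
    link_emb = rng.normal(scale=0.1, size=(n_links, dim))
    for _ in range(epochs):
        for page, link, label in pairs:
            score = page_emb[page] @ link_emb[link]
            pred = 1 / (1 + np.exp(-score))  # sigmoid
            grad = pred - label              # dLoss/dScore for log loss
            g_page = grad * link_emb[link]
            g_link = grad * page_emb[page]
            page_emb[page] -= lr * g_page
            link_emb[link] -= lr * g_link
    return page_emb, link_emb
```

A full version would train in batches with a proper framework, but the same learned page embeddings could then feed the existing similarity-matrix machinery.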
Please first indicate your interest in working on this, as it is a feature implementation :)
This issue is to discuss and implement keys for wikirec.data_utils.input_conversion_dict to make it easier for people to find valid arguments to parse Wikipedia articles using wikirec.data_utils.parse_to_ndjson. Rather than needing to search for the given Infobox topic, a user could instead simply query the keys of input_conversion_dict for the desired language and see what would be valid values to pass to the topics argument. Suggestions and pull requests are welcome for any language :)
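To illustrate the proposed interface, here is a toy sketch (the dictionary contents and the `valid_topics` helper are hypothetical stand-ins, not wikirec's actual data or API):

```python
# Toy stand-in for the proposed key structure: language -> topic key ->
# Infobox name. Contents here are illustrative only.
input_conversion_dict = {
    "en": {"books": "Infobox book", "films": "Infobox film"},
}

# Hypothetical helper: list valid `topics` values for a language.
def valid_topics(language):
    return sorted(input_conversion_dict.get(language, {}))
```

With keys in place, a user could list valid arguments for their language instead of searching Wikipedia for the right Infobox name.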