Comments (14)
Hi!
That's not a bad idea. I've had it in the past but can't exactly remember why I didn't pursue it :/.
- You can actually try it directly using the arguments highjack_due_query and highjack_rated_query. I think you should try 'deck:"my_deck" is:new' and 'deck:"my_deck" is:review'.
- The reference_order should be order_added or possibly relative_overdueness.
- You would also have to adjust score_adjustment_factor to something weird like (1, 0.5) to make sure you don't spread the cards wayyy too much.
Would you mind trying and reporting back?
Btw the tone of your message made it really nice to read and made me happy, have a great day too!
from anna_anki_neuronal_appendix.
Cool! Editing the file I noticed this:
stopwords_lang=["swedish", "english", "french"],
I'm currently learning Chinese and Swedish, do I need to edit this like that or something else?
Also, where do I put the deck:"my_deck" part? This is how my file looks atm:
deckname=None,
reference_order="order_added", # any of "lowest_interval", "relative overdueness", "order_added"
task="filter_review_cards", # any of "filter_review_cards", "bury_excess_review_cards", "bury_excess_learning_cards"
target_deck_size="80%", # format: 80%, 0.8, "all"
stopwords_lang=["swedish", "english", "french"],
rated_last_X_days=4,
score_adjustment_factor=(1, 0.5),
field_mappings="field_mappings.py",
acronym_file="acronym_file.py",
acronym_list=None,
# others:
minimum_due=15,
highjack_due_query=True,
highjack_rated_query=True,
log_level=2, # 0, 1, 2
replace_greek=True,
keep_OCR=True,
tags_to_ignore=None,
tags_separator="::",
fdeckname_template=None,
show_banner=True,
skip_print_similar=False,
from anna_anki_neuronal_appendix.
Hi!
- Stopwords are very common words, for example "this", "is", "the" and "a" in "this is the best thing since a recent event".
You should always try to add the stopwords of the language in question when using AnnA, but I don't know enough about the Chinese language to know if it's actually relevant here.
An issue is that stopwords are removed before using TF_IDF, so also before tokenization. This could maybe be an issue with Chinese but I don't know...
Either way, keeping stopwords is actually not very penalizing.
It should also rarely be an issue to add too many languages to the stopwords list.
- I see you set the highjack values to True. This is not how it works at all. I edited the README.md, hoping it is now a bit clearer :)
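To make the stopword point above concrete, here is a minimal sketch of removing stopwords before any further processing. It is not AnnA's actual code: the function name remove_stopwords and the tiny word sets are assumptions made up for the example, and a real run would rely on full per-language lists.

```python
import re

# Tiny illustrative stopword sets (an assumption for this example only;
# real stopword lists are much longer).
STOPWORDS = {
    "english": {"this", "is", "the", "a", "since"},
    "swedish": {"och", "det", "att", "en"},
}


def remove_stopwords(text, languages):
    """Return the lowercased words of `text` that are not stopwords
    in any of the requested languages."""
    stop = set()
    for lang in languages:
        # Unknown languages simply contribute no stopwords.
        stop |= STOPWORDS.get(lang, set())
    words = re.findall(r"\w+", text.lower())
    return [w for w in words if w not in stop]


print(remove_stopwords("This is the best thing since a recent event",
                       ["english", "swedish"]))
# → ['best', 'thing', 'recent', 'event']
```

Note that a sentence written without spaces, as is usual in Chinese, comes out of this word-splitting step as a single token, so a word-based stopword list would not match anything inside it. That is one way the concern about Chinese mentioned above could show up.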
from anna_anki_neuronal_appendix.
Cool, understanding stopwords now! What would I need to set the highjack values to? I read the readme file but there are no possible values
from anna_anki_neuronal_appendix.
Open Anki's browser and look for cards using a search query, for example deck:"my_deck" is:due -rated:14 flag:1.
This query is the way you ask Anki to find cards.
The highjack arguments are set to None by default, which disables them, but they can contain the same kind of queries as strings:
highjack_rated_query is the query originally used to find the cards that you rated in the last few days, but if you highjack it you can set it to whatever you want.
highjack_due_query is the query originally used to find which cards are due.
Tell me if it's more clear, in which case I'll link the README to this issue.
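As a small illustration of the explanation above, the two arguments take either None or an Anki search string. The helper build_query below is hypothetical (it is not part of AnnA); it only shows one way to assemble such strings. The argument names highjack_due_query and highjack_rated_query and the search terms come from this thread.

```python
def build_query(deck, *terms):
    """Join a deck restriction and extra search terms into one
    Anki browser search string."""
    return " ".join([f'deck:"{deck}"', *terms])


# Values as they could be passed to the two highjack arguments.
# None (the default) means "do not highjack, use the normal query".
settings = {
    "highjack_due_query": build_query("my_deck", "is:new"),
    "highjack_rated_query": build_query("my_deck", "is:review"),
}

for name, query in settings.items():
    print(f"{name}={query!r}")
```

It is worth pasting each resulting string into Anki's browser first to check that it returns the cards you expect.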
from anna_anki_neuronal_appendix.
Oh! Perfectly understood now...hahaha I didn't get it at first. They're now like this:
highjack_due_query='deck:"Swedish" is:new',
highjack_rated_query='deck:"Swedish" is:review',
from anna_anki_neuronal_appendix.
You might want to add something like rated:14 in the rated_query, depending on the size of your deck.
Don't forget to tell me if it works :) I suggest lowering the score adjustment factor to (1, 0.1) to try and see if it's better.
from anna_anki_neuronal_appendix.
I've got approximately 15,000 sentence cards which I use to mine the language, so I'll try these out and report back! Thanks so much for your time :)
from anna_anki_neuronal_appendix.
Working!! Swedish worked flawlessly :) Will report back with my Chinese deck.
from anna_anki_neuronal_appendix.
Hey again! So this error pops up when running the script on my Chinese deck. I'm probably running out of memory because my notebook only has 4 GB of RAM. I googled it and it seems to be a Python problem and not your script's. Anyway, maybe this'll happen to other people, so maybe you need to implement something here?
Vectorizing text using TFIDF: 100%|███████████████████████████████████████████████████████████████████████████| 23366/23366 [00:03<00:00, 6952.34it/s]
Reducing dimensions to 100 using SVD... Explained variance ratio after SVD on Tf_idf: 98.2%
Computing distance matrix on all available cores...
Killed
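A rough back-of-the-envelope check of the out-of-memory guess: assuming the distance matrix is stored as a dense square array with one value per pair of cards (an assumption, the thread does not say how the script stores it), its size grows with the square of the number of cards.

```python
def dense_matrix_bytes(n_cards, bytes_per_value=8):
    """Size in bytes of a dense n_cards x n_cards matrix."""
    return n_cards * n_cards * bytes_per_value


n = 23366  # number of cards shown in the progress bar above
for label, size in (("float64", 8), ("float32", 4)):
    gib = dense_matrix_bytes(n, size) / 2**30
    print(f"{label}: {gib:.2f} GiB")
# → float64: 4.07 GiB
# → float32: 2.03 GiB
```

Under that assumption, a single 64-bit matrix for 23366 cards would already be about as large as the 4 GB of RAM on the machine, before counting anything else the process holds. A 5000-card deck would need about 200 MB for the same matrix.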
from anna_anki_neuronal_appendix.
Hi,
I implemented the argument "low_power_mode". If you set it to True, the tokenizer will use unigrams instead of ngrams between 1 and 5.
This should considerably reduce the number of computations.
It's currently only in the dev branch. If you test it and it works, I'll merge it into main.
Another thing you might want to test afterwards, please, is lowering TFIDF_dim: currently 100 dimensions are enough for 98.2% of the variance, which means you are wayyy overdoing it.
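To give a feel for why unigrams are cheaper than ngrams between 1 and 5, here is a minimal sketch that counts both for one sentence. It assumes word-level ngrams over whitespace-separated tokens, which may differ from what the tokenizer in AnnA really does; the function word_ngrams is made up for the example.

```python
def word_ngrams(tokens, n_min, n_max):
    """Return all contiguous word ngrams with n_min <= n <= n_max."""
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


tokens = "i am learning swedish and chinese every day".split()

print(len(word_ngrams(tokens, 1, 1)))  # unigrams only → 8
print(len(word_ngrams(tokens, 1, 5)))  # ngrams from 1 to 5 → 30
```

Even on this eight-word sentence the 1 to 5 setting produces almost four times as many terms, and across a whole deck most of the longer ngrams are distinct, so the vocabulary handed to TF_IDF grows a lot.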
from anna_anki_neuronal_appendix.
Reporting back!
Working splendidly after allocating more swap to the computer :)
low_power_mode and TFIDF_dim=60 resulted in python3 not being killed when analyzing a subdeck with 5k cards.
On my main deck of 23k cards, python3 still gets killed every time, with TFIDF_dim=100 or 60 and with or without low_power_mode.
Thanks so much for your help!!
from anna_anki_neuronal_appendix.
Muchas gracias por tu mensaje! (Thanks a lot for your message! Btw, the name of this software is from an Argentinian person :) )
I think it's better to use low_power_mode than to reduce the number of dim drastically.
That being said, the number of dim can and should be reduced anyway if you see it's keeping more than say 70% of the variance IMO.
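The rule of thumb above can be sketched as picking the smallest number of dimensions whose cumulative explained variance reaches a target such as 70%. The function name and the per-dimension ratios below are invented for illustration; real values would come from the SVD step the script reports on.

```python
from itertools import accumulate


def smallest_dim_for_variance(ratios, target):
    """Return the smallest number of leading dimensions whose cumulative
    explained variance ratio is at least `target`, or None if the
    target is never reached."""
    for dim, cumulative in enumerate(accumulate(ratios), start=1):
        if cumulative >= target:
            return dim
    return None


# Made-up explained variance ratios, largest first (illustrative only).
ratios = [0.4, 0.2, 0.15, 0.1, 0.05]

print(smallest_dim_for_variance(ratios, 0.70))  # → 3
print(smallest_dim_for_variance(ratios, 0.95))  # → None
```

With numbers like these, TFIDF_dim could be lowered to 3 while still keeping more than 70% of the variance.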
from anna_anki_neuronal_appendix.
Oh! That's so cool :)
Okay, I'll write down 70%... luckily it's working and putting out no less than 95% with dim 60 and it's speeding up the process a lot :D
from anna_anki_neuronal_appendix.