Comments (24)
Maybe decide this based on the number of datapoints in each case?
if S0_good > S0_easy:
    if n_datapoints_good > n_datapoints_easy:
        S0_easy = S0_good
    else:
        S0_good = S0_easy
However, if you look at the table in #5 (comment), there are cases where S0_again > S0_hard or S0_hard > S0_good, so this issue is not limited to the pair of Good and Easy.
from fsrs-optimizer.
There are 2 simple ways to solve this:
if S0_good > S0_easy:
    S0_good = S0_easy

or

if S0_good > S0_easy:
    S0_easy = S0_good
In the first method we artificially decrease S0 for Good; in the second, we artificially increase S0 for Easy. I don't know which one makes more sense, but probably the latter: if S0 for Good is based on a larger number of reviews, then it is calculated more accurately than S0 for Easy, so we shouldn't change it and should instead change the less accurate S0 for Easy.
In my opinion, the second approach makes more sense.
I suppose the idea above should be applied to all pairs: Again-Hard, Hard-Good and Good-Easy.
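Applied to all adjacent pairs, the count-based tie-breaking could be sketched as follows. The names (s0, counts, the function itself) are hypothetical, not the optimizer's actual variables, and a single forward pass may not resolve cascading violations, echoing the caveat that this is unlikely to work on the first try:

```python
# Hypothetical sketch: enforce S0_again <= S0_hard <= S0_good <= S0_easy,
# resolving each violation in favor of the grade with more datapoints.
def enforce_s0_monotonicity(s0: dict, counts: dict) -> dict:
    """s0 and counts map grade name -> initial stability / datapoint count."""
    order = ["again", "hard", "good", "easy"]
    s0 = dict(s0)  # don't mutate the caller's dict
    for lo, hi in zip(order, order[1:]):
        if s0[lo] > s0[hi]:
            # Trust the estimate backed by more reviews.
            if counts[lo] > counts[hi]:
                s0[hi] = s0[lo]
            else:
                s0[lo] = s0[hi]
    return s0
```

For example, if S0_hard = 2.0 exceeds S0_good = 1.5 but Good has far more datapoints, the pass lowers S0_hard to 1.5 and leaves the other grades alone.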
@L-M-Sherlock here are some good ideas:
- The one above by nb9618, but apply it to all pairs: Again-Hard, Hard-Good and Good-Easy. There will likely be issues with that, though. I don't expect it to work on the first try without creating new problems.
- When using additive smoothing, instead of using retention of the entire collection/deck, only use retention based on second reviews to calculate p0 (the initial guess).
- When using the outlier filter based on IQR, use ln(delta_t) rather than delta_t itself. Filtering based on IQR doesn't work well on data that isn't normally distributed, and delta_t certainly isn't.
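The second idea above can be sketched like this. The column names, the smoothing formula, and the alpha value are illustrative assumptions, not the optimizer's actual implementation:

```python
# Sketch: one shared initial guess p0 computed from second reviews only,
# then additive smoothing of per-grade retention toward that shared p0.
import pandas as pd

def smoothed_retention(df: pd.DataFrame, alpha: float = 2.0) -> pd.Series:
    """df columns: first_rating, review_i (1-based), recalled (0/1)."""
    second = df[df["review_i"] == 2]
    p0 = second["recalled"].mean()  # same initial guess for every grade
    g = second.groupby("first_rating")["recalled"].agg(["sum", "count"])
    # additive smoothing pulls small groups toward the shared p0
    return (g["sum"] + alpha * p0) / (g["count"] + alpha)
```

The key point is that p0 is a single number taken from second reviews across the whole collection, not a separate guess per grade.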
Of course, all of these changes should be evaluated using statistical significance tests; I hope by now you have set up an automated system to run tests on all 66 collections.
Oh, also: in the scheduler code, change // recommended setting: 0.8 ~ 0.9 to // recommended values: 0.75 ~ 0.97.
@L-M-Sherlock you've been inactive for a couple of days, so there is a good chance you missed my comment above. I'm pinging you just to remind you about it.
I am just tired of maintaining the optimizer module. You can check these parameters in the batch training on the collected data: open-spaced-repetition/fsrs4anki#351 (comment). There are some cases where the initial stability of Again is larger than the initial stability of Hard, or the initial stability of Good is larger than the initial stability of Easy. These cases could have different reasons, so we should deal with these problems according to the concrete cases.
Ok, forget about 1, but I would still ask you to test 2 and 3.
For 2, here is an extreme case: a user who always remembers in the next review after pressing Easy during the first learning step. In this case, the retention is 100%. If we use this value, the additive smoothing will be useless.
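That objection can be checked numerically. The smoothing formula and alpha value below are assumptions for illustration, not the optimizer's exact code:

```python
# If p0 itself comes from data with 100% retention, additive smoothing
# has nothing to pull the estimate toward: the prior equals the data.
def smooth(successes: int, n: int, p0: float, alpha: float = 2.0) -> float:
    return (successes + alpha * p0) / (n + alpha)

print(smooth(3, 3, p0=1.0))  # 1.0 -- the prior changes nothing
print(smooth(3, 3, p0=0.9))  # 0.96 -- a sub-100% prior tempers the estimate
```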
I think you misunderstood my idea a little bit. I didn't mean "use four different initial guesses for each grade", I meant "use the same initial guess for each grade". So just calculate average retention for all second reviews.
By the way, have you automated running statistical significance tests on all collections?
3. When using the outlier filter based on IQR, use ln(delta_t) rather than delta_t itself. Filtering based on IQR doesn't work well on data that isn't normally distributed, and delta_t certainly isn't.
I'm testing this in all 66 collections.
I think you misunderstood my idea a little bit. I didn't mean "use four different initial guesses for each grade", I meant "use the same initial guess for each grade". So just calculate average retention for all second reviews.
OK, I will test it after the above test. It will take nearly 3 hours.
I'm testing this in all 66 collections.
Before:
Weighted RMSE: 0.04149183369953192
Weighted Log loss: 0.3815897150075234
Weighted MAE: 0.02342977913950602
Weighted R-squared: 0.7697902622572932
After:
Weighted RMSE: 0.04174954832152736
Weighted Log loss: 0.38212856042129156
Weighted MAE: 0.02374078044685508
Weighted R-squared: 0.7672438581669868
p = 0.0045 (for RMSE)
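The thread does not say which paired test produced this p-value. As one dependency-free possibility, a two-sided sign test over per-collection RMSE deltas could look like this (purely a sketch, not the test actually used):

```python
# Hypothetical sketch: two-sided sign test on paired per-collection RMSEs.
from math import comb

def sign_test_p(before: list, after: list) -> float:
    """Two-sided sign test: does 'after' differ systematically from 'before'?"""
    wins = sum(a > b for a, b in zip(after, before))
    n = sum(a != b for a, b in zip(after, before))  # ignore exact ties
    k = min(wins, n - wins)
    # probability of a result at least this lopsided under a fair coin
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Example: RMSE increased in 60 of 66 collections -> very small p
print(sign_test_p([0.041] * 66, [0.042] * 60 + [0.040] * 6))
```

A sign test only uses the direction of each delta; a Wilcoxon signed-rank or paired t-test would also weigh the magnitudes.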
3. When using the outlier filter based on IQR, use ln(delta_t) rather than delta_t itself. Filtering based on IQR doesn't work well on data that isn't normally distributed, and delta_t certainly isn't.
It's worse than the current version with statistical significance.
Here is the code:
import numpy as np
import pandas as pd

def remove_outliers(group: pd.DataFrame) -> pd.DataFrame:
    # earlier filters, kept for reference:
    # threshold = np.mean(group['delta_t']) * 1.5
    # threshold = group['delta_t'].quantile(0.95)
    # IQR filter applied to ln(delta_t):
    Q1 = group['delta_t'].map(np.log).quantile(0.25)
    Q3 = group['delta_t'].map(np.log).quantile(0.75)
    IQR = Q3 - Q1
    threshold = Q3 + 1.5 * IQR
    group = group[group['delta_t'].map(np.log) <= threshold]
    return group
Huh, I'm surprised. Maybe the more data is removed, the easier it is for FSRS to fit the remaining data well? In other words, what if we cannot rely on RMSE when removing outliers because, between two methods that both aim at removing outliers, the one that removes more data will always result in a lower RMSE?
Removing more data does not always result in a lower RMSE. Removing too much data might lead to underfitting, where the model fails to capture the underlying trend of the data. This can also increase the RMSE.
Alright, then test the idea with p0 for additive smoothing, and that's it.
After that I would like you to benchmark all 5 algorithms, I'll explain it in a bit more detail in the relevant issue.
additive smoothing:
Weighted RMSE: 0.04147353655819303
Weighted Log loss: 0.3815885589708383
Weighted MAE: 0.023376754517799636
Weighted R-squared: 0.7699164899424069
p = 0.38
It is slightly better but not statistically significant.
Removing more data does not always result in a lower RMSE. Removing too much data might lead to underfitting, where the model fails to capture the underlying trend of the data. This can also increase the RMSE.
I agree that removing more data does not always result in a lower RMSE.
But here, we are selectively removing the data that lies at the right-hand side of the curve (not random data). So the remaining data is more homogeneous, and this might explain why the RMSE is lower.
So the remaining data is more homogeneous, and this might explain why the RMSE is lower.
Yeah, I'm just surprised that my approach is somehow worse, even though in theory IQR should work better with normally distributed data.
I think that the increase in RMSE that we saw when using log of delta_t is just an artifact. For example, when the optimizer filtered out all the cards with first rating = Again in my collection, the RMSE got a crazy low value (0.0056). I first mentioned this here: open-spaced-repetition/fsrs4anki#348 (comment)
I think that the increase in RMSE that we saw when using log of delta_t is just an artifact.
So we should not only consider the RMSE, right? We should have another criterion to decide whether an idea should be employed in FSRS.
Maybe decide this based on the number of datapoints in each case?
I will adopt this idea, not for the sake of enhancing the model's accuracy, but to alleviate users' confusion. Therefore, I will not run evaluation tests.
I think that the increase in RMSE that we saw when using log of delta_t is just an artifact.
So we should not only consider the RMSE, right? We should have another criterion to decide whether an idea should be employed in FSRS.
Yes, but I don't know which metric would be appropriate in this case.
Also, let's discuss this further in #16.