
fsrs-vs-sm17's Introduction

FSRS vs SM-17


This repository is a simple comparison between FSRS and SM-17. FSRS-v-SM16-v-SM17.ipynb is the notebook for the comparison.

Due to the differences between the workflows of SuperMemo and Anki, it is not easy to compare the two algorithms. I tried to make the comparison as fair as possible. Here are some notes:

  • The first interval in SuperMemo is the duration between creating the card and the first review, while in Anki it is the duration between the first and second reviews. So I removed the first record of each card in the SM-17 data.
  • SuperMemo has six grades, but Anki has only four. So I merged grades 0, 1, and 2 in SuperMemo into 1 in Anki, and mapped 3, 4, and 5 in SuperMemo to 2, 3, and 4 in Anki.
  • I use the R (SM17)(exp) recorded in sm18/systems/{collection_name}/stats/SM16-v-SM17.csv as the prediction of SM-17. Reference: Confusion among R(SM16), R(SM17)(exp), R(SM17), R est. and expFI.
  • To ensure FSRS has the same information as SM-17, I implemented an online-learning version of FSRS, where FSRS has zero knowledge of future reviews, just as SM-17 does.
  • The results are based on data from a small group of people and may differ for other SuperMemo users.
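As a sketch, the first two adjustments above might look like this in pandas. The column names `card_id`, `grade`, and `rating` are hypothetical; the actual notebook may organize the data differently.

```python
import pandas as pd

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the two adjustments described above (hypothetical column names)."""
    # Drop the first record of each card: in SuperMemo the first interval is
    # measured from card creation, which has no counterpart in Anki.
    df = df[df.groupby("card_id").cumcount() > 0].copy()
    # Collapse SuperMemo grades 0-2 into Anki's Again (1) and map
    # grades 3, 4, 5 to Hard (2), Good (3), Easy (4).
    df["rating"] = df["grade"].map({0: 1, 1: 1, 2: 1, 3: 2, 4: 3, 5: 4})
    return df
```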

Metrics

We use two metrics in the FSRS benchmark to evaluate how well these algorithms work: log loss and a custom RMSE that we call RMSE (bins).

  • Log Loss (also known as Binary Cross Entropy): Utilized primarily for its applicability in binary classification problems, log loss serves as a measure of the discrepancies between predicted probabilities of recall and review outcomes (1 or 0). It quantifies how well the algorithm approximates the true recall probabilities, making it an important metric for model evaluation in spaced repetition systems.
  • Weighted Root Mean Square Error in Bins (RMSE (bins)): This is a metric engineered for the FSRS benchmark. In this approach, predictions and review outcomes are grouped into bins according to the predicted probabilities of recall. Within each bin, the squared difference between the average predicted probability of recall and the average recall rate is calculated. These values are then weighted according to the sample size in each bin, and then the final weighted root mean square error is calculated. This metric provides a nuanced understanding of model performance across different probability ranges.

Smaller is better. If you are unsure which metric to look at, look at RMSE (bins). That value can be interpreted as "the average difference between the predicted probability of recalling a card and the measured probability". For example, if RMSE (bins) = 0.05, the algorithm is, on average, wrong by 5% when predicting the probability of recall.
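As an illustrative sketch of the two metrics (not the benchmark's exact code; the 20-bin scheme and equal-width binning are assumptions), with `p` an array of predicted recall probabilities and `y` the binary review outcomes:

```python
import numpy as np

def log_loss(p, y, eps=1e-15):
    """Binary cross-entropy between predictions p and outcomes y."""
    p = np.clip(np.asarray(p, float), eps, 1 - eps)
    y = np.asarray(y, float)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

def rmse_bins(p, y, n_bins=20):
    """Weighted RMSE between mean prediction and mean outcome per bin."""
    p, y = np.asarray(p, float), np.asarray(y, float)
    # Assign each prediction to a bin by its predicted probability.
    idx = np.minimum((p * n_bins).astype(int), n_bins - 1)
    sq_err, total = 0.0, 0
    for b in np.unique(idx):
        mask = idx == b
        n = int(mask.sum())  # bin size = weight
        sq_err += n * (p[mask].mean() - y[mask].mean()) ** 2
        total += n
    return float(np.sqrt(sq_err / total))
```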

Result

Total users: 16

Total repetitions: 194,281

The following tables represent the weighted means and the 99% confidence intervals.

Weighted by number of repetitions

| Algorithm | Log Loss | RMSE (bins) |
| --------- | -------- | ----------- |
| FSRS-4.5  | 0.4±0.08 | 0.06±0.021  |
| FSRSv4    | 0.4±0.09 | 0.07±0.025  |
| FSRSv3    | 0.4±0.09 | 0.08±0.021  |
| SM-17     | 0.4±0.10 | 0.08±0.020  |
| SM-16     | 0.4±0.09 | 0.11±0.026  |

Weighted by ln(number of repetitions)

| Algorithm | Log Loss | RMSE (bins) |
| --------- | -------- | ----------- |
| FSRS-4.5  | 0.4±0.08 | 0.09±0.030  |
| SM-17     | 0.5±0.10 | 0.10±0.029  |
| FSRSv4    | 0.4±0.09 | 0.11±0.043  |
| FSRSv3    | 0.5±0.10 | 0.11±0.035  |
| SM-16     | 0.5±0.11 | 0.12±0.033  |
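A sketch of how one table cell could be produced from per-user metric values. The exact interval procedure used by the repo is not stated here, so the normal-approximation confidence interval below is an assumption:

```python
import numpy as np

def weighted_mean_ci(values, n_reps, log_weights=False, z=2.576):
    """Weighted mean and half-width of an approximate 99% CI.

    values: per-user metric (e.g. RMSE (bins)); n_reps: per-user review
    counts. z = 2.576 corresponds to 99% coverage under a normal
    approximation of the weighted mean.
    """
    x = np.asarray(values, float)
    w = np.asarray(n_reps, float)
    if log_weights:                       # the ln(n) weighting of the 2nd table
        w = np.log(w)
    w = w / w.sum()
    mean = float(np.sum(w * x))
    var = float(np.sum(w * (x - mean) ** 2))   # weighted variance across users
    half = z * np.sqrt(var * np.sum(w ** 2))   # s.e. of the weighted mean
    return mean, float(half)
```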

The image below shows the p-values obtained by running the Wilcoxon signed-rank test on the RMSE (bins) of all pairs of algorithms. Red means that the row algorithm performs worse than the corresponding column algorithm, and green means that the row algorithm performs better than the corresponding column algorithm. Grey means that the p-value is >0.05, and we cannot conclude that one algorithm performs better than the other.

It's worth mentioning that this test is not weighted, and therefore doesn't take into account that RMSE (bins) depends on the number of reviews.
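A minimal sketch of this pairwise test, assuming SciPy is available and `rmse_a`, `rmse_b` hold per-user RMSE (bins) values for two algorithms. The 0.05 threshold matches the description above; the tie-breaking rule (compare totals) is an assumption.

```python
from scipy.stats import wilcoxon

def compare_rmse(rmse_a, rmse_b, alpha=0.05):
    """Unweighted Wilcoxon signed-rank test on paired per-user RMSE values.

    Returns which algorithm looks better, or "inconclusive" when p > alpha
    (the grey cells in the figure).
    """
    stat, p = wilcoxon(rmse_a, rmse_b)
    if p > alpha:
        return "inconclusive"
    # Smaller RMSE is better.
    return "A" if sum(rmse_a) < sum(rmse_b) else "B"
```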

Wilcoxon-16-collections

Share your data

If you would like to support this project, please consider sharing your data with us. The shared data will be stored in the ./dataset/ folder.

You can open an issue to submit it: https://github.com/open-spaced-repetition/fsrs-vs-sm17/issues/new/choose

Contributors

All of the following contributed data (🔣):

leee_ · Jarrett Ye · 天空守望者 · reallyyy · shisuu · Winston · Spade7 · John Qing · WolfSlytherin · HyFran · Hansel221 · 曾经沧海难为水 · Pariance · github-gracefeng

fsrs-vs-sm17's People

Contributors

allcontributors[bot] · expertium · l-m-sherlock


fsrs-vs-sm17's Issues

[Data]

Data file

You can find SM16-v-SM17.csv in sm18/systems/{collection_name}/stats folder. The private content is stored in column Title. I recommend removing it before uploading to GitHub and sharing. Don't forget to make a copy before you delete it. Then you can drag and drop the file here.

SM16-v-SM17.csv

[Data]

Data file

SM16-v-SM17.csv

Gather more data

I know that this isn't very helpful, and I can't contribute anything substantial or help in any way, so I'm just opening this issue to remind you that right now this benchmark is based on very limited data; ideally we need 1,000,000+ reviews. So if you have any ideas about where to find more SuperMemo users who are willing to share their data, that would be great.

NoHeartPen's [Data]

Data file

SM16-v-SM17.csv

A note: although the file contains data from the past year, I don't use SuperMemo all that much — it's mostly been a "fish for three days, dry the nets for thirty" situation. I hope my "dirty" data can still be of some reference value :)

[Data]

SM16-v-SM17.csv


[Data]

Data file
SM16-v-SM17.csv


Something is wrong with the comparison

FSRS is indeed a good SRS model, but I find it hard to believe that it would be better than SM-17. However, the results of the comparison made in this repo suggest that FSRS is better than SM-17.

This makes me think that there is something wrong with the comparison. However, I am unable to come up with reasonable causes for the poor performance of SM-17 against FSRS in this comparison.

[Data]

Data file


SM16-v-SM17.csv

Add FSRS v4 and FSRS v3 to the comparison

Currently the table just says "FSRS".
Ideally, both FSRS v3 and FSRS v4 should be added. If you don't want to change v3 code just for benchmarking that's fine, although I think it would be interesting to see the difference. Also, please specify which version of FSRS is used in the benchmark. I know it's v4, but I still think it would be better to specify that explicitly.

[Data]

Data file

SM16-v-SM17.csv
原神.csv
上古合集.csv


T-test between FSRS and SM-17

Total number of users: 16
Total size: 194281

Scale: reviews

| Algorithm | Log Loss | Log Loss (mean±std) | RMSE (bins) | RMSE (bins) (mean±std) |
| --------- | -------- | ------------------- | ----------- | ---------------------- |
| FSRSv4    | 0.4±0.08 | 0.370±0.101         | 0.06±0.027  | 0.061±0.034            |
| FSRSv3    | 0.4±0.09 | 0.401±0.116         | 0.10±0.028  | 0.098±0.035            |
| SM17      | 0.4±0.10 | 0.414±0.121         | 0.10±0.039  | 0.096±0.047            |
| SM16      | 0.4±0.09 | 0.421±0.118         | 0.12±0.027  | 0.117±0.035            |

https://www.evanmiller.org/ab-testing/t-test.html

@Expertium, I find that the difference is statistically significant when we compare FSRS and SM-17 using their weighted means and weighted standard deviations in a t-test.
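For reference, this kind of two-sample t-test can be reproduced from the reported mean±std pairs with n = 16 users per group. Welch's formula is used in the sketch below, which may differ slightly from the linked calculator's assumptions:

```python
import math

def welch_t(m1, s1, n1, m2, s2, n2):
    """Welch's t statistic and degrees of freedom from summary statistics."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    t = (m1 - m2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# FSRSv4 vs SM-17 RMSE (bins), using the mean±std values reported above:
t, df = welch_t(0.061, 0.034, 16, 0.096, 0.047, 16)
```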

[Data] SM18 Data Share

Data file

SM16-v-SM17.csv
SM16-v-SM17.csv
