Comments (4)
Logic is easy enough (turning update_after etc. to/from seconds not shown):
import random
from dataclasses import dataclass


@dataclass
class Feed:
    url: str
    update_period: int
    update_after: int = 0
    # last_updated is only set when the feed is actually updated
    # (not when it's not modified, not when there was an exception)
    # https://github.com/lemon24/reader/blob/3.12/src/reader/_update.py#L276
    # https://github.com/lemon24/reader/blob/3.12/src/reader/_update.py#L445
    last_retrieved: int = 0


def get_feeds_for_update(feeds, now):
    return [f for f in feeds if f.update_after <= now]


def next_period(feed, now, jitter_ratio=0):
    jitter = random.random() * jitter_ratio
    current_period_no = now // feed.update_period
    return (current_period_no + 1 + jitter) * feed.update_period


def update_feeds(feeds, now, get_update_after=next_period):
    to_update = get_feeds_for_update(feeds, now)
    for feed in to_update:
        feed.last_retrieved = now
        feed.update_after = get_update_after(feed, now)
    return to_update


def set_update_period(feed, update_period):
    feed.update_period = update_period
    feed.update_after = next_period(feed, feed.last_retrieved)
Tests:
from collections import Counter
from functools import partial
import pytest


@pytest.mark.parametrize('old_after, new_after, now', [
    (0, 10, 0),
    (0, 10, 1),
    (0, 10, 9.999),
    (0, 20, 10),
    (5, 10, 5),
    (10, 20, 10),
    (105, 110, 109),
    (105, 120, 110),
    (105, 200, 199.999),
    (105, 210, 200),
])
def test_update(old_after, new_after, now):
    feeds = [Feed('one', 10, old_after)]
    assert len(update_feeds(feeds, now)) == 1
    assert feeds == [Feed('one', 10, new_after, now)]


@pytest.mark.parametrize('old_after, now', [
    (5, 4),
    (10, 9.999),
    (20, 19),
])
def test_no_update(old_after, now):
    feeds = [Feed('one', 10, old_after)]
    assert len(update_feeds(feeds, now)) == 0
    assert feeds == [Feed('one', 10, old_after)]


@pytest.mark.parametrize('get_update_after', [
    next_period,
    # jitter ratio less than 10-1, to account for time step
    partial(next_period, jitter_ratio=.9),
])
def test_sweep(get_update_after):
    feeds = [Feed('one', 10), Feed('two', 20), Feed('three', 100)]
    counts = Counter()
    for now in range(100):
        for feed in update_feeds(feeds, now, get_update_after):
            counts[feed.url] += 1
    assert counts == {'one': 10, 'two': 5, 'three': 1}


def test_set_period_up():
    feeds = [Feed('one', 10)]
    update_feeds(feeds, 5)
    set_update_period(feeds[0], 20)
    # no update needed, already updated in this period
    assert len(update_feeds(feeds, 15)) == 0


def test_set_period_down():
    feeds = [Feed('one', 20)]
    update_feeds(feeds, 5)
    set_update_period(feeds[0], 10)
    # update needed, if taking new period into account
    assert len(update_feeds(feeds, 15)) == 1
On to the API!
Update: We can get rid of last_retrieved and rely on the current time in set_update_period(); all tests pass with minimal changes.
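A minimal sketch of what that simplification might look like. The revised code is not shown in the comment, so the explicit `now` parameter on set_update_period() is an assumption, and jitter is left out to keep it short:

```python
from dataclasses import dataclass


@dataclass
class Feed:
    url: str
    update_period: int
    update_after: int = 0  # no last_retrieved field in this variant


def next_period(feed, now):
    # start of the period following the one that contains `now` (jitter omitted)
    return (now // feed.update_period + 1) * feed.update_period


def update_feeds(feeds, now):
    to_update = [f for f in feeds if f.update_after <= now]
    for feed in to_update:
        feed.update_after = next_period(feed, now)
    return to_update


def set_update_period(feed, update_period, now):
    # assumption: the caller passes the current time instead of last_retrieved
    feed.update_period = update_period
    feed.update_after = next_period(feed, now)
```

One consequence of this sketch: a feed that is already overdue when its period changes is pushed to the period after `now` instead of staying due, which may or may not be what is wanted.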
from reader.
API
Add an update_feeds(scheduled: bool | None = None) argument that filters feeds to update:
- if none: if global tag .reader.update is set (regardless of value), assume true, else assume false
- if true: update only update_after < now
- if false: update everything (no update_after filtering)
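The three cases above could be resolved along these lines. This is only a sketch: `resolve_scheduled()`, `feeds_to_update()`, the `global_tags` mapping, and the simplified `Feed` class are hypothetical names for illustration, not reader's API, and times are plain integers as in the prototype.

```python
from dataclasses import dataclass
from typing import Optional

UPDATE_TAG = '.reader.update'


@dataclass
class Feed:
    url: str
    update_after: Optional[int] = None


def resolve_scheduled(scheduled, global_tags):
    # None means "decide from configuration": the presence of the global tag
    # (whatever its value) turns scheduling on
    if scheduled is None:
        return UPDATE_TAG in global_tags
    return scheduled


def feeds_to_update(feeds, now, scheduled=None, global_tags=()):
    if not resolve_scheduled(scheduled, global_tags):
        return list(feeds)  # no update_after filtering
    # assumptions: feeds that were never scheduled (None) are always due,
    # and the comparison is <= like the prototype's get_feeds_for_update()
    return [f for f in feeds if f.update_after is None or f.update_after <= now]
```

With no global tag and scheduled left at None, every feed is selected; setting the tag to any value (even an empty mapping) switches the same call to filtered behavior.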
In reader 4.0 (#291), we can make scheduled default to True (or just change the default behavior).
To configure the update interval, we can use a .reader.update tag:
- can be set globally or per feed, feed overrides global overrides default
- format: interval: int (seconds), jitter: float|bool (in [0,1])
  - for interval, we could also use cron expressions, or at least @hourly, @daily, @weekly, @monthly values
- default: {interval: 3600, jitter: 0.25}
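As a rough illustration of "feed overrides global overrides default", here is a sketch that merges the three levels key by key and silently falls back on invalid values. `resolve_update_config()` and its helpers are hypothetical; whether overriding happens per key or per whole tag, and how a boolean jitter is interpreted (here True/False become 1.0/0.0), are assumptions. Cron expressions and the @hourly-style values are not handled.

```python
DEFAULT_CONFIG = {'interval': 3600, 'jitter': 0.25}


def _valid_interval(value):
    # bool is a subclass of int, so exclude it explicitly
    return isinstance(value, int) and not isinstance(value, bool) and value > 0


def _valid_jitter(value):
    # accepts bool, int, and float values in [0, 1]
    return isinstance(value, (int, float)) and 0 <= value <= 1


def resolve_update_config(global_tag=None, feed_tag=None):
    """Merge default <- global <- feed, skipping missing or invalid values."""
    config = dict(DEFAULT_CONFIG)
    for tag in (global_tag, feed_tag):
        if not isinstance(tag, dict):
            continue  # tag not set, or not a mapping: ignore it
        if _valid_interval(tag.get('interval')):
            config['interval'] = tag['interval']
        if _valid_jitter(tag.get('jitter')):
            # assumption: True/False are read as 1.0/0.0
            config['jitter'] = float(tag['jitter'])
    return config
```

For example, a global tag of {'interval': 7200} combined with a feed tag of {'interval': 600} would give that feed a 600 second interval and the default jitter.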
Using tags is good because it allows configuring stuff without additional UI.
Possible cons (WIP):
- We need to find a way to reset update_after when the update interval changes (a dedicated set_feed_update_period() method can take care of this out of the box). One solution may be being able to register hooks on tag updates.
- We want to validate the config. How do we handle validation errors? Ignore / use default, or raise to the user? The latter may be accomplished with a hook, although a standard validation mechanism would be better.
In the (low level) storage API:
- add update_after and last_retrieved to FeedUpdateIntent and FeedForUpdate
- get_feeds_for_update(after: datetime | None) (also returns nulls)
- new uses last_updated, it should use last_retrieved
- set_feed_update_after() (used when setting the interval)
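A sketch of how the storage-level filter might behave, using an in-memory list in place of the real storage. The `FeedForUpdate` class here is a simplified stand-in, and the exact semantics are assumptions: `after=None` is taken to mean "no filtering", a null update_after always counts as due, and `new` is keyed on last_retrieved being unset.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class FeedForUpdate:
    url: str
    update_after: Optional[datetime] = None
    last_retrieved: Optional[datetime] = None


def get_feeds_for_update(feeds, after=None, new=None):
    for feed in feeds:
        # new=True: only never-retrieved feeds; new=False: only retrieved ones
        if new is not None and new != (feed.last_retrieved is None):
            continue
        # after=None disables update_after filtering (assumption);
        # feeds with a null update_after are returned as well
        if after is not None and feed.update_after is not None:
            if feed.update_after > after:
                continue
        yield feed


def hour(h):
    # helper for building example timestamps
    return datetime(2024, 1, 1, h, tzinfo=timezone.utc)
```

Called with after=hour(2), this yields feeds whose update_after is at or before 02:00 plus any feed that has no update_after yet.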
To do (minimal):
- add update_after and last_retrieved on Feed, FeedUpdateIntent, and FeedForUpdate
- get_feeds_for_update(after)
  - (new should use last_retrieved)
- update_feeds(scheduled) / get_feeds(scheduled) / get_feed_counts(scheduled)
- update logic (just ignore invalid tag values and use default)
- expose scheduled in the CLI
- make the cli_status plugin entries make sense in a --scheduled world
- tests
- documentation
  - docstrings
  - user guide
  - changelog, dev notes
Later:
- reset update_after after interval changes / set_feed_update_after()
- scheduled=None (default to false initially)
- config validation
update_after and last_retrieved should go on FeedUpdateIntent, and in turn FeedUpdateIntent should have a union of FeedData-with-extra-stuff, an exception, or None, but I'm having trouble finding a good name for FeedData-with-extra-stuff.
For reference, here's all the feed-related data classes and how they're used:
      .-- FeedForUpdate ---.
      v                    |
   parser                  |
      |                    |
  ParsedFeed            storage -.
(has FeedData)             ^     |
      v                    |     |
   updater                 |     |
      |                    |     |
      |- FeedUpdateIntent -'    Feed
      |  (has ??? (has FeedData) |
      |   or ExceptionInfo       |
      |   or None)               |
      |                          v
      '---- UpdateResult -----> user
          (has UpdatedFeed)
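To make the proposed shape concrete, here is a sketch of FeedUpdateIntent carrying a union value. All classes below are simplified stand-ins rather than reader's real definitions; `FeedDataPlus` is only a placeholder for the still-unnamed "FeedData-with-extra-stuff", its `http_etag` field is an illustrative guess at what the extra stuff could be, and reading None as "not modified" is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Union


@dataclass(frozen=True)
class FeedData:
    url: str
    title: Optional[str] = None


@dataclass(frozen=True)
class ExceptionInfo:
    type_name: str
    message: str


@dataclass(frozen=True)
class FeedDataPlus:  # placeholder name for "FeedData-with-extra-stuff"
    feed: FeedData
    http_etag: Optional[str] = None  # illustrative "extra stuff"


@dataclass(frozen=True)
class FeedUpdateIntent:
    url: str
    last_retrieved: datetime
    update_after: datetime
    # set on every retrieval attempt; exactly one of the three cases applies
    value: Union[FeedDataPlus, ExceptionInfo, None] = None


def describe(intent):
    if isinstance(intent.value, ExceptionInfo):
        return 'error'
    if intent.value is None:
        return 'not modified'  # assumption about what None means
    return 'updated'
```

The point of the union is that last_retrieved and update_after are recorded for all three outcomes, while the feed data itself is only present when there was an actual update.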