Giter VIP home page Giter VIP logo

Comments (4)

lemon24 avatar lemon24 commented on August 15, 2024

Logic is easy enough (turning update_after etc. to/from seconds not shown):

import random
from dataclasses import dataclass

@dataclass
class Feed:
    url: str
    update_period: int
    update_after: int = 0
    # last_updated is only set when the feed is actually updated
    # (not when it's not modified, not when there was an exception)
    # https://github.com/lemon24/reader/blob/3.12/src/reader/_update.py#L276
    # https://github.com/lemon24/reader/blob/3.12/src/reader/_update.py#L445
    last_retrieved: int = 0

def get_feeds_for_update(feeds, now):
    return [f for f in feeds if f.update_after <= now]

def next_period(feed, now, jitter_ratio=0):
    jitter = random.random() * jitter_ratio
    current_period_no = now // feed.update_period
    return (current_period_no + 1 + jitter) * feed.update_period

def update_feeds(feeds, now, get_update_after=next_period):
    to_update = get_feeds_for_update(feeds, now)
    for feed in to_update:
        feed.last_retrieved = now
        feed.update_after = get_update_after(feed, now)
    return to_update

def set_update_period(feed, update_period):
    feed.update_period = update_period
    feed.update_after = next_period(feed, feed.last_retrieved)
Tests:
from collections import Counter
from functools import partial
import pytest


@pytest.mark.parametrize('old_after, new_after, now', [
    (0, 10, 0),
    (0, 10, 1),
    (0, 10, 9.999),
    (0, 20, 10),
    (5, 10, 5),
    (10, 20, 10),
    (105, 110, 109),
    (105, 120, 110),
    (105, 200, 199.999),
    (105, 210, 200),
    
])
def test_update(old_after, new_after, now):
    feeds = [Feed('one', 10, old_after)]
    assert len(update_feeds(feeds, now)) == 1
    assert feeds == [Feed('one', 10, new_after, now)]    


@pytest.mark.parametrize('old_after, now', [
    (5, 4),
    (10, 9.999),
    (20, 19),
])
def test_no_update(old_after, now):
    feeds = [Feed('one', 10, old_after)]
    assert len(update_feeds(feeds, now)) == 0
    assert feeds == [Feed('one', 10, old_after)]    


@pytest.mark.parametrize('get_update_after', [
    next_period, 
    # jitter ratio less than 10-1, to account for time step
    partial(next_period, jitter_ratio=.9),
])
def test_sweep(get_update_after):
    feeds = [Feed('one', 10), Feed('two', 20), Feed('three', 100)]
    
    counts = Counter()
    for now in range(100):
        for feed in update_feeds(feeds, now, get_update_after):
            counts[feed.url] += 1
            
    assert counts == {'one': 10, 'two': 5, 'three': 1}


def test_set_period_up():
    feeds = [Feed('one', 10)]
    update_feeds(feeds, 5)
    
    set_update_period(feeds[0], 20)
    
    # no update needed, already updated in this period
    assert len(update_feeds(feeds, 15)) == 0


def test_set_period_down():
    feeds = [Feed('one', 20)]
    update_feeds(feeds, 5)
    
    set_update_period(feeds[0], 10)
    
    # update needed, if taking new period into account
    assert len(update_feeds(feeds, 15)) == 1
    

On to the API!

Update: We can get rid of last_retrieve and rely on the current time in set_update_period(); all tests pass with minimal changes.

from reader.

lemon24 avatar lemon24 commented on August 15, 2024

API

Add update_feeds(scheduled: bool | None = None) argument that filters feeds to update:

  • if none: if global tag .reader.update is set (regardless of value), assume true, else assume false
  • if true: update only update_after < now
  • if false: update everything (no update_after filtering)

In reader 4.0 (#291), we can make scheduled default to True (or just change the default behavior).


To configure the update interval, we can use a .reader.update tag:

  • can be set globally or per feed, feed overrides global overrides default
  • format: interval: int (seconds), jitter: float|bool (in [0,1])
    • for interval, we could also use cron expressions, or at least @hourly, @daily, @weekly, @monthly values
  • default: {interval: 3600, jitter: 0.25}

Using tags is good because it allows configuring stuff without additional UI.

Possible cons (WIP):

  • We need to find a way to reset update_after when the update interval changes (a dedicated set_feed_update_period() method can take care of this out of the box). One solution may be being able to register hooks on tag updates.
  • We want to validate the config. How do we handle validation errors? Ignore / use default, or raise to the user? The latter may be accomplished with a hook, although a standard validation mechanism would be better.

In the (low level) storage API:

  • add update_after and last_retrieved to FeedUpdateIntent and FeedForUpdate
  • get_feeds_for_update(after: datetime | None) (also returns nulls)
    • new uses last_updated, it should use last_retrieved
  • set_feed_update_after() (used when setting the interval)

from reader.

lemon24 avatar lemon24 commented on August 15, 2024

To do (minimal):

  • add update_after and last_retrieved on Feed, FeedUpdateIntent, and FeedForUpdate
  • get_feeds_for_update(after)
    • (new should use last_retrieved)
  • update_feeds(scheduled) / get_feeds(scheduled) / get_feed_counts(scheduled)
  • update logic (just ignore invalid tag values and use default)
  • expose scheduled in the CLI
  • make the cli_status plugin entries make sense in a --scheduled world
  • tests
  • documentation
    • docstrings
    • user guide
    • changelog, dev notes

Later:

  • reset update_after after interval changes / set_feed_update_after()
  • scheduled=None (default to false initially)
  • config validation

from reader.

lemon24 avatar lemon24 commented on August 15, 2024

update_after and last_retrieved should go on FeedUpdateIntent, and in turn FeedUpdateIntent should have union of FeedData-with-extra-stuff or exception or None, but I'm having trouble finding a good name for FeedData-with-extra-stuff.

For reference, here's all the feed-related data classes and how they're used:

        .-- FeedForUpdate ---.
        v                    |
     parser                  |
        |                    | 
    ParsedFeed            storage -. 
  (has FeedData)             ^     |
        v                    |     |
     updater                 |     |
        |                    |     |
        |- FeedUpdateIntent -'    Feed
        | (has ??? (has FeedData)  |
        |   or ExceptionInfo       |
        |   or None)               |
        |                          v
        '---- UpdateResult -----> user
            (has UpdatedFeed)

from reader.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.