Comments (4)
What about get_feeds(broken=..., updates_enabled=..., new=...)?
Where do we draw the line? Is this turning into #253? (DynamoDB has rotted my brain.)
from reader.
Related: http://howto.philippkeller.com/2005/04/24/Tags-Database-schemas/, vaguely reminiscent of https://en.wikipedia.org/wiki/Star_schema; also see https://en.wikipedia.org/wiki/Entity%E2%80%93attribute%E2%80%93value_model
What would reader look like if you could only filter and sort by tags?
- Both feed (user) title and entry recent sort are derived values, so we wouldn't have an issue here.
- Ensuring consistency would be left to reader code (e.g. required "tag" attributes, updating computed values, modeling tristate attributes like important).
- The same goes for a lot of migrations (which may be better for databases that do not support schema transactions).
- What would indices look like? (use cases: filter by [tag1, tag2, ...], sort by tag1 value, count by tag1)
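To make the indices question concrete, here is a minimal sketch of an entity-attribute-value tags table in SQLite. It covers the three use cases listed: filter by several tags, sort by one tag's value, and count by one tag. The table, index and function names are hypothetical and do not reflect reader's actual schema. Tag values are stored as plain TEXT here.

```python
import sqlite3

# Hypothetical toy schema (not reader's real one): entries plus an
# entity-attribute-value style tags table.
SCHEMA = """
CREATE TABLE entries (id TEXT PRIMARY KEY, title TEXT);
CREATE TABLE entry_tags (
    entry_id TEXT NOT NULL REFERENCES entries (id),
    key TEXT NOT NULL,
    value TEXT,
    PRIMARY KEY (entry_id, key)
);
-- "filter by [tag1, tag2, ...]" and "count by tag1" look rows up by key first.
CREATE INDEX entry_tags_by_key ON entry_tags (key, entry_id);
-- "sort by tag1 value" reads one key's rows already ordered by value.
CREATE INDEX entry_tags_by_key_value ON entry_tags (key, value);
"""


def entries_with_all_tags(db, keys):
    """Ids of entries that have every tag in keys (keys must be non-empty)."""
    keys = list(dict.fromkeys(keys))  # drop duplicates so the count check holds
    placeholders = ", ".join("?" for _ in keys)
    rows = db.execute(
        f"SELECT entry_id FROM entry_tags WHERE key IN ({placeholders}) "
        "GROUP BY entry_id HAVING count(*) = ? ORDER BY entry_id",
        (*keys, len(keys)),
    )
    return [entry_id for (entry_id,) in rows]


def entries_sorted_by_tag(db, key):
    """Ids of entries that have tag key, ordered by its (TEXT) value."""
    rows = db.execute(
        "SELECT entry_id FROM entry_tags WHERE key = ? ORDER BY value, entry_id",
        (key,),
    )
    return [entry_id for (entry_id,) in rows]


def count_with_tag(db, key):
    """Number of entries that have tag key."""
    (count,) = db.execute(
        "SELECT count(*) FROM entry_tags WHERE key = ?", (key,)
    ).fetchone()
    return count


db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
db.executemany("INSERT INTO entries VALUES (?, ?)", [("a", "A"), ("b", "B"), ("c", "C")])
db.executemany(
    "INSERT INTO entry_tags VALUES (?, ?, ?)",
    [
        ("a", "has-enclosures", None),
        ("a", "read", None),
        ("b", "has-enclosures", None),
        ("b", "priority", "2"),
        ("c", "read", None),
        ("c", "priority", "1"),
    ],
)
print(entries_with_all_tags(db, ["has-enclosures", "read"]))
print(entries_sorted_by_tag(db, "priority"))
print(count_with_tag(db, "has-enclosures"))
```

Each query can start from an index on key. Filtering by several tags also has to group the matching rows per entry.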
So, based on various SQLite forum threads, the general conclusion seems to be "don't bother – design your schema as you normally would, and add indexes as needed later on"; in fairness, this is something I already knew, but as I said, DynamoDB has rotted my brain.
I also tentatively removed has_enclosures, and it didn't remove all that much code.
So:
- has_enclosures may become a tag, but that would likely lock in a performance penalty (right now, we don't have indexes on it, but if we make it a tag it won't be possible to add one)
- it might be interesting to see what query performance looks like with tags, though (update: slightly worse, see #327 (comment))
- the more specific enclosures filtering (e.g. .has-audio-enclosures) can still be achieved via the plugin
- read and important are integral to the data model / filtering, so we still want them as regular columns
- related: #253
- feed filtering attributes do not matter all that much, since feeds are both much fewer and smaller than entries (a feeds full table scan is likely negligible)
- on one hand, this is an argument for "do nothing", since the code is already there
- on the other hand, it may be an argument for "use tags" (since we can afford the performance penalty)
- we may do this once we can set tags in a transaction, and get tags in a single query
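The point about has_enclosures losing its index option can be sketched with a toy SQLite schema. In it, the same flag is stored both as a regular column and as an entry tag. The table, index and function names are hypothetical, not reader's. Both queries return the same entries. However, only the column version can be backed by an index on entries (here, a partial index), because SQLite does not allow subqueries in index definitions.

```python
import sqlite3

# Hypothetical toy schema (not reader's): the same flag stored both as a
# regular column and as an entry tag.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE entries (
    id TEXT PRIMARY KEY,
    recent_sort INTEGER NOT NULL,
    has_enclosures INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE entry_tags (
    entry_id TEXT NOT NULL,
    key TEXT NOT NULL,
    PRIMARY KEY (entry_id, key)
);
-- Only the column can get an index like this: a partial index that
-- already matches "entries with enclosures, newest first".
CREATE INDEX entries_with_enclosures_by_recent
    ON entries (recent_sort) WHERE has_enclosures = 1;
""")
db.executemany(
    "INSERT INTO entries VALUES (?, ?, ?)",
    [("a", 1, 1), ("b", 2, 0), ("c", 3, 1)],
)
db.executemany(
    "INSERT INTO entry_tags VALUES (?, ?)",
    [("a", "has-enclosures"), ("c", "has-enclosures")],
)


def ids_by_column(db):
    """Filter on the regular column (can use the partial index)."""
    rows = db.execute(
        "SELECT id FROM entries WHERE has_enclosures = 1 ORDER BY recent_sort DESC"
    )
    return [id_ for (id_,) in rows]


def ids_by_tag(db, key="has-enclosures"):
    """Same filter expressed as "has tag key" via a correlated subquery."""
    rows = db.execute(
        """
        SELECT id FROM entries
        WHERE EXISTS (
            SELECT 1 FROM entry_tags
            WHERE entry_tags.entry_id = entries.id AND entry_tags.key = ?
        )
        ORDER BY recent_sort DESC
        """,
        (key,),
    )
    return [id_ for (id_,) in rows]


print(ids_by_column(db), ids_by_tag(db))
```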
Ran some benchmarks, here's a summary:
- With only the has-enclosures entry tag, there seems to be almost no difference between using has_enclosures and the tag.
- Adding 1-2 more tags to each entry seems to make using tags only a bit worse.
- Adding 20 more tags to each entry seems to make using tags ~1.5x worse.
Single entry tag results.
Given a has-enclosures entry tag set like this:
$ python -c '
from reader import make_reader
reader = make_reader("db.sqlite")
for e in reader.get_entries(has_enclosures=True):
    reader.set_tag(e, "has-enclosures")
print(reader.get_entry_counts())
'
EntryCounts(total=21609, read=15614, important=222, has_enclosures=3978, averages=(0.0, 6.868131868131868, 10.117808219178082))
...and this benchmark script:
export BENCH_TIME_STAT='avg min'
lines='for _ in reader.get_entries(has_enclosures=True): pass
for _ in reader.get_entries(tags=["has-enclosures"]): pass
for _ in reader.get_entries(has_enclosures=True, limit=100): pass
for _ in reader.get_entries(tags=["has-enclosures"], limit=100): pass
for _ in reader.search_entries("python", has_enclosures=True): pass
for _ in reader.search_entries("python", tags=["has-enclosures"]): pass
for _ in reader.search_entries("python", has_enclosures=True, limit=20): pass
for _ in reader.search_entries("python", tags=["has-enclosures"], limit=20): pass'
while IFS= read -r line; do
    echo "# $line"
    sync && sudo purge
    python scripts/bench.py time snippet -r10 --snippet "$line"
done <<< "$lines"
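The avg and min columns can be approximated with Python's timeit module. This is a hedged sketch, not the script itself. It assumes that scripts/bench.py time snippet -r10 runs each snippet 10 times in one process (number=1, repeat=10, as the table headers suggest) and reports the average and the minimum. The loop above only runs sync && sudo purge once per snippet. Presumably, then, only the first repetition hits a cold cache, which would explain why avg is noticeably higher than min in most rows.

```python
import statistics
import timeit


def time_snippet(snippet, setup="pass", repeat=10, number=1):
    """Time `snippet` `repeat` times (each run executes it `number` times)
    and return the average and minimum seconds per run.

    A rough stand-in for what the benchmark reports, not scripts/bench.py itself.
    """
    times = timeit.repeat(snippet, setup=setup, repeat=repeat, number=number)
    return {"avg": statistics.mean(times), "min": min(times)}


stats = time_snippet("sum(range(10_000))")
print(f"avg {stats['avg']:.3f}  min {stats['min']:.3f}")
```

For the real measurements, setup would create the reader, e.g. setup='from reader import make_reader; reader = make_reader("db.sqlite")'.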
The output is:
# for _ in reader.get_entries(has_enclosures=True): pass
stat number repeat snippet
avg 1 10 0.702
min 1 10 0.374
# for _ in reader.get_entries(tags=["has-enclosures"]): pass
stat number repeat snippet
avg 1 10 0.571
min 1 10 0.393
# for _ in reader.get_entries(has_enclosures=True, limit=100): pass
stat number repeat snippet
avg 1 10 0.022
min 1 10 0.010
# for _ in reader.get_entries(tags=["has-enclosures"], limit=100): pass
stat number repeat snippet
avg 1 10 0.020
min 1 10 0.010
# for _ in reader.search_entries("python", has_enclosures=True): pass
stat number repeat snippet
avg 1 10 0.538
min 1 10 0.384
# for _ in reader.search_entries("python", tags=["has-enclosures"]): pass
stat number repeat snippet
avg 1 10 0.514
min 1 10 0.395
# for _ in reader.search_entries("python", has_enclosures=True, limit=20): pass
stat number repeat snippet
avg 1 10 0.250
min 1 10 0.110
# for _ in reader.search_entries("python", tags=["has-enclosures"], limit=20): pass
stat number repeat snippet
avg 1 10 0.226
min 1 10 0.112
1-2 entry tags results.
Extra tags were set for read and (un)important like so:
$ python -c '
from reader import make_reader
reader = make_reader("db.sqlite")
for e in reader.get_entries():
    if e.read:
        reader.set_tag(e, "read")
    if e.important is True:
        reader.set_tag(e, "important")
    if e.important is False:
        reader.set_tag(e, "unimportant")
'
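Storing important as the pair of tags above is one way to model a tristate attribute with tags. It also shows the consistency burden that would move into reader code, since nothing stops an entry from having both tags. Below is a minimal sketch of the mapping, assuming unset (None) means neither tag is present. The helper names are hypothetical.

```python
from __future__ import annotations

# Tag names from the script above; "unset" (None) is modeled as neither
# tag being present.
IMPORTANT = "important"
UNIMPORTANT = "unimportant"


def important_to_tags(important: bool | None) -> set[str]:
    """Tags that represent a tristate important value."""
    if important is True:
        return {IMPORTANT}
    if important is False:
        return {UNIMPORTANT}
    return set()


def tags_to_important(tags: set[str]) -> bool | None:
    """Recover the tristate value; reject the inconsistent both-tags case."""
    has_yes = IMPORTANT in tags
    has_no = UNIMPORTANT in tags
    if has_yes and has_no:
        # Nothing in the tags table prevents this state; the code has to.
        raise ValueError("both 'important' and 'unimportant' are set")
    if has_yes:
        return True
    if has_no:
        return False
    return None


for value in (True, False, None):
    tags = important_to_tags(value)
    print(value, sorted(tags), tags_to_important(tags))
```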
Output (same script as before, but only for the tags snippets):
# for _ in reader.get_entries(tags=["has-enclosures"]): pass
stat number repeat snippet
avg 1 10 0.592
min 1 10 0.408
# for _ in reader.get_entries(tags=["has-enclosures"], limit=100): pass
stat number repeat snippet
avg 1 10 0.022
min 1 10 0.011
# for _ in reader.search_entries("python", tags=["has-enclosures"]): pass
stat number repeat snippet
avg 1 10 0.536
min 1 10 0.408
# for _ in reader.search_entries("python", tags=["has-enclosures"], limit=20): pass
stat number repeat snippet
avg 1 10 0.245
min 1 10 0.115
20+ entry tags results.
Extra tags were set on every entry like so:
$ python -c '
from reader import make_reader
reader = make_reader("db.sqlite")
tags = "one two three four five six seven eight nine ten eleven twelve thirteen fourteen sixteen seventeen eighteen nineteen twenty".split()
for e in reader.get_entries():
    for tag in tags:
        reader.set_tag(e, tag)
'
Output (same script as before, but only for the tags snippets):
# for _ in reader.get_entries(tags=["has-enclosures"]): pass
stat number repeat snippet
avg 1 10 1.170
min 1 10 0.613
# for _ in reader.get_entries(tags=["has-enclosures"], limit=100): pass
stat number repeat snippet
avg 1 10 0.042
min 1 10 0.016
# for _ in reader.search_entries("python", tags=["has-enclosures"]): pass
stat number repeat snippet
avg 1 10 0.789
min 1 10 0.548
# for _ in reader.search_entries("python", tags=["has-enclosures"], limit=20): pass
stat number repeat snippet
avg 1 10 0.342
min 1 10 0.174
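To check the summary against the raw numbers, the snippet below divides each tags-only timing by the corresponding single-tag timing. It uses min times by default, since those are presumably less affected by the cold first run. The values are copied from the outputs above. The structure and names are just for illustration.

```python
# Snippet timings copied from the outputs above, as (avg, min) seconds.
TIMINGS = {
    "get_entries": {
        "1 tag": (0.571, 0.393),
        "1-2 tags": (0.592, 0.408),
        "20+ tags": (1.170, 0.613),
    },
    "get_entries, limit=100": {
        "1 tag": (0.020, 0.010),
        "1-2 tags": (0.022, 0.011),
        "20+ tags": (0.042, 0.016),
    },
    "search_entries": {
        "1 tag": (0.514, 0.395),
        "1-2 tags": (0.536, 0.408),
        "20+ tags": (0.789, 0.548),
    },
    "search_entries, limit=20": {
        "1 tag": (0.226, 0.112),
        "1-2 tags": (0.245, 0.115),
        "20+ tags": (0.342, 0.174),
    },
}

AVG, MIN = 0, 1


def slowdowns(timings, baseline="1 tag", stat=MIN):
    """Each run's time divided by the baseline run's time, per snippet."""
    return {
        snippet: {
            name: round(times[stat] / runs[baseline][stat], 2)
            for name, times in runs.items()
            if name != baseline
        }
        for snippet, runs in timings.items()
    }


for snippet, ratios in slowdowns(TIMINGS).items():
    print(snippet, ratios)
```

By min time, 1-2 extra tags come out at roughly 1.03x to 1.1x the single-tag time, and 20+ tags at roughly 1.4x to 1.6x. This is in line with the "~1.5x worse" summary. The avg ratios for 20+ tags are higher (about 1.5x to 2.1x) but noisier.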