w4 / stork
🐦 Scrapes a given source for all links without making a mess of your pots and pans
Home Page: https://docs.rs/stork
License: Other
When starting on a website like Stack Overflow, you'll get tons of the "same" links back over and over again for a single page (tags, for example). We should probably make Storkable's <T> impl Hash and keep a little Vec of the hashes 'seen' so we don't return the same links over and over (though returning repeats might be helpful or even necessary in some scenarios/protocols).
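A minimal sketch of the idea, assuming the value type implements Hash. The names Seen and dedup_links are hypothetical and not part of stork, and a HashSet is swapped in for the "little Vec" to keep lookups cheap. Storing only 64-bit hashes means two different links could in theory collide and one would be dropped.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Hypothetical helper (not part of stork): remembers the hashes of the
/// values it has been offered so that repeats can be skipped.
#[derive(Default)]
struct Seen {
    hashes: HashSet<u64>,
}

impl Seen {
    /// Returns true the first time a value is offered and false on repeats.
    fn insert<T: Hash>(&mut self, value: &T) -> bool {
        let mut hasher = DefaultHasher::new();
        value.hash(&mut hasher);
        self.hashes.insert(hasher.finish())
    }
}

/// Keeps the first occurrence of each link and preserves the input order.
fn dedup_links<T: Hash>(links: Vec<T>) -> Vec<T> {
    let mut seen = Seen::default();
    links.into_iter().filter(|link| seen.insert(link)).collect()
}

fn main() {
    // Made-up links standing in for what a single page might yield.
    let links = vec!["/tags/rust", "/questions/1", "/tags/rust", "/users/7"];
    println!("{:?}", dedup_links(links));
}
```

Making the check opt-in (or opt-out) would leave room for the scenarios where repeated links are wanted.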
We store hashes of each direct descendant's Storkable::value in each Storkable as a fix for #1, and we don't expose this to consumers. Right now consumers have to create a flat Vec and hash each Storkable::value themselves to check if they've seen the link before, even if they've all come from the same parent.
We need to figure out a good API to see if a "value" has been seen before from any parent. For example, if the hierarchy looks like this:
root
↳ 123
↳ 456
↳ 789
↳ abc
↳ def
↳ ghi
We'd expect root.seen(789) and 123.seen(456) to be true but 456.seen(123) or abc.seen(123) not to be.
Right now children only hold references to their parents and not vice-versa so this lookup can currently only be done in reverse.
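One possible shape for such an API, sketched under stated assumptions: the Node type below is a hypothetical stand-in for Storkable, values are plain strings instead of hashes, and the example assumes the nesting root → 123 → 456 → 789 with abc as a second child of root (the indentation of the listing above is not preserved, so that layout is a guess). Because children only point at their parents, each new child walks up the chain once and registers itself with every ancestor.

```rust
use std::cell::RefCell;
use std::collections::HashSet;
use std::rc::Rc;

/// Hypothetical node type standing in for a Storkable.
struct Node {
    value: String,
    parent: Option<Rc<Node>>,
    /// Values of every descendant, filled in by `Node::child`.
    descendants: RefCell<HashSet<String>>,
}

impl Node {
    fn new(value: &str, parent: Option<Rc<Node>>) -> Rc<Node> {
        Rc::new(Node {
            value: value.to_string(),
            parent,
            descendants: RefCell::new(HashSet::new()),
        })
    }

    /// Creates a child, then walks up the parent chain and records the
    /// child's value in every ancestor.
    fn child(parent: &Rc<Node>, value: &str) -> Rc<Node> {
        let mut ancestor = Some(Rc::clone(parent));
        while let Some(node) = ancestor {
            node.descendants.borrow_mut().insert(value.to_string());
            ancestor = node.parent.clone();
        }
        Node::new(value, Some(Rc::clone(parent)))
    }

    /// True if `value` was yielded anywhere below this node.
    fn seen(&self, value: &str) -> bool {
        self.descendants.borrow().contains(value)
    }
}

fn main() {
    let root = Node::new("root", None);
    let n123 = Node::child(&root, "123");
    let n456 = Node::child(&n123, "456");
    let n789 = Node::child(&n456, "789");
    let abc = Node::child(&root, "abc");

    println!("root saw {}: {}", n789.value, root.seen("789"));
    println!("123 saw 456: {}", n123.seen("456"));
    println!("456 saw 123: {}", n456.seen("123"));
    println!("abc saw 123: {}", abc.seen("123"));
}
```

The trade-off in this sketch is memory: every value is stored once per ancestor, so deep crawls would probably want hashes rather than owned strings.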
A --same-origin flag would make stork great for generating a public-view sitemap of a particular website quickly and easily.
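The check behind such a flag could look roughly like the sketch below. The --same-origin flag is only proposed here, and the origin and same_origin functions are hypothetical. The parser is deliberately simplified to stay dependency-free: it handles only absolute http and https URLs and ignores userinfo, IPv6 literals, and internationalised hostnames, so a real implementation would more likely lean on a URL parsing crate.

```rust
/// Extracts (scheme, host, port) from an absolute http(s) URL.
/// Simplified on purpose: no userinfo, IPv6 literal, or IDNA handling.
fn origin(url: &str) -> Option<(String, String, u16)> {
    let (scheme, rest) = url.split_once("://")?;
    let scheme = scheme.to_ascii_lowercase();
    let default_port: u16 = match scheme.as_str() {
        "http" => 80,
        "https" => 443,
        _ => return None,
    };
    // The authority ends at the first path, query, or fragment delimiter.
    let authority = rest.split(|c: char| matches!(c, '/' | '?' | '#')).next()?;
    if authority.is_empty() {
        return None;
    }
    let (host, port) = match authority.rsplit_once(':') {
        Some((host, port)) => (host, port.parse::<u16>().ok()?),
        None => (authority, default_port),
    };
    Some((scheme, host.to_ascii_lowercase(), port))
}

/// True only when both URLs parse and share scheme, host, and port.
fn same_origin(a: &str, b: &str) -> bool {
    match (origin(a), origin(b)) {
        (Some(x), Some(y)) => x == y,
        _ => false,
    }
}

fn main() {
    let start = "https://example.com/";
    for link in ["https://example.com/about", "https://blog.example.com/", "mailto:hi@example.com"] {
        println!("{} -> follow: {}", link, same_origin(start, link));
    }
}
```

A crawler would apply same_origin(start, link) before descending into a link, which keeps the walk inside the site being mapped.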
Right now we only have a filter on And(Name("a"), Not(Attr("rel", "nofollow"))). Maybe we should just match all <a href=...> tags, and the rel="nofollow" requirement should just come under a default attribute Filter or something?
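A rough sketch of that split, with every name hypothetical (Anchor, LinkFilters, and collect_links are illustrations, not stork or select types): all anchors with an href are candidates, and the nofollow rule is a default that callers can switch off. The rel attribute is a space-separated token list, so the sketch checks each token; an exact-match test on the whole attribute would presumably miss values such as "nofollow noopener".

```rust
/// Hypothetical stand-in for a parsed <a> element.
struct Anchor {
    href: Option<String>,
    rel: Option<String>,
}

/// Hypothetical filter settings; the nofollow rule is on by default.
struct LinkFilters {
    respect_nofollow: bool,
}

impl Default for LinkFilters {
    fn default() -> Self {
        LinkFilters { respect_nofollow: true }
    }
}

/// True if any token of the rel attribute is "nofollow".
fn is_nofollow(anchor: &Anchor) -> bool {
    anchor
        .rel
        .as_deref()
        .map(|rel| rel.split_whitespace().any(|t| t.eq_ignore_ascii_case("nofollow")))
        .unwrap_or(false)
}

/// Returns the href of every anchor that has one and passes the filters.
fn collect_links<'a>(anchors: &'a [Anchor], filters: &LinkFilters) -> Vec<&'a str> {
    anchors
        .iter()
        .filter(|a| !(filters.respect_nofollow && is_nofollow(a)))
        .filter_map(|a| a.href.as_deref())
        .collect()
}

fn main() {
    let anchors = vec![
        Anchor { href: Some("/docs".into()), rel: None },
        Anchor { href: Some("/ads".into()), rel: Some("nofollow noopener".into()) },
        Anchor { href: None, rel: None },
    ];
    println!("default: {:?}", collect_links(&anchors, &LinkFilters::default()));
    let all = LinkFilters { respect_nofollow: false };
    println!("no filter: {:?}", collect_links(&anchors, &all));
}
```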
Right now we're doing non-streaming, synchronous document parsing using select. Can we move this over to something that is at least streaming-based?