Comments (15)
Incremental pull of updated repos is now available at:
https://github.com/ivarref/dewey/blob/main/src/com/phronemophobic/dewey.clj
You can evaluate (def all-repos-vec (vec (find-clojure-repos))) and it should only fetch the most recently updated repos (given that the current working directory contains all-repos.tsv).
Can you try it out and see if it works as expected for you?
It's been about an hour since I ran the update, and for me the console now outputs:
(def all-repos-vec (vec (find-clojure-repos)))
continuing at {:url https://api.github.com/search/repositories?per_page=100&sort=updated&order=asc&q=language%3Aclojure+pushed%3A%3E%3D2023-06-21T15%3A30%3A48Z&page=4, :pushed_at nil}
0 {:url "https://api.github.com/search/repositories?per_page=100&sort=updated&order=asc&q=language%3Aclojure+pushed%3A%3E%3D2023-06-21T15%3A30%3A48Z&page=4", :pushed_at nil}
new items: 14 , total items: 86186 , spent 2655 ms
1 {:pushed_at "2023-06-25T21:31:18Z"}
limit remaining 29
new items: 0 , total items: 86186 , spent 664 ms
all-repos.tsv is an append-only file (for now). I'm sure it and the code could be improved.
Edit: I'm not 100% sure this method is bulletproof. Can you see any problems with it? It sorts by pushed date ascending, does basic de-duplication, and continues from the place where it last got results.
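As a rough illustration, the resume step could work along these lines. This is a minimal sketch, not the actual dewey code: last-pushed-at, build-query, and the assumption that the timestamp sits in the second TSV column are all hypothetical.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper: read the most recent pushed_at value from the
;; append-only TSV, assuming the timestamp is the second column.
(defn last-pushed-at [tsv-path]
  (when (.exists (java.io.File. tsv-path))
    (some-> (slurp tsv-path)
            str/split-lines
            last
            (str/split #"\t")
            second)))

;; Hypothetical helper: build the search query, resuming from the
;; stored timestamp when one exists.
(defn build-query [pushed-at]
  (if pushed-at
    (str "language:clojure pushed:>=" pushed-at)
    "language:clojure"))
```

With a populated all-repos.tsv in the working directory, (build-query (last-pushed-at "all-repos.tsv")) would yield a query that only matches repos pushed since the last run.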
Sorry about the formatting diff in the commit.
Regards.
from dewey.
Hi again, and thanks!
I discovered a bug when continuing from a &page=10 URL: the indexing would then stop.
It's fixed in main at https://github.com/ivarref/dewey
I changed the data format to newline-delimited EDN. all-repos.edn is now 456 MB! GitHub only allows 100 MB per file.
all-repos.edn now also stores all the information returned from the GitHub query. I'm not sure how much of that information is actually useful/needed.
git-lfs is one option for storing large files, and it's recommended by GitHub. Any thoughts on that?
Each line/entry in all-repos.edn contains the key :session-index. For new items the value will be one greater than in the previous run of find-clojure-repos, i.e. you could use this value to select only what has changed since the last run. Does that make sense? I'm not sure I explained it very well.
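To illustrate, selecting only the entries from the most recent run could look like this. This is a sketch assuming each line of all-repos.edn is an EDN map with a :session-index key; latest-session-entries is an illustrative name, not part of the code, and it assumes a non-empty file.

```clojure
(require '[clojure.edn :as edn]
         '[clojure.java.io :as io])

;; Hypothetical helper: keep only the entries written by the most
;; recent run, i.e. the ones with the highest :session-index.
(defn latest-session-entries [edn-path]
  (with-open [rdr (io/reader edn-path)]
    (let [entries (mapv edn/read-string (line-seq rdr))
          latest  (apply max (map :session-index entries))]
      (filterv #(= latest (:session-index %)) entries))))
```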
BTW: My original goal when looking at this code was to create something similar to https://github.com/phronmophobic/add-deps, but for the CLI and tools.deps/gitlibs.
Expanding the number of Clojure libraries dewey can index sounds great. There are a few different changes I'm thinking about for the data pipeline, so I'm not exactly sure what the timeline for integrating this kind of change would look like.
I was able to get 4431 repositories with 2 stars using this code:
Do you think that's close to the total number of 2 star repositories? If not, what do you think is the limiting factor?
Do you think that's close to the total number of 2 star repositories?
Yes, that's close to it; re-running it now gives 4477.
There is one theoretical bug in the "algorithm": if 1000+ repositories have the exact same pushed_at, the iteration could be stopped prematurely. I think that theoretical chance is safe to rule out. Otherwise I don't know any shortcomings of this method.
I ran the whole thing now (removed stars:2), and ended up with a total of 89245 Clojure repositories.
stardev, which is updated now and then (I'm not sure how often), gives 85761 as the number of Clojure repositories. The numbers are thus roughly the same.
I ran the whole thing now (removed stars:2), and ended up at a total of 89245 Clojure repositories.
So cool! How long did that take? Maybe that's an even better method. Also, in theory, once we have an up-to-date list, we would only need to query for libraries pushed since the last time the process was run.
89,245 is a lot of repositories to analyze every week, but it's probably not so bad if we only re-analyze libraries that have changed.
I'm getting about 30 repos/second:
(defn find-clojure-repos []
  (iteration
   (with-retries
     (fn [{:keys [start-time cnt url pushed_at last-response] :as k}]
       (prn cnt (select-keys k [:url :pushed_at]))
       (let [start-time (or start-time (System/currentTimeMillis))
             req (cond
                   ;; initial request
                   (= cnt 0) (search-repos-request "language:clojure")
                   ;; received next-url
                   url (assoc base-request :url url)
                   ;; received a pushed_at timestamp: continue from there
                   pushed_at (search-repos-request
                              (str "language:clojure pushed:>=" pushed_at))
                   :else (throw (Exception. (str "Unexpected key type: " (pr-str k)))))]
         (rate-limit-sleep! last-response)
         (let [response (http/request (with-auth req))
               prev-items (into #{} (get-in last-response [:body :items] []))
               page-items (get-in response [:body :items] [])
               new-items (vec (remove (partial contains? prev-items) page-items))
               new-cnt (+ cnt (count new-items))
               spent-time-seconds (/ (max 1 (- (System/currentTimeMillis) start-time))
                                     1000)
               repos-per-second (/ new-cnt spent-time-seconds)]
           (println "Repos/second:" (format "%.1f" (double repos-per-second)))
           (-> response
               (assoc :cnt new-cnt)
               (assoc :start-time start-time)
               (assoc ::key k
                      ::request req)
               (assoc-in [:body :items] new-items))))))
   :kf
   (fn [{:keys [cnt] :as response}]
     (let [url (-> response :links :next :href)]
       (when-let [m (if url
                      {:url url}
                      (when-let [pushed_at (some-> response :body :items last :pushed_at)]
                        {:pushed_at pushed_at}))]
         (merge m
                (select-keys response [:cnt :start-time])
                {:last-response response}))))
   :initk {:cnt 0}))
and 90,000 repos / 30 repos per second = 3,000 seconds, i.e. ~50 minutes to fetch it all.
(Edit: forgive me if my math is way off; it's late and I didn't double-check anything. But 50 minutes sounds reasonable.)
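The back-of-the-envelope math checks out; as a quick sanity check:

```clojure
;; Back-of-the-envelope check of the full-fetch estimate.
(def total-repos 90000)
(def repos-per-second 30)
(def total-seconds (/ total-repos repos-per-second)) ; => 3000
(def total-minutes (/ total-seconds 60))             ; => 50
```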
It sounds like a good idea to avoid re-analyzing everything.
Do you have an enterprise account? I thought the normal rate limit was around 5k/hr.
I suspect I wasn't being clear enough. I don't have an enterprise account.
For me it was evaluating:
(def all-repos (vec (find-clojure-repos)))
that took 50 minutes.
Does that make sense?
Does this method account for GitHub's rate limiting?
Otherwise, I'm trying to figure out how it gets the data so quickly while staying under GitHub's rate limit.
Yes, I believe it does.
Are you sure that com.phronemophobic.dewey.util/auth is a proper value? I noticed that nil is a legal value (and no warning will be printed).
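A defensive check along these lines could make the nil case loud. checked-auth is a hypothetical helper, not part of dewey; it assumes whatever shape the auth value normally has.

```clojure
;; Hypothetical guard: warn loudly when no token is configured instead
;; of silently sending unauthenticated (heavily rate-limited) requests.
(defn checked-auth [auth]
  (when (nil? auth)
    (binding [*out* *err*]
      (println "WARNING: no GitHub token configured;"
               "requests will be unauthenticated.")))
  auth)
```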
Here is the exact code that's running:
https://github.com/ivarref/dewey
I believe I created the token with as few permissions as possible. Maybe that makes a difference?
Does that help?
And a sample output from the console:
(def all-repos (vec (find-clojure-repos)))
0 {:pushed_at nil}
Time for http/request ... SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
3379 ms
Repos/second: 29.6
100 {:url "https://api.github.com/search/repositories?per_page=100&sort=updated&order=asc&q=language%3Aclojure&page=2"}
limit remaining 29
Time for http/request ... 2633 ms
Repos/second: 33.2
200 {:url "https://api.github.com/search/repositories?per_page=100&sort=updated&order=asc&q=language%3Aclojure&page=3"}
limit remaining 28
Time for http/request ... 4666 ms
Repos/second: 28.0
300 {:url "https://api.github.com/search/repositories?per_page=100&sort=updated&order=asc&q=language%3Aclojure&page=4"}
limit remaining 27
Time for http/request ... 2885 ms
Repos/second: 29.4
400 {:url "https://api.github.com/search/repositories?per_page=100&sort=updated&order=asc&q=language%3Aclojure&page=5"}
limit remaining 26
Time for http/request ... 2874 ms
Repos/second: 30.4
500 {:url "https://api.github.com/search/repositories?per_page=100&sort=updated&order=asc&q=language%3Aclojure&page=6"}
limit remaining 25
Time for http/request ... 2800 ms
Repos/second: 31.1
600 {:url "https://api.github.com/search/repositories?per_page=100&sort=updated&order=asc&q=language%3Aclojure&page=7"}
limit remaining 24
Time for http/request ... 2969 ms
Repos/second: 31.5
700 {:url "https://api.github.com/search/repositories?per_page=100&sort=updated&order=asc&q=language%3Aclojure&page=8"}
limit remaining 23
Time for http/request ... 2635 ms
Repos/second: 32.1
800 {:url "https://api.github.com/search/repositories?per_page=100&sort=updated&order=asc&q=language%3Aclojure&page=9"}
limit remaining 22
Time for http/request ... 2528 ms
Repos/second: 32.8
900 {:url "https://api.github.com/search/repositories?per_page=100&sort=updated&order=asc&q=language%3Aclojure&page=10"}
limit remaining 21
Time for http/request ... 3603 ms
Repos/second: 32.2
1000 {:pushed_at "2010-12-22T17:06:58Z"}
limit remaining 20
Time for http/request ... 2627 ms
Repos/second: 32.6
1099 {:url "https://api.github.com/search/repositories?per_page=100&sort=updated&order=asc&q=language%3Aclojure+pushed%3A%3E%3D2010-12-22T17%3A06%3A58Z&page=2"}
limit remaining 19
Time for http/request ... 2885 ms
Repos/second: 32.8
1199 {:url "https://api.github.com/search/repositories?per_page=100&sort=updated&order=asc&q=language%3Aclojure+pushed%3A%3E%3D2010-12-22T17%3A06%3A58Z&page=3"}
limit remaining 18
Time for http/request ... 2792 ms
Repos/second: 33.0
1299 {:url "https://api.github.com/search/repositories?per_page=100&sort=updated&order=asc&q=language%3Aclojure+pushed%3A%3E%3D2010-12-22T17%3A06%3A58Z&page=4"}
limit remaining 17
Time for http/request ... 2933 ms
Repos/second: 33.1
1399 {:url "https://api.github.com/search/repositories?per_page=100&sort=updated&order=asc&q=language%3Aclojure+pushed%3A%3E%3D2010-12-22T17%3A06%3A58Z&page=5"}
limit remaining 16
Time for http/request ... 2861 ms
Repos/second: 33.2
1499 {:url "https://api.github.com/search/repositories?per_page=100&sort=updated&order=asc&q=language%3Aclojure+pushed%3A%3E%3D2010-12-22T17%3A06%3A58Z&page=6"}
limit remaining 15
Time for http/request ... 2622 ms
Repos/second: 33.5
1599 {:url "https://api.github.com/search/repositories?per_page=100&sort=updated&order=asc&q=language%3Aclojure+pushed%3A%3E%3D2010-12-22T17%3A06%3A58Z&page=7"}
limit remaining 14
Time for http/request ... 2728 ms
Repos/second: 33.6
1699 {:url "https://api.github.com/search/repositories?per_page=100&sort=updated&order=asc&q=language%3Aclojure+pushed%3A%3E%3D2010-12-22T17%3A06%3A58Z&page=8"}
limit remaining 13
Time for http/request ... 2826 ms
Repos/second: 33.7
1799 {:url "https://api.github.com/search/repositories?per_page=100&sort=updated&order=asc&q=language%3Aclojure+pushed%3A%3E%3D2010-12-22T17%3A06%3A58Z&page=9"}
limit remaining 12
Time for http/request ... 2835 ms
Repos/second: 33.8
1899 {:url "https://api.github.com/search/repositories?per_page=100&sort=updated&order=asc&q=language%3Aclojure+pushed%3A%3E%3D2010-12-22T17%3A06%3A58Z&page=10"}
limit remaining 11
Time for http/request ... 2927 ms
Repos/second: 33.8
1999 {:pushed_at "2011-09-17T18:08:12Z"}
limit remaining 10
Time for http/request ... 2957 ms
Repos/second: 33.8
2098 {:url "https://api.github.com/search/repositories?per_page=100&sort=updated&order=asc&q=language%3Aclojure+pushed%3A%3E%3D2011-09-17T18%3A08%3A12Z&page=2"}
limit remaining 9
Time for http/request ... 2719 ms
Repos/second: 33.9
2198 {:url "https://api.github.com/search/repositories?per_page=100&sort=updated&order=asc&q=language%3Aclojure+pushed%3A%3E%3D2011-09-17T18%3A08%3A12Z&page=3"}
limit remaining 29
Time for http/request ... 3089 ms
Repos/second: 33.8
2298 {:url "https://api.github.com/search/repositories?per_page=100&sort=updated&order=asc&q=language%3Aclojure+pushed%3A%3E%3D2011-09-17T18%3A08%3A12Z&page=4"}
limit remaining 28
Time for http/request ... 3057 ms
Repos/second: 33.8
2398 {:url "https://api.github.com/search/repositories?per_page=100&sort=updated&order=asc&q=language%3Aclojure+pushed%3A%3E%3D2011-09-17T18%3A08%3A12Z&page=5"}
limit remaining 27
Time for http/request ... 2845 ms
Repos/second: 33.8
2498 {:url "https://api.github.com/search/repositories?per_page=100&sort=updated&order=asc&q=language%3Aclojure+pushed%3A%3E%3D2011-09-17T18%3A08%3A12Z&page=6"}
limit remaining 26
Time for http/request ... 2879 ms
Repos/second: 33.9
2598 {:url "https://api.github.com/search/repositories?per_page=100&sort=updated&order=asc&q=language%3Aclojure+pushed%3A%3E%3D2011-09-17T18%3A08%3A12Z&page=7"}
limit remaining 25
Time for http/request ... 2632 ms
Repos/second: 34.0
2698 {:url "https://api.github.com/search/repositories?per_page=100&sort=updated&order=asc&q=language%3Aclojure+pushed%3A%3E%3D2011-09-17T18%3A08%3A12Z&page=8"}
limit remaining 24
Time for http/request ... 3073 ms
Repos/second: 34.0
2798 {:url "https://api.github.com/search/repositories?per_page=100&sort=updated&order=asc&q=language%3Aclojure+pushed%3A%3E%3D2011-09-17T18%3A08%3A12Z&page=9"}
limit remaining 23
Time for http/request ... 3263 ms
Repos/second: 33.8
2898 {:url "https://api.github.com/search/repositories?per_page=100&sort=updated&order=asc&q=language%3Aclojure+pushed%3A%3E%3D2011-09-17T18%3A08%3A12Z&page=10"}
limit remaining 22
Time for http/request ... 2892 ms
Repos/second: 33.8
2998 {:pushed_at "2012-03-02T23:45:08Z"}
limit remaining 21
Time for http/request ... 2852 ms
Repos/second: 33.9
3097 {:url "https://api.github.com/search/repositories?per_page=100&sort=updated&order=asc&q=language%3Aclojure+pushed%3A%3E%3D2012-03-02T23%3A45%3A08Z&page=2"}
limit remaining 20
Time for http/request ... 3202 ms
Repos/second: 33.8
3197 {:url "https://api.github.com/search/repositories?per_page=100&sort=updated&order=asc&q=language%3Aclojure+pushed%3A%3E%3D2012-03-02T23%3A45%3A08Z&page=3"}
limit remaining 19
Time for http/request ... 2941 ms
Repos/second: 33.8
3297 {:url "https://api.github.com/search/repositories?per_page=100&sort=updated&order=asc&q=language%3Aclojure+pushed%3A%3E%3D2012-03-02T23%3A45%3A08Z&page=4"}
limit remaining 18
Time for http/request ... 2839 ms
Repos/second: 33.8
3397 {:url "https://api.github.com/search/repositories?per_page=100&sort=updated&order=asc&q=language%3Aclojure+pushed%3A%3E%3D2012-03-02T23%3A45%3A08Z&page=5"}
limit remaining 17
Time for http/request ... 2960 ms
Repos/second: 33.8
3497 {:url "https://api.github.com/search/repositories?per_page=100&sort=updated&order=asc&q=language%3Aclojure+pushed%3A%3E%3D2012-03-02T23%3A45%3A08Z&page=6"}
limit remaining 16
Time for http/request ... 2781 ms
Repos/second: 33.9
3597 {:url "https://api.github.com/search/repositories?per_page=100&sort=updated&order=asc&q=language%3Aclojure+pushed%3A%3E%3D2012-03-02T23%3A45%3A08Z&page=7"}
limit remaining 15
Time for http/request ... 3289 ms
Repos/second: 33.8
3697 {:url "https://api.github.com/search/repositories?per_page=100&sort=updated&order=asc&q=language%3Aclojure+pushed%3A%3E%3D2012-03-02T23%3A45%3A08Z&page=8"}
limit remaining 14
Time for http/request ... 2865 ms
Repos/second: 33.8
3797 {:url "https://api.github.com/search/repositories?per_page=100&sort=updated&order=asc&q=language%3Aclojure+pushed%3A%3E%3D2012-03-02T23%3A45%3A08Z&page=9"}
limit remaining 13
Time for http/request ... 2818 ms
Repos/second: 33.8
3897 {:url "https://api.github.com/search/repositories?per_page=100&sort=updated&order=asc&q=language%3Aclojure+pushed%3A%3E%3D2012-03-02T23%3A45%3A08Z&page=10"}
limit remaining 12
Time for http/request ... 2881 ms
Repos/second: 33.9
3997 {:pushed_at "2012-07-07T13:28:53Z"}
limit remaining 11
Time for http/request ... 3006 ms
Repos/second: 33.8
4096 {:url "https://api.github.com/search/repositories?per_page=100&sort=updated&order=asc&q=language%3Aclojure+pushed%3A%3E%3D2012-07-07T13%3A28%3A53Z&page=2"}
limit remaining 10
Time for http/request ... 2736 ms
Repos/second: 33.9
4196 {:url "https://api.github.com/search/repositories?per_page=100&sort=updated&order=asc&q=language%3Aclojure+pushed%3A%3E%3D2012-07-07T13%3A28%3A53Z&page=3"}
limit remaining 9
Time for http/request ... 3017 ms
Repos/second: 33.9
4296 {:url "https://api.github.com/search/repositories?per_page=100&sort=updated&order=asc&q=language%3Aclojure+pushed%3A%3E%3D2012-07-07T13%3A28%3A53Z&page=4"}
limit remaining 29
Time for http/request ... 2940 ms
Repos/second: 33.9
4396 {:url "https://api.github.com/search/repositories?per_page=100&sort=updated&order=asc&q=language%3Aclojure+pushed%3A%3E%3D2012-07-07T13%3A28%3A53Z&page=5"}
limit remaining 28
Time for http/request ... 3171 ms
Repos/second: 33.8
4496 {:url "https://api.github.com/search/repositories?per_page=100&sort=updated&order=asc&q=language%3Aclojure+pushed%3A%3E%3D2012-07-07T13%3A28%3A53Z&page=6"}
limit remaining 27
Time for http/request ... 2874 ms
Repos/second: 33.8
4596 {:url "https://api.github.com/search/repositories?per_page=100&sort=updated&order=asc&q=language%3Aclojure+pushed%3A%3E%3D2012-07-07T13%3A28%3A53Z&page=7"}
limit remaining 26
Time for http/request ... 2910 ms
Repos/second: 33.8
4696 {:url "https://api.github.com/search/repositories?per_page=100&sort=updated&order=asc&q=language%3Aclojure+pushed%3A%3E%3D2012-07-07T13%3A28%3A53Z&page=8"}
limit remaining 25
Time for http/request ... 2935 ms
Repos/second: 33.8
4796 {:url "https://api.github.com/search/repositories?per_page=100&sort=updated&order=asc&q=language%3Aclojure+pushed%3A%3E%3D2012-07-07T13%3A28%3A53Z&page=9"}
limit remaining 24
Time for http/request ... 3348 ms
Repos/second: 33.8
4896 {:url "https://api.github.com/search/repositories?per_page=100&sort=updated&order=asc&q=language%3Aclojure+pushed%3A%3E%3D2012-07-07T13%3A28%3A53Z&page=10"}
limit remaining 23
Time for http/request ... 2933 ms
Repos/second: 33.8
4996 {:pushed_at "2012-10-29T22:18:09Z"}
limit remaining 22
Time for http/request ... 2840 ms
Repos/second: 33.8
5095 {:url "https://api.github.com/search/repositories?per_page=100&sort=updated&order=asc&q=language%3Aclojure+pushed%3A%3E%3D2012-10-29T22%3A18%3A09Z&page=2"}
limit remaining 21
Time for http/request ... 3062 ms
Repos/second: 33.8
5195 {:url "https://api.github.com/search/repositories?per_page=100&sort=updated&order=asc&q=language%3Aclojure+pushed%3A%3E%3D2012-10-29T22%3A18%3A09Z&page=3"}
limit remaining 20
Time for http/request ... 2862 ms
Repos/second: 33.8
5295 {:url "https://api.github.com/search/repositories?per_page=100&sort=updated&order=asc&q=language%3Aclojure+pushed%3A%3E%3D2012-10-29T22%3A18%3A09Z&page=4"}
limit remaining 19
Time for http/request ... 2998 ms
Repos/second: 33.8
5395 {:url "https://api.github.com/search/repositories?per_page=100&sort=updated&order=asc&q=language%3Aclojure+pushed%3A%3E%3D2012-10-29T22%3A18%3A09Z&page=5"}
limit remaining 18
Time for http/request ... 2951 ms
Repos/second: 33.8
5495 {:url "https://api.github.com/search/repositories?per_page=100&sort=updated&order=asc&q=language%3Aclojure+pushed%3A%3E%3D2012-10-29T22%3A18%3A09Z&page=6"}
limit remaining 17
...
What is 5k per hour? Is that 5k requests?
If there are 90,000 repos and each request fetches ~100, that only needs 900 requests (and it finishes in 50 minutes here).
Oh, got it. I read 30 repos per second as 30 requests per second for some reason. It all makes sense now 👍
This is really great stuff. I'm pretty excited about getting it integrated.
This will expand the dataset quite a bit which might require some additional changes. Just brainstorming a bit:
- Currently, the analyzer analyzes every project every week. I've been meaning to change the pipeline so it only analyzes projects that have been updated. A quick workaround might be to just analyze projects with 3 or more stars.
- All of the data is released as .edn files. That's fine for small amounts of data, but we should probably find a better data format to allow for more data! Potentially, that might just be newline-delimited EDN.
- The web interface requests all the data and searches it locally in the browser. I'm not sure that will work with 5-6x more data. Currently, the web interface is completely static for ease and cost. There's probably a better alternative, but I haven't had the time to settle on a different one. Anyway, only including repos with 3 or more stars might be a workaround here as well.
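For reference, one appeal of newline-delimited EDN is that it is trivial to produce and consume incrementally. A sketch, with illustrative helper names (append-entry!, read-entries):

```clojure
(require '[clojure.edn :as edn]
         '[clojure.java.io :as io])

;; Appending one pr-str'd map per line keeps the file valid after
;; every write, which is what makes tail -f and resumable runs work.
(defn append-entry! [path entry]
  (spit path (str (pr-str entry) "\n") :append true))

;; Read the entries back one line at a time instead of slurping the
;; whole file, which matters once the dump reaches hundreds of MB.
(defn read-entries [path]
  (with-open [rdr (io/reader path)]
    (mapv edn/read-string (line-seq rdr))))
```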
Most of dewey uses .edn files to store data between runs and steps. Any particular reason to use .tsv rather than .edn for all-repos.tsv?
GitHub only allows for 100M per file.
I think that only applies to objects in a git repository. Currently, data dumps are only being uploaded as part of releases, where we probably won't have to worry about the limit. The static analysis EDN is already approaching 1 GB. I think their docs say there isn't a specified limit.
I also have an s3 bucket that I've been using for temporary storage, but would be open to using it for public data if it makes sense.
all-repos.edn now also stores all information returned from the GitHub query. I'm not sure how much of that information is actually useful/needed.
The reasoning is based on practical experience building ETL pipelines. Since storage is so cheap, I find that the easiest way to stay sane is to have a dumb step that only fetches data and, separately, steps that process the data. The benefit is that if you decide you want to update your transform based on more info, you can rerun the transform step without re-fetching data. Having a "dumb" fetch process also reduces the risk of bugs that cause data loss. There are reasons to combine the steps, but I don't think we're at the scale of data where optimizing for efficiency is that helpful.
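The fetch/transform split can be illustrated with a toy example. The record shape and names here are invented for illustration only:

```clojure
;; Invented raw records standing in for verbatim GitHub API responses,
;; kept untouched on disk by the "dumb" fetch step.
(def raw-records
  [{:full_name "a/b" :pushed_at "2023-01-01T00:00:00Z" :size 123 :forks 4}
   {:full_name "c/d" :pushed_at "2023-02-02T00:00:00Z" :size 456 :forks 7}])

;; The transform is a separate, rerunnable step: if we later decide we
;; also want :forks, we change this function and rerun it over the raw
;; store, with no re-fetching required.
(defn transform [record]
  (select-keys record [:full_name :pushed_at]))

(def derived (mapv transform raw-records))
```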
BTW: My original goal when looking at this code, was to create something similar to https://github.com/phronmophobic/add-deps but for the CLI and tools.deps/gitlibs.
I do have a GUI version based on the latest Clojure alpha: https://github.com/phronmophobic/add-deps/blob/main/src/com/phronemophobic/add_deps2.clj. I still think the most annoying part is figuring out the best way to make the data available. My latest brainstorm is to try hosting a read-only db on s3 using xtdb or datomic.
Hi again @phronmophobic
Thanks for your input. I totally agree with your comments on storage: it is indeed cheap.
I think a person who wants to develop/modify dewey can be expected to download a release and continue from the all-repos.edn in that release if they want to fetch the latest changes.
The web interface requests all the data and searches it locally in the browser. I'm not sure that will work with 5-6x more data.
If dewey is about git libraries, how about removing projects with zero tags and (perhaps) zero stars? My guess is that there are many such projects.
PS: Catching up with the latest data is fast. I executed the following locally:
(def all-repos-vec (time (vec (find-clojure-repos {}))))
0 nil {"q" "language:clojure pushed:>=2023-06-20T19:03:33Z", "page" "7"}
Saved 100 items to disk
New items: 100 , total items: 86276 , spent 4406 ms
limit remaining 29
1 nil {"q" "language:clojure pushed:>=2023-06-20T19:03:33Z", "page" "8"}
limit remaining 29
Saved 85 items to disk
New items: 85 , total items: 86361 , spent 2976 ms
limit remaining 28
2 "2023-07-04T09:05:50Z" nil
limit remaining 28
Saved 0 items to disk
New items: 0 , total items: 86361 , spent 366 ms
"Elapsed time: 19060.095366 msecs"
=> #'com.phronemophobic.dewey/all-repos-vec
That's two weeks of Clojure git(hub) changes in 19 seconds.
PS 2: A more human-readable and incremental view of all-repos.edn is possible like this:
$ tail -f all-repos.edn | bb -I --stream -e '(println (select-keys *input* [:full_name :pushed_at]))'
(I'm personally not very familiar with babashka, and it's my first time using -I --stream and -e, so I figured I'd share it.)
Edit: I'll be busy for some time now, so I may not be able to respond in a very timely manner. I hope you make progress; comment here as you like, and of course you may use ivarref/dewey however you find useful. The function name find-clojure-repos is misnamed; it should be called something like find-new-or-updated-clojure-repos. But you know this, of course. All the best.
Regards.