ullaakut / astronomer
A tool to detect illegitimate stars from bot accounts on GitHub projects
License: MIT License
Using the same GitHub personal token on Linux (Windows Subsystem for Linux) and on Windows on the same PC, the astronomer
Linux Go-based executable works fine, but the Windows Go-based executable fails with an error like:
astronomer.exe username/repo
Beginning fetching process for repository username/repo
Pre-fetching all stargazers...ko
✖ failed to query stargazer data: unable to write user contribution data to cache:
unable to create cache file: open data\username\repo\https-api-github-com-graphql-list-firstpage:
The system cannot find the path specified.
See https://shields.io/endpoint. Host it at astronomer.ullaakut.eu.
Hello. Nice project. I ran it on appsquickly/typhoon, which is still active but slowly being sunsetted.
I was surprised that it only reported a 'B' score. The report is as follows:
Beginning fetching process for repository appsquickly/Typhoon
Pre-fetching all stargazers...ok
> Selecting 200 first stargazers out of 2656
> Selecting 800 random stargazers out of 2656
Fetching contributions for 1000 users up to year 2013
Building trust report...ok
Averages Score Trust
-------- ----- -----
Weighted contributions: 20234 B
Private contributions: 447 A
Created issues: 12 C
Commits authored: 227 C
Repositories: 40 A
Pull requests: 10 D
Code reviews: 2 E
Account age (days): 2275 A
5th percentile: 26 A
10th percentile: 70 A
15th percentile: 106 A
20th percentile: 192 A
25th percentile: 313 A
30th percentile: 495 A
35th percentile: 626 A
40th percentile: 968 A
45th percentile: 1181 A
50th percentile: 1470 A
55th percentile: 2192 A
60th percentile: 2586 B
65th percentile: 3969 B
70th percentile: 5271 B
75th percentile: 7115 B
80th percentile: 10357 B
85th percentile: 14953 C
90th percentile: 34799 A
95th percentile: 135676 A
----------------------------------------------------------
Overall trust: B
✔ Analysis successful. 1000 users computed.
What does the pull-requests metric mean? That the project didn't have many pull requests? Or that the users who starred the project don't make many?
I gave trusted committers push access. <-- Maybe this is useful?
Again, does that mean the users who starred the repo didn't do code reviews, or that we didn't?
Just sharing some #random feedback. Please close this issue once received. Again, very nice project.
Currently, we generate one report for the first 200 users and another one for the rest. When there are fewer than 20 remaining users, the second report is empty and takes priority during the computation.
This needs to be fixed ASAP.
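One way to avoid an empty report taking priority is to ignore it when combining the two. The sketch below is illustrative only: the `report` type, the `combine` function, and the user-weighted average are assumptions, not astronomer's actual data model or scoring.

```go
package main

import "fmt"

// report is a hypothetical, simplified trust report: the number of users it
// covers and a score between 0 and 1.
type report struct {
	users int
	score float64
}

// combine merges the "first stargazers" report with the "remaining
// stargazers" report. An empty report is ignored instead of taking priority.
// The boolean is false when neither report covers any user.
func combine(first, rest report) (float64, bool) {
	switch {
	case first.users == 0 && rest.users == 0:
		return 0, false
	case rest.users == 0:
		return first.score, true
	case first.users == 0:
		return rest.score, true
	}
	// Assumption: weight each report by the number of users it covers.
	total := float64(first.users + rest.users)
	return (first.score*float64(first.users) + rest.score*float64(rest.users)) / total, true
}

func main() {
	score, ok := combine(report{users: 200, score: 0.5}, report{})
	fmt.Println(score, ok)
}
```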
The v4 API is GraphQL based. So it will drastically cut down on the number of requests needed.
Beginning fetching process for repository icecrime/poule
Pre-fetching all stargazers...ok
> All 216 stargazers will be scanned
This repository appears to have a low amount of stargazers. Trust calculations might not be accurate.
Fetching contributions for 216 users up to year 2013
[=>------------------------------------------------------------] ETA: 6h13m23s Elapsed: 9m34s Progress: 3 %
Failed to fetch user contributions from GitHub API too many times.
panic: runtime error: index out of range
goroutine 1 [running]:
github.com/ullaakut/astronomer/pkg/gql.FetchContributions(0xc00021bf30, 0xc000176300, 0xa, 0x10, 0x7dd, 0x0, 0x100, 0x14fc936, 0x32, 0xc0001bfbb8)
/Users/ullaakut/Work/go/src/github.com/ullaakut/astronomer/pkg/gql/fetch.go:315 +0x1992
main.detectFakeStars(0xc00021bf30, 0x14e7be3, 0x7)
/Users/ullaakut/Work/go/src/github.com/ullaakut/astronomer/main.go:106 +0x3f8
main.main()
/Users/ullaakut/Work/go/src/github.com/ullaakut/astronomer/main.go:80 +0x4b9
Need to use httptest and write slightly complex tests; this will take more time than the rest.
Thank you for this very interesting project. Here are a few of the tests I ran while using it.
I first tested my personal project, which has about 3.9k stars; the result wasn't so good.
$ docker run -t -e GITHUB_TOKEN=$GITHUB_TOKEN -v "/Users/changkun/dev/mct:/data/" ullaakut/astronomer changkun/modern-cpp-tutorial [22:00:10]
Beginning fetching process for repository changkun/modern-cpp-tutorial
Pre-fetching all stargazers...ok
> Selecting 200 first stargazers out of 3930
> Selecting 800 random stargazers out of 3930
Fetching contributions for 1000 users up to year 2013
Building trust report...ok
Averages Score Trust
-------- ----- -----
Weighted contributions: 4132 E
Private contributions: 65 E
Created issues: 9 D
Commits authored: 238 C
Repositories: 37 A
Pull requests: 6 E
Code reviews: 2 E
Account age (days): 1444 B
5th percentile: 9 A
10th percentile: 24 A
15th percentile: 59 A
20th percentile: 85 B
25th percentile: 111 C
30th percentile: 157 C
35th percentile: 194 D
40th percentile: 328 C
45th percentile: 436 C
50th percentile: 541 D
55th percentile: 770 D
60th percentile: 899 D
65th percentile: 1255 D
70th percentile: 1579 D
75th percentile: 2599 D
80th percentile: 3652 D
85th percentile: 5277 E
90th percentile: 6836 E
95th percentile: 14190 E
----------------------------------------------------------
Overall trust: D
✔ Analysis successful. 1000 users computed.
GitHub badge available at https://img.shields.io/endpoint.svg?url=https%3A%2F%2Fastronomer.ullaakut.eu%2Fshields%3Fowner%3Dbilibili%26name%3Dkratos
Then I picked another project from the GitHub trending page:
$ docker run -t -e GITHUB_TOKEN=$GITHUB_TOKEN -v "/Users/changkun/dev/mct:/data/" ullaakut/astronomer bilibili/kratos [22:12:59]
Beginning fetching process for repository bilibili/kratos
Pre-fetching all stargazers...ok
> Selecting 200 first stargazers out of 5739
> Selecting 800 random stargazers out of 5739
Fetching contributions for 1000 users up to year 2013
Building trust report...ok
Averages Score Trust
-------- ----- -----
Weighted contributions: 2536 E
Private contributions: 71 E
Created issues: 6 D
Commits authored: 137 D
Repositories: 30 A
Pull requests: 6 D
Code reviews: 1 E
Account age (days): 1545 B
5th percentile: 9 A
10th percentile: 25 A
15th percentile: 43 A
20th percentile: 55 C
25th percentile: 74 D
30th percentile: 106 D
35th percentile: 146 D
40th percentile: 188 D
45th percentile: 245 D
50th percentile: 349 D
55th percentile: 490 D
60th percentile: 638 E
65th percentile: 832 E
70th percentile: 1092 E
75th percentile: 1577 E
80th percentile: 2072 E
85th percentile: 3117 E
90th percentile: 5329 E
95th percentile: 9192 E
----------------------------------------------------------
Overall trust: D
✔ Analysis successful. 1000 users computed.
GitHub badge available at https://img.shields.io/endpoint.svg?url=https%3A%2F%2Fastronomer.ullaakut.eu%2Fshields%3Fowner%3Dbilibili%26name%3Dkratos
OK, then let's test TensorFlow.
$ docker run -t -e GITHUB_TOKEN=$GITHUB_TOKEN -v "/Users/changkun/dev/mct:/data/" ullaakut/astronomer tensorflow/tensorflow [23:32:47]
Beginning fetching process for repository tensorflow/tensorflow
Pre-fetching all stargazers...ok
> Selecting 200 first stargazers out of 131149
> Selecting 800 random stargazers out of 131149
Fetching contributions for 1000 users up to year 2013
Building trust report...ok
Averages Score Trust
-------- ----- -----
Weighted contributions: 7495 D
Private contributions: 190 C
Created issues: 18 B
Commits authored: 198 D
Repositories: 16 C
Pull requests: 10 D
Code reviews: 3 D
Account age (days): 1145 C
5th percentile: 1 E
10th percentile: 2 E
15th percentile: 5 E
20th percentile: 10 E
25th percentile: 22 E
30th percentile: 32 E
35th percentile: 40 E
40th percentile: 59 E
45th percentile: 76 E
50th percentile: 114 E
55th percentile: 153 E
60th percentile: 217 E
65th percentile: 368 E
70th percentile: 707 E
75th percentile: 1076 E
80th percentile: 2109 E
85th percentile: 3390 E
90th percentile: 14580 D
95th percentile: 30685 D
----------------------------------------------------------
Overall trust: D
✔ Analysis successful. 1000 users computed.
GitHub badge available at https://img.shields.io/endpoint.svg?url=https%3A%2F%2Fastronomer.ullaakut.eu%2Fshields%3Fowner%3Dtensorflow%26name%3Dtensorflow
This repo proposes a judging algorithm without a prior study of the algorithm's rationale. As a user of your algorithm, I would particularly expect the following supporting points on why the algorithm is accurate:
A theoretical analysis of the influence of each of the defined factors, with regression analysis and evidence of the algorithm's statistical stability.
Benchmarks on various projects that illustrate how your algorithm matches the theoretical analysis for the top 10 most valuable open source projects, like golang/go, torvalds/linux, etc.
"Those random stargazers can then sometimes be responsible for slight changes in the results, but they usually represent a difference of 1% to 3%, which is negligeable." -- README.md
May I ask how you reached this conclusion? How large are your test samples? What are they? etc.
A user study: an important way of evaluating usability issues is to hold a user study. Typically, a single score lacks expressiveness across many different aspects, and it is not easy to say whether the stars of a repo are seriously fake or unworthy. A quantitative analysis of, for example, how other users feel about the score provided by the algorithm should be seriously considered: does the score match their mental expectation? Why? How could we help?
SendReport(ctx *context.Context, report *trust.Report) error
Check(report *SignedReport) error
I mistakenly forgot to take the percentile values into account when computing the overall trust factor. Will do.
A --scanFirstStars option to scan the (by default) up to 1000 first stars of a repository (can be changed with option -s).
In query.go, simply make the getCursors function return the last cursors if the scanFirstStars option is enabled.
This will make it easy to detect foul play in repositories which bought/botted their first stars and have now achieved organic growth.
Dependabot couldn't parse the go.mod found at /go.mod.
The error Dependabot encountered was:
go: github.com/spf13/[email protected] requires
github.com/grpc-ecosystem/[email protected] requires
gopkg.in/[email protected]: invalid version: git fetch -f origin refs/heads/*:refs/heads/* refs/tags/*:refs/tags/* in /opt/go/gopath/pkg/mod/cache/vcs/9241c28341fcedca6a799ab7a465dd6924dc5d94044cbfabb75778817250adfc: exit status 128:
fatal: The remote end hung up unexpectedly
99% A+
80-98% A
60-80% B
45-50% C
35-45% D
25-35% E
0-25% F
Something like that.
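The proposed scale could be sketched as a simple threshold lookup. Note that the ranges as written leave 50-60% and 98-99% unassigned; the sketch assumes C extends up to 60% and A up to 99%. The `grade` function is hypothetical.

```go
package main

import "fmt"

// grade maps a trust percentage to a letter using the lower bound of each
// proposed range.
func grade(percent float64) string {
	switch {
	case percent >= 99:
		return "A+"
	case percent >= 80: // assumption: A also covers the unassigned 98-99% range
		return "A"
	case percent >= 60:
		return "B"
	case percent >= 45: // assumption: C also covers the unassigned 50-60% range
		return "C"
	case percent >= 35:
		return "D"
	case percent >= 25:
		return "E"
	default:
		return "F"
	}
}

func main() {
	for _, p := range []float64{99.5, 85, 70, 55, 40, 30, 10} {
		fmt.Printf("%.1f%% -> %s\n", p, grade(p))
	}
}
```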
-d for detailed reports
--cachedir for specifying a custom cache directory
After I saw astronomer on HN, I put together this Homebrew tap to ease installation on OSX: https://github.com/dkanejs/homebrew-astronomer
It would be cool to get this into official Homebrew itself, or to have you as the maintainer of the tap under your own GitHub namespace. That way you can also update the formulae with each release 👍
What do you think?
updateUsers(users []User, response listStargazersResponse, year int) []User
getCursors(ctx *context.Context, sg []stargazers, totalUsers uint) []string
buildRequestBody(ctx *context.Context, baseRequest string, pagination int) string
getCursor(cursors []string, page int, reverseOrder bool) string
pickRandomStringsExcept(s []string, picked []string, amount uint) []string
isBlacklisted(user string) bool
parseResponse(resp *http.Response) (*listStargazersResponse, []byte, error)
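For the random-selection helper in this list, a test mostly needs to check two properties: the result never contains an excluded entry and never exceeds the requested amount. The sketch below shows one possible implementation with that behavior. `pickRandomExcept` is a hypothetical name, and the behavior of the real pickRandomStringsExcept is assumed from its signature, not confirmed.

```go
package main

import (
	"fmt"
	"math/rand"
)

// pickRandomExcept returns up to amount elements of s, in random order,
// skipping every element that already appears in picked.
func pickRandomExcept(s []string, picked []string, amount uint) []string {
	excluded := make(map[string]struct{}, len(picked))
	for _, p := range picked {
		excluded[p] = struct{}{}
	}

	var candidates []string
	for _, v := range s {
		if _, skip := excluded[v]; !skip {
			candidates = append(candidates, v)
		}
	}

	rand.Shuffle(len(candidates), func(i, j int) {
		candidates[i], candidates[j] = candidates[j], candidates[i]
	})
	if uint(len(candidates)) > amount {
		candidates = candidates[:amount]
	}
	return candidates
}

func main() {
	cursors := []string{"a", "b", "c", "d", "e"}
	fmt.Println(pickRandomExcept(cursors, []string{"a", "b"}, 2))
}
```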
--fast (maybe turned on by default?) for big repositories (>2K stars) where Astronomer would
This would greatly reduce the scan time while remaining fairly accurate.
Dependabot couldn't parse the go.mod found at /go.mod.
The error Dependabot encountered was:
go: github.com/spf13/[email protected] requires
github.com/grpc-ecosystem/[email protected] requires
gopkg.in/[email protected]: invalid version: git fetch --unshallow -f origin in /opt/go/gopath/pkg/mod/cache/vcs/748bced43cf7672b862fbc52430e98581510f4f2c34fb30c0064b7102a68ae2c: exit status 128:
fatal: The remote end hung up unexpectedly
Need to use httptest and write slightly complex tests; this will take more time than the rest.
getCache(ctx *context.Context, req *http.Request, pagination string) (*http.Response, error)
readCachedResponse(filename string, req *http.Request) (*http.Response, error)
putCache(ctx *context.Context, req *http.Request, pagination string, body []byte) error
cacheEntryFilename(ctx *context.Context, url string) string
listFilePagination(cursor string) string
contribFilePagination(cursor string, year int) string
I like trying to find new projects by searching for random things on GitHub and sorting the results by number of stars. I'd love to be able to do the same with trust score or, better yet, some sort of "trusted stars" metric which combines trust score with star count. Would this be possible given the data being curated by Astrolab?
#42 Yeah consider developer program guys and users who are a part of some organization!
Compute a basic graph of the evolution of user trustworthiness over time