astronomer's People

Contributors

dependabot-preview[bot], nicolascarpi, ullaakut, vbauerster

astronomer's Issues

Support Windows paths for cache

Using the same GitHub personal access token on the same PC, the Linux astronomer executable (under the Windows Subsystem for Linux) works fine, but the Windows executable fails with an error like:

astronomer.exe username/repo

Beginning fetching process for repository username/repo
Pre-fetching all stargazers...ko
✖ failed to query stargazer data: unable to write user contribution data to cache: 
unable to create cache file: open data\username\repo\https-api-github-com-graphql-list-firstpage: 
The system cannot find the path specified.

Code Reviews / Pull Requests

Hello. Nice project. I ran it on appsquickly/typhoon, which is still active but slowly being sunsetted.

I was surprised that it only reported a 'B' score. The report is as follows:

Beginning fetching process for repository appsquickly/Typhoon
Pre-fetching all stargazers...ok
  > Selecting 200 first stargazers out of 2656
  > Selecting 800 random stargazers out of 2656
Fetching contributions for 1000 users up to year 2013
Building trust report...ok

Averages                             Score           Trust
--------                             -----           -----
Weighted contributions:              20234             B
Private contributions:               447               A
Created issues:                      12                C
Commits authored:                    227               C
Repositories:                        40                A
Pull requests:                       10                D
Code reviews:                        2                 E
Account age (days):                  2275              A
5th percentile:                      26                A
10th percentile:                     70                A
15th percentile:                     106               A
20th percentile:                     192               A
25th percentile:                     313               A
30th percentile:                     495               A
35th percentile:                     626               A
40th percentile:                     968               A
45th percentile:                     1181              A
50th percentile:                     1470              A
55th percentile:                     2192              A
60th percentile:                     2586              B
65th percentile:                     3969              B
70th percentile:                     5271              B
75th percentile:                     7115              B
80th percentile:                     10357             B
85th percentile:                     14953             C
90th percentile:                     34799             A
95th percentile:                     135676            A
----------------------------------------------------------
Overall trust:                                         B

✔ Analysis successful. 1000 users computed.

What does the pull requests metric mean? That the project didn't have many pull requests, or that the users who starred the project don't make many?

I gave trusted committers push access; maybe this is relevant?

Likewise for code reviews: does that mean the users who starred the repo didn't do code reviews, or that we didn't?

Just sharing some #random feedback. Please close this issue once received. Again, very nice project.

Repos with between 201 and 219 stars all have 0% trust

Currently, we generate one report for the first 200 users and another for the rest. When fewer than 20 users remain, the second report is empty, yet it takes priority during the computation.

This needs to be fixed asap.
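One possible fix is to weight each partial report by the number of users it contains, so that a report with zero users contributes nothing to the result. A minimal sketch with a hypothetical `report` type and `mergeReports` function (not astronomer's actual data structures):

```go
package main

import "fmt"

// report is a hypothetical partial trust report: an average score computed
// over a number of users.
type report struct {
	users   int
	average float64
}

// mergeReports combines partial reports into one average weighted by user
// count. Empty reports (users == 0) carry zero weight, so they can no longer
// override a populated report.
func mergeReports(reports ...report) report {
	var total int
	var sum float64
	for _, r := range reports {
		if r.users == 0 {
			continue // skip empty reports entirely
		}
		total += r.users
		sum += r.average * float64(r.users)
	}
	if total == 0 {
		return report{}
	}
	return report{users: total, average: sum / float64(total)}
}

func main() {
	first := report{users: 200, average: 1500}
	rest := report{} // e.g. a repository with 210 stars: too few users left
	fmt.Printf("%+v\n", mergeReports(first, rest))
}
```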

Index out of range after failing to fetch

Beginning fetching process for repository icecrime/poule
Pre-fetching all stargazers...ok
  > All 216 stargazers will be scanned
This repository appears to have a low amount of stargazers. Trust calculations might not be accurate.
Fetching contributions for 216 users up to year 2013
 [=>------------------------------------------------------------] ETA: 6h13m23s Elapsed: 9m34s Progress: 3 %
Failed to fetch user contributions from GitHub API too many times.
panic: runtime error: index out of range

goroutine 1 [running]:
github.com/ullaakut/astronomer/pkg/gql.FetchContributions(0xc00021bf30, 0xc000176300, 0xa, 0x10, 0x7dd, 0x0, 0x100, 0x14fc936, 0x32, 0xc0001bfbb8)
	/Users/ullaakut/Work/go/src/github.com/ullaakut/astronomer/pkg/gql/fetch.go:315 +0x1992
main.detectFakeStars(0xc00021bf30, 0x14e7be3, 0x7)
	/Users/ullaakut/Work/go/src/github.com/ullaakut/astronomer/main.go:106 +0x3f8
main.main()
	/Users/ullaakut/Work/go/src/github.com/ullaakut/astronomer/main.go:80 +0x4b9

Documenting the algorithm and providing justification evidence

Thank you for this very interesting project. Here I share a few of the tests I ran with it.

I first tested my personal project, which has about 3.9k stars, and the result wasn't very good.

$ docker run -t -e GITHUB_TOKEN=$GITHUB_TOKEN -v "/Users/changkun/dev/mct:/data/" ullaakut/astronomer changkun/modern-cpp-tutorial
Beginning fetching process for repository changkun/modern-cpp-tutorial
Pre-fetching all stargazers...ok
  > Selecting 200 first stargazers out of 3930
  > Selecting 800 random stargazers out of 3930
Fetching contributions for 1000 users up to year 2013
Building trust report...ok

Averages                             Score           Trust
--------                             -----           -----
Weighted contributions:              4132              E
Private contributions:               65                E
Created issues:                      9                 D
Commits authored:                    238               C
Repositories:                        37                A
Pull requests:                       6                 E
Code reviews:                        2                 E
Account age (days):                  1444              B
5th percentile:                      9                 A
10th percentile:                     24                A
15th percentile:                     59                A
20th percentile:                     85                B
25th percentile:                     111               C
30th percentile:                     157               C
35th percentile:                     194               D
40th percentile:                     328               C
45th percentile:                     436               C
50th percentile:                     541               D
55th percentile:                     770               D
60th percentile:                     899               D
65th percentile:                     1255              D
70th percentile:                     1579              D
75th percentile:                     2599              D
80th percentile:                     3652              D
85th percentile:                     5277              E
90th percentile:                     6836              E
95th percentile:                     14190             E
----------------------------------------------------------
Overall trust:                                         D

✔ Analysis successful. 1000 users computed.
GitHub badge available at https://img.shields.io/endpoint.svg?url=https%3A%2F%2Fastronomer.ullaakut.eu%2Fshields%3Fowner%3Dbilibili%26name%3Dkratos

Then, I picked another project from the GitHub trending page:

$ docker run -t -e GITHUB_TOKEN=$GITHUB_TOKEN -v "/Users/changkun/dev/mct:/data/" ullaakut/astronomer bilibili/kratos
Beginning fetching process for repository bilibili/kratos
Pre-fetching all stargazers...ok
  > Selecting 200 first stargazers out of 5739
  > Selecting 800 random stargazers out of 5739
Fetching contributions for 1000 users up to year 2013
Building trust report...ok

Averages                             Score           Trust
--------                             -----           -----
Weighted contributions:              2536              E
Private contributions:               71                E
Created issues:                      6                 D
Commits authored:                    137               D
Repositories:                        30                A
Pull requests:                       6                 D
Code reviews:                        1                 E
Account age (days):                  1545              B
5th percentile:                      9                 A
10th percentile:                     25                A
15th percentile:                     43                A
20th percentile:                     55                C
25th percentile:                     74                D
30th percentile:                     106               D
35th percentile:                     146               D
40th percentile:                     188               D
45th percentile:                     245               D
50th percentile:                     349               D
55th percentile:                     490               D
60th percentile:                     638               E
65th percentile:                     832               E
70th percentile:                     1092              E
75th percentile:                     1577              E
80th percentile:                     2072              E
85th percentile:                     3117              E
90th percentile:                     5329              E
95th percentile:                     9192              E
----------------------------------------------------------
Overall trust:                                         D

✔ Analysis successful. 1000 users computed.
GitHub badge available at https://img.shields.io/endpoint.svg?url=https%3A%2F%2Fastronomer.ullaakut.eu%2Fshields%3Fowner%3Dbilibili%26name%3Dkratos

OK, then let's test TensorFlow.

$ docker run -t -e GITHUB_TOKEN=$GITHUB_TOKEN -v "/Users/changkun/dev/mct:/data/" ullaakut/astronomer tensorflow/tensorflow
Beginning fetching process for repository tensorflow/tensorflow
Pre-fetching all stargazers...ok
  > Selecting 200 first stargazers out of 131149
  > Selecting 800 random stargazers out of 131149
Fetching contributions for 1000 users up to year 2013
Building trust report...ok

Averages                             Score           Trust
--------                             -----           -----
Weighted contributions:              7495              D
Private contributions:               190               C
Created issues:                      18                B
Commits authored:                    198               D
Repositories:                        16                C
Pull requests:                       10                D
Code reviews:                        3                 D
Account age (days):                  1145              C
5th percentile:                      1                 E
10th percentile:                     2                 E
15th percentile:                     5                 E
20th percentile:                     10                E
25th percentile:                     22                E
30th percentile:                     32                E
35th percentile:                     40                E
40th percentile:                     59                E
45th percentile:                     76                E
50th percentile:                     114               E
55th percentile:                     153               E
60th percentile:                     217               E
65th percentile:                     368               E
70th percentile:                     707               E
75th percentile:                     1076              E
80th percentile:                     2109              E
85th percentile:                     3390              E
90th percentile:                     14580             D
95th percentile:                     30685             D
----------------------------------------------------------
Overall trust:                                         D

✔ Analysis successful. 1000 users computed.
GitHub badge available at https://img.shields.io/endpoint.svg?url=https%3A%2F%2Fastronomer.ullaakut.eu%2Fshields%3Fowner%3Dtensorflow%26name%3Dtensorflow

Issues with the Algorithm

This repo proposes a judgment algorithm without any prior study of the algorithm's rationale. As a user of your algorithm, I would particularly expect the following evidence that it is accurate:

  1. A theoretical analysis of the influence of each defined factor, together with a regression analysis and an assessment of the algorithm's statistical stability.

  2. Benchmarks on various projects, illustrating how well your algorithm matches the theoretical analysis for the top 10 most valuable open source projects, such as golang/go and torvalds/linux.

    "Those random stargazers can then sometimes be responsible for slight changes in the results, but they usually represent a difference of 1% to 3%, which is negligeable." -- README.md

    How did you reach this conclusion? How large were your test samples, and which repositories did they include?

  3. A user study. Holding a user study is an important way of evaluating usability. A single score cannot express many different aspects, and it is not easy to say whether a repository's stars are seriously fake or undeserved. A quantitative analysis of, for example, how other users feel about the score the algorithm provides should be seriously considered: Does the score match their expectations? Why or why not? How could it be improved?

Add CI and goreleaser

  • Add CI (probably Travis)
  • Integrate goreleaser to generate the binaries for each release
  • Document the use of those binaries

Make trust computation distributed

  • Make it so that user scans are sent to Astronomer's server, which collects all Astronomer scans from users and uses them to generate badges
  • Sign reports using a secret key in order to guarantee legitimacy
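For the signing step, one common approach is an HMAC over the serialized report. A minimal sketch using Go's standard `crypto/hmac` and `crypto/sha256` packages; the `signReport`/`verifyReport` helpers and the JSON report shape are assumptions, not part of astronomer:

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// signReport returns a hex-encoded HMAC-SHA256 of the serialized report.
func signReport(key, report []byte) string {
	mac := hmac.New(sha256.New, key)
	mac.Write(report)
	return hex.EncodeToString(mac.Sum(nil))
}

// verifyReport recomputes the HMAC and compares it in constant time.
func verifyReport(key, report []byte, signature string) bool {
	expected, err := hex.DecodeString(signature)
	if err != nil {
		return false
	}
	mac := hmac.New(sha256.New, key)
	mac.Write(report)
	return hmac.Equal(mac.Sum(nil), expected)
}

func main() {
	key := []byte("server-side secret")
	report := []byte(`{"repository":"appsquickly/Typhoon","overallTrust":"B"}`)
	sig := signReport(key, report)
	fmt.Println(verifyReport(key, report, sig))                         // true
	fmt.Println(verifyReport(key, []byte(`{"overallTrust":"A"}`), sig)) // false
}
```

An HMAC only convinces parties that share the key. If the scanning clients themselves had to hold the key to sign, it could be extracted from the binary, so an asymmetric scheme such as `crypto/ed25519` may fit a distributed setup better.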

Mode to scan first 1000 users

  • Add a --scanFirstStars option to scan up to the first 1000 stars of a repository by default (the number can be changed with the -s option).
  • In query.go, simply make the getCursors function return the last cursors if the scanFirstStars option is enabled.

This will make it easy to detect foul play in repositories that bought or botted their first stars and have since achieved organic growth.
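The cursor selection could look like the sketch below. It assumes, hypothetically, that the cursors are ordered newest-first so the earliest stargazers sit on the last pages (consistent with returning "the last cursors" in getCursors), and that each page holds `pageSize` users; `firstStarsCursors` is an illustrative name, not astronomer's actual function:

```go
package main

import "fmt"

// firstStarsCursors returns the cursors needed to cover up to `limit` of the
// earliest stargazers, assuming cursors are ordered newest-first so that the
// earliest stars live on the last pages.
func firstStarsCursors(cursors []string, limit, pageSize int) []string {
	pages := (limit + pageSize - 1) / pageSize // ceil(limit / pageSize)
	if pages > len(cursors) {
		pages = len(cursors)
	}
	return cursors[len(cursors)-pages:]
}

func main() {
	cursors := []string{"c0", "c1", "c2", "c3", "c4", "c5"}
	fmt.Println(firstStarsCursors(cursors, 250, 100)) // [c3 c4 c5]
}
```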

Dependabot can't parse your go.mod

Dependabot couldn't parse the go.mod found at /go.mod.

The error Dependabot encountered was:

go: github.com/spf13/[email protected] requires
	github.com/grpc-ecosystem/[email protected] requires
	gopkg.in/[email protected]: invalid version: git fetch -f origin refs/heads/*:refs/heads/* refs/tags/*:refs/tags/* in /opt/go/gopath/pkg/mod/cache/vcs/9241c28341fcedca6a799ab7a465dd6924dc5d94044cbfabb75778817250adfc: exit status 128:
	fatal: The remote end hung up unexpectedly


Build web application to let users request scans

  • Build an API for Astronomer that runs scans of a repository's stars and responds with the trust report.
  • Build a web application to let people request scans and get reports (@veliona)
  • Make sure that previously generated reports are kept somewhere and accessible from the web interface

Make available on Homebrew

After I saw astronomer on HN, I put together this Homebrew tap to ease installation on OS X: https://github.com/dkanejs/homebrew-astronomer

It would be cool to get this into the official Homebrew itself, or to have you maintain the tap under your own GitHub namespace; this way you can also update the formula with each release 👍

What do you think?

Unit test gql helpers

  • updateUsers(users []User, response listStargazersResponse, year int) []User
  • getCursors(ctx *context.Context, sg []stargazers, totalUsers uint) []string
  • buildRequestBody(ctx *context.Context, baseRequest string, pagination int) string
  • getCursor(cursors []string, page int, reverseOrder bool) string
  • pickRandomStringsExcept(s []string, picked []string, amount uint) []string
  • isBlacklisted(user string) bool
  • parseResponse(resp *http.Response) (*listStargazersResponse, []byte, error)
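As an example of the kind of properties such tests could check, here is a hypothetical reimplementation of `pickRandomStringsExcept`. Its behavior is inferred from the name and signature only (pick up to `amount` distinct strings from `s` that are not in `picked`), and it uses a fixed seed so runs are reproducible; it is not astronomer's actual code:

```go
package main

import (
	"fmt"
	"math/rand"
)

// rng is seeded explicitly so that test runs are reproducible.
var rng = rand.New(rand.NewSource(1))

// pickRandomStringsExcept returns up to `amount` distinct strings from s,
// chosen at random, skipping any string that appears in picked.
func pickRandomStringsExcept(s []string, picked []string, amount uint) []string {
	excluded := make(map[string]bool, len(picked))
	for _, p := range picked {
		excluded[p] = true
	}
	var candidates []string
	seen := make(map[string]bool)
	for _, v := range s {
		if !excluded[v] && !seen[v] {
			seen[v] = true
			candidates = append(candidates, v)
		}
	}
	rng.Shuffle(len(candidates), func(i, j int) {
		candidates[i], candidates[j] = candidates[j], candidates[i]
	})
	if uint(len(candidates)) > amount {
		candidates = candidates[:amount]
	}
	return candidates
}

func main() {
	users := []string{"alice", "bob", "carol", "dave", "erin"}
	got := pickRandomStringsExcept(users, []string{"bob"}, 3)
	// Properties a unit test could assert: size, no excluded entries, no duplicates.
	fmt.Println(len(got), got)
}
```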

Pre-scan stargazers and add progress bar

  • Before scanning contributions, just scan all stargazers in order to know how many there will be to scan (this task is also a part of #10)
  • Display a progress bar while the scan is in progress
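A progress bar similar in style to the one astronomer prints during a scan could be rendered by a small pure function, which also makes it easy to unit test. A minimal sketch; `renderBar`, its width and its layout are assumptions, since the issue does not specify them:

```go
package main

import (
	"fmt"
	"strings"
)

// renderBar draws a textual progress bar such as "[=====>----] 50 %".
func renderBar(done, total, width int) string {
	if total <= 0 {
		total = 1
	}
	if done < 0 {
		done = 0
	}
	if done > total {
		done = total
	}
	filled := done * width / total
	var b strings.Builder
	b.WriteByte('[')
	for i := 0; i < width; i++ {
		switch {
		case i < filled:
			b.WriteByte('=')
		case i == filled:
			b.WriteByte('>')
		default:
			b.WriteByte('-')
		}
	}
	fmt.Fprintf(&b, "] %d %%", done*100/total)
	return b.String()
}

func main() {
	total := 216 // e.g. the number of stargazers found by the pre-scan
	for done := 0; done <= total; done += 54 {
		// "\r" returns to the start of the line so the bar redraws in place.
		fmt.Printf("\r%s", renderBar(done, total, 40))
	}
	fmt.Println()
}
```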

Add a fast mode

  • Add an option --fast (maybe turned on by default?) for big repositories (>2K stars) where Astronomer would
    • Fetch the list of stargazers first without querying user data
    • Select 50 random slices of 20 users within those stargazers
    • Compute the statistics on those 1000 users

This would greatly reduce the scan time while remaining fairly accurate.

Unit test gql cache functions

  • getCache(ctx *context.Context, req *http.Request, pagination string) (*http.Response, error)
  • readCachedResponse(filename string, req *http.Request) (*http.Response, error)
  • putCache(ctx *context.Context, req *http.Request, pagination string, body []byte) error
  • cacheEntryFilename(ctx *context.Context, url string) string
  • listFilePagination(cursor string) string
  • contribFilePagination(cursor string, year int) string

Detect suspicious ranges of percentiles

  • Remove the current computation of the 65th, 85th and 95th percentiles
  • Add a new step which computes every 5th percentile (5, 10, 15 and so on) and detect anomalies within ranges (for example if percentiles 20, 25 and 30 are all abnormally low or abnormally high, it could very well indicate illegitimate stars)
  • Find a good way to represent this in the trust report
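The new step could be sketched as below: compute every 5th percentile with the nearest-rank method, compare each one to a reference value, and flag runs of at least three consecutive percentiles that are all abnormally low or all abnormally high. The reference values, the 2× threshold, the minimum run length and the `suspiciousRanges` name are illustrative assumptions, not astronomer's actual thresholds:

```go
package main

import (
	"fmt"
	"sort"
)

// percentiles returns the 5th, 10th, ..., 95th percentiles of xs using the
// nearest-rank method (integer arithmetic avoids float rounding surprises).
func percentiles(xs []float64) []float64 {
	if len(xs) == 0 {
		return nil
	}
	sorted := append([]float64(nil), xs...)
	sort.Float64s(sorted)
	var out []float64
	for p := 5; p <= 95; p += 5 {
		rank := (p*len(sorted) + 99) / 100 // ceil(p * n / 100)
		if rank < 1 {
			rank = 1
		}
		out = append(out, sorted[rank-1])
	}
	return out
}

// suspiciousRanges returns [start, end] percentile pairs (e.g. [20 30]) where
// at least minRun consecutive percentiles are all below ref/factor or all
// above ref*factor.
func suspiciousRanges(got, ref []float64, factor float64, minRun int) [][2]int {
	dir := func(i int) int {
		switch {
		case got[i] < ref[i]/factor:
			return -1
		case got[i] > ref[i]*factor:
			return 1
		}
		return 0
	}
	var ranges [][2]int
	for i := 0; i < len(got); {
		d, j := dir(i), i
		for j < len(got) && d != 0 && dir(j) == d {
			j++
		}
		if d != 0 && j-i >= minRun {
			ranges = append(ranges, [2]int{(i + 1) * 5, j * 5})
		}
		if j == i {
			j++
		}
		i = j
	}
	return ranges
}

func main() {
	// Synthetic scores 1..100, so the reference percentiles are 5, 10, ..., 95.
	scores := make([]float64, 100)
	for i := range scores {
		scores[i] = float64(i + 1)
	}
	ref := percentiles(scores)
	got := append([]float64(nil), ref...)
	// Percentiles 20, 25 and 30 (indices 3 to 5) made abnormally low.
	got[3], got[4], got[5] = 1, 1, 1
	fmt.Println(suspiciousRanges(got, ref, 2, 3)) // [[20 30]]
}
```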

Add a searchable leaderboard

I like trying to find new projects by searching for random things on GitHub and sorting the results by number of stars. I'd love to be able to do the same with trust score or, better yet, some sort of "trusted stars" metric which combines trust score with star count. Would this be possible given the data being curated by Astrolab?

Compute graph of user trust over time

Compute a basic graph of the evolution of user trustworthiness over time

  • Compute individual user trustworthiness
  • Use a graph library to display the chart in the terminal
  • Do it only when the -d option is enabled
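Before choosing a graph library, a very basic version of the chart could be drawn with plain text bars. A minimal sketch, assuming a hypothetical per-year average trust score in [0, 1]; the issue does not define how individual trustworthiness is computed, so the values below are made up:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// renderTrustChart draws one horizontal bar per year, scaled so that a
// score of 1.0 fills `width` characters. Scores are clamped to [0, 1].
func renderTrustChart(byYear map[int]float64, width int) string {
	years := make([]int, 0, len(byYear))
	for y := range byYear {
		years = append(years, y)
	}
	sort.Ints(years) // map iteration order is random, so sort for stable output
	var b strings.Builder
	for _, y := range years {
		v := byYear[y]
		if v < 0 {
			v = 0
		} else if v > 1 {
			v = 1
		}
		n := int(v*float64(width) + 0.5) // round to the nearest character
		fmt.Fprintf(&b, "%d |%s %.2f\n", y, strings.Repeat("#", n), v)
	}
	return b.String()
}

func main() {
	// Made-up values for illustration only.
	fmt.Print(renderTrustChart(map[int]float64{2013: 0.8, 2014: 0.65, 2015: 0.4}, 30))
}
```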
