
npms-analyzer's People

Contributors

atduarte, bcoe, bennycode, carsonreinke, greenkeeperio-bot, kikobeats, larsgw, mastilver, notslang, satazor


npms-analyzer's Issues

Suggestion: Improve scoring by including dependency qualities

I feel that, if you have a (tiny) module with some (huge) dependencies, the quality of the dependencies, which are also executing in a user's environment, should affect the overall quality of your module (a bit).

As a side product this would encourage developers to look for (alternative) quality dependencies, if they have a choice, or even motivate them to contribute to the quality of their dependencies. Win-Win.

For example, I am currently working on the quality of one of my modules and I previously outsourced a couple of its internals to their own modules. Now that these do not affect the quality of the main module anymore, even though they are substantial parts of it, I do not feel the same motivation to improve their quality as well (mainly coverage), which is stupid of course, but could be solved once and for all by technology.
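
A minimal sketch of how such a blend could work, assuming each dependency's quality is already computed and normalized to [0, 1]; the 0.2 weight and the function name are illustrative, not part of the analyzer:

  // Illustrative only: mix the average quality of a package's direct
  // dependencies into its own quality score.
  function blendDependencyQuality(ownQuality, dependencyQualities, weight = 0.2) {
      if (!dependencyQualities.length) {
          return ownQuality;
      }

      const avg = dependencyQualities.reduce((sum, quality) => sum + quality, 0) / dependencyQualities.length;

      return (1 - weight) * ownQuality + weight * avg;
  }

  // A tiny module with quality 0.9 whose (huge) dependencies average 0.4:
  console.log(blendDependencyQuality(0.9, [0.3, 0.5]));  // => 0.8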

Search results are case sensitive

The search for "MySQL" returns different results than searching for "mysql". I’m guessing the search should not be case-sensitive.
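
One possible fix, sketched with the elasticsearch client the project already uses, is to run the searched fields through a lowercase filter at index time so that "MySQL" and "mysql" produce the same terms. The index, type and field names below are illustrative, not npms' actual mapping:

  const elasticsearch = require('elasticsearch');

  const client = new elasticsearch.Client({ host: 'localhost:9200' });

  client.indices.create({
      index: 'npms-example',
      body: {
          settings: {
              analysis: {
                  analyzer: {
                      // lowercases every token, making searches case-insensitive
                      case_insensitive: { type: 'custom', tokenizer: 'standard', filter: ['lowercase'] },
                  },
              },
          },
          mappings: {
              package: {
                  properties: {
                      name: { type: 'string', analyzer: 'case_insensitive' },
                  },
              },
          },
      },
  }).then(() => console.log('example index created'));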

Multiple packages point to popular GitHub repos

Multiple packages point to popular GitHub repos and most have almost no downloads. Searching for jquery makes this very clear:

https://www.npmjs.com/package/jquery-compat
https://www.npmjs.com/package/jquery1
https://www.npmjs.com/package/shimney-jquery

Since these packages all point to the jQuery GitHub repo, they end up having a high score, but they shouldn't.

This affects current searches, but could affect even more of them in the future if scammers discover this issue.

Suggestion: if there's more than one package with the same GitHub repo, only consider the GitHub stats for the one with the most downloads on npm.
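
A hedged sketch of that suggestion; packages is assumed to be an array of objects with name, repositoryUrl, downloadsCount and a github stats object (illustrative shapes, not the analyzer's internal ones):

  function dedupeGithubStats(packages) {
      const byRepo = new Map();

      // group packages that claim the same GitHub repository
      packages.forEach((pkg) => {
          const group = byRepo.get(pkg.repositoryUrl) || [];

          group.push(pkg);
          byRepo.set(pkg.repositoryUrl, group);
      });

      // keep the GitHub stats only for the package with the most npm downloads
      byRepo.forEach((group) => {
          const winner = group.reduce((best, pkg) => (pkg.downloadsCount > best.downloadsCount ? pkg : best));

          group.forEach((pkg) => {
              if (pkg !== winner) {
                  pkg.github = null;
              }
          });
      });

      return packages;
  }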

Improve stemming

The current stemmer is very loose. We opted for this because the aggressive one was causing some trouble, e.g.: expression -> express. Though we are losing some meaningful results, e.g.: pm2 contains process and manager, but searching for process management does not return it.

We could experiment with having both loose and aggressive stemmers as separate fields and give the aggressive stemmer field a lower weight.
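
A sketch of that experiment, assuming kstem as the loose stemmer and snowball as the aggressive one (the filters npms actually uses may differ): the same text is indexed as a multi-field and the aggressively stemmed sub-field gets a lower boost at query time.

  // Index settings and mappings (Elasticsearch 2.x style, names illustrative)
  const indexBody = {
      settings: {
          analysis: {
              analyzer: {
                  loose: { type: 'custom', tokenizer: 'standard', filter: ['lowercase', 'kstem'] },
                  aggressive: { type: 'custom', tokenizer: 'standard', filter: ['lowercase', 'snowball'] },
              },
          },
      },
      mappings: {
          package: {
              properties: {
                  description: {
                      type: 'string',
                      analyzer: 'loose',
                      fields: {
                          aggressive: { type: 'string', analyzer: 'aggressive' },
                      },
                  },
              },
          },
      },
  };

  // Query both fields, weighting the aggressively stemmed one lower
  const query = {
      multi_match: {
          query: 'process management',
          fields: ['description^3', 'description.aggressive'],
      },
  };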

do metadata collection in map-reduce?

I might be off-base with this since I've only just begun to read the code, but have you considered moving the metadata processing from lib/analyze/collect/metadata.js into map-reduce? All that data is already in CouchDB anyway, and MR is excellent for bulk processing.
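
For reference, a rough sketch of what a CouchDB view covering part of that work could look like; the fields emitted here are illustrative and cover only a fraction of what lib/analyze/collect/metadata.js extracts:

  // Design document sketch; CouchDB stores map functions as strings
  const designDoc = {
      _id: '_design/npms-metadata',
      views: {
          metadata: {
              map: function (doc) {
                  var latest = doc['dist-tags'] && doc['dist-tags'].latest;
                  var pkg = latest && doc.versions && doc.versions[latest];

                  if (pkg) {
                      emit(doc._id, {
                          description: pkg.description,
                          keywords: pkg.keywords,
                          license: pkg.license,
                      });
                  }
              }.toString(),
          },
      },
  };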

Enable experimentation

I think it would be very interesting to develop a system where it's easy to run some "what if..." tweaks to the algorithm and see how they impact the results.

Also, develop some reporting tools that allow analysis of how well the algorithm is working. Are there modules that rank high on popularity, personality, and quality, but suffer on the maintenance score? That might help us discover problems with the maintenance scoring algorithm.

Improve test coverage analysis

Before changing the average Y (avgY) values used in the scoring, the tests evaluation should be improved by improving the test coverage analysis. Test coverage is really important for the score we give.

link to non-public image

https://npms-io.slack.com/files/andreduarte/F0QU3SYJW/twe.png is mentioned in lib/scoring/score.js, but it's not publicly available 😒

Machine learning project

Hello, I tweeted you two weeks ago (sorry, I was on a backpacking vacation and did not really get a chance to properly sit down and write this).

I am working on a machine learning project which aims to rank Node packages based on anything that is available online. To name a few sources: GitHub Archive, npm, and the GitHub API. I read your docs and noticed that you have already collected most of the data.

I was wondering how interested you would be in collaborating.

An in-range update of coveralls is breaking the build 🚨

Version 2.12.0 of coveralls just got published.

Branch Build failing 🚨
Dependency coveralls
Current Version 2.11.16
Type devDependency

This version is covered by your current version range and after updating it in your project the build failed.

As coveralls is “only” a devDependency of this project it might not break production or downstream projects, but “only” your build or test tools – preventing new deploys or publishes.

I recommend you give this issue a high priority. I’m sure you can resolve this 💪


Status Details
  • ❌ continuous-integration/travis-ci/push The Travis CI build failed Details
Release Notes Branch coverage support

Adds branch coverage data to Coveralls API post.

Commits

The new version differs by 2 commits.

See the full diff.

Not sure how things should work exactly?

There is a collection of frequently asked questions and of course you may always ask my humans.


Your Greenkeeper Bot 🌴

Does the algorithm work with monorepos?

I'm seeing low quality values for packages using a monorepo. Try searching for Babel or PouchDB for example. Does the algorithm take their different structure into account?

Consider indexing words from the README

Several developers publish modules without keywords. I don't blame them because, AFAIK, npm does not use them for search purposes. Still, relevant keywords are often in the README. We could select the top words from the README (excluding words from a common stopword list) and index them as well, then use this field with a lower priority when searching.
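
A minimal sketch of the idea, assuming a plain-text README; the stopword list and the top-10 cutoff are arbitrary choices for illustration:

  const STOPWORDS = new Set(['the', 'a', 'an', 'and', 'or', 'to', 'of', 'in', 'is', 'it', 'for', 'with', 'this', 'that']);

  function topReadmeWords(readme, limit = 10) {
      const counts = new Map();

      readme
          .toLowerCase()
          .split(/[^a-z0-9]+/)
          .filter((word) => word.length > 2 && !STOPWORDS.has(word))
          .forEach((word) => counts.set(word, (counts.get(word) || 0) + 1));

      // most frequent words first
      return Array.from(counts.entries())
          .sort((a, b) => b[1] - a[1])
          .slice(0, limit)
          .map((entry) => entry[0]);
  }

The resulting words would go into a separate field that gets a lower boost than name, description and keywords at query time.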

Terrible score on ckeditor, one of the most popular wysiwyg editors

I found two packages, one is called wysiwyg.

The other package is called ckeditor.

As you can see, wysiwyg has a fairly low download count at 40 downloads per month, while ckeditor is netting 20k per month.

However, the scoring data from npms-analyzer yields these:

ckeditor

  "score": {
    "final": 0.5160201557764348,
    "detail": {
      "quality": 0.6212124966030818,
      "popularity": 0.27654167033127186,
      "maintenance": 0.6653337776559003
    }
  }

wysiwyg

  "score": {
    "final": 0.5907400025771101,
    "detail": {
      "quality": 0.7561566835417988,
      "popularity": 0.039946939481510804,
      "maintenance": 0.9997473391315477
    }
  }

How could wysiwyg get a higher final score? Searching for wysiwyg on npms doesn't even yield ckeditor; I had to search for ckeditor exactly to get it. Yet both of these packages have the wysiwyg tag.
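
For what it's worth, the reported finals are consistent with a weighted sum of roughly 30% quality, 35% popularity and 35% maintenance (weights inferred from the numbers above, not read from the source). That explains the ranking: wysiwyg's near-perfect maintenance more than makes up for its negligible popularity.

  // weights inferred from the published numbers; they reproduce both finals
  const final = (s) => 0.3 * s.quality + 0.35 * s.popularity + 0.35 * s.maintenance;

  console.log(final({ quality: 0.6212124966030818, popularity: 0.27654167033127186, maintenance: 0.6653337776559003 }));
  // ≈ 0.51602 (ckeditor)

  console.log(final({ quality: 0.7561566835417988, popularity: 0.039946939481510804, maintenance: 0.9997473391315477 }));
  // ≈ 0.59074 (wysiwyg)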

An in-range update of pino is breaking the build 🚨

Version 3.0.5 of pino just got published.

Branch Build failing 🚨
Dependency pino
Current Version 3.0.4
Type dependency

This version is covered by your current version range and after updating it in your project the build failed.

As pino is a direct dependency of this project this is very likely breaking your project right now. If other packages depend on you it’s very likely also breaking them.
I recommend you give this issue a very high priority. I’m sure you can resolve this 💪


Status Details
  • ❌ continuous-integration/travis-ci/push The Travis CI build failed Details
Release Notes v3.0.5
Commits

The new version differs by 7 commits.

See the full diff.

Not sure how things should work exactly?

There is a collection of frequently asked questions and of course you may always ask my humans.


Your Greenkeeper Bot 🌴

Deprecated modules

Deprecated modules should have a VERY LOW quality evaluation. At the moment, their quality is being lowered, but not by enough.
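
A sketch of the kind of harsher penalty being asked for; the 0.1 factor is an arbitrary illustration, not the analyzer's actual value:

  // Illustrative penalty only: a deprecated package keeps just 10% of its quality score
  function penalizeDeprecated(quality, isDeprecated) {
      return isDeprecated ? quality * 0.1 : quality;
  }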

npms shows wrong score for pouchdb

If I write pouchdb in the search box, pouchdb is returned first but with a very bad score: bad quality, bad popularity and bad maintenance. This is obviously wrong.

Tweak popularity weight

As @sindresorhus pointed out, dependantsCount and downloadsCount are much more important than communityInterest (forks, stars, etc.). The current weight scheme gives more importance to communityInterest. We need to change the weights, experimenting with different values until we find the sweet spot.

e.g., when searching, parse-json should have a higher popularity than body-parse-json.
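
A sketch of what the suggested re-weighting could look like, assuming each component has already been normalized to [0, 1]; the numbers are placeholders to experiment with, not the analyzer's current or proposed values:

  function popularityScore(popularity) {
      return (
          0.15 * popularity.communityInterest +  // forks, stars, etc. weighted down
          0.45 * popularity.downloadsCount +
          0.40 * popularity.dependantsCount
      );
  }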

Searching for "polymer" gives an odd first result

https://npms.io/search?term=polymer shows https://www.npmjs.com/package/polymer as the first result.

I'm not sure what's wrong with https://www.npmjs.com/package/Polymer that NPMJS thinks it has no README, but that, or better yet, https://www.npmjs.com/package/@polymer/polymer, should be the first result.

https://www.npmjs.com/package/npm-polymer-elements, published by a personality, doesn't show up until the 5th page of results or so.

Polymer packages are at https://www.npmjs.com/~polymer.

Finalize tests

  • analyze/download/github (done but incomplete)
  • analyze/download/git
  • analyze/download/npm
  • analyze/download/index
  • analyze/download/util/*

  • analyze/collect/metadata
  • analyze/collect/github (done but incomplete)
  • analyze/collect/npm
  • analyze/collect/source
  • analyze/collect/index
  • analyze/collect/util/*

  • analyze/evaluate/quality
  • analyze/evaluate/maintenance
  • analyze/evaluate/popularity
  • analyze/evaluate/index
  • analyze/evaluate/util/*

  • analyze/index
  • analyze/util/*

  • scoring/score
  • scoring/aggregate
  • scoring/prepare
  • scoring/finalize
  • scoring/util/*

  • observers/realtime
  • observers/stale

  • queue

  • cli

Merge custom-avgy branch

This branch should solve #65 and similar cases. Blocking stuff:

  • Improve test coverage analysis
  • Tweak the avgY values for each metric

An in-range update of fetch-coverage is breaking the build 🚨

Version 1.1.0 of fetch-coverage just got published.

Branch Build failing 🚨
Dependency fetch-coverage
Current Version 1.0.4
Type dependency

This version is covered by your current version range and after updating it in your project the build failed.

As fetch-coverage is a direct dependency of this project this is very likely breaking your project right now. If other packages depend on you it’s very likely also breaking them.
I recommend you give this issue a very high priority. I’m sure you can resolve this 💪


Status Details
  • ❌ continuous-integration/travis-ci/push The Travis CI build failed Details
Commits

The new version differs by 6 commits.

  • 30de3a4 Release 1.1.0
  • f2cf14f Merge pull request #6 from dsifford/patch-1
  • 840e442 Add missing 'codecov' service
  • 307f2c3 Merge pull request #1 from IndigoUnited/greenkeeper/update-all
  • fc1ae41 chore(package): update dependencies
  • 29ec141 Unnecessary.

See the full diff.

Not sure how things should work exactly?

There is a collection of frequently asked questions and of course you may always ask my humans.


Your Greenkeeper Bot 🌴

how to run with already-downloaded source-code files

It would be really nice if lib/analyze/download/npm.js could be skipped & I could just pass in paths to already-downloaded tarballs.

This would allow us to cache tarballs when rebuilding the database (if/when methods for analyzing source code change) and would allow us to distribute tarballs through alternate methods (like torrent backups). Also, I already have the npm registry backed up, so it would be nice to avoid hitting it again.
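
A hypothetical sketch of such an override; localTarball, the cache layout and the file naming are made up for illustration and only mirror the rough shape of a downloader in lib/analyze/download/:

  const fs = require('fs');
  const path = require('path');

  // Resolves a previously downloaded tarball from a local cache directory
  // instead of hitting the npm registry. Scoped package names would need the
  // '/' escaped; omitted here for brevity.
  function localTarball(cacheDir) {
      return (packageJson) => new Promise((resolve, reject) => {
          const file = path.join(cacheDir, `${packageJson.name}-${packageJson.version}.tgz`);

          fs.access(file, (err) => err ? reject(err) : resolve(file));
      });
  }

  // localTarball('/backups/npm-tarballs')({ name: 'lodash', version: '4.17.4' })
  //     .then((file) => console.log('analyze', file));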

Identify whether written in es5 or es2015+

This is an awesome project. I pretty much use it every day.
I would love to see it identify which Node versions a module can support without transpiling. In the case of front-end modules, this would help identify whether bundlers like webpack 2 can take advantage of tree-shaking.
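
One way this could be detected, sketched with acorn (an assumption; the analyzer does not do this today): if a file fails to parse as ES5 but parses with a newer ecmaVersion, it uses ES2015+ syntax.

  const acorn = require('acorn');

  function detectSyntax(source) {
      try {
          acorn.parse(source, { ecmaVersion: 5 });
          return 'es5';
      } catch (err) {
          // still throws if the file is not valid JavaScript at all
          acorn.parse(source, { ecmaVersion: 8 });
          return 'es2015+';
      }
  }

  console.log(detectSyntax('var add = function (a, b) { return a + b; };'));  // es5
  console.log(detectSyntax('const add = (a, b) => a + b;'));                  // es2015+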

An in-range update of nock is breaking the build 🚨

Version 9.0.9 of nock just got published.

Branch Build failing 🚨
Dependency nock
Current Version 9.0.8
Type devDependency

This version is covered by your current version range and after updating it in your project the build failed.

As nock is “only” a devDependency of this project it might not break production or downstream projects, but “only” your build or test tools – preventing new deploys or publishes.

I recommend you give this issue a high priority. I’m sure you can resolve this 💪


Status Details
  • ❌ continuous-integration/travis-ci/push The Travis CI build failed Details
Commits

The new version differs by 4 commits.

  • 5b1a8d8 9.0.9: Revert PR #802
  • 51809e1 Merge pull request #840 from node-nock/revert-802-fix-754
  • 0a083f6 Revert "Fix request timeout no working"
  • 3728a1f Changelog v9.0.8

See the full diff.

Not sure how things should work exactly?

There is a collection of frequently asked questions and of course you may always ask my humans.


Your Greenkeeper Bot 🌴

Invalid alias name [npms-write], an index exists with the same name as the alias

$ ./cli.js scoring --log-level silly
verb util/bootstrap Elasticsearch is ready
verb util/bootstrap CouchDB for npms is ready
info Starting scoring cycle
info scoring/prepare Preparing scoring..
verb scoring/prepare Gathered elasticsearch info.. { indices: [ 'npms-1461027817406' ],
verb scoring/prepare   aliases: { read: [], write: [] } }
verb scoring/prepare Created new index { index: 'npms-1461028580891' }
ERR! Scoring cycle failed { err: 
ERR!    { [Error: [invalid_alias_name_exception] Invalid alias name [npms-write], an index exists with the same name as the alias, with { index=npms-1461028580891 }]
ERR!      status: 400,
ERR!      displayName: 'BadRequest',
ERR!      message: '[invalid_alias_name_exception] Invalid alias name [npms-write], an index exists with the same name as the alias, with { index=npms-1461028580891 }',
ERR!      path: '/_aliases',
ERR!      query: {},
ERR!      body: { error: [Object], status: 400 },
ERR!      statusCode: 400,
ERR!      response: '{"error":{"root_cause":[{"type":"invalid_alias_name_exception","reason":"Invalid alias name [npms-write], an index exists with the same name as the alias","index":"npms-1461028580891"}],"type":"invalid_alias_name_exception","reason":"Invalid alias name [npms-write], an index exists with the same name as the alias","index":"npms-1461028580891"},"status":400}',
ERR!      toString: [Function],
ERR!      toJSON: [Function] } }
ERR! Stack:
ERR! at respond (/usrdata/proj/npms-analyzer/node_modules/elasticsearch/src/lib/transport.js:289:15)
ERR!     at checkRespForFailure (/usrdata/proj/npms-analyzer/node_modules/elasticsearch/src/lib/transport.js:248:7)
ERR!     at HttpConnector.<anonymous> (/usrdata/proj/npms-analyzer/node_modules/elasticsearch/src/lib/connectors/http.js:164:7)
ERR!     at IncomingMessage.wrapper (/usrdata/proj/npms-analyzer/node_modules/elasticsearch/node_modules/lodash/index.js:3095:19)
ERR!     at emitNone (events.js:85:20)
ERR!     at IncomingMessage.emit (events.js:179:7)
ERR!     at endReadableNT (_stream_readable.js:906:12)
ERR!     at nextTickCallbackWith2Args (node.js:475:9)
ERR!     at process._tickCallback (node.js:389:17)
ERR! From previous event:
ERR!     at prepare (/usrdata/proj/npms-analyzer/lib/scoring/prepare.js:61:6)
ERR!     at cycle (/usrdata/proj/npms-analyzer/cli/scoring.js:52:5)
ERR!     at /usrdata/proj/npms-analyzer/cli/scoring.js:110:21
ERR!     at processImmediate [as _immediateCallback] (timers.js:383:17)
ERR! From previous event:
ERR!     at /usrdata/proj/npms-analyzer/cli/scoring.js:110:10
ERR! From previous event:
ERR!     at Object.Promise.resolve.then.prepare.tap.then.exports.builder.exports.handler.bootstrap.spread [as handler] (/usrdata/proj/npms-analyzer/cli/scoring.js:103:6)
ERR!     at Object.self.runCommand (/usrdata/proj/npms-analyzer/node_modules/yargs/lib/command.js:113:22)
ERR!     at parseArgs (/usrdata/proj/npms-analyzer/node_modules/yargs/yargs.js:632:24)
ERR!     at Object.Yargs.Object.defineProperty.get [as argv] (/usrdata/proj/npms-analyzer/node_modules/yargs/yargs.js:592:16)
ERR!     at Object.<anonymous> (/usrdata/proj/npms-analyzer/cli.js:34:1)
ERR!     at Module._compile (module.js:413:34)
ERR!     at Object.Module._extensions..js (module.js:422:10)
ERR!     at Module.load (module.js:357:32)
ERR!     at Function.Module._load (module.js:314:12)
ERR!     at Function.Module.runMain (module.js:447:10)
ERR!     at startup (node.js:140:18)
ERR!     at node.js:1001:3
ERR! 
info Waiting 1 hour before running the next cycle.. { now: '2016-04-19T01:16:21.080Z' }
stat process pid: 8245; memory: 45.15 MB; uptime: 21 seconds
^C

...I'll start reading up on Elasticsearch
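
For anyone hitting this: the error means a concrete index literally named npms-write exists (most likely auto-created by a write that happened before the alias did), so Elasticsearch refuses to create the npms-write alias on top of it. A hedged cleanup sketch, assuming that stray index holds nothing you need:

  const elasticsearch = require('elasticsearch');

  const client = new elasticsearch.Client({ host: 'localhost:9200' });

  // Double-check that npms-write really is an empty, accidentally created index
  // (and not an alias you rely on) before deleting it!
  client.indices.exists({ index: 'npms-write' })
      .then((exists) => exists && client.indices.delete({ index: 'npms-write' }))
      .then(() => console.log('stray npms-write index removed; re-run ./cli.js scoring'));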
