npms-io / npms-analyzer
The analyzer behind https://npms.io
License: MIT License
Many frontend developers will thank you :)
Two interesting links:
npm/registry#8
I feel that, if you have a (tiny) module with some (huge) dependencies, the quality of the dependencies, which are also executing in a user's environment, should affect the overall quality of your module (a bit).
As a side product this would encourage developers to look for (alternative) quality dependencies, if they have a choice, or even motivate them to contribute to the quality of their dependencies. Win-Win.
For example, I am currently working on the quality of one of my modules, and I previously outsourced a couple of its internals to their own modules. Now that these no longer affect the quality of the main module, even though they are substantial parts of it, I don't feel the same motivation to improve their quality as well (mainly coverage). That is stupid of course, but it could be solved once and for all by technology.
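The blending proposed above could be sketched like this, assuming scores normalized to [0, 1]. The function name and the 0.85/0.15 split are illustrative assumptions, not the npms-analyzer formula:

```javascript
// Hypothetical sketch: blend a package's own quality with the mean
// quality of its direct dependencies, so huge low-quality dependencies
// drag the score down "a bit". The depWeight of 0.15 is an assumption.
function blendedQuality(ownQuality, depQualities, depWeight = 0.15) {
  if (depQualities.length === 0) return ownQuality;

  const depMean = depQualities.reduce((a, b) => a + b, 0) / depQualities.length;

  return (1 - depWeight) * ownQuality + depWeight * depMean;
}

// A tiny module with two low-quality dependencies loses a little quality.
console.log(blendedQuality(0.9, [0.3, 0.4]));
```

A small dependency weight keeps the incentive (improving your dependencies nudges your own score up) without letting dependencies dominate the score.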
The search for "MySQL" returns different results than searching for "mysql". I'm guessing the search should not be case-sensitive.
libraries.io seems to have a similar goal to this project. It might be worthwhile to work with them, either offering them more detailed data on npm, or reusing some of their components https://github.com/librariesio
It's a bit odd to see modules without tests get a quality score of ~100.
Multiple packages point to popular GitHub repos and most have almost no downloads. Searching for jquery this is very clear:
https://www.npmjs.com/package/jquery-compat
https://www.npmjs.com/package/jquery1
https://www.npmjs.com/package/shimney-jquery
Since these packages all point to the jQuery GitHub repo, they end up having a high score, but they shouldn't.
This affects current searches, but could affect even more in the future if some scammers discover this issue.
Suggestion: if there's more than one package with the same GitHub repo, only consider the GitHub stats for the one with the most downloads on npm.
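The suggestion could be sketched as follows; the package shape (`repo`, `downloads`, `githubStats`) is assumed for illustration and is not the actual npms-analyzer data model:

```javascript
// Sketch: when several packages share the same GitHub repository, only
// the one with the most npm downloads keeps the GitHub-based stats.
function dedupeGithubStats(packages) {
  const best = new Map(); // repo URL -> package with the most downloads

  for (const pkg of packages) {
    const current = best.get(pkg.repo);
    if (!current || pkg.downloads > current.downloads) best.set(pkg.repo, pkg);
  }

  // Strip GitHub stats from every package that lost the tie-break.
  return packages.map((pkg) =>
    best.get(pkg.repo) === pkg ? pkg : { ...pkg, githubStats: null }
  );
}
```

Under this rule, jquery keeps the jQuery repo's stars while jquery-compat, jquery1, and shimney-jquery would score on their own merits.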
See, for example, babel-core:
https://npms.io/search?term=babel-core (6.9.1)
https://www.npmjs.com/package/babel-core (6.10.4)
The current stemmer is very loose. We opted for this because the aggressive one was causing some trouble, e.g.: expression -> express. However, we are losing some meaningful results: e.g. pm2 contains process and manager, but searching for process management does not return it.
We could experiment with having both loose and aggressive stemmers as separate fields and give the aggressive stemmer field a lower weight.
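The two-field idea could look like the Elasticsearch query below. The field names (`*.stemmed_loose` / `*.stemmed_aggressive`) and the boosts are hypothetical, not the actual npms index mapping:

```javascript
// Hypothetical sketch: query both stemmer variants, weighting the
// aggressively-stemmed fields lower so they only act as a tie-breaker.
const query = {
  multi_match: {
    query: 'process management',
    fields: [
      'name.stemmed_loose^3',
      'description.stemmed_loose^2',
      'name.stemmed_aggressive^1.5',
      'description.stemmed_aggressive',
    ],
  },
};
```

With the aggressive stemmer, "management" and "manager" reduce to the same root, so pm2's description would now match, just at a lower weight than a loose-stemmer hit.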
With a search for "pdf" I was expecting to find the most popular PDF generators (jspdf and pdfkit) but the result I got was not really good. Maybe you should give more credit to the GitHub stars?
I might be off-base with this since I've only just begun to read the code, but have you considered moving the metadata processing from lib/analyze/collect/metadata.js into map-reduce? All that data is already in CouchDB anyway, and MR is excellent for bulk processing.
Ideally this should result in a log.fatal so that it is clearly visible in Kibana.
I think it would be very interesting to develop a system where it was easy to run some "what if..." tweaks to the algorithm and see how they impacted results.
Also, develop some reporting tools that allow some analysis of how well the algorithm is working: are there modules that rank high on popularity, personality, and quality, but suffer on the maintenance score? That might help us discover problems with the maintenance scoring algorithm.
Before changing the average Ys used for the scoring, the tests value should be improved by improving the test coverage analysis. Test coverage is really important for the score we give.
https://npms-io.slack.com/files/andreduarte/F0QU3SYJW/twe.png is mentioned in lib/scoring/score.js, but it's not publicly available.
Hello, I tweeted you two weeks ago (sorry, I was on a backpacking vacation and didn't really get a chance to properly sit down and write this).
I am working on a machine learning project which aims to rank Node packages based on anything that is available online. To name a few sources: GitHub Archive, npm, and the GitHub API. I read your docs and noticed that you already collect most of the data.
Was wondering how interested you would be in collaborating.
503 - Service unavailable
504 - Gateway timeout
500 - Internal error
Branch | Build failing 🚨 |
---|---|
Dependency | coveralls |
Current Version | 2.11.16 |
Type | devDependency |
This version is covered by your current version range and after updating it in your project the build failed.
As coveralls is 'only' a devDependency of this project, it might not break production or downstream projects, but 'only' your build or test tools, preventing new deploys or publishes.
I recommend you give this issue a high priority. I'm sure you can resolve this 💪
Adds branch coverage data to Coveralls API post.
There is a collection of frequently asked questions and of course you may always ask my humans.
Your Greenkeeper Bot 🌴
I'm seeing low quality values for packages using a monorepo. Try searching for Babel or PouchDB for example. Does the algorithm take their different structure into account?
Several developers publish modules without keywords. I don't blame them because, AFAIK, npm does not use them for search purposes. However, these keywords might be in the README. We could select the top words from the README (excluding words from a common stopword list) and also index them. Then we could also use this field with lower priority when doing searches.
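The README keyword extraction described above could be sketched like this; the stopword list is a tiny stand-in for a real one:

```javascript
// Tiny stand-in stopword list; a real implementation would use a
// proper list (e.g. the one bundled with a stopword library).
const STOPWORDS = new Set([
  'the', 'a', 'an', 'and', 'or', 'is', 'to', 'of', 'for', 'this', 'that', 'with',
]);

// Sketch: return the most frequent non-stopword words of a README,
// to be indexed as fallback keywords with a lower search priority.
function topReadmeKeywords(readme, limit = 5) {
  const counts = new Map();

  for (const word of readme.toLowerCase().match(/[a-z][a-z0-9-]+/g) || []) {
    if (!STOPWORDS.has(word)) counts.set(word, (counts.get(word) || 0) + 1);
  }

  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, limit)
    .map(([word]) => word);
}
```

For a README like "The parser parses JSON. JSON parser for the win.", this would surface "parser" and "json" even if the author published no keywords at all.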
Similarly to hasNpmIgnore, it should contribute to the evaluation.quality.carefulness.
I'm following the progress of the migration of the yeoman-generator-list to the npms.io API, but noticed that some packages don't have the github object, like generator-react-fullstack and generator-arc.
Is this a bug, or do the packages' maintainers need to do something?
I don't care if a module matches what I wrote. I want the best one.
For example, for https://npms.io/search?term=electron, 99% of people would want the second result, electron-prebuilt, not an obscure and unmaintained CLI framework with a much lower score.
I found two packages, one is called wysiwyg.
The other package is called ckeditor.
As you can see, wysiwyg has a fairly low download count at 40 downloads per month, while ckeditor is netting 20k per month.
However, the scoring data from npms-analyzer yields these:
"score": {
"final": 0.5160201557764348,
"detail": {
"quality": 0.6212124966030818,
"popularity": 0.27654167033127186,
"maintenance": 0.6653337776559003
}
"score": {
"final": 0.5907400025771101,
"detail": {
"quality": 0.7561566835417988,
"popularity": 0.039946939481510804,
"maintenance": 0.9997473391315477
}
How could wysiwyg get a higher final score? Searching wysiwyg on npms doesn't even yield ckeditor; I had to search for ckeditor exactly to get it. Yet both of these packages have the wysiwyg tag.
Branch | Build failing 🚨 |
---|---|
Dependency | pino |
Current Version | 3.0.4 |
Type | dependency |
This version is covered by your current version range and after updating it in your project the build failed.
As pino is a direct dependency of this project, this is very likely breaking your project right now. If other packages depend on you, it's very likely also breaking them.
I recommend you give this issue a very high priority. I'm sure you can resolve this 💪
The new version differs by 7 commits.
e1d8d8b
Bumped v3.0.5.
e295fe1
Merge branch 'patch-1' of https://github.com/osher/pino into osher-patch-1
b545c01
remove LISENEE... - no need to mention it.
572cf68
include example.js file and test folder
18ef2a6
include example.js file and test folder
a02850f
Use pagkage.files section instead of .npmignore
666b8f5
add .npmignore file
See the full diff.
Deprecated modules should have a VERY LOW quality evaluation. At the moment, they are being lowered but not quite enough.
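One simple way to lower them "quite enough" is a hard multiplicative penalty, sketched below. The 0.2 factor is an assumed value, not the one npms-analyzer actually applies:

```javascript
// Sketch: deprecated packages keep only a small fraction of their
// computed quality score. The factor of 0.2 is an assumption.
function penalizeDeprecated(quality, isDeprecated, factor = 0.2) {
  return isDeprecated ? quality * factor : quality;
}
```

A multiplicative penalty has the nice property that a deprecated package can never outrank a non-deprecated one of similar quality, regardless of how high its raw score is.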
https://npms.io/search?term=wiki doesn't return jingo, which IMO is the most promising wiki engine based on Node.js.
Is this a case of bad SEO, or bad scoring?
Apparently the skimdb registry does not have them, see: https://twitter.com/satazor/status/735107823256866816
Needs investigation.
One thing to note is that the exact-match boost needs to be tweaked so that a boost to "@polymer/polymer" is given when searching for just "polymer".
Use bulk operations in scoring, tasks re-evaluate & tasks re-metadata to improve performance.
https://github.com/npms-io/npms-analyzer/blob/master/docs/architecture.md#maintenance
It considers factors like "Most recent commit" and "Commit frequency", but doesn't take into account that, in the npm ecosystem, small modules can often be finished: they don't receive frequent commits, not because they're badly maintained, but because they're focused on solving one problem and do it well.
In that case, fall back to the npm page.
Example, click the result here: https://npms.io/search?term=alfred-logger
If I write pouchdb in the search box, pouchdb is returned first but with a very bad score: bad quality, bad popularity, and bad maintenance. This is obviously wrong.
As @sindresorhus pointed out, dependentsCount and downloadsCount are much more important than communityInterest (forks, stars, etc.). The current weight scheme gives more importance to communityInterest. We need to change the weights, experimenting with different values until we find the sweet spot.
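The re-weighting could be experimented with along these lines, assuming the three components are already normalized to [0, 1]. The weights below are placeholders to tune, not the real npms values:

```javascript
// Sketch: downloads and dependents dominate the popularity score,
// while communityInterest (forks, stars, ...) counts for less.
// The 0.2/0.4/0.4 split is an assumption to experiment with.
function popularity({ communityInterest, downloadsCount, dependentsCount }) {
  const weights = {
    communityInterest: 0.2,
    downloadsCount: 0.4,
    dependentsCount: 0.4,
  };

  return (
    weights.communityInterest * communityInterest +
    weights.downloadsCount * downloadsCount +
    weights.dependentsCount * dependentsCount
  );
}
```

Keeping the weights in one object makes it cheap to sweep different values and compare the resulting rankings, which is exactly the "what if..." experimentation suggested earlier.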
e.g. when searching for parse-json, the parse-json package should have higher popularity than body-parse-json.
Several popularity metrics need to take into consideration how long the module has existed.
I see the code that crunches the numbers, but is there an easy way to inspect how scores are calculated for a particular package?
In my particular case, I'm a new maintainer of the github-api module, and I'm trying to figure out why our module's score is so abysmally low when, after skimming the calculations used, I can't see any place where we fall horribly short.
https://npms.io/search?term=polymer shows https://www.npmjs.com/package/polymer as the first result.
I'm not sure what's wrong with https://www.npmjs.com/package/Polymer that NPMJS thinks it has no README, but that, or better yet, https://www.npmjs.com/package/@polymer/polymer, should be the first result.
https://www.npmjs.com/package/npm-polymer-elements, published by a personality, doesn't show up until the 5th page of results or so.
Polymer packages are at https://www.npmjs.com/~polymer.
This branch should solve #65 and similar cases. Blocking stuff:
We are already giving a boost with the phrase-prefix query, but it needs to be re-evaluated. When searching for streaming-s3, the streaming-s3 package itself appears on the second page. It feels right at first because its score is low, but it also feels wrong in some way.
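An exact-name boost could be sketched as an Elasticsearch bool query like the one below. The `name.raw` keyword sub-field and the boost of 10 are assumptions for illustration, not the actual npms mapping or value:

```javascript
// Hypothetical sketch: the `should` clause strongly boosts documents
// whose raw (not analyzed) name equals the search term exactly, so the
// package named after the query ranks first even with a low score.
const searchTerm = 'streaming-s3';

const query = {
  bool: {
    must: {
      multi_match: { query: searchTerm, fields: ['name', 'description', 'keywords'] },
    },
    should: {
      term: { 'name.raw': { value: searchTerm, boost: 10 } },
    },
  },
};
```

This would also cover the @polymer/polymer case mentioned earlier if the boost is extended to match the unscoped part of a scoped name.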
Needs investigation, cc @atduarte
Reported in: https://twitter.com/tal_asad/status/741985395756257282
e.g.: socketio should be indexed as socket and socketio.
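One cheap way to get that, sketched below under the assumption of a fixed list of common name suffixes, is to also index the name with the suffix stripped:

```javascript
// Assumed list of common package-name suffixes; a real implementation
// would likely use a longer, curated list.
const SUFFIXES = ['io', 'js', 'db'];

// Sketch: index a package name both as-is and with common suffixes
// stripped, so "socketio" is also indexed as "socket".
function indexTokens(name) {
  const tokens = new Set([name]);

  for (const suffix of SUFFIXES) {
    if (name.endsWith(suffix) && name.length > suffix.length + 2) {
      // Drop the suffix and any trailing separator ("socket-io" -> "socket").
      tokens.add(name.slice(0, -suffix.length).replace(/[-._]$/, ''));
    }
  }

  return [...tokens];
}
```

The length guard avoids mangling short names where the "suffix" is really part of the word.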
You mention "download acceleration" as a ranking attribute. What exactly is that?
I'm seeing a Namecheap domain-expired page when going to npms.io. Are you aware of this?
Branch | Build failing 🚨 |
---|---|
Dependency | fetch-coverage |
Current Version | 1.0.4 |
Type | dependency |
This version is covered by your current version range and after updating it in your project the build failed.
As fetch-coverage is a direct dependency of this project, this is very likely breaking your project right now. If other packages depend on you, it's very likely also breaking them.
I recommend you give this issue a very high priority. I'm sure you can resolve this 💪
The new version differs by 6 commits.
30de3a4
Release 1.1.0
f2cf14f
Merge pull request #6 from dsifford/patch-1
840e442
Add missing 'codecov' service
307f2c3
Merge pull request #1 from IndigoUnited/greenkeeper/update-all
fc1ae41
chore(package): update dependencies
29ec141
Unnecessary.
See the full diff.
The email populated for my coworker @zkat's author field does not match the email populated in the maintainers field. This causes a problem for the following logic:
https://github.com/npms-io/npms-analyzer/blob/master/lib/analyze/collect/metadata.js#L176
Maybe we could do something creative in this edge case, like using the _npmUser field populated in the first publication... nothing else immediately comes to mind.
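The edge case could be handled with a fallback chain like the sketch below; the object shapes and the name-based fallback are assumptions for illustration, not the actual metadata.js logic:

```javascript
// Sketch: match the package author against the maintainers list by
// email first, falling back to a name match when the emails differ
// (as in the @zkat case above). Returns null when nothing matches.
function findAuthorMaintainer(author, maintainers) {
  return (
    maintainers.find((m) => m.email === author.email) ||
    maintainers.find((m) => m.name === author.name) ||
    null
  );
}
```

A further fallback could consult the _npmUser field of the first publication, as suggested above, before giving up.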
It would be really nice if lib/analyze/download/npm.js could be skipped and I could just pass in paths to already-downloaded tarballs.
This would allow us to cache tarballs when rebuilding the database (if/when methods for analyzing source code change) and would allow us to distribute tarballs through alternate methods (like torrent backups). Also, I already have the npm registry backed up, so it would be nice to avoid hitting it again.
This is an awesome project. I pretty much use it every day.
I would love to see this identify which Node versions a module can support without transpiling. For front-end modules, it would help identify whether bundlers like webpack 2 can take advantage of tree-shaking.
Branch | Build failing 🚨 |
---|---|
Dependency | nock |
Current Version | 9.0.8 |
Type | devDependency |
This version is covered by your current version range and after updating it in your project the build failed.
As nock is 'only' a devDependency of this project, it might not break production or downstream projects, but 'only' your build or test tools, preventing new deploys or publishes.
I recommend you give this issue a high priority. I'm sure you can resolve this 💪
The new version differs by 4 commits.
5b1a8d8
9.0.9: Revert PR #802
51809e1
Merge pull request #840 from node-nock/revert-802-fix-754
0a083f6
Revert "Fix request timeout no working"
3728a1f
Changelog v9.0.8
See the full diff.
$ ./cli.js scoring --log-level silly
verb util/bootstrap Elasticsearch is ready
verb util/bootstrap CouchDB for npms is ready
info Starting scoring cycle
info scoring/prepare Preparing scoring..
verb scoring/prepare Gathered elasticsearch info.. { indices: [ 'npms-1461027817406' ],
verb scoring/prepare aliases: { read: [], write: [] } }
verb scoring/prepare Created new index { index: 'npms-1461028580891' }
ERR! Scoring cycle failed { err:
ERR! { [Error: [invalid_alias_name_exception] Invalid alias name [npms-write], an index exists with the same name as the alias, with { index=npms-1461028580891 }]
ERR! status: 400,
ERR! displayName: 'BadRequest',
ERR! message: '[invalid_alias_name_exception] Invalid alias name [npms-write], an index exists with the same name as the alias, with { index=npms-1461028580891 }',
ERR! path: '/_aliases',
ERR! query: {},
ERR! body: { error: [Object], status: 400 },
ERR! statusCode: 400,
ERR! response: '{"error":{"root_cause":[{"type":"invalid_alias_name_exception","reason":"Invalid alias name [npms-write], an index exists with the same name as the alias","index":"npms-1461028580891"}],"type":"invalid_alias_name_exception","reason":"Invalid alias name [npms-write], an index exists with the same name as the alias","index":"npms-1461028580891"},"status":400}',
ERR! toString: [Function],
ERR! toJSON: [Function] } }
ERR! Stack:
ERR! at respond (/usrdata/proj/npms-analyzer/node_modules/elasticsearch/src/lib/transport.js:289:15)
ERR! at checkRespForFailure (/usrdata/proj/npms-analyzer/node_modules/elasticsearch/src/lib/transport.js:248:7)
ERR! at HttpConnector.<anonymous> (/usrdata/proj/npms-analyzer/node_modules/elasticsearch/src/lib/connectors/http.js:164:7)
ERR! at IncomingMessage.wrapper (/usrdata/proj/npms-analyzer/node_modules/elasticsearch/node_modules/lodash/index.js:3095:19)
ERR! at emitNone (events.js:85:20)
ERR! at IncomingMessage.emit (events.js:179:7)
ERR! at endReadableNT (_stream_readable.js:906:12)
ERR! at nextTickCallbackWith2Args (node.js:475:9)
ERR! at process._tickCallback (node.js:389:17)
ERR! From previous event:
ERR! at prepare (/usrdata/proj/npms-analyzer/lib/scoring/prepare.js:61:6)
ERR! at cycle (/usrdata/proj/npms-analyzer/cli/scoring.js:52:5)
ERR! at /usrdata/proj/npms-analyzer/cli/scoring.js:110:21
ERR! at processImmediate [as _immediateCallback] (timers.js:383:17)
ERR! From previous event:
ERR! at /usrdata/proj/npms-analyzer/cli/scoring.js:110:10
ERR! From previous event:
ERR! at Object.Promise.resolve.then.prepare.tap.then.exports.builder.exports.handler.bootstrap.spread [as handler] (/usrdata/proj/npms-analyzer/cli/scoring.js:103:6)
ERR! at Object.self.runCommand (/usrdata/proj/npms-analyzer/node_modules/yargs/lib/command.js:113:22)
ERR! at parseArgs (/usrdata/proj/npms-analyzer/node_modules/yargs/yargs.js:632:24)
ERR! at Object.Yargs.Object.defineProperty.get [as argv] (/usrdata/proj/npms-analyzer/node_modules/yargs/yargs.js:592:16)
ERR! at Object.<anonymous> (/usrdata/proj/npms-analyzer/cli.js:34:1)
ERR! at Module._compile (module.js:413:34)
ERR! at Object.Module._extensions..js (module.js:422:10)
ERR! at Module.load (module.js:357:32)
ERR! at Function.Module._load (module.js:314:12)
ERR! at Function.Module.runMain (module.js:447:10)
ERR! at startup (node.js:140:18)
ERR! at node.js:1001:3
ERR!
info Waiting 1 hour before running the next cycle.. { now: '2016-04-19T01:16:21.080Z' }
stat process pid: 8245; memory: 45.15 MB; uptime: 21 seconds
^C
...I'll start reading up on Elasticsearch