npms-io / npms-analyzer
The analyzer behind https://npms.io
License: MIT License
Many frontend developers will thank you :)
Two interesting links:
npm/registry#8
I feel that, if you have a (tiny) module with some (huge) dependencies, the quality of the dependencies, which are also executing in a user's environment, should affect the overall quality of your module (a bit).
As a side product this would encourage developers to look for (alternative) quality dependencies, if they have a choice, or even motivate them to contribute to the quality of their dependencies. Win-Win.
For example, I am currently working on the quality of one of my modules, and I previously outsourced a couple of its internals to their own modules. Now that these no longer affect the quality of the main module, even though they are substantial parts of it, I don't feel the same motivation to improve their quality as well (mainly coverage). That is stupid of course, but it could be solved once and for all by technology.
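The blending proposed above could be sketched like this, assuming scores normalized to [0, 1]. The function name and the 0.85/0.15 split are illustrative assumptions, not the npms-analyzer formula:

```javascript
// Hypothetical sketch: blend a package's own quality with the mean
// quality of its direct dependencies, so huge low-quality dependencies
// drag the score down "a bit". The depWeight of 0.15 is an assumption.
function blendedQuality(ownQuality, depQualities, depWeight = 0.15) {
  if (depQualities.length === 0) return ownQuality;

  const depMean = depQualities.reduce((a, b) => a + b, 0) / depQualities.length;

  return (1 - depWeight) * ownQuality + depWeight * depMean;
}

// A tiny module with two low-quality dependencies loses a little quality.
console.log(blendedQuality(0.9, [0.3, 0.4]));
```

A small dependency weight keeps the incentive (improving your dependencies nudges your own score up) without letting dependencies dominate the score.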
The search for "MySQL" returns different results than searching for "mysql". I'm guessing the search should not be case-sensitive.
libraries.io seems to have a similar goal to this project. It might be worthwhile to work with them, either offering them more detailed data on npm, or reusing some of their components https://github.com/librariesio
It's a bit odd to see modules without tests get a quality score of ~100.
Multiple packages point to popular GitHub repos and most have almost no downloads. Searching for jquery this is very clear:
https://www.npmjs.com/package/jquery-compat
https://www.npmjs.com/package/jquery1
https://www.npmjs.com/package/shimney-jquery
Since these packages all point to the jQuery GitHub repo, they end up having a high score, but they shouldn't.
This affects current searches, but could affect even more in the future if some scammers discover this issue.
Suggestion: if there's more than one package with the same GitHub repo, only consider the GitHub stats for the one with the most downloads on npm.
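The suggestion could be sketched as follows; the package shape (`repo`, `downloads`, `githubStats`) is assumed for illustration and is not the actual npms-analyzer data model:

```javascript
// Sketch: when several packages share the same GitHub repository, only
// the one with the most npm downloads keeps the GitHub-based stats.
function dedupeGithubStats(packages) {
  const best = new Map(); // repo URL -> package with the most downloads

  for (const pkg of packages) {
    const current = best.get(pkg.repo);
    if (!current || pkg.downloads > current.downloads) best.set(pkg.repo, pkg);
  }

  // Strip GitHub stats from every package that lost the tie-break.
  return packages.map((pkg) =>
    best.get(pkg.repo) === pkg ? pkg : { ...pkg, githubStats: null }
  );
}
```

Under this rule, jquery keeps the jQuery repo's stars while jquery-compat, jquery1, and shimney-jquery would score on their own merits.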
See, for example, babel-core:
https://npms.io/search?term=babel-core (6.9.1)
https://www.npmjs.com/package/babel-core (6.10.4)
The current stemmer is very loose. We opted for this because the aggressive one was causing some trouble, e.g.: expression -> express. However, we are losing some meaningful results: e.g. pm2 contains process and manager, but searching for process management does not return it.
We could experiment with having both loose and aggressive stemmers as separate fields and give the aggressive stemmer field a lower weight.
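The two-field idea could look like the Elasticsearch query below. The field names (`*.stemmed_loose` / `*.stemmed_aggressive`) and the boosts are hypothetical, not the actual npms index mapping:

```javascript
// Hypothetical sketch: query both stemmer variants, weighting the
// aggressively-stemmed fields lower so they only act as a tie-breaker.
const query = {
  multi_match: {
    query: 'process management',
    fields: [
      'name.stemmed_loose^3',
      'description.stemmed_loose^2',
      'name.stemmed_aggressive^1.5',
      'description.stemmed_aggressive',
    ],
  },
};
```

With the aggressive stemmer, "management" and "manager" reduce to the same root, so pm2's description would now match, just at a lower weight than a loose-stemmer hit.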
With a search for "pdf" I was expecting to find the most popular PDF generators (jspdf and pdfkit) but the result I got was not really good. Maybe you should give more credit to the GitHub stars?
I might be off-base with this since I've only just begun to read the code, but have you considered moving the metadata processing from lib/analyze/collect/metadata.js into map-reduce? All that data is already in CouchDB anyway, and MR is excellent for bulk processing.
Ideally this should result in a log.fatal so that it is clearly visible in Kibana.
I think it would be very interesting to develop a system where it was easy to run some "what if..." tweaks to the algorithm and see how they impacted results.
Also, develop some reporting tools that allow some analysis of how well the algorithm is working: are there modules that rank high on popularity, personality, and quality, but suffer on the maintenance score? That might help us discover problems with the maintenance scoring algorithm.
Before changing the average Ys used for the scoring, the tests value should be improved by improving the test coverage analysis. Test coverage is really important for the score we give.
https://npms-io.slack.com/files/andreduarte/F0QU3SYJW/twe.png is mentioned in lib/scoring/score.js, but it's not publicly available.
Hello, I tweeted you two weeks ago (sorry, I was on a backpacking vacation and didn't really get a chance to properly sit down and write this).
I am working on a machine learning project which aims to rank Node packages based on anything that is available online. To name a few sources: GitHub Archive, npm, and the GitHub API. I read your docs and noticed that you already collect most of the data.
Was wondering how interested you would be in collaborating.
503 - Service unavailable
504 - Gateway timeout
500 - Internal error
Branch | Build failing 🚨 |
---|---|
Dependency | coveralls |
Current Version | 2.11.16 |
Type | devDependency |
This version is covered by your current version range and after updating it in your project the build failed.
As coveralls is 'only' a devDependency of this project, it might not break production or downstream projects, but 'only' your build or test tools, preventing new deploys or publishes.
I recommend you give this issue a high priority. I'm sure you can resolve this 💪
Adds branch coverage data to Coveralls API post.
There is a collection of frequently asked questions and of course you may always ask my humans.
Your Greenkeeper Bot 🌴
I'm seeing low quality values for packages using a monorepo. Try searching for Babel or PouchDB for example. Does the algorithm take their different structure into account?
Several developers publish modules without keywords. I don't blame them because, AFAIK, npm does not use them for search purposes. However, these keywords might be in the README. We could select the top words from the README (excluding words from a common stopword list) and also index them. Then we could also use this field with lower priority when doing searches.
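The README keyword extraction described above could be sketched like this; the stopword list is a tiny stand-in for a real one:

```javascript
// Tiny stand-in stopword list; a real implementation would use a
// proper list (e.g. the one bundled with a stopword library).
const STOPWORDS = new Set([
  'the', 'a', 'an', 'and', 'or', 'is', 'to', 'of', 'for', 'this', 'that', 'with',
]);

// Sketch: return the most frequent non-stopword words of a README,
// to be indexed as fallback keywords with a lower search priority.
function topReadmeKeywords(readme, limit = 5) {
  const counts = new Map();

  for (const word of readme.toLowerCase().match(/[a-z][a-z0-9-]+/g) || []) {
    if (!STOPWORDS.has(word)) counts.set(word, (counts.get(word) || 0) + 1);
  }

  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, limit)
    .map(([word]) => word);
}
```

For a README like "The parser parses JSON. JSON parser for the win.", this would surface "parser" and "json" even if the author published no keywords at all.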
Similarly to hasNpmIgnore, it should contribute to the evaluation.quality.carefulness.
I'm following the progress of the migration of the yeoman-generator-list to the npms.io API, but noticed that some packages don't have the github object, like generator-react-fullstack and generator-arc.
Is this a bug, or do the packages' maintainers need to do something?
I don't care if a module matches what I wrote. I want the best one.
For example, for https://npms.io/search?term=electron, 99% of people would want the second result, electron-prebuilt, not an obscure and unmaintained CLI framework with a much lower score.
I found two packages, one is called wysiwyg.
The other package is called ckeditor.
As you can see, wysiwyg has a fairly low download count at 40 downloads per month, while ckeditor is netting 20k per month.
However, the scoring data from npms-analyzer yields these:
"score": {
"final": 0.5160201557764348,
"detail": {
"quality": 0.6212124966030818,
"popularity": 0.27654167033127186,
"maintenance": 0.6653337776559003
}
"score": {
"final": 0.5907400025771101,
"detail": {
"quality": 0.7561566835417988,
"popularity": 0.039946939481510804,
"maintenance": 0.9997473391315477
}
How could wysiwyg get a higher final score? Searching wysiwyg on npms doesn't even yield ckeditor; I had to search for ckeditor exactly to get it. Yet both of these packages have the wysiwyg tag.
Branch | Build failing 🚨 |
---|---|
Dependency | pino |
Current Version | 3.0.4 |
Type | dependency |
This version is covered by your current version range and after updating it in your project the build failed.
As pino is a direct dependency of this project, this is very likely breaking your project right now. If other packages depend on you, it's very likely also breaking them.
I recommend you give this issue a very high priority. I'm sure you can resolve this 💪
The new version differs by 7 commits.
e1d8d8b
Bumped v3.0.5.
e295fe1
Merge branch 'patch-1' of https://github.com/osher/pino into osher-patch-1
b545c01
remove LISENEE... - no need to mention it.
572cf68
include example.js file and test folder
18ef2a6
include example.js file and test folder
a02850f
Use pagkage.files section instead of .npmignore
666b8f5
add .npmignore file
See the full diff.
Deprecated modules should have a VERY LOW quality evaluation. At the moment, they are being lowered but not quite enough.
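One simple way to lower them "quite enough" is a hard multiplicative penalty, sketched below. The 0.2 factor is an assumed value, not the one npms-analyzer actually applies:

```javascript
// Sketch: deprecated packages keep only a small fraction of their
// computed quality score. The factor of 0.2 is an assumption.
function penalizeDeprecated(quality, isDeprecated, factor = 0.2) {
  return isDeprecated ? quality * factor : quality;
}
```

A multiplicative penalty has the nice property that a deprecated package can never outrank a non-deprecated one of similar quality, regardless of how high its raw score is.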
https://npms.io/search?term=wiki doesn't return jingo, which IMO is the most promising wiki engine based on Node.js.
Is this a case of bad SEO, or bad scoring?
Apparently the skimdb registry does not have them, see: https://twitter.com/satazor/status/735107823256866816
Needs investigation.
One thing to note is that the exact-match boost needs to be tweaked so that a boost to "@polymer/polymer" is given when searching for just "polymer".
Use bulk operations in scoring, tasks re-evaluate & tasks re-metadata to improve performance.
https://github.com/npms-io/npms-analyzer/blob/master/docs/architecture.md#maintenance
It considers factors like "Most recent commit" and "Commit frequency", but doesn't take into account that, in the npm ecosystem, small modules can often be finished: they don't receive frequent commits, not because they're badly maintained, but because they're focused on solving one problem and do it well.
In that case, fall back to the npm page.
Example, click the result here: https://npms.io/search?term=alfred-logger
If I write pouchdb in the search box, pouchdb is returned first but with a very bad score: bad quality, bad popularity, and bad maintenance. This is obviously wrong.
As @sindresorhus pointed out, dependentsCount and downloadsCount are much more important than communityInterest (forks, stars, etc.). The current weight scheme gives more importance to communityInterest. We need to change the weights, experimenting with different values until we find the sweet spot.
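The re-weighting could be experimented with along these lines, assuming the three components are already normalized to [0, 1]. The weights below are placeholders to tune, not the real npms values:

```javascript
// Sketch: downloads and dependents dominate the popularity score,
// while communityInterest (forks, stars, ...) counts for less.
// The 0.2/0.4/0.4 split is an assumption to experiment with.
function popularity({ communityInterest, downloadsCount, dependentsCount }) {
  const weights = {
    communityInterest: 0.2,
    downloadsCount: 0.4,
    dependentsCount: 0.4,
  };

  return (
    weights.communityInterest * communityInterest +
    weights.downloadsCount * downloadsCount +
    weights.dependentsCount * dependentsCount
  );
}
```

Keeping the weights in one object makes it cheap to sweep different values and compare the resulting rankings, which is exactly the "what if..." experimentation suggested earlier.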
e.g. when searching for parse-json, the parse-json package should have higher popularity than body-parse-json.
Several popularity metrics need to take into consideration how long the module has existed.
I see the code that crunches the numbers, but is there an easy way to inspect how scores are calculated for a particular package?
In my particular case, I'm a new maintainer of the github-api module, and I'm trying to figure out why our module's score is so abysmally low when, after skimming the calculations used, I can't see any place where we fall horribly short.
https://npms.io/search?term=polymer shows https://www.npmjs.com/package/polymer as the first result.
I'm not sure what's wrong with https://www.npmjs.com/package/Polymer that NPMJS thinks it has no README, but that, or better yet, https://www.npmjs.com/package/@polymer/polymer, should be the first result.
https://www.npmjs.com/package/npm-polymer-elements, published by a personality, doesn't show up until the 5th page of results or so.
Polymer packages are at https://www.npmjs.com/~polymer.
This branch should solve #65 and similar cases. Blocking stuff:
We are already giving a boost with the phrase-prefix query, but it needs to be re-evaluated. When searching for streaming-s3, the streaming-s3 package itself appears on the second page. It feels right at first because its score is low, but it also feels wrong in some way.
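An exact-name boost could be sketched as an Elasticsearch bool query like the one below. The `name.raw` keyword sub-field and the boost of 10 are assumptions for illustration, not the actual npms mapping or value:

```javascript
// Hypothetical sketch: the `should` clause strongly boosts documents
// whose raw (not analyzed) name equals the search term exactly, so the
// package named after the query ranks first even with a low score.
const searchTerm = 'streaming-s3';

const query = {
  bool: {
    must: {
      multi_match: { query: searchTerm, fields: ['name', 'description', 'keywords'] },
    },
    should: {
      term: { 'name.raw': { value: searchTerm, boost: 10 } },
    },
  },
};
```

This would also cover the @polymer/polymer case mentioned earlier if the boost is extended to match the unscoped part of a scoped name.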
Needs investigation, cc @atduarte
Reported in: https://twitter.com/tal_asad/status/741985395756257282
e.g.: socketio should be indexed as socket and socketio.
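One cheap way to get that, sketched below under the assumption of a fixed list of common name suffixes, is to also index the name with the suffix stripped:

```javascript
// Assumed list of common package-name suffixes; a real implementation
// would likely use a longer, curated list.
const SUFFIXES = ['io', 'js', 'db'];

// Sketch: index a package name both as-is and with common suffixes
// stripped, so "socketio" is also indexed as "socket".
function indexTokens(name) {
  const tokens = new Set([name]);

  for (const suffix of SUFFIXES) {
    if (name.endsWith(suffix) && name.length > suffix.length + 2) {
      // Drop the suffix and any trailing separator ("socket-io" -> "socket").
      tokens.add(name.slice(0, -suffix.length).replace(/[-._]$/, ''));
    }
  }

  return [...tokens];
}
```

The length guard avoids mangling short names where the "suffix" is really part of the word.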
You mention "download acceleration" as a ranking attribute. What exactly is that?
I'm seeing a Namecheap domain-expired page when going to npms.io. Are you aware of this?
Branch | Build failing 🚨 |
---|---|
Dependency | fetch-coverage |
Current Version | 1.0.4 |
Type | dependency |
This version is covered by your current version range and after updating it in your project the build failed.
As fetch-coverage is a direct dependency of this project, this is very likely breaking your project right now. If other packages depend on you, it's very likely also breaking them.
I recommend you give this issue a very high priority. I'm sure you can resolve this 💪
The new version differs by 6 commits.
30de3a4
Release 1.1.0
f2cf14f
Merge pull request #6 from dsifford/patch-1
840e442
Add missing 'codecov' service
307f2c3
Merge pull request #1 from IndigoUnited/greenkeeper/update-all
fc1ae41
chore(package): update dependencies
29ec141
Unnecessary.
See the full diff.
The email populated for my coworker @zkat's author field does not match the email populated in the maintainers field. This causes a problem for the following logic:
https://github.com/npms-io/npms-analyzer/blob/master/lib/analyze/collect/metadata.js#L176
Maybe we could do something creative in this edge case, like using the _npmUser field populated in the first publication... nothing else immediately comes to mind.
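The edge case could be handled with a fallback chain like the sketch below; the object shapes and the name-based fallback are assumptions for illustration, not the actual metadata.js logic:

```javascript
// Sketch: match the package author against the maintainers list by
// email first, falling back to a name match when the emails differ
// (as in the @zkat case above). Returns null when nothing matches.
function findAuthorMaintainer(author, maintainers) {
  return (
    maintainers.find((m) => m.email === author.email) ||
    maintainers.find((m) => m.name === author.name) ||
    null
  );
}
```

A further fallback could consult the _npmUser field of the first publication, as suggested above, before giving up.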
It would be really nice if lib/analyze/download/npm.js could be skipped and I could just pass in paths to already-downloaded tarballs.
This would allow us to cache tarballs when rebuilding the database (if/when methods for analyzing source code change) and would allow us to distribute tarballs through alternate methods (like torrent backups). Also, I already have the npm registry backed up, so it would be nice to avoid hitting it again.
This is an awesome project. I pretty much use it every day.
I would love to see this identify which Node versions a module can support without transpiling. For front-end modules, it would help identify whether bundlers like webpack 2 can take advantage of tree-shaking.
Branch | Build failing 🚨 |
---|---|
Dependency | nock |
Current Version | 9.0.8 |
Type | devDependency |
This version is covered by your current version range and after updating it in your project the build failed.
As nock is 'only' a devDependency of this project, it might not break production or downstream projects, but 'only' your build or test tools, preventing new deploys or publishes.
I recommend you give this issue a high priority. I'm sure you can resolve this 💪
The new version differs by 4 commits.
5b1a8d8
9.0.9: Revert PR #802
51809e1
Merge pull request #840 from node-nock/revert-802-fix-754
0a083f6
Revert "Fix request timeout no working"
3728a1f
Changelog v9.0.8
See the full diff.
$ ./cli.js scoring --log-level silly
verb util/bootstrap Elasticsearch is ready
verb util/bootstrap CouchDB for npms is ready
info Starting scoring cycle
info scoring/prepare Preparing scoring..
verb scoring/prepare Gathered elasticsearch info.. { indices: [ 'npms-1461027817406' ],
verb scoring/prepare aliases: { read: [], write: [] } }
verb scoring/prepare Created new index { index: 'npms-1461028580891' }
ERR! Scoring cycle failed { err:
ERR! { [Error: [invalid_alias_name_exception] Invalid alias name [npms-write], an index exists with the same name as the alias, with { index=npms-1461028580891 }]
ERR! status: 400,
ERR! displayName: 'BadRequest',
ERR! message: '[invalid_alias_name_exception] Invalid alias name [npms-write], an index exists with the same name as the alias, with { index=npms-1461028580891 }',
ERR! path: '/_aliases',
ERR! query: {},
ERR! body: { error: [Object], status: 400 },
ERR! statusCode: 400,
ERR! response: '{"error":{"root_cause":[{"type":"invalid_alias_name_exception","reason":"Invalid alias name [npms-write], an index exists with the same name as the alias","index":"npms-1461028580891"}],"type":"invalid_alias_name_exception","reason":"Invalid alias name [npms-write], an index exists with the same name as the alias","index":"npms-1461028580891"},"status":400}',
ERR! toString: [Function],
ERR! toJSON: [Function] } }
ERR! Stack:
ERR! at respond (/usrdata/proj/npms-analyzer/node_modules/elasticsearch/src/lib/transport.js:289:15)
ERR! at checkRespForFailure (/usrdata/proj/npms-analyzer/node_modules/elasticsearch/src/lib/transport.js:248:7)
ERR! at HttpConnector.<anonymous> (/usrdata/proj/npms-analyzer/node_modules/elasticsearch/src/lib/connectors/http.js:164:7)
ERR! at IncomingMessage.wrapper (/usrdata/proj/npms-analyzer/node_modules/elasticsearch/node_modules/lodash/index.js:3095:19)
ERR! at emitNone (events.js:85:20)
ERR! at IncomingMessage.emit (events.js:179:7)
ERR! at endReadableNT (_stream_readable.js:906:12)
ERR! at nextTickCallbackWith2Args (node.js:475:9)
ERR! at process._tickCallback (node.js:389:17)
ERR! From previous event:
ERR! at prepare (/usrdata/proj/npms-analyzer/lib/scoring/prepare.js:61:6)
ERR! at cycle (/usrdata/proj/npms-analyzer/cli/scoring.js:52:5)
ERR! at /usrdata/proj/npms-analyzer/cli/scoring.js:110:21
ERR! at processImmediate [as _immediateCallback] (timers.js:383:17)
ERR! From previous event:
ERR! at /usrdata/proj/npms-analyzer/cli/scoring.js:110:10
ERR! From previous event:
ERR! at Object.Promise.resolve.then.prepare.tap.then.exports.builder.exports.handler.bootstrap.spread [as handler] (/usrdata/proj/npms-analyzer/cli/scoring.js:103:6)
ERR! at Object.self.runCommand (/usrdata/proj/npms-analyzer/node_modules/yargs/lib/command.js:113:22)
ERR! at parseArgs (/usrdata/proj/npms-analyzer/node_modules/yargs/yargs.js:632:24)
ERR! at Object.Yargs.Object.defineProperty.get [as argv] (/usrdata/proj/npms-analyzer/node_modules/yargs/yargs.js:592:16)
ERR! at Object.<anonymous> (/usrdata/proj/npms-analyzer/cli.js:34:1)
ERR! at Module._compile (module.js:413:34)
ERR! at Object.Module._extensions..js (module.js:422:10)
ERR! at Module.load (module.js:357:32)
ERR! at Function.Module._load (module.js:314:12)
ERR! at Function.Module.runMain (module.js:447:10)
ERR! at startup (node.js:140:18)
ERR! at node.js:1001:3
ERR!
info Waiting 1 hour before running the next cycle.. { now: '2016-04-19T01:16:21.080Z' }
stat process pid: 8245; memory: 45.15 MB; uptime: 21 seconds
^C
...I'll start reading up on Elasticsearch