
pkgnet's People

Contributors

adammcelhinney, bburns632, bburns7, bobknox42, brian-burns-tc, j450h1, jameslamb, jayqi, mcguinlu, mfrasco, olivroy, patrick-boueri, shivamx96, terrytangyuan, tylergrantsmith, wdearden

pkgnet's Issues

PackageFunctionReporter$calculate_metrics(): error in covr but not tests

Oddly, with PR #42, devtools::test(".") succeeds and finds no issues with the unit tests.

> devtools::test(".")
Loading pkgnet
Testing pkgnet
Abstract Graph Reporter Tests: ....
Abstract Package Reporter Tests: ....
CreatePackageReport: ..
Creation of graph of package functions: ...........
Package Dependency Reporter Tests: ........................
Package Function Reporter Tests: ...................NULL
.......

DONE ==================================================================================================
> 

However, covr::package_coverage(".") returns the following issue:

> testthat::test_check('pkgnet')
NULL
1. Failure: PackageFunctionReporter Methods Work (@test-PackageFunctionReporter.R#127) 
testObj$calculate_metrics() produced warnings.


testthat results ================================================================
OK: 78 SKIPPED: 0 FAILED: 1
1. Failure: PackageFunctionReporter Methods Work (@test-PackageFunctionReporter.R#127) 

Error: testthat unit tests failed
Execution halted

At the moment, this just means we cannot evaluate the test coverage of pkgnet. We can still keep the unit tests and associated code in place.

It has proven a difficult bug to remedy. Help is appreciated.

pkgdown docs are broken

The file _pkgdown.yml is way out of date with the current state of the package. Fixing this issue involves:

  1. updating _pkgdown.yml to reflect the actual functions in the package
  2. re-running pkgdown::build_site() to generate the docs

function graph flipping

A graph is plotted twice from the same package reporter object, and the second plot is flipped compared to the first plot. The color field is changed between plots, one discrete and the other continuous, but I'm not sure if that's the cause. The edges are not reversed.

Both graphs should have the same node arrangement, just different colors.

Example:

b <- PackageFunctionReporter$new()
b$set_package('baseballstats'
              , packagePath = system.file('baseballstats', package = "pkgnet")
)
b$calculate_metrics()
b$get_raw_data()
b$set_plot_node_color_scheme(field = "coverageRatio"
                             , pallete = c("red", "green")
)

# First Plot
expect_output(object = plotObj <- b$plot_network()
              , regexp = "Done creating plot"
              , info = "Plot with continuous coloring has issues")

b$set_plot_node_color_scheme(field = "filename"
                             , pallete = RColorBrewer::brewer.pal(9, "Set1")
)

# Second Plot
expect_output(object = plotObj <- b$plot_network()
              , regexp = "Done creating plot"
              , info = "Plot with continuous coloring has issues")

First Plot:
image

Second Plot:
image

Package Report

I think the original intent was that each PackageReporter could knit a child markdown page with high-level summary stats, which could then be bundled into a single report for your package. Each page would have a short intro and a rank-ordered list of "what do I do next if I want to fix this", as well as maybe some cool plots. I think this ordering should be pluggable, but we can have sensible defaults.

For our current reporters, this is what I envision:

  • "Dependencies" has a rank-ordered list of the packages you depend on for limited functionality, ranked by the number of dependencies they introduce and how often they are used within the package. (This relies on namespacing, which is also a good candidate for a reporter: which functions are ambiguous?)
  • "Functions" has a centrality measure and ideally a test coverage score, the product of which (with some edge conditioning) tells you which test to write next.
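One way to read the "Functions" idea as code is sketched below. The column names, the example values, and the scoring formula `centrality * (1 - coverage)` are assumptions made for illustration; pkgnet does not define any of them here, and the "edge conditioning" mentioned above is left out.

```r
# Hypothetical per-function metrics; names and numbers are made up.
function_metrics <- data.frame(
  node = c("es_search", "chomp_aggs", "unpack_nested_data"),
  centrality = c(0.60, 0.30, 0.10),
  coverage = c(0.00, 1.00, 0.50),
  stringsAsFactors = FALSE
)

# Assumed priority score: central functions with little coverage rank first.
rank_test_priorities <- function(metrics) {
  metrics$priority <- metrics$centrality * (1 - metrics$coverage)
  metrics[order(metrics$priority, decreasing = TRUE), ]
}

ranked <- rank_test_priorities(function_metrics)
print(ranked[, c("node", "priority")])
```

The first row of `ranked` is the function whose missing tests would matter most under this (assumed) scoring.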

Is this in line with everyone's thinking?

We should have a walkthrough with instructions on using pkgnet to curate a project

Our philosophy for this package is that people using pkgnet should be using it to make thoughtful choices about how they manage their code. We need some docs with one or two examples where we say "here's how you find this thing with pkgnet, here's what to do if it says this"

I think it would be appropriate to put this in the project README (which will also get hosted on our pkgdown site).

Package Summary Reporter

This is a high-level reporter that implements AbstractPackageReporter. It will provide high-level information about the package in question: authors, number of lines, license, etc.

Code style is inconsistent throughout the repo

I propose we adopt camelCase for R6 class names and snake_case for everything else.

I definitely feel strongly that method names in the R6 classes should be snake_case because you need to use the $ accessor

object$doStuff()

is grosser, IMHO, than:

object$do_stuff()

Style guide

Does it make sense to have a style guide? Otherwise it's confusing what we should be doing in terms of:

  • Method names
  • Variable names
  • How to write roxygen documentation
  • Best way to comment stuff inline
  • Indent length
  • Commas

Maybe merge with the CONDUCT.md into a CONTRIBUTING.md?

typo in the vignette

C is dependent upon both A and C.

whoops.

needs to be:

C is dependent upon both A and B.

pkgnet does not currently pass R CMD check

Need to fix a bunch of miscellaneous documentation errors and other small things.

To see what needs to be fixed, you can run the following (after cloning this repo):

R CMD build pkgnet
R CMD check pkgnet_0.1.0.tar.gz --no-tests

"Resetting cached network information" printed for new PackageDependencyReporter and PackageFunctionReporter

Issue: Initializing a new PackageDependencyReporter or PackageFunctionReporter and calling set_package prints an unintuitive "Resetting cached network information..." logger statement.

To reproduce:

> myDepReporter <- PackageDependencyReporter$new()
> myDepReporter$set_package('baseballstats')
INFO [2018-03-25 20:22:31] Resetting cached network information...

and

> myFuncReporter <- PackageFunctionReporter$new()
> myFuncReporter$set_package('baseballstats')
INFO [2018-03-25 20:22:48] Resetting cached network information...

This happens because the reset_cache method of AbstractPackageReporter checks whether the cache is null. Both PackageFunctionReporter and PackageDependencyReporter start with cache as a list of several variables with null values, so the cache itself is never null and the message is always logged.

The more complicated and less flexible way to fix this is to have reset_cache check if cache is a list, and then check whether each value in the list is null. This will not work well if we ever end up having cached fields that initialize as a non-null value.

Another implementation that addresses this is to have cache initialized as null with a new reporter instance, and a template default_cache that the cache gets set/reset to. See implementation in older version here and here:
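Independently of the linked implementation, the second approach could look roughly like the sketch below. It uses a plain closure instead of an R6 class to stay dependency-free, and the names `new_reporter`, `default_cache`, and the cache fields are hypothetical stand-ins, not pkgnet's actual code.

```r
# Sketch only: a closure-based stand-in for an R6 reporter.
new_reporter <- function() {
  # Template the cache is set or reset to (field names are illustrative).
  default_cache <- list(nodes = NULL, edges = NULL, network_measures = NULL)
  cache <- NULL     # a fresh reporter starts with no cache at all
  pkg_name <- NULL

  reset_cache <- function() {
    # Only announce a reset when something was actually cached before.
    if (!is.null(cache)) {
      message("Resetting cached network information...")
    }
    cache <<- default_cache
    invisible(NULL)
  }

  set_package <- function(name) {
    reset_cache()
    pkg_name <<- name
    invisible(NULL)
  }

  list(set_package = set_package, get_cache = function() cache)
}

reporter <- new_reporter()
reporter$set_package("baseballstats")  # first call: no reset message
reporter$set_package("sartre")         # second call: reset message is shown
```

Because the check is on the cache as a whole, fields that initialize to non-null values in `default_cache` would not change the behaviour.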

cannot install or otherwise reference test packages during Travis CI checks

We have two packages we created for testing: baseballstats and sartre. They are saved in the inst folder, which seemingly is not copied into the temp folder created for unit testing, and they cannot be referenced any other way.

Any help with a workaround to reference these packages for testing would be appreciated. It is currently holding up PR #27.

allow import of igraph object

In the spirit of integration with other packages, allow a network to be given to pkgnet in the form of an igraph object.

igraph is a popular network analysis library and graph format, with interfaces for R, Python, and C/C++.

CreatePackageReport creates the report in /Library instead of the current working directory

When I evaluate CreatePackageReport with the default argument for report_path, it saves the output within the /Library/Frameworks/R.framework/Versions/3.4/Resources/library/pkgnet/package_report/ directory on my machine.

I think the issue is that the getwd() call that is supplied to report_path is not being evaluated within the calling environment. This suspicion was confirmed when I passed a string literal to report_path, as the html report was generated in the correct location.
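One mechanism that would produce this behaviour is R's lazy evaluation of default arguments: a default such as getwd() is only evaluated when the argument is first used, so it reflects whatever the working directory is at that moment. The sketch below illustrates that mechanism with two hypothetical functions; it is an assumption about the cause, not a confirmed diagnosis of CreatePackageReport.

```r
# The default is evaluated lazily, i.e. the first time report_path is used.
report_lazy <- function(report_path = file.path(getwd(), "report.html")) {
  old_wd <- setwd(tempdir())  # stands in for any step that changes directory
  on.exit(setwd(old_wd))
  report_path                 # getwd() runs here, after the directory change
}

# Forcing the argument first pins it to the caller's working directory.
report_eager <- function(report_path = file.path(getwd(), "report.html")) {
  force(report_path)
  old_wd <- setwd(tempdir())
  on.exit(setwd(old_wd))
  report_path
}

print(report_lazy())   # path under the temporary directory
print(report_eager())  # path under the original working directory
```

Passing a string literal sidesteps the problem in both versions, which matches the observation above.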

Clearer error for CreatePackageReport with unavailable package

When running CreatePackageReport with a package that isn't available:
t <- CreatePackageReport(pkg_name = "notarealpackage")

It runs through a fair amount before failing on rendering the summary report.

INFO [2018-04-18 18:47:53] Creating package report for package notarealpackage with reporters:

SummaryReporter
DependencyReporter
FunctionReporter
INFO [2018-04-18 18:47:53] Resetting cached network information...
INFO [2018-04-18 18:47:53] Resetting cached network information...
INFO [2018-04-18 18:47:53] Rendering package report...
Quitting from lines NA-10 (/usr/local/lib/R/3.4/site-library/pkgnet/package_report/package_summary_reporter.Rmd) 
 Error in data.table::data.table(Field = names(desc), Values = unlist(desc)) : 
  column or argument 1 is NULL 
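A clearer failure could come from checking for the package up front. A minimal sketch, assuming the package is expected to be installed; `assert_package_available` is a hypothetical helper, not part of pkgnet:

```r
# Hypothetical helper; not part of pkgnet's API.
assert_package_available <- function(pkg_name) {
  if (!is.character(pkg_name) || length(pkg_name) != 1L || is.na(pkg_name)) {
    stop("pkg_name must be a single string.", call. = FALSE)
  }
  # system.file() returns "" when the package is not installed.
  if (!nzchar(system.file(package = pkg_name))) {
    stop(sprintf("Package '%s' is not installed, so no report can be created.",
                 pkg_name), call. = FALSE)
  }
  invisible(TRUE)
}

assert_package_available("stats")              # passes silently
# assert_package_available("notarealpackage")  # would stop with a clear error
```

Calling a check like this before any reporter is constructed would stop the run before the rendering step is reached.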

Return value for CreatePackageReport is pretty gross

Ideally, CreatePackageReport should generate plots AND return the network to you in a usable form.

Right now it returns a somewhat gross list of plotting params.

Try it:

thing <- CreatePackageReport('ggplot2')
str(thing, max.level = 1)

Add ability to investigate individual function dependencies

I know this may be a lot of effort, but it would be great to be able to build dependency graphs for specific functions within a package. Even better, count the number of calls to functions from other packages. This would be very useful as a package ages and dependencies change. You could look at each function within your package to see if there are any functions which have only a few calls to very "heavy" packages. This way you could look at alternatives to reduce the dependencies of a package.
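As a rough illustration of the counting idea, the sketch below walks a function's body and tallies explicit `pkg::fun` references. It is a simplification under stated assumptions: `count_namespaced_calls` is a hypothetical helper, and it misses functions that are imported and then called without a namespace prefix.

```r
# Hypothetical helper: tally explicit pkg::fun (or pkg:::fun) references.
count_namespaced_calls <- function(fn) {
  found <- character(0)
  walk <- function(expr) {
    if (!is.call(expr)) {
      return(invisible(NULL))
    }
    head <- expr[[1]]
    if (is.symbol(head) && as.character(head) %in% c("::", ":::")) {
      found <<- c(found, paste0(as.character(expr[[2]]), "::",
                                as.character(expr[[3]])))
      return(invisible(NULL))
    }
    for (i in seq_along(expr)) {
      # Skip empty arguments such as the gap in x[, 1].
      if (!identical(expr[[i]], quote(expr = ))) {
        walk(expr[[i]])
      }
    }
    invisible(NULL)
  }
  walk(body(fn))
  table(found)
}

example_fn <- function(x) {
  centre <- stats::median(x)
  spread <- stats::median(abs(x - centre))
  utils::head(c(centre, spread), 1)
}
print(count_namespaced_calls(example_fn))
```

Grouping the resulting names by package prefix would give a per-function count of calls into each dependency.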

we should remove the prefix Package* from reporter names

pkgnet is only intended for the analysis of packages. It's unnecessary to have every class's name start with Package. I think this would be cleaner:

  • PackageSummaryReporter --> SummaryReporter
  • PackageDependencyReporter --> DependencyReporter
  • PackageFunctionReporter --> FunctionReporter

@bburns632 @jayqi thoughts? I can do this tomorrow if you agree

calculate_test_coverage() does not update networkMeasures

Apparently, this code does not actually update network_measures.

> t <- CreatePackageReport(pkg_name = "lubridate", pkg_path = "~/repos/lubridate", report_path = "~/lubridateReport.html")
> str(t$FunctionReporter$network_measures)
List of 3
 $ centralization.OutDegree  : num 0.105
 $ centralization.betweenness: num 0.000187
 $ centralization.closeness  : num 0.000507

Results of `PackageFunctionReporter` and `PackageDependencyReporter` are switched

Currently, a PackageFunctionReporter returns the dependency network and PackageDependencyReporter returns the functional network. That should be the other way around.

Example:

t <- PackageFunctionReporter$new()
t$set_package('uptasticsearch', packagePath = "~/repos/uptasticsearch")
t$calculate_metrics()
t$get_raw_data()

returns

$cache
$cache$networkMeasures
NULL


$pkgGraph
IGRAPH 31df00b DN-- 5 2 -- 
+ attr: name (v/c)
+ edges from 31df00b (vertex names):
[1] chomp_aggs        ->es_search  unpack_nested_data->chomp_aggs

$nodes
         node coverage
1:          1       NA
2:          6       NA
3: chomp_aggs        1
4:  es_search        0

$edges
       TARGET             SOURCE value
1:  es_search         chomp_aggs     1
2: chomp_aggs unpack_nested_data     1

<...>

Those nodes and edges reflect the package dependencies, not the functions.

Remove reliance on mvbutils

mvbutils::foodweb is a key function used by the current version of pkgnet. That makes me nervous... mvbutils's most recent release was in 2015, it has no tests AFAICT, and there is no clear package philosophy. Most of what's in that package is irrelevant to pkgnet and (in the spirit of what we're doing here) carrying it around as a shadow dependency seems like a risk we shouldn't tolerate.

Here's the core code for foodweb: https://github.com/markbravington/mvbutils/blob/97c07457e3af05c52ebc5dd241e7b048190dc7df/R/mvbutils.R#L3895

And where we use it: https://github.com/UptakeOpenSource/pkgnet/blob/8d7781dd3774b4abad4abccb11f2cd59a38e93d0/R/PackageFunctionReporter.R#L83

I propose that we contact the package maintainer for mvbutils and talk about the possibility of ripping foodweb out of his package, putting it into ours as a non-exported function, changing it to match our style and other standards, and adding him as an author on pkgnet. Thoughts?

#16 Package score (health) implementation idea

Referring to the package score issue (#16), @bburns632: some quick options we could use in the next release are:

  1. Use statistics such as the number of downloads per week or month, or a rolling average of the number of downloads, as a parameter for the green/yellow/red indicator associated with each dimension and with overall health.
  2. Use the downloads since the most recent major update as a feature for assigning a health rating to the package. Using the same statistics from different packages, we could normalise them and give out a score too.
  3. Get further data about the downloads and benchmark it against the different quality levels (red/yellow/green). This would be a quick fix.

I want to open up this issue for further discussion and feedback on this rough sketch. @jameslamb
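To make the first option concrete, here is a minimal sketch that turns a rolling average of weekly downloads into a green/yellow/red label. The function names, the 4-week window, and the thresholds are all placeholders chosen for illustration; real cut-offs would have to come from the benchmarking described in option 3.

```r
# Trailing moving average over `window` periods (NA until enough data exists).
rolling_mean <- function(x, window) {
  as.numeric(stats::filter(x, rep(1 / window, window), sides = 1))
}

# Thresholds are placeholders, to be replaced by benchmarked values.
health_from_downloads <- function(weekly_downloads, window = 4,
                                  green_at = 1000, yellow_at = 100) {
  if (length(weekly_downloads) < window) {
    return(NA_character_)  # not enough history for one full window
  }
  averages <- rolling_mean(weekly_downloads, window)
  latest <- averages[length(averages)]
  if (is.na(latest)) {
    return(NA_character_)
  }
  if (latest >= green_at) {
    "green"
  } else if (latest >= yellow_at) {
    "yellow"
  } else {
    "red"
  }
}

print(health_from_downloads(c(900, 1200, 1100, 1300, 1500)))
```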

Proposal: Prevent set_package after a package has already been set

Right now, we have a lot of calculated stuff cached in the Reporters, e.g., nodes, edges, metrics. If a user overwrites the package by calling set_package again, then it invalidates those cached objects and so we have functionality to reset_cache and wipe them out.

This has a downside in that having to worry about cache resetting adds complexity. As an example, it has led to bug #70.

I propose that instead, we prevent users from overwriting the package set for a Reporter (i.e., if private$private_pkg_name exists, throw a fatal error if they try to use set_package).
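A minimal sketch of the proposed guard, written as a plain closure rather than an R6 class so it runs without extra packages. `new_locked_reporter` and its internals are hypothetical; in pkgnet the check would sit inside set_package and look at the private package-name field.

```r
# Sketch only: a reporter whose package can be set exactly once.
new_locked_reporter <- function() {
  pkg_name <- NULL

  set_package <- function(name) {
    if (!is.null(pkg_name)) {
      stop(sprintf(
        "Package '%s' is already set on this reporter; create a new reporter instead.",
        pkg_name
      ), call. = FALSE)
    }
    pkg_name <<- name
    invisible(NULL)
  }

  list(set_package = set_package, get_package = function() pkg_name)
}

reporter <- new_locked_reporter()
reporter$set_package("baseballstats")
# reporter$set_package("sartre")  # would stop with a fatal error
```

With this guard in place there is nothing to invalidate, so no cache-reset logic is needed.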

pkg_path in CreatePackageReport needs to be an absolute path

The documentation in CreatePackageReport does not specify whether the argument pkg_path needs to be an absolute path. However, when I used a relative path, I got the following error: "Error: path is invalid: <my_relative_path>".

Is this the intended behavior? And, if it is, would you be open to a PR where I clarified the documentation?
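In the meantime, a relative path can be converted before it is passed in. A small sketch using base R's normalizePath(); `to_absolute_path` is a hypothetical wrapper, and whether pkgnet should do this internally is the open question here.

```r
# Hypothetical wrapper: resolve a path against the current working directory.
# mustWork = TRUE makes a non-existent path fail immediately.
to_absolute_path <- function(pkg_path) {
  normalizePath(pkg_path, winslash = "/", mustWork = TRUE)
}

print(to_absolute_path("."))
# Intended use (not run here):
# CreatePackageReport(pkg_name = "lubridate",
#                     pkg_path = to_absolute_path("../lubridate"))
```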

Thanks for the package!

Group orphan nodes together above a threshold value

Create a private variable orphanNodeClusteringThreshold and leverage visNetwork::visClusteringByConnection (or similar visNetwork function) to group orphan nodes together as one cluster node when there are more orphan nodes than the parameter, orphanNodeClusteringThreshold.

The reason is that the graphs get super busy when there are a lot of orphan nodes. See lubridate's graph as an example.
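The thresholding logic, separate from any visNetwork call, could be sketched as follows. The helper names and the "orphan_cluster" label are hypothetical; the SOURCE and TARGET column names follow the edge tables printed elsewhere in these issues.

```r
# Orphans are nodes that appear in no edge, as source or as target.
find_orphan_nodes <- function(nodes, edges) {
  setdiff(nodes, union(edges$SOURCE, edges$TARGET))
}

# Collapse orphans into one placeholder node once they exceed the threshold.
group_orphan_nodes <- function(nodes, edges, orphanNodeClusteringThreshold) {
  orphans <- find_orphan_nodes(nodes, edges)
  if (length(orphans) > orphanNodeClusteringThreshold) {
    c(setdiff(nodes, orphans), "orphan_cluster")
  } else {
    nodes
  }
}

nodes <- c("a", "b", "c", "d", "e", "f")
edges <- data.frame(SOURCE = "a", TARGET = "b", stringsAsFactors = FALSE)
print(group_orphan_nodes(nodes, edges, orphanNodeClusteringThreshold = 3))
```

The actual grouping in the plot would still need a visNetwork clustering function; this only decides when grouping should happen and which nodes it covers.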

appveyor testing is not set up

Looking for someone to figure out how to configure appveyor testing for this repo. I've added the project to the UptakeOpenSource account with appveyor; we just need someone to figure out how to create the .appveyor.yml that will run our tests.

node clustering feature has been dropped accidentally

It seems as though node clustering has been removed as a feature in one of our recent PRs. Whoops.

We need to:

  1. Re-enable it.
  2. Write a unit test to check it moving forward.

It was a minor feature, so this is not an urgent issue. However, it should be PR'd back in before v0.3.

CRAN compliance: Test Packages "baseballstats" and "sartre" should not write to \Library folder

pkgnet has a minor CRAN compliance issue. I believe this is due to both issue #78 and our custom subpages for testing and vignettes being built in the \Library folder. The log files on CRAN seem to support this theory.

Packages should not write in the user’s home filespace (including
clipboards), nor anywhere else on the file system apart from the R
session’s temporary directory (or during installation in the location
pointed to by TMPDIR: and such usage should be cleaned up).

Modifications or alternates to the devtools::install_local functions should be explored in the unit tests and the vignette.

Package Score

As I understand the original vision according to @bburns632, every package should have a "Score" that's a weighted metric of all the factors we consider for a package. It should have an indicator like green/yellow/red associated with each dimension and an overall health. The purpose of this in the future is not only to list your dependencies, but also your dependencies' health as a function of their structure and their own dependencies' health too.

I want to open up this issue for discussion to see if this is still on the table, and if so, what are some rough sketches about what it should look like?

Check and fail fast for bad inputs to CreatePackageReport

If a user puts something bad for report_path, like a directory instead of a filename, CreatePackageReport runs all the way through but then fails when it tries to knit the rmd.

Input:

t <- CreatePackageReport(pkg_name = "lubridate", pkg_path = "~/repos/lubridate", report_path = "~")

Result:

INFO [2018-04-18 18:40:20] Creating package report for package lubridate with reporters:

SummaryReporter
DependencyReporter
FunctionReporter
INFO [2018-04-18 18:40:20] Resetting cached network information...
INFO [2018-04-18 18:40:20] Resetting cached network information...
INFO [2018-04-18 18:40:20] Rendering package report...
[WARNING] This document format requires a nonempty <title> element.
  Please specify either 'title' or 'pagetitle' in the metadata.
  Falling back to 'package_report.utf8'
pandoc: /Users/jqi: openFile: inappropriate type (Is a directory)
Error: pandoc document conversion failed with error 1
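A fail-fast check could run before any reporter work starts. A minimal sketch under stated assumptions: `validate_report_path` is a hypothetical helper, and requiring an .html extension is an assumption based on the report being an HTML file.

```r
# Hypothetical helper; not part of pkgnet's API.
validate_report_path <- function(report_path) {
  if (!is.character(report_path) || length(report_path) != 1L ||
      is.na(report_path)) {
    stop("report_path must be a single string.", call. = FALSE)
  }
  expanded <- path.expand(report_path)  # turn "~" into the home directory
  if (dir.exists(expanded)) {
    stop(sprintf("report_path must be a file name, not a directory: %s",
                 report_path), call. = FALSE)
  }
  if (!dir.exists(dirname(expanded))) {
    stop(sprintf("Directory for report_path does not exist: %s",
                 dirname(expanded)), call. = FALSE)
  }
  # Assumption: the report is always rendered as HTML.
  if (!grepl("\\.html$", expanded, ignore.case = TRUE)) {
    stop("report_path should end in .html.", call. = FALSE)
  }
  invisible(TRUE)
}

validate_report_path(file.path(tempdir(), "report.html"))  # passes silently
# validate_report_path("~")  # would stop before any rendering happens
```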

Function Network Plots Not 100% Repeatable

If you render the visualization for the function network multiple times in a row, you will see a few different versions.

To observe this, run:

library(pkgnet)
t <- CreatePackageReport("lubridate")
t[["PackageFunctionReporter"]]$plot_network()

... and repeat...
