uptake / pkgnet

R package for analyzing other R packages via graph representations of their dependencies

Home Page: https://uptake.github.io/pkgnet/

License: Other

R 99.26% HTML 0.15% Shell 0.60%

code-quality dependency-analysis dependency-graph graph-theory r r-package

pkgnet's Issues

PackageFunctionReporter$calculate_metrics(): error in covr but not tests

Oddly, with PR #42, devtools::test(".") returns a success, no issues found with the unit tests.

> devtools::test(".")
Loading pkgnet
Testing pkgnet
Abstract Graph Reporter Tests: ....
Abstract Package Reporter Tests: ....
CreatePackageReport: ..
Creation of graph of package functions: ...........
Package Dependency Reporter Tests: ........................
Package Function Reporter Tests: ...................NULL
.......

DONE ==================================================================================================
>

However, covr::package_coverage(".") returns the following issue:

> testthat::test_check('pkgnet')
NULL
1. Failure: PackageFunctionReporter Methods Work (@test-PackageFunctionReporter.R#127) 
testObj$calculate_metrics() produced warnings.


testthat results ================================================================
OK: 78 SKIPPED: 0 FAILED: 1
1. Failure: PackageFunctionReporter Methods Work (@test-PackageFunctionReporter.R#127) 

Error: testthat unit tests failed
Execution halted

At the moment, this just means we cannot evaluate the coverage of pkgnet. We can still instate the unit tests and associated code.

It has proven a difficult bug to remedy. Help is appreciated.

pkgdown docs are broken

The file _pkgdown.yml is way out of date with the current state of the package. Fixing this issue involves:

updating _pkgdown.yml to reflect the actual functions in the package
re-running pkgdown::build_site() to generate the docs

A graph is plotted twice from the same package reporter object, and the second plot is flipped compared to the first plot. The color field is changed between plots, one discrete and the other continuous, but I'm not sure if that's the cause. The edges are not reversed.

Both graphs should have the same node arrangement, just different colors.

Example:

 b <- PackageFunctionReporter$new()
  b$set_package('baseballstats'
 , packagePath = system.file('baseballstats',package="pkgnet")
 )
  b$calculate_metrics()
  b$get_raw_data()
  b$set_plot_node_color_scheme(field = "coverageRatio"
                               , pallete = c("red", "green")
  )
 
# First Plot
  expect_output(object = plotObj <- b$plot_network()
                , regexp = "Done creating plot"
                , info = "Plot with continuous coloring has issues")
  
  b$set_plot_node_color_scheme(field = "filename"
                               , pallete = RColorBrewer::brewer.pal(9,"Set1")
  )
  
# Second Plot
  expect_output(object = plotObj <- b$plot_network()
                , regexp = "Done creating plot"
                , info = "Plot with continuous coloring has issues")

First Plot:

Second Plot:

Package Report

I think the original intent of each PackageReporter was that it could knit a child markdown page and give high-level summary stats, which could then be bundled into a single report for your package. Each page would have some intro, a rank-ordered list of "what do I do next if I want to fix this", and maybe some cool plots. I think this ordering should be pluggable, but we can have sensible defaults.

For our current reporters, this is what I envision:

"Dependencies" has a rank-ordered list of packages you inherit for limited functionality, based on the number of dependencies they introduce and how often they are used within your package (this relies on namespacing, which is another good candidate for a reporter: which function calls are ambiguous?).
"Functions" has a centrality measure and ideally a test coverage score, the product of which (with some edge conditioning) lets you know which test to write next.

Is this in line with everyone's thinking?
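
The "Functions" idea can be read as a per-function priority score. The sketch below is only an illustration of that reading: the helper name rank_untested_functions, the column names centrality and coverage, the made-up numbers, and the choice of centrality * (1 - coverage) as "the product" are all assumptions, not pkgnet's implementation.

```r
# Hypothetical helper: rank functions by how central and how untested they are.
# `nodes` is assumed to be a data.frame with columns:
#   node       - function name
#   centrality - any non-negative centrality measure (e.g. out-degree)
#   coverage   - proportion of lines covered by tests, in [0, 1] (NA = unknown)
rank_untested_functions <- function(nodes) {
  coverage <- nodes$coverage
  # Edge condition: treat unknown coverage as completely untested
  coverage[is.na(coverage)] <- 0
  nodes$priority <- nodes$centrality * (1 - coverage)
  # Highest priority first
  nodes[order(-nodes$priority), , drop = FALSE]
}

# Made-up numbers, for illustration only
example_nodes <- data.frame(
  node = c("es_search", "chomp_aggs", "unpack_nested_data"),
  centrality = c(2, 1, 3),
  coverage = c(0, 1, 0.5),
  stringsAsFactors = FALSE
)
ranked <- rank_untested_functions(example_nodes)
print(ranked[, c("node", "priority")])
```

The first row of the result would be the suggested "next test to write" under these assumptions.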

We should have a walkthrough with instructions on using pkgnet to curate a project

Our philosophy for this package is that people using pkgnet should be using it to make thoughtful choices about how they manage their code. We need some docs with one or two examples where we say "here's how you find this thing with pkgnet, here's what to do if it says this"

I think it would be appropriate to put in the project README (which will also get hosted on our pkgdown site)

allow coloring of function nodes based on the script they came from

This should be a byproduct of the covr::tally_coverage function in PR #24. However, I believe these fields are being dropped before populating the nodes table. We should attach them as well.

Adjust Node Placement in Functional Network for Large Networks

Plots of the functional network for large packages are currently very hard to read. #45 might help a bit but this will probably still need some lovin'.

Precision is inconsistent in report tables

Tables produced by get_summary_view() show inconsistent precision. We should add some kind of rounding.
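
A minimal sketch of what such rounding could look like, using base R only. The helper name round_numeric_columns, the default of 3 digits, and the example table are assumptions for illustration; this is not the code behind get_summary_view().

```r
# Hypothetical helper: round every numeric column of a summary table
# to the same number of decimal places, leaving other columns untouched.
round_numeric_columns <- function(tbl, digits = 3) {
  is_num <- vapply(tbl, is.numeric, logical(1))
  for (col in names(tbl)[is_num]) {
    tbl[[col]] <- round(tbl[[col]], digits = digits)
  }
  tbl
}

# Made-up table with mixed precision
summary_tbl <- data.frame(
  node = c("chomp_aggs", "es_search"),
  coverageRatio = c(0.10526, 0.66667),
  stringsAsFactors = FALSE
)
rounded_tbl <- round_numeric_columns(summary_tbl, digits = 3)
print(rounded_tbl)
```

For measures that are very small, signif() may be a better fit than round(), since rounding to a fixed number of decimals can collapse them to zero.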

Package Summary Reporter

This is a high level reporter that implements AbstractPackageReporter. It will provide high level information about the package in question re: authors, number of lines, license etc.

Code style is inconsistent throughout the repo

I propose we adopt camelCase for R6 class names and snake_case for everything else.

I definitely feel strongly that method names in the R6 classes should be snake_case because you need to use the $ accessor

object$doStuff()

is grosser, IMHO, than:

object$do_stuff()

Style guide

Does it make sense to have a style guide? Otherwise it's confusing what we should be doing in terms of:

Method names
Variable names
How to write roxygen documentation
Best way to comment stuff inline
Indent length
Commas

Maybe merge with the CONDUCT.md into a CONTRIBUTING.md?

typo in the vignette

C is dependent upon both A and C.

whoops.

needs to be:

C is dependent upon both A and B.

pkgnet does not currently pass R CMD check

Need to fix a bunch of miscellaneous documentation errors and other small things.

To see what needs to be fixed, you can run the following (after cloning this repo):

R CMD build pkgnet
R CMD check pkgnet_0.1.0.tar.gz --no-tests

Get pkgnet into CRAN compliance

Basically, work through all the errors from the command below to get ready for a CRAN release:

R CMD check --as-cran .

add codecov token ticker

Add a codecov token ticker to the readme to display package coverage.

Here's an example: rendered, raw code (line 2)

"Resetting cached network information" printed for new PackageDependencyReporter and PackageFunctionReporter

Issue: Initializing new PackageDependencyReporter and PackageFunctionReporter and using set_package will result in an unintuitive "Resetting cached network information..." logger statement to be printed.

To reproduce:

> myDepReporter <- PackageDependencyReporter$new()
> myDepReporter$set_package('baseballstats')
INFO [2018-03-25 20:22:31] Resetting cached network information...

and

> myFuncReporter <- PackageFunctionReporter$new()
> myFuncReporter$set_package('baseballstats')
INFO [2018-03-25 20:22:48] Resetting cached network information...

This happens because the reset_cache method of AbstractPackageReporter checks whether the cache is null. Both PackageFunctionReporter and PackageDependencyReporter start with cache as a list of several variables with null values, so the cache itself is never null.

The more complicated and less flexible way to fix this is to have reset_cache check if cache is a list, and then check whether each value in the list is null. This will not work well if we ever end up having cached fields that initialize as a non-null value.

Another implementation that addresses this is to have cache initialized as null with a new reporter instance, and a template default_cache that the cache gets set/reset to. See implementation in older version here and here:
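
The second approach can be sketched with a plain closure. This is an illustration only: pkgnet's reporters are R6 classes, and the names new_reporter, get_messages, and the fields inside default_cache are assumptions made for the example.

```r
# Closure-based stand-in for a reporter (pkgnet itself uses R6 classes).
new_reporter <- function() {
  # Template that the cache is set or reset to
  default_cache <- list(nodes = NULL, edges = NULL, network_measures = NULL)
  cache <- NULL            # NULL means "nothing has been cached yet"
  pkg_name <- NULL
  messages <- character(0) # collected log lines, instead of a real logger

  reset_cache <- function() {
    # Only report a reset when there was a cache to reset
    if (!is.null(cache)) {
      messages <<- c(messages, "Resetting cached network information...")
    }
    cache <<- default_cache
    invisible(NULL)
  }

  set_package <- function(name) {
    reset_cache()
    pkg_name <<- name
    invisible(NULL)
  }

  list(
    set_package = set_package,
    get_messages = function() messages
  )
}

reporter <- new_reporter()
reporter$set_package("baseballstats")  # first call: no reset message
first_messages <- reporter$get_messages()
reporter$set_package("sartre")         # second call: reset message is logged
second_messages <- reporter$get_messages()
```

Because the cache starts as NULL, the check stays a single is.null() call even if some cached fields later get non-null defaults.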

cannot install or otherwise reference test packages during Travis CI checks

We have two packages we created for testing: baseballstats and sartre. They are saved in the inst folder, which seemingly is not copied into the temp folder created for unit testing, and they cannot be referenced otherwise.

Any help finding a workaround to reference these packages for testing would be appreciated. It is currently holding up PR #27.

allow import of igraph object

In the spirit of integration with other packages, allow a network to be given to pkgnet in the form of an igraph object.

igraph is a popular network analysis library and graph format, with interfaces for R, Python, and C/C++.

cran-comments is out of date

@bburns632 at your earliest convenience, can you update cran-comments.md with details of why we had to resubmit?

CreatePackageReport creates the report in /Library instead of the current working directory

When I evaluate CreatePackageReport with the default argument for report_path, it saves the output within the /Library/Frameworks/R.framework/Versions/3.4/Resources/library/pkgnet/package_report/ directory on my machine.

I think the issue is that the getwd() call that is supplied to report_path is not being evaluated within the calling environment. This suspicion was confirmed when I passed a string literal to report_path, as the html report was generated in the correct location.
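
One plausible mechanism, stated here as an assumption rather than something verified against pkgnet's code: R evaluates default arguments lazily, on first use, and getwd() reports the process-wide working directory at that moment. If the working directory changes before report_path is first used (for example while a document is being rendered), the default resolves to the new directory. The two toy functions below, which are hypothetical and not part of pkgnet, show the difference that force() makes.

```r
# Default is evaluated lazily: by the time report_path is first used,
# the working directory has already changed.
lazy_report_path <- function(report_path = getwd()) {
  old_wd <- setwd(tempdir())
  on.exit(setwd(old_wd))
  report_path
}

# force() evaluates the default immediately, before anything else runs.
eager_report_path <- function(report_path = getwd()) {
  force(report_path)
  old_wd <- setwd(tempdir())
  on.exit(setwd(old_wd))
  report_path
}

start_dir <- getwd()
lazy_result <- lazy_report_path()    # resolves to the temporary directory
eager_result <- eager_report_path()  # resolves to the directory we started in
```

Passing a string literal sidesteps the problem for the same reason: there is no deferred getwd() call left to evaluate.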

Clearer error for CreatePackageReport with unavailable package

When running CreatePackageReport with a package that isn't available:
t <- CreatePackageReport(pkg_name = "notarealpackage")

It runs through a fair amount before failing on rendering the summary report.

INFO [2018-04-18 18:47:53] Creating package report for package notarealpackage with reporters:

SummaryReporter
DependencyReporter
FunctionReporter
INFO [2018-04-18 18:47:53] Resetting cached network information...
INFO [2018-04-18 18:47:53] Resetting cached network information...
INFO [2018-04-18 18:47:53] Rendering package report...
Quitting from lines NA-10 (/usr/local/lib/R/3.4/site-library/pkgnet/package_report/package_summary_reporter.Rmd) 
 Error in data.table::data.table(Field = names(desc), Values = unlist(desc)) : 
  column or argument 1 is NULL
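
A fail-fast check at the top of the function would surface the real problem before any reporter runs. The sketch below is a hypothetical helper (assert_package_installed is not a pkgnet function) and assumes "available" means "installed in the current library paths".

```r
# Hypothetical helper: stop early with a readable message when the
# requested package cannot be found in the current library paths.
assert_package_installed <- function(pkg_name) {
  if (!is.character(pkg_name) || length(pkg_name) != 1L || is.na(pkg_name)) {
    stop("pkg_name must be a single, non-missing string", call. = FALSE)
  }
  # system.file() returns "" when the package is not installed
  if (!nzchar(system.file(package = pkg_name))) {
    stop(
      sprintf("Package '%s' is not installed, so no report can be created.", pkg_name),
      call. = FALSE
    )
  }
  invisible(TRUE)
}

ok <- assert_package_installed("stats")
err <- tryCatch(
  assert_package_installed("notarealpackage"),
  error = function(e) conditionMessage(e)
)
print(err)
```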

Return value for CreatePackageReport is pretty gross

Ideally, CreatePackageReport should generate plots AND return the network to you in a usable form.

Right now it returns a somewhat gross list of plotting params.

Try it:

thing <- CreatePackageReport('ggplot2')
str(thing, max.level = 1)

Add ability to investigate individual function dependencies

I know this may be a lot of effort, but it would be great to be able to build dependency graphs for specific functions within a package. Even better, count the number of function calls for functions from other packages. This would be very useful as a package ages and dependencies change. You could look at each function within your package to see if there are any functions which have only a few calls to very "heavy" packages. This way you could look at alternatives to reduce the dependencies of a package.

we should remove the prefix Package* from reporter names

pkgnet is only intended for the analysis of packages. It's unnecessary to have every class's name start with Package. I think this would be cleaner:

PackageSummaryReporter --> SummaryReporter
PackageDependencyReporter --> DependencyReporter
PackageFunctionReporter --> FunctionReporter

@bburns632 @jayqi thoughts? I can do this tomorrow if you agree

Extract Internal Network for Object Oriented Packages.

Similar to functional based packages: inheritance as edges between objects.

allow import of nodes&edges from csv

In the spirit of integration with other packages, allow a network to be given to pkgnet in the form of a csv of nodes and a csv of edges.

calculate_test_coverage() does not update networkMeasures

Apparently, this code does not actually update network_measures.

> t <- CreatePackageReport(pkg_name = "lubridate", pkg_path = "~/repos/lubridate", report_path = "~/lubridateReport.html")
> str(t$FunctionReporter$network_measures)
List of 3
 $ centralization.OutDegree  : num 0.105
 $ centralization.betweenness: num 0.000187
 $ centralization.closeness  : num 0.000507

Results of `PackageFunctionReporter` and `PackageDependencyReporter` are switched

Currently, a PackageFunctionReporter returns the dependency network and PackageDependencyReporter returns the functional network. That should be the other way around.

Example:

t <- PackageFunctionReporter$new()
t$set_package('uptasticsearch', packagePath = "~/repos/uptasticsearch")
t$calculate_metrics()
t$get_raw_data()

returns

$cache
$cache$networkMeasures
NULL


$pkgGraph
IGRAPH 31df00b DN-- 5 2 -- 
+ attr: name (v/c)
+ edges from 31df00b (vertex names):
[1] chomp_aggs        ->es_search  unpack_nested_data->chomp_aggs

$nodes
         node coverage
1:          1       NA
2:          6       NA
3: chomp_aggs        1
4:  es_search        0

$edges
       TARGET             SOURCE value
1:  es_search         chomp_aggs     1
2: chomp_aggs unpack_nested_data     1

<...>

Those nodes and edges reflect the package dependencies, not the functions.

Remove reliance on mvbutils

mvbutils::foodweb is a key function used by the current version of pkgnet. That makes me nervous... mvbutils's most recent release was in 2015, it has no tests AFAICT, and there is no clear package philosophy. Most of what's in that package is irrelevant to pkgnet and (in the spirit of what we're doing here) carrying it around as a shadow dependency seems like a risk we shouldn't tolerate.

Here's the core code for foodweb: https://github.com/markbravington/mvbutils/blob/97c07457e3af05c52ebc5dd241e7b048190dc7df/R/mvbutils.R#L3895

And where we use it: https://github.com/UptakeOpenSource/pkgnet/blob/8d7781dd3774b4abad4abccb11f2cd59a38e93d0/R/PackageFunctionReporter.R#L83

I propose that we contact the package maintainer for mvbutils and talk about the possibility of ripping foodweb out of his package, putting it into ours as a non-exported function, changing it to match our style and other standards, and adding him as an author on pkgnet. Thoughts?

#16 Package score (health) implementation idea

Referring to the package score issue (#16), @bburns632: some quick options we could deploy in the next release are:

Use statistics like the number of downloads every week/month, or a rolling average of the number of downloads, as a parameter to implement the green/yellow/red indicator associated with each dimension and an overall health.
Use the downloads since the most recent major update as a feature to assign a health score to the package. Using download statistics from different packages, we could normalise them and give out a score too.
Getting further data about the downloads, and benchmarking the data for the different qualities (red/yellow/green), will be a quick fix.
I want to open up this issue for further discussion and feedback on this rough sketch. @jameslamb

pkgnet throws an error for some R packages

This R-bloggers article notes an issue with pkgnet throwing a (very unhelpful) error: https://www.r-bloggers.com/comparing-dependencies-of-popular-machine-learning-packages-with-pkgnet/

Some examples of packages throwing the error:
tensorflow, randomForest, gbm

Error text:

Error in data.table::data.table(node = names(igraph::V(self$pkg_graph)), : column or argument 1 is NULL

Proposal: Prevent set_package after a package has already been set

Right now, we have a lot of calculated stuff cached in the Reporters, e.g., nodes, edges, metrics. If a user overwrites the package by calling set_package again, then it invalidates those cached objects and so we have functionality to reset_cache and wipe them out.

This has a downside in that having to worry about cache resetting adds complexity. As an example, it has led to bug #70.

I propose that instead, we prevent users from overwriting the package set for a Reporter (i.e., if private$private_pkg_name exists, throw a fatal error if they try to use set_package).
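
A minimal sketch of the proposed behavior, again using a closure as a stand-in for the R6 reporters. The names new_locked_reporter and get_package, and the wording of the error, are assumptions for illustration.

```r
# Closure-based stand-in for a reporter that refuses to be re-pointed
# at a different package once one has been set.
new_locked_reporter <- function() {
  pkg_name <- NULL

  set_package <- function(name) {
    if (!is.null(pkg_name)) {
      stop(
        sprintf(
          "A package ('%s') has already been set. Create a new reporter to analyze '%s'.",
          pkg_name, name
        ),
        call. = FALSE
      )
    }
    pkg_name <<- name
    invisible(NULL)
  }

  list(
    set_package = set_package,
    get_package = function() pkg_name
  )
}

locked <- new_locked_reporter()
locked$set_package("baseballstats")
second_attempt <- tryCatch(
  locked$set_package("sartre"),
  error = function(e) e
)
```

With this design there is nothing to invalidate, so no cache-resetting logic is needed at all.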

pkg_path in CreatePackageReport needs to be an absolute path

The documentation for CreatePackageReport does not specify whether the argument pkg_path needs to be an absolute path. However, when I used a relative path, I got the following error: Error:path is invalid: <my_relative_path>.

Is this the intended behavior? And, if it is, would you be open to a PR where I clarified the documentation?
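
If relative paths were to be accepted, one option is to resolve them against the working directory up front. The helper below is hypothetical (resolve_pkg_path is not a pkgnet function) and only sketches that idea with base R.

```r
# Hypothetical helper: turn a relative or "~" path into an absolute one,
# and fail with a message that names the working directory it was
# resolved against.
resolve_pkg_path <- function(pkg_path) {
  expanded <- path.expand(pkg_path)
  if (!dir.exists(expanded)) {
    stop(
      sprintf(
        "pkg_path '%s' is not an existing directory (working directory: '%s')",
        pkg_path, getwd()
      ),
      call. = FALSE
    )
  }
  normalizePath(expanded, winslash = "/", mustWork = TRUE)
}

# Demonstration in a throwaway directory
demo_root <- tempfile("pkgpath_demo")
dir.create(file.path(demo_root, "mypkg"), recursive = TRUE)
old_wd <- setwd(demo_root)
resolved <- resolve_pkg_path("mypkg")  # relative input, absolute output
setwd(old_wd)
print(resolved)
```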

Thanks for the package!

add percent coverage as tooltip for Function Network Visualization

In the Function Network visualization, it would be nice to see the percent coverage for a function in the hover over.

It would most likely be an edit/addition to plot_network function.

~~See~~ ~~this vizNetwork example~~ See this page instead

Reinstate tests

After I commented them all out 😬

network plot node coloring is inflexible to discrete factor vs. continuous variable

We need to allow the user to color their network nodes based on some discrete factor of their choosing or a continuous variable such as function coverage proportion. Currently, nodes are colored blue no matter what.
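
One way to handle both cases is to branch on the type of the chosen field. The sketch below uses base R only; the helper name node_colors and its arguments are assumptions for illustration and do not reflect how pkgnet assigns colors.

```r
# Hypothetical helper: map a node attribute to colors.
# Numeric fields are interpolated along the palette; everything else is
# treated as a discrete factor with one palette entry per level.
node_colors <- function(values, palette) {
  if (is.numeric(values)) {
    rng <- range(values, na.rm = TRUE)
    span <- rng[2] - rng[1]
    # Rescale to [0, 1]; if every value is the same, use the palette midpoint
    scaled <- if (span == 0) rep(0.5, length(values)) else (values - rng[1]) / span
    known <- !is.na(values)
    colors <- rep(NA_character_, length(values))
    ramp <- grDevices::colorRamp(palette)
    colors[known] <- grDevices::rgb(ramp(scaled[known]), maxColorValue = 255)
    return(colors)
  }
  levels_seen <- unique(as.character(values[!is.na(values)]))
  # Recycle the palette if there are more levels than colors
  level_colors <- rep_len(palette, length(levels_seen))
  level_colors[match(as.character(values), levels_seen)]
}

continuous <- node_colors(c(0, 1, NA), c("red", "green"))
discrete <- node_colors(c("a.R", "b.R", "a.R"), c("blue", "orange"))
```

Nodes with a missing value come back as NA, so the caller can decide on a fallback color.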

Group orphan nodes together above a threshold value

Create a private variable orphanNodeClusteringThreshold and leverage visNetwork::visClusteringByConnection (or similar visNetwork function) to group orphan nodes together as one cluster node when there are more orphan nodes than the parameter, orphanNodeClusteringThreshold.

Reason being that the graphs get super busy when there are a lot of orphan nodes. See lubridate's graph as an example.
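
Deciding when to cluster only needs the node and edge tables. The sketch below assumes a nodes table with a node column and an edges table with SOURCE and TARGET columns, as in the get_raw_data() output shown elsewhere on this page; the helper names and the snake_case threshold argument are hypothetical.

```r
# Hypothetical helper: orphan nodes are nodes that appear in no edge
find_orphan_nodes <- function(nodes, edges) {
  connected <- unique(c(edges$SOURCE, edges$TARGET))
  setdiff(nodes$node, connected)
}

# Cluster only when there are more orphans than the threshold
should_cluster_orphans <- function(nodes, edges, orphan_node_clustering_threshold = 10) {
  length(find_orphan_nodes(nodes, edges)) > orphan_node_clustering_threshold
}

# Made-up network: a -> b, with c, d, e left unconnected
example_nodes <- data.frame(node = c("a", "b", "c", "d", "e"), stringsAsFactors = FALSE)
example_edges <- data.frame(SOURCE = "a", TARGET = "b", stringsAsFactors = FALSE)

orphans <- find_orphan_nodes(example_nodes, example_edges)
cluster_low <- should_cluster_orphans(example_nodes, example_edges, orphan_node_clustering_threshold = 2)
cluster_high <- should_cluster_orphans(example_nodes, example_edges, orphan_node_clustering_threshold = 10)
```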

Missing edges in function graphs

I am really excited for this package! It is a promising improvement relative to the static visual displays of call graphs from CodeDepends (ref: duncantl/CodeDepends#18).

For CreatePackageReport(pkg_name = "liteq"), here are some screenshots of the function graph. ack() calls db_ack() and consume() calls db_consume(), but the graph does not show this information.

The R CMD check unit test is irrelevant now

We can remove this test since this is what Travis CI will do anyway:

https://github.com/UptakeOpenSource/pkgnet/blob/master/tests/testthat/test-repo_characteristics.R#L13

output nodes and edges of package dependency and function network

Create a function to output the nodes & edge list to csv. This will allow easier integration with other packages.
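
A rough sketch of such a function using base R. The name write_network_csv, the file names nodes.csv and edges.csv, and the example tables are assumptions for illustration.

```r
# Hypothetical helper: write a node table and an edge table to two CSV files
write_network_csv <- function(nodes, edges, out_dir) {
  dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
  nodes_file <- file.path(out_dir, "nodes.csv")
  edges_file <- file.path(out_dir, "edges.csv")
  utils::write.csv(nodes, nodes_file, row.names = FALSE)
  utils::write.csv(edges, edges_file, row.names = FALSE)
  invisible(c(nodes = nodes_file, edges = edges_file))
}

# Example tables shaped like the get_raw_data() output on this page
example_nodes <- data.frame(
  node = c("chomp_aggs", "es_search", "unpack_nested_data"),
  stringsAsFactors = FALSE
)
example_edges <- data.frame(
  SOURCE = c("chomp_aggs", "unpack_nested_data"),
  TARGET = c("es_search", "chomp_aggs"),
  stringsAsFactors = FALSE
)

files <- write_network_csv(example_nodes, example_edges, tempfile("pkgnet_export"))
edges_read_back <- utils::read.csv(files[["edges"]], stringsAsFactors = FALSE)
```

The same two-file layout could serve as the input format for importing a network from CSV.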

appveyor testing is not set up

Looking for someone to figure out how to configure appveyor testing for this repo. I've added the project to the UptakeOpenSource account with appveyor, just need someone to figure out how to create the .appveyor.yml that will run our tests

verify and remove input check functions in `.UpdateNodes`

Within R/PackageFunctionReporter.R, the function .UpdateNodes has a handful of input check functions. Confirm that the new, object oriented structure makes these checks unnecessary and then remove them.

expand coverage metrics to include number of lines per function

This should be a byproduct of the covr::tally_coverage function in PR #24. However, I believe these fields are being dropped before populating the nodes table. We should attach them as well.

node clustering feature has been dropped accidentally

It seems as though node clustering has been removed as a feature in one of our recent PRs. Whoops.

We need to:

Reenable it
write a unit test to check it moving forward.

It was a minor feature, so not an urgent issue. However, it should be PR'd back in before v0.3

CRAN compliance: Test Packages "baseballstats" and "sartre" should not write to \Library folder

pkgnet has a minor CRAN compliance issue. I believe this is due to both issue #78 and our custom subpages for testing and vignettes being built in the \Library folder. The log files on CRAN seem to support this theory.

Packages should not write in the user’s home filespace (including
clipboards), nor anywhere else on the file system apart from the R
session’s temporary directory (or during installation in the location
pointed to by TMPDIR: and such usage should be cleaned up).

Modifications or alternates to the devtools::install_local functions should be explored in the unit tests and the vignette.

Package documentation should use inheritParams

Some arguments like pkg_name and pkg_path are repeated several times in the documentation. We should centralize them using #' @inheritParams from roxygen2
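
For reference, the roxygen2 pattern looks roughly like this. Both functions are hypothetical and heavily simplified; only the @inheritParams tag itself is the point.

```r
#' Create a report (hypothetical, simplified signature)
#'
#' @param pkg_name Name of the package to analyze.
#' @param pkg_path Optional path to the package source directory.
create_report_sketch <- function(pkg_name, pkg_path = NULL) {
  list(pkg_name = pkg_name, pkg_path = pkg_path)
}

#' Summarize a package (hypothetical)
#'
#' pkg_name and pkg_path are documented once, in create_report_sketch,
#' and pulled in here instead of being repeated.
#'
#' @inheritParams create_report_sketch
#' @param digits Number of digits to show in the summary.
summarize_package_sketch <- function(pkg_name, pkg_path = NULL, digits = 3) {
  list(pkg_name = pkg_name, pkg_path = pkg_path, digits = digits)
}
```

Only arguments that are not documented locally are inherited, so digits still needs its own @param line.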

Package Score

As I understand the original vision according to @bburns632, every package should have a "Score" that's a weighted metric of all the factors we consider for a package. It should have an indicator like green/yellow/red associated with each dimension and an overall health. The purpose of this in the future is not only to list your dependencies, but also your dependencies' health as a function of their structure and their own dependencies' health too.

I want to open up this issue for discussion to see if this is still on the table, and if so, what are some rough sketches about what it should look like?

Network Plots Not Rendering in HTML Reports

Sometimes the dependency or functional networks do not show in the created HTML report.

Check and fail fast for bad inputs to CreatePackageReport

If a user puts something bad for report_path, like a directory instead of a filename, CreatePackageReport runs all the way through but then fails when it tries to knit the rmd.

Input:

t <- CreatePackageReport(pkg_name = "lubridate", pkg_path = "~/repos/lubridate", report_path = "~")

Result:

INFO [2018-04-18 18:40:20] Creating package report for package lubridate with reporters:

SummaryReporter
DependencyReporter
FunctionReporter
INFO [2018-04-18 18:40:20] Resetting cached network information...
INFO [2018-04-18 18:40:20] Resetting cached network information...
INFO [2018-04-18 18:40:20] Rendering package report...
[WARNING] This document format requires a nonempty <title> element.
  Please specify either 'title' or 'pagetitle' in the metadata.
  Falling back to 'package_report.utf8'
pandoc: /Users/jqi: openFile: inappropriate type (Is a directory)
Error: pandoc document conversion failed with error 1
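
A check like the following, run before any reporter does work, would catch this case early. It is a sketch only: check_report_path is a hypothetical helper, and the suggested file name package_report.html is an assumption.

```r
# Hypothetical helper: validate report_path before doing any expensive work
check_report_path <- function(report_path) {
  expanded <- path.expand(report_path)
  if (dir.exists(expanded)) {
    stop(
      sprintf(
        "report_path '%s' is a directory; please supply a file name such as '%s'.",
        report_path, file.path(report_path, "package_report.html")
      ),
      call. = FALSE
    )
  }
  if (!dir.exists(dirname(expanded))) {
    stop(
      sprintf("The directory for report_path '%s' does not exist.", report_path),
      call. = FALSE
    )
  }
  invisible(expanded)
}

good_path <- file.path(tempdir(), "package_report.html")
checked <- check_report_path(good_path)
dir_error <- tryCatch(check_report_path(tempdir()), error = function(e) e)
```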

Function Network Plots Not 100% Repeatable

If you render the visualization for the function network multiple times in a row, you will see a few different versions.

To observe this, run:

library(pkgnet)
t <- CreatePackageReport("lubridate")
t[["PackageFunctionReporter"]]$plot_network()

... and repeat...
