uptake / pkgnet
R package for analyzing other R packages via graph representations of their dependencies
Home Page: https://uptake.github.io/pkgnet/
License: Other
Oddly, with PR #42, devtools::test(".")
returns a success, no issues found with the unit tests.
> devtools::test(".")
Loading pkgnet
Testing pkgnet
Abstract Graph Reporter Tests: ....
Abstract Package Reporter Tests: ....
CreatePackageReport: ..
Creation of graph of package functions: ...........
Package Dependency Reporter Tests: ........................
Package Function Reporter Tests: ...................NULL
.......
DONE ==================================================================================================
>
However, covr::package_coverage(".")
returns the following issue:
> testthat::test_check('pkgnet')
NULL
1. Failure: PackageFunctionReporter Methods Work (@test-PackageFunctionReporter.R#127)
testObj$calculate_metrics() produced warnings.
testthat results ================================================================
OK: 78 SKIPPED: 0 FAILED: 1
1. Failure: PackageFunctionReporter Methods Work (@test-PackageFunctionReporter.R#127)
Error: testthat unit tests failed
Execution halted
At the moment, this just means we cannot evaluate the test coverage of pkgnet. We can still put the unit tests and associated code in place.
It has proven a difficult bug to remedy. Help is appreciated.
The file _pkgdown.yml
is way out of date with the current state of the package. Fixing this issue involves:
Updating _pkgdown.yml to reflect the actual functions in the package
Running pkgdown::build_site() to generate the docs
A graph is plotted twice from the same package reporter object, and the second plot is flipped compared to the first plot. The color field is changed between plots, one discrete and the other continuous, but I'm not sure if that's the cause. The edges are not reversed.
Both graphs should have the same node arrangement, just different colors.
Example:
b <- PackageFunctionReporter$new()
b$set_package('baseballstats'
, packagePath = system.file('baseballstats',package="pkgnet")
)
b$calculate_metrics()
b$get_raw_data()
b$set_plot_node_color_scheme(field = "coverageRatio"
, pallete = c("red", "green")
)
# First Plot
expect_output(object = plotObj <- b$plot_network()
, regexp = "Done creating plot"
, info = "Plot with continuous coloring has issues")
b$set_plot_node_color_scheme(field = "filename"
, pallete = RColorBrewer::brewer.pal(9,"Set1")
)
# Second Plot
expect_output(object = plotObj <- b$plot_network()
, regexp = "Done creating plot"
, info = "Plot with continuous coloring has issues")
I think the original intent of each PackageReporter was that it would be able to knit a child markdown page and give high-level summary stats that could then be bundled into a single report for your package. Each page would have some intro about it, and a rank-ordered list of "what do I do next if I want to fix this", as well as maybe some cool plots. I think this ordering should be pluggable, but we can have sensible defaults.
For our current reporters this is what I envision:
Is this in line with everyone's thinking?
Our philosophy for this package is that people using pkgnet
should be using it to make thoughtful choices about how they manage their code. We need some docs with one or two examples where we say "here's how you find this thing with pkgnet, here's what to do if it says this"
I think it would be appropriate to put in the project README (which will also get hosted on our pkgdown
site)
This should be a byproduct of the covr::tally_coverage
function in PR #24 . However, I believe these fields are being dropped before populating the nodes table. We should attach them as well
Plots of the functional network for large packages are currently very hard to read. #45 might help a bit but this will probably still need some lovin'.
This is a high level reporter that implements AbstractPackageReporter. It will provide high level information about the package in question re: authors, number of lines, license etc.
I propose we adopt camelCase for R6 class names and snake_case for everything else.
I definitely feel strongly that method names in the R6 classes should be snake_case because you need to use the $
accessor
object$doStuff()
is grosser, IMHO, than:
object$do_stuff()
Does it make sense to have a style guide? Otherwise it's confusing what we should be doing in terms of:
Maybe merge with the CONDUCT.md into a CONTRIBUTING.md?
C is dependent upon both A and C.
whoops.
needs to be:
C is dependent upon both A and B.
Need to fix a bunch of miscellaneous documentation errors and other small things.
To see what needs to be fixed, you can run the following (after cloning this repo):
R CMD build pkgnet
R CMD check pkgnet_0.1.0.tar.gz --no-tests
Basically, work through all the errors from the command below to get ready for a CRAN release
R CMD check --as-cran .
Add a codecov
token ticker to the readme to display package coverage.
Here's an example: rendered , raw code - line 2
Issue: Initializing new PackageDependencyReporter
and PackageFunctionReporter
and using set_package
will result in an unintuitive "Resetting cached network information..." logger statement to be printed.
To reproduce:
> myDepReporter <- PackageDependencyReporter$new()
> myDepReporter$set_package('baseballstats')
INFO [2018-03-25 20:22:31] Resetting cached network information...
and
> myFuncReporter <- PackageFunctionReporter$new()
> myFuncReporter$set_package('baseballstats')
INFO [2018-03-25 20:22:48] Resetting cached network information...
This happens because the reset_cache
method of AbstractPackageReporter
is checking whether the cache
is null. Both PackageFunctionReporter
and PackageDependencyReporter
start with cache
as a list of several variables with null values.
The more complicated and less flexible way to fix this is to have reset_cache
check if cache
is a list, and then check whether each value in the list is null. This will not work well if we ever end up having cached fields that initialize as a non-null value.
Another implementation that addresses this is to have cache
initialized as null with a new reporter instance, and a template default_cache
that the cache
gets set/reset to. See implementation in older version here and here:
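The second approach could look roughly like the sketch below. It uses a plain environment instead of an R6 class so that it runs without extra packages. The constructor new_reporter and the cache field names are hypothetical and are not pkgnet's actual implementation.

```r
# Hypothetical stand-in for a reporter; not pkgnet's real class.
new_reporter <- function() {
    self <- new.env()

    # Template that the cache is set/reset to (field names are illustrative)
    default_cache <- list(nodes = NULL, edges = NULL, network_measures = NULL)

    # A brand-new reporter has no cache at all
    self$cache <- NULL

    self$reset_cache <- function() {
        # Only log when there was an existing cache to throw away
        if (!is.null(self$cache)) {
            message("Resetting cached network information...")
        }
        self$cache <- default_cache
        invisible(self)
    }

    self$set_package <- function(pkg_name) {
        self$pkg_name <- pkg_name
        self$reset_cache()
    }

    self
}

reporter <- new_reporter()
reporter$set_package("baseballstats")  # silent: nothing was cached yet
reporter$set_package("sartre")         # logs the reset message
```

Because the check is on the cache as a whole rather than on each field, this keeps working even if a cached field is later given a non-null initial value in the template.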
We have two packages we created for testing: baseballstats
and sartre
. They are saved in the inst
folder, but seemingly are not copied into the temp folder created for unit testing and cannot be referenced otherwise.
Any help for a work around to reference these packages for testing would be appreciated. It is currently holding up PR #27 .
In the spirit of integration with other packages, allow a network to be given to pkgnet
in the form of an igraph
object.
igraph is a popular network analysis tool & format with interfaces for R, Python, and C/C++.
@bburns632 at your earliest convenience, can you update cran-comments.md
with details of why we had to resubmit?
When I evaluate CreatePackageReport
with the default argument for report_path
, it saves the output within the /Library/Frameworks/R.framework/Versions/3.4/Resources/library/pkgnet/package_report/
directory on my machine.
I think the issue is that the getwd()
call that is supplied to report_path
is not being evaluated within the calling environment. This suspicion was confirmed when I passed a string literal to report_path
, as the html report was generated in the correct location.
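If the suspicion is right, one way this could happen is R's lazy evaluation of default arguments: a default such as getwd() is only evaluated when the argument is first used, so anything that changes the working directory beforehand changes the result. The sketch below illustrates that mechanism in isolation. Both functions are hypothetical, and setwd(tempdir()) merely stands in for whatever step might change the directory during report creation.

```r
# Default arguments are promises: getwd() runs when report_dir is first used,
# not at the moment the function is called.
lazy_report_dir <- function(report_dir = getwd()) {
    old_wd <- setwd(tempdir())  # stand-in for a step that changes directory
    on.exit(setwd(old_wd))
    report_dir                  # promise is forced here, after the change
}

eager_report_dir <- function(report_dir = getwd()) {
    force(report_dir)           # evaluate the default before anything else runs
    old_wd <- setwd(tempdir())
    on.exit(setwd(old_wd))
    report_dir
}

lazy_report_dir()   # resolves to the temporary directory
eager_report_dir()  # resolves to the caller's working directory
```

Passing a string literal sidesteps the problem because there is no deferred getwd() call left to evaluate.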
When running CreatePackageReport
with a package that isn't available:
t <- CreatePackageReport(pkg_name = "notarealpackage")
It runs through a fair amount before failing on rendering the summary report.
INFO [2018-04-18 18:47:53] Creating package report for package notarealpackage with reporters:
SummaryReporter
DependencyReporter
FunctionReporter
INFO [2018-04-18 18:47:53] Resetting cached network information...
INFO [2018-04-18 18:47:53] Resetting cached network information...
INFO [2018-04-18 18:47:53] Rendering package report...
Quitting from lines NA-10 (/usr/local/lib/R/3.4/site-library/pkgnet/package_report/package_summary_reporter.Rmd)
Error in data.table::data.table(Field = names(desc), Values = unlist(desc)) :
column or argument 1 is NULL
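A check at the very start of the report would let this fail fast with a clearer message. The following is a minimal sketch, assuming the package only needs to be findable in the installed library. assert_package_installed is a hypothetical helper, not an existing pkgnet function.

```r
# Hypothetical helper, not part of pkgnet: stop early if the package
# cannot be found among the installed packages.
assert_package_installed <- function(pkg_name) {
    # system.file() returns "" when the package cannot be found
    if (!nzchar(system.file(package = pkg_name))) {
        stop(
            sprintf("Package '%s' is not installed, so no report can be created.", pkg_name),
            call. = FALSE
        )
    }
    invisible(TRUE)
}

assert_package_installed("stats")  # passes silently

# Capture the error message instead of halting the script
result <- tryCatch(
    assert_package_installed("notarealpackage"),
    error = function(e) conditionMessage(e)
)
```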
Ideally, CreatePackageReport
should generate plots AND return the network to you in a usable form.
Right now it returns a somewhat gross list of plotting params.
Try it:
thing <- CreatePackageReport('ggplot2')
str(thing, max.level = 1)
I know this may be a lot of effort, but it would be great to be able to build dependency graphs for specific functions within a package. Even better, count the number of function calls for functions from other packages. This would be very useful as a package ages and dependencies change. You could look at each function within your package to see if there are any functions which have only a few calls to very "heavy" packages. This way you could look at alternatives to reduce the dependencies of a package.
pkgnet
is only intended for the analysis of packages. It's unnecessary to have every class's name start with Package
. I think this would be cleaner:
PackageSummaryReporter
--> SummaryReporter
PackageDependencyReporter
--> DependencyReporter
PackageFunctionReporter
--> FunctionReporter
@bburns632 @jayqi thoughts? I can do this tomorrow if you agree
Similar to functional based packages: inheritance as edges between objects.
In the spirit of integration with other packages, allow a network to be given to pkgnet
in the form of a csv of nodes and a csv of edges.
Apparently, this code does not actually update network_measures.
> t <- CreatePackageReport(pkg_name = "lubridate", pkg_path = "~/repos/lubridate", report_path = "~/lubridateReport.html")
> str(t$FunctionReporter$network_measures)
List of 3
$ centralization.OutDegree : num 0.105
$ centralization.betweenness: num 0.000187
$ centralization.closeness : num 0.000507
Currently, a PackageFunctionReporter
returns the dependency network and PackageDependencyReporter
returns the functional network. That should be the other way around.
Example:
t <- PackageFunctionReporter$new()
t$set_package('uptasticsearch', packagePath = "~/repos/uptasticsearch")
t$calculate_metrics()
t$get_raw_data()
returns
$cache
$cache$networkMeasures
NULL
$pkgGraph
IGRAPH 31df00b DN-- 5 2 --
+ attr: name (v/c)
+ edges from 31df00b (vertex names):
[1] chomp_aggs ->es_search unpack_nested_data->chomp_aggs
$nodes
node coverage
1: 1 NA
2: 6 NA
3: chomp_aggs 1
4: es_search 0
$edges
TARGET SOURCE value
1: es_search chomp_aggs 1
2: chomp_aggs unpack_nested_data 1
<...>
Those nodes and edges reflect the package dependencies, not the functions.
mvbutils::foodweb
is a key function used by the current version of pkgnet
. That makes me nervous. The mvbutils
package's most recent release was in 2015, it has no tests AFAICT, and there is no clear package philosophy. Most of what's in that package is irrelevant to pkgnet
and (in the spirit of what we're doing here) carrying it around as a shadow dependency seems like a risk we shouldn't tolerate.
Here's the core code for foodweb
: https://github.com/markbravington/mvbutils/blob/97c07457e3af05c52ebc5dd241e7b048190dc7df/R/mvbutils.R#L3895
And where we use it: https://github.com/UptakeOpenSource/pkgnet/blob/8d7781dd3774b4abad4abccb11f2cd59a38e93d0/R/PackageFunctionReporter.R#L83
I propose that we contact the package maintainer for mvbutils
and talk about the possibility of ripping foodweb
out of his package, putting it into ours as a non-exported function, changing it to match our style and other standards, and adding him as an author on pkgnet
. Thoughts?
Referring to the package score issue (#16): @bburns632, some of the quick deployment options we could use in the next release are:
This R-bloggers article notes an issue with pkgnet throwing a (very unhelpful) error: https://www.r-bloggers.com/comparing-dependencies-of-popular-machine-learning-packages-with-pkgnet/
Some examples of packages throwing the error:
tensorflow, randomForest, gbm
Error text:
Error in data.table::data.table(node = names(igraph::V(self$pkg_graph)), : column or argument 1 is NULL
Right now, we have a lot of calculated stuff cached in the Reporters, e.g., nodes, edges, metrics. If a user overwrites the package by calling set_package
again, then it invalidates those cached objects and so we have functionality to reset_cache
and wipe them out.
This has a downside in that having to worry about cache resetting adds complexity. As an example, it has led to bug #70.
I propose that instead, we prevent users from overwriting the package set for a Reporter (i.e., if private$private_pkg_name
exists, throw a fatal error if they try to use set_package
).
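A rough sketch of that guard, written as a closure so it runs without R6. new_locked_reporter is hypothetical, and the local variable pkg_name plays the role of private$private_pkg_name.

```r
# Hypothetical reporter whose package can be set exactly once.
new_locked_reporter <- function() {
    pkg_name <- NULL  # stands in for private$private_pkg_name

    list(
        set_package = function(name) {
            if (!is.null(pkg_name)) {
                stop(
                    "A package has already been set for this reporter; ",
                    "create a new reporter instead.",
                    call. = FALSE
                )
            }
            pkg_name <<- name
            invisible(NULL)
        },
        get_package = function() pkg_name
    )
}

reporter <- new_locked_reporter()
reporter$set_package("baseballstats")

# A second call is rejected, so no cache ever needs to be invalidated
second_try <- tryCatch(
    reporter$set_package("sartre"),
    error = function(e) conditionMessage(e)
)
```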
The documentation in CreatePackageReport
does not specify whether the argument pkg_path
needs to be an absolute path. However, when I used a relative path, I got the following error:
Error: path is invalid: <my_relative_path>
Is this the intended behavior? And, if it is, would you be open to a PR where I clarified the documentation?
Thanks for the package!
In the Function Network visualization, it would be nice to see the percent coverage for a function in the hover over.
It would most likely be an edit/addition to plot_network function.
After i commented them all out 😬
We need to give the user the ability to color their network nodes based on some discrete factor of their choosing or a continuous variable such as function coverage proportion. Currently, nodes are colored blue no matter what.
Create a private variable orphanNodeClusteringThreshold
and leverage visNetwork::visClusteringByConnection
(or similar visNetwork function) to group orphan nodes together as one cluster node when there are more orphan nodes than the parameter, orphanNodeClusteringThreshold
.
Reason being that the graphs get super busy when there are a lot of orphan nodes. See lubridate
's graph as an example.
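The decision of when to cluster could be sketched as below, using plain data frames. Both helpers are hypothetical, the SOURCE/TARGET column names are assumed, and the actual grouping in the plot would still be left to visNetwork.

```r
# Hypothetical helpers: find nodes with no edges and decide whether there are
# enough of them to collapse into a single cluster node.
find_orphans <- function(nodes, edges) {
    connected <- unique(c(edges$SOURCE, edges$TARGET))
    setdiff(nodes$node, connected)
}

should_cluster_orphans <- function(nodes, edges, orphanNodeClusteringThreshold = 10) {
    length(find_orphans(nodes, edges)) > orphanNodeClusteringThreshold
}

# Toy network: five functions, only one call between them
nodes <- data.frame(node = c("a", "b", "c", "d", "e"), stringsAsFactors = FALSE)
edges <- data.frame(SOURCE = "a", TARGET = "b", stringsAsFactors = FALSE)

find_orphans(nodes, edges)
should_cluster_orphans(nodes, edges, orphanNodeClusteringThreshold = 2)
```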
I am really excited for this package! It is a promising improvement relative to the static visual displays of call graphs from CodeDepends
(ref: duncantl/CodeDepends#18).
For CreatePackageReport(pkg_name = "liteq")
, here are some screenshots of the function graph. ack()
calls db_ack()
and consume()
calls db_consume()
, but the graph does not show this information.
We can remove this test since this is what Travis CI will do anyway:
Create a function to output the nodes & edge list to csv. This will allow easier integration with other packages.
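A minimal sketch of such an export, assuming the nodes and edges are available as plain data frames. export_network_csv and the file names are hypothetical, and the example tables only reuse function names from an earlier issue for illustration.

```r
# Hypothetical helper: write a node table and an edge table to two csv files.
export_network_csv <- function(nodes, edges, out_dir) {
    node_file <- file.path(out_dir, "nodes.csv")
    edge_file <- file.path(out_dir, "edges.csv")
    write.csv(nodes, node_file, row.names = FALSE)
    write.csv(edges, edge_file, row.names = FALSE)
    invisible(c(nodes = node_file, edges = edge_file))
}

nodes <- data.frame(
    node = c("chomp_aggs", "es_search", "unpack_nested_data"),
    stringsAsFactors = FALSE
)
edges <- data.frame(
    SOURCE = c("chomp_aggs", "unpack_nested_data"),
    TARGET = c("es_search", "chomp_aggs"),
    stringsAsFactors = FALSE
)

paths <- export_network_csv(nodes, edges, tempdir())

# Any tool that reads csv can now pick the network up again
nodes_back <- read.csv(paths[["nodes"]], stringsAsFactors = FALSE)
edges_back <- read.csv(paths[["edges"]], stringsAsFactors = FALSE)
```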
Looking for someone to figure out how to configure appveyor testing for this repo. I've added the project to the UptakeOpenSource
account with appveyor, just need someone to figure out how to create the .appveyor.yml
that will run our tests
Within R/PackageFunctionReporter.R
, the function .UpdateNodes
has a handful of input check functions. Confirm that the new, object oriented structure makes these checks unnecessary and then remove them.
It seems as though node clustering has been removed as a feature in one of our recent PRs. Whoops.
We need to:
It was a minor feature, so not an urgent issue. However, it should be PR'd back in before v0.3
pkgnet
has a minor CRAN compliance issue. I believe this is due to both issue #78 and our custom subpages for testing and vignettes being built in the \Library folder. The log files on CRAN seem to support this theory.
Packages should not write in the user’s home filespace (including
clipboards), nor anywhere else on the file system apart from the R
session’s temporary directory (or during installation in the location
pointed to by TMPDIR: and such usage should be cleaned up).
Modifications or alternates to the devtools::install_local
functions should be explored in the unit tests and the vignette.
Some arguments like pkg_name
and pkg_path
are repeated several times in the documentation. We should centralize them using #' @inheritParams
from roxygen2
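One common pattern is to document the shared arguments once in a documentation-only topic and inherit from it elsewhere. In the sketch below, the topic name doc_shared, the parameter descriptions, and create_report_sketch are all illustrative assumptions rather than pkgnet's actual documentation.

```r
#' Shared parameter documentation
#'
#' Hypothetical documentation-only topic; the descriptions are illustrative.
#'
#' @param pkg_name Name of the package to analyze.
#' @param pkg_path Optional path to the package source directory.
#' @name doc_shared
#' @keywords internal
NULL

#' Create a package report (signature simplified for illustration)
#'
#' @inheritParams doc_shared
#' @param report_path Where to write the HTML report.
create_report_sketch <- function(pkg_name,
                                 pkg_path = NULL,
                                 report_path = tempfile(fileext = ".html")) {
    # Body omitted; the point here is the documentation above
    invisible(list(pkg_name = pkg_name, pkg_path = pkg_path, report_path = report_path))
}
```

With this layout, only arguments that are unique to a function, such as report_path here, need their own @param entry.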
As I understand the original vision according to @bburns632, every package should have a "Score" that's a weighted metric of all the factors we consider for a package. It should have an indicator like green/yellow/red associated with each dimension and with overall health. The purpose of this in the future is not only to list your dependencies, but also to show your dependencies' health as a function of their structure and their own dependencies' health.
I want to open up this issue for discussion to see if this is still on the table, and if so, what are some rough sketches about what it should look like?
Sometimes the dependency or functional networks do not show in the created HTML report.
If a user puts something bad for report_path
, like a directory instead of a filename, CreatePackageReport
runs all the way through but then fails when it tries to knit the rmd.
Input:
t <- CreatePackageReport(pkg_name = "lubridate", pkg_path = "~/repos/lubridate", report_path = "~")
Result:
INFO [2018-04-18 18:40:20] Creating package report for package lubridate with reporters:
SummaryReporter
DependencyReporter
FunctionReporter
INFO [2018-04-18 18:40:20] Resetting cached network information...
INFO [2018-04-18 18:40:20] Resetting cached network information...
INFO [2018-04-18 18:40:20] Rendering package report...
[WARNING] This document format requires a nonempty <title> element.
Please specify either 'title' or 'pagetitle' in the metadata.
Falling back to 'package_report.utf8'
pandoc: /Users/jqi: openFile: inappropriate type (Is a directory)
Error: pandoc document conversion failed with error 1
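Validating report_path before any reporter runs would surface the problem immediately. The sketch below assumes the report must be an .html file in an existing directory. validate_report_path is a hypothetical helper, not part of pkgnet.

```r
# Hypothetical helper: reject an unusable report_path before doing any work.
validate_report_path <- function(report_path) {
    report_path <- path.expand(report_path)

    if (dir.exists(report_path)) {
        stop("report_path is a directory; please supply a file name ending in .html",
             call. = FALSE)
    }
    if (!grepl("\\.html$", report_path, ignore.case = TRUE)) {
        stop("report_path must end in .html", call. = FALSE)
    }
    if (!dir.exists(dirname(report_path))) {
        stop("The directory for report_path does not exist", call. = FALSE)
    }
    invisible(report_path)
}

# A directory is rejected up front instead of failing inside pandoc
dir_result <- tryCatch(
    validate_report_path(tempdir()),
    error = function(e) conditionMessage(e)
)

# A file name inside an existing directory passes
ok_path <- validate_report_path(file.path(tempdir(), "lubridateReport.html"))
```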
If you render the visualization for the function network multiple times in a row, you will see a few different versions.
To observe this, run:
library(pkgnet)
t <- CreatePackageReport("lubridate")
t[["PackageFunctionReporter"]]$plot_network()
... and repeat...