datacomparer's Introduction

dataCompareR

dataCompareR is an R package that allows users to compare two datasets and view a report on the similarities and differences.

dataCompareR aims to make it easy to compare two tabular data objects in R. It’s specifically designed to show differences between two sets of data in a useful way that should make it easier to understand the differences, and if necessary, help you work out how to remedy them. In this regard, it aims to offer a more useful output than all.equal when your two datasets do not match, but isn’t intended to replace all.equal if you just want a binary test for equality.

  • rCompare() does the comparison and creates a dataCompareR object containing all the differences between the two inputted datasets. The object can be used with print and summary.
  • generateMismatchData() generates a list of two data frames, each having the missing rows from the comparison.
  • saveReport() creates a summary of the comparison that is saved into a file.

It’s expected that dataCompareR will be used to compare data frames, but it can be used to compare any objects that can be coerced to data frames, such as data tables, tibbles or matrices. dataCompareR cannot compare data that is not tabular in format (nested JSON, irregular lists etc) but does handle tabular data that needs to be matched (or joined) on one or more keys (or ID columns).
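
As a minimal sketch of the workflow described above, the following contrasts the output of all.equal with a keyed rCompare run. The data frames df_a and df_b and their columns are made up for illustration, and the dataCompareR calls are guarded so the snippet also runs where the package is not installed.

```r
# Two small illustrative data frames that differ in one value.
df_a <- data.frame(id = 1:3,
                   fruit = c("apple", "orange", "pear"),
                   stringsAsFactors = FALSE)
df_b <- data.frame(id = 1:3,
                   fruit = c("apple", "Orange", "pear"),
                   stringsAsFactors = FALSE)

# all.equal() reports that the objects differ, but not which rows.
base_result <- all.equal(df_a, df_b)
print(base_result)

# rCompare() matches rows on the key column and records each mismatch.
if (requireNamespace("dataCompareR", quietly = TRUE)) {
  comp <- dataCompareR::rCompare(df_a, df_b, keys = "id")
  print(summary(comp))
}
```

The keys argument names the ID column(s) used to join the two datasets before values are compared.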

Getting started

Requirements

Confirmed as working on R v3.6.3 and v4.0.0 for Windows, as well as v3.6.2, v4.0.0 and the devel release for Linux. Package was built with the following dependencies, but we anticipate it will work with later versions of these packages.

Package    Version   Source code URL
dplyr      0.5.0     https://github.com/hadley/dplyr
knitr      1.12.3    https://github.com/yihui/knitr
stringi    1.0-1     https://github.com/gagolews/stringi
markdown   0.7.7     https://github.com/rstudio/markdown

Installing the package

You can install from CRAN via:

install.packages("dataCompareR")

You can also install the latest version directly from GitHub via:

library(devtools)
install_git('https://github.com/capitalone/dataCompareR.git', branch = 'master',
            subdir = 'dataCompareR', type = 'source', repos = NULL,
            build_vignettes = TRUE)

Using dataCompareR

Please run vignette('dataCompareR') after installation to see an example of the dataCompareR workflow.

Repo Contents

The code is arranged as an R package, with the following contents:

  • dataCompareR/R
  • dataCompareR/man
  • dataCompareR/tests/testthat
  • dataCompareR/tests/performancetesting
  • dataCompareR/inst/css
  • dataCompareR/vignette

The contents will be covered below.

dataCompareR/R

The main body of R code that provides the dataCompareR functionality.

The R package format mandates a flat folder structure. Initial development used a nested structure, so to preserve that hierarchy as far as possible, each file name is prefixed with a 2- or 3-letter code that identifies the part of the code the file belongs to. The codes and hierarchy are as follows:

  • rc - rCompare - the entry point of the function
    • pf - processFlow - handles the flow of an rCompare run
      • vd - validateData - checks the data is suitable before starting an rCompare run
      • pd - prepareData - prepares the input data for comparison
      • cd - compareData - does the comparison
    • rco - rCompare object - routines to handle the rCompare object that is generated by an rCompare run
    • out - output - code to provide various views of the output

The filenames follow the format of the prefix, followed by underscore, followed by a camelcase description of what the code does. The .R files tend to have either 1 function inside them, or a small number of related functions.
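
The naming convention can be illustrated with a short sketch that maps file names back to their component. The file names below are hypothetical examples of the pattern and are not taken from the repository; only the prefix codes come from the list above.

```r
# Hypothetical file names following the prefix_camelCaseDescription.R pattern.
files <- c("rc_rCompare.R", "pd_trimCharVars.R",
           "cd_locateMismatches.R", "out_saveReport.R")

# Prefix codes and the components they stand for, as listed above.
components <- c(rc = "rCompare", pf = "processFlow", vd = "validateData",
                pd = "prepareData", cd = "compareData",
                rco = "rCompare object", out = "output")

# Everything before the first underscore is the prefix.
prefixes <- sub("_.*$", "", files)

file_index <- data.frame(file = files,
                         prefix = prefixes,
                         component = unname(components[prefixes]),
                         stringsAsFactors = FALSE)
print(file_index)
```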

dataCompareR/man

Code is commented using roxygen2 headers, which are used to automatically create the required R man pages by running

devtools::document()

dataCompareR/tests/testthat

Automated tests that are run via

devtools::test()

This consists of both unit tests and some end-to-end tests that MUST pass before any code is merged to dev or main. We've added Travis integration, so this is now mandated. If your development code change breaks an existing test, then it is your responsibility to fix it!

The current unit test coverage can be found in testing.md - please feel free to add more tests, and regenerate this file using covr.

dataCompareR/tests/performancetesting

This folder contains useful repeatable performance tests, but they are not run automatically, and the results they produce can only be interpreted manually.

CRAN Release Version History

https://cran.r-project.org/package=dataCompareR

  • Version 0.1.0 released on 2017-07-17
  • Version 0.1.1 released on 2017-11-14
  • Version 0.1.2 released on 2019-09-07
  • Version 0.1.3 released on 2020-05-01
  • Version 0.1.4 released on 2021-11-23

External Contributors

We welcome and appreciate your contributions! Before we can accept any contributions, we ask that you please be sure to sign the Contributor License Agreement (CLA).

This project adheres to the Open Source Code of Conduct. By participating, you are expected to honor this code.

Project Roadmap

The project roadmap can be found in ROADMAP.md.

datacomparer's People

Contributors

krishanbhasin, mend-bolt-for-github[bot], rjli13, robne1982, sajohnston, sclewis23, tmbjmu

datacomparer's Issues

CVE-2018-14040 (Medium) detected in bootstrap-3.3.5.min.js, bootstrap-3.3.5.js

CVE-2018-14040 - Medium Severity Vulnerability

Vulnerable Libraries - bootstrap-3.3.5.min.js, bootstrap-3.3.5.js

bootstrap-3.3.5.min.js

The most popular front-end framework for developing responsive, mobile first projects on the web.

Library home page: https://cdnjs.cloudflare.com/ajax/libs/twitter-bootstrap/3.3.5/js/bootstrap.min.js

Path to vulnerable library: /packrat/lib/x86_64-pc-linux-gnu/3.4.4/rmarkdown/rmd/h/bootstrap/js/bootstrap.min.js

Dependency Hierarchy:

  • bootstrap-3.3.5.min.js (Vulnerable Library)

bootstrap-3.3.5.js

The most popular front-end framework for developing responsive, mobile first projects on the web.

Library home page: https://cdnjs.cloudflare.com/ajax/libs/twitter-bootstrap/3.3.5/js/bootstrap.js

Path to vulnerable library: /packrat/lib/x86_64-pc-linux-gnu/3.4.4/rmarkdown/rmd/h/bootstrap/js/bootstrap.js

Dependency Hierarchy:

  • bootstrap-3.3.5.js (Vulnerable Library)

Found in HEAD commit: 567a64e178266fdcb9b927190a300696c2430033

Vulnerability Details

In Bootstrap before 4.1.2, XSS is possible in the collapse data-parent attribute.

Publish Date: 2018-07-13

URL: CVE-2018-14040

CVSS 3 Score Details (6.1)

Base Score Metrics:

  • Exploitability Metrics:
    • Attack Vector: Network
    • Attack Complexity: Low
    • Privileges Required: None
    • User Interaction: Required
    • Scope: Changed
  • Impact Metrics:
    • Confidentiality Impact: Low
    • Integrity Impact: Low
    • Availability Impact: None

Suggested Fix

Type: Upgrade version

Origin: twbs/bootstrap#26630

Release Date: 2018-07-13

Fix Resolution: org.webjars.npm:bootstrap:4.1.2,org.webjars:bootstrap:3.4.0

(Technical Debt) Dependencies on references to internal column/list item names

(Ported from another server)

This is mostly just a small technical debt issue.

To give an example of the dependencies I mean, the createColMatching function has hardcoded references to the names of the columns in the matchColumns output that have the name of the columns and the flag for whether it's in A or B. This means we'd have to change it in multiple places if we changed it in the actual matchColumns location, and this is not ideal, nor obvious. It would be better if we could find a way to reduce those kinds of dependencies somehow (passing references or keeping that info in some type of central location, etc.)
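
One way to reduce this kind of dependency is sketched below: keep the column names in a single definition and have consumers look them up from there. MATCH_COLS, its entries and the stand-in match_info data frame are hypothetical; the real matchColumns output may be structured differently.

```r
# Hypothetical central definition of the column names in the matchColumns output.
MATCH_COLS <- list(name = "colName", in_a = "inA", in_b = "inB")

# Stand-in for the matchColumns output, built from the central definition.
match_info <- data.frame(c("ID", "COL1", "COL4"),
                         c(TRUE, TRUE, FALSE),
                         c(TRUE, TRUE, TRUE),
                         stringsAsFactors = FALSE)
names(match_info) <- unlist(MATCH_COLS, use.names = FALSE)

# Consumers refer to MATCH_COLS instead of hardcoding the strings.
columns_in_both <- function(info, cols = MATCH_COLS) {
  info[[cols$name]][info[[cols$in_a]] & info[[cols$in_b]]]
}

print(columns_in_both(match_info))
```

Renaming a column then only requires a change to MATCH_COLS.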

rd warnings on install on some platforms

They're back!

createCompareObject                     html  
    createMeta                              html  
Rd warning: /tmp/RtmpyWaVpH/R.INSTALL63e85f08fea5/dataCompareR/man/createMeta.Rd:22: missing file link ‘round’
    createMismatchObject                    html  
    createMismatches                        html  
    createReportText                        html  
    createRowMatching                       html  
    createTextSummary                       html  
    currentObjVersion                       html  
    executeCoercions                        html  
    generateMismatchData                    html  
    getCoercions                            html  
    getMismatchColNames                     html  
    is.dataCompareRobject                   html  
    isNotNull                               html  
    isSingleNA                              html  
    listObsNotVerbose                       html  
    listObsVerbose                          html  
    locateMismatches                        html  
    makeValidKeys                           html  
    makeValidNames                          html  
    matchColumns                            html  
    matchMultiIndex                         html  
    matchNoIndex                            html  
    matchRows                               html  
    matchSingleIndex                        html  
    metaDataInfo                            html  
    mismatchHighStop                        html  
    orderColumns                            html  
    outputSectionHeader                     html  
    prepareData                             html  
    print.dataCompareRobject                html  
    print.summary.dataCompareRobject        html  
    processFlow                             html  
Rd warning: /tmp/RtmpyWaVpH/R.INSTALL63e85f08fea5/dataCompareR/man/processFlow.Rd:15: missing file link ‘round’
    rCompare                                html  
Rd warning: /tmp/RtmpyWaVpH/R.INSTALL63e85f08fea5/dataCompareR/man/rCompare.Rd:19: missing file link ‘round’
    rcompObjItemLength                      html  
    rounddf                                 html  
    saveReport                              html  
    subsetDataColumns                       html  
    summary.dataCompareRobject              html  
    trimCharVars                            html  
    updateCompareObject                     html  
    updateCompareObject.cleaninginfo        html  
    updateCompareObject.colmatching         html  
    updateCompareObject.matches             html  
    updateCompareObject.meta                html  
    updateCompareObject.mismatches          html  
    updateCompareObject.rowmatching         html  
    validateArguments                       html  
Rd warning: /tmp/RtmpyWaVpH/R.INSTALL63e85f08fea5/dataCompareR/man/validateArguments.Rd:14: missing file link ‘round’
    validateData                            html  
    variableDetails                         html  
    variableMismatches                      html  
    warnLargeData                           html  

print() produces errors with one specific data set

Which I've attached for ease of recreation

ID,Col1,Col2,Col3
1,A,apple,0.8414710
2,B,orange,0.9092974
3,C,apple,0.1411200
4,D,pineapple,-0.7568025
5,E,apple,-0.9589243
6,F,orange,-0.2794155

ID,Col1,Col2,Col4,Col5
1,A,Apple,0.6666666,1
2,b,orange,0.9092974,2
3.0,D,apple,0.14,3
4,D,     pineapple,-0.7568025,4
5,E,apple,0.9589243,5
7,A,pink,4.1213000,6
> rCompare(a, b)
Running rCompare...
3 column(s) were dropped, all rows were compared 
There are  3 mismatched variables:
First and last 5 observations for the  3 mismatched variables
   rowNo    valueA         valueB variable   typeA  typeB diffAB
1      2         B              b     COL1  factor factor       
2      3         C              D     COL1  factor factor       
3      6         F              A     COL1  factor factor       
4      1     apple          Apple     COL2  factor factor       
5      4 pineapple      pineapple     COL2  factor factor       
6      6    orange           pink     COL2  factor factor       
7      1      <NA>           <NA>       ID integer double       
8      2      <NA>           <NA>       ID integer double       
9      3      <NA>           <NA>       ID integer double       
10     4      <NA>           <NA>       ID integer double       
11     5      <NA>           <NA>       ID integer double       
12     6      <NA>           <NA>       ID integer double       
Warning messages:
1: In `[<-.factor`(`*tmp*`, ri, value = 1:6) :
  invalid factor level, NA generated
2: In `[<-.factor`(`*tmp*`, ri, value = c(1, 2, 3, 4, 5, 7)) :
  invalid factor level, NA generated
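
The warnings above come from `[<-.factor`, which base R raises when a value that is not an existing level is assigned into a factor. The sketch below reproduces that mechanism in isolation; it is an assumption that this is what happens inside print(), and the variable names are illustrative only.

```r
col1 <- factor(c("B", "C", "F"))   # a factor column, as in the output above
warned <- FALSE

# Assigning a value that is not an existing level yields NA plus a warning.
stacked <- withCallingHandlers({
  col1[4] <- 1L
  col1
}, warning = function(w) {
  warned <<- TRUE
  message("caught: ", conditionMessage(w))
  invokeRestart("muffleWarning")
})
print(stacked)

# Converting to character before combining avoids the problem.
safe <- c(as.character(factor(c("B", "C", "F"))), as.character(1L))
print(safe)
```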

Some missing links in help files

Rd warning: /tmp/Rtmpdyoe8U/file782c5f5cf1b1/dataCompareR/man/validateArguments.Rd:14: missing file link ‘round’

Rd warning: /tmp/Rtmpdyoe8U/file782c5f5cf1b1/dataCompareR/man/processFlow.Rd:15: missing file link ‘round’
    rCompare                                html  
Rd warning: /tmp/Rtmpdyoe8U/file782c5f5cf1b1/dataCompareR/man/rCompare.Rd:19: missing file link ‘round’

Should be fixed before CRAN submission.

No longer on CRAN

The package has been removed from CRAN:

Archived on 2020-04-09 as check problems were not corrected despite reminders.

In addition to putting this back onto CRAN, I will also look into why I received no reminders about the check problems or any communication about this happening.

v0.1.2 CRAN release

CRAN is on summer holiday until next week!

Therefore, to avoid repeating steps again, I will document the pre-upload checks here, and attach the source so we can easily upload once the submission page is back.

I have

  • cloned from master
  • Got the latest version of R and packages (on windows)
  • Ran
    • devtools::test()
    • devtools::document()
    • devtools::check()
    • devtools::build()
  • submitted to https://win-builder.r-project.org/
  • uploaded source here
    dataCompareR_0.1.2.tar.gz
  • installed the package from source, with no warnings
  • checked and we have no reverse depends

Evidence below.

To do list:

  • Get winbuilder results from the OSO mailbox
  • Submit to CRAN

Majority of functions missing while installing package : dataCompareR

I am using R version 3.3.2.
When I install dataCompareR from CRAN, I get only 3 functions, namely:

  1. rCompare
  2. generateMismatchData
  3. saveReport

I need other functions from the package, so I tried installing using

library(devtools)
install_git('https://github.com/capitalone/dataCompareR.git', branch = 'master',
            subdir = 'dataCompareR', type = 'source', repos = NULL)

The package gets installed, but when I load it I get this error:
library(dataCompareR)
Error in fetch(key) :
lazy-load database 'C:/.../Documents/R/win-library/3.3/dataCompareR/help/dataCompareR.rdb' is corrupt

Error on empty data.frames

Hey,

Thanks for the package, it is very useful and very handy - we love the summary and the reporting!

What irritates me is the following:

I have two data.frames, e.g.:

library(dataCompareR)

df_1 <- data.frame(a = character(0), b = integer(0))
df_2 <- data.frame(a = character(0), b = integer(0))

rCompare(df_1, df_2)
## Running rCompare...
## Error in checkEmpty(df1)  : ERROR : One or more dataframes are empty

Obviously this is not a bug but intended behaviour (right?) BUT I would argue that

  1. both data.frames are valid
  2. they are equal (same columns, same data). Why impose on the user that data is only valid if it's filled?

I would suggest to either redesign the function to make it handle 0 row data.frames just like any other data.frame or allow the user to prevent this error by setting a parameter (e.g.: rCompare(df_1, df_2, do_not_error_on_emty_df = TRUE)).

What do you think?
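
As a sketch of what treating zero-row data frames as comparable could look like, the hypothetical helper below (not part of dataCompareR) checks that two empty data frames have the same column names and column types.

```r
df_1 <- data.frame(a = character(0), b = integer(0))
df_2 <- data.frame(a = character(0), b = integer(0))

# Hypothetical helper: TRUE when both inputs have no rows but the same
# column names and the same column classes.
empty_frames_match <- function(x, y) {
  col_class <- function(df) vapply(df, function(col) class(col)[1], character(1))
  nrow(x) == 0 && nrow(y) == 0 &&
    identical(names(x), names(y)) &&
    identical(col_class(x), col_class(y))
}

print(empty_frames_match(df_1, df_2))
```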

CVE-2020-11023 (Medium) detected in jquery-3.4.1.min.js

CVE-2020-11023 - Medium Severity Vulnerability

Vulnerable Library - jquery-3.4.1.min.js

JavaScript library for DOM operations

Library home page: https://cdnjs.cloudflare.com/ajax/libs/jquery/3.4.1/jquery.min.js

Path to dependency file: dataCompareR/reference/createCleaningInfo.html

Path to vulnerable library: dataCompareR/reference/createCleaningInfo.html

Dependency Hierarchy:

  • jquery-3.4.1.min.js (Vulnerable Library)

Found in HEAD commit: 567a64e178266fdcb9b927190a300696c2430033

Vulnerability Details

In jQuery versions greater than or equal to 1.0.3 and before 3.5.0, passing HTML containing elements from untrusted sources - even after sanitizing it - to one of jQuery's DOM manipulation methods (i.e. .html(), .append(), and others) may execute untrusted code. This problem is patched in jQuery 3.5.0.

Publish Date: 2020-04-29

URL: CVE-2020-11023

CVSS 3 Score Details (6.1)

Base Score Metrics:

  • Exploitability Metrics:
    • Attack Vector: Network
    • Attack Complexity: Low
    • Privileges Required: None
    • User Interaction: Required
    • Scope: Changed
  • Impact Metrics:
    • Confidentiality Impact: Low
    • Integrity Impact: Low
    • Availability Impact: None

Suggested Fix

Type: Upgrade version

Origin: https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-11023

Release Date: 2020-04-29

Fix Resolution: jquery - 3.5.0

Summary fails if tables are pulled out of named lists.

I found a bit of an edge case. Summary fails if the original data frames are passed from a named list. Maybe the table names need to be sanitized before returning the comp object? BTW, I have removed the dplyr deprecated function notices from the reprex output for clarity.

library(tibble)
library(dataCompareR)

table1 <- tribble(~A, ~B, ~C,
                   1,  2,  3,
                   2,  6,  7)

table2 <- tribble(~A, ~D, ~C,
                   1,  2, 19,
                   2,  6,  7)

lis <- list(table1 = table1, table2 = table2)

comp1 <- rCompare(table1, table2, keys = "A")
#> Running rCompare...
#> Coercing input data to data.frame

summary(comp1)
#> dataCompareR is generating the summary...
#> 
#> Data Comparison
#> ===============
#> 
#> Date comparison run: 2020-11-13 13:00:13  
#> Comparison run on R version 4.0.3 (2020-10-10)  
#> With dataCompareR version 0.1.3  
#> 
#> 
#> Meta Summary
#> ============
#> 
#> 
#> |Dataset Name |Number of Rows |Number of Columns |
#> |:------------|:--------------|:-----------------|
#> |table1       |2              |3                 |
#> |table2       |2              |3                 |
#> 
#> 
#> Variable Summary
#> ================
#> 
#> Number of columns in common: 2  
#> Number of columns only in table1: 1  
#> Number of columns only in table2: 1  
#> Number of columns with a type mismatch: 0  
#> Match keys : 1   - A
#> 
#> 
#> Columns only in table1: B  
#> Columns only in table2: D  
#> Columns in both : A, C  
#> 
#> Row Summary
#> ===========
#> 
#> Total number of rows read from table1: 2  
#> Total number of rows read from table2: 2    
#> Number of rows in common: 2  
#> Number of rows dropped from table1: 0  
#> Number of rows dropped from  table2: 0  
#> 
#> 
#> Data Values Comparison Summary
#> ==============================
#> 
#> Number of columns compared with ALL rows equal: 0  
#> Number of columns compared with SOME rows unequal: 1  
#> Number of columns with missing value differences: 0  
#> 
#> 
#> 
#> Summary of columns with some rows unequal: 
#> 
#> 
#> 
#> |Column |Type (in table1) |Type (in table2) | # differences|Max difference | # NAs|
#> |:------|:----------------|:----------------|-------------:|:--------------|-----:|
#> |C      |double           |double           |             1|16             |     0|
#> 
#> 
#> 
#> Unequal column details
#> ======================
#> 
#> 
#> 
#> #### Column -  C
#> 
#> 
#> 
#> |  A| C (table1)| C (table2)|Type (table1) |Type (table2) | Difference|
#> |--:|----------:|----------:|:-------------|:-------------|----------:|
#> |  1|          3|         19|double        |double        |        -16|

comp2 <- rCompare(lis$table1, lis$table2, keys = "A")
#> Running rCompare...
#> Coercing input data to data.frame

summary(comp2)
#> dataCompareR is generating the summary...
#> Warning in matrix(c(object$meta$A$name, object$meta$A$rows,
#> object$meta$A$cols, : data length [10] is not a sub-multiple or multiple of the
#> number of columns [3]
#> Error in names(x) <- value: 'names' attribute [7] must be the same length as the vector [3]

Created on 2020-11-13 by the reprex package (v0.3.0)
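
One possible explanation, which is an assumption and not confirmed in the issue, is that the dataset name is captured from the unevaluated argument expression. For a plain symbol, as.character() on that expression gives one string, but for a call such as lis$table1 it gives three. Two three-element names plus four row and column counts would be consistent with the "data length [10]" in the warning. The helper functions below are hypothetical and only demonstrate the base R behaviour.

```r
# Two hypothetical ways of turning an argument expression into a name.
name_as_character <- function(x) as.character(substitute(x))
name_deparsed <- function(x) paste(deparse(substitute(x)), collapse = "")

lis <- list(table1 = data.frame(A = 1:2), table2 = data.frame(A = 1:2))
table1 <- lis$table1

print(name_as_character(table1))      # one element for a plain symbol
print(name_as_character(lis$table1))  # three elements for a `$` call
print(name_deparsed(lis$table1))      # always a single string
```

Collapsing the deparsed expression to a single string is one way to sanitize the name before it is stored in the comparison object.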

Update test coverage check in next release

This step was overlooked for the current release, but isn't essential in terms of checking that things work. It'll be good to have an up-to-date one for the next release and to do it as part of the release process.

Improve unit test hygiene

When running the unit tests:

  • the console is spammed with dataframes and other output
  • the environment gains numerous datasets that weren't there before
  • there's a warning encoding is deprecated; all files now assumed to be UTF-8

All of these can be remedied!

Error when unequal number of rows

I'm looking for a way to find mismatches between two data frames that may have an unequal number of rows, something along the lines of what you might get from running anti_join(df1, df2) followed by anti_join(df2, df1). I hoped that dataCompareR would do this, but apparently it's not possible.

df2 <- tibble(col1 = c("cat", "dog", "mouse", "fly"))
df1 <- tibble(col1 = c("cat", "dog", "rat"))
dataCompareR::rCompare(df1, df2)
Running rCompare...
Coercing input data to data.frame
Error in (nrow(df_a_subset) + 1):nrow(df_a) : argument of length 0

What do you think about adding this functionality to dataCompareR?
Or maybe I'm missing some other obvious way to do this kind of comparison?
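
For reference, the two-way anti-join described above can be sketched in base R. The helper rows_only_in is hypothetical and is not part of dataCompareR or dplyr.

```r
df1 <- data.frame(col1 = c("cat", "dog", "rat"), stringsAsFactors = FALSE)
df2 <- data.frame(col1 = c("cat", "dog", "mouse", "fly"), stringsAsFactors = FALSE)

# Hypothetical helper: rows of x whose key columns have no match in y.
rows_only_in <- function(x, y, by) {
  make_key <- function(df) {
    do.call(paste, c(unname(as.list(df[by])), sep = "\r"))
  }
  x[!(make_key(x) %in% make_key(y)), , drop = FALSE]
}

only_in_df1 <- rows_only_in(df1, df2, by = "col1")
only_in_df2 <- rows_only_in(df2, df1, by = "col1")
print(only_in_df1)
print(only_in_df2)
```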

Error when Invalid column name included in the keys

If an invalid column name is included in the keys, the automatic fixing of invalid column names in the data frames causes a problem when the checks are run to see whether the keys exist.

A user might not be able to tell what the issue is, as in their version of the data frames, the keys are present, and it's not obvious what the changed column names are after a fix.
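
The sketch below illustrates the problem and one possible remedy: apply the same name fix to the keys and tell the user what changed. Base R's make.names() is used here as a stand-in for the package's own name fixing, which may behave differently, and the column names are made up.

```r
original_names <- c("customer id", "order-total", "region")
keys <- c("customer id")

# Stand-in for the package's name fixing.
fixed_names <- make.names(original_names, unique = TRUE)
name_map <- setNames(fixed_names, original_names)

# The key is no longer found under its original spelling.
print(keys %in% fixed_names)

# Translate the keys through the same mapping and report what changed.
fixed_keys <- unname(name_map[keys])
changed <- keys[keys != fixed_keys]
if (length(changed) > 0) {
  message("Key column(s) renamed: ",
          paste0("'", changed, "' -> '", name_map[changed], "'", collapse = ", "))
}
```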

saveReport shows only 5 sample rows

Is it possible to increase the number of differing rows that saveReport shows, instead of the current 5-row preview for each variable? Can a user expand the list when knitting the report?

ex:

test <- rCompare(df1, df2 ,keys = 'id' )
saveReport(test, reportName = 'test' , n = 20)

*n – The first n different rows

CVE-2019-8331 (Medium) detected in bootstrap-3.3.5.min.js, bootstrap-3.3.5.js

CVE-2019-8331 - Medium Severity Vulnerability

Vulnerable Libraries - bootstrap-3.3.5.min.js, bootstrap-3.3.5.js

bootstrap-3.3.5.min.js

The most popular front-end framework for developing responsive, mobile first projects on the web.

Library home page: https://cdnjs.cloudflare.com/ajax/libs/twitter-bootstrap/3.3.5/js/bootstrap.min.js

Path to vulnerable library: /packrat/lib/x86_64-pc-linux-gnu/3.4.4/rmarkdown/rmd/h/bootstrap/js/bootstrap.min.js

Dependency Hierarchy:

  • bootstrap-3.3.5.min.js (Vulnerable Library)

bootstrap-3.3.5.js

The most popular front-end framework for developing responsive, mobile first projects on the web.

Library home page: https://cdnjs.cloudflare.com/ajax/libs/twitter-bootstrap/3.3.5/js/bootstrap.js

Path to vulnerable library: /packrat/lib/x86_64-pc-linux-gnu/3.4.4/rmarkdown/rmd/h/bootstrap/js/bootstrap.js

Dependency Hierarchy:

  • bootstrap-3.3.5.js (Vulnerable Library)

Found in HEAD commit: 567a64e178266fdcb9b927190a300696c2430033

Vulnerability Details

In Bootstrap before 3.4.1 and 4.3.x before 4.3.1, XSS is possible in the tooltip or popover data-template attribute.

Publish Date: 2019-02-20

URL: CVE-2019-8331

CVSS 3 Score Details (6.1)

Base Score Metrics:

  • Exploitability Metrics:
    • Attack Vector: Network
    • Attack Complexity: Low
    • Privileges Required: None
    • User Interaction: Required
    • Scope: Changed
  • Impact Metrics:
    • Confidentiality Impact: Low
    • Integrity Impact: Low
    • Availability Impact: None

Suggested Fix

Type: Upgrade version

Origin: twbs/bootstrap#28236

Release Date: 2019-02-20

Fix Resolution: bootstrap - 3.4.1,4.3.1;bootstrap-sass - 3.4.1,4.3.1

Codeowners file

Please create a codeowners file and add trusted reviewers to it and to the project write team if necessary. Thanks!

Help file build generates warnings

I think we still have some malformed tags in the help files. I'm not sure why this appears to happen only sometimes, but I have seen it on two systems. It should be a simple fix.

CVE-2020-11022 (Medium) detected in jquery-3.4.1.min.js

CVE-2020-11022 - Medium Severity Vulnerability

Vulnerable Library - jquery-3.4.1.min.js

JavaScript library for DOM operations

Library home page: https://cdnjs.cloudflare.com/ajax/libs/jquery/3.4.1/jquery.min.js

Path to dependency file: dataCompareR/reference/createCleaningInfo.html

Path to vulnerable library: dataCompareR/reference/createCleaningInfo.html

Dependency Hierarchy:

  • jquery-3.4.1.min.js (Vulnerable Library)

Found in HEAD commit: 567a64e178266fdcb9b927190a300696c2430033

Vulnerability Details

In jQuery versions greater than or equal to 1.2 and before 3.5.0, passing HTML from untrusted sources - even after sanitizing it - to one of jQuery's DOM manipulation methods (i.e. .html(), .append(), and others) may execute untrusted code. This problem is patched in jQuery 3.5.0.

Publish Date: 2020-04-29

URL: CVE-2020-11022

CVSS 3 Score Details (6.1)

Base Score Metrics:

  • Exploitability Metrics:
    • Attack Vector: Network
    • Attack Complexity: Low
    • Privileges Required: None
    • User Interaction: Required
    • Scope: Changed
  • Impact Metrics:
    • Confidentiality Impact: Low
    • Integrity Impact: Low
    • Availability Impact: None

Suggested Fix

Type: Upgrade version

Origin: https://blog.jquery.com/2020/04/10/jquery-3-5-0-released/

Release Date: 2020-04-29

Fix Resolution: jQuery - 3.5.0

Unhelpful error message if there are no columns to compare

Not particularly important, but when I was working on #71 I tried to run the following

df2 <- tibble(col1 = c("cat", "dog", "mouse", "fly"))
df1 <- tibble(col1 = c("cat", "dog", "rat"))

rCompare(df1, df2, keys = "col1")

Clearly this isn't smart, as without col1, there's nothing left to compare.

However, the output isn't clear

Running rCompare...
Coercing input data to data.frame
 Error in if (nrow(DFA) == 0) { : argument is of length zero 

It would be better to catch this and output a friendly error message.
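
A friendly check could look something like the sketch below. The function name check_columns_to_compare and the message wording are hypothetical, not part of the package.

```r
# Hypothetical pre-check: stop early if nothing is left to compare
# once the key columns are set aside.
check_columns_to_compare <- function(df_a, df_b, keys = NULL) {
  shared <- intersect(names(df_a), names(df_b))
  to_compare <- setdiff(shared, keys)
  if (length(to_compare) == 0) {
    stop("No columns left to compare after removing the key column(s).",
         call. = FALSE)
  }
  invisible(to_compare)
}

df1 <- data.frame(col1 = c("cat", "dog", "rat"), stringsAsFactors = FALSE)
df2 <- data.frame(col1 = c("cat", "dog", "mouse", "fly"), stringsAsFactors = FALSE)

msg <- tryCatch(check_columns_to_compare(df1, df2, keys = "col1"),
                error = function(e) conditionMessage(e))
print(msg)
```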

Error with large datasets

The line

totalSize <- nrow(coercedData[[1]])*ncol(coercedData[[1]]) + nrow(coercedData[[2]])*ncol(coercedData[[2]])

aims to calculate the total number of elements for comparison. However, if this results in a value outside the range of a 32-bit integer, the code errors with

Running rCompare...
Coercing input data to data.frame
Error in if (totalSize > 2e+07) { : missing value where TRUE/FALSE needed
In addition: Warning message:
In nrow(coercedData[[1]]) * ncol(coercedData[[1]]) + nrow(coercedData[[2]]) *  :
  NAs produced by integer overflow

This code is just there to warn the user about a possible long run time, so it'd be preferable to remove this rather than cause this error, although I imagine there are probably numerous ways to fix it.
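
One of the possible fixes is to do the arithmetic in double precision, since nrow() and ncol() return integers and integer multiplication overflows to NA. The sketch below shows the overflow and a safe alternative; total_cells is a hypothetical helper name, and the row and column counts are chosen only to exceed the 32-bit integer range.

```r
# Integer arithmetic overflows to NA once the result exceeds .Machine$integer.max.
n_rows <- 50000L
n_cols <- 50000L
overflowed <- suppressWarnings(n_rows * n_cols)
print(overflowed)

# Converting one operand to double avoids the overflow.
safe <- as.numeric(n_rows) * n_cols
print(safe)

# Hypothetical replacement for the totalSize calculation.
total_cells <- function(df_a, df_b) {
  as.numeric(nrow(df_a)) * ncol(df_a) + as.numeric(nrow(df_b)) * ncol(df_b)
}
print(total_cells(data.frame(a = 1:3, b = 4:6), data.frame(a = 1:2)))
```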

Test failures on R 4.0.0 pre-release (win-builder check)

The results of using the win-builder for the upcoming R release (4.0.0) had some tests now failing, though they passed on the current version and on 3.5.3.

It sounds like 4.0.0 will be coming out tomorrow, so these should be addressed before releasing/submitting to CRAN.

 -- 1. Failure: Coercion wrapper function (@testCoercion.R#92)  -----------------
  executeCoercions(Fac, WSF, T) not equal to `Ret3`.
  Component "DataTypes": Component "numeric": 1 string mismatch
  Component "DataTypes": Component "character": 2 string mismatches
  
  -- 2. Failure: Coercion wrapper function (@testCoercion.R#93)  -----------------
  executeCoercions(Fac, WSF, F) not equal to `Ret4`.
  Component "DataTypes": Component "numeric": 1 string mismatch
  Component "DataTypes": Component "character": 2 string mismatches
  
  -- 3. Failure: Coercion wrapper function (@testCoercion.R#94)  -----------------
  executeCoercions(WSF, Fac, F) not equal to `Ret5`.
  Component "DataTypes": Component "numeric": 1 string mismatch
  Component "DataTypes": Component "character": 2 string mismatches
  
  -- 4. Failure: Coercion wrapper function (@testCoercion.R#95)  -----------------
  executeCoercions(WS, WSF, T) not equal to `Ret6`.
  Component "DataTypes": Component "character": 1 string mismatch
  
  -- 5. Failure: ComparisonOfEquals (@testEndToEndFourKeys.R#63)  ----------------
  length(ABcomparison$cleaninginfo$COLOR) not equal to 4.
  1/1 mismatches
  [1] 0 - 4 == -4
  
  -- 6. Failure: ComparisonOfUnEquals (@testEndToEndFourKeys.R#107)  -------------
  length(ABcomparison$cleaninginfo$COLOR) not equal to 4.
  1/1 mismatches
  [1] 0 - 4 == -4
  
  -- 7. Failure: ComparisonOfMissRows (@testEndToEndFourKeys.R#147)  -------------
  length(ABcomparison$cleaninginfo$COLOR) not equal to 4.
  1/1 mismatches
  [1] 0 - 4 == -4
  
  -- 8. Failure: ComparisonOfMissCols (@testEndToEndFourKeys.R#188)  -------------
  length(ABcomparison$cleaninginfo$COLOR) not equal to 4.
  1/1 mismatches
  [1] 0 - 4 == -4
  
  -- 9. Failure: ComparisonOfEquals (@testEndToEndTwoKeys.R#59)  -----------------
  length(ABcomparison$cleaninginfo$COLOR) not equal to 4.
  1/1 mismatches
  [1] 0 - 4 == -4
  
  -- 10. Failure: ComparisonOfUnEquals (@testEndToEndTwoKeys.R#99)  --------------
  length(ABcomparison$cleaninginfo$COLOR) not equal to 4.
  1/1 mismatches
  [1] 0 - 4 == -4
  
  -- 11. Failure: ComparisonOfMissRows (@testEndToEndTwoKeys.R#135)  -------------
  length(ABcomparison$cleaninginfo$COLOR) not equal to 4.
  1/1 mismatches
  [1] 0 - 4 == -4
  
  -- 12. Failure: ComparisonOfMissCols (@testEndToEndTwoKeys.R#172)  -------------
  length(ABcomparison$cleaninginfo$COLOR) not equal to 4.
  1/1 mismatches
  [1] 0 - 4 == -4
  
  == testthat results  ===========================================================
  [ OK: 999 | SKIPPED: 3 | WARNINGS: 0 | FAILED: 12 ]
  1. Failure: Coercion wrapper function (@testCoercion.R#92) 
  2. Failure: Coercion wrapper function (@testCoercion.R#93) 
  3. Failure: Coercion wrapper function (@testCoercion.R#94) 
  4. Failure: Coercion wrapper function (@testCoercion.R#95) 
  5. Failure: ComparisonOfEquals (@testEndToEndFourKeys.R#63) 
  6. Failure: ComparisonOfUnEquals (@testEndToEndFourKeys.R#107) 
  7. Failure: ComparisonOfMissRows (@testEndToEndFourKeys.R#147) 
  8. Failure: ComparisonOfMissCols (@testEndToEndFourKeys.R#188) 
  9. Failure: ComparisonOfEquals (@testEndToEndTwoKeys.R#59) 
  1. ...

Installation Instructions

I tested the installation instructions on Linux. The package installed successfully on Ubuntu 16.04.2 with R 3.4.0 and the latest CRAN versions of its dependencies.

One minor issue is with the installation instructions themselves. Currently they are:

library(devtools)
install_git('https://github.com/capitalone/dataCompareR.git', branch = 'master',
            subdir = 'dataCompareR', type = 'source', repos = NULL)

Unfortunately, the devtools install functions default to build_vignettes = FALSE, so the option to build the vignette has to be passed explicitly during installation. The instructions should therefore be:

library(devtools)
install_git('https://github.com/capitalone/dataCompareR.git', branch = 'master',
            subdir = 'dataCompareR', type = 'source', repos = NULL,
            build_vignettes = TRUE)

Otherwise vignette('dataCompareR') fails after installation.

dplyr warnings about deprecated code

These warnings don't show up every time (they are displayed 'once per session'). Running the tests with R 4.0.0 and dplyr 0.8.5 produced the warnings below. Not all of the tests have been run yet, so I will add more as they come up.

testCheckPrintObject.R:38: warning: print only generates message when data sets match
select_() is deprecated. 
Please use select() instead

The 'programming' vignette or the tidyeval book can help you
to program with select() : https://tidyeval.tidyverse.org
This warning is displayed once per session.

testCheckPrintObject.R:62: warning: print returns message and data when mismatches occur
arrange_() is deprecated. 
Please use arrange() instead

The 'programming' vignette or the tidyeval book can help you
to program with arrange() : https://tidyeval.tidyverse.org
This warning is displayed once per session.

testCheckPrintObject.R:62: warning: print returns message and data when mismatches occur
funs() is soft deprecated as of dplyr 0.8.0
Please use a list of either functions or lambdas: 

  # Simple named list: 
  list(mean = mean, median = median)

  # Auto named with `tibble::lst()`: 
  tibble::lst(mean, median)

  # Using lambdas
  list(~ mean(., trim = .2), ~ median(., na.rm = TRUE))
This warning is displayed once per session.
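
For reference, here is a minimal sketch of what the non-deprecated equivalents could look like. It is not the package's actual fix: the `key_cols` vector and the use of `iris` are assumptions chosen only for illustration, and the sketch assumes dplyr 0.8.0 or later with rlang installed (rlang is a dplyr dependency).

```r
library(dplyr)

# Hypothetical column names held as strings, as they would be when
# columns are chosen programmatically.
key_cols <- c("Species", "Sepal.Length")

# select_(iris, .dots = key_cols) can be written by turning the strings
# into symbols and unquote-splicing them into select().
selected <- select(iris, !!!rlang::syms(key_cols))

# arrange_(iris, .dots = key_cols) follows the same pattern.
sorted <- arrange(iris, !!!rlang::syms(key_cols))

# funs(mean, median) becomes a plain named list of functions.
stats <- summarise_at(iris, "Sepal.Length", list(mean = mean, median = median))
```

Splicing symbols keeps the calling code close to the old standard-evaluation style, which may make a mechanical migration easier to review.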

Make non-unique keys message more informative

If I run

rCompare(ddf, ddf2, keys = c("a", "b", "c", "d","e"))

but there are duplicates in c("a", "b", "c", "d","e") I get an error message like:

Running rCompare...
Error in matchSingleIndex(df_a, df_b, "dataCompareR_merged_indices", indices) : 
  The indices are not unique in the submitted dataframes. Please resubmit with unique indices.

I expect my indices are unique, so now I have to write some code to figure out why they're not!

I'm guessing the code knows more about what element is not unique, so it'd be nice if it told me!
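
Until the message is improved, the offending rows can be found by hand. The sketch below is a workaround in base R, not part of dataCompareR: `find_duplicate_keys` is a hypothetical helper and the small `ddf` data frame is made up for illustration.

```r
# Hypothetical helper (not part of dataCompareR): return every row of `df`
# whose combination of key columns occurs more than once.
find_duplicate_keys <- function(df, keys) {
  key_df <- df[, keys, drop = FALSE]
  # Flag both the first occurrence and any later repeats of each key.
  dup <- duplicated(key_df) | duplicated(key_df, fromLast = TRUE)
  df[dup, , drop = FALSE]
}

# Made-up example data: rows 1 and 2 share the key (a = 1, b = "x").
ddf <- data.frame(a = c(1, 1, 2),
                  b = c("x", "x", "y"),
                  value = c(10, 11, 12),
                  stringsAsFactors = FALSE)

dups <- find_duplicate_keys(ddf, c("a", "b"))
print(dups)
```

Running the helper on each data frame before calling rCompare() shows which key combinations would need to be resolved.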

Round has unexpected behaviour

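A user is reporting odd behaviour when using the roundDigits functionality. Specifically, for some columns the number of mismatches goes up when roundDigits is reduced from 8 to 7, although rounding to fewer digits would be expected to produce fewer mismatches. See below for an example (difference = mismatches at 8 digits minus mismatches at 7 digits).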

Column     8 digits   7 digits   Difference   Verdict
Column_1   2855164    286209     2568955      Expected
Column_2   1338229    1541336    -203107      Not Expected
Column_3   1716294    1302222    414072       Expected
Column_4   1127592    1730836    -603244      Not Expected
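
One possible explanation, offered as a guess and not as a confirmed diagnosis, is that rounding each side and then testing for equality is not monotonic in the number of digits. The sketch below uses made-up values and assumes the comparison works as "round both values, then compare"; whether dataCompareR does exactly this is an assumption.

```r
# Two made-up values that differ by 0.002.
a <- 0.149
b <- 0.151

# Rounded to 2 digits, both become 0.15, so they match.
match_2_digits <- round(a, 2) == round(b, 2)

# Rounded to 1 digit, they land on 0.1 and 0.2, so they no longer match.
match_1_digit <- round(a, 1) == round(b, 1)

# A tolerance-based check, by contrast, can only gain matches as the
# tolerance is loosened.
within_tol_2 <- abs(a - b) <= 10^-2
within_tol_1 <- abs(a - b) <= 10^-1

print(c(match_2_digits, match_1_digit, within_tol_2, within_tol_1))
```

If the reported data contains many pairs that straddle a rounding boundary like this, a column could plausibly show more mismatches at 7 digits than at 8.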

Use identical() for any equality checks

(Issue ported from another server - other people were involved in the initial conversation!)

In hindsight perhaps it would have been a good idea to read through the R documentation first...

The way object comparison is implemented in R is a bit odd, leading to "special" moments such as this:

> "1" == 1
[1] TRUE

This behaviour makes a bit more sense after looking through the relevant documentation, in particular the following quote from the comparison page:

Do not use == and != for tests, such as in if expressions, where you must get a single TRUE or FALSE. Unless you are absolutely sure that nothing unusual can happen, you should use the identical function instead.

Consider also the following description of the identical() function:

The safe and reliable way to test two objects for being exactly equal. It returns TRUE in this case, FALSE in every other case.

... then again, the project is called rcompare, not ridentical ;)
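
To make the difference concrete, here is a short sketch in base R. The variable names are only for illustration.

```r
# == coerces its arguments to a common type, so the number becomes "1".
loose_equal <- "1" == 1

# identical() compares type as well as value.
strict_chr_num <- identical("1", 1)   # character vs double
strict_int_dbl <- identical(1L, 1)    # integer vs double

# all.equal() sits in between: it tolerates integer vs double and small
# numeric differences, and is wrapped in isTRUE() to get a single logical.
near_equal <- isTRUE(all.equal(1L, 1))

# == is also vectorised, so it returns one result per element,
# whereas identical() always returns a single TRUE or FALSE.
elementwise <- c(1, 2) == c(1, 3)
whole_object <- identical(c(1, 2), c(1, 3))

print(list(loose_equal, strict_chr_num, strict_int_dbl, near_equal,
           elementwise, whole_object))
```

The strictness cuts both ways: identical() would also treat an integer column and a double column holding the same numbers as different, which may or may not be what a data comparison wants.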

NULL at end of print sometimes

All columns were compared, all rows were compared 
All compared variables match 
 Number of rows compared: 243 
 Number of columns compared: 102NULL

Not sure how widespread this is. No key used in the example. Need to create a reproducible version of this error.
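
A common way to get a trailing NULL like this in R is to print the return value of a function that ends in cat(), since cat() returns NULL. Whether that is what happens here has not been confirmed. The sketch below only reproduces the symptom with a hypothetical function, not with dataCompareR code.

```r
# Hypothetical function, not taken from dataCompareR: it writes a message
# with cat() and therefore returns cat()'s value, which is NULL.
show_count <- function(n) {
  cat(" Number of columns compared:", n)
}

# Printing the return value appends "NULL" straight after the message,
# because cat() did not end the line.
out <- capture.output(print(show_count(102)))
print(out)

# Returning invisibly and not printing the result avoids the extra NULL.
show_count_quiet <- function(n) {
  cat(" Number of columns compared:", n, "\n")
  invisible(NULL)
}
show_count_quiet(102)
```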

Inconsistent capitalization

Running

rm(list=ls())

library(dataCompareR)


# dataCompareR will match data frames, or any objects that can be coerced to data frames (it uses as.data.frame)

# Let's use iris in the first example
head(iris)

# Create a new data frame to use in dataCompareR

# Make a copy of iris
iris2 <- iris

# Change it by first subsetting to just the first 140 rows
iris2 <- iris[1:140,]

# Then remove the Petal.Width column
iris2$Petal.Width <- NULL

# And then change some values
iris2[1:10,1] <- iris2[1:10,1] + 1

# Comparison without a key:
# Rows are matched based on order. If the data frames have different numbers of rows, rows will be dropped from the larger data frame.
# This will be recorded in the output

#Run the comparison
compIris <- rCompare(iris,iris2)


summary(compIris)

Results in

Columns only in iris: Petal.Width  
Columns in both : PETAL.LENGTH, SEPAL.LENGTH, SEPAL.WIDTH, SPECIES 

We're kinda stuck with capitals as we have no easy way of recovering the original case, but the first row should be capitals too!
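
As a rough illustration of the consistent output being asked for, the sketch below upper-cases the column names of both inputs before working out which columns are shared. It is written in base R and is not taken from the package internals; the variable names are made up.

```r
# Rebuild the two data frames from the example above.
iris2 <- iris[1:140, ]
iris2$Petal.Width <- NULL

# Upper-case the names of both inputs before comparing them.
cols_a <- toupper(names(iris))
cols_b <- toupper(names(iris2))

only_in_a <- setdiff(cols_a, cols_b)
in_both <- sort(intersect(cols_a, cols_b))

# Both lines now use the same capitalisation.
cat("Columns only in iris:", paste(only_in_a, collapse = ", "), "\n")
cat("Columns in both :", paste(in_both, collapse = ", "), "\n")
```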

CI failing

The CI is failing to start with:

sh -e /etc/init.d/xvfb start

sh: 0: Can't open /etc/init.d/xvfb

The command "sh -e /etc/init.d/xvfb start" failed and exited with 127 during .

I set up xvfb start when we first set the CI up years ago to get the tests running in "headless" mode based on how Travis was configured to run R back then.

I'm guessing things have moved on since then, I'll take a look at how to get them running properly again later.

CVE-2018-20677 (Medium) detected in bootstrap-3.3.5.min.js, bootstrap-3.3.5.js

CVE-2018-20677 - Medium Severity Vulnerability

Vulnerable Libraries - bootstrap-3.3.5.min.js, bootstrap-3.3.5.js

bootstrap-3.3.5.min.js

The most popular front-end framework for developing responsive, mobile first projects on the web.

Library home page: https://cdnjs.cloudflare.com/ajax/libs/twitter-bootstrap/3.3.5/js/bootstrap.min.js

Path to vulnerable library: /packrat/lib/x86_64-pc-linux-gnu/3.4.4/rmarkdown/rmd/h/bootstrap/js/bootstrap.min.js

Dependency Hierarchy:

  • bootstrap-3.3.5.min.js (Vulnerable Library)

bootstrap-3.3.5.js

The most popular front-end framework for developing responsive, mobile first projects on the web.

Library home page: https://cdnjs.cloudflare.com/ajax/libs/twitter-bootstrap/3.3.5/js/bootstrap.js

Path to vulnerable library: /packrat/lib/x86_64-pc-linux-gnu/3.4.4/rmarkdown/rmd/h/bootstrap/js/bootstrap.js

Dependency Hierarchy:

  • bootstrap-3.3.5.js (Vulnerable Library)

Found in HEAD commit: 567a64e178266fdcb9b927190a300696c2430033

Vulnerability Details

In Bootstrap before 3.4.0, XSS is possible in the affix configuration target property.

Publish Date: 2019-01-09

URL: CVE-2018-20677

CVSS 3 Score Details (6.1)

Base Score Metrics:

  • Exploitability Metrics:
    • Attack Vector: Network
    • Attack Complexity: Low
    • Privileges Required: None
    • User Interaction: Required
    • Scope: Changed
  • Impact Metrics:
    • Confidentiality Impact: Low
    • Integrity Impact: Low
    • Availability Impact: None


Suggested Fix

Type: Upgrade version

Origin: https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2018-20677

Release Date: 2019-01-09

Fix Resolution: Bootstrap - v3.4.0;NorDroN.AngularTemplate - 0.1.6;Dynamic.NET.Express.ProjectTemplates - 0.8.0;dotnetng.template - 1.0.0.4;ZNxtApp.Core.Module.Theme - 1.0.9-Beta;JMeter - 5.0.0



Package generates warnings due to new version of dplyr

## `mutate_each()` is deprecated.
## Use `mutate_all()`, `mutate_at()` or `mutate_if()` instead.
## To map `funs` over all variables, use `mutate_all()`
## `mutate_each()` is deprecated.
## Use `mutate_all()`, `mutate_at()` or `mutate_if()` instead.
## To map `funs` over all variables, use `mutate_all()`
## `mutate_each()` is deprecated.
## Use `mutate_all()`, `mutate_at()` or `mutate_if()` instead.
## To map `funs` over all variables, use `mutate_all()`
## `mutate_each()` is deprecated.
## Use `mutate_all()`, `mutate_at()` or `mutate_if()` instead.
## To map `funs` over all variables, use `mutate_all()`
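
As a sketch of the replacement the warning suggests, the snippet below applies a function to every column with mutate_all() and to a chosen column with mutate_at(). The data frame and the choice of toupper() are made up for illustration and are not taken from the package.

```r
library(dplyr)

# Made-up data for illustration.
df <- data.frame(a = c("x", "y"), b = c("p", "q"), stringsAsFactors = FALSE)

# Old style: mutate_each(df, funs(toupper))
# Apply the function to every column.
upper_all <- mutate_all(df, toupper)

# Apply the function to selected columns only.
upper_a <- mutate_at(df, "a", toupper)

print(upper_all)
print(upper_a)
```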
