eeptools's People

Contributors

jknowles, jsonbecker, sgibb

eeptools's Issues

Small bug with age_calc.R

Dear jknowles,

I guess there is a very small bug with age_calc.R on line 24: https://github.com/jknowles/eeptools/blob/master/R/age_calc.R#L24

This condition can never be true, because the preceding branch already tests whether the month is February. Shouldn't the code test for February in a leap year first, and only then for February in general?

age_calc(as.Date('2004-01-15'),as.Date('2004-02-16')) is producing 1.035714. .035714 is equal to 1/28, not to 1/29.

Best regards
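
A minimal sketch of the ordering the report asks for, not the eeptools source: the leap-year February case is tested before the generic month lookup, so it can actually be reached. The helper names is_leap_year and days_in_month are hypothetical.

```r
# Hypothetical helpers for illustration; not taken from age_calc.R.
is_leap_year <- function(year) {
  (year %% 4 == 0 & year %% 100 != 0) | year %% 400 == 0
}

days_in_month <- function(month, year) {
  base <- c(31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)
  # Test the more specific case (February in a leap year) first,
  # then fall back to the generic lookup.
  ifelse(month == 2 & is_leap_year(year), 29, base[month])
}

days_in_month(2, 2004)  # 29
days_in_month(2, 2003)  # 28
```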

Error in latest unstable

See here:
https://www.r-project.org/nosvn/R.check/r-devel-linux-x86_64-debian-clang/eeptools-00check.html

 Running ‘test-all.R’ [13s/12s]
Running the tests in ‘tests/test-all.R’ failed.
Last 13 lines of output:
  Loading required package: ggplot2
  > 
  > test_check("eeptools")
  Loading required package: MASS
  1. Error: Function works for multiple types of inputs (@test-utils.R#140) ------
  invalid format '%04d'; use format %f, %e, %g or %a for numeric objects
  1: leading_zero(a, digits = 3) at testthat/test-utils.R:140
  2: sprintf(formatter, x)
  
  testthat results ================================================================
  OK: 526 SKIPPED: 0 FAILED: 1
  1. Error: Function works for multiple types of inputs (@test-utils.R#140) 
  
  Error: testthat unit tests failed
  Execution halted
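
The message suggests sprintf() was given a numeric value it could not treat as an integer for a '%d' format. A minimal sketch of one way around that, assuming the inputs are meant to be whole numbers; pad_zero is a hypothetical stand-in, not the package's leading_zero:

```r
# Hypothetical stand-in for illustration; not the eeptools implementation.
pad_zero <- function(x, digits = 2L) {
  # Refuse values that would silently lose a fractional part.
  if (any(x != round(x), na.rm = TRUE)) {
    stop("x must contain whole numbers only")
  }
  # Coerce to integer so the '%d' conversion is always valid.
  sprintf(paste0("%0", digits, "d"), as.integer(x))
}

pad_zero(c(5, 42), digits = 4)  # "0005" "0042"
```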

Problem with the `mapmerge` function

mapmerge returns an error because of how it renames rownames depending on how rownames are listed in the shapefile or in the dataset. This is due to the way spCbind relies on rownames.

Here is a proposed alternative for when rownames are not sequential in the source data:


mapmerge2 <- function (mapobj, data, xid, yid) 
{
  x <- match(xid, colnames(mapobj@data))
  y <- match(yid, colnames(data))
  o <- match(mapobj@data[, x], data[, y])
  d <- data[o, ]
  row.names(d) <- row.names(mapobj@data)
  d <- maptools::spCbind(mapobj, d)
  return(d)
}
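
The reordering step in mapmerge2 can be tried without any spatial packages. The sketch below uses two plain data frames as stand-ins (map_attrs for mapobj@data, dat for the dataset); both are made-up illustrative data.

```r
# Stand-in for mapobj@data, with non-sequential row names.
map_attrs <- data.frame(county = c("C", "A", "B"),
                        row.names = c("10", "20", "30"))
# Stand-in for the dataset to merge, in a different order.
dat <- data.frame(county = c("A", "B", "C"), value = c(1, 2, 3))

# For each map row, find the matching data row.
o <- match(map_attrs$county, dat$county)
aligned <- dat[o, , drop = FALSE]

# Copy the map's row names so a row-name-based bind lines up.
row.names(aligned) <- row.names(map_attrs)
aligned
```

After this step the rows of aligned are in the same order, and carry the same row names, as the map attribute table.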

Default method in statamode is wrong

Submitted via e-mail from Dennis Gromer:

Bug #1: Statamode does not default to method=Stata like the documentation claims it should

This is easily replicable:

> x <- c(1,1,3,3)
> statamode(x)
[1] "3"
> statamode(x, method='stata')
[1] "."

Looking at the source code, the line method <- match.arg(method) runs before the if statement that checks whether method is missing.
This sets a missing method to the first value in the argument list (in this case, "last") before the code checks whether it is missing at all.
Removing that first method <- match.arg(method) appears to restore the intended behavior.
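
The behavior described above can be reproduced in isolation. This is a minimal sketch with hypothetical function names (resolve_method, resolve_stata_first), not the statamode source:

```r
resolve_method <- function(method = c("last", "stata", "sample")) {
  was_missing <- missing(method)   # recorded before reassignment
  method <- match.arg(method)      # a missing method becomes "last"
  list(method = method,
       missing_before = was_missing,
       missing_after = missing(method))  # no longer reports missing
}

res <- resolve_method()
res$method          # "last"
res$missing_before  # TRUE
res$missing_after   # FALSE

# One alternative, if "stata" is the documented default:
# list it first so match.arg() selects it.
resolve_stata_first <- function(method = c("stata", "last", "sample")) {
  match.arg(method)
}
resolve_stata_first()  # "stata"
```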

Speed in some functions

max_mis is very slow when called inside data.table or dplyr operations. This needs investigating; one option is to use the anyNA function introduced in R 3.1.
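
A rough sketch of how anyNA could provide a fast path. It assumes the intended behavior is "return NA instead of -Inf when there is nothing to take the maximum of"; max_mis_sketch is a hypothetical name, not the package function.

```r
# Hypothetical sketch; assumes NA is wanted for empty or all-NA input.
max_mis_sketch <- function(x) {
  if (length(x) == 0L) return(NA_real_)
  # Fast path: anyNA() stops at the first NA it finds.
  if (!anyNA(x)) return(max(x))
  if (all(is.na(x))) return(NA_real_)
  max(x, na.rm = TRUE)
}

max_mis_sketch(c(1, NA, 3))  # 3
max_mis_sketch(c(NA, NA))    # NA
```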

Error checking within functions

Add error checking to all functions so they give useful warning and error messages instead of the current state where they rely on errors / warnings from underlying functions.

Documentation

Fix text wrapping in all documentation examples
Add plot images to document datasets

Update to R RNG causes errors

See the errors here:

https://www.r-project.org/nosvn/R.check/r-devel-linux-x86_64-debian-clang/eeptools-00check.html

From Kurt @ CRAN:

The previous method can be requested using RNGkind() or
RNGversion() if necessary for reproduction of old results.

To make your package successfully pass the checks for current R-devel
and R-release you may find it most convenient to insert

suppressWarnings(RNGversion("3.5.0"))

before calling set.seed() in your example, vignette and test code (where
the difference in RNG sample kinds matters, of course).
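
A small sketch of the suggested pattern, assuming R 3.6.0 or later (where RNGkind() reports a third element, the sample kind). The seed and sample sizes are arbitrary.

```r
# Remember the current generator settings so they can be restored.
old_kind <- RNGkind()

# Request the pre-3.6.0 sampling behavior, as suggested above.
suppressWarnings(RNGversion("3.5.0"))
kind_during <- RNGkind()[3]   # "Rounding"

set.seed(42)
old_style_draw <- sample(10, 3)

# Restore the settings that were active before.
suppressWarnings(RNGkind(old_kind[1], old_kind[2], old_kind[3]))
```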

age_calc now fails with vector inputs.

The new age_calc function depends on seq for some of its work, and seq does not handle vector inputs.

The current function works fine for a single start or end date but cannot use vectors as inputs.

If age_calc is to be used with vectors, you can get the same result as in the past utilizing:

sapply(dob, age_calc, enddate)
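
When both arguments are vectors, the same idea extends to looping over positions. The sketch below uses a hypothetical scalar-only function (age_days_scalar) as a stand-in for age_calc, so it runs without the package.

```r
# Hypothetical scalar-only stand-in for age_calc.
age_days_scalar <- function(dob, enddate) {
  stopifnot(length(dob) == 1L, length(enddate) == 1L)
  as.numeric(enddate - dob)   # difference between Dates, in days
}

dob     <- as.Date(c("2000-01-01", "2001-06-15"))
enddate <- as.Date(c("2000-01-31", "2001-06-16"))

# Index-based loop keeps the Date class of each element intact.
ages <- vapply(seq_along(dob),
               function(i) age_days_scalar(dob[i], enddate[i]),
               numeric(1))
ages  # 30 1
```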

statamode method "last" does not function correctly

The desired behavior for statamode(x, method = 'last') is to return the last value in order in the vector.

a <- c(7, 7, 3, 4, 1, 3, 4) 
statamode(a, method="last")

This should result in 4 but instead results in 7 because table sorts its output. The function needs to match the tied modes against the original vector to ensure that the "last" value is truly the one returned.

Also need a test checking this works for character vectors as well.
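
One possible way to get the last mode in original order, as a sketch rather than the eventual fix: find the tied modes with table, then keep only those values in the original vector and take the final one. mode_last is a hypothetical name.

```r
# Hypothetical sketch of the "last" behavior described above.
mode_last <- function(x) {
  x <- as.character(x[!is.na(x)])
  if (length(x) == 0L) return(NA_character_)
  counts <- table(x)
  modes <- names(counts)[counts == max(counts)]
  # Filter the original (unsorted) vector, then take its final element.
  tail(x[x %in% modes], 1)
}

mode_last(c(7, 7, 3, 4, 1, 3, 4))   # "4"
mode_last(c("b", "a", "b", "a"))    # "a"
```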

Data documentation

Implement graphics in the documentation of the datasets.

Also include datasets with missing data to better demonstrate R's missing data handling to others and to test functions on.

Top and Bottom N functions

Report the largest and smallest N values for a vector and also their frequency.

Make a method for factors that reports the most common and least common values.

Apply method to character vectors as well.
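
A rough sketch of what such a function might look like for numeric and character vectors. The name extreme_values and its return shape are assumptions, and the factor method is not covered.

```r
# Hypothetical sketch; name and return shape are assumptions.
extreme_values <- function(x, n = 3L) {
  x <- x[!is.na(x)]
  vals <- sort(unique(x))
  n <- min(n, length(vals))
  top <- rev(tail(vals, n))    # largest first
  bottom <- head(vals, n)      # smallest first
  # Frequency of each reported value in the original vector.
  freq <- function(v) {
    vapply(v, function(i) sum(x == i), integer(1), USE.NAMES = FALSE)
  }
  list(top = data.frame(value = top, n = freq(top)),
       bottom = data.frame(value = bottom, n = freq(bottom)))
}

res_extreme <- extreme_values(c(1, 2, 2, 5, 5, 5, 9), n = 2)
res_extreme$top     # values 9 and 5, with counts 1 and 3
res_extreme$bottom  # values 1 and 2, with counts 1 and 2
```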

Create a proper uniqueness function

Take an arbitrary number of identifiers (1 to X), concatenate them, and identify whether each combination is unique. Provide a method to report uniqueness without selecting distinct rows, and a method to drop non-distinct values.

Implement and test for efficiency/speed.
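
A minimal sketch of the reporting half, under the assumption that "unique" means the identifier combination appears exactly once. Instead of pasting identifiers together it relies on duplicated(), which already works on data frames. flag_unique is a hypothetical name and the data are made up.

```r
# Hypothetical sketch; TRUE means the id combination appears exactly once.
flag_unique <- function(data, ids) {
  keys <- data[, ids, drop = FALSE]
  # Mark every member of a duplicated group, not only the repeats.
  !(duplicated(keys) | duplicated(keys, fromLast = TRUE))
}

df <- data.frame(id = c(1, 1, 2, 3), yr = c(2000, 2000, 2000, 2001))
flag_unique(df, c("id", "yr"))          # FALSE FALSE TRUE TRUE
df[flag_unique(df, c("id", "yr")), ]    # drops the non-distinct rows
```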

`statamode` is insanely slow

When doing statamode within a data.table over many groups, the function is pretty slow. This needs to be improved dramatically. Consider this example:

library(data.table)
mdf <- data.frame(id=rep(sample(1:100000,500000, replace=TRUE),10), 
                  group = sample(letters, 5000000, replace=TRUE))

length(unique(mdf$id))

modes <- as.data.table(mdf)[, list(mode=statamode(group, method="stata")), 
                            by="id"]

This takes way too long and I think it is because of inefficiency in the statamode code.
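
One direction that might help, sketched here under assumptions: count with match() and tabulate() rather than building a full table for every group. fast_mode_stata is a hypothetical name; it returns "." on ties as in the "stata" method above, and returning NA for empty input is an assumption. It has not been benchmarked against the package.

```r
# Hypothetical sketch of a lighter-weight "stata"-style mode.
fast_mode_stata <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0L) return(NA_character_)  # assumption for empty input
  ux <- unique(x)
  counts <- tabulate(match(x, ux), nbins = length(ux))
  modes <- ux[counts == max(counts)]
  if (length(modes) == 1L) as.character(modes) else "."
}

fast_mode_stata(c(1, 1, 3, 3))     # "."
fast_mode_stata(c("a", "b", "b"))  # "b"
```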

Failing to install eeptools

Hi, I am new to GitHub and excited to learn more about cleaning and analyzing data in R. I am enjoying the tutorials you posted, but am unable to navigate further because I cannot download eeptools.

Here is the syntax I used: devtools::install_github("jknowles/eeptools")

I would love to use this package and continue learning from you! I appreciate your support here and am pasting the error message below. Have a great day. -Preston

"xcrun: error: invalid active developer path (/Library/Developer/CommandLineTools), missing xcrun at: /Library/Developer/CommandLineTools/usr/bin/xcrun
Error: Failed to install 'eeptools' from GitHub:
Command failed (1)"

Develop tests for age_calc

I need to develop tests for age_calc to ensure that future changes do not result in regressions and the algorithm is accurate.

This will need to test all options

  • precise=TRUE, units='days'
  • precise=FALSE, units='days'
  • precise=TRUE, units='months'
  • precise=FALSE, units='months'
  • precise=TRUE, units='years'
  • precise=FALSE, units='years'

There will need to be test data for:

  • an atomic dob and vector enddate
  • an atomic enddate and vector dob
  • vector dob and vector enddate
  • dob and/or enddate in leap years
  • dob$yday > enddate$yday
  • dob$yday < enddate$yday
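
The option combinations and input shapes listed above could be laid out as data before any tests are written. This is only a sketch: the dates are invented for illustration and the names option_grid and fixtures are hypothetical.

```r
# All six combinations of the two options.
option_grid <- expand.grid(
  precise = c(TRUE, FALSE),
  units = c("days", "months", "years"),
  stringsAsFactors = FALSE
)

# Illustrative inputs covering the shapes listed above,
# including leap-year dates.
fixtures <- list(
  scalar_dob_vector_end = list(
    dob = as.Date("2000-02-29"),
    enddate = as.Date(c("2004-02-29", "2005-03-01"))
  ),
  vector_dob_scalar_end = list(
    dob = as.Date(c("1999-12-31", "2000-02-29")),
    enddate = as.Date("2010-06-15")
  ),
  vector_both = list(
    dob = as.Date(c("2001-07-04", "2004-02-29")),
    enddate = as.Date(c("2011-01-01", "2012-12-31"))
  )
)

nrow(option_grid)  # 6
```

A test file could then loop over the rows of option_grid and the entries of fixtures, calling age_calc once per pair.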

dependency on retiring spatial infrastructure packages

You will be aware, for example from:
https://r-spatial.org/r/2022/04/12/evolution.html,
https://r-spatial.org/r/2022/12/14/evolution2.html,
https://r-spatial.org/r/2023/04/10/evolution3.html and
https://rsbivand.github.io/csds_jan23/bivand_csds_ssg_230117.pdf and
perhaps view https://www.youtube.com/watch?v=TlpjIqTPMCA&list=PLzREt6r1NenmWEidssmLm-VO_YmAh4pq9&index=1
that rgdal, rgeos and maptools will be retired this
year, in October 2023.

eeptools uses maptools::spCbind, which could be replaced by sp::cbind. The use is however noted as deprecated in your code, so simply removing mapmerge etc. might be simplest.

Try running CMD check under a scenario using sp evolution status 2 (substitute use of rgdal with sf for projection/transformation/CRS) and absence of retiring packages from the library path.

Unit tests

Make unit tests for each and every function using the testthat package. This will make collaboration easier and enhancements easier to implement before the codebase gets too big.

Months calculation

Hello,
first, thanks for the code.
As I was running my calculations, I ran into a problem with the month units:

**
Error in seq.int(r1$mon, 12 * (to0$year - r1$year) + to0$mon, by) :
'to' must be finite
**

and this one,

**
Error in seq.int(r1$mon, 12 * (to0$year - r1$year) + to0$mon, by) :
wrong sign in 'by' argument
**

For the year calculation everything goes smoothly. I checked the format of my data and couldn't spot the problem.

Many thanks
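
The report does not include the data, so the cause is a guess. Errors of this kind from seq typically point to a missing date (a non-finite 'to') or an end date earlier than the start date (wrong sign). A small sketch for locating such rows before calling age_calc; find_bad_pairs is a hypothetical helper and assumes dob and enddate have the same length.

```r
# Hypothetical helper: flags rows likely to trip up a seq()-based calculation.
find_bad_pairs <- function(dob, enddate) {
  stopifnot(length(dob) == length(enddate))
  data.frame(
    row = seq_along(dob),
    missing = is.na(dob) | is.na(enddate),
    reversed = !is.na(dob) & !is.na(enddate) & enddate < dob
  )
}

dob     <- as.Date(c("2000-01-01", NA, "2010-05-05"))
enddate <- as.Date(c("2001-01-01", "2002-01-01", "2009-05-05"))
bad <- find_bad_pairs(dob, enddate)
bad[bad$missing | bad$reversed, ]   # rows 2 and 3
```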

Error in latest ggplot2

  • checking whether package ‘eeptools’ can be installed ... WARNING
    Found the following significant warnings:
      Warning: replacing previous import ‘ggplot2::syms’ by ‘memisc::syms’ when loading ‘eeptools’
    See ‘/Users/max/github/ggplot2/revdep/checks.noindex/eeptools/new/eeptools.Rcheck/00install.out’ for details.

Issues with using statamode within data.table

Using statamode within data.table will sometimes throw an error because data.table requires NAs to be correctly typed, which the plyr functions are not concerned with.

This is a suggested fix:

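statamode <- function(x, method = c("last", "stata", "sample")){
  # Resolve method to a single string. Without this, the default is a
  # length-3 vector and the comparisons below fail. Note that match.arg()
  # picks the first choice ("last") when method is not supplied.
  method <- match.arg(method)
  x <- as.character(x)
  if (method == 'stata'){
    z <- table(as.vector(x))
    m <- names(z)[z == max(z)]
    if (length(m) == 1){
      return(m)
    }
    # Ties are reported as "."
    return(".")
  }
  else if (method == 'sample'){
    z <- table(as.vector(x))
    m <- names(z)[z == max(z)]
    if (length(m) == 1){
      return(m)
    }
    else if (length(m) > 1){
      return(sample(m, 1))
    }
    else if (length(m) < 1){
      # Typed NA so data.table gets a consistent character column
      return(NA_character_)
    }
  }
  else if (method == 'last'){
    z <- table(as.vector(x))
    m <- names(z)[z == max(z)]
    if (length(m) == 1){
      return(m)
    }
    else if (length(m) > 1){
      return(tail(m, 1))
    }
    else if (length(m) < 1){
      return(NA_character_)
    }
  }
}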

R 3.1

Test installation binaries on R 3.1.

age_calc produces the wrong years

From Joel Schwartz via e-mail:

age_calc sometimes adds an extra year to the calculated age if you feed it dates with the same day of the year in two different years. This only happens with units="years". When units="months" the age is correct. For example:

age_calc(as.Date("2012-10-23"), as.Date("2014-10-23"), units="years")
[1] 2.999476

(Should be 2)

age_calc(as.Date("2012-10-23"), as.Date("2014-10-23"), units="months")
[1] 24
age_calc(as.Date("2000-10-23"), as.Date("2014-10-23"), units="years")
[1] 14.99726

(Should be 14)

age_calc(as.Date("2000-10-23"), as.Date("2014-10-23"), units="months")
[1] 168

On the other hand, sometimes it gives the correct answer:

age_calc(as.Date("2011-10-23"), as.Date("2014-10-23"), units="years")
[1] 3

Based on checking a few examples, I at first thought that age_calc adds an extra year when the age in years doesn't come out to an integer. But that turns out not to be the case, as this example shows. The age should be 14, but now I've used different years than in the previous example. The answer is not an integer, but it's correct.

age_calc(as.Date("1994-10-23"), as.Date("2008-10-23"), units="years")
[1] 14.00273

This might be one to look at soon while writing up tests for age_calc.
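
For the tests, a simple reference for completed years could be useful: when dob and enddate fall on the same month and day, the result should be an exact integer. The sketch below is a hypothetical reference implementation (age_years_whole), not a fix for age_calc, and it returns whole years only.

```r
# Hypothetical reference: completed years between two dates.
age_years_whole <- function(dob, enddate) {
  d <- as.POSIXlt(dob)
  e <- as.POSIXlt(enddate)
  years <- e$year - d$year
  # Subtract one if this year's birthday has not been reached yet.
  before_birthday <- (e$mon < d$mon) | (e$mon == d$mon & e$mday < d$mday)
  years - as.integer(before_birthday)
}

age_years_whole(as.Date("2012-10-23"), as.Date("2014-10-23"))  # 2
age_years_whole(as.Date("2000-10-23"), as.Date("2014-10-23"))  # 14
age_years_whole(as.Date("2012-10-23"), as.Date("2014-10-22"))  # 1
```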

Autoplot for shapefiles

Having a quick autoplot method for shapefiles as a sanity check would save time when loading up a shapefile, merging data, and plotting it.

Workflow would be:

myShape <- readShapePoly("mymap")
autoplot(myShape)

This will make a nice ggplot2 representation of the shapefile.

Update for ggplot2

From Hadley:

checking whether package ‘eeptools’ can be installed ... WARNING
Found the following significant warnings:
  Warning: replacing previous import by ‘grid::arrow’ when loading ‘eeptools’
  Warning: replacing previous import by ‘grid::unit’ when loading ‘eeptools’
See ‘/private/tmp/Rtmpri6X2x/check_cran33a12e539bd0/eeptools.Rcheck/00install.out’
for details.
checking Rd cross-references ... WARNING
Missing link or links in documentation object 'ggmapmerge.Rd':
  ‘ggplot2’

See section 'Cross-references' in the 'Writing R Extensions' manual.

checking examples ... ERROR
Running examples in ‘eeptools-Ex.R’ failed
The error most likely occurred in:

> base::assign(".ptime", proc.time(), pos = "CheckExEnv")
> ### Name: autoplot.lm
> ### Title: A function to replicate the basic plot function for linear
> ###   models in ggplot2
> ### Aliases: autoplot.lm
>
> ### ** Examples
>
> # Univariate
> a <- runif(1000)
> b <- 7*a+rnorm(1)
> mymod <- lm(b~a)
> autoplot(mymod)
Error: geom_hline requires the following missing aesthetics: yintercept
Execution halted
checking tests ... ERROR
Running the tests in ‘tests/test-all.R’ failed.
Last 13 lines of output:
  In addition: Warning messages:
  1: In max(z) : no non-missing arguments to max; returning -Inf
  2: In makenum(c) : NAs introduced by coercion
  3: In max(c(NA, NA, NA, NA), na.rm = TRUE) :
    no non-missing arguments to max; returning -Inf
  4: In if (varclass %in% c("ordered", "factor", "character")) { :
    the condition has length > 1 and only the first element will be used
  5: In decomma(a) : NAs introduced by coercion
  6: In decomma(b) : NAs introduced by coercion
  7: In decomma(n) : NAs introduced by coercion
  8: In decomma(n) : NAs introduced by coercion
  9: In decomma(n) : NAs introduced by coercion
  Execution halted

Bug in age_calc (if -> vector)

The current implementation (0.9.1) of age_calc treats enddate and dob as scalars when testing for validity, while documentation suggests they can be vectors.

Current implementation:

if (enddate < dob) {
  stop("End date must be a date after date of birth")
}

Should be fixed to:

if (any(enddate < dob)) {
  stop("End date must be a date after date of birth")
}
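
The difference can be checked in isolation. In this sketch, validate_dates is a hypothetical wrapper around the proposed check; na.rm = TRUE is an added assumption so that missing dates do not make the condition NA. From R 4.2.0 onward, an if condition of length greater than one is an error rather than a warning, which makes the scalar form fail outright on vector input.

```r
# Hypothetical wrapper around the proposed vector-safe check.
validate_dates <- function(dob, enddate) {
  # any() collapses the element-wise comparison to a single TRUE/FALSE.
  if (any(enddate < dob, na.rm = TRUE)) {
    stop("End date must be a date after date of birth")
  }
  invisible(TRUE)
}

dob_ok  <- as.Date(c("2000-01-01", "2001-01-01"))
end_ok  <- as.Date(c("2010-01-01", "2011-01-01"))
end_bad <- as.Date(c("2010-01-01", "1999-01-01"))

validate_dates(dob_ok, end_ok)                         # passes silently
bad_result <- try(validate_dates(dob_ok, end_bad), silent = TRUE)
inherits(bad_result, "try-error")                      # TRUE
```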
