
dlookr's People

Contributors

choonghyunryu, jcochanc, mgacc0, mgirlich, ymarcon

dlookr's Issues

Failed to install "dlookr" from Github

I get this error when trying to install the package:

Error: Failed to install 'dlookr' from GitHub:
(converted from warning) cannot remove prior installation of package ‘rlang’

Misspelled Word: Gruoped

In a few places in the EDA Report, the headings use "Gruoped" instead of "Grouped". These headings are:

4.1 Gruoped Descriptive Statistics
4.1.1 Gruoped Numerical Variables
There is no target variable.

4.1.2 Gruoped Categorical Variables
There is no target variable.

4.2 Gruoped Relationship Between Variables

diagnose_report() produces output with too many rows in section 1.2.1 Diagnosis of categorical variables

The following code produces HTML output in which section 1.2.1, Diagnosis of categorical variables, contains too many rows.
This happens when more than 10 values of a variable share a frequency that falls within the top 10 frequencies.
Maybe an additional level of rank() or rownum() should be used.
R version: 4.0.2
dlookr version: 0.4.2
Could you please check and fix this?

library(dlookr)

# test for diagnose_report() categorical variable part error
datafrm <- data.frame("field1" = c(paste("f1val", "v01")), "field2" = c(paste("f2val", "v02")))

for (row in 1:100) {
  datafrm <- rbind(datafrm, data.frame("field1" = c(paste("f1val", row)), "field2" = c(paste("f2val", row))))
}

for (row in 1:100) {
  datafrm <- rbind(datafrm, data.frame("field1" = c(paste("f1val", row)), "field2" = c(paste("f2val", row))))
}

# this output does not look good:
diagnose_report(datafrm, output_format = "html", output_file = "testdiagnose_bad.html", output_dir = "c:/temp")

for (row in 1:5) {
  datafrm <- rbind(datafrm, data.frame("field1" = c(paste("f1val", row)), "field2" = c(paste("f2val", row))))
}

# this output looks good
diagnose_report(datafrm, output_format = "html", output_file = "testdiagnose_good.html", output_dir = "c:/temp")
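
The tie effect described above can be reproduced without dlookr. The sketch below is only an illustration in base R. How diagnose_report() actually selects the top levels is not shown in this issue, so the rank-based filter is an assumption about the cause, and the names `values`, `freq`, `by_rank` and `by_row` are made up for the example.

```r
# Hypothetical data: 100 distinct values, each appearing twice, so all frequencies tie.
values <- rep(paste("f1val", 1:100), times = 2)
freq <- as.data.frame(table(level = values), responseName = "n",
                      stringsAsFactors = FALSE)

# Filtering on a tie-aware rank keeps every level that shares a top-10 frequency.
by_rank <- freq[rank(-freq$n, ties.method = "min") <= 10, ]

# Ordering by frequency and taking the first 10 rows caps the output regardless of ties.
by_row <- head(freq[order(-freq$n), ], 10)

nrow(by_rank)  # 100 rows: every level ties for rank 1
nrow(by_row)   # 10 rows
```

A row-number style cut such as `by_row` is one way to keep the table short. Which of the tied levels it shows depends on the sort order, so that choice would need to be documented.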

Fix page-view overhead of PDF files among automatic reports

For scatter plots and Q-Q plots included in the report, switching pages can take a long time when the data has many observations.

This is because the plot attached to the document is a PDF (vector) file, not a bitmap file. When a viewer reads a plot embedded in the PDF, it reads and renders information for every observation.

So, plots of data with many observations should be attached as bitmap-based image files.
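
To see why the vector format becomes heavy, the following base R sketch writes the same kind of scatter plot to a PDF with few and with many points and compares the file sizes. The helper `pdf_scatter_size()` and the point counts are illustrative assumptions, not code from the dlookr report templates.

```r
set.seed(1)

# Hypothetical helper: draw an n-point scatter plot to a temporary PDF and return its size in bytes.
pdf_scatter_size <- function(n) {
  path <- tempfile(fileext = ".pdf")
  on.exit(unlink(path))
  pdf(path)
  plot(rnorm(n), rnorm(n))
  dev.off()
  file.size(path)
}

size_small <- pdf_scatter_size(200)
size_large <- pdf_scatter_size(20000)

# The PDF stores every point, so its size grows with the number of observations.
size_large / size_small
```

A bitmap device stores pixels rather than points, so its output does not grow this way. In a knitr-based report, setting the chunk option dev = "png" is one way to request bitmap figures.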

[Feature request] Cramer's V, Theil's U, Point-Biserial and PPS in correlate() and plot_correlate()

This is a "generalization" of my requests made here and here.

It would be great to be able to use the correlate() and plot_correlate() functions to calculate and plot the association between categorical variables using Cramer's V or Theil's U.

Both of these functions could also be used to calculate and plot the correlation between categorical and numerical variables using the Point-Biserial correlation.

To measure the strength of association between variables of any kind, it would also be great to introduce the Predictive Power Score (PPS) in both functions. Pros and cons of PPS can be found here.
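
For reference, two of the requested measures can be sketched in base R. This is not dlookr code: `cramers_v()` is a hypothetical helper built on chisq.test(), and the point-biserial value is obtained as the Pearson correlation with a 0/1 coded variable. The built-in mtcars data is used only as an example.

```r
# Hypothetical helper: Cramer's V from the chi-squared statistic of a contingency table.
cramers_v <- function(x, y) {
  tab <- table(x, y)
  chi2 <- suppressWarnings(chisq.test(tab, correct = FALSE)$statistic)
  n <- sum(tab)
  k <- min(nrow(tab), ncol(tab)) - 1
  unname(sqrt(chi2 / (n * k)))
}

# A variable is perfectly associated with itself, so V is 1.
x <- rep(c("a", "b", "c"), 10)
v_perfect <- cramers_v(x, x)

# Association between two categorical-like columns of mtcars.
v_cars <- cramers_v(mtcars$cyl, mtcars$gear)

# Point-biserial correlation: Pearson correlation between a numeric and a binary (0/1) variable.
r_pb <- cor(mtcars$mpg, mtcars$am)
```

The helper does not handle a variable with a single level (k would be 0). A real implementation would need to decide how to report that case.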

Package vignette build failed

When installing the latest GitHub version with vignette generation, the process fails. There also seems to be a typo in the vignette name "diagonosis.Rmd".

Maybe it should be "diagnosis.Rmd".


install.packages(c("nycflights13", "ISLR", "DBI", "RSQLite"))
devtools::install_github("choonghyunryu/dlookr", build_vignettes = TRUE)

-  installing the package to build vignettes
E  creating vignettes (1m 6.1s)
   --- re-building 'EDA.Rmd' using rmarkdown
   
   Attaching package: 'dlookr'
   
   The following object is masked from 'package:base':
   
       transform
   
   
   Attaching package: 'dplyr'
   
   The following objects are masked from 'package:stats':
   
       filter, lag
   
   The following objects are masked from 'package:base':
   
       intersect, setdiff, setequal, union
   
   Warning: Passed a group with no more than five observations.
   (Urban == NA)
   Loading required package: DBI
   Loading required package: RSQLite
   Loading required package: dbplyr
   
   Attaching package: 'dbplyr'
   
   The following objects are masked from 'package:dplyr':
   
       ident, sql
   
   --- finished re-building 'EDA.Rmd'
   
   --- re-building 'Introduce.Rmd' using rmarkdown
   --- finished re-building 'Introduce.Rmd'
   
   --- re-building 'diagonosis.Rmd' using rmarkdown
   Quitting from lines 463-472 (diagonosis.Rmd) 
   Error: processing vignette 'diagonosis.Rmd' failed with diagnostics:
   argument "add_date" is missing, with no default
   --- failed re-building 'diagonosis.Rmd'
   
   --- re-building 'transformation.Rmd' using rmarkdown
   --- finished re-building 'transformation.Rmd'
   
   SUMMARY: processing the following file failed:
     'diagonosis.Rmd'
   
   Error: Vignette re-building failed.
   Execution halted
Error: Failed to install 'dlookr' from GitHub:
  System command 'Rcmd.exe' failed, exit status: 1, stdout + stderr (last 10 lines):
E> --- failed re-building 'diagonosis.Rmd'
E> 
E> --- re-building 'transformation.Rmd' using rmarkdown
E> --- finished re-building 'transformation.Rmd'
E> 
E> SUMMARY: processing the following file failed:
E>   'diagonosis.Rmd'
E> 
E> Error: Vignette re-building failed.
E> Execution halted

reportData error in eda_paged_report

Hi
I am trying to use eda_paged_report with a dataset but I get the error:

Quitting from lines 170-173 (eda_paged_temp.Rmd) 
Error in is.data.frame(df) : object 'reportData' not found

I am using the latest version from CRAN.
Thank you!

Change extrafont to showtext.

Change extrafont to showtext. This is because stable maintenance of the Rttf2pt1 package used by extrafont seems difficult.

The message below is from the author of the Rttf2pt1 package.

"I have a workaround, but please note that this package relies on an old, unmaintained program written in C called ttf2pt1. The most recent commit in that project was made about 18 years ago, and I have basically been patching it along the way, so satisfy compiler warnings which have been getting more stringent over time. It is possible that I eventually will not be able to keep this package on CRAN."

Error: Can't recycle `as_tibble(result$table)[, 7:21]` (size 15) to size 17.

Hi. I trust that all is well in your world.
I have just started to experiment with [dlookr] and am getting the error in the title.
The backtrace is:
<error/vctrs_error_incompatible_size>
Can't recycle as_tibble(result$table)[, 7:21] (size 15) to size 17.
Backtrace:

  1. ├─base::source("~/Documents/Documents/SOBA/ten80_testing/ten80_db_access_test_multi_client/ten80_BidPricing.R")
  2. │ ├─base::withVisible(eval(ei, envir))
  3. │ └─base::eval(ei, envir)
  4. │ └─base::eval(ei, envir)
  5. ├─test_Bids %>% describe() ~/Documents/Documents/SOBA/ten80_testing/ten80_db_access_test_multi_client/ten80_BidPricing.R:468:10
  6. ├─dlookr::describe(.)
  7. ├─dlookr:::describe.data.frame(.)
  8. │ └─dlookr:::describe_impl(.data, vars)
  9. │ └─base::lapply(...)
  10. │ └─dlookr:::FUN(X[[i]], ...)
  11. │ └─dlookr:::num_summary(pull(df, x))
  12. │ ├─base::[<-(...)
  13. │ └─tibble:::[<-.tbl_df(...)
  14. │ └─tibble:::tbl_subassign(x, i, j, value, i_arg, j_arg, substitute(value))
  15. │ └─tibble:::vectbl_as_new_col_index(j, x, value, j_arg, value_arg)
  16. │ └─tibble:::vectbl_recycle_rhs_names(names2(value), length(j), value_arg)
  17. │ ├─base::unname(vec_recycle(set_names(names), n, x_arg = as_label(value_arg)))
  18. │ └─vctrs::vec_recycle(set_names(names), n, x_arg = as_label(value_arg))
  19. └─vctrs:::stop_recycle_incompatible_size(...)
  20. └─vctrs:::stop_vctrs(...)

What is the cause of this? Can it be column data with NA values?
Thank you

Replace LaTeX with pagedown

LaTeX-based reports are difficult to implement, so the architecture of the report will change.
The pagedown report is independent of LaTeX, so it is free from the constraints of the language used by the user.

Cannot install

When I install dlookr, it seems to download but does not appear in the "Packages" pane of RStudio. When I try to run diagnose(), I get an error message saying that the function cannot be found.

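eda_report / pdf output issue

Hello. I am very grateful to be using the dlookr package.
When I was doing analysis on my own, I output the report as HTML and viewed it by myself,
but now that I am working on a team project, PDF output has become essential.

I have searched everywhere for almost two weeks to solve the problem,
but I could not find a solution, so I am asking here.

At first it looked like a LaTeX problem, so I tried installing MiKTeX and then TeX Live.

Currently TeX Live 2019 is installed,
and the messages below were printed in the console.

I would really like to produce PDF output with the eda_report function. I know you must be busy,
but I would be truly grateful for any help.

Thank you.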

processing file: ./EDA_Report.Rnw
|..................................... | 33%
ordinary text without R code

|........................................................................... | 67%
label: child-section-application (with options)
List of 1
$ child: chr "02_RunEDA.Rnw"

processing file: ./02_RunEDA.Rnw
|.... | 4%
ordinary text without R code

|......... | 8%
label: enrironment (with options)
List of 3
$ echo : logi FALSE
$ warning: logi FALSE
$ message: logi FALSE

|............. | 12%
ordinary text without R code

|.................. | 16%
label: check_variables (with options)
List of 1
$ echo: logi FALSE

|...................... | 20%
inline R code fragments

|........................... | 24%
label: info_variables (with options)
List of 2
$ echo : logi FALSE
$ results: chr "asis"

|............................... | 28%
inline R code fragments

|.................................... | 32%
label: describe_univariate (with options)
List of 3
$ echo : logi FALSE
$ comment: chr ""
$ results: chr "asis"

|........................................ | 36%
ordinary text without R code

|............................................. | 40%
label: normality (with options)
List of 2
$ echo : logi FALSE
$ results: chr "asis"

|................................................. | 44%
ordinary text without R code

|...................................................... | 48%
label: correlations (with options)
List of 2
$ echo : logi FALSE
$ results: chr "asis"

|.......................................................... | 52%
ordinary text without R code

|............................................................... | 56%
label: plot_correlations (with options)
List of 2
$ echo : logi FALSE
$ results: chr "asis"

|................................................................... | 60%
ordinary text without R code

|........................................................................ | 64%
label: numeric_variables (with options)
List of 2
$ echo : logi FALSE
$ results: chr "asis"

|............................................................................ | 68%
ordinary text without R code

|................................................................................. | 72%
label: category_variables (with options)
List of 2
$ echo : logi FALSE
$ results: chr "asis"

|..................................................................................... | 76%
ordinary text without R code

|.......................................................................................... | 80%
label: group_correlations (with options)
List of 2
$ echo : logi FALSE
$ results: chr "asis"

|.............................................................................................. | 84%
ordinary text without R code

|................................................................................................... | 88%
label: plot_group_correlations (with options)
List of 2
$ echo : logi FALSE
$ results: chr "asis"

|....................................................................................................... | 92%
ordinary text without R code

|............................................................................................................ | 96%
label: option_undo (with options)
List of 1
$ echo: logi FALSE

|................................................................................................................| 100%
ordinary text without R code

|................................................................................................................| 100%
ordinary text without R code

output file: ./EDA.tex

This is pdfTeX, Version 3.14159265-2.6-1.40.20 (TeX Live 2019/W32TeX) (preloaded format=pdflatex)
restricted \write18 enabled.
Use of uninitialized value $ver in scalar chomp at C:/texlive/2019/tlpkg/TeXLive/TLWinGoo.pm line 204.
Use of uninitialized value $ver in substitution (s///) at C:/texlive/2019/tlpkg/TeXLive/TLWinGoo.pm line 205.
Use of uninitialized value $ver in substitution (s///) at C:/texlive/2019/tlpkg/TeXLive/TLWinGoo.pm line 205.
fmtutil: fmtutil is using the following fmtutil.cnf files (in precedence order):
fmtutil: c:/texlive/2019/texmf-dist/web2c/fmtutil.cnf
fmtutil: fmtutil is using the following fmtutil.cnf file for writing changes:
fmtutil: c:/users/6300263/.texlive2019/texmf-config/web2c/fmtutil.cnf
fmtutil [INFO]: writing formats under c:/users/6300263/.texlive2019/texmf-var/web2c
fmtutil [INFO]: --- remaking pdflatex with pdftex
Can't spawn "cmd.exe": No such file or directory at c:\texlive\2019\texmf-dist\scripts\texlive\fmtutil.pl line 598.
fmtutil [WARNING]: inifile pdflatex.ini for pdflatex/pdftex not found.
fmtutil [INFO]: Disabled formats: 6
fmtutil [INFO]: Not selected formats: 44
fmtutil [INFO]: Failed to build: 1 (pdftex/pdflatex)
fmtutil [INFO]: Total formats: 51
fmtutil [INFO]: exiting with status 1
C:\texlive\2019\bin\win32\runscript.tlu:902: command failed with exit code 1:
perl.exe c:\texlive\2019\texmf-dist\scripts\texlive\fmtutil.pl --user --byfmt pdflatex
Running the command C:\texlive\2019\bin\win32\fmtutil-user.exe
I can't find the format file `pdflatex.fmt'!

kpathsea: Running mktexfmt pdflatex.fmt

The command name is C:\texlive\2019\bin\win32\mktexfmt
I was unable to find any missing LaTeX packages from the error log EDA.log.
! Use of uninitialized value $ver in scalar chomp at C:/texlive/2019/tlpkg/TeXLive/TLWinGoo.pm line 204.
! Use of uninitialized value $ver in substitution (s///) at C:/texlive/2019/tlpkg/TeXLive/TLWinGoo.pm line 205.
! Use of uninitialized value $ver in substitution (s///) at C:/texlive/2019/tlpkg/TeXLive/TLWinGoo.pm line 205.
! fmtutil: fmtutil is using the following fmtutil.cnf files (in precedence order):
! fmtutil: c:/texlive/2019/texmf-dist/web2c/fmtutil.cnf
! fmtutil: fmtutil is using the following fmtutil.cnf file for writing changes:
! fmtutil: c:/users/6300263/.texlive2019/texmf-config/web2c/fmtutil.cnf
! fmtutil [INFO]: writing formats under c:/users/6300263/.texlive2019/texmf-var/web2c
! fmtutil [INFO]: --- remaking pdflatex with pdftex
! Can't spawn "cmd.exe": No such file or directory at c:\texlive\2019\texmf-dist\scripts\texlive\fmtutil.pl line 598.
! fmtutil [WARNING]: inifile pdflatex.ini for pdflatex/pdftex not found.
! fmtutil [INFO]: Disabled formats: 6
! fmtutil [INFO]: Not selected formats: 44
! fmtutil [INFO]: Failed to build: 1 (pdftex/pdflatex)
! fmtutil [INFO]: Total formats: 51
! fmtutil [INFO]: exiting with status 1
! C:\texlive\2019\bin\win32\runscript.tlu:902: command failed with exit code 1:
! perl.exe c:\texlive\2019\texmf-dist\scripts\texlive\fmtutil.pl --user --byfmt pdflatex
! Running the command C:\texlive\2019\bin\win32\fmtutil-user.exe

! Error: LaTeX failed to compile EDA.tex. See https://yihui.org/tinytex/r/#debugging for debugging tips. See EDA.log for more info.
In addition: Warning message:
In dir.create(paste(path, "figure", sep = "/")) :
'.\figure' already exists
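
The log line Can't spawn "cmd.exe" suggests that the TeX Live scripts could not find cmd.exe from within the R session. A quick way to check this from R is sketched below. It is only a diagnostic under that assumption, not a confirmed fix, and the variable names are made up for the example.

```r
# Look up the executables that the log mentions; an empty string means "not found on PATH".
found <- Sys.which(c("cmd.exe", "pdflatex"))
print(found)

# On Windows, cmd.exe normally lives in the System32 directory,
# so check whether any PATH entry mentions it.
path_entries <- strsplit(Sys.getenv("PATH"), .Platform$path.sep, fixed = TRUE)[[1]]
has_system32 <- any(grepl("system32", path_entries, ignore.case = TRUE))
print(has_system32)
```

If cmd.exe is reported as not found on a Windows machine, the PATH seen by R is the first thing to examine.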

The title of a graph may be wrong.

I made a log transform of a feature called 'totalVenda'.

totalVenda_log = transform(df$totalVenda, method = "log")
summary(totalVenda_log)

After doing that, I made a plot.
plot(totalVenda_log)

The above command produces two graphs.
The left one is called 'Oraginal Data'.

I think it is supposed to be called 'Original Data' instead.

My best regards, Mario (Brazil)

diagnose_paged_report issue

This occurs in the case of just one numeric variable:

Quitting from lines 453-465 (diagnosis_paged_temp.Rmd)
Error in plot_na_intersect(reportData) :
Supported only when the number of variables including missing values is 2 or more.
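
The error message says that the intersection plot needs at least two variables with missing values. One way a report template could guard against this is sketched below in base R. The helper `n_vars_with_na()` and the example data are hypothetical and are not part of dlookr.

```r
# Hypothetical helper: count the variables that contain at least one missing value.
n_vars_with_na <- function(df) sum(colSums(is.na(df)) > 0)

one_na <- data.frame(a = c(1, NA, 3), b = c("x", "y", "z"))
two_na <- data.frame(a = c(1, NA, 3), b = c("x", NA, "z"))

n_vars_with_na(one_na)  # 1: too few variables for an intersection plot
n_vars_with_na(two_na)  # 2: enough variables to plot

# The plotting step could be skipped unless the count is at least 2.
can_plot_one <- n_vars_with_na(one_na) >= 2
can_plot_two <- n_vars_with_na(two_na) >= 2
```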

Error in eda_report

Hello,

When eda_report is run on a data set with two numerical variables, both of which contain some (but not all) null values and one of which is set as the target variable, the following error occurs: "Error in apply(edaData[, idx.numeric], 2, function(x) !all(is.na(x))) : dim(X) must have a positive length". The code named in the error traces back to lines 87-95 in EDA_Report.Rmd. The solution was to wrap edaData[, idx.numeric] with as.data.frame(). Please see below.

idx.numeric <- idx.numeric[apply(as.data.frame(edaData[, idx.numeric]), 2,
                                 function(x) !all(is.na(x)))]

idx.numeric <- idx.numeric[apply(as.data.frame(edaData[, idx.numeric]), 2,
                                 function(x) diff(range(x, na.rm = TRUE)) > 0)]

Thank you for your work, I find dlookr to be very useful.

Best,

Evan Luff
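
The failure reported above can be reproduced in base R without dlookr. In the sketch below, `edaData` is a hypothetical stand-in with a single numeric column left once the target is excluded. It assumes edaData is a plain data.frame, since single-column indexing then drops to a vector.

```r
# Hypothetical stand-in for edaData: one numeric column remains besides the target.
edaData <- data.frame(x = c(1, NA, 3), target = c(2, 5, NA))
idx.numeric <- 1L

# Single-column indexing on a data.frame drops to a plain vector, which has no dim.
dropped <- edaData[, idx.numeric]
is.null(dim(dropped))

# apply() therefore fails with "dim(X) must have a positive length".
result <- try(apply(dropped, 2, function(x) !all(is.na(x))), silent = TRUE)
failed <- inherits(result, "try-error")

# Keeping the data.frame shape avoids the error; drop = FALSE is an
# alternative to wrapping the expression in as.data.frame().
kept <- apply(edaData[, idx.numeric, drop = FALSE], 2, function(x) !all(is.na(x)))
```

Both as.data.frame() and drop = FALSE hand apply() a two-dimensional object, which is what the fix in this issue relies on.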

Issue of plot.optimal_bins()

NA information is not displayed in the visualization result of plot.optimal_bins().

library(dlookr)

# Generate data for the example
heartfailure2 <- heartfailure
heartfailure2[sample(seq(NROW(heartfailure2)), 5), "creatinine"] <- NA

# optimal binning using binning_by()
bin <- binning_by(heartfailure2, "death_event", "creatinine")

# visualize all information for optimal_bins class
plot(bin)

some improvements

from @roberto Passera:

1- Descriptive univariate analysis for continuous variables: it might be useful to add the standard deviation too (a classic, evergreen measure).
2- Descriptive univariate analysis for categorical variables: I believe a classic vertical bar plot would be more immediate, and it should be possible to copy and paste it as with the other graphics.
3- Bivariate categorical variables (and everywhere): what about an option to exclude missing values from the overall count?
4- Target-based analysis, Grouped Categorical Variables: the percentages for relative contingency tables do not work, at least with my datasets.

Unicode character error

Hello. I find dlookr extremely useful.

The diagnose_report() function does not work when variables are in Korean.
Is there a way to solve this?

! Package inputenc Error: Unicode character ??(U+C18C)
(inputenc) not set up for use with LaTeX.

! pdflatex: warning: running with administrator privileges

Try other LaTeX engines instead (e.g., xelatex) if you are using pdflatex. For R Markdown users, see https://bookdown.org/yihui/rmarkdown/pdf-document.html
Error: Failed to compile DataDiagnosis_Report.tex. See https://yihui.org/tinytex/r/#debugging for debugging tips. See DataDiagnosis_Report.log for more info.
In addition: Warning message:
In grepl("==> Fatal error occurred", x[i], fixed = TRUE) :
input string 1 is invalid in this locale

Reduce size of vignettes

Reduce size of vignettes because when building on Solaris, a warning is issued that the package size exceeds 5MB.

Feature Request: Pass Color to "Plot" functions

For the plot functions, I would like to see an additional color parameter that defaults to "lightblue" and is passed on to col, but that could take another color instead, for example "mediumblue".

Example:

plot_outliers <- function(df, var, colorselect = "lightblue") {
  x <- pull(df, var)
  op <- par(no.readonly = TRUE)
  par(mfrow = c(2, 2), oma = c(0, 0, 3, 0), mar = c(2, 4, 2, 2))
  on.exit(par(op))
  boxplot(x, main = "With outliers", col = colorselect)...
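
A self-contained version of the same idea is sketched below in base R. The function `plot_box_pair()` is hypothetical and much simpler than dlookr's plotting functions. It only shows a color argument with a default being forwarded to col.

```r
# Hypothetical function: two boxplots that share a user-selectable fill color.
plot_box_pair <- function(x, col = "lightblue") {
  op <- par(mfrow = c(1, 2))
  on.exit(par(op))
  outliers <- boxplot.stats(x)$out
  boxplot(x, main = "With outliers", col = col)
  boxplot(x[!x %in% outliers], main = "Without outliers", col = col)
  invisible(col)
}

set.seed(1)
x <- c(rnorm(50), 10)

pdf(NULL)                            # draw to a null device so no file is written
plot_box_pair(x)                     # default color
plot_box_pair(x, col = "mediumblue") # caller-supplied color
dev.off()
```

Because the argument has a default, existing calls that do not pass a color would keep their current appearance.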

Problem of producing pdf report

With the code below:
removed_chronic_patients %>%
  eda_report(target = spont_clearance, output_format = "pdf",
             output_file = "EDA_report.pdf",
             output_dir = "Output/")
I am getting the following error:
output file: Output//EDA_report.tex

! LaTeX Error: Something's wrong--perhaps a missing \item.

Error: LaTeX failed to compile EDA_report.tex. See https://yihui.org/tinytex/r/#debugging for debugging tips. See EDA_report.log for more info.

I am aware this is probably a LaTeX issue. I have checked whether any LaTeX packages are missing, to no avail. I was wondering if you have any experience with the error above? I have also attached the error log.
EDA_report.log

Error in eda_report() function

Hi Choonghyunryu,

I get the error below when executing the following R code:
library(dlookr)
library(magrittr)
iris%>%eda_report(output_format="html")

PS: I am a big fan of your dlookr package. It is awesome.

Output from Console:
output file: EDA_Report.knit.md

Error in extract(input_str) : Invalid nesting of html_preserve directives
In addition: Warning messages:
1: Unquoting language objects with !!! is deprecated as of rlang 0.4.0.
Please use !! instead.

Bad:

dplyr::select(data, !!!enquo(x))

Good:

dplyr::select(data, !!enquo(x)) # Unquote single quosure
dplyr::select(data, !!!enquos(x)) # Splice list of quosures

This warning is displayed once per session.
2: In png(file, width = 1 + k * w, height = h) :
'width=10, height=13' are unlikely values in pixels
3: In readLines(con, warn = FALSE) :
invalid input found on input connection 'EDA_Report.knit.md'
4: In readLines(con, warn = FALSE) :
invalid input found on input connection 'EDA_Report.utf8.md'

My Session info()
R version 3.6.3 (2020-02-29)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17134)

Matrix products: default

Random number generation:
RNG: Mersenne-Twister
Normal: Inversion
Sample: Rounding

locale:
[1] LC_COLLATE=English_India.1252 LC_CTYPE=English_India.1252
[3] LC_MONETARY=English_India.1252 LC_NUMERIC=C
[5] LC_TIME=English_India.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] kableExtra_1.1.0 knitr_1.22 moments_0.14 gridExtra_2.3
[5] ggplot2_3.1.1 dplyr_0.8.5 xtable_1.8-4 magrittr_1.5
[9] dlookr_0.3.13 mice_3.8.0

loaded via a namespace (and not attached):
[1] colorspace_1.4-1 class_7.3-15 rio_0.5.16
[4] htmlTable_1.13.1 base64enc_0.1-3 rstudioapi_0.10
[7] bit64_0.9-7 mvtnorm_1.1-0 sqldf_0.4-11
[10] xml2_1.2.0 splines_3.6.3 libcoin_1.0-5
[13] DMwR_0.4.1 Formula_1.2-3 broom_0.5.2
[16] cluster_2.1.0 png_0.1-7 readr_1.3.1
[19] compiler_3.6.3 httr_1.4.0 backports_1.1.4
[22] assertthat_0.2.1 Matrix_1.2-18 lazyeval_0.2.2
[25] acepack_1.4.1 htmltools_0.3.6 tools_3.6.3
[28] partykit_1.2-7 gtable_0.3.0 glue_1.3.1
[31] tinytex_0.22 Rcpp_1.0.4.6 carData_3.0-2
[34] cellranger_1.1.0 vctrs_0.2.4 gdata_2.18.0
[37] nlme_3.1-144 inum_1.0-1 xfun_0.6
[40] stringr_1.4.0 proto_1.0.0 openxlsx_4.1.0.1
[43] rvest_0.3.4 lifecycle_0.2.0 gtools_3.8.1
[46] RcmdrMisc_2.7-0 MASS_7.3-51.5 zoo_1.8-6
[49] scales_1.0.0 hms_0.4.2 sandwich_2.5-1
[52] RColorBrewer_1.1-2 smbinning_0.9 yaml_2.2.0
[55] quantmod_0.4-15 curl_3.3 memoise_1.1.0
[58] rpart_4.1-15 latticeExtra_0.6-29 stringi_1.4.3
[61] RSQLite_2.2.0 highr_0.8 corrplot_0.84
[64] nortest_1.0-4 e1071_1.7-3 checkmate_1.9.4
[67] TTR_0.23-4 caTools_1.17.1.2 zip_2.0.1
[70] chron_2.3-55 rlang_0.4.5 pkgconfig_2.0.2
[73] bitops_1.0-6 evaluate_0.13 lattice_0.20-38
[76] ROCR_1.0-7 purrr_0.3.2 htmlwidgets_1.3
[79] bit_1.1-15.2 tidyselect_0.2.5 plyr_1.8.4
[82] R6_2.4.0 gplots_3.0.1.1 generics_0.0.2
[85] Hmisc_4.4-0 DBI_1.1.0 withr_2.1.2
[88] gsubfn_0.7 pillar_1.4.3 haven_2.1.0
[91] foreign_0.8-75 prettydoc_0.3.1 xts_0.12-0
[94] survival_3.1-11 abind_1.4-5 nnet_7.3-12
[97] tibble_2.1.1 crayon_1.3.4 car_3.0-3
[100] KernSmooth_2.23-16 rmarkdown_2.1 jpeg_0.1-8.1
[103] grid_3.6.3 readxl_1.3.1 data.table_1.12.8
[106] blob_1.2.1 forcats_0.4.0 digest_0.6.18
[109] classInt_0.4-3 webshot_0.5.2 tidyr_1.0.2
[112] munsell_0.5.0 viridisLite_0.3.0 tcltk_3.6.3

diagnose_report() is not working

diagnose_report() is not working. The reason is that some of the values are in Arabic, and I think the issue arises when the code tries to compile the LaTeX file with Arabic text.

Missing dependency

Hi,

In version dlookr_0.4.2 there is a missing dependency on systemfonts.

Cheers

ttf2pt1.exe has stopped working

When trying to load the dlookr package on a new machine, I am running into an error. Windows throws up a window that says "ttf2pt1.exe has stopped working". This pop-up can be closed with "Close the program", but then re-opens 3 more times before finishing loading the dlookr package.

See the attached screenshot:

image

Here is the complete log. Note that the problem occurs after the "Registering PDF & PostScript fonts with R for Viz" line, and loading only completes after I close the pop-ups:

R version 4.0.2 (2020-06-22) -- "Taking Off Again"
Copyright (C) 2020 The R Foundation for Statistical Computing
Platform: x86_64-w64-mingw32/x64 (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library('dlookr')
Registering Windows fonts with R for Viz
Registering PDF & PostScript fonts with R for Viz
Imported Arial Narrow fonts.
Registering Windows fonts with R for Viz
Registering PDF & PostScript fonts with R for Viz

Attaching package: ‘dlookr’

The following object is masked from ‘package:base’:

    transform

Warning message:
package ‘dlookr’ was built under R version 4.0.5

Two questions:

  1. What do you suggest to resolve this problem?
  2. What downstream consequences might I face if I don't solve this problem?

Implementing the new ppscore

A great tool was recently added to the data exploration toolbox: the Predictive Power Score (ppscore).
So far there is only the original implementation in Python. dlookr could be the first R library to implement this innovative tool in a tidy way, and I would not miss this opportunity!

plot_outlier errors out when diagnose_report is run against a dataset with a numeric column that is null

dlookr (v.0.3.9)
plot_outlier() errors out when diagnose_report() is run against a dataset with a numeric column where all values are NA.

Here's the traceback:

Quitting from lines 216-252 (Diagnosis_Report.Rmd)
Error in plot.window(xlim = xlim, ylim = ylim, log = log, yaxs = pars$yaxs) :
need finite 'ylim' values
In addition: Warning messages:
1: In min(x) : no non-missing arguments to min; returning Inf
2: In max(x) :
Error in plot.window(xlim = xlim, ylim = ylim, log = log, yaxs = pars$yaxs) :
need finite 'ylim' values
31.
plot.window(xlim = xlim, ylim = ylim, log = log, yaxs = pars$yaxs)
30.
bxp(list(stats = structure(c(NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_), .Dim = c(5L, 1L)), n = 0, conf = structure(c(NA_real_,
NA_real_), .Dim = 2:1), out = numeric(0), group = numeric(0),
names = ""), notch = FALSE, width = NULL, varwidth = FALSE, ...
29.
do.call("bxp", c(list(z, notch = notch, width = width, varwidth = varwidth,
log = log, border = border, pars = pars, outline = outline,
horizontal = horizontal, add = add, at = at), args[namedargs]))
28.
boxplot.default(x, main = "With outliers", col = col)
27.
boxplot(x, main = "With outliers", col = col)
26.
plot_outliers(df, x, col)
25.
FUN(X[[i]], ...)
24.
lapply(vars[idx_numeric], function(x) plot_outliers(df, x, col))
23.
plot_outlier_impl(.data, vars, col)
22.
plot_outlier.data.frame(edaData, variables[i])
21.
plot_outlier(edaData, variables[i]) at #26
20.
eval(expr, envir, enclos)
19.
eval(expr, envir, enclos)
18.
withVisible(eval(expr, envir, enclos))
17.
withCallingHandlers(withVisible(eval(expr, envir, enclos)), warning = wHandler,
error = eHandler, message = mHandler)
16.
handle(ev <- withCallingHandlers(withVisible(eval(expr, envir,
enclos)), warning = wHandler, error = eHandler, message = mHandler))
15.
timing_fn(handle(ev <- withCallingHandlers(withVisible(eval(expr,
envir, enclos)), warning = wHandler, error = eHandler, message = mHandler)))
14.
evaluate_call(expr, parsed$src[[i]], envir = envir, enclos = enclos,
debug = debug, last = i == length(out), use_try = stop_on_error !=
2L, keep_warning = keep_warning, keep_message = keep_message,
output_handler = output_handler, include_timing = include_timing)
13.
evaluate::evaluate(...)
12.
evaluate(code, envir = env, new_device = FALSE, keep_warning = !isFALSE(options$warning),
keep_message = !isFALSE(options$message), stop_on_error = if (options$error &&
options$include) 0L else 2L, output_handler = knit_handlers(options$render,
options))
11.
in_dir(input_dir(), evaluate(code, envir = env, new_device = FALSE,
keep_warning = !isFALSE(options$warning), keep_message = !isFALSE(options$message),
stop_on_error = if (options$error && options$include) 0L else 2L,
output_handler = knit_handlers(options$render, options)))
10.
block_exec(params)
9.
call_block(x)
8.
process_group.block(group)
7.
process_group(group)
6.
withCallingHandlers(if (tangle) process_tangle(group) else process_group(group),
error = function(e) {
setwd(wd)
cat(res, sep = "\n", file = output %n% "") ...
5.
process_file(text, output)
4.
knitr::knit(knit_input, knit_output, envir = envir, quiet = quiet,
encoding = encoding)
3.
rmarkdown::render(paste(path, rmd, sep = "/"), output_format = prettydoc::html_pretty(toc = TRUE,
number_sections = TRUE), output_file = paste(path, output_file,
sep = "/"))
2.
diagnose_report.data.frame(sample, output_format = "html")
1.
diagnose_report(sample, output_format = "html")
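
One possible workaround, until all-NA columns are handled inside the package, is to drop numeric columns that contain no finite values before calling the report or plotting functions. The sketch below uses base R only. `drop_empty_numeric()` is a hypothetical helper and not part of dlookr, and the sample data frame is made up for illustration.

```r
# Hypothetical helper (not part of dlookr): drop numeric columns that have
# no finite values, since a box plot cannot compute axis limits for them.
drop_empty_numeric <- function(df) {
  is_empty_numeric <- vapply(
    df,
    function(col) is.numeric(col) && !any(is.finite(col)),
    logical(1)
  )
  df[, !is_empty_numeric, drop = FALSE]
}

# Made-up example data: `b` is numeric but every value is NA.
sample_df <- data.frame(
  a = c(1, 2, 3, 50),
  b = rep(NA_real_, 4),
  label = c("x", "y", "z", "w")
)

cleaned <- drop_empty_numeric(sample_df)
names(cleaned)
```

The cleaned data frame could then be passed on in place of the original one. Non-numeric columns are left untouched.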

diagnose_report() function errors

Hi @choonghyunryu,

First off, many thanks for this package, it is really useful for EDA!
I am using the function diagnose_report() and it gives me errors when I try to generate either HTML or PDF output. For instance:

df %>% diagnose_report(output_format = "html")

gives:

Quitting from lines 109-122 (Diagnosis_Report.Rmd) 
 Error in doc_parse_raw(x, encoding = encoding, base_url = base_url, as_html = as_html,  : 
  Input is not proper UTF-8, indicate encoding !
Bytes: 0xA0 0x4D 0x20 0x3C [9] 

and

df %>% diagnose_report(output_file = "Diagn.pdf")

gives:

output file: /var/folders/54/f4d8z7ps1lx_w4xcj_8p16y0mjrjn3/T//RtmpFCdNMH/Diagn.tex

tlmgr search --file --global '/setspace.sty'
Proxy must be specified as absolute URI; '194.34.82.250:10263' is not at /Users/kkrg658/Library/TinyTeX/tlpkg/TeXLive/TLDownload.pm line 44.
! LaTeX Error: File `setspace.sty' not found.

! Emergency stop.
<read *> 

Error: Failed to compile Diagn.tex. See https://yihui.name/tinytex/r/#debugging for debugging tips. See Diagn.log for more info.
In addition: Warning messages:
1: In dir.create(paste(path, "figure", sep = "/")) :
  '/var/folders/54/f4d8z7ps1lx_w4xcj_8p16y0mjrjn3/T//RtmpFCdNMH/figure' already exists
2: In system2("tlmgr", args, ...) :
  running command ''tlmgr' search --file --global '/setspace.sty'' had status 255

Any idea why this might be happening?

My session info:

─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 setting  value                       
 version  R version 3.6.1 (2019-07-05)
 os       macOS Mojave 10.14.6        
 system   x86_64, darwin15.6.0        
 ui       RStudio                     
 language (EN)                        
 collate  en_GB.UTF-8                 
 ctype    en_GB.UTF-8                 
 tz       Europe/London               
 date     2020-02-14                  

─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 package      * version  date       lib source        
 abind          1.4-5    2016-07-21 [1] CRAN (R 3.6.0)
 acepack        1.4.1    2016-10-29 [1] CRAN (R 3.6.0)
 assertthat     0.2.1    2019-03-21 [1] CRAN (R 3.6.0)
 backports      1.1.5    2019-10-02 [1] CRAN (R 3.6.0)
 base64enc      0.1-3    2015-07-28 [1] CRAN (R 3.6.0)
 bit            1.1-14   2018-05-29 [1] CRAN (R 3.6.0)
 bit64          0.9-7    2017-05-08 [1] CRAN (R 3.6.0)
 bitops         1.0-6    2013-08-17 [1] CRAN (R 3.6.0)
 blob           1.2.0    2019-07-09 [1] CRAN (R 3.6.0)
 boot           1.3-22   2019-04-02 [1] CRAN (R 3.6.1)
 broom          0.5.2    2019-04-07 [1] CRAN (R 3.6.0)
 car            3.0-6    2019-12-23 [1] CRAN (R 3.6.0)
 carData        3.0-3    2019-11-16 [1] CRAN (R 3.6.0)
 caTools        1.17.1.3 2019-11-30 [1] CRAN (R 3.6.0)
 cellranger     1.1.0    2016-07-27 [1] CRAN (R 3.6.0)
 checkmate      1.9.4    2019-07-04 [1] CRAN (R 3.6.0)
 chron          2.3-55   2020-02-02 [1] CRAN (R 3.6.0)
 class          7.3-15   2019-01-01 [1] CRAN (R 3.6.1)
 classInt       0.4-2    2019-10-17 [1] CRAN (R 3.6.0)
 cli            1.1.0    2019-03-19 [1] CRAN (R 3.6.0)
 cluster        2.1.0    2019-06-19 [1] CRAN (R 3.6.1)
 colorspace     1.4-1    2019-03-18 [1] CRAN (R 3.6.0)
 corrplot       0.84     2017-10-16 [1] CRAN (R 3.6.0)
 crayon         1.3.4    2017-09-16 [1] CRAN (R 3.6.0)
 curl           4.2      2019-09-24 [1] CRAN (R 3.6.0)
 data.table     1.12.2   2019-04-07 [1] CRAN (R 3.6.0)
 DBI            1.0.0    2018-05-02 [1] CRAN (R 3.6.0)
 digest         0.6.21   2019-09-20 [1] CRAN (R 3.6.0)
 dlookr       * 0.3.13   2020-01-09 [1] CRAN (R 3.6.0)
 DMwR           0.4.1    2013-08-08 [1] CRAN (R 3.6.0)
 dplyr        * 0.8.3    2019-07-04 [1] CRAN (R 3.6.0)
 e1071          1.7-2    2019-06-05 [1] CRAN (R 3.6.0)
 evaluate       0.14     2019-05-28 [1] CRAN (R 3.6.0)
 fansi          0.4.0    2018-10-05 [1] CRAN (R 3.6.0)
 forcats      * 0.4.0    2019-02-17 [1] CRAN (R 3.6.0)
 foreign        0.8-71   2018-07-20 [1] CRAN (R 3.6.1)
 Formula        1.2-3    2018-05-03 [1] CRAN (R 3.6.0)
 gdata          2.18.0   2017-06-06 [1] CRAN (R 3.6.0)
 generics       0.0.2    2018-11-29 [1] CRAN (R 3.6.0)
 ggplot2      * 3.2.1    2019-08-10 [1] CRAN (R 3.6.0)
 glue           1.3.1    2019-03-12 [1] CRAN (R 3.6.0)
 gplots         3.0.1.2  2020-01-11 [1] CRAN (R 3.6.0)
 gridExtra      2.3      2017-09-09 [1] CRAN (R 3.6.0)
 gsubfn         0.7      2018-03-16 [1] CRAN (R 3.6.0)
 gtable         0.3.0    2019-03-25 [1] CRAN (R 3.6.0)
 gtools         3.8.1    2018-06-26 [1] CRAN (R 3.6.0)
 haven          2.1.1    2019-07-04 [1] CRAN (R 3.6.0)
 highr          0.8      2019-03-20 [1] CRAN (R 3.6.0)
 Hmisc          4.3-1    2020-02-07 [1] CRAN (R 3.6.0)
 hms            0.5.1    2019-08-23 [1] CRAN (R 3.6.0)
 htmlTable      1.13.3   2019-12-04 [1] CRAN (R 3.6.0)
 htmltools      0.4.0    2019-10-04 [1] CRAN (R 3.6.0)
 htmlwidgets    1.5.1    2019-10-08 [1] CRAN (R 3.6.0)
 httr           1.4.1    2019-08-05 [1] CRAN (R 3.6.0)
 inum           1.0-1    2019-04-25 [1] CRAN (R 3.6.0)
 janitor      * 1.2.0    2019-04-21 [1] CRAN (R 3.6.0)
 jomo           2.6-10   2019-10-22 [1] CRAN (R 3.6.0)
 jpeg           0.1-8.1  2019-10-24 [1] CRAN (R 3.6.0)
 jsonlite       1.6      2018-12-07 [1] CRAN (R 3.6.0)
 kableExtra   * 1.1.0    2019-03-16 [1] CRAN (R 3.6.0)
 KernSmooth     2.23-15  2015-06-29 [1] CRAN (R 3.6.1)
 knitr        * 1.25     2019-09-18 [1] CRAN (R 3.6.0)
 lattice      * 0.20-38  2018-11-04 [1] CRAN (R 3.6.1)
 latticeExtra   0.6-29   2019-12-19 [1] CRAN (R 3.6.0)
 lazyeval       0.2.2    2019-03-15 [1] CRAN (R 3.6.0)
 libcoin        1.0-5    2019-08-27 [1] CRAN (R 3.6.0)
 lifecycle      0.1.0    2019-08-01 [1] CRAN (R 3.6.0)
 lme4           1.1-21   2019-03-05 [1] CRAN (R 3.6.0)
 lubridate      1.7.4    2018-04-11 [1] CRAN (R 3.6.0)
 magrittr     * 1.5      2014-11-22 [1] CRAN (R 3.6.0)
 MASS           7.3-51.4 2019-03-31 [1] CRAN (R 3.6.1)
 Matrix         1.2-17   2019-03-22 [1] CRAN (R 3.6.1)
 memoise        1.1.0    2017-04-21 [1] CRAN (R 3.6.0)
 mice         * 3.7.0    2019-12-13 [1] CRAN (R 3.6.0)
 minqa          1.2.4    2014-10-09 [1] CRAN (R 3.6.0)
 mitml          0.3-7    2019-01-07 [1] CRAN (R 3.6.0)
 modelr         0.1.5    2019-08-08 [1] CRAN (R 3.6.0)
 moments        0.14     2015-01-05 [1] CRAN (R 3.6.0)
 munsell        0.5.0    2018-06-12 [1] CRAN (R 3.6.0)
 mvtnorm        1.0-11   2019-06-19 [1] CRAN (R 3.6.0)
 nlme           3.1-140  2019-05-12 [1] CRAN (R 3.6.1)
 nloptr         1.2.1    2018-10-03 [1] CRAN (R 3.6.0)
 nnet           7.3-12   2016-02-02 [1] CRAN (R 3.6.1)
 nortest        1.0-4    2015-07-30 [1] CRAN (R 3.6.0)
 openxlsx       4.1.0.1  2019-05-28 [1] CRAN (R 3.6.0)
 pan            1.6      2018-06-29 [1] CRAN (R 3.6.0)
 partykit       1.2-6    2020-01-30 [1] CRAN (R 3.6.0)
 pillar         1.4.2    2019-06-29 [1] CRAN (R 3.6.0)
 pkgconfig      2.0.3    2019-09-22 [1] CRAN (R 3.6.0)
 png            0.1-7    2013-12-03 [1] CRAN (R 3.6.0)
 prettydoc      0.3.1    2019-11-23 [1] CRAN (R 3.6.0)
 proto          1.0.0    2016-10-29 [1] CRAN (R 3.6.0)
 purrr        * 0.3.2    2019-03-15 [1] CRAN (R 3.6.0)
 quantmod       0.4-15   2019-06-17 [1] CRAN (R 3.6.0)
 R6             2.4.0    2019-02-14 [1] CRAN (R 3.6.0)
 RcmdrMisc      2.7-0    2020-01-14 [1] CRAN (R 3.6.0)
 RColorBrewer   1.1-2    2014-12-07 [1] CRAN (R 3.6.0)
 Rcpp           1.0.2    2019-07-25 [1] CRAN (R 3.6.0)
 readr        * 1.3.1    2018-12-21 [1] CRAN (R 3.6.0)
 readxl       * 1.3.1    2019-03-13 [1] CRAN (R 3.6.0)
 rio            0.5.16   2018-11-26 [1] CRAN (R 3.6.0)
 rlang          0.4.2    2019-11-23 [1] CRAN (R 3.6.0)
 rmarkdown      2.1      2020-01-20 [1] CRAN (R 3.6.0)
 ROCR           1.0-7    2015-03-26 [1] CRAN (R 3.6.0)
 rpart          4.1-15   2019-04-12 [1] CRAN (R 3.6.1)
 RSQLite        2.1.2    2019-07-24 [1] CRAN (R 3.6.0)
 rstudioapi     0.10     2019-03-19 [1] CRAN (R 3.6.0)
 rvest          0.3.4    2019-05-15 [1] CRAN (R 3.6.0)
 sandwich       2.5-1    2019-04-06 [1] CRAN (R 3.6.0)
 scales         1.0.0    2018-08-09 [1] CRAN (R 3.6.0)
 sessioninfo    1.1.1    2018-11-05 [1] CRAN (R 3.6.0)
 smbinning      0.9      2019-04-01 [1] CRAN (R 3.6.0)
 sqldf          0.4-11   2017-06-28 [1] CRAN (R 3.6.0)
 stringi        1.4.3    2019-03-12 [1] CRAN (R 3.6.0)
 stringr      * 1.4.0    2019-02-10 [1] CRAN (R 3.6.0)
 survival       3.1-8    2019-12-03 [1] CRAN (R 3.6.0)
 tibble       * 2.1.3    2019-06-06 [1] CRAN (R 3.6.0)
 tidyr        * 1.0.0    2019-09-11 [1] CRAN (R 3.6.0)
 tidyselect     0.2.5    2018-10-11 [1] CRAN (R 3.6.0)
 tidyverse    * 1.2.1    2017-11-14 [1] CRAN (R 3.6.0)
 tinytex        0.16     2019-09-17 [1] CRAN (R 3.6.0)
 TTR            0.23-6   2019-12-15 [1] CRAN (R 3.6.0)
 utf8           1.1.4    2018-05-24 [1] CRAN (R 3.6.0)
 vctrs          0.2.0    2019-07-05 [1] CRAN (R 3.6.0)
 viridisLite    0.3.0    2018-02-01 [1] CRAN (R 3.6.0)
 webshot        0.5.2    2019-11-22 [1] CRAN (R 3.6.0)
 withr          2.1.2    2018-03-15 [1] CRAN (R 3.6.0)
 xfun           0.10     2019-10-01 [1] CRAN (R 3.6.0)
 xml2           1.2.2    2019-08-09 [1] CRAN (R 3.6.0)
 xtable       * 1.8-4    2019-04-21 [1] CRAN (R 3.6.0)
 xts            0.12-0   2020-01-19 [1] CRAN (R 3.6.0)
 yaml           2.2.0    2018-07-25 [1] CRAN (R 3.6.0)
 zeallot        0.1.0    2018-01-28 [1] CRAN (R 3.6.0)
 zip            2.0.4    2019-09-01 [1] CRAN (R 3.6.0)
 zoo            1.8-7    2020-01-10 [1] CRAN (R 3.6.0)

[1] /Library/Frameworks/R.framework/Versions/3.6/Resources/library

Many thanks,
Dimitris
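
For the HTML case above, the byte 0xA0 in the error message is not valid UTF-8 on its own. In Latin-1 it is a non-breaking space, so one thing worth trying is re-encoding the character columns to UTF-8 before building the report. This is only a sketch: `to_utf8()` is a hypothetical helper, the sample string is made up, and `from = "latin1"` is an assumption about how the data was originally encoded.

```r
# Build a made-up string containing the raw byte 0xA0 between "10" and "M".
raw_label <- rawToChar(as.raw(c(0x31, 0x30, 0xA0, 0x4D)))
df <- data.frame(label = raw_label, stringsAsFactors = FALSE)

# Hypothetical helper: re-encode every character column to UTF-8.
# `from` is an assumption about the original encoding of the data.
to_utf8 <- function(df, from = "latin1") {
  is_chr <- vapply(df, is.character, logical(1))
  df[is_chr] <- lapply(df[is_chr], iconv, from = from, to = "UTF-8")
  df
}

validUTF8(df$label)           # FALSE: the lone 0xA0 byte is invalid UTF-8
validUTF8(to_utf8(df)$label)  # TRUE: 0xA0 became the two-byte UTF-8 sequence
```

If the data comes from a different encoding, the `from` argument would need to change accordingly.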

Improvement of automated reports

In an automated report, users need the ability to organize content and change the cover page.
In some cases, the ability to add content should also be considered.

Launches XQuartz

Hi,
Is there a specific reason why dlookr launches XQuartz?
I'm on OS X 10.14, the icon appears on the dock, but beyond that I don't understand why it needs to call XQuartz.

I really like the package, thanks!

saving the result of plot_outlier()

Based on the example for plotting the effect of outliers on numerical variables, I have defined the function plot_num_outliers() to apply the example to a list of variables named num_vbles from a data frame rnr_df. The list is defined as:

num_vbles <- rnr_df %>% select_at(vars(contains("how_many"), contains("number_of"))) %>% names(.)
# plot function for outlier visualization of numerical variables
plot_num_outliers <- function(df, ls_vbles) {
  df %>%
    select(!!ls_vbles) %>%
    plot_outlier(diagnose_outlier(.) %>%
      filter(outliers_ratio >= 0.5) %>%
      select(variables) %>%
      unlist())
}

When I try to save the result in an object called plot_outliers by running

plot_outliers <- plot_num_outliers(rnr_df, quo(num_vbles))

and then inspect the result stored in plot_outliers, I get:

plot_outliers
$variables1
NULL
$variables2
NULL
$variables3
NULL

What type of object does plot_outlier() return? Can I modify the aesthetics of the plots, and how can I store the plots in a variable?

Troubles with relate() function

I have a data frame df with two categorical variables: var1 and var2. I am following the tutorial to get the contingency table taking var1 as target and var2 as predictor. When I run

target <- target_by(df, var1)
tc <- relate(target, var2)

tc
the latter gives me:
function (.data, predictor)
{
UseMethod("relate", .data)
}
<bytecode: 0x7fa4ad372e40>
<environment: namespace:dlookr>

and running summary(etc) gives me the error:
Error in object[[i]] : object of type 'closure' is not subsettable

I don't understand what is happening.
Thanks in advance for your help.

Relationship between categorical variables and categorical vs numeric ones

It seems the correlation coefficient is the only measure available for understanding relationships between variables.
Besides the PPScore I mentioned in a separate issue, metrics such as eta squared (association between a categorical variable and a numeric one) or Cramér's V (association between two categorical variables) could be introduced.
A simple bar plot could then visualize the associations between a chosen reference variable and the other ones, like the following one:

image
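
The two measures suggested above can be sketched in base R as follows. This is a minimal illustration, not dlookr's API: `cramers_v()` and `eta_squared()` are hypothetical helper names, missing values are not handled, and the built-in mtcars data is used only as an example.

```r
# Cramér's V for two categorical vectors, derived from the chi-squared
# statistic of their contingency table (no continuity correction).
cramers_v <- function(x, y) {
  tab <- table(x, y)
  chi2 <- suppressWarnings(chisq.test(tab, correct = FALSE)$statistic)
  n <- sum(tab)
  k <- min(nrow(tab), ncol(tab)) - 1
  as.numeric(sqrt(chi2 / (n * k)))
}

# Eta squared for a numeric vector grouped by a categorical one:
# between-group sum of squares divided by total sum of squares.
eta_squared <- function(num, group) {
  grand_mean <- mean(num)
  group_means <- ave(num, group)  # group mean repeated for each observation
  sum((group_means - grand_mean)^2) / sum((num - grand_mean)^2)
}

cramers_v(mtcars$cyl, mtcars$gear)
eta_squared(mtcars$mpg, factor(mtcars$cyl))
```

Both values lie between 0 (no association) and 1 (perfect association), which is what would make them easy to show side by side in a bar plot.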

Adding Box-Cox plots to plot_normality() and Box-Cox transformation to transform()

Providing the transform() function with the Box-Cox transformation, with the best lambda found automatically, could be useful too. Storing the fitted lambda values as attributes of the data frame (referring to the transformed variable) would also make it possible to do a reverse transformation later on.
It would also be useful to show Box-Cox transformed plots in the plot_normality() output.
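
A rough sketch of what this request could look like in base R is shown below. It assumes a strictly positive input vector and picks lambda by maximizing the usual Box-Cox profile log-likelihood over a fixed range. All function names here are hypothetical and none of this is dlookr's current behavior.

```r
# Box-Cox transform of a strictly positive vector for a given lambda.
box_cox <- function(x, lambda) {
  if (abs(lambda) < 1e-8) log(x) else (x^lambda - 1) / lambda
}

# Inverse transform, so the original scale can be recovered later.
box_cox_inverse <- function(z, lambda) {
  if (abs(lambda) < 1e-8) exp(z) else (lambda * z + 1)^(1 / lambda)
}

# Profile log-likelihood of lambda under a normal model (up to a constant).
box_cox_loglik <- function(lambda, x) {
  z <- box_cox(x, lambda)
  n <- length(x)
  -n / 2 * log(mean((z - mean(z))^2)) + (lambda - 1) * sum(log(x))
}

# Choose lambda by maximizing the profile log-likelihood over a fixed range.
best_lambda <- function(x, range = c(-2, 2)) {
  optimize(box_cox_loglik, interval = range, x = x, maximum = TRUE)$maximum
}

set.seed(1)
x <- rlnorm(500)               # made-up right-skewed, strictly positive data
lambda <- best_lambda(x)       # expected to be near 0 for log-normal data
z <- box_cox(x, lambda)
attr(z, "lambda") <- lambda    # keep lambda for the reverse transformation
x_back <- box_cox_inverse(as.numeric(z), attr(z, "lambda"))
```

Storing lambda as an attribute of the transformed vector is one way to support the reverse transformation mentioned above. A data-frame-level attribute keyed by variable name would work similarly.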

x axis overlaps labels

When plotting bins, x-axis labels tend to overlap when the numbers are long.

A good fix would be to allow rotating the labels by some angle, or to disable scientific notation on plots.

bin <- binning_by(bindf, "target", "predicted")

plot(bin)

plot
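
As a general base R illustration of the "disable scientific notation" idea (not a statement about how plot() for binning objects builds its labels), numeric break points can be formatted explicitly, or the `scipen` option can be raised for the session. The break values below are made up.

```r
breaks <- c(0, 1e5, 2.5e6, 1e8)   # made-up bin boundaries

# Option 1: format the labels explicitly without scientific notation.
plain_labels <- format(breaks, scientific = FALSE, trim = TRUE)

# Option 2: raise the penalty for scientific notation, then restore it.
old <- options(scipen = 999)
label_with_scipen <- format(1e5)
options(old)

plain_labels
label_with_scipen
```

Whether either option affects the labels drawn for a given binning plot depends on how those labels are generated internally, so this would need checking against the package code.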

plot_outlier() not outputting correct label name

Good day,

Following the guide in both the GitHub README and the data quality diagnosis vignette, under visualization of outliers using plot_outlier(), one label in the plot does not match the diagnosis it describes. The Outlier Diagnosis Plot has 2 rows and 2 columns. The plot in the first column, second row should be labeled "Without outliers" but is labeled "with outliers", given that the panels follow this order:

  • With outliers box plot
  • Without outliers box plot
  • With outliers histogram
  • Without outliers histogram

pic

transformation_report pdf output text overflows page on the right side

When calling transformation_report(), if enough variables contain NAs, the text overflows the right side of the PDF page.
I just added a bunch of NAs and a couple of variables to show this:

overflow

Maybe the best fix is to list the variables one per line, so that the page width is no longer a constraint.

Automated Report Error in Korean Language Operating System

When creating an automated report on a Korean-language operating system, an error may occur because the Korean language is not recognized.

The check for Korean tests whether the LANG environment variable is ko_KO.UTF-8, but this logic does not work reliably on some operating systems.

Multicollinearity check

A multicollinearity check using VIF analysis would be really useful too. Once the VIFs are calculated, it would be great if the function also returned a vector of features to remove from the dataset in order to eliminate multicollinearity.
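
A minimal base R sketch of this request follows. It computes each VIF as 1 / (1 - R²) from regressing one variable on all the others, and greedily drops the variable with the largest VIF until all remaining values are at or below a threshold. `vif_values()` and `vif_to_remove()` are hypothetical names, the threshold is an arbitrary choice, and the input is assumed to be an all-numeric data frame without missing values.

```r
# VIF for each column of a numeric data frame: 1 / (1 - R^2), where R^2
# comes from regressing that column on all the other columns.
vif_values <- function(df) {
  vapply(names(df), function(v) {
    fit <- lm(reformulate(setdiff(names(df), v), response = v), data = df)
    1 / (1 - summary(fit)$r.squared)
  }, numeric(1))
}

# Hypothetical helper: repeatedly drop the variable with the largest VIF
# until every remaining VIF is at or below the threshold.
vif_to_remove <- function(df, threshold = 10) {
  removed <- character(0)
  while (ncol(df) > 1) {
    v <- vif_values(df)
    if (max(v) <= threshold) break
    worst <- names(v)[which.max(v)]
    removed <- c(removed, worst)
    df <- df[, names(df) != worst, drop = FALSE]
  }
  removed
}

predictors <- mtcars[, c("disp", "hp", "wt", "drat", "qsec")]
vif_values(predictors)
vif_to_remove(predictors, threshold = 5)
```

The greedy removal is only one possible strategy. Which variable to drop is often a modeling decision, so returning the VIF table alongside the suggested removals may be more useful than the vector alone.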

The result does not contain the minimum value of the sequence

Hello, the code below exposes a bug.

library(dlookr)
a <- c(1,1,2, 5,5,5, 7,8,100)
t=binning(a,nbins=2,type="quantile")
print(t)

The result will not contain the value 1.
Your code has fct <- cut(a, breaks = breaks).
I think it should be fct <- cut(a, breaks = breaks, include.lowest = TRUE).

Best regards
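
The behavior described above can be reproduced with base R alone. The sketch assumes the two-bin quantile breaks are computed with quantile() at probabilities 0, 0.5 and 1, which may differ from what binning() does internally.

```r
a <- c(1, 1, 2, 5, 5, 5, 7, 8, 100)

# Assumption: quantile binning with 2 bins uses these quantiles as breaks.
breaks <- quantile(a, probs = c(0, 0.5, 1))   # 1, 5, 100

# Default cut(): the intervals are (1,5] and (5,100], so the minimum
# value 1 falls outside the first interval and becomes NA.
without_lowest <- cut(a, breaks = breaks)
sum(is.na(without_lowest))   # 2

# include.lowest = TRUE closes the first interval: [1,5] and (5,100].
with_lowest <- cut(a, breaks = breaks, include.lowest = TRUE)
sum(is.na(with_lowest))      # 0
table(with_lowest)
```

With include.lowest = TRUE, all nine values are assigned to a bin: six in the first and three in the second.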
