Comments (2)
Unfortunately, the last fix does not seem to be enough:
> dat3 = readARFF("datafile1.arff")
Parse with reader=readr : datafile1.arff
Loading required package: readr
Warning: 1 parsing failure.
row # A tibble: 1 x 5 col row col expected actual file expected <int> <chr> <chr> <chr> <chr> actual 1 1 NA 1 columns 759 columns '/var/folders/t5/8s0vv3w545v7x5j0_pqtc8wr0000gp/T//Rtmpcmkgef/file475535af75d5' file # A tibble: 1 x 5
header: 114.905000; preproc: 0.504000; data: 0.845000; postproc: 0.096000; total: 116.350000
Warning messages:
1: Unnamed `col_types` should have the same length as `col_names`. Using smaller of the two.
2: In rbind(names(probs), probs_f) :
number of columns of result is not a multiple of vector length (arg 2)
> all.equal(dat1, dat3)
[1] "Attributes: < Names: 1 string mismatch >"
[2] "Attributes: < Length mismatch: comparison on first 2 components >"
[3] "Attributes: < Component 2: Modes: numeric, list >"
[4] "Attributes: < Component 2: Lengths: 2000000, 5 >"
[5] "Attributes: < Component 2: names for current but not for target >"
[6] "Attributes: < Component 2: Attributes: < target is NULL, current is list > >"
[7] "Attributes: < Component 2: target is numeric, current is tbl_df >"
[8] "Component “huge_factor”: Lengths: 2000000, 2000002"
[9] "Component “huge_factor”: Lengths (2000000, 2000002) differ (string compare on first 2000000)"
[10] "Component “huge_factor”: 'is.NA' value mismatch: 2 in current 0 in target"
Now a data frame is returned, but the number of rows does not match.
It seems that two empty rows are added at the beginning of the data frame:
> dim(dat1)
[1] 2000000 1
> dim(dat3)
[1] 2000002 1
>
> head(dat1)
huge_factor
1 6GwiqtKZwCEVtO4wpTeqK58HKKsgMc
2 9jc6lV3by0tkHv8UUBtv1p30baKu6z
3 rpF65yg5DY3sHk5mnRbWKVHR03lA3S
4 8uZpJsDm7WI13zFYoUD6obcLeG0I1Z
5 KZti0i9paE3iB0umaC46x1pN3GPzQ7
6 7xfDZa1ug3we4cKNmE5p6JwUZwdmSg
>
> head(dat3)
huge_factor
1 <NA>
2 <NA>
3 6GwiqtKZwCEVtO4wpTeqK58HKKsgMc
4 9jc6lV3by0tkHv8UUBtv1p30baKu6z
5 rpF65yg5DY3sHk5mnRbWKVHR03lA3S
6 8uZpJsDm7WI13zFYoUD6obcLeG0I1Z
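To check whether the two extra rows are the only difference, one could locate the rows that are entirely NA, drop them, and compare again. The following is a minimal sketch in base R. It uses small toy data frames as stand-ins for `dat1` and `dat3` (the real objects come from the ARFF file above), and the names `all_na`, `na_rows`, and `dat3_trimmed` are made up for illustration.

```r
# Toy stand-ins (assumption): dat1 is the expected result, dat3 mimics the
# readr-based result with two leading all-NA rows.
dat1 <- data.frame(huge_factor = factor(c("a", "b", "c")))
dat3 <- data.frame(huge_factor = factor(c(NA, NA, "a", "b", "c")))

# Flag rows in which every column is NA.
all_na <- rowSums(!is.na(dat3)) == 0
na_rows <- which(all_na)
print(na_rows)

# Drop the all-NA rows and reset the row names so that they do not
# cause a spurious mismatch in the comparison.
dat3_trimmed <- dat3[!all_na, , drop = FALSE]
rownames(dat3_trimmed) <- NULL

# Compare the trimmed result against the expected data frame.
print(all.equal(dat1, dat3_trimmed))
```

If the comparison still reports differences on the real data, the extra rows are not the only problem, for example the class of the returned object may differ as well.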
from farff.
> devtools::session_info()
Session info ----------------------------------------------------------------------------------------------------------------------------
setting value
version R version 3.5.1 (2018-07-02)
system x86_64, darwin15.6.0
ui RStudio (1.1.456)
language (EN)
collate de_DE.UTF-8
tz Europe/Berlin
date 2018-11-20
Packages --------------------------------------------------------------------------------------------------------------------------------
package * version date source
assertthat 0.2.0 2017-04-11 CRAN (R 3.5.0)
backports 1.1.2 2017-12-13 CRAN (R 3.5.0)
base * 3.5.1 2018-07-05 local
BBmisc 1.11 2018-11-07 Github (berndbischl/BBmisc@a5a4e45)
checkmate 1.8.5 2017-10-24 CRAN (R 3.5.0)
cli 1.0.1 2018-09-25 CRAN (R 3.5.0)
compiler 3.5.1 2018-07-05 local
crayon 1.3.4 2017-09-16 CRAN (R 3.5.0)
data.table 1.11.8 2018-09-30 CRAN (R 3.5.0)
datasets * 3.5.1 2018-07-05 local
devtools 1.13.6 2018-06-27 CRAN (R 3.5.0)
digest 0.6.18 2018-10-10 CRAN (R 3.5.0)
fansi 0.4.0 2018-10-05 CRAN (R 3.5.0)
farff * 1.0 2018-11-20 Github (mlr-org/farff@8221efb)
graphics * 3.5.1 2018-07-05 local
grDevices * 3.5.1 2018-07-05 local
hms 0.4.2 2018-03-10 CRAN (R 3.5.0)
memoise 1.1.0 2017-04-21 CRAN (R 3.5.0)
methods * 3.5.1 2018-07-05 local
pillar 1.3.0 2018-07-14 CRAN (R 3.5.0)
pkgconfig 2.0.2 2018-08-16 CRAN (R 3.5.0)
R6 2.3.0 2018-10-04 CRAN (R 3.5.0)
Rcpp 0.12.19 2018-10-01 CRAN (R 3.5.0)
readr * 1.1.1 2017-05-16 CRAN (R 3.5.0)
rlang 0.3.0.1 2018-10-25 cran (@0.3.0.1)
rstudioapi 0.8 2018-10-02 CRAN (R 3.5.0)
stats * 3.5.1 2018-07-05 local
stringi * 1.2.4 2018-07-20 CRAN (R 3.5.0)
tibble 1.4.2 2018-01-22 CRAN (R 3.5.0)
tools 3.5.1 2018-07-05 local
utf8 1.1.4 2018-05-24 CRAN (R 3.5.0)
utils * 3.5.1 2018-07-05 local
withr 2.1.2 2018-03-15 CRAN (R 3.5.0)
yaml 2.2.0 2018-07-25 CRAN (R 3.5.0)
Related Issues (20)
- ISO_8601_to_POSIX_datetime_format in parseHeader
- List of OML files that dont work
- columns containing question marks
- currently we cannot parse multi-instance data
- factors with levels T,F are converted to logical (like RWeka)
- Some OpenML datasets can't be parsed
- Error parsing file
- add jenkins test to download runs
- writeARFF should have overwrite = TRUE
- Encoding bug?
- too long lines in preproc_readr c code
- write little blog post on mlr blog on farrf with mini speed test
- RWeka != farff when a feature contains only FALSE
- leading tilde in path -> segfault
- Instance weighting not supported
- Path expansion with ~ generates segfault
- parseHeader is very slow and should be rewritten in C++
- the preprocessing buffer code is VERY bad
- Support parsing strings