Comments (19)
Hi, I think this issue can be closed after #47.
I parsed all of your example files with the upcoming version of bibtex, in which the C code is replaced by R code, and the described issue no longer occurs. The files are read correctly:
# PR 47 https://github.com/ropensci/bibtex/pull/47
library(bibtex)
# File 1 ----
f1 <- tempfile("file1", fileext = ".txt")
download.file(
"https://github.com/romainfrancois/bibtex/files/1120203/long_field.txt",
f1
)
ex1 <- read.bib(f1)
ex1
#> Batzill M (2012). "The Surface Science of Graphene: Metal Interfaces,
#> CVD Synthesis, Nanoribbons, Chemical Modifications, and Defects."
#> _SURFACE SCIENCE REPORTS_, *67*(3-4), 83-115. ISSN 0167-5729, doi:
#> 10.1016/j.surfrep.2011.12.001 (URL:
#> https://doi.org/10.1016/j.surfrep.2011.12.001).
# File 2 ----
f2 <- tempfile("file2", fileext = ".txt")
download.file(
"https://github.com/romainfrancois/bibtex/files/1120229/jabref_comment.txt",
f2
)
ex2 <- read.bib(f2)
ex2
#> Gómez RL (2002). "Variability and Detection of Invariant Structure."
#> _Psychological Science_, *13*(5), 431-436. ISSN 0956-7976, 1467-9280,
#> doi: 10.1111/1467-9280.00476 (URL:
#> https://doi.org/10.1111/1467-9280.00476), <URL: 2015-01-20>.
# File 3 -----
f3 <- tempfile("file3", fileext = ".zip")
download.file(
"https://github.com/romainfrancois/bibtex/files/1229495/soil.health_healthy.soil_1to500.bib.zip",
f3
)
unzip(f3, junkpaths = TRUE, exdir = tempdir())
ex3 <- read.bib(
file.path(
tempdir(),
"soil.health_healthy.soil_1to500.bib"
)
)
#> ignoring entry 'ISI:000268383100002' (line 34779) because :
#> A bibentry of bibtype 'InCollection' has to specify the field: author
#> ignoring entry 'ISI:000268383100003' (line 34853) because :
#> A bibentry of bibtype 'InCollection' has to specify the field: author
#> ignoring entry 'ISI:000268383100004' (line 34928) because :
#> A bibentry of bibtype 'InCollection' has to specify the field: author
#> ignoring entry 'ISI:000268383100005' (line 34999) because :
#> A bibentry of bibtype 'InCollection' has to specify the field: author
#> ignoring entry 'ISI:000268383100006' (line 35080) because :
#> A bibentry of bibtype 'InCollection' has to specify the field: author
#> ignoring entry 'ISI:000268383100008' (line 35134) because :
#> A bibentry of bibtype 'InCollection' has to specify the field: author
#> ignoring entry 'ISI:000268383100010' (line 35192) because :
#> A bibentry of bibtype 'InCollection' has to specify the field: author
length(ex3)
#> [1] 493
# Small sample of entries, since the file has 500 (493 read)
ex3[1:5]
#> FORMAN J (1951). "SOIL, HEALTH, AND THE DENTAL PROFESSION." _JOURNAL OF
#> PROSTHETIC DENTISTRY_, *1*(5), 508-522. ISSN 0022-3913, doi:
#> 10.1016/0022-3913(51)90037-6 (URL:
#> https://doi.org/10.1016/0022-3913(51)90037-6).
#>
#> SHARMA N, MADAN M (1983). "EARTHWORMS FOR SOIL HEALTH AND
#> POLLUTION-CONTROL." _JOURNAL OF SCIENTIFIC \& INDUSTRIAL RESEARCH_,
#> *42*(10), 575-583. ISSN 0022-4456.
#>
#> HABERERN J (1992). "A SOIL HEALTH INDEX." _JOURNAL OF SOIL AND WATER
#> CONSERVATION_, *47*(1), 6. ISSN 0022-4561.
#>
#> [Anonymous] (1993). "THE BREAD CORNER - NO BREAD WITHOUT HEALTHY SOIL."
#> _ALIMENTA_, *32*(3), 45. ISSN 0002-5402.
#>
#> Watts M (1994). "Pesticide residues in food: The views of the Soil \&
#> Health Association of New Zealand." In Savage, GP (ed.), _PROCEEDINGS
#> OF THE NUTRITION SOCIETY OF NEW ZEALAND, VOL 19_, volume 19 number 0
#> series PROCEEDINGS OF THE NUTRITION SOCIETY OF NEW ZEALAND, 58-63. Nutr
#> Soc New Zealand, ANIMAL \& VETERINARY SCI GROUP, LINCOLN UNIVERSITY, PO
#> BOX 84, CANTERBURY, NEW ZEALAND. 29th Annual Conference of the
#> Nutrition-Society-of-New-Zealand, CHRISTCHURCH, NEW ZEALAND, AUG, 1994.
# From gist ----
gist <- tempfile(fileext = ".bib")
download.file(
url = "https://gist.githubusercontent.com/kguidonimartins/6ca03106109cef5a891c67748b895e6a/raw/32c0e203de7875a1d13db6705aa9b507914a9fd9/library.bib",
destfile = gist
)
bibtex::read.bib(file = gist)
#> Vellend M (2001). "Do commonly used indices of $\beta$ -diversity
#> measure species turnover ?" _Journal of Vegetation Science_, *12*,
#> 545-552.
#>
#> López-Mart\'inez JO, Sanaphre-Villanueva L, Dupuy JM,
#> Hernández-Stefanoni JL, Meave JA, Gallardo-Cruz JA (2013).
#> "$\beta$-Diversity of functional groups of woody plants in a tropical
#> dry forest in Yucatan." _PloS one_, *8*(9), e73660. ISSN 1932-6203,
#> doi: 10.1371/journal.pone.0073660 (URL:
#> https://doi.org/10.1371/journal.pone.0073660), <URL:
#> http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=3769343{\&}tool=pmcentrez{\&}rendertype=abstract>.
#>
#> Swenson NG, Stegen JC, Davies SJ, Erickson DL, Forero-Montaña J,
#> Hurlbert AH, Kress WJ, Thompson J, Uriarte M, Wright SJ, Zimmerman JK
#> (2012). "Temporal turnover in the composition of tropical tree
#> communities: functional determinism and phylogenetic stochasticity."
#> _Ecology_, *93*(3), 490-499. ISSN 0012-9658, doi: 10.1890/11-1180.1
#> (URL: https://doi.org/10.1890/11-1180.1), <URL:
#> http://doi.wiley.com/10.1890/11-1180.1>.
Created on 2022-01-17 by the reprex package (v2.0.1)
Session info
sessioninfo::session_info()
#> - Session info ---------------------------------------------------------------
#> setting value
#> version R version 4.1.2 (2021-11-01)
#> os Windows 10 x64 (build 22000)
#> system x86_64, mingw32
#> ui RTerm
#> language (EN)
#> collate Spanish_Spain.1252
#> ctype Spanish_Spain.1252
#> tz Europe/Paris
#> date 2022-01-17
#> pandoc 2.14.0.3 @ C:/Program Files/RStudio/bin/pandoc/ (via rmarkdown)
#>
#> - Packages -------------------------------------------------------------------
#> package * version date (UTC) lib source
#> backports 1.4.1 2021-12-13 [1] CRAN (R 4.1.2)
#> bibtex * 0.5.0 2022-01-17 [1] local
#> cli 3.1.0 2021-10-27 [1] CRAN (R 4.1.1)
#> crayon 1.4.2 2021-10-29 [1] CRAN (R 4.1.1)
#> digest 0.6.29 2021-12-01 [1] CRAN (R 4.1.2)
#> ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.1.1)
#> evaluate 0.14 2019-05-28 [1] CRAN (R 4.1.1)
#> fansi 1.0.0 2022-01-10 [1] CRAN (R 4.1.2)
#> fastmap 1.1.0 2021-01-25 [1] CRAN (R 4.1.1)
#> fs 1.5.2 2021-12-08 [1] CRAN (R 4.1.2)
#> glue 1.6.0 2021-12-17 [1] CRAN (R 4.1.2)
#> highr 0.9 2021-04-16 [1] CRAN (R 4.1.1)
#> htmltools 0.5.2 2021-08-25 [1] CRAN (R 4.1.1)
#> knitr 1.37 2021-12-16 [1] CRAN (R 4.1.2)
#> lifecycle 1.0.1 2021-09-24 [1] CRAN (R 4.1.1)
#> magrittr 2.0.1 2020-11-17 [1] CRAN (R 4.1.1)
#> pillar 1.6.4 2021-10-18 [1] CRAN (R 4.1.1)
#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.1.1)
#> purrr 0.3.4 2020-04-17 [1] CRAN (R 4.1.1)
#> R.cache 0.15.0 2021-04-30 [1] CRAN (R 4.1.1)
#> R.methodsS3 1.8.1 2020-08-26 [1] CRAN (R 4.1.1)
#> R.oo 1.24.0 2020-08-26 [1] CRAN (R 4.1.1)
#> R.utils 2.11.0 2021-09-26 [1] CRAN (R 4.1.1)
#> reprex 2.0.1 2021-08-05 [1] CRAN (R 4.1.1)
#> rlang 0.4.12 2021-10-18 [1] CRAN (R 4.1.1)
#> rmarkdown 2.11 2021-09-14 [1] CRAN (R 4.1.1)
#> rstudioapi 0.13 2020-11-12 [1] CRAN (R 4.1.1)
#> sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.1.2)
#> stringi 1.7.6 2021-11-29 [1] CRAN (R 4.1.2)
#> stringr 1.4.0 2019-02-10 [1] CRAN (R 4.1.1)
#> styler 1.6.2 2021-09-23 [1] CRAN (R 4.1.1)
#> tibble 3.1.6 2021-11-07 [1] CRAN (R 4.1.2)
#> utf8 1.2.2 2021-07-24 [1] CRAN (R 4.1.1)
#> vctrs 0.3.8 2021-04-29 [1] CRAN (R 4.1.1)
#> withr 2.4.3 2021-11-30 [1] CRAN (R 4.1.2)
#> xfun 0.29 2021-12-14 [1] CRAN (R 4.1.2)
#> yaml 2.2.1 2020-02-01 [1] CRAN (R 4.1.1)
#>
#> [1] C:/Users/diego/Documents/R/win-library/4.1
#> [2] C:/Program Files/R/R-4.1.2/library
#>
#> ------------------------------------------------------------------------------
from bibtex.
In some of the .bib files I have encountered, the error was caused by a single long field containing more than 10,000 characters. See also #14.
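To check whether a file is affected before calling read.bib(), one could look for very long lines. A minimal sketch, assuming the exporter writes each field on a single line (common, but not guaranteed); find_long_lines() is a hypothetical helper, and the 10,000-character default comes from the observation above, not from any documented parser limit:

```r
# Hypothetical helper (not part of bibtex): report lines longer than `limit`
# characters. Assumes one field per line, so a long line ~ a long field.
find_long_lines <- function(lines, limit = 10000) {
  # allowNA = TRUE returns NA instead of erroring on invalid multibyte strings
  len <- nchar(lines, type = "chars", allowNA = TRUE)
  idx <- which(len > limit)  # which() silently drops the NA entries
  data.frame(line = idx, chars = len[idx])
}

# Usage:
# lines <- readLines("long_field.txt", encoding = "UTF-8", warn = FALSE)
# find_long_lines(lines)
```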
Anything happening here? I have the error as well and would really like to read the references into R.
Or are there any alternatives? I can use scan to read the file in, x <- scan(file = bibfile, multi.line = TRUE, sep = "\n", what = "character"), followed by x <- trimws(x), but what then?
How could I parse this object?
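One way to continue from the scan() output is to group the lines into entries. A rough sketch, not a full BibTeX parser: split_bib_entries() and bib_index() are hypothetical helpers that treat every line starting with "@" as the start of a new entry, and they ignore @string macros, brace balancing, and entries without a citation key (such as @Comment blocks, whose "key" is returned unchanged):

```r
# Hypothetical helpers (not part of bibtex): a rough splitter, not a full
# BibTeX parser. Every line that starts with "@" opens a new entry.
split_bib_entries <- function(x) {
  x <- trimws(x)
  x <- x[nzchar(x)]                   # drop blank lines
  entry_id <- cumsum(grepl("^@", x))  # running entry number for each line
  keep <- entry_id > 0                # lines before the first "@" are skipped
  entries <- split(x[keep], entry_id[keep])
  unname(vapply(entries, paste, character(1), collapse = "\n"))
}

# Pull the entry type and citation key out of each entry's first line.
bib_index <- function(entries) {
  first <- vapply(strsplit(entries, "\n", fixed = TRUE), `[`, character(1), 1)
  data.frame(
    type = sub("^@\\s*([A-Za-z]+).*", "\\1", first, perl = TRUE),
    key  = sub("^@\\s*[A-Za-z]+\\s*\\{\\s*([^,]*),.*", "\\1", first, perl = TRUE),
    stringsAsFactors = FALSE
  )
}

# Usage, continuing from the scan() call above:
# entries <- split_bib_entries(x)
# bib_index(entries)
```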
Can you prepare a reprex?
I am using Python for the task now. I had to adapt the workflow a bit, but now it works, and I am learning some Python in parallel.
@narayanibarve, do you still have this problem? If so, could you prepare a reproducible example using the reprex package?
Here's a reprex for a case of a long field causing flex to break:
bibtex::read.bib("long_field.txt")
#> Error: lex fatal error:
#> input buffer overflow, can't enlarge buffer because scanner uses REJECT
I used the current development version of bibtex from this repository.
Similarly, some reference managers (in this case Zotero) add a JabRef comment to the end of the file, which causes the same error.
bibtex::read.bib("jabref_comment.txt")
#> Error: lex fatal error:
#> input buffer overflow, can't enlarge buffer because scanner uses REJECT
Thanks. I'll look into it for the next version.
Just wanted to add that I'm having a similar problem reading in the attached .bib file from WoS.
soil.health_healthy.soil_1to500.bib.zip
This cleans the BibTeX comments, for anybody else dealing with this:
library(RefManageR)  # provides ReadBib()
### First read the file, so the JabRef comment can be removed
cleanFile <- readLines(file.path(queryHitsPath, queryHitsFiles))
### Paste all lines together into a single string
cleanFile <- paste(cleanFile, collapse = "\n")
### Remove the JabRef metadata comments
cleanFile <- gsub("(?s)@[Cc]omment\\{jabref-meta:[^\\}]*\\}", "", cleanFile, perl = TRUE)
### Write the cleaned file to disk
writeLines(cleanFile, con = file.path(queryHitsPath, "tmp-clean-file.bib"))
### Import the references
queryHits[["1and2"]] <- ReadBib(file.path(queryHitsPath, "tmp-clean-file.bib"))
However, for some reason the import still fails, even though no field comes anywhere near 10,000 characters, so there seem to be other errors as well. Perhaps simply allowing users to pass a string to parse, so that they can read and clean the files themselves, would be a simple, relatively quick fix? It would also add more generally useful functionality, so it wouldn't even become redundant once this bug (if it is one :-)) has been resolved :-)
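Until something like this exists in the package, the effect can be approximated by writing the string to a temporary file. A minimal sketch; read_bib_string() is a hypothetical helper, not an existing bibtex or RefManageR function, and it only wraps a file-based reader (bibtex::read.bib by default):

```r
# Hypothetical helper (not part of bibtex or RefManageR): parse BibTeX text
# held in memory by writing it to a temporary file and passing that file
# to a file-based reader. Extra arguments are forwarded to the reader.
read_bib_string <- function(text, reader = bibtex::read.bib, ...) {
  tmp <- tempfile(fileext = ".bib")
  on.exit(unlink(tmp), add = TRUE)  # remove the temp file even if reader() fails
  writeLines(enc2utf8(text), tmp, useBytes = TRUE)
  reader(tmp, ...)
}

# Usage with the cleaned string from the snippet above:
# bib <- read_bib_string(cleanFile, encoding = "UTF-8")
# bib <- read_bib_string(cleanFile, reader = RefManageR::ReadBib)
```

Because the reader is a parameter, the same cleaning step can be tried against both packages without touching the original file on disk.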
I'm no closer to solving this, but I remembered that I had actually written my own function to import BibTeX files, for a package I'm working on ('metabefor'). It's at https://github.com/Matherion/metabefor/blob/master/R/importBibtex.r, in case anybody else is struggling with the same problem.
Any news on this?
Any news on this? I get the same error with both the bibtex and RefManageR packages, and with the citr addin.
My attempt:
download.file(url = "https://gist.githubusercontent.com/kguidonimartins/6ca03106109cef5a891c67748b895e6a/raw/32c0e203de7875a1d13db6705aa9b507914a9fd9/library.bib",
destfile = "library.bib")
bibtex::read.bib(file = "library.bib")
RefManageR::ReadBib(file = "library.bib")
My session info:
R version 3.4.3 (2017-11-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.3 LTS
Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0
locale:
[1] LC_CTYPE=pt_BR.UTF-8 LC_NUMERIC=C
[3] LC_TIME=pt_BR.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=pt_BR.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=pt_BR.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=pt_BR.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] shiny_1.0.5.9000 Cite_0.1.0 rcrossref_0.8.1.9429
[4] wordcountaddin_0.2.0 citr_0.2.0.9055 pacman_0.4.6
[7] knitr_1.20 picante_1.6-2 nlme_3.1-131
[10] brranching_0.2.0 phytools_0.6-44 maps_3.2.0
[13] data.table_1.10.4-3 flora_0.3.0 readxl_1.0.0
[16] ape_5.0 betapart_1.5.0 forcats_0.3.0
[19] stringr_1.3.0 dplyr_0.7.4 purrr_0.2.4
[22] readr_1.1.1 tidyr_0.8.0 tibble_1.4.2
[25] ggplot2_2.2.1 tidyverse_1.2.1 vegan_2.4-6
[28] lattice_0.20-35 permute_0.9-4 bibtex_0.4.2
loaded via a namespace (and not attached):
[1] colorspace_1.3-2 rprojroot_1.3-2 rstudioapi_0.7
[4] urltools_1.7.0 DT_0.4 mvtnorm_1.0-7
[7] lubridate_1.7.3 RefManageR_0.14.20 xml2_1.2.0
[10] codetools_0.2-15 splines_3.4.3 mnormt_1.5-5
[13] bold_0.5.0 jsonlite_1.5 broom_0.4.3
[16] cluster_2.0.6 compiler_3.4.3 httr_1.3.1
[19] backports_1.1.2 assertthat_0.2.0 Matrix_1.2-12
[22] lazyeval_0.2.1 cli_1.0.0 later_0.7.1
[25] htmltools_0.3.6 tools_3.4.3 bindrcpp_0.2
[28] igraph_1.1.2 coda_0.19-1 gtable_0.2.0
[31] glue_1.2.0 taxize_0.9.3 reshape2_1.4.3
[34] clusterGeneration_1.3.4 fastmatch_1.1-0 Rcpp_0.12.16
[37] msm_1.6.6 cellranger_1.1.0 crul_0.5.2
[40] debugme_1.1.0 iterators_1.0.9 psych_1.7.8
[43] rvest_0.3.2 mime_0.5 miniUI_0.1.1
[46] phangorn_2.4.0 devtools_1.13.5 stringdist_0.9.4.7
[49] MASS_7.3-49 zoo_1.8-1 scales_0.5.0
[52] rcdd_1.2 hms_0.4.2 promises_1.0
[55] parallel_3.4.3 expm_0.999-2 animation_2.5
[58] yaml_2.1.18 curl_3.2 memoise_1.1.0
[61] triebeard_0.3.0 reshape_0.8.7 stringi_1.1.7
[64] foreach_1.4.4 plotrix_3.7 geometry_0.3-6
[67] rlang_0.2.0 pkgconfig_2.0.1 evaluate_0.10.1
[70] bindr_0.1.1 htmlwidgets_1.0 plyr_1.8.4
[73] magrittr_1.5 R6_2.2.2 combinat_0.0-8
[76] whisker_0.3-2 pillar_1.2.1 haven_1.1.1
[79] foreign_0.8-69 withr_2.1.2 mgcv_1.8-23
[82] survival_2.41-3 scatterplot3d_0.3-41 abind_1.4-5
[85] modelr_0.1.1 crayon_1.3.4 rmarkdown_1.9
[88] koRpus_0.10-2 grid_3.4.3 callr_2.0.2
[91] reprex_0.1.2 digest_0.6.15 xtable_1.8-2
[94] httpuv_1.3.6.9007 numDeriv_2016.8-1 munsell_0.4.3
[97] shinyjs_1.0 magic_1.5-8 quadprog_1.5-5
Oddly enough, the code works when run through the reprex addin.
download.file(url = "https://gist.githubusercontent.com/kguidonimartins/6ca03106109cef5a891c67748b895e6a/raw/32c0e203de7875a1d13db6705aa9b507914a9fd9/library.bib",
destfile = "library.bib")
bibtex::read.bib(file = "library.bib")
#> Vellend M (2001). "Do commonly used indices of $\beta$ -diversity
#> measure species turnover ?" _Journal of Vegetation Science_, *12*,
#> pp. 545-552.
#>
#> López-Mart\'inez JO, Sanaphre-Villanueva L, Dupuy JM,
#> Hernández-Stefanoni JL, Meave JA and Gallardo-Cruz JA (2013).
#> "$\beta$-Diversity of functional groups of woody plants in a
#> tropical dry forest in Yucatan." _PloS one_, *8*(9), pp. e73660.
#> ISSN 1932-6203, doi: 10.1371/journal.pone.0073660 (URL:
#> http://doi.org/10.1371/journal.pone.0073660), <URL:
#> http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=3769343{\&}tool=pmcentrez{\&}rendertype=abstract>.
#>
#> Swenson NG, Stegen JC, Davies SJ, Erickson DL, Forero-Montaña J,
#> Hurlbert AH, Kress WJ, Thompson J, Uriarte M, Wright SJ and
#> Zimmerman JK (2012). "Temporal turnover in the composition of
#> tropical tree communities: functional determinism and phylogenetic
#> stochasticity." _Ecology_, *93*(3), pp. 490-499. ISSN 0012-9658,
#> doi: 10.1890/11-1180.1 (URL: http://doi.org/10.1890/11-1180.1),
#> <URL: http://doi.wiley.com/10.1890/11-1180.1>.
RefManageR::ReadBib(file = "library.bib")
#> Warning in parse_Rd(Rd, encoding = encoding, fragment = fragment, ...):
#> <connection>:3: unknown macro '\beta'
#> Warning in parse_Rd(Rd, encoding = encoding, fragment = fragment, ...):
#> <connection>:3: unknown macro '\beta'
#> [1] J. O. López-Mart\'inez, L. Sanaphre-Villanueva, J. M. Dupuy,
#> et al. "$\beta$-Diversity of functional groups of woody plants in
#> a tropical dry forest in Yucatan.". In: _PloS one_ 8.9 (Jan.
#> 2013), p. e73660. ISSN: 1932-6203. DOI:
#> 10.1371/journal.pone.0073660. <URL:
#> http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=3769343{\&}tool=pmcentrez{\&}rendertype=abstract>.
#>
#> [2] N. G. Swenson, J. C. Stegen, S. J. Davies, et al. "Temporal
#> turnover in the composition of tropical tree communities:
#> functional determinism and phylogenetic stochasticity". In:
#> _Ecology_ 93.3 (Mar. 2012), pp. 490-499. ISSN: 0012-9658. DOI:
#> 10.1890/11-1180.1. <URL: http://doi.wiley.com/10.1890/11-1180.1>.
#>
#> [3] M. Vellend. "Do commonly used indices of $\beta$ -diversity
#> measure species turnover ?". In: _Journal of Vegetation Science_
#> 12 (2001), pp. 545-552.
I've been reading .bib files with readFiles() in the bibliometrix package.
Hi,
I am using citr and R Markdown with Zotero. I partially worked around this problem with crsh's suggestion of omitting the abstract, but some BibTeX entries have 500 to 1,000+ author names, which reproduces the problem.
Does anyone have suggestions, or has anyone come up with a solution to this?
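If the long author lists are the trigger, a workaround in the same spirit as dropping the abstract is to shorten them before parsing. A minimal sketch, assuming names are separated by a lowercase "and" surrounded by whitespace and that no braced group such as {Smith and Sons} contains that word; truncate_authors() is a hypothetical helper. Standard BibTeX styles render a trailing "and others" as "et al.", so the shortened citation still shows that authors were cut:

```r
# Hypothetical helper (not part of bibtex): keep the first `n` names of a
# BibTeX author string and append "others". Splits only on "and" surrounded
# by whitespace, so names such as "Anderson" are left intact.
truncate_authors <- function(author, n = 10) {
  people <- strsplit(author, "\\s+and\\s+")[[1]]
  if (length(people) <= n) return(author)  # short lists are returned unchanged
  paste(c(people[seq_len(n)], "others"), collapse = " and ")
}

truncate_authors("Smith, J. and Doe, A. and Roe, B.", n = 2)
#> [1] "Smith, J. and Doe, A. and others"
```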
I have the same problem with R Markdown and citr. Is there any suggested solution for this, please?
I am having this issue when parsing a long list of authors, too. Any progress?
Related Issues (20)
- caught segfault read.bib() - macOS 10.14.6
- merge changes from Brian Ripley
- ASCII turned into non-ASCII
- Orphaned on CRAN
- rchk issues
- Difficulty loading bibtex in R Studio
- Parse single entry from string
- GSOC 2021 R project
- DONT WRITE BACK TO YOUR BIBTEX-FILE: custom fields are imported with column-names that includes the values in the fields...!!
- write.bib chooses the wrong citation, and doesn't warn that there was an option
- Development environment of contributors?
- Unable to recover after encountering two consecutive TOKEN_LBRACE "{"
- `write.bib` does not write UTF-8 characters properly
- Proposal: Improving the package
- oldrel testthat snapshot differences
- Issue with "\\}$"
- Commas added to references when using bibtex in rmarkdown
- Replace as.personList(authors) with do.call(c, authors)
- Importing bibtex to Zotero classifies citation as "Book"
- Direct import into EndNote is not possible