This repository has been archived. The former README is now in README-NOT.md.
ropensci / bib2df
Parse a BibTeX file to a tibble
Home Page: https://docs.ropensci.org/bib2df
I reinstalled R recently and, while reinstalling my packages, got this when I attempted to install bib2df:
Warning in install.packages :
package ‘bib2df’ is not available (for R version 3.5.0)
The README indicates that v1.0.0 is on CRAN?
When reading a .bib file as a data.frame, can something like {\\'{e}} be shown as é, etc.?
Related post: https://superuser.com/questions/1560971/reading-latex-accents-of-bib-in-r
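One way to approach the accent question above is a lookup table applied after parsing. This is only a minimal sketch: the table `latex_accents` and the helper `decode_latex_accents()` are hypothetical names, not part of bib2df, and only a handful of macros are covered.

```r
# Hypothetical lookup table: LaTeX accent macro -> Unicode character.
# Only a few entries are shown; a usable table would be much longer.
latex_accents <- c(
  "{\\'{e}}"  = "\u00e9",  # e with acute accent
  "{\\`{e}}"  = "\u00e8",  # e with grave accent
  "{\\\"{o}}" = "\u00f6",  # o with diaeresis
  "{\\~{n}}"  = "\u00f1"   # n with tilde
)

# Replace every known macro with its Unicode equivalent (literal matching,
# so braces and backslashes need no regex escaping).
decode_latex_accents <- function(x) {
  for (macro in names(latex_accents)) {
    x <- gsub(macro, latex_accents[[macro]], x, fixed = TRUE)
  }
  x
}

decoded <- decode_latex_accents("Andr{\\'{e}} Mu{\\~{n}}oz")
decoded
```

Macros that are not in the table are left untouched, so the function is safe to run on fields that contain other LaTeX markup.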
Quite a lot of work has been done since the last release, which dates back over two years.
Is there anything particular holding back a new version?
Using bib2df without specifying separate_names results in a warning message. Set the default to FALSE, with the option to set it to TRUE.
When using bib2df (version 1.1.1), bib2df:::bib2df_gather(bib = myBib) gives me the following warning:
Warning (test-occCitePrint.R:19:3): regular print
`as_data_frame()` was deprecated in tibble 2.0.0.
Please use `as_tibble()` instead.
The signature and semantics have changed, see `?as_tibble`.
This warning is displayed once every 8 hours.
Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated.
It looks like this could be fixed by replacing dat <- as_data_frame(dat) with dat <- as_tibble(dat).
Hi, I've run into a problem reading in references where the abstract field contains an equals sign (=): the preceding abstract text is read in as a column header.
e.g.
"high genetic differentiation (F st = 0.043"
ends up as a new column header "HIGH.GENETIC.DIFFERENTIATION..F.ST"
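A possible way around this is to split a field line on the first equals sign only, so that any later "=" stays inside the value. The sketch below is not the package's code: `split_field()` is a hypothetical helper, and the example line is made up around the fragment quoted above. It also assumes the whole field sits on one line; abstracts that wrap over several lines would have to be joined first.

```r
# Split "key = value" at the FIRST '=' only; later '=' signs belong to the value.
split_field <- function(line) {
  pos <- regexpr("=", line, fixed = TRUE)
  if (pos < 0) {
    return(NULL)  # no '=' on this line, so it is not a field assignment
  }
  list(
    key   = toupper(trimws(substr(line, 1, pos - 1))),
    value = trimws(substr(line, pos + 1, nchar(line)))
  )
}

field <- split_field("abstract = {high genetic differentiation (F st = 0.043)},")
field$key
field$value
```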
My example bibtex file has an entry of an edited volume, for which there is an editor field but not an author field.
@book{DeBruijn2011,
address = {Hoboken, NJ, USA},
booktitle = {Handb. Mol. Microb. Ecol. II Metagenomics Differ. Habitats},
doi = {10.1002/9781118010549},
editor = {de Bruijn, Frans J.},
file = {:home/michael/articles/Unknown - 2011 - Handbook of Molecular Microbial Ecology II.pdf:pdf},
isbn = {9781118010549},
month = {sep},
publisher = {John Wiley {\&} Sons, Inc.},
title = {{Handbook of Molecular Microbial Ecology II}},
url = {http://doi.wiley.com/10.1002/9781118010549},
year = {2011}
}
When I read and write the bibtex file with separate_names, the editor field is dropped, and an author field with "," is added. But it works OK without separate_names. If I run
tb <- bib2df::bib2df("/tmp/library.bib", separate_names = TRUE)
bib2df::df2bib(tb, "/tmp/library-1.bib")
tb <- bib2df::bib2df("/tmp/library.bib", separate_names = FALSE)
bib2df::df2bib(tb, "/tmp/library-2.bib")
Then in library-1.bib I get
@Book{DeBruijn2011,
Address = {Hoboken, NJ, USA},
Author = {,},
Booktitle = {Handb. Mol. Microb. Ecol. II Metagenomics Differ. Habitats},
Month = {sep},
Publisher = {John Wiley {\&} Sons, Inc.},
Title = {{Handbook of Molecular Microbial Ecology II}},
Year = {2011},
Doi = {10.1002/9781118010549},
File = {:home/michael/articles/Unknown - 2011 - Handbook of Molecular Microbial Ecology II.pdf:pdf},
Url = {http://doi.wiley.com/10.1002/9781118010549},
Isbn = {9781118010549}
}
and in library-2.bib,
@Book{DeBruijn2011,
Address = {Hoboken, NJ, USA},
Booktitle = {Handb. Mol. Microb. Ecol. II Metagenomics Differ. Habitats},
Editor = {de Bruijn, Frans J.},
Month = {sep},
Publisher = {John Wiley {\&} Sons, Inc.},
Title = {{Handbook of Molecular Microbial Ecology II}},
Year = {2011},
Doi = {10.1002/9781118010549},
File = {:home/michael/articles/Unknown - 2011 - Handbook of Molecular Microbial Ecology II.pdf:pdf},
Url = {http://doi.wiley.com/10.1002/9781118010549},
Isbn = {9781118010549}
}
I tried to parse .bib files exported from scopus today but ended up with a total mess of column names (see below).
bib_string <- "@ARTICLE{Brulc20091948,
author={Brulc, J.M. and Antonopoulos, D.A. and Berg Miller, M.E. and Wilson, M.K. and Yannarell, A.C. and Dinsdale, E.A. and Edwards, R.E. and Frank, E.D. and Emerson, J.B. and Wacklin, P. and Coutinho, P.M. and Henrissat, B. and Nelson, K.E. and White, B.A.},
title={Gene-centric metagenomics of the fiber-adherent bovine rumen microbiome reveals forage specific glycoside hydrolases},
journal={Proceedings of the National Academy of Sciences of the United States of America},
year={2009},
doi={10.1073/pnas.0806191105},
url={https://www.scopus.com/inward/record.uri?eid=2-s2.0-60549114321&doi=10.1073%2fpnas.0806191105&partnerID=40&md5=8d70a27545328d4cbb538bdb4757335b},
affiliation={Department of Animal Sciences, University of Illinois, Urbana, IL 61801, United States; Institute for Genomics and Systems Biology, Argonne National Laboratory, Argonne, IL 60439, United States; Department of Biology, San Diego State University, San Diego, CA 92813, United States; School of Biological Sciences, Flinders University, Adelaide, SA 5001, Australia; Center for Microbial Sciences, San Diego State University, San Diego, CA 92813, United States; Department of Computer Sciences, San Diego State University, San Diego, CA 92813, United States; Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL 60439, United States; J. Craig Venter Institute, 9712 Medical Center Drive, Rockville, MD 20850, United States; Architecture et Fonction des Macromolecules Biologiques, Unité Mixte de Recherche 6098, Universites Aix-Marseille I and II, Case 932, 163 Avenue de Luminy, 13288 Marseille, France; Institute for Genomic Biology, University of Illinois, Urbana, IL 61801, United States},
abstract={The complex microbiome of the rumen functions as an effective system for the conversion of plant cell wall biomass to microbial protein, short chain fatty acids, and gases. As such, it provides a unique genetic resource for plant cell wall degrading microbial enzymes that could be used in the production of biofuels. The rumen and gastrointestinal tract harbor a dense and complex microbiome. To gain a greater understanding of the ecology and metabolic potential of this microbiome, we used comparative metagenomics (phylotype analysis and SEED subsystems-based annotations) to examine randomly sampled pyrosequence data from 3 fiber-adherent microbiomes and 1 pooled liquid sample (a mixture of the liquid microbiome fractions from the same bovine rumens). Even though the 3 animals were fed the same diet, the community structure, predicted phylotype, and metabolic potentials in the rumen were markedly different with respect to nutrient utilization. A comparison of the glycoside hydrolase and cellulosome functional genes revealed that in the rumen microbiome, initial colonization of fiber appears to be by organisms possessing enzymes that attack the easily available side chains of complex plant polysaccharides and not the more recalcitrant main chains, especially cellulose. Furthermore, when compared with the termite hindgut microbiome, there are fundamental differences in the glycoside hydrolase content that appear to be diet driven for either the bovine rumen (forages and legumes) or the termite hindgut (wood). © 2009 by The National Academy of Sciences of the USA.},
author_keywords={CAZymes; Cellulases; Plant cell wall; Pyrosequencing},
Isoptera},
document_type={Article},
source={Scopus},
}"
fil <- tempfile("data")
write(bib_string, fil)
bib2df::bib2df(fil)
#> Column `YEAR` contains character strings.
#> No coercion to numeric applied.
#> # A tibble: 1 x 37
#> CATEGORY BIBTEXKEY ADDRESS ANNOTE AUTHOR BOOKTITLE CHAPTER CROSSREF
#> <chr> <chr> <chr> <chr> <list> <chr> <chr> <chr>
#> 1 ARTICLE Brulc200~ <NA> <NA> <chr ~ <NA> <NA> <NA>
#> # ... with 29 more variables: EDITION <chr>, EDITOR <list>,
#> # HOWPUBLISHED <chr>, INSTITUTION <chr>, JOURNAL <chr>, KEY <chr>,
#> # MONTH <chr>, NOTE <chr>, NUMBER <chr>, ORGANIZATION <chr>,
#> # PAGES <chr>, PUBLISHER <chr>, SCHOOL <chr>, SERIES <chr>, TITLE <chr>,
#> # TYPE <chr>, VOLUME <chr>, YEAR <chr>, AUTHOR..BRULC. <chr>,
#> # TITLE..GENE.CENTRIC <chr>, JOURNAL..PROCEEDINGS <chr>,
#> # YEAR..2009.. <chr>, DOI..10.1073.PNAS.0806191105.. <chr>,
#> # URL..HTTPS...WWW.SCOPUS.COM.INWARD.RECORD.URI.EID.2.S2.0.60549114321.DOI.10.1073.2FPNAS.0806191105.PARTNERID.40.MD5.8D70A27545328D4CBB538BDB4757335B.. <chr>,
#> # AFFILIATION..DEPARTMENT <chr>, ABSTRACT..THE <chr>,
#> # AUTHOR_KEYWORDS..CAZYMES. <chr>, DOCUMENT_TYPE..ARTICLE.. <chr>,
#> # SOURCE..SCOPUS.. <chr>
Created on 2019-08-06 by the reprex package (v0.3.0)
This might help: https://cran.r-project.org/web/packages/humaniformat/index.html
Hi,
I found a problem parsing a .bib file with quotes instead of curly brackets:
Here is a file
@BOOK{hoehlig97,
author = "Monika Hoehlig",
title = "Kontaktbedingter {S}prachwandel in der adygeischen {U}mgangssprache im {K}aukasus und in der {T}uerkei",
series = "LINCOM Studies in Caucasian Linguistics 03",
year = "1997",
publisher = "Lincom GmbH",
address = "Muenchen",
}
Here is the code:
bib2df("test.bib") %>%
unlist() %>%
na.omit() %>%
View()
Here is the result:
CATEGORY BOOK
BIBTEXKEY hoehlig97
ADDRESS Muenchen",
AUTHOR Monika Hoehlig",
PUBLISHER Lincom GmbH",
SERIES LINCOM Studies in Caucasian Linguistics 03",
TITLE Kontaktbedingter {S}prachwandel in der adygeischen {U}mgangssprache im {K}aukasus und in der {T}uerkei",
YEAR 1997",
As you see, the problem is the trailing ", at the end of each value.
I'm using bib2df v. 1.1.1
Whenever I want to import a Scopus export, the resulting data frame is completely messed up and has thousands of columns. Apparently this should have been fixed by #33 or #34, but I'm afraid it is not.
Steps:
devtools::install_github("ropensci/bib2df") (installed 28th November 2021)
testbib <- bib2df::bib2df("<attached file>")
Result:
testbib
# A tibble: 307 × 55
CATEGORY BIBTEXKEY ADDRESS ANNOTE AUTHOR BOOKTITLE CHAPTER CROSSREF EDITION EDITOR HOWPUBLISHED INSTITUTION JOURNAL KEY MONTH
<chr> <chr> <chr> <chr> <list> <chr> <chr> <chr> <chr> <list> <chr> <chr> <chr> <chr> <chr>
1 ARTICLE Köpper20… NA NA <chr … NA NA NA NA <chr … NA NA Resear… NA NA
2 CONFERENCE Manfredi… NA NA <chr … NA NA NA NA <chr … NA NA IOP Co… NA NA
3 ARTICLE Avdikos2… NA NA <chr … NA NA NA NA <chr … NA NA Geofor… NA NA
4 ARTICLE Petrescu… NA NA <chr … NA NA NA NA <chr … NA NA Enviro… NA NA
5 ARTICLE Parikh20… NA NA <chr … NA NA NA NA <chr … NA NA Enviro… NA NA
6 ARTICLE Dekeyser… NA NA <chr … NA NA NA NA <chr … NA NA Enviro… NA NA
7 BOOK Stuber20… NA NA <chr … NA NA NA NA <chr … NA NA Balanc… NA NA
8 ARTICLE Wang2021… NA NA <chr … NA NA NA NA <chr … NA NA Americ… NA NA
9 ARTICLE Marino20… NA NA <chr … NA NA NA NA <chr … NA NA Territ… NA NA
10 ARTICLE Sardeshp… NA NA <chr … NA NA NA NA <chr … NA NA Cities NA NA
# … with 297 more rows, and 40 more variables: NOTE <chr>, NUMBER <chr>, ORGANIZATION <chr>, PAGES <chr>, PUBLISHER <chr>,
# SCHOOL <chr>, SERIES <chr>, TITLE <chr>, TYPE <chr>, VOLUME <chr>, YEAR <dbl>, DOI <chr>, URL <chr>, AFFILIATION <chr>,
# ABSTRACT <chr>, AUTHOR_KEYWORDS <chr>, REFERENCES <chr>, ISSN <chr>, LANGUAGE <chr>, ABBREV_SOURCE_TITLE <chr>,
# DOCUMENT_TYPE <chr>, SOURCE <chr>, ART_NUMBER <chr>, KEYWORDS <chr>, FUNDING_DETAILS <chr>, FUNDING_TEXT <chr>,
# CORRESPONDENCE_ADDRESS1 <chr>, SPONSORS <chr>, FUNDING_TEXT.1 <chr>, FUNDING_DETAILS.1 <chr>, FUNDING_DETAILS.2 <chr>,
# ISBN <chr>, FUNDING_DETAILS.3 <chr>, FUNDING_DETAILS.4 <chr>, FUNDING_TEXT.2 <chr>, CODEN <chr>, FUNDING_DETAILS.5 <chr>,
# PUBMED_ID <chr>, PAGE_COUNT <chr>, CHEMICALS_CAS <chr>
sessioninfo:
R version 4.1.2 (2021-11-01)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: KDE neon User - Plasma 25th Anniversary Edition
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/atlas/libblas.so.3.10.3
LAPACK: /usr/lib/x86_64-linux-gnu/atlas/liblapack.so.3.10.3
locale:
[1] LC_CTYPE=ca_ES.UTF-8 LC_NUMERIC=C LC_TIME=es_ES.UTF-8 LC_COLLATE=ca_ES.UTF-8
[5] LC_MONETARY=es_ES.UTF-8 LC_MESSAGES=ca_ES.UTF-8 LC_PAPER=es_ES.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices datasets utils methods base
loaded via a namespace (and not attached):
[1] Rcpp_1.0.7 rstudioapi_0.13 magrittr_2.0.1 tidyselect_1.1.1 R6_2.5.1 rlang_0.4.12
[7] fansi_0.5.0 stringr_1.4.0 httr_1.4.2 dplyr_1.0.7 tools_4.1.2 humaniformat_0.6.0
[13] utf8_1.2.2 cli_3.1.0 DBI_1.1.1 ellipsis_0.3.2 assertthat_0.2.1 tibble_3.1.5
[19] lifecycle_1.0.1 crayon_1.4.2 purrr_0.3.4 vctrs_0.3.8 glue_1.4.2 stringi_1.7.5
[25] compiler_4.1.2 pillar_1.6.4 generics_0.1.1 renv_0.13.2 bib2df_1.1.2 pkgconfig_2.0.3
These are:
Also, remove documentation for these functions.
It might be useful to be able to write the imported data frame back to a .bib file. This would enable users to do processing in R to programmatically modify a BibTeX database.
%\VignetteIndexEntry{}
humaniformat package
I use UTF-8 characters in a bibliography, which bib2df doesn't seem to import properly. A way to fix this issue is to add an additional argument to the bib2df function that allows users to specify the encoding of the .bib file (default: encoding = "unknown"), and to feed that argument to the readLines call inside the bib2df_read function. This way I could import the bibliography with bib2df("references.bib", encoding = "UTF-8").
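The proposal above could look roughly like this. It is a sketch only: `read_bib_lines()` is a hypothetical stand-in for the package's internal reader, whose real signature is not shown in this thread. The `encoding` argument of base `readLines()` does exist and defaults to "unknown".

```r
# Hypothetical reader: 'encoding' is passed straight through to readLines().
read_bib_lines <- function(file, encoding = "unknown") {
  readLines(file, encoding = encoding, warn = FALSE)
}

# Write a small UTF-8 file so the sketch is self-contained.
tmp <- tempfile(fileext = ".bib")
writeLines(enc2utf8("author = {Jos\u00e9 Mart\u00ednez},"), tmp, useBytes = TRUE)

bib_lines <- read_bib_lines(tmp, encoding = "UTF-8")
Encoding(bib_lines)
```

Note that `readLines(encoding = ...)` only marks the strings it returns as being in that encoding; it does not re-encode the input, so the file itself still has to be valid UTF-8.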
I'd be happy to make a pull request. Thanks for this useful package!
I have difficulties reading some of my reference files. R markdown shows there is an error in the UTF-8 encoding of my BibTex files. Please add UTF-8 encoding. This would be great!
Hi, thanks for the great package.
I have found an edge-case that causes an issue with writing valid .bib files with df2bib. Specifically, I had a .bib entry that had 2 DOI values, like the following:
@Article{test2022,
doi = {DOI_PLACEHOLDER},
doi = {DOI_PLACEHOLDER2}
}
Parsing this with bib2df and then rewriting it with df2bib results in something like:
@Article{test2022,
doi = {DOI_PLACEHOLDER},
doi.1 = {DOI_PLACEHOLDER2}
}
However, '.' is not a valid character in a field name in .bib files (I think; at least I had issues knitting an .Rmd).
I suspect this would be an easy fix, where you could substitute '.' in variable names with '_' or something?
Cheers
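The substitution suggested above can be sketched with base R alone. Where exactly this would go inside bib2df or df2bib is an assumption, and the vectors below are illustrative only.

```r
# Duplicated field names as they might appear in one entry.
field_names <- c("doi", "doi", "title")

# make.unique() appends a counter; sep = "_" avoids the default "." separator.
safe_names <- make.unique(field_names, sep = "_")
safe_names

# Names that already carry a ".1" suffix can be repaired after the fact.
repaired <- gsub(".", "_", c("doi", "doi.1"), fixed = TRUE)
repaired
```

Underscores already occur in field names elsewhere in this thread (for example `author_keywords` in Scopus exports), which suggests they are a safer choice than a dot.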
Some regex bugs exist in bib2df_gather, e.g.:
cat('@Article{mykey,
Author = {me},
Title = {{FOO} bar {bAZ}},
Year = {2011}
}
', file=f <- tempfile())
bib <- bib2df::bib2df(f)
bib$TITLE
#> [1] "FOO} bar {bAZ"
Created on 2019-11-13 by the reprex package (v0.3.0.9000)
sessioninfo::session_info()
#> ─ Session info ──────────────────────────────────────────────────────────
#> setting value
#> version R version 3.6.0 (2019-04-26)
#> os macOS Mojave 10.14.3
#> system x86_64, darwin15.6.0
#> ui X11
#> language (EN)
#> collate en_AU.UTF-8
#> ctype en_AU.UTF-8
#> tz Australia/Melbourne
#> date 2019-11-13
#>
#> ─ Packages ──────────────────────────────────────────────────────────────
#> package * version date lib source
#> assertthat 0.2.1 2019-03-21 [2] CRAN (R 3.6.0)
#> bib2df 1.1.1 2019-11-13 [1] Github (ROpenSci/bib2df@e151772)
#> cli 1.1.0 2019-03-19 [2] CRAN (R 3.6.0)
#> crayon 1.3.4 2017-09-16 [2] CRAN (R 3.6.0)
#> digest 0.6.22 2019-10-21 [1] CRAN (R 3.6.0)
#> dplyr 0.8.3 2019-07-04 [2] CRAN (R 3.6.0)
#> evaluate 0.14 2019-05-28 [2] CRAN (R 3.6.0)
#> glue 1.3.1 2019-03-12 [2] CRAN (R 3.6.0)
#> highr 0.8 2019-03-20 [2] CRAN (R 3.6.0)
#> htmltools 0.4.0 2019-10-04 [1] CRAN (R 3.6.0)
#> httr 1.4.1 2019-08-05 [2] CRAN (R 3.6.0)
#> humaniformat 0.6.0 2016-04-24 [1] CRAN (R 3.6.0)
#> knitr 1.25 2019-09-18 [1] CRAN (R 3.6.0)
#> magrittr 1.5 2014-11-22 [2] CRAN (R 3.6.0)
#> pillar 1.4.2 2019-06-29 [2] CRAN (R 3.6.0)
#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 3.6.0)
#> purrr 0.3.3 2019-10-18 [1] CRAN (R 3.6.0)
#> R6 2.4.0 2019-02-14 [2] CRAN (R 3.6.0)
#> Rcpp 1.0.3 2019-11-08 [1] CRAN (R 3.6.0)
#> rlang 0.4.1 2019-10-24 [1] CRAN (R 3.6.0)
#> rmarkdown 1.16 2019-10-01 [1] CRAN (R 3.6.0)
#> sessioninfo 1.1.1 2018-11-05 [2] CRAN (R 3.6.0)
#> stringi 1.4.3 2019-03-12 [2] CRAN (R 3.6.0)
#> stringr 1.4.0 2019-02-10 [2] CRAN (R 3.6.0)
#> tibble 2.1.3 2019-06-06 [2] CRAN (R 3.6.0)
#> tidyselect 0.2.5 2018-10-11 [2] CRAN (R 3.6.0)
#> withr 2.1.2 2018-03-15 [2] CRAN (R 3.6.0)
#> xfun 0.10 2019-10-01 [1] CRAN (R 3.6.0)
#> yaml 2.2.0 2018-07-25 [2] CRAN (R 3.6.0)
#>
#> [1] /Users/jbau/Library/R/3.6/library
#> [2] /Library/Frameworks/R.framework/Versions/3.6/Resources/library
I've run into a few problems parsing my .bib files with this. First, parsing fails for any field that has an @ anywhere in it (e.g., as part of an email address). Second, it fails for multi-line fields (like the annote fields that are auto-exported by Mendeley). Neither of these causes bibtex itself to complain, so they are at least de facto supported. Fixing the first probably isn't so difficult, but fixing the second might be more challenging given how the reading function works (it needs a pass to combine lines with unterminated strings/fields).
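Both points could be handled in a pre-processing pass along these lines. This is a sketch under stated assumptions, not the package's reader: the sample lines are invented, `count_char()` is a hypothetical helper, and only brace-delimited values are considered (quoted multi-line values would need extra handling).

```r
lines <- c(
  "@article{key1,",
  "  author = {A. Author},",
  "  annote = {Contact a.author@example.org",
  "    for the raw data},",
  "}",
  "@book{key2,",
  "  title = {A Book}",
  "}"
)

# An entry starts only where '@type{' opens a line, so an '@' inside a
# field value (e.g. an email address) is not mistaken for a new entry.
is_entry_start <- grepl("^\\s*@[A-Za-z]+\\s*\\{", lines)
entry_id <- cumsum(is_entry_start)

# Count literal occurrences of one character in each line.
count_char <- function(x, ch) {
  lengths(regmatches(x, gregexpr(ch, x, fixed = TRUE)))
}

# Brace depth after each line, restarted for every entry. A depth above 1
# means a field value is still open and continues on the next line.
delta <- count_char(lines, "{") - count_char(lines, "}")
depth <- ave(delta, entry_id, FUN = cumsum)

# A new logical line starts after every line that closed its field.
starts_new <- c(TRUE, head(depth <= 1, -1))
logical_id <- cumsum(starts_new)
logical_lines <- unname(vapply(
  split(trimws(lines), logical_id),
  paste, character(1), collapse = " "
))
logical_lines
```

After this pass every field occupies exactly one element of `logical_lines`, which is the shape a line-by-line parser expects.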
You probably have already seen this, but if not: I'm getting:
Warning messages:
1: `as_data_frame()` was deprecated in tibble 2.0.0.
ℹ Please use `as_tibble()` (with slightly different semantics) to convert to a tibble, or `as.data.frame()` to convert to a data frame.
ℹ The deprecated feature was likely used in the bib2df package.
Please report the issue to the authors.
This warning is displayed once every 8 hours.
Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated.
2: In bib2df_tidy(bib, separate_names) : NAs introduced by coercion
bib2df::bib2df() fails to load fields when the field separator (",") is preceded by a newline, as in the following example:
@article{SHBP
,title = "Efficient DC Analysis of RVJ Circuits for Moment and Derivative Commutations of Interconnect Networks"
,author = " S. H. Batterywala and H. Narayanan "
,journal = "12th International Conference on VLSI Design"
,pages = "169-174"
,year = 1999
}
reprex:
f <- tempfile()
download.file('https://www.ee.iitb.ac.in/~trivedi/LatexHelp/Docs/ref.bib', f)
bib2df::bib2df(f)
With version 1.1.1 it loads in new columns "X.≪fieldname≫":
# A tibble: 9 × 41
CATEGORY BIBTE…¹ ADDRESS ANNOTE AUTHOR BOOKT…² CHAPTER CROSS…³ EDITION EDITOR HOWPU…⁴ INSTI…⁵ JOURNAL KEY MONTH NOTE NUMBER ORGAN…⁶
<chr> <chr> <chr> <chr> <list> <chr> <chr> <chr> <chr> <list> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 ARTICLE SHBP NA NA <chr> NA NA NA NA <chr> NA NA NA NA NA NA NA NA
2 ARTICLE SIE NA NA <chr> NA NA NA NA <chr> NA NA NA NA NA NA NA NA
3 BOOK HN NA NA <chr> NA NA NA NA <chr> NA NA NA NA NA NA NA NA
4 BOOK DON NA NA <chr> NA NA NA NA <chr> NA NA NA NA NA NA NA NA
5 MASTERSTHE… GAK NA NA <chr> NA NA NA NA <chr> NA NA NA NA NA NA NA NA
6 MASTERSTHE… GT NA NA <chr> NA NA NA NA <chr> NA NA NA NA NA NA NA NA
7 MASTERSTHE… NJB NA NA <chr> NA NA NA NA <chr> NA NA NA NA NA NA NA NA
8 MANUAL PVM NA NA <chr> NA NA NA NA <chr> NA NA NA NA NA NA NA NA
9 MISC PVMS NA NA <chr> NA NA NA NA <chr> NA NA NA NA NA NA NA NA
# … with 23 more variables: PAGES <chr>, PUBLISHER <chr>, SCHOOL <chr>, SERIES <chr>, TITLE <chr>, TYPE <chr>, VOLUME <chr>, YEAR <chr>,
# X.TITLE <chr>, X.AUTHOR <chr>, X.JOURNAL <chr>, X.PAGES <chr>, X.YEAR <chr>, X.VOLUME <chr>, X.NUMBER <chr>, X.PUBLISHER <chr>,
# X.MONTH <chr>, X.SCHOOL <chr>, X.ORGANIZATION <chr>, X.ADDRESS <chr>, X.NOTE <chr>, X.KEY <chr>, X.HOWPUBLISHED <chr>, and abbreviated
# variable names ¹BIBTEXKEY, ²BOOKTITLE, ³CROSSREF, ⁴HOWPUBLISHED, ⁵INSTITUTION, ⁶ORGANIZATION
With version 1.1.2 it doesn't load at all (all values are either NA, character(0) or an empty string):
# A tibble: 9 × 26
CATEGORY BIBTE…¹ ADDRESS ANNOTE AUTHOR BOOKT…² CHAPTER CROSS…³ EDITION EDITOR HOWPU…⁴ INSTI…⁵ JOURNAL KEY MONTH NOTE NUMBER ORGAN…⁶
<chr> <chr> <chr> <chr> <list> <chr> <chr> <chr> <chr> <list> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 ARTICLE SHBP NA NA <chr> NA NA NA NA <chr> NA NA "" NA NA NA NA NA
2 ARTICLE SIE NA NA <chr> NA NA NA NA <chr> NA NA "" NA NA NA "" NA
3 BOOK HN NA NA <chr> NA NA NA NA <chr> NA NA NA NA NA NA NA NA
4 BOOK DON NA NA <chr> NA NA NA NA <chr> NA NA NA NA NA NA NA NA
5 MASTERSTHE… GAK NA NA <chr> NA NA NA NA <chr> NA NA NA NA "" NA NA NA
6 MASTERSTHE… GT NA NA <chr> NA NA NA NA <chr> NA NA NA NA "" NA NA NA
7 MASTERSTHE… NJB NA NA <chr> NA NA NA NA <chr> NA NA NA NA "" NA NA NA
8 MANUAL PVM "" NA <chr> NA NA NA NA <chr> NA NA NA NA "" "" NA ""
9 MISC PVMS NA NA <chr> NA NA NA NA <chr> "" NA NA "" NA NA NA NA
# … with 8 more variables: PAGES <chr>, PUBLISHER <chr>, SCHOOL <chr>, SERIES <chr>, TITLE <chr>, TYPE <chr>, VOLUME <chr>, YEAR <chr>,
# and abbreviated variable names ¹BIBTEXKEY, ²BOOKTITLE, ³CROSSREF, ⁴HOWPUBLISHED, ⁵INSTITUTION, ⁶ORGANIZATION
I am not sure how common this is (probably not at all), but this did happen on the first example .bib I found online and it seems like a basic parsing error.
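One workaround for files in this style is to move each leading separator to the end of the previous line before parsing. The sketch below uses a few lines taken from the example above; it is a pre-processing idea, not something bib2df is known to do.

```r
lines <- c(
  "@article{SHBP",
  ",pages = \"169-174\"",
  ",year = 1999",
  "}"
)

# Lines that begin with the field separator.
leading <- grepl("^\\s*,", lines)
idx <- which(leading)
idx <- idx[idx > 1]  # the first line has no predecessor to receive a comma

# Append the separator to the previous line, then drop it from the start.
lines[idx - 1] <- paste0(lines[idx - 1], ",")
lines <- sub("^\\s*,\\s*", "", lines)
lines
```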
plyr is no longer actively developed, and dplyr is already imported and has bind_rows.
I encountered issues when parsing a file with entries like e.g.
@Article{RJournal:2011-1:Cook,
author = {Dianne Cook},
title = {Tips for Presenting Your Work},
journal = {The R Journal},
year = 2011,
volume = 3,
number = 1,
pages = {72--74},
month = jun,
url = {http://journal.r-project.org/archive/2011-1/RJournal_2011-1_Cook.pdf}
}
Fields like year and volume were NA in the final table.
I solved this rather inelegantly in https://github.com/masalmon/bib2df/commit/5bbf89d4c168eaddcc7c43ad7f3e300f9101400e (I wasn't able to find something with str_extract).
I guess my new code is not usable because I don't use str_extract (if I had, I would have done a PR). Do you have an idea how to solve this issue for all users?
When reading a .bib file exported from an ORCID profile (Export works), bib2df() has some problems parsing it.
The same file can be imported in zotero without problems.
See attached bib file: works_G.zip
Hi,
I create my bibtex files from Zotero.
For some reason (that is unclear to me) Zotero puts all capitalized words in brackets in the title object.
When I use bib2df, if the last word in the title has brackets then the final brackets are ignored, which causes parsing problems later when I want to use df2bib (for example).
e.g.
bibtex:
@book{patz_grammar_2002,
address = {Canberra},
series = {Pacific linguistics},
title = {A grammar of the {Kuku} {Yalanji} language of north {Queensland}},
isbn = {978-0-85883-534-4},
number = {527},
publisher = {Research School of Pacific and Asian Studies, Australian National University},
author = {Patz, Elisabeth},
collaborator = {{Australian National University}},
year = {2002},
note = {OCLC: ocm51721900},
keywords = {CS, PN5, kinbank, kuku1273},
file = {Patz_2002_A grammar of the Kuku Yalanji language of north Queensland.pdf:files/2186/Patz_2002_A grammar of the Kuku Yalanji language of north Queensland.pdf:application/pdf}
}
and the title object after reading in with bib2df:
zotero_subset[1,]$TITLE
"A grammar of the {Kuku} {Yalanji} language of north {Queensland"
Here we see two end brackets have been removed from the title (rather than one).
And the bibtex object after using df2bib (with emphasis on the problem)
@Book{patz_grammar_2002,
Address = {Canberra},
Author = {Patz, Elisabeth},
Note = {OCLC: ocm51721900},
Number = {527},
Publisher = {Research School of Pacific and Asian Studies, Australian National University},
Series = {Pacific linguistics},
Title = **{A grammar of the {Kuku} {Yalanji} language of north {Queensland},**
Year = {2002},
File = {Patz_2002_A grammar of the Kuku Yalanji language of north Queensland.pdf:files/2186/Patz_2002_A grammar of the Kuku Yalanji language of north Queensland.pdf:application/pdf},
Isbn = {978-0-85883-534-4},
Collaborator = {Australian National University},
Keywords = {CS, PN5, kinbank, kuku1273},
sourceid = {PN5}
}
From this, it looks like the problem lies in bib2df (rather than df2bib), in that it finds the end of the line as any number of end brackets (and removes them) rather than a single bracket.
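The diagnosis above suggests removing exactly one pair of outer delimiters instead of every trailing bracket. A minimal sketch of that idea follows; `clean_value()` is a hypothetical helper, not the regex bib2df uses, and it assumes the raw value has already been separated from its field name.

```r
# Strip the trailing field separator and at most ONE pair of outer
# delimiters (braces or double quotes); inner braces are left alone.
clean_value <- function(value) {
  value <- trimws(value)
  value <- sub("\\s*,$", "", value)
  n <- nchar(value)
  first <- substr(value, 1, 1)
  last <- substr(value, n, n)
  braced <- first == "{" && last == "}"
  quoted <- first == "\"" && last == "\""
  if (n >= 2 && (braced || quoted)) {
    value <- substr(value, 2, n - 1)
  }
  value
}

clean_value("{A grammar of the {Kuku} {Yalanji} language of north {Queensland}},")
clean_value("\"Lincom GmbH\",")  # quoted value
clean_value("2011,")             # bare value: nothing to strip but the comma
```

The same function covers the quoted and unbraced values reported in other issues in this list, because delimiters are removed only when a matching pair is actually present.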
I find that bib2df does not correctly parse when field names are "title", "author" etc. Has anyone else faced this problem?
I've discovered that when I have an entry like this:
@book{fassberg2019modern,
title = {Languages of the Eastern Section: Great Lakes to Indian Ocean},
author={Fassberg, Steven E},
lgcode={west2763},
hhtype={overview},
pages={632652},
year={2019},
publisher={Routledge}
}
I get a table that looks like this from bib2df::bib2df()
CATEGORY | BIBTEXKEY | ADDRESS | ANNOTE | AUTHOR | BOOKTITLE | CHAPTER | CROSSREF | EDITION | EDITOR | HOWPUBLISHED | INSTITUTION | JOURNAL | KEY | MONTH | NOTE | NUMBER | ORGANIZATION | PAGES | PUBLISHER | SCHOOL | SERIES | TITLE | TYPE | VOLUME | YEAR | AUTHOR..FASSBERG. | LGCODE..WEST2763.. | HHTYPE..OVERVIEW.. | PAGES..632652.. | YEAR..2019.. | PUBLISHER..ROUTLEDGE. |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
BOOK | fassberg2019modern | Languages of the Eastern Section: Great Lakes to Indian Ocean | Fassberg, Steven E | west2763 | overview | 632652 | 2019 | Routledge |
I've isolated the problem down to the lack of whitespaces before and after the equal sign at the field assignment. It's an easy fix, I basically just inserted whitespaces before and after every equal sign before a curly bracket, but it was a bit frustrating to debug. Can this be included in the documentation, or fixed?
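The workaround described above can be automated before calling the parser. This is a sketch of that pre-processing step; `normalise_assignment()` is a hypothetical name, and it only touches an equals sign that directly follows a field name at the start of a line, so "=" inside values is normally left alone.

```r
# Put exactly one space on each side of the '=' that follows a field name.
normalise_assignment <- function(lines) {
  sub("^(\\s*[A-Za-z_][A-Za-z0-9_]*)\\s*=\\s*", "\\1 = ", lines)
}

raw <- c(
  "title = {Languages of the Eastern Section: Great Lakes to Indian Ocean},",
  "author={Fassberg, Steven E},",
  "year={2019},"
)
fixed <- normalise_assignment(raw)
fixed
```

A continuation line of a multi-line value that happens to start with a word followed by "=" would also be rewritten, so this is best applied after multi-line fields have been joined.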
lintr::lint_package()
bib2df-package help file
NEWS.md
It looks like the package was removed from CRAN for some reason. Is there a plan to resubmit the package in the near future?
I receive the following warning indicating that the package is out of date:
Warning message:
`as_data_frame()` was deprecated in tibble 2.0.0.
Please use `as_tibble()` instead.
The signature and semantics have changed, see `?as_tibble`.
Everything appears to work, but a simple update should resolve the warning?
In the following BibTeX file:
@phdthesis{Yang2011Lalo,
author = {Yang, Cathryn},
address = {Bundoora},
language = {English},
school = {La Trobe University},
shorttitle = {Lalo regional varieties},
title = {Lalo regional varieties: {Phylogeny}, dialectometry and sociolinguistics},
type = {{PhD} dissertation},
year = {2011}
}
... the field type gets parsed to PhD} dissertation (i.e. the first curly brace protecting the casing in 'PhD' gets eaten).
The culprit is this statement in bib_gather. I'm not quite sure what this regex is doing, so I don't want to fiddle with it to fix it.
This is not academic spam, I promise. 😄
Have you considered submitting this package to rOpenSci onboarding process? More info here + I can answer any question.
In brief, onboarding is an open review process, with often two reviewers having a look at the package according to the guidelines. I'm a co-editor now so I might sound biased but I've submitted a few packages before that and I've really learned a ton & improved the packages. Onboarded packages then live in the ropensci organization on Github but you're still the maintainer and keep admin rights to the repository. Your package seems to fit in the data extraction category.
And obviously no problem if you prefer not to submit bib2df!
Hello -
When I parse the attached .bib file, the article TITLE is truncated. How can I import the entire TITLE?
Thanks!
bib <- bib2df("Desktop/monitoring_library.txt") %>%
select(TITLE)
bib <- structure(list(TITLE = c("Exploring perspectives, preferences and needs of a",
"Short-Term} Postpartum Blood Pressure {Self-Management} and",
"Blood pressure after {PREeclampsia/HELLP} by {SELF} monitoring",
"Pregnancy outcomes following home blood pressure monitoring in",
"A randomised controlled trial of blood pressure self-monitoring"
)), row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame"
))
Spell check the whole package + vignette
Hi Philipp,
I was reading the documentation for df2bib this evening and think perhaps you have a mistake?
Line 3 of df2bib.R has:
#' @param x \code{tibble}, returned by \code{\link{df2bib}}.
Should it not be?
#' @param x \code{tibble}, returned by \code{\link{bib2df}}.
I'm seeing the bib2df() function lose or skip the first BibTeX entry in my files. I'm using v1.1.1.
example file attached, rename to .bib, has 100 records
17 water and demand.txt
library(bib2df)
path2file <- "17 water and demand.bib"
bib <- bib2df(file=path2file, separate_names = FALSE)
nrow(bib)
[1] 99
Increase code coverage, especially for df2bib.R and bib2df_tidy.R.
Avoid line widths > 80 characters.
When reading a bib file with a single reference, bib2df gives the error:
Error in x[, 1] : incorrect number of dimensions
I downloaded the file you share in the vignette: LiteratureOnCommonKnowledgeInGameTheory and erased all but the first reference.
As you will see, bib2df works fine with the complete file but fails with the single-reference file. I had the same problem with other .bib files created using rcrossref::cr_cn().
bib2df::bib2df("bib2df.bib") # Works
bib2df::bib2df("bib2df_single.bib") # Error in x[, 1] : incorrect number of dimensions
I attach both files: bib2df.zip
Thanks!
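An "incorrect number of dimensions" error on `x[, 1]` is what R raises when `x` has silently become a plain vector. Whether that is what happens inside bib2df for a single entry is an assumption; the sketch below only shows the general mechanism and the usual remedy, `drop = FALSE`, with made-up data.

```r
# With several rows, subsetting keeps a matrix.
fields <- matrix(c("AUTHOR", "Doe", "TITLE", "A title"), ncol = 2, byrow = TRUE)

one_row <- fields[1, ]                     # dimensions dropped: plain vector
one_row_kept <- fields[1, , drop = FALSE]  # still a 1 x 2 matrix

is.null(dim(one_row))  # TRUE, so one_row[, 1] would raise the reported error
dim(one_row_kept)
one_row_kept[, 1]
```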
I happened to save a .bib file in a subfolder of my project called www. When trying to read it, the pattern www. is matched by www/ in line 6 of bib2df(), which makes it try to "GET" a remote file from a URL, e.g.:
bib2df::bib2df("www/a_bibliography.bib")
#> Error: Invalid URL: File is not readable.
Created on 2021-07-15 by the reprex package (v2.0.0)
sessioninfo::session_info()
#> - Session info ---------------------------------------------------------------
#> setting value
#> version R version 4.1.0 (2021-05-18)
#> os Windows 10 x64
#> system x86_64, mingw32
#> ui RTerm
#> language (EN)
#> collate Spanish_Spain.1252
#> ctype Spanish_Spain.1252
#> tz Europe/Paris
#> date 2021-07-15
#>
#> - Packages -------------------------------------------------------------------
#> package * version date lib source
#> assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.1.0)
#> bib2df 1.1.1 2019-05-22 [1] CRAN (R 4.1.0)
#> cli 3.0.0 2021-06-30 [1] CRAN (R 4.1.0)
#> crayon 1.4.1 2021-02-08 [1] CRAN (R 4.1.0)
#> curl 4.3.2 2021-06-23 [1] CRAN (R 4.1.0)
#> DBI 1.1.1 2021-01-15 [1] CRAN (R 4.1.0)
#> digest 0.6.27 2020-10-24 [1] CRAN (R 4.1.0)
#> dplyr 1.0.7 2021-06-18 [1] CRAN (R 4.1.0)
#> ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.1.0)
#> evaluate 0.14 2019-05-28 [1] CRAN (R 4.1.0)
#> fansi 0.5.0 2021-05-25 [1] CRAN (R 4.1.0)
#> fs 1.5.0 2020-07-31 [1] CRAN (R 4.1.0)
#> generics 0.1.0 2020-10-31 [1] CRAN (R 4.1.0)
#> glue 1.4.2 2020-08-27 [1] CRAN (R 4.1.0)
#> highr 0.9 2021-04-16 [1] CRAN (R 4.1.0)
#> htmltools 0.5.1.1 2021-01-22 [1] CRAN (R 4.1.0)
#> httr 1.4.2 2020-07-20 [1] CRAN (R 4.1.0)
#> humaniformat 0.6.0 2016-04-24 [1] CRAN (R 4.1.0)
#> knitr 1.33 2021-04-24 [1] CRAN (R 4.1.0)
#> lifecycle 1.0.0 2021-02-15 [1] CRAN (R 4.1.0)
#> magrittr 2.0.1 2020-11-17 [1] CRAN (R 4.1.0)
#> pillar 1.6.1 2021-05-16 [1] CRAN (R 4.1.0)
#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.1.0)
#> purrr 0.3.4 2020-04-17 [1] CRAN (R 4.1.0)
#> R6 2.5.0 2020-10-28 [1] CRAN (R 4.1.0)
#> Rcpp 1.0.7 2021-07-07 [1] CRAN (R 4.1.0)
#> reprex 2.0.0 2021-04-02 [1] CRAN (R 4.1.0)
#> rlang 0.4.11 2021-04-30 [1] CRAN (R 4.1.0)
#> rmarkdown 2.9 2021-06-15 [1] CRAN (R 4.1.0)
#> rstudioapi 0.13 2020-11-12 [1] CRAN (R 4.1.0)
#> sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 4.1.0)
#> stringi 1.6.2 2021-05-17 [1] CRAN (R 4.1.0)
#> stringr 1.4.0 2019-02-10 [1] CRAN (R 4.1.0)
#> tibble 3.1.2 2021-05-16 [1] CRAN (R 4.1.0)
#> tidyselect 1.1.1 2021-04-30 [1] CRAN (R 4.1.0)
#> utf8 1.2.1 2021-03-12 [1] CRAN (R 4.1.0)
#> vctrs 0.3.8 2021-04-29 [1] CRAN (R 4.1.0)
#> withr 2.4.2 2021-04-18 [1] CRAN (R 4.1.0)
#> xfun 0.24 2021-06-15 [1] CRAN (R 4.1.0)
#> yaml 2.2.1 2020-02-01 [1] CRAN (R 4.1.0)
#>
#> [1] C:/Users/Mori.P16/Documents/R/win-library/4.1
#> [2] C:/Program Files/R/R-4.1.0/library
I think this could be simply solved by changing the pattern in line 6 to: "http://|https://|www\." so it only matches "www" followed by a dot instead of "www" followed by "any character".
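The difference between the two patterns is easy to check in isolation. The old pattern below is reconstructed from the description in this report (an unescaped dot after "www") and is therefore an assumption; the example paths other than the first are made up.

```r
old_pattern <- "http://|https://|www."    # '.' matches any character
new_pattern <- "http://|https://|www\\."  # '\\.' matches a literal dot only

paths <- c(
  "www/a_bibliography.bib",        # local file in a folder called www
  "www.example.org/refs.bib",      # remote file
  "https://example.org/refs.bib"   # remote file
)

old_hits <- grepl(old_pattern, paths)
new_hits <- grepl(new_pattern, paths)
old_hits  # TRUE TRUE TRUE: the local path is treated as a URL
new_hits  # FALSE TRUE TRUE
```

Anchoring the pattern at the start of the string with `^` would be stricter still, since it would also ignore "www." appearing later in a local path.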
I downloaded a bib file from Web of Science savedrecs.zip and there are multiple issues when reading it. The solution shown in #21 doesn't work here :(
Most of them seem to be related to what you, @ottlngr, mentioned in #21 (key-value pairs not separated by linebreaks):
But other issues seem to arise from a different thing:
[A] single_reference.zip
When reading this bib reference, the following lines of the abstract are creating new columns (the first-word of the line is the column title, and the text in the cell is whatever comes after the "="):
So, the first of those creates a BENEFITS column with a text "451) or non-evidence-based (e.g., relative risks"
Please, let me know if I can be of any help testing/debugging this.