geobosh / rbibutils

Convert bibliography files between various formats, including BibTeX, BibLaTeX, PubMed, RIS, and Bibentry. This is an R port of the bibutils utilities plus R manipulation of bibliography objects.

Home Page: https://geobosh.github.io/rbibutils/

R 2.85% TeX 3.49% C 93.66%
r bibtex biblatex bibliography bibutils r-package conversion pubmed endnote ris-format

rbibutils's Introduction

Read and write 'BibTeX' files. Convert bibliography files between various formats, including BibTeX, BibLaTeX, PubMed, EndNote and Bibentry. Includes an R port of the bibutils utilities.

Installing rbibutils

Install the latest stable version from CRAN:

install.packages("rbibutils")

You can also install the development version of rbibutils from GitHub:

library(devtools)
install_github("GeoBosh/rbibutils")

Overview

Import and export 'BibTeX' files. Convert bibliography files between various formats. All formats supported by the bibutils utilities are available; see bibConvert() for a complete list. In addition, conversion to and from bibentry, R's native representation based on Bibtex, is supported.

readBib() and writeBib() import/export BibTeX files. readBibentry() and writeBibentry() import/export R source files in which the references are represented by bibentry() calls.

The convenience function charToBib() takes input from a character vector, rather than a file. It calls readBib() or bibConvert().
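As a minimal sketch of the functions above: the snippet parses a BibTeX entry held in a character string with charToBib(), writes the result with writeBib(), and reads it back with readBib(). The entry itself is hypothetical, and passing the output file as the second positional argument of writeBib() is an assumption about its signature. The package calls are skipped if rbibutils is not installed.

```r
# Hypothetical BibTeX entry, held in a single character string.
bibtext <- paste(
  "@article{doe_example_2020,",
  "  title = {A hypothetical example},",
  "  author = {Doe, Jane},",
  "  journal = {Journal of Examples},",
  "  year = {2020}",
  "}",
  sep = "\n"
)

if (requireNamespace("rbibutils", quietly = TRUE)) {
  # Parse the text; with no input format given this should return a bibentry.
  be <- rbibutils::charToBib(bibtext)

  # Round trip through a temporary .bib file.
  tmp <- tempfile(fileext = ".bib")
  rbibutils::writeBib(be, tmp)   # assumption: second argument is the output file
  be_back <- rbibutils::readBib(tmp)

  print(be_back, style = "bibtex")
  unlink(tmp)
}
```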

bibConvert() takes an input bibliography file in one of the supported formats, converts its contents to another format, and writes the result to a file. All formats except rds (see below) are plain text files. bibConvert() tries to infer the input and output formats from the file extensions. The bib extension is ambiguous, however, since such files can be either Bibtex or Biblatex; Bibtex is assumed if the format is not specified. Similarly, the xml extension is shared by several XML-based formats; its default is the 'XML MODS intermediate' format.

The default encoding is UTF-8 for both input and output. All encodings handled by bibutils are supported. Besides UTF-8, these include gb18030 (Chinese), ISO encodings such as iso8859_1, Windows code pages (e.g. cp1251 for Windows Cyrillic) and many others. Common alternative names are also accepted (e.g. latin1).

Bibentry objects can be input from an R source file or from an rds file. The rds file should contain a bibentry R object, saved from R with saveRDS(). The rds format is a compressed binary format. Alternatively, an R source file containing one or more bibentry instructions and maybe other commands can be used. The R file is sourced and all bibentry objects created by it are collected.
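The rds route can be sketched with base R alone: build a bibentry object, save it with saveRDS(), and read it back. The reference and file names below are hypothetical. The final conversion step mirrors the bibConvert() call pattern shown in the examples and only runs if rbibutils is installed.

```r
# A hypothetical reference, created with base R (package utils).
be <- utils::bibentry(
  bibtype = "Article",
  key     = "doe2020",
  author  = utils::person("Jane", "Doe"),
  title   = "A hypothetical article",
  journal = "Journal of Examples",
  year    = "2020"
)

# Save the bibentry object in rds format and read it back.
rds_file <- tempfile(fileext = ".rds")
saveRDS(be, rds_file)
be_back <- readRDS(rds_file)
print(be_back, style = "bibtex")

# Optionally convert the rds file to a Bibtex file; formats are
# inferred from the file extensions.
if (requireNamespace("rbibutils", quietly = TRUE)) {
  bib_file <- tempfile(fileext = ".bib")
  rbibutils::bibConvert(rds_file, bib_file)
}
```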

Examples:

readBib

The examples in this section import the following file:

bibacc <- system.file("bib/latin1accents_utf8.bib", package = "rbibutils")

Note that some characters may not be displayed on some locales. Also, on Windows some characters may be "approximated" by other characters.

Import the above bibtex file into a bibentry object. By default TeX escape sequences representing characters are kept as is:

be0 <- readBib(bibacc)
be0
print(be0, style = "bibtex")

As above, using the direct option:

be1 <- readBib(bibacc, direct = TRUE)
## readBib(bibacc, direct = TRUE, texChars = "keep") # same
be1
print(be1, style = "bibtex")

Use the "convert" option to convert TeX sequences to true characters:

be2 <- readBib(bibacc, direct = TRUE, texChars = "convert")
be2
print(be2, style = "R")

(On Windows the Greek characters alpha and delta may be printed as 'a' and 'd' but internally they are alpha and delta.)

Use the "export" option to convert other characters to ASCII TeX sequences, when possible (currently this option doesn't handle mathematical expressions well):

be3 <- readBib(bibacc, direct = TRUE, texChars = "export")
print(be3, style = "bibtex")

bibConvert

Convert Bibtex file myfile.bib to a bibentry object and save the latter to "myfile.rds":

bibConvert("myfile.bib", "myfile.rds", informat = "bibtex", outformat = "bibentry")
bibConvert("myfile.bib", "myfile.rds")

Convert Bibtex file myfile.bib to Biblatex and save to "biblatex.bib":

bibConvert("myfile.bib", "biblatex.bib", "bibtex", "biblatex")
bibConvert("myfile.bib", "biblatex.bib", outformat = "biblatex")

Convert Bibtex file myfile.bib to Bibentry and save as rds or R:

bibConvert("myfile.bib", "myfile.rds")
bibConvert("myfile.bib", "myfile.R")

Read back the above files and/or convert them to other formats:

readLines("myfile.R")
file.show("myfile.R")
readRDS("myfile.rds")
bibConvert("myfile.rds", "myfile.bib")
bibConvert("myfile.R", "myfile.bib")

Assuming myfile.bib is a Biblatex file, convert it to Bibtex and save to bibtex.bib:

bibConvert("myfile.bib", "bibtex.bib", "biblatex", "bibtex")
bibConvert("myfile.bib", "bibtex.bib", "biblatex")

Assuming "myfile.med" is a PubMed file, convert it to Bibtex:

bibConvert(infile = "myfile.med", outfile = "bibtex.bib", informat = "med", outformat = "bib")
bibConvert(infile = "myfile.med", outfile = "bibtex.bib", informat = "med") # same

See bibConvert() for further examples and their results.

rbibutils's People

Contributors

geobosh, hsloot

Forkers

zagrosman

rbibutils's Issues

Version 1.4 implicitly depends on R>= 3.4

Your call to R_unif_index in

return floor + ( (int) R_unif_index((double) RAND_MAX) ) % len;

leads to an 'undefined symbol: R_unif_index' error, since this function is only defined from R 3.4 and was publicly introduced as API in R 3.5. In my case, this causes the CI of my package (which depends on Rdpack for creating the documentation) to fail on R 3.3. This will also cause problems for all package developers who use the GitHub Actions workflow in https://github.com/r-lib/actions/blob/master/examples/check-full.yaml for their CI, which tests on R 3.3.

You could use the following workaround:

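#include <Rversion.h>
#include <R_ext/Random.h> /* declares unif_rand() and R_unif_index() */
// ...
#if defined(R_VERSION) && R_VERSION >= R_Version(3, 4, 0)
  return floor + ( (int) R_unif_index((double) RAND_MAX) ) % len;
#else
  /* The (int) cast truncates, which equals floor() for non-negative values;
     calling floor() here would clash with the variable named floor. */
  return floor + ( (int) (RAND_MAX * unif_rand()) ) % len;
#endif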

Note that both your solution and my workaround will probably be biased (see section 3 in https://isocpp.org/files/papers/n3551.pdf). A better solution might be the following, which should produce random integers in [floor, floor+len) and is consistent with R's way of sampling a random integer on both R < 3.4 and R >= 3.4:

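#include <Rversion.h>
#include <R_ext/Random.h> /* declares unif_rand() and R_unif_index() */
// ...
#if defined(R_VERSION) && R_VERSION >= R_Version(3, 4, 0)
  return floor + (int) R_unif_index((double) len);
#else
  /* The (int) cast truncates, which equals floor() for non-negative values;
     calling floor() here would clash with the variable named floor. */
  return floor + (int) (len * unif_rand());
#endif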
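
To illustrate the bias mentioned above, here is a small R sketch (not part of the package code). It uses a toy raw generator whose 8 values, 0 to 7, are all equally likely, and reduces them modulo len = 3; the toy range and the sample size are assumptions chosen only for illustration. For comparison, it also scales a uniform number by len and truncates it, as in the second workaround.

```r
# Toy raw generator: each of the 8 values 0..7 is equally likely.
raw <- 0:7
len <- 3

# Reducing modulo len maps 0, 3, 6 -> 0; 1, 4, 7 -> 1; 2, 5 -> 2,
# so the outcome 2 is less likely than 0 or 1.
mod_counts <- table(raw %% len)
print(mod_counts)   # counts for 0, 1, 2 are 3, 3, 2

# Scaling a uniform number in (0, 1) by len and truncating gives indices
# in 0..(len - 1) whose frequencies are approximately equal.
set.seed(1)
idx <- floor(len * runif(30000))
scaled_counts <- table(idx)
print(scaled_counts)
```

The imbalance shrinks as the raw range grows relative to len, but it only disappears when the size of the range is a multiple of len.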

Different conversion when first name is given

test.bib:

@misc{x,
  author = "P{\'\i}{\'i}"
}
@misc{y,
  author = "L. P{\'\i}{\'i}"
}
@misc{z,
  author = "P{\'\i}{\'i}, L."
}

rbibutils::readBib("test.bib", direct=TRUE, encoding = "UTF-8")

gives:

Pí\'i (????).

P\'i\'i L (????).

P\'i\'i L (????).

that is, when a first name is given, "\'\i" is not converted to "í".

Inaccurate conversion of `\emph` in bibtex entries

For bibtex entries with italicized text specified using \emph{}, rbibutils converts the backslash in \emph to \backslash, producing \backslashemph and leading to an "unknown macro '\backslashemph'" error.

bib1 <- "@article{burton_quantitative_1951,
	title = {Quantitative inheritance in pearl millet (\\emph{{Pennisetum} glaucum})},
	volume = {43},
	number = {9},
	journal = {Agronomy Journal},
	author = {Burton, Glenn W.},
	year = {1951},
	pages = {409--417}
}"


bib2 <- "@article{diwan_methods_1995,
  title = {Methods of developing a core collection of annual \\emph{{Medicago}} species},
  volume = {90},
  language = {en},
  number = {6},
  journal = {Theoretical and Applied Genetics},
  author = {Diwan, N. and McIntosh, M. S. and Bauchan, G. R.},
  month = may,
  year = {1995},
  keywords = {Annual Medicago species, Core collection, Germplasm collection, Relative Diversity method},
  pages = {755--761}
}"

rbibutils::charToBib(bib1)
#> Burton GW (1951). "Quantitative inheritance in pearl millet
#> (\backslashemph{{Pennisetum} glaucum})." _Agronomy Journal_, *43*(9),
#> 409-417.
rbibutils::charToBib(bib2)
#> Diwan N, McIntosh MS, Bauchan GR (1995). "Methods of developing a core
#> collection of annual \backslashemph{{Medicago}} species." _Theoretical
#> and Applied Genetics_, *90*(6), 755-761.

How to set nref_in and nref_out parameters?

I have a PubMed XML file with more than 500 references.
I tried to convert it to a bib file using this code:

bibConvert(infile = fn_med, # fn_med: path to my xml file
           outfile = "export.bib",
           informat = "med",
           outformat = "bib")

But I got only 3 references:
$nref_in
[1] 3

$nref_out
[1] 3

How to set the number of references to read from the input file (nref_in) and number of references to write to the output file (nref_out)?

How to read and write all the references from the source file?

Thanks

Incorrect parsing of braces

@misc{x,
author = "P{\i}{\'\i}{\'i}",
}
@misc{y,
  author = "L P{\i}{\'\i}{\'i}",
}

Save the above into test.bib; then

rbibutils::readBib("test.bib", direct=TRUE, encoding = "UTF-8")

gives:

Pıí\'i (????).

P\i}'i\'i L (????).
Warning messages:
1: In parseLatex(x) : x:1: unexpected END_OF_INPUT 'i'
2: In parseLatex(x) : x:1: unexpected END_OF_INPUT 'i'
3: In withCallingHandlers(.External2(C_parseRd, tcon, srcfile, "UTF-8",  :
  <connection>:2: unexpected END_OF_INPUT ' L (????).
'

Something is removing one opening brace when the author has "firstname lastname".

Failing tests on non-x86 archs

Hi,

In the latest version, 2.2.7, one test is failing on architectures other than amd64 and i386 in Debian, as can be seen here

It chokes on

expect_known_value(readLines(tmp_ads), "xampl_bib2ads.rds", update = FALSE)

more specifically, in the resultant bib file, the last U is missing on two lines:

readLines(tmp_ads) has changed from known value recorded in 'xampl_bib2ads.rds'.
2/313 mismatches
x[293]: "%R ..................."
y[293]: "%R ..................U"

x[300]: "%R 1988..............."
y[300]: "%R 1988..............U"

Could you please fix the problem?

====================================================================================================

CC: @GeoBosh
