reffix's Introduction

reffix: Fixing BibTeX reference list with DBLP API 🔧

➡️ Reffix is a simple tool for improving the BibTeX list of references in your paper. It can fix several common errors, such as incorrect capitalization, missing URLs, or citing arXiv pre-prints instead of the published version.

➡️ Reffix queries the DBLP API, so it does not require any local database of papers.

➡️ Reffix uses a conservative approach to keep your bibliography valid.

➡️ The tool is developed with NLP papers in mind, but it can be used on any BibTeX list of references containing computer science papers present on DBLP.

Quickstart

👉 You can now install reffix from PyPI:

pip install -U reffix
reffix [BIB_FILE]

See the Installation and Usage sections below for more details.

Example

Before the update (Google Scholar):

  • ❎ arXiv version
  • ❎ no URL
  • ❎ capitalization lost
 {  
    'ENTRYTYPE': 'article',
    'ID': 'duvsek2020evaluating',
    'author': 'Du{\\v{s}}ek, Ond{\\v{r}}ej and Kasner, Zden{\\v{e}}k',
    'journal': 'arXiv preprint arXiv:2011.10819',
    'title': 'Evaluating semantic accuracy of data-to-text generation with '
             'natural language inference',
    'year': '2020'
}

After the update (DBLP + preserving capitalization):

  • ✔️ ACL version
  • ✔️ URL included
  • ✔️ capitalization preserved
 {   
    'ENTRYTYPE': 'inproceedings',
    'ID': 'duvsek2020evaluating',
    'author': 'Ondrej Dusek and\nZdenek Kasner',
    'bibsource': 'dblp computer science bibliography, https://dblp.org',
    'biburl': 'https://dblp.org/rec/conf/inlg/DusekK20.bib',
    'booktitle': 'Proceedings of the 13th International Conference on Natural '
                 'Language\n'
                 'Generation, {INLG} 2020, Dublin, Ireland, December 15-18, '
                 '2020',
    'editor': 'Brian Davis and\n'
              'Yvette Graham and\n'
              'John D. Kelleher and\n'
              'Yaji Sripada',
    'pages': '131--137',
    'publisher': 'Association for Computational Linguistics',
    'timestamp': 'Mon, 03 Jan 2022 00:00:00 +0100',
    'title': '{Evaluating} {Semantic} {Accuracy} of {Data-to-Text} '
             '{Generation} with {Natural} {Language} {Inference}',
    'url': 'https://aclanthology.org/2020.inlg-1.19/',
    'year': '2020'
}
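
The brace-wrapped title in the updated entry can be reproduced with a few lines of Python. This is only a minimal sketch of the idea, not the code reffix actually runs: the function name `protect_capitalized_words` is hypothetical, and the rule (wrap every whitespace-separated word that contains an uppercase letter) is an assumption inferred from the example above.

```python
def protect_capitalized_words(title: str) -> str:
    """Wrap each word containing an uppercase letter in curly brackets.

    Hypothetical helper: the rule is inferred from the README example,
    not taken from the reffix source code.
    """
    protected = []
    for word in title.split():
        already_braced = word.startswith("{") and word.endswith("}")
        if any(ch.isupper() for ch in word) and not already_braced:
            word = "{" + word + "}"
        protected.append(word)
    return " ".join(protected)


if __name__ == "__main__":
    print(protect_capitalized_words(
        "Evaluating Semantic Accuracy of Data-to-Text Generation "
        "with Natural Language Inference"
    ))
```

BibTeX styles may lowercase anything outside curly brackets, which is why whole words are wrapped here rather than single letters.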

Main features

  • Completing references: reffix queries the DBLP API with the paper title and the first author's name to find a complete reference for each entry in the BibTeX file.
  • Replacing arXiv preprints: reffix can try to replace arXiv pre-prints with the version published at a conference or in a journal whenever possible.
  • Preserving titlecase: in order to preserve correct casing, reffix wraps individual uppercased words in the paper title in curly brackets.
  • Conservative approach:
    • the original .bib file is preserved
    • no references are deleted
    • papers are updated only if the title and at least one of the authors match
    • the version of the paper corresponding to the original entry should be selected first
  • Interactive mode: you can confirm every change manually.
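
The matching rule from the conservative approach (update only if the title and at least one author match) could look roughly like the sketch below. The helper names `normalize` and `is_safe_match` are hypothetical, and the sketch assumes author names arrive as plain "First Last" strings without LaTeX accents; the real tool may normalize and compare differently.

```python
import re


def normalize(text: str) -> str:
    """Lowercase the text and collapse everything non-alphanumeric to spaces."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def is_safe_match(orig_title, orig_authors, cand_title, cand_authors):
    """Accept a candidate only if the titles match and the author sets overlap.

    Assumption: authors are "First Last" strings, so the last token is
    treated as the surname.
    """
    if normalize(orig_title) != normalize(cand_title):
        return False
    orig_surnames = {normalize(a).split()[-1] for a in orig_authors if normalize(a)}
    cand_surnames = {normalize(a).split()[-1] for a in cand_authors if normalize(a)}
    return bool(orig_surnames & cand_surnames)
```

Requiring both conditions means that a paper with the same title by different authors is left untouched rather than overwritten.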

The package uses bibtexparser for parsing the BibTeX files, the DBLP API for updating the references, and the titlecase package for optional extra titlecasing.
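
To give an idea of what a DBLP lookup involves, the sketch below builds a request URL for the DBLP publication search endpoint (`https://dblp.org/search/publ/api`, with the `q`, `format`, and `h` parameters). The function name `build_dblp_query_url` and the way the title and author are combined into one query string are assumptions for illustration; reffix may compose its queries differently. The code deliberately only builds the URL and sends no request.

```python
from urllib.parse import urlencode

DBLP_SEARCH_URL = "https://dblp.org/search/publ/api"


def build_dblp_query_url(title: str, first_author: str, max_hits: int = 5) -> str:
    """Build a DBLP publication search URL for a title and first author.

    Hypothetical helper: combining title and author into a single `q`
    string is an assumption, not necessarily what reffix does.
    """
    params = {
        "q": f"{title} {first_author}",  # free-text query
        "format": "json",                # ask for a JSON response
        "h": max_hits,                   # maximum number of hits to return
    }
    return f"{DBLP_SEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_dblp_query_url("Evaluating semantic accuracy", "Dusek"))
```

Keeping `max_hits` small is one way to stay considerate of the DBLP servers.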

Installation

You can install reffix from PyPI:

pip install reffix

For development, you can install the package in the editable mode:

pip install -e .[dev]

Usage

Run the script with the .bib file as the first argument:

reffix [IN_BIB_FILE]

By default, the program will run in batch mode, save the outputs in the file with an extra ".fixed" suffix, and keep the arXiv versions.

The following command will run reffix in interactive mode, save the outputs to a custom file, and replace arXiv versions:

reffix [IN_BIB_FILE] -o [OUT_BIB_FILE] -i -a

Flags

| short | long | description |
|-------|------|-------------|
| -o | --out | Output filename. If not specified, the default filename <original_name>.fixed.bib is used. |
| -i | --interact | Interactive mode. Every replacement of an entry with a DBLP result has to be confirmed manually. |
| -a | --replace-arxiv | Replace arXiv versions. If a non-arXiv version (e.g. published at a conference or in a journal) is found at DBLP, it is preferred to the arXiv version. |
| -t | --force-titlecase | Force titlecase for all entries. The titlecase package is used to fix the casing of titles which are not titlecased. (Note that the capitalization rules used by the package may be a bit different.) |
| -s | --sort-by | Multiple sort conditions compatible with bibtexparser.BibTexWriter, applied in the provided order. Example: -s ENTRYTYPE year sorts the list by the entry type as its primary key and year as its secondary key. ID can be used to refer to the BibTeX key. The default None value keeps the original order of BibTeX entries. |
| | --no-publisher | Suppress publishers in conference papers and journals (still kept for books). |
| | --process-conf-loc | Parse conference dates and locations, remove them from proceedings names, and store locations under address. |
| | --no-formatting | Disable automatic BibTeX formatting. |
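
The ordering produced by `-s ENTRYTYPE year` can be pictured as a multi-key sort over the parsed entries. The sketch below uses plain dictionaries and the built-in `sorted`; the function name `sort_entries` and the handling of missing fields (treated as empty strings) are assumptions, and the actual sorting in reffix is delegated to bibtexparser.

```python
def sort_entries(entries, sort_by=None):
    """Sort entry dicts by the given keys, in order; None keeps the original order.

    Hypothetical helper for illustration only. Missing fields are treated
    as empty strings, which is an assumption.
    """
    if not sort_by:
        return list(entries)
    return sorted(entries, key=lambda e: tuple(e.get(k, "") for k in sort_by))


entries = [
    {"ENTRYTYPE": "inproceedings", "ID": "b", "year": "2020"},
    {"ENTRYTYPE": "article", "ID": "c", "year": "2021"},
    {"ENTRYTYPE": "inproceedings", "ID": "a", "year": "2019"},
]
ordered_ids = [e["ID"] for e in sort_entries(entries, ["ENTRYTYPE", "year"])]
```

Here the article comes first (primary key), and the two conference papers follow in order of year (secondary key).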

Notes

To reduce the number of requests to the DBLP API, you can use the bibexport tool to generate a file compact.bib containing only the references used in the paper. As input, use the file <myarticle>.aux created during compilation.

bibexport -o compact.bib <myarticle>.aux

Although reffix uses a conservative approach, it provides no guarantees that the output references are actually correct.

If you want to make sure that reffix does not introduce any unwanted changes, please use the interactive mode (flag -i).

The tool depends on the DBLP API, which may change at any time in the future. I will try to update the script if necessary, but it may still occasionally break. I welcome any pull requests with improvements.

Please be considerate regarding the DBLP API and do not generate high traffic for their servers :-)

Contact

For any questions or suggestions, send an e-mail to [email protected].

reffix's People

Contributors

devernay, kasnerz, oplatek, tuetschek

reffix's Issues

Add tests

Add a test bibtex file and a simple pytest script.

argparse long options should use '-' instead of '_'

To follow usual POSIX conventions.

For example, here it should be --replace-arxiv instead of --replace_arxiv.

Argparse will replace - with _ for you, so you don't need to change anything else (source): "Any internal - characters will be converted to _ characters to make sure the string is a valid attribute name."

❤️ the project. One of the most useful bib tools around.
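
The argparse behavior quoted in this issue can be checked with a short, self-contained sketch. The parser below is a stand-in with two of the flags from the README, not the actual reffix parser; the program name `reffix-sketch` is hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a stand-in parser with dash-separated long options."""
    parser = argparse.ArgumentParser(prog="reffix-sketch")  # hypothetical name
    parser.add_argument("-a", "--replace-arxiv", action="store_true")
    parser.add_argument("-t", "--force-titlecase", action="store_true")
    return parser


# argparse derives the attribute name from the long option and converts
# the internal dashes to underscores.
args = build_parser().parse_args(["--replace-arxiv"])
print(args.replace_arxiv, args.force_titlecase)  # prints: True False
```

Code that reads `args.replace_arxiv` therefore keeps working after the option is renamed from `--replace_arxiv` to `--replace-arxiv`.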

PyPI package

Modify the structure to make the package installable and upload it on PyPI.

Too many capitals get protected by default

Commit fdea5b7 from #15 breaks #9

It protects too many words in the title. Only "PartNet", "A", and "3D" should be protected:

    'title': '{P}art{N}et: {A} {L}arge-Scale {B}enchmark for {F}ine-Grained '
             'and {H}ierarchical {P}art-Level 3{D} {O}bject {U}nderstanding',

from https://dblp.org/rec/conf/cvpr/MoZCYTGS19.bib:

@inproceedings{DBLP:conf/cvpr/MoZCYTGS19,
  author       = {Kaichun Mo and
                  Shilin Zhu and
                  Angel X. Chang and
                  Li Yi and
                  Subarna Tripathi and
                  Leonidas J. Guibas and
                  Hao Su},
  title        = {PartNet: {A} Large-Scale Benchmark for Fine-Grained and Hierarchical
                  Part-Level 3D Object Understanding},
  booktitle    = {{IEEE} Conference on Computer Vision and Pattern Recognition, {CVPR}
                  2019, Long Beach, CA, USA, June 16-20, 2019},
  pages        = {909--918},
  publisher    = {Computer Vision Foundation / {IEEE}},
  year         = {2019},
  url          = {http://openaccess.thecvf.com/content\_CVPR\_2019/html/Mo\_PartNet\_A\_Large-Scale\_Benchmark\_for\_Fine-Grained\_and\_Hierarchical\_Part-Level\_3D\_CVPR\_2019\_paper.html},
  doi          = {10.1109/CVPR.2019.00100},
  timestamp    = {Mon, 30 Aug 2021 17:01:14 +0200},
  biburl       = {https://dblp.org/rec/conf/cvpr/MoZCYTGS19.bib},
  bibsource    = {dblp computer science bibliography, https://dblp.org}
}

It should be as follows (breaking words is bad, as it breaks search etc.):

    'title': '{PartNet:} {A} Large-Scale Benchmark for Fine-Grained '
             'and Hierarchical Part-Level {3D} Object Understanding',

Relative import problem

The current Git main version triggers the following error in my setup:

Traceback (most recent call last):
File "/mnt/c/Users/Ondra/Werke/reffix/reffix/./reffix.py", line 28, in <module>
    from . import utils as ut
ImportError: attempted relative import with no known parent package

This is what fixes it for me:

--- a/reffix/reffix.py
+++ b/reffix/reffix.py
@@ -25,7 +25,7 @@ import bibtexparser
 import re
 import pprint

-from . import utils as ut
+import utils as ut

 from bibtexparser.bparser import BibTexParser
 import bibtexparser.customization as bc

Not sure if I'm doing something wrong?
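
A likely explanation, offered as a sketch rather than a diagnosis of this exact setup: the traceback shows the file being run directly as `./reffix.py`, and Python does not assign a parent package to a file started that way, so `from . import ...` cannot resolve. Running the same file as a module (for example with `python -m`) gives it a package context. The demonstration below recreates the situation in a temporary directory; the package name `demo_pkg` and its files are hypothetical.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_both_ways():
    """Run one file as a script and as a module; return both results."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "demo_pkg"  # hypothetical package name
        pkg.mkdir()
        (pkg / "__init__.py").write_text("")
        (pkg / "utils.py").write_text("VALUE = 42\n")
        (pkg / "main.py").write_text("from . import utils as ut\nprint(ut.VALUE)\n")

        # Direct execution: no parent package, so the relative import fails.
        as_script = subprocess.run(
            [sys.executable, str(pkg / "main.py")],
            capture_output=True, text=True,
        )
        # Module execution from the parent directory: the import resolves.
        as_module = subprocess.run(
            [sys.executable, "-m", "demo_pkg.main"],
            cwd=tmp, capture_output=True, text=True,
        )
    return as_script, as_module
```

If this is indeed the cause, using the installed `reffix` command, or running the module with `python -m` from the repository root, would avoid the error without changing the import to an absolute one. The exact module path to use is an assumption based on the file layout in the diff.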
