pezmc / biblatex-check

A python script for checking BibLatex .bib files for common referencing mistakes!

Home Page: https://github.com/Pezmc/BibLatex-Linter

License: MIT License

Python 84.31% TeX 15.69%

biblatex bib-files linting validating python-script

biblatex-check's Issues

No errors

I just used this to validate the following

@techreport{StrumShafferEbersoleVitale2005,
	author = "Strum, Lindsey Marie and Shaffer, Jeanne Angela and Ebersole, Garrett P. and Vitale, Daniel F."
	title = "3rd grade engineering and technology curriculum"
	institution = "Worcester Polytechnic Institute"
	%address = "100 Institute Road, Worcester MA 01609-2280 USA"
	year = "2005"
	%month = "January"
}

and got

Info
# entries: 0
# problems: 0
# missing fields: 0
# flawed names: 0
# wrong types: 0
# non-unique id: 0
# wrong field: 0

Add support for optional fields

List optional fields somewhere in the UI and check for fields that are not allowed in a particular entry

Field 'school' required instead of 'institution' for 'masterthesis'

See subject, page 13 at http://www.lsv.ens-cachan.fr/%7Emarkey/BibTeX/doc/ttb_en.pdf, and line 60 in biblatex_check.py

detect missing field terminator

Field values should always end with a comma. A missing comma is not currently detected.
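
A minimal sketch of how such a check might look, assuming one field per line. The function name find_missing_commas and the regular expression are illustrative and are not taken from biblatex_check.py. Multi-line field values are not handled.

```python
import re

# Matches a line that looks like `name = value`. This pattern is an
# assumption for the sketch, not the one used by biblatex_check.py.
FIELD_LINE = re.compile(r"^\s*[A-Za-z][\w-]*\s*=")


def find_missing_commas(lines):
    """Return 1-based numbers of field lines that lack a trailing comma
    although another field follows them."""
    problems = []
    previous = None  # (line number, stripped text) of the last field line
    for number, raw in enumerate(lines, start=1):
        text = raw.strip()
        if not text or text.startswith("%"):
            continue  # skip blank lines and commented-out fields
        if FIELD_LINE.match(text):
            if previous is not None and not previous[1].endswith(","):
                problems.append(previous[0])
            previous = (number, text)
        else:
            previous = None  # entry header, closing brace, or continuation
    return problems
```

The last field before the closing brace is deliberately not reported, since a trailing comma there is optional.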

Add support for "paired" fields

E.g. online requires editor or author as per the biblatex spec

false positive on abbreviated journal title check

PROBLEM: quade2016 - flawed name: abbreviated journal title 'Phys. Rev. E'

However, that is the correct ISO 4 abbreviation.

List index out of range

I'm getting this with Python3, latest version from master

Traceback (most recent call last):
  File "biblatex_check.py", line 232, in <module>
    currentId = line.split("{")[1].rstrip(",\n")
IndexError: list index out of range
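
The traceback suggests that a line starting with '@' but containing no '{' reaches the split. A guarded version might look like the sketch below; parse_entry_header is a hypothetical helper, not a function in the script.

```python
def parse_entry_header(line):
    """Return (entry_type, entry_id) for a line such as '@book{key,',
    or None when the line is not a complete entry header."""
    if not line.lstrip().startswith("@"):
        return None
    head, brace, rest = line.strip().partition("{")
    if not brace:
        return None  # no '{' on this line, so there is no id to read
    entry_type = head[1:].strip().lower()
    entry_id = rest.rstrip(",\n").strip()
    return entry_type, entry_id
```

Using partition instead of split("{")[1] avoids the IndexError, and the caller can decide how to report a header that returns None.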

No check for last entry

The check of the current entry is performed on finding the new '@'. Thus the last entry is NOT CHECKED.
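
One common way to avoid this is to flush the pending entry once more after the loop ends. The sketch below only groups lines into entries; split_entries is a hypothetical name, and the real script performs its checks in a different structure.

```python
def split_entries(lines):
    """Group lines into entries, starting a new group at each '@'."""
    entries = []
    current = []
    for line in lines:
        starts_entry = line.lstrip().startswith("@")
        if starts_entry and current:
            entries.append(current)  # close the previous entry
            current = []
        if current or starts_entry:
            current.append(line)  # lines before the first '@' are ignored
    if current:
        entries.append(current)  # flush the final entry after the loop
    return entries
```

Without the final flush, the last group would be built but never handed on, which matches the behaviour described above.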

Error

Describe the bug
Just tried the script with a small bibtex file and it gives the error message

INFO: Reading references from 'test.bib'
INFO: Filtering by references found in 'references.aux'
WARNING: Aux file 'references.aux' doesn't exist -> not restricting entries
Traceback (most recent call last):
  File "/home/jonas/shared/bin/biblatex_check.py", line 473, in <module>
    handleEntryEnding(bibLineNumber, bibLine)
  File "/home/jonas/shared/bin/biblatex_check.py", line 355, in handleEntryEnding
    entryProblemsHTML = generateEntryProblemsHTML(
  File "/home/jonas/shared/bin/biblatex_check.py", line 243, in generateEntryProblemsHTML
    html += "<div class='reference'>" + title + " (" + author + ")"
TypeError: can only concatenate str (not "filter") to str

I'm using Ubuntu 21.04 and replaced python with python3 in the first line (the shebang) of the script.

To Reproduce
Execute the script with the command line

biblatex_check.py -b test.bib

where the file test.bib contains the following:

@book {hartshorne1977algebraic,
    AUTHOR = {Hartshorne, R.},
     TITLE = {Algebraic geometry},
      NOTE = {Graduate Texts in Mathematics, No. 52},
 PUBLISHER = {Springer-Verlag, New York-Heidelberg},
      YEAR = {1977},
     PAGES = {xvi+496},
      ISBN = {0-387-90244-9},
   MRCLASS = {14-01},
  MRNUMBER = {0463157 (57 \#3116)},
MRREVIEWER = {Robert Speiser},
}

Expected behavior

Should tell me that the bibtex file is correct.
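
The TypeError is typical of code written for Python 2, where filter() applied to a str returned a str; under Python 3 it returns an iterator. The sketch below assumes the script filters characters out of the author and title values. strip_braces is a hypothetical helper, and the actual filtering in biblatex_check.py may differ.

```python
def strip_braces(value):
    """Remove '{' and '}' from a field value and return a str."""
    # Python 2: filter() on a str returned a str.
    # Python 3: filter() returns an iterator, so join it back into a str.
    return "".join(filter(lambda ch: ch not in "{}", value))


title = strip_braces("{Algebraic geometry}")
author = strip_braces("{Hartshorne, R.}")
# Concatenation now works because both values are plain strings.
html = "<div class='reference'>" + title + " (" + author + ")"
```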

Add support for xref

In my .bib file, I've tried to extract journals to @xref entries to save typing and reduce errors. An example of this:

@xref{computer,
  journaltitle = {Computer},
  publisher = {{IEEE}},
  issn = {0018-9162},
}

@article{Bowman2007a,
  title = {Virtual Reality},
  subtitle = {How Much Immersion Is Enough?},
  author = {Doug A. Bowman and Ryan P. McMahan},
  crossref = {computer},
  volume = 40,
  number = 7,
  date = {2007-07},
  doi = {10.1109/MC.2007.257},
  url = {http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=4287241&sortType%3Dasc_p_Sequence%26filter%3DAND%28p_IS_Number%3A4287226%29}
}

It would be nice if the checker would merge these (as per the BibLaTeX documentation) before complaining about missing fields.
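
A rough sketch of such a merge, assuming entries have already been parsed into a dict that maps each id to a dict of fields. resolve_crossrefs is a hypothetical name. BibLaTeX's real inheritance rules also rename some fields between parent and child; this sketch copies field names unchanged.

```python
def resolve_crossrefs(entries):
    """Return a copy of `entries` in which each child inherits the fields
    it lacks from the entry named in its 'crossref' field."""
    resolved = {}
    for key, fields in entries.items():
        merged = dict(fields)
        parent = entries.get(fields.get("crossref", ""))
        if parent:
            for name, value in parent.items():
                merged.setdefault(name, value)  # the child's own value wins
        resolved[key] = merged
    return resolved


entries = {
    "computer": {"journaltitle": "Computer", "issn": "0018-9162"},
    "Bowman2007a": {"title": "Virtual Reality", "crossref": "computer"},
}
merged = resolve_crossrefs(entries)
```

Running the required-field checks on merged instead of entries would let Bowman2007a pass the journaltitle check.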

Run tests in CI

Is your feature request related to a problem? Please describe.
There is currently a 'test' that could be validated for all MRs in CI/CD

Describe the solution you'd like
GitHub Actions can be freely and easily implemented to install necessary stuff and run the shell command for running the tests

Describe alternatives you've considered
Writing the test in terms of a real pytest setup is also possible.

Problem opening .bib file with custom name

Contrary to what is currently written in the Readme, calling ./biblatex_check.py customname.bib did not work. I had to specify the bib argument explicitly with ./biblatex_check.py -b customname.bib.

Great work though!
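
One way to accept both call styles is an optional positional argument next to the existing -b option. This is only a sketch using argparse: parse_args is a hypothetical wrapper, and the fallback name input.bib is an assumption, not necessarily the script's default.

```python
import argparse


def parse_args(argv):
    """Return the .bib path from either `file.bib` or `-b file.bib`."""
    parser = argparse.ArgumentParser(description="Check a .bib file")
    parser.add_argument("bibfile", nargs="?", help="path to the .bib file")
    parser.add_argument("-b", dest="bib_option", help="path to the .bib file")
    args = parser.parse_args(argv)
    # The explicit option wins; "input.bib" is an assumed fallback name.
    return args.bib_option or args.bibfile or "input.bib"
```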

Linting in CI

Is your feature request related to a problem? Please describe.
Similarly to #59, code linting could be implemented in CI/CD

Describe the solution you'd like
pylint is easy to implement with GitHub Actions

Reference fields should support aliases to remove duplication

e.g. online == electronic

Support for alias fields

The linter flags an issue if a @phdthesis doesn't use institution, but everything I can find says school is in fact the right entry.
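
Aliases could be handled by normalising names before any checks run, so that each rule is written once against the canonical name. The tables below are a small illustrative subset and normalize is a hypothetical helper; the script's own data structures may look different.

```python
# Illustrative alias tables: the name on the left is treated as the
# name on the right. Only a small subset is listed here.
TYPE_ALIASES = {"electronic": "online", "www": "online"}
FIELD_ALIASES = {
    "school": "institution",
    "journal": "journaltitle",
    "address": "location",
}


def normalize(entry_type, fields):
    """Map an entry type and its field names to their canonical forms."""
    canonical_type = TYPE_ALIASES.get(entry_type.lower(), entry_type.lower())
    canonical_fields = {
        FIELD_ALIASES.get(name.lower(), name.lower()): value
        for name, value in fields.items()
    }
    return canonical_type, canonical_fields
```

With this in place, a @phdthesis that uses school would satisfy a rule that asks for institution.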

Is the License MIT or AGPL?

The README.md and biblatex_check.py files both state the license is the MIT license.
Yet the LICENSE.txt document contains the license text for AGPL.

This has me very confused. What is this project intended to be licensed under, MIT or AGPL?

Book entry with editor incorrectly flagged as warning

BibTeX requires a book entry to have an author or editor entry. However, BibLatex-Check warns about correct entries that have an editor instead of an author, such as

@Book{TestBook,
    editor = "John Doe",
    title = "An Correct Entry Creating a Warning",
    publisher = "The Publisher",
    year = "2020",
}
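
A possible way to express "author or editor" is to store each requirement as a group of alternatives. The REQUIRED table below is illustrative only and does not reproduce the table in biblatex_check.py; missing_requirements is a hypothetical helper.

```python
# Each requirement is a tuple of alternatives; one present field satisfies it.
# Illustrative table, not the one used by the script.
REQUIRED = {"book": [("author", "editor"), ("title",), ("year", "date")]}


def missing_requirements(entry_type, fields):
    """Return the unsatisfied requirement groups, e.g. ['author/editor']."""
    present = {name.lower() for name in fields}
    return [
        "/".join(group)
        for group in REQUIRED.get(entry_type.lower(), [])
        if not present.intersection(group)
    ]
```

For the TestBook entry above, this returns an empty list because editor satisfies the first group.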

BibLatex-Check as a linter: missing line numbers in messages

To use biblatex-check as a linter in text editors, the messages should contain line numbers of the offending line.

For example, $ bibclean my.bib > /dev/null follows the style filename:line number: message (https://ctan.org/pkg/bibclean):

❯ bibclean  my.bib > /dev/null
%% my.bib:6:Expected http://dx.doi.org/ prefix in DOI value ``"10.1098/rstl.1856.0022"''.
%% my.bib:10:Unexpected value in ``month = "1"''.

However, biblatex-check returns

❯ biblatex-check -b my.bib -a my.aux
INFO: Reading references from 'my.bib'
INFO: Filtering by references found in 'my.aux'
PROBLEM: Blatov2010 - non-unique id: 'Blatov2010'

Could biblatex-check be extended to provide line numbers as well?
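
As an illustration of the requested style, the sketch below reports duplicate ids in a filename:line: message format. Both function names are hypothetical, and the exact message wording is an assumption rather than the tool's output.

```python
def format_problem(filename, line_number, entry_id, message):
    """Format one problem in the filename:line: message style."""
    return "{}:{}: {} - {}".format(filename, line_number, entry_id, message)


def find_duplicate_ids(filename, lines):
    """Report ids that appear more than once, at the line of each repeat."""
    first_seen = {}
    messages = []
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        if not text.startswith("@") or "{" not in text:
            continue  # only entry headers carry an id
        entry_id = text.split("{", 1)[1].rstrip(",").strip()
        if entry_id in first_seen:
            detail = "non-unique id (first defined on line {})".format(
                first_seen[entry_id]
            )
            messages.append(format_problem(filename, number, entry_id, detail))
        else:
            first_seen[entry_id] = number
    return messages
```

Most editors that understand compiler-style output can jump to the offending line from messages in this form.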

Add command line output

Add command line output to view the issues and/or automatically open the .html file

Online service for Bibtex linting

Hello,

It would be great if an online BibTeX checker/linter were available.
A service comparable to https://jsonlint.com/ or http://www.yamllint.com/

Kind regards

Separate HTML from python code (read in a template)

Separate the HTML from the python, either by reading in a template or building the html with code rather than strings

Skipping Last Citation, Comput J invalid, school= considered invalid

The last citation in a file isn't processed at all; the last one below should otherwise get a complaint about a missing author (#38).
Comput. J. isn't recognized as a journal.
school=... in phdthesis gives an error if the citation is NOT standalone.

@article{Bra78b,
   title={Pattern-based representation of chess end-game knowledge},
   author={Bratko, Ivan and Kopec, Danny and Michie, Donald},
   journal={Comput. J.},
   volume={21},
   number={2},
   pages={149--153},
   year={1978}
}

@phdthesis{Str70,
   title={Untersuchungen \"{u}ber kombinatorische Spiele},
   author={Str\"{o}hlein, Thomas},
   year={1970},
   school={TU M\"{u}nchen}
}

@misc{LOM18,
title={Lomonosov tablebases},
year={2018},
url={http://tb7.chessok.com/},
howpublished={ChessOK}
}

Command line mode

Could we just have the warnings on the command line, instead of the HTML?

Of course, keep the HTML rendering optionally :)

Field editor required for incollection and inproceedings?

Based on the BibLaTeX documentation, the field editor is not required for incollection and inproceedings. However, in lines 41 and 51 it is listed as required.

Python error when trying to run the script

I'm trying to use your script on my thesis and bibliography and I get this error:

Traceback (most recent call last):
  File "./biblatex_check.py", line 201, in <module>
    cleanedTitle = currentTitle.translate(removePunctuationMap)
TypeError: expected a string or other character buffer object

I had to change the line endings as I am running Linux (see this issue on stackoverflow); perhaps it is related?

book entry should accept author or editor

For book entries, a missing author is flagged even if an editor is present.

Autofix

Some issues, such as journal instead of journaltitle are autofixable. An autofix option could replace those with the correct version.
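
A line-based autofix could be sketched as follows. The RENAMES table and autofix_line are hypothetical, and the sketch assumes each field starts on its own line. Note that it also lowercases the renamed field name.

```python
import re

# Hypothetical rename table: BibTeX-style name -> biblatex name.
RENAMES = {"journal": "journaltitle", "address": "location"}

# Captures the indentation, the field name, and the '=' with its spacing.
FIELD_NAME = re.compile(r"^(\s*)([A-Za-z][\w-]*)(\s*=)")


def autofix_line(line):
    """Rename a known field at the start of `line`; leave other lines alone."""
    match = FIELD_NAME.match(line)
    if not match:
        return line
    indent, name, equals = match.groups()
    replacement = RENAMES.get(name.lower())
    if replacement is None:
        return line
    return indent + replacement + equals + line[match.end():]
```

Only the field name is rewritten, so field values and the original spacing are preserved.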

Add sample files

You should add sample files for the input.bib and input.aux

Modify the .bib file inline

Instead of HTML output, modify the .bib file, adding comments where there are issues

Search entry ID omits first letter of the citation handle

Hi, thanks for developing this great tool! I just cloned the repo this morning and all seemed to work fine. However, when I wanted to search for a specific entry using the search field in the generated html file, I noticed that the search seems to omit the first letter of the search term. For example, one of my .bib-entries is this entry here:

@article{Abraham2014FN,
Abstract = {Statistical machine learning methods are increasingly used for neuroimaging data analysis. Their main virtue is their ability to model high-dimensional datasets, e.g., multivariate analysis of activation images or resting-state time series. Supervised learning is typically used in decoding or encoding settings to relate brain images to behavioral or clinical observations, while unsupervised learning can uncover hidden structures in sets of images (e.g., resting state functional MRI) or find sub-populations in large cohorts. By considering different functional neuroimaging applications, we illustrate how scikit-learn, a Python machine learning library, can be used to perform some key analysis steps. Scikit-learn contains a very large set of statistical learning algorithms, both supervised and unsupervised, and its application to neuroimaging data provides a versatile tool to study the brain.},
Author = {Abraham, Alexandre and Pedregosa, Fabian and Eickenberg, Michael and Gervais, Philippe and Mueller, Andreas and Kossaifi, Jean and Gramfort, Alexandre and Thirion, Bertrand and Varoquaux, Ga{\"e}l},
Date-Added = {2019-11-06 17:20:24 +0100},
Date-Modified = {2019-11-06 17:21:27 +0100},
Doi = {10.3389/fninf.2014.00014},
Issn = {1662-5196},
Journal = {Frontiers in Neuroinformatics},
Month = {Feb},
Publisher = {Frontiers Media SA},
Title = {Machine learning for neuroimaging with scikit-learn},
Url = {http://dx.doi.org/10.3389/fninf.2014.00014},
Volume = {8},
Year = {2014}}

Now, if I start typing Abraham, the entry is not displayed. However, if I type braham (omitting the first letter), the search finds the corresponding entry. I therefore assume that the ID search somehow omits the first letter. I can try to fix it myself but wanted to point out the issue here. Thanks!

accept journal instead of journaltitle

It would be great if the script could either accept journal as a substitute for journaltitle or, like pep8, ignore certain types of errors. That said, the script was useful for bringing my bib file into shape. Thanks.

Run script as CLI command anywhere

Is your feature request related to a problem? Please describe.
It would be more accessible to be able to just run something like

biblatex_check -b input.bib

from anywhere.

Describe the solution you'd like
This should be possible by setting up a setup.py accordingly, see https://stackoverflow.com/questions/56534678/how-to-create-a-cli-in-python-that-can-be-installed-with-pip

Describe alternatives you've considered
Google has a library https://github.com/google/python-fire for generating CLIs from Python but this is a bit overkill for a single script.
