pezmc / biblatex-check
A python script for checking BibLatex .bib files for common referencing mistakes!
Home Page: https://github.com/Pezmc/BibLatex-Linter
License: MIT License
I just used this to validate the following
@techreport{StrumShafferEbersoleVitale2005,
author = "Strum, Lindsey Marie and Shaffer, Jeanne Angela and Ebersole, Garrett P. and Vitale, Daniel F."
title = "3rd grade engineering and technology curriculum"
institution = "Worcester Polytechnic Institute"
%address = "100 Institute Road, Worcester MA 01609-2280 USA"
year = "2005"
%month = "January"
}
and got
Info
# entries: 0
# problems: 0
# missing fields: 0
# flawed names: 0
# wrong types: 0
# non-unique id: 0
# wrong field: 0
List optional fields somewhere in the UI and check for fields that are not allowed in a particular entry
See subject, page 13 at http://www.lsv.ens-cachan.fr/%7Emarkey/BibTeX/doc/ttb_en.pdf, and line 60 in biblatex_check.py
Field values should always end with a comma. A missing comma is not currently detected.
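A minimal sketch of how a missing comma might be detected, assuming one field per line (multi-line values are not handled). The name `find_missing_commas` is hypothetical and not part of biblatex_check.py.

```python
import re

# Matches the start of a field assignment such as `author = "..."`.
FIELD_START = re.compile(r"^\s*[A-Za-z][\w-]*\s*=")


def find_missing_commas(bib_text):
    """Return 1-based line numbers of fields that lack a trailing comma.

    A field is flagged only when another field follows it, because the
    last field before the closing brace may legally omit the comma.
    """
    lines = bib_text.splitlines()
    problems = []
    for index, line in enumerate(lines[:-1]):
        stripped = line.strip()
        if not FIELD_START.match(stripped) or stripped.endswith(","):
            continue
        # Look ahead to the next line that is neither blank nor a % comment.
        following = next(
            (
                candidate.strip()
                for candidate in lines[index + 1:]
                if candidate.strip() and not candidate.strip().startswith("%")
            ),
            "",
        )
        if FIELD_START.match(following):
            problems.append(index + 1)
    return problems
```

Run on an entry whose fields have no commas at all, this would report every field line except the last one.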
E.g. online requires editor or author as per the biblatex spec
PROBLEM: quade2016 - flawed name: abbreviated journal title 'Phys. Rev. E'
However, that is the correct ISO 4 abbreviation.
I'm getting this with Python3, latest version from master
Traceback (most recent call last):
File "biblatex_check.py", line 232, in <module>
currentId = line.split("{")[1].rstrip(",\n")
IndexError: list index out of range
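The IndexError above is what `line.split("{")[1]` raises when a line contains no `{`. A more defensive way to read an entry header could look like the sketch below; `parse_entry_header` and `ENTRY_HEADER` are hypothetical names, and the pattern is an assumption about what a header looks like, not the script's own logic.

```python
import re

# `@type{key,` with optional whitespace, e.g. `@book {hartshorne1977algebraic,`.
ENTRY_HEADER = re.compile(r"^\s*@(\w+)\s*\{\s*([^,\s}]*)")


def parse_entry_header(line):
    """Return (entry_type, entry_id) for an entry header line, else None."""
    match = ENTRY_HEADER.match(line)
    if match is None:
        # Not a header (for example an '@' inside a field value): skip it
        # instead of indexing into a split result that may be too short.
        return None
    return match.group(1).lower(), match.group(2)
```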
The check of the current entry is performed on finding the new '@'. Thus the last entry is NOT CHECKED.
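One way to avoid skipping the last entry is to flush the pending entry once more after the loop ends. The sketch below only groups lines into entries; `split_entries` is a hypothetical helper, not a function from the script.

```python
def split_entries(lines):
    """Group lines into entries, returning one list of lines per entry."""
    entries = []
    current = None
    for line in lines:
        if line.lstrip().startswith("@"):
            # A new '@' closes the previous entry, if there is one.
            if current is not None:
                entries.append(current)
            current = [line]
        elif current is not None:
            current.append(line)
    # Flush the final entry: no further '@' will arrive to trigger it.
    if current is not None:
        entries.append(current)
    return entries
```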
Describe the bug
Just tried the script with a small bibtex file and it gives the error message
INFO: Reading references from 'test.bib'
INFO: Filtering by references found in 'references.aux'
WARNING: Aux file 'references.aux' doesn't exist -> not restricting entries
Traceback (most recent call last):
File "/home/jonas/shared/bin/biblatex_check.py", line 473, in <module>
handleEntryEnding(bibLineNumber, bibLine)
File "/home/jonas/shared/bin/biblatex_check.py", line 355, in handleEntryEnding
entryProblemsHTML = generateEntryProblemsHTML(
File "/home/jonas/shared/bin/biblatex_check.py", line 243, in generateEntryProblemsHTML
html += "<div class='reference'>" + title + " (" + author + ")"
TypeError: can only concatenate str (not "filter") to str
I'm using Ubuntu 21.04 and replaced the python in the first line of the script with python3.
To Reproduce
Execute the script with the command line
biblatex_check.py -b test.bib
where the file test.bib contains the following:
@book {hartshorne1977algebraic,
AUTHOR = {Hartshorne, R.},
TITLE = {Algebraic geometry},
NOTE = {Graduate Texts in Mathematics, No. 52},
PUBLISHER = {Springer-Verlag, New York-Heidelberg},
YEAR = {1977},
PAGES = {xvi+496},
ISBN = {0-387-90244-9},
MRCLASS = {14-01},
MRNUMBER = {0463157 (57 \#3116)},
MRREVIEWER = {Robert Speiser},
}
Expected behavior
Should tell me that the bibtex file is correct.
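On the TypeError in the traceback above: in Python 3, `filter()` returns a lazy iterator rather than a string, so concatenating its result with `+` fails. Assuming the author value was produced by filtering characters (the relevant code is not shown in this report, so this is a guess), joining the result restores a string. `keep_printable` is a hypothetical helper used only for illustration.

```python
def keep_printable(text):
    """Drop non-printable characters and return a real string."""
    # "".join(...) turns the lazy filter object into a str on Python 3.
    return "".join(filter(str.isprintable, text))


title = "Algebraic geometry"
author = keep_printable("Hartshorne, R.\x00")
html = "<div class='reference'>" + title + " (" + author + ")"
```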
In my .bib file, I've tried to extract journals to @xref entries to save typing and reduce errors. An example of this:
@xref{computer,
journaltitle = {Computer},
publisher = {{IEEE}},
issn = {0018-9162},
}
@article{Bowman2007a,
title = {Virtual Reality},
subtitle = {How Much Immersion Is Enough?},
author = {Doug A. Bowman and Ryan P. McMahan},
crossref = {computer},
volume = 40,
number = 7,
date = {2007-07},
doi = {10.1109/MC.2007.257},
url = {http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=4287241&sortType%3Dasc_p_Sequence%26filter%3DAND%28p_IS_Number%3A4287226%29}
}
It would be nice if the checker would merge these (as per the BibLaTeX documentation) before complaining about missing fields.
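A rough sketch of such a merge, assuming entries have already been parsed into a dict that maps entry id to a dict of fields. `resolve_crossrefs` is a hypothetical name, and the sketch deliberately ignores BibLaTeX's field-mapping rules for inheritance; it only copies fields the child does not already define.

```python
def resolve_crossrefs(entries):
    """Return a copy of `entries` with fields inherited via `crossref`.

    `entries` maps entry id -> {field name: value}. Fields set on the
    child take precedence over fields inherited from the parent.
    """
    resolved = {}
    for entry_id, fields in entries.items():
        merged = dict(fields)
        parent = entries.get(fields.get("crossref", ""))
        if parent is not None:
            for name, value in parent.items():
                merged.setdefault(name, value)
        resolved[entry_id] = merged
    return resolved
```

The missing-field check would then run on the merged entries, so that `journaltitle` in the example above is found through the parent.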
Is your feature request related to a problem? Please describe.
There is currently a 'test' that could be run automatically for all MRs in CI/CD
Describe the solution you'd like
GitHub Actions can be freely and easily implemented to install necessary stuff and run the shell command for running the tests
Describe alternatives you've considered
Writing the test in terms of a real pytest setup is also possible.
Contrary to what is currently written in the Readme, calling ./biblatex_check.py customname.bib did not work. I had to specify the bib argument explicitly with ./biblatex_check.py -b customname.bib.
Great work though!
Is your feature request related to a problem? Please describe.
Similarly to #59, code linting could be implemented in CI/CD
Describe the solution you'd like
pylint is easy to implement with GitHub Actions
e.g. online == electronic
The linter flags an issue if a @phdthesis doesn't use institution, but everything I can find says school is in fact the right entry.
The README.md and biblatex_check.py files both state the license is the MIT license.
Yet the LICENSE.txt document contains the license text for AGPL.
This has me very confused. What is this project intended to be licensed under, MIT or AGPL?
BibTeX requires a book entry to have an author or editor entry. However, BibLatex-Check warns about correct entries that have an editor instead of an author, such as
@Book{TestBook,
editor = "John Doe",
title = "An Correct Entry Creating a Warning",
publisher = "The Publisher",
year = "2020",
}
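One way to express "author or editor" is to store required fields as groups of alternatives, where a group is satisfied if any one of its fields is present. The table below is illustrative only and is not the script's actual requirement list; `REQUIRED_FIELDS` and `missing_fields` are hypothetical names.

```python
# Each inner tuple is a group of alternatives: at least one must be present.
# Illustrative values only, not the script's real requirement table.
REQUIRED_FIELDS = {
    "book": (("author", "editor"), ("title",), ("year", "date")),
}


def missing_fields(entry_type, fields):
    """Return the alternative groups for which no field is present."""
    present = {name.lower() for name in fields}
    return [
        group
        for group in REQUIRED_FIELDS.get(entry_type.lower(), ())
        if not present.intersection(group)
    ]
```

With this shape, the TestBook entry above (editor, title, publisher, year) would produce no missing-field warning.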
To use biblatex-check as a linter in text editors, the messages should contain line numbers of the offending line.
For example, bibclean (https://ctan.org/pkg/bibclean) follows the style filename:line number: message:
❯ bibclean my.bib > /dev/null
%% my.bib:6:Expected http://dx.doi.org/ prefix in DOI value ``"10.1098/rstl.1856.0022"''.
%% my.bib:10:Unexpected value in ``month = "1"''.
However, biblatex-check returns
❯ biblatex-check -b my.bib -a my.aux
INFO: Reading references from 'my.bib'
INFO: Filtering by references found in 'my.aux'
PROBLEM: Blatov2010 - non-unique id: 'Blatov2010'
Could biblatex-check be extended to provide line numbers as well?
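A sketch of what line-numbered output could look like for the non-unique id case, assuming the file is available as a list of lines. Both function names are hypothetical, and the message format simply mirrors the bibclean style quoted above.

```python
def entry_line_numbers(lines):
    """Map each entry id to the 1-based line numbers where it is declared."""
    locations = {}
    for number, line in enumerate(lines, start=1):
        stripped = line.strip()
        if stripped.startswith("@") and "{" in stripped:
            entry_id = stripped.split("{", 1)[1].split(",", 1)[0].strip()
            locations.setdefault(entry_id, []).append(number)
    return locations


def report_duplicates(filename, lines):
    """Yield `filename:line: message` strings for repeated entry ids."""
    for entry_id, numbers in entry_line_numbers(lines).items():
        # The first declaration is fine; every later one is a duplicate.
        for number in numbers[1:]:
            yield f"{filename}:{number}: non-unique id: '{entry_id}'"
```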
Add command line output to view the issues and/or automatically open the .html file
Hello,
it would be great if an online BibTeX checker/linter were available.
A service comparable to https://jsonlint.com/ or http://www.yamllint.com/
Kind regards
Separate the HTML from the python, either by reading in a template or building the html with code rather than strings
school=... in phdthesis gives an error if citation is NOT standalone.
@article{Bra78b,
title={Pattern-based representation of chess end-game knowledge},
author={Bratko, Ivan and Kopec, Danny and Michie, Donald},
journal={Comput. J.},
volume={21},
number={2},
pages={149--153},
year={1978}
}
@phdthesis{Str70,
title={Untersuchungen \"{u}ber kombinatorische Spiele},
author={Str\"{o}hlein, Thomas},
year={1970},
school={TU M\"{u}nchen}
}
@misc{LOM18,
title={Lomonosov tablebases},
year={2018},
url={http://tb7.chessok.com/},
howpublished={ChessOK}
}
Could we just have the warnings on the command line, instead of the HTML?
Of course, keep the HTML rendering optionally :)
Based on the BibLaTeX documentation, the field editor is not required for incollection and inproceedings. However, in lines 41 and 51 it is listed as required.
I'm trying to use your script on my thesis bibliography and I get this error:
Traceback (most recent call last):
File "./biblatex_check.py", line 201, in <module>
cleanedTitle = currentTitle.translate(removePunctuationMap)
TypeError: expected a string or other character buffer object
I had to change the line endings as I am running Linux (see this issue on stackoverflow); perhaps it is related?
For book entries, a missing author is flagged even if an editor is present.
Some issues, such as journal instead of journaltitle, are autofixable. An autofix option could replace those with the correct version.
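A small sketch of such an autofix for a single line, assuming one field per line. `FIELD_RENAMES` and `autofix_line` are hypothetical names, and the mapping contains only the rename mentioned here.

```python
import re

# Illustrative mapping; only the rename mentioned in this request.
FIELD_RENAMES = {"journal": "journaltitle"}

# Captures leading whitespace, the field name, and the `=` with its spacing.
FIELD_NAME = re.compile(r"^(\s*)([A-Za-z][\w-]*)(\s*=)")


def autofix_line(line):
    """Rename a known legacy field at the start of `line`, if any."""
    def rename(match):
        name = match.group(2)
        fixed = FIELD_RENAMES.get(name.lower(), name)
        return match.group(1) + fixed + match.group(3)

    return FIELD_NAME.sub(rename, line, count=1)
```

Lines that do not start with a known field name are returned unchanged, so the function can be applied to every line of a file.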
You should add sample files for the input.bib and input.aux
Instead of HTML output, modify the .bib file, adding comments where there are issues
Hi, thanks for developing this great tool! I just cloned the repo this morning and all seemed to work fine. However, when I wanted to search for a specific entry using the search field in the generated html file, I noticed that the search seems to omit the first letter of the search term. For example, one of my .bib entries is this one:
@article{Abraham2014FN,
Abstract = {Statistical machine learning methods are increasingly used for neuroimaging data analysis. Their main virtue is their ability to model high-dimensional datasets, e.g., multivariate analysis of activation images or resting-state time series. Supervised learning is typically used in decoding or encoding settings to relate brain images to behavioral or clinical observations, while unsupervised learning can uncover hidden structures in sets of images (e.g., resting state functional MRI) or find sub-populations in large cohorts. By considering different functional neuroimaging applications, we illustrate how scikit-learn, a Python machine learning library, can be used to perform some key analysis steps. Scikit-learn contains a very large set of statistical learning algorithms, both supervised and unsupervised, and its application to neuroimaging data provides a versatile tool to study the brain.},
Author = {Abraham, Alexandre and Pedregosa, Fabian and Eickenberg, Michael and Gervais, Philippe and Mueller, Andreas and Kossaifi, Jean and Gramfort, Alexandre and Thirion, Bertrand and Varoquaux, Ga{\"e}l},
Date-Added = {2019-11-06 17:20:24 +0100},
Date-Modified = {2019-11-06 17:21:27 +0100},
Doi = {10.3389/fninf.2014.00014},
Issn = {1662-5196},
Journal = {Frontiers in Neuroinformatics},
Month = {Feb},
Publisher = {Frontiers Media SA},
Title = {Machine learning for neuroimaging with scikit-learn},
Url = {http://dx.doi.org/10.3389/fninf.2014.00014},
Volume = {8},
Year = {2014}}
Now, if I start typing Abraham the entry is not displayed. However, if I type braham (omitting the first letter) it can find the corresponding entry. Therefore, I assume that the ID search somehow omits the first letter. I can try to fix it myself but wanted to point out this issue here. Thanks!
It would be great if I could have the script either accept journal as a substitute for journaltitle or, like pep8, have the script ignore certain types of errors. But the script was useful for bringing my bib file into shape. Thanks.
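Ignoring selected error types could be as simple as filtering the collected problems before they are reported. The sketch assumes each problem is an `(entry_id, kind, detail)` tuple, which is a made-up shape for illustration; the kind names follow the categories in the script's summary output, and `filter_problems` is a hypothetical helper.

```python
def filter_problems(problems, ignored_kinds):
    """Return the problems whose kind is not listed in `ignored_kinds`."""
    ignored = {kind.lower() for kind in ignored_kinds}
    return [
        (entry_id, kind, detail)
        for entry_id, kind, detail in problems
        if kind.lower() not in ignored
    ]


problems = [
    ("quade2016", "flawed name", "abbreviated journal title 'Phys. Rev. E'"),
    ("Blatov2010", "non-unique id", "'Blatov2010'"),
]
remaining = filter_problems(problems, ["flawed name"])
```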
Is your feature request related to a problem? Please describe.
It would be more accessible to be able to just run something like
biblatex_check -b input.bib
from anywhere.
Describe the solution you'd like
This should be possible by setting up a setup.py accordingly, see https://stackoverflow.com/questions/56534678/how-to-create-a-cli-in-python-that-can-be-installed-with-pip
Describe alternatives you've considered
Google has a library https://github.com/google/python-fire for generating CLIs from Python but this is a bit overkill for a single script.