contextlab / cdl-bibliography
Bibtex file shared by the Contextual Dynamics Lab at Dartmouth College
Home Page: http://www.context-lab.com
License: MIT License
We should, at some far future time when it rises to the top of the priority queue, change the name from memlab.bib to cdl.bib (and also update the documentation accordingly).
The common "MacTeX" TeX distribution for macOS (which ContextLab/lab-manual links to and recommends) bundles TeX Live and some other tools, one of which is the TeXShop editor. TeXShop is generally great -- it has lots of useful features and makes working with LaTeX/BibTeX a lot easier, especially for beginners.
However, because Mac apps don't run internal commands in the user's shell environment, adding the CDL-bibliography repo to $BIBINPUTS doesn't make cdl.bib available to all projects when using TeXShop like it does if you compile from the command line.
TeX Live expects personal files to be placed in ~/Library/texmf/ and will prioritize any files there over the main texmf tree. So to make cdl.bib available to all projects:
Create the folder:
mkdir -p ~/Library/texmf/bibtex/bib
Link cdl.bib from your local CDL-bibliography repo into this folder:
ln -s /Users/<username>/path/to/CDL-bibliography/cdl.bib /Users/<username>/Library/texmf/bibtex/bib/cdl.bib
Use full paths here; ~ won't work.
This method works for compiling from the command line as well, so setting $BIBINPUTS is actually not necessary. You can also place/link personal files in ~/Library/texmf/tex/latex or ~/Library/texmf/bibtex/bst/ rather than setting $TEXINPUTS or $BSTINPUTS, respectively, and this is actually the preferred/recommended method.
During verification, no error message seems to be printed for an invalid 'pages' value.
I encountered this problem when verifying this BibTeX entry:
@article{Dess07,
title = {Storing events to retell them},
volume = {30},
number = {3},
journal = {Behavioral and Brain Sciences},
author = {Dessalles, Jean-Louis},
year = {2007},
pages = {321–322}
}
By the way, the invalid pages value is caused by the en dash '–' between the page numbers, rather than the usual hyphen '-'. The checker could also add support for the rest of the dash family.
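One way to support the dash family is to normalize every dash-like character to the '--' form before validating the field. The following is only a sketch under that assumption, not the checker's actual code; normalize_pages, DASHES, and DASH_RUN are hypothetical names.

```python
import re

# Unicode dash look-alikes that can end up in page ranges: hyphen,
# non-breaking hyphen, figure dash, en dash, em dash, horizontal bar, minus.
DASHES = "\u2010\u2011\u2012\u2013\u2014\u2015\u2212"

# One or more dash-like characters, with optional whitespace on either side.
DASH_RUN = re.compile(r"\s*[-" + DASHES + r"]+\s*")


def normalize_pages(pages):
    """Replace each run of dash-like characters with the BibTeX form '--'."""
    return DASH_RUN.sub("--", pages.strip())
```

With this, the value from Dess07 ('321–322') would come out as '321--322', and a value that already uses '--' would pass through unchanged.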
add checks for titles:
can probably do this using an adapted version of format_journal_name
Once the bibtex checker is in place, I think the approach for changing the bibtex file should go something like this:
Importantly:
Another note: I probably also need to improve the instructions for adding bibtex entries...that might help avoid some errors too
It'd be nice to have a TravisCI or other similar CI check to verify:
I'd like to set up CI to verify the integrity of cdl.bib each time a pull request is submitted:
from helpers import check_bib
errors, corrected = check_bib('cdl.bib')
assert len(errors) == 0, 'check failed!'
This seems straightforward, and I've even created a Dockerfile that can run the relevant checks (although it could probably be modified to use a different base package now that I'm done developing it...)
Anyone interested in helping to set this up?
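For whichever CI service ends up being used, the job mostly needs a script that exits with a non-zero status when the check fails. A minimal sketch, assuming check_bib returns (errors, corrected) as in the snippet above and that errors is a list of printable messages; run_check and main are hypothetical names:

```python
import sys


def run_check(check_bib, bib_path="cdl.bib"):
    """Run the checker and return an exit code: 0 if clean, 1 otherwise."""
    errors, _corrected = check_bib(bib_path)
    for err in errors:
        # Assumption: each error is something printable (e.g. a string).
        print(err, file=sys.stderr)
    return 0 if len(errors) == 0 else 1


def main():
    # Assumption: this runs from the repository root, where helpers.py lives.
    from helpers import check_bib
    return run_check(check_bib)
```

A CI step could then run a short script that calls sys.exit(main()), so that a failing check marks the pull request as failed.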
I've tried using the following packages to interface with Google Scholar:
I can get each of these to return valid information for a small number of queries. However, when I submit many queries (I'm not sure of the precise number-- 20? 50? 100?) I start seeing 429 HTTP errors ("too many requests"). It seems that the Google Scholar backend limits the number of queries per day (or possibly the rate?) that can come from a single browser/IP address/user (I'm not sure how it's parameterized).
This seems to make it impossible (or at least "non-trivial") to use Google Scholar to verify and/or look up bibliographic information.
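If the limit turns out to be a rate limit rather than a daily quota, retrying with exponential backoff might help; if it is a daily quota, it will not. A generic sketch, assuming the caller supplies a fetch function that raises a marker exception on a 429 response (TooManyRequests and fetch_with_backoff are hypothetical names, not part of any of the packages tried):

```python
import time


class TooManyRequests(Exception):
    """Hypothetical marker: the caller's fetch function raises this on HTTP 429."""


def fetch_with_backoff(fetch, query, max_tries=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch(query), doubling the wait after each 429-style failure."""
    delay = base_delay
    for attempt in range(max_tries):
        try:
            return fetch(query)
        except TooManyRequests:
            if attempt == max_tries - 1:
                raise  # out of retries: surface the error to the caller
            sleep(delay)
            delay *= 2
```

The sleep parameter is only there so the waiting can be swapped out in tests.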
I've also tried using the semanticscholar package to interface with Semantic Scholar. Unfortunately, the Semantic Scholar API requires knowing the DOI, author ID, or Semantic Scholar code-- which I don't have for most papers. The Google Scholar API does support DOI lookups, but it's not useful (if I could reliably access Google Scholar we wouldn't need Semantic Scholar!). I also tried submitting requests to Crossref (using the mechanize package to simulate browser requests, and then regular expressions to parse out DOIs), but the results were highly unreliable (only a very small proportion of queries seemed to return useful information).
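A possible alternative to simulating browser requests is Crossref's public REST API, which accepts free-text bibliographic queries and returns JSON. The sketch below only builds the query URL and reads a DOI out of an already-decoded response; the response layout (message, items, DOI) is my understanding of the API and should be checked against its documentation, and crossref_query_url and first_doi are hypothetical names. The top hit is not guaranteed to be the right paper, so any result would still need verification.

```python
from urllib.parse import urlencode

CROSSREF_WORKS = "https://api.crossref.org/works"


def crossref_query_url(title, author=None, rows=1):
    """Build a Crossref query URL for a free-text bibliographic search."""
    text = title if author is None else f"{title} {author}"
    return CROSSREF_WORKS + "?" + urlencode({"query.bibliographic": text, "rows": rows})


def first_doi(response):
    """Return the DOI of the top hit in a decoded JSON response, or None."""
    # Assumption: results live under response["message"]["items"].
    items = response.get("message", {}).get("items", [])
    return items[0].get("DOI") if items else None
```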
So: I'm stumped. Until I can figure out a way forward (e.g. a way around Google Scholar's limits, a way to look up information via Semantic Scholar, and/or another reliable source for bibliographic information) I'm going to remove bibliographic lookups from the bibtex checker code. My (broken) attempts can be found in the notebook (dev folder) of this commit.
The main issue I was trying to solve was that some of the page numbers are either self-inconsistent or invalid (e.g. the given page range doesn't make sense, like starting from a high number and going to a low number, or containing mixes of alpha and numeric characters that seem suspect). I'm going to implement some heuristics for cleaning up those sorts of issues (to the extent that I can reliably detect them), and I'll ignore for now the likelihood that some bibliographic information may be entered incorrectly.
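The two kinds of problems described above could be flagged with something like the following. This is a rough sketch, not the heuristics that ended up in the checker; check_pages and PAGE_RANGE are hypothetical names, and it assumes ranges are written with '--'.

```python
import re

# A plain numeric range such as 393--396.
PAGE_RANGE = re.compile(r"^(\d+)--(\d+)$")


def check_pages(pages):
    """Return a list of human-readable problems found in a 'pages' value."""
    problems = []
    # Letters mixed with digits are suspect, though not always wrong.
    if re.search(r"[A-Za-z]", pages) and re.search(r"\d", pages):
        problems.append("mixes letters and digits")
    match = PAGE_RANGE.match(pages)
    if match:
        start, end = int(match.group(1)), int(match.group(2))
        if end < start:
            problems.append("range runs from a higher page to a lower one")
    return problems
```

Note that abbreviated ranges (e.g. 321--2) and article numbers with a letter prefix would also be flagged, so the output is better treated as a list of entries to review by hand than as a list of definite errors.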
When checking entries for words that should be manually capitalized using curly braces (entries in caps.txt), the auto-checker should not expect leading and trailing symbols to be inside the braces. For example, CoppEtal17 was added as:
@inproceedings{CoppEtal17,
author = {G Coppersmith and C Hilland and O Frieder and R Leary},
booktitle = {2017 {IEEE} {EMBS} International Conference on Biomedical \& Health Informatics ({BHI})},
organization = {IEEE},
pages = {393--396},
title = {Scalable mental health analysis in the clinical whitespace via natural language processing},
year = {2017}}
The checker throws the following error:
CoppEtal17: booktitle "2017 {IEEE} {EMBS} International Conference on Biomedical \& Health Informatics ({BHI})" should be "2017 {IEEE} {EMBS} International Conference on Biomedical \& Health Informatics {(BHI})"
In the booktitle field, "BHI" in parentheses is capitalized as ({BHI}). The checker sees this as an error and wants {(BHI}) instead.
I'm not familiar with the checker code and don't want to introduce unexpected bugs by trying to change this myself, so for now, I've updated CoppEtal17 to match what the checker prefers.
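For reference, the behavior being asked for could look roughly like this: match the bare word from caps.txt and wrap only that word, so surrounding punctuation stays outside the braces. This is a sketch written without knowledge of the checker's internals; protect_caps is a hypothetical name, and caps_words stands in for the contents of caps.txt.

```python
import re


def protect_caps(text, caps_words):
    """Wrap each listed word in braces, leaving surrounding punctuation outside."""
    for word in caps_words:
        # Match the bare word only: skip it if it is already braced or if it
        # is part of a longer alphanumeric token.
        pattern = re.compile(r"(?<![{\w])" + re.escape(word) + r"(?![}\w])")
        text = pattern.sub(lambda m: "{" + m.group(0) + "}", text)
    return text
```

Under this approach, "(BHI)" would become "({BHI})", and a title that already contains "({BHI})" would be left as it is.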