
cdl-bibliography's People

Contributors

andrewheusser, dependabot[bot], ethanadner552, f1lm1, jeremymanning, kirstensgithub, lucywowen, maddyrlee, paxtonfitzpatrick, tmuntianu, xiazhu, xinmingxu


cdl-bibliography's Issues

change name

We should, at some far future time when it rises to the top of the priority queue, change the name from memlab.bib to cdl.bib (and also update the documentation accordingly).

add instructions to README for using cdl.bib as a common bibliography with TeXShop/TeX Live (MacOS)

The common "MacTeX" TeX distribution for MacOS (which ContextLab/lab-manual links to and recommends) bundles TeX Live and some other tools, one of which is the TeXShop editor. TeXShop is generally great -- it has lots of useful features and makes working with LaTeX/BibTeX a lot easier, especially for beginners.

However, because macOS apps don't run their internal commands in the user's shell environment, adding the CDL-bibliography repo to $BIBINPUTS doesn't make cdl.bib available to all projects when you use TeXShop, as it does when you compile from the command line.

TeX Live expects personal files to be placed in ~/Library/texmf/ and will prioritize any files there over the main texmf tree. So to make cdl.bib available to all projects:

  1. create the directory where BibTeX looks for personal files (it doesn't exist unless you create it manually)
    • mkdir -p ~/Library/texmf/bibtex/bib
  2. symlink cdl.bib from your local CDL-bibliography repo into this folder
    • ln -s /Users/<username>/path/to/CDL-bibliography/cdl.bib /Users/<username>/Library/texmf/bibtex/bib/cdl.bib
    • Note: you must use absolute paths; relative paths or shortenings like ~ won't work
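For anyone who would rather script the two steps above, here is a minimal Python sketch that does the equivalent of `mkdir -p` followed by `ln -s` with an absolute target. The function name `link_cdl_bib` and its parameters are hypothetical (not part of this repo), and it assumes the personal tree lives at ~/Library/texmf as described in this issue.

```python
from pathlib import Path
from typing import Optional


def link_cdl_bib(repo_dir: str, home: Optional[str] = None) -> Path:
    """Hypothetical helper: symlink cdl.bib into ~/Library/texmf/bibtex/bib."""
    home_dir = Path(home) if home is not None else Path.home()
    # The symlink target must be an absolute path, so resolve it explicitly.
    source = (Path(repo_dir).expanduser() / "cdl.bib").resolve()
    bib_dir = home_dir.expanduser().resolve() / "Library" / "texmf" / "bibtex" / "bib"

    # Step 1: equivalent of `mkdir -p ~/Library/texmf/bibtex/bib`
    bib_dir.mkdir(parents=True, exist_ok=True)

    link = bib_dir / "cdl.bib"
    if link.is_symlink():
        link.unlink()  # replace an existing (possibly stale) link
    elif link.exists():
        # Never silently delete a real personal copy of cdl.bib.
        raise FileExistsError(f"{link} exists and is not a symlink")

    # Step 2: equivalent of `ln -s <absolute source> <link>`
    link.symlink_to(source)
    return link
```

Usage would look like `link_cdl_bib('/Users/<username>/path/to/CDL-bibliography')`. Running it a second time simply refreshes the link.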

This method works for compiling from the command line as well, so setting $BIBINPUTS isn't actually necessary. Likewise, you can place/link personal files in ~/Library/texmf/tex/latex/ or ~/Library/texmf/bibtex/bst/ instead of setting $TEXINPUTS or $BSTINPUTS, respectively, and this is actually the preferred/recommended method.

no error message for invalid 'pages' value

During verification, no error message seems to be printed for an invalid 'pages' value.

I encountered this problem when verifying this BibTeX entry:

@article{Dess07, 
	title = {Storing events to retell them}, 
	volume = {30}, 
	number = {3}, 
	journal = {Behavioral and Brain Sciences}, 
	author = {Dessalles, Jean-Louis}, 
	year = {2007}, 
	pages = {321–322}
	}

btw, the pages value is invalid because of the en dash '–' between the page numbers, instead of the usual hyphen '-'. Could also add support for the whole dash family.
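A possible shape for this check, sketched in Python: first normalize any run of dash-like characters (hyphen, en dash, em dash, minus sign, etc.) to BibTeX's `--`, then report an error if the result still doesn't look like a page or page range. `check_pages`, `normalize_pages`, and the accepted format (optional letter prefix plus digits) are assumptions for illustration, not the checker's current behavior.

```python
import re
from typing import List, Tuple

# Hyphen-minus, hyphen, non-breaking hyphen, figure dash, en dash,
# em dash, horizontal bar, and minus sign.
DASH_CHARS = "-\u2010\u2011\u2012\u2013\u2014\u2015\u2212"
# One or more dash-like characters plus any surrounding whitespace.
DASH_RUN = re.compile(r"\s*[" + re.escape(DASH_CHARS) + r"]+\s*")
# Assumed valid forms: "42", "e1003", "321--322", "S12--S19".
PAGES_OK = re.compile(r"^[A-Za-z]*\d+(--[A-Za-z]*\d+)?$")


def normalize_pages(value: str) -> Tuple[str, bool]:
    """Replace dash runs with '--'; return (normalized, changed)."""
    normalized = DASH_RUN.sub("--", value.strip())
    return normalized, normalized != value


def check_pages(key: str, value: str) -> Tuple[str, List[str]]:
    """Return the (possibly corrected) value and any messages to print."""
    normalized, changed = normalize_pages(value)
    messages = []
    if changed:
        messages.append(f"{key}: pages {value!r} changed to {normalized!r}")
    if not PAGES_OK.match(normalized):
        messages.append(f"{key}: invalid pages value {value!r}")
    return normalized, messages
```

For the Dess07 entry above, `check_pages('Dess07', '321–322')` would return `'321--322'` along with a warning that the value was changed.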

check titles

add checks for titles:

  • sentence case
  • should not end in "."
  • use caps.txt to define non-standard caps

can probably do this using an adapted version of format_journal_name
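A rough sketch of the title checks in Python, assuming caps.txt has already been loaded into a set of words (one entry per line is an assumption). `check_title` is a hypothetical helper, not an adaptation of the actual `format_journal_name` code. Brace-protected groups are ignored, words from caps.txt that appear unbraced are flagged, and any other capitalized word after the first is reported as not sentence case.

```python
import re
from typing import List, Set


def check_title(title: str, caps: Set[str]) -> List[str]:
    """Return a list of problems with a title (empty if it looks OK)."""
    problems = []
    if title.rstrip().endswith("."):
        problems.append('title should not end in "."')

    # Drop brace-protected groups such as {IEEE}: their capitalization is intentional.
    unprotected = re.sub(r"\{[^{}]*\}", " ", title)
    words = re.findall(r"[A-Za-z][A-Za-z'-]*", unprotected)

    for i, word in enumerate(words):
        if i == 0:
            continue  # the first word is capitalized in sentence case
        if word in caps:
            problems.append(f'"{word}" should be protected with braces')
        elif word[0].isupper():
            problems.append(f'"{word}" is capitalized (expected sentence case)')
    return problems
```

This is deliberately simple: it doesn't special-case words after a colon, and LaTeX commands with capital letters (e.g. `\LaTeX`) would be flagged.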

proposed revision to modification instructions

Once the bibtex checker is in place, I think the approach for changing the bibtex file should go something like this:

  1. Make the changes to the .bib file using whatever editor is convenient
  2. Run the bibtex checker locally. This should (a) automatically clean up recoverable errors (and print a warning message if anything is changed), (b) print out an error message if there are unrecoverable errors (e.g. bibtex file can't be parsed, bad citation keys, missing info, etc.), and (c) if the corrections and checks worked, create a commit whose message lists any changed or added bibtex entries.
  3. Push the changes to the user's fork.
  4. Open a pull request to merge the changes into the ContextLab fork. This should trigger an additional run of the bibtex checker (e.g., using TravisCI, GitHub actions, or another CI service). If the revised bibtex file passes the checks, append a "passed" message to the pull request. Otherwise print an informative message to the pull request saying which keys failed the checks.
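Step 2(c) could be sketched like this: compare the entries before and after the edit and list the added, changed, and removed citation keys. It assumes both versions have already been parsed into `{key: {field: value}}` dicts (e.g. with a BibTeX parser such as the bibtexparser package); `commit_message` and the default subject line are hypothetical.

```python
from typing import List, Mapping

Entry = Mapping[str, str]  # field name -> value for one BibTeX entry


def commit_message(old: Mapping[str, Entry], new: Mapping[str, Entry],
                   subject: str = "update cdl.bib") -> str:
    """Build a commit message listing added, changed, and removed keys."""
    added = sorted(k for k in new if k not in old)
    changed = sorted(k for k in new if k in old and dict(new[k]) != dict(old[k]))
    removed = sorted(k for k in old if k not in new)

    body: List[str] = []
    for label, keys in (("added", added), ("changed", changed), ("removed", removed)):
        if keys:
            body.append(f"{label}: {', '.join(keys)}")
    # Git convention: subject line, blank line, then the body.
    return subject + ("\n\n" + "\n".join(body) if body else "")
```

The resulting string could then be passed to `git commit -m`.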

Importantly:

  • The revised pipeline should be (at worst) minimally more inconvenient than the current pipeline. It's important that the bibtex checker script (which could be annoying to run) also does something useful (generates the commit message with the list of changed keys) so that it saves user effort on balance.
  • If a future user does not follow the recommended pipeline (e.g. they don't run the bibtex checker, and they commit "bad" changes), we should get a warning and/or those errors should be corrected prior to merging the changes into the ContextLab fork

Another note: I probably also need to improve the instructions for adding bibtex entries...that might help avoid some errors too

set up CI using github actions, travis, or similar

I'd like to set up CI to verify the integrity of cdl.bib each time a pull request is submitted:

import sys

from helpers import check_bib

errors, corrected = check_bib('cdl.bib')
if errors:
    # exit with a nonzero status so the CI run (and the pull request check) fails
    sys.exit('check failed!')

Seems straightforward, and I've even created a Dockerfile that can run the relevant checks (although it could probably be modified to use a different base image now that I'm done developing it...)

anyone interested in helping to set this up?
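For the "which keys failed" message, a small Python sketch could group the checker's errors by citation key before printing them. It assumes each error is a string that starts with the citation key followed by a colon (the format of the checker message quoted in the curly-brace issue); `group_by_key` and `ci_summary` are hypothetical helpers.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_key(errors: Iterable[str]) -> Dict[str, List[str]]:
    """Group 'KEY: message' strings by citation key."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for error in errors:
        key, _, message = error.partition(":")
        grouped[key.strip()].append(message.strip())
    return dict(grouped)


def ci_summary(errors: Iterable[str]) -> str:
    """One line per failing key, suitable for a CI log or PR comment."""
    grouped = group_by_key(errors)
    if not grouped:
        return "cdl.bib passed all checks"
    lines = [f"{len(grouped)} entries failed the checks:"]
    for key in sorted(grouped):
        lines.append(f"- {key}: " + "; ".join(grouped[key]))
    return "\n".join(lines)
```

In the CI job the summary would be printed before exiting with a nonzero status; posting it as a pull request comment would need an extra workflow step, which isn't shown here.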

lookups using google scholar, semantic scholar, etc.

I've tried using the following packages to interface with Google Scholar:

  • scholarly
  • gscholar
  • mechanize (via a simulated browser)

I can get each of these to return valid information for a small number of queries. However, when I submit many queries (I'm not sure of the precise number: 20? 50? 100?) I start seeing HTTP 429 errors ("too many requests"). It seems that the Google Scholar backend limits the number of queries per day (or possibly the rate?) that can come from a single browser/IP address/user (I'm not sure how it's parameterized).

This seems to make it impossible (or at least "non-trivial") to use Google Scholar to verify and/or look up bibliographic information.

I've also tried using the semanticscholar package to interface with Semantic Scholar. Unfortunately, the Semantic Scholar API requires knowing the DOI, author ID, or Semantic Scholar paper ID, which I don't have for most papers. Google Scholar could be used to look up the DOIs, but that doesn't help (if I could reliably access Google Scholar we wouldn't need Semantic Scholar!). I also tried submitting requests to Crossref (using the mechanize package to simulate browser requests, and then regular expressions to parse out DOIs), but the results were highly unreliable (only a very small proportion of queries seemed to return useful information).

So: I'm stumped. Until I can figure out a way forward (e.g. a way around Google Scholar's limits, a way to look up information via Semantic Scholar, and/or another reliable source for bibliographic information) I'm going to remove bibliographic lookups from the bibtex checker code. My (broken) attempts can be found in the notebook (dev folder) of this commit.

The main issue I was trying to solve was that some of the page numbers are either self-inconsistent or invalid (e.g. the given page range doesn't make sense, like starting from a high number and going to a low number, or containing mixes of alpha and numeric characters that seem suspect). I'm going to implement some heuristics for cleaning up those sorts of issues (to the extent that I can reliably detect them), and I'll ignore for now the likelihood that some bibliographic information may be entered incorrectly.
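The heuristics described above could start out something like this Python sketch. `page_warnings` is hypothetical, and it assumes page ranges have already been normalized to use `--`. It accepts a single page or a range with an optional letter prefix (e.g. `e1003`, `S12--S19`), and flags descending ranges, mismatched prefixes, and any other mix of letters and digits.

```python
import re
from typing import List

# A range like "S12--S19": optional letter prefix, digits, '--', optional prefix, digits.
RANGE = re.compile(r"^([A-Za-z]*)(\d+)--([A-Za-z]*)(\d+)$")
# A single page or article number like "42" or "e1003".
SINGLE = re.compile(r"^[A-Za-z]*\d+$")


def page_warnings(pages: str) -> List[str]:
    """Return heuristic warnings about a pages value (empty if it looks OK)."""
    pages = pages.strip()
    match = RANGE.match(pages)
    if match:
        prefix1, start, prefix2, end = match.groups()
        warnings = []
        if prefix1 != prefix2:
            warnings.append(f"mismatched prefixes in page range {pages!r}")
        if int(end) < int(start):
            warnings.append(f"page range {pages!r} goes from a high to a low number")
        return warnings
    if SINGLE.match(pages):
        return []
    return [f"suspicious mix of characters in pages value {pages!r}"]
```

Abbreviated ranges such as `321--22` would be reported as descending. Flagging them seems reasonable, since the intended range can't always be inferred safely.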

bibchecker: exclude leading & trailing non-alphabetic characters from expected contents of curly braces

When checking entries for words that should be manually capitalized using curly braces (entries in caps.txt), the auto-checker should not expect leading and trailing symbols to be inside the braces. For example, CoppEtal17 was added as:

@inproceedings{CoppEtal17,
	author = {G Coppersmith and C Hilland and O Frieder and R Leary},
	booktitle = {2017 {IEEE} {EMBS} International Conference on Biomedical \& Health Informatics ({BHI})},
	organization = {IEEE},
	pages = {393--396},
	title = {Scalable mental health analysis in the clinical whitespace via natural language processing},
	year = {2017}}

The checker throws the following error:

CoppEtal17: 	booktitle "2017 {IEEE} {EMBS} International Conference on Biomedical \& Health Informatics ({BHI})" should be "2017 {IEEE} {EMBS} International Conference on Biomedical \& Health Informatics {(BHI})"

In the booktitle field, "BHI" in parentheses is capitalized as ({BHI}). The checker sees this as an error and wants {(BHI}) instead.

I'm not familiar with the checker code and don't want to introduce unexpected bugs by trying to change this myself, so for now, I've updated CoppEtal17 to match what the checker prefers.
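One way the checker could build its expected text, sketched in Python: split each whitespace-separated token into leading punctuation, the word itself (with any existing braces), and trailing punctuation, and wrap only the word in braces when it appears in caps.txt. `protect_caps` is a hypothetical helper, not the checker's actual code, and it assumes caps.txt has been loaded into a set of words.

```python
import re
from typing import Set

# leading non-alphanumerics, optional "{", the word, optional "}", trailing non-alphanumerics
TOKEN = re.compile(r"^([^A-Za-z0-9{}]*)\{?([A-Za-z0-9][A-Za-z0-9'-]*)\}?([^A-Za-z0-9{}]*)$")


def protect_caps(text: str, caps: Set[str]) -> str:
    """Wrap caps.txt words in braces, keeping surrounding punctuation outside."""
    out = []
    for token in text.split(" "):
        match = TOKEN.match(token)
        if match and match.group(2) in caps:
            lead, word, trail = match.groups()
            token = f"{lead}{{{word}}}{trail}"  # e.g. "(BHI)" -> "({BHI})"
        out.append(token)
    return " ".join(out)
```

With this approach, both `(BHI)` and `({BHI})` map to `({BHI})`, so the original CoppEtal17 booktitle would pass unchanged.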
