odkr / pandoc-zotxt.lua
Pandoc filter that looks up bibliographic data for citations in Zotero.
License: MIT License
Hello. I am quite new to Zotero and pandoc.
When I use Zotero with Emacs, Zotero links are inserted in the form (zotero://select/items/1_LFZY74D5).
I was hoping that, with this filter, pandoc exports would replace such Zotero links with a URL when the referenced Zotero item is a web page. However, that is not the case.
Am I misinterpreting what this project does, or am I missing some configuration?
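For reference, a zotero://select link like the one above appears to encode a library ID and an item key separated by an underscore. A minimal sketch, assuming that layout (inferred from the example, not from Zotero documentation), of pulling the item key out of such a link so it could be looked up; the function name is hypothetical:

```python
import re
from typing import Optional, Tuple
from urllib.parse import urlparse

# Assumed layout: zotero://select/items/<libraryID>_<itemKey>
_ITEMS_PATH = re.compile(r"^/items/(\d+)_([A-Za-z0-9]+)$")


def parse_zotero_select(uri: str) -> Optional[Tuple[int, str]]:
    """Return (library_id, item_key) for a zotero://select link, else None."""
    parts = urlparse(uri)
    if parts.scheme != "zotero" or parts.netloc != "select":
        return None
    match = _ITEMS_PATH.match(parts.path)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


print(parse_zotero_select("zotero://select/items/1_LFZY74D5"))  # -> (1, 'LFZY74D5')
```

Turning the key into a web URL would still require looking the item up in Zotero.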
Citekeys placed in the nocite pandoc metadata field are not rendered when using pandoc-zotxt.lua.
Example using a citation (with Zotero running and containing a @brush:ornithology item):
pandoc --lua-filter pandoc-zotxt.lua -C -t plain << EOT
@brush:ornithology
EOT
actual = expected output:
Brush and Clark (1983)
Brush, A. H., and G. A. Clark Jr., eds. 1983. Perspectives in
Ornithology. Cambridge: Cambridge Univ. Press.
Using nocite:
pandoc --lua-filter pandoc-zotxt.lua -C -t plain << EOT
---
nocite: '@brush:ornithology'
...
EOT
Actual output: nothing
Expected:
Brush, A. H., and G. A. Clark Jr., eds. 1983. Perspectives in
Ornithology. Cambridge: Cambridge Univ. Press.
For comparison, biblio data in pandoc metadata:
pandoc -C -t plain << EOT
---
nocite: '@brush:ornithology'
references:
- id: brush:ornithology
  editor:
  - family: Brush
    given: A. H.
  - family: Clark
    given: G. A.
    suffix: Jr.
  issued: 1983
  language: en-GB
  publisher: Cambridge Univ. Press
  publisher-place: Cambridge
  title: Perspectives in ornithology
  type: book
...
EOT
actual = expected output:
Brush, A. H., and G. A. Clark Jr., eds. 1983. Perspectives in
Ornithology. Cambridge: Cambridge Univ. Press.
I'd be happy to investigate further, e.g., how exactly nocites are represented in the pandoc AST, but maybe that isn't necessary, and there's an immediately obvious solution. Just let me know.
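For what it's worth, in pandoc's JSON AST (as produced by pandoc -t json), a nocite value parsed from Markdown sits in the document's meta section as MetaInlines containing Cite elements, whereas ordinary citations live in blocks. If a filter only walks the blocks, nocite keys are never seen. A minimal sketch, in Python over the JSON AST rather than the filter's Lua, that collects citation IDs from both places; the sample is a hand-written, trimmed-down document, not captured pandoc output:

```python
import json
from typing import Any, List, Optional


def collect_citekeys(node: Any, keys: Optional[List[str]] = None) -> List[str]:
    """Collect citationId values from every Cite element, in document order."""
    if keys is None:
        keys = []
    if isinstance(node, dict):
        if node.get("t") == "Cite":
            # A Cite's content is [list of citations, fallback inlines].
            for citation in node["c"][0]:
                if citation["citationId"] not in keys:
                    keys.append(citation["citationId"])
        for value in node.values():
            collect_citekeys(value, keys)
    elif isinstance(node, list):
        for item in node:
            collect_citekeys(item, keys)
    return keys


# Hand-written, trimmed-down AST: one nocite entry and an empty body.
SAMPLE = json.loads("""
{"pandoc-api-version": [1, 23],
 "meta": {"nocite": {"t": "MetaInlines", "c": [
   {"t": "Cite", "c": [
     [{"citationId": "brush:ornithology", "citationPrefix": [],
       "citationSuffix": [], "citationMode": {"t": "NormalCitation"},
       "citationNoteNum": 1, "citationHash": 0}],
     [{"t": "Str", "c": "@brush:ornithology"}]]}]}},
 "blocks": []}
""")

print(collect_citekeys(SAMPLE["blocks"]))  # -> []
print(collect_citekeys(SAMPLE))            # -> ['brush:ornithology']
```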
This has been fixed in v1.2.0b. Please give that version a try. I am keeping this issue open so that you know I am aware of it; it will be closed once v1.2 is released.
With version 1.0.0 of the filter, if a cited key is not in Zotero but is defined in a bibliography file or in the references metadata field, the filter prints an error and a stack trace from Zotero. I think it should print nothing in this case. A warning that the reference is not in Zotero would be OK, but I think it is unnecessary: if the key is not defined elsewhere, citeproc will already warn about it. The error that is currently printed does not make it clear what is wrong, and the stack trace is unnecessary.
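The behaviour proposed above (skip quietly, or at most warn, when a key is not in Zotero) could look roughly like this. A sketch under stated assumptions: fetch stands in for whatever performs the zotxt lookup and is assumed to raise LookupError for unknown keys; skipping keys that are already defined locally is a design choice of this sketch, not something the filter is known to do; FAKE_ZOTERO and fake_fetch are hypothetical test doubles.

```python
import sys
from typing import Callable, Dict, Iterable, Set

Item = Dict[str, str]


def lookup_all(keys: Iterable[str], fetch: Callable[[str], Item],
               local_ids: Set[str], warn: bool = False) -> Dict[str, Item]:
    """Look up each key in Zotero; never abort on a missing key."""
    found: Dict[str, Item] = {}
    for key in keys:
        if key in local_ids:
            continue  # defined in a bibliography file or `references`
        try:
            found[key] = fetch(key)
        except LookupError:
            # citeproc warns about keys defined nowhere, so this is optional.
            if warn:
                print(f"{key}: not found in Zotero", file=sys.stderr)
    return found


# Hypothetical stand-in for the Zotero connection.
FAKE_ZOTERO = {"brush:ornithology": {"id": "brush:ornithology"}}


def fake_fetch(key: str) -> Item:
    if key not in FAKE_ZOTERO:
        raise LookupError(key)
    return FAKE_ZOTERO[key]


print(lookup_all(["brush:ornithology", "no_such_entry"], fake_fetch,
                 local_ids={"no_such_entry"}))
```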
Sample to reproduce the problem:
Cite something not in Zotero[@no_such_entry 123].
---
references:
- id: no_such_entry
  author:
  - literal: Somebody
  issued:
  - year: 2021
  publisher: Some Press
  publisher-place: Somewhere
  title: Such Entry
  type: book
...
Place the above in foo.md and run:
pandoc foo.md -L pandoc-zotxt.lua --citeproc -o foo.pdf
Output:
pandoc-zotxt.lua: Library ID not providedError: Library ID not provided
Zotero.DataObjects.prototype.getIDFromLibraryAndKey@chrome://zotero/content/xpcom/data/dataObjects.js:377:25
Zotero.DataObjects.prototype.getByLibraryAndKeyAsync<@chrome://zotero/content/xpcom/data/dataObjects.js:349:12
tryCatcher@resource://zotero/loader.jsm -> resource://zotero/bluebird/util.js:16:16
module.exports/Promise.method/<@resource://zotero/loader.jsm -> resource://zotero/bluebird/method.js:15:21
From previous event:
captureStackTrace@resource://zotero/loader.jsm -> resource://zotero/bluebird/debuggability.js:915:23
CapturedTrace@resource://zotero/loader.jsm -> resource://zotero/bluebird/debuggability.js:807:5
longStackTracesCaptureStackTrace@resource://zotero/loader.jsm -> resource://zotero/bluebird/debuggability.js:482:19
module.exports/Promise.prototype._then@resource://zotero/loader.jsm -> resource://zotero/bluebird/promise.js:261:9
module.exports/Promise.prototype._passThrough@resource://zotero/loader.jsm -> resource://zotero/bluebird/finally.js:94:12
module.exports/Promise.prototype.finally@resource://zotero/loader.jsm -> resource://zotero/bluebird/finally.js:103:12
PromiseSpawn@resource://zotero/loader.jsm -> resource://zotero/bluebird/generators.js:36:25
module.exports/Promise.coroutine/<@resource://zotero/loader.jsm -> resource://zotero/bluebird/generators.js:197:21
Zotero.Server.DataListener.prototype._headerFinished@chrome://zotero/content/xpcom/server.js:302:5
Zotero.Server.DataListener.prototype.onDataAvailable@chrome://zotero/content/xpcom/server.js:208:7
Hi everyone,
When I use zotxt to convert a LaTeX file (.tex) to Word (.docx), the following warnings are shown:
pandoc-zotxt.lua: acemogluE2022: zotxt response not encoded in UTF-8.
pandoc-zotxt.lua: schumpeter1942: zotxt response not encoded in UTF-8.
pandoc-zotxt.lua: graetzRES2018: zotxt response not encoded in UTF-8.
I tried to find a solution, but I'm lost. Has anyone run into the same problem?
Thanks!
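One way to narrow this down is to save a raw zotxt response for one of the affected keys and check where, if anywhere, it stops being valid UTF-8. A minimal sketch that relies only on Python's standard decoder; how to capture the response (for example with a separate HTTP client) is left open, and the function name is hypothetical:

```python
from typing import Optional


def first_invalid_utf8_offset(data: bytes) -> Optional[int]:
    """Return the byte offset of the first invalid UTF-8 sequence, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start
    return None


print(first_invalid_utf8_offset("Schumpeter".encode("utf-8")))  # -> None
print(first_invalid_utf8_offset("Größe".encode("latin-1")))      # -> 2
```

An offset pointing at an accented character, as in the Latin-1 example, would suggest the response was produced in a legacy encoding rather than being truncated.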
Zotero v5.0.71 introduces a security mechanism that aims to block 'browsers' from accessing its API. It appears to define a 'browser' as any HTTP user agent that identifies as "Mozilla"; more precisely, any user agent whose identification string starts with "Mozilla/" (lines 439–453 in server_connector.js). So, HTTP requests to Zotero's API have to be made either via an HTTP user agent that doesn't identify as "Mozilla" or with the HTTP header Zotero-Allowed-Request set (judging from server_connector.js).
By default, Pandoc doesn't set any request headers. Still, Zotero appears to treat it as a 'browser'. This is not because Pandoc identifies as Mozilla, however, but because Pandoc doesn't identify as any user agent at all. This triggers:
Error: this.headers['user-agent'] is undefined
Source File: chrome://zotero/content/xpcom/server.js
Line: 434
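To make the described check concrete, here is a small model of it. This paraphrases the behaviour described above and is not Zotero's code; in particular, the order of the two checks and raising on a missing User-Agent (to mirror the error above) are assumptions.

```python
from typing import Mapping


def looks_like_browser(headers: Mapping[str, str]) -> bool:
    """Model of Zotero's 'browser' check as described above (not Zotero's code)."""
    lowered = {name.lower(): value for name, value in headers.items()}
    if "zotero-allowed-request" in lowered:
        return False  # explicit opt-in header: request is allowed
    if "user-agent" not in lowered:
        # Mirrors "this.headers['user-agent'] is undefined" in Zotero 5.0.71.
        raise KeyError("user-agent")
    return lowered["user-agent"].startswith("Mozilla/")


print(looks_like_browser({"User-Agent": "Mozilla/5.0"}))  # -> True
print(looks_like_browser({"User-Agent": "curl/8.5.0"}))   # -> False
print(looks_like_browser({"User-Agent": "Mozilla/5.0",
                          "Zotero-Allowed-Request": "1"}))  # -> False
```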
However, pandoc-zotxt.lua uses Pandoc's pandoc.mediabag.fetch function (implemented in MediaBag.hs, which ultimately calls openURL in Class.hs) to retrieve data via HTTP, and neither pandoc.mediabag.fetch nor openURL allows setting HTTP headers. Apparently, Pandoc sets headers for HTTP requests globally (see CommonState and stRequestHeaders in Class.hs). But PANDOC_STATE is read-only, and PANDOC_STATE.request_headers['Zotero-Allowed-Request'] = 1 has no effect (PANDOC_STATE.request_headers is still empty afterwards).
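For comparison, a general-purpose HTTP client that supports per-request headers could pass the check. A sketch using Python's urllib, which is of course not available to a pandoc Lua filter; the endpoint path /zotxt/items, its query parameters, and the User-Agent string are assumptions for illustration (23119 is the port of Zotero's local HTTP server), and the request is only built, not sent:

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed zotxt endpoint on Zotero's local HTTP server.
ZOTXT_ITEMS_URL = "http://127.0.0.1:23119/zotxt/items"


def build_zotxt_request(citekey: str) -> Request:
    """Build (but do not send) a lookup request carrying the opt-in header."""
    query = urlencode({"betterbibtexkey": citekey, "format": "csljson"})
    return Request(
        f"{ZOTXT_ITEMS_URL}?{query}",
        headers={
            "User-Agent": "zotxt-lookup-sketch/0.1",  # hypothetical, not "Mozilla/..."
            "Zotero-Allowed-Request": "1",
        },
    )


req = build_zotxt_request("doe:2020baz")
print(req.full_url)
# urllib normalises header names with str.capitalize().
print(req.get_header("Zotero-allowed-request"))  # -> 1
```

Sending it would be a matter of urllib.request.urlopen(req); the point is only that the header is set per request, which is exactly what pandoc.mediabag.fetch does not offer.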
(Moreover, pandoc.mediabag.fetch returns ("", ""), i.e., two empty strings, rather than either an error message from Zotero or (nil, nil), as the documentation would suggest. This makes debugging harder.)
Lua is just a thin layer over ANSI C, and its standard library has no other function for connecting to a socket.
This explains egh/zotxt#11.
Fixing this will require a change in Zotero or Pandoc.
zotxt’s “Easy Citekeys” are of the form DoeTitle2000 or doe:2000title, where title (IIRC) can be any word from the title.
If Better BibTeX (BBT) citekeys also use either of these two basic formats, pandoc-zotxt.lua will sometimes return unexpected items.
For example, if there are two items in Zotero, both with author “Doe, Jane” and date “2020”, one titled “Foo bar baz” and the other “Baz”, then echo "@doe:2020baz" | pandoc -L pandoc-zotxt.lua -C -t plain has sometimes returned Doe, Jane. 2020. “Foo Bar Baz.” (Expected, of course: Doe, Jane. 2020. “Baz.”)
My temporary solution has been to modify the pandoc-zotxt.lua script, either disabling easykey or moving it to the end of the relevant list. This seems to have the effect of checking the specific and (IIRC) guaranteed-unique keys before the somewhat fuzzy easykeys:
ZOTXT_KEYTYPES = {
    'betterbibtexkey',  -- Better BibTeX citation key
    'key',              -- Zotero item ID
    'easykey'           -- zotxt easy citekey
}
I wonder whether it might be a good idea for the official pandoc-zotxt.lua to adopt this.
What’s more, my impression is that https://github.com/egh/zotxt has, more or less, retired easykeys in favour of Better BibTeX keys (at least, easykeys are no longer mentioned on the project’s main page at all).
It might be worth checking with @egh, and if easykeys have indeed become obsolete, remove them from pandoc-zotxt.lua as well.
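The effect of the lookup order can be reproduced with a toy model. Everything here is hypothetical: the two items mirror the example above, the easykey matcher returns the first item whose author and year match and whose title contains the given word (an assumption based on the "any word from the title" description), and the Better BibTeX and item-key lookups are exact matches.

```python
import re
from typing import Callable, Dict, List, Optional

Item = Dict[str, str]
Lookup = Callable[[str], Optional[Item]]

# Hypothetical library mirroring the two "Doe, Jane (2020)" items above.
LIBRARY: List[Item] = [
    {"key": "AAAA1111", "betterbibtexkey": "doe:2020foo",
     "author": "doe", "year": "2020", "title": "Foo bar baz"},
    {"key": "BBBB2222", "betterbibtexkey": "doe:2020baz",
     "author": "doe", "year": "2020", "title": "Baz"},
]

EASYKEY = re.compile(r"^([a-z]+):(\d{4})([a-z]+)$")


def by_field(field: str) -> Lookup:
    """Exact-match lookup on one field (BBT key or Zotero item key)."""
    def lookup(citekey: str) -> Optional[Item]:
        return next((i for i in LIBRARY if i[field] == citekey), None)
    return lookup


def by_easykey(citekey: str) -> Optional[Item]:
    """Fuzzy lookup: first item by author and year whose title contains the word."""
    match = EASYKEY.match(citekey)
    if match is None:
        return None
    author, year, word = match.groups()
    for item in LIBRARY:
        if (item["author"] == author and item["year"] == year
                and word in item["title"].lower().split()):
            return item
    return None


def resolve(citekey: str, lookups: List[Lookup]) -> Optional[Item]:
    """Try each key type in order; the first hit wins."""
    for lookup in lookups:
        item = lookup(citekey)
        if item is not None:
            return item
    return None


easy_first = [by_easykey, by_field("betterbibtexkey"), by_field("key")]
easy_last = [by_field("betterbibtexkey"), by_field("key"), by_easykey]
print(resolve("doe:2020baz", easy_first)["title"])  # -> Foo bar baz
print(resolve("doe:2020baz", easy_last)["title"])   # -> Baz
```

Because the first hit wins, a fuzzy key type placed early can shadow an exact match further down the list; moving it last only affects keys that nothing else matches.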
The Zotero beta has been updated to v6.0, and this filter may not work with it.
I have a Markdown file test.md like this:
---
title: zotxt test
zotero-bibliography: bib.json
---
[@soper_LegalTheoryObligation_1977]
If I run pandoc like this:
pandoc test.md -o test.docx --standalone --lua-filter=pandoc-zotxt.lua --citeproc
then everything works fine. pandoc-zotxt creates a references file called bib.json and passes it to citeproc as a bibliography. The outcome is a properly formatted document with all citations in place, as expected.
However, if I add another citation to test.md, things go wrong: bib.json is not updated with the new reference.
Additionally, pandoc-zotxt no longer passes the reference file to citeproc: even the references that were processed properly before are now missing and have been replaced with the citation keys followed by question marks, which implies that citeproc cannot find the references. If I add bibliography=bib.json to the metadata, citeproc can find the file again but still doesn't process the citation(s) added after the reference file was first created, since that file is never updated.
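The behaviour the report expects, namely that keys cited after bib.json was first created still get added to it, can be sketched as a merge step. This is an illustration, not the filter's code: items are CSL JSON objects identified by their id field, fetch is a hypothetical stand-in for the Zotero lookup that returns None for unknown keys, and the file is rewritten (via a temporary file) only when something was added.

```python
import json
import os
from typing import Callable, Dict, List, Optional, Tuple

Item = Dict[str, object]
Fetch = Callable[[str], Optional[Item]]


def merge_items(existing: List[Item], citekeys: List[str],
                fetch: Fetch) -> Tuple[List[Item], List[str]]:
    """Return (all items, keys newly added), keeping existing entries as they are."""
    items = list(existing)
    have = {item["id"] for item in items}
    added: List[str] = []
    for key in citekeys:
        if key in have:
            continue
        item = fetch(key)
        if item is not None:
            items.append({**item, "id": key})  # store under the key the document uses
            have.add(key)
            added.append(key)
    return items, added


def update_bibliography(path: str, citekeys: List[str], fetch: Fetch) -> List[str]:
    """Read path if present, add missing items, and rewrite it via a temp file."""
    existing: List[Item] = []
    if os.path.exists(path):
        with open(path, encoding="utf-8") as handle:
            existing = json.load(handle)
    items, added = merge_items(existing, citekeys, fetch)
    if added:
        tmp_path = path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as handle:
            json.dump(items, handle, ensure_ascii=False, indent=2)
        os.replace(tmp_path, path)
    return added
```

Separating the pure merge from the file handling keeps the "only fetch what is missing" logic easy to check on its own.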
I have tested this with docx and html output, with the same result. I have also tried passing citeproc as a Lua filter rather than as a regular option; that doesn't make a difference either.
Thanks for a wonderful project. I'd be grateful for your thoughts.
Pandoc has introduced the option --resource-path to define a search path for files that are referenced in a document. pandoc-zotxt.lua ignores that option.
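Honouring the option would mostly mean resolving relative file names against each directory in the search path in turn. pandoc exposes that path to Lua filters as PANDOC_STATE.resource_path (worth confirming against the Lua filter documentation for your pandoc version). The sketch below models only the lookup, in Python, with a hypothetical function name; absolute paths are used as-is:

```python
import os
from typing import List, Optional


def find_in_resource_path(filename: str, resource_path: List[str]) -> Optional[str]:
    """Return the first existing match for filename, searching directories in order."""
    if os.path.isabs(filename):
        return filename if os.path.exists(filename) else None
    for directory in resource_path:
        candidate = os.path.join(directory, filename)
        if os.path.exists(candidate):
            return candidate
    return None


# pandoc's default resource path is the working directory, i.e. ["."].
print(find_in_resource_path("bib.json", ["."]))
```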
Thank you for all your work on this project.
I would appreciate your help using pandoc-zotxt.lua to generate output with correct citations.
Here is a basic test case:
---
zotero-bibliography: bib.json
---
Test citation
[@salzman_StrugglesPromisedLand_1997]
I expect that if I use pandoc-zotxt.lua, the BibTeX key will be looked up in my desktop Zotero instance and added to bib.json. I further expect the filter to add that bibliography to the metadata (based on the manual: "The bibliography file is added to the "bibliography" metadata field automatically"), so that the output contains a formatted citation.
In fact, the Lua filter runs and creates bib.json correctly but does not add it to the metadata as a bibliography file. The result is HTML output like this:
Test citation
salzman_StrugglesPromisedLand_1997?
If I add bib.json as "bibliography" to the YAML metadata myself, pandoc fails to run because it can't find the bibliography. Pandoc outputs:
File bib.json not found in resource path
Perhaps this is because pandoc tries to run citeproc before the Lua filter, so bib.json is not yet available.
If I run the command first with the filter but without the bibliography in the metadata, then add bib.json as bibliography and run it a second time, pandoc finds bib.json and correctly formats the citation, producing the desired output. But having to run the command twice, and to add the bibliography before the second run, is not the expected behavior and is quite tedious.
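The manual's statement that the file is "added to the "bibliography" metadata field automatically" amounts to a merge like the following. A simplified sketch on a plain dict rather than pandoc's Meta objects; the function name is hypothetical. It keeps any bibliographies the user already listed and avoids adding the same file twice:

```python
from typing import Dict, List, Union

Meta = Dict[str, Union[str, List[str]]]


def add_bibliography(meta: Meta, path: str) -> Meta:
    """Return a copy of meta whose 'bibliography' list includes path exactly once."""
    current = meta.get("bibliography")
    if current is None:
        bibliographies: List[str] = []
    elif isinstance(current, str):
        bibliographies = [current]
    else:
        bibliographies = list(current)
    if path not in bibliographies:
        bibliographies.append(path)
    return {**meta, "bibliography": bibliographies}


print(add_bibliography({"zotero-bibliography": "bib.json"}, "bib.json"))
# -> {'zotero-bibliography': 'bib.json', 'bibliography': ['bib.json']}
```

For citeproc to see the updated field, such a step has to run first; pandoc applies filters and citeproc in the order they appear on the command line, so the filter should come before --citeproc.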
Am I doing something wrong?
Many thanks for your advice.
A change to the latest dev version of pandoc (possibly connected with this commit) seems to break the pandoc-zotxt.lua filter.
MWE (prerequisites: Zotero is running, and @author:2020title is a valid citekey):
$ echo @author:2020title | /usr/local/bin/pandoc -L pandoc-zotxt.lua -F pandoc-citeproc
Using a pandoc binary installed via Homebrew into /usr/local/bin/, this generates the expected HTML-formatted in-text citation plus bibliography entry.
The latest dev version of pandoc, however, invoked with
$ echo @author:2020title | pandoc -L pandoc-zotxt.lua -F pandoc-citeproc
results in
Error running filter /Users/nick/.local/share/pandoc/filters/pandoc-zotxt.lua:
PandocCouldNotFindDataFileError "lunajson.lua"
stack traceback:
[C]: in ?
[C]: in function 'require'
/Users/nick/.local/share/pandoc/filters/pandoc-zotxt.lua:179: in main chunk
(pandoc-citeproc is the latest dev version in both cases.)
Any ideas?