uwdata / living-papers

Authoring tools for scholarly communication. Create interactive web pages or formal research papers from markdown source.

License: BSD 3-Clause "New" or "Revised" License

JavaScript 27.05% CSS 1.33% TeX 69.68% Makefile 1.64% PostScript 0.19% Handlebars 0.12%

living-papers's Issues

Assets are only copied before running `watch`, not during ongoing use of `watch`

Because copying assets to build/assets is done with a prewatch script, it only takes effect right when npm watch is run. If a user adds more assets, they have to restart the npm watch process. (Ideally, a user could run npm watch and then work on their document without having to mess around on the command line anymore.)

feature request: compile only some output formats

LaTeX output is slow, so I'd like a way to only compile HTML output as I'm writing. I can comment out the "latex" section in the metadata, but this clobbers my collaborators' environments. It would be nicer if there were a command-line argument to specify (as an override) which output formats I want – or something like that.
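
A rough sketch of how such an override might behave, assuming the `output` section of the metadata is available as a plain object keyed by format name. The `selectOutputs` helper and any command-line flag feeding it are hypothetical; nothing like this is confirmed to exist in Living Papers.

```javascript
// Hypothetical helper: keep only the output formats named in an override list.
// `outputs` stands in for the `output` section of the document metadata,
// e.g. { html: {...}, latex: {...} }. This is an assumption for illustration,
// not an existing Living Papers API.
function selectOutputs(outputs, only) {
  // No override given: build every format listed in the metadata.
  if (!only || only.length === 0) return outputs;

  // Fail loudly on names that the metadata does not define.
  const unknown = only.filter(name => !(name in outputs));
  if (unknown.length > 0) {
    throw new Error(`Unknown output format(s): ${unknown.join(', ')}`);
  }

  return Object.fromEntries(
    Object.entries(outputs).filter(([name]) => only.includes(name))
  );
}

console.log(selectOutputs({ html: {}, latex: {} }, ['html'])); // { html: {} }
```

Because the override only filters what the metadata already declares, the shared document would stay untouched and collaborators would keep their full set of outputs.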

LaTeX doesn't like image URLs with spaces

on account of the percent symbols, I reckon!

Array parameters to custom components

I'm trying to write a component that turns a list of strings into a bulleted list. Something like:

~~~ js { hide=true }
items = ["Milk", "Eggs", "Cheese"]
~~~

Today I am going shopping for:

::: bullet-list { items=`items` }
:::

I have implemented a custom component like this:

import { DependentElement } from "@living-papers/components";

export default class BulletList extends DependentElement {
  static get properties() {
    return {
      items: { type: Array },
    };
  }

  constructor() {
    super();
    this.items = [];
  }

  render() {    
    var ul = document.createElement("ul");
    this.items.forEach((item) => {
      var li = document.createElement("li");
      var p = document.createElement("p");
      p.textContent = item;
      li.appendChild(p);
      ul.appendChild(li);
    });
    return ul;
  }
}

However, the syntax and semantics of array parameters seem very flaky? I see that the option-text component uses a numeric array input that seems to work. But all of the following seem to bring up issues:

Using a block element instead of an inline element
Using an array of strings instead of an array of numbers
Using a variable to hold the array instead of passing a constant expression
Putting spaces between the elements of the array (!)

Where "issues" means either the program fails to parse, or the array gets stringified weirdly and the component has a null value for this.items.

Is there a better approach to what I'm trying to do?

`build` directory is not cleaned before build

Because the build directory is not cleared out before each build, old files can pile up there and build output is not as predictable / deterministic as it could be.

Bibliography is dumped when there are no citations (HTML output)

I copied a big bibliography in before adding any citations. Output looks like:

Once I add a single citation, this dump goes away:

(Weird edge case for sure.)

Support (Author, date) citation formats

Dear team,

This is a feature suggestion.
It would be great if in-text citation styles with author names and dates were supported out of the box. The styles are very popular in economics and business journals, among others.

Popular styles include

APA
Chicago
Harvard Style

An alternative would be to provide a short example of how to define and select styles for the LaTeX and HTML output formats, so that we can roll our own implementations.

Keep up the great work!
Best,
Christian

feature request: seeing intermediate translations

There's an option right now to output the final AST to a file. That's great! But there are a bunch of other intermediate forms in the translation process without similar options:

preprocessed Markdown (output of preprocess)
Pandoc's AST format (output of pandoc)
internal AST format (output of parsePandocAST, before transformAST has been called)

Making it easy to see these would help with understanding the living-papers internals, debugging new features, etc.

document workflow tips

I was happy to discover that Skim supports automatically reloading PDF files when they change on disk. A configuration change is required: see https://skim-app.sourceforge.io/manual/SkimHelp_38.html, but also select "Reload automatically". (Preview will sometimes do this, but it is unreliable, and you will often lose your place in the PDF.)

I think it would be helpful for living-papers to document some workflow niceties like this -- whatever is required to approximate the user experience of Overleaf, anyway.

feat idea: arXiv submission helper

Looks like it's a bit tricky to submit a paper to arXiv from Living Papers.

You need to extract source files from .temp/latex, preferably without build output files. (Although apparently some build output, like .bib, is desired...)
You need to do a bit of directory structure rearrangement, if .temp/latex/index.tex accesses figures from assets. Hidden directories are deleted!

Seems like it wouldn't be too hard to have an arXiv-submission output format, or helper script, or something.

Other fun things this format/helper could do:

Strip out comments from the tex file, since it gets archived.
Remove unused figures, since they get archived too.

Some info on what arXiv is looking for in a TeX submission.

reference numbers do not match between HTML and LaTeX output

I just noticed that reference numbers ([22] and such) do not match between HTML and LaTeX output. This isn't a huge surprise, since IIUC there are two separate pipelines for bibliography generation. I'm not sure it's a big deal, but it feels a bit troubling -- makes the HTML version not feel as much like a perfect substitute for the archival PDF version. Might be nice to see if syncing is possible, or to just put this on a list of "FYI" non-goals somewhere.

citations from Semantic Scholar without DOIs?

I've run into some papers I want to cite that don't seem to have DOIs (ex: https://www.semanticscholar.org/paper/The-Grail-System-Implementation-Ellis-Heafner/6c1779a0c6791609987b6481428022a39a9baadd). It looks like Semantic Scholar still has data on these papers accessible through its API. But I think Living Papers ultimately always looks up data via DOI.

Is it practical to use Semantic Scholar when DOIs aren't available? Or are we relying on a lot of DOI-specific machinery?

(Naturally, I could always just get a BibTeX citation and put it into my document by hand. Just curious about automation!)

feature request: live reloading of HTML output

I like having my paper open in a browser window while I write, and occasionally skimming the freshly-rendered HTML output. This requires some kind of live-reloading setup.

For the moment, I've hacked in https://livejs.com/, which I like just fine. I haven't made this production-ready yet. One big question would be how to make this a development-time-only option. See also: #25.

Open to any thoughts & suggestions!

JS formatting is too strict

I thought I'd try out JS formatting. However, with the code:

import extractSynonymsProgram from "./extractSynonymsProgram.json";

as a ```js block, I got an error:

file:///Users/joshuah/Documents/coding/living-papers/node_modules/acorn/dist/acorn.mjs:3454
  var err = new SyntaxError(message);
            ^

SyntaxError: Unexpected token (1:7)
    at pp$4.raise (file:///Users/joshuah/Documents/coding/living-papers/node_modules/acorn/dist/acorn.mjs:3454:13)
    at CellParser.unexpected (file:///Users/joshuah/Documents/coding/living-papers/node_modules/@observablehq/parser/src/parse.js:207:10)
    at pp$9.expect (file:///Users/joshuah/Documents/coding/living-papers/node_modules/acorn/dist/acorn.mjs:749:26)
    at CellParser.parseImportSpecifiers (file:///Users/joshuah/Documents/coding/living-papers/node_modules/@observablehq/parser/src/parse.js:68:10)
    at CellParser.parseImport (file:///Users/joshuah/Documents/coding/living-papers/node_modules/@observablehq/parser/src/parse.js:55:28)
    at CellParser.parseCell (file:///Users/joshuah/Documents/coding/living-papers/node_modules/@observablehq/parser/src/parse.js:131:19)
    at CellParser.parseTopLevel (file:///Users/joshuah/Documents/coding/living-papers/node_modules/@observablehq/parser/src/parse.js:180:17)
    at CellParser.parse (file:///Users/joshuah/Documents/coding/living-papers/node_modules/acorn/dist/acorn.mjs:584:15)
    at CellParser.parse (file:///Users/joshuah/Documents/coding/living-papers/node_modules/acorn/dist/acorn.mjs:634:35)
    at parseCell (file:///Users/joshuah/Documents/coding/living-papers/node_modules/@observablehq/parser/src/parse.js:22:23) {
  pos: 7,
  loc: Position { line: 1, column: 7 },
  raisedAt: 29
}

I suppose living papers is trying to parse my code with Observable's parser? Doesn't surprise me that my code isn't runnable. But I just want to format it.

(Maybe there's an option for this already?)

P.S. no worries with this issue; I don't really need colored code for my paper.

feature request: output-specific blocks

I'd like to be able to define blocks whose contents render only for a specific output format. Note that this is different from the notation described at https://pandoc.org/MANUAL.html#extension-raw_attribute, because the contents of this block would continue to be Markdown syntax in need of parsing, rather than raw output-specific syntax.

I've partially hacked this in. I use blocks that look like ::: {.html-only}. Then I add special case checks at the top of various output methods. For instance, I put:

if (hasClass(ast, 'html-only')) {
  return undefined;
}

towards the top of tex in tex-format.js.

But this isn't the right place for this check. Really, the first thing the tex-output-specific code should do is go through the AST and prune out these nodes. Problems are caused by delaying this. For instance, the LaTeX image converter goes through the AST to find images to convert. Images inside of ::: {.html-only} blocks should be ignored, but they're not.
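
The pruning pass described above could look roughly like the following. This is a sketch over a simplified, assumed node shape (`{ name, classes, children }`), not the actual Living Papers AST, and `pruneForOutput` is a hypothetical name.

```javascript
// Assumed, simplified node shape (NOT the real Living Papers AST):
//   { name: 'div', classes: ['html-only'], children: [ ...nodes ] }
// Returns a copy of the tree without subtrees marked `<other>-only`.
function pruneForOutput(node, outputName, allOutputs = ['html', 'latex']) {
  // Leaf nodes have nothing to prune.
  if (!Array.isArray(node.children)) return node;

  // Classes that mark content meant for some other output format.
  const excluded = allOutputs
    .filter(name => name !== outputName)
    .map(name => `${name}-only`);

  const children = node.children
    .filter(child => !(child.classes || []).some(c => excluded.includes(c)))
    .map(child => pruneForOutput(child, outputName, allOutputs));

  // Copy rather than mutate, so other output pipelines see the full tree.
  return { ...node, children };
}
```

Running a pass like this before anything else in the LaTeX pipeline would mean later steps, such as the image converter, never see images inside `html-only` blocks.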

Use 'latex' as raw format rather than 'tex'

Currently, Living Papers assumes raw tex code will be labeled with format 'tex' rather than 'latex'. This deviates from Pandoc's built-in / documented practice. Living Papers needs to explicitly rewrite the format:

living-papers/packages/compiler/src/plugins/include/index.js

Line 56 in 804b722

const format = raw === 'latex' ? 'tex' : raw;

Also, the Pandoc documentation refers to using {=latex} for a raw LaTeX block.

Not sure why we should use 'tex' rather than 'latex', since 'latex' is the name we use for the output format elsewhere (like in latex:only, etc.). At the very least, I'd like this condition to include 'latex':

living-papers/packages/compiler/src/output/latex/tex-format.js

Lines 385 to 388 in 804b722

 raw(ast) {
   const format = getPropertyValue(ast, 'format');
   if (format !== 'tex') {
     return '';

Happy to make the change if desired.
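
For illustration, the relaxed condition might amount to something like this. `rawTex` is a hypothetical stand-in for the check inside `raw(ast)`, written so it runs on its own; it is not the code in tex-format.js.

```javascript
// Raw format labels treated as LaTeX source. 'latex' is the label Pandoc
// documents; 'tex' is the one this issue says Living Papers expects today.
const TEX_FORMATS = new Set(['tex', 'latex']);

// Hypothetical stand-in for the check in raw(ast): pass the raw content
// through for LaTeX-flavored blocks and emit nothing for any other format.
function rawTex(format, content) {
  return TEX_FORMATS.has(format) ? content : '';
}

console.log(rawTex('latex', '\\newpage')); // \newpage
```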

Feature request: Configurable typesetting engine (like xelatex or lualatex)

When I try to use Living Papers with the PLATEAU LaTeX class, LaTeX tells me:

! Fatal Package fontspec Error: The fontspec package requires either XeTeX or
(fontspec)                      LuaTeX.
(fontspec)
(fontspec)                      You must change your typesetting engine to,
(fontspec)                      e.g., "xelatex" or "lualatex" instead of
(fontspec)                      "latex" or "pdflatex".

Switching from pdflatex to xelatex in the LP code seems to fix things. Would be great to have this as an option, if that doesn't open too many cans containing too many worms.

Support clickable URLs in references

References can have links to DOIs or the original source. Right now, these are not clickable in the generated HTML.

footnotes disappear when browser window is too narrow

Footnotes currently appear in the margins, which is great. But if the browser window is too narrow (half screen width, mobile, etc) they disappear completely. It would be nicer if they appeared below the paragraph, or in an on-click popup, or something like that.

feature request: appendices

Full support would be nice. If not, it would be nice to control the location of the bibliography, so that you could write

[[[syntax that says bibliography goes here]]]

\appendix
# My Appendix

and get the right LaTeX output.

preprocess fails when attributes are used in nested fenced divs

::: div1 {a=b}
::: div2
:::
:::

preprocesses to

::: {.div1 a=b}
::: div2
:::
:::

which is expected. But

::: div1
::: div2 {a=b}
:::
:::

preprocesses to

::: div1
::: div2 {a=b}
:::
:::

which pandoc can't parse.

I think I can route around this for now by only using the ::: {.div2 a=b} form in nested divs.
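
A line-based rewrite that does not care about nesting depth might be enough here. The sketch below is an assumption about how such a step could work, not the existing preprocess code; `normalizeFencedDivs` is a hypothetical name, and fenced code blocks are not treated specially.

```javascript
// Matches a fence line of the form `::: name {attrs}` (three or more colons).
const FENCE = /^(\s*:{3,})\s*([A-Za-z][\w-]*)\s*\{([^}]*)\}\s*$/;

// Rewrite `::: name {attrs}` to `::: {.name attrs}` on every matching line,
// whether the div is nested or not. Other lines pass through unchanged.
function normalizeFencedDivs(markdown) {
  return markdown
    .split('\n')
    .map(line => {
      const m = FENCE.exec(line);
      if (!m) return line;
      const [, colons, name, attrs] = m;
      // Trim so that an empty attribute list yields `{.name}`.
      return `${colons} {${`.${name} ${attrs.trim()}`.trim()}}`;
    })
    .join('\n');
}

console.log(normalizeFencedDivs('::: div1\n::: div2 {a=b}\n:::\n:::'));
```

Since the name is folded into the attribute block as a class, a line such as `::: my-component {.my-class}` would come out as `::: {.my-component .my-class}`.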

Feature request: Set figure width (in LaTeX, at least)

As far as I can tell, it's hard-coded to \linewidth right now.

feature request: generate LaTeX source without running pdflatex

Use cases:

debugging LaTeX generation
I have a script that does word-count from LaTeX source; I'd like to run it as I write without spending 10 seconds generating a PDF.

Not super-common use-cases, but it would be nice. More generally, it's interesting to ask whether a living-papers executable should be a one-stop shop (a self-contained build system), or whether it could be more versatile as a UNIX-philosophy "do one thing well" process, to be joined with others in an external build system.

[feature request] "anchors of Headers" support for markdown

Dear iDL,

I wanted to take a moment to express my appreciation for the work you have done. I'm a big fan of your Vega language. I have been using it for several years and I am consistently impressed with its quality and functionality.

Several days ago, I was looking for something to build an interactive page with. After a lucky search, I found this project. It is exactly what I wanted: it makes building interactive web pages much easier.

I was wondering if it would be possible to add support for header anchors. This feature is not very common in papers, but it is very common on the web. If the anchor exists, I can write [create an anchor](#anchors-in-markdown) to link to a specific section. Users could then create a ToC (table of contents) by hand, or create a link somewhere that points to that section.

This would make it much easier for users to quickly browse and navigate through the content. I know this would take some extra effort, but I believe it would be a valuable addition to the project.

Thank you again for all of your hard work and dedication to this project.

Failing to build using template with "style is not iterable" error

Hello, I was just trying to use the template for the first time but running into a build error.

OS and Environment:
Ubuntu 22.04.1
Node 16.4.0
npm 8.19.2
pandoc 2.9.2.1
R 4.2.1
Python 3.10.6

Steps to reproduce:

Go to https://github.com/uwdata/living-papers-template/
Download zipfile
Extract content
Move to extracted content in terminal
Run npm i
Run npm run build

Error:

~/active/personal/living-papers-template-main
❯ npm run build

> @living-papers/[email protected] prebuild
> npm run assets


> @living-papers/[email protected] assets
> mkdirp build && cp -r assets build


> @living-papers/[email protected] build
> lpub -o build --tempDir='.temp' index.md

file:///home/eva/active/personal/living-papers-template-main/node_modules/@living-papers/compiler/src/parse/parse-pandoc-ast.js:452
      const [ {t: align} /*, {t: width} */ ] = style;
                                               ^

TypeError: style is not iterable
    at file:///home/eva/active/personal/living-papers-template-main/node_modules/@living-papers/compiler/src/parse/parse-pandoc-ast.js:452:48
    at Array.map (<anonymous>)
    at PandocASTParser.tableColumnStyles (file:///home/eva/active/personal/living-papers-template-main/node_modules/@living-papers/compiler/src/parse/parse-pandoc-ast.js:451:19)
    at PandocASTParser.parseTable (file:///home/eva/active/personal/living-papers-template-main/node_modules/@living-papers/compiler/src/parse/parse-pandoc-ast.js:398:25)
    at PandocASTParser.parseBlocks (file:///home/eva/active/personal/living-papers-template-main/node_modules/@living-papers/compiler/src/parse/parse-pandoc-ast.js:154:24)
    at file:///home/eva/active/personal/living-papers-template-main/node_modules/@living-papers/compiler/src/parse/parse-pandoc-ast.js:376:29
    at Array.map (<anonymous>)
    at PandocASTParser.parseEnv (file:///home/eva/active/personal/living-papers-template-main/node_modules/@living-papers/compiler/src/parse/parse-pandoc-ast.js:365:30)
    at PandocASTParser.parseDiv (file:///home/eva/active/personal/living-papers-template-main/node_modules/@living-papers/compiler/src/parse/parse-pandoc-ast.js:353:14)
    at PandocASTParser.parseBlocks (file:///home/eva/active/personal/living-papers-template-main/node_modules/@living-papers/compiler/src/parse/parse-pandoc-ast.js:166:24)

feature request: viewing LaTeX logs

Would be nice to pipe the pdflatex & bibtex output into a log file you could look at.

Commenting out lines breaks paragraphs

(Pretty sure this is a Pandoc-Markdown issue, not LP, but it's annoying enough to me that I wouldn't mind fixing it in LP.)

I split my paragraphs into one-sentence-per-line:

Sentence 1.
Sentence 2.
Sentence 3.

I comment out the second sentence:

Sentence 1.
<!-- Sentence 2. -->
Sentence 3.

Now it's formatted as two paragraphs! No good.
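
If this were fixed on the LP side, one option is to drop comment-only lines before the Markdown reaches Pandoc. A minimal sketch, assuming a plain string pass over the source; `stripCommentLines` is a hypothetical name, and multi-line comments and comments inside code blocks are not handled.

```javascript
// A line that holds exactly one single-line HTML comment and nothing else.
const COMMENT_LINE = /^\s*<!--(?:(?!-->).)*-->\s*$/;

// Remove comment-only lines so the neighboring sentences stay adjacent
// and are still read as one paragraph.
function stripCommentLines(markdown) {
  return markdown
    .split('\n')
    .filter(line => !COMMENT_LINE.test(line))
    .join('\n');
}

console.log(stripCommentLines('Sentence 1.\n<!-- Sentence 2. -->\nSentence 3.'));
```

Lines that mix a comment with other text are left alone, so only whole commented-out sentences disappear.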

LaTeX can't access images with relative paths

If you include a (non-svg) image with a relative path, generated LaTeX won't be able to find it because the LaTeX lives in .temp/latex and the paths are relative to the root.

"false positive" DOI not found warnings?

I've been getting some warnings in terminal output, like:

Semantic Scholar: Paper with id DOI:10.1145/2499370.2462170 not found
Semantic Scholar: Paper with id DOI:10.1109/icse.2012.6227133 not found

Weird thing is, the PDF is rendering fine, with the aforementioned articles in the bibliography and everything. So I'm not sure what's happening here.

(Haven't tried testing the problem in isolation yet.)

TeX code in bibliography gets mangled

If I put {$C^{\infty}$} in a title in a bibliography (as described here), the .bib file output has {C}\textsuperscript{\textbackslash{}infty} in its place.

citation parser isn't general enough (e.g. `@doi:10.1016/S1045-926X(05)80012-6`)

"10.1016/S1045-926X(05)80012-6" is a valid DOI (see https://doi.org/10.1016/S1045-926X(05)80012-6), but when I put @doi:10.1016/S1045-926X(05)80012-6 or [@doi:10.1016/S1045-926X(05)80012-6] in my document, the system says

Citation doi lookup failed: 10.1016/S1045-926X

I suppose the parser doesn't support a broad enough range of characters, so it's stopping at the (?
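
As a sketch of a more permissive approach: take everything up to whitespace or a closing bracket, then trim punctuation that is unlikely to belong to the DOI. The `extractDoi` function is hypothetical and is not how the Living Papers citation parser is confirmed to work.

```javascript
// Count occurrences of a single character in a string.
const count = (s, ch) => s.split(ch).length - 1;

// Extract a DOI from a citation key such as `@doi:10.1016/S1045-926X(05)80012-6`.
function extractDoi(text) {
  // Prefix "10.", a numeric registrant code, "/", then a suffix that runs
  // until whitespace or the `]` that closes a bracketed citation.
  const m = /doi:(10\.\d{4,9}\/[^\s\]]+)/i.exec(text);
  if (!m) return null;

  // Sentence punctuation directly after the key is probably not part of it.
  let doi = m[1].replace(/[.,;:]+$/, '');

  // Drop a trailing ')' only while the parentheses are unbalanced, so that
  // `(05)` inside the DOI survives but a wrapping `(... )` does not.
  while (doi.endsWith(')') && count(doi, ')') > count(doi, '(')) {
    doi = doi.slice(0, -1);
  }
  return doi;
}

console.log(extractDoi('[@doi:10.1016/S1045-926X(05)80012-6]')); // 10.1016/S1045-926X(05)80012-6
```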

bug: fenced divs with non-existent components /sometimes/ work

It looks like a fenced div with a made-up component name creates a div with that name as a class. Ex:

::: my-component
Text
:::

compiles to

<div class="my-component">
Text
</div>

That's nice. However, adding a class, like

::: my-component {.my-class}
Text
:::

breaks this. (It just outputs some <p>s and w/e.)

This is despite the fact that

::: {.my-component .my-class}
Text
:::

compiles to

<div class="my-component my-class">
Text
</div>

just fine.

(This behavior is confusing. I stumbled onto the ::: my-component notation early on, liked it, and was quite confused when it failed to accept classes.)

Improve convert image snapshots

There are some lingering issues with snapshots of page elements. The pdf output is inconsistent with png/jpg output. Each has strengths and weaknesses. Ideally we would get consistent output with all the strengths and none of the weaknesses...

The bitmap snapshots do not perform resizing so can capture unnecessary white space of parent container elements. The PDF output, in contrast, includes transformation of element sizes by re-styling block elements to use display: inline-block to ensure the element sizing is driven by child content. It also re-styles margins to avoid undesirable clipping.
The PDF output uses extracted HTML and styles; however, this is not sufficient to capture the page content, as the extracted HTML may not re-generate the correct page state. For example, an extracted HTML canvas will not include the canvas pixel content or rendering code, resulting in a blank canvas. Similarly, extracted form elements may not preserve the current values of the form input elements.

We could consider an alternative approach that uses the same preparations for both vector and bitmap outputs. We would want to load the page and capture the "live" page state. One idea is to inject JS code into the loaded page to change styles, hide non-snapshot content (e.g., display: none), and perform sizing / margin adjustments. We could then take a (bounding box cropped) PDF or bitmap screenshot. It would be ideal to avoid re-loading the page for each snapshot, so we could look at ways to apply and then undo such styling transformations. Either way, as a subsequent optimization a conversion plan might also generate a filtered AST with only the elements we want to snapshot, thereby avoiding processing and rendering all the other page contents.

@mathisonian Any reactions or other ideas?

feature request: subfigures

As described at https://www.overleaf.com/learn/latex/How_to_Write_a_Thesis_in_LaTeX_(Part_3)%3A_Figures%2C_Subfigures_and_Tables. Pretty sure I'll want this for my paper. Might hack something in.

cite-ref fails when given name is missing

This line fails when given is missing (say, the author is a company):

living-papers/packages/components/src/cite-ref.js

Line 119 in 1b3f0bb

 const aMap = author.map(({ given, family }) => `${given.includes('.') ? given:given[0] + '.'} ${family}`); 
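
A guarded version might look like the sketch below. It assumes CSL-style author objects, where an organization may carry only `family` or a `literal` name; `formatAuthor` is a hypothetical helper, not the code in cite-ref.js.

```javascript
// Format one CSL-style author entry as "G. Family", tolerating a missing
// given name (for example, when the author is a company).
function formatAuthor({ given, family, literal }) {
  // No given name: fall back to whichever name field is present.
  if (!given) return family || literal || '';

  // Keep names that already contain initials; otherwise abbreviate.
  const initial = given.includes('.') ? given : `${given[0]}.`;
  return family ? `${initial} ${family}` : initial;
}

console.log(formatAuthor({ given: 'Ada', family: 'Lovelace' })); // A. Lovelace
console.log(formatAuthor({ family: 'ACME Corp' }));              // ACME Corp
```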

Trouble with ampersands in bibliography URLs

I think you're supposed to escape ampersands in bibliography URLs. (Seems like LaTeX yells at me when I don't.)

But something in LP's bibliography pipeline messes them up. In my .md:

@misc{wikipedia-grafting,
  author = "{Wikipedia contributors}",
  title = "Grafting --- {W}ikipedia{,} The Free Encyclopedia",
  year = "2022",
  url = "https://en.wikipedia.org/w/index.php?title=Grafting\&oldid=1095365064",
  note = "[Online; accessed 01-September-2022]"
}

In the output .bib:

@misc{wikipedia-grafting,
	author = {{Wikipedia contributors}},
	year = {2022},
	note = {[Online; accessed 01-September-2022]},
	title = {Grafting --- {Wikipedia}, {The} {Free} {Encyclopedia}},
	howpublished = {https://en.wikipedia.org/w/index.php?title=Grafting%5C&oldid=1095365064},
}

There's actually a bunch of things getting twiddled with here, which is a bit surprising. But the important bit is that \& becomes %5C&, which doesn't work.
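
One possible workaround, sketched under the assumption that the url field reaches the pipeline with its LaTeX escapes intact: undo those escapes before the value is treated as a URL. `unescapeBibUrl` is a hypothetical helper, not part of LP.

```javascript
// Undo LaTeX-style escapes (\&, \%, \#, \_) in a BibTeX url field, so that
// `\&` is read as a plain `&` and the backslash is not percent-encoded as `%5C`.
function unescapeBibUrl(url) {
  return url.replace(/\\([&%#_])/g, '$1');
}

console.log(unescapeBibUrl('https://en.wikipedia.org/w/index.php?title=Grafting\\&oldid=1095365064'));
```

Whether the ampersand then needs escaping again on the way out to the .bib file depends on how the bibliography style typesets URLs, which this sketch does not address.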

ellipsis renders weirdly in LaTeX

... in md (three characters) turns into … in the LaTeX source (a single character), which in turn renders incorrectly in the LaTeX output.

controlling whether new paragraph starts after list

When writing in LaTeX, I'm accustomed to paying attention to whether or not a list is "inside" a paragraph. This is controlled with newlines after the list.

For example:

Here's a paragraph
\begin{enumerate}
\item with
\item a
\item list inside it.
\end{enumerate}
And the paragraph continues!

Here's another paragraph
\begin{enumerate}
\item with
\item a
\item list inside it.
\end{enumerate}

But this paragraph is new!

renders to

This doesn't seem possible to control with Markdown. It looks like the generated LaTeX always starts a new paragraph after the list.

Maybe I'm the only person in the world who has ever paid attention to this? But I definitely notice it. It feels weird and choppy to always start a new paragraph after a list, even if the list is part of the flow of the paragraph. E.g., imagine you want a single final sentence of conclusion in the paragraph, after the list.

(Another twist: Ideally I'd want control over this for both LaTeX and HTML output, but it is apparently illegal to put ol/ul inside p tags. So I have no clue what HTML representation is best to use here.)
