Giter VIP home page Giter VIP logo

paperwhy's Introduction

Paperwhy

This is PaperWhy. Our sisyphean endeavour not to drown in the immense Machine Learning literature.

With thousands of papers every month, keeping up with and making sense of recent research in machine learning has become almost impossible. By routinely reviewing and reporting papers we help ourselves and hopefully someone else.

Requirements and setup

  • pandoc with pandoc-citeproc to generate the bibliography.
  • Hugo version 0.41 to locally generate the site. Breaking changes after this version make upgrading too much of a hassle.
  • Optional: TeXmacs with the markdown converter to write the posts in a sensible editor with typesetting.

In order to preview the site, from the source of the repo type

hugo server

and open a browser to localhost:1313.

All original content is (optionally) written as TeXmacs files inside tmdocs/, then exported to markdown using our TeXmacs2Markdown converter. If this hasn't been pushed to TeXmacs' trunk it can be easily installed following the instructions at the projects page. Directly writing markdown is of course also possible, see below.

Using docker

Alternatively to installing hugo, you can use the provided Dockerfile to first build an image:

docker build -f docker/Dockerfile -t paperwhy .

and then run hugo from a container with:

docker run --rm -it -v $(pwd):/site -p 1313:1313 paperwhy \
       serve --bind 0.0.0.0

Adding a new post using TeXmacs

Paths are relative to the root of the code.

  1. Write the original document in tmdocs/. The naming convention is to use the bibtex citekey for the paper as basename, e.g. turing_chemical_1952.tm. There is a TeXmacs style file with some default settings. See also below for how to set things up to get metadata automatically exported to markdown.
  2. Save any images inside static/img/.
  3. Add an entry to the Zotero database for the paper and any others that you are going to cite in the post. Export the database to bibtex as tmdocs/paperwhy.bib and cite as usual inside the TeXmacs document.
  4. Convert the bibtex to yaml with pandoc-citeproc -y tmdocs/paperwhy.bib > data/bibliography.yml.
  5. Export your posts as markdown into content/post/.
  6. Two things need manual fixing in the markdown for now: the paper_key field and paths to images, which need to be corrected to /img/whatever.

The extra field in Zotero (notes in the bibtex) can hold further fields. For now only code: http://urltowhatever (for any source code published with the paper) is used in the templates to automatically add a link to it next to the one to the original paper, authors, etc.

Supported Hugo features for TeXmacs documents

  • Paper authors: use the doc-data author tags as if adding a regular author to the document.
  • Post author: use the running-author macro.
  • Tags: use the \tags macro.
  • Shortcodes: use the \hugo-shortcode macro. To do: use xargs to accept a variable number of arguments.
  • Bibliography: Add the bibliography file and cite as usual.
  • Almost everything that Hugo supports can be converted from TeXmacs, including all kinds of text, lists, images, links and blackfriday extensions like footnotes and striked through text.
  • Furthermore, much of TeXmacs' non-dynamic markup is recognized and exported. In particular, labels and references, numbered environments, and figures should work out of the box.

Structure of a markdown post

Each markdown file has a header, the frontmatter, with:

author: Your full name here
authors: ["Repeat your full name here, I have to fix this"]
date: YYYY-mm-dd
tags: ["tags", "should-have", "no-whitespace", "use-dashes" ]
paper_authors: ["surname, name", "surname, name"]
paper_key: "bibtex_citekey"

Even though they can be extracted from the bibliography file, it is necessary to copy the authors to paper_authors. This is required for the "Papers by..." pages.

The name of the file should coincide with the bibtex citekey. Eventually we will drop the additional variable in the frontmatter.

Editing the markdown directly

Simple but tedious: use markdown directly with, e.g. easy-hugo for emacs. This makes browsing the posts easy and copies a frontmatter template from archetypes/default.md.

Citations in the markdown are done with a Hugo shortcode as follows:

This architecture was already tested in {{< cite bibtex_citekey_here>}}

This is replaced either by a standard "Paper title, Authors, Year", or possibly by a number and footnote, or in case the paper was the subject of a previous post by a link to it.

Videos can be embedded with another shortcode:

{{< youtube videoid >}}

Figures can be added with

{{< figure src="/img/citekey-fig1.svg" title="some caption" >}}

There's also a caption field, but looks worse in the current theme.

To do

  • Fix the generation of the post author's pages. {{ . Content }} is being ignored in the template. Should I place the _index.md elsewhere?

  • Generate lists of authors (and citekeys if I use them) upon each build. Use them when writing new posts, maybe autocomplete in easy-hugo?

  • Consolidate multiple assets in single files.

  • Ensure that all content is minified by aerobatic before being served and in case it doesn't write a Makefile.

  • Use local, stripped version of MathJax?

Meta

This site is built with HUGO based on the hikari theme. Branch master is automatically deployed onto netlify.

Credits

  • The many authors of the papers
  • HUGO
  • hikari theme (port and original author)
  • MathJax
  • jQuery
  • icomoon
  • And of course TeXmacs :)

paperwhy's People

Contributors

mdbenito avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.