arxiv-collector's Introduction

A small script to collect your LaTeX files for submission to the arXiv. Particularly useful if you use biblatex, and you can use it directly on Overleaf.

Usage

Install with pip install arxiv-collector or conda install -c conda-forge arxiv-collector – or just download arxiv_collector.py, it's a stand-alone script with no dependencies. Works with any reasonable version of Python 3, or 2.7 if you really must.

Run arxiv-collector from your project's main directory, or arxiv-collector file.tex if you have more than one .tex file and it can't guess correctly which one to use; see arxiv-collector --help for more options.

Main features:

  • By default, strips potentially-embarrassing comments from your uploaded .tex files. (Use --no-strip-comments to turn this off; it's based on a regular expression, and it's definitely possible for it to screw up, especially if you use % in a verbatim block or something.)

  • Includes the necessary parts of any system package you tell it to upload. By default, this includes biblatex (if you use it) to avoid errors like

Package biblatex Warning: File '.bbl' is wrong format version

  • Only uploads things you actually use: if you have an image you're not including anymore or whatever, doesn't upload it.
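To make the comment-stripping caveat concrete, here is a minimal sketch of a regex-based stripper. It is not the regular expression arxiv-collector actually uses; the name strip_comment_sketch and the pattern are assumptions for illustration only. It keeps the % itself (so TeX's end-of-line handling is unchanged) and leaves an escaped \% alone, but it knows nothing about verbatim blocks, which is exactly the failure mode described above.

```python
import re

# Hypothetical pattern, NOT the one used by arxiv_collector.py.
# An unescaped % is one preceded by an even number of backslashes
# (possibly zero); everything after it on the line is dropped.
COMMENT_RE = re.compile(r"(?<!\\)((?:\\\\)*)%.*")


def strip_comment_sketch(line):
    """Remove the comment text from one line of TeX, keeping the % itself."""
    return COMMENT_RE.sub(r"\1%", line)


if __name__ == "__main__":
    print(strip_comment_sketch("10\\% of cases % TODO: check this number\n"), end="")
    # A verbatim block containing % would be damaged by this approach.
```

Because the approach works line by line with no notion of environments, turning it off with --no-strip-comments is the safer choice for documents that use % inside verbatim or listings blocks.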

Requirements:

  • A working installation of latexmk, on your PATH. (This is used to make the .bbl file and to track which files are used.)
    • If you have working TeX and Perl installations, you likely already have latexmk even if you don't use it. If you don't, you can either install it the "normal" way (tlmgr install latexmk, apt-get install latexmk, ...), or just grab the script with arxiv-collector --get-latexmk path/to/output/latexmk.
    • If latexmk isn't on your PATH for whatever reason, add --latexmk ./path/to/latexmk to your arxiv-collector call.
    • NOTE: latexmk version 4.63b has broken dependency tracking, which means arxiv-collector won't work with it. You can either update it with your package manager, or you can get a working version, e.g. 4.64a, with arxiv-collector --get-latexmk path/to/output/latexmk, and either put it in e.g. ~/bin or pass --latexmk to your arxiv-collector invocations.

Caveats

The script may or may not work if you do something weird with your project layout / etc; always check that the arXiv output pdf looks right. Let me know if you run into any problems, including a copy of the not-working project if possible.

In particular, if you include figures or other files with absolute paths (\includegraphics{/home/me/wow.png} instead of \includegraphics{../wow.png}), the script will think it's a system file and not include it by default. You can hack it with --include-packages to include any directory name in the path.
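One way to spot this situation before running the script is to scan the source for absolute paths. The sketch below is a hypothetical helper, not part of arxiv-collector; the function name and the regular expression are assumptions, and it only looks at \includegraphics with a simple optional [...] argument.

```python
import ntpath
import posixpath
import re

# Hypothetical helper: matches \includegraphics[...]{path} and captures the path.
INCLUDE_RE = re.compile(r"\\includegraphics(?:\[[^\]]*\])?\{([^}]*)\}")


def absolute_graphics_paths(tex_source):
    """Return the \\includegraphics paths that are absolute (POSIX or Windows style)."""
    paths = INCLUDE_RE.findall(tex_source)
    return [p for p in paths if posixpath.isabs(p) or ntpath.isabs(p)]


if __name__ == "__main__":
    sample = r"""
    \includegraphics{/home/me/wow.png}
    \includegraphics[width=2cm]{../wow.png}
    """
    for path in absolute_graphics_paths(sample):
        print("absolute path, likely to be skipped:", path)
```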

Using directly on Overleaf

It's easy to set up Overleaf to run the script on each compilation, so that you're always ready to upload to arXiv at a moment's notice! (You can of course comment out or remove the lines below after running it once, but it shouldn't add much overhead to just do it every time.)

First, add arxiv_collector.py to your project. You can do "New file", "From external url", then put in https://raw.githubusercontent.com/djsutherland/arxiv-collector/master/arxiv_collector.py.

Now, add a file called .latexmkrc if you don't have one already. This is a control file that tells latexmk how to compile your project (which is what Overleaf uses behind the scenes). If you use something slightly complicated like an index or a glossary, you might need to add in Overleaf's default settings file, which this will override, but for 95% of projects you don't need to worry about this.

Add to the .latexmkrc file (whether you're starting from blank or from Overleaf's default, doesn't matter) the following contents:

$dependents_list = 1;
$deps_file = ".deps";

END {
  system("python arxiv_collector.py --latexmk-deps $deps_file");
}

Now, after you compile, you can download arxiv.tar.gz by clicking on the blue page icon to the right of the big green Recompile button ("Logs and output files"), clicking on "Other logs & files", then choosing arxiv.tar.gz. Upload that file to the arXiv, and you should be good!
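Before uploading, it can be worth checking what actually ended up in the archive. The following is a small sketch using only Python's standard tarfile module; the helper names are made up for this example, and the demo archive is built in memory so the snippet runs without a real arxiv.tar.gz.

```python
import io
import tarfile


def list_archive_members(fileobj):
    """Return the sorted member names of a gzip-compressed tar archive."""
    with tarfile.open(fileobj=fileobj, mode="r:gz") as tar:
        return sorted(tar.getnames())


def build_demo_archive():
    """Build a tiny in-memory .tar.gz so the example is self-contained."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        data = b"\\documentclass{article}\n"
        info = tarfile.TarInfo(name="main.tex")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    # With a real download, pass open("arxiv.tar.gz", "rb") instead.
    for name in list_archive_members(build_demo_archive()):
        print(name)
```

An archive with no members, or one missing the main .tex file, is a sign that the hook did not run as expected.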

arxiv-collector's People

Contributors

ast0815, djsutherland, ryanakca, scopatz

arxiv-collector's Issues

arXiv still loads own biblatex version

Hello,

I am trying to use your script to get around the dreaded "Package biblatex Warning: File 'article.bbl' is wrong format version - expected 2.8." error on the arXiv. I am using MikTeX on Windows to write my paper.

Using your script works fine and I managed to create an arxiv.tar.gz file by calling arxiv-collector article. Unfortunately, when I upload the resulting file to the arXiv, the build still fails. It seems like the builder on arXiv is still using its own biblatex version:

 (/texlive/2016/texmf-dist/tex/latex/biblatex/biblatex.sty

Could this be related to my Windows setup? The arxiv.tar.gz contains a somewhat strange folder structure, including paths that start with C:/.

Bad output of latexmk --version

I swear this worked a month or two ago (running on MacOS Catalina) but now arxiv-collector main.tex produces the following error:

Traceback (most recent call last):
  File "/Users/nrbeaton/anaconda3/bin/arxiv-collector", line 10, in <module>
    sys.exit(main())
  File "/Users/nrbeaton/anaconda3/lib/python3.6/site-packages/arxiv_collector.py", line 353, in main
    version = get_latexmk_version(args.latexmk)
  File "/Users/nrbeaton/anaconda3/lib/python3.6/site-packages/arxiv_collector.py", line 77, in get_latexmk_version
    raise ValueError("Bad output of {} --version:\n{}".format(latexmk, out))
ValueError: Bad output of latexmk --version:
b'\nLatexmk, John Collins, 26 Dec. 2019. Version 4.67\n'

UnicodeDecodeError on Windows when stripping comments

My arxiv-collector version is:

0.4.1

Debugging output:

arxiv-collector --debug main.tex
Building main...
.deps already exists...
Running ['latexmk', '-silent', '-pdf', '-deps', '-deps-out=.deps-d', 'main']
External Perl missing or outdated. Please install a recent Perl,
or configure TeX Live to always use the builtin Perl:
tlmgr conf texmf TEXLIVE_WINDOWS_TRY_EXTERNAL_PERL 0
Meanwhile, continuing with built-in Perl...

Latexmk: Run number 1 of rule 'pdflatex'
This is pdfTeX, Version 3.14159265-2.6-1.40.21 (TeX Live 2020/W32TeX) (preloaded format=pdflatex)
restricted \write18 enabled.
entering extended mode
Latexmk: Run number 2 of rule 'pdflatex'
This is pdfTeX, Version 3.14159265-2.6-1.40.21 (TeX Live 2020/W32TeX) (preloaded format=pdflatex)
restricted \write18 enabled.
entering extended mode

Dependencies in .deps-d
Gathering outputs...
Deps file .deps-d: source main, base name main, output main.pdf, jobname main
Processing c:/texlive/2020/texmf-dist/fonts/type1/public/amsfonts/cm/cmr10.pfb ...
Processing c:/texlive/2020/texmf-dist/tex/latex/base/article.cls ...
Processing c:/texlive/2020/texmf-dist/tex/latex/base/inputenc.sty ...
Processing c:/texlive/2020/texmf-dist/tex/latex/base/size10.clo ...
Processing c:/texlive/2020/texmf-dist/tex/latex/l3backend/l3backend-pdfmode.def ...
Processing c:/texlive/2020/texmf-dist/web2c/texmf.cnf ...
Processing c:/texlive/2020/texmf-var/fonts/map/pdftex/updmap/pdftex.map ...
Processing c:/texlive/2020/texmf-var/web2c/pdftex/pdflatex.fmt ...
Processing c:/texlive/2020/texmf.cnf ...
Processing main.tex ...
Traceback (most recent call last):
  File "c:\users\karlson\anaconda3\envs\arxiv\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "c:\users\karlson\anaconda3\envs\arxiv\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\Karlson\Anaconda3\envs\arxiv\Scripts\arxiv-collector.exe\__main__.py", line 7, in <module>
  File "c:\users\karlson\anaconda3\envs\arxiv\lib\site-packages\arxiv_collector.py", line 491, in main
    collect(
  File "c:\users\karlson\anaconda3\envs\arxiv\lib\site-packages\arxiv_collector.py", line 261, in collect
    for line in f:
  File "c:\users\karlson\anaconda3\envs\arxiv\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 93: character maps to <undefined>

This is a minimum failing example (file is utf8 encoded without BOM):

\documentclass{article}
\usepackage[utf8]{inputenc}

\begin{document}
“This is a test”
\end{document}

main.zip
I am compiling this on a Windows machine and it fails here:

with io.open(dep) as f, io.BytesIO() as g:
    tarinfo = tarfile.TarInfo(name=dep)
    for line in f:
        g.write(strip_comment(line).encode("utf-8"))

Judging by the error message, I assume that python wants to open the file as Windows-1252 file, since no encoding is provided.
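The diagnosis can be reproduced in a few lines. This is a sketch of one possible remedy (passing an explicit encoding to io.open), not necessarily the fix adopted in arxiv-collector; the helper names are invented for the example, and it assumes the .tex sources are UTF-8, as in the failing example above.

```python
import io
import os
import tempfile


def write_demo_file():
    """Write a UTF-8 file (no BOM) containing curly quotes, like the failing example."""
    fd, path = tempfile.mkstemp(suffix=".tex")
    with os.fdopen(fd, "wb") as f:
        f.write(u"\u201cThis is a test\u201d\n".encode("utf-8"))
    return path


def fails_as_cp1252(path):
    """Return True if decoding the file as Windows-1252 raises UnicodeDecodeError."""
    try:
        with io.open(path, encoding="cp1252") as f:
            f.read()
    except UnicodeDecodeError:
        return True
    return False


def read_lines_utf8(path):
    """Read the file as UTF-8 regardless of the platform's default encoding."""
    with io.open(path, encoding="utf-8") as f:
        return f.readlines()


if __name__ == "__main__":
    demo = write_demo_file()
    print("cp1252 fails:", fails_as_cp1252(demo))
    print("utf-8 lines:", read_lines_utf8(demo))
    os.remove(demo)
```

The closing curly quote is encoded in UTF-8 as the bytes E2 80 9D, and 0x9D has no mapping in Python's cp1252 codec, which matches the byte reported in the traceback.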

Error after adding the overleaf hook to latexmk

I am running python ./arxiv_collector.py from master, which works well.
My latexmk version is 4.69a, which also works well on its own.
When I add the lines

$dependents_list = 1;
$deps_file = ".deps";

END {
  system("python arxiv_collector.py --latexmk-deps $deps_file");
}

to my empty .latexmkrc I get the following error message.

Latexmk: All targets (draft.pdf) are up-to-date
Gathering outputs...
Traceback (most recent call last):
  File "arxiv_collector.py", line 500, in <module>
    main()
  File "arxiv_collector.py", line 482, in main
    collect(
  File "arxiv_collector.py", line 223, in collect
    expect(
  File "arxiv_collector.py", line 54, in expect
    raise ValueError(msg)
ValueError: deps file .deps seems broken: expected the line
draft.pdf :\
  to be one of:
draft.tex.pdf :\
draft.tex.pdf .deps :\

Any idea what goes wrong here?
I started to investigate this problem after noticing that Overleaf, set up according to the README, generates only an empty zip file.

Used a .bib file, but didn't find 'text.pdf .bbl'; this likely won't work.

Description

The project file is text.tex. Its content is

  \documentclass{article}
  \usepackage{biblatex}
  \addbibresource{bibliography.bib}
  \begin{document}
  \cite{A}
  \printbibliography
  \end{document}

The bibliography file is bibliography.bib. Its content is

 @book{A,
  author = {Me},
  title = {Kamasutra}}

What's wrong? Thanks.

My arxiv-collector version is: 0.4.1

Debugging output:

Building text...
Running ['latexmk', '-silent', '-pdf', '-deps', '-deps-out=.deps', 'text']

Dependencies in .deps
Gathering outputs...
Deps file .deps: source text, base name text, output text.pdf .deps, jobname text.pdf
Processing /etc/texmf/web2c/texmf.cnf ...
Processing /usr/share/texmf/fonts/map/fontname/texfonts.map ...
Processing /usr/share/texmf/fonts/tfm/public/cm/cmbx12.tfm ...
Processing /usr/share/texmf/fonts/tfm/public/cm/cmr12.tfm ...
Processing /usr/share/texmf/fonts/tfm/public/cm/cmti10.tfm ...
Processing /usr/share/texmf/fonts/type1/public/amsfonts/cm/cmbx12.pfb ...
Processing /usr/share/texmf/fonts/type1/public/amsfonts/cm/cmr10.pfb ...
Processing /usr/share/texmf/fonts/type1/public/amsfonts/cm/cmti10.pfb ...
Processing /usr/share/texmf/tex/generic/oberdiek/etexcmds.sty ...
Processing /usr/share/texmf/tex/generic/oberdiek/ifluatex.sty ...
Processing /usr/share/texmf/tex/generic/oberdiek/ifpdf.sty ...
Processing /usr/share/texmf/tex/generic/oberdiek/infwarerr.sty ...
Processing /usr/share/texmf/tex/generic/oberdiek/kvsetkeys.sty ...
Processing /usr/share/texmf/tex/generic/oberdiek/ltxcmds.sty ...
Processing /usr/share/texmf/tex/generic/oberdiek/pdftexcmds.sty ...
Processing /usr/share/texmf/tex/generic/xstring/xstring.sty ...
Processing /usr/share/texmf/tex/generic/xstring/xstring.tex ...
Processing /usr/share/texmf/tex/latex/base/article.cls ...
Processing /usr/share/texmf/tex/latex/base/ifthen.sty ...
Processing /usr/share/texmf/tex/latex/base/size10.clo ...
Processing /usr/share/texmf/tex/latex/biblatex/bbx/numeric.bbx ...
Adding /usr/share/texmf/tex/latex/biblatex/bbx/numeric.bbx
as numeric.bbx
Processing /usr/share/texmf/tex/latex/biblatex/bbx/standard.bbx ...
Adding /usr/share/texmf/tex/latex/biblatex/bbx/standard.bbx
as standard.bbx
Processing /usr/share/texmf/tex/latex/biblatex/biblatex.cfg ...
Adding /usr/share/texmf/tex/latex/biblatex/biblatex.cfg
as biblatex.cfg
Processing /usr/share/texmf/tex/latex/biblatex/biblatex.def ...
Adding /usr/share/texmf/tex/latex/biblatex/biblatex.def
as biblatex.def
Processing /usr/share/texmf/tex/latex/biblatex/biblatex.sty ...
Adding /usr/share/texmf/tex/latex/biblatex/biblatex.sty
as biblatex.sty
Processing /usr/share/texmf/tex/latex/biblatex/blx-compat.def ...
Adding /usr/share/texmf/tex/latex/biblatex/blx-compat.def
as blx-compat.def
Processing /usr/share/texmf/tex/latex/biblatex/blx-dm.def ...
Adding /usr/share/texmf/tex/latex/biblatex/blx-dm.def
as blx-dm.def
Processing /usr/share/texmf/tex/latex/biblatex/cbx/numeric.cbx ...
Adding /usr/share/texmf/tex/latex/biblatex/cbx/numeric.cbx
as numeric.cbx
Processing /usr/share/texmf/tex/latex/biblatex/lbx/english.lbx ...
Adding /usr/share/texmf/tex/latex/biblatex/lbx/english.lbx
as english.lbx
Processing /usr/share/texmf/tex/latex/etoolbox/etoolbox.sty ...
Processing /usr/share/texmf/tex/latex/graphics/keyval.sty ...
Processing /usr/share/texmf/tex/latex/logreq/logreq.def ...
Processing /usr/share/texmf/tex/latex/logreq/logreq.sty ...
Processing /usr/share/texmf/tex/latex/oberdiek/kvoptions.sty ...
Processing /usr/share/texmf/tex/latex/url/url.sty ...
Processing /usr/share/texmf/web2c/texmf.cnf ...
Processing /var/lib/texmf/fonts/map/pdftex/updmap/pdftex.map ...
Processing /var/lib/texmf/web2c/pdftex/pdflatex.fmt ...
Processing bibliography.bib ...
Processing text.tex ...
Adding text.tex with comments stripped
Used a .bib file, but didn't find 'text.pdf .bbl'; this likely won't work.
Output in arxiv.tar.gz: 10 files, 102KiB compressed

running on travis

The program runs great on my local computer. However, when I tried to run it on the CI service Travis, only the main tex file and the bbl file were included. The included image files and the other included tex files were missing. Do you by chance have any experience running arxiv-collector on Travis?

Thank you for this!

Was wrangling with compilation issues and .bbl version mismatches till I found this. Got to say, this software neatly does exactly what it promises.

Excellent work! Thanks again.
PS: Feel free to close this issue.

Removing {microtype}?

arXiv has a long-standing history of not allowing the package {microtype} (possibly because of some interplay with {hyperref}?) and failing to compile with a very obscure message if the package is included. Would it be possible to either remove the package if it's included, or generate a warning to the user?

ascii input as UTF-8 doesn't exist

I'm trying to use the script on a document on Ubuntu. Compiling with latexmk -pdf [main.tex] works. When I run arxiv-collector main.tex I get the following error:

(xenial)brett@localhost:~/Downloads/SC_Conference$ arxiv-collector main.tex
Building main...
Traceback (most recent call last):
  File "/home/brett/.local/bin/arxiv-collector", line 11, in <module>
    sys.exit(main())
  File "/home/brett/.local/lib/python3.5/site-packages/arxiv_collector.py", line 174, in main
    strip_comments=args.strip_comments, verbosity=args.verbosity)
  File "/home/brett/.local/lib/python3.5/site-packages/arxiv_collector.py", line 109, in collect
    add(dep)
  File "/home/brett/.local/lib/python3.5/site-packages/arxiv_collector.py", line 68, in add
    raise OSError("{} doesn't exist!".format(path))
OSError: ascii input as UTF-8 doesn't exist!

I'm not exactly sure what the error means. Are you expecting all files to be encoded as UTF-8?

Xelatex support

Is there a possibility to make this package support other compilers such as xelatex or luatex?

Bad output of latexmk --version i.e. v4.69a

Thanks for all the good work (it is really useful!), but unfortunately your script is broken again (most likely latexmk's fault):

Traceback (most recent call last):
  File "./arxiv_collector.py", line 378, in <module>
    main()
  File "./arxiv_collector.py", line 353, in main
    version = get_latexmk_version(args.latexmk)
  File "./arxiv_collector.py", line 77, in get_latexmk_version
    raise ValueError("Bad output of {} --version:\n{}".format(latexmk, out))
ValueError: Bad output of latexmk --version:

Latexmk, John Collins, 17 Apr. 2020. Version 4.69a

the problem is the "." after "Apr."

I quick-fixed it by changing the regular expression to
version_re = re.compile(r"Latexmk, John Collins, \d+ \w+\. \d+\. Version (.*)$")

but maybe one should take a deeper look into latexmk to see what they do there (which I didn't manage in the hurry just now)
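A less brittle option is to ignore the date entirely and only look for the version token. The sketch below is an assumption-laden illustration rather than the script's actual code: the function name and pattern are made up, and the only evidence for the banner format is the two --version outputs quoted in these reports.

```python
import re

# Hypothetical pattern: skip whatever date format follows the author name
# and capture the token after "Version".
VERSION_RE = re.compile(
    r"Latexmk, John Collins, .*?\bVersion\s+(\S+)\s*$", re.MULTILINE
)


def parse_latexmk_version(output):
    """Extract the version string from the output of `latexmk --version`."""
    if isinstance(output, bytes):
        output = output.decode("utf-8", "replace")
    match = VERSION_RE.search(output)
    if match is None:
        raise ValueError("Bad output of latexmk --version:\n{}".format(output))
    return match.group(1)


if __name__ == "__main__":
    print(parse_latexmk_version("\nLatexmk, John Collins, 17 Apr. 2020. Version 4.69a\n"))
    print(parse_latexmk_version(b"\nLatexmk, John Collins, 26 Dec. 2019. Version 4.67\n"))
```

Accepting bytes as well as text also covers the case where the subprocess output is never decoded, as the b'...' in the earlier report suggests.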

Should replace ".eps" file extension in tex files

My arxiv-collector version is:

0.3.5 (installed via conda-forge)

Description:

arxiv-collector places pdf files instead of eps files in the .tar.gz (and rightly so). However, the \includegraphics command does not get updated; it still refers to the eps file. A simple regex replace should fix the issue.
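The kind of replacement suggested here might look like the sketch below. It is only an illustration: the function name and pattern are hypothetical, and it assumes the converted PDF keeps the same base name as the EPS file, which may not match the names that end up in the archive.

```python
import re

# Hypothetical pattern: an \includegraphics call whose path ends in .eps.
EPS_RE = re.compile(r"(\\includegraphics(?:\[[^\]]*\])?\{[^}]*?)\.eps\}")


def eps_to_pdf(tex_source):
    """Rewrite \\includegraphics{...eps} references to point at .pdf files."""
    return EPS_RE.sub(r"\1.pdf}", tex_source)


if __name__ == "__main__":
    print(eps_to_pdf(r"\includegraphics[width=\linewidth]{figs/plot.eps}"))
```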

arxiv.tar.gz

Problem with Python script

Dear Dougal,

I get the following ValueError: Unexpected EOF. I'm using Windows 10 and I have

file.tex
mystylefile.sty
file.bbl

as files. Or does this script not work with Windows?

Best,

Jan

use a real tex parser to strip comments + optionally certain commands

For example, if you have \usepackage[disable]{todonotes} and you're stripping comments, you probably also want to strip the contents of \todo{}.

Similarly, probably want to remove the contents of any \iffalse blocks.

Of course this might require a proper tex parser to do correctly....

Dependencies are determined using pdflatex, disregarding what is specified in .latexmkrc

My arxiv-collector version is:

arxiv_collector.py 0.4.1

Rough situation: I have a paper with many plots generated with R, each containing quite a few datapoints.
This doesn't compile with pdflatex due to the limited... dunno, stack? memory? Something like that.
It does compile with lualatex, which I've specified in my .latexmkrc:

# LuaLatex
$pdf_mode = 4;

Now, from my understanding, arxiv-collector by default tries to create the dependencies file through latexmk -pdf ..., which apparently overrides my settings and calls pdflatex, which fails.

This can be circumvented like so:

# Save dependencies for arxiv_collector.
$dependents_list = 1;
$deps_file = ".deps";

and

python3 arxiv_collector.py --latexmk-deps .deps ...

But it was a bit surprising.

test suite + CI

Would be nice to have some simple tests that things work correctly, and run them on travis / circle / azure....
