laurentrdc / pandoc-pyplot
A Pandoc filter to generate Matplotlib/Plotly figures directly in documents
License: GNU General Public License v2.0
This filter has a nice option to include a file before the code:
# file.py
x = list(range(5))
# Include in markdown
```{.pyplot include=file.py}
import matplotlib.pyplot as plt
plt.figure()
plt.plot(x, list(map(lambda x: x**2, x)))
plt.title('This is an example figure')
```
But often it may be useful to include more than one script:
# file1.py
x = list(range(5))
# file2.py
y = list(map(lambda a: a**2, x))
# Include multiple files in markdown
```{.pyplot include=file1.py include=file2.py}
import matplotlib.pyplot as plt
plt.figure()
plt.plot(x, y)
plt.title('This is an example figure')
```
If pandoc does not allow repeating attributes (include=file1.py include=file2.py), then the same could be achieved with include="file1.py,file2.py"
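To make the comma-separated variant concrete, here is a minimal Python sketch of how such an attribute could be split and the files prepended to a block. The names parse_includes and build_script are hypothetical and are not part of pandoc-pyplot; the sketch only assumes that included files are concatenated in the order listed.

```python
from pathlib import Path


def parse_includes(value: str) -> list[str]:
    """Split a comma-separated include attribute into file names."""
    return [name.strip() for name in value.split(",") if name.strip()]


def build_script(include_value: str, body: str, base_dir: Path = Path(".")) -> str:
    """Concatenate the included files, in order, followed by the block body."""
    parts = [(base_dir / name).read_text() for name in parse_includes(include_value)]
    parts.append(body)
    return "\n".join(parts)
```

With file1.py and file2.py from the example above, build_script("file1.py,file2.py", body) would yield one script in which x and y are defined before the plotting code runs.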
Sometimes you'd like to be able to pass in additional parameters to saveFig. The usual one here, for me at least, is bbox_inches='tight', to get around issues with plots being cut off. Here's one example on SO to that effect.
It would be nice if, like dpi, there were configuration options for most (or all?) of the parameters that a user might want to pass to saveFig.
I don't think there's currently a way to do this, judging by how the saveFig call is constructed.
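As a rough illustration of what such configuration could look like, the sketch below renders a matplotlib savefig line from a dictionary of block attributes. SAVEFIG_KEYS and savefig_call are hypothetical names, and the whitelist is only an assumption about which keyword arguments would be worth exposing; this is not how the filter currently builds the call.

```python
import ast

# Hypothetical whitelist of attributes forwarded to matplotlib's savefig.
SAVEFIG_KEYS = ("dpi", "bbox_inches", "pad_inches", "transparent")


def _literal(text: str):
    """Interpret an attribute value as a Python literal, else keep the string."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return text


def savefig_call(filename: str, attrs: dict[str, str]) -> str:
    """Render a plt.savefig(...) line from whitelisted block attributes."""
    kwargs = {k: _literal(v) for k, v in attrs.items() if k in SAVEFIG_KEYS}
    args = [repr(filename)] + [f"{k}={v!r}" for k, v in kwargs.items()]
    return f"plt.savefig({', '.join(args)})"
```

Attributes outside the whitelist (a caption, for instance) are ignored, so unrelated block options would not leak into the generated call.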
Unrelated, thanks for this library. It is making the transition from "LaTeX + Makefiles running miscellaneous figure scripts" to Pandoc a lot easier.
Hey there,
I've been trying to run the example from usage, but it's throwing an error:
$ pandoc --filter pandoc-pyplot input.md --output output.pdf
Error running filter pandoc-pyplot:
Error in $: Failed reading: not a valid json value
Any idea what might cause this?
My system is x86_64 Arch Linux with pandoc version 2.7.2. I've installed pandoc-pyplot via stack.
Thanks in advance.
How about storing the state of Python objects between executions of different blocks?
The idea is the following: imagine we have two code blocks, and in the second one we want access to a variable that was defined in the first. At the moment each code block executes in an isolated Python environment (isolated in terms of the global objects available). So you are unable to do the following:
# First
```{.pyplot capture="a,b"}
import matplotlib.pyplot as plt
a = list(range(5))
b = list(range(10))
plt.figure()
plt.plot(a, list(map(lambda x: x**2, a)))
plt.title('This is an example figure')
```
# Second
```{.pyplot needs="a,b"}
import matplotlib.pyplot as plt
plt.figure()
plt.plot(a, list(map(lambda x: x**3, a)))
plt.plot(a, list(map(lambda x: x**2, b)))
plt.title('This is an example figure')
```
Sure, we can simply copy-paste a = list(range(5)) and b = list(range(10)), or put them in a separate file and include it every time we want them available in our code. But this could cost performance if the computation is heavier than list(range(10)). For example, we might need to preprocess our data, show the density histogram, then normalize the preprocessed data and show a new density plot.
I can think of the following approach:
Firstly, we need to launch only one instance of the Python process while the filter is working, and feed it our scripts. It would store our variables and function definitions, which would persist in memory for as long as the filter runs. Here is how that could be achieved with bash:
# generate script files
for ((i = 0 ; i < 10 ; i++)); do echo "print($i)" > "file${i}.py"; done
# feed them to python
for ((i = 0 ; i < 10 ; i++)); do cat "file${i}.py"; done | python3
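The same effect can be sketched inside Python itself: executing every block against one shared namespace keeps earlier definitions visible to later blocks. This is an in-process analogue of the pipe above, not the filter's actual mechanism, and run_blocks is a hypothetical helper name.

```python
def run_blocks(blocks: list[str]) -> dict:
    """Execute code blocks in order inside one shared namespace."""
    namespace: dict = {}
    for source in blocks:
        # Names defined by earlier blocks stay available to later ones.
        exec(source, namespace)
    return namespace
```

For example, run_blocks(["a = list(range(5))", "total = sum(x**3 for x in a)"]) leaves both a and total in the returned namespace, even though the second block never defines a.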
Secondly, I don't know how yet, but it would be cool if we could provide an external Python process to this filter, to maintain state between runs of pandoc with this filter. Consider this example:
# Example
Here is only one plot
```{.pyplot capture="a"}
import matplotlib.pyplot as plt
a = list(range(5)) # there will be much harder task in real-life example
plt.figure()
plt.plot(a, list(map(lambda x: x**3, a)))
plt.title('This is an example figure')
```
Here are two plots:
```{.pyplot needs="a"}
import matplotlib.pyplot as plt
b = list(range(10))
plt.figure()
plt.plot(a, list(map(lambda x: x**3, a)))
plt.plot(a, list(map(lambda x: x**2, b)))
plt.title('This is an example figure')
```
Then I convert it with pandoc:
$ pandoc --filter pandoc-pyplot -t html5 example.md
And imagine if I slightly change the code:
Here are two plots:
```{.pyplot needs="a"}
import matplotlib.pyplot as plt
b = list(range(10))
plt.figure()
- plt.plot(a, list(map(lambda x: x**3, a)))
+ plt.plot(a, list(map(lambda x: x**4, a)))
plt.plot(a, list(map(lambda x: x**2, b)))
plt.title('This is an example figure')
```
If I try to convert it again, first code block will have to be executed as well:
$ pandoc --filter pandoc-pyplot -t html5 example.md
Maybe we can borrow some ideas from here:
dir=`mktemp -d /tmp/temp.XXX`
keep_pipe_open=$dir/keep_pipe_open
pipe=$dir/pipe
mkfifo $pipe
touch $keep_pipe_open
# Read from pipe:
python3 < $pipe &
# Keep the pipe open:
while [ -f $keep_pipe_open ]; do sleep 1; done > $pipe &
# Write to pipe:
for ((i = 0 ; i < 10 ; i++)); do cat "file${i}.py"; done > $pipe
# close the pipe:
rm $keep_pipe_open
wait
rm -rf $dir
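A different way to avoid re-running an unchanged first block, without keeping a Python process alive, would be to cache the captured variables on disk, keyed by a hash of the block's source. The sketch below swaps the named-pipe idea for that caching technique; run_cached and the cache layout are assumptions for illustration, and pickle should only ever be used on cache files you created yourself.

```python
import hashlib
import pickle
from pathlib import Path


def run_cached(source: str, capture: list[str], cache_dir: Path) -> dict:
    """Run source once and reuse its captured variables while it is unchanged."""
    key = hashlib.sha256(source.encode("utf-8")).hexdigest()
    cache_file = cache_dir / f"{key}.pkl"
    if cache_file.exists():
        # Same source as a previous run: skip execution entirely.
        return pickle.loads(cache_file.read_bytes())
    namespace: dict = {}
    exec(source, namespace)
    captured = {name: namespace[name] for name in capture}
    cache_file.write_bytes(pickle.dumps(captured))
    return captured
```

Editing only the second block would leave the first block's hash untouched, so its captured values (a in the example) would be loaded rather than recomputed. Side effects such as drawing a figure are not replayed from the cache.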
Hello again :)
It is possible to read the metadata of a document within a filter. It would be much more convenient to define the configuration for this filter in a YAML metadata block within the markdown document that is going to be processed. Moreover, pandoc supports providing this metadata in a separate file. Having yet another file just for this filter is inconvenient.
It would be convenient to add include scripts to this YAML metadata block as well.
So, the final example would look as follows (with #6 implemented):
---
title: How to model Hypergeometric Distribution?
pyplot_interpreter: python3
pyplot_format: svg
pyplot_links: false
pyplot_includes:
  plotly: |
    ```
    import plotly as py
    import plotly.graph_objs as go
    ```
  hg: |
    ```
    import numpy as np
    import scipy as sp
    # HG implements hypergeometric distribution
    class HG(object):
        def __init__(self, N: int, m: int, n: int):
            self.N = N
            self.m = m
            self.n = n
        def p(self, k: int) -> float:
            return sp.special.comb(self.m, k) \
                * sp.special.comb(self.N - self.m, self.n - k) \
                / sp.special.comb(self.N, self.n)
        def __str__(self) -> str:
            return f'HG({self.N}, {self.m}, {self.n})'
    xi = HG(30, 15, 20)
    hist_data_x = np.arange(xi.n+1)
    ```
---
# Try plotly
<!-- In this code block I only need to include `plotly`,
as I do not use anything from `hg`: -->
```{.pyplot includes="plotly"}
fig = go.Figure(
data=(go.Scatter(
x=list(range(10)),
y=list(map(lambda x: x**3, range(10))),
mode='markers',
),)
)
```
# Histogram
Here is a histogram of $HG(30,15,20)$:
```{.pyplot includes="plotly,hg"}
hg_hist_fig = go.Figure(
data=(go.Scatter(
x=list(hist_data_x),
y=list(map(xi.p, hist_data_x)),
mode='markers',
),),
layout=go.Layout(
title=go.layout.Title(
text=r'$\xi \sim ' + str(xi) + '$',
x=.5,
),
yaxis=go.layout.YAxis(title=go.layout.yaxis.Title(
text=r'$\mathbb{P}(\xi=k)$',
)),
xaxis=go.layout.XAxis(title=go.layout.xaxis.Title(
text=r'$k$',
)),
),
)
```
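For reference, here is a small Python sketch of how a filter could pull such settings out of the JSON document that pandoc passes to filters. It assumes the usual JSON shape of metadata values (MetaMap, MetaList, MetaInlines, MetaString, MetaBool, MetaBlocks); the pyplot_ key prefix comes from the proposal above and is not an existing option of the filter, and the function names are hypothetical.

```python
def _inline_text(inline: dict) -> str:
    """Render the two inline node types needed for simple scalar values."""
    if inline["t"] == "Str":
        return inline["c"]
    if inline["t"] == "Space":
        return " "
    return ""


def meta_to_python(value: dict):
    """Convert a pandoc JSON metadata value into plain Python data."""
    tag, content = value["t"], value.get("c")
    if tag in ("MetaBool", "MetaString"):
        return content
    if tag == "MetaInlines":
        return "".join(_inline_text(i) for i in content)
    if tag == "MetaMap":
        return {k: meta_to_python(v) for k, v in content.items()}
    if tag == "MetaList":
        return [meta_to_python(v) for v in content]
    if tag == "MetaBlocks":
        # Keep only the text of code blocks, which is what the includes hold.
        return "\n".join(b["c"][1] for b in content if b["t"] == "CodeBlock")
    raise ValueError(f"unexpected metadata node: {tag}")


def pyplot_settings(document: dict) -> dict:
    """Collect every metadata entry whose key starts with pyplot_."""
    meta = document.get("meta", {})
    return {k: meta_to_python(v) for k, v in meta.items() if k.startswith("pyplot_")}
```

Applied to the document above, pyplot_settings would return pyplot_format as "svg", pyplot_links as False, and pyplot_includes as a dictionary mapping each include name to its source text.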
Also, it would be very nice if I could include scripts that are already in the document, referencing them by name, without executing them separately. Look at this example:
# Modelling
Let's first define the class `HG`, which holds the
parameters $N$, $m$ and $n$ and a method `p(k)` for calculating
the probability of the event $\xi = k$:
```{.python .pyplot .noExec name="class"}
import scipy as sp
class HG(object):
    def __init__(self, N: int, m: int, n: int):
        self.N = N
        self.m = m
        self.n = n
    def p(self, k: int) -> float:
        return sp.special.comb(self.m, k) \
            * sp.special.comb(self.N - self.m, self.n - k) \
            / sp.special.comb(self.N, self.n)
    def __str__(self) -> str:
        return f'HG({self.N}, {self.m}, {self.n})'
```
Then create an object for the random variable $\xi \sim HG(30, 15, 20)$:
```{.python .pyplot .noExec name="xi"}
xi = HG(30, 15, 20)
```
The next step is to define the interval $\overline{0, n}$ for $k$ over which we will draw our histogram:
```{.python .pyplot .noExec name="data"}
import numpy as np
hist_data_x = np.arange(xi.n+1)
```
Finally, draw the plot:
```{.python .pyplot .noExec name="plot"}
import plotly
import plotly.graph_objs as go
hg_hist_fig = go.Figure(
data=(go.Scatter(
x=list(hist_data_x),
y=list(map(xi.p, hist_data_x)),
mode='markers',
),),
layout=go.Layout(
title=go.layout.Title(
text=r'$\xi \sim ' + str(xi) + '$',
x=.5,
),
yaxis=go.layout.YAxis(title=go.layout.yaxis.Title(
text=r'$\mathbb{P}(\xi=k)$',
)),
xaxis=go.layout.XAxis(title=go.layout.xaxis.Title(
text=r'$k$',
)),
),
)
```
The result would be the following:
```{.pyplot include="class,xi,data,plot"}
# I do not have to copy-paste anything here.
```
It would be awesome if you can implement this!
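A minimal sketch of the name resolution this would need, assuming the filter has already collected the named blocks into a dictionary while walking the document. resolve_includes is a hypothetical helper, not an existing feature.

```python
def resolve_includes(named_blocks: dict[str, str], include: str, body: str = "") -> str:
    """Join the named blocks listed in include, in order, then the body."""
    names = [n.strip() for n in include.split(",") if n.strip()]
    missing = [n for n in names if n not in named_blocks]
    if missing:
        raise KeyError(f"unknown block name(s): {', '.join(missing)}")
    return "\n".join([named_blocks[n] for n in names] + [body])
```

Failing loudly on an unknown name seems preferable to silently skipping it, since a typo in include="class,xi,data,plot" would otherwise only surface later as a NameError inside the generated script.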
When I install pandoc-pyplot with cabal install, I get an error:
[1 of 3] Compiling Text.Pandoc.Filter.Scripting ( src/Text/Pandoc/Filter/Scripting.hs, dist/build/Text/Pandoc/Filter/Scripting.o )
src/Text/Pandoc/Filter/Scripting.hs:45:50: error:
    * Variable not in scope:
        (<>) :: m0 ExitCode -> String -> IO ExitCode
    * Perhaps you meant one of these:
        `</>' (imported from System.FilePath),
        `<$>' (imported from Prelude), `*>' (imported from Prelude)
      Perhaps you want to add `<>' to the import list in the import of
      `Data.Monoid' (src/Text/Pandoc/Filter/Scripting.hs:27:1-37).
src/Text/Pandoc/Filter/Scripting.hs:57:32: error:
    * Variable not in scope: (<>) :: [Char] -> String -> t0
    * Perhaps you meant one of these:
        `</>' (imported from System.FilePath),
        `<$>' (imported from Prelude), `*>' (imported from Prelude)
      Perhaps you want to add `<>' to the import list in the import of
      `Data.Monoid' (src/Text/Pandoc/Filter/Scripting.hs:27:1-37).
src/Text/Pandoc/Filter/Scripting.hs:57:46: error:
    * Variable not in scope: (<>) :: t0 -> [Char] -> PythonScript
    * Perhaps you meant one of these:
        `</>' (imported from System.FilePath),
        `<$>' (imported from Prelude), `*>' (imported from Prelude)
      Perhaps you want to add `<>' to the import list in the import of
      `Data.Monoid' (src/Text/Pandoc/Filter/Scripting.hs:27:1-37).
cabal: Leaving directory '.'
cabal: Error: some packages failed to install:
pandoc-pyplot-1.0.2.0 failed during the building phase. The exception was:
ExitFailure 1
Environment: Ubuntu 18.04, gcc-7, ghc-8.0.2, cabal-install version 1.24.0.2
I'm getting the following error when using pandoc-pyplot as a filter:
pandoc-pyplot: Error in $: Incompatible API versions: encoded with [1,20] but attempted to decode with [1,17,5,4].
CallStack (from HasCallStack):
error, called at .\Text\Pandoc\JSON.hs:111:48 in pandoc-types-1.17.5.4-1YIMZftZkkF55zFQfWBdlu:Text.Pandoc.JSON
Error running filter pandoc-pyplot:
Filter returned error status 1
make: *** [Makefile:89: result.pdf] Error 83
pandoc-pyplot: 2.2.0.0
Pandoc version: 2.8
System: Windows 10 Pro x64, Polish language
Hello!
Let me first thank you for such an amazing filter! I was looking for exactly this kind of functionality and ran into this repo.
The only disadvantage of the current approach is that the filter limits me to matplotlib; I can't use any other library, like Plotly. Matplotlib is great, but it lacks some features, and it would be awesome to add support for other libraries too.
Instead of adding support for each library individually, I suggest implementing more general functionality (I can't make a PR, as I do not know Haskell yet): an exporter argument, which should be a function that takes a filename and does whatever needs to be done to save the figure. For matplotlib it would look something like the following:
```{.pyplot type="svg" exporter="lambda filename: plt.savefig(filename)"}
plt.figure()
plt.plot([0,1,2,3,4], [1,2,3,4,5])
plt.title('This is an example figure')
```
This approach allows you to use any library and any format that you want:
```{.pyplot type="html" exporter="lambda filename: py.offline.plot(hg_hist_fig, filename=filename)"}
import plotly as py
import plotly.graph_objs as go
import scipy as sp
import scipy.stats
# Hypergeometric distribution HG(80, 15, 20), matching the plot title
hg = sp.stats.hypergeom(80, 15, 20)
hg_hist_x = range(*map(int, sp.stats.hypergeom.interval(1, 80, 15, 20)))
hg_hist_fig = go.Figure(
    data=(go.Scatter(
        name='histogram',
        x=list(hg_hist_x),
        y=list(map(hg.pmf, hg_hist_x)),
        mode='markers',
    ),),
    layout=go.Layout(
        title=go.layout.Title(
            text=r'$$\\xi \\sim HG(80, 15, 20)$$',
            x=.5,
        ),
        yaxis=go.layout.YAxis(
            title=go.layout.yaxis.Title(
                text=r'$$\\mathbb{P}(\\xi=k)$$',
            ),
        ),
        xaxis=go.layout.XAxis(
            title=go.layout.xaxis.Title(
                text=r'$$k$$',
            ),
        ),
    ),
)
```
In this example we embed HTML in the final document (it may be useful for converting your markdown to HTML and then to PDF).
P.S.: Take a look at this repo; it may contain some great ideas as well.
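To illustrate how the exporter idea could work on the filter side, here is a small Python sketch that appends a call to the user-supplied expression at the end of the block's code. with_exporter is a hypothetical helper; it assumes the exporter attribute is a Python expression that evaluates to a function of one argument, as in the two examples above.

```python
def with_exporter(code: str, exporter: str, filename: str) -> str:
    """Append a call to the user-supplied exporter expression to the block."""
    # The exporter is wrapped in parentheses so a bare lambda can be called.
    return f"{code}\n({exporter})({filename!r})\n"
```

Because the exporter runs in the same namespace as the block, it can refer to names the block defines, such as plt or hg_hist_fig.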
Based on a comment in #4:
Pandoc filters only work on the intermediate document representation.
As I see here, getting the output format is possible (I am not sure, because I don't know Haskell).
The idea is to support interactive Plotly plots (via HTML), but fall back to static images when the output format is not HTML (e.g. PDF).
From the Pandoc toJSONFilter documentation:
An alternative is to use the type Maybe Format -> a -> a. This is appropriate when the first argument of the script (if present) will be the target format, and allows scripts to behave differently depending on the target format. The pandoc executable automatically provides the target format as argument when scripts are called using the --filter option.
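The same mechanism is visible from a filter written in any language, since the target format arrives as the first command-line argument. Below is a hedged Python sketch of the fallback decision; HTML_FORMATS is an assumed list of pandoc output formats that can embed an interactive figure, and both function names are hypothetical.

```python
from typing import Optional

# Assumption: pandoc output formats that can embed an interactive HTML figure.
HTML_FORMATS = {"html", "html4", "html5", "revealjs", "slidy", "s5", "dzslides"}


def target_format(argv: list[str]) -> Optional[str]:
    """Return the output format pandoc passed as the first argument, if any."""
    return argv[1] if len(argv) > 1 else None


def figure_kind(fmt: Optional[str]) -> str:
    """Pick an interactive figure for HTML targets and a static image otherwise."""
    return "interactive" if fmt in HTML_FORMATS else "static"
```

Inside a filter script, figure_kind(target_format(sys.argv)) would give "interactive" for an HTML target and "static" for a target such as latex, or when no format argument is passed at all.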