Comments (14)
Oops, sorry about that. There are some changes to the renderer's handling of footnotes that haven't been released to PyPI yet (see commit 95af86e). I'll prepare a release soon.
If you clone the repo directly (mistletoe v0.4), the above code should work. If you installed from PyPI (mistletoe v0.3.1), you need to define render_raw_text with an additional footnotes argument:
def render_raw_text(self, token, footnotes):
    ...
All other code stays the same.
Sorry about that, and watch for the upcoming v0.4 release to get rid of the footnotes argument!
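If the same script has to run under both releases, one option is to make the extra argument optional. This is only a sketch: the `footnotes=None` default and the `FakeToken` stand-in are assumptions made for illustration, not part of mistletoe.

```python
raw_text = []

def render_raw_text(self, token, footnotes=None):
    # v0.3.1 passes `footnotes`, v0.4 does not; it is ignored here either way.
    raw_text.append(token.content)
    return token.content

# Hypothetical stand-in token, used only to show both call styles are accepted.
class FakeToken:
    def __init__(self, content):
        self.content = content

render_raw_text(None, FakeToken("hello"))      # v0.4-style call
render_raw_text(None, FakeToken("world"), {})  # v0.3.1-style call
print(raw_text)
```

Once everyone is on v0.4 the default argument can simply be dropped.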
from mistletoe.
Hey, sorry, just making sure I understand what you're trying to do. Say the returned data is <p>sample paragraph</p> and what you want is sample paragraph; is that correct? And do you need all the text without the enclosing HTML tags, or do you want to isolate each token and do something with them individually?
In either case, it seems like you would probably want to write a custom renderer. Could you explain more what your project is, so that I might give appropriate suggestions?
from mistletoe.
Hi thanks and sorry for not making it clear.
Basically I'm learning Python by creating a simple GitHub bot. This bot extracts the contents of a README.md file and checks whether everything is spelled correctly. Right now, with your library, I can get all the contents of the readme, but I want to limit it to only the contents of the p, h1, h2... tags. In other words, I don't need the code part.
So, I used your library as:
import os
import mistletoe

readme = os.getcwd() + "/clones/samayo/Autolog/readme.md"
with open(readme, 'r') as fin:
    rendered = mistletoe.markdown(fin)
print("rendered", rendered)
I get this:
<h2>Autolog</h2>
<p>APP class that to errors or notifications from your app or from <code>/var/log/</code> or as they appear</p>
<h2>Install</h2>
<p>Using git</p>
<pre>
<code class="lang-bash">
$ git clone https://github.com/samayo/autolog.git
</code>
</pre>
<p>Using composer</p>
<pre>
<code class="lang-`bash">
$ composer require samayo/autolog
</code>
But I'm only interested in getting "Autolog APP class that to errors or notifications from your app or from Install ..."
I'm not sure if you have this feature, it would be nice to have a method that fetches by tags like a header ex:
with open(readme, 'r') as fin:
    rendered = mistletoe.markdown(fin)
print("rendered", rendered.getHeader())
If there is no way to achieve this, it's okay, I can use regex for it.
from mistletoe.
Ah, I see what you mean. No need to write an entirely new renderer then. In your case it would be easier to exploit the design of mistletoe, namely that all render functions eventually need to call render_raw_text, so that the recursion can bottom out. Here's a hacky way to do it:
import mistletoe

readme = ['## hey\n', '\n', 'APP class that ...\n', '\n', 'using git\n']
raw_text = []

def render_raw_text(self, token):
    raw_text.append(token.content)
    return token.content

mistletoe.HTMLRenderer.render_raw_text = render_raw_text # hacky!
rendered = mistletoe.markdown(readme)
print(raw_text)
So in this case I am creating a (global) variable called raw_text, which is a list to put all the contents into. I defined a new version of render_raw_text, essentially telling the renderer to first put whatever is in token.content into my raw_text list, and then do whatever rendering is needed. I then hacked this function onto my HTMLRenderer class (if you want a neater solution you may want to subclass it). Finally, I let the renderer run through all the tokens, rendering each one. In doing so my custom version of render_raw_text will get hit every time.
Here's what raw_text looks like after running the script:
['hey', 'APP class that ...', 'using git']
I assume this is what you want? But great question, I might document this in an example in the future.
Let me know if something's confusing!
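To illustrate why every render call eventually lands in render_raw_text, here is a toy model of a recursive renderer. None of these classes are mistletoe's; `RawText`, `Node` and `ToyRenderer` are hypothetical names invented for this sketch, and the list lives on the instance rather than in a global.

```python
# Toy model of a recursive renderer; these classes are NOT mistletoe's.
class RawText:
    def __init__(self, content):
        self.content = content

class Node:
    def __init__(self, tag, children):
        self.tag = tag
        self.children = children

class ToyRenderer:
    def __init__(self):
        self.raw_text = []

    def render(self, token):
        # Leaves go to render_raw_text; every other token recurses into its children.
        if isinstance(token, RawText):
            return self.render_raw_text(token)
        inner = ''.join(self.render(child) for child in token.children)
        return '<{0}>{1}</{0}>'.format(token.tag, inner)

    def render_raw_text(self, token):
        self.raw_text.append(token.content)  # collect every leaf on the way
        return token.content

doc = Node('h2', [RawText('hey '), Node('em', [RawText('there')])])
renderer = ToyRenderer()
html = renderer.render(doc)
print(html)
print(renderer.raw_text)
```

Because only leaves carry text, hooking the leaf method is enough to see all of it, whatever the nesting looks like.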
from mistletoe.
Ah, thanks for the response and detailed explanation. I will check this during the weekend and see how it goes. I will close this for now, I will re-open it if something comes up.
thanks again
from mistletoe.
hmm so I finally got some time just now and when I tried your code I get this error:
TypeError: render_raw_text() takes 2 positional arguments but 3 were given
I'll try to parse it in regex until you can find a solution for this
from mistletoe.
Version 0.4 released! Run pip3 install --upgrade mistletoe to upgrade.
from mistletoe.
Thanks. I'll do this now 👍
from mistletoe.
Nope. Sadly this doesn't work. I updated to 0.4 and the error is gone, but I still get all the contents of the readme. Maybe I didn't make my question clear enough, because I get the code and the markdown tags too.
Can you check one last time whether this code is correct?
import os
import mistletoe

raw_text = []

def render_raw_text(self, token):
    raw_text.append(token.content)
    return token.content

path = os.getcwd() + "/clones/samayo/Autolog/readme.md"
readme = open(path, "r")
mistletoe.HTMLRenderer.render_raw_text = render_raw_text
rendered = mistletoe.markdown(readme)
print(raw_text)
from mistletoe.
Could you paste in the result of your print(raw_text), and what you want it to look like? Maybe also run it with the README of this repo, so that we have something to compare notes with. On my machine, even with a fresh install from pip, I seem to be seeing the correct outputs, so perhaps there's still miscommunication going on!
It's totally fine to drag on this issue! I want to help you resolve this, because extracting stuff from HTML by regex is not a good idea..
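If you do end up post-processing the rendered HTML, a real parser is safer than a regex. Below is a minimal sketch using only the standard library's html.parser. The `ProseExtractor` class and the sample string are made up for illustration, and it assumes that code only ever appears inside pre or code tags.

```python
from html.parser import HTMLParser

class ProseExtractor(HTMLParser):
    """Collect text nodes, ignoring anything inside <pre> or <code>."""
    SKIPPED = {'pre', 'code'}

    def __init__(self):
        super().__init__()
        self.skip_depth = 0   # > 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.skip_depth == 0:
            self.chunks.append(text)

# Made-up sample, loosely modelled on the rendered output earlier in the thread.
sample = (
    '<h2>Autolog</h2>\n'
    '<p>Using git</p>\n'
    '<pre><code class="lang-bash">$ git clone ...</code></pre>\n'
    '<p>Using composer</p>\n'
)
parser = ProseExtractor()
parser.feed(sample)
parser.close()
print(parser.chunks)
```

Note that this drops inline code as well, so a sentence containing an inline snippet comes back in two pieces.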
from mistletoe.
Sorry for the delay. I hardly get any time off, so hopefully I make it clear this time.
In short: I only want to get the headers and paragraphs (actual English words/text) from a readme. As in, I don't want the code (anything inside ```), because you cannot spell check sys, for example. Also, I don't want to get the ### or --- symbols. I only want to get the real, understandable words, like:
Autolog
A PHP class that to save/log/mail errors or notifications from your app or from /var/log/ or as they appear
So, let's assume you have a README.md file that contains these lines:
## Autolog
A PHP class that to save/log/mail errors or notifications from your app or from `/var/log/` or as they appear
Install
-----
Using git
```bash
$ git clone https://github.com/samayo/autolog.git
```bash
Using composer
```bash
$ composer require samayo/autolog
```bash
Usage
-----
#### Short Example.
A simple snippet to send a message to your inbox.
```php
require __DIR__ . "/src/Logger.php";
So, since I only want the actual headers and sentences, I expect mistletoe to return everything except the code parts and the markdown markers (###, ---).
So, if I run this code on python3.x
import os
import mistletoe

raw_text = []

def render_raw_text(self, token):
    raw_text.append(token.content)
    return token.content

path = os.getcwd() + "/clones/samayo/Autolog/readme.md"
readme = open(path, "r")
mistletoe.HTMLRenderer.render_raw_text = render_raw_text
rendered = mistletoe.markdown(readme)
print(raw_text)
what I get in return is this.
[root@samayo typot]# python3 typot.py
['Autolog', 'A PP class that to errors or notifications from your app or from ', '/var/log/', ' or as they appear', 'Install', 'Using git', '$ git clone https://github.com/samayo/autolog.git\n', 'Using composer', '$ composer require samayo/autolog\n````\n\nUsage\n-----\n#### Short Example. \nA simple snippet to send a message to your inbox. \n```php\nrequire __DIR__ . "/src/Logger.php"; \n\n$log = new => "[email protected]"]);\n\nif($userCommented){\n $log->log("Someone just commented!", $log::INFO, $log::EMAIL); \n}\n', 'The ', '$log->log()', ' method accepts 4 arguments, only the first ', '$msg',
' is required, others are optional.', 'log($msg, $type, $handler, $verbosity);\n', 'You can use different logtype, handler and verbosity', '// options for $type - to describe the log type\n$type::INFO; // info for simple tasks\n$type::ERROR; // simple error like 404 .. \n$type::ALERT; // fatal error or suspicious activity\n\n// Options for $handler - on how to registered the log message\n$handler::EMAIL; // send to email\n$handler::FILE; // write to file\n$handler::DATABASE; // insert to database \n$handler::SMS; // send to sms\n\n// do you need the all the info, or relevant (simple)\n$verbosity::SIMPLE; // send simplified log\n$verbosity::VERBOSE; // send every log information\n\n// to get log of all error in verbose format\n$log = new => "[email protected]"]);\n$log->log($msg, $log::ERROR, $log::EMAIL,
$log::VERBOSE);\n', 'Passing only the message as ', '$log->log($msg)', " is possible, and it'll be handled type INFO, and sent by email", 'Examples', '##### Sending logs to your email', '// First you need to setup your email as\n$log = new Taut...
This is clearly unintended, because I get everything, even the code parts. I get what's inside my code blocks and I even get the markdown tags, as in '##### Sending logs to your email', when it should have been something like 'Sending logs to your email'.
Anyway, if it is still not clear, don't bother with it. It was a silly project and not worth wasting any of our time on. Thanks.
from mistletoe.
oh, and yeah .. I knew about the regex issue with parsing HTML. I was just out of options :)
from mistletoe.
I see what the problem is. So you want to suppress BlockCode and InlineCode outputs. The cleanest way would be to write your own renderer. Here's the quick and dirty way, though:
import mistletoe

raw_text = []

class CustomRenderer(mistletoe.HTMLRenderer):
    def render_inline_code(self, token):
        return ''

    def render_block_code(self, token):
        return ''

    def render_raw_text(self, token):
        raw_text.append(token.content)
        return token.content

with open('foo.md', 'r') as f:
    with CustomRenderer() as renderer:
        rendered = renderer.render(mistletoe.Document(f))
print(raw_text)
The other problem is that your markdown format is slightly messy. For code blocks, the format should be:
```python
# some code
```
... instead of:
```python
# some code
```python
I don't plan to support the second format (commonmark also doesn't support it), so maybe clean it up before you feed it into mistletoe?
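For the cleanup step, something along these lines might do. It is a rough sketch, not part of mistletoe: `normalize_fences` is a hypothetical helper, and it assumes fences are always three backticks, are never nested, and strictly alternate between opening and closing.

```python
def normalize_fences(lines):
    """Rewrite closing fences such as '```bash' to a bare '```'."""
    cleaned = []
    in_code = False
    for line in lines:
        if line.lstrip().startswith('```'):
            if in_code:
                # Closing fence: drop any trailing info string.
                line = '```\n'
            in_code = not in_code
        cleaned.append(line)
    return cleaned

messy = ['Using git\n', '```bash\n', '$ git clone ...\n', '```bash\n', 'Usage\n']
cleaned = normalize_fences(messy)
print(cleaned)
```

The cleaned list of lines can then be passed on in place of the open file.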
from mistletoe.
Ah, ok thanks. I will check this out, closing for now. Also, I added the second python in the code block only so that the example would render here; otherwise I am using the first format, as others do :)
cheers
from mistletoe.