miyuchina commented on May 27, 2024

Oops, sorry about that. There are some changes to the renderer's handling of footnotes that haven't been pushed to PyPI yet (see commit 95af86e). I'll prepare a release soon.

If you clone the repo directly (mistletoe v0.4), my earlier render_raw_text example should work. If you installed from PyPI (mistletoe v0.3.1), you need to define render_raw_text with an additional footnotes argument:

def render_raw_text(self, token, footnotes):
    ...

All other code stays the same.

Sorry about that, and watch for the upcoming v0.4 release, which gets rid of the footnotes argument!
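If the same hook has to work against both releases, one option is to accept and ignore any extra positional arguments. This is a sketch that assumes the extra footnotes parameter is the only signature difference, as the TypeError reported in this thread suggests. The SimpleNamespace token is a stand-in for illustration, not a mistletoe class:

```python
from types import SimpleNamespace

raw_text = []

def render_raw_text(self, token, *args):
    """Collect raw text. *args swallows the extra footnotes argument that
    v0.3.1 passes positionally, so the same hook also fits v0.4 (assumption)."""
    raw_text.append(token.content)
    return token.content

# Stand-in token for demonstration only; a real renderer passes its own tokens.
token = SimpleNamespace(content="hey")
render_raw_text(None, token)       # v0.4-style call: (self, token)
render_raw_text(None, token, {})   # v0.3.1-style call: (self, token, footnotes)
print(raw_text)
```

The trade-off is that a genuinely wrong call no longer fails loudly, so this is best treated as a stopgap until everyone is on v0.4.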

from mistletoe.

miyuchina commented on May 27, 2024

Hey, sorry, just making sure I understand what you're trying to do: say the returned data is <p>sample paragraph</p>; what you want is sample paragraph, is that correct? And do you need all the text without the enclosing HTML tags, or do you want to isolate each token and do something with it individually?

In either case, it seems like you would probably want to write a custom renderer. Could you explain more what your project is, so that I might give appropriate suggestions?

samayo commented on May 27, 2024

Hi, thanks, and sorry for not making it clear.

Basically, I'm learning Python by creating a simple GitHub bot. The bot extracts the contents of a README.md file and checks whether everything is spelled correctly. Right now, with your library I can get all the contents of the README, but I want to limit it to the contents of the p, h1, h2, ... tags. In other words, I don't need the code parts.

So, I used your library as:

import os
import mistletoe

readme = os.path.join(os.getcwd(), "clones/samayo/Autolog/readme.md")
with open(readme, 'r') as fin:
    rendered = mistletoe.markdown(fin)
    print("rendered", rendered)

I get this:

<h2>Autolog</h2>
<p>APP class that to  errors or notifications from your app or from <code>/var/log/</code> or as they appear</p>
<h2>Install</h2>
<p>Using git</p>
<pre>
<code class="lang-bash">
$ git clone https://github.com/samayo/autolog.git
</code>
</pre>
<p>Using composer</p>
<pre>
<code class="lang-`bash">
$ composer require samayo/autolog
</code>

But I'm only interested in getting "Autolog APP class that to errors or notifications from your app or from Install ..."

I'm not sure whether you have this feature, but it would be nice to have a method that fetches content by tag, e.g. headers:

with open(readme, 'r') as fin:
  rendered = mistletoe.markdown(fin)
  print("rendered", rendered.getHeader())

If there's no way to achieve this, that's okay; I can use a regex for it.

miyuchina commented on May 27, 2024

Ah, I see what you mean. No need to write an entirely new renderer, then. In your case it would be easier to exploit the design of mistletoe, namely that every render function eventually calls render_raw_text, which is where the recursion bottoms out. Here's a hacky way to do it:

import mistletoe

readme = ['## hey\n', '\n', 'APP class that ...\n', '\n', 'using git\n']
raw_text = []

def render_raw_text(self, token):
    raw_text.append(token.content)
    return token.content

mistletoe.HTMLRenderer.render_raw_text = render_raw_text  # hacky!

rendered = mistletoe.markdown(readme)
print(raw_text)

So in this case I am creating a (global) variable called raw_text, which is a list to put all the contents into. I defined a new version of render_raw_text, essentially telling the renderer to first put whatever is in token.content into my raw_text list, and then do whatever rendering as needed. I then hacked this function onto my HTMLRenderer class (if you want a neater solution you may want to subclass it). Finally, I let the renderer run through all the tokens, rendering each one. In doing so my custom version of render_raw_text will get hit every time.

Here's what raw_text looks like, after running the script:

['hey', 'APP class that ...', 'using git']

I assume this is what you want? But great question, I might document this in an example in the future.

Let me know if something's confusing!
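To make the "recursion bottoms out" idea concrete, here is a toy model in plain Python. The Document, Heading, Paragraph and RawText classes and ToyRenderer are hypothetical stand-ins, not mistletoe's real classes. The point is only that container render methods recurse into their children until they reach a raw-text leaf, and that subclassing (the neater route mentioned above) keeps the collected text on the instance instead of in a global:

```python
# Toy model: hypothetical classes, NOT mistletoe's real token or renderer types.

class RawText:
    def __init__(self, content):
        self.content = content

class Container:
    def __init__(self, *children):
        self.children = list(children)

class Document(Container): pass
class Heading(Container): pass
class Paragraph(Container): pass

class ToyRenderer:
    def __init__(self):
        # Map token class names to render methods, loosely mimicking a render map.
        self.render_map = {
            "Document": self.render_inner,
            "Heading": lambda token: "<h2>" + self.render_inner(token) + "</h2>",
            "Paragraph": lambda token: "<p>" + self.render_inner(token) + "</p>",
            "RawText": self.render_raw_text,
        }

    def render(self, token):
        return self.render_map[type(token).__name__](token)

    def render_inner(self, token):
        # Containers recurse into their children...
        return "".join(self.render(child) for child in token.children)

    def render_raw_text(self, token):
        # ...and every recursion ends here, at a RawText leaf.
        return token.content

class TextCollector(ToyRenderer):
    """Subclass instead of monkey-patching: collected text lives on the instance."""

    def __init__(self):
        self.raw_text = []
        super().__init__()

    def render_raw_text(self, token):
        self.raw_text.append(token.content)    # record the text first...
        return super().render_raw_text(token)  # ...then render as usual

doc = Document(
    Heading(RawText("hey")),
    Paragraph(RawText("APP class that ...")),
    Paragraph(RawText("using git")),
)
collector = TextCollector()
html = collector.render(doc)
print(collector.raw_text)  # ['hey', 'APP class that ...', 'using git']
print(html)                # <h2>hey</h2><p>APP class that ...</p><p>using git</p>
```

Because the render map is built from self.render_raw_text, the subclass override is picked up automatically, which is the same reason the patched function in the hacky version gets hit for every token.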

samayo commented on May 27, 2024

Ah, thanks for the response and the detailed explanation. I'll check this over the weekend and see how it goes. I'll close this for now and re-open it if something comes up.

thanks again

samayo commented on May 27, 2024

Hmm, I finally got some time just now, and when I tried your code I got this error:

TypeError: render_raw_text() takes 2 positional arguments but 3 were given

I'll try parsing it with regex until you find a solution for this.

miyuchina commented on May 27, 2024

Version 0.4 released! Run pip3 install --upgrade mistletoe to upgrade.

samayo commented on May 27, 2024

Thanks. I'll do this now 👍

samayo commented on May 27, 2024

Nope, sadly this doesn't work. I updated to 0.4 and the error is gone, but I get all the contents of the README... Maybe I didn't make my question clear enough, because I also get the code and the markdown tags.

Can you check one last time whether this code is correct?

import os
import mistletoe

raw_text = []

def render_raw_text(self, token):
    raw_text.append(token.content)
    return token.content

path = os.path.join(os.getcwd(), "clones/samayo/Autolog/readme.md")
mistletoe.HTMLRenderer.render_raw_text = render_raw_text
with open(path, "r") as readme:
    rendered = mistletoe.markdown(readme)
print(raw_text)

miyuchina commented on May 27, 2024

Could you paste in the result of your print(raw_text), and what you want it to look like? Maybe also run it with the README of this repo, so that we have something to compare notes with. On my machine, even with a fresh install from pip, I seem to be seeing the correct outputs, so perhaps there's still miscommunication going on!

It's totally fine for this issue to drag on! I want to help you resolve this, because extracting stuff from HTML with regex is not a good idea.
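If you do end up post-processing the rendered HTML rather than hooking the renderer, a real HTML parser is safer than a regex. Below is a minimal sketch using the standard library's html.parser, a swapped-in technique that is not part of mistletoe. The tag sets are assumptions chosen to match the p/h1/h2 tags mentioned earlier, and the sample input is a trimmed copy of the rendered output posted in this thread:

```python
from html.parser import HTMLParser

class ProseExtractor(HTMLParser):
    """Collect text inside headings and paragraphs, skipping <code>/<pre> content."""

    KEEP = {"h1", "h2", "h3", "h4", "h5", "h6", "p"}  # assumed "prose" tags
    SKIP = {"code", "pre"}                            # assumed "code" tags

    def __init__(self):
        super().__init__()
        self.keep_depth = 0  # how many KEEP tags we are currently inside
        self.skip_depth = 0  # how many SKIP tags we are currently inside
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.KEEP:
            self.keep_depth += 1
        elif tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.KEEP and self.keep_depth:
            self.keep_depth -= 1
        elif tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-blank text that is inside prose and outside any code.
        if self.keep_depth and not self.skip_depth and data.strip():
            self.chunks.append(data.strip())

def extract_prose(html):
    parser = ProseExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks

html = (
    "<h2>Autolog</h2>\n"
    "<p>APP class that to errors or notifications from your app or from "
    "<code>/var/log/</code> or as they appear</p>\n"
    "<h2>Install</h2>\n"
    "<p>Using git</p>\n"
    '<pre>\n<code class="lang-bash">\n'
    "$ git clone https://github.com/samayo/autolog.git\n"
    "</code>\n</pre>\n"
)
print(extract_prose(html))
```

Note that stripping each chunk splits a sentence around inline code into separate pieces, which is usually fine for spell checking.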

samayo commented on May 27, 2024

Sorry for the delay. I hardly get any time off, so hopefully I can make it clear this time.

In short, I only want to get the headers and paragraphs (actual English words/text) from a README. That is, I don't want the code (anything inside ```, because you can't spell-check sys, for example), and I don't want the ### or --- symbols either. I only want the real, readable words, like:

Autolog
A PHP class that to save/log/mail errors or notifications from your app or from /var/log/ or as they appear

So, let's assume you have a README.md file that contains these lines:

## Autolog

A PHP class that to save/log/mail errors or notifications from your app or from `/var/log/` or as they appear 

Install
-----

Using git
```bash
$ git clone https://github.com/samayo/autolog.git
```bash
Using composer
```bash
$ composer require samayo/autolog
```bash

Usage
-----
#### Short Example. 
A simple snippet to send a message to your inbox. 
```php
require __DIR__ . "/src/Logger.php"; 

Since I only want the actual headers and sentences, I expect mistletoe to give me everything except the code parts and the markdown markers (###, ---).

So, if I run this code with Python 3.x:

import os
import mistletoe

raw_text = []

def render_raw_text(self, token):
    raw_text.append(token.content)
    return token.content

path = os.path.join(os.getcwd(), "clones/samayo/Autolog/readme.md")
mistletoe.HTMLRenderer.render_raw_text = render_raw_text
with open(path, "r") as readme:
    rendered = mistletoe.markdown(readme)
print(raw_text)

This is what I get back:

[root@samayo typot]# python3 typot.py
['Autolog', 'A PP class that to  errors or notifications from your app or from ', '/var/log/', ' or as they appear', 'Install', 'Using git', '$ git clone https://github.com/samayo/autolog.git\n', 'Using composer', '$ composer require samayo/autolog\n````\n\nUsage\n-----\n#### Short Example. \nA simple snippet to send a message to your inbox. \n```php\nrequire __DIR__ . "/src/Logger.php"; \n\n$log = new  => "[email protected]"]);\n\nif($userCommented){\n   $log->log("Someone just commented!", $log::INFO, $log::EMAIL);     \n}\n', 'The ', '$log->log()', ' method accepts 4 arguments, only the first ', '$msg',
' is required, others are optional.', 'log($msg, $type, $handler, $verbosity);\n', 'You can use different logtype, handler and verbosity', '// options for $type  - to describe the log type\n$type::INFO; // info for simple tasks\n$type::ERROR; // simple error like 404 .. \n$type::ALERT; // fatal error or suspicious activity\n\n// Options for $handler - on how to registered the log message\n$handler::EMAIL; // send to email\n$handler::FILE; // write to file\n$handler::DATABASE; // insert to database \n$handler::SMS; // send to sms\n\n// do you need the all the info, or relevant (simple)\n$verbosity::SIMPLE; // send simplified log\n$verbosity::VERBOSE; // send every log information\n\n// to get log of all error in verbose format\n$log = new  => "[email protected]"]);\n$log->log($msg, $log::ERROR, $log::EMAIL,
$log::VERBOSE);\n', 'Passing only the message as ', '$log->log($msg)', " is possible, and it'll be handled type INFO, and sent by email", 'Examples', '##### Sending logs to your email', '// First you need to setup your email as\n$log = new Taut...

This is clearly not what I intended: I get everything, even the code. I get what's inside my code blocks, and I even get markdown markers, as in '##### Sending logs to your email', when it should have been just 'Sending logs to your email'.

Anyway, if it's still not clear, don't bother with it; it was a silly project and not worth wasting any more of our time. Thanks.

samayo commented on May 27, 2024

Oh, and yeah, I knew about the problems with parsing HTML with regex. I was just out of options :)

miyuchina commented on May 27, 2024

I see what the problem is. So you want to suppress BlockCode and InlineCode outputs. The cleanest way would be to write your own renderer. Here's the quick and dirty way, though:

import mistletoe

raw_text = []

class CustomRenderer(mistletoe.HTMLRenderer):
    def render_inline_code(self, token):
        return ''

    def render_block_code(self, token):
        return ''

    def render_raw_text(self, token):
        raw_text.append(token.content)
        return token.content

with open('foo.md', 'r') as f:
    with CustomRenderer() as renderer:
        rendered = renderer.render(mistletoe.Document(f))

print(raw_text)
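Returning '' from the two code methods works because a code token's children are never visited, so their text never reaches render_raw_text. A toy walker makes the effect visible; the nested (kind, payload) tuples are a hypothetical stand-in for mistletoe's token tree, not its real representation, and the sample text is taken from the README excerpt above:

```python
# Toy token tree (hypothetical, NOT mistletoe's classes): each token is
# (kind, payload); RawText payloads are strings, all others are child lists.
SKIP = frozenset({"InlineCode", "BlockCode"})

def collect_text(token, skip=SKIP):
    kind, payload = token
    if kind in skip:
        # Like render_inline_code / render_block_code returning '':
        # the children are never visited, so their text is never recorded.
        return []
    if kind == "RawText":
        # The leaf where render_raw_text would append to raw_text.
        return [payload]
    collected = []
    for child in payload:
        collected.extend(collect_text(child, skip))
    return collected

doc = ("Document", [
    ("Heading", [("RawText", "Autolog")]),
    ("Paragraph", [
        ("RawText", "A PHP class that to save/log/mail errors or notifications from your app or from "),
        ("InlineCode", [("RawText", "/var/log/")]),
        ("RawText", " or as they appear"),
    ]),
    ("Heading", [("RawText", "Install")]),
    ("Paragraph", [("RawText", "Using git")]),
    ("BlockCode", [("RawText", "$ git clone https://github.com/samayo/autolog.git\n")]),
])
print(collect_text(doc))
```

Passing skip=frozenset() brings the code text back, which mirrors the behaviour before the two overrides were added.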

The other problem is that your markdown format is slightly messy. For code blocks, the format should be:

```python
# some code
```

... instead of:

```python
# some code
```python

I don't plan to support the second format (CommonMark doesn't support it either), so maybe clean it up before you feed it into mistletoe?
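One way to do that cleanup is a small pre-processing pass over the lines. The helper below is hypothetical and purely heuristic, not a CommonMark parser: while a backtick fence is open, it treats any fence line that carries an info string (such as ```bash) as the closing fence and drops the info string. It ignores ~~~ fences and would misfire on a code block that itself shows fence lines:

```python
import re

# Up to three spaces of indentation, then a run of 3+ backticks, then anything.
FENCE = re.compile(r"^( {0,3})(`{3,})(.*)$")

def normalize_closing_fences(lines):
    """Hypothetical cleanup helper: rewrite closing fences like ```bash as ```."""
    cleaned = []
    open_run = None  # backtick run of the currently open fence, if any
    for line in lines:
        match = FENCE.match(line.rstrip("\n"))
        if match and open_run is None:
            open_run = match.group(2)  # opening fence: keep it as-is
            cleaned.append(line)
        elif match and len(match.group(2)) >= len(open_run):
            # Closing fence: keep indentation and backticks, drop the info string.
            cleaned.append(match.group(1) + match.group(2) + "\n")
            open_run = None
        else:
            cleaned.append(line)
    return cleaned

sample = [
    "Using git\n",
    "```bash\n",
    "$ git clone https://github.com/samayo/autolog.git\n",
    "```bash\n",
    "Using composer\n",
]
print(normalize_closing_fences(sample))
```

Since mistletoe.markdown accepts a list of lines, as in the earlier example, usage would be something like mistletoe.markdown(normalize_closing_fences(f.readlines())).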

samayo commented on May 27, 2024

Ah, OK, thanks. I'll check this out; closing for now. Also, I only added the second python to the closing fence so I could render the example here; otherwise I use the first format, like everyone else :)

cheers
