elliotgao2 / tomd
Convert HTML to Markdown.
License: GNU General Public License v3.0
Suppose we have HTML like this:
<p>this was jhon's car finally arrived at jane's place</p>
and we get:
this was jhon
'
s car finally arrived at jane
'
s place
I'm currently busy with something else, so I have no time to dig into this, but the bug appears to be present.
Images in web pages cannot be converted to Markdown.
Hello, I found that it can't convert a self-closing tag like <img src="https://github.com" class="dsad"/>.
But it works fine with <img src="https://github.com" class="dsad"></img>.
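While scraping a CSDN article, I ran into a problem converting the tags below.
Original article: https://blog.csdn.net/weixin_38405253/article/details/100151657
<li>
RetentionPolicy.SOURCE: the annotation is retained only in the source file
</li>
<li>
RetentionPolicy.CLASS : the annotation is retained in the class file and discarded when loaded into the JVM
</li>
<li>
RetentionPolicy.RUNTIME: the annotation is retained while the program runs, so all annotations defined on a class can be obtained through reflection.
</li>
I looked at the tomd source but couldn't quite follow it, so I wasn't sure how to fix it there. Instead I wrote my own patch; the code is as follows: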
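import re

# Sample input: each <li> keeps its text on a separate line from the tags.
str_ = '''<li>
RetentionPolicy.SOURCE: the annotation is retained only in the source file
</li>
<li>
RetentionPolicy.CLASS : the annotation is retained in the class file and discarded when loaded into the JVM
</li>
<li>
RetentionPolicy.RUNTIME: the annotation is retained while the program runs, so all annotations defined on a class can be obtained through reflection.
</li>'''
# Match each <li>...</li> pair; re.S lets "." span the newlines inside an item.
pattern = re.compile(' *<li.*?>(.*?)</li>', re.S)
# Replace every item with a "+ " Markdown bullet followed by its stripped text.
s = re.sub(pattern, lambda temp: "+ " + temp.group(1).strip(), str_)
print(s)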
<em>-tags convert to example instead of example
It is better to add header space.
It is very strict about the HTML format and does not work in this situation:
<ul>
<li>123123131</li></ul><ul><li>1
</li>
<li>2</li>
<li>3</li>
</ul>
Please take a look.
The processed output does not build a correct table in Markdown.
It seems that \n and \t have to be removed before the HTML data is processed.
table='''
head1 | head2 | head3 |
---|---|---|
content1 | content2 | content3 |
'''
md = Tomd(table).markdown
Result:
|head1 |head2 |head3
|------
|content1| |content2| |content3|
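The suggestion above (removing \n and \t before conversion) can be sketched as a small preprocessing step. This is only an illustration, not part of tomd: the helper name strip_inter_tag_whitespace is hypothetical, and it assumes the stray whitespace sits between tags rather than inside text that must be preserved (for example inside <pre>).

```python
import re


def strip_inter_tag_whitespace(html: str) -> str:
    """Remove whitespace-only runs (spaces, tabs, newlines) between adjacent tags.

    Hypothetical helper for illustration; it is not part of tomd.
    """
    # ">\s+<" matches whitespace between the end of one tag and the start of the next.
    collapsed = re.sub(r">\s+<", "><", html)
    # Also drop leading and trailing whitespace around the whole fragment.
    return collapsed.strip()


raw = """
<table>
    <tr>
        <th>head1</th>
        <th>head2</th>
    </tr>
</table>
"""
print(strip_inter_tag_whitespace(raw))
```

The cleaned string could then be passed to Tomd(...).markdown as in the report above. Whether this alone yields a valid table has not been verified here.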
Chinese text comes out garbled when I use this. Is there any way to fix it?
html = """
<p>paragraph
<img src="https://github.com"></img>
</p>
<img src="https://github.com"></img>
"""
print(tomd.convert(html))
Hi, I found the following case.
When the tag is <img></img>:
tomd.convert('''<p><img src="https://github.com" class="dsad"></img></p>''')
the parsed result is \n![](https://github.com)\n, which is what I expect.
But when the tag is <img />:
tomd.convert('''<p><img src="https://github.com" class="dsad"/></p>''')
the result is \n<img src=\"https://github.com\" class=\"dsad\"/>\n, so it seems that self-closing tags cannot be parsed.
Can we fix this?
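Until this is fixed in the library, one possible workaround is to rewrite self-closing <img/> tags into the paired form that the report above says already converts correctly. The sketch below is an assumption-laden illustration: expand_self_closing_img is a hypothetical helper, not a tomd function, and the regex only handles simple tags whose attribute values contain no ">" character.

```python
import re

# Matches <img ... /> and captures the attribute text before the closing "/>".
SELF_CLOSING_IMG = re.compile(r"<img\b([^>]*?)\s*/>", re.IGNORECASE)


def expand_self_closing_img(html: str) -> str:
    """Rewrite <img .../> as <img ...></img> (hypothetical helper, not part of tomd)."""
    return SELF_CLOSING_IMG.sub(r"<img\1></img>", html)


html = '<p><img src="https://github.com" class="dsad"/></p>'
print(expand_self_closing_img(html))
```

The rewritten string can then be passed to tomd.convert. Tags that are already in paired form are left unchanged.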
tomd.convert('<p><b> bold </b></p>') # '\n** bold **\n', works
tomd.convert('<b> bold </b>') # "", does not work
Maybe pyquery could be useful, something like this:
from pyquery import PyQuery as pq
from tomd import MARKDOWN
html = "<b> bold </b>"
doc = pq(html)
for elm, val in MARKDOWN.items():
    # for item in doc(elm): replace item.html() with val[0] + pq(item).text() + val[1]
    pass
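Since the two calls above show that the same inline tag converts once it is wrapped in <p>, a simpler stopgap than a new parser might be to wrap bare fragments before conversion. This is a minimal sketch under that assumption: wrap_bare_inline and the BLOCK_TAGS list are hypothetical and are not taken from tomd.

```python
import re

# Tags treated as block-level for this sketch (an assumption, not tomd's own list).
BLOCK_TAGS = ("p", "h1", "h2", "h3", "h4", "h5", "h6",
              "ul", "ol", "table", "blockquote", "pre")
_STARTS_WITH_BLOCK = re.compile(r"\s*<(%s)\b" % "|".join(BLOCK_TAGS), re.IGNORECASE)


def wrap_bare_inline(html: str) -> str:
    """Wrap a fragment in <p>...</p> unless it already starts with a block-level tag."""
    if _STARTS_WITH_BLOCK.match(html):
        return html
    return "<p>" + html + "</p>"


print(wrap_bare_inline("<b> bold </b>"))
```

The wrapped string matches the form that the first call above reports as working. Fragments that mix bare inline content with later block elements are not handled by this sketch.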
I am trying to convert my HTML.
But the <br/> tag in particular is not replaced with Markdown syntax.
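One way to work around this could be to replace leftover <br> tags in the converted text with a Markdown hard line break (two trailing spaces followed by a newline). The sketch below assumes that the tag is passed through unchanged, as the report suggests; br_to_markdown is a hypothetical helper, not part of tomd.

```python
import re

# Matches <br>, <br/>, and <br /> in any letter case.
BR_TAG = re.compile(r"<br\s*/?>", re.IGNORECASE)


def br_to_markdown(text: str) -> str:
    """Replace each <br> variant with a Markdown hard line break."""
    # Two spaces before the newline mark a hard line break in Markdown.
    return BR_TAG.sub("  \n", text)


print(repr(br_to_markdown("first line<br/>second line")))
```

Applying it after conversion rather than before avoids depending on how the converter treats trailing whitespace.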
Hi,
Can we have a new release of this, even with the code as is? I'm a big fan and user of this lib, and I'd like to have a new release instead of hand-patching on every computer I use.
It would be greatly appreciated.
Example:
<tr height="19">
<td style="border-bottom:#000000 1px solid;text-align:center;border-left:#000000 1px solid;font-style:normal;width:72px;height:19px;color:#000000;font-size:12px;vertical-align:middle;border-top:#000000 1px solid;font-weight:700;border-right:#000000 1px solid;text-decoration:none;mso-text-control:shrinktofit;mso-protection:locked visible" class="et2" height="19" width="72">
The above tags are not parsed and are turned into an empty string.
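If the attributes on the table tags are what prevents parsing (an assumption; the report does not confirm the cause), stripping them before conversion might help. The sketch below uses a hypothetical helper, strip_table_attributes, which is not part of tomd, and it only touches table-related tags so that attributes such as src or href elsewhere are kept.

```python
import re

# Opening table-related tags that carry at least one attribute.
TABLE_TAG_WITH_ATTRS = re.compile(
    r"<(table|thead|tbody|tr|th|td)\s+[^>]*>", re.IGNORECASE
)


def strip_table_attributes(html: str) -> str:
    """Drop all attributes from table, thead, tbody, tr, th, and td opening tags."""
    return TABLE_TAG_WITH_ATTRS.sub(r"<\1>", html)


row = '<tr height="19"><td style="color:#000000;font-size:12px" class="et2">cell</td></tr>'
print(strip_table_attributes(row))
```

The regex assumes attribute values contain no ">" character, which holds for the example in the report.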