evbacher / gd2md-html



Convert a Google Doc to Markdown or HTML. This Docs add-on converts a Google Doc to simple Markdown and/or HTML.

License: Apache License 2.0

Languages: JavaScript 91.87%, HTML 8.13%

Topics: markdown, html, google-docs, google-docs-addon

gd2md-html's Introduction

Docs to Markdown (gd2md-html)

Docs to Markdown is a free Google Docs add-on that converts a Google Doc to simple, readable Markdown or HTML text.

Overview

Docs to Markdown lets you use Google Docs' editing, formatting, and collaboration tools before you publish to a Markdown or HTML platform. Docs to Markdown also lets you select and convert part of a Google Doc.

Docs to Markdown requires minimal permissions: it will ask for permission to access the current Doc (to convert it) and permission to create a sidebar (the user interface). It requires no other permissions.

Use Google Docs styles for headings (Heading 1, Heading 2, etc.) so that Docs to Markdown can convert them properly to Markdown or HTML heading styles. If you just make them bold and large, they will convert as normal paragraphs.

Installation and other details

Installation: Install Docs to Markdown from the Google Workspace Marketplace.
License: Docs to Markdown uses the Apache 2.0 license.
Documentation: See gd2md-html docs for more information about features and usage.
Privacy policy.

Contributing

If you want to contribute docs or code to this project, please read CONTRIBUTING.md and dev.md.

Contributors

Thanks to all!

gd2md-html's Issues

Suppress error messages in document?

Hey, thanks for making this—I'm glad to see a tool that converts Markdown without many errors around bold and italic characters, which I know was a sticking point of the old gdocs2md plugin. I just wanted to offer a couple of thoughts/ideas on this:

I think there should be a way to suppress error messages, especially as they show up inline in some cases. (Some of the things that create error messages, specifically the message around there being multiple H1s in a document, aren't useful for me, as I tend to write multiple headlines when I'm editing.) My suggestion would be to either put them all at the top of the document so they don't disrupt the flow of the doc, or to allow users to turn them off entirely. For my purposes (mostly just standard core features of the MD spec like links, headers, bold, ital, blockquote, and the occasional image), the error messages would probably just slow things down.
Since Google Docs doesn't have a natural equivalent to the blockquote, I wonder if you'd be open to making it so that when a section is indented, it treats the indent the same as a blockquote. Could be left as an option for users.

Overall though, thanks for creating this. I write everything in Markdown, but all my stuff inevitably gets edited in Gdocs, meaning that it creates a lot of headaches around trying to convert things back. This helps ease some of the frustrations, I think. :)

Convert strikethrough text

Convert strikethrough text in a Doc to ~~strikethrough~~ in Markdown and <del>strikethrough</del> in HTML.
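A minimal sketch of the requested mapping follows. It assumes the converter already knows which text runs are struck through. formatStrikethrough is a hypothetical helper, not the add-on's code.

Note that ~~ is a GitHub Flavored Markdown extension rather than part of core Markdown, so some renderers will show the tildes literally.

```javascript
// Hypothetical helper: wrap one struck-through run for the chosen output format.
function formatStrikethrough(text, target) {
  if (text.trim() === "") return text; // nothing visible to strike through
  switch (target) {
    case "markdown":
      return `~~${text}~~`; // GitHub Flavored Markdown strikethrough
    case "html":
      return `<del>${text}</del>`;
    default:
      throw new Error(`Unknown target: ${target}`);
  }
}

console.log(formatStrikethrough("old price", "markdown")); // ~~old price~~
```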

Adds unwanted line breaks to html output

HTML output seems to add line breaks to the output code: not <br> tags, but actual line endings. HTML ignores this whitespace, but when the output is pasted into WordPress, the newlines are turned into line breaks, which affects the flow of text.

Option to use Github markdown style

This Chrome extension is really handy. Can you support a [ ] GitHub Markdown option so that GitHub-friendly Markdown is generated?

Smart quotes get turned into straight quotes

As editors of an arts publication, we often use “smart” quotes in our articles.
However, when we use gd2md-html, the smart quotes all get turned into "straight quotes".

It would be great if they were also replaced with “ and ”.

Is there any way this could be added as a setting or option? Or are there ways we can extend the functionality of this add-on?

Nested tables in lists get mangled

Hello again!

While I continue busily pushing the envelope of your excellent script, I've discovered a consistent issue with tables nested in (ordered) lists. The script creates the following, comments mine:

<li> Enter the following:
    <table>
        <tbody>
            <tr>
                <td>
                    <strong>Heading Cell 1</strong>
                </td>
            </tr>
        </tbody>
    </table>
</li>
</ol> <!-- creates a closing tag -->
<p>
    <strong>Heading Cell 2</strong> <!-- heading gets its own paragraph -->
</p>
<p> <!-- contents get dumped here -->
    Row 1 Cell 1Row 1 Cell 2
    Row 2 Cell 1Row 2 Cell 2
    Row 3 Cell 1Row 3 Cell 2</p>
<ol> <!-- opens a new list for the following step -->

Won't convert just a selection

When I select part of my doc to convert to HTML, the tool doesn't put anything into the output section. It processes for a second and seems like it's done some "thinking" but then there's nothing there. Screenshot attached for reference.

Blank headings should be ignored

A blank heading in a Google Doc (whitespace characters only) may cause problems with some HTML or Markdown publishing systems. gd2md-html should ignore such headings. Blank plain paragraphs are already ignored (except when they appear in code blocks).
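A minimal sketch of that rule follows. It assumes heading text arrives as a plain string. isBlankHeading and renderHeading are hypothetical names, and an HTML path would apply the same check before emitting <h1> through <h6>.

```javascript
// Hypothetical check: true when a heading's text is empty or whitespace-only.
function isBlankHeading(text) {
  // String.prototype.trim removes spaces, tabs, line terminators, and
  // Unicode spaces such as U+00A0 (non-breaking space).
  return text.trim() === "";
}

// Emit a Markdown heading, or nothing at all for a blank one.
function renderHeading(level, text) {
  if (isBlankHeading(text)) return "";
  return `${"#".repeat(level)} ${text.trim()}`;
}

console.log(JSON.stringify(renderHeading(2, " \u00a0\t"))); // ""
console.log(renderHeading(2, "Overview"));                 // ## Overview
```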

Names of images in markdown

Hi, thanks for working on this :)

When you export to HTML from Google Docs, it names the files image1.jpg, image2.jpg, etc.
We get a nice folder of images.

If you output ![](images/image1.jpg) and so on, instead of title + index + .jpg, it would make life a lot easier when mapping image files.

We could simply copy the folder into the relative path and it should work.

If you then suppressed all the error warnings, it would make this process a bit cleaner.
What do you think?

HTML conversion clears image alt text and original image filename

Converting to HTML clears any indication of the original image filename. Renamed images also don't correspond to the "image0, image1, etc" convention used when exporting to HTML from Google Docs, so there isn't a clean way to figure out which image file goes with which broken element.

For example, with a Google Docs filename starting with "Deploying":

<img src="images/Deploying4.png" width="" alt="alt_text" title="image_tooltip">

The HTML export directly from Google Docs contains the alt text:

<img alt="object_id.png" src="images/image3.png" ...

Unless I'm overlooking something, preserving the alt attribute seems necessary for fixing an exported file.
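A minimal sketch of preserving the alt attribute follows. It assumes the converter can read each image's alt text; in Apps Script, the InlineImage class exposes getAltDescription() and getAltTitle(). imgTag is a hypothetical helper. The alt text is attribute-escaped so that quotes in it cannot break the tag.

```javascript
// Escape a value for use inside a double-quoted HTML attribute.
function escapeAttribute(value) {
  return String(value)
    .replace(/&/g, "&amp;") // first, so the entities added below aren't re-escaped
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Hypothetical: build an <img> tag that keeps the image's original alt text.
function imgTag(src, altText = "", title = "") {
  const attrs = [`src="${escapeAttribute(src)}"`, `alt="${escapeAttribute(altText)}"`];
  if (title) attrs.push(`title="${escapeAttribute(title)}"`);
  return `<img ${attrs.join(" ")}>`;
}

console.log(imgTag("images/image3.png", "object_id.png"));
// <img src="images/image3.png" alt="object_id.png">
```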

Handle embedded paragraphs in lists and nested lists (Markdown).

Not handling embedded paragraphs in lists properly for either Markdown or HTML.

Terminations on newlines

If I have some content in gDocs that is marked up like this (in HTML, just to illustrate):

<strong>My bolded text<br /></strong>My regular text

This add-on generates this:

**My bolded text
**My regular text

This makes sense, but it breaks the Markdown interpretation of what is really intended. It would be better if it figured that out and produced this:

**My bolded text**
My regular text

Unfortunately, there is no way in gDocs for you to realize that you are committing that formatting mistake, so we can't really pin it on the gDocs editor who wrote it.

Great add-on, by the way; thanks for sharing!
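A minimal sketch of the fix the reporter describes follows. It assumes the converter sees a bold run's text, including any trailing line break, as a single string. wrapBold is a hypothetical name, not the add-on's function.

```javascript
// Hypothetical helper: wrap a bold run, moving leading/trailing whitespace
// (including a trailing soft line break) outside the ** markers.
function wrapBold(text) {
  // \s matches \n, \r, and \v, so this works whichever character the
  // line break arrives as.
  const [, lead, core, trail] = text.match(/^(\s*)([\s\S]*?)(\s*)$/);
  if (core === "") return text; // whitespace-only run: emit it unformatted
  return `${lead}**${core}**${trail}`;
}

console.log(wrapBold("My bolded text\n") + "My regular text");
// **My bolded text**
// My regular text
```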

Support: Where is the source code?

Hi! Thanks so much for this tool! I'm hoping to script similar functionality, and was curious whether the source code existed anywhere, or if it was private :)

HTML conversion adds space into hyperlink incorrectly

To recreate the issue:

Create a link within a sentence
Convert to HTML (leaving all options off)
See that the output has added the preceding space into the hyperlink, as shown in the following screenshot:

Add option to configure bullet conversion

This is the best Docs-to-Markdown conversion I've found yet, but there's still one document feature that I find I have to do significant post-processing on to get to the way I want it: bullets. I prefer to use hyphens (-), not asterisks (*), to indicate bullets in Markdown, and I want each level of indentation beyond the leftmost to be indented by two, rather than apparently four, spaces. It would be really nice if the converter either met these conventions by default or allowed the user to configure which character to use for bullets and how much to indent each subsequent level of nested bullets.
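A minimal sketch of the two requested settings follows: the bullet character and the spaces per nesting level. The bullet/indent options object and renderBulletList are hypothetical, and the add-on's real settings may differ. Each item is assumed to carry its text and its nesting level, starting at 0.

```javascript
// Hypothetical options; the add-on's real settings may differ.
function renderBulletList(items, { bullet = "-", indent = 2 } = {}) {
  return items
    .map(({ text, level }) => `${" ".repeat(level * indent)}${bullet} ${text}`)
    .join("\n");
}

const items = [
  { text: "Top level", level: 0 },
  { text: "Nested", level: 1 },
];
console.log(renderBulletList(items));
// - Top level
//   - Nested
```

Two spaces are enough for "-" bullets. Under CommonMark, a nested item only needs to start at its parent item's content column, which is column 2 after "- ". An ordered marker such as "1. " would need three.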

Fringe cases with lists and html

I love this add-on! I'm still finding a few inconsistencies in the list handling, though.
Here are a few edge cases where lists still don't nest or behave quite right in HTML. All three cases were found when converting real docs.

First case (two lists next to each other, e.g. list 1 is numbered and list 2 is bulleted):
1. List 1 item 1
* list 2 item 1
Generates the following (misses the fact that it's a second list):
<ol>
<li>List 1 item 1
<li>List 2 item 1

Second case (a blank line between two lists):
1. List 1 item 1

* list 2 item 1
Generates the following (indents it another level and never closes the </ol>):
<ol>
<li>List 1 item 1
<ul>
<li>List 2 item 1

Third case (indented trailing text after the list; </li> should be before the text, not after it)
1. List 1 item 1
Indented text
Generates the following. The example doesn't show it clearly, but 'Indented text' is indented by whitespace, or its margin is different: anything other than being flush against the left-hand margin.
<ol>
<li>List 1 item 1
Indented text
</li>
</ol>

Thanks for this awesome add-on!
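A minimal sketch of list rendering that would avoid the three cases above follows. It assumes each list paragraph has already been reduced to its text, its nesting level, and whether it is numbered. renderList and the item shape are hypothetical, not the add-on's internals.

The sketch keeps a stack of open lists, so every <li> gets its closing </li>. It also starts a new list when the list type changes at the same level. For the third case, it would be called on each contiguous run of list paragraphs, so the list is closed before the following text is emitted.

```javascript
// Hypothetical renderer for a run of list paragraphs. Each item is
// { text, level, ordered }; text is assumed to be HTML-escaped already.
function renderList(items) {
  const out = [];
  const stack = []; // stack[d] is "ol" or "ul" for the open list at depth d
  for (const { text, level, ordered } of items) {
    const tag = ordered ? "ol" : "ul";
    // Never go more than one level deeper than the current nesting.
    const depth = Math.min(level, stack.length);
    // Close deeper lists, plus the <li> each one sits inside.
    while (stack.length > depth + 1) {
      out.push("</li>", `</${stack.pop()}>`);
    }
    if (stack.length === depth + 1 && stack[depth] !== tag) {
      // Same level, different list type: end this list and start a new one.
      out.push("</li>", `</${stack.pop()}>`);
    } else if (stack.length === depth + 1) {
      out.push("</li>"); // close the previous sibling item
    }
    while (stack.length < depth + 1) {
      stack.push(tag);
      out.push(`<${tag}>`);
    }
    out.push(`<li>${text}`);
  }
  while (stack.length > 0) {
    out.push("</li>", `</${stack.pop()}>`);
  }
  return out.join("");
}

console.log(renderList([
  { text: "List 1 item 1", level: 0, ordered: true },
  { text: "list 2 item 1", level: 0, ordered: false },
]));
// <ol><li>List 1 item 1</li></ol><ul><li>list 2 item 1</li></ul>
```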

Export feature and ability to suppress errors.

Hello,

I love your add-on, and I know this is a big ask! But I’d love the following features:

The ability to suppress errors. All the errors I get relate to the src of links. I’d like to suppress them so I can get a clean HTML export.
Ability to download the file once exported, along with the images inside the document.

Thanks,

Dave.

HTML tags are not escaped in HTML code blocks

When HTML tags appear in code blocks, the intention is for them to be displayed, not interpreted. We need to escape the opening < to prevent the browser from rendering them as markup.
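A minimal sketch of the escaping follows. The function names are hypothetical. Escaping < alone stops tag parsing, but & should be escaped too (and first), so that entity text typed literally in the Doc, such as &amp;, is displayed as written. The same escaping applies to plain text outside code blocks.

```javascript
// Escape text so the browser displays it instead of parsing it as markup.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;") // first, so later entities aren't double-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Hypothetical code-block renderer built on escapeHtml.
function renderCodeBlock(code) {
  return `<pre><code>${escapeHtml(code)}</code></pre>`;
}

console.log(renderCodeBlock("<p>a & b</p>"));
// <pre><code>&lt;p&gt;a &amp; b&lt;/p&gt;</code></pre>
```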

Handle embedded paragraphs in lists and nested lists (HTML)

We are still having a problem with nested list output in HTML. Markdown output is working now.

In Markdown, underlined text becomes HTML-tagged, instead of wrapped in emphasis characters

I suggest/request that when converting to Markdown, underlined text should be wrapped in *single asterisks*.

I want to suggest single underscores, because they visually resemble underlining so strongly, but I'm instead suggesting single asterisks, because:

It (unscientifically, admittedly) appears very common to me that underscores are used to indicate italics, including right here in GitHub posts.
Making it single asterisks instead of single underscores provides a distinct mapping that, though it will likely be rendered the same as if it were underscores, can survive a round-trip conversion.

For those who aren't familiar, the Markdown syntax document simply asserts that wrapping text in a single emphasis character (whether *asterisks* or _underscores_) should result in <em> tags when generating HTML, while wrapping it in doubled-up emphasis characters (again, **double-asterisks** or __double-underscores__) should result in <strong> tags.

Only the Last List-Item Closes

Thing 1

thing 2

Thingy 3

Happened on both lists.

Explore converting simple tables to Markdown

We could potentially have an option to convert tables to Markdown. If there are complex tables, I suspect they will get mangled. If that's the case, you can just use HTML tables. I don't really want to add logic to determine if a table is simple enough to convert to Markdown, because that will add a lot of overhead (time and code).

Name and discoverability

This is a good add-on, but the name "gd2md" makes it hard to find.

I suggest you rename the add-on to something like "Docs Markdown Export", or similar, within the domain of human languages :)

Is it possible to assign a shortcut?

The add-on is fantastic; I'm wondering if it's possible to assign a custom shortcut to activate it.

Remove requirement to have a TOC to have links to headers work

One of the points of HTML is to be able to reference things via hyperlink from within the document. Currently, if I have text that is linked to a Header (which is very common), it spits out the

ERROR: undefined internal link to this URL: "#heading=h.ksq44f7qkbfg".link text: Header 2"

error unless I have a blue-link-style TOC earlier in the document (a page-number TOC still causes the error, even though it has embedded links too). I know this is a known issue, but it is a big limitation when converting cross-referenced documents.

It would be really great not to have to insert a TOC at all; most of the docs I convert that have internal cross-references don't have one and never will. I imagine you could just use the embedded Docs #id when you encounter a link, hope it turns up later, and, if it doesn't, emit the error?

In the meantime, could it suppress this inline error in the HTML when the links are stale? With large, cross-referenced documents, it is a huge amount of manual work to remove these error messages from the HTML every time it gets converted. This is in addition to #5, which requests something similar.
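A minimal sketch of the reporter's suggestion follows. It assumes the converter can collect each heading's Docs ID (the h.xxxx part of #heading=h.xxxx) and its text before emitting links. The headings and links shapes, resolveInternalLinks, and the slug rule are all hypothetical.

Because the heading map is built in a first pass, no TOC is needed. Only links whose target never appears in the document get an error.

```javascript
// Hypothetical slug rule for heading anchors; a real converter might differ.
function slugify(text) {
  return text
    .toLowerCase()
    .trim()
    .replace(/[^a-z0-9]+/g, "-") // collapse runs of other characters into "-"
    .replace(/^-+|-+$/g, "");     // trim leading/trailing dashes
}

// Two passes: map every heading first, then resolve links, so a link to a
// heading that appears later in the document still resolves.
function resolveInternalLinks(headings, links) {
  const anchors = new Map(headings.map(({ id, text }) => [id, `#${slugify(text)}`]));
  return links.map((link) => {
    const match = link.url.match(/^#heading=(h\.[\w-]+)$/);
    if (!match) return link; // not an internal heading link: leave unchanged
    const target = anchors.get(match[1]);
    return target
      ? { ...link, url: target }
      : { ...link, error: `undefined internal link: ${link.url}` };
  });
}
```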

It is impossible to remove this app

Superscript is being converted to html when you select markdown.

Currently, superscripted text is wrapped in <sup> blocks when Markdown conversion is selected.

Superscript is not part of the canonical Markdown specification, so it should be ignored or converted to ^. It should not be converted to HTML.

Markdown for newlines

Related but different to the other issue I raised.

When the gDoc contains a newline (via Shift+Enter, as opposed to a paragraph break), Markdown can mark it up using two spaces at the end of the line. However, that functionality appears to be absent from this add-on.
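A minimal sketch of that markup follows. It assumes the soft break may arrive in the paragraph text as \r\n, \r, \n, or \v (vertical tab). Which character Google Docs actually uses is not stated here, so all of them are handled. softBreaksToMarkdown is a hypothetical helper.

```javascript
// Hypothetical: turn soft line breaks inside one paragraph into Markdown hard breaks.
// Assumes the break may arrive as \r\n, \r, \n, or \v (vertical tab).
function softBreaksToMarkdown(paragraphText) {
  // Two trailing spaces before a newline is Markdown's hard line break.
  return paragraphText.split(/\r\n|[\r\n\v]/).join("  \n");
}

console.log(JSON.stringify(softBreaksToMarkdown("line one\rline two")));
// "line one  \nline two"
```

CommonMark also accepts a backslash at the end of a line as a hard break. That form survives editors that strip trailing whitespace.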

TypeError: Cannot find function getSelection in object ERROR opening active document: Exception: Action not allowed.

When I first tried to use the add-on, I went to it and hit convert and the sidebar would open properly, but then when I tried to click either markdown or HTML, I got a 'ScriptError: Authorization is required to perform that action.' error. As far as I can tell, it does already have every permission I can give it, and I do not get any sort of authorization dialogue when I try it. I have already tried removing and re-adding it, as well.

^That's the error it gave

After that, I tried adding the add-on to my main/ default google account, the one my chrome is synced to. When on google docs on that account, I can use the add-on without any issues; I also tried using it on my other account in a document that both accounts had access to and it worked fine. However, if I try to use it in a google doc that my main account doesn't have access to, I instead get a different error: 'TypeError: Cannot find function getSelection in object ERROR opening active document: Exception: Action not allowed.'

It works fine if I add my main google account to the doc, but I prefer to keep my accounts' documents separate, so it's a little bit annoying. (Otherwise it's great, though!)

Thanks.

Plaintext < and > are not exchanged for &lt; and &gt;

I write with angle brackets a lot, and if these characters are not escaped, this add-on is nearly unusable: any text between angle brackets is treated as an HTML tag and not displayed.

How to specify target in generated href

Is it possible to specify target="_blank" on the generated links?
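A minimal sketch of such an option follows. The newTab option and renderLink are hypothetical, not existing add-on settings.

The sketch adds rel="noopener noreferrer" alongside target="_blank", the usual pairing for new-tab links. It also keeps leading and trailing spaces outside the <a> element, which is the symptom described in "HTML conversion adds space into hyperlink incorrectly".

```javascript
// Hypothetical link renderer with an optional "open in new tab" setting.
function renderLink(text, url, { newTab = false } = {}) {
  const href = url.replace(/&/g, "&amp;").replace(/"/g, "&quot;");
  // Keep any leading/trailing spaces outside the <a> element.
  const [, lead, core, trail] = text.match(/^(\s*)([\s\S]*?)(\s*)$/);
  // rel="noopener" stops the new page from reaching back via window.opener.
  const extra = newTab ? ' target="_blank" rel="noopener noreferrer"' : "";
  return `${lead}<a href="${href}"${extra}>${core}</a>${trail}`;
}

console.log(renderLink("the docs", "https://example.com", { newTab: true }));
// <a href="https://example.com" target="_blank" rel="noopener noreferrer">the docs</a>
```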

Export comments

It Would Be Nice if an exported Markdown (or HTML, I suppose) version of a document included the comments on the document. This feature is the one reason I keep using Draft to convert my Google Docs to Markdown.

How Draft represents Docs comments (which I'm not sure is the best way, but is one way) is easier to show by way of example than by describing. Suppose I have a document containing the text "The quick brown fox jumped over the lazy dogs", and there was a comment on "lazy", a comment on "quick brown", and a thread of three comments on "quick". This would be represented as the following Markdown:

The _**quick**[a][b][c] brown_[d] fox jumped over the _lazy_[e] dogs

[a] First comment in the thread

[b] Second comment in the thread

[c] Third comment in the thread

[d] Text of the second comment

[e] Text of the first comment

Draft doesn't include any indication of comment authorship or date in the Markdown it creates, nor does it import marked-resolved comments or comments that are not (or are no longer) attached to specific text locations in the current state of the document, both of which are features I've wished for.
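A minimal sketch of generating the note list in the Draft-style layout shown above follows. It assumes the comments, including each reply in a thread, have already been collected in document order. Placing the [a] markers inline would also need each comment's anchor position, which this sketch leaves out. commentLabel and appendCommentNotes are hypothetical names. Labeling past "z" as aa, ab, and so on is an assumption, not described Draft behavior.

```javascript
// Hypothetical: label comments a, b, ..., z, aa, ab, ... (bijective base 26).
// Draft's own behavior past "z" isn't described above; this is an assumption.
function commentLabel(index) {
  let label = "";
  let n = index;
  do {
    label = String.fromCharCode(97 + (n % 26)) + label; // 97 is "a"
    n = Math.floor(n / 26) - 1;
  } while (n >= 0);
  return label;
}

// Append one "[label] text" note per comment, in order, after the body.
function appendCommentNotes(body, comments) {
  const notes = comments.map((text, i) => `[${commentLabel(i)}] ${text}`);
  return [body, ...notes].join("\n\n");
}
```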

Where is the original google doc

Hello,

I really love working with your add-on, but I don't know where the original Google Doc is, so I can't use it as a reference. When I click on the link https://github.com/evbacher/gd2md-html/wiki/Demo-and-sample-doc it just redirects back to the wiki.

Problem converting <code> within

There seems to be a problem converting <code> within .

Source document has been shared.

===========

Conversion notes:

Docs to Markdown version 1.0β17
Thu Aug 22 2019 08:09:55 GMT-0700 (PDT)
Source doc: https://docs.google.com/open?id=1AvIIl-GeKOqIQv1iafvZ36XJEp1iYQq3vpO5KF6KJ50

WARNING:
You have 2 H1 headings. You may want to use the "H1 -> H2" option to demote all headings by one level.

----->

>>>>> gd2md-html alert: ERRORs: 0; WARNINGs: 1; ALERTS: 0.

See top comment block for details on ERRORs and WARNINGs.
In the converted Markdown or HTML, search for inline alerts that start with >>>>> gd2md-html alert: for specific instances that need correction.

Links to alert messages:

>>>>> PLEASE check and correct alert issues and delete this message and the inline alerts.

Tasks

Please debug the testArrayEquality function. **Don't forget to share your thought process, out loud. **Only modify the testArrayEquality function.
Let's double check... When you run your code,** does every assertion test log "test passed."? If any test logs "test failed", you have more debugging to do.

Notes

For this assignment, and the remaining assignments, when you click "submit", we are not running any automated tests. The only tests will be the assert functions. **Please make sure the assert tests pass.

Can H1 (etc.) headings not include full-length bold?

Just because the
h1 is bold
I want it to come out like this:
# is bold
not like this:
# **is bold**
If I want good Markdown then I do not want the whole of the text in any header to have bold markup. I want my final process to do that if needed.
Can there be an option to strip the bold off headers?
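A minimal sketch of such an option follows, applied as post-processing on emitted Markdown lines. stripHeadingBold is hypothetical. Skipping bold formatting for runs inside heading paragraphs at conversion time would be more robust than matching the emitted text, but the sketch shows the intended result.

```javascript
// Hypothetical post-processing option: if an entire Markdown heading is bold,
// drop the ** markers. Headings with partial bold are left unchanged.
function stripHeadingBold(line) {
  // [^*]+ refuses any asterisk inside, so "# **a** and **b**" is not touched.
  const match = line.match(/^(#{1,6} )\*\*([^*]+)\*\*\s*$/);
  return match ? match[1] + match[2] : line;
}

console.log(stripHeadingBold("# **is bold**")); // # is bold
```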

Add option to remove link to document

If the comments include a link to the Google doc then there's a chance someone will use it to get to the document. I would like an option to hide information about the source of the document.

Remove inline warning for multiple H1 headings

From another bug/enhancement request (issue #5): Regarding the H1 heading warning. I agree with you that that should not be an inline warning. I'll remove that when I get a chance (probably as a separate bug).

GD2md-html not displayed in Add-ons menu

GD2md-html is not displayed in the Add-ons menu of Google Docs. I've tried reinstalling the add-on, but without success.

numbered lists do not increment

1. blah 1
2. blah 2
3. blah 3

gets converted to:

1. blah 1
1. blah 2
1. blah 3
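A minimal sketch of explicit numbering follows. Counters restart for each nesting level. renderOrderedList is hypothetical, and the three-space indent assumes single-digit markers like "1. ".

```javascript
// Hypothetical: number ordered-list items explicitly, restarting per nesting level.
function renderOrderedList(items, indent = 3) {
  const counters = [];
  return items
    .map(({ text, level }) => {
      counters.length = level + 1; // drop counters for deeper levels we just left
      counters[level] = (counters[level] ?? 0) + 1;
      return `${" ".repeat(level * indent)}${counters[level]}. ${text}`;
    })
    .join("\n");
}

console.log(renderOrderedList([
  { text: "blah 1", level: 0 },
  { text: "blah 2", level: 0 },
  { text: "blah 3", level: 0 },
]));
// 1. blah 1
// 2. blah 2
// 3. blah 3
```

Repeating "1." is still valid Markdown. CommonMark takes the start number from the first item and numbers the rest in sequence, so the converted output should render as 1, 2, 3. Explicit numbers mainly make the Markdown source easier to read.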

Mixing italics and bold breaks markdown formatting.

Because italics are currently converted to * rather than _, if you bold a word inside an italicized sentence, the first * of the ** bold closes the italics. Then you have a word that looks like *this** and the sentence ends like this.*

Example:
Input: *Mixed **format** test.*
Output: Mixed *format** test.*

(This happens on some sites, like Reddit; it works as it should here on GitHub.)

Two ways to fix it would be to convert italics to _ instead of *, or to fully format mixed cases.

_Mixed **format** test._
*Mixed* ***format*** *test.*
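A minimal sketch of the first fix follows: italics as _ and bold as **. It assumes the paragraph has already been split into runs that each carry bold/italic flags. renderRuns and the run shape are hypothetical.

```javascript
// Hypothetical run renderer: _ for italics and ** for bold, so a bold word
// inside an italic sentence can't close the italics early.
// Each run is { text, bold, italic }.
function renderRuns(runs) {
  let out = "";
  let inItalic = false;
  for (const { text, bold = false, italic = false } of runs) {
    if (italic !== inItalic) {
      out += "_"; // open or close the italic span at the boundary
      inItalic = italic;
    }
    out += bold ? `**${text}**` : text;
  }
  if (inItalic) out += "_"; // close an italic span that runs to the end
  return out;
}

console.log(renderRuns([
  { text: "Mixed ", italic: true },
  { text: "format", bold: true, italic: true },
  { text: " test.", italic: true },
])); // _Mixed **format** test._
```

One caveat: under CommonMark, _ cannot open or close emphasis inside a word. Italicizing part of a word (un_believ_able) would still need *.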

Front Matter

I tried to add front matter for a Jekyll site at the top of a Google Doc, but when I exported it, the "---" lines were removed from the Markdown. Any idea how to keep the "---"?
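A minimal sketch of one approach follows. It assumes the converter can see the Doc's paragraphs as an array of strings before conversion. splitFrontMatter is hypothetical. The front-matter lines would be written out unchanged and only the body converted, so neither the "---" lines nor the YAML between them get reformatted.

```javascript
// Hypothetical pre-pass: if the document starts with a front-matter block
// delimited by "---" lines, split it off so it can be emitted verbatim.
function splitFrontMatter(lines) {
  if (lines.length === 0 || lines[0].trim() !== "---") {
    return { frontMatter: [], body: lines };
  }
  const close = lines.findIndex((line, i) => i > 0 && line.trim() === "---");
  if (close === -1) {
    return { frontMatter: [], body: lines }; // no closing "---": treat as ordinary text
  }
  return { frontMatter: lines.slice(0, close + 1), body: lines.slice(close + 1) };
}
```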

Thank you!

Hello @evbacher. Thank you for this add-on. It works perfectly and saved me a bunch of time. Truly appreciated.

Won't install

After I install it, I don't get a Convert option in the Add-ons dropdown, only Email as Markdown. When I try to use it, it tells me I need to log in with Google, and then says the app isn't verified by Google.

Empty paragraphs are ignored.

In both HTML and Markdown output, empty paragraphs are ignored.

Normal line breaks (made with Shift + Enter) are preserved just fine, but if I make two paragraph breaks and leave one line empty, the output seems to ignore it.

Is there a way to use this add-on's functionality on the server side?

Hey, I am wondering if we can do just what the title says. I can't seem to use the code from the HTML file you've provided in the repo, unless I'm missing something major about how a Google Docs add-on works?
To be honest, I would have liked to email you about this; I don't think an issue board is the right place to discuss it. It would be great if we could exchange emails to continue the discussion.

This is the HTML with the unclosed sub-list. The “Security Event” just continues on as there is no closing list element.

Doesn't close the LI tag

The bulleted list does not close its items.
The output looks like this:

<ul>
<li> aesthetics, individuality,
<li> sustainability
<li> durability
<li> maintainability.
</li>
</ul>

li do not close

None of the <li> tags has a closing </li> tag.
