
table2md's Introduction

Table2MD - Convert Tables To Markdown

  • Extracts and converts multiple HTML tables (even whole page source)
  • Converts HTML links, code tags, and line breaks (a line break is kept as <br> when a cell contains multiple lines)
  • Escapes | characters
  • Use first row as header or insert an empty first row
  • First header/column style
  • Adjust table width to be the same as the header, fill to match max width or trim to min width
  • Smart selector that tries to find cells dynamically even if the table is badly formatted (see example), with regex support
  • Supported formats: HTML, spaces (2 or more), Excel, CSV, dash, and smart selection mode
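
The "spaces (2 or more)" format can be pictured as splitting each line wherever two or more spaces occur in a row. The Python sketch below only illustrates that idea; the function names are hypothetical and this is not the tool's own code.

```python
import re


def split_on_spaces(line):
    """Split one line into cells wherever two or more spaces occur in a row."""
    return re.split(r" {2,}", line.strip())


def to_markdown_row(cells):
    """Join cells into one Markdown table row."""
    return "| " + " | ".join(cells) + " |"


print(to_markdown_row(split_on_spaces("22/tcp   open    ssh      OpenSSH 8.2")))
# prints: | 22/tcp | open | ssh | OpenSSH 8.2 |
```

A single space, as in "OpenSSH 8.2", stays inside its cell, which is why this format needs at least two spaces between columns.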

HTML Example

<table>
    <tr>
        <th>Fruit</th>
        <th>Color</th>
        <th>Taste</th>
    </tr>
    <tr>
        <td>Apple<a href="https://apple.com">apple.com</a></td>
        <td>Red, Green, Yellow</td>
        <td>Sweet</td>
    </tr>
    <tr>
        <td>Banana</td>
        <td>Yellow<code>#FFE135</code></td>
        <td>Sweet</td>
    </tr>
    <tr>
        <td>Orange</td>
        <td>Orange</td>
        <td>Citrusy | sour</td>
    </tr>
</table>

Converts to:

| Fruit | Color | Taste |
| --- | --- | --- |
| Apple[apple.com](https://apple.com) | Red, Green, Yellow | Sweet |
| Banana | Yellow``#FFE135`` | Sweet |
| Orange | Orange | Citrusy \| sour |
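
For readers who want to see the moving parts, here is a rough Python sketch of this kind of conversion, built on the standard library's html.parser. It is an illustration under assumptions, not the tool's implementation (which runs in the browser): the class and function names are made up, the first row is always treated as the header, and code tags are wrapped in single backticks.

```python
from html.parser import HTMLParser


class TableToMarkdown(HTMLParser):
    """Collects <tr>/<th>/<td> content so it can be rendered as Markdown."""

    def __init__(self):
        super().__init__()
        self.rows = []    # finished rows, each a list of cell strings
        self.row = None   # cells of the row currently being parsed
        self.cell = None  # text fragments of the cell currently being parsed
        self.href = None  # target of the <a> currently being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.row = []
        elif tag in ("td", "th"):
            self.cell = []
        elif self.cell is not None:
            if tag == "a":
                self.href = dict(attrs).get("href") or ""
                self.cell.append("[")
            elif tag == "code":
                self.cell.append("`")
            elif tag == "br":
                self.cell.append("<br>")

    def handle_endtag(self, tag):
        if tag == "tr":
            if self.row is not None:
                self.rows.append(self.row)
            self.row = None
        elif tag in ("td", "th"):
            if self.cell is not None and self.row is not None:
                self.row.append("".join(self.cell).strip())
            self.cell = None
        elif self.cell is not None:
            if tag == "a" and self.href is not None:
                self.cell.append(f"]({self.href})")
                self.href = None
            elif tag == "code":
                self.cell.append("`")

    def handle_data(self, data):
        if self.cell is not None:
            # Escape pipes so they are not read as column separators.
            self.cell.append(data.replace("|", "\\|"))


def html_table_to_markdown(html):
    """Render the rows found in `html` as one Markdown table."""
    parser = TableToMarkdown()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return ""
    header, *body = parser.rows
    lines = ["| " + " | ".join(header) + " |",
             "| " + " | ".join("---" for _ in header) + " |"]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)
```

Text outside of cells is ignored, so the same function can be fed a whole page source, although all rows then end up in a single table.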

Smart Selector Example

PORT     STATE   SERVICE  SUB-SERVICE     VERSION
22/tcp   open    ssh      protocol       OpenSSH 8.2 (protocol 2.0)
25/tcp open     smtp      mail-queue Postfix smtpd
53/tcp     open  domain DNS-resolver       BIND 9.11.4-P2
80/tcp   open    http     web-host Apache httpd 2.4.41
110/tcp  open   pop3   email-fetch  Dovecot pop3d
111/tcp  open rpcbind    RPC-routing  2-4 (RPC #100000)
143/tcp    open imap     email-store   Dovecot imapd
443/tcp  open https     SSL-handshake      OpenSSL/1.0.2k
587/tcp  open   submissi email-relay   Postfix smtpd
993/tcp    open  imaps       -            Dovecot imapd
995/tcp  open   pop3s    email-secure   Dovecot pop3d
3306/tcp open    mysql      DB-main     MySQL 5.7.30
5432/tcp open    postgresql  DB-secondary PostgreSQL DB 11.8
8080/tcp open   http-proxy proxy-gateway Nginx 1.17.9
8443/tcp open   https-alt   java-server Apache Tomcat/Coyote   JSP engine 1.1

Converts to:

| PORT | STATE | SERVICE | SUB-SERVICE | VERSION |
| --- | --- | --- | --- | --- |
| 22/tcp | open | ssh | protocol | OpenSSH 8.2 (protocol 2.0) |
| 25/tcp | open | smtp | mail-queue Postfix | smtpd |
| 53/tcp | open | domain | DNS-resolver | BIND 9.11.4-P2 |
| 80/tcp | open | http | web-host Apache | httpd 2.4.41 |
| 110/tcp | open | pop3 | email-fetch  Dovecot | pop3d |
| 111/tcp | open | rpcbind | RPC-routing  2-4 | (RPC #100000) |
| 143/tcp | open | imap | email-store | Dovecot imapd |
| 443/tcp | open | https | SSL-handshake | OpenSSL/1.0.2k |
| 587/tcp | open | submissi | email-relay | Postfix smtpd |
| 993/tcp | open | imaps | - | Dovecot imapd |
| 995/tcp | open | pop3s | email-secure | Dovecot pop3d |
| 3306/tcp | open | mysql | DB-main | MySQL 5.7.30 |
| 5432/tcp | open | postgresql | DB-secondary | PostgreSQL DB 11.8 |
| 8080/tcp | open | http-proxy | proxy-gateway | Nginx 1.17.9 |
| 8443/tcp | open | https-alt | java-server | Apache Tomcat/Coyote   JSP engine 1.1 |

With the threshold at 25, some of the cells are off, but the result is still pretty good considering that fixing the table manually would take much longer.

The inputs:

  • Table width: Sets the table width. Empty cells are created if a row cannot match the requirement, and cells beyond the width are cut off.
  • Delimiter: The delimiter of the table, such as - or +. A custom delimiter should be the only character separating the cells (e.g. the example above, but with - instead of spaces). You can also enter your own regex here; the program then tries to match it against the cell character rule (see below). Entering nothing is the same as using spaces (smart selection).
  • Additional cell characters: By default, only alphanumeric characters count as the start and end of a cell; add any additional characters here.
  • Threshold: The percentage of rows in which a cell breakpoint must appear for a separator to be created. A higher value means fewer cell dividers, a lower value means more.
  • Converting: If more than one set of cell breakpoints has the same percentage, the result is shuffled on each conversion. The example above is one of the possible results.
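
The table width behaviour (padding short rows, cutting long ones) is easy to picture in code. The Python sketch below is only an illustration: the function names and the mode names "header", "max" and "min" are assumptions made for this example, loosely following the width options in the feature list.

```python
def pick_width(rows, mode="header"):
    """Choose a target column count; the mode names are made up for this sketch."""
    lengths = [len(row) for row in rows]
    if mode == "header":
        return lengths[0]    # same width as the first row
    if mode == "max":
        return max(lengths)  # fill every row to the widest row
    if mode == "min":
        return min(lengths)  # trim every row to the narrowest row
    raise ValueError(f"unknown mode: {mode}")


def fit_rows(rows, width):
    """Pad short rows with empty cells and cut cells beyond `width`."""
    return [row[:width] + [""] * (width - len(row)) for row in rows]
```

For example, `fit_rows(rows, pick_width(rows, "max"))` pads every row to the length of the longest one, while passing a plain number mimics typing a value into the table width input.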

Other Tips & Tricks

  • Use "trim blank lines": blank lines will affect the accuracy of the cell breakpoints.
  • Make empty tables: Turn off trim blank lines, hit enter for rows, set number of columns in the table width input.
  • Convert to CSV/Excel: Remove the divider row, go to import wizard under paste, choose delimited, select only "other" as delimiter, enter |. You will need to delete the first and last columns.
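
If you would rather script the CSV step than click through the import wizard, the same idea (drop the divider row, split on |, discard the empty first and last columns) might look like this in Python. This is a sketch with a hypothetical function name; it assumes every row starts and ends with a pipe and that the divider cells contain at least three dashes.

```python
import csv
import io
import re


def markdown_table_to_csv(markdown):
    """Convert a pipe-delimited Markdown table to CSV text."""
    rows = []
    for line in markdown.strip().splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on unescaped pipes only, then restore escaped ones.
        cells = [c.strip().replace("\\|", "|")
                 for c in re.split(r"(?<!\\)\|", line)]
        cells = cells[1:-1]  # outer pipes leave empty first/last columns
        if cells and all(re.fullmatch(r":?-{3,}:?", c) for c in cells):
            continue  # skip the divider row
        rows.append(cells)
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()
```

The csv module takes care of quoting, so cells that contain commas survive the round trip.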

How It Works

This is not a text extractor that creates a table from arbitrary text. It needs some existing pattern to work with, so it assumes that the input is meant to be a table, or is formatted in a somewhat consistent way, with each line representing a row. The most important property is that the cells are evenly separated by spaces (or by whichever other characters are chosen). The example above is probably one of the more extreme cases you can throw at it.

The program first finds all the possible breakpoints, then chooses the ones that occur most often and exceed the threshold. It then tries to find the exact divider of a cell by matching the delimiter and cell contents.
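
To make the breakpoint idea more concrete, here is a heavily simplified Python sketch. It is not the program's actual algorithm: it assumes that a candidate breakpoint is any column where a non-space character follows a space, that the threshold is the percentage of rows in which that column must be a candidate, and it skips the final step of refining the exact divider. The function names are hypothetical.

```python
from collections import Counter


def find_breakpoints(lines, threshold=25):
    """Return columns where a cell starts in at least `threshold` % of the rows."""
    counts = Counter()
    for line in lines:
        for i in range(1, len(line)):
            # A candidate breakpoint: a non-space right after a space.
            if line[i - 1] == " " and line[i] != " ":
                counts[i] += 1
    needed = len(lines) * threshold / 100
    return sorted(col for col, n in counts.items() if n >= needed)


def split_at(line, breakpoints):
    """Cut one line at the chosen columns and trim each cell."""
    edges = [0] + list(breakpoints) + [None]
    return [line[a:b].strip() for a, b in zip(edges, edges[1:])]
```

A word break inside a cell, such as the space in "http server", also produces a candidate column, but it only survives if enough other rows break at the same position. That is why a higher threshold leads to fewer dividers.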

More rows give a more accurate result since there are more data points. It is not perfect, but it does what I need pretty well. I haven't tested it against everything, so there might be some odd behaviour. Some minor editing of the input or adding extra cell characters will help improve the results. You can also change the settings or convert again to reshuffle the result.

TL;DR? It's magic. Just move the slider and hope for the best.

There are a ton of these out there, why did I write this?

I often have tables that I want to add to my markdown documents, but sometimes they come in weird formats that don't play well with any of the other converters I have tried. Why manually edit them when I can spend hours writing this program to solve my niche problem?

This program was originally written to convert HTML tables only and has been adapted to convert other formats, hence the use of createElement to process the cells.

Escape pipe characters for obsidian.md

Obsidian (yes, I know it displays HTML) has a long-standing problem where escaped pipe characters are shown as-is in inline code in tables.

E.g. the cell

this\|that

will be shown as-is in Obsidian. Changing | to &#124; fixes it.
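
As a rough illustration of that workaround, the replacement could be done with a single regex in Python. The function name is hypothetical, and replacing both escaped and unescaped pipes is an assumption made for this sketch rather than a description of what the tool does.

```python
import re


def pipes_to_entities(cell):
    """Replace every pipe in a cell, escaped or not, with the &#124; entity."""
    return re.sub(r"\\?\|", "&#124;", cell)
```

Apply it to the contents of a single cell, not to a whole table row, since the pipes that separate the columns must stay as they are.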

table2md's People

Contributors

xre0us


table2md's Issues

Can you also convert the text before and after the table?

Can you also convert the text before and after the table? And the title of the HTML page or of the section (i.e. GitHub issue title)?

So when saving the converted text, we can also save some context information about the content of the table.

Thanks!
