Sulguk - HTML to telegram entities converter

Need to deliver formatted content to your bot clients? Having a hangover after trying to fit HTML into Telegram? Is BeautifulSoup too complicated and no help with messages?

Try sulguk (술국, a hangover soup) - delivered since the 1800s.

Problem

Telegram supports parse_mode="html", but:

  • Telegram processes spaces and new lines incorrectly, so we cannot indent or wrap the HTML source to make it more readable.
  • The number of supported tags is very low.
  • It does not ignore additional attributes in supported tags.
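
To make the first point concrete, here is a minimal sketch of the whitespace collapsing that a browser applies to inline text, which is what lets HTML source be indented freely. The helper name `collapse_whitespace` is hypothetical and the regex is a simplification of the real HTML rules; it is not part of sulguk or Telegram.

```python
import re


def collapse_whitespace(text: str) -> str:
    """Collapse runs of spaces, tabs and newlines into one space (simplified browser rule)."""
    return re.sub(r"[ \t\r\n]+", " ", text).strip()


# Indented source, as in a nicely formatted HTML file.
source = """
  <u>Underlined</u>
  <i>Italic</i>
  <b>Bold</b>
"""

print(collapse_whitespace(source))
# prints: <u>Underlined</u> <i>Italic</i> <b>Bold</b>
```

Without a step like this, the indentation and line breaks of the source end up in the message text.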

Let's imagine we have HTML like this:

<b>This is a demo of <a href="https://github.com/tishka17/sulguk">Sulguk</a></b>

  <u>Underlined</u>
  <i>Italic</i>
  <b>Bold</b>

This is how it is rendered in a browser (expected behavior):

But this is how it is rendered in Telegram with parse_mode="html":

To solve this, we can convert HTML to Telegram entities with sulguk. This is how it looks now:

Example

  1. Create your nice HTML:
<ol start="10">
    <li>some item</li>
    <li>other item</li>
</ol>
<p>Some <b>text</b> in a paragraph</p>
  2. Convert it into text and entities:
from sulguk import transform_html

result = transform_html(raw_html)
  3. Send it to Telegram.

Depending on your library, you may need to convert the entities from dicts into the proper type.

await bot.send_message(
    chat_id=CHAT_ID,
    text=result.text,
    entities=result.entities,
)
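
If your library expects typed entity objects rather than dicts, the conversion can be sketched roughly as below. This is an assumption-heavy illustration: `TypedEntity` is a hypothetical stand-in for your library's own entity class, and the dict keys are assumed to mirror the Telegram Bot API `MessageEntity` fields (`type`, `offset`, `length`, plus optional `url` and `language`). The sample data is made up and is not real sulguk output.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TypedEntity:
    """Hypothetical stand-in for your library's message entity class."""
    type: str
    offset: int
    length: int
    url: Optional[str] = None
    language: Optional[str] = None


def to_typed_entities(raw_entities: list[dict]) -> list[TypedEntity]:
    # Assumes every dict key matches a field above; an unexpected key raises TypeError.
    return [TypedEntity(**raw) for raw in raw_entities]


# Made-up input shaped like the assumed entity dicts.
sample = [
    {"type": "bold", "offset": 5, "length": 4},
    {"type": "text_link", "offset": 0, "length": 6,
     "url": "https://github.com/tishka17/sulguk"},
]

typed = to_typed_entities(sample)
print(typed[0].type, typed[1].url)
```

With a real library, replace `TypedEntity` with the library's entity type and pass the converted list as `entities`.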

Example for aiogram users

  1. Add AiogramSulgukMiddleware to your bot:
from sulguk import AiogramSulgukMiddleware

bot.session.middleware(AiogramSulgukMiddleware())
  2. Create your nice HTML:
<ol start="10">
    <li>some item</li>
    <li>other item</li>
</ol>
<p>Some <b>text</b> in a paragraph</p>
  3. Send it using sulguk as a parse_mode:
from sulguk import SULGUK_PARSE_MODE

await bot.send_message(
    chat_id=CHAT_ID,
    text=raw_html,
    parse_mode=SULGUK_PARSE_MODE,
)

Supported tags:

For all supported tags, unknown attributes and unknown classes are ignored. Unsupported tags raise an error.

Standard telegram tags (with some changes):

  • <a> - a hyperlink with href attribute
  • <b>, <strong> - bold text
  • <i>, <em> - italic text
  • <s>, <strike>, <del> - strikethrough text
  • <u>, <ins> - underlined text
  • <span> - an inline element with optional attribute class="tg-spoiler" to make a spoiler
  • <tg-spoiler> - a telegram spoiler
  • <pre> with optional class="language-<name>" - a preformatted block with code. <name> will be sent as a language attribute in telegram.
  • <code> - an inline preformatted element.

Note: In standard Telegram HTML you can set the language of preformatted text by nesting <code class="language-<name>"> inside a <pre> tag. This works only when <code> is the only child: any additional symbol outside of <code> breaks it. Sulguk supports the same behavior. Alternatively, you can set the language on the <pre> tag itself.
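
As a small illustration of the second option, the sketch below builds a `<pre>` block with the language set on the tag itself. The helper name `pre_block` is hypothetical and not part of sulguk; it only assembles the HTML string described above, escaping the code with the standard library.

```python
from html import escape
from typing import Optional


def pre_block(code: str, language: Optional[str] = None) -> str:
    """Wrap code in <pre>, optionally with class="language-<name>" on the tag."""
    body = escape(code)  # escape <, >, & and quotes so the code is plain text
    if language:
        return f'<pre class="language-{escape(language)}">{body}</pre>'
    return f"<pre>{body}</pre>"


print(pre_block("print(1 < 2)", "python"))
# prints: <pre class="language-python">print(1 &lt; 2)</pre>
```

Because the class sits on `<pre>`, there is no nested `<code>` element that stray characters could break.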

Additional tags:

  • <br/> - new line
  • <hr/> - horizontal line
  • <ul> - unordered list
  • <ol> - ordered list with optional attributes
    • reversed - to reverse the numbering order
    • type (1/a/A/i/I) - to set numbering style
    • start - to set starting number
  • <li> - list item, with an optional value attribute to change the number. Nested lists are indented
  • <div> - a block (not inline) element
  • <p> - a paragraph, emphasized with empty lines
  • <q> - a quoted text
  • <blockquote> - a block quote. Like a paragraph with indentation
  • <h1>-<h6> - text headers, styled using available telegram options
  • <noscript> - contents are shown, as scripting is not supported
  • <cite>, <var> - italic
  • <progress>, <meter> - rendered using emoji (🟩🟩🟩🟨⬜️⬜️)
  • <kbd>, <samp> - preformatted text
  • <img> - rendered as a link preceded by a picture emoji. The alt text is used if provided.
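
The `<ol>` attributes listed above (reversed, type, start) follow the usual HTML numbering rules, which can be sketched as below. This is an illustration of those rules under stated assumptions, not sulguk's actual implementation: all function names are hypothetical, the trailing dot on each marker is an assumption, and per-item `value` overrides on `<li>` are left out.

```python
from typing import Optional


def to_alpha(n: int) -> str:
    """1 -> a, 26 -> z, 27 -> aa (bijective base 26)."""
    result = ""
    while n > 0:
        n, rem = divmod(n - 1, 26)
        result = chr(ord("a") + rem) + result
    return result


def to_roman(n: int) -> str:
    """Lowercase roman numerals for positive integers."""
    pairs = [(1000, "m"), (900, "cm"), (500, "d"), (400, "cd"), (100, "c"),
             (90, "xc"), (50, "l"), (40, "xl"), (10, "x"), (9, "ix"),
             (5, "v"), (4, "iv"), (1, "i")]
    result = ""
    for value, symbol in pairs:
        while n >= value:
            result += symbol
            n -= value
    return result


def format_marker(n: int, list_type: str = "1") -> str:
    # Letters and roman numerals have no zero or negatives: fall back to decimal.
    if n <= 0 or list_type == "1":
        return str(n)
    text = to_alpha(n) if list_type in ("a", "A") else to_roman(n)
    return text.upper() if list_type in ("A", "I") else text


def list_markers(count: int, start: Optional[int] = None,
                 is_reversed: bool = False, list_type: str = "1") -> list[str]:
    """Markers for `count` items, honoring start, reversed and type."""
    if is_reversed:
        first = count if start is None else start  # reversed lists count down
        numbers = range(first, first - count, -1)
    else:
        first = 1 if start is None else start
        numbers = range(first, first + count)
    return [format_marker(n, list_type) + "." for n in numbers]


print(list_markers(2, start=10))          # ['10.', '11.']
print(list_markers(3, is_reversed=True))  # ['3.', '2.', '1.']
print(list_markers(3, list_type="I"))     # ['I.', 'II.', 'III.']
```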

Tags which are treated as block elements (like <div>):

<footer>, <header>, <main>, <nav>, <section>

Tags which are treated as inline elements (like <span>):

<html>, <body>, <output>, <data>, <time>

Tags whose contents are ignored:

<head>, <link>, <meta>, <script>, <style>, <template>, <title>

Command line utility for channel management

  1. Install with the cli extras:
pip install 'sulguk[cli]'
  2. Set the BOT_TOKEN environment variable:
export BOT_TOKEN="your telegram token"
  3. Send an HTML file as a message to your channel. Additional files will be sent as comments to the first one. You can provide a channel name or a public link:
sulguk send @chat_id file.html
  4. If you want to, edit a message using its link from the shell or from your Telegram client. Editing comments is supported as well:
sulguk edit 'https://t.me/channel/1?comment=42' file.html
