
urltitle's Introduction

URLTitle

A script for the Eggdrop IRC bot that detects URLs in public channel messages and prints out their titles.

Installation

Copy urltitle.tcl to your Eggdrop scripts directory and add it to your Eggdrop configuration file. The script works without any configuration, but you can set some options in the script file if you want.
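For reference, Eggdrop loads scripts through source lines in its configuration file. A minimal example, assuming the script was copied into the usual scripts/ directory relative to where the bot runs:

```tcl
# Add near the other "source" lines in eggdrop.conf. The path is relative
# to the bot's working directory; adjust it if your scripts live elsewhere.
source scripts/urltitle.tcl
```

After saving the configuration, .rehash on the partyline (or restarting the bot) loads the script.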

Requirements

The script should work without any additional dependencies, but for best results the following Tcl packages are recommended:

  • tls: required for HTTPS URLs
  • htmlparse: decodes HTML entities in titles
  • tdom: more reliable <title> tag parsing using XPath (instead of regex)

Broken URLs?

If you encounter any URLs that aren't working properly, please report them under issue #10.

urltitle's People

Contributors

astrorigin, dereckhall, dkfellows, knofte, teeli


urltitle's Issues

Fix incorrect character output

Hi,

Can we change the charset of the title sent to the channel?

I'm getting garbled output: accented characters such as "é" and "á" are not displayed correctly. Maybe changing the output command to use UTF-8 or CP1252 would fix this. The page I took the title from declares the UTF-8 charset and the pt-BR language.
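A likely cause, assuming the page really is served as UTF-8: the response bytes were decoded with a single-byte charset such as ISO-8859-1 somewhere along the way, which turns "é" into the two characters "Ã©". The sketch below reproduces that mix-up in Tcl and reverses it. repair_mojibake is a hypothetical helper, not part of urltitle.tcl, and the proper fix is to decode the body with the right charset in the first place; this only illustrates the symptom.

```tcl
# Hypothetical helper: undo UTF-8 text that was wrongly decoded as
# ISO-8859-1. Each character of the mangled string holds one original byte,
# so turning the characters back into bytes and decoding them as UTF-8
# restores the intended text. Only meaningful for Latin-1 range input.
proc repair_mojibake {s} {
    return [encoding convertfrom utf-8 [encoding convertto iso8859-1 $s]]
}

# Reproduce the bug: the UTF-8 bytes of "é" (U+00E9) read as ISO-8859-1.
set mangled [encoding convertfrom iso8859-1 [encoding convertto utf-8 "\u00e9"]]
puts [string length $mangled]   ;# two characters: U+00C3 U+00A9
puts [repair_mojibake $mangled]
```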

Errors since latest update

I'm getting these errors on the partyline:

<Bot> [20:08:17] Tcl error [UrlTitle::handler]: error "Unterminated element 'head' (within 'script')" at position 21267
<Bot> "     csm.measure('csm_head_delivery_finished');
<Bot>     }
<Bot>   </script>
<Bot>         </head> <--Error--
<Bot>     <body id="styleguide-v2" class="fixed">
<Bot> <script>
<Bot>     if (typeof uet == 'fu"
<Bot> [20:15:17] Tcl error [UrlTitle::handler]: invalid command name ""
<Bot> [20:28:06] Tcl error [UrlTitle::handler]: error "Unterminated element 'head' (within 'script')" at position 21245
<Bot> "     csm.measure('csm_head_delivery_finished');
<Bot>     }
<Bot>   </script>
<Bot>         </head> <--Error--
<Bot>     <body id="styleguide-v2" class="fixed">
<Bot> <script>
<Bot>     if (typeof uet == 'fu"
<Bot> [20:29:22] Tcl error [UrlTitle::handler]: invalid command name ""
<Bot> [20:39:15] Tcl error [UrlTitle::handler]: error "Unterminated element 'head' (within 'script')" at position 20935
<Bot> "     csm.measure('csm_head_delivery_finished');
<Bot>     }
<Bot>   </script>
<Bot>         </head> <--Error--
<Bot>     <body id="styleguide-v2" class="fixed">
<Bot> <script>
<Bot>     if (typeof uet == 'fu"
<Bot> [20:42:18] Tcl error [UrlTitle::handler]: error "Unterminated element 'head' (within 'script')" at position 20920
<Bot> "     csm.measure('csm_head_delivery_finished');
<Bot>     }
<Bot>   </script>
<Bot>         </head> <--Error--
<Bot>     <body id="styleguide-v2" class="fixed">
<Bot> <script>
<Bot>     if (typeof uet == 'fu"
<Bot> [20:48:33] Tcl error [UrlTitle::handler]: invalid command name ""

Any idea what could cause this, and how it might be fixed?
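The "Unterminated element ... <--Error--" message appears to come from tDOM rejecting the page's markup, and since nothing catches it, it surfaces as a Tcl error in UrlTitle::handler. A defensive pattern, sketched below, is to wrap the tDOM parse in catch and fall back to a regex whenever parsing fails or finds no title. extract_title is a hypothetical name, not the script's actual procedure, and the sketch assumes tDOM may or may not be installed.

```tcl
# Hypothetical helper: extract the <title> text without letting parser
# errors escape. tDOM is tried first (if installed); any failure, or a page
# without a title node, falls back to a regex.
set ::have_tdom [expr {![catch {package require tdom}]}]

proc extract_title {html} {
    set title ""
    if {$::have_tdom && ![catch {dom parse -html $html} doc]} {
        # -html selects tDOM's forgiving HTML parser; it can still fail on
        # some pages, which is why the parse itself is wrapped in catch.
        catch {
            set node [lindex [$doc selectNodes //title] 0]
            if {$node ne ""} {
                set title [$node asText]
            }
        }
        $doc delete
    }
    if {[string trim $title] eq ""} {
        # Regex fallback. Every quantifier is non-greedy, so the match
        # stops at the first closing </title> tag.
        set title ""
        regexp -nocase {<title[^>]*?>(.*?)</title>} $html -> title
    }
    # Collapse internal whitespace; titles often span several lines.
    return [string trim [regsub -all {\s+} $title " "]]
}
```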

Content-Type

The Content-Type check doesn't seem correct, because sites without a Content-Type header still get through. A DOCTYPE check would also be good.
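A sketch of a stricter check, assuming the headers are available as the flat name/value list that Tcl's http::meta returns: accept text/html (or XHTML) when a Content-Type header is present, and only when it is missing, sniff the start of the body for a DOCTYPE or <html> tag. looks_like_html is a hypothetical helper name.

```tcl
# Hypothetical helper: decide whether a response is worth parsing for a
# title. "meta" is a flat list of header names and values (the shape that
# http::meta returns); header names are compared case-insensitively.
proc looks_like_html {meta body} {
    set ctype ""
    foreach {name value} $meta {
        if {[string equal -nocase $name "Content-Type"]} {
            set ctype $value
        }
    }
    if {$ctype ne ""} {
        # A declared type decides the matter on its own.
        return [expr {[string match -nocase "text/html*" $ctype]
                      || [string match -nocase "application/xhtml+xml*" $ctype]}]
    }
    # No Content-Type header: sniff the start of the body instead of
    # accepting it blindly.
    set head [string trimleft [string range $body 0 1023]]
    return [expr {[string match -nocase "<!doctype html*" $head]
                  || [string match -nocase "<html*" $head]}]
}
```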

Broken URLs

Please report any URLs that aren't working properly (either causing errors on your bot's partyline or just not showing titles correctly) here.

Make sure you include the URL in question, any errors you see, and your configuration (relevant software versions, e.g. Eggdrop version, Tcl version and Tcl extension versions).
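To collect the Tcl-side versions for a report, a hypothetical helper like report_versions below can be run inside the bot, for example from a temporary script, since the default eggdrop.conf disables the .tcl partyline command. Running it in tclsh on the same machine also works, but tclsh may pick up a different Tcl installation than the bot. It relies on Eggdrop's global version variable, which plain tclsh does not define.

```tcl
# Hypothetical helper: list the versions worth including in a bug report.
proc report_versions {} {
    set lines [list "Tcl: [info patchlevel]"]
    # Eggdrop sets a global "version" variable such as "1.8.4 1080400";
    # plain tclsh does not, so this line is skipped there.
    if {[info exists ::version]} {
        lappend lines "eggdrop: [lindex $::version 0]"
    }
    foreach pkg {http tls tdom htmlparse} {
        if {[catch {package require $pkg} ver]} {
            set ver "not available"
        }
        lappend lines "$pkg: $ver"
    }
    return $lines
}

foreach line [report_versions] {
    puts $line
}
```

Inside Eggdrop, replacing puts with putlog sends the lines to the log and partyline instead.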

Outputs empty $urtitle variable to the IRC channel

Hello,

If the script tries to parse an imgur URL (or a similar image URL) from which it cannot scrape a title, it outputs the empty variable to the channel:

Title:

I added the following check for an empty $urtitle:

if {$urtitle eq ""} {
    break
}

directly above this code:

if {[string length $urtitle]} {
    puthelp "PRIVMSG $chan :title: $urtitle"
}
break
}
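Both checks above treat a title made up only of whitespace as non-empty, so a <title> containing just spaces or newlines would still be announced. A slightly stricter guard trims first. title_is_empty is a hypothetical helper, and the commented usage assumes the same loop as the snippet above.

```tcl
# Hypothetical helper: treat empty and whitespace-only titles alike.
proc title_is_empty {title} {
    return [expr {[string trim $title] eq ""}]
}

# Possible use inside the handler's loop (sketch only):
#   if {[title_is_empty $urtitle]} { break }
#   puthelp "PRIVMSG $chan :Title: [string trim $urtitle]"
```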

HTTPS doesn't work, and the instructions are incomplete

I have the setup below, verified with './configure' and by finding the paths and dates of my eggdrop binary and Tcl on the system. Not all eggdrop users know how to do that, or know about Tcl and TLS (in fact, most eggdrop users I know don't). I would have appreciated instructions for finding this out last time (they could be added to a page or to your main open issue, 'Broken URLs'), plus instructions on anything that needs to be set in eggdrop.conf (this belongs in your script's description).

  • eggdrop 1.8.4
  • Tcl version: 8.3.5
  • SSL/TLS Support: yes (OpenSSL 1.0.2s 28 May 2019)
  • urltitle 0.11

The problem is that HTTPS URLs never work (most or all HTTP ones do). I think you were right last time that it's a configuration problem, because I fixed it once or twice over the last few years, maybe by changing that. But I forgot how, and I don't understand most of the SSL section of eggdrop.conf (I didn't even remember it's related to TLS), and neither do most people I know who run eggdrops. So we need to know what to set in eggdrop.conf, especially since different scripts use different parts of it and most people have no idea which parts yours uses. Hundreds of HTTPS URLs haven't worked over several weeks. I can't go back and collect them all, but below are some major examples where nothing happens when the URL is pasted on its own line on IRC (only Google displays, and only via HTTP).

https://amazon.com
https://apple.com
https://facebook.com
https://google.com
https://microsoft.com
https://twitter.com
https://yahoo.com

HTTPS doesn't work

In my bots, HTTPS doesn't work with urltitle.0.10.tcl. If other scripts must be loaded to get HTTPS working, could you please document that?

"Software caused connection abort" on certain domains

Since upgrading to OpenSSL 3.0, I've noticed that some domains, such as facebook.com, youtu.be and maps.app.goo.gl, can no longer be reached.

After some debugging, I can see it's caused by a TLS decoding error. I realize it's probably a TclTLS bug, but I'm wondering whether anyone here knows how to work around it.

Thanks.

SSL Support

Your script doesn't seem to support SSL. The description does say SSL is optional, but I'm having trouble getting it to actually handle HTTPS links.

I have tcl-tls installed on the server, so that's not the issue. I also tried adding package require tls to urltitle.tcl, with no joy. I do see that your script checks for package require tls, but I'm not sure why this is the case.

Side note: adding package require tls causes the script to stop outputting any messages at all, such as "Connection to $url failed".
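For context (this describes Tcl's http package in general, not necessarily what urltitle.tcl does internally): the http package handles only plain HTTP until a command is registered for the https scheme, so loading tls is not enough by itself. A minimal sketch; the https_enabled flag is a hypothetical name.

```tcl
package require http

# Tcl's http package handles only plain HTTP until a command is registered
# for the "https" scheme; having tls installed is not enough by itself.
set ::https_enabled 0
if {![catch {package require tls}]} {
    ::http::register https 443 ::tls::socket
    set ::https_enabled 1
}
```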

SNI support not enabled by default [cloudflare SSL error: tlsv1 alert internal error]

dev-lang/tcl-8.6.6
dev-tcltk/tcllib-1.15-r2
dev-tcltk/tls-1.6.7
net-irc/eggdrop-1.8.0

When I try a link from https://centmin.sh/:

SSL channel "sockdc6a20": error: tlsv1 alert internal error

https://www.rust-lang.org/en-US/

SSL channel "sockdd33e0": error: sslv3 alert handshake failure

I compiled eggdrop-1.6.20 on Gentoo with:
dev-lang/tcl-8.6.6
dev-tcltk/tcllib-1.15-r2
dev-tcltk/tls-1.6.7

I also compiled eggdrop-1.6.21 and 1.6.20 on Debian with:
tcl 8.6.0+8
tcl-tls 1.6+dfsg-3:
tcllib 1.16-dfsg-2:
libsqlite3-tcl 3.8.7.1-1+deb8u2

The errors are exactly the same:

SSL channel "sock19d9620": error: sslv3 alert handshake failure
SSL channel "sock1a0fe60": error: tlsv1 alert internal error

This happens only with sites using Cloudflare SSL; other HTTPS links work fine.
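Cloudflare-fronted hosts generally expect SNI (the server name sent during the TLS handshake), and tls releases before 1.7 send it only when ::tls::socket is given -servername. A common workaround, sketched here with the hypothetical wrapper name sni_socket, passes along the host that the http package hands to its socket command.

```tcl
package require http

# Hypothetical wrapper: older tls releases send SNI only when -servername
# is given. The http package calls its socket command with the host and
# port as the last two arguments (optionally preceded by options such as
# -async), so the host name can be taken from there.
proc sni_socket {args} {
    set host [lindex $args end-1]
    return [::tls::socket -servername $host {*}$args]
}

if {![catch {package require tls}]} {
    ::http::register https 443 sni_socket
}
```

With tls 1.7 or later, registering [list ::tls::socket -autoservername true] as the https handler should have the same effect without a wrapper.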

Problem with some punctuation

What I mean is that when you paste a URL from YouTube, such as one that should produce "Title: Imany - Don't Be So Shy (PanosG Remix) (Unofficial Video) - YouTube", the original title contains an apostrophe ('), and this script doesn't read that punctuation correctly.

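A probable cause, assuming the page encodes the apostrophe as an HTML character reference such as &#39;: if the title is printed without decoding entities, the reference shows up literally. The README recommends tcllib's htmlparse for this, whose ::htmlparse::mapEscapes decodes such references. Where htmlparse is not installed, a small fallback could look like this (decode_entities is a hypothetical helper that handles numeric references plus a few common named ones):

```tcl
# Hypothetical helper: decode numeric character references (&#39;, &#x27;)
# and a handful of common named entities in a title string.
proc decode_entities {s} {
    set out ""
    set pos 0
    # Scan forward from $pos so decoded text is never decoded twice.
    while {[regexp -start $pos -indices {&#([xX][0-9a-fA-F]+|[0-9]+);} $s all num]} {
        lassign $all a b
        set n [string range $s {*}$num]
        if {[string match {[xX]*} $n]} {
            scan [string range $n 1 end] %x code
        } else {
            scan $n %d code
        }
        append out [string range $s $pos [expr {$a - 1}]] [format %c $code]
        set pos [expr {$b + 1}]
    }
    append out [string range $s $pos end]
    # Named entities last; string map does not rescan its own output.
    return [string map {&quot; \" &apos; ' &lt; < &gt; > &amp; &} $out]
}
```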
SSL channel error with every URL

Some errors from the partyline log:

SSL channel "sock55d421723a10": error: tlsv1 alert protocol version
[22:00:12] Connection to https://google.com failed
[22:00:12] Error: connect failed broken pipe
SSL channel "sock55d4217ab1e0": error: wrong version number
[22:00:08] Connection to http://www.wikipedia.com failed
[22:00:08] Error: failed to use socket
SSL channel "sock55d421723a10": error: tlsv1 alert protocol version
[22:00:12] Connection to https://google.de failed
[22:00:12] Error: connect failed broken pipe

Some system background:
[root@server ~]# uname -a
Linux server 5.17.9-300.fc36.x86_64 #1 SMP PREEMPT Wed May 18 15:08:23 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Package tdom-0.8.2-32.fc36.x86_64 is already installed.
Package tcltls-1.7.22-5.fc36.x86_64 is already installed.
htmlparse.tcl -> /usr/share/tcl8.6/tcllib-1.20/htmlparse/htmlparse.tcl

Tcl error [UrlTitle::handler]: can't read "delay": no such variable

Hello, I'm using eggdrop 1.8.1 compiled on a 64-bit Linux system.

I receive 'Tcl error [UrlTitle::handler]: can't read "delay": no such variable' in the console after loading the script, performing .chanset, and typing a URL in the channel.

On line 98 I see 'variable delay'. I changed this to 'global delay', and that resolved my issue.
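Why the change helps: inside a procedure, variable delay links to the namespace variable (::UrlTitle::delay here), whereas global delay links to ::delay. The error therefore means the namespace variable had not been set when the handler ran, for example because the setting ended up at global scope. global delay works only as long as ::delay exists; checking with info exists avoids the error either way. A sketch with a hypothetical namespace and a hypothetical default of 0:

```tcl
# Hypothetical namespace standing in for the script's own; note that
# no "variable delay" is declared here, mirroring the failing setup.
namespace eval ::UrlTitleSketch {}

proc ::UrlTitleSketch::get_delay {} {
    # Links to ::UrlTitleSketch::delay (never to the global ::delay).
    variable delay
    if {![info exists delay]} {
        # Reading $delay here would raise:
        #   can't read "delay": no such variable
        # so fall back to a hypothetical default instead.
        return 0
    }
    return $delay
}
```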

Some charset problems

[23:37:23] <Al> https://www.vbox7.com/play:bb0a7f43b9
[23:37:27] <ATOM> Title: �а �ези ���ки моми�е�а мо�ж�ване�о не е п�облем / VBOX7 .
Is it possible to support more charsets, like cp1251?
Thanks.
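Tcl already ships a cp1251 encoding, so supporting more charsets is mostly a matter of decoding the raw body with the charset the server declares. A sketch, assuming the body was fetched without automatic conversion (for example http::geturl with -binary true) and the charset name was read from the Content-Type header or a <meta> tag; decode_body is a hypothetical helper.

```tcl
# Hypothetical helper: decode a raw response body with the declared charset.
proc decode_body {bytes charset} {
    set enc [string tolower [string trim $charset]]
    # Map common IANA charset names onto Tcl's encoding names,
    # e.g. windows-1251 -> cp1251 and iso-8859-1 -> iso8859-1.
    set enc [string map {windows-125 cp125 iso-8859- iso8859- shift_jis shiftjis} $enc]
    if {$enc ni [encoding names]} {
        # Unknown or missing charset: assume UTF-8.
        set enc utf-8
    }
    return [encoding convertfrom $enc $bytes]
}
```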

crashes eggdrop

urltitle.tcl (0.11, 0.12) now crashes eggdrop (it crashes whenever the line loading urltitle.tcl is not commented out):

* Last context: tclhash.c/250 []
* Please REPORT this BUG!
* Check doc/BUG-REPORT on how to do so.
* Wrote DEBUG
* SEGMENT VIOLATION -- CRASHING!

DEBUG:

Debug (eggdrop v1.8.4) written Fri May 29 06:51:20 2020
Patch level: stable
Tcl library: /usr/local/lib/tcl8.6
Tcl version: 8.6.10 (header version 8.6.8)
Compiled with IPv6 support
Compiled with TLS support
Configure flags: '--with-tcllib=/usr/local/lib/libtcl86.so' '--with-tclinc=/usr/local/include/tcl8.6/tcl.h' '--with-handlen=9'
Compile flags: gcc -g -O2 -pipe -Wall -I.. -I..  -DHAVE_CONFIG_H  
Link flags: gcc
Strip flags: touch
Context: main.c/1072, []
         main.c/1072, []
         main.c/1072, []
         main.c/1072, []
         main.c/1072, []
         main.c/1072, []
         main.c/1072, []
         main.c/1072, []
         main.c/1072, []
         main.c/1072, []
         main.c/1072, []
         main.c/1072, []
         main.c/1072, []
         main.c/1072, []
         tclhash.c/222, []
         tclhash.c/250 []

IDX ADDR                                     + PORT NICK      TYPE  INFO
--- ---------------------------------------- ------ --------- ----- ---------
3   84.200.23.69                              50000 (telnet)  lstn  50000
4   127.0.0.1                                     0 (dns)     dns   (ready)

Compiled without extensive memory debugging (sorry).
Open sockets: 3 (listen), 4 (passed on), 6 (file), done.
