fast-html-parser's Introduction

Fast HTML Parser

An HTML parser for FPC and Delphi, originally written by Jazarsoft.

  • Modified for use as a pure command-line unit (no dialogs) for Free Pascal.
  • Also added uppercased tags, so that a single check for a tag such as FONT matches every case variant: < FONT >, < FoNt >, and < font >.
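
The case-insensitive check described above can be sketched roughly as follows. This is only an illustration, not the parser's own code: TagMatches is a hypothetical helper, and it assumes the tag text arrives as a string starting with '<'.

```pascal
program NoCaseTagDemo;
{$mode objfpc}{$H+}

uses
  SysUtils;

// Hypothetical helper (not part of the parser): returns True when ActualTag
// opens an element named TagName, ignoring letter case.
function TagMatches(const TagName, ActualTag: string): Boolean;
var
  UpTag, Prefix: string;
begin
  UpTag := UpperCase(ActualTag);
  Prefix := '<' + UpperCase(TagName);
  // The name must be followed by a delimiter, so 'font' does not match '<fontsize>'.
  Result := (Copy(UpTag, 1, Length(Prefix)) = Prefix) and
            (Length(UpTag) > Length(Prefix)) and
            (UpTag[Length(Prefix) + 1] in [' ', '>', '/', #9, #10, #13]);
end;

begin
  // Case variants of the same tag.
  WriteLn(TagMatches('font', '<FONT color="red">'));
  WriteLn(TagMatches('font', '<FoNt>'));
  WriteLn(TagMatches('font', '<font>'));
  // A different tag that merely starts with the same letters.
  WriteLn(TagMatches('font', '<fontsize>'));
end.
```

Uppercasing both sides once keeps the comparison itself a plain string match.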

Versions

Revision 18 is version 1 of this tool.

Version 2, under development since revision 18, adds more object methods for accessing elements by name or ID, much like a DOM.

Todo

  • Keep the entire HTML file in arrays for later use: htmltags[] and text[].
  • Add a different kind of parser that reports OnSection(opentag, text, closetag), so that the tag, its text, and the closing tag arrive together in one procedure and no globals (InTag booleans, etc.) are needed to keep track of state.
  • Associate a number (the open tag) with the text label using a record or similar, e.g. in < body > < b >some text< / b >< / body >, < b > is tag "2" and some text is text "1".
  • Turn the parser into a DLL using FPC or C so that other languages (e.g. Go, Python) can use a callback to parse HTML fast.
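
The OnSection and record ideas above might look something like the sketch below. Nothing here exists in the parser yet: TSection, TOnSection, and PrintSection are hypothetical names chosen for illustration, and the section is filled in by hand rather than by a real parse.

```pascal
program SectionSketch;
{$mode objfpc}{$H+}

type
  // Hypothetical record: one section groups an open tag, its text, and the close tag.
  TSection = record
    TagIndex: Integer;   // number of the open tag, in document order
    TextIndex: Integer;  // number of the text run, in document order
    OpenTag: string;
    Text: string;
    CloseTag: string;
  end;

  // Hypothetical callback type for the proposed OnSection parser kind.
  TOnSection = procedure(const OpenTag, Text, CloseTag: string);

procedure PrintSection(const OpenTag, Text, CloseTag: string);
begin
  WriteLn(OpenTag, ' | ', Text, ' | ', CloseTag);
end;

var
  Sec: TSection;
  Handler: TOnSection;
begin
  // For <body><b>some text</b></body>: <b> is tag 2, "some text" is text 1.
  Sec.TagIndex := 2;
  Sec.TextIndex := 1;
  Sec.OpenTag := '<b>';
  Sec.Text := 'some text';
  Sec.CloseTag := '</b>';

  // The handler receives all three parts at once, so it needs no global state.
  Handler := @PrintSection;
  Handler(Sec.OpenTag, Sec.Text, Sec.CloseTag);
end.
```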

Reasons to use this parser:

  • Make your own web browsers.
  • Make your own text copies of web pages for caching purposes.
  • Grab content from websites without using regular expressions.
  • Seems to be much faster than regular expressions; it is, after all, a true parser.
  • Convert website tables into spreadsheets, CSV files, or database rows (parse TD and TR).
  • Convert websites into txt files.
  • Find certain information in a web page, e.g. all the bold text or hyperlinks in a page.
  • Parse websites remotely from a CGI app, using something like Sockets or Synapse and SynWrap to fetch the HTML first. This lets you parse information from other websites dynamically and display it on your site in real time.
  • HTML editor: a WYSIWYG or partial WYSIWYG editor. Ambitious, but possible.
  • HTML property editor: not completely WYSIWYG, but able to edit the properties of tags. Work would be needed to parse each property in a tag.

fast-html-parser's Issues

'const' string params

Add 'const' to the string parameters of all functions where the parameter is not modified.


function IsTag(TagType: string; Tag: string): boolean;
function IsCloseTag(TagType: string; Tag: string): boolean;
function Substr(sub, s: string): boolean;
function StripTabs(s: string): string;
function ReturnsToSpaces(s: string): string;
function LessenSpaces(s: string): string;
function CleanHtm1(s: string): string;
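
As a rough illustration of the requested change, the sketch below declares one of these functions with a const parameter. The body is a stand-in written for this example; the real implementation of StripTabs is not shown in the issue and may differ.

```pascal
program ConstParamDemo;
{$mode objfpc}{$H+}

uses
  SysUtils;

// Stand-in body for illustration only; the library's actual StripTabs may differ.
// 'const' tells the compiler that s is never assigned to inside the function.
function StripTabs(const s: string): string;
begin
  Result := StringReplace(s, #9, '', [rfReplaceAll]);
end;

begin
  WriteLn(StripTabs('a' + #9 + 'b'));  // prints ab
end.
```

With const, the compiler rejects any assignment to the parameter and can skip the reference-count bookkeeping it otherwise does for string arguments passed by value.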

prefix helper funcs with some str

E.g. "html_" prefix

function IsTag(TagType: string; Tag: string): boolean;
function IsCloseTag(TagType: string; Tag: string): boolean;

->

function html_IsTag(TagType: string; Tag: string): boolean;
function html_IsCloseTag(TagType: string; Tag: string): boolean;

GetElementById problem

Hi,

This code returns an empty string and leaves HtmlTagEnd empty, where I expected an empty string with </div> in HtmlTagEnd.

procedure doHTML;
const
  RawHTML =
  '<!DOCTYPE html>' + #13#10 +
  '<html>' + #13#10 +
  '<head>' + #13#10 +
    '<meta http-equiv="content-type" content="text/html;' + #13#10 +
      'charset=windows-1252">' + #13#10 +
    '<title></title>' + #13#10 +
  '</head>' + #13#10 +
  '<body>' + #13#10 +
    '<div align="center"> <font color="#006600"><b>Strong Data Digital' + #13#10 +
          'Template</b></font> <br>' + #13#10 +
      '<font color="#000099">XML Base64 Encoded Document [xmlDoc]</font>' + #13#10 +
    '</div>' + #13#10 +
    '<div id="xmlDoc" style="display: none;">' + #13#10 +
    '</div>' + #13#10 +
  '</body>' + #13#10 +
  '</html>';
var
  Parser     : THTMLParser;
  HtmlTag    ,
  HtmlTagEnd : string;
begin
  Parser := THTMLParser.Create(RawHTML);
  Memo1.Lines.Text := Parser.GetElementById('xmlDoc',HtmlTag,HtmlTagEnd);
  FreeAndNil(Parser);
end;

Best regards.

rename event types to make common prefix

  TOnFoundTag = procedure(NoCaseTag, ActualTag: string) of object;
  // procedural:
  TOnFoundTagP = procedure(NoCaseTag, ActualTag: string);

  // when text is found in the HTML
  TOnFoundText = procedure(Text: string) of object;
  // procedural:
  TOnFoundTextP = procedure(Text: string);

->

THtmlParserOnFoundTag
THtmlParserOnFoundText
etc
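
Written out in full, the renamed declarations could look like the sketch below. The names of the two procedural variants (with the P suffix) are assumptions extrapolated from the "etc" above, and ShowText is a hypothetical handler used only to show that the types compile and can be assigned.

```pascal
program EventTypeNames;
{$mode objfpc}{$H+}

type
  // Method-pointer variants, as proposed in the issue.
  THtmlParserOnFoundTag = procedure(NoCaseTag, ActualTag: string) of object;
  THtmlParserOnFoundText = procedure(Text: string) of object;

  // Procedural variants; these two names are assumed, not taken from the issue.
  THtmlParserOnFoundTagP = procedure(NoCaseTag, ActualTag: string);
  THtmlParserOnFoundTextP = procedure(Text: string);

// Hypothetical handler matching the procedural text event.
procedure ShowText(Text: string);
begin
  WriteLn('text: ', Text);
end;

var
  OnText: THtmlParserOnFoundTextP;
begin
  OnText := @ShowText;
  OnText('hello');
end.
```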
