Gumbo

Low-level PHP extension for the Gumbo HTML5 parser (https://github.com/google/gumbo-parser)

We recommend that you do NOT use this in production yet: this is a very early release.

Installation

git clone https://github.com/BipSync/gumbo.git
cd gumbo
phpize
./configure
make
make install

This builds a shared extension, gumbo.so. Load it by adding the following to php.ini:

[gumbo]
extension = gumbo.so
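To confirm that PHP actually picked up the php.ini entry, a quick check such as the following can help. It is a sketch that assumes the extension registers itself under the module name `gumbo` (matching `gumbo.so`); `isGumboAvailable` is a hypothetical helper, not part of the extension.

```php
<?php
// Hypothetical helper (not part of the extension): reports whether the
// gumbo extension is loaded. Assumes the module is registered as "gumbo";
// gumbo_parse() is one of the functions the extension provides.
function isGumboAvailable(): bool
{
    return extension_loaded('gumbo') && function_exists('gumbo_parse');
}

echo isGumboAvailable()
    ? "gumbo extension loaded\n"
    : "gumbo extension NOT loaded; check php.ini (`php --ini` lists the files in use)\n";
```

The command-line PHP and a web server may read different php.ini files, so run the check under the same SAPI you plan to use.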

Usage

Get the text content of an HTML string:

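<?php
$html = "<html><body><p>Hello World</p></body></html>";
$output = gumbo_parse( $html );               // Gumbo Output Resource
$rootNode = gumbo_output_get_root( $output ); // root Gumbo Node Resource

// Recursively concatenate the text of every text node below $node.
// The closure captures itself by reference so that it can call itself.
$getTextContent = function( $node ) use ( &$getTextContent ) {
    $textContent = "";
    switch ( gumbo_node_get_type( $node ) ) {
        case GUMBO_NODE_ELEMENT:
            // An element's text is the text of its children, in document order
            foreach ( gumbo_element_get_children( $node ) as $childNode ) {
                $textContent .= $getTextContent( $childNode );
            }
            break;
        case GUMBO_NODE_TEXT:
            $textContent = gumbo_text_get_text( $node );
            break;
        // Other node types (comments, whitespace, ...) contribute no text
    }
    return $textContent;
};
echo $getTextContent( $rootNode );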

Output:

Hello World

Functions

Function                                        Returns
gumbo_parse( $html )                            Gumbo Output Resource
gumbo_output_get_root( $output )                Gumbo Node Resource
gumbo_node_get_type( $node )                    int (see constants)
gumbo_element_get_tag_name( $elementNode )      string
gumbo_element_get_tag_open( $elementNode )      string
gumbo_element_get_tag_close( $elementNode )     string
gumbo_element_get_attributes( $elementNode )    associative array
gumbo_element_get_children( $elementNode )      array of Gumbo Node Resources
gumbo_text_get_text( $textNode )                string
gumbo_destroy_output( $output )
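The element accessors can be combined with the same recursive walk used in the usage example. The sketch below collects the `href` of every `<a>` element. It assumes that `gumbo_element_get_attributes()` returns an array keyed by attribute name, and it compares tag names case-insensitively because their exact casing is not documented here. `collectLinks` is a hypothetical helper, and `gumbo_destroy_output()` is assumed to free the parse tree, so no node from it is touched afterwards.

```php
<?php
// Hypothetical helper: returns the href of every <a> element under $node,
// in document order. Assumes attributes come back as [name => value].
function collectLinks($node): array
{
    if (gumbo_node_get_type($node) !== GUMBO_NODE_ELEMENT) {
        return []; // text, comments, etc. have no attributes or children
    }

    $links = [];
    // Tag-name casing is not documented, so compare case-insensitively
    if (strtolower(gumbo_element_get_tag_name($node)) === 'a') {
        $attributes = gumbo_element_get_attributes($node);
        if (isset($attributes['href'])) {
            $links[] = $attributes['href'];
        }
    }

    // Recurse into children and append their links in order
    foreach (gumbo_element_get_children($node) as $childNode) {
        $links = array_merge($links, collectLinks($childNode));
    }
    return $links;
}

$output = gumbo_parse('<p><a href="/one">One</a> and <a href="/two">Two</a></p>');
$links = collectLinks(gumbo_output_get_root($output));
gumbo_destroy_output($output); // assumed to free the tree; $links holds plain PHP strings
print_r($links);
```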

Constants

Node types returned by gumbo_node_get_type():

  • GUMBO_NODE_DOCUMENT
  • GUMBO_NODE_ELEMENT
  • GUMBO_NODE_TEXT
  • GUMBO_NODE_CDATA
  • GUMBO_NODE_COMMENT
  • GUMBO_NODE_WHITESPACE
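Because gumbo_node_get_type() returns one of these integers, a switch over the constants is a natural way to dispatch on node type. As a sketch, the hypothetical helpers below give each constant a readable label and count the node types in a parsed document; like the usage example, they only descend into GUMBO_NODE_ELEMENT nodes.

```php
<?php
// Hypothetical helper: readable label for a node-type constant.
function nodeTypeName(int $type): string
{
    switch ($type) {
        case GUMBO_NODE_DOCUMENT:   return 'document';
        case GUMBO_NODE_ELEMENT:    return 'element';
        case GUMBO_NODE_TEXT:       return 'text';
        case GUMBO_NODE_CDATA:      return 'cdata';
        case GUMBO_NODE_COMMENT:    return 'comment';
        case GUMBO_NODE_WHITESPACE: return 'whitespace';
        default:                    return 'unknown';
    }
}

// Hypothetical helper: counts nodes by type label, including $node itself.
function countNodeTypes($node): array
{
    $type = gumbo_node_get_type($node);
    $counts = [nodeTypeName($type) => 1];
    if ($type === GUMBO_NODE_ELEMENT) {
        foreach (gumbo_element_get_children($node) as $childNode) {
            // Merge the child's counts into this node's counts
            foreach (countNodeTypes($childNode) as $name => $n) {
                $counts[$name] = ($counts[$name] ?? 0) + $n;
            }
        }
    }
    return $counts;
}

$output = gumbo_parse('<p>Hi<!-- note --></p>');
$counts = countNodeTypes(gumbo_output_get_root($output));
gumbo_destroy_output($output);
print_r($counts);
```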

Contact

If you have found a bug, or have an idea or a question, email me at [email protected]

Contributors

paulatbipsync
