tiny-html-lexer's Introduction

Hi! 👋 I am Alwin,

Welcome to my GitHub page! This is a place where I have collected some of my work and some research projects.

I often try to develop theory and application simultaneously.

A selection of projects is listed below.

URL Specification

I am working on a number of projects around the specification of URLs. This work aims to resolve the differences between the WHATWG standard and the IETF specifications. It adds support for relative URLs in a way that agrees with the WHATWG standard.

  • URL Specification: A URL specification that generalises the WHATWG standard.
  • URLReference: A URL class that adds support for relative URLs.
  • spec-url: A low-level core library that implements the URL specification above.
  • reurl: An alternative URL library with immutable URL objects.

HTML Language

I am working on some projects around the HTML language as well. The end goal is to create an accurate and concise characterisation of the HTML5 language as parsed, together with an implementation!

  • html-parser: An ongoing effort to define an ever simpler and more concise HTML parsing algorithm that agrees with the HTML5 standard.
  • html-lexer: An HTML lexer that produces annotated chunks of raw input.
  • tiny-html-lexer: A minute HTML tokeniser based on html-lexer, but using regular expressions.

DOM Expressions

  • Domex: Domex (short for DOM expressions) is an algebraic language for specifying web user interfaces. Domex works by pattern-matching on input (model or viewmodel) data, and allows for recursive patterns. One way to think of it is as "recursive format strings" that produce HTML.

Logic things

  • Regex: A regular expression compiler that computes deterministic state machines (DFAs) by implementing (and slightly extending) the theory of derivatives of regular expressions.
  • Ess: Ess is a research type-language for describing properties of semi-structured data (such as JSON) and a theorem prover for that language.

Category Theory

I wrote my Master's Thesis about the generalisation of Universal Algebra and Universal Coalgebra at the Institute for Logic, Language and Computation in Amsterdam; I've separated out the introductory notes on Category Theory and patched up some typos as well.

Graphics File Formats

Some fun projects.

  • Haikon-js: A parser for HVIF vector icon files, in JS (and Zig).
  • XoDB: Some support for the reMarkable tablet database and notebooks, in JS.

Other projects

  • immutable-aatree: A persistent ordered dictionary data structure.
  • tiny-css-parser: A CSS parser with a small code-base.
  • tm-plist: A parser for the ASCII property list file format (as used by TextMate).

tiny-html-lexer's People

Contributors

alwinb, dalyisaac

Forkers

dalyisaac

tiny-html-lexer's Issues

Exceptions to numeric character references

Can you explain what this is about?

// Exception: Numeric references 0x80-0x9F get mapped as follows:
const _exceptions =
'€\x81β€šΖ’β€žβ€¦β€ β€‘Λ†β€°Ε β€ΉΕ’\x8DΕ½\x8F\x90β€˜β€™β€œβ€β€’β€“β€”Λœβ„’Ε‘β€ΊΕ“\x8DΕΎΕΈ'

I'm not aware of any exceptions from the Unicode standard and couldn't find any information on it.

Perhaps it would be a good idea to add a link to the standard clause, as a comment in the source-code?
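
For context: the remapping comes from the HTML Standard rather than from Unicode. The tokenizer's "numeric character reference end state" replaces references in the range 0x80-0x9F with the characters that Windows-1252 assigns to those bytes. The sketch below illustrates that table; `C1_REMAP` and `decodeNumericReference` are hypothetical names for illustration and are not part of tiny-html-lexer.

```javascript
// Sketch only: hypothetical helper, not the library's implementation.
// Numeric references 0x80-0x9F are remapped to their Windows-1252 characters.
// 0x81, 0x8D, 0x8F, 0x90 and 0x9D have no Windows-1252 character and stay as-is.
const C1_REMAP = {
  0x80: 0x20AC, 0x82: 0x201A, 0x83: 0x0192, 0x84: 0x201E, 0x85: 0x2026,
  0x86: 0x2020, 0x87: 0x2021, 0x88: 0x02C6, 0x89: 0x2030, 0x8A: 0x0160,
  0x8B: 0x2039, 0x8C: 0x0152, 0x8E: 0x017D, 0x91: 0x2018, 0x92: 0x2019,
  0x93: 0x201C, 0x94: 0x201D, 0x95: 0x2022, 0x96: 0x2013, 0x97: 0x2014,
  0x98: 0x02DC, 0x99: 0x2122, 0x9A: 0x0161, 0x9B: 0x203A, 0x9C: 0x0153,
  0x9E: 0x017E, 0x9F: 0x0178,
}

function decodeNumericReference (codePoint) {
  // Null, out-of-range values and surrogates decode to U+FFFD
  const isSurrogate = codePoint >= 0xD800 && codePoint <= 0xDFFF
  if (codePoint === 0 || codePoint > 0x10FFFF || isSurrogate) return '\uFFFD'
  // Apply the 0x80-0x9F remapping where one exists, otherwise keep the code point
  return String.fromCodePoint(C1_REMAP[codePoint] ?? codePoint)
}
```

With this helper, `decodeNumericReference(0x80)` yields the euro sign (U+20AC), which is what browsers produce for `&#x80;`.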

Line/column numbers in tokens

For my use-case, I need line/column-numbers in the tokens.

I see the lexer has this feature (part of the reason I like this library) but it's unused by the tokenizer.

The tokenizer is a small piece of code, so I'm comfortable just to copy/modify as needed for my use-case - but, if you prefer, I could also submit a pull-request with this feature addition.

It will have a slight performance and memory impact - so I figured I'd ask first, if this is a feature you'd want?
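
As a rough illustration of the feature being discussed, positions could also be tracked outside the lexer by counting characters as chunks pass through. This is a minimal sketch under stated assumptions: `withPositions` is a hypothetical wrapper, chunks are assumed to arrive as `[type, text]` pairs, and only `\n` is treated as a line break. None of this reflects the library's actual API.

```javascript
// Sketch only: hypothetical wrapper, assumes chunks are [type, text] pairs.
// Yields each chunk together with the line and column where it starts.
function* withPositions (chunks) {
  let line = 1
  let column = 0
  for (const [type, text] of chunks) {
    yield { type, text, line, column }
    // Advance the position past this chunk
    const parts = text.split('\n')
    if (parts.length > 1) {
      line += parts.length - 1
      column = parts[parts.length - 1].length
    } else {
      column += text.length
    }
  }
}
```

The extra cost is one `split` per chunk, which matches the concern above about a slight performance and memory impact.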

TypeScript definitions

I noticed this comment in the TypeScript definitions:

// TODO (FIXME): Update typescript annotations
// (Internals changed, now using generator function)
// Postponing till I have decided on a nice API.

I think the API is very nice, but I don't know what else you might have had in mind.

You already tagged a release-candidate, so maybe you just forgot? 🙂

I can try to adjust them to match the current API and submit a PR, if you'd like?
