ebml's Introduction

EBML

An EBML parser written in Go.

Introduction

Extensible Binary Meta Language (EBML) is a generalized file format for any kind of data, aiming to be a binary equivalent to XML. It provides a basic framework for storing data in XML-like tags. It was originally developed for the Matroska audio/video container format.

Source: https://en.wikipedia.org/wiki/Extensible_Binary_Meta_Language

This library is based on RFC 8794, published in July 2020, with additions from github.com/ietf-wg-cellar/ebml-specification. RFC 8794 has "Proposed Standard" status and has not reached "Internet Standard" status yet.

The goal of this project is to create an implementation based on this document and to provide feedback on it along the way.

Production readiness

This project is still in the alpha phase. At this stage the public API can change at any time.

A beta version will be considered once the feature set covers the documents the implementation is based on and the public API has reached a mature state.

A stable version will be considered only once enough positive feedback has been gathered to lock the public API and all documents the implementation is based on have become "Internet Standard".

Documents

Official sites

Huge thanks to Matroska.org for their work.

IETF Documents

Huge thanks to the IETF CELLAR Working Group for their work.

Inspiration

Inspiration for the implementation comes from the following places:

Similar libraries

Last updated: 2023-05-22

URL Status
https://github.com/at-wat/ebml-go In active development
https://github.com/ebml-go/ebml + https://github.com/ebml-go/webm Last updated on 17 Nov 2022
https://github.com/ehmry/go-ebml Deleted
https://github.com/jacereda/ebml Last updated on 10 Jan 2016
https://github.com/mediocregopher/ebmlstream Last updated on 15 Dec 2014
https://github.com/pankrator/ebml-parser Last updated on 24 Jun 2020
https://github.com/pixelbender/go-matroska Last updated on 29 Oct 2018
https://github.com/pubblic/ebml Last updated on 12 Dec 2018
https://github.com/quadrifoglio/go-mkv Last updated on 20 Jun 2018
https://github.com/rrerolle/ebml-go Last updated on 1 Dec 2012
https://github.com/remko/go-mkvparse Last updated on 19 May 2022
https://github.com/tpjg/ebml-go Last updated on 1 Dec 2012

ebml's People

Contributors

nerg4l


ebml's Issues

Use doc types to define structure

At the moment, decoding uses a generated struct as the schema definition. It would be beneficial to define the schema separately and make decoding independent of the provided struct. This would allow users to provide their own struct to decode only the fields they need, or a map[string]interface{} to decode everything (including Void and CRC-32 elements).

Proposed definition:

type Definition struct {
	ID      []byte
	Type    string
	Name    string
	Default interface{}
	Parent  *Definition
}

The level number can be calculated by counting the parents, or stored in an extra field.
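A minimal sketch of the first approach, assuming the proposed Definition struct as-is. Level is a hypothetical method (not part of the library) that walks the Parent chain and, by convention in this sketch, treats a Root Level element as level 0.

```go
package main

import "fmt"

// Definition mirrors the struct proposed in the issue.
type Definition struct {
	ID      []byte
	Type    string
	Name    string
	Default interface{}
	Parent  *Definition
}

// Level counts the ancestors of d; an element without a parent is level 0.
func (d *Definition) Level() int {
	level := 0
	for p := d.Parent; p != nil; p = p.Parent {
		level++
	}
	return level
}

func main() {
	header := &Definition{ID: []byte{0x1A, 0x45, 0xDF, 0xA3}, Type: "master", Name: "EBML"}
	version := &Definition{ID: []byte{0x42, 0x86}, Type: "uinteger", Name: "EBMLVersion", Default: uint64(1), Parent: header}
	fmt.Println(header.Level(), version.Level()) // 0 1
}
```

Computing the level on demand keeps the definitions free of redundant data; storing it in an extra field would trade that for O(1) lookups.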

Support EBML Stream / Matroska Livestreaming

To make this work, the library has to handle elements with an Unknown Data Size and multiple Documents (that is, concatenated EBML Header and Segment elements).

  • Support for Unknown Data Size
  • Support for reading more than one EBML Document

EBML Stream

An EBML Stream is a file that consists of one or more EBML Documents that are concatenated together. An occurrence of an EBML Header at the Root Level marks the beginning of an EBML Document.

Source: https://www.rfc-editor.org/rfc/rfc8794#name-ebml-stream
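As an illustration of the rule quoted above, a minimal sketch that groups Root Level Element IDs into EBML Documents. It assumes the IDs have already been read with their VINT_MARKER bits kept (so the EBML Header ID is 0x1A45DFA3). splitDocuments is a hypothetical helper, not part of this library, and it silently drops IDs that appear before the first EBML Header, where a real decoder would probably report an error.

```go
package main

import "fmt"

// ebmlHeaderID is the Element ID of the EBML Header, including its VINT_MARKER bit.
const ebmlHeaderID uint32 = 0x1A45DFA3

// splitDocuments groups Root Level Element IDs into EBML Documents.
// Every EBML Header starts a new Document; IDs before the first header are dropped.
func splitDocuments(rootIDs []uint32) [][]uint32 {
	var docs [][]uint32
	for _, id := range rootIDs {
		if id == ebmlHeaderID {
			docs = append(docs, []uint32{id})
			continue
		}
		if len(docs) > 0 {
			docs[len(docs)-1] = append(docs[len(docs)-1], id)
		}
	}
	return docs
}

func main() {
	const segmentID uint32 = 0x18538067
	stream := []uint32{ebmlHeaderID, segmentID, ebmlHeaderID, segmentID, segmentID}
	fmt.Println(len(splitDocuments(stream))) // 2
}
```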

Matroska Livestreaming

Livestreaming

[...]

A live Matroska stream is different from a file because it usually has no known end (only ending when the client disconnects). For this, all bits of the “size” portion of the Segment Element MUST be set to 1. Another option is to concatenate Segment Elements with known sizes, one after the other. This solution allows a change of codec/resolution between each segment. For example, this allows for a switch between 4:3 and 16:9 in a television program.

[...]

Source: https://www.matroska.org/technical/streaming.html
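Detecting an Unknown Data Size only requires checking that every VINT_DATA bit of the size field is 1. A minimal sketch, assuming the complete size field has already been sliced out of the input; isUnknownSize and vintLength are hypothetical helpers, and VINTs longer than 8 octets are simply treated as "not unknown" here.

```go
package main

import (
	"fmt"
	"math/bits"
)

// vintLength returns the length in octets of a VINT from its first octet,
// or 0 if the first octet is 0x00 (a VINT longer than 8 octets).
func vintLength(first byte) int {
	if first == 0 {
		return 0
	}
	return bits.LeadingZeros8(first) + 1
}

// isUnknownSize reports whether an encoded Element Data Size has all VINT_DATA bits set to 1.
func isUnknownSize(size []byte) bool {
	if len(size) == 0 {
		return false
	}
	n := vintLength(size[0])
	if n == 0 || len(size) != n {
		return false
	}
	// In the first octet, everything after the VINT_MARKER must be 1.
	if size[0] != byte(0xFF)>>uint(n-1) {
		return false
	}
	for _, b := range size[1:] {
		if b != 0xFF {
			return false
		}
	}
	return true
}

func main() {
	// 8-octet unknown size, e.g. for a live Matroska Segment.
	fmt.Println(isUnknownSize([]byte{0x01, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF})) // true
	fmt.Println(isUnknownSize([]byte{0xFF}))                                           // true
	fmt.Println(isUnknownSize([]byte{0x81}))                                           // false (size 1)
}
```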

Consider using uint64 / int64 instead of big.Int in VINT

Element ID

There is no theoretical upper bound on the byte length of an Element ID; in practice it is limited by EBMLMaxIDLength. ebml.xml defines EBMLMaxIDLength as type="uinteger" range=">=4" default="4".

In reality, an EBMLMaxIDLength greater than 4 is unlikely. The library could handle values up to 7, and if the value is greater than 7 it could return an error saying the EBML document cannot be parsed by this library.

If EBMLMaxIDLength is 7, an Element ID occupies at most 7 octets (56 bits) and its VINT_DATA is 49 bits wide, so the largest valid VINT_DATA is 2^49-2 (an all-ones VINT_DATA is not allowed for Element IDs). Both fit into uint64 and int64 as well.
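A minimal sketch of reading an Element ID into a uint64 instead of a big.Int, assuming the whole ID is available in a byte slice. readElementID and its error values are hypothetical, not this library's API. The returned value keeps the VINT_MARKER bit (the usual way IDs such as 0x1A45DFA3 are written), and EBMLMaxIDLength values above 7 are rejected, as suggested above.

```go
package main

import (
	"errors"
	"fmt"
	"math/bits"
)

// maxSupportedIDLength follows the proposal above: longer IDs are not supported.
const maxSupportedIDLength = 7

var (
	errUnsupported = errors.New("ebml: EBMLMaxIDLength not supported")
	errInvalidID   = errors.New("ebml: invalid Element ID")
)

// readElementID decodes the Element ID at the start of b and returns it
// (VINT_MARKER included) together with its length in octets.
func readElementID(b []byte, maxIDLength int) (id uint64, n int, err error) {
	if maxIDLength > maxSupportedIDLength {
		return 0, 0, errUnsupported
	}
	if len(b) == 0 || b[0] == 0 {
		return 0, 0, errInvalidID
	}
	n = bits.LeadingZeros8(b[0]) + 1
	if n > maxIDLength || n > len(b) {
		return 0, 0, errInvalidID
	}
	for _, octet := range b[:n] {
		id = id<<8 | uint64(octet)
	}
	// VINT_DATA is the 7*n bits below the marker; all 0s or all 1s is invalid.
	mask := uint64(1)<<(7*n) - 1
	if data := id & mask; data == 0 || data == mask {
		return 0, 0, errInvalidID
	}
	return id, n, nil
}

func main() {
	id, n, err := readElementID([]byte{0x1A, 0x45, 0xDF, 0xA3, 0x9F}, 4)
	fmt.Printf("0x%X %d %v\n", id, n, err) // 0x1A45DFA3 4 <nil>
}
```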

Element Data Size

There is no theoretical upper bound on the byte length of an Element Data Size; in practice it is limited by EBMLMaxSizeLength. ebml.xml defines EBMLMaxSizeLength as type="uinteger" range="not 0" default="8". The library uses big.Int to be compatible with this specification.

In reality, an EBMLMaxSizeLength greater than 8 is unlikely. If the value is greater than 8, the library could return an error saying the EBML document cannot be parsed by this library.

The value of EBMLMaxSizeLength has a direct connection with VINTMAX. VINTMAX limits the value of known Element Data Sizes.

In case EBMLMaxSizeLength has a value of 8, the value of VINTMAX is 2^56-2 which fits into uint64 and int64 as well.
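Likewise, an Element Data Size can be decoded into a uint64 when EBMLMaxSizeLength is at most 8. In this hypothetical readDataSize sketch (not the library's API) the VINT_MARKER is removed, an all-ones VINT_DATA is reported as an unknown size, and so the largest known size in an 8-octet field is VINTMAX = 2^56-2.

```go
package main

import (
	"errors"
	"fmt"
	"math/bits"
)

// vintMax8 is VINTMAX for an 8-octet Element Data Size.
const vintMax8 = uint64(1)<<56 - 2

var (
	errSizeTooLong = errors.New("ebml: Element Data Size longer than 8 octets")
	errShortBuffer = errors.New("ebml: not enough data for Element Data Size")
)

// readDataSize decodes the Element Data Size at the start of b. It returns the
// value, the number of octets read and whether the size is unknown.
func readDataSize(b []byte) (size uint64, n int, unknown bool, err error) {
	if len(b) == 0 {
		return 0, 0, false, errShortBuffer
	}
	if b[0] == 0 {
		// 8 or more leading zero bits: the VINT is at least 9 octets long.
		return 0, 0, false, errSizeTooLong
	}
	n = bits.LeadingZeros8(b[0]) + 1
	if len(b) < n {
		return 0, 0, false, errShortBuffer
	}
	// Drop the VINT_MARKER from the first octet, then append the remaining octets.
	size = uint64(b[0]) & (0xFF >> uint(n))
	for _, octet := range b[1:n] {
		size = size<<8 | uint64(octet)
	}
	if size == uint64(1)<<(7*n)-1 {
		return 0, n, true, nil
	}
	return size, n, false, nil
}

func main() {
	size, n, unknown, err := readDataSize([]byte{0x01, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFE})
	fmt.Println(size == vintMax8, n, unknown, err) // true 8 false <nil>
}
```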

ebml: Provide a way to iterate on elements of an EBML document

When reading the whole content of an EBML document into memory is not desirable, there should be a way to scan through it.

Questions:

  1. How to iterate?
  2. How to skip child elements?
  3. How to seek?

In the current design, DecodeBody accepts ebml.EBML (the EBML Header) as its second parameter to make sure DecodeHeader is called first. It might be better to reconsider this decision and store the header information in the decoder struct.

Option 0

Based on encoding/xml

f, err := os.Open("/foo/bar")
// handle error
d := ebml.NewDecoder(f)
var h ebml.EBML
// decode header
for {
	el := d.Element()
	// handle element
}

Questions 2 and 3 apply.

Option 1

Based on text/scanner.

f, err := os.Open("/foo/bar")
// handle error
d := ebml.NewDecoder(f)
var h ebml.EBML
// decode header
s := d.Scanner(h)
for s.Next() {
	el := s.Element()
	// handle element
}

Questions 2 and 3 apply.

Option 2

Based on github.com/yuin/goldmark/ast

f, err := os.Open("/foo/bar")
// handle error
s := ebml.NewDecoder(f)
var h ebml.EBML
// decode header
ebml.Walk(h, func(el ebml.Element) (ebml.WalkStatus, error) {
	// handle element
	// return ebml.WalkStop, nil
	// return ebml.WalkSkipChildren, nil
	return ebml.WalkContinue, nil
})

Question 3 applies.

Consider using go embed to embed Doc Type

Go 1.16 added the embed package, which allows embedding files into a package's binary with a simple //go:embed directive. This would simplify the definition of Doc Types, since no more code generation would be needed.
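A minimal sketch of the parsing side, assuming the EBML Schema format from RFC 8794 (an EBMLSchema root with element children carrying name, path, id, type, range and default attributes). The Schema and SchemaElement types, parseSchema and the inline excerpt with docType "example" are illustrative only; with Go 1.16+ the XML could be embedded from a file instead of a string constant.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// schemaXML is an illustrative excerpt. In a real package it could be a []byte
// filled by a "//go:embed ebml.xml" directive (hypothetical file name).
const schemaXML = `<EBMLSchema xmlns="urn:ietf:rfc:8794" docType="example" version="1">
  <element name="EBMLMaxIDLength" path="\EBML\EBMLMaxIDLength" id="0x42F2" type="uinteger" range="&gt;=4" default="4" minOccurs="1" maxOccurs="1"/>
  <element name="EBMLMaxSizeLength" path="\EBML\EBMLMaxSizeLength" id="0x42F3" type="uinteger" range="not 0" default="8" minOccurs="1" maxOccurs="1"/>
</EBMLSchema>`

// Schema holds the root attributes and element definitions of an EBML Schema.
type Schema struct {
	DocType  string          `xml:"docType,attr"`
	Version  int             `xml:"version,attr"`
	Elements []SchemaElement `xml:"element"`
}

// SchemaElement holds a subset of the attributes of one element definition.
type SchemaElement struct {
	Name    string `xml:"name,attr"`
	Path    string `xml:"path,attr"`
	ID      string `xml:"id,attr"`
	Type    string `xml:"type,attr"`
	Range   string `xml:"range,attr"`
	Default string `xml:"default,attr"`
}

// parseSchema decodes an EBML Schema document at runtime, replacing code generation.
func parseSchema(data []byte) (*Schema, error) {
	var s Schema
	if err := xml.Unmarshal(data, &s); err != nil {
		return nil, err
	}
	return &s, nil
}

func main() {
	s, err := parseSchema([]byte(schemaXML))
	if err != nil {
		panic(err)
	}
	for _, el := range s.Elements {
		fmt.Println(el.Name, el.ID, el.Type, el.Default)
	}
}
```

The parsed definitions could then be turned into values like the Definition struct proposed in the "Use doc types to define structure" issue.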
