doc2web's Introduction

Free your document

Word to HTML converter engine (work in progress)

Goals of the project

This project is an attempt to create a lazy, extensible, cross-platform and high-performance WordprocessingML (Open XML) to HTML converter.

We will not accept pull requests until we reach version 1.0.

Doc2web is lazy

Doc2web will only gather the minimum text and CSS needed for the conversion. If you are converting a single paragraph, you should expect slim HTML, even if it comes from a 200-page document that weighs 5 MB.
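To make the idea concrete, here is a minimal sketch of lazy CSS gathering. It is written in Python purely for illustration: Doc2web is a .NET library and its actual API is not shown in this README. The `STYLES` table, the `PARAGRAPHS` list and the `convert` function are all hypothetical names invented for this example.

```python
import html

# Hypothetical stylesheet: style id -> CSS declarations (not real Doc2web data).
STYLES = {
    "Heading1": "font-size: 24pt; font-weight: bold;",
    "Normal": "font-size: 11pt;",
    "Quote": "font-style: italic; margin-left: 2em;",
}

# Hypothetical document: each paragraph references one style id.
PARAGRAPHS = [
    {"style": "Heading1", "text": "Title"},
    {"style": "Normal", "text": "Body text."},
    {"style": "Quote", "text": "A quotation."},
]


def convert(paragraphs):
    """Convert only the given paragraphs and emit only the CSS they use."""
    used = []   # style ids in first-use order
    body = []
    for p in paragraphs:
        if p["style"] not in used:
            used.append(p["style"])
        body.append('<p class="%s">%s</p>' % (p["style"], html.escape(p["text"])))
    css = "\n".join(".%s { %s }" % (name, STYLES[name]) for name in used)
    return css, "\n".join(body)


# Converting a single paragraph pulls in a single CSS rule.
css, body = convert(PARAGRAPHS[1:2])
```

In this sketch the size of the output depends on what is converted, not on the size of the source document.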

Doc2web is extensible

Doc2web provides a simple plugin system that allows any developer to add virtual nodes or text mutations.

These nodes will then be converted into tags, and the mutations will be applied to the output. All the hard work is done for you: you just have to describe the result that you want and Doc2web will give you valid HTML.
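As a rough illustration of text mutations, the sketch below applies insertions, replacements and deletions to a string. It is Python used for illustration only and is not Doc2web's API. The `Mutation` class and `apply_mutations` function are hypothetical, and the sketch assumes integer character offsets and non-overlapping mutations for simplicity.

```python
from dataclasses import dataclass


@dataclass
class Mutation:
    start: int    # offset of the first affected character
    length: int   # number of characters removed (0 for a pure insertion)
    text: str     # text written in their place ("" for a pure deletion)


def apply_mutations(source, mutations):
    """Apply non-overlapping mutations, all expressed against the original text."""
    out = source
    # Work from right to left so that earlier offsets stay valid.
    for m in sorted(mutations, key=lambda m: m.start, reverse=True):
        out = out[:m.start] + m.text + out[m.start + m.length:]
    return out


result = apply_mutations(
    "Hello  world & co",
    [
        Mutation(5, 2, " "),       # replacement: collapse the double space
        Mutation(13, 1, "&amp;"),  # replacement: escape the ampersand
        Mutation(0, 0, ">> "),     # insertion at the start
        Mutation(14, 3, ""),       # deletion of the trailing " co"
    ],
)
```

Because every mutation is described against the original text, a plugin in this sketch does not need to know what other plugins have inserted or removed.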

The node and mutation positions are real numbers, so you can "squeeze" elements between text and other elements indefinitely. Nodes also have a Z index. The engine uses both to compute intersections and ensure that the HTML is valid whatever nodes or mutations you throw at it.
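One way such an engine could work is sketched below. This is an assumption-laden illustration in Python, not Doc2web's actual algorithm or API: the `Node` class and `render` function are hypothetical. Node boundaries split the text into segments, each segment is wrapped in the nodes that cover it (lowest Z index outermost), and a tag is closed and reopened wherever two nodes would otherwise overlap improperly.

```python
import html
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    start: float  # real-valued position where the tag opens
    end: float    # real-valued position where the tag closes
    z: int        # lower z means further out in the nesting
    tag: str


def render(text, nodes):
    # Every node boundary, plus the text limits, splits the line into segments.
    points = {0.0, float(len(text))}
    for n in nodes:
        points.update((n.start, n.end))
    points = sorted(points)

    out, stack = [], []
    for a, b in zip(points, points[1:]):
        # Nodes covering this whole segment, outermost (lowest z) first.
        wanted = sorted(
            (n for n in nodes if n.start <= a and n.end >= b),
            key=lambda n: (n.z, n.start),
        )
        # Keep the shared prefix of open tags, close the rest innermost first.
        keep = 0
        while keep < len(stack) and keep < len(wanted) and stack[keep] == wanted[keep]:
            keep += 1
        while len(stack) > keep:
            out.append("</%s>" % stack.pop().tag)
        for n in wanted[keep:]:
            out.append("<%s>" % n.tag)
            stack.append(n)
        # Character i sits at position i, so emit those with a <= i < b.
        chars = [text[i] for i in range(len(text)) if a <= i < b]
        out.append(html.escape("".join(chars)))
    while stack:
        out.append("</%s>" % stack.pop().tag)
    return "".join(out)


# <b> covers [0, 7) and <i> covers [3, 11): they overlap without nesting.
nodes = [
    Node(0, 11, 0, "p"),
    Node(0, 7, 1, "b"),
    Node(3, 11, 2, "i"),
]
page = render("Hello world", nodes)
```

In this sketch the overlapping `<i>` is closed together with `<b>` and reopened afterwards, so the tags always nest properly. A node placed at fractional positions, such as `Node(4.5, 4.75, 3, "sup")`, lands between two characters without touching either of them.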

An IoC container will be used to allow plugins to work together as the engine crawls the document. It will also manage object lifecycles and provide clever garbage collection for better performance.
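The sketch below shows the general idea of an IoC container with scoped lifetimes. It is written in Python for illustration and does not reflect Doc2web's container or any particular .NET library. `Container`, `CssCollector` and `StylePlugin` are hypothetical names, and the assumption that one scope corresponds to one conversion is ours.

```python
class Container:
    """Minimal IoC container: shared registrations, per-scope instances."""

    def __init__(self, factories=None):
        self._factories = factories if factories is not None else {}
        self._instances = {}

    def register(self, name, factory):
        self._factories[name] = factory

    def resolve(self, name):
        # Build each service at most once per scope, on first use.
        if name not in self._instances:
            self._instances[name] = self._factories[name](self)
        return self._instances[name]

    def create_scope(self):
        # A scope shares the registrations but owns its own instances.
        return Container(self._factories)


class CssCollector:
    """Hypothetical shared service that plugins write their CSS rules to."""

    def __init__(self):
        self.rules = []


class StylePlugin:
    """Hypothetical plugin that receives its dependency from the container."""

    def __init__(self, css, rule):
        self.css = css
        self.rule = rule

    def run(self):
        self.css.rules.append(self.rule)


root = Container()
root.register("css", lambda c: CssCollector())
root.register("bold", lambda c: StylePlugin(c.resolve("css"), ".bold { font-weight: bold; }"))
root.register("italic", lambda c: StylePlugin(c.resolve("css"), ".italic { font-style: italic; }"))

# Both plugins resolved in the same scope share one CssCollector.
first = root.create_scope()
first.resolve("bold").run()
first.resolve("italic").run()
first_rules = first.resolve("css").rules

# A new scope starts with fresh instances.
second = root.create_scope()
second_rules = second.resolve("css").rules
```

In this sketch, dropping a scope once a conversion is finished releases every object created for it in one step, which is one simple way to tie object lifetimes to the work being done.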

Doc2web is cross platform

Doc2web leverages the new .NET Standard 2.0, which is supported on .NET Core 2.0, .NET Framework 4.6.1, Mono 5.4, Xamarin.iOS 10.14, Xamarin.Mac 3.8 and Xamarin.Android 7.5.

Doc2web is fast

Doc2web is built for real time. Lazy mechanisms and efficient use of CPU cycles and memory are at the core of this project's goals.

Our current benchmarks (on an Intel Core i7-5500U) convert 260 pages in ~180 ms when the Open XML is simple and 40 pages in ~115 ms when the Open XML is very complicated.
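For a sense of scale, the short Python snippet below turns those two figures into per-page cost and throughput. It only restates the numbers above, which are themselves approximate; the `BENCHMARKS` and `stats` names are ours.

```python
# (pages, milliseconds) taken from the benchmark figures quoted above.
BENCHMARKS = {
    "simple": (260, 180.0),
    "complex": (40, 115.0),
}

stats = {}
for name, (pages, ms) in BENCHMARKS.items():
    stats[name] = {
        "ms_per_page": ms / pages,
        "pages_per_second": pages / (ms / 1000.0),
    }

for name, s in stats.items():
    print("%s: %.1f ms/page, %.0f pages/s" % (name, s["ms_per_page"], s["pages_per_second"]))
```

That works out to roughly 0.7 ms per page (about 1,400 pages per second) for simple documents and roughly 2.9 ms per page (about 350 pages per second) for complicated ones.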


Roadmap 1.0 (2017 Q4)

  • Core
    • IoC Integration
    • Virtual nodes
      • Tag, style and attributes
    • Text transformation
      • Insertions
      • Replacements
      • Deletions
    • Tag optimization and rendering
  • Implemented plugins
    • Styling
      • Media query
      • Dynamic styling
      • Paragraph styling
      • Run styling
      • Interconnected styling
      • OpenXML Properties support
        • Bold
        • Borders
        • Caps
        • Color
        • Font size
        • Highlighting
        • Indentation (responsive)
        • Italic
        • Justification
        • Run fonts
        • Spacing
        • Small caps
        • Vanish
        • Underline
    • Numbering
      • Roman, letters, ordinal, etc.
      • Indentation
      • Styling (theme and inline)
    • Text processing
      • Paragraphs
      • Runs
      • Tabulation configuration
    • Tables
      • Warning to let the user know it's not supported yet.
    • Table of contents
    • Text fixes
      • Break/tab/hyphen character insertions
      • Cross-reference cleanup
      • HTML escape
      • Remove w:instrText
  • Benchmarks
    • Conversion
    • Rendering
    • Styling
    • Numbering
    • Comparing against OpenXmlPowerTools
  • CLI Tool
    • Convert documents
    • Verbose, debug and parallelism options
    • Crash tests
    • Search for keywords in documents
  • Documentation
    • XML documentation
    • How to use the C# API
    • How to extend with plugins
    • Benchmark and performance breakdown
    • Doc2web vs other tools
    • Contribution guide
    • Plugin samples
    • GitHub Pages
  • Other
    • Coverage > 90%
    • Continuous integration
    • NuGet package publicly available (pre-release)
    • Public, easy-to-use Docker container with CLI/Benchmark

doc2web's People

Contributors

osasseville
