DOCX_TO_EPUB3_PC

Convert .docx file to EPUB 3 book (on a PC)

Dependencies

Note: make the script executable by opening a terminal and entering the following command: chmod +x path/to/script.sh

Usage

  1. Create a root folder that contains the .docx file you wish to convert
  • Root Folder/
    • Title.txt
    • cover.png
    • cover.txt
    • my_file.docx
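
Before running the conversion, it can help to confirm that the root folder has the layout shown above. The following is a minimal sketch, not part of the repository's script: `check_inputs` is a hypothetical helper name, and the file names are the defaults mentioned in this README.

```shell
# Hypothetical helper: report which expected input files are missing.
# Title.txt, cover.png and cover.txt are the default names used in this README.
check_inputs() {
  local dir="${1:-.}" missing=0 f
  for f in Title.txt cover.png cover.txt; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $f"
      missing=1
    fi
  done
  # At least one .docx file must be present in the folder.
  if ! ls "$dir"/*.docx >/dev/null 2>&1; then
    echo "missing: a .docx file"
    missing=1
  fi
  # Exit status 0 means every expected file was found.
  return "$missing"
}
```

Calling `check_inputs "Root Folder"` prints one line per missing file and returns a non-zero status if anything is absent.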
  2. Create a metadata file for your EPUB (e.g., Title.txt)
  • use YAML format:
    • Use --- at the beginning of the file
    • Use ... at the end of the file
  • For example, the YAML block can contain fields such as title:, creator:, and publisher:

(See Title.txt file in this repo for an example of a YAML metadata block)

  • Place the Title.txt file in the root folder
  • Note: if you are using a different name or format for your metadata document, make sure to adjust the script (see below) so that the pandoc command contains the correct name and file extension (the default name for your metadata file in the script is "Title.txt")
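
As an illustration of the YAML format described in this step, here is a small sketch that writes a metadata file from the shell. `write_metadata` is a hypothetical helper and the field values are placeholders; only the field names title, creator, and publisher come from this README. The Title.txt file in the repository remains the authoritative example.

```shell
# Hypothetical helper: write a minimal YAML metadata block to Title.txt.
# The values below are placeholders to be replaced with your book's details.
write_metadata() {
  local dir="${1:-.}"
  cat > "$dir/Title.txt" <<'EOF'
---
title: My Book Title
creator: Author Name
publisher: Publisher Name
...
EOF
}
```

The block opens with `---` and closes with `...`, as required above.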
  3. Add a cover image file for your EPUB (e.g., cover.png) and custom alt text in a separate file (e.g., cover.txt)

    • Place the cover.png file in the root folder
    • Note: if you are using a different name or format for your cover image file, make sure to adjust the script (see below) so that the pandoc command contains the correct name and file extension (the default name for your cover file in the script is "cover.png")
    • Place the cover.txt file in the root folder
    • Edit the cover.txt file so that it contains the appropriate alternate text for cover.png
      • e.g., alt="EPUB logo"
  4. Edit the .docx file in MS Word

    • add heading structure
      • mark each page number as Heading 6 (if you added the word "page" in front of each page number in ABBYY, you can use a regex find and replace to convert all the page numbers to headings; find what: page\ [0-9]; replace with: heading style 6)
    • format tables (repeat header row)
    • add alternate text for images
    • create math equations using MathType; then delete the image placeholder for the math
      • When all the equations have been entered, use the convert equations button (on the MathType ribbon) or use the GrindEQ MathType to MS Word Equation Tool.
        • Select: MathType equations
        • Select Range: Whole document
        • Convert equations to Texvc (LaTeX delimiters)
        • Clear the checkbox "include translator name as a comment"
        • Clear the checkbox "include MathType data as a comment"
    • Mark up other languages in the following way (currently we allow up to four different languages in one book, where the default language for the whole book is en-US for English)
      • first language (default is French)
        • +++ (start of text)
        • === (end of text)
      • second language (default is Italian)
        • @@@ (start of text)
        • %%% (end of text)
      • third language (default is Spanish)
        • !!! (start of text)
        • ??? (end of text)
      • Note: if your book has languages other than French, Italian, or Spanish, edit the script for the appropriate ISO values.

    (Repeat step 4 until every chapter of the book has been corrected and edited)
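
To make the language markers in this step more concrete, the sketch below shows one way such markers could be turned into language-tagged spans with sed. This is an assumption-laden illustration, not the repository's actual substitution: `mark_languages` is a hypothetical name, the ISO codes fr, it, and es are assumed values, and the sketch assumes the markers reach the processed text unescaped and do not occur anywhere else in the book.

```shell
# Hypothetical filter: replace start/end markers with language-tagged spans.
# Reads text on stdin and writes the result to stdout.
# In basic regular expressions, +, = and ? are matched literally.
mark_languages() {
  sed \
    -e 's/+++ */<span xml:lang="fr" lang="fr">/g' -e 's/ *===/<\/span>/g' \
    -e 's/@@@ */<span xml:lang="it" lang="it">/g' -e 's/ *%%%/<\/span>/g' \
    -e 's/!!! */<span xml:lang="es" lang="es">/g' -e 's/ *???/<\/span>/g'
}
```

For example, `printf '%s\n' 'He said +++ bonjour === to her.' | mark_languages` prints `He said <span xml:lang="fr" lang="fr">bonjour</span> to her.` For other languages, the ISO codes in the sed expressions would need to change.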

  5. Use the bash script (DOCX_To_EPUB3_PC.sh) to convert the DOCX file into an EPUB 3 book

    • To run the script: open a terminal and enter this command from the root directory
      • ./DOCX_To_EPUB3_PC.sh
      • Press enter and wait for the script to execute
    • The script performs these functions:
      • Converts docx file to Markdown
        • Note: Pandoc currently cannot convert a DOCX file containing LaTeX math to MathML when EPUB 3 is the specified output format; it has no problem, however, converting Markdown files with LaTeX math to MathML when exporting to EPUB 3
      • Corrects LaTeX syntax after Pandoc conversion from DOCX to Markdown
      • Converts Markdown file to EPUB 3
      • Adds epub:type markup to page numbers in document
      • Creates a page-list nav section in the NAV document
      • Adds accessibility metadata to the package document
      • Adds custom alternate text for the cover image
      • Adds xml:lang attributes to the XHTML files, supporting up to four languages
      • Runs the ACE accessibility checker by DAISY to create an accessibility report on the EPUB 3 book
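
The core of the pipeline listed above (DOCX to Markdown, Markdown to EPUB 3, then the ACE check) can be sketched as follows. This is a simplified outline under stated assumptions, not the contents of DOCX_To_EPUB3_PC.sh: `run` and `convert_book` are hypothetical names, the output file names are derived from the input name for illustration, and the LaTeX correction and post-processing steps are left out because their rules are not described in this README. By default the sketch only prints the commands; set DRY_RUN=0 to execute them (pandoc and ace must then be installed).

```shell
# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical outline of the conversion for one .docx file.
convert_book() {
  local docx="$1"
  local base="${docx%.docx}"
  # 1. DOCX -> Markdown
  run pandoc "$docx" -f docx -t markdown -o "$base.md"
  # 2. Metadata + Markdown -> EPUB 3, with the default cover image name
  run pandoc Title.txt "$base.md" -f markdown -t epub3 \
    --epub-cover-image=cover.png -o "$base.epub"
  # 3. Accessibility report written to the Report folder
  run ace -o Report "$base.epub"
}
```

Running `convert_book my_file.docx` in dry-run mode prints the three commands in order, which makes it easy to see where a renamed metadata or cover file would have to be substituted.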
  6. Check the ACE accessibility report to confirm that there are no errors:

    • The script runs the ACE tool on the EPUB and writes the report to a "Report" folder
    • Correct any errors in the EPUB (see next step)
    • Once the EPUB book has no errors, change the name of the EPUB to the name of the book
  7. (OPTIONAL) Use Sigil to correct / edit EPUB information for accessibility

  • Note: our script adds this information automatically, but a human editor still needs to confirm the access modes and accessibility summary and to add any accessibility features, accessibility hazards, etc.
  • Here are some example items that you may wish to edit or add:
    • <meta property="schema:accessibilitySummary">This publication conforms to WCAG 2.0 Level AA.</meta>
    • <meta property="schema:accessMode">textual</meta>
    • <meta property="schema:accessMode">visual</meta>
    • <meta property="schema:accessModeSufficient">textual,visual</meta>
    • <meta property="schema:accessibilityFeature">MathML</meta>
  • Save the Ebook and exit Sigil
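
As a quick check alongside the manual review in Sigil, the sketch below lists which of the accessibility properties discussed in this step are absent from a package document. `missing_a11y_meta` is a hypothetical helper, and the path to the .opf file is an assumption that depends on how the EPUB was built; the EPUB must be unzipped first so the .opf file can be read.

```shell
# Hypothetical helper: print the schema.org accessibility properties
# that do not appear in the given package document (.opf file).
missing_a11y_meta() {
  local opf="$1" prop
  for prop in accessibilitySummary accessMode accessModeSufficient \
              accessibilityFeature accessibilityHazard; do
    # The closing quote keeps accessMode from matching accessModeSufficient.
    if ! grep -q "property=\"schema:$prop\"" "$opf"; then
      echo "schema:$prop"
    fi
  done
}
```

An empty result means all five properties are present; it does not confirm that their values are accurate, which still needs human review.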
  8. Open the book in an EPUB reading system of your choice

    • Use the Book Industry Study Group website (www.BISG.org) to check which reading systems support the EPUB features included in your book.
    • We recommend using the following reading systems:
      • Vital Source Bookshelf (cross-platform)
      • iBooks (macOS and iOS)
      • MS Edge (PC)
      • Easy Reader app (iOS)

Contributors

polizoto
