deangelisdf / write2audiobook

A powerful tool designed to convert text-based documents into engaging audiobooks. Perfect for anyone looking to make reading more accessible, whether for people with visual impairments or for those who simply prefer listening on the go.

License: MIT License

Python 100.00%

audiobooks ffmpeg python3 visually-impaired

write2audiobook's Introduction

Write2Audiobook

Simplify life with audiobooks.
A tool designed to help visually impaired people by converting text-based documents into audiobooks.

Features

Convert various text formats to audio, including EPUB, TXT, PPTX, and DOCX.
Easy to use with simple command-line instructions.
Enhances accessibility for visually impaired users.

Requirements

To get started, clone the repository and install the necessary dependencies:

git clone https://github.com/deangelisdf/write2audiobook
cd write2audiobook
python3 -m pip install -r requirements.txt

How to Use

You can convert your documents to audiobooks using the following commands:
To convert an EPUB book to an audiobook:

python3 ebook2audio.py book.epub language

To convert a plain text file to an audiobook:

python3 txt2audio.py text.txt language

To convert a PowerPoint presentation to an audiobook:

python3 pptx2audio.py presentation.pptx language

To convert a Word document to an audiobook:

python3 docx2audio.py document.docx language

where language is the language code: it for Italian or en for English.

Contributing

We welcome contributions! If you'd like to contribute, please fork the repository and submit a pull request. For major changes, please open an issue first to discuss what you would like to change.

License

This project is licensed under the MIT License. See the LICENSE file for more details.

Contact

For questions, suggestions, or feedback, feel free to open an issue or contact us.

write2audiobook's People

Contributors

Stargazers

Watchers

Forkers

brunodavi tgal117 greg-martinez44 krishn1412

write2audiobook's Issues

refactor m4b.py

requeriments.txt should be renamed requirements.txt

Refactor PPTX script

(ebook) use OPF metadata

(ffmpeg) add cover to final result audiobook

The m4a files generated by the tool lack a cover image.

Improve table reading (docx)

Table reading is already in place, but it is not clear to the listener when a table starts or how the reading moves through it.

Desirable: add a parameter to enable verbose table reading.
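One possible shape for a more explicit table reading is sketched below. It works on plain lists of strings rather than on the objects returned by the DOCX library, and the names table_to_speech and verbose are hypothetical, not part of the tool.

```python
from typing import List


def table_to_speech(rows: List[List[str]], verbose: bool = False) -> str:
    """Turn a table (first row = header) into sentences a TTS engine can read.

    Hypothetical helper: announces where the table starts and ends so the
    listener can tell it apart from the surrounding text.
    """
    if not rows:
        return ""
    header, body = rows[0], rows[1:]
    parts = [f"Table start. Rows: {len(body)}. Columns: {len(header)}."]
    for index, row in enumerate(body, start=1):
        if verbose:
            # Verbose mode repeats the column name before every cell.
            cells = ", ".join(f"{name}: {value}" for name, value in zip(header, row))
        else:
            cells = ", ".join(row)
        parts.append(f"Row {index}. {cells}.")
    parts.append("Table end.")
    return " ".join(parts)
```

With verbose=True each cell is prefixed by its column name, which is slower to listen to but easier to follow in wide tables.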

(Pre-analysis) (ebook/docx) Remove references in book.

Many books contain links to references, but reading those numbers or phrases aloud does not sound natural in the middle of a passage.

Every script has a method that extracts the chapter text. The analysis required here should take that text as input and remove any references.
Most of the time a reference is written as [1], where 1 is the reference number.
Reference examples:

[1]
(Author, 1999)
phrase.1

Desirable: add an optional flag to remove the references from the text.
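A minimal sketch of such a filter, assuming the three reference styles listed above are the only ones to handle. The function name remove_references and the regular expressions are illustrative, and the patterns are heuristics that may need tuning on real books.

```python
import re

# Bracketed numeric references such as [1] or [2, 3].
BRACKET_REF = re.compile(r"\s*\[\d+(?:\s*[,-]\s*\d+)*\]")
# Author-year references such as (Author, 1999).
AUTHOR_YEAR_REF = re.compile(r"\s*\([A-Z][^()]*,\s*\d{4}\)")
# Footnote digits glued to the end of a sentence, such as "phrase.1".
FOOTNOTE_REF = re.compile(r"(?<=[a-z][.!?])\d+(?=\s|$)")


def remove_references(text: str) -> str:
    """Strip reference markers from chapter text before it is sent to TTS."""
    for pattern in (BRACKET_REF, AUTHOR_YEAR_REF, FOOTNOTE_REF):
        text = pattern.sub("", text)
    return text
```

The footnote pattern only fires when a lowercase letter precedes the punctuation, so decimal numbers such as 3.5 are left alone.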

Is it possible to compress the final audio?

Right now, lowering the bit_rate is the only way I have found to reduce the final file size.

This is a feasibility study, so the desirable output is either:

a pull request with a solution
or
an analysis of all the compression paths tried
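One compression path worth trying for spoken audio is to combine a lower bit rate with mono output and a lower sample rate. The sketch below only builds the ffmpeg argument list; the function name, default values, and file names are assumptions, and the actual size gain has not been measured here.

```python
from typing import List


def build_compress_command(source: str, target: str, bit_rate: str = "48k",
                           mono: bool = True, sample_rate: int = 22050) -> List[str]:
    """Build an ffmpeg command that re-encodes audio with smaller settings."""
    # -c:a selects the audio codec, -b:a the audio bit rate.
    command = ["ffmpeg", "-y", "-i", source, "-c:a", "aac", "-b:a", bit_rate]
    if mono:
        # -ac 1 downmixes to a single channel, usually enough for speech.
        command += ["-ac", "1"]
    # -ar sets the output sample rate in Hz.
    command += ["-ar", str(sample_rate), target]
    return command
```

The list can be passed to subprocess.run(command, check=True) on a machine where ffmpeg is installed.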

(tool) move constants to libraries

BACK_END_TTS needs to be moved to m4b.py; other global constants may also need to be moved to the library.

The goal of this activity is to reduce duplicated code.

Unit test required

Create a user guide documentation

We need a blog or something similar to explain how to use the tool and how to contribute.

(plantuml) read UML diagram

Parse and interpret it in order to generate text.

The parser for each diagram type shall live under a dedicated folder, plant_uml.
Each parser shall return a graph that represents the diagram it read.

(PPTX) Generate description of diagram

Starting from a PowerPoint diagram, generate a description.

It can be done using phrases such as:

The box "input" is linked to the box "System A".
The box "System A" is linked to the text "output".

The library used is python-pptx; the objects to retrieve are Shapes and Autoshapes (https://python-pptx.readthedocs.io/en/latest/api/shapes.html#shape-objects-autoshapes)
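The sentence pattern above can be produced from a list of links once they have been extracted from the slide. The sketch below covers only that last step: extracting the links from the presentation is not shown, and describe_links is a hypothetical name.

```python
from typing import Iterable, List, Tuple


def describe_links(links: Iterable[Tuple[str, str]]) -> List[str]:
    """Turn (source, target) pairs of box labels into readable sentences."""
    return [
        f'The box "{source}" is linked to the box "{target}".'
        for source, target in links
    ]
```

Keeping the sentence generation separate from the slide parsing makes it easy to unit test without a PPTX file.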

Reduce Cyclomatic Complexity

The goal is to reduce the risk of introducing bugs and to improve code readability.

Desirable: generate a report of changes for each commit with a GitHub Action.

add internal documentation

Not all functions in the scripts are documented.
To improve the quality and readability of the scripts, add at least a header comment to each function.
Desirable: also add type hints to each function, in addition to the header comments.

Prototype pdf reader

DeepL API integration

Sometimes a book is not written in our mother tongue. To solve this, we can use the DeepL API to translate documents into our preferred language.

test all scripts in different environments

So far, all the scripts have been used only on Windows.
It would be useful to extend compatibility at least to macOS.
Desirable: also test on Ubuntu and other Linux distributions.

Improve code style

Update all scripts to comply with the active PyLint rules.

(epub) move temporary mp3 files to temporary folder

When an EPUB is converted to M4B, the following steps are executed:

extract all data from the EPUB into a temporary folder
extract the guide from the EPUB
for each file (not in the guide)
- extract the text and save it (in the folder where the script is executed)
- generate the mp3 and save it (in the folder where the script is executed)
generate the ffmetadata from the previously generated mp3 files
merge all mp3 files and the ffmetadata into a single M4B file

The goal of this activity is to remove the temporary files.
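A minimal sketch of how the per-chapter files could live in a temporary folder that is deleted automatically, using the standard tempfile module. The synthesize and merge callables stand in for the tool's real TTS and ffmpeg steps, and all names here are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Callable, Iterable, List


def convert_in_temp_dir(chapters: Iterable[str],
                        synthesize: Callable[[str, Path], None],
                        merge: Callable[[List[Path]], None]) -> None:
    """Write per-chapter text and mp3 files into a folder that cleans itself up."""
    with tempfile.TemporaryDirectory(prefix="write2audiobook_") as tmp:
        tmp_dir = Path(tmp)
        mp3_files: List[Path] = []
        for index, text in enumerate(chapters):
            # Intermediate text and audio go to the temporary folder,
            # not to the folder where the script is executed.
            (tmp_dir / f"chapter_{index:03d}.txt").write_text(text, encoding="utf-8")
            mp3_path = tmp_dir / f"chapter_{index:03d}.mp3"
            synthesize(text, mp3_path)
            mp3_files.append(mp3_path)
        # The merge step must finish before the with block ends.
        merge(mp3_files)
    # Leaving the with block removes the folder and everything in it.
```

The final M4B has to be written outside the temporary folder, otherwise it is deleted together with the intermediate files.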

Prototype Excel reader

Similar to the other scripts, it would be useful to have a reader for Excel files.

Desirable:

add requirements
read simple tables and sheets
say how many graphs are present on the page (in a later step, we also want to read the graphs)

class diagram

Create a script, based on the others in the repo, that takes a class diagram (plantuml) as input and generates:

plantuml class-diagram parser
textual equivalent to diagram
generate the final audio
unit tests

Docs: https://plantuml.com/class-diagram
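As a starting point, the sketch below parses a small subset of PlantUML class relations (extension, composition, aggregation, association) into a graph and turns it into sentences. It ignores class bodies, labels, and every other arrow form; the function names and the wording of the relations are assumptions.

```python
import re
from typing import Dict, List, Tuple

# Spoken wording for each supported arrow (illustrative choice of words).
ARROWS = {
    "<|--": "is extended by",
    "*--": "is composed of",
    "o--": "aggregates",
    "-->": "points to",
}
RELATION = re.compile(r"^\s*(\w+)\s+(<\|--|\*--|o--|-->)\s+(\w+)")

Graph = Dict[str, List[Tuple[str, str]]]


def parse_class_diagram(source: str) -> Graph:
    """Return a graph: class name -> list of (relation, other class)."""
    graph: Graph = {}
    for line in source.splitlines():
        match = RELATION.match(line)
        if match:
            left, arrow, right = match.groups()
            graph.setdefault(left, []).append((ARROWS[arrow], right))
            graph.setdefault(right, [])  # make sure every class is a node
    return graph


def describe(graph: Graph) -> List[str]:
    """Produce one sentence per edge, ready to be sent to the TTS step."""
    return [
        f"{node} {relation} {other}."
        for node, edges in graph.items()
        for relation, other in edges
    ]
```

Lines that do not match a supported relation, such as @startuml and @enduml, are skipped silently.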

Fix code-style issues

As of the last commit, the GitHub Action is failing.

add language support

The tool supports only the Italian language. Adding another language requires properly configuring each backend provided to the tool.

To add a new language, the developer needs to modify the global dictionaries as follows:

LANGUAGE_DICT = {"it-IT": "it"}
LANGUAGE_DICT_PYTTS = {"it-IT": "italian"}  # in m4b.py
TITLE_KEYWORD = {"it-IT": "TITOLO", "en": "TITLE"}
CHAPTER_KEYWORD = {"it-IT": "CAPITOLO", "en": "CHAPTER"}  # in docx2audio.py

Desirable: look for hard-coded strings and generalize them with a global dictionary, like the ones above.
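A sketch of how the dictionaries could be extended and read through one helper that fails clearly on an unsupported language. The Italian entries and the TITLE/CHAPTER keywords come from the issue; the "en" values in the first two dictionaries and the lookup function are assumptions.

```python
from typing import Dict

# The "en" values "en" and "english" are assumed for illustration only.
LANGUAGE_DICT = {"it-IT": "it", "en": "en"}
LANGUAGE_DICT_PYTTS = {"it-IT": "italian", "en": "english"}
TITLE_KEYWORD = {"it-IT": "TITOLO", "en": "TITLE"}
CHAPTER_KEYWORD = {"it-IT": "CAPITOLO", "en": "CHAPTER"}


def lookup(table: Dict[str, str], language: str) -> str:
    """Return the entry for a language or raise a readable error."""
    try:
        return table[language]
    except KeyError:
        supported = ", ".join(sorted(table))
        raise ValueError(
            f"Unsupported language {language!r}; supported: {supported}"
        ) from None
```

Routing every access through one function makes a missing entry show up as a clear error instead of a KeyError deep inside a backend call.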

(PPTX) get context by image

Desirable: use Google Lens or an equivalent service to analyze an image and retrieve information about it.

extract the images from the slides
convert them to JPEG format (to compress the images)
send them to a generative AI (such as Bing Copilot or ChatGPT) to retrieve the image context

sequence diagram

Create a script, based on the others in the repo, that takes a sequence diagram (plantuml) as input and generates:

plantuml sequence diagram parser
textual equivalent to diagram
generate the final audio
unit tests

Docs: https://plantuml.com/sequence-diagram
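A minimal sketch of the textual-equivalent step for the simplest message syntax, where a line looks like Alice -> Bob: text. It handles only the -> and --> arrows and ignores participants, notes, and grouping; describe_sequence is a hypothetical name.

```python
import re
from typing import List

# sender, arrow (-> or -->), receiver, then the message after the colon.
MESSAGE = re.compile(r"^\s*(\w+)\s*(?:-->|->)\s*(\w+)\s*:\s*(.+?)\s*$")


def describe_sequence(source: str) -> List[str]:
    """Turn each message line of a sequence diagram into one sentence."""
    sentences = []
    for line in source.splitlines():
        match = MESSAGE.match(line)
        if match:
            sender, receiver, text = match.groups()
            sentences.append(f'{sender} sends "{text}" to {receiver}.')
    return sentences
```

The sentences keep the order of the diagram, which is what carries the meaning in a sequence diagram.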

(ffmpeg) the tool shall add pause track between chapters

The tool shall allow adding a custom pause track between two chapters, so that the reading does not sound "flat" and the listener stays focused.
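One way to approach this, sketched under assumptions: generate a short silent track with ffmpeg's anullsrc source, then interleave it with the chapter files before the merge step. The function names, default duration, and file names are hypothetical, and the silent track would need the same audio parameters as the chapters for a clean merge.

```python
from typing import List


def silence_command(target: str, seconds: float = 2.0,
                    sample_rate: int = 44100) -> List[str]:
    """Build an ffmpeg command that writes a silent mono track."""
    return [
        "ffmpeg", "-y",
        "-f", "lavfi", "-i", f"anullsrc=r={sample_rate}:cl=mono",
        "-t", str(seconds),  # duration of the pause in seconds
        target,
    ]


def interleave_pause(chapters: List[str], pause: str) -> List[str]:
    """Insert the pause file between chapters, not before the first or after the last."""
    result: List[str] = []
    for index, chapter in enumerate(chapters):
        if index > 0:
            result.append(pause)
        result.append(chapter)
    return result
```

The interleaved list can replace the plain chapter list wherever the tool currently merges the mp3 files.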
