The latex2speech from hutchresearch

Action Items 11/5/20

Backlog:

Look into how pdflatex parses
Code up a demo with TexSoup that recursively traverses a latex doc
Brainstorm the role DAC could play
Produce a 1-2 page survey document on existing latex2speech tools
Produce a 1-2 page survey document of existing speech synthesis (TTS) tools and toolkits
Produce a <= 1 page survey document on options for parsing and manipulating latex from python
Identify and mark the latex commands that should have an effect on the synthesized speech
Enumerate guiding principles (e.g. is it better to guess how to normalize/pronounce and be wrong, or to render it in a more "raw" form?)
Build a test suite to evaluate progress as we implement features

Repeating lines

This uses sample.tex, which is in the Documentation/sample_issues/ directory of our project.

The input, starting at line 40 of the file, is

`Some basic instructions are given next.
Put your text in here. You can be a little sloppy about
spacing. It adjusts the text to look good.
{\small You can make the text smaller.}
{\tiny You can make the text tiny.}

Skip a line for a new paragraph.
You can use italics ({\em e.g.} {\em Thermodynamics is everywhere}) or {\bf bold}.
Greek letters are a snap: $\Psi$, $\psi$,
$\Phi$, $\phi$. Equations within text are easy---`

However, the output for this section is rendered like this

Some basic instructions are given next.
Put your text in here.  You can be a little sloppy    about
spacing.  It adjusts the text to look good.
  You can make the text smaller.  You can make the text tiny.
Skip a line for a new paragraph.
You can use italics ( <break time="40ms"/>
Some basic instructions are given next.
Put your text in here.  You can be a little sloppy    about
spacing.  It adjusts the text to look good.
  You can make the text smaller.  You can make the text tiny.
Skip a line for a new paragraph.
You can use italics ( <break time="40ms"/>
Some basic instructions are given next.
Put your text in here.  You can be a little sloppy    about
spacing.  It adjusts the text to look good.
  You can make the text smaller.  You can make the text tiny.
Skip a line for a new paragraph.
You can use italics ( <emphasis level="strong">  </emphasis>  e.g. <emphasis level="strong">  </emphasis>   Thermodynamics is everywhere ) or   bold .
Greek letters are a snap:  Psi  ,  psi  ,
 Phi  ,  phi  .  Equations within text are easy---

The passage is emitted three times, and the first two copies are cut off at the first {\em ...} group.

Zip | Tar | Directory Traversal

Now that zip and tar uploads work, we need to traverse their contents up to three levels deep.

In the example below, as we unzip a folder, there could be other directories or zipped files and even potentially tar files. For now we will only go three levels deep and evaluate any .tex or .bib files.

test.zip
   Folder1
      math.tex
      bib.tex
   zip1.zip
      zip_contents.tex
      second_contents.tex
table.tex
secondTable.tex
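A minimal sketch of the depth-limited traversal described above, using only the standard library. The names collect_sources, MAX_DEPTH and WANTED are hypothetical, and the way a "level" is counted (each directory and each nested archive adds one) is an assumption, since the issue does not define it.

```python
import io
import tarfile
import zipfile

MAX_DEPTH = 3               # the issue's three-level limit
WANTED = (".tex", ".bib")   # file types we want to evaluate


def _is_archive(data: bytes) -> bool:
    """Return True if the bytes look like a zip or tar archive."""
    buf = io.BytesIO(data)
    if zipfile.is_zipfile(buf):
        return True
    buf.seek(0)
    try:
        with tarfile.open(fileobj=buf):
            return True
    except tarfile.TarError:
        return False


def _members(data: bytes):
    """Yield (name, bytes) for every regular file in a zip or tar archive."""
    buf = io.BytesIO(data)
    if zipfile.is_zipfile(buf):
        with zipfile.ZipFile(buf) as zf:
            for info in zf.infolist():
                if not info.is_dir():
                    yield info.filename, zf.read(info)
        return
    buf.seek(0)
    with tarfile.open(fileobj=buf) as tf:
        for member in tf.getmembers():
            if member.isfile():
                yield member.name, tf.extractfile(member).read()


def collect_sources(data: bytes, prefix: str = "", level: int = 1) -> dict:
    """Map path -> contents for .tex/.bib files found within MAX_DEPTH levels."""
    found = {}
    for name, payload in _members(data):
        # Assumption: every directory component counts as one extra level.
        member_level = level + name.count("/")
        if member_level > MAX_DEPTH:
            continue
        path = prefix + name
        if name.lower().endswith(WANTED):
            found[path] = payload
        elif _is_archive(payload):
            # A nested archive's contents sit one level below the archive itself.
            found.update(collect_sources(payload, path + "/", member_level + 1))
    return found
```

Working on in-memory bytes avoids extracting untrusted uploads to disk, which may or may not suit the way the upload handler currently stores files.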

Mathmode - Does not Render

The following mathmode does not render. This is an enhancement, not a priority.

\begin{eqnarray}
du &=& T\ ds -P\ dv, \qquad \mbox{first law.}\label{fl}\\
ds &\ge& {\delta q \over T}.\qquad  \qquad \mbox{second law.} \label{sl}\\
dd &\le& {\delta q \over T}.\qquad  \qquad \mbox{third law.} \label{tl}
\end{eqnarray}

Add Content to Home Page

Need to replace lorem ipsum text with actual steps

Download Page - Web Design

[Home Page Design/Development] - Finished
[Progress Bar Design/Development] - Finished

Need to create design for download page, then implement

Footnote command

If we have time, \footnote could be added to the pronunciation file. Not a priority.

References\footnote{Lamport, L., 1986, {\em \LaTeX: User's Guide \& Reference Manual},

Emphasis Tag Problem

The following input uses the \em command:

You can use italics ({\em e.g.} {\em Thermodynamics is everywhere}) or {\bf bold}.

It should render to something similar to this

You can use italics <emphasis level = "strong">e.g.</emphasis> ...

However, the actual output is this

You can use italics ( <emphasis level="strong">  </emphasis>  e.g. <emphasis level="strong">  </emphasis>   Thermodynamics is everywhere ) or   bold .

The emphasis tags are empty, and the text that belongs inside them ends up outside.

Beginning contents being read last

Preamble commands such as \title{}, \author{}, and \date{} are sometimes read last. In some cases they are read at the beginning, but in others they are inserted at the very end. This bug is related to bug #50.

LaTeX file

\usepackage[utf8]{inputenc}
\usepackage{amsmath}

\title{Tables and More Tables}
\author{Taichen Rose}
\date{January 2021}

\newtheorem{thm}{Theorem}
\newtheorem{corr}{Corollary}

\begin{document}
\maketitle
\begin{center}
\begin{tabular}{ c c c }
 cell one & cell two & cell three \\
 cell4 & cell5 & cell6 \\
 cell7 & cell8 & cell9
\end{tabular}
\end{center}

\end{document}

Output:
<speak>Title<p>Table Contents:<break time="40ms" />New Row: , Column 1, Value: cell one , Column 2, Value: cell two , Column 3, Value: cell three \\ New Row: , Column 1, Value: cell4 , Column 2, Value: cell5 , Column 3, Value: cell6 \\ New Row: , Column 1, Value: cell7 , Column 2, Value: cell8 , Column 3, Value: cell9<break time="40ms" /></p><break time="0.3s" />Tables and More Tables<break time="0.3s" />By:<break time="0.3s" />Taichen Rose<break time="0.3s" />Published:<break time="0.3s" />January 2021<break time="0.3s" /></speak>

The title, author, and publication date should be read at the beginning of the file, before the contents.

S3 Bucket Delay

From my calculations, it takes 15 seconds for Amazon Polly to deliver the .mp3 file to the S3 bucket. Once our algorithm finishes, it directs the user to the download page, but the user has to wait those 15 seconds before the file can actually be downloaded, because the file does not exist in the S3 bucket until Polly has finished writing it.
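One way to handle the delay is to poll until the file exists before enabling the download. The sketch below is a generic, standard-library polling helper; wait_until and its parameters are hypothetical names. With boto3, the is_ready callable could wrap an S3 head_object call or check the status returned by Polly's get_speech_synthesis_task, but how the project talks to AWS is an assumption here.

```python
import time


def wait_until(is_ready, timeout=60.0, interval=1.0,
               sleep=time.sleep, clock=time.monotonic):
    """Call is_ready() every `interval` seconds until it returns True.

    Returns True if the condition was met, False if `timeout` seconds passed.
    `sleep` and `clock` are injectable so the helper can be tested quickly.
    """
    deadline = clock() + timeout
    while True:
        if is_ready():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

The download page could call such a helper (or an equivalent client-side check) and only show the link once it returns True, instead of assuming a fixed 15-second wait.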

mathmode bug

When the Sympy -> SSML conversion runs, the opening and closing tags are not rendered as tags. [This was fixed previously and has now regressed.]

LaTeX Example:
\begin{equation} h(x)=x^2 \end{equation}

Running it through the LaTeX -> SSML mathmode converter produces
SSML h of x equals <prosody pitch="+25%"><break time="0.3ms"/>begin first parentheses</prosody><break time="0.3ms"/> x to the power of 2 <prosody pitch="+25%"><break time="0.3ms"/>end first parentheses</prosody><break time="0.3ms"/>

However, after the complete TexParser run, the output is
the equation h of x equals <prosody pitch="+25%"><break time="0.3ms"/>begin first parentheses</prosody><break time="0.3ms"/> x to the power of 2 <prosody pitch="+25%"><break time="0.3ms"/>end first parentheses</prosody><break time="0.3ms"/&gt

In the final output, the tags are escaped as &lt; and &gt; instead of being written as literal < and >.
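One plausible cause, assuming the parser builds its SSML tree with xml.etree.ElementTree (an assumption, not confirmed here), is that the mathmode SSML string is assigned as element text, which ElementTree escapes on output. The sketch below parses the fragment and attaches it as real child elements instead. append_ssml_fragment is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def append_ssml_fragment(parent, fragment):
    """Attach an SSML string to `parent` as real elements, not escaped text."""
    # Wrap the fragment so it has a single root and can be parsed.
    wrapper = ET.fromstring(f"<wrapper>{fragment}</wrapper>")
    # Text before the first child of the fragment has to be merged by hand.
    if wrapper.text:
        if len(parent):
            parent[-1].tail = (parent[-1].tail or "") + wrapper.text
        else:
            parent.text = (parent.text or "") + wrapper.text
    # Children carry their own tail text with them when moved.
    for child in list(wrapper):
        parent.append(child)


# Assigning markup as text reproduces the escaping problem:
naive = ET.Element("speak")
naive.text = '<break time="0.3s"/>'
escaped = ET.tostring(naive, encoding="unicode")  # contains &lt;break ... &gt;

# Parsing the fragment first keeps the tags intact:
speak = ET.Element("speak")
speak.text = "the equation "
append_ssml_fragment(speak, 'h of x equals <break time="0.3s"/> x to the power of 2')
fixed = ET.tostring(speak, encoding="unicode")
```

This only works if the mathmode output is well-formed XML. A malformed fragment raises ET.ParseError, which may be preferable to silently sending escaped tags to the speech service.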

Commands - Extra back slash parsing bug

In LaTeX files, there are normal commands such as

\LaTeX

This is correctly rendered as the command "LaTeX". However, when the command is followed by a control space (\ ), as in

\LaTeX\ words after

Then the parser treats both "LaTeX" and "words" as commands, so "words" is never appended to the SSML output.
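A possible workaround is to rewrite the control space before the text reaches the parser. The sketch below turns a command followed by a control space into the equivalent LaTeX idiom with an empty group, for example \LaTeX{} words. Whether the parser handles the {} form correctly has not been verified here, and normalize_control_space is a hypothetical name.

```python
import re

# A command name (backslash + letters) immediately followed by "\ ".
_CONTROL_SPACE = re.compile(r"(\\[A-Za-z]+)\\ ")


def normalize_control_space(tex: str) -> str:
    """Rewrite `\\cmd\\ text` as `\\cmd{} text`, leaving `\\\\` line breaks alone."""
    return _CONTROL_SPACE.sub(r"\1{} ", tex)
```

Line breaks written as \\ followed by a space are not touched, because the pattern requires letters between the two backslashes.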

Action Items 10/29/20

Backlog:

Brainstorm the role DAC could play
Produce a 1-2 page survey document on existing latex2speech tools
Produce a 1-2 page survey document of existing speech synthesis (TTS) tools and toolkits
Produce a <= 1 page survey document on options for parsing and manipulating latex from python
Identify and mark the latex commands that should have an effect on the synthesized speech
Enumerate guiding principles (e.g. is it better to guess how to normalize/pronounce and be wrong, or to render it in a more "raw" form?)
Build a test suite to evaluate progress as we implement features

Action Items 10/8/20

Backlog:

Fix conversion db tests

The XML was updated to use type = "none" for tables and mathmode, so the tests need to be updated to match.

The existing tests were written against the previous database, before this change. Changing the type to "none" still results in errors across the board.

_getArg function returns NoneType

In _getArg, there is a case triggered by the sample.tex file (zipped in sample.zip here) where the function returns None. Further along, the code needs this argument to be non-null, so this edge case raises the error below.

Example file that breaks: sample.tex.zip

Error Log with corresponding file:
[2021-03-07 20:33:10,367] ERROR in app: Exception on /upload [POST]
Traceback (most recent call last):
  File "/Users/taichen/opt/anaconda3/lib/python3.8/site-packages/flask/app.py", line 2447, in wsgi_app
    response = self.full_dispatch_request()
  File "/Users/taichen/opt/anaconda3/lib/python3.8/site-packages/flask/app.py", line 1952, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/Users/taichen/opt/anaconda3/lib/python3.8/site-packages/flask/app.py", line 1821, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/Users/taichen/opt/anaconda3/lib/python3.8/site-packages/flask/_compat.py", line 39, in reraise
    raise value
  File "/Users/taichen/opt/anaconda3/lib/python3.8/site-packages/flask/app.py", line 1950, in full_dispatch_request
    rv = self.dispatch_request()
  File "/Users/taichen/opt/anaconda3/lib/python3.8/site-packages/flask/app.py", line 1936, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/Users/taichen/Desktop/Tai/Tex2Speech/latex2speech/tex2speech/app.py", line 63, in handle_upload
    audio_links = start_polly(file_holder, input_holder, bib_holder)
  File "/Users/taichen/Desktop/Tai/Tex2Speech/latex2speech/tex2speech/aws_polly_render.py", line 265, in start_polly
    parsed_contents = start_conversion(texFile.read())
  File "/Users/taichen/Desktop/Tai/Tex2Speech/latex2speech/tex2speech/aws_polly_render.py", line 245, in start_conversion
    parsed_contents = parser.parse(contents)
  File "/Users/taichen/Desktop/Tai/Tex2Speech/latex2speech/tex2speech/conversion_parser.py", line 284, in parse
    self._parseNodes(doc.contents, tree)
  File "/Users/taichen/Desktop/Tai/Tex2Speech/latex2speech/tex2speech/conversion_parser.py", line 240, in _parseNodes
    parseOut = self._parseEnvironment(texNode, ssmlParent, leftChild)
  File "/Users/taichen/Desktop/Tai/Tex2Speech/latex2speech/tex2speech/conversion_parser.py", line 167, in _parseEnvironment
    self._parseNodes(contents, ssmlParent, leftChild=leftChild)
  File "/Users/taichen/Desktop/Tai/Tex2Speech/latex2speech/tex2speech/conversion_parser.py", line 240, in _parseNodes
    parseOut = self._parseEnvironment(texNode, ssmlParent, leftChild)
  File "/Users/taichen/Desktop/Tai/Tex2Speech/latex2speech/tex2speech/conversion_parser.py", line 167, in _parseEnvironment
    self._parseNodes(contents, ssmlParent, leftChild=leftChild)
  File "/Users/taichen/Desktop/Tai/Tex2Speech/latex2speech/tex2speech/conversion_parser.py", line 242, in _parseNodes
    parseOut = self._parseCommand(texNode, ssmlParent, leftChild)
  File "/Users/taichen/Desktop/Tai/Tex2Speech/latex2speech/tex2speech/conversion_parser.py", line 221, in _parseCommand
    self._resolveCmdElements(cmdNode, ssmlParent, elemList, leftChild)
  File "/Users/taichen/Desktop/Tai/Tex2Speech/latex2speech/tex2speech/conversion_parser.py", line 201, in _resolveCmdElements
    self._resolveCmdElements(cmdNode, elemList[i], elemList[i].children, None)
  File "/Users/taichen/Desktop/Tai/Tex2Speech/latex2speech/tex2speech/conversion_parser.py", line 187, in _resolveCmdElements
    contents = self._getArg(cmdNode, elem).contents
AttributeError: 'NoneType' object has no attribute 'contents'

Upload to Cloud

Upload application using either Elastic Beanstalk or EC2

-> Document other hosting options for the future (Flask applications can also be hosted on Heroku)

Commands that need to be supported

Commands that need to be supported are listed below

\subsection
\TeX

\subsection{Vision Statement}
\TeX

Mathmode Enhancements

[Copied from previous bug list] -> These are currently not a priority; some are enhancements rather than bugs. ANTLR currently doesn't accept these inputs.

Cases that break because Sympy can't handle them (need a fix or a try/catch):

An equation can't have multiple equals signs
Example \sqrt[3]{8}=8^{\frac{1}{3}}=2
An equals sign can't be one-sided; there must be an expression on both sides
Example = 3 + 2
ANTLR doesn't render ln as natural log; it needs \ln
Doesn't handle f(x, y) expressions well
Example \int_{a}^b\int_{c}^d f(x,y)dxdy
\limits doesn't seem to be accepted within summations
Example \sum\limits_{j=1}^k A_{\alpha_j}
Doesn't accept a sum with bounds but no summand
Example \sum_{i=1}^{n}
Doesn't like the > sign (and probably not the < sign); needs commands such as \leq instead
Example \inf_{x > s}f(x)
Doesn't understand prime notation
Example f' or f''
Sometimes evaluates an inequality instead of rendering it
Example 3\leq2 becomes False
Example 3\geq3 becomes True
Doesn't like ^ in some cases
Example ^3/_7
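The try/catch idea mentioned above could look roughly like the sketch below. It wraps whatever conversion function the pipeline uses (passed in as convert, so nothing about the Sympy or ANTLR API is assumed) and falls back to the raw LaTeX when the conversion fails or collapses to a truth value. speak_math and its parameters are hypothetical names.

```python
def speak_math(latex, convert, fallback=lambda raw: raw):
    """Convert a math snippet to speech text, falling back when parsing fails.

    convert:  callable that turns LaTeX into spoken text (may raise).
    fallback: callable used when conversion fails; by default the raw
              LaTeX is returned unchanged.
    """
    try:
        result = convert(latex)
    except Exception:
        # Covers the listed cases the parser rejects outright,
        # e.g. multiple equals signs or a one-sided equals sign.
        return fallback(latex)
    # Covers inputs such as 3\leq2 that get evaluated instead of rendered.
    if str(result) in ("True", "False"):
        return fallback(latex)
    return str(result)
```

Returning the raw form on failure is only one option. Which fallback is better is a question for the guiding principles listed in the backlog.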

Action Items 11/19/20

Backlog:

Look into how pdflatex parses
Code up a demo with TexSoup that recursively traverses a latex doc
Brainstorm the role DAC could play
Produce a 1-2 page survey document on existing latex2speech tools
Produce a 1-2 page survey document of existing speech synthesis (TTS) tools and toolkits
Produce a <= 1 page survey document on options for parsing and manipulating latex from python
Identify and mark the latex commands that should have an effect on the synthesized speech
Enumerate guiding principles (e.g. is it better to guess how to normalize/pronounce and be wrong, or to render it in a more "raw" form?)
Build a test suite to evaluate progress as we implement features

Multi-file input doesn't work if input statements are in-line

In LaTeX, the following is valid
...
This is text \include{other-file} and this is text.
...
but in this case the functions found_bibliography_file and found_input_file break, since they only look for the command at the beginning of the line.

A solution I thought of uses regular expressions to quickly and easily find all invocations of input commands, wherever they appear on a line. I'm willing to finish this, but only after more important verification work gets done.
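A minimal sketch of the regular-expression approach, assuming the commands of interest are \input, \include and \bibliography (the exact set handled by found_input_file and found_bibliography_file is not shown here). find_input_commands is a hypothetical name.

```python
import re

# Matches \input{...}, \include{...} and \bibliography{...} anywhere in a line.
# The brace must follow the name directly, so \includegraphics{...} is skipped.
INPUT_CMD = re.compile(r"\\(input|include|bibliography)\{([^}]*)\}")


def find_input_commands(line: str):
    """Return (command, argument) pairs for every input-style command in `line`."""
    return [(m.group(1), m.group(2).strip()) for m in INPUT_CMD.finditer(line)]
```

For the in-line case above, find_input_commands(r"This is text \include{other-file} and this is text.") returns [("include", "other-file")]. The sketch does not skip commands inside % comments, so comments would need to be stripped first.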

Change Dev Flask to Production

Look into putting a production server in front of the Flask app.

Options
-> Gunicorn
-> Option See-Mong recommended

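The Flask app must be served by a production WSGI server such as Gunicorn or uWSGI, optionally behind a reverse proxy such as Nginx (Nginx itself is not a WSGI container). The Flask development server can't be relied on for actual deployments.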
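For the Gunicorn option, the configuration file is plain Python. The sketch below is a starting point only: the port, timeout and worker count are illustrative values, and it assumes the Flask object is named app inside app.py, which has not been checked against the project.

```python
# gunicorn.conf.py -- illustrative values, not tuned for this project.
import multiprocessing

# Address and port to listen on (assumed; adjust for the hosting environment).
bind = "0.0.0.0:8000"

# A common starting point for the number of worker processes.
workers = multiprocessing.cpu_count() * 2 + 1

# Seconds before an unresponsive worker is restarted. Conversions that wait
# on speech synthesis may need more than Gunicorn's default of 30 seconds.
timeout = 120
```

With this file in the working directory, the server would be started with something like gunicorn -c gunicorn.conf.py app:app, where app:app means "module app, object app".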

Itemize error

The itemize and enumerate environments and the \item command have been producing errors. I will post the problems I run into here.

To fix these problems, we need to add the items to our pronunciation.xml file

Action Items 11/12/20

TR Download ~100k tex files representative of all of the arxiv subjects
CB Produce an easily parseable list of all latex commands
- Others
- Convert existing lists into csv
DR Draft a document summarizing your take-aways from the equation exchange exercise
CB+WH Keep working on parser
JN+DR Expand the keyword counting script to support more than just bare keyword
TR Look into latex math parsers specifically (want something polished, ideally)

Backlog:

Look into how pdflatex parses
Code up a demo with TexSoup that recursively traverses a latex doc
Brainstorm the role DAC could play
Produce a 1-2 page survey document on existing latex2speech tools
Produce a 1-2 page survey document of existing speech synthesis (TTS) tools and toolkits
Produce a <= 1 page survey document on options for parsing and manipulating latex from python
Identify and mark the latex commands that should have an effect on the synthesized speech
Enumerate guiding principles (e.g. is it better to guess how to normalize/pronounce and be wrong, or to render it in a more "raw" form?)
Build a test suite to evaluate progress as we implement features

Zipped/Tar Files

Get zipped/tar files to work

Sentences are being cut off - Due to Input Files

Note: This was fixed, but I'm going to keep this issue up so I can do it more elegantly

In the Vision and Scope document, sentences are being cut off and smashed together.

Example of LaTeX document
Begin Comment Our Product: differentiation statement For on the go researchers and academics who wish to increase their available paper reading time, \TeX 2Speech will provide on-demand speech synthesis for papers and a wide variety of other scientific and technical documentation written in the \LaTeX\ format. Unlike standard TTS systems, \TeX 2Speech will be able to effectively parse a \LaTeX\ document containing mathematical equations, and convert it to comprehensible spoken word.

Output

Begin Comment  Our Product: differentiation statement
For on the go researchers and academics who wish to increase their available paper reading time,   2Speech will provide on-demand speech synthesis for papers and a wide variety of other scientific and technical documentation written in the  LaTeX   2Speech will be able to effectively parse a  LaTeX

Web Application Redesign

Update the look of the application

Bring back Dropzone
Create design and logo

Action Items 12/3/20

Backlog:

Look into how pdflatex parses
Code up a demo with TexSoup that recursively traverses a latex doc
Brainstorm the role DAC could play
Produce a 1-2 page survey document on existing latex2speech tools
Produce a 1-2 page survey document of existing speech synthesis (TTS) tools and toolkits
Produce a <= 1 page survey document on options for parsing and manipulating latex from python
Identify and mark the latex commands that should have an effect on the synthesized speech
Enumerate guiding principles (e.g. is it better to guess how to normalize/pronounce and be wrong, or to render it in a more "raw" form?)
Build a test suite to evaluate progress as we implement features

Update README.md

Add technical documentation + team information to the README!

Startup Action Items 10/1/20

Action Items 10/15/20

Backlog:

Brainstorm the role DAC could play
Produce a 1-2 page survey document on existing latex2speech tools
Produce a 1-2 page survey document of existing speech synthesis (TTS) tools and toolkits
Produce a <= 1 page survey document on options for parsing and manipulating latex from python
Identify and mark the latex commands that should have an effect on the synthesized speech
Enumerate guiding principles (e.g. is it better to guess how to normalize/pronounce and be wrong, or to render it in a more "raw" form?)
Build a test suite to evaluate progress as we implement features

Action Items 1/12/21

Backlog:

Look into how pdflatex parses
Code up a demo with TexSoup that recursively traverses a latex doc
Brainstorm the role DAC could play
Produce a 1-2 page survey document on existing latex2speech tools
Produce a 1-2 page survey document of existing speech synthesis (TTS) tools and toolkits
Produce a <= 1 page survey document on options for parsing and manipulating latex from python
Identify and mark the latex commands that should have an effect on the synthesized speech
Enumerate guiding principles (e.g. is it better to guess how to normalize/pronounce and be wrong, or to render it in a more "raw" form?)
Build a test suite to evaluate progress as we implement features

Mathmode - Too Many Parentheses

An example is shown below. The output is correct, but there are a LOT of parentheses. I'm not sure whether this is a priority, but I wanted to flag it.

Input is as follows

$\left.{\partial T \over \partial P}\right|_{s} = 
\left.{\partial v \over \partial s}\right|_{P}$.

Here is the output...

<prosody pitch="+25%"><break time="0.3ms"/>
  -begin first parentheses</prosody><break time="0.3ms"/> 
    left times 
    <prosody pitch="+25%"><break time="0.3ms"/>
    -begin second parentheses</prosody><break time="0.3ms"/> 
      <prosody pitch="+25%"><break time="0.3ms"/>
        -begin third parentheses</prosody><break time="0.3ms"/> 
          partial times 
          <prosody pitch="+25%"><break time="0.3ms"/>
            -begin fourth parentheses</prosody><break time="0.3ms"/> 
              T times 
              <prosody pitch="+25%"><break time="0.3ms"/>
                -begin fifth parentheses</prosody><break time="0.3ms"/> 
                  over times 
                  <prosody pitch="+25%"><break time="0.3ms"/>
                    -begin sixth parentheses</prosody><break time="0.3ms"/> 
                      partial times P 
                      <prosody pitch="+25%"><break time="0.3ms"/>
                    +end sixth parentheses</prosody><break time="0.3ms"/> 
                    <prosody pitch="+25%"><break time="0.3ms"/>
                 +end fifth parentheses</prosody><break time="0.3ms"/> 
                 <prosody pitch="+25%"><break time="0.3ms"/>
               +end fourth parentheses</prosody><break time="0.3ms"/> 
               <prosody pitch="+25%"><break time="0.3ms"/>
            +end third parentheses</prosody><break time="0.3ms"/> 
            times right 
          <prosody pitch="+25%"><break time="0.3ms"/>
         +end second parentheses</prosody><break time="0.3ms"/> 
           <prosody pitch="+25%"><break time="0.3ms"/>
  +end first parentheses</prosody><break time="0.3ms"/> 
   equals 
   <prosody pitch="+25%"><break time="0.3ms"/>
  -begin first parentheses</prosody><break time="0.3ms"/> 
    left times 
    <prosody pitch="+25%"><break time="0.3ms"/>
      -begin second parentheses</prosody><break time="0.3ms"/> 
      <prosody pitch="+25%"><break time="0.3ms"/>
        -begin third parentheses</prosody><break time="0.3ms"/> 
          partial times 
          <prosody pitch="+25%"><break time="0.3ms"/>
            -begin fourth parentheses</prosody><break time="0.3ms"/> 
              v times 
              <prosody pitch="+25%"><break time="0.3ms"/>
                -begin fifth parentheses</prosody><break time="0.3ms"/> 
                  over times 
                  <prosody pitch="+25%"><break time="0.3ms"/>
                  -begin sixth parentheses</prosody><break time="0.3ms"/>
                    partial times s 
                    <prosody pitch="+25%"><break time="0.3ms"/>
                  +end sixth parentheses</prosody><break time="0.3ms"/> 
                  <prosody pitch="+25%"><break time="0.3ms"/>
                +end fifth parentheses</prosody><break time="0.3ms"/> 
                <prosody pitch="+25%"><break time="0.3ms"/>
              +end fourth parentheses</prosody><break time="0.3ms"/> 
              <prosody pitch="+25%"><break time="0.3ms"/>
            +end third parentheses</prosody><break time="0.3ms"/> 
            times right <prosody pitch="+25%"><break time="0.3ms"/>
          +end second parentheses</prosody><break time="0.3ms"/> 
          <prosody pitch="+25%"><break time="0.3ms"/>
  +end first parentheses</prosody><break time="0.3ms"/>  .

Progress Bar

When a user uploads a document there needs to be a progress bar to show progress or feedback.

def currently unsupported by ExpandMacros

This is mostly a reminder for myself: the most basic and abstract macro definition capability of LaTeX (\def) is currently unsupported by the expand_macros module.

Database error

Some elements in the XML are not being displayed in the output. This could be an XML error or a database error, and it may be related to issue #35.

When basicMath.tex (in Documentation) is run, the information is not displayed correctly.

XML example:
<cmd name = "title" type = "none"> Title <break time = "0.3s"/> <arg num = "1"/> <break time = "0.3s"/> </cmd>

Output:
TitleBasic Math Bish Smiley Face

The expected output is something like
Title <break time = "0.3s"/> Basic Math Bish Smiley Face <break time = "0.3s"/>

Tabularx command

Get the tabularx environment rendered (if we have time)

The example below is from our Vision and Scope document

\begin{tabularx}{\linewidth}{|l|l|l|X|}\hline
Ver. & Date & Who & Change\\\hline
1.1  & 10/23/20  & Connor Barlow  & Added client considerations in risks and limitations\\\hline
\end{tabularx}

External .bib files get messed up with TexSoup

The software currently merges external .bib files into the document before it is handed to the main TexParser. Because of this, when the file contents (with the embedded bibliography) are passed to TexSoup, many of the bibliography's values are removed.

Reproduce Error: Upload a file with a corresponding bib file

Solution: Process the .bib file after TexParser has run

Action Items 12/10/20

Backlog:

Look into how pdflatex parses
Code up a demo with TexSoup that recursively traverses a latex doc
Brainstorm the role DAC could play
Produce a 1-2 page survey document on existing latex2speech tools
Produce a 1-2 page survey document of existing speech synthesis (TTS) tools and toolkits
Produce a <= 1 page survey document on options for parsing and manipulating latex from python
Identify and mark the latex commands that should have an effect on the synthesized speech
Enumerate guiding principles (e.g. is it better to guess how to normalize/pronounce and be wrong, or to render it in a more "raw" form?)
Build a test suite to evaluate progress as we implement features

Mathmode Preprocess

When mathmode content is handed to the ANTLR parser, some characters break it immediately. In the future we need a try/catch mechanism, since there are so many math constructs that our parser can't handle.

Characters that we need to get rid of before it gets handed to the mathmode pipe:

\ & \\ [ ] .

Cases that break because Sympy can't handle them (need a fix or a try/catch):

An equation can't have multiple equals signs
Example \sqrt[3]{8}=8^{\frac{1}{3}}=2
An equals sign can't be one-sided; there must be an expression on both sides
Example = 3 + 2
ANTLR doesn't render ln as natural log; it needs \ln
Doesn't handle f(x, y) expressions well
Example \int_{a}^b\int_{c}^d f(x,y)dxdy
\limits doesn't seem to be accepted within summations
Example \sum\limits_{j=1}^k A_{\alpha_j}
Doesn't accept a sum with bounds but no summand
Example \sum_{i=1}^{n}
Doesn't like the > sign (and probably not the < sign); needs commands such as \leq instead
Example \inf_{x > s}f(x)
Doesn't understand prime notation
Example f' or f''
Sometimes evaluates an inequality instead of rendering it
Example 3\leq2 becomes False
Example 3\geq3 becomes True
Doesn't like ^ in some cases
Example ^3/_7

Action Items 10/22/20

Backlog:

Brainstorm the role DAC could play
Produce a 1-2 page survey document on existing latex2speech tools
Produce a 1-2 page survey document of existing speech synthesis (TTS) tools and toolkits
Produce a <= 1 page survey document on options for parsing and manipulating latex from python
Identify and mark the latex commands that should have an effect on the synthesized speech
Enumerate guiding principles (e.g. is it better to guess how to normalize/pronounce and be wrong, or to render it in a more "raw" form?)
Build a test suite to evaluate progress as we implement features

Sympy -> SSML

When the Sympy -> SSML conversion runs, the opening and closing tags are not rendered as tags.

LaTeX Example:
\begin{equation} h(x)=x^2 \end{equation}

Running it through the LaTeX -> SSML mathmode converter produces
SSML h of x equals <prosody pitch="+25%"><break time="0.3ms"/>begin first parentheses</prosody><break time="0.3ms"/> x to the power of 2 <prosody pitch="+25%"><break time="0.3ms"/>end first parentheses</prosody><break time="0.3ms"/>

However, after the complete TexParser run, the output is
the equation h of x equals <prosody pitch="+25%"><break time="0.3ms"/>begin first parentheses</prosody><break time="0.3ms"/> x to the power of 2 <prosody pitch="+25%"><break time="0.3ms"/>end first parentheses</prosody><break time="0.3ms"/&gt

In the final output, the tags are escaped as &lt; and &gt; instead of being written as literal < and >.

Parser Recursion Problem :(

The parser is reading items out of order; this is related to #35. Some part of the recursion is misbehaving.

File that reproduces error: main.tex in Documentation files

Example below (this is a snippet from main.tex):

`
\title{Very Basic Environments}
\author{Connor Barlow}
\date{October 2020}

\newtheorem{thm}{Theorem}
\newtheorem{corr}{Corollary}

\begin{document}
\maketitle

\section{Lists}
\begin{enumerate}
\item This is a numbered list.
\item The numbers should be read aloud.
\end{enumerate}
\begin{itemize}
\item This is a bullet list.
\item The items have no intrinsic ordering.
\end{itemize}
\begin{description}
\item[Bob] What type of list is this?
\item[Bill] Its for listing items that have a corresponding name!
\item[Bob] Neat.
\end{description}
\section{Simple Table}
\par
Here's a very ugly table to show different combinations of lines.
\par
\begin{tabular}{|l||cr}
1 & 2 & 3 \\
4 & 5 & 6 \\
\hline
7 & 8 & 9 \\
\hline \hline
\end{tabular}
`

Output:
`
TitleThis text is not formatted in any way,
ListsSection:Section:Simple Table
Here's a very ugly table to show different combinations of lines.
Table Contents:
but thisis! Now here's some math related things. \
Here's some varieties of math mode. First some line separated math modes f(x)=x g(x)= and now inline math modes, lim _ x f(x)= and g(x)= (x) . For something a bit more sophisticated we have equations and theorems.
the equation h of x equals <prosody pitch="+25%"><break time="0.3ms"/>begin first parentheses</prosody><break time="0.3ms"/> x to the power of 2 <prosody pitch="+25%"><break time="0.3ms"/>end first parentheses</prosody><break time="0.3ms"/> &= - \ &= -
This is my theorem.

This is my corollary.

And now a couple basic formatting environments.

These shouldn't change too much besides documents structure.

And their typical use won't be as contrived as this.

New Row: , Column 1, Value: 1 , Column 2, Value: 2 , Column 3, Value: 3 \\ New Row: , Column 1, Value: 4 , Column 2, Value: 5 , Column 3, Value: 6 \\ New Row: , Column 1, Value: New Row: , Column 1, Value: 7 , Column 2, Value: 8 , Column 3, Value: 9 \\Very Basic EnvironmentsBy:Connor BarlowPublished:October 2020Section: `

As shown above, the contents are read completely out of order.

Action items 1/19/21

Backlog:

Look into how pdflatex parses
Code up a demo with TexSoup that recursively traverses a latex doc
Brainstorm the role DAC could play
Produce a 1-2 page survey document on existing latex2speech tools
Produce a 1-2 page survey document of existing speech synthesis (TTS) tools and toolkits
Produce a <= 1 page survey document on options for parsing and manipulating latex from python
Identify and mark the latex commands that should have an effect on the synthesized speech
Enumerate guiding principles (e.g. is it better to guess how to normalize/pronounce and be wrong, or to render it in a more "raw" form?)
Build a test suite to evaluate progress as we implement features

\LaTeX\ command

The \LaTeX\ command breaks the application again.

\LaTeX is rendered as the LaTeX command. However, because of the second \ at the end, everything after it is treated as a possible command, and the text is disregarded entirely.

Comments - Comments Hurt TexSoup

Comments (denoted by %) disrupt some of TexSoup's parsing. For some reason, the commands and environments that follow a comment are passed through as raw text instead of being parsed. An example is shown below.

% Document Information
%
\title{Tex2Speech\\Vision and Scope}
\author{Connor Barlow, Walker Herring, Jacob Nemeth,\\Dylon Rajah and Taichen Rose}
\date{October 23, 2020}

Running the LaTeX snippet above through the parser produces this output.

<speak>
% Document Information % \title{Tex2Speech\\Vision and Scope} \author{Connor Barlow, Walker Herring, Jacob Nemeth,\\Dylon Rajah and Taichen Rose} \date{October 23, 2020}
</speak>

If I remove the comments and give the parser this instead

\title{Tex2Speech\\Vision and Scope}
\author{Connor Barlow, Walker Herring, Jacob Nemeth,\\Dylon Rajah and Taichen Rose}
\date{October 23, 2020}

The output is this

<speak>
Title: <break time="0.3s"/> Tex2Speech \\ Vision and Scope <break time="0.3s"/> By: <break time="0.3s"/> Connor Barlow, Walker Herring, Jacob Nemeth, \\ Dylon Rajah and Taichen Rose <break time="0.3s"/> Published: <break time="0.3s"/> October 23, 2020 <break time="0.3s"/>
</speak>
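Since removing the comments by hand fixes the output, one option is to strip them automatically before the text reaches TexSoup. The sketch below is a simple line-based filter. strip_comments is a hypothetical name, and it deliberately ignores verbatim environments and the rare case of a line break (\\) placed directly before a %.

```python
import re

# A % that is not escaped as \% starts a comment that runs to end of line.
_COMMENT = re.compile(r"(?<!\\)%.*")


def strip_comments(tex: str) -> str:
    """Remove LaTeX comments while keeping blank lines (paragraph breaks)."""
    kept = []
    for line in tex.splitlines():
        code = _COMMENT.sub("", line).rstrip()
        # Drop lines that held only a comment, but keep genuinely blank
        # lines, since a blank line separates paragraphs in LaTeX.
        if code or not line.strip():
            kept.append(code)
    return "\n".join(kept)
```

Applied to the snippet above, this leaves only the \title, \author and \date lines, which is the form the parser already handles correctly.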

Expand on DB for Mathmode

Some mathmode environments still need to be implemented in our database. As shown below, only the "equation" environment is implemented; eqnarray, $, $$, etc. still need to be completed.

`

the equation

<env name="eqnarray" type="mathmode">
</env>

<env name="$" type="mathmode">
</env>

<env name="$$" type="mathmode">
</env>

`

Speak tag being rendered early

The closing speak tag is rendered too early for the testingBib.tex and BibFile.bib files. I believe what is happening is that TexParser adds both the opening and the closing speak tags.

The .bib file is rendered after TexParser finishes, so its output lands after the closing tag, as shown below.

<speak> Title  <break time="0.3s"/>   An Example Document  <break time="0.3s"/>   By:  <break time="0.3s"/>   John Smith  <break time="0.3s"/>   Published:  <break time="0.3s"/>    <break time="0.3s"/>   Section:  <break time="0.3s"/>   The first section  <break time="0.3s"/>
This is an example of a document formatted using  LaTeX  .
This is an example of a citation  <emphasis level="reduced"> Cited in reference as: gG07  <break time="0.3s"/>    </emphasis> .
Now here is an example of an equation:


i (r,t) = - ^2 (r,t)+V(r) (r,t)
 </speak> <emphasis level='strong'> References Section </emphasis> <break time='1s'/>  Bibliography item is read as: <break time='0.5s'/>gG07. Type: book<break time='0.5s'/>  Authors: Gratzer, George A., <break time='0.3s'/> title: More Math Into LaTeX<break time='0.3s'/>publisher: Birkhauser<break time='0.3s'/>address: Boston<break time='0.3s'/>year: 2007<break time='0.3s'/>edition: 4th<break time='0.3s'/>
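One way to keep the bibliography inside the document is to insert it just before the closing tag instead of appending it to the finished string. This is a string-level sketch with a hypothetical helper name, append_inside_speak; moving the bibliography step into the parser itself may be the cleaner fix.

```python
def append_inside_speak(ssml: str, extra: str) -> str:
    """Insert `extra` immediately before the last </speak> in `ssml`."""
    closing = "</speak>"
    idx = ssml.rfind(closing)
    if idx == -1:
        # No speak element yet: wrap everything so the result is still valid.
        return f"<speak>{ssml}{extra}</speak>"
    return ssml[:idx] + extra + ssml[idx:]
```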

PDFLaTeX Not working for example

PDFLaTeX is not working for the sample.tex file. Instead of resolving the reference in Eq.~(\ref{...}) to an equation number such as 1 or 2, it leaves the reference empty. Need to double-check the code in aws_polly_render.

Input:
Eq.~(\ref{fl}) is the first law. Eq.~(\ref{sl}) is the second law. Eq.~(\ref{tl}) is the third law.

Output:
Eq.~( ) is the first law. Eq.~( ) is the second law. Eq.~( ) is the third law.
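For comparison, references can also be resolved directly from the .aux file that pdflatex writes, where each label appears as a line such as \newlabel{fl}{{1}{1}}. The sketch below reads those lines and substitutes the numbers into \ref commands. It is not the approach currently in aws_polly_render, the function names are hypothetical, and packages such as hyperref add extra fields to \newlabel that this simple pattern only partly accounts for.

```python
import re

# \newlabel{key}{{number}{page}...}  -> capture key and number.
_NEWLABEL = re.compile(r"\\newlabel\{([^}]+)\}\{\{([^}]*)\}")
_REF = re.compile(r"\\ref\{([^}]+)\}")


def load_labels(aux_text: str) -> dict:
    """Map each label name to its printed number, read from .aux contents."""
    return dict(_NEWLABEL.findall(aux_text))


def resolve_refs(tex: str, labels: dict) -> str:
    """Replace \\ref{key} with its number; unknown keys are left untouched."""
    return _REF.sub(lambda m: labels.get(m.group(1), m.group(0)), tex)
```

Leaving unknown keys in place makes a missing label visible in the output instead of silently producing an empty pair of parentheses.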

Amazon Polly Bug

The application loops repeatedly on any file when calling the tts function in aws_polly_render

Example:
`
CONTENTS AFTER CHANGE

Title Basic Math Bish Smiley Face By: Taichen Rose Published: March 2021
Start of comment This is a comment I guess?

Hey guys LaTeX
I'm doing great LaTeX LaTeX LaTeX the equation

TEST
An error occurred (InvalidSsmlException) when calling the StartSpeechSynthesisTask operation: Invalid SSML request
`

This loops multiple times until it exits with a runtime error. It reports an invalid SSML request, but it still shouldn't be looping. Also, when you open the final1.tex master file, it contains LaTeX commands instead of SSML for some reason.
