
latex2speech's People

Contributors

blackboardbold, dylonrajah, jrnemeth, walkerjh, willsower, wolrab

latex2speech's Issues

Change Dev Flask to Production

Look into adding a production server for the Flask app.

Options
-> Gunicorn
-> Option See-Mong recommended

You must serve the Flask app with a production WSGI server such as Gunicorn or uWSGI, optionally behind Nginx as a reverse proxy (Nginx itself is not a WSGI container). The Flask development server can't be relied on for actual deployments.
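As a possible starting point for the Gunicorn option, a minimal configuration sketch might look like the following. The file name gunicorn.conf.py is the config file Gunicorn looks for by default; the port, the timeout value, and the app:app module path in the usage note are assumptions, not settings taken from this project.

```python
# gunicorn.conf.py -- illustrative values only, not the project's settings
import multiprocessing

# Address and port Gunicorn listens on (assumed; use what the host allows).
bind = "0.0.0.0:8000"

# Rule of thumb from the Gunicorn docs: (2 x cores) + 1 workers to start.
workers = multiprocessing.cpu_count() * 2 + 1

# Seconds before a silent worker is restarted. Raised above the default
# on the assumption that parsing plus speech synthesis can be slow.
timeout = 120
```

It would be started with something like `gunicorn -c gunicorn.conf.py app:app`, assuming the Flask object is named app inside app.py.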

Action Items 11/12/20

  • TR Download ~100k tex files representative of all of the arxiv subjects
  • CB Produce an easily parseable list of all latex commands
    • Others
    • Convert existing lists into csv
  • DR Draft a document summarizing your take-aways from the equation exchange exercise
  • CB+WH Keep working on parser
  • JN+DR Expand the keyword counting script to support more than just bare keyword
  • TR Look into latex math parsers specifically (want something polished, ideally)

Backlog:

  • Look into how pdflatex parses
  • Code up a demo with TexSoup that recursively traverses a latex doc
  • Brainstorm the role DAC could play
  • Produce a 1-2 page survey document on existing latex2speech tools
  • Produce a 1-2 page survey document of existing speech synthesis (TTS) tools and toolkits
  • Produce a <= 1 page survey document on options for parsing and manipulating latex from python
  • Identify and mark the latex commands that should have an effect on the synthesized speech
  • Enumerate guiding principles (e.g. is it better to guess how to normalize/pronounce and be wrong, or to render it in a more "raw" form?)
  • Build a test suite to evaluate progress as we implement features

Fix conversion db tests

Because the XML was updated to use type = "none" for tables and mathmode, the conversion db tests need to be updated as well.

The existing tests were written against the previous database, before this change. Changing the type to "none" still results in errors across the board.

Action Items 1/12/21

  • Sort top commands/environments found into ones that we definitely will generate/modify speech for, ones we definitely won't need to do anything with, and ones you're not sure of
    • JN
    • DR
  • TR Demo ANTLR
  • TR Demo webapp
  • WH Start thinking about / working on SymPy to SSML process
  • CB Add newenvironment and renewenvironment
  • CB+TR Add support for figure and table references
  • All: Work on cleaning/organizing the code base/repo
  • WH Keep investigating sympy representations and contemplate whether that representation makes things easier or harder to render.
  • CB Produce an easily parseable list of all latex commands
    • Others
    • Convert existing lists into csv

Backlog:

  • Look into how pdflatex parses
  • Code up a demo with TexSoup that recursively traverses a latex doc
  • Brainstorm the role DAC could play
  • Produce a 1-2 page survey document on existing latex2speech tools
  • Produce a 1-2 page survey document of existing speech synthesis (TTS) tools and toolkits
  • Produce a <= 1 page survey document on options for parsing and manipulating latex from python
  • Identify and mark the latex commands that should have an effect on the synthesized speech
  • Enumerate guiding principles (e.g. is it better to guess how to normalize/pronounce and be wrong, or to render it in a more "raw" form?)
  • Build a test suite to evaluate progress as we implement features

Zip | Tar | Directory Traversal

Now that zip and tar files work, we need to traverse their contents up to three levels deep.

In the example below, an unzipped folder can contain other directories, zipped files, and potentially tar files. For now we will only go three levels deep and evaluate any .tex or .bib files.

test.zip
   Folder1
      math.tex
      bib.tex
   zip1.zip
      zip_contents.tex
      second_contents.tex
   table.tex
   secondTable.tex
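The traversal described above could be sketched roughly as follows. This is only an illustration using the standard library: the names collect_sources, _unpack, MAX_DEPTH and WANTED are hypothetical, and counting each directory or nested archive as one level is an assumption about what "three levels" means.

```python
import os
import tarfile
import tempfile
import zipfile

MAX_DEPTH = 3                # "three levels deep", as described above
WANTED = (".tex", ".bib")    # the only file types we evaluate


def _unpack(archive_path, dest):
    """Extract a zip or tar archive into dest."""
    if zipfile.is_zipfile(archive_path):
        with zipfile.ZipFile(archive_path) as zf:
            zf.extractall(dest)  # zipfile sanitizes absolute and ".." paths
        return
    root = os.path.realpath(dest) + os.sep
    with tarfile.open(archive_path) as tf:
        # Keep only plain files and directories that stay inside dest.
        safe = [m for m in tf.getmembers()
                if (m.isfile() or m.isdir())
                and os.path.realpath(os.path.join(dest, m.name)).startswith(root)]
        tf.extractall(dest, members=safe)


def collect_sources(path, depth=0, found=None):
    """Return [(file name, text)] for .tex/.bib files within MAX_DEPTH levels."""
    found = [] if found is None else found
    if depth > MAX_DEPTH:
        return found
    if os.path.isdir(path):
        for entry in sorted(os.listdir(path)):
            collect_sources(os.path.join(path, entry), depth + 1, found)
    elif path.lower().endswith(WANTED):
        with open(path, encoding="utf-8", errors="replace") as fh:
            found.append((os.path.basename(path), fh.read()))
    elif zipfile.is_zipfile(path) or tarfile.is_tarfile(path):
        with tempfile.TemporaryDirectory() as tmp:
            _unpack(path, tmp)
            # The archive's contents sit one level below the archive itself.
            collect_sources(tmp, depth, found)
    return found
```

File contents are read while the temporary directory still exists, so the caller never holds paths to files that have already been cleaned up.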

Multi-file input doesn't work if input statements are in-line

In latex the following is valid
...
This is text \include{other-file} and this is text.
...
but in this case the functions found_bibliography_file and found_input_file break since they only look at the beginning of the line for the command.

A solution I thought of uses regular expressions to quickly and easily find every invocation of an input command. I'm willing to finish working on this, but only after more important verification work gets done first.
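A rough sketch of the regular-expression idea is below. The function name find_input_commands and the choice of commands to match are assumptions; this is not the existing found_input_file or found_bibliography_file code, and it does not skip commands that sit inside a comment.

```python
import re

# \input{...}, \include{...} or \bibliography{...} anywhere in a line.
# Requiring "{" right after the name keeps \includegraphics and
# \bibliographystyle from matching.
INPUT_CMD = re.compile(r"\\(input|include|bibliography)\s*\{([^{}]*)\}")


def find_input_commands(line):
    """Return (command, argument) pairs in the order they appear in line."""
    return [(m.group(1), m.group(2).strip()) for m in INPUT_CMD.finditer(line)]
```

Because finditer scans the whole line, it makes no difference whether the command starts the line or sits in the middle of a sentence.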

Beginning contents being read last

Beginning commands such as \title{}, \author{}, and \date{} are sometimes read last. There are cases where they are read at the beginning, but other cases where they are inserted at the very end. This bug is related to bug #50.

LaTeX file

\usepackage[utf8]{inputenc}
\usepackage{amsmath}

\title{Tables and More Tables}
\author{Taichen Rose}
\date{January 2021}

\newtheorem{thm}{Theorem}
\newtheorem{corr}{Corollary}

\begin{document}
\maketitle
\begin{center}
\begin{tabular}{ c c c }
 cell one & cell two & cell three \\
 cell4 & cell5 & cell6 \\
 cell7 & cell8 & cell9
\end{tabular}
\end{center}

\end{document}

Output:
<speak>Title<p>Table Contents:<break time="40ms" />New Row: , Column 1, Value: cell one , Column 2, Value: cell two , Column 3, Value: cell three \\ New Row: , Column 1, Value: cell4 , Column 2, Value: cell5 , Column 3, Value: cell6 \\ New Row: , Column 1, Value: cell7 , Column 2, Value: cell8 , Column 3, Value: cell9<break time="40ms" /></p><break time="0.3s" />Tables and More Tables<break time="0.3s" />By:<break time="0.3s" />Taichen Rose<break time="0.3s" />Published:<break time="0.3s" />January 2021<break time="0.3s" /></speak>

The title, author, and publish date should be read at the beginning of the output, before the contents.

Mathmode Preprocess

When mathmode is handed to the ANTLR tree, some characters break it immediately. In the future we need a try / catch around this, since there are so many math items that our parser can't handle.

Characters that we need to get rid of before it gets handed to the mathmode pipe:

\ & \\ [ ] .

Cases that break because Sympy can't handle them (need a fix or a try / catch):

  1. Equation can't have multiple equal signs
    Example \sqrt[3]{8}=8^{\frac{1}{3}}=2
  2. There can't be a one-sided equal sign; there must be something on both sides
    Example = 3 + 2
  3. ANTLR doesn't render ln as natural log, need to have \ln
  4. Not a fan of f(x, y) equations
    Example \int_{a}^b\int_{c}^d f(x,y)dxdy
  5. Don't think limits can be within summations
    Example \sum\limits_{j=1}^k A_{\alpha_j}
  6. Doesn't like having no value after a sum with bounds
    Example \sum_{i=1}^{n}
  7. Doesn't like > sign (probably doesn't like < signs) need to have leq
    Example \inf_{x > s}f(x)
  8. Doesn't understand prime
    Example f' or f''
  9. Sometimes with leq it evaluates the equation
    Example 3\leq2 becomes False
    Example 3\geq3 becomes True
  10. Doesn't like ^ in some cases
    Example ^3/_7
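The try / catch idea could look something like the sketch below. The name parse_math and its parser parameter are hypothetical. The default uses SymPy's parse_latex, which needs the antlr4 Python runtime installed; whether the project calls it this way is an assumption.

```python
def parse_math(latex, parser=None):
    """Return (expression, None) on success or (None, error) on failure."""
    if parser is None:
        # SymPy's LaTeX front end, imported lazily because it is optional.
        from sympy.parsing.latex import parse_latex
        parser = parse_latex
    try:
        return parser(latex), None
    except Exception as exc:  # the parser can raise several error types
        return None, exc
```

A caller that gets an error back could then fall back to reading the raw LaTeX, or skip the expression, instead of failing the whole document.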

Database error

Some things in the XML are not being displayed in the output. This could be an XML error or a database error, and may go hand in hand with issue #35.

basicMath.tex in the Documentation, when run, does not display the information correctly.

XML example:
<cmd name = "title" type = "none"> Title <break time = "0.3s"/> <arg num = "1"/> <break time = "0.3s"/> </cmd>

Output:
TitleBasic Math Bish Smiley Face

When the expected output should be something like
Title <break time = "0.3s"/> Basic Math Bish Smiley Face <break time = "0.3s"/>
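To make the expected behavior concrete, here is a small sketch that expands a cmd definition while keeping its break tags. It uses xml.etree.ElementTree; the helper name expand_cmd is hypothetical and this is not the project's conversion code.

```python
import xml.etree.ElementTree as ET


def expand_cmd(cmd_xml, args):
    """Fill <arg num="N"/> placeholders, keeping other tags such as <break>."""
    cmd = ET.fromstring(cmd_xml)
    parts = [cmd.text or ""]
    for child in cmd:
        if child.tag == "arg":
            parts.append(args[int(child.get("num")) - 1])
        else:
            tail, child.tail = child.tail, None  # serialize the tag only
            parts.append(ET.tostring(child, encoding="unicode"))
            child.tail = tail
        parts.append(child.tail or "")
    return " ".join("".join(parts).split())  # normalize whitespace


definition = ('<cmd name = "title" type = "none"> Title <break time = "0.3s"/> '
              '<arg num = "1"/> <break time = "0.3s"/> </cmd>')
print(expand_cmd(definition, ["Basic Math Bish Smiley Face"]))
```

The key point is that non-arg children are serialized back as tags rather than dropped, so both break elements survive around the argument text.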

Parser Recursion Problem :(

Parser is reading items out of order; this is related to #35. Some sort of recursion is acting up.

File that reproduces error: main.tex in Documentation files

Example below (this is a snippet from main.tex):

`
\title{Very Basic Environments}
\author{Connor Barlow}
\date{October 2020}

\newtheorem{thm}{Theorem}
\newtheorem{corr}{Corollary}

\begin{document}
\maketitle

\section{Lists}
\begin{enumerate}
\item This is a numbered list.
\item The numbers should be read aloud.
\end{enumerate}
\begin{itemize}
\item This is a bullet list.
\item The items have no intrinsic ordering.
\end{itemize}
\begin{description}
\item[Bob] What type of list is this?
\item[Bill] Its for listing items that have a corresponding name!
\item[Bob] Neat.
\end{description}
\section{Simple Table}
\par
Here's a very ugly table to show different combinations of lines.
\par
\begin{tabular}{|l||cr}
1 & 2 & 3 \\
4 & 5 & 6 \\
\hline
7 & 8 & 9 \\
\hline \hline
\end{tabular}
`

Output:
`
TitleThis text is not formatted in any way,
ListsSection:Section:Simple Table
Here's a very ugly table to show different combinations of lines.
Table Contents:
but thisis! Now here's some math related things. \
Here's some varieties of math mode. First some line separated math modes f(x)=x g(x)= and now inline math modes, lim _ x f(x)= and g(x)= (x) . For something a bit more sophisticated we have equations and theorems.
the equation h of x equals <prosody pitch="+25%"><break time="0.3ms"/>begin first parentheses</prosody><break time="0.3ms"/> x to the power of 2 <prosody pitch="+25%"><break time="0.3ms"/>end first parentheses</prosody><break time="0.3ms"/> &= - \ &= -
This is my theorem.

This is my corollary.

And now a couple basic formatting environments.

These shouldn't change too much besides documents structure.

And their typical use won't be as contrived as this.

New Row: , Column 1, Value: 1 , Column 2, Value: 2 , Column 3, Value: 3 \\ New Row: , Column 1, Value: 4 , Column 2, Value: 5 , Column 3, Value: 6 \\ New Row: , Column 1, Value: New Row: , Column 1, Value: 7 , Column 2, Value: 8 , Column 3, Value: 9 \\Very Basic EnvironmentsBy:Connor BarlowPublished:October 2020Section: `

As shown above, contents are being read out of order completely.

Comments - Comments Hurt TexSoup

Comments denoted by % break some of TexSoup's parsing. For some reason, the environments and commands after a comment get attached to it. Some examples of what I mean are shown below.

% Document Information
%
\title{Tex2Speech\\Vision and Scope}
\author{Connor Barlow, Walker Herring, Jacob Nemeth,\\Dylon Rajah and Taichen Rose}
\date{October 23, 2020}

In the LaTeX snippet above, when we run this through the parser, this is the output.

<speak>
% Document Information % \title{Tex2Speech\\Vision and Scope} \author{Connor Barlow, Walker Herring, Jacob Nemeth,\\Dylon Rajah and Taichen Rose} \date{October 23, 2020}
</speak>

While if I were to get rid of the comment ... giving the Parser this

\title{Tex2Speech\\Vision and Scope}
\author{Connor Barlow, Walker Herring, Jacob Nemeth,\\Dylon Rajah and Taichen Rose}
\date{October 23, 2020}

It would be output like this

<speak>
Title: <break time="0.3s"/> Tex2Speech \\ Vision and Scope <break time="0.3s"/> By: <break time="0.3s"/> Connor Barlow, Walker Herring, Jacob Nemeth, \\ Dylon Rajah and Taichen Rose <break time="0.3s"/> Published: <break time="0.3s"/> October 23, 2020 <break time="0.3s"/>
</speak>
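One possible workaround is to strip comments before the text reaches TexSoup. The sketch below is an assumption about how that could be done, not existing project code; strip_comments is a hypothetical name, and verbatim-style environments are not handled.

```python
import re

# An unescaped "%" starts a comment. The captured group keeps any run of
# "\\" (line breaks) that sits directly before the "%".
_COMMENT = re.compile(r"(?<!\\)((?:\\\\)*)%.*")


def strip_comments(tex):
    """Remove LaTeX comments; drop lines that held nothing but a comment."""
    kept = []
    for line in tex.splitlines():
        stripped = _COMMENT.sub(r"\1", line)
        if stripped.strip() or not line.strip():
            kept.append(stripped)  # real content, or an original blank line
    return "\n".join(kept)
```

Comment-only lines are dropped rather than left empty, since an empty line would introduce a paragraph break that the original document did not have.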

Tabularx command

Get the tabularx environment rendered (if we have time)

The example below is from our Vision and Scope document

\begin{tabularx}{\linewidth}{|l|l|l|X|}\hline
Ver. & Date & Who & Change\\\hline
1.1  & 10/23/20  & Connor Barlow  & Added client considerations in risks and limitations\\\hline
\end{tabularx}

Startup Action Items 10/1/20

  • Set up a github kanban board?
  • Brainstorm the role DAC could play
  • Produce a 1-2 page survey document on existing latex2speech tools
    • martysweet/latex-to-speech
    • tugboat
    • latex-access?
  • Find other relevant latex accessibility projects and add to survey document
  • Produce a 1-2 page survey document of existing speech synthesis (TTS) tools and toolkits
    • Especially ones that allow phonetic or prosodic modifiers
  • Produce a <= 1 page survey document on options for parsing and manipulating latex from python
  • Produce a list of all latex commands
    • Sublist for environments
    • Sublist for various symbol sets
    • Others
  • Identify and mark the latex commands that should have an effect on the synthesized speech
  • Enumerate guiding principles (e.g. is it better to guess how to normalize/pronounce and be wrong, or to render it in a more "raw" form?)
  • Build a test suite to evaluate progress as we implement features

Expand on DB for Mathmode

Some mathmode items still need to be implemented in our database. As shown below, only the "equation" environment is implemented; $, $$, etc. still need to be completed.

`


the equation


<env name="eqnarray" type="mathmode">
</env>

<env name="$" type="mathmode">
</env>

<env name="$$" type="mathmode">
</env>

`

Action Items 10/15/20

  • TR Investigate and report on the MathML to SSML code you found
  • TR Look for any other MathML to SSML converters
  • TR Look for cheap copies of ISBN-10: 0130226165 ("Spoken Language Processing")
  • JN Figure out and report on what pandoc can do for us
  • JN Look into and report on rules around scraping arxiv
  • CB Produce a list of all latex commands
    • Sublist for environments
    • Sublist for commands
    • Sublist for various symbol sets
    • Others
  • CB Prepare a set of 10+ tex files that span a range of environments and features that we can use
  • WH Find or start preparing a pronunciation dictionary for latex symbols (e.g. math, greek) [see if anything useful in CMU Pronunciation Dictionary] (make sure phonetic rep plays nice with Polly)
  • DR Look into and report on any existing large scale collection of (e.g. arxiv) latex files
  • ?? Mid-week, if no existing collection is found, assemble or acquire an arxiv latex scraper
  • ?? Once we have a large set of latex files, compute statistics about different environments and commands

Backlog:

  • Brainstorm the role DAC could play
  • Produce a 1-2 page survey document on existing latex2speech tools
  • Produce a 1-2 page survey document of existing speech synthesis (TTS) tools and toolkits
  • Produce a <= 1 page survey document on options for parsing and manipulating latex from python
  • Identify and mark the latex commands that should have an effect on the synthesized speech
  • Enumerate guiding principles (e.g. is it better to guess how to normalize/pronounce and be wrong, or to render it in a more "raw" form?)
  • Build a test suite to evaluate progress as we implement features

Speak tag being rendered early

The closing speak tag is being rendered too early for the testingBib.tex and BibFile.bib files. I believe what is happening is that the TexParser adds both the opening and the closing speak tags.

The .bib file is rendered after the TexParser finishes, so its output ends up right after the closing tag.

<speak> Title  <break time="0.3s"/>   An Example Document  <break time="0.3s"/>   By:  <break time="0.3s"/>   John Smith  <break time="0.3s"/>   Published:  <break time="0.3s"/>    <break time="0.3s"/>   Section:  <break time="0.3s"/>   The first section  <break time="0.3s"/>
This is an example of a document formatted using  LaTeX  .
This is an example of a citation  <emphasis level="reduced"> Cited in reference as: gG07  <break time="0.3s"/>    </emphasis> .
Now here is an example of an equation:


i (r,t) = - ^2 (r,t)+V(r) (r,t)
 </speak> <emphasis level='strong'> References Section </emphasis> <break time='1s'/>  Bibliography item is read as: <break time='0.5s'/>gG07. Type: book<break time='0.5s'/>  Authors: Gratzer, George A., <break time='0.3s'/> title: More Math Into LaTeX<break time='0.3s'/>publisher: Birkhauser<break time='0.3s'/>address: Boston<break time='0.3s'/>year: 2007<break time='0.3s'/>edition: 4th<break time='0.3s'/>
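One way to avoid this would be to wrap the output only once, after the document body and the bibliography have both been rendered. A minimal sketch, with wrap_in_speak as a hypothetical helper name:

```python
import re

_SPEAK_TAG = re.compile(r"</?speak>")


def wrap_in_speak(*fragments):
    """Join SSML fragments and wrap the result in a single <speak> element."""
    # Remove any speak tags an earlier stage already added.
    cleaned = (_SPEAK_TAG.sub("", fragment).strip() for fragment in fragments)
    return "<speak>" + " ".join(f for f in cleaned if f) + "</speak>"
```

Called as wrap_in_speak(parsed_document, parsed_bibliography), this keeps the references section inside the speak element.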

S3 Bucket Delay

From my calculations it takes 15 seconds for Amazon Polly to deliver the .mp3 file to the S3 bucket. Once our algorithm finishes, it directs the user to the download page. However, the user has to wait those 15 seconds before the file can actually be downloaded, because the file does not exist in the S3 bucket until then.
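A simple way to handle the delay is to poll until the file exists before showing the download link. The sketch below is generic; wait_until and its parameters are hypothetical names, and the check callable is where the actual S3 lookup would go.

```python
import time


def wait_until(check, timeout=60.0, interval=1.0,
               sleep=time.sleep, clock=time.monotonic):
    """Call check() until it returns True or timeout seconds have passed."""
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

With boto3, check could wrap a head_object call for the bucket and key and return False on a ClientError; boto3 also provides an object_exists waiter for S3. Neither is confirmed to be what this project uses.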

Progress Bar

When a user uploads a document there needs to be a progress bar to show progress or feedback.

_getArg function returns NoneType

In _getArg there is an edge case with the sample.tex file (zipped in sample.zip here) that returns None. Later on, we need this argument to not be None.

Example file that breaks: sample.tex.zip

Error Log with corresponding file:
[2021-03-07 20:33:10,367] ERROR in app: Exception on /upload [POST]
Traceback (most recent call last):
  File "/Users/taichen/opt/anaconda3/lib/python3.8/site-packages/flask/app.py", line 2447, in wsgi_app
    response = self.full_dispatch_request()
  File "/Users/taichen/opt/anaconda3/lib/python3.8/site-packages/flask/app.py", line 1952, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/Users/taichen/opt/anaconda3/lib/python3.8/site-packages/flask/app.py", line 1821, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/Users/taichen/opt/anaconda3/lib/python3.8/site-packages/flask/_compat.py", line 39, in reraise
    raise value
  File "/Users/taichen/opt/anaconda3/lib/python3.8/site-packages/flask/app.py", line 1950, in full_dispatch_request
    rv = self.dispatch_request()
  File "/Users/taichen/opt/anaconda3/lib/python3.8/site-packages/flask/app.py", line 1936, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/Users/taichen/Desktop/Tai/Tex2Speech/latex2speech/tex2speech/app.py", line 63, in handle_upload
    audio_links = start_polly(file_holder, input_holder, bib_holder)
  File "/Users/taichen/Desktop/Tai/Tex2Speech/latex2speech/tex2speech/aws_polly_render.py", line 265, in start_polly
    parsed_contents = start_conversion(texFile.read())
  File "/Users/taichen/Desktop/Tai/Tex2Speech/latex2speech/tex2speech/aws_polly_render.py", line 245, in start_conversion
    parsed_contents = parser.parse(contents)
  File "/Users/taichen/Desktop/Tai/Tex2Speech/latex2speech/tex2speech/conversion_parser.py", line 284, in parse
    self._parseNodes(doc.contents, tree)
  File "/Users/taichen/Desktop/Tai/Tex2Speech/latex2speech/tex2speech/conversion_parser.py", line 240, in _parseNodes
    parseOut = self._parseEnvironment(texNode, ssmlParent, leftChild)
  File "/Users/taichen/Desktop/Tai/Tex2Speech/latex2speech/tex2speech/conversion_parser.py", line 167, in _parseEnvironment
    self._parseNodes(contents, ssmlParent, leftChild=leftChild)
  File "/Users/taichen/Desktop/Tai/Tex2Speech/latex2speech/tex2speech/conversion_parser.py", line 240, in _parseNodes
    parseOut = self._parseEnvironment(texNode, ssmlParent, leftChild)
  File "/Users/taichen/Desktop/Tai/Tex2Speech/latex2speech/tex2speech/conversion_parser.py", line 167, in _parseEnvironment
    self._parseNodes(contents, ssmlParent, leftChild=leftChild)
  File "/Users/taichen/Desktop/Tai/Tex2Speech/latex2speech/tex2speech/conversion_parser.py", line 242, in _parseNodes
    parseOut = self._parseCommand(texNode, ssmlParent, leftChild)
  File "/Users/taichen/Desktop/Tai/Tex2Speech/latex2speech/tex2speech/conversion_parser.py", line 221, in _parseCommand
    self._resolveCmdElements(cmdNode, ssmlParent, elemList, leftChild)
  File "/Users/taichen/Desktop/Tai/Tex2Speech/latex2speech/tex2speech/conversion_parser.py", line 201, in _resolveCmdElements
    self._resolveCmdElements(cmdNode, elemList[i], elemList[i].children, None)
  File "/Users/taichen/Desktop/Tai/Tex2Speech/latex2speech/tex2speech/conversion_parser.py", line 187, in _resolveCmdElements
    contents = self._getArg(cmdNode, elem).contents
AttributeError: 'NoneType' object has no attribute 'contents'

\LaTeX\ command

\LaTeX\ command breaks the application again.

\LaTeX is rendered as the LaTeX command. However, because of the second \ at the end, everything after it is treated as a possible command, so the text that follows is disregarded.
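One possible fix is a preprocessing step that turns the control space (a backslash followed by a space) into a plain space before parsing. This is only a sketch; replace_control_spaces is a hypothetical name and the approach is an assumption, not how the parser currently works.

```python
import re

# "\ " (backslash + space) is LaTeX's control space. The lookbehind skips
# "\\ ", where the backslash belongs to a line break instead.
_CONTROL_SPACE = re.compile(r"(?<!\\)\\ ")


def replace_control_spaces(tex):
    """Replace each control space with an ordinary space."""
    return _CONTROL_SPACE.sub(" ", tex)
```

After this step \LaTeX\ format reads as \LaTeX format, so the parser no longer sees a stray backslash starting a new command.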

Sympy -> SSML

When the Sympy -> SSML output goes through the full parser, its opening and closing tags are not rendered as tags.

LaTeX Example:
\begin{equation} h(x)=x^2 \end{equation}

Putting it through the LaTeX -> SSML mathmode conversion produces
SSML h of x equals <prosody pitch="+25%"><break time="0.3ms"/>begin first parentheses</prosody><break time="0.3ms"/> x to the power of 2 <prosody pitch="+25%"><break time="0.3ms"/>end first parentheses</prosody><break time="0.3ms"/>

However, after the complete TexParser run, the output is
the equation h of x equals &lt;prosody pitch="+25%"&gt;&lt;break time="0.3ms"/&gt;begin first parentheses&lt;/prosody&gt;&lt;break time="0.3ms"/&gt; x to the power of 2 &lt;prosody pitch="+25%"&gt;&lt;break time="0.3ms"/&gt;end first parentheses&lt;/prosody&gt;&lt;break time="0.3ms"/&gt

It uses &lt; / &gt; instead of < >.
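This kind of escaping is what happens when an SSML string is assigned as plain text on an XML element. Assuming the output tree is built with xml.etree.ElementTree (an assumption; the helper name append_fragment is also hypothetical), the fragment can be parsed and attached as real child elements instead:

```python
import xml.etree.ElementTree as ET


def append_fragment(parent, fragment):
    """Attach an SSML fragment to parent as real child elements."""
    wrapper = ET.fromstring("<wrapper>" + fragment + "</wrapper>")
    lead = wrapper.text or ""
    if len(parent):  # leading text goes after the last existing child
        parent[-1].tail = (parent[-1].tail or "") + lead
    else:
        parent.text = (parent.text or "") + lead
    parent.extend(list(wrapper))  # children bring their tail text along


p = ET.Element("p")
p.text = "the equation "
append_fragment(p, 'h of x equals <break time="0.3ms"/> x to the power of 2')
print(ET.tostring(p, encoding="unicode"))
```

This requires the mathmode output to be well-formed XML on its own; a malformed fragment raises a ParseError instead of being silently escaped.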

Mathmode - Too Many Parentheses

Below I show an example. The output is correct, but there are a LOT of parentheses. I'm not sure if this is a priority; I just wanted to raise it.

Input is as follows

$\left.{\partial T \over \partial P}\right|_{s} = 
\left.{\partial v \over \partial s}\right|_{P}$.

Here is the output...

<prosody pitch="+25%"><break time="0.3ms"/>
  -begin first parentheses</prosody><break time="0.3ms"/> 
    left times 
    <prosody pitch="+25%"><break time="0.3ms"/>
    -begin second parentheses</prosody><break time="0.3ms"/> 
      <prosody pitch="+25%"><break time="0.3ms"/>
        -begin third parentheses</prosody><break time="0.3ms"/> 
          partial times 
          <prosody pitch="+25%"><break time="0.3ms"/>
            -begin fourth parentheses</prosody><break time="0.3ms"/> 
              T times 
              <prosody pitch="+25%"><break time="0.3ms"/>
                -begin fifth parentheses</prosody><break time="0.3ms"/> 
                  over times 
                  <prosody pitch="+25%"><break time="0.3ms"/>
                    -begin sixth parentheses</prosody><break time="0.3ms"/> 
                      partial times P 
                      <prosody pitch="+25%"><break time="0.3ms"/>
                    +end sixth parentheses</prosody><break time="0.3ms"/> 
                    <prosody pitch="+25%"><break time="0.3ms"/>
                 +end fifth parentheses</prosody><break time="0.3ms"/> 
                 <prosody pitch="+25%"><break time="0.3ms"/>
               +end fourth parentheses</prosody><break time="0.3ms"/> 
               <prosody pitch="+25%"><break time="0.3ms"/>
            +end third parentheses</prosody><break time="0.3ms"/> 
            times right 
          <prosody pitch="+25%"><break time="0.3ms"/>
         +end second parentheses</prosody><break time="0.3ms"/> 
           <prosody pitch="+25%"><break time="0.3ms"/>
  +end first parentheses</prosody><break time="0.3ms"/> 
   equals 
   <prosody pitch="+25%"><break time="0.3ms"/>
  -begin first parentheses</prosody><break time="0.3ms"/> 
    left times 
    <prosody pitch="+25%"><break time="0.3ms"/>
      -begin second parentheses</prosody><break time="0.3ms"/> 
      <prosody pitch="+25%"><break time="0.3ms"/>
        -begin third parentheses</prosody><break time="0.3ms"/> 
          partial times 
          <prosody pitch="+25%"><break time="0.3ms"/>
            -begin fourth parentheses</prosody><break time="0.3ms"/> 
              v times 
              <prosody pitch="+25%"><break time="0.3ms"/>
                -begin fifth parentheses</prosody><break time="0.3ms"/> 
                  over times 
                  <prosody pitch="+25%"><break time="0.3ms"/>
                  -begin sixth parentheses</prosody><break time="0.3ms"/>
                    partial times s 
                    <prosody pitch="+25%"><break time="0.3ms"/>
                  +end sixth parentheses</prosody><break time="0.3ms"/> 
                  <prosody pitch="+25%"><break time="0.3ms"/>
                +end fifth parentheses</prosody><break time="0.3ms"/> 
                <prosody pitch="+25%"><break time="0.3ms"/>
              +end fourth parentheses</prosody><break time="0.3ms"/> 
              <prosody pitch="+25%"><break time="0.3ms"/>
            +end third parentheses</prosody><break time="0.3ms"/> 
            times right <prosody pitch="+25%"><break time="0.3ms"/>
          +end second parentheses</prosody><break time="0.3ms"/> 
          <prosody pitch="+25%"><break time="0.3ms"/>
  +end first parentheses</prosody><break time="0.3ms"/>  .

Action Items 10/22/20

  • Assess pros and cons of something like latex -> MathML -> SSML (as opposed to writing our own latex -> SSML)
    • CB
    • DR
    • JN
  • DR+JN Write a script to take a list of latex files and produce counts of environments, symbols, commands, etc.
  • TR Check if the IPA WH found will work with Polly
  • TR Start (or make a plan to start) bulk download of arxiv source docs
  • JN Ask Aran about where to store up to ~10GB of files that's shared between group members
  • CB Produce a list of all latex commands
    • Sublist for various symbol sets
    • Others
  • WH (w/ help from TR) Start testing Polly with the symbol sets we've found so far

Backlog:

  • Brainstorm the role DAC could play
  • Produce a 1-2 page survey document on existing latex2speech tools
  • Produce a 1-2 page survey document of existing speech synthesis (TTS) tools and toolkits
  • Produce a <= 1 page survey document on options for parsing and manipulating latex from python
  • Identify and mark the latex commands that should have an effect on the synthesized speech
  • Enumerate guiding principles (e.g. is it better to guess how to normalize/pronounce and be wrong, or to render it in a more "raw" form?)
  • Build a test suite to evaluate progress as we implement features

Action Items 10/8/20

  • CB Familiarize with and give explanation of SSML
  • CB Produce a list of all latex commands
    • Sublist for environments
    • Sublist for commands
    • Sublist for various symbol sets
    • Others
  • WH Familiarize with and give explanation of MathML
  • WH Familiarize with and give explanation or demo of CAR
  • TR Prepare a demo of AWS Polly (esp. phonetic and prosodic features)
  • TR Prepare a demo of Google TTS (esp. phonetic and prosodic features)
  • JN Familiarize with and give demo of one or more latex-parsing python library(ies)
  • JN Familiarize with and give explanation or demo of latex to MathML converter
  • DR Familiarize with and give explanation or demo of martysweet/latex-to-speech
  • DR Read and give summary of what the tugboat article says

Backlog:

  • Brainstorm the role DAC could play

  • Produce a 1-2 page survey document on existing latex2speech tools

    • martysweet/latex-to-speech
    • tugboat
    • CAR
    • latex-access?
  • Find other relevant latex accessibility projects and add to survey document

  • Produce a 1-2 page survey document of existing speech synthesis (TTS) tools and toolkits

    • Especially ones that allow phonetic or prosodic modifiers
    • AWS Polly (free for first year)
    • Google cloud
  • Produce a <= 1 page survey document on options for parsing and manipulating latex from python

  • Identify and mark the latex commands that should have an effect on the synthesized speech

  • Enumerate guiding principles (e.g. is it better to guess how to normalize/pronounce and be wrong, or to render it in a more "raw" form?)

  • Build a test suite to evaluate progress as we implement features

Action Items 12/10/20

  • JN+DR Finish command counting script
    • Make script robust to corrupted files (e.g. just skip over ones with invalid utf-8 symbols)
    • Run bare keyword and version with arguments on all of the files TR has downloaded (shareout output next week)
  • TR Compile the various latex grammars for ANTLR
  • WH Start thinking about / working on SymPy to SSML process
  • CB+WH Make any progress you can on the parser efforts
  • All: Work on cleaning/organizing the code base/repo
  • JN+DR: Start brainstorming how you might render the most common environments your script finds (e.g. do we say something, change volume/pitch/etc)
  • CB Produce an easily parseable list of all latex commands
    • Others
    • Convert existing lists into csv
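For the robustness item above, a sketch of a counting loop that skips unreadable files might look like this. The function name count_keywords_in_files is hypothetical and this is not the team's actual script.

```python
import re
from collections import Counter

KEYWORD = re.compile(r"\\[a-zA-Z]+")  # bare keywords such as \section


def count_keywords_in_files(paths):
    """Count bare keywords across files, skipping ones that cannot be read."""
    totals = Counter()
    skipped = []
    for path in paths:
        try:
            with open(path, encoding="utf-8") as fh:
                text = fh.read()
        except (UnicodeDecodeError, OSError):
            skipped.append(path)  # corrupted or unreadable: note it, move on
            continue
        totals.update(KEYWORD.findall(text))
    return totals, skipped
```

Returning the skipped paths alongside the counts makes it easy to report how many of the downloaded files were left out.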

Backlog:

  • Look into how pdflatex parses
  • Code up a demo with TexSoup that recursively traverses a latex doc
  • Brainstorm the role DAC could play
  • Produce a 1-2 page survey document on existing latex2speech tools
  • Produce a 1-2 page survey document of existing speech synthesis (TTS) tools and toolkits
  • Produce a <= 1 page survey document on options for parsing and manipulating latex from python
  • Identify and mark the latex commands that should have an effect on the synthesized speech
  • Enumerate guiding principles (e.g. is it better to guess how to normalize/pronounce and be wrong, or to render it in a more "raw" form?)
  • Build a test suite to evaluate progress as we implement features

Update README.md

Add technical documentation + team information to the README!

External .bib files get messed up with TexSoup

This software currently merges external .bib files into the document before it gets handed to the main TexParser. Because this happens first, when the contents of the file (with the embedded bibliography) are put into TexSoup, TexSoup removes a bunch of the bib's values.

Reproduce Error: Upload a file with a corresponding bib file

Solution: Move bib file to be done after TexParser

Mathmode - Does not Render

The following mathmode does not render. This is an enhancement, not a priority.

\begin{eqnarray}
du &=& T\ ds -P\ dv, \qquad \mbox{first law.}\label{fl}\\
ds &\ge& {\delta q \over T}.\qquad  \qquad \mbox{second law.} \label{sl}\\
dd &\le& {\delta q \over T}.\qquad  \qquad \mbox{third law.} \label{tl}
\end{eqnarray}

Footnote command

If we have time \footnote could be added to the pronunciation file! Not a priority

References\footnote{Lamport, L., 1986, {\em \LaTeX: User's Guide \& Reference Manual},

Emphasis Tag Problem

Given this input, we have the \em tag:

You can use italics ({\em e.g.} {\em Thermodynamics is everywhere}) or {\bf bold}.

This should render something similar to

You can use italics <emphasis level = "strong">e.g.</emphasis> ...

However the output is this

You can use italics ( <emphasis level="strong">  </emphasis>  e.g. <emphasis level="strong">  </emphasis>   Thermodynamics is everywhere ) or   bold .

The emphasis tag is empty, and the content that should be inside it is on the outside.
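To illustrate the expected result, here is a small regex-based sketch that wraps the contents of an {\em ...} group in the emphasis tag. It is a stand-in to show the target output, not the parser's approach; em_to_ssml is a hypothetical name, and nested braces and XML escaping are not handled.

```python
import re

# {\em text} where the group contains no nested braces.
_EM_GROUP = re.compile(r"\{\\em\s+([^{}]*)\}")


def em_to_ssml(text):
    """Wrap the contents of each {\\em ...} group in an emphasis tag."""
    return _EM_GROUP.sub(r'<emphasis level="strong">\1</emphasis>', text)
```

The important part is that the captured text lands between the opening and closing tags, rather than after an empty element.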

Action Items 10/29/20

  • DR+JN Write a script that just collects \keyword (look for regex \\[a-zA-Z]+)
  • TR Download ~100k tex files representative of all of the arxiv subjects
  • WH Investigate and share out about how the latex->mathml converter (latex2mathml) parses latex math mode
  • CB Produce an easily parseable list of all latex commands
    • Sublist for various symbol sets
    • Others
  • Take five complicated equations (lots of symbols, etc.) create SSML for how you think each should be rendered
    • DR
    • JN
    • CB
    • TR
    • WH
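The keyword collection item above could start from something as small as the sketch below. The name count_keywords is hypothetical. One known limitation: a line break followed directly by letters (for example \\Vision) is counted as the keyword \Vision.

```python
import re
from collections import Counter

# A backslash followed by one or more ASCII letters.
KEYWORD = re.compile(r"\\[a-zA-Z]+")


def count_keywords(tex):
    """Return a Counter mapping each \\keyword to how often it occurs."""
    return Counter(KEYWORD.findall(tex))
```

Calling count_keywords(text).most_common(20) would then give the twenty most frequent commands in a document.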

Backlog:

  • Brainstorm the role DAC could play
  • Produce a 1-2 page survey document on existing latex2speech tools
  • Produce a 1-2 page survey document of existing speech synthesis (TTS) tools and toolkits
  • Produce a <= 1 page survey document on options for parsing and manipulating latex from python
  • Identify and mark the latex commands that should have an effect on the synthesized speech
  • Enumerate guiding principles (e.g. is it better to guess how to normalize/pronounce and be wrong, or to render it in a more "raw" form?)
  • Build a test suite to evaluate progress as we implement features

Amazon Polly Bug

The tts function in aws_polly_render loops indefinitely on any file.

Example:
`
CONTENTS AFTER CHANGE

Title Basic Math Bish Smiley Face By: Taichen Rose Published: March 2021
Start of comment This is a comment I guess?

Hey guys LaTeX
I'm doing great LaTeX LaTeX LaTeX the equation

TEST
An error occurred (InvalidSsmlException) when calling the StartSpeechSynthesisTask operation: Invalid SSML request
`

This will loop multiple times until it exits with a runtime error. It says Invalid SSML request, but it still shouldn't be looping. Also, when you open the final1.tex master file, it for some reason contains latex commands instead of ssml.
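Whatever the root cause turns out to be, the call could be guarded so it can never loop without bound. A generic sketch follows; call_with_retries and is_fatal are hypothetical names, and this is not the current tts code.

```python
def call_with_retries(task, max_attempts=3, is_fatal=lambda exc: False):
    """Run task(); retry on errors, but never more than max_attempts times."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception as exc:
            # Stop at once for errors a retry cannot fix, or when out of tries.
            if is_fatal(exc) or attempt == max_attempts:
                raise
```

An invalid SSML request will fail the same way every time, so is_fatal would be the place to recognize that error (with boto3, possibly by checking against the Polly client's InvalidSsmlException class) and stop after the first attempt.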

Action Items 11/19/20

  • CB+WH Explore details of texsoup's functionality
    • CB+WH Look back at tex2py to see if it's useful (verdict: no)
  • JN+DR Modify command counting script
    • Take an input file that is just a list of file paths to tex files, and process all of those
    • Keep ironing out the bugs that can arise with braces
    • Run bare keyword and version with arguments on all of the files TR has downloaded (shareout output next week)
  • TR Look into sympy and ANTLR
  • CB Produce an easily parseable list of all latex commands
    • Others
    • Convert existing lists into csv

Backlog:

  • Look into how pdflatex parses
  • Code up a demo with TexSoup that recursively traverses a latex doc
  • Brainstorm the role DAC could play
  • Produce a 1-2 page survey document on existing latex2speech tools
  • Produce a 1-2 page survey document of existing speech synthesis (TTS) tools and toolkits
  • Produce a <= 1 page survey document on options for parsing and manipulating latex from python
  • Identify and mark the latex commands that should have an effect on the synthesized speech
  • Enumerate guiding principles (e.g. is it better to guess how to normalize/pronounce and be wrong, or to render it in a more "raw" form?)
  • Build a test suite to evaluate progress as we implement features

Sentences are being cut off - Due to Input Files

Note: This was fixed, but I'm going to keep this issue up so I can do it more elegantly

In the Vision and Scope document, sentences are being cut off and smashed together.

Example of LaTeX document
Begin Comment Our Product: differentiation statement For on the go researchers and academics who wish to increase their available paper reading time, \TeX 2Speech will provide on-demand speech synthesis for papers and a wide variety of other scientific and technical documentation written in the \LaTeX\ format. Unlike standard TTS systems, \TeX 2Speech will be able to effectively parse a \LaTeX\ document containing mathematical equations, and convert it to comprehensible spoken word.

Output

Begin Comment  Our Product: differentiation statement
For on the go researchers and academics who wish to increase their available paper reading time,   2Speech will provide on-demand speech synthesis for papers and a wide variety of other scientific and technical documentation written in the  LaTeX   2Speech will be able to effectively parse a  LaTeX

Upload to Cloud

Upload application using either Elastic Beanstalk or EC2

-> Document future applications that you can host (Can Host flask applications on Heroku)

Itemize error

The itemize and enumerate environments and the \item command have been causing errors. I will post the problems I run into here.

To fix these problems, we need to add these items to our pronunciation.xml file.

Download Page - Web Design

[Home Page Design/Development] - Finished
[Progress Bar Design/Development] - Finished

Need to create design for download page, then implement

Action Items 11/5/20

  • TR Download ~100k tex files representative of all of the arxiv subjects
  • WH Prepare slides on how latex->mathml parses latex math mode (latex2mathml)
  • CB Produce an easily parseable list of all latex commands
    • Others
    • Convert existing lists into csv
  • Send each other the audio only of your five complicated equations. Pick one random equation from each other student, and try to write the equation based on the audio. Note ambiguities.
    • DR
    • JN
    • CB
    • TR
    • WH
  • Think critically about what changes you would want to make it best for how you would want to hear the equation.
    • DR
    • JN
    • CB
    • TR
    • WH
  • CB Start on a baby parser
  • JN+DR Expand the keyword counting script to support more than just bare keyword

Backlog:

  • Look into how pdflatex parses
  • Code up a demo with TexSoup that recursively traverses a latex doc
  • Brainstorm the role DAC could play
  • Produce a 1-2 page survey document on existing latex2speech tools
  • Produce a 1-2 page survey document of existing speech synthesis (TTS) tools and toolkits
  • Produce a <= 1 page survey document on options for parsing and manipulating latex from python
  • Identify and mark the latex commands that should have an effect on the synthesized speech
  • Enumerate guiding principles (e.g. is it better to guess how to normalize/pronounce and be wrong, or to render it in a more "raw" form?)
  • Build a test suite to evaluate progress as we implement features

PDFLaTeX Not working for example

PDFLaTeX is not working for the sample.tex file. Instead of replacing each Eq.~(\ref{...}) with Equation 1, Equation 2, and so on, it leaves the reference empty. Need to recheck the code in aws_polly_render.

Input:
Eq.~(\ref{fl}) is the first law. Eq.~(\ref{sl}) is the second law. Eq.~(\ref{tl}) is the third law.

Output:
Eq.~( ) is the first law. Eq.~( ) is the second law. Eq.~( ) is the third law.
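
A rough sketch of resolving references without relying on PDFLaTeX, assuming every \label belongs to an equation and that equations are numbered in order of appearance (real LaTeX numbering depends on counters and the document class). `number_labels` and `resolve_refs` are illustrative names only.

```python
import re

LABEL_RE = re.compile(r"\\label\{([^}]*)\}")
REF_RE = re.compile(r"\\ref\{([^}]*)\}")


def number_labels(tex_source):
    """Map each label key to a 1-based number in order of first appearance."""
    numbers = {}
    for key in LABEL_RE.findall(tex_source):
        numbers.setdefault(key, len(numbers) + 1)
    return numbers


def resolve_refs(text, numbers):
    """Replace each reference with its number; leave unknown keys untouched."""
    def replace(match):
        key = match.group(1)
        if key in numbers:
            return str(numbers[key])
        # Keep the original command rather than emitting an empty reference.
        return match.group(0)

    return REF_RE.sub(replace, text)
```

With labels fl, sl, and tl defined in that order, `Eq.~(\ref{fl})` becomes `Eq.~(1)`.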

Mathmode Enhancements

[Copied from previous bug list] -> These bugs currently are not priority, some are more expansions than bugs. ANTLR currently doesn't accept this.

Cases that break because Sympy can't handle them (need a fix or a try/catch):

  1. Equation can't have multiple equal signs
    Example \sqrt[3]{8}=8^{\frac{1}{3}}=2
  2. There can't be a one sided equal sign there must be stuff on both sides
    Example = 3 + 2
  3. ANTLR doesn't render ln as natural log, need to have \ln
  4. Not a fan of f(x, y) equations
    Example \int_{a}^b\int_{c}^d f(x,y)dxdy
  5. Don't think limits can be within summations
    Example \sum\limits_{j=1}^k A_{\alpha_j}
  6. Doesn't like having no value after a sum with bounds
    Example \sum_{i=1}^{n}
  7. Doesn't like > sign (probably doesn't like < signs) need to have leq
    Example \inf_{x > s}f(x)
  8. Doesn't understand prime
    Example f' or f''
  9. Sometimes with leq it evaluates the equation
    Example 3\leq2 becomes False
    Example 3\geq3 becomes True
  10. Doesn't like ^ in some cases
    Example ^3/_7
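
The try/catch option could look roughly like the sketch below. It takes the parser as an argument so it does not depend on how mathmode is wired up; in practice the parser would presumably be `sympy.parsing.latex.parse_latex`. `parse_or_fallback` is a hypothetical name, and the broad `except` is used because the exact exception types raised for the cases above have not been checked.

```python
def parse_or_fallback(latex, parser):
    """Try to parse a math expression; on failure return the raw LaTeX instead.

    Returns a (status, value) pair where status is "parsed" or "raw".
    """
    try:
        return ("parsed", parser(latex))
    except Exception:
        # Any parser failure falls back to the unparsed source string,
        # so one bad equation does not abort the whole document.
        return ("raw", latex)
```

The caller can then decide how to speak a "raw" result, for example by reading the LaTeX more literally.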

Repeating lines

Using sample.tex, which is in the Documentation/sample_issues/ directory of our project.

The input at the top starting from line 40 is

`Some basic instructions are given next.
Put your text in here. You can be a little sloppy about
spacing. It adjusts the text to look good.
{\small You can make the text smaller.}
{\tiny You can make the text tiny.}

Skip a line for a new paragraph.
You can use italics ({\em e.g.} {\em Thermodynamics is everywhere}) or {\bf bold}.
Greek letters are a snap: $\Psi$, $\psi$,
$\Phi$, $\phi$. Equations within text are easy---`

However, it renders like this as the output for this section

Some basic instructions are given next.
Put your text in here.  You can be a little sloppy    about
spacing.  It adjusts the text to look good.
  You can make the text smaller.  You can make the text tiny.
Skip a line for a new paragraph.
You can use italics ( <break time="40ms"/>
Some basic instructions are given next.
Put your text in here.  You can be a little sloppy    about
spacing.  It adjusts the text to look good.
  You can make the text smaller.  You can make the text tiny.
Skip a line for a new paragraph.
You can use italics ( <break time="40ms"/>
Some basic instructions are given next.
Put your text in here.  You can be a little sloppy    about
spacing.  It adjusts the text to look good.
  You can make the text smaller.  You can make the text tiny.
Skip a line for a new paragraph.
You can use italics ( <emphasis level="strong">  </emphasis>  e.g. <emphasis level="strong">  </emphasis>   Thermodynamics is everywhere ) or   bold .
Greek letters are a snap:  Psi  ,  psi  ,
 Phi  ,  phi  .  Equations within text are easy---

It repeats multiple times...

Commands - Extra back slash parsing bug

In LaTeX files, there are normal commands such as

\LaTeX

which is properly rendered as the command "LaTeX". However, when the command is followed by a backslash and a space, as in

\LaTeX\ words after

the parser treats both "LaTeX" and "words" as commands. As a result, "words" is never appended to the SSML file.
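
One possible workaround is to normalize the control space (a backslash followed by a space) before parsing. This is only a sketch, assuming the parser already handles `\LaTeX words after` correctly; `normalize_control_spaces` is a made-up name.

```python
import re

# A backslash followed by a single space, but not when that backslash is the
# second half of a "\\" line break.
CONTROL_SPACE_RE = re.compile(r"(?<!\\)\\ ")


def normalize_control_spaces(tex_source):
    """Replace each LaTeX control space with an ordinary space."""
    return CONTROL_SPACE_RE.sub(" ", tex_source)
```

A control space that directly follows a `\\` line break is left alone by this pattern, which seems like an acceptable edge case for now.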

Action Items 12/3/20

  • JN+DR Finish command counting script
    • Take an input file that is just a list of file paths to tex files, and process all of those
    • Make script robust to corrupted files (e.g. just skip over ones with invalid utf-8 symbols)
    • Run bare keyword and version with arguments on all of the files TR has downloaded (shareout output next week)
  • TR+WH See if you can pry apart the sympy objects
  • CB Move code from personal repo to this latex2speech repo
  • CB+WH Make any progress you can on the parser efforts
  • TR Do a search for documentation on latex math mode grammar (something we could feed to ANTLR)
  • CB Produce an easily parseable list of all latex commands
    • Others
    • Convert existing lists into csv
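
For the command counting script items above, a sketch of reading a list of file paths and skipping corrupted files. It assumes the list file has one path per line and that the tex files are expected to be UTF-8; `read_tex_files` is an illustrative name.

```python
from pathlib import Path


def read_tex_files(list_path):
    """Yield (path, text) for each readable file named in list_path."""
    listing = Path(list_path).read_text(encoding="utf-8")
    for line in listing.splitlines():
        path = line.strip()
        if not path:
            continue
        try:
            yield path, Path(path).read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            # Skip files that are missing, unreadable, or not valid UTF-8.
            continue
```

Each yielded text can then be fed to whatever counting function the script uses.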

Backlog:

  • Look into how pdflatex parses
  • Code up a demo with TexSoup that recursively traverses a latex doc
  • Brainstorm the role DAC could play
  • Produce a 1-2 page survey document on existing latex2speech tools
  • Produce a 1-2 page survey document of existing speech synthesis (TTS) tools and toolkits
  • Produce a <= 1 page survey document on options for parsing and manipulating latex from python
  • Identify and mark the latex commands that should have an effect on the synthesized speech
  • Enumerate guiding principles (e.g. is it better to guess how to normalize/pronounce and be wrong, or to render it in a more "raw" form?)
  • Build a test suite to evaluate progress as we implement features

Action Items 1/19/21

  • TR+CB Connect in CB's code
  • TR Work on parsing tables and citations/bib
  • WH Integrate with xml format CB has
  • WH Keep advancing the sympy-to-ssml code
  • CB meet with DR+JN
  • DR+JN add non-math command pronunciations to pronunciations.xml
  • Sort top commands/environments found into ones for which we definitely will generate/modify speech, ones we definitely won't need to do anything with, and ones you're not sure of
  • All: Work on cleaning/organizing the code base/repo
  • CB Produce an easily parseable list of all latex commands
    • Others
    • Convert existing lists into csv

Backlog:

  • Look into how pdflatex parses
  • Code up a demo with TexSoup that recursively traverses a latex doc
  • Brainstorm the role DAC could play
  • Produce a 1-2 page survey document on existing latex2speech tools
  • Produce a 1-2 page survey document of existing speech synthesis (TTS) tools and toolkits
  • Produce a <= 1 page survey document on options for parsing and manipulating latex from python
  • Identify and mark the latex commands that should have an effect on the synthesized speech
  • Enumerate guiding principles (e.g. is it better to guess how to normalize/pronounce and be wrong, or to render it in a more "raw" form?)
  • Build a test suite to evaluate progress as we implement features

mathmode bug

After the Sympy -> SSML conversion, the opening and closing tags are not rendered as tags. [This was fixed previously, but the error has returned.]

LaTeX Example:
\begin{equation} h(x)=x^2 \end{equation}

Passing it through the LaTeX -> SSML mathmode conversion produces
SSML h of x equals <prosody pitch="+25%"><break time="0.3ms"/>begin first parentheses</prosody><break time="0.3ms"/> x to the power of 2 <prosody pitch="+25%"><break time="0.3ms"/>end first parentheses</prosody><break time="0.3ms"/>

However, after the complete TexParser runs, the output is
the equation h of x equals &lt;prosody pitch="+25%"&gt;&lt;break time="0.3ms"/&gt;begin first parentheses&lt;/prosody&gt;&lt;break time="0.3ms"/&gt; x to the power of 2 &lt;prosody pitch="+25%"&gt;&lt;break time="0.3ms"/&gt;end first parentheses&lt;/prosody&gt;&lt;break time="0.3ms"/&gt

The output uses the escaped entities &lt; and &gt; instead of the literal < and > characters.
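
This looks like what happens when an SSML string is assigned as element text and the XML serializer escapes it. A sketch of one way around it, assuming the output tree is built with `xml.etree.ElementTree` (not confirmed for TexParser): parse the fragment and attach it as real child elements. `append_ssml_fragment` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def append_ssml_fragment(parent, fragment):
    """Attach an SSML fragment to parent as child elements, not escaped text."""
    # Wrap the fragment so mixed text and tags parse as one XML document.
    wrapper = ET.fromstring("<wrapper>" + fragment + "</wrapper>")
    children = list(parent)
    if wrapper.text:
        # Leading text goes after whatever the parent already contains.
        if children:
            children[-1].tail = (children[-1].tail or "") + wrapper.text
        else:
            parent.text = (parent.text or "") + wrapper.text
    parent.extend(list(wrapper))


if __name__ == "__main__":
    root = ET.Element("speak")
    root.text = "the equation "
    append_ssml_fragment(root, 'h of x equals <break time="0.3ms"/> x squared')
    print(ET.tostring(root, encoding="unicode"))
```

The fragment has to be well-formed on its own for this to work, so a parse error here would also point at a problem in the mathmode output.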
