
Comments (3)

phfaist commented on July 29, 2024

Your approach is correct! Your code also captures the macro \mu which is not explicitly declared to the latexwalker. The default implementation relies on the behavior for default macros, which is to keep them as a macro node with no arguments. (The macro is separately declared for latex2text as representing the unicode "μ" symbol.)

I realize it's a bit of a weakness of the API for now that the parse_args() method is not given information about the macro/environment that is currently being parsed. This is usually not a problem in typical settings where you set a MacroSpec or EnvironmentSpec to specific macros, since in such cases the parser is usually tailored to a specific macro/environment. A possible approach to display the unknown macro name is to hook directly into the LatexContextDb object. I also realize that these objects don't expose a simple way of doing this, but the following code achieves the desired behavior:

from pylatexenc import latexwalker, macrospec, latex2text

class UnknownMacroArgsParser(macrospec.MacroStandardArgsParser):
    def __init__(self, macroname):
        super().__init__()
        self.macroname = macroname

    def parse_args(self, w, pos, parsing_state=None):
        print("Unknown macro `\\{}' at {}".format(self.macroname, pos))
        return super().parse_args(w, pos, parsing_state=parsing_state)

class CustomLatexContextDb(macrospec.LatexContextDb):
    def __init__(self, db):
        super().__init__()
        for cat in db.categories():
            self.add_context_category(
                cat,
                macros=db.iter_macro_specs([cat]),
                environments=db.iter_environment_specs([cat]),
                specials=db.iter_specials_specs([cat]),
            )

    def get_macro_spec(self, macroname):
        mspec = super().get_macro_spec(macroname)
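        if mspec is not None:
            # Known macro: use the spec registered in the wrapped database.
            return mspec
        # Unknown macro: return a spec whose parser reports the macro name.
        return macrospec.MacroSpec(macroname, args_parser=UnknownMacroArgsParser(macroname))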

walker_context = CustomLatexContextDb(latexwalker.get_default_latex_context_db())

# second example
output = latex2text.LatexNodes2Text().latex_to_text(
    r"""start
$\mu $
\foo
\foobar
""", latex_context=walker_context)
print(output)
# prints:
#
# Unknown macro `\mu' at 11
# Unknown macro `\foo' at 18
# Unknown macro `\foobar' at 26
# start
# μ

It's not a particularly elegant solution, and I'll look into how to make this easier in future versions of pylatexenc.

Regarding macros that are considered as unknown to latexwalker but are known to latex2text, you could consider emitting a warning only after performing a search in the latex2text context db object (call l2tcontext.get_macro_spec(macroname) and check if it is None, where l2tcontext is the context-db object used by latex2text). I hope this helps.
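The check described above could look roughly like the sketch below. It is only an illustration: the helper name `warn_if_unknown_to_latex2text` and the `StandInContextDb` class are hypothetical and not part of pylatexenc. The helper assumes only that `l2tcontext` has a `get_macro_spec(macroname)` method returning None for macros it does not know, as stated above. The stand-in class exists so the snippet runs without pylatexenc installed.

```python
import warnings


def warn_if_unknown_to_latex2text(l2tcontext, macroname):
    """Warn only if the latex2text context db has no spec for `macroname`.

    Hypothetical helper. Assumes `l2tcontext.get_macro_spec(macroname)`
    returns None for macros the db does not know.
    Returns True if a warning was issued, False otherwise.
    """
    if l2tcontext.get_macro_spec(macroname) is not None:
        # latex2text knows how to render this macro (e.g. \mu), stay quiet.
        return False
    warnings.warn("Unknown macro \\{}".format(macroname))
    return True


class StandInContextDb:
    """Hypothetical stand-in for the latex2text context db (illustration only)."""

    def __init__(self, known):
        self._known = dict(known)

    def get_macro_spec(self, macroname):
        # Mirrors the assumed behavior: None when the macro is not declared.
        return self._known.get(macroname)


# With pylatexenc, l2tcontext would be the context-db object used by latex2text.
l2tcontext = StandInContextDb({"mu": "μ"})

warn_if_unknown_to_latex2text(l2tcontext, "mu")   # no warning
warn_if_unknown_to_latex2text(l2tcontext, "foo")  # issues a warning
```

In the code further up, such a helper could be called from UnknownMacroArgsParser.parse_args() in place of the print() call, so that \mu no longer triggers a message while \foo and \foobar still do.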

I'm going to change the issue title to reflect that the desired improvement to pylatexenc is that unknown macro/environment/specials handlers be given more information about what macro/environment/specials was encountered.

from pylatexenc.

phfaist commented on July 29, 2024

Actually, I realize that issue #32 already asked a very similar question. If you care about converting to text, not necessarily about obtaining the argument structure, you can plug into latex2text's context db to issue warnings for unknown macros. See my comment in issue #32.


gamboz commented on July 29, 2024

Thank you for the clarifications.
Yes, #32 is better for my use case (sorry I didn't spot it by myself).

