newlawrence / calculate

Math Expressions Parser Engine

License: MIT License

C++ 99.85% CMake 0.15%

cpp14 mathematical-expressions template-metaprogramming header-only no-dependencies parser library

calculate's Introduction

Version	2.1.1rc10

Header-only library written in modern C++ aiming for flexibility and ease of use. Calculate is not only a mathematical expressions parser but an engine built on top of the Shunting Yard algorithm.

The main objective of the library is to offer a clean and intuitive interface, where the expressions act and feel like regular functions. Another objective is to be completely configurable; from the underlying data type to the tokenizing logic, the library is in fact a custom parser factory.

auto parser = calculate::Parser{};
auto sum = parser.parse("x+y");

sum(1., 2.);  // returns 3.

Calculate is available as a conan package:

# Append calculate to Conan's repositories list
conan remote add calculate https://api.bintray.com/conan/newlawrence/calculate

Features

Generic. double and std::complex<double> parsers included by default.
User defined constants, functions, and prefix, suffix and binary operators.
Infix and postfix notations supported.
Regex-based customizable lexers.
Header-only.

Build and test

Calculate doesn't have any third-party dependencies; the library should work with any compiler fully compatible with the C++14 standard. It has currently been tested under gcc (5.2+), clang (3.7+), msvc (19.10+) and intel (18.0+).

The examples and tests need CMake to be built. Conan can be used to handle the dependencies:

# Build the example (Boost libraries needed)
conan install example --install-folder build/example
cmake -H. -Bbuild -DCALCULATE_BUILD_EXAMPLES=ON
cmake --build build --target example

# Build and run the tests (Catch2 library needed)
conan install test --install-folder build/test
cmake -H. -Bbuild -DCALCULATE_BUILD_TESTS=ON
cmake --build build --target make_test  # build
cmake --build build --target test       # run

User guide

Want to try? Check out Calculate's wiki to get started.

License: MIT (see copying).

calculate's Issues

Unit testing

It is an interesting idea to do some research on C++ unit testing frameworks and to adopt one that can run on a Continuous Integration cloud service.

Switch the Fortran binding

Now that CMake version 3.7 has been released, it is possible to compile Fortran projects that use submodules (see CMake release notes).

This allows switching to the future version of the binding, although compatibility with gfortran has yet to be tested.

Include bindings as git submodules

This will help with the organization and deployment of the whole library.

Rename Calculate class

In other languages, given the lack of namespaces in C and the case insensitivity of Fortran, the Calculate class has been named Expression, which better reflects its "raison d'être".

It is a good idea to give it the same name on the C++ side.

Minor modifications

Two issues. First, since extensive checking is already done in calculate.cpp, the error handling lines (namely, the exceptions) will never run; those lines can be removed and the appropriate noexcept clauses added. Second, the symbols.hpp header would be better placed in a calculate subfolder under include.

Transfer exception handling responsibility.
Move symbols.hpp.

Wrong parsing precedence of unary operators

The parser doesn't take into account the precedence of operators in their unary mode, as internally they're substituted by helper functions, leading to incorrect results like:

>>> calculate.parse('-1**2')()
1.0

While it should be:

>>> -1**2
-1
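The expected result follows from the power operator binding tighter than the unary minus, as in Python. A minimal sketch of that precedence rule, assuming a tiny hypothetical grammar and an `Evaluator` class invented for illustration (it is not how Calculate parses expressions):

```cpp
#include <cmath>
#include <cstddef>
#include <string>
#include <utility>

namespace sketch {

// Grammar assumed for illustration (Python's convention):
//   unary := '-' unary | power
//   power := number ('**' unary)?
class Evaluator {
    std::string text_;
    std::size_t pos_ = 0;

    // Advances past `symbol` if the remaining text starts with it.
    bool consume(const std::string& symbol) {
        if (text_.compare(pos_, symbol.size(), symbol) != 0) return false;
        pos_ += symbol.size();
        return true;
    }

    // Reads an unsigned number; the sign is handled by unary().
    double number() {
        std::size_t read = 0;
        double value = std::stod(text_.substr(pos_), &read);
        pos_ += read;
        return value;
    }

    // The exponent is parsed before any enclosing minus is applied.
    double power() {
        double base = number();
        return consume("**") ? std::pow(base, unary()) : base;
    }

    double unary() {
        return consume("-") ? -unary() : power();
    }

public:
    explicit Evaluator(std::string text) : text_(std::move(text)) {}
    double evaluate() { return unary(); }
};

}  // namespace sketch
```

With this grammar, `sketch::Evaluator("-1**2").evaluate()` negates the result of the power instead of raising a negated base.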

Unable to build the Python bindings with GNU compiler under Windows

Since the MSVC compiler doesn't prefix "lib" to the libraries, the Python code cannot open them; so this workaround was adopted.

The problem is that, under Windows, the GNU compiler actually adds the "lib" prefix, so the previous workaround makes the command cmake -E rename "calculate.dll" "libcalculate.dll" fail, causing the make command to stop.

Project at an early stage

The library is currently at an early stage. Next step is to wrap calculate module functions inside a class, and then give it the ability to evaluate an expression against some user-defined variables.

Calculate class design.
Exception handling.
User-defined variables.

Lexer asymmetry between to_value and to_string methods

This issue explains why this is considered a bug and what the benefits of the solution proposed below are.

First, let's start by explaining how the current lexer and the underlying tokenizing logic work. Regular expressions are used to identify and classify the distinct types of tokens; they're relatively easy to design, so they're a good choice to allow future user customizations.

Nevertheless, there is a problem when an individual token can act either as a regular operator or as part of a number, for instance, the minus sign -. Currently this is handled in the parser using the alias mechanism: the minus sign - will always be identified as an operator by the lexer, and later the parser will decide from its context whether it has to act as the - operator or be replaced by its "alias", the unary function neg. For instance:

The expression -1 - 1 is finally parsed as neg(1) - 1.

This mechanism works fine but it has three major drawbacks:

The regex to identify numbers in the lexer must necessarily discard the sign, which means that using the to_value method on negative numbers will raise a BadCast exception, while using to_string with a negative number will happily return the number with its preceding - (here's the aforementioned asymmetry).
The lexer now has an implicit dependency on the parser to work properly. For instance, the provided Parser class, which has no symbols preloaded, will fail to parse a negative number because it lacks the function to "negate" a number (here's where the aforementioned bug is). Besides that, negative numbers are real numbers and it is conceptually wrong to consider them a compound of a positive number and a function applied to it.
Last, it slows down evaluations, as the expression object has to perform two operations when handling negative numbers (first get the number and then negate it) instead of one (just return the number).

One possible solution is to provide smarter regular expressions to identify in which cases the sign should be considered part of the number but, first, it would require support for negative lookbehind, which the std::regex class lacks, and, second, it would make it very difficult for users to provide their own custom tokenizers (not to mention the massive difficulty of writing a proper regex for complex numbers, or the impact on performance).

Instead, providing the lexer with the symbols that it needs to work is a much simpler solution. That is, the default lexer will be given functions to handle the plus and minus signs internally, and the default complex lexer can make use of them to connect the real and the imaginary parts into one complex number (it has not been mentioned before, but the default complex lexer only accepts pure real or pure imaginary numbers). The benefits of this approach are:

Simpler interface. The counterintuitive alias system to simulate unary operators won't be needed anymore.
Parsers without any symbols preloaded will work out of the box. The bug will be fixed, as the lexer will have its bare minimum of symbols to work properly on its own.
Lexers' methods will work symmetrically; to_value will accept explicitly signed positive and negative numbers (and numbers that are neither pure real nor pure imaginary in the case of complex lexers).
Performance will increase during evaluations as numbers like -1 will be returned directly instead of performing the neg(1) operation.
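The asymmetry described in this issue can be reproduced with the standard library alone. The following is a minimal sketch, assuming two hypothetical number regexes and free functions named after the lexer methods; none of it is the library's actual code:

```cpp
#include <regex>
#include <stdexcept>
#include <string>

namespace sketch {

// Hypothetical stand-ins for the lexer's number regex, without and with sign.
const std::regex unsigned_number(R"(^(?:\d+\.?\d*|\.\d+)(?:[eE][+\-]?\d+)?$)");
const std::regex signed_number(R"(^[+\-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+\-]?\d+)?$)");

// Converts a token only if the given regex accepts it as a whole.
inline double to_value(const std::string& token, const std::regex& number) {
    if (!std::regex_match(token, number))
        throw std::invalid_argument("bad cast: '" + token + "'");
    return std::stod(token);
}

// Always succeeds, and writes a leading '-' for negative values.
inline std::string to_string(double value) {
    return std::to_string(value);
}

}  // namespace sketch
```

With the unsigned regex, the output of `to_string` for a negative value is rejected by `to_value`; with the signed regex the round trip succeeds.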

Duplicate symbol error when compiling with Clang 13.0 on macOS

I wanted to use this library in my project, but I'm getting a linker error about duplicate symbols when including it in more than one translation unit. In my example I have two .cpp files DoubleSpinBox.cpp and IntegerSpinBox.cpp that both include the calculate.hpp header. This results in the following error:

duplicate symbol 'calculate::defaults::complex<false>' in:
    CMakeFiles/virtualbow-gui.dir/source/gui/widgets/DoubleSpinBox.cpp.o
    CMakeFiles/virtualbow-gui.dir/source/gui/widgets/IntegerSpinBox.cpp.o
duplicate symbol 'calculate::defaults::real<false>' in:
    CMakeFiles/virtualbow-gui.dir/source/gui/widgets/DoubleSpinBox.cpp.o
    CMakeFiles/virtualbow-gui.dir/source/gui/widgets/IntegerSpinBox.cpp.o

The "offending" code seems to be this section in lexer.hpp:

Calculate/include/calculate/lexer.hpp

Lines 33 to 51 in 673e4c7

    template<bool>
    constexpr const char* real =
        R"(^[+\-]?\d+$)";

    template<>
    constexpr const char* real<false> =
        R"(^[+\-]?(?:(?:NaN|Inf)|(?:(?:\d+\.?\d*|\.\d+)(?:[eE][+\-]?\d+)?))$)";

    template<bool>
    constexpr const char* complex =
        R"(^(?:(?:(?:[+\-]?\d+?)(?:[+\-]?\d+?)[ij])|(?:(?:[+\-]?\d+)[ij]?))$)";

    template<>
    constexpr const char* complex<false> =
        R"(^(?:)"
        R"((?:(?:[+\-]?(?:(?:NaN|Inf)|(?:(?:\d+\.?\d*?|\.\d+?)(?:[eE][+\-]?\d+?)?))))"
        R"((?:[+\-](?:(?:NaN|Inf)|(?:(?:\d+\.?\d*?|\.\d+?)(?:[eE][+\-]?\d+?)?)))[ij])|)"
        R"((?:(?:[+\-]?(?:(?:NaN|Inf)|(?:(?:\d+\.?\d*|\.\d+)(?:[eE][+\-]?\d+)?)))[ij]?))"
        R"()$)";

The error only happens with the Clang compiler though; GCC on Linux and Windows seems to have no problem. So maybe Clang is stricter about something, or it might even be a compiler bug. I tried making those constants static. The result is that it compiles successfully with Clang, but no longer with GCC... Any idea what the proper solution would be here?
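One C++14 pattern that sidesteps the question of how explicitly specialized variable templates are linked is to return each pattern from a static member function of a class template, since functions defined inside a class are implicitly inline. This is only a sketch of that idea: the names `RealRegex` and `is_real` are hypothetical and this is not necessarily the fix adopted by the library.

```cpp
#include <regex>
#include <string>

namespace sketch {

// Primary template: integral pattern.
template<bool Integral>
struct RealRegex {
    // Defined in-class, hence implicitly inline: safe to include from
    // several translation units.
    static constexpr const char* value() noexcept {
        return R"(^[+\-]?\d+$)";
    }
};

// Specialization for floating point numbers.
template<>
struct RealRegex<false> {
    static constexpr const char* value() noexcept {
        return R"(^[+\-]?(?:(?:NaN|Inf)|(?:(?:\d+\.?\d*|\.\d+)(?:[eE][+\-]?\d+)?))$)";
    }
};

// Example consumer of the pattern.
inline bool is_real(const std::string& token) {
    return std::regex_match(token, std::regex(RealRegex<false>::value()));
}

}  // namespace sketch
```

The call sites change from `real<false>` to `RealRegex<false>::value()`, which is the price of staying within C++14 (C++17 inline variables would be another option).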

Increase code coverage

Since the complete refactor from version 1 to version 2 of the library, it is necessary to rewrite all the tests from scratch.

The chosen test runner will be the new version of the Catch library (version 2). All is already set up in the current commit.

Add Calculate to math-parser-benchmark-project

It is time to check how well Calculate performs against the most popular C++ math expression parsers out there. The math-parser-benchmark-project from the GitHub user @ArashPartow is the appropriate place to do so.

The objective is to achieve 100% accuracy. A pull request will be opened to polish some of Calculate's internals to achieve it.

The project has also been forked to add Calculate to the tests.

Write unit tests for the Fortran bindings

And its respective coverage reports. FUnit seems a good framework to get the job done.

Emulate namespaces in C

Namespaces can be emulated in C using a struct, as in this Stack Overflow answer. Following that approach, the C side will have a cleaner interface and will resemble its OOP C++ counterpart better than the current solution of mangling names by prepending CALC_ to the methods.

Improve floating point handling

The default lexers provided in the library need some refactoring in the way they handle comparison between two numbers and the string conversion (must read this and this).

The models proposed for the library can be found in the following readings (special thanks to @BruceDawson0xB):

Further reading about the topic: float tricks, stupid float tricks, time and floats, they sure look equal...

Add Travis CI badge

Add badge to readme.md.

Add recipe for Anaconda

Easy as the build is handled by CMake.

Conda packages implicitly tied to Python version under Linux and macOS

Despite the fact that the library doesn't depend on the Python version (in fact, universal wheels work like a charm) the conda packages do.

The issue was discovered now that Python 3.6 is out: since no Python dependency has been set, the build process uses this latest release. If the package is then installed in an environment with another version, the installation passes without problems but an import error is raised when the package is used. This happens because the lack of an explicit Python version bypasses conda's checking mechanisms: the library gets installed under a python3.6 folder inside the current environment's library tree, which is why Python is unable to find it at import time.

The only way to fix it seems to be the use of the noarch clause, which removes any dependency on the operating system or the Python version, which clearly is not an option.

Add support for postfix unary operators

Some operators, like the factorial one !, are written after their operand instead of before. Currently, Calculate has no support for them.

Precision loss in builtin constants

The use of the standard function std::to_string here causes a loss in precision when using builtin constants, as it defaults to only 6 decimal places (issue on Stack Overflow).
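A minimal sketch of one way to avoid the truncation, assuming a hypothetical helper named `to_string_precise` (not part of the library): a string stream set to `max_digits10` significant digits prints enough digits for a double to survive the round trip.

```cpp
#include <iomanip>
#include <limits>
#include <sstream>
#include <string>

// Hypothetical helper: writes enough significant digits for an exact round trip.
inline std::string to_string_precise(double value) {
    std::ostringstream stream;
    stream << std::setprecision(std::numeric_limits<double>::max_digits10) << value;
    return stream.str();
}
```

Converting a constant such as pi with `std::to_string` and parsing it back yields a different value, while the helper above preserves it.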

Queries for the library metadata

It would be nice (and easy) to include little query functions with information about the version, release date, author, and so on in PR #58; currently this information is only available in the Python bindings.

It would also be a good idea to delegate things like updating the date info to a git pre-commit hook.

Add version number to the project and publish the first release

The project has proved its stability. It's a good time to give it a version number (let's say 1.0.0) and to make the first release.

Create bindings for Fortran

It should be easy using the C interface.

Create bindings for Python

Issue for an already existing branch. Bindings thanks to CMake and cffi.

Variable message inside exceptions

The current exception system has fixed messages, which give little detail about the cause of a problem. For example, the message for calculate::Expression("x") is just Undefined symbol, while Undefined symbol 'x' would be much more descriptive.

The current implementation can't have variable messages because BaseCalculationException inherits from std::exception, which can't be constructed with a custom message. Inheriting from std::runtime_error allows for customizing the message, as can be seen here.
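A minimal sketch of the idea, with hypothetical class names (`BaseError`, `UndefinedSymbol`) rather than the library's own exception hierarchy: std::runtime_error stores the string passed to its constructor and returns it from what().

```cpp
#include <stdexcept>
#include <string>

namespace sketch {

// Base class: forwards the message to std::runtime_error, which stores it.
struct BaseError : std::runtime_error {
    explicit BaseError(const std::string& message) : std::runtime_error(message) {}
};

// Builds its message from the offending symbol.
struct UndefinedSymbol : BaseError {
    explicit UndefinedSymbol(const std::string& symbol)
        : BaseError("Undefined symbol '" + symbol + "'") {}
};

}  // namespace sketch
```

Catching the exception as `const std::exception&` still works, and `what()` now carries the name of the symbol.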

This is a nice feature to include on the upcoming 1.1.0 release (PR #58).

User defined constants, operators and functions

The changes already included in the upcoming 1.1.0 version (PR #58) are fine, but this will be the crown jewel!

Write docs

I'll properly comment the whole source code, but I'd appreciate any help with documentation generators.

Missing definitions library when compiling dll with MSVC

When building the shared library using MSVC, no definitions library to link against the dll is created, leading to project build errors (since the accompanying example files expect it by default).

The solution is to prepend the compiler directive __declspec(dllexport) to each function and class declaration to declare which symbols are expected to be exported from the shared library. But, since this issue is Windows specific, there has to be another way to achieve the same effect.

Maybe creating a def file at compile time, as stated here, may solve the issue.
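If the annotation route were taken, the Windows-specific directive is usually hidden behind a macro so that other platforms see nothing. A minimal sketch, where the macro names (`CALCULATE_API`, `CALCULATE_BUILD_SHARED`, `CALCULATE_USE_SHARED`) and the function `evaluate_sum` are assumptions made up for illustration:

```cpp
// Hypothetical export macro; the library does not necessarily define it.
#if defined(_WIN32) && defined(CALCULATE_BUILD_SHARED)
#  define CALCULATE_API __declspec(dllexport)   // building the dll
#elif defined(_WIN32) && defined(CALCULATE_USE_SHARED)
#  define CALCULATE_API __declspec(dllimport)   // consuming the dll
#else
#  define CALCULATE_API                         // other platforms or static builds
#endif

// Declaration as it would appear in a public header.
CALCULATE_API double evaluate_sum(double x, double y);

// Definition as it would appear in the library source.
double evaluate_sum(double x, double y) {
    return x + y;
}
```

For the def file route, CMake's WINDOWS_EXPORT_ALL_SYMBOLS target property may be worth checking, as it asks CMake to generate the export definitions automatically for MSVC builds.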

Add code coverage

I've found a good cloud service that integrates just fine with Travis CI, Codecov. It can work with gcov and produce visual reports of the code coverage in HTML, but has the downside of not being popular yet and being poorly documented.

Recover C compatibility

Before the OOP transition, the library had a wrapper to make it interoperable with C; maybe it would be a good idea to restore it when exception handling support is finished.

New branching model

I've come across this excellent article and, from now on, I've decided to adopt its Successful Git branching model on this repository and all of its associated bindings.

There will be two permanent branches:
- master: containing only stable releases.
- develop: to push all the daily work.

Development of new features will be done on their own branches, which are unlikely to be pushed to origin. There will also be branches related to releases and hotfixes, as stated in the article.

It's going to be a little hard to get used to this kind of workflow, but it will help with the development of the library and will pave the way for easier future collaborations.

Error in Shunting Yard algorithm

A bug was introduced in this commit that breaks the operator precedence resolution, which means that all the releases from 1.0.0 to 1.0.3 are completely broken.

The fix is trivial, just replacing the mistaken lines:

auto p1 = castChild<Operator>(element)->precedence;
auto p2 = castChild<Operator>(element)->precedence;

With:

auto p1 = castChild<Operator>(element)->precedence;
auto p2 = castChild<Operator>(another)->precedence;

But the fact that the error passed silently until now means that 100% test coverage is not enough. Further testing of the logic and the regular expressions is needed to ensure that the library works the way it is supposed to.
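To illustrate why the comparison has to involve two different operators, here is a minimal, self-contained sketch of the Shunting Yard precedence step. It assumes single-character operands and a small hypothetical operator table; it is not the library's implementation.

```cpp
#include <cctype>
#include <map>
#include <stack>
#include <string>

namespace sketch {

struct Operator {
    int precedence;
    bool left_associative;
};

// Converts an infix expression with single-character operands to postfix.
inline std::string to_postfix(const std::string& infix) {
    static const std::map<char, Operator> operators{
        {'+', {1, true}}, {'-', {1, true}},
        {'*', {2, true}}, {'/', {2, true}},
        {'^', {3, false}}
    };
    std::string output;
    std::stack<char> pending;

    for (char token : infix) {
        if (std::isalnum(static_cast<unsigned char>(token))) {
            output += token;
        } else if (token == '(') {
            pending.push(token);
        } else if (token == ')') {
            while (!pending.empty() && pending.top() != '(') {
                output += pending.top();
                pending.pop();
            }
            if (!pending.empty()) pending.pop();  // discard the '('
        } else if (operators.count(token)) {
            const Operator& current = operators.at(token);
            while (!pending.empty() && pending.top() != '(') {
                // Compare with the operator on the stack, never with itself.
                const Operator& another = operators.at(pending.top());
                bool pops = current.left_associative ?
                    current.precedence <= another.precedence :
                    current.precedence < another.precedence;
                if (!pops) break;
                output += pending.top();
                pending.pop();
            }
            pending.push(token);
        }
    }
    while (!pending.empty()) {  // flush the remaining operators
        output += pending.top();
        pending.pop();
    }
    return output;
}

}  // namespace sketch
```

If `another` were read from the current token, as in the mistaken lines above, every left-associative operator would pop whatever is on the stack and `a+b*c` would be evaluated as `(a+b)*c`.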

Bindings for FreeBASIC

FreeBASIC is a modern implementation of the QBASIC language. The language is easily interoperable with C and can even deal with C++ classes.

Refactor the library to be more compile-time based

Since this is a little library to do simple maths calculations, I think it would be more interesting to forget about user-defined functions and rewrite the entire library to be as compile-time as possible, which in turn would make it more performant.

Road to the 2.0.0 version. Bye, bye STL (or almost) and welcome SFINAE and template metaprogramming!

C++ numeric conversion routines are locale dependent

Default numeric conversion routines in C++ are locale dependent (see). Since Calculate provides the capability of implementing user-defined lexers, with which users can define the conversion policy of their choice, it is important that the provided default lexers do not use such routines (in fact, the default regexes are tuned to work with the standard C locale).

A stringstream approach to perform the conversions is needed in order to avoid future problems with users that need to implement locale-dependent functionality in their programs.
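A minimal sketch of the stringstream approach, assuming a hypothetical helper named `to_value_classic`: the stream is imbued with the classic "C" locale so the conversion does not depend on the global locale chosen by the host program.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper: parses `token` with the classic "C" locale.
// Returns true only if the whole token was consumed as a number.
inline bool to_value_classic(const std::string& token, double& value) {
    std::istringstream stream(token);
    stream.imbue(std::locale::classic());
    stream >> value;
    return !stream.fail() && stream.eof();
}
```

The same `imbue` call on an output string stream would cover the conversion in the opposite direction.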

New build system and CI for Calculate version 2

The project needs to get back all the capabilities lost during the transition from version 1 to version 2 of the library related to building and continuous integration.

User defined regexes shall not contain any capture groups

It's important to add a sanity check on this given the fact that it will mess with the tokenizer regex.
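Such a check could be sketched with `std::regex::mark_count`, which reports the number of capturing groups in a pattern (non-capturing `(?:...)` groups are not counted). The function name below is hypothetical:

```cpp
#include <regex>
#include <string>

// Hypothetical sanity check: true when the pattern has no capturing groups.
inline bool has_no_capture_groups(const std::string& pattern) {
    return std::regex(pattern).mark_count() == 0;
}
```

A lexer constructor could reject, for instance by throwing, any user regex for which this returns false.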

Duplicate symbols for parentheses classes

These lines of code need to be moved from symbols.h to symbols.cpp. Explicit instantiation of the Parenthesis template classes in the header source is leading to duplicate symbols each time the file is imported.

    template class Parenthesis<'('>;
    template class Parenthesis<')'>;

This issue hasn't shown up while building the library with clang, but it is a bug anyway. In the future all the builds will also be tested under gcc and MSVC.

Python API inconsistent with that of C++

To fully reproduce the behavior of the C++ API, the Python bindings are missing the ability to evaluate an expression using a collection instead of a variable number of arguments.

Get string containing the variables on the C side

Currently, the C function getVariables returns the number of variables of the Calculate object. It should return a string with them instead, to make it possible (along with the getExpression function) to construct the same Calculate object from other languages using the C binding interface.

Write a little wiki

It will be a good idea to write a little wiki about the project to cover the installation and use of the library and its wrappers to attract people to it.

The usage example in the readme is a good starting point, but it's not enough.

Abandon Gitflow way of structuring branches

After thinking for a while, I've decided not to keep the two eternal branches master and develop. The only practical use of this overengineering is to ensure that all commits in the master branch are stable releases but, since there are tags, it is kind of redundant.

From now on, this project will keep only master as the main development branch. The mechanism for new features and hotfixes will remain the same.

I've created this issue to leave a record of this decision.

Write unit tests for the Python bindings

The same as with Fortran in #39. pytest is the way to go in this case.

Boolean expression support

Hi,

I would like to know if adding support for parsing and AST generation/evaluation of boolean expressions could be in the scope of this library. My use case is arbitrary boolean expression evaluation, where the evaluation of the expression would work much like a C++ expression template, but done at runtime:

// the model

class Filter
{
    ...
};

Filter operator&&(Filter a, Filter b)
{
    return Filter{[=](const auto& value) {
        return a(value) && b(value);
    }};
}
...

// the parser

namespace calculate {
class FilterParser : public BaseParser<Filter> {
public:
    // The constructor must be named after the class
    FilterParser() : BaseParser<Type>{lexer_from_defaults<Type>()} {
        using namespace defaults;
        operators.insert({
            {"and", {and_<Type>, Precedence::low, Associativity::FULL}},
            {"or", {or_<Type>, Precedence::low, Associativity::LEFT}},
            {"not", {not_<Type>, Precedence::normal, Associativity::FULL}},
            {"xor", {xor_<Type>, Precedence::normal, Associativity::LEFT}},
        });
    }
};
}  // namespace calculate

Better symbol detection

Current regular expression detection doesn't cover the following scenarios:

Exponential notation (e.g. 1e-3).
Function names ending in numbers (e.g. log10).
Variable names ending in numbers (e.g. x0).
Orphan unprocessed characters in expression.

Another big issue here. Variadic operator() throws an EvaluationException every time it is called with more than two arguments.
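The scenarios listed above can be covered by a single token regex with ordered alternatives. The following is only a sketch under assumed patterns, with a hypothetical `tokenize` function; it is not the regex used by the library.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical tokenizer: splits an expression into numbers, names and symbols.
inline std::vector<std::string> tokenize(const std::string& expression) {
    static const std::regex token(
        R"((?:\d+\.?\d*|\.\d+)(?:[eE][+\-]?\d+)?)"  // numbers, optional exponent
        R"(|[A-Za-z_]\w*)"                           // names, digits allowed after the first character
        R"(|[^\s\w])"                                // any other single symbol
    );
    std::vector<std::string> tokens;
    std::sregex_iterator it(expression.begin(), expression.end(), token), end;
    for (; it != end; ++it)
        tokens.push_back(it->str());
    return tokens;
}
```

Since the last alternative accepts any non-space symbol, no character other than whitespace is silently dropped, so unknown symbols reach the parser, which can then report them as errors.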

Expression syntax checking not working

The syntax checking subroutine is not working as expected.
Using the test_cpp example source:

~$ ./test_cpp '1 2'
Too many arguments
~$ ./test_cpp '1 )( 2'
Parenthesis mismatch

While the expected behavior should be:

~$ ./test_cpp '1 2'
Syntax error
~$ ./test_cpp '1 )( 2'
Syntax error
