formulate's Introduction

scikit-hep: metapackage for Scikit-HEP

Project info

The Scikit-HEP project is a community-driven and community-oriented project with the aim of providing Particle Physics at large with an ecosystem for data analysis in Python embracing all major topics involved in a physicist's work. The project started in Autumn 2016 and its packages are actively developed and maintained.

It is not just about providing core and common tools for the community. It is also about improving the interoperability between HEP tools and the Big Data scientific ecosystem in Python, and about improving the discoverability of utility packages and projects.

As for the project's overall structure, it should be seen as a toolset rather than a toolkit.

Getting in touch

There are various ways to get in touch with project admins and/or users and developers.

scikit-hep package

scikit-hep is a metapackage for the Scikit-HEP project.

Installation

You can install this metapackage from PyPI with pip:

python -m pip install scikit-hep

or you can use Conda through conda-forge:

conda install -c conda-forge scikit-hep

All the normal Python best practices apply: for example, you should install into a virtual environment.

Package version and dependencies

Please check the setup.cfg and requirements.txt files for the list of Python versions supported and the list of Scikit-HEP project packages and dependencies included, respectively.

For any installed version of scikit-hep, the following displays the versions of all installed Scikit-HEP packages and their dependencies, for example:

>>> import skhep
>>> skhep.show_versions()

System:
    python: 3.10.10 | packaged by conda-forge | (main, Mar 24 2023, 20:08:06) [GCC 11.3.0]
executable: /srv/conda/envs/notebook/bin/python
   machine: Linux-5.15.0-72-generic-x86_64-with-glibc2.27

Python dependencies:
       pip: 23.1.2
     numpy: 1.24.3
     scipy: 1.10.1
    pandas: 2.0.2
matplotlib: 3.7.1

Scikit-HEP package version and dependencies:
        awkward: 2.2.2
boost_histogram: 1.3.2
  decaylanguage: 0.15.3
       hepstats: 0.6.1
       hepunits: 2.3.2
           hist: 2.6.3
     histoprint: 2.4.0
        iminuit: 2.21.3
         mplhep: 0.3.28
       particle: 0.22.0
          pylhe: 0.6.0
       resample: 1.6.0
          skhep: 2023.06.09
         uproot: 5.0.8
         vector: 1.0.0

Note on the versioning system:

This package uses Calendar Versioning (CalVer).

formulate's People

Contributors

chrisburr, eduardo-rodrigues, henryiii, jonas-eschle

formulate's Issues

idea: add sympy conversion

As an idea, we could add conversion to sympy. This would allow us to use the full power of sympy, including:

  1. latex formatting
  2. simplification of expressions
  3. resolving #2 by converting back and forth, as sympy removes redundant brackets automatically
  4. a functional form via lambdify, allowing an "arbitrary" backend to be used

The implementation should be straightforward:

  1. create all variables as symbols and inject in future sympify calls
  2. convert an expression to numexpr and sympify it

The only caveat I see so far is that the == operator has a different meaning in Sympy and is not a logical equality (https://docs.sympy.org/latest/gotchas.html#double-equals-signs). Furthermore, booleans and numerical values don't mix as implicitly.

So it seems to me an interesting idea where we may gain a lot, but it could also be a nightmare if the differences in behavior are too large. I am not a Sympy expert, so maybe others have an opinion?
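
A minimal sketch of the two implementation steps above, assuming sympy is installed. The numexpr-style string is written by hand here rather than produced by formulate, and the variable names are only illustrative. It does not address the == caveat.

```python
import sympy

# Hand-written numexpr-style string standing in for formulate's output.
numexpr_string = "sqrt(((X_PX ** 2) + (X_PY ** 2) + (X_PZ ** 2)))"

# Step 1: create every variable as a symbol and inject it into sympify.
names = ["X_PX", "X_PY", "X_PZ"]
symbols = {name: sympy.Symbol(name) for name in names}

# Step 2: sympify the numexpr string.
expr = sympy.sympify(numexpr_string, locals=symbols)

print(expr)               # printed without the redundant brackets
print(sympy.latex(expr))  # LaTeX formatting

# Functional form; the "math" backend is chosen only to avoid a numpy dependency.
momentum = sympy.lambdify([symbols[name] for name in names], expr, modules="math")
print(momentum(1.0, 2.0, 2.0))  # sqrt(1 + 4 + 4)
```

Converting the printed sympy expression back into a formulate expression would be the remaining half of the round trip and is not shown.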

Return numexpr expression for given container

Would it be possible to return the numexpr expression for a given container, for instance a dict of numpy arrays or a numpy record array?

For instance, like this:

arrays = {"X_PX": array(....), "X_PY": array(....), "X_PZ": array(....)}
momentum = formulate.from_root('TMath::Sqrt(X_PX**2 + X_PY**2 + X_PZ**2)')
momentum.to_numexpr(arrays)

which returns

'sqrt(((arrays["X_PX"] ** 2) + (arrays["X_PY"] ** 2) + (arrays["X_PZ"] ** 2)))'

Could it then be evaluated by numexpr?

Matt
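
One way the requested rewriting could look, as a sketch only: the helper to_container_expression is hypothetical and not part of formulate, and it operates on an already generated numexpr string using a plain regular expression. Only names that are keys of the container are rewritten, so function names such as sqrt are left alone.

```python
import re

def to_container_expression(expression, container_name, keys):
    """Rewrite bare variable names as lookups into a named container."""
    # Match whole identifiers, so "X_PX" never matches inside a longer name.
    identifier = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

    def replace(match):
        name = match.group(0)
        # Leave anything that is not a key (e.g. "sqrt") untouched.
        if name in keys:
            return f'{container_name}["{name}"]'
        return name

    return identifier.sub(replace, expression)

# Plain lists stand in for numpy arrays; only the keys matter here.
arrays = {"X_PX": [3.0], "X_PY": [4.0], "X_PZ": [12.0]}
plain = "sqrt(((X_PX ** 2) + (X_PY ** 2) + (X_PZ ** 2)))"
rewritten = to_container_expression(plain, "arrays", arrays)
print(rewritten)
```

Note that numexpr.evaluate accepts a local_dict argument, so it may be simpler to pass the dict of arrays together with the unmodified expression instead of rewriting the string.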

Optimise deep recursion

Currently deeply nested functions result in a RecursionError. This could be optimised by having a simpler parser find functions and then evaluate their arguments iteratively instead of recursively.
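
A rough sketch of the iterative idea, assuming input that consists only of function calls and plain arguments (no operators). The function parse_calls is hypothetical and unrelated to formulate's actual parser; it keeps an explicit stack instead of recursing, so the nesting depth is not limited by Python's recursion limit. It does not validate unbalanced brackets.

```python
def parse_calls(text):
    """Parse nested calls like "f(g(x), y)" into (name, [args]) tuples without recursion."""
    stack = [("<root>", [])]  # each frame: (function name, parsed arguments)
    token = ""
    for char in text:
        if char == "(":
            # The text collected so far is the name of a new function call.
            stack.append((token.strip(), []))
            token = ""
        elif char in ",)":
            # A plain argument ends here, if one was being collected.
            if token.strip():
                stack[-1][1].append(token.strip())
            token = ""
            if char == ")":
                # The call is complete: attach it to its parent frame.
                name, args = stack.pop()
                stack[-1][1].append((name, args))
        else:
            token += char
    if token.strip():
        stack[-1][1].append(token.strip())
    return stack[0][1][0]

print(parse_calls("f(g(x), y)"))

# 5000 levels of nesting, far beyond the default recursion limit.
deep = "f(" * 5000 + "x" + ")" * 5000
node, depth = parse_calls(deep), 0
while isinstance(node, tuple):  # walk down iteratively as well
    node = node[1][0]
    depth += 1
print(depth)
```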

BUG: pow operator without whitespace errors

Hi all, thanks a lot for the great package!

There seems to be a bug with missing whitespace and the power operator **. The expression can only be parsed if whitespace is inserted before the operator.

formulate.from_numexpr('a**2') raises an error
formulate.from_numexpr('a ** 2') works

Other operators work fine without whitespace.

This is the full error:

In [12]: formulate.from_numexpr('a**2')                                         
ERROR:formulate:TODO TRACEBACK: ('a**2', 1, 'Expected end of text')
ERROR:formulate:Error parsing: a**2
ERROR:formulate:                ▲
ERROR:formulate:                ┃
ERROR:formulate:                ┗━━━━━━ Error here or shortly after
---------------------------------------------------------------------------
ParseException                            Traceback (most recent call last)
~/anaconda3/envs/rkq37/lib/python3.7/site-packages/formulate/parser.py in to_expression(self, string)
    232         try:
--> 233             result = self._parser.parseString(string, parseAll=True)
    234             assert len(result) == 1, result

~/anaconda3/envs/rkq37/lib/python3.7/site-packages/pyparsing.py in parseString(self, instring, parseAll)
   1954                     exc.__traceback__ = self._trim_traceback(exc.__traceback__)
-> 1955                 raise exc
   1956         else:

~/anaconda3/envs/rkq37/lib/python3.7/site-packages/pyparsing.py in parseImpl(self, instring, loc, doActions)
   3813         if loc < len(instring):
-> 3814             raise ParseException(instring, loc, self.errmsg, self)
   3815         elif loc == len(instring):

ParseException: Expected end of text, found '*'  (at char 1), (line:1, col:2)

During handling of the above exception, another exception occurred:

ParsingException                          Traceback (most recent call last)
<ipython-input-12-c3a50f695328> in <module>
----> 1 formulate.from_numexpr('a**2')

~/anaconda3/envs/rkq37/lib/python3.7/site-packages/formulate/parser.py in to_expression(self, string)
    244             exception = ParsingException()
    245             exception.__context__ = None
--> 246             raise exception
    247         else:
    248             return result

ParsingException: 

Any ideas?
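
Until the parser itself is fixed, a possible workaround is to normalise the expression before handing it to formulate.from_numexpr. This is a sketch: pad_power_operator is a hypothetical helper, it has not been checked against formulate, and it relies only on the observation above that 'a ** 2' parses.

```python
import re

def pad_power_operator(expression):
    """Surround every ** with single spaces, e.g. 'a**2' -> 'a ** 2'."""
    return re.sub(r"\s*\*\*\s*", " ** ", expression)

print(pad_power_operator("a**2"))
print(pad_power_operator("a ** 2"))      # already padded, left unchanged
print(pad_power_operator("a*b**(c+1)"))  # the single * is not touched
```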

parsing `from_root` can be slow

Hi,

This is an excellent library. I am incorporating it into my ATLAS analysis code together with @jpivarski's uproot and awkward as a drop-in replacement for TTree::Draw.

I noticed that some expressions can be slow to parse, e.g. this takes almost 2 seconds:

$> cat slow.py 
import formulate
formulate.from_root('((weight * (n_mu > 0)) * ((tt_cat==0 || tt_cat==3 || tt_cat==6)))')
$> time python slow.py 
real	0m1.803s
user	0m1.995s
sys	0m0.094s

Is there any way this can be sped up?

Thanks,
Lukas

Note: it's not slow in an absolute sense, of course, but if you do such an operation very often it can accumulate quite quickly, and I was surprised that parsing such an expression would have a noticeable time cost.
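
If the same expression strings are parsed repeatedly, caching on the caller's side can hide the cost. The sketch below uses functools.lru_cache with a hypothetical slow_parse stand-in (a sleep) in place of formulate.from_root; it does not make the first parse any faster, and the cached result is shared between callers, so it should not be modified.

```python
import functools
import time

def slow_parse(expression):
    """Hypothetical stand-in for an expensive parse such as formulate.from_root."""
    time.sleep(0.05)  # simulate the parsing cost
    return ("parsed", expression)

@functools.lru_cache(maxsize=None)
def cached_parse(expression):
    # Only the first call for a given string reaches slow_parse.
    return slow_parse(expression)

expression = "((weight * (n_mu > 0)) * ((tt_cat==0 || tt_cat==3 || tt_cat==6)))"
for _ in range(100):
    result = cached_parse(expression)

print(cached_parse.cache_info())  # one miss, the rest are hits
```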

more flexible to_string conversion to support ternary operators

First off, thanks a lot for this great package!

Numexpr supports the where function, whereas ROOT supports an (equivalent) ternary operator from C++ (Expression1 ? Expression2 : Expression3). While the former is already implemented and works fine with the current function registry, the latter cannot (AFAIK) be supported with the current construction, which joins the arguments within brackets with a comma (,).

Two ideas (using sqrt as an example):

  1. To enable support for this conversion, I would propose a to_string method that can be registered with the function in PFunction and defaults to the current conversion. It takes the name of the function and its arguments.
    Disadvantage: we have too much freedom, e.g. for sqrt:
    ('sqrt', 1, lambda f, args: f + "(" + args[0] + ")")
    or we duplicate the name:
    ('sqrt', 1, lambda args: 'sqrt' + "(" + args[0] + ")")

  2. Use string formatting: require the signature to be contained in the string definition: ('sqrt({})', 1).
    Disadvantage: how should an arbitrary number of arguments be handled?

I would propose to go for the first solution.
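
To make the first proposal concrete, here is a small self-contained sketch. The registries NUMEXPR_FUNCTIONS and ROOT_FUNCTIONS and the render helper are hypothetical and do not reflect formulate's actual PFunction API; the default conversion is assumed to join the arguments with ", " inside brackets.

```python
def default_to_string(name, args):
    """Assumed current behaviour: name followed by the bracketed, comma-joined arguments."""
    return f"{name}({', '.join(args)})"

def ternary_to_string(name, args):
    """Render a three-argument 'where' as a C++ ternary expression."""
    condition, if_true, if_false = args
    return f"(({condition}) ? ({if_true}) : ({if_false}))"

# Hypothetical registries: function name -> (number of arguments, to_string callable)
NUMEXPR_FUNCTIONS = {
    "sqrt": (1, default_to_string),
    "where": (3, default_to_string),
}
ROOT_FUNCTIONS = {
    "sqrt": (1, lambda name, args: default_to_string("TMath::Sqrt", args)),
    "where": (3, ternary_to_string),
}

def render(registry, name, args):
    """Look up the function and delegate the string conversion to its to_string callable."""
    n_args, to_string = registry[name]
    if len(args) != n_args:
        raise ValueError(f"{name} expects {n_args} argument(s), got {len(args)}")
    return to_string(name, args)

args = ["x > 0", "x", "-x"]
print(render(NUMEXPR_FUNCTIONS, "where", args))
print(render(ROOT_FUNCTIONS, "where", args))
print(render(ROOT_FUNCTIONS, "sqrt", ["x"]))
```

Passing the name into every to_string callable keeps the default reusable, at the cost of the freedom noted as a disadvantage above.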

Add support for variable scoping

I've started looking into using this package for a project I'm working on. We'll want to be able to specify branches which might be nested within our trees, e.g. branch.sub_branch. I've tested this briefly with version 0.0.7 of formulate, though, and I see that such variables cannot be identified:

$ python -m formulate --from-numexpr 'branch.sub_branch < 4' --variables
ERROR:formulate:TODO TRACEBACK: ('branch.sub_branch < 4', 6, 'Expected end of text')
ERROR:formulate:Error parsing: branch.sub_branch < 4
ERROR:formulate:                     ▲
ERROR:formulate:                     ┃
ERROR:formulate:                     ┗━━━━━━ Error here or shortly after

Would it be possible to add support for this? PyParsing has a specific helper method which might be useful here, delimitedList. I think the easiest for the user is to return a single variable in this case: branch.sub_branch in the example above. That might mean just including the . in the definition of the Word for Variables?
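
To illustrate the "single variable" behaviour, here is a sketch that uses a plain regular expression rather than formulate's pyparsing grammar; VARIABLE and find_variables are hypothetical names. In pyparsing, something like Word(alphas + "_", alphanums + "_.") would be the rough counterpart, although unlike the pattern below it would also accept a trailing or doubled dot.

```python
import re

# A dotted name: identifiers joined by single dots, e.g. branch.sub_branch
VARIABLE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(?:\.[A-Za-z_][A-Za-z0-9_]*)*")

def find_variables(expression):
    """Return the distinct (possibly dotted) names found in the expression, sorted."""
    return sorted(set(VARIABLE.findall(expression)))

print(find_variables("branch.sub_branch < 4"))
print(find_variables("branch.sub_branch < other.x + y"))
```

Function names would also be picked up by this pattern, so a real implementation would still need to tell them apart from variables.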

Release tagging

Tags should have the vX.Y.Z format, not X.Y.Z (only 0.1.0 is this way; GitHub's own UI gives this recommendation), and a GitHub release should always be made when tagging, so that it shows up in the UI. @mayou36, perhaps you can fix this?
