plasma-umass / scalene

Scalene: a high-performance, high-precision CPU, GPU, and memory profiler for Python with AI-powered optimization proposals

License: Apache License 2.0

Python 52.14% Makefile 0.38% C++ 5.45% JavaScript 41.12% HTML 0.70% CSS 0.21%
python profiling performance-analysis cpu-profiling profiler python-profilers gpu-programming scalene profiles-memory performance-cpu

scalene's Introduction


Scalene: a Python CPU+GPU+memory profiler with AI-powered optimization proposals

by Emery Berger, Sam Stern, and Juan Altmayer Pizzorno.

Scalene community Slack

(badges: PyPI latest release, Anaconda, downloads, Python versions, Visual Studio Code extension, license)

Ozsvald tweet

(tweet from Ian Ozsvald, author of High Performance Python)

Semantic Scholar success story

Scalene web-based user interface: http://plasma-umass.org/scalene-gui/

About Scalene

Scalene is a high-performance CPU, GPU and memory profiler for Python that does a number of things that other Python profilers do not and cannot do. It runs orders of magnitude faster than many other profilers while delivering far more detailed information. It is also the first profiler ever to incorporate AI-powered proposed optimizations.

AI-powered optimization suggestions

Note

To enable AI-powered optimization suggestions, you need to enter an OpenAI key in the box under "Advanced options". Your account will need to have a positive balance for this to work (check your balance at https://platform.openai.com/account/usage).

Scalene advanced options

Once you've entered your OpenAI key (see above), click on the lightning bolt (⚡) beside any line or the explosion (💥) for an entire region of code to generate a proposed optimization. Click on a proposed optimization to copy it to the clipboard.

example proposed optimization

You can click as many times as you like on the lightning bolt or explosion, and it will generate different suggested optimizations. Your mileage may vary, but in some cases, the suggestions are quite impressive (e.g., order-of-magnitude improvements).

Quick Start

Installing Scalene:

python3 -m pip install -U scalene

or

conda install -c conda-forge scalene

Using Scalene:

After installing Scalene, you can use Scalene at the command line, or as a Visual Studio Code extension.

Using the Scalene VS Code Extension:

First, install the Scalene extension from the VS Code Marketplace, or find it within VS Code by typing Command-Shift-X (Mac) or Ctrl-Shift-X (Windows). Once it's installed, press Command-Shift-P or Ctrl-Shift-P to open the Command Palette. Then select "Scalene: AI-powered profiling..." (you can start typing Scalene and it will pop up if the extension is installed). Run that and, assuming your code runs for at least a second, a Scalene profile will appear in a webview.

(screenshot: Scalene profile displayed in a VS Code webview)
Commonly used command-line options:
scalene your_prog.py                             # full profile (outputs to web interface)
python3 -m scalene your_prog.py                  # equivalent alternative

scalene --cli your_prog.py                       # use the command-line only (no web interface)

scalene --cpu your_prog.py                       # only profile CPU
scalene --cpu --gpu your_prog.py                 # only profile CPU and GPU
scalene --cpu --gpu --memory your_prog.py        # profile everything (same as no options)

scalene --reduced-profile your_prog.py           # only profile lines with significant usage
scalene --profile-interval 5.0 your_prog.py      # output a new profile every five seconds

scalene (Scalene options) --- your_prog.py (...) # use --- to tell Scalene to ignore options after that point
scalene --help                                   # lists all options
Using Scalene programmatically in your code:

Invoke using scalene as above and then:

from scalene import scalene_profiler

# Turn profiling on
scalene_profiler.start()

# Turn profiling off
scalene_profiler.stop()
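
For example, here is a minimal sketch, assuming you launch the script with scalene --off yourprog.py so that profiling starts disabled (see the FAQ below):

from scalene import scalene_profiler

def hot_loop():
    total = 0
    for i in range(10_000_000):
        total += i
    return total

scalene_profiler.start()  # turn profiling on for the region of interest
hot_loop()
scalene_profiler.stop()   # turn profiling off again
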
Using Scalene to profile only specific functions via @profile:

Just preface any functions you want to profile with the @profile decorator and run it with Scalene:

# do not import profile!

@profile
def slow_function():
    import time
    time.sleep(3)
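
Scalene makes @profile available at run time, so no import is needed. If you also want the same script to run standalone, one common idiom (an assumption, not part of Scalene itself) is to define a no-op fallback:

import builtins
import time

# Define a no-op @profile if no profiler has injected one, so the
# script also runs outside Scalene.
if not hasattr(builtins, "profile"):
    def profile(func):
        return func

@profile
def slow_function():
    time.sleep(3)

slow_function()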

Web-based GUI

Scalene has both a CLI and a web-based GUI (demo here).

By default, once Scalene has profiled your program, it will open a tab in a web browser with an interactive user interface (all processing is done locally). Hover over bars to see breakdowns of CPU and memory consumption, and click on underlined column headers to sort the columns. The generated file profile.html is self-contained and can be saved for later use.

Scalene web GUI

Scalene Overview

Scalene talk (PyCon US 2021)

This talk, presented at PyCon US 2021, walks through Scalene's advantages and how to use it to debug the performance of an application (and provides some technical details on its internals). We highly recommend watching this video!

Scalene presentation at PyCon 2021

Fast and Accurate

  • Scalene is fast. It uses sampling instead of instrumentation or relying on Python's tracing facilities. Its overhead is typically no more than 10-20% (and often less).

  • Scalene is accurate. We tested CPU profiler accuracy and found that Scalene is among the most accurate profilers, correctly measuring time taken.

Profiler accuracy

  • Scalene performs profiling at the line level and per function, pointing to the functions and the specific lines of code responsible for the execution time in your program.

CPU profiling

  • Scalene separates out time spent in Python from time in native code (including libraries). Most Python programmers aren't going to optimize the performance of native code (which is usually either in the Python implementation or external libraries), so this helps developers focus their optimization efforts on the code they can actually improve.
  • Scalene highlights hotspots (code accounting for significant percentages of CPU time or memory allocation) in red, making them even easier to spot.
  • Scalene also separates out system time, making it easy to find I/O bottlenecks.

GPU profiling

  • Scalene reports GPU time (currently limited to NVIDIA-based systems).

Memory profiling

  • Scalene profiles memory usage. In addition to tracking CPU usage, Scalene also points to the specific lines of code responsible for memory growth. It accomplishes this via an included specialized memory allocator.
  • Scalene separates out the percentage of memory consumed by Python code vs. native code.
  • Scalene produces per-line memory profiles.
  • Scalene identifies lines with likely memory leaks.
  • Scalene profiles copying volume, making it easy to spot inadvertent copying, especially due to crossing Python/library boundaries (e.g., accidentally converting numpy arrays into Python arrays, and vice versa); a sketch follows this list.
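
Here is a hedged sketch of the kind of inadvertent copying that the copy-volume column flags (the array and sizes are illustrative):

import numpy as np

a = np.random.rand(1_000_000)   # NumPy array, native memory

b = list(a)                     # crosses the library boundary: copies every element
c = np.array(b)                 # converting back copies everything again

d = a * 2                       # stays inside NumPy: vectorized, no Python-side copying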

Other features

  • Scalene can produce reduced profiles (via --reduced-profile) that only report lines that consume more than 1% of CPU or perform at least 100 allocations.
  • Scalene supports @profile decorators to profile only specific functions.
  • When Scalene is profiling a program launched in the background (via &), you can suspend and resume profiling.

Comparison to Other Profilers

Performance and Features

Below is a table comparing the performance and features of various profilers to Scalene.

Performance and feature comparison

  • Slowdown: the slowdown when running a benchmark from the Pyperformance suite. Green means less than 2x overhead. Scalene's overhead is just a 35% slowdown.

Scalene has all of the following features, many of which only Scalene supports:

  • Lines or functions: does the profiler report information only for entire functions, or for every line -- Scalene does both.
  • Unmodified Code: works on unmodified code.
  • Threads: supports Python threads.
  • Multiprocessing: supports use of the multiprocessing library -- Scalene only
  • Python vs. C time: breaks out time spent in Python vs. native code (e.g., libraries) -- Scalene only
  • System time: breaks out system time (e.g., sleeping or performing I/O) -- Scalene only
  • Profiles memory: reports memory consumption per line / function
  • GPU: reports time spent on an NVIDIA GPU (if present) -- Scalene only
  • Memory trends: reports memory use over time per line / function -- Scalene only
  • Copy volume: reports megabytes being copied per second -- Scalene only
  • Detects leaks: automatically pinpoints lines responsible for likely memory leaks -- Scalene only

Output

If you include the --cli option, Scalene prints annotated source code for the program being profiled (as text, JSON (--json), or HTML (--html)), along with any modules it uses in the same directory or subdirectories (you can optionally profile all executed code with --profile-all, and include only files consuming at least a --cpu-percent-threshold of time). Here is a snippet from pystone.py.

Example profile

  • Memory usage at the top: Visualized by "sparklines", memory consumption over the runtime of the profiled code.
  • "Time Python": How much time was spent in Python code.
  • "native": How much time was spent in non-Python code (e.g., libraries written in C/C++).
  • "system": How much time was spent in the system (e.g., I/O).
  • "GPU": (not shown here) How much time spent on the GPU, if your system has an NVIDIA GPU installed.
  • "Memory Python": How much of the memory allocation happened on the Python side of the code, as opposed to in non-Python code (e.g., libraries written in C/C++).
  • "net": Positive net memory numbers indicate total memory allocation in megabytes; negative net memory numbers indicate memory reclamation.
  • "timeline / %": Visualized by "sparklines", memory consumption generated by this line over the program runtime, and the percentages of total memory activity this line represents.
  • "Copy (MB/s)": The amount of megabytes being copied per second (see "About Scalene").

Scalene

The following command runs Scalene on a provided example program.

scalene test/testme.py
Click to see all Scalene's options (available by running with --help)
    % scalene --help
     usage: scalene [-h] [--outfile OUTFILE] [--html] [--reduced-profile]
                    [--profile-interval PROFILE_INTERVAL] [--cpu-only]
                    [--profile-all] [--profile-only PROFILE_ONLY]
                    [--use-virtual-time]
                    [--cpu-percent-threshold CPU_PERCENT_THRESHOLD]
                    [--cpu-sampling-rate CPU_SAMPLING_RATE]
                    [--malloc-threshold MALLOC_THRESHOLD]
     
     Scalene: a high-precision CPU and memory profiler.
     https://github.com/plasma-umass/scalene
     
     command-line:
        % scalene [options] yourprogram.py
     or
        % python3 -m scalene [options] yourprogram.py
     
     in Jupyter, line mode:
        %scrun [options] statement
     
     in Jupyter, cell mode:
        %%scalene [options]
        code...
        code...
     
     optional arguments:
       -h, --help            show this help message and exit
       --outfile OUTFILE     file to hold profiler output (default: stdout)
       --html                output as HTML (default: text)
       --reduced-profile     generate a reduced profile, with non-zero lines only (default: False)
       --profile-interval PROFILE_INTERVAL
                             output profiles every so many seconds (default: inf)
       --cpu-only            only profile CPU time (default: profile CPU, memory, and copying)
       --profile-all         profile all executed code, not just the target program (default: only the target program)
       --profile-only PROFILE_ONLY
                             profile only code in filenames that contain the given strings, separated by commas (default: no restrictions)
       --use-virtual-time    measure only CPU time, not time spent in I/O or blocking (default: False)
       --cpu-percent-threshold CPU_PERCENT_THRESHOLD
                             only report profiles with at least this percent of CPU time (default: 1%)
       --cpu-sampling-rate CPU_SAMPLING_RATE
                             CPU sampling rate (default: every 0.01s)
       --malloc-threshold MALLOC_THRESHOLD
                             only report profiles with at least this many allocations (default: 100)
     
     When running Scalene in the background, you can suspend/resume profiling
     for the process ID that Scalene reports. For example:
     
        % python3 -m scalene [options] yourprogram.py &
      Scalene now profiling process 12345
        to suspend profiling: python3 -m scalene.profile --off --pid 12345
        to resume profiling:  python3 -m scalene.profile --on  --pid 12345

Scalene with Jupyter

Instructions for installing and using Scalene with Jupyter notebooks

This notebook illustrates the use of Scalene in Jupyter.

Installation:

!pip install scalene
%load_ext scalene

Line mode:

%scrun [options] statement

Cell mode:

%%scalene [options]
code...
code...

Installation

Using pip (Mac OS X, Linux, Windows, and WSL2)

Scalene is distributed as a pip package and works on Mac OS X, Linux (including Ubuntu in Windows WSL2) and (with limitations) Windows platforms.

Note

The Windows version currently only supports CPU and GPU profiling, but not memory or copy profiling.

You can install it as follows:

  % pip install -U scalene

or

  % python3 -m pip install -U scalene

You may need to install some packages first.

See https://stackoverflow.com/a/19344978/4954434 for full instructions for all Linux flavors.

For Ubuntu/Debian:

  % sudo apt install git python3-all-dev
Using conda (Mac OS X, Linux, Windows, and WSL2)
  % conda install -c conda-forge scalene

Scalene is distributed as a conda package and works on Mac OS X, Linux (including Ubuntu in Windows WSL2) and (with limitations) Windows platforms.

Note

The Windows version currently only supports CPU and GPU profiling, but not memory or copy profiling.

On ArchLinux

You can install Scalene on Arch Linux via the AUR package. Use your favorite AUR helper, or manually download the PKGBUILD and run makepkg -cirs to build. Note that this will place libscalene.so in /usr/lib; modify the below usage instructions accordingly.

Frequently Asked Questions

Can I use Scalene with PyTest?

A: Yes! You can run it as follows (for example):

python3 -m scalene --- -m pytest your_test.py

Is there any way to get shorter profiles or do more targeted profiling?

A: Yes! There are several options:

  1. Use --reduced-profile to include only lines and files with memory/CPU/GPU activity.
  2. Use --profile-only to include only filenames containing specific strings (as in, --profile-only foo,bar,baz).
  3. Decorate functions of interest with @profile to have Scalene report only those functions.
  4. Turn profiling on and off programmatically by importing the Scalene profiler (from scalene import scalene_profiler) and then turning profiling on and off via scalene_profiler.start() and scalene_profiler.stop(). By default, Scalene runs with profiling on, so to delay profiling until desired, use the --off command-line option (python3 -m scalene --off yourprogram.py).
How do I run Scalene in PyCharm?

A: In PyCharm, you can run Scalene at the command line by opening the terminal at the bottom of the IDE and running a Scalene command (e.g., python -m scalene <your program>). Use the options --cli, --html, and --outfile <your output.html> to generate an HTML file that you can then view in the IDE.
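
For example, a sketch using those options (the output file name is illustrative):

python -m scalene --cli --html --outfile profile.html your_program.py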

How do I use Scalene with Django?

A: Pass in the --noreload option (see #178).

Does Scalene work with gevent/Greenlets?

A: Yes! Put the following code in the beginning of your program, or modify the call to monkey.patch_all as below:

from gevent import monkey
monkey.patch_all(thread=False)
How do I use Scalene with PyTorch on the Mac?

A: Scalene works with PyTorch version 1.5.1 on Mac OS X. There's a bug in newer versions of PyTorch (pytorch/pytorch#57185) that interferes with Scalene (discussion here: #110), but only on Macs.

Technical Information

For details about how Scalene works, please see the following paper, which won the Jay Lepreau Best Paper Award at OSDI 2023: Triangulating Python Performance Issues with Scalene. (Note that this paper does not include information about the AI-driven proposed optimizations.)

To cite Scalene in an academic paper, please use the following:
@inproceedings{288540,
  author    = {Emery D. Berger and Sam Stern and Juan Altmayer Pizzorno},
  title     = {Triangulating Python Performance Issues with {S}calene},
  booktitle = {{17th USENIX Symposium on Operating Systems Design and Implementation (OSDI 23)}},
  year      = {2023},
  isbn      = {978-1-939133-34-2},
  address   = {Boston, MA},
  pages     = {51--64},
  url       = {https://www.usenix.org/conference/osdi23/presentation/berger},
  publisher = {USENIX Association},
  month     = jul
}

Success Stories

If you use Scalene to successfully debug a performance problem, please add a comment to this issue!

Acknowledgements

Logo created by Sophia Berger.

This material is based upon work supported by the National Science Foundation under Grant No. 1955610. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

scalene's People

Contributors

ahaberlach, barseghyanartur, birdi7, boegel, cclauss, dependabot[bot], dgiger42, drjsmith, eamanu, emeryberger, groutr, iakremnev, insutanto, jaltmayerpizzorno, javad94, jeremiah-england, johanvergeer, laurents, lgtm-migrator, light4, nixjdm, purificant, ruro, sarahec, snoopj, spctr, sternj, superbobry, taoufik07, ttsugriy


scalene's Issues

pprofile style context profiling

I'm usually not interested in profiling an entire program; I'm more interested in profiling some hotspot, or just some new piece of code.

pprofile allows for just profiling a specific region of code (from pprofile's main page):

import pprofile
import time

def someOtherHotSpotCallable():
    # Statistical profiler
    prof = pprofile.StatisticalProfile()
    with prof(
        period=0.001,  # Sample every 1 ms
        single=True,   # Only sample the current thread
    ):
        time.sleep(1)  # code to profile goes here
    prof.print_stats()

It would be nice if scalene allowed for such granularity.

warning by running make

I get this warning when running make, and make fails (Ubuntu 18.04):

clang++ -std=c++17 -flto -g -ffast-math -fno-builtin-malloc -O3 -DNDEBUG -fvisibility=hidden -D'CUSTOM_PREFIX(x)=xx##x' -I/usr/include/nptl -fno-builtin-malloc -pipe -fPIC -I. -I./include -IHeap-Layers -IHeap-Layers/wrappers -IHeap-Layers/utility -D_REENTRANT=1 -shared libscalene.cpp Heap-Layers/wrappers/gnuwrapper.cpp -Bsymbolic -o libscalene.so -ldl -lpthread
Heap-Layers/wrappers/gnuwrapper.cpp:55:9: warning: 'CUSTOM_PREFIX' macro redefined [-Wmacro-redefined]
#define CUSTOM_PREFIX(x) custom##x
        ^
<command line>:2:9: note: previous definition is here
#define CUSTOM_PREFIX(x) xx##x
        ^
1 warning generated.

Scalene not working on Windows because ITIMER_PROF in signal is not supported

Hi,

For your information, I would like to report that scalene is not working on Windows 10. I get the following error message when I try running it:

  Traceback (most recent call last):
  File "C:\...\anaconda3\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\...\anaconda3\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\...\anaconda3\lib\site-packages\scalene\__main__.py", line 1, in <module>
    from scalene import scalene
  File "C:\...\anaconda3\lib\site-packages\scalene\scalene.py", line 239, in <module>
    scalene.main()
  File "C:\...\anaconda3\lib\site-packages\scalene\scalene.py", line 218, in main
    profiler = scalene(os.path.join(program_path, os.path.basename(sys.argv[0])))
  File "C:\...\anaconda3\lib\site-packages\scalene\scalene.py", line 75, in __init__
    signal.signal(signal.SIGPROF, self.cpu_signal_handler)
AttributeError: module 'signal' has no attribute 'SIGPROF'
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "C:\...\anaconda3\lib\site-packages\scalene\scalene.py", line 199, in exit_handler
    scalene.disable_signals()
  File "C:\...\anaconda3\lib\site-packages\scalene\scalene.py", line 192, in disable_signals
    signal.signal(signal.ITIMER_PROF, signal.SIG_IGN)
AttributeError: module 'signal' has no attribute 'ITIMER_PROF'

As reported here, apparently signal is not fully supported on Windows.

Segfault with CPU+Memory profiling

Trying out the memory profiler, but I get a segfault with benchmarks/julia1_nopil.py. I modified the benchmark to run a little longer; when running the benchmark with no modification, it succeeds.

❯ make
clang++ -std=c++14 -g -ffast-math -fno-builtin-malloc -O3 -DNDEBUG  -D'CUSTOM_PREFIX(x)=xx##x' -I/usr/include/nptl -fno-builtin-malloc -pipe -fPIC -I. -I./include -IHeap-Layers -IHeap-Layers/utility -D_REENTRANT=1 -shared libscalene.cpp Heap-Layers/wrappers/gnuwrapper.cpp -Bsymbolic -o libscalene.so -ldl -lpthread
Heap-Layers/wrappers/gnuwrapper.cpp:55:9: warning: 'CUSTOM_PREFIX' macro redefined [-Wmacro-redefined]
#define CUSTOM_PREFIX(x) custom##x
        ^
<command line>:2:9: note: previous definition is here
#define CUSTOM_PREFIX(x) xx##x
        ^
1 warning generated.

~/projects/scalene fix_usage*
❯ LD_PRELOAD=$PWD/libscalene.so PYTHONMALLOC=malloc python -m scalene benchmarks/julia1_nopil.py
Length of x: 2000
Total elements: 4000000
Scalene: Memory exhausted: sz = 32
[1]    1376452 segmentation fault (core dumped)  LD_PRELOAD=$PWD/libscalene.so PYTHONMALLOC=malloc python -m scalene

Config:
clang version 9.0.1
Target: x86_64-pc-linux-gnu (archlinux)
Python 3.8.1
scalene master (0ef5467)

How to explain these memory behaviors when using numpy?

When running a very simple script based on numpy functions, we can get the following results:

test2.py: % of CPU time = 100.00% out of   3.59s.
  	 |     CPU % |     CPU % | Avg memory  | Memory      |
  Line	 |  (Python) |  (native) | growth (MB) | usage (%)   | [test2.py]
--------------------------------------------------------------------------------
     1	 |           |           |             |             | import numpy as np
     2	 |           |           |             |             |
     3	 |     0.30% |    48.40% |         -80 |       1.03% | x = np.array(range(10**7))
     4	 |     0.59% |    50.72% |           0 |      98.97% | np.array(np.random.uniform(0, 100, size=10**8))
     5	 |           |           |             |             |

How can we get:

  • A negative memory growth for the first line?
  • A null memory growth on the second line?

System info

  • Platform : Mac OS X
  • Python: 3.7 (brew)
  • Numpy: 1.16.4

make failed with commit hash 40b21441cd42d95b9123d4d2dbf230161e0b3ba9

I ran into make errors (link errors) when using the latest code:
commit 40b2144
Author: emeryberger [email protected]
Date: Sun Apr 26 13:35:12 2020 -0400

Added fast memcpy implementation.

The error message is about LLVMgold.so: linking failed because this LLVM plugin is not installed.
My workaround is to change the first line of heaplayers-make.mk
from: CPPFLAGS = -std=c++17 -flto -g -ffast-math ......
to:   CPPFLAGS = -std=c++17 -g -ffast-math ......

i.e., remove -flto.

Not getting CPU-usage profiling results

Hi,
I installed the scalene package using pip install scalene within my virtualenv. On running it (scalene main.py), I get memory stats but not the CPU-usage ones.

# Some Log....
Line #    Mem usage    Increment   Line Contents
================================================
    35    237.4 MiB    237.4 MiB   @profile()
    36                             def linearRegressionfit(Xt,Yt,Xts,Yts):
    37    237.4 MiB      0.0 MiB       lr=LinearRegression()
    38    241.0 MiB      3.5 MiB       model=lr.fit(Xt,Yt)
    39    241.0 MiB      0.0 MiB       predict=lr.predict(Xts)
    40                             
    41    241.0 MiB      0.0 MiB       print("train Accuracy",lr.score(Xt,Yt))
    42    241.0 MiB      0.0 MiB       print("test Accuracy",lr.score(Xts,Yts))

# More log.....

Scalene: Program did not run for long enough to profile.

Steps to Reproduce

  1. Create a Python script:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestRegressor
import numpy as np

from memory_profiler import profile
#fp=open('memory_profile.log','w+')
@profile()
def import_data():
    data=pd.read_csv("kc_house_data.csv")
    data=data.drop(["id","date"],axis=1)
    data["sqft_above"].fillna(1788.39,inplace=True)
    return data
    #data.describe()

@profile()
def parse_data(data):
    Y=data["price"].values
    Y=np.log(Y)
    features=data.columns
    X1=list(set(features)-set(["price"]))
    X=data[X1].values
    ss=StandardScaler()
    X=ss.fit_transform(X)
    return X,Y

@profile()
def linearRegressionfit(Xt,Yt,Xts,Yts):
    lr=LinearRegression()
    model=lr.fit(Xt,Yt)
    predict=lr.predict(Xts)

    print("train Accuracy",lr.score(Xt,Yt))
    print("test Accuracy",lr.score(Xts,Yts))

@profile()
def randForestRegressorfit(Xt,Yt,Xts,Yts):
    regr = RandomForestRegressor(n_estimators=100,max_features='auto',max_depth=80 ,min_samples_leaf=1
                                 ,min_samples_split=2,random_state=0)
    model=regr.fit(Xt,Yt)
    predict=regr.predict(Xts)
    print("train Accuracy : ",regr.score(Xt,Yt))
    print("test Accuracy : ",regr.score(Xts,Yts))


if __name__ == '__main__':
    data = import_data()
    X, Y = parse_data(data)
    Xt,Xts,Yt,Yts=train_test_split(X,Y,test_size=0.4,random_state=0)
    linearRegressionfit(Xt,Yt,Xts,Yts)

    randForestRegressorfit(Xt,Yt,Xts,Yts)
    linearRegressionfit(Xt,Yt,Xts,Yts)
  2. Get the dataset from kc_house_data.zip

  3. Run the code: $ scalene main.py

What could be the reason?

Table shows misleading results for Yappi

Hi Emery,

I am the author of Yappi, and I came across your work by chance. It seems like a nice project.

I have a simple request if possible: the main page shows Yappi with an 18x slowdown whereas cProfile is 2x, but the real issue is that Yappi profiles CPU time by default while cProfile profiles wall time by default. This makes a 10x difference, since reading HW clocks is an expensive operation. To make the comparison fair, I would kindly request that you profile the same application (julia_nopil.py) with clock_type set to wall. Here is the command I used to do that; the results indicate only a 2x-2.5x slowdown, roughly similar to cProfile (which should be the case). Currently it seems like it is slower even than line_profiler :)

> yappi -c wall julia_nopil.py
playground Β» python julia_nopil.py                                      
Length of x: 1000
Total elements: 1000000
calculate_z_serial_purepython took 19.6705131531 seconds
Total sum of elements (for validation): 33219980

Thanks!

[CentOS 7] Throw PermissionError at CPU Only mode

Scalene throws a PermissionError in CPU-only mode when I run testme.py without sudo:

====================================

Scalene: An exception of type PermissionError occurred. Arguments:
(13, 'Permission denied')
Traceback (most recent call last):
  File "/data/project/py_venv/scalene/lib/python3.7/site-packages/scalene/scalene.py", line 1382, in main
    if profiler.output_profiles():
  File "/data/project/py_venv/scalene/lib/python3.7/site-packages/scalene/scalene.py", line 1309, in output_profiles
    console.save_text("/dev/stdout", styles=True)
  File "/data/project/py_venv/scalene/lib/python3.7/site-packages/rich/console.py", line 967, in save_text
    with open(path, "wt", encoding="utf-8") as write_file:
PermissionError: [Errno 13] Permission denied: '/dev/stdout'

Using memory profiler on linux hangs and uses 100% CPU

Hi, I am trying to profile memory usage using Scalene, but attempting to run it under Python 3.8 on Ubuntu 20.04 with the instructions provided, i.e. LD_PRELOAD=$PWD/libscalene.so PYTHONMALLOC=malloc python3 -m scalene testy.py, just hangs.

See below for a reproducible example:

Dockerfile:

FROM ubuntu:focal

RUN apt-get update && apt-get install git clang python python3-pip python3-dev python3-numpy -y
RUN git clone https://github.com/emeryberger/scalene && cd scalene && make && python3 setup.py install

WORKDIR /scalene

Build image, run image and run test:

docker build -t scalene-cpu .
docker run -it scalene-cpu bash
# inside docker
scalene test/testme.py  # works fine
LD_PRELOAD=$PWD/libscalene.so PYTHONMALLOC=malloc scalene test/testme.py  # runs forever

Is it because it is running under clang v10 and/or python 3.8?

Scalene on Raspberry Pi error on ld.so

I'm trying to analyze a pygame application with Scalene and got this error at startup:

ERROR: ld.so: object '/home/pi/.local/lib/python3.7/site-packages/scalene/libscalene.so' from LD_PRELOAD cannot be preloaded (wrong ELF class: ELFCLASS64): ignored.

Looks like it's because it's a 32-bit filesystem.

When the program starts, it shows only a black screen with the dimensions defined in the program, but that's it. I know it is running, because I can see the prints at the console. No output files either.

Arguments support for profiled program.

It seems that it is not possible to provide arguments to the profiled program, if I am correct.
This is in my opinion a big issue, but easy to handle I guess :)

I Tried:
python3 -m scalene testpy.py argument1 argument2
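
Note that the --- separator documented in the command-line options above covers this case; for example:

python3 -m scalene --- testpy.py argument1 argument2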

Scalene doesn't work with pyperformance (direct execution of individual benchmarks)

To reproduce:

% pip install pyperformance
% git clone https://github.com/python/pyperformance.git
% cd pyperformance/pyperformance/benchmarks
% scalene bm_pyflate.py 
Scalene: could not find input file.
Scalene: Program did not run for long enough to profile.

Digging in, it appears that when bm_pyflate invokes runner, it creates a process that somehow loses the original arguments, leading Scalene to think it is trying to run the program "6".

Segmentation fault using tensorflow or pytorch

Just out of curiosity, I tried to profile a TensorFlow and a PyTorch script using Scalene but got segmentation faults for both.
The Python scripts come from the TensorFlow and PyTorch tutorials.

Environment:

python: 3.7
tensorflow: 2.2
pytorch: 1.5
scalene: installed using homebrew
System: MacOS Catalina version 10.15.5

Below are the details to reproduce the error:

tensorflow

  • python script:
import tensorflow as tf
mnist = tf.keras.datasets.mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(10)
])

predictions = model(x_train[:1]).numpy()
print("predictions", predictions)

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
model.compile(optimizer='adam',
              loss=loss_fn,
              metrics=['accuracy'])

model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test,  y_test, verbose=2)
  • I can successfully execute the script:
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11493376/11490434 [==============================] - 0s 0us/step
2020-07-11 22:48:31.990954: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-07-11 22:48:32.014027: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7fd41b7a03b0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-07-11 22:48:32.014048: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
predictions [[-0.27964824  0.78479844 -0.39851144  0.14115062  0.09268872 -0.1322664
   0.04173797 -0.04924813 -0.10641377  0.1781306 ]]
Epoch 1/5
1875/1875 [==============================] - 1s 717us/step - loss: 0.3024 - accuracy: 0.9126
Epoch 2/5
1875/1875 [==============================] - 1s 714us/step - loss: 0.1416 - accuracy: 0.9576
Epoch 3/5
1875/1875 [==============================] - 1s 708us/step - loss: 0.1070 - accuracy: 0.9674
Epoch 4/5
1875/1875 [==============================] - 1s 699us/step - loss: 0.0881 - accuracy: 0.9731
Epoch 5/5
1875/1875 [==============================] - 1s 708us/step - loss: 0.0749 - accuracy: 0.9766
313/313 - 0s - loss: 0.0748 - accuracy: 0.9766
  • Profile using scalene gives segmentation fault.
(base) ➜  scalene git:(master) βœ— scalene ./test/tf-keras.py 
2020-07-11 22:49:21.339374: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-07-11 22:49:21.370759: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x11b1b4630 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-07-11 22:49:21.370778: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
predictions [[-0.1633205  -0.22706667  0.58521605  0.357562   -0.51517636  0.45471746
  -0.10387493  0.41047204 -0.26368517  0.10465179]]
Epoch 1/5
/usr/local/bin/scalene: line 3: 24592 Segmentation fault: 11  DYLD_INSERT_LIBRARIES=/usr/local/Cellar/libscalene/HEAD-a49f5ca/lib/libscalene.dylib PYTHONMALLOC=malloc python3 -m scalene "$@" 

Similarly for the PyTorch script

  • python script:
# -*- coding: utf-8 -*-
import random
import torch


class DynamicNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        """
        In the constructor we construct three nn.Linear instances that we will use
        in the forward pass.
        """
        super(DynamicNet, self).__init__()
        self.input_linear = torch.nn.Linear(D_in, H)
        self.middle_linear = torch.nn.Linear(H, H)
        self.output_linear = torch.nn.Linear(H, D_out)

    def forward(self, x):
        """
        For the forward pass of the model, we randomly choose either 0, 1, 2, or 3
        and reuse the middle_linear Module that many times to compute hidden layer
        representations.

        Since each forward pass builds a dynamic computation graph, we can use normal
        Python control-flow operators like loops or conditional statements when
        defining the forward pass of the model.

        Here we also see that it is perfectly safe to reuse the same Module many
        times when defining a computational graph. This is a big improvement from Lua
        Torch, where each Module could be used only once.
        """
        h_relu = self.input_linear(x).clamp(min=0)
        for _ in range(random.randint(0, 3)):
            h_relu = self.middle_linear(h_relu).clamp(min=0)
        y_pred = self.output_linear(h_relu)
        return y_pred


# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold inputs and outputs
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

# Construct our model by instantiating the class defined above
model = DynamicNet(D_in, H, D_out)

# Construct our loss function and an Optimizer. Training this strange model with
# vanilla stochastic gradient descent is tough, so we use momentum
criterion = torch.nn.MSELoss(reduction='sum')
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)
for t in range(500):
    # Forward pass: Compute predicted y by passing x to the model
    y_pred = model(x)

    # Compute and print loss
    loss = criterion(y_pred, y)
    if t % 100 == 99:
        print(t, loss.item())

    # Zero gradients, perform a backward pass, and update the weights.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

  • The script can run successfully
(base) ➜  scalene git:(master) βœ— python ./test/torch-dynamic-model.py 
99 38.210121154785156
199 0.7706254720687866
299 2.6024699211120605
399 0.5532416701316833
499 0.3656597137451172
  • But I got segmentation fault when profiling the script
(base) ➜  scalene git:(master) βœ— scalene test/torch-dynamic-model.py 
/usr/local/bin/scalene: line 3: 23709 Segmentation fault: 11  DYLD_INSERT_LIBRARIES=/usr/local/Cellar/libscalene/HEAD-a49f5ca/lib/libscalene.dylib PYTHONMALLOC=malloc python3 -m scalene "$@"

consider reporting memory growth and allocation operations separately

Scalene currently reports memory growth (mallocs - frees). This reporting can be misleading because a single line could allocate a lot of memory and then free it (e.g., in some numpy operations); if mallocs = frees, nothing will be reported. Since frequent memory allocations and deallocations could point to a performance problem (inadvertent creation of huge temporary data structures), this is unfortunate.
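
For example, here is a hedged sketch of the effect (array size illustrative):

import numpy as np

a = np.ones(50_000_000)  # ~400 MB of doubles

# (a + 1) allocates a ~400 MB temporary, .sum() reduces it, and the
# temporary is freed on the same line: mallocs ≈ frees, so reported
# growth is ~0 even though allocation activity was enormous.
total = (a + 1).sum()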

At the cost of adding an additional column, Scalene could report usage (that is, allocation activity: mallocs + frees). Reporting usage would make the situation described above visible, and would let programmers distinguish between memory usage and memory growth, which are two very different kinds of problems.

(Implementation is trivial.)

Profiling multiple files

Hi, I might be missing something obvious, but is there an option to profile code from multiple files, or at least to choose the file to profile?

Say I have three files.
main.py:

import fast
import slow

def main():
	i = 5000
	s = slow.run(i)
	print(s)
	s = fast.run(i)
	print(s)

if __name__ == "__main__":
    main()

slow.py:

def run(i):
    s = 0
    for j in range(i):
        for k in range(j):
            s +=1
    return s

fast.py:

def run(i):
    s = i * (i - 1) // 2
    return s

When I run python -m scalene main.py I get this:

slow.py: % of CPU time = 100.00% out of   0.35s.
  	 |     CPU % |     CPU % |   
  Line	 |  (Python) |  (native) |  [slow.py]
--------------------------------------------------------------------------------
     1	 |           |           | def run(i):
     2	 |           |           |     s = 0
     3	 |           |           |     for j in range(i):
     4	 |    11.69% |           |         for k in range(j):
     5	 |    87.68% |     1.80% |             s +=1
     6	 |           |           |     return s

Is there a way to profile main.py and fast.py as well as slow.py?
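
For what it's worth, the options documented above appear to cover this; for example:

scalene --profile-all main.py                  # profile all executed code
scalene --profile-only main,fast,slow main.py  # or restrict to matching filenames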

crashes on eval/compile

There are some problems with code which use eval/compile:

import attr

while True:
    @attr.s
    class T:
        x = attr.ib(default=666)
$ python -m scalene issue.py
File "/home/poh/.pyenv/versions/3.7.2/lib/python3.7/site-packages/attr/_make.py", line 245, in _make_attr_tuple_class
    eval(compile("\n".join(attr_class_template), "", "exec"), globs)
  File "", line 1, in <module>
  File "/home/poh/.pyenv/versions/3.7.2/lib/python3.7/site-packages/scalene/scalene.py", line 91, in cpu_signal_handler
    if not scalene.should_trace(fname):
  File "/home/poh/.pyenv/versions/3.7.2/lib/python3.7/site-packages/scalene/scalene.py", line 129, in should_trace
    if filename[0] == '<':
IndexError: string index out of range

attrs is a very famous library. Supporting it is a must-have.

add discussion of other profilers mentioned on Hacker News to README.md

https://github.com/benfred/py-spy

  • generates line-level execution time details in form of flamegraphs
  • requires root on OS X
  • no memory profiling

https://github.com/vpelletier/pprofile

  • statistical mode comparable to scalene for execution time, but doesn't include time or percent
  • no memory profiling

https://pyflame.readthedocs.io/en/latest/installation.html

  • doesn't work on many platforms
  • need to test on Linux to measure overhead
  • AFAICT no memory profiling

Contributor documentation

Is your feature request related to a problem? Please describe.
I would like to contribute to Scalene, but I cannot find any documentation on how to contribute. The Python part is easy, but the rest confuses me quite a bit.

Describe the solution you'd like

  • Add a Contribute chapter to the README
  • Describe how to set up a Python environment
  • Describe how to run all the tests
  • Describe how to install and use the package locally. (e.g. pip install -e scalene)

pytest plugin

Is your feature request related to a problem? Please describe.
We already have a big test suite that could be used directly to profile specific functionality.
I have not found a way to run Scalene with Pytest.

Describe the solution you'd like
It would be really cool if a pytest plugin could be created that allows us to run a (set of) tests with Scalene running in the background. I can think of two solutions:

The first solution is to run pytest through Scalene

scalene pytest test_mod.py

The second solution is to add a flag to pytest that runs Scalene.

pytest --scalene test_mod.py

The latter would have my preference, since that would allow us to have more fine-grained tests and it would allow us to add test configurations in an IDE (like PyCharm).

Describe alternatives you've considered

At this moment, the only alternative that I could come up with is to create a separate script that runs the code just for profiling.
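
For reference, the first variant works with current Scalene via the --- separator shown in the FAQ above:

python3 -m scalene --- -m pytest test_mod.py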

libscalene.so exports too many symbols

A library intended for LD_PRELOAD use must take care not to pollute the global symbol namespace with symbols that might collide with ones the program actually wants to use. libscalene exports its internal symbols, and it shouldn't do that.

$ nm --demangle -D ~/software/scalene/libscalene.so  | grep -v ' [uUwW] '
0000000000002ec0 T customaligned_alloc
0000000000002ca0 T customcalloc
0000000000002c90 T customcfree
0000000000002c80 T customfree
0000000000003020 T customgetcwd
0000000000003100 T custommallinfo
0000000000002c70 T custommalloc
0000000000002f60 T custommalloc_good_size
0000000000002f50 T custommalloc_usable_size
00000000000030b0 T custommallopt
0000000000002d90 T custommemalign
0000000000002e20 T customposix_memalign
0000000000003230 T custompvalloc
0000000000002cf0 T customrealloc
0000000000003290 T customrecalloc
0000000000002fe0 T customstrdup
0000000000002f90 T customstrndup
00000000000031e0 T customvalloc
0000000000006120 D __malloc_initialize_hook
0000000000002480 T xxfree
00000000000025a0 T xxfree_sized
00000000000023d0 T xxmalloc
00000000000030e0 T xxmalloc_GET_STATE
0000000000002770 T xxmalloc_lock
00000000000030f0 T xxmalloc_SET_STATE
00000000000030d0 T xxmalloc_STATS
00000000000030c0 T xxmalloc_TRIM
0000000000002780 T xxmalloc_unlock
0000000000002760 T xxmalloc_usable_size
0000000000002340 T getTheCustomHeap()
00000000000031d0 T operator delete[](void*)
0000000000003160 T operator delete(void*)
0000000000003180 T operator new[](unsigned long)
00000000000031c0 T operator new[](unsigned long, std::nothrow_t const&)
0000000000003120 T operator new(unsigned long)
0000000000003170 T operator new(unsigned long, std::nothrow_t const&)
0000000000006278 V RepoSource<4096>::getSource()::head

Scalene changes to directory of program being profiled

Thanks for releasing this tool!

I was trying to profile a program that happened to do a little bit of file I/O, accessing files relative to the current working directory, and spent a couple minutes being confused why the program just died (without printing out any error messages!) when profiled with Scalene before giving up.

Returning to this now, I was able to profile my program if I modify my program to always operate on absolute file paths instead of relative ones. I'm guessing the issue is because Scalene moves into the directory of the program being profiled here, before running the program:
https://github.com/emeryberger/scalene/blob/7d059652614c91baddb9f0f33f2747a07518c364/scalene/scalene.py#L926
It appears that when the program fails to find a file to open while being profiled, the error is silent, which made it hard for me to initially guess what the problem was.

Is it possible to avoid the change of working directory, or at least add some documentation warning about this?
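
Until that changes, a workaround along the lines described above is to resolve paths against the script's own location rather than the working directory; a minimal sketch (file names are hypothetical):

import os

# Resolve data files relative to this script's directory instead of the
# current working directory, which Scalene changes before running the program.
HERE = os.path.dirname(os.path.abspath(__file__))
data_path = os.path.join(HERE, "data", "input.txt")  # hypothetical input file

with open(data_path) as f:
    contents = f.read()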

Collect success stories!

Inspired by this issue on a different project, I'd love to hear stories from people who have successfully used Scalene. Did you use it to fix a performance problem, excessive memory consumption, or a leak? (Or something else?) What kind of performance problem? How did Scalene help? Your stories will help guide the development of new features, and also brighten my day!

Strange behavior if script is in a subdirectory

Trying this on a project which has a runner script which tries to find an "engine" (essentially a module) in various different places to support both running from an installed location and from inside the source tree for development. scalene is doing something odd behind the scenes which breaks this project's logic. Quick example to illustrate:

scripts/foo.py:

import os

print(__file__)
print(os.path.realpath(__file__))

Results:

$ python3 scripts/foo.py
scripts/foo.py
/tmp/scripts/foo.py   # <- as expected
$ python3 -m scalene scripts/foo.py
scripts/foo.py
/tmp/scripts/scripts/foo.py  # <- the extra /scripts is problematic

make error of Heap-Layers

Heap-Layers/wrappers/gnuwrapper.cpp:55:9: warning: 'CUSTOM_PREFIX' macro redefined
[-Wmacro-redefined]
#define CUSTOM_PREFIX(x) custom##x
^
<command line>:2:9: note: previous definition is here
#define CUSTOM_PREFIX(x) xx##x
^
1 warning generated.

Possible to track code run via PyEval_CallObject?

PyEval_CallObject can be used to invoke python code from C. Seems scalene currently can't profile code run this way. Is this a fundamental limit, or can this functionality be added relatively easily? The use case I'm looking at is profiling tf.py_func calls in tensorflow.

signal.ItimerError: [Errno 22] Invalid argument

After installing scalene from pip, version 0.8.2, I get the following error:

(base) alberto@serenity:~$ python -m scalene scalene/test/testme.py
Scalene: An exception of type ItimerError occurred. Arguments:
(22, 'Invalid argument')
Traceback (most recent call last):
  File "/home/alberto/miniconda3/lib/python3.7/site-packages/scalene/scalene.py", line 712, in main
    profiler.start()
  File "/home/alberto/miniconda3/lib/python3.7/site-packages/scalene/scalene.py", line 488, in start
    Scalene.enable_signals()
  File "/home/alberto/miniconda3/lib/python3.7/site-packages/scalene/scalene.py", line 196, in enable_signals
    signal.setitimer(Scalene.cpu_timer_signal, Scalene.mean_signal_interval, Scalene.mean_signal_interval)
signal.ItimerError: [Errno 22] Invalid argument

Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/home/alberto/miniconda3/lib/python3.7/site-packages/scalene/scalene.py", line 656, in exit_handler
    Scalene.disable_signals()
  File "/home/alberto/miniconda3/lib/python3.7/site-packages/scalene/scalene.py", line 648, in disable_signals
    signal.setitimer(Scalene.cpu_timer_signal, 0)
signal.ItimerError: [Errno 22] Invalid argument

I'm running Python 3.7.4 in Ubuntu 16.04.

Homebrew formula

Hi, thanks for the great tool! I created a Homebrew formula to make installation and usage on macOS more convenient; it builds and installs libscalene.dylib along with the wrapper script:

#!/usr/bin/env sh

DYLD_INSERT_LIBRARIES=/usr/local/Cellar/libscalene/HEAD-e3cdcb7/lib/libscalene.dylib PYTHONMALLOC=malloc python -m scalene "$@"

Formula can be installed with:

$ brew tap antonalekseev/tap
$ brew install --head libscalene

and then used as simple as $ scalene test/testme.py

If you don't mind it can be placed in this repository in Formula directory in order to make it available for tapping with $ brew tap emeryberger/scalene, or just mentioned in the README with my tap as an installation option.

Problem running scalene with argparse

I have a program that uses argparse to allow for arguments into the program. Anyway, I tried using scalene a couple different ways but it seems to just hang. Is there a way to get this to work?

example:

scalene /path/to/python_program.py -r /path/to/input1 -o /path/to/output --extra-flag

I tried the above after making the program executable (in linux). I also tried:

scalene python /path/to/python_program.py -r /path/to/input1 -o /path/to/output --extra-flag

segmentation fault

[1] 4103 segmentation fault LD_PRELOAD=$PWD/libscalene.so PYTHONMALLOC=malloc scalene test/testme.py

Core Dump in CPU+MEM Conda install running PyTorch

I have conda installed with python 3.7 on a custom SUSE linux distro. I am trying to use your tool on this simple test script:
https://github.com/jtchilders/deephyper_pytorch_layers/blob/master/conv3d/conv3d_run.py
which runs a PyTorch layer. My installation of PyTorch uses Intel's MKL math library for acceleration on Intel chips.

I cloned your repo, ran make using GCC 8.3.0, ran python setup.py build, then added the build/lib path to my PYTHONPATH so python could find scalene. When I run:
LD_PRELOAD=/path/to/libscalene.so PYTHONMALLOC=malloc python -m scalene conv3d_run.py I get a SIGSEGV core-dump.

I'm not sure the GDB output will be useful, but I'll include it:

Core was generated by `python -m scalene conv3d_run.py'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x000055d24e6b8b46 in PyNumber_InPlaceOr ()
Missing separate debuginfos, use: zypper install glibc-debuginfo-2.22-62.22.5.x86_64

Provide a way to give arguments to tested script

Hi! Looks like a great project, thanks for it! I tested it tonight, and found something that could be enhanced. (Sorry, no PR at the moment, I have to go to sleep now :p).

I may have missed it, but I found no way to pass arguments to the tested script; even argv[0] is not given.

For example, given this file:

import argparse

parser = argparse.ArgumentParser()
parser.add_argument("foo")
print(parser.parse_args())

I'd like the ability to run:

$ python -m scalene the_file.py an_argument

Bests.

Currently with just python -m scalene the_file.py argparse is whining there's not even an argv[0], and if I try to add an_argument scalene complains about the argument not being recognized.

I bet a sane path would be to stick to the cProfile command line arguments:

Usage: cProfile.py [-o output_file_path] [-s sort] [-m module | scriptfile] [arg] ...

So one can easily jump from a tool to another.

Minimum execution time?

Hello, I'm trying to profile a script, but when calling python3 scalene script.py it says "Program did not run for long enough to profile." Unix's time command says it runs for roughly ~96 ms, so my question is: what is the minimum time a program must run before it can be profiled by Scalene? Or maybe I'm doing something wrong? Thanks

crash when used on complex project

I had a few threads active, and can't share the code unfortunately. I think this is the relevant python callstack, let me know what other information I can provide.

This was on amazonlinux:2 docker container, w/ python 3.8.1

orchestrator_1  | Fatal Python error: Unreachable C code path reached
orchestrator_1  | Python runtime state: initialized
orchestrator_1  | 
orchestrator_1  | Thread 0x00007fecf6fc0700 (most recent call first):
orchestrator_1  |   File "/pyenv/test/httplib2/__init__.py", line 194 in _build_ssl_context
orchestrator_1  |   File "/pyenv/test/httplib2/__init__.py", line 1238 in __init__
orchestrator_1  |   File "/pyenv/test/httplib2/__init__.py", line 1758 in request
orchestrator_1  |   File "/py/pyenv/venv_run.sh: line 24:    12 Segmentation fault      "$@"

Support `console_scripts`

I tried running scalene on one of my CLI programs. It didn't go well:

  • I tried running python -m scalene --help; this does not work
  • I generally write my CLI programs to launch console_scripts entry points, which doesn't seem supported.
  • I added an explicit if __name__ == "__main__" to my CLI program and scalene no longer aborted; however it also didn't produce any output beyond what my program already produced.

Any guidance on proper usage would be appreciated.

Memory profiling keeps on running forever while only cpu profiling doesn't

Hello, this is the first issue I've ever written, so I would appreciate any criticism of my writing and problem-presentation style.

I'm on Linux Mint 9. I have just installed Scalene using pip3 install scalene and created a simple Python script called target.py:

import numpy as np

def main():
    x = np.array(range(10 ** 7))
    y = np.array(np.random.uniform(0, 100, size=(10 ** 8)))

main() 

Now, if I run: scalene target.py --cpu-only or python3 -m scalene target.py --cpu-only I get the following:

                                         target.py: % of CPU time = 100.00% out of   4.38s.                                         
       ╷        ╷        ╷
  Line │CPU %   │CPU %   │
       │Python  │native  │target.py
╺━━━━━━┿━━━━━━━━┿━━━━━━━━┿━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸
     1 │   7.1% │  17.2% │import numpy as np
     2 │        │        │
     3 │        │        │def main():
     4 │   0.2% │  38.0% │    x = np.array(range(10**7))
     5 │   0.5% │  37.0% │    y = np.array(np.random.uniform(0, 100, size=(10**8)))
     6 │        │        │
     7 │        │        │main()

Whereas if I run scalene target.py or python3 -m scalene target.py (that is, CPU AND memory profiling), the console doesn't display anything and keeps "waiting". I can type stuff and it will display, but the process keeps running until I stop it with a keyboard interrupt (Ctrl + C) and get this:

^CTraceback (most recent call last):
  File "/home/edo/.local/bin/scalene", line 5, in <module>
    from scalene.__main__ import main
  File "/home/edo/.local/lib/python3.6/site-packages/scalene/__main__.py", line 1, in <module>
    from scalene import scalene
  File "/home/edo/.local/lib/python3.6/site-packages/scalene/scalene.py", line 261, in <module>
    result = subprocess.run(args)
  File "/usr/lib/python3.6/subprocess.py", line 425, in run
    stdout, stderr = process.communicate(input, timeout=timeout)
  File "/usr/lib/python3.6/subprocess.py", line 855, in communicate
    self.wait()
  File "/usr/lib/python3.6/subprocess.py", line 1477, in wait
    (pid, sts) = self._try_wait(0)
  File "/usr/lib/python3.6/subprocess.py", line 1424, in _try_wait
    (pid, sts) = os.waitpid(self.pid, wait_flags)
KeyboardInterrupt

Scalene doesn't make use of exit codes

I'd expect that scalene <SOMETHING> returns an error code to the calling shell.
Instead it always returns 0.
This is also an issue when trying to set up CI for the scalene build itself since the scalene <SOMETHING> succeeds even when there is something wrong with building libscalene.so.

AttributeError: type object 'scalene' has no attribute 'current_footprint'

When I run the test file with both CPU & memory profiling,
LD_PRELOAD=$PWD/libscalene.so PYTHONMALLOC=malloc python -m scalene test/testme.py
this error occurs:
scalene: An exception of type AttributeError occurred. Arguments:
("type object 'scalene' has no attribute 'current_footprint'",)
Traceback (most recent call last):
  File "/work/scalene/scalene/scalene.py", line 261, in main
    exec(code, the_globals)
  File "test/testme.py", line 7, in <module>
    arr = [i for i in range(1,1000)]
  File "test/testme.py", line 7, in <listcomp>
    arr = [i for i in range(1,1000)]
  File "/work/scalene/scalene/scalene.py", line 153, in free_signal_handler
    scalene.current_footprint -= 1
AttributeError: type object 'scalene' has no attribute 'current_footprint'
