Comments (7)
Related Stack Overflow answer: https://stackoverflow.com/questions/32123475/profiling-builds-with-stack
TL;DR: build the system with `stack build --profile`, then run it with `stack exec -- <path-to-penrose-binary> +RTS -p`.
It seems that once we run `stack build --profile`, going back to a plain `stack build` is quite costly, because the libraries have to be rebuilt, which, as we all know, takes about 20 minutes.
- TODO: find a way to build libs with profiling options separately
from penrose.
Profile result for a circle + a label + contains
Sun Jul 22 11:36 2018 Time and Allocation Profiling Report (Final)
penrose +RTS -p -RTS sub/oneset.sub sty/venn.sty dsll/setTheory.dsl
total time = 2.35 secs (2354 ticks @ 1000 us, 1 processor)
total alloc = 1,224,297,568 bytes (excludes profiling overheads)
COST CENTRE                   MODULE                       SRC                                                  %time  %alloc
shapeDefs                     ShapeDef                     src/ShapeDef.hs:(105,1)-(106,45)                     3.3    3.7
evalProperty.(...)            NewStyle                     src/NewStyle.hs:1411:13-52                           3.0    15.5
lookupProperty                NewStyle                     src/NewStyle.hs:(1528,1)-(1535,27)                   2.8    0.4
lookupField                   NewStyle                     src/NewStyle.hs:(1516,1)-(1525,28)                   2.6    1.0
penalty                       NewFunctions                 src/NewFunctions.hs:(168,1)-(169,23)                 2.5    1.3
addProperty.properties'       NewStyle                     src/NewStyle.hs:1051:16-63                           2.0    1.4
evalProperty                  NewStyle                     src/NewStyle.hs:(1409,1)-(1412,64)                   2.0    4.4
evalExpr.argResult            NewStyle                     src/NewStyle.hs:(1441,11)-(1512,97)                  1.9    1.8
constrFuncDict                NewFunctions                 src/NewFunctions.hs:(174,1)-(186,13)                 1.8    1.5
initProperty                  NewStyle                     src/NewStyle.hs:(1686,1)-(1694,68)                   1.8    3.7
.:                            ShapeDef                     src/ShapeDef.hs:462:1-10                             1.7    0.8
addProperty                   NewStyle                     src/NewStyle.hs:(1027,1)-(1054,76)                   1.7    2.5
evalGPI_withUpdate.trans'     NewStyle                     src/NewStyle.hs:1418:13-87                           1.4    8.1
pPrint                        Text.Show.Pretty             Text/Show/Pretty.hs:71:1-26                          1.3    0.1
evalExpr                      NewStyle                     src/NewStyle.hs:(1438,1)-(1512,97)                   1.3    4.4
checkDeclaredType             Env                          src/Env.hs:(319,1)-(322,118)                         1.2    0.0
happyDoAction                 Text.Show.Parser             templates/GenericTemplate.hs:(111,1)-(136,60)        1.1    0.3
ppCon                         Text.Show.Pretty             Text/Show/Pretty.hs:(132,1)-(133,51)                 1.1    2.1
binary                        Numeric.AD.Internal.Reverse  src/Numeric/AD/Internal/Reverse.hs:(182,3)-(191,89)  1.1    0.8
textType                      ShapeDef                     src/ShapeDef.hs:(250,1)-(261,6)                      1.1    1.5
circType                      ShapeDef                     src/ShapeDef.hs:(222,1)-(234,6)                      1.1    1.2
addProperty.fieldDict'        NewStyle                     src/NewStyle.hs:1052:16-77                           1.1    1.0
shapes2vals.lookupPath        NewStyle                     src/NewStyle.hs:(1621,9)-(1625,46)                   1.1    0.6
r2f                           Utils                        src/Utils.hs:65:1-16                                 1.0    0.2
block                         Text.Show.Pretty             Text/Show/Pretty.hs:(150,1)-(152,63)                 0.9    1.6
shapeDefs.zipWithKey          ShapeDef                     src/ShapeDef.hs:106:11-45                            0.9    2.2
toPolymorphic                 Server                       src/Server.hs:156:1-74                               0.8    1.4
evalExprs.evalExprF.(...)     NewStyle                     src/NewStyle.hs:1543:28-71                           0.6    1.4
shapeExprsToVals.properties'  NewStyle                     src/NewStyle.hs:1612:15-50                           0.5    1.1
For `nested.sub`, the optimization seems to be very slow. Here is the profiling result:
Sun Jul 22 11:41 2018 Time and Allocation Profiling Report (Final)
penrose +RTS -p -RTS sub/nested.sub sty/venn.sty dsll/setTheory.dsl
total time = 30.60 secs (30596 ticks @ 1000 us, 1 processor)
total alloc = 23,942,023,920 bytes (excludes profiling overheads)
COST CENTRE                   MODULE        SRC                                   %time  %alloc
.:                            ShapeDef      src/ShapeDef.hs:462:1-10              6.5    2.9
lookupField                   NewStyle      src/NewStyle.hs:(1516,1)-(1525,28)    5.4    1.1
evalProperty.(...)            NewStyle      src/NewStyle.hs:1411:13-52            4.5    17.8
lookupProperty                NewStyle      src/NewStyle.hs:(1528,1)-(1535,27)    3.8    0.4
predEq                        NewStyle      src/NewStyle.hs:641:1-99              3.6    1.0
shapeDefs                     ShapeDef      src/ShapeDef.hs:(105,1)-(106,45)      3.3    3.0
penalty                       NewFunctions  src/NewFunctions.hs:(168,1)-(169,23)  3.3    1.3
initProperty                  NewStyle      src/NewStyle.hs:(1686,1)-(1694,68)    2.8    3.0
evalExpr.argResult            NewStyle      src/NewStyle.hs:(1441,11)-(1512,97)   2.7    2.1
compare                       NewStyle      src/NewStyle.hs:874:25-27             2.1    0.0
evalProperty                  NewStyle      src/NewStyle.hs:(1409,1)-(1412,64)    2.1    5.1
evalGPI_withUpdate.trans'     NewStyle      src/NewStyle.hs:1418:13-87            2.0    9.2
addProperty.properties'       NewStyle      src/NewStyle.hs:1051:16-63            1.8    1.1
addProperty                   NewStyle      src/NewStyle.hs:(1027,1)-(1054,76)    1.7    2.0
evalExpr                      NewStyle      src/NewStyle.hs:(1438,1)-(1512,97)    1.4    4.9
findShape.\                   ShapeDef      src/ShapeDef.hs:401:24-45             1.4    0.0
textType                      ShapeDef      src/ShapeDef.hs:(250,1)-(261,6)       1.2    1.2
shapes2vals.lookupPath        NewStyle      src/NewStyle.hs:(1621,9)-(1625,46)    1.1    0.5
toSubPred                     NewStyle      src/NewStyle.hs:(460,1)-(462,73)      1.0    2.2
circType                      ShapeDef      src/ShapeDef.hs:(222,1)-(234,6)       1.0    1.0
relMatchesLine                NewStyle      src/NewStyle.hs:(742,1)-(749,28)      1.0    2.8
evalFnArgs                    NewStyle      src/NewStyle.hs:(1547,1)-(1551,37)    0.8    2.2
toPolymorphic                 Server        src/Server.hs:156:1-74                0.8    1.2
shapeDefs.zipWithKey          ShapeDef      src/ShapeDef.hs:106:11-45             0.8    1.8
shapeExprsToVals.properties'  NewStyle      src/NewStyle.hs:1612:15-50            0.7    1.3
evalExprs.evalExprF.(...)     NewStyle      src/NewStyle.hs:1543:28-71            0.5    1.4
After `INLINE`ing `(.:)` and `penalty`, here is the result for `nested.sub`. I expected heavy usage of `ShapeDef` util functions, but it seems like evaluation takes a long time here.
(UPDATE: adding `INLINE` may defeat the purpose, because `-fprof-auto` automatically excludes bindings marked `INLINE`...)
Sun Jul 22 13:28 2018 Time and Allocation Profiling Report (Final)
penrose +RTS -p -RTS sub/nested.sub sty/venn.sty dsll/setTheory.dsl
total time = 22.30 secs (22299 ticks @ 1000 us, 1 processor)
total alloc = 20,678,967,480 bytes (excludes profiling overheads)
COST CENTRE                   MODULE        SRC                                   %time  %alloc
lookupField                   NewStyle      src/NewStyle.hs:(1516,1)-(1525,28)    5.8    1.1
getName                       ShapeDef      src/ShapeDef.hs:(474,1)-(476,64)      5.0    0.9
evalProperty.(...)            NewStyle      src/NewStyle.hs:1411:13-52            4.7    18.3
lookupProperty                NewStyle      src/NewStyle.hs:(1528,1)-(1535,27)    3.9    0.4
predEq                        NewStyle      src/NewStyle.hs:641:1-99              3.9    1.1
shapeDefs                     ShapeDef      src/ShapeDef.hs:(105,1)-(106,45)      3.5    3.1
evalExpr.argResult            NewStyle      src/NewStyle.hs:(1441,11)-(1512,97)   2.9    2.2
initProperty                  NewStyle      src/NewStyle.hs:(1686,1)-(1694,68)    2.7    3.1
evalProperty                  NewStyle      src/NewStyle.hs:(1409,1)-(1412,64)    2.5    5.2
compare                       NewStyle      src/NewStyle.hs:874:25-27             2.3    0.0
evalGPI_withUpdate.trans'     NewStyle      src/NewStyle.hs:1418:13-87            2.2    9.5
addProperty                   NewStyle      src/NewStyle.hs:(1027,1)-(1054,76)    2.0    2.1
addProperty.properties'       NewStyle      src/NewStyle.hs:1051:16-63            2.0    1.2
findShape.\                   ShapeDef      src/ShapeDef.hs:401:24-45             1.7    0.0
evalExpr                      NewStyle      src/NewStyle.hs:(1438,1)-(1512,97)    1.4    5.1
shapes2vals.lookupPath        NewStyle      src/NewStyle.hs:(1621,9)-(1625,46)    1.3    0.5
toSubPred                     NewStyle      src/NewStyle.hs:(460,1)-(462,73)      1.1    2.3
relMatchesLine                NewStyle      src/NewStyle.hs:(742,1)-(749,28)      1.1    2.9
textType                      ShapeDef      src/ShapeDef.hs:(250,1)-(261,6)       1.1    1.2
addProperty.trn'              NewStyle      src/NewStyle.hs:1053:16-57            1.1    1.0
dist                          Utils         src/Utils.hs:319:1-64                 1.0    0.5
contains                      NewFunctions  src/NewFunctions.hs:(379,1)-(400,79)  1.0    0.3
circType                      ShapeDef      src/ShapeDef.hs:(222,1)-(234,6)       1.0    1.0
shapeDefs.zipWithKey          ShapeDef      src/ShapeDef.hs:106:11-45             0.9    1.9
evalFnArgs                    NewStyle      src/NewStyle.hs:(1547,1)-(1551,37)    0.8    2.3
shapeExprsToVals.properties'  NewStyle      src/NewStyle.hs:1612:15-50            0.8    1.3
toPolymorphic                 Server        src/Server.hs:156:1-74                0.7    1.2
evalExprs.evalExprF.(...)     NewStyle      src/NewStyle.hs:1543:28-71            0.6    1.5
toSubExpr                     NewStyle      src/NewStyle.hs:(451,1)-(453,110)     0.5    1.0
toSubPredArg                  NewStyle      src/NewStyle.hs:(456,1)-(457,49)      0.5    1.0
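Regarding the UPDATE note in this comment: GHC's automatic cost centres skip bindings marked INLINE, so one possible way to keep an inlined function visible in the report is a manual `SCC` annotation on its body. The following is only a sketch; the body of `penalty` is a hypothetical stand-in, not the definition in `src/NewFunctions.hs`.

```haskell
-- Hypothetical stand-in for Penrose's `penalty`; the real definition may differ.
-- The explicit SCC pragma names a cost centre by hand, so the annotated
-- expression should still be reported under "penalty" in a profiled build,
-- even though automatic cost centres are not added to INLINE bindings.
{-# INLINE penalty #-}
penalty :: Double -> Double
penalty x = {-# SCC "penalty" #-} (max x 0 ^ (2 :: Int))

main :: IO ()
main = print (sum (map penalty [-2, -1 .. 3]))
```

Built with profiling enabled and run with `+RTS -p`, the hand-written cost centre is expected to show up in the `.prof` file alongside the automatic ones.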
BTW, does this mean 20678.96748 MB was used in memory?
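For context on that question: the `total alloc` line in a GHC profile is the cumulative number of bytes allocated over the whole run, not the amount held in memory at any one time. Most of it is short-lived and reclaimed by the garbage collector. Peak usage is better read from `+RTS -s` (maximum residency) or from `GHC.Stats`. Below is a small sketch, unrelated to the Penrose code base, that prints both figures. It assumes GHC 8.2 or later, and the program has to be linked with `-rtsopts` and run with `+RTS -T`; `toMB` is a helper made up for this example.

```haskell
import Data.List (foldl')
import GHC.Stats (RTSStats (..), getRTSStats, getRTSStatsEnabled)
import System.Mem (performMajorGC)

-- Convert a byte count to decimal megabytes (the same arithmetic as in the
-- question above: 20,678,967,480 bytes is 20678.96748 MB).
toMB :: Integral a => a -> Double
toMB bytes = fromIntegral bytes / 1.0e6

main :: IO ()
main = do
  -- Some throwaway work; unless the list is fused away by the optimiser,
  -- this allocates many cells that die almost immediately.
  print (foldl' (+) 0 [1 .. 5000000 :: Int])
  enabled <- getRTSStatsEnabled
  if enabled
    then do
      performMajorGC -- max_live_bytes is only refreshed at major collections
      stats <- getRTSStats
      putStrLn ("allocated (cumulative): " ++ show (toMB (allocated_bytes stats)) ++ " MB")
      putStrLn ("max live (peak):        " ++ show (toMB (max_live_bytes stats)) ++ " MB")
    else putStrLn "Run with +RTS -T to enable RTS statistics."
```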
Ran the old system on `master` and got the following result:
Sun Jul 22 17:02 2018 Time and Allocation Profiling Report (Final)
penrose +RTS -p -RTS src/sub/nested.sub src/sty/venn.sty src/dsll/setTheory.dsl
total time = 40.54 secs (40541 ticks @ 1000 us, 1 processor)
total alloc = 26,682,575,096 bytes (excludes profiling overheads)
COST CENTRE                     MODULE                       SRC                                                  %time  %alloc
breakDelim                      Data.List.Split.Internals    src/Data/List/Split/Internals.hs:(151,1)-(156,36)    7.1    24.6
breakDelim.(...)                Data.List.Split.Internals    src/Data/List/Split/Internals.hs:155:25-52           6.3    4.3
matchDelim                      Data.List.Split.Internals    src/Data/List/Split/Internals.hs:(73,1)-(77,23)      5.5    4.0
splitInternal                   Data.List.Split.Internals    src/Data/List/Split/Internals.hs:(139,1)-(148,70)    5.4    7.9
insertBlanks'                   Data.List.Split.Internals    src/Data/List/Split/Internals.hs:(195,1)-(201,49)    4.5    7.2
split                           Data.List.Split.Internals    src/Data/List/Split/Internals.hs:249:1-68            3.4    9.6
penalty                         Functions                    src/Functions.hs:(498,1)-(499,23)                    3.2    1.2
onSublist                       Data.List.Split.Internals    src/Data/List/Split/Internals.hs:278:1-72            3.1    0.0
doDrop                          Data.List.Split.Internals    src/Data/List/Split/Internals.hs:(172,1)-(173,14)    3.0    4.3
splitInternal.(...)             Data.List.Split.Internals    src/Data/List/Split/Internals.hs:144:3-31            2.7    0.0
postProcess                     Data.List.Split.Internals    src/Data/List/Split/Internals.hs:(163,1)-(168,45)    2.6    2.1
breakDelim.match                Data.List.Split.Internals    src/Data/List/Split/Internals.hs:155:25-52           2.0    0.0
objOrSecondaryShape             Runtime                      src/Runtime.hs:(328,1)-(335,27)                      1.9    0.0
binary                          Numeric.AD.Internal.Reverse  src/Numeric/AD/Internal/Reverse.hs:(182,3)-(191,89)  1.6    1.2
splitInternal.toSplitList       Data.List.Split.Internals    src/Data/List/Split/Internals.hs:(146,3)-(148,70)    1.5    2.8
matched                         Style                        src/Style.hs:(362,1)-(371,30)                        1.4    0.0
lookupAll                       Runtime                      src/Runtime.hs:320:1-110                             1.4    0.2
procBlock.isOneToOne.bijectify  Style                        src/Style.hs:465:19-75                               1.3    2.4
getDictAndFns.initDict.\        Style                        src/Style.hs:335:22-103                              1.2    0.9
procBlock.addShapes             Style                        src/Style.hs:(443,9)-(445,50)                        1.1    0.6
procBlock.varmaps               Style                        src/Style.hs:434:9-87                                1.1    0.4
getConstrTuples.getType         Substance                    src/Substance.hs:673:11-137                          1.0    2.0
procBlock.isOneToOne.flatMap    Style                        src/Style.hs:462:17-66                               0.8    1.9
matchWith                       Style                        src/Style.hs:(376,1)-(380,33)                        0.8    2.0
Not sure what we can learn from this. I'm pretty confused about the `Data.List.Split` usage. The line that actually uses `splitOn`, which is from this package, is:
nameParts = splitOn nameSep
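One reading of the report above: `splitOn` is built on the general-purpose splitter in `Data.List.Split.Internals` (`split`, `splitInternal`, `breakDelim`, `matchDelim`, and so on), which lines up with the cost centres at the top of the list. If `nameSep` happens to be a single character (an assumption, since its definition is not shown here), a `break`-based splitter from `base` is one possible replacement. A sketch with a hypothetical helper name, `splitOnChar`:

```haskell
-- Split a string on a single separator character using only `base`.
-- Hypothetical helper, written to agree with `splitOn [sep]` from the
-- split package, including empty chunks at either end.
splitOnChar :: Char -> String -> [String]
splitOnChar sep s = case break (== sep) s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOnChar sep rest

main :: IO ()
main = do
  print (splitOnChar '.' "A.shape.radius") -- ["A","shape","radius"]
  print (splitOnChar '.' "noSeparator")    -- ["noSeparator"]
```

Whether this helps in practice depends on how often `nameParts` is called; profiling the change would be the way to find out.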
Thanks for doing the profiling! Looking at the most recent `nested.sub` report, I'm very surprised that `getName` and `predEq` take so much time. `getName` should be fast, and `predEq` is only called in the compilation phase, not during optimization/runtime. Can we exclude the compilation from the profiling?
Also, I would expect the optimization to take a lot of time (`step`, line search, etc.), but I don't even see it on the list (except for `dist` and `contains`).
Some of the slowness might be on the rendering side. Try using the Chrome JS profiler? https://developers.google.com/web/tools/chrome-devtools/rendering-tools/
Also try looking for tips on making numeric Haskell code fast, e.g. https://wiki.haskell.org/Performance/Floating_point and http://book.realworldhaskell.org/read/profiling-and-optimization.html
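On excluding the compilation phase from the time profile: `base` exposes `GHC.Profiling.stopProfTimer` and `startProfTimer`, which pause and resume the attribution of time ticks in a profiled build (allocations are still attributed). The sketch below shows how these could wrap a compile step. `compilePhase` and `optimizePhase` are hypothetical placeholders, not Penrose functions, and the result of the first phase has to be forced before the timer is restarted, otherwise laziness would defer the work into the measured part.

```haskell
import Control.Exception (evaluate)
import Data.List (foldl')
import GHC.Profiling (startProfTimer, stopProfTimer)

-- Hypothetical stand-in for parsing and compiling the input programs.
compilePhase :: Int -> [Double]
compilePhase n = map fromIntegral [1 .. n]

-- Hypothetical stand-in for the optimization loop.
optimizePhase :: [Double] -> Double
optimizePhase = foldl' (\acc x -> acc + x * x) 0

main :: IO ()
main = do
  stopProfTimer -- stop attributing time ticks to cost centres
  let state = compilePhase 100000
  _ <- evaluate (sum state) -- force every element now, not lazily later
  startProfTimer -- resume tick attribution for the phase of interest
  print (optimizePhase state)
```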
We profiled the system several times and removed the largest bottlenecks. The optimizer could still be faster per step, but it's probably more effective now to think about using smarter optimization methods or better objective functions. #120