sbt / ivy
This project forked from jeromebenois/ivy
patched Apache Ivy for sbt
License: Apache License 2.0
Hi,
the source includes a packed jQuery file, but it's not possible to identify its version.
I tried to compare its md5sum with the ones provided by the upstream project:
https://code.jquery.com/jquery/
but I couldn't match any.
Its size suggests that it's between 1.1.1 and 1.1.2.
Providing the origin of the file, its version, and its license would help the distro packagers.
Thanks,
F.
Would you consider switching to Apache Ivy if we grafted the easyant-integration, 2.3.x-sbt and 2.4.x-sbt branches?
After a lot of debugging prompted by sbt/sbt#2209, I am fairly confident that there is an issue with Ivy resolution for the http://rubygems-proxy.torquebox.org/releases repo.
I investigated using both ivyLoggingLevel := UpdateLogging.Full and logLevel in update := Level.Debug (in sbt), and it appears that Ivy just reports a FAILURE without providing any reason, i.e.
[debug] tried http://rubygems-proxy.torquebox.org/releases/rubygems/addressable/2.3.8/addressable-2.3.8.gem
[warn] [FAILED ] rubygems#addressable;2.3.8!addressable.gem: (0ms)
This fork may have changed how Ivy does resolution in some cases. I doubt it's an issue with mainline Ivy, or a lot more people would have complained about it, and the resolution works fine with mainline Maven.
For quick reference, here is a trivial build.sbt you can use to debug the issue: https://gist.github.com/mdedetrich/ea95947c9b8e35a4d849
Ref sbt/sbt#1616
See ivy/src/java/org/apache/ivy/plugins/resolver/IBiblioResolver.java, lines 166 to 167 in 3cf3148.
When a repository is specified as repo.foo.com, but repo.foo.com redirects to dl.bintray.com, Ivy looks for the credential for host = dl.bintray.com, while e.g. coursier works with the credential for host = repo.foo.com.
It looks like this was a safety choice on the Ivy/HttpClient side, to avoid accidentally leaking credentials to the wrong server: it is the standard behavior of the HttpClient APIs (https://github.com/sbt/ivy/blob/2.3.x-sbt/src/java/org/apache/ivy/util/url/HttpClientHandler.java#L303, also when used via setCredentials with an AuthScope).
Unfortunately, it looks like in coursier the connection returned by url.openConnection does follow redirects, but doesn't provide access to the redirected hostname.
How could we make this consistent?
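The mismatch described above can be illustrated with a small sketch. This is not Ivy or coursier code: the credential store, the function names, the user name and the example URLs are all hypothetical, and the only point is that the same credential table gives different answers depending on which host is used as the lookup key.

```python
from urllib.parse import urlsplit

# Hypothetical credential store keyed by host (illustration only).
CREDENTIALS = {"repo.foo.com": ("alice", "s3cret")}


def lookup_by_requested_host(requested_url, final_url):
    """Match on the host the build asked for (the coursier behavior described above)."""
    return CREDENTIALS.get(urlsplit(requested_url).hostname)


def lookup_by_final_host(requested_url, final_url):
    """Match on the host that serves the request after the redirect (the Ivy behavior described above)."""
    return CREDENTIALS.get(urlsplit(final_url).hostname)


# Hypothetical URLs: repo.foo.com redirects to dl.bintray.com.
requested = "https://repo.foo.com/releases/org/example/lib/1.0/lib-1.0.jar"
redirected = "https://dl.bintray.com/example/releases/org/example/lib/1.0/lib-1.0.jar"

print(lookup_by_requested_host(requested, redirected))  # credential found
print(lookup_by_final_host(requested, redirected))      # no credential for the redirected host
```

Keying on the final host is the more conservative choice, since a credential is never sent to a host it was not registered for. Keying on the requested host is more convenient for users who only know the front-end name.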
Reports have been filed against the sbt/sbt project about artifact resolution being "slow", especially for multi-project builds, such as sbt/sbt#413 and "SBT hangs resolving dependencies". The problem is exacerbated by the fact that sbt treats each subproject as an independent library dependency graph (or "deps graph"), which is a reasonable assumption in many cases.
However, a growing number of projects in corporate settings split a large project into subprojects with identical deps graphs. In these projects, the call to sbt update is repeated dozens of times each time the deps graph is invalidated. If Ivy resolution takes 30 seconds, 20 subprojects * 30 seconds becomes 10 minutes. This has led the sbt team to investigate the performance characteristics of Ivy resolution.
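As a quick check of that arithmetic, here is a minimal sketch. It assumes every subproject is resolved sequentially and takes the same time, which is a simplification; the function name is made up for this example.

```python
def total_update_seconds(subprojects, seconds_per_resolution):
    """Total wall-clock time if each subproject is resolved one after another."""
    return subprojects * seconds_per_resolution


total = total_update_seconds(20, 30)
print(f"{total} s = {total / 60:.0f} min")
```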
I have been working closely with a proprietary project to analyze this behavior, but to make the optimization verifiable, I have created a project with similar characteristics using only open source dependencies: eed3si9n/large-graph-project#v1. The original project contains hundreds of intertwined dependencies, and large-graph-project attempts to emulate that graph. As it is an emulation, the speedup it demonstrates could be limited for some of the fixes, but it's still useful to have objective sample data. Here's a baseline measurement. To prime the submodules and Ivy cache, you might have to run it several times first.
$ sbt ";common/clean; common/update"
trial | sbt 0.13.2 |
---|---|
run1 | 16s |
run2 | 16s |
run3 | 18s |
run4 | 18s |
run5 | 17s |
median | 17±1s |
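The summary row can be reproduced from the five runs. This is only a sketch of one way to arrive at "17±1s": it assumes the ± figure is the largest deviation of any run from the median, which the original does not state.

```python
import statistics

# Timings in seconds, copied from the table above.
runs = [16, 16, 18, 18, 17]

median = statistics.median(runs)
# Assumption: the "±" figure is the largest distance of any run from the median.
spread = max(abs(r - median) for r in runs)

print(f"{median}±{spread}s")
```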
Here is some analysis from running the YourKit profiler on this project. Start sbt on large-graph-project, and navigate to common by typing:
> project common
Then, to make sure the Ivy cache is loaded, run the following:
> ;clean; update; clean
The next time you run update, it should mostly measure Ivy resolution only. Run YourKit, attach it to sbt, and start CPU profiling in Tracing mode with "Profile J2EE" unchecked and all filters unchecked as well.
common> update
[info] Updating {file:/Users/eyokota/workspace/large-graph-project/}common...
[info] Resolving org.fusesource.jansi#jansi;1.4 ...
[info] Done updating.
[success] Total time: 390 s, completed May 5, 2014 12:13:40 PM
Note that the baseline is 17±1s, so 390s is a 23x slowdown caused by the profiler. Here's a link to the YourKit snapshot: sbt-0.13.2-large-graph-project-2014-05-05.snapshot
From the Threads tab, select the time span when I ran the update task by looking at the CPU % visually. Next, expand the thread call tree that's spending 99% of the time.
Eventually we find a relevant method, Classpaths.updateTask.
sbt.Classpaths$$anonfun$updateTask$1.apply(Object) **372781** 99% 369
This is our starting point for the analysis. The line above indicates that 99% of the selected range is spent in updateTask, taking 372781ms with 369 sample points. Although the absolute timing of 372781ms is somewhat artificial, we can see how this time is divided further down the call chain.
For a while the call tree remains linear, meaning all of the 372781ms is spent in a call that calls another method. The first split occurs in sbt.IvyActions.update.
sbt.IvyActions$$anonfun$update$1.apply(Ivy, DefaultModuleDescriptor, String) **372781** 99% 369
This breaks into two parts:
sbt.IvyActions$.sbt$IvyActions$$resolve(Enumeration$Value, Ivy, DefaultModuleDescriptor, String) **370724** 99% 367
sbt.IvyRetrieve$.updateReport(ResolveReport, File) 2056 1% 2
Of 372s, it spends 2 seconds updating the report. Given that this 2s is stretched out 23x by the profiler, the effect should be negligible.
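To get a feel for what the profiled numbers might mean without the profiler attached, one can divide them by the slowdown factor. This is a rough sketch that assumes the overhead is spread uniformly over all methods, which is unlikely to hold exactly in Tracing mode. The function name is made up for this example.

```python
BASELINE_S = 17   # median of the unprofiled runs
PROFILED_S = 390  # total time reported with the profiler attached

slowdown = PROFILED_S / BASELINE_S  # roughly 23x


def rescale_ms(profiled_ms):
    """Estimate unprofiled time, assuming uniform profiler overhead."""
    return profiled_ms / slowdown


# Profiled timings in milliseconds, copied from the two lines above.
profiled = {
    "IvyActions.resolve": 370724,
    "IvyRetrieve.updateReport": 2056,
}

for name, ms in profiled.items():
    print(f"{name}: {ms} ms profiled, about {rescale_ms(ms):.0f} ms estimated")
```

Under that assumption, the report update would take on the order of 90 ms in a normal run.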
Of the 370724ms in sbt.IvyActions.resolve, all of the time is spent in org.apache.ivy.Ivy.resolve, which in turn spends all of its time in org.apache.ivy.core.resolve.ResolveEngine.resolve.
org.apache.ivy.core.resolve.ResolveEngine.resolve(ModuleDescriptor, ResolveOptions) **370724** 99% 367
Ivy's ResolveEngine.resolve is the first place where we see some breakdown. This method breaks down into five parts:
org.apache.ivy.core.resolve.ResolveEngine.getDependencies(ModuleDescriptor, ResolveOptions, ResolveReport) **363593** 97% 360
org.apache.ivy.core.resolve.ResolveEngine.outputReport(ResolveReport, ResolutionCacheManager, ResolveOptions) 4063 1% 4
org.apache.ivy.core.resolve.IvyNode.isCompletelyEvicted() 1042 0% 1
org.apache.ivy.core.report.ResolveReport.setDependencies(List, Filter) 1019 0% 1
org.apache.ivy.core.resolve.ResolveEngine.downloadArtifacts(ResolveReport, Filter, DownloadOptions) 1006 0% 1
Of 370s, it's spending 363s in ResolveEngine.getDependencies and 4s writing the Ivy resolution report. The resulting XML file is quite large, so it makes sense that it takes some time. Let's keep the focus on the ResolveEngine.getDependencies method.
org.apache.ivy.core.resolve.ResolveEngine.fetchDependencies(VisitNode, String, boolean) 362555 97% 359
All of the 362555ms in ResolveEngine.fetchDependencies is spent in ResolveEngine.doFetchDependencies. ResolveEngine.doFetchDependencies in turn spends all of its time in ResolveEngine.fetchDependencies.
org.apache.ivy.core.resolve.ResolveEngine.fetchDependencies(VisitNode, String, boolean) 362555 97% 359
The second call to fetchDependencies breaks down into two parts:
org.apache.ivy.core.resolve.ResolveEngine.doFetchDependencies(VisitNode, String) **361909** 97% 358
org.apache.ivy.core.resolve.VisitNode.loadData(String, boolean) **645** 0% 1
fetchDependencies is written recursively. Note that the sample points have split between the two methods: 358 sample points went to doFetchDependencies, while one went to VisitNode.loadData.
Again, doFetchDependencies spends all of its 361909ms in ResolveEngine.fetchDependencies.
org.apache.ivy.core.resolve.ResolveEngine.fetchDependencies(VisitNode, String, boolean) **361909** 97% 358
This time the 358 sample points split three ways:
org.apache.ivy.core.resolve.ResolveEngine.doFetchDependencies(VisitNode, String) **352680** 94% 349
org.apache.ivy.core.resolve.VisitNode.loadData(String, boolean) **8194** 2% 8
org.apache.ivy.core.resolve.ResolveEngine.resolveConflict(VisitNode, String) **1035** 0% 1
So, 8 sample points call VisitNode.loadData, taking 8s, one sample point calls ResolveEngine.resolveConflict, taking 1s, and the rest branch out to yet another call to ResolveEngine.doFetchDependencies.
There seems to be a pattern here.
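The shape of that pattern, two functions calling each other while walking the graph, can be mimicked with a toy model. This is not Ivy's algorithm: the graph is made up, and the sketch leaves out configurations, eviction and any already-fetched checks. It only shows how a shared dependency gets visited once per path that reaches it.

```python
from collections import Counter

# Hypothetical toy graph: "d" is reachable from "a" via both "b" and "c".
GRAPH = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d"],
    "d": ["e"],
    "e": [],
}

calls = Counter()


def fetch_dependencies(node):
    """Entry point for one node; delegates to do_fetch_dependencies."""
    do_fetch_dependencies(node)


def do_fetch_dependencies(node):
    """Record the visit, then recurse into every dependency of the node."""
    calls[node] += 1
    for dep in GRAPH[node]:
        fetch_dependencies(dep)


fetch_dependencies("a")
print(dict(calls))
print(sum(calls.values()), "calls for", len(GRAPH), "nodes")
```

Even in this five-node graph, the shared node and everything below it are visited twice. In a graph with hundreds of intertwined dependencies, the number of paths, and therefore the number of calls, can grow much faster than the number of nodes.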
The following is the call graph from the first ResolveEngine.fetchDependencies with 359 sample points. Around 90% of the sample points are expanded out at this point.
Eventually one of the following methods is called:
org.apache.ivy.core.resolve.ResolveEngine.resolveConflict(VisitNode, String)
org.apache.ivy.core.resolve.VisitNode.getDependencies(String)
org.apache.ivy.core.resolve.VisitNode.gotoNode(IvyNode)
org.apache.ivy.core.resolve.VisitNode.isEvicted()
org.apache.ivy.core.resolve.VisitNode.isCircular()
org.apache.ivy.core.resolve.VisitNode.loadData(String, boolean)
Each call spends around 1s, and 359 sample points each spending about 1s add up to roughly 362555ms.
Here's the callee list sorted by time:
This is useful, but it likely has some overlaps. Here's the callee list sorted by "own time".
What's concerning is the invocation count for some of these methods. For example,
org.apache.ivy.core.resolve.IvyNode.loadData(String, IvyNode, String, String, boolean, IvyNodeUsage) 19567 0 872 **32374**
This seems to be a partial sample, so the actual count could be higher.
To find out just how many times doFetchDependencies is called, and for which library dependencies, I added a simple log entry to the Ivy code.
$ git diff HEAD^ HEAD
diff --git a/src/java/org/apache/ivy/core/resolve/ResolveEngine.java b/src/java/org/apache/ivy/core/resolve/ResolveEngine.java
index bb3bc95..fddfbf3 100644
--- a/src/java/org/apache/ivy/core/resolve/ResolveEngine.java
+++ b/src/java/org/apache/ivy/core/resolve/ResolveEngine.java
@@ -793,6 +793,7 @@ public class ResolveEngine {
// now we can actually resolve this configuration dependencies
if (!isDependenciesFetched(node.getNode(), conf) && node.isTransitive()) {
+ Message.debug("- about to get dependencies for " + node.toString());
Collection/* <VisitNode> */dependencies = node.getDependencies(conf);
for (Iterator iter = dependencies.iterator(); iter.hasNext();) {
VisitNode dep = (VisitNode) iter.next();
We can publish this Ivy locally, and modify sbt to use that version and to pass Ivy's debug log through into sbt's debug log. Next, run grep to extract only the added log entries from the debug log as follows:
$ grep "about to *" update-debug-log.txt > doFetchDepependencies-log.txt
Here is a link to doFetchDepependencies-log.txt.
[debug] - about to get dependencies for com.example.large#common_2.10;0.1.0-SNAPSHOT
[debug] - about to get dependencies for org.scala-lang#scala-library;2.10.3
[debug] - about to get dependencies for org.scala-lang#scala-library;2.10.3
[debug] - about to get dependencies for org.scala-lang#scala-library;2.10.3
[debug] - about to get dependencies for org.scala-lang#scala-library;2.10.3
[debug] - about to get dependencies for com.example.large#util1_2.10;0.1.0-SNAPSHOT
[debug] - about to get dependencies for org.scalaz#scalaz-effect_2.10;7.0.6
[debug] - about to get dependencies for org.scalaz#scalaz-core_2.10;7.0.6
[debug] - about to get dependencies for org.scalaz#scalaz-core_2.10;7.0.6
[debug] - about to get dependencies for org.scalaz#scalaz-core_2.10;7.0.6
....
If we count the lines, doFetchDependencies is being called 7929 times.
$ cat doFetchDepependencies-log.txt | wc -l
7929
Next, run uniq to group by library dependency and see how many of these are duplicates.
$ cat doFetchDepependencies-log.txt | sort | uniq -c | sort -rn > sorted-doFetchDependencies-log.txt
Here is a link to sorted-doFetchDependencies-log.txt.
29 [debug] - about to get dependencies for org.scala-lang#scala-library;2.10.3
27 [debug] - about to get dependencies for com.example.large#common_2.10;0.1.0-SNAPSHOT
24 [debug] - about to get dependencies for org.slf4j#slf4j-api;1.7.5
24 [debug] - about to get dependencies for org.scalaz#scalaz-effect_2.10;7.0.6
24 [debug] - about to get dependencies for org.scala-lang#scala-library;2.10.4
24 [debug] - about to get dependencies for org.json4s#json4s-native_2.10;3.2.6
24 [debug] - about to get dependencies for org.json#json-simple;1.1.1
24 [debug] - about to get dependencies for org.eclipse.jetty.orbit#javax.servlet;3.0.0.v201112011016
24 [debug] - about to get dependencies for org.eclipse.jetty#jetty-webapp;9.1.0.v20131115
24 [debug] - about to get dependencies for org.eclipse.jetty#jetty-plus;9.1.0.v20131115
....
The number of occurrences is prepended to each line. For example, doFetchDependencies was called for org.scala-lang#scala-library;2.10.3 29 times. Across the 449 dependencies tracked, doFetchDependencies was called 17 times per library dependency on average. Due to eviction, some duplication is necessary to recalculate the dependency graph, but there may be opportunities for optimization, for example around IO overhead. In the field, we are observing even higher numbers of occurrences (50+ times for the Jackson JSON parser, etc.).
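The same grouping can be done in a few lines of Python, which also makes the average easy to compute. The sketch below runs on the ten log lines quoted earlier rather than on the full log file, so its counts are only illustrative. The function name is made up, and treating 449 as the number of distinct lines in the full log is an assumption.

```python
from collections import Counter

PREFIX = "[debug] - about to get dependencies for "


def count_calls(lines):
    """Count how often each module appears in the added debug log entries."""
    modules = (
        line.strip()[len(PREFIX):]
        for line in lines
        if line.strip().startswith(PREFIX)
    )
    return Counter(modules)


# The first ten entries of the log, as quoted above.
sample = """\
[debug] - about to get dependencies for com.example.large#common_2.10;0.1.0-SNAPSHOT
[debug] - about to get dependencies for org.scala-lang#scala-library;2.10.3
[debug] - about to get dependencies for org.scala-lang#scala-library;2.10.3
[debug] - about to get dependencies for org.scala-lang#scala-library;2.10.3
[debug] - about to get dependencies for org.scala-lang#scala-library;2.10.3
[debug] - about to get dependencies for com.example.large#util1_2.10;0.1.0-SNAPSHOT
[debug] - about to get dependencies for org.scalaz#scalaz-effect_2.10;7.0.6
[debug] - about to get dependencies for org.scalaz#scalaz-core_2.10;7.0.6
[debug] - about to get dependencies for org.scalaz#scalaz-core_2.10;7.0.6
[debug] - about to get dependencies for org.scalaz#scalaz-core_2.10;7.0.6
""".splitlines()

counts = count_calls(sample)
total_calls = sum(counts.values())
average = total_calls / len(counts)

# Same ordering as `sort | uniq -c | sort -rn`: most frequent first.
for module, n in counts.most_common():
    print(f"{n:4d} {module}")
print(f"{total_calls} calls over {len(counts)} modules, {average:.1f} on average")

# Totals reported in the text for the full log.
print(f"full log: {7929 / 449:.1f} calls per dependency on average")
```

With the totals from the full log, 7929 calls over 449 dependencies works out to between 17 and 18 calls per dependency, in line with the figure quoted above.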