sbt / ivy

This project forked from jeromebenois/ivy

patched Apache Ivy for sbt

License: Apache License 2.0

Languages: Java 95.81%, HTML 2.26%, XSLT 1.75%, CSS 0.11%, Scala 0.08%

ivy's Issues

jquery.pack.js origin

Hi,
the source contains this packed jQuery file, but it's not possible to identify its version.
I tried to compare its md5sum with those of the files provided by the upstream project:
https://code.jquery.com/jquery/
but I couldn't match any of them.
Its size suggests that it's somewhere between 1.1.1 and 1.1.2.
Providing the origin of the file, its version, and its license would help distribution packagers.
Thanks,
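For anyone repeating the comparison, the checksum step can be scripted. The following is a minimal Java sketch, not part of Ivy: `Md5Check` and its methods are hypothetical names, and it assumes you have downloaded the candidate jQuery builds from https://code.jquery.com/jquery/ so their digests can be compared with the digest of jquery.pack.js. It prints the same lowercase hex digest that md5sum does.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Md5Check {
    // Returns the lowercase hex MD5 digest of the given bytes (same format as md5sum).
    static String md5Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(data)) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support MD5, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    // Compares a local file against a checksum taken from a candidate upstream file.
    static boolean matches(Path file, String expectedMd5) throws IOException {
        return md5Hex(Files.readAllBytes(file)).equalsIgnoreCase(expectedMd5.trim());
    }

    public static void main(String[] args) throws IOException {
        // Usage: java Md5Check jquery.pack.js [expected-md5]
        Path file = Path.of(args[0]);
        System.out.println(md5Hex(Files.readAllBytes(file)));
        if (args.length > 1) {
            System.out.println(matches(file, args[1]) ? "match" : "no match");
        }
    }
}
```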

Switch to original Apache Ivy

Would you consider switching to the original Apache Ivy if we grafted the easyant-integration, 2.3.x-sbt and 2.4.x-sbt branches onto it?

Ivy resolution failing with rubygems repo

After a lot of debugging for sbt/sbt#2209, I am fairly confident that there is an issue with Ivy resolution against the http://rubygems-proxy.torquebox.org/releases repo.

A lot of investigation has been done, using both ivyLoggingLevel := UpdateLogging.Full and logLevel in update := Level.Debug (in sbt), and it appears that Ivy just reports a FAILURE without giving any reason, e.g.:

[debug]         tried http://rubygems-proxy.torquebox.org/releases/rubygems/addressable/2.3.8/addressable-2.3.8.gem
[warn]  [FAILED     ] rubygems#addressable;2.3.8!addressable.gem:  (0ms)

This fork may have changed how Ivy does resolution in some cases (I doubt it's an issue with mainline Ivy, otherwise a lot more people would have complained about it, and resolution works fine with mainline Maven).

For quick reference, here is a trivial build.sbt you can use to debug the issue https://gist.github.com/mdedetrich/ea95947c9b8e35a4d849
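Since Ivy only reports FAILED with no reason, one way to see what the proxy actually returns is to request the artifact URL directly. Below is a minimal Java sketch using only java.net.HttpURLConnection; the class and method names are hypothetical, and it is not how Ivy itself performs the request. It sends a HEAD request with redirect following disabled, so a redirect or error status from the proxy is reported as-is.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

public class ArtifactProbe {
    // Status code and Location header (null if absent) from one HEAD request.
    static final class Probe {
        final int status;
        final String location;

        Probe(int status, String location) {
            this.status = status;
            this.location = location;
        }

        @Override
        public String toString() {
            return location == null ? String.valueOf(status) : status + " -> " + location;
        }
    }

    // Sends a HEAD request without following redirects, so a 3xx or an error
    // status from the server is visible instead of being followed transparently.
    static Probe head(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        try {
            conn.setRequestMethod("HEAD");
            conn.setInstanceFollowRedirects(false);
            conn.setConnectTimeout(10_000);
            conn.setReadTimeout(10_000);
            int status = conn.getResponseCode();
            return new Probe(status, conn.getHeaderField("Location"));
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        String url = args.length > 0 ? args[0]
            : "http://rubygems-proxy.torquebox.org/releases/rubygems/addressable/2.3.8/addressable-2.3.8.gem";
        System.out.println(url + " : " + head(url));
    }
}
```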

Ivy doesn't check maven-metadata-local.xml

Ref sbt/sbt#1616

See lines 166 to 167 of ivy/src/java/org/apache/ivy/plugins/resolver/IBiblioResolver.java (as of commit 3cf3148):

    String metadataLocation = IvyPatternHelper.substitute(
        root + "[organisation]/[module]/[revision]/maven-metadata.xml", mrid);
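To illustrate what the issue asks for, here is a minimal Java sketch that builds both metadata locations for one module revision, assuming the m2-compatible layout (organisation dots turned into directories) and that a resolver would try each candidate in order. `MetadataLocations.candidates` is a hypothetical helper, not an Ivy API; the real code goes through IvyPatternHelper.substitute with a ModuleRevisionId.

```java
import java.util.List;

public class MetadataLocations {
    // Hypothetical helper: Maven-layout metadata locations for one module revision,
    // assuming the m2-compatible layout where organisation dots become directories.
    static List<String> candidates(String root, String organisation, String module, String revision) {
        String dir = root + organisation.replace('.', '/') + "/" + module + "/" + revision + "/";
        return List.of(
            dir + "maven-metadata.xml",       // the only name the current code looks up
            dir + "maven-metadata-local.xml"  // the name Maven uses in a local repository such as ~/.m2
        );
    }

    public static void main(String[] args) {
        // A resolver could try these in order and use the first one that exists.
        candidates("file:/home/user/.m2/repository/", "com.example", "util_2.10", "0.1.0-SNAPSHOT")
            .forEach(System.out::println);
    }
}
```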

How to look up credentials when a repo redirects?

When a repository is specified as repo.foo.com, but repo.foo.com redirects to dl.bintray.com, Ivy looks up the credentials for host dl.bintray.com, while Coursier, for example, uses the credentials for host repo.foo.com.

It looks like this was a safety choice on the Ivy/HttpClient side, to avoid accidentally leaking credentials to the wrong server: it is the standard behavior of the HttpClient APIs (https://github.com/sbt/ivy/blob/2.3.x-sbt/src/java/org/apache/ivy/util/url/HttpClientHandler.java#L303, also when used via setCredentials with an AuthScope).

Unfortunately, it looks like in Coursier the connection returned by url.openConnection does follow redirects, but doesn't provide access to the redirected hostname.

How could we make this consistent?
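The trade-off can be made explicit as a policy choice. The following minimal Java sketch is hypothetical (the `Policy` values and `credentialHost` are not Ivy, HttpClient or Coursier APIs): given the configured repository URI and the URI actually reached after a redirect, it returns the host whose credentials should be looked up. FINAL_HOST mirrors the behavior described above for Ivy/HttpClient, REQUESTED_HOST mirrors what is described for Coursier, and REQUESTED_HOST_IF_TRUSTED is one possible compromise that reuses the configured host's credentials only for allow-listed redirect targets.

```java
import java.net.URI;
import java.util.Set;

public class RedirectCredentials {
    // Hypothetical policies for choosing which host's credentials to use after a redirect.
    enum Policy {
        FINAL_HOST,                // as described for Ivy/HttpClient: the host actually contacted
        REQUESTED_HOST,            // as described for Coursier: the configured repository host
        REQUESTED_HOST_IF_TRUSTED  // configured host's credentials, only for allow-listed targets
    }

    // Returns the host whose credentials should be looked up for this request.
    static String credentialHost(URI requested, URI reached, Policy policy, Set<String> trustedTargets) {
        String requestedHost = requested.getHost();
        String reachedHost = reached.getHost();
        if (policy == Policy.FINAL_HOST) {
            return reachedHost;
        }
        if (policy == Policy.REQUESTED_HOST) {
            return requestedHost;
        }
        // REQUESTED_HOST_IF_TRUSTED: reuse the configured host's credentials when there was no
        // cross-host redirect or the target is explicitly trusted; otherwise act like FINAL_HOST.
        boolean sameHost = requestedHost.equalsIgnoreCase(reachedHost);
        return (sameHost || trustedTargets.contains(reachedHost)) ? requestedHost : reachedHost;
    }

    public static void main(String[] args) {
        URI requested = URI.create("https://repo.foo.com/releases/a.jar");
        URI reached = URI.create("https://dl.bintray.com/foo/releases/a.jar");
        for (Policy p : Policy.values()) {
            System.out.println(p + ": " + credentialHost(requested, reached, p, Set.of("dl.bintray.com")));
        }
    }
}
```

The allow-list variant keeps the safe default while letting a build opt in to the redirect it expects.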

Ivy resolve is not the most efficient

background

Reports have been filed against the sbt/sbt project about artifact resolution being "slow", especially for multi-project builds, such as sbt/sbt#413 and "SBT hangs resolving dependencies". The problem is exacerbated by the fact that sbt treats each subproject as an independent library dependency graph (or "deps graph"), which is a reasonable assumption in many cases.

However, a growing number of projects in corporate settings split a large project into subprojects with identical deps graphs. In these projects, sbt update is repeated dozens of times every time the deps graph is invalidated. If Ivy resolution takes 30 seconds, 20 subprojects × 30 seconds adds up to 10 minutes. This has led the sbt team to investigate the performance characteristics of Ivy resolution.
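The cost of repeating identical resolutions suggests reusing one result across subprojects whose deps graphs are the same. Below is a minimal Java sketch of that idea, assuming a hypothetical `resolver` function that stands in for a full Ivy resolution and assuming a sorted list of declared dependencies is an adequate cache key (a real key would also have to cover resolvers, configurations, exclusions and so on). It is not sbt's or Ivy's implementation.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class SharedResolution {
    // Hypothetical stand-in for a full Ivy resolution: declared deps in, resolved modules out.
    private final Function<List<String>, List<String>> resolver;
    // Cache keyed by the sorted list of declared dependencies.
    private final Map<List<String>, List<String>> cache = new HashMap<>();
    private int resolverCalls = 0;

    SharedResolution(Function<List<String>, List<String>> resolver) {
        this.resolver = resolver;
    }

    List<String> resolve(Collection<String> declaredDeps) {
        List<String> sorted = new ArrayList<>(declaredDeps);
        Collections.sort(sorted);
        List<String> key = List.copyOf(sorted); // immutable, so it is safe as a map key
        return cache.computeIfAbsent(key, k -> {
            resolverCalls++;
            return resolver.apply(k);
        });
    }

    int resolverCalls() {
        return resolverCalls;
    }

    public static void main(String[] args) {
        // Pretend the lambda is the 30-second Ivy resolution.
        SharedResolution shared = new SharedResolution(deps -> new ArrayList<>(deps));
        List<String> deps = List.of("org.scalaz#scalaz-core_2.10;7.0.6", "org.json4s#json4s-native_2.10;3.2.6");
        for (int i = 0; i < 20; i++) {
            shared.resolve(deps); // 20 subprojects with an identical deps graph
        }
        System.out.println("resolver calls: " + shared.resolverCalls());
    }
}
```

With 20 identical subprojects the resolver runs once, so the 20 × 30 seconds above would shrink to roughly one 30-second resolution.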

sample data

I have been working closely with a proprietary project to analyze this behavior, but to make the optimization independently verifiable, I have created a project with similar characteristics using only open source dependencies: eed3si9n/large-graph-project#v1. The original project contains hundreds of intertwined dependencies, and large-graph-project attempts to emulate that graph. Because it is an emulation, the speedup it demonstrates may be limited for some of the fixes, but it's still useful to have objective sample data. Here's a baseline measurement. To prime the submodules and the Ivy cache, you might have to run it several times first.

$ sbt ";common/clean; common/update"

trial	sbt 0.13.2
run1	16s
run2	16s
run3	18s
run4	18s
run5	17s
median	17±1s
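The summary row can be reproduced from the five runs. This Java sketch assumes that "±1s" means half the spread between the fastest and slowest run; the source does not say how the error bar was computed, and `Baseline` is a hypothetical name.

```java
import java.util.Arrays;

public class Baseline {
    // Median of the trial times in seconds (average of the middle two for an even count).
    static double median(double[] xs) {
        double[] s = xs.clone();
        Arrays.sort(s);
        int n = s.length;
        return n % 2 == 1 ? s[n / 2] : (s[n / 2 - 1] + s[n / 2]) / 2.0;
    }

    // Half of the range between the fastest and slowest run (assumed meaning of "±").
    static double halfRange(double[] xs) {
        double min = Arrays.stream(xs).min().getAsDouble();
        double max = Arrays.stream(xs).max().getAsDouble();
        return (max - min) / 2.0;
    }

    public static void main(String[] args) {
        double[] runs = {16, 16, 18, 18, 17}; // run1..run5 from the table above
        System.out.printf("median %.0f±%.0fs%n", median(runs), halfRange(runs));
    }
}
```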

profiling info

Here is some analysis from running the YourKit profiler on this project. Start sbt on large-graph-project and navigate to common by typing:

> project common

Then, to make sure Ivy cache is loaded run the following:

> ;clean; update; clean

The next time you run update, it should mostly measure Ivy resolution only. Run YourKit, attach it to sbt, and start CPU profiling in Tracing mode with "Profile J2EE" unchecked and all filters unchecked as well.

common> update
[info] Updating {file:/Users/eyokota/workspace/large-graph-project/}common...
[info] Resolving org.fusesource.jansi#jansi;1.4 ...
[info] Done updating.
[success] Total time: 390 s, completed May 5, 2014 12:13:40 PM

Note that the baseline is 17±1s, so 390s represents a 23x slowdown caused by the profiler. Here's a link to the YourKit snapshot: sbt-0.13.2-large-graph-project-2014-05-05.snapshot

From the Threads tab, select the time span during which I ran the update task by looking at the CPU % visually. Next, expand the thread call tree that's spending 99% of the time.

sbt.Classpaths.updateTask

Eventually we find the relevant method, Classpaths.updateTask.

sbt.Classpaths$$anonfun$updateTask$1.apply(Object) **372781** 99% 369

This is our starting point for analysis. The above indicates that 99% of the selected range is spent in updateTask, taking 372781ms across 369 sample points. Although the absolute timing of 372781ms is inflated by the profiler, we can see how this time breaks down further down the call chain.

sbt.IvyActions.update

For a while the call tree remains linear, meaning all 372781ms are spent in a single chain of calls, each method calling the next. The first branching occurs in sbt.IvyActions.update.

sbt.IvyActions$$anonfun$update$1.apply(Ivy, DefaultModuleDescriptor, String) **372781** 99% 369

This breaks into two parts:

sbt.IvyActions$.sbt$IvyActions$$resolve(Enumeration$Value, Ivy, DefaultModuleDescriptor, String) **370724** 99% 367
sbt.IvyRetrieve$.updateReport(ResolveReport, File) 2056 1% 2

Of the 372s, it spends 2s updating the report. Given that these 2s are stretched out 23x by the profiler, the effect should be negligible.
Of the 370724ms in sbt.IvyActions.resolve, all of its time is spent in org.apache.ivy.Ivy.resolve, which in turn spends all of its time in org.apache.ivy.core.resolve.ResolveEngine.resolve.
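The 23x factor can be used to translate profiled times back into rough real times. The following Java sketch assumes the profiler overhead is spread evenly across all methods, which tracing profilers do not guarantee, so the results are estimates only; `ProfilerOverhead` is a hypothetical name. With the figures above, updateReport's 2056ms of profiled time corresponds to roughly 90ms without the profiler.

```java
public class ProfilerOverhead {
    // Slowdown factor: profiled wall time divided by the unprofiled baseline.
    static double slowdown(double profiledSeconds, double baselineSeconds) {
        return profiledSeconds / baselineSeconds;
    }

    // Rough unprofiled time of one call, assuming uniform overhead (an assumption, see above).
    static double estimateUnprofiledMs(double profiledMs, double slowdown) {
        return profiledMs / slowdown;
    }

    public static void main(String[] args) {
        double factor = slowdown(390, 17);                   // about 22.9, the "23x" above
        double report = estimateUnprofiledMs(2056, factor);  // sbt.IvyRetrieve.updateReport
        System.out.printf("slowdown %.1fx, updateReport about %.0f ms%n", factor, report);
    }
}
```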

ResolveEngine.resolve

org.apache.ivy.core.resolve.ResolveEngine.resolve(ModuleDescriptor, ResolveOptions) **370724** 99% 367

Ivy's ResolveEngine.resolve is the first place where we see the time branch out. This method breaks down into five parts:

org.apache.ivy.core.resolve.ResolveEngine.getDependencies(ModuleDescriptor, ResolveOptions, ResolveReport) **363593** 97% 360
org.apache.ivy.core.resolve.ResolveEngine.outputReport(ResolveReport, ResolutionCacheManager, ResolveOptions) 4063 1% 4
org.apache.ivy.core.resolve.IvyNode.isCompletelyEvicted() 1042 0% 1
org.apache.ivy.core.report.ResolveReport.setDependencies(List, Filter) 1019 0% 1
org.apache.ivy.core.resolve.ResolveEngine.downloadArtifacts(ResolveReport, Filter, DownloadOptions) 1006 0% 1

Of the 370s, it spends 363s in ResolveEngine.getDependencies and 4s writing the Ivy resolution report. The resulting XML file is quite large, so it makes sense that this takes some time. Let's keep the focus on the ResolveEngine.getDependencies method.

ResolveEngine.fetchDependencies (first call)

org.apache.ivy.core.resolve.ResolveEngine.fetchDependencies(VisitNode, String, boolean) 362555 97% 359

All 362555ms in ResolveEngine.fetchDependencies are spent in ResolveEngine.doFetchDependencies. ResolveEngine.doFetchDependencies in turn spends all of its time in ResolveEngine.fetchDependencies.

ResolveEngine.fetchDependencies (second call)

org.apache.ivy.core.resolve.ResolveEngine.fetchDependencies(VisitNode, String, boolean) 362555 97% 359

The second call to fetchDependencies breaks down into two parts:

org.apache.ivy.core.resolve.ResolveEngine.doFetchDependencies(VisitNode, String) **361909** 97% 358
org.apache.ivy.core.resolve.VisitNode.loadData(String, boolean) **645** 0% 1

fetchDependencies is written recursively. Note that the sample points are now split between two methods: 358 sample points went to doFetchDependencies, while one went to VisitNode.loadData.
Again, doFetchDependencies spends all of its 361909ms in ResolveEngine.fetchDependencies.

ResolveEngine.fetchDependencies (third call)

org.apache.ivy.core.resolve.ResolveEngine.fetchDependencies(VisitNode, String, boolean) **361909** 97% 358

This time the 358 sample points split three ways:

org.apache.ivy.core.resolve.ResolveEngine.doFetchDependencies(VisitNode, String) **352680** 94% 349
org.apache.ivy.core.resolve.VisitNode.loadData(String, boolean) **8194** 2% 8
org.apache.ivy.core.resolve.ResolveEngine.resolveConflict(VisitNode, String) **1035** 0% 1

So 8 sample points (about 8s) go to VisitNode.loadData, one sample point (about 1s) goes to ResolveEngine.resolveConflict, and the rest branch out into yet another call to ResolveEngine.doFetchDependencies.
There seems to be a pattern here.
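The recursion pattern, and why a shared dependency can be expanded more than once, can be illustrated with a toy model. This Java sketch is a simplification, not Ivy's ResolveEngine: it ignores configurations, eviction and conflict management, and its `expansions` map counts the equivalent of the "about to get dependencies" log line added in the diff further down. Ivy does guard expansion with isDependenciesFetched; the `memoize = false` mode here only stands in for whatever causes Ivy to expand a node again.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class TraversalModel {
    private final Map<String, List<String>> graph;
    private final boolean memoize;
    private final Set<String> fetched = new HashSet<>();
    // How many times each node's dependency list was expanded.
    final Map<String, Integer> expansions = new TreeMap<>();

    TraversalModel(Map<String, List<String>> graph, boolean memoize) {
        this.graph = graph;
        this.memoize = memoize;
    }

    // Analogue of fetchDependencies (in Ivy, loadData and resolveConflict also happen here).
    void fetchDependencies(String node) {
        doFetchDependencies(node);
    }

    // Analogue of doFetchDependencies: expand the node unless already fetched, then recurse.
    private void doFetchDependencies(String node) {
        if (memoize && !fetched.add(node)) {
            return; // rough analogue of the isDependenciesFetched guard
        }
        expansions.merge(node, 1, Integer::sum);
        for (String dep : graph.getOrDefault(node, List.of())) {
            fetchDependencies(dep);
        }
    }

    public static void main(String[] args) {
        // Modules "a" and "b" share "lib", which in turn depends on "core".
        Map<String, List<String>> graph = Map.of(
            "app", List.of("a", "b"),
            "a", List.of("lib"),
            "b", List.of("lib"),
            "lib", List.of("core"));
        for (boolean memo : new boolean[] {false, true}) {
            TraversalModel m = new TraversalModel(graph, memo);
            m.fetchDependencies("app");
            System.out.println("memoize=" + memo + " " + m.expansions);
        }
    }
}
```

Without the guard, every path into "lib" re-expands "lib" and everything below it, which is the kind of repetition the log counts later in this issue make visible.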

ResolveEngine.fetchDependencies (all calls)

The following is the call graph from the first ResolveEngine.fetchDependencies call, with 359 sample points. Around 90% of the sample points are expanded at this point.

Eventually, each branch ends up calling one of the following methods:

org.apache.ivy.core.resolve.ResolveEngine.resolveConflict(VisitNode, String)
org.apache.ivy.core.resolve.VisitNode.getDependencies(String)
org.apache.ivy.core.resolve.VisitNode.gotoNode(IvyNode)
org.apache.ivy.core.resolve.VisitNode.isEvicted()
org.apache.ivy.core.resolve.VisitNode.isCircular()
org.apache.ivy.core.resolve.VisitNode.loadData(String, boolean)

Each of these calls accounts for around 1s, so 359 sample points at roughly 1s each add up to about 362555ms.

Here's the callee list sorted by time:

This is useful, but likely has some overlap. Here's the callee list sorted by "own time":

What's concerning is the invocation count for some of these methods. For example,

org.apache.ivy.core.resolve.IvyNode.loadData(String, IvyNode, String, String, boolean, IvyNodeUsage) 19567 0 872 **32374**

This seems to be a partial sample, so the actual count could be higher.

extra logging

To find out how many times doFetchDependencies is called for each library dependency, I added a simple log entry to the Ivy code.

$ git diff HEAD^ HEAD
diff --git a/src/java/org/apache/ivy/core/resolve/ResolveEngine.java b/src/java/org/apache/ivy/core/resolve/ResolveEngine.java
index bb3bc95..fddfbf3 100644
--- a/src/java/org/apache/ivy/core/resolve/ResolveEngine.java
+++ b/src/java/org/apache/ivy/core/resolve/ResolveEngine.java
@@ -793,6 +793,7 @@ public class ResolveEngine {

         // now we can actually resolve this configuration dependencies
         if (!isDependenciesFetched(node.getNode(), conf) && node.isTransitive()) {
+            Message.debug("- about to get dependencies for " + node.toString());
             Collection/* <VisitNode> */dependencies = node.getDependencies(conf);
             for (Iterator iter = dependencies.iterator(); iter.hasNext();) {
                 VisitNode dep = (VisitNode) iter.next();

We can publish this Ivy locally, then modify sbt to use that version and to pass Ivy's debug log through to sbt's debug log. Next, run grep to extract only the added log entries from the debug log:

$ grep "about to *" update-debug-log.txt > doFetchDepependencies-log.txt

Here is a link to doFetchDepependencies-log.txt.

[debug] - about to get dependencies for com.example.large#common_2.10;0.1.0-SNAPSHOT
[debug] - about to get dependencies for org.scala-lang#scala-library;2.10.3
[debug] - about to get dependencies for org.scala-lang#scala-library;2.10.3
[debug] - about to get dependencies for org.scala-lang#scala-library;2.10.3
[debug] - about to get dependencies for org.scala-lang#scala-library;2.10.3
[debug] - about to get dependencies for com.example.large#util1_2.10;0.1.0-SNAPSHOT
[debug] - about to get dependencies for org.scalaz#scalaz-effect_2.10;7.0.6
[debug] - about to get dependencies for org.scalaz#scalaz-core_2.10;7.0.6
[debug] - about to get dependencies for org.scalaz#scalaz-core_2.10;7.0.6
[debug] - about to get dependencies for org.scalaz#scalaz-core_2.10;7.0.6
....

If we count the lines, doFetchDependencies is called 7929 times.

$ cat doFetchDepependencies-log.txt | wc -l
    7929

Next, run uniq to group by library dependency and see how many of these calls are duplicates.

$ cat doFetchDepependencies-log.txt | sort | uniq -c | sort -rn > sorted-doFetchDependencies-log.txt

Here is a link to sorted-doFetchDependencies-log.txt.

  29 [debug] - about to get dependencies for org.scala-lang#scala-library;2.10.3
  27 [debug] - about to get dependencies for com.example.large#common_2.10;0.1.0-SNAPSHOT
  24 [debug] - about to get dependencies for org.slf4j#slf4j-api;1.7.5
  24 [debug] - about to get dependencies for org.scalaz#scalaz-effect_2.10;7.0.6
  24 [debug] - about to get dependencies for org.scala-lang#scala-library;2.10.4
  24 [debug] - about to get dependencies for org.json4s#json4s-native_2.10;3.2.6
  24 [debug] - about to get dependencies for org.json#json-simple;1.1.1
  24 [debug] - about to get dependencies for org.eclipse.jetty.orbit#javax.servlet;3.0.0.v201112011016
  24 [debug] - about to get dependencies for org.eclipse.jetty#jetty-webapp;9.1.0.v20131115
  24 [debug] - about to get dependencies for org.eclipse.jetty#jetty-plus;9.1.0.v20131115
....

The number of occurrences is prepended to each line. For example, doFetchDependencies was called 29 times for org.scala-lang#scala-library;2.10.3. Across the 449 dependencies tracked, doFetchDependencies was called about 17 times per library dependency on average (7929 / 449 ≈ 17.7). Due to eviction, some duplication is necessary to recalculate the dependency graph, but there may be opportunities for optimization, for example around IO overhead. In the field, we are observing even higher numbers of occurrences (50+ times for the Jackson JSON parser, etc.).
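The grep / sort / uniq pipeline above can also be expressed in Java, which makes it easy to compute the per-module average directly. This is a minimal sketch; `FetchLogStats` is a hypothetical name, and it assumes log lines in the format produced by the diff above ("- about to get dependencies for <module>"). Ties in the count are listed alphabetically, which may differ from the order `sort -rn` produces.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class FetchLogStats {
    static final String MARKER = "about to get dependencies for ";

    // Counts how often each module revision appears in the added log line (like uniq -c).
    static Map<String, Long> countByModule(List<String> lines) {
        return lines.stream()
            .filter(l -> l.contains(MARKER))
            .map(l -> l.substring(l.indexOf(MARKER) + MARKER.length()).trim())
            .collect(Collectors.groupingBy(m -> m, TreeMap::new, Collectors.counting()));
    }

    // Entries sorted by descending count (like sort -rn).
    static List<Map.Entry<String, Long>> sortedDesc(Map<String, Long> counts) {
        List<Map.Entry<String, Long>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(Map.Entry.<String, Long>comparingByValue().reversed());
        return entries;
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = Files.readAllLines(Path.of(args.length > 0 ? args[0] : "update-debug-log.txt"));
        Map<String, Long> counts = countByModule(lines);
        long total = counts.values().stream().mapToLong(Long::longValue).sum();
        System.out.printf("%d calls, %d modules, %.1f calls per module%n",
            total, counts.size(), (double) total / counts.size());
        sortedDesc(counts).stream().limit(10)
            .forEach(e -> System.out.printf("%6d %s%n", e.getValue(), e.getKey()));
    }
}
```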
