parthatalukdar / junto

This toolkit consists of implementations of various graph-based semi-supervised learning (SSL) algorithms. Currently, three algorithms are implemented: Gaussian Random Fields (GRF), Adsorption, and Modified Adsorption (MAD). Junto also contains Hadoop-based implementations of these three algorithms.
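The core idea behind GRF can be illustrated with a minimal Scala sketch of iterative harmonic-function label propagation. Seed (labeled) nodes keep their label scores, and every other node is repeatedly replaced by the edge-weighted average of its neighbours' scores. This is a generic textbook version written for illustration, not junto's implementation. The names `GrfSketch` and `propagate` and the adjacency-map graph representation are hypothetical.

```scala
object GrfSketch {
  // Hypothetical representation: node -> (neighbour -> edge weight), listed in both directions.
  type Graph = Map[String, Map[String, Double]]

  /** Iterative harmonic-function propagation (GRF-style sketch).
    * Seed nodes are clamped to their given label scores; every other node is
    * set to the weight-normalised average of its neighbours' scores from the
    * previous iteration.
    */
  def propagate(graph: Graph,
                seeds: Map[String, Map[String, Double]],
                labels: Seq[String],
                iterations: Int): Map[String, Map[String, Double]] = {
    val zero: Map[String, Double] = labels.map(_ -> 0.0).toMap
    var scores: Map[String, Map[String, Double]] =
      graph.keys.map(v => v -> seeds.getOrElse(v, zero)).toMap

    for (_ <- 1 to iterations) {
      // The right-hand side reads only the previous iteration's scores.
      scores = graph.map { case (v, nbrs) =>
        if (seeds.contains(v)) v -> seeds(v)
        else {
          val total = nbrs.values.sum
          val avg = labels.map { l =>
            val weighted = nbrs.toSeq.map { case (u, w) =>
              w * scores.getOrElse(u, zero).getOrElse(l, 0.0)
            }.sum
            l -> (if (total == 0.0) 0.0 else weighted / total)
          }.toMap
          v -> avg
        }
      }
    }
    scores
  }
}
```

Updating every node from the previous iteration's scores keeps each sweep independent of node order. Adsorption and MAD build on this kind of propagation by giving each node its own injection, continuation, and abandonment probabilities.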

Home Page: https://github.com/parthatalukdar/junto

License: Apache License 2.0

Languages: Java 67.63%, Scala 24.29%, Shell 8.08%

Contributors

dhgarrette, eponvert, jasonbaldridge, parthatalukdar

junto's Issues

Discrepancy in auxiliary functions between MAD paper and junto's implementation

In http://talukdar.net/papers/adsorption_ecml09.pdf (p. 5), the monotonically decreasing function
f(x) = log(beta) / log(beta + e^x)
is used to compute
cv = f(H[v]), where H[v] is the entropy of node v's transition probabilities.
However, in junto's code (src/main/scala/upenn/junto/graph/Vertex.scala, line 162), cv is computed as
var cv = math.log(beta) / math.log(beta + ent)
instead of
var cv = math.log(beta) / math.log(beta + math.exp(ent))

Because of this discrepancy, the converged labels obtained with the paper's formula differ from those obtained with junto's formula.

Moreover, lines 170-178 of src/main/scala/upenn/junto/graph/Vertex.scala contain special handling for the case where jv (dv in the paper) is 0. However, if cv were computed as in the paper, this case of jv (or dv) being 0 would not arise.

I am trying to understand why this discrepancy was introduced and what the implications are of the converged labels differing between the two cases.
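To see how far the two forms of cv diverge, here is a minimal Scala sketch that evaluates both for the same entropy value. It assumes natural-log entropy and beta = 2.0, neither of which is stated in the issue; `CvComparison` and its methods are hypothetical names, not junto code.

```scala
object CvComparison {
  // Shannon entropy (natural log) of a transition-probability distribution;
  // zero-probability entries contribute nothing.
  def entropy(probs: Seq[Double]): Double =
    -probs.filter(_ > 0.0).map(p => p * math.log(p)).sum

  // cv as in the paper: f(H) = log(beta) / log(beta + e^H).
  def cvPaper(beta: Double, ent: Double): Double =
    math.log(beta) / math.log(beta + math.exp(ent))

  // cv as quoted from Vertex.scala: no exponential applied to the entropy.
  def cvJunto(beta: Double, ent: Double): Double =
    math.log(beta) / math.log(beta + ent)

  def main(args: Array[String]): Unit = {
    val beta = 2.0 // assumed value for illustration
    val ent = entropy(Seq(0.5, 0.25, 0.25))
    println(f"H = $ent%.4f, paper cv = ${cvPaper(beta, ent)}%.4f, junto cv = ${cvJunto(beta, ent)}%.4f")
  }
}
```

Since e^x > x for every x, log(beta + e^H) > log(beta + H) whenever beta > 1, so under these assumptions the paper's cv is always smaller than the Vertex.scala value for the same entropy.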

AssertionError with a 17 MB input_graph

Dear all,

I am testing your software with an input_graph file with the following characteristics:

  • size: 17 MB
  • nodes: 82919
  • edges: 668775
  • undirected graph
  • weights ranging from 0.0001 to 1.0

I am using the simple_config file (in the examples folder) with these settings:

  • MAD algorithm
  • 10 iterations (I also tried just 1 iteration)
  • no node pruning (I also tried very high pruning coefficients)
  • default hyperparameters

After 2 seconds of computation (using all 4 cores), I get the following message:

Exception in thread "main" java.lang.AssertionError: assertion failed
    at scala.Predef$.assert(Predef.scala:165)
    at upenn.junto.config.GraphBuilder$$anonfun$apply$3.apply(GraphLoader.scala:157)
    at upenn.junto.config.GraphBuilder$$anonfun$apply$3.apply(GraphLoader.scala:155)
    at scala.collection.immutable.List.foreach(List.scala:309)
    at upenn.junto.config.GraphBuilder$.apply(GraphLoader.scala:155)
    at upenn.junto.config.GraphConfigLoader$.apply(GraphLoader.scala:59)
    at upenn.junto.app.JuntoConfigRunner$.apply(Junto.scala:98)
    at upenn.junto.app.JuntoConfigRunner$.main(Junto.scala:132)
    at upenn.junto.app.JuntoConfigRunner.main(Junto.scala)

I tried a much smaller graph (10271 nodes, 45047 edges, weights ranging from 0.03 to 1), and the software worked fine. This suggests that the problem could be due to (i) the size of the graph or (ii) the very small weights of some edges.

Any idea what the problem could be? I would appreciate your viewpoint.

Thanks,
michele.
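The issue does not show which condition the assertion at GraphLoader.scala:157 checks, so one way to narrow down hypotheses (i) and (ii) is to scan the edge file for properties a graph loader might reject. The sketch below assumes one edge per line in the form source<TAB>target<TAB>weight and treats the graph as undirected; that format and the chosen checks (self-loops, the same undirected edge listed twice with different weights, malformed lines, minimum weight) are assumptions for illustration, and `EdgeListCheck` is a hypothetical helper, not part of junto.

```scala
import scala.collection.mutable
import scala.io.Source
import scala.util.Try

object EdgeListCheck {
  final case class Report(edges: Int, nodes: Int, selfLoops: Int,
                          conflictingDuplicates: Int, malformed: Int, minWeight: Double)

  // Assumed line format (hypothetical): source<TAB>target<TAB>weight
  def check(lines: Iterator[String]): Report = {
    val seen = mutable.Map.empty[(String, String), Double]
    val nodes = mutable.Set.empty[String]
    var edges, selfLoops, conflicts, malformed = 0
    var minW = Double.PositiveInfinity

    for (line <- lines if line.trim.nonEmpty) {
      line.split("\t") match {
        case Array(a, b, w) if Try(w.toDouble).isSuccess =>
          val weight = w.toDouble
          edges += 1
          nodes += a
          nodes += b
          if (a == b) selfLoops += 1
          minW = math.min(minW, weight)
          // Undirected: (a, b) and (b, a) share one canonical key.
          val key = if (a.compareTo(b) <= 0) (a, b) else (b, a)
          seen.get(key) match {
            case Some(prev) if prev != weight => conflicts += 1
            case None                         => seen(key) = weight
            case _                            => ()
          }
        case _ => malformed += 1
      }
    }
    Report(edges, nodes.size, selfLoops, conflicts, malformed, minW)
  }

  def main(args: Array[String]): Unit = {
    val src = Source.fromFile(args(0))
    try println(check(src.getLines())) finally src.close()
  }
}
```

Running it on both the failing graph and the smaller graph that loads correctly, and comparing the counts, can show whether the larger file differs in anything other than size and minimum weight.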

junto error: error while loading CharSequence

error: error while loading CharSequence, class file '/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/rt.jar(java/lang/CharSequence.class)' is broken
(bad constant pool tag 18 at byte 10)

Incompatibility with Oracle Java 8

Hello,

Thank you for the amazing work!
I was compiling Junto with Oracle Java 8, and there seem to be some incompatibilities.
JAVA VERSION:
java version "1.8.0_25"
Java(TM) SE Runtime Environment (build 1.8.0_25-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.25-b02, mixed mode)

ERROR:
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=256M; support was removed in 8.0
[info] Loading project definition from /home/af/Documents/junto-master/project
[info] Set current project to junto (in build file:/home/af/Documents/junto-master/)
[info] Updating {file:/home/af/Documents/junto-master/}default-752ee9...
[info] Resolving org.scala-lang#scala-library;2.10.0 ...
[info] Resolving com.typesafe.akka#akka-actor_2.10;2.1.0 ...
[info] Resolving com.typesafe#config;1.0.0 ...
[info] Resolving org.clapper#argot_2.10;1.0.0 ...
[info] Resolving org.clapper#grizzled-scala_2.10;1.1.2 ...
[info] Resolving jline#jline;2.6 ...
[info] Resolving net.sf.trove4j#trove4j;3.0.3 ...
[info] Resolving com.typesafe#scalalogging-log4j_2.10;1.0.1 ...
[info] Resolving org.scala-lang#scala-reflect;2.10.0 ...
[info] Resolving org.apache.logging.log4j#log4j-api;2.0-beta3 ...
[info] Done updating.
[success] Total time: 1 s, completed 04/03/2015 4:36:49 PM
[info] Compiling 11 Scala sources and 28 Java sources to /home/af/Documents/junto-master/target/classes...
[info] 'compiler-interface' not yet compiled for Scala 2.10.0. Compiling...
error: error while loading CharSequence, class file '/usr/lib/jvm/java-8-oracle/jre/lib/rt.jar(java/lang/CharSequence.class)' is broken
(class java.lang.RuntimeException/bad constant pool tag 18 at byte 10)
error: error while loading Comparator, class file '/usr/lib/jvm/java-8-oracle/jre/lib/rt.jar(java/util/Comparator.class)' is broken
(class java.lang.RuntimeException/bad constant pool tag 18 at byte 20)
error: error while loading AnnotatedElement, class file '/usr/lib/jvm/java-8-oracle/jre/lib/rt.jar(java/lang/reflect/AnnotatedElement.class)' is broken
(class java.lang.RuntimeException/bad constant pool tag 18 at byte 76)
error: error while loading Arrays, class file '/usr/lib/jvm/java-8-oracle/jre/lib/rt.jar(java/util/Arrays.class)' is broken
(class java.lang.RuntimeException/bad constant pool tag 18 at byte 765)
/tmp/sbt_713306da/API.scala:384: error: java.util.Comparator does not take type parameters
private[this] val sortClasses = new Comparator[Symbol] {
^
5 errors found
error Error compiling sbt component 'compiler-interface'
[error] Total time: 4 s, completed 04/03/2015 4:36:54 PM

It compiles and runs fine on OpenJDK 7, though.

Cheers.

Error: Could not find or load main class upenn.junto.app.JuntoConfigRunner

This is my first time opening an issue, so sorry if this isn't how it's done.

Running
junto config
produces the error in the title, restated below:
Error: Could not find or load main class upenn.junto.app.JuntoConfigRunner

Java version information is as below:
$ java -showversion
openjdk version "1.8.0_121"
OpenJDK Runtime Environment (build 1.8.0_121-8u121-b13-0ubuntu1.16.04.2-b13)
OpenJDK 64-Bit Server VM (build 25.121-b13, mixed mode)

I'm also unable to find any class called JuntoConfigRunner in the code. Is it something to do with the classpath?

Apologies again if the solution is trivial or unworthy of a new issue. I'm trying to implement MAD in Python and was referred to your library by Dr. Ashwin Srinivasan.
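The stack trace in the AssertionError issue shows `upenn.junto.app.JuntoConfigRunner$` inside Junto.scala; the trailing `$` indicates a Scala `object`, which is why a search for "class JuntoConfigRunner" finds nothing. "Could not find or load main class" usually means the compiled classes are not on the JVM classpath, for example because the project has not been built yet. The hypothetical helper below is a minimal sketch for checking that: it prints the classpath and reports whether a class name can be resolved without initialising it.

```scala
object ClasspathProbe {
  // True if the named class can be found by this class loader;
  // initialize = false avoids running its static initialiser.
  def isLoadable(className: String): Boolean =
    try {
      Class.forName(className, false, getClass.getClassLoader)
      true
    } catch {
      case _: ClassNotFoundException | _: NoClassDefFoundError => false
    }

  def main(args: Array[String]): Unit = {
    val target = args.headOption.getOrElse("upenn.junto.app.JuntoConfigRunner")
    println(s"java.class.path = ${System.getProperty("java.class.path")}")
    println(s"$target loadable: ${isLoadable(target)}")
  }
}
```

Launched with the same classpath the junto script uses, a `false` result points to missing compiled classes rather than a problem in the code itself.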

A missing exponential part in the calculation of the quantity cv

Dear all,

Lines 161-162 of the file Vertex.scala implement the quantity cv described in Section 2.2 of [1]:

val ent = GetNeighborhoodEntropy(neighborClone)
var cv = math.log(beta) / math.log(beta + ent)

In the original paper, the entropy (the variable ent here) is passed through a function f, defined as f(x) = log(beta)/log(beta + e^x). However, this line does not apply the natural exponential. I am just a beginner Scala programmer, but I think this may be a mistake. Do you have any thoughts on this?

Many thanks,
Phiradet

[1] Talukdar, P. P., & Crammer, K. (2009). New Regularized Algorithms for Transductive Learning. In Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases: Part II (pp. 442–457). Berlin, Heidelberg: Springer-Verlag.
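The quantity cv feeds into MAD's per-node random-walk probabilities. The Scala sketch below applies f with the exponential and derives those probabilities, assuming the definitions as we recall them from Section 2.2 of [1]: dv = (1 - cv) * sqrt(H[v]) for labeled nodes and 0 otherwise, zv = max(cv + dv, 1), continuation = cv / zv, injection = dv / zv, abandonment = the remainder. Please verify these against the paper; `MadRandomWalkProbs` is a hypothetical name, not junto code.

```scala
object MadRandomWalkProbs {
  final case class Probs(cont: Double, inj: Double, abnd: Double)

  // f(x) = log(beta) / log(beta + e^x), with the exponential applied.
  def cv(beta: Double, entropy: Double): Double =
    math.log(beta) / math.log(beta + math.exp(entropy))

  // Assumed definitions (our reading of Section 2.2 of [1]; verify before use):
  //   dv = (1 - cv) * sqrt(H[v]) if v is labeled, else 0
  //   zv = max(cv + dv, 1)
  //   cont = cv / zv, inj = dv / zv, abnd = 1 - cont - inj
  def probs(beta: Double, entropy: Double, isLabeled: Boolean): Probs = {
    val c = cv(beta, entropy)
    val d = if (isLabeled) (1.0 - c) * math.sqrt(entropy) else 0.0
    val z = math.max(c + d, 1.0)
    val cont = c / z
    val inj = d / z
    Probs(cont, inj, 1.0 - cont - inj)
  }
}
```

Under these assumed definitions, a labeled node whose transition distribution has zero entropy still gets dv = 0, because sqrt(0) = 0, whichever form of cv is used.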

A suggestion

To make it clearer, maybe you should tell people to compile using $sbt bin/build update compile instead of $bin/build update compile.
Also, which example corresponds to the algorithm in "New Regularized Algorithms for Transductive Learning"?
Thanks!
