pra's Introduction

[Build Status badge] [Coverage Status badge]

PRA (Path Ranking Algorithm) and SFE (Subgraph Feature Extraction)

PRA and SFE are algorithms that extract feature matrices from graphs, and use those feature matrices to do link prediction in that graph. This repository contains implementations of PRA and SFE, as used in the following papers (among others):

  • Efficient and Expressive Knowledge Base Completion Using Subgraph Feature Extraction. Matt Gardner and Tom Mitchell. EMNLP 2015. (website)
  • Incorporating Vector Space Similarity in Random Walk Inference over Knowledge Bases. Matt Gardner, Partha Talukdar, Jayant Krishnamurthy, and Tom Mitchell. EMNLP 2014. (website)
  • Improving Learning and Inference in a Large Knowledge-base using Latent Syntactic Cues. Matt Gardner, Partha Talukdar, Bryan Kisiel, and Tom Mitchell. EMNLP 2013. (website)

To reproduce the experiments in those papers, see the corresponding website. Note that the EMNLP 2015 paper has the most detailed instructions, and the older papers use versions of the code that aren't compatible with the current repository; if you really want to reproduce the older experiments, talk to me.

See the github.io page for code documentation. Please feel free to file bugs, feature requests, or send pull requests.

If the Travis CI badge above says that the build is failing, just click on the badge, go to Build History, find the most recent commit with a passing build, and check that commit out to use the code. Or, if you're using this as a library, just specify the most recent released version in your library dependencies with sbt or mvn, and it should be based on a passing build (see the changelog below for the most recent version).

NOTE

This code generally takes quite a bit of memory. That's probably a byproduct of how it was developed; I typically use a machine that has 400GB of RAM, so I don't need to worry too much about how much memory the code is using. That means I probably do some things that are memory inefficient; on NELL graphs, the code can easily use upwards of 40GB. On larger graphs, and with various parameter settings, it can easily use much more than that. With small graphs, though, I can successfully run the code on a machine that only has 8GB of RAM. This needs some work to be made more memory efficient on larger graphs. It should be straightforward to implement a stochastic gradient training regime, for instance, that would allow for much more memory-efficient computation.
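
As a rough illustration of that last idea (a toy sketch, not the repository's training code, which currently uses MALLET's batch trainer): stochastic gradient descent for logistic regression only ever needs one feature vector in memory at a time.

import scala.util.Random

// Minimal SGD logistic regression over sparse feature vectors. Each training
// instance is a (featureIndex -> value) map plus a +1/-1 label, streamed from
// an iterator, so only one instance is materialized at a time.
object SgdLogisticRegressionSketch {
  def sigmoid(x: Double): Double = 1.0 / (1.0 + math.exp(-x))

  def train(
    instances: Iterator[(Map[Int, Double], Int)],
    numFeatures: Int,
    learningRate: Double = 0.1,
    l2: Double = 1e-4
  ): Array[Double] = {
    val weights = Array.fill(numFeatures)(0.0)
    for ((features, label) <- instances) {
      val score = features.map { case (i, v) => weights(i) * v }.sum
      val error = (if (label > 0) 1.0 else 0.0) - sigmoid(score)
      for ((i, v) <- features) {
        weights(i) += learningRate * (error * v - l2 * weights(i))
      }
    }
    weights
  }

  def main(args: Array[String]): Unit = {
    val random = new Random(0)
    // Toy data: the label is positive exactly when feature 0 is active.
    val data = (1 to 1000).iterator.map { _ =>
      val active = random.nextBoolean()
      (Map(0 -> (if (active) 1.0 else 0.0), 1 -> random.nextDouble()), if (active) 1 else -1)
    }
    println(train(data, numFeatures = 2).mkString(", "))
  }
}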

License

This code makes use of a number of other libraries that are distributed under various open source licenses (notably the Apache License and the Common Public License). You can see those dependencies listed in the build.sbt file. The code under the src/ directory is distributed under the terms of the GNU General Public License, version 3 (or, at your choosing, any later version of that license). You can find the text of that license here.

Changelog

Version 3.4 (released on 8/26/2016):

  • Migrated the graph creation code to using the pipeline architecture from my util library.

  • Added a node pair feature extractor inspired by one of the node feature extractors.

Version 3.3 (released on 3/22/2016):

  • Integrated this github repo with Travis CI and coveralls (as seen in the badges at the top of this README).

  • Implemented a kind of PathFollower for SFE - that is, given a node in the graph, and a set of features, find all other nodes that are reachable by those features. This is both easier and more complicated than the PathFollower in PRA; we don't have to compute probabilities, but we do potentially have more complicated features that make this computation difficult (I'm planning on punting on the complicated ones for now...). Currently implemented and tested for a few simple feature extractors.
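
To sketch what "following" a simple path-type feature means here (an illustration on toy data, not the actual SFE PathFollower code; the complicated feature types are exactly what this ignores):

// Given a start node and a feature defined as a sequence of edge labels (a
// "path type"), find all nodes reachable by following those labels in order.
object SfePathFollowerSketch {
  // graph(node)(relation) = neighbors reachable over an edge with that label
  type Graph = Map[String, Map[String, Set[String]]]

  def followPath(graph: Graph, start: String, pathType: Seq[String]): Set[String] =
    pathType.foldLeft(Set(start)) { (currentNodes, relation) =>
      currentNodes.flatMap { node =>
        graph.getOrElse(node, Map.empty[String, Set[String]]).getOrElse(relation, Set.empty[String])
      }
    }

  def main(args: Array[String]): Unit = {
    val graph: Graph = Map(
      "pittsburgh" -> Map("cityInState" -> Set("pennsylvania")),
      "pennsylvania" -> Map("stateHasCity" -> Set("pittsburgh", "philadelphia")))
    // Follow the path type -cityInState-stateHasCity- from "pittsburgh".
    println(followPath(graph, "pittsburgh", Seq("cityInState", "stateHasCity")))
    // => Set(pittsburgh, philadelphia)
  }
}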

Version 3.2 (released on 1/21/2016):

  • Allow for feature extraction and classification over nodes in the graph, instead of only node pairs. This means that in addition to classifying relations between two entities, you can now also classify entity types based on the features of your graph.

  • Made a change in how feature matrices are computed with SubgraphFeatureGenerator, which saves a whole lot of memory, especially on large datasets. Previously, all of the intermediate graph structures for all training / test instances would be kept in memory when computing a feature matrix, and this could get very unwieldy. Now we compute them one at a time, only keeping the final feature vector (there's a rough sketch of this change after this version's notes).

  • Implemented remote graphs. This means that if you have a really large graph that takes a long time to load, or is too big to fit in memory along with all of your feature computation, you can start a graph server once somewhere and let it run, while just passing the code a reference to where the server is running. The trouble is that it's way too slow. To make this really feasible, I need to push more of the computation to the graph, so you don't have to do so much socket communication (i.e., make a remote BfsPathFinder, or SubgraphFeatureGenerator). I did significantly increase the efficiency of loading the graph and storing it in memory as part of this, so runtimes improve quite a bit, at least.
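
To illustrate the change to feature matrix computation mentioned two entries up (a schematic sketch only; the Instance, Subgraph, and FeatureVector types here are stand-ins, not the actual SubgraphFeatureGenerator code):

// Schematic before/after for feature matrix computation.
object StreamingFeatureMatrixSketch {
  case class Instance(source: String, target: String)
  type Subgraph = Set[String]               // stand-in for the intermediate graph structure
  type FeatureVector = Map[String, Double]

  def computeSubgraph(instance: Instance): Subgraph =
    Set(instance.source, instance.target)   // placeholder for the expensive BFS step

  def extractFeatures(subgraph: Subgraph): FeatureVector =
    subgraph.map(node => node -> 1.0).toMap // placeholder feature extraction

  // Before: every instance's subgraph is held in memory before any features are extracted.
  def featureMatrixOld(instances: Seq[Instance]): Seq[FeatureVector] = {
    val allSubgraphs = instances.map(computeSubgraph)   // one intermediate structure per instance
    allSubgraphs.map(extractFeatures)
  }

  // After: one subgraph at a time; only the final feature vectors are kept.
  def featureMatrixNew(instances: Seq[Instance]): Seq[FeatureVector] =
    instances.map(instance => extractFeatures(computeSubgraph(instance)))

  def main(args: Array[String]): Unit = {
    println(featureMatrixNew(Seq(Instance("a", "b"), Instance("c", "d"))))
  }
}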

Version 3.1 (released on 11/9/2015):

  • Started work on getting SGD training for PRA/SFE. A lot of the necessary code is in place, and some aspects already work. In fact, you can run with an SGD model right now if you want; it just doesn't perform that well. I haven't figured out parameters / learning rates / momentum / whatever to make the SGD training actually perform as well as MALLET's batch training. Hopefully, though, figuring this out will allow for much improved memory requirements in the code.

  • Some improvements on the Graph object API, with the end goal of allowing the graph to live on a remote server, among other graph variations.

  • Moved the utility code into its own repository, which is now a dependency. This was so I could use those utilities in other projects I'm working on.

  • Some general efficiency improvements, that so far seem to lead to a ~20% reduction in both running time and memory usage.

Version 3.0 (released on 5/30/2015):

  • More refinement on the parameter specification (hence the larger version bump, as the parameter files are not compatible with previous versions). This nests parameters in the specification file according to how they are used in the code, and makes some things in the code way simpler. I think the specification is also conceptually cleaner, but maybe someone else would just think it's more verbose...

  • A lot of code moved to Scala, and in the process some of it became more configurable.

  • It could still use some more versatility, but there are some improvements to how the graph works - there's a setting where you can keep the graph in memory, for instance, instead of using GraphChi to do random walks over the graph on disk. You can also make instance-specific graphs, so that each training and testing instance has its own graph to use. These need to be pretty small for this to make sense, though.

  • There is a new mechanism for selecting negative examples, using personalized page rank to select them instead of PRA's random walks. It turns out that it doesn't affect PRA's performance at all, really, but it allows for a better test scenario, and it allows for comparing methods on the same training data, where some other method isn't capable of selecting its own negative examples.

  • Allowed for other learning algorithms to use PRA's feature matrix. We tried using SVMs with various kernels, and it turns out that logistic regression is better, at least on the metrics we used. And the code is set up to allow you to (relatively) easily experiment with other algorithms, if you want to.

  • Implemented a new way of creating a feature matrix over node pairs in the graph, which is simpler and easier than PRA; it's similar to just doing the first step of PRA and extracting a feature matrix from the resulting subgraphs. It's faster and works quite a bit better.
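
A very rough sketch of the idea behind that node pair feature extraction (illustrative only; it does a small BFS from the source node and keeps the relation sequences that reach the target, ignoring inverse edges and the fancier feature extractors):

// Subgraph-style node pair features: relation paths of bounded length that
// connect the two nodes become binary features for the pair.
object SubgraphFeatureSketch {
  type Graph = Map[String, Seq[(String, String)]]  // node -> (relation, neighbor) edges

  // All (relation path, endpoint) pairs reachable from `start` in at most `steps` hops.
  def bfsPaths(graph: Graph, start: String, steps: Int): Seq[(List[String], String)] =
    if (steps == 0) Seq((Nil, start))
    else {
      val shorter = bfsPaths(graph, start, steps - 1)
      shorter ++ (for {
        (path, node) <- shorter
        if path.size == steps - 1
        (relation, neighbor) <- graph.getOrElse(node, Seq.empty[(String, String)])
      } yield (path :+ relation, neighbor))
    }

  def nodePairFeatures(graph: Graph, source: String, target: String, steps: Int): Set[String] =
    bfsPaths(graph, source, steps)
      .collect { case (path, end) if end == target && path.nonEmpty => path.mkString("-", "-", "-") }
      .toSet

  def main(args: Array[String]): Unit = {
    val graph: Graph = Map(
      "alice" -> Seq(("worksFor", "cmu"), ("livesIn", "pittsburgh")),
      "cmu" -> Seq(("locatedIn", "pittsburgh")))
    println(nodePairFeatures(graph, "alice", "pittsburgh", steps = 2))
    // => Set(-livesIn-, -worksFor-locatedIn-)
  }
}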

Version 2.0 (released on 3/4/2015):

  • Much better parameter specification. See the github.io page for information on the new way to specify and run experiments. This totally breaks backwards compatibility with older formats, so you'll need to go read the documentation if you want to upgrade to this version.

  • Working synthetic data generation. There are a lot of parameters to play with here; see the documentation linked above for some more info.

  • A matrix multiplication implementation of the vector space random walks from the EMNLP 2014 paper. This is at least done in theory. I haven't gotten the performance to be quite as good yet, but the mechanism for doing it is in the code.

  • Better handling of JVM exit (versions 1.1 and earlier tend to spit out InterruptedExceptions at you when they terminate, and most of the time won't give you back the sbt console).

Version 1.1 (released on 12/20/2014):

  • ExperimentScorer now shows more information. It used to only show each experiment ranked by an overall score (like MAP); now it does a significance test on those metrics, and shows a table of each experiment's performance on each individual relation in the test. ExperimentScorer is not currently very configurable, though - you have to change the code if you want to show something else. That's relatively easy, as the parameters are all at the top of the file. You could also write another class that calls ExperimentScorer with your own parameters, if you want.

  • Added matrix multiplication as an alternative to random walks in the path following step. This is still somewhat experimental, and more details will be forthcoming in a few months. There's a new parameter that can be specified in the param file, called "path follower"; set it to "matrix multiplication" to use the new technique. The value of this is mostly theoretical at this point, as performance is pretty much identical to using random walks, except it's slower and less scalable. I plan on getting the vector space random walk idea into the matrix multiplication code soon. (A toy sketch of the matrix multiplication idea appears after this version's notes.)

  • Removed the onlyExplicitNegatives option, because it turns out it's redundant with a setting of the matrix accept policy.

  • Started work on synthetic data generation, but it's not done yet (well, you can generate some data, but learning from it doesn't turn out as I expect. Something is up...). A final release of working synthetic data generation will have to wait until version 1.2.
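
To illustrate the matrix multiplication idea mentioned above (a toy sketch, not the repository's implementation): if each relation is represented as an adjacency matrix, multiplying the matrices along a path type gives, for every (source, target) pair, the number of paths of that type connecting them, which can stand in for the random walk feature value.

// Toy path following by matrix multiplication, with dense matrices for clarity.
object MatrixPathFollowerSketch {
  type Matrix = Array[Array[Double]]

  def multiply(a: Matrix, b: Matrix): Matrix = {
    val n = a.length
    Array.tabulate(n, n) { (i, j) => (0 until n).map(k => a(i)(k) * b(k)(j)).sum }
  }

  def main(args: Array[String]): Unit = {
    // Three nodes: 0 = a river, 1 = a city, 2 = a state.
    val flowsThrough: Matrix = Array(
      Array(0.0, 1.0, 0.0),
      Array(0.0, 0.0, 0.0),
      Array(0.0, 0.0, 0.0))
    val cityInState: Matrix = Array(
      Array(0.0, 0.0, 0.0),
      Array(0.0, 0.0, 1.0),
      Array(0.0, 0.0, 0.0))
    // Feature value for the path type -flowsThrough-cityInState- between node 0 and node 2:
    println(multiply(flowsThrough, cityInState)(0)(2))  // 1.0: one path of this type
  }
}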

pra's Issues

How can I get aliases file?

Hi Matt, I am trying to run your experiments from “Incorporating Vector Space Similarity in Random Walk Inference over Knowledge Bases. Matt Gardner, Partha Talukdar, Jayant Krishnamurthy, and Tom Mitchell. EMNLP 2014.”, but I found that some data files are missing. For example, I cannot find the files whose names end with '.aliases'.

Waiting for your reply, thanks.

sbt run - No main class detected

Hello,

Sorry for disturbing you, but I followed the guide and the project doesn't work; it says "No main class detected". The following output shows the result of each command.

C:\Users\Nouha\pra>sbt test
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=256m; support was removed in 8.0
[info] Loading project definition from C:\Users\Nouha\pra\project
[info] Set current project to pra (in build file:/C:/Users/Nouha/pra/)
[info] ScalaTest
[info] Run completed in 57 milliseconds.
[info] Total number of tests run: 0
[info] Suites: completed 0, aborted 0
[info] Tests: succeeded 0, failed 0, canceled 0, ignored 0, pending 0
[info] No tests were executed.
[success] Total time: 1 s, completed 13 déc. 2018 14:51:12

C:\Users\Nouha\pra>sbt "run Users\Nouha\pra"
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=256m; support was removed in 8.0
[info] Loading project definition from C:\Users\Nouha\pra\project
[info] Set current project to pra (in build file:/C:/Users/Nouha/pra/)
java.lang.RuntimeException: No main class detected.
at scala.sys.package$.error(package.scala:27)
[trace] Stack trace suppressed: run last compile:run for the full output.
[error] (compile:run) No main class detected.
[error] Total time: 0 s, completed 13 déc. 2018 14:51:48

Waiting for your reply.

Thank you !

Make remote FeatureGenerators

With large graphs (such as Freebase), it can take upwards of 10 minutes just to load the graph from disk and create the graph object. Maybe there are some things I can do in code to make that a bit quicker, but it still would be nice to only have it done once and be able to reuse a running graph server.

Memory issue of sbt

Hi Matt,

I encountered a memory issue when running the program through sbt.
I specified the JVM argument -Xmx40960m in the sbt configuration (I tried $JAVA_OPTS, $SBT_OPTS, .sbtopts, and directly adding -J-Xmx40960m to the run command). The sbt launcher ran with the specified JVM argument, but the argument is not passed on to the actual program, "edu.cmu.ml.rtw.pra.experiments.ExperimentRunner". Therefore I always get an OutOfMemory exception, and the run exits within 10 minutes.

I used jps command to check JVM arguments, shown below:
kangqi@Darkstar:~/workspace/pra$ jps -vl
7512 sun.tools.jps.Jps -Dapplication.home=/usr/java/jdk1.8.0_51 -Xms8m
7464 edu.cmu.ml.rtw.pra.experiments.ExperimentRunner
7401 /usr/share/sbt-launcher-packaging/bin/sbt-launch.jar -Xms40960m -Xmx40960m -XX:ReservedCodeCacheSize=512m -XX:MaxMetaspaceSize=1024m -XX:MaxPermSize=512m

pid 7401 is the sbt launcher, and pid 7464 is the actual experiment program, which was not given an -Xmx argument. Have you ever encountered this situation, and could you give me some advice on this JVM configuration problem?
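
One thing I am considering trying (an untested sketch; I am not sure whether the project's build.sbt already forks the run task) is forking the run task and setting the heap size through javaOptions, so that the options apply to the forked ExperimentRunner JVM rather than to the sbt launcher:

// build.sbt additions (sketch, sbt 0.13-style keys): run in a forked JVM with its own heap.
fork in run := true
javaOptions in run ++= Seq("-Xmx40g")
// Alternatively, build a jar and run it directly:
//   java -Xmx40g -cp <classpath> edu.cmu.ml.rtw.pra.experiments.ExperimentRunner <args>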

Bug in ExperimentRunner

Hi Matt,

I found a bug when I'm running ExperimentRunner with the experiment specification under
examples/experiment_specs/nell/final_emnlp2015/sfe_bfs_pra_anyrel.json.
Below is a snippet of the output score file for the relation riverflowsthroughcity:

concept:river:mfolozi_river concept:river:mfolozi_river -3.203326821678852

concept:river:yangtze concept:river:yangtze 3.1706911223570553
concept:river:yangtze concept:river:yangtze 3.1706911223570553

concept:river:rio_grande concept:river:rio_grande 0.9799970081086755
concept:river:rio_grande concept:river:rio_grande 0.9799970081086755
concept:river:rio_grande concept:river:rio_grande 0.9799970081086755

There are 1823 rows, which is the same as the size of the testing file. However, in each row the source is always equal to the target, which I think might be a bug. So I searched the code and found the function lineToInstance in Dataset.scala, which seems to be used for reading the training and testing data. And I found this:

val source = graph.getNodeIndex(fields(0))
val target = graph.getNodeIndex(fields(0))      <----- I think it should be fields(1)

Is it a bug, or maybe I downloaded some old version of the code? Thanks!

Performing Inference and Parameter Learning

Hi,

I intend to learn weights for given possible paths and perform inference with these paths. Does the code provide that? If not, do I need to create my own feature matrix and fill its values using PRA or SFE random walks?

In addition, I'm quite confused about the metrics used in the paper (MAP and MRR) and what the relevant entities would be in this context. Can't metrics such as log-likelihood be calculated?

Thank you.

baseline_vs.embeddings

Hi, Matt. I was trying to run PRA instead of SFE, but I ran into the following problem.
[error] Exception in thread "main" java.lang.IllegalStateException: Error specifying embeddings (you must give full paths to embedding files)
So, I want to know what "pca_svo" means in the baseline_vs*.json file.

Compress large files

The graph files (including any associated dictionaries) really should be compressed, to save disk space and disk I/O time. Loading these large files would probably also be faster, because decompressing a file in memory generally takes less time than reading an already decompressed file from disk.
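
A minimal sketch of what reading a gzipped graph file could look like (standard library only, with a hypothetical file name; this is not tied to the current graph-loading code):

import java.io.{BufferedReader, FileInputStream, InputStreamReader}
import java.util.zip.GZIPInputStream

// Stream lines out of a gzipped edge file instead of an uncompressed one.
object GzippedGraphReaderSketch {
  def gzippedLines(path: String): Iterator[String] = {
    val reader = new BufferedReader(
      new InputStreamReader(new GZIPInputStream(new FileInputStream(path)), "UTF-8"))
    Iterator.continually(reader.readLine()).takeWhile(_ != null)
  }

  def main(args: Array[String]): Unit = {
    gzippedLines("graph_edges.tsv.gz").take(5).foreach(println)  // hypothetical file name
  }
}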

run sfe_bfs_pra_anyrel

I am trying to follow http://rtw.ml.cmu.edu/emnlp2015_sfe/ to replicate the results reported in EMNLP 2015.
Even though I can get the code to run until the end, I get an error:
INFO models.MalletLogisticRegression: Using L1 regularization
cc.mallet.optimize.OrthantWiseLimitedMemoryBFGS
INFO: getValue() (MalletLogisticRegression$LogRegOptimizable.getValue()

Is this common, or did I mess something up when I set up the code?
Thanks,

FakeDatasetFactory cannot be resolved

I checked out v3.0, then I ran sbt test and got the following error:

[error] ...  pra/src/test/scala/edu/cmu/ml/rtw/pra/config/SpecFileReaderSpec.scala:4: object FakeDatasetFactory is not a member of package edu.cmu.ml.rtw.pra.experiments
[error] import edu.cmu.ml.rtw.pra.experiments.FakeDatasetFactory
[error] one error found
[error] (test:compileIncremental) Compilation failed

I've tried v2.0 and it works, so I think this error is related to v3.0 only.

Extract SFE features to file

Is there any documentation about how to use SFE as a library to extract SFE features into a file, without doing any learning or prediction?

re-run EMNLP 2015 paper using PRA not SFE

I'm trying to run the experiment from EMNLP 2015 using PRA with random walks but I can't.

I've used the following configuration file
~/pra/examples/experiment_specs/nell/final_emnlp2015/sfe_bfs_pra_anyrel_PRA.json

{
  "operation": {
    "type": "train and test",
    "features": {
      "path finder": {
        "walks per source": 100,
        "path finding iterations": 3,
        "path accept policy": "paired-only"
      },
      "path selector": {
        "number of paths to keep": 1000
      },
      "path follower": {
        "walks per path": 50,
        "matrix accept policy": "all-targets"
      }
    },
    "learning": {
      "l1 weight": 0.005,
      "l2 weight": 1
    }
  }
}

then I ran it like the following:

sbt "run ./examples/ sfe_bfs_pra_anyrel_PRA"

Then I chose the experiment runner, and it emitted the following error:

[info] Running edu.cmu.ml.rtw.pra.experiments.ExperimentRunner ./examples/ sfe_bfs_pra_anyrel_PRA
[info] Found 353 experiment specs, and kept 1 of them
[info] Running PRA from spec file ./examples/experiment_specs/nell/final_emnlp2015/sfe_bfs_pra_anyrel_PRA.json
[info] Split not found at ./examples/splits/nell_with_negatives/; creating it...
[error] Exception in thread "main" java.util.NoSuchElementException: None.get
[error]         at scala.None$.get(Option.scala:347)
[error]         at scala.None$.get(Option.scala:345)
[error]         at edu.cmu.ml.rtw.pra.data.SplitCreator.<init>(SplitCreator.scala:32)
[error]         at edu.cmu.ml.rtw.pra.experiments.Driver.createSplitIfNecessary(Driver.scala:286)
[error]         at edu.cmu.ml.rtw.pra.experiments.Driver.runPra(Driver.scala:57)
[error]         at edu.cmu.ml.rtw.pra.experiments.ExperimentRunner$.runPraFromSpec(ExperimentRunner.scala:70)
[error]         at edu.cmu.ml.rtw.pra.experiments.ExperimentRunner$$anonfun$runPra$1.apply(ExperimentRunner.scala:55)
[error]         at edu.cmu.ml.rtw.pra.experiments.ExperimentRunner$$anonfun$runPra$1.apply(ExperimentRunner.scala:55)
[error]         at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
[error]         at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
[error]         at scala.collection.mutable.ArraySeq.foreach(ArraySeq.scala:74)
[error]         at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
[error]         at scala.collection.AbstractTraversable.map(Traversable.scala:104)
[error]         at edu.cmu.ml.rtw.pra.experiments.ExperimentRunner$.runPra(ExperimentRunner.scala:55)
[error]         at edu.cmu.ml.rtw.pra.experiments.ExperimentRunner$.main(ExperimentRunner.scala:25)
[error]         at edu.cmu.ml.rtw.pra.experiments.ExperimentRunner.main(ExperimentRunner.scala)
java.lang.RuntimeException: Nonzero exit code returned from runner: 1
        at scala.sys.package$.error(package.scala:27)
[trace] Stack trace suppressed: run last compile:run for the full output.
[error] (compile:run) Nonzero exit code returned from runner: 1
[error] Total time: 3 s, completed 18-Apr-2016 13:38:56

How can I get this working with PRA?

Generate negative examples

Using the repository as a jar file, how can one generate a sample of negative example triples for a specific relation using the personalized PageRank algorithm?

It seems like the PprNegativeExampleSelector class is the one that does the job, but I cannot figure out the proper way to use it for extracting negative examples.
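
To clarify what I mean, here is a generic personalized PageRank sketch in plain Scala (this does not use PprNegativeExampleSelector or any of the repository's classes); I would like something that scores candidate nodes by their personalized PageRank from the source entity and keeps high-scoring nodes that are not known positives as negatives:

// Generic personalized PageRank (power iteration with restart to a single source),
// used to rank candidate negative targets for one source entity.
object PprNegativeSketch {
  type Graph = Map[String, Seq[String]]  // node -> outgoing neighbors

  def personalizedPageRank(graph: Graph, source: String, alpha: Double = 0.15, iters: Int = 20): Map[String, Double] = {
    val nodes = (graph.keys ++ graph.values.flatten).toSet
    var scores = nodes.map(n => n -> (if (n == source) 1.0 else 0.0)).toMap
    for (_ <- 1 to iters) {
      val spread = scores.toSeq.flatMap { case (node, score) =>
        val neighbors = graph.getOrElse(node, Seq.empty[String])
        neighbors.map(n => n -> (1 - alpha) * score / neighbors.size)
      }.groupBy(_._1).map { case (n, contributions) => n -> contributions.map(_._2).sum }
      scores = nodes.map(n => n -> (spread.getOrElse(n, 0.0) + (if (n == source) alpha else 0.0))).toMap
    }
    scores
  }

  def main(args: Array[String]): Unit = {
    val graph: Graph = Map("alice" -> Seq("bob", "carl"), "bob" -> Seq("carl"))
    val knownPositives = Set("bob")
    val negatives = personalizedPageRank(graph, "alice").toSeq
      .filter { case (node, _) => node != "alice" && !knownPositives.contains(node) }
      .sortBy { case (_, score) => -score }
      .take(5)
    println(negatives)  // high-PPR nodes near "alice" that are not known positives
  }
}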

PRA ArrayIndexOutOfBoundsException

Hello Matt,

We have successfully run SFE. However, when we run PRA, an ArrayIndexOutOfBoundsException occurs. See below. Would you please have a look at this? I appreciate it.

Thank you in advance!

"""
Computing feature values
9:48:59 PM walk-manager - t:1 INFO: Initial size for walk bucket: 32
9:48:59 PM path-follower execute - t:1 INFO: Creating feature matrix using random walks
9:48:59 PM long-walk-manager initializeWalks - t:1 INFO: Calculate sizes. Walks length:3574
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: -1786
at edu.cmu.graphchi.walks.LongWalkManager.initializeWalks(LongWalkManager.java:150)
at edu.cmu.graphchi.walks.DrunkardDriver.initWalks(DrunkardDriver.java:98)
at edu.cmu.graphchi.walks.DrunkardMobEngine.run(DrunkardMobEngine.java:147)
at edu.cmu.ml.rtw.pra.features.RandomWalkPathFollower.execute(RandomWalkPathFollower.java:129)
at edu.cmu.ml.rtw.pra.features.PraFeatureGenerator.computeFeatureValues(PraFeatureGenerator.scala:124)
at edu.cmu.ml.rtw.pra.features.PraFeatureGenerator.createTestMatrix(PraFeatureGenerator.scala:52)
at edu.cmu.ml.rtw.pra.operations.TrainAndTest.runRelation(Operation.scala:108)
at edu.cmu.ml.rtw.pra.experiments.Driver$$anonfun$_runStep$1.apply(Driver.scala:93)
at edu.cmu.ml.rtw.pra.experiments.Driver$$anonfun$_runStep$1.apply(Driver.scala:87)
at scala.collection.immutable.Stream.foreach(Stream.scala:594)
at edu.cmu.ml.rtw.pra.experiments.Driver._runStep(Driver.scala:87)
at com.mattg.pipeline.Step.runStep(Step.scala:187)
at com.mattg.pipeline.Step._runPipeline(Step.scala:152)
at com.mattg.pipeline.Step.runPipeline(Step.scala:79)
at edu.cmu.ml.rtw.pra.experiments.ExperimentRunner$.runPraFromSpec(ExperimentRunner.scala:76)
at edu.cmu.ml.rtw.pra.experiments.ExperimentRunner$$anonfun$runPra$1.apply(ExperimentRunner.scala:61)
at edu.cmu.ml.rtw.pra.experiments.ExperimentRunner$$anonfun$runPra$1.apply(ExperimentRunner.scala:61)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.mutable.ArraySeq.foreach(ArraySeq.scala:74)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.AbstractTraversable.map(Traversable.scala:104)
at edu.cmu.ml.rtw.pra.experiments.ExperimentRunner$.runPra(ExperimentRunner.scala:61)
at edu.cmu.ml.rtw.pra.experiments.ExperimentRunner$.main(ExperimentRunner.scala:31)
at edu.cmu.ml.rtw.pra.experiments.ExperimentRunner.main(ExperimentRunner.scala)
"""

running ExperimentRunner problem

Hi Dr. Gardner,
I'm currently learning your PRA algorithm and using the code to run some examples in the codebase.
I just ran sbt "run ./examples/ final_emnlp2015", and there are many errors like the following:

mg1@slave2 ~/pra

$ sbt "run ./examples/ final_emnlp2015"
[info] Loading project definition from /home/mg1/pra/project
[info] Set current project to pra (in build file:/home/mg1/pra/)
[warn] Multiple main classes detected. Run 'show discoveredMainClasses' to see the list
Multiple main classes detected, select one to run:
[1] edu.cmu.ml.rtw.pra.experiments.ExperimentRunner
[2] edu.cmu.ml.rtw.pra.experiments.ExperimentScorer
[3] edu.cmu.ml.rtw.users.matt.one_off.neil_experiments
Enter number: 1
[info] Running edu.cmu.ml.rtw.pra.experiments.ExperimentRunner ./examples/ final_emnlp2015
[info] Found 349 experiment specs, and kept 13 of them
[info] Running PRA from spec file ./examples/experiment_specs/nell/final_emnlp2015/sfe_bfs_pra_one_sided.json
[error] Exception in thread "main" org.json4s.package$MappingException: Did not find value which can be converted into java.lang.String
[error] at org.json4s.Extraction$.convert(Extraction.scala:603)
[error] at org.json4s.Extraction$.extract(Extraction.scala:350)
[error] at org.json4s.Extraction$.extract(Extraction.scala:42)
[error] at org.json4s.ExtractableJsonAstNode.extract(ExtractableJsonAstNode.scala:21)
[error] at edu.cmu.ml.rtw.pra.graphs.GraphCreator.createGraphChiRelationGraph(GraphCreator.scala:76)
[error] at edu.cmu.ml.rtw.pra.graphs.GraphCreator.createGraphChiRelationGraph(GraphCreator.scala:72)
[error] at edu.cmu.ml.rtw.pra.experiments.Driver.createGraphIfNecessary(Driver.scala:269)
[error] at edu.cmu.ml.rtw.pra.experiments.Driver.runPra(Driver.scala:46)
[error] at edu.cmu.ml.rtw.pra.experiments.ExperimentRunner$.runPraFromSpec(ExperimentRunner.scala:76)
[error] at edu.cmu.ml.rtw.pra.experiments.ExperimentRunner$$anonfun$runPra$1.apply(ExperimentRunner.scala:56)
[error] at edu.cmu.ml.rtw.pra.experiments.ExperimentRunner$$anonfun$runPra$1.apply(ExperimentRunner.scala:56)
[error] at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
[error] at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
[error] at scala.collection.mutable.ArraySeq.foreach(ArraySeq.scala:74)
[error] at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
[error] at scala.collection.AbstractTraversable.map(Traversable.scala:104)
[error] at edu.cmu.ml.rtw.pra.experiments.ExperimentRunner$.runPra(ExperimentRunner.scala:56)
[error] at edu.cmu.ml.rtw.pra.experiments.ExperimentRunner$.main(ExperimentRunner.scala:26)
[error] at edu.cmu.ml.rtw.pra.experiments.ExperimentRunner.main(ExperimentRunner.scala)
java.lang.RuntimeException: Nonzero exit code returned from runner: 1
at scala.sys.package$.error(package.scala:27)
[trace] Stack trace suppressed: run last compile:run for the full output.
[error] Nonzero exit code returned from runner: 1
[error] Total time: 3 s, completed Sep 29, 2015 3:21:58 PM

Because I just started learning Scala, can you give me some advice? Thank you very much!

ExperimentRunner output unhelpful

Hi Matt,

I've downloaded the code, and I found there are many JSON files in the "examples" folder, so I ran the command sbt "run ./examples/", passing this folder to ExperimentRunner as the base directory. The program did recognize all the JSON files; however, none of the experiment specifications is actually run, and it seems that the function "runPraFromSpec" is never executed. In fact I'm new to Scala, and I'm not sure about the meaning of the code "shuffled.map(runPraFromSpec(pra_base) _)". So is this a bug in the program, or did I make a mistake somewhere? Thanks so much!
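
For reference, my current (possibly wrong) understanding is that runPraFromSpec(pra_base) _ is a partially applied function: the underscore leaves the remaining parameter open, producing a function that map then applies to every spec file. A toy example of the same pattern (hypothetical names, not the project's code):

// Toy illustration of the shuffled.map(runPraFromSpec(pra_base) _) pattern.
object PartialApplicationSketch {
  // A curried two-argument method, similar in shape to runPraFromSpec(base)(specFile).
  def runSpec(base: String)(specFile: String): String = s"running $specFile under $base"

  def main(args: Array[String]): Unit = {
    val shuffled = Seq("spec_a.json", "spec_b.json")
    // runSpec("./examples/") _ fixes the first argument and yields a String => String
    // function, which map applies to each spec file in the shuffled list.
    val results = shuffled.map(runSpec("./examples/") _)
    results.foreach(println)
  }
}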

Why PRA performs better than SFE ?

Hello, Matt,

I have successfully run SFE and PRA. However, I found that PRA had better precision and accuracy than SFE. I guess that may be because I set the feature size in SFE to -1, so there are too many features in the experiment, which may influence the result. My configuration files are below.

Thank you in advance!

"""
{
  "graph": {
    "name": "yago_sfe",
    "relation sets": [
      "load relation_sets/yago"
    ]
  },
  "split": "yago",
  "operation": {
    "features": {
      "type": "subgraphs",
      "path finder": {
        "type": "BfsPathFinder",
        "number of steps": 3
      },
      "feature extractors": [
        "PraFeatureExtractor",
        "AnyRelFeatureExtractor"
      ],
      "feature size": -1
    },
    "learning": {
      "l1 weight": 0.001,
      "l2 weight": 0.01
    }
  }
}
"""

"""
{
  "graph": {
    "name": "yago_pra",
    "relation sets": [
      "load relation_sets/yago"
    ]
  },
  "split": "yago",
  "operation": {
    "type": "train and test",
    "features": {
      "type": "pra",
      "path finder": {
        "type": "RandomWalkPathFinder",
        "walks per source": 100,
        "path finding iterations": 3,
        "path accept policy": "paired-only"
      },
      "path selector": {
        "number of paths to keep": 1000
      },
      "path follower": {
        "walks per path": 50,
        "matrix accept policy": "paired-targets-only"
      }
    },
    "learning": {
      "l1 weight": 0.005,
      "l2 weight": 0.01
    }
  }
}
"""

I can't run the code....

I cloned the latest version, but it doesn't seem to work here; it cannot pass all the tests. sbt shows me this:

[info] ScalaTest
[info] Run completed in 31 seconds, 983 milliseconds.
[info] Total number of tests run: 159
[info] Suites: completed 26, aborted 0
[info] Tests: succeeded 156, failed 3, canceled 0, ignored 0, pending 0
[info] *** 3 TESTS FAILED ***
[error] Failed: Total 183, Failed 12, Errors 0, Passed 171
[error] Failed tests:
[error]     edu.cmu.ml.rtw.pra.graphs.GraphOnDiskSpec
[error]     edu.cmu.ml.rtw.pra.features.PraFeatureGeneratorSpec
[error]     edu.cmu.ml.rtw.pra.features.RandomWalkPathFollowerTest
[error]     edu.cmu.ml.rtw.pra.graphs.GraphChiPprComputerSpec
[error]     edu.cmu.ml.rtw.pra.features.RandomWalkPathFinderTest
[error] (test:test) sbt.TestsFailedException: Tests unsuccessful
[error] Total time: 42 s, completed Oct 14, 2016 4:40:07 PM

Then I tried some simple examples, and the JVM throws a NullPointerException:

> run examples synthetic
[warn] Multiple main classes detected.  Run 'show discoveredMainClasses' to see the list

Multiple main classes detected, select one to run:

 [1] edu.cmu.ml.rtw.pra.experiments.ExperimentRunner
 [2] edu.cmu.ml.rtw.pra.experiments.ExperimentScorer
 [3] edu.cmu.ml.rtw.pra.graphs.RunRemoteGraphServer

Enter number: 1

[info] Running edu.cmu.ml.rtw.pra.experiments.ExperimentRunner examples synthetic
[info] Found 354 experiment specs, and kept 6 of them
[info] Running PRA from spec file examples/experiment_specs/debugging/synthetic.json
[error] Exception in thread "main" java.lang.NullPointerException
[error]     at edu.cmu.ml.rtw.pra.graphs.GraphCreator.generateSyntheticRelationSet(GraphCreator.scala:307)
[error]     at edu.cmu.ml.rtw.pra.graphs.GraphCreator$$anonfun$1.apply(GraphCreator.scala:51)
[error]     at edu.cmu.ml.rtw.pra.graphs.GraphCreator$$anonfun$1.apply(GraphCreator.scala:48)
[error]     at scala.collection.immutable.List.map(List.scala:273)
[error]     at edu.cmu.ml.rtw.pra.graphs.GraphCreator.<init>(GraphCreator.scala:48)
[error]     at edu.cmu.ml.rtw.pra.experiments.Driver.getGraphInput(Driver.scala:138)
[error]     at edu.cmu.ml.rtw.pra.experiments.Driver.<init>(Driver.scala:52)
[error]     at edu.cmu.ml.rtw.pra.experiments.ExperimentRunner$.runPraFromSpec(ExperimentRunner.scala:76)
[error]     at edu.cmu.ml.rtw.pra.experiments.ExperimentRunner$$anonfun$runPra$1.apply(ExperimentRunner.scala:61)
[error]     at edu.cmu.ml.rtw.pra.experiments.ExperimentRunner$$anonfun$runPra$1.apply(ExperimentRunner.scala:61)
[error]     at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
[error]     at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
[error]     at scala.collection.mutable.ArraySeq.foreach(ArraySeq.scala:74)
[error]     at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
[error]     at scala.collection.AbstractTraversable.map(Traversable.scala:104)
[error]     at edu.cmu.ml.rtw.pra.experiments.ExperimentRunner$.runPra(ExperimentRunner.scala:61)
[error]     at edu.cmu.ml.rtw.pra.experiments.ExperimentRunner$.main(ExperimentRunner.scala:31)
[error]     at edu.cmu.ml.rtw.pra.experiments.ExperimentRunner.main(ExperimentRunner.scala)
java.lang.RuntimeException: Nonzero exit code returned from runner: 1
    at scala.sys.package$.error(package.scala:27)
[trace] Stack trace suppressed: run last compile:run for the full output.
[error] (compile:run) Nonzero exit code returned from runner: 1
[error] Total time: 10 s, completed Oct 14, 2016 5:12:56 PM

Finally I checked out v3.0. This time the tests can't even be compiled! sbt shows me this:

> test
[info] Compiling 18 Scala sources and 21 Java sources to /home/liqimai/host/Document/KnowledgeGraph/pra/target/scala-2.11/test-classes...
[error] /home/liqimai/host/Document/KnowledgeGraph/pra/src/test/scala/edu/cmu/ml/rtw/pra/config/SpecFileReaderSpec.scala:4: object FakeDatasetFactory is not a member of package edu.cmu.ml.rtw.pra.experiments
[error] import edu.cmu.ml.rtw.pra.experiments.FakeDatasetFactory
[error]        ^
[error] one error found
[error] (test:compileIncremental) Compilation failed
[error] Total time: 23 s, completed Oct 14, 2016 5:01:55 PM

Could you help me? I've been stuck on all these errors for two days!

RAM issues when extracting features

Hi,

I am running into RAM issues when performing the CreateMatrices operation for larger graphs and for more expressive features (i.e., going beyond PRA-like features) using SFE.

I observed that, from the time I start running the code to its end, RAM usage only tends to increase, independently of the current relation being processed. Interestingly, if I quit the execution and restart the code a second time from the last relation that the first run couldn't handle, the code is able to follow through with a number of new relations before again using SWAP space (and then, since everything slows down heavily I am forced to quit and restart a third time, and so on and so forth). I am able to process all relations in this manner, but it is not ideal.

Thus, I am trying to understand the mechanism responsible for that. In my (very humble) understanding, this could be due to the following reasons:

  1. Regarding the Operation instance: Every time the code processes each relation, a new FeatureGenerator instance is created, where features are stored. Maybe this generator is not being cleaned from memory after each relation is run and this is what causes the RAM usage to explode as more and more relations are processed.
  2. Another object (maybe split or graph ?) increases its size as more and more features are extracted. Maybe each subgraph created for each node pair is not being cleaned from memory, or something along these lines.

Notice that the reasons mentioned above are just speculations so far since I am still getting familiar with the code, but they seem to make sense given its behavior.

Any ideas on this will help.

Set up with a different data set

I want to run the code with my own data set, but I am not sure how to correctly create the input data (graph and metadata files).
I currently have a database that contains 3 columns (subject, object, and relationship).
How can I convert this data into the graph format (stored in graph.tgz)? What is included in the metadata, and what is its format?
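
In case it helps, here is the rough direction I was imagining for the conversion (a sketch with made-up table and file names; I am assuming a tab-separated triple file is a reasonable starting point for a relation set, but I am not sure of the exact format the graph creation code expects):

import java.io.PrintWriter
import java.sql.DriverManager

// Dump a 3-column (subject, relationship, object) table into a tab-separated
// triple file, one edge per line.
object DumpTriples {
  def main(args: Array[String]): Unit = {
    val connection = DriverManager.getConnection("jdbc:sqlite:my_data.db")  // hypothetical database
    val out = new PrintWriter("my_relation_set.tsv")
    try {
      val results = connection.createStatement()
        .executeQuery("SELECT subject, relationship, object FROM triples")  // hypothetical table
      while (results.next()) {
        out.println(s"${results.getString("subject")}\t${results.getString("relationship")}\t${results.getString("object")}")
      }
    } finally {
      out.close()
      connection.close()
    }
  }
}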

Thanks

Understand features observations

I did manage to extract features using the configuration file approach, and they are stored in two files, training_matrix.tsv and test_matrix.tsv.

I've extracted a random observation from the training_matrix.tsv file for different relations, but I couldn't understand what the feature representation means.

This is an example from the relation concept:actorstarredinmovie. The following is a single observation line; I copied it into a new file and tried to decompose it into a set of features to understand what the features look like, but I couldn't:

concept:sportsteam:purdue_university,concept:sportsleague:ncaa  -1      
-generalizations-_generalizations-,1.0 -#- 
ANYREL:-@ANY_REL@-_generalizations-,1.0 -#- 

ANYREL: -generalizations-_@ANY_REL@-,1.0                                         -#- 
        -concept:subpartoforganization-concept:teamplaysinleague-           ,1.0 -#- 
        -_concept:superpartoforganization-generalizations-_generalizations- ,1.0 -#- 

ANYREL:-_concept:superpartoforganization-@ANY_REL@-_@ALIAS@-@ALIAS@-,1.0 -#- 
ANYREL:-_@ANY_REL@-concept:teamplaysinleague-,1.0 -#- 
ANYREL:-_@ANY_REL@-_concept:leagueteams-_@ALIAS@-@ALIAS@-,1.0 -#- 
ANYREL:-_concept:superpartoforganization-_concept:leagueteams-_@ANY_REL@-@ALIAS@-,1.0 -#- 
ANYREL:-concept:subpartoforganization-_@ALIAS@-@ANY_REL@-,1.0 -#- 
ANYREL:-concept:subpartoforganization-_@ANY_REL@-,1.0 -#- 
ANYREL:-concept:subpartoforganization-@ANY_REL@-_@ALIAS@-@ALIAS@-,1.0 -#- 
ANYREL:-_concept:superpartoforganization-@ANY_REL@-,1.0 -#- 
ANYREL:-_@ANY_REL@-generalizations-_generalizations-,1.0 -#- 
ANYREL:-concept:subpartoforganization-@ANY_REL@-_generalizations-,1.0 -#- 
ANYREL:-@ANY_REL@-_@ALIAS@-@ALIAS@-,1.0 -#- 
ANYREL:-concept:subpartoforganization-_@ANY_REL@-_@ALIAS@-@ALIAS@-,1.0 -#- 
ANYREL:-concept:subpartoforganization-_@ANY_REL@-@ALIAS@-,1.0 -#- 
ANYREL:-_concept:superpartoforganization-concept:teamplaysinleague-_@ANY_REL@-@ALIAS@-,1.0 -#- 
ANYREL:-_concept:superpartoforganization-_@ANY_REL@-,1.0 -#- 

ANYREL: -_concept:superpartoforganization-_@ANY_REL@-@ALIAS@-,1.0 -#- 
        -_concept:superpartoforganization-concept:teamplaysinleague-,1.0 -#- 
        -_concept:superpartoforganization-_concept:leagueteams-_@ALIAS@-@ALIAS@-,1.0 -#- 
        -concept:subpartoforganization-_concept:leagueteams-_@ALIAS@-@ALIAS@-,1.0 -#- 

ANYREL:-@ANY_REL@-concept:teamplaysinleague-,1.0 -#- 
ANYREL:-_concept:superpartoforganization-concept:teamplaysinleague-_@ALIAS@-@ANY_REL@-,1.0 -#- 
ANYREL:-_concept:superpartoforganization-_@ALIAS@-@ANY_REL@-,1.0 -#- 
ANYREL:-_@ANY_REL@-_@ALIAS@-@ALIAS@-,1.0 -#- 
ANYREL:-@ANY_REL@-generalizations-_generalizations-,1.0 -#- 
-concept:subpartoforganization-_concept:leagueteams-,1.0 -#- -_concept:superpartoforganization-_concept:leagueteams-,1.0 -#- 
ANYREL:-concept:subpartoforganization-concept:teamplaysinleague-_@ALIAS@-@ANY_REL@-,1.0 -#- 
ANYREL:-concept:subpartoforganization-generalizations-_@ANY_REL@-,1.0 -#- -concept:subpartoforganization-_@ALIAS@-@ALIAS@-,1.0 -#- 
ANYREL:-_concept:superpartoforganization-@ANY_REL@-_generalizations-,1.0 -#- -concept:subpartoforganization-generalizations-_generalizations-,1.0 -#- 
ANYREL:-_concept:superpartoforganization-generalizations-_@ANY_REL@-,1.0 -#- ANYREL:-concept:subpartoforganization-_concept:leagueteams-_@ALIAS@-@ANY_REL@-,1.0 -#- 
ANYREL:-concept:subpartoforganization-@ANY_REL@-,1.0 -#- -_concept:superpartoforganization-_@ALIAS@-@ALIAS@-,1.0 -#- ANYREL:-_@ANY_REL@-_concept:leagueteams-,1.0 -#- 
ANYREL:-concept:subpartoforganization-concept:teamplaysinleague-_@ANY_REL@-@ALIAS@-,1.0 -#- 
ANYREL:-@ANY_REL@-_concept:leagueteams-,1.0 -#- ANYREL:-_concept:superpartoforganization-_concept:leagueteams-_@ALIAS@-@ANY_REL@-,1.0 -#- 
ANYREL:-@ANY_REL@-_concept:leagueteams-_@ALIAS@-@ALIAS@-,1.0 -#- 
ANYREL:-_@ANY_REL@-concept:teamplaysinleague-_@ALIAS@-@ALIAS@-,1.0 -#- -concept:subpartoforganization-concept:teamplaysinleague-_@ALIAS@-@ALIAS@-,1.0 -#- -_concept:superpartoforganization-concept:teamplaysinleague-_@ALIAS@-@ALIAS@-,1.0 -#- 
ANYREL:-concept:subpartoforganization-_concept:leagueteams-_@ANY_REL@-@ALIAS@-,1.0 -#- 
ANYREL:-@ANY_REL@-concept:teamplaysinleague-_@ALIAS@-@ALIAS@-,1.0 -#- 
ANYREL:-_concept:superpartoforganization-_@ANY_REL@-_@ALIAS@-@ALIAS@-,1.0 -#- 
ANYREL:-_concept:teamalsoknownas-@ANY_REL@-_generalizations-,1.0 -#- 
ANYREL:-concept:teamalsoknownas-@ANY_REL@-_generalizations-,1.0 -#- 
ANYREL:-concept:teamalsoknownas-generalizations-_@ANY_REL@-,1.0 -#- 
ANYREL:-concept:teamplaysincity-generalizations-_@ANY_REL@-,1.0 -#- 
ANYREL:-_concept:teamalsoknownas-generalizations-_@ANY_REL@-,1.0 -#- -concept:teamalsoknownas-generalizations-_generalizations-,1.0 -#- 
ANYREL:-_concept:citysportsteams-generalizations-_@ANY_REL@-,1.0 -#- 
ANYREL:-concept:teamplaysincity-@ANY_REL@-_generalizations-,1.0 -#- 
ANYREL:-_concept:citysportsteams-@ANY_REL@-_generalizations-,1.0 -#- -concept:teamplaysincity-generalizations-_generalizations-,1.0 -#- -_concept:citysportsteams-generalizations-_generalizations-,1.0 -#- -_concept:teamalsoknownas-generalizations-_generalizations-,1.0
  1. What are -generalizations- and _generalizations-?
  2. What is the value 1.0 that is repeated throughout the training observation?
  3. What is -#- ?
  4. What is the feature separator? Is it ANYREL: that marks the start of a new feature?

If the following is a full, valid feature (as I assume):

ANYREL: -_concept:superpartoforganization-_@ANY_REL@-@ALIAS@-,1.0 -#- 
        -_concept:superpartoforganization-concept:teamplaysinleague-,1.0 -#- 
        -_concept:superpartoforganization-_concept:leagueteams-_@ALIAS@-@ALIAS@-,1.0 -#- 
        -concept:subpartoforganization-_concept:leagueteams-_@ALIAS@-@ALIAS@-,1.0 -#- 

what does it represent?

Create split with user provided negative examples

Hi,

I want to automatically generate both the graph and the split from files (train.tsv, valid.tsv, test.tsv) that already have negative examples. For instance, each of the files would be in the following format:

Alice   Loves   Bob     1
Alice   Loves   Carl    -1
...     ...     ...     {1|-1}

I have seen that it is possible to generate the graph from relation sets that contain only positive triples, and that we can generate a split (with a proportion of automatically generated negative examples) from the graph.

What I am asking is whether, with the current implementation, we can automatically create a split (the directory and the files) with negative examples that I specify myself. If yes, how do I do it?

PS: If you feel like this discussion should be part of another one instead of having its own topic, please let me know it and I'll move it there.

Get rid of MALLET

The MALLET dependency is 5 years old, and is basically abandoned code. My dependence on MALLET is holding me back from updating some other libraries that MALLET uses (like trove - MALLET requires 2.0.2, current version is 3.0.3, and they are not compatible). My use of MALLET is just in one small piece of the code, and using a different library for learning would be a good idea.
