
iga-adi-giraph's Issues

Shrink matrices at the leaves and elsewhere

Right now we use matrices of the same size everywhere.
This makes no sense: at the leaves the matrix is only 3x3, not 6x6. To make matters worse, the same applies to X and B, which tend to have a lot of columns.
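To get a feel for the potential saving, here is a rough count of stored elements per vertex. It is only a sketch: the class and method names are hypothetical, and it assumes each vertex holds one square matrix plus X and B with the same number of rows and `columns` columns each, which may not match the actual layout.

```java
final class MatrixFootprint {

    // Number of stored elements for one square matrix of the given order
    // plus two right-hand-side-like matrices (X and B) with `columns` columns each.
    // Assumption: X and B have as many rows as the square matrix.
    static long elements(int order, int columns) {
        return (long) order * order + 2L * order * columns;
    }

    public static void main(String[] args) {
        int columns = 100; // illustrative value only
        System.out.println("6x6 everywhere: " + elements(6, columns)); // 36 + 1200 = 1236
        System.out.println("3x3 at leaves:  " + elements(3, columns)); // 9 + 600 = 609
    }
}
```

With many columns the X and B terms dominate, so halving the row count at the leaves roughly halves the per-leaf footprint under these assumptions.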

Remove all forms of boxing

Right now some streams call .boxed(), which is unacceptable for memory-efficiency reasons. Make sure to remove every occurrence.
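As an illustration of what the replacement could look like, the sketch below stays on primitive streams from start to finish, so no `Integer` or `Long` wrapper objects are allocated. The class and method names are made up for the example and do not come from the project.

```java
import java.util.stream.IntStream;

final class PrimitiveStreams {

    // Sums the squares of 0..n-1 using IntStream/LongStream only (no .boxed()).
    static long sumOfSquares(int n) {
        return IntStream.range(0, n)
                .mapToLong(i -> (long) i * i)
                .sum();
    }

    // Collects into an int[] instead of a List<Integer>.
    static int[] evens(int n) {
        return IntStream.range(0, n)
                .filter(i -> i % 2 == 0)
                .toArray();
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(4));  // 0 + 1 + 4 + 9 = 14
        System.out.println(evens(7).length);  // 0, 2, 4, 6 -> 4
    }
}
```

Where a collection is really needed, a primitive array (as in `evens`) avoids the per-element object overhead that `.boxed().collect(...)` would introduce.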

Don't store matrices needlessly

The big X matrices are first needed only once we perform the backward substitution. At any given point in time we need only the parent-level and the child-level matrices, so we can create new ones as we go and delete the old ones. This can yield a substantial memory saving.

Allow odd worker count

For now, odd worker counts cause a problem with partitioning:

Exception in thread "org.apache.giraph.master.MasterThread" java.lang.IllegalStateException: java.lang.ArithmeticException: mode was UNNECESSARY, but rounding was necessary
	at org.apache.giraph.master.MasterThread.run(MasterThread.java:201)
Caused by: java.lang.ArithmeticException: mode was UNNECESSARY, but rounding was necessary
	at com.google.common.math.MathPreconditions.checkRoundingUnnecessary(MathPreconditions.java:81)
	at com.google.common.math.IntMath.log2(IntMath.java:91)
	at edu.agh.iga.adi.giraph.direction.PartitioningStrategy.partitioningStrategy(PartitioningStrategy.java:40)
	at edu.agh.iga.adi.giraph.direction.io.IgaTreeSplitter.allSplitsFor(IgaTreeSplitter.java:25)
	at edu.agh.iga.adi.giraph.direction.io.InMemoryStepInputFormat.getSplits(InMemoryStepInputFormat.java:77)
	at org.apache.giraph.io.internal.WrappedVertexInputFormat.getSplits(WrappedVertexInputFormat.java:72)
	at org.apache.giraph.master.BspServiceMaster.generateInputSplits(BspServiceMaster.java:329)
	at org.apache.giraph.master.BspServiceMaster.createInputSplits(BspServiceMaster.java:624)
	at org.apache.giraph.master.BspServiceMaster.createVertexInputSplits(BspServiceMaster.java:668)
	at org.apache.giraph.master.MasterThread.run(MasterThread.java:113)
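The trace points at Guava's `IntMath.log2`, which throws this `ArithmeticException` when called with `RoundingMode.UNNECESSARY` on a value that is not a power of two. One possible direction, sketched below with plain JDK calls, is to round down explicitly and handle the remainder separately. The class and method names are hypothetical, and whether the failing argument is the worker count itself is an assumption based on the issue title.

```java
final class WorkerCountMath {

    // Floor of log2(n) for n > 0. Unlike a log2 that requires an exact result,
    // this never throws for values that are not powers of two.
    static int floorLog2(int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive: " + n);
        }
        return 31 - Integer.numberOfLeadingZeros(n);
    }

    // True only when n is an exact power of two, i.e. when no rounding is needed.
    static boolean isPowerOfTwo(int n) {
        return n > 0 && (n & (n - 1)) == 0;
    }

    public static void main(String[] args) {
        System.out.println(floorLog2(8));    // 3
        System.out.println(floorLog2(3));    // 1
        System.out.println(isPowerOfTwo(3)); // false
    }
}
```

Note that rounding down only removes the exception. The partitioning logic would still have to decide how to distribute the part of the tree that the rounded height does not cover.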

Ensure OutputFormat thread safety

If VERTEX_OUTPUT_FORMAT_THREAD_SAFE is set to true and NUM_COMPUTE_THREADS is set to more than one thread (and so, by default, NUM_OUTPUT_THREADS as well), we get:

2019-09-22 09:04:53,987 ERROR [org.apache.giraph.utils.LogStacktraceCallable] - Execution of callable failed
java.lang.IllegalStateException: getVertexWriter: IOException occurred
	at org.apache.giraph.io.superstep_output.MultiThreadedSuperstepOutput.getVertexWriter(MultiThreadedSuperstepOutput.java:89)
	at org.apache.giraph.graph.ComputeCallable.call(ComputeCallable.java:153)
	at org.apache.giraph.graph.ComputeCallable.call(ComputeCallable.java:70)
	at org.apache.giraph.utils.LogStacktraceCallable.call(LogStacktraceCallable.java:67)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException): Failed to CREATE_FILE /user/kbhit/1569143060/_temporary/1/_temporary/attempt_1569140710858_0005_m_000001_1/step-0/part-m-00001 for DFSClient_NONMAPREDUCE_-1931685888_1 on 10.164.0.19 because DFSClient_NONMAPREDUCE_-1931685888_1 is already the current lease holder.
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2412)
	at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.startFile(FSDirWriteFileOp.java:357)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2309)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2230)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:745)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:413)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:503)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:871)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:817)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2606)

	at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1507)
	at org.apache.hadoop.ipc.Client.call(Client.java:1453)
	at org.apache.hadoop.ipc.Client.call(Client.java:1363)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
	at com.sun.proxy.$Proxy10.create(Unknown Source)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:297)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
	at com.sun.proxy.$Proxy11.create(Unknown Source)
	at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:267)
	at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1206)
	at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1148)
	at org.apache.hadoop.hdfs.DistributedFileSystem$8.doCall(DistributedFileSystem.java:480)
	at org.apache.hadoop.hdfs.DistributedFileSystem$8.doCall(DistributedFileSystem.java:477)
	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
	at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:477)
	at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:418)
	at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1067)
	at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1048)
	at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:937)
	at org.apache.giraph.io.formats.GiraphTextOutputFormat.getRecordWriter(GiraphTextOutputFormat.java:67)
	at org.apache.giraph.io.formats.TextVertexOutputFormat$TextVertexWriter.createLineRecordWriter(TextVertexOutputFormat.java:116)
	at org.apache.giraph.io.formats.TextVertexOutputFormat$TextVertexWriter.initialize(TextVertexOutputFormat.java:97)
	at edu.agh.iga.adi.giraph.direction.io.StepVertexOutputFormat$IdWithValueVertexWriter.initialize(StepVertexOutputFormat.java:80)
	at org.apache.giraph.io.internal.WrappedVertexOutputFormat$1.initialize(WrappedVertexOutputFormat.java:82)
	at org.apache.giraph.io.superstep_output.MultiThreadedSuperstepOutput.getVertexWriter(MultiThreadedSuperstepOutput.java:87)
	... 7 more
2019-09-22 09:04:53,997 ERROR [org.apache.giraph.worker.BspServiceWorker] - unregisterHealth: Got failure, unregistering health on /_hadoopBsp/giraph_yarn_application_1569140710858_0005/_applicationAttemptsDir/0/_superstepDir/0/_workerHealthyDir/iga-adi-w-1.europe-west4-a.c.charismatic-cab-252315.internal_1 on superstep 0
2019-09-22 09:04:54,000 ERROR [org.apache.giraph.yarn.GiraphYarnTask] - GiraphYarnTask threw a top-level exception, failing task
java.lang.RuntimeException: run: Caught an unrecoverable exception Exception occurred
	at org.apache.giraph.yarn.GiraphYarnTask.run(GiraphYarnTask.java:106)
	at org.apache.giraph.yarn.GiraphYarnTask.main(GiraphYarnTask.java:184)
Caused by: java.lang.IllegalStateException: Exception occurred
	at org.apache.giraph.utils.ProgressableUtils.getResultsWithNCallables(ProgressableUtils.java:274)
	at org.apache.giraph.graph.GraphTaskManager.processGraphPartitions(GraphTaskManager.java:813)
	at org.apache.giraph.graph.GraphTaskManager.execute(GraphTaskManager.java:361)
	at org.apache.giraph.yarn.GiraphYarnTask.run(GiraphYarnTask.java:93)
	... 1 more
Caused by: java.util.concurrent.ExecutionException: java.lang.IllegalStateException: getVertexWriter: IOException occurred
	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
	at java.util.concurrent.FutureTask.get(FutureTask.java:206)
	at org.apache.giraph.utils.ProgressableUtils.getResultsWithNCallables(ProgressableUtils.java:271)
	... 4 more
Caused by: java.lang.IllegalStateException: getVertexWriter: IOException occurred
	at org.apache.giraph.io.superstep_output.MultiThreadedSuperstepOutput.getVertexWriter(MultiThreadedSuperstepOutput.java:89)
	at org.apache.giraph.graph.ComputeCallable.call(ComputeCallable.java:153)
	at org.apache.giraph.graph.ComputeCallable.call(ComputeCallable.java:70)
	at org.apache.giraph.utils.LogStacktraceCallable.call(LogStacktraceCallable.java:67)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException): Failed to CREATE_FILE /user/kbhit/1569143060/_temporary/1/_temporary/attempt_1569140710858_0005_m_000001_1/step-0/part-m-00001 for DFSClient_NONMAPREDUCE_-1931685888_1 on 10.164.0.19 because DFSClient_NONMAPREDUCE_-1931685888_1 is already the current lease holder.
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2412)
	at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.startFile(FSDirWriteFileOp.java:357)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2309)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2230)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:745)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:413)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:503)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:871)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:817)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2606)

	at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1507)
	at org.apache.hadoop.ipc.Client.call(Client.java:1453)
	at org.apache.hadoop.ipc.Client.call(Client.java:1363)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
	at com.sun.proxy.$Proxy10.create(Unknown Source)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:297)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
	at com.sun.proxy.$Proxy11.create(Unknown Source)
	at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:267)
	at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1206)
	at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1148)
	at org.apache.hadoop.hdfs.DistributedFileSystem$8.doCall(DistributedFileSystem.java:480)
	at org.apache.hadoop.hdfs.DistributedFileSystem$8.doCall(DistributedFileSystem.java:477)
	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
	at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:477)
	at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:418)
	at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1067)
	at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1048)
	at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:937)
	at org.apache.giraph.io.formats.GiraphTextOutputFormat.getRecordWriter(GiraphTextOutputFormat.java:67)
	at org.apache.giraph.io.formats.TextVertexOutputFormat$TextVertexWriter.createLineRecordWriter(TextVertexOutputFormat.java:116)
	at org.apache.giraph.io.formats.TextVertexOutputFormat$TextVertexWriter.initialize(TextVertexOutputFormat.java:97)
	at edu.agh.iga.adi.giraph.direction.io.StepVertexOutputFormat$IdWithValueVertexWriter.initialize(StepVertexOutputFormat.java:80)
	at org.apache.giraph.io.internal.WrappedVertexOutputFormat$1.initialize(WrappedVertexOutputFormat.java:82)
	at org.apache.giraph.io.superstep_output.MultiThreadedSuperstepOutput.getVertexWriter(MultiThreadedSuperstepOutput.java:87)
	... 7 more
End of LogType:task-3-stdout.log.
***************************************
