uma-pi1 / mgfsm
Large scale frequent sequence mining
License: Apache License 2.0
Hi,
We are trying to run mgfsm in distributed mode, but the translatedFS folder inside the output folder contains only two empty files: _SUCCESS and part*.
Running in sequential mode on the same input file works without problems.
What could be wrong?
I have been browsing through the code but could not pinpoint where I could collect the IDs of the sequences that support each frequent pattern (and output them).
Is this possible without too much effort? If so, where should I start?
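One low-effort option, sketched below as a post-processing step rather than a change to mgfsm's internals, is to re-scan the input once the frequent patterns are known and record which sequences contain each pattern. This assumes that a sequence supports a pattern when the pattern's items occur in order with at most γ (the `-g` / gamma option) items skipped between consecutive matches, and that sequences and patterns are available as space-separated item strings. The class and method names (`SupportingIds`, `occursWithGap`, `supportingIds`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical post-processing helper, not part of mgfsm.
class SupportingIds {

    // True if the items of `pattern` occur in `seq` in order, with at most
    // `gamma` items skipped between consecutive matched items.
    static boolean occursWithGap(String[] seq, String[] pattern, int gamma) {
        if (pattern.length == 0) {
            return true;
        }
        // reach[i] is true when some match of pattern[0..j] ends at position i.
        boolean[] reach = new boolean[seq.length];
        for (int i = 0; i < seq.length; i++) {
            reach[i] = seq[i].equals(pattern[0]);
        }
        for (int j = 1; j < pattern.length; j++) {
            boolean[] next = new boolean[seq.length];
            for (int i = 0; i < seq.length; i++) {
                if (!seq[i].equals(pattern[j])) {
                    continue;
                }
                // The previous item must end at some k with i - k - 1 <= gamma.
                for (int k = Math.max(0, i - gamma - 1); k < i; k++) {
                    if (reach[k]) {
                        next[i] = true;
                        break;
                    }
                }
            }
            reach = next;
        }
        for (boolean r : reach) {
            if (r) {
                return true;
            }
        }
        return false;
    }

    // Maps each pattern (joined with spaces) to the IDs of the sequences that contain it.
    static Map<String, List<String>> supportingIds(Map<String, String[]> sequences,
                                                   List<String[]> patterns, int gamma) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String[] p : patterns) {
            List<String> ids = new ArrayList<>();
            for (Map.Entry<String, String[]> e : sequences.entrySet()) {
                if (occursWithGap(e.getValue(), p, gamma)) {
                    ids.add(e.getKey());
                }
            }
            result.put(String.join(" ", p), ids);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String[]> seqs = new LinkedHashMap<>();
        seqs.put("S1", "a x b".split(" "));
        seqs.put("S2", "a x x x b".split(" "));
        seqs.put("S3", "b a".split(" "));
        List<String[]> patterns = new ArrayList<>();
        patterns.add("a b".split(" "));
        System.out.println(supportingIds(seqs, patterns, 2));
    }
}
```

With γ = 2, `main` prints `{a b=[S1]}`: in `S2` three items separate `a` and `b`, and `S3` has them in the wrong order. The matcher tracks every position where a partial match can end because a greedy left-to-right scan can miss valid matches under a gap constraint. Since this costs one pass over the input per pattern, it is only practical for a modest number of patterns.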
Thanks for the very nice algorithm you have designed.
I was trying to run the code in sequential mode on a Windows computer.
As a first test, I ran the algorithm on the simple example you provide.
The algorithm seems to run, but no output file is created: the output folder is created but stays empty.
I use the following command in cmd to run the algorithm:
java -jar target/mgfsm-0.0.1-SNAPSHOT-jar-with-dependencies.jar -i C:/MGFSM/DATA/Example.txt/ -o SAMPLE_OUTPUT2 -s 2 -g 2 -l 2 -m s
Could you please let me know how to fix this issue? I really need to use your algorithm on my data; it seems very interesting.
Thanks,
Vahid
Below is the output I get when running the above command in cmd:
20/03/19 15:35:09 ERROR util.Shell: Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:355)
at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:370)
at org.apache.hadoop.util.Shell.<clinit>(Shell.java:363)
at org.apache.hadoop.util.GenericOptionsParser.preProcessForWindows(GenericOptionsParser.java:438)
at org.apache.hadoop.util.GenericOptionsParser.parseGeneralOptions(GenericOptionsParser.java:484)
at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:170)
at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:153)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:64)
at de.mpii.fsm.driver.FsmDriver.main(FsmDriver.java:558)
20/03/19 15:35:09 INFO common.AbstractJob: Command line arguments: {--endPhase=[2147483647], --execMode=[s], --gamma=[2], --indexing=[none], --input=[C:/MGFSM/DATA/], --lambda=[2], --numReducers=[90], --output=[SAMPLE_OUTPUT2/], --partitionSize=[10000], --startPhase=[0], --support=[2], --tempDir=[temp], --type=[a]}
20/03/19 15:35:09 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Deleting existing output path
The intermediate output will be written
to this temporary path :C:\Users\vahid\AppData\Local\Temp\MG_FSM_INTRM_OP_6557093669574892418
The temporary output associated with the internal map -reduce
jobs will be written to this temporary path :C:\Users\vahid\AppData\Local\Temp\MG_FSM_TEMP_OP_4937220842338026946
java.lang.NullPointerException
at java.lang.ProcessBuilder.start(Unknown Source)
at org.apache.hadoop.util.Shell.runCommand(Shell.java:482)
at org.apache.hadoop.util.Shell.run(Shell.java:455)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:808)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:791)
at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:656)
at org.apache.hadoop.fs.FilterFileSystem.setPermission(FilterFileSystem.java:490)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:462)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:428)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:908)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:889)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:786)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:775)
at de.mpii.fsm.driver.SequentialMode.encodeAndMine(SequentialMode.java:331)
at de.mpii.fsm.driver.SequentialMode.runSeqJob(SequentialMode.java:279)
at de.mpii.fsm.driver.FsmDriver.run(FsmDriver.java:512)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at de.mpii.fsm.driver.FsmDriver.main(FsmDriver.java:558)
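The first exception shows Hadoop building the winutils path from an unset home directory, hence `null\bin\winutils.exe`. The later `NullPointerException` inside `RawLocalFileSystem.setPermission` appears to stem from the same missing binary, which would explain why `SequentialMode.encodeAndMine` never writes its output file. The diagnostic below is a minimal sketch, assuming Hadoop 2.x's lookup order (the `hadoop.home.dir` system property first, then the `HADOOP_HOME` environment variable); `WinutilsCheck` is a hypothetical helper, not part of mgfsm or Hadoop, and it skips Hadoop's additional validation of the home directory.

```java
import java.io.File;

// Hypothetical diagnostic, not part of mgfsm or Hadoop.
class WinutilsCheck {

    // Mirrors the assumed lookup order: system property first, then environment variable.
    static String resolveHadoopHome(String sysProp, String envVar) {
        return sysProp != null ? sysProp : envVar;
    }

    // The path Hadoop would look for; a null home yields "null\bin\winutils.exe" on Windows.
    static File expectedWinutils(String hadoopHome) {
        return new File(hadoopHome + File.separator + "bin" + File.separator + "winutils.exe");
    }

    public static void main(String[] args) {
        String home = resolveHadoopHome(System.getProperty("hadoop.home.dir"),
                                        System.getenv("HADOOP_HOME"));
        File winutils = expectedWinutils(home);
        if (home == null) {
            System.out.println("Neither hadoop.home.dir nor HADOOP_HOME is set; Hadoop will look for "
                    + winutils.getPath());
        } else if (!winutils.isFile()) {
            System.out.println("Hadoop home is set, but " + winutils.getPath() + " does not exist");
        } else {
            System.out.println("Found " + winutils.getPath());
        }
    }
}
```

If the check reports a missing binary, the commonly reported workaround is to place a `winutils.exe` built for the matching Hadoop version in a folder such as `C:\hadoop\bin` (a placeholder path), then either run `set HADOOP_HOME=C:\hadoop` in cmd before the java command or pass `-Dhadoop.home.dir=C:\hadoop` before `-jar`.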
Will this tool work with non-English text inputs? Do I need to modify the tokenization or make any other adjustments?
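Whether mgfsm itself needs changes depends on how it splits input lines into items. Assuming it treats each line as an ID followed by whitespace-separated items (an assumption; check it against the bundled example file), non-English text can be pre-tokenized with Unicode-aware rules before mining. The sketch below, built around a hypothetical `UnicodeTokenizer` class, keeps runs of letters, combining marks and digits from any script and lowercases them with `Locale.ROOT`. It does not segment scripts written without spaces, such as Chinese, Japanese or Thai; those would need a dedicated word segmenter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical pre-processing helper, not part of mgfsm.
class UnicodeTokenizer {

    // A token is a run of letters, combining marks, or digits in any script.
    private static final Pattern TOKEN = Pattern.compile("[\\p{L}\\p{M}\\p{N}]+");

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            // Locale.ROOT avoids locale-specific case rules (e.g. Turkish dotless i).
            tokens.add(m.group().toLowerCase(Locale.ROOT));
        }
        return tokens;
    }

    // Formats one input line as "<id> tok1 tok2 ..." (assumed input layout).
    static String toSequenceLine(String id, String text) {
        return id + " " + String.join(" ", tokenize(text));
    }

    public static void main(String[] args) {
        System.out.println(toSequenceLine("d1", "Größe, café und Straße!"));
        System.out.println(toSequenceLine("d2", "Привет, мир"));
    }
}
```

On Windows, it is also worth reading and writing these files explicitly as UTF-8 (for example via `java.nio.charset.StandardCharsets.UTF_8`), since JDKs before 18 default to the platform code page.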