Hello and Regards...
I tried to replicate the tutorial at http://wormhole.readthedocs.org/en/latest/tutorial/criteo_kaggle.html on a single server with a single worker. I am stuck at the prediction step of the Factorization Machine. I am sorry for not knowing the proper format to describe the issues.
ISSUES:
- I started the script with 1 worker (-n 1), yet three difacto.dmlc processes were created. All of them are stuck in a nanosleep() loop and never return (see the strace output below).
- The output is incomplete: the two files in the output folder total 95,746 lines, while the test data has 482,611 lines (see wc -l of the output folder and test data below).
- The tracker script never returns.
$ wormhole/tracker/dmlc_local.py -n 1 -s 1 wormhole/bin/difacto.dmlc difacto.test.conf.small
INFO start listen on 127.0.1.1:9091
Connected 1 servers and 1 workers
Loading the last model
Predicting
sec ttl #ex inc #ex | |w|_0 logloss_w | |V|_0 logloss AUC
$ cat difacto.test.conf.small
val_data = "data/train-part_80"
data_format = "libsvm"
model_in = "model/criteo"
predict_out = "output/criteo"
embedding {
dim = 7
threshold = 7
lambda_l2 = 0.0001
}
$ wc -l output/* data/train-part_80
47870 output/criteotrain-part_80_part-0
47876 output/criteotrain-part_80_part-9
482611 data/train-part_80
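
To make the mismatch above easier to re-check after each run, here is a minimal Python sketch that sums the lines of the prediction part files and compares the total with the input file. The helper names `count_lines` and `compare_counts` are my own, and the sketch assumes that every part file starts with the `predict_out` prefix, as in the `wc -l` listing; it knows nothing about how difacto actually names or splits its output.

```python
import glob


def count_lines(path):
    """Count newline characters in a file, the same way `wc -l` does."""
    with open(path, "rb") as f:
        return sum(chunk.count(b"\n") for chunk in iter(lambda: f.read(1 << 20), b""))


def compare_counts(output_prefix, data_path):
    """Return (predicted, expected, part_files) for one prediction run.

    output_prefix: assumed common prefix of the part files, e.g. "output/criteo".
    data_path: the file given as val_data, e.g. "data/train-part_80".
    """
    # glob.escape keeps special characters in the prefix from acting as wildcards.
    parts = sorted(glob.glob(glob.escape(output_prefix) + "*"))
    predicted = sum(count_lines(p) for p in parts)
    expected = count_lines(data_path)
    return predicted, expected, parts
```

Calling `compare_counts("output/criteo", "data/train-part_80")` on the files listed above should report two part files with 95746 lines in total against 482611 input lines.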
$ ps -Af | grep dmlc
madhur 9304 3379 0 13:06 pts/11 00:00:04 python wormhole/tracker/dmlc_local.py -n 1 -s 1 wormhole/bin/difacto.dmlc difacto.test.conf.small
madhur 9306 9304 0 13:06 pts/11 00:00:00 /bin/sh -c wormhole/bin/difacto.dmlc difacto.test.conf.small
madhur 9309 9304 0 13:06 pts/11 00:00:00 bash -c nrep=0 rc=254 while [ $rc -eq 254 ]; do export DMLC_NUM_ATTEMPT=$nrep wormhole/bin/difacto.dmlc difacto.test.conf.small rc=$?; nrep=$((nrep+1)); done
madhur 9311 9304 0 13:06 pts/11 00:00:00 bash -c nrep=0 rc=254 while [ $rc -eq 254 ]; do export DMLC_NUM_ATTEMPT=$nrep wormhole/bin/difacto.dmlc difacto.test.conf.small rc=$?; nrep=$((nrep+1)); done
madhur 9312 9309 0 13:06 pts/11 00:00:02 wormhole/bin/difacto.dmlc difacto.test.conf.small
madhur 9313 9311 0 13:06 pts/11 00:00:01 wormhole/bin/difacto.dmlc difacto.test.conf.small
madhur 9330 9306 0 13:06 pts/11 00:00:00 wormhole/bin/difacto.dmlc difacto.test.conf.small
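
Since the parent/child relationships in the listing above are hard to read by eye, here is a small Python sketch that groups the processes by parent PID. It assumes the standard `ps -Af` column order (UID PID PPID C STIME TTY TIME CMD); the function names `parse_ps` and `children_of` are hypothetical helpers of mine, not part of wormhole.

```python
from collections import defaultdict


def parse_ps(ps_text):
    """Parse `ps -Af` lines (UID PID PPID C STIME TTY TIME CMD) into a dict by PID."""
    procs = {}
    for line in ps_text.strip().splitlines():
        fields = line.split(None, 7)
        if len(fields) < 8 or not fields[1].isdigit():
            continue  # skip the header row and malformed lines
        pid, ppid = int(fields[1]), int(fields[2])
        procs[pid] = {"ppid": ppid, "cmd": fields[7]}
    return procs


def children_of(procs):
    """Map each parent PID to the sorted list of its child PIDs."""
    tree = defaultdict(list)
    for pid, info in procs.items():
        tree[info["ppid"]].append(pid)
    return {ppid: sorted(kids) for ppid, kids in tree.items()}
```

For the listing above this gives the tracker (9304) three children (9306, 9309, 9311), each of which has exactly one difacto.dmlc child (9330, 9312 and 9313 respectively).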
$ sudo strace -p 9312
Process 9312 attached
restart_syscall(<... resuming interrupted call ...>) = 0
nanosleep({0, 50000000}, NULL) = 0
nanosleep({0, 50000000}, NULL) = 0
nanosleep({0, 50000000}, NULL) = 0
nanosleep({0, 50000000}, NULL) = 0
nanosleep({0, 50000000}, ^CProcess 9312 detached
<detached ...>
$ sudo strace -p 9313
(same output as for 9312 above)
$ sudo strace -p 9330
restart_syscall(<... resuming interrupted call ...>) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
nanosleep({1, 0}, 0x7ffef260e300) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
nanosleep({1, 0}, ^CProcess 9330 detached
<detached ...>
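
For reading the traces above, the two arguments inside `nanosleep({sec, nsec}, ...)` are seconds and nanoseconds. A minimal Python sketch to pull the requested sleep durations out of pasted strace text follows. It assumes the older `{sec, nsec}` strace formatting shown here (newer strace versions print field names, which this regex does not handle), and `sleep_intervals` is a name I made up.

```python
import re

# Matches e.g. "nanosleep({0, 50000000}" and captures seconds and nanoseconds.
NANOSLEEP_RE = re.compile(r"nanosleep\(\{(\d+),\s*(\d+)\}")


def sleep_intervals(strace_text):
    """Return the requested nanosleep durations in seconds, in order of appearance."""
    return [int(sec) + int(nsec) / 1e9 for sec, nsec in NANOSLEEP_RE.findall(strace_text)]
```

Applied to the traces above, processes 9312 and 9313 request 50 ms per call, while 9330 requests 1 s per call.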