
universal-recommender's Introduction

The Universal Recommender

The Universal Recommender (UR) is a new type of collaborative filtering recommender based on an algorithm that can use data from a wide variety of user preference indicators: the Correlated Cross-Occurrence (CCO) algorithm. Unlike matrix factorization as embodied in, for example, MLlib's ALS, CCO can ingest any number of user actions, events, profile data, and contextual information, and then serves results in a fast and scalable way. It also supports item properties for building flexible business rules for filtering and boosting recommendations, and can therefore be considered a hybrid collaborative filtering and content-based recommender.

Most recommenders can use only conversion events, like buy or rate. Using everything we know about a user and their context lets us predict their preferences much more accurately.
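
As an illustration, here is how several different indicator types might be sent to the EventServer using the PredictionIO Python SDK. This is a minimal sketch: the access key, server URL, user id, and event names are placeholders for your own setup.

import predictionio

# Placeholder access key and EventServer URL -- substitute your own.
client = predictionio.EventClient(
    access_key="YOUR_ACCESS_KEY",
    url="http://localhost:7070",
)

# The primary conversion indicator.
client.create_event(
    event="purchase",
    entity_type="user",
    entity_id="u-1",
    target_entity_type="item",
    target_entity_id="Iphone 5",
)

# A secondary indicator: CCO can correlate views (and any other
# user action or profile data) with the primary event.
client.create_event(
    event="view",
    entity_type="user",
    entity_id="u-1",
    target_entity_type="item",
    target_entity_id="Nexus",
)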

Requirements—The Universal Recommender Has Moved!

The UR 0.8.0+ requires the Harness Machine Learning Server; 0.7.3 and earlier run in PredictionIO 0.12.1. This repo is now built into Harness as a pre-packaged Engine. See Upgrading from PIO to Harness.

Documentation

All docs for the Universal Recommender are here, hosted at https://github.com/actionml/docs.actionml.com. If you wish to change or edit the docs, make a PR to that repo.

Contributions

Contributions are encouraged and appreciated. Create a pull request (PR) against the develop branch of the git repo. We like to keep new features general, so users are not required to change the UR's code to make use of a new feature. We will be happy to provide guidance or help via the GitHub PR review mechanism.

Version Log

UR v0.8.0+ The UR Has Moved!

The Universal Recommender has moved. Future versions will be included as a built-in Engine for the new Harness Machine Learning Server. The UR v0.8.0+ is data compatible with previous versions that were integrated with PredictionIO, which means you can export from UR+PIO and import into UR+Harness. See Upgrading from PIO to Harness.

Adds:

  • Realtime changes to item properties with $set (see the sketch after this list).
  • Ease of having more than one predictive model (Harness feature).
  • TTL-based trimming of old events: after a user-defined period, events may be dropped from the collection without running a separate process (Harness feature).
  • Containerized deployment.
  • Many other features inherited from Harness.
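
As an illustration, a minimal sketch of such a realtime property change, sent as a $set event with the PredictionIO Python SDK (the UR v0.8.0+ is data compatible with previous versions, so the event format is unchanged; the access key, URL, item id, and property values here are placeholders):

import predictionio

# Placeholder access key and EventServer URL.
client = predictionio.EventClient(
    access_key="YOUR_ACCESS_KEY",
    url="http://localhost:7070",
)

# A $set event changes item properties; with UR 0.8.0+ the change
# takes effect in realtime, with no retraining needed.
client.create_event(
    event="$set",
    entity_type="item",
    entity_id="Iphone 5",
    properties={
        "categories": ["Tablets"],                   # placeholder values
        "countries": ["Estados Unidos Mexicanos"],
    },
)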

Git Tag: v0.7.3

Adds:

  • Switched to invoking python3 wherever Python is used. Previously it was assumed that the environment mapped python to python3, which is required for PIO 0.12+ and the UR 0.7+; since many distros map python to Python 2.7 and require python3 to invoke Python 3.6, we now invoke python3 explicitly.
  • Support for cross-recommendations like "people who have viewed similar items to you have bought these items". Used to help find things in a browsing/searching scenario.

Git Tag: 0.7.2

Adds:

  • Pagination support in queries: "from": 0, "num": 2 returns 2 recommendations starting from the first available, while "from": 2, "num": 2 returns 2 starting at the 3rd, since "from" is 0-based. See the sketch below.
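
A minimal sketch of paginated queries using the PredictionIO Python SDK's EngineClient (the server URL and user id are placeholders):

import predictionio

# Placeholder engine server URL.
engine = predictionio.EngineClient(url="http://localhost:8000")

# First page: 2 recommendations starting at the first available
# ("from" is 0-based).
page_1 = engine.send_query({"user": "u-1", "from": 0, "num": 2})

# Second page: 2 more recommendations, starting at the 3rd.
page_2 = engine.send_query({"user": "u-1", "from": 2, "num": 2})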

Git Tag: 0.7.1

This tag takes precedence over 0.7.0, which should not be used. Changes:

  • Removes the need to build Mahout from ActionML's fork, and so is much easier to install.
  • Fixes a bug in the integration test which made it fail for macOS High Sierra in East Asian time zones.

Git Tag: 0.7.0

This README Has Special Build Instructions!

This tag is for the UR integrated with PredictionIO 0.12.0 using Scala 2.11, Spark 2.1.x, and most importantly Elasticsearch 5.x. Primary differences from 0.6.0:

  • Faster indexing and queries due to the use of Elasticsearch 5.x
  • Faster model building due to speedups in the ActionML fork of Mahout, which requires the user to build Mahout locally. This step will be removed in a later version of the UR.
  • Several upgrades such as Scala 2.10 --> Scala 2.11, Python 2.7 --> Python 3
  • Spark 2.1.x support; PIO has a minor incompatibility with Spark 2.2.x
  • PredictionIO 0.12.0 support
  • Requires Elasticsearch 5.x and now uses the ES REST APIs exclusively, optionally enabling ES authentication. ES 5.x also improves indexing and query performance over previous versions.
  • Fixed a bug in exclusion rules based on item properties

WARNING: Upgrading Elasticsearch or HBase will wipe existing data if any, so follow the special instructions below before installing any service upgrades.

Upgrade from UR v0.6.0 Instructions

You must build PredictionIO with the default parameters, so just run ./make-distribution; this requires Scala 2.11 and Python 3 to be installed (as the default Scala and Python). You can run up to Spark 2.1.x (but not 2.2.x), Elasticsearch 5.5.2 or greater (6.x has not been tested), and Hadoop 2.6 or greater. You can get away with older versions of the other services, but ES must be 5.x. If you have issues getting pio to build and run, send questions to the PIO mailing list.

Back up your data: moving from ES 1.x to ES 5.x will delete all data!!!! Actually, even worse, it is still in HBase but you can't get at it. To upgrade, do the following:

  • pio export with pio < 0.12.0 =====Before upgrade!=====
  • pio data-delete all your old apps =====Before upgrade!=====
  • build and install pio 0.12.0 including all the services =====The point of no return!=====
  • pio app new … and pio import … any needed datasets

Once PIO is running, test with pio status and pio app list. To test your setup and UR integration, run ./examples/integration-test from the UR's home directory.

Config for PIO 0.12.0 and the UR 0.7.0

A sample pio-env.sh that works with one type of setup is below, but you'll have to change the paths to match yours. This example shows the new way to configure Elasticsearch 5.x, which uses a new port number:

#!/usr/bin/env bash

# SPARK_HOME: Apache Spark is a hard dependency and must be configured.
# using Spark 2.1.x here (PIO has a minor incompatibility with Spark 2.2.x)
SPARK_HOME=/usr/local/spark

# ES_CONF_DIR: You must configure this if you have advanced configuration for
# using ES 5.6.3
ES_CONF_DIR=/usr/local/elasticsearch/config

# HADOOP_CONF_DIR: You must configure this if you intend to run PredictionIO
# using hadoop 2.8 here
HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop

# HBASE_CONF_DIR: You must configure this if you intend to run PredictionIO
# using HBase 1.2.x here or whatever the highest numbered stable release is
HBASE_CONF_DIR=/usr/local/hbase/conf

# Filesystem paths where PredictionIO uses as block storage.
PIO_FS_BASEDIR=$HOME/.pio_store
PIO_FS_ENGINESDIR=$PIO_FS_BASEDIR/engines
PIO_FS_TMPDIR=$PIO_FS_BASEDIR/tmp

# Storage Repositories
PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta
PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=ELASTICSEARCH

PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model
PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=LOCALFS

PIO_STORAGE_REPOSITORIES_APPDATA_NAME=pio_appdata
PIO_STORAGE_REPOSITORIES_APPDATA_SOURCE=ELASTICSEARCH

PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=pio_eventdata
PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=HBASE

# ES config
PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch
PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=localhost
PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9200 # <===== notice 9200 now
PIO_STORAGE_SOURCES_ELASTICSEARCH_CLUSTERNAME=elasticsearch_xyz # <===== should match what you have in your ES config file
PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME=/usr/local/elasticsearch

PIO_STORAGE_SOURCES_LOCALFS_TYPE=localfs
PIO_STORAGE_SOURCES_LOCALFS_HOSTS=$PIO_FS_BASEDIR/models

PIO_STORAGE_SOURCES_HBASE_TYPE=hbase
PIO_STORAGE_SOURCES_HBASE_HOME=/usr/local/hbase

Build Mahout After PredictionIO!

Mahout has speedups for the Universal Recommender's use that have not been released yet, so you will have to build it from source. To make this easy we have a fork hosted here, with special build instructions. Make sure you are on the "sparse-speedup" branch and follow the instructions in its README.md.

Build the Universal Recommender

  • Download the UR from here and be sure to move to the 0.7.0 tag.
  • Replace the line resolvers += "Local Repository" at "file:///Users/pat/.custom-scala-m2/repo" with your path to the local Mahout build. The UR will not build unless this line is changed; this is expected.
  • Build the UR with pio build, or run the integration test (./examples/integration-test) to also load sample data into PIO.

v0.6.0

This is a major upgrade release with several new features. Backward compatibility with 0.5.0 is maintained. Note: We no longer have a default engine.json file so you will need to copy engine.json.template to engine.json and edit it to fit your data. See the Universal Recommender Configuration docs.

  • Performance: Nearly a 40% speedup for most model calculations, and a new tuning parameter that can yield further speed improvements by filtering out unused or less useful data from model building. See minEventsPerUser in the UR configuration docs.
  • Complementary Purchase, aka Item-set Recommendations: "Shopping cart"-type recommendations. Can be used for wishlists, favorites, watchlists, or any list-based recommendations. Used with list or user data.
  • Exclusion Rules: We now have business rules for inclusion, exclusion, and boosting based on item properties (see the query sketch after this list).
  • PredictionIO 0.11.0: Full compatibility, but no support for Elasticsearch 5, which is an option with PIO 0.11.0.
  • New Advanced Tuning: Allows several new per indicator / event type tuning parameters for tuning model quality in a more targeted way.
  • Norms Support: For large dense datasets norms are now the default for model indexing and queries. This should result in slight precision gains, so better results.
  • Mahout 0.13.0 Support: the UR no longer requires a local build of Mahout.
  • GPU Support: via Mahout 0.13.0 the core math of the UR now supports the use of GPUs for acceleration.
  • Timeout Protection: Queries for users with very large histories could cause a timeout. We now correctly limit the amount of user history that is used as per documentation, which will all but eliminate timeouts.
  • Bug Fixes: blackListEvents as defined in engine.json was not working for an empty list; an empty list should, and now does, disable all blacklisting except explicit item blacklists contained in the query.
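
As an illustration of the business rules, a minimal sketch of a query combining a boost and a filter on item properties, using the PredictionIO Python SDK (the server URL is a placeholder; field names and values follow the handmade example used later in this README, and per the bias convention described here, bias > 1 boosts while bias < 0 filters without affecting scores):

import predictionio

# Placeholder engine server URL.
engine = predictionio.EngineClient(url="http://localhost:8000")

result = engine.send_query({
    "user": "u-3",
    "fields": [
        {   # bias > 1 boosts matching items in the ranking
            "name": "categories",
            "values": ["Tablets"],
            "bias": 5,
        },
        {   # bias < 0 acts as a filter: only matching items are
            # returned, and scores are unaffected
            "name": "countries",
            "values": ["Estados Unidos Mexicanos"],
            "bias": -1,
        },
    ],
})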

v0.5.0

  • Apache PIO Compatible: The first UR version compatible with Apache PredictionIO-0.10.0-incubating. All past versions do not work with it and should be upgraded to this one. The ActionML build of PIO is permanently deprecated since it has been merged into Apache PIO.

v0.4.2 Replaces 0.4.1

  • Fixes a pio build failure triggered by the release of Apache PIO. If you have problems building v0.4.0, use this version. It is meant to be used with PredictionIO-0.9.7-aml.
  • Requires a custom build of Apache Mahout; instructions are on the doc site. This is temporary until the next Mahout release, when we will update to 0.4.3 (which uses predictionio-0.9.7-aml) and 0.5.0 (which uses predictionio-0.10.0 from Apache).

v0.4.0

  • This version requires PredictionIO-0.9.7-aml found here.
  • New tuning params are now available for each "indicator" type, making indicators with a small number of possible values much more useful—things like gender or category-preference. See docs for configuring the UR and look for the indicators parameter.
  • New forms of recommendations backfill allow all items to be recommended even if they have no user events yet. Backfill types include random and user defined. See docs for configuring the UR and look for the rankings parameter.

v0.3.0

  • This version requires PredictionIO-0.9.7-aml from the ActionML repo here.
  • Implements a moving time window of events: Now supports the SelfCleanedDataSource trait. Adding params to the DataSource part of engine.json allows control of de-duplication, property event compaction, and a time window of events. The time window is used to age out the oldest events. Note: this only works with the ActionML fork of PredictionIO found in the repo mentioned above.
  • Parameter changed: backfillField: duration now accepts Scala Duration strings (for example, "3650 days"). This will require changes to any engine.json files that were using the older number-of-seconds duration.
  • Event types used in queries: added support for indicator predictiveness testing with the MAP@k tool, so that only certain mixes of user events are used at query time.
  • Bug fix: the typeName in engine.json was previously required to be "items"; with this release the type can be any string.

v0.2.3

  • removed isEmpty calls that were taking an extremely long time to execute, resulting in a considerable speedup. Now the vast majority of pio train time is taken up by writing to Elasticsearch; this can be optimized by creating an ES cluster or giving ES lots of memory.

v0.2.2

  • a query with no item or user will get recommendations based on popularity
  • a new integration test has been added
  • fixed a regression bug where some ids were being tokenized by Elasticsearch, leading to incorrect results. NOTE: for users with complex ids containing dashes or spaces this is an important fix.
  • a dateRange in the query now takes precedence over the item-attached expiration and available dates

v0.2.1

  • date ranges attached to items will be compared to the prediction server's current date if no date is provided in the query

v0.2.0

  • date range filters implemented
  • hot/trending/popular rankings used for backfill and when no other recommendations are returned by the query
  • filters (bias < 0) caused scores to be altered in v0.1.1; fixed in this version so filters have no effect on scoring
  • the model is now hot-swapped in Elasticsearch so no downtime should be seen; in fact there is no need to run pio deploy to make the new model active
  • it is now possible to have a separate engine.json (call it something else) dedicated to recalculating the popularity model. This allows fast updates to popularity without recalculating the collaborative filtering model.
  • Elasticsearch can now be run in cluster mode

v0.1.1

  • ids are now exact matches; in v0.1.0 ids had to be lower case and were subject to tokenizing analysis, so using that version is not recommended

v0.1.0

  • user- and item-based queries supported
  • multiple usage events supported
  • filters and boosts supported on item properties and on user- or item-based results
  • fast writing to Elasticsearch using Spark
  • convention over configuration for queries: defaults make simple/typical queries simple, and overrides add greater expressiveness (see the sketch after this list)
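
For example, the simplest typical query needs only a user id; a minimal sketch with the PredictionIO Python SDK (placeholder URL and user id):

import predictionio

# Placeholder engine server URL and user id.
engine = predictionio.EngineClient(url="http://localhost:8000")

# The simplest typical query: every other parameter falls back to
# a sensible default.
recs = engine.send_query({"user": "u-1"})
print(recs["itemScores"])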

Known issues

License

This Software is licensed under the Apache Software Foundation version 2 license found here: http://www.apache.org/licenses/LICENSE-2.0

universal-recommender's People

Contributors

andrewrollason, brianon99, dennybaa, dszeto, emergentorder, juliusgoth, pferrel, pferrel-pio, potatochip, semen-livetex


universal-recommender's Issues

When will ES 5.x be supported?

Hi! I intend to use the UR in a new project. As far as I know, the latest version of the UR still does not support ES 5.x, and from the forum (Google Groups) I know that the work to support ES 5.x is ongoing. I would like to ask the following questions:

0x00:
Does this mean that the current UR is only compatible with ES 1.x + PIO?

0x01:
Can ES 1.7 serve fast queries from a model built on a small-to-medium dataset of two billion events?

0x02:
Will the next version support ES 5.x? (I am in the selection stage, so I would like to know when this work will be completed.)

Thanks !

When training, logged number of events is garbage

When running command pio train, something like this is logged to console:

[INFO] [DataSource] Received events List(asset-watch, category-watch) 
[INFO] [DataSource] Number of events List(11, 14)

Here the second line has nothing to do with the actual number of events. Instead it prints how long the event names are: e.g. asset-watch is 11 chars long and category-watch is 14 chars long.

The code in question is here:
https://github.com/actionml/universal-recommender/blob/master/src/main/scala/DataSource.scala#L88

logger.info(s"Received events ${eventRDDs.map(_._1)}")
logger.info(s"Number of events ${eventRDDs.map(_._1.length)}")

The second line should be something else. Maybe ${eventRDDs.map(_._2.length)}

UR is using Elasticsearch even when all the data sources are MySQL

This is my pio-env.sh file

#!/usr/bin/env bash

#BASIC
SPARK_HOME=/usr/lib/spark

MYSQL_JDBC_DRIVER=/usr/share/java/mysql-connector-java.jar

PIO_FS_BASEDIR=$_LINIO_HOME/.pio_store
PIO_FS_ENGINESDIR=$PIO_FS_BASEDIR/engines
PIO_FS_TMPDIR=$PIO_FS_BASEDIR/tmp

PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta
PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=MYSQL

PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=pio_event
PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=MYSQL

PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model
PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=MYSQL

#Storage Data Sources
PIO_STORAGE_SOURCES_MYSQL_TYPE=jdbc
PIO_STORAGE_SOURCES_MYSQL_URL=jdbc:mysql://10.60../pio_models
PIO_STORAGE_SOURCES_MYSQL_USERNAME=***
PIO_STORAGE_SOURCES_MYSQL_PASSWORD=***


VERSIONS

PIO_VERSION=0.12.1
PIO_SPARK_VERSION=2.3.1
PIO_ELASTICSEARCH_VERSION=5.6
HBASE_VERSION=1.4.6
PIO_HADOOP_VERSION=2.8.4
ZOOKEEPER=3.4.12
PYTHON_VERSION=3.6.3

ERROR WHILE RUNNING ./examples/integration-test

[INFO] [Engine$] EngineWorkflow.train
[INFO] [Engine$] DataSource: com.actionml.DataSource@175ac243
[INFO] [Engine$] Preparator: com.actionml.Preparator@1073c664
[INFO] [Engine$] AlgorithmList: List(com.actionml.URAlgorithm@152e7703)
[INFO] [Engine$] Data sanity check is on.
[INFO] [DataSource] Received events List(purchase, view, category-pref)
[INFO] [Engine$] com.actionml.TrainingData does not support data sanity check. Skipping check.
[INFO] [Preparator] EventName: purchase
[INFO] [Preparator] Downsampled users for minEventsPerUser: Some(3), eventName: purchase number of passing user-ids: 3
[INFO] [Preparator] Dimensions rows : 4 columns: 7
[INFO] [Preparator] Downsampled columns for users who pass minEventPerUser: Some(3), eventName: purchase number of user-ids: 3
[INFO] [Preparator] Dimensions rows : 3 columns: 6
[INFO] [Preparator] EventName: view
[INFO] [Preparator] Dimensions rows : 3 columns: 4
[INFO] [Preparator] Number of user-ids after creation: 3
[INFO] [Preparator] EventName: category-pref
[INFO] [Preparator] Dimensions rows : 3 columns: 2
[INFO] [Preparator] Number of user-ids after creation: 3
[INFO] [Engine$] com.actionml.PreparedData does not support data sanity check. Skipping check.
[INFO] [URAlgorithm] Actions read now creating correlators
[INFO] [PopModel] PopModel popular using end: 2018-09-18T18:54:23.466Z, and duration: 315360000, interval: 2008-09-20T18:54:23.466Z/2018-09-18T18:54:23.466Z
[INFO] [PopModel] PopModel getting eventsRDD for startTime: 2008-09-20T18:54:23.466Z and endTime 2018-09-18T18:54:23.466Z
[INFO] [URAlgorithm] Correlators created now putting into URModel
[INFO] [URAlgorithm] Index mappings for the Elasticsearch URModel: Map(expires -> (date,false), date -> (date,false), category-pref -> (keyword,true), available -> (date,false), purchase -> (keyword,true), popRank -> (float,false), view -> (keyword,true))
[INFO] [URModel] Converting cooccurrence matrices into correlators
[INFO] [URModel] Group all properties RDD
[INFO] [URModel] ES fields[11]: List(categories, countries, date, id, expires, category-pref, available, purchase, popRank, defaultRank, view)
[INFO] [EsClient$] Create new index: urindex_1537296874822, items, List(categories, countries, date, id, expires, category-pref, available, purchase, popRank, defaultRank, view), Map(expires -> (date,false), date -> (date,false), category-pref -> (keyword,true), available -> (date,false), purchase -> (keyword,true), popRank -> (float,false), view -> (keyword,true))
[INFO] [AbstractConnector] Stopped Spark@20177486{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
Exception in thread "main" java.lang.IllegalStateException: No Elasticsearch client configuration detected, check your pio-env.sh forproper configuration settings
at com.actionml.EsClient$$anonfun$client$2.apply(EsClient.scala:86)
at com.actionml.EsClient$$anonfun$client$2.apply(EsClient.scala:86)
at scala.Option.getOrElse(Option.scala:121)
at com.actionml.EsClient$.client$lzycompute(EsClient.scala:85)
at com.actionml.EsClient$.client(EsClient.scala:85)
at com.actionml.EsClient$.createIndex(EsClient.scala:174)
at com.actionml.EsClient$.hotSwap(EsClient.scala:271)
at com.actionml.URModel.save(URModel.scala:82)
at com.actionml.URAlgorithm.calcAll(URAlgorithm.scala:367)
at com.actionml.URAlgorithm.train(URAlgorithm.scala:295)
at com.actionml.URAlgorithm.train(URAlgorithm.scala:180)
at org.apache.predictionio.controller.P2LAlgorithm.trainBase(P2LAlgorithm.scala:49)
at org.apache.predictionio.controller.Engine$$anonfun$18.apply(Engine.scala:690)
at org.apache.predictionio.controller.Engine$$anonfun$18.apply(Engine.scala:690)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.immutable.List.map(List.scala:285)
at org.apache.predictionio.controller.Engine$.train(Engine.scala:690)
at org.apache.predictionio.controller.Engine.train(Engine.scala:176)
at org.apache.predictionio.workflow.CoreWorkflow$.runTrain(CoreWorkflow.scala:67)
at org.apache.predictionio.workflow.CreateWorkflow$.main(CreateWorkflow.scala:251)
at org.apache.predictionio.workflow.CreateWorkflow.main(CreateWorkflow.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:894)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Cannot detect GPU

It seems that UR cannot detect the GPU.

[INFO] [RootSolverFactory$] Creating org.apache.mahout.viennacl.opencl.GPUMMul solver
[INFO] [RootSolverFactory$] Unable to create class GPUMMul: attempting OpenMP version
[INFO] [RootSolverFactory$] Creating org.apache.mahout.viennacl.openmp.OMPMMul solver
[INFO] [RootSolverFactory$] org.apache.mahout.viennacl.openmp.OMPMMul$
[INFO] [RootSolverFactory$] Unable to create class OMPMMul: falling back to java version

There is my GPU information:

nvidia-smi
Sat Jun  1 15:37:51 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.67       Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           On   | 00000000:00:04.0 Off |                    0 |
| N/A   31C    P8    24W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Event blacklisted despite conf saying not to

Hi,

We use PIO 0.10.0 and the UR v0.5.0 for our project.

We have a problem. Despite our configuration the engine generates a query to Elasticsearch that blacklists primary event items.
Here an example of the query we send to Pio:
curl -H "Content-Type: application/json" -d '{ "user": "4e810ef4-977a-4f04-b585-cf2c2996ec93", "num": 11 }' http://localhost:8001/queries.json

In return, PIO generates the query for ES that you'll find in the attached file 'es_query.txt'.

Our engine.json is also attached. The parameter blacklistEvent is set to [].

Pat Ferrel advised us to open an issue here, as I guess he suspects there might be a problem.

Thanks in advance for your help.
ES_Query.txt
engine.json.txt

Invalid Integration Test Pop Model

Looks like the integration test for "trend 2 day" is missing the corresponding engine.json.

Checking for needed files
File not found: trend-2-day-engine.json

Is this now trend-engine-4-days-ago.json? or is this an invalid test?

No engine found

Hello. I'm trying to deploy a model using ur but I got into some problems. First, I got several dependency errors when I was trying to execute examples/integration-tests. After I solved those, I couldn't deploy a model because the system can't find any engine.

I was wondering if everyone can clone the project and run examples/integration-tests straight-away with no problems.

Here is the console log:
Building and delpoying model
[INFO] [Engine$] Using command '/usr/local/PredictionIO-0.12.0-incubating/sbt/sbt' at /home/***/Documents/***/PredictionIO/ur to build.
[INFO] [Engine$] If the path above is incorrect, this process will fail.
[INFO] [Engine$] Uber JAR disabled. Making sure lib/pio-assembly-0.12.0-incubating.jar is absent.
[INFO] [Engine$] Going to run: /usr/local/PredictionIO-0.12.0-incubating/sbt/sbt package assemblyPackageDependency in /home/***/Documents/***/PredictionIO/ur
[INFO] [Engine$] Compilation finished successfully.
[INFO] [Engine$] Looking for an engine...
[ERROR] [Engine$] No engine found. Your build might have failed. Aborting.

Thank you :)

Can not train data

[INFO] [Engine$] EngineWorkflow.train
[INFO] [Engine$] DataSource: com.actionml.DataSource@6ddd1c51
[INFO] [Engine$] Preparator: com.actionml.Preparator@1fb2eec
[INFO] [Engine$] AlgorithmList: List(com.actionml.URAlgorithm@3d104c9b)
[INFO] [Engine$] Data sanity check is on.
[Stage 0:> (0 + 0) / 4]

It gets stuck in stage 0 for a long time.

Training Error - "Extracting datasource params. No 'name' is found. "

Hi All,

I have installed PredictionIO on a MacBook and installed the Universal Recommender template. I was able to build the app successfully. But when I do an integration test, or training on other data, I get the error below. I have also copied the engine.json extract. It would be very helpful if anyone could share their inputs on this.

engine.json (nothing has been changed; it's the default configuration)

{
"comment":" This config file uses default settings for all but the required values see README.md for docs",
"id": "default",
"description": "Default settings",
"engineFactory": "com.actionml.RecommendationEngine",
"datasource": {
"params" : {
"name": "sample-handmade-data.txt",
"appName": "handmade",
"eventNames": ["purchase", "view", "category-pref"],
"minEventsPerUser": 3
}
},
"sparkConf": {
"spark.serializer": "org.apache.spark.serializer.KryoSerializer",
"spark.kryo.registrator": "org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator",
"spark.kryo.referenceTracking": "false",
"spark.kryoserializer.buffer": "300m",
"es.index.auto.create": "true"
},

Exception when training -

[INFO] [Engine] Extracting datasource params...
[INFO] [WorkflowUtils$] No 'name' is found. Default empty String will be used.

Exception in thread "main" java.lang.NoSuchMethodError: org.json4s.ParserUtil$.quote(Ljava/lang/String;)Ljava/lang/String;
at org.json4s.native.JsonMethods$$anonfun$2.apply(JsonMethods.scala:42)
at org.json4s.native.JsonMethods$$anonfun$2.apply(JsonMethods.scala:42)
at scala.collection.immutable.List.map(List.scala:284)
at org.json4s.native.JsonMethods$class.render(JsonMethods.scala:42)
at org.json4s.native.JsonMethods$.render(JsonMethods.scala:62)
at org.apache.predictionio.workflow.WorkflowUtils$$anonfun$getParamsFromJsonByFieldAndClass$2$$anonfun$2.apply(WorkflowUtils.scala:177)
at org.apache.predictionio.workflow.WorkflowUtils$$anonfun$getParamsFromJsonByFieldAndClass$2$$anonfun$2.apply(WorkflowUtils.scala:168)
at scala.Option.map(Option.scala:146)
at org.apache.predictionio.workflow.WorkflowUtils$$anonfun$getParamsFromJsonByFieldAndClass$2.apply(WorkflowUtils.scala:168)
at org.apache.predictionio.workflow.WorkflowUtils$$anonfun$getParamsFromJsonByFieldAndClass$2.apply(WorkflowUtils.scala:159)
at scala.Option.map(Option.scala:146)
at org.apache.predictionio.workflow.WorkflowUtils$.getParamsFromJsonByFieldAndClass(WorkflowUtils.scala:159)
at org.apache.predictionio.controller.Engine.jValueToEngineParams(Engine.scala:363)
at org.apache.predictionio.workflow.CreateWorkflow$.main(CreateWorkflow.scala:222)
at org.apache.predictionio.workflow.CreateWorkflow.main(CreateWorkflow.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:849)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Regards,
Raman S V

Use properties for recommendation other than categorical?

I'd like to add a property to my item like so:

{
    "event" : "$set",
    "entityType" : "item",
    "entityId" : "some-item-id",
    "properties" : {
        "city": "New York",
        "size_in_sqf": 35
    },
    "eventTime" : "2015-10-05T21:02:49.228Z"
}

Then let's say the user has viewed several items, and a view event was reported for each.

Now I want the recommender to recommend items similar to the ones the user viewed, taking into account the properties city & size_in_sqf.

That means it should show me items that are in New York and whose size_in_sqf is around 35.
How do I do this? I can't find any tutorial other than the official one on using the UR, and I really want to accomplish something like this.

thanks!

java.lang.ClassNotFoundException: org.apache.lucene.util.PriorityQueue

Hi,

First of all, thank you for writing this solr-recommender.

I'm trying to run the script but got a class-not-found error, as shown below. Could you please let me know how I can fix it?

Thanks a lot,
Kevin

................
15/01/22 14:30:35 INFO mapred.MapTask: Processing split: file:/Users/Zhang_Kevin/Documents/mine/big/projects/solr-recommender/tmp/tmp1/pairwiseSimilarity/part-r-00000:0+210
15/01/22 14:30:35 INFO mapred.MapTask: io.sort.mb = 100
15/01/22 14:30:35 INFO mapred.MapTask: data buffer = 79691776/99614720
15/01/22 14:30:35 INFO mapred.MapTask: record buffer = 262144/327680
15/01/22 14:30:35 INFO mapred.MapTask: Starting flush of map output
15/01/22 14:30:35 INFO mapred.LocalJobRunner: Map task executor complete.
15/01/22 14:30:35 WARN mapred.LocalJobRunner: job_local334070693_0007
java.lang.Exception: java.lang.NoClassDefFoundError: org/apache/lucene/util/PriorityQueue
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
Caused by: java.lang.NoClassDefFoundError: org/apache/lucene/util/PriorityQueue
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at org.apache.mahout.math.hadoop.similarity.cooccurrence.RowSimilarityJob$UnsymmetrifyMapper.map(RowSimilarityJob.java:520)
at org.apache.mahout.math.hadoop.similarity.cooccurrence.RowSimilarityJob$UnsymmetrifyMapper.map(RowSimilarityJob.java:504)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: org.apache.lucene.util.PriorityQueue
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 22 more
15/01/22 14:30:36 INFO mapred.JobClient: map 0% reduce 0%
15/01/22 14:30:36 INFO mapred.JobClient: Job complete: job_local334070693_0007
15/01/22 14:30:36 INFO mapred.JobClient: Counters: 0
15/01/22 14:30:36 INFO mapred.JobClient: Cleaning up the staging area file:/tmp/hadoop-Zhang_Kevin/mapred/staging/Zhang_Kevin1346036484/.staging/job_local1346036484_0008
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/lucene/util/PriorityQueue
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:270)
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:810)
at org.apache.hadoop.mapreduce.lib.input.MultipleInputs.getMapperTypeMap(MultipleInputs.java:141)
at org.apache.hadoop.mapreduce.lib.input.DelegatingInputFormat.getSplits(DelegatingInputFormat.java:60)
at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:1054)
at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1071)
at org.apache.hadoop.mapred.JobClient.access$700(JobClient.java:179)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:983)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:550)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:580)
at org.apache.mahout.cf.taste.hadoop.item.RecommenderJob.run(RecommenderJob.java:249)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at finderbots.recommenders.hadoop.RecommenderUpdateJob.run(RecommenderUpdateJob.java:129)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at finderbots.recommenders.hadoop.RecommenderUpdateJob.main(RecommenderUpdateJob.java:275)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
Caused by: java.lang.ClassNotFoundException: org.apache.lucene.util.PriorityQueue
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 26 more

Boosting categories only shows one category type

I have an app that uses the Universal Recommender; it is an app for finding houses for rent.
I want to recommend houses to users based on houses they have already viewed or scheduled a tour on.

I added all the users using the $set event.
I added all (96,676) the houses in the app like so:

predictionio_client.create_event(
                event="$set",
                entity_type="item",
                entity_id=listing.meta.id,
                properties={
                      "property_type": ["villa"] # There are many types of property_types such as "apartment"
                      .... # there are more properties that I defined but don't use, such as "city", "price", "bedrooms", and more
                }
            )

And I add the events of the house view & schedule like so:

predictionio_client.create_event(
            event="view",
            entity_type="user",
            entity_id=request.user.username,
            target_entity_type="item",
            target_entity_id=listing.meta.id
        )

Now I want to get predictions for my users based on the property_types they like.
So I send a prediction query boosting the property_types they like using Business Rules like so:

{
    'fields': [
        {
             'bias': 1.05, 
             'values': ['single_family_home', 'private_house', 'villa', 'cottage'], 
             'name': 'property_type'
        }
     ], 
     'num': 15, 
     'user': 'amit70'
}

I would then expect to get recommendations of different types, such as private_house or villa or cottage. But for some weird reason, while having over 95,000 houses of different property types, I only get recommendations of ONE single type (in this case villa); if I remove that type from the list, it just recommends 10 houses of ONE other type, while I want it to give me all the different types.
This is the response of the query:

{
    "itemScores": [
        {
            "item": "56.39233,-4.11707|villa|0",
            "score": 9.42542
        },
        {
            "item": "52.3288,1.68312|villa|0",
            "score": 9.42542
        },
        {
            "item": "55.898878,-4.617019|villa|0",
            "score": 8.531346
        },
        {
            "item": "55.90713,-3.27626|villa|0",
            "score": 8.531346
        },
.....

I can't understand why this is happening. The Elasticsearch query this translates to is this:

GET /recommender/_search
{
  "from": 0,
  "size": 15,
  "query": {
    "bool": {
      "should": [
        {
          "terms": {
            "schedule": [
              "32.1439352176,34.833260278|private_house|0",
              "31.7848439,35.2047335|apartment_for_sale|0"
            ]
          }
        },
        {
          "terms": {
            "view": [
              "32.0734919,34.7722675|garden_apartment|0",
              "32.1375986782,34.8415740159|apartment|0",
              "32.0774,34.8861|apartment_for_sale|0",
              "31.7720155609,35.1917438892|apartment|0",
               ..... (over 20 more)
            ]
          }
        },
        {
          "terms": {
            "property_type": [
              "single_family_home",
              "private_house",
              "villa",
              "cottage"
            ],
            "boost": 1.1
          }
        },
        {
          "constant_score": {
            "filter": {
              "match_all": {}
            },
            "boost": 0
          }
        }
      ],
      "must": [],
      "must_not": [
        {
          "ids": {
            "values": [
              "31.7848439,35.2047335|apartment_for_sale|0",
              "32.1439352176,34.833260278|private_house|0"
            ],
            "boost": 0
          }
        }
      ],
      "minimum_should_match": 1
    }
  },
  "sort": [
    {
      "_score": {
        "order": "desc"
      }
    },
    {
      "popRank": {
        "unmapped_type": "double",
        "order": "desc"
      }
    }
  ]
}

Does anyone know why this is happening?
I only have 2 users using the app: the one I am querying with and one other.

This is my engine.json:

{
  "id": "default",
  "description": "Default settings",
  "engineFactory": "com.actionml.RecommendationEngine",
  "datasource": {
    "params" : {
      "name": "houses-data.txt",
      "appName": "Houses",
      "eventNames": ["schedule", "view"],
      "eventWindow": {
        "duration": "3650 days",
        "removeDuplicates": false,
        "compressProperties": false
      }
    }
  },
  "sparkConf": {
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    "spark.kryo.registrator": "org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator",
    "spark.kryo.referenceTracking": "false",
    "spark.kryoserializer.buffer": "300m",
    "es.index.auto.create": "true"
  },
  "algorithms": [
    {
      "name": "ur",
      "params": {
        "appName": "Houses",
        "indexName": "recommender",
        "typeName": "items",
        "recsModel": "all",
        "backfillField": {
          "name": "popRank",
          "backfillType": "hot",
          "eventNames": ["schedule", "view"]
        },
        "rankings": [
          {
            "name": "popRank",
            "type": "popular",
            "eventNames": ["schedule", "view"]
          },
          {
            "name": "uniqueRank",
            "type": "random"
          },
          {
            "name": "preferredRank",
            "type": "userDefined"
          }
        ],
        "indicators": [
          {
            "name": "schedule"
          },
          {
            "name": "view",
            "maxCorrelatorsPerItem": 50,
            "minLLR": 5
          }
        ]
      }
    }
  ]
}

Pio build error

Hi!

Could you help me in this issue please? If you need more info please let me know.

I fetched the latest release (0.6.0) and when I try to run a pio build I get the following error:

Using existing engine manifest JSON at /var/pio/engines/ur/manifest.json
[INFO] [Console$] Using command '/PredictionIO-0.10.0-incubating/sbt/sbt' at the current working directory to build.
[INFO] [Console$] If the path above is incorrect, this process will fail.
[INFO] [Console$] Uber JAR disabled, but current working directory does not look like an engine project directory. Please delete lib/pio-assembly-0.10.0-incubating.jar manually.
[INFO] [Console$] Going to run: /PredictionIO-0.10.0-incubating/sbt/sbt package assemblyPackageDependency
[ERROR] [Console$] [error] impossible to get artifacts when data has not been loaded. IvyNode = org.xerial.snappy#snappy-java;1.0.5
[ERROR] [Console$] [error] (*:update) java.lang.IllegalStateException: impossible to get artifacts when data has not been loaded. IvyNode = org.xerial.snappy#snappy-java;1.0.5
[ERROR] [Console$] [error] Total time: 6 s, completed May 24, 2017 10:11:19 PM
[ERROR] [Console$] Return code of previous step is 1. Aborting.

If I check out the 0.5.0 version everything is fine. I'm using PredictionIO 0.10.

Set bias = 1 in query will affect the score?

According to the documentation about bias:

Bias = 1: no effect

But when I play with the handmade demo, I found that setting bias = 1 actually does affect the scores of the items. For example:

Query 1:

curl -H "Content-Type: application/json" -d '
{
    "user": "u-3",
    "fields": [{
        "name": "categories",
        "values": ["Tablets"],
        "bias": 1
    }, {
        "name": "countries",
        "values": ["Estados Unidos Mexicanos"],
        "bias": 5
    }]
}' http://localhost:8000/queries.json

Result 1:

{
	"itemScores": [{
		"item": "Iphone 4",
		"score": 0.869194746017456
	}, {
		"item": "Nexus",
		"score": 0.2377699315547943
	}, {
		"item": "Iphone 5",
		"score": 0.0
	}, {
		"item": "Galaxy",
		"score": 0.0
	}]
}

Query 2 (Remove the bias = 1 in query 1):

curl -H "Content-Type: application/json" -d '
{
    "user": "u-3",
    "fields": [{
        "name": "countries",
        "values": ["Estados Unidos Mexicanos"],
        "bias": 5
    }]
}' http://pio-prediction-server:8000/queries.json

Result 2:

{
	"itemScores": [{
		"item": "Iphone 4",
		"score": 1.0098044872283936
	}, {
		"item": "Nexus",
		"score": 0.07427313178777695
	}, {
		"item": "Iphone 5",
		"score": 0.0
	}, {
		"item": "Galaxy",
		"score": 0.0
	}]
}

So should setting bias = 1 have any influence on the prediction result?

Compatibility error with Hadoop 2.x

Exception in thread "main" java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected.

Feature Request: Return Expanded Array

The hope is to create the capability to easily configure the recommendation engine to serve up not only the TargetEntityID and score, as it does currently, but also to let users pass an array of values that map to existing item properties.

example:
extraParams = ['title', 'description','image']

Results after:
{
"itemScores":[
{"item":"22","score":4.072304374729956, "title":"title1", "description":"helpful meta description1", "image":"imageurl1"},
{"item":"62","score":4.058482414005789, "title":"title2", "description":"helpful meta description2", "image":"imageurl2"},
{"item":"75","score":4.046063009943821, "title":"title3", "description":"helpful meta description3", "image":"imageurl3"},
{"item":"68","score":3.8153661512945325, "title":"title4", "description":"helpful meta description4", "image":"imageurl4"}
]
}

Error during training when using remote ElasticSearch

I was trying to set up a PIO server with a remote ES and remote HBASE/Zookeeper via Docker.

versions used:

  • SCALA_VERSION 2.11.8
  • PIO_VERSION 0.12.1 ("from source" downloaded from apache mirror)
  • SPARK_VERSION 2.1.2
  • ELASTICSEARCH_VERSION 5.5.2
  • HBASE_VERSION 1.3.1

Here is my config:

pio-env.sh:

#!/usr/bin/env bash

# Filesystem paths where PredictionIO uses as block storage.
PIO_FS_BASEDIR=${HOME}/.pio_store
PIO_FS_ENGINESDIR=${PIO_FS_BASEDIR}/engines
PIO_FS_TMPDIR=${PIO_FS_BASEDIR}/tmp

SPARK_HOME=${PIO_HOME}/vendors/spark-${SPARK_VERSION}-bin-hadoop2.7

HBASE_CONF_DIR=${PIO_HOME}/vendors/hbase-${HBASE_VERSION}/conf

# Storage Repositories
PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta
PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=ELASTICSEARCH

PIO_STORAGE_REPOSITORIES_APPDATA_NAME=pio_appdata
PIO_STORAGE_REPOSITORIES_APPDATA_SOURCE=ELASTICSEARCH

PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=pio_event
PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=HBASE

PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model
PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=LOCALFS

# ES config
PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch
PIO_STORAGE_SOURCES_ELASTICSEARCH_CLUSTERNAME=predictionio
PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=es
# PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=localhost
PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9200
PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME=${PIO_HOME}/vendors/elasticsearch-${ELASTICSEARCH_VERSION}

PIO_STORAGE_SOURCES_LOCALFS_TYPE=localfs
PIO_STORAGE_SOURCES_LOCALFS_PATH=${PIO_FS_BASEDIR}/models

PIO_STORAGE_SOURCES_HBASE_TYPE=hbase
PIO_STORAGE_SOURCES_HBASE_HOME=${PIO_HOME}/vendors/hbase-${HBASE_VERSION}
# http://actionml.com/docs/small_ha_cluster
HBASE_MANAGES_ZK=true # when you want HBase to manage zookeeper

PIO itself seems to be running fine; here is the output of pio status:

pio status
[INFO] [Management$] Inspecting PredictionIO...
[INFO] [Management$] PredictionIO 0.12.1 is installed at /PredictionIO-0.12.1
[INFO] [Management$] Inspecting Apache Spark...
[INFO] [Management$] Apache Spark is installed at /PredictionIO-0.12.1/vendors/spark-2.1.2-bin-hadoop2.7
[INFO] [Management$] Apache Spark 2.1.2 detected (meets minimum requirement of 1.3.0)
[INFO] [Management$] Inspecting storage backend connections...
[INFO] [Storage$] Verifying Meta Data Backend (Source: ELASTICSEARCH)...
[INFO] [Storage$] Verifying Model Data Backend (Source: LOCALFS)...
[INFO] [Storage$] Verifying Event Data Backend (Source: HBASE)...
[INFO] [Storage$] Test writing to Event Store (App Id 0)...
[INFO] [HBLEvents] The table pio_event:events_0 doesn't exist yet. Creating now...
[INFO] [HBLEvents] Removing table pio_event:events_0...
[INFO] [Management$] Your system is all ready to go.

It seems as if the Universal Recommender does not pick up the PIO storage settings but keeps its own. Running the integration tests, it uses the template examples/handmade-engine.json, where I added two lines within the sparkConf object (es.nodes and es.nodes.wan.only):

  "sparkConf": {
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    "spark.kryo.registrator": "org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator",
    "spark.kryo.referenceTracking": "false",
    "spark.kryoserializer.buffer": "300m",
    "es.index.auto.create": "true",
    "es.nodes.wan.only":"true",
    "es.nodes":"es"
  }

It seems to be talking to the right ES server, but I always get the following exception during the training phase (pio train -- --driver-memory 4g --executor-memory 4g):

2018-09-04 07:49:35,562 ERROR org.apache.predictionio.data.storage.elasticsearch.ESEngineInstances [main] - Failed to update pio_meta/engine_instances/AWWjjner32JscvS-r-c9
org.apache.predictionio.shaded.org.elasticsearch.client.ResponseException: POST http://es:9200/pio_meta/engine_instances/AWWjjner32JscvS-r-c9?refresh=true: HTTP/1.1 400 Bad Request
{"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"mapper [sparkConf.es.nodes] of different type, current_type [text], merged_type [ObjectMapper]"}],"type":"illegal_argument_exception","reason":"mapper [sparkConf.es.nodes] of different type, current_type [text], merged_type [ObjectMapper]"},"status":400}
	at org.apache.predictionio.shaded.org.elasticsearch.client.RestClient$1.completed(RestClient.java:354)
	at org.apache.predictionio.shaded.org.elasticsearch.client.RestClient$1.completed(RestClient.java:343)
	at org.apache.predictionio.shaded.org.apache.http.concurrent.BasicFuture.completed(BasicFuture.java:119)
	at org.apache.predictionio.shaded.org.apache.http.impl.nio.client.DefaultClientExchangeHandlerImpl.responseCompleted(DefaultClientExchangeHandlerImpl.java:177)
	at org.apache.predictionio.shaded.org.apache.http.nio.protocol.HttpAsyncRequestExecutor.processResponse(HttpAsyncRequestExecutor.java:436)
	at org.apache.predictionio.shaded.org.apache.http.nio.protocol.HttpAsyncRequestExecutor.inputReady(HttpAsyncRequestExecutor.java:326)
	at org.apache.predictionio.shaded.org.apache.http.impl.nio.DefaultNHttpClientConnection.consumeInput(DefaultNHttpClientConnection.java:265)
	at org.apache.predictionio.shaded.org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:81)
	at org.apache.predictionio.shaded.org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:39)
	at org.apache.predictionio.shaded.org.apache.http.impl.nio.reactor.AbstractIODispatch.inputReady(AbstractIODispatch.java:114)
	at org.apache.predictionio.shaded.org.apache.http.impl.nio.reactor.BaseIOReactor.readable(BaseIOReactor.java:162)
	at org.apache.predictionio.shaded.org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvent(AbstractIOReactor.java:337)
	at org.apache.predictionio.shaded.org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvents(AbstractIOReactor.java:315)
	at org.apache.predictionio.shaded.org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:276)
	at org.apache.predictionio.shaded.org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104)
	at org.apache.predictionio.shaded.org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:588)
	at java.lang.Thread.run(Thread.java:748)

This does not happen when I start a local ES on the same machine where PIO is located (using the original engine.json).

pio train exits without error

Following the quickstart guide, I stumbled across what seems like a trivial problem.
The integration test doesn't work for me and doesn't return any error whatsoever. I found that the problem is in model training.

When trying to train recommender using command:

pio train -- --driver-memory 4g --executor-memory 4g

it returns

[INFO] [Console$] Using existing engine manifest JSON at /root/ur/manifest.json
[INFO] [Runner$] Submission command: /PredictionIO-0.10.0-incubating/vendors/spark-1.5.1-bin-hadoop2.6/bin/spark-submit --driver-memory 4g --executor-memory 4g --class org.apache.predictionio.workflow.CreateWorkflow --jars file:/root/ur/target/scala-2.10/template-scala-parallel-universal-recommendation-assembly-0.5.0-deps.jar,file:/root/ur/target/scala-2.10/template-scala-parallel-universal-recommendation_2.10-0.5.0.jar --files file:/PredictionIO-0.10.0-incubating/conf/log4j.properties,file:/PredictionIO-0.10.0-incubating/vendors/hbase-1.0.0/conf/hbase-site.xml --driver-class-path /PredictionIO-0.10.0-incubating/conf:/PredictionIO-0.10.0-incubating/vendors/hbase-1.0.0/conf file:/PredictionIO-0.10.0-incubating/lib/pio-assembly-0.10.0-incubating.jar --engine-id GwjbffjU4mg4sZHjkUMyS1fJBHQx4CVQ --engine-version 08f9b7206d6c4c46c26e6aa45fd73ed15e7de602 --engine-variant file:/root/ur/engine.json --verbosity 0 --json-extractor Both --env PIO_STORAGE_SOURCES_HBASE_TYPE=hbase,PIO_ENV_LOADED=1,PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta,PIO_VERSION=0.10.0,PIO_FS_BASEDIR=/root/.pio_store,PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=localhost,PIO_STORAGE_SOURCES_HBASE_HOME=/PredictionIO-0.10.0-incubating/vendors/hbase-1.0.0,PIO_HOME=/PredictionIO-0.10.0-incubating,PIO_FS_ENGINESDIR=/root/.pio_store/engines,PIO_STORAGE_SOURCES_LOCALFS_PATH=/root/.pio_store/models,PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch,PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=ELASTICSEARCH,PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=LOCALFS,PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=pio_event,PIO_STORAGE_SOURCES_ELASTICSEARCH_CLUSTERNAME=predictionio,PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME=/PredictionIO-0.10.0-incubating/vendors/elasticsearch-1.4.4,PIO_FS_TMPDIR=/root/.pio_store/tmp,PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model,PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=HBASE,PIO_CONF_DIR=/PredictionIO-0.10.0-incubating/conf,PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9300,PIO_STORAGE_SOURCES_LOCALFS_TYPE=localfs
[INFO] [RecommendationEngine$]

               _   _             __  __ _
     /\       | | (_)           |  \/  | |
    /  \   ___| |_ _  ___  _ __ | \  / | |
   / /\ \ / __| __| |/ _ \| '_ \| |\/| | |
  / ____ \ (__| |_| | (_) | | | | |  | | |____
 /_/    \_\___|\__|_|\___/|_| |_|_|  |_|______|



[INFO] [Engine] Extracting datasource params...
[INFO] [WorkflowUtils$] No 'name' is found. Default empty String will be used.
[INFO] [Engine] Datasource params: (,DataSourceParams(handmade,List(purchase, view),None))
[INFO] [Engine] Extracting preparator params...
[INFO] [Engine] Preparator params: (,Empty)
[INFO] [Engine] Extracting serving params...
[INFO] [Engine] Serving params: (,Empty)
[INFO] [Remoting] Starting remoting
[INFO] [Remoting] Remoting started; listening on addresses :[akka.tcp://[email protected]:39501]
[WARN] [MetricsSystem] Using default name DAGScheduler for source because spark.app.id is not set.
[INFO] [DataSource]
┌────────────────────────────────────────────────────────────┐
│ Init DataSource                                            │
│ ────────────────────────────────────────────────────────── │
│ App name                      handmade                     │
│ Event window                  None                         │
│ Event names                   List(purchase, view)         │
└────────────────────────────────────────────────────────────┘

[INFO] [URAlgorithm]
┌────────────────────────────────────────────────────────────┐
│ Init URAlgorithm                                           │
│ ────────────────────────────────────────────────────────── │
│ App name                      handmade                     │
│ ES index name                 urindex                      │
│ ES type name                  items                        │
│ RecsModel                     all                          │
│ Event names                   List(purchase, view)         │
│ ────────────────────────────────────────────────────────── │
│ Random seed                   1517559198                   │
│ MaxCorrelatorsPerEventType    50                           │
│ MaxEventsPerEventType         500                          │
│ ────────────────────────────────────────────────────────── │
│ User bias                     1.0                          │
│ Item bias                     1.0                          │
│ Max query events              100                          │
│ Limit                         4                            │
│ ────────────────────────────────────────────────────────── │
│ Rankings:                                                  │
│ popular                       Some(popRank)                │
└────────────────────────────────────────────────────────────┘

[INFO] [Engine$] EngineWorkflow.train
[INFO] [Engine$] DataSource: org.template.DataSource@447521e
[INFO] [Engine$] Preparator: org.template.Preparator@458031da
[INFO] [Engine$] AlgorithmList: List(org.template.URAlgorithm@463045fb)
[INFO] [Engine$] Data sanity check is on.
[WARN] [TableInputFormatBase] Cannot resolve the host name for 5299d63cee83/172.17.0.2 because of javax.naming.NameNotFoundException: DNS name not found [response code 3]; remaining name '2.0.17.172.in-addr.arpa'
[INFO] [DataSource] Received events List(purchase, view)
[INFO] [DataSource] Number of events List(8, 4)
[WARN] [TableInputFormatBase] Cannot resolve the host name for 5299d63cee83/172.17.0.2 because of javax.naming.NameNotFoundException: DNS name not found [response code 3]; remaining name '2.0.17.172.in-addr.arpa'
[INFO] [Engine$] org.template.TrainingData does not support data sanity check. Skipping check.

pio deploy

returns:

[ERROR] [Console$] No valid engine instance found for engine GwjbffjU4mg4sZHjkUMyS1fJBHQx4CVQ 08f9b7206d6c4c46c26e6aa45fd73ed15e7de602.
Try running 'train' before 'deploy'. Aborting.

TTL resets to 365 on every restart

It resets the eventTime index in MongoDB to 365 days no matter what TTL you set.
I have a TTL of 4 years:

{
    "engineId": "purchase",
    "dataset": {
        "ttl": "1460 days"
    },
    "engineFactory": "com.actionml.engines.ur.UREngine",
    "sparkConf": {
        "master": "local",
        "spark.driver.memory": "30g",
        "spark.executor.memory": "30g"
    },
    "algorithm": {
        "indicators": [
            {
                "name": "purchase"
            },
            {
                "name": "view"
            }
        ]
    }
}

But it resets to 365 days on every restart.
The index exists; I even set it by hand in MongoDB to 4 years, but when I restart, it changes the index back to 365 days.

I can see in the logs the following:

09:36:41.458 INFO  MongoAsyncDao     - Drop index eventTime
09:36:41.556 INFO  MongoAsyncDao     - Create indexes List(Document((eventTime,BsonInt32{value=-1})) - IndexOptions{background=true, unique=false, name='eventTime', sparse=false, expireA$

Why does it drop the eventTime index every time?

I see it checks:
case (iName, SingleIndex(o, isTtl), _) if isTtl && actualTtl(iName, actualIndexesInfo).forall(_.compareTo(ttl) != 0) =>

So if I set a different index TTL by hand using a MongoDB GUI, it can decide the TTL differs and then drop and recreate the index.

The thing is that I set the TTL in seconds in MongoDB, but in days in the config JSON file.
Could some small difference in seconds be forcing it to drop and recreate?
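If that hunch is right, the mismatch is easy to reproduce. A minimal sketch, assuming the check normalizes both TTLs to seconds before the compareTo above (the numbers are illustrative, not taken from the codebase):

import scala.concurrent.duration._

// "1460 days" from the engine config, normalized to seconds
val configTtl = Duration("1460 days").toSeconds  // 126,144,000 s

// "4 years" set by hand; counting the leap day gives 1461 days
val manualTtl = Duration("1461 days").toSeconds  // 126,230,400 s

// One day apart, so compareTo(ttl) != 0 and the index is
// dropped and recreated with the config value on every restart
println(configTtl == manualTtl)  // false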

Error on training

Hi, all.
I have installed Harness:
git clone https://github.com/actionml/harness-docker-compose.git && cd harness-docker-compose && cp .env.sample .env && docker-compose up -d --build
When I try to train, I get the following errors:

ERROR SparkContextSupport$ - Spark context failed for job JobDescription(321e0ee5-f824-4b72-8589-fcc4e1e2270f,queued,Spark job,Some(Sun Dec 15 08:43:38 GMT 2019),None)
org.apache.spark.SparkException: A master URL must be set in your configuration
at org.apache.spark.SparkContext.<init>(SparkContext.scala:367)
at com.actionml.core.spark.SparkContextSupport$$anonfun$1.apply(SparkContextSupport.scala:133)
at com.actionml.core.spark.SparkContextSupport$$anonfun$1.apply(SparkContextSupport.scala:120)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
at scala.concurrent.impl.ExecutionContextImpl$AdaptedForkJoinTask.exec(ExecutionContextImpl.scala:121)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
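The failure says no Spark master was ever set for the job. A guess at the usual fix, not verified against this compose setup: give the engine's sparkConf a master entry, as the other engine configs on this page do:

"sparkConf": {
    "master": "local"
}

If you run against an external Spark cluster instead, its spark://host:port master URL would go there in place of local.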

java.lang.NegativeArraySizeException error for secondary indicators

Hi,

The UR throws a java.lang.NegativeArraySizeException when there isn't at least one occurrence of the primary indicator. But it also throws the same error when no users of a secondary indicator are also in the primary (no cross-occurrences), as explained by Pat Ferrel in actionml's Google group. It would be great if the error messages were different for the two cases and/or more informative.
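A hedged sketch of the kind of pre-train guard that would tell the two cases apart; the names and shapes here are invented for illustration and are not the UR's API:

import org.apache.spark.rdd.RDD

// Hypothetical validation run before CCO training: fail fast with a
// distinct message for "no primary events" vs. "no cross-occurrences".
def validateIndicators(
    primary: RDD[(String, String)],                  // (userId, itemId) for the primary indicator
    secondary: Map[String, RDD[(String, String)]]): Unit = {
  require(!primary.isEmpty(), "No events found for the primary indicator")
  val primaryUsers = primary.keys.distinct().cache()
  secondary.foreach { case (name, events) =>
    val overlap = events.keys.distinct().intersection(primaryUsers)
    require(!overlap.isEmpty(),
      s"Indicator '$name' shares no users with the primary: no cross-occurrences possible")
  }
}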

Thank you very much for all the great work!!!
noelia

Hello, I got this error when I try to train the model.

Log: (attachment not recoverable)

Engine config:
{
    "engineId": "rs",
    "engineFactory": "com.actionml.engines.ur.UREngine",
    "sparkConf": {
        "master": "local",
        "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
        "spark.kryo.registrator": "org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator",
        "spark.kryo.referenceTracking": "false",
        "spark.kryoserializer.buffer": "300m",
        "spark.executor.memory": "2g",
        "spark.driver.memory": "1g",
        "spark.es.index.auto.create": "true",
        "spark.es.nodes": "http://localhost:9200",
        "spark.es.nodes.wan.only": "true"
    },
    "algorithm": {
        "indicators": [
            {
                "name": "buy"
            }
        ]
    }
}

Elasticsearch instance: (screenshot not included)

Is the URAlgorithm class thread safe?

I always thought the algorithm object is shared between threads when processing predict queries. Is this predict method's code valid?

queryEventNames = query.eventNames.getOrElse(modelEventNames) // eventNames in query take precedence
val (queryStr, blacklist) = buildQuery(ap, query, rankingFieldNames)
val searchHitsOpt = EsClient.search(queryStr, esIndex, queryEventNames)

Is queryEventNames published properly? Isn't it a data race?
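For what it's worth, a minimal sketch of how the race could be avoided, assuming queryEventNames is currently a mutable field on the shared algorithm instance (the method shape below is assumed, not quoted from the source):

def predict(query: Query): PredictedResult = {
  // a local val lives on this call's stack frame, so concurrent
  // queries can no longer observe each other's event names
  val queryEventNames = query.eventNames.getOrElse(modelEventNames)
  val (queryStr, blacklist) = buildQuery(ap, query, rankingFieldNames)
  val searchHitsOpt = EsClient.search(queryStr, esIndex, queryEventNames)
  // ... rest of predict unchanged
}

If the field really is written on every query, two concurrent requests can interleave the write and the later read, so one request may search with the other's event names.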

Recommendations are empty and the training job status is failed, but in the Harness log everything is OK

Engine config:

{
    "engineId": "rb",
    "engineFactory": "com.actionml.engines.ur.UREngine",
    "sparkConf": {
        "master": "local",
        "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
        "spark.kryo.registrator": "org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator",
        "spark.kryo.referenceTracking": "false",
        "spark.kryoserializer.buffer": "300m",
        "spark.executor.memory": "3g",
        "spark.driver.memory": "3g",
        "spark.es.index.auto.create": "true",
        "spark.es.nodes": "elasticsearch",
        "spark.es.nodes.wan.only": "true"
    },
    "algorithm":{
        "indicators": [
            {
                "name": "buy"
            },
            {
                "name": "add-to-cart"
            },
            {
                "name": "view"
            },
            {
                "name": "like"
            }
        ]
    }
}

items:

Screenshot from 2019-12-29 12-14-12

events:

Screenshot from 2019-12-29 12-16-19

jobs:

Screenshot from 2019-12-29 12-17-25

log: harness.log

Note
I am using docker-compose

Training data occasionally has errors

Normally the log shows "Successfully stopped SparkContext".
When the error occurs, this message does not appear.


Therefore, the next time training runs, the Spark computation cannot be performed normally...

I have tried to delete the jobs, but it seems they cannot be deleted properly.

I have to restart ActionML Harness to resume normal training.

Is there any way to ensure that each training run succeeds, or at least does not affect the next one?

Sometimes an exception occurs after 10-20 training runs are performed; sometimes it occurs after only 3-5.

pio train

Hello,

When I run the pio train command, I get the following exception:

[ERROR] [NetworkClient] Node [85.255.11.174:9200] failed (Connection refused (Connection refused)); no other nodes left - aborting...
[ERROR] [Executor] Exception in task 0.0 in stage 103.0 (TID 56)
[WARN] [TaskSetManager] Lost task 0.0 in stage 103.0 (TID 56, localhost): org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings)- all nodes failed; tried [[85.255.11.174:9200]]

Elasticsearch is running on my local machine; this is my elasticsearch.yml:

network.bind_host: localhost
node.name: "kebodev"
cluster.name: kebo_cluster
discovery.zen.ping.multicast.enabled: false

Why does it try to connect to 85.255.11.174, which is my public IP?

Can anyone help me? Thank you!
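One plausible explanation, offered as a guess: in Elasticsearch 1.x, network.bind_host only controls the address the node binds to, while the publish address handed back to clients (including the es-hadoop connector Spark uses) defaults to a routable interface, here the public IP. Setting network.host covers both bind and publish at once:

network.host: localhost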

Would you help me solve this problem: "had a not serializable result: org.apache.mahout.math.DenseVector"

Hi, dear developer,
I extracted the CCO training code from your pio-ur, but when I ran my jar on a cluster I hit the problem "had a not serializable result: org.apache.mahout.math.DenseVector".
I don't understand why Mahout designed its Vector to be non-serializable, and I also don't understand why you use RandomAccessSparseVector in a distributed program.
How did you solve this problem? Please help me, thank you very much.
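Mahout's math vectors are not Java-serializable, which is why the UR runs Spark with Kryo and Mahout's registrator; the same settings appear in the engine configs elsewhere on this page. If you extract the CCO code into your own jar, a sketch of the equivalent setup (the app name is an invented example):

import org.apache.spark.{SparkConf, SparkContext}

// Use Kryo with Mahout's registrator so DenseVector and
// RandomAccessSparseVector can be shipped between executors
val conf = new SparkConf()
  .setAppName("cco-training")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrator", "org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator")
  .set("spark.kryo.referenceTracking", "false")
val sc = new SparkContext(conf)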

Business rule with "contains" instead of exact match

Is it possible to write business rules that act like "this property has to contain this value"?
As an example, let's say I have a set of items:

[
    {
        "entityType": "item",
        "entityId": "1",
        "properties": {
            "name": ["Yellow Car"]
        }
    },
    {
        "entityType": "item",
        "entityId": "2",
        "properties": {
            "name": ["Car red"]
        }
    },
    {
        "entityType": "item",
        "entityId": "3",
        "properties": {
            "name": ["motorbike"]
        }
    }
]

I would like to write a business rule to get only items whose name contains "Car", and so only the items with ids 1 and 2.
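As far as I know, the UR's field-based rules match whole property values, not substrings, so "Car" will not match "Yellow Car". A hedged workaround sketch: $set a tokenized property (keywords is an invented name) on each item, e.g. "keywords": ["yellow", "car"], and filter on it with a standard fields clause, where bias -1 means a hard filter:

{
    "user": "u-1",
    "fields": [
        {
            "name": "keywords",
            "values": ["car"],
            "bias": -1
        }
    ]
}

Since the values are not analyzed, the tokens should be lower-cased consistently on both the $set and the query side.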

Test failure on running "integration-test-rank"

System Information

PredictionIO 0.11.0
PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta
PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=ELASTICSEARCH
PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=pio_event
PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=PGSQL
PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model
PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=PGSQL
elasticsearch-1.7.5
universal-recommender master branch (commit 956d2a5)
ubuntu16.04 running on docker

Describe the Problem

PredictionIO and the universal-recommender were freshly installed, without previously importing any events or training. Running ./examples/rank/integration-test-rank shows that the test result is not as expected:

root@761bc1b7600d:~/ur# pio-start-all
Starting Elasticsearch...
 * Starting PostgreSQL 9.5 database server                                                                                                             [ OK ]
Waiting 10 seconds for Storage Repositories to fully initialize...
Starting PredictionIO Event Server...
root@761bc1b7600d:~/ur#
root@761bc1b7600d:~/ur# ./examples/rank/integration-test-rank
==================================================================
Integration test [Rank] for The Universal Recommender.
If some step fails check that your engine.json file has been restored or look for it in 'user-engine.json'
==================================================================
==================================================================
Checking for needed files
==================================================================
==================================================================
Checking status, should exit if pio is not running.
==================================================================
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/PredictionIO-0.11.0-incubating/lib/spark/pio-data-hdfs-assembly-0.11.0-incubating.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/PredictionIO-0.11.0-incubating/lib/pio-assembly-0.11.0-incubating.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
[INFO] [Management$] Inspecting PredictionIO...
[INFO] [Management$] PredictionIO 0.11.0-incubating is installed at /PredictionIO-0.11.0-incubating
[INFO] [Management$] Inspecting Apache Spark...
[INFO] [Management$] Apache Spark is installed at /PredictionIO-0.11.0-incubating/vendors/spark-1.6.3-bin-hadoop2.6
[INFO] [Management$] Apache Spark 1.6.3 detected (meets minimum requirement of 1.3.0)
[INFO] [Management$] Inspecting storage backend connections...
[INFO] [Storage$] Verifying Meta Data Backend (Source: ELASTICSEARCH)...
[INFO] [Storage$] Verifying Model Data Backend (Source: PGSQL)...
[INFO] [Storage$] Verifying Event Data Backend (Source: PGSQL)...
[INFO] [Storage$] Test writing to Event Store (App Id 0)...
[INFO] [Management$] Your system is all ready to go.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/PredictionIO-0.11.0-incubating/lib/spark/pio-data-hdfs-assembly-0.11.0-incubating.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/PredictionIO-0.11.0-incubating/lib/pio-assembly-0.11.0-incubating.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
[INFO] [App$] Initialized Event Store for this app ID: 1.
[INFO] [Pio$] Created a new app:
[INFO] [Pio$]       Name: default-rank
[INFO] [Pio$]         ID: 1
[INFO] [Pio$] Access Key: 123456789
==================================================================
Checking to see if default-rank app exists, should exit if not.
==================================================================
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/PredictionIO-0.11.0-incubating/lib/spark/pio-data-hdfs-assembly-0.11.0-incubating.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/PredictionIO-0.11.0-incubating/lib/pio-assembly-0.11.0-incubating.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
[INFO] [Pio$]     App Name: default-rank
[INFO] [Pio$]       App ID: 1
[INFO] [Pio$]  Description:
[INFO] [Pio$]   Access Key: 123456789 | (all)
==================================================================
Moving engine.json to user-engine.json
==================================================================
cp: cannot stat 'engine.json': No such file or directory
==================================================================
THE FIRST SERIES OF TESTS
==================================================================
==================================================================
Moving examples/rank/rank-engine.json to engine.json for integration test.
==================================================================
==================================================================
Deleting default-rank app data since the test is date dependent
==================================================================
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/PredictionIO-0.11.0-incubating/lib/spark/pio-data-hdfs-assembly-0.11.0-incubating.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/PredictionIO-0.11.0-incubating/lib/pio-assembly-0.11.0-incubating.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
[INFO] [Pio$] Data of the following app (default channel only) will be deleted. Are you sure?
[INFO] [Pio$]     App Name: default-rank
[INFO] [Pio$]       App ID: 1
[INFO] [Pio$]  Description: None
[INFO] [App$] Removed Event Store for the app ID: 1
[INFO] [App$] Initialized Event Store for the app ID: 1
==================================================================
Importing data for integration test
==================================================================
Namespace(access_key='123456789', file='./data/sample-rank-data.txt', url='http://localhost:7070')
Importing data...
Event: show entity_id: product-1 target_entity_id: product-1 current_date: 2017-09-15T03:31:49.255628+00:00
Event: show entity_id: product-2 target_entity_id: product-2 current_date: 2017-09-14T08:19:49.255628+00:00
Event: show entity_id: product-3 target_entity_id: product-3 current_date: 2017-09-13T13:07:49.255628+00:00
Event: show entity_id: product-4 target_entity_id: product-4 current_date: 2017-09-12T17:55:49.255628+00:00
Event: like entity_id: product-4 target_entity_id: product-4 current_date: 2017-09-11T22:43:49.255628+00:00
Event: like entity_id: product-3 target_entity_id: product-3 current_date: 2017-09-11T03:31:49.255628+00:00
Event: like entity_id: product-2 target_entity_id: product-2 current_date: 2017-09-10T08:19:49.255628+00:00
Event: like entity_id: product-1 target_entity_id: product-1 current_date: 2017-09-09T13:07:49.255628+00:00
Event: $set entity_id: product-1 properties/color: ['red', 'black'] current_date: 2017-09-08T17:55:49.255628+00:00
Event: $set entity_id: product-2 properties/color: ['green', 'black'] current_date: 2017-09-07T22:43:49.255628+00:00
Event: $set entity_id: product-3 properties/color: ['red', 'black'] current_date: 2017-09-07T03:31:49.255628+00:00
Event: $set entity_id: product-4 properties/color: ['green', 'black'] current_date: 2017-09-06T08:19:49.255628+00:00
Event: show entity_id: product-4 target_entity_id: product-4 current_date: 2017-09-05T13:07:49.255628+00:00
Event: show entity_id: product-3 target_entity_id: product-3 current_date: 2017-09-04T17:55:49.255628+00:00
Event: like entity_id: product-3 target_entity_id: product-3 current_date: 2017-09-03T22:43:49.255628+00:00
Event: like entity_id: product-4 target_entity_id: product-4 current_date: 2017-09-03T03:31:49.255628+00:00
Event: $set entity_id: product-1 properties/defaultRank: 1.0 current_date: 2017-09-02T08:19:49.255628+00:00
Event: $set entity_id: product-2 properties/defaultRank: 2.7 current_date: 2017-09-01T13:07:49.255628+00:00
Event: $set entity_id: product-3 properties/defaultRank: 3.2 current_date: 2017-08-31T17:55:49.255628+00:00
Event: $set entity_id: product-4 properties/defaultRank: 4.7 current_date: 2017-08-30T22:43:49.255628+00:00
Event: $set entity_id: product-5 properties/defaultRank: 5.0 current_date: 2017-08-30T03:31:49.255628+00:00
Event: $set entity_id: product-6 properties/defaultRank: 6.9 current_date: 2017-08-29T08:19:49.255628+00:00
Event: $set entity_id: product-7 properties/defaultRank: 7.15 current_date: 2017-08-28T13:07:49.255628+00:00
Event: $set entity_id: product-8 properties/defaultRank: 8.07 current_date: 2017-08-27T17:55:49.255628+00:00
Event: like entity_id: product-3 target_entity_id: product-3 current_date: 2017-08-26T22:43:49.255628+00:00
Event: like entity_id: product-6 target_entity_id: product-6 current_date: 2017-08-26T03:31:49.255628+00:00
Event: show entity_id: product-3 target_entity_id: product-3 current_date: 2017-08-25T08:19:49.255628+00:00
Event: show entity_id: product-4 target_entity_id: product-4 current_date: 2017-08-24T13:07:49.255628+00:00
Event: show entity_id: product-5 target_entity_id: product-5 current_date: 2017-08-23T17:55:49.255628+00:00
Event: $set entity_id: product-1 properties/size: ['S', 'M'] current_date: 2017-08-22T22:43:49.255628+00:00
Event: $set entity_id: product-2 properties/size: ['SX', 'XL'] current_date: 2017-08-22T03:31:49.255628+00:00
Event: $set entity_id: product-3 properties/size: ['XL', 'X'] current_date: 2017-08-21T08:19:49.255628+00:00
Event: $set entity_id: product-4 properties/size: ['X', 'XL', 'S'] current_date: 2017-08-20T13:07:49.255628+00:00
Event: $set entity_id: product-5 properties/size: ['M', 'S', 'XS'] current_date: 2017-08-19T17:55:49.255628+00:00
Event: $set entity_id: product-9 properties/size: ['M', 'S', 'XS'] current_date: 2017-08-18T22:43:49.255628+00:00
All items: set(['product-9', 'product-8', 'product-7', 'product-6', 'product-5', 'product-4', 'product-3', 'product-2', 'product-1'])
Event: $set entity_id: product-9 properties/availableDate: 2017-09-10T17:55:49.255628+00:00 properties/date: 2017-09-12T17:55:49.255628+00:00 properties/expireDate: 2017-09-14T17:55:49.255628+00:00
Event: $set entity_id: product-8 properties/availableDate: 2017-09-11T13:07:49.255628+00:00 properties/date: 2017-09-13T13:07:49.255628+00:00 properties/expireDate: 2017-09-15T13:07:49.255628+00:00
Event: $set entity_id: product-7 properties/availableDate: 2017-09-12T08:19:49.255628+00:00 properties/date: 2017-09-14T08:19:49.255628+00:00 properties/expireDate: 2017-09-16T08:19:49.255628+00:00
Event: $set entity_id: product-6 properties/availableDate: 2017-09-13T03:31:49.255628+00:00 properties/date: 2017-09-15T03:31:49.255628+00:00 properties/expireDate: 2017-09-17T03:31:49.255628+00:00
Event: $set entity_id: product-5 properties/availableDate: 2017-09-13T22:43:49.255628+00:00 properties/date: 2017-09-15T22:43:49.255628+00:00 properties/expireDate: 2017-09-17T22:43:49.255628+00:00
Event: $set entity_id: product-4 properties/availableDate: 2017-09-14T17:55:49.255628+00:00 properties/date: 2017-09-16T17:55:49.255628+00:00 properties/expireDate: 2017-09-18T17:55:49.255628+00:00
Event: $set entity_id: product-3 properties/availableDate: 2017-09-15T13:07:49.255628+00:00 properties/date: 2017-09-17T13:07:49.255628+00:00 properties/expireDate: 2017-09-19T13:07:49.255628+00:00
Event: $set entity_id: product-2 properties/availableDate: 2017-09-16T08:19:49.255628+00:00 properties/date: 2017-09-18T08:19:49.255628+00:00 properties/expireDate: 2017-09-20T08:19:49.255628+00:00
Event: $set entity_id: product-1 properties/availableDate: 2017-09-17T03:31:49.255628+00:00 properties/date: 2017-09-19T03:31:49.255628+00:00 properties/expireDate: 2017-09-21T03:31:49.255628+00:00
44 events are imported.
==================================================================
Building and deploying model
==================================================================
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/PredictionIO-0.11.0-incubating/lib/spark/pio-data-hdfs-assembly-0.11.0-incubating.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/PredictionIO-0.11.0-incubating/lib/pio-assembly-0.11.0-incubating.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
[INFO] [Engine$] Using command '/PredictionIO-0.11.0-incubating/sbt/sbt' at /root/ur to build.
[INFO] [Engine$] If the path above is incorrect, this process will fail.
[INFO] [Engine$] Uber JAR disabled. Making sure lib/pio-assembly-0.11.0-incubating.jar is absent.
[INFO] [Engine$] Going to run: /PredictionIO-0.11.0-incubating/sbt/sbt  package assemblyPackageDependency in /root/ur
[INFO] [Engine$] Compilation finished successfully.
[INFO] [Engine$] Looking for an engine...
[INFO] [Engine$] Found universal-recommender-assembly-0.6.0-deps.jar
[INFO] [Engine$] Found universal-recommender_2.10-0.6.0.jar
[INFO] [Engine$] Build finished successfully.
[INFO] [Pio$] Your engine is ready for training.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/PredictionIO-0.11.0-incubating/lib/spark/pio-data-hdfs-assembly-0.11.0-incubating.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/PredictionIO-0.11.0-incubating/lib/pio-assembly-0.11.0-incubating.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
[WARN] [WorkflowUtils$] Environment variable MYSQL_JDBC_DRIVER is pointing to a nonexistent file /PredictionIO-0.11.0-incubating/lib/mysql-connector-java-5.1.41.jar. Ignoring.
[INFO] [Runner$] Submission command: /PredictionIO-0.11.0-incubating/vendors/spark-1.6.3-bin-hadoop2.6/bin/spark-submit --executor-memory 4g --driver-memory 4g --class org.apache.predictionio.workflow.CreateWorkflow --jars file:/PredictionIO-0.11.0-incubating/lib/postgresql-42.0.0.jar,file:/root/ur/target/scala-2.10/universal-recommender-assembly-0.6.0-deps.jar,file:/root/ur/target/scala-2.10/universal-recommender_2.10-0.6.0.jar,file:/PredictionIO-0.11.0-incubating/lib/spark/pio-data-hbase-assembly-0.11.0-incubating.jar,file:/PredictionIO-0.11.0-incubating/lib/spark/pio-data-jdbc-assembly-0.11.0-incubating.jar,file:/PredictionIO-0.11.0-incubating/lib/spark/pio-data-localfs-assembly-0.11.0-incubating.jar,file:/PredictionIO-0.11.0-incubating/lib/spark/pio-data-hdfs-assembly-0.11.0-incubating.jar,file:/PredictionIO-0.11.0-incubating/lib/spark/pio-data-elasticsearch1-assembly-0.11.0-incubating.jar --files file:/PredictionIO-0.11.0-incubating/conf/log4j.properties --driver-class-path /PredictionIO-0.11.0-incubating/conf:/PredictionIO-0.11.0-incubating/lib/postgresql-42.0.0.jar:/PredictionIO-0.11.0-incubating/lib/mysql-connector-java-5.1.41.jar --driver-java-options -Dpio.log.dir=/root file:/PredictionIO-0.11.0-incubating/lib/pio-assembly-0.11.0-incubating.jar --engine-id com.actionml.RecommendationEngine --engine-version 08f9b7206d6c4c46c26e6aa45fd73ed15e7de602 --engine-variant file:/root/ur/engine.json --verbosity 0 --json-extractor Both --env PIO_ENV_LOADED=1,PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta,PIO_VERSION=0.11.0,PIO_FS_BASEDIR=/root/.pio_store,PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=localhost,PIO_HOME=/PredictionIO-0.11.0-incubating,PIO_FS_ENGINESDIR=/root/.pio_store/engines,PIO_STORAGE_SOURCES_PGSQL_URL=jdbc:postgresql://localhost/pio,PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch,PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=ELASTICSEARCH,PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=PGSQL,PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=pio_event,PIO_STORAGE_SOURCES_PGSQL_PASSWORD=pio,PIO_STORAGE_SOURCES_ELASTICSEARCH_CLUSTERNAME=predictionio,PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME=/PredictionIO-0.11.0-incubating/vendors/elasticsearch-1.7.5,PIO_STORAGE_SOURCES_PGSQL_TYPE=jdbc,PIO_FS_TMPDIR=/root/.pio_store/tmp,PIO_STORAGE_SOURCES_PGSQL_USERNAME=pio,PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model,PIO_BUILD=/pio_build,PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=PGSQL,PIO_CONF_DIR=/PredictionIO-0.11.0-incubating/conf,PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9300
[INFO] [RecommendationEngine$]

               _   _             __  __ _
     /\       | | (_)           |  \/  | |
    /  \   ___| |_ _  ___  _ __ | \  / | |
   / /\ \ / __| __| |/ _ \| '_ \| |\/| | |
  / ____ \ (__| |_| | (_) | | | | |  | | |____
 /_/    \_\___|\__|_|\___/|_| |_|_|  |_|______|



[INFO] [Engine] Extracting datasource params...
[INFO] [WorkflowUtils$] No 'name' is found. Default empty String will be used.
[INFO] [Engine] Datasource params: (,DataSourceParams(default-rank,List(show, like),None,None))
[INFO] [Engine] Extracting preparator params...
[INFO] [Engine] Preparator params: (,Empty)
[INFO] [Engine] Extracting serving params...
[INFO] [Engine] Serving params: (,Empty)
[INFO] [Remoting] Starting remoting
[INFO] [Remoting] Remoting started; listening on addresses :[akka.tcp://[email protected]:38061]
[INFO] [DataSource]
┌────────────────────────────────────────────────────────────┐
│ Init DataSource                                            │
│ ────────────────────────────────────────────────────────── │
│ App name                      default-rank                 │
│ Event window                  None                         │
│ Event names                   List(show, like)             │
│ Min events per user           None                         │
└────────────────────────────────────────────────────────────┘

[INFO] [URAlgorithm]
┌────────────────────────────────────────────────────────────┐
│ Init URAlgorithm                                           │
│ ────────────────────────────────────────────────────────── │
│ App name                      default-rank                 │
│ ES index name                 urindex                      │
│ ES type name                  items                        │
│ RecsModel                     all                          │
│ Event names                   List(show, like)             │
│ ────────────────────────────────────────────────────────── │
│ Random seed                   -2086961264                  │
│ MaxCorrelatorsPerEventType    50                           │
│ MaxEventsPerEventType         500                          │
│ BlacklistEvents               List(show)                   │
│ ────────────────────────────────────────────────────────── │
│ User bias                     1.0                          │
│ Item bias                     1.0                          │
│ Max query events              100                          │
│ Limit                         20                           │
│ ────────────────────────────────────────────────────────── │
│ Rankings:                                                  │
│ popular                       Some(popularRank)            │
│ userDefined                   Some(defaultRank)            │
│ random                        Some(uniqueRank)             │
└────────────────────────────────────────────────────────────┘

[INFO] [Engine$] EngineWorkflow.train
[INFO] [Engine$] DataSource: com.actionml.DataSource@cf01c2e
[INFO] [Engine$] Preparator: com.actionml.Preparator@341c6ac2
[INFO] [Engine$] AlgorithmList: List(com.actionml.URAlgorithm@2becfd4c)
[INFO] [Engine$] Data sanity check is on.
[INFO] [DataSource] Received events List(show, like)
[INFO] [Engine$] com.actionml.TrainingData does not support data sanity check. Skipping check.
[INFO] [Preparator] EventName: show
[INFO] [Preparator] Dimensions rows : 4 columns: 5
[INFO] [Preparator] Number of user-ids after creation: 4
[INFO] [Preparator] EventName: like
[INFO] [Preparator] Dimensions rows : 4 columns: 5
[INFO] [Preparator] Number of user-ids after creation: 4
[INFO] [Engine$] com.actionml.PreparedData does not support data sanity check. Skipping check.
[INFO] [URAlgorithm] Actions read now creating correlators
[INFO] [PopModel] PopModel popular using end: 2017-09-15T03:36:17.916Z, and duration: 315360000, interval: 2007-09-18T03:36:17.916Z/2017-09-15T03:36:17.916Z
[INFO] [PopModel] PopModel getting eventsRDD for startTime: 2007-09-18T03:36:17.916Z and endTime 2017-09-15T03:36:17.916Z
[INFO] [PopModel] PopModel userDefined using end: 2017-09-15T03:36:17.990Z, and duration: 315360000, interval: 2007-09-18T03:36:17.990Z/2017-09-15T03:36:17.990Z
[INFO] [PopModel] PopModel random using end: 2017-09-15T03:36:17.991Z, and duration: 315360000, interval: 2007-09-18T03:36:17.991Z/2017-09-15T03:36:17.991Z
[INFO] [PopModel] PopModel getting eventsRDD for startTime: 2007-09-18T03:36:17.991Z and endTime 2017-09-15T03:36:17.991Z
[INFO] [URAlgorithm] Correlators created now putting into URModel
[INFO] [URAlgorithm] Index mappings for the Elasticsearch URModel: Map(defaultRank -> (float,false), show -> (string,true), popularRank -> (float,false), uniqueRank -> (float,false), like -> (string,true))
[INFO] [URModel] Converting cooccurrence matrices into correlators
[INFO] [URModel] Group all properties RDD
[Stage 74:=============>    (3 + 1) / 4][Stage 82:====>             (1 + 0) / 4][INFO] [RootSolverFactory$] Creating org.apache.mahout.viennacl.opencl.GPUMMul solver
[INFO] [RootSolverFactory$] Unable to create class GPUMMul: attempting OpenMP version
[INFO] [RootSolverFactory$] Creating org.apache.mahout.viennacl.openmp.OMPMMul solver
[INFO] [RootSolverFactory$] org.apache.mahout.viennacl.openmp.OMPMMul$
[INFO] [RootSolverFactory$] Unable to create class OMPMMul: falling back to java version
[... the same GPUMMul -> OMPMMul -> java fallback sequence repeats for each Spark task ...]
[INFO] [URModel] ES fields[11]: List(available, size, id, color, expires, uniqueRank, show, defaultRank, popularRank, date, like)
[INFO] [EsClient$] Mappings for the index: {  "properties": {        available    : {      "type": "string",      "index": "not_analyzed",      "norms" : {        "enabled" : "false"      }    },        size    : {      "type": "string",      "index": "not_analyzed",      "norms" : {        "enabled" : "false"      }    },        id    : {      "type": "string",      "index": "not_analyzed",      "norms" : {        "enabled" : "false"      }    },        color    : {      "type": "string",      "index": "not_analyzed",      "norms" : {        "enabled" : "false"      }    },        expires    : {      "type": "string",      "index": "not_analyzed",      "norms" : {        "enabled" : "false"      }    },        uniqueRank    : {      "type": "float",      "index": "not_analyzed",      "norms" : {        "enabled" : "false"      }    },        show    : {      "type": "string",      "index": "not_analyzed",      "norms" : {        "enabled" : "true"      }    },        defaultRank    : {      "type": "float",      "index": "not_analyzed",      "norms" : {        "enabled" : "false"      }    },        popularRank    : {      "type": "float",      "index": "not_analyzed",      "norms" : {        "enabled" : "false"      }    },        date    : {      "type": "string",      "index": "not_analyzed",      "norms" : {        "enabled" : "false"      }    },        like    : {      "type": "string",      "index": "not_analyzed",      "norms" : {        "enabled" : "true"      }    },            "id": {      "type": "string",      "index": "not_analyzed",      "norms" : {        "enabled" : false      }    }  }}
[INFO] [Engine$] org.apache.predictionio.data.storage.NullModel does not support data sanity check. Skipping check.
[INFO] [Engine$] EngineWorkflow.train completed
[INFO] [Engine] engineInstanceId=AV6Dm3NWCbx7FOi5WAef
[INFO] [CoreWorkflow$] Inserting persistent model
[INFO] [CoreWorkflow$] Updating engine instance
[INFO] [CoreWorkflow$] Training completed successfully.
==================================================================
WARNING the model will be undeployed after this test,
so any running PredictionServer will be stopped
Waiting 30 seconds for the server to start
==================================================================
nohup: redirecting stderr to stdout
==================================================================
Running test query.
==================================================================
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   320  100   295  100    25    311     26 --:--:-- --:--:-- --:--:--   311
[... curl progress meters for the remaining test queries omitted ...]
==================================================================
Restoring engine.json
==================================================================
mv: cannot stat 'user-engine.json': No such file or directory
==================================================================
Killing the deployed PredictionServer
==================================================================
==================================================================
ONE OR MORE TESTS FAILURE:
==================================================================
15c15
< {"itemScores":[{"item":"product-3","score":0.3595937192440033},{"item":"product-2","score":0.10758151859045029},{"item":"product-5","score":0.06365098059177399},{"item":"product-6","score":0.0},{"item":"product-8","score":0.0},{"item":"product-7","score":0.0},{"item":"product-9","score":0.0}]}
---
> {"itemScores":[{"item":"product-3","score":0.22474607825279236},{"item":"product-2","score":0.10758151859045029},{"item":"product-5","score":0.04773823544383049},{"item":"product-6","score":0.0},{"item":"product-8","score":0.0},{"item":"product-7","score":0.0},{"item":"product-9","score":0.0}]}
19c19
< {"itemScores":[{"item":"product-4","score":0.6799420118331909},{"item":"product-1","score":0.2569144368171692},{"item":"product-6","score":0.0},{"item":"product-5","score":0.0},{"item":"product-8","score":0.0},{"item":"product-7","score":0.0},{"item":"product-9","score":0.0}]}
---
> {"itemScores":[{"item":"product-4","score":0.31164008378982544},{"item":"product-1","score":0.16057151556015015},{"item":"product-6","score":0.0},{"item":"product-5","score":0.0},{"item":"product-8","score":0.0},{"item":"product-7","score":0.0},{"item":"product-9","score":0.0}]}
23c23
< {"itemScores":[{"item":"product-2","score":0.3595937192440033},{"item":"product-1","score":0.3595937192440033},{"item":"product-5","score":0.017842993140220642},{"item":"product-6","score":0.0},{"item":"product-8","score":0.0},{"item":"product-7","score":0.0},{"item":"product-9","score":0.0}]}
---
> {"itemScores":[{"item":"product-2","score":0.3595937192440033},{"item":"product-1","score":0.2921698987483978},{"item":"product-5","score":0.017842993140220642},{"item":"product-6","score":0.0},{"item":"product-8","score":0.0},{"item":"product-7","score":0.0},{"item":"product-9","score":0.0}]}
27c27
< {"itemScores":[{"item":"product-1","score":0.2559533715248108},{"item":"product-3","score":0.0944056436419487},{"item":"product-2","score":0.0},{"item":"product-6","score":0.0},{"item":"product-8","score":0.0},{"item":"product-7","score":0.0},{"item":"product-9","score":0.0}]}
---
> {"itemScores":[{"item":"product-1","score":0.21756036579608917},{"item":"product-3","score":0.05900352820754051},{"item":"product-2","score":0.0},{"item":"product-6","score":0.0},{"item":"product-8","score":0.0},{"item":"product-7","score":0.0},{"item":"product-9","score":0.0}]}
37c37
< {"itemScores":[{"item":"product-3","score":0.40796521306037903},{"item":"product-4","score":0.3626357316970825},{"item":"product-5","score":0.07773856818675995},{"item":"product-2","score":0.0770743265748024},{"item":"product-6","score":0.0},{"item":"product-8","score":0.0},{"item":"product-7","score":0.0},{"item":"product-9","score":0.0}]}
---
> {"itemScores":[{"item":"product-3","score":0.2549782693386078},{"item":"product-4","score":0.158653125166893},{"item":"product-2","score":0.0770743265748024},{"item":"product-5","score":0.06478214263916016},{"item":"product-6","score":0.0},{"item":"product-8","score":0.0},{"item":"product-7","score":0.0},{"item":"product-9","score":0.0}]}
41c41
< {"itemScores":[{"item":"product-4","score":0.8485281467437744},{"item":"product-3","score":0.20341692864894867},{"item":"product-1","score":0.20341692864894867},{"item":"product-6","score":0.0},{"item":"product-5","score":0.0},{"item":"product-8","score":0.0},{"item":"product-7","score":0.0},{"item":"product-9","score":0.0}]}
---
> {"itemScores":[{"item":"product-4","score":0.39774757623672485},{"item":"product-3","score":0.12713557481765747},{"item":"product-1","score":0.12713557481765747},{"item":"product-6","score":0.0},{"item":"product-5","score":0.0},{"item":"product-8","score":0.0},{"item":"product-7","score":0.0},{"item":"product-9","score":0.0}]}
45c45
< {"itemScores":[{"item":"product-4","score":0.28767499327659607},{"item":"product-1","score":0.21575623750686646},{"item":"product-2","score":0.06454890966415405},{"item":"product-5","score":0.010705795139074326},{"item":"product-6","score":0.0},{"item":"product-8","score":0.0},{"item":"product-7","score":0.0},{"item":"product-9","score":0.0}]}
---
> {"itemScores":[{"item":"product-1","score":0.1753019392490387},{"item":"product-4","score":0.12585781514644623},{"item":"product-2","score":0.06454890966415405},{"item":"product-5","score":0.010705795139074326},{"item":"product-6","score":0.0},{"item":"product-8","score":0.0},{"item":"product-7","score":0.0},{"item":"product-9","score":0.0}]}
49c49
< {"itemScores":[{"item":"product-2","score":0.07302875816822052},{"item":"product-3","score":0.07302875071763992},{"item":"product-1","score":0.07302875071763992},{"item":"product-5","score":0.029496734961867332},{"item":"product-6","score":0.0},{"item":"product-8","score":0.0},{"item":"product-7","score":0.0},{"item":"product-9","score":0.0}]}
---
> {"itemScores":[{"item":"product-2","score":0.07302875816822052},{"item":"product-3","score":0.04564296826720238},{"item":"product-1","score":0.04564296826720238},{"item":"product-5","score":0.014748367480933666},{"item":"product-6","score":0.0},{"item":"product-8","score":0.0},{"item":"product-7","score":0.0},{"item":"product-9","score":0.0}]}
53c53
< {"itemScores":[{"item":"product-4","score":0.4954302906990051},{"item":"product-1","score":0.28767499327659607},{"item":"product-3","score":0.1290978193283081},{"item":"product-2","score":0.0},{"item":"product-6","score":0.0},{"item":"product-8","score":0.0},{"item":"product-7","score":0.0},{"item":"product-9","score":0.0}]}
---
> {"itemScores":[{"item":"product-1","score":0.26070547103881836},{"item":"product-4","score":0.21675077080726624},{"item":"product-3","score":0.08068614453077316},{"item":"product-2","score":0.0},{"item":"product-6","score":0.0},{"item":"product-8","score":0.0},{"item":"product-7","score":0.0},{"item":"product-9","score":0.0}]}
85c85
< {"itemScores":[{"item":"product-6","score":0.0},{"item":"product-7","score":0.0}]}
---
> {"itemScores":[{"item":"product-5","score":0.04773823544383049},{"item":"product-6","score":0.0},{"item":"product-7","score":0.0}]}
91c91
< {"itemScores":[{"item":"product-3","score":0.7042884230613708},{"item":"product-2","score":0.14845967292785645},{"item":"product-5","score":0.12810268998146057},{"item":"product-6","score":0.0},{"item":"product-8","score":0.0},{"item":"product-7","score":0.0},{"item":"product-9","score":0.0}]}
---
> {"itemScores":[{"item":"product-3","score":0.4401802718639374},{"item":"product-2","score":0.14845967292785645},{"item":"product-5","score":0.1024821475148201},{"item":"product-6","score":0.0},{"item":"product-8","score":0.0},{"item":"product-7","score":0.0},{"item":"product-9","score":0.0}]}
==================================================================
root@761bc1b7600d:~/ur# git log -n 1 | cat
commit 956d2a5b19c3ee66701fd0f30fc5484feafcbcc8
Author: Pat Ferrel <[email protected]>
Date:   Thu Aug 10 08:10:03 2017 -0700

    added the devworks blog link

Here is the diff for easier comparison:

root@761bc1b7600d:~/ur# diff -u ./rank-query-test-result.out data/rank-test-query-expected.txt
--- ./rank-query-test-result.out        2017-09-15 03:37:04.097160413 +0000
+++ data/rank-test-query-expected.txt   2017-09-15 03:13:07.203887741 +0000
@@ -12,19 +12,19 @@

 Recommendations for user: user-1

-{"itemScores":[{"item":"product-3","score":0.22474607825279236},{"item":"product-2","score":0.10758151859045029},{"item":"product-5","score":0.04773823544383049},{"item":"product-6","score":0.0},{"item":"product-8","score":0.0},{"item":"product-7","score":0.0},{"item":"product-9","score":0.0}]}
+{"itemScores":[{"item":"product-3","score":0.3595937192440033},{"item":"product-2","score":0.10758151859045029},{"item":"product-5","score":0.06365098059177399},{"item":"product-6","score":0.0},{"item":"product-8","score":0.0},{"item":"product-7","score":0.0},{"item":"product-9","score":0.0}]}

 Recommendations for user: user-2

-{"itemScores":[{"item":"product-4","score":0.31164008378982544},{"item":"product-1","score":0.16057151556015015},{"item":"product-6","score":0.0},{"item":"product-5","score":0.0},{"item":"product-8","score":0.0},{"item":"product-7","score":0.0},{"item":"product-9","score":0.0}]}
+{"itemScores":[{"item":"product-4","score":0.6799420118331909},{"item":"product-1","score":0.2569144368171692},{"item":"product-6","score":0.0},{"item":"product-5","score":0.0},{"item":"product-8","score":0.0},{"item":"product-7","score":0.0},{"item":"product-9","score":0.0}]}

 Recommendations for user: user-3

-{"itemScores":[{"item":"product-2","score":0.3595937192440033},{"item":"product-1","score":0.2921698987483978},{"item":"product-5","score":0.017842993140220642},{"item":"product-6","score":0.0},{"item":"product-8","score":0.0},{"item":"product-7","score":0.0},{"item":"product-9","score":0.0}]}
+{"itemScores":[{"item":"product-2","score":0.3595937192440033},{"item":"product-1","score":0.3595937192440033},{"item":"product-5","score":0.017842993140220642},{"item":"product-6","score":0.0},{"item":"product-8","score":0.0},{"item":"product-7","score":0.0},{"item":"product-9","score":0.0}]}

 Recommendations for user: user-4

-{"itemScores":[{"item":"product-1","score":0.21756036579608917},{"item":"product-3","score":0.05900352820754051},{"item":"product-2","score":0.0},{"item":"product-6","score":0.0},{"item":"product-8","score":0.0},{"item":"product-7","score":0.0},{"item":"product-9","score":0.0}]}
+{"itemScores":[{"item":"product-1","score":0.2559533715248108},{"item":"product-3","score":0.0944056436419487},{"item":"product-2","score":0.0},{"item":"product-6","score":0.0},{"item":"product-8","score":0.0},{"item":"product-7","score":0.0},{"item":"product-9","score":0.0}]}

 Recommendations for user: user-5

@@ -34,23 +34,23 @@

 Recommendations for item: product-1

-{"itemScores":[{"item":"product-3","score":0.2549782693386078},{"item":"product-4","score":0.158653125166893},{"item":"product-2","score":0.0770743265748024},{"item":"product-5","score":0.06478214263916016},{"item":"product-6","score":0.0},{"item":"product-8","score":0.0},{"item":"product-7","score":0.0},{"item":"product-9","score":0.0}]}
+{"itemScores":[{"item":"product-3","score":0.40796521306037903},{"item":"product-4","score":0.3626357316970825},{"item":"product-5","score":0.07773856818675995},{"item":"product-2","score":0.0770743265748024},{"item":"product-6","score":0.0},{"item":"product-8","score":0.0},{"item":"product-7","score":0.0},{"item":"product-9","score":0.0}]}

 Recommendations for item: product-2

-{"itemScores":[{"item":"product-4","score":0.39774757623672485},{"item":"product-3","score":0.12713557481765747},{"item":"product-1","score":0.12713557481765747},{"item":"product-6","score":0.0},{"item":"product-5","score":0.0},{"item":"product-8","score":0.0},{"item":"product-7","score":0.0},{"item":"product-9","score":0.0}]}
+{"itemScores":[{"item":"product-4","score":0.8485281467437744},{"item":"product-3","score":0.20341692864894867},{"item":"product-1","score":0.20341692864894867},{"item":"product-6","score":0.0},{"item":"product-5","score":0.0},{"item":"product-8","score":0.0},{"item":"product-7","score":0.0},{"item":"product-9","score":0.0}]}

 Recommendations for item: product-3

-{"itemScores":[{"item":"product-1","score":0.1753019392490387},{"item":"product-4","score":0.12585781514644623},{"item":"product-2","score":0.06454890966415405},{"item":"product-5","score":0.010705795139074326},{"item":"product-6","score":0.0},{"item":"product-8","score":0.0},{"item":"product-7","score":0.0},{"item":"product-9","score":0.0}]}
+{"itemScores":[{"item":"product-4","score":0.28767499327659607},{"item":"product-1","score":0.21575623750686646},{"item":"product-2","score":0.06454890966415405},{"item":"product-5","score":0.010705795139074326},{"item":"product-6","score":0.0},{"item":"product-8","score":0.0},{"item":"product-7","score":0.0},{"item":"product-9","score":0.0}]}

 Recommendations for item: product-4

-{"itemScores":[{"item":"product-2","score":0.07302875816822052},{"item":"product-3","score":0.04564296826720238},{"item":"product-1","score":0.04564296826720238},{"item":"product-5","score":0.014748367480933666},{"item":"product-6","score":0.0},{"item":"product-8","score":0.0},{"item":"product-7","score":0.0},{"item":"product-9","score":0.0}]}
+{"itemScores":[{"item":"product-2","score":0.07302875816822052},{"item":"product-3","score":0.07302875071763992},{"item":"product-1","score":0.07302875071763992},{"item":"product-5","score":0.029496734961867332},{"item":"product-6","score":0.0},{"item":"product-8","score":0.0},{"item":"product-7","score":0.0},{"item":"product-9","score":0.0}]}

 Recommendations for item: product-5

-{"itemScores":[{"item":"product-1","score":0.26070547103881836},{"item":"product-4","score":0.21675077080726624},{"item":"product-3","score":0.08068614453077316},{"item":"product-2","score":0.0},{"item":"product-6","score":0.0},{"item":"product-8","score":0.0},{"item":"product-7","score":0.0},{"item":"product-9","score":0.0}]}
+{"itemScores":[{"item":"product-4","score":0.4954302906990051},{"item":"product-1","score":0.28767499327659607},{"item":"product-3","score":0.1290978193283081},{"item":"product-2","score":0.0},{"item":"product-6","score":0.0},{"item":"product-8","score":0.0},{"item":"product-7","score":0.0},{"item":"product-9","score":0.0}]}

 ============ popular item recs only ============

@@ -82,10 +82,10 @@

 Recommendations for user: user-1

-{"itemScores":[{"item":"product-5","score":0.04773823544383049},{"item":"product-6","score":0.0},{"item":"product-7","score":0.0}]}
+{"itemScores":[{"item":"product-6","score":0.0},{"item":"product-7","score":0.0}]}

 ============ query with item and user *EXPERIMENTAL* ============

 Recommendations for user-1 & product-1

-{"itemScores":[{"item":"product-3","score":0.4401802718639374},{"item":"product-2","score":0.14845967292785645},{"item":"product-5","score":0.1024821475148201},{"item":"product-6","score":0.0},{"item":"product-8","score":0.0},{"item":"product-7","score":0.0},{"item":"product-9","score":0.0}]}
+{"itemScores":[{"item":"product-3","score":0.7042884230613708},{"item":"product-2","score":0.14845967292785645},{"item":"product-5","score":0.12810268998146057},{"item":"product-6","score":0.0},{"item":"product-8","score":0.0},{"item":"product-7","score":0.0},{"item":"product-9","score":0.0}]}

However, running examples/integration-test succeeds.

solr-recommender cannot recognize csv input

  • Case:
    hadoop jar target/solr-recommender-0.1-SNAPSHOT-job.jar
    finderbots.recommenders.hadoop.RecommenderUpdateJob
    --input /test/input
    --inputFilePattern ".csv"
    --output /test/output/out
    --tempDir /test/output/temp
    --xRecommend

  • Output:
    Getting pref data from: /test/input
    Writing recommendations to: /test/output/out
    13/11/15 10:59:53 INFO root:
    ======

       Input path = /solr/input                                                                                            
       isDir = true            
    
    ======        
    
    Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 2     
            at finderbots.recommenders.hadoop.ActionSplitterJob.split(ActionSplitterJob.java:111)
            at finderbots.recommenders.hadoop.ActionSplitterJob.run(ActionSplitterJob.java:235)
            at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)     
            at finderbots.recommenders.hadoop.RecommenderUpdateJob.run(RecommenderUpdateJob.java:98)
            at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) 
            at finderbots.recommenders.hadoop.RecommenderUpdateJob.main(RecommenderUpdateJob.java:264)
            at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
            at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
            at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
            at java.lang.reflect.Method.invoke(Method.java:597)
            at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
    
  • Note:
    the input file is 'action-logs.csv'

I think it was caused by the DEFAULT_INPUT_DELIMITER.

ArrayIndexOutOfBoundsException: 0

When attempting to train the engine, I get the following error. I have tried many different PIO installations, and all result in this error. I am sure there is some simple mistake I am making, but any guidance would be appreciated. I am using PIO 0.11.0 and UR 0.6.0.

[INFO] [Engine$] EngineWorkflow.train
[INFO] [Engine$] DataSource: com.actionml.DataSource@d36c1c3
[INFO] [Engine$] Preparator: com.actionml.Preparator@437281c5
[INFO] [Engine$] AlgorithmList: List(com.actionml.URAlgorithm@27da994b)
[INFO] [Engine$] Data sanity check is on.
[WARN] [TableInputFormatBase] Cannot resolve the host name for ACPIOTest/127.0.1.1 because of javax.naming.NameNotFoundException: DNS name not found [response code 3]; remaining name '1.1.0.127.in-addr.arpa'
[INFO] [DataSource] Received events List()
[WARN] [TableInputFormatBase] Cannot resolve the host name for ACPIOTest/127.0.1.1 because of javax.naming.NameNotFoundException: DNS name not found [response code 3]; remaining name '1.1.0.127.in-addr.arpa'
[INFO] [Engine$] com.actionml.TrainingData does not support data sanity check. Skipping check.
[INFO] [Engine$] com.actionml.PreparedData does not support data sanity check. Skipping check.
[INFO] [URAlgorithm] Actions read now creating correlators
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 0
        at org.apache.mahout.math.cf.SimilarityAnalysis$.cooccurrencesIDSs(SimilarityAnalysis.scala:145)
        at com.actionml.URAlgorithm.calcAll(URAlgorithm.scala:311)
        at com.actionml.URAlgorithm.train(URAlgorithm.scala:285)
        at com.actionml.URAlgorithm.train(URAlgorithm.scala:175)
        at org.apache.predictionio.controller.P2LAlgorithm.trainBase(P2LAlgorithm.scala:49)
        at org.apache.predictionio.controller.Engine$$anonfun$18.apply(Engine.scala:692)
        at org.apache.predictionio.controller.Engine$$anonfun$18.apply(Engine.scala:692)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        at scala.collection.immutable.List.foreach(List.scala:318)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
        at scala.collection.AbstractTraversable.map(Traversable.scala:105)
        at org.apache.predictionio.controller.Engine$.train(Engine.scala:692)
        at org.apache.predictionio.controller.Engine.train(Engine.scala:177)
        at org.apache.predictionio.workflow.CoreWorkflow$.runTrain(CoreWorkflow.scala:67)
        at org.apache.predictionio.workflow.CreateWorkflow$.main(CreateWorkflow.scala:250)
        at org.apache.predictionio.workflow.CreateWorkflow.main(CreateWorkflow.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

How to "Delete" items ?

Hi, I'm using the Universal Recommender template with PredictionIO to recommend products from a store. My issue is that the store can delete products in its back office, for any reason, and when it does the item must of course no longer be recommended. I need a way to delete the item, but not permanently. The best way I've found so far is to give the item a specific property (say "eliminated"), re-train, and then filter on that property with a bias = 0; this way I just need to remove the property if I want the item back.
Is there a better way to do this? Re-training takes time; is there a way to avoid it?
I tried some other methods: an $unset event on the item, but that still requires a re-train, and it also requires a full $set event to bring the item back if I need to; and a $delete event, which seems to do nothing at all here, although it did work, without a re-train, when I tried it with a different template (Similar Product). The last thing I tried was the available/expire dates, but first, I don't know the expire date until someone actually decides to expire the item, and second, I need to make the change and then re-train for it to apply, so I'm back to square one.
Any answer will be very much appreciated.
Universal Recommender version is 0.7.3
PredictionIO version is 0.13.0
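
For reference, here is a hedged sketch of the workaround described above. The property name "eliminated" and the item id are the asker's hypothetical example; the event and query shapes follow the PIO-era UR conventions used elsewhere on this page. First, mark the item with a $set event:

{
    "event": "$set",
    "entityType": "item",
    "entityId": "product-123",
    "properties": {
        "eliminated": ["true"]
    },
    "eventTime": "2019-01-01T00:00:00.000Z"
}

Then, after re-training, exclude matching items in the query with a bias of 0:

{
    "user": "user-1",
    "fields": [{
        "name": "eliminated",
        "values": ["true"],
        "bias": 0
    }]
}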

Query cache with ES 5.5

Hi, I upgraded the UR from 0.6 to 0.7 and noticed ES was very slow in query response.
The index size is 2GB with 3,958,758 docs, like the old index in ES 1.7.
When I run
curl -XGET 'localhost:9200/_stats/request_cache?human'
I see that 0 bytes are used for the query cache, so I searched the ES docs and found this:
https://www.elastic.co/guide/en/elasticsearch/reference/current/shard-request-cache.html

Requests where size is greater than 0 will not be cached even if the request cache is enabled in the index settings. To cache these requests you will need to use the query-string parameter detailed here.

So I edited EsClient.scala and changed the search function URL from
s"/$indexName/_search",
to
s"/$indexName/_search?request_cache=true",

ran pio build and redeployed, and noticed a great improvement in ES response time.

Finally, running
curl -XGET 'localhost:9200/_stats/request_cache?human'
I saw that queries were being cached.

Have fun

LLR illegal argument error UR v 0.6.0 using minEventsPerUser

On a large dataset we occasionally get an LLR error. This is probably due to downsampling messing with the frequency numbers for LLR. minEventsPerUser was added in April, so it seems odd that it would be causing a problem now.

This happened around the time numESWriteConnections was added, but that seems an unlikely cause.

[INFO] [URModel] Converting cooccurrence matrices into correlators
[INFO] [URModel] Group all properties RDD
[WARN] [TaskSetManager] Lost task 10.0 in stage 101.0 (TID 27265, ip-172-16-1-195.ec2.internal): java.lang.IllegalArgumentException
	at com.google.common.base.Preconditions.checkArgument(Preconditions.java:76)
	at org.apache.mahout.math.stats.LogLikelihood.logLikelihoodRatio(LogLikelihood.java:101)
	at org.apache.mahout.math.cf.SimilarityAnalysis$.logLikelihoodRatio(SimilarityAnalysis.scala:308)
	at org.apache.mahout.math.cf.SimilarityAnalysis$$anonfun$7$$anonfun$apply$1$$anonfun$apply$mcVI$sp$1.apply(SimilarityAnalysis.scala:346)
	at org.apache.mahout.math.cf.SimilarityAnalysis$$anonfun$7$$anonfun$apply$1$$anonfun$apply$mcVI$sp$1.apply(SimilarityAnalysis.scala:339)
	at scala.collection.Iterator$class.foreach(Iterator.scala:727)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
	at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
	at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
	at org.apache.mahout.math.cf.SimilarityAnalysis$$anonfun$7$$anonfun$apply$1.apply$mcVI$sp(SimilarityAnalysis.scala:339)
	at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
	at org.apache.mahout.math.cf.SimilarityAnalysis$$anonfun$7.apply(SimilarityAnalysis.scala:332)
	at org.apache.mahout.math.cf.SimilarityAnalysis$$anonfun$7.apply(SimilarityAnalysis.scala:325)
	at org.apache.mahout.sparkbindings.blas.MapBlock$$anonfun$exec$2.apply(MapBlock.scala:34)
	at org.apache.mahout.sparkbindings.blas.MapBlock$$anonfun$exec$2.apply(MapBlock.scala:33)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
	at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
	at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:163)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
	at org.apache.spark.scheduler.Task.run(Task.scala:89)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

[WARN] [TaskSetManager] Lost task 10.1 in stage 101.0 (TID 31939, ip-172-16-1-195.ec2.internal): java.lang.IllegalArgumentException
	at com.google.common.base.Preconditions.checkArgument(Preconditions.java:76)
	at org.apache.mahout.math.stats.LogLikelihood.logLikelihoodRatio(LogLikelihood.java:101)
	at org.apache.mahout.math.cf.SimilarityAnalysis$.logLikelihoodRatio(SimilarityAnalysis.scala:308)
	at org.apache.mahout.math.cf.SimilarityAnalysis$$anonfun$7$$anonfun$apply$1$$anonfun$apply$mcVI$sp$1.apply(SimilarityAnalysis.scala:346)
	at org.apache.mahout.math.cf.SimilarityAnalysis$$anonfun$7$$anonfun$apply$1$$anonfun$apply$mcVI$sp$1.apply(SimilarityAnalysis.scala:339)
	at scala.collection.Iterator$class.foreach(Iterator.scala:727)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
	at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
	at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
	at org.apache.mahout.math.cf.SimilarityAnalysis$$anonfun$7$$anonfun$apply$1.apply$mcVI$sp(SimilarityAnalysis.scala:339)
	at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
	at org.apache.mahout.math.cf.SimilarityAnalysis$$anonfun$7.apply(SimilarityAnalysis.scala:332)
	at org.apache.mahout.math.cf.SimilarityAnalysis$$anonfun$7.apply(SimilarityAnalysis.scala:325)
	at org.apache.mahout.sparkbindings.blas.MapBlock$$anonfun$exec$2.apply(MapBlock.scala:34)
	at org.apache.mahout.sparkbindings.blas.MapBlock$$anonfun$exec$2.apply(MapBlock.scala:33)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
	at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
	at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:163)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
	at org.apache.spark.scheduler.Task.run(Task.scala:89)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

[WARN] [TaskSetManager] Lost task 10.2 in stage 101.0 (TID 31940, ip-172-16-3-216.ec2.internal): java.lang.IllegalArgumentException
	at com.google.common.base.Preconditions.checkArgument(Preconditions.java:76)
	at org.apache.mahout.math.stats.LogLikelihood.logLikelihoodRatio(LogLikelihood.java:101)
	at org.apache.mahout.math.cf.SimilarityAnalysis$.logLikelihoodRatio(SimilarityAnalysis.scala:308)
	at org.apache.mahout.math.cf.SimilarityAnalysis$$anonfun$7$$anonfun$apply$1$$anonfun$apply$mcVI$sp$1.apply(SimilarityAnalysis.scala:346)
	at org.apache.mahout.math.cf.SimilarityAnalysis$$anonfun$7$$anonfun$apply$1$$anonfun$apply$mcVI$sp$1.apply(SimilarityAnalysis.scala:339)
	at scala.collection.Iterator$class.foreach(Iterator.scala:727)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
	at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
	at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
	at org.apache.mahout.math.cf.SimilarityAnalysis$$anonfun$7$$anonfun$apply$1.apply$mcVI$sp(SimilarityAnalysis.scala:339)
	at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
	at org.apache.mahout.math.cf.SimilarityAnalysis$$anonfun$7.apply(SimilarityAnalysis.scala:332)
	at org.apache.mahout.math.cf.SimilarityAnalysis$$anonfun$7.apply(SimilarityAnalysis.scala:325)
	at org.apache.mahout.sparkbindings.blas.MapBlock$$anonfun$exec$2.apply(MapBlock.scala:34)
	at org.apache.mahout.sparkbindings.blas.MapBlock$$anonfun$exec$2.apply(MapBlock.scala:33)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
	at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
	at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:163)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
	at org.apache.spark.scheduler.Task.run(Task.scala:89)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

[WARN] [TaskSetManager] Lost task 10.3 in stage 101.0 (TID 31941, ip-172-16-1-195.ec2.internal): java.lang.IllegalArgumentException
	at com.google.common.base.Preconditions.checkArgument(Preconditions.java:76)
	at org.apache.mahout.math.stats.LogLikelihood.logLikelihoodRatio(LogLikelihood.java:101)
	at org.apache.mahout.math.cf.SimilarityAnalysis$.logLikelihoodRatio(SimilarityAnalysis.scala:308)
	at org.apache.mahout.math.cf.SimilarityAnalysis$$anonfun$7$$anonfun$apply$1$$anonfun$apply$mcVI$sp$1.apply(SimilarityAnalysis.scala:346)
	at org.apache.mahout.math.cf.SimilarityAnalysis$$anonfun$7$$anonfun$apply$1$$anonfun$apply$mcVI$sp$1.apply(SimilarityAnalysis.scala:339)
	at scala.collection.Iterator$class.foreach(Iterator.scala:727)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
	at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
	at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
	at org.apache.mahout.math.cf.SimilarityAnalysis$$anonfun$7$$anonfun$apply$1.apply$mcVI$sp(SimilarityAnalysis.scala:339)
	at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
	at org.apache.mahout.math.cf.SimilarityAnalysis$$anonfun$7.apply(SimilarityAnalysis.scala:332)
	at org.apache.mahout.math.cf.SimilarityAnalysis$$anonfun$7.apply(SimilarityAnalysis.scala:325)
	at org.apache.mahout.sparkbindings.blas.MapBlock$$anonfun$exec$2.apply(MapBlock.scala:34)
	at org.apache.mahout.sparkbindings.blas.MapBlock$$anonfun$exec$2.apply(MapBlock.scala:33)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
	at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
	at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:163)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
	at org.apache.spark.scheduler.Task.run(Task.scala:89)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

[ERROR] [TaskSetManager] Task 10 in stage 101.0 failed 4 times; aborting job
aml@ip-172-16-3-22:~/ur$ 

The $delete event does not work

Here is my engine configuration:

{
    "engineId": "2",
    "engineFactory": "com.actionml.engines.ur.UREngine",
    "sparkConf": {
        "master": "local",
        "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
        "spark.kryo.registrator": "org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator",
        "spark.kryo.referenceTracking": "false",
        "spark.kryoserializer.buffer": "300m",
        "spark.executor.memory": "4g",
        "spark.driver.memory": "3g",
        "spark.es.index.auto.create": "true",
        "spark.es.nodes": "elasticsearch",
        "spark.es.nodes.wan.only": "true"
    },
    "algorithm": {
        "indicators": [{
                "name": "view"
            },
            {
                "name": "buy"
            },
            {
                "name": "add-to-cart"
            },
            {
                "name": "rating"
            }
        ]
    }
}

According to the Universal Recommender documentation, we use the $delete event to delete an item:
https://actionml.com/docs/h_ur_input#delete-items
So I tried the $delete event:

{
    "event": "$delete",
    "entityType": "item",
    "entityId": "3453",
    "eventTime": "2021-01-14T00:39:45.618Z"
}

However, when I query, the item still appears in the results. It does not seem to be working.
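
As a stopgap while $delete is investigated, here is a hedged sketch of the property-filter workaround from the "How to "Delete" items?" issue above, adapted to the Harness-style "rules" query used with this engine (the "eliminated" property name is illustrative, not part of this report). First mark the item:

{
    "event": "$set",
    "entityType": "item",
    "entityId": "3453",
    "properties": {
        "eliminated": ["true"]
    },
    "eventTime": "2021-01-14T00:39:45.618Z"
}

Then, after training, exclude it in queries with a bias of 0:

{
    "user": "some-user",
    "rules": [{
        "name": "eliminated",
        "values": ["true"],
        "bias": 0
    }]
}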

CalcRandom

Hi,
is it possible for the calcRandom function to accept an eventNames parameter like the other algorithm functions?
I think it's a bug.

Claudio

0.12.0-incubating support

Does anyone have this engine working with PredictionIO-0.12.0-incubating?

Or should I just stick to 0.11.0?

Similarity Types

No similarity types can be specified; the similarity used with action1 is hard-coded to LLR. For the cross-similarity matrix it is hard-coded to COOCCURRENCE, since we don't use the RowSimilarityJob, which supports only one matrix as input. The XRecommender uses matrix multiplication to get the cross-similarity matrix.

At least the option should be passed to the RecommenderJob; at best the RSJ should be made to work with two matrices and the option passed to both jobs.

Sbt version

Since the project build fails on sbt 1.1.1, can we pin sbt to 0.13.* in build.properties?
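
A minimal sketch of the pin, assuming the standard project/build.properties mechanism (the exact 0.13.x patch version is illustrative):

sbt.version=0.13.17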

When I run the solr-recommender example, I get the following error. Please help!

java.lang.IllegalArgumentException: Number of columns must be greater then 0! But numberOfColumns = 0
at com.google.common.base.Preconditions.checkArgument(Preconditions.java:92)
at org.apache.mahout.math.hadoop.similarity.cooccurrence.RowSimilarityJob$SimilarityReducer.setup(RowSimilarityJob.java:471)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:174)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.Child.main(Child.java:249)

15/04/22 07:08:38 INFO mapred.JobClient: map 100% reduce 0%
15/04/22 07:08:47 INFO mapred.JobClient: map 100% reduce 33%
15/04/22 07:08:50 INFO mapred.JobClient: Task Id : attempt_201504212311_0049_r_000000_1, Status : FAILED
java.lang.IllegalArgumentException: Number of columns must be greater then 0! But numberOfColumns = 0
at com.google.common.base.Preconditions.checkArgument(Preconditions.java:92)
at org.apache.mahout.math.hadoop.similarity.cooccurrence.RowSimilarityJob$SimilarityReducer.setup(RowSimilarityJob.java:471)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:174)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.Child.main(Child.java:249)

15/04/22 07:08:51 INFO mapred.JobClient: map 100% reduce 0%
15/04/22 07:09:00 INFO mapred.JobClient: map 100% reduce 33%
15/04/22 07:09:02 INFO mapred.JobClient: Task Id : attempt_201504212311_0049_r_000000_2, Status : FAILED
java.lang.IllegalArgumentException: Number of columns must be greater then 0! But numberOfColumns = 0
at com.google.common.base.Preconditions.checkArgument(Preconditions.java:92)
at org.apache.mahout.math.hadoop.similarity.cooccurrence.RowSimilarityJob$SimilarityReducer.setup(RowSimilarityJob.java:471)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:174)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.Child.main(Child.java:249)

15/04/22 07:09:03 INFO mapred.JobClient: map 100% reduce 0%
15/04/22 07:09:12 INFO mapred.JobClient: map 100% reduce 33%
15/04/22 07:09:16 INFO mapred.JobClient: map 100% reduce 0%
15/04/22 07:09:18 INFO mapred.JobClient: Job complete: job_201504212311_0049
15/04/22 07:09:18 INFO mapred.JobClient: Counters: 23
15/04/22 07:09:18 INFO mapred.JobClient: Job Counters
15/04/22 07:09:18 INFO mapred.JobClient: Launched reduce tasks=4
15/04/22 07:09:18 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=8837
15/04/22 07:09:18 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
15/04/22 07:09:18 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
15/04/22 07:09:18 INFO mapred.JobClient: Launched map tasks=1
15/04/22 07:09:18 INFO mapred.JobClient: Data-local map tasks=1
15/04/22 07:09:18 INFO mapred.JobClient: Failed reduce tasks=1
15/04/22 07:09:18 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=50583
15/04/22 07:09:18 INFO mapred.JobClient: FileSystemCounters
15/04/22 07:09:18 INFO mapred.JobClient: HDFS_BYTES_READ=236
15/04/22 07:09:18 INFO mapred.JobClient: FILE_BYTES_WRITTEN=59406
15/04/22 07:09:18 INFO mapred.JobClient: File Input Format Counters
15/04/22 07:09:18 INFO mapred.JobClient: Bytes Read=97
15/04/22 07:09:18 INFO mapred.JobClient: Map-Reduce Framework
15/04/22 07:09:18 INFO mapred.JobClient: Map output materialized bytes=14
15/04/22 07:09:18 INFO mapred.JobClient: Combine output records=0
15/04/22 07:09:18 INFO mapred.JobClient: Map input records=0
15/04/22 07:09:18 INFO mapred.JobClient: Physical memory (bytes) snapshot=175112192
15/04/22 07:09:18 INFO mapred.JobClient: Spilled Records=0
15/04/22 07:09:18 INFO mapred.JobClient: Map output bytes=0
15/04/22 07:09:18 INFO mapred.JobClient: CPU time spent (ms)=220
15/04/22 07:09:18 INFO mapred.JobClient: Total committed heap usage (bytes)=131469312
15/04/22 07:09:18 INFO mapred.JobClient: Virtual memory (bytes) snapshot=746778624
15/04/22 07:09:18 INFO mapred.JobClient: Combine input records=0
15/04/22 07:09:18 INFO mapred.JobClient: Map output records=0
15/04/22 07:09:18 INFO mapred.JobClient: SPLIT_RAW_BYTES=125
15/04/22 07:09:19 INFO mapred.JobClient: Cleaning up the staging area hdfs://localhost:9000/tmp/hadoop-kulwant/mapred/staging/kulwant/.staging/job_201504212311_0050
15/04/22 07:09:19 ERROR security.UserGroupInformation: PriviledgedActionException as:kulwant cause:org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://localhost:9000/user/kulwant/temp/similarityMatrix
Exception in thread "main" org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://localhost:9000/user/kulwant/temp/similarityMatrix
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:235)
at org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:55)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:252)
at org.apache.hadoop.mapreduce.lib.input.DelegatingInputFormat.getSplits(DelegatingInputFormat.java:115)
at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:1054)
at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1071)
at org.apache.hadoop.mapred.JobClient.access$700(JobClient.java:179)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:983)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:550)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:580)
at org.apache.mahout.cf.taste.hadoop.item.RecommenderJob.run(RecommenderJob.java:249)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at finderbots.recommenders.hadoop.RecommenderUpdateJob.run(RecommenderUpdateJob.java:129)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at finderbots.recommenders.hadoop.RecommenderUpdateJob.main(RecommenderUpdateJob.java:275)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:160)

Include filter rule returning empty list

Hi!

Using Harness 0.5.1 and the included UR.

I'm trying to use the include filter rule, but I keep getting an empty result. I used a $set event to set a category on two items that exist in my data, like this:

{
   "event" : "$set",
   "entityType" : "item",
   "entityId" : "exampleItem",
   "properties" : {
      "category": ["electronics", "mobile"],
      "expireDate": "2020-10-05T21:02:49.228Z"
   },
   "eventTime" : "2019-12-17T21:02:49.228Z"
}

Then I ran the train command and then this query:

{
   "rules": [
    {
      "name": "category",
      "values": ["electronics"],
      "bias": -1
    }]
}

but I got an empty list as the result.
The other rules (bias = 0 and > 0) seem to be working as specified.
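
For comparison, a hedged sketch of the three rule regimes as they are used across the issues on this page (names and values illustrative): a negative bias acts as an include filter, a bias of 0 excludes matching items, and a bias greater than 1 boosts them:

{
    "rules": [
        { "name": "category", "values": ["electronics"], "bias": -1 },
        { "name": "eliminated", "values": ["true"], "bias": 0 },
        { "name": "category", "values": ["mobile"], "bias": 1.5 }
    ]
}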

Deboosts not working as intended

When using a bias between 0 and 1, the intended behavior is that fields matching the condition will be deboosted, as per http://actionml.com/docs/ur_advanced_tuning. However, it results in a boost.

PR #27 provides a workaround.

debug-ressources.zip

Files included:

  • query made to the UR, both with and without deboost
  • query the UR made to elasticsearch, again with and without deboost
  • response from elastic, with and without the deboost
  • elastic's explanation for the first result with the deboost

As you can see, the field that should have been deboosted is instead boosted.
I'm running on the develop branch, last commit as of today: 3e30e55
I had the same issue on master (0.6.0).
The server was installed following your guide for a single machine: http://actionml.com/docs/single_machine

Cannot get the recommender queries?

I want to get the recommendation data back correctly.

Your thoughts and ideas

I followed the steps in the official documentation https://actionml.com/docs/h_ur.

  1. Installed with Docker, no problem; documentation page: https://actionml.com/docs/harness_container_guide
  2. Added the recommendation engine with harness_cli, no problem; documentation page: https://actionml.com/docs/h_ur_quickstart
  3. After the engine was added, the test function worked normally.

Describe your problem

After the data was successfully entered by POST and training was triggered, the recommendation engine did not give me any results.

The training event was triggered, and about 10,000 events had been entered in advance. When the request was routed to the queries endpoint, no recommended items were returned to me. The recommendation engine was running normally and nothing was reported in the log. I don't know why.
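
For anyone reproducing this, a minimal user-based query of the form the UR expects, POSTed to the engine's queries endpoint (the user id is illustrative):

{
    "user": "user-1"
}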

No Source files found at: scripts/temp/tmp1/similarityMatrix

I get this error when I run the code from Eclipse:

Exception in thread "main" java.io.IOException: No Source files found at: scripts/temp/tmp1/similarityMatrix
at finderbots.recommenders.hadoop.WriteDRMsToSolr.getTaps(WriteDRMsToSolr.java:149)
at finderbots.recommenders.hadoop.WriteDRMsToSolr.joinDRMsWriteToSolr(WriteDRMsToSolr.java:79)
at finderbots.recommenders.hadoop.WriteToSolrJob.run(WriteToSolrJob.java:105)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at finderbots.recommenders.hadoop.RecommenderUpdateJob.run(RecommenderUpdateJob.java:172)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at finderbots.recommenders.hadoop.RecommenderUpdateJob.main(RecommenderUpdateJob.java:275)

Any chance to make the UR have no requirement on a single executor machine's memory?

I'm researching this model and it is really awesome for small companies like us.
I've easily trained a model with 10 million trading orders. However, when I increase the number to 100 million, the model cannot be trained.

Actually, we have a cluster with 1TB of memory, but this model is bounded by the memory of a single machine. My cluster has 20 nodes and each has 64GB of memory, which is obviously not enough for 100 million orders. I'm wondering if there is any chance for this model to place no requirement on a single machine; 1TB in total should be quite enough. The bottleneck is a single machine's memory.

The driver is OK; I can find a temporary machine with 128GB or 256GB for a day. But I can't do this for the executor machines, because they are permanent and I would have to upgrade all of them.

Or is there any way to make executors run on high-memory machines?
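
For reference, per-executor memory is set through the engine's sparkConf, as in the engine config shown earlier on this page; a hedged sketch (values illustrative; this tunes, but does not remove, the single-executor bound described above):

"sparkConf": {
    "spark.executor.memory": "48g",
    "spark.driver.memory": "128g"
}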

Recommend Projects

  • React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django

    The Web framework for perfectionists with deadlines.

  • D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Something interesting about the web. A new door to the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Something interesting about games, to make everyone happy.

Recommend Org

  • Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft

    Open source projects and samples from Microsoft.

  • Google

    Google ❤️ Open Source for everyone.

  • D3

    Data-Driven Documents codes.