pipelineai / pipeline
Home Page: https://generativeaionaws.com
License: Apache License 2.0
Working through the instruction wiki, using only the code provided, when I try running pipeline-pyspark.sh or pyspark.sh I get the following:
[W 15:23:11.420 NotebookApp] server_extensions is deprecated, use nbserver_extensions
/usr/local/lib/python2.7/dist-packages/widgetsnbextension/__init__.py:30: UserWarning: To use the jupyter-js-widgets nbextension, you'll need to update the Jupyter notebook to version 4.2 or later.
[W 15:23:11.476 NotebookApp] WARNING: The notebook server is listening on all IP addresses and not using encryption. This is not recommended.
[W 15:23:11.476 NotebookApp] WARNING: The notebook server is listening on all IP addresses and not using authentication. This is highly insecure and not recommended.
[I 15:23:11.477 NotebookApp] The port 8754 is already in use, trying another port.
[C 15:23:11.477 NotebookApp] ERROR: the notebook server could not be started because no available port could be found.
Running
lsof -Pnl +M -i4
shows that Jupyter is already listening on 8754:
jupyter-n 2541 0 3u IPv4 30881 0t0 TCP *:8754 (LISTEN)
Trying to then call pyspark still gets the port error.
Am I missing something that I should have done?
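A quick way to confirm whether something is already bound to 8754, and to find a free fallback port, is a small Python 3 sketch (the helper names are mine, not part of the repo):

```python
import socket

def port_in_use(port, host="127.0.0.1"):
    # True if something is already listening on host:port.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        return s.connect_ex((host, port)) == 0

def next_free_port(start, attempts=50):
    # Scan upward from `start` and return the first port we can bind.
    for port in range(start, start + attempts):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            try:
                s.bind(("", port))
                return port
            except OSError:
                continue
    raise RuntimeError("no free port found in range")
```

If `port_in_use(8754)` is True while the notebook also claims 8754 is taken, the likeliest cause is a Jupyter instance already running from an earlier start of the script.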
PagerDuty, here we come!
related to #154
fallback in the following order:
statically-generated version of the most-recent live model (s3 or local disk burned at Docker image creation time?)
if the static version is not available, fall back to the statically-generated version of a previous live model (s3 or local disk burned at Docker image creation time)
fall back to the completely non-personalized model as a last resort (local disk burned at Docker image creation time)
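The fallback order above could be sketched like this. The paths are hypothetical placeholders, since the issue leaves open whether the snapshots live on s3 or on local disk baked into the Docker image:

```python
import os

# Hypothetical locations, in fallback priority order.
FALLBACK_PATHS = [
    "/opt/models/current/model.bin",          # static snapshot of the live model
    "/opt/models/previous/model.bin",         # static snapshot of a previous live model
    "/opt/models/nonpersonalized/model.bin",  # non-personalized last resort
]

def pick_model(candidates=FALLBACK_PATHS):
    # Return the first model path that actually exists, in priority order.
    for path in candidates:
        if os.path.exists(path):
            return path
    raise RuntimeError("no model available at any fallback location")
```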
A "Build Fatal Error: dist/cassandra/2.2.3/apache-cassandra-2.2.3-bin.tar.gz NOT FOUND" error is raised when building the Docker image from source.
https://github.com/tensorflow/serving/blob/master/tensorflow_serving/batching/README.md
While not GPU-specific, this batching does help GPU performance by batching up matrix operations that need to be moved between CPU and GPU memory.
Moving batched calculations is much more efficient than moving individual calculations.
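The idea can be illustrated with a minimal batching helper (a generic sketch, not TensorFlow Serving's actual batcher): grouping many small requests into one larger tensor op amortizes the CPU-to-GPU transfer cost.

```python
def batches(items, batch_size):
    # Group individual requests into fixed-size batches; the final batch
    # may be smaller. One batched matmul on the GPU then replaces N small
    # ones, each of which would pay its own host/device transfer cost.
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]
```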
helps sync dev/local and prod environments by allowing devs/data scientists to tap directly into the production stream during development and model training
Follow this pattern:
place the code in this path:
In the JDBC/ODBC Hive ThriftServer, "quit" needs to be changed to "!quit".
Hi Chris,
Looks like the file name has changed. Is it flux-setup.sh?
Thanks.
In Windows:
%USERPROFILE%\notebooks
doesn't work; see:
docker-archive/toolbox#80
something like:
//c/directory
worked
CMD ["bin/bash", "-c", "'/root/pipeline/bin/setup/RUNME_ONCE.sh'", ...]
store everything as Params inside the model so it can be saved/loaded like other Spark ML models
missing the -P flag to specify the download location:
wget http://advancedspark.com/keys/pipeline-training-gce.pem -P ~/.ssh
per @BrentDorsey:
Did the Kafka topic name change from ratings to item_ratings?
feeder application.conf (https://github.com/fluxcapacitor/pipeline/blob/master/myapps/feeder/src/main/resources/application.conf) defines kafkaTopic = "ratings"
On the "Setup the Environment" page, you say that we should launch the following command.
root@docker$ ~/pipeline/bin/initial/RUNME_ONCE.sh
Instead of the previous command, the correct one is
root@docker$ ~/pipeline/bin/RUNME_ONCE.sh
By the way, thank you for your example.
use Eureka-based Dyno client + Dynomite manager
Here is a sample ALS recommendation/matrix-factorization model generated by Spark 1.6.1.
Here are the 3 subdirs generated by the Spark code detailed below:
drwxr-xr-x 2 root root 4096 May 15 06:47 itemFactors/
drwxr-xr-x 2 root root 4096 May 15 06:47 metadata/
drwxr-xr-x 2 root root 4096 May 15 06:47 userFactors/
here is the relevant Spark 1.6.1 code that generated this model: https://github.com/apache/spark/blob/branch-1.6/mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala#L242
we'll have to dig around the code a bit, but the key is the DefaultParamsWriter code from that 2nd link.
btw, here's the Spark 2.0.0 version which is similar. https://github.com/apache/spark/blob/branch-2.0/mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala
we should make sure 2.0.0 is similar.
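As a rough illustration of what DefaultParamsWriter produces: the metadata/ subdirectory holds a single-line JSON file describing the model class and params. A minimal reader, assuming that layout (file name per the Spark 1.6/2.0 code linked above, error handling omitted), might look like:

```python
import json
import os

def read_spark_ml_metadata(model_dir):
    # DefaultParamsWriter saves metadata as one JSON object on the first
    # line of <model_dir>/metadata/part-00000 (a Hadoop-style text file).
    meta_path = os.path.join(model_dir, "metadata", "part-00000")
    with open(meta_path) as f:
        return json.loads(f.readline())
```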
The problem is that data science teams train and develop in an environment that's very different from production.
Use a similar Docker image for development, but with a file watcher to enable rapid iteration of model creation, deployment, and testing.
This also has the benefit of being able to reproduce and debug issues in prod, since the dev environment is the same (except maybe the size of the dataset).
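The file-watcher idea could be sketched as a simple mtime poller. This is a stand-in, not the repo's mechanism; a real setup would use inotify/watchdog, and the redeploy hook here is hypothetical:

```python
import os
import time

def watch(path, on_change, poll_seconds=1.0, max_polls=None):
    # Poll the file's mtime and fire on_change each time it changes,
    # e.g. to redeploy a freshly trained model artifact.
    last = os.path.getmtime(path)
    polls = 0
    while max_polls is None or polls < max_polls:
        time.sleep(poll_seconds)
        current = os.path.getmtime(path)
        if current != last:
            on_change(path)
            last = current
        polls += 1
```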
dependency on this: RedisLabs/spark-redis#30
Thanks for this great demo-box. How do I start nifi?
Just connecting to the port doesn't work...
Currently using matrix factorization with gradient descent optimization; better algorithms exist.
https://github.com/Netflix/Hystrix/
http://192.168.59.103:38989/hystrix-examples-webapp/
cd ~/Hystrix/hystrix-examples
./gradlew run &
Something like the following:
http://<ip>:5070/classify?url=https://static01.nyt.com/images/2007/04/02/us/02mormon.600.jpg
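Since the image URL goes into a query string, it's worth percent-encoding it so characters like ':' and '/' survive the round trip. A small helper (hypothetical name, not part of the repo):

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode        # Python 2

def build_classify_url(host, image_url, port=5070):
    # Build the classify endpoint URL with the image URL percent-encoded.
    return "http://%s:%d/classify?%s" % (host, port, urlencode({"url": image_url}))
```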
When the command below is run, I get a "docker: Error response from daemon: client is newer than server (client API version: 1.22, server API version: 1.20)." error.
I'd appreciate your assistance.
docker run -i --privileged --name pipeline -h docker -m 8g -p 80:80 -p 36042:6042 -p 39160:9160 -p 39042:9042 -p 39200:9200 -p 37077:7077 -p 38080:38080 -p 38081:38081 -p 36060:6060 -p 36061:6061 -p 36062:6062 -p 36063:6063 -p 36064:6064 -p 36065:6065 -p 32181:2181 -p 38090:8090 -p 30000:10000 -p 30070:50070 -p 30090:50090 -p 39092:9092 -p 36066:6066 -p 39000:9000 -p 39999:19999 -p 36081:6081 -p 35601:5601 -p 37979:7979 -p 38989:8989 -p 34040:4040 -p 34041:4041 -p 34042:4042 -p 34043:4043 -p 34044:4044 -p 34045:4045 -p 34046:4046 -p 34047:4047 -p 34048:4048 -p 34049:4049 -p 34050:4050 -p 34051:4051 -p 34052:4052 -p 34053:4053 -p 34054:4054 -p 34055:4055 -p 34056:4056 -p 34057:4057 -p 34058:4058 -p 34059:4059 -p 34060:4060 -p 36379:6379 -p 38888:8888 -p 34321:54321 -p 38099:8099 -p 38754:8754 -p 37379:7379 -p 36969:6969 -p 36970:6970 -p 36971:6971 -p 36972:6972 -p 36973:6973 -p 36974:6974 -p 36975:6975 -p 36976:6976 -p 36977:6977 -p 36978:6978 -p 36979:6979 -p 36980:6980 -p 35050:5050 -p 35060:5060 -p 37060:7060 fluxcapacitor/pipeline bash
9d502da0bc8e: Error pulling image (latest) from docker.io/fluxcapacitor/pipeline, ApplyLayer exit status 1 stdout: stderr: write /root/zeppelin-0.6.0-spark-1.5.1-hadoop-2.6.0-fluxcapacitor/interpreter/spark/dep/zeppelin-spark-dependencies-0.6.0-incubating-SNAPSHOT.jar: read-only file system
I will try without zeppelin
Even though the docker-machine and docker commands make it clear that a minimum of 8 GB is required for the image, this should be stated up front (it would have told me I needed to run from my 16 GB MBP rather than my day-to-day 8 GB MBA).
Per @BrentDorsey
Tachyon raised three errors when I tried to start it manually:
log4j:ERROR Could not instantiate class [tachyon.Log4jFileAppender].
log4j:WARN No such property [deletionPercentage]
Storage format error
To get the Tachyon Web UI working I propose the following changes:
replace tachyon.Log4jFileAppender
with org.apache.log4j.RollingFileAppender
in https://github.com/fluxcapacitor/pipeline/blob/master/config/tachyon/log4j.properties
run tachyon format at the end of the Tachyon config section in https://github.com/fluxcapacitor/pipeline/blob/master/bin/config-services-before-starting.sh

apt-get install -y linux-tools-common linux-tools-generic linux-tools-`uname -r`
`uname -r` for the instance I am using is
3.10.0-229.14.1.el7.x86_64
Reading state information...
E: Unable to locate package linux-tools-3.10.0-229.14.1.el7.x86_64
E: Couldn't find any package by regex 'linux-tools-3.10.0-229.14.1.el7.x86_64'
The command '/bin/sh -c apt-get update && apt-get install -y software-properties-common && add-apt-repository ppa:webupd8team/java && apt-get update && echo oracle-java8-installer shared/accepted-oracle-license-v1-1 select true | /usr/bin/debconf-set-selections && apt-get install -y oracle-java8-installer && apt-get install -y oracle-java8-set-default && apt-get install -y curl && apt-get install -y wget && apt-get install -y vim && apt-get install -y linux-tools-common linux-tools-generic linux-tools-uname -r
&& apt-get install -y nodejs && apt-get install -y npm && mkdir -p ~/.vim/{ftdetect,indent,syntax} && for d in ftdetect indent syntax ; do curl -o ~/.vim/$d/scala.vim \ https://raw.githubusercontent.com/derekwyatt/vim-scala/master/syntax/scala.vim; done && cd ~ && apt-get install -y git && apt-get install -y openssh-server && apt-get install -y default-jdk && apt-get install -y apache2 && apt-get install -y cmake && git clone --depth=1 https://github.com/jrudolph/perf-map-agent && cd perf-map-agent && cmake . && make && cd ~ && git clone --depth=1 https://github.com/brendangregg/FlameGraph && wget https://dl.bintray.com/sbt/native-packages/sbt/${SBT_VERSION}/sbt-${SBT_VERSION}.tgz && tar xvzf sbt-${SBT_VERSION}.tgz && rm sbt-${SBT_VERSION}.tgz && ln -s /root/sbt/bin/sbt /usr/local/bin && cd ~ && git clone https://github.com/fluxcapacitor/pipeline.git && sbt clean clean-files' returned a non-zero code: 100
[ec2-user@ip-172-31-26-253 pipeline]$ whoami
ec2-user
&& wget http://download.elastic.co/elasticsearch/elasticsearch/elasticsearch-${ELASTICSEARCH_VERSION}.tar.gz \
&& tar xvzf elasticsearch-${ELASTICSEARCH_VERSION}.tar.gz \
&& rm elasticsearch-${ELASTICSEARCH_VERSION}.tar.gz \
Use https://github.com/jupyter/nbconvert; make sure the SparkSession, main method, etc. are exported.