pipelineai / pipeline
Home Page: https://generativeaionaws.com
License: Apache License 2.0
Working through the instruction wiki, using only the code provided, when I try running pipeline-pyspark.sh or pyspark.sh I get the following:
[W 15:23:11.420 NotebookApp] server_extensions is deprecated, use nbserver_extensions
/usr/local/lib/python2.7/dist-packages/widgetsnbextension/__init__.py:30: UserWarning: To use the jupyter-js-widgets nbextension, you'll need to update the Jupyter notebook to version 4.2 or later.
[W 15:23:11.476 NotebookApp] WARNING: The notebook server is listening on all IP addresses and not using encryption. This is not recommended.
[W 15:23:11.476 NotebookApp] WARNING: The notebook server is listening on all IP addresses and not using authentication. This is highly insecure and not recommended.
[I 15:23:11.477 NotebookApp] The port 8754 is already in use, trying another port.
[C 15:23:11.477 NotebookApp] ERROR: the notebook server could not be started because no available port could be found.
Running
lsof -Pnl +M -i4
shows that Jupyter is already listening on 8754:
jupyter-n 2541 0 3u IPv4 30881 0t0 TCP *:8754 (LISTEN)
Trying to then call pyspark still gets the port error.
Am I missing something that I should have done?
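A quick way to confirm whether something is already bound to 8754, and to find a free fallback port, is a small Python 3 sketch (the helper names are mine, not part of the repo):

```python
import socket

def port_in_use(port, host="127.0.0.1"):
    # True if something is already listening on host:port.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        return s.connect_ex((host, port)) == 0

def next_free_port(start, attempts=50):
    # Scan upward from `start` and return the first port we can bind.
    for port in range(start, start + attempts):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            try:
                s.bind(("", port))
                return port
            except OSError:
                continue
    raise RuntimeError("no free port found in range")
```

If `port_in_use(8754)` is True while the notebook also claims 8754 is taken, the likeliest cause is a Jupyter instance already running from an earlier start of the script.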
PagerDuty, here we come!
related to #154
fallback in the following order:
statically-generated version of the most-recent live model (s3 or local disk burned at Docker image creation time?)
if the static version is not available, fall back to the statically-generated version of a previous live model (s3 or local disk burned at Docker image creation time)
fall back to the completely non-personalized model as a last resort (local disk burned at Docker image creation time)
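The fallback order above could be sketched like this. The paths are hypothetical placeholders, since the issue leaves open whether the snapshots live on s3 or on local disk baked into the Docker image:

```python
import os

# Hypothetical locations, in fallback priority order.
FALLBACK_PATHS = [
    "/opt/models/current/model.bin",          # static snapshot of the live model
    "/opt/models/previous/model.bin",         # static snapshot of a previous live model
    "/opt/models/nonpersonalized/model.bin",  # non-personalized last resort
]

def pick_model(candidates=FALLBACK_PATHS):
    # Return the first model path that actually exists, in priority order.
    for path in candidates:
        if os.path.exists(path):
            return path
    raise RuntimeError("no model available at any fallback location")
```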
A "Build Fatal Error: dist/cassandra/2.2.3/apache-cassandra-2.2.3-bin.tar.gz NOT FOUND" error is raised when building the Docker image from source.
https://github.com/tensorflow/serving/blob/master/tensorflow_serving/batching/README.md
While not GPU-specific, this batching does help GPU performance by batching up matrix operations that need to be moved between CPU and GPU memory.
Moving batched calculations is much more efficient than moving individual calculations.
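The idea can be illustrated with a minimal batching helper (a generic sketch, not TensorFlow Serving's actual batcher): grouping many small requests into one larger tensor op amortizes the CPU-to-GPU transfer cost.

```python
def batches(items, batch_size):
    # Group individual requests into fixed-size batches; the final batch
    # may be smaller. One batched matmul on the GPU then replaces N small
    # ones, each of which would pay its own host/device transfer cost.
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]
```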
helps sync dev/local and prod environments by allowing devs/data scientists to tap directly into the production stream during development and model training
Follow this pattern:
place the code in this path:
In the JDBC/ODBC Hive ThriftServer, "quit" needs to be changed to "!quit".
Hi Chris,
Looks like the file name has changed. Is it flux-setup.sh?
Thanks.
In Windows:
%USERPROFILE%\notebooks
doesn't work; see:
docker-archive/toolbox#80
something like:
//c/directory
worked
CMD ["bin/bash", "-c", "'/root/pipeline/bin/setup/RUNME_ONCE.sh'", ...]
store everything as Params inside the model so it can be saved/loaded like other Spark ML models
missing the -P flag to specify the download location:
wget http://advancedspark.com/keys/pipeline-training-gce.pem -P ~/.ssh
per @BrentDorsey:
Did the Kafka topic name change from ratings to item_ratings?
feeder application.conf (https://github.com/fluxcapacitor/pipeline/blob/master/myapps/feeder/src/main/resources/application.conf) defines kafkaTopic = "ratings"
On the "Setup the Environment" page, you say that we should launch the following command.
root@docker$ ~/pipeline/bin/initial/RUNME_ONCE.sh
Instead of the previous command, the correct one is
root@docker$ ~/pipeline/bin/RUNME_ONCE.sh
By the way, thank you for your example.
use Eureka-based Dyno client + Dynomite manager
Here is a sample ALS recommendation/matrix-factorization model generated by Spark 1.6.1.
Here are the 3 subdirs generated by the Spark code detailed below:
drwxr-xr-x 2 root root 4096 May 15 06:47 itemFactors/
drwxr-xr-x 2 root root 4096 May 15 06:47 metadata/
drwxr-xr-x 2 root root 4096 May 15 06:47 userFactors/
here is the relevant Spark 1.6.1 code that generated this model: https://github.com/apache/spark/blob/branch-1.6/mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala#L242
we'll have to dig around the code a bit, but the key is the DefaultParamsWriter code from that 2nd link.
btw, here's the Spark 2.0.0 version which is similar. https://github.com/apache/spark/blob/branch-2.0/mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala
we should make sure 2.0.0 is similar.
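As a rough illustration of what DefaultParamsWriter produces: the metadata/ subdirectory holds a single-line JSON file describing the model class and params. A minimal reader, assuming that layout (file name per the Spark 1.6/2.0 code linked above, error handling omitted), might look like:

```python
import json
import os

def read_spark_ml_metadata(model_dir):
    # DefaultParamsWriter saves metadata as one JSON object on the first
    # line of <model_dir>/metadata/part-00000 (a Hadoop-style text file).
    meta_path = os.path.join(model_dir, "metadata", "part-00000")
    with open(meta_path) as f:
        return json.loads(f.readline())
```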
The problem is that data science teams train and develop in an environment that's very different from production.
Use a similar Docker image for development, but with a file watcher to enable rapid iteration of model creation, deployment, and testing.
This also has the benefit of being able to reproduce and debug issues in prod, since the dev environment is the same (except maybe the size of the dataset).
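The file-watcher idea could be sketched as a simple mtime poller. This is a stand-in, not the repo's mechanism; a real setup would use inotify/watchdog, and the redeploy hook here is hypothetical:

```python
import os
import time

def watch(path, on_change, poll_seconds=1.0, max_polls=None):
    # Poll the file's mtime and fire on_change each time it changes,
    # e.g. to redeploy a freshly trained model artifact.
    last = os.path.getmtime(path)
    polls = 0
    while max_polls is None or polls < max_polls:
        time.sleep(poll_seconds)
        current = os.path.getmtime(path)
        if current != last:
            on_change(path)
            last = current
        polls += 1
```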
dependency on this: RedisLabs/spark-redis#30
Thanks for this great demo-box. How do I start nifi?
Just connecting to the port doesn't work...
Currently using matrix factorization with gradient descent optimization; better algorithms exist.
https://github.com/Netflix/Hystrix/
http://192.168.59.103:38989/hystrix-examples-webapp/
cd ~/Hystrix/hystrix-examples
./gradlew run &
Something like the following:
http://<ip>:5070/classify?url=https://static01.nyt.com/images/2007/04/02/us/02mormon.600.jpg
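Since the image URL goes into a query string, it's worth percent-encoding it so characters like ':' and '/' survive the round trip. A small helper (hypothetical name, not part of the repo):

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode        # Python 2

def build_classify_url(host, image_url, port=5070):
    # Build the classify endpoint URL with the image URL percent-encoded.
    return "http://%s:%d/classify?%s" % (host, port, urlencode({"url": image_url}))
```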
When the command below is run, I get a "docker: Error response from daemon: client is newer than server (client API version: 1.22, server API version: 1.20)." error.
I'd appreciate your assistance.
docker run -i --privileged --name pipeline -h docker -m 8g -p 80:80 -p 36042:6042 -p 39160:9160 -p 39042:9042 -p 39200:9200 -p 37077:7077 -p 38080:38080 -p 38081:38081 -p 36060:6060 -p 36061:6061 -p 36062:6062 -p 36063:6063 -p 36064:6064 -p 36065:6065 -p 32181:2181 -p 38090:8090 -p 30000:10000 -p 30070:50070 -p 30090:50090 -p 39092:9092 -p 36066:6066 -p 39000:9000 -p 39999:19999 -p 36081:6081 -p 35601:5601 -p 37979:7979 -p 38989:8989 -p 34040:4040 -p 34041:4041 -p 34042:4042 -p 34043:4043 -p 34044:4044 -p 34045:4045 -p 34046:4046 -p 34047:4047 -p 34048:4048 -p 34049:4049 -p 34050:4050 -p 34051:4051 -p 34052:4052 -p 34053:4053 -p 34054:4054 -p 34055:4055 -p 34056:4056 -p 34057:4057 -p 34058:4058 -p 34059:4059 -p 34060:4060 -p 36379:6379 -p 38888:8888 -p 34321:54321 -p 38099:8099 -p 38754:8754 -p 37379:7379 -p 36969:6969 -p 36970:6970 -p 36971:6971 -p 36972:6972 -p 36973:6973 -p 36974:6974 -p 36975:6975 -p 36976:6976 -p 36977:6977 -p 36978:6978 -p 36979:6979 -p 36980:6980 -p 35050:5050 -p 35060:5060 -p 37060:7060 fluxcapacitor/pipeline bash
9d502da0bc8e: Error pulling image (latest) from docker.io/fluxcapacitor/pipeline, ApplyLayer exit status 1 stdout: stderr: write /root/zeppelin-0.6.0-spark-1.5.1-hadoop-2.6.0-fluxcapacitor/interpreter/spark/dep/zeppelin-spark-dependencies-0.6.0-incubating-SNAPSHOT.jar: read-only file system
I will try without zeppelin
Even though the docker-machine and docker commands make it clear that a minimum of 8 GB is required for the image, this should be stated up front (it would have told me I needed to run from my 16 GB MBP rather than my day-to-day 8 GB MBA).
Per @BrentDorsey
Tachyon raised three errors when I tried to start it manually:
log4j:ERROR Could not instantiate class [tachyon.Log4jFileAppender].
log4j:WARN No such property [deletionPercentage]
Storage format error
To get the Tachyon Web UI working I propose the following changes:
replace tachyon.Log4jFileAppender
with org.apache.log4j.RollingFileAppender
in https://github.com/fluxcapacitor/pipeline/blob/master/config/tachyon/log4j.properties
run tachyon format at the end of the Tachyon config section in https://github.com/fluxcapacitor/pipeline/blob/master/bin/config-services-before-starting.sh

apt-get install -y linux-tools-common linux-tools-generic linux-tools-`uname -r`
`uname -r` for the instance I am using is
3.10.0-229.14.1.el7.x86_64
Reading state information...
E: Unable to locate package linux-tools-3.10.0-229.14.1.el7.x86_64
E: Couldn't find any package by regex 'linux-tools-3.10.0-229.14.1.el7.x86_64'
The command '/bin/sh -c apt-get update && apt-get install -y software-properties-common && add-apt-repository ppa:webupd8team/java && apt-get update && echo oracle-java8-installer shared/accepted-oracle-license-v1-1 select true | /usr/bin/debconf-set-selections && apt-get install -y oracle-java8-installer && apt-get install -y oracle-java8-set-default && apt-get install -y curl && apt-get install -y wget && apt-get install -y vim && apt-get install -y linux-tools-common linux-tools-generic linux-tools-uname -r
&& apt-get install -y nodejs && apt-get install -y npm && mkdir -p ~/.vim/{ftdetect,indent,syntax} && for d in ftdetect indent syntax ; do curl -o ~/.vim/$d/scala.vim \ https://raw.githubusercontent.com/derekwyatt/vim-scala/master/syntax/scala.vim; done && cd ~ && apt-get install -y git && apt-get install -y openssh-server && apt-get install -y default-jdk && apt-get install -y apache2 && apt-get install -y cmake && git clone --depth=1 https://github.com/jrudolph/perf-map-agent && cd perf-map-agent && cmake . && make && cd ~ && git clone --depth=1 https://github.com/brendangregg/FlameGraph && wget https://dl.bintray.com/sbt/native-packages/sbt/${SBT_VERSION}/sbt-${SBT_VERSION}.tgz && tar xvzf sbt-${SBT_VERSION}.tgz && rm sbt-${SBT_VERSION}.tgz && ln -s /root/sbt/bin/sbt /usr/local/bin && cd ~ && git clone https://github.com/fluxcapacitor/pipeline.git && sbt clean clean-files' returned a non-zero code: 100
[ec2-user@ip-172-31-26-253 pipeline]$ whoami
ec2-user
&& wget http://download.elastic.co/elasticsearch/elasticsearch/elasticsearch-${ELASTICSEARCH_VERSION}.tar.gz \
&& tar xvzf elasticsearch-${ELASTICSEARCH_VERSION}.tar.gz \
&& rm elasticsearch-${ELASTICSEARCH_VERSION}.tar.gz \
Use https://github.com/jupyter/nbconvert; make sure the SparkSession, main method, etc. are exported.