Giter VIP home page Giter VIP logo

sparkapps's Introduction

Spark应用程序(陆续更新)

一. 基于Spark streaming的 模拟 网站日志实时分析系统

说明

  1. ScalaJava两种实现.

环境依赖

  • hadoop-2.6.2
  • spark-1.5.2-bin-hadoop2.6
  • scala-2.10.5
  • sbt
  1. 基本描述:
  • 日志来源:用Python脚本随机生成Nginx访问日志,并通过Bash脚本上传至HDFS
  • Spark Streaming应用监控HDFS目录
  • 每10s产生的日志文件作为一个批次
  • 输出的流量分析包括指定时间内:
    • 总的PV
    • 各IP的PV
    • 来自各搜索引擎和关键字的PV
    • 终端类型(PC, ios, Android)PV
  1. 使用:
  • 项目根目录下:sbt clean package
  • 启动hadoop hdfs
  • 根据自身Spark和Hadoop配置情况,修改log_analysis.sh
  • 项目根目录下,打开两个终端窗口,分别运行
    • ./create_log.sh
    • ./log_analysis.sh

二. Kafka + Spark Streaming实现男女淘宝购物数量动态展示DashBoard

说明

  1. ScalaJava实现,Java实现中对流中无效数据项进行过滤。
  2. Scala实现采用了json4作为序列化方式,在Java中用json4遇到了问题,自己实现的了序列化方案。

环境依赖

在上面基础上增加:

  • spark-streaming-kafka_2.10
  • python相关(建议用virtualenv 创建python3虚拟环境) pip3 install flask flask-socketio kafka-python
  1. 基本描述:
  • KafkaProducer动态从pykafka/data/user_log.csv(自己创建data目录并下载user_log.csv)来源. 文件动态处理消费记录,发送消费记录中的性别代号(0或1)到Kafka. 主题为'gender'
  • Spark Streaming从Kafka主题'gender'读取处理消息,时间窗口为7秒
  • Spark Streaming将处理后的数据发送给Kafka,主题为'result'
  • Flask Web应用接受Kafka主题为'result'的消息,并利用flask-socketio将数据推送到浏览器
  • 浏览器采用highcharts.js库动态展示结果
  1. 使用
  • sbt clean package assembly(需要添加sbt-assembly插件,见./project/plugins.sbt)
  • cd pykafka,启动virtualenv环境virtualenv python3 venv
  • pip3 install flask flask-socketio kafka-python
  • 安装Kafka
  • kafka安装目录 bin/zookeeper-server-start.sh config/zookeeper.properties & bin/kafka-server-start.sh config/server.properties
  • cd pykafkapython scripts/producer.py
  • dashboard.sh(根据Spark和hadoop配置情况自行调整)
  • cd pykafka, python app.py
  • 访问 localhost:5000
  1. 结果

sparkapps's People

Contributors

yangyli0 avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.