Spark_SQL_Project

SparkSql

Offline log analysis with Spark SQL, developed in Scala

Requirements analysis

  1. Analyze website user visits by time period
  2. Compute the distribution of website users by city

Data cleaning

  1. Discard malformed records, extract the useful fields, and store them in HDFS
  2. Resolve IP addresses with the third-party tool ipdatabase
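
The cleaning step can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the tab-separated layout (time, url, traffic, ip), the `AccessRecord` case class, and the `LogCleaner` object are all assumptions. The ipdatabase lookup is represented by a hypothetical `resolveCity` function parameter, because its API is not shown here.

```scala
import scala.util.Try

// Hypothetical cleaned record; the field names are assumptions, not taken from the project.
final case class AccessRecord(time: String, url: String, traffic: Long, ip: String, city: String)

object LogCleaner {
  // Assumed input layout: time<TAB>url<TAB>traffic<TAB>ip
  // resolveCity stands in for the ipdatabase lookup (hypothetical signature).
  def parseLine(line: String, resolveCity: String => String): Option[AccessRecord] =
    line.split("\t", -1) match {
      case Array(time, url, traffic, ip) if time.nonEmpty && url.nonEmpty && ip.nonEmpty =>
        // A non-numeric traffic field marks the line as malformed, so it is dropped.
        Try(traffic.trim.toLong).toOption.map { bytes =>
          AccessRecord(time, url, bytes, ip, resolveCity(ip))
        }
      case _ =>
        None // wrong number of fields, or an empty required field
    }

  // Keeps only the lines that parse cleanly.
  def clean(lines: Iterator[String], resolveCity: String => String): Iterator[AccessRecord] =
    lines.flatMap(line => parseLine(line, resolveCity).iterator)
}
```

In a Spark job the same `parseLine` function could be applied to each line of the raw log before the result is written to HDFS.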

Data analysis

Use Spark SQL to run the analysis for each requirement and store the results in a relational database
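
As a rough sketch of this step, the two requirements could be expressed with the Spark SQL DataFrame API as shown below. The `AccessStats` object, the column names `time` and `city`, and the `visits` output column are assumptions made for illustration. The JDBC URL, table name, and credentials are left to the caller.

```scala
import java.util.Properties
import org.apache.spark.sql.{DataFrame, SaveMode}
import org.apache.spark.sql.functions.{col, desc, hour}

object AccessStats {
  // Requirement 1: visits per hour of day.
  // Assumes a `time` column holding strings such as "2017-05-11 14:09:14".
  def visitsByHour(logs: DataFrame): DataFrame =
    logs
      .groupBy(hour(col("time").cast("timestamp")).as("hour"))
      .count()
      .withColumnRenamed("count", "visits")
      .orderBy(desc("visits"))

  // Requirement 2: visits per city.
  // Assumes a `city` column produced by the cleaning step.
  def visitsByCity(logs: DataFrame): DataFrame =
    logs
      .groupBy(col("city"))
      .count()
      .withColumnRenamed("count", "visits")
      .orderBy(desc("visits"))

  // Appends a result to a relational table over JDBC.
  // The matching JDBC driver must be on the classpath.
  def saveToJdbc(result: DataFrame, url: String, table: String, props: Properties): Unit =
    result.write.mode(SaveMode.Append).jdbc(url, table, props)
}
```

Keeping the aggregations separate from the JDBC write makes them easy to check against a small in-memory DataFrame before pointing the job at a real database.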

SparkWeb

Visualization of the analysis results, developed with Java Web

Results for requirement 1

[Image: timeTopn]

Results for requirement 2

[Image: cityTopn]

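Performance tuning

  1. Control the size of output files: coalesce
  2. Parallelism: spark.sql.shuffle.partitions (parameter tuning)
  3. Partition column type inference (defaults to true; can be set to false when partitions are specified): spark.sql.sources.partitionColumnTypeInference.enabled (parameter tuning)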
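
The three settings above might be applied as in the sketch below. The `TuningSketch` object, the `local[2]` master, the value 50, and the Parquet output format are illustrative assumptions, not the project's actual choices. Suitable numbers depend on data volume and cluster size.

```scala
import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}

object TuningSketch {
  def buildSession(): SparkSession =
    SparkSession.builder()
      .appName("TuningSketch")
      .master("local[2]") // assumption: local run, for illustration only
      // Number of partitions used by shuffles in joins and aggregations (Spark's default is 200).
      .config("spark.sql.shuffle.partitions", "50")
      // With inference disabled, partition column values are read as strings.
      .config("spark.sql.sources.partitionColumnTypeInference.enabled", "false")
      .getOrCreate()

  // coalesce(n) reduces the partition count without a full shuffle,
  // so the write produces at most n part files instead of many small ones.
  def writeCompact(df: DataFrame, outputPath: String, numFiles: Int): Unit =
    df.coalesce(numFiles).write.mode(SaveMode.Overwrite).parquet(outputPath)
}
```

Because coalesce only merges existing partitions, it can lower the partition count but never raise it.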

Contributors

ljcan
