Giter VIP home page Giter VIP logo

fpmc-2's Introduction

FPMC

#Produce hive -> DataFrame -> RDD[Features] -> Hbase

生产加工特征存入hbaew: 参数说明: --date:特征的日期,格式为2015-09-22 --sql: 提取数据的sql --ftype: 特征类型(cross,user,item) --mapping:sql row到feature的映射,其中:

attrCate为交叉特征类型ID,需要是整数, user的值对应的sql中user_id的索引, attrid对应sql中id的索引, fIndex为一个字典,键是vw命名空间名,值是对应sql中的索引。 example:

spark-submit \                                                                                                                                                                                                   
   --master local \                                                                                                                                                                                                 
   --num-executors 1 \                                                                                                                                                                                              
   --driver-memory 1g \                                                                                                                                                                                             
   --executor-memory 1g \                                                                                                                                                                                           
   --executor-cores 1 \                                                                                                                                                                                             
   --name test-zhengchen \                                                                                                                                                                                          
   --queue bdp_jmart_jdmp \                                                                                                                                                                                         
   --class com.jd.bdp.fpmc.tools.SparkSql2HbaseFeature \                                                                                                                                                            
   tools-0.1-SNAPSHOT.jar \                                                                                                                                                                                         
   --date $YESTERDAY \                                                                                                                                                                                              
   --sql "select user_id,cast(sku_id as bigint),cast(sum(three_days) as string),cast(sum(seven_days) as string) from adm.o2o_user_sku_features_tmp where dt='$YESTERDAY' and action='buy' group by user_id,sku_id" \
   --ftype cross \                                                                                                                                                                                                  
   --mapping '{"attrCate":1,"user":0,"attrid":1,"fIndex":{"ub3b":2,"ub7b":3}}' \

#Consume hiveTable -> DataFrame -> RDD[Action] -> RDD[Example] <- Hbase

生成从hbase取特征 参数说明: --sql:提取标签数据的sql --output:样本输出的hdfs路径 --mapping:sql row到标签的映射,其中:

user对应sql行数据中用户id的索引, item对应sql行数据中skuid的索引(要求bigint类型), timestamp:时间戳对应的sql行数据索引,单位为s,类型为int, label:标签对应的sql行数据索引, attrmap:交叉特征类别与值映射,建为交叉特征类型,值为交叉特征对应的sql行数据索引 example:

spark-submit \
   --master local \
   --num-executors 1 \
   --driver-memory 1g \
   --executor-memory 1g \
   --executor-cores 1 \
   --name test-zhengchen \
   --queue bdp_jmart_jdmp \
   --class com.jd.bdp.fpmc.tools.MakeExamples \
   tools-0.1-SNAPSHOT.jar \
   --sql "select user_log_acct, sku_id, request_time_sec, 1 from gdm.gdm_m14_online_o2o where dt='$YESTERDAY' and ct_page in ('detail' ,'GoodsInfo') and sku_id is not null and user_log_acct is not null and user_l
   og_acct != '' and refer_page in ('home','Home')" \
   --mapping '{"user":0,"item":1,"timestamp":2,"label":3,"attrmap":{"1":1}}' \
   --output /tmp/fpmc/example

fpmc-2's People

Contributors

ceys avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.