
neteaseusersimilarity's Introduction


Analysis results:

[image: result]

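Usage:

  • Download this project
  • Run python crawler.py to produce the data file info.json. If possible, use the data for 100 users that I already scraped (./info.json); scraping it yourself takes at least a few minutes. The JSON structure of info.json is:
    {
      'user1Id':{
        'nikeName':nikeName,
        'fans':['fans1','fans2'],//first 20 entries of the fan list
        'level':level,
        'songsAllRank':{'song1Id':'song1Name','song2Id':'song2Name'}//top 100 songs of the all-time listening ranking
      }
    }
  • Run python cluster to get the plot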
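
To make the info.json structure above concrete, here is a minimal sketch that builds two users in that shape and compares their listening rankings. The sample ids, names, and songs are made up, and the Jaccard overlap of song ids is only an assumed similarity measure for illustration; it is not necessarily what the cluster step computes.

```python
import json

# Hypothetical sample in the info.json shape described above (not real data).
sample = {
    "1001": {
        "nikeName": "alice",
        "fans": ["2001", "2002"],
        "level": 7,
        "songsAllRank": {"s1": "Song A", "s2": "Song B", "s3": "Song C"},
    },
    "1002": {
        "nikeName": "bob",
        "fans": ["2003"],
        "level": 5,
        "songsAllRank": {"s2": "Song B", "s3": "Song C", "s4": "Song D"},
    },
}


def jaccard(user_a, user_b):
    """Jaccard overlap of two users' song-id sets (an assumed metric)."""
    songs_a = set(user_a["songsAllRank"])
    songs_b = set(user_b["songsAllRank"])
    union = songs_a | songs_b
    # Two empty rankings share nothing, so report 0.0 instead of dividing by zero.
    return len(songs_a & songs_b) / len(union) if union else 0.0


# Round-trip through JSON text to mimic reading info.json from disk.
users = json.loads(json.dumps(sample))
score = jaccard(users["1001"], users["1002"])
print(score)
```

With a real file, the round-trip line would be replaced by reading ./info.json with json.load.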

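Requirements:

Things you may need to change for a customized run:

  • crawler.py: replace Ids with the ids you need, or leave them unchanged
  • crawler.py: in crawler(100), replace 100 with the number of users you want to scrape; the default is 100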

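Obstacles encountered:

  1. Scraping data inside an iframe

    # get the elements inside g_iframe
    # (Selenium 4 removed switch_to_frame; there it is driver.switch_to.frame('g_iframe'))
    driver.switch_to_frame('g_iframe')
  2. Clicking a span with selenium raises an error:

    • </iframe> is not clickable at point
    • Solution: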
        # change
        songsAll = driver.find_element_by_css_selector('#songsall')
        action_chains = ActionChains(driver)
        action_chains.click(songsAll)
        action_chains.perform()
      
        # to
        songsAll = driver.find_element_by_css_selector('#songsall')
        driver.execute_script('arguments[0].click();',songsAll)
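  3. When an element cannot be found:

    # selenium implicit wait of 2 seconds
    driver.implicitly_wait(2)
  4. pandas.read_json() performs an automatic conversion to timestamps (the pandas approach has since been dropped; json is used directly)

    # disable the conversion
    pd.read_json(json.dumps(UserDict),convert_axes=False)
  5. Dictionary filtering:

    def dataCleaning(data):
        # Dictionary filter: keep only the users whose all-time listening
        # ranking (songsAllRank, the top 100 songs) is non-empty
        return {k: v for k, v in data.items() if len(v['songsAllRank']) != 0}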
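  6. Chinese text in matplotlib figures. Before the fix: [image: figureCantChinese]. Solution:

    • Step 1: download the font msyh.ttf (Microsoft YaHei) and put it in the system font folder: /usr/share/fonts

      Also copy it into matplotlib's font folder /fonts/ttf [image: matplotlib directory]

    • Step 2: edit the matplotlib configuration file: in the directory shown in the image above, remove the # in front of the two lines font.family and font.sans-serif, change font.family to Microsoft YaHei, and add the Chinese font Microsoft YaHei after font.sans-serif, as in this image: [image: changeMatplotlibrc]

    • Step 3: delete the file fontList.py3k.cache under ~/.cache/matplotlib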
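
Obstacles 4 and 5 above can be sketched together: read the scraped data with the standard json module, which leaves the user ids as plain strings, and then apply the dictionary filter. The JSON text and ids below are hypothetical; the filter is the same dataCleaning function shown in item 5.

```python
import json


def dataCleaning(data):
    # Keep only the users whose songsAllRank is non-empty.
    return {k: v for k, v in data.items() if len(v['songsAllRank']) != 0}


# Hypothetical info.json content; the ids are chosen to look like dates on purpose.
raw_text = """
{
  "20180101": {"nikeName": "a", "fans": [], "level": 3,
               "songsAllRank": {"s1": "Song A"}},
  "20180102": {"nikeName": "b", "fans": [], "level": 1,
               "songsAllRank": {}}
}
"""

users = json.loads(raw_text)      # keys stay str, nothing is parsed as a date
cleaned = dataCleaning(users)     # drops the user with an empty ranking

print(sorted(users))
print(list(cleaned))
```

Because json.loads never reinterprets keys, no option equivalent to convert_axes=False is needed on this path.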
