
tsec's Introduction

Taiwan Stock Exchange Crawler

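This is a crawler for the Taiwan Stock Exchange (TWSE) and the Taipei Exchange (證券櫃檯買賣中心). In the spirit of Open Data, both the crawler and the data it collects are public, which is the most reassuring arrangement.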

Note

If you have questions, ask on Gitter.im and I will answer as soon as possible.

Setup

$ git clone https://github.com/Asoul/tsec.git

$ cd tsec

$ pip install -r requirements.txt

Usage

Command

Crawl the current day

$ python crawl.py

Crawl a specific date

$ python crawl.py YYYY MM DD

e.g.

$ python crawl.py 2016 02 15

Flag

-b, --back: crawl backwards until 2004/2/11

-c, --check: crawl back 10 days
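As a rough illustration of how the positional date and the two flags could map to a list of dates, here is a minimal Python 3 sketch. It is not taken from crawl.py: the helper names `build_parser` and `dates_to_crawl` are hypothetical, and it assumes that the start date itself is included in the range.

```python
import argparse
import datetime

# Earliest date reachable with --back, as stated in the flag description.
EARLIEST = datetime.date(2004, 2, 11)


def build_parser():
    """Hypothetical parser mirroring the documented command line."""
    parser = argparse.ArgumentParser(description="Crawl daily trading data")
    parser.add_argument("day", type=int, nargs="*",
                        help="optional date as YYYY MM DD (default: today)")
    parser.add_argument("-b", "--back", action="store_true",
                        help="crawl backwards until 2004/2/11")
    parser.add_argument("-c", "--check", action="store_true",
                        help="crawl back 10 days")
    return parser


def dates_to_crawl(args, today):
    """Return the dates to crawl, newest first (start date included: an assumption)."""
    if len(args.day) == 3:
        start = datetime.date(*args.day)
    elif not args.day:
        start = today
    else:
        raise ValueError("expected either no date or YYYY MM DD")

    if args.back:
        stop = EARLIEST
    elif args.check:
        stop = start - datetime.timedelta(days=10)
    else:
        stop = start

    span = (start - stop).days
    return [start - datetime.timedelta(days=n) for n in range(span + 1)]


args = build_parser().parse_args(["-c", "2016", "02", "15"])
dates = dates_to_crawl(args, today=datetime.date.today())
print(dates[0], dates[-1], len(dates))
```

Passing `today` explicitly keeps the date logic easy to test without depending on the system clock.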

Post-processing

Remove duplicate entries and sort by date

$ python post_process.py
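The idea behind this step can be sketched as follows. This is a minimal Python 3 illustration, not the contents of post_process.py: the function names are hypothetical, the rows other than the documented example use placeholder values, and it assumes the first column is a year/month/day date that identifies a row. Dates are compared as integers because a plain string sort would place 104/02/13 before 99/12/31.

```python
import csv
import io


def date_key(date_str):
    """Turn a date such as '104/02/13' into a sortable (year, month, day) tuple."""
    year, month, day = (int(part) for part in date_str.split("/"))
    return (year, month, day)


def dedupe_and_sort(rows):
    """Keep one row per trading date (the last one seen) and sort by date."""
    by_date = {}
    for row in rows:
        if row:  # skip blank lines
            by_date[row[0]] = row
    return sorted(by_date.values(), key=lambda row: date_key(row[0]))


# The 104/02/13 row is the documented example; the other rows are placeholders.
raw = io.StringIO(
    "104/02/13,7599922.0,528270219.0,69.35,69.65,69.35,69.45,0.45,1771.0\n"
    "99/12/31,1.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0\n"
    "104/02/12,1.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0\n"
    "104/02/13,7599922.0,528270219.0,69.35,69.65,69.35,69.45,0.45,1771.0\n"
)
cleaned = dedupe_and_sort(csv.reader(raw))
for row in cleaned:
    print(",".join(row))
```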

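Data format

  • Each file is named XXX.csv, where XXX is the stock code.
  • Each file contains multiple rows; each row holds one day's trading information.
  • Each row has 9 columns: trade date, trade volume (shares), trade value, opening price, highest price, lowest price, closing price, price change, and number of transactions.
  • Symbols: + means the price rose, - means it fell, X means no price comparison.
  • The daily statistics include regular, odd-lot, after-hours fixed-price, and block trades; they exclude auctions and tender offers.

Example: 104/02/13,7599922.0,528270219.0,69.35,69.65,69.35,69.45,0.45,1771.0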
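To make the column layout concrete, the following Python 3 sketch parses the example row into named fields. The English field names and the helpers `roc_to_date` and `parse_row` are hypothetical, chosen only for illustration. The date uses the ROC calendar (year 104 is 2015, so the Gregorian year is the ROC year plus 1911). The sketch assumes every column after the date is a plain number, which will not hold for rows that contain symbols such as X or placeholders such as --.

```python
import datetime

# Hypothetical English names for the 9 documented columns, in order.
FIELDS = ["date", "volume", "turnover", "open", "high", "low",
          "close", "change", "transactions"]


def roc_to_date(roc_date):
    """Convert an ROC-calendar date such as '104/02/13' to a datetime.date."""
    year, month, day = (int(part) for part in roc_date.split("/"))
    return datetime.date(year + 1911, month, day)


def parse_row(line):
    """Parse one CSV line into a dict keyed by FIELDS."""
    values = line.strip().split(",")
    if len(values) != len(FIELDS):
        raise ValueError("expected %d columns, got %d" % (len(FIELDS), len(values)))
    record = dict(zip(FIELDS, values))
    record["date"] = roc_to_date(record["date"])
    for name in FIELDS[1:]:
        record[name] = float(record[name])  # assumes a plain numeric value
    return record


row = parse_row("104/02/13,7599922.0,528270219.0,69.35,69.65,69.35,69.45,0.45,1771.0")
print(row["date"], row["close"])  # prints: 2015-02-13 69.45
```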

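Data sources

Disclaimer

I aim to provide investors with accurate and reliable information and the best possible service as a reference for investment research. I accept no legal liability for any damage or loss arising from inaccurate or omitted data. Whether to use, download, or obtain any data through this site is your own decision and at your own risk; you bear full responsibility for any damage to your computer system or loss of data that results from downloading any data.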

Contact me

If you find a bug, please let me know:

Last updated: 2017/02/15

My other projects

Real-time stock data crawler

tsec's People

Contributors

asoul, brchiu


tsec's Issues

Remove invalid data

Hello Asoul~
Could post_process.py also remove invalid data?
For example, the row for stock 2527 on 2016/5/31 is 105/05/31,0,0,--,--,--,--,,0
Please take it into consideration.
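One way such a filter might look is sketched below in Python 3. It is an illustration of the request, not code from the repository: `is_valid_row` is a hypothetical helper, and treating a row as invalid whenever any of its four price columns fails to parse as a number is an assumption about what "invalid" should mean.

```python
def is_valid_row(row):
    """Return True only if the open, high, low, and close columns are numeric."""
    prices = row[3:7]
    if len(prices) != 4:
        return False
    try:
        for price in prices:
            float(price)
    except ValueError:  # raised for '--' and for empty strings
        return False
    return True


rows = [
    "104/02/13,7599922.0,528270219.0,69.35,69.65,69.35,69.45,0.45,1771.0".split(","),
    "105/05/31,0,0,--,--,--,--,,0".split(","),
]
kept = [row for row in rows if is_valid_row(row)]
print(len(kept))  # prints: 1
```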

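Stock 3444, 2018/07/13: encoding problem

When the data contains Chinese text (除息, ex-dividend), the error below appears.
Is there a way to solve this on the encoding side?
I am also curious why "除息" is placed in the price difference field.
I also noticed that 3444.csv in your data folder simply has no row for that day. Does your try/catch already handle this problem?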

UnicodeEncodeError: 'charmap' codec can't encode characters in position 52-53: character maps to <undefined>
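If the error is raised while writing the CSV file, a common remedy is to open the file with an explicit UTF-8 encoding instead of relying on the platform default (the 'charmap' codec suggests a Windows code page). The sketch below is an assumption about where the failure occurs, not a confirmed fix for crawl.py; `append_row` is a hypothetical helper and the row values are placeholders apart from the "除息" marker.

```python
import io
import os
import tempfile


def append_row(path, row):
    """Append one CSV row, always encoding the file as UTF-8."""
    # io.open accepts the encoding argument on both Python 2 and Python 3.
    with io.open(path, "a", encoding="utf-8") as f:
        f.write(u",".join(row) + u"\n")


# Placeholder values; only the non-ASCII marker in the price change column matters here.
path = os.path.join(tempfile.mkdtemp(), "3444.csv")
append_row(path, [u"107/07/13", u"0", u"0", u"0", u"0", u"0", u"0", u"除息", u"0"])

with io.open(path, encoding="utf-8") as f:
    content = f.read()
print(content.strip())
```

If the error comes from printing to the console rather than from writing the file, the file encoding is not the cause and the terminal's encoding would need to be addressed instead.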

invalid syntax on line 110 in crawl.py

I ran crawl.py on Mac OS using Python 3 and got the following invalid syntax message:
$ python3 crawl.py
" File "crawl.py", line 110
print 'Crawling {}'.format(date_str)
^
SyntaxError: invalid syntax"

I get the same result with python2.
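The traceback points at a Python 2 style print statement, which Python 3 rejects. A minimal sketch of the function-call form is shown below; with a single argument it runs on both Python 2 and Python 3. The helper `crawl_message` and the sample date string are hypothetical, since the format of `date_str` in crawl.py is not shown here.

```python
def crawl_message(date_str):
    """Build the progress message that the failing line tries to print."""
    return 'Crawling {}'.format(date_str)


# print(...) with one argument is valid on Python 2 and Python 3.
# Python 2 code that prints several values can add
# `from __future__ import print_function` as the first statement of the file.
print(crawl_message('2016/02/15'))
```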

Get EPS from tsec

Hi Asoul,

Could I use tsec to get the EPS of a company (e.g. TSMC)?

Thanks in advance!

snowuyl

TWSE has been redesigned; the crawler needs to be rewritten!

import requests
url = "http://www.twse.com.tw/exchangeReport/MI_INDEX?response=json&date=20170523&type=24&_=1495583462499"

response = requests.get(url)
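After the redesign, the request changed from POST to GET, but a cookie has to be sent back.
The response is JSON. On inspection, the stock price data is stored under the "data1" key: "data1":[["1437","勤益控","104,976"…]]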

import json
data = json.loads(response.text)
print(list(data.keys()))
print(data["data1"])
s = data["data1"]
print(type(s))
# Print each row of the table
for item in s:
    print(item)

# Print every cell, row by row
for i in range(len(s)):
    for j in range(len(s[i])):
        print(s[i][j])
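Building on the observations above, here is a hedged Python 3 sketch of how the GET request and the cookie handling could be combined. It uses `requests.Session`, which stores cookies from responses and sends them on later requests; whether an initial page visit is needed to obtain the cookie is not stated in this thread, so that part is left out. The function names are hypothetical, the `_` timestamp parameter from the URL above is omitted on the assumption that it is only a cache buster, and `to_int` simply strips thousands separators from values such as "104,976".

```python
MI_INDEX_URL = "http://www.twse.com.tw/exchangeReport/MI_INDEX"


def to_int(text):
    """Convert a number with thousands separators, e.g. '104,976', to an int."""
    return int(text.replace(",", ""))


def fetch_data1(date_str, session=None):
    """Fetch the 'data1' table for a date given as YYYYMMDD (e.g. '20170523')."""
    # Imported here so that to_int can be used without the third-party dependency.
    import requests

    session = session or requests.Session()  # keeps cookies between requests
    params = {"response": "json", "date": date_str, "type": "24"}
    response = session.get(MI_INDEX_URL, params=params, timeout=10)
    response.raise_for_status()
    # Return an empty list if the key is missing (for example on a non-trading day).
    return response.json().get("data1", [])


if __name__ == "__main__":
    for row in fetch_data1("20170523"):
        print(row[0], row[1], to_int(row[2]))
```

Reusing one session across many dates avoids renegotiating cookies for every request.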
