Comments (6)

thewayiam commented on June 3, 2024

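UnicodeJsonItemExporter was written by @x4base.

Because I need to be able to read the Chinese characters, I switched back to Scrapy's default output,
which is then run through https://github.com/g0v/councilor-voter-guide/blob/master/data/reformat_json.py
to produce https://github.com/g0v/councilor-voter-guide/blob/master/data/pretty_format/tccc/councilors.json
In other words:
/data holds the raw JSON for each county and city
/data/pretty_format/ holds the converted, readable JSON for each county and city (handy for debugging and so on)
Maybe Scrapy can meet this need directly?

Thanks for the feedback! I think I need some time to improve the README.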

from councilor-voter-guide.

y12studio commented on June 3, 2024

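I suggest writing the raw JSON as UTF-8 characters, which makes debugging easier; this can be done with scrapy + UnicodeJsonItemExporter. For the readable version, I have also added a small tool, prettyjson.py, for formatting and inspecting the output. It differs from reformat_json.py in that it makes it easy to test and inspect individual JSON files.

Discussion welcome: Comparing g0v:master...y12studio:tccc-utf8 · y12studio/councilor-voter-guide. If this is acceptable, I will open a PR.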
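To illustrate the idea of writing items as readable UTF-8 rather than ASCII escapes, here is a minimal standard-library sketch. It is not the actual UnicodeJsonItemExporter from the repository: the class `Utf8JsonWriter`, the helper `export_items`, and the sample item are all hypothetical, and the only assumption is that the effect comes from encoding with `ensure_ascii=False`.

```python
import io
import json


class Utf8JsonWriter:
    """Hypothetical stand-in for an item exporter: writes a JSON array as UTF-8."""

    def __init__(self, file):
        self.file = file  # binary file-like object
        # ensure_ascii=False keeps non-ASCII characters instead of \uXXXX escapes
        self.encoder = json.JSONEncoder(ensure_ascii=False)
        self.first_item = True

    def start_exporting(self):
        self.file.write(b"[")

    def export_item(self, item):
        if not self.first_item:
            self.file.write(b",\n")
        self.first_item = False
        self.file.write(self.encoder.encode(item).encode("utf-8"))

    def finish_exporting(self):
        self.file.write(b"]")


def export_items(items):
    """Serialize all items through the writer and return the raw bytes."""
    buf = io.BytesIO()
    writer = Utf8JsonWriter(buf)
    writer.start_exporting()
    for item in items:
        writer.export_item(item)
    writer.finish_exporting()
    return buf.getvalue()


# Made-up sample item, for illustration only.
raw = export_items([{"name": "王小明", "county": "臺中市"}])
print(raw.decode("utf-8"))
```

The bytes still parse to the same data with `json.loads`; only the on-disk representation of the non-ASCII characters changes.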

Export the JSON file:

$ cd crawler/tccc
$ scrapy crawl councilors -o /tmp/test.json

Pretty-print the JSON to stdout:

$ cat /tmp/test.json | python ../../utils/prettyjson.py

Save the pretty-printed JSON to a file:

$ python ../../utils/prettyjson.py /tmp/test.json /tmp/pretty.json
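For readers without the repository at hand, the following is a rough sketch of what a pretty-printing helper of this kind could look like. It is an assumption about the behavior, not the contents of the real prettyjson.py: the function names `prettify` and `prettify_file` and the two-space indent are hypothetical choices.

```python
import json


def prettify(text, indent=2):
    """Parse a JSON string and re-serialize it indented, keeping non-ASCII text readable."""
    data = json.loads(text)
    return json.dumps(data, ensure_ascii=False, indent=indent)


def prettify_file(src_path, dst_path, indent=2):
    """Read JSON from src_path and write the indented version to dst_path as UTF-8."""
    with open(src_path, encoding="utf-8") as src:
        pretty = prettify(src.read(), indent=indent)
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(pretty + "\n")


# Escaped input, as produced by a default ASCII-only JSON dump (made-up sample data).
escaped = json.dumps([{"name": "王小明"}])
print(escaped)
print(prettify(escaped))
```

Because the input is parsed and re-serialized, both `\uXXXX`-escaped and raw UTF-8 input yield the same readable output.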

thewayiam commented on June 3, 2024

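@x4base ping! You will probably want to take a look.
Sorry for changing the tccc JSON exporter earlier without telling you; I needed indentation for debugging, so I went ahead and changed it.

@y12studio
This looks very convenient; Scrapy really should have it built in.
PR please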

thewayiam commented on June 3, 2024

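Just tried it, and it works great. The output file is also roughly half the size (4.5 MB down to 2.6 MB)! Why is that?? I have confirmed that the data is the same.
I will switch all the counties and cities over to this later.
The README has also been updated; additions are welcome!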

y12studio commented on June 3, 2024

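That is probably because escapes like \u1234 were previously stored literally, rather than the characters' UTF-8 bytes; that accounts for a difference of roughly 2x.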
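The arithmetic behind that estimate can be checked with a short sketch: a `\uXXXX` escape takes 6 ASCII bytes, while a CJK character in the Basic Multilingual Plane takes 3 bytes in UTF-8. The helper name `encoded_sizes` and the sample string are hypothetical, chosen only for illustration.

```python
import json


def encoded_sizes(text):
    """Return (escaped_bytes, utf8_bytes) for a string's JSON encoding, excluding the quotes."""
    escaped = json.dumps(text, ensure_ascii=True).encode("utf-8")
    utf8 = json.dumps(text, ensure_ascii=False).encode("utf-8")
    # Subtract 2 to drop the surrounding double quotes from each count.
    return len(escaped) - 2, len(utf8) - 2


# Five CJK characters, all in the Basic Multilingual Plane.
print(encoded_sizes("臺中市議會"))  # → (30, 15)
```

ASCII keys, punctuation, and whitespace are the same size in both encodings, which would explain why a whole file shrinks by somewhat less than half.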

x4base commented on June 3, 2024

Cool~~ sorry, the earlier ping never reached me XDD
