facebook-backup-tool's People

Contributors

kong0107

Forkers

favonia

facebook-backup-tool's Issues

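What if a non-public link appears in a comment?

Suppose the user can see a non-public post, and a link to that post is pasted in a comment on a public post. When that comment is crawled, could the crawler "accidentally" capture a preview of the non-public post because the user has permission to view it?

This is harmless for a private backup, but it needs attention if the backup is later made public.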

handle filename extensions for media files

For example, photo node 365599393597218 seems to be a PNG rather than a JPEG, but that information is lost, so index.html cannot build the full filename needed to show the downloaded photo.
Video files will need the same handling.
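
One way to recover the lost type is to look at the first bytes of the downloaded file. The sketch below is only an illustration: `guessExtension` and `mediaFilename` are hypothetical names, not functions from this project, and the signature checks cover only PNG, JPEG, GIF, and MP4.

```javascript
// Hypothetical helper: guess a media file extension from its leading bytes.
// The signatures are the standard PNG, JPEG, and GIF magic numbers.
function guessExtension(bytes) {
  const startsWith = (sig) => sig.every((b, i) => bytes[i] === b);
  if (startsWith([0x89, 0x50, 0x4e, 0x47])) return "png";
  if (startsWith([0xff, 0xd8, 0xff])) return "jpg";
  if (startsWith([0x47, 0x49, 0x46, 0x38])) return "gif";
  // MP4 files carry the ASCII string "ftyp" at byte offset 4.
  if ([0x66, 0x74, 0x79, 0x70].every((b, i) => bytes[i + 4] === b)) return "mp4";
  return null; // unknown type
}

// Hypothetical helper: build the filename the page would need,
// assuming the extension is stored alongside the node id.
function mediaFilename(nodeId, extension) {
  return extension ? `${nodeId}.${extension}` : String(nodeId);
}
```

If the crawler saved the guessed extension with each node, index.html could call something like `mediaFilename` instead of assuming `.jpg`.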

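Paginate by count instead of by month

This makes it easier to display events and groups with dense posting.
It should only require changing filter to limitTo, which does not affect how search works, so it should not be hard.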
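
As a rough sketch of count-based paging in plain JavaScript (the names `pageOf` and `pageCount` are made up for illustration), each page is just a fixed-size slice of the already-filtered list:

```javascript
// Return one page of `items`, with `pageSize` items per page.
// Pages are numbered from 0.
function pageOf(items, pageSize, pageIndex) {
  const begin = pageIndex * pageSize;
  return items.slice(begin, begin + pageSize);
}

// Number of pages needed to show every item.
function pageCount(items, pageSize) {
  return Math.ceil(items.length / pageSize);
}
```

In an AngularJS template the same idea would presumably look like `post in posts | filter:query | limitTo:pageSize:begin`, assuming AngularJS 1.4 or later, where limitTo accepts a begin index. Because the search filter runs before limitTo, search results are paged rather than cut off.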

allow searching public groups

This may not be very important. Note that public groups usually have many more messages and much more spam, which costs memory and time.

Add more fields in DB

Add some fields such as last_crawled_time, status (for deleted nodes), and _id, in order to keep crawled data as raw as possible.
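
One possible record layout, sketched under the assumption that the crawled node is stored untouched under a `raw` key and the bookkeeping fields live next to it. The function names and the `raw` key are hypothetical; only `last_crawled_time`, `status`, and `_id` come from the issue.

```javascript
// Wrap a freshly crawled node without modifying it.
function toRecord(rawNode, crawledAt) {
  return {
    _id: rawNode.id,                            // reuse the node id as the DB key
    status: "active",                           // becomes "deleted" once the node is gone
    last_crawled_time: crawledAt.toISOString(),
    raw: rawNode,                               // crawled data kept exactly as received
  };
}

// Mark a stored record as deleted while keeping the last known raw data.
function markDeleted(record, checkedAt) {
  return { ...record, status: "deleted", last_crawled_time: checkedAt.toISOString() };
}
```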

implement *_tags into their text

For example, use message_tags to add links to the text in message.
The function is already written (combine_tags in js/utility.js); the question is how to use it in AngularJS.
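
For reference, here is a minimal sketch of what such a function might do. It is not the project's combine_tags: `linkifyTags` is a hypothetical name, the tags are assumed to be an array of objects with `id`, `offset`, and `length`, the offsets are assumed to count JavaScript string units, and the link target `https://www.facebook.com/<id>` is also an assumption.

```javascript
// Escape text so it can be placed safely inside HTML.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Wrap each tagged range of `message` in a link; everything else is escaped.
function linkifyTags(message, tags) {
  const sorted = [...tags].sort((a, b) => a.offset - b.offset);
  let html = "";
  let cursor = 0;
  for (const tag of sorted) {
    if (tag.offset < cursor) continue; // skip tags overlapping an earlier one
    html += escapeHtml(message.slice(cursor, tag.offset));
    const label = escapeHtml(message.slice(tag.offset, tag.offset + tag.length));
    html += `<a href="https://www.facebook.com/${encodeURIComponent(tag.id)}">${label}</a>`;
    cursor = tag.offset + tag.length;
  }
  return html + escapeHtml(message.slice(cursor));
}
```

In AngularJS, a function like this could probably be registered as a filter and bound with ng-bind-html, which needs either the ngSanitize module or `$sce.trustAsHtml` to accept the generated markup.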

handle duplicate comments

For example,
1494066394244026_1538635866453745 (node type "post") and 1538635866453745 (node type "video") share an equivalent set of comments (in a different order), so we don't have to crawl both.

But what about the output?
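
A sketch of one way to skip the duplicate crawl, assuming (based only on the example above) that a post id of the form `<owner>_<object>` shares its comments with the node whose id is `<object>`. Both function names are hypothetical.

```javascript
// Take the part after the last underscore, or the whole id if there is none.
function objectIdOf(nodeId) {
  const parts = String(nodeId).split("_");
  return parts[parts.length - 1];
}

// Keep only the first node seen for each underlying object id,
// so comments are crawled once per object.
function nodesNeedingCommentCrawl(nodeIds) {
  const seen = new Set();
  return nodeIds.filter((id) => {
    const key = objectIdOf(id);
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}
```

This only decides what to crawl. How the shared comments should appear in the output for each of the two nodes is still open.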

more options for crawling

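For example, whether to download photos and comments.
Whether to (re)crawl everything or only what is new, and whether "new" means that updated_time has changed or that created_time falls within a given interval.
Also whether to check that a node still exists.
Related to #10, #14, #15, and #22.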
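
These choices could be gathered into a single options object. The sketch below is only one possible shape: every option name, the `mode` values, and `shouldCrawl` are assumptions made for illustration, and timestamps are assumed to be strings that `Date.parse` accepts.

```javascript
// Hypothetical crawl options with defaults.
const defaultOptions = {
  downloadPhotos: true,
  crawlComments: true,
  mode: "all",          // "all" | "updated" | "created"
  createdSince: null,   // Date or null, used when mode is "created"
  createdUntil: null,   // Date or null, used when mode is "created"
  checkExistence: false,
};

// Decide whether a node should be (re)crawled.
// `stored` is the previously saved copy of the node, or null if there is none.
function shouldCrawl(node, stored, options) {
  if (options.mode === "all") return true;
  if (options.mode === "updated") {
    return !stored || stored.updated_time !== node.updated_time;
  }
  if (options.mode === "created") {
    const created = Date.parse(node.created_time);
    if (options.createdSince && created < options.createdSince.getTime()) return false;
    if (options.createdUntil && created > options.createdUntil.getTime()) return false;
    return true;
  }
  throw new Error(`unknown mode: ${options.mode}`);
}
```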

Deprecate floating <dt>s

Use the CSS declaration display: table-cell; to implement single-line term-description pairs instead.
Sadly, only DT and DD are allowed in DL.

Two possible solutions:

Use a separate DL for each DT-DD pair.

Cons: It is hard for machines to tell that these are actually one list, i.e. that the terms share some properties.

<div style="display: table;">
    <dl style="display: table-row;">
        <dt style="display: table-cell;">term</dt>
        <dd style="display: table-cell;">description</dd>
    </dl>
    <dl style="display: table-row;">
        <dt style="display: table-cell;">longer term</dt>
        <dd style="display: table-cell;">description</dd>
    </dl>
</div>

Each DL gets display: table-row to produce a table-like layout. Otherwise the DTs of different DLs would have different widths.
A wrapper is needed to separate the content from any other list.

Use LI to wrap each term-description pair.

Just like the above, but change the wrapper to UL, each DL to LI, and DT and DD to DIV (or another wrapper element).
Pros: They can still be seen as a list.
Cons: It is hard for machines to recognize the term-description relation. (I think this is much worse than the other solution, but Facebook uses it.)
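
To make the second solution concrete, here is a small sketch that generates the UL/LI/DIV markup from term-description pairs. `renderPairs` is a hypothetical helper, and it assumes the terms and descriptions are already HTML-escaped.

```javascript
// Build the UL-based markup: UL is the table, each LI a row, each DIV a cell.
function renderPairs(pairs) {
  const cell = 'style="display: table-cell;"';
  const rows = pairs.map(
    ([term, description]) =>
      `    <li style="display: table-row;">\n` +
      `        <div ${cell}>${term}</div>\n` +
      `        <div ${cell}>${description}</div>\n` +
      `    </li>`
  );
  return `<ul style="display: table;">\n${rows.join("\n")}\n</ul>`;
}
```

Calling `renderPairs([["term", "description"], ["longer term", "description"]])` gives the same two-row layout as the DL example, with the list semantics kept and the term-description semantics lost.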

handle "push by GET" in another page

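While crawling continuously, adding another item to the schedule (a queue seems like the right concept for this "adding") should not affect the sleep state of crawler.php, so "putting the item to crawl into the session" should happen on a separate page.
That way the main page only needs to keep polling crawler.php.
The new page could be called queueAppend.php.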
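
The separation can be sketched independently of PHP. In the JavaScript sketch below, `session` stands in for the PHP session, `appendToQueue` for what queueAppend.php would do, and `takeNext` for what crawler.php would do on each poll. All names are hypothetical, and skipping ids that are already queued is an added assumption.

```javascript
// What the append page would do: push an id unless it is already waiting.
// Returns the new queue length.
function appendToQueue(session, nodeId) {
  if (!Array.isArray(session.queue)) session.queue = [];
  if (!session.queue.includes(nodeId)) session.queue.push(nodeId);
  return session.queue.length;
}

// What the crawler would do on each poll: take the oldest id, or null if idle.
function takeNext(session) {
  if (!Array.isArray(session.queue) || session.queue.length === 0) return null;
  return session.queue.shift();
}
```

Since appending never touches the crawler's own state, the crawler's sleep timing would stay the same no matter when new items arrive.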
