youtube-recommendations's Issues
MBFC scraper is broken
scripts/data_preparation/scrape_mbfc.py doesn't work anymore. Non-urgent, but it should eventually be updated to reflect the page change that likely broke the original script.
Reconstruct full BFS tree from truncated representation
We store our crawl data in a table with the following schema (modulo some key handling):
CREATE TABLE recommendations (
    video_id text NOT NULL,
    search_id integer NOT NULL,
    recommendation text,
    depth integer
);
For example, SELECT * FROM recommendations LIMIT 10 gives:
8wK7ZyxdELM|1|i5uB9ERXG3o|0
8wK7ZyxdELM|1|h8ftTlzYev0|0
8wK7ZyxdELM|1|DbypJZprPT4|0
8wK7ZyxdELM|1|siyW0GOBtbo|0
i5uB9ERXG3o|1|_mEHfrd43gc|1
i5uB9ERXG3o|1|QlaeirHJpns|1
i5uB9ERXG3o|1|jL8uDJJBjMA|1
i5uB9ERXG3o|1|0nCT8h8gO1g|1
h8ftTlzYev0|1|lpdiA8t8djw|1
h8ftTlzYev0|1|cnpe7d7bBRI|1
For efficiency reasons our crawler does not get the recommendations for a video if we have seen it before. As a result, the "tree" represented in recommendations is truncated. The implicit assumption is that the recommendations associated with any particular video_id do not change in the course of the crawl. For certain analyses, however, we might like to have access to the full tree. The question is: what is the best way to do this?
Let's take a small example: suppose we have the following tree:
a
|
|--a
|  |--a
|  |--b
|
|--b
   |--c
   |--d
This would be stored in recommendations (omitting the search_id column) as:
a|a|0
a|b|0
b|c|1
b|d|1
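For reference, a crawl that skips previously seen videos yields exactly these rows. Below is a minimal sketch of that behavior, not the actual crawler: the function crawl, the get_recommendations callable, and toy_graph are all hypothetical names made up for illustration.

```python
def crawl(root, get_recommendations, max_depth):
    """Breadth-first crawl that expands each video at most once.

    Hypothetical stand-in for the real crawler; get_recommendations is
    an assumed callable mapping a video_id to a list of recommended ids.
    """
    seen = set()
    rows = []
    frontier = [root]
    for depth in range(max_depth + 1):
        next_frontier = []
        for video_id in frontier:
            if video_id in seen:
                # Already expanded earlier in the crawl: this is the truncation.
                continue
            seen.add(video_id)
            for recommendation in get_recommendations(video_id):
                rows.append((video_id, recommendation, depth))
                next_frontier.append(recommendation)
        frontier = next_frontier
    return rows


# Toy graph matching the example: a recommends a and b, b recommends c and d.
toy_graph = {"a": ["a", "b"], "b": ["c", "d"], "c": [], "d": []}
rows = crawl("a", lambda v: toy_graph.get(v, []), max_depth=1)
for row in rows:
    print("|".join(str(field) for field in row))
```

The second visit to a at depth 1 is skipped because a is already in seen, which is why its recommendations never reappear in the table.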
But the tree that this table represents is truncated:
a
|
|--a
|
|--b
   |--c
   |--d
In this example, the desired output of some script untruncate.(py?R?) would be:
a|a|0
a|b|0
a|a|1
a|b|1
b|c|1
b|d|1
This problem becomes less dumb when the example isn't a self-loop (in most cases we truncate a path when we land at a video we have seen before). Not a very interesting substantive question, but a data wrangling task that I'm not quite sure how to approach.
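One possible approach, sketched in Python under the stated assumption that a video's recommendations do not change during the crawl: build a lookup from each video_id to its stored recommendations, then replay the breadth-first expansion from the depth-0 video(s) without skipping repeats. The function name untruncate and the (video_id, recommendation, depth) tuple layout are assumptions for illustration, and the sketch expects rows from a single search_id.

```python
from collections import defaultdict


def untruncate(rows):
    """Rebuild the full BFS tree from truncated (video_id, recommendation, depth) rows.

    Assumes all rows belong to one search and that each video's
    recommendations are the same wherever it appears in the tree.
    """
    rows = list(rows)
    if not rows:
        return []

    # video_id -> ordered list of its recommendations, as stored once in the table.
    recs = defaultdict(list)
    for video_id, recommendation, _depth in rows:
        if recommendation not in recs[video_id]:
            recs[video_id].append(recommendation)

    max_depth = max(depth for _, _, depth in rows)
    # Start from the distinct videos expanded at depth 0, in first-seen order.
    frontier = list(dict.fromkeys(v for v, _, d in rows if d == 0))

    full = []
    for depth in range(max_depth + 1):
        next_frontier = []
        for video_id in frontier:
            # Videos never expanded in the crawl have no stored recommendations.
            for recommendation in recs.get(video_id, []):
                full.append((video_id, recommendation, depth))
                next_frontier.append(recommendation)
        frontier = next_frontier
    return full


truncated = [("a", "a", 0), ("a", "b", 0), ("b", "c", 1), ("b", "d", 1)]
for row in untruncate(truncated):
    print("|".join(str(field) for field in row))
```

On the toy example this prints the six desired rows. Because every repeated video is re-expanded, the output can grow roughly geometrically with depth, so on real crawl data it may be worth streaming rows out rather than holding the whole list in memory.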