Mongo Tweet Import

Import script for Twitter and GNIP data into MongoDB. It inserts entire JSON records and adds a few fields so that records from the two sources can be queried uniformly. Options let you configure the size of each insert (number of tweets) and check for existing IDs in the database, which is useful for merging datasets (although this check is slow).
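As an illustration of what making the two formats uniform can involve, here is a minimal sketch, not the script's actual code. It assumes Twitter API records carry the numeric ID as a string in `id_str`, and GNIP Activity Streams records carry an `id` of the form `tag:search.twitter.com,2005:<id>`. The shared field name `tweet_id` and the function `normalize` are hypothetical.

```python
import json

# Hypothetical name for the shared ID field; the real script may use another.
ID_FIELD = "tweet_id"

# Assumed GNIP Activity Streams ID prefix: "tag:search.twitter.com,2005:<numeric id>".
GNIP_ID_PREFIX = "tag:search.twitter.com,2005:"


def normalize(record: dict) -> dict:
    """Return a copy of the full record with a uniform integer tweet ID added."""
    if "id_str" in record:
        # Twitter API format: the numeric ID is available as a string.
        tweet_id = int(record["id_str"])
    elif isinstance(record.get("id"), str) and record["id"].startswith(GNIP_ID_PREFIX):
        # GNIP Activity Streams format: strip the tag prefix to get the number.
        tweet_id = int(record["id"][len(GNIP_ID_PREFIX):])
    else:
        raise ValueError("unrecognized record format")
    out = dict(record)  # keep the entire original JSON record
    out[ID_FIELD] = tweet_id
    return out


if __name__ == "__main__":
    twitter = json.loads('{"id": 123, "id_str": "123", "text": "hi"}')
    gnip = json.loads('{"id": "tag:search.twitter.com,2005:123", "body": "hi"}')
    print(normalize(twitter)[ID_FIELD], normalize(gnip)[ID_FIELD])  # prints: 123 123
```

With both formats sharing one ID field, a single query or index on that field works regardless of which source a record came from.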

Internally, tweet IDs are tracked during insertion to prevent the occasional duplicate.
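The following minimal sketch shows one way in-memory ID tracking can be combined with batched inserts and with inserting the source tweet of a retweet as a top-level tweet. The generator `batches` and its parameters are hypothetical; it assumes Twitter-format records whose ID is in `id_str` and whose retweets embed the source tweet under `retweeted_status`.

```python
from typing import Iterable, Iterator, List


def batches(tweets: Iterable[dict], batch_size: int = 1000,
            include_retweets: bool = True,
            id_key: str = "id_str") -> Iterator[List[dict]]:
    """Yield lists of at most batch_size tweets, skipping IDs already seen."""
    seen_ids = set()  # every ID queued so far in this run (memory only)
    batch: List[dict] = []

    def candidates(tweet: dict) -> Iterator[dict]:
        yield tweet
        # A retweet embeds its source tweet; optionally treat it as top-level.
        source = tweet.get("retweeted_status")
        if include_retweets and source is not None:
            yield source

    for tweet in tweets:
        for doc in candidates(tweet):
            tweet_id = doc[id_key]
            if tweet_id in seen_ids:
                continue  # duplicate within this import run; skip it
            seen_ids.add(tweet_id)
            batch.append(doc)
            if len(batch) == batch_size:
                yield batch
                batch = []
    if batch:
        yield batch  # final, possibly smaller, batch
```

Each yielded list could then be passed to pymongo's `Collection.insert_many`. Because `seen_ids` lives only in memory, it catches duplicates within a single run but not tweets already stored by an earlier run; that is the case a database-side existence check addresses.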

Usage and Arguments

python import.py <host> <database> <collection>

argument name	description
`host`	hostname of the database server, e.g. `localhost` or `mongo.example.com`
`database`	name of the database to use, e.g. `mydatabase`
`collection`	name of the collection to insert into, e.g. `tweets`

Optional arguments:

shorthand argument	argument	description
-l	--limit	import at most this many tweets.
-f	--filename	filename of the JSON file to import; wildcards are accepted, e.g. `tweets.json` or `july_*_2016.json`
-e	--encoding	JSON file encoding (default: utf-8)
-b	--batchsize	number of tweets to insert with a single command (default: 1000)
-c	--check	check whether a tweet with the same ID already exists before inserting it. Use this if something went wrong during an earlier insert or you are merging two datasets. Caution: this is very slow.
-r	--no_retweets	do not insert embedded retweets. By default, the source tweet of a retweet is also inserted as a top-level tweet, which can bias a dataset toward content that gets retweeted.
	--no_index	do not create an index on tweet IDs. By default, an ascending index is created on the tweet ID.
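The arguments in the two tables above map naturally onto Python's `argparse`. This is a sketch of how they could be declared, not the script's actual parser; only the flag names and defaults come from the tables, and the function name `build_parser` is hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Import Twitter/GNIP JSON into MongoDB.")
    # Positional arguments, in the documented order.
    parser.add_argument("host", help="hostname of the MongoDB server")
    parser.add_argument("database", help="name of the database to use")
    parser.add_argument("collection", help="collection to insert into")
    # Optional arguments.
    parser.add_argument("-l", "--limit", type=int, default=None,
                        help="import at most this many tweets")
    parser.add_argument("-f", "--filename",
                        help="JSON file to import; wildcards accepted")
    parser.add_argument("-e", "--encoding", default="utf-8",
                        help="JSON file encoding")
    parser.add_argument("-b", "--batchsize", type=int, default=1000,
                        help="number of tweets per insert command")
    parser.add_argument("-c", "--check", action="store_true",
                        help="check for an existing tweet ID before inserting (slow)")
    parser.add_argument("-r", "--no_retweets", action="store_true",
                        help="do not insert embedded retweet sources as top-level tweets")
    parser.add_argument("--no_index", action="store_true",
                        help="do not create an ascending index on the tweet ID")
    return parser


# Example: parse a command line equivalent to the --check example usage.
example = build_parser().parse_args(
    ["mongo.example.com", "mydatabase", "tweets", "--check"])
```

If the script expands wildcard patterns itself (an assumption), quote the pattern, for example `-f "july_*_2016.json"`, so the shell does not expand it into several arguments first; with a parser like this one, an unquoted pattern matching several files would not be parsed as intended.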

Example Usage

To run the tool with all defaults, use the following command. Replace the three positional arguments (host, database, collection) with the values for your system.

python import.py mongo.example.com mydatabase tweets

If you are merging datasets, or something went wrong during the first insert, add `--check`:

python import.py mongo.example.com mydatabase tweets --check

geosoco / tweet_import