instagram_explorer's Introduction

Instagram Explorer - Scraping and Learning

A simple, basic package for building social media datasets from public Instagram posts, using web scraping techniques with BeautifulSoup.

Purpose

This repo provides a set of scraping functions that work with the Instagram "Explore" page as of March 2018.

Goals

The goal of this project is to provide a tool for analysts, programmers, data scientists and students who need to build datasets from social media posts, such as those on Instagram. The initial idea was to use the official Instagram API, but it does not currently offer an endpoint that retrieves public posts by hashtag or location. The project also aims to provide tweaks that help with data augmentation of datasets.

Usage

As of v0.1.0-beta.2, this app accepts 3 types of arguments: a single hashtag, a hashtag list read from a file, or a hashtag plus words similar to it.
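The three input modes could be wired up as in the minimal sketch below. It is not the project's actual parsing code: it assumes the standard argparse module, and the function name build_parser and the destination names word, wordnet_word and filename are hypothetical. Only the flags -w, -wn and -f come from this README.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the three input modes (hypothetical helper)."""
    parser = argparse.ArgumentParser(prog="read_tags.py")
    # Exactly one input mode is expected per run (an assumption of this sketch).
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("-w", dest="word",
                       help="single hashtag to explore")
    group.add_argument("-wn", dest="wordnet_word",
                       help="hashtag to expand with WordNet similar words")
    group.add_argument("-f", dest="filename",
                       help="text file with one hashtag per line")
    return parser


# Parse an explicit argument list instead of the real command line.
args = build_parser().parse_args(["-w", "soccer"])
print(args.word)
```

Making the group mutually exclusive is a design choice of the sketch; the original script may accept its arguments differently.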

Single Hashtag

A single hashtag can be passed as an argument on the command line, such as:
python read_tags.py -w soccer
The app will explore only the word given after -w, which is soccer (sys.argv[2] in this invocation), and get only its results.

Hashtag and Similar Words

NLTK provides a WordNet interface, which is used to discover words similar to a given word. This feature is still being refined, so it is not very useful as of v0.1.0-beta.2, but improvements will come. It won't work with adjectives, for instance.

To use this function, pass the -wn argument on the command line, as shown below:
python read_tags.py -wn sunshine
The console will print all the words used to scrape Instagram data. For the sunshine example, the result will be:

words =  [['sunshine', 1.0], ['sunlight', 1.0], ['fair_weather', 0.1]]

All words in the list will be scraped individually. Currently the database makes no distinction between chosen words and generated words, so their origin cannot be identified, but this will be implemented later.

The second element of each entry in the list is the similarity score calculated by NLTK's a.path_similarity(b) function. Please refer to the NLTK documentation for more information. This score will be stored in the database in the future.
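One way such a scored word list might be produced is sketched below. This is an assumption about the approach, not the repository's code: the helpers similar_words and keep_best are hypothetical, the restriction to noun synsets is a choice made for this sketch, and the scores it yields are not guaranteed to match the sunshine example above. It uses NLTK's wordnet.synsets, Synset.lemma_names and Synset.path_similarity, and needs the WordNet corpus to be downloaded first.

```python
def keep_best(scored):
    """Collapse duplicate words, keeping the highest score for each one."""
    best = {}
    for name, score in scored:
        if score is None:  # path_similarity may return None when no path exists
            continue
        if name not in best or score > best[name]:
            best[name] = score
    # Dicts preserve insertion order, so words stay in first-seen order.
    return [[name, score] for name, score in best.items()]


def similar_words(word):
    """Return [word, score] pairs built from the noun synsets of `word`."""
    # Imported lazily so keep_best stays usable without NLTK installed.
    from nltk.corpus import wordnet as wn  # needs nltk.download("wordnet")

    synsets = wn.synsets(word, pos=wn.NOUN)
    if not synsets:
        return []
    base = synsets[0]  # assumption: the first synset is the reference sense
    scored = []
    for synset in synsets:
        score = base.path_similarity(synset)
        for lemma_name in synset.lemma_names():
            scored.append([lemma_name, score])
    return keep_best(scored)
```

Calling similar_words("sunshine") would return a list in the same [word, score] shape as the printed words list.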

Hashtag List

A list of hashtags can be supplied to this package through a command-line argument.

  1. Create a text file with a list of words, containing one word per line, as shown below:
soccer
brazil
neymar
worldcup
ronaldinho
  2. Use the argument -f filename.txt to execute the code, like: python read_tags.py -f my_words.txt
  3. The code will read the file and print its contents in Python list format, as:
words = ['soccer','brazil','neymar','worldcup','ronaldinho']
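The file-reading step above can be sketched as follows. The function name read_words is hypothetical, and skipping blank lines and assuming UTF-8 encoding are choices made for this sketch rather than documented behavior of the package.

```python
def read_words(filename):
    """Read one hashtag per line from a text file into a list (hypothetical helper)."""
    with open(filename, encoding="utf-8") as handle:
        # Strip surrounding whitespace and ignore empty lines.
        return [line.strip() for line in handle if line.strip()]
```

With the my_words.txt file from step 1, read_words("my_words.txt") would return the five words as a Python list.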

instagram_explorer's People

Contributors

jpmondoni, lcavenaghi

instagram_explorer's Issues

Read tags - No Results

Hi all, I installed the required dependencies and ran the command py .\read_tags.py -w cerveja, but every time I run it no posts are captured, even though Chrome browses Instagram for minutes. Is there something that has to be configured beforehand?

Regards
