socid-extractor's Introduction

socid_extractor

Extract information about a user from profile webpages / API responses and save it in machine-readable format.

Usage

As a command-line tool:

$ socid_extractor --url https://www.deviantart.com/muse1908
country: France
created_at: 2005-06-16 18:17:41
gender: female
username: Muse1908
website: www.patreon.com/musemercier
links: ['https://www.facebook.com/musemercier', 'https://www.instagram.com/muse.mercier/', 'https://www.patreon.com/musemercier']
tagline: Nothing worth having is easy...

Without installing:

$ ./run.py --url https://www.deviantart.com/muse1908

As a Python library:

>>> import socid_extractor, requests
>>> r = requests.get('https://www.patreon.com/annetlovart')
>>> socid_extractor.extract(r.text)
{'patreon_id': '33913189', 'patreon_username': 'annetlovart', 'fullname': 'Annet Lovart', 'links': "['https://www.facebook.com/322598031832479', 'https://www.instagram.com/annet_lovart', 'https://twitter.com/annet_lovart', 'https://youtube.com/channel/UClDg4ntlOW_1j73zqSJxHHQ']"}
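
In the output above, `links` comes back as the string representation of a list rather than as a list. A minimal sketch of turning such a value back into a Python list, assuming the field holds a valid Python list literal (the helper name `parse_list_field` is hypothetical and not part of socid_extractor):

```python
import ast


def parse_list_field(value):
    """Convert a stringified list such as "['a', 'b']" into a real list.

    Hypothetical helper, not part of socid_extractor. Values that are not
    list literals are returned wrapped in a single-element list.
    """
    try:
        parsed = ast.literal_eval(value)
    except (ValueError, SyntaxError):
        return [value]
    return list(parsed) if isinstance(parsed, (list, tuple)) else [value]


# Sample taken from the Patreon output shown above (abbreviated to two links).
info = {
    'patreon_username': 'annetlovart',
    'links': "['https://www.instagram.com/annet_lovart', 'https://twitter.com/annet_lovart']",
}
links = parse_list_field(info['links'])
print(links)
```

`ast.literal_eval` only evaluates literals, so it is a safer choice here than `eval` for text that originated from a web page.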

Installation

$ pip3 install socid-extractor

The latest development version can be installed directly from GitHub:

$ pip3 install -U git+https://github.com/soxoj/socid_extractor.git

Sites and methods

More than 100 methods for different sites and platforms are supported!

  • Google (all documents pages, maps contributions), cookies required
  • Yandex (disk, albums, znatoki, music, realty, collections), cookies required to prevent captcha blocks
  • Mail.ru (my.mail.ru user mainpage, photo, video, games, communities)
  • Facebook (user & group pages)
  • VK.com (user page)
  • OK.ru (user page)
  • Instagram
  • Reddit
  • Medium
  • Flickr
  • Tumblr
  • TikTok
  • GitHub

...and many others.

You can also check the tests file for data examples and the schemes file to explore all the methods.

When it may be useful

  • Getting all available info by username and/or account UID. Examples: Week in OSINT, OSINTCurious
  • User tracking: checking that an account was previously known (by ID) even if all its public info has changed. Examples: Aware Online
  • Searching by commonly used cross-service UIDs (GAIA ID, Facebook UID, Yandex Public ID, etc.)
    • DB leaks of forums and platforms in SQL format
    • Indexed links that contain target profile ID
  • Searching for tracking data by comparison with other IDs - how it works, how can it be used.
  • Law enforcement investigations

SOWEL classification

This tool uses the following OSINT techniques:

Tools using socid_extractor

  • Maigret - a powerful namechecker that generates a report with all available info from the accounts found.

  • TheScrapper - scrapes emails, phone numbers and social media accounts from a website.

  • InfoHunter - an open source OSINT tool that allows you to search, collect and analyze information online to get a complete picture of the person or company you are interested in.

  • YaSeeker - a tool to gather all available information about a Yandex account by login/email.

  • Marple - scrapes search engine results for a given username.

Testing

python3 -m pytest tests/test_e2e.py -n 10  -k 'not cookies' -m 'not github_failed and not rate_limited'

Contributing

Check the separate page if you want to add new methods or fix anything.

socid-extractor's People

Contributors

cyb3rk0tik, dependabot[bot], meowypouncer, soxoj

socid-extractor's Issues

Fix Vimeo

User info is now available only from the API.

Error / no results

I tested your application but got no results for URLs like:

twitter account URL
disqus account URL
facebook account URL

I see: analyzing URL ..... without result

[maigret] behance extracting wrong info

When using maigret, the info extracted from behance.net is wrong. It uses info from liked posts.

expected:

key         value
Uid         ???
First name  Maximous
Last name   Black
Username    maximousblk

actual:

key         value
Uid         3298983
First name  Nick
Last name   Buturishvili
Username    nikabuturishvili

Add Jira

auth required for profile page

Major optional performance boost suggestion when operating on url input

This concerns operation on URL input.

socid-extractor sends a request to the URL and only then tries to parse the response against its list of supported websites.

On the one hand, this makes it possible to handle generic platforms such as vBulletin, which can appear under different domains and URLs. On the other hand, most if not all of the other supported websites have a specific domain/URL, so a check for whether the website is supported could run before the request is sent, avoiding unnecessary requests to unsupported websites.

So by sacrificing vBulletin support and adding a pre-request URL support check, you get a major performance improvement.

To keep it optional for those who do not want to sacrifice vBulletin, this can depend on a new flag.

For around 180 URLs, 25 of which are supported, it can lower execution time from around 400 seconds to around 200 seconds.

However, for the URL support check to work, each key in the dictionary of supported websites needs to contain some word that appears in the URL, so the dictionary names also need fixing (or a domain property needs to be added for those websites that have a specific domain).
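
The domain-property alternative mentioned above could look roughly like the following sketch. The `domains` key, the sample entries and the function name `schemes_for_url` are all assumptions made for illustration; the real schemes dictionary has no such property.

```python
from urllib.parse import urlsplit

# Hypothetical scheme entries: the `domains` key is the proposed new property
# and does not exist in the current schemes dictionary.
schemes = {
    'Telegram': {'domains': ['t.me']},
    'Odnoklassniki': {'domains': ['ok.ru']},
    'vBulletin': {},  # generic engine, may run on any domain
}


def schemes_for_url(url):
    """Return names of schemes whose declared domain matches the URL host."""
    host = (urlsplit(url).hostname or '').lower()
    matched = []
    for name, data in schemes.items():
        for domain in data.get('domains', []):
            # Match the domain itself or any of its subdomains.
            if host == domain or host.endswith('.' + domain):
                matched.append(name)
                break
    return matched


print(schemes_for_url('https://www.ok.ru/profile/1'))
print(schemes_for_url('https://book.ru/'))
```

Matching on the parsed hostname avoids false positives such as `book.ru` being treated as `ok.ru`. Schemes without a `domains` entry, like the generic vBulletin one, never match and would still need the request-first path.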

So the following needs to be added (as a temporary solution that avoids adding a domain property for every supported website with a specific domain):

  1. in cli.py:
    def check_url_relevance(url):
        # True if any meaningful word of a scheme name occurs in the URL
        lowercaseUrl = url.lower()
        for scheme_name, scheme_data in schemes.items():
            for name_part in scheme_name.lower().split():
                if len(name_part) > 1 and name_part not in ['api', 'user', 'profile', 'group', 'page', 'file', 'html'] and name_part in lowercaseUrl:
                    return True
        return False

  2. in the cli.py run method, after "print(f'Analyzing URL {url}...')", wrap everything in the following conditional check:
    if check_url_relevance(args.url):

  3. in schemes.py change dictionary keys:
    'Linktree' -> 'Linktree linktr.ee'
    'Odnoklassniki' -> 'Odnoklassniki ok.ru'
    'Habrahabr HTML (old)' -> 'Habrahabr HTML (old) habra'
    'Habrahabr JSON' -> 'Habrahabr JSON habra'
    'Telegram' -> 'Telegram t.me'

  4. add an optional parameter that triggers this behavior and can be added to the "if check_url_relevance(args.url):" condition
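
The four steps above can be put together as a self-contained sketch. The `schemes` dictionary here is a tiny stand-in for the real one, and the flag name `--skip-unsupported` and the helper `should_fetch` are hypothetical, since the issue does not name them.

```python
import argparse

# Hypothetical stand-in for the real `schemes` dictionary; only the keys matter here.
schemes = {
    'Telegram t.me': {},
    'Odnoklassniki ok.ru': {},
    'GitHub API': {},
}

# Generic words that appear in scheme names but say nothing about the site.
IGNORED_PARTS = {'api', 'user', 'profile', 'group', 'page', 'file', 'html'}


def check_url_relevance(url):
    """Return True if any meaningful word of a scheme name occurs in the URL."""
    lowercase_url = url.lower()
    for scheme_name in schemes:
        for name_part in scheme_name.lower().split():
            if len(name_part) > 1 and name_part not in IGNORED_PARTS and name_part in lowercase_url:
                return True
    return False


def should_fetch(url, skip_unsupported):
    """Fetch always by default; with the flag set, fetch only relevant URLs."""
    return not skip_unsupported or check_url_relevance(url)


def build_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument('--url', required=True)
    # Hypothetical flag name; the issue does not propose one.
    parser.add_argument('--skip-unsupported', action='store_true')
    return parser


args = build_parser().parse_args(['--url', 'https://t.me/example', '--skip-unsupported'])
print(should_fetch(args.url, args.skip_unsupported))
```

Because the check is a plain substring match, it can produce false positives (for example, `ok.ru` also occurs in `book.ru`). That only costs an extra request, so the speed-up is reduced slightly but no supported site is skipped.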

Add Smule

Example: https://www.smule.com/Blue
The user info block is placed right in the first script tag:

Profile: {"user":{"account_id":173,"handle":"Blue","pic_url":"https://c-sf.smule.com/rs-z0/account/icon/v4_defpic.png","url":"/Blue","followers":"155","followees":"0","num_performances":"0","is_following":false,...
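
A minimal sketch of how such a block might be pulled out of the page, assuming the JSON object directly follows the `Profile: ` marker. The page fragment is an abbreviated, hypothetical sample built from the fields quoted above, and `extract_smule_profile` is an illustrative name, not an existing socid_extractor method.

```python
import json

# Abbreviated, hypothetical page fragment built from the fields quoted above;
# the real script tag contains more keys.
html = (
    '<script>Profile: {"user":{"account_id":173,"handle":"Blue",'
    '"url":"/Blue","followers":"155","is_following":false}};</script>'
)


def extract_smule_profile(page_text, marker='Profile: '):
    """Decode the JSON object that directly follows the marker, if any."""
    start = page_text.find(marker)
    if start == -1:
        return None
    # raw_decode parses one JSON value and ignores whatever follows it.
    profile, _end = json.JSONDecoder().raw_decode(page_text, start + len(marker))
    return profile


user = extract_smule_profile(html)['user']
print(user['account_id'], user['handle'], user['followers'])
```

Using `raw_decode` avoids having to find the end of the object with a regular expression, which is fragile for nested JSON.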
      

FB_UID

Hi, I needed some information.
I noticed that the fb_uid is also obtained from resolving an instagram profile. Does this data correspond to an existing Facebook UID? I did some tests and it does not return any valid facebook profile.

Loosen requirements versions for installation as a package; python-Levenshtein==0.12.0 is insecure

$ printf '%s\n' socid-extractor >reqs.txt
$ pip install -Uqr reqs.txt
WARNING: The candidate selected for download or install is a yanked version: 'python-levenshtein' candidate (version 0.12.0 at https://files.pythonhosted.org/packages/42/a9/d1785c85ebf9b7dfacd08938dd028209c34a0ea3b1bcdb895208bd40a67d/python-Levenshtein-0.12.0.tar.gz#sha256=033a11de5e3d19ea25c9302d11224e1a1898fe5abd23c61c7c360c25195e3eb1 (from https://pypi.org/simple/python-levenshtein/))
Reason for being yanked: Insecure, upgrade to 0.12.1

I don't think the PyPI package should be so strict about exact dependency versions, so that your package no longer enforces an insecure setup upon installation.
