soxoj / socid-extractor

⛏️ Extract accounts info from personal pages on various sites for OSINT purpose

License: GNU General Public License v3.0

Python 99.94% Shell 0.06%

socmint uid osint identifiers privacy parsing socid-extractor

socid-extractor's Introduction

socid_extractor

Extract information about a user from profile webpages / API responses and save it in machine-readable format.

Usage

As a command-line tool:

$ socid_extractor --url https://www.deviantart.com/muse1908
country: France
created_at: 2005-06-16 18:17:41
gender: female
username: Muse1908
website: www.patreon.com/musemercier
links: ['https://www.facebook.com/musemercier', 'https://www.instagram.com/muse.mercier/', 'https://www.patreon.com/musemercier']
tagline: Nothing worth having is easy...

Without installing:

$ ./run.py --url https://www.deviantart.com/muse1908

As a Python library:

>>> import socid_extractor, requests
>>> r = requests.get('https://www.patreon.com/annetlovart')
>>> socid_extractor.extract(r.text)
{'patreon_id': '33913189', 'patreon_username': 'annetlovart', 'fullname': 'Annet Lovart', 'links': "['https://www.facebook.com/322598031832479', 'https://www.instagram.com/annet_lovart', 'https://twitter.com/annet_lovart', 'https://youtube.com/channel/UClDg4ntlOW_1j73zqSJxHHQ']"}
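In the output above, `links` is a string holding the printed form of a list, not a list. A minimal sketch for turning such fields back into lists, assuming they are always formatted as Python literals as in this example. `parse_list_fields` is a hypothetical helper, not part of socid_extractor:

```python
import ast


def parse_list_fields(info):
    """Return a copy of `info` with stringified lists decoded into real lists.

    Hypothetical helper for illustration; not part of socid_extractor.
    """
    parsed = {}
    for key, value in info.items():
        if isinstance(value, str) and value.startswith("[") and value.endswith("]"):
            try:
                # literal_eval only evaluates Python literals, never arbitrary code
                decoded = ast.literal_eval(value)
            except (ValueError, SyntaxError):
                decoded = value  # keep the raw string if it is not a valid literal
            parsed[key] = decoded if isinstance(decoded, list) else value
        else:
            parsed[key] = value
    return parsed


# Shortened copy of the Patreon result shown above
info = {
    "patreon_username": "annetlovart",
    "links": "['https://www.instagram.com/annet_lovart', 'https://twitter.com/annet_lovart']",
}
result = parse_list_fields(info)
print(result["links"][0])
```

Values that look like a list but fail to parse are left untouched, so the helper can be applied to any extraction result.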

Installation

$ pip3 install socid-extractor

The latest development version can be installed directly from GitHub:

$ pip3 install -U git+https://github.com/soxoj/socid_extractor.git

Sites and methods

More than 100 methods for different sites and platforms are supported!

Google (all documents pages, maps contributions), cookies required
Yandex (disk, albums, znatoki, music, realty, collections), cookies required to prevent captcha blocks
Mail.ru (my.mail.ru user mainpage, photo, video, games, communities)
Facebook (user & group pages)
VK.com (user page)
OK.ru (user page)
Instagram
Reddit
Medium
Flickr
Tumblr
TikTok
GitHub

...and many others.

You can also check the tests file for data examples and the schemes file to explore all the methods.

When it may be useful

Getting all available info by username and/or account UID. Examples: Week in OSINT, OSINTCurious
User tracking: checking that the account was previously known (by ID) even if all public info has changed. Examples: Aware Online
Searching by commonly used cross-service UIDs (GAIA ID, Facebook UID, Yandex Public ID, etc.)
- DB leaks of forums and platforms in SQL format
- Indexed links that contain target profile ID
Searching for tracking data by comparison with other IDs - how it works, how can it be used.
Law enforcement investigations
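
The "previously known by ID" check from the list above could be sketched as follows. This assumes that ID fields can be recognized by key names ending in `_id` or `uid` (as in `patreon_id` or `fb_uid`); that naming rule, the helper `shared_id_matches`, and the second record are all hypothetical:

```python
def shared_id_matches(old, new, id_suffixes=("_id", "uid")):
    """Return the ID fields present in both records with equal values.

    The suffix-based rule for spotting ID fields is an assumption.
    """
    matches = {}
    for key, value in old.items():
        # str.endswith accepts a tuple of suffixes
        if key.endswith(id_suffixes) and key in new and str(new[key]) == str(value):
            matches[key] = value
    return matches


# Earlier extraction result (fields taken from the Patreon example)
old = {"patreon_id": "33913189", "patreon_username": "annetlovart", "fullname": "Annet Lovart"}
# Made-up later result: every public field changed, the ID did not
new = {"patreon_id": "33913189", "patreon_username": "renamed_user", "fullname": "Someone Else"}

print(shared_id_matches(old, new))
```

A non-empty result suggests both records describe the same account even though the username and full name differ.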

SOWEL classification

This tool uses the following OSINT techniques:

Tools using socid_extractor

Maigret - powerful namechecker that generates a report with all available info from the accounts found.
TheScrapper - scrape emails, phone numbers and social media accounts from a website.
InfoHunter - An open source OSINT tool that allows you to search, collect and analyze information online to get a complete picture of the person or company you are interested in.
YaSeeker - tool to gather all available information about Yandex account by login/email.
Marple - scrape search engines results for a given username.

Testing

python3 -m pytest tests/test_e2e.py -n 10  -k 'not cookies' -m 'not github_failed and not rate_limited'

Contributing

Check the separate page if you want to add new methods or fix anything.

socid-extractor's Issues

Add VK foaf info extractor

Example: https://vk.com/foaf.php?id=460060000

Error / no results

I tested your application but I experience no results on URL's like:

twitter account URL
disqus account URL
facebook account URL

I see: analyzing URL ..... without result

Instagram URL doesn't seem to work

After analyzing the URL, it gives no result.

I have tried with a public account (https://www.instagram.com/zuck).

Please check whether you can reproduce this problem.

Add Twitch

API requests only, Chrome extension available: https://chrome.google.com/webstore/detail/twitch-username-and-user/laonpoebfalkjijglbjbnkfndibbcoon

Add sites parsing methods from https://sn0int.com/

Please remove GitHub authentication

Hi Team

While downloading files, it asks for GitHub credentials. Please provide an alternative, as we cannot enter an OTP on Linux.

[maigret] behance extracting wrong info

when using maigret the info extracted from behance.net is wrong. It's using info from liked posts.

expected:

key	value
Uid	???
First name	Maximous
Last name	Black
Username	maximousblk

actual:

key	value
Uid	3298983
First name	Nick
Last name	Buturishvili
Username	nikabuturishvili

Later check periscope (pscp.tv) and add in tests

Major optional performance boost suggestion when operating on url input

This is relevant to the operation on url input.

socid-extractor sends a request to the URL and only then tries to parse the response against its list of supported websites.

On the one hand, this makes it possible to handle generic platforms such as vBulletin, which can appear under different domains and URLs. On the other hand, most if not all of the other supported websites have a specific domain/URL, so a check for whether the website is supported could run before sending the request, avoiding unnecessary requests to unsupported websites.

So by sacrificing support of vBulletin and adding a pre-request url support check, you get a major performance improvement.

To make it optional for those who do not want to sacrifice vBulletin, this can be dependent on a new flag.

For around 180 urls which contain 25 supported urls it can lower execution time from around 400 seconds to around 200 seconds.

However, for the URL support check to work, each key in the dictionary of supported websites needs to contain some word that appears in the URL, so the dictionary names also need fixing (or a domain property needs to be added for the websites that have a specific domain).

So the following needs to be added (as a temporary solution, without adding a domain property for every supported website that has a specific domain):

in cli.py:
def check_url_relevance(url):
    lowercaseUrl = url.lower()
    for scheme_name, scheme_data in schemes.items():
        for name_part in scheme_name.lower().split():
            if len(name_part) > 1 and name_part not in ['api', 'user', 'profile', 'group', 'page', 'file', 'html'] and name_part in lowercaseUrl:
                return True
    return False
in cli.py run method after "print(f'Analyzing URL {url}...')" put everything inside the following conditional check:
if check_url_relevance(args.url):
in schemes.py change dictionary keys:
'Linktree' -> 'Linktree linktr.ee'
'Odnoklassniki' -> 'Odnoklassniki ok.ru'
'Habrahabr HTML (old)' -> 'Habrahabr HTML (old) habra'
'Habrahabr JSON' -> 'Habrahabr JSON habra'
'Telegram' -> 'Telegram t.me'
add an optional parameter that triggers this behavior; it can be added to the "if check_url_relevance(args.url):" condition
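
The issue mentions a per-website domain property as the alternative to matching on words in the scheme names. A minimal sketch of that variant, assuming a hypothetical `SCHEME_DOMAINS` mapping (the real schemes.py has no such property) and URLs that include a scheme such as `https://`:

```python
from urllib.parse import urlparse

# Hypothetical mapping of scheme names to domains; it only illustrates
# the proposal and does not exist in schemes.py.
SCHEME_DOMAINS = {
    "Linktree": ["linktr.ee"],
    "Odnoklassniki": ["ok.ru"],
    "Telegram": ["t.me"],
}


def is_supported_url(url, scheme_domains=SCHEME_DOMAINS):
    """Return True if the URL's host equals or is a subdomain of a known domain."""
    # hostname is None when the URL has no scheme, e.g. "linktr.ee/name"
    host = (urlparse(url).hostname or "").lower()
    for domains in scheme_domains.values():
        for domain in domains:
            if host == domain or host.endswith("." + domain):
                return True
    return False


print(is_supported_url("https://linktr.ee/annetlovart"))
print(is_supported_url("https://example.com/ok.ru"))
```

Matching on the parsed host rather than on a substring of the whole URL avoids false positives from paths or query strings that happen to contain a site name.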

Add Linktr

Example: linktr.ee/annetlovart

[DeviantArt] Unable to analyze URLs

When I attempt to analyze a URL, I'm receiving a JSON decoding error. I've attached a screenshot of what this error looks like below.

Twitter page data format changed

Add Disqus

API-requests only

https://disqus.com/api/3.0/users/details?user=username%3Arohfsim&attach=userFlaggedUser&api_key=E8Uh5l5fHZ6gD8U3KycjAIAk46f68Zw7C6eW8WSjZvCLXebZ7p0r1yrYDrLilk2F

Add Smule

Example: https://www.smule.com/Blue
User info block is placed right in first script tag:

Profile: {"user":{"account_id":173,"handle":"Blue","pic_url":"https://c-sf.smule.com/rs-z0/account/icon/v4_defpic.png","url":"/Blue","followers":"155","followees":"0","num_performances":"0","is_following":false,...
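
One way to pull such an embedded block out of the page text is to locate the marker and decode the JSON object that follows it. This is only a sketch: `extract_json_after` is a hypothetical helper, and the sample string is a shortened, made-up version of the block above, not real page content:

```python
import json


def extract_json_after(text, marker):
    """Decode the first JSON object that follows `marker` in `text`.

    Returns None if the marker or an opening brace is not found.
    """
    start = text.find(marker)
    if start == -1:
        return None
    brace = text.find("{", start + len(marker))
    if brace == -1:
        return None
    # raw_decode parses one JSON value and ignores whatever follows it
    obj, _end = json.JSONDecoder().raw_decode(text, brace)
    return obj


# Shortened, made-up sample modelled on the block above
sample = 'Profile: {"user":{"account_id":173,"handle":"Blue","followers":"155"}}, trailing: 1'
profile = extract_json_after(sample, "Profile:")
print(profile["user"]["handle"], profile["user"]["account_id"])
```

Using `raw_decode` avoids having to find the matching closing brace by hand, since the decoder stops at the end of the first complete JSON value.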

Extract and parse links from Pinterest

Example with google plus URL: https://www.pinterest.com/melgaspar666/

No result

When I search: socid_extractor --url https://www.deviantart.com/muse190

I get no result:

:$ socid_extractor --url https://www.deviantart.com/muse190
Analyzing URL https://www.deviantart.com/muse190...
:$

Ttt

Add Weibo

With username: https://weibo.com/clairekuo
With id: https://weibo.com/u/6215884155

Add TikTok

Facebook parsing is broken

I get this when trying to use it

https://mobile.twitter.com/SuccubusSensual

@mama_rostov_bot

FB_UID

Hi, I needed some information.
I noticed that the fb_uid is also obtained from resolving an instagram profile. Does this data correspond to an existing Facebook UID? I did some tests and it does not return any valid facebook profile.

Add uCoz and uID.me

Example: https://av.3dn.ru/index/8-0-Maikl_401 => http://uid.me/uguid/176168901 => http://uid.me/mihail_ko1_5

$ pip3 install socid-extrac

Add LocalCryptos API

Add Instagram API response processing

https://osintcurio.us/2019/10/01/searching-instagram-part-2/

Loosen requirements versions for installation as a package; python-Levenshtein==0.12.0 is insecure

$ printf '%s\n' socid-extractor >reqs.txt
$ pip install -Uqr reqs.txt
WARNING: The candidate selected for download or install is a yanked version: 'python-levenshtein' candidate (version 0.12.0 at https://files.pythonhosted.org/packages/42/a9/d1785c85ebf9b7dfacd08938dd028209c34a0ea3b1bcdb895208bd40a67d/python-Levenshtein-0.12.0.tar.gz#sha256=033a11de5e3d19ea25c9302d11224e1a1898fe5abd23c61c7c360c25195e3eb1 (from https://pypi.org/simple/python-levenshtein/))
Reason for being yanked: Insecure, upgrade to 0.12.1

I don't think the PyPI package should be so strict about exact dependency versions, so that your package no longer enforces an insecure setup upon installation.

Can't force to find

Hello. Could you help me with this problem? What my mistake with using this?
`>>> /home/danbasko/Desktop/socid_extractor-master/socid_extractor.py https://twitter.com/orika_art
File "", line 1
/home/danbasko/Desktop/socid_extractor-master/socid_extractor.py https://twitter.com/orika_art
^
SyntaxError: invalid syntax

./socid_extractor.py https://twitter.com/orika_art
File "", line 1
./socid_extractor.py https://twitter.com/orika_art
^
SyntaxError: invalid syntax
`