je-suis-tm / web-scraping

Detailed web scraping tutorials for dummies with financial data crawlers on Reddit WallStreetBets, CME (both options and futures), US Treasury, CFTC, LME, MacroTrends, SHFE and alternative data crawlers on Tomtom, BBC, Wall Street Journal, Al Jazeera, Reuters, Financial Times, Bloomberg, CNN, Fortune, The Economist

Home Page: https://je-suis-tm.github.io/web-scraping

License: Apache License 2.0

Python 100.00%

sraping scrapper futures futures-historical-data reuters wall-street-journal bloomberg python-web-scraper news-websites financial-times

web-scraping's Introduction

👋 there

My domains of interest would be

Bayesian Statistics: Latent Variables, Statistical Modeling, Statistical Inference;
Graph Theory: Complex System, Agent-based Modelling, Application in Ecology and Epidemiology;
Machine Learning: Matrix Completion, Recommender System, Feature Selection, Causality Detection;
Operations Research: Convex Optimization, Network Analysis, Game Theory;

If you are looking for collaboration in these fields or happen to know anyone who shares the same enthusiasm, please refer them to me. Thank you!

+++++++++++++++++++++++++++++++++++++++++++++++++++

Thank you for visiting 😇 I am not sure how many people actually have the patience to reach here. If that's you, 🎩 off. I really appreciate how many ⭐ you guys have given to Quant Trading. As you can see, I have devoted more and more energy towards other repositories such as Graph Theory and Machine Learning. Take a tour and you won't be disappointed 😏 Meanwhile, if you have any questions or thoughts, feel free to raise issues in the repository so we can start 💬 I genuinely enjoy conversations with people from diverse backgrounds, and they never cease to inspire me to develop new perspectives to tackle challenges in life 💪

web-scraping's People

Contributors

Stargazers

Watchers

Forkers

ongbe tobby2002 whappy1900 karagul stellaywu rosey99 keg5038 alexanu calm-rock peterbaldridge sreesaiteja rtvt123 afcarl fagan2888 fintrek jcmd7 christophhaushofer denicomc kaushikvijay rsquared2016 yt-feng jaelkw nikolausn jasonyum engrrajib munkarkin96 bobolando nabhgarg alvisesembenico adhyandhull manoj-nain mingfengkeith wealthcreating pr2486 aditya-putta dioconnoi benwaldner khumo94 mook-it dilettante77 caozq19 sultanarif-p amitparmar01 fidelisgalla pratik-nagdeve gargankush brunoprogramming layzonb rajatdas seanahmad h2oessence abhishek-kaudare newrain7803 dominic-sylvester-finance solaraww jaimehuang168 finite-abelian sandagomipieris g0dspeak iyyappana cdinsmore vluchkin mortenwillendrup 0xneox ray-ane vaillanc-h1 licjavierbarrios mitrofanovdmitry alienware izhao-ea hkdavid2008 0xdeus hobbit19 apingali kogelet alirezamdv lymphocyte dohtem1 23errg sdoof manish-potdar davincee ukaserge kommven wrapperband ingted ahntea omsharma43 rmallof datatalking climbermel hafizmulla atiq-sust skr612 marcus-arcadius robintux coursera-kp digital-jailbreak-llc bgonzalez6 edutanaka

web-scraping's Issues

wallstreetbets

Hello, I have a problem with wallstreetbets.py. I am receiving the following error:

\Python36-32\lib\site-packages\urllib3\connectionpool.py:1020: InsecureRequestWarning: Unverified HTTPS request is being made to host 'new.reddit.com'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/1.26.x/advanced-usage.html#ssl-warnings
InsecureRequestWarning,

Do you have any idea how to fix it?

CME Doesn't work anymore...

Hi,
Thanks for your scripts. I'm trying to fork the repo to scrape CME for the ES, NQ and YM futures contracts. I just need the Globex open price, but it seems they can now detect a spoofed user agent.

Edit: It works if you add this to your User-Agent headers:

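    import requests
    session = requests.Session()  # assumption: the script already sends requests through a Session
    session.headers.update({
        'Accept-Language': 'en-US,en;q=0.9',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36'})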

Error when trying CME2.py

Hello,
Love your work - trying to use it to learn from!

When I try to run CME2.py, I get the error: "string indices must be integers" & it points to the line output['prior settle']=[i['priorSettle'] for i in df['quotes']]

I am a novice when it comes to webscraping, but hoping you may be able to shed some light on the issue. Thanks so much!
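One possible reading of this error, offered as a sketch rather than a diagnosis: Python raises "string indices must be integers" when a str is indexed with a string key, which would happen on that line if iterating `df['quotes']` yields strings instead of dicts (for example, if the response was not the expected JSON). The payload shapes and the helper name `extract_prior_settle` below are assumptions made for illustration; they are not taken from CME2.py.

```python
def extract_prior_settle(payload):
    """Return the 'priorSettle' value of every quote in a decoded JSON payload.

    Raises ValueError with a readable message when the payload does not have
    the assumed shape {'quotes': [{'priorSettle': ...}, ...]}.
    """
    if not isinstance(payload, dict):
        raise ValueError(f"expected a dict, got {type(payload).__name__}")
    quotes = payload.get('quotes')
    if not isinstance(quotes, list):
        raise ValueError(f"'quotes' should be a list, got {type(quotes).__name__}")
    non_dicts = [q for q in quotes if not isinstance(q, dict)]
    if non_dicts:
        raise ValueError(f"{len(non_dicts)} quote(s) are not dicts, e.g. {non_dicts[0]!r}")
    return [q.get('priorSettle') for q in quotes]


# Hypothetical payloads, made up for this example.
good = {'quotes': [{'priorSettle': '1800.5'}, {'priorSettle': '1795.0'}]}
bad = {'quotes': 'Access Denied'}

print(extract_prior_settle(good))

# The unguarded comprehension fails on the bad payload with a TypeError
# like the one reported above.
try:
    [i['priorSettle'] for i in bad['quotes']]
except TypeError as exc:
    print('TypeError:', exc)
```

Printing the raw response before building the list is a quick way to check which of these shapes the site is actually returning.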

Question on Web Scraper

Hello there I hope all is well.

Thank you so much for:

https://github.com/tattooday/web-scraping/blob/master/CME2.py

It's helping a lot, as I am basically looking for a Python scraper to scrape futures prices. Just one question for you.

In your code, you designate the code numbers for silver, gold, palladium and copper to be:

458, 437, 445, and 438

I am curious where I would go to find these mappings, along with other commodities and their associated codes so I can scrape them too, such as wheat, oil, gold, etc.

Basically, where do I go on the web to see how silver is linked to 458, for example, and gold to 437, and to find the codes for other commodities?

I really appreciate your help in advance and hope to speak soon!

Best,

Sam

Volume - Front Month Incorrect

Hi!
Great work on this - thanks so much!
Wanted to point out one small thing - the 'front month' logic doesn't work as intended, as you're reading those in as a string with commas in the volume numbers. This means it doesn't find the true highest volume, rather the highest volume less than 1,000.

On my version, I slightly amended the list comprehension to read this:

output['volume']=[i['volume'] for i in df['quotes']]
output['volume'] = output['volume'].replace(',','', regex=True).astype(int)
output['front month']=output['volume']==max(output['volume'])

This worked as intended, but there's probably a better way. Thanks again!
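The effect described above can be reproduced without pandas. The sketch below uses made-up volume strings (not CME data) to show that `max` on strings compares character by character, so '999' beats '12,000'; converting to integers first gives the intended result. `parse_volume` is a hypothetical helper name.

```python
def parse_volume(text):
    """Convert a volume string such as '12,000' to the integer 12000."""
    return int(text.replace(',', ''))


# Made-up volumes for illustration only.
volumes = ['999', '1,234', '12,000']

# String comparison is lexicographic: '9' sorts after '1'.
print(max(volumes))

# Numeric comparison after stripping the thousands separators.
parsed = [parse_volume(v) for v in volumes]
front_month = [v == max(parsed) for v in parsed]
print(parsed, front_month)
```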

error for the wallstreetbets subreddit scraper

I had to disable the InsecureRequestWarning before running, as it showed up as an error.

import urllib3
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

But after running, it throws this error:

line 207, in main
    if val[0] == '$' and not val[1].isdigit():
IndexError: string index out of range
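A likely cause, assuming `val` is a single token taken from a post: when the token is just '$', `val[0]` succeeds but `val[1]` does not exist, which raises this IndexError. A minimal guard, using the hypothetical helper name `looks_like_ticker`, checks the length before indexing:

```python
def looks_like_ticker(val):
    """Return True for tokens like '$GME', False for '$', '$100' or ''.

    Mirrors the condition in the traceback, with a length check so that
    one-character and empty tokens never reach val[1].
    """
    return len(val) > 1 and val[0] == '$' and not val[1].isdigit()


# Made-up tokens for illustration only.
for token in ['$GME', '$100', '$', '', 'GME']:
    print(repr(token), looks_like_ticker(token))
```

Because `and` short-circuits, the length test has to come first; placing it after the index lookups would leave the original failure in place.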

Open to collaborators

I've just found your codebase, and there are a few things I would like to add if you are open to having collaborators. The documentation so far is really good. I'd like to improve upon it and test other sites where web scraping has been disabled or no longer works.

Nymex

Which of the available scrapers can filter by date?

Hi there,

I came across this repo and wanted to know which, if any, of the available scrapers can scrape by date. I noticed most of them have been combined into the MENA newsletter.py, and I do not see an option to search between dates in any of them.

I want to scrape a dataset from the last year about specific daily news from a handful of companies for a university project. One of the few resources I found was newsapi.org, but it is quite costly to search for news older than 1 month.
