je-suis-tm / web-scraping

Detailed web scraping tutorials for dummies with financial data crawlers on Reddit WallStreetBets, CME (both options and futures), US Treasury, CFTC, LME, MacroTrends, SHFE and alternative data crawlers on Tomtom, BBC, Wall Street Journal, Al Jazeera, Reuters, Financial Times, Bloomberg, CNN, Fortune, The Economist

Home Page: https://je-suis-tm.github.io/web-scraping

License: Apache License 2.0

Python 100.00%
sraping scrapper futures futures-historical-data reuters wall-street-journal bloomberg python-web-scraper news-websites financial-times

web-scraping's Introduction

👋 there

My domains of interest would be

  • Bayesian Statistics: Latent Variables, Statistical Modeling, Statistical Inference;
  • Graph Theory: Complex System, Agent-based Modelling, Application in Ecology and Epidemiology;
  • Machine Learning: Matrix Completion, Recommender System, Feature Selection, Causality Detection;
  • Operations Research: Convex Optimization, Network Analysis, Game Theory;

If you are looking for collaboration in these fields or happen to know anyone who shares the same enthusiasm, please kindly reach out to me. Thank you!

+++++++++++++++++++++++++++++++++++++++++++++++++++

Thank you for visiting 😇 I am not sure how many people actually have the patience to reach here. If that's you, 🎩 off. I really appreciate how many ⭐ you guys have given to Quant Trading. As you can see, I have devoted more and more energy to other repositories such as Graph Theory and Machine Learning. Take a tour and you won't be disappointed. Meanwhile, if you have any questions or thoughts, feel free to raise issues in the repository so we can start 💬 I genuinely enjoy conversations with people from diverse backgrounds, and they never cease to inspire me to develop new perspectives to tackle challenges in life 💪

web-scraping's People

Contributors

je-suis-tm

web-scraping's Issues

wallstreetbets

Hello, I have a problem with wallstreetbets.py. I am receiving the following error:

\Python36-32\lib\site-packages\urllib3\connectionpool.py:1020: InsecureRequestWarning: Unverified HTTPS request is being made to host 'new.reddit.com'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/1.26.x/advanced-usage.html#ssl-warnings
InsecureRequestWarning,

Do you have any idea how to fix it?
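The message above is a warning rather than an error, so the script normally keeps running. It usually appears when a request is sent with certificate verification switched off (in requests, that is `verify=False`); whether wallstreetbets.py does this is an assumption, not something confirmed in this thread. A minimal sketch of a verified HTTPS fetch using only the standard library, where `build_verified_context` and `fetch` are hypothetical helper names:

```python
import ssl
import urllib.request


def build_verified_context() -> ssl.SSLContext:
    # create_default_context() loads the system CA certificates and
    # turns on both certificate and hostname verification.
    return ssl.create_default_context()


def fetch(url: str, user_agent: str = "Mozilla/5.0") -> bytes:
    """Download url over HTTPS with certificate verification enabled."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    context = build_verified_context()
    with urllib.request.urlopen(request, context=context, timeout=30) as response:
        return response.read()
```

If the script uses requests, the equivalent change would likely be to drop `verify=False` from the call so the default verification applies.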

CME Doesn't work anymore...

Hi,
Thanks for your scripts. I'm trying to fork them to scrape CME for the ES, NQ and YM futures contracts. I just need the Globex open price, but it seems they can now detect when you are spoofing a user agent.

Edit: It works if you add this to your User-Agent headers:

    session.headers.update(
            { 'Accept-Language' : 'en-US,en;q=0.9',
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36'})

Error when trying CME2.py

Hello,
Love your work - trying to use it to learn from!

When I try to run CME2.py, I get the error: "string indices must be integers" & it points to the line output['prior settle']=[i['priorSettle'] for i in df['quotes']]

I am a novice when it comes to webscraping, but hoping you may be able to shed some light on the issue. Thanks so much!
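"string indices must be integers" means that something being indexed with `'priorSettle'` or `'quotes'` is a plain string rather than a dict. One possible cause (an assumption, since the actual response is not shown here) is that the site returned an error message or a differently shaped payload, so `df['quotes']` is a string and each `i` is a single character. A small sketch that checks the shape before indexing; `extract_prior_settle` and the sample payloads are hypothetical:

```python
import json


def extract_prior_settle(raw_text: str) -> list:
    """Return the priorSettle values, or raise ValueError on an unexpected shape."""
    # json.loads raises json.JSONDecodeError (a ValueError subclass)
    # when the body is not JSON at all, e.g. an HTML block page.
    payload = json.loads(raw_text)
    if not isinstance(payload, dict):
        raise ValueError(f"expected a JSON object, got {type(payload).__name__}")
    quotes = payload.get("quotes")
    if not isinstance(quotes, list):
        raise ValueError(f"'quotes' is {type(quotes).__name__}, not a list")
    return [quote["priorSettle"] for quote in quotes if isinstance(quote, dict)]
```

Printing the raw response text when the ValueError is raised should show what the server actually sent back.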

Question on Web Scraper

Hello there, I hope all is well.

Thank you so much for:

https://github.com/tattooday/web-scraping/blob/master/CME2.py

It's helping a lot, as I am basically looking for a Python scraper to scrape futures prices. Just one question for you.

In your code, you designate the code numbers for silver, gold, palladium and copper as:

458, 437, 445, and 438

I am curious where I would go to find these mappings, and where to find more commodities and their associated codes so I can scrape them too, such as wheat, oil, gold, etc.

Basically, where do I go on the web to see how silver is linked to 458, for example, and gold to 437, and where can I find the codes for other commodities?

I really appreciate your help in advance and hope to speak soon!

Best,

Sam

Volume - Front Month Incorrect

Hi!
Great work on this - thanks so much!
Wanted to point out one small thing - the 'front month' logic doesn't work as intended, as you're reading those in as a string with commas in the volume numbers. This means it doesn't find the true highest volume, rather the highest volume less than 1,000.

On my version, I slightly amended the list comprehension to read this:

    output['volume']=[i['volume'] for i in df['quotes']]
    output['volume'] = output['volume'].replace(',','', regex=True).astype(int)
    output['front month']=output['volume']==max(output['volume'])

This worked as intended, but there's probably a better way. Thanks again!
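The underlying problem can be reproduced without pandas: strings compare character by character, so "950" sorts above "1,200". The sketch below uses made-up volume figures, and `parse_volume` and `front_month_flags` are hypothetical helper names rather than anything from CME2.py:

```python
def parse_volume(text: str) -> int:
    """Convert a volume string such as '1,200' to the integer 1200."""
    return int(text.replace(",", ""))


def front_month_flags(volumes: list) -> list:
    """Flag the contract(s) with the highest numeric volume."""
    numeric = [parse_volume(volume) for volume in volumes]
    highest = max(numeric)  # raises ValueError if volumes is empty
    return [volume == highest for volume in numeric]


volumes = ["950", "1,200", "87"]  # illustrative values only
print(max(volumes))                # string comparison picks "950"
print(front_month_flags(volumes))  # numeric comparison picks "1,200"
```

Converting to integers once, before any comparison, keeps the front-month test independent of how the site formats its numbers.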

error for the wallstreetbets subreddit scraper

I had to disable the insecure request warning before running, as it showed an error.

    import urllib3
    urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

But after running, it throws this error:

    line 207, in main
        if val[0] == '$' and not val[1].isdigit():
    IndexError: string index out of range
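The check in the traceback assumes every `val` has at least two characters, so an empty string or a lone "$" raises IndexError. A minimal sketch of a length-guarded version; `is_dollar_word` is a hypothetical helper name, and how `val` is produced in the script is not shown here:

```python
def is_dollar_word(val: str) -> bool:
    """True for '$' followed by a non-digit, e.g. '$GME' but not '$100'."""
    # The length test runs first, so val[0] and val[1] are only read
    # when both positions exist.
    return len(val) >= 2 and val[0] == "$" and not val[1].isdigit()
```

Replacing the original condition with a call to such a helper would skip the short tokens instead of crashing on them.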

Open to collaborators

I've just found your codebase and there are a few things I would like to add if you are open to having collaborators. The documentation so far is really good. I'd like to improve on it and test other sites where web scraping has been disabled or no longer works.

Which of the available scrapers can filter by date?

Hi there,

I came across this repo and wanted to know which, if any, of the available scrapers can scrape by date. I noticed most of them have been combined into the MENA newsletter.py, and I do not see an option to search between dates in any of them.

I want to scrape a dataset from the last year about specific daily news from a handful of companies for a university project. One of the few resources I found was newsapi.org, but it is quite costly to search for news older than 1 month.
