curiouslearner / geeksforgeeksscrapper
Scrapes g4g and creates PDF
License: MIT License
The idea is to convert this into a g4g-scrapper package so that people can install it easily using pip.
Gives recognition to people who came forward to help with the project.
The current script that scrapes the website contains both Python and HTML code. As a result, scaling the project, i.e. modifying the HTML to redesign the web page and PDF document, may require modifying the script as well. It would be great to use a template rendering library like Jinja2 to render the HTML content, so that the HTML and Python parts can be separated.
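A minimal sketch of that separation, assuming Jinja2 is installed. The template name `article.html`, the `render_page` helper, and the `heading`/`body` fields are hypothetical and not taken from the existing script. The template is kept in a `DictLoader` only to make the example self-contained; in the project it would more likely be a separate `.html` file loaded with `FileSystemLoader`.

```python
from jinja2 import Environment, DictLoader

# Hypothetical template. In a real layout this would live in its own .html
# file, so the markup can be redesigned without touching the Python code.
TEMPLATES = {
    "article.html": (
        "<html><head><title>{{ title }}</title></head>\n"
        "<body>\n"
        "<h1>{{ title }}</h1>\n"
        "{% for article in articles %}"
        "<h2>{{ article.heading }}</h2>\n"
        "<div>{{ article.body }}</div>\n"
        "{% endfor %}"
        "</body></html>"
    )
}

# autoescape=True escapes plain-text values such as headings.
env = Environment(loader=DictLoader(TEMPLATES), autoescape=True)


def render_page(title, articles):
    """Render the page; the Python side only supplies data."""
    template = env.get_template("article.html")
    return template.render(title=title, articles=articles)


if __name__ == "__main__":
    print(render_page("Bit Magic", [{"heading": "Example", "body": "Text"}]))
```

With autoescaping on, scraped article bodies that are already HTML would be escaped too, so they would need to be marked as safe in the template; that choice is left open here.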
@CuriousLearner It would be great to extract the Featured Articles of GeeksforGeeks.org.
I think it will be easy for users.
Create a requirements.txt file for installing dependencies.
It will be helpful to navigate.
Scraping link no: 244 Link: http://www.geeksforgeeks.org/count-number-of-bits-to-be-flipped-to-convert-a-to-b/
Traceback (most recent call last):
  File "g4g.py", line 106, in <module>
    scrape_category(categoryUrl)
  File "g4g.py", line 82, in scrape_category
    link_soup = BeautifulSoup(requests.get(link).text)
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 65, in get
    return request('get', url, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 49, in request
    response = session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 461, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 567, in send
    adapter = self.get_adapter(url=request.url)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 646, in get_adapter
    raise InvalidSchema("No connection adapters were found for '%s'" % url)
requests.exceptions.InvalidSchema: No connection adapters were found for ' http://www.geeksforgeeks.org/count-number-of-bits-to-be-flipped-to-convert-a-to-b/'
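The URL in the last line of the traceback is quoted with a leading space. Since requests chooses a connection adapter by matching the start of the URL against prefixes such as `http://`, that space is a likely cause of the error. A possible fix, sketched under that assumption, is to strip each scraped link before fetching it. The helper names `clean_link` and `is_fetchable` are hypothetical and do not appear in `g4g.py`.

```python
def clean_link(href):
    """Strip surrounding whitespace so the URL starts with its scheme."""
    return href.strip()


def is_fetchable(url):
    """Return True if the URL starts with a scheme that requests handles by default."""
    return url.lower().startswith(("http://", "https://"))


if __name__ == "__main__":
    raw = " http://www.geeksforgeeks.org/count-number-of-bits-to-be-flipped-to-convert-a-to-b/"
    print(is_fetchable(raw), is_fetchable(clean_link(raw)))
```

In `scrape_category`, the link would then be passed through `clean_link` before the `requests.get(link)` call, and links that are still not fetchable could be skipped instead of stopping the whole run.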
The code base should be refactored to follow a directory structure and use proper naming conventions for variables.
It should also follow PEP8 for better readability.
Indicates: 0 links found
It would be great to add navigation links at the start of the HTML and PDF documents. The user does not know which articles were extracted, so a list of links would make the documents easier to navigate.
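One way this could look, as a rough sketch: build a list of in-page links from the article titles, and give each article a matching `id` so the links resolve in the HTML (and in the PDF, if the converter preserves internal links). The function name `build_nav` and the `article-N` anchor scheme are assumptions for illustration. The import is written for Python 3, whereas the traceback above comes from Python 2.7.

```python
from html import escape


def build_nav(titles):
    """Return (nav_html, anchors) for a list of article titles.

    Each anchor should be used as the id of the matching article heading.
    """
    items = []
    anchors = []
    for index, title in enumerate(titles, start=1):
        anchor = "article-%d" % index
        anchors.append(anchor)
        # Escape the title so characters like & or < do not break the markup.
        items.append('<li><a href="#%s">%s</a></li>' % (anchor, escape(title)))
    nav_html = "<ul>\n%s\n</ul>" % "\n".join(items)
    return nav_html, anchors


if __name__ == "__main__":
    nav, ids = build_nav(["Count bits to flip", "Swap & rotate"])
    print(nav)
    print(ids)
```

The returned anchors would be written into the document as, for example, `<h2 id="article-1">` above each extracted article.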