Comments (6)
Copied the scraper here. It still needs significant changes: the other page looked wildly different, even though the content was the same.
from colly-draft-prospects.
Some changes have been made, but the scraper is still in a debug state. There are two tables on the web page and we only need data from the second one, but we're getting data from both. I'm sure it has something to do with how c.OnHTML("tr td:nth-of-type(4)", ...) and the similar collectors work: that selector matches the fourth cell of every row on the page, whichever table the row belongs to. I don't have much experience with the DOM, so I'll need to do some more troubleshooting.
#2 is a great start. It looks like OnHTML(...) uses GoQuery, so I'll need to read up on that.
Link to GoQuery:
https://github.com/PuerkitoBio/goquery
#3 adds a version of the scraper that uses the html element without goquery, since it appears that goquery may not be necessary to complete this task.
This article shows an implementation that uses colly both with goquery and with the plain html element:
https://benjamincongdon.me/blog/2018/03/01/Scraping-the-Web-in-Golang-with-Colly-and-Goquery/
#4 takes the data obtained by collectors like c.OnHTML("tr td:nth-of-type(4)", ...) and puts it into slices. Rather than trying to filter the data as it goes into a slice, we can assume that the only garbage data is at the start of the slice, so if we look for the category header, we may be able to just write everything that follows it to the CSV.
This is fixed by #5.