
Comments (6)

GoogleCodeExporter avatar GoogleCodeExporter commented on August 15, 2024
Can you provide a URL that contains the things you want to scrape?


Original comment by [email protected] on 17 Dec 2014 at 10:26

  • Changed state: Accepted

from crawler4j.

GoogleCodeExporter avatar GoogleCodeExporter commented on August 15, 2024
Thanks for the reply, avrah! Here is my aim: I want to crawl the domain
"http://www.sakshi.com" and extract all the iframe code, base64 code, etc.,
but only where it is present. I am quite sure that this domain contains iframes,
but I am not sure about the rest (base64, embed codes).

Original comment by [email protected] on 17 Dec 2014 at 12:06


GoogleCodeExporter avatar GoogleCodeExporter commented on August 15, 2024
The way you are trying to do it, it seems the iframe URLs will be taken and
added to the page's list of URLs. That seems OK, but I am not sure it is what
you want.

I think the best way for you to do it (if I understand your requirement) is to
use the visit() method, where you can get the HTML of every visited page and
extract the iframe code from the HTML string.
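
A minimal sketch of that extraction step, for illustration only. In crawler4j the HTML string would typically come from HtmlParseData.getHtml() inside visit(Page); here it is passed in as a plain parameter so the example runs without the library. The class name IframeExtractor and the regex are assumptions made for this example, not part of crawler4j, and a regex is only a heuristic compared with a real HTML parser such as jsoup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of crawler4j): pulls <iframe>...</iframe>
// blocks out of an HTML string.
final class IframeExtractor {

    // Case-insensitive; DOTALL lets the iframe body span several lines.
    // Only paired <iframe ...> ... </iframe> tags are matched.
    private static final Pattern IFRAME = Pattern.compile(
            "<iframe\\b[^>]*>.*?</iframe\\s*>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns every iframe element found in the given HTML, in document order.
    static List<String> extractIframes(String html) {
        List<String> found = new ArrayList<>();
        Matcher matcher = IFRAME.matcher(html);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Sample input invented for the demo.
        String html = "<html><body><p>text</p>"
                + "<IFRAME src=\"http://example.com/a\" width=\"1\"></IFRAME>"
                + "</body></html>";
        for (String iframe : extractIframes(html)) {
            System.out.println(iframe);
        }
    }
}
```

Inside a crawler, the returned list could be appended to a text file once per visited page, which is what the original poster describes doing.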


Does this help?


Original comment by [email protected] on 17 Dec 2014 at 12:26


GoogleCodeExporter avatar GoogleCodeExporter commented on August 15, 2024
Exactly! Extracting iframes from the HTML string is what I tried before
posting the issue, and I have attached the code that extracts the iframes and
saves the iframe code to a text file. The problem is that I know an iframe
starts with an <iframe tag and ends with an </iframe> tag, but for base64 code,
VBScript, and embed codes I do not understand how they start and end in the
HTML. That is why I am trying the HtmlContentHandler class. Can you please help
with that?

Original comment by [email protected] on 17 Dec 2014 at 12:44



GoogleCodeExporter avatar GoogleCodeExporter commented on August 15, 2024
To parse iframes, use these:
http://stackoverflow.com/questions/13646163/how-to-get-body-holding-the-content-of-iframe-in-java

http://stackoverflow.com/questions/26515383/jsoup-not-parsing-iframe-out-of-html



To try to parse anything else I need a solid example scenario: give me a URL
with that code and I will see how to parse it.

Without an example you can't even check whether it works.
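
Lacking a concrete URL, the following is only a sketch under stated assumptions. It assumes "base 64 codes" means data URIs of the form data:[mediatype];base64,<data> (RFC 2397), "embed codes" means <embed> or <object> elements, and "vb scripts" means <script> elements whose language or type attribute names VBScript. The class EmbeddedContentFinder and its method names are hypothetical, and the patterns are heuristics that would need checking against a real page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: heuristic patterns for content embedded in HTML.
final class EmbeddedContentFinder {

    private static final int FLAGS = Pattern.CASE_INSENSITIVE | Pattern.DOTALL;

    // data URI with a base64 payload, e.g. data:image/png;base64,AAAA==
    private static final Pattern BASE64_DATA_URI = Pattern.compile(
            "data:[a-z0-9.+/-]*(?:;[a-z0-9=._-]+)*;base64,[A-Za-z0-9+/]+=*", FLAGS);

    // <embed ...> (no closing tag) or <object ...> ... </object>
    private static final Pattern EMBED_OR_OBJECT = Pattern.compile(
            "<embed\\b[^>]*>|<object\\b[^>]*>.*?</object\\s*>", FLAGS);

    // <script language="VBScript"> or <script type="text/vbscript"> blocks
    private static final Pattern VBSCRIPT = Pattern.compile(
            "<script\\b[^>]*\\b(?:language|type)\\s*=\\s*[\"']?(?:text/)?vbscript[\"']?"
            + "[^>]*>.*?</script\\s*>", FLAGS);

    static List<String> findBase64DataUris(String html) {
        return findAll(BASE64_DATA_URI, html);
    }

    static List<String> findEmbeds(String html) {
        return findAll(EMBED_OR_OBJECT, html);
    }

    static List<String> findVbScripts(String html) {
        return findAll(VBSCRIPT, html);
    }

    // Collects every match of the pattern, in document order.
    private static List<String> findAll(Pattern pattern, String html) {
        List<String> found = new ArrayList<>();
        Matcher matcher = pattern.matcher(html);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }
}
```

Each method takes the same HTML string that visit() exposes, so the three checks can run side by side with the iframe extraction on every visited page.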

Original comment by [email protected] on 17 Dec 2014 at 12:49


GoogleCodeExporter avatar GoogleCodeExporter commented on August 15, 2024
Invalid, as the discussion stopped and the need is probably gone.

Original comment by [email protected] on 22 Jan 2015 at 11:42

  • Changed state: Invalid

