crawljax / crawljax
License: Apache License 2.0
Original author: [email protected] (September 10, 2010 12:16:25)
At the end of a crawling session, long exception messages are logged, for example about browsers that could not be closed properly.
This usually happens when a maximum timeout or maximum number of states is defined in the crawl specification.
I think it is best to catch such exceptions and log more user-friendly messages.
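A hedged sketch of what such friendlier logging could look like; ShutdownHelper and the message wording are illustrative names, not existing Crawljax code.

```java
// Hedged sketch (not existing Crawljax code): reduce a shutdown exception to
// one friendly line instead of a full stack trace.
class ShutdownHelper {
    static String friendlyMessage(Exception e) {
        // Keep only the first line of the exception message.
        String msg = e.getMessage();
        String firstLine = (msg == null) ? "(no details)" : msg.split("\n", 2)[0];
        return "Could not close the browser cleanly: " + firstLine
                + " (usually harmless after a timeout or max-states stop)";
    }

    static void closeBrowserQuietly(AutoCloseable browser) {
        try {
            browser.close();
        } catch (Exception e) {
            System.err.println(friendlyMessage(e));
        }
    }
}
```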
Original issue: http://code.google.com/p/crawljax/issues/detail?id=33
Original author: [email protected] (January 06, 2011 11:09:45)
What steps will reproduce the problem?
What is the expected output? What do you see instead?
It should give at most 2 states; however, it gives me 3 states when I have the following line:
"<a> Test</a>"
Manually browsing the application yields only 2 states, since the anchor tag has no "href" attribute.
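The check the reporter expects could look like this hedged sketch: an anchor with neither an href nor an onclick handler normally cannot trigger a state change. AnchorFilter is a hypothetical name, and real candidate extraction works on DOM elements rather than an attribute map.

```java
import java.util.Map;

// Illustrative check only: real candidate extraction in Crawljax works on DOM
// elements, not an attribute map, and AnchorFilter is a hypothetical name.
class AnchorFilter {
    // An <a> with neither href nor onclick normally cannot change state,
    // so it should not be counted as a clickable candidate.
    static boolean isClickCandidate(Map<String, String> attributes) {
        return attributes.containsKey("href") || attributes.containsKey("onclick");
    }
}
```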
What version of the product are you using? On what operating system?
Crawljax 1.9
Windows XP
IE 7
Please provide any additional information below.
Original issue: http://code.google.com/p/crawljax/issues/detail?id=40
Original author: [email protected] (May 29, 2010 18:55:10)
Trying to run Crawljax 1.9 with a Java class on the web page "www.google.it", with 'clickDefaultElements' set, I got this error:
The error is returned after it fires on the element 'iGoogle'.
My main class:
package Test;

import com.crawljax.browser.EmbeddedBrowser.BrowserType;
import com.crawljax.core.CrawljaxController;
import com.crawljax.core.CrawljaxException;
import com.crawljax.core.configuration.CrawlSpecification;
import com.crawljax.core.configuration.CrawljaxConfiguration;

public final class Test1_9 {
    public static void main(String[] args) {
        CrawlSpecification crawler = new CrawlSpecification("http://www.google.it");
        crawler.clickDefaultElements();
        crawler.setRandomInputInForms(true);
        crawler.setMaximumStates(40);
        crawler.setDepth(1);
        CrawljaxConfiguration config = new CrawljaxConfiguration();
        config.setCrawlSpecification(crawler);
        config.setBrowser(BrowserType.firefox);
        try {
            CrawljaxController crawljax = new CrawljaxController(config);
            crawljax.run();
        } catch (org.apache.commons.configuration.ConfigurationException e) {
            e.printStackTrace();
            System.exit(1);
        } catch (CrawljaxException e) {
            e.printStackTrace();
            System.exit(1);
        }
    }
}
Best regards,
Barbara Farina
Original issue: http://code.google.com/p/crawljax/issues/detail?id=25
Original author: frankgroeneveld (February 03, 2010 13:49:50)
I keep having problems like this:
There is a page with a link <a href="#">something</a>
So, I do this in my configuration:
crawler.lookFor("a").withAttribute("href", "#");
The log looks like this:
Why is there nothing printed behind "TAG"? If I change my code to:
crawler.lookFor("a").withAttribute("href", "%");
The links are found just fine and this is the log:
I've been trying to debug this, but I can't find the problem.
Original issue: http://code.google.com/p/crawljax/issues/detail?id=7
Original author: [email protected] (December 22, 2010 09:20:36)
What steps will reproduce the problem?
import com.crawljax.browser.EmbeddedBrowser.BrowserType;
import com.crawljax.core.CrawljaxController;
import com.crawljax.core.configuration.CrawlSpecification;
import com.crawljax.core.configuration.CrawljaxConfiguration;
public class TestCrawljax {
    public static void main(String[] args) {
        CrawlSpecification crawler = new CrawlSpecification("http://crawljax.com");
        crawler.clickDefaultElements();
        crawler.setRandomInputInForms(true);
        crawler.setWaitTimeAfterEvent(5000);
        crawler.setWaitTimeAfterReloadUrl(5000);
        CrawljaxConfiguration config = new CrawljaxConfiguration();
        config.setCrawlSpecification(crawler);
        config.setBrowser(BrowserType.ie);
        try {
            CrawljaxController crawljax = new CrawljaxController(config);
            crawljax.run();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
What is the expected output? What do you see instead?
It should crawl http://crawljax.com properly in IE; however, in IE it does not crawl, whereas it works perfectly in Firefox.
What version of the product are you using? On what operating system?
Crawljax 1.0
Windows XP
IE 6.0
Please provide any additional information below.
Here is the Crawljax output:
Starting Crawljax...
Used plugins:
No plugins loaded because CrawljaxConfiguration is empty
Embedded browser implementation: ie
Crawl depth: 0
Crawljax initialized!
Running PreCrawlingPlugins...
Loading Page http://crawljax.com
Running OnUrlLoadPlugins...
Running OnNewStatePlugins...
Start crawling with 4 crawl elements
Looking in state: index for candidate elements with
TAG: A
Found new candidate element: A: href=http://crawljax.com/download/ xpath /HTML[1]/BODY[1]/DIV[2]/DIV[1]/DIV[1]/DIV[1]/DIV[2]/P[1]/A[1]
Found new candidate element: A: href=http://crawljax.com/documentation/changes/ xpath /HTML[1]/BODY[1]/DIV[2]/DIV[1]/DIV[1]/DIV[1]/DIV[2]/P[1]/A[2]
Found new candidate element: A: href=documentation/writing-plugins/ xpath /HTML[1]/BODY[1]/DIV[2]/DIV[1]/DIV[1]/DIV[1]/DIV[2]/UL[1]/LI[5]/STRONG[1]/A[1]
Found new candidate element: A: href=/download/ valign=middle xpath /HTML[1]/BODY[1]/DIV[2]/DIV[1]/DIV[1]/DIV[2]/DIV[1]/DIV[1]/DIV[1]/P[1]/A[1]
Found new candidate element: A: href=/download/ xpath /HTML[1]/BODY[1]/DIV[2]/DIV[1]/DIV[1]/DIV[2]/DIV[1]/DIV[1]/DIV[1]/P[1]/A[2]
Found new candidate element: A: href=/documentation/ valign=middle xpath /HTML[1]/BODY[1]/DIV[2]/DIV[1]/DIV[1]/DIV[2]/DIV[1]/DIV[1]/DIV[1]/P[2]/A[1]
Found new candidate element: A: href=/documentation/ xpath /HTML[1]/BODY[1]/DIV[2]/DIV[1]/DIV[1]/DIV[2]/DIV[1]/DIV[1]/DIV[1]/P[2]/A[2]
Found new candidate element: A: href=http://crawljax.com/wp-login.php rel=nofollow xpath /HTML[1]/BODY[1]/DIV[2]/DIV[1]/DIV[1]/DIV[2]/DIV[2]/DIV[1]/UL[1]/LI[1]/A[1]
Found new candidate element: A: href=# id=gotop onclick=MGJS.goTop();return false; xpath /HTML[1]/BODY[1]/DIV[2]/DIV[1]/DIV[1]/DIV[4]/A[1]
TAG: BUTTON
TAG: INPUT: type="submit"
TAG: INPUT: type="button"
Found 9 new candidate elements to analyze!
Running PreStateCrawlingPlugins...
Executing click on element: "crawljax-2.0 release" A: href="http://crawljax.com/download/" click xpath /HTML[1]/BODY[1]/DIV[2]/DIV[1]/DIV[1]/DIV[1]/DIV[2]/P[1]/A[1]; State: index
Could not fire eventable: "crawljax-2.0 release" A: href="http://crawljax.com/download/" click xpath /HTML[1]/BODY[1]/DIV[2]/DIV[1]/DIV[1]/DIV[1]/DIV[2]/P[1]/A[1]
Running OnFireEventFailedPlugins...
Executing click on element: "improvements" A: href="http://crawljax.com/documentation/changes/" click xpath /HTML[1]/BODY[1]/DIV[2]/DIV[1]/DIV[1]/DIV[1]/DIV[2]/P[1]/A[2]; State: index
Could not fire eventable: "improvements" A: href="http://crawljax.com/documentation/changes/" click xpath /HTML[1]/BODY[1]/DIV[2]/DIV[1]/DIV[1]/DIV[1]/DIV[2]/P[1]/A[2]
Running OnFireEventFailedPlugins...
Executing click on element: "plugin" A: href="documentation/writing-plugins/" click xpath /HTML[1]/BODY[1]/DIV[2]/DIV[1]/DIV[1]/DIV[1]/DIV[2]/UL[1]/LI[5]/STRONG[1]/A[1]; State: index
Could not fire eventable: "plugin" A: href="documentation/writing-plugins/" click xpath /HTML[1]/BODY[1]/DIV[2]/DIV[1]/DIV[1]/DIV[1]/DIV[2]/UL[1]/LI[5]/STRONG[1]/A[1]
Running OnFireEventFailedPlugins...
Executing click on element: A: href="/download/" valign="middle" click xpath /HTML[1]/BODY[1]/DIV[2]/DIV[1]/DIV[1]/DIV[2]/DIV[1]/DIV[1]/DIV[1]/P[1]/A[1]; State: index
Could not fire eventable: A: href="/download/" valign="middle" click xpath /HTML[1]/BODY[1]/DIV[2]/DIV[1]/DIV[1]/DIV[2]/DIV[1]/DIV[1]/DIV[1]/P[1]/A[1]
Running OnFireEventFailedPlugins...
Executing click on element: "Download now!" A: href="/download/" click xpath /HTML[1]/BODY[1]/DIV[2]/DIV[1]/DIV[1]/DIV[2]/DIV[1]/DIV[1]/DIV[1]/P[1]/A[2]; State: index
Could not fire eventable: "Download now!" A: href="/download/" click xpath /HTML[1]/BODY[1]/DIV[2]/DIV[1]/DIV[1]/DIV[2]/DIV[1]/DIV[1]/DIV[1]/P[1]/A[2]
Running OnFireEventFailedPlugins...
Executing click on element: A: href="/documentation/" valign="middle" click xpath /HTML[1]/BODY[1]/DIV[2]/DIV[1]/DIV[1]/DIV[2]/DIV[1]/DIV[1]/DIV[1]/P[2]/A[1]; State: index
Could not fire eventable: A: href="/documentation/" valign="middle" click xpath /HTML[1]/BODY[1]/DIV[2]/DIV[1]/DIV[1]/DIV[2]/DIV[1]/DIV[1]/DIV[1]/P[2]/A[1]
Running OnFireEventFailedPlugins...
Executing click on element: "Documentation" A: href="/documentation/" click xpath /HTML[1]/BODY[1]/DIV[2]/DIV[1]/DIV[1]/DIV[2]/DIV[1]/DIV[1]/DIV[1]/P[2]/A[2]; State: index
Could not fire eventable: "Documentation" A: href="/documentation/" click xpath /HTML[1]/BODY[1]/DIV[2]/DIV[1]/DIV[1]/DIV[2]/DIV[1]/DIV[1]/DIV[1]/P[2]/A[2]
Running OnFireEventFailedPlugins...
Executing click on element: "Log in" A: href="http://crawljax.com/wp-login.php" rel="nofollow" click xpath /HTML[1]/BODY[1]/DIV[2]/DIV[1]/DIV[1]/DIV[2]/DIV[2]/DIV[1]/UL[1]/LI[1]/A[1]; State: index
Could not fire eventable: "Log in" A: href="http://crawljax.com/wp-login.php" rel="nofollow" click xpath /HTML[1]/BODY[1]/DIV[2]/DIV[1]/DIV[1]/DIV[2]/DIV[2]/DIV[1]/UL[1]/LI[1]/A[1]
Running OnFireEventFailedPlugins...
Executing click on element: "Top" A: href="#" id="gotop" onclick="MGJS.goTop();return false;" click xpath /HTML[1]/BODY[1]/DIV[2]/DIV[1]/DIV[1]/DIV[4]/A[1]; State: index
Could not fire eventable: "Top" A: href="#" id="gotop" onclick="MGJS.goTop();return false;" click xpath /HTML[1]/BODY[1]/DIV[2]/DIV[1]/DIV[1]/DIV[4]/A[1]
Running OnFireEventFailedPlugins...
Finished executing
All Crawlers finished executing, now shutting down
CrawlerExecutor terminated
Closing the browser...
Total Crawling time(11985ms) ~= 0 min, 16 sec
EXAMINED ELEMENTS: 9
CLICKABLES: 0
STATES: 1
Dom average size (byte): 13848
Running PostCrawlingPlugins...
DONE!!!
Original issue: http://code.google.com/p/crawljax/issues/detail?id=37
Original author: [email protected] (February 01, 2010 19:13:19)
Crawljax should not be restricted only to the "click" event type.
WebDriver has also a "hover" functionality, which we should support.
The way to proceed would be to link each candidate element to its
EventType. This requires, however, a radical refactoring in the way
candidate elements are created and examined.
So eventually we will have:
crawler.click("a");
crawler.dontClick("a").withText("logout");
crawler.hover("div");
crawler.dontHover("a").withText("I have a mouseoverEvent");
Original issue: http://code.google.com/p/crawljax/issues/detail?id=6
Original author: frankgroeneveld (March 17, 2010 13:48:01)
Find out if Helper.getXpathExpression can be replaced with XPathHelper.getSpecificXpathExpression without breaking any core functionality. I believe some tests will need to be modified for this to happen.
One downside of this change is that this method is a little slower.
Please comment on this whether or not we should do this.
Original issue: http://code.google.com/p/crawljax/issues/detail?id=13
Original author: [email protected] (August 18, 2010 07:37:48)
What steps will reproduce the problem?
Checking for "://" instead of checking the link's prefix would be more fault-tolerant.
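The suggested check could be as simple as this sketch (LinkUtil is a hypothetical name; real link handling would also need to consider protocol-relative and malformed URLs):

```java
// A minimal sketch of the suggested check; LinkUtil is a hypothetical name.
class LinkUtil {
    // A link is treated as absolute when it contains "://", regardless of
    // scheme, instead of matching a fixed prefix such as "http://localhost".
    static boolean isAbsoluteUrl(String href) {
        return href != null && href.contains("://");
    }
}
```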
Thank you.
This is super awesome project.
I love this too much, already.
Keep up the miraculous work!
Original issue: http://code.google.com/p/crawljax/issues/detail?id=31
Original author: [email protected] (April 20, 2010 22:10:07)
What steps will reproduce the problem?
1. Every isEquivalent() function of the comparators
2. The compare(String originalDom, String newDom, EmbeddedBrowser browser) function is always called with the originalDom string empty.
Please provide any additional information below.
I found this issue while using EditDistance. Every time EditDistance is called, the first string is empty.
I have also used other comparators, and the result is the same.
I think the problem is in the compare() function of the StateComparator class.
Can you tell me if that is right?
If not, can you tell me why?
Thank you
Original issue: http://code.google.com/p/crawljax/issues/detail?id=19
Original author: frankgroeneveld (February 23, 2010 10:34:28)
When running mvn test, testCorrectNamesMultiThread fails with the following error:
Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.352 sec <<< FAILURE!
testCorrectNamesMultiThread(com.crawljax.core.CrawlerExecutorTest) Time elapsed: 0.141 sec <<< FAILURE!
junit.framework.AssertionFailedError: Thread 1 ok
at junit.framework.Assert.fail(Assert.java:47)
at junit.framework.Assert.assertTrue(Assert.java:20)
at com.crawljax.core.CrawlerExecutorTest.testCorrectNamesMultiThread(CrawlerExecutorTest.java:66)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.junit.internal.runners.TestMethodRunner.executeMethodBody(TestMethodRunner.java:99)
at org.junit.internal.runners.TestMethodRunner.runUnprotected(TestMethodRunner.java:81)
at org.junit.internal.runners.BeforeAndAfterRunner.runProtected(BeforeAndAfterRunner.java:34)
at org.junit.internal.runners.TestMethodRunner.runMethod(TestMethodRunner.java:75)
at org.junit.internal.runners.TestMethodRunner.run(TestMethodRunner.java:45)
at org.junit.internal.runners.TestClassMethodsRunner.invokeTestMethod(TestClassMethodsRunner.java:71)
at org.junit.internal.runners.TestClassMethodsRunner.run(TestClassMethodsRunner.java:35)
at org.junit.internal.runners.TestClassRunner$1.runUnprotected(TestClassRunner.java:42)
at org.junit.internal.runners.BeforeAndAfterRunner.runProtected(BeforeAndAfterRunner.java:34)
at org.junit.internal.runners.TestClassRunner.run(TestClassRunner.java:52)
at org.apache.maven.surefire.junit4.JUnit4TestSet.execute(JUnit4TestSet.java:62)
at org.apache.maven.surefire.suite.AbstractDirectoryTestSuite.executeTestSet(AbstractDirectoryTestSuite.java:140)
at org.apache.maven.surefire.suite.AbstractDirectoryTestSuite.execute(AbstractDirectoryTestSuite.java:127)
at org.apache.maven.surefire.Surefire.run(Surefire.java:177)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.maven.surefire.booter.SurefireBooter.runSuitesInProcess(SurefireBooter.java:345)
at org.apache.maven.surefire.booter.SurefireBooter.main(SurefireBooter.java:1009)
Original issue: http://code.google.com/p/crawljax/issues/detail?id=10
Original author: [email protected] (January 06, 2011 17:11:30)
What steps will reproduce the problem?
What is the expected output? What do you see instead?
Let's say we have the following anchor tag (fcn14736 is my machine name and 10.34.6.181 is my IP address) in our HTML page:
<a href="http://fcn14736:8080/sampleApp/HelloWorld.html">HelloWorld</a>
Or
<a href="http://10.34.6.181:8080/sampleApp/HelloWorld.html">HelloWorld</a>
Then Crawljax does not crawl that link in either IE or Firefox; however, with the following link it crawls properly:
<a href="http://localhost:8080/sampleApp/HelloWorld.html">HelloWorld</a>
What version of the product are you using? On what operating system?
Crawljax 1.9
Windows XP
IE 7
Firefox 3.0
Please provide any additional information below.
I have also tried it directly with WebDriver, where it works fine.
Original issue: http://code.google.com/p/crawljax/issues/detail?id=41
Original author: [email protected] (December 21, 2010 06:08:20)
What steps will reproduce the problem?
What is the expected output? What do you see instead?
It should open an IE window and crawl the website; however, it does not open the IE window as long as another IE window is already open. If the previously opened IE window is closed, it starts working perfectly. This does not happen with Firefox and Chrome.
What version of the product are you using? On what operating system?
Crawljax 2.0
Windows XP
IE 6.0
Firefox 3.0
Chrome 8.0
Please provide any additional information below.
Original issue: http://code.google.com/p/crawljax/issues/detail?id=35
Original author: [email protected] (February 28, 2011 11:58:20)
What steps will reproduce the problem?
1. Crawled using the URL http://my.fp055.qa.ebay.com/ws/eBayISAPI.dll?MyEbayBeta&MyeBay=&guest=1
2. User login credentials given: username us_seller_bsc, password "password"
3. It crawls a few pages
What is the expected output? What do you see instead?
Expected it to click on the found events, but even after examining some 6 elements it reports CLICKABLES: 0.
What version of the product are you using? On what operating system?
2.0
Please provide any additional information below.
Original issue: http://code.google.com/p/crawljax/issues/detail?id=46
Original author: [email protected] (December 21, 2010 06:19:12)
What steps will reproduce the problem?
What is the expected output? What do you see instead?
It should open an IE window and crawl the website without closing the previously opened IE window; however, while crawling, Crawljax closes the previously opened IE window. Because of that we cannot run multiple sessions of Crawljax on the same machine. This does not happen with Firefox.
What version of the product are you using? On what operating system?
Crawljax 1.9
Windows XP
IE 6.0
Firefox 3.0
Please provide any additional information below.
Original issue: http://code.google.com/p/crawljax/issues/detail?id=36
Original author: [email protected] (February 05, 2010 19:21:12)
I think it is a good idea to include more information in CrawljaxException.
Things like the OS, browser, version, and number of threads can be very useful for debugging.
Original issue: http://code.google.com/p/crawljax/issues/detail?id=9
Original author: [email protected] (January 28, 2010 08:13:13)
Original issue: http://code.google.com/p/crawljax/issues/detail?id=4
Original author: frankgroeneveld (June 15, 2010 12:16:24)
getBrowser on a session always returns null in an OnNewStatePlugin (or at least when going from the initial state to the first state).
Original issue: http://code.google.com/p/crawljax/issues/detail?id=26
Original author: [email protected] (March 28, 2010 22:17:56)
I have a simple question for you...
I don't understand how (and what parts of) two DOMs are compared during crawling...
Can you help me?
Thank you
Original issue: http://code.google.com/p/crawljax/issues/detail?id=15
Original author: frankgroeneveld (May 19, 2010 13:31:55)
Element names in XPath expressions shouldn't need to be in uppercase.
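A sketch of one way to accept lowercase names: normalize simple absolute XPath expressions before comparison, assuming paths of the form /name[i]/name[j] with positional predicates only (no attributes or functions). XPathUtil is a hypothetical name, not the Crawljax helper.

```java
import java.util.Locale;

// Sketch under the assumption of simple absolute paths of the form
// /name[i]/name[j]... (no attributes, no functions). XPathUtil is a
// hypothetical name, not the Crawljax helper.
class XPathUtil {
    static String normalize(String xpath) {
        StringBuilder sb = new StringBuilder();
        for (String step : xpath.split("/")) {
            if (step.isEmpty()) {
                continue; // the leading slash produces an empty first segment
            }
            int bracket = step.indexOf('[');
            String name = bracket >= 0 ? step.substring(0, bracket) : step;
            String predicate = bracket >= 0 ? step.substring(bracket) : "";
            sb.append('/').append(name.toUpperCase(Locale.ROOT)).append(predicate);
        }
        return sb.toString();
    }
}
```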
Original issue: http://code.google.com/p/crawljax/issues/detail?id=21
Original author: [email protected] (January 24, 2011 15:57:44)
What steps will reproduce the problem?
What is the expected output? What do you see instead?
Expected it to click all the anchor tags in the page, but only a specific set of links is clicked.
What version of the product are you using? On what operating system?
I am currently using Crawljax 2.0 on Windows XP, running the crawl in Firefox. (Later I observed the same result in IE as well.)
Please provide any additional information below.
Attached is the output of the states captured using the CrawlOverview plugin.
Let me know if any more information is required to debug/reproduce the issue.
Thanks
Sharath.
Original issue: http://code.google.com/p/crawljax/issues/detail?id=43
Original author: frankgroeneveld (May 19, 2010 15:12:41)
Check whether dropdown boxes are set by text or value
Original issue: http://code.google.com/p/crawljax/issues/detail?id=23
Original author: [email protected] (December 27, 2010 06:42:32)
What steps will reproduce the problem?
import com.crawljax.browser.EmbeddedBrowser.BrowserType;
import com.crawljax.core.CrawljaxController;
import com.crawljax.core.configuration.CrawlSpecification;
import com.crawljax.core.configuration.CrawljaxConfiguration;
public class TestCrawljax {
    public static void main(String[] args) {
        CrawlSpecification crawler = new CrawlSpecification("http://localhost:8080/iframe/index.html");
        crawler.clickDefaultElements();
        crawler.setRandomInputInForms(true);
        crawler.setWaitTimeAfterEvent(1000);
        crawler.setWaitTimeAfterReloadUrl(1000);
        crawler.dontClick("input");
        CrawljaxConfiguration config = new CrawljaxConfiguration();
        config.setCrawlSpecification(crawler);
        //config.setBrowser(BrowserType.ie);
        try {
            CrawljaxController crawljax = new CrawljaxController(config);
            crawljax.run();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
What is the expected output? What do you see instead?
It should not click on input buttons; however, it does click on them.
What version of the product are you using? On what operating system?
Crawljax 1.9
Windows XP
Please provide any additional information below.
Am I missing anything here?
Original issue: http://code.google.com/p/crawljax/issues/detail?id=39
Original author: [email protected] (April 21, 2010 08:48:32)
The website should be updated so that it is clear for external users how
they can use the API to add and use Invariants and Comparators, with some
examples for each.
Original issue: http://code.google.com/p/crawljax/issues/detail?id=20
Original author: [email protected] (December 24, 2010 04:57:38)
What steps will reproduce the problem?
What is the expected output? What do you see instead?
Crawljax should successfully crawl that external site and connect to the internet using the Firefox proxy settings.
However, it opens the Firefox window and displays a "Page not found" error. When I stopped Crawljax in between and checked the proxy settings, I found that the window does not have the proxy settings present in my Firefox configuration. If I open a Firefox window separately, it does have the proxy settings.
What version of the product are you using? On what operating system?
Windows 7 (64 bit)
Crawljax 1.9
Firefox 3.0
Please provide any additional information below.
Do I need to do anything else so that Crawljax will use the proxy settings in Firefox? I have not faced this issue in IE 8.0.
Original issue: http://code.google.com/p/crawljax/issues/detail?id=38
Original author: [email protected] (January 31, 2011 08:52:23)
What steps will reproduce the problem?
What is the expected output? What do you see instead?
The crawling threads need to be killed and control needs to return to the calling program. Instead, only the thread where the exception was raised is killed, and the rest of the spawned threads stay in the running state forever, so control never returns to the caller.
What version of the product are you using? On what operating system?
Currently using Crawljax 2.0 on Windows XP.
Please provide any additional information below.
The log files are attached.
Original issue: http://code.google.com/p/crawljax/issues/detail?id=44
Original author: [email protected] (January 10, 2011 15:19:26)
What steps will reproduce the problem?
import com.crawljax.browser.EmbeddedBrowser.BrowserType;
import com.crawljax.core.CrawljaxController;
import com.crawljax.core.configuration.CrawlSpecification;
import com.crawljax.core.configuration.CrawljaxConfiguration;

public class TestCrawljax {
    public static void main(String[] args) {
        CrawlSpecification crawler = new CrawlSpecification("http://localhost:8080/sampleApp");
        crawler.setClickOnce(false);
        crawler.setRandomInputInForms(false);
        crawler.clickDefaultElements();
        crawler.setWaitTimeAfterEvent(1000);
        crawler.setWaitTimeAfterReloadUrl(1000);
        CrawljaxConfiguration config = new CrawljaxConfiguration();
        config.setCrawlSpecification(crawler);
        //config.setBrowser(BrowserType.ie);
        try {
            CrawljaxController crawljax = new CrawljaxController(config);
            crawljax.run();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
What is the expected output? What do you see instead?
It should click the single link on the index page. However, in Firefox it does not click that link, whereas in IE it does.
What version of the product are you using? On what operating system?
Crawljax 2.0
Windows XP
Firefox 3.0
IE 7
Please provide any additional information below.
I have also tried the same thing with WebDriver using the following program, where it worked fine.
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;

public class TestWebDriver {
    public static void main(String[] args) throws Exception {
        WebDriver driver = new FirefoxDriver();
        driver.get("http://localhost:8080/sampleApp");
        driver.findElement(By.linkText("HelloWorld")).click();
        driver.close();
    }
}
Here is the Crawljax output for both Firefox & IE.
Firefox output:
Starting Crawljax...
Used plugins:
No plugins loaded because CrawljaxConfiguration is empty
Embedded browser implementation: firefox
Number of threads: 1
Crawl depth: 0
Crawljax initialized!
Start crawling with 4 crawl elements
Starting new Crawler: Thread 1 Crawler 1 (initial)
Running PreCrawlingPlugins...
Running OnBrowserCreatedPlugins...
Loading Page http://localhost:8080/sampleApp
Running OnUrlLoadPlugins...
Running OnNewStatePlugins...
Looking in state: index for candidate elements with
TAG: A
Found new candidate element: A: href=HelloWorld.html xpath /HTML[1]/BODY[1]/TABLE[1]/FORM[1]/TBODY[1]/TR[1]/TD[1]/A[1]
TAG: BUTTON
TAG: INPUT: type="submit"
TAG: INPUT: type="button"
Found 1 new candidate elements to analyze!
Starting preStateCrawlingPlugins...
Running PreStateCrawlingPlugins...
Executing click on element: "HelloWorld" A: href="HelloWorld.html" click xpath /HTML[1]/BODY[1]/TABLE[1]/FORM[1]/TBODY[1]/TR[1]/TD[1]/A[1]; State: index
Could not fire eventable: "HelloWorld" A: href="HelloWorld.html" click xpath /HTML[1]/BODY[1]/TABLE[1]/FORM[1]/TBODY[1]/TR[1]/TD[1]/A[1]
Running OnFireEventFailedPlugins...
Finished executing
All Crawlers finished executing, now shutting down
CrawlerExecutor terminated
Running PostCrawlingPlugins...
Total Crawling time(14390ms) ~= 0 min, 14 sec
EXAMINED ELEMENTS: 1
CLICKABLES: 0
STATES: 1
Dom average size (byte): 346
DONE!!!
IE output:
Starting Crawljax...
Used plugins:
No plugins loaded because CrawljaxConfiguration is empty
Embedded browser implementation: ie
Number of threads: 1
Crawl depth: 0
Crawljax initialized!
Start crawling with 4 crawl elements
Starting new Crawler: Thread 1 Crawler 1 (initial)
Running PreCrawlingPlugins...
Running OnBrowserCreatedPlugins...
Loading Page http://localhost:8080/sampleApp
Running OnUrlLoadPlugins...
Running OnNewStatePlugins...
Looking in state: index for candidate elements with
TAG: A
Found new candidate element: A: href=HelloWorld.html xpath /HTML[1]/BODY[1]/TABLE[1]/FORM[1]/TBODY[1]/TR[1]/TD[1]/A[1]
TAG: BUTTON
TAG: INPUT: type="submit"
TAG: INPUT: type="button"
Found 1 new candidate elements to analyze!
Starting preStateCrawlingPlugins...
Running PreStateCrawlingPlugins...
Executing click on element: "HelloWorld" A: href="HelloWorld.html" click xpath /HTML[1]/BODY[1]/TABLE[1]/FORM[1]/TBODY[1]/TR[1]/TD[1]/A[1]; State: index
Dom is Changed!
Correcting state name from state2 to state1
State state1 added to the StateMachine.
StateMachine's Pointer changed to: state1
StateMachine's Pointer changed to: state1 FROM index
Running OnNewStatePlugins...
Running GuidedCrawlingPlugins...
RECURSIVE Call crawl; Current DEPTH= 1
Looking in state: state1 for candidate elements with
TAG: A
TAG: BUTTON
TAG: INPUT: type="submit"
TAG: INPUT: type="button"
Found 0 new candidate elements to analyze!
StateMachine's Pointer changed to: index
Finished executing
All Crawlers finished executing, now shutting down
CrawlerExecutor terminated
Running PostCrawlingPlugins...
Interaction Element= "HelloWorld" A: href="HelloWorld.html" click xpath /HTML[1]/BODY[1]/TABLE[1]/FORM[1]/TBODY[1]/TR[1]/TD[1]/A[1]
Total Crawling time(12375ms) ~= 0 min, 12 sec
EXAMINED ELEMENTS: 1
CLICKABLES: 1
STATES: 2
Dom average size (byte): 289
DONE!!!
Original issue: http://code.google.com/p/crawljax/issues/detail?id=42
Original author: [email protected] (February 22, 2011 09:17:15)
When I give the URL "http://my.qa.ebay.com/ws/eBayISAPI.dll?MyEbay&gbh=1" with specific login input for username and password, it throws the error given below after crawling 2 to 3 pages.
Exception in thread "Thread 1 Crawler 1 (initial)" java.lang.NoSuchMethodError: org.apache.commons.lang.builder.HashCodeBuilder.reflectionHashCode(Ljava/lang/Object;[Ljava/lang/String;)I
at com.crawljax.core.state.Eventable.hashCode(Eventable.java:136)
at java.util.HashMap.getEntry(Unknown Source)
at java.util.HashMap.containsKey(Unknown Source)
at org.jgrapht.graph.AbstractBaseGraph.containsEdge(Unknown Source)
at org.jgrapht.graph.AbstractBaseGraph.addEdge(Unknown Source)
at com.crawljax.core.state.StateFlowGraph.addEdge(StateFlowGraph.java:147)
at com.crawljax.core.state.StateMachine.addStateToCurrentState(StateMachine.java:136)
at com.crawljax.core.state.StateMachine.update(StateMachine.java:171)
at com.crawljax.core.Crawler.clickTag(Crawler.java:313)
at com.crawljax.core.Crawler.crawlAction(Crawler.java:384)
at com.crawljax.core.Crawler.crawl(Crawler.java:450)
at com.crawljax.core.Crawler.run(Crawler.java:578)
at com.crawljax.core.InitialCrawler.run(InitialCrawler.java:98)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Original issue: http://code.google.com/p/crawljax/issues/detail?id=45
Original author: [email protected] (July 06, 2010 12:52:22)
When running a (large) CrawlSpec with a MaxRuntime constraint, the Crawler is not terminated directly when the MaxRuntime is reached.
Basically what happens is: the current Crawler is terminated, then all waiting Crawlers get executed and start back-tracking, and only once each is back in its previous state does the crawl run terminate that Crawler.
This should be changed so that when the MaxRuntime is reached, the current Crawler is terminated, all other running Crawlers are signalled to stop, and the queue is shut down and emptied.
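The proposed shutdown could be sketched with java.util.concurrent primitives; RuntimeLimiter and enforceMaxRuntime are hypothetical names, and the real CrawlerExecutor needs more bookkeeping than this.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

// Illustrative sketch of the proposed behavior; RuntimeLimiter is a
// hypothetical name, not a Crawljax class.
class RuntimeLimiter {
    // Wait up to maxMillis for the crawl to finish; when the budget runs
    // out, drop the queued crawlers and interrupt the running ones instead
    // of letting them back-track first.
    static void enforceMaxRuntime(ExecutorService crawlers, long maxMillis) {
        try {
            if (!crawlers.awaitTermination(maxMillis, TimeUnit.MILLISECONDS)) {
                crawlers.shutdownNow();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            crawlers.shutdownNow();
        }
    }
}
```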
Original issue: http://code.google.com/p/crawljax/issues/detail?id=27
Original author: frankgroeneveld (May 19, 2010 15:44:27)
Support CrawlSpecification.click(somexpath)
Original issue: http://code.google.com/p/crawljax/issues/detail?id=24
Original author: [email protected] (September 15, 2010 10:21:30)
When a browser crashes for whatever reason, crawling continues because the exception is caught and suppressed. As the browser is then 'dead' and not reachable, crawling cannot continue with the current browser.
What happens now when running with multiple browsers is that the crashed browser object stays within Crawljax and calls to it continue to be made, which results in I/O exceptions etc., slowly dying and killing all the crawlers in the queue: a crawler tries to open the index -> exception -> the Crawler dies -> the next one starts, and so on.
The correct behavior is still up for discussion:
should we restart the browser? I think that is safe to do, while restarting the Crawler could cause an infinite loop in which the browser crashes over and over again.
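One way to make the browser-restart option safe is a bounded retry counter, so a persistently crashing browser cannot loop forever. This is an illustrative sketch; BrowserSupervisor is a hypothetical name, not a Crawljax class.

```java
// Sketch of a bounded restart policy; BrowserSupervisor is a hypothetical
// name, not a Crawljax class.
class BrowserSupervisor {
    private final int maxRestarts;
    private int restarts = 0;

    BrowserSupervisor(int maxRestarts) {
        this.maxRestarts = maxRestarts;
    }

    // Returns true if the caller should create a fresh browser and retry;
    // false once the limit is hit, so a reproducible crash cannot loop forever.
    synchronized boolean shouldRestart() {
        if (restarts >= maxRestarts) {
            return false;
        }
        restarts++;
        return true;
    }
}
```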
Original issue: http://code.google.com/p/crawljax/issues/detail?id=34
Original author: [email protected] (March 20, 2010 11:54:16)
What steps will reproduce the problem?
What is the expected output?
Complete the test; normally it runs through to completion on a site like http://google.com.
What do you see instead?
What version of the product are you using? On what operating system?
crawljax-1.8 System info: os.name: 'Linux', os.arch: 'i386', os.version:
'2.6.31.6-rt19.2811209', java.version: '1.6.0_18'
Driver info: driver.version: firefox
Please provide any additional information below.
Original issue: http://code.google.com/p/crawljax/issues/detail?id=14
Original author: frankgroeneveld (February 01, 2010 13:56:46)
Today, I finished the PropertiesFile class. This class allows you to read
in a very simple property file (no support for database) and sets all the
settings on a CrawljaxConfiguration object. This means that we can remove
the PropertyHelper and use CrawljaxConfiguration objects everywhere.
However, we need to decide how we are going to access the
CrawljaxConfiguration object.
I propose this:
add a static final CrawljaxConfiguration config to CrawljaxController which
is set in the CrawljaxController constructor.
Add a static getConfig method to CrawljaxConfiguration to get the instance.
This means that the configuration object is always consistent, because you
can't start the controller without a config. (So you can't use an old
config like with the PropertyHelper because you forgot to call
PropertyHelper.init()).
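The proposed pattern can be sketched as follows. This is only an illustration of the idea: the class and member names mirror the proposal above and are not the actual Crawljax API.

```java
// Sketch of the proposed pattern: the controller installs the configuration
// once in its constructor, and a static getter exposes it afterwards.
// All names here are illustrative, not the real Crawljax classes.
public class ConfigSketch {

    static class CrawljaxConfiguration {
        private static CrawljaxConfiguration instance;

        static void setInstance(CrawljaxConfiguration config) {
            instance = config;
        }

        static CrawljaxConfiguration getConfig() {
            if (instance == null) {
                // Unlike PropertyHelper.init(), forgetting this is impossible:
                // the controller constructor is the only place the config is set.
                throw new IllegalStateException("No configuration set: start the controller first");
            }
            return instance;
        }
    }

    static class CrawljaxController {
        CrawljaxController(CrawljaxConfiguration config) {
            // A running controller therefore always implies a consistent config.
            CrawljaxConfiguration.setInstance(config);
        }
    }

    public static void main(String[] args) {
        new CrawljaxController(new CrawljaxConfiguration());
        System.out.println(CrawljaxConfiguration.getConfig() != null); // prints true
    }
}
```

The key property is the one stated above: you cannot obtain a config without having started the controller, so there is no stale-config failure mode.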
Original issue: http://code.google.com/p/crawljax/issues/detail?id=5
Original author: frankgroeneveld (February 05, 2010 08:01:57)
This is due to:
/**
Which obviously gives problems if crawlPath.size() == 0
Original issue: http://code.google.com/p/crawljax/issues/detail?id=8
Original author: [email protected] (January 18, 2010 19:24:23)
While crawling, it should be possible to automatically click OK on the following pop-ups: alert, confirm, and prompt
In some web applications, the crawler just keeps waiting for the OK (or
cancel) button to be clicked.
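One common way to achieve this, sketched below, is to inject JavaScript that replaces alert, confirm, and prompt with non-blocking stubs that behave as if OK was clicked. This is an assumption about how the feature could work, not Crawljax's actual implementation, and the `executeJavaScript` integration point mentioned in the comment is hypothetical.

```java
// A minimal sketch: JavaScript that, when injected into each new state,
// auto-accepts the three blocking pop-up types so the crawler never hangs.
public class PopupStub {
    static final String HANDLE_POPUPS_JS =
        "window.alert = function(msg) {};"                   // alert: dismissed, no value
      + "window.confirm = function(msg) { return true; };"   // confirm: as if OK was clicked
      + "window.prompt = function(msg) { return ''; };";     // prompt: OK with empty input

    public static void main(String[] args) {
        // In a real crawl this string would be passed to something like
        // browser.executeJavaScript(HANDLE_POPUPS_JS) before firing events
        // (the method name is an assumption for illustration).
        System.out.println(HANDLE_POPUPS_JS.contains("window.confirm"));
    }
}
```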
Original issue: http://code.google.com/p/crawljax/issues/detail?id=2
Original author: [email protected] (January 18, 2010 18:52:35)
It would be useful to specify the number of times certain elements have to
be examined to see if they cause a state change.
Currently each candidate clickable is only clicked once.
Possible solutions could look like:
crawler.click("a").withText("next").nrOfTimes(5);
crawler.click("a").withText("add").randomNrOfTimes();
crawler.click("a").withText("more").asLongAsExists();
Original issue: http://code.google.com/p/crawljax/issues/detail?id=1
Original author: [email protected] (July 23, 2011 02:15:49)
What steps will reproduce the problem?
What is the expected output? What do you see instead?
test failure
What version of the product are you using? On what operating system?
firefox 5
Please provide any additional information below.
Updating the POM file to replace all selenium dependencies with the following makes the tests pass.
<dependency>
<groupId>org.seleniumhq.selenium</groupId>
<artifactId>selenium-java</artifactId>
<version>2.1.0</version>
</dependency>
Original issue: http://code.google.com/p/crawljax/issues/detail?id=48
Original author: [email protected] (January 18, 2010 20:38:40)
Frames are currently ignored by Crawljax.
There should be an algorithm that switches into and crawls each frame
automatically. Each frame could be seen either as a new state or as part of
the initial DOM state that includes the frame.
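The second option, treating frame content as part of the containing state, can be modeled with a toy recursion over a frame tree. The `Frame` class and `mergedState` method are illustrative stand-ins only, not Crawljax code.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the "part of the initial DOM state" option: depth-first,
// every (i)frame's DOM is concatenated into one combined state string.
public class FrameCrawl {
    static class Frame {
        String dom;
        List<Frame> children = new ArrayList<>();
        Frame(String dom) { this.dom = dom; }
    }

    static String mergedState(Frame root) {
        StringBuilder sb = new StringBuilder(root.dom);
        for (Frame child : root.children) {
            sb.append(mergedState(child)); // recurse into nested frames
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Frame top = new Frame("<body>");
        top.children.add(new Frame("<iframe-dom>"));
        System.out.println(mergedState(top)); // prints <body><iframe-dom>
    }
}
```

The alternative, each frame as its own state, would instead create a new state node per frame and an edge for the switch into it.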
Original issue: http://code.google.com/p/crawljax/issues/detail?id=3
Original author: frankgroeneveld (March 05, 2010 14:07:20)
Apparently the "handlePopups" function is invalid for chrome. The error
console gives me this:
"Uncaught SyntaxError: Unexpected token ILLEGAL".
Furthermore, when running the largecrawltest for Chrome (with handlepopups
disabled), it will result in a "hang" or broken connection between Crawljax
and Chrome when executing the precrawling plugins.
Original issue: http://code.google.com/p/crawljax/issues/detail?id=11
Original author: [email protected] (April 16, 2010 15:36:13)
I have two ideas for speeding up backtracking:
1. Example: Crawljax opens test.com, clicks About -> test.com/about.php -> clicks some element which does some Ajax stuff. When backtracking, go to test.com/about.php instead of test.com.
2. When backtracking, Crawljax knows which link it wants to click, so it should click on the element immediately after finding it, instead of waiting a fixed amount of time.
Original issue: http://code.google.com/p/crawljax/issues/detail?id=18
Original author: [email protected] (August 16, 2011 08:12:30)
What steps will reproduce the problem?
1. Saving the states into a database or flat files.
2. Rerunning the session from the last state.
3. If any crash happens, the state is saved automatically.
What is the expected output? What do you see instead?
A database handler that manages the states, supporting reading them back and resuming the crawler.
What version of the product are you using? On what operating system?
2.0
Please provide any additional information below.
I'm developing a GUI for crawler configuration; when I finish it, I can give you the source if you need it.
Good luck
Original issue: http://code.google.com/p/crawljax/issues/detail?id=49
Original author: frankgroeneveld (April 16, 2010 09:22:09)
For the next release, we need more coverage and better tests.
Original issue: http://code.google.com/p/crawljax/issues/detail?id=17
Original author: [email protected] (August 04, 2010 15:55:09)
When using one or more wait conditions during crawling, and the first wait condition takes a long time (> timeout) but succeeds, an IndexOutOfBoundsException is thrown, due to a bug:
In WaitCondition line 98 the index is increased, and the later log event uses the increased index to retrieve the WaitCondition (line 110).
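The reported off-by-one can be reconstructed roughly as below. The two lookup methods are hypothetical stand-ins for the real WaitCondition code, written only to show the failure mode and the fix.

```java
import java.util.Arrays;
import java.util.List;

public class WaitBug {
    // Buggy shape: the index is incremented (as at WaitCondition:98) before
    // the log statement (as at line 110) reuses it to fetch the condition.
    static String buggyLogLookup(List<String> conditions) {
        int index = 0;
        index++; // wait for conditions.get(0) succeeded, move to the next
        return conditions.get(index); // throws if the last condition just ran
    }

    // Fixed shape: capture the condition that actually ran before incrementing.
    static String fixedLogLookup(List<String> conditions) {
        int index = 0;
        String current = conditions.get(index);
        index++;
        return current; // log the condition that was just waited on
    }

    public static void main(String[] args) {
        List<String> one = Arrays.asList("waitForElement");
        try {
            buggyLogLookup(one);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("buggy lookup threw IndexOutOfBoundsException");
        }
        System.out.println("fixed: " + fixedLogLookup(one));
    }
}
```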
Original issue: http://code.google.com/p/crawljax/issues/detail?id=30
Original author: [email protected] (October 27, 2011 15:14:22)
What steps will reproduce the problem?
What is the expected output?
Get a state-flow graph where state A and B can reach C
What do you see instead?
A state-flow graph where only state A or B has an outgoing transition to C, depending on which state was crawled first.
What version of the product are you using? On what operating system?
2.1-SNAPSHOT
Please provide any additional information below.
The reason why the behavior is different, is because Crawljax checks if potential candidate elements have been checked before, in order to prevent duplicate work (see com.crawljax.core.CandidateElementExtractor, lines 282 and 169). While it is indeed true that these transitions should not be /followed/ (since it is already known where they lead to), they should still be /added/ to the sfg.
Workaround:
Removing the optimization by ignoring the checkedElements manager yields good results, albeit at the cost of some extra work.
In com.crawljax.core.CandidateElementExtractor, change lines 282 to 284 to
if (matchesXpath && isElementVisible(dom, element)
&& !filterElement(attributes, element)) {
And in the same file, the if condition on line 169 becomes obsolete, i.e., change it to
if (!clickOnce || true) {
Original issue: http://code.google.com/p/crawljax/issues/detail?id=50
Original author: frankgroeneveld (March 17, 2010 13:45:32)
Find out if Helper.getContents can be replaced with
Helper.getContentsWithLineEndings without breaking any core functionality.
Original issue: http://code.google.com/p/crawljax/issues/detail?id=12
Original author: [email protected] (July 08, 2010 00:23:49)
What steps will reproduce the problem?
What is the expected output? What do you see instead?
The test exits immediately, saying that zero elements were examined ("EXAMINED ELEMENTS: 0"), which is obviously incorrect.
What version of the product are you using? On what operating system?
1.91 from sources on Windows 7
Please provide any additional information below.
I'm running on older versions of chrome that I compiled from source to avoid running version 5.
Original issue: http://code.google.com/p/crawljax/issues/detail?id=28
Original author: [email protected] (April 05, 2010 18:00:50)
What steps will reproduce the problem?
What is the expected output? What do you see instead?
I use this with dontClick, and Crawljax ignores the first result while
continuing to crawl the rest of the results, even though it shouldn't.
What version of the product are you using? On what operating system?
Latest version of crawljax on Windows 7
Please provide any additional information below.
I believe this problem occurs because
EventableConditionChecker.checkXpathStartsWithXpathEventableCondition uses
XPathHelper.getXpathForXPathExpression, which returns only the first matching
node; if all matching nodes were returned and checked, it would work
properly.
Original issue: http://code.google.com/p/crawljax/issues/detail?id=16
Original author: [email protected] (August 30, 2010 11:35:23)
The session.getExactEventPath() method returns an incorrect list of clickables.
This list should always contain the exact clickable elements from the index state to the current state.
getExactEventPath().get(getExactEventPath().size() - 1) should return the last clickable item. The best way to check that this functionality is broken is to add an OnNewState plugin:
{{{
crawljaxConfiguration.addPlugin(new OnNewStatePlugin() {
    public void onNewState(CrawlSession session) {
        List<Eventable> events = session.getExactEventPath();
        if (events.size() > 0) {
            Eventable lastEvent = events.get(events.size() - 1);
            LOGGER.info(lastEvent);
        }
    }
});
}}}
Original issue: http://code.google.com/p/crawljax/issues/detail?id=32
Original author: [email protected] (May 18, 2011 04:15:12)
What steps will reproduce the problem?
What is the expected output? What do you see instead?
Even after you set crawlWaitEvent, it doesn't get set.
Because of the assignment order in the constructors, the two variables get each other's values.
What version of the product are you using? On what operating system?
2.1-SNAPSHOT
Please provide any additional information below.
Just make crawlWaitEvent come before crawlWaitReload in all the constructors.
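This class of bug, same-typed constructor arguments assigned to the wrong fields, can be illustrated with a minimal sketch. The class and field names only mirror the issue; the real CrawlSpecification code may differ.

```java
public class SwapBug {
    // Buggy shape: the two int arguments are assigned to the wrong fields,
    // so the values silently swap and the compiler cannot catch it.
    static class BuggySpec {
        final int crawlWaitReload;
        final int crawlWaitEvent;
        BuggySpec(int waitEvent, int waitReload) {
            this.crawlWaitReload = waitEvent;  // bug: swapped
            this.crawlWaitEvent = waitReload;  // bug: swapped
        }
    }

    // Fixed shape: each argument goes to its matching field.
    static class FixedSpec {
        final int crawlWaitReload;
        final int crawlWaitEvent;
        FixedSpec(int waitEvent, int waitReload) {
            this.crawlWaitEvent = waitEvent;
            this.crawlWaitReload = waitReload;
        }
    }

    public static void main(String[] args) {
        BuggySpec b = new BuggySpec(500, 1000);
        System.out.println(b.crawlWaitEvent);  // prints 1000, not the 500 you set
        FixedSpec f = new FixedSpec(500, 1000);
        System.out.println(f.crawlWaitEvent);  // prints 500
    }
}
```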
Original issue: http://code.google.com/p/crawljax/issues/detail?id=47
Original author: frankgroeneveld (May 19, 2010 14:42:27)
Invariant checks should be executed during back-tracking as well, because
you might find failures then as well.
Original issue: http://code.google.com/p/crawljax/issues/detail?id=22
Original author: [email protected] (July 20, 2010 14:22:44)
Enable Crawljax to exclude certain parts of iframes.
This should be specified in one of the following ways:
The problem with 1 is that all the source must be loaded to know which parts of the iframe must be included, while with 2 the decision to descend into an iframe can be taken before actually descending into it.
Original issue: http://code.google.com/p/crawljax/issues/detail?id=29