Find broken links, missing images, etc within your HTML.
License: MIT License
Ran a test on wired.com, and it spits out ├─BROKEN─ https://www.youtube.com/wired (HTTP_404), but that URL is perfectly fine. Any tips on what the reason could be?
Version: installed today. Node v7.
Hey folks,
Was there a particular reason why HtmlChecker has a different API than the rest of the checkers? By that I mean scan vs. enqueue, which allows custom data. I'm asking because I was trying to do HTML checking, but I need the option to pass custom data.
Cheers
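In the meantime, one workaround sketch: since scan() takes no custom-data argument, the data can be bound into the handler object with a closure. Here `makeHandlers` and `label` are illustrative names, not part of blc's API.

```javascript
// Bind per-scan custom data into the handlers via a closure, instead of
// passing it through an enqueue()-style argument.
function makeHandlers(customData, log) {
  return {
    link: function(result) {
      log(customData.label + ": " + result.url.resolved + " broken=" + result.broken);
    },
    complete: function() {
      log(customData.label + ": done");
    }
  };
}

// usage, assuming the HtmlChecker API from the readme:
// new blc.HtmlChecker(options, makeHandlers({label: "page-1"}, console.log)).scan(html, baseUrl);
```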
Hey,
In my company we use a proxy, so we have some environment variables set to make tools like npm work correctly: we set the HTTP_PROXY, HTTPS_PROXY and NO_PROXY environment variables accordingly.
It would be great if you supported basic proxy usage via these environment variables.
Why can I still not run this on Linux? node v0.10.36, npm 1.3.6, object-assign 4.1.0, promise 3.2.1
I think this is an edge case, but since it happened to me I'd like to note it here. I'll try to dive in and, who knows, maybe submit a PR.
Steps To Reproduce
Run blc http://devpatch.com:3000 --filter-level 3 -ro
More Info
I am running a dockerized version of a WordPress site. Testing both locally and against the dev instance hosted on devpatch, the broken link checker never fetches or checks a page. Looking at the logs, I see the request from BLC, but that is it. Below is a screenshot: left is the log, right is the console output.
I verified I could run BLC without issue against a static, non-dockerized site hosted locally on the same port.
It would be really nice to have a test runner that watches our workspace while we work, giving real-time feedback on tests rather than making us manually re-run mocha after each change.
Something like karma would be ideal: http://attackofzach.com/setting-up-a-project-using-karma-with-mocha-and-chai/
Running the blc binary breaks my Travis build (code available here) with the following error:
TypeError: Object function Object() { [native code] } has no method 'assign'
at parseOptions (/home/travis/build/sxlijin/git-scm.com/node_modules/broken-link-checker/lib/internal/parseOptions.js:42:20)
at new SiteChecker (/home/travis/build/sxlijin/git-scm.com/node_modules/broken-link-checker/lib/public/SiteChecker.js:22:27)
at run (/home/travis/build/sxlijin/git-scm.com/node_modules/broken-link-checker/lib/cli.js:484:14)
at cli.input (/home/travis/build/sxlijin/git-scm.com/node_modules/broken-link-checker/lib/cli.js:147:3)
at Object.<anonymous> (/home/travis/build/sxlijin/git-scm.com/node_modules/broken-link-checker/bin/blc:3:31)
at Module._compile (module.js:456:26)
at Object.Module._extensions..js (module.js:474:10)
at Module.load (module.js:356:32)
at Function.Module._load (module.js:312:12)
at Function.Module.runMain (module.js:497:10)
at startup (node.js:119:16)
at node.js:945:3
Hi Steve,
It would be good if broken-link-checker had an option to specify the recursion level, ranging from 0 to some number. Currently the app seems to run into near-infinite recursion; some websites are super big, and checking links at some 100th recursion level rarely makes sense, because a user almost never goes that deep into a website. I used the recursive option on my website (which is huge), and it ran for almost three days without finishing the job.
Thanks,
Jeb
/broken-link-checker/lib/internal/parseOptions.js:42
options = Object.assign({}, defaultOptions, options);
^
TypeError: Object function Object() { [native code] } has no method 'assign'
at parseOptions (/home/avatar/node_modules/broken-link-checker/lib/internal/parseOptions.js:42:20)
at new SiteChecker (/home/avatar/node_modules/broken-link-checker/lib/public/SiteChecker.js:22:27)
at run (/home/avatar/node_modules/broken-link-checker/lib/cli.js:467:14)
at cli.input (/home/avatar/node_modules/broken-link-checker/lib/cli.js:144:3)
at Object.<anonymous> (/home/avatar/node_modules/broken-link-checker/bin/blc:3:31)
at Module._compile (module.js:456:26)
at Object.Module._extensions..js (module.js:474:10)
at Module.load (module.js:356:32)
at Function.Module._load (module.js:312:12)
at Function.Module.runMain (module.js:497:10)
at startup (node.js:119:16)
at node.js:929:3
This is a problem I found in the demo you give. Could you give me some help? Thanks!
var blc = require("broken-link-checker");

var html = '<a href="https://google.com">absolute link</a>';
html += '<a href="/path/to/resource.html">relative link</a>';
html += '<img src="http://fakeurl.com/image.png" alt="missing image"/>';

var htmlChecker = new blc.HtmlChecker(null, {
  link: function(result) {
    console.log(result.html.index, result.broken, result.html.text, result.url.resolved);
    //-> 0 false "absolute link" "https://google.com/"
    //-> 1 false "relative link" "https://mywebsite.com/path/to/resource.html"
    //-> 2 true null "http://fakeurl.com/image.png"
  },
  complete: function() {
    console.log("done checking!");
  }
});

htmlChecker.scan(html, "https://mywebsite.com");
I have encountered a link which is considered broken by blc but opens fine in curl or a browser.
Here it is:
blc https://www.nginx.com
curl works fine:
$ curl -I https://www.nginx.com
HTTP/1.1 200 OK
Date: Mon, 02 Jan 2017 06:52:15 GMT
Content-Type: text/html; charset=UTF-8
Connection: keep-alive
X-Pingback: https://www.nginx.com/xmlrpc.php
Link: <https://www.nginx.com/wp-json/>; rel="https://api.w.org/"
Link: <https://www.nginx.com/>; rel=shortlink
Link: <https://www.nginx.com/wp-json>; rel="https://github.com/WP-API/WP-API"
X-User-Agent: standard
X-Cache-Config: 0 0
Vary: Accept-Encoding, User-Agent
X-Cache-Status: MISS
Server: nginx
X-XSS-Protection: 1; mode=block
X-Frame-Options: SAMEORIGIN
X-Content-Type-Options: nosniff
X-Sucuri-ID: 14010
but BLC does not:
$ blc https://www.nginx.com
Getting links from: https://www.nginx.com/
Error: HTML could not be retrieved
User agent does not help:
$ blc --input https://www.nginx.com --user-agent "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/602.4.3 (KHTML, like Gecko) Version/10.0.3 Safari/602.4.3"
Getting links from: https://www.nginx.com/
Error: HTML could not be retrieved
What is the problem?
Is this a bug specific to NGINX, or is a larger set of websites affected?
bhttp pulls in an old version of tough-cookie, which has a >= dependency; that pulls in the latest, Node.js 6+ version of punycode. Not sure if it's a problem for your code paths, but with npm v3 it got promoted to the top level over other packages, and webpack pulled it in, causing problems.
├─┬ [email protected]
│ └─┬ [email protected]
│ └─┬ [email protected]
│ └── [email protected]
When trying to install on Windows I get the following error; it also fails when used as a bower dependency:
npm ERR! Error: ENOENT, chmod 'C:\node_modules\broken-link-checker\bin\broken-link-checker'
npm ERR! If you need help, you may report this entire log,
npm ERR! including the npm and node versions, at:
npm ERR! http://github.com/npm/npm/issues
npm ERR! System Windows_NT 6.2.9200
npm ERR! command "C:\Program Files\nodejs\node.exe" "C:\Program Files\nodejs\node_modules\npm\bin\npm-cli.js" "install" "broken-link-checker"
npm ERR! cwd C:
npm ERR! node -v v0.10.36
npm ERR! npm -v 1.4.28
npm ERR! path C:\node_modules\broken-link-checker\bin\broken-link-checker
npm ERR! code ENOENT
npm ERR! errno 34
npm ERR! not ok code 0
I was trying to scan a semi-private site which uses a self-signed certificate.
It would be awesome to have a flag (e.g. --no-check-certificate) to circumvent this error:
Error: unable to verify the first certificate
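Until such a flag exists, a workaround that appears in other reports here is Node's own environment switch. Note that it disables TLS verification for the entire process, so use it only against hosts you trust; the target URL below is illustrative.

```shell
# WARNING: disables certificate verification for the whole Node process.
NODE_TLS_REJECT_UNAUTHORIZED=0 blc https://staging.example.com -ro
```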
Hello!
I'm loving this package, but I seem to be running into this error a lot, specifically when checking clickable links to .png and .jpg files.
Is this a known issue?
$ npm install -g broken-link-checker
npm ERR! Darwin 14.3.0
npm ERR! argv "node" "/usr/local/bin/npm" "install" "-g" "broken-link-checker"
npm ERR! node v0.12.2
npm ERR! npm v2.8.3
npm ERR! path /usr/local/lib/node_modules/broken-link-checker/bin/broken-link-checker
npm ERR! code ENOENT
npm ERR! errno -2
npm ERR! enoent ENOENT, chmod '/usr/local/lib/node_modules/broken-link-checker/bin/broken-link-checker'
npm ERR! enoent This is most likely not a problem with npm itself
npm ERR! enoent and is related to npm not being able to find a file.
npm ERR! enoent
I'm getting HTTP_404 for this URL: https://nationalcareersservice.direct.gov.uk/job-profiles/home
Full response:
{ url:
{ original: 'https://nationalcareersservice.direct.gov.uk/job-profiles/home',
resolved: URL {},
rebased: URL {},
redirected: null },
base: { resolved: null, rebased: null },
html:
{ index: null,
offsetIndex: null,
location: null,
selector: null,
tagName: null,
attrName: null,
attrs: null,
text: null,
tag: null,
base: null },
http:
{ cached: false,
response:
{ headers: [Object],
status: 404,
statusText: 'Not Found',
url: URL {},
redirects: [] } },
broken: true,
internal: null,
samePage: null,
excluded: null,
brokenReason: 'HTTP_404',
excludedReason: null }
I've used bhttp directly, and it reports 200 back.
Curl:
$ time curl https://nationalcareersservice.direct.gov.uk/job-profiles/home -v > /dev/null
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0* Trying 13.81.8.21...
* Connected to nationalcareersservice.direct.gov.uk (13.81.8.21) port 443 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
* CAfile: /etc/pki/tls/certs/ca-bundle.crt
CApath: none
* SSL connection using TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
* Server certificate:
* subject: CN=nationalcareersservice.direct.gov.uk,O=Skills Funding Agency,OU=IM Services,L=Coventry,ST=West Midlands,C=GB
* start date: Oct 22 09:56:02 2016 GMT
* expire date: Oct 23 09:56:02 2017 GMT
* common name: nationalcareersservice.direct.gov.uk
* issuer: CN=GlobalSign Organization Validation CA - SHA256 - G2,O=GlobalSign nv-sa,C=BE
> GET /job-profiles/home HTTP/1.1
> User-Agent: curl/7.40.0
> Host: nationalcareersservice.direct.gov.uk
> Accept: */*
>
0 0 0 0 0 0 0 0 --:--:-- 0:00:01 --:--:-- 0< HTTP/1.1 200 OK
< Cache-Control: no-cache
< Pragma: no-cache
< Content-Type: text/html; charset=utf-8
< Expires: -1
< Server: Microsoft-IIS/8.5
< X-Frame-Options: SAMEORIGIN
< Date: Fri, 24 Feb 2017 17:00:21 GMT
< Content-Length: 21195
< X-FRAME-OPTIONS: SAMEORIGIN
<
{ [7932 bytes data]
100 21195 100 21195 0 0 11437 0 0:00:01 0:00:01 --:--:-- 11432
* Connection #0 to host nationalcareersservice.direct.gov.uk left intact
real 0m1.858s
user 0m0.060s
sys 0m0.020s
It takes around 2s to get the first byte back. Could it be a timeout problem?
Hi there,
There's some prior talk about problems with spinners here: npm/npm#5340
Would be great to make it optional! Will send a PR.
Jun
Installed and ran blc on http://poetikon.no. I'm getting an HTTP_404 on a link containing the Norwegian special character "ø": http://poetikon.no/category/gj%C3%B8r-det-selv/ (http://bit.ly/2eRQv52 if the URL-encoded URL fails).
This url works when visiting in a browser.
I'm having issues making this work with sites on Amazon S3 with gzipped content. I get a "Finished! 0 links found." message back. Other sites without gzip work perfectly though.
Is this a known issue and are there plans to support sites that use gzip?
Do you have an example project that uses this?
When used along with Browserify, the filesystem-related methods do not work.
On the below lines:
errorCss = fs.readFileSync(__dirname + '/static/error.css', 'utf8');
errorHtml = fs.readFileSync(__dirname + '/static/error.html', 'utf8');
We get the below error:
Uncaught TypeError: fs.readFileSync is not a function
On line 36 of lib/internal/getHtmlFromUrl.js, some file types (I saw it with .woff2 font files) don't come back with a content-type header, so the if conditional (indexOf) on that line fails. I worked around it as below, but it's a quick fix and I'm not positive it's adequate.
Original code:
if (response.headers["content-type"].indexOf("text/html") === 0)
I modified it to:
if (response.headers["content-type"] && response.headers["content-type"].indexOf("text/html") === 0)
Hope this is helpful, let me know if you need anything.
Hi,
I got this error while installing your app.
Error: ENOENT, chmod 'C:\Users\ADA-LT\AppData\Roaming\npm\node_modules\broken-link-checker\bin\broken-link-checker'
npm ERR! System Windows_NT 6.1.7601
npm ERR! command "C:\Program Files (x86)\nodejs\node.exe" "C:\Program Files (x86)\nodejs\node_modules\npm\bin\npm-cli.js" "install" "broken-link-checker" "-g"
npm ERR! cwd C:{working_dir}
npm ERR! node -v v0.10.13
npm ERR! npm -v 1.3.2
npm ERR! path C:\Users{user-login}\AppData\Roaming\npm\node_modules\broken-link-checker\bin\broken-link-checker
npm ERR! code ENOENT
npm ERR! errno 34
Any idea what it means and how to fix it?
Frankly, I am eager to try it. :)
Thanks guys for all this work. Seems promising.
> blc http://tw.example.com/ -ro
Getting links from: http://tw.example.com/
├───OK─── http://tw.example.com/location.png
├───OK─── ...
...
Finished! 69 links found. 16 excluded. 0 broken.
...
Getting links from: http://tw.example.com/location.png
Error: Expected type "text/html" but got "image/jpeg"
...
Finished! 219 links found. 114 excluded. 0 broken.
Elapsed time: 1 minute, 15 seconds
What does this mean, and how do I fix it?
P.S. I host the images on GitHub Pages.
Would it be possible to add the capability to run a local HTTP server that serves static files and then check for broken links?
A command line like blc _site -ro could start a local server serving the files in _site and then start the analysis. Maybe something along these lines:
var finalhandler = require('finalhandler');
var http = require('http');
var serveStatic = require('serve-static');

var serve = serveStatic(<directory>);

var server = http.createServer(function onRequest(req, res) {
  serve(req, res, finalhandler(req, res));
});

server.listen(9001, function() {
  console.log('Server running on 9001...');
  // Call broken-link-checker on URL http://localhost:9001
});
Or maybe there is a way, in one command, to start a local server, wait for it to start, and then run broken-link-checker?
The problem with http-server _site -p 9001 & blc http://0.0.0.0:9001 -ro is that broken-link-checker executes before the web server finishes starting up, and therefore produces an error.
When testing the links of some websites, like nytimes.com, you will notice a warning message:
(node:4687) Warning: Possible EventEmitter memory leak detected. 11 pipe listeners added. Use emitter.setMaxListeners() to increase limit
This is due, I think, to some links having a great number of redirects. What do you think of adding a maxRedirects option (like the request module has, for instance)?
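Until such an option exists, a caller could flag heavy redirect chains using the redirects array exposed on result.http.response, as shown in the result dumps elsewhere in these reports. `MAX_REDIRECTS` and `exceedsRedirectLimit` are illustrative names:

```javascript
var MAX_REDIRECTS = 10; // illustrative threshold

// Returns true when a link result went through more redirects than allowed.
function exceedsRedirectLimit(result) {
  var response = result.http && result.http.response;
  var redirects = (response && response.redirects) || [];
  return redirects.length > MAX_REDIRECTS;
}
```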
My setup:
const check = new LinkChecker.HtmlChecker({excludeExternalLinks: true, filterLevel: 0}, {
  link: (result) => {
    links.goodLinks++;
    if (result.broken)
      links.brokenLinks++;
  },
  complete: () => {
    return wrapper(links.goodLinks, links.brokenLinks);
  }
});

check.scan(this.resource.html(), this.url, links);
I am getting the below error when running the code via the API.
Unhandled rejection Error: getaddrinfo EAI_AGAIN github.com:443
at Object.exports._errnoException (util.js:874:11)
at errnoException (dns.js:31:15)
at GetAddrInfoReqWrap.onlookup [as oncomplete] (dns.js:77:26)
It looks like some proxy issue; how can I overcome this?
It appears that NPM is using your .gitignore file in place of .npmignore
For more information see the SO thread: http://stackoverflow.com/questions/17990647/npm-install-errors-with-error-enoent-chmod
I am not able to load your module with:
npm install broken-link-checker
However as the thread mentions I am able to load it with:
npm install broken-link-checker --no-bin-links
Which leads me to believe that if you check in a blank .npmignore file this issue will be resolved.
Hi Steven,
I am using your awesome module to develop some API services, but recently my API hangs when it encounters target web links that cannot respond in time. I searched the API documentation you provide; no timeout value can be configured.
I was debugging your source code and noticed that you use node-bhttp as a dependency. I checked the request options you are using:
{
  discardResponse: true,
  headers: { "user-agent": options.userAgent },
  method: retry !== 405 ? options.requestMethod : "get"
}
There is no timeout set. I've checked the bhttp docs (https://github.com/joepie91/node-bhttp); it seems there is an advanced option that could be used:
responseTimeout: The timeout, in milliseconds, after which the request should be considered to have failed if no response is received yet. Note that this measures from the start of the request to the start of the response, and is not a connection timeout. If a timeout occurs, a ResponseTimeoutError will be thrown asynchronously (see error documentation below).
Also there are some more references -
bhttp.ConnectionTimeoutError
The connection timed out.
The connection timeout is defined by the operating system, and cannot currently be overridden.
bhttp.ResponseTimeoutError
The response timed out.
The response timeout can be specified using the responseTimeout option, and it is measured from the start of the request to the start of the response. If no response is received within the responseTimeout, a ResponseTimeoutError will be thrown asynchronously, and the request will be aborted.
You should not set a responseTimeout for requests that involve large file uploads! Because a response can only be received after the request has completed, any file/stream upload that takes longer than the responseTimeout, will result in a ResponseTimeoutError.
Could you please help with this? I only expect that my API won't hang, even when it encounters some strange links.
Thank you very much!
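Until a timeout option lands (e.g. by wiring bhttp's documented responseTimeout into the options object quoted above), one generic guard on the calling side is to race a checker's completion promise against a timer. `withTimeout` is an illustrative helper, not part of blc:

```javascript
// Resolve with the promise's value, or reject if it takes longer than `ms`.
function withTimeout(promise, ms) {
  return new Promise(function(resolve, reject) {
    var timer = setTimeout(function() {
      reject(new Error("timed out after " + ms + "ms"));
    }, ms);
    promise.then(
      function(value) { clearTimeout(timer); resolve(value); },
      function(err)   { clearTimeout(timer); reject(err); }
    );
  });
}
```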
Hi,
Nice tool!
I have this in my style.css:
a[href$='.pdf']:after { content: url("graphics/pdficon.gif"); padding: 0 3px 0 0; }
It would be nice to check URLs in CSS as well.
regards,
Peter
I checked the URL http://store.meizu.com/ with HtmlUrlChecker; it reports the link http://ordercenter.meizu.com/list/index.html as broken with an HTTP_404 brokenReason, but it's a redirecting link, not a 404.
Here is the result param from the HtmlUrlChecker link callback function.
{ url:
{ original: 'http://ordercenter.meizu.com/list/index.html',
resolved: 'http://ordercenter.meizu.com/list/index.html',
redirected: 'https://login.flyme.cn/vCodeLogin?useruri=http%3A%2F%2Fstore.meizu.com%2Fmember%2Flogin.htm?useruri=http://ordercenter.meizu.com/list/index.html&sid=unionlogin&service=&autodirct=true' },
base:
{ original: 'http://store.meizu.com/',
resolved: 'http://store.meizu.com/' },
html:
{ index: 11,
offsetIndex: 9,
location: { line: 51, col: 44, startOffset: 2842, endOffset: 2893 },
selector: 'html > body > div:nth-child(1) > div:nth-child(1) > div:nth-child(2) > ul:nth-child(1) > li:nth-child(2) > a:nth-child(1)',
tagName: 'a',
attrName: 'href',
attrs:
{ class: 'topbar-link',
href: 'http://ordercenter.meizu.com/list/index.html',
target: '_blank' },
text: '我的订单',
tag: '<a class="topbar-link" href="http://ordercenter.meizu.com/list/index.html" target="_blank">' },
http:
{ cached: false,
response:
{ headers: [Object],
httpVersion: '1.1',
statusCode: 404,
statusMessage: 'Not Found',
url: 'https://login.flyme.cn/vCodeLogin?useruri=http%3A%2F%2Fstore.meizu.com%2Fmember%2Flogin.htm?useruri=http://ordercenter.meizu.com/list/index.html&sid=unionlogin&service=&autodirct=true',
redirects: [Object] } },
broken: true,
internal: false,
samePage: false,
excluded: false,
brokenReason: 'HTTP_404',
excludedReason: null }
Is there a feature to crawl a website looking for broken links? If not, could this be mentioned in the documentation?
New Feature - support basic HTTP auth
Hi,
Not an issue, more of a question I cannot find the answer to.
I have been spiking out BLC for a project I am working on. However, when I run it via the command line, there seem to be many links that are ignored.
I am executing the tests like:
NODE_TLS_REJECT_UNAUTHORIZED=0 blc https://foo.com -ro
The result is:
Finished! 16516 links found. 15854 excluded. 50 broken.
Is there a way to force blc to check all links?
Thanks,
Ian
Did a global install of broken-link-checker
npm install -g broken-link-checker
Tried to run it and got this error:
/home/myhome/Projects/SciServer/Dev.das $blc http://www.sdss.org -ro
/home/myhome/.local/lib/node_modules/broken-link-checker/lib/internal/parseOptions.js:42
options = Object.assign({}, defaultOptions, options);
^
TypeError: Object function Object() { [native code] } has no method 'assign'
at parseOptions (/home/myhome/.local/lib/node_modules/broken-link-checker/lib/internal/parseOptions.js:42:20)
at new SiteChecker (/home/myhome/.local/lib/node_modules/broken-link-checker/lib/public/SiteChecker.js:22:27)
at run (/home/myhome/.local/lib/node_modules/broken-link-checker/lib/cli.js:467:14)
at cli.input (/home/myhome/.local/lib/node_modules/broken-link-checker/lib/cli.js:144:3)
at Object.<anonymous> (/home/myhome/.local/lib/node_modules/broken-link-checker/bin/blc:3:31)
at Module._compile (module.js:456:26)
at Object.Module._extensions..js (module.js:474:10)
at Module.load (module.js:356:32)
at Function.Module._load (module.js:312:12)
at Function.Module.runMain (module.js:497:10)
at startup (node.js:119:16)
at node.js:906:3
Then I looked at the readme:
Node.js >= 0.10 is required; < 4.0 will need Promise and Object.assign polyfills.
I have node 0.10.33. Not sure what module "< 4.0" refers to, but I also tried
npm install -g promise
and
npm install -g object.assign
but got the same error.
Did I not install the right object.assign module?
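For what it's worth, a global npm install puts the polyfill packages on disk but never loads them into the process that runs blc. For the blc command-line tool the simpler fix is upgrading Node.js to >= 4.0, where Object.assign is built in; for API use, a sketch of loading the polyfills first, assuming the `object.assign` and `promise` packages are installed locally:

```javascript
// Install the polyfills into the process before requiring blc.
// On modern Node both branches are no-ops, since the builtins exist.
if (typeof Object.assign !== "function") {
  Object.assign = require("object.assign").getPolyfill();
}
if (typeof Promise === "undefined") {
  global.Promise = require("promise");
}
// var blc = require("broken-link-checker"); // now safe to load
```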
It would be great if there were a way to get the current link being checked.
I get some errors on my links, like Invalid, but do not know which one is broken.
var blc = require('broken-link-checker');

var htmlChecker = new blc.HtmlChecker({}, {
  link: function(result) {
    if (result.broken) {
      console.log(blc[result.brokenReason]);
    }
  }
});
Output:
Invalid
Invalid
Invalid
Invalid
Invalid
Invalid
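For reference, a sketch of a link handler that reports which URL failed, built from result fields that appear in the dumps elsewhere in these reports (url.original, html.tag, brokenReason). `describeBrokenLink` is an invented name:

```javascript
// Build a one-line description of a broken link, or null if it is not broken.
function describeBrokenLink(result) {
  if (!result.broken) return null;
  var where = (result.html && result.html.tag) ? " in " + result.html.tag : "";
  return result.brokenReason + ": " + result.url.original + where;
}

// usage inside the handlers object:
// link: function(result) {
//   var line = describeBrokenLink(result);
//   if (line) console.log(line);
// }
```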
As of now, the tool exits with code 0 in all scenarios.
I know, you have a recent commit fixing it, but the tool in NPM does not have that fix yet.
Have you updated the NPM package?
Or is it a problem on my side?
It seems broken-link-checker's recursive feature is not working with intranet websites. For example, the recursive scan doesn't happen for a URL like http://dev.websitename.com, but it works with http://www.websitename.com.
If your site uses HTTPS and embeds images with a non-HTTPS protocol, the browser will display a warning in the URL bar. If it embeds scripts or iframes with a non-HTTPS protocol, the browser will refuse to load that content altogether.
It would be awesome if this tool would detect and report an issue like that.
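A sketch of the kind of check proposed, as a minimal regex-based illustration (a real implementation would reuse blc's HTML parser rather than a regex; `findInsecureSubresources` is an invented name):

```javascript
// Flag subresources loaded over plain HTTP, which break or warn on HTTPS pages.
function findInsecureSubresources(html) {
  var insecure = [];
  var re = /<(img|script|iframe)[^>]*\ssrc=["'](http:\/\/[^"']+)["']/gi;
  var m;
  while ((m = re.exec(html)) !== null) {
    insecure.push({ tag: m[1].toLowerCase(), url: m[2] });
  }
  return insecure;
}
```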
I'm not able to check a local file using the file:// protocol. I get:
ReferenceError: protocol is not defined
Unhandled rejection Error: undefined
If you have CSP policies, it's possible that some of the pages on your website are embedding content that is banned in the CSP policy.
It would be awesome if this tool would detect and report that.
Currently a link like <a href="javascript:void(0);"> will resolve to null, which means it is reported as a broken link. When a link begins with a scripting scheme, it should be treated as a success, or ignored and not queried.
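On the calling side, one could pre-filter such hrefs before counting them as broken. A tiny illustrative predicate (`isCheckableHref` is an invented name; newer blc versions may also expose an excludedSchemes option for this):

```javascript
// Schemes that are not fetchable over HTTP and should never count as broken.
var NON_FETCHABLE = /^\s*(javascript|mailto|tel|data):/i;

function isCheckableHref(href) {
  return typeof href === "string" && !NON_FETCHABLE.test(href);
}
```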
Option to parse Markdown in HtmlChecker for links.
You could use a Markdown parser like marked to convert Markdown to HTML first:
var marked = require('marked');
htmlChecker.scan(marked('I am using __markdown__.'));
I have a link to a zero-byte HTML file in one of my pages. As soon as blc tries to access that empty HTML page, an error is thrown and blc is aborted:
$ node node_modules/broken-link-checker/bin/blc -fvr http://beta.grossweber.com/blc
Getting links from: http://beta.grossweber.com/blc
└───OK─── http://beta.grossweber.com/blc/empty.html
Finished! 1 links found. 0 broken.
Getting links from: http://beta.grossweber.com/blc/empty.html
Error: Unhandled Rejection. TypeError: Cannot read property 'length' of undefined
How do I use blc behind a proxy?
All external links are reported as broken:
Getting links from: http://localhost:8080/it/actus/articleB.html
├─BROKEN─ http://placehold.it/900x300 (HTTP_undefined)
Thanks,
Ami44
Hi,
Test link here - http://stackoverflow.com/questions/12507021/best-configuration-of-c3p0
Issue 1 - I tried this webpage with htmlUrlChecker and got 61 results. Then I reran it and got 18 results. This is so weird, and I cannot figure out how this can happen.
Issue 2 - Actually, the results for this page don't make any sense. Take the following output:
{ "originalUrl": "//webapps.stackexchange.com", "resolvedUrl": "http://webapps.stackexchange.com/questions/12507021/", "brokenReason": "HTTP_404" }
I checked the page, and an a tag with href '//webapps.stackexchange.com' really does exist. But how can it be resolved as "http://webapps.stackexchange.com/questions/12507021/", with the path from my original link?
I looked into the source code; the resolving operation is done by another of your modules, "urlobj". When function resolveUrl(from, to, options) is invoked, after
var pathname = joinDirs(urlobj.extra.directory, urlobj.extra.directoryLeadingSlash);
the original path is appended to the resolved URL. Could you please look into this?
Best Regards,
Jet
Thanks for this package. So useful! :0)
I downloaded it and tried it on my blog today, and it gave me just four lines of output. Confused, I started looking at the code and realized that it finished in the middle of enqueuing its first set of links. With a new try/catch in HtmlChecker/enqueueLink, I found that the process was indeed silently crashing:
TypeError: source.hasOwnProperty is not a function
at cloneObject (lib/internal/linkObj.js:239:14)
at cloneObject (lib/internal/linkObj.js:245:18)
at cloneObject (lib/internal/linkObj.js:245:18)
at Function.linkObj.resolve (lib/internal/linkObj.js:168:64)
at enqueueLink (lib/public/HtmlChecker.js:158:10)
at lib/public/HtmlChecker.js:110:5
at process._tickCallback (internal/process/next_tick.js:103:7)
On node 6.1.0 it throws this error; on node 4.4.3 it works properly. It seems there are two bugs here: one is the hasOwnProperty issue, and the other is the silent crashing!