
snapshooter's People

Contributors

arboleya, hems, thiagocoelho


snapshooter's Issues

TypeError: Cannot call method 'exit' of undefined

When testing snapshooter with a local address, I got an error after a timeout occurred.

• ERROR http://hems.local:11235/store-locator took too long to render, skipping

/usr/local/lib/node_modules/snapshooter/src/core/crawler.coffee:111
do @ph.exit
^
TypeError: Cannot call method 'exit' of undefined
at Crawler.module.exports.Crawler.error (/usr/local/lib/node_modules/snapshooter/src/core/crawler.coffee:111:7)
at module.exports.Crawler.keep_on_checking (/usr/local/lib/node_modules/snapshooter/src/core/crawler.coffee:94:17)
at Proto.apply (/usr/local/lib/node_modules/snapshooter/node_modules/phantom/node_modules/dnode/node_modules/dnode-protocol/index.js:123:13)
at Proto.handle (/usr/local/lib/node_modules/snapshooter/node_modules/phantom/node_modules/dnode/node_modules/dnode-protocol/index.js:99:19)
at D.dnode.handle (/usr/local/lib/node_modules/snapshooter/node_modules/phantom/node_modules/dnode/lib/dnode.js:140:21)
at D.dnode.write (/usr/local/lib/node_modules/snapshooter/node_modules/phantom/node_modules/dnode/lib/dnode.js:128:22)
at SockJSConnection.ondata (stream.js:38:26)
at SockJSConnection.EventEmitter.emit (events.js:88:17)
at Session.didMessage (/usr/local/lib/node_modules/snapshooter/node_modules/phantom/node_modules/shoe/node_modules/sockjs/lib/transport.js:207:25)
at WebSocketReceiver.didMessage (/usr/local/lib/node_modules/snapshooter/node_modules/phantom/node_modules/shoe/node_modules/sockjs/lib/trans-websocket.js:109:40)
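
The trace suggests that `@ph` had not been assigned yet when the timeout handler ran. A minimal sketch of a guard, written in plain JavaScript rather than CoffeeScript; `safeExit` is a hypothetical helper, not part of snapshooter, and it only assumes the handle exposes `exit()` as the trace shows:

```javascript
// Hypothetical helper: shut down the PhantomJS handle only if it exists.
// `ph` stands in for the crawler's `@ph` property seen in the trace.
function safeExit(ph) {
  if (ph && typeof ph.exit === "function") {
    ph.exit();
    return true; // the instance was shut down
  }
  return false; // nothing to shut down, e.g. the timeout fired first
}

// Usage with a stand-in object instead of a real phantom instance.
let exited = false;
const fakePh = { exit: () => { exited = true; } };
console.log(safeExit(undefined)); // no TypeError is thrown here
console.log(safeExit(fakePh), exited);
```

Whether the real fix belongs in `error` or in the code that creates the instance is something only the maintainers can tell.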

Ability to crawl local HTML files

Besides the existing signature:

snapshooter [http://your_url] [output_folder]

It'd be nice to be able to crawl local websites without going through HTTP:

snapshooter [local_html_file] [output_folder]

Is this working?
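
One possible way to support both signatures is to convert a local path into a `file://` URL before handing it to the browser. This is only a sketch: `toCrawlableUrl` is a hypothetical name, and it assumes the headless browser can open `file://` URLs.

```javascript
const path = require("path");
const { pathToFileURL } = require("url");

// Hypothetical helper: turn a CLI argument into a URL a browser can open.
// Arguments that already carry an http(s) or file scheme are returned unchanged;
// anything else is treated as a local path and converted to a file:// URL.
function toCrawlableUrl(target) {
  if (/^(https?|file):\/\//i.test(target)) {
    return target;
  }
  return pathToFileURL(path.resolve(target)).href;
}

console.log(toCrawlableUrl("http://your_url"));
console.log(toCrawlableUrl("./site/index.html"));
```

Links found inside a local page would still need to be resolved relative to that file, which this sketch does not cover.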

Using "http" before the URL crashes the app

So when trying to start a server to render a local theoricus app:

snapshooter -i http://hems.local:11235 -s -P 3000 -o snapshooting

I get the following error:

/usr/local/lib/node_modules/snapshooter/src/core/shoot.coffee:90
first_url = first_url.replace //index.\w+$/m, ''
^
TypeError: Cannot call method 'replace' of undefined
at new Shoot (/usr/local/lib/node_modules/snapshooter/src/core/shoot.coffee:90:16)
at Snapshooter.module.exports.Snapshooter.shoot (/usr/local/lib/node_modules/snapshooter/src/snapshooter.coffee:91:8)
at module.exports.Snapshooter.init (/usr/local/lib/node_modules/snapshooter/src/snapshooter.coffee:78:15)
at ReadStream.module.exports.Snapshooter.prompt (/usr/local/lib/node_modules/snapshooter/src/snapshooter.coffee:130:8)
at ReadStream.g (events.js:185:14)
at ReadStream.EventEmitter.emit (events.js:88:17)
at TTY.onread (net.js:396:14)
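
The crash happens because `first_url` is undefined by the time `replace` is called. A sketch of a guard that fails with a readable message instead; `normalizeFirstUrl` is a hypothetical name, and the pattern is an assumption about the intent of the line in the trace (dropping a trailing `/index.<ext>`):

```javascript
// Hypothetical guard around the step that fails in the trace above.
// The pattern is an assumption: it drops a trailing "/index.<ext>".
function normalizeFirstUrl(firstUrl) {
  if (typeof firstUrl !== "string" || firstUrl.length === 0) {
    throw new Error("No input URL was provided; nothing to crawl");
  }
  return firstUrl.replace(/\/index\.\w+$/, "");
}

console.log(normalizeFirstUrl("http://hems.local:11235/index.html"));
```

This does not explain why the URL passed with `-i` never reaches `Shoot`; it only turns the TypeError into a clearer failure.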

Error scanning URL

Maybe I'm doing something wrong, but when I execute the following command:
snapshooter http://bedhead.dev/ www/

I get the following error:

- initializing...
 > http://bedhead.dev/
 - scanning links - http://bedhead.dev/

/Users/LMotta/Desktop/snapshooter/src/shoot.coffee:121
      filename = (reg.exec(url))[1];
                                ^
TypeError: Cannot read property '1' of null
    at Shoot.module.exports.Shoot.save_page (/Users/LMotta/Desktop/snapshooter/src/shoot.coffee:121:33)
    at Shoot.module.exports.Shoot.after_render (/Users/LMotta/Desktop/snapshooter/src/shoot.coffee:66:14)
    at module.exports.Shoot.get (/Users/LMotta/Desktop/snapshooter/src/shoot.coffee:56:15)
    at Object.module.exports.Crawler.keep_on_checking [as cb] (/Users/LMotta/Desktop/snapshooter/src/crawler.coffee:48:18)
    at Socket.module.exports.create.io.sockets.on.socket.on.id (/Users/LMotta/Desktop/snapshooter/node_modules/node-phantom/node-phantom.js:156:19)
    at Socket.EventEmitter.emit [as $emit] (events.js:88:17)
    at SocketNamespace.handlePacket (/Users/LMotta/Desktop/snapshooter/node_modules/node-phantom/node_modules/socket.io/lib/namespace.js:335:22)
    at Manager.onClientMessage (/Users/LMotta/Desktop/snapshooter/node_modules/node-phantom/node_modules/socket.io/lib/manager.js:488:38)
    at WebSocket.Transport.onMessage (/Users/LMotta/Desktop/snapshooter/node_modules/node-phantom/node_modules/socket.io/lib/transport.js:387:20)
    at Parser.<anonymous> (/Users/LMotta/Desktop/snapshooter/node_modules/node-phantom/node_modules/socket.io/lib/transports/websocket/default.js:36:10)

Any ideas of what that could be?
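
For what it's worth, `RegExp.prototype.exec` returns `null` when nothing matches, so indexing the result with `[1]` throws for a URL the pattern does not cover, such as one ending in `/`. A sketch of a null-safe version follows; the pattern and the `index.html` fallback are illustrative assumptions, not snapshooter's actual logic:

```javascript
// Hypothetical filename derivation; pattern and fallback are illustrative only.
function filenameFor(url, fallback = "index.html") {
  // new URL() throws on malformed input; callers may want to catch that.
  const pathname = new URL(url).pathname;
  // Capture the last non-empty path segment, ignoring one trailing slash.
  const match = /([^\/]+)\/?$/.exec(pathname);
  // exec returns null when there is no segment, so check before indexing.
  return match ? match[1] : fallback;
}

console.log(filenameFor("http://bedhead.dev/"));
console.log(filenameFor("http://bedhead.dev/about"));
```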

Consider using Selenium instead of PhantomJS

PhantomJS doesn't support audio/video tags, which inevitably adds complexity to the code if you want full indexing: your page must not render any audio/video tags in indexing mode for PhantomJS to be able to index it properly.

One way around this drawback is to use Selenium automation instead of PhantomJS. There are successful cases of running Selenium headlessly with Firefox and Xvfb on *nix systems.

This sounds worth trying as a way to provide full indexing on a real browser, without these incompatibilities.

Add introspection (code) usage

Add the ability to use snapshooter as a library from other code. This is useful for integrating it into other libraries under the hood.

Reuse phantom's instance

Phantom's instance can be reused to increase performance.

In basic tests, reusing the instance increased speed by 60%, which should help when crawling large websites recursively.
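
The idea can be sketched as a small wrapper that creates the instance once and hands the same promise to every caller. `reusable` and `createInstance` are hypothetical names; `createInstance` stands in for whatever actually starts PhantomJS:

```javascript
// Hypothetical wrapper: run the expensive factory once and share the result.
// `createInstance` stands in for whatever starts the PhantomJS process.
function reusable(createInstance) {
  let pending = null;
  return function getInstance() {
    if (pending === null) {
      // The factory runs on first use only; later calls get the same promise.
      pending = Promise.resolve().then(createInstance);
    }
    return pending;
  };
}

// Usage with a stand-in factory that counts how often it runs.
let launches = 0;
const getBrowser = reusable(() => {
  launches += 1;
  return { id: launches };
});

Promise.all([getBrowser(), getBrowser(), getBrowser()]).then(([a, b, c]) => {
  console.log(a === b && b === c, launches);
});
```

A real implementation would also need to decide when to shut the shared instance down, for example after the last queued page has been rendered.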
