
sinew's People

Contributors

gurgeous, nkriege, sixtyfive


sinew's Issues

License missing from gemspec

RubyGems.org doesn't report a license for your gem. This is because it is not specified in the gemspec of your last release.

You can specify one in the gemspec, e.g.:

  spec.license = 'MIT'
  # or
  spec.licenses = ['MIT', 'GPL-2']

Including a license in your gemspec is an easy way for rubygems.org and other tools to check how your gem is licensed. As you can imagine, scanning your repository for a LICENSE file or parsing the README, and then attempting to identify the license or licenses is much more difficult and more error prone. So, even for projects that already specify a license, including a license in your gemspec is a good practice. See, for example, how rubygems.org uses the gemspec to display the rails gem license.

There is even a License Finder gem to help companies/individuals ensure all gems they use meet their licensing needs. This tool depends on license information being available in the gemspec. This is an important enough issue that even Bundler now generates gems with a default 'MIT' license.
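As a rough illustration of why the gemspec field matters to tooling, the sketch below builds a gemspec in memory and reads its license back through the standard Gem::Specification accessors. The gem name, author, and the license_report helper are hypothetical and exist only for this example; this is not how License Finder itself is implemented.

```ruby
require 'rubygems'

# Hypothetical gemspec, built in memory purely for illustration.
spec = Gem::Specification.new do |s|
  s.name    = 'example_gem'
  s.version = '0.1.0'
  s.summary = 'Illustrative gemspec'
  s.authors = ['Example Author']
  s.files   = []
  s.license = 'MIT' # stored as the single-element list ['MIT']
end

# Hypothetical helper: summarize what a tool can learn from the gemspec alone.
def license_report(spec)
  licenses = spec.licenses
  if licenses.empty?
    "#{spec.name}: no license declared"
  else
    "#{spec.name}: #{licenses.join(', ')}"
  end
end

puts license_report(spec)
```

A gemspec that never sets license or licenses reports an empty list, which is the situation this issue describes.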

I hope you'll consider specifying a license in your gemspec. If not, please just close the issue with a nice message. In either case, I'll follow up. Thanks for your time!

Appendix:

If you need help choosing a license (sorry, I haven't checked your readme or looked for a license file), GitHub has created a license picker tool. Code without a specified license defaults to 'All rights reserved', which denies others all rights to use the code.
Here's a list of the license names I've found and their frequencies.

P.S. In case you're wondering how I found you and why I made this issue, it's because I'm collecting stats on gems (I was originally looking for download data) and decided to collect license metadata, too, and to open issues for gemspecs that don't specify a license as a public service :). See the previous link or my blog post about this project for more information.

Bestsellers.sinew: "superclass mismatch for class DateTime (TypeError)"

Hi! Sinew's approach to scraping is exactly what I'm after. The trouble is that after installing the gem and running the Amazon example, I got the following error. Have you seen this kind of thing before? Personally, I have not, and Google's results for "superclass mismatch for class DateTime" are uncharacteristically sparse, as far as error-searching goes.

/home/u/.rbenv/versions/2.0.0-p247/lib/ruby/2.0.0/rubygems/core_ext/kernel_require.rb:106:in `require': superclass mismatch for class DateTime (TypeError)
    from /home/u/.rbenv/versions/2.0.0-p247/lib/ruby/2.0.0/rubygems/core_ext/kernel_require.rb:106:in `require'
    from /home/u/.rbenv/versions/2.0.0-p247/lib/ruby/2.0.0/date.rb:3:in `<top (required)>'
    from /home/u/.rbenv/versions/2.0.0-p247/lib/ruby/2.0.0/rubygems/core_ext/kernel_require.rb:106:in `require'
    from /home/u/.rbenv/versions/2.0.0-p247/lib/ruby/2.0.0/rubygems/core_ext/kernel_require.rb:106:in `require'
    from /home/u/.rbenv/versions/2.0.0-p247/lib/ruby/gems/2.0.0/gems/activesupport-4.0.0/lib/active_support/core_ext/string/conversions.rb:1:in `<top (required)>'
    from /home/u/.rbenv/versions/2.0.0-p247/lib/ruby/2.0.0/rubygems/core_ext/kernel_require.rb:58:in `require'
    from /home/u/.rbenv/versions/2.0.0-p247/lib/ruby/2.0.0/rubygems/core_ext/kernel_require.rb:58:in `require'
    from /home/u/.rbenv/versions/2.0.0-p247/lib/ruby/gems/2.0.0/gems/activesupport-4.0.0/lib/active_support/core_ext/string.rb:1:in `<top (required)>'
    from /home/u/.rbenv/versions/2.0.0-p247/lib/ruby/2.0.0/rubygems/core_ext/kernel_require.rb:106:in `require'
    from /home/u/.rbenv/versions/2.0.0-p247/lib/ruby/2.0.0/rubygems/core_ext/kernel_require.rb:106:in `require'
    from /home/u/.rbenv/versions/2.0.0-p247/lib/ruby/gems/2.0.0/gems/activesupport-4.0.0/lib/active_support/core_ext.rb:3:in `block in <top (required)>'
    from /home/u/.rbenv/versions/2.0.0-p247/lib/ruby/gems/2.0.0/gems/activesupport-4.0.0/lib/active_support/core_ext.rb:1:in `each'
    from /home/u/.rbenv/versions/2.0.0-p247/lib/ruby/gems/2.0.0/gems/activesupport-4.0.0/lib/active_support/core_ext.rb:1:in `<top (required)>'
    from /home/u/.rbenv/versions/2.0.0-p247/lib/ruby/2.0.0/rubygems/core_ext/kernel_require.rb:106:in `require'
    from /home/u/.rbenv/versions/2.0.0-p247/lib/ruby/2.0.0/rubygems/core_ext/kernel_require.rb:106:in `require'
    from /home/u/.rbenv/versions/2.0.0-p247/lib/ruby/gems/2.0.0/gems/sinew-1.0.3/lib/sinew/text_util.rb:1:in `<top (required)>'
    from /home/u/.rbenv/versions/2.0.0-p247/lib/ruby/2.0.0/rubygems/core_ext/kernel_require.rb:58:in `require'
    from /home/u/.rbenv/versions/2.0.0-p247/lib/ruby/2.0.0/rubygems/core_ext/kernel_require.rb:58:in `require'
    from /home/u/.rbenv/versions/2.0.0-p247/lib/ruby/gems/2.0.0/gems/sinew-1.0.3/lib/sinew.rb:5:in `<top (required)>'
    from /home/u/.rbenv/versions/2.0.0-p247/lib/ruby/2.0.0/rubygems/core_ext/kernel_require.rb:58:in `require'
    from /home/u/.rbenv/versions/2.0.0-p247/lib/ruby/2.0.0/rubygems/core_ext/kernel_require.rb:58:in `require'
    from /home/u/.rbenv/versions/2.0.0-p247/lib/ruby/gems/2.0.0/gems/sinew-1.0.3/bin/sinew:3:in `<top (required)>'
    from /home/u/.rbenv/versions/2.0.0-p247/bin/sinew:23:in `load'
    from /home/u/.rbenv/versions/2.0.0-p247/bin/sinew:23:in `<main>'
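
For readers unfamiliar with this TypeError: in general, Ruby raises it when a class that already exists is reopened with a different explicit superclass. The sketch below reproduces the message with made-up classes (Base, Other, and Record are hypothetical). It only illustrates the mechanism; it does not establish which two definitions of DateTime collided in the trace above.

```ruby
class Base; end
class Other; end

# First definition: Record inherits from Base.
class Record < Base; end

# Reopening Record with a different superclass raises TypeError.
begin
  class Record < Other; end
rescue TypeError => e
  mismatch_message = e.message
end

puts mismatch_message
```

Reopening the class without naming a superclass (class Record; end) does not raise, which is why this kind of conflict usually points at two libraries each defining the class with its own parent.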

Sorry, Sinew requires tidy. Please install it.

which: no tidy in (/usr/local/rvm/gems/ruby-1.9.2-p320/bin:/usr/local/rvm/gems/ruby-1.9.2-p320@global/bin:/usr/local/rvm/rubies/ruby-1.9.2-p320/bin:/usr/local/rvm/bin:/usr/lib64/qt-3.3/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin)
[17:57:30] Sorry, Sinew requires tidy. Please install it.

I have installed tidy via gem install tidy.

Please help.
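
The which output above shows that no executable named tidy was found in any PATH directory. A Ruby gem named tidy would not necessarily place such an executable there, so the command-line program probably has to be installed separately (an assumption, since the thread does not confirm it). A minimal sketch for running the same kind of lookup from Ruby follows; find_on_path is a hypothetical helper, not part of Sinew.

```ruby
# Hypothetical helper: return the full path of the first executable called
# `name` found in the given PATH-style string, or nil if there is none.
def find_on_path(name, path = ENV.fetch('PATH', ''))
  path.split(File::PATH_SEPARATOR).each do |dir|
    next if dir.empty?

    candidate = File.join(dir, name)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

tidy = find_on_path('tidy')
puts(tidy ? "tidy found at #{tidy}" : 'tidy not found on PATH')
```

If this prints "tidy not found on PATH", the tidy program is not reachable from the environment in which the script runs.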

HTML5

So I talked to the Nokogiri people and it turns out that they have problems getting HTML5 support out of the underlying libxml. The only reliable way right now is to use nokogumbo, which uses Google's gumbo instead. Is there anything that would speak against changing sinew to use that?

Using Sinew as class

It should be possible to use Sinew in other projects as a normal Ruby class. It is common to use a crawler from within another project, and there should be a way to call it directly.
Right now Sinew is shipped only as a binary.

How to get rid of the transliteration magic?

Hi, first off, thanks for a great gem! I'm trying to do some work with Arabic websites and was already able to put the basics together following the docs and examples. Unfortunately, the CSV file contains no Arabic characters at all. E.g., the name of the American Secretary of State John Kerry, in Arabic "كيري", shows up as "kyry", which is entirely unusable for most scientific purposes. Apparently Sinew got the text off the website just fine, but it's applying some sort of unwanted transliteration to it. Could you give me a hint as to how I could have the original Arabic text (in UTF-8 if at all possible) written to the CSV instead?
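
For comparison, the sketch below uses only Ruby's standard csv library, not Sinew's own CSV writer, to show that Arabic text survives a write-and-read round trip unchanged when nothing transliterates it on the way. The column names are made up for the example.

```ruby
require 'csv'

name = 'كيري' # UTF-8 string literal, as found on the page

# Write a header row and one data row into an in-memory CSV string.
csv_text = CSV.generate do |csv|
  csv << %w[name_en name_ar]
  csv << ['Kerry', name]
end

# Read it back and confirm the Arabic field is intact.
parsed = CSV.parse(csv_text, headers: true)
puts parsed.first['name_ar']
```

This suggests the transliteration happens in a text-cleaning step before the row is emitted, not in CSV generation as such, though the thread does not say where.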

Sinew is Ruby but how exactly?

Trying to do something that might not have been part of the plan:

def get_article_body(url)
  get url
  noko.at_css('article').text
end

def get_list_of_articles(url)
  get url
  noko.css('.articles').each do |div|
    path = div.at_css('.header a')[:href]
    row[:url] = path
    row[:body] = get_article_body("#{url}/#{path}")
    csv_emit(row)
  end
end

get_list_of_articles("http://www.domain.tld")

Unfortunately, only the first call to get works; after that, nothing seems to happen anymore. Am I going about this wrong, or is it just not supported?
