
datasift-connector's People

Contributors

andimiller, andyjs, chrisyoung77, dugjason, jamesbloomer, mheap, ollieparsley, quipo, stut

datasift-connector's Issues

Packer hangs

For release https://github.com/datasift/datasift-connector/releases/tag/1.0.19-1

I run ./build.sh ....

I get:

+ cd ../chef
+ berks vendor vendor/cookbooks
No entry for terminal type "rxvt-unicode-256color";
using dumb terminal settings.
Resolving cookbook dependencies...
Fetching 'datasift-kafka' from source at cookbooks/datasift-kafka
Fetching 'datasift-stats' from source at cookbooks/datasift-stats
Fetching 'datasift-writer' from source at cookbooks/datasift-writer
Fetching 'gnip-reader' from source at cookbooks/gnip-reader
Fetching 'historics-api' from source at cookbooks/historics-api
Fetching 'historics-reader' from source at cookbooks/historics-reader
Fetching 'influxdb' from https://github.com/datasift/chef-influxdb.git (at 232e2af)
Fetching 'kafka' from https://github.com/mthssdrbrg/kafka-cookbook.git (at v0.7.1)
Fetching 'supervisor' from https://github.com/poise/supervisor.git (at v0.4.12)
Fetching 'twitterapi-reader' from source at cookbooks/twitterapi-reader
Fetching 'webapp' from source at cookbooks/webapp
Fetching cookbook index from https://supermarket.chef.io...
Installing 7-zip (1.0.2)
Installing apt (2.7.0)
Installing ark (0.9.0)
E, [2015-08-13T09:36:08.353261 #10417] ERROR -- : Actor crashed!
Errno::ETIMEDOUT: Connection timed out - connect(2) for "app-supermarket-prod-i-6171d6a7.opscode.us" port 443
    /opt/chefdk/embedded/lib/ruby/2.1.0/net/http.rb:879:in `initialize'
    /opt/chefdk/embedded/lib/ruby/2.1.0/net/http.rb:879:in `open'
    /opt/chefdk/embedded/lib/ruby/2.1.0/net/http.rb:879:in `block in connect'
    /opt/chefdk/embedded/lib/ruby/2.1.0/timeout.rb:76:in `timeout'
    /opt/chefdk/embedded/lib/ruby/2.1.0/net/http.rb:878:in `connect'
    /opt/chefdk/embedded/lib/ruby/2.1.0/net/http.rb:863:in `do_start'
    /opt/chefdk/embedded/lib/ruby/2.1.0/net/http.rb:852:in `start'
    /opt/chefdk/embedded/lib/ruby/2.1.0/net/http.rb:1369:in `request'
    /opt/chefdk/embedded/lib/ruby/2.1.0/net/http.rb:1128:in `get'
    /opt/chefdk/embedded/lib/ruby/gems/2.1.0/gems/faraday-0.9.1/lib/faraday/adapter/net_http.rb:80:in `perform_request'
    /opt/chefdk/embedded/lib/ruby/gems/2.1.0/gems/faraday-0.9.1/lib/faraday/adapter/net_http.rb:40:in `block in call'
    /opt/chefdk/embedded/lib/ruby/gems/2.1.0/gems/faraday-0.9.1/lib/faraday/adapter/net_http.rb:87:in `with_net_http_connection'
    /opt/chefdk/embedded/lib/ruby/gems/2.1.0/gems/faraday-0.9.1/lib/faraday/adapter/net_http.rb:32:in `call'
    /opt/chefdk/embedded/lib/ruby/gems/2.1.0/gems/faraday-0.9.1/lib/faraday/request/retry.rb:110:in `call'
    /opt/chefdk/embedded/lib/ruby/gems/2.1.0/gems/ridley-4.2.0/lib/ridley/middleware/follow_redirects.rb:67:in `perform_with_redirection'
    /opt/chefdk/embedded/lib/ruby/gems/2.1.0/gems/ridley-4.2.0/lib/ridley/middleware/follow_redirects.rb:60:in `call'
    /opt/chefdk/embedded/lib/ruby/gems/2.1.0/gems/faraday-0.9.1/lib/faraday/response.rb:8:in `call'
    /opt/chefdk/embedded/lib/ruby/gems/2.1.0/gems/faraday-0.9.1/lib/faraday/response.rb:8:in `call'
    /opt/chefdk/embedded/lib/ruby/gems/2.1.0/gems/faraday-0.9.1/lib/faraday/rack_builder.rb:139:in `build_response'
    /opt/chefdk/embedded/lib/ruby/gems/2.1.0/gems/faraday-0.9.1/lib/faraday/connection.rb:377:in `run_request'
    /opt/chefdk/embedded/lib/ruby/gems/2.1.0/gems/faraday-0.9.1/lib/faraday/connection.rb:140:in `get'
    /opt/chefdk/embedded/apps/berkshelf/lib/berkshelf/community_rest.rb:119:in `find'
    /opt/chefdk/embedded/apps/berkshelf/lib/berkshelf/community_rest.rb:103:in `download'
    /opt/chefdk/embedded/apps/berkshelf/lib/berkshelf/downloader.rb:62:in `try_download'
    /opt/chefdk/embedded/apps/berkshelf/lib/berkshelf/downloader.rb:36:in `block in download'
    /opt/chefdk/embedded/apps/berkshelf/lib/berkshelf/downloader.rb:35:in `each'
    /opt/chefdk/embedded/apps/berkshelf/lib/berkshelf/downloader.rb:35:in `download'
    /opt/chefdk/embedded/apps/berkshelf/lib/berkshelf/installer.rb:105:in `install'
    /opt/chefdk/embedded/lib/ruby/gems/2.1.0/gems/celluloid-0.16.0/lib/celluloid/calls.rb:26:in `public_send'
    /opt/chefdk/embedded/lib/ruby/gems/2.1.0/gems/celluloid-0.16.0/lib/celluloid/calls.rb:26:in `dispatch'
    /opt/chefdk/embedded/lib/ruby/gems/2.1.0/gems/celluloid-0.16.0/lib/celluloid/calls.rb:63:in `dispatch'
    /opt/chefdk/embedded/lib/ruby/gems/2.1.0/gems/celluloid-0.16.0/lib/celluloid/cell.rb:60:in `block in invoke'
    /opt/chefdk/embedded/lib/ruby/gems/2.1.0/gems/celluloid-0.16.0/lib/celluloid/cell.rb:71:in `block in task'
    /opt/chefdk/embedded/lib/ruby/gems/2.1.0/gems/celluloid-0.16.0/lib/celluloid/actor.rb:357:in `block in task'
    /opt/chefdk/embedded/lib/ruby/gems/2.1.0/gems/celluloid-0.16.0/lib/celluloid/tasks.rb:57:in `block in initialize'
    /opt/chefdk/embedded/lib/ruby/gems/2.1.0/gems/celluloid-0.16.0/lib/celluloid/tasks/task_fiber.rb:15:in `block in create'
E, [2015-08-13T09:36:08.353510 #10417] ERROR -- : Actor crashed!

I tried this 4 times over a one hour period and always got the same error.

Maybe related to #12 ?

Do you think you could publish the Amazon AMI so that we wouldn't have to run Packer at all?

Disable auto retry on writer http client

After a rate limit is reached and the writer has finished waiting, the HTTP client automatically retries sending the POST, causing connection errors. The errors do not interfere with delivery of the message.
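The SSL stack trace quoted further down this page shows `org.apache.http` classes, so the writer plausibly uses Apache HttpClient. Assuming that, a sketch of building the client with automatic retries disabled (so only the writer's own wait-and-resend logic resends after a rate limit) would look like this; this is client configuration, not the connector's actual code:

```java
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;

// Sketch only: build a client that never retries a request by itself.
// After a rate-limit wait, the writer's own logic then decides whether
// to resend, avoiding the duplicate POST that triggers connection errors.
CloseableHttpClient client = HttpClients.custom()
        .disableAutomaticRetries()
        .build();
```

`disableAutomaticRetries()` has been available on `HttpClientBuilder` since HttpClient 4.3; if the writer is on an older 4.x, the equivalent is setting a no-op `HttpRequestRetryHandler`.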

Packer failing on InfluxDB database

Hey,

A new version of the InfluxDB Ruby gem has been released, and this breaks everything. The base Chef cookbook doesn't do any version pinning, so it isn't easily fixed.
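One way to make the build reproducible is to pin the gem version where it is installed. A hedged Chef recipe fragment follows; the version number is hypothetical and would need to be the last known-good release, and whether the cookbook installs the gem via `chef_gem` is an assumption:

```ruby
# Hypothetical pin; '0.1.9' stands in for the last known-good version.
chef_gem 'influxdb' do
  version '0.1.9'
  action :install
end
```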

Add an intermittent logging update on Gnip-Reader status

The Reader currently doesn't output any info-level logging once it has finished its start-up actions.

Add an info-level log call to summarise the Reader's status. It could contain the number of items received/sent so far, time connected, etc.

Undecided on duration between updates. Once every 10 seconds?
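A minimal sketch of such a periodic status summary, assuming a standalone scheduler thread (class and method names here are hypothetical, not from the connector):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: emit an info-level status line every `periodSeconds`
// summarising items received/sent and time connected.
public class StatusReporter {
    private final AtomicLong received = new AtomicLong();
    private final AtomicLong sent = new AtomicLong();
    private final long startMillis = System.currentTimeMillis();
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    public void start(long periodSeconds) {
        scheduler.scheduleAtFixedRate(
                this::logStatus, periodSeconds, periodSeconds, TimeUnit.SECONDS);
    }

    private void logStatus() {
        long uptimeSecs = (System.currentTimeMillis() - startMillis) / 1000;
        System.out.printf("INFO reader status: received=%d sent=%d connected=%ds%n",
                received.get(), sent.get(), uptimeSecs);
    }

    public void itemReceived()  { received.incrementAndGet(); }
    public void itemSent()      { sent.incrementAndGet(); }
    public long receivedCount() { return received.get(); }
    public long sentCount()     { return sent.get(); }
    public void stop()          { scheduler.shutdownNow(); }
}
```

A fixed-rate schedule keeps the updates evenly spaced regardless of how long each log call takes; 10 seconds is cheap enough that the interval choice is mostly about log volume.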

Problems with packer instructions in README.md

For version https://github.com/datasift/datasift-connector/releases/tag/1.0.15-1

I wanted to deploy the datasift-connector to EC2 and followed these instructions https://github.com/datasift/datasift-connector#quick-start---deployment-to-ec2 . I ran into a number of problems:

  • The quick start fails to mention that the Chef Development Kit (https://downloads.chef.io/chef-dk/) must be installed (it's mentioned only later in the page).
  • When Packer tried to create the t2.micro instance to load the first AMI, I was getting "A subnet ID or network interface ID is required to carry out the request." errors. I had to edit the datasift-connector/packer/ami.json file to set the instance type to m3.medium, which can run in EC2-Classic.
  • Once the AMI was provisioned and Packer started running, I had to accept the license agreement at https://aws.amazon.com/marketplace/ordering?productId=74e73035-3435-48d6-88e0-89cc02ad83ee&ref_=dtl_psb_continue&region=us-east-1 . There is no mention of that in the docs, and I also don't know what the financial impact is. Am I paying extra for the AMI used by Packer during the build? For the AMI that Packer ends up building and that I will be running long-term?
  • The image gets built in the us-east-1 region. I had to copy it manually to the EU region where my servers are running (which took about 30 minutes).
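For the last point, the cross-region copy can be scripted with the AWS CLI; the AMI ID below is a placeholder for whatever Packer prints at the end of the build, and the target region is an example:

```shell
# Copy the freshly built AMI from us-east-1 to an EU region.
# Replace ami-xxxxxxxx with the AMI ID reported by Packer.
aws ec2 copy-image \
  --source-region us-east-1 \
  --region eu-west-1 \
  --source-image-id ami-xxxxxxxx \
  --name "datasift-connector"
```

The copy runs asynchronously on AWS's side; the command returns a new AMI ID in the target region immediately, but the image is only usable once its state becomes `available`.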

Would it be possible for you to just publish the AMI with the datasift-connector to EC2?

When I restart the datasift-writer it resends items it already sent

For version https://github.com/datasift/datasift-connector/releases/tag/1.0.15-1

The resent tweets then get sent to the Gnip managed source in DataSift and pushed on from there. Finally I get duplicate tweets in my application (exactly the same content, but a different interaction.interaction.id).

In /var/log/datasift/writer.log I can see

 (2015,08,11,18,08,56,(059))  INFO com.dat.con.DataSiftWriter:170 - Initialising Kafka consumer manager
 (2015,08,11,18,08,56,(066))  INFO com.dat.con.wri.SimpleConsumerManager:377 - Consumer connecting to zookeeper instance at localhost:2181
 (2015,08,11,18,08,56,(125))  INFO org.I0I.zkc.ZkEventThread:64 - Starting ZkClient event thread.
 (2015,08,11,18,08,56,(374))  WARN org.apa.zoo.ClientCnxnSocket:139 - Connected to an old server; r-o mode will be unavailable
 (2015,08,11,18,08,56,(376))  INFO org.I0I.zkc.ZkClient:449 - zookeeper state changed (SyncConnected)
 (2015,08,11,18,08,56,(377))  INFO com.dat.con.wri.SimpleConsumerManager:467 - Consumer looking up leader for twitter-gnip, 0 at localhost:6667
 (2015,08,11,18,08,57,(371))  INFO com.dat.con.wri.SimpleConsumerManager:367 - Consumer is connecting to lead broker <redacted>:6667 under client id Client_twitter-gnip_0                                                                                                           
 (2015,08,11,18,08,57,(482))  INFO com.dat.con.wri.SimpleConsumerManager:370 - Consumer is going to being reading from offset 0
 (2015,08,11,18,08,57,(482))  INFO com.dat.con.DataSiftWriter:173 - Initialising bulk uploads
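The "reading from offset 0" line in the log above suggests the consumer's last-delivered offset is not being persisted across restarts. A purely illustrative sketch of offset checkpointing follows (a file-backed store here; the real fix for this Kafka setup would more likely commit offsets to ZooKeeper, and none of these names come from the connector):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration, not the connector's code: persist the last
// delivered Kafka offset so a restarted writer resumes from it instead
// of re-reading (and resending) the partition from offset 0.
public class OffsetCheckpoint {
    private final Path file;

    public OffsetCheckpoint(Path file) {
        this.file = file;
    }

    /** Last committed offset, or 0 on the very first run. */
    public long load() throws IOException {
        if (!Files.exists(file)) {
            return 0L;
        }
        return Long.parseLong(Files.readString(file).trim());
    }

    /** Record the offset once a batch has been delivered downstream. */
    public void save(long offset) throws IOException {
        Files.writeString(file, Long.toString(offset));
    }
}
```

Checkpointing only after successful delivery gives at-least-once semantics: a crash between delivery and checkpoint can still duplicate a small window of messages, but a clean restart no longer replays everything.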

Nginx Chef recipe can potentially throw NoMethodError during AMI provisioning

The exception appears to be rare, but it has been encountered while Packer provisions a t2.micro builder instance. It has not been observed since defaulting to t2.small as the instance_type in ami.json, but further investigation is needed. Re-running build.sh usually leads to a successful build.

Recipe: nginx::package
    amazon-ebs: * yum_package[nginx] action install
    amazon-ebs:
    amazon-ebs: ================================================================================
    amazon-ebs: Error executing action `install` on resource 'yum_package[nginx]'
    amazon-ebs: ================================================================================
    amazon-ebs:
    amazon-ebs: NoMethodError
    amazon-ebs: -------------
    amazon-ebs: undefined method `>' for nil:NilClass
    amazon-ebs:
    amazon-ebs: Resource Declaration:
    amazon-ebs: ---------------------
    amazon-ebs: # In /tmp/packer-chef-solo/cookbooks-0/nginx/recipes/package.rb
    amazon-ebs:
    amazon-ebs: 41: package node['nginx']['package_name'] do
    amazon-ebs: 42:   options package_install_opts
    amazon-ebs: 43:   notifies :reload, 'ohai[reload_nginx]', :immediately
    amazon-ebs: 44:   not_if 'which nginx'
    amazon-ebs: 45: end
    amazon-ebs: 46:

Writer requires restarting after VM boots

  1. Build VM and provision using Vagrant.
  2. Restart the VM via Vagrant
  3. datasift-writer service does not operate correctly, as reported by Grafana dashboard.

Issue is intermittent.

Support for Docker deployment via Packer

The packer/build.sh script currently defaults to building an AMI for EC2 deployment.

Docker deployment via docker.json requires a separate shell script, or modifications to the existing script, to build correctly. If deployed to a Docker container, the Grafana dashboard currently reports N/A metrics. Additional work is required to ensure the connector operates correctly.
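Until a dedicated script exists, the Docker template can be built by pointing Packer at it directly. The template's location in the packer/ directory alongside ami.json is an assumption here:

```shell
# Build the Docker variant instead of the default AMI.
# Path to docker.json is assumed; adjust to where the template lives.
cd packer
packer build docker.json
```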

Ami does not exist

The README states that source_ami should be changed to one of the AMIs in the table.
However, setting 'source_ami' to ami-b3523089
results in: "ami-b3523089 does not exist"

What AMI should I be using for ap-southeast-1/2 now?

What am I supposed to set for SourceID in DataSift Writer config?

I have been using DataSift to pull historical data from Twitter and now have to move to GNIP. For that, I just launched an AWS instance with the DataSift Connector (using your instructions) and I want to connect it to my DataSift account. There's a SourceID parameter that I need to provide. It is used in the POST URL:

POST https://in.datasift.com:443/<SourceID>

It should definitely be something like twitter:

POST https://in.datasift.com:443/twitter/

but for me it is not working. It keeps saying {"error":"Unknown user source"}. Any insights on that?

NOTE: Twitter is activated in my account (https://www.evernote.com/l/ACcmvZQu8xFEtJ-U25nPg1v8sDjk_jy2cjw).
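The "Unknown user source" error suggests the path segment did not match any source on the account, so the SourceID is most likely the managed source's own ID (typically a UUID shown in the DataSift dashboard) rather than the literal word "twitter". This is an inference, not confirmed by the source; a hedged check against the endpoint (any required authentication headers omitted, the source ID is a placeholder):

```shell
# Hypothetical check: POST a single JSON line to the ingestion endpoint,
# using the managed source's ID from the DataSift dashboard in the path.
curl -X POST "https://in.datasift.com:443/<source-id>" \
  -H "Content-Type: application/json" \
  -d '{"id": "1", "body": "test"}'
```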

ssl error when datasift-writer tries to connect to https://in.datasift.com

Using the packer distribution from the latest release: https://github.com/datasift/datasift-connector/releases/tag/1.0.15-1

/var/log/datasift/writer.log shows:

(2015,08,11,15,20,54,(996)) ERROR com.dat.con.wri.BulkManager:282 - Could not connect to ingestion endpoint
 javax.net.ssl.SSLException: java.security.ProviderException: java.security.KeyException
    at sun.security.ssl.Alerts.getSSLException(Alerts.java:208)
    at sun.security.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1916)
    at sun.security.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1874)
    at sun.security.ssl.SSLSocketImpl.handleException(SSLSocketImpl.java:1857)
    at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1378)
    at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1355)
    at org.apache.http.conn.ssl.SSLConnectionSocketFactory.createLayeredSocket(SSLConnectionSocketFactory.java:394)
    at org.apache.http.conn.ssl.SSLConnectionSocketFactory.connectSocket(SSLConnectionSocketFactory.java:353)
    at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:134)

I was able to fix by upgrading the jdk to version 8 update 45 (I followed http://tecadmin.net/install-java-8-on-centos-rhel-and-fedora/ )
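For others hitting this, on CentOS/Amazon Linux the JDK check and upgrade can be done from the distro repositories rather than a manual install; the package name below is the usual CentOS one but may differ by distribution:

```shell
# Check the running JDK first; the SSLException/KeyException above
# was reported on an older JDK and went away on Java 8.
java -version

# Install OpenJDK 8 and switch the system default to it.
sudo yum install -y java-1.8.0-openjdk
sudo alternatives --config java   # select the 1.8.0 binary
```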
