deadoralive's Introduction

CKAN: The Open Source Data Portal Software

CKAN is the world’s leading open-source data portal platform. CKAN makes it easy to publish, share and work with data. It's a data management system that provides a powerful platform for cataloging, storing and accessing datasets with a rich front-end, full API (for both data and catalog), visualization tools and more. Read more at ckan.org.

Installation

See the CKAN Documentation for installation instructions.

Support

If you need help with CKAN or want to ask a question, use either the ckan-dev mailing list, the CKAN chat on Gitter, or the CKAN tag on Stack Overflow (try searching the Stack Overflow and ckan-dev archives for an answer to your question first).

If you've found a bug in CKAN, open a new issue on CKAN's GitHub Issues (try searching first to see if there's already an issue for your bug).

If you find a potential security vulnerability, please email [email protected] rather than creating a public issue on GitHub.

Contributing to CKAN

For contributing to CKAN or its documentation, see CONTRIBUTING.

Mailing List

Subscribe to the ckan-dev mailing list to receive news about upcoming releases and future plans as well as questions and discussions about CKAN development, deployment, etc.

Community Chat

If you want to talk about CKAN development, say hi to the CKAN developers and members of the CKAN community on the public CKAN chat on Gitter. Gitter is free and open-source; you can sign in with your GitHub, GitLab, or Twitter account.

The logs for the old #ckan IRC channel (2014 to 2018) can be found here: https://github.com/ckan/irc-logs.

Wiki

If you've figured out how to do something with CKAN and want to document it for others, make a new page on the CKAN wiki and tell us about it on the ckan-dev mailing list or on Gitter.

Copying and License

This material is copyright (c) 2006-2023 Open Knowledge Foundation and contributors.

It is open and licensed under the GNU Affero General Public License (AGPL) v3.0 whose full text may be found at:

http://www.fsf.org/licensing/licenses/agpl-3.0.html

deadoralive's People

Contributors

amercader, movermeyer, seanh

deadoralive's Issues

SSL version issues

We had problems with some sites that only respond when a particular SSL version is used: some work with SSLv23 and some only with SSLv3. So we retry with SSLv3 when the SSLv23 handshake fails:

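import logging

import requests
from requests_ssl import SSLv3Adapter  # adapter from the original snippet; not part of requests itself

log = logging.getLogger(__name__)


def request_with_ssl_fallback(func, *args, **kwargs):
    """Call func (requests.get or requests.post), retrying over SSLv3 if the SSLv23 handshake fails."""
    try:
        return func(*args, **kwargs)
    except requests.exceptions.ConnectionError as e:
        if 'SSL23_GET_SERVER_HELLO' not in str(e):
            raise
        log.info('SSLv23 failed so trying again using SSLv3: %r', args)
        requests_session = requests.Session()
        requests_session.mount('https://', SSLv3Adapter())
        # Swap the module-level function for the equivalent session method.
        func = {requests.get: requests_session.get,
                requests.post: requests_session.post}[func]
        return func(*args, **kwargs)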

See: http://redmine.dguteam.org.uk/issues/1439

AttributeError: 'NoneType' object has no attribute 'encode'

It looks like there might be a problem with resources that have no URL; in the traceback below, httplib ends up with a host of None:

Traceback (most recent call last):
  File "deadoralive/deadoralive.py", line 243, in <module>
    main()
  File "deadoralive/deadoralive.py", line 239, in main
    get_url_for_id, check_url, upsert_result)
  File "deadoralive/deadoralive.py", line 192, in get_check_and_report
    result = check_url(url)
  File "deadoralive/deadoralive.py", line 93, in check_url
    response = requests.get(url)
  File "/home/seanh/.virtualenvs/ckan/local/lib/python2.7/site-packages/requests/api.py", line 55, in get
    return request('get', url, **kwargs)
  File "/home/seanh/.virtualenvs/ckan/local/lib/python2.7/site-packages/requests/api.py", line 44, in request
    return session.request(method=method, url=url, **kwargs)
  File "/home/seanh/.virtualenvs/ckan/local/lib/python2.7/site-packages/requests/sessions.py", line 279, in request
    resp = self.send(prep, stream=stream, timeout=timeout, verify=verify, cert=cert, proxies=proxies)
  File "/home/seanh/.virtualenvs/ckan/local/lib/python2.7/site-packages/requests/sessions.py", line 374, in send
    r = adapter.send(request, **kwargs)
  File "/home/seanh/.virtualenvs/ckan/local/lib/python2.7/site-packages/requests/adapters.py", line 174, in send
    timeout=timeout
  File "/home/seanh/.virtualenvs/ckan/local/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py", line 422, in urlopen
    body=body, headers=headers)
  File "/home/seanh/.virtualenvs/ckan/local/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py", line 274, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/lib/python2.7/httplib.py", line 973, in request
    self._send_request(method, url, body, headers)
  File "/usr/lib/python2.7/httplib.py", line 1001, in _send_request
    self.putrequest(method, url, **skips)
  File "/usr/lib/python2.7/httplib.py", line 911, in putrequest
    host_enc = self.host.encode("ascii")
AttributeError: 'NoneType' object has no attribute 'encode'
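A possible guard is to validate each URL before handing it to requests. The sketch below uses a hypothetical helper, is_checkable_url(), that accepts only absolute http or https URLs with a host, so a resource whose URL is None, empty or host-less (such as "http:///path") could be reported as invalid instead of crashing inside httplib. It assumes Python 3; on Python 2, urlsplit is imported from the urlparse module instead.

```python
from urllib.parse import urlsplit


def is_checkable_url(url):
    """Return True if url is an absolute http(s) URL that names a host."""
    if not url:
        # None or "" (a resource with no URL at all).
        return False
    try:
        parts = urlsplit(url)
    except ValueError:
        # e.g. a malformed IPv6 literal such as "http://[::1"
        return False
    return parts.scheme in ("http", "https") and bool(parts.hostname)
```

The checker would then skip the request and record a failed check for any URL this rejects.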

Multiprocessing

The link checker is waiting around a lot of the time:

  • Waiting to get resource IDs or URLs from CKAN
  • Waiting to see whether checking a link succeeds or fails
  • Waiting when posting a result back to CKAN
  • Waiting in order to not hit the same domain too frequently

Whenever a resource check task is waiting on any of the above, the link checker could be getting on with another resource check.
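Because this waiting is I/O-bound, running checks concurrently lets one check proceed while another waits. A minimal sketch, assuming Python 3 and using threads (concurrent.futures.ThreadPoolExecutor) rather than separate processes; check_all() and the check_url callable passed to it are hypothetical names, not deadoralive's actual functions:

```python
from concurrent.futures import ThreadPoolExecutor


def check_all(resources, check_url, max_workers=10):
    """Check many links concurrently and return {resource_id: result}.

    resources: iterable of (resource_id, url) pairs.
    check_url: hypothetical callable that takes a URL and returns a result.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit every check up front; while one check waits on the network,
        # the other worker threads carry on with other resources.
        futures = {pool.submit(check_url, url): resource_id
                   for resource_id, url in resources}
        # future.result() blocks until that check finishes (or re-raises its error).
        return {futures[future]: future.result() for future in futures}
```

ProcessPoolExecutor offers the same interface with real processes, but the function it runs must then be picklable (for example a module-level function), and shared state such as rate limiting has to live outside any single process.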

Note that we want to put in rate limiting so that when it has multiple URLs on the same domain to check, it doesn't hit that domain too many times too quickly. This rate limiting will have to work across processes.
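One way to make per-domain rate limiting work across workers is to keep the "next allowed time" for each domain in a shared dict guarded by a lock. In this sketch (Python 3, hypothetical DomainRateLimiter class) the dict and lock are injected: plain threading objects work within one process, and multiprocessing.Manager() proxies (manager.dict() and manager.Lock()) could be passed in to share the same state between worker processes. The minimum interval is an assumed parameter, not a value from the project.

```python
import threading
import time
from urllib.parse import urlsplit


class DomainRateLimiter:
    """Enforce a minimum interval between requests to the same domain.

    next_allowed and lock may be threading objects (one process) or
    multiprocessing.Manager() proxies shared by several worker processes.
    """

    def __init__(self, min_interval, next_allowed=None, lock=None):
        self.min_interval = min_interval
        # domain -> earliest wall-clock time the next request to it may start.
        # Wall-clock time is used because time.monotonic() values are not
        # documented as comparable between processes.
        self.next_allowed = {} if next_allowed is None else next_allowed
        self.lock = threading.Lock() if lock is None else lock

    def wait(self, url):
        """Block until url's domain may be hit again; return the seconds slept."""
        domain = urlsplit(url).hostname
        with self.lock:
            now = time.time()
            start = max(now, self.next_allowed.get(domain, now))
            # Reserve this slot before releasing the lock so that concurrent
            # callers for the same domain queue up behind it.
            self.next_allowed[domain] = start + self.min_interval
        delay = start - now
        if delay > 0:
            time.sleep(delay)
        return delay
```

Each worker would call limiter.wait(url) just before requesting url. The slot is reserved while holding the lock, but the sleep happens after releasing it, so a worker waiting on one domain doesn't hold up checks of other domains.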

Handle SSL errors

For https URLs, pass verify=True to requests.get() and catch requests.exceptions.SSLError.
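A minimal sketch of this, using a hypothetical fetch_with_ssl_check() helper that turns an SSL failure into a reason string for the failed-link report. Note that verify=True is already the default in requests, and that requests.exceptions.SSLError is a subclass of ConnectionError, so it has to be caught before any broader ConnectionError handler. The 10-second timeout is an assumed value.

```python
import requests


def fetch_with_ssl_check(url, timeout=10):
    """GET url with certificate verification; return (response, failure_reason).

    Hypothetical helper: on an SSL problem it returns (None, reason) instead of
    raising, so the caller can file a failed-link report.
    """
    try:
        # verify=True is requests' default; passing it documents the intent.
        response = requests.get(url, verify=True, timeout=timeout)
    except requests.exceptions.SSLError as e:
        # SSLError subclasses ConnectionError, so catch it first if you also
        # handle ConnectionError.
        return None, "SSL error: {}".format(e)
    return response, None
```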

Turn the link checker script into a web service

The current plan is to use Flask and APScheduler.

If it were an actual web app instead of just a script that calls the site's API, the link checker could provide its own API that the site could use to request ad-hoc link checks in response to user interaction. Something like:

/check_these_links accepts a list of objects, `[{"id": "...", "url": "..."}, ...]`. The service will check the links and post the results back to the site as usual.
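A rough sketch of such an endpoint with Flask, assuming the request body is the JSON list shown above. The in-process queue and the background worker that would drain it (for example an APScheduler job) are assumptions for illustration, and the authentication step discussed below is left out:

```python
import queue

from flask import Flask, jsonify, request

app = Flask(__name__)
# Hypothetical in-process queue; a background worker (e.g. an APScheduler job)
# would pull from it, check each link and post the result back to the site.
pending_checks = queue.Queue()


@app.route("/check_these_links", methods=["POST"])
def check_these_links():
    links = request.get_json(silent=True)  # None if the body isn't valid JSON
    if not isinstance(links, list) or not all(
            isinstance(link, dict) and "id" in link and "url" in link
            for link in links):
        return jsonify(error='expected a list of {"id": ..., "url": ...} objects'), 400
    for link in links:
        pending_checks.put((link["id"], link["url"]))
    # 202 Accepted: the checks happen later and results are posted back as usual.
    return jsonify(queued=len(links)), 202
```

Returning 202 rather than waiting for the checks keeps the site's request fast; the results arrive through the existing post-back path.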

Authentication

The ad-hoc mode requires authentication. I like @wardi's suggestion: the web service is configured with a list of sites that it works for, and each time it receives a request for ad-hoc link checks, it contacts the site that the request claims to be from and asks whether it made this request before proceeding.

This means the link checker service doesn't need to handle authentication itself; it can rely on the sites to do it.
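A sketch of that check as a plain function. How the site is asked (the confirm callable) and what identifies a request (request_token) are hypothetical here; the point is the order of the checks: reject sites that aren't configured without contacting them, and fail closed if the claimed site can't confirm.

```python
def authorize_ad_hoc_request(claimed_site, request_token, confirm, allowed_sites):
    """Decide whether to run an ad-hoc link check request.

    claimed_site: the site the request says it came from.
    request_token: hypothetical identifier for this request that the site can recognise.
    confirm: hypothetical callable confirm(site, token) -> bool that asks the
        site whether it really made this request (e.g. over HTTP).
    allowed_sites: the sites this link checker is configured to work for.
    """
    if claimed_site not in allowed_sites:
        # Never contact (or do work for) sites we weren't configured for.
        return False
    try:
        return bool(confirm(claimed_site, request_token))
    except Exception:
        # If the site can't be reached or errors out, fail closed.
        return False
```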

403 behaviour

You might want to consider how to categorize a 403 response. We have some paid-for data listed that is behind an HTTP Basic Auth password. It would be wrong to say such a link is broken; it just can't be tested.
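One possible categorization, sketched with hypothetical category labels: treat 401 as well as 403 as "untestable" rather than broken, since servers normally answer a missing HTTP Basic Auth password with 401 Unauthorized. The exact labels and the choice to count 3xx responses as alive are assumptions.

```python
def categorize_status(status_code):
    """Map an HTTP status code to a link-check category (hypothetical labels)."""
    if 200 <= status_code < 400:
        return "alive"
    if status_code in (401, 403):
        # Auth-protected (e.g. paid-for data behind HTTP Basic Auth): the server
        # is answering, but we can't verify the resource itself.
        return "untestable"
    return "broken"
```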

Handle different link check failures differently

We currently just try to catch any exception that we think requests might throw and put together a failed link check report (status and reason) in a generic way.

It might be good to handle certain kinds of known failures specifically and report them with tidier, documented status and reason values, then fall back to the generic version for unknown failures.

Here's a good summary of different kinds of HTTP request failures (check the requests docs for more):

  • DNS lookup failure (or a refused connection): requests.exceptions.ConnectionError
  • Timeout when connecting to the server: requests.exceptions.ConnectTimeout
  • Timeout when reading from the server: requests.exceptions.ReadTimeout
  • HTTP errors from the web server (e.g. 500 Internal Server Error if the app behind the web server crashes): response.raise_for_status() will raise requests.exceptions.HTTPError
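A sketch of the "specific first, generic fallback" approach: an ordered table mapping requests exception classes to a (status, reason) pair. The status strings and reason texts are hypothetical, not values deadoralive currently reports. Order matters because ConnectTimeout and SSLError are both subclasses of ConnectionError. For HTTPError to be raised at all, the checker would call response.raise_for_status() after the request.

```python
import requests

# Ordered most specific first: SSLError and ConnectTimeout are subclasses of
# ConnectionError, so they must be matched before it.
FAILURE_TYPES = [
    (requests.exceptions.SSLError, "ssl_error",
     "SSL/TLS handshake or certificate verification failed"),
    (requests.exceptions.ConnectTimeout, "connect_timeout",
     "Timed out connecting to the server"),
    (requests.exceptions.ReadTimeout, "read_timeout",
     "Timed out waiting for the server to respond"),
    (requests.exceptions.ConnectionError, "connection_error",
     "Could not connect (DNS lookup failure or connection refused)"),
    (requests.exceptions.HTTPError, "http_error",
     "Server returned an HTTP error status"),
]


def describe_failure(exc):
    """Return a (status, reason) pair for a failed link check."""
    for exc_type, status, reason in FAILURE_TYPES:
        if isinstance(exc, exc_type):
            return status, reason
    # Fall back to a generic report for anything we don't specifically know about.
    return "unknown_error", "{}: {}".format(type(exc).__name__, exc)
```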

Add content-type support

Allow the client site to (optionally) specify an expected Content-Type with each link. We can then assert that the Content-Type header is correct.

We could also support parsing for certain content types. For example, if the expected content type is JSON and the Content-Type header is correct, try to parse the body as JSON and assert that this doesn't fail.
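A minimal sketch of both ideas, with a hypothetical check_content_type() helper that takes the response headers as a plain dict (requests' response.headers is a case-insensitive mapping that also supports .get()). It ignores parameters such as "; charset=utf-8" when comparing types, which is an assumption about how strict the check should be:

```python
import json


def check_content_type(headers, body, expected_type):
    """Compare the Content-Type header with expected_type; parse JSON bodies.

    Returns (ok, reason), where reason is None on success.
    """
    actual = headers.get("Content-Type", "")
    # Drop parameters such as "; charset=utf-8" and compare case-insensitively.
    actual_type = actual.split(";", 1)[0].strip().lower()
    if actual_type != expected_type.lower():
        return False, "Expected Content-Type {} but got {}".format(
            expected_type, actual or "none")
    if actual_type == "application/json":
        try:
            json.loads(body)
        except ValueError as e:  # json.JSONDecodeError subclasses ValueError
            return False, "Content-Type is JSON but body did not parse: {}".format(e)
    return True, None
```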

UnicodeEncodeError

In Python 2, calling "blah blah {}".format(url) raises UnicodeEncodeError when "blah blah {}" is a byte string and url is a unicode string containing non-ASCII characters. deadoralive.py does this in a number of places, e.g. when it logs things.
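The usual Python 2 fix is to make the template a unicode string as well, either with a u prefix on the literal or with from __future__ import unicode_literals at the top of the module. A minimal sketch of the first option, using a hypothetical checking_message() helper; the code runs unchanged on Python 2.7 and Python 3:

```python
import logging

log = logging.getLogger(__name__)


def checking_message(url):
    """Build and log a message about url without a Python 2 UnicodeEncodeError."""
    # The u prefix makes the template a unicode string under Python 2 too, so
    # .format() no longer tries to encode a non-ASCII url as ASCII. On Python 3
    # every str is already unicode and the prefix is harmless.
    message = u"Checking {}".format(url)
    log.info(message)
    return message
```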
