memex-explorer's Introduction

DISCLAIMER

Memex Explorer is currently on hold. Support and development on this project have ceased for the immediate future.

memex-explorer

Memex Explorer is a web application that provides easy-to-use interfaces for gathering, analyzing, and graphing web crawl data.

Local Development

To set up your machine, you will need Anaconda or Miniconda installed. Miniconda is a minimal Anaconda installation that bootstraps conda and Python on any operating system. Install Anaconda from http://continuum.io/downloads or Miniconda from http://conda.pydata.org/miniconda.html

Clone the repository, then:

cd memex-explorer/source

Run the following commands:

$ ./app_setup.sh
$ source activate memex
$ supervisord

These commands set up a conda environment named memex, prepare the application by creating an empty database, and then launch all of the services the application needs. If you run into problems with any of these commands, please report them as a GitHub issue.

If you have already run the install script, simply run supervisord from the memex-explorer/source directory to restart all of the services.

The supervisord command starts supervisord in the foreground, which in turn ensures that all services associated with the core Memex Explorer environment are running. To stop supervisord and the associated services, send an interrupt to the process with Ctrl-c.

Memex Explorer will now be running locally at http://localhost:8000

Testing

To run memex-explorer tests, use the following command from within an active environment:

$ py.test

Building the Documentation

The project documentation is written in reStructuredText and can be built using Sphinx.

$ cd docs
$ make html

The built documentation is then available at build/html/index.html

Administration

To access the administration panel, navigate to http://localhost:8000/admin (or the equivalent deployed URL) after starting Memex Explorer. Here you will be able to view and make manual changes to the database.

memex-explorer's People

Contributors

ahmadia, amfarrell, anthonytw, aterrel, brittainhard, chdoig, chrismattmann, kdodia, lewismc, nipurndoshi, purg, quasiben, rrgirish, shivikathapar, tbpalsulich

memex-explorer's Issues

Install not complete

When I followed the install instructions today, I got the following error upon visiting the entry page:

http://0.0.0.0:5000/
 * Running on http://0.0.0.0:5000/
 * Restarting with reloader
http://0.0.0.0:5000/
127.0.0.1 - - [07/Nov/2014 08:56:51] "GET / HTTP/1.1" 500 -
Traceback (most recent call last):
  File "/Users/aterrel/workspace/apps/anaconda/py3/envs/memex-viewer/lib/python2.7/site-packages/flask/app.py", line 1836, in __call__
    return self.wsgi_app(environ, start_response)
  File "/Users/aterrel/workspace/apps/anaconda/py3/envs/memex-viewer/lib/python2.7/site-packages/flask/app.py", line 1820, in wsgi_app
    response = self.make_response(self.handle_exception(e))
  File "/Users/aterrel/workspace/apps/anaconda/py3/envs/memex-viewer/lib/python2.7/site-packages/flask/app.py", line 1403, in handle_exception
    reraise(exc_type, exc_value, tb)
  File "/Users/aterrel/workspace/apps/anaconda/py3/envs/memex-viewer/lib/python2.7/site-packages/flask/app.py", line 1817, in wsgi_app
    response = self.full_dispatch_request()
  File "/Users/aterrel/workspace/apps/anaconda/py3/envs/memex-viewer/lib/python2.7/site-packages/flask/app.py", line 1477, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/Users/aterrel/workspace/apps/anaconda/py3/envs/memex-viewer/lib/python2.7/site-packages/flask/app.py", line 1381, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/Users/aterrel/workspace/apps/anaconda/py3/envs/memex-viewer/lib/python2.7/site-packages/flask/app.py", line 1475, in full_dispatch_request
    rv = self.dispatch_request()
  File "/Users/aterrel/workspace/apps/anaconda/py3/envs/memex-viewer/lib/python2.7/site-packages/flask/app.py", line 1461, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/Users/aterrel/workspace/apps/memex/memex-viewer/app/views.py", line 64, in index
    return render_template('index.html')
  File "/Users/aterrel/workspace/apps/anaconda/py3/envs/memex-viewer/lib/python2.7/site-packages/flask/templating.py", line 126, in render_template
    ctx.app.update_template_context(context)
  File "/Users/aterrel/workspace/apps/anaconda/py3/envs/memex-viewer/lib/python2.7/site-packages/flask/app.py", line 716, in update_template_context
    context.update(func())
  File "/Users/aterrel/workspace/apps/memex/memex-viewer/app/views.py", line 41, in inject_crawls
    crawls = Crawl.query.all()
  File "/Users/aterrel/workspace/apps/anaconda/py3/envs/memex-viewer/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2320, in all
    return list(self)
  File "/Users/aterrel/workspace/apps/anaconda/py3/envs/memex-viewer/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2438, in __iter__
    return self._execute_and_instances(context)
  File "/Users/aterrel/workspace/apps/anaconda/py3/envs/memex-viewer/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2453, in _execute_and_instances
    result = conn.execute(querycontext.statement, self._params)
  File "/Users/aterrel/workspace/apps/anaconda/py3/envs/memex-viewer/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 729, in execute
    return meth(self, multiparams, params)
  File "/Users/aterrel/workspace/apps/anaconda/py3/envs/memex-viewer/lib/python2.7/site-packages/sqlalchemy/sql/elements.py", line 322, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/Users/aterrel/workspace/apps/anaconda/py3/envs/memex-viewer/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 826, in _execute_clauseelement
    compiled_sql, distilled_params
  File "/Users/aterrel/workspace/apps/anaconda/py3/envs/memex-viewer/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 958, in _execute_context
    context)
  File "/Users/aterrel/workspace/apps/anaconda/py3/envs/memex-viewer/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1159, in _handle_dbapi_exception
    exc_info
  File "/Users/aterrel/workspace/apps/anaconda/py3/envs/memex-viewer/lib/python2.7/site-packages/sqlalchemy/util/compat.py", line 199, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb)
  File "/Users/aterrel/workspace/apps/anaconda/py3/envs/memex-viewer/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 951, in _execute_context
    context)
  File "/Users/aterrel/workspace/apps/anaconda/py3/envs/memex-viewer/lib/python2.7/site-packages/sqlalchemy/engine/default.py", line 436, in do_execute
    cursor.execute(statement, parameters)
OperationalError: (OperationalError) no such table: crawl u'SELECT crawl.id AS crawl_id, crawl.name AS crawl_name, crawl.endpoint AS crawl_endpoint, crawl.description AS crawl_description \nFROM crawl' ()
127.0.0.1 - - [07/Nov/2014 08:56:51] "GET /?__debugger__=yes&cmd=resource&f=style.css HTTP/1.1" 200 -
127.0.0.1 - - [07/Nov/2014 08:56:51] "GET /?__debugger__=yes&cmd=resource&f=jquery.js HTTP/1.1" 200 -
127.0.0.1 - - [07/Nov/2014 08:56:51] "GET /?__debugger__=yes&cmd=resource&f=debugger.js HTTP/1.1" 200 -
127.0.0.1 - - [07/Nov/2014 08:56:51] "GET /?__debugger__=yes&cmd=resource&f=ubuntu.ttf HTTP/1.1" 200 -
127.0.0.1 - - [07/Nov/2014 08:56:51] "GET /?__debugger__=yes&cmd=resource&f=console.png HTTP/1.1" 200 -
127.0.0.1 - - [07/Nov/2014 08:56:51] "GET /?__debugger__=yes&cmd=resource&f=source.png HTTP/1.1" 200 -
127.0.0.1 - - [07/Nov/2014 08:56:52] "GET /?__debugger__=yes&cmd=resource&f=console.png HTTP/1.1" 200 -
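The traceback ends in "no such table: crawl", which suggests the database existed but its tables had not been created before the first request. The sketch below reproduces that situation with the standard-library sqlite3 module instead of the application's SQLAlchemy models. The column names come from the failing SELECT above; the column types and the helper names (list_crawls, demo) are assumptions made for illustration.

```python
import sqlite3

# Hypothetical schema: column names are taken from the failing SELECT in the
# traceback; the column types are assumptions.
CREATE_CRAWL = """
CREATE TABLE IF NOT EXISTS crawl (
    id INTEGER PRIMARY KEY,
    name TEXT,
    endpoint TEXT,
    description TEXT
)
"""


def list_crawls(conn):
    """Run the same kind of query the index page issues."""
    return conn.execute(
        "SELECT id, name, endpoint, description FROM crawl"
    ).fetchall()


def demo():
    """Query an empty database, then create the table and query again."""
    conn = sqlite3.connect(":memory:")  # fresh database with no tables
    try:
        list_crawls(conn)
        error = None
    except sqlite3.OperationalError as exc:
        error = str(exc)  # the same failure the issue reports
    conn.execute(CREATE_CRAWL)  # creating the table first avoids the error
    rows = list_crawls(conn)
    conn.close()
    return error, rows
```

If this is what happened, running the table-creation step as part of setup, before the server handles its first request, would likely avoid the 500 response.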

Refactor Image Space

After talking with @chdoig, we decided to

Images will be stored in a central directory, indexed by ID. Each image should have a parent project and at least one linked crawl.

Clicking on the "Image Space" application in a particular project should return all corresponding images.

Dashboard takes too long to start

We are currently not displaying any plots with frontierpages.csv data, but the dashboard page still depends on that file existing and containing data before it will display. The frontierpages file takes a long time to generate (~10 min in one test). I tried removing the frontierpages.csv dependency from the code, but was still getting an error.

The ultimate solution probably involves digging into the ACHE code and providing better output: a streaming API instead of a bulk write to files every X seconds. For now, remove the frontierpages dependency.
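One possible interim approach, sketched here under assumptions, is to have the dashboard treat a missing or still-empty data file as "no data yet" rather than as an error. The function name read_rows_if_ready is hypothetical and is not part of the existing code base; the sketch uses Python 3.

```python
import csv
import os


def read_rows_if_ready(path):
    """Return the CSV rows at `path`, or [] if the file is missing or empty.

    Hypothetical helper: the caller can render the dashboard without the
    plot when it gets an empty list back.
    """
    if not os.path.exists(path) or os.path.getsize(path) == 0:
        return []
    with open(path, newline="") as handle:
        return list(csv.reader(handle))
```

With a helper like this, the page could load as soon as the crawl starts and pick up the frontier data once the crawler has written it.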

Add basic ache crawler test

Check that ACHE, as installed by Explorer, runs and crawls correctly. This will need a dummy web service to crawl against.

Add basic nutch testing

Check that Nutch, as installed by Explorer, runs and crawls correctly. This will need a dummy web service to crawl against.
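A dummy web service for crawler tests could be as small as the sketch below, which serves two linked pages from a background thread using only the Python 3 standard library. The page contents and the names PAGES, DummySiteHandler, start_dummy_site, and fetch are assumptions for illustration. Pointing ACHE or Nutch at the server is not shown.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

# Hypothetical two-page site: the root page links to a single leaf page.
PAGES = {
    "/": b'<html><body><a href="/page1">page 1</a></body></html>',
    "/page1": b"<html><body>leaf page</body></html>",
}


class DummySiteHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = PAGES.get(self.path)
        if body is None:
            self.send_error(404)
            return
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep test output quiet


def start_dummy_site():
    """Start the site on a free local port and return the server object."""
    server = HTTPServer(("127.0.0.1", 0), DummySiteHandler)  # port 0: any free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server


def fetch(server, path):
    """Fetch `path` from the dummy site and return the response body."""
    host, port = server.server_address
    with urlopen("http://%s:%d%s" % (host, port, path)) as response:
        return response.read()
```

A crawler test could use http://127.0.0.1:<port>/ as its only seed and then assert that both pages were fetched. Call server.shutdown() and server.server_close() when the test finishes.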

Add Summary Statistics

Ideas:

  • How long the crawl has been running
  • How many pages have been fetched so far
  • Average harvest rate
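The three ideas above could be computed along these lines. This is only a sketch: it assumes harvest rate means relevant pages divided by fetched pages, and the function and parameter names are hypothetical. It is written for Python 3, where / is true division.

```python
from datetime import datetime


def crawl_summary(started_at, now, pages_fetched, relevant_pages):
    """Summarize a crawl from its start time and page counters.

    Assumption: harvest rate = relevant pages / fetched pages.
    """
    elapsed = (now - started_at).total_seconds()
    harvest_rate = relevant_pages / pages_fetched if pages_fetched else 0.0
    return {
        "elapsed_seconds": elapsed,
        "pages_fetched": pages_fetched,
        "harvest_rate": harvest_rate,
    }


summary = crawl_summary(
    started_at=datetime(2014, 11, 7, 8, 0, 0),
    now=datetime(2014, 11, 7, 8, 10, 0),
    pages_fetched=200,
    relevant_pages=50,
)
# summary["elapsed_seconds"] is 600.0 and summary["harvest_rate"] is 0.25
```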

db_add_crawl bug

This happened when I started a Nutch crawl that had no data model:

Traceback (most recent call last):
  File "/Users/brittainchristopherhard/anaconda/envs/memex/lib/python2.7/site-packages/flask/app.py", line 1836, in __call__
    return self.wsgi_app(environ, start_response)
  File "/Users/brittainchristopherhard/anaconda/envs/memex/lib/python2.7/site-packages/flask/app.py", line 1820, in wsgi_app
    response = self.make_response(self.handle_exception(e))
  File "/Users/brittainchristopherhard/anaconda/envs/memex/lib/python2.7/site-packages/flask_restful/__init__.py", line 262, in error_router
    return original_handler(e)
  File "/Users/brittainchristopherhard/anaconda/envs/memex/lib/python2.7/site-packages/flask/app.py", line 1403, in handle_exception
    reraise(exc_type, exc_value, tb)
  File "/Users/brittainchristopherhard/anaconda/envs/memex/lib/python2.7/site-packages/flask/app.py", line 1817, in wsgi_app
    response = self.full_dispatch_request()
  File "/Users/brittainchristopherhard/anaconda/envs/memex/lib/python2.7/site-packages/flask/app.py", line 1477, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/Users/brittainchristopherhard/anaconda/envs/memex/lib/python2.7/site-packages/flask_restful/__init__.py", line 262, in error_router
    return original_handler(e)
  File "/Users/brittainchristopherhard/anaconda/envs/memex/lib/python2.7/site-packages/flask/app.py", line 1381, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/Users/brittainchristopherhard/anaconda/envs/memex/lib/python2.7/site-packages/flask/app.py", line 1475, in full_dispatch_request
    rv = self.dispatch_request()
  File "/Users/brittainchristopherhard/anaconda/envs/memex/lib/python2.7/site-packages/flask/app.py", line 1461, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/Users/brittainchristopherhard/Documents/memex-explorer/app/views.py", line 162, in add_crawl
    crawl = db_add_crawl(project, form, seed_filename)
  File "/Users/brittainchristopherhard/Documents/memex-explorer/app/db_api.py", line 73, in db_add_crawl
    data_model_id=form.data_model.data.id,
AttributeError: 'NoneType' object has no attribute 'id'

@kdodia, @chdoig
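The AttributeError comes from reading .id on form.data_model.data when no data model was selected. A possible guard is sketched below with stand-in objects instead of the real form classes. The helper name data_model_id_or_none is hypothetical, and whether a crawl may be stored without a data model is an assumption that the maintainers would need to confirm.

```python
from types import SimpleNamespace


def data_model_id_or_none(form):
    """Return the selected data model's id, or None if none was chosen."""
    data_model = form.data_model.data
    return data_model.id if data_model is not None else None


# Stand-ins for the real form objects, for illustration only.
form_without_model = SimpleNamespace(data_model=SimpleNamespace(data=None))
form_with_model = SimpleNamespace(
    data_model=SimpleNamespace(data=SimpleNamespace(id=7))
)
```

In db_add_crawl, the failing line could then pass data_model_id=data_model_id_or_none(form) instead of dereferencing the field directly.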

image space table link to image compare not working

Error message:

  File "/Users/cdoig/anaconda/envs/foo2/lib/python2.7/site-packages/flask/app.py", line 1461, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/Users/cdoig/work/test_memex/memex-explorer/app/views.py", line 417, in compare
    img = get_image(image_name)
TypeError: get_image() takes exactly 2 arguments (1 given)

Responsive design?

The application looks odd when someone looks at it on a small screen. It's not really built for "Responsive Design" ©.

So, do we want to make it so it looks good on everything? Phones, tablets, etc? This shouldn't be too difficult to do, but it will take me some time.

A good way to look at this is with the web inspector in Chrome (command + alt + j, then click the button that looks like a phone).

Let me know what you guys think. Do we want our customers to be able to use this thing on tablets and/or smartphones?

Backbone Integration

So there are a lot of ways this application can be streamlined through the use of Backbone. Setting up projects, for example, can be done on one page instead of on multiple pages that require multiple page refreshes.

It might be worthwhile to discuss (1) the extent to which we want Backbone to supplant Flask in areas such as project creation and (2) areas of the project that could benefit from the inclusion of Backbone.

Just something to discuss for the near future.

Delete project behavior

Update the delete-project behavior so that, when you delete a project, all of that project's resources move to your home directory. This involves changing the project_id of each resource to that of the project with project.name=="Home", and then removing the deleted project from the project table.
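The reassign-then-delete behavior could look roughly like the sketch below. It uses the standard-library sqlite3 module and a hypothetical two-table schema (project and crawl), not the application's SQLAlchemy models. Only the crawl table is handled here. Other resource tables would need the same update.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory database with a hypothetical minimal schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE project (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE crawl (id INTEGER PRIMARY KEY, name TEXT, project_id INTEGER)"
    )
    conn.execute("INSERT INTO project (id, name) VALUES (1, 'Home'), (2, 'Other')")
    conn.execute("INSERT INTO crawl (id, name, project_id) VALUES (1, 'c1', 2)")
    conn.commit()
    return conn


def delete_project(conn, project_id):
    """Move a project's crawls to the Home project, then delete the project."""
    home = conn.execute("SELECT id FROM project WHERE name = 'Home'").fetchone()
    if home is None:
        raise ValueError("no Home project found")
    home_id = home[0]
    if project_id == home_id:
        raise ValueError("refusing to delete the Home project")
    with conn:  # both statements commit together, or roll back together
        conn.execute(
            "UPDATE crawl SET project_id = ? WHERE project_id = ?",
            (home_id, project_id),
        )
        conn.execute("DELETE FROM project WHERE id = ?", (project_id,))
```

Running both statements in one transaction keeps a failed delete from leaving resources half-moved.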
