ancir / grano

A toolkit for mapping networks of political and economic influence through diverse types of entities and their relations. Accessible at http://granoproject.org

Home Page: http://granoproject.org

License: MIT License

Languages: Makefile 0.11%, Python 99.03%, PLpgSQL 0.46%, Mako 0.17%, Dockerfile 0.22%
Topics: entities, network-analysis, python, connected-africa, siyazana, grano, node-networks, graphs, graph-networks, africa

grano's People

Contributors

andkamau, clkao, davidlemayian, gabelula, jazzido, pudo, rizziepit, stefanw

grano's Issues

Entity types: best practice?

Suppose I'm working with this schema:

- name: fellow
  label: An OpenNews fellow
  obj: entity
  hidden: no
  attributes:
    - name: twitter_handle
      label: Twitter handle

- name: news_organization
  label: A news organization
  obj: entity
  hidden: no
  attributes:
    - name: url
      label: URL

- name: fellowship
  label: A Fellowship
  label_in: Was hosted by
  label_out: Worked for
  obj: relation
  attributes:
    - name: start_date
      label: Start date
    - name: end_date
      label: End date

That defines a bipartite graph (two classes of vertices). How would I use grano-client to get all the "fellows" or all the "news organizations"? In other words, is it possible to find out which schema a given property was defined in?

Attribute validation

Something along these lines:

- name: organizacion
  label: Una organizacion
  obj: entity
  hidden: no
  attributes:
    - name: nombre
      label: Nombre
      required: true
    - name: date
      label: Date
      pattern: \d\d\d\d-\d\d-\d\d
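
A minimal sketch of how the proposed `required` and `pattern` keys might be checked, assuming the schema has already been parsed into Python dicts. The `validate` function and the error message format are hypothetical and not part of grano.

```python
import re

# Hypothetical in-memory form of the schema above. `required` and `pattern`
# are the proposed keys, not an existing grano feature.
SCHEMA = {
    "name": "organizacion",
    "attributes": [
        {"name": "nombre", "required": True},
        {"name": "date", "pattern": r"\d\d\d\d-\d\d-\d\d"},
    ],
}


def validate(schema, properties):
    """Return a list of error messages; an empty list means the data is valid."""
    errors = []
    for attr in schema["attributes"]:
        name = attr["name"]
        value = properties.get(name)
        if value is None or value == "":
            # Missing values are only an error for required attributes.
            if attr.get("required"):
                errors.append("%s: required" % name)
            continue
        pattern = attr.get("pattern")
        # fullmatch, so the whole value has to match and not just a prefix.
        if pattern and not re.fullmatch(pattern, str(value)):
            errors.append("%s: does not match %s" % (name, pattern))
    return errors
```

Using a full match rather than a prefix match is a design choice here; the proposal itself does not say which one is intended.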

Entity and relation facets

You want to facet entities by:

  • Properties
  • Schemata
  • Relations (#), schema
  • Status

And relations by:

  • Properties
  • Schema
  • Source
  • Target

Query syntax

How would you query these different facets?

  • Properties: ?facet=properties.name
  • Schemata: ?facet=schema
  • Status: ?facet=status

More complex (i.e. nested) facets:

  • Relations: ?facet=inbound.count, ?facet=outbound.count or ?facet=relations.count
  • Nested: ?facet=inbound.schema, ?facet=inbound.properties.started_at
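
To make the dotted facet names above concrete, here is a rough sketch of what a facet could compute: a count of items per value found at a path such as `schema` or `properties.name`. The layout of the entity dicts, the sample values and the `resolve` and `facet` helpers are assumptions for illustration, not the grano API.

```python
from collections import Counter

# Made-up sample entities; the dict layout is an assumption.
ENTITIES = [
    {"schema": "fellow", "status": 40, "properties": {"name": "Ada"}},
    {"schema": "fellow", "status": 20, "properties": {"name": "Ben"}},
    {"schema": "news_organization", "status": 40, "properties": {"name": "The Daily"}},
]


def resolve(obj, path):
    """Follow a dotted path such as 'properties.name' through nested dicts."""
    for key in path.split("."):
        if not isinstance(obj, dict) or key not in obj:
            return None
        obj = obj[key]
    return obj


def facet(items, path):
    """Count how many items carry each value found at `path`."""
    counts = Counter()
    for item in items:
        value = resolve(item, path)
        if value is not None:
            counts[value] += 1
    return dict(counts)
```

With this sample data, `facet(ENTITIES, "schema")` counts two fellows and one news organization. A path that no item has, such as `inbound.count` here, yields an empty result.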

SQL Syntax error on initial table creation (Postgres)

  1. The Alembic migration tries to create the relation table first, but it depends (FK) on schema and entity. Temporarily fixed by moving the op.create_table('relation', ...) call to the bottom of the file.

  2. Alembic (or SQLAlchemy) generates faulty SQL:

    CREATE TABLE account (
    created_at TIMESTAMP WITHOUT TIME ZONE,
    updated_at TIMESTAMP WITHOUT TIME ZONE,
    id INTEGER DEFAULT 'nextval('account_id_seq'::regclass)' NOT NULL,
    github_id INTEGER,
    login VARCHAR,
    email VARCHAR,
    api_key VARCHAR,
    CONSTRAINT account_pkey PRIMARY KEY (id)
    )

    Passing 'nextval('account_id_seq'::regclass)' as a quoted string makes no sense as a column default, and the nested single quotes are not escaped either. Full traceback below.

    INFO  [alembic.migration] Context impl PostgresqlImpl.
    INFO  [alembic.migration] Will assume transactional DDL.
    INFO  [alembic.migration] Running upgrade None -> 4f21a77e91be, init
    Traceback (most recent call last):
    File "/Users/manuel/Work/pyenvs/grano/bin/alembic", line 11, in <module>
      sys.exit(main())
    File "/Users/manuel/Work/pyenvs/grano/lib/python2.7/site-packages/alembic/config.py", line 298, in main
      CommandLine(prog=prog).main(argv=argv)
    File "/Users/manuel/Work/pyenvs/grano/lib/python2.7/site-packages/alembic/config.py", line 293, in main
      self.run_cmd(cfg, options)
    File "/Users/manuel/Work/pyenvs/grano/lib/python2.7/site-packages/alembic/config.py", line 279, in run_cmd
      **dict((k, getattr(options, k)) for k in kwarg)
    File "/Users/manuel/Work/pyenvs/grano/lib/python2.7/site-packages/alembic/command.py", line 124, in upgrade
      script.run_env()
    File "/Users/manuel/Work/pyenvs/grano/lib/python2.7/site-packages/alembic/script.py", line 199, in run_env
      util.load_python_file(self.dir, 'env.py')
    File "/Users/manuel/Work/pyenvs/grano/lib/python2.7/site-packages/alembic/util.py", line 205, in load_python_file
      module = load_module_py(module_id, path)
    File "/Users/manuel/Work/pyenvs/grano/lib/python2.7/site-packages/alembic/compat.py", line 58, in load_module_py
      mod = imp.load_source(module_id, path, fp)
    File "alembic/env.py", line 75, in <module>
      run_migrations_online()
    File "alembic/env.py", line 68, in run_migrations_online
      context.run_migrations()
    File "<string>", line 7, in run_migrations
    File "/Users/manuel/Work/pyenvs/grano/lib/python2.7/site-packages/alembic/environment.py", line 681, in run_migrations
      self.get_context().run_migrations(**kw)
    File "/Users/manuel/Work/pyenvs/grano/lib/python2.7/site-packages/alembic/migration.py", line 225, in run_migrations
      change(**kw)
    File "alembic/versions/4f21a77e91be_init.py", line 26, in upgrade
      sa.PrimaryKeyConstraint('id', name=u'account_pkey')
    File "<string>", line 7, in create_table
    File "/Users/manuel/Work/pyenvs/grano/lib/python2.7/site-packages/alembic/operations.py", line 647, in create_table
      self._table(name, *columns, **kw)
    File "/Users/manuel/Work/pyenvs/grano/lib/python2.7/site-packages/alembic/ddl/impl.py", line 149, in create_table
      self._exec(schema.CreateTable(table))
    File "/Users/manuel/Work/pyenvs/grano/lib/python2.7/site-packages/alembic/ddl/impl.py", line 76, in _exec
      conn.execute(construct, *multiparams, **params)
    File "/Users/manuel/Work/pyenvs/grano/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 664, in execute
      return meth(self, multiparams, params)
    File "/Users/manuel/Work/pyenvs/grano/lib/python2.7/site-packages/sqlalchemy/sql/ddl.py", line 67, in _execute_on_connection
      return connection._execute_ddl(self, multiparams, params)
    File "/Users/manuel/Work/pyenvs/grano/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 718, in _execute_ddl
      compiled
    File "/Users/manuel/Work/pyenvs/grano/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 874, in _execute_context
      context)
    File "/Users/manuel/Work/pyenvs/grano/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1023, in _handle_dbapi_exception
      exc_info
    File "/Users/manuel/Work/pyenvs/grano/lib/python2.7/site-packages/sqlalchemy/util/compat.py", line 185, in raise_from_cause
      reraise(type(exception), exception, tb=exc_tb)
    File "/Users/manuel/Work/pyenvs/grano/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 867, in _execute_context
      context)
    File "/Users/manuel/Work/pyenvs/grano/lib/python2.7/site-packages/sqlalchemy/engine/default.py", line 388, in do_execute
      cursor.execute(statement, parameters)
    sqlalchemy.exc.ProgrammingError: (ProgrammingError) syntax error at or near "account_id_seq"
    LINE 5:  id INTEGER DEFAULT 'nextval('account_id_seq'::regclass)' NO...
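
On the first point (creation order), a safe order can be derived from the foreign-key dependencies instead of being fixed by hand. This is only a sketch using the standard library's `graphlib` (Python 3.9+). The `FOREIGN_KEYS` mapping is hypothetical: only the dependencies of `relation` come from the report above, and the other entries are placeholders.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Table -> tables it references through foreign keys.
# Only the 'relation' entry is taken from the report; the rest are placeholders.
FOREIGN_KEYS = {
    "relation": {"schema", "entity"},
    "schema": set(),
    "entity": set(),
}


def creation_order(foreign_keys):
    """Return table names so that each table comes after the ones it references."""
    return list(TopologicalSorter(foreign_keys).static_order())
```

Here `creation_order(FOREIGN_KEYS)` always places `relation` last, which matches the manual workaround of moving its `op.create_table` call to the bottom of the migration.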
    

Graph export function

Given a core node, export a set of adjacent nodes (with a given recursive depth) via the API as JSON (or GEXF).
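
One possible shape for such an export is a breadth-first walk that stops at the requested depth. The sketch below assumes relations are available as plain (source, target) pairs and that direction is ignored when deciding adjacency. The `export_subgraph` name and the output layout are made up for illustration.

```python
import json
from collections import deque


def export_subgraph(edges, core, depth):
    """Collect every node within `depth` hops of `core`, ignoring direction."""
    neighbours = {}
    for source, target in edges:
        neighbours.setdefault(source, set()).add(target)
        neighbours.setdefault(target, set()).add(source)

    seen = {core: 0}  # node -> distance from the core node
    queue = deque([core])
    while queue:
        node = queue.popleft()
        if seen[node] == depth:
            continue  # do not expand past the requested depth
        for other in neighbours.get(node, ()):
            if other not in seen:
                seen[other] = seen[node] + 1
                queue.append(other)

    return {
        "nodes": [{"id": n, "distance": d} for n, d in sorted(seen.items())],
        "edges": [
            {"source": s, "target": t}
            for s, t in edges
            if s in seen and t in seen
        ],
    }


def export_json(edges, core, depth):
    """Serialize the subgraph as JSON text."""
    return json.dumps(export_subgraph(edges, core, depth), indent=2)
```

A GEXF export would reuse the same traversal and differ only in the serialization step.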

Data quality ranks

Wikidata seems to have three ranks of data. This seems useful to rate user-provided property values over scraper-generated input. In effect, the following data quality ranks should be considered:

  • Known to be false / discredited (score 10)
  • Information submitted by uncredited user (score 20)
  • Information acquired by an automatic process (score 30)
  • Information validated by a trusted user / editor (score 40)
  • Information with court-level evidence (score 50)

The question now is: does this apply to properties only, or also to entities and relations directly? If it does apply to the main objects, would that mean a new column on the relevant tables?
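
For the property-level case, the ranks could be modelled as an ordered enumeration, with the best-ranked value winning when several sources disagree. This is a sketch only: the `Quality` and `best_value` names are hypothetical, and never returning discredited values is an assumption rather than something decided above.

```python
from enum import IntEnum


class Quality(IntEnum):
    """The five proposed ranks, using the scores listed above."""
    DISCREDITED = 10
    UNCREDITED_USER = 20
    AUTOMATIC = 30
    TRUSTED_EDITOR = 40
    COURT_EVIDENCE = 50


def best_value(candidates):
    """Pick the value backed by the highest quality score.

    `candidates` is a list of (value, Quality) pairs for one property.
    Assumption: discredited values are never returned.
    """
    usable = [c for c in candidates if c[1] > Quality.DISCREDITED]
    if not usable:
        return None
    return max(usable, key=lambda c: c[1])[0]
```

Storing the score as an integer leaves gaps for ranks that may be added later, which is presumably why the scores are spaced by ten.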

Make async loading optional

I wrote a simple loader using the Loader API. It wants to connect to an AMQP queue.

For small-ish data sets, installing a queue and running a job broker might not be worth the hassle.
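
The idea could look roughly like the following: a switch that decides whether work runs inline or is deferred. The `Dispatcher` class is hypothetical, and the in-process `queue.Queue` is only a stand-in for a real message broker.

```python
import queue


class Dispatcher:
    """Run handlers inline by default, or defer them to a queue when asked to."""

    def __init__(self, use_queue=False):
        self.use_queue = use_queue
        self.pending = queue.Queue()  # stand-in for a real message broker

    def submit(self, func, *args):
        """Call `func` right away, or enqueue it if queueing is enabled."""
        if self.use_queue:
            self.pending.put((func, args))
            return None
        return func(*args)

    def drain(self):
        """Process everything that was deferred; a worker would normally do this."""
        results = []
        while not self.pending.empty():
            func, args = self.pending.get()
            results.append(func(*args))
        return results
```

With `use_queue=False` a loader script needs no broker at all, which would cover the small data set case described above.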

Activity feeds for edits.

Could be done based on #14, i.e. editing an entity or relation triggers a background daemon, which, in turn, generates a static fragment.

Image/media properties

Support for adding images/media as a property. Since we already have a File type, this should somehow be a reference rather than an actual upload.

Property types?

I noticed that properties are stored as a db.Unicode value, regardless of their type.

Is this by design?

Normalize property table

(just so you remember)

  • create schema_property table
  • remove name from property
  • add a FK from property to schema_property
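
A rough sketch of the normalized layout, using an in-memory SQLite database so it can be run as-is. Only the `schema_property` table, the removal of `name` from `property` and the foreign key come from the list above. Every other column name is an assumption.

```python
import sqlite3

DDL = """
CREATE TABLE schema_property (
    id INTEGER PRIMARY KEY,
    schema_name TEXT NOT NULL,   -- assumed column
    name TEXT NOT NULL,
    label TEXT,                  -- assumed column
    UNIQUE (schema_name, name)
);
CREATE TABLE property (
    id INTEGER PRIMARY KEY,
    schema_property_id INTEGER NOT NULL REFERENCES schema_property (id),
    value TEXT                   -- `name` is no longer stored here
);
"""


def build(conn):
    """Create both tables with foreign-key enforcement switched on."""
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(DDL)


def property_names(conn):
    """Read names through the join, since `property` no longer stores them."""
    rows = conn.execute(
        "SELECT sp.name, p.value FROM property p "
        "JOIN schema_property sp ON sp.id = p.schema_property_id "
        "ORDER BY p.id"
    )
    return rows.fetchall()
```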

Prefixed table names

I'm currently integrating grano with another application that already uses a DB. The grano tables have pretty generic names, so it might make sense to prefix them with grano_.

Exception: No module named service.indexer

When running a loader script:

WARNING:grano.logic.entities:Processing change in entity: ea38762f90cdaf448
DEBUG:stevedore.extension:found extension EntryPoint.parse('indexer = grano.service.indexer:AutoIndexer')
ERROR:stevedore.extension:Could not load 'indexer': No module named service.indexer
ERROR:stevedore.extension:No module named service.indexer
Traceback (most recent call last):
  File "/Users/manuel/Work/pyenvs/grano/lib/python2.7/site-packages/stevedore/extension.py", line 162, in _load_plugins
    verify_requirements,
  File "/Users/manuel/Work/pyenvs/grano/lib/python2.7/site-packages/stevedore/enabled.py", line 66, in _load_one_plugin
    verify_requirements,
  File "/Users/manuel/Work/pyenvs/grano/lib/python2.7/site-packages/stevedore/extension.py", line 177, in _load_one_plugin
    plugin = ep.load(require=verify_requirements)
  File "/Users/manuel/Work/pyenvs/grano/lib/python2.7/site-packages/pkg_resources.py", line 2021, in load
    entry = __import__(self.module_name, globals(),globals(), ['__name__'])
ImportError: No module named service.indexer
WARNING:grano.logic.entities:Processing change in entity: 1b16485bddec93d23
Traceback (most recent call last):
  File "loader.py", line 29, in <module>
    fellowship = fellowship.set('end_date', record[3])
AttributeError: 'NoneType' object has no attribute 'set'

Feature Request: datatype `enum`

It would look like this:

    - name: person_id_type
      label: ID Type
      datatype: ['foo', 'bar', 'quux']::tuple

Not sure about the possible values syntax. What do you think?

Allow weighted graph exports

The graph export currently generates a multigraph. There needs to be an option to aggregate multiple relations into one weighted link.
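
A small sketch of the aggregation step, assuming each relation is a dict with `source` and `target` keys and that the weight is simply the number of parallel relations. Both are assumptions; the weight could just as well be derived from a property.

```python
from collections import Counter


def aggregate(relations):
    """Collapse parallel relations into one link per (source, target) pair."""
    weights = Counter((r["source"], r["target"]) for r in relations)
    return [
        {"source": s, "target": t, "weight": w}
        for (s, t), w in sorted(weights.items())
    ]
```

Direction is kept, so a relation from a to b and one from b to a still produce two separate links.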

Can't define multiple entities/relationships on a single YAML file

It'd be nice to be able to do something like this, instead of having to create a separate file for each entity or relation type.

- name: person
  label: A person
  obj: entity
  hidden: no
  attributes:
    - name: first_name
      label: First name
    - name: last_name
      label: Last name
    - name: title
      label: Title
      hidden: yes

- name: gift
  label: A present
  label_in: Gifts received
  label_out: Gifts given
  obj: relation
  attributes:
    - name: description
      label: Description
    - name: value
      label: Value (in EUR)

CSV/JSON Bulk Importer

Have a simple mapping format, e.g.:

source_url: http://...
mapping:
  - column: person_a
    element: source
    schema: base
    attribute: name
  - column: person_b
    element: target
    schema: base
    attribute: name
  - column: relation_description
    element: relation
    schema: my_schema
    attribute: description

Sub-tasks:

  • pipeline model for data process logging
  • implement an importer
  • make a command-line tool
  • document the mapping format.
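
As a starting point for the importer sub-task, the mapping could be applied row by row as sketched below. The mapping is written as Python data, as it might look once the YAML has been parsed. The `map_row` and `load` helpers and the shape of the resulting records are hypothetical.

```python
import csv
import io

# The mapping from the example above, as it might look once parsed.
MAPPING = [
    {"column": "person_a", "element": "source", "schema": "base", "attribute": "name"},
    {"column": "person_b", "element": "target", "schema": "base", "attribute": "name"},
    {"column": "relation_description", "element": "relation",
     "schema": "my_schema", "attribute": "description"},
]


def map_row(row, mapping):
    """Turn one CSV row into source, target and relation records."""
    record = {}
    for rule in mapping:
        # The first rule seen for an element decides its schema in this sketch.
        element = record.setdefault(
            rule["element"], {"schema": rule["schema"], "properties": {}}
        )
        element["properties"][rule["attribute"]] = row[rule["column"]]
    return record


def load(csv_text, mapping):
    """Parse CSV text with a header row and map every data row."""
    return [map_row(row, mapping) for row in csv.DictReader(io.StringIO(csv_text))]
```

Fetching `source_url` and writing the records into grano are left out on purpose, since both depend on parts of the system not described here.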

Implement 'fragment' domain object

Fragment is another type of relation which can link to any number of entities through a piece of marked-up text. The reason for implementing this is to combine narrative elements into the database, so that you can capture non-structured parts of a story in a way that is easy to understand for readers.

The metaphor for this is the Facebook timeline, where each update would be shown on the feeds for all entities mentioned within. Subscriptions would help users make their own news feeds with the updates of all entities to which they are subscribed.

A domain model could look like this:

  • fragment: has text, start_date, end_date, account_id, source_url.
  • fragment_entity: has fragment_id, entity_id, text_start, text_end.
  • entity_subscription: has account_id, entity_id, since_date (used to subscribe users to an entity's news feed - but do we want all subscribers to have to be users?)
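
The domain model above could be sketched with plain dataclasses as follows. Field names follow the list. The types, the nesting of mentions inside a fragment and the two feed helpers are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FragmentEntity:
    """One marked-up mention: which entity, and where in the text."""
    entity_id: str
    text_start: int
    text_end: int


@dataclass
class Fragment:
    text: str
    account_id: int
    source_url: Optional[str] = None
    start_date: Optional[str] = None
    end_date: Optional[str] = None
    mentions: List[FragmentEntity] = field(default_factory=list)


def feed_for(entity_id, fragments):
    """Fragments that mention the entity, i.e. that entity's timeline."""
    return [f for f in fragments if any(m.entity_id == entity_id for m in f.mentions)]


def subscriber_feed(subscribed_ids, fragments):
    """Fragments mentioning any entity the account is subscribed to."""
    wanted = set(subscribed_ids)
    return [f for f in fragments if any(m.entity_id in wanted for m in f.mentions)]
```

In a database these would be the three tables listed above, with `subscribed_ids` coming from `entity_subscription` rows for one account.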

There's a mockup of what the finished function could look like here: http://opendatalabs.org/misc/demo/grano/_mockup/

Project ACLs

  • Have a Permission table
  • Add authorization to project and schema APIs
  • Filter entity and relation APIs.
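
A minimal sketch of how a Permission record and the filtering step might fit together. The `reader`, `editor` and `admin` flags, the `project` key on entities and both helper functions are assumptions; the list above does not specify any of them.

```python
from collections import namedtuple

# Hypothetical columns for the Permission table.
Permission = namedtuple("Permission", "account_id project reader editor admin")


def can(permissions, account_id, project, level):
    """True if the account holds the given level ('reader', 'editor', 'admin')."""
    return any(
        p.account_id == account_id and p.project == project and getattr(p, level)
        for p in permissions
    )


def visible(entities, permissions, account_id):
    """Keep only the entities in projects the account may read."""
    return [e for e in entities if can(permissions, account_id, e["project"], "reader")]
```

The same `can` check could guard the project and schema APIs, while `visible` corresponds to filtering the entity and relation listings.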

[meta] Frontend/Admin application

grano will need some sort of front-end. What functions does it have to support?

  • Emergency editing of entities and relations
  • Verifying proposed changes (needs proposed changes first)
  • Importing and mapping CSVs to entities and relations
  • WYSIWYG schema editor

Optional/Nice-to-have:

  • Merging entities / Deduplication
  • Default rendering for entities and relations (tables?)
  • OpenCorporates recon integrator

Exception while running migrations

Followed the instructions, got an exception when running alembic upgrade head:

INFO  [alembic.migration] Context impl SQLiteImpl.
INFO  [alembic.migration] Will assume non-transactional DDL.
INFO  [alembic.migration] Running upgrade None -> 4f21a77e91be, init
Traceback (most recent call last):
  File "/Users/manuel/Work/pyenvs/grano/bin/alembic", line 11, in <module>
    sys.exit(main())
  File "/Users/manuel/Work/pyenvs/grano/lib/python2.7/site-packages/alembic/config.py", line 298, in main
    CommandLine(prog=prog).main(argv=argv)
  File "/Users/manuel/Work/pyenvs/grano/lib/python2.7/site-packages/alembic/config.py", line 293, in main
    self.run_cmd(cfg, options)
  File "/Users/manuel/Work/pyenvs/grano/lib/python2.7/site-packages/alembic/config.py", line 279, in run_cmd
    **dict((k, getattr(options, k)) for k in kwarg)
  File "/Users/manuel/Work/pyenvs/grano/lib/python2.7/site-packages/alembic/command.py", line 124, in upgrade
    script.run_env()
  File "/Users/manuel/Work/pyenvs/grano/lib/python2.7/site-packages/alembic/script.py", line 199, in run_env
    util.load_python_file(self.dir, 'env.py')
  File "/Users/manuel/Work/pyenvs/grano/lib/python2.7/site-packages/alembic/util.py", line 205, in load_python_file
    module = load_module_py(module_id, path)
  File "/Users/manuel/Work/pyenvs/grano/lib/python2.7/site-packages/alembic/compat.py", line 58, in load_module_py
    mod = imp.load_source(module_id, path, fp)
  File "alembic/env.py", line 75, in <module>
    run_migrations_online()
  File "alembic/env.py", line 68, in run_migrations_online
    context.run_migrations()
  File "<string>", line 7, in run_migrations
  File "/Users/manuel/Work/pyenvs/grano/lib/python2.7/site-packages/alembic/environment.py", line 681, in run_migrations
    self.get_context().run_migrations(**kw)
  File "/Users/manuel/Work/pyenvs/grano/lib/python2.7/site-packages/alembic/migration.py", line 225, in run_migrations
    change(**kw)
  File "alembic/versions/4f21a77e91be_init.py", line 38, in upgrade
    sa.PrimaryKeyConstraint('id', name=u'account_pkey')
  File "<string>", line 7, in create_table
  File "/Users/manuel/Work/pyenvs/grano/lib/python2.7/site-packages/alembic/operations.py", line 647, in create_table
    self._table(name, *columns, **kw)
  File "/Users/manuel/Work/pyenvs/grano/lib/python2.7/site-packages/alembic/ddl/impl.py", line 149, in create_table
    self._exec(schema.CreateTable(table))
  File "/Users/manuel/Work/pyenvs/grano/lib/python2.7/site-packages/alembic/ddl/impl.py", line 76, in _exec
    conn.execute(construct, *multiparams, **params)
  File "/Users/manuel/Work/pyenvs/grano/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 664, in execute
    return meth(self, multiparams, params)
  File "/Users/manuel/Work/pyenvs/grano/lib/python2.7/site-packages/sqlalchemy/sql/ddl.py", line 67, in _execute_on_connection
    return connection._execute_ddl(self, multiparams, params)
  File "/Users/manuel/Work/pyenvs/grano/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 718, in _execute_ddl
    compiled
  File "/Users/manuel/Work/pyenvs/grano/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 874, in _execute_context
    context)
  File "/Users/manuel/Work/pyenvs/grano/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1023, in _handle_dbapi_exception
    exc_info
  File "/Users/manuel/Work/pyenvs/grano/lib/python2.7/site-packages/sqlalchemy/util/compat.py", line 185, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb)
  File "/Users/manuel/Work/pyenvs/grano/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 867, in _execute_context
    context)
  File "/Users/manuel/Work/pyenvs/grano/lib/python2.7/site-packages/sqlalchemy/engine/default.py", line 388, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.OperationalError: (OperationalError) near "account_id_seq": syntax error u"\nCREATE TABLE account (\n\tcreated_at TIMESTAMP, \n\tupdated_at TIMESTAMP, \n\tid INTEGER DEFAULT 'nextval('account_id_seq'::regclass)' NOT NULL, \n\tgithub_id INTEGER, \n\tlogin VARCHAR, \n\temail VARCHAR, \n\tapi_key VARCHAR, \n\tCONSTRAINT account_pkey PRIMARY KEY (id)\n)\n\n" ()

Here's my settings.py:

import os

DEBUG = True
ASSETS_DEBUG = False

SECRET_KEY = 'test'
SQLALCHEMY_DATABASE_URI = 'sqlite:////tmp/grano.db'
APP_NAME = CELERY_APP_NAME = ES_INDEX = 'grano'
CELERY_BROKER_URL = 'amqp://guest:guest@localhost:5672//'

GITHUB_CLIENT_ID = 'da79a6b5868e690ab984'
GITHUB_CLIENT_SECRET = '1701d3bd20bbb29012592fd3a9c64b827e0682d6'

# Generate a public URI for an entity, based on its ID.
ENTITY_VIEW_PATTERN = 'http://grano.io/e/%s'
