t2wml's Introduction

T2WML: A Cell-Based Language To Map Tables Into Wikidata Records

Operating system: macOS / OS X, Linux, Windows
Python version: Python 3.8+

This is the repository for the T2WML server-based GUI. You may be looking for the T2WML API.

Running T2WML for Development

Setting up the sources

For developing t2wml you need both the t2wml-api repository and the t2wml (this) repository. Create a directory called t2wml-root and clone both repositories under it:

mkdir t2wml-root
cd t2wml-root
git clone https://github.com/usc-isi-i2/t2wml-api
cd t2wml-api
git checkout development
cd ..
git clone https://github.com/usc-isi-i2/t2wml
cd t2wml
git checkout development

Setting up the Electron Frontend

First you have to make sure you have Node version 12 or higher installed. Then you can run:

cd t2wml-root/t2wml/electron
yarn install

Note for developers adding package dependencies: The Electron builder takes all the prod dependencies from package.json and adds them to the installation, even though webpack already takes care of everything. It is very important not to add prod dependencies; use yarn add --dev to add more packages.

Creating the Python virtual environment

cd t2wml-root/t2wml/backend
python3.6 -m venv env
source env/bin/activate     # on Windows, run env\Scripts\Activate.ps1 instead
pip install --upgrade pip
pip install -e ../../t2wml-api   # Install t2wml-api from the cloned repository at t2wml-root/t2wml-api
pip install -r requirements.txt

Note: Python 3.6 and higher are supported.

Running outside of an IDE

Running the backend

cd t2wml-root/t2wml/backend
python t2wml-server.py

Running backend tests

cd t2wml-root/t2wml/backend
pytest tests

Generating a self-contained tests report

cd t2wml-root/t2wml/backend
pytest --cov=. --cov-report html:htmlcov tests
open htmlcov/index.html

Running the frontend GUI

Since we're using Electron, you need to run two scripts to run the GUI in development:

cd t2wml-root/t2wml/electron
yarn dev

This script compiles all the frontend files, making them ready for electron. Wait until the compilation reaches 100% (you may see some warnings, that's fine). The script will keep running, recompiling as the frontend files are updated.

Open another shell window and run

cd t2wml-root/t2wml/electron
yarn start

This script starts Electron, and you should see the GUI.

Note that in development, the GUI will wait for the backend to start on port 13000, so you will need to run it.
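If the GUI appears to hang on startup, it can help to confirm that something is actually listening on the backend port. The snippet below is a minimal sketch, not part of T2WML: the helper name backend_is_up is made up for illustration, and it only checks that a TCP connection to port 13000 succeeds, not that the T2WML backend itself is healthy.

```python
import socket


def backend_is_up(host="localhost", port=13000, timeout=1.0):
    """Return True if something accepts TCP connections on host:port.

    Hypothetical helper for illustration; 13000 is the development
    port mentioned in this README.
    """
    try:
        # create_connection raises OSError (refused, timed out, ...) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling backend_is_up() before yarn start tells you whether the backend from the previous section still needs to be launched.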

Using Visual Studio Code

The project has a preconfigured settings file for Visual Studio Code. Before starting, you need to copy the settings template appropriate for your OS.

On macOS and Linux machines, copy .vscode/settings.linux.json to .vscode/settings.json. On Windows, copy .vscode/settings.windows.json to .vscode/settings.json. Then start Visual Studio Code and open the t2wml-root/t2wml directory.

GUI Development

To develop the GUI you need to run four tasks:

  1. Backend - runs the Python backend.
  2. Build Dev GUI Continuously - this task runs the npm dev script, which builds the GUI and continuously watches for changes.
  3. Report GUI Coding Errors - this task continuously scans the sources for errors and updates the Problems pane.
  4. Start GUI - opens the Electron-based GUI.

The GUI will not work unless the backend is up and running.

When updating the GUI code, it will be automatically rebuilt by the Build and Watch GUI task. You will need to reload the GUI - you can use Reload from the Debug menu.

You can also open the Chrome Developers Tools from the GUI's Debug menu.

Backend Development

To develop the backend, you need to launch the Backend from the debug menu. You will be able to set breakpoints and debug properly. If you want to run the GUI, start the Build and Watch GUI and t2wml GUI tasks, as well.

Windows service

To run the backend as a service on Windows:
Download the file windows-service.exe, and run it with Administrator privileges:
Install: t2wml-service.exe install
Start: t2wml-service.exe start
Debug: t2wml-service.exe debug
Stop: t2wml-service.exe stop
Uninstall: t2wml-service.exe remove

Usage with GUI

  1. Open the GUI
  2. In Table Viewer,
    1. click Upload to open a table file (.csv/.tsv/.xls/.xlsx)
  3. In Wikifier,
    1. define and wikify the regions you need [demo], and/or
    2. click Upload to open a wikifier file (.csv)
    3. correct mismatched qnode if necessary [demo]
  4. In YAML Editor,
    1. type/paste in T2WML code, or
    2. click Upload to open a YAML file (.yaml)
    3. click Apply to highlight some regions in Table Viewer
  5. In Output,
    1. preview result by clicking cell in Table Viewer [demo], or
    2. click Download to get all results

Writing T2WML

Check out the grammar guide

Features

Note: All screenshots below are captured in GUI v1.3. Minor inconsistencies may appear.

⬇️ t2wml-gui-demo

FAQs

  • Installation failed due to etk?

    Run the following commands in terminal/cmd:

    pip uninstall etk
    pip install https://github.com/usc-isi-i2/etk/archive/development.zip
    
  • Login failed or encountered an authentication error like 400 (OAuth2 Error)?

    Access T2WML at http://localhost:13000/ instead of http://127.0.0.1:13000.

  • Error saying can't find static/index.html?

    Make sure you install t2wml-standalone in a folder that does not contain the T2WML repo or there will be a configurations clash.

  • Encountered any other error not mentioned in the FAQs?

    Post the issue in the T2WML repository along with a detailed description.

t2wml's People

Contributors

abhinav-kumar-thakur, bhatiadivij, chanachelem, ckxz105, dependabot[bot], devowit, dgarijo, g1eb, greatyyx, jiashengwu, kyao, saggu, szeke, talyashra, xkgoodbest, zmbq

t2wml's Issues

Same Location Value in Multiple Columns

In the attached screenshot, "United Kingdom" in X79 is mapped correctly according to the qnode in the wikifier file, but the same value is not mapped in Y87.
None of the values in column Y are mapped, even though they are defined with qnodes in the wikifier file. (This is not a problem if the values are numeric; they map to multiple columns without any issue.)
(To reproduce: the file, wikifier file, and YAML can be found at https://github.com/akankshadiwedy/Wikidata-UCDP/tree/master/location)

Screen Shot 2019-11-07 at 3 29 01 PM

500 Internal Server Error while downloading file

There are 2 versions of this error:

  1. Uncaught Error 500: This occurs when I try to download the file and the file is already present in the cache.
  2. Caught Error 500: This occurs when cache is not used and a new file is generated to download and none of the cells raise an error while generating the statement.

I have fixed the error in Branch_json_tests. Please refer to this commit for details.
84dc237

Static value for `value` attribute raises an exception

When I try to give a static value to value attribute instead of a T2WML value expression, the system generates this exception:

"error": {"errorCode": 500, "errorTitle": "Undefined Backend Error", "errorDescription": "not enough values to unpack (expected 3, got 1)"}

Here is an example T2WML spec (based on homicide data table-1a):

statementMapping:
  region:
    - left: D
      right: F
      top: 4
      bottom: 9
  template:
    item: item[A, $row]
    property: P100024 # murder
    value: Q6030821
    #unit: D1002
    qualifier:
      - property: P585
        value: value[$col, 3]
        calendar: Q1985727
        precision: year
        time_zone: 0
        format: "%Y"
      - property: P6001 # applies to people
        value: item[C, $row]
      - property: P123 #source
        value: item[B, $row]
      - property: P1640 # curator
        value: Q6030821 # ISI

Both template->value and template->qualifier->value raise exceptions.
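The error message suggests that the backend unpacks every value into three parts, which fails for a plain constant. The following is a rough sketch of one way to tell the two cases apart. It is not the T2WML parser: the regular expression, the function name parse_template_value, and the returned dictionary shape are all assumptions made for illustration.

```python
import re

# Hypothetical pattern for cell expressions such as value[$col, 3] or item[A, $row].
CELL_EXPR = re.compile(r"^(value|item)\[\s*([^,\]]+?)\s*,\s*([^,\]]+?)\s*\]$")


def parse_template_value(raw):
    """Classify a template value as a cell expression or a constant."""
    text = str(raw).strip()
    match = CELL_EXPR.match(text)
    if match:
        kind, col, row = match.groups()
        return {"kind": kind, "col": col, "row": row}
    # Anything that is not a cell expression is kept as a literal constant.
    return {"kind": "constant", "value": text}
```

With this split, value: Q6030821 would be passed through as a constant instead of being unpacked as if it were a cell reference.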

Import Error ItemTable

I am getting an ImportError while running with the latest staging branch. @devowit

$ sh run_t2wml_food_prices.sh
Traceback (most recent call last):
  File "../generate.py", line 20, in <module>
    from driver import run_t2wml
  File "/home/kyao/dev/t2wml/driver.py", line 2, in <module>
    from backend_code.item_table import ItemTable
  File "/home/kyao/dev/t2wml/backend_code/item_table.py", line 3, in <module>
    from backend_code.utility_functions import query_wikidata_for_label_and_description
  File "/home/kyao/dev/t2wml/backend_code/utility_functions.py", line 12, in <module>
    from backend_code.wikidata_property import get_property_type as gp
  File "/home/kyao/dev/t2wml/backend_code/wikidata_property.py", line 1, in <module>
    from app_config import db
  File "/home/kyao/dev/t2wml/app_config.py", line 41, in <module>
    from backend_code.models import *
  File "/home/kyao/dev/t2wml/backend_code/models.py", line 7, in <module>
    from backend_code.item_table import ItemTable
ImportError: cannot import name 'ItemTable'

qualifiers for factbook not part of output

YAML:

# irrigated_land
statementMapping:
  region:
    - left: CB
      right: CB
      top: 9
      bottom: 26
  template:
    item: item[B, $row]
    property: P1082
    value: value[$col, $row] 
    unit: Q712226 # sq km 
    qualifier:
      - property: P585
        value: value[CD, 9]
        calendar: Q1985727
        precision: year
        time_zone: 0
        format: "%Y"
    #reference:
    #  - property: P246 # stated in
    #    value: Q11191 # The World Factbook

Malformed query sent to sparql endpoint

One example of query sent to SPARQL is

SELECT ?qnode (MIN(?label) AS ?label) (MIN(?desc) AS ?desc) WHERE {
  VALUES ?qnode { wd:Q30271987}
  ?qnode rdfs:label ?label; <http://schema.org/description> ?desc.
  FILTER (langMatches(lang(?label),"EN"))
  FILTER (langMatches(lang(?desc),"EN"))
}
GROUP BY ?qnode

which throws an error, the right query should be

SELECT ?qnode (MIN(?label) AS ?label_1) (MIN(?desc) AS ?desc_1) WHERE {
  VALUES ?qnode { wd:Q30271987}
  ?qnode rdfs:label ?label; <http://schema.org/description> ?desc.
  FILTER (langMatches(lang(?label),"EN"))
  FILTER (langMatches(lang(?desc),"EN"))
}
GROUP BY ?qnode

Notice the change (MIN(?label) AS ?label_1) (MIN(?desc) AS ?desc_1)

Please fix this
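For reference, the corrected query can be generated with a small helper like the one below. This is a sketch only: build_label_query is a hypothetical name, not a function in the T2WML code base. The relevant point is that the aliases after AS (?label_1, ?desc_1) differ from the variables already used inside the WHERE clause.

```python
def build_label_query(qnodes):
    """Build the label/description query with aliases that do not
    reuse variables already bound in the WHERE clause."""
    values = " ".join("wd:" + qnode for qnode in qnodes)
    return (
        "SELECT ?qnode (MIN(?label) AS ?label_1) (MIN(?desc) AS ?desc_1) WHERE {\n"
        f"  VALUES ?qnode {{ {values} }}\n"
        "  ?qnode rdfs:label ?label; <http://schema.org/description> ?desc.\n"
        '  FILTER (langMatches(lang(?label),"EN"))\n'
        '  FILTER (langMatches(lang(?desc),"EN"))\n'
        "}\n"
        "GROUP BY ?qnode"
    )
```

Whoever consumes the results then needs to read label_1 and desc_1 instead of label and desc.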

t2wml specification file

Could it be linked from the readme of the project? It's a little complicated to know which functions are supported (e.g., to skip rows) if the spec is not available.

Cannot load Excel files with dates

Exception:

 * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
[2019-07-04 16:10:37,387] ERROR in app: Exception on /upload_excel [POST]
Traceback (most recent call last):
  File "/Users/pedroszekely/.virtualenvs/t2wml/lib/python3.7/site-packages/flask/app.py", line 2292, in wsgi_app
    response = self.full_dispatch_request()
  File "/Users/pedroszekely/.virtualenvs/t2wml/lib/python3.7/site-packages/flask/app.py", line 1815, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/Users/pedroszekely/.virtualenvs/t2wml/lib/python3.7/site-packages/flask_cors/extension.py", line 161, in wrapped_function
    return cors_after_request(app.make_response(f(*args, **kwargs)))
  File "/Users/pedroszekely/.virtualenvs/t2wml/lib/python3.7/site-packages/flask/app.py", line 1718, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/Users/pedroszekely/.virtualenvs/t2wml/lib/python3.7/site-packages/flask/_compat.py", line 35, in reraise
    raise value
  File "/Users/pedroszekely/.virtualenvs/t2wml/lib/python3.7/site-packages/flask/app.py", line 1813, in full_dispatch_request
    rv = self.dispatch_request()
  File "/Users/pedroszekely/.virtualenvs/t2wml/lib/python3.7/site-packages/flask/app.py", line 1799, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "application.py", line 70, in upload_excel
    return upload_file(user_id, sheet_name)
  File "application.py", line 40, in upload_file
    data = excel_to_json(file_path, sheet_name)
  File "/Users/pedroszekely/Documents/GitHub/t2wml/Code/utility_functions.py", line 100, in excel_to_json
    return json.dumps(result)
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type date is not JSON serializable
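The last frame shows json.dumps rejecting a date object. One common way around this, sketched below, is to pass a default function that converts dates to ISO 8601 strings. The sample rows list is invented for illustration; only json.dumps and its default parameter come from the standard library, and whether ISO strings are the right representation for T2WML is an assumption.

```python
import datetime
import json


def encode_dates(obj):
    """Fallback for json.dumps: turn date/datetime objects into ISO strings."""
    if isinstance(obj, (datetime.date, datetime.datetime)):
        return obj.isoformat()
    # Keep the standard behaviour for anything else we do not know how to encode.
    raise TypeError(f"Object of type {obj.__class__.__name__} is not JSON serializable")


# Made-up sample data standing in for the parsed spreadsheet.
rows = [{"cell": "A1", "value": datetime.date(2019, 7, 4)}]
print(json.dumps(rows, default=encode_dates))
```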

Change default endpoint in the GUI

Currently the queries go to the standard Wikidata endpoint. By default, they should go to our endpoint:

http://sitaware.isi.edu:8080/bigdata/namespace/wdq/sparql

Error when switching projects

After I'm done with a project (a pair of an Excel file and a Wikifier file), I want to open another project. The /upload_excel request fails if I open an Excel file first, since the previous Wikifier file may not be applicable to the new Excel file.

It also happens when I upload an inapplicable Wikifier file first and then the Excel file.

Download as KGTK produces duplicate ids for qualifier edges

For example:

oecd;OECD-Latvia g2g9e7f8-en..csv;D6	Q211	P1082	19782	number	True	0	19782													
oecd;OECD-Latvia g2g9e7f8-en...csv;D6;D4	oecd;OECD-Latvia g2g9e7f8-en..csv;D6	P585	^2011-01-01T00:00:00/9	date_and_times	True	0											"2011-01-01T00:00:00"	9		
oecd;OECD-Latvia g2g9e7f8-en...csv;D6;	oecd;OECD-Latvia g2g9e7f8-en..csv;D6	P248	Q41550	symbol	True	0														Q41550
oecd;OECD-Latvia g2g9e7f8-en...csv;D6;	oecd;OECD-Latvia g2g9e7f8-en..csv;D6	P2006010001	Q2006050001	symbol	True	0														Q2006050001

This seems to happen for qualifier with fixed values where no cell is used to supply the value.

A simple solution is to not generate ids for qualifier edges as KGTK can easily add them later.
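If ids are still wanted, an alternative to dropping them is to make each qualifier id unique by including the property and a running counter. The sketch below illustrates that idea only; the id format and the function name unique_qualifier_ids are assumptions, not the format T2WML or KGTK actually uses.

```python
from collections import Counter


def unique_qualifier_ids(statement_id, qualifier_properties):
    """Return one distinct id per qualifier edge of a statement.

    The id is statement id + property + occurrence number, so two
    fixed-value qualifiers never collide even when no cell is involved.
    """
    seen = Counter()
    ids = []
    for prop in qualifier_properties:
        seen[prop] += 1
        ids.append(f"{statement_id};{prop};{seen[prop]}")
    return ids
```

For the example above, the P248 and P2006010001 edges would then receive different ids instead of sharing the same one.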

TTL for fertilizer data is incomplete

There are two problems. One in interactive mode, where no file is produced. In batch mode, most of the data is missing. The files are:

For some strange reason the interactive and batch behavior are different, and in the TTL most of the data is missing.

value[A+n, 3] doesn't work

I want to define a YAML file as follows so that the left is the first column that has the value irrigated_land:

# irrigated_land
statementMapping:
  region:
    - left: value[A+n, 3] == "irrigated_land" -> A+n
      right: CB
      top: 9
      bottom: $end
  template:
    item: item[B, $row]
    property: P1082
    value: value[$col, $row]

Looks like it works to have the +n expression with $left as in value[$left+n,3]

Source and wikifier files attached:

2007-10-01_factbook-small.xlsx

wikifier.zip

Conflicting "lark" namespace

In the requirements.txt file, both lark==0.0.4 and lark_parser==0.7.1 are included, but both use the namespace "lark" for imports, which leads to a confusion where the wrong library gets imported. The only places where the "lark" namespace is used is in t2wml_parser.py where the reference is to lark_parser, so it appears "lark==0.0.4" is not even needed for function. This is a minor inconvenience as one can simply remove the package from the requirements.txt or manually uninstall it, but it's a source of confusion for fresh users.
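To see which of the two distributions is present in an environment, the standard library can report installed versions. This is a small diagnostic sketch, assuming Python 3.8 or later (where importlib.metadata is available); the helper name installed_version is made up for illustration.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


# Both distributions provide a top-level "lark" package, so having both
# installed at once is the situation described in this issue.
for name in ("lark", "lark-parser"):
    print(name, installed_version(name))
```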

Cannot put constants in value

The value attribute should accept an arbitrary expression, including a constant. For example, value: 2012 should be valid, but it causes an error:

    qualifier:
      - property: P585
        value: 2012
        calendar: Q1985727
        precision: year
        time_zone: 0 

Support use of formulas in property, units, precision, etc.

In general, formulas can be used to compute the value of any attribute in a YAML file. For example:

# OECD mapping
statementMapping:
  region: 
    - range: D6:K12
      skip_cell:
        - =value[$col, $row] == " .."
  template:
    # The next line should wikify the extracted country
    #item: '=get_item(regex(value[B, 2], "profile: (.*) \d{4}", 1))' 
    item: Q31 # Belgium
    property: =item[B, $row]
    value: =replace(value[$col, $row], "[^\d.-]", "")
    unit: =item[C, $row]
    qualifier:
      - property: P585 #point in time
        value: =value[$col, 4]
        calendar: Q1985727
        precision: year
        time_zone: 0
        format: "%Y"
    reference: 
      - property: P246 # stated in
        value: Q41550 # OECD
      - property: P2006010001 # Datamart dataset id
        value: Q2006050001 # OECD dataset

In this example, formulas are used in property and unit, but in general formulas could be used anywhere.

Exception in download

on file: SL.EMP.TOTL.SP.NE.ZS.xls

[2019-07-08 19:40:38,370] ERROR in app: Exception on /download [POST]
Traceback (most recent call last):
  File "/Users/pedroszekely/Documents/GitHub/t2wml/Code/handler.py", line 175, in generate_download_file
    stat = yaml_parser.get_template()
  File "/Users/pedroszekely/Documents/GitHub/t2wml/Code/YamlParser.py", line 68, in get_template
    self.resolve_template(template)
  File "/Users/pedroszekely/Documents/GitHub/t2wml/Code/YamlParser.py", line 59, in resolve_template
    result = parse_evaluate_and_get_cell(qualifier_value)
  File "/Users/pedroszekely/Documents/GitHub/t2wml/Code/t2wml_parser.py", line 144, in parse_evaluate_and_get_cell
    root = generate_tree(text_to_parse)
  File "/Users/pedroszekely/Documents/GitHub/t2wml/Code/t2wml_parser.py", line 22, in generate_tree
    parse_tree = parser.parse(program)
  File "/Users/pedroszekely/.virtualenvs/t2wml/lib/python3.7/site-packages/lark/lark.py", line 292, in parse
    return self.parser.parse(text)
  File "/Users/pedroszekely/.virtualenvs/t2wml/lib/python3.7/site-packages/lark/parser_frontends.py", line 170, in parse
    return self.parser.parse(text)
  File "/Users/pedroszekely/.virtualenvs/t2wml/lib/python3.7/site-packages/lark/parsers/earley.py", line 307, in parse
    return self.forest_tree_visitor.visit(solutions[0])
  File "/Users/pedroszekely/.virtualenvs/t2wml/lib/python3.7/site-packages/lark/parsers/earley_forest.py", line 281, in visit
    return super(ForestToTreeVisitor, self).visit(root)
  File "/Users/pedroszekely/.virtualenvs/t2wml/lib/python3.7/site-packages/lark/parsers/earley_forest.py", line 204, in visit
    vtn(current)
  File "/Users/pedroszekely/.virtualenvs/t2wml/lib/python3.7/site-packages/lark/parsers/earley_forest.py", line 284, in visit_token_node
    self.output_stack[-1].append(node)
IndexError: deque index out of range

Statements are overwritten for cells which are evaluated first with the values of the statements which are generated later

While downloading a JSON/TTL file, statements are overwritten with the values of statements that are generated after that particular cell.
Refer to this log, which shows the value of the variable data during two different iterations of the generate_download_file function defined in t2wml_handling. Check the value of the P585 qualifier for cell D4. The first log is from when cell D9 is processed and the second from when cell E4 is processed. This issue is not restricted to qualifiers; it affects the whole statement.

[
  {
    "cell": "D4",
    "statement": {
      "item": "Q977",
      "property": "P100024",
      "value": 1,
      "qualifier": [
        {
          "property": "P585",
          "calendar": "Q1985727",
          "precision": 9,
          "time_zone": 0,
          "format": "%Y",
          "value": "2000-01-01T00:00:00",
          "cell": "D3"
        },
        {
          "property": "P6001",
          "value": "Q6581097",
          "cell": "C9"
        },
        {
          "property": "P123",
          "value": "Q6039400",
          "cell": "B9"
        }
      ],
      "cell": "A9"
    }
  },
  {
    "cell": "D5",
    "statement": {
      "item": "Q977",
      "property": "P100024",
      "value": 1,
      "qualifier": [
        {
          "property": "P585",
          "calendar": "Q1985727",
          "precision": 9,
          "time_zone": 0,
          "format": "%Y",
          "value": "2000-01-01T00:00:00",
          "cell": "D3"
        },
        {
          "property": "P6001",
          "value": "Q6581097",
          "cell": "C9"
        },
        {
          "property": "P123",
          "value": "Q6039400",
          "cell": "B9"
        }
      ],
      "cell": "A9"
    }
  },
  {
    "cell": "D6",
    "statement": {
      "item": "Q977",
      "property": "P100024",
      "value": 1,
      "qualifier": [
        {
          "property": "P585",
          "calendar": "Q1985727",
          "precision": 9,
          "time_zone": 0,
          "format": "%Y",
          "value": "2000-01-01T00:00:00",
          "cell": "D3"
        },
        {
          "property": "P6001",
          "value": "Q6581097",
          "cell": "C9"
        },
        {
          "property": "P123",
          "value": "Q6039400",
          "cell": "B9"
        }
      ],
      "cell": "A9"
    }
  },
  {
    "cell": "D7",
    "statement": {
      "item": "Q977",
      "property": "P100024",
      "value": 1,
      "qualifier": [
        {
          "property": "P585",
          "calendar": "Q1985727",
          "precision": 9,
          "time_zone": 0,
          "format": "%Y",
          "value": "2000-01-01T00:00:00",
          "cell": "D3"
        },
        {
          "property": "P6001",
          "value": "Q6581097",
          "cell": "C9"
        },
        {
          "property": "P123",
          "value": "Q6039400",
          "cell": "B9"
        }
      ],
      "cell": "A9"
    }
  },
  {
    "cell": "D8",
    "statement": {
      "item": "Q977",
      "property": "P100024",
      "value": 1,
      "qualifier": [
        {
          "property": "P585",
          "calendar": "Q1985727",
          "precision": 9,
          "time_zone": 0,
          "format": "%Y",
          "value": "2000-01-01T00:00:00",
          "cell": "D3"
        },
        {
          "property": "P6001",
          "value": "Q6581097",
          "cell": "C9"
        },
        {
          "property": "P123",
          "value": "Q6039400",
          "cell": "B9"
        }
      ],
      "cell": "A9"
    }
  },
  {
    "cell": "D9",
    "statement": {
      "item": "Q977",
      "property": "P100024",
      "value": 1,
      "qualifier": [
        {
          "property": "P585",
          "calendar": "Q1985727",
          "precision": 9,
          "time_zone": 0,
          "format": "%Y",
          "value": "2000-01-01T00:00:00",
          "cell": "D3"
        },
        {
          "property": "P6001",
          "value": "Q6581097",
          "cell": "C9"
        },
        {
          "property": "P123",
          "value": "Q6039400",
          "cell": "B9"
        }
      ],
      "cell": "A9"
    }
  }
]

********************

[
  {
    "cell": "D4",
    "statement": {
      "item": "Q967",
      "property": "P100024",
      "value": 2,
      "qualifier": [
        {
          "property": "P585",
          "calendar": "Q1985727",
          "precision": 9,
          "time_zone": 0,
          "format": "%Y",
          "value": "2001-01-01T00:00:00",
          "cell": "E3"
        },
        {
          "property": "P6001",
          "value": "Q6581072",
          "cell": "C4"
        },
        {
          "property": "P123",
          "value": "Q7649586",
          "cell": "B4"
        }
      ],
      "cell": "A4"
    }
  },
  {
    "cell": "D5",
    "statement": {
      "item": "Q967",
      "property": "P100024",
      "value": 2,
      "qualifier": [
        {
          "property": "P585",
          "calendar": "Q1985727",
          "precision": 9,
          "time_zone": 0,
          "format": "%Y",
          "value": "2001-01-01T00:00:00",
          "cell": "E3"
        },
        {
          "property": "P6001",
          "value": "Q6581072",
          "cell": "C4"
        },
        {
          "property": "P123",
          "value": "Q7649586",
          "cell": "B4"
        }
      ],
      "cell": "A4"
    }
  },
  {
    "cell": "D6",
    "statement": {
      "item": "Q967",
      "property": "P100024",
      "value": 2,
      "qualifier": [
        {
          "property": "P585",
          "calendar": "Q1985727",
          "precision": 9,
          "time_zone": 0,
          "format": "%Y",
          "value": "2001-01-01T00:00:00",
          "cell": "E3"
        },
        {
          "property": "P6001",
          "value": "Q6581072",
          "cell": "C4"
        },
        {
          "property": "P123",
          "value": "Q7649586",
          "cell": "B4"
        }
      ],
      "cell": "A4"
    }
  },
  {
    "cell": "D7",
    "statement": {
      "item": "Q967",
      "property": "P100024",
      "value": 2,
      "qualifier": [
        {
          "property": "P585",
          "calendar": "Q1985727",
          "precision": 9,
          "time_zone": 0,
          "format": "%Y",
          "value": "2001-01-01T00:00:00",
          "cell": "E3"
        },
        {
          "property": "P6001",
          "value": "Q6581072",
          "cell": "C4"
        },
        {
          "property": "P123",
          "value": "Q7649586",
          "cell": "B4"
        }
      ],
      "cell": "A4"
    }
  },
  {
    "cell": "D8",
    "statement": {
      "item": "Q967",
      "property": "P100024",
      "value": 2,
      "qualifier": [
        {
          "property": "P585",
          "calendar": "Q1985727",
          "precision": 9,
          "time_zone": 0,
          "format": "%Y",
          "value": "2001-01-01T00:00:00",
          "cell": "E3"
        },
        {
          "property": "P6001",
          "value": "Q6581072",
          "cell": "C4"
        },
        {
          "property": "P123",
          "value": "Q7649586",
          "cell": "B4"
        }
      ],
      "cell": "A4"
    }
  },
  {
    "cell": "D9",
    "statement": {
      "item": "Q967",
      "property": "P100024",
      "value": 2,
      "qualifier": [
        {
          "property": "P585",
          "calendar": "Q1985727",
          "precision": 9,
          "time_zone": 0,
          "format": "%Y",
          "value": "2001-01-01T00:00:00",
          "cell": "E3"
        },
        {
          "property": "P6001",
          "value": "Q6581072",
          "cell": "C4"
        },
        {
          "property": "P123",
          "value": "Q7649586",
          "cell": "B4"
        }
      ],
      "cell": "A4"
    }
  },
  {
    "cell": "E4",
    "statement": {
      "item": "Q967",
      "property": "P100024",
      "value": 2,
      "qualifier": [
        {
          "property": "P585",
          "calendar": "Q1985727",
          "precision": 9,
          "time_zone": 0,
          "format": "%Y",
          "value": "2001-01-01T00:00:00",
          "cell": "E3"
        },
        {
          "property": "P6001",
          "value": "Q6581072",
          "cell": "C4"
        },
        {
          "property": "P123",
          "value": "Q7649586",
          "cell": "B4"
        }
      ],
      "cell": "A4"
    }
  }
]
********************
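Every entry in each log carries the same statement, which is the pattern you get in Python when one mutable dictionary is reused and appended repeatedly. Whether that is the actual cause here is a guess, not a confirmed diagnosis. The sketch below reproduces the symptom with made-up data and shows the usual remedy, copy.deepcopy; the function names and the template layout are hypothetical.

```python
import copy

# Made-up template standing in for the parsed YAML template.
TEMPLATE = {"item": None, "qualifier": [{"property": "P585", "value": None}]}


def build_statements_shared(cells):
    """Buggy variant: every entry points at the same dictionary."""
    data = []
    for cell, item, year in cells:
        TEMPLATE["item"] = item
        TEMPLATE["qualifier"][0]["value"] = year
        data.append({"cell": cell, "statement": TEMPLATE})
    return data


def build_statements_copied(cells):
    """Fixed variant: each entry gets its own deep copy of the template."""
    data = []
    for cell, item, year in cells:
        statement = copy.deepcopy(TEMPLATE)
        statement["item"] = item
        statement["qualifier"][0]["value"] = year
        data.append({"cell": cell, "statement": statement})
    return data
```

A shallow copy would not be enough in this sketch, because the qualifier list is nested and would still be shared between statements.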

Not able to use GUI in Chrome browser on Windows

Using a Chrome browser on Windows, I get the following error:

idpiframe

If I ignore the error and click on Log in with Google, I get this error:
error

Also, I have allowed all cookies and reset the browser cache.

references not part of the system yet?

YAML:

# irrigated_land
statementMapping:
  region:
    - left: CB
      right: CB
      top: 9
      bottom: 26
  template:
    item: item[B, $row]
    property: P1082
    value: value[$col, $row] 
    unit: Q712226 # sq km 
    qualifier:
      - property: P585
        value: value[CD, 9]
        calendar: Q1985727
        precision: year
        time_zone: 0
        format: "%Y"
    reference:
      - property: P246 # stated in
        value: Q11191 # The World Factbook

Support uploading properties in KGTK format

Currently, properties must be uploaded in JSON. We need support to upload properties in KGTK TSV format as in the attached file. The idea is that the T2WML backend will scan the uploaded file to select rows of the form:

	P2006050001	data_type	quantity
	P2006050002	data_type	quantity

The set of property types is the following, using the terminology of the KGTK command that generates triples. We may need to change this list or add aliases to it:

item
time
globe-coordinate
quantity
monolingualtext
string
external-identifier
url
property
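The scan described above could look roughly like the sketch below. It is an illustration under assumptions: the function name read_property_types is invented, the file is treated as plain tab-separated text, and a row is accepted when a data_type cell sits between a property id and one of the types listed above. Real KGTK files may need header-aware parsing.

```python
import csv
import io

KNOWN_TYPES = {
    "item", "time", "globe-coordinate", "quantity", "monolingualtext",
    "string", "external-identifier", "url", "property",
}


def read_property_types(tsv_text):
    """Collect {property id: data type} from rows shaped like
    <id> P2006050001 data_type quantity (tab separated)."""
    types = {}
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        cells = [cell.strip() for cell in row]
        if "data_type" not in cells:
            continue
        idx = cells.index("data_type")
        # Need a property id before the label and a type after it.
        if idx == 0 or idx + 1 >= len(cells):
            continue
        prop, data_type = cells[idx - 1], cells[idx + 1]
        if prop and data_type in KNOWN_TYPES:
            types[prop] = data_type
    return types
```

Rows with an unknown type are skipped silently here; a real implementation would probably report them back to the user.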

Wikifier GUI does not show label and description

Ghost highlighted region

After I applied a YAML file to one sheet, switched to another sheet, and then switched back, the highlighted regions disappeared, which was fine. But when I click cells in the data region, the /resolve_cell request still returns something, which affects the table viewer and the output.

I suppose deleting the YAML file on the backend whenever an /upload_excel request is fired would solve this issue.

image

Apply YAML for indicator files takes a really long time

Applying the YAML file to this CSV takes a really long time:
DT.ODA.ODAT.GI.ZS.csv.zip

Is it taking a long time in the server or in the browser?

statementMapping:
  region:
    - left: D
      right: BL
      top: 5
      bottom: 269
  template:
    item: item(A/$row)
    property: item(D/$row)
    value: value($col/$row)
    #unit: # need to define the units
    qualifier:
      - property: P585
        value: value($col/4)
        calendar: Q1985727
        precision: year
        time_zone: 0

etk library installation

After following the installation instructions from the readme and running application.py, I got a ModuleNotFoundError for 'etk.wikidata' on line: 'from etk.wikidata.entity import WDItem' (from the triple_generator.py file).
I think that the etk library (https://github.com/usc-isi-i2/etk) that downloads from the requirements.txt installation process doesn't contain WikiData modules. Instead I have tried installing https://github.com/fatestigma/etk/tree/wikidata which does contain WikiData modules, but the file structure doesn't match the imports in the code.

skip-row isn't skipping all rows that satisfy the conditions

Please check this example. The skip-row isn't skipping row 9.
But if I try to skip rows with value[D, $row] == 2, it works as expected. I think the issue might be with trimming the cell values.
Here is the sample YAML file based on Homicide data Table-1a:

statementMapping:
  region:
    - left: D
      right: F
      top: 4
      bottom: 9
      skip_row:
        - value[D, $row] == 1
  template:
    item: item[A, $row]
    property: P100024 # murder
    value: value[$col, $row]
    #unit: D1002
    qualifier:
      - property: P585
        value: value[$col, 3]
        calendar: Q1985727
        precision: year
        time_zone: 0
        format: "%Y"
      - property: P6001 # applies to people
        value: item[C, $row]
      - property: P123 #source
        value: item[B, $row]
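
The trimming hypothesis above can be illustrated in isolation. The sketch below is not t2wml's implementation: cell_matches is a hypothetical helper, and the padded cell value is made up to show how stray whitespace would defeat an equality test such as value[D, $row] == 1.

```python
def cell_matches(cell, expected, trim=True):
    """Compare a raw cell value with an expected value as text."""
    text = str(cell)
    if trim:
        # Drop leading and trailing whitespace before comparing.
        text = text.strip()
    return text == str(expected)


print(cell_matches("1 ", 1, trim=False))  # False
print(cell_matches("1 ", 1))              # True
```

If the cell in row 9 carries extra whitespace, a comparison without trimming would behave like the first call and the row would not be skipped.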

Integration of pathlib in older versions of Python

I am working on a system with Python 3.5.5 and keep getting the error TypeError: invalid file: WindowsPath("/somepath").

pathlib integrates seamlessly with open() only in Python 3.6 and later:

The built-in open() function has been updated to accept os.PathLike objects, as have all relevant functions in the os and os.path modules, and most other functions and classes in the standard library.

A standard fix for this would be to convert the object to a string before opening the files.
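
A minimal sketch of that fix, assuming the failing call is a plain open() on a pathlib.Path. The wrapper name open_path is hypothetical and does not come from the t2wml code base.

```python
import tempfile
from pathlib import Path


def open_path(path, mode="r", **kwargs):
    """Open a file given either a str or a pathlib.Path.

    Converting to str first keeps this working on Python 3.5,
    where open() does not accept Path objects.
    """
    return open(str(path), mode, **kwargs)


# Usage: write and read back a file addressed through a Path object.
sample = Path(tempfile.mkdtemp()) / "sample.txt"
with open_path(sample, "w") as f:
    f.write("hello")
with open_path(sample) as f:
    print(f.read())  # hello
```

On Python 3.6 and later the str() call is harmless, so the same code runs on both old and new interpreters.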

unit not part of output or download

YAML:

# irrigated_land
statementMapping:
  region:
    - left: CB
      right: CB
      top: 9
      bottom: 26
  template:
    item: item[B, $row]
    property: P1082
    value: value[$col, $row]
    unit: Q712226 # sq km

The main problem is that the units are not part of the JSON, so they will not appear in the final output.

A secondary problem is that the units are not shown in the on-screen output. This is less important to fix, but it would be nice: the output should get the label of the Qnode for the units and put it next to the value.

The data and wikifier are the same as in issue #89.


References should be lists of lists

References are currently implemented the same way as qualifiers. However, references should be a list of lists, as in:

# OECD mapping
statementMapping:
  region: 
    - range: D6:K12
      skip_cell:
        - =value[$col, $row] == " .."
  template:
    # The next line should wikify the extracted country
    #item: '=get_item(regex(value[B, 2], "profile: (.*) \d{4}", 1))' 
    item: Q31 # Belgium
    property: =item[B, $row]
    value: =replace(value[$col, $row], "[^\d.-]", "")
    reference: 
      - - property: P246 # stated in
          value: Q41550 # OECD
        - property: P2006010001 # Datamart dataset id
          value: Q2006050001 # OECD dataset
      - - property: P246 # stated in
          value: Q123456 
        - property: PP1234567 # Datamart dataset id
          value: "hi there"
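
To make the proposed shape concrete, here is a sketch of the reference section above written as the Python data a YAML parser would produce for it: an outer list of references, each of which is itself a list of property/value entries. The helper is_list_of_lists is hypothetical and is not part of t2wml.

```python
# The reference section from the YAML above as nested Python data.
reference = [
    [
        {"property": "P246", "value": "Q41550"},
        {"property": "P2006010001", "value": "Q2006050001"},
    ],
    [
        {"property": "P246", "value": "Q123456"},
        {"property": "PP1234567", "value": "hi there"},
    ],
]


def is_list_of_lists(refs):
    """Check that refs is a list of lists of property/value dicts."""
    if not isinstance(refs, list):
        return False
    for group in refs:
        if not isinstance(group, list):
            return False
        for entry in group:
            if not isinstance(entry, dict):
                return False
            if not {"property", "value"} <= set(entry):
                return False
    return True


print(is_list_of_lists(reference))     # True
print(is_list_of_lists(reference[0]))  # False
```

The second call fails because a single flat list of entries, which is how qualifiers are shaped, is not a list of lists.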
