Recognizing Quantity Names for Tabular Data

This repository contains the datasets and Python code for Recognizing Quantity Names for Tabular Data. The paper can be found here. The presentation slides can be found here.

Prepare the Dataset

Because the datasets used in these experiments are too large to include directly, we share them through a CSV file called id_url.csv. Each row contains the dataset ID, CSV ID, and download URL of one individual dataset. In our experiments, we put all the datasets in a folder named 'data', name each dataset folder after its dataset ID, and name each CSV file folder after its CSV ID.

The structure is shown below:

|___ data
    |___ dataset ID
        |___ CSV ID
            |___ data.csv
    |___ dataset ID
        |___ CSV ID
            |___ data.csv
    |___ dataset ID
        |___ CSV ID
            |___ data.csv
    ......
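
The layout above can be rebuilt from id_url.csv with a short script. The following is a minimal sketch, not part of the repository: it assumes that id_url.csv has exactly three columns in the order dataset ID, CSV ID, download URL, and that it has no header row. The function names target_path, read_id_url, and download_all are hypothetical.

```python
import csv
import urllib.request
from pathlib import Path


def target_path(root, dataset_id, csv_id):
    """Return <root>/<dataset ID>/<CSV ID>/data.csv."""
    return Path(root) / str(dataset_id) / str(csv_id) / "data.csv"


def read_id_url(index_file):
    """Yield (dataset_id, csv_id, url) tuples from id_url.csv.

    Assumption: three columns in that order and no header row.
    """
    with open(index_file, newline="", encoding="utf-8") as f:
        for row in csv.reader(f):
            if len(row) < 3:
                continue  # skip blank or malformed rows
            yield row[0].strip(), row[1].strip(), row[2].strip()


def download_all(index_file="id_url.csv", root="data"):
    """Download every listed CSV into the folder layout shown above."""
    for dataset_id, csv_id, url in read_id_url(index_file):
        dest = target_path(root, dataset_id, csv_id)
        if dest.exists():
            continue  # already downloaded, do not fetch again
        dest.parent.mkdir(parents=True, exist_ok=True)
        try:
            urllib.request.urlretrieve(url, str(dest))
        except (OSError, ValueError) as err:
            # Network errors and malformed URLs are reported, not fatal.
            print(f"failed: {dataset_id}/{csv_id}: {err}")
```

Calling download_all() from the repository root would then fill the 'data' folder. If id_url.csv turns out to have a header row, skip the first row in read_id_url.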

Each of the six .txt files whose names start with 'quantity' lists the dataset IDs, CSV IDs, and column names that make up our training and testing data for one quantity name.
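
To illustrate how a (dataset ID, CSV ID, column name) triple maps onto the 'data' folder, here is a minimal sketch. It is not taken from the repository code, and it does not parse the 'quantity' text files themselves, because their exact line format is not described here. It assumes that the first row of each data.csv is a header containing the column names. The function names list_quantity_files and read_column are hypothetical.

```python
import csv
from pathlib import Path


def list_quantity_files(folder="."):
    """Return the .txt files whose names start with 'quantity', sorted by name."""
    return sorted(Path(folder).glob("quantity*.txt"))


def read_column(root, dataset_id, csv_id, column):
    """Return all values of one named column as strings.

    Reads <root>/<dataset ID>/<CSV ID>/data.csv.
    Assumption: the first row of data.csv holds the column names.
    """
    path = Path(root) / str(dataset_id) / str(csv_id) / "data.csv"
    with open(path, newline="", encoding="utf-8") as f:
        return [row[column] for row in csv.DictReader(f)]
```

For example, read_column("data", some_dataset_id, some_csv_id, some_column_name) returns the raw string values of that column, which is the kind of input a feature extractor would work from.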

Run the Code

There are two Python files in this repository. First run:

python feature_build.py

This process is relatively slow because it parses all the datasets listed in those text files. If a dataset is read successfully, its index is printed along with the message 'ok'. If only the index is printed, an error occurred while reading that dataset.
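
Because the run is long, it can help to save the output to a file and list the failed datasets afterwards. The sketch below is an illustration under an assumption about the output format: each dataset produces one line that starts with its integer index, followed by 'ok' on the same line when the read succeeded. If feature_build.py prints the index and the message differently, the parsing needs to be adjusted. The function name failed_indices is hypothetical.

```python
def failed_indices(log_text):
    """Return the indices that were printed without an accompanying 'ok'.

    Assumption: one line per dataset, starting with the integer index,
    with 'ok' on the same line when the dataset was read successfully.
    """
    failed = []
    for line in log_text.splitlines():
        parts = line.split()
        if not parts or not parts[0].isdigit():
            continue  # ignore blank lines and unrelated output
        if "ok" not in parts[1:]:
            failed.append(int(parts[0]))
    return failed
```

One way to use it is to run python feature_build.py > build.log, then pass the contents of build.log to failed_indices to see which datasets should be downloaded again.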

After it finishes, six CSV files containing the features of the instances will be created. Then run:

python cross_validation.py

Since we have already provided the six CSV files here, this command can also be run on its own, without running feature_build.py first.

The results will then be printed.
