
feat2vec

Feature Embedding

Author: Yi Yang

Contact: [email protected]

Basic Description

Python code for

Requirements

  • Install gensim by
    • pip install --upgrade gensim
  • For a faster version of this tool, you may also
    • install Cython by
      • pip install cython
    • compile the code by running
      • python setup.py build_ext --inplace

Demo

A demo script for saving feature embeddings to a txt or bin file is provided; run python save_embeddings.py -h to list its options.
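The README does not spell out the layout of the saved txt file. Because the tool builds on gensim, the sketch below assumes the common word2vec text format (an optional "<count> <dim>" header line, then one "<feature> <v1> ... <vdim>" line per feature). If that assumption holds, gensim's KeyedVectors.load_word2vec_format(path, binary=False) should also read the file; the dependency-free parser here is only an illustration, and the function name load_text_embeddings is hypothetical.

```python
import io


def load_text_embeddings(lines):
    """Parse feature embeddings, ASSUMING word2vec text format.

    Expected layout (assumption, not confirmed by the README):
      optional header:  "<count> <dim>"
      then per feature: "<feature> <v1> <v2> ... <vdim>"
    Returns a dict mapping feature name -> list of floats.
    """
    embeddings = {}
    for lineno, line in enumerate(lines):
        parts = line.split()
        if not parts:
            continue  # ignore blank lines
        if lineno == 0 and len(parts) == 2 and all(p.isdigit() for p in parts):
            continue  # skip the "<count> <dim>" header
        feature, values = parts[0], parts[1:]
        embeddings[feature] = [float(v) for v in values]
    return embeddings


# Toy usage with an in-memory file (real data would come from open(path)).
sample = io.StringIO("2 3\nw=hello 0.1 0.2 0.3\nsuf=lo 0.0 -0.5 1.0\n")
emb = load_text_embeddings(sample)
print(sorted(emb))      # ['suf=lo', 'w=hello']
print(emb["w=hello"])   # [0.1, 0.2, 0.3]
```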

Given a feature file (data/twitter_feat.txt) in which each line holds the features of one instance, the following commands save the feature embeddings to a txt file (data/twitter_embeddings.txt):

  1. If features use a bag-of-words (BoW) representation (no feature templates involved):
     • python save_embeddings.py --bow 1 --dim 25 data/twitter_feat.txt data/twitter_embeddings.txt
  2. If features use a structured representation (features extracted by feature templates) and a feature-template mapping file (data/twitter_feat_template.txt) is given:
     • python save_embeddings.py --feature_template_file data/twitter_feat_template.txt --dim 25 data/twitter_feat.txt data/twitter_embeddings.txt
  3. If features use a structured representation (features extracted by feature templates) and a template prefix file (data/twitter_template_prefix.txt) is given:
     • python save_embeddings.py --template_prefix_file data/twitter_template_prefix.txt --dim 25 data/twitter_feat.txt data/twitter_embeddings.txt
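Options 2 and 3 both tell the tool which template produced each feature. The README does not document either file format, so the sketch below only illustrates the idea behind a template prefix list: it assumes feature names start with a template-specific prefix such as "w=" or "suf3=" and groups one instance's features by the longest matching prefix. The prefixes, feature names, and the function group_by_template are all hypothetical. With a feature-template mapping file, the same grouping would instead be a direct dictionary lookup from feature to template.

```python
from collections import defaultdict


def group_by_template(instance_features, prefixes):
    """Group one instance's features by template (hypothetical naming scheme).

    Each template is ASSUMED to be identified by a string prefix of its
    feature names. Features matching no prefix are grouped under None.
    """
    # Try longer prefixes first so overlapping prefixes resolve to the most specific one.
    ordered = sorted(prefixes, key=len, reverse=True)
    groups = defaultdict(list)
    for feat in instance_features:
        for prefix in ordered:
            if feat.startswith(prefix):
                groups[prefix].append(feat)
                break
        else:
            groups[None].append(feat)  # no template matched
    return dict(groups)


# Toy usage: one line of a feature file, split into features.
line = "w=lol prev_w=@user suf3=lol cap=0"
print(group_by_template(line.split(), ["w=", "prev_w=", "suf3="]))
# {'w=': ['w=lol'], 'prev_w=': ['prev_w=@user'], 'suf3=': ['suf3=lol'], None: ['cap=0']}
```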

See the save_features method in twproc.py for how to generate the data/twitter_feat.txt and data/twitter_feat_template.txt files from files in CoNLL POS format.
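For readers who want to build a feature file from their own data, here is a rough sketch of turning CoNLL-style POS data into one line of features per token. It assumes one "word<TAB>tag" pair per line with blank lines between sentences, and it uses a handful of made-up templates (current, previous and next word, a 3-character suffix, capitalization). These are illustrative only and are not the templates that twproc.py's save_features actually uses; all function names are hypothetical.

```python
def read_conll(lines):
    """Yield sentences as lists of (word, tag) pairs.

    ASSUMED layout: 'word<TAB>tag' per line, blank line between sentences.
    """
    sentence = []
    for line in lines:
        line = line.strip()
        if not line:
            if sentence:
                yield sentence
                sentence = []
            continue
        word, tag = line.split("\t")[:2]
        sentence.append((word, tag))
    if sentence:
        yield sentence


def token_features(words, i):
    """Hypothetical feature templates for token i (not twproc.py's own)."""
    w = words[i]
    prev_w = words[i - 1] if i > 0 else "<s>"
    next_w = words[i + 1] if i + 1 < len(words) else "</s>"
    return [
        "w=" + w.lower(),
        "prev_w=" + prev_w.lower(),
        "next_w=" + next_w.lower(),
        "suf3=" + w[-3:].lower(),
        "cap=" + str(int(w[:1].isupper())),
    ]


def to_feature_lines(conll_lines):
    """One output line per token (instance): its space-separated features."""
    out = []
    for sentence in read_conll(conll_lines):
        words = [word for word, _ in sentence]
        for i in range(len(words)):
            out.append(" ".join(token_features(words, i)))
    return out


for feat_line in to_feature_lines(["ikr\t!", "smh\tG", "", "Lol\t!", ""]):
    print(feat_line)
```

The tags are read but unused here; a real pipeline would keep them alongside each instance as training labels.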

Domain Adaptation for Twitter POS tagging

A lightweight demo of part-of-speech tagging for tweets is also provided, using data from the CMU Twitter NLP project.

The oct27 dataset serves as the source data and the daily547 dataset as the target data. We also randomly sample some unlabeled tweets (see the data/twitter folder).
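Part of the data preparation in this demo is selecting pivot features. The README does not say how twproc.py chooses them, so the sketch below shows one common domain-adaptation heuristic instead: keep features that occur at least min_count times in both the source and target data, ranked by their smaller per-domain count. The function name, the min_count and k parameters, and the ranking rule are assumptions for illustration.

```python
from collections import Counter


def select_pivots(source_instances, target_instances, min_count=2, k=None):
    """Return features frequent in BOTH domains (hypothetical heuristic).

    Each instance is a list of feature strings. Features are ranked by the
    smaller of their source and target counts, ties broken alphabetically.
    """
    src = Counter(f for inst in source_instances for f in inst)
    tgt = Counter(f for inst in target_instances for f in inst)
    # Counter returns 0 for missing keys, so target-only absence is handled.
    shared = [f for f in src if src[f] >= min_count and tgt[f] >= min_count]
    shared.sort(key=lambda f: (-min(src[f], tgt[f]), f))
    return shared if k is None else shared[:k]


source = [["w=the", "w=dog"], ["w=the", "w=cat"], ["w=dog"]]
target = [["w=the", "w=lol"], ["w=the", "w=dog"], ["w=lol", "w=dog"]]
print(select_pivots(source, target))       # ['w=dog', 'w=the']
print(select_pivots(source, target, k=1))  # ['w=dog']
```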

Run the demo:

  1. Prepare the data (extract features, select pivots, etc.) by running
     • python twproc.py
  2. Obtain the baseline (no adaptation) SVM tagging results by running
     • python twpos.py none
  3. Obtain the marginalized denoising autoencoder adaptation results by running
     • python twpos.py mldae
  4. Obtain the feature embedding adaptation results by running
     • python twpos.py feat2vec
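Step 3 compares against marginalized denoising autoencoders. As a point of reference, the sketch below implements the standard closed-form single-layer marginalized denoising autoencoder with NumPy: each feature is dropped with probability p, the expected reconstruction loss is minimized analytically as W = P Q^{-1}, and the hidden layer is tanh(W x). This is a generic textbook version, not necessarily what twpos.py does. In domain adaptation it is typically learned on the pooled source, target and unlabeled data, and the hidden features are appended to the original ones; the function name mda and the reg parameter are our own.

```python
import numpy as np


def mda(X, p, reg=1e-5):
    """Single-layer marginalized denoising autoencoder (closed form).

    X: d x n matrix (features x instances); p: feature dropout probability.
    Returns W (d x (d+1)) and the hidden representation tanh(W @ [X; 1]).
    """
    d, n = X.shape
    Xb = np.vstack([X, np.ones((1, n))])   # append a constant bias feature
    q = np.full(d + 1, 1.0 - p)            # survival probability per feature
    q[-1] = 1.0                            # the bias is never corrupted
    S = Xb @ Xb.T                          # scatter matrix
    Q = S * np.outer(q, q)                 # E[x~ x~^T], off-diagonal entries
    np.fill_diagonal(Q, q * np.diag(S))    # diagonal: feature i survives with prob q_i
    P = S[:d, :] * q[np.newaxis, :]        # E[x x~^T], reconstructing the d original features
    # W = P (Q + reg*I)^{-1}, computed with a linear solve instead of an explicit inverse.
    W = np.linalg.solve((Q + reg * np.eye(d + 1)).T, P.T).T
    return W, np.tanh(W @ Xb)


# Toy usage: 3 binary features over 6 instances, 30% feature dropout.
X = np.array([[1, 0, 1, 1, 0, 1],
              [0, 1, 1, 0, 1, 0],
              [1, 1, 0, 1, 0, 0]], dtype=float)
W, H = mda(X, p=0.3)
augmented = np.vstack([X, H])  # original features plus hidden ones
print(augmented.shape)         # (6, 6)
```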

The first step creates the file data/dataset_twitter.pkl. I obtained results of 0.8839, 0.8889 and 0.8924 for steps 2, 3 and 4, respectively. The feat2vec results may vary slightly because of negative sampling. Using more unlabeled data should yield even better feat2vec results.
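To put the reported numbers in perspective, the short computation below derives the absolute gain and the relative error reduction over the no-adaptation baseline. It treats the three figures as tagging accuracies, which is an assumption since the README does not name the metric; the helper name relative_error_reduction is our own.

```python
def relative_error_reduction(baseline, score):
    """Fraction of the baseline's errors removed, treating scores as accuracies."""
    return (score - baseline) / (1.0 - baseline)


# Figures reported above for steps 2, 3 and 4.
results = {"none": 0.8839, "mldae": 0.8889, "feat2vec": 0.8924}
base = results["none"]
for name, score in results.items():
    gain = score - base
    print(f"{name:9s} {score:.4f}  +{gain:.4f}  "
          f"{relative_error_reduction(base, score):.1%} error reduction")
# none      0.8839  +0.0000  0.0% error reduction
# mldae     0.8889  +0.0050  4.3% error reduction
# feat2vec  0.8924  +0.0085  7.3% error reduction
```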
