
cotk's Introduction

Conversational Toolkits


cotk is an open-source, lightweight framework for model building and evaluation. We provide standard datasets and evaluation suites in the domain of general language generation. It is easy to use and lets you focus on designing your models!

Features included:

  • Light-weight and easy to start with; no boilerplate needed to construct models.
  • Predefined standard datasets in the domains of language modeling, dialog generation, and more.
  • Predefined evaluation suites: test your model with multiple metrics in several lines.
  • A dashboard to show experiments and compare your models with others' fairly.
  • Long-term maintenance and consistent development.

This project is part of dialtk (Toolkits for Dialog System by Tsinghua University); you can follow dialtk or cotk on our home page.

Note: the master branch is the development branch. The newest release is v0.1.0.

Quick links

Index

Installation

Requirements

  • python 3
  • numpy >= 1.13
  • nltk >= 3.4
  • tqdm >= 4.30
  • checksumdir >= 1.1
  • pytorch >= 1.0.0 (optional, accelerates the calculation of some metrics)
  • transformers (optional, used for pretrained models)

We support Unix, Windows, and macOS.

Install from pip

You can simply get the latest stable version from pip using

    pip install cotk

Install from source code

  • Clone the cotk repository
    git clone https://github.com/thu-coai/cotk.git
  • Install cotk via pip
    cd cotk
    pip install -e .

Quick Start

Let's skim through the whole package to find what you want.

Dataloader

Load commonly used datasets and perform preprocessing:

  • Download online resources or import from local path
  • Split training set, development set and test set
  • Construct vocabulary list
    >>> import cotk.dataloader
    >>> # automatically download online resources
    >>> dataloader = cotk.dataloader.MSCOCO("resources://MSCOCO_small")
    >>> # or download from a url
    >>> dl_url = cotk.dataloader.MSCOCO("http://cotk-data.s3-ap-northeast-1.amazonaws.com/mscoco_small.zip#MSCOCO")
    >>> # or import from local file
    >>> dl_zip = cotk.dataloader.MSCOCO("./MSCOCO.zip#MSCOCO")

    >>> print("Dataset is split into:", dataloader.fields.keys())
    dict_keys(['train', 'dev', 'test'])

Inspect vocabulary list

    >>> print("Vocabulary size:", dataloader.frequent_vocab_size)
    Vocabulary size: 2597
    >>> print("First 10 tokens in vocabulary:", dataloader.frequent_vocab_list[:10])
    First 10 tokens in vocabulary: ['<pad>', '<unk>', '<go>', '<eos>', '.', 'a', 'A', 'on', 'of', 'in']

Convert between ids and strings

    >>> print("Convert string to ids", \
    ...           dataloader.convert_tokens_to_ids(["<go>", "hello", "world", "<eos>"]))
    Convert string to ids [2, 6107, 1875, 3]
    >>> print("Convert ids to string", \
    ...           dataloader.convert_ids_to_tokens([2, 1379, 1897, 3]))
    Convert ids to string ['hello', 'world']

Iterate over batches

    >>> for data in dataloader.get_batches("train", batch_size=1):
    ...     print(data)
    {'sent':
        array([[ 2, 181, 13, 26, 145, 177, 8, 22, 12, 5, 1, 1099, 4, 3]]),
        # <go> This is an old photo of people and a <unk> wagon.
     'sent_allvocabs':
        array([[ 2, 181, 13, 26, 145, 177, 8, 22, 12, 5, 3755, 1099, 4, 3]]),
        # <go> This is an old photo of people and a horse-drawn wagon.
     'sent_length': array([14])}
    ......

Or use a while loop (another iteration method) if you prefer:

    >>> dataloader.restart("train", batch_size=1)
    >>> while True:
    ...    data = dataloader.get_next_batch("train")
    ...    if data is None: break
    ...    print(data)

Note: if you want to know more about Dataloader, please refer to the docs of dataloader.

Metrics

We found that different papers use different versions of the same metric, which leads to unfair comparisons between models. For example, whether unk is considered, or whether the mean NLL in perplexity is taken across sentences or across tokens, can introduce huge differences.
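To see why the averaging choice matters, here is a toy illustration (not cotk code; the NLL values are made up) comparing per-token and per-sentence averaging:

```python
import math

# Hypothetical per-token negative log-likelihoods for two sentences
# of different lengths (values invented for illustration).
nlls = [[2.0, 2.0], [4.0, 4.0, 4.0, 4.0]]

# Per-token mean: sum every token's NLL, divide by the total token count.
per_token = math.exp(sum(sum(s) for s in nlls) / sum(len(s) for s in nlls))

# Per-sentence mean: average each sentence's mean NLL first, then average those.
per_sent = math.exp(sum(sum(s) / len(s) for s in nlls) / len(nlls))

print(per_token)  # ~28.03 (mean NLL = 20/6)
print(per_sent)   # ~20.09 (mean NLL = 3.0)
```

Both are legitimate definitions of "perplexity", yet they disagree whenever sentence lengths vary, which is exactly why a shared implementation matters.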

We provide a unified implementation of metrics, where a hashvalue is provided for checking whether the same data is used. The metric object receives mini-batches.

    >>> import cotk.metric
    >>> metric = cotk.metric.SelfBleuCorpusMetric(dataloader, gen_key="gen")
    >>> metric.forward({
    ...    "gen":
    ...        [[2, 181, 13, 26, 145, 177, 8, 22, 12, 5, 3755, 1099, 4, 3],
    ...         [2, 46, 145, 500, 1764, 207, 11, 5, 93, 7, 31, 4, 3]]
    ... })
    >>> print(metric.close())
    {'self-bleu': 0.02253475750490193, 'self-bleu hashvalue': 'f7d75c0d0dbf53ffba4b845d1f61487fd2d6d3c0594b075c43111816c84c65fc'}

You can merge multiple metrics together by cotk.metric.MetricChain.

    >>> metric = cotk.metric.MetricChain()
    >>> metric.add_metric(cotk.metric.SelfBleuCorpusMetric(dataloader, gen_key="gen"))
    >>> metric.add_metric(cotk.metric.FwBwBleuCorpusMetric(dataloader, reference_test_list=dataloader.get_all_batch('test')['sent_allvocabs'], gen_key="gen"))
    >>> metric.forward({
    ...    "gen":
    ...        [[2, 181, 13, 26, 145, 177, 8, 22, 12, 5, 3755, 1099, 4, 3],
    ...         [2, 46, 145, 500, 1764, 207, 11, 5, 93, 7, 31, 4, 3]]
    ... })
    >>> print(metric.close())
    100%|██████████| 1000/1000 [00:00<00:00, 5281.95it/s]
    {'self-bleu': 0.02253475750490193, 'self-bleu hashvalue': 'f7d75c0d0dbf53ffba4b845d1f61487fd2d6d3c0594b075c43111816c84c65fc', 'fw-bleu': 0.28135593382545376, 'bw-bleu': 0.027021522872801896, 'fw-bw-bleu': 0.04930753293488745, 'fw-bw-bleu hashvalue': '60a39f381e065e8df6fb5eb272984128c9aea7dee4ba50a43bfb768395a70762'}

We also provide recommended metrics for selected dataloaders.

    >>> metric = dataloader.get_inference_metric(gen_key="gen")
    >>> metric.forward({
    ...    "gen":
    ...        [[2, 181, 13, 26, 145, 177, 8, 22, 12, 5, 3755, 1099, 4, 3],
    ...         [2, 46, 145, 500, 1764, 207, 11, 5, 93, 7, 31, 4, 3]]
    ... })
    >>> print(metric.close())
    100%|██████████| 1000/1000 [00:00<00:00, 4857.36it/s]
    100%|██████████| 1250/1250 [00:00<00:00, 4689.29it/s]
    {'self-bleu': 0.02253475750490193, 'self-bleu hashvalue': 'f7d75c0d0dbf53ffba4b845d1f61487fd2d6d3c0594b075c43111816c84c65fc', 'fw-bleu': 0.3353037449663603, 'bw-bleu': 0.027327995838287513, 'fw-bw-bleu': 0.050537105917262654, 'fw-bw-bleu hashvalue': 'c254aa4008ae11b1bc4955e7cd1f7f3aad34b664178a585a218b1474970e3f23', 'gen': [['inside', 'is', 'an', 'elephant', 'shirt', 'of', 'people', 'and', 'a', 'grasslands', 'pulls', '.'], ['An', 'elephant', 'girls', 'baggage', 'sidewalk', 'with', 'a', 'clock', 'on', 'it', '.']]}

Note: if you want to know more about metrics, please refer to the docs of metrics.

Predefined Models

We have provided some baselines for classical tasks; see Model Zoo in the docs for details.

You can also use cotk download thu-coai/MODEL_NAME/master to get the code.

Issues

You are welcome to create an issue if you want to request a feature, report a bug or ask a general question.

Contributions

We welcome contributions from the community.

  • If you want to make a big change, we recommend first creating an issue with your design.
  • Small contributions can be directly made by a pull request.
  • If you would like to contribute to our library, see the issues to find out what we need.

Team

cotk is maintained and developed by the Tsinghua University Conversational AI group (THU-coai). Check our main page (in Chinese).

License

Apache License 2.0


cotk's Issues

[Model] LSTM language modelling

Write a model for the Language Generation Dataloader, in either TensorFlow or PyTorch.

If you write it in TensorFlow, please use a newer version such as 1.13.

Tests are required.

[BUG] typo in metric.py

Describe the bug
PerlplexityMetric -> PerplexityMetric

Move ./tests/dataloader/test_metric to ./tests/metric/test_metric

[Model] CVAE

Refer to Learning Discourse-level Diversity for Neural Dialog Models using Conditional Variational Autoencoders

[Models] CopyNet

Refer to Incorporating Copying Mechanism in Sequence-to-Sequence Learning.

[Enhancement] Adapt test for metric using allvocabs

Description:

The dataloader has now added new attributes: valid vocabs and invalid vocabs.
Valid vocabs are the vocabularies used by models;
all vocabs (= valid vocabs + invalid vocabs) are the vocabularies used by metrics.
If a word is not in all vocabs, it is an unknown word, which is ignored by metrics.

Metric unit tests must be adapted for the new metrics.

Requirements:

  • Pull invalid_vocab branch
  • FakeDataloader should have new attributes like all_vocab_size, ...
  • Bleu & Recorder metrics have to use all vocabs
  • Perplexity uses a smoothing algorithm (you can see the code in PerlplexityMetric as a reference):
    • If models predict valid vocab words, perplexity is calculated as before.
    • If models predict UNK, the probability is divided evenly among the invalid vocab words.
    • If the reference is UNK, the word is ignored.
      So you have to write tests for the new PerplexityMetric and MultiturnPerplexityMetric,
      trying to cover the 3 conditions above.
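The three smoothing rules above can be sketched as follows. This is an illustrative reimplementation, not the actual metric code; the function name and argument layout are assumptions.

```python
import math

def smoothed_nll(ref_ids, probs, frequent_vocab_size, all_vocab_size, unk_id=1):
    """Mean NLL under the three smoothing rules described above.

    probs: one dict per position mapping token id -> model probability
    (the model's distribution covers the valid vocab only).
    """
    n_invalid = all_vocab_size - frequent_vocab_size
    total, count = 0.0, 0
    for pos, ref in enumerate(ref_ids):
        if ref == unk_id:
            continue                            # rule 3: reference is UNK -> ignored
        if ref < frequent_vocab_size:
            p = probs[pos][ref]                 # rule 1: valid vocab, probability as-is
        else:
            p = probs[pos][unk_id] / n_invalid  # rule 2: UNK mass split evenly
        total += -math.log(p)
        count += 1
    return total / count
```

Perplexity would then be exp of this mean; a unit test can feed one reference token from each of the three cases and check each branch.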

[Maintenance] Refactor dataloader of SwitchBoard

  • _build_vocab has to use multi_ref data
  • rename to inference metric; embedding should have a default realization (use word vectors from GloVe)
  • add unittest for unique feature of SwitchBoard

  • add hashvalue

[Enhancement] gather download links of data

Gather the download links of data and make a 'dataset_config.json' in ./contk/dataloader:

    {
        "MSCOCO": "https://XXXX"
    }

It is best to reference the original links; gzip or other compressed formats can be used.

[Enhancement] Metrics check whether models use the same data

Problems

It may be hard to evaluate 2 models on the same test data in the same way.
So it's important to make the metrics able to tell which data is used.

Proposal A

Bind the metrics to the dataloader. Data must be processed in the same order.

Drawback:

  • must be in same order

Proposal B

Make a hash value of the data, which can tell whether the data differs.

Drawback:

  • hard to find bugs
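Proposal B could look roughly like this sketch (an assumption, not the actual cotk implementation); sorting the serialized samples also sidesteps Proposal A's ordering drawback:

```python
import hashlib
import json

def data_hash(samples):
    """sha256 digest over token-id lists, independent of sample order."""
    h = hashlib.sha256()
    # Serialize each sample deterministically, then sort so that iteration
    # order does not change the digest.
    for s in sorted(json.dumps(sample) for sample in samples):
        h.update(s.encode("utf-8"))
    return h.hexdigest()

a = data_hash([[2, 181, 13, 3], [2, 46, 145, 3]])
b = data_hash([[2, 46, 145, 3], [2, 181, 13, 3]])  # same data, different order
assert a == b
```

Two models reporting the same digest can then be assumed to have been evaluated on the same test data.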

[BUG] bleu will crash

Describe the bug
BleuMetric crashes when len(hypothesis) == 1, possibly because of SmoothingFunction.

It's an upstream bug; just comment on it and give up.

To Reproduce

checked

[Model] HRED

Refer to Building end-to-end dialogue systems using generative hierarchical neural network models

[Enhancement] Make unit test for models

Requirement

  • Run models test only in cpu mode
  • Just check the arguments and the connection with the main library
  • Don't need to check performance
  • make the test standalone, because it may need packages like tensorflow or pytorch.

[Model] SeqGAN

Refer to SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient

[Feature] Use a stable link on github for data

Users may use the same id to download the same data from different sources:

e.g. "glove": default, from GitHub
"glove~github": explicitly from GitHub
"glove~tsinghua": explicitly download from coai.tsinghua

[Enhancement] Vocab List in Dataloader

For the implementation of #8 (CopyNet), the dataloader should change its behavior.

In our mind, there should be 3 vocab lists:

  • For model training, the smallest. It only includes words from the train set. Call it set $V.
  • For metrics, bigger. The model will be evaluated on this vocab list, which includes words from the train set and test set. Call it set $M. Almost all models can't generate words from $M-$V, because they have never seen them. However, CopyNet can generate words from $M-$V via its copy mechanism, so it's necessary to take these words into account when we implement metrics. $M-$V can be expressed as the UNK token for some models; the dataloader has to translate it into a uniform distribution on $M-$V.
  • The whole space of words, including those not seen in any of the data. Call it set $N. We don't care about the words in $N-$M and ignore them when evaluating models, as in #37. $N-$M is the TRUE UNK.

Require:

  • Change the behavior of dataloader, metric.
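To make the set relations concrete, here is a tiny illustration with made-up words, using Python sets to stand in for the vocab lists:

```python
# $V: model (train-set) vocab; $M: metric vocab; $N: all possible words.
V = {"<pad>", "<unk>", "<go>", "<eos>", "a", "dog"}
M = V | {"wagon", "horse-drawn"}   # adds test-set words
N = M | {"zeppelin"}               # adds words never seen in any data

# Words most models can't generate but CopyNet can copy:
copy_only = M - V                  # {'wagon', 'horse-drawn'}
# TRUE UNK, ignored when evaluating models:
true_unk = N - M                   # {'zeppelin'}
assert V <= M <= N                 # the lists nest: V within M within N
```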

[BUG] fix hred test

Describe the bug
The hred test is wrong.

Why is the number of turns of generated sentences greater than the number of turns of the reference?

[Feature] Report system

Write a script that pushes results to the dashboard.

Command:

    cotk-report [--result result.json] [--only-upload] [--entry main] [other parameters]

result: the file containing the test results.
only-upload: push results without running the model.
entry: the entry point of the model.

If run in only-upload mode, the result should be comparable.
If run in full mode, the result should be reproducible.

Provide a list of APIs for the dashboard.
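A minimal sketch of how the proposed command line could be parsed; the flag names come from this issue, while everything else (defaults, help strings) is an assumption:

```python
import argparse

parser = argparse.ArgumentParser(prog="cotk-report")
parser.add_argument("--result", default="result.json",
                    help="file containing the test results")
parser.add_argument("--only-upload", action="store_true",
                    help="push results without running the model")
parser.add_argument("--entry", default="main",
                    help="entry point of the model")

# Example invocation corresponding to only-upload mode.
args = parser.parse_args(["--result", "result.json", "--only-upload"])
assert args.only_upload and args.entry == "main"
```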
