
explainaboard_client's People

Contributors

lyuyangh, neubig, oscarwang114, paulcccccch, pfliu-nlp, qjiang002


Forkers

kaylin1224

explainaboard_client's Issues

Documentation of file format for tasks

It'd be good to have documentation of the file formats accepted by each supported task. The "full list" in the documentation links to code, and is not very comprehensible.

Re-think `upload_system` as `evaluate_system`

We might want to re-think upload_system as evaluate_system by:

  • Renaming the script.
  • Printing out basic statistics of the output upon upload of a system. At the very least, it'd be good to print out the main scores.
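A minimal sketch of what printing the main scores after upload might look like. The `{metric_name: value}` layout used here is an assumption for illustration, not the actual server response format:

```python
def format_main_scores(results: dict) -> str:
    """Render overall metric scores as lines like 'Accuracy: 0.9123'.

    The {metric_name: value} dict shape is hypothetical; the real
    response structure would need to be mapped onto it.
    """
    return "\n".join(f"{name}: {value:.4f}" for name, value in results.items())

print(format_main_scores({"Accuracy": 0.9123, "F1": 0.8876}))
```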

Reduce the number of required arguments

From @pfliu-nlp in this comment: As a demonstration example in README.md, I think this would make the product more attractive: "shortest code makes most powerful things". Based on this excerpt, the code could be further simplified in the future. For example:

  • We can remove the need for the task name and source_language, which could be inferred from named datasets.
  • We can even remove system_name, split, and system_output_file_type by providing default values.

Of course, we can instruct users to see the general examples for more comprehensive usage.
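One way to sketch the inference of arguments from a named dataset, assuming a hypothetical registry mapping dataset names to metadata (the registry contents and helper name below are illustrative, not part of the actual client):

```python
# Hypothetical registry: named datasets carry their own task and
# language metadata, so users need not repeat them.
DATASET_REGISTRY = {
    "sst2": {"task": "text-classification",
             "source_language": "en", "target_language": "en"},
    "conll2003": {"task": "named-entity-recognition",
                  "source_language": "en", "target_language": "en"},
}

# Plain defaults for arguments that could be made optional.
DEFAULTS = {"split": "test", "system_output_file_type": "text"}

def resolve_arguments(dataset: str, **overrides) -> dict:
    """Merge defaults and registry metadata; explicit overrides win."""
    args = {"dataset": dataset, **DEFAULTS, **DATASET_REGISTRY.get(dataset, {})}
    args.update(overrides)
    return args
```

With this, a README example could shrink to `resolve_arguments("sst2", system_name="test_cli")` plus the output file.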

bug: SystemOutputProps and passed type was NoneType at ['custom_dataset']

When performing:

python -m explainaboard_client.cli.upload_system \
  --email XXX --api_key XXX \
  --task named-entity-recognition \
  --system_name test_ner \
  --system_output conll2003-elmo-output.conll --output_file_type conll \
  --dataset conll2003 --sub_dataset ner --split test \
  --source_language en --target_language en \
  --public

error:

value = validate_and_convert_types(
  File "/usr1/data/pliu3/neulab/ExplainaBoard-Walkthrough/roles/instructors/fudan_nlp/venv/lib/python3.9/site-packages/explainaboard_api_client/model_utils.py", line 1582, in validate_and_convert_types
    raise get_type_error(input_value, path_to_item, valid_classes,
explainaboard_api_client.exceptions.ApiTypeError: Invalid type for variable 'custom_dataset'. Required value type is SystemOutputProps and passed type was NoneType at ['custom_dataset']

ExplainaBoard find_systems CLI client fails when not specifying sort-field

Example:

$ python -m explainaboard_client.cli.find_systems --username $EB_USERNAME --api-key $EB_API_KEY --output-format tsv
Traceback (most recent call last):
  File "/Users/gneubig/opt/anaconda3/envs/explainaboard_web/lib/python3.10/site-packages/explainaboard_client/cli/find_systems.py", line 108, in main
    system_list: list[dict] = client.find_systems(
  File "/Users/gneubig/opt/anaconda3/envs/explainaboard_web/lib/python3.10/site-packages/explainaboard_client/client.py", line 293, in find_systems
    result: SystemsReturn = self._default_api.systems_get(
  File "/Users/gneubig/opt/anaconda3/envs/explainaboard_web/lib/python3.10/site-packages/explainaboard_api_client/api/default_api.py", line 1807, in systems_get
    return self.systems_get_endpoint.call_with_http_info(**kwargs)
  File "/Users/gneubig/opt/anaconda3/envs/explainaboard_web/lib/python3.10/site-packages/explainaboard_api_client/api_client.py", line 831, in call_with_http_info
    self.__validate_inputs(kwargs)
  File "/Users/gneubig/opt/anaconda3/envs/explainaboard_web/lib/python3.10/site-packages/explainaboard_api_client/api_client.py", line 725, in __validate_inputs
    fixed_val = validate_and_convert_types(
  File "/Users/gneubig/opt/anaconda3/envs/explainaboard_web/lib/python3.10/site-packages/explainaboard_api_client/model_utils.py", line 1577, in validate_and_convert_types
    converted_instance = attempt_convert_item(
  File "/Users/gneubig/opt/anaconda3/envs/explainaboard_web/lib/python3.10/site-packages/explainaboard_api_client/model_utils.py", line 1456, in attempt_convert_item
    raise get_type_error(input_value, path_to_item, valid_classes,
explainaboard_api_client.exceptions.ApiTypeError: Invalid type for variable 'sort_field'. Required value type is str and passed type was NoneType at ['sort_field']
failed to query systems
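Until the generated client handles optional parameters gracefully, one possible client-side workaround (a sketch, not the actual fix in the codebase) is to strip `None`-valued arguments before forwarding them to the generated API client:

```python
def drop_none_kwargs(**kwargs) -> dict:
    # The generated API client raises ApiTypeError when an optional
    # parameter like sort_field is passed explicitly as None, so
    # simply omit any argument whose value is None.
    return {k: v for k, v in kwargs.items() if v is not None}
```

`find_systems` could then call the endpoint with `**drop_none_kwargs(sort_field=sort_field, ...)`.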

UI Design Suggestion of ExplainaBoard Client

Current Implementation (in the latest PRs)

from explainaboard_client.client import ExplainaboardClient
from explainaboard_client.config import Config

client = ExplainaboardClient(Config(
    "user_email",
    "api_key",
    "environment"
))

result = client.evaluate_system_file(
    system_output_file="sst2-output.txt",
    system_output_file_type="text",
    task="text-classification",
    system_name="test_cli",
    metric_names=["Accuracy"],
    source_language="en",
    target_language="en",
    dataset="sst2",
    split="test",
    ...)

Unfriendly Points

  • (1) explainaboard_client.client: too long and verbose; no one wants to remember this
  • (2) from explainaboard_client.config import Config: an additional burden to remember (importing Config)
  • (3) users need to write two import lines for unfamiliar packages, which is not good
  • (4) Config requires too much information; for example, could we remove the need for the email, or even remove the config entirely?

Some potential better designs in my mind (from worse -> better)

1.

import os
from explainaboard_client import ExplainaboardClient


explainaboard_client.api_key = os.getenv("API_KEY") # api_key could be a global/system variable


result = ExplainaboardClient.evaluate_system_file(
    system_output_file="sst2-output.txt",
    system_output_file_type="text",
    task="text-classification",
    system_name="test_cli",
    metric_names=["Accuracy"],
    source_language="en",
    target_language="en",
    dataset="sst2",
    split="test",
    ...)

2.

import os
import explainaboard_client

explainaboard_client.api_key = os.getenv("API_KEY")

result = explainaboard_client.ExplainaboardClient.evaluate_system_file(...)

3.

import os
from inspiredco import ExplainaboardClient  # from inspiredco import EaasClient in the future

inspiredco.api_key = os.getenv("API_KEY")

result = ExplainaboardClient.evaluate_system_file(...)

3.14

import os
import inspiredco

inspiredco.api_key = os.getenv("API_KEY")

result = inspiredco.ExplainaboardClient.evaluate_system_file(...)

Failed to delete systems for custom dataset.

I uploaded a system with a custom dataset using the following:

python -m explainaboard_client.cli.upload_system \
  --email $EB_EMAIL --api_key $EB_API_KEY \
  --task text-classification \
  --system_name test-cli-1 \
  --system_output /mnt/c/Users/paulc/Desktop/exb-test-systems/multi-classification/pred.tsv \
  --output_file_type text \
  --custom_dataset /mnt/c/Users/paulc/Desktop/exb-test-systems/multi-classification/dataset.tsv \
  --custom_dataset_file_type tsv \
  --source_language en \
  --target_language en \
  --server local

Then I tried to delete it with

python -m explainaboard_client.cli.delete_systems \
  --email $EB_EMAIL \
  --api_key $EB_API_KEY \
  --system_ids 63288597c7cb5738d5b9826b

CLI gives the following error

Traceback (most recent call last):
  File "/home/paul/explainaboard_client/explainaboard_client/cli/delete_systems.py", line 66, in main
    f'dataset={system_dict["system_info"]["dataset_name"]}, '
KeyError: 'dataset_name'
Failed to delete systems

I think some fields do not exist for custom datasets, and they are only used for logging purposes. Deleting itself only requires a system ID. So to fix it, I think we can simply provide default text for missing fields.
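A sketch of that fix, using `dict.get()` with fallbacks so the log line never raises `KeyError` (the helper name and fallback strings below are illustrative):

```python
def describe_system(system_dict: dict) -> str:
    """Build the log line for a system, tolerating missing fields."""
    info = system_dict.get("system_info", {})
    # Systems evaluated on custom datasets have no "dataset_name";
    # .get() with a default keeps deletion logging from crashing.
    return (f'name={info.get("system_name", "<unknown>")}, '
            f'dataset={info.get("dataset_name", "<custom dataset>")}')
```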

customized datasets cannot work?

It seems that the current command (system outputs on customized datasets) cannot work:

python -m explainaboard_client.upload_system --email xxx  --api_key yyy --task text-classification --system_name test_customized_tc --system_output sst2-cnn-output.txt --output_file_type tsv --custom_dataset sst2-dataset.tsv --custom_dataset_file_type tsv --source_language en --target_language en
explainaboard_api_client.exceptions.ApiException: (400)
Reason: BAD REQUEST
HTTP response headers: HTTPHeaderDict({'Date': 'Wed, 18 May 2022 12:20:53 GMT', 'Content-Type': 'application/json', 'Content-Length': '65', 'Connection': 'keep-alive', 'Server': 'nginx/1.18.0', 'Access-Control-Allow-Origin': '*'})
HTTP response body: {"detail":"dataset: None---None does not exist","error_code":-1}

The above data can be found here.

Redesign API to be simpler

Right now the API is complicated, requiring the creation of undocumented data members and the use of two libraries, explainaboard_client and explainaboard_api_client. We should redesign this to be simpler. Here are some functions to be implemented:

  • evaluate_system_file(): Perform evaluation from system output files (@neubig is working on this now)
  • evaluate_system(): Perform evaluation from data that is stored in memory
  • find_systems(): Find systems
  • delete_system(): Delete a system by ID
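The proposed surface could be sketched as a single client class. The signatures here are assumptions inferred from the list above, not the final API:

```python
class ExplainaboardClient:
    """Hypothetical single entry point for the redesigned API."""

    def evaluate_system_file(self, system_output_file: str, **metadata) -> dict:
        """Perform evaluation from a system output file on disk."""
        raise NotImplementedError

    def evaluate_system(self, system_output: list, **metadata) -> dict:
        """Perform evaluation from data stored in memory."""
        raise NotImplementedError

    def find_systems(self, **query) -> list:
        """Find systems matching a query."""
        raise NotImplementedError

    def delete_system(self, system_id: str) -> None:
        """Delete a system by ID."""
        raise NotImplementedError
```

Keeping everything on one class would remove the need for users to ever touch explainaboard_api_client directly.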

How to inform users to update client and api_client packages?

This is a question I raised during the meeting, and @lyuyangh suggested that we do a version check every time an API call is performed.

I think it's a great idea. A simple implementation would be to check both the client and api_client against their latest versions, and always instruct users to upgrade to the latest versions. Would love to hear any other suggestions!
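A minimal sketch of such a check, assuming simple dotted-integer version strings (a real implementation would likely use `packaging.version` to handle pre-releases, and would fetch the latest version from PyPI):

```python
def parse_version(version: str) -> tuple:
    # "0.2.10" -> (0, 2, 10); tuples compare element-wise, so
    # 0.2.10 correctly sorts after 0.2.3 (string comparison would not).
    return tuple(int(part) for part in version.split("."))

def needs_upgrade(installed: str, latest: str) -> bool:
    return parse_version(installed) < parse_version(latest)

# e.g. warn the user once per API call
if needs_upgrade("0.2.3", "0.2.10"):
    print("Please upgrade explainaboard_client to the latest version.")
```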

For tabular regression/classification, use all features by default?

Currently, classification and regression over tabular data (extracted features) are supported through the tabular-regression and tabular-classification tasks. However, the processors for these tasks use basically no input features for analysis by default.

Because of this, any features that you want to analyze need to be declared as custom features in a JSON file.

It'd be nice to make this process as easy as possible. Here is an example of a front-end interface we could aim for.

The only additional thing that would need to be implemented would be the explainaboard_client.wrap_tabular_dataset function.

# Import libraries and classes required for this example:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler 
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report, confusion_matrix
import pandas as pd 
import explainaboard_client

# Import dataset:
url = "iris.csv"

# Assign column names to dataset:
names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'Class']

# Convert dataset to a pandas dataframe:
dataset = pd.read_csv(url, names=names) 

# Use head() function to return the first 5 rows: 
dataset.head() 
# Assign values to the X and y variables:
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 4].values 

# Split dataset into random train and test subsets:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20) 

# Standardize features by removing mean and scaling to unit variance:
scaler = StandardScaler()
scaler.fit(X_train)

X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test) 

# Use the KNN classifier to fit data:
classifier = KNeighborsClassifier(n_neighbors=5)
classifier.fit(X_train, y_train) 

# Predict y data with classifier: 
y_predict = classifier.predict(X_test)

wrapped_test = explainaboard_client.wrap_tabular_dataset(
    X_test,
    y_test,
    column_names=names[:-1],
    columns_to_analyze=['sepal-length', 'sepal-width', 'petal-length', 'petal-width'],
)

# Do the evaluation (`client` is assumed to be an authenticated
# ExplainaboardClient instance; note we submit the predictions, not the gold labels)
evaluation_result = client.evaluate_system(
    task='text-classification',
    system_name='text-classification-test',
    system_output=y_predict,
    custom_dataset=wrapped_test,
    split='test',
    source_language='en',
)

# Print the results
print(f'Successfully submitted system!\n'
      f'Name: {evaluation_result["system_name"]}\n'
      f'ID: {evaluation_result["system_id"]}')
results = evaluation_result['results']['example'].items()
for metric_name, value in results:
    print(f'{metric_name}: {value:.4f}')

Rethink the design of explainaboard_web, explainaboard_api_client, explainaboard_client

Currently, we have

(1) explainaboard_web -> explainaboard_api_client -> explainaboard_client

@neubig I'm curious about what the limitations are if we have

(2) explainaboard_web -> explainaboard_client

(considering that we already have the different types of APIs from explainaboard_web)

To achieve (2), part of the implementation of evaluate_system() could be moved into explainaboard_web, which could be further utilized by developers who work in other languages (e.g., JS).

Bug?: No module named explainaboard_client.cli.evaluate_system

When I ran:

python -m explainaboard_client.cli.evaluate_system \
  --email $EB_EMAIL --api_key $EB_API_KEY \
  --task [TASK_ID] \
  --system_name [MODEL_NAME] \
  --system_output [SYSTEM_OUTPUT] --output_file_type [FILE_TYPE] \
  --dataset [DATASET] --sub_dataset [SUB_DATASET] --split [SPLIT] \
  --source_language [SOURCE] --target_language [TARGET] \
  [--public]

Got:

No module named explainaboard_client.cli.evaluate_system

Maybe a version upgrade is needed?

Bug?: explainaboard_api_client.exceptions.ApiTypeError

When evaluating named datasets from DataLab,

python -m explainaboard_client.cli.upload_system \
  --email XXX --api_key YYY \
  --task named-entity-recognition \
  --system_name test_ner \
  --system_output conll2003-elmo-output.conll --output_file_type conll \
  --dataset conll2003 --sub_dataset ner --split test \
  --source_language en --target_language en \
  --public

Errors:

File "/usr1/data/pliu3/neulab/ExplainaBoard-Walkthrough/roles/instructors/fudan_nlp/venv/lib/python3.9/site-packages/explainaboard_api_client/model_utils.py", line 1582, in validate_and_convert_types
    raise get_type_error(input_value, path_to_item, valid_classes,
explainaboard_api_client.exceptions.ApiTypeError: Invalid type for variable 'dataset_metadata_id'. Required value type is str and passed type was tuple at ['dataset_metadata_id']
