
Simplest and fastest image and text annotation tool.

License: Apache License 2.0

Python 7.20% Jupyter Notebook 92.80%

image-annotation-tool text-annotation-tool document-annotation-tool annotation-tool-online annotation-tool-offline annotation labeling labeling-tool image-labeling text-labeling

kili-python-sdk's Introduction

Kili Python SDK

SDK Reference: https://python-sdk-docs.kili-technology.com/

Kili Documentation: https://docs.kili-technology.com/docs

App: https://cloud.kili-technology.com/label/

Website: https://kili-technology.com/

What is Kili?

Kili is a platform that empowers a data-centric approach to Machine Learning through quality training data creation. It provides collaborative data annotation tools and APIs that enable quick iterations between reliable dataset building and model training. More info here.

Annotation tools examples

Named Entities Extraction and Relation | PDF classification and bounding-box | Object detection (bounding-box)

and many more.

What is Kili Python SDK?

Kili Python SDK is the Python client for the Kili platform. It lets you query and manipulate the main entities available in Kili, such as projects, assets, labels, and API keys.

It comes with several tutorials that demonstrate how to use it in the most frequent use cases.

Requirements

Python >= 3.8
Create and copy a Kili API key
Add the KILI_API_KEY variable in your bash environment (or in the settings of your favorite IDE) by pasting the API key value you copied above:

export KILI_API_KEY='<your api key value here>'

Installation

Install the Kili client with pip:

pip install kili

If you want to contribute, here are the installation steps.

Usage

Instantiate the Kili client:

from kili.client import Kili
kili = Kili()
# You can now use the Kili client!

Note that you can also pass the API key as an argument of the Kili initialization:

kili = Kili(api_key='<your api key value here>')
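If you would rather fail early with a clear message when the variable is missing, a small check can run before the client is created. This is a minimal sketch: `read_api_key` is a hypothetical helper and not part of the SDK.

```python
import os


def read_api_key(env=os.environ):
    """Return the API key stored in KILI_API_KEY, or raise a clear error.

    `read_api_key` is a hypothetical helper, not part of the Kili SDK.
    """
    key = env.get("KILI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "KILI_API_KEY is not set; export it before creating the client."
        )
    return key
```

The returned value can then be passed explicitly, for example `kili = Kili(api_key=read_api_key())`.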

For more details, read the SDK reference or the Kili documentation.

Tutorials

Check out our tutorials! They will guide you through the main features of the Kili client.

You can find several other recipes in this folder.

Examples

Here is a sample of the operations you can do with the Kili client:

Creating an annotation project

json_interface = {
    "jobs": {
        "CLASSIFICATION_JOB": {
            "mlTask": "CLASSIFICATION",
            "content": {
                "categories": {
                    "RED": {"name": "Red"},
                    "BLACK": {"name": "Black"},
                    "WHITE": {"name": "White"},
                    "GREY": {"name": "Grey"}},
                "input": "radio"
            },
            "instruction": "Color"
        }
    }
}
project_id = kili.create_project(
    title="Color classification",
    description="Project ",
    input_type="IMAGE",
    json_interface=json_interface
)["id"]

Importing data to annotate

assets = [
    {
        "externalId": "example 1",
        "content": "https://images.caradisiac.com/logos/3/8/6/7/253867/S0-tesla-enregistre-d-importantes-pertes-au-premier-trimestre-175948.jpg",
    },
    {
        "externalId": "example 2",
        "content": "https://img.sportauto.fr/news/2018/11/28/1533574/1920%7C1280%7Cc096243e5460db3e5e70c773.jpg",
    },
    {
        "externalId": "example 3",
        "content": "./recipes/img/man_on_a_bike.jpeg",
    },
]

external_id_array = [a.get("externalId") for a in assets]
content_array = [a.get("content") for a in assets]

kili.append_many_to_dataset(
    project_id=project_id,
    content_array=content_array,
    external_id_array=external_id_array,
)

See the detailed example in this tutorial.
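Since predictions are later matched to assets through their external IDs, it can help to check the asset list before uploading. The sketch below assumes you want every `externalId` to be present and unique; `split_assets` is a hypothetical helper, not an SDK function.

```python
from collections import Counter


def split_assets(assets):
    """Return (content_array, external_id_array) for a list of asset dicts.

    Hypothetical helper: rejects missing or duplicate externalId values
    before anything is sent to the platform.
    """
    external_ids = [a.get("externalId") for a in assets]
    if any(not eid for eid in external_ids):
        raise ValueError("every asset needs a non-empty externalId")
    duplicates = sorted(eid for eid, n in Counter(external_ids).items() if n > 1)
    if duplicates:
        raise ValueError(f"duplicate externalId values: {duplicates}")
    return [a.get("content") for a in assets], external_ids
```

The two returned lists can be passed as `content_array` and `external_id_array` to `append_many_to_dataset`.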

Importing predictions

prediction_examples = [
    {
        "external_id": "example 1",
        "json_response": {
            "CLASSIFICATION_JOB": {
                "categories": [{"name": "GREY", "confidence": 46}]
            }
        },
    },
    {
        "external_id": "example 2",
        "json_response": {
            "CLASSIFICATION_JOB": {
                "categories": [{"name": "WHITE", "confidence": 89}]
            }
        },
    }
]

kili.create_predictions(
    project_id=project_id,
    external_id_array=[p["external_id"] for p in prediction_examples],
    json_response_array=[p["json_response"] for p in prediction_examples],
    model_name="My SOTA model"
)

See detailed examples in this recipe.
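In practice the `json_response` entries usually come from a model's output. The following sketch builds one entry in the same shape as the example above; it assumes the model returns a probability in [0, 1] and that `confidence` is an integer percentage, as in the example. `classification_response` is a hypothetical helper.

```python
def classification_response(job_name, category, probability):
    """Build a json_response dict for a single-category classification.

    Hypothetical helper. `probability` is expected in [0, 1] and is
    converted to an integer percentage for the "confidence" field.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return {
        job_name: {
            "categories": [
                {"name": category, "confidence": round(probability * 100)}
            ]
        }
    }
```

The resulting dicts can be collected into the list passed as `json_response_array`.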

Exporting labels

kili.export_labels("your_project_id", "export.zip", "yolo_v4")

See a detailed example in this tutorial.

kili-python-sdk's Issues

Allow editing the bounding box

Allow editing a bounding box once it has been created, in order to fit it more precisely to the object and remove margins.

Retrieve in labels() the model_name associated with a create_predictions() prediction

I sent pre-annotations with the create_predictions() method, specifying the model_name_array parameter.
I retrieve the labels with the labels() method and the default fields ['author.email', 'author.id', 'id', 'jsonResponse', 'labelType', 'secondsToLabel']. How can I retrieve the model_name?

why harvest emails and spam!?

@edarchimbaud you can't use a private message mate? $10m funding so you can spam harvested emails? 🔥 👎

append_many_to_dataset: Mutation "data" failed with error: unknown field

I'm currently trying to add some data in an NLP project via the API with,

      playground.append_many_to_dataset(project_id=project_id,
                                        content_array=['test document 1'],
                                        external_id_array=['doc1'])

this fails with,

  File "kili\mutations\asset\__init__.py", line 61, in append_many_to_dataset
    projects = playground.projects(project_id)
  File "kili\queries\project\__init__.py", line 68, in projects
    return format_result('data', result)
  File "kili\helpers.py", line 17, in format_result
    raise GraphQLError(name, result['errors'])
kili.helpers.GraphQLError: Mutation "data" failed with error: "[{'message': 'unknown field', 'path': ['variable', 'where', 'updatedAtGte'], 'extensions': {'code': 'GRAPHQL_VALIDATION_FAILED'}}]"

It turns out that just running playground.projects(project_id) produces the same error. I noticed that providing a random project_id also produces the same error. So at this point I have two hypotheses:

either that error is just an indication that the project_id is incorrect. I'm using the ID from the URL e.g. /label/projects/<project_id>. If it is incorrect is there another way I can find it?
or there is an unrelated issue with this project somewhere.

Suggestions how to debug this would be very much appreciated. Thanks!
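To rule out a copy-paste mistake when taking the ID from the address bar, the ID can be extracted programmatically. This is only a sketch: it assumes the URL shape /label/projects/<project_id> mentioned above, `project_id_from_url` is a hypothetical helper, and it does not address the underlying GraphQL error.

```python
from urllib.parse import urlparse


def project_id_from_url(url):
    """Return the path segment that follows "projects" in a project URL.

    Hypothetical helper; assumes a path like /label/projects/<project_id>.
    """
    parts = [p for p in urlparse(url).path.split("/") if p]
    try:
        return parts[parts.index("projects") + 1]
    except (ValueError, IndexError):
        raise ValueError(f"no project ID found in {url!r}") from None
```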

Update the create_predictions() documentation/reference for the model_name_array and model_name parameters

When create_predictions() is used with the parameter, a warning reports that model_name_array is a deprecated parameter and that model_name should be used instead.
The model_name parameter does not appear in the docs.
The model_name_array parameter is not flagged as deprecated in the docs.

How to control the annotation queue?

Hello,

I would like to control precisely the order in which assets are annotated.

To do this I used the update_properties_in_asset() function, setting both the assignee to_be_labeled_by_array and the priority priorities, which I set to 1 (I understood that it is 0 by default).

However, when I go to the annotation Studio, pick a verbatim in the "Explore" table and annotate it, the next document is one with a priority of 0 that is assigned to nobody!

I choose a document from the Explore interface:

After annotating it and clicking Submit, here is the document I am asked to annotate:

I go back to the Explore view to check the priority and the person this document is assigned to:

The problem does not come from the status since, after pre-annotation, I made sure to set all documents back to TODO status.

What criteria are used to define the next verbatim to annotate? How can this criterion be changed?

Warning: labelOf { externalId } must be an instance of <enum 'Label'>

I'm trying to get the externalId of assets associated to a label when calling https://cloud.kili-technology.com/docs/python-graphql-api/python-api/#labels

Initially I tried,

>>> labels = playground.labels(project_id=project_id,  fields=['id', 'labelOf'])

which produces,

  File "lib\site-packages\kili\queries\label\__init__.py", line 126, in labels
    return format_result('data', result)
  File "lib\site-packages\kili\helpers.py", line 17, in format_result
    raise GraphQLError(name, result['errors'])
kili.helpers.GraphQLError: Mutation "data" failed with error: "[{'message': 'Field "labelOf" of type "Asset" must have a selection of subfields. Did you mean "labelOf { ... }"?', 'locations': [{'line': 4, 'column': 47}], 'extensions': {'code': 'GRAPHQL_VALIDATION_FAILED'}}]"

So indeed if I change as suggested,

>>> labels = playground.labels(project_id=project_id,  fields=['id', 'labelOf  { externalId }'])

this works as expected, however I now get the following message on stdout

labelOf { externalId } must be an instance of <enum 'Label'>

printed here. Because it's not a warning but a print, I can't silence it either.

The returned results are correct and include the "externalId", so it's not a big issue, but it would be nice to be able to silence this message.
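Until the message is removed or turned into a proper warning, one possible workaround is to capture standard output around the call with the standard library's `contextlib.redirect_stdout`. This is a sketch; `call_quietly` is a hypothetical wrapper, and it hides everything the call prints, not just this message.

```python
import contextlib
import io


def call_quietly(func, *args, **kwargs):
    """Call func while capturing anything it prints to stdout.

    Hypothetical wrapper. Returns (result, captured_text) so the
    captured output can still be inspected if needed.
    """
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        result = func(*args, **kwargs)
    return result, buffer.getvalue()
```

For the call above, this would be used as `labels, _ = call_quietly(playground.labels, project_id=project_id, fields=['id', 'labelOf { externalId }'])`.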

How to update the interface (ontology) with the SDK?

I create a JSON following the recommendations in the docs:
https://docs.kili-technology.com/docs/customizing-project-interface
https://docs.kili-technology.com/docs/customizing-the-interface-through-json-settings
But this documentation only refers to the Studio.
How can I publish the JSON with the SDK?

Extract masks instead of poly

Hello,

I was wondering if it is possible to directly extract the masks instead of the poly vertices for a semantic segmentation task (i.e. each pixel is one class).
Here is an example where poly is a problem, in the following image from left to right annotations I made in Kili:
Base Image | Plate annotation | Class 1 Annotations | Class 2 Annotations | Class 3 Annotations | Class 4 Annotations

Now if I extract the annotations under Kili format, the Plate Annotation will be:

Is there a way to solve this ?
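As a possible client-side workaround, exported polygons can be rasterized into a mask after export. The sketch below is not an SDK feature: it assumes each polygon is available as a list of (x, y) pairs normalized to [0, 1], and `polygon_to_mask` is a hypothetical helper that marks a pixel when its center lies inside the polygon (even-odd rule). Holes and overlapping classes are not handled.

```python
def polygon_to_mask(vertices, width, height):
    """Rasterize a polygon into a height x width mask of 0/1 values.

    Hypothetical helper. `vertices` is a list of (x, y) pairs normalized
    to [0, 1] (an assumption about the export format).
    """
    points = [(x * width, y * height) for x, y in vertices]
    mask = [[0] * width for _ in range(height)]
    for row in range(height):
        py = row + 0.5  # y-coordinate of the pixel centers in this row
        # x-coordinates where the horizontal line y = py crosses an edge
        crossings = []
        for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
            if (y1 <= py) != (y2 <= py):
                crossings.append(x1 + (py - y1) * (x2 - x1) / (y2 - y1))
        crossings.sort()
        # Pixels between each pair of crossings are inside the polygon
        for left, right in zip(crossings[0::2], crossings[1::2]):
            for col in range(width):
                if left <= col + 0.5 < right:
                    mask[row][col] = 1
    return mask
```

For a per-pixel class map, one option is to rasterize the polygons of each class in a chosen order and write the class index wherever the mask is 1, so that later classes overwrite earlier ones.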

Dependency Dashboard

This issue lists Renovate updates and detected dependencies. Read the Dependency Dashboard docs to learn more.

Open

These updates have all been created already. Click a checkbox below to force a retry/rebase of any.

chore(deps): update dependency dev/pyright to v1.1.361
chore(deps): update dependency dev/pylint to v3.1.0
chore(deps): update lycheeverse/lychee-action action to v1.10.0
chore(deps): update pre-commit hook astral-sh/ruff-pre-commit to v0.4.3
chore(deps): update pre-commit hook pre-commit/pre-commit-hooks to v4.6.0
chore(deps): update slackapi/slack-github-action action to v1.26.0
Click on this checkbox to rebase all open PRs at once

Detected dependencies

github-actions

.github/workflows/bump_commit_release_branch.yml

actions/checkout v4

actions/setup-python v5

.github/workflows/ci.yml

actions/checkout v4

actions/setup-python v5

pre-commit/action v3.0.1

actions/checkout v4

actions/setup-python v5

actions/checkout v4

actions/setup-python v5

actions/checkout v4

actions/setup-python v5

actions/checkout v4

lycheeverse/lychee-action v1.9.3

actions/checkout v4

actions/setup-python v5

actions/checkout v4

actions/setup-python v5

actions/checkout v4

actions/setup-python v5

.github/workflows/create_draft_release.yml

actions/checkout v4

actions/setup-python v5

slackapi/slack-github-action v1.25.0

.github/workflows/datadog.yml

actions/setup-python v5

.github/workflows/deploy_doc.yml

actions/checkout v4

actions/setup-python v5

slackapi/slack-github-action v1.25.0

.github/workflows/e2e_tests.yml

actions/checkout v4

actions/setup-python v5

slackapi/slack-github-action v1.25.0

.github/workflows/pr.yml

amannn/action-semantic-pull-request v5

.github/workflows/publish.yml

actions/checkout v4

actions/setup-python v5

slackapi/slack-github-action v1.25.0

pep621

pyproject.toml

pandas >= 1.0.0, < 3.0.0

click >= 8.0.0, < 9.0.0

requests >= 2.0.0, < 3.0.0

tabulate >= 0.9.0, < 0.10.0

tenacity >= 8.0.0, < 9.0.0

tqdm >= 4.0.0, < 5.0.0

typeguard >= 4, < 5

typing-extensions >= 4.1.0, < 5.0.0

pyparsing >= 3.0.0, < 4.0.0

websocket-client >= 1.0.0, < 2.0.0

pyyaml >= 6.0, < 7.0

Pillow >=9.0.0, <11.0.0

cuid >= 0.4, < 0.5

urllib3 >= 1.26, < 3

ffmpeg-python >= 0.2.0, < 0.3.0

gql >= 3.5.0b5, < 4.0.0

filelock >= 3.0.0, < 4.0.0

pip-system-certs >= 4.0.0, < 5.0.0

pyrate-limiter >= 3, < 4

dev/pre-commit >= 3.3.0, < 4.0.0

dev/pylint ==3.0.3

dev/pyright ==1.1.347

dev/vulture ==2.11

dev/dead ==1.5.2

dev/opencv-python >= 4.0.0, < 5.0.0

dev/azure-storage-blob >= 12.0.0, < 13.0.0

image-utils/opencv-python >= 4.0.0, < 5.0.0

azure/azure-storage-blob >= 12.0.0, < 13.0.0

pre-commit

.pre-commit-config.yaml

pre-commit/pre-commit-hooks v4.5.0

PyCQA/docformatter v1.7.5

asottile/pyupgrade v3.15.2

srstevenson/nb-clean 3.2.0

astral-sh/ruff-pre-commit v0.1.15

Check this box to trigger a request for Renovate to run again on this repository

export_labels() on Databricks returns "No such file or directory"

I run the following code on Azure Databricks

from kili.client import Kili

kili = Kili(api_key=api_key['kili-secret'])

path_to_labels = "/databricks/driver/export.zip"

kili.export_labels(project_id=project_id,filename = path_to_labels,fmt="kili",layout = "merged",single_file=True,asset_ids=list("cld3aqa68078x0jw2ds463b3k"),with_assets=False)

This returns the following error:

FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmpx9xpowaa/clbgd78ic00sw0k3a0y0dcbgv/data.json'

With the traceback:

FileNotFoundError Traceback (most recent call last)
in <cell line: 2>()
1 path_to_labels = "/databricks/driver/export.zip"
----> 2 kili.export_labels(project_id=project_id,filename = path_to_labels,fmt="kili",layout = "merged",single_file=True,asset_ids=list("cld3aqa68078x0jw2ds463b3k"),with_assets=False)

/databricks/python/lib/python3.9/site-packages/kili/queries/label/__init__.py in export_labels(self, project_id, filename, fmt, asset_ids, layout, single_file, disable_tqdm, with_assets, external_ids)
310
311 try:
--> 312 services.export_labels(
313 self,
314 asset_ids=asset_ids,

/databricks/python/lib/python3.9/site-packages/kili/services/export/__init__.py in export_labels(kili, asset_ids, project_id, export_type, label_format, split_option, single_file, output_file, disable_tqdm, log_level, with_assets)
82 ) # ensures full mapping
83 exporter_class = format_exporter_selector_mapping[label_format]
---> 84 exporter_class(
85 export_params, kili, logger, disable_tqdm, content_repository
86 ).export_project()

/databricks/python/lib/python3.9/site-packages/kili/services/export/format/base.py in export_project(self)
157 local_media_dir=str(self.images_folder),
158 )
--> 159 self.process_and_save(assets, self.output_file)
160
161 @property

/databricks/python/lib/python3.9/site-packages/kili/services/export/format/kili/__init__.py in process_and_save(self, assets, output_filename)
55 """
56 clean_assets = self.process_assets(assets, self.label_format)
---> 57 return self._save_assets_export(
58 clean_assets,
59 output_filename,

/databricks/python/lib/python3.9/site-packages/kili/services/export/format/kili/__init__.py in _save_assets_export(self, assets, output_filename)
34 if self.single_file:
35 project_json = json.dumps(assets, sort_keys=True, indent=4)
---> 36 with (self.base_folder / "data.json").open("wb") as output_file:
37 output_file.write(project_json.encode("utf-8"))
38 else:

/usr/lib/python3.9/pathlib.py in open(self, mode, buffering, encoding, errors, newline)
1240 the built-in open() function does.
1241 """
-> 1242 return io.open(self, mode, buffering, encoding, errors, newline,
1243 opener=self._opener)
1244

/usr/lib/python3.9/pathlib.py in _opener(self, name, flags, mode)
1108 def _opener(self, name, flags, mode=0o666):
1109 # A stub for the opener argument to built-in open()
-> 1110 return self._accessor.open(self, flags, mode)
1111
1112 def _raw_open(self, flags, mode=0o777):

FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmpx9xpowaa/clbgd78ic00sw0k3a0y0dcbgv/data.json'
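One detail in the call above is worth checking, although it may not be the cause of the FileNotFoundError: `list()` applied to a string returns one element per character, so `asset_ids=list("cld3aqa68078x0jw2ds463b3k")` passes a list of single characters rather than a list containing one asset ID. A short illustration in plain Python:

```python
asset_id = "cld3aqa68078x0jw2ds463b3k"

# list() iterates over the string, producing one element per character
as_characters = list(asset_id)

# A list literal keeps the whole ID as a single element
as_single_id = [asset_id]

print(as_characters[:3])
print(as_single_id)
```

Passing `asset_ids=[asset_id]` is the form that matches a single asset.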

Please stop spamming GitHub users emails

This is spam and will be reported as such.

More detailed changelog for breaking changes

It would be helpful to have a more detailed changelog (ideally as a file in this repository) that would in particular mention breaking changes.

For instance, with 2.0.4 I have a script that worked fine to create a new project with,

    json_interface = {
        "filetype": "TEXT",
        "jobs": {  .... }
    }

(adapted from a documentation example).
In 2.1.1 this same script produces the following error,

kili.helpers.GraphQLError: Mutation "data" failed with error: "[{'message': '[unsupportedType] Unsupported project input type "NA". Input type should be one of: IMAGE, PDF, TEXT, VIDEO', 'path': ['data']}]"

presumably because the input type now needs to be specified via the input_type parameter of update_properties_in_project (related to #160)

This is a breaking change, and it would help if it were explicitly mentioned in the release notes, so one wouldn't have to search through the source code or the commit history to find out what needs to be changed.

Make dataset splits possible during export

I am working on a project using Kili and exporting my datasets with different types of annotations.

I think it would be useful to add a new feature that allows me to split the datasets into train/validation/test sets directly during the export process.
Currently, I have to create another script after the Kili export, and I believe that every other data scientist has to do the same.

Is this feature being developed internally, and do you think it would be a valuable addition?
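For reference, the kind of post-export script described above can be quite small. This is a sketch of one possible approach, not an SDK feature: `split_dataset` is a hypothetical helper that shuffles asset identifiers with a fixed seed and cuts them into train/validation/test lists.

```python
import random


def split_dataset(asset_ids, ratios=(0.8, 0.1, 0.1), seed=0):
    """Shuffle asset IDs deterministically and split them into three lists.

    Hypothetical helper. `ratios` gives the train/val/test fractions;
    any rounding remainder goes to the test set.
    """
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    ids = sorted(asset_ids)  # sort first so the result depends only on the seed
    random.Random(seed).shuffle(ids)
    n_train = int(len(ids) * ratios[0])
    n_val = int(len(ids) * ratios[1])
    return {
        "train": ids[:n_train],
        "val": ids[n_train:n_train + n_val],
        "test": ids[n_train + n_val:],
    }
```

Splitting on asset identifiers rather than on individual labels keeps all annotations of one asset in the same subset.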

Spammed by Kili github actions

Hello everyone,

Because of one of my recent PRs, I'm getting spammed by failing integration tests in CI (see image below): I get about 5 emails per day.

It seems like there is a leak in one of the pipelines

Delete a label with the SDK

I am migrating a project started with a custom annotation tool to Kili.
To do this I use the Python SDK.
On the one hand, I need to push existing pre-annotations; for this I found the create_predictions() method.
On the other hand, I need to push the existing annotations; for this I use the append_labels() method.

For some verbatims, I added both a pre-annotation and the existing annotation.
I would like to delete the pre-annotation. How can I do that?

I would also like to understand how append_labels() works: is it really an append, that is, new labels added on top of the existing labels, or an overwrite, that is, a replacement of the existing labels?

Fix the link to the "Basic Project Setup" notebook

In the docs located here: https://python-sdk-docs.kili-technology.com/2.127/sdk/tutorials/basic_project_setup/
the link returns a 404 code: https://python-sdk-docs.kili-technology.com/2.127/sdk/tutorials/basic_project_setup/

Max. length of text documents?

Hi,
I want to use your platform to annotate text documents. However, after uploading the txt files, only part of each document is displayed. Each document seems to be cut off after about 200 lines (more or less, depending on the file). Is there a specific limit for txt files?

Thanks,
Max