
kili-technology / kili-python-sdk


Simplest and fastest image and text annotation tool.

Home Page: https://kili-technology.com

License: Apache License 2.0

Languages: Python 7.20%, Jupyter Notebook 92.80%
Topics: image-annotation-tool, text-annotation-tool, document-annotation-tool, annotation-tool-online, annotation-tool-offline, annotation, labeling, labeling-tool, image-labeling, text-labeling

kili-python-sdk's Issues

Make dataset splits possible during export

I am working on a project using Kili and exporting my datasets with different types of annotations.

I think it would be useful to add a new feature that allows me to split the datasets into train/validation/test sets directly during the export process.
Currently, I have to write a separate script after the Kili export, and I expect most data scientists have to do the same.

Is this feature being developed internally, and do you think it would be a valuable addition?
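Until such an option exists, the post-export step can stay small. A minimal sketch, assuming the export has already been loaded as a Python list of assets (for example, from a single-file Kili-format export); `split_assets` and its default 80/10/10 ratios are hypothetical choices, not a Kili feature:

```python
import random
from typing import Dict, List, Sequence, TypeVar

T = TypeVar("T")


def split_assets(
    assets: Sequence[T],
    ratios: Sequence[float] = (0.8, 0.1, 0.1),
    names: Sequence[str] = ("train", "validation", "test"),
    seed: int = 0,
) -> Dict[str, List[T]]:
    """Shuffle assets deterministically and cut them into named subsets."""
    if len(ratios) != len(names):
        raise ValueError("ratios and names must have the same length")
    if any(r < 0 for r in ratios) or abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must be non-negative and sum to 1")

    shuffled = list(assets)
    random.Random(seed).shuffle(shuffled)  # same seed -> same order

    n = len(shuffled)
    splits: Dict[str, List[T]] = {}
    start, cumulative = 0, 0.0
    for i, (name, ratio) in enumerate(zip(names, ratios)):
        cumulative += ratio
        # Cut at the rounded cumulative fraction; the last subset always ends
        # at n, so every asset lands in exactly one subset.
        end = n if i == len(names) - 1 else min(n, round(cumulative * n))
        splits[name] = shuffled[start:end]
        start = end
    return splits


# Example: ten stand-in "assets" (in practice, the asset dicts from an export).
splits = split_assets(list(range(10)), seed=42)
print({name: len(subset) for name, subset in splits.items()})
```

Because the shuffle uses a fixed seed, re-running the split gives the same subsets as long as the asset order in the export does not change.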

export_labels() on Databricks returns "No such file or directory"

I'm running the following code on Azure Databricks:

from kili.client import Kili

kili = Kili(api_key=api_key['kili-secret'])

path_to_labels = "/databricks/driver/export.zip"

kili.export_labels(project_id=project_id,filename = path_to_labels,fmt="kili",layout = "merged",single_file=True,asset_ids=list("cld3aqa68078x0jw2ds463b3k"),with_assets=False)

It returns the following error:

FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmpx9xpowaa/clbgd78ic00sw0k3a0y0dcbgv/data.json'

With the traceback:

FileNotFoundError Traceback (most recent call last)
in <cell line: 2>()
1 path_to_labels = "/databricks/driver/export.zip"
----> 2 kili.export_labels(project_id=project_id,filename = path_to_labels,fmt="kili",layout = "merged",single_file=True,asset_ids=list("cld3aqa68078x0jw2ds463b3k"),with_assets=False)

/databricks/python/lib/python3.9/site-packages/kili/queries/label/__init__.py in export_labels(self, project_id, filename, fmt, asset_ids, layout, single_file, disable_tqdm, with_assets, external_ids)
310
311 try:
--> 312 services.export_labels(
313 self,
314 asset_ids=asset_ids,

/databricks/python/lib/python3.9/site-packages/kili/services/export/__init__.py in export_labels(kili, asset_ids, project_id, export_type, label_format, split_option, single_file, output_file, disable_tqdm, log_level, with_assets)
82 ) # ensures full mapping
83 exporter_class = format_exporter_selector_mapping[label_format]
---> 84 exporter_class(
85 export_params, kili, logger, disable_tqdm, content_repository
86 ).export_project()

/databricks/python/lib/python3.9/site-packages/kili/services/export/format/base.py in export_project(self)
157 local_media_dir=str(self.images_folder),
158 )
--> 159 self.process_and_save(assets, self.output_file)
160
161 @property

/databricks/python/lib/python3.9/site-packages/kili/services/export/format/kili/__init__.py in process_and_save(self, assets, output_filename)
55 """
56 clean_assets = self.process_assets(assets, self.label_format)
---> 57 return self._save_assets_export(
58 clean_assets,
59 output_filename,

/databricks/python/lib/python3.9/site-packages/kili/services/export/format/kili/__init__.py in _save_assets_export(self, assets, output_filename)
34 if self.single_file:
35 project_json = json.dumps(assets, sort_keys=True, indent=4)
---> 36 with (self.base_folder / "data.json").open("wb") as output_file:
37 output_file.write(project_json.encode("utf-8"))
38 else:

/usr/lib/python3.9/pathlib.py in open(self, mode, buffering, encoding, errors, newline)
1240 the built-in open() function does.
1241 """
-> 1242 return io.open(self, mode, buffering, encoding, errors, newline,
1243 opener=self._opener)
1244

/usr/lib/python3.9/pathlib.py in _opener(self, name, flags, mode)
1108 def _opener(self, name, flags, mode=0o666):
1109 # A stub for the opener argument to built-in open()
-> 1110 return self._accessor.open(self, flags, mode)
1111
1112 def _raw_open(self, flags, mode=0o777):

FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmpx9xpowaa/clbgd78ic00sw0k3a0y0dcbgv/data.json'
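One detail in the call above may be worth checking: `asset_ids=list("cld3aqa68078x0jw2ds463b3k")` does not produce a one-element list of IDs. Whether this is what triggers the FileNotFoundError is not confirmed here, but the argument is very likely not what was intended:

```python
# list() iterates over its argument, and iterating a str yields characters,
# so list("<asset id>") is a list of one-character strings, not a list of IDs.
asset_id = "cld3aqa68078x0jw2ds463b3k"

as_characters = list(asset_id)  # ['c', 'l', 'd', '3', ...]
as_single_id = [asset_id]       # ['cld3aqa68078x0jw2ds463b3k']

print(len(as_characters), len(as_single_id))  # prints "25 1"
```

Passing `asset_ids=["cld3aqa68078x0jw2ds463b3k"]` would request that single asset, assuming the ID exists in the project.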

Spammed by Kili GitHub Actions

Hello everyone,

Because of one of my recent PRs, I'm getting spammed by failing integration tests in CI (see image below): about 5 emails per day.

It seems there is a leak in one of the pipelines.

[image]

Deleting a label with the SDK

I am migrating a project that was started in a custom annotation tool to Kili.
To do this, I'm using the Python SDK.
On the one hand, I need to push pre-annotations that were already produced; for that I found the create_predictions() method.
On the other hand, I need to push the existing annotations; for that I use the append_labels() method.

For some text items, I have added both a pre-annotation and the existing annotation.
I would like to delete the pre-annotation. How can I do this?

I would also like to understand how append_labels() works: is it really an append, i.e. new labels are added on top of the existing ones? Or is it an overwrite, i.e. the existing labels are replaced?
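As a sketch of one possible approach to the deletion question: assuming pre-annotations pushed with create_predictions() are stored as labels whose `labelType` is `"PREDICTION"`, and that the labels were fetched with `fields=["id", "labelType"]`, the IDs to remove can be selected in plain Python. The helper `prediction_label_ids` is hypothetical, and the final deletion call `kili.delete_labels(ids=...)` is an assumption about the SDK, left commented out; check your SDK version's reference before relying on it.

```python
from typing import Dict, List


def prediction_label_ids(labels: List[Dict[str, str]]) -> List[str]:
    """Return the IDs of labels whose type is "PREDICTION".

    Assumption: each dict comes from a labels query with fields=["id", "labelType"],
    and create_predictions() labels are typed "PREDICTION".
    """
    return [label["id"] for label in labels if label.get("labelType") == "PREDICTION"]


# Hypothetical sample of what a labels query might return for one asset.
labels = [
    {"id": "label-1", "labelType": "PREDICTION"},
    {"id": "label-2", "labelType": "DEFAULT"},
]
to_delete = prediction_label_ids(labels)

# Assumed deletion call (not executed here):
# kili.delete_labels(ids=to_delete)
```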

Extract masks instead of polygons

Hello,

I was wondering whether it is possible to directly extract masks instead of polygon vertices for a semantic segmentation task (i.e. each pixel belongs to one class).
Here is an example where polygons are a problem. The following image shows, from left to right, the annotations I made in Kili:
Base Image | Plate annotation | Class 1 Annotations | Class 2 Annotations | Class 3 Annotations | Class 4 Annotations
[image: CleanShot 2023-10-20 at 08 51 06@2x]

Now, if I export the annotations in Kili format, the plate annotation looks like this:
[image: CleanShot 2023-10-20 at 08 47 17@2x]

Is there a way to solve this?
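If a per-pixel mask is what the training pipeline needs, exported polygons can be rasterized locally. A minimal pure-Python sketch, assuming each polygon is available as a list of `{"x": ..., "y": ...}` vertices normalized to [0, 1] (roughly the `normalizedVertices` shape of Kili-format polygon annotations; verify against your own export). The helpers `rasterize` and `_inside` are hypothetical, and this does not recover any shape information the polygon export itself loses.

```python
from typing import Dict, List, Sequence, Tuple

Vertex = Dict[str, float]  # {"x": ..., "y": ...}, normalized to [0, 1]


def _inside(px: float, py: float, pts: Sequence[Tuple[float, float]]) -> bool:
    """Even-odd (ray casting) test: is point (px, py) inside the polygon?"""
    inside = False
    j = len(pts) - 1
    for i in range(len(pts)):
        xi, yi = pts[i]
        xj, yj = pts[j]
        if (yi > py) != (yj > py):  # edge straddles the horizontal ray
            x_cross = xi + (py - yi) * (xj - xi) / (yj - yi)
            if px < x_cross:
                inside = not inside
        j = i
    return inside


def rasterize(
    polygons: Sequence[Tuple[int, List[Vertex]]], width: int, height: int
) -> List[List[int]]:
    """Paint (class_id, normalized_vertices) polygons into a class-index mask.

    0 is background; polygons are painted in order, so later ones overwrite earlier ones.
    """
    mask = [[0] * width for _ in range(height)]
    for class_id, vertices in polygons:
        pts = [(v["x"] * width, v["y"] * height) for v in vertices]
        for row in range(height):
            for col in range(width):
                # Sample each pixel at its center.
                if _inside(col + 0.5, row + 0.5, pts):
                    mask[row][col] = class_id
    return mask


square = [{"x": 0.25, "y": 0.25}, {"x": 0.75, "y": 0.25},
          {"x": 0.75, "y": 0.75}, {"x": 0.25, "y": 0.75}]
mask = rasterize([(1, square)], width=4, height=4)
```

The per-pixel loop is only for illustration; for real image sizes, Pillow's `ImageDraw.polygon` does the same fill far faster.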

Dependency Dashboard

This issue lists Renovate updates and detected dependencies. Read the Dependency Dashboard docs to learn more.

Open

These updates have all been created already. Click a checkbox below to force a retry/rebase of any.

Detected dependencies

github-actions
.github/workflows/bump_commit_release_branch.yml
  • actions/checkout v4
  • actions/setup-python v5
.github/workflows/ci.yml
  • actions/checkout v4
  • actions/setup-python v5
  • pre-commit/action v3.0.1
  • actions/checkout v4
  • actions/setup-python v5
  • actions/checkout v4
  • actions/setup-python v5
  • actions/checkout v4
  • actions/setup-python v5
  • actions/checkout v4
  • lycheeverse/lychee-action v1.9.3
  • actions/checkout v4
  • actions/setup-python v5
  • actions/checkout v4
  • actions/setup-python v5
  • actions/checkout v4
  • actions/setup-python v5
.github/workflows/create_draft_release.yml
  • actions/checkout v4
  • actions/setup-python v5
  • slackapi/slack-github-action v1.25.0
.github/workflows/datadog.yml
  • actions/setup-python v5
.github/workflows/deploy_doc.yml
  • actions/checkout v4
  • actions/setup-python v5
  • slackapi/slack-github-action v1.25.0
.github/workflows/e2e_tests.yml
  • actions/checkout v4
  • actions/setup-python v5
  • slackapi/slack-github-action v1.25.0
.github/workflows/pr.yml
  • amannn/action-semantic-pull-request v5
.github/workflows/publish.yml
  • actions/checkout v4
  • actions/setup-python v5
  • slackapi/slack-github-action v1.25.0
pep621
pyproject.toml
  • pandas >= 1.0.0, < 3.0.0
  • click >= 8.0.0, < 9.0.0
  • requests >= 2.0.0, < 3.0.0
  • tabulate >= 0.9.0, < 0.10.0
  • tenacity >= 8.0.0, < 9.0.0
  • tqdm >= 4.0.0, < 5.0.0
  • typeguard >= 4, < 5
  • typing-extensions >= 4.1.0, < 5.0.0
  • pyparsing >= 3.0.0, < 4.0.0
  • websocket-client >= 1.0.0, < 2.0.0
  • pyyaml >= 6.0, < 7.0
  • Pillow >=9.0.0, <11.0.0
  • cuid >= 0.4, < 0.5
  • urllib3 >= 1.26, < 3
  • ffmpeg-python >= 0.2.0, < 0.3.0
  • gql >= 3.5.0b5, < 4.0.0
  • filelock >= 3.0.0, < 4.0.0
  • pip-system-certs >= 4.0.0, < 5.0.0
  • pyrate-limiter >= 3, < 4
  • dev/pre-commit >= 3.3.0, < 4.0.0
  • dev/pylint ==3.0.3
  • dev/pyright ==1.1.347
  • dev/vulture ==2.11
  • dev/dead ==1.5.2
  • dev/opencv-python >= 4.0.0, < 5.0.0
  • dev/azure-storage-blob >= 12.0.0, < 13.0.0
  • image-utils/opencv-python >= 4.0.0, < 5.0.0
  • azure/azure-storage-blob >= 12.0.0, < 13.0.0
pre-commit
.pre-commit-config.yaml
  • pre-commit/pre-commit-hooks v4.5.0
  • PyCQA/docformatter v1.7.5
  • asottile/pyupgrade v3.15.2
  • srstevenson/nb-clean 3.2.0
  • astral-sh/ruff-pre-commit v0.1.15

  • Check this box to trigger a request for Renovate to run again on this repository

How do I control the annotation queue?

Hello,

I would like to control precisely the order in which assets are served for annotation.

To do so, I used the update_properties_in_asset() function, setting both the assignee via to_be_labeled_by_array and the priority via priorities, which I set to 1 (I understand it is 0 by default).

However, when I go to the labeling Studio, pick a text item from the "Explore" table and annotate it, the next document is one with priority 0 that is not assigned to anyone!

I pick a document from the Explore interface:

[image]

After annotating it and clicking Submit, this is the document I am asked to annotate next:

[image]

I go back to the Explore view to check this document's priority and assignee:

[image]

The problem does not come from the status: after the pre-annotation, I made sure to set all documents back to TODO status.

What criteria determine the next text item to annotate? How can I change them?
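One observation on the setup described: every asset was given the same priority of 1, so priority alone cannot order those assets among themselves. A minimal sketch, assuming a higher priority is served earlier and that `to_be_labeled_by_array` takes one list of labeler emails per asset, builds keyword arguments that give each asset a distinct, descending priority. `build_queue_updates` is hypothetical, and whether the Studio's next-asset logic honors these fields after an asset is picked from Explore remains the open question above.

```python
from typing import Dict, List, Sequence


def build_queue_updates(ordered_asset_ids: Sequence[str], labeler_email: str) -> Dict[str, list]:
    """Build kwargs giving each asset a distinct priority, highest first.

    Assumption: a higher priority is served earlier, so the first asset in
    ordered_asset_ids gets the largest value.
    """
    n = len(ordered_asset_ids)
    return {
        "asset_ids": list(ordered_asset_ids),
        "priorities": [n - i for i in range(n)],
        # One list of assignees per asset (assumed shape of to_be_labeled_by_array).
        "to_be_labeled_by_array": [[labeler_email] for _ in range(n)],
    }


updates = build_queue_updates(["asset-a", "asset-b", "asset-c"], "labeler@example.com")

# Assumed call (not executed here):
# kili.update_properties_in_asset(**updates)
```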

Warning: labelOf { externalId } must be an instance of <enum 'Label'>

I'm trying to get the externalId of the assets associated with a label when calling https://cloud.kili-technology.com/docs/python-graphql-api/python-api/#labels

Initially, I tried:

>>> labels = playground.labels(project_id=project_id, fields=['id', 'labelOf'])

which produces:

  File "lib\site-packages\kili\queries\label\__init__.py", line 126, in labels
    return format_result('data', result)
  File "lib\site-packages\kili\helpers.py", line 17, in format_result
    raise GraphQLError(name, result['errors'])
kili.helpers.GraphQLError: Mutation "data" failed with error: "[{'message': 'Field "labelOf" of type "Asset" must have a selection of subfields. Did you mean "labelOf { ... }"?', 'locations': [{'line': 4, 'column': 47}], 'extensions': {'code': 'GRAPHQL_VALIDATION_FAILED'}}]"

So, if I change it as suggested:

>>> labels = playground.labels(project_id=project_id, fields=['id', 'labelOf { externalId }'])

this works as expected; however, I now get the following message on stdout:

labelOf { externalId } must be an instance of <enum 'Label'>

which is printed here. Because it is a print rather than a warning, I can't silence it either.

The returned results are correct and include the "externalId", so it's not a big issue, but it would be nice to be able to silence this message.
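Until the message becomes a real warning, the standard library's `contextlib.redirect_stdout` can capture it. A minimal sketch; `call_quietly` is a hypothetical helper and `noisy_query` is a hypothetical stand-in for the `playground.labels(...)` call above:

```python
import contextlib
import io
from typing import Any, Callable, Tuple


def call_quietly(func: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, str]:
    """Call func while capturing whatever it prints to sys.stdout.

    Returns (result, captured_text) so the suppressed output can still be logged.
    """
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        result = func(*args, **kwargs)
    return result, buffer.getvalue()


def noisy_query():
    # Hypothetical stand-in for playground.labels(...), which prints a message.
    print("labelOf { externalId } must be an instance of <enum 'Label'>")
    return [{"id": "label-1", "labelOf": {"externalId": "doc1"}}]


labels, captured = call_quietly(noisy_query)

# With the real client (assumed usage):
# labels, _ = call_quietly(playground.labels, project_id=project_id,
#                          fields=['id', 'labelOf { externalId }'])
```

`redirect_stdout` only intercepts writes that go through Python's `sys.stdout`, which covers a plain `print()` call like this one.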

append_many_to_dataset: Mutation "data" failed with error: unknown field

I'm currently trying to add some data to an NLP project via the API with:

      playground.append_many_to_dataset(project_id=project_id,
                                        content_array=['test document 1'],
                                        external_id_array=['doc1'])

This fails with:

  File "kili\mutations\asset\__init__.py", line 61, in append_many_to_dataset
    projects = playground.projects(project_id)
  File "kili\queries\project\__init__.py", line 68, in projects
    return format_result('data', result)
  File "kili\helpers.py", line 17, in format_result
    raise GraphQLError(name, result['errors'])
kili.helpers.GraphQLError: Mutation "data" failed with error: "[{'message': 'unknown field', 'path': ['variable', 'where', 'updatedAtGte'], 'extensions': {'code': 'GRAPHQL_VALIDATION_FAILED'}}]"

It turns out that just running playground.projects(project_id) produces the same error, and passing a random project_id produces it too. So at this point I have two hypotheses:

  • either the error simply indicates that the project_id is incorrect. I'm using the ID from the URL, e.g. /label/projects/<project_id>. If that is wrong, is there another way to find it?
  • or there is an unrelated issue with this project.

Suggestions how to debug this would be very much appreciated. Thanks!
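Reading the error itself: the path `['variable', 'where', 'updatedAtGte']` means the server rejected a filter field, `updatedAtGte`, that the client placed in the query's `where` variable. Since any project_id triggers it, the ID may not be the cause. Assuming a client/server version mismatch is possible, one cheap first check is to report the installed SDK version; `installed_version` is a hypothetical helper, and `"kili"` is assumed to be the SDK's PyPI distribution name:

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


# "kili" is assumed to be the distribution name of the Kili Python SDK.
print("kili SDK version:", installed_version("kili"))
```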

Maximum length of text documents?

Hi,
I want to use your platform to annotate text documents, but after uploading the .txt files, the documents are not displayed in full: each one seems to be cut off after roughly 200 lines (the exact point varies by file). Is there a specific limit for .txt files?

Thanks,
Max
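Whether a hard limit exists is not answered here. If one does, a possible workaround is to split long files into several assets before upload. A minimal sketch; `chunk_lines` is a hypothetical helper, and the 200-line value is only the cut-off observed above, not a documented Kili setting:

```python
from typing import List


def chunk_lines(text: str, max_lines: int) -> List[str]:
    """Split text into consecutive chunks of at most max_lines lines each.

    Line endings are kept, so joining the chunks gives back the original text.
    """
    if max_lines < 1:
        raise ValueError("max_lines must be at least 1")
    lines = text.splitlines(keepends=True)
    return ["".join(lines[i:i + max_lines]) for i in range(0, len(lines), max_lines)]


document = "".join(f"line {n}\n" for n in range(1, 451))
chunks = chunk_lines(document, max_lines=200)  # 200 is a hypothetical limit
```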

More detailed changelog for breaking changes

It would be helpful to have a more detailed changelog (ideally as a file in this repository) that would in particular mention breaking changes.

For instance, with 2.0.4 I had a script that worked fine for creating a new project with:

    json_interface = {
        "filetype": "TEXT",
        "jobs": {  .... }
    }

(adapted from a documentation example).
In 2.1.1, the same script produces the following error:

kili.helpers.GraphQLError: Mutation "data" failed with error: "[{'message': '[unsupportedType] Unsupported project input type "NA". Input type should be one of: IMAGE, PDF, TEXT, VIDEO', 'path': ['data']}]"

presumably because the input type now has to be specified via the input_type parameter of update_properties_in_project (related to #160).

This is a breaking change, and it would help if it were explicitly mentioned in the release notes, so that one wouldn't have to search through the source code or the commit history to find out what needs to change.
