
deepsearch-toolkit's People

Contributors

agiova, blankenberg, cau-git, ceberam, dolfim-ibm, exomishra, github-actions[bot], holymichael, imgbot[bot], imgbotapp, kdinkla, kmyusk, kwehden, peterstaar-ibm, santanatiago, vagenas


deepsearch-toolkit's Issues

Enable upgrading to poetry 1.2.*

The broken CI build documented in #38 was addressed with a temporary workaround of constraining CI to poetry<1.2.

Going forward, a proper solution should be provided that allows upgrading to poetry 1.2.*.

Login Configuration Python error

After running the following login configuration code:

import deepsearch as ds

auth = ds.DeepSearchKeyAuth(
    username="USER-EMAIL",
    api_key="API-KEY",
)

config = ds.DeepSearchConfig(
    host="https://deepsearch-experience.res.ibm.com",
    auth=auth,
)

client = ds.CpsApiClient(config)
api = ds.CpsApi(client)

I got this error:

Traceback (most recent call last):
  File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/urllib3/connection.py", line 174, in _new_conn
    conn = connection.create_connection(
  File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/urllib3/util/connection.py", line 72, in create_connection
    for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
  File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/socket.py", line 918, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno 8] nodename nor servname provided, or not known

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/urllib3/connectionpool.py", line 703, in urlopen
    httplib_response = self._make_request(
  File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/urllib3/connectionpool.py", line 386, in _make_request
    self._validate_conn(conn)
  File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/urllib3/connectionpool.py", line 1040, in _validate_conn
    conn.connect()
  File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/urllib3/connection.py", line 358, in connect
    self.sock = conn = self._new_conn()
  File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/urllib3/connection.py", line 186, in _new_conn
    raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPSConnection object at 0x7fd41bef2af0>: Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "toolkit.py", line 21, in <module>
    client = ds.CpsApiClient(config)
  File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/deepsearch/cps/client/api.py", line 30, in __init__
    bearer_token=self._authenticate_with_api_key(
  File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/deepsearch/cps/client/api.py", line 81, in _authenticate_with_api_key
    access_token = api.get_access_token(options={"admin": False}).access_token
  File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/deepsearch/cps/apis/user/api/users_api.py", line 167, in get_access_token
    return self.get_access_token_with_http_info(**kwargs) # noqa: E501
  File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/deepsearch/cps/apis/user/api/users_api.py", line 242, in get_access_token_with_http_info
    return self.api_client.call_api(
  File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/deepsearch/cps/apis/user/api_client.py", line 364, in call_api
    return self.__call_api(resource_path, method,
  File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/deepsearch/cps/apis/user/api_client.py", line 181, in __call_api
    response_data = self.request(
  File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/deepsearch/cps/apis/user/api_client.py", line 407, in request
    return self.rest_client.POST(url,
  File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/deepsearch/cps/apis/user/rest.py", line 265, in POST
    return self.request("POST", url,
  File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/deepsearch/cps/apis/user/rest.py", line 163, in request
    r = self.pool_manager.request(
  File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/urllib3/request.py", line 78, in request
    return self.request_encode_body(
  File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/urllib3/request.py", line 170, in request_encode_body
    return self.urlopen(method, url, **extra_kw)
  File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/urllib3/poolmanager.py", line 376, in urlopen
    response = conn.urlopen(method, u.request_uri, **kw)
  File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/urllib3/connectionpool.py", line 813, in urlopen
    return self.urlopen(
  File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/urllib3/connectionpool.py", line 813, in urlopen
    return self.urlopen(
  File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/urllib3/connectionpool.py", line 813, in urlopen
    return self.urlopen(
  File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/urllib3/connectionpool.py", line 785, in urlopen
    retries = retries.increment(
  File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/urllib3/util/retry.py", line 592, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='deepsearch-experience.res.ibm.com', port=443): Max retries exceeded with url: /api/cps/user/v1/user/token (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fd41bef2af0>: Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known'))
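
The gaierror at the root of the chain means the host name could not be resolved at the DNS level. As a minimal check, name resolution can be tested directly with the standard library, independently of the toolkit (host name taken from the traceback above):

import socket

# If this raises socket.gaierror as well, the failure is in DNS/network
# configuration (proxy, VPN, offline machine), not in the toolkit.
host = "deepsearch-experience.res.ibm.com"
print(socket.getaddrinfo(host, 443, proto=socket.IPPROTO_TCP))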

TypeError: issubclass() arg 1 must be a class in deepsearch/documents/core/models.py

Description:

I encountered an issue while using the deepsearch library. The error message is as follows:
TypeError: issubclass() arg 1 must be a class

This error occurs in the deepsearch/documents/core/models.py file, specifically in the MongoS3Target class definition.

Here is the traceback of the error:

...
alan_backend              |     import deepsearch as ds
alan_backend              |   File "/usr/local/lib/python3.8/site-packages/deepsearch/__init__.py", line 7, in <module>
alan_backend              |     from .cps.data_indices import utils
alan_backend              |   File "/usr/local/lib/python3.8/site-packages/deepsearch/cps/data_indices/__init__.py", line 1, in <module>
alan_backend              |     from .utils import upload_files
alan_backend              |   File "/usr/local/lib/python3.8/site-packages/deepsearch/cps/data_indices/utils.py", line 14, in <module>
alan_backend              |     from deepsearch.documents.core import convert, input_process
alan_backend              |   File "/usr/local/lib/python3.8/site-packages/deepsearch/documents/__init__.py", line 1, in <module>
alan_backend              |     from .core import convert_documents
alan_backend              |   File "/usr/local/lib/python3.8/site-packages/deepsearch/documents/core/__init__.py", line 1, in <module>
alan_backend              |     from .main import convert_documents
alan_backend              |   File "/usr/local/lib/python3.8/site-packages/deepsearch/documents/core/main.py", line 7, in <module>
alan_backend              |     from deepsearch.documents.core.input_process import (
alan_backend              |   File "/usr/local/lib/python3.8/site-packages/deepsearch/documents/core/input_process.py", line 9, in <module>
alan_backend              |     from deepsearch.cps.client.components.documents import DocumentConversionResult
alan_backend              |   File "/usr/local/lib/python3.8/site-packages/deepsearch/cps/client/components/documents.py", line 8, in <module>
alan_backend              |     from deepsearch.documents.core.convert import (
alan_backend              |   File "/usr/local/lib/python3.8/site-packages/deepsearch/documents/core/convert.py", line 18, in <module>
alan_backend              |     from .models import ConversionSettings, ExportTarget, ZipTarget
alan_backend              |   File "/usr/local/lib/python3.8/site-packages/deepsearch/documents/core/models.py", line 94, in <module>
alan_backend              |     class MongoS3Target(BaseModel):
alan_backend              |   File "pydantic/main.py", line 198, in pydantic.main.ModelMetaclass.__new__
alan_backend              |   File "pydantic/fields.py", line 506, in pydantic.fields.ModelField.infer
alan_backend              |   File "pydantic/fields.py", line 436, in pydantic.fields.ModelField.__init__
alan_backend              |   File "pydantic/fields.py", line 552, in pydantic.fields.ModelField.prepare
alan_backend              |   File "pydantic/fields.py", line 668, in pydantic.fields.ModelField._type_analysis
alan_backend              |   File "/usr/local/lib/python3.8/typing.py", line 774, in __subclasscheck__
alan_backend              |     return issubclass(cls, self.__origin__)
alan_backend              | TypeError: issubclass() arg 1 must be a class

Requirements (Python 3.8):

pydantic==1.10.2

I have installed the library using pip and am using version deepsearch-toolkit==0.15.0.
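
For reference, the exact installed versions can be confirmed with importlib.metadata (available since Python 3.8; distribution names assumed as published on PyPI):

from importlib.metadata import version

# Print the versions of the two packages involved in this report.
print("pydantic:", version("pydantic"))
print("deepsearch-toolkit:", version("deepsearch-toolkit"))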

It seems like the problem may be related to a type hint that is not a class within the MongoS3Target class definition. I would appreciate it if you could look into this issue and provide any suggestions or fixes to resolve it.

If you need any additional information, please let me know. Thank you for your assistance.

Unable to sequence the objects after parsing

I'm parsing a huge document and trying to put the parsed JSON in the same sequence as it appears in the document, but I'm unable to do that. For example, the tables are placed at the end of the JSON without any reference to which headers or index they belong to. How do I link each table to its correct position in the document?
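
A possible approach, sketched below under the assumption that the converted JSON follows the Deep Search document layout (a main-text array in reading order whose items may point into the top-level tables array via $ref entries); the field names should be verified against the actual output:

import json

# Sketch: walk `main-text` in reading order and resolve `$ref` pointers
# (e.g. "#/tables/0") into the top-level arrays, so each table is visited
# at its position in the document instead of at the end of the file.
with open("document.json") as f:
    doc = json.load(f)

def resolve(ref: str, doc: dict):
    # "#/tables/0" -> doc["tables"][0]
    _, section, index = ref.split("/")
    return doc[section][int(index)]

for item in doc.get("main-text", []):
    if "$ref" in item:
        referenced = resolve(item["$ref"], doc)
        print("referenced object at this position:", item["$ref"])
    else:
        print(item.get("type"), "-", str(item.get("text", ""))[:60])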

HTTP 404 Error when trying to instantiate a ConversionSettings object

What happened + What you expected to happen

I followed the user guide for modifying the document conversion settings. When trying to instantiate a ConversionSettings object, I get the following error:

Traceback (most recent call last):
  File "/home/timurcarstensen/EFSA/src/debugging_reproduction.py", line 23, in <module>
    conv_settings = ConversionSettings.from_defaults(api=api)
  File "/home/timurcarstensen/miniconda3/envs/efsa3.9/lib/python3.9/site-packages/deepsearch/documents/core/models.py", line 333, in from_defaults
    request_conv_settings.raise_for_status()
  File "/home/timurcarstensen/miniconda3/envs/efsa3.9/lib/python3.9/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://deepsearch-experience.res.ibm.com/api/linked-ccs/public/v4/settings/conversion_defaults

Versions / Dependencies
deepsearch-toolkit 0.10.1
python 3.11
OS Ubuntu through WSL2

Reproduction script
import deepsearch as ds
from deepsearch.documents.core.models import ConversionSettings

with open('api_key') as f:
    api_key = f.readline().strip()  # strip the trailing newline

auth = ds.DeepSearchKeyAuth(username="EMAIL", api_key=api_key)
config = ds.DeepSearchConfig(host="https://deepsearch-experience.res.ibm.com", auth=auth)
client = ds.CpsApiClient(config)
api = ds.CpsApi(client)

conv_settings = ConversionSettings.from_defaults(api=api)

What have you tried to resolve the issue?

  • Installing prior versions of the toolkit
  • Downgrading to a less recent python version (3.9)
  • Installing from source

Automate version and release management

The current releasing setup is based on the following manual steps:

  1. determination of next version
  2. update of version in pyproject.toml
  3. creation of release on GitHub

Besides requiring manual effort, this process is also somewhat error-prone, as the maintainer needs to ensure that the release (tag) version is in sync with the one in pyproject.toml.

Automate the above-mentioned steps, to make the process more efficient, maintainable, and robust.
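
One building block could be an automated consistency check between the Git tag and pyproject.toml, for example along these lines (a sketch; assumes a poetry-style pyproject.toml and a reachable release tag, and tomllib needs Python 3.11+ or the tomli backport on older versions):

import subprocess
import tomllib  # Python 3.11+; use `tomli` as a drop-in backport otherwise

# Version as declared in pyproject.toml.
with open("pyproject.toml", "rb") as f:
    pyproject_version = tomllib.load(f)["tool"]["poetry"]["version"]

# Most recent tag, with a leading "v" stripped if present.
tag = subprocess.run(
    ["git", "describe", "--tags", "--abbrev=0"],
    capture_output=True, text=True, check=True,
).stdout.strip().lstrip("v")

assert tag == pyproject_version, f"tag {tag} != pyproject {pyproject_version}"
print(f"tag and pyproject.toml agree: {tag}")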

convert_document not working

Hello,
When I run this line of code:

import deepsearch as ds
documents = ds.convert_document(proj_key = PROJ_KEY, local_file = PATH_DOCS)

I get the following error:


                      Welcome to the DeepSearch Toolkit

Processing input: : 100%|██████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 1144.11it/s]
Submitting input: : 0%| | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "toolkit.py", line 27, in <module>
    documents = ds.convert_document(proj_key = PROJ_KEY, local_file = PATH_DOCS)
  File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/deepsearch/documents/core/main.py", line 48, in convert_document
    process_local_input(cps_proj_key=proj_key, local_file=Path(local_file))
  File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/deepsearch/documents/core/input_process.py", line 37, in process_local_input
    task_ids = send_files_for_conversion(
  File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/deepsearch/documents/core/convert.py", line 195, in send_files_for_conversion
    private_download_url = upload_single_file(
  File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/deepsearch/documents/core/convert.py", line 281, in upload_single_file
    api = CpsApi.default_from_env()
  File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/deepsearch/cps/client/api.py", line 114, in default_from_env
    config = DeepSearchConfig.parse_file(config_file_path())
  File "pydantic/main.py", line 556, in pydantic.main.BaseModel.parse_file
  File "pydantic/parse.py", line 57, in pydantic.parse.load_file
  File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/pathlib.py", line 1229, in read_bytes
    with self.open(mode='rb') as f:
  File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/pathlib.py", line 1222, in open
    return io.open(self, mode, buffering, encoding, errors, newline,
  File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/pathlib.py", line 1078, in _opener
    return self._accessor.open(self, flags, mode)
FileNotFoundError: [Errno 2] No such file or directory: '/Users/swamipatil/Library/Application Support/DeepSearch/deepsearch_toolkit.json'
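
The FileNotFoundError indicates that convert_document fell back to CpsApi.default_from_env(), which reads a saved profile from disk, and that file was never created on this machine. A quick existence check for the file the traceback points to (path copied verbatim from the traceback; macOS-specific location):

from pathlib import Path

# Path taken verbatim from the traceback above.
cfg = Path.home() / "Library/Application Support/DeepSearch/deepsearch_toolkit.json"
print(cfg, "exists:", cfg.exists())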

convert_document not working

When I run this code:
documents = ds.convert_document(proj_key = PROJ_KEY, local_file = PATH_DOCS)

I get this error:

                      Welcome to the DeepSearch Toolkit

Processing input: : 100%|██████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 1326.05it/s]
Submitting input: : 0%| | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/urllib3/connection.py", line 174, in _new_conn
    conn = connection.create_connection(
  File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/urllib3/util/connection.py", line 72, in create_connection
    for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
  File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/socket.py", line 918, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno 8] nodename nor servname provided, or not known

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/urllib3/connectionpool.py", line 703, in urlopen
    httplib_response = self._make_request(
  File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/urllib3/connectionpool.py", line 386, in _make_request
    self._validate_conn(conn)
  File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/urllib3/connectionpool.py", line 1040, in _validate_conn
    conn.connect()
  File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/urllib3/connection.py", line 358, in connect
    self.sock = conn = self._new_conn()
  File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/urllib3/connection.py", line 186, in _new_conn
    raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPSConnection object at 0x7fa1b2276610>: Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "toolkit.py", line 29, in <module>
    documents = ds.convert_document(proj_key = PROJ_KEY, local_file = PATH_DOCS)
  File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/deepsearch/documents/core/main.py", line 48, in convert_document
    process_local_input(cps_proj_key=proj_key, local_file=Path(local_file))
  File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/deepsearch/documents/core/input_process.py", line 37, in process_local_input
    task_ids = send_files_for_conversion(
  File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/deepsearch/documents/core/convert.py", line 195, in send_files_for_conversion
    private_download_url = upload_single_file(
  File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/deepsearch/documents/core/convert.py", line 281, in upload_single_file
    api = CpsApi.default_from_env()
  File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/deepsearch/cps/client/api.py", line 116, in default_from_env
    client = CpsApiClient(config)
  File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/deepsearch/cps/client/api.py", line 30, in __init__
    bearer_token=self._authenticate_with_api_key(
  File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/deepsearch/cps/client/api.py", line 81, in _authenticate_with_api_key
    access_token = api.get_access_token(options={"admin": False}).access_token
  File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/deepsearch/cps/apis/user/api/users_api.py", line 167, in get_access_token
    return self.get_access_token_with_http_info(**kwargs) # noqa: E501
  File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/deepsearch/cps/apis/user/api/users_api.py", line 242, in get_access_token_with_http_info
    return self.api_client.call_api(
  File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/deepsearch/cps/apis/user/api_client.py", line 364, in call_api
    return self.__call_api(resource_path, method,
  File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/deepsearch/cps/apis/user/api_client.py", line 181, in __call_api
    response_data = self.request(
  File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/deepsearch/cps/apis/user/api_client.py", line 407, in request
    return self.rest_client.POST(url,
  File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/deepsearch/cps/apis/user/rest.py", line 265, in POST
    return self.request("POST", url,
  File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/deepsearch/cps/apis/user/rest.py", line 163, in request
    r = self.pool_manager.request(
  File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/urllib3/request.py", line 78, in request
    return self.request_encode_body(
  File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/urllib3/request.py", line 170, in request_encode_body
    return self.urlopen(method, url, **extra_kw)
  File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/urllib3/poolmanager.py", line 376, in urlopen
    response = conn.urlopen(method, u.request_uri, **kw)
  File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/urllib3/connectionpool.py", line 813, in urlopen
    return self.urlopen(
  File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/urllib3/connectionpool.py", line 813, in urlopen
    return self.urlopen(
  File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/urllib3/connectionpool.py", line 813, in urlopen
    return self.urlopen(
  File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/urllib3/connectionpool.py", line 785, in urlopen
    retries = retries.increment(
  File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/urllib3/util/retry.py", line 592, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='cps.foc-deepsearch.zurich.ibm.com', port=443): Max retries exceeded with url: /api/cps/user/v1/user/token (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fa1b2276610>: Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known'))

Inconsistent Table data extraction (same page, different documents)

Hi all,

I am experiencing a strange issue here: the Deep Search toolkit is not able to consistently format the content extracted from a table. In fact, table['cells']['data'] contains the extracted features, but table['data'] is empty. I could understand this behavior when the table layout is complex, but for a simple layout there is no explanation.

Stranger still, the issue is not reproducible in isolation. When I tried to debug it with a PDF file containing only the single page where the table is located, the table['data'] field was populated with the content. So it looks like table layout understanding is also linked to the length of the PDF file, more precisely the number of pages.

Could it be because of a memory overflow somewhere, or because some memory limit is reached? Several tables located after the faulty table in the original document are extracted correctly...

I attach both PDF files and the JSON files received from the toolkit.

Access to files here.

In the long JSON file, the issue is located at line 12317, which corresponds to line 375 in the short JSON file.
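
For anyone triaging this, all affected tables can be located with a small scan over the output JSON, flagging every table whose cell-level extraction succeeded while the structured grid stayed empty (field names taken from the description above; a diagnostic sketch, not a fix):

import json

with open("document.json") as f:
    doc = json.load(f)

# Flag tables where cells were extracted but the structured grid is empty.
for i, table in enumerate(doc.get("tables", [])):
    has_cells = bool(table.get("cells", {}).get("data"))
    has_grid = bool(table.get("data"))
    if has_cells and not has_grid:
        print(f"table {i}: cell data present but table['data'] is empty")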

Thanks in advance for investigating this issue which is causing major damage in my application.

Best regards

Jerome

refresh_token(True) function does not provide admin permission

I am having an issue using the refresh_token function from the toolkit. The value of the Bearer token is indeed changed after calling refresh_token(), but I still get a 403 Forbidden response from the server when I call a function that requires admin permission, such as list_system_celery_tasks.
The attached script reproduces the problem. Credentials and links are redacted for security.

from deepsearch.cps.client.api import CpsApi, CpsApiClient
from deepsearch.core.client import DeepSearchConfig, DeepSearchKeyAuth


def main():
    auth = DeepSearchKeyAuth(
        username="USERNAME",
        api_key="API_KEY",
    )
    dsconfig = DeepSearchConfig(
        host="LINK_TO_HOST",
        auth=auth,
        verify_ssl=False,
    )

    client = CpsApiClient(dsconfig)
    api = CpsApi(client)

    print("Bearer token: ", api.client.bearer_token_auth)
    api.refresh_token(True)  # True should request an admin-scoped token
    print("Bearer token: ", api.client.bearer_token_auth)

    resp = api.tasks.sw_api.list_system_celery_tasks(
        proj_key="PROJECT_KEY", project_task_id="PROJECT_TASK_ID", limit=0)
    print(resp)


if __name__ == "__main__":
    main()

auth key in DeepSearchConfig gets replaced at instantiation of CpsApiClient

This issue is about the following lines: https://github.com/DS4SD/deepsearch-toolkit/blob/main/deepsearch/cps/client/api.py#L26-L35

There, the DeepSearchKeyAuth in config.auth is replaced with a DeepSearchBearerTokenAuth instead of maintaining the former. In addition, since line 26 only stores a reference to the config object, the config is also modified outside of the class instantiation.

I would suggest adding a separate key for bearer_token_auth in self.config, and not changing the config object that is accessible outside of the class.
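
The aliasing effect can be illustrated with plain pydantic v1 models, matching the pydantic 1.x used by the toolkit (a self-contained sketch, not the toolkit's actual classes):

from pydantic import BaseModel

class Auth(BaseModel):
    kind: str

class Settings(BaseModel):
    auth: Auth

# Holding a bare reference and mutating it changes the caller's object too.
cfg = Settings(auth=Auth(kind="api-key"))
held = cfg                      # what the client effectively does today
held.auth = Auth(kind="bearer")
print(cfg.auth.kind)            # "bearer" -- the caller's config was modified

# Copying first keeps the caller's config intact (pydantic v1 API).
cfg2 = Settings(auth=Auth(kind="api-key"))
held2 = cfg2.copy(deep=True)
held2.auth = Auth(kind="bearer")
print(cfg2.auth.kind)           # "api-key" -- unchanged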

Unable to convert the documents

I was trying to convert an 800-page PDF document, both from a URL and from a local file. The process completed, but the output didn't contain any results: all I got was a task_ids.txt file with a code in it. There were no error messages or anything.

Only Pictures bounding boxes are returned. All text is by-passed

Hi IBM DST team,
just to inform you that the DeepSearch Toolkit layout parser returns only bounding boxes of the "picture" type; all other types are not returned in the JSON file. This behavior happens with PDF documents made from scanned images.
I am a little surprised, as I was thinking DeepSearch was a vision-AI-only tool. Does it need the PDF text layers too?
Thanks
Best regards
Jerome

Fix CI build

Fix the CI build, which is currently not working – probably due to the recent update of poetry to 1.2.0, which includes some changes with respect to allowed package version naming.

Allow for optional project in CLI functions

The majority of Deep Search Experience users will operate on a single (auto-assigned) project. Many of the CLI functions could be simplified such that no explicit proj_key is required; one possible shape is sketched below.
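
For illustration, assuming a Typer-based CLI, the option could fall back to a resolved default when omitted (resolve_default_project is a hypothetical helper, not an existing toolkit function):

from typing import Optional

import typer

app = typer.Typer()

def resolve_default_project() -> str:
    # Hypothetical helper: a real implementation would query the API for
    # the user's single auto-assigned project.
    return "AUTO-ASSIGNED-PROJ-KEY"

@app.command()
def convert(
    proj_key: Optional[str] = typer.Option(
        None, help="Project key; defaults to the user's auto-assigned project."
    )
):
    if proj_key is None:
        proj_key = resolve_default_project()
    typer.echo(f"Using project {proj_key}")

if __name__ == "__main__":
    app()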
