ds4sd / deepsearch-toolkit
Interact with the Deep Search platform for new knowledge explorations and discoveries
Home Page: https://ds4sd.github.io/deepsearch-toolkit
License: MIT License
The broken CI build documented in #38 was addressed with a temporary workaround of constraining CI to poetry<1.2.
Going forward, a better solution should be provided, allowing for an upgrade to poetry 1.2.*.
We want to create a complete example which generates EPUB documents from the input PDF.
After running the following login configuration code:

import deepsearch as ds

auth = ds.DeepSearchKeyAuth(
    username="USER-EMAIL",
    api_key="API-KEY",
)
config = ds.DeepSearchConfig(
    host="https://deepsearch-experience.res.ibm.com",
    auth=auth,
)
client = ds.CpsApiClient(config)
api = ds.CpsApi(client)
I got this error:
Traceback (most recent call last):
File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/urllib3/connection.py", line 174, in _new_conn
conn = connection.create_connection(
File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/urllib3/util/connection.py", line 72, in create_connection
for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/socket.py", line 918, in getaddrinfo
for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno 8] nodename nor servname provided, or not known
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/urllib3/connectionpool.py", line 703, in urlopen
httplib_response = self._make_request(
File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/urllib3/connectionpool.py", line 386, in _make_request
self._validate_conn(conn)
File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/urllib3/connectionpool.py", line 1040, in _validate_conn
conn.connect()
File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/urllib3/connection.py", line 358, in connect
self.sock = conn = self._new_conn()
File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/urllib3/connection.py", line 186, in _new_conn
raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPSConnection object at 0x7fd41bef2af0>: Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "toolkit.py", line 21, in <module>
client = ds.CpsApiClient(config)
File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/deepsearch/cps/client/api.py", line 30, in __init__
bearer_token=self._authenticate_with_api_key(
File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/deepsearch/cps/client/api.py", line 81, in _authenticate_with_api_key
access_token = api.get_access_token(options={"admin": False}).access_token
File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/deepsearch/cps/apis/user/api/users_api.py", line 167, in get_access_token
return self.get_access_token_with_http_info(**kwargs) # noqa: E501
File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/deepsearch/cps/apis/user/api/users_api.py", line 242, in get_access_token_with_http_info
return self.api_client.call_api(
File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/deepsearch/cps/apis/user/api_client.py", line 364, in call_api
return self.__call_api(resource_path, method,
File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/deepsearch/cps/apis/user/api_client.py", line 181, in __call_api
response_data = self.request(
File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/deepsearch/cps/apis/user/api_client.py", line 407, in request
return self.rest_client.POST(url,
File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/deepsearch/cps/apis/user/rest.py", line 265, in POST
return self.request("POST", url,
File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/deepsearch/cps/apis/user/rest.py", line 163, in request
r = self.pool_manager.request(
File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/urllib3/request.py", line 78, in request
return self.request_encode_body(
File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/urllib3/request.py", line 170, in request_encode_body
return self.urlopen(method, url, **extra_kw)
File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/urllib3/poolmanager.py", line 376, in urlopen
response = conn.urlopen(method, u.request_uri, **kw)
File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/urllib3/connectionpool.py", line 813, in urlopen
return self.urlopen(
File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/urllib3/connectionpool.py", line 813, in urlopen
return self.urlopen(
File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/urllib3/connectionpool.py", line 813, in urlopen
return self.urlopen(
File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/urllib3/connectionpool.py", line 785, in urlopen
retries = retries.increment(
File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/urllib3/util/retry.py", line 592, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='deepsearch-experience.res.ibm.com', port=443): Max retries exceeded with url: /api/cps/user/v1/user/token (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fd41bef2af0>: Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known'))
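The root cause in the trace above is a DNS resolution failure (socket.gaierror: [Errno 8]): the host name could not be resolved at all, which typically points to a network, VPN, or proxy problem rather than a toolkit bug. A quick toolkit-independent check is to try resolving the host directly (a minimal sketch):

```python
import socket

def can_resolve(host: str) -> bool:
    # Returns True if DNS resolution succeeds; False here means the
    # toolkit cannot reach the host either (check VPN/proxy settings).
    try:
        socket.getaddrinfo(host, 443, proto=socket.IPPROTO_TCP)
        return True
    except socket.gaierror:
        return False

print(can_resolve("deepsearch-experience.res.ibm.com"))
```

If this prints False, the same request will keep failing regardless of credentials.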
Description:
I encountered an issue while using the deepsearch library. The error message is as follows:
TypeError: issubclass() arg 1 must be a class
This error occurs in the deepsearch/documents/core/models.py file, specifically in the MongoS3Target class definition.
Here is the traceback of the error:
...
alan_backend | import deepsearch as ds
alan_backend | File "/usr/local/lib/python3.8/site-packages/deepsearch/__init__.py", line 7, in <module>
alan_backend | from .cps.data_indices import utils
alan_backend | File "/usr/local/lib/python3.8/site-packages/deepsearch/cps/data_indices/__init__.py", line 1, in <module>
alan_backend | from .utils import upload_files
alan_backend | File "/usr/local/lib/python3.8/site-packages/deepsearch/cps/data_indices/utils.py", line 14, in <module>
alan_backend | from deepsearch.documents.core import convert, input_process
alan_backend | File "/usr/local/lib/python3.8/site-packages/deepsearch/documents/__init__.py", line 1, in <module>
alan_backend | from .core import convert_documents
alan_backend | File "/usr/local/lib/python3.8/site-packages/deepsearch/documents/core/__init__.py", line 1, in <module>
alan_backend | from .main import convert_documents
alan_backend | File "/usr/local/lib/python3.8/site-packages/deepsearch/documents/core/main.py", line 7, in <module>
alan_backend | from deepsearch.documents.core.input_process import (
alan_backend | File "/usr/local/lib/python3.8/site-packages/deepsearch/documents/core/input_process.py", line 9, in <module>
alan_backend | from deepsearch.cps.client.components.documents import DocumentConversionResult
alan_backend | File "/usr/local/lib/python3.8/site-packages/deepsearch/cps/client/components/documents.py", line 8, in <module>
alan_backend | from deepsearch.documents.core.convert import (
alan_backend | File "/usr/local/lib/python3.8/site-packages/deepsearch/documents/core/convert.py", line 18, in <module>
alan_backend | from .models import ConversionSettings, ExportTarget, ZipTarget
alan_backend | File "/usr/local/lib/python3.8/site-packages/deepsearch/documents/core/models.py", line 94, in <module>
alan_backend | class MongoS3Target(BaseModel):
alan_backend | File "pydantic/main.py", line 198, in pydantic.main.ModelMetaclass.__new__
alan_backend | File "pydantic/fields.py", line 506, in pydantic.fields.ModelField.infer
alan_backend | File "pydantic/fields.py", line 436, in pydantic.fields.ModelField.__init__
alan_backend | File "pydantic/fields.py", line 552, in pydantic.fields.ModelField.prepare
alan_backend | File "pydantic/fields.py", line 668, in pydantic.fields.ModelField._type_analysis
alan_backend | File "/usr/local/lib/python3.8/typing.py", line 774, in __subclasscheck__
alan_backend | return issubclass(cls, self.__origin__)
alan_backend | TypeError: issubclass() arg 1 must be a class
Requirements (using Python 3.8):
pydantic==1.10.2
I have installed the library using pip and am using version deepsearch-toolkit==0.15.0.
It seems like the problem may be related to a type hint that is not a class within the MongoS3Target class definition. I would appreciate it if you could look into this issue and provide any suggestions or fixes to resolve it.
If you need any additional information, please let me know. Thank you for your assistance.
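For context, "TypeError: issubclass() arg 1 must be a class" is the generic failure mode when a subscripted typing construct ends up on the class side of a subclass check, which certain pydantic/typing combinations on Python 3.8 could trigger during model building. A minimal, toolkit-independent illustration (not the exact code path in models.py):

```python
import typing

# A subscripted generic like List[int] is not a class, so using it as
# the first argument of a subclass check raises a TypeError similar to
# the one in the pydantic traceback above.
try:
    issubclass(typing.List[int], typing.Sequence)
    outcome = "no error"
except TypeError as exc:
    outcome = f"TypeError: {exc}"

print(outcome)
```

Pinning pydantic to a release that handles the field's type hint, or simplifying the annotation, avoids this class of failure.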
I'm parsing a huge document, and I'm trying to put the parsed JSON in the same sequence as it appears in the document, but I'm unable to do that. For example, the tables are put at the end of the JSON without any reference to which headers or index they belong to. How do I link the tables to their correct position in the document?
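Regarding the ordering question: in the converted-JSON layout as I understand it (verify against your own output), the main-text array holds items in reading order, and tables appear there as stub items carrying a "$ref" JSON pointer such as "#/tables/0" into the top-level tables array. A hedged sketch that walks the document in reading order and resolves those pointers:

```python
def iter_reading_order(doc):
    # Assumed layout: doc["main-text"] is in reading order; table stubs
    # carry a "$ref" like "#/tables/2" pointing into doc["tables"].
    for item in doc.get("main-text", []):
        ref = item.get("$ref")
        if ref and ref.startswith("#/"):
            section, index = ref[2:].split("/")
            yield doc[section][int(index)]
        else:
            yield item

# Toy document demonstrating the resolution (structure is illustrative):
doc = {
    "main-text": [
        {"type": "paragraph", "text": "Intro"},
        {"type": "table", "$ref": "#/tables/0"},
        {"type": "paragraph", "text": "Outro"},
    ],
    "tables": [{"type": "table", "data": [["a", "b"]]}],
}
ordered = list(iter_reading_order(doc))
```

If your output uses a different pointer key or section names, adjust the lookup accordingly.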
The enforced typer constraints (typer<0.5.0 and >=0.4.0) seem a bit restrictive and don't allow installing the toolkit in combination with argilla.
Welcome and verbose status messages should be printed only when operating in interactive mode, e.g. in the CLI or in Jupyter notebooks.
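One possible way to gate such messages (a heuristic sketch, not the toolkit's actual logic): treat the session as interactive when stdout is attached to a terminal, or when an interactive interpreter or Jupyter kernel is detected.

```python
import sys

def is_interactive() -> bool:
    # A TTY-attached stdout covers CLI usage; sys.ps1 only exists in
    # interactive interpreters; Jupyter kernels load an IPython module.
    return (
        sys.stdout.isatty()
        or hasattr(sys, "ps1")
        or "IPython" in sys.modules
    )
```

Status printing could then be wrapped as: if is_interactive(): print(...).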
What happened + What you expected to happen
I followed the user guide for modifying the document conversion settings. When trying to instantiate a ConversionSettings object, I get the following error:
Traceback (most recent call last):
File "/home/timurcarstensen/EFSA/src/debugging_reproduction.py", line 23, in <module>
conv_settings = ConversionSettings.from_defaults(api=api)
File "/home/timurcarstensen/miniconda3/envs/efsa3.9/lib/python3.9/site-packages/deepsearch/documents/core/models.py", line 333, in from_defaults
request_conv_settings.raise_for_status()
File "/home/timurcarstensen/miniconda3/envs/efsa3.9/lib/python3.9/site-packages/requests/models.py", line 1021, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://deepsearch-experience.res.ibm.com/api/linked-ccs/public/v4/settings/conversion_defaults
Versions / Dependencies
deepsearch-toolkit 0.10.1
python 3.11
OS Ubuntu through WSL2
Reproduction script

import deepsearch as ds
from deepsearch.documents.core.models import ConversionSettings

with open('api_key') as f:
    api_key = f.readline()

auth = ds.DeepSearchKeyAuth(username="EMAIL", api_key=api_key)
config = ds.DeepSearchConfig(host="https://deepsearch-experience.res.ibm.com", auth=auth)
client = ds.CpsApiClient(config)
api = ds.CpsApi(client)
conv_settings = ConversionSettings.from_defaults(api=api)
What have you tried to resolve the issue?
The current releasing setup is based on manual steps around pyproject.toml.
Besides requiring manual effort, this process is also somewhat error-prone, as the maintainer needs to ensure that the release (tag) version is in sync with the one in pyproject.toml.
Automate the above-mentioned steps to make the process more efficient, maintainable, and robust.
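A small guard along these lines could run in CI to verify that the pushed release tag matches the version declared in pyproject.toml before publishing (a sketch; the "v"-prefixed tag format is an assumption):

```python
import re

def tag_matches_pyproject(tag: str, pyproject_text: str) -> bool:
    # Extract the `version = "x.y.z"` line from pyproject.toml and
    # compare it against a release tag such as "v0.15.0".
    match = re.search(r'^version\s*=\s*"([^"]+)"', pyproject_text, re.MULTILINE)
    return match is not None and tag.lstrip("v") == match.group(1)

example = 'name = "deepsearch-toolkit"\nversion = "0.15.0"\n'
print(tag_matches_pyproject("v0.15.0", example))  # → True
```

Failing the release job when this returns False removes the manual sync check.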
Hello,
When I run this line of code:
import deepsearch as ds
documents = ds.convert_document(proj_key = PROJ_KEY, local_file = PATH_DOCS)
I get the following error:
Welcome to the DeepSearch Toolkit
Processing input: : 100%|██████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 1144.11it/s]
Submitting input: : 0%| | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
File "toolkit.py", line 27, in <module>
documents = ds.convert_document(proj_key = PROJ_KEY, local_file = PATH_DOCS)
File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/deepsearch/documents/core/main.py", line 48, in convert_document
process_local_input(cps_proj_key=proj_key, local_file=Path(local_file))
File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/deepsearch/documents/core/input_process.py", line 37, in process_local_input
task_ids = send_files_for_conversion(
File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/deepsearch/documents/core/convert.py", line 195, in send_files_for_conversion
private_download_url = upload_single_file(
File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/deepsearch/documents/core/convert.py", line 281, in upload_single_file
api = CpsApi.default_from_env()
File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/deepsearch/cps/client/api.py", line 114, in default_from_env
config = DeepSearchConfig.parse_file(config_file_path())
File "pydantic/main.py", line 556, in pydantic.main.BaseModel.parse_file
File "pydantic/parse.py", line 57, in pydantic.parse.load_file
File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/pathlib.py", line 1229, in read_bytes
with self.open(mode='rb') as f:
File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/pathlib.py", line 1222, in open
return io.open(self, mode, buffering, encoding, errors, newline,
File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/pathlib.py", line 1078, in _opener
return self._accessor.open(self, flags, mode)
FileNotFoundError: [Errno 2] No such file or directory: '/Users/swamipatil/Library/Application Support/DeepSearch/deepsearch_toolkit.json'
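For context, CpsApi.default_from_env() appears to read a profile file written by a prior login/configure step; the FileNotFoundError means that step never ran on this machine. If you want to create such a file programmatically, the JSON plausibly mirrors the DeepSearchConfig fields used earlier. The exact schema here is an assumption; running the toolkit's login/profile configuration is the supported way to produce it.

```python
import json
import pathlib
import tempfile

# Illustrative only: write a config of the assumed shape to a temp
# location (the real toolkit expects it at the path in the traceback).
cfg = {
    "host": "https://deepsearch-experience.res.ibm.com",
    "auth": {"username": "USER-EMAIL", "api_key": "API-KEY"},
    "verify_ssl": True,
}
cfg_path = pathlib.Path(tempfile.mkdtemp()) / "deepsearch_toolkit.json"
cfg_path.write_text(json.dumps(cfg, indent=2))
```

Alternatively, pass the api object explicitly (as in the login snippet above) so the toolkit never needs to fall back to the on-disk profile.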
When I run this code:
documents = ds.convert_document(proj_key = PROJ_KEY, local_file = PATH_DOCS)
Welcome to the DeepSearch Toolkit
Processing input: : 100%|██████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 1326.05it/s]
Submitting input: : 0%| | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/urllib3/connection.py", line 174, in _new_conn
conn = connection.create_connection(
File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/urllib3/util/connection.py", line 72, in create_connection
for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/socket.py", line 918, in getaddrinfo
for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno 8] nodename nor servname provided, or not known
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/urllib3/connectionpool.py", line 703, in urlopen
httplib_response = self._make_request(
File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/urllib3/connectionpool.py", line 386, in _make_request
self._validate_conn(conn)
File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/urllib3/connectionpool.py", line 1040, in _validate_conn
conn.connect()
File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/urllib3/connection.py", line 358, in connect
self.sock = conn = self._new_conn()
File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/urllib3/connection.py", line 186, in _new_conn
raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPSConnection object at 0x7fa1b2276610>: Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "toolkit.py", line 29, in <module>
documents = ds.convert_document(proj_key = PROJ_KEY, local_file = PATH_DOCS)
File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/deepsearch/documents/core/main.py", line 48, in convert_document
process_local_input(cps_proj_key=proj_key, local_file=Path(local_file))
File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/deepsearch/documents/core/input_process.py", line 37, in process_local_input
task_ids = send_files_for_conversion(
File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/deepsearch/documents/core/convert.py", line 195, in send_files_for_conversion
private_download_url = upload_single_file(
File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/deepsearch/documents/core/convert.py", line 281, in upload_single_file
api = CpsApi.default_from_env()
File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/deepsearch/cps/client/api.py", line 116, in default_from_env
client = CpsApiClient(config)
File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/deepsearch/cps/client/api.py", line 30, in __init__
bearer_token=self._authenticate_with_api_key(
File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/deepsearch/cps/client/api.py", line 81, in _authenticate_with_api_key
access_token = api.get_access_token(options={"admin": False}).access_token
File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/deepsearch/cps/apis/user/api/users_api.py", line 167, in get_access_token
return self.get_access_token_with_http_info(**kwargs) # noqa: E501
File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/deepsearch/cps/apis/user/api/users_api.py", line 242, in get_access_token_with_http_info
return self.api_client.call_api(
File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/deepsearch/cps/apis/user/api_client.py", line 364, in call_api
return self.__call_api(resource_path, method,
File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/deepsearch/cps/apis/user/api_client.py", line 181, in __call_api
response_data = self.request(
File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/deepsearch/cps/apis/user/api_client.py", line 407, in request
return self.rest_client.POST(url,
File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/deepsearch/cps/apis/user/rest.py", line 265, in POST
return self.request("POST", url,
File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/deepsearch/cps/apis/user/rest.py", line 163, in request
r = self.pool_manager.request(
File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/urllib3/request.py", line 78, in request
return self.request_encode_body(
File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/urllib3/request.py", line 170, in request_encode_body
return self.urlopen(method, url, **extra_kw)
File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/urllib3/poolmanager.py", line 376, in urlopen
response = conn.urlopen(method, u.request_uri, **kw)
File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/urllib3/connectionpool.py", line 813, in urlopen
return self.urlopen(
File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/urllib3/connectionpool.py", line 813, in urlopen
return self.urlopen(
File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/urllib3/connectionpool.py", line 813, in urlopen
return self.urlopen(
File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/urllib3/connectionpool.py", line 785, in urlopen
retries = retries.increment(
File "/Users/swamipatil/opt/anaconda3/envs/python_38/lib/python3.8/site-packages/urllib3/util/retry.py", line 592, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='cps.foc-deepsearch.zurich.ibm.com', port=443): Max retries exceeded with url: /api/cps/user/v1/user/token (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fa1b2276610>: Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known'))
Hi all,
I am experiencing a strange issue here: the Deep Search toolkit is not able to consistently format the content extracted from tables. In fact, table['cells']['data'] contains the extracted features, but table['data'] is empty. I could understand this behavior when the table layout is complex, but for a simple layout there is no explanation.
Stranger still, this issue is not reproducible: when I tried to debug it with a PDF file containing only the single page where the table is located, the table['data'] field was populated with the content. So it looks like table layout understanding is also linked to the length of the PDF file, more precisely the number of pages.
Could it be because of a memory overflow somewhere? Or because there is some memory limit reached?
Several tables located after the faulty table in the original document are extracted correctly...
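As a stop-gap while the root cause is investigated, the information in table['cells']['data'] can often be re-assembled into a grid, assuming each cell entry exposes its row index, column index, and text. The exact cell schema varies, so treat the field layout below as a placeholder for your own output.

```python
def rebuild_grid(cells, n_rows, n_cols):
    # cells: iterable of (row, col, text) triples extracted from the
    # toolkit's cell-level output; fills an empty grid cell by cell.
    grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
    for row, col, text in cells:
        grid[row][col] = text
    return grid

# Toy cell data in the assumed (row, col, text) shape:
cells = [(0, 0, "Name"), (0, 1, "Value"), (1, 0, "x"), (1, 1, "42")]
grid = rebuild_grid(cells, n_rows=2, n_cols=2)
```

This only recovers cell placement, not merged-cell spans, so it is a fallback rather than a fix.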
I attach here both PDF files and the JSON files received from the toolkit.
Access to files here.
In the long JSON file, the issue is located at line 12317, which corresponds to line 375 in the short JSON file.
Thanks in advance for investigating this issue which is causing major damage in my application.
Best regards
Jerome
While preparing the documentation we improved the concepts of a few examples, which are unfortunately not yet reflected in the toolkit code.
I am having an issue using the refresh_token function from the toolkit. The value of the Bearer token is indeed changed after calling refresh_token() but I still get a 403: Forbidden response from the server when I call a function that requires admin permission such as list_system_celery_tasks.
The attached script reproduces the problem. Credentials and links are redacted for security.
from deepsearch.cps.client.api import CpsApi, CpsApiClient
from deepsearch.core.client import DeepSearchConfig, DeepSearchKeyAuth

def main():
    auth = DeepSearchKeyAuth(
        username="USERNAME",
        api_key="API_KEY",
    )
    dsconfig = DeepSearchConfig(
        host="LINK_TO_HOST",
        auth=auth,
        verify_ssl=False,
    )
    client = CpsApiClient(dsconfig)
    api = CpsApi(client)
    print("Bearer token: ", api.client.bearer_token_auth)
    api.refresh_token(True)
    print("Bearer token: ", api.client.bearer_token_auth)
    resp = api.tasks.sw_api.list_system_celery_tasks(
        proj_key="PROJECT_KEY", project_task_id="PROJECT_TASK_ID", limit=0)
    print(resp)

if __name__ == "__main__":
    main()
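When debugging the 403 above, it can help to inspect the claims inside the bearer token before and after refresh_token() to see whether an admin scope was actually granted. JWT payloads are plain base64url-encoded JSON, so they can be decoded without verifying the signature (sketch below; claim names such as "admin" are an assumption about the server's token format):

```python
import base64
import json

def jwt_claims(token: str) -> dict:
    # Decode the middle (payload) segment of a JWT without verifying
    # the signature; restore the base64 padding that JWTs strip.
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))
```

Comparing jwt_claims(old_token) and jwt_claims(new_token) shows whether the refreshed token actually carries different permissions.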
This issue is about the following lines: https://github.com/DS4SD/deepsearch-toolkit/blob/main/deepsearch/cps/client/api.py#L26-L35
There, the DeepSearchKeyAuth in config.auth is replaced with a DeepSearchBearerTokenAuth instead of maintaining the former. In addition, since line 26 only stores a reference to the config object, config is modified outside the class instantiation as well.
I would suggest adding a key for bearer_token_auth in self.config and not changing the config object accessible outside of the class.
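The suggested fix can be illustrated with a defensive copy in the constructor, so that swapping the auth internally never leaks into the caller's object (a generic sketch, not the actual CpsApiClient code):

```python
from copy import deepcopy

class Client:
    def __init__(self, config: dict):
        # Copy the caller's config so internal changes (e.g. replacing
        # the key auth with a bearer-token auth) stay private.
        self.config = deepcopy(config)
        self.config["auth"] = {"bearer_token": "resolved-at-login"}

original = {"host": "https://example.com", "auth": {"api_key": "K"}}
client = Client(original)
```

After construction, original still holds the key auth while client.config holds the bearer token, which is the behavior the issue asks for.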
I was trying to convert an 800-page PDF document, both from a URL and from a local file. The process completed, but the output didn't have any results. I got just a task_ids.txt file with a code in it as a result. No error messages or anything.
Hi IBM DST team,
just to inform you that the Deep Search Toolkit layout parser returns only boxes of "picture" type. All other types are not returned in the JSON file. This behavior happens with PDF documents made from scanned images.
I am a little bit surprised, as I thought Deep Search was a vision AI tool only. Does it need the PDF layers too?
Thanks
Best regards
Jerome
Fix the CI build, which is currently not working – probably due to the recent update of poetry to 1.2.0, which includes some changes with respect to allowed package version naming.
The majority of Deep Search Experience users will operate on a single (auto-assigned) project. Many of the CLI functions could be simplified such that no explicit proj_key will be required.
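One way to implement this simplification (a sketch of the intended behavior, not existing toolkit code): resolve the project key lazily, falling back to the single available project when the user does not pass one explicitly.

```python
def resolve_proj_key(explicit_key, available_projects):
    # explicit_key: value from the CLI option, possibly None.
    # available_projects: list of proj_key strings the user can access.
    if explicit_key is not None:
        return explicit_key
    if len(available_projects) == 1:
        return available_projects[0]
    raise ValueError(
        "proj_key is required when more than one project is available"
    )
```

With this in place, a command like "deepsearch convert" could omit --proj-key entirely for single-project users while keeping the explicit option for everyone else.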