Comments (1)
@liturrig Yes, this is the expected behavior. The unstructured library does not extract images from DOC, DOCX, PPT, or PPTX files.
Image extraction for these formats will be available soon on the REST APIs, including the freemium API. For PPT and PPTX, that is likely to be within two weeks; for DOC and DOCX, it will take somewhat longer.
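In the meantime, one possible workaround for the newer DOCX and PPTX formats is to read the embedded media directly. This is only a sketch and is not part of the unstructured library. It assumes the file is a standard OOXML package, which is a ZIP archive that conventionally stores images under `word/media/` or `ppt/media/`. The helper name `extract_embedded_images` is hypothetical, and the approach does not apply to the legacy binary DOC and PPT formats.

```python
import zipfile
from pathlib import Path

# Folders where OOXML packages conventionally store embedded media.
MEDIA_PREFIXES = ("word/media/", "ppt/media/")


def extract_embedded_images(doc_path, out_dir):
    """Copy embedded media out of a DOCX/PPTX package.

    Hypothetical helper for illustration; not an unstructured API.
    Returns the list of paths written to out_dir.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    with zipfile.ZipFile(doc_path) as package:
        for name in package.namelist():
            # Skip directory entries and anything outside the media folders.
            if name.startswith(MEDIA_PREFIXES) and not name.endswith("/"):
                target = out / Path(name).name
                target.write_bytes(package.read(name))
                saved.append(target)
    return saved
```

Calling `extract_embedded_images("slides.pptx", "images/")` would write each embedded media file into `images/`. It recovers the raw files only and does not tie an image back to the slide or paragraph that references it.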
Related Issues (20)
- pptx initial error HOT 1
- bug/<Compatibility Issue with Chinese Text in Document Parsing> HOT 4
- ImportError: cannot import name 'CompositeElement' from 'unstructured.documents.elements'bug/<short-name> HOT 1
- Unable to load file HOT 3
- bug/bounding boxes using strategy="hi_res" are wrong HOT 1
- unstructured-ingest s3 command causes Fsspec.Downloader.download_config.download_dir to be None HOT 1
- bug/PIL.UnidentifiedImageError: cannot identify image file HOT 9
- DOCX doesn't recognize listitems within textbox HOT 7
- `partition_doc` fails the first time it is run in the AMD64 container HOT 2
- bug/partition_html ouputs different results with different args HOT 5
- bug/parsing pdf error - new_cells as str has no "copy" HOT 6
- bug/<pdfminer> HOT 1
- bug/docker images at quay.io not up to date HOT 1
- Not respecting NLTK_DATA environment variable HOT 4
- feat/Allow max-pages/max-total-characters that should be parsed HOT 2
- bug/combineUnderNChars not working properly HOT 1
- docx - error while parsing table with merged cells HOT 2
- bug/HTMLTitle doesn't have `type` attribute
- bug/docx parse table without row.grid_cols_before or row.grid_cols_after
- feat/Retain text indentations in PDF files