Comments (1)
I've also wanted this: the title, but also meta tags like the keywords and description, and the og tags. Currently I fetch the URL myself, parse these out with BeautifulSoup, then pass the response text to `partition` for the rest. But it would be nicer if `partition_html` could return these things in a more structured way. Especially for the title, it would be nice if it came back as, e.g., a `PageTitle` (or, I guess, `HTMLHeadTitle`?) element type, or something like that.
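For anyone wanting the same workaround, here is a rough sketch of the "parse the head myself" step. It is not what `partition_html` does today, and it swaps BeautifulSoup for the standard library's `html.parser` so it runs without extra installs. The names `HeadMetadataParser` and `extract_head_metadata` are made up for illustration, and the sample HTML is invented.

```python
from html.parser import HTMLParser


class HeadMetadataParser(HTMLParser):
    """Collects the first <title> text and <meta> name/property -> content pairs.

    Hypothetical helper for illustration; not part of unstructured.
    """

    def __init__(self):
        super().__init__()
        self.title = None
        self.meta = {}
        self._in_title = False
        self._title_parts = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "title" and self.title is None:
            # Only the first <title> is recorded.
            self._in_title = True
        elif tag == "meta":
            # Regular meta tags use name=..., Open Graph tags use property=...
            key = attr_map.get("name") or attr_map.get("property")
            content = attr_map.get("content")
            if key and content is not None:
                self.meta[key.lower()] = content

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self.title = "".join(self._title_parts).strip()

    def handle_data(self, data):
        # Title text may arrive in several chunks, so collect and join later.
        if self._in_title:
            self._title_parts.append(data)


def extract_head_metadata(html_text):
    """Return {'title': str or None, 'meta': dict} for an HTML string."""
    parser = HeadMetadataParser()
    parser.feed(html_text)
    parser.close()
    return {"title": parser.title, "meta": parser.meta}


# Invented sample page, standing in for the fetched response text.
sample = """<html><head>
<title>Example page</title>
<meta name="description" content="A short description.">
<meta name="keywords" content="html, parsing">
<meta property="og:title" content="Example OG title">
</head><body><p>Body text.</p></body></html>"""

info = extract_head_metadata(sample)
```

The same response text can then be handed to `partition_html(text=...)` from `unstructured.partition.html` for the body elements, which is the two-step flow described above. Having the title and meta tags come back from the partitioner directly would make this extra pass unnecessary.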