akaalias / obsidian-extract-pdf-highlights

208.0 10.0 9.0 4.02 MB

Extract highlights, underlines and annotations from your PDFs into Obsidian

TypeScript 96.55% JavaScript 3.40% CSS 0.05%

extract-highlights highlight-color obsidian-vault extract-pdf pdf-text-highlights obsidian obsidian-md obsidian-plugin

obsidian-extract-pdf-highlights's Introduction

obsidian-extract-pdf-highlights's Issues

No effect when clicking pdf button

I tried the following:

install + activate plugin
open a pdf in PDF Expert
highlight a few sentences
copy the pdf into Obsidian vault
open in Obsidian (I can see the highlighted text)
click PDF button

Nothing happens. Does it matter which tool is used to create the highlights?

No effect on clicking extract PDF icon

Hello @akaalias,

Thank you for creating the plugin.

I have been trying to use it but it does not seem to work. These are the steps I have followed:

I highlight the text in the PDF (in a PDF preview or Acrobat Reader).
I import the PDF into Obsidian by dragging it.
I preview the PDF in Obsidian.
I click the PDF icon on the left bar.
No new file is created with the annotations.

Could you suggest how to address the issue?

Thank you.
Regards

Expose API to process PDFs

Hi there,

I recently built a plugin to pull in data from Zotero to Obsidian: https://github.com/mgmeyers/obsidian-zotero-desktop-connector

My plugin can access PDFs stored in Zotero, and I'd love to be able to send them to this plugin to extract highlights. I think I could do this if ExtractPDFHighlightsPlugin had something similar to processPDFHighlights that received an ArrayBuffer. That way I could load the PDF and send it to your plugin for processing.

Let me know what you think!
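
A minimal sketch of what such an entry point could look like, assuming the plugin exposed a function that takes an ArrayBuffer. The names Highlight, Extractor, looksLikePdf and processPDFHighlightsFromBuffer are hypothetical and not taken from the plugin's source; the actual extraction step is passed in as a parameter so the sketch stays independent of any PDF library.

```typescript
// Hypothetical shapes: neither type is taken from the plugin's source.
interface Highlight {
  page: number;
  text: string;
}

// Stand-in for whatever routine actually parses the PDF bytes.
type Extractor = (data: Uint8Array) => Promise<Highlight[]>;

// A well-formed PDF file begins with the ASCII bytes "%PDF-".
function looksLikePdf(buffer: ArrayBuffer): boolean {
  const magic = [0x25, 0x50, 0x44, 0x46, 0x2d]; // "%PDF-"
  if (buffer.byteLength < magic.length) return false;
  const head = new Uint8Array(buffer, 0, magic.length);
  return magic.every((byte, i) => head[i] === byte);
}

// Hypothetical public entry point that another plugin could call
// with PDF data it has already loaded (for example from Zotero).
async function processPDFHighlightsFromBuffer(
  buffer: ArrayBuffer,
  extract: Extractor
): Promise<Highlight[]> {
  if (!looksLikePdf(buffer)) {
    throw new Error("Buffer does not start with a PDF header");
  }
  return extract(new Uint8Array(buffer));
}
```

Taking the extractor as an argument is only a convenience for the sketch; inside the plugin the existing extraction code would presumably be called directly.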

Advice on the highlight-note's format

Hello, thanks for your excellent plugin! It's very useful for organizing my PDF documents.
I have a suggestion about the highlight note's format. Would it be possible to give each highlight a wiki link with the page information, such as [[filename.pdf#page=n]]? That would make it quick and convenient to preview the context!

Thanks!

Allowing "screenshot" of the page inside a rectangle/square box to capture images/charts/diagrams

Benefits

Having a way to include a screenshot of the region inside a drawn square (or rectangle) would help in

Extracting useful images or charts from the PDF by simply drawing a rectangle around them.
Extracting handwritten scribbles or diagrams by simply enclosing them in a rectangle.

Image naming and handling

The extracted images could be sent directly to the asset/image/attachment location in the vault and named incrementally, by page number, or randomly, just like Obsidian does when pasting.

Example

PDF sample

Output sample

I have implemented it in a crude way using Python, but I am a newbie to programming. If you could work on something like this, it would be really helpful, as I have a lot of diagrams that I currently screenshot and paste manually. This could make life easier for everyone who needs to extract images or diagrams.

Also:

A post I wrote on Forum : https://forum.obsidian.md/t/discussion-extracting-annotations-from-pdfs/24411
A pdfAnnotate library that I found which might be useful : https://github.com/highkite/pdfAnnotate

Copy the synchronization functionality from org-noter?

Any chance of copying the synchronization function from org-noter?

duplicate highlights when extracting a second time

I was trying to import a PDF book with 300 pages/5mb. It took about a minute for the note with the extracted highlights to load.
After reading and highlighting more pages in the PDF I clicked the "PDF" button again and it appended all highlights below the original highlights.
Maybe my use case isn't a good use case for this plugin?
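
One way to avoid appended duplicates, sketched under the assumption that each highlight occupies one line of the note: compare the newly extracted lines with the lines already in the note and append only the unseen ones. The function name mergeHighlights is hypothetical; this is not how the plugin currently behaves.

```typescript
// Merge freshly extracted highlight lines into the lines already in the note,
// skipping any line whose trimmed text is already present.
function mergeHighlights(existing: string[], extracted: string[]): string[] {
  const seen = new Set(existing.map((line) => line.trim()));
  const merged = [...existing];
  for (const line of extracted) {
    const key = line.trim();
    if (key.length > 0 && !seen.has(key)) {
      seen.add(key);
      merged.push(line);
    }
  }
  return merged;
}

// A second extraction that returns two old highlights and one new one
// adds only the new one.
const updated = mergeHighlights(
  ["First highlight", "Second highlight"],
  ["First highlight", "Second highlight", "Third highlight"]
);
console.log(updated.length); // 3
```

Matching on exact text is the simplest choice; it would treat an edited highlight as a new one.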

Accept pull request

Hi, can you please accept the pull request by Steven Kraft? It works a lot faster that way.

Extraction on the same PDF multiple times not working

Hello,
After extracting once and then adding highlights to the same PDF, updating the extraction does not work.
Obsidian version 0.11.0

thanks

Export page num when extract highlights

Sorry, I found the config part. LOL

Output Pdf Page Num with Obsidian Style

After Obsidian version 0.10.8, Obsidian allows you to use [[book.pdf#page=3]] to jump to page 3 of a PDF.

So is it possible to add a feature to output those highlights and notes with this style?

BLABLABLABLA ——[[book.pdf#page=3]]
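
A small sketch of the requested output style, assuming each extracted highlight carries its text and a 1-based page number. The PageHighlight type and the formatWithPageLink function are hypothetical names used for illustration only.

```typescript
// Hypothetical highlight shape: text plus the 1-based page it came from.
interface PageHighlight {
  text: string;
  page: number;
}

// Append an Obsidian wiki link that jumps to the highlight's page.
function formatWithPageLink(fileName: string, h: PageHighlight): string {
  return `${h.text} [[${fileName}#page=${h.page}]]`;
}

console.log(formatWithPageLink("book.pdf", { text: "BLABLABLABLA", page: 3 }));
// prints: BLABLABLABLA [[book.pdf#page=3]]
```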

In the Hebrew language, the text is written upside down

Capture

image extraction?

Hi folks, I often grab image selections of portions of PDFs (for things other than text, like mathematics, figures, plots, etc.). Is there any way to use this plugin to import these things into my Obsidian note?

Highlights are extracted, annotations aren't

I opened a pdf of my tax draft, highlighted one line, and added an annotation.

The corresponding note only shows the highlight, not the annotation.

The note extracted is

[[New York's 529 college savings program deduction/earnings. (Page 42) ]]

Source

[[Client Copy Return for TF4089.pdf]]

screenshot of the pdf highlight and annotation:

Highlighter colour is not working after extraction of PDF.

Is this due to the recent obsidian update? I'm not sure. But I hope you will fix this soon enough. Thank you.

What's the current status concerning the PDF handling in Obsidian?

I'm just trying to understand the extent of the problem. It's a very powerful and useful tool.

Problem with two-column PDF

Thank you @akaalias for this amazing contribution!
I have observed an issue with the ordering of extracted highlights in PDFs where the text is arranged in two columns and flows from the 1st column to the 2nd on each page. The plugin seems to extract highlights in the order they first appear on the page, so for a two-column PDF (which is often the case for research articles) the highlights in the extracted note no longer follow the reading order.
For example, in the first page of a two-column pdf I highlight the text as follows:

Column 1

lines 10 to 15
line 20

Column 2

lines 2 to 5
line 18

In the extracted note, the ordering of the highlighted lines is shown as:

highlight in lines 2-5 from 2nd column
highlight in lines 10-15 from 1st column
highlight in line 18 from 2nd column
highlight in line 20 from 1st column

So, I am wondering whether any workaround can be made to tackle this issue!
Thanks again!
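
A possible workaround, sketched under several assumptions: each highlight's position is known, coordinates follow the PDF convention of an origin at the bottom-left (larger y is higher on the page), and a highlight belongs to the left column when its left edge lies in the left half of the page. The PositionedHighlight type and sortTwoColumn function are hypothetical and not part of the plugin.

```typescript
// Hypothetical highlight shape with its position on the page.
interface PositionedHighlight {
  text: string;
  page: number;
  x: number; // left edge of the highlight
  y: number; // top edge; origin at bottom-left, so larger y is higher up
}

// Order by page, then column (left before right), then top to bottom.
function sortTwoColumn(
  highlights: PositionedHighlight[],
  pageWidth: number
): PositionedHighlight[] {
  const column = (h: PositionedHighlight): number =>
    h.x < pageWidth / 2 ? 0 : 1;
  return [...highlights].sort(
    (a, b) => a.page - b.page || column(a) - column(b) || b.y - a.y
  );
}
```

Applied to the example above, this yields column 1 lines 10 to 15, column 1 line 20, column 2 lines 2 to 5, column 2 line 18. A fixed midpoint split would misplace highlights in single-column or full-width passages, so it would probably need to be an optional setting.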

Icon not showing in ribbon bar

The icon from the plugin is very faint in the Sidebar with the Minimal Theme. Is there any way for you to increase the contrast on the gif?

Feature Request: Extract annotation without copying files into vault

Thank you for your great plugin. I am wondering if we can add a feature to extract annotations without dropping the file into the vault, as this copies the PDF file from its existing location and I end up deleting the files after extraction.

Can we implement that?

PDF extract with no spaces and no colors

Hello, I annotated a journal article with different colors. I turned on the optional checkboxes for page numbers and colors. However, in the file that is created, the PDF text has no spaces, my annotation comments are not extracted, and there are no color indications. Any solutions?

Extracting links from highlighted text

If highlighted text is a hyperlink, it would be awesome to be able to extract the underlying link
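
One way this could work, sketched under the assumption that both the highlight and the PDF's link annotations are available as rectangles in the same coordinate space, given as [x1, y1, x2, y2] with x1 < x2 and y1 < y2. The Rect and LinkAnnotation types and both functions are hypothetical; a link counts as belonging to a highlight when the two rectangles overlap.

```typescript
// Rectangle as [x1, y1, x2, y2], assumed normalized so x1 < x2 and y1 < y2.
type Rect = [number, number, number, number];

// Hypothetical link annotation: its clickable area and its target.
interface LinkAnnotation {
  rect: Rect;
  url: string;
}

// Two axis-aligned rectangles overlap when they overlap on both axes.
function rectsOverlap(a: Rect, b: Rect): boolean {
  return a[0] < b[2] && b[0] < a[2] && a[1] < b[3] && b[1] < a[3];
}

// Collect the targets of all links whose area overlaps the highlight.
function linksUnderHighlight(highlight: Rect, links: LinkAnnotation[]): string[] {
  return links
    .filter((link) => rectsOverlap(highlight, link.rect))
    .map((link) => link.url);
}
```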

Unnecessary page rendering when extracting highlights

Hi, I noticed that await page.render(renderContext, annotations); gets called even if annotations.length is 0. Is there a reason for this?

I think making it so that only pages with annotations are rendered would greatly improve the time it takes to extract the highlights.

I think it depends on the PDF, but I measured about 100-200 ms per page to render, so a 500-page PDF takes 1-2 minutes regardless of the number of annotations.
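
The proposed change can be sketched as follows. PageLike is a hypothetical, simplified stand-in for the page object used by the plugin (its real render call takes a render context as well), and renderAnnotatedPages is an illustrative name, not the plugin's actual function.

```typescript
// Hypothetical, simplified page interface for illustration only.
interface PageLike {
  getAnnotations(): Promise<unknown[]>;
  render(annotations: unknown[]): Promise<void>;
}

// Render only the pages that carry at least one annotation and
// report how many pages were actually rendered.
async function renderAnnotatedPages(pages: PageLike[]): Promise<number> {
  let rendered = 0;
  for (const page of pages) {
    const annotations = await page.getAnnotations();
    if (annotations.length === 0) continue; // nothing to extract here
    await page.render(annotations);
    rendered += 1;
  }
  return rendered;
}
```

With this guard, extraction time would scale with the number of annotated pages rather than the total page count, assuming fetching the annotations is cheap compared with rendering.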

There's no spacing between words

After extracting the highlighted pdf, the extracted .md file has no spacing between words
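
The cause is not stated in this report. One possible explanation, offered only as an assumption, is that the PDF stores each word as a separate text fragment without an explicit space character, so concatenating the fragments runs the words together. A sketch of a fix under that assumption inserts a space whenever there is a horizontal gap between consecutive fragments; TextFragment, joinFragments and the gapTolerance default are hypothetical.

```typescript
// Hypothetical text fragment: its string, left edge and width on the line.
interface TextFragment {
  str: string;
  x: number;
  width: number;
}

// Join fragments of one line, adding a space where a visible gap
// separates a fragment from the end of the previous one.
function joinFragments(fragments: TextFragment[], gapTolerance = 1): string {
  let out = "";
  let prevEnd: number | null = null;
  for (const f of fragments) {
    const hasGap = prevEnd !== null && f.x - prevEnd > gapTolerance;
    if (hasGap && !out.endsWith(" ")) {
      out += " ";
    }
    out += f.str;
    prevEnd = f.x + f.width;
  }
  return out;
}

console.log(
  joinFragments([
    { str: "Hello", x: 0, width: 30 },
    { str: "world", x: 34, width: 30 },
  ])
); // prints: Hello world
```

A suitable tolerance would depend on font size, so a real implementation would likely scale it with the fragment height.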

obsidian-extract-pdf-highlights's Introduction

Extract your PDF text-highlights into Obsidian

How it works

Demo with default settings

Demo with all optional settings turned on

Optional settings

Backlog

ICEBOX

TODO

DOING

DONE

Contribute

Major Thanks