siddharthpant / booky

A simple script for creating PDF bookmarks

Python 34.50% Shell 65.50%

booky's People

Forkers

mrqianjinsi neariot vimkim jez kant djndl1 oseenix valrcs stephenmjm bangnguyendev sarming nealseah fizzym mafsi afirooz niyaz-ahmad xiaolaba geckoblu-forks adityasz

booky's Issues

Feature request: add function to reformat exported pdftk bookmarks to booky format
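
One possible shape for such a converter is sketched below. It is only an illustration, not part of booky: the function name pdftk_to_booky is hypothetical, and it assumes the dump lists BookmarkTitle, BookmarkLevel and BookmarkPageNumber in that order for each bookmark, and that booky expects one "Title, page" line per bookmark with { and } marking each nesting level.

```python
def pdftk_to_booky(dump_text):
    """Convert pdftk dump_data bookmark records to booky-style text.

    Hypothetical helper: assumes each bookmark record lists
    BookmarkTitle, BookmarkLevel, BookmarkPageNumber in that order.
    """
    out = []
    depth = 0            # current brace nesting depth
    title = level = None
    for raw in dump_text.splitlines():
        # Split on the first colon only, so titles may contain colons.
        key, _, value = raw.partition(":")
        key, value = key.strip(), value.strip()
        if key == "BookmarkTitle":
            title = value
        elif key == "BookmarkLevel":
            level = int(value)
        elif key == "BookmarkPageNumber" and title is not None and level is not None:
            # Open or close braces until the depth matches this bookmark's level.
            while depth < level:
                out.append("{")
                depth += 1
            while depth > level:
                out.append("}")
                depth -= 1
            out.append(f"{title}, {value}")
            title = level = None
    # Close any blocks still open at the end of the dump.
    out.extend(["}"] * depth)
    return "\n".join(out)
```

The input would be the text produced by pdftk's dump_data_utf8 operation; lines that are not bookmark fields are ignored.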

Combine to single file for linux.

A much better option is to combine the Python file into the shell script, so that a single file can be put in PATH:

#!/bin/bash

# Change to the directory of pdf file
cd "$(dirname "$1")" || exit 1
pdf=$(basename "$1")
pdf_data="${pdf%.*}""_data.txt"
EXTRACT_FILE=booky_bookmarks_extract
bkFile="$2"


if [[ "$OSTYPE" == "darwin"* ]]; then
    SED=gsed
else
    SED=sed
fi

echo "Converting $bkFile to pdftk compatible format"
python3 -c '
import sys

level = 0
startChar = "{"
endChar = "}"
for line in sys.stdin:
	line = line.strip()
	if line == startChar:
		level = level + 1
	elif line == endChar:
		level = level - 1
	elif line:
		commaIndex = line.rfind(",")
		title = line[:commaIndex]
		pageNo = line[commaIndex + 1:].strip()
		print("BookmarkBegin")
		print("BookmarkTitle:", title.strip())
		print("BookmarkLevel:", level)
		print("BookmarkPageNumber:", pageNo.strip())' < "$bkFile" > "$EXTRACT_FILE"

echo "Dumping pdf meta data..."
pdftk "$pdf" dump_data_utf8 output "$pdf_data"

echo "Clear dumped data of any previous bookmarks"
$SED -i '/Bookmark/d' "$pdf_data"

echo "Inserting your bookmarks in the data"
$SED -i "/NumberOfPages/r $EXTRACT_FILE" "$pdf_data"

echo "Creating new pdf with your bookmarks..."
pdftk "$pdf" update_info_utf8 "$pdf_data" output "${pdf%.*}""_new.pdf"

echo "Deleting leftovers"
rm "$EXTRACT_FILE" "$pdf_data"

quick way/tips to prepare TOC text file into booky format?

Hello, do you have any regex suggestions or tips for quickly converting a TOC text file into the format booky requires?
I am very new to regex, so any help would be super.

I was trying to find a tool where I could create a template for one chapter of the TOC and then apply that template to all the other chapters, kind of like Excel's "paste special" feature.

For example:
1 Insert { at the beginning of each TOC block, and } at the end of each TOC block
2 Replace a TOC item's leading dots (.......67) with the booky required format of /67
e.g. TOCitem ........67 ==> TOCitem/67
3 Replace TOCitem (space space67) or (space space,67) or (space space space67) with the booky required format of /67
4 Automate indentation of all child TOC items

Or maybe there is a repository of regex samples that apply to TOC manipulation.
I use the Sublime Text editor and could not find any specific snippets for TOC text file manipulation.

Thank you
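
Steps 2 and 3 of the list above could be approximated with a single regular expression, as in the rough sketch below. The function name toc_line_to_booky is hypothetical. The separator is left as a parameter because this page shows both /67 (in this request) and a comma followed by a space (in the combined script and the other reports). Adding the braces and indentation (steps 1 and 4) is not attempted here.

```python
import re

# Title, then any run of dots, commas and whitespace, then a trailing page number.
_TOC_LINE = re.compile(r"^(?P<title>.*?)[\s.,]+(?P<page>\d+)\s*$")


def toc_line_to_booky(line, sep=", "):
    """Rewrite 'Title ........67' as 'Title<sep>67'.

    Lines without a trailing page number (for example '{' or '}')
    are returned stripped but otherwise unchanged.
    """
    stripped = line.strip()
    match = _TOC_LINE.match(stripped)
    if not match:
        return stripped
    return f"{match.group('title')}{sep}{match.group('page')}"
```

A caveat of this approach: a heading that has no page number but ends in a digit, such as "Chapter 5", would be read as title "Chapter" on page 5, so the result still needs a manual check.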

keeps failing

Hi, I hope you can help me work out what I am doing wrong.
macOS version: 10.15.6
pdftk version: pdftk_server-2.02-mac_osx-10.11-setup

The book is called book.pdf.
The text file containing the TOC is TOC.txt.

I ran the script in Terminal, and it says:

(base) XXX@XXXs-MacBook-Air booky % ./booky.sh book.pdf TOC.txt
Converting TOC.txt to pdftk compatible format
Dumping pdf meta data...
Clear dumped data of any previous bookmarks
sed: 1: "book_data.txt": undefined label 'ook_data.txt'
Inserting your bookmarks in the data
sed: 1: "book_data.txt": undefined label 'ook_data.txt'
Creating new pdf with your bookmarks...
Deleting leftovers

No new bookmarks were created in the PDF file.
I checked that I have { } around the bookmarks.
I checked that I have ", " (comma, space) between each bookmark and its page number.

Does it handle - in a bookmark name?
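
The "undefined label 'ook_data.txt'" message is consistent with the BSD sed that ships with macOS: there, -i expects a backup suffix as its next argument, so the sed expression is consumed as the suffix and "book_data.txt" is parsed as a script, where b is the branch command. GNU sed (gsed on macOS, which the combined script on this page selects) accepts -i without a suffix. As one way around sed entirely, the two sed steps could be done in Python, as in the sketch below; merge_bookmarks is a hypothetical name, not something booky provides.

```python
def merge_bookmarks(pdf_data, bookmarks):
    """Mimic the two sed steps on the dumped metadata text.

    1. Drop every line containing 'Bookmark' (like sed '/Bookmark/d').
    2. Insert the new bookmark lines after each line containing
       'NumberOfPages' (like sed '/NumberOfPages/r FILE').
    """
    out = []
    for line in pdf_data.splitlines():
        if "Bookmark" in line:
            continue  # remove any previous bookmark records
        out.append(line)
        if "NumberOfPages" in line:
            out.extend(bookmarks.splitlines())
    return "\n".join(out) + "\n"
```

On the hyphen question: in the parser shown in the combined script on this page, the title is everything before the last comma on the line, so a "-" in a bookmark name should pass through unchanged as far as that parser is concerned.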

suggest to add the offset

I think your solution is great for creating bookmarks automatically. I have observed that many PDFs print page numbers in their table of contents that do not match the actual PDF page numbers, so one has to work out each PDF page number from the contents by hand, which is error-prone. I suggest adding an offset keyword to the bookmark file, for example:

{
Title1, 1
Title2, 2
offset, 5
{
Subtitle1, 3
Subtitle2, 4
{
SubSubtitle1, 5
...
}
}
}

From the point where the offset keyword appears, the offset is automatically added to every page number that follows. With this solution, one only needs to copy the page numbers from the table of contents.
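
A minimal sketch of how the proposed keyword might be handled, modelled on the parsing loop in the combined script on this page. This is not booky's actual behaviour: the name booky_to_pdftk and the treatment of a line whose title is exactly "offset" are assumptions made for illustration, and every non-brace line is assumed to contain a comma followed by an integer.

```python
def booky_to_pdftk(lines):
    """Translate booky-style lines to pdftk bookmark records.

    Hypothetical extension: a line 'offset, N' emits no bookmark and
    instead adds N to every page number that follows it.
    """
    level = 0
    offset = 0
    out = []
    for raw in lines:
        line = raw.strip()
        if line == "{":
            level += 1
        elif line == "}":
            level -= 1
        elif line:
            # Split on the last comma so titles may contain commas.
            title, _, page = line.rpartition(",")
            title = title.strip()
            if title == "offset":
                offset = int(page)
                continue
            out += [
                "BookmarkBegin",
                f"BookmarkTitle: {title}",
                f"BookmarkLevel: {level}",
                f"BookmarkPageNumber: {int(page) + offset}",
            ]
    return out
```

With the example above, Title1 and Title2 would keep pages 1 and 2, while Subtitle1 (printed page 3) would land on PDF page 8. One trade-off of a reserved word is that a real bookmark titled "offset" could no longer be expressed.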
