pdf_hide's Introduction

PDF HIDE

This is a steganographic tool in Python for hiding data in PDF files

This tool is an ongoing effort to bring a novel open-source method of steganography to the public. It can covertly embed arbitrary data inside any PDF file that contains enough text. As a result, no one but the intended recipient suspects the existence of embedded data. The same tool can then be used to extract the concealed data.

This project stems from research conducted at the University of Amsterdam, The Netherlands, in December 2012: Using Steganography to hide messages inside PDF files, written by Fahimeh Alizadeh, Nicolas Canceill, Sebastian Dabkiewicz and Diederik Vandevenne.

Basic usage

pdf_hide [-o <embedded.pdf>] embed <data_file> <innocent.pdf>
pdf_hide [-o <extracted_file>] extract <embedded.pdf>
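
For scripted use, the two commands above can be assembled from Python. This is only a minimal sketch: the helper names `embed_argv` and `extract_argv` are hypothetical, and the `-k` key option is taken from the command lines quoted in the issues further down, not from the usage summary. It assumes `pdf_hide` is on the PATH.

```python
def embed_argv(data_file, cover_pdf, output_pdf, key=None):
    """Build the argument list for an embed run (hypothetical helper)."""
    argv = ['pdf_hide', '-o', output_pdf]
    if key is not None:
        # Global options go before the subcommand, as in the issue reports.
        argv += ['-k', key]
    return argv + ['embed', data_file, cover_pdf]


def extract_argv(embedded_pdf, output_file, key=None):
    """Build the argument list for an extract run (hypothetical helper)."""
    argv = ['pdf_hide', '-o', output_file]
    if key is not None:
        argv += ['-k', key]
    return argv + ['extract', embedded_pdf]


if __name__ == '__main__':
    print(' '.join(embed_argv('secret.txt', 'cover.pdf', 'out.pdf')))
    print(' '.join(extract_argv('out.pdf', 'recovered.txt')))
```

The lists can be passed to `subprocess.run(argv, check=True)`. When no key is given, the tool prompts for one, which avoids exposing the key in the process list.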

Getting started

Please read the guide.

Requirements

This tool is a Python 3 program: it requires a basic Python 3 installation.

It requires QPDF in order to modify compressed PDF files.
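
As an illustration of what QPDF contributes, the sketch below rewrites a PDF into QPDF's uncompressed, editable "QDF" form. The `--qdf` and `--object-streams=disable` options are real qpdf options, but whether pdf_hide invokes qpdf with exactly these flags is an assumption, and the function names are hypothetical.

```python
import shutil
import subprocess


def qdf_command(src, dst):
    """Return a qpdf command line that writes an uncompressed QDF copy of src."""
    return ['qpdf', '--qdf', '--object-streams=disable', src, dst]


def to_qdf(src, dst):
    """Run qpdf, failing early with a clear message if it is not installed."""
    if shutil.which('qpdf') is None:
        raise RuntimeError('qpdf not found on PATH')
    subprocess.run(qdf_command(src, dst), check=True)


if __name__ == '__main__':
    print(' '.join(qdf_command('innocent.pdf', 'innocent.pdf.qdf')))
```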

Additionally, it requires GNU Make and pdflatex to build samples for the tests.

Setup

You can find the latest version packaged on the releases page. The current version is 0.0, available as tgz or zip.

Alternatively, you can clone the git repository at: github.com/ncanceill/pdf_hide.git

You can run the tests with: make tests

You can install the package (as root) on your system's Python path with: make install or ./setup.py install

Project status

Current version is 0.0.

Please check the project status for more details.

Stability

The current version is STABLE. It may be run in production.

Contributions

General rule: any contributions are welcome.

Do not hesitate to drop an issue if you have found a bug, want to see a new feature, wish to suggest an improvement, or simply have a question.

Please check the contribution status if you want to get involved.

License information

This project, including this README, is distributed under the GNU General Public License v3 from the Free Software Foundation.

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see [http://www.gnu.org/licenses/].


Copyright (C) 2013 Nicolas Canceill

pdf_hide's Issues

Changelog: v0.1

Version 0.1 changelog

This is a changelog for version 0.1 final release. It will be closed on the release day.

Please do not comment. If you want to contribute, please drop a new issue.

Changelog

  • Version 0.1 alpha — #16

Bug report: `--custom-range` and custom `--nbits` are incompatible in improved algo

Due to the current implementation, using the improvements together with the LaTeX custom range option is incompatible with any --nbits value other than 4 (the default).

This causes #6 and #7 as a side-effect.

This is deeply rooted in the algo, because of the way the improved version parses the TJ operators. It could be included in a broader effort of further improving the embedding method.
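
For readers unfamiliar with the operator: in a PDF content stream, `TJ` shows an array of strings interleaved with numeric position adjustments, for example `[(A) 120 (W) -95.5 (AY)] TJ`. The sketch below pulls those numbers out of one line. It is not the project's parser; the function name and regular expressions are illustrative assumptions.

```python
import re

# One TJ operator on a single line: "[ ... ] TJ".
# Simplification: a "]" inside a string literal would end the match early.
TJ_RE = re.compile(r'\[(.*?)\]\s*TJ')
# PDF literal strings without nested parentheses, allowing backslash escapes.
STRING_RE = re.compile(r'\((?:\\.|[^\\()])*\)')
NUMBER_RE = re.compile(r'-?\d*\.?\d+')


def tj_numbers(line):
    """Return the numeric adjustments of the first TJ operator in line."""
    match = TJ_RE.search(line)
    if match is None:
        return []
    # Blank out the strings so digits inside them are not mistaken for numbers.
    body = STRING_RE.sub(' ', match.group(1))
    return [float(token) for token in NUMBER_RE.findall(body)]


if __name__ == '__main__':
    print(tj_numbers('[(A) 120 (W) -95.5 (AY)] TJ'))
```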

Refactor code

Need pretty code with messaging, error handling, specifications...

Bug report: `--custom-range` does not support some data/key combinations with improved algo

May be related to #8

Description

When the data is 5 characters (10 bytes, need to test with 11, 12 is ok) or shorter, the improved algo with custom range enabled (and --no-random for embedding) does not find the FlagStr end code.

How to reproduce

  • Use key "abcd" with data "abcd" to get a bad end position
  • Use key "abcd" with data "a" to get an end position not found

For instance:

KEY="abcd"; echo -n "abcd" | ./pdf_hide -vv -o /tmp/test_e.pdf -i -k $KEY --custom-range embed --no-random - /tmp/test.pdf && ./pdf_hide -vv -o /tmp/test_o.pdf -i -k $KEY --custom-range extract /tmp/test_e.pdf

Changelog: v0.0b

Version 0.0 beta changelog

This is a changelog for version 0.0 beta. It will be closed on the release day.

Please do not comment. If you want to contribute, please drop a new issue.

Changelog

  • Refactored code - #2
  • Migrated to Python 3
  • Migrated to argparse - #3 #4 #5

Changelog: v0.0rc0

Version 0.0 rc0 changelog

This is a changelog for version 0.0 release candidate 0. It will be closed on the release day.

Please do not comment. If you want to contribute, please drop a new issue.

Changelog

  • Fixed bugs about custom range and nbits — #6 #7 #8
  • Fixed bugs about custom range — #14
  • Fixed bugs in tests automation — #13
  • Fixed bugs with terminal emulation — mainly c75e87f
  • Packaged portable code — #11
  • Added multi-encoding support for datafile — #12
  • Refactored code — mainly 38b6283 and af0df61

Bug report: Custom redundancy tests can fail for basic algo

May be related to an improper check of the redundancy parameter, or to poor handling during operator selection.

Description

When running the tests, the basic algorithm with custom redundancy may fail due to not finding any valid TJ op.

How to reproduce

Use random seed b'^\xbe\xa1\x12V\xbf\xfd\xf3E\xb1\xf5\x84 x\xc1\x02' in tests.

Bug report: shifted CheckStr with full improvements

EDIT: This is a duplicate of #8.

Testing with full improvements, using RANDOM_SEED=0, results in CheckStr being shifted by 1 when extracting.

Investigating...

See log:

test_algoi_full_embed (__main__.SpecialAlgoImprovedTestCase) ...
DEBUG:  
===== CONFIG =====
DEBUG:  == input: "../sample/test_long.pdf.qdf"
DEBUG:  == redundancy: 0.8444218515250481
DEBUG:  == bit depth: 6
DEBUG:  == using improvements: YES
INFO:   Key: "S3cr3|-"
INFO:   Embedding data, please wait...
DEBUG:  FlagStr1 (CheckStr) (20)    [8, 45, 49, 15, 4, 54, 28, 60, 10, 23, 10, 34, 35, 5, 34, 39, 26, 0, 9, 24]
DEBUG:  FlagStr2 (20)   [7, 41, 36, 27, 33, 46, 27, 55, 41, 14, 28, 53, 41, 35, 62, 0, 21, 30, 31, 21]
DEBUG:  Data (35)   [12, 19, 8, 51, 13, 3, 20, 54, 21, 6, 33, 41, 28, 52, 37, 51, 16, 16, 40, 61, 31, 5, 61, 31, 21, 4, 21, 19, 21, 5, 61, 31, 31, 3, 13]
DEBUG:  Jitter  0
INFO:   Done embedding.
INFO:   Output file: "../sample/test_long.pdf.out.fix.pdf"
DEBUG:  Embedded data (28)  "123456ThisIsA
=|__TEST__|="
DEBUG:  Total nb of TJ ops  7537
DEBUG:  Total nb of TJ ops used 75
DEBUG:  Total nb of TJ ops used for data    35
ok

test_algoi_full_extract (__main__.SpecialAlgoImprovedTestCase) ...
DEBUG:  
===== CONFIG =====
DEBUG:  == input: "../sample/test_long.pdf.out.fix.pdf.qdf"
DEBUG:  == redundancy: 0.8444218515250481
DEBUG:  == bit depth: 6
DEBUG:  == using improvements: YES
INFO:   Key: "S3cr3|-"
INFO:   Input file: "../sample/test_long.pdf.out.fix.pdf"
INFO:   Extracting data, please wait...
DEBUG:  FlagStr (20)    [7, 41, 36, 27, 33, 46, 27, 55, 41, 14, 28, 53, 41, 35, 62, 0, 21, 30, 31, 21]
DEBUG:  End position found  71
DEBUG:  Data Checksum (20)  [22, 57, 5, 40, 55, 44, 36, 8, 55, 38, 1, 55, 43, 39, 49, 2, 43, 51, 55, 54]
DEBUG:  CheckStr (20)   [45, 49, 15, 4, 54, 28, 60, 10, 23, 10, 34, 35, 5, 34, 39, 26, 0, 9, 24, 12]
DEBUG:  Data (32)   [19, 8, 51, 13, 3, 20, 54, 21, 6, 33, 41, 28, 52, 37, 51, 16, 16, 40, 61, 31, 61, 31, 21, 4, 21, 19, 21, 61, 31, 31, 3, 13]
ERROR:  CheckStr does not match embedded data
M�\Ò\ÐB�_õõDU5}}ðÍcorrupted) (24)   L�Í
FAIL

Changelog: v0.0

Version 0.0 changelog

This is a changelog for version 0.0 final release. It will be closed on the release day.

Please do not comment. If you want to contribute, please drop a new issue.

Changelog

  • Cosmetic
  • Fixed a bug with some combinations of options — 2c86318

Changelog: v0.1a

Version 0.1 alpha changelog

This is a changelog for version 0.1 alpha. It will be closed on the release day.

Please do not comment. If you want to contribute, please drop a new issue.

Changelog

Please send me a sample data_file

Hi

Please send me a sample data_file in your command as follows :
python3 pdf_hide -k test -o target.pdf embed --no-random data_file mozilla.pdf -v

My data_file contents are:
This book is authorized to Kris

The output :
File "pdf_hide", line 195, in
main()
File "pdf_hide", line 170, in main
result = ps.embed(args.data.read(),args.key,norandom=args.norandom)
File "/home/hp/pdf_hide-master/pdfhide/pdf_algo.py", line 381, in embed
m = re.match(r'[(.*?)][ ]?TJ',line_[k:])
KeyboardInterrupt
It is taking a very long time and seems to wait indefinitely. Please help.

Regards
kris

Running on windows

Hi!

I understand this program is incompatible with Windows: I tried running it and it gave me an error related to qpdf. So I installed qpdf but now it says this:

====================
This is PDF_HIDE v0.0
====================
Please enter key:
'rm' is not recognized as an internal or external command,
operable program or batch file.
ERROR:  Not enough space available (only 0 available, 220 needed)

I understand rm is used to delete a temporary file, but what about Not enough space available?

Can you please provide some directions on how one can run this on Win10?

Many thanks!
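
The `'rm' is not recognized` message suggests, as the reporter guesses, that a temporary file is deleted by shelling out to the Unix `rm` command, which Windows does not provide. A portable way to do the same thing from Python is sketched below. The function name is hypothetical and this is not the project's code. It does not address the separate "Not enough space available" error.

```python
import os
import tempfile


def remove_quietly(path):
    """Delete a file on any platform; do nothing if it is already gone."""
    try:
        os.remove(path)
    except FileNotFoundError:
        pass


if __name__ == '__main__':
    # Create a throwaway file, then delete it without calling an external tool.
    handle, tmp_path = tempfile.mkstemp(suffix='.qdf')
    os.close(handle)
    remove_quietly(tmp_path)
    print(os.path.exists(tmp_path))
```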

ERROR: Not enough space available, though adequate disk space available!

Hi,

Trying to test the script without installing (directly from the pdf_hide executable from git repo).

$ ./pdf_hide -o ~/Downloads/tomb_manpage_pdf_hide.pdf embed ~/Downloads/pdf_hide_embedded.txt ~/Downloads/tomb_manpage.pdf 
====================
This is PDF_HIDE v0.0
====================
Please enter key: 
ERROR:	Not enough space available (only 15 available, 150 needed)

Strange that it reports inadequate space even though 4.4TB of disk space is available!

Debug run shows:

$ pdf_hide -vvvvv -i -o tomb_manpage_pdf_hide.pdf embed ~/Downloads/pdf_hide_embedded.txt ~/Downloads/tomb_manpage.pdf
====================
This is PDF_HIDE v0.0
====================

PDF_HIDE  Copyright (C) 2013  Nicolas Canceill
Distributed under GNU General Public License v3
This program comes with ABSOLUTELY NO WARRANTY.
This is free software, and you are welcome to
redistribute it under certain conditions.
Please see LICENSE.md or http://www.gnu.org/licenses/ for details.

Please enter key: 
INFO:	Input file: "/home/zenny/Downloads/tomb_manpage.pdf"
INFO:	Embedding data, please wait...
DEBUG:	===== BEGIN CONFIG =====
DEBUG:	redundancy
	float
	0.1
DEBUG:	input
	str	[31]
	"tomb_manpage_pdf_hide.pdf.qdf"
DEBUG:	bit depth
	int
	4
DEBUG:	improvements
	bool
	True
DEBUG:	FlagStr1 (CheckStr)
	list	[20]
	[10, 3, 11, 12, 14, 3, 4, 9, 11, 3, 8, 8, 9, 2, 14, 3, 1, 13, 9, 12]
DEBUG:	Data to embed
	bytes	[55]
	b'Some random text.\n'
DEBUG:	Data to embed (binary)
	str	[440]
	01010100011010000110100101110011001000000110100101110011001000000110100101110011011100110111010101100101011001000010000001110100011011110010000001110011011100000110010101100011011010010110011001101001011000110010000001110000011001010111001001110011011011110110111000100000011101110110100101110100011010000010000001110011011100000110010101100011011010010110011001101001011000110010000001100101011011010110000101101001011011000010111000001010
DEBUG:	Data
	list	[110]
	[5, 4, 6, 8, 6, 9, 7, 3, 2, 0, 6, 9, 7, 3, 2, 0, 6, 9, 7, 3, 7, 3, 7, 5, 6, 5, 6, 4, 2, 0, 7, 4, 6, 15, 2, 0, 7, 3, 7, 0, 6, 5, 6, 3, 6, 9, 6, 6, 6, 9, 6, 3, 2, 0, 7, 0, 6, 5, 7, 2, 7, 3, 6, 15, 6, 14, 2, 0, 7, 7, 6, 9, 7, 4, 6, 8, 2, 0, 7, 3, 7, 0, 6, 5, 6, 3, 6, 9, 6, 6, 6, 9, 6, 3, 2, 0, 6, 5, 6, 13, 6, 1, 6, 9, 6, 12, 2, 14, 0, 10]
DEBUG:	FlagStr2
	list	[20]
	[1, 0, 13, 4, 0, 14, 6, 10, 5, 4, 3, 0, 0, 5, 2, 12, 13, 12, 2, 0]
DEBUG:	===== END CONFIG =====
ERROR:	Not enough space available (only 37 available, 150 needed)

Cheers,
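
Reading the debug output above, the figure of 150 appears to count embedding slots, not disk space. The 55 data bytes become 440 bits, which are split into 110 values of 4 bits each (the bit depth), and the two 20-value flag strings bring the total to 150. The sketch below reproduces that arithmetic. The function names, the zero-padding of a short final chunk, and the assumption that "available" counts usable values in the PDF are inferences from the log, not confirmed behaviour.

```python
def to_symbols(data, nbits=4):
    """Split bytes into integers of nbits bits each, most significant bit first."""
    bits = ''.join(f'{byte:08b}' for byte in data)
    # Assumption: a short final chunk is padded with zero bits on the right.
    if len(bits) % nbits:
        bits += '0' * (nbits - len(bits) % nbits)
    return [int(bits[i:i + nbits], 2) for i in range(0, len(bits), nbits)]


def slots_needed(data, nbits=4, flag_len=20):
    """FlagStr1 + data symbols + FlagStr2, matching the lengths in the log."""
    return flag_len + len(to_symbols(data, nbits)) + flag_len


if __name__ == '__main__':
    print(to_symbols(b'This '))      # compare with the start of the Data list
    print(slots_needed(bytes(55)))   # 55 bytes at bit depth 4
```

If this reading is right, a cover PDF with more text, or a shorter payload, would be needed for the embed to succeed.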

Bug report: FlagStr not found with full improvements

EDIT: This is a duplicate of #8.

Testing with full improvements, using RANDOM_SEED=123456, results in FlagStr not being found when extracting.

Investigating...

See log:

test_algoi_full_embed (__main__.SpecialAlgoImprovedTestCase) ...
DEBUG:  
===== CONFIG =====
DEBUG:  == input: "../sample/test_long.pdf.qdf"
DEBUG:  == redundancy: 0.8056271362589
DEBUG:  == bit depth: 6
DEBUG:  == using improvements: YES
INFO:   Key: "S3cr3|-"
INFO:   Embedding data, please wait...
DEBUG:  FlagStr1 (CheckStr) (20)    [8, 45, 49, 15, 4, 54, 28, 60, 10, 23, 10, 34, 35, 5, 34, 39, 26, 0, 9, 24]
DEBUG:  FlagStr2 (20)   [7, 41, 36, 27, 33, 46, 27, 55, 41, 14, 28, 53, 41, 35, 62, 0, 21, 30, 31, 21]
DEBUG:  Data (35)   [12, 19, 8, 51, 13, 3, 20, 54, 21, 6, 33, 41, 28, 52, 37, 51, 16, 16, 40, 61, 31, 5, 61, 31, 21, 4, 21, 19, 21, 5, 61, 31, 31, 3, 13]
DEBUG:  Jitter  0
INFO:   Done embedding.
INFO:   Output file: "../sample/test_long.pdf.out.fix.pdf"
DEBUG:  Embedded data (28)  "123456ThisIsA
=|__TEST__|="
DEBUG:  Total nb of TJ ops  7537
DEBUG:  Total nb of TJ ops used 75
DEBUG:  Total nb of TJ ops used for data    35
ok
test_algoi_full_extract (__main__.SpecialAlgoImprovedTestCase) ...
DEBUG:  
===== CONFIG =====
DEBUG:  == input: "../sample/test_long.pdf.out.fix.pdf.qdf"
DEBUG:  == redundancy: 0.8056271362589
DEBUG:  == bit depth: 6
DEBUG:  == using improvements: YES
INFO:   Key: "S3cr3|-"
INFO:   Input file: "../sample/test_long.pdf.out.fix.pdf"
INFO:   Extracting data, please wait...
DEBUG:  FlagStr (20)    [7, 41, 36, 27, 33, 46, 27, 55, 41, 14, 28, 53, 41, 35, 62, 0, 21, 30, 31, 21]
ERROR:  Ending code FlagStr not found
FAIL
