tcpdi_parser's Introduction

tcpdi_parser

Parser for use with TCPDI, based on TCPDF_PARSER. Supports PDFs up to v1.7.

See pauln/tcpdi for installation and usage instructions.

tcpdi_parser's People

Contributors

audiomason, pauln

tcpdi_parser's Issues

Uninitialized string offset on line 828

Uninitialized string offset: 59 application/third_party/tcpdf/tcpdi_parser.php 828
Line 828 is:

$frag = $data{$offset} . @$data{$offset+1} . @$data{$offset+2} . @$data{$offset+3};

It should be changed to:

$frag = isset($data{$offset}) ? $data{$offset} : '';
$frag .= isset($data{$offset+1}) ? $data{$offset+1} : '';
$frag .= isset($data{$offset+2}) ? $data{$offset+2} : '';
$frag .= isset($data{$offset+3}) ? $data{$offset+3} : '';
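
The same guard can be written more compactly. This is a minimal sketch, assuming the only goal is to read up to four bytes starting at `$offset` without touching out-of-range positions; `readFragment` is a hypothetical helper name, not part of tcpdi_parser:

```php
<?php
// Hypothetical helper (not part of tcpdi_parser): read up to $length bytes
// starting at $offset, returning '' when the offset is out of range.
function readFragment(string $data, int $offset, int $length = 4): string
{
    if ($offset < 0 || $offset >= strlen($data)) {
        return '';
    }
    // substr() stops at the end of the string, so a short tail is fine.
    return substr($data, $offset, $length);
}

$frag = readFragment('abcdef', 4); // only two bytes are left at offset 4
```

This also avoids the curly-brace string offset syntax (`$data{$offset}`), which was deprecated in PHP 7.4 and removed in PHP 8.0.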

PDF merging fails on some 1.5 version documents

Notice: Undefined offset: 1 in E:\Projects\pauln\tcpdi_parser.php on line 587
Notice: Undefined offset: 2 in E:\Projects\pauln\tcpdi_parser.php on line 587
Notice: Undefined offset: 3 in E:\Projects\pauln\tcpdi_parser.php on line 587
Notice: Undefined offset: 4 in E:\Projects\pauln\tcpdi_parser.php on line 587

Line 587 in tcpdi_parser.php is:
$sdata[$k][$c] += ($row[$i] << (($wb[$c] - 1 - $b) * 8));

Here is a link to a sample PDF:
https://www.dropbox.com/s/8lgglas6kbacsyg/R-intro.pdf?dl=0

Illegal string offset 'Type'

Dear Paul,

I can't merge the attached file because it stops with "Illegal string offset 'Type'" at line 704 of the tcpdi_parser file.

protected function getRawObject($offset=0, $data=null) {
if ($data == null) {
$data =& $this->pdfdata;
}
$objtype = ''; // object type to be returned
$objval = ''; // object value to be returned
// skip initial white space chars: \x00 null (NUL), \x09 horizontal tab (HT), \x0A line feed (LF), \x0C form feed (FF), \x0D carriage return (CR), \x20 space (SP)
while (strspn($data{$offset}, "\x00\x09\x0a\x0c\x0d\x20") == 1) {
$offset++;
}

Can you help me with this problem?

8436.pdf

Allowed memory size exhausted

Hello

I think I have a similar issue to #8 -

Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 57065 bytes) in /var/www/html/rmtest/tcpdi_parser.php on line 852

I'm using raw data, but if you would like the pdf I can send it privately if required.

All the best

Ken

PHP v7.4.2 error: Trying to access array offset on value of type bool

File: tcpdi_parser.php
Version: 1.1
Line: 1429
Method: _getPageRotation()
private function _getPageRotation($obj) { // $obj = /Page
        $obj = $this->getObjectVal($obj);
        if (isset ($obj[1][1]['/Rotate'])) {
            $res = $this->getObjectVal($obj[1][1]['/Rotate']);
            if ($res[0] == PDF_TYPE_OBJECT)
                return $res[1];
            return $res;
        } else {
            if (!isset ($obj[1][1]['/Parent'])) {
                return false;
            } else {
                $res = $this->_getPageRotation($obj[1][1]['/Parent']);
               // <<<<< -------   LINE 1429 
                if ($res && $res[0] == PDF_TYPE_OBJECT)
                    return $res[1];
                return $res;
            }
        }
    } 

Fixed with:

 if ($res && $res[0] == PDF_TYPE_OBJECT)
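
A slightly stricter form of the same guard, as a minimal sketch: it assumes the lookup result is either `false` or an array of the form `[type, value]`. `unwrapRotation` and `SKETCH_TYPE_OBJECT` are hypothetical names; the constant is a placeholder standing in for the parser's `PDF_TYPE_OBJECT`.

```php
<?php
// Placeholder for the parser's PDF_TYPE_OBJECT constant (value is an assumption).
const SKETCH_TYPE_OBJECT = 'object';

// Hypothetical helper: unwrap an object result only when it really is an
// array shaped like [type, value]; pass anything else (e.g. false) through.
function unwrapRotation($res)
{
    if (is_array($res) && isset($res[0], $res[1]) && $res[0] === SKETCH_TYPE_OBJECT) {
        return $res[1];
    }
    return $res;
}
```

Checking `is_array()` before indexing avoids the PHP 7.4 notice when the parent lookup returns `false`.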

Blank page when writing on a v.1.5 PDF file

Hi,

I'm trying to write on a v1.5 PDF file, but it doesn't work. The original file has 7 pages; the new one has only a single blank page with a black line at the top. For privacy reasons I can't attach the file.

I'm using a try - catch block to catch exceptions but I don't get any error...
Here is my code:
`
require('../../../lib/tcpdf/tcpdf.php');
require('../../../lib/tcpdf/tcpdi.php');
try{
$pdf = new TCPDI();
$pageCount = $pdf->setSourceData($content);

$pdf->SetDrawColor(255,0,0);
$pdf->SetTextColor(255,0,0);
$pdf->SetPrintFooter(false);

$pdf->SetFont('helvetica','',8);

for ($pageNo = 1; $pageNo <= $pageCount; $pageNo++) {
	$tplIdx = $pdf->importPage($pageNo);

	$size = $pdf->getTemplateSize($tplIdx);
	$w = $size['w'];
	$h = $size['h'];
					
	if ($h > $w) { // The PDF is portrait
		$pdf->AddPage("P");
	} else { // The PDF is landscape
		$pdf->AddPage("L");
	}
					
	$pdf->useTemplate($tplIdx);							
	$pdf->SetXY(0.2,5,true);
	$larghezzaFoglio = $pdf->getPageWidth();
	$pdf->Cell($larghezzaFoglio-0.4, 5, "  ".$stringaTimbro, 1,1,'L');
}
$pdf->Output($name, "D");

} catch (Exception $e) {
echo($e);
}
`

I'm using FPDF_TPL - Version 1.2.3, TCPDF Version 6.2.12, TCPDF parser Version 1.0.16, TCPDI parser Version 1.1, TCPDI - Version 1.0.
Please let me know if you need more information.

TCPDF_PARSER ERROR: Unknown PNG predictor 1

Hi,

I'm trying to use your tcpdi and tcpdi_parser to handle PDF files in PHP, but I've run into a problem. My old PDFs are version 1.4 and work fine, but my new PDFs are version 1.5 and fail. I get the following error message when I try to import them with tcpdi:
TCPDF_PARSER ERROR: Unknown PNG predictor 1

Here is the code snippet I used for importing:

setSourceFile($tempfile); //$tempfile is the 1.5 version PDF to import ?>

My PHP script stops running after the $pagecount line.

PHP 7.2 error: A non-numeric value encountered (tcpdi_parser.php, line 1052)

I've got a special set of pdfs which cause this error in the following line:

$this->objstreamobjs[$ints[$j-1]] = array($key, $ints[$j]+$first);

It seems to be related to the warning described here: http://php.net/manual/en/migration71.other-changes.php

According to XDebug it tries to do the following but fails:

$this->objstreamobjs["<raw binary stream bytes>"] = [[3, 0], 218];
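
The header of a decoded object stream is a run of integer pairs (object number, then byte offset relative to the first object), so binary bytes showing up as a key suggest the parser is reading data that is not a decoded header. A minimal sketch of validating the tokens before doing arithmetic on them; `parseObjStreamHeader` is a hypothetical helper, and unlike the line above it stores only the absolute offset per object number:

```php
<?php
// Hypothetical helper: parse "objnum offset objnum offset ..." into
// [objnum => absolute offset]. Returns null when the header does not
// consist purely of integer pairs (e.g. still-compressed binary data).
function parseObjStreamHeader(string $header, int $first): ?array
{
    $tokens = preg_split('/\s+/', trim($header), -1, PREG_SPLIT_NO_EMPTY);
    if (count($tokens) % 2 !== 0) {
        return null;
    }
    $offsets = [];
    for ($j = 1; $j < count($tokens); $j += 2) {
        // ctype_digit() accepts only non-empty strings made of 0-9.
        if (!ctype_digit($tokens[$j - 1]) || !ctype_digit($tokens[$j])) {
            return null;
        }
        $offsets[(int) $tokens[$j - 1]] = (int) $tokens[$j] + $first;
    }
    return $offsets;
}
```

A caller could treat a `null` result as "stream not decoded" and report an error instead of triggering the non-numeric warning.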

Comments in trailer cause parser to fail

This trailer section will cause parsing to fail with an "Unable to find trailer" error:

trailer
<</Root 71 0 R/ID [<301161dc6cb4aa570159c409124baab9>]/Info 72 0 R/Size 73>>
%comments-here
startxref
143557
%%EOF

The regex pattern on line 408 can be changed to allow comment lines.
Old:

'/trailer[\s]*<<(.*)>>[\s]*[\r\n]+startxref[\s]*[\r\n]+/isU'

New:

'/trailer[\s]*<<(.*)>>[\s]*[\r\n]+(?:[%].*[\r\n])*startxref[\s]*[\r\n]+/isU'

I'm not 100% certain that comments are allowed in the trailer section, but I've encountered some files that have them. These files appear to have been created by iText.
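
The effect of the change can be checked against the trailer shown above. This is a minimal sketch, assuming both patterns use `*` quantifiers after `[\s]`, after `.`, and after the comment group; the variable names are illustrative only:

```php
<?php
// Patterns as assumed above: the new one allows %-comment lines before startxref.
$old = '/trailer[\s]*<<(.*)>>[\s]*[\r\n]+startxref[\s]*[\r\n]+/isU';
$new = '/trailer[\s]*<<(.*)>>[\s]*[\r\n]+(?:[%].*[\r\n])*startxref[\s]*[\r\n]+/isU';

// The trailer section from this report, with LF line endings.
$trailer = "trailer\n"
    . "<</Root 71 0 R/ID [<301161dc6cb4aa570159c409124baab9>]/Info 72 0 R/Size 73>>\n"
    . "%comments-here\n"
    . "startxref\n"
    . "143557\n"
    . "%%EOF\n";

$oldMatched = preg_match($old, $trailer);           // comment line blocks the match
$newMatched = preg_match($new, $trailer, $matches); // comment line is skipped
$dictionary = $newMatched === 1 ? $matches[1] : null;
```

With the new pattern, the first capture group still holds the trailer dictionary contents, so the code that consumes the match should not need to change.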

Uninitialized string offset: 34

I've been using this library for a while and it has been great so far. Today I got an error after a user tried to upload a PDF made using LaTeX.

A Screenshot of the error:

image

Here are the PDF properties:
image

And here you can download the PDF file: https://goo.gl/SJyuG2

Hope you guys can help.

Running out of memory in tcpdi_parser

I've been using this package for over a year now and suddenly started getting memory exhaustion errors:
PHP Fatal error: Allowed memory size of 268435456 bytes exhausted (tried to allocate 20480 bytes) in <[...]>/tcpdi_parser.php on line 882
Here's my code:

public function downloadFiles($files, $name = 'name.pdf'){
      $pdf = new \TCPDI();
      $pdf->SetPrintHeader(false);
      $pdf->SetPrintFooter(false);
      $filesArr = [];
      $pdf->SetTitle($name);
      foreach($files as $f){
        $filePath = $f->getFirstMedia('pdf')->getFullUrl();
        $width = $f->sheetsize->width;
        $height = $f->sheetsize->height;
        $port_land = ($width > $height) ? "landscape" : "portrait";
        $pdf->AddPage($port_land, array($width, $height));

        $pageCount = $pdf->setSourceFile($filePath);
        for($i = 1; $i <= $pageCount; $i+=1)
        {
          $tplId = $pdf->importPage($i);
          $pdf->useTemplate($tplId, 0, 0, $width);
          if($pageCount > $i){
            $pdf->AddPage();
          }
        }

      }
      $name = str_replace('.pdf', '', $name).'.pdf';
      $pdf->Output($name, "I");
    }

I'm attaching one of the PDFs that causes this error. I'm really lost as to where I should even begin to look for the problem.
problems.pdf

Parse form fields...

I couldn't find any mention of this in the code, but does this currently support parsing form fields and getting their x,y positions, or even better, filling them in?

PHP 7.2 error: count(): Parameter must be an array or an object that implements Countable

Since switching to PHP 7.2.4, we discovered this problem with our PDF merging tool, which makes use of this library.

Stacktrace:

[2018-04-26 16:40:24] local.ERROR: count(): Parameter must be an array or an object that implements Countable {"userId":123,"email":"[email protected]","exception":"[object] (ErrorException(code: 0): count(): Parameter must be an array or an object that implements Countable at /Users/username/dev/projectname/vendor/lynx39/lara-pdf-merger/src/LynX39/LaraPdfMerger/tcpdf/tcpdi_parser.php:486)
[stacktrace]
#0 /Users/username/dev/projectname/vendor/lynx39/lara-pdf-merger/src/LynX39/LaraPdfMerger/tcpdf/tcpdi_parser.php(486): Illuminate\\Foundation\\Bootstrap\\HandleExceptions->handleError(2, 'count(): Parame...', '/Users/username...', 486, Array)
#1 /Users/username/dev/projectname/vendor/lynx39/lara-pdf-merger/src/LynX39/LaraPdfMerger/tcpdf/tcpdi_parser.php(356): tcpdi_parser->decodeXrefStream(3979271, Array)
#2 /Users/username/dev/projectname/vendor/lynx39/lara-pdf-merger/src/LynX39/LaraPdfMerger/tcpdf/tcpdi_parser.php(195): tcpdi_parser->getXrefData()
#3 /Users/username/dev/projectname/vendor/lynx39/lara-pdf-merger/src/LynX39/LaraPdfMerger/tcpdf/tcpdi.php(122): tcpdi_parser->__construct('%PDF-1.6\
%\\xE2\\xE3\\xCF\\xD3\
...', '/Users/username...')
#4 /Users/username/dev/projectname/vendor/lynx39/lara-pdf-merger/src/LynX39/LaraPdfMerger/tcpdf/tcpdi.php(89): TCPDI->_getPdfParser('/Users/username...')
#5 /Users/username/dev/projectname/vendor/lynx39/lara-pdf-merger/src/LynX39/LaraPdfMerger/PdfManage.php(60): TCPDI->setSourceFile('/Users/username...')
#6 /Users/username/dev/projectname/app/Http/Controllers/ExportController.php(56): LynX39\\LaraPdfMerger\\PdfManage->merge('browser', '123-12-12333223...')
#7 [internal function]: App\\Http\\Controllers\\ExportController->App\\Http\\Controllers\\{closure}()
#8 /Users/username/dev/projectname/vendor/symfony/http-foundation/StreamedResponse.php(114): call_user_func(Object(Closure))
#9 /Users/username/dev/projectname/vendor/symfony/http-foundation/Response.php(367): Symfony\\Component\\HttpFoundation\\StreamedResponse->sendContent()
#10 /Users/username/dev/projectname/public/index.php(58): Symfony\\Component\\HttpFoundation\\Response->send()
#11 /Users/username/.composer/vendor/laravel/valet/server.php(147): require('/Users/username...')
#12 {main}
"}

The line in tcpdi_parser.php is this one:

    } elseif (($key == '/Index') AND ($v[0] == PDF_TYPE_ARRAY AND count($v[1] >= 2))) {

count($v[1] >= 2) is the cause of the error here.

PHP 7.2 Backward incompatible changes:

http://php.net/manual/en/migration72.incompatible.php

The count() functions need to be replaced, or the objects used in them need to be Countable.

I don't have a solution to offer yet. Any ideas why the count is there?
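
One plausible reading is that the closing parenthesis is misplaced: `count($v[1] >= 2)` passes the boolean result of the comparison to `count()`, whereas `count($v[1]) >= 2` counts the array and then compares. A minimal sketch of the check under that assumption; `isUsableIndexEntry` and `SKETCH_TYPE_ARRAY` are hypothetical names, and the constant is a placeholder for the parser's `PDF_TYPE_ARRAY`:

```php
<?php
// Placeholder for the parser's PDF_TYPE_ARRAY constant (value is an assumption).
const SKETCH_TYPE_ARRAY = 'array';

// Hypothetical helper: true when $v looks like [PDF_TYPE_ARRAY, [...]] and
// the inner array has at least two entries, as an /Index entry requires.
function isUsableIndexEntry(string $key, $v): bool
{
    return $key === '/Index'
        && is_array($v)
        && isset($v[0], $v[1])
        && $v[0] === SKETCH_TYPE_ARRAY
        && is_array($v[1])
        && count($v[1]) >= 2; // count the array first, then compare
}
```

Here `count()` only ever receives an array, which is what PHP 7.2 and later expect.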

Copy and paste?

If I just copy and paste this in place of the parser in mPDF, would it work?

Uninitialized string offset: 35 on line 725 tcpdi_parser.php

I get this error:
Uninitialized string offset: 35 application/third_party/tcpdf/tcpdi_parser.php 725

Line 725 is in the method getRawObject:

while (strspn($data[$offset], "\x00\x09\x0a\x0c\x0d\x20") == 1) {
    $offset++;
}

Any idea how to change it? I don't understand what getRawObject should return in this case.
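
One way to avoid reading past the end of the data is to check the offset against the string length before indexing. A minimal sketch, assuming the loop's only job is to advance past the six PDF whitespace bytes; `skipWhitespace` is a hypothetical helper, and what getRawObject should return once the data is exhausted is left open here:

```php
<?php
// Hypothetical helper: return the offset of the first non-whitespace byte
// at or after $offset. An out-of-range offset is returned unchanged.
function skipWhitespace(string $data, int $offset): int
{
    if ($offset < 0 || $offset >= strlen($data)) {
        return $offset;
    }
    // strspn() with a start offset counts the leading run of whitespace
    // bytes and stops at the end of the string on its own.
    return $offset + strspn($data, "\x00\x09\x0a\x0c\x0d\x20", $offset);
}
```

The caller can then compare the returned offset with `strlen($data)` to detect that nothing is left to parse.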

Stream length wrapped in an object in PDF-1.3 and PDF-1.4 causes infinite loop

I've been trying to import PDF files using the following code:

$pdf = new TCPDI(PDF_PAGE_ORIENTATION, PDF_UNIT, PDF_PAGE_FORMAT, true, 'UTF-8', false);

// Import template
$pdf->AddPage ();
$pdf->setSourceFile ($path);
$idx = $pdf->importPage (1);
$pdf->useTemplate ($idx);

echo $pdf->Output ();

I had no problems with 1.5 and 1.7 PDF versions, but when I try it with 1.3 or 1.4 versions, the loop in getIndirectObject() never ends.

An example of a PDF not working: https://www.dropbox.com/s/9ax2lc5fed4erit/1.pdf

I've been trying to understand what is wrong, but I don't know enough about PDF formats.

Thanks

PDF 1.5 FILE PARSE PROBLEM

Dear Paul,
The parser crashes because it is not able to correctly parse this kind of PDF object:

`.. 4 0 obj
[/Indexed
75 0 R
255
<000000 010101 020202 030303 040404 050505 060606 070707 080808 090909 0A0A0A 0B0B0B 0C0C0C 0D0D0D 0E0E0E 0F0F0F 101010 111111 121212 131313 141414 151515 161616 171717 181818 191919 1A1A1A 1B1B1B 1C1C1C 1D1D1D 1E1E1E 1F1F1F 202020 212121 222222 232323 242424 252525 262626 272727 282828 292929 2A2A2A 2B2B2B 2C2C2C 2D2D2D 2E2E2E 2F2F2F 303030 313131 323232 333333 343434 353535 363636 373737 383838 393939 3A3A3A 3B3B3B 3C3C3C 3D3D3D 3E3E3E 3F3F3F
404040 414141 424242 434343 444444 454545 464646 474747 484848 494949 4A4A4A 4B4B4B 4C4C4C 4D4D4D 4E4E4E 4F4F4F 505050 515151 525252 535353 545454 555555 565656 575757 585858 595959 5A5A5A 5B5B5B 5C5C5C 5D5D5D 5E5E5E 5F5F5F 606060 616161 626262 636363 646464 656565 666666 676767 686868 696969 6A6A6A 6B6B6B 6C6C6C 6D6D6D 6E6E6E 6F6F6F 707070 717171 727272 737373 747474 757575 767676 777777 787878 797979 7A7A7A 7B7B7B 7C7C7C 7D7D7D 7E7E7E 7F7F7F
808080 818181 828282 838383 848484 858585 868686 878787 888888 898989 8A8A8A 8B8B8B 8C8C8C 8D8D8D 8E8E8E 8F8F8F 909090 919191 929292 939393 949494 959595 969696 979797 989898 999999 9A9A9A 9B9B9B 9C9C9C 9D9D9D 9E9E9E 9F9F9F A0A0A0 A1A1A1 A2A2A2 A3A3A3 A4A4A4 A5A5A5 A6A6A6 A7A7A7 A8A8A8 A9A9A9 AAAAAA ABABAB ACACAC ADADAD AEAEAE AFAFAF B0B0B0 B1B1B1 B2B2B2 B3B3B3 B4B4B4 B5B5B5 B6B6B6 B7B7B7 B8B8B8 B9B9B9 BABABA BBBBBB BCBCBC BDBDBD BEBEBE BFBFBF
C0C0C0 C1C1C1 C2C2C2 C3C3C3 C4C4C4 C5C5C5 C6C6C6 C7C7C7 C8C8C8 C9C9C9 CACACA CBCBCB CCCCCC CDCDCD CECECE CFCFCF D0D0D0 D1D1D1 D2D2D2 D3D3D3 D4D4D4 D5D5D5 D6D6D6 D7D7D7 D8D8D8 D9D9D9 DADADA DBDBDB DCDCDC DDDDDD DEDEDE DFDFDF E0E0E0 E1E1E1 E2E2E2 E3E3E3 E4E4E4 E5E5E5 E6E6E6 E7E7E7 E8E8E8 E9E9E9 EAEAEA EBEBEB ECECEC EDEDED EEEEEE EFEFEF F0F0F0 F1F1F1 F2F2F2 F3F3F3 F4F4F4 F5F5F5 F6F6F6 F7F7F7 F8F8F8 F9F9F9 FAFAFA FBFBFB FCFCFC FDFDFD FEFEFE FFFFFF

]
endobj ...`

I have fixed it (only to avoid the crash) by changing the protected function getRawObject() from line 851, adding a check for a hypothetical fixed string object.

default:
    if (preg_match('/^([0-9]+)[\s]+([0-9]+)[\s]+([Robj]{1,3})/i', substr($data, $offset, 33), $matches) == 1) {
        if ($matches[3] == 'R') {
            // indirect object reference
            $objtype = PDF_TYPE_OBJREF;
            $offset += strlen($matches[0]);
            $objval = array(intval($matches[1]), intval($matches[2]));
        } elseif ($matches[3] == 'obj') {
            // object start
            $objtype = PDF_TYPE_OBJECT;
            $objval = intval($matches[1]).'_'.intval($matches[2]);
            $offset += strlen ($matches[0]);
        }
    } elseif (($numlen = strspn($data, '0123456789ABCDEF', $offset)) > 0) {
        // fixed string object
        if ($numlen == 6) {
            $objval = substr($data, $offset, $numlen);
            $objtype = PDF_TYPE_STRING;
            $offset += 6;
        } elseif (($numlen = strspn($data, '+-.0123456789', $offset)) > 0) {
            // numeric object
            $objval = substr($data, $offset, $numlen);
            $objtype = (intval($objval) != $objval) ? PDF_TYPE_REAL : PDF_TYPE_NUMERIC;
            $offset += $numlen;
        }
    } elseif (($numlen = strspn($data, '+-.0123456789', $offset)) > 0) {
        // numeric object
        $objval = substr($data, $offset, $numlen);
        $objtype = (intval($objval) != $objval) ? PDF_TYPE_REAL : PDF_TYPE_NUMERIC;
        $offset += $numlen;
    }
    unset($matches);
    break;
Best regards
Francesco
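
The palette in the object above is a single PDF hex string (`<000000 010101 ...>`) whose digits are separated by whitespace. In the PDF format, whitespace inside a hex string is ignored, and a missing final digit is treated as 0. A minimal sketch of decoding the text between `<` and `>` under those rules; `decodePdfHexString` is a hypothetical helper, not the parser's own method:

```php
<?php
// Hypothetical helper: decode the contents of a PDF hex string (the text
// between "<" and ">"), ignoring embedded whitespace.
function decodePdfHexString(string $raw): string
{
    // Drop the six PDF whitespace bytes: NUL, HT, LF, FF, CR, SP.
    $hex = str_replace(["\x00", "\x09", "\x0a", "\x0c", "\x0d", "\x20"], '', $raw);
    if ($hex === '') {
        return '';
    }
    if (!ctype_xdigit($hex)) {
        throw new InvalidArgumentException('Not a valid hex string');
    }
    // An odd number of digits means the last digit is followed by an implied 0.
    if (strlen($hex) % 2 === 1) {
        $hex .= '0';
    }
    return hex2bin($hex);
}

$palette = decodePdfHexString("000000 010101\n020202"); // three RGB entries
```

Stripping whitespace before decoding handles the spaces and line breaks in the palette without treating each six-digit group as a separate object.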

Illegal string offset warnings

If a PDF has an object like the following, tcpdf_parser fails to continue parsing the data.

2 0 obj
<< /Type /Page % 1
   /Parent 1 0 R
   /MediaBox [ 0 0 839.314286 1186.971429 ]
   /Contents 4 0 R
   /Group <<
      /Type /Group
      /S /Transparency
      /I true
      /CS /DeviceRGB
   >>
   /Resources 3 0 R
>>

getRawObject() is expected to return an array which contains an object and its offset, but it currently returns an object without its offset if the pdf has % comments. It causes Illegal string offset warnings.

return $obj;

I've created a compact example. Would you please check it?
tcpdi_parser_issue.tar.gz

$ ls
sample.php	source.pdf	tcpdi_parser
$ php sample.php
PHP Warning:  Illegal string offset 'Parffo' in /Users/lancelot/Sandbox/PHP/tcpdi_parser_issue/tcpdi_parser/tcpdi_parser.php on line 712
PHP Warning:  Illegal string offset 'Parffp' in /Users/lancelot/Sandbox/PHP/tcpdi_parser_issue/tcpdi_parser/tcpdi_parser.php on line 712
PHP Warning:  Illegal string offset 'Parffq' in /Users/lancelot/Sandbox/PHP/tcpdi_parser_issue/tcpdi_parser/tcpdi_parser.php on line 712
...

Additionally, I've created a PR. I'll be grateful if you'd review it.

#23
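
For context on the `% 1` comment in the object above: in PDF syntax, a `%` outside a string or stream starts a comment that runs to the end of the line and is treated as whitespace. A minimal sketch of skipping both before reading the next token; `skipWhitespaceAndComments` is a hypothetical helper and is not taken from the pull request:

```php
<?php
// Hypothetical helper: advance past whitespace and %-comments and return
// the offset of the next token (or the data length if nothing is left).
function skipWhitespaceAndComments(string $data, int $offset): int
{
    $length = strlen($data);
    while ($offset >= 0 && $offset < $length) {
        // Skip a run of PDF whitespace bytes.
        $offset += strspn($data, "\x00\x09\x0a\x0c\x0d\x20", $offset);
        if ($offset < $length && $data[$offset] === '%') {
            // Skip the comment body up to (not including) the line ending.
            $offset += strcspn($data, "\r\n", $offset);
        } else {
            break;
        }
    }
    return $offset;
}
```

With a helper like this, a comment is consumed like any other whitespace, so the caller still receives the next real object together with its offset.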

TCPDI_PARSER ERROR: Invalid object reference: Array

Hello,
I am trying to add watermarks to an existing PDF by using this library:

$pdf = new TCPDI(PDF_PAGE_ORIENTATION, PDF_UNIT, PDF_PAGE_FORMAT, true, 'UTF-8', false);  
if (file_exists(WatermarkPDF::$tmp_file)){  
    $pagecount = $pdf->setSourceFile($tmp_file);  
} else {  
    clear();  
    return FALSE;  
}

Unfortunately I am getting an error:

Notice (8): Array to string conversion [APP/Vendor/WatermarkPDF/tcpdi_parser.php, line 951]  
TCPDI_PARSER ERROR: Invalid object reference: Array

The file is a valid PDF (v1.5) created by LaTeX, so I am wondering what's wrong here.
