
php-archive's Introduction

PHPArchive - Pure PHP ZIP and TAR handling

This library lets you handle ZIP and TAR archives without the need for any special PHP extensions (gz and bzip are needed for compression). It can create new archives or extract existing ones.

To keep things simple, the modification (adding or removing files) of existing archives is not supported.

Install

Use composer:

php composer.phar require splitbrain/php-archive

Usage

The usage of the Zip and Tar classes is basically the same. Here are some examples for working with TARs to get you started.

Check the API docs for more info.

require_once 'vendor/autoload.php';
use splitbrain\PHPArchive\Archive; // needed for the Archive::COMPRESS_* constants used below
use splitbrain\PHPArchive\Tar;

// To list the contents of an existing TAR archive, open() it and use
// contents() on it:
$tar = new Tar();
$tar->open('myfile.tgz');
$toc = $tar->contents();
print_r($toc); // array of FileInfo objects

// To extract the contents of an existing TAR archive, open() it and use
// extract() on it:
$tar = new Tar();
$tar->open('myfile.tgz');
$tar->extract('/tmp');

// To create a new TAR archive directly on the filesystem (low memory
// requirements), create() it:
$tar = new Tar();
$tar->create('myfile.tgz');
$tar->addFile(...);
$tar->addData(...);
...
$tar->close();

// To create a TAR archive directly in memory, create() it, add*()
// files and then either save() or getArchive() it:
$tar = new Tar();
$tar->setCompression(9, Archive::COMPRESS_BZIP);
$tar->create();
$tar->addFile(...);
$tar->addData(...);
...
$tar->save('myfile.tbz'); // compresses and saves it
echo $tar->getArchive(); // compresses and returns it

Differences between Tar and Zip: Tars are compressed as a whole, while Zips compress each file individually. Therefore, with Zip you can call setCompression() before each addFile() or addData() call.

The FileInfo class can be used to specify additional info like ownership or permissions when adding a file to an archive.

php-archive's People

Contributors

jgmdev, mferaru-bd, mreiden, peter279k, phallobst, splitbrain

php-archive's Issues

Windows issues

I created a new branch, windowstests, to run the tests on a Windows system, and some tests are failing now. Some of the failures may simply be wrong assumptions about file paths in the tests. However, some might point to other problems. E.g. the library currently detects absolute paths by checking for a leading '/' only. On Windows systems that should probably take something like 'D:' into account, too.

Help from Windows users is welcome. CC @fietserwin
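
A minimal sketch of what a cross-platform check could look like. The function name isAbsolutePath() is hypothetical and not part of the library; it treats a leading slash or backslash and a drive letter followed by a separator as absolute. A bare "D:foo" is drive-relative on Windows, so it is deliberately not matched here.

```php
<?php
/**
 * Hypothetical helper (not part of the library): report whether $path is
 * absolute on either a Unix-like or a Windows system.
 */
function isAbsolutePath(string $path): bool
{
    // Unix root, a Windows path rooted at the current drive, or a UNC share
    if ($path !== '' && ($path[0] === '/' || $path[0] === '\\')) {
        return true;
    }
    // Windows drive letter followed by a separator, e.g. "D:\" or "D:/"
    return preg_match('#^[A-Za-z]:[\\\\/]#', $path) === 1;
}

var_dump(isAbsolutePath('/tmp/archive'));   // Unix absolute path
var_dump(isAbsolutePath('D:\\testdir'));    // Windows absolute path
var_dump(isAbsolutePath('src/Tar.php'));    // relative path
```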

Query: Are the .tars generated here free of the ustar size limitations?

I took a cursory look at the source and see references to ustar and posix, but no clear statement on whether these tars follow the more recent tar standards, which have no size limit.

Phar is for all intents and purposes abandoned and stuck on ustar 8GB limits so if this does allow larger file sizes that would be not only great but should get more promotion all around.
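
For reference, the 8GB figure follows from the classic ustar header, which stores the file size as 11 octal digits. The snippet below only does that arithmetic; the function name is made up for illustration, and it says nothing about which size encoding this library actually writes.

```php
<?php
/**
 * Largest file size that fits in a classic ustar size field
 * (11 octal digits). Illustrative helper, not library code.
 * Returns an int on 64-bit PHP and a float on 32-bit PHP.
 */
function ustarMaxFileSize()
{
    return octdec(str_repeat('7', 11));
}

// One byte short of 8 GiB
printf("%.0f bytes\n", ustarMaxFileSize());
```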

32bit PHP Problem

I use this class in a somewhat unusual way: to create backups on a data storage server. I know it's strange, but I have some special needs and decided to go with PHP. The problem is that storing a 3GB file doesn't work; I get "File size exceeded". As far as I know, 32-bit PHP is limited to 2GB because of the integer limit.

Is there any workaround to tar such files?

P.S.: I'm not using in-memory files, rather the "directly on file system" method because I'm aware of the issues using memory for large data.

PAX typeFlag 'x'

I have encountered an issue when adding a file with a name like "._4слайд-150x150.jpg": the Linux tar utility marks such entries with typeFlag x, which is similar to the LongLink typeFlag L. This breaks the archive extraction, and the generated error is "Header does not match it's checksum for".

Since the Tar class supports the ustar format, it seems bound to handle pax-type entries as well, so a quick fix would be to replace this code

        // Handle Long-Link entries from GNU Tar
        if ($return['typeflag'] == 'L') {
            // following data block(s) is the filename
            $filename = trim($this->readbytes(ceil($header['size'] / 512) * 512));
            // next block is the real header
            $block  = $this->readbytes(512);
            $return = $this->parseHeader($block);

            // overwrite the filename
            $return['filename'] = $filename;
        }

with

        // Handle Long-Link entries from GNU Tar
        if ($return['typeflag'] == 'L' || $return['typeflag'] == 'x') {
            // following data block(s) is the filename
            $filename = trim($this->readbytes(ceil($header['size'] / 512) * 512));
            // next block is the real header
            $block  = $this->readbytes(512);
            $return = $this->parseHeader($block);

            // overwrite the filename
            if ($return['typeflag'] == 'L') {
                $return['filename'] = $filename;
            }
        }

in the protected function parseHeader($block)

I have tested this and it works fine for processing records with typeFlag x. Should I do a pull request?

I am also attaching the tgz archive I used for testing:
test.tgz.zip
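
For context, the data blocks of a pax 'x' entry are not a bare filename but a list of records of the form "LEN KEY=VALUE\n", where LEN is the decimal length of the whole record including the length digits themselves. The sketch below shows how such records could be parsed; parsePaxRecords() is a hypothetical helper, not something the library provides, and it stops quietly at NUL padding or malformed input.

```php
<?php
/**
 * Hypothetical helper: parse pax extended header records
 * ("LEN KEY=VALUE\n") into an associative array.
 */
function parsePaxRecords(string $data): array
{
    $records = [];
    $offset  = 0;
    $total   = strlen($data);

    while ($offset < $total) {
        $space = strpos($data, ' ', $offset);
        if ($space === false) {
            break; // no further record, e.g. only NUL padding is left
        }
        $digits = substr($data, $offset, $space - $offset);
        if (!ctype_digit($digits)) {
            break; // not a length prefix, stop parsing
        }
        $len       = (int) $digits;
        $prefixLen = $space - $offset + 1; // length digits plus the space
        if ($len <= $prefixLen || $offset + $len > $total) {
            break; // truncated or malformed record
        }
        // record body without the length prefix and the trailing newline
        $body = substr($data, $space + 1, $len - $prefixLen - 1);
        $eq   = strpos($body, '=');
        if ($eq !== false) {
            $records[substr($body, 0, $eq)] = substr($body, $eq + 1);
        }
        $offset += $len;
    }

    return $records;
}

print_r(parsePaxRecords("16 path=foo.txt\n30 mtime=1321711775.972059463\n"));
```

With records parsed this way, a "path" key, when present, would carry the full UTF-8 filename for the header that follows.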

Tar creates archive corrupted?

This is the code that I have used:

<?php
require 'vendor/autoload.php';
$callback = function( $file ) {
    return true;
};
$dir = new RecursiveIteratorIterator( new RecursiveDirectoryIterator( __DIR__ . DIRECTORY_SEPARATOR . 'src', FilesystemIterator::SKIP_DOTS ) );

$archive = new \splitbrain\PHPArchive\Tar();
$archive->setCompression( 9, \splitbrain\PHPArchive\Archive::COMPRESS_NONE );
$archive->create( 'foo.tar' );

/**
 * @var string $filename
 * @var SplFileInfo $file
 */
foreach ( $dir as $filename => $file ) {
    $relative = str_replace( __DIR__ . DIRECTORY_SEPARATOR . 'src' . DIRECTORY_SEPARATOR, '', $filename );
    $relative = str_replace( '\\', '/', $relative );
    echo 'Adding ' . $relative . "...\n";
    $archive->addFile( $filename, $relative );
}
$archive->close();

It only adds the first file, and the rest silently fail without any errors appearing. The first file (Test/Test.php) has no contents in the archive.

In memory .gz writing invalid files

The tar.gz file created using the following cannot be opened

$tar = new Tar();
$tar->setCompression(6, Tar::COMPRESS_GZIP);
$tar->create();
$tar->addData('package.json', $content);
$tar->save('foo.tar.gz');

I noticed internally the getArchive() method calls gzcompress which uses ZLIB compression. This is wrong, it should be calling gzencode which uses GZIP. Proof of concept:

$tar = new Tar();
$tar->setCompression(6, Tar::COMPRESS_NONE);
$tar->create();
$tar->addData('package.json', $content);
file_put_contents($file, gzencode($tar->getArchive(), 6));
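
The difference between the two functions is visible in the first bytes of their output. This standalone check uses only PHP's zlib extension and made-up sample data; it does not touch the library.

```php
<?php
// Sample payload standing in for raw tar bytes (made up for illustration)
$payload = str_repeat('example tar bytes ', 32);

$zlib = gzcompress($payload, 6); // zlib stream (RFC 1950)
$gzip = gzencode($payload, 6);   // gzip file format (RFC 1952)

// A gzip file starts with the magic bytes 1f 8b; a zlib stream starts with 78
$zlibMagic = bin2hex(substr($zlib, 0, 1));
$gzipMagic = bin2hex(substr($gzip, 0, 2));
echo "gzcompress: $zlibMagic, gzencode: $gzipMagic\n";

// Only the gzencode() output is a valid .gz file that gzdecode() can read back
$roundTrip = gzdecode($gzip);
```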

Not working with UTF-8 file or directory name

Hi,

I am trying to create a ZIP file with files and folders that have UTF-8 filenames and directory names. For example, a file whose name contains some Cyrillic letters:

тестов_файл.txt

$zip = new \splitbrain\PHPArchive\Zip();
$zip->create('test.zip');
$zip->addFile('D:/testdir/тестов_файл.txt');
$zip->close();

The filename in the zip is missing all the Cyrillic letters and ends up corrupted: '_.txt'.

UTF-8 filenames in scripts other than Cyrillic also fail.

Tested with the latest version of the library on both Windows 10 22H2 and CloudLinux v.7
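
As background, the ZIP format signals UTF-8 filenames through bit 11 (0x0800) of the general purpose bit flag. The sketch below shows how a writer might decide when to set it; zipNameFlags() and the constant name are hypothetical, and whether the library sets this bit is not established in this report.

```php
<?php
// General purpose bit 11: filename and comment are encoded in UTF-8
const ZIP_FLAG_UTF8 = 0x0800;

/**
 * Hypothetical helper: return the flag bits a writer could set for $name.
 */
function zipNameFlags(string $name): int
{
    $hasNonAscii = preg_match('/[^\x00-\x7F]/', $name) === 1;
    $isValidUtf8 = preg_match('//u', $name) === 1;

    return ($hasNonAscii && $isValidUtf8) ? ZIP_FLAG_UTF8 : 0;
}

printf("0x%04x\n", zipNameFlags('тестов_файл.txt'));
printf("0x%04x\n", zipNameFlags('test.txt'));
```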

support bzip2 in ZIP

Zips support a variety of compression mechanisms. The lib currently assumes deflate (gzip) is always used and will fail otherwise. At least bzip2 should be supported as well.
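
A rough sketch of what dispatching on the ZIP compression method field could look like. The method numbers (0 = stored, 8 = deflate, 12 = bzip2) come from the ZIP format; inflateZipEntry() is a hypothetical standalone function, not the library's code, and the bzip2 branch assumes the bz2 extension is loaded.

```php
<?php
/**
 * Hypothetical helper: decompress one ZIP entry according to its
 * compression method field.
 */
function inflateZipEntry(int $method, string $data): string
{
    switch ($method) {
        case 0: // stored, no compression
            return $data;
        case 8: // deflate (raw, without zlib or gzip framing)
            $out = gzinflate($data);
            break;
        case 12: // bzip2
            if (!function_exists('bzdecompress')) {
                throw new RuntimeException('The bz2 extension is required for bzip2 entries');
            }
            $out = bzdecompress($data);
            break;
        default:
            throw new RuntimeException("Unsupported ZIP compression method: $method");
    }

    // gzinflate() returns false and bzdecompress() an error code on failure
    if (!is_string($out)) {
        throw new RuntimeException('Could not decompress entry data');
    }

    return $out;
}

echo inflateZipEntry(8, gzdeflate('hello zip')), "\n";
```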

Missing ArchiveIOException

First off - awesome work! This functionality is hard to come by for modern PHP.

While I was using this, I saw that the ArchiveIOException was not included - I was able to easily rectify the problem by adding it, but was unsure if there was a reason it was omitted, or if you had more specialized function in mind for that exception.

Also - any plans to tag a 1.0.0 release so it's easier to include in other packages?

Again, awesome work!

version 1.0.0

Hey @splitbrain - just wondering if you were considering releasing a 1.0.0 on packagist anytime soon? We discussed it briefly in that last issue I opened. Let me know if you need any help!

Thanks again for the great library!

fopen() fails on Windows

A WordPress plugin that uses your library consistently gave an error when it tried to add a directory to the tar archive being created. This is because a call to fopen() on a folder fails on Windows (the docs say it may succeed on a folder, see refsect1-function.fopen-notes; apparently it succeeds on Linux, but that is not guaranteed, and on Windows it doesn't).

However, a directory does not contain data, and we only want to write the header block with, among other things, the access rights. So there's no need to call fopen() or to write any content blocks: a tar archive expects 0 or more file content blocks after the header block, so it is correct to write nothing to the archive when the file size is 0 (empty file or directory).

I changed that part of the code to only call fopen() when the entry is a file that is not empty. This avoids the fopen() on a folder on Windows, and it saves a few system calls (fopen(), feof(), fclose() for each folder and empty file) on all OSes.

php-archive/src/Tar.php, method addFile(), line 249 and further:

    if ($this->closed) {
        throw new ArchiveIOException('Archive has been closed, files can no longer be added');
    }

    // create file header
    $this->writeFileHeader($fileinfo);

    // write data, but only if we have data to write.
    // note: on Windows fopen() on a directory will fail, so we prevent
    // errors on Windows by testing if we have data to write.
    if (!$fileinfo->getIsdir() && $fileinfo->getSize() > 0) {
        $read = 0;
        $fp = @fopen($file, 'rb');
        if (!$fp) {
            throw new ArchiveIOException('Could not open file for reading: ' . $file);
        }
        while (!feof($fp)) {
            $data = fread($fp, 512);
            // stop on a read error or when no more data is returned
            if ($data === false || $data === '') {
                break;
            }
            $read += strlen($data);
            // pad each chunk to a full 512-byte tar block
            $packed = pack("a512", $data);
            $this->writebytes($packed);
        }
        fclose($fp);

        if ($read != $fileinfo->getSize()) {
            $this->close();
            throw new ArchiveCorruptedException("The size of $file changed while reading, archive corrupted. read $read expected ".$fileinfo->getSize());
        }
    }
    ...

Encoding issue when extracting files with 'umlaut'

If you try to decompress an archive that contains files with umlauts (e.g: broschüre.pdf) the file will look something like brosch,,re.pdf after decompression.

We expect the issue to be located around line 491 in Zip.php.
The output of $header['filename'] after the following line has already lost the umlaut.

$header['filename'] = fread($this->fh, $header['filename_len']);
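
One possible direction, sketched under assumptions: when general purpose bit 11 is not set, the ZIP format defines names as CP437, so a reader could convert them to UTF-8 before use. decodeZipEntryName() is a hypothetical helper; many archivers actually write names in the local code page instead, so the CP437 conversion is a guess rather than a guaranteed fix, and the function falls back to the raw bytes when iconv is unavailable or fails.

```php
<?php
/**
 * Hypothetical helper: turn a raw ZIP entry name into UTF-8.
 *
 * @param string $raw   filename bytes as read from the archive
 * @param int    $flags general purpose bit flag of the entry
 */
function decodeZipEntryName(string $raw, int $flags): string
{
    if ($flags & 0x0800) {
        return $raw; // bit 11 set: the name is already UTF-8
    }
    if (function_exists('iconv')) {
        // assumption: the name uses the ZIP default code page CP437
        $converted = @iconv('CP437', 'UTF-8', $raw);
        if ($converted !== false) {
            return $converted;
        }
    }

    return $raw; // fall back to the unmodified bytes
}

echo decodeZipEntryName('broschüre.pdf', 0x0800), "\n";
```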

43 extra bytes at beginning or within zipfile

Checking a zip file created with the Zip library using the Linux tool zipinfo currently prints the warning above (it seems 43 bytes are added per file). The zip will extract just fine, but it might still be worth figuring out where it goes wrong.

zipinfo trial.zip 
Archive:  trial.zip
Zip file size: 121 bytes, number of entries: 1
warning [trial.zip]:  43 extra bytes at beginning or within zipfile
  (attempting to process anyway)
-rw----     1.4 fat        9 b- defN 15-Jun-29 21:32 test.txt
1 file, 9 bytes uncompressed, 7 bytes compressed:  22.2%

Add callbacks for verbosity/progress

It would be nice to be able to set a callback that is called for each file being extracted. The callback should receive the following info:

  • file name of file currently being processed
  • number of files to be extracted (do we know?)
  • current file's number
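
To make the request concrete, here is a standalone sketch of the callback shape described above. processWithProgress() and its parameters are invented for illustration and are not a proposed library API. The total is only known up front if the table of contents has been read first, which is why it is passed in explicitly here.

```php
<?php
/**
 * Hypothetical illustration: call $onProgress for every entry with the
 * file name, the current file's number, and the number of files.
 */
function processWithProgress(array $names, callable $onProgress): void
{
    $total = count($names);
    foreach (array_values($names) as $index => $name) {
        $onProgress($name, $index + 1, $total);
        // ... the actual extraction of the entry would happen here ...
    }
}

$log = [];
processWithProgress(['a.txt', 'dir/b.txt'], function (string $name, int $current, int $total) use (&$log) {
    $log[] = sprintf('[%d/%d] %s', $current, $total, $name);
});
print_r($log);
```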

7zip extract issue

Hi there! I am following up on a bug reported at https://github.com/ovidiul/XCloner-Wordpress/issues/2 regarding 7zip (I am using your class in a WordPress backup plugin). It seems that, for some reason, when a directory is added to the tar archive, its indicated size is not zero. Maybe it would be a good idea to force the directory size to zero before writing the TAR file headers?

Please see my commit ovidiul/XCloner-Wordpress@9851d8f; it seems to have fixed the error with 7zip on Windows.

Thank you for your time and effort, maybe you can give me some more insight whether there is a need to have a size greater than zero for directories?
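
The idea can be sketched as a small formatting step for the header's size field, which in the ustar layout is 12 bytes holding 11 octal digits plus a terminator. tarSizeField() is a hypothetical helper and the NUL terminator is an assumption for this example; the library's own header writer may differ.

```php
<?php
/**
 * Hypothetical helper: build the 12-byte size field of a tar header,
 * always writing zero for directories.
 */
function tarSizeField(bool $isDir, int $size): string
{
    // 11 zero-padded octal digits followed by a NUL terminator
    return sprintf('%011o', $isDir ? 0 : $size) . "\0";
}

echo rtrim(tarSizeField(true, 4096), "\0"), "\n";  // directory: size forced to zero
echo rtrim(tarSizeField(false, 9), "\0"), "\n";    // regular 9-byte file
```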

Use yield in contents() for lower memory usage

First thanks for the work on such a nice library! Now to the issue:

The contents() function builds a whole array of FileInfo objects, which can use a lot of RAM if the file being read contains lots of entries. The yield keyword (generators) was introduced in PHP 5.5, and it is a great way of iterating over a list of data without having to store it in an array first. Basically, the yield keyword does all the magic, as shown below:

public function contents()
{
    if ($this->closed || !$this->file) {
        throw new ArchiveIOException('Can not read from a closed archive');
    }
    while ($read = $this->readbytes(512)) {
        $header = $this->parseHeader($read);
        if (!is_array($header)) {
            continue;
        }
        $this->skipbytes(ceil($header['size'] / 512) * 512);
        yield $this->header2fileinfo($header);
    }
    $this->close();
}

// Can be iterated normally but each element is generated one at a time
foreach($archive->contents() as $object)
{
    //...
}

If returning an array is the intended behavior, or to avoid breaking backward compatibility, one could also introduce a new contents function with a suffix, like contentsGenerator() or contentsYield(). I'm not sure what would be the best naming option here :)
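
A standalone sketch of that backward-compatible split, using plain arrays in place of real archive headers. Both function names are hypothetical: the generator yields entries one at a time, and the array variant simply collects the generator's output, so existing callers would keep getting an array.

```php
<?php
/**
 * Hypothetical generator variant: yields one entry at a time.
 * $headers stands in for headers read from an archive.
 */
function contentsGenerator(array $headers): Generator
{
    foreach ($headers as $header) {
        yield $header['filename'];
    }
}

/**
 * Hypothetical array variant kept for backward compatibility.
 */
function contentsArray(array $headers): array
{
    return iterator_to_array(contentsGenerator($headers), false);
}

$headers = [['filename' => 'a.txt'], ['filename' => 'b.txt']];

foreach (contentsGenerator($headers) as $name) {
    echo $name, "\n"; // entries are produced lazily, one per iteration
}
print_r(contentsArray($headers));
```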

Files cannot be opened under OS X

This was reported for the predecessor of this library currently used in DokuWiki.

I've used ZipLib.class.php within a plugin to create and return a .zip file. The files that are produced work fine under Windows and from the command line under OS X, but they cannot be opened using the Archive Utility within Finder - an unhelpful error message is produced!

[1] reports a similar problem caused by the Zend code adding a "data descriptor" which is not declared in the "general purpose bit flag" of the .zip archive that's created. This article recommends removing this data descriptor to fix the problem, and, indeed, removing (line 149 for me):

$fr .= pack('V', $crc).pack('V', $c_len).pack('V', $unc_len);

does fix the issue. The article goes on to say that Apple's interpretation is correct (i.e. the produced .zip is not technically valid), but does note that other .zip utilities simply ignore the spurious data descriptor.

[1] http://ocportal.com/site/news/view/chris_grahams_blog/problem_between_the_mac_2.htm
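
In the ZIP format, a data descriptor is announced by bit 3 (0x0008) of the general purpose bit flag. The sketch below emits the descriptor only when that bit is set, which keeps the flag and the written bytes consistent; dataDescriptor() and the constant name are hypothetical and not taken from ZipLib or this library.

```php
<?php
// General purpose bit 3: CRC and sizes follow the file data in a data descriptor
const ZIP_FLAG_DATA_DESCRIPTOR = 0x0008;

/**
 * Hypothetical helper: return the data descriptor bytes, or an empty
 * string when the entry's flags do not declare one.
 */
function dataDescriptor(int $flags, int $crc, int $compressedLen, int $uncompressedLen): string
{
    if (($flags & ZIP_FLAG_DATA_DESCRIPTOR) === 0) {
        return ''; // not declared in the flags, so nothing must be written
    }

    // three little-endian 32-bit values: CRC-32, compressed size, uncompressed size
    return pack('V', $crc) . pack('V', $compressedLen) . pack('V', $uncompressedLen);
}

var_dump(strlen(dataDescriptor(0, 1, 2, 3)));
var_dump(strlen(dataDescriptor(ZIP_FLAG_DATA_DESCRIPTOR, 1, 2, 3)));
```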

Set Password

As the title says, I would like a setPassword feature to be added so that a specific archive file can be encrypted.
