
zipdetails's Introduction


NAME

zipdetails - display the internal structure of zip files

SYNOPSIS

zipdetails [options] zipfile.zip

DESCRIPTION

This program creates a detailed report on the internal structure of zip files. For each item of metadata within a zip file, the program will output:

  • the offset into the zip file where the item is located.
  • a textual representation for the item.
  • an optional hex dump of the item.

The program assumes a prior understanding of the internal structure of Zip files. You should have a copy of the zip file definition, APPNOTE.TXT, at hand to help understand the output from this program.

Default Behaviour

By default the program expects to be given a well-formed zip file. It will navigate the zip file by first parsing the zip Central Directory at the end of the file. If the Central Directory is found, it will then walk sequentially through the zip records starting at the beginning of the file. See "Advanced Analysis" for other processing options.
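The first step, locating the Central Directory from the end of the file, can be sketched in Python. This is a minimal illustration, not zipdetails' own code: it assumes a single-disk, non-Zip64 archive, and `find_eocd` is a hypothetical helper. It searches backwards for the End of Central Directory signature (the `06054B50` value shown as END CENTRAL HEADER in the sample output below); that record is 22 bytes long plus a comment of at most 65535 bytes, so only the tail of the file needs searching.

```python
import io
import struct
import zipfile

EOCD_SIG = b"PK\x05\x06"   # 0x06054B50 as stored (little-endian)
EOCD_FIXED = 22            # size of the record excluding its trailing comment

def find_eocd(data: bytes) -> dict:
    """Locate the End of Central Directory record by searching backwards.

    Zip64 and multi-disk archives are not handled in this sketch.
    """
    start = max(0, len(data) - EOCD_FIXED - 0xFFFF)
    off = data.rfind(EOCD_SIG, start)
    if off == -1 or off + EOCD_FIXED > len(data):
        raise ValueError("End of Central Directory record not found")
    (_sig, _disk, _cd_disk, _entries_disk, entries,
     cd_size, cd_offset, _comment_len) = struct.unpack_from("<IHHHHIIH", data, off)
    return {"offset": off, "entries": entries,
            "cd_size": cd_size, "cd_offset": cd_offset}

# Build a one-entry archive in memory and locate its Central Directory.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("lorem.txt", b"hello")
data = buf.getvalue()
info = find_eocd(data)
print(info)
```

Once the Central Directory offset is known, each central header points back to its local header via the Local Header Offset field.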

If the program finds any structural or portability issues with the zip file it will print a message at the point it finds the issue and/or in a summary at the end of the output report. Whilst the set of issues that can be detected is extensive, don't assume that this program can find all the possible issues in a zip file - there are likely edge conditions that need to be addressed.

If you have suggestions for use-cases where this could be enhanced please consider creating an enhancement request (see "SUPPORT").

Date & Time fields

Date/time fields found in zip files are displayed in local time. Use the --utc option to display these fields in Coordinated Universal Time (UTC).
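The effect of --utc can be illustrated for fields that store a Unix epoch value, such as the Info-ZIP extended timestamp extra field (0x5455). The sketch below formats one value both ways, in the same style as the timestamps in the sample output; `format_timestamp` is a hypothetical helper, not part of zipdetails. Classic MS-DOS date/time fields carry no time zone information at all.

```python
import calendar
import time

def format_timestamp(epoch_seconds: int, utc: bool = False) -> str:
    """Render a Unix timestamp like 'Wed Mar 22 20:03:36 2023'.

    utc=True mirrors the intent of --utc; otherwise the host's local
    time zone (e.g. from the TZ environment variable) is applied.
    """
    tm = time.gmtime(epoch_seconds) if utc else time.localtime(epoch_seconds)
    return time.asctime(tm)

ts = calendar.timegm((2023, 3, 22, 20, 3, 36, 0, 0, 0))  # interpreted as UTC
print(format_timestamp(ts, utc=True))   # always 'Wed Mar 22 20:03:36 2023'
print(format_timestamp(ts))             # depends on the local time zone
```

Because the local rendering depends on the host's time zone, output without --utc can differ between machines.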

Filenames & Comments

Filenames and comments are decoded/encoded using the default system encoding of the host running zipdetails. When the system encoding cannot be determined, cp437 will be used.

The exceptions are

  • when the Language Encoding Flag is set in the zip file, the filename/comment fields are assumed to be encoded in UTF-8.
  • when the definition of the metadata field implies UTF-8 encoding.

See "Filename Encoding Issues" and "Filename & Comment Encoding Options" for ways to control the encoding of filename/comment fields.
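The decoding rules above can be sketched as follows, assuming the Language Encoding Flag is bit 11 (0x0800) of the General Purpose Flag, as defined in APPNOTE.TXT. `decode_name` and its parameters are hypothetical; `encoding` and `honour_efs` loosely mirror --encoding and --no-language-encoding, and cp437 stands in for the system encoding.

```python
EFS_FLAG = 0x0800  # General Purpose Flag bit 11: filename/comment is UTF-8

def decode_name(raw: bytes, gp_flag: int, encoding: str = "cp437",
                honour_efs: bool = True) -> str:
    """Decode a filename/comment field taken from a zip header.

    If the EFS bit is set (and honoured) the bytes are treated as UTF-8;
    otherwise the fallback encoding is used. Undecodable bytes are replaced
    instead of raising, so garbled names can still be displayed.
    """
    if honour_efs and gp_flag & EFS_FLAG:
        return raw.decode("utf-8", errors="replace")
    return raw.decode(encoding, errors="replace")

utf8_name = decode_name("café".encode("utf-8"), 0x0800)  # 'café'
cp437_name = decode_name(b"caf\x82", 0x0000)             # 'café' (0x82 is é in cp437)
```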

OPTIONS

General Options

  • -h, --help

    Display help

  • --redact

    Obscure filenames and payload data in the output. Handy for the use case where the zip file contains sensitive data that cannot be shared.

  • --scan

    Pessimistically scan the zip file looking for possible zip records. Can be error-prone. For very large zip files this option is slow. Consider using the --walk option first. See "Advanced Analysis".

  • --utc

    By default, date/time fields are displayed in local time. Use this option to display them in Coordinated Universal Time (UTC).

  • -v

    Enable Verbose mode. See "Verbose Output".

  • --version

    Display version number of the program and exit.

  • --walk

    Optimistically walk the zip file looking for possible zip records. See "Advanced Analysis".

Filename & Comment Encoding Options

See "Filename Encoding Issues"

  • --encoding name

    Use encoding "name" when reading filenames/comments from the zip file.

    When this option is not specified, the system encoding is used.

  • --no-encoding

    Disable all filename & comment encoding/decoding. Filenames/comments are processed as byte streams.

    This option is not enabled by default.

  • --output-encoding name

    Use encoding "name" when writing filenames/comments to the display. By default the system encoding will be used.

  • --language-encoding, --no-language-encoding

    Modern zip applications set a metadata flag in the zip file, called the "Language Encoding Flag", when they write filenames/comments encoded in UTF-8.

    Occasionally some applications set the Language Encoding Flag but write data that is not UTF-8 in the filename/comment fields of the zip file. This will usually result in garbled text being output for the filenames/comments.

    To deal with this use-case, set the --no-language-encoding option and, if needed, set the --encoding name option to the encoding actually used.

    Default is --language-encoding.

  • --debug-encoding

    Display extra debugging info when a filename/comment encoding has changed.

Message Control Options

  • --messages, --no-messages

    Enable/disable the output of all info/warning/error messages.

    Disabling messages means that no checks are carried out to verify that the zip file is well-formed.

    Default is enabled.

  • --exit-bitmask, --no-exit-bitmask

    Enable/disable exit status bitmask for messages. Default disabled. Bitmask values are: 1 for info, 2 for warning and 4 for error.
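With --exit-bitmask enabled, the exit status can be decoded with a bitwise AND against the values listed above. A minimal sketch; the helper name is hypothetical, and the commented-out lines only show one way the status might be obtained.

```python
# Bit values documented for --exit-bitmask.
MESSAGE_BITS = {1: "info", 2: "warning", 4: "error"}

def decode_exit_status(status: int) -> list:
    """Return the message levels encoded in a zipdetails exit status."""
    return [name for bit, name in MESSAGE_BITS.items() if status & bit]

# Hypothetical usage:
#   import subprocess
#   rc = subprocess.run(["zipdetails", "--exit-bitmask", "test.zip"]).returncode
#   print(decode_exit_status(rc))
print(decode_exit_status(6))  # ['warning', 'error']
```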

Default Output

By default zipdetails will output each metadata field from the zip file in three columns.

  1. The offset, in hex, to the start of the field relative to the beginning of the file.
  2. The name of the field.
  3. Detailed information about the contents of the field. The format depends on the type of data:
    • Numeric Values

      If the field contains an 8-bit, 16-bit, 32-bit or 64-bit numeric value, it will be displayed in both hex and decimal -- for example "002A (42)".

      Note that Zip files store most numeric values in little-endian encoding (there are a few rare instances where big-endian is used). The value read from the zip file will have the endian encoding removed before being displayed.

      This is followed by an optional description of what the numeric value means.

    • String

      If the field corresponds to a printable string, it will be output enclosed in single quotes.

    • Binary Data

      The term Binary Data is just a catch-all for all other metadata in the zip file. This data is displayed as a series of ascii-hex byte values in the same order they are stored in the zip file.
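The conversion from the stored little-endian bytes to the "hex (decimal)" display can be sketched with Python's struct module. The first example uses the Compressed Size field of the test.zip example below, which is stored as the bytes 0E 01 00 00. `show_numeric` is a hypothetical helper, not zipdetails code.

```python
import struct

# struct format codes for unsigned little-endian 8/16/32/64-bit values
FORMATS = {1: "<B", 2: "<H", 4: "<I", 8: "<Q"}

def show_numeric(raw: bytes) -> str:
    """Format a little-endian field as zero-padded hex plus decimal."""
    (value,) = struct.unpack(FORMATS[len(raw)], raw)
    return f"{value:0{len(raw) * 2}X} ({value})"

print(show_numeric(bytes([0x0E, 0x01, 0x00, 0x00])))  # 0000010E (270)
print(show_numeric(bytes([0x08, 0x00])))              # 0008 (8)
```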

For example, assuming you have a zip file, test.zip, with one entry:

$ unzip -l test.zip
Archive:  test.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
      446  2023-03-22 20:03   lorem.txt
---------                     -------
      446                     1 file

Running zipdetails gives this output:

$ zipdetails test.zip

0000 LOCAL HEADER #1       04034B50 (67324752)
0004 Extract Zip Spec      14 (20) '2.0'
0005 Extract OS            00 (0) 'MS-DOS'
0006 General Purpose Flag  0000 (0)
     [Bits 1-2]            0 'Normal Compression'
0008 Compression Method    0008 (8) 'Deflated'
000A Modification Time     5676A072 (1450614898) 'Wed Mar 22 20:03:36 2023'
000E CRC                   F90EE7FF (4178503679)
0012 Compressed Size       0000010E (270)
0016 Uncompressed Size     000001BE (446)
001A Filename Length       0009 (9)
001C Extra Length          0000 (0)
001E Filename              'lorem.txt'
0027 PAYLOAD

0135 CENTRAL HEADER #1     02014B50 (33639248)
0139 Created Zip Spec      1E (30) '3.0'
013A Created OS            03 (3) 'Unix'
013B Extract Zip Spec      14 (20) '2.0'
013C Extract OS            00 (0) 'MS-DOS'
013D General Purpose Flag  0000 (0)
     [Bits 1-2]            0 'Normal Compression'
013F Compression Method    0008 (8) 'Deflated'
0141 Modification Time     5676A072 (1450614898) 'Wed Mar 22 20:03:36 2023'
0145 CRC                   F90EE7FF (4178503679)
0149 Compressed Size       0000010E (270)
014D Uncompressed Size     000001BE (446)
0151 Filename Length       0009 (9)
0153 Extra Length          0000 (0)
0155 Comment Length        0000 (0)
0157 Disk Start            0000 (0)
0159 Int File Attributes   0001 (1)
     [Bit 0]               1 'Text Data'
015B Ext File Attributes   81ED0000 (2179792896)
     [Bits 16-24]          01ED (493) 'Unix attrib: rwxr-xr-x'
     [Bits 28-31]          08 (8) 'Regular File'
015F Local Header Offset   00000000 (0)
0163 Filename              'lorem.txt'

016C END CENTRAL HEADER    06054B50 (101010256)
0170 Number of this disk   0000 (0)
0172 Central Dir Disk no   0000 (0)
0174 Entries in this disk  0001 (1)
0176 Total Entries         0001 (1)
0178 Size of Central Dir   00000037 (55)
017C Offset to Central Dir 00000135 (309)
0180 Comment Length        0000 (0)
#
# Done
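Two of the interpreted values in the output above can be reproduced by hand. The Modification Time field holds an MS-DOS date in its high 16 bits and an MS-DOS time (two-second resolution) in its low 16 bits. When Created OS is Unix, the high 16 bits of Ext File Attributes hold a Unix mode. A minimal sketch under those assumptions; the function names are hypothetical.

```python
import stat

def decode_dos_datetime(value: int) -> tuple:
    """Split a 32-bit MS-DOS date/time into (year, month, day, h, m, s)."""
    date, dos_time = value >> 16, value & 0xFFFF
    return (
        1980 + (date >> 9),        # bits 9-15: years since 1980
        (date >> 5) & 0x0F,        # bits 5-8: month
        date & 0x1F,               # bits 0-4: day
        dos_time >> 11,            # bits 11-15: hour
        (dos_time >> 5) & 0x3F,    # bits 5-10: minute
        (dos_time & 0x1F) * 2,     # bits 0-4: seconds divided by two
    )

def decode_unix_mode(ext_attrs: int) -> str:
    """Render the Unix mode from the high 16 bits in 'ls -l' style."""
    return stat.filemode(ext_attrs >> 16)

print(decode_dos_datetime(0x5676A072))  # (2023, 3, 22, 20, 3, 36)
print(decode_unix_mode(0x81ED0000))     # -rwxr-xr-x
```

The top nibble of 0x81ED0000 is 8, the file-type bits of a Unix regular file, which matches the 'Regular File' line in the output.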

Verbose Output

If the -v option is present, the metadata output is split into the following columns:

  1. The offset, in hex, to the start of the field relative to the beginning of the file.
  2. The offset, in hex, to the end of the field relative to the beginning of the file.
  3. The length, in hex, of the field.
  4. A hex dump of the bytes in the field, in the order they are stored in the zip file.
  5. A textual description of the field.
  6. Information about the contents of the field. See the description in the "Default Output" for more details.

Here is the same zip file, test.zip, dumped using the zipdetails -v option:

$ zipdetails -v test.zip

0000 0003 0004 50 4B 03 04 LOCAL HEADER #1       04034B50 (67324752)
0004 0004 0001 14          Extract Zip Spec      14 (20) '2.0'
0005 0005 0001 00          Extract OS            00 (0) 'MS-DOS'
0006 0007 0002 00 00       General Purpose Flag  0000 (0)
                           [Bits 1-2]            0 'Normal Compression'
0008 0009 0002 08 00       Compression Method    0008 (8) 'Deflated'
000A 000D 0004 72 A0 76 56 Modification Time     5676A072 (1450614898) 'Wed Mar 22 20:03:36 2023'
000E 0011 0004 FF E7 0E F9 CRC                   F90EE7FF (4178503679)
0012 0015 0004 0E 01 00 00 Compressed Size       0000010E (270)
0016 0019 0004 BE 01 00 00 Uncompressed Size     000001BE (446)
001A 001B 0002 09 00       Filename Length       0009 (9)
001C 001D 0002 00 00       Extra Length          0000 (0)
001E 0026 0009 6C 6F 72 65 Filename              'lorem.txt'
               6D 2E 74 78
               74
0027 0134 010E ...         PAYLOAD

0135 0138 0004 50 4B 01 02 CENTRAL HEADER #1     02014B50 (33639248)
0139 0139 0001 1E          Created Zip Spec      1E (30) '3.0'
013A 013A 0001 03          Created OS            03 (3) 'Unix'
013B 013B 0001 14          Extract Zip Spec      14 (20) '2.0'
013C 013C 0001 00          Extract OS            00 (0) 'MS-DOS'
013D 013E 0002 00 00       General Purpose Flag  0000 (0)
                           [Bits 1-2]            0 'Normal Compression'
013F 0140 0002 08 00       Compression Method    0008 (8) 'Deflated'
0141 0144 0004 72 A0 76 56 Modification Time     5676A072 (1450614898) 'Wed Mar 22 20:03:36 2023'
0145 0148 0004 FF E7 0E F9 CRC                   F90EE7FF (4178503679)
0149 014C 0004 0E 01 00 00 Compressed Size       0000010E (270)
014D 0150 0004 BE 01 00 00 Uncompressed Size     000001BE (446)
0151 0152 0002 09 00       Filename Length       0009 (9)
0153 0154 0002 00 00       Extra Length          0000 (0)
0155 0156 0002 00 00       Comment Length        0000 (0)
0157 0158 0002 00 00       Disk Start            0000 (0)
0159 015A 0002 01 00       Int File Attributes   0001 (1)
                           [Bit 0]               1 'Text Data'
015B 015E 0004 00 00 ED 81 Ext File Attributes   81ED0000 (2179792896)
                           [Bits 16-24]          01ED (493) 'Unix attrib: rwxr-xr-x'
                           [Bits 28-31]          08 (8) 'Regular File'
015F 0162 0004 00 00 00 00 Local Header Offset   00000000 (0)
0163 016B 0009 6C 6F 72 65 Filename              'lorem.txt'
               6D 2E 74 78
               74

016C 016F 0004 50 4B 05 06 END CENTRAL HEADER    06054B50 (101010256)
0170 0171 0002 00 00       Number of this disk   0000 (0)
0172 0173 0002 00 00       Central Dir Disk no   0000 (0)
0174 0175 0002 01 00       Entries in this disk  0001 (1)
0176 0177 0002 01 00       Total Entries         0001 (1)
0178 017B 0004 37 00 00 00 Size of Central Dir   00000037 (55)
017C 017F 0004 35 01 00 00 Offset to Central Dir 00000135 (309)
0180 0181 0002 00 00       Comment Length        0000 (0)
#
# Done
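The first four verbose columns follow directly from a field's start offset and raw bytes, with end offset = start + length - 1. The sketch below rebuilds them for the local header's Modification Time field (start 0x000A, bytes 72 A0 76 56). It ignores how long hex dumps wrap onto continuation lines, and `verbose_prefix` is a hypothetical name.

```python
def verbose_prefix(start: int, raw: bytes) -> str:
    """Return 'START END LENGTH HEXDUMP' as in the first -v columns."""
    end = start + len(raw) - 1                    # offset of the field's last byte
    hexdump = " ".join(f"{b:02X}" for b in raw)   # bytes in file order
    return f"{start:04X} {end:04X} {len(raw):04X} {hexdump}"

print(verbose_prefix(0x000A, bytes([0x72, 0xA0, 0x76, 0x56])))
# 000A 000D 0004 72 A0 76 56
```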

Advanced Analysis

If you have a corrupt or non-standard zip file, particularly one where the Central Directory metadata at the end of the file is absent/incomplete, you can use either the --walk option or the --scan option to search for any zip metadata that is still present in the file.

When either of these options is enabled, this program will bypass the initial step of reading the Central Directory at the end of the file and simply scan the zip file sequentially from the start of the file looking for zip metadata records. Although this can be error prone, for the most part it will find any zip file metadata that is still present in the file.

The difference between the two options is how aggressive the sequential scan is: --walk is optimistic, while --scan is pessimistic.

To understand the difference in more detail you need to know a bit about how zip file metadata is structured. Under the hood, a zip file uses a series of 4-byte signatures to flag the start of each of the metadata records it uses. When the --walk or the --scan option is enabled, both work identically by scanning the file from the beginning looking for any of these valid 4-byte metadata signatures. When a 4-byte signature is found, both options will blindly assume that they have found a valid metadata record and display it.

--walk

The --walk option optimistically assumes that it has found a real zip metadata record and so starts the scan for the next record directly after the record it has just output.

--scan

The --scan option is pessimistic and assumes the 4-byte signature sequence may have been a false positive, so before starting the scan for the next record, it will rewind to the location in the file directly after the 4-byte sequence it just processed. This means it will rescan data that has already been processed. For very large zip files the --scan option can be really slow, so try the --walk option first.

Important Note: If the zip file being processed contains one or more nested zip files, and the outer zip file uses the STORE compression method, the --scan option will display the zip metadata for both the outer & inner zip files.
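The difference between the two strategies can be sketched in Python. This is a simplified model, not zipdetails' implementation: it only recognises local headers, central headers and the end-of-central-directory record, assumes local headers carry the real sizes (no streaming bit 3, no Zip64), and does no bounds checking. All names are hypothetical. The usage builds an outer STORED zip that contains a nested zip, reproducing the note above: the optimistic walk skips the inner archive as payload, while the pessimistic scan also reports the inner records.

```python
import io
import struct
import zipfile

# On-disk (little-endian) signatures of the record types handled here.
SIGNATURES = {
    b"PK\x03\x04": "LOCAL HEADER",
    b"PK\x01\x02": "CENTRAL HEADER",
    b"PK\x05\x06": "END CENTRAL HEADER",
}

def next_signature(data, start):
    """Return (offset, name) of the nearest known signature at/after start."""
    hits = [(data.find(sig, start), name) for sig, name in SIGNATURES.items()]
    hits = [hit for hit in hits if hit[0] != -1]
    return min(hits) if hits else (None, None)

def record_length(data, off, name):
    """Total size of the record, read from its own length fields."""
    if name == "LOCAL HEADER":
        (csize,) = struct.unpack_from("<I", data, off + 18)
        n, m = struct.unpack_from("<HH", data, off + 26)
        return 30 + n + m + csize
    if name == "CENTRAL HEADER":
        n, m, k = struct.unpack_from("<HHH", data, off + 28)
        return 46 + n + m + k
    (k,) = struct.unpack_from("<H", data, off + 20)  # EOCD comment length
    return 22 + k

def find_records(data, optimistic):
    """optimistic=True models --walk; optimistic=False models --scan."""
    found, pos = [], 0
    while True:
        off, name = next_signature(data, pos)
        if off is None:
            return found
        found.append((off, name))
        # walk: trust the record and jump past it;
        # scan: resume directly after the 4-byte signature.
        pos = off + record_length(data, off, name) if optimistic else off + 4

def make_stored_zip(entries):
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
        for name, payload in entries:
            zf.writestr(name, payload)
    return buf.getvalue()

inner = make_stored_zip([("a.txt", b"hello")])
outer = make_stored_zip([("inner.zip", inner)])
walk = find_records(outer, optimistic=True)    # outer records only
scan = find_records(outer, optimistic=False)   # outer and nested records
print(walk)
print(scan)
```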

Filename Encoding Issues

Sometimes when displaying the contents of a zip file the filenames (or comments) appear to be garbled. This section walks through the reasons and mitigations that can be applied to work around these issues.

Background

When zip files were first created in the 1980s, there was no Unicode or UTF-8. Issues around character set encoding interoperability were not a major concern.

Initially, the only official encoding supported in zip files was IBM Code Page 437 (AKA CP437). As time went on users in locales where CP437 wasn't appropriate stored filenames in the encoding native to their locale. If you were running a system that matched the locale of the zip file, all was well. If not, you had to post-process the filenames after unzipping the zip file.

Fast forward to the introduction of Unicode and UTF-8 encoding. The approach now used by all major zip implementations is to set the Language encoding flag (also known as EFS) in the zip file metadata to signal that a filename/comment is encoded in UTF-8.

To ensure maximum interoperability when sharing zip files, store 7-bit filenames as-is in the zip file. For anything else, the EFS bit needs to be set and the filename encoded in UTF-8. Although this rule is followed for the most part, there are exceptions out in the wild.

Dealing with Encoding Errors

The most common filename encoding issue is where the EFS bit is not set and the filename is stored in a character set that doesn't match the system encoding. This mostly impacts legacy zip files that predate the introduction of Unicode.

To deal with this issue you first need to know what encoding was used in the zip file. For example, if the filename is encoded in ISO-8859-1 you can display the filenames using the --encoding option

zipdetails --encoding ISO-8859-1 myfile.zip

A less common variation of this is where the EFS bit is set, signalling that the filename will be encoded in UTF-8, but the filename is not encoded in UTF-8. To deal with this scenario, use the --no-language-encoding option along with the --encoding option.
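Both failure modes can be reproduced with Python's codecs. The sketch assumes a filename written as ISO-8859-1 and read back on a system whose encoding is cp437; it only illustrates why the text comes out garbled and what --encoding and --no-language-encoding correct for, not how zipdetails implements them.

```python
raw = "café".encode("iso-8859-1")   # b'caf\xe9', as stored in the zip file

# Case 1: EFS bit not set, and the reader assumes the wrong code page.
garbled = raw.decode("cp437")       # 'cafΘ' (0xE9 is Θ in cp437)
fixed = raw.decode("iso-8859-1")    # 'café', like --encoding ISO-8859-1

# Case 2: EFS bit set, but the bytes are not valid UTF-8.
as_utf8 = raw.decode("utf-8", errors="replace")   # 'caf' + U+FFFD
# Ignoring the flag and naming the real encoding recovers the name,
# which is what --no-language-encoding plus --encoding asks for.
recovered = raw.decode("iso-8859-1")              # 'café'
```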

LIMITATIONS

The following zip file features are not supported by this program:

  • Multi-part/Split/Spanned Zip Archives.

    This program cannot give an overall report on the combined parts of a multi-part zip file.

    The best you can do is run with either the --scan or --walk options against individual parts. Some parts will contain zip file metadata, which will be detected, while others will only contain compressed payload data.

  • Encrypted Central Directory

    When pkzip Strong Encryption is enabled in a zip file this program can still parse most of the metadata in the zip file. The exception is when the Central Directory of a zip file is also encrypted. This program cannot parse any metadata from an encrypted Central Directory.

  • Corrupt Zip files

    When zipdetails encounters a corrupt zip file, it will do one or more of the following

    • Display details of the corruption and carry on
    • Display details of the corruption and terminate
    • Terminate with a generic message

    Which of these happens depends on the severity of the corruption.

TODO

JSON/YML Output

Output some of the zip file metadata as a JSON or YML document.

Corrupt Zip files

Although the detection and reporting of most of the common corruption use-cases is present in zipdetails, there are likely to be other edge cases that need to be supported.

If you have a corrupt Zip file that isn't being processed properly, please report it (see "SUPPORT").

SUPPORT

General feedback/questions/bug reports should be sent to https://github.com/pmqs/zipdetails/issues.

SEE ALSO

The primary reference for Zip files is APPNOTE.TXT.

An alternative reference is the Info-Zip appnote. This is available from ftp://ftp.info-zip.org/pub/infozip/doc/

For details of WinZip AES encryption see AES Encryption Information: Encryption Specification AE-1 and AE-2.

The zipinfo program that comes with the info-zip distribution (http://www.info-zip.org/) can also display details of the structure of a zip file.

AUTHOR

Paul Marquess.

COPYRIGHT

Copyright (c) 2011-2024 Paul Marquess. All rights reserved.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

zipdetails's People

Contributors

atoomic, demerphq, pmqs, ugexe


Forkers

5saviahv demerphq

zipdetails's Issues

Complains about length of `0x5855` data field.

Why is zipdetails happy with a length 12 0x5855 extensible data field in the local header:

000000025 Extra ID #0001        5855 'UX: Unix Extra type 1'
000000027   Length              000C
000000029   Access Time         62DF1FC7 'Mon Jul 25 15:57:11 2022'
00000002D   Mod Time            62DF0738 'Mon Jul 25 14:12:24 2022'
000000031   UID                 01F5
000000033   GID                 0014

but then complains about the exact same field in the central directory?

CC90F4A05 Extra ID #0001        5855 'UX: Unix Extra type 1'
CC90F4A07   Length              000C
# ERROR: Offset 0xCC90F4A07: 'Length' field in 'Extra ID' 0x5855 (Unix Extra type 1) invalid: expected 0x8, got 0xC
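For reference, Info-ZIP's extra-field notes describe two layouts for 0x5855: the local-header version carries AcTime and ModTime (4 bytes each) followed by an optional 2-byte UID and GID, while the central-header version carries only the two times. That would account for the 8-byte expectation in the central directory. A lenient decoder that accepts either length is sketched below as an assumption, not as zipdetails' behaviour; the function name is hypothetical.

```python
import struct

def decode_unix_extra_type1(data: bytes) -> dict:
    """Decode a 0x5855 payload (the bytes after the tag and length).

    Assumes AcTime and ModTime as little-endian 32-bit Unix times,
    optionally followed by 16-bit UID and GID.
    """
    if len(data) not in (8, 12):
        raise ValueError(f"unexpected 0x5855 length {len(data)}")
    atime, mtime = struct.unpack_from("<II", data, 0)
    result = {"atime": atime, "mtime": mtime}
    if len(data) == 12:
        result["uid"], result["gid"] = struct.unpack_from("<HH", data, 8)
    return result

# Payload equivalent to the local-header field in the report above.
payload = struct.pack("<IIHH", 0x62DF1FC7, 0x62DF0738, 0x01F5, 0x0014)
fields = decode_unix_extra_type1(payload)
print(fields)
```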

weak encryption + streaming does not set local CRC value to zero

appnote section 4.4.4 says this about the CRC value when streaming is enabled

    Bit 3: If this bit is set, the fields crc-32, compressed 
           size and uncompressed size are set to zero in the 
           local header.  The correct values are put in the 
           data descriptor immediately following the compressed
           data.  (Note: PKZIP version 2.04g for DOS only 
           recognizes this bit for method 8 compression, newer 
           versions of PKZIP recognize this bit for any 
           compression method.)

In practice, if weak encryption is enabled along with streaming, the CRC value is not set to zero. Below is the comment from the Info-ZIP source code where it sets the CRC value when weak encryption is enabled:

        /* Traditional encryption with an extended header implies that
          * we use (the low 16 bits of) the MS-DOS modification time
          * instead of the real (unknown) CRC as the "CRC" for the
          * pseudo-random seed datum.  (The high 16 bits of the "CRC"
          * are used in the Traditional encryption header.)
          *
          * This use of file time as the CRC and then use of a data
          * descriptor to hold the CRC is not standard, but is readable
          * by various utilities out there.  We do it this way to avoid
          * reading a file more than once and to support streaming.

The zipdetails code currently expects the appnote behaviour. That needs to change so that it uses the de-facto standard that is used by the implementations.

Error decoding `Xceed Unicode extra field` (0x554e)

1. Overview

Although I haven't actually run it, I found an error in the way the subroutine decode_Xceed_unicode in the zipdetails script handles this field, so I would like to report it.

2. Extra field 0x554e format

As a result of experiments and analysis, it appears that the format of the extra field 0x554e is as follows.

2.1 For central directory headers

offset (bytes)               length (bytes)           value
0                            4                        signature (0x5843554e)
4                            2                        half the number of bytes in the entry name encoded in UTF-16
6                            2                        half the number of bytes in the comment encoded in UTF-16
8                            (value at offset 4) * 2  byte array of entry name encoded in UTF-16
8 + (value at offset 4) * 2  (value at offset 6) * 2  byte array of comment encoded in UTF-16

2.2 For local headers

offset (bytes)               length (bytes)           value
0                            4                        signature (0x5843554e)
4                            2                        half the number of bytes in the entry name encoded in UTF-16
6                            (value at offset 4) * 2  byte array of entry name encoded in UTF-16

3. About the contents of script zipdetails

The comment of the subroutine decode_Xceed_unicode in the script zipdetails/bin/zipdetails says Found the Null prefix.
My guess is that the reason why there appears to be a 2-byte null at the beginning of the entry name is probably because the comment length field in the 8th byte in "2.1 For central directory headers" happens to be 0.

I don't understand the perl language very well, so I don't know how to fix it. sorry.
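The layout reported in section 2 can be expressed as a small parser. This sketches the reporter's (unverified) layout, not zipdetails' decode_Xceed_unicode. It assumes the UTF-16 text is little-endian and that the counts are 16-bit code units; the function name is hypothetical.

```python
import struct

XCEED_SIG = 0x5843554E  # stored as the bytes 'N' 'U' 'C' 'X'

def decode_xceed_unicode(data: bytes, central: bool) -> dict:
    """Parse a 0x554e payload (the bytes after the tag and size fields)."""
    sig, name_units = struct.unpack_from("<IH", data, 0)
    if sig != XCEED_SIG:
        raise ValueError(f"unexpected signature {sig:08X}")
    comment_units, pos = 0, 6
    if central:
        # Central headers have a second count, for the comment, at offset 6.
        (comment_units,) = struct.unpack_from("<H", data, 6)
        pos = 8
    name_end = pos + name_units * 2
    result = {"name": data[pos:name_end].decode("utf-16-le")}
    if central:
        comment_end = name_end + comment_units * 2
        result["comment"] = data[name_end:comment_end].decode("utf-16-le")
    return result

name = "日本.txt".encode("utf-16-le")
central_payload = struct.pack("<IHH", XCEED_SIG, len(name) // 2, 0) + name
local_payload = struct.pack("<IH", XCEED_SIG, len(name) // 2) + name
central_fields = decode_xceed_unicode(central_payload, central=True)
local_fields = decode_xceed_unicode(local_payload, central=False)
```

With an empty comment, bytes 6-7 of the central payload are 00 00. A parser that applies the local layout to a central header would see them as a 2-byte null prefix in front of the name, which is consistent with the reporter's guess in section 3.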

add new Info-ZIP extra fields


          Value           Size       Description
          -----           ----       -----------
 (Stream) 0x6C78          2 bytes    Tag for this extra block type ("xl")
          Size            2 bytes    Data size for this block
          Bitmap          m bytes    Determines which fields below this
                                     point are included
          Version Made By 2 bytes    As in Central Directory File Header
          Int File Attrs  2 bytes    As in Central Directory File Header
          Ext File Attrs  4 bytes    As in Central Directory File Header

          This extra block is used to include information in the local file
          header that previously has only been included in fields in the
          central directory file header for this entry.  This extra field
          is intended for use only in the local header, not the central
          directory.

          The information in this local extra field must match the information
          in the central directory header for this entry, or this extra field
          should be considered invalid.  The purpose of this local extra field
          is to provide in the local header a copy of central directory
          information required for the proper extraction of entries so that
          entries can be extracted as they appear in the stream.

          If this streaming extra field is present in the central directory
          header, the information must match the information already in the
          central directory header as well as information in the local header
          streaming extra field, if present.


          The bitmap identifies which fields actually appear in this block.
          Fields always appear in the exact order listed in the bitmap,
          starting from byte zero/bit zero.  If fields are added to this extra
          block in the future, they will appear at the location (in the order)
          indicated by the respective bit in the bitmap.  A 1 at a bit position
          in the bitmap indicates presence of the corresponding field block,
          while a 0 indicates the field block is absent.  This allows for
          removing obsolete fields in the future.  A map bit may map to
          multiple fields, where a 1 means those fields are present.  The size
          of each field block must be determined, either as a fixed size field
          or by including a length count at the start of the field block.

          The bitmap consists of as many bytes as needed to define the
          contents of this extra block.  For each byte in the bitmap, bit 7 is
          1 if and only if there is a following byte in the bitmap.  Bit 7
          will be 0 in the last byte of the bitmap.  Currently only one map
          byte is used, so bit 7 of that byte is 0.  Bitmap bytes are stored
          in order, starting with Byte 0 immediately following Size.  For
          example, for a three byte BitMap:

            +--+--+--+--+--+--+--+--+
            |  Tag                  |
            +--+--+--+--+--+--+--+--+
            |  Tag                  |
            +--+--+--+--+--+--+--+--+
            |  Size                 |
            +--+--+--+--+--+--+--+--+
            |  Size                 |
            +--+--+--+--+--+--+--+--+
            Bitmap (showing locations of bit numbers 0, 1, and 2):
            +--+--+--+--+--+--+--+--+
            | 1|  |  |  |  | 2| 1| 0|  Byte 0
            +--+--+--+--+--+--+--+--+
            | 1|  |  |  |  |  |  |  |  Byte 1
            +--+--+--+--+--+--+--+--+
            | 0|  |  |  |  |  |  |  |  Byte 2
            +--+--+--+--+--+--+--+--+
              ^
              |
              +-------- Bit 7

            +--+--+--+--+--+--+--+--+
            |  Version Made By      |
            +--+--+--+--+--+--+--+--+
            ...

          The current field blocks and bitmap mappings are shown below.
          (MB = Map Byte (numbered starting from 0), BN = Bit Number,
          MV = Mask Value and FBS = Field Block Size = number of bytes in
          this field block when present.)  Currently only one byte is
          needed for the Bitmap.  This one byte map consists of the
          following bits ORed together:

            MB  BN   MV  FBS Description
            --  --   --  --- ------------
             0   0    1   2  "version made by" field is included
             0   1    2   2  "internal file attributes" field is included
             0   2    4   4  "external file attributes" field is included

          The bitmap to include all these fields would be the one byte (bit 0
          on the right):

            00000111

          A user of this extra block should verify that this is a correctly
          formatted block by summing the expected sizes (FBS) of each present
          field block, adding to this the size of the bitmap in bytes, and
          comparing to Size.  If they do not match, this extra block should
          not be used.



         -Info-ZIP Placeholder Extra Field:
          ================================

          This extra field holds no data.  It is only used to reserve space
          in the local header extra field block for possible use by another
          extra field.  Currently this is only used to reserve space for the
          Zip64 local extra field.  No data should ever be stored in this
          extra field and it should always be ignored.
          (Last Revision 20150605)

          Value         Size        Description
          -----         ----        -----------
  (PHold) 0x4850        Short       tag for this extra block type ("PH")
          TSize         Short       total data size for this block
          Data          Variable    see below

          The size of this extra field is selected to match the size of the
          extra field it is reserving space for.  The purpose of this extra
          field is to reserve space in the extra field block so that another
          extra field can use the space later.

          The main use currently is to reserve space for the Zip64 local
          header extra field when the size of the input data is not known
          (for instance, when the input is a stream).  If, once the data is
          read, Zip64 is needed, this Placeholder extra field can be replaced
          by the Zip64 local extra field.  If not, the Placeholder remains
          and should be ignored.  This allows rewriting the local header
          after data is read without it changing size.

          Though the contents of this extra field should never be used and so
          does not need to be specified, it is recommended that the Data field
          be written with all zero (character code 0x00) bytes.
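The validation rule at the end of the 0x6C78 description (the bitmap size plus the sizes of the present field blocks must equal Size) can be sketched as below. It follows the quoted text only, assumes `data` is exactly Size bytes long, and rejects bitmaps with bits that have no defined field block. The function name is hypothetical.

```python
import struct
from typing import List, Optional

# Map byte 0, per the table above: bit number -> (field, field block size)
FIELD_BLOCKS = {
    0: ("version made by", 2),
    1: ("internal file attributes", 2),
    2: ("external file attributes", 4),
}

def check_streaming_block(data: bytes) -> Optional[List[str]]:
    """Validate a 0x6C78 payload (the bytes after Tag and Size).

    Returns the names of the fields present, or None if the block
    should not be used.
    """
    # The bitmap continues while bit 7 of the current byte is set.
    bitmap = []
    for byte in data:
        bitmap.append(byte)
        if not byte & 0x80:
            break
    else:
        return None  # data ended before the last bitmap byte
    # Only one map byte with bits 0-2 is defined; other bits have no
    # known field block size, so the size check cannot be done.
    if len(bitmap) != 1 or bitmap[0] & ~0x07:
        return None
    present = [FIELD_BLOCKS[bit] for bit in FIELD_BLOCKS if bitmap[0] & (1 << bit)]
    if len(bitmap) + sum(size for _, size in present) != len(data):
        return None
    return [name for name, _ in present]

# Bitmap 00000111 followed by all three fields (2 + 2 + 4 bytes).
block = bytes([0x07]) + struct.pack("<HHI", 0x031E, 0x0001, 0x81ED0000)
print(check_streaming_block(block))
```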

iOS ipa files

These are zip files. Some use compression method 99 to store a payload compressed with LZFSE.

Method 99 is already registered in APPNOTE for AES encryption.

May be able to infer LZFSE if it has a signature.

Unzip detects overlap with APK file

test-services-1.1.0.apk sourced from https://issues.apache.org/jira/browse/COMPRESS-562

unzip doesn't like it

$ unzip -t !$
unzip -t test-services-1.1.0.apk
Archive:  test-services-1.1.0.apk
error [test-services-1.1.0.apk]:  missing 237 bytes in zipfile
  (attempting to process anyway)
error: invalid zip file with overlapped components (possible zip bomb)
$ zipdetails -v test-services-1.1.0.apk


00000 00004 50 4B 03 04 LOCAL HEADER #1       04034B50
Can't use an undefined value as an ARRAY reference at /media/paul/Linux-Shared/base/perl/ext/zipdetails/main/bin/zipdetails line 831.

Removal of 32 bit support break core perl i386 test t/porting/utils.t

See also:
#7
Perl/perl5#19618
pmqs/IO-Compress#45
Perl/perl5#19617

Current core fails test on 32 bit architectures with:

# Failed test 83 - utils/zipdetails compiles at porting/utils.t line 85
#      got "Integer overflow in hexadecimal number at utils/zipdetails line 1432.
Integer overflow in hexadecimal number at utils/zipdetails line 2247.
Integer overflow in hexadecimal number at utils/zipdetails line 2248.
Integer overflow in hexadecimal number at utils/zipdetails line 2249.
Integer overflow in hexadecimal number at utils/zipdetails line 2250.
Integer overflow in hexadecimal number at utils/zipdetails line 2251.
Integer overflow in hexadecimal number at utils/zipdetails line 2252.
Integer overflow in hexadecimal number at utils/zipdetails line 2253.
Integer overflow in hexadecimal number at utils/zipdetails line 2254.
utils/zipdetails syntax OK
"
# expected "utils/zipdetails syntax OK
"
t/porting/utils .................................................. FAILED at test 83

I created Perl/perl5#19618 which detects the warning produced by this patch.

Add more test for directory entries

  • Check for trailing "/" when External Attributes has directory flag(s) set.
  • Check if directory bit(s) set in External Attributes when trailing "/" is present.
  • Check that directory entry does not have uncompressed payload (APPNOTE 6.3.10, sec 4.3.8)
  • Check if compressed payload present & uncompresses to zero bytes.
  • Check that Extract Version is set to 2.0 or greater (APPNOTE 6.3.10, sec 4.4.3.2)

Notes for behaviour of unzip executables when dealing with directories without the trailing "/"

  • On Linux Info-ZIP 6.0 will extract it to an empty file
  • On Linux, 7z & bsdtar (aka libarchive) extract it as a directory if the Unix bit is set, but not if only the DOS flag is set
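The first two checks come down to comparing the trailing "/" with the directory bits. A sketch assuming the MS-DOS directory attribute is 0x10 in the low byte and that the high 16 bits hold a Unix mode only when the entry was created on Unix; `directory_mismatch` is a hypothetical name.

```python
import stat
from typing import Optional

DOS_DIRECTORY = 0x10  # MS-DOS directory attribute bit

def directory_mismatch(filename: str, ext_attrs: int, unix_host: bool) -> Optional[str]:
    """Report disagreement between a trailing '/' and the directory bits."""
    has_slash = filename.endswith("/")
    dos_dir = bool(ext_attrs & DOS_DIRECTORY)
    unix_dir = unix_host and stat.S_ISDIR(ext_attrs >> 16)
    flagged = dos_dir or unix_dir
    if flagged and not has_slash:
        return "directory attribute set but no trailing '/'"
    if has_slash and not flagged:
        return "trailing '/' but no directory attribute set"
    return None

print(directory_mismatch("docs", 0x41ED0010, unix_host=True))
```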

Fails if timezone is not UTC

It seems that there are timezone-related test failures in App-zipdetails-2.105. E.g. with TZ=Europe/Berlin there are differences in the "Last Mod Time" output:

# 000A Last Mod Time         5277983D 'Tue Mar 23 18:01:58 2021'

vs

# 000A Last Mod Time         5277983D 'Tue Mar 23 19:01:58 2021'

NTFS Timestamps displayed in wrong order

APPNOTE 6.3.9 section 4.5.3 says the order should be modification, access, creation:

         Tag        Size       Description
         -----      ----       -----------
         0x0001     2 bytes    Tag for attribute #1 
         Size1      2 bytes    Size of attribute #1, in bytes
         Mtime      8 bytes    File last modification time
         Atime      8 bytes    File last access time
         Ctime      8 bytes    File creation time

The code fetches & displays the timestamps in the order modification, creation, access.
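Reading attribute #1 in the order given above (Mtime, Atime, Ctime) is straightforward. Each value is a Windows FILETIME, a count of 100-nanosecond intervals since 1601-01-01 UTC. A sketch that assumes the 24 bytes after Tag and Size1 are passed in; the names are hypothetical.

```python
import struct
from datetime import datetime, timedelta, timezone

# FILETIME values count 100-nanosecond intervals since 1601-01-01 UTC.
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)

def filetime_to_datetime(filetime: int) -> datetime:
    # Integer division by 10 converts 100 ns ticks to whole microseconds.
    return FILETIME_EPOCH + timedelta(microseconds=filetime // 10)

def decode_ntfs_attribute1(data: bytes) -> dict:
    """Decode NTFS attribute #1: three 64-bit FILETIMEs in APPNOTE order."""
    mtime, atime, ctime = struct.unpack_from("<QQQ", data, 0)
    return {
        "modification": filetime_to_datetime(mtime),
        "access": filetime_to_datetime(atime),
        "creation": filetime_to_datetime(ctime),
    }

UNIX_EPOCH_FT = 116444736000000000        # 1970-01-01 00:00:00 UTC
ONE_DAY_FT = 24 * 60 * 60 * 10_000_000    # one day in 100 ns ticks
attr = struct.pack("<QQQ", UNIX_EPOCH_FT + 2 * ONE_DAY_FT,
                   UNIX_EPOCH_FT + ONE_DAY_FT, UNIX_EPOCH_FT)
for label, when in decode_ntfs_attribute1(attr).items():
    print(label, when.isoformat())
```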

Decoding of DOS Attributes in External Attributes is wrong & incomplete

The low 16 bits of the External Attributes in the Central header should map to DOS attributes. Decoding of this field uses an incorrect and incomplete bitmask.

The code uses 0x0100 for Offline, 0x0200 for Not Indexed and 0x0400 for Encrypted. The values should be 0x1000, 0x2000 and 0x4000 respectively.

The field names for the 0x0100, 0x0200, 0x0400 and 0x0800 values should be Temporary, Sparse, Reparse Point and Compressed.

In practice the 0x4000 and 0x8000 bits are used for a different purpose.

7z/p7zip uses the 0x8000 bit to signal that the high 16-bits are Unix attributes.

Mac/iOS uses 0x4000 bit for some unknown purpose.
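The corrected mapping can be sketched as a lookup table. The low-byte values are the standard MS-DOS attributes, and the 0x0100 to 0x4000 values are the ones given in this issue (they match the Windows FILE_ATTRIBUTE_* constants). The 0x8000 handling follows the 7z/p7zip note above. The names are hypothetical and this is not the zipdetails code.

```python
DOS_ATTRIBUTES = [
    (0x0001, "Read Only"),
    (0x0002, "Hidden"),
    (0x0004, "System"),
    (0x0008, "Volume Label"),
    (0x0010, "Directory"),
    (0x0020, "Archive"),
    (0x0100, "Temporary"),
    (0x0200, "Sparse"),
    (0x0400, "Reparse Point"),
    (0x0800, "Compressed"),
    (0x1000, "Offline"),
    (0x2000, "Not Indexed"),
    (0x4000, "Encrypted"),   # reportedly reused by Mac/iOS archivers
]
P7ZIP_UNIX_EXTENSION = 0x8000  # 7z/p7zip: high 16 bits hold Unix attributes

def decode_dos_attributes(ext_attrs: int) -> list:
    """Name the attribute bits set in the low 16 bits of Ext File Attributes."""
    low = ext_attrs & 0xFFFF
    names = [name for bit, name in DOS_ATTRIBUTES if low & bit]
    if low & P7ZIP_UNIX_EXTENSION:
        names.append("Unix attributes in high 16 bits (7-Zip)")
    return names

print(decode_dos_attributes(0x0021))  # ['Read Only', 'Archive']
```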


Modifying header

Hi
Do you plan to add the possibility of modifying the header? Like atime, ctime, mtime etc.
Thanks.
