meta:
  id: gzip
  file-extension: gz
  xref:
    forensicswiki: gzip
    justsolve: Gzip
    mime: application/gzip
    pronom: x-fmt/266
    rfc: 1952
    wikidata: Q10287816
  license: CC0-1.0
  endian: le
doc: |
  Gzip is a popular, standard single-file archiving format. It
  essentially provides a container that stores the original file name,
  a timestamp and a few other things (like an optional comment), basic
  CRCs, etc., and a file compressed by a chosen compression algorithm.

  As of 2019, only one compression method is in practical use, so the
  body of a gzipped file is typically a raw DEFLATE stream (without
  zlib header).
doc-ref: https://www.rfc-editor.org/rfc/rfc1952
seq:
  - id: magic
    -orig-id: ID1, ID2
    contents: [0x1f, 0x8b]
  - id: compression_method
    -orig-id: CM
    type: u1
    enum: compression_methods
    doc: |
      Compression method used to compress file body. In practice, only
      one method is widely used: 8 = deflate.
  - id: flags
    -orig-id: FLG
    type: flags
  - id: mod_time
    -orig-id: MTIME
    type: u4
    doc: Last modification time of the archived file, as a UNIX timestamp.
  - id: extra_flags
    -orig-id: XFL
    type:
      switch-on: compression_method
      cases:
        'compression_methods::deflate': extra_flags_deflate
    doc: Extra flags, specific to the compression method chosen.
  - id: os
    -orig-id: OS
    type: u1
    enum: oses
    doc: OS used to compress this file.
  - id: extras
    type: extras
    if: flags.has_extra
  - id: name
    terminator: 0
    if: flags.has_name
  - id: comment
    terminator: 0
    if: flags.has_comment
  - id: header_crc16
    type: u2
    if: flags.has_header_crc
  - id: body
    size: _io.size - _io.pos - 8
    doc: |
      Compressed body of the archived file. Note that we don't make an
      attempt to decompress it here.
  - id: body_crc32
    -orig-id: CRC32
    type: u4
    doc: |
      CRC32 checksum of the uncompressed file body
  - id: len_uncompressed
    -orig-id: ISIZE
    type: u4
    doc: |
      Size of original uncompressed data in bytes (truncated to 32 bits).
enums:
  compression_methods:
    8: deflate
  oses:
    0:
      id: fat
      doc: FAT filesystem (MS-DOS, OS/2, NT/Win32)
    1:
      id: amiga
      doc: Amiga
    2:
      id: vms
      doc: VMS (or OpenVMS)
    3:
      id: unix
      doc: Unix
    4:
      id: vm_cms
      doc: VM/CMS
    5:
      id: atari_tos
      doc: Atari TOS
    6:
      id: hpfs
      doc: HPFS filesystem (OS/2, NT)
    7:
      id: macintosh
      doc: Macintosh
    8:
      id: z_system
      doc: Z-System
    9:
      id: cp_m
      doc: CP/M
    10:
      id: tops_20
      doc: TOPS-20
    11:
      id: ntfs
      doc: NTFS filesystem (NT)
    12:
      id: qdos
      doc: QDOS
    13:
      id: acorn_riscos
      doc: Acorn RISCOS
    255:
      id: unknown
types:
  flags:
    seq:
      - id: reserved1
        type: b3
      - id: has_comment
        -orig-id: FCOMMENT
        type: b1
      - id: has_name
        -orig-id: FNAME
        type: b1
      - id: has_extra
        -orig-id: FEXTRA
        type: b1
        doc: If true, optional extra fields are present in the archive.
      - id: has_header_crc
        -orig-id: FHCRC
        type: b1
        doc: |
          If true, this archive includes a CRC16 checksum for the header.
      - id: is_text
        -orig-id: FTEXT
        type: b1
        doc: |
          If true, the file inside this archive is a text file from the
          compressor's point of view.
  extra_flags_deflate:
    seq:
      - id: compression_strength
        type: u1
        enum: compression_strengths
    enums:
      compression_strengths:
        2: best
        4: fast
  extras:
    seq:
      - id: len_subfields
        -orig-id: XLEN
        type: u2
      - id: subfields
        size: len_subfields
        type: subfields
  subfields:
    doc: |
      Container for many subfields, constrained by size of stream.
    seq:
      - id: entries
        type: subfield
        repeat: eos
  subfield:
    doc: |
      Every subfield follows the typical [TLV scheme](https://en.wikipedia.org/wiki/Type-length-value):

      * `id` serves the role of "T"ype
      * `len_data` serves the role of "L"ength
      * `data` serves the role of "V"alue

      This way, an arbitrary parser can skip over subfields it does
      not support.
    seq:
      - id: id
        -orig-id: SI1, SI2
        type: u2
        doc: Subfield ID, typically two ASCII letters.
      - id: len_data
        type: u2
      - id: data
        size: len_data