Archiving and compression

The traditional Unix archiving and compression tools are separated according to the Unix philosophy:

  • A file archiver combines several files into one archive file, e.g. tar.
  • A compression tool compresses and decompresses data, e.g. gzip.

These tools are often used in sequence: first an archive file is created, then it is compressed.

There are also tools that do both; these tend to additionally offer encryption, error detection and recovery.
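The two-step workflow described above can be sketched as follows. This is only an illustration, assuming tar and gzip are installed; the directory and file names (project, a.txt, b.txt) are made up for the example and everything happens in a temporary directory.

```shell
# Work in a throwaway directory so nothing outside it is touched.
workdir=$(mktemp -d)
mkdir "$workdir/project"
printf 'hello\n' > "$workdir/project/a.txt"
printf 'world\n' > "$workdir/project/b.txt"

# Step 1: the archiver combines the files into one uncompressed archive.
tar -cf "$workdir/project.tar" -C "$workdir" project

# Step 2: the compression tool compresses that single file,
# replacing project.tar with project.tar.gz.
gzip "$workdir/project.tar"

# Reverse the two steps in a pipeline: decompress to stdout, then list.
listing=$(gzip -dc "$workdir/project.tar.gz" | tar -tf -)
printf '%s\n' "$listing"
```

Because the two tools only communicate through a file or a pipe, gzip could be swapped for any other compression tool without changing the tar step.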

Archiving only

| Name | Package | Manuals | Description |
| --- | --- | --- | --- |
| GNU tar | tar | tar(1), info | Core utility for manipulating the ubiquitous tar archives (tarballs). |
| libarchive | libarchive | bsdtar(1), bsdcpio(1) | Implementation of tar and cpio that also offers a library. Used by pacman and mkinitcpio. |
| ar | binutils | ar(1) | Legacy Unix archiver before tar. Today only used for creating static library files. |
| GNU cpio | cpio | cpio(1), info | File archiver via stdin/stdout, supports cpio and tar formats. |

Tip: Both GNU and BSD tar automatically delegate decompression for bzip2, compress, gzip, lzip, lzma, lzop, zstd, and xz compressed archives. Only BSD tar supports lz4 natively (but GNU tar can do the equivalent with --use-compress-program=lz4 or -Ilz4). When creating archives, both support the -a switch to automatically filter the created archive through the right compression program based on the file extension. While BSD tar recognizes compression formats from the file contents, GNU tar only guesses based on the file extension.
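A small sketch of the behaviour described in the tip, assuming GNU tar or BSD tar plus gzip are installed. The names docs and readme.txt are invented for the example.

```shell
workdir=$(mktemp -d)
mkdir "$workdir/docs" "$workdir/out"
printf 'sample text\n' > "$workdir/docs/readme.txt"

# -a picks the compressor from the archive suffix (.gz here, so gzip).
tar -caf "$workdir/docs.tar.gz" -C "$workdir" docs

# No compression flag is needed when reading: tar hands the
# decompression over to the matching tool on its own.
tar -xf "$workdir/docs.tar.gz" -C "$workdir/out"

# gzip -t succeeds only if the file really is valid gzip data.
gzip -t "$workdir/docs.tar.gz" && echo "archive is gzip-compressed"
cat "$workdir/out/docs/readme.txt"
```

Changing the suffix to .tar.xz or .tar.zst should select the corresponding tool in the same way, provided that tool is installed.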

See also the "Archiving only usage" section.

Compression tools

Compression only

These compression programs implement their own file format.

| Name | Package | Manual | Ext | Tar ext | Description | Parallel implementations |
| --- | --- | --- | --- | --- | --- | --- |
| bzip2 | bzip2 | bzip2(1) | .bz2, .bz | .tbz2, .tbz | Uses the Burrows–Wheeler algorithm. | lbzip2, pbzip2 |
| bzip3 | bzip3 | bzip3(1) | .bz3 | .tbz3 | Uses the Burrows–Wheeler algorithm. |  |
| gzip | gzip | gzip(1) | .gz, .z | .tgz, .taz | GNU zip, based on DEFLATE algorithm. | pigz, bgzip(1) (part of htslib (AUR)), crabz (AUR), python-rapidgzip (AUR) |
| lrzip | lrzip | lrzip(1) | .lrz |  | Improved version of rzip, uses multiple algorithms. | Is multithreaded by default. |
| LZ4 | lz4 | lz4(1) | .lz4 |  | Written in C, focused on compression and decompression speed. | Is multithreaded by default. See https://lz4.org/ for alternatives. |
| lzip | lzip | lzip(1) | .lz |  | Uses LZMA. | plzip (AUR) |
| lzop | lzop | lzop(1) | .lzo | .tzo | Uses the LZO library (lzo). |  |
| xz | xz | xz(1) | .xz, .lzma | .txz, .tlz | Uses LZMA2. Default for GNU coreutils and kernel archive files. | Is multithreaded by default. An alternative is pixz. |
| zstd | zstd | zstd(1) | .zst |  | Uses Zstandard algorithm. | Is multithreaded. |

  • Parallel implementations offer improved speeds by using multiple CPU cores.
  • The Tar ext column lists the extensions of compressed archives made with both tar and the compression tool, e.g. .tzo is short for .tar.lzo.
  • See also the "Compression only usage" section.

Archiving and compression

| Name | Packages | Manuals | Ext | Description |
| --- | --- | --- | --- | --- |
| 7-Zip | 7zip | Official manual | .7z | A file archiver with a high compression ratio. |
| DAR | dar (AUR) | dar(1) | .dar | Archiver to backup large live filesystems, takes care of hard links, extended attributes, sparse files and inode types. |
| tar | tar | tar(1), info | .tar.compression-type | tar has built-in compression options. See tar(1) §Compression_options. |
| RAR | rar (AUR), unrar | rar(1) | .rar | Both the format and the rar utility are proprietary. |
| t2sz | t2sz (AUR) |  | .tar.zst, .tzst | Tar archiving utility in C with member-aligned zstd compression. |
| tarlz | tarlz (AUR) | tarlz(1) | .tar.lz, .tlz | Tar archiving utility in C++ with member-aligned lzip compression. |
| ZIP | zip, unzip | zip(1), unzip(1) | .zip | Widely used outside of the Linux world. |
| Unarchiver | unarchiver | unar(1), lsar(1) | many | Command-line tool of a Mac application, supports over 40 archive formats. |
| ZPAQ | zpaq (AUR) | zpaq(1) | .zpaq | A high compression ratio archiver written in C++, uses several algorithms. |
| LHa | lhasa, lha (AUR) | lha(1) | .lzh (on Amiga: .lha) | LZH/LHA archiver, supports the lh7-method. |
| WinAce | unace | unace(1) | .ace | Both the ACE file format and the archiving tool are proprietary. |

See also the "Archiving and compression usage" section.

Feature charts

Some of the tools above are capable of handling multiple formats, allowing for fewer installed packages.

Decompress

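The columns gzip through zstd are single-file formats; ZIP through CAB are archive formats.

| Name | gzip | bzip2 | LZMA | xz | zstd | ZIP | RAR | 7z | CAB |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| gzip | Yes | No | No | No | No | Partial [1] | No | No | No |
| 7zip | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| unarchiver | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes |
| zstd | Yes | No | Yes | Yes | Yes | No | No | No | No |
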
  1. gzip's gunzip can only decompress single member ZIP files.

Usage comparison

Archiving only usage

tar(1)
  • Create archive: tar cfv archive.tar file1 file2
  • Extract archive: tar xfv archive.tar
  • List content: tar -tvf archive.tar

cpio(1)
  • Create archive: ls file1 file2 | cpio -o > archive.cpio
  • Extract archive: cpio -i -vd < archive.cpio
  • List content: cpio -t < archive.cpio
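As a rough illustration, the tar commands above can be strung together into a create, list and extract round trip. The sketch assumes GNU tar or BSD tar; the src and restore directories are invented for the example, and the verbose flag is left out to keep the output short.

```shell
workdir=$(mktemp -d)
mkdir "$workdir/src" "$workdir/restore"
printf 'first\n' > "$workdir/src/file1"
printf 'second\n' > "$workdir/src/file2"

# Create: c = create, f = the archive name follows.
(cd "$workdir/src" && tar cf "$workdir/archive.tar" file1 file2)

# List: t = print the table of contents.
contents=$(tar tf "$workdir/archive.tar")
printf '%s\n' "$contents"

# Extract into a separate directory; -C changes directory before extracting.
tar xf "$workdir/archive.tar" -C "$workdir/restore"

# cmp exits non-zero if the restored file differs from the original.
cmp "$workdir/src/file1" "$workdir/restore/file1" && echo "file1 restored intact"
```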

Compression only usage

| Name | Compress | Decompress | Decompress to stdout |
| --- | --- | --- | --- |
| bzip2(1) | bzip2 file | bzip2 -d file.bz2 | bzcat file.bz2 |
| gzip(1) | gzip file | gzip -d file.gz | zcat file.gz |
| lrzip(1) | lrzip file | lrzip -d file.lrz | lrzcat file.lrz |
|  | lrztar folder | lrztar -d folder.tar.lrz |  |
| lz4(1) | lz4 file | lz4 file.lz4 | lz4cat file.lz4 |
| xz(1) | xz file | xz -d file.xz | xzcat file.xz |
| zstd(1) | zstd file | zstd -d file.zst | zstdcat file.zst |
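The three operations can be tried out with gzip as a representative example. This is a minimal sketch assuming GNU gzip; the file name and its contents are placeholders.

```shell
workdir=$(mktemp -d)
printf 'line one\nline two\n' > "$workdir/file"
cp "$workdir/file" "$workdir/file.orig"   # kept only for the comparison below

# Compress: gzip replaces file with file.gz.
gzip "$workdir/file"

# Decompress to stdout: the .gz file is left untouched.
# (With GNU gzip, zcat file.gz behaves like gzip -dc file.gz.)
streamed=$(gzip -dc "$workdir/file.gz")
printf '%s\n' "$streamed"

# Decompress in place: file.gz is replaced by file again.
gzip -d "$workdir/file.gz"

cmp "$workdir/file" "$workdir/file.orig" && echo "round trip OK"
```

The other tools in this section follow the same pattern with their own command names and extensions.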

Archiving and compression usage

| Name | Compress | Decompress | Decompress to stdout | List content |
| --- | --- | --- | --- | --- |
| 7z | 7z a archive.7z file1 file2 | 7z x archive.7z | 7z e -so archive.7z file1 | 7z l archive.7z |
| dar(1) | dar -c archive -g file1 -g file2 | dar -x archive |  | dar -l archive |
| tar(1) | tar acvf archive.format file1 file2 | tar xfv archive.format | tar xfvO archive.format | tar tvf archive.format |
| rar(1) | rar a archive.rar file1 file2 | rar x archive.rar | rar p -inul archive.rar file1 | rar l archive.rar |
| zip(1), unzip(1) | zip archive.zip file1 file2 | unzip archive.zip | unzip -p archive.zip file1 | unzip -l archive.zip |
| lha(1) | lha ao7 archive.lzh file1 file2 | lha x archive.lzh |  | minimal: lha l archive.lzh; verbose: lha v archive.lzh |
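To make the "Decompress to stdout" column more concrete, here is a hedged sketch using tar with gzip compression. It assumes GNU tar or BSD tar; the site directory and its two files are hypothetical.

```shell
workdir=$(mktemp -d)
mkdir "$workdir/site"
printf 'index page\n' > "$workdir/site/index.html"
printf 'body { margin: 0 }\n' > "$workdir/site/style.css"

# Compress: a selects the compressor from the .tar.gz suffix.
tar -caf "$workdir/site.tar.gz" -C "$workdir" site

# List content without extracting anything.
tar -tf "$workdir/site.tar.gz"

# Decompress to stdout: -O writes the named member to stdout
# instead of creating it on disk.
page=$(tar -xOf "$workdir/site.tar.gz" site/index.html)
printf '%s\n' "$page"
```

Naming a member, as done here, limits the output to one file; without it, the contents of all members are written to stdout one after another.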

Convenience tools

  • atool: Script for managing file archives of various types.
    https://www.nongnu.org/atool/ || atool
  • dtrx: An intelligent archive extraction tool.
    https://github.com/brettcs/dtrx || dtrx (AUR)
  • J7Z: GUI for Linux in Java which attempts to simplify data compression and backup. It can create 7z, BZip2, Zip, GZip, Tar archives.
    http://j7z.xavion.name || j7z (AUR)
  • ouch: A command line utility for easily compressing and decompressing files and directories.
    https://github.com/ouch-org/ouch || ouch
  • python-unp: Command line tool that can unpack archives easily.
    https://github.com/mitsuhiko/unp || python-unp (AUR)
  • unp: A script for unpacking a wide variety of archive formats.
    https://tracker.debian.org/pkg/unp || unp
  • unpack: Wrapper script for handling multiple archive formats.
    https://github.com/githaff/unpack || unpack-git (AUR)
  • patool: Allows various archive types to be created, extracted, tested, listed, compared, searched and repacked.
    https://wummel.github.io/patool/ || patool (AUR)

Determining archive format

To extract an archive, its file format needs to be determined. If the file is properly named you can deduce its format from the file extension.

Otherwise you can use the file tool, see file(1).
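A short sketch of identifying a file whose name gives no hint. It assumes gzip, head and od are available; file(1) is used only if it is installed, and the manual check relies on the fact that gzip data starts with the bytes 1f 8b. The name mystery.bin is invented for the example.

```shell
workdir=$(mktemp -d)
printf 'mystery contents\n' > "$workdir/notes.txt"

# Create gzip data under a misleading name with no useful extension.
gzip -c "$workdir/notes.txt" > "$workdir/mystery.bin"

# file(1) inspects the content rather than the name (skipped if not installed).
if command -v file >/dev/null 2>&1; then
    file "$workdir/mystery.bin"
fi

# Manual fallback: read the first two bytes and compare them
# against the gzip signature.
magic=$(head -c 2 "$workdir/mystery.bin" | od -An -tx1 | tr -d ' \n')
if [ "$magic" = "1f8b" ]; then
    detected=gzip
else
    detected=unknown
fi
echo "detected format: $detected"
```

In practice file(1) is the more convenient choice, since it knows the signatures of many more formats than a hand-written check would cover.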

Esoteric, rare or deprecated tools

| Name | Packages | Ext | Description |
| --- | --- | --- | --- |
| ARC | arc (AUR) | .arc, .ark | Was very popular during the early days of the dial-up BBS. Superseded by ZIP. |
| ARJ | arj | .arj | An archiver used on DOS/Windows in the mid-1990s. This is an open source clone. |
| Cabinet | cabextract, unshield | .cab, .exe | A variety of installation technologies in Windows use the CAB format. |
| compress | ncompress | .Z | The de facto standard UNIX compression utility that succeeded the Huffman-based pack(1), before gzip took over. |
| Inno Setup | innoextract | .exe | Installers created by Inno Setup. |
| PAR2 | par2cmdline | .par2 | Parity archiver for increased data integrity. See also Parchive. |
| shar | sharutils | .shar | Creates self-extracting archives that are valid shell scripts. |
| Zoo | zoo (AUR) | .zoo | Was mostly popular on the OpenVMS operating system before PKZIP became popular. |

File system compression

Some file systems support on-the-fly compression of file data:

  • Btrfs can be configured to compress individual files, directories, or entire volumes by default.
  • On ZFS, compression can be enabled on pools or file systems.

Device mapper compression

There is work being done to mainline (integrate into the Linux kernel project) the open-sourced VDO project, which provides a deduplication and compression device mapper layer in the interest of increasing storage efficiency. The following packages are available:

  • vdo: Userspace tools for managing VDO volumes.
    https://github.com/dm-vdo/vdo || vdo (AUR)
  • kvdo: A pair of kernel modules which provide pools of deduplicated and/or compressed block storage.
    https://github.com/dm-vdo/kvdo || kvdo-dkms (AUR, package not found)

Compression libraries

  • Brotli: Compression algorithm for data streams using the LZ77 algorithm, Huffman coding and 2nd order context modeling.
    https://github.com/google/brotli || brotli
  • libzip: Provides creation and extraction of ZIP files. Used by KDE and Deepin in place of the zip/unzip tools.
    https://libzip.org || libzip
  • zlib: Compression library implementing the deflate compression method found in gzip and PKZIP.
    https://www.zlib.net/ || zlib
  • Zopfli: High compression ratio file compressor from Google, using a deflate-compatible algorithm called zopfli.
    https://github.com/google/zopfli || zopfli-git (AUR)

Troubleshooting

Garbled file names when extracted

See the Troubleshooting section of the Character encoding article.

This article is issued from Archlinux. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.