When writing a DMG (original Game Boy) emulator I thought about how its tiny sprites are often similar. A sprite is a series of bytes with an address pointing to its beginning.
It got me thinking about a kind of compression algorithm that could find byte overlaps, re-pack the sprites, and then adjust the addresses.
This got me thinking: are there any compression algorithms that do not require decompression? You pack them once and then just point at slices.
Then this got me thinking: is that just what unzipping is doing, just eagerly?
That is how basic Lempel-Ziv compression works, which is one of the two fundamental techniques used in data compression. You keep a history of what you've already decompressed, and if you find something you already have, you instead encode a pointer to it. Variants of this (LZW, LZ77, LZSS, and a bajillion custom variants, because game developers like to reinvent their own compression) have been used in many, many games.
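A rough sketch of the idea in Python (an LZSS-style toy, not any real game format; the parameters are made up): literals are emitted as-is, and repeats become (distance, length) pointers into the already-decompressed history.

```python
def compress(data, min_match=3, max_match=18, window=4096):
    out, i = [], 0
    while i < len(data):
        # search the sliding window for the longest match ending before i
        best_len, best_dist = 0, 0
        for j in range(max(0, i - window), i):
            length = 0
            while (length < max_match and i + length < len(data)
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - j
        if best_len >= min_match:
            out.append(("ref", best_dist, best_len))
            i += best_len
        else:
            out.append(("lit", data[i]))
            i += 1
    return out

def decompress(tokens):
    out = bytearray()
    for t in tokens:
        if t[0] == "lit":
            out.append(t[1])
        else:  # copy length bytes from dist bytes back in our own output
            _, dist, length = t
            for _ in range(length):
                out.append(out[-dist])
    return bytes(out)
```

Note that the decompressor never sees the original data, only its own output so far, which is why overlapping copies (dist smaller than length) still work.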
The other is entropy coding, which means re-writing bytes and pointer codes into variable-length values optimized by frequency: fewer bits for common codes, more bits for uncommon ones. The most basic approach is Huffman coding, which is kind of like variable-length binary phone numbers: no code is a prefix of another, so you always know where one stops.
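A minimal Huffman code builder, to make the phone-number analogy concrete (a textbook sketch, not production code): merge the two lightest nodes until one tree remains, and read codes off the merge path.

```python
import heapq
from collections import Counter

def huffman_codes(data):
    # one weighted leaf per symbol: [frequency, [symbol, code-so-far]]
    heap = [[freq, [sym, ""]] for sym, freq in Counter(data).items()]
    heapq.heapify(heap)
    if len(heap) == 1:  # degenerate single-symbol input
        return {heap[0][1][0]: "0"}
    while len(heap) > 1:
        lo = heapq.heappop(heap)
        hi = heapq.heappop(heap)
        for pair in lo[1:]:  # everything under the lighter node gets a 0 bit
            pair[1] = "0" + pair[1]
        for pair in hi[1:]:  # everything under the heavier node gets a 1 bit
            pair[1] = "1" + pair[1]
        heapq.heappush(heap, [lo[0] + hi[0]] + lo[1:] + hi[1:])
    return {sym: code for sym, code in heap[0][1:]}
```

Running it on "aaaabbc" gives "a" a 1-bit code and "b"/"c" 2-bit codes, and no code is a prefix of another.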
Together, these two techniques are how DEFLATE compression works, which is the algorithm used by zip, gzip, and zlib. zlib is then used in many, many formats, like PNG and PDF.
Trivial "look for dupes and subsets and just change pointers" compression is already standard when building C code. If your program has two strings, "foobar" and "bar", the linker will just store "foobar" in the strings section and point 3 bytes into it for "bar", assuming the right compilation/linking options to make it work.
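A toy version of that linker trick (the function name and NUL-terminated pool layout are just assumptions for illustration): place longer strings first, then let any string that already appears as a terminated suffix become an offset into the pool.

```python
def pack_strings(strings):
    pool = ""
    offsets = {}
    # place longest strings first so shorter suffixes can reuse them
    for s in sorted(strings, key=len, reverse=True):
        idx = pool.find(s + "\0")
        if idx == -1:
            offsets[s] = len(pool)
            pool += s + "\0"  # NUL-terminated, like a C strings section
        else:
            offsets[s] = idx  # point into an existing longer string
    return pool, offsets
```

With ["foobar", "bar"] the pool is just "foobar\0" and "bar" gets offset 3, exactly the suffix-sharing the comment describes. (It only works for suffixes because of the terminator, which is also why real linkers can only merge tails.)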
There was a similar trick used for compressing game backgrounds in Commodore 64 games.
The compressor would create a special font (C64 "charsets" are 8x8 pixels), and slice the picture into 8x8 tiles, where each unique tile maps to a character code (C64 screen codes, rather than ASCII). Although the C64's hires resolution was 320x200 (160x200 in multicolor mode), because of repeating blocks a picture would usually end up needing fewer than 256 unique characters. The actual background would then be drawn using those characters with the special font. There is no eager decompression involved.
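The tile-dedup step can be sketched like this (a generic version, not C64-specific; `bitmap` is assumed to be a pixel grid with dimensions divisible by 8):

```python
def tiles_to_charset(bitmap, tile=8):
    rows, cols = len(bitmap), len(bitmap[0])
    charset, screen, index = [], [], {}
    for ty in range(0, rows, tile):
        row = []
        for tx in range(0, cols, tile):
            # cut out one 8x8 block as a hashable tuple of rows
            block = tuple(tuple(bitmap[ty + y][tx + x] for x in range(tile))
                          for y in range(tile))
            if block not in index:  # first time seen: new "character"
                index[block] = len(charset)
                charset.append(block)
            row.append(index[block])
        screen.append(row)
    return charset, screen
```

`charset` plays the role of the special font and `screen` is the character map that draws the background; repeated blocks cost one byte each instead of eight.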
For the n+1th cart, it was always the remaining cart with the smallest diff against cart n.
Picking the first n was more difficult. A strong hint was the ROM name containing a variant of the word Japan. Otherwise, imagine you had 4 images: you could diff them 16 ways, but half of those ways are mirror images of each other, i.e. a diff of ROM 2->4 is the inverse of a diff of ROM 4->2.
So ideally, the first ancestor would be the image that yields the smallest diffs against the most other images.
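One way to sketch that heuristic (everything here is hypothetical: `diff_size` is a stand-in byte-difference metric, and the voting scheme is just one reading of "most small children"): each image votes for its nearest neighbour, and the image with the most votes becomes the base.

```python
def diff_size(a, b):
    # crude stand-in for a real binary diff: count differing bytes
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))

def pick_base(images):
    votes = [0] * len(images)
    for i, img in enumerate(images):
        nearest = min((j for j in range(len(images)) if j != i),
                      key=lambda j: diff_size(img, images[j]))
        votes[nearest] += 1
    # the image most often chosen as "closest parent" wins
    return max(range(len(images)), key=lambda i: votes[i])
```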
Iwata famously implemented RLE in Pokemon Gold/Silver. That gains an easy 50% from all the whitespace.
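RLE itself is about as simple as compression gets: a sketch as (count, byte) pairs, where long runs of a single fill byte collapse to two values.

```python
def rle_encode(data):
    out, i = [], 0
    while i < len(data):
        run = 1
        # extend the run while the byte repeats (capped so count fits a byte)
        while i + run < len(data) and data[i + run] == data[i] and run < 255:
            run += 1
        out.append((run, data[i]))
        i += run
    return out

def rle_decode(pairs):
    return bytes(b for run, b in pairs for _ in range(run))
```

Fifty zero bytes followed by one 0x01 become just [(50, 0), (1, 1)], which is where the easy wins on blank space come from.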
Huffman coding would be the simplest algorithm to compress the repeated patterns in sprites, but I think even loading the decoding table into memory would run into limitations on these old systems. If nothing else it would cause hitching.
Yes. Check out succinct data structures. They are closely related to compression algorithms. The compression algorithms mentioned above (including entropy coding and all LZ variants) need a decompression step. They might improve decompression speed by chunking, at the cost of storage space, but decompression is still needed. This is unrelated to how they remove redundant information from the data (e.g. using pointers). Succinct data structures avoid decompression entirely: again at some cost in storage space, but some operations can be done on the succinct data directly, as it sits on disk.
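A toy example in the succinct spirit (a tiny rank-support structure; real implementations use multiple levels and bit tricks): store the bits plus small precomputed block counts, and answer rank(i), the number of 1s in bits[:i], without expanding anything.

```python
BLOCK = 8  # block size for the precomputed counts (illustrative choice)

class RankBits:
    def __init__(self, bits):
        self.bits = bits
        # cumulative count of 1s at each block boundary
        self.block_rank = [0]
        for i in range(0, len(bits), BLOCK):
            self.block_rank.append(self.block_rank[-1] + sum(bits[i:i + BLOCK]))

    def rank(self, i):
        # whole blocks come from the table, the tail from a short scan
        b = i // BLOCK
        return self.block_rank[b] + sum(self.bits[b * BLOCK:i])
```

The point is that queries run against the stored representation itself; there is never a decompressed copy, which is the property the original question was after.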