Compression Algorithms


InfoJunkie4Life
Dec 21, 2009, 05:48 AM
Hey, guys. I just had this thought: what if you could create an algorithm that compresses data, and upon retrieval the result could be expanded into several possible files? My theory is that with a polynomial-like algorithm you could compress data at much higher ratios. The catch is that decompression would yield several possible results, but if you think about it, there may be ways around that.

If the file were sequentially compressed, the decompression agent could keep a database of known file headers, much like a data recovery program. It would decompress only the header for each possible variation and compare it against the known headers until it found a match. You could even store an uncompressed copy of the header as part of the file, along with a SHA-2 hash to compare against. That would prevent a known header from randomly popping up in the wrong decompression.
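
A rough sketch of that verification step, assuming a hypothetical source of candidate decompressions; the magic-number table and the stored SHA-2 value are illustrative only:

```python
import hashlib

# A few well-known file signatures (magic numbers), like a data recovery
# tool would keep.  Illustrative subset only.
KNOWN_HEADERS = {
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"%PDF-":             "PDF document",
    b"PK\x03\x04":        "ZIP archive",
    b"\x1f\x8b":          "gzip data",
}

def looks_legitimate(candidate):
    """Return a file-type name if the candidate starts with a known header."""
    for magic, name in KNOWN_HEADERS.items():
        if candidate.startswith(magic):
            return name
    return None

def pick_candidate(candidates, sha2_of_original):
    """Scan candidate decompressions; accept the first one whose header is
    known AND whose SHA-256 digest matches the hash stored with the file."""
    for candidate in candidates:
        if looks_legitimate(candidate) is None:
            continue
        if hashlib.sha256(candidate).digest() == sha2_of_original:
            return candidate        # header and hash both check out
    return None                     # nothing matched; keep scanning or give up
```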

Using only those techniques, I imagine the error rate would be extremely small, even for a poly-type scheme to the 1024th power, and it would be relatively fast for the density achieved. I wouldn't expect it to be foolproof; people might use something like this to send viruses, and on occasion (maybe one in every trillion or so decompressions) a file would fit the SHA-2 hash and have the right header yet be nothing but gobbledygook. That is why the program would have to include an "Exclude" button, meaning it would disregard that decompression and keep scanning for the next legitimate header.

Here is the reason for my theory: if the same set of poly-type algorithms is used, I think they will behave much like polynomials. Say you have ten degree-512 polynomial equations whose factors don't overlap. You could take one number (a few digits) and expand it into a massive equation. Given that such an equation has up to 512 answers, you would also associate a letter with the correct one; that way, unless you recover the letter too, you know it's the wrong one.
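
For what it's worth, the ambiguity being described shows up with an ordinary polynomial: a degree-n polynomial can map up to n different inputs to the same output, so "reversing" it leaves several candidates. A toy brute-force illustration (the polynomial and the search range are arbitrary):

```python
def p(x):
    # arbitrary degree-3 polynomial, purely for illustration
    return x**3 - 6*x**2 + 11*x - 6      # = (x - 1)(x - 2)(x - 3)

target = 0
candidates = [x for x in range(-100, 101) if p(x) == target]
print(candidates)   # [1, 2, 3] -- one "compressed" value, three possible originals
```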

Perito
Dec 21, 2009, 07:05 AM
Look at the ZIP File Format:

ZIP (file format) - Wikipedia, the free encyclopedia (http://en.wikipedia.org/wiki/ZIP_%28file_format%29)

Or the 7Zip file format:

7z (file format) - Wikipedia, the free encyclopedia (http://en.wikipedia.org/wiki/7z)

Both of these formats achieve very high compression ratios.
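
For a concrete comparison, Python's standard library exposes the same underlying algorithms (DEFLATE, which ZIP uses, via zlib; LZMA, which 7z uses by default, via lzma), so you can see the ratios on real data. The sample input here is just repetitive filler:

```python
import zlib, lzma

data = b"the quick brown fox jumps over the lazy dog " * 1000

deflated = zlib.compress(data, level=9)   # DEFLATE, the algorithm inside ZIP
lzma_out = lzma.compress(data)            # LZMA, the default algorithm in 7z

print(len(data), len(deflated), len(lzma_out))
# Highly repetitive input shrinks to a tiny fraction of its size with both.
```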

I'm afraid your thought process isn't clear enough to follow at this point. See if you can come up with some examples of what you're talking about.

InfoJunkie4Life
Jan 8, 2010, 01:22 PM
Let me explain this a little better. Lossless compression may be possible to far greater extents than 7z or any other known format; the error rate will just be higher because of the multiple possible outputs.

Current compression algorithms don't really use anything like encryption; they just find patterns sequentially and reduce them.
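
That pattern-finding idea can be seen in miniature with run-length encoding; real formats like DEFLATE use far smarter dictionary and entropy coding, and this is only a toy:

```python
from itertools import groupby

def rle_encode(data):
    """Collapse runs of repeated bytes into (byte, count) pairs."""
    return [(byte, len(list(run))) for byte, run in groupby(data)]

def rle_decode(pairs):
    return b"".join(bytes([byte]) * count for byte, count in pairs)

sample = b"aaaaabbbcccccccd"
encoded = rle_encode(sample)
print(encoded)                        # [(97, 5), (98, 3), (99, 7), (100, 1)]
assert rle_decode(encoded) == sample  # lossless: decoding recovers the input
```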

There are several hashing algorithms for which practical attacks have been found. MD5 has been broken, so certain passwords hashed this way can be recovered.

Now, say we take the MD5 (or some other hash) of a very large file and attempt to reverse it. There may be a great many possible outcomes (1024 or more), each of them having the same MD5 sum but containing different data.

To narrow the results down, an SHA hash could also be included to verify the output data. If both line up, there is a very small window for error.
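
A toy experiment makes that trade-off concrete. Real MD5/SHA digests are far too long to brute-force, so this sketch truncates both digests to one or two bytes; the point is only that a short hash admits many different inputs, and that a second hash shrinks, but cannot in general eliminate, the set of candidates:

```python
import hashlib
from itertools import product

def short_md5(data):
    return hashlib.md5(data).digest()[:1]      # truncated on purpose (toy)

def short_sha(data):
    return hashlib.sha256(data).digest()[:1]   # truncated on purpose (toy)

original = b"abc"
md5_tag, sha_tag = short_md5(original), short_sha(original)

# Brute-force every 3-byte "file" over a small alphabet.
alphabet = bytes(range(97, 123))               # 'a'..'z'
candidates = [bytes(t) for t in product(alphabet, repeat=3)]

match_md5  = [c for c in candidates if short_md5(c) == md5_tag]
match_both = [c for c in match_md5 if short_sha(c) == sha_tag]

print(len(candidates), len(match_md5), len(match_both))
# Dozens of inputs share the truncated MD5; usually only the original (and the
# odd stray collision) survives both checks.  For real, large files the number
# of possible inputs dwarfs the number of hash values, so collisions must exist.
```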

To reduce decompression time, the file could be resolved from blocks and partial comparisons, meaning only part of the file would have to be decompressed and compared in order to identify the correct file type.
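
Streaming decompression already supports something like that: you can inflate just the first few bytes of a compressed stream and check them against known magic numbers without touching the rest. A sketch using zlib's incremental decompressor, only to demonstrate the partial-decompression idea (the magic-number table is a small illustrative subset):

```python
import zlib

KNOWN_HEADERS = {
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"%PDF-":             "PDF document",
    b"PK\x03\x04":        "ZIP archive",
}

def sniff_type(compressed, peek=16):
    """Decompress only the first `peek` output bytes of the payload and try
    to identify the file type from its magic number."""
    d = zlib.decompressobj()
    head = d.decompress(compressed, peek)      # stops after `peek` output bytes
    for magic, name in KNOWN_HEADERS.items():
        if head.startswith(magic):
            return name
    return None

payload = zlib.compress(b"%PDF-1.7 ... rest of a large document ...")
print(sniff_type(payload))                     # -> "PDF document"
```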

Say the file was successfully decompressed but the result was not the same as the input; the user would then be allowed to select an error-correct button, which would tell it to try yet another set of candidate decompressions and ignore this exact one. A several-gigabyte file could possibly be reduced to a few hundred KB this way.