InfoJunkie4Life
Dec 21, 2009, 05:48 AM
Hey, guys. I just had this thought: what if you could create an algorithm that compresses data but, on retrieval, could expand into several possible files? My theory is that if you used a polynomial-like algorithm, you could compress data at much higher ratios. The catch is that, even so, decompression would yield several possible results. If you think about it, though, there may be ways around that.
If the file were compressed sequentially, the decompression agent could keep a database of known file headers, much like a data recovery program does. It would decompress only the header for each possible variation of the file, then compare each against the known headers until it found a match. You could even leave a decompressed copy of the header as part of the file, along with a SHA-2 hash to compare against. That would stop a wrong candidate from being accepted just because a known header happened to pop up in it.
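The disambiguation step could be sketched roughly like this. Everything here is hypothetical: `KNOWN_HEADERS` holds a few real file signatures for illustration, and the candidate list stands in for whatever the imagined decompressor would actually produce.

```python
import hashlib

# A few real file signatures (PNG, PDF, ZIP) standing in for the
# "data recovery program" style header database.
KNOWN_HEADERS = [b"\x89PNG\r\n\x1a\n", b"%PDF-", b"PK\x03\x04"]

def pick_candidate(candidates, stored_sha256):
    """Return the first candidate decompression whose header is recognized
    AND whose SHA-256 digest matches the hash stored with the file."""
    for data in candidates:
        if any(data.startswith(h) for h in KNOWN_HEADERS):
            if hashlib.sha256(data).hexdigest() == stored_sha256:
                return data
    return None  # nothing legitimate: "Exclude" and keep scanning

# Usage: suppose decompression produced three possible byte strings.
real = b"%PDF-1.4 rest of the document..."
candidates = [
    b"\x00garbage with no known header",
    b"PK\x03\x04 header matches but content is wrong",  # hash rejects it
    real,
]
stored = hashlib.sha256(real).hexdigest()
print(pick_candidate(candidates, stored) == real)
```

Note the second candidate shows why the hash matters: a known header can appear by accident, but the SHA-2 check filters it out.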
Using only those techniques, I imagine the error rate would be extremely small, even for a poly-type raised to the 1024th power, and it would be relatively fast for the density achieved. I wouldn't expect it to be foolproof, though. People might use something like this to send viruses, and once in a while (maybe one in every trillion decompressions) a file would fit the SHA-2 hash and have the right header, yet still be nothing but gobbledygook. That's why the program would need an "Exclude" button, meaning it disregards that decompression and keeps scanning for the next legitimate header.
The reason for my theory: if the same set of poly-type algorithms is used, I think they will behave much like polynomials. Say you have ten 512-based polynomial equations, with none of their factors overlapping. You could then take one number (a few digits) and expand it into that massive equation. Given that there are 512 answers, you would associate a letter with one of the numbers; that way, unless you get the letter too, you know it's the wrong one.
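As a toy illustration of the letter idea (not a real compressor, and the expansion rule below is entirely made up): one compressed token expands into 512 candidate values, each tagged with a letter, and only candidates carrying the stored check letter survive. It also shows the remaining ambiguity, since 26 letters can't uniquely pick one of 512 answers on their own.

```python
def expand(token, n=512):
    """Hypothetical expansion: each compressed token yields n candidate
    (value, check_letter) pairs. The rule here is invented purely to
    illustrate the filtering step."""
    return [(token * n + k, chr(ord("a") + k % 26)) for k in range(n)]

def filter_by_letter(token, check_letter):
    # keep only candidates whose attached letter matches the stored one
    return [v for v, letter in expand(token) if letter == check_letter]

# Usage: one letter narrows 512 candidates down to about 20 -- far fewer,
# but still more than one, so further checks (headers, a hash) are needed.
survivors = filter_by_letter(7, "c")
print(len(survivors))
```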