Yin and Yang of Data Compression

- February 21, 2018

I plan to write about data compression more in the future but for now I just wanted to share a quick diagram I drew up for a colleague.

Any time you're considering data compression on a table, two primary factors come into play. How much will it save you in disk space vs how much I/O will

Y axis: Space Saved
X axis: I/O

On the Y axis, we have the space savings. You might have a table that's 10GB and but after compressing it, its 1GB. You might have another table that's 1TB and compressed it's 700GB. Clearly the first shrinks by a larger percent, but the second saves much more space.

On the X axis, we have table IO. Compressed data has to be decompressed when it's read or modified, and it's all done by the CPU. So the more I/O going on in a compressed table, the more pressure it will put on your CPU.

So how can we understand the diagram, strictly from the standpoint of these two opposing forces?

Tables in the green area you should almost always compress. These are large tables with big space savings and low I/O. The quintessential example here is an archive table; maybe something that's there for compliance, but is not really ever used otherwise. Shrink it and forget about it.

Next the tables in the red area are tables you should almost never compress. These are tables with very volatile data; something like an OLTP table with rows being continuously read and modified, maybe from many connections. The high I/O workload will be taxing on your CPU. Your mileage may vary. The other tables in this section are tables which may not have much I/O, but really won't gain you anything by compressing. IMHO you shouldn't jump straight to compressing everything just because. Prematurely “optimizing” will get you into trouble.

And then the middle section, which, in reality, is where most of your real life examples will probably lie. For those, my dear reader, you'll need resources beyond what this humble article can supply. However even with a broader and deeper understanding of compression, these two major forces must always be in balance. Learn to master them, Jedi.

Search This Blog

Towering Queries

Migrate SQLPrompt Snippets to VSCode

Yin and Yang of Data Compression

Comments

Post a Comment

Popular posts from this blog

Master Data Services on Windows 10

Fixing Git "Unexpected Disconnect while reading sideband packet"

Powershell Script: to Create Git Repo with .gitignore