With tax data breaches back in the headlines due to recent Congressional hearings, the IRS is moving to reassure FATCA subjects that they have absolutely no reason to be concerned about data security & privacy and that they should ignore all the wreckers making defeatist comments. Jim Calvin over at FSI Tax Posts notes last week’s update to the IRS’ FATCA International Data Exchange Service FAQ — made a few days after the Senate hearings — which includes this gem. (Hat tip: bubblebustin in comments).
E16. I read in a recent cybersecurity blog that there is a concern the encryption standards being used for FATCA data are no longer current. Is this correct?
No. The encryption process used to protect your FATCA data was assessed by the IRS prior to granting FATCA-related information technology systems the authority to operate. The IRS also assessed a number of security controls which are documented in NIST Special Publication 800-53 Revision 4. The IRS would not have approved IDES for use in transmitting tax information otherwise.
Jim was too polite to draw attention to the absurdity of the IRS’ claim. AES, when used properly, is secure. Unfortunately the IRS is using it improperly. The specific issue here, discussed by that “recent cybersecurity blog” (Schneier on Security back in February), is that the IRS’ IDES User Guide recommends (at page 89, and on the IDES website itself), for “compatibility”, the use of the insecure Electronic Codebook (ECB) cipher mode for encrypting FATCA data.
How insecure is ECB? Wikipedia has a rather dramatic illustration using Tux the Penguin. This image and others like it come up quite prominently on Google if you can be bothered to search for the phrase “is ECB secure” or something similar. They were all generated by a similar procedure: converting the image to an uncompressed format (e.g. PPM), skipping the header, and encrypting the body. Above I followed a similar procedure with the IRS logo: the image on the left is unencrypted, the image on the right encrypted with the IRS’ recommended settings (openssl enc -aes-256-ecb).
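For anyone who wants to repeat the experiment, here is a rough sketch of the "skip the header, encrypt the body" step. It assumes a binary PPM (P6) file. The names `split_ppm` and `encrypt_body_only` are my own inventions, and the `encrypt` callable is a placeholder for whatever cipher you plug in (for example, a wrapper around the openssl command quoted above). The `reverse_bytes` function at the end is not encryption, only a dummy to show the plumbing.

```python
from typing import Callable, Tuple


def split_ppm(data: bytes) -> Tuple[bytes, bytes]:
    """Split a binary PPM (P6) file into (header, pixel_body)."""
    if not data.startswith(b"P6"):
        raise ValueError("not a binary PPM (P6) file")
    pos = 2
    for _ in range(3):  # three numeric fields: width, height, maxval
        # Skip whitespace and '#' comment lines before each field.
        while data[pos:pos + 1].isspace() or data[pos:pos + 1] == b"#":
            if data[pos:pos + 1] == b"#":
                pos = data.index(b"\n", pos)  # jump to the end of the comment
            pos += 1
        start = pos
        while data[pos:pos + 1].isdigit():
            pos += 1
        if pos == start:
            raise ValueError("malformed PPM header")
    pos += 1  # exactly one whitespace byte separates maxval from the pixels
    return data[:pos], data[pos:]


def encrypt_body_only(data: bytes, encrypt: Callable[[bytes], bytes]) -> bytes:
    """Keep the header readable and pass only the pixel bytes to `encrypt`."""
    header, body = split_ppm(data)
    return header + encrypt(body)


def reverse_bytes(body: bytes) -> bytes:
    """Dummy placeholder, NOT encryption: swap in a real cipher here."""
    return body[::-1]


# A tiny 2x1 image built in memory, so the sketch runs without any files.
tiny_ppm = b"P6\n2 1\n255\n" + bytes([255, 0, 0, 0, 0, 255])
result = encrypt_body_only(tiny_ppm, reverse_bytes)
print(result[:11], len(result))
```

Because the header is left untouched, an image viewer will still open the result and render the "encrypted" pixel bytes directly, which is what makes the patterns visible.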
To be perfectly clear: what you see on the right is not a joke or a fake drawing made in order to emphasise the point of this post. It is the actual result of encrypting an image using the IRS’ recommended settings. This is the level of encryption that your bank will be using when transmitting your personal data to the IRS.
Why does this happen?
In the ECB mode, under a given key, any given plaintext block always gets encrypted to the same ciphertext block. If this property is undesirable in a particular application, the ECB mode should not be used.
In other words, if your unencrypted data is longer than one block and has a sequence of bytes which appears repeatedly (e.g. a string of all-white pixels, an address, XML tags, etc.), then the “encrypted” version of those bytes will also appear repeatedly in the output. Even the human eye can see the patterns that result; for example, this is why there are vertical lines in the background of the IRS logo above (if I’d used another image whose width wasn’t an exact multiple of the AES block size, you’d see diagonal lines instead, like in the Tux image on Wikipedia). A computer will have an even easier time recovering large portions of the original data without having to know the key.
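The repetition is easy to reproduce. Below is a minimal sketch, not the IRS' actual pipeline. To keep it runnable with nothing but the Python standard library, `toy_block_encrypt` is a hypothetical stand-in for AES built from HMAC-SHA256: it is keyed and deterministic like a block cipher, but it is not invertible and it is not AES. The point is the mode, not the cipher, since ECB leaks in the same way whatever deterministic block function sits underneath.

```python
import hashlib
import hmac

BLOCK = 16  # AES block size in bytes


def toy_block_encrypt(key: bytes, block: bytes) -> bytes:
    """Stand-in for AES (assumption): keyed, deterministic, 16 bytes in and out.

    Not a real cipher and not decryptable; it only models determinism.
    """
    return hmac.new(key, block, hashlib.sha256).digest()[:BLOCK]


def ecb_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """ECB: every block is encrypted independently under the same key."""
    if len(plaintext) % BLOCK:
        raise ValueError("plaintext must be a multiple of the block size")
    return b"".join(
        toy_block_encrypt(key, plaintext[i:i + BLOCK])
        for i in range(0, len(plaintext), BLOCK)
    )


def repeated_blocks(data: bytes) -> int:
    """Count blocks that duplicate an earlier block."""
    blocks = [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]
    return len(blocks) - len(set(blocks))


key = b"k" * 32
# Six "all-white" blocks, one block of real data, two more white blocks.
plaintext = b"\xff" * (BLOCK * 6) + b"ACCOUNT 12345678" + b"\xff" * (BLOCK * 2)
ciphertext = ecb_encrypt(key, plaintext)

print("repeats in plaintext: ", repeated_blocks(plaintext))
print("repeats in ciphertext:", repeated_blocks(ciphertext))
```

Both counts come out the same: every repeated plaintext block shows up as a repeated ciphertext block, and an eavesdropper can count them without ever seeing the key.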
Initialization vector all zeroes
Aside from recommending ECB, the IRS also states:
Initialization Vector: No Initialization Vector (IV). If an IV is present, set to all zeros to avoid affecting the encryption.
ECB operates one block at a time, without interactions between blocks, so the initialization vector is irrelevant. So the IRS’ “no initialization vector” recommendation doesn’t make things any worse, in that specific scenario. I’m quite sure of course that the IRS is aware of this fact and that the scurrilous accusations of Bruce Schneier’s commenters (“[l]ooks like someone tinkered with the available settings until some rather limited test case succeeded”) are totally false.
But anyway, say the IT staff at a FATCA’ed “Foreign Financial Institution” ignore the IRS’ misleading reassurances, get half a clue, and read on some “recent cybersecurity blog” that ECB is insecure. So instead they switch to one of the other three-letter acronyms (that is, block-cipher modes). At the level of a single message, this does appear more secure, at least superficially.
But if they don’t know enough about encryption to understand what exactly the initialization vector is for, they’ll probably leave that on the IRS’ recommended setting. However, if you reuse the same initialization vector and the same key (e.g. for multiple transmissions from the same bank, whether for entities with different GIINs, or in different years), you still have a problem, even with CBC (let alone OFB).
AES (as any block cipher) strives to be indistinguishable from a random permutation, so any property like C1 ⊕ C2 = P1 ⊕ P2 would be quite bad.
This property (with K as the “key stream bits”) is valid for synchronous stream ciphers, including the one time pad and stream cipher modes of block ciphers (CTR, OFB, and for the first block also CFB), but not for those modes of operation which actually put the plaintext through the cipher, like CBC.
This is why you should never reuse the same key with the same initialization vector for stream ciphers (or reuse the same key at all for stream ciphers which don’t have an initialization vector).
For CBC, the effect of such a misuse is just that identical plaintexts (or actually identical starting plaintext blocks) give identical ciphertexts (identical starting ciphertext blocks). This is still something one doesn’t want, so you should use a fresh random IV for each message even for CBC mode.
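Both failure modes can be sketched in a few lines. As before, this is an illustration under stated assumptions and not anyone's production code: `toy_block_encrypt` is a hypothetical HMAC-SHA256 stand-in for AES (keyed and deterministic, but not invertible), the two XML-ish messages are made up, and the CTR and CBC helpers are bare-bones versions written only to show what happens when the key and an all-zero IV are reused across two messages.

```python
import hashlib
import hmac
import os

BLOCK = 16  # AES block size in bytes


def toy_block_encrypt(key: bytes, block: bytes) -> bytes:
    """Stand-in for AES (assumption): keyed and deterministic, not a real cipher."""
    return hmac.new(key, block, hashlib.sha256).digest()[:BLOCK]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def ctr_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    """CTR: XOR the plaintext with a keystream derived from iv + counter."""
    out = bytearray()
    start = int.from_bytes(iv, "big")
    for i in range(0, len(plaintext), BLOCK):
        counter = (start + i // BLOCK) % (1 << (8 * BLOCK))
        keystream = toy_block_encrypt(key, counter.to_bytes(BLOCK, "big"))
        out += xor(plaintext[i:i + BLOCK], keystream)
    return bytes(out)


def cbc_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    """CBC: XOR each block with the previous ciphertext block, then encrypt."""
    if len(plaintext) % BLOCK:
        raise ValueError("plaintext must be a multiple of the block size")
    prev, out = iv, bytearray()
    for i in range(0, len(plaintext), BLOCK):
        prev = toy_block_encrypt(key, xor(plaintext[i:i + BLOCK], prev))
        out += prev
    return bytes(out)


key = b"k" * 32
zero_iv = bytes(BLOCK)  # the "set to all zeros" setting
p1 = b"<Name>Alice</Name><Bal>100</Bal>"  # made-up 32-byte messages that
p2 = b"<Name>Alice</Name><Bal>999</Bal>"  # share their first 16-byte block

# Stream-style mode, same key and IV: the keystream cancels out.
s1, s2 = ctr_encrypt(key, zero_iv, p1), ctr_encrypt(key, zero_iv, p2)
print("CTR: C1 xor C2 == P1 xor P2:", xor(s1, s2) == xor(p1, p2))

# CBC, same key and IV: identical leading plaintext gives identical ciphertext.
b1, b2 = cbc_encrypt(key, zero_iv, p1), cbc_encrypt(key, zero_iv, p2)
print("CBC fixed IV, first blocks equal:", b1[:BLOCK] == b2[:BLOCK])

# CBC with a fresh random IV per message, as the quoted advice recommends.
r1 = cbc_encrypt(key, os.urandom(BLOCK), p1)
r2 = cbc_encrypt(key, os.urandom(BLOCK), p2)
print("CBC random IV, first blocks equal:", r1[:BLOCK] == r2[:BLOCK])
```

In the stream-style case an eavesdropper holding two ciphertexts gets the XOR of the two plaintexts for free. In the CBC case they learn "only" that the two filings begin with the same bytes, which for documents sharing a fixed XML preamble is a much smaller leak.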
How big a deal is this?
Well, most data breaches probably don’t occur due to thieves intercepting insecurely-encrypted data in transit and then decrypting it. It’s much easier for them to hack into one of the badly-administered endpoint systems where that data is stored, whether by technological methods or social engineering or plain old bribery. Absurd recommendations for encryption settings are thus “merely” a symptom of an underlying sickness: organisations which aren’t following best practices on encryption probably aren’t following best practices anywhere else in their computer infrastructure.
The fact that the IRS publishes comments like “Key size should be verified and moving the key across operating systems can affect the key size” with a straight face demonstrates pretty definitively that anyone there who has a technological clue is not having his or her voice heard. Another question in the IRS’ IDES FAQ suggests something similar:
C17. Are there any characters that pose a security threat and should be avoided in submitted XML documents?
Use of the apostrophe ('), double dash (--), quotation mark ("), and hash (#) symbols are prohibited as they can be used in a security threat and will cause the transmission to be rejected with a failed threat detection error notification. Replacing the characters with an entity reference will still cause a rejection.
Yep, that character blacklist makes me totally confident they haven’t forgotten any edge cases which will leave open yet another giant security hole.