Crypto Fails

Sunday, December 15, 2013

Crypto Fails has Moved!

Crypto Fails has moved to Tumblr.

New posts will only be posted there. This means the email subscription thing will break, too, so if you're using that, subscribe to the Tumblr feed or follow it on Tumblr.

Thanks :)

Saturday, December 14, 2013

Most Android Apps are Crypto Fails

This study from Carnegie Mellon and UCSB analyzed 11,748 Android apps that use crypto and found that 10,327 of them (88%) were flawed. They built a tool to check for extremely obvious crypto implementation errors like:

Using ECB mode.
Using a non-random IV for CBC mode.
Using constant encryption keys.
Using constant salts for password hashing.
Using fewer than 1000 iterations in password hashing.
Seeding the random number generator with a static value.

Except for the "1000 iterations" one, these are all obvious flaws, and anyone who knows anything about cryptography should know that they are bad ideas. Especially "using constant encryption keys" - that's insane.
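To make the "static seed" item concrete, here's a minimal Python sketch (mine, not from the study). The function names and the seed value 1234 are made up for illustration. A generator seeded with a constant hands out the same "random" key on every run, while the operating system's CSPRNG does not.

```python
import random
import secrets


def key_from_static_seed(seed: int = 1234, length: int = 16) -> bytes:
    """Flawed: a hard-coded seed makes the 'random' key identical on every run."""
    rng = random.Random(seed)  # hypothetical constant seed, as flagged by the study
    return bytes(rng.randrange(256) for _ in range(length))


def key_from_csprng(length: int = 16) -> bytes:
    """Better: ask the operating system's CSPRNG for fresh bytes."""
    return secrets.token_bytes(length)
```

Anyone who can read the app's code can recompute the first key, so it is effectively a constant encryption key.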

Anyway, here are their results summarized in a table.

Their results make two things clear:

You shouldn't implement crypto yourself. Even when you have a high-level API.
Just because an app "uses military-grade AES encryption", that does not mean it is secure. It probably isn't.

Crypto Noobs #2: Side Channel Attacks

What are side channel attacks and how do they affect cryptography?

Suppose your birthday is coming up soon, and your best friend told you that they bought a gift for you. You're anxious to know what they got you, so you ask them:

"Is it a new watch?"
    "No."
"Is it a hat?"
    "No."
"Is it a computer?"
    "No."
"Is it a book?"
    "No."
"Is it a video game?"
    "No."

You're not getting anywhere. Your friend will say "No." no matter what you guess, even if you guess right. You need another source of information. You try asking again, but this time you pay close attention to their actions as they reply:

"Is it a new watch?"
    "No." (expression=neutral, eyes=looking at you)
"Is it a hat?"
    "No." (expression=neutral, eyes=looking at you)
"Is it a computer?"
    "No." (expression=neutral, eyes=looking at you)
"Is it a book?"
    "No." (expression=nervous, eyes=looking away from you)
"Is it a video game?"
    "No." (expression=relief, eyes=looking at you)

Now can you guess what your gift is? From these results, you can be pretty sure that your gift is a book. If you want to be even more sure, you can ask the questions again. If your friend's expression and eye movements are always changing after asking, "Is it a book?" you can be pretty sure that's what it is.

That's a side channel attack. You're getting no information from the actual data in the response ("No."), but the way the response is delivered is leaking the secret information you want.

Computers don't make side channel attacks quite so obvious. They don't have faces to convey emotions. So how is the information leaked in a side channel attack against a computer? There are a bunch of different ways:

How much power the computer uses when it does something.
How long it takes the computer to do something.
Which areas of the computer's memory have been accessed.
Unintentional electromagnetic radiation emanating from the system.
Sounds coming from the system (beeps, hard drives working, etc.).
The time that network packets get sent out of the system.

These are some of the ways a computer can unintentionally leak information about what it is doing and the secrets that are stored inside of it.

This leakage can be devastating to cryptography software. Even if the cipher is secure, there are no attacks against the protocol, and the implementation follows the specification exactly, a secret might still be leaked out through these avenues, which we call "side channels."
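Here's a toy Python sketch of a timing side channel, under some loud assumptions: the comparison function reports how many bytes it looked at, which stands in for the running time a real attacker would have to measure, and all the names are made up. The comparison is functionally correct, yet the step count alone is enough to recover the secret one byte at a time.

```python
import hmac


def leaky_equals(secret: bytes, guess: bytes):
    """Early-exit comparison. Returns (equal, steps); steps models running time."""
    steps = 0
    if len(secret) != len(guess):
        return False, steps
    for s, g in zip(secret, guess):
        steps += 1
        if s != g:
            return False, steps  # stops at the first mismatch: this is the leak
    return True, steps


def recover_secret(oracle, length: int, alphabet: bytes) -> bytes:
    """Recover the secret byte by byte, using only the step counts."""
    known = b""
    for position in range(length):
        for candidate in alphabet:
            filler = b"\x00" * (length - position - 1)
            guess = known + bytes([candidate]) + filler
            equal, steps = oracle(guess)
            if equal:
                return guess
            if steps > position + 1:  # comparison got past this byte, so it matched
                known += bytes([candidate])
                break
        else:
            raise ValueError("no candidate matched at position %d" % position)
    return known


def safer_equals(secret: bytes, guess: bytes) -> bool:
    """The standard library's comparison, designed to resist timing analysis."""
    return hmac.compare_digest(secret, guess)
```

The attack needs at most (alphabet size) x (length) guesses instead of (alphabet size) to the power of (length).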

The next sections present some real examples of side channel attacks. They should give you a good idea of how devastating these attacks are on cryptography software.

Timing Analysis of Keystrokes inside SSH

This attack was presented in Timing Analysis of Keystrokes and Timing Attacks on SSH by Dawn Xiaodong Song, David Wagner, and Xuqing Tian.

Secure Shell (SSH) is a secure way of connecting to another computer and controlling it with a shell (terminal). All of the commands and responses are encrypted and authenticated, so someone eavesdropping on your Internet connection should not be able to figure out what you're doing on the remote system.

The ciphers are good, the protocols are good, and the implementation is good. Even so, the trio of researchers realized that information about what the user is typing still gets leaked through the time between the user's key presses.

When connected to a computer over SSH, every time you press a key, your computer immediately generates an Internet packet and sends it to the remote computer. That means, if you're eavesdropping on an SSH connection, you see a packet sent shortly after each key press. You don't know which keys were pressed, but you know what times they were pressed, and, more importantly, the times between key presses.

The researchers showed that, when a user is typing, the time between their key presses leaks information about what they're typing. For example, on a QWERTY keyboard, the time between pressing "a" then "j" (which are both on the home row and are pressed by fingers on different hands) is going to be a lot shorter than the time between pressing "z" and "1" (which are pressed by the same finger and are very far apart).

Using this fact, the researchers constructed an attack tool that extracts about 1 bit of information per character pair when a user is typing a password. This means that if you can observe an encrypted SSH connection, and the user types a password, the information leaked through the keystroke timings greatly reduces the number of passwords you'd need to search to find the right password.

To remove this side channel, SSH should send packets at a fixed rate, even if the user is not typing or is in between key presses. If a packet gets sent every 50 ms no matter what, and there is no way to distinguish a keystroke packet from a "chaff" packet, then the eavesdropper won't learn anything about the time between key presses.
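Here's a rough Python sketch of that fixed-rate idea. It is a simulation with made-up names, not how any real SSH implementation works: keystrokes are queued and at most one is sent per 50 ms tick, and a "chaff" packet goes out on every tick that has nothing to send.

```python
from collections import deque


def fixed_rate_schedule(keystroke_times_ms, interval_ms=50, duration_ms=1000):
    """Return one (send_time_ms, kind) pair per tick; kind is 'key' or 'chaff'."""
    queue = deque(sorted(keystroke_times_ms))
    pending = 0  # keystrokes typed but not yet sent
    schedule = []
    for tick in range(interval_ms, duration_ms + 1, interval_ms):
        while queue and queue[0] <= tick:
            queue.popleft()
            pending += 1
        if pending:
            schedule.append((tick, "key"))
            pending -= 1
        else:
            schedule.append((tick, "chaff"))
    # Keystrokes after duration_ms are simply ignored in this toy model.
    return schedule


def observable(schedule):
    """What an eavesdropper sees: send times only, assuming packets look alike."""
    return [send_time for send_time, _ in schedule]
```

A fast typist and a slow typist now produce exactly the same packet timings. This only helps if keystroke and chaff packets are the same size and indistinguishable after encryption, and it costs bandwidth and adds up to one interval of latency per key.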

Cache Attacks on the AES Cipher

This attack was presented in Cache Attacks and Countermeasures: the Case of AES by Dag Arne Osvik, Adi Shamir, and Eran Tromer. A similar attack was demonstrated by Daniel J. Bernstein in Cache-timing attacks on AES.

All modern computers have a "cache" between main memory (RAM) and the CPU. This cache speeds up access to areas of memory that have been used recently. If an area of memory has been accessed recently, it's probably in the cache, so accessing it again will be quick. If the area hasn't been accessed in a long time, it has probably been evicted from the cache, so accessing it will be slow. The difference in time can leak information about what a process is doing.

This attack was applied to the AES cipher in the two papers linked above. Essentially, fast implementations of AES use lookup tables, which are arrays used to quickly convert one value into another. The indexes AES uses into these tables depend on the secret key, so there's a possibility that the difference in memory access time caused by the cache can leak information about the secret key. That is exactly what the authors demonstrated.

The authors demonstrate two different techniques for extracting the key. I'll give a simplified explanation of each.

In the first, called the Evict+Time attack, they evict part of one of the lookup tables from the cache, so that all of the AES lookup tables are in the cache except for one index in one table, then they run the encryption. If the encryption accesses the index that has been evicted from the cache, it will run slower; otherwise it will run at a normal speed. So, by timing how long the encryption takes, they can figure out if that table index was accessed or not. Since the table index depends on the key, this leaks information about the key.
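To give a feel for Evict+Time, here's a heavily simplified Python model. Everything in it is an assumption for illustration: the "cache" is just a set of table indexes, the "cipher" does a single lookup at index (plaintext XOR key), and the hit and miss costs are invented numbers. Real AES does many lookups per encryption and real caches work on whole lines, so the actual attack only learns part of each index per measurement.

```python
HIT_COST, MISS_COST = 1, 100  # made-up costs for a cache hit and a cache miss


def make_victim(key_byte: int):
    """Build a toy 'encryption' whose running time depends on the cache state."""
    def timed_encrypt(plaintext_byte: int, cached_indexes: set) -> int:
        index = plaintext_byte ^ key_byte  # the table index depends on the key
        return HIT_COST if index in cached_indexes else MISS_COST
    return timed_encrypt


def evict_and_time(timed_encrypt, plaintext_byte: int = 0):
    """Evict one table index at a time and watch for the slow encryption."""
    for evicted in range(256):
        cached = set(range(256)) - {evicted}
        if timed_encrypt(plaintext_byte, cached) == MISS_COST:
            # The victim touched the evicted index, so index = plaintext ^ key.
            return evicted ^ plaintext_byte
    return None
```

The attacker never sees the key or the ciphertext here, only how long each run took.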

The second type of attack, called Prime+Probe, first completely fills the cache with the attacker's data. The encryption process is run, and as it is running, the parts of the lookup table that it uses are loaded from main memory into the cache. Since the cache is full of the attacker's data, some of it will have to be evicted to make room for the part of the table. Once the encryption is done, the attacker accesses their data again (to see which parts have been evicted from the cache), and this tells them which table indexes were used by the encryption process, leaking information about the key.

There is no good cross-platform cross-architecture defense against this kind of attack. The only sure defense is to write code whose memory access pattern does not depend on secret information. That means, among other things, no using secret information as indexes into an array.
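Here's a sketch of what "no secret indexes" can look like: read every table entry and pick out the wanted one with arithmetic masks instead of indexing by the secret. The function name is made up, and the sketch assumes a table of at most 256 byte-sized entries. Python itself makes no constant-time promises, so treat this as an illustration of the access pattern only; real implementations do this in C or assembly.

```python
def lookup_without_secret_index(table, secret_index: int) -> int:
    """Return table[secret_index] while touching every entry in the same order."""
    result = 0
    for i, value in enumerate(table):
        diff = i ^ secret_index            # 0 only when i is the wanted index
        is_equal = ((diff - 1) >> 8) & 1   # 1 if diff == 0, else 0 (diff < 256)
        mask = -is_equal & 0xFF            # 0xFF for the wanted entry, else 0x00
        result |= value & mask
    return result
```

The memory access pattern is now the same for every secret index, at the cost of reading the whole table on each lookup.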

Power Analysis of RSA

This attack was presented in Power Analysis Attacks of Modular Exponentiation in Smartcards by Thomas S. Messerges, Ezzy A. Dabbish, and Robert H. Sloan.

RSA is a public key cryptosystem. Decrypting (or signing) a message involves raising it to the power of the secret exponent, which is part of the secret key. This is done using an algorithm called "square and multiply."

Here's a simple way to think about it: Loop over all bits in the secret exponent, and for each bit, if it is one, perform a multiply operation then a square operation. If the bit is zero, do not perform the multiply, just go straight to the square operation.

It turns out that by watching the amount of power a device uses, you can determine whether the multiply routine executed or not. That leaks the corresponding bit of the secret exponent. If multiply was executed, the bit is 1; if multiply was not executed, the bit is 0. So, by watching the power usage, you can extract the entire secret key. Here's what it looks like in a simulation:

This is obviously a devastating attack, since it leaks the entire secret key. To defend against it, you have to make sure that the amount of power your code uses doesn't depend on secret information.
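Here's a small Python sketch of the leak. It is a simulation under a big assumption: the list of "M" and "S" letters stands in for what an attacker would read off a power trace, where multiplies and squares look different. The function names are mine.

```python
def square_and_multiply(base: int, exponent: int, modulus: int):
    """Compute base**exponent % modulus, recording which operations ran."""
    result = 1
    trace = []  # stands in for the power trace: 'M' = multiply, 'S' = square
    while exponent > 0:
        if exponent & 1:
            result = (result * base) % modulus
            trace.append("M")
        base = (base * base) % modulus
        trace.append("S")
        exponent >>= 1
    return result, trace


def recover_exponent(trace) -> int:
    """Rebuild the secret exponent from the operation sequence alone."""
    bits = []  # least significant bit first, matching the loop above
    i = 0
    while i < len(trace):
        if trace[i] == "M":
            bits.append(1)
            i += 2  # a multiply is always followed by its square
        else:
            bits.append(0)
            i += 1
    return sum(bit << position for position, bit in enumerate(bits))
```

Feeding the trace from any run into `recover_exponent` gives back the exponent that was used, without ever looking at the inputs or outputs.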

Noise Floor

At DEF CON 21, @0xabad1dea gave a talk in which she demonstrated how you can use cheap (~$15) software-defined radio hardware to learn information about a (very unshielded) system, including getting a (very) rough view of what's being shown on the monitor.

Conclusion

There you have it. That's only a very small sample of the side channel attacks that have been found. I would love to have written about more, especially this one, but I don't want this post to go on forever.

Side channel attack research is still very active. If you're interested, check out some of the papers I've linked to and the papers that they cite. You can even try to implement some of the attacks yourself.

What is "Crypto Noobs"?

"Crypto Noobs" is a series of posts where I answer your crypto questions. Each month, I will select one question and answer it in the next "Crypto Noobs" post.

Please send me your questions. All question askers will remain anonymous, so don't be afraid to ask dumb questions.

Saturday, November 30, 2013

CryptHook: Encrypting and Authenticating Network Traffic

CryptHook is a tool that hooks the send() and recv() system calls to apply encryption to a network application that does not provide encryption. The code is available here.

It uses the same key for both directions of communication. Traffic flowing from the client to the server is encrypted with the same key as the traffic flowing from the server to the client. It tries to use GCM to provide message authentication, but unfortunately, since it uses the same key for both directions and doesn't use sequence numbers, it's possible to:

Replay a party's messages back to itself.
Re-order messages.
Selectively drop messages.

It also derives the keys from a password, and does not exchange a session key, so there is no forward secrecy.

This one isn't so bad, especially given the environment it's operating in, but it's a good reminder that encrypting network traffic is extremely hard, and it's much better to stick to something like TLS or an OpenSSL VPN.
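For the curious, here's a Python sketch of the two missing ingredients: a separate key per direction and a sequence number on every message. This is not CryptHook's code, and every name in it is made up. It only authenticates with HMAC (the standard library has no AES-GCM), so encryption is left out entirely. With GCM you would typically put the sequence number into the nonce or the associated data instead.

```python
import hashlib
import hmac
import struct

TAG_LEN = 32  # HMAC-SHA256 output size in bytes


def direction_key(shared_key: bytes, label: bytes) -> bytes:
    """Derive a separate key for each direction from one shared key."""
    return hmac.new(shared_key, label, hashlib.sha256).digest()


class Sender:
    def __init__(self, key: bytes):
        self.key, self.seq = key, 0

    def seal(self, payload: bytes) -> bytes:
        header = struct.pack(">Q", self.seq)  # 64-bit sequence number
        tag = hmac.new(self.key, header + payload, hashlib.sha256).digest()
        self.seq += 1
        return header + payload + tag


class Receiver:
    def __init__(self, key: bytes):
        self.key, self.expected = key, 0

    def open(self, message: bytes) -> bytes:
        header, payload, tag = message[:8], message[8:-TAG_LEN], message[-TAG_LEN:]
        good = hmac.new(self.key, header + payload, hashlib.sha256).digest()
        if not hmac.compare_digest(tag, good):
            raise ValueError("bad tag (forged, or sent in the other direction)")
        (seq,) = struct.unpack(">Q", header)
        if seq != self.expected:
            raise ValueError("replayed, re-ordered, or dropped message")
        self.expected += 1
        return payload


def is_accepted(receiver: Receiver, message: bytes) -> bool:
    """Small helper: True if the receiver accepts the message."""
    try:
        receiver.open(message)
    except ValueError:
        return False
    return True
```

A message reflected back at its sender fails the tag check because the other direction uses a different key, and a replayed or out-of-order message fails the sequence check. None of this gives you forward secrecy, which is one more reason to just use TLS.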

Thursday, October 10, 2013

Crypto Noobs #1: Initialization Vectors

What are Initialization Vectors and how should they be exchanged?

tl;dr: The Initialization Vector (IV) is an unpredictable random number used to make sure that when the same message is encrypted twice, the ciphertext is always different. It should be exchanged, in public, as part of the ciphertext.

The reason we don't want the ciphertexts to be the same when we encrypt the same message is that it leaks information. Suppose Alice is asking Bob a series of "yes or no" questions, and that the crypto software they're using makes the mistake of always encrypting the same message to the same ciphertext.

Here is their encrypted conversation:

Alice: DCF0C50A96DC2D9B05F2AC4C24CB9B93
Bob: E0B554B341FF5632DE241FBF4B1DBB37
Alice: 6742FA9512C8C2ACE6942974C8C848FC
Bob: 824783C3B272FF7129F9E153EC10D1AE
Alice: C90BD345639368D951A8B5E267427514
Bob: E0B554B341FF5632DE241FBF4B1DBB37
Alice: AC797939F55E87C361A8F02B4CEA1A08
Bob: 824783C3B272FF7129F9E153EC10D1AE

As you can see, some of Bob's ciphertexts repeat: his first and third responses are identical, and so are his second and fourth. Since we know that Alice is asking Bob "yes or no" questions, we can assume that one of the ciphertexts corresponds to "yes" and the other to "no."

Now, suppose we ask Alice what questions she asked Bob. Alice tells us she asked the following questions (in order):

Are you Bob?
Are you Eve?
Are you Smart?
Are you Tired?

Note: Alice tells us what she asked Bob, but not what Bob's answers were.

If Bob answered the questions truthfully, then we find out that Bob's first response, the ciphertext "E0B55..." is the encryption of "yes" and Bob's second response, the ciphertext "82478..." is the encryption of "no". So "E0B55..." is "yes," and "82478..." is "no." Now we know what Bob's answers to the last two questions were:

Alice: DCF0C50A96DC2D9B05F2AC4C24CB9B93 <- Are you Bob?
Bob: E0B554B341FF5632DE241FBF4B1DBB37 <- yes
Alice: 6742FA9512C8C2ACE6942974C8C848FC <- Are you Eve?
Bob: 824783C3B272FF7129F9E153EC10D1AE <- no
Alice: C90BD345639368D951A8B5E267427514 <- Are you Smart?
Bob: E0B554B341FF5632DE241FBF4B1DBB37 <- yes
Alice: AC797939F55E87C361A8F02B4CEA1A08 <- Are you Tired?
Bob: 824783C3B272FF7129F9E153EC10D1AE <- no

So Bob is smart, but not tired. We can probably assume he's angry that we cracked his encryption, too.

To fix this problem, we have to make sure that encrypting the same message always results in a different ciphertext. We have to make sure that when Bob answers "yes", it encrypts to something different than the last time he answered "yes".

This is accomplished using either an Initialization Vector (IV) or a nonce.

An Initialization Vector is an unpredictable random number used to "initialize" an encryption function. It has to be random, and an adversary shouldn't be able to predict it before the message is encrypted. See here to learn why it needs to be unpredictable.

The term "nonce" comes from "number used once", which is exactly what it is. It's a unique number. It can be random, but it doesn't have to be, as long as it's unique (never used before).

IVs and nonces are used by encryption modes like CBC and CTR to make all plaintexts encrypt differently. The ciphertext is a function of the plaintext and the IV or nonce. When the same plaintext is encrypted many times, the IV or nonce will be different each time so the ciphertext will be different each time.

IVs and nonces do not have to be kept secret. They are usually prefixed to the ciphertext and transmitted in full public view.
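Here's a toy Python sketch of the idea. Big caveat: this is NOT a real cipher. Python's standard library has no AES, so I'm assuming a stand-in keystream built from HMAC-SHA256 in a counter-like fashion, and all the names are made up. The point is only to show a fresh random nonce being generated per message and sent in the clear in front of the ciphertext.

```python
import hashlib
import hmac
import secrets

NONCE_LEN = 16


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: HMAC-SHA256(key, nonce || counter), block after block."""
    out = b""
    counter = 0
    while len(out) < length:
        block_input = nonce + counter.to_bytes(8, "big")
        out += hmac.new(key, block_input, hashlib.sha256).digest()
        counter += 1
    return out[:length]


def encrypt(key: bytes, plaintext: bytes) -> bytes:
    nonce = secrets.token_bytes(NONCE_LEN)  # fresh and unpredictable every time
    stream = keystream(key, nonce, len(plaintext))
    body = bytes(p ^ s for p, s in zip(plaintext, stream))
    return nonce + body  # the nonce travels in public, prefixed to the ciphertext


def decrypt(key: bytes, ciphertext: bytes) -> bytes:
    nonce, body = ciphertext[:NONCE_LEN], ciphertext[NONCE_LEN:]
    stream = keystream(key, nonce, len(body))
    return bytes(c ^ s for c, s in zip(body, stream))
```

With this, Bob's two "yes" answers encrypt to different bytes, so an eavesdropper can no longer match them up. There is no authentication here, so don't use it for anything real.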

IVs should be generated by a cryptographically-secure random number generator, and not derived from the secret key. CakePHP made the mistake of using the key as the IV, making its encryption very easy to crack.

What is "Crypto Noobs"?

"Crypto Noobs" is a new series of posts where I answer your crypto questions. Each month, I will select one question and answer it in a new "Crypto Noobs" post.

Please send me your questions. All question askers will remain anonymous, so don't be afraid to ask dumb questions.

Saturday, August 10, 2013

Very Bad Password Advice

This post on How-To-Geek about generating passwords from the command line advocates generating passwords from the current date and time.

The first command they give is (note double fail: base64-encoding hex):

 date +%s | sha256sum | base64 | head -c 32 ; echo

They do provide a command that gives a good alphanumeric password:

 tr -cd '[:alnum:]' < /dev/urandom | fold -w30 | head -n1

However, they also mention this one:

 date | md5sum

About the above command, they say, "I'm sure that some people will complain that it’s not as random as some of the other options, but honestly, it’s random enough if you’re going to be using the whole thing." That is absolutely wrong and demonstrates a complete lack of understanding of what a hash function is.

Lessons Learned:

Hashing does not add randomness. The output of a hash is at most as random as its input.
Use a cryptographically-secure random number generator to generate passwords.
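To see how little the hashing helps, here's a Python sketch that emulates the first command and then "cracks" it by trying every timestamp in a window. The function names are mine, and the emulation assumes sha256sum prints the hex digest followed by two spaces, a dash, and a newline when reading from standard input.

```python
import base64
import hashlib
import secrets
import string


def date_password(timestamp: int) -> str:
    """Emulate: date +%s | sha256sum | base64 | head -c 32"""
    digest = hashlib.sha256(f"{timestamp}\n".encode()).hexdigest()
    sha256sum_line = f"{digest}  -\n".encode()
    return base64.b64encode(sha256sum_line)[:32].decode()


def crack_date_password(password: str, earliest: int, latest: int):
    """Try every second in the window; return the matching timestamp or None."""
    for timestamp in range(earliest, latest + 1):
        if date_password(timestamp) == password:
            return timestamp
    return None


def random_password(length: int = 30) -> str:
    """A password drawn from the operating system's CSPRNG instead."""
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))
```

If the attacker knows the password was made sometime in a given year, that's only about 31.5 million candidates, which is roughly 25 bits of entropy no matter how long the hash output looks.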

Monday, July 29, 2013

PHP Documentation Woes

This bit of PHP documentation has some interesting code:

 <?php  
 $passphrase = 'My secret';  
   
 /* Turn a human readable passphrase  
  * into a reproducable iv/key pair  
  */  
 $iv = substr(md5('iv'.$passphrase, true), 0, 8);  
 $key = substr(md5('pass1'.$passphrase, true) .   
         md5('pass2'.$passphrase, true), 0, 24);  
 $opts = array('iv'=>$iv, 'key'=>$key);  
   
 $fp = fopen('secret-file.enc', 'wb');  
 stream_filter_append($fp, 'mcrypt.tripledes', STREAM_FILTER_WRITE, $opts);  
 fwrite($fp, 'Secret secret secret data');  
 fclose($fp);  
 ?>

3DES takes a 192-bit key (actually 168 bits, since the rest are parity bits), so the first two keys are taken from the first MD5 and the third key is taken from the second MD5. I have a strong suspicion that that alignment introduces some kind of vulnerability, but I can't quite put my finger on it. If anyone knows, please leave a comment.

This tweet says it all:

@DefuseSec With documentation like that, it's no wonder we see so much crap code. :-\
— Adam Caudill (@adamcaudill) July 29, 2013

Lessons Learned:

Don't use MD5 to derive keys from passwords. Use PBKDF2.
Use a random, unique IV. Don't derive it from the password.
Documentation will be used by many people. It's worth hiring a cryptographer to check it.
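Here's a sketch of the first two lessons in Python rather than PHP, using only the standard library. The function names, the salt length, and the iteration count are my assumptions, not a recommendation from any standard. It covers only the key and IV setup, not the encryption itself.

```python
import hashlib
import secrets

ITERATIONS = 100_000  # assumed value; tune it to your hardware
KEY_LEN = 24          # 3DES key size in bytes
IV_LEN = 8            # 3DES block size in bytes


def derive_key(passphrase: str, salt: bytes) -> bytes:
    """Derive the key with PBKDF2-HMAC-SHA256 instead of a bare MD5."""
    return hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, ITERATIONS, dklen=KEY_LEN
    )


def new_encryption_parameters(passphrase: str):
    """Fresh random salt and IV for every file; neither comes from the password."""
    salt = secrets.token_bytes(16)
    iv = secrets.token_bytes(IV_LEN)
    key = derive_key(passphrase, salt)
    return salt, iv, key
```

The salt and IV are not secret, so they can be stored at the start of the encrypted file. To decrypt, read them back and call `derive_key` with the same passphrase and salt.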