Encrypting large data with Java and RSA is not much different from encrypting small data, as long as you know the basics. Our goal is to encrypt a String of arbitrary length, send it over the Internet and decrypt it again on the other end. We will not discuss key exchange here since that is a rather trivial task. What we need first is a KeyPair. Where you get it from does not matter in the end. Here we will create one on the fly.
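KeyPairGenerator kpg = KeyPairGenerator.getInstance("RSA");
// 1024 bit key: each encrypted block will be 128 bytes long
kpg.initialize(1024);
this.keypair = kpg.generateKeyPair();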
Now that we have our KeyPair we also need a Cipher that works with our keys. I used a plain RSA Cipher (i.e. Cipher.getInstance("RSA")) without specifying mode or padding, so the provider's defaults apply.
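For reference, here is a minimal self-contained sketch of both steps, key pair generation and cipher creation, followed by a round trip of one short message. The class name RsaSetupSketch and its helper methods are hypothetical and not part of the original code. The remark about the default padding is an assumption that holds for the standard SunJCE provider in Oracle/OpenJDK; other providers may pick different defaults.

```java
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import javax.crypto.Cipher;

class RsaSetupSketch {

    // Creates a fresh 1024 bit RSA key pair, as in the article.
    static KeyPair newKeyPair() throws Exception {
        KeyPairGenerator kpg = KeyPairGenerator.getInstance("RSA");
        kpg.initialize(1024);
        return kpg.generateKeyPair();
    }

    // "RSA" alone leaves mode and padding to the provider.
    // Assumption: with SunJCE this resolves to RSA/ECB/PKCS1Padding.
    static Cipher newCipher() throws Exception {
        return Cipher.getInstance("RSA");
    }

    // Encrypts and decrypts one short message to check that
    // the key pair and the cipher work together.
    static String roundTrip(String message) throws Exception {
        KeyPair keypair = newKeyPair();
        Cipher cipher = newCipher();

        cipher.init(Cipher.ENCRYPT_MODE, keypair.getPublic());
        byte[] encrypted = cipher.doFinal(message.getBytes("UTF-8"));

        cipher.init(Cipher.DECRYPT_MODE, keypair.getPrivate());
        return new String(cipher.doFinal(encrypted), "UTF-8");
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip("hello"));
    }
}
```

This only works for short messages. Anything longer than one block needs the chunking described further down.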
Next we would like to have 2 functions that can encrypt and decrypt. Here we will face 2 big problems:
Ciphers do not use Strings, they use byte arrays.
Block ciphers cannot encrypt arbitrarily long byte arrays directly.
Both of those points seem rather trivial to solve, but the devil is in the details. First of all we have to understand that a String and the bytes behind it are different things. At the end of the day, a String is just one interpretation of quite a lot of 0s and 1s. Even if our bytes are set in stone, that does not mean that byte -> String -> byte will give us identical byte arrays. Question: Why not? Answer: Encodings! I’m pretty sure you have heard about UTF-8 somewhere so far. UTF-8 defines how characters are mapped to sequences of bytes and back. The problem is that quite a lot of byte patterns are simply not valid in UTF-8, and a decoder typically replaces them with a placeholder character, so the original bytes are lost. Encoding mismatches like this are also why German letters like öäü or Japanese symbols sometimes get replaced by something else like a box, star etc. We have all seen it.
So we will need a better representation than “normal” Strings. Especially for transferring the String (or storing it or … well, basically anything) we want a representation that keeps the exact byte message and supports byte -> String -> byte operations. The 2 most common ways are Base64 encoding and hex encoding. In my example I will use hex encoding since I had to call REST services with encrypted Strings, and Base64 encoders can insert CR/LF line breaks into long output (the MIME variant does), something you do not really want in URLs. But, before I go on, let’s have a look at the encrypt and decrypt functions:
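public String encrypt(String plaintext) throws Exception{
    this.cipher.init(Cipher.ENCRYPT_MODE, this.keypair.getPublic());
    // fix the charset so that decryption can rebuild the same String
    byte[] bytes = plaintext.getBytes("UTF-8");
    // encrypt in chunks, see blockCipher
    byte[] encrypted = blockCipher(bytes,Cipher.ENCRYPT_MODE);
    // hex encode so that the result survives transport as text
    char[] encryptedTransferable = Hex.encodeHex(encrypted);
    return new String(encryptedTransferable);
}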
First we init the cipher with encryption mode and our public key. We could also have gotten the key from somewhere else; the only important part is that you need a cipher and a key that work together, or you will get exceptions. After that, we convert the plaintext to a byte array. You can see that we explicitly encode the String as UTF-8. This could be skipped, but then getBytes() falls back to the platform's default charset, which might lead to side effects when the String is recreated later on a different machine. I included the UTF-8 for safety reasons. Next we call the function blockCipher, which does all the magic of encrypting in blocks (we will come to that later). Now we encode our new, encrypted byte[] into a hex based String. For this purpose I used the org.apache.commons.codec.binary.Hex class. If you do not want to import that for any reason, have a look at the source code here: Kickjava.com
The String is now ready to be saved to the disk, transferred over the Internet or even sent via mail.
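To see why the detour through hex is needed, here is a small sketch that sends the same raw bytes through a UTF-8 String and through a hex String. The class RoundTripSketch and its hand-written hex helpers are hypothetical; they only stand in for the commons-codec Hex class so that the example has no external dependency.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class RoundTripSketch {

    // Hand-written hex encoder, two lowercase hex digits per byte.
    static String toHex(byte[] data) {
        StringBuilder sb = new StringBuilder(data.length * 2);
        for (byte b : data) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Inverse of toHex; expects an even number of hex digits.
    static byte[] fromHex(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    // byte[] -> String -> byte[] by interpreting the bytes as UTF-8 text.
    static byte[] viaUtf8(byte[] data) {
        return new String(data, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8);
    }

    // byte[] -> String -> byte[] by hex encoding.
    static byte[] viaHex(byte[] data) {
        return fromHex(toHex(data));
    }

    public static void main(String[] args) {
        // 0xC3 0x28 and 0xFF are not valid UTF-8 sequences.
        byte[] raw = { (byte) 0xC3, (byte) 0x28, (byte) 0xFF, 0x41 };

        System.out.println(Arrays.equals(raw, viaUtf8(raw))); // false
        System.out.println(Arrays.equals(raw, viaHex(raw)));  // true
        System.out.println(toHex(raw));                       // c328ff41
    }
}
```

Encrypted data is essentially random bytes, so invalid sequences like the ones above are the normal case, not the exception.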
Decryption is much the same, just the other way round. This time we go from HexString -> byte[] -> String. Note that we again create a String that is UTF-8 based at the end.
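public String decrypt(String encrypted) throws Exception{
    this.cipher.init(Cipher.DECRYPT_MODE, this.keypair.getPrivate());
    // hex String back to the raw encrypted bytes
    byte[] bts = Hex.decodeHex(encrypted.toCharArray());
    // decrypt in chunks, see blockCipher
    byte[] decrypted = blockCipher(bts,Cipher.DECRYPT_MODE);
    return new String(decrypted,"UTF-8");
}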
So far so good, but what about the voodoo in blockCipher? Here’s the source:
private byte[] blockCipher(byte[] bytes, int mode) throws IllegalBlockSizeException, BadPaddingException{
    // initialize 2 buffers.
    // scrambled will hold intermediate results
    byte[] scrambled = new byte[0];

    // toReturn will hold the total result
    byte[] toReturn = new byte[0];
    // if we encrypt we use 100 byte long blocks. Decryption requires 128 byte long blocks (because of RSA)
    int length = (mode == Cipher.ENCRYPT_MODE)? 100 : 128;

    // another buffer. this one will hold the bytes that have to be modified in this step.
    // it is never longer than the input, otherwise short inputs would get trailing zero bytes
    byte[] buffer = new byte[Math.min(length, bytes.length)];

    for (int i=0; i< bytes.length; i++){

        // if we filled our buffer array we have our block ready for de- or encryption
        if ((i > 0) && (i % length == 0)){
            //execute the operation
            scrambled = cipher.doFinal(buffer);
            // add the result to our total result.
            toReturn = append(toReturn,scrambled);
            // here we calculate the length of the next buffer required
            int newlength = length;

            // if newlength would be longer than remaining bytes in the bytes array we shorten it.
            if (i + length > bytes.length) {
                newlength = bytes.length - i;
            }
            // clean the buffer array
            buffer = new byte[newlength];
        }
        // copy byte into our buffer.
        buffer[i%length] = bytes[i];
    }

    // this step handles the last (possibly shorter) block, which is still in the buffer array.
    // example: we encrypt 110 bytes. 100 bytes per run means we "forgot" the last 10 bytes. they are in the buffer array
    scrambled = cipher.doFinal(buffer);

    // final step before we can return the modified data.
    toReturn = append(toReturn,scrambled);

    return toReturn;
}
I will not comment the source again, just go ahead and read it. The most important part is maybe this: int length = (mode == Cipher.ENCRYPT_MODE)? 100 : 128;
This part tells the code whether we use chunks that are 100 bytes long or chunks that are 128 bytes long.
Why do we need that?
RSA works on blocks of a fixed size, just like a block cipher. With our 1024 bit key, no matter how long (or rather: short) the input is, each call will produce a 128 byte long output (1024 bits / 8). That explains the 128.
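The fixed output size is easy to check. The following sketch encrypts inputs of different lengths with a fresh 1024 bit key and reports the ciphertext length. The class OutputSizeSketch is hypothetical, and the padding is spelled out as RSA/ECB/PKCS1Padding, which is an assumption about what a plain "RSA" cipher uses by default.

```java
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import javax.crypto.Cipher;

class OutputSizeSketch {

    // Encrypts inputLength zero bytes with a fresh 1024 bit key
    // and returns the length of the resulting ciphertext.
    static int ciphertextLength(int inputLength) throws Exception {
        KeyPairGenerator kpg = KeyPairGenerator.getInstance("RSA");
        kpg.initialize(1024);
        KeyPair keypair = kpg.generateKeyPair();

        Cipher cipher = Cipher.getInstance("RSA/ECB/PKCS1Padding");
        cipher.init(Cipher.ENCRYPT_MODE, keypair.getPublic());
        return cipher.doFinal(new byte[inputLength]).length;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(ciphertextLength(1));   // 128
        System.out.println(ciphertextLength(100)); // 128
    }
}
```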
But why the 100? Could we not just use the whole byte array?
No, we can’t. Most guides will not tell you this part at all since the authors forget that plaintext Strings can get quite large. No block cipher can ever encrypt a bitstring longer than its maximum block size in one go. That’s why they are called block ciphers (as opposed to stream ciphers, which encrypt bit by bit or byte by byte).
If you ever find a class that takes arbitrarily long input, uses a block cipher and generates an output, you can be 100% sure that the splitting into blocks is done internally.
So what we do is:
For ENcryption we use a maximum of 100 bytes of plaintext and encrypt each of those byte chunks to exactly 128 byte long ciphertext.
For DEcryption we use 128 bytes long chunks of ciphertext and decrypt each to a (maximal) 100 byte long plaintext.
Note that we do not have to use exactly 100 bytes. We could and maybe should use a slightly bigger byte range. With a 1024 bit key and PKCS#1 v1.5 padding, which is what a plain RSA cipher typically defaults to, the maximum length is 117 bytes (128 bytes minus 11 bytes of padding). You can also find that out with trial and error (you will get an IllegalBlockSizeException or similar).
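The trial and error approach can be sketched in a few lines. The class BlockLimitSketch and its methods are hypothetical. The sketch assumes PKCS#1 v1.5 padding and a provider that signals oversized input with an IllegalBlockSizeException, as the standard SunJCE provider does; other providers may throw something else.

```java
import java.security.KeyPairGenerator;
import java.security.PublicKey;
import javax.crypto.Cipher;
import javax.crypto.IllegalBlockSizeException;

class BlockLimitSketch {

    // PKCS#1 v1.5 encryption padding needs at least 11 bytes per block.
    static final int PKCS1_OVERHEAD = 11;

    // Largest plaintext chunk that fits into one RSA block.
    static int maxPlaintextBytes(int keyBits) {
        return keyBits / 8 - PKCS1_OVERHEAD;
    }

    static PublicKey newPublicKey(int keyBits) throws Exception {
        KeyPairGenerator kpg = KeyPairGenerator.getInstance("RSA");
        kpg.initialize(keyBits);
        return kpg.generateKeyPair().getPublic();
    }

    // Tries to encrypt a chunk of the given length. A fresh Cipher is
    // used per attempt so a failed call cannot affect the next one.
    static boolean canEncrypt(PublicKey key, int length) throws Exception {
        Cipher cipher = Cipher.getInstance("RSA/ECB/PKCS1Padding");
        cipher.init(Cipher.ENCRYPT_MODE, key);
        try {
            cipher.doFinal(new byte[length]);
            return true;
        } catch (IllegalBlockSizeException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        PublicKey key = newPublicKey(1024);
        System.out.println(maxPlaintextBytes(1024)); // 117
        System.out.println(canEncrypt(key, 117));
        System.out.println(canEncrypt(key, 118));
    }
}
```

With the assumptions above, the 117 byte chunk should be accepted and the 118 byte chunk rejected.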
One method that was used above but not stated yet is the following:
private byte[] append(byte[] prefix, byte[] suffix){
    byte[] toReturn = new byte[prefix.length + suffix.length];
    for (int i=0; i< prefix.length; i++){
        toReturn[i] = prefix[i];
    }
    for (int i=0; i< suffix.length; i++){
        toReturn[i+prefix.length] = suffix[i];
    }
    return toReturn;
}
This only appends 1 byte array to the other.
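As an alternative to the blockCipher and append pair, the same chunking idea can be written more compactly with a ByteArrayOutputStream. This is a sketch of a different technique, not the code used above; the class ChunkedRsaSketch and all of its method names are hypothetical. It keeps the 100 and 128 byte block sizes and assumes a 1024 bit key.

```java
import java.io.ByteArrayOutputStream;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import javax.crypto.Cipher;

class ChunkedRsaSketch {

    static final int PLAIN_BLOCK = 100;  // plaintext chunk size from the article
    static final int CIPHER_BLOCK = 128; // ciphertext block size for a 1024 bit key

    static KeyPair newKeyPair() throws Exception {
        KeyPairGenerator kpg = KeyPairGenerator.getInstance("RSA");
        kpg.initialize(1024);
        return kpg.generateKeyPair();
    }

    // Runs an already initialised cipher over the input, blockSize bytes
    // at a time, and collects the results in one array.
    static byte[] process(Cipher cipher, byte[] input, int blockSize) throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int offset = 0; offset < input.length; offset += blockSize) {
            // the last chunk may be shorter than blockSize
            int len = Math.min(blockSize, input.length - offset);
            byte[] block = cipher.doFinal(input, offset, len);
            out.write(block, 0, block.length);
        }
        return out.toByteArray();
    }

    static byte[] encrypt(KeyPair keys, byte[] plain) throws Exception {
        Cipher cipher = Cipher.getInstance("RSA/ECB/PKCS1Padding");
        cipher.init(Cipher.ENCRYPT_MODE, keys.getPublic());
        return process(cipher, plain, PLAIN_BLOCK);
    }

    static byte[] decrypt(KeyPair keys, byte[] encrypted) throws Exception {
        Cipher cipher = Cipher.getInstance("RSA/ECB/PKCS1Padding");
        cipher.init(Cipher.DECRYPT_MODE, keys.getPrivate());
        return process(cipher, encrypted, CIPHER_BLOCK);
    }

    public static void main(String[] args) throws Exception {
        KeyPair keys = newKeyPair();
        byte[] plain = new byte[250]; // 250 bytes -> 3 chunks (100 + 100 + 50)
        byte[] encrypted = encrypt(keys, plain);
        System.out.println(encrypted.length);               // 384 = 3 * 128
        System.out.println(decrypt(keys, encrypted).length); // 250
    }
}
```

The stream grows its internal buffer as needed, so the result array is not copied once per block as it is with append.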
And we’re done. With this you should have everything you need to encrypt large data with RSA. Note that it will take a LOT of time to encrypt 1 MB of data with this algorithm (3 minutes and more). But the main goal was to encrypt large Strings, and a String of 1 MB is really HUGE.