cloned 30.08.2018 ~13:00, commit
Well, wow. Extensive and thorough work. I considered cryptography as a topic, but having seen this, I'm happy I decided to run with something else. Super interesting subject, but especially the cryptanalysis part looks to be out of my league. I really can't be of any help with the core functionality, so I'll focus only on commenting the source code more generally (read: splitting hairs).
Organization
The package structure makes sense, but it seems to me that not all classes are where you'd expect to find them. In particular, I'm looking at the cryptanalysis
package: even if the utility classes there (Ngrams
, Combinatorics
...) are needed only within this package, wouldn't they be more "at home" e.g. in the helpers
package?
Also, I'm not convinced that the subclassing of Ngrams
is quite justified. Unless I'm overlooking something, the difference between the subclasses is only in what you pass in as a constructor parameter. Why not simply use the base class, and reflect the semantic difference in the name of the constructed instance? Like:
Ngrams trigrams = new Ngrams(3, ...);
Ngrams quadgrams = new Ngrams(4, ...);
Reading through the several Cipher
classes, the Vigenere ones are very nicely organized and compact. By contrast, TranspositionCipher
looks a bit messy and bloated. Perhaps you could decouple the private methods (matrix manipulation and whatnot) to helper classes, and also split up the class into Single...
vs. Double...
. The latter would be quite trivial (just calling the other one twice), but the overall structure would be prettier and in line with the VinegereCiphers
.
In fact, it looks like all the ciphers expose a very similar logic outward: crypting and decrypting plaintext with one or more keys. It might be useful to define a simple-as-balls interface and have your ciphers implement that, something like:
public interface Cipher
{
String decrypt(String ciphertext, String... keys);
String encrypt(String plaintext, String... keys);
}
...and then have the implementing classes throw e.g. IllegalArgumentException
if too few or too many keys are passed at method call. One benefit here is that you could e.g. handle performance testing of all ciphers with one piece of code that operates with the interface.
Another minor thing that came to mind: you could make utility classes static in cases where they really have no internal state across methods (such as AlphabetHelper
, Combinatorics
, GCD
). This way you don't need constructors, and could call the methods without having to construct an instance, like:
HashTable<...> tbl = AlphabetHelper.hashAlphabet(someString);
But this is totally trivial nitpicking...
Readability
At no point I had trouble reading the code because the code in itself would have been hard to understand (such as variable and method naming). Whatever difficulty I experienced was rather due to my limited understanding of mathematics and statistics. But even in these cases, the methods were conscientiously commented, so I didn't need to look any further than the docstring to get clues on what's going on.
Another reviewer has noted that the javadoc comments are too heavy — I'm of the exact opposite opinion. I think you're doing it right when there's less lines of actual code than comments, unless it's an obvious bean with just boilerplate code. Personally I prefer having specific explanations right where the code is, instead of having to go look for clarity in separate documentation.
I appreciate the quality of your language, it is pleasant to read and throughout understandable. I know it's bit of a pain to write clearly and concisely (especially because the actual coding is so much more fun), but it really helps others.
I'm tempted to complain about the UI class, but that would be beside the point because it's just a nice-to-have addition, and does not contribute to the core functionality.
Code
I don't have anything valuable to say here, it boils down to personal preferences... There were several places where, I think, you could replace while
loops with for
loops, just for some brevity. Take Ngrams
lines 139-143 as example:
int i = 0;
while (i + this.n <= text.length()) {
logProbSum += logProbability(text.substring(i, i + this.n));
i++;
}
You could squeeze this to a two-liner:
for (int i = 0; i + n <= text.length(); i++)
logProbSum += logProbability(text.substring(i, i + n));
Also, in a couple of places the exact same code is repeated twice. This happens e.g. in constructors of the HashTable
class: many of the table attributes are set in each constructor even though the values are fixed, so you could just treat them as static variables and set the values on declaration. This would visibly declutter the code. For another example, KeyedVigenereCipher
repeats the same code (comments included) on lines 51-54 and 76-79.
Speaking of HashTable
, at quick glance it looks like there's a lot of overlap with HashedSet
functionality-wise. Maybe you could investigate if HashedSet
can be rewritten using HashTable
as the backing data structure? (That is, HashedSet
would just encapsulate a HashTable
that is used in a certain way so as to resemble a set)?
And one final nitpicking, related to division of responsibility: TranspositionCipher
calls a sorting utility on lines 36-41, but first decides which sorting technique to use based on a threshold value. Shouldn't this decision rather belong to the sorter? It would be more "kosher" to leave it to the sorting class to figure out what technique makes the most sense.
Tests
Again, just... wow. I'm speechless. You're putting a lot of effort and thought into this, and it is inspiring. I'd bow if my back wasn't hurting.