jsos17 / classic-crypto Goto Github PK

Vigenère cipher, keyed Vigenère cipher and autokey cipher. Single and double columnar transposition ciphers. Cryptanalysis of the Vigenère cipher. Data Structures and Algorithms Lab, summer 2018 (offered by Bachelor's Programme in Computer Science, University of Helsinki).

Java 100.00%

vigenere-cipher transposition-cipher classical-cryptography frequency-analysis index-of-coincidence columnar-transposition-cipher autokey-cipher keyed-cipher double-transposition-cipher classical-cipher

classic-crypto's Introduction

My Education

Master's Student (Applied Mathematics), University of Helsinki (2020-present)
- Master's Thesis at HELDA - University of Helsinki Open Repository
Bachelor of Science (Computer Science), University of Helsinki (2017-2020)
Business studies, Helsinki School of Economics (now part of Aalto University) (Autumn 2009)
Mathematics and statistics studies, University of Helsinki (2007-2009)
Ylioppilastutkinto (Matriculation Examination), Imatran Yhteislukio (2002-2005)

classic-crypto's People

Contributors

Stargazers

Watchers

Forkers

abd-elrazek

classic-crypto's Issues

Koodikatselmointi tira labra

Latasin tiedostot su 26.8. klo 17:30.

Mielenkiintoinen aihe ja ole selvästi paneutunut työhösi intohimolla ja käyttänyt siihen paljon aikaa. Hienoa! Kiva, että olit tehnyt graafisen käyttöliittymän! Mukavampi testailla ohjelmaa.

Tässä jotakin mietteitä:

Projektin rakenne on hyvä ja pääasiassa selkeä. Voisit ehkä paketoida yhteen pakettiin koko ohjelman lopuksi. Täältä voi katsoa pakettien nimeämisestä lisää: https://docs.oracle.com/javase/tutorial/java/package/namingpkgs.html
Kaikki tietorakenteet toteutettu selkeästi.
Nuo JavaDoc kuvaukset olivat välillä aika pitkiä. Ehkä ne voisivat olla hieman napakampia? JavaDoc taitaa katkaista JavaDocsissa näytettävän kuvauksen ensimmäiseen pisteeseen.
Generoidut kommentit voi varman ottaa pois lopuksi.
Kuin kokeilin testejä, niin ilmeisesti allokoitu muisti ylittyi HashTableTest -luokassa. Herjasi java.lang.OutOfMemoryError - virhettä, kun ajoin testit.
Testit tosi kattavia kyllä muuten

Koodikatselmointi #1

Latasin tiedostot ti 21.8. klo 12:08.

Tosi mielenkiintoinen projekti ja aihevalinta! Vähän on kimuranttia katselmoida keskeneräistä työtä, ja osa kommenteista onkin varmasti luokkaa "on noin, koska ei vielä valmis", mutta kirjasin nyt kuitenkin ylös kaiken, mitä silmään osui.

En tiedä, ovatko konventiot vain uponneet kallooni turhankin syvälle, mutta mulle tuotti ohikiitävän hetken verran vaikeuksia lukea for-luuppeja, jotka eivät käyttäneet ensimmäisenä indeksinä i:tä, vaan esimerkiksi k:ta tai j:tä. Esimerkiksi transpositiosalauksessa on sisäkkäiset for-luupit järjestyksessä j-i, minkä seurauksena mulle oli tosi epäluontevaa hahmottaa, mihin kumpikin viittasi, kun indeksejä käytettiin luuppien sisällä. Sorting-kansion metodeissa on indeksinä käytetty myös muuttujaa idx. Miettisin siis ehkä luuppien yhdenmukaistamista kautta linjan, niin koodia on sujuvampi lukea. Ei siis välttämättä tarvitse mennä (lievästi pakkomielteiseen) i-j-k-järjestykseen, mutta ainakin olisi hyvä, että systeemi olisi koko projektin läpi yhdenmukainen.
Samantyyppinen juttu (ja tämä ehkä todistaa neuroottisuuteni) tuli vastaan helpers-kansion GreatestCommonDivisorissa: metodi ottaa vastaan parametrit a ja b, mutta sitten vertaa niitä if-lauseessa järjestyksessä b ja a. Eihän se toimintaa haittaa saati sitä miksikään muuta, mutta näin lukijana ensimmäisenä tuli mieleen, että onko siinä jokin tietty syy, miksi näin on tehty. Noudattaisin siis itse siinä parametrien annettua järjestystä.
Tämä nyt ei ole mitenkään akuutti juttu, mutta viimeistään projektin loppusiivousvaiheessa ottaisin pois luokkien aluista "To change this licence header..." -kommentit.
Ciphers-kansion salauksissa yhdistetään merkkijonoja luupeissa +:lla. Oletko testannut, olisiko StringBuilderista saatavissa tehokkuusetuja, vaikkeivät merkkijonot nyt kovin valtavia olekaan? (Olettaen, että StringBuilderia saa tira-työssä käyttää...)
Muuttujien nimet ovat tosi hyviä ja kuvaavia valtaosin, mutta joitakin ehkä harkitsisin: CharIndexPairissa ovat muuttujat c ja number (samanmoinen vähän epäkuvaava c on myös IndexOfCoincidencessa). Olisivatko character ja index kuvaavampia? Miettisin myös tapauskohtaisesti, kuinka paljon nimeämisessä lyhentäminen auttaa. TranspositionAttackissa ovat mm. muuttujat combi, ciph ja quad. Keskimmäisestä käyttäisin varmaan muotoa cipher, niin se tekisi koodista luettavampaa, mutta nuo kaksi muuta ovat hankalampia, kun niiden kantasanat ovat sen verran pitkiä. Quad voisi tosin ehkä olla ihan perustellusti vaikka quads, kun kantasana on monikossa. (Kyllähän tuo kolmikko siis noinkin toimii, mutta jäin vain miettimään.)
Toinen nimeämisjuttu löytyy testien puolelta: JUnitissa (ja laajemmaltikin yksikkötesteissä) käytäntönä lienee, että testimetodin nimi kertoo, mitä testi testaa, siis tyyliin "public void negatiivisenLuvunAntaminenPalauttaaMiinusYksi()".
Ciphers-kansion CharIndexPairia kuvataan apuluokaksi. Kannattaisikohan se siirtää helpers-kansioon? Tai ehkä cryptanalysis-kansioon? Nyt se on vähän yksikseen salausten joukossa. Muilta osin projektin rakenne on erinomaisen hyvä ja selkeä!
Lopuksi vielä yksi juttu vähän ehkä katselmoinnin ulkolaitamilta, mutta kun itse sain vastaavaa oppia vähän aikaa sitten, ja pidin sitä tosi hyvänä, niin ajattelin panna kiertoon... Gitin commit-viesteistä, nimittäin. "Best practicena" pidetään viestien kirjoittamista imperatiivissa. Ei siis "Lisäsin testit" tai "Lisätään testit" vaan "Lisää testit" (suositus löytyy, kuten sittemmin opin, mm. täältä commit guidelineista: https://git-scm.com/book/en/v2/Distributed-Git-Contributing-to-a-Project). Kun tätä minulle ystävällisesti "ehdotettiin" ja siihen mukauduin, niin olen omissakin projekteissani pitänyt sitä hyvinkin selvänä ja toimivana tapana (plus että se vähentää arpomista commit-viestien kirjoitusvaiheessa). Ja yhdenmukaisuutta sekin tuo.

Tsemppiä loppuprojektiin!

Mika

Code review

cloned 30.08.2018 ~13:00, commit

Well, wow. Extensive and thorough work. I considered cryptography as a topic, but having seen this, I'm happy I decided to run with something else. Super interesting subject, but especially the cryptanalysis part looks to be out of my league. I really can't be of any help with the core functionality, so I'll focus only on commenting the source code more generally (read: splitting hairs).

Organization

The package structure makes sense, but it seems to me that not all classes are where you'd expect to find them. In particular, I'm looking at the cryptanalysis package: even if the utility classes there (Ngrams, Combinatorics...) are needed only within this package, wouldn't they be more "at home" e.g. in the helpers package?

Also, I'm not convinced that the subclassing of Ngrams is quite justified. Unless I'm overlooking something, the difference between the subclasses is only in what you pass in as a constructor parameter. Why not simply use the base class, and reflect the semantic difference in the name of the constructed instance? Like:

Ngrams trigrams = new Ngrams(3, ...);
Ngrams quadgrams = new Ngrams(4, ...);

Reading through the several Cipher classes, the Vigenere ones are very nicely organized and compact. By contrast, TranspositionCipher looks a bit messy and bloated. Perhaps you could decouple the private methods (matrix manipulation and whatnot) to helper classes, and also split up the class into Single... vs. Double.... The latter would be quite trivial (just calling the other one twice), but the overall structure would be prettier and in line with the VinegereCiphers.

In fact, it looks like all the ciphers expose a very similar logic outward: crypting and decrypting plaintext with one or more keys. It might be useful to define a simple-as-balls interface and have your ciphers implement that, something like:

public interface Cipher
{
    String decrypt(String ciphertext, String... keys);

    String encrypt(String plaintext, String... keys);
}

...and then have the implementing classes throw e.g. IllegalArgumentException if too few or too many keys are passed at method call. One benefit here is that you could e.g. handle performance testing of all ciphers with one piece of code that operates with the interface.

Another minor thing that came to mind: you could make utility classes static in cases where they really have no internal state across methods (such as AlphabetHelper, Combinatorics, GCD). This way you don't need constructors, and could call the methods without having to construct an instance, like:

HashTable<...> tbl = AlphabetHelper.hashAlphabet(someString);

But this is totally trivial nitpicking...

Readability

At no point I had trouble reading the code because the code in itself would have been hard to understand (such as variable and method naming). Whatever difficulty I experienced was rather due to my limited understanding of mathematics and statistics. But even in these cases, the methods were conscientiously commented, so I didn't need to look any further than the docstring to get clues on what's going on.

Another reviewer has noted that the javadoc comments are too heavy — I'm of the exact opposite opinion. I think you're doing it right when there's less lines of actual code than comments, unless it's an obvious bean with just boilerplate code. Personally I prefer having specific explanations right where the code is, instead of having to go look for clarity in separate documentation.

I appreciate the quality of your language, it is pleasant to read and throughout understandable. I know it's bit of a pain to write clearly and concisely (especially because the actual coding is so much more fun), but it really helps others.

I'm tempted to complain about the UI class, but that would be beside the point because it's just a nice-to-have addition, and does not contribute to the core functionality.

Code

I don't have anything valuable to say here, it boils down to personal preferences... There were several places where, I think, you could replace while loops with for loops, just for some brevity. Take Ngrams lines 139-143 as example:

int i = 0;
while (i + this.n <= text.length()) {
    logProbSum += logProbability(text.substring(i, i + this.n));
    i++;
}

You could squeeze this to a two-liner:

for (int i = 0; i + n <= text.length(); i++)
    logProbSum += logProbability(text.substring(i, i + n));

Also, in a couple of places the exact same code is repeated twice. This happens e.g. in constructors of the HashTable class: many of the table attributes are set in each constructor even though the values are fixed, so you could just treat them as static variables and set the values on declaration. This would visibly declutter the code. For another example, KeyedVigenereCipher repeats the same code (comments included) on lines 51-54 and 76-79.

Speaking of HashTable, at quick glance it looks like there's a lot of overlap with HashedSet functionality-wise. Maybe you could investigate if HashedSet can be rewritten using HashTable as the backing data structure? (That is, HashedSet would just encapsulate a HashTable that is used in a certain way so as to resemble a set)?

And one final nitpicking, related to division of responsibility: TranspositionCipher calls a sorting utility on lines 36-41, but first decides which sorting technique to use based on a threshold value. Shouldn't this decision rather belong to the sorter? It would be more "kosher" to leave it to the sorting class to figure out what technique makes the most sense.

Tests

Again, just... wow. I'm speechless. You're putting a lot of effort and thought into this, and it is inspiring. I'd bow if my back wasn't hurting.

jsos17 / classic-crypto Goto Github PK

classic-crypto's Introduction

My Education

classic-crypto's People

Contributors

Stargazers

Watchers

Forkers

classic-crypto's Issues

Koodikatselmointi tira labra

Koodikatselmointi #1

Code review

Organization

Readability

Code

Tests

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent