A simple Java project that parses JSON text.
The parser performs the following steps:
- Lexical Analysis
- Syntactical Analysis
- Parsing and building the JSON object
- Formatting and beautifying the JSON
During the first three steps, an exception is thrown wherever the input text does not conform to the JSON standard, so the formatter only ever runs on input that is valid JSON.
- `Lexer` is the only abstraction made available to the main class.
- In the `Lexer`, we call `nextToken()` repeatedly to get the next token.
- `Lexer` in turn uses the `TokenizerFactory` to get the `Tokenizer` based on the current character of the input.
- `Tokenizer` has the method `getToken(Input input)`, which generates the token starting at the current index.
- `Token` is the base interface that represents the types of tokens allowed in JSON.
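
The lexing flow described above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the `Lexeme` record, the `single` helper, and the dispatch inside `tokenizerFor` are hypothetical and only modeled on the names and the dump output in this README. The sketch handles just braces, colons, commas, strings, and numbers, and its string tokenizer stops at the next quote, which matches the current rule noted at the end of this README.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: names mirror this README, bodies are assumptions.
final class LexerSketch {
    enum TokenType { LEFT_BRACE, RIGHT_BRACE, COLON, COMMA, STRING, NUMBER }

    // start and end are inclusive character offsets, as in the output dump.
    record Lexeme(TokenType tokenType, String value, int start, int end) {}

    // A tokenizer reads exactly one token beginning at the given index.
    interface Tokenizer {
        Lexeme getToken(String input, int start);
    }

    // Tokenizer for a single-character token.
    static Tokenizer single(TokenType type, String text) {
        return (in, i) -> new Lexeme(type, text, i, i);
    }

    // Plays the role of TokenizerFactory: picks a tokenizer from the current character.
    static Tokenizer tokenizerFor(char c) {
        switch (c) {
            case '{': return single(TokenType.LEFT_BRACE, "{");
            case '}': return single(TokenType.RIGHT_BRACE, "}");
            case ':': return single(TokenType.COLON, ":");
            case ',': return single(TokenType.COMMA, ",");
            case '"': return LexerSketch::readString;
            default:
                if (c == '-' || Character.isDigit(c)) return LexerSketch::readNumber;
                throw new IllegalArgumentException("Unexpected character: " + c);
        }
    }

    // Reads up to the next quote; escape sequences are not handled here.
    static Lexeme readString(String in, int start) {
        int end = in.indexOf('"', start + 1);
        if (end < 0) throw new IllegalArgumentException("Unterminated string at " + start);
        return new Lexeme(TokenType.STRING, in.substring(start, end + 1), start, end);
    }

    // Loose scan over number characters; a real tokenizer would follow the number automaton.
    static Lexeme readNumber(String in, int start) {
        int i = start + 1;
        while (i < in.length() && "0123456789.eE+-".indexOf(in.charAt(i)) >= 0) i++;
        return new Lexeme(TokenType.NUMBER, in.substring(start, i), start, i - 1);
    }

    private final String input;
    private int position = 0;

    LexerSketch(String input) { this.input = input; }

    // Returns the next lexeme, or null once the input is exhausted.
    Lexeme nextToken() {
        while (position < input.length() && Character.isWhitespace(input.charAt(position))) position++;
        if (position >= input.length()) return null;
        Lexeme lexeme = tokenizerFor(input.charAt(position)).getToken(input, position);
        position = lexeme.end() + 1;
        return lexeme;
    }

    // Calls nextToken() repeatedly, the way the main class is described as doing.
    static List<Lexeme> tokenize(String text) {
        LexerSketch lexer = new LexerSketch(text);
        List<Lexeme> out = new ArrayList<>();
        for (Lexeme l = lexer.nextToken(); l != null; l = lexer.nextToken()) out.add(l);
        return out;
    }
}
```

Keeping the per-character dispatch in one factory method means that supporting another token type only needs a new `case` and a new tokenizer, while `nextToken()` stays unchanged.
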
The DFA for the analyzer of JSON is as below:
- `Analyzer` has the method `void analyze(TokenBase tokenBase)`, which analyzes the token ordering and reports whether the standard is followed.
- Each analyzer has its set of allowed next characters set up as per the above DFA; on encountering a character that is not in the allowed list for a lexeme, we throw an exception.
- Based on the type of `Token` of the `Lexeme`, we find the next `Analyzer` from the `AnalyzerFactory` and call its `analyze` method.
- This basically follows the Chain of Responsibility pattern.
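
The "allowed next set" idea can be illustrated with a small transition table. The sketch below is an assumption-laden simplification: it replaces the chain of `Analyzer` objects with a single lookup method (`allowedNext`, a hypothetical name), uses only the token types visible in the output dump, and checks adjacent pairs only, so unbalanced brackets are left for the parsing step to catch.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the ordering check; not the project's Analyzer classes.
final class AnalyzerSketch {
    enum TokenType {
        LEFT_BRACE, RIGHT_BRACE, LEFT_SQUARE_BRACKET, RIGHT_SQUARE_BRACKET,
        COLON, COMMA, STRING, NUMBER, BOOLEAN;

        // The token types that may directly follow this one.
        Set<TokenType> allowedNext() {
            switch (this) {
                case LEFT_BRACE:
                    return EnumSet.of(STRING, RIGHT_BRACE);
                case LEFT_SQUARE_BRACKET:
                    return EnumSet.of(STRING, NUMBER, BOOLEAN, LEFT_BRACE,
                            LEFT_SQUARE_BRACKET, RIGHT_SQUARE_BRACKET);
                case COLON:
                case COMMA:
                    return EnumSet.of(STRING, NUMBER, BOOLEAN, LEFT_BRACE, LEFT_SQUARE_BRACKET);
                case STRING:
                    // A string can be a key (then COLON follows) or a value.
                    return EnumSet.of(COLON, COMMA, RIGHT_BRACE, RIGHT_SQUARE_BRACKET);
                default:
                    // NUMBER, BOOLEAN and closing brackets all end a value.
                    return EnumSet.of(COMMA, RIGHT_BRACE, RIGHT_SQUARE_BRACKET);
            }
        }
    }

    // Throws as soon as a token is followed by one outside its allowed set.
    static void analyze(List<TokenType> tokens) {
        for (int i = 0; i + 1 < tokens.size(); i++) {
            TokenType current = tokens.get(i);
            TokenType next = tokens.get(i + 1);
            if (!current.allowedNext().contains(next)) {
                throw new IllegalStateException(next + " is not allowed after " + current
                        + " (token index " + (i + 1) + ")");
            }
        }
    }
}
```
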

- `Parser` is the smallest code component in this project.
- `Parser` has the method `Lexeme parse(List<Lexeme> lexemes)`, which takes the lexemes and builds the object out of them.
- If any unwanted or unexpected lexeme is encountered, an exception is thrown.
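
A recursive-descent pass over the lexeme list is one way such a `parse` method could work. The sketch below is only an illustration: it returns plain `Map`, `List`, and `String` values instead of the `OBJECT` and `ARRAY` lexemes shown in the dump, the two-field `Lexeme` record is a simplified stand-in, and the helper names (`expect`, `peekIs`, `consumeIf`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: builds nested Map/List values from a flat lexeme list.
final class ParserSketch {
    enum TokenType {
        LEFT_BRACE, RIGHT_BRACE, LEFT_SQUARE_BRACKET, RIGHT_SQUARE_BRACKET,
        COLON, COMMA, STRING, NUMBER, BOOLEAN
    }

    record Lexeme(TokenType tokenType, String value) {}

    private final List<Lexeme> lexemes;
    private int index = 0;

    private ParserSketch(List<Lexeme> lexemes) { this.lexemes = lexemes; }

    static Object parse(List<Lexeme> lexemes) {
        ParserSketch parser = new ParserSketch(lexemes);
        Object result = parser.parseValue();
        if (parser.index != lexemes.size()) {
            throw new IllegalStateException("Unexpected trailing lexeme at " + parser.index);
        }
        return result;
    }

    private boolean peekIs(TokenType type) {
        return index < lexemes.size() && lexemes.get(index).tokenType() == type;
    }

    private boolean consumeIf(TokenType type) {
        if (!peekIs(type)) return false;
        index++;
        return true;
    }

    private Lexeme expect(TokenType type) {
        if (!peekIs(type)) throw new IllegalStateException("Expected " + type + " at lexeme " + index);
        return lexemes.get(index++);
    }

    private Object parseValue() {
        if (index >= lexemes.size()) throw new IllegalStateException("Unexpected end of input");
        Lexeme lexeme = lexemes.get(index);
        switch (lexeme.tokenType()) {
            case LEFT_BRACE: return parseObject();
            case LEFT_SQUARE_BRACKET: return parseArray();
            case STRING: case NUMBER: case BOOLEAN:
                index++;
                return lexeme.value(); // scalars are kept as their raw text
            default:
                throw new IllegalStateException("Unexpected " + lexeme.tokenType() + " at lexeme " + index);
        }
    }

    private Map<String, Object> parseObject() {
        Map<String, Object> members = new LinkedHashMap<>(); // keeps insertion order
        expect(TokenType.LEFT_BRACE);
        if (!peekIs(TokenType.RIGHT_BRACE)) {
            do {
                String key = expect(TokenType.STRING).value();
                expect(TokenType.COLON);
                members.put(key, parseValue());
            } while (consumeIf(TokenType.COMMA));
        }
        expect(TokenType.RIGHT_BRACE);
        return members;
    }

    private List<Object> parseArray() {
        List<Object> items = new ArrayList<>();
        expect(TokenType.LEFT_SQUARE_BRACKET);
        if (!peekIs(TokenType.RIGHT_SQUARE_BRACKET)) {
            do {
                items.add(parseValue());
            } while (consumeIf(TokenType.COMMA));
        }
        expect(TokenType.RIGHT_SQUARE_BRACKET);
        return items;
    }
}
```
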

- `Formatter` has the method `String format(Lexeme lexeme, int depth)`, which, given the `Lexeme` holding the value of the JSON object built by the `Parser`, formats and beautifies it.
- `FormatterFactory` takes the `Token` as input and returns the `Formatter` needed to beautify that token and turn it into a string value.
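
To make the role of the `depth` parameter concrete, here is a hedged sketch of a depth-aware formatter. It works on plain `Map` and `List` values rather than on `Lexeme` objects, and both the two-space indent and the choice to keep arrays of scalars on one line are assumptions inferred from the formatted output in the dump, not taken from the project's code.

```java
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical sketch: pretty-prints nested Map/List values, indenting by depth.
final class FormatterSketch {
    private static final String INDENT = "  "; // assumed indent width

    static String format(Object value, int depth) {
        if (value instanceof Map) return formatObject((Map<?, ?>) value, depth);
        if (value instanceof List) return formatArray((List<?>) value, depth);
        return String.valueOf(value); // scalars are printed as their raw text
    }

    private static String formatObject(Map<?, ?> members, int depth) {
        if (members.isEmpty()) return "{}";
        String inner = INDENT.repeat(depth + 1);
        // The closing brace sits at the depth of the object itself.
        StringJoiner joiner = new StringJoiner(",\n", "{\n", "\n" + INDENT.repeat(depth) + "}");
        for (Map.Entry<?, ?> entry : members.entrySet()) {
            joiner.add(inner + entry.getKey() + " : " + format(entry.getValue(), depth + 1));
        }
        return joiner.toString();
    }

    private static String formatArray(List<?> items, int depth) {
        boolean nested = items.stream().anyMatch(i -> i instanceof Map || i instanceof List);
        if (!nested) {
            // Arrays of scalars stay on a single line.
            StringJoiner inline = new StringJoiner(", ", "[", "]");
            for (Object item : items) inline.add(String.valueOf(item));
            return inline.toString();
        }
        String inner = INDENT.repeat(depth + 1);
        StringJoiner joiner = new StringJoiner(",\n", "[\n", "\n" + INDENT.repeat(depth) + "]");
        for (Object item : items) joiner.add(inner + format(item, depth + 1));
        return joiner.toString();
    }
}
```

Each nested call receives `depth + 1`, so the indentation of a member always follows from how deep its enclosing object or array is.
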
A test run produces the following output:
Working on resources/input.txt
fileContent
{
"key1" : "value1","key2" : "value2","key3" : {"key31" : "value31"},"key4" : [123, 345.78, 13.45e90, -345.909, +245],
"key5" : true,"key6" : false,"key7" : [{"key711" : "value711","key712" : "value712"},{"key721" : "value721","key722" : "value722"}]
}
+--------------------+
| Lexical Analysis |
+--------------------+
Lexeme{tokenType=LEFT_BRACE, value={, start=0, end=0}
Lexeme{tokenType=STRING, value="key1", start=6, end=11}
Lexeme{tokenType=COLON, value=:, start=13, end=13}
Lexeme{tokenType=STRING, value="value1", start=15, end=22}
Lexeme{tokenType=COMMA, value=,, start=23, end=23}
Lexeme{tokenType=STRING, value="key2", start=24, end=29}
Lexeme{tokenType=COLON, value=:, start=31, end=31}
Lexeme{tokenType=STRING, value="value2", start=33, end=40}
Lexeme{tokenType=COMMA, value=,, start=41, end=41}
Lexeme{tokenType=STRING, value="key3", start=42, end=47}
Lexeme{tokenType=COLON, value=:, start=49, end=49}
Lexeme{tokenType=LEFT_BRACE, value={, start=51, end=51}
Lexeme{tokenType=STRING, value="key31", start=52, end=58}
Lexeme{tokenType=COLON, value=:, start=60, end=60}
Lexeme{tokenType=STRING, value="value31", start=62, end=70}
Lexeme{tokenType=RIGHT_BRACE, value=}, start=71, end=71}
Lexeme{tokenType=COMMA, value=,, start=72, end=72}
Lexeme{tokenType=STRING, value="key4", start=73, end=78}
Lexeme{tokenType=COLON, value=:, start=80, end=80}
Lexeme{tokenType=LEFT_SQUARE_BRACKET, value=[, start=82, end=82}
Lexeme{tokenType=NUMBER, value=123, start=83, end=85}
Lexeme{tokenType=COMMA, value=,, start=86, end=86}
Lexeme{tokenType=NUMBER, value=345.78, start=88, end=93}
Lexeme{tokenType=COMMA, value=,, start=94, end=94}
Lexeme{tokenType=NUMBER, value=13.45e90, start=96, end=103}
Lexeme{tokenType=COMMA, value=,, start=104, end=104}
Lexeme{tokenType=NUMBER, value=-345.909, start=106, end=113}
Lexeme{tokenType=COMMA, value=,, start=114, end=114}
Lexeme{tokenType=NUMBER, value=+245, start=116, end=119}
Lexeme{tokenType=RIGHT_SQUARE_BRACKET, value=], start=120, end=120}
Lexeme{tokenType=COMMA, value=,, start=121, end=121}
Lexeme{tokenType=STRING, value="key5", start=127, end=132}
Lexeme{tokenType=COLON, value=:, start=134, end=134}
Lexeme{tokenType=BOOLEAN, value=true, start=136, end=139}
Lexeme{tokenType=COMMA, value=,, start=140, end=140}
Lexeme{tokenType=STRING, value="key6", start=141, end=146}
Lexeme{tokenType=COLON, value=:, start=148, end=148}
Lexeme{tokenType=BOOLEAN, value=false, start=150, end=154}
Lexeme{tokenType=COMMA, value=,, start=155, end=155}
Lexeme{tokenType=STRING, value="key7", start=156, end=161}
Lexeme{tokenType=COLON, value=:, start=163, end=163}
Lexeme{tokenType=LEFT_SQUARE_BRACKET, value=[, start=165, end=165}
Lexeme{tokenType=LEFT_BRACE, value={, start=166, end=166}
Lexeme{tokenType=STRING, value="key711", start=167, end=174}
Lexeme{tokenType=COLON, value=:, start=176, end=176}
Lexeme{tokenType=STRING, value="value711", start=178, end=187}
Lexeme{tokenType=COMMA, value=,, start=188, end=188}
Lexeme{tokenType=STRING, value="key712", start=189, end=196}
Lexeme{tokenType=COLON, value=:, start=198, end=198}
Lexeme{tokenType=STRING, value="value712", start=200, end=209}
Lexeme{tokenType=RIGHT_BRACE, value=}, start=210, end=210}
Lexeme{tokenType=COMMA, value=,, start=211, end=211}
Lexeme{tokenType=LEFT_BRACE, value={, start=212, end=212}
Lexeme{tokenType=STRING, value="key721", start=213, end=220}
Lexeme{tokenType=COLON, value=:, start=222, end=222}
Lexeme{tokenType=STRING, value="value721", start=224, end=233}
Lexeme{tokenType=COMMA, value=,, start=234, end=234}
Lexeme{tokenType=STRING, value="key722", start=235, end=242}
Lexeme{tokenType=COLON, value=:, start=244, end=244}
Lexeme{tokenType=STRING, value="value722", start=246, end=255}
Lexeme{tokenType=RIGHT_BRACE, value=}, start=256, end=256}
Lexeme{tokenType=RIGHT_SQUARE_BRACKET, value=], start=257, end=257}
Lexeme{tokenType=RIGHT_BRACE, value=}, start=259, end=259}
Analyzing the character pattern...
+------------------------+
| Syntactical Analysis |
+------------------------+
parsed Json = {"key1"=Lexeme{tokenType=STRING, value="value1", start=15, end=22},
"key2"=Lexeme{tokenType=STRING, value="value2", start=33, end=40},
"key3"=Lexeme{tokenType=OBJECT, value={"key31"=Lexeme{tokenType=STRING, value="value31", start=62, end=70}}, start=51, end=71},
"key4"=Lexeme{tokenType=ARRAY, value=[Lexeme{tokenType=NUMBER, value=123, start=83, end=85},
Lexeme{tokenType=NUMBER, value=345.78, start=88, end=93},
Lexeme{tokenType=NUMBER, value=13.45e90, start=96, end=103},
Lexeme{tokenType=NUMBER, value=-345.909, start=106, end=113},
Lexeme{tokenType=NUMBER, value=+245, start=116, end=119}], start=82, end=120},
"key5"=Lexeme{tokenType=BOOLEAN, value=true, start=136, end=139},
"key6"=Lexeme{tokenType=BOOLEAN, value=false, start=150, end=154},
"key7"=Lexeme{tokenType=ARRAY, value=[Lexeme{tokenType=OBJECT, value={"key711"=Lexeme{tokenType=STRING, value="value711", start=178, end=187},
"key712"=Lexeme{tokenType=STRING, value="value712", start=200, end=209}}, start=166, end=210},
Lexeme{tokenType=OBJECT, value={"key721"=Lexeme{tokenType=STRING, value="value721", start=224, end=233},
"key722"=Lexeme{tokenType=STRING, value="value722", start=246, end=255}}, start=212, end=256}], start=165, end=257}}
+------------------+
| Formatted Json |
+------------------+
{
"key1" : "value1",
"key2" : "value2",
"key3" : {
"key31" : "value31"
},
"key4" : [123, 345.78, 13.45e90, -345.909, +245],
"key5" : true,
"key6" : false,
"key7" : [
{
"key711" : "value711",
"key712" : "value712"
},
{
"key721" : "value721",
"key722" : "value722"
}
]
}
- Strings that contain nested JSON or embedded quotation marks still need to be handled. Under the current rules, a string runs until the next quote character is encountered.
- Escape sequences need to be added, which would also fix the issue above.
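
One possible way to close this gap is to skip over escape sequences while scanning for the closing quote. The sketch below is a hypothetical helper (`findClosingQuote` is not a method of this project) that accepts the escapes listed in the JSON grammar: `\"`, `\\`, `\/`, `\b`, `\f`, `\n`, `\r`, `\t`, and `\u` followed by four hex digits.

```java
// Hypothetical sketch: finds the end of a JSON string while honoring escape sequences.
final class StringScanSketch {
    // Returns the index of the closing quote of the string that opens at start.
    static int findClosingQuote(String input, int start) {
        if (input.charAt(start) != '"') {
            throw new IllegalArgumentException("No opening quote at " + start);
        }
        for (int i = start + 1; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '"') return i;        // unescaped quote ends the string
            if (c != '\\') continue;       // ordinary character
            if (i + 1 >= input.length()) break;
            char escaped = input.charAt(++i); // the character after the backslash
            if (escaped == 'u') {
                if (i + 4 >= input.length()) break;
                for (int k = 1; k <= 4; k++) {
                    if (Character.digit(input.charAt(i + k), 16) < 0) {
                        throw new IllegalArgumentException("Bad \\u escape at " + (i - 1));
                    }
                }
                i += 4; // skip the four hex digits
            } else if ("\"\\/bfnrt".indexOf(escaped) < 0) {
                throw new IllegalArgumentException("Bad escape \\" + escaped + " at " + (i - 1));
            }
        }
        throw new IllegalArgumentException("Unterminated string starting at " + start);
    }
}
```

For the input `"a\"b" : 1`, a scan for the next quote would stop at index 3, while this helper returns index 5, the real end of the string.
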
Use the grammar at https://www.json.org/json-en.html to write the automata that tokenize the input into the corresponding `Token`.
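
As an example of turning that grammar into an automaton, the sketch below hand-codes a DFA for the JSON number rule: an optional minus sign, then `0` or a digit sequence without a leading zero, then an optional fraction and an optional exponent. The class, state, and method names are hypothetical and not part of this project.

```java
// Hypothetical sketch: a hand-written DFA for the JSON number grammar.
final class NumberAutomatonSketch {
    private enum State { START, MINUS, ZERO, INT, DOT, FRAC, EXP, EXP_SIGN, EXP_INT, REJECT }

    // One transition of the automaton.
    private static State step(State state, char c) {
        boolean digit = c >= '0' && c <= '9';
        switch (state) {
            case START:    return c == '-' ? State.MINUS : firstDigit(c);
            case MINUS:    return firstDigit(c);
            case ZERO:     return afterInt(c); // no digit may follow a leading zero
            case INT:      return digit ? State.INT : afterInt(c);
            case DOT:      return digit ? State.FRAC : State.REJECT;
            case FRAC:     return digit ? State.FRAC : (c == 'e' || c == 'E') ? State.EXP : State.REJECT;
            case EXP:      return digit ? State.EXP_INT : (c == '+' || c == '-') ? State.EXP_SIGN : State.REJECT;
            case EXP_SIGN: return digit ? State.EXP_INT : State.REJECT;
            case EXP_INT:  return digit ? State.EXP_INT : State.REJECT;
            default:       return State.REJECT;
        }
    }

    private static State firstDigit(char c) {
        if (c == '0') return State.ZERO;
        return (c >= '1' && c <= '9') ? State.INT : State.REJECT;
    }

    private static State afterInt(char c) {
        if (c == '.') return State.DOT;
        return (c == 'e' || c == 'E') ? State.EXP : State.REJECT;
    }

    // True when the whole text ends in an accepting state.
    static boolean isJsonNumber(String text) {
        State state = State.START;
        for (int i = 0; i < text.length() && state != State.REJECT; i++) {
            state = step(state, text.charAt(i));
        }
        return state == State.ZERO || state == State.INT
                || state == State.FRAC || state == State.EXP_INT;
    }
}
```

Note that the grammar has no leading `+`, so a strict automaton like this one accepts `123`, `345.78`, `13.45e90`, and `-345.909` from the sample input but would reject `+245`.
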