In this assignment, you will be required to scan, parse, and check the semantics of a ﬁle that encodes the state of a variation of Greater Than Sudoku. The deﬁnition of a properly formatted input ﬁle is given in Section 1.1. You will be submitting one .java ﬁle and two .g4 (ANTLR) ﬁles via web hand-in.
1.1 File Speciﬁcation • The ﬁle contains thre (3) labeled sections: Spaces , Values , and Game . Each section is enclosed by start and end tags (%section% and $section, respectively). The value of the section is set by the = assignment operator. • Spaces and Values are caret-separated (^) lists of spaces and symbols that appear between %spaces% and $spaces (and %values% and $values) tokens. Valid Spaces and Values are numerical symbols. • Game contains three (3) two-dimensional arrays of space-separated entries that uses numeric symbols (and the -) to encode the state of the game. Rows will be ended with a /n and the Game will be begun with a %game% and ended with a $game. The ﬁrst two-dimensonal array contains the actual values placed in the 16 spaces in the game. The second and third contain an encoding of the < and signs that appear between each of the neighboring spaces in the game. The second encodes vertical s(1) and <s (1) (decreasing vs increasing), while the third encodes horizontal s(1) and <s (1).
An example of a properly formatted ﬁle is shown in Figure 1.
%section% %t i t l e%<Values = %values% 4 ˆ 3 ˆ 4 ˆ 2 ˆ 3 ˆ 4 ˆ 3 ˆ 2 ˆ 2 ˆ 1 ˆ 2 ˆ 3 ˆ 2 $values $section %section% %t i t l e%<Spaces = %spaces% 1 ˆ 15 ˆ 5 ˆ 7 ˆ 2 ˆ 12 ˆ 8 ˆ 14 ˆ 3 ˆ 16 ˆ 11 ˆ 4 ˆ 13 $spaces $section %section% %t i t l e%<Game = %game% 4 /n 13/n /n $ game %game% 1 1 1 1/n 11 1 1/n 1 111$ game %game% 11 1/n 11 1/n1 1 1/n1 11$ game $section
Figure 1: A properly formatted Greater Than Sudoku (gts) encoding
The assignment is made up of two parts: scanning the text of the input ﬁle and parsing the information contained in the input ﬁle.
Construct a combined grammar in a .g4 ﬁle that ANTLR can use to scan a supplied Greater Than Sudoku encoding. The logic in this ﬁle should be robust enough to identify tokens in the encoding and accurately process any correctly formatted encoding. The rules in your .g4 ﬁle will be augmented with actions that display information about the input ﬁle. An example of that output is speciﬁed in Section 2. The purpose of the scanner is to extract tokens from the input and pass those along to the parser. For the Greater Than Sudoku encoding, the types of tokens that you will need to consider are given in Table 1.
Type Form Section Beginning %section% Section Ending $section Section Title %title%<Spaces, %title%<Values and %title%<Game Assign Value = Space Symbol One or more Numerical Symbols Value Symbol One or more Numerical Symbols Game Symbol – or One or more Numerical symbols Numerical Symbol 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, or Row Ending /n Game Beginning %game% Game Ending $game List Beginning %spaces% or $spaces List Ending %values% or $values White Space (to be ignored) spaces, tabs, newlines
Table 1: Tokens to Consider
1.2.1 Invalid Encodings
For invalid Greater Than Sudoku encodings, the output ERR: T in line L. should display. T would be the symbol read and L would be the line of input where the symbol was read. Your scanner should stop scanning the ﬁle after an unrecognized token is found.
Construct a combined grammar in a .g4 ﬁle that ANTLR can use to parse a supplied Greater Than Sudoku encoding. In addition to the rules for scanning, there are several parsing rules: • Each section appears once and only once. The sections may appear in either Spaces /Values /Game or Values /Spaces /Game order. • There must be more than two (2) rows in a valid Game . • There must be more than two (2) columns in a valid Game .
You may assume that each row has the same number of columns, and each column has the same number of rows.
You may assume that the visible game always comes before the grid of row inequalities and the grid of column inequalities. • There must be more than three (3) spaces in the Spaces section and more than three values in the Values section.
The semantics of a properly formatted Greater Than Sudoku encoding are:
1. The number of rows/columns must be equal and a perfect square
2. The number of numerical symbols already placed in the Game must not exceed 25% of all spaces in the Game .
3. Extra Credit (10 points or Honors contract): TheGame must still be satisfy all the contraints of Greater than Sudoku (each of the numbers 1 through n (number of rows/columns) appears at most once in each row/column and each pn⇥pn cluster of spaces, and all < and signs are obeyed.
Your .g4 ﬁle should produce output for both correctly formatted ﬁles and incorrectly formatted ﬁles. For the correctly formatted ﬁle in Figure 1, the output would have the form of the output presented in Figure 2
Start Section: %section% Label: %title%<Values Assign: = Start Values: %values% Numerical Symbol: 4 … Numerical Symbol: 2 Cease Values: $values Cease Section: $section Start Section: %section% Label: %title%<Spaces Assign: = Start Spaces: %spaces% Numerical Symbol: 1 … Numerical Symbol: 13 Cease Spaces: $spaces Cease Section: $section Start Section: %section% Label: %title%<Game Assign: = Start Values: %game% Open Space: Open Space: Numerical Symbol: 4 Open Space: Cease Row: /n Open Space: … Numerical Symbol: 1 Cease Values: $game Cease Section: $section Cease File
Figure 2: Truncated Output of Scanner for File in Figure 1
For a correctly formatted ﬁle in Part 2, the output would be: You need to fill s spaces. where s is the number of unﬁlled spaces in the Game . For the ﬁle in Figure 1, the output would be
You need to fill 13 spaces..
2.1.1 Invalid Syntax & Semantics in Parsing
For invalid encodings in Part 2, a message describing the error should be displayed. For a syntax error (violation of the syntax rules), the output ERR: T in line L. should be displayed, where L is the line number where the unrecognized token T was found. For that error, the parser should stop processing the ﬁle. For a semantic rule violation, the output SEMANTIC ERROR: Rule R. should be displayed, where R is the number of the rule (from List 1.3) that was violated, but parsing should continue. Syntax errors in Part 2 should be reported in the syntaxError method of csce322hw001pt02error.java.
3 Naming Conventions
The ANTLR ﬁle for the ﬁrst part of the assignment should be named csce322hw001pt01.g4. The ANTLR ﬁle for the second part of the assignment should be named csce322hw001pt02.g4. Both grammars should contain a start rule named greaterThanSudoku. The Java ﬁle for the second part of the assignment should be named csce322hw001pt02error.java.
The webgrader is available for this assignment. You can test your submitted ﬁles before the deadline by submitting them on webhandin and going to http://cse.unl.edu/˜cse322/grade, choosing the correct assignment and entering your cse.unl.edu credentials The script should take approximately 2 minutes to run and produce a PDF.
4.1 The Use of diff
Because Part 1 of this assignment only depends on the symbols in the ﬁle, the order in which they are displayed should not be submission dependent. Therefore, diff will be used to compare the output of a particular submission against the output of the solution implementation.
5 Point Allocation
Component Points Part 1 35 Part 2 65 Total 100
6 External Resources
ANTLR Getting Started with ANTLR v4 ANTLR 4 Documentation Overview (ANTLR 4 Runtime 4.5 API)
7 Commands of Interest
alias antlr4=’java -jar /path/to/antlr -4.5.3-complete.jar’ alias grun=’java org.antlr.v4.gui.TestRig’ export CLASSPATH=”/path/to/antlr -4.5.3-complete.jar:$CLASSPATH” antlr4 /path/to/csce322hw01part0#.g4 javac -d /path/for/.classfiles /path/to/csce322hw01part0#*.java java /path/of/.classfiles csce322hw01part02driver /path/to/inputfile grun csce322hw01pt0# greaterThanSudoku -gui grun csce322hw01pt0# greaterThanSudoku -gui /path/to/inputfile grun csce322hw01pt0# greaterThanSudoku grun csce322hw01pt0# greaterThanSudoku /path/to/inputfile
Homework 01: Semantics greater then sudoku