Programming Assignment 01
1. Introduction
The goal of this assignment is to have you write a lexical and syntax analyzer for a hypothetical programming language (roughly based on Pascal). The input of your program is a source code written in the programming language’s grammar. If the source code is syntactically correct your program should display a generated parse tree. An appropriate error message should be provided otherwise.
The expectation for this programming assignment is that you will be coding a complete parser from scratch using Scala. You can choose between the two approaches discussed in class: top-down or bottom-up. You are not allowed to base your project on any parser generator software like YACC or Java CC, for example.
2. Grammar
Below is the grammar for the Pascal-like programming language specified using EBNF notation. Assume that this PL is case-sensitive.
program = ´program´ identifier body ´.´
identifier = letter { ( letter | digit ) }
body = [ var_sct ] block
var_sct = ´var´ var_dcl { ´;´ var_dcl }
var_dcl = identifier { identifier } ´:´ type
type = ´Integer´ | ´Boolean´
block = ´begin´ stmt { ´;´ stmt } ´end´
stmt = assgm_stmt | read_stmt | write_stmt | if_stmt | while_stmt | block
assgm_stmt = identifier ´:=´ expr
read_stmt = ´read´ identifier
write_stmt = ´write´ ( identifier | literal )
if_stmt = ´if´ bool_expr ´then´ stmt [ ´else´ stmt ]
while_stmt = ´while´ bool_expr ´do´ stmt
expr = arithm_expr | bool_expr
arithm_expr = arithm_expr ( ´+´ | ´-´ ) term | term
term = term ´*´ factor | factor
factor = identifier | int_literal
literal = int_literal | bool_literal
int_literal = digit { digit }
bool_litreal = ´true´ | ´false´
bool_expr = bool_literal | arithm_expr ( ´>´ | ´>=´ | ´=´ | ´<=´ | ´<´ ) arithm_expr
letter = ´a´ | ´b´ | ´c´ | ´d´ | ´e´ | ´f´ | ´g´ | ´h´ | ´i´ | ´j´ | ´k´ | ´l´ | ´m´ | ´n´ | ´o´ | ´p´ | ´q´ | ´r´ | ´s´ | ´t´ | ´u´ | ´v´ | ´w´ | ´x´ | ´y´ | ´z´ | ´A´ | ´B´ | ´C´ | ´D´ | ´E´ | ´F´ | ´G´ | ´H´ | ´I´ | ´J´ | ´K´ | ´L´ | ´M´ | ´N´ | ´O´ | ´P´ | ´Q´ | ´R´ | ´S´ | ´T´ | ´U´ | ´V´ | ´W´ | ´X´ | ´Y´ | ´Z´
digit = ´0´ | ´1´ | ´2´ | ´3´ | ´4´ | ´5´ | ´6´ | ´7´ | ´8´ | ´9´
3. Tests
For testing purposes, 15 source files are provided (get them from https://github.com/thyagomota/20FCS3210). The expected output for each of the tests is provided next.
program Source1
var x1 x2 y: Integer
begin
read x1;
read x2;
y := x1 + x2;
write y
end.
program
program
identifier: ‘Source1’
body
var_sct
var
var_dct
identifier: ‘x1’
identifier: ‘x2’
identifier: ‘y’
:
type
Integer
block
begin
stmt
read_stmt
read
identifier: ‘x1’
;
stmt
read_stmt
read
identifier: ‘x2’
;
stmt
assgm_stmt
identifier: ‘y’
:=
expr
arithm_expr
term
factor
identifier: ‘x1′
term’
arithm_expr’
+
term
factor
identifier: ‘x2′
term’
arithm_expr’
;
stmt
write_stmt
write
identifier: ‘y’
end
.
program Source2
var
s: Integer;
i: Integer;
acc: Integer
begin
read s;
i := 1;
acc := 0;
while i <= s do
begin
acc := acc + i;
i := i + 1
end
end. program
program
identifier: ‘Source2’
body
var_sct
var
var_dct
identifier: ‘s’
:
type
Integer
;
var_dct
identifier: ‘i’
:
type
Integer
;
var_dct
identifier: ‘acc’
:
type
Integer
block
begin
stmt
read_stmt
read
identifier: ‘s’
;
stmt
assgm_stmt
identifier: ‘i’
:=
expr
arithm_expr
term
factor
int_literal: ‘1’
term’
arithm_expr’
;
stmt
assgm_stmt
identifier: ‘acc’
:=
expr
arithm_expr
term
factor
int_literal: ‘0’
term’
arithm_expr’
;
stmt
while_stmt
while
bool_expr
arithm_expr
term
factor
identifier: ‘i’
term’
arithm_expr’
<=
arithm_expr
term
factor
identifier: ‘s’
term’
arithm_expr’
do
stmt
block
begin
stmt
assgm_stmt
identifier: ‘acc’
:=
expr
arithm_expr
term
factor
identifier: ‘acc’
term’
arithm_expr’
+
term
factor
identifier: ‘i’
term’
arithm_expr’
;
stmt
assgm_stmt
identifier: ‘i’
:=
expr
arithm_expr
term
factor
identifier: ‘i’
term’
arithm_expr’
+
term
factor
int_literal: ‘1’
term’
arithm_expr’
end
end
.
program Source3
var
x: Integer;
y: Integer;
result: Boolean
begin
read x;
read y;
if x > y then
result := true
else
result := false;
write result
end. program
program
identifier: ‘Source3’
body
var_sct
var
var_dct
identifier: ‘x’
:
type
Integer
;
var_dct
identifier: ‘y’
:
type
Integer
;
var_dct
identifier: ‘result’
:
type
Boolean
block
begin
stmt
read_stmt
read
identifier: ‘x’
;
stmt
read_stmt
read
identifier: ‘y’
;
stmt
if_stmt
if
bool_expr
arithm_expr
term
factor
identifier: ‘x’
term’
arithm_expr’
>
arithm_expr
term
factor
identifier: ‘y’
term’
arithm_expr’
then
stmt
assgm_stmt
identifier: ‘result’
:=
expr
bool_expr
bool_literal
true
else
stmt
assgm_stmt
identifier: ‘result’
:=
expr
bool_expr
bool_literal
false
;
stmt
write_stmt
write
identifier: ‘result’
end
.
source4.pas Syntax Analyzer Error: identifier expected!
source5.pas Syntax Analyzer Error: begin expected!
source6.pas Syntax Analyzer Error: colon expected!
source7.pas Syntax Analyzer Error: assignment expected!
source8.pas Syntax Analyzer Error: type expected!
source9.pas Syntax Analyzer Error: do expected!
source10.pas Syntax Analyzer Error: relational operator expected!
source11.pas Syntax Analyzer Error: period expected!
source12.pas Syntax Analyzer Error: identifier or (int) literal expected!
source13.pas Syntax Analyzer Error: then expected!
source14.pas Syntax Analyzer Error: end expected!
source15.pas Syntax Analyzer Error: EOF expected!
Please be aware that those tests are far from being comprehensive. The instructor reserves the right to run other tests on your code if deemed necessary. You are encouraged to create other tests on your own.
4. Deliverables and Submission
All of the source code that you use for this project must be zipped in a file named src.zip and submitted through Canvas. I only accept zip format (no Z, RAR, or any other format).
Each source code MUST have a comment section in the beginning with the name(s) of the author(s) of the project. You are allowed to work together with another classmate. Teams of more than two students will NOT be accepted (NO exceptions). Only one of the members of the team needs to submit on Canvas.
5. Rubric
This programming assignment is worth 100 points, distributed in the following way:
+3 command-line validation
+32 lexical analyzer works as expected
+5 tokens match specification
+2 recognizes EOF
+3 recognizes identifiers
+3 recognizes literals
+5 recognizes special words
+2 recognizes assignment operator
+3 recognizes arithmetic operator
+3 recognizes relational operators
+2 recognizes logical operators
+3 recognizes punctuators
+1 raises exception with proper error message when it fails
+60 syntax analyzer works as expected
+10 grammar used matches specification
+10 parser shows appropriate error messages
+10 parse tree is built correctly
+2 for each source input test that passes all tests
+5 submission follows instructions (student names identified in a comment section, zip format,)