Assignment 2

In this assignment, you are expected to fill in a program that implements the CYK algorithm for context free grammars. The file-parsing part and data structures have been finished. What is left to fill in is the following method in CYK.java: (Note: For Python version, the names of files, methods and variables are exactly the same as Java version)

// CYK algorithm

// Calculate the array X

// w is the string to be processed

void calcCYK(int [] w)

Details of the CYK algorithm are in slide 7~12 in lecture 15_cfl5. Basically, the whole program is to read in strings over alphabet {a,b}, and then use the context-free grammar defined on slide 8 of lecture 15_cfl5:

S – AB

A – BC | a

B – AC | b

C – a | b

For each string w of length L, the program will call the CYK algorithm to calculate the table X defined on slide 7 of lecture 15_cfl5. In other words, you are expected to calculate Xij,0<=i<=j<=L-1 (in Java, array is 0-based). To facilitate computation, we use numerical values to represent variables and terminals. Here are the definitions of arrays you will need:

int [][] production = new int[maxProductionNum+1][3];

//Productions in Chomsky Normal Form (CNF)

//production[i][0] is the number for the variable (0~3, 0: S 1: A, 2: B, 3: C)

//If this production is of the form A-BC (two variables), then production[i][1] and production[i][2] will contain the numbers for these two variables

//If this production is of the form A-a (a single terminal), then production[i][1] will contain the number for the terminal (0 or 1, 0: a, 1: b), production[i][2]=-1

boolean [][][] X;

//X[i][j][s]=true if and only if variable s (0~3, 0: S 1: A, 2: B, 3: C) is in Xij defined in CYK

//Suppose the length of string to be processed is L, then 0?i?j?L?1

Here is a method we have implemented which you may find helpful:

//check whether (a,b,c) exists in production defined above

//e.g. If you want to check whether S-AB is a production, just call existProd(0,1,2); if you want to check whether A-a is a production, just call existProd(1,0,-1)

boolean existProd(int a,int b,int c)

Here’s the sample input for your program, which is exactly the same as in slide 12 of lecture 15_cfl5.

sample input (sampleCYK.in):

ababa