The purpose of this research project is to explore the power of quick language specification with lex and yacc (or flex and bison — their younger cousins).
Your report needs to cover the origins of the lex and yacc tools, the theoretical basis for their existence and operation, their common applications, and their [lack of] portability. It should also compare and contrast your experience of building either the expression interpreter or the data lexer (see below) with these tools against doing the same thing by hand in C.
The support code is in three parts:
Expression Interpreter
Create a lexical analyzer, parser, and interpreter for the following expression language:
Operations (most to least precedence)
Operator | Discussion |
---|---|
( ) | parentheses to change the order of other operations |
^ | exponentiation |
& | mixed numeral combinator as in 4 & 3/7 or even 4 & (3x)/(7y) (but not 4 & 3x/(7y) which would have the & at a common level of operation with the multiplication of 3 and x) |
*, /, +, - | multiplication, division, unary plus, and unary minus (aka negation or additive opposite) (unary plus has no actual effect on its operand) (multiplication can also be indicated merely by position as in 3x to indicate the product of 3 and x) |
+, -, & | addition, subtraction, and degenerate addition (when not used as a mixed number combinator as above) |
= | assignment of a value to a variable |
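For the hand-written comparison, one common way to honor a precedence chart like the one above is recursive descent, with one function per precedence level. The following is only a minimal sketch under stated assumptions: it covers parentheses, ^, unary plus/minus, *, /, and binary +/- on numeric literals, and it leaves out &, positional multiplication, identifiers, and assignment. All function names are hypothetical, and `int_pow` handles integer exponents only so that the sketch needs nothing beyond the standard C library.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical sketch: one function per precedence level of the chart. */
static const char *cur;              /* next unread character */
static double parse_sum(void);
static double parse_factor(void);

static void skip_spaces(void) { while (isspace((unsigned char)*cur)) cur++; }

/* Simplification: the exponent is truncated to an integer. */
static double int_pow(double base, double exponent) {
    long n = (long)exponent;
    int negative = n < 0;
    double result = 1.0;
    if (negative) n = -n;
    while (n-- > 0) result *= base;
    return negative ? 1.0 / result : result;
}

/* primary: number | '(' sum ')' */
static double parse_primary(void) {
    char *end;
    double v;
    skip_spaces();
    if (*cur == '(') {
        cur++;
        v = parse_sum();
        skip_spaces();
        if (*cur == ')') cur++;
        return v;
    }
    v = strtod(cur, &end);           /* yields 0 if no number is present */
    cur = end;
    return v;
}

/* power: primary [ '^' factor ]   (right associative) */
static double parse_power(void) {
    double base = parse_primary();
    skip_spaces();
    if (*cur == '^') { cur++; return int_pow(base, parse_factor()); }
    return base;
}

/* factor: ('+' | '-') factor | power   (unary binds looser than ^) */
static double parse_factor(void) {
    skip_spaces();
    if (*cur == '-') { cur++; return -parse_factor(); }
    if (*cur == '+') { cur++; return parse_factor(); }
    return parse_power();
}

/* term: factor (('*' | '/') factor)* */
static double parse_term(void) {
    double v = parse_factor();
    for (;;) {
        skip_spaces();
        if (*cur == '*') { cur++; v *= parse_factor(); }
        else if (*cur == '/') { cur++; v /= parse_factor(); }
        else return v;
    }
}

/* sum: term (('+' | '-') term)* */
static double parse_sum(void) {
    double v = parse_term();
    for (;;) {
        skip_spaces();
        if (*cur == '+') { cur++; v += parse_term(); }
        else if (*cur == '-') { cur++; v -= parse_term(); }
        else return v;
    }
}

double eval_expression(const char *text) { cur = text; return parse_sum(); }
```

Because unary minus sits below ^ in the chart, `eval_expression("-2^2")` evaluates as -(2^2). In a yacc grammar the same ordering would normally be expressed with precedence declarations rather than one function per level, which is one point the report could contrast.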
Identifiers
Variable identifiers may be any sequence of non-spacing, non-operator, non-period characters. The sequence may be as long as desired. Identifiers may NOT begin with a digit character.
Positional multiplication of variables requires spacing between them: x y is multiplication of x by y but xy is the variable xy.
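The identifier rule can be sketched in hand-written C as a function that measures how many characters at the front of a string form an identifier. This is an illustration only: the name `identifier_length` is hypothetical, and the operator character set is an assumption taken from the required operator table (it would need extending if the optional logical operators are added).

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Assumed operator characters, taken from the required operator table. */
static int is_operator_char(int c) {
    return c != '\0' && strchr("()^&*/+-=", c) != NULL;
}

/* Returns the length of the identifier at the start of s,
   or 0 if no identifier starts there (for example, at a digit). */
size_t identifier_length(const char *s) {
    size_t n = 0;
    if (isdigit((unsigned char)s[0])) return 0;
    while (s[n] != '\0' &&
           !isspace((unsigned char)s[n]) &&
           s[n] != '.' &&
           !is_operator_char((unsigned char)s[n]))
        n++;
    return n;
}
```

With this rule, `identifier_length("xy")` reports a single two-character identifier, while `identifier_length("x y")` stops at the space, matching the distinction drawn above.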
Numerical Literals
All numerical values may be considered double typed (or long double if you prefer) and are built up from an optional sign (±), a whole part (W), a decimal part (D), a decimal point (.), and a possible scientific notation part (S). Possible combinations include: ±W, ±W., ±W.D, or ±.D. Any of these may be followed by either an e or an E and ±S.
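As a rough illustration of the literal forms listed above, the following hand-written C sketch measures a numeric literal at the start of a string. The function name `number_length` is hypothetical. It assumes the sign is written as an ASCII + or -, and it treats an e or E that is not followed by at least one digit as not belonging to the literal. In a full interpreter the leading sign might instead be left to the parser as a unary operator.

```c
#include <ctype.h>
#include <stddef.h>

/* Returns the number of characters forming a numeric literal at the
   start of s, or 0 if none. Accepts W, W., W.D, and .D with an
   optional sign and an optional e/E exponent part. */
size_t number_length(const char *s) {
    size_t i = 0, whole = 0, frac = 0;

    if (s[i] == '+' || s[i] == '-') i++;               /* optional sign */
    while (isdigit((unsigned char)s[i])) { i++; whole++; }
    if (s[i] == '.') {
        i++;
        while (isdigit((unsigned char)s[i])) { i++; frac++; }
    }
    if (whole == 0 && frac == 0) return 0;             /* no digits at all */

    if (s[i] == 'e' || s[i] == 'E') {
        size_t j = i + 1;
        if (s[j] == '+' || s[j] == '-') j++;
        if (isdigit((unsigned char)s[j])) {            /* need a digit in S */
            while (isdigit((unsigned char)s[j])) j++;
            i = j;
        }
    }
    return i;
}
```

In lex the same set of forms would typically collapse into a single regular expression, which makes this a convenient spot for the by-hand versus generated comparison.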
You may optionally include the following set of conditional operations in your expression language:
Operator | Discussion |
---|---|
! | logical not (!x is arithmetically equivalent to 1-x) |
&& | logical and (arithmetically equivalent to *) |
\|\| | logical or (arithmetically equivalent to +) |
==, !=, <, <=, >, >= | comparison |
They would be inserted between the addition operations and the assignment operation in the precedence chart above.
This option would raise the level of the project by (Level 1).
You may also elect to intelligently realize that, had the user already set up x and y, they are likely attempting to multiply these two variables with xy rather than creating a new variable in the middle of a calculation.
This option would raise the level of the project by (Level 2).
Data Lexer
Our data files contain quote-delimited sequences of characters, [curly] brace enclosed groups of values, two forms of commenting, and blank lines.
A quoted sequence must begin and end with the same delimiter, but this can be either a single quote (') or a double quote ("). A quote symbol may be placed inside such a sequence if preceded by the escape symbol (\). (Although a single quote may be inside a double-quote-delimited sequence unescaped and vice versa.)
A quoted sequence may NOT span multiple lines.
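The quoting rules above might be scanned by hand along the following lines. This is a sketch, not a reference solution: `quoted_length` is a hypothetical name, and it assumes a backslash escapes whatever single character follows it, although the text only requires this for quote symbols.

```c
#include <stddef.h>

/* Returns the length of the quoted sequence at the start of s,
   including both delimiters, or 0 if s does not start with a quote
   or the sequence is not closed before the end of the line. */
size_t quoted_length(const char *s) {
    char delim = s[0];
    size_t i;

    if (delim != '\'' && delim != '"') return 0;
    for (i = 1; s[i] != '\0' && s[i] != '\n'; i++) {
        if (s[i] == '\\' && s[i + 1] != '\0' && s[i + 1] != '\n') {
            i++;                       /* skip the escaped character */
            continue;
        }
        if (s[i] == delim) return i + 1;
    }
    return 0;                          /* hit a new-line or end of input */
}
```

The other kind of quote needs no special handling here: inside a double-quoted sequence a single quote is just another character, and vice versa.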
Comments may begin with a double slash (//) and would then consume all characters up to and including the next new-line ('\n').
Comments may alternatively begin with a slash-star symbol (/*) and would then consume all characters up to and including the star-slash symbol (*/).
Comments may be nested inside their own kind with no repercussions. When nesting double-slash comments, all are considered to share their new-line ('\n') terminator. Slash-star-star-slash comments require their own terminators when nested.
Cross-nesting of comments may be allowed (this will earn you an extra (Level 2)), but be aware of two tricky situations. If a slash-star-star-slash comment begins [but does not also end] within a double-slash comment, they are considered to overlap rather than nest:
data // comment /* shared comment
slash-star only comment
comment */ data
However, the placement of a double-slash comment on the last line of a slash-star-star-slash comment — inside of the slash-star-star-slash's zone — does NOT overlap to the new-line but is terminated by the surrounding slash-star-star-slash's terminator:
data /* comment
slash-star only comment
comment // shared comment */ data
The trickiest situation for some, however, is that quoted material must be respected even inside a comment zone! (This must be handled as part of the actual supporting code.)
data /* comment comment " ...quoted sequence */ quoted sequence... " comment comment */ data
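One way to picture same-kind nesting together with the quoting rule is a depth counter that ignores comment markers found inside quoted material. The sketch below is a hand-written illustration under assumptions: `block_comment_length` is a hypothetical name, cross-nesting with double-slash comments is not handled, and a quote that is not closed on its own line is treated as an ordinary character.

```c
#include <stddef.h>

/* Returns the length of the slash-star comment at the start of s,
   including its outermost terminator, or 0 if s does not start with
   slash-star or the comment is never closed. */
size_t block_comment_length(const char *s) {
    size_t i = 2;
    int depth = 1;

    if (s[0] != '/' || s[1] != '*') return 0;
    while (s[i] != '\0') {
        if (s[i] == '/' && s[i + 1] == '*') {          /* nested opener */
            depth++;
            i += 2;
        } else if (s[i] == '*' && s[i + 1] == '/') {   /* one level closes */
            i += 2;
            if (--depth == 0) return i;
        } else if (s[i] == '"' || s[i] == '\'') {
            /* Quoted material hides any comment markers inside it. */
            char delim = s[i];
            size_t j = i + 1;
            while (s[j] != '\0' && s[j] != '\n' && s[j] != delim) {
                if (s[j] == '\\' && s[j + 1] != '\0' && s[j + 1] != '\n') j++;
                j++;
            }
            /* Assumption: an unclosed quote is just an ordinary character. */
            i = (s[j] == delim) ? j + 1 : i + 1;
        } else {
            i++;
        }
    }
    return 0;
}
```

In flex, this kind of state would more likely be kept with start conditions and a counter in the actions, since nesting to arbitrary depth is beyond what a single regular expression can match.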
Any non-quoted sequence of spaces, tabs, and/or new-lines may be compressed to a single space.
Each 'item' lexed must be reported along with its starting and ending lines/columns. (Columns are measured regardless of space/tab compaction. Lines are measured regardless of new-line compaction.)
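Position reporting can be kept independent of whitespace compaction by advancing a line/column pair over the raw input characters. The following sketch is illustrative only: the `Position` type and both function names are hypothetical, positions are assumed to be 1-based, and a tab is assumed to count as a single column.

```c
#include <stddef.h>

/* Hypothetical 1-based source position. */
typedef struct {
    int line;
    int column;
} Position;

/* Advances pos past one raw input character.
   Assumption: a tab occupies one column. */
void advance_position(Position *pos, char c) {
    if (c == '\n') {
        pos->line++;
        pos->column = 1;
    } else {
        pos->column++;
    }
}

/* Returns the position of the last character of an n-character item
   whose first character is at start. */
Position item_end(Position start, const char *text, size_t n) {
    Position pos = start;
    size_t i;
    for (i = 0; i + 1 < n; i++)
        advance_position(&pos, text[i]);
    return pos;
}
```

For a comment such as `/* a`, new-line, `b */` starting at line 1, column 1, `item_end` reports line 2, column 4 for the closing slash, regardless of how the whitespace is later compressed.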
The lexer must report everything lexed — including comments! (A nested comment is considered part of its surrounding comment. Both portions of overlapping comments are to be reported separately — duplicating the overlapping portion.)
An entire brace-enclosed group is not considered a single item. Rather, it is a sequence starting with { and ending with }. (Comments and further brace-groups may therefore be placed inside such a structure.)
Comparison C Code
Choose one of the above items and hand-craft your solution in C. Use this code as a point for comparison and contrast in your report.
If you combine the results of the data lexer with the expression interpreter (allowing for data 'values' to be expressions in some way), I'll throw in an extra (Level 3).
This assignment is (Level 8). (This does not include any enhancements or add-ons you choose to do above; please add their appropriate levels as well.)