c - Eliminating grammar ambiguity when a rule covers a subset of another -
i trying build small bison grammar, having issue part of definition. functions can called expression legal on right side (expression_list in grammar) arguments.
the issue arises because on left side, functions can defined assigning them (an identifier followed list of identifiers - assignment_expression , identifier_list in grammar)
my question how can eliminate ambiguity in grammar, since statements legal on left side subset of legal on right.
the grammar written in bison (v. 2.4.1)
the output command was:
2 shift/reduce, 2 reduce/reduce warning: rule useless in parser due conflicts: assignment_expression: identifier lparen rparen here complete grammar:
expression: assignment_expression | expression decorator identifier value: identifier | hex | bin | oct | sci | float | int ; constant_expression: value | lparen constant_expression rparen | constant_expression or constant_expression | constant_expression xor constant_expression | constant_expression , constant_expression | constant_expression lshift constant_expression | constant_expression rshift constant_expression | constant_expression plus constant_expression | constant_expression minus constant_expression | constant_expression mul constant_expression | constant_expression div constant_expression | constant_expression mod constant_expression | constant_expression pow constant_expression | constant_expression factorial | not constant_expression | identifier lparen rparen | identifier lparen constant_expression rparen | identifier lparen expression_list rparen ; expression_list: constant_expression comma constant_expression | expression_list comma constant_expression ; assignment_expression: constant_expression | identifier equal assignment_expression | identifier lparen rparen | identifier lparen identifier rparen | identifier lparen identifier_list rparen ; identifier_list: identifier comma identifier | identifier_list comma identifier ; here relevant sections of bison's output verbose mode (-v):
state 34 conflicts: 2 shift/reduce state 35 conflicts: 2 reduce/reduce state 34 3 value: identifier . 25 constant_expression: identifier . lparen rparen 26 | identifier . lparen constant_expression rparen 27 | identifier . lparen expression_list rparen 33 assignment_expression: identifier lparen identifier . rparen 35 identifier_list: identifier . comma identifier comma shift, , go state 53 lparen shift, , go state 39 rparen shift, , go state 54 comma [reduce using rule 3 (value)] rparen [reduce using rule 3 (value)] $default reduce using rule 3 (value) state 35 25 constant_expression: identifier lparen rparen . 32 assignment_expression: identifier lparen rparen . $end reduce using rule 25 (constant_expression) $end [reduce using rule 32 (assignment_expression)] decorator reduce using rule 25 (constant_expression) decorator [reduce using rule 32 (assignment_expression)] $default reduce using rule 25 (constant_expression) as per request here minimal grammar issue:
assignment_expression: constant_expression | identifier lparen identifier_list rparen ; value: identifier | int ; constant_expression: value | identifier lparen expression_list rparen ; expression_list: constant_expression comma constant_expression | expression_list comma constant_expression ; identifier_list: identifier comma identifier | identifier_list comma identifier ;
your text , grammar don't quite line up. or maybe i'm not understanding text correctly. say:
on left side, functions can defined assigning them (an identifier followed list of identifiers - assignment_expression , identifier_list in grammar)
in head, imagine example of like:
comb(n, r) = n! / (r! * (n-r)!) but grammar reads:
assignment_expression: constant_expression | identifier equal assignment_expression | identifier lparen rparen | identifier lparen identifier rparen | identifier lparen identifier_list rparen which not parse above definition, because thing can appear left hand side of equal identifier. right-recursion allows number of repetitions of identifier = before assignment_expression, last thing must either either constant_expression or 1 of 3 prototype productions. matched:
c = r = f(a,b) but this:
c = r = f(2, 7) i'd makes grammar inherently ambiguous, error. meant was:
assignment_expression: rvalue | lvalue '=' assignment_expression rvalue: constant_expression lvalue: identifier | identifier '(' ')' | identifier '(' identifier_list ')' i note in passing definition of identifier_list requiring @ least 2 identifiers unnecessarily complicated, i've assumed above actual definition of identifier_list is:
identifier_list: identifier | identifier_list ',' identifier that's not sufficient solve problem, though. still leaves parser not knowing whether:
comb(n | lookahead ',' is start of
comb(n, r) = ... or function call
comb(n, 4) so fix that, need pull out heavy artillery.
we can start simple solution. grammar not ambiguous, since lvalue must followed =. when reach =, can tell whether have far rvalue or lvalue, if happen identical. (comb(n, r), example.) issue = may unlimited distance happen be.
with bison, if have unambiguous grammar , cannot bothered fix lookahead problem, can ask glr parser. glr parser less efficient because needs maintain possible parses in parallel, still linear complexity unambiguous grammars. (glr parsers can parse ambiguous grammars in o(n3) bison implementation doesn't tolerate ambiguity. it's designed parse programming languages, after all.)
so that, merely need add
%glr-parser and read section of bison manual how semantic actions affected. (summary: stored until parse disambiguated, may not happen during parse in lalr(1) parser.)
a second simple solution, common in practice, have parser accept superset of desired language, , add arguably syntactic check in semantic action. write grammar allow looks call_expression on left-hand-side of assignment, when build ast node assignment/definition, verify argument list call simple list of identifiers.
not simplify grammar without implementation cost, makes possible generate accurate error messages describe syntax error, not easy standard lalr(1) parser.
still, there is lalr(1) grammar language (or, rather, imagine language be). in order produce it, need avoid forcing reduction distinguish between lvalue , rvalue until know 1 is.
so issue identifier either part of expression_list or part of identifier_list. , don't know one, when see ). consequently, need special case identifier '(' identifier_list ')', allow part of both lvalue , rvalue. in other words, need like:
lvalue: identifier | prototype rvalue: expression_other_than_lvalue | lvalue which leaves question of how define expression_other_than_lvalue.
much of time, solution simple: constants, operator expressions, parenthesized expressions; none of these can lvalues. call parenthesized list includes expression_other_than_identifier expression_other_than_identifier. thing won't count precisely identifier(identifier,identifier,...)
so let's rewrite grammar far can. (i've changed constant_expression lvalue because shorter type. , substituted many token names actual symbol, find more readable. of following same original.)
value_not_identifier: hex | bin | oct | sci | float | int expr_not_lvalue: value_not_identifier | '(' rvalue ')' | rvalue or rvalue | ... | identifier '(' list_not_id_list ')' lvalue: identifier | identifier '(' ')' | identifier '(' identifier_list ')' identifier_list: identifier | identifier_list ',' identifier now, aside detail haven't defined list_not_id_list, fall place. lvalue , expr_not_lvalue disjoint, can finish with:
rvalue: lvalue | expr_not_lvalue assignment_expression: rvalue | lvalue '=' assignment_expression and need deal expression lists not identifier lists. noted above, like:
expr_not_identifier: expr_not_lvalue lvalue_not_identifier list_not_id_list: expr_not_identifier | list_not_id_list ',' rvalue | identifier_list ',' expr_not_identifier so while parsing list, first time find not identifier, remove list identifier_list production. if through whole list, might still find ourselves lvalue when rvalue desired, decision (finally) can made when see = or statement terminator.
so correct (i hope) complete grammar is:
expression: assignment_expression | expression decorator identifier assignment_expression: rvalue | lvalue '=' assignment_expression value_not_identifier: hex | bin | oct | sci | float | int expr_not_lvalue: value_not_identifier | '(' rvalue ')' | rvalue or rvalue | rvalue xor rvalue | rvalue , rvalue | rvalue lshift rvalue | rvalue rshift rvalue | rvalue '+' rvalue | rvalue '-' rvalue | rvalue '*' rvalue | rvalue '/' rvalue | rvalue '%' rvalue | rvalue pow rvalue | rvalue '!' | not rvalue | identifier '(' list_not_id_list')' lvalue_not_identifier: identifier '(' ')' | identifier '(' identifier_list ')' lvalue: lvalue_not_identifier | identifier rvalue: lvalue | expr_not_lvalue identifier_list: identifier | identifier_list ',' identifier list_not_id_list: expr_not_identifier | list_not_id_list ',' rvalue | identifier_list ',' expr_not_identifier expr_not_identifier: expr_not_lvalue lvalue_not_identifier given availability of simple solutions, , inelegance of transformations required implement precise grammar, little wonder see sort of construction. however, find used extensively in ecma-262 standard (which defines ecmascript aka javascript). grammar formalism used in report includes kind of macro feature simplifies above transformation, doesn't make grammar easier read (imho), , don't know of parser generator implements feature.
Comments
Post a Comment