ANTLR gramática para reStructuredText (prioridades de regla)

Question

May 30, 2011, 06:16 PM

ANTLR gramática para reStructuredText (prioridades de regla)

Primera secuencia de preguntas

Hola a todos

Esto podría ser un seguimiento de esta pregunta:Antlr reglas prioridades

Estoy tratando de escribir una gramática ANTLR para elreStructuredText markup language.

El principal problema al que me enfrento es: "¿Cómo hacer coincidir cualquier secuencia de caracteres (texto normal) sin enmascarar otras reglas gramaticales?"

Tomemos un ejemplo con un párrafo con marcado en línea:

In `Figure 17-6`_, we have positioned ``before_ptr`` so that it points to the element 
*before* the insert point. The variable ``after_ptr`` points to the element *after* the 
insert. In other words, we are going to put our new element **in between** ``before_ptr`` 
and ``after_ptr``.

Pensé que escribir reglas para el texto de marcado en línea sería fácil. Entonces escribí una gramática simple:

grammar Rst;

options {
    output=AST;
    language=Java;
    backtrack=true;
    //memoize=true;
}

@members {
boolean inInlineMarkup = false;
}

// PARSER

text
    : inline_markup (WS? inline_markup)* WS? EOF
    ;


inline_markup
@after {
inInlineMarkup = false;
}
    : {!inInlineMarkup}? (emphasis|strong|litteral|link)
    ;

emphasis
@init {
inInlineMarkup = true;
}
    : '*' (~'*')+ '*' {System.out.println("emphasis: " + $text);}
    ;

strong
@init {
inInlineMarkup = true;
}
    : '**' (~'*')+ '**' {System.out.println("bold: " + $text);}
    ;

litteral
@init {
inInlineMarkup = true;
}
    : '``' (~'`')+ '``' {System.out.println("litteral: " + $text);}
    ;

link
@init {
inInlineMarkup = true;
}
    : inline_internal_target
    | footnote_reference
    | hyperlink_reference
    ;

inline_internal_target
    : '_`' (~'`')+ '`' {System.out.println("inline_internal_target: " + $text);}
    ;

footnote_reference
    : '[' (~']')+ ']_' {System.out.println("footnote_reference: " + $text);}
    ;


hyperlink_reference
    : ~(' '|'\t'|'\u000C'|'_')+ '_' {System.out.println("hyperlink_reference: " + $text);}
    |   '`' (~'`')+ '`_' {System.out.println("hyperlink_reference (long): " + $text);}
    ;

// LEXER

WS  
  : (' '|'\t'|'\u000C')+
  ; 

NEWLINE
  : '\r'? '\n'
  ;

Esta gramática simple no funciona. Y ni siquiera intenté igualar aregula texto ...

Mis preguntas

¿Podría alguien señalar mis errores y tal vez darme una pista sobre cómo hacer coincidirregula texto? ¿Hay alguna forma de establecer prioridad en las reglas gramaticales? Tal vez esto podría ser una ventaja.

Gracias de antemano por tu ayuda :-

Robi

Segunda secuencia de preguntas

¡Muchas gracias por su ayuda! Hubiera tenido dificultades para calcular mis errores ... No estoy escribiendo esa gramática (solo) para aprender ANTLR, estoy tratando de codificar un complemento IDE para eclipse. Y para eso, necesito una gramática;)

Logré ir más allá en la gramática y escribí untext regla:

grammar Rst;

options {
    output=AST;
    language=Java;
}



@members {
boolean inInlineMarkup = false;
}

//////////////////
// PARSER RULES //
//////////////////

file
  : line* EOF
  ;


line
  : text* NEWLINE
  ;

text
    : inline_markup
    | normal_text
    ;

inline_markup
@after {
inInlineMarkup = false;
}
    : {!inInlineMarkup}? {inInlineMarkup = true;} 
  (
  | STRONG
  | EMPHASIS
  | LITTERAL
  | INTERPRETED_TEXT
  | SUBSTITUTION_REFERENCE
  | link
  )
    ;


link
    : INLINE_INTERNAL_TARGET
    | FOOTNOTE_REFERENCE
    | HYPERLINK_REFERENCE
    ;

normal_text
  : {!inInlineMarkup}? 
   ~(EMPHASIS
      |SUBSTITUTION_REFERENCE
      |STRONG
      |LITTERAL
      |INTERPRETED_TEXT
      |INLINE_INTERNAL_TARGET
      |FOOTNOTE_REFERENCE
      |HYPERLINK_REFERENCE
      |NEWLINE
      )
  ;
//////////////////
// LEXER TOKENS //
//////////////////

EMPHASIS
    : STAR ANY_BUT_STAR+ STAR {System.out.println("EMPHASIS: " + $text);}
    ;

SUBSTITUTION_REFERENCE
  : PIPE ANY_BUT_PIPE+ PIPE  {System.out.println("SUBST_REF: " + $text);}
  ;

STRONG
    : STAR STAR ANY_BUT_STAR+ STAR STAR {System.out.println("STRONG: " + $text);}
    ;

LITTERAL
    : BACKTICK BACKTICK ANY_BUT_BACKTICK+ BACKTICK BACKTICK {System.out.println("LITTERAL: " + $text);}
    ;
INTERPRETED_TEXT
  : BACKTICK ANY_BUT_BACKTICK+ BACKTICK {System.out.println("LITTERAL: " + $text);}
  ;

INLINE_INTERNAL_TARGET
    : UNDERSCORE BACKTICK ANY_BUT_BACKTICK+ BACKTICK {System.out.println("INLINE_INTERNAL_TARGET: " + $text);}
    ;

FOOTNOTE_REFERENCE
    : L_BRACKET ANY_BUT_BRACKET+ R_BRACKET UNDERSCORE {System.out.println("FOOTNOTE_REFERENCE: " + $text);}
    ;


HYPERLINK_REFERENCE
  : BACKTICK ANY_BUT_BACKTICK+ BACKTICK UNDERSCORE {System.out.println("HYPERLINK_REFERENCE (long): " + $text);}
  | ANY_BUT_ENDLINK+ UNDERSCORE {System.out.println("HYPERLINK_REFERENCE (short): " + $text);}
  ;

WS  
  : (' '|'\t')+ {$channel=HIDDEN;}
  ; 

NEWLINE
  : '\r'? '\n' {$channel=HIDDEN;}
  ;


///////////////
// FRAGMENTS //
///////////////

fragment ANY_BUT_PIPE
  : ESC PIPE
  | ~(PIPE|'\n'|'\r')
  ;
fragment ANY_BUT_BRACKET
  : ESC R_BRACKET
  | ~(R_BRACKET|'\n'|'\r')
  ;
fragment ANY_BUT_STAR
  : ESC STAR
  | ~(STAR|'\n'|'\r')
  ;
fragment ANY_BUT_BACKTICK
  : ESC BACKTICK
  | ~(BACKTICK|'\n'|'\r')
  ;
fragment ANY_BUT_ENDLINK
  : ~(UNDERSCORE|' '|'\t'|'\n'|'\r')
  ;



fragment ESC
  : '\\'
  ;
fragment STAR
  : '*'
  ;
fragment BACKTICK
  : '`'
  ;
fragment PIPE
  : '|'
  ;
fragment L_BRACKET
  : '['
  ;
fragment R_BRACKET
  : ']'
  ;
fragment UNDERSCORE
  : '_'
  ;

La gramática funciona bien para inline_markup pero normal_text no coincide.

Aquí está mi clase de prueba:

import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;

import org.antlr.runtime.ANTLRStringStream;
import org.antlr.runtime.CommonTokenStream;
import org.antlr.runtime.RecognitionException;
import org.antlr.runtime.tree.Tree;

public class Test {

    public static void main(String[] args) throws RecognitionException, IOException {

        InputStream is = Test.class.getResourceAsStream("test.rst");
        Reader r = new InputStreamReader(is);
        StringBuilder source = new StringBuilder();
        char[] buffer = new char[1024];
        int readLenght = 0;
        while ((readLenght = r.read(buffer)) > 0) {
            if (readLenght < buffer.length) {
                source.append(buffer, 0, readLenght);
            } else {
                source.append(buffer);
            }
        }
        r.close();
        System.out.println(source.toString());

        ANTLRStringStream in = new ANTLRStringStream(source.toString());
        RstLexer lexer = new RstLexer(in);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        RstParser parser = new RstParser(tokens);
        RstParser.file_return out = parser.file();
        System.out.println(((Tree)out.getTree()).toStringTree());
    }
}

Y el archivo de entrada que uso:

In `Figure 17-6`_, we have positioned ``before_ptr`` so that it points to the element 
*before* the insert point. The variable ``after_ptr`` points to the |element| *after* the 
insert. In other words, `we are going`_ to put_ our new element **in between** ``before_ptr`` 
and ``after_ptr``.

Y obtengo esta salida:

HYPERLINK_REFERENCE (short): 7-6`_
line 1:2 mismatched character ' ' expecting '_'
line 1:10 mismatched character ' ' expecting '_'
line 1:18 mismatched character ' ' expecting '_'
line 1:21 mismatched character ' ' expecting '_'
line 1:26 mismatched character ' ' expecting '_'
line 1:37 mismatched character ' ' expecting '_'
LITTERAL: `before_ptr`
line 1:86 no viable alternative at character '\r'
line 1:55 mismatched character ' ' expecting '_'
line 1:60 mismatched character ' ' expecting '_'
line 1:63 mismatched character ' ' expecting '_'
line 1:70 mismatched character ' ' expecting '_'
line 1:73 mismatched character ' ' expecting '_'
line 1:77 mismatched character ' ' expecting '_'
line 1:85 mismatched character ' ' expecting '_'
EMPHASIS: *before*
line 2:12 mismatched character ' ' expecting '_'
line 2:19 mismatched character ' ' expecting '_'
line 2:26 mismatched character ' ' expecting '_'
LITTERAL: `after_ptr`
line 2:30 mismatched character ' ' expecting '_'
line 2:39 mismatched character ' ' expecting '_'
line 2:90 no viable alternative at character '\r'
line 2:60 mismatched character ' ' expecting '_'
line 2:63 mismatched character ' ' expecting '_'
line 2:67 mismatched character ' ' expecting '_'
line 2:77 mismatched character ' ' expecting '_'
line 2:85 mismatched character ' ' expecting '_'
line 2:89 mismatched character ' ' expecting '_'
line 3:7 mismatched character ' ' expecting '_'
line 3:10 mismatched character ' ' expecting '_'
line 3:16 mismatched character ' ' expecting '_'
line 3:23 mismatched character ' ' expecting '_'
line 3:27 mismatched character ' ' expecting '_'
line 3:31 mismatched character ' ' expecting '_'
line 3:42 mismatched character ' ' expecting '_'
line 3:51 mismatched character ' ' expecting '_'
line 3:55 mismatched character ' ' expecting '_'
line 3:63 mismatched character ' ' expecting '_'
line 3:94 mismatched character '\r' expecting '*'
line 4:3 mismatched character ' ' expecting '_'
line 4:18 no viable alternative at character '\r'
line 4:18 mismatched character '\r' expecting '_'
HYPERLINK_REFERENCE (short): oing`_
HYPERLINK_REFERENCE (short): ut_
EMPHASIS: *in between*
LITTERAL: `after_ptr`
BR.recoverFromMismatchedToken
line 0:-1 mismatched input '<EOF>' expecting NEWLINE
null

¿Puede señalar mis errores? (el analizador funciona para el marcado en línea sin errores cuando agrego la opción filter = true; a la gramática)

Robi