[antlr-interest] [C] code to change Token type, use char* and loose data when buffer destroyed

Ruslan Zasukhin ruslan_zasukhin at valentina-db.com
Tue Sep 27 23:40:47 PDT 2011


Hi Jim,

What do you think about the idea of resolving everything at the LEXER level?

So the lexer must resolve tokens as follows:
    
* STRING_LITERAL          'aa'
* STRING_LITERAL          'aa' ws* 'bb'     => Token( "aabb" )

* STRING_LITERAL          'aa\'bb'          => Token( "aa'bb" )
* STRING_LITERAL          'aa''bb'           => Token( "aa'bb" )
* STRING_LITERAL          'aa''bb''cc'      => Token( "aa'bb'cc" )

* HEX_LITERAL              x'aa'                  => Token( "aa" )
* HEX_LITERAL              x'aa' ws* 'bb'     => Token( "aabb" )


Do you think we can do this in [C] without copying buffers?
I think not.

Then the question is:
    how can this be solved with a minimal number of copies?

Or do you think it is really better to use a
    Lexer -> Parser -> TreeParser combination?
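
To make the "minimal copies" question concrete, here is a rough sketch in plain C of decoding the raw text of one STRING_LITERAL token in a single pass. The names decode_string_literal() and self_check() are hypothetical and are not part of the ANTLR3 C runtime; the sketch assumes the lexer hands over a pointer and a length for the whole raw lexeme, including quotes and any whitespace between adjacent segments. It also assumes that a backslash simply makes the next character literal, and that two quotes with nothing between them always mean an escaped quote rather than two adjacent segments.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, NOT an ANTLR3 runtime function.
 * Decodes raw lexeme text such as  'aa''bb'  'cc'  into dst and returns
 * the decoded length. dst must have room for len bytes; the decoded text
 * is never longer than the raw text. */
static size_t decode_string_literal(const char *src, size_t len, char *dst)
{
    size_t in = 0, out = 0;
    int inside = 0;                 /* between an opening and closing quote? */

    while (in < len) {
        char c = src[in];
        if (!inside) {
            /* Between segments: skip whitespace, enter on an opening quote. */
            if (c == '\'')
                inside = 1;
            in++;
        } else if (c == '\\' && in + 1 < len) {
            dst[out++] = src[in + 1];   /* \' becomes ' (assumed escape rule) */
            in += 2;
        } else if (c == '\'') {
            if (in + 1 < len && src[in + 1] == '\'') {
                dst[out++] = '\'';      /* '' becomes ' */
                in += 2;
            } else {
                inside = 0;             /* closing quote of this segment */
                in++;
            }
        } else {
            dst[out++] = c;             /* ordinary character */
            in++;
        }
    }
    return out;
}

/* Checks the STRING_LITERAL cases listed above. */
static void self_check(void)
{
    char buf[32];
    size_t n;

    n = decode_string_literal("'aa'", 4, buf);
    assert(n == 2 && memcmp(buf, "aa", 2) == 0);

    n = decode_string_literal("'aa'  'bb'", 10, buf);
    assert(n == 4 && memcmp(buf, "aabb", 4) == 0);

    n = decode_string_literal("'aa\\'bb'", 8, buf);
    assert(n == 5 && memcmp(buf, "aa'bb", 5) == 0);

    n = decode_string_literal("'aa''bb''cc'", 12, buf);
    assert(n == 8 && memcmp(buf, "aa'bb'cc", 8) == 0);
}
```

So this is one copy per token at most, and only for tokens that really need decoding; a plain 'aa' could still be kept as a pointer into the input buffer. Since the write position never gets ahead of the read position, the same loop could probably decode in place if the input buffer were writable, but I am not sure that is safe with the ANTLR input stream.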


On 9/28/11 1:34 AM, "Ruslan Zasukhin" <ruslan_zasukhin at valentina-db.com>
wrote:

> On 9/28/11 12:46 AM, "Douglas Godfrey" <douglasgodfrey at gmail.com> wrote:
> 
> Hi Douglas,
> 
> Yes, I have thought about this approach too.
> 
> But your solution uses helper functions such as
>     RemoveQuotePairs()
> 
> which, I guess, do some copying into additional RAM buffers.
> This is fine for Java guys, but in C code, as Jim likes to underline each time,
> we tend to use only pointers into the input buffer for as long as possible.
>  
> 
>> You need to modify your string lexing rules to use sub-rules for the
>> elementary strings and return the concatenated string as the lexer
>> token value.
>> 
>> The value of 
>> 
>> StringConstant: QuotedString
>> {RemoveQuotePairs($QuotedString);};
>> 
>> fragment
>> QuotedString:  ( StringTerm )+;
>> 
>> fragment
>> StringTerm:  Dquote ( Character )* Dquote;
>> 
>> fragment
>> Character: ( ' ' | AlphaChar | Punctuation | Digit );

-- 
Best regards,

Ruslan Zasukhin
VP Engineering and New Technology
Paradigma Software, Inc

Valentina - Joining Worlds of Information
http://www.paradigmasoft.com

[I feel the need: the need for speed]
