[antlr-interest] [C] code to change Token type, use char* and lose data when buffer destroyed
Ruslan Zasukhin
ruslan_zasukhin at valentina-db.com
Tue Sep 27 23:40:47 PDT 2011
Hi Jim,
What do you think about the idea of resolving everything at the LEXER level?
That is, the lexer would have to resolve tokens as follows:
* STRING_LITERAL 'aa'
* STRING_LITERAL 'aa' ws* 'bb' => Token( "aabb" )
* STRING_LITERAL 'aa\'bb' => Token( "aa'bb" )
* STRING_LITERAL 'aa''bb' => Token( "aa'bb" )
* STRING_LITERAL 'aa''bb''cc' => Token( "aa'bb'cc" )
* HEX_LITERAL x'aa' => Token( "aa" )
* HEX_LITERAL x'aa' ws* 'bb' => Token( "aabb" )
Do you think we can do this in [C] without copying buffers?
I think not.
Then the question is:
how can this be solved with a minimal number of copies?
Or do you think it is really better to use a
Lexer -> Parser -> TreeParser combination?
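
To make the "minimal copies" question concrete, here is a rough sketch in plain C. It is not ANTLR3 C runtime code: decode_string_literal() is a hypothetical helper, and it assumes the lexer hands over the raw matched text of one STRING_LITERAL (quotes included) as a pointer plus length. A single-segment literal without escapes is returned as a pointer into the input buffer with no copy; anything else is decoded in one pass into a caller-supplied buffer. A backslash simply keeps the next character, so sequences such as \n are not translated.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of the ANTLR3 C runtime.
 * src/len: raw text of one STRING_LITERAL match, quotes included,
 *          e.g.  'aa'   'aa' 'bb'   'aa''bb'   'aa\'bb'
 * dst:     scratch buffer of at least len bytes (used only if needed)
 * out:     receives a pointer to the decoded bytes (not NUL-terminated)
 * Returns the decoded length. */
size_t decode_string_literal(const char *src, size_t len,
                             char *dst, const char **out)
{
    /* Fast path: one segment, no quote or backslash inside.
     * The result is just a span of the input buffer: zero copies. */
    if (len >= 2 && src[0] == '\'' && src[len - 1] == '\'') {
        const char *body = src + 1;
        size_t n = len - 2;
        if (!memchr(body, '\'', n) && !memchr(body, '\\', n)) {
            *out = body;
            return n;
        }
    }

    /* Slow path: one pass over the input, one copy into dst. */
    size_t i = 0, w = 0;
    int in_str = 0;
    while (i < len) {
        char c = src[i];
        if (!in_str) {
            /* Between segments: skip whitespace until an opening quote. */
            if (c == '\'')
                in_str = 1;
            i++;
        } else if (c == '\\' && i + 1 < len) {
            dst[w++] = src[i + 1];      /* \' and friends: keep next char */
            i += 2;
        } else if (c == '\'') {
            if (i + 1 < len && src[i + 1] == '\'') {
                dst[w++] = '\'';        /* doubled quote '' -> ' */
                i += 2;
            } else {
                in_str = 0;             /* closing quote of this segment */
                i++;
            }
        } else {
            dst[w++] = c;
            i++;
        }
    }
    *out = dst;
    return w;
}
```

With this shape, the common case 'aa' costs nothing beyond two memchr scans, and the cases with escapes or concatenation cost exactly one copy. The open point is who owns dst and how long it lives, since it must survive as long as the token does.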
On 9/28/11 1:34 AM, "Ruslan Zasukhin" <ruslan_zasukhin at valentina-db.com>
wrote:
> On 9/28/11 12:46 AM, "Douglas Godfrey" <douglasgodfrey at gmail.com> wrote:
>
> Hi Douglas,
>
> Yes, I have thought about this approach as well.
>
> But your solution uses helper functions such as
> RemoveQuotePairs()
>
> which, I guess, do some copying into additional RAM buffers.
> This is fine for the Java guys, but in C code, as Jim likes to underline every time,
> we tend to use only pointers into the input buffer for as long as possible.
>
>
>> You need to modify your string lexing rules to use sub-rules for the
>> elementary strings and return the concatenated string as the lexer
>> token value.
>>
>> The value of
>>
>> StringConstant: QuotedString
>> {RemoveQuotePairs($QuotedString);};
>>
>> fragment
>> QuotedString: ( StringTerm )+;
>>
>> fragment
>> StringTerm: Dquote ( Character )* Dquote;
>>
>> fragment
>> Character: ( ' ' | AlphaChar | Punctuation | Digit );
--
Best regards,
Ruslan Zasukhin
VP Engineering and New Technology
Paradigma Software, Inc
Valentina - Joining Worlds of Information
http://www.paradigmasoft.com
[I feel the need: the need for speed]