[antlr-interest] ANTLR PHP target / runtime status
Kenneth Domino
kenneth.domino at domemtech.com
Thu Sep 8 14:16:19 PDT 2011
Hi All,
I updated the PHP target to work with Antlr 3.4/PHP 5.3. This code is
available at http://domemtech.com/code/antlrphpruntime.zip for the next
month or so, until it—hopefully—finds a permanent location. I plan on making
more changes when I start rewriting the runtime tests for the target and
figuring out what in the world is going on with this target.
NOTE: Someone needs to take control of this mess, delete the many forked
copies of this target, and put this in one official location. The
development of this target is absolutely atrocious. This code is not in one
repository, but at least four. I really do not understand why people cannot
make private repositories on their machines instead of proliferating
multiple public repositories. It is not easy to figure out who made which
change, when, and why, or whether those changes are useful. There may be more
forked copies of the PHP runtime out in the wild, but who knows.
For better or worse, I chose code base #3 listed below for development, and
made a copy of that onto my machine. The reason I chose that code base was
because the author sent a cogent email explaining his changes, and because
it was changed more recently than any of the other code bases.
WHERE IS ANTLR PHP LOCATED?
Here are the four different repositories:
(1) http://antlrphpruntime.googlecode.com
(http://code.google.com/p/antlrphpruntime/ ) – SVN.
This code is officially anointed on the Antlr targets web page
http://www.antlr.org/wiki/display/ANTLR3/Code+Generation+Targets as “the one
and only PHP target”. It isn’t clear what Antlr version or PHP version this
code targets.
The code itself was last changed on June 19, 2010 by Eugeny Yakimovitch.
Several other, unimportant changes were made more recently (e.g., June 26,
2011).
(2) https://github.com/rollxx/antlr-php-runtime – GIT
This code was probably forked from (1), but since there are no embedded
version ids in the source code, I can’t tell you what was done.
The code was last changed March 21, 2010 by rollex. At the top of the page,
the author says:
“This version in not maintained. Please visit the main project page listed
below for the current version”, and gives a link to (1).
Unfortunately, it’s hard to say whether the changes were successfully merged
back into (1), but there are check-ins by rollex to (1) in late March.
(3) https://github.com/beberlei/antlr-php-runtime – GIT.
Benjamin Eberlei noted in an email to the “Antlr Interest” and “Antlr dev”
lists
(http://markmail.org/message/zbdc2ni3mfjioens#query:+page:1+mid:zbdc2ni3mfjioens+state:results
http://www.antlr.org/pipermail/antlr-interest/2010-September/039653.html
http://markmail.org/message/v7wq2a6wvsjlwl4n ) that development of the
source code in (1) has been halted since Feb 2010. Eberlei modified this code
to fix several bugs and improve the quality of the code, and checked it in.
This repository was forked from (2) (unclear when), and was last modified in
September 2010 by beberlei.
(4) http://code.google.com/p/phpandallthat/ – SVN.
Eugeny Yakimovitch, who is on the list of developers for (1), has an unknown
fork of (1) that is yet another implementation of the PHP runtime.
The latest change to that source code was on September 10, 2010. Great!
NOTE: As far as I know, there is no PHP target listed in the Fisheye view of
the Antlr repository (linked via http://antlr.org).
DISCUSSIONS ON THE PHP TARGET:
* Aug 30, 2011 http://markmail.org/message/73fo5jg5a36qhv5p
* May 30, 2011
http://www.antlr.org/pipermail/antlr-interest/2011-May/041725.html
* Sep 6/8, 2010 http://markmail.org/message/zbdc2ni3mfjioens
http://markmail.org/message/v7wq2a6wvsjlwl4n
* May 6, 2010 http://www.antlr.org/pipermail/antlr-dev/2009-May/002292.html
* Oct 9, 2009 http://markmail.org/message/ewmppl7u4b3jnwgh
WHAT CHANGES DID I MAKE TO ANTLR PHP?
Most of my changes are in Php.stg, to move it forward to Antlr 3.4, and to
handle lexers with semantic rules, like this grammar:
    lexer grammar BigParLexer;
    options {
        backtrack = true;
        filter = true;
    }
    @members {
        int open = 0;
    }
    P
    @init { open = 1; }
        : '/*'
          (
            {open > 0}?=> // keep repeating `( ... )*` as long as open > 0
            (
                // match anything other than delimiters
                ( { !((input.LA(1) == '/' && input.LA(2) == '*')
                   || (input.LA(1) == '*' && input.LA(2) == '/')) }?=> . )
              | '/*' {open++;}
              | '*/' {open--;}
            )
          )*
        ;
The lexer for this grammar accepts input like ‘/* hi /* there */ */’ as one
token. NB: this grammar doesn’t work exactly as written for the PHP target,
as I explain below.
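To make the intent of rule P concrete, here is a minimal sketch in Python (not PHP, and not the code Antlr generates) of the same depth-counting scan. The function name match_nested_comment is made up for illustration; the variable depth plays the role of `open` in the grammar.

```python
def match_nested_comment(text, start=0):
    """Return the end index (exclusive) of the nested /* ... */ comment
    beginning at text[start], or None if no complete comment starts there."""
    if not text.startswith("/*", start):
        return None
    depth = 1                      # same role as `open` in rule P
    i = start + 2
    while depth > 0:
        if i >= len(text):
            return None            # input ended before the comment closed
        if text.startswith("/*", i):
            depth += 1             # nested opener
            i += 2
        elif text.startswith("*/", i):
            depth -= 1             # closer
            i += 2
        else:
            i += 1                 # anything other than a delimiter
    return i


sample = "/* hi /* there */ */ tail"
end = match_nested_comment(sample)
print(sample[:end])                # prints: /* hi /* there */ */
```

The whole nested comment comes back as one span, which is what the lexer rule is meant to deliver as a single token.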
* Rolled changes from Java.stg, Revision ID: 8204, into Php.stg. The link to
the code for Java.stg used in the modification of Php.stg is:
https://fisheye2.atlassian.com/browse/antlr/tool/src/main/resources/org/antlr/codegen/templates/Java/Java.stg
* Fixed problems with backtracking.
* Fixed missing $input declaration for semantic predicates.
* Fixed missing ‘$’ for ‘alt...’ state variables in DFA generated code.
* Added a makefile to construct antlr.jar. I could not find any “build.xml”
file anywhere. And, I cannot stand Ant.
WHAT DOES NOT WORK?
Not all the tests in .../runtime/Php/test/Antlr/Tests work. Many of these
are terrible test cases, some of which cause the Antlr tool to output
warnings, and others that crash the tool altogether.
I don't know the status of AST construction, tree parsing, etc. There is
code for tree construction, but I haven't tested it.
WHAT DON'T I LIKE ABOUT THE PHP TARGET?
* PHP does not convert between a character and its integer code in
comparisons; variables must be preceded with “$”; and “?>” ends PHP code even
in a comment.
Input streams in Antlr are composed of integers, not characters.
“input->LA()” returns a number. When you want to test the lookahead in a
semantic predicate, you must convert the character you are testing into a
number, or convert LA() into a string. So, in the above grammar
BigParLexer, “input.LA(1) == ‘/’” won’t work, and PHP won’t complain: it
quietly compares the number with the string and the test fails to match. It
must be converted to a target-specific syntax, e.g., “\$input->LA(1) == 47”.
* In the wisdom of the developers of PHP, “?>” ends the PHP code section
even if it is on a comment line.
e.g., “// you are screwed ?> boo hoo.”
Consequently, some of the templates in Php.stg are missing code to generate
descriptions in comments. If the grammar contains “?>”, as in some of the
test cases, PHP will barf on the generated code. There must be a way to
convert the description into a PHP safe format, but I don’t know what that
would be.
* THERE IS NO DOCUMENTATION!
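The first two complaints above can be sketched in a few lines. This is Python rather than PHP, purely to show the logic, and both helper names (lookahead_is, comment_safe) are hypothetical: they are not part of the runtime or of Php.stg. The first compares an integer lookahead value with a character through its code point, which is where the 47 for '/' comes from. The second shows one possible way to make a description safe for a single-line comment, by breaking up every “?>”; whether that spelling is acceptable in the generated comments is an assumption.

```python
def lookahead_is(la_value, ch):
    """Compare an integer lookahead value with a one-character string."""
    return la_value == ord(ch)


def comment_safe(description):
    """Break up "?>" so the text no longer contains the PHP closing tag
    (an assumed fix, not something the templates do today)."""
    return description.replace("?>", "? >")


print(ord("/"), ord("*"))          # prints: 47 42
print(lookahead_is(47, "/"))       # prints: True
print(comment_safe("// you are screwed ?> boo hoo."))
```

Another option that might be worth trying is to emit descriptions in `/* ... */` block comments, since the PHP manual describes the “?>” behaviour only for one-line comments; the description would then need any “*/” broken up instead.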
WHAT DO I LIKE ABOUT THE PHP TARGET?
PHP does not have the “64K bytecode per method” limit that Java has. When
writing a lexer grammar with semantic predicates, it seems extremely easy to
generate Java code that will not compile (e.g., BigParLexer.g, but with
longer delimiters such as “<script> .... </script>”). But PHP works!
Ken Domino