The following changes (change numbers refer to perforce) were
made from version 3.1.1 to 3.1.2
Runtime
-------
Change 5641 on 2009/02/20 by jimi@jimi.jimi.antlr3
Release version 3.1.2 of the ANTLR C runtime.
Updated documents and release notes will have to follow later.
Change 5639 on 2009/02/20 by jimi@jimi.jimi.antlr3
Fixed: ANTLR-356
Ensure that code generation for C++ does not require casts
Change 5577 on 2009/02/12 by jimi@jimi.jimi.antlr3
C Runtime - Bug fixes.
o Moving to extract tokens directly from a vector when returning them
exposed a bug whereby the EOF boundary calculation in tokLT was
incorrectly checking > rather than >=.
o The change to API initialization of tokens rather than memcmp()
forgot to set the input stream pointer for the manufactured tokens
in the token factory.
o Rewrite streams for rewriting tree parsers did not check whether the
rewrite stream was ever assigned before trying to free it; it is now
in line with the ordinary parser code.
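The boundary fix in the first item can be illustrated with a minimal sketch. The names TokenBuf, tok_lt and TOK_EOF are hypothetical and are not the runtime's actual structures; the only point is that an index equal to the token count is already past the end, so the test has to be >= and not >.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token buffer; not the runtime's real types. */
typedef struct {
    const int *types;   /* token types, in stream order        */
    size_t     count;   /* number of tokens held in the buffer */
    size_t     pos;     /* index of the current (LT(1)) token  */
} TokenBuf;

enum { TOK_EOF = -1 };

/* Return the type of the k-th lookahead token (k >= 1), or TOK_EOF. */
int tok_lt(const TokenBuf *b, size_t k)
{
    size_t i;

    assert(b != NULL && k >= 1);
    i = b->pos + k - 1;

    /* Valid indices are 0 .. count-1, so i == count is already past the
     * end. Testing `i > b->count` here would read one slot too far. */
    if (i >= b->count) {
        return TOK_EOF;
    }
    return b->types[i];
}
```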
Change 5576 on 2009/02/11 by jimi@jimi.jimi.antlr3
C Runtime: Ensure that when we manufacture a new token for a missing
token, the user-supplied custom information (if any) is copied
from the current token.
Change 5575 on 2009/02/08 by jimi@jimi.jimi.antlr3
C Runtime - Vastly improve the reuse of allocated memory for nodes in
tree rewriting.
A problem for all targets at the moment is that the rewrite logic
generated by ANTLR makes no attempt to reuse any resources; it merely
guarantees that the tree shape at the end is correct. To some extent
this is mitigated by the garbage collection systems of Java and .Net,
even though it is still an overhead to keep creating so many nodes.
This change implements the first of two C runtime changes that make
best efforts to track when a node has become orphaned and will never
be reused, based on inherent knowledge of the rewrite logic (which in
the long term is not a great solution).
Much of the rewrite logic consists of creating a nilNode into which
child nodes are appended. At rulePost processing time, when a rewrite
stream is closed, and when becomeRoot is called, there are many situations
where the root of the tree that will be manipulated, or is finished with
(in the case of rewrite streams), is a nilNode that was just a temporary
creation for the sake of the rewrite itself.
In these cases we can see that the nilNode would just be left to rot in
the node factory that tracks all the tree nodes.
Rather than leave these in the factory to rot, we now keep a reuse
stack and always reuse any node on this
stack before claiming a new node from the factory pool.
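The idea of a reuse stack sitting in front of the factory pool might look roughly like the sketch below. All names here (Node, NodeFactory, factory_new_node, factory_recycle, POOL_SIZE) are made up for illustration and the fixed-size pool is a simplification; the runtime's own node factory is organized differently.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SIZE 64

typedef struct Node {
    struct Node *next_free;  /* link used only while on the reuse stack */
    int          type;
} Node;

typedef struct {
    Node   pool[POOL_SIZE];  /* stands in for the factory's node pool    */
    size_t claimed;          /* how many pool slots have been handed out */
    Node  *reuse;            /* top of the reuse stack, or NULL          */
} NodeFactory;

void factory_init(NodeFactory *f)
{
    f->claimed = 0;
    f->reuse   = NULL;
}

/* Hand out a node: reuse stack first, then the pool. NULL when exhausted. */
Node *factory_new_node(NodeFactory *f)
{
    Node *n;

    if (f->reuse != NULL) {
        n = f->reuse;                 /* pop an orphaned node */
        f->reuse = n->next_free;
    } else if (f->claimed < POOL_SIZE) {
        n = &f->pool[f->claimed++];   /* claim a fresh pool slot */
    } else {
        return NULL;
    }
    n->next_free = NULL;
    n->type = 0;
    return n;
}

/* Called when a temporary node (such as a nil root) is known to be orphaned. */
void factory_recycle(NodeFactory *f, Node *n)
{
    assert(n != NULL);
    n->next_free = f->reuse;
    f->reuse = n;
}
```

With this arrangement a temporary nil node that is recycled straight after a rewrite is handed out again on the next request, so the pool only grows when the reuse stack is empty.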
This single change alone reduces memory usage in the test case (20,604
line C program and a GNU C parser)
from nearly a GB to 276MB. This is still way more memory than we
should need to do this operation, even on such a large input file,
but the reduction results in a huge performance increase and greatly
reduced system time spent on allocations.
After this optimization, comparison with gcc yields:
time gcc -S a.c
a.c:1026: warning: conflicting types for built-in function ‘vsprintf’
a.c:1030: warning: conflicting types for built-in function ‘vsnprintf’
a.c:1041: warning: conflicting types for built-in function ‘vsscanf’
0.21user 0.01system 0:00.22elapsed 97%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+240outputs (0major+8345minor)pagefaults 0swaps
and
time ./jimi
Reading a.c
0.28user 0.11system 0:00.39elapsed 98%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+66609minor)pagefaults 0swaps
And we can now infer that the only major difference is
now the huge disparity in memory allocations. A
future optimization of vector pooling, to separate node reuse from vector
reuse, currently looks promising for further reuse of memory.
Finally, a static analysis of the rewrite code, plus a realtime analysis
of the heap at runtime, may well give us a reasonable memory usage
pattern. In reality though, it is the generated rewrite logic
that must become optimal at not continuously rewriting things that it
need not, as it ascends the rule chain.
Change 5563 on 2009/01/28 by jimi@jimi.jimi.antlr3
Allow rewrite streams to use the base adaptor's vector factory and not
try to malloc new vectors themselves.
Change 5562 on 2009/01/28 by jimi@jimi.jimi.antlr3
Don't use CALLOC to allocate tree pools, use malloc as there is no need
for calloc.
Change 5561 on 2009/01/28 by jimi@jimi.jimi.antlr3
Prevent warnings about retval.stop not being initialized when a rule
returns early because it is in backtracking mode
Change 5558 on 2009/01/28 by jimi@jimi.jimi.antlr3
Lots of optimizations (though the next one to be checked in is the huge
win) for AST building and vector factories.
A large part of tree rewriting was the creation of vectors to hold AST
nodes. Although I had created a vector factory, for some reason I never got
around to creating a proper one, that pre-allocated the vectors in chunks and
so on. I guess I just forgot to. Hence a big win here is prevention of calling
malloc lots and lots of times to create vectors.
A second improvement was to change the vector definition such that it
holds a certain number of elements within the vector structure itself, rather
than malloc and freeing these. Currently this is set to 8, but may increase.
For AST construction, this is generally a big win because AST nodes don't often
have many individual children unless there has not been any shaping going on in
the parser. But if you are not shaping, then you don't really need a tree.
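A vector that keeps its first few elements inside the structure itself could be sketched as follows. The type and function names (SmallVec, vec_init, vec_add, vec_free) are invented for this illustration; only the inline size of 8 comes from the text above, and the growth policy (doubling) is an assumption.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define INLINE_SLOTS 8   /* mirrors the "8" mentioned above */

typedef struct {
    void  **items;                     /* points at inline_buf or a heap block */
    size_t  count;
    size_t  capacity;
    void   *inline_buf[INLINE_SLOTS];  /* storage used until the vector grows  */
} SmallVec;

/* Note: the struct points into itself, so it must not be copied by value. */
void vec_init(SmallVec *v)
{
    v->items    = v->inline_buf;
    v->count    = 0;
    v->capacity = INLINE_SLOTS;
}

/* Append an element; returns 0 on success, -1 if allocation fails. */
int vec_add(SmallVec *v, void *item)
{
    assert(v != NULL);
    if (v->count == v->capacity) {
        size_t  new_cap = v->capacity * 2;
        void  **p;

        if (v->items == v->inline_buf) {
            /* First spill: move the inline elements to the heap. */
            p = malloc(new_cap * sizeof *p);
            if (p == NULL) return -1;
            memcpy(p, v->inline_buf, v->count * sizeof *p);
        } else {
            p = realloc(v->items, new_cap * sizeof *p);
            if (p == NULL) return -1;
        }
        v->items    = p;
        v->capacity = new_cap;
    }
    v->items[v->count++] = item;
    return 0;
}

void vec_free(SmallVec *v)
{
    if (v->items != v->inline_buf) free(v->items);
    vec_init(v);
}
```

For a node with eight or fewer children, this never touches the allocator at all, which is the case the text says dominates in shaped trees.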
Other performance improvements here include not calling functions
indirectly within token stream and common token stream. Hence tokens are
claimed directly from the vectors. Users can override these functions of course
and all this means is that if you override tokenstreams then you pretty much
have to provide all the methods, but then I think you would have to anyway (and
I don't know of anyone that has wanted to do this as you can carry your own
structure around with the tokens anyway and that is much easier).
Change 5555 on 2009/01/26 by jimi@jimi.jimi.antlr3
Fixed: ANTLR-288
Correct the interpretation of the skip token such that channel, start
index, char pos in line, start line and text are correctly reset to the start of
the new token when the one that we just traversed was marked as being skipped.
This correctly excludes the text that was matched as part of the
SKIP()ed token from the next token in the token stream and so has the side
effect that asking for $text of a rule no longer includes the text that should
be skipped, but DOES include the text of tokens that were merely placed off the
default channel.
Change 5551 on 2009/01/25 by jimi@jimi.jimi.antlr3
Fixed: ANTLR-287
Most of the source files did not include the BSD license. This might
not be that big a deal given that I don't care what people do with it
other than take my name off it, but having the license reproduced
everywhere at least makes things perfectly clear. Hence this mass change of
sources and templates to include the license.
Change 5550 on 2009/01/25 by jimi@jimi.jimi.antlr3
Fixed: ANTLR-365
Ensure that as soon as we know about an input stream on the lexer that
we borrow its string factory and use it in our EOF token in case
anyone tries to make it a string, such as in error messages for
instance.
Change 5548 on 2009/01/25 by jimi@jimi.jimi.antlr3
Fixed: ANTLR-363
At some point the Java runtime default changed from discarding off-channel
tokens to preserving them. The fix is to make the C runtime also
default to preserving off-channel tokens.
Change 5544 on 2009/01/24 by jimi@jimi.jimi.antlr3
Fixed: ANTLR-360
Ensure that the fillBuffer function does not call any methods
that require the cached buffer size to be recorded before we
have actually recorded it.
Change 5543 on 2009/01/24 by jimi@jimi.jimi.antlr3
Fixed: ANTLR-362
Some users have started using string factories themselves and
exposed a flaw in the destroy method, which is intended to remove
a string that was created by the factory and is no longer needed.
The string was correctly removed from the vector that tracks them
but after the first one, all the remaining strings were then numbered
incorrectly. Hence the destroy method has been recoded to reindex
the strings in the factory after one is removed and everything is once
more hunky dory.
User suggested fix rejected.
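The reindexing described for ANTLR-362 can be pictured with the sketch below, in which each string remembers its slot in the factory's tracking table. FString, StringFactory, factory_track and factory_destroy are hypothetical names and the fixed-size table is a simplification, not the runtime's implementation.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_STRINGS 16

typedef struct {
    size_t      index;   /* position of this string in the factory table */
    const char *chars;
} FString;

typedef struct {
    FString *table[MAX_STRINGS];
    size_t   count;
} StringFactory;

/* Record a string in the factory and give it the next index. */
void factory_track(StringFactory *f, FString *s)
{
    assert(f->count < MAX_STRINGS);
    s->index = f->count;
    f->table[f->count++] = s;
}

/* Remove s from the table and renumber every string that followed it. */
void factory_destroy(StringFactory *f, FString *s)
{
    size_t i;

    assert(s->index < f->count && f->table[s->index] == s);
    for (i = s->index; i + 1 < f->count; i++) {
        f->table[i] = f->table[i + 1];
        f->table[i]->index = i;   /* the reindexing step */
    }
    f->count--;
}
```

Without the assignment to index inside the loop, every string after the removed one would still carry its old slot number, which is the kind of mismatch the change note describes.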
Change 5542 on 2009/01/24 by jimi@jimi.jimi.antlr3
Fixed ANTLR-366
The recognizer state now ensures that all fields are set to NULL upon
creation and the reset does not overwrite the tokenname array.
Change 5527 on 2009/01/15 by jimi@jimi.jimi.antlr3
Add the C runtime for 3.1.2 beta2 to perforce
Change 5526 on 2009/01/15 by jimi@jimi.jimivista.antlr3
Correctly define the MEMMOVE macro which was inadvertently left to be
memcpy.
Change 5503 on 2008/12/12 by jimi@jimi.jimi.antlr3
Change C runtime release number to 3.1.2 beta
Change 5473 on 2008/12/01 by jimi@jimi.jimivista.antlr3
Fixed: ANTLR-350 - C runtime use of memcpy
Prior change to use memcpy instead of memmove in all cases missed the
fact that the string factory can be in a situation where overlaps occur. We now
have ANTLR3_MEMCPY and ANTLR3_MEMMOVE and use the two appropriately.
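The distinction matters whenever source and destination share a buffer. The sketch below uses the standard memmove and memcpy directly (it does not show the ANTLR3_ macros) and a hypothetical helper, insert_bytes, that opens a gap inside a buffer, much as a string insert might.

```c
#include <assert.h>
#include <string.h>

/* Insert n bytes from src at offset `at` in buf, which currently holds
 * `len` bytes and has room for `cap`. src must not point into buf. */
void insert_bytes(char *buf, size_t len, size_t cap,
                  size_t at, const char *src, size_t n)
{
    assert(at <= len && len + n <= cap);

    /* Shifting the tail right: source and destination overlap inside
     * buf, so memmove is required; memcpy would be undefined here. */
    memmove(buf + at + n, buf + at, len - at);

    /* src is a separate object, so the cheaper memcpy is safe. */
    memcpy(buf + at, src, n);
}
```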
Change 5471 on 2008/12/01 by jimi@jimi.jimivista.antlr3
Fixed ANTLR-361
- Ensure that ANTLR3_BOOLEAN is typedef'ed correctly when building for
MinGW
Templates
---------
Change 5637 on 2009/02/20 by jimi@jimi.jimi.antlr3
C runtime - make sure that ADAPTOR results are cast to the tree type on
a rewrite
Change 5620 on 2009/02/18 by jimi@jimi.jimi.antlr3
Rename/Move:
From: //depot/code/antlr/main/src/org/antlr/codegen/templates/...
To: //depot/code/antlr/main/src/main/resources/org/antlr/codegen/templates/...
Relocate the code generating templates to exist in the directory set
that maven expects.
When checking in your templates, you may find it easiest to make a copy
of what you have, revert the change in perforce, then just check out the
template in the new location, and copy the changes back over. Nobody has more
than two files open at the moment.
Change 5578 on 2009/02/12 by jimi@jimi.jimi.antlr3
Correct the string template escape sequences for generating scope
code in the C templates.
Change 5577 on 2009/02/12 by jimi@jimi.jimi.antlr3
C Runtime - Bug fixes.
o Moving to extract tokens directly from a vector when returning them
exposed a bug whereby the EOF boundary calculation in tokLT was
incorrectly checking > rather than >=.
o The change to API initialization of tokens rather than memcmp()
forgot to set the input stream pointer for the manufactured tokens
in the token factory.
o Rewrite streams for rewriting tree parsers did not check whether the
rewrite stream was ever assigned before trying to free it; it is now
in line with the ordinary parser code.
Change 5567 on 2009/01/29 by jimi@jimi.jimi.antlr3
C Runtime - Further Optimizations
Within grammars that used scopes and were intended to parse large
inputs with many rule nests, the creation and deletion of the scopes
themselves became significant. Careful analysis shows that for most
grammars, while a parse could create and delete 20,000 scopes, the
maximum depth of any scope was only 8.
This change therefore changes the scope implementation so that it does
not free scope memory when it is popped but just tracks it in a C
runtime stack, eventually freeing it when the stack is freed. This
change caused the allocation of only 12 scope structures instead of
20,000 for the extreme example case.
This change means that scope users must be careful (as ever in C) to
initialize their scope elements correctly as:
1) If not you may inherit values from a prior use of the scope
structure;
2) Scope structures are now allocated with malloc and not calloc.
Also, when using a custom free function to clean a scope when it is
popped, it is probably a good idea to set any free'd pointers to NULL
(this is generally good C programming practice in any case).
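The consequence for scope users can be seen in a small sketch of a stack that keeps popped scope memory for reuse. The names Scope, ScopeStack, scope_push, scope_pop and stack_free are hypothetical, as is the fixed maximum depth; the generated code and runtime differ in detail.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_DEPTH 32

typedef struct {
    int   depth;
    char *name;
} Scope;

typedef struct {
    Scope *slots[MAX_DEPTH];  /* scope memory allocated so far, kept across pops */
    size_t allocated;         /* number of slots that hold memory                */
    size_t top;               /* number of scopes currently live                 */
} ScopeStack;

void stack_init(ScopeStack *s)
{
    s->allocated = 0;
    s->top       = 0;
}

/* Push a scope, reusing memory from an earlier pop when there is any.
 * The returned structure is NOT zeroed: a fresh one comes from malloc,
 * a reused one still holds whatever its previous user left in it. */
Scope *scope_push(ScopeStack *s)
{
    assert(s->top < MAX_DEPTH);
    if (s->top == s->allocated) {
        Scope *p = malloc(sizeof *p);
        if (p == NULL) return NULL;
        s->slots[s->allocated++] = p;
    }
    return s->slots[s->top++];
}

/* Pop without freeing; the memory stays on the stack for the next push. */
void scope_pop(ScopeStack *s)
{
    assert(s->top > 0);
    s->top--;
}

/* Release everything once the stack itself is finished with. */
void stack_free(ScopeStack *s)
{
    size_t i;

    for (i = 0; i < s->allocated; i++) free(s->slots[i]);
    s->allocated = 0;
    s->top       = 0;
}
```

A caller would therefore assign every field it relies on straight after scope_push, for example setting name to NULL, instead of assuming the zeroed memory that calloc used to provide.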
Change 5566 on 2009/01/29 by jimi@jimi.jimi.antlr3
Remove redundant BACKTRACK checking so that MSVC9 does not get confused
about possibly uninitialized variables
Change 5565 on 2009/01/28 by jimi@jimi.jimi.antlr3
Use malloc rather than calloc to allocate memory for new scopes. Note
that this means users will have to be careful to initialize any values in their
scopes that they expect to be 0 or NULL and I must document this.
Change 5564 on 2009/01/28 by jimi@jimi.jimi.antlr3
Use malloc rather than calloc for copying list label tokens for
rewrites.
Change 5561 on 2009/01/28 by jimi@jimi.jimi.antlr3
Prevent warnings about retval.stop not being initialized when a rule
returns early because it is in backtracking mode
Change 5560 on 2009/01/28 by jimi@jimi.jimi.antlr3
Add a NULL check before freeing rewrite streams used in AST rewrites
rather than auto-rewrites.
While the NULL check is redundant as the free cannot be called unless
it is assigned, Visual Studio C 2008
gets it wrong and thinks that there is a path that can arrive at the
free without it being assigned and that is too annoying to ignore.
Change 5559 on 2009/01/28 by jimi@jimi.jimi.antlr3
C target Tree rewrite optimization
There is only one optimization in this change, but it is a huge one.
The code generation templates were set up so that at the start of a rule,
any rewrite streams mentioned in the rule were pre-created. However, this
is a massive overhead for rules where only one or two of the streams are
actually used, as we create them then free them without ever using them.
This was copied from the Java templates basically.
This caused literally millions of extra calls and vector allocations
in the case of the GNU C parser given to me for testing with a 20,000
line program.
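The difference between pre-creating every stream and creating one only on first use might be sketched like this. RewriteStream, stream_get, stream_release and the rule function are stand-ins invented for the example and are not the code the templates emit; the counter exists only to make the saving visible.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    size_t elements;   /* number of items added to the stream */
} RewriteStream;

int streams_created = 0;   /* counts allocations, for illustration only */

/* Return the stream in *slot, creating it the first time it is needed. */
RewriteStream *stream_get(RewriteStream **slot)
{
    assert(slot != NULL);
    if (*slot == NULL) {
        *slot = malloc(sizeof **slot);
        if (*slot != NULL) {
            (*slot)->elements = 0;
            streams_created++;
        }
    }
    return *slot;
}

/* Free the stream only if it was ever created. */
void stream_release(RewriteStream **slot)
{
    if (*slot != NULL) {
        free(*slot);
        *slot = NULL;
    }
}

/* A hypothetical rule that mentions two streams but uses at most one. */
void rule(int alt)
{
    RewriteStream *stream_ID   = NULL;   /* nothing allocated up front */
    RewriteStream *stream_expr = NULL;

    if (alt == 1) {
        RewriteStream *s = stream_get(&stream_ID);
        if (s != NULL) s->elements++;
    }
    stream_release(&stream_ID);
    stream_release(&stream_expr);
}
```

In this sketch an alternative that never touches a stream costs two NULL assignments and two NULL tests, where eager creation would cost two allocations and two frees per rule invocation.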
After this change, the following comparison is available against the gcc
compiler:
Before (different machines here so use the relative difference for
comparison):
gcc:
real 0m0.425s
user 0m0.384s
sys 0m0.036s
ANTLR C
real 0m1.958s
user 0m1.284s
sys 0m0.656s
After the previous optimizations for vector pooling via a factory,
plus this huge win in removing redundant code, we have the following
(different machine to the one above):
gcc:
0.21user 0.01system 0:00.23elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+328outputs (0major+9922minor)pagefaults 0swaps
ANTLR C:
0.37user 0.26system 0:00.64elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+130944minor)pagefaults 0swaps
The extra system time comes from the fact that although the tree
rewriting is now optimal in terms of not allocating things it does
not need, there is still a lot more overhead in a parser that is generated
for generic use, including much more use of structures for tokens and extra
copying and so on. I will continue to work on improving things where I
can, but the next big improvement will come from Ter's optimization of
the actual code structures we generate, including not doing things with
rewrite streams that we do not need to do at all.
The second machine I used is about twice as fast CPU wise as the system
that was used originally by the user that asked about this performance.
Change 5558 on 2009/01/28 by jimi@jimi.jimi.antlr3
Lots of optimizations (though the next one to be checked in is the huge
win) for AST building and vector factories.
A large part of tree rewriting was the creation of vectors to hold AST
nodes. Although I had created a vector factory, for some reason I never got
around to creating a proper one, that pre-allocated the vectors in chunks and
so on. I guess I just forgot to. Hence a big win here is prevention of calling
malloc lots and lots of times to create vectors.
A second improvement was to change the vector definition such that it
holds a certain number of elements within the vector structure itself, rather
than malloc and freeing these. Currently this is set to 8, but may increase.
For AST construction, this is generally a big win because AST nodes don't often
have many individual children unless there has not been any shaping going on in
the parser. But if you are not shaping, then you don't really need a tree.
Other performance improvements here include not calling functions
indirectly within token stream and common token stream. Hence tokens are
claimed directly from the vectors. Users can override these functions of course
and all this means is that if you override tokenstreams then you pretty much
have to provide all the methods, but then I think you would have to anyway (and
I don't know of anyone that has wanted to do this as you can carry your own
structure around with the tokens anyway and that is much easier).
Change 5554 on 2009/01/26 by jimi@jimi.jimi.antlr3
Fixed: ANTLR-379
For some reason in the past, the ruleMemozation() template had required
that the name parameter be set to the rule name. This does not seem to be a
requirement any more. The name=xxx override when invoking the template was
causing all the scope names derived when cleaning up in memoization to be
called after the rule name, which was not correct. However, this only affected
the output when in output=AST mode.
This template invocation is now corrected.
Change 5553 on 2009/01/26 by jimi@jimi.jimi.antlr3
Fixed: ANTLR-330
Managed to get the one rule that could not see the ASTLabelType to call
back in to the super template C.stg and ask it to construct the name. I am not
100% sure that this fixes all cases, but I cannot find any that fail. Please
let me know if you find any examples of being unable to default the
ASTLabelType option in the C target.
Change 5552 on 2009/01/25 by jimi@jimi.jimi.antlr3
Progress: ANTLR-327
Fix debug code generation templates when output=AST such that code
can at least be generated and I can debug the output code correctly.
Note that this checkin does not implement the debugging requirements
for tree generating parsers.
Change 5551 on 2009/01/25 by jimi@jimi.jimi.antlr3
Fixed: ANTLR-287
Most of the source files did not include the BSD license. This might
not be that big a deal given that I don't care what people do with it
other than take my name off it, but having the license reproduced
everywhere at least makes things perfectly clear. Hence this mass change of
sources and templates to include the license.
Change 5549 on 2009/01/25 by jimi@jimi.jimi.antlr3
Fixed: ANTLR-354
Using 0.0D as the default initialization value for a double caused the
VS 2003 C compiler to bomb out. There seems to be no reason other
than force of habit to set this to 0.0D so I have dropped the D so
that older compilers do not complain.
Change 5547 on 2009/01/25 by jimi@jimi.jimi.antlr3
Fixed: ANTLR-282
All references are now unadorned with any type of NULL check for the
following reasons:
1) A NULL reference means that there is a problem with the
grammar and we need the program to fail immediately so
that the programmer can work out where the problem occurred;
2) Most of the time, the only sensible value that can be
returned is NULL or 0, which obviates the NULL check in the
first place;
3) If we replace a NULL reference with some value such as 0,
then the program may blithely continue but just do something
logically wrong, which will be very difficult for the
grammar programmer to detect and correct.
Change 5545 on 2009/01/24 by jimi@jimi.jimi.antlr3
Fixed: ANTLR-357
The bug report was correct in that the types of references to things
like $start were being incorrectly cast as they were not changed from
Java style casts (and the casts are unnecessary). This is now fixed
and references are referencing the correct, uncast, types.
However, the bug report was wrong in that the reference in the book to
$start.pos will only work for Java and really, it is incorrect in the
book because it should not access the .pos member directly but should
be using $start.getCharPositionInLine().
Because there is no access qualification in C, one could use
$start.charPosition; however,
this should really be $start->getCharPositionInLine($start);
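The call shape in that last line, where the object is passed back to a function pointer stored in the object itself, can be shown with a cut-down stand-in. The Token type below is a simplified illustration and not the runtime's token structure; plain int is used where the runtime has its own integer typedefs, and token_char_pos is a hypothetical implementation function.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Token Token;

struct Token {
    int charPosition;                            /* raw member: reachable, but an implementation detail */
    int (*getCharPositionInLine)(Token *self);   /* accessor, called with the token itself              */
};

/* Default accessor; a custom token type could install a different one. */
int token_char_pos(Token *self)
{
    assert(self != NULL);
    return self->charPosition;
}
```

Given a pointer tok to such a token, tok->getCharPositionInLine(tok) goes through whatever accessor is installed, whereas reading tok->charPosition directly bypasses it.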
Change 5541 on 2009/01/24 by jimi@jimi.jimi.antlr3
Fixed - ANTLR-367
The code generation for the free method of a recognizer was not
distinguishing tree parsers from parsers when it came to calling delegate free
functions.
This is now corrected.
Change 5540 on 2009/01/24 by jimi@jimi.jimi.antlr3
Fixed ANTLR-355
Ensure that we do not attempt to free any memory that we did not
actually allocate because the parser rule was being executed in
backtracking mode.
Change 5539 on 2009/01/24 by jimi@jimi.jimivista.antlr3
Fixed: ANTLR-355
When a C targeted parser runs in backtracking mode, new rewrite stream
structures should not be created while the rule is currently
backtracking.
Change 5502 on 2008/12/11 by jimi@jimi.jimi.antlr3
Fixed: ANTLR-349 Ensure that all marker labels in the lexer are 64 bit
compatible
Change 5473 on 2008/12/01 by jimi@jimi.jimivista.antlr3
Fixed: ANTLR-350 - C runtime use of memcpy
Prior change to use memcpy instead of memmove in all cases missed the
fact that the string factory can be in a situation where overlaps occur. We now
have ANTLR3_MEMCPY and ANTLR3_MEMMOVE and use the two appropriately.
Change 5387 on 2008/11/05 by parrt@parrt.spork
Fixed x+=. issue with tree grammars; added unit test
Change 5325 on 2008/10/23 by parrt@parrt.spork
We were all ref'ing backtracking==0 hardcoded instead of checking the
@synpredgate action.