| #!/usr/bin/ruby |
| # encoding: utf-8 |
| |
| =begin LICENSE |
| |
| [The "BSD licence"] |
| Copyright (c) 2009-2010 Kyle Yetter |
| All rights reserved. |
| |
| Redistribution and use in source and binary forms, with or without |
| modification, are permitted provided that the following conditions |
| are met: |
| |
| 1. Redistributions of source code must retain the above copyright |
| notice, this list of conditions and the following disclaimer. |
| 2. Redistributions in binary form must reproduce the above copyright |
| notice, this list of conditions and the following disclaimer in the |
| documentation and/or other materials provided with the distribution. |
| 3. The name of the author may not be used to endorse or promote products |
| derived from this software without specific prior written permission. |
| |
| THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR |
| IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES |
| OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. |
| IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, |
| INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT |
| NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, |
| DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY |
| THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT |
| (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF |
| THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. |
| |
| =end |
| |
| module ANTLR3 |
| |
| |
| =begin rdoc ANTLR3::Stream |
| |
| = ANTLR3 Streams |
| |
| This documentation first covers the general concept of streams as used by ANTLR |
| recognizers, and then discusses the specific <tt>ANTLR3::Stream</tt> module. |
| |
| == ANTLR Stream Classes |
| |
| ANTLR recognizers need a way to walk through input data in a serialized IO-style |
| fashion. They also need some book-keeping about the input to provide useful |
| information to developers, such as current line number and column. Furthermore, |
| to implement backtracking and various error recovery techniques, recognizers |
| need a way to record various locations in the input at a number of points in the |
| recognition process so the input state may be restored back to a prior state. |
| |
| ANTLR bundles all of this functionality into a number of Stream classes, each |
| designed to be used by recognizers for a specific recognition task. Most of the |
| Stream hierarchy is implemented in antlr3/stream.rb, which is loaded by default |
| when 'antlr3' is required. |
| |
| --- |
| |
| Here's a brief overview of the various stream classes and their respective |
| purposes: |
| |
| StringStream:: |
| Similar to StringIO from the standard Ruby library, StringStream wraps raw |
| String data in a Stream interface for use by ANTLR lexers. |
| FileStream:: |
| A subclass of StringStream, FileStream simply wraps data read from an IO or |
| File object for use by lexers. |
| CommonTokenStream:: |
| The job of a TokenStream is to read lexer output and then provide ANTLR |
| parsers with the means to sequentially walk through a series of tokens. |
| CommonTokenStream is the default TokenStream implementation. |
| TokenRewriteStream:: |
| A subclass of CommonTokenStream, TokenRewriteStreams provide rewriting-parsers |
| the ability to produce new output text from an input token-sequence by |
| managing rewrite "programs" on top of the stream. |
| CommonTreeNodeStream:: |
| In a similar fashion to CommonTokenStream, CommonTreeNodeStream feeds tokens |
| to recognizers in a sequential fashion. However, the stream object serializes |
| an Abstract Syntax Tree into a flat, one-dimensional sequence, but preserves |
| the two-dimensional shape of the tree using special UP and DOWN tokens. The |
| sequence is primarily used by ANTLR Tree Parsers. *note* -- this is not |
| defined in antlr3/stream.rb, but in antlr3/tree.rb |
| |
| --- |
| |
| The next few sections cover the most significant methods of all stream classes. |
| |
| === consume / look / peek |
| |
| <tt>stream.consume</tt> is used to advance a stream one unit. StringStreams are |
| advanced by one character and TokenStreams are advanced by one token. |
| |
| <tt>stream.peek(k = 1)</tt> is used to quickly retrieve the object of interest |
| to a recognizer at look-ahead position specified by <tt>k</tt>. For |
| <b>StringStreams</b>, this is the <i>integer value of the character</i> |
| <tt>k</tt> characters ahead of the stream cursor. For <b>TokenStreams</b>, this |
| is the <i>integer token type of the token</i> <tt>k</tt> tokens ahead of the |
| stream cursor. |
| |
| <tt>stream.look(k = 1)</tt> is used to retrieve the full object of interest at |
| look-ahead position specified by <tt>k</tt>. While <tt>peek</tt> provides the |
| <i>bare-minimum lightweight information</i> that the recognizer needs, |
| <tt>look</tt> provides the <i>full object of concern</i> in the stream. For |
| <b>StringStreams</b>, this is a <i>string object containing the single |
| character</i> <tt>k</tt> characters ahead of the stream cursor. For |
| <b>TokenStreams</b>, this is the <i>full token structure</i> <tt>k</tt> tokens |
| ahead of the stream cursor. |
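The difference between <tt>peek</tt> and <tt>look</tt> for character input can be sketched with a stand-alone class. This is only an illustration: <tt>TinyCharStream</tt> is a hypothetical name, it is not part of the runtime, and it handles only positive values of <tt>k</tt>.

```ruby
# Hypothetical stand-in for a character stream; not the ANTLR3 class itself.
class TinyCharStream
  def initialize(text)
    @text = text
    @position = 0
  end

  # advance the cursor by one character, stopping at the end of the text
  def consume
    @position += 1 if @position < @text.length
  end

  # one-character String k positions ahead (k = 1 is the current character),
  # or nil when that position is past the end of the text
  def look(k = 1)
    @text[@position + k - 1]
  end

  # integer value of the character that #look would return, or nil
  def peek(k = 1)
    ch = look(k) and ch.ord
  end
end

stream = TinyCharStream.new("ab")
stream.peek     # => 97
stream.look     # => "a"
stream.consume
stream.look     # => "b"
stream.look(2)  # => nil
```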
| |
| <b>Note:</b> in most ANTLR runtime APIs for other languages, <tt>peek</tt> is |
| implemented by some method with a name like <tt>LA(k)</tt> and <tt>look</tt> is |
| implemented by some method with a name like <tt>LT(k)</tt>. When writing this |
| Ruby runtime API, I found this naming practice confusing, ambiguous, and |
| un-Ruby-like. Thus, I chose <tt>peek</tt> and <tt>look</tt> to represent a |
| quick-look (peek) and a full-fledged look-ahead operation (look). If this causes |
| confusion or any sort of compatibility strife for developers using this |
| implementation, all apologies. |
| |
| === mark / rewind / release |
| |
| <tt>marker = stream.mark</tt> causes the stream to record important information |
| about the current stream state, place the data in an internal memory table, and |
| return a memento, <tt>marker</tt>. The marker object is typically an integer key |
| to the stream's internal memory table. |
| |
| Used in tandem with <tt>stream.rewind(marker = last_marker)</tt>, the marker can |
| be used to restore the stream to an earlier state. This is used by recognizers |
| to perform tasks such as backtracking and error recovery. |
| |
| <tt>stream.release(marker = last_marker)</tt> can be used to release an existing |
| state marker from the memory table. |
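The mark / rewind / release cycle can be sketched as follows. This is a minimal illustration under stated assumptions: <tt>MarkableStream</tt> is a hypothetical class that saves only the cursor position, whereas a real stream may save more state (such as line and column).

```ruby
# Hypothetical miniature stream; the method names mirror the interface
# described above, but this is not the ANTLR3 implementation.
class MarkableStream
  attr_reader :position

  def initialize(items)
    @items = items
    @position = 0
    @markers = []   # internal memory table of saved positions
  end

  def consume
    @position += 1 if @position < @items.length
  end

  # record the current position and return an integer key for it
  def mark
    @markers << @position
    @markers.length - 1
  end

  # restore the position saved under +marker+ (default: the most recent one)
  def rewind(marker = @markers.length - 1)
    saved = @markers[marker] or return self
    @position = saved
    self
  end

  # forget +marker+ and every marker created after it
  def release(marker = @markers.length - 1)
    @markers.slice!(marker..-1)
    self
  end

  def mark_depth
    @markers.length
  end
end

stream = MarkableStream.new(%w[a b c d])
stream.consume              # cursor moves to position 1
marker = stream.mark        # remember position 1
2.times { stream.consume }  # speculative work moves the cursor to position 3
stream.rewind(marker)       # cursor is back at position 1
stream.release(marker)      # the saved state is dropped
```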
| |
| === seek |
| |
| <tt>stream.seek(position)</tt> moves the stream cursor to an absolute position |
| within the stream, much like Ruby's <tt>IO#seek</tt>. However, unlike |
| <tt>IO#seek</tt>, which can also seek relative to the current position or the |
| end of the data, ANTLR streams currently always use absolute position seeking. |
| |
| == The Stream Module |
| |
| <tt>ANTLR3::Stream</tt> is an abstract-ish base mixin for all IO-like stream |
| classes used by ANTLR recognizers. |
| |
| The module doesn't do much on its own besides define arguably annoying |
| ``abstract'' pseudo-methods that demand implementation when it is mixed in to a |
| class that wants to be a Stream. Right now this exists as an artifact of porting |
| the ANTLR Java/Python runtime library to Ruby. In Java, of course, this is |
| represented as an interface. In Ruby, however, objects are duck-typed and |
| interfaces aren't that useful as programmatic entities -- in fact, it's mildly |
| wasteful to have a module like this hanging out. Thus, I may axe it. |
| |
| When mixed in, it does give the class #size and #source_name attribute |
| methods. |
| |
| Except in a small handful of places, most of the ANTLR runtime library uses |
| duck-typing and not type checking on objects. This means that the methods which |
| manipulate stream objects don't usually bother checking that the object is a |
| Stream and assume that the object implements the proper stream interface. Thus, |
| it is not strictly necessary that custom stream objects include ANTLR3::Stream, |
| though it isn't a bad idea. |
| |
| =end |
| |
| module Stream |
| include ANTLR3::Constants |
| extend ClassMacros |
| |
| ## |
| # :method: consume |
| # used to advance a stream one unit (such as character or token) |
| abstract :consume |
| |
| ## |
| # :method: peek( k = 1 ) |
| # used to quickly retrieve the object of interest to a recognizer at lookahead |
| # position specified by <tt>k</tt> (such as integer value of a character or an |
| # integer token type) |
| abstract :peek |
| |
| ## |
| # :method: look( k = 1 ) |
| # used to retrieve the full object of interest at lookahead position specified |
| # by <tt>k</tt> (such as a character string or a token structure) |
| abstract :look |
| |
| ## |
| # :method: mark |
| # saves the current position for the purposes of backtracking and |
| # returns a value to pass to #rewind at a later time |
| abstract :mark |
| |
| ## |
| # :method: index |
| # returns the current position of the stream |
| abstract :index |
| |
| ## |
| # :method: rewind( marker = last_marker ) |
| # restores the stream position using the state information previously saved |
| # by the given marker |
| abstract :rewind |
| |
| ## |
| # :method: release( marker = last_marker ) |
| # clears the saved state information associated with the given marker value |
| abstract :release |
| |
| ## |
| # :method: seek( position ) |
| # move the stream to the absolute index given by +position+ |
| abstract :seek |
| |
| ## |
| # the total number of symbols in the stream |
| attr_reader :size |
| |
| ## |
| # indicates an identifying name for the stream -- usually the file path of the input |
| attr_accessor :source_name |
| end |
| |
| =begin rdoc ANTLR3::CharacterStream |
| |
| CharacterStream further extends the abstract-ish base mixin Stream to add |
| methods specific to navigating character-based input data. Thus, it serves as an |
| imitation of the Java interface for text-based streams, which are primarily |
| used by lexers. |
| |
| It adds the ``abstract'' method, <tt>substring(start, stop)</tt>, which must be |
| implemented to return a slice of the input string from position <tt>start</tt> |
| to position <tt>stop</tt>. It also adds attribute accessor methods <tt>line</tt> |
| and <tt>column</tt>, which are expected to indicate the current line number and |
| position within the current line, respectively. |
| |
| == A Word About <tt>line</tt> and <tt>column</tt> attributes |
| |
| Presumably, the concept of <tt>line</tt> and <tt>column</tt> attributes of text |
| is familiar to most developers. Line numbers of text are indexed from number 1 |
| up (not 0). Column numbers are indexed from 0 up. Thus, examining sample text: |
| |
| Hey this is the first line. |
| Oh, and this is the second line. |
| |
| Line 1 is the string "Hey this is the first line\\n". If a character stream is at |
| line 2, character 0, the stream cursor is sitting between the characters "\\n" |
| and "O". |
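The indexing convention can be checked with a small helper. This is a sketch only: <tt>line_and_column</tt> is a hypothetical function, not part of the runtime, and it recomputes the values from scratch instead of tracking them incrementally as a stream would.

```ruby
# Hypothetical helper: compute the [line, column] pair for a cursor sitting
# just before the character at +index+.
# Lines are numbered from 1, columns from 0.
def line_and_column(text, index)
  before = text[0, index]               # everything already passed
  line = before.count("\n") + 1         # one line per newline passed, plus 1
  last_newline = before.rindex("\n")    # nil while still on the first line
  column = last_newline ? index - last_newline - 1 : index
  [line, column]
end

text = "Hey this is the first line.\nOh, and this is the second line.\n"
line_and_column(text, 0)   # => [1, 0]
line_and_column(text, 27)  # => [1, 27]  (cursor just before the first "\n")
line_and_column(text, 28)  # => [2, 0]   (cursor between "\n" and "O")
```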
| |
| *Note:* most ANTLR runtime APIs for other languages refer to <tt>column</tt> |
| with the more-precise, but lengthy name <tt>charPositionInLine</tt>. I preferred |
| to keep it simple and familiar in this Ruby runtime API. |
| |
| =end |
| |
| module CharacterStream |
| include Stream |
| extend ClassMacros |
| include Constants |
| |
| ## |
| # :method: substring(start,stop) |
| abstract :substring |
| |
| attr_accessor :line |
| attr_accessor :column |
| end |
| |
| |
| =begin rdoc ANTLR3::TokenStream |
| |
| TokenStream further extends the abstract-ish base mixin Stream to add methods |
| specific to navigating token sequences. Thus, it serves as an imitation of the |
| Java interface for token-based streams, which are used by many different |
| components in ANTLR, including parsers and tree parsers. |
| |
| == Token Streams |
| |
| Token streams wrap a sequence of token objects produced by some token source, |
| usually a lexer. They provide the operations required by higher-level |
| recognizers, such as parsers and tree parsers for navigating through the |
| sequence of tokens. Unlike simple character-based streams, such as StringStream, |
| token-based streams have an additional level of complexity because they must |
| manage the task of "tuning" to a specific token channel. |
| |
| One of the main advantages of ANTLR-based recognition is the token |
| <i>channel</i> feature, which allows you to hold on to all tokens of interest |
| while only presenting a specific set of interesting tokens to a parser. For |
| example, if you need to hide whitespace and comments from a parser, but hang on |
| to them for some other purpose, you have the lexer assign the comments and |
| whitespace to channel value HIDDEN as it creates the tokens. |
| |
| When you create a token stream, you can tune it to some specific channel value. |
| Then, all <tt>peek</tt>, <tt>look</tt>, and <tt>consume</tt> operations only |
| yield tokens that have the same value for <tt>channel</tt>. The stream skips |
| over any non-matching tokens in between. |
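Channel tuning can be sketched with a toy stream. Everything named here is an assumption made for illustration: <tt>DemoToken</tt>, <tt>DEMO_DEFAULT</tt>, <tt>DEMO_HIDDEN</tt>, and <tt>TunedStream</tt> are hypothetical and the channel numbers are arbitrary.

```ruby
# Hypothetical token type and channel values, for illustration only.
DemoToken = Struct.new(:text, :channel)
DEMO_DEFAULT = 0
DEMO_HIDDEN = 99

# Toy stream that only presents tokens on the channel it is tuned to.
class TunedStream
  def initialize(tokens, channel = DEMO_DEFAULT)
    @tokens = tokens
    @channel = channel
    @position = 0
    skip_off_channel   # start on the first on-channel token
  end

  # the current on-channel token, or nil once the stream is exhausted
  def look
    @tokens[@position]
  end

  # step past the current token, then past any off-channel tokens after it
  def consume
    @position += 1 if @position < @tokens.length
    skip_off_channel
  end

  private

  def skip_off_channel
    @position += 1 while @tokens[@position] && @tokens[@position].channel != @channel
  end
end

tokens = [
  DemoToken.new("35", DEMO_DEFAULT),
  DemoToken.new(" ",  DEMO_HIDDEN),
  DemoToken.new("*",  DEMO_DEFAULT)
]

visible = TunedStream.new(tokens)
visible.look.text   # => "35"
visible.consume
visible.look.text   # => "*"   (the whitespace token was skipped)

hidden = TunedStream.new(tokens, DEMO_HIDDEN)
hidden.look.text    # => " "
```

Note that the full token list is kept intact; only the view presented by <tt>look</tt> and <tt>consume</tt> changes with the channel.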
| |
| == The TokenStream Interface |
| |
| In addition to the abstract methods and attribute methods provided by the base |
| Stream module, TokenStream adds a number of additional method implementation |
| requirements and attributes. |
| |
| =end |
| |
| module TokenStream |
| include Stream |
| extend ClassMacros |
| |
| ## |
| # expected to return the token source object (such as a lexer) from which |
| # all tokens in the stream were retrieved |
| attr_reader :token_source |
| |
| ## |
| # expected to return the value of the last marker produced by a call to |
| # <tt>stream.mark</tt> |
| attr_reader :last_marker |
| |
| ## |
| # expected to return the integer index of the stream cursor |
| attr_reader :position |
| |
| ## |
| # the integer channel value to which the stream is ``tuned'' |
| attr_accessor :channel |
| |
| ## |
| # :method: to_s(start=0,stop=tokens.length-1) |
| # should take the tokens between start and stop in the sequence, extract their text |
| # and return the concatenation of all the text chunks |
| abstract :to_s |
| |
| ## |
| # :method: at( i ) |
| # return the stream symbol at index +i+ |
| abstract :at |
| end |
| |
| =begin rdoc ANTLR3::StringStream |
| |
| A StringStream's purpose is to wrap the basic, naked text input of a recognition |
| system. Like all other stream types, it provides serial navigation of the input; |
| a recognizer can arbitrarily step forward and backward through the stream's |
| symbols as it requires. StringStream and its subclasses are the main way to |
| feed text input into an ANTLR Lexer for token processing. |
| |
| The stream's symbols of interest, of course, are character values. Thus, the |
| #peek method returns the integer character value at look-ahead position |
| <tt>k</tt> and the #look method returns the character value as a +String+. The |
| stream also tracks various pieces of information such as the line and column |
| numbers at the current position. |
| |
| === Note About Text Encoding |
| |
| This version of the runtime library primarily targets ruby version 1.8, which |
| does not have strong built-in support for multi-byte character encodings. Thus, |
| characters are assumed to be represented by a single byte -- an integer between |
| 0 and 255. Ruby 1.9 does provide built-in encoding support for multi-byte |
| characters, but currently this library does not provide any streams to handle |
| non-ASCII encoding. However, encoding-savvy recognition code is a future |
| development goal for this project. |
| |
| =end |
| |
| class StringStream |
| NEWLINE = ?\n.ord |
| |
| include CharacterStream |
| |
| # current integer character index of the stream |
| attr_reader :position |
| |
| # the current line number of the input, indexed upward from 1 |
| attr_reader :line |
| |
| # the current character position within the current line, indexed upward from 0 |
| attr_reader :column |
| |
| # the name associated with the stream -- usually a file name |
| # defaults to <tt>"(string)"</tt> |
| attr_accessor :name |
| |
| # the entire string that is wrapped by the stream |
| attr_reader :data |
| attr_reader :string |
| |
| if RUBY_VERSION =~ /^1\.9/ |
| |
| # creates a new StringStream object where +data+ is the string data to stream. |
| # accepts the following options in a symbol-to-value hash: |
| # |
| # [:file or :name] the (file) name to associate with the stream; default: <tt>'(string)'</tt> |
| # [:line] the initial line number; default: +1+ |
| # [:column] the initial column number; default: +0+ |
| # |
| def initialize( data, options = {} ) # for 1.9 |
| @string = data.to_s.encode( Encoding::UTF_8 ).freeze |
| @data = @string.codepoints.to_a.freeze |
| @position = options.fetch :position, 0 |
| @line = options.fetch :line, 1 |
| @column = options.fetch :column, 0 |
| @markers = [] |
| @name ||= options[ :file ] || options[ :name ] # || '(string)' |
| mark |
| end |
| |
| # |
| # identical to #peek, except it returns the character value as a String |
| # |
| def look( k = 1 ) # for 1.9 |
| k == 0 and return nil |
| k += 1 if k < 0 |
| |
| index = @position + k - 1 |
| index < 0 and return nil |
| |
| @string[ index ] |
| end |
| |
| else |
| |
| # creates a new StringStream object where +data+ is the string data to stream. |
| # accepts the following options in a symbol-to-value hash: |
| # |
| # [:file or :name] the (file) name to associate with the stream; default: <tt>'(string)'</tt> |
| # [:line] the initial line number; default: +1+ |
| # [:column] the initial column number; default: +0+ |
| # |
| def initialize( data, options = {} ) # for 1.8 |
| @data = data.to_s |
| @data.equal?( data ) and @data = @data.clone |
| @data.freeze |
| @string = @data |
| @position = options.fetch :position, 0 |
| @line = options.fetch :line, 1 |
| @column = options.fetch :column, 0 |
| @markers = [] |
| @name ||= options[ :file ] || options[ :name ] # || '(string)' |
| mark |
| end |
| |
| # |
| # identical to #peek, except it returns the character value as a String |
| # |
| def look( k = 1 ) # for 1.8 |
| k == 0 and return nil |
| k += 1 if k < 0 |
| |
| index = @position + k - 1 |
| index < 0 and return nil |
| |
| c = @data[ index ] and c.chr |
| end |
| |
| end |
| |
| def size |
| @data.length |
| end |
| |
| alias length size |
| |
| # |
| # rewinds the stream back to the start and clears out any existing marker entries |
| # |
| def reset |
| initial_location = @markers.first |
| @position, @line, @column = initial_location |
| @markers.clear |
| @markers << initial_location |
| return self |
| end |
| |
| # |
| # advance the stream by one character; returns the character consumed |
| # |
| def consume |
| c = @data[ @position ] || EOF |
| if @position < @data.length |
| @column += 1 |
| if c == NEWLINE |
| @line += 1 |
| @column = 0 |
| end |
| @position += 1 |
| end |
| return( c ) |
| end |
| |
| # |
| # return the character at look-ahead distance +k+ as an integer. <tt>k = 1</tt> represents |
| # the current character. +k+ greater than 1 represents upcoming characters. A negative |
| # value of +k+ returns previous characters consumed, where <tt>k = -1</tt> is the last |
| # character consumed. <tt>k = 0</tt> has undefined behavior and returns +nil+ |
| # |
| def peek( k = 1 ) |
| k == 0 and return nil |
| k += 1 if k < 0 |
| index = @position + k - 1 |
| index < 0 and return nil |
| @data[ index ] or EOF |
| end |
| |
| # |
| # return a substring around the stream cursor at a distance +k+ |
| # if <tt>k >= 0</tt>, return the next k characters |
| # if <tt>k < 0</tt>, return the previous <tt>|k|</tt> characters |
| # |
| def through( k ) |
| if k >= 0 then @string[ @position, k ] else |
| start = ( @position + k ).at_least( 0 ) # start cannot be negative or index will wrap around |
| @string[ start ... @position ] |
| end |
| end |
| |
| # operator style look-ahead |
| alias >> look |
| |
| # operator style look-behind |
| def <<( k ) |
| self >> -k |
| end |
| |
| alias index position |
| alias character_index position |
| |
| alias source_name name |
| |
| # |
| # Returns true if the stream appears to be at the beginning of a new line. |
| # This is an extra utility method for use inside lexer actions if needed. |
| # |
| def beginning_of_line? |
| @position.zero? or @data[ @position - 1 ] == NEWLINE |
| end |
| |
| # |
| # Returns true if the stream appears to be at the end of a line. |
| # This is an extra utility method for use inside lexer actions if needed. |
| # |
| def end_of_line? |
| @data[ @position ] == NEWLINE #if @position < @data.length |
| end |
| |
| # |
| # Returns true if the stream has been exhausted. |
| # This is an extra utility method for use inside lexer actions if needed. |
| # |
| def end_of_string? |
| @position >= @data.length |
| end |
| |
| # |
| # Returns true if the stream appears to be at the beginning of a stream (position = 0). |
| # This is an extra utility method for use inside lexer actions if needed. |
| # |
| def beginning_of_string? |
| @position == 0 |
| end |
| |
| alias eof? end_of_string? |
| alias bof? beginning_of_string? |
| |
| # |
| # record the current stream location parameters in the stream's marker table and |
| # return an integer-valued bookmark that may be used to restore the stream's |
| # position with the #rewind method. This method is used to implement backtracking. |
| # |
| def mark |
| state = [ @position, @line, @column ].freeze |
| @markers << state |
| return @markers.length - 1 |
| end |
| |
| # |
| # restore the stream to an earlier location recorded by #mark. If no marker value is |
| # provided, the last marker generated by #mark will be used. |
| # |
| def rewind( marker = @markers.length - 1, release = true ) |
| ( marker >= 0 and location = @markers[ marker ] ) or return( self ) |
| @position, @line, @column = location |
| release( marker ) if release |
| return self |
| end |
| |
| # |
| # the total number of markers currently in existence |
| # |
| def mark_depth |
| @markers.length |
| end |
| |
| # |
| # the last marker value created by a call to #mark |
| # |
| def last_marker |
| @markers.length - 1 |
| end |
| |
| # |
| # let go of the bookmark data for the marker and all marker |
| # values created after the marker. |
| # |
| def release( marker = @markers.length - 1 ) |
| marker.between?( 1, @markers.length - 1 ) or return |
| @markers.pop( @markers.length - marker ) |
| return self |
| end |
| |
| # |
| # jump to the absolute position value given by +index+. |
| # note: if +index+ is before the current position, the +line+ and +column+ |
| # attributes of the stream will probably be incorrect |
| # |
| def seek( index ) |
| index = index.bound( 0, @data.length ) # ensures index is within the stream's range |
| if index > @position |
| skipped = through( index - @position ) |
| if lc = skipped.count( "\n" ) and lc.zero? |
| @column += skipped.length |
| else |
| @line += lc |
| @column = skipped.length - skipped.rindex( "\n" ) - 1 |
| end |
| end |
| @position = index |
| return nil |
| end |
| |
| # |
| # customized object inspection that shows: |
| # * the stream class |
| # * the stream's location in <tt>index / line:column</tt> format |
| # * +before_chars+ characters before the cursor (6 characters by default) |
| # * +after_chars+ characters after the cursor (10 characters by default) |
| # |
| def inspect( before_chars = 6, after_chars = 10 ) |
| before = through( -before_chars ).inspect |
| @position - before_chars > 0 and before.insert( 0, '... ' ) |
| |
| after = through( after_chars ).inspect |
| @position + after_chars + 1 < @data.length and after << ' ...' |
| |
| location = "#@position / line #@line:#@column" |
| "#<#{ self.class }: #{ before } | #{ after } @ #{ location }>" |
| end |
| |
| # |
| # return the string slice between position +start+ and +stop+ |
| # |
| def substring( start, stop ) |
| @string[ start, stop - start + 1 ] |
| end |
| |
| # |
| # identical to String#[] |
| # |
| def []( start, *args ) |
| @string[ start, *args ] |
| end |
| end |
| |
| |
| =begin rdoc ANTLR3::FileStream |
| |
| FileStream is a character stream that uses data stored in some external file. It |
| is nearly identical to StringStream, except that it uses data located in a file |
| and automatically sets up the +source_name+ and +line+ parameters. It does |
| not actually use any buffered IO operations throughout the stream navigation |
| process. Instead, it reads the file data once when the stream is initialized. |
| |
| =end |
| |
| class FileStream < StringStream |
| |
| # |
| # creates a new FileStream object using the given +file+ object. |
| # If +file+ is a path string, the file will be read and the contents |
| # will be used and the +name+ attribute will be set to the path. |
| # If +file+ is an IO-like object (that responds to :read), |
| # the content of the object will be used and the stream will |
| # attempt to set its +name+ attribute, first trying the method #name |
| # on the object, then trying the method #path on the object. |
| # |
| # see StringStream.new for a list of additional options |
| # the constructor accepts |
| # |
| def initialize( file, options = {} ) |
| case file |
| when $stdin then |
| data = $stdin.read |
| @name = '(stdin)' |
| when ARGF |
| data = file.read |
| @name = file.path |
| when ::File then |
| file = file.clone |
| file.reopen( file.path, 'r' ) |
| @name = file.path |
| data = file.read |
| file.close |
| else |
| if file.respond_to?( :read ) |
| data = file.read |
| if file.respond_to?( :name ) then @name = file.name |
| elsif file.respond_to?( :path ) then @name = file.path |
| end |
| else |
| @name = file.to_s |
| if test( ?f, @name ) then data = File.read( @name ) |
| else raise ArgumentError, "could not find an existing file at %p" % @name |
| end |
| end |
| end |
| super( data, options ) |
| end |
| |
| end |
| |
| =begin rdoc ANTLR3::CommonTokenStream |
| |
| CommonTokenStream serves as the primary token stream implementation for feeding |
| sequential token input into parsers. |
| |
| Using some TokenSource (such as a lexer), the stream collects a token sequence, |
| setting the token's <tt>index</tt> attribute to indicate the token's position |
| within the stream. The streams may be tuned to some channel value; off-channel |
| tokens will be filtered out by the #peek, #look, and #consume methods. |
| |
| === Sample Usage |
| |
| |
| source_input = ANTLR3::StringStream.new("35 * 4 - 1") |
| lexer = Calculator::Lexer.new(source_input) |
| tokens = ANTLR3::CommonTokenStream.new(lexer) |
| |
| # assume this grammar defines whitespace as tokens on channel HIDDEN |
| # and numbers and operations as tokens on channel DEFAULT |
| tokens.look # => 0 INT['35'] @ line 1 col 0 (0..1) |
| tokens.look(2) # => 2 MULT["*"] @ line 1 col 3 (3..3) |
| tokens.tokens(0, 2) |
| # => [0 INT["35"] @ line 1 col 0 (0..1), |
| # 1 WS[" "] @ line 1 col 2 (2..2), |
| # 2 MULT["*"] @ line 1 col 3 (3..3)] |
| # notice the #tokens method does not filter off-channel tokens |
| |
| lexer.reset |
| hidden_tokens = |
| ANTLR3::CommonTokenStream.new(lexer, :channel => ANTLR3::HIDDEN) |
| hidden_tokens.look # => 1 WS[' '] @ line 1 col 2 (2..2) |
| |
| =end |
| |
| class CommonTokenStream |
| include TokenStream |
| include Enumerable |
| |
| # |
| # constructs a new token stream using the +token_source+ provided. +token_source+ is |
| # usually a lexer, but can be any object that implements +next_token+ and includes |
| # ANTLR3::TokenSource. |
| # |
| # If a block is provided, each token harvested will be yielded and if the block |
| # returns a +nil+ or +false+ value, the token will not be added to the stream -- |
| # it will be discarded. |
| # |
| # === Options |
| # [:channel] The channel value the stream should be tuned to initially |
| # [:source_name] The source name (file name) attribute of the stream |
| # |
| # === Example |
| # |
| # # create a new token stream that is tuned to channel :comment, and |
| # # discard all WHITE_SPACE tokens |
| # ANTLR3::CommonTokenStream.new(lexer, :channel => :comment) do |token| |
| # token.name != 'WHITE_SPACE' |
| # end |
| # |
| def initialize( token_source, options = {} ) |
| case token_source |
| when CommonTokenStream |
| # this is useful in cases where you want to convert a CommonTokenStream |
| # to a TokenRewriteStream or other variation of the standard token stream |
| stream = token_source |
| @token_source = stream.token_source |
| @channel = options.fetch( :channel ) { stream.channel or DEFAULT_CHANNEL } |
| @source_name = options.fetch( :source_name ) { stream.source_name } |
| tokens = stream.tokens.map { | t | t.dup } |
| else |
| @token_source = token_source |
| @channel = options.fetch( :channel, DEFAULT_CHANNEL ) |
| @source_name = options.fetch( :source_name ) { @token_source.source_name rescue nil } |
| tokens = @token_source.to_a |
| end |
| @last_marker = nil |
| @tokens = block_given? ? tokens.select { | t | yield( t, self ) } : tokens |
| @tokens.each_with_index { |t, i| t.index = i } |
| @position = |
| if first_token = @tokens.find { |t| t.channel == @channel } |
| @tokens.index( first_token ) |
| else @tokens.length |
| end |
| end |
| |
| # |
| # resets the token stream and rebuilds it with a potentially new token source. |
| # If no +token_source+ value is provided, the stream will attempt to reset the |
| # current +token_source+ by calling +reset+ on the object. The stream will |
| # then clear the token buffer and attempt to harvest new tokens. Identical in |
| # behavior to CommonTokenStream.new, if a block is provided, tokens will be |
| # yielded and discarded if the block returns a +false+ or +nil+ value. |
| # |
| def rebuild( token_source = nil ) |
| if token_source.nil? |
| @token_source.reset rescue nil |
| else @token_source = token_source |
| end |
| @tokens = block_given? ? @token_source.select { |token| yield( token ) } : |
| @token_source.to_a |
| @tokens.each_with_index { |t, i| t.index = i } |
| @last_marker = nil |
| @position = |
| if first_token = @tokens.find { |t| t.channel == @channel } |
| @tokens.index( first_token ) |
| else @tokens.length |
| end |
| return self |
| end |
| |
| # |
| # tune the stream to a new channel value |
| # |
| def tune_to( channel ) |
| @channel = channel |
| end |
| |
| def token_class |
| @token_source.token_class |
| rescue NoMethodError |
| @position == -1 and fill_buffer |
| @tokens.empty? ? CommonToken : @tokens.first.class |
| end |
| |
| alias index position |
| |
| def size |
| @tokens.length |
| end |
| |
| alias length size |
| |
| ###### State-Control ################################################ |
| |
| # |
| # rewind the stream to its initial state |
| # |
| def reset |
| @position = 0 |
| @position += 1 while token = @tokens[ @position ] and |
| token.channel != @channel |
| @last_marker = nil |
| return self |
| end |
| |
| # |
| # bookmark the current position of the input stream |
| # |
| def mark |
| @last_marker = @position |
| end |
| |
| def release( marker = nil ) |
| # do nothing |
| end |
| |
| |
| def rewind( marker = @last_marker, release = true ) |
| seek( marker ) |
| end |
| |
| # |
| # saves the current stream position, yields to the block, |
| # and then ensures the stream's position is restored before |
| # returning the value of the block |
| # |
| def hold( pos = @position ) |
| block_given? or return enum_for( :hold, pos ) |
| begin |
| yield |
| ensure |
| seek( pos ) |
| end |
| end |
| |
| ###### Stream Navigation ########################################### |
| |
| # |
| # advance the stream one step to the next on-channel token |
| # |
| def consume |
| token = @tokens[ @position ] || EOF_TOKEN |
| if @position < @tokens.length |
| @position = future?( 2 ) || @tokens.length |
| end |
| return( token ) |
| end |
| |
| # |
| # jump to the stream position specified by +index+ |
| # note: seek does not check whether the token at the |
| # specified position is on-channel |
| # |
| def seek( index ) |
| @position = index.to_i.bound( 0, @tokens.length ) |
| return self |
| end |
| |
| # |
| # return the type of the on-channel token at look-ahead distance +k+. <tt>k = 1</tt> represents |
| # the current token. +k+ greater than 1 represents upcoming on-channel tokens. A negative |
| # value of +k+ returns previous on-channel tokens consumed, where <tt>k = -1</tt> is the last |
| # on-channel token consumed. <tt>k = 0</tt> is not a valid look-ahead distance and returns +nil+ |
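| # |
| # A hedged sketch, assuming +stream+ is a token stream whose on-channel |
| # tokens are +a+, +b+, and +c+ (hypothetical names), and that +a+ has |
| # already been consumed: |
| # |
| #   stream.peek( 1 )   # type of b, the current token |
| #   stream.peek( 2 )   # type of c |
| #   stream.peek( -1 )  # type of a, the last on-channel token consumed |
| #   stream.peek( 0 )   # nil |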
| # |
| def peek( k = 1 ) |
| tk = look( k ) and return( tk.type ) |
| end |
| |
| # |
| # operates similarly to #peek, but returns the full token object at look-ahead position +k+ |
| # |
| def look( k = 1 ) |
| index = future?( k ) or return nil |
| @tokens.fetch( index, EOF_TOKEN ) |
| end |
| |
| alias >> look |
| def << k |
| self >> -k |
| end |
| |
| # |
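| # returns the index of the on-channel token at look-ahead position +k+. If the buffer |
| # runs out first, the index just past the last buffered token is returned. +nil+ is |
| # returned when +k+ is 0 or when a negative +k+ reaches before the first on-channel token |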
| # |
| def future?( k = 1 ) |
| @position == -1 and fill_buffer |
| |
| case |
| when k == 0 then nil |
| when k < 0 then past?( -k ) |
| when k == 1 then @position |
| else |
| # since the stream only yields on-channel |
| # tokens, the stream can't just go to the |
| # next position, but rather must skip |
| # over off-channel tokens |
| ( k - 1 ).times.inject( @position ) do |cursor, | |
| begin |
| tk = @tokens.at( cursor += 1 ) or return( cursor ) |
| # ^- if tk is nil (i.e. cursor is outside array limits) |
| end until tk.channel == @channel |
| cursor |
| end |
| end |
| end |
| |
| # |
| # returns the index of the on-channel token at look-behind position +k+ or nil if no other |
| # on-channel tokens exist before the current token |
| # |
| def past?( k = 1 ) |
| @position == -1 and fill_buffer |
| |
| case |
| when k == 0 then nil |
| when @position - k < 0 then nil |
| else |
| |
| k.times.inject( @position ) do |cursor, | |
| begin |
| cursor <= 0 and return( nil ) |
| tk = @tokens.at( cursor -= 1 ) or return( nil ) |
| end until tk.channel == @channel |
| cursor |
| end |
| |
| end |
| end |
| |
| # |
| # yields each token in the stream (including off-channel tokens) |
| # If no block is provided, the method returns an Enumerator object. |
| # #each accepts the same arguments as #tokens |
| # |
| def each( *args ) |
| block_given? or return enum_for( :each, *args ) |
| tokens( *args ).each { |token| yield( token ) } |
| end |
| |
| |
| # |
| # yields each token in the stream with the given channel value |
| # If no channel value is given, the stream's tuned channel value will be used. |
| # If no block is given, an enumerator will be returned. |
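| # |
| # A hedged sketch, assuming +stream+ is a token stream and that 99 is the |
| # hidden channel value (an assumption; check the constant in your runtime): |
| # |
| #   stream.each_on_channel { |t| puts t.text }        # tokens on the tuned channel |
| #   stream.each_on_channel( 99 ) { |t| puts t.text }  # tokens on channel 99 |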
| # |
| def each_on_channel( channel = @channel ) |
| block_given? or return enum_for( :each_on_channel, channel ) |
| for token in @tokens |
| token.channel == channel and yield( token ) |
| end |
| end |
| |
| # |
| # iterates through the token stream, yielding each on-channel token along the way. |
| # After iteration has completed, the stream's position will be restored to where |
| # it was before #walk was called. While #each and #each_on_channel do not change |
| # the stream's position during iteration, #walk advances through the stream. This |
| # makes it possible to look ahead and behind the current token during iteration. |
| # If no block is given, an enumerator will be returned. |
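| # |
| # A hedged sketch, assuming +stream+ is a token stream. Because #walk |
| # consumes each token before yielding it, #look inside the block returns |
| # the on-channel token that follows the yielded one: |
| # |
| #   stream.walk do |token| |
| #     following = stream.look  # EOF token once the last token is reached |
| #     puts "#{ token.inspect } is followed by #{ following.inspect }" |
| #   end |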
| # |
| def walk |
| block_given? or return enum_for( :walk ) |
| initial_position = @position |
| begin |
| while token = look and token.type != EOF |
| consume |
| yield( token ) |
| end |
| return self |
| ensure |
| @position = initial_position |
| end |
| end |
| |
| # |
| # returns a copy of the token buffer. If +start+ and +stop+ are provided, tokens |
| # returns a slice of the token buffer from <tt>start..stop</tt>. The parameters |
| # are converted to integers with their <tt>to_i</tt> methods, and thus tokens |
| # can be provided to specify start and stop. If a block is provided, tokens are |
| # yielded and filtered out of the return array if the block returns a +false+ |
| # or +nil+ value. |
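| # |
| # A hedged sketch, assuming +stream+ is a token stream with at least six |
| # buffered tokens and that 0 is the default channel value (an assumption): |
| # |
| #   stream.tokens                         # copy of the whole buffer |
| #   stream.tokens( 2, 5 )                 # buffered tokens at indexes 2 through 5 |
| #   stream.tokens { |t| t.channel == 0 }  # only the tokens on channel 0 |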
| # |
| def tokens( start = nil, stop = nil ) |
| stop.nil? || stop >= @tokens.length and stop = @tokens.length - 1 |
| start.nil? || start < 0 and start = 0 |
| tokens = @tokens[ start..stop ] |
| |
| if block_given? |
| tokens.delete_if { |t| not yield( t ) } |
| end |
| |
| return( tokens ) |
| end |
| |
| |
| def at( i ) |
| @tokens.at i |
| end |
| |
| # |
| # identical to Array#[], as applied to the stream's token buffer |
| # |
| def []( i, *args ) |
| @tokens[ i, *args ] |
| end |
| |
| ###### Standard Conversion Methods ############################### |
| def inspect |
| string = "#<%p: @token_source=%p @ %p/%p" % |
| [ self.class, @token_source.class, @position, @tokens.length ] |
| tk = look( -1 ) and string << " #{ tk.inspect } <--" |
| tk = look( 1 ) and string << " --> #{ tk.inspect }" |
| string << '>' |
| end |
| |
| # |
| # fetches the text content of all tokens between +start+ and +stop+ and |
| # joins the chunks into a single string |
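| # |
| # A hedged sketch, assuming +stream+ buffers exactly five tokens for the |
| # source text <tt>"a = 1"</tt>: +a+, a space, <tt>=</tt>, a space, and +1+ |
| # (how a real lexer splits this text depends on its grammar): |
| # |
| #   stream.extract_text          # "a = 1" |
| #   stream.extract_text( 2 )     # "= 1" |
| #   stream.extract_text( 0, 2 )  # "a =" |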
| # |
| def extract_text( start = 0, stop = @tokens.length - 1 ) |
| start = start.to_i.at_least( 0 ) |
| stop = stop.to_i.at_most( @tokens.length ) |
| @tokens[ start..stop ].map! { |t| t.text }.join( '' ) |
| end |
| |
| alias to_s extract_text |
| |
| end |
| |
| end |