#!/usr/bin/ruby
# encoding: utf-8
=begin LICENSE
[The "BSD licence"]
Copyright (c) 2009-2010 Kyle Yetter
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
3. The name of the author may not be used to endorse or promote products
derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
=end
module ANTLR3
=begin rdoc ANTLR3::Stream
= ANTLR3 Streams
This documentation first covers the general concept of streams as used by ANTLR
recognizers, and then discusses the specific <tt>ANTLR3::Stream</tt> module.
== ANTLR Stream Classes
ANTLR recognizers need a way to walk through input data in a serialized IO-style
fashion. They also need some book-keeping about the input to provide useful
information to developers, such as current line number and column. Furthermore,
to implement backtracking and various error recovery techniques, recognizers
need a way to record locations in the input at various points in the
recognition process so the input state may be restored to a prior state.
ANTLR bundles all of this functionality into a number of Stream classes, each
designed to be used by recognizers for a specific recognition task. Most of the
Stream hierarchy is implemented in antlr3/stream.rb, which is loaded by default
when 'antlr3' is required.
---
Here's a brief overview of the various stream classes and their respective
purpose:
StringStream::
Similar to StringIO from the standard Ruby library, StringStream wraps raw
String data in a Stream interface for use by ANTLR lexers.
FileStream::
A subclass of StringStream, FileStream simply wraps data read from an IO or
File object for use by lexers.
CommonTokenStream::
The job of a TokenStream is to read lexer output and then provide ANTLR
parsers with the means to sequentially walk through the series of tokens.
CommonTokenStream is the default TokenStream implementation.
TokenRewriteStream::
A subclass of CommonTokenStream, TokenRewriteStreams provide rewriting-parsers
the ability to produce new output text from an input token-sequence by
managing rewrite "programs" on top of the stream.
CommonTreeNodeStream::
In a similar fashion to CommonTokenStream, CommonTreeNodeStream feeds tokens
to recognizers in a sequential fashion. However, the stream object serializes
an Abstract Syntax Tree into a flat, one-dimensional sequence, preserving
the two-dimensional shape of the tree with special UP and DOWN tokens. The
sequence is primarily used by ANTLR Tree Parsers. *note* -- this class is not
defined in antlr3/stream.rb, but in antlr3/tree.rb
---
The next few sections cover the most significant methods of all stream classes.
=== consume / look / peek
<tt>stream.consume</tt> is used to advance a stream one unit. StringStreams are
advanced by one character and TokenStreams are advanced by one token.
<tt>stream.peek(k = 1)</tt> is used to quickly retrieve the object of interest
to a recognizer at look-ahead position specified by <tt>k</tt>. For
<b>StringStreams</b>, this is the <i>integer value of the character</i>
<tt>k</tt> characters ahead of the stream cursor. For <b>TokenStreams</b>, this
is the <i>integer token type of the token</i> <tt>k</tt> tokens ahead of the
stream cursor.
<tt>stream.look(k = 1)</tt> is used to retrieve the full object of interest at
look-ahead position specified by <tt>k</tt>. While <tt>peek</tt> provides the
<i>bare-minimum lightweight information</i> that the recognizer needs,
<tt>look</tt> provides the <i>full object of concern</i> in the stream. For
<b>StringStreams</b>, this is a <i>string object containing the single
character</i> <tt>k</tt> characters ahead of the stream cursor. For
<b>TokenStreams</b>, this is the <i>full token structure</i> <tt>k</tt> tokens
ahead of the stream cursor.
<b>Note:</b> in most ANTLR runtime APIs for other languages, <tt>peek</tt> is
implemented by some method with a name like <tt>LA(k)</tt> and <tt>look</tt> is
implemented by some method with a name like <tt>LT(k)</tt>. When writing this
Ruby runtime API, I found this naming practice confusing, ambiguous, and
un-Ruby-like. Thus, I chose <tt>peek</tt> and <tt>look</tt> to represent a
quick-look (peek) and a full-fledged look-ahead operation (look). If this causes
confusion or any sort of compatibility strife for developers using this
implementation, all apologies.
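The division of labor between the two operations can be sketched with a small
self-contained class. <tt>ToyCharStream</tt> below is a hypothetical stand-in
for illustration only -- it is not part of the ANTLR3 runtime -- but its
<tt>consume</tt>, <tt>peek</tt>, and <tt>look</tt> methods follow the
semantics described above:

```ruby
# Minimal sketch of the consume / peek / look semantics described above.
# ToyCharStream is hypothetical and not part of the ANTLR3 runtime.
class ToyCharStream
  def initialize( data )
    @data = data
    @position = 0
  end

  # advance the stream cursor by one character
  def consume
    @position += 1 if @position < @data.length
  end

  # lightweight look-ahead: the integer character value at distance k
  def peek( k = 1 )
    c = @data[ @position + k - 1 ] and c.ord
  end

  # full look-ahead: the character at distance k as a String
  def look( k = 1 )
    @data[ @position + k - 1 ]
  end
end

stream = ToyCharStream.new( "abc" )
stream.peek       # => 97 (the integer value of 'a')
stream.look       # => "a"
stream.consume
stream.look( 2 )  # => "c"
```

A real recognizer hits <tt>peek</tt> constantly while making decisions, so
keeping it cheap (a bare integer) while reserving the heavier object for
<tt>look</tt> is the point of the split.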
=== mark / rewind / release
<tt>marker = stream.mark</tt> causes the stream to record important information
about the current stream state, place the data in an internal memory table, and
return a memento, <tt>marker</tt>. The marker object is typically an integer key
to the stream's internal memory table.
Used in tandem with <tt>stream.rewind(mark = last_marker)</tt>, the marker can
be used to restore the stream to an earlier state. This is used by recognizers
to perform tasks such as backtracking and error recovery.
<tt>stream.release(marker = last_marker)</tt> can be used to release an existing
state marker from the memory table.
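The memento bookkeeping behind these three methods can be sketched in a few
lines. <tt>ToyStream</tt> is hypothetical and for illustration only; it keeps
just a position, whereas the real stream classes save richer state (line and
column, for example):

```ruby
# Minimal sketch of the mark / rewind / release bookkeeping described
# above. ToyStream is hypothetical and not part of the ANTLR3 runtime.
class ToyStream
  attr_reader :position

  def initialize
    @position = 0
    @markers = []          # internal memory table of saved states
  end

  def consume
    @position += 1
  end

  # save the current state; the returned marker indexes into @markers
  def mark
    @markers << @position
    @markers.length - 1
  end

  # restore the state saved under +marker+
  def rewind( marker = @markers.length - 1 )
    @position = @markers[ marker ]
    release( marker )
    self
  end

  # drop the state saved for +marker+ and everything saved after it
  def release( marker = @markers.length - 1 )
    @markers.pop( @markers.length - marker )
    self
  end
end

stream = ToyStream.new
3.times { stream.consume }
marker = stream.mark       # remember position 3
2.times { stream.consume } # advance to position 5
stream.rewind( marker )
stream.position            # => 3
```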
=== seek
<tt>stream.seek(position)</tt> moves the stream cursor to an absolute position
within the stream, basically like typical ruby <tt>IO#seek</tt> style methods.
However, unlike <tt>IO#seek</tt>, ANTLR streams currently always use absolute
position seeking.
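The contrast with <tt>IO#seek</tt> can be made concrete with a small
hypothetical sketch (for illustration only): there is no
<tt>IO::SEEK_CUR</tt>-style relative mode, and the index is simply clamped to
the stream's range.

```ruby
# Minimal sketch of absolute-only seeking, as described above.
# ToySeekableStream is hypothetical and not part of the ANTLR3 runtime.
class ToySeekableStream
  attr_reader :position

  def initialize( length )
    @length = length
    @position = 0
  end

  # +index+ is always an absolute position, clamped to 0..@length
  def seek( index )
    @position = [ [ index, 0 ].max, @length ].min
  end
end

stream = ToySeekableStream.new( 10 )
stream.seek( 7 )
stream.position   # => 7
stream.seek( 99 )
stream.position   # => 10 (clamped to the stream length)
```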
== The Stream Module
<tt>ANTLR3::Stream</tt> is an abstract-ish base mixin for all IO-like stream
classes used by ANTLR recognizers.
The module doesn't do much on its own besides define arguably annoying
``abstract'' pseudo-methods that demand implementation when it is mixed in to a
class that wants to be a Stream. Right now this exists as an artifact of porting
the ANTLR Java/Python runtime library to Ruby. In Java, of course, this is
represented as an interface. In Ruby, however, objects are duck-typed and
interfaces aren't that useful as programmatic entities -- in fact, it's mildly
wasteful to have a module like this hanging out. Thus, I may axe it.
When mixed in, the module does give the class #size and #source_name
attribute methods.
Except in a small handful of places, most of the ANTLR runtime library uses
duck-typing and not type checking on objects. This means that the methods which
manipulate stream objects don't usually bother checking that the object is a
Stream and assume that the object implements the proper stream interface. Thus,
it is not strictly necessary that custom stream objects include ANTLR3::Stream,
though it isn't a bad idea.
=end
module Stream
include ANTLR3::Constants
extend ClassMacros
##
# :method: consume
# used to advance a stream one unit (such as character or token)
abstract :consume
##
# :method: peek( k = 1 )
# used to quickly retrieve the object of interest to a recognizer at lookahead
# position specified by <tt>k</tt> (such as integer value of a character or an
# integer token type)
abstract :peek
##
# :method: look( k = 1 )
# used to retrieve the full object of interest at lookahead position specified
# by <tt>k</tt> (such as a character string or a token structure)
abstract :look
##
# :method: mark
# saves the current position for the purposes of backtracking and
# returns a value to pass to #rewind at a later time
abstract :mark
##
# :method: index
# returns the current position of the stream
abstract :index
##
# :method: rewind( marker = last_marker )
# restores the stream position using the state information previously saved
# by the given marker
abstract :rewind
##
# :method: release( marker = last_marker )
# clears the saved state information associated with the given marker value
abstract :release
##
# :method: seek( position )
# move the stream to the given absolute index given by +position+
abstract :seek
##
# the total number of symbols in the stream
attr_reader :size
##
# indicates an identifying name for the stream -- usually the file path of the input
attr_accessor :source_name
end
=begin rdoc ANTLR3::CharacterStream
CharacterStream further extends the abstract-ish base mixin Stream to add
methods specific to navigating character-based input data. Thus, it serves as an
imitation of the Java interface for text-based streams, which are primarily
used by lexers.
It adds the ``abstract'' method, <tt>substring(start, stop)</tt>, which must be
implemented to return a slice of the input string from position <tt>start</tt>
to position <tt>stop</tt>. It also adds attribute accessor methods <tt>line</tt>
and <tt>column</tt>, which are expected to indicate the current line number and
position within the current line, respectively.
== A Word About <tt>line</tt> and <tt>column</tt> attributes
Presumably, the concept of <tt>line</tt> and <tt>column</tt> attributes of text
is familiar to most developers. Line numbers of text are indexed from number 1
up (not 0). Column numbers are indexed from 0 up. Thus, examining sample text:
Hey this is the first line.
Oh, and this is the second line.
Line 1 is the string "Hey this is the first line.\\n". If a character stream is at
line 2, character 0, the stream cursor is sitting between the characters "\\n"
and "O".
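The indexing convention can be sketched with a small hypothetical class
(<tt>ToyLineTracker</tt> below is for illustration only, not the runtime's
actual implementation): lines count from 1, columns from 0, and consuming a
newline bumps the line and resets the column.

```ruby
# Minimal sketch of line / column tracking as described above.
# ToyLineTracker is hypothetical and not part of the ANTLR3 runtime.
class ToyLineTracker
  attr_reader :line, :column

  def initialize( data )
    @data = data
    @position = 0
    @line = 1       # lines are indexed from 1
    @column = 0     # columns are indexed from 0
  end

  def consume
    c = @data[ @position ] or return nil
    if c == "\n"
      @line += 1
      @column = 0   # a newline starts a fresh line at column 0
    else
      @column += 1
    end
    @position += 1
    c
  end
end

tracker = ToyLineTracker.new( "Hey\nOh" )
4.times { tracker.consume }   # consume "H", "e", "y", "\n"
tracker.line     # => 2
tracker.column   # => 0
```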
*Note:* most ANTLR runtime APIs for other languages refer to <tt>column</tt>
with the more precise but lengthy name <tt>charPositionInLine</tt>. I preferred
to keep it simple and familiar in this Ruby runtime API.
=end
module CharacterStream
include Stream
extend ClassMacros
include Constants
##
# :method: substring(start,stop)
abstract :substring
attr_accessor :line
attr_accessor :column
end
=begin rdoc ANTLR3::TokenStream
TokenStream further extends the abstract-ish base mixin Stream to add methods
specific to navigating token sequences. Thus, it serves as an imitation of the
Java interface for token-based streams, which are used by many different
components in ANTLR, including parsers and tree parsers.
== Token Streams
Token streams wrap a sequence of token objects produced by some token source,
usually a lexer. They provide the operations required by higher-level
recognizers, such as parsers and tree parsers for navigating through the
sequence of tokens. Unlike simple character-based streams, such as StringStream,
token-based streams have an additional level of complexity because they must
manage the task of "tuning" to a specific token channel.
One of the main advantages of ANTLR-based recognition is the token
<i>channel</i> feature, which allows you to hold on to all tokens of interest
while only presenting a specific set of interesting tokens to a parser. For
example, if you need to hide whitespace and comments from a parser, but hang on
to them for some other purpose, you have the lexer assign the comments and
whitespace to channel value HIDDEN as it creates the tokens.
When you create a token stream, you can tune it to some specific channel value.
Then, all <tt>peek</tt>, <tt>look</tt>, and <tt>consume</tt> operations only
yield tokens that have the same value for <tt>channel</tt>. The stream skips
over any non-matching tokens in between.
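The skipping behavior can be sketched with a small self-contained example.
The class, token structure, and channel constants below are hypothetical
stand-ins for illustration only (the real stream also handles consuming,
marking, and EOF):

```ruby
# Minimal sketch of channel "tuning" as described above. ToyToken,
# ToyTokenStream, and the channel constants are hypothetical stand-ins
# and not part of the ANTLR3 runtime.
DEFAULT = 0
HIDDEN  = 99

ToyToken = Struct.new( :text, :channel )

class ToyTokenStream
  def initialize( tokens, channel = DEFAULT )
    @tokens = tokens
    @channel = channel   # the channel this stream is "tuned" to
    @position = 0
  end

  # full look-ahead at distance k, skipping tokens on other channels
  def look( k = 1 )
    cursor = @position - 1
    k.times do
      begin
        cursor += 1
        token = @tokens[ cursor ] or return nil
      end until token.channel == @channel
    end
    @tokens[ cursor ]
  end
end

tokens = [
  ToyToken.new( "35", DEFAULT ),
  ToyToken.new( " ",  HIDDEN  ),   # off-channel: invisible to look
  ToyToken.new( "*",  DEFAULT )
]
stream = ToyTokenStream.new( tokens )
stream.look( 1 ).text   # => "35"
stream.look( 2 ).text   # => "*" -- the hidden whitespace is skipped
```

Tuning the same token buffer to <tt>HIDDEN</tt> instead would make
<tt>look(1)</tt> yield the whitespace token, which is how a second stream can
recover the tokens the parser never sees.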
== The TokenStream Interface
In addition to the abstract methods and attribute methods provided by the base
Stream module, TokenStream adds a number of additional method implementation
requirements and attributes.
=end
module TokenStream
include Stream
extend ClassMacros
##
# expected to return the token source object (such as a lexer) from which
# all tokens in the stream were retrieved
attr_reader :token_source
##
# expected to return the value of the last marker produced by a call to
# <tt>stream.mark</tt>
attr_reader :last_marker
##
# expected to return the integer index of the stream cursor
attr_reader :position
##
# the integer channel value to which the stream is ``tuned''
attr_accessor :channel
##
# :method: to_s(start=0,stop=tokens.length-1)
# should take the tokens between start and stop in the sequence, extract their text
# and return the concatenation of all the text chunks
abstract :to_s
##
# :method: at( i )
# return the stream symbol at index +i+
abstract :at
end
=begin rdoc ANTLR3::StringStream
A StringStream's purpose is to wrap the basic, naked text input of a recognition
system. Like all other stream types, it provides serial navigation of the input;
a recognizer can arbitrarily step forward and backward through the stream's
symbols as it requires. StringStream and its subclasses are the main way to
feed text input into an ANTLR Lexer for token processing.
The stream's symbols of interest, of course, are character values. Thus, the
#peek method returns the integer character value at look-ahead position
<tt>k</tt> and the #look method returns the character value as a +String+. They
also track various pieces of information such as the line and column numbers at
the current position.
=== Note About Text Encoding
This version of the runtime library primarily targets ruby version 1.8, which
does not have strong built-in support for multi-byte character encodings. Thus,
characters are assumed to be represented by a single byte -- an integer between
0 and 255. Ruby 1.9 does provide built-in encoding support for multi-byte
characters, but currently this library does not provide any streams to handle
non-ASCII encoding. However, encoding-savvy recognition code is a future
development goal for this project.
=end
class StringStream
NEWLINE = ?\n.ord
include CharacterStream
# current integer character index of the stream
attr_reader :position
# the current line number of the input, indexed upward from 1
attr_reader :line
# the current character position within the current line, indexed upward from 0
attr_reader :column
# the name associated with the stream -- usually a file name
# defaults to <tt>"(string)"</tt>
attr_accessor :name
# the entire string that is wrapped by the stream
attr_reader :data
attr_reader :string
if RUBY_VERSION =~ /^1\.9/
# creates a new StringStream object where +data+ is the string data to stream.
# accepts the following options in a symbol-to-value hash:
#
# [:file or :name] the (file) name to associate with the stream; default: <tt>'(string)'</tt>
# [:line] the initial line number; default: +1+
# [:column] the initial column number; default: +0+
#
def initialize( data, options = {} ) # for 1.9
@string = data.to_s.encode( Encoding::UTF_8 ).freeze
@data = @string.codepoints.to_a.freeze
@position = options.fetch :position, 0
@line = options.fetch :line, 1
@column = options.fetch :column, 0
@markers = []
@name ||= options[ :file ] || options[ :name ] # || '(string)'
mark
end
#
# identical to #peek, except it returns the character value as a String
#
def look( k = 1 ) # for 1.9
k == 0 and return nil
k += 1 if k < 0
index = @position + k - 1
index < 0 and return nil
@string[ index ]
end
else
# creates a new StringStream object where +data+ is the string data to stream.
# accepts the following options in a symbol-to-value hash:
#
# [:file or :name] the (file) name to associate with the stream; default: <tt>'(string)'</tt>
# [:line] the initial line number; default: +1+
# [:column] the initial column number; default: +0+
#
def initialize( data, options = {} ) # for 1.8
@data = data.to_s
@data.equal?( data ) and @data = @data.clone
@data.freeze
@string = @data
@position = options.fetch :position, 0
@line = options.fetch :line, 1
@column = options.fetch :column, 0
@markers = []
@name ||= options[ :file ] || options[ :name ] # || '(string)'
mark
end
#
# identical to #peek, except it returns the character value as a String
#
def look( k = 1 ) # for 1.8
k == 0 and return nil
k += 1 if k < 0
index = @position + k - 1
index < 0 and return nil
c = @data[ index ] and c.chr
end
end
def size
@data.length
end
alias length size
#
# rewinds the stream back to the start and clears out any existing marker entries
#
def reset
initial_location = @markers.first
@position, @line, @column = initial_location
@markers.clear
@markers << initial_location
return self
end
#
# advance the stream by one character; returns the character consumed
#
def consume
c = @data[ @position ] || EOF
if @position < @data.length
@column += 1
if c == NEWLINE
@line += 1
@column = 0
end
@position += 1
end
return( c )
end
#
# return the character at look-ahead distance +k+ as an integer. <tt>k = 1</tt> represents
# the current character. +k+ greater than 1 represents upcoming characters. A negative
# value of +k+ returns previous characters consumed, where <tt>k = -1</tt> is the last
# character consumed. <tt>k = 0</tt> has undefined behavior and returns +nil+
#
def peek( k = 1 )
k == 0 and return nil
k += 1 if k < 0
index = @position + k - 1
index < 0 and return nil
@data[ index ] or EOF
end
#
# return a substring around the stream cursor at a distance +k+
# if <tt>k >= 0</tt>, return the next k characters
# if <tt>k < 0</tt>, return the previous <tt>|k|</tt> characters
#
def through( k )
if k >= 0 then @string[ @position, k ] else
start = ( @position + k ).at_least( 0 ) # start cannot be negative or index will wrap around
@string[ start ... @position ]
end
end
# operator style look-ahead
alias >> look
# operator style look-behind
def <<( k )
self >> -k
end
alias index position
alias character_index position
alias source_name name
#
# Returns true if the stream appears to be at the beginning of a new line.
# This is an extra utility method for use inside lexer actions if needed.
#
def beginning_of_line?
@position.zero? or @data[ @position - 1 ] == NEWLINE
end
#
# Returns true if the stream appears to be at the end of a line.
# This is an extra utility method for use inside lexer actions if needed.
#
def end_of_line?
@data[ @position ] == NEWLINE #if @position < @data.length
end
#
# Returns true if the stream has been exhausted.
# This is an extra utility method for use inside lexer actions if needed.
#
def end_of_string?
@position >= @data.length
end
#
# Returns true if the stream appears to be at the beginning of a stream (position = 0).
# This is an extra utility method for use inside lexer actions if needed.
#
def beginning_of_string?
@position == 0
end
alias eof? end_of_string?
alias bof? beginning_of_string?
#
# record the current stream location parameters in the stream's marker table and
# return an integer-valued bookmark that may be used to restore the stream's
# position with the #rewind method. This method is used to implement backtracking.
#
def mark
state = [ @position, @line, @column ].freeze
@markers << state
return @markers.length - 1
end
#
# restore the stream to an earlier location recorded by #mark. If no marker value is
# provided, the last marker generated by #mark will be used.
#
def rewind( marker = @markers.length - 1, release = true )
( marker >= 0 and location = @markers[ marker ] ) or return( self )
@position, @line, @column = location
release( marker ) if release
return self
end
#
# the total number of markers currently in existence
#
def mark_depth
@markers.length
end
#
# the last marker value created by a call to #mark
#
def last_marker
@markers.length - 1
end
#
# let go of the bookmark data for the marker and all marker
# values created after the marker.
#
def release( marker = @markers.length - 1 )
marker.between?( 1, @markers.length - 1 ) or return
@markers.pop( @markers.length - marker )
return self
end
#
# jump to the absolute position value given by +index+.
# note: if +index+ is before the current position, the +line+ and +column+
# attributes of the stream will probably be incorrect
#
def seek( index )
index = index.bound( 0, @data.length ) # ensures index is within the stream's range
if index > @position
skipped = through( index - @position )
if lc = skipped.count( "\n" ) and lc.zero?
@column += skipped.length
else
@line += lc
@column = skipped.length - skipped.rindex( "\n" ) - 1
end
end
@position = index
return nil
end
#
# customized object inspection that shows:
# * the stream class
# * the stream's location in <tt>index / line:column</tt> format
# * +before_chars+ characters before the cursor (6 characters by default)
# * +after_chars+ characters after the cursor (10 characters by default)
#
def inspect( before_chars = 6, after_chars = 10 )
before = through( -before_chars ).inspect
@position - before_chars > 0 and before.insert( 0, '... ' )
after = through( after_chars ).inspect
@position + after_chars + 1 < @data.length and after << ' ...'
location = "#@position / line #@line:#@column"
"#<#{ self.class }: #{ before } | #{ after } @ #{ location }>"
end
#
# return the string slice between position +start+ and +stop+
#
def substring( start, stop )
@string[ start, stop - start + 1 ]
end
#
# identical to String#[]
#
def []( start, *args )
@string[ start, *args ]
end
end
=begin rdoc ANTLR3::FileStream
FileStream is a character stream that uses data stored in some external file. It
is nearly identical to StringStream, but it uses data located in a file
while automatically setting up the +source_name+ and +line+ parameters. It does
not actually use any buffered IO operations throughout the stream navigation
process. Instead, it reads the file data once when the stream is initialized.
=end
class FileStream < StringStream
#
# creates a new FileStream object using the given +file+ object.
# If +file+ is a path string, the file will be read and the contents
# will be used and the +name+ attribute will be set to the path.
# If +file+ is an IO-like object (that responds to :read),
# the content of the object will be used and the stream will
# attempt to set its +name+ object first trying the method #name
# on the object, then trying the method #path on the object.
#
# see StringStream.new for a list of additional options
# the constructor accepts
#
def initialize( file, options = {} )
case file
when $stdin then
data = $stdin.read
@name = '(stdin)'
when ARGF
data = file.read
@name = file.path
when ::File then
file = file.clone
file.reopen( file.path, 'r' )
@name = file.path
data = file.read
file.close
else
if file.respond_to?( :read )
data = file.read
if file.respond_to?( :name ) then @name = file.name
elsif file.respond_to?( :path ) then @name = file.path
end
else
@name = file.to_s
if test( ?f, @name ) then data = File.read( @name )
else raise ArgumentError, "could not find an existing file at %p" % @name
end
end
end
super( data, options )
end
end
=begin rdoc ANTLR3::CommonTokenStream
CommonTokenStream serves as the primary token stream implementation for feeding
sequential token input into parsers.
Using some TokenSource (such as a lexer), the stream collects a token sequence,
setting the token's <tt>index</tt> attribute to indicate the token's position
within the stream. The streams may be tuned to some channel value; off-channel
tokens will be filtered out by the #peek, #look, and #consume methods.
=== Sample Usage
source_input = ANTLR3::StringStream.new("35 * 4 - 1")
lexer = Calculator::Lexer.new(source_input)
tokens = ANTLR3::CommonTokenStream.new(lexer)
# assume this grammar defines whitespace as tokens on channel HIDDEN
# and numbers and operations as tokens on channel DEFAULT
tokens.look # => 0 INT['35'] @ line 1 col 0 (0..1)
tokens.look(2) # => 2 MULT["*"] @ line 1 col 2 (3..3)
tokens.tokens(0, 2)
# => [0 INT["35"] @line 1 col 0 (0..1),
# 1 WS[" "] @line 1 col 2 (1..1),
# 2 MULT["*"] @ line 1 col 3 (3..3)]
# notice the #tokens method does not filter off-channel tokens
lexer.reset
hidden_tokens =
ANTLR3::CommonTokenStream.new(lexer, :channel => ANTLR3::HIDDEN)
hidden_tokens.look # => 1 WS[' '] @ line 1 col 2 (1..1)
=end
class CommonTokenStream
include TokenStream
include Enumerable
#
# constructs a new token stream using the +token_source+ provided. +token_source+ is
# usually a lexer, but can be any object that implements +next_token+ and includes
# ANTLR3::TokenSource.
#
# If a block is provided, each token harvested will be yielded and if the block
# returns a +nil+ or +false+ value, the token will not be added to the stream --
# it will be discarded.
#
# === Options
# [:channel] The channel value the stream should be tuned to initially
# [:source_name] The source name (file name) attribute of the stream
#
# === Example
#
# # create a new token stream that is tuned to channel :comment, and
# # discard all WHITE_SPACE tokens
# ANTLR3::CommonTokenStream.new(lexer, :channel => :comment) do |token|
# token.name != 'WHITE_SPACE'
# end
#
def initialize( token_source, options = {} )
case token_source
when CommonTokenStream
# this is useful in cases where you want to convert a CommonTokenStream
# to a RewriteTokenStream or other variation of the standard token stream
stream = token_source
@token_source = stream.token_source
@channel = options.fetch( :channel ) { stream.channel or DEFAULT_CHANNEL }
@source_name = options.fetch( :source_name ) { stream.source_name }
tokens = stream.tokens.map { | t | t.dup }
else
@token_source = token_source
@channel = options.fetch( :channel, DEFAULT_CHANNEL )
@source_name = options.fetch( :source_name ) { @token_source.source_name rescue nil }
tokens = @token_source.to_a
end
@last_marker = nil
@tokens = block_given? ? tokens.select { | t | yield( t, self ) } : tokens
@tokens.each_with_index { |t, i| t.index = i }
@position =
if first_token = @tokens.find { |t| t.channel == @channel }
@tokens.index( first_token )
else @tokens.length
end
end
#
# resets the token stream and rebuilds it with a potentially new token source.
# If no +token_source+ value is provided, the stream will attempt to reset the
# current +token_source+ by calling +reset+ on the object. The stream will
# then clear the token buffer and attempt to harvest new tokens. Identical in
# behavior to CommonTokenStream.new, if a block is provided, tokens will be
# yielded and discarded if the block returns a +false+ or +nil+ value.
#
def rebuild( token_source = nil )
if token_source.nil?
@token_source.reset rescue nil
else @token_source = token_source
end
@tokens = block_given? ? @token_source.select { |token| yield( token ) } :
@token_source.to_a
@tokens.each_with_index { |t, i| t.index = i }
@last_marker = nil
@position =
if first_token = @tokens.find { |t| t.channel == @channel }
@tokens.index( first_token )
else @tokens.length
end
return self
end
#
# tune the stream to a new channel value
#
def tune_to( channel )
@channel = channel
end
def token_class
@token_source.token_class
rescue NoMethodError
@position == -1 and fill_buffer
@tokens.empty? ? CommonToken : @tokens.first.class
end
alias index position
def size
@tokens.length
end
alias length size
###### State-Control ################################################
#
# rewind the stream to its initial state
#
def reset
@position = 0
@position += 1 while token = @tokens[ @position ] and
token.channel != @channel
@last_marker = nil
return self
end
#
# bookmark the current position of the input stream
#
def mark
@last_marker = @position
end
def release( marker = nil )
# do nothing
end
def rewind( marker = @last_marker, release = true )
seek( marker )
end
#
# saves the current stream position, yields to the block,
# and then ensures the stream's position is restored before
# returning the value of the block
#
def hold( pos = @position )
block_given? or return enum_for( :hold, pos )
begin
yield
ensure
seek( pos )
end
end
###### Stream Navigation ###########################################
#
# advance the stream one step to the next on-channel token
#
def consume
token = @tokens[ @position ] || EOF_TOKEN
if @position < @tokens.length
@position = future?( 2 ) || @tokens.length
end
return( token )
end
#
# jump to the stream position specified by +index+
# note: seek does not check whether or not the
# token at the specified position is on-channel,
#
def seek( index )
@position = index.to_i.bound( 0, @tokens.length )
return self
end
#
# return the type of the on-channel token at look-ahead distance +k+. <tt>k = 1</tt> represents
# the current token. +k+ greater than 1 represents upcoming on-channel tokens. A negative
# value of +k+ returns previous on-channel tokens consumed, where <tt>k = -1</tt> is the last
# on-channel token consumed. <tt>k = 0</tt> has undefined behavior and returns +nil+
#
def peek( k = 1 )
tk = look( k ) and return( tk.type )
end
#
# operates similarly to #peek, but returns the full token object at look-ahead position +k+
#
def look( k = 1 )
index = future?( k ) or return nil
@tokens.fetch( index, EOF_TOKEN )
end
alias >> look
def << k
self >> -k
end
#
# returns the index of the on-channel token at look-ahead position +k+ or nil if no other
# on-channel tokens exist
#
def future?( k = 1 )
@position == -1 and fill_buffer
case
when k == 0 then nil
when k < 0 then past?( -k )
when k == 1 then @position
else
# since the stream only yields on-channel
# tokens, the stream can't just go to the
# next position, but rather must skip
# over off-channel tokens
( k - 1 ).times.inject( @position ) do |cursor, |
begin
tk = @tokens.at( cursor += 1 ) or return( cursor )
# ^- if tk is nil (i.e. i is outside array limits)
end until tk.channel == @channel
cursor
end
end
end
#
# returns the index of the on-channel token at look-behind position +k+ or nil if no other
# on-channel tokens exist before the current token
#
def past?( k = 1 )
@position == -1 and fill_buffer
case
when k == 0 then nil
when @position - k < 0 then nil
else
k.times.inject( @position ) do |cursor, |
begin
cursor <= 0 and return( nil )
tk = @tokens.at( cursor -= 1 ) or return( nil )
end until tk.channel == @channel
cursor
end
end
end
#
# yields each token in the stream (including off-channel tokens)
# If no block is provided, the method returns an Enumerator object.
# #each accepts the same arguments as #tokens
#
def each( *args )
block_given? or return enum_for( :each, *args )
tokens( *args ).each { |token| yield( token ) }
end
#
# yields each token in the stream with the given channel value
# If no channel value is given, the stream's tuned channel value will be used.
# If no block is given, an enumerator will be returned.
#
def each_on_channel( channel = @channel )
block_given? or return enum_for( :each_on_channel, channel )
for token in @tokens
token.channel == channel and yield( token )
end
end
#
# iterates through the token stream, yielding each on channel token along the way.
# After iteration has completed, the stream's position will be restored to where
it was before #walk was called. While #each and #each_on_channel do not change
the stream's position during iteration, #walk advances through the stream. This
# makes it possible to look ahead and behind the current token during iteration.
# If no block is given, an enumerator will be returned.
#
def walk
block_given? or return enum_for( :walk )
initial_position = @position
begin
while token = look and token.type != EOF
consume
yield( token )
end
return self
ensure
@position = initial_position
end
end
#
# returns a copy of the token buffer. If +start+ and +stop+ are provided, tokens
# returns a slice of the token buffer from <tt>start..stop</tt>. The parameters
# are converted to integers with their <tt>to_i</tt> methods, and thus tokens
# can be provided to specify start and stop. If a block is provided, tokens are
# yielded and filtered out of the return array if the block returns a +false+
# or +nil+ value.
#
def tokens( start = nil, stop = nil )
stop.nil? || stop >= @tokens.length and stop = @tokens.length - 1
start.nil? || start < 0 and start = 0
tokens = @tokens[ start..stop ]
if block_given?
tokens.delete_if { |t| not yield( t ) }
end
return( tokens )
end
def at( i )
@tokens.at i
end
#
# identical to Array#[], as applied to the stream's token buffer
#
def []( i, *args )
@tokens[ i, *args ]
end
###### Standard Conversion Methods ###############################
def inspect
string = "#<%p: @token_source=%p @ %p/%p" %
[ self.class, @token_source.class, @position, @tokens.length ]
tk = look( -1 ) and string << " #{ tk.inspect } <--"
tk = look( 1 ) and string << " --> #{ tk.inspect }"
string << '>'
end
#
# fetches the text content of all tokens between +start+ and +stop+ and
# joins the chunks into a single string
#
def extract_text( start = 0, stop = @tokens.length - 1 )
start = start.to_i.at_least( 0 )
stop = stop.to_i.at_most( @tokens.length )
@tokens[ start..stop ].map! { |t| t.text }.join( '' )
end
alias to_s extract_text
end
end