reworked expression filtering so you can just say "decode.utf8" or "decode.<whatever>" for generic expression decoding
diff --git a/doc/build/content/filtering.txt b/doc/build/content/filtering.txt index 519c830..05d18e8 100644 --- a/doc/build/content/filtering.txt +++ b/doc/build/content/filtering.txt
@@ -17,6 +17,7 @@ * `trim` : whitespace trimming, provided by `string.strip()` * `entity` : produces HTML entity references for applicable strings, derived from `htmlentitydefs` * `unicode` : produces a Python unicode string (this function is applied by default). +* `decode.<some encoding>` : decode input into a Python unicode with the specified encoding To apply more than one filter, separate them by a comma: @@ -58,7 +59,10 @@ {python} t = TemplateLookup(directories=['/tmp'], default_filters=['unicode']) -Further information about the default usage of `unicode` can be found in [unicode](rel:unicode). +To replace the usual `unicode` function with a specific encoding, the `decode` filter can be substituted: + + {python} + t = TemplateLookup(directories=['/tmp'], default_filters=['decode.utf8']) Any string name can be added to `default_filters` where it will be added to all expressions as a filter. The filters are applied from left to right, meaning the leftmost filter is applied first. @@ -81,12 +85,6 @@ def render_body(context): context.write(myfilter(unicode("some text"))) -The `unicode` function can be removed or replaced with another function, such as this example which assumes all expressions are utf-8 encoded strings: - - {python} - t = Template(text, - default_filters=['encodeutf8'], - imports=["encodeutf8 = lambda x:unicode(x, 'utf-8')"]) ### Filtering Defs
diff --git a/doc/build/content/unicode.txt b/doc/build/content/unicode.txt index 5f0a26a..5ac93fb 100644 --- a/doc/build/content/unicode.txt +++ b/doc/build/content/unicode.txt
@@ -47,7 +47,7 @@ {python} context.write(unicode("hello world")) -That is, **the output of all expressions is run through the `unicode` builtin**. As of 0.1.2 this behavior can be modified, but its recommended that its left in place, or substituted with another unicode-producing function. This is mainly so that any expression, such as an expression that results in an integer result or an object instance which contains a `__str__()` method, can be properly converted to a string before rendering without requiring any explicit step, but it also serves to ensure that the entire output stream of the template is available as a unicode object, which can then be rendered to any desired encoding. So the main implication of this is that **any raw bytestrings that contain an encoding other than ascii must first be decoded to a Python unicode object**. It means you can't say this: +That is, **the output of all expressions is run through the `unicode` builtin**. This is the default setting, and can be modified to expect various encodings. The `unicode` step serves both to render non-string expressions into strings (such as integers, or objects which implement a `__str__()` method) and to ensure that the final output stream is constructed as a unicode object. The main implication of this is that **any raw bytestrings that contain an encoding other than ascii must first be decoded to a Python unicode object**. It means you can't say this: ${"voix m’a réveillé."} # error ! @@ -58,20 +58,16 @@ Similarly, if you are reading data from a file, or returning data from some object that is returning a Python bytestring containing a non-ascii encoding, you have to explcitly decode to unicode first, such as: ${call_my_object().decode('utf-8')} + +If you want a certain encoding applied to *all* expressions, replace the default `unicode` filter with the `decode` filter at the `Template` or `TemplateLookup` level: - * **why convert all objects to a unicode first? 
What if I dont want that conversion to occur ?** - While there is an option to turn this behavior off, Mako is a textual template language, and its job is to produce textual output. It makes no sense to output data that is not in a string format. If you want to output raw bytes or something, such as for an image file, typically this should not be occuring from within a textual template. Besides, if you really want to, you can always write whatever data you want to the `context` directly: + {python} + t = Template(templatetext, default_filters=['decode.utf8']) - {python} - context.write(get_some_whatever_data()) +Note that the built-in `decode` object is slower than the `unicode` function, since unlike `unicode` it's not a Python builtin, and it also checks the type of the incoming data to determine if string conversion is needed first. - * **why use the unicode() function? Why not call str()?** - The output stream of a Mako template is a Python unicode object, which is then encoded when `render()` is called. Calling `str()` would produce a plain bytestring, not a unicode string. Plus it cant handle any non-`ascii` encoded values. +The `default_filters` argument can be used to entirely customize the filtering process of expressions. This argument is described in [filtering_expression_defaultfilters](rel:filtering_expression_defaultfilters). - * **why not send the input/output encoding of the template as the "encoding" argument to the "unicode" function? ** - This would be good. But unfortunately, when presented with the `encoding` argument, the `unicode` function suddenly gives up its role as the converter-of-non-string objects, and expressions that produce numerics and other non-string types (which are perfectly stringifiable otherwise via their `__str__()` function) will produce errors. 
- - * **ok, why not use a more intelligent function than unicode(), which looks at the input and figures out what kind of encoding/string conversion is needed ?** - the `unicode` function, being a Python builtin, is much faster than any custom function written in Python; using a slower function by default would introduce a usually unnecessary latency to Mako's performance. Additionally, Mako would rather have a very simple behavior rather than something more magical/implicit going on. Since if you want some other behavior, you can change it! - -If you want all expressions passed through a function other than `unicode()`, or even if you would like to try your hand at sending raw bytestrings straight through, work with the `default_filters` argument to `Template` or `TemplateLookup`, described in [filtering_expression_defaultfilters](rel:filtering_expression_defaultfilters). - ### Defining Output Encoding Now that we have a template which produces a pure unicode output stream, all the hard work is done. We can take the output and do anything with it.
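The explicit-decode step the doc insists on (the `call_my_object` name is the doc's own placeholder) amounts to the following, shown in Python 3 terms where `bytes.decode` plays the role of Python 2's `unicode(x, encoding)`:

```python
# a raw utf-8 bytestring, e.g. read from a file or returned by some API
raw = "voix m’a réveillé".encode("utf-8")

# must be decoded to a unicode string before a template can render it
text = raw.decode("utf-8")
```

After the decode, `text` is a proper unicode string and can flow into the template's unicode output stream.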
diff --git a/lib/mako/codegen.py b/lib/mako/codegen.py index d17ae64..477ccad 100644 --- a/lib/mako/codegen.py +++ b/lib/mako/codegen.py
@@ -391,8 +391,20 @@ def create_filter_callable(self, args, target, is_expression): """write a filter-applying expression based on the filters present in the given filter names, adjusting for the global 'default' filter aliases as needed.""" - d = dict([(k, (v is unicode and 'unicode' or "filters." + v.func_name)) for k, v in filters.DEFAULT_ESCAPES.iteritems()]) - + def locate_encode(name): + if re.match(r'decode\..+', name): + return "filters." + name + elif name == 'unicode': + return 'unicode' + else: + return \ + {'x':'filters.xml_escape', + 'h':'filters.html_escape', + 'u':'filters.url_escape', + 'trim':'filters.trim', + 'entity':'filters.html_entities_escape', + }.get(name, name) + if is_expression and self.compiler.pagetag: args = self.compiler.pagetag.filter_args.args + args if is_expression and self.compiler.default_filters: @@ -402,10 +414,13 @@ m = re.match(r'(.+?)(\(.*\))', e) if m: (ident, fargs) = m.group(1,2) - f = d.get(ident, ident) + f = locate_encode(ident) e = f + fargs else: - e = d.get(e, e) + x = e + e = locate_encode(e) + if e is None: + raise SyntaxError("expression filter not found: %s" % x) target = "%s(%s)" % (e, target) return target
diff --git a/lib/mako/filters.py b/lib/mako/filters.py index 5d782ee..d59e974 100644 --- a/lib/mako/filters.py +++ b/lib/mako/filters.py
@@ -37,7 +37,20 @@ def trim(string): return string.strip() - + +class Decode(object): + def __getattr__(self, key): + def decode(x): + if isinstance(x, unicode): + return x + if not isinstance(x, str): + return decode(str(x)) + return unicode(x, encoding=key) + return decode +decode = Decode() + + _ASCII_re = re.compile(r'\A[\x00-\x7f]*\Z') def is_ascii_str(text): @@ -142,12 +155,13 @@ # TODO: options to make this dynamic per-compilation will be added in a later release DEFAULT_ESCAPES = { - 'x':xml_escape, - 'h':html_escape, - 'u':url_escape, - 'trim':trim, - 'entity':html_entities_escape, - 'unicode':unicode + 'x':'filters.xml_escape', + 'h':'filters.html_escape', + 'u':'filters.url_escape', + 'trim':'filters.trim', + 'entity':'filters.html_entities_escape', + 'unicode':'unicode', + 'decode':'decode' }
diff --git a/test/filters.py b/test/filters.py index 314a6bc..8d9447c 100644 --- a/test/filters.py +++ b/test/filters.py
@@ -53,6 +53,13 @@ #print t.code assert t.render().strip()=="trim this string: some string to trim continue" + def test_encode_filter(self): + t = Template("""# coding: utf-8 + some stuff.... ${x} + """, default_filters=['decode.utf8']) + #print t.code + assert t.render_unicode(x="voix m’a réveillé").strip() == u"some stuff.... voix m’a réveillé" + def test_custom_default(self): t = Template(""" <%!