Validate Python string translation in Transifex

Transifex already supports validating translations of old-style Python format strings, e.g.,

[sourcecode language="python"]
"A sample string with a %(keyword)s argument." % {'keyword': 'key word'}
[/sourcecode]

The validator checks that every positional and keyword argument in the source string is present in the translation, and that the translation contains no extra argument that is absent from the source string. You can have a look at the validator code here.
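As a rough illustration of that check (a minimal sketch, not Transifex's actual validator), one could collect the printf-style specifiers from both strings and compare them. The names PRINTF_RE, printf_specifiers and is_valid_translation are hypothetical and exist only for this example:

```python
import re

# Hypothetical pattern (an assumption, not the Transifex one): matches
# keyword specifiers such as %(keyword)s and positional ones such as %s or %5.2f.
PRINTF_RE = re.compile(
    r'%(?:\((?P<key>\w+)\))?[#0\- +]*(?:\d+|\*)?(?:\.(?:\d+|\*))?[diouxXeEfFgGcrs]'
)


def printf_specifiers(text):
    """Return the sorted list of printf-style specifiers found in text."""
    # Drop escaped percent signs first so that "%%s" is not read as "%s".
    text = text.replace('%%', '')
    return sorted(m.group(0) for m in PRINTF_RE.finditer(text))


def is_valid_translation(source, translation):
    """True when both strings contain exactly the same specifiers."""
    return printf_specifiers(source) == printf_specifiers(translation)
```

Sorting the specifiers lets translators reorder keyword arguments freely while still catching a missing or extra one.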

However, the existing validator cannot check the replacement fields in new-style Python format strings, e.g.,

[sourcecode language="python"]
"This is a sample string with different replacement fields: {0} {1} {foo[bar]:^30}".format(
    "arg0", "arg1", foo={"bar": "a kwarg"})
[/sourcecode]

I tried to devise a regex to extract the replacement fields in the Python format string based on the grammar defined here.

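[sourcecode language="python"]
# Regex to find replacement fields in a new-style Python format string.
# The group names and the bodies of format_spec and replacement_field are an
# assumed reconstruction based on the format string grammar.

import re

field_name = r'(?P<field_name>(?P<arg_name>\w+|\d+){0,1}(?:\.\w+|\[[^\]]+\])*)'
conversion = r'(?P<conversion>r|s)'
align = r'(?:(?P<fill>[^}{]?)(?P<align>[<>^=]))'
sign = r'(?P<sign>[+\- ])'
width = r'(?P<width>\d+)'
precision = r'(?P<precision>\d+)'
type_ = r'(?P<type>[bcdeEfFgGnosxX%])'
format_spec = (
    r'(?P<format_spec>%(align)s?%(sign)s?#?0?%(width)s?,?(?:\.%(precision)s)?%(type)s?)'
) % {
    'align': align,
    'sign': sign,
    'width': width,
    'precision': precision,
    'type': type_,
}

replacement_field = (
    r'\{%(name)s(?:!%(conversion)s)?(?::%(spec)s)?\}'
) % {
    'name': field_name,
    'conversion': conversion,
    'spec': format_spec,
}

printf_re = re.compile('(?:' + replacement_field + ')')
[/sourcecode]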

Well, with the above, I was able to parse almost all the cases discussed here except for this one:

[sourcecode language="python"]
import datetime
d = datetime.datetime(2010, 7, 4, 12, 15, 58)
s = '{:%Y-%m-%d %H:%M:%S}'.format(d)
[/sourcecode]

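I was not sure how to fit the above case into my regex: here the format spec is a strftime pattern rather than the standard align/sign/width/precision/type layout. After some discussion in #python on IRC, I learned about the limitations of regular expressions, including that they are not Turing complete. People suggested that I use a parser instead.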
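The underlying reason, as far as I understand it, is that str.format() hands everything after the colon to the object's __format__() method, so each type is free to define what its format spec means. A small sketch of this follows; the Temperature class is hypothetical and exists only for illustration:

```python
import datetime


class Temperature(object):
    """Hypothetical type whose format spec is a unit letter, not the standard layout."""

    def __init__(self, celsius):
        self.celsius = celsius

    def __format__(self, spec):
        # The spec is whatever text followed the colon in the replacement field.
        if spec == 'F':
            return '%.1fF' % (self.celsius * 9.0 / 5 + 32)
        return '%.1fC' % self.celsius


d = datetime.datetime(2010, 7, 4, 12, 15, 58)
# datetime interprets its spec as a strftime pattern.
formatted_date = '{:%Y-%m-%d %H:%M:%S}'.format(d)
# Temperature interprets its spec as a unit selector.
formatted_temp = '{:F}'.format(Temperature(100))
```

A regex that hard-codes the standard format spec layout cannot anticipate specs like these, which is why a generic field parser is a better fit.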

I, being a strong supporter of “Never reinvent the wheel”, took another shot at finding an existing solution, and was lucky to come across the _formatter_parser() method of a Python string object. It correctly found all the replacement fields in Python format strings and returned an iterable of tuples (literal_text, field_name, format_spec, conversion). All I needed then was to convert this info into a list of replacement fields. The simple script below is all that I needed to extract the replacement fields from a format string in Python:

[sourcecode language="python"]
replacement_fields = []
s = "{foo:^+30f} bar {0} foo {} {time:%Y-%m-%d %H:%M:%S}"

for literal_text, field_name, format_spec, conversion in s._formatter_parser():
    # field_name is None for a trailing chunk of literal text
    if field_name is not None:
        replacement_field = field_name
        if conversion is not None:
            replacement_field += '!' + conversion
        if format_spec:
            replacement_field += ':' + format_spec
        replacement_field = '{' + replacement_field + '}'
        replacement_fields.append(replacement_field)
print replacement_fields
["{foo:^+30f}", "{0}", "{}", "{time:%Y-%m-%d %H:%M:%S}"]
[/sourcecode]
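For readers on Python 3, where strings no longer expose that private method, a similar result can be had from the public string.Formatter().parse() API, which yields the same four-item tuples. The sketch below also shows one possible way to turn the extracted fields into a translation check; the function names replacement_fields and same_fields are hypothetical, and the real Transifex validator may work differently:

```python
from string import Formatter


def replacement_fields(text):
    """Return the replacement fields found in a new-style format string."""
    fields = []
    for literal_text, field_name, format_spec, conversion in Formatter().parse(text):
        # field_name is None for a trailing chunk of literal text.
        if field_name is None:
            continue
        field = field_name
        if conversion is not None:
            field += '!' + conversion
        if format_spec:
            field += ':' + format_spec
        fields.append('{' + field + '}')
    return fields


def same_fields(source, translation):
    """True when both strings contain the same replacement fields, in any order."""
    return sorted(replacement_fields(source)) == sorted(replacement_fields(translation))
```

Comparing sorted lists allows the translation to reorder fields, which is common when word order differs between languages.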


That’s all. Simple and easy, isn’t it?
