regex – Parsing template schema with Python and Regular Expressions – Education Career Blog

I’m working on a script for work to extract data from an old template engine schema:

%price%
{
$54.99
}
%/price%

%model%
{
WRT54G
}
%/model%

%brand%{
LINKSYS
}
%/brand%

Everything within the % % is the key, and everything in the { } is the value. Using Python and regex, I was able to get this far: (?<=%)(?P\w*?)(?=\%)

which returns ‘price’, ‘model’, ‘brand’

I’m just having a problem getting it to match the data inside the braces as the value.

,

I agree with Devin that a single regex isn’t the best solution. If there do happen to be any strange cases that aren’t handled by your regex, there’s a real risk that you won’t find out.

I’d suggest using a finite state machine approach. Parse the file line by line, first looking for a price-model-brand block, then parse whatever is within the braces. Also, make sure to note if any blocks aren’t opened or closed correctly as these are probably malformed.

You should be able to write something like this in Python in about 30-40 lines of code.
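The line-by-line state machine described above could be sketched roughly as follows. This is only an illustration under assumptions, not the answerer's actual code: the function name `parse_template`, the state names, and the error messages are all made up here, and it assumes that tags and braces sit on their own lines (with the opening brace optionally on the same line as the tag, as in the %brand% example).

```python
import re

# Hypothetical patterns: %key% optionally followed by "{", and %/key%.
OPEN_TAG = re.compile(r"^%(\w+)%\s*(\{?)\s*$")
CLOSE_TAG = re.compile(r"^%/(\w+)%$")


def parse_template(text):
    """Return (values, errors) for a %key% { value } %/key% schema."""
    values, errors = {}, []
    state, key, body = "outside", None, []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # blank lines are ignored in every state
        if state == "outside":
            m = OPEN_TAG.match(line)
            if m:
                key, body = m.group(1), []
                # The brace may share the line with the tag (%brand%{).
                state = "in_value" if m.group(2) else "want_brace"
            else:
                errors.append((lineno, "unexpected text outside a block"))
        elif state == "want_brace":
            if line == "{":
                state = "in_value"
            else:
                errors.append((lineno, "expected '{' after %" + key + "%"))
                state = "outside"
        elif state == "in_value":
            if line == "}":
                state = "want_close"
            else:
                body.append(line)
        elif state == "want_close":
            m = CLOSE_TAG.match(line)
            if m and m.group(1) == key:
                values[key] = "\n".join(body)
            else:
                errors.append((lineno, "expected %/" + key + "%"))
            state = "outside"
    if state != "outside":
        errors.append((None, "block %r was never closed" % key))
    return values, errors


SAMPLE = """%price%
{
$54.99
}
%/price%

%brand%{
LINKSYS
}
%/brand%
"""

values, errors = parse_template(SAMPLE)
print(values)
print(errors)
```

The point of returning `errors` alongside `values` is that malformed blocks get reported with a line number instead of silently disappearing, which is the risk with a single regex.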

,

Just for grins:

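import re

# `test` is assumed to hold the template text shown in the question.
# Group 1 captures the key; group 2 captures the first line after the opening brace.
RE_kv = re.compile(r"\%(.*)%\.*?\n?\s*\{\s*(.*)")
# Call findall on the compiled pattern; flags cannot be passed alongside a compiled pattern.
matches = RE_kv.findall(test)
for k, v in matches:
    print(k, v)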

output:

price $54.99
model WRT54G
brand LINKSYS

Note that I wrote just enough regex to get the matches to show up; it isn’t even bounded at the end by the closing brace or the closing tag. Use at your own risk.
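For comparison, here is a sketch of a pattern that is bounded at the end. It is an assumption-laden variant, not the pattern above: the group names `key` and `value` and the helper `extract_pairs` are invented for illustration, keys are assumed to be word characters only, and the closing tag is tied to the opening one with a named backreference.

```python
import re

# %key% { value } %/key%, where the closing tag must repeat the same key.
# re.DOTALL lets the value span several lines; the lazy .*? stops at the
# first "}" that is followed by the matching closing tag.
BLOCK = re.compile(
    r"%(?P<key>\w+)%\s*\{\s*(?P<value>.*?)\s*\}\s*%/(?P=key)%",
    re.DOTALL,
)


def extract_pairs(text):
    """Return a dict of key -> value for every well-formed block."""
    return {m.group("key"): m.group("value") for m in BLOCK.finditer(text)}


SAMPLE = """%price%
{
$54.99
}
%/price%

%brand%{
LINKSYS
}
%/brand%
"""

pairs = extract_pairs(SAMPLE)
print(pairs)
```

A block whose closing tag is missing or mismatched simply produces no match here, so malformed input is skipped without any warning.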

,

It looks like it’d be easier to do with re.Scanner (sadly undocumented) than with a single regular expression.
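A rough sketch of what that might look like follows. Since `re.Scanner` is undocumented, treat the details as assumptions based on how it behaves in current CPython: it takes a list of (pattern, action) pairs, an action of `None` discards the match, and `scan()` returns the token list plus any unscanned remainder. The token names and the `parse_tokens` helper are made up for illustration, and values are assumed not to contain `%`, `{`, or `}`.

```python
import re

# Lexicon order matters: the closing tag is tried before the opening tag.
scanner = re.Scanner([
    (r"%/\w+%", lambda s, tok: ("CLOSE", tok[2:-1])),
    (r"%\w+%", lambda s, tok: ("OPEN", tok[1:-1])),
    (r"\{", lambda s, tok: ("LBRACE", tok)),
    (r"\}", lambda s, tok: ("RBRACE", tok)),
    (r"\s+", None),  # whitespace between tokens is dropped
    (r"[^{}%\n]+", lambda s, tok: ("TEXT", tok.strip())),
])


def parse_tokens(text):
    """Tokenize text and fold the tokens into a dict of key -> value."""
    tokens, remainder = scanner.scan(text)
    if remainder:
        raise ValueError("cannot tokenize input near %r" % remainder[:20])
    values, key, body = {}, None, []
    for kind, val in tokens:
        if kind == "OPEN":
            key, body = val, []
        elif kind == "TEXT" and key is not None:
            body.append(val)
        elif kind == "CLOSE":
            if val != key:
                raise ValueError("mismatched closing tag %r" % val)
            values[key] = "\n".join(body)
            key = None
        # Brace tokens are not validated in this sketch.
    return values


SAMPLE = """%model%
{
WRT54G
}
%/model%

%brand%{
LINKSYS
}
%/brand%
"""

result = parse_tokens(SAMPLE)
print(result)
```

Splitting the work into a tokenizer and a small loop over tokens keeps each regex trivial, at the cost of relying on an undocumented class.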
