Tangential comments about Software Development

Saturday, June 24, 2006

Regex and TDD - sine qua non

I'm hauling myself up the regular expressions learning curve. What worries me is that it is so easy to use them wrongly. When there's an expression like "(?=(\d\d\d)*$)(\d\d\d)" how sure are you of what it does?
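One quick way to get a feel for an expression like that is to probe it. The following is only a rough sketch, not part of the original code: `RegexProbe` and `Triplets` are names made up for illustration. It simply lists whatever the pattern matches in a string of digits.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RegexProbe
{
    // Hypothetical helper: returns every match of the triplet pattern, in order.
    public static List<string> Triplets(string digits)
    {
        // The lookahead (?=(\d\d\d)*$) only succeeds where the rest of the
        // string is a whole number of three-digit groups; (\d\d\d) then
        // consumes one such group.
        Regex r = new Regex(@"(?=(\d\d\d)*$)(\d\d\d)");
        List<string> found = new List<string>();
        foreach (Match m in r.Matches(digits))
            found.Add(m.Value);
        return found;
    }
}
```

For "1234567" this gives 234 and 567; the leading 1 is never part of a match, so the caller has to deal with it separately. A probe like this only covers the cases you happen to think of, though.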
That's where Test-Driven Development comes to the rescue. I'm telling myself to write at least half a dozen tests to check that the behaviour is as expected. Take the following property, designed to turn a long value like 1234567 into the string "1,234,567". It's a neat use of regular expressions, processing a set of matches found with a lookahead. But can you see the bug?

public string BignessWithCommas
{
    get
    {
        string retval = string.Empty;
        string str = m_bigness.ToString();

        Regex r = new Regex(@"(?=(\d\d\d)*$)(\d\d\d)");
        int tripletlen = 0;
        foreach (Match m in r.Matches(str))
        {
            if (retval != string.Empty)
                retval += ",";
            retval += m.Value;
            tripletlen += 3;
        }
        if (tripletlen < str.Length)
            retval = str.Substring(0, str.Length - tripletlen) + "," + retval;
        return retval;
    }
}

No, I couldn't see the bug either. Thankfully, I tested it with

public void ItemWithCommas()
{
    DuctTop100Item x = new DuctTop100Item("Me", 17);
    Assert.AreEqual("17", x.BignessWithCommas);
    x.Bigness = 179; Assert.AreEqual("179", x.BignessWithCommas);
    x.Bigness = 1790; Assert.AreEqual("1,790", x.BignessWithCommas);
    x.Bigness = 12790; Assert.AreEqual("12,790", x.BignessWithCommas);
    x.Bigness = 123790; Assert.AreEqual("123,790", x.BignessWithCommas);
    x.Bigness = 1234790; Assert.AreEqual("1,234,790", x.BignessWithCommas);
    x.Bigness = 1234567890; Assert.AreEqual("1,234,567,890", x.BignessWithCommas);
}

and it went wrong immediately. As is so often the case, it failed on the simple example, because my whole brain was used up doing the difficult bit.
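For the record, here is what happens with 17. The string has fewer than three digits, so the regex finds no matches at all: retval stays empty and tripletlen stays 0. The final if still fires, and out comes "17," with a trailing comma. One possible fix, as a minimal sketch, is to add the comma only when at least one triplet was found. `CommaFormatter` and `WithCommas` are hypothetical names, and the method takes the value as a parameter instead of reading m_bigness so that it stands on its own. It assumes the value is not negative.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CommaFormatter
{
    // Sketch of a corrected version; assumes bigness >= 0.
    public static string WithCommas(long bigness)
    {
        string retval = string.Empty;
        string str = bigness.ToString();

        // Collect the trailing three-digit groups, joined by commas.
        Regex r = new Regex(@"(?=(\d\d\d)*$)(\d\d\d)");
        int tripletlen = 0;
        foreach (Match m in r.Matches(str))
        {
            if (retval != string.Empty)
                retval += ",";
            retval += m.Value;
            tripletlen += 3;
        }

        // Prepend the leading one or two digits the triplets did not cover.
        // The comma is only needed when there is a triplet to separate from.
        if (tripletlen < str.Length)
        {
            string head = str.Substring(0, str.Length - tripletlen);
            retval = (retval == string.Empty) ? head : head + "," + retval;
        }
        return retval;
    }
}
```

Fed the same values as the test above, this version returns the expected strings, including plain "17" for the case that tripped me up.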