bitwix

Tangential comments about Software Development

Tuesday, June 27, 2006

Regex, Roman Numerals and TDD

I read a blog on Roman Numeral Conversion and decided to do the reverse, roman to integer conversion. I wanted to use Regular Expressions as I thought they would allow a neat treatment. And I wanted to use Test Driven Development (TDD) to ensure that I had written what I thought I had written.

I started with a simple test, and a simple class (in that order). After one and a half iterations, the code was like this. By one and a half iterations, I mean one working test, one failing test.
[TestFixture]
public class RomanToIntTester
{
[Test]
public void TestZero()
{
RomanToInt r2i = new RomanToInt();

Assert.AreEqual(0, r2i.Convert( "" ) );
}
[Test]
public void TestOne()
{
RomanToInt r2i = new RomanToInt();

Assert.AreEqual(1, r2i.Convert("I"));
}
}

public class RomanToInt
{
public RomanToInt()
{
}
public int Convert(string str)
{
return 0;
}
}


I chose not to work through tests for II = 2, III = 3, IV = 4, but instead to write the daddy of all tests, using Excel to do the repetitive coding for me. I set up a 4,000 row spreadsheet, and put the following formulae into the cells on row 4000 -
A4000 =A3999+1
B4000 =ROMAN(A4000,0)
C4000 =CONCATENATE("Assert.AreEqual(",A4000,", r2i.Convert(""",B4000,"""));")

I then copied these into rows 1 to 3999, and then typed 0 into cell A1. Column C was now my test code. I copied it into my test class and had the following -

[Test]
public void TestZeroTo3999()
{
RomanToInt r2i = new RomanToInt();
Assert.AreEqual(0, r2i.Convert(""));
Assert.AreEqual(1, r2i.Convert("I"));
Assert.AreEqual(2, r2i.Convert("II"));
...
Assert.AreEqual(3997, r2i.Convert("MMMCMXCVII"));
Assert.AreEqual(3998, r2i.Convert("MMMCMXCVIII"));
Assert.AreEqual(3999, r2i.Convert("MMMCMXCIX"));
}


I then set about the task. I wrote tests then code to remove say "IV" and replace it with "4". I used a Regex Replace call for this. Then tests and code to replace all two-letter values - IV, IX, XL, XC, CD and CM - with a single digit or a letter not otherwise used - "F" for "XL" (forty), "N" for "XC" (ninety), "A" and "B" for "CD" and "CM".
    Assert.AreEqual("9", r2i.ReplaceAll("IX"));
Assert.AreEqual("MBN9", r2i.ReplaceAll("MCMXCIX"));


I wrote tests and a function to count the number of occurences of a character and return that count multiplied by a base value.
    Assert.AreEqual(3, r2i.RepeatCountAndMultiply("XXXVIII", "I", 1));
Assert.AreEqual(5, r2is.RepeatCountAndMultiply("XXXVIII", "V", 5));
Assert.AreEqual(30, r2i.RepeatCountAndMultiply("XXXVIII", "X", 10));

All that remained was to implement the Convert function in full, as a call to ReplaceAll followed by incrementing a value with successive calls to RepeatCountAndMultiply. And guess what? The TestZeroTo3999 test passed first time. Does the smugness come through in the tone of this post? I quickly
refactored the class and tests to make most of the functions protected, and therefore deriving the tester class from the implementation class (is this a good idea?) and I considered myself done.

Here's the conversion class in full.
namespace MusingsLib
{
public class RomanToInt
{
public RomanToInt()
{
}
public int Convert(string str)
{
int retval = 0;
str = ReplaceAll(str);
retval += RepeatCountAndMultiply(str, "I", 1);
retval += RepeatCountAndMultiply(str, "4", 4);
retval += RepeatCountAndMultiply(str, "V", 5);
retval += RepeatCountAndMultiply(str, "9", 9);
retval += RepeatCountAndMultiply(str, "X", 10);
retval += RepeatCountAndMultiply(str, "F", 40);
retval += RepeatCountAndMultiply(str, "L", 50);
retval += RepeatCountAndMultiply(str, "N", 90);
retval += RepeatCountAndMultiply(str, "C", 100);
retval += RepeatCountAndMultiply(str, "A", 400);
retval += RepeatCountAndMultiply(str, "D", 500);
retval += RepeatCountAndMultiply(str, "B", 900);
retval += RepeatCountAndMultiply(str, "M", 1000);
return retval;
}

protected string ReplaceAll(string str)
{
str = ReplaceRomanWithAlternative(str, "IV", "4");
str = ReplaceRomanWithAlternative(str, "IX", "9");
str = ReplaceRomanWithAlternative(str, "XL", "F");
str = ReplaceRomanWithAlternative(str, "XC", "N");
str = ReplaceRomanWithAlternative(str, "CD", "A");
str = ReplaceRomanWithAlternative(str, "CM", "B");
return str;
}

protected int RepeatCountAndMultiply(string inputstr, string roman, int baseval)
{
Regex r = new Regex( roman );
return r.Matches(inputstr).Count * baseval;
}

protected string ReplaceRomanWithAlternative(string inputstr, string roman, string alt)
{
Regex r = new Regex( roman );
return r.Replace(inputstr, alt);
}
}
}


Reference : www.differentpla.net/node/58

Saturday, June 24, 2006

Regex and TDD - sine qua non

I'm hauling myself up the regular expressions learning curve. What worries me is that it is so easy to use them wrongly. When there's an expression like "(?=(\d\d\d)*$)(\d\d\d)" how sure are you of what it does?
That's where Test Driven Development comes to the rescue. I'm telling myself to write at least half a dozen tests to check that the behaviour is as expected. Take the following function, designed to turn a long value like 1234567 into a string value of 1,234,567. A neat use of regular expressions, processing a set of matches using a lookahead. But can you see the bug?


public string BignessWithCommas
{
get
{
string retval = string.Empty;
string str = m_bigness.ToString();

Regex r = new Regex(@"(?=(\d\d\d)*$)(\d\d\d)");
int tripletlen = 0;
foreach (Match m in r.Matches(str))
{
if (retval != string.Empty)
retval += ",";
retval += m.Value;
tripletlen += 3;
}
if( tripletlen < str.Length )
retval = str.Substring( 0, str.Length - tripletlen ) + "," + retval;
return retval;
}
}


No, I couldn't see the bug either. Thankfully, I tested it with

[Test]
public void ItemWithCommas()
{
DuctTop100Item x = new DuctTop100Item("Me", 17);
Assert.AreEqual("17", x.BignessWithCommas);
x.Bigness = 179; Assert.AreEqual("179", x.BignessWithCommas);
x.Bigness = 1790; Assert.AreEqual("1,790", x.BignessWithCommas);
x.Bigness = 12790; Assert.AreEqual("12,790", x.BignessWithCommas);
x.Bigness = 123790; Assert.AreEqual("123,790", x.BignessWithCommas);
x.Bigness = 1234790; Assert.AreEqual("1,234,790", x.BignessWithCommas);
x.Bigness = 1234567890; Assert.AreEqual("1,234,567,890", x.BignessWithCommas);
}

and it went wrong immediately. As so often is the case, it failed on the simple example, because my whole brain was used up doing the difficult bit.

Thursday, June 15, 2006

BookDiff on XP Explained

First EditionSecond Edition

I needed a copy of Kent Beck's Extreme Programming Explained, and was pleased to see a new edition. True to the XP ethos, it has been refactored. But I fear it may not pass a suite of tests, such as repeating all that was good in the first edition.

Kent shares the Second Edition (SE) credit with Cynthia Andres. I wonder whose idea was it to forbid sexual relationships between team members (SE p43), and are they ashamed of their heterosexual stereotyping? And I’m suspicious that Kent and Cynthia are closer than some other co-authors – on page 104 there is a reference to Beth Andres-Beck.

So we used to have four values – communication, simplicity, feedback and courage. SE adds Respect. The First Edition (FE) chapter on principles is 6 pages long, the SE chapter is 12 pages, and there’s almost no common ground between the two.

The shock is that the twelve practices have gone, at least as a tightly defined list. I miss them so much I put together http://www.bitwix.com/xpdiagram.htm to remind me what they are. Instead of the excellent diagram on FE page 70, we have spidery maps.

And so it goes on. Do you want to read about Frederick Taylor, the industrial engineer who used systematic methods to improve factory production? In FE, he is in the bibliography but dismissed with “as strategies for software development [his principles make] no business sense and no human sense”. In SE, he gets a whole chapter.

My disappointment is that Kent’s first edition was so practical and so readable, it was inspiring. The second edition is neither. It loses the simple, countable sets of values, principles and practices in a mess of new ideas. And the ideas are communicated in such a way that I find myself skipping to the end of the chapter.

Thankfully, there are secondhand copies of the first edition available. Embrace change? Not in this case.