Monday, December 3, 2007

Dealing with primitive obsession

One code smell I tend to miss a lot is primitive obsession.  Primitives are the building blocks of data in any programming language, such as strings, numbers, booleans, and so on.

Many times, primitives carry special meaning: they represent phone numbers, zip codes, money, and so on.  Nearly every time I encounter these values, they're exposed as simple primitives:

public class Address
{
    public string ZipCode { get; set; }
}

But zip codes have special rules.  In the US, for example, they can only take one of two formats: "12345" or "12345-3467".  This logic is typically captured somewhere away from the "ZipCode" value, and typically duplicated throughout the application.  For some reason, I was averse to creating small objects to hold these values and their simple logic.  I don't really know why, as data objects tend to be highly cohesive and can cut down a lot of duplication.
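To make the duplication concrete, here's a minimal sketch of how it tends to look.  The class names CustomerForm and ShippingLabelPrinter, and the regular expression itself, are hypothetical and only there for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical classes: each one re-implements the same zip code rule
// because the value is passed around as a bare string.
public class CustomerForm
{
    public static bool IsValidZip(string zip)
    {
        return zip != null && Regex.IsMatch(zip, @"^[0-9]{5}(-[0-9]{4})?\z");
    }
}

public class ShippingLabelPrinter
{
    public static bool CanPrint(string zip)
    {
        // Same rule, copied.  A fix made in one place is easily missed here.
        return zip != null && Regex.IsMatch(zip, @"^[0-9]{5}(-[0-9]{4})?\z");
    }
}
```

Nothing ties these two checks together, so they can drift apart without anyone noticing.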

Beyond what Fowler walks through, I need to add a couple more features to my data object to make it really useful.

Creating the data object

First I'll need to create the data object by following the steps in Fowler's book.  I'll make the ZipCode class a DDD Value Object, and this is what I end up with:

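using System;
using System.Text.RegularExpressions;

public class Address
{
    public ZipCode ZipCode { get; set; }
}

public class ZipCode
{
    // US formats only: XXXXX or XXXXX-XXXX
    private static readonly Regex Format = new Regex(@"^[0-9]{5}(-[0-9]{4})?\z");

    private readonly string _value;

    public ZipCode(string value)
    {
        if (value == null)
            throw new ArgumentNullException("value");
        if (!Format.IsMatch(value))
            throw new ArgumentException("Zip code must be in XXXXX or XXXXX-XXXX format.", "value");

        _value = value;
    }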

    public string Value
    {
        get { return _value; }
    }
}

This is pretty much where Fowler's walkthrough stops.  But there are some issues with this implementation:

  • Zip codes are now harder to work with in their native format, strings
  • Zip codes are now harder to display than when they were plain strings

Both of these issues are easy to fix with C#'s conversion operators and the overrides available on every .NET object.

Cleaning it up

First, I'll override the ToString() method and just output the internal value:

public override string ToString()
{
    return _value;
}

Lots of classes, tools, and frameworks use the ToString method to display the value of an object, and now it will use the internal value of the zip code instead of just outputting the name of the type (which is the default).
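As a quick illustration of the difference, here's a sketch with two hypothetical stand-in classes (they are not part of the real ZipCode class) that are identical except for the override:

```csharp
using System;

// Hypothetical stand-ins for illustration: identical except for ToString().
public class ZipCodeWithoutOverride
{
    private readonly string _value;
    public ZipCodeWithoutOverride(string value) { _value = value; }
    public string Value { get { return _value; } }
}

public class ZipCodeWithOverride
{
    private readonly string _value;
    public ZipCodeWithOverride(string value) { _value = value; }
    public string Value { get { return _value; } }

    public override string ToString() { return _value; }
}

public static class ToStringComparison
{
    public static void Run()
    {
        // Object.ToString() falls back to the name of the type.
        Console.WriteLine("ZipCode: {0}", new ZipCodeWithoutOverride("12345"));

        // The override lets format strings pick up the zip code itself.
        Console.WriteLine("ZipCode: {0}", new ZipCodeWithOverride("12345"));
    }
}
```

The first line prints the type name after "ZipCode: ", while the second prints the actual zip code.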

Next, I can create some casting operators to go to and from System.String.  Since zip codes are still dealt with mostly as strings in this system, I stuck with string instead of int or some other primitive.  Also, many other countries have different zip code formats, so I stayed with strings.  Here are the cast operators, both implicit and explicit:

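public static implicit operator string(ZipCode zipCode)
{
    // A null ZipCode converts to a null string, so this cast never throws.
    // ReferenceEquals is used because "zipCode == null" can bind to string equality through this operator.
    return object.ReferenceEquals(zipCode, null) ? null : zipCode.Value;
}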

public static explicit operator ZipCode(string value)
{
    return new ZipCode(value);
}

I prefer explicit operators when converting from primitives, and implicit operators when converting to primitives.  The Framework Design Guidelines (FDG) for conversion operators are:

DO NOT provide a conversion operator if such conversion is not clearly expected by the end users.

DO NOT define conversion operators outside of a type's domain.

DO NOT provide an implicit conversion if the conversion is potentially lossy.

DO NOT throw exceptions from implicit casts.

DO throw System.InvalidCastException if a call to a cast operator results in lossy conversion and the contract of the operator does not allow lossy conversions.

I meet all of these guidelines, so I think this implementation will work.

End result

Usability with the changed ZipCode class is much improved now:

Address address = new Address();

address.ZipCode = new ZipCode("12345"); // constructor
address.ZipCode = (ZipCode) "12345"; // explicit operator

string zip = address.ZipCode; // implicit operator

Console.WriteLine("ZipCode: {0}", address.ZipCode); // ToString method

Basically, my ZipCode class now "plays nice" with strings and code that expects strings.
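For reference, here's one way the fragments above might fit together into a single file.  Treat it as a sketch resting on a few assumptions of mine: the exact regular expression, the exception types thrown by the constructor, and the null check in the implicit operator.

```csharp
using System;
using System.Text.RegularExpressions;

public class ZipCode
{
    // Assumption: US-only rule, five digits with an optional four-digit suffix.
    private static readonly Regex Format = new Regex(@"^[0-9]{5}(-[0-9]{4})?\z");

    private readonly string _value;

    public ZipCode(string value)
    {
        if (value == null)
            throw new ArgumentNullException("value");
        if (!Format.IsMatch(value))
            throw new ArgumentException("Expected XXXXX or XXXXX-XXXX.", "value");

        _value = value;
    }

    public string Value
    {
        get { return _value; }
    }

    public override string ToString()
    {
        return _value;
    }

    // ZipCode -> string always succeeds; a null ZipCode maps to a null
    // string so the implicit conversion never throws.
    public static implicit operator string(ZipCode zipCode)
    {
        return object.ReferenceEquals(zipCode, null) ? null : zipCode.Value;
    }

    // string -> ZipCode can fail validation, so the caller has to ask for it.
    public static explicit operator ZipCode(string value)
    {
        return new ZipCode(value);
    }
}

public class Address
{
    public ZipCode ZipCode { get; set; }
}
```

Under these assumptions, (ZipCode) "1234" fails with an ArgumentException from the constructor, while reading an Address whose ZipCode was never set simply yields a null string.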

With any hurdles out of the way for using simple data objects, I can eliminate a lot of duplication and scattered logic by creating small, specialized classes for all of the special primitives in my app.

2 comments:

Colin Jack said...

Just wondered why you favor explicit operators when converting from primitives?

Jimmy Bogard said...

@colin

Most conversions from primitives can be lossy. In the zip code example, not all strings are allowed to be zip codes, but all zip codes are allowed to be strings.

Explicit casting is a way for the developer to make the decision to convert at compile-time, rather than accidentally at run-time.

For example, there are implicit conversions from int to long, but only explicit conversions from long to int. Lossy (or special) conversions require explicit casts.