Tuesday, December 18, 2007

Extension methods and primitive obsession

In another water-cooler argument today, a couple of coworkers didn't like my extension method example.  One main problem is that it violates instance semantics, where you expect that a method call off an instance won't work if the instance is null.  However, extension methods break that convention, leading the developer to question every method call and wonder if it's an extension method or not.  For example, you can run into these types of scenarios:

string nullString = null;

bool isNull = nullString.IsNullOrEmpty();

In normal circumstances, the call to IsNullOrEmpty would throw a NullReferenceException.  Since we're using an extension method, we leave it up to the developer of the extension method to determine what to do with null references.

Since there's no way to describe to the user of the API whether or not the extension method handles nulls, or how it handles null references, this can lead to quite a bit of confusion to clients of that API, or later, those maintaining code using extension methods.

In addition to problems with dealing with null references (which Elton pointed out, could be better handled with design-by-contract), some examples of extension methods online propose examples that show more than a whiff of the "Primitive Obsession" code smell:

Dealing with primitive obsession

In both of the examples above (Scott cites David's example), an extension method is used to determine if a string is an email:

string email = txtEmailAddress.Text;

if (! email.IsValidEmailAddress())
{
    // oh noes!
}

It's something I've done a hundred times, taking raw text from user input and performing some validation to make sure it's the "right" kind of string I want.  But where do you stop with validation?  Do you assume all throughout the application that this string is the correct kind of string, or do you duplicate the validation?

An alternative approach is accept that classes are your friend, and create a small class to represent your "special" primitive.  Convert back and forth at the boundaries between your system and customer-facing layers.  Here's the new Email class:

public class Email
{
    private readonly string _value;
    private static readonly Regex _regex = new Regex(@"^[\w-\.]+@([\w-]+\.)+[\w-]{2,4}$");

    public Email(string value)
    {
        if (!_regex.IsMatch(value))
            throw new ArgumentException("Invalid email format.", "value");

        _value = value;
    }

    public string Value
    {
        get { return _value; }
    }

    public static implicit operator string(Email email)
    {
        return email.Value;
    }

    public static explicit operator Email(string value)
    {
        return new Email(value);
    }

    public static Email Parse(string email)
    {
        if (email == null)
            throw new ArgumentNullException("email");

        Email result = null;

        if (!TryParse(email, out result))
            throw new FormatException("Invalid email format.");

        return result;
    }

    public static bool TryParse(string email, out Email result)
    {
        if (!_regex.IsMatch(email))
        {
            result = null;
            return false;
        }

        result = new Email(email);
        return true;
    }
}

I do a few things to make it easy on developers to use an email class that can play well with strings as well as other use cases:

  • Made Email immutable
  • Defined conversion operators to and from string
  • Added the Try-Parse pattern

The usage of the Email class closely resembles usage for other string-friendly types, such as DateTime:

string inputEmail = txtEmailAddress.Text;

Email email;

if (! Email.TryParse(inputEmail, out email))
{
    // oh noes!
}

txtEmailAddress.Text = email;

Now I can go back and forth from strings and my Email class, plus I provided a way to convert without throwing exceptions.  This looks very similar to code dealing with textual date representations.

Yes, but

The final Email class takes more code to write than the original extension method.  However, now that we have a single class that plays nice with primitives, additional Email behavior has a nice home.  With a class in place, I can now model more expressive emails, such as ones that include names like "Ricky Bobby <ricky.bobby@rb.com>".  Once the home is created, behavior can start moving in.  Otherwise, validation would be sprinkled throughout the system at each user boundary, such as importing data, GUIs, etc.

If you find yourself adding logic to primitives to the point of obsession, it's a strong indicator you're suffering from primitive obsession and a nice, small, specialized class can help eliminate a lot of the duplication primitive obsession tends to create.

No comments: