Tuesday, February 22, 2011

Specification Pattern, Entity Framework & LINQ

 

Firstly, just to clarify: I am going to be talking about the OOP Specification Pattern, not the data pattern commonly found in the SID (Shared Information & Data) model.

Much has been said about the specification pattern, so I’m not going to go into that here. If you want an overview, check out these posts:

http://www.lostechies.com/blogs/chrismissal/archive/2009/09/10/using-the-specification-pattern-for-querying.aspx
http://devlicio.us/blogs/jeff_perrin/archive/2006/12/13/the-specification-pattern.aspx

In this post I’m going to demonstrate how you can make use of the specification pattern to query Entity Framework and create reusable, testable query objects and eliminate inline LINQ queries.

The Smell

When I first got started with Entity Framework way back in 2008, when EF was still in its infancy, we had lots of inline LINQ all over the code base and specific methods on our repositories for each querying requirement (which any OOP purist will tell you is bad).

We had a service layer method which more or less looked something like:

     public IEnumerable<Product> FindAllActiveProducts(string keyword)
     {
         return productRepository.FindAllActive(keyword); 
     }

And then on our repository a method which looked like:

        public IEnumerable<Product> FindAllActive(string keyword)
        {
            var query = context.CreateObjectSet<Product>().Include("Category");


            var products = from p in query
                           where p.IsActive
                                 && p.Name.Contains(keyword)
                           select p;

            return products.ToList();

        }

Now whilst this is not all bad it does present a few code smells:

  • You end up with lots of methods on your repositories to handle different query scenarios
  • There’s no way to test the query in isolation
  • Magic strings for the Include path

I’m not going to touch on whether or not you should be using repositories for this type of query scenario because that’s a whole other topic and a quick Google search will yield many posts debating this very subject. 

However, I will say that if you find yourself with lots of methods on your repositories that only perform query operations, that is a big code smell.

The Specification Interface

The first step is to define an interface for our Specification.

   public interface ISpecification<T>
   {
       Expression<Func<T, bool>> Predicate { get; }

       IFetchStrategy<T> FetchStrategy { get; set; }

       bool IsSatisifedBy(T entity);
   }

If you’re familiar with this pattern you will notice the addition of two properties: Predicate and FetchStrategy.

The Predicate is ultimately what will be used to perform the query. You will notice this is read-only, which forces it to be defined within the specification implementation.

The FetchStrategy is an abstraction which defines the child objects that should be retrieved when loading the entity. More on this below.

Fetch Strategy

For those of you who don’t know, when you load an entity from EF and other ORMs you can choose either to load just the root properties or to load the related entities at the same time. The way to do this in EF is by using the .Include method on the ObjectQuery.

This works fine; however, fetch strategies often need to be reused in different places, so having .Include calls with magic strings everywhere becomes a real maintenance headache.

In order to alleviate this pain I’ve created an abstraction on the concept. 

   public interface IFetchStrategy<T>
   {
       IEnumerable<string> IncludePaths { get; }

       IFetchStrategy<T> Include(Expression<Func<T, object>> path);

       IFetchStrategy<T> Include(string path);
   }

 

Here is a generic implementation of this IFetchStrategy.

    public class GenericFetchStrategy<T> : IFetchStrategy<T>
    {
        private readonly IList<string> properties;

        public GenericFetchStrategy()
        {
            properties = new List<string>();
        }

        #region IFetchStrategy<T> Members

        public IEnumerable<string> IncludePaths
        {
            get { return properties; }
        }

        public IFetchStrategy<T> Include(Expression<Func<T, object>> path)
        {
            properties.Add(path.ToPropertyName());
            return this;
        }

        public IFetchStrategy<T> Include(string path)
        {
            properties.Add(path);
            return this;
        }

        #endregion
    }

    public static class Extensions
    {
        public static string ToPropertyName<T>(this Expression<Func<T, object>> selector)
        {
            var me = selector.Body as MemberExpression;
            if (me == null)
            {
                throw new ArgumentException("MemberExpression expected.");
            }

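            // Strip the lambda parameter name and the dot (e.g. "p.") whatever the parameter is called.
            var propertyName = me.ToString().Substring(selector.Parameters[0].Name.Length + 1);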
            return propertyName;
        }
    }

This is all fairly self-explanatory: it maintains a list of the include paths and provides a fluent interface. The ToPropertyName extension simply takes a LINQ expression and returns the name of the property.

Do note however that there is still one Include method that takes a string as the parameter.
This is here to support really deep object hierarchies which can’t be represented as an expression.
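
To make the difference between the two Include overloads concrete, here is a minimal sketch of the paths each one produces. The Product, Category, Supplier and Review classes and the PathSketch helper are hypothetical stand-ins, not part of the code above; ToPath follows the same idea as ToPropertyName.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

// Hypothetical stand-in entities, only here so the sketch compiles on its own.
public class Supplier { public string Name { get; set; } }
public class Review { public Supplier Author { get; set; } }
public class Category { public Supplier Supplier { get; set; } }
public class Product
{
    public Category Category { get; set; }
    public ICollection<Review> Reviews { get; set; }
}

public static class PathSketch
{
    // Same idea as ToPropertyName: print the member access and strip the lambda parameter prefix.
    public static string ToPath<T>(Expression<Func<T, object>> selector)
    {
        var me = selector.Body as MemberExpression;
        if (me == null)
        {
            throw new ArgumentException("MemberExpression expected.");
        }

        return me.ToString().Substring(selector.Parameters[0].Name.Length + 1);
    }

    public static IList<string> BuildPaths()
    {
        return new List<string>
        {
            ToPath<Product>(p => p.Category),          // "Category"
            ToPath<Product>(p => p.Category.Supplier), // "Category.Supplier"
            // Reviews is a collection, so "Reviews.Author" is not a plain member access
            // and has to be supplied as a string.
            "Reviews.Author"
        };
    }
}
```

Note that a selector returning a value type (for example a bool property) is wrapped in a conversion node by the compiler, so it would hit the ArgumentException branch; navigation properties are reference types, which is why this is fine for include paths.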

You could easily create your own implementations for each scenario e.g. FullProductFetchStrategy and use that, however I tend to define the fetch strategy within the specification itself as you will soon see.
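
As a rough sketch of what such a named strategy might look like, the FullProductFetchStrategy below simply derives from the generic implementation and registers its paths in the constructor. The entity classes are hypothetical, and the interface and generic class are condensed copies of the ones above so that the example compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

// Hypothetical entities for illustration only.
public class Supplier { }
public class Category { public Supplier Supplier { get; set; } }
public class Product { public Category Category { get; set; } }

// Condensed copies of the types shown above.
public interface IFetchStrategy<T>
{
    IEnumerable<string> IncludePaths { get; }
    IFetchStrategy<T> Include(Expression<Func<T, object>> path);
    IFetchStrategy<T> Include(string path);
}

public class GenericFetchStrategy<T> : IFetchStrategy<T>
{
    private readonly IList<string> properties = new List<string>();

    public IEnumerable<string> IncludePaths
    {
        get { return properties; }
    }

    public IFetchStrategy<T> Include(Expression<Func<T, object>> path)
    {
        var me = path.Body as MemberExpression;
        if (me == null)
        {
            throw new ArgumentException("MemberExpression expected.");
        }

        // Strip the lambda parameter prefix, e.g. "p." from "p.Category".
        properties.Add(me.ToString().Substring(path.Parameters[0].Name.Length + 1));
        return this;
    }

    public IFetchStrategy<T> Include(string path)
    {
        properties.Add(path);
        return this;
    }
}

// A named, reusable strategy: everything needed to display a full product.
public class FullProductFetchStrategy : GenericFetchStrategy<Product>
{
    public FullProductFetchStrategy()
    {
        Include(p => p.Category);
        Include(p => p.Category.Supplier);
    }
}
```

A specification could then assign new FullProductFetchStrategy() to its fetch strategy instead of building the include list inline.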

The Specification Implementation

First off we have a base class which contains the basic functionality and implements the ISpecification interface.

    public abstract class SpecificationBase<T> : ISpecification<T>
    {
        protected IFetchStrategy<T> fetchStrategy;
        protected Expression<Func<T, bool>> predicate;

        protected SpecificationBase()
        {
            fetchStrategy = new GenericFetchStrategy<T>();
        }

        public Expression<Func<T, bool>> Predicate
        {
            get { return predicate; }
        }

        public IFetchStrategy<T> FetchStrategy
        {
            get { return fetchStrategy; }
            set { fetchStrategy = value; }
        }

        public bool IsSatisifedBy(T entity)
        {
            return new[] {entity}.AsQueryable().Any(predicate); 
        }
    }

I have given the fetch strategy a getter & setter as it provides a bit of flexibility to the consumers but arguably you could make this read only and force instantiation in the constructor.

Now, returning to the original example of finding active products whose name contains the keyword provided, here is the ActiveProductsByNameSpec.

   public class ActiveProductsByNameSpec : SpecificationBase<Product>
   {
       public ActiveProductsByNameSpec(string keyword)
       {
           predicate = p => p.Name.Contains(keyword) && p.IsActive;

           fetchStrategy = new GenericFetchStrategy<Product>().Include(p => p.Category);
       }
   }

As you can see everything is defined in the constructor.

Now you could expose a property for the keyword argument and have a single function which builds the predicate.

My preference however is to do everything in the constructor as it is immediately obvious what the requirements for this class are and by exposing properties you risk having required values not set leading to subtle bugs.
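
For comparison, here is a minimal sketch of what the property-based alternative might look like. The class name ActiveProductsByNamePropertySpec, the BuildPredicate method and the trimmed-down Product class are all assumptions made for illustration; the guard shows the kind of extra check you need once the keyword is no longer a constructor argument.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical, trimmed-down entity for illustration only.
public class Product
{
    public string Name { get; set; }
    public bool IsActive { get; set; }
}

// Property-based variant: the keyword can be set (or forgotten) after construction.
public class ActiveProductsByNamePropertySpec
{
    public string Keyword { get; set; }

    public Expression<Func<Product, bool>> BuildPredicate()
    {
        // Without this guard a forgotten Keyword would only surface later, when the query runs.
        if (Keyword == null)
        {
            throw new InvalidOperationException("Keyword must be set before building the predicate.");
        }

        var keyword = Keyword; // capture the value as it is right now
        return p => p.Name.Contains(keyword) && p.IsActive;
    }
}
```

With the constructor-based version there is simply no way to create the specification without supplying the keyword, which is why I find it the safer default.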

 

Testing the Specification

Testability for me is one of the greatest benefits of using this pattern. With deep object graphs LINQ queries can soon grow in size and complexity. More code == More chance of bugs.

I can count 4 different test cases for this spec and here’s how we can test them.

 
        [TestMethod]
        public void When_Product_Not_Active_Predicate_Should_Find_No_Match()
        {
            var product = new Product {IsActive = false, Name = "Resharper"};
            
            var spec = new ActiveProductsByNameSpec("Resharper");

            var actual = spec.IsSatisifedBy(product); 

            Assert.IsFalse(actual);

        }

        [TestMethod]
        public void 
            When_Product_IsActive_But_Does_Not_Contain_Keyword_Predicate_Should_Find_No_Match()
        {
            var product = new Product { IsActive = true, Name = "Visual Studio" };

            var spec = new ActiveProductsByNameSpec("Resharper");

            var actual = spec.IsSatisifedBy(product);

            Assert.IsFalse(actual);

        }

        [TestMethod]
        public void 
            When_Product_Does_Not_Contain_Keyword_And_Is_Not_Active_Predicate_Should_Find_No_Match()
        {
            var product = new Product { IsActive = false, Name = "Visual Studio" };

            var spec = new ActiveProductsByNameSpec("Resharper");

            var actual = spec.IsSatisifedBy(product);

            Assert.IsFalse(actual);
        }

        [TestMethod]
        public void 
            When_Product_IsActive_And_Contains_Keyword_Predicate_Should_Find_Match()
        {
            var product = new Product { IsActive = true, Name = "Resharper" };

            var spec = new ActiveProductsByNameSpec("Resharper");

            var actual = spec.IsSatisifedBy(product);

            Assert.IsTrue(actual);
        }

 

The Generic Repository

Now that we have our spec we need to create a generic repository which takes the ISpecification interface and returns some Entities.

   public interface IGenericQueryRepository
   {
       T Load<T>(ISpecification<T> spec);

       IEnumerable<T> LoadAll<T>(ISpecification<T> spec);

       bool Matches<T>(ISpecification<T> spec);
   }

And this is implemented by:

    public class GenericQueryRepository : IGenericQueryRepository
    {
        private ObjectContext context;

        #region IGenericQueryRepository Members

        public T Load<T>(ISpecification<T> spec)
        {
            var query = GetQuery(spec.FetchStrategy);

            return query.FirstOrDefault(spec.Predicate);
        }

        public IEnumerable<T> LoadAll<T>(ISpecification<T> spec)
        {
            var query = GetQuery(spec.FetchStrategy);

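            // Apply the specification's predicate; without it every row would be returned.
            return query.Where(spec.Predicate).ToList();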
        }

        public bool Matches<T>(ISpecification<T> spec)
        {
            var query = GetQuery(spec.FetchStrategy);

            return query.Any(spec.Predicate);
        }

        #endregion

        private IQueryable<T> GetQuery<T>(IFetchStrategy<T> fetchStrategy)
        {
            ObjectQuery<T> query = context.CreateObjectSet<T>();

            if (fetchStrategy == null)
            {
                return query;
            }

            foreach (var path in fetchStrategy.IncludePaths)
            {
                query = query.Include(path);
            }

            return query;
        }
    }

 

Pulling it all together

Now that we have a generic repository we can safely get rid of the FindAllActive method on our ProductRepository and instead change our service layer to depend on the IGenericQueryRepository and instantiate the specification like so.

       public IEnumerable<Product> FindAllActiveProducts(string keyword)
       {
           var spec = new ActiveProductsByNameSpec(keyword);

           return queryRepository.LoadAll(spec); 
       }

 

Conclusion

Well that’s it. I hope I’ve demonstrated how you can reduce the number of inline LINQ queries and ease the testability of such queries. On my team it is now a rule that all LINQ to Entities queries are defined as Specifications.

The only element of this approach that is specific to Entity Framework is the fetch strategy approach I’ve used and I’m sure this could be easily adapted to fit with other ORMs that support LINQ.

As always I’m happy to hear any feedback and feel free to contact me should you need any clarification.

Tuesday, February 1, 2011

Managing Change in Long Running Workflows Part 2

 

Recently I wrote about some of the serialization problems you can face when dealing with long running workflows. In this post I’m going to cover how I deal with logic changes in persisted workflows and also how you can make logic changes to your workflows without having to deploy code.

Hosting the Workflow

When it comes to hosting your workflows there are really only two options to consider.

  • Self-Hosting
  • WCF Workflow Service

To be honest I only looked at the WCF Workflow Service model briefly and decided it wasn’t the best fit for the requirements as it couldn’t provide the same benefits of self-hosting within a Workflow Application.

Workflow Instance Store

In order to persist Workflow instances you need to first create the Workflow Instance Store database.

For this you will need to use the following scripts:

C:\Windows\Microsoft.NET\Framework\v4.0.30319\SQL\en\SqlWorkflowInstanceStoreSchema.sql
C:\Windows\Microsoft.NET\Framework\v4.0.30319\SQL\en\SqlWorkflowInstanceStoreLogic.sql

 

The Workflow

The scenario I’m going to use for this post is that of an order approval workflow. So after an order is created it goes into a “Pending” state, a workflow is then started requiring two users to approve before it can be changed to an “Approved” state.

This is represented in the diagram below.

[Diagram: OrderApprovalWorkflow]

Bookmarks

If you already know how to implement long running workflows using Bookmarks then you can skip down to the next section.

Points in your workflows that trigger persistence and can wait for external input are called Bookmarks.

In the above diagram the “Wait for Approval” activity is a Bookmark and looks like the following:

 [Serializable]
 public sealed class WaitForApproval : NativeActivity<bool>
 {
     public InArgument<int> UserId { get; set; }

     protected override bool CanInduceIdle
     {
         get { return true; }
     }

     protected override void Execute(NativeActivityContext context)
     {
         var bookmarkName = "waitingFor_" + UserId.Get(context);

         context.CreateBookmark(bookmarkName, OnReadComplete);
     }

     private void OnReadComplete(NativeActivityContext context, Bookmark bookmark, object state)
     {
         var input = Convert.ToBoolean(state, new CultureInfo("EN-us"));

         context.SetValue(Result, input);
     }
 }

Take note of the bookmark name as this will be used later when resuming the workflow.

TIP: The bookmark is stored in the BlockingBookmarks field on your InstancesTable in your workflow instance database.

The Problem

The problem you can face after you’ve persisted your workflow instance is when the workflow definition changes and you try to load the instance using WorkflowApplication.Load.

A number of changes detailed below can lead to invalidating the workflow instance and will cause a System.Activities.ValidationException to be thrown.

The somewhat “official” word on this, found here, is that “WF4 is not able to handle runtime changes to the workflow definition”. This was elaborated on a bit more, and here are the “official” lists of breaking and non-breaking changes.

Non-Breaking Changes:

  • Changes to the activities code
  • Changes which don’t affect the number or types of arguments/variables
  • Addition or removal of member fields or properties

Breaking Changes:

  • Renaming or removing methods that had been used as bookmark, fault or completion callbacks
  • Adding new arguments
  • Adding new variables
  • Adding new children

Basically the rules of serialization/deserialization versioning apply also to workflows, which when you think about it makes sense.

Don’t fret though as this can be avoided without too much effort, so keep reading to find out how.

Incorporating Workflow into the Architecture

Very early on in my foray with Workflow Foundation I realised the power and flexibility it could add to the application, specifically the ability to use different workflows for different users/accounts and to change workflows at run-time.

Because of this it was decided that the workflows should be treated as a first class citizen in our domain model and not just left as an abstract technical implementation of the business process.

Below is a simple class diagram which should give you a good idea of the approach taken.

[Class diagram: ClassDiagram1]

As you can see in the Order class there is the WorkflowInstanceId and WorkflowXaml.
The WorkflowInstanceId is there because it allows you to identify the Workflow by using the OrderId, meaning that you don’t have to expose the InstanceId in other parts of your application.

The WorkflowXaml is the raw definition of the workflow that gets copied from the WorkflowDefinition at instantiation time. The purpose of this is to allow the WorkflowDefinition Xaml to be changed without affecting the persisted workflows and solves the problem described above.

Arguably you could add a separate Class/Table which stores the active Xaml and InstanceId for the Order, but for simplicity’s sake I’ve just added them to the Order.

It should also be said the WorkflowInstanceId and WorkflowXaml can be cleared from the Order table once the workflow has completed.
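
To illustrate the snapshot idea, here is a minimal sketch of how the Order might copy the Xaml when the workflow starts and clear it on completion. Only WorkflowInstanceId and WorkflowXaml come from the class diagram; the remaining members, the CompleteWorkflow method and the use of Guid.Empty to mean "no workflow" are assumptions made for this example.

```csharp
using System;

// Hypothetical shape of the definition class; only the Xaml matters here.
public class WorkflowDefinition
{
    public int Id { get; set; }
    public string Xaml { get; set; }
}

public class Order
{
    public int Id { get; set; }
    public Guid WorkflowInstanceId { get; private set; }
    public string WorkflowXaml { get; private set; }

    // Snapshot the definition's Xaml so later edits to the definition
    // do not affect this order's persisted instance.
    public void StartWorkflow(WorkflowDefinition definition, Guid instanceId)
    {
        if (definition == null)
        {
            throw new ArgumentNullException("definition");
        }

        WorkflowInstanceId = instanceId;
        WorkflowXaml = definition.Xaml;
    }

    // Assumed helper: clear the snapshot once the workflow has completed.
    public void CompleteWorkflow()
    {
        WorkflowInstanceId = Guid.Empty;
        WorkflowXaml = null;
    }
}
```

Because the Order keeps its own copy of the Xaml string, changing WorkflowDefinition.Xaml afterwards only affects workflows started from that point on.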

This will all become clearer as you read on.

Generic Workflow Host

For the hosting I have come up with a GenericWorkflowHost which allows you to start instances and resume them from bookmarks. There’s not much that is very exciting in here, although do take note of the following methods:

  • CreateActivityFrom
  • StartPersistableInstance
  • LoadInstanceWithBookmark

The key part is the XamlServices.Load method, which allows you to create an Activity using just the Xaml. This is important because it allows you to change the Xaml without re-compiling and deploying code.

using System;
using System.Activities;
using System.Activities.DurableInstancing;
using System.Activities.XamlIntegration;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Configuration;
using System.IO;
using System.Reflection;
using System.Runtime.DurableInstancing;
using System.Xaml;

public static class GenericWorkflowHost
{
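    // Initialised eagerly so every method (including UnloadInstance) can rely on it being non-null.
    private static ConcurrentDictionary<Guid, WorkflowApplication> runningWorkflows =
        new ConcurrentDictionary<Guid, WorkflowApplication>();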

    #region Private Helper Methods 

    private static Activity CreateActivityFrom(string xaml)
    {
        var sr = new StringReader(xaml);

        //Change LocalAssembly to where the Activities reside
        var xamlSettings = new XamlXmlReaderSettings
                               {LocalAssembly = Assembly.GetExecutingAssembly()};

        var xamlReader = ActivityXamlServices
            .CreateReader(new XamlXmlReader(sr, xamlSettings));

        var result = XamlServices.Load(xamlReader);

        var activity = result as Activity;

        return activity;
    }

    private static InstanceStore CreateInstanceStore()
    {
        var conn = ConfigurationManager.ConnectionStrings["WorkflowDbConn"].ConnectionString;

        var store = new SqlWorkflowInstanceStore(conn)
                        {
                            InstanceLockedExceptionAction = InstanceLockedExceptionAction.AggressiveRetry,
                            InstanceCompletionAction = InstanceCompletionAction.DeleteNothing,
                            HostLockRenewalPeriod = TimeSpan.FromSeconds(20),
                            RunnableInstancesDetectionPeriod = TimeSpan.FromSeconds(3)
                        };

        var handle = store.CreateInstanceHandle();

        var view = store.Execute(handle, new CreateWorkflowOwnerCommand(),
                                 TimeSpan.FromSeconds(60));

        store.DefaultInstanceOwner = view.InstanceOwner;

        handle.Free();

        return store;
    }

   
    #endregion

    public static void InvokeInstance(object input, string xaml, Guid instanceId)
    {
        var inputs = new Dictionary<string, object>();

        if (input != null)
        {
            inputs.Add(input.GetType().Name, input);
        }

        var wf = CreateActivityFrom(xaml);

        var activity = wf;

        WorkflowInvoker.Invoke(activity, inputs);
    }

    public static Guid StartPersistableInstance(IDictionary<string, object> inputs, string xaml)
    {
        if (runningWorkflows == null)
        {
            runningWorkflows = new ConcurrentDictionary<Guid, WorkflowApplication>();
        }


        var activity = CreateActivityFrom(xaml);

        var workflowApp = new WorkflowApplication(activity, inputs)
                              {
                                  InstanceStore = CreateInstanceStore(),
                                  PersistableIdle = OnIdleAndPersistable,
                                  Completed = OnWorkflowCompleted,
                                  Aborted = OnWorkflowAborted,
                                  Unloaded = OnWorkflowUnloaded,
                                  OnUnhandledException = OnWorkflowException
                              };

        workflowApp.Persist();

        var instanceId = workflowApp.Id;

        workflowApp.Run();

        runningWorkflows.TryAdd(instanceId, workflowApp);

        return workflowApp.Id;
    }

    public static bool LoadInstanceWithBookmark(string bookmarkName,
                                                Guid instanceId,
                                                object input,
                                                string xaml)
    {
        if (runningWorkflows == null)
        {
            runningWorkflows = new ConcurrentDictionary<Guid, WorkflowApplication>();
        }

        BookmarkResumptionResult result;

        if (runningWorkflows.ContainsKey(instanceId))
        {
            var workflow = runningWorkflows[instanceId];
            workflow.Completed = OnWorkflowCompleted;
            workflow.PersistableIdle = OnIdleAndPersistable;

            result = workflow.ResumeBookmark(bookmarkName, input, TimeSpan.FromSeconds(60));

            Console.WriteLine(instanceId + " resumed @ " + bookmarkName);
        }
        else
        {
            // Set up the persistence
            var store = CreateInstanceStore();

            var activity = CreateActivityFrom(xaml);

            var application = new WorkflowApplication(activity)
                                  {
                                      InstanceStore = store,
                                      Completed = OnWorkflowCompleted,
                                      Unloaded = OnWorkflowUnloaded,
                                      PersistableIdle = OnIdleAndPersistable,
                                  };

            application.Load(instanceId, TimeSpan.FromSeconds(60));

            result = application.ResumeBookmark(bookmarkName, input, TimeSpan.FromSeconds(60));

            runningWorkflows.TryAdd(instanceId, application);

            Console.WriteLine(instanceId + " resumed @ " + bookmarkName);
        }

        return result == BookmarkResumptionResult.Success;
    }

    public static void UnloadInstance(Guid instanceId)
    {
        if (!runningWorkflows.ContainsKey(instanceId))
        {
            return;
        }

        var workflow = runningWorkflows[instanceId];
        workflow.Unload();

        runningWorkflows.TryRemove(instanceId, out workflow);
    }

    #region Events 

    public static void OnWorkflowCompleted(WorkflowApplicationCompletedEventArgs e)
    {
        if (runningWorkflows != null && runningWorkflows.ContainsKey(e.InstanceId))
        {
            WorkflowApplication workflowApp;
            runningWorkflows.TryRemove(e.InstanceId, out workflowApp);
        }

        Console.WriteLine(e.CompletionState);
    }

    public static PersistableIdleAction OnIdleAndPersistable(WorkflowApplicationIdleEventArgs e)
    {
        return PersistableIdleAction.Unload;
    }

    public static void OnWorkflowAborted(WorkflowApplicationAbortedEventArgs e)
    {
        Console.WriteLine(e.Reason);
    }

    public static void OnWorkflowUnloaded(WorkflowApplicationEventArgs e)
    {
        if (runningWorkflows != null && runningWorkflows.ContainsKey(e.InstanceId))
        {
            WorkflowApplication workflowApp;
            runningWorkflows.TryRemove(e.InstanceId, out workflowApp);
        }

        Console.WriteLine(e.InstanceId + " unloaded");
    }

    public static UnhandledExceptionAction OnWorkflowException(WorkflowApplicationUnhandledExceptionEventArgs e)
    {
        //log the exception here using e.UnhandledException 
        return UnhandledExceptionAction.Terminate;
    }

    #endregion
}

As you may have noticed, the GenericWorkflowHost is a static class, so I chose to abstract it away behind an interface so that it’s nice and Unit-Test friendly:

   public interface IOrderApprovalWorkflowHost
   {
       void Resume(Guid instanceId, string xaml, int userId, bool isApproved);

       Guid Start(int orderId, string xaml);
   }

Which is implemented like so:

    public class OrderWorkflowApprovalHost : IOrderApprovalWorkflowHost
    {
        public void Resume(Guid instanceId, string xaml, int userId, bool isApproved)
        {
            var bookmark = string.Format("waitingFor_{0}", userId);

            GenericWorkflowHost.LoadInstanceWithBookmark(bookmark, instanceId, isApproved, xaml); 
        }

        public Guid Start(int orderId, string xaml)
        {
            var inputs = new Dictionary<string, object>()
                             {
                                 {"OrderId", orderId}
                             };  
            return GenericWorkflowHost.StartPersistableInstance(inputs, xaml);
        }

    }

 

Starting the Workflow Instance

Now that we have our workflow host it’s time to start the workflow. I like to abstract any workflow calls behind a service interface that the other layers can consume without the hard-dependency on the workflow libraries.

    public interface IOrderApprovalService
    {
        void SubmitApproval(int orderId, int userId, bool isApproved);

        void StartApprovalWorkflow(int orderId); 
    }

The StartApprovalWorkflow method is implemented like so:

        public void StartApprovalWorkflow(int orderId)
        {
            using (var unitOfWork = unitOfWorkFactory.Create())
            {
                var order = orderRepository.Get(orderId);

                var workflow = workflowRepository.Get(1); // this id should come from configuration 
                var instanceId = orderApprovalWorkflowHost.Start(orderId, workflow.Xaml);

                order.StartWorkflow(workflow, instanceId);// this sets the workflow xaml & instanceid

                unitOfWork.Commit(); 

            }
        }

All we do here is load the Order & workflow from their respective repositories and then start the workflow and assign the workflow to the order. All pretty simple stuff.

Resuming the Workflow Instance

Resuming the instance is as easy as loading the Order and then calling the IOrderApprovalWorkflowHost, passing the persisted Xaml & InstanceId that were saved when starting the workflow.

        public void SubmitApproval(int orderId, int userId, bool isApproved)
        {
            var order = orderRepository.Get(orderId);

            orderApprovalWorkflowHost
                .Resume(order.WorkflowInstanceId, order.WorkflowXaml, userId, isApproved);
        }

 

Conclusion

The approach I’ve outlined in this post has some major benefits:

  • Ability to change the workflow definition without code deployment
  • Ability to change workflow definition without invalidating persisted instances
  • Provides a simple way to view active instances

When you want to change the logic flow in a persisted instance, the simplest approach is either to force the workflow to end and then restart it, or to roll back any state changes the workflow has made, start a new instance and then delete the old one. This needs to be carefully thought out though, especially if any of the activities send emails.

Well that’s all I’ve got for this post, I’m really keen to hear any feedback and as always I’d be happy to answer any questions.