Extracting data with regular expressions in Microsoft Dynamics CRM 2011 processes

In last week’s blog post, I showed how to use regular expressions in Microsoft Dynamics CRM 2011 processes for data validation. Today I'll show how a similar approach can parse text and extract the matching strings in a Dynamics CRM 2011 process.

If you're unfamiliar with regular expressions, I recommend you take a look at my earlier post for an idea of how they work. You may also want to take a look at this MSDN article about how regular expressions work in .NET.

In my validation example, I used a custom workflow activity to do the following:

  1. Accept a string to validate and a regular expression match pattern via input arguments.
  2. Evaluate the regular expression.
  3. Return the result of the match via an output argument.

For extracting matching data, we will need a custom workflow activity that does this instead:

  1. Accept the following input arguments:
    • A string to parse for matches
    • A regular expression match pattern to evaluate
    • A parameter that specifies whether to return the first match, last match or all matches. (The .NET regular expression engine returns a collection of every match it finds, so we have some flexibility in what we can return.)
    • A string separator to use in a concatenated string if all matches are returned
  2. Evaluate the regular expression.
  3. Return the resulting match(es) via an output argument.

The code

First we set up our input and output arguments like so:

/// <summary>
/// This is the string that will be parsed
/// </summary>
[Input("String to parse")]
public InArgument<String> StringToParse { get; set; }
/// <summary>
/// This is the regular expression used for extraction - see C# regex quick reference here - http://msdn.microsoft.com/en-us/library/az24scfc.aspx
/// </summary>
[Input("Match pattern")]
public InArgument<String> MatchPattern { get; set; }
/// <summary>
/// This specifies whether to return the first match, the last match or all matches (in a concatenated string). Acceptable values are First|Last|All.
/// </summary>
[Input("Return type")]
public InArgument<String> ReturnType { get; set; }
/// <summary>
/// If "All" is specified as the return type, this is used as the string separator in the concatenated extract string
/// </summary>
[Input("String separator")]
public InArgument<String> StringSeparator { get; set; }
/// <summary>
/// This returns the matching string if one is found or an empty string if a match is not found
/// </summary>
[Output("Extracted string")]
public OutArgument<string> ExtractedString { get; set; }

Next we create an Enum type to represent the extraction type (first|last|all).

enum ExtractionType
{
	First,
	Last,
	All
}

Then we get all of our inputs ready to use:

//string to parse
string parseString = StringToParse.Get(executionContext);
//pattern to match
string matchPattern = MatchPattern.Get(executionContext);
//type of match to be returned - first match, last match or all matches
ExtractionType extractType;
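//treat a missing return type as an empty string so that the default case applies instead of a null reference exception
switch ((ReturnType.Get(executionContext) ?? string.Empty).Trim().ToUpperInvariant())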
{
	case "FIRST":
		extractType = ExtractionType.First;
		break;
	case "LAST":
		extractType = ExtractionType.Last;
		break;
	case "ALL":
		extractType = ExtractionType.All;
		break;
	default:
		//default will return first match only
		extractType = ExtractionType.First;
		break;
}
//separator to be used for an "all" match
string stringSeparator = StringSeparator.Get(executionContext);
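
As an alternative to the switch statement, the return type input could be mapped with Enum.TryParse, which is available in .NET 4. The following is only a minimal sketch: the ReturnTypeParser class and its Parse method are hypothetical names that are not part of the original activity, and the enum is repeated here (and made public) only so that the example is self-contained.

```csharp
using System;

// Mirrors the ExtractionType enum from the post.
public enum ExtractionType
{
    First,
    Last,
    All
}

// Hypothetical helper, not part of the original activity.
public static class ReturnTypeParser
{
    // Maps the "Return type" input to an ExtractionType, falling back to
    // First for null, empty or unrecognized values.
    public static ExtractionType Parse(string returnType)
    {
        ExtractionType parsed;
        // The second argument (ignoreCase: true) accepts "all", "All" and "ALL" alike.
        if (!string.IsNullOrWhiteSpace(returnType)
            && Enum.TryParse<ExtractionType>(returnType.Trim(), true, out parsed)
            // Enum.TryParse also accepts numeric strings such as "7",
            // so check that the result is a named member of the enum.
            && Enum.IsDefined(typeof(ExtractionType), parsed))
        {
            return parsed;
        }
        return ExtractionType.First;
    }
}
```

With this sketch, ReturnTypeParser.Parse("all") returns ExtractionType.All, while null, "bogus" and "7" all fall back to ExtractionType.First. One difference from the switch statement is that a numeric string matching a defined value, such as "1", is accepted and maps to the corresponding member (Last), so the explicit switch remains the stricter option.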

Finally we use this method to handle the actual matching and string extraction:

/// <summary>
/// method to evaluate a regular expression and return the first match, last match or all matches in a concatenated string
/// </summary>
/// <param name="parseString">string to parse</param>
/// <param name="matchPattern">regular expression to evaluate</param>
/// <param name="extractType">match(es) to return - first|last|all</param>
/// <param name="separator">string to use as a separator for an "all" match return type</param>
/// <returns>the matching string(s), or an empty string if no match is found</returns>
private string ExtractMatchingString(string parseString, string matchPattern, ExtractionType extractType, string separator)
{
	//set the default output to the empty string. if we match something, we'll change it.
	string output = string.Empty;
	//do the regex match
	MatchCollection matches = Regex.Matches(parseString, matchPattern);
	if (matches.Count > 0)
	{
		//which match(es) should we return?
		switch (extractType)
		{
			case ExtractionType.First:
				output = matches[0].Value;
				break;
			case ExtractionType.Last:
				output = matches[matches.Count - 1].Value;
				break;
			case ExtractionType.All:
				StringBuilder matchingSb = new StringBuilder();
				for (int i = 0; i < matches.Count; i++)
				{
					matchingSb.Append(matches[i].Value);
					if (i != matches.Count - 1)
					{
						matchingSb.Append(separator);
					}
				}
				output = matchingSb.ToString();
				break;
		}
	}
	return output;
}
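
The "all" branch above builds the concatenated string by hand with a StringBuilder. A shorter way to express the same idea, assuming LINQ is acceptable in your project, is string.Join. This is only a sketch: MatchJoiner and JoinAllMatches are hypothetical names that do not appear in the original code.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original activity.
public static class MatchJoiner
{
    // Concatenates the value of every match, placing the separator between
    // consecutive values. Returns an empty string when nothing matches.
    public static string JoinAllMatches(string parseString, string matchPattern, string separator)
    {
        MatchCollection matches = Regex.Matches(parseString, matchPattern);
        // In .NET 4, MatchCollection only implements the non-generic
        // IEnumerable, so Cast<Match>() is needed before Select.
        return string.Join(separator, matches.Cast<Match>().Select(m => m.Value));
    }
}
```

For example, MatchJoiner.JoinAllMatches("a1b22c333", @"\d+", ", ") returns "1, 22, 333". The behavior matches the loop in ExtractMatchingString: no trailing separator is added, and a null separator is treated as an empty string.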

Here's how we call it and return the result to the calling process:

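//evaluate the regex and return the match(es)
//tracingService is assumed to have been retrieved earlier in Execute, e.g. via executionContext.GetExtension<ITracingService>()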
try
{
	string extractedString = ExtractMatchingString(parseString, matchPattern, extractType, stringSeparator);
	ExtractedString.Set(executionContext, extractedString);
}
catch (Exception e)
{
	tracingService.Trace("Exception: {0}", e.ToString());
	throw;
}
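
One failure this try/catch block will surface is a malformed match pattern, because the .NET Regex class throws an ArgumentException when it cannot parse a pattern. If you would rather detect that case up front, a check along these lines could run before the match. It is only a sketch: PatternChecker and IsValidPattern are hypothetical names and are not part of the original activity.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original activity.
public static class PatternChecker
{
    // Returns true if the pattern can be parsed by the .NET regex engine.
    public static bool IsValidPattern(string pattern)
    {
        try
        {
            // The constructor parses the pattern and throws if it is malformed.
            new Regex(pattern);
            return true;
        }
        catch (ArgumentException)
        {
            // Also covers a null pattern, since ArgumentNullException
            // derives from ArgumentException.
            return false;
        }
    }
}
```

Here PatternChecker.IsValidPattern(@"\d{3}") returns true, while an unbalanced pattern such as "(" or a null pattern returns false. Whether to reject the input, fall back to an empty result or let the exception propagate as the original code does is a design choice for your process.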

The full code is available here.

Seeing it in action

I set up a simple dialog that takes a user input and then parses it for U.S. phone numbers to display to the user. The dialog calls the custom workflow activity three times to get the first match, last match and all matches. Here's how I configured the inputs for the "all" scenario:
[Screenshot: custom-activity-inputs.PNG]

When I run the dialog, here's the first page where the input is entered:
[Screenshot: dialog-0.PNG]

And here's what the dialog displays to the user after the matching is complete. You'll see each type of match string is displayed on a separate line:
[Screenshot: dialog-1.PNG]
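
To try the same idea outside of CRM, here is a small standalone sketch of extracting U.S. phone numbers from a block of text. The match pattern is an assumption chosen for illustration and is not necessarily the one configured in the dialog above. PhoneNumberDemo and FindPhoneNumbers are likewise hypothetical names.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical demo class, not part of the original activity.
public static class PhoneNumberDemo
{
    // Assumed pattern: an optional parenthesized area code followed by
    // 3 + 4 digits, with optional '-', '.' or space separators.
    public const string UsPhonePattern = @"\(?\d{3}\)?[-. ]?\d{3}[-. ]?\d{4}";

    // Returns every phone-number-like substring in the order it appears.
    public static string[] FindPhoneNumbers(string input)
    {
        return Regex.Matches(input, UsPhonePattern)
            .Cast<Match>()
            .Select(m => m.Value)
            .ToArray();
    }
}
```

Given the input "Call 555-123-4567 or (555) 987-6543.", FindPhoneNumbers returns two values. The first element, "555-123-4567", corresponds to the "first" return type, the last element, "(555) 987-6543", corresponds to "last", and joining the array with a separator corresponds to "all". A pattern this loose will also match other ten-digit sequences, so tighten it to suit your data.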

Could you see your organization using this to improve its Dynamics CRM business processes? What sorts of efficiencies do you think regular expression string extraction could help you achieve? Let us know in the comments!

A version of this post was originally published on the HP Enterprise Services Application Services blog.
