Regular Expression / Regex Classes and Objects in .NET
written by Jake Bricker with a few modifications by FLF
11-17-13, Ver. 2
NOTE to CIS 3309 Fall 2013 Students -- Use these ideas and Regex to validate bank customer account numbers (letters, upper or lower case, and digits, starting with an upper case letter; length 8 or more characters) and PINs (exactly 4 digits, 0-9). You will need to investigate how to specify the length of a target string.
Examples: Fluffie42, 1234
You might want to examine the examples in this document while looking at the explanations. Particularly look at the examples at the end of the document.
Regex class information taken from MSDN Entry of stated class
Url:
Link to Regular Expression Wikipedia Article:
Regex Class and Objects
The Regex class represents the .NET Framework’s regular expression engine. It can be used to quickly parse large amounts of text to find specific character patterns; to validate strings; to extract, edit, replace, or delete text substrings; and to add the extracted strings to a collection to generate a report. All Regex pattern-matching methods include both static (Shared in Visual Basic) and instance overloads. Static methods can be invoked without first creating a Regex object (see Example 1). Once an instance is created, it is immutable -- that is, its pattern cannot be changed.
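The difference between the static and instance overloads described above can be sketched as follows. This is only an illustration: the class name StaticVersusInstanceSketch and its helper methods are hypothetical and do not come from the MSDN entry.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StaticVersusInstanceSketch
{
    // Static overload: no Regex object is created by the caller.
    public static bool IsLettersStatic(string input)
    {
        return Regex.IsMatch(input, @"^[a-zA-Z]+$");
    }

    // Instance: the pattern is fixed when the object is constructed
    // and cannot be changed afterwards (the object is immutable).
    private static readonly Regex LettersOnly = new Regex(@"^[a-zA-Z]+$");

    public static bool IsLettersInstance(string input)
    {
        return LettersOnly.IsMatch(input);
    }

    public static void Main()
    {
        Console.WriteLine(IsLettersStatic("Fluffie"));     // True
        Console.WriteLine(IsLettersInstance("Fluffie42")); // False (contains digits)
    }
}
```

An instance is convenient when the same pattern is applied many times; the static call is convenient for a one-off check.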
Inheritance Hierarchy
- System.Object
- System.Text.RegularExpressions.Regex
Example 1 - How I Used a Regular Expression in My Team's Slot Machine Game:
public bool ValidateName(string name)
{
    // @" starts a verbatim string literal: backslashes in the pattern are passed
    // to the regex engine unchanged instead of being treated as C# escapes
    // ^ match must start at the beginning of the string to be validated
    // [a-zA-Z]+ one or more characters, each an upper or lower case letter
    // $ match must continue (in our case) to the end of the string or end of line
    // Can create a Regex object and call its IsMatch as in the two commented
    // lines below, or use the static overload as Jake did
    // Regex rgx = new Regex(@"^[a-zA-Z]+$");
    // bool validName = rgx.IsMatch(name);
    bool validName = Regex.IsMatch(name, @"^[a-zA-Z]+$");
    if (validName == false) {
        MessageBox.Show("Invalid Name: Name must contain only A-Z or a-z", "Name Entry Error");
        return false;
    } //end if
    return true;
} //end ValidateName
Example 2:
public bool ValidateName(string name)
{
    bool validName;
    /*
     * ^  --> anchor stating that the match must start at the beginning of the string to
     *        validate against
     * [] --> matches any single character contained within; anything that falls in between
     *        these brackets is called a character group and by default is case-sensitive
     *        the - in between a-z and A-Z designates a character range; the range is
     *        determined by Unicode order
     * +  --> characters in target string must match the previous element one or more times
     * $  --> the match must occur at the end of the string or before the \n character at
     *        the end of the string
     */
    // String variable to hold the pattern we will use to check for valid characters
    string characterMatch = @"^[a-zA-Z]+$";
    // Calling the Boolean IsMatch method in the Regex class; pass the variable itself,
    // not a quoted string containing its name
    validName = Regex.IsMatch(name, characterMatch);
    if (validName == false) {
        MessageBox.Show("Invalid Name: Name must be A-Z or a-z", "Name Entry Error");
        return false;
    } //end if
    /* If the pattern matches 100% of the characters in the name string then the method
     * will fall through the if statement and return true.
     * If the pattern DOES NOT match 100% of the characters in the name string, IsMatch
     * will return false
     */
    return true;
} //end ValidateName
Regex (Regular) Expressions - rules for the formulation of Well-Formed Formulas (valid expressions):
The ‘@’ character placed immediately before the opening “ of a string literal selects C# verbatim string syntax: backslashes are not treated as C# escape sequences, so the pattern reaches the regex engine exactly as written.
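To see what the '@' prefix changes, the following minimal sketch compares a verbatim literal with an ordinary literal that spells out the same pattern. The class and method names are hypothetical and used only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class VerbatimStringSketch
{
    // Returns true when both literals produce the same pattern text.
    public static bool SamePattern()
    {
        string verbatim = @"^\d+$";   // backslash kept as written
        string regular = "^\\d+$";    // backslash must be doubled without @
        return verbatim == regular;
    }

    public static void Main()
    {
        Console.WriteLine(SamePattern());                    // True
        Console.WriteLine(Regex.IsMatch("1234", @"^\d+$"));  // True
    }
}
```

Both forms hand the same characters to the regex engine; the verbatim form is simply easier to read when the pattern contains backslashes.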
Most Likely Character Escapes:
The backslash character ‘\’ in a regular expression indicates that the character that follows it is either a special character or should be interpreted literally.
- \t: matches a tab
- \n: matches a new line
- \r: matches a carriage return
- \e: matches an escape
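A short sketch of the escapes listed above, plus the "interpret literally" case (a backslash in front of a period). The class name and the sample strings are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapeSketch
{
    public static bool HasTab(string s)      { return Regex.IsMatch(s, @"\t"); }
    public static bool HasCrLf(string s)     { return Regex.IsMatch(s, @"\r\n"); }
    public static bool HasEscape(string s)   { return Regex.IsMatch(s, @"\e"); }
    // \. matches a literal period; an unescaped . would match any character.
    public static bool HasLiteralDot(string s) { return Regex.IsMatch(s, @"a\.b"); }

    public static void Main()
    {
        Console.WriteLine(HasTab("a\tb"));            // True
        Console.WriteLine(HasCrLf("line1\r\nline2")); // True
        Console.WriteLine(HasEscape("\u001B[0m"));    // True
        Console.WriteLine(HasLiteralDot("a.b"));      // True
        Console.WriteLine(HasLiteralDot("axb"));      // False
    }
}
```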
Most Likely Character Groupings (also called character classes; unrelated to classes in the object-oriented sense):
A character class matches any one of a set of characters.
- [character_group]: Matches any single character in character_group. By default, the match is case-sensitive
- [^character_group]: Matches any single character that is NOT in character_group. By default, characters in character_group are case-sensitive
- [first-last]: Character range. Matches any single character in the range from first to last inclusive; one may define more than one range inside the brackets
- \s: Matches any white-space character
- \S: Matches any non-white-space character
Note that "white-space" includes the following set of characters (in .NET, \s also matches other Unicode separator characters)
{blank, vertical tab, horizontal tab, form feed, newline, carriage return}
or { ' ', '\v', '\t', '\f', '\n', '\r' }
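The character groupings above can be tried out with a sketch like the one below. The class name, method names, and sample inputs are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CharacterClassSketch
{
    // [abc]: first character must be a, b, or c.
    public static bool StartsWithAbc(string s)    { return Regex.IsMatch(s, @"^[abc]"); }
    // [^abc]: first character must be anything except a, b, or c.
    public static bool StartsWithNonAbc(string s) { return Regex.IsMatch(s, @"^[^abc]"); }
    // Two ranges in one group: a single lower case hexadecimal digit.
    public static bool IsHexDigit(string s)       { return Regex.IsMatch(s, @"^[0-9a-f]$"); }
    // \s: replace every white-space character with an underscore.
    public static string MarkWhiteSpace(string s) { return Regex.Replace(s, @"\s", "_"); }
    // \S: count the non-white-space characters.
    public static int CountNonWhiteSpace(string s) { return Regex.Matches(s, @"\S").Count; }

    public static void Main()
    {
        Console.WriteLine(StartsWithAbc("cat"));        // True
        Console.WriteLine(StartsWithNonAbc("dog"));     // True
        Console.WriteLine(IsHexDigit("7"));             // True
        Console.WriteLine(MarkWhiteSpace("a b\tc"));    // a_b_c
        Console.WriteLine(CountNonWhiteSpace("a b\tc")); // 3
    }
}
```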
Most Likely Anchors:
Anchors, or atomic zero-width assertions, cause a match to succeed or fail depending on the current position in the string, but they do not cause the engine to advance through the string or consume characters. (In other words, these symbols simply describe where a match must occur: at the beginning of a string, at the end, or where the last match ended.)
- ^—The match must start at the beginning of the string or line
- $—The match must occur at the end of the string or before \n at the end of the line or string
- \G—The match must occur at the point where the previous match ended
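A small sketch of how the three anchors above behave. The class name, method names, and test strings are illustrative only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchorSketch
{
    // ^ : "abc" must appear at the very start of the input.
    public static bool StartsWithAbc(string s) { return Regex.IsMatch(s, @"^abc"); }
    // $ : "abc" must appear at the end, or just before a final \n.
    public static bool EndsWithAbc(string s)   { return Regex.IsMatch(s, @"abc$"); }
    // \G : each match must begin exactly where the previous one ended,
    // so only the unbroken run of a's at the start is counted.
    public static int LeadingAs(string s)      { return Regex.Matches(s, @"\Ga").Count; }

    public static void Main()
    {
        Console.WriteLine(StartsWithAbc("xabc"));  // False
        Console.WriteLine(EndsWithAbc("abc\n"));   // True
        Console.WriteLine(LeadingAs("aaabaa"));    // 3
        Console.WriteLine(Regex.Matches("aaabaa", "a").Count); // 5 without \G
    }
}
```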
Link to Regular Expression Language—Quick Reference(MSDN)—
How to Find Repeated occurrences of words with a Regular Expression Object (OPTIONAL):
Pattern / Description
\b / Start the match at a word boundary.
(?<word>\w+) / Match one or more word characters up to a word boundary. Name this captured group "word".
\s+ / Match one or more white-space characters.
(\k<word>) / Match the captured group that is named "word".
\b / Match a word boundary.
Example 3:
using System;
using System.Text.RegularExpressions;
public class Test
{
    public static void Main()
    {
        // Define a regular expression for repeated words.
        Regex rx = new Regex(@"\b(?<word>\w+)\s+(\k<word>)\b",
            RegexOptions.Compiled | RegexOptions.IgnoreCase);
        // Define a test string (note the two spaces between "fox" and "fox";
        // the positions reported below depend on them).
        string text = "The the quick brown fox  fox jumped over the lazy dog dog.";
        // Find matches.
        MatchCollection matches = rx.Matches(text);
        // Report the number of matches found.
        Console.WriteLine("{0} matches found in:\n {1}", matches.Count, text);
        // Report on each match.
        foreach (Match match in matches)
        {
            GroupCollection groups = match.Groups;
            // groups[0] is the whole match; groups[1] is the unnamed group
            // holding the second occurrence of the word.
            Console.WriteLine("'{0}' repeated at positions {1} and {2}",
                groups["word"].Value,
                groups[0].Index,
                groups[1].Index);
        }
    }
}
// The example produces the following output to the console:
// 3 matches found in:
//  The the quick brown fox  fox jumped over the lazy dog dog.
// 'The' repeated at positions 0 and 4
// 'fox' repeated at positions 20 and 25
// 'dog' repeated at positions 50 and 54
Example taken from MSDN Example—use of Regular Expressions
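The same repeated-word pattern can also be used to edit text rather than just report on it. The sketch below is not part of the MSDN example; it assumes the goal is to collapse each doubled word into a single occurrence, and the class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RemoveRepeatedWordsSketch
{
    // Replace each "word word" pair with the first occurrence only.
    // ${word} in the replacement string refers to the named group.
    public static string Collapse(string text)
    {
        return Regex.Replace(text, @"\b(?<word>\w+)\s+\k<word>\b", "${word}",
            RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        string text = "The the quick brown fox fox jumped over the lazy dog dog.";
        Console.WriteLine(Collapse(text));
        // The quick brown fox jumped over the lazy dog.
    }
}
```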
More Examples of Regex Use:
Complete Set of Jake Bricker’s Examples – Updated Sunday 11-17 11:38AM
// Validate target string with all digits
public bool IsValidNumber(string numberToValidate)
{
    // Alternative: bool validNumber = Regex.IsMatch(numberToValidate, @"^\d+$");
    // Caution: this may accept more than intended. Why? (In .NET, \d matches any
    // Unicode decimal digit, not only 0-9.)
    // Alternative: bool validNumber = Regex.IsMatch(numberToValidate, @"^[0-9]+$");
    bool validNumber = Regex.IsMatch(numberToValidate, @"^[0123456789]+$");
    if (validNumber == false)
    {
        MessageBox.Show("Invalid Number: Please Try Again.");
        return false;
    } //end if
    else
    {
        MessageBox.Show("Valid Number");
    } //end if-else
    return true;
} //end IsValidNumber
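One detail relevant to the \d alternative mentioned in the comments above: in .NET, \d matches any Unicode decimal digit, while [0-9] matches only the ten ASCII digits. The sketch below illustrates the difference; the class name and method names are hypothetical, and the sample input is the Arabic-Indic digits three and four written as Unicode escapes.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DigitClassSketch
{
    public static bool MatchesBackslashD(string s) { return Regex.IsMatch(s, @"^\d+$"); }
    public static bool MatchesAsciiRange(string s) { return Regex.IsMatch(s, @"^[0-9]+$"); }
    // RegexOptions.ECMAScript restricts \d to the ASCII digits 0-9.
    public static bool MatchesEcmaD(string s)
    {
        return Regex.IsMatch(s, @"^\d+$", RegexOptions.ECMAScript);
    }

    public static void Main()
    {
        string arabicIndic = "\u0663\u0664";
        Console.WriteLine(MatchesBackslashD(arabicIndic)); // True
        Console.WriteLine(MatchesAsciiRange(arabicIndic)); // False
        Console.WriteLine(MatchesEcmaD(arabicIndic));      // False
        Console.WriteLine(MatchesAsciiRange("1234"));      // True
    }
}
```

For something like a PIN that must consist of the characters 0-9 only, the explicit range is the safer choice.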
// Validate target string with one or more letters (English Characters)
public bool IsValidEnglishCharacters(string name)
{
    bool validName = Regex.IsMatch(name, @"^[a-zA-Z]+$");
    // Note that putting [A-Z] immediately after the ^ forces an upper case
    // letter to come first
    if (validName == false)
    {
        MessageBox.Show("Invalid Name: Name must be A-Z or a-z", "Name Entry Error.");
        return false;
    } //end if-then
    else
    {
        MessageBox.Show("Valid name.");
    } // end if-else
    return true;
} //end IsValidEnglishCharacters
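The note in the code above about placing [A-Z] right after the ^ can be sketched as follows. The pattern shown is one possible reading of that note, and the class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LeadingCapitalSketch
{
    // First character must be an upper case letter; any remaining
    // characters (zero or more) may be upper or lower case letters.
    public static bool IsCapitalizedWord(string s)
    {
        return Regex.IsMatch(s, @"^[A-Z][a-zA-Z]*$");
    }

    public static void Main()
    {
        Console.WriteLine(IsCapitalizedWord("Fluffie")); // True
        Console.WriteLine(IsCapitalizedWord("fluffie")); // False
        Console.WriteLine(IsCapitalizedWord("F"));       // True
    }
}
```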
// Validate a target string of alphanumeric characters (letters and numbers)
public bool CheckForArabicNumbersAndEnglishLetters(string stringToCheck)
{
    //bool validMatch = Regex.IsMatch(stringToCheck, @"^[\w]+$");
    bool validMatch = Regex.IsMatch(stringToCheck, @"^[a-zA-Z0-9]+$");
    if (validMatch == false)
    {
        MessageBox.Show("Invalid Entry: " + stringToCheck + " must be A-Z or a-z or 0-9");
        return false;
    } //end if-then
    else
    {
        MessageBox.Show("Valid English Characters and Arabic Numbers");
    } // end if-else
    return true;
} //end CheckForArabicNumbersAndEnglishLetters
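The commented-out \w pattern in the method above is not equivalent to [a-zA-Z0-9]: \w also accepts the underscore and letters outside the English alphabet. A brief sketch, with hypothetical class and method names and made-up inputs:

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordCharacterSketch
{
    public static bool MatchesWordChars(string s)    { return Regex.IsMatch(s, @"^\w+$"); }
    public static bool MatchesAlphanumeric(string s) { return Regex.IsMatch(s, @"^[a-zA-Z0-9]+$"); }

    public static void Main()
    {
        Console.WriteLine(MatchesWordChars("user_name"));     // True
        Console.WriteLine(MatchesAlphanumeric("user_name"));  // False (underscore)
        Console.WriteLine(MatchesWordChars("caf\u00E9"));     // True
        Console.WriteLine(MatchesAlphanumeric("caf\u00E9"));  // False (accented letter)
        Console.WriteLine(MatchesAlphanumeric("Fluffie42"));  // True
    }
}
```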
// Validate that a target string is exactly n = stringLength characters long
// Try ^.{8,12}$ to limit a string to 8-12 characters
// Try ^.{4}$ to limit a string to exactly 4 characters
// (without the ^ and $ anchors, .{n} matches any string of n OR MORE characters)
public bool CheckStringLength(string stringToCheck, int stringLength)
{
    //converts the stringLength integer to a string
    string stringLengthAsString = stringLength.ToString();
    //building the pattern to be used in the Regex.IsMatch function
    string matchPattern = "^.{" + stringLengthAsString + "}$";
    //the length can also be explicitly stated...uncomment this line and comment out the
    // previous line and you will get the same result for stringLength = 3
    //bool validStringLength = Regex.IsMatch(stringToCheck, @"^.{3}$");
    bool validStringLength = Regex.IsMatch(stringToCheck, matchPattern);
    if (validStringLength == false)
    {
        MessageBox.Show("Invalid String length" + "\n" + "String must be " +
            stringLengthAsString + " characters long");
        return false;
    } //end if-then
    else
    {
        MessageBox.Show("Valid string length");
    } //end if-else
    return true;
} //end CheckStringLength
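The length checks above rely on counted quantifiers. The sketch below shows the forms {n}, {n,m}, and {n,} and why the ^ and $ anchors matter when the whole string must have a given length. The class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LengthQuantifierSketch
{
    // Unanchored: true if the input CONTAINS a run of 3 characters.
    public static bool ContainsThree(string s)  { return Regex.IsMatch(s, @".{3}"); }
    // Anchored: true only if the input is exactly 3 characters long.
    public static bool ExactlyThree(string s)   { return Regex.IsMatch(s, @"^.{3}$"); }
    // {n,m}: between 8 and 12 characters.
    public static bool EightToTwelve(string s)  { return Regex.IsMatch(s, @"^.{8,12}$"); }
    // {n,}: 8 or more characters.
    public static bool EightOrMore(string s)    { return Regex.IsMatch(s, @"^.{8,}$"); }

    public static void Main()
    {
        Console.WriteLine(ContainsThree("abcdef"));     // True
        Console.WriteLine(ExactlyThree("abcdef"));      // False
        Console.WriteLine(ExactlyThree("abc"));         // True
        Console.WriteLine(EightToTwelve("abcdefghij")); // True
        Console.WriteLine(EightOrMore("abcdefg"));      // False (only 7 characters)
    }
}
```

The quantifier can follow any element, not only the period; for example, a character group followed by {4} requires exactly four characters from that group.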
Regex Document by Jake Bricker Version 2 10/2/2018 2:29 PM