Regular Expressions in .NET


SLIDE 1

Regular Expressions in .NET

By: Nasser Alshammari

College of Science, Department of Computer Science, Old Dominion University

SLIDE 2

Outline

• Regular Expressions?
• Why do we need them?
• RE language
• RE in .NET
• Conclusion

SLIDE 3

Regular Expressions?

• A regular expression (regex) is a special string that describes a search pattern.
• Regexes provide flexible and easy string matching.
• Syntax:
  - POSIX BRE
  - POSIX ERE
  - POSIX character classes
• Example:
    \b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b
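As a minimal sketch, the example pattern could be applied in C# as shown here. The class name EmailDemo, the method name, and the sample inputs are illustrative assumptions, not part of the slides. The pattern lists only uppercase letters, so the sketch pairs it with RegexOptions.IgnoreCase.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EmailDemo
{
    // Pattern from the slide. IgnoreCase is an assumption added here because
    // the character classes list only uppercase letters.
    private static readonly Regex EmailPattern = new Regex(
        @"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b",
        RegexOptions.IgnoreCase);

    // Returns true if the input contains text shaped like an email address.
    public static bool ContainsEmail(string input)
    {
        return EmailPattern.IsMatch(input);
    }

    public static void Main()
    {
        Console.WriteLine(ContainsEmail("Contact: someone@example.com")); // True
        Console.WriteLine(ContainsEmail("no address here"));              // False
    }
}
```

Note that {2,4} limits the final label to two to four letters, so longer top-level domains would not match.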

SLIDE 4

Why do we need them?

• Security: validating input against a pattern helps guard against:
  - Buffer overflow attacks.
  - SQL injection attacks.
  - Cross-site scripting attacks.
• Easy and powerful
• Flexibility

SLIDE 5

RE language

• Metacharacter: a character that has a special meaning.

Metacharacter             Description
.                         Any single character. a.c matches "abc".
[ ]                       A single character contained within the brackets.
                          [abc] matches "a", "b", or "c".
                          [a-z] matches any character in this range.
[^ ]                      A single character that is not contained within the brackets.
                          [^abc] matches any character other than "a", "b", or "c".
^                         Starting position of the string.
$                         Ending position of the string.
BRE \( \)   ERE ( )       Defines a group or subexpression.
*                         Matches the preceding element zero or more times.
                          ab*c matches "ac", "abc", "abbc", etc.
BRE \{m,n\}   ERE {m,n}   Matches the preceding element at least m and not more than n times.
                          a{3,5} matches only "aaa", "aaaa", and "aaaaa".
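The metacharacters above can be tried out in C# with a sketch like the following. .NET uses the ERE-style forms ( ) and {m,n}. The class MetacharDemo and the helper MatchesWhole are hypothetical names introduced for illustration; the helper wraps the pattern in \A and \z so that the whole input has to match.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MetacharDemo
{
    // Wraps the pattern in \A(?: ... )\z so the entire input must match,
    // not just a substring of it.
    public static bool MatchesWhole(string input, string pattern)
    {
        return Regex.IsMatch(input, @"\A(?:" + pattern + @")\z");
    }

    public static void Main()
    {
        Console.WriteLine(MatchesWhole("abc", "a.c"));      // True
        Console.WriteLine(MatchesWhole("b", "[abc]"));      // True
        Console.WriteLine(MatchesWhole("d", "[^abc]"));     // True
        Console.WriteLine(MatchesWhole("abbc", "ab*c"));    // True
        Console.WriteLine(MatchesWhole("aaaa", "a{3,5}"));  // True
        Console.WriteLine(MatchesWhole("aa", "a{3,5}"));    // False
    }
}
```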

SLIDE 6

Continued

• More examples:
  - .at matches any three-character string ending with "at", including "hat", "cat", and "bat".
  - [hc]at matches "hat" and "cat".
  - [^b]at matches all strings matched by .at except "bat".
  - ^[hc]at matches "hat" and "cat", but only at the beginning of the string or line.
  - [hc]at$ matches "hat" and "cat", but only at the end of the string or line.
  - \[.\] matches any single character surrounded by "[" and "]", since the brackets are escaped; for example, "[a]" and "[b]".
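The "string or line" distinction for ^ can be sketched in C#. In .NET, ^ matches only at the start of the string by default and also at the start of each line when RegexOptions.Multiline is set. The class AnchorDemo, its method, and the sample text are assumptions made for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchorDemo
{
    // Counts matches of ^[hc]at. With Multiline, ^ also matches after each '\n'.
    public static int CountAtLineStart(string text, bool multiline)
    {
        RegexOptions options = multiline ? RegexOptions.Multiline : RegexOptions.None;
        return Regex.Matches(text, "^[hc]at", options).Count;
    }

    public static void Main()
    {
        string text = "hat trick\ncat nap\nthe bat";
        Console.WriteLine(CountAtLineStart(text, false)); // 1
        Console.WriteLine(CountAtLineStart(text, true));  // 2
    }
}
```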

SLIDE 7

Continued

• POSIX ERE metacharacters

?    Matches the preceding element zero or one time.
     For example, ba? matches "b" or "ba".
+    Matches the preceding element one or more times.
     For example, ba+ matches "ba", "baa", "baaa", and so on.
|    The choice (also known as alternation or set union) operator matches
     either the expression before or the expression after the operator.

• Examples
  - [hc]+at matches "hat", "cat", "hhat", "chat", "hcat", "ccchat", and so on, but not "at".
  - [hc]?at matches "hat", "cat", and "at".
  - cat|dog matches "cat" or "dog".
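A short C# sketch of these quantifiers and of alternation follows. EreDemo and FullMatch are hypothetical names. The last call shows that spaces around | are treated as literal characters unless RegexOptions.IgnorePatternWhitespace is used.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EreDemo
{
    // Requires the whole input to match the given pattern.
    public static bool FullMatch(string input, string pattern)
    {
        return Regex.IsMatch(input, @"\A(?:" + pattern + @")\z");
    }

    public static void Main()
    {
        Console.WriteLine(FullMatch("ccchat", "[hc]+at"));  // True
        Console.WriteLine(FullMatch("at", "[hc]+at"));      // False
        Console.WriteLine(FullMatch("at", "[hc]?at"));      // True
        Console.WriteLine(FullMatch("dog", "cat|dog"));     // True
        // The alternatives here are "cat " and " dog", spaces included.
        Console.WriteLine(FullMatch("dog", "cat | dog"));   // False
    }
}
```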

SLIDE 8

Continued

• POSIX character classes

Many character ranges depend on the locale settings (for example, in some settings letters are ordered abc...zABC...Z, while in others they are ordered aAbBcC...zZ).

POSIX       .NET   ASCII            Description
[:alnum:]          [A-Za-z0-9]      Alphanumeric characters
[:word:]    \w     [A-Za-z0-9_]     Alphanumeric characters plus "_"
[:alpha:]          [A-Za-z]         Alphabetic characters
[:blank:]          [ \t]            Space and tab
[:digit:]   \d     [0-9]            Digits
[:space:]   \s     [ \t\r\n\v\f]    Whitespace characters
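The ASCII column is worth checking against .NET's defaults. By default, \d in .NET matches any Unicode decimal digit, and it is restricted to [0-9] when RegexOptions.ECMAScript is set. The sketch below illustrates this with Arabic-Indic digits; the class and method names are assumptions.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DigitClassDemo
{
    // Default behavior: \d matches any Unicode decimal digit.
    public static bool IsDigitsDefault(string s)
    {
        return Regex.IsMatch(s, @"^\d+$");
    }

    // ECMAScript behavior: \d is equivalent to [0-9].
    public static bool IsDigitsEcma(string s)
    {
        return Regex.IsMatch(s, @"^\d+$", RegexOptions.ECMAScript);
    }

    // Explicit ASCII range, independent of options.
    public static bool IsDigitsAscii(string s)
    {
        return Regex.IsMatch(s, "^[0-9]+$");
    }

    public static void Main()
    {
        string arabicIndic = "\u0661\u0662\u0663"; // Arabic-Indic digits 1, 2, 3
        Console.WriteLine(IsDigitsDefault(arabicIndic)); // True
        Console.WriteLine(IsDigitsEcma(arabicIndic));    // False
        Console.WriteLine(IsDigitsAscii(arabicIndic));   // False
        Console.WriteLine(IsDigitsAscii("123"));         // True
    }
}
```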

SLIDE 9

RE in .NET

• .NET's regex flavor is feature-rich.
• RegularExpressionValidator server control.
• System.Text.RegularExpressions namespace.
• Examples:

Pattern:      ^[a-zA-Z''-'\s]{1,40}$
Example:      Ravi Mukkamala, O'Dell
Description:  Allows uppercase or lowercase letters, apostrophes, or whitespace, from 1 to 40 characters.

Pattern:      ^\d{3}-\d{2}-\d{4}$
Example:      000-11-2222
Description:  Allows only digits separated by hyphens, in groups of 3, 2, and 4.

Pattern:      ^\d+$
Example:      123
Description:  Non-negative integer (one or more digits; "0" also matches).

Pattern:      ^\d+(\.\d\d)?$
Example:      5.00
Description:  Non-negative currency amount. If there is a decimal point, it requires 2 numeric characters after the decimal point. For example, 3.00 is valid but 3.1 is not.

Pattern:      ^(-)?\d+(\.\d\d)?$
Example:      -2.44
Description:  Positive or negative currency amount.
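As a rough sketch, two of the patterns above can be checked in C# as follows. The class InputPatterns and its method names are hypothetical; the patterns are the ones from the slide.

```csharp
using System;
using System.Text.RegularExpressions;

public static class InputPatterns
{
    // Three digits, two digits, four digits, separated by hyphens.
    public static bool MatchesDigitGroups(string s)
    {
        return Regex.IsMatch(s, @"^\d{3}-\d{2}-\d{4}$");
    }

    // Optional minus sign, digits, and optionally a point with exactly two digits.
    public static bool IsCurrency(string s)
    {
        return Regex.IsMatch(s, @"^(-)?\d+(\.\d\d)?$");
    }

    public static void Main()
    {
        Console.WriteLine(MatchesDigitGroups("000-11-2222")); // True
        Console.WriteLine(IsCurrency("3.00"));                // True
        Console.WriteLine(IsCurrency("3.1"));                 // False
        Console.WriteLine(IsCurrency("-2.44"));               // True
    }
}
```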

SLIDE 10

Continued

• Using the RegularExpressionValidator server control

    <%@ language="C#" %>
    <form ID="form1" runat="server">
      <asp:TextBox ID="txtName" runat="server" />
      <asp:Button ID="btnSubmit" runat="server" Text="Submit" />
      <asp:RegularExpressionValidator ID="regexpName" runat="server"
          ErrorMessage="This expression does not validate."
          ControlToValidate="txtName"
          ValidationExpression="^[a-zA-Z'.\s]{1,40}$" />
    </form>

SLIDE 11

Continued

• Using System.Text.RegularExpressions
• The Regex class
  - IsMatch: returns true if a match is found.
  - Match: returns a Match object; its Success property indicates whether a match was found.
  - Matches: returns a MatchCollection object.
  - Replace: replaces matched text with another string.
  - Split: splits a string at regex matches and returns a string[].

    // Instance method:
    Regex reg = new Regex(@"^[a-zA-Z'.]{1,40}$");
    Response.Write(reg.IsMatch(txtName.Text));

    // Static method:
    if (!Regex.IsMatch(txtName.Text, @"^[a-zA-Z'.]{1,40}$"))
    {
        // Name does not match
    }
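The other static methods listed above follow the same calling pattern. A minimal console sketch, with hypothetical helper names and sample strings chosen only for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexMethodsDemo
{
    // Split: breaks the input at commas, ignoring whitespace around them.
    public static string[] SplitOnCommas(string csv)
    {
        return Regex.Split(csv, @"\s*,\s*");
    }

    // Replace: substitutes '#' for every digit.
    public static string MaskDigits(string s)
    {
        return Regex.Replace(s, @"\d", "#");
    }

    // Matches: counts runs of word characters.
    public static int CountWords(string s)
    {
        return Regex.Matches(s, @"\w+").Count;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join("|", SplitOnCommas("a, b ,c"))); // a|b|c
        Console.WriteLine(MaskDigits("call 555-1234"));                // call ###-####
        Console.WriteLine(CountWords("one two  three"));               // 3
    }
}
```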

SLIDE 12

Continued

• Regex.Replace

    using System;
    using System.Text.RegularExpressions;

    public class Example
    {
        public static void Main()
        {
            string input = "This is   text with   far  too   much   " +
                           "whitespace.";
            // Verbatim string so that \s reaches the regex engine unchanged.
            string pattern = @"\s+";
            string replacement = " ";
            Regex rgx = new Regex(pattern);
            string result = rgx.Replace(input, replacement);

            Console.WriteLine("Original String: {0}", input);
            Console.WriteLine("Replacement String: {0}", result);
        }
    }
    // The example displays the following output:
    //    Original String: This is   text with   far  too   much   whitespace.
    //    Replacement String: This is text with far too much whitespace.

SLIDE 13

Continued

• Regex.Split

    string input = @"07/14/2007";
    string pattern = @"(-)|(/)";
    Regex regex = new Regex(pattern);
    foreach (string result in regex.Split(input))
    {
        Console.WriteLine("'{0}'", result);
    }
    // Under .NET 1.0 and 1.1, the method returns an array of
    // 3 elements, as follows:
    //    '07'
    //    '14'
    //    '2007'
    // Under .NET 2.0 and later, the method returns an array of
    // 5 elements, as follows:
    //    '07'
    //    '/'
    //    '14'
    //    '/'
    //    '2007'
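The extra elements come from the capturing groups: Regex.Split includes captured text in the result. If only the date parts are wanted, one option is a pattern without capturing groups. The class and method names in this sketch are assumptions.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SplitDemo
{
    // A character class has no capturing group, so the separators are dropped.
    public static string[] SplitDate(string date)
    {
        return Regex.Split(date, "[-/]");
    }

    public static void Main()
    {
        foreach (string part in SplitDate("07/14/2007"))
        {
            Console.WriteLine("'{0}'", part);
        }
        // Prints:
        //    '07'
        //    '14'
        //    '2007'
    }
}
```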

SLIDE 14

Continued

• Comments

    Regex regex = new Regex(@"
        ^            # anchor at the start
        (?=.*\d)     # must contain at least one numeric character
        (?=.*[a-z])  # must contain one lowercase character
        (?=.*[A-Z])  # must contain one uppercase character
        .{8,10}      # from 8 to 10 characters in length
        \s           # as written, requires one whitespace character here
        $            # anchor at the end",
        RegexOptions.IgnorePatternWhitespace);
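As written, the \s element requires a whitespace character after the 8 to 10 characters rather than merely allowing one. The sketch below is a variant that drops \s, on the assumption that the intent is a plain 8 to 10 character password check. The class name, method name, and sample passwords are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PasswordDemo
{
    // Variant of the commented pattern without the \s element (an assumption
    // about the intended rule, not the slide's original pattern).
    private static readonly Regex PasswordPattern = new Regex(@"
        ^            # anchor at the start
        (?=.*\d)     # must contain at least one digit
        (?=.*[a-z])  # must contain one lowercase character
        (?=.*[A-Z])  # must contain one uppercase character
        .{8,10}      # from 8 to 10 characters in length
        $            # anchor at the end",
        RegexOptions.IgnorePatternWhitespace);

    public static bool IsValidPassword(string candidate)
    {
        return PasswordPattern.IsMatch(candidate);
    }

    public static void Main()
    {
        Console.WriteLine(IsValidPassword("Passw0rd1"));     // True
        Console.WriteLine(IsValidPassword("password1"));     // False (no uppercase)
        Console.WriteLine(IsValidPassword("Pw0rd"));         // False (too short)
        Console.WriteLine(IsValidPassword("Passw0rd12345")); // False (too long)
    }
}
```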

SLIDE 15

Continued

• Match class: immutable and has no public constructor.

    // Search for a pattern that is not found in the input string.
    string pattern = "dog";
    string input = "The cat saw the other cats playing in the back yard.";
    Match match = Regex.Match(input, pattern);
    if (match.Success)
        // Report position as a one-based integer.
        Console.WriteLine("'{0}' was found at position {1} in '{2}'.",
                          match.Value, match.Index + 1, input);
    else
        Console.WriteLine("The pattern '{0}' was not found in '{1}'.",
                          pattern, input);
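When the pattern does occur, a Match object can also be advanced with NextMatch. The sketch below searches the same sentence for "cat" and collects the zero-based positions; MatchWalkDemo and FindPositions are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MatchWalkDemo
{
    // Walks successive matches with NextMatch and records each zero-based index.
    public static List<int> FindPositions(string input, string pattern)
    {
        List<int> positions = new List<int>();
        for (Match m = Regex.Match(input, pattern); m.Success; m = m.NextMatch())
        {
            positions.Add(m.Index);
        }
        return positions;
    }

    public static void Main()
    {
        string input = "The cat saw the other cats playing in the back yard.";
        List<int> positions = FindPositions(input, "cat");
        Console.WriteLine(string.Join(", ", positions)); // 4, 22
    }
}
```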

SLIDE 16

Continued

• MatchCollection class: immutable and has no public constructor.

    static void Main(string[] args)
    {
        string input = "The quick brown dog jumps over the lazy dog";
        string pattern = "dog";
        MatchCollection matches = Regex.Matches(input, pattern);
        foreach (Match match in matches)
            Console.WriteLine("'{0}' found at position '{1}'",
                              match.Value, match.Index);
    }

Output:

    'dog' found at position '16'
    'dog' found at position '40'

SLIDE 17

Continued

• More examples

    // Create a Regex that accepts all URLs containing the host fragment www.odu.edu.
    Regex myRegex = new Regex(@"http://www\.odu\.edu/.*");

    // Create a WebPermission that gives permission to all hosts containing
    // the same host fragment.
    WebPermission myWebPermission = new WebPermission(NetworkAccess.Connect, myRegex);

    // Check whether all callers higher in the call stack have been granted the permission.
    myWebPermission.Demand();

SLIDE 18

Conclusion

• Regex is your friend; use it.
• Regex offers additional security measures.
• .NET supports many regex features.

Thank You

SLIDE 19

References

• http://www.regular-expressions.info
• http://msdn.microsoft.com