Regular Expressions C++0x Sources
C++0x Regular Expressions
Simon Andreas Frimann Lund
Datalogisk Institut, Københavns Universitet
May 16, 2008
Regular Expressions
Regular expression, regex or regexp for short: "A set of characters, metacharacters, and operators that define a string or group of strings in a search pattern."
• "regex" (a simple regex matching the text "regex")
• "[-+]?([0-9]*.[0-9]+|[0-9]+)" (a simple regular expression matching... what?)
The set of metacharacters, operators and other features is usually called a regex flavor.
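One way to answer the "what?" is to try the pattern on a few inputs. The sketch below uses std::regex with its default ECMAScript syntax. The helper names are made up for illustration, and the second pattern, with the dot escaped, is an assumption about what a signed-decimal-number pattern would usually look like; the slide itself shows the dot unescaped.

```cpp
#include <cassert>
#include <regex>
#include <string>

// The pattern exactly as printed on the slide. The '.' is unescaped,
// so it stands for any single character, not only a decimal point.
bool matches_as_written(const std::string& text) {
    static const std::regex pattern("[-+]?([0-9]*.[0-9]+|[0-9]+)");
    return std::regex_match(text, pattern);  // the whole text must match
}

// Assumed variant (not from the slide): the dot is escaped, so only
// an optional sign followed by an integer or a decimal number matches.
bool matches_with_escaped_dot(const std::string& text) {
    static const std::regex pattern("[-+]?([0-9]*\\.[0-9]+|[0-9]+)");
    return std::regex_match(text, pattern);
}
```

Both versions accept inputs such as "42", "-3.14" and "+.5". Only the version as written also accepts "3x14", because the unescaped dot matches the 'x'.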
Flavors
There are 15+ popular regex flavors in various languages and tools, of which only two are standardized:
• The POSIX standard Basic Regex (BRE) / Extended Regex (ERE).
• GNU BRE / ERE, GNU extensions of the standard used in GNU tools such as grep.
• The languages D, Haskell, .NET, Java, ECMAScript (JavaScript), Python and Ruby all have their own flavors.
• The languages Perl and Tcl have their own flavors as built-in language constructs.
• Libraries such as PCRE (used in PHP), Boost.Regex, Boost.Xpressive and Qt/QRegExp each have their own flavor.
• And the list goes on...
How are these tasty flavors implemented?
Implementations
Basically all the different flavors are implemented with an NFA (non-deterministic finite automaton) or a DFA (deterministic finite automaton).
Machine size for an M-character expression, and matching complexity for an N-character sequence on a machine of S states:

Algorithm                            Machine size    Complexity
DFA                                  O(2^M)          O(N)
Bit-parallel non-backtracking NFA    O(M) ∨ (2M)     O((1 + S/B) N)
Non-backtracking NFA                 O(M) ∨ (2M)     O(SN)
Backtracking NFA                     O(M)            O(2^N)

Many different implementations for C++ currently exist, some procedural and others object oriented. They support various flavors, but most are simply object-oriented wrappers for C libraries.
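To make the non-backtracking NFA row concrete, here is a minimal sketch of the state-set simulation it refers to. The automaton is a hypothetical hand-built NFA for the pattern (a|b)*abb, chosen only for illustration and not taken from the slides. Keeping the active states as bits in one integer also hints at why the bit-parallel variant can process several states per machine word.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical hand-built NFA for (a|b)*abb with four states:
//   0 --a--> 0, 1     0 --b--> 0     1 --b--> 2     2 --b--> 3 (accepting)
// All active states are tracked at once, so each input character is read
// exactly once and no backtracking is needed: O(S * N) work overall.
bool nfa_matches(const std::string& input) {
    std::uint32_t current = 1u << 0;  // bit s set means state s is active
    for (char c : input) {
        std::uint32_t next = 0;
        for (int s = 0; s < 4; ++s) {
            if (!(current & (1u << s))) continue;  // state s is not active
            if (s == 0 && c == 'a') next |= (1u << 0) | (1u << 1);
            if (s == 0 && c == 'b') next |= (1u << 0);
            if (s == 1 && c == 'b') next |= (1u << 2);
            if (s == 2 && c == 'b') next |= (1u << 3);
        }
        current = next;
        if (current == 0) return false;  // no active state left
    }
    return (current & (1u << 3)) != 0;  // accept if state 3 is active
}
```

A backtracking matcher would instead try one path at a time and retry on failure, which is where the exponential worst case in the table comes from.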
Flavor
The regex support as of TR1 is an extension of std based on Boost.Regex, with the following proposed changes/consequences:
• Default ECMAScript syntax.
• Optional support for POSIX BRE/ERE/awk/grep/egrep/sed syntax.
• Localization features of POSIX are required, since ECMAScript is not capable of localization.
• Performance is low, due to rich expression features.
• There are NO performance guarantees.
• Boost has a way to monitor the runtime complexity of expressions and stop them.
• Customizing the expression syntax with trait classes. Nice!
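As an illustration of the selectable syntax, the sketch below compiles the same pattern under different syntax flags of std::regex. The helper name full_match is made up for this example. The flags std::regex::ECMAScript, std::regex::basic and std::regex::extended are part of the standard library as it was eventually published; the TR1 spelling lives in namespace std::tr1.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Compiles `pattern` under the given syntax flag and reports whether
// it matches the whole of `text`.
bool full_match(const std::string& pattern, const std::string& text,
                std::regex::flag_type syntax) {
    const std::regex re(pattern, syntax);
    return std::regex_match(text, re);
}
```

For example, full_match("a+", "aaa", std::regex::ECMAScript) and the same call with std::regex::extended are expected to succeed, while with std::regex::basic the '+' is an ordinary character in POSIX BRE, so the pattern is expected to match only the literal text "a+".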
Implementation
A full implementation in C++, not a wrapper! Available in the header file <regex>.
Representation:
• basic_regex, holder of expressions, looks like a basic_string.
• match_results, container of the sub-matches found by a match.
Methods:
• bool regex_match(basic_string, basic_regex)
• bool regex_search(basic_string, match_results, basic_regex)
• basic_string regex_replace(basic_string, basic_regex, basic_string)
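The three functions listed above can be tried out with a few small helpers. This is only a sketch: the helper names and the digit pattern are made up for illustration, and it uses the std::string and std::smatch typedefs rather than the underlying templates.

```cpp
#include <cassert>
#include <regex>
#include <string>

// regex_match: true only if the whole text is a run of digits.
bool is_number(const std::string& text) {
    static const std::regex digits("[0-9]+");
    return std::regex_match(text, digits);
}

// regex_search: returns the first run of digits found anywhere in the
// text, or an empty string if there is none.
std::string first_number(const std::string& text) {
    static const std::regex digits("[0-9]+");
    std::smatch what;  // match_results over std::string iterators
    if (std::regex_search(text, what, digits)) {
        return what[0].str();  // what[0] is the whole match
    }
    return std::string();
}

// regex_replace: replaces every run of digits with a single '#'.
std::string mask_numbers(const std::string& text) {
    static const std::regex digits("[0-9]+");
    return std::regex_replace(text, digits, "#");
}
```

The difference between regex_match and regex_search is the one that most often surprises: the former must consume the entire input, the latter looks for a match anywhere inside it.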
C++0x Example

    #include <stdlib.h>
    #include <regex>
    #include <string>
    #include <iostream>

    using namespace std;

    regex expression("([0-9]+)(\\-| |$)(.*)");

    // process_ftp:
    // on success returns the ftp response code, and fills
    // msg with the ftp response message.
    int process_ftp(const char* response, std::string* msg)
    {
        cmatch what;
        if (regex_match(response, what, expression))
        {
            // what[0] contains the whole string
            // what[1] contains the response code
            // what[2] contains the separator character
            // what[3] contains the text message.
            if (msg)
                msg->assign(what[3].first, what[3].second);
            return std::atoi(what[1].first);
        }
        // failure did not match
        if (msg)
            msg->erase();
        return -1;
    }

How is C++0x different from C++?
C++ Example

    #include <stdlib.h>
    #include <boost/regex.hpp>
    #include <string>
    #include <iostream>

    using namespace boost;

    regex expression("([0-9]+)(\\-| |$)(.*)");

    // process_ftp:
    // on success returns the ftp response code, and fills
    // msg with the ftp response message.
    int process_ftp(const char* response, std::string* msg)
    {
        cmatch what;
        if (regex_match(response, what, expression))
        {
            // what[0] contains the whole string
            // what[1] contains the response code
            // what[2] contains the separator character
            // what[3] contains the text message.
            if (msg)
                msg->assign(what[3].first, what[3].second);
            return std::atoi(what[1].first);
        }
        // failure did not match
        if (msg)
            msg->erase();
        return -1;
    }

It's regex_replace(sourceCode, "std", "boost") different...
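Since the two versions differ only in the header and the namespace, one source file can in principle target either library through a namespace alias. The sketch below assumes a hypothetical macro USE_BOOST_REGEX (not part of Boost or the standard) to pick the library, and otherwise repeats the process_ftp example from the slides.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// USE_BOOST_REGEX is a hypothetical, user-defined switch for this sketch.
#if defined(USE_BOOST_REGEX)
#include <boost/regex.hpp>
namespace rx = boost;
#else
#include <regex>
namespace rx = std;
#endif

// On success returns the ftp response code and fills msg with the
// ftp response message; returns -1 if the response does not match.
int process_ftp(const char* response, std::string* msg) {
    static const rx::regex expression("([0-9]+)(\\-| |$)(.*)");
    rx::cmatch what;
    if (rx::regex_match(response, what, expression)) {
        // what[1] is the response code, what[3] the text message.
        if (msg) msg->assign(what[3].first, what[3].second);
        return std::atoi(what[1].first);
    }
    if (msg) msg->erase();  // failure: did not match
    return -1;
}
```

With the default std branch, a call such as process_ftp("220 Service ready", &msg) should return 220 and leave "Service ready" in msg.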