C++0x Regular Expressions

Simon Andreas Frimann Lund, Datalogisk Institut, Københavns Universitet (Department of Computer Science, University of Copenhagen), May 16, 2008

Regular Expressions

A regular expression (regex or regexp for short) is "a set of characters, metacharacters, and operators that define a string or group of strings in a search pattern."
- "regex" is a simple regex matching the text "regex".
- "[-+]?([0-9]*\.[0-9]+|[0-9]+)" is a simple regular expression matching... what?
The set of metacharacters, operators and other features that a regex engine supports is usually called a regex flavor.

Flavors

There are more than 15 popular regex flavors in various languages and tools, of which only two, the POSIX standard Basic Regular Expressions (BRE) and Extended Regular Expressions (ERE), are standardized. Other flavors include:
- GNU BRE/ERE, GNU extensions of the standard used in GNU tools such as grep.
- The languages D, Haskell, .NET, Java, ECMAScript (JavaScript), Python and Ruby, which all have their own flavors.
- The languages Perl and Tcl, which have their own flavors as built-in language constructs.
- Libraries such as PCRE (used in PHP), Boost.Regex, Boost.Xpressive and Qt's QRegExp, each with its own flavor.
- And the list goes on...
How are these tasty flavors implemented?

Implementations

Basically all flavors are implemented with an NFA (nondeterministic finite automaton) or a DFA (deterministic finite automaton). The following table lists the machine size for an expression of M characters and the matching complexity for an input sequence of N characters, where S is the number of states and B (presumably) the machine word width in bits:

| Algorithm | Machine size | Matching complexity |
|---|---|---|
| DFA | O(2^M) | O(N) |
| bit-parallel non-backtracking NFA | O(M) or O(2^M) | O((1 + S/B) N) |
| non-backtracking NFA | O(M) or O(2^M) | O(SN) |
| backtracking NFA | O(M) | O(2^N) |

Many different C++ implementations already exist, some procedural and others object-oriented, supporting various flavors; most, however, are simply object-oriented wrappers around C libraries.

Flavor

The regex support in TR1 is an extension of the standard library based on Boost.Regex, with the following proposed changes and consequences:
- ECMAScript is the default syntax.
- POSIX BRE, ERE, awk, grep and egrep syntaxes are optionally supported (plus sed-style format strings for replacement).
- The localization features of POSIX are required, since ECMAScript is not capable of localization.
- Performance is low, due to the rich expression features.
- NO performance guarantees are given.
- Boost has a way to monitor the runtime complexity of expressions and stop them.
- The expression syntax can be customized with traits classes. Nice!

Implementation

A full implementation in C++, not a wrapper! It is available in the header <regex>.
Representation:
- basic_regex, the holder of an expression; it looks like a basic_string.
- match_results, a container of the sub-matches of one match (successive matches are iterated with regex_iterator).
Functions:
- bool regex_match(basic_string, basic_regex): the whole string must match.
- bool regex_search(basic_string, match_results, basic_regex): finds a matching substring.
- basic_string regex_replace(basic_string, basic_regex, basic_string): replaces the matches using a format string.

C++0x Example

Processing an FTP server response line:

    #include <cstdlib>
    #include <iostream>
    #include <regex>
    #include <string>

    using namespace std;

    // Response code, separator ('-', ' ' or end of line), then the message.
    regex expression("([0-9]+)(\\-| |$)(.*)");

    // process_ftp: on success returns the FTP response code, and fills
    // msg with the FTP response message.
    int process_ftp(const char* response, std::string* msg)
    {
        cmatch what;
        if (regex_match(response, what, expression))
        {
            // what[0] contains the whole string
            // what[1] contains the response code
            // what[2] contains the separator character
            // what[3] contains the text message.
            if (msg)
                msg->assign(what[3].first, what[3].second);
            return std::atoi(what[1].first);
        }
        // failure: did not match
        if (msg)
            msg->erase();
        return -1;
    }

    int main()
    {
        std::string msg;
        int code = process_ftp("220 Service ready", &msg);
        std::cout << code << ' ' << msg << '\n';  // prints "220 Service ready"
    }

How is C++0x different from C++?
