String Manipulation with stringr: Capturing

SLIDE 1

STRING MANIPULATION WITH STRINGR

Capturing

SLIDE 2

Capturing

> ANY_CHAR %R% "a"
<regex> .a
> capture(ANY_CHAR) %R% "a"
<regex> (.)a
> str_extract(c("Fat", "cat"), pattern = ANY_CHAR %R% "a")
[1] "Fa" "ca"
> str_extract(c("Fat", "cat"), pattern = capture(ANY_CHAR) %R% "a")
[1] "Fa" "ca"

SLIDE 3

str_match()

> str_match(c("Fat", "cat"), pattern = capture(ANY_CHAR) %R% "a")
     [,1] [,2]
[1,] "Fa" "F"
[2,] "ca" "c"

SLIDE 4

str_match()

> pattern <- DOLLAR %R% DGT %R% optional(DGT) %R% DOT %R% dgt(2)
> str_view(c("$5.50", "$32.00"), pattern = pattern)

SLIDE 5

str_match()

> pattern <- DOLLAR %R% capture(DGT %R% optional(DGT)) %R%
+   DOT %R% capture(dgt(2))
> str_match(c("$5.50", "$32.00"), pattern = pattern)
     [,1]     [,2] [,3]
[1,] "$5.50"  "5"  "50"
[2,] "$32.00" "32" "00"
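The captured columns returned by str_match() can be used directly as data. The sketch below is one way to do that, using a hand-written regular expression that is intended to be equivalent to the rebus pattern above (an assumption, not output copied from rebus). The `prices` vector and the `total_cents` calculation are illustrative and not part of the original slides.

```r
library(stringr)

# Hand-written regex assumed to be equivalent to the rebus pattern:
# \$ literal dollar, (\d\d?) one or two digits, \. literal dot, (\d{2}) two digits
pattern <- "\\$(\\d\\d?)\\.(\\d{2})"

# Hypothetical input; the last element does not match on purpose
prices <- c("$5.50", "$32.00", "free")

m <- str_match(prices, pattern)

# Column 1 holds the whole match, columns 2 and 3 the captured groups.
# Elements that do not match produce a row of NA.
dollars <- as.integer(m[, 2])
cents   <- as.integer(m[, 3])
total_cents <- dollars * 100L + cents
```

Because non-matching elements yield NA rather than being dropped, the result stays aligned with the input vector.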

SLIDE 6

Non-capturing groups

> or("dog", "cat")
<regex> (?:dog|cat)
> or("dog", "cat", capture = TRUE)
<regex> (dog|cat)
> capture(or("dog", "cat"))
<regex> ((?:dog|cat))

The bare alternation dog|cat needs parentheses to distinguish (dog|cat) from do(g|c)at.
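A short sketch of why the grouping matters, written with plain regular expressions rather than rebus. The `animals` vector and the anchors `^` and `$` are assumptions added for illustration.

```r
library(stringr)

# Hypothetical test strings
animals <- c("dog", "cat", "dogat", "docat")

# The position of the group decides what the "|" applies to
whole_word <- str_detect(animals, "^(?:dog|cat)$")  # "dog" or "cat"
inner_only <- str_detect(animals, "^do(?:g|c)at$")  # "dogat" or "docat"

# A non-capturing group adds no column to the str_match() result;
# a capturing group adds one.
cols_noncapturing <- ncol(str_match("dog", "(?:dog|cat)"))
cols_capturing    <- ncol(str_match("dog", "(dog|cat)"))
```

Using (?:...) for grouping keeps the numbering of the groups that are captured on purpose unchanged.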

SLIDE 7

Let’s practice!

SLIDE 8

Backreferences

SLIDE 9

Backreferences

> REF1
<regex> \1
> REF2
<regex> \2

SLIDE 10

In a pattern

SPC %R% one_or_more(WRD) %R% SPC

SLIDE 11

In a pattern

SPC %R% capture(one_or_more(WRD)) %R% SPC

SLIDE 12

In a pattern

SPC %R% capture(one_or_more(WRD)) %R% SPC %R% REF1

> str_view("Paris in the the spring",
+   SPC %R% capture(one_or_more(WRD)) %R% SPC %R% REF1)

SLIDE 13

In a replacement

> str_replace("Paris in the the spring",
+   pattern = SPC %R% capture(one_or_more(WRD)) %R% SPC %R% REF1,
+   replacement = str_c(" ", REF1))
[1] "Paris in the spring"
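The same backreference idea can be sketched without rebus, using \1 both in the pattern and in the replacement. The regular expression below is a hand-written stand-in for the rebus pattern (an assumption), and the `sentences` vector is hypothetical.

```r
library(stringr)

# Hypothetical inputs; the last one has no repeated word
sentences <- c("Paris in the the spring", "It is is fine", "No repeats here")

# \s(\w+)\s\1 : a space, a captured word, a space, then the same word again
repeated <- "\\s(\\w+)\\s\\1"

# Backreference in the pattern: find strings with a doubled word
has_repeat <- str_detect(sentences, repeated)

# Backreference in the replacement: keep a single copy of the word
fixed <- str_replace_all(sentences, repeated, " \\1")
```

This pattern has no trailing word boundary, so it would also match the start of a longer word (for example " the then"). Appending \b is one way to tighten it.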

SLIDE 14

Let’s practice!

SLIDE 15

Unicode and pattern matching

SLIDE 16

Unicode

  • Associates each character with a code point

Character   Code point
a           61
μ           3BC
😀          1F600

SLIDE 17

Unicode in R

> "\u03BC"
[1] "μ"
> "\U03BC"
[1] "μ"
> writeLines("\U0001F44F")
👏

SLIDE 18

Unicode in R

> as.hexmode(utf8ToInt("a"))
[1] "61"
> as.hexmode(utf8ToInt("μ"))
[1] "3bc"
> as.hexmode(utf8ToInt("😀"))
[1] "1f600"
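As a small sketch of going in both directions between characters and code points, the base R functions utf8ToInt() and intToUtf8() can be combined as below. The variable names are illustrative, and sprintf("%X", ...) is used here instead of as.hexmode() only to get unpadded uppercase hex strings.

```r
# Base R only; the characters are written as escapes to avoid encoding issues
chars <- c("a", "\u03BC", "\U0001F600")

# Character -> integer code point (one code point per single-character string)
code_points <- vapply(chars, utf8ToInt, integer(1), USE.NAMES = FALSE)

# Integer code point -> hexadecimal string, as shown on Unicode charts
hex <- sprintf("%X", code_points)

# Integer code point -> character again
round_trip <- intToUtf8(code_points, multiple = TRUE)
```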

SLIDE 19

Matching Unicode

> x <- "Normal(\u03BC = 0, \u03C3 = 1)"
> x
[1] "Normal(μ = 0, σ = 1)"
> str_view(x, pattern = "\u03BC")

http://unicode.org/charts
http://www.fileformat.info/info/unicode/char/search.htm

SLIDE 20

Matching Unicode groups

> str_view_all(x, greek_and_coptic())

Regular expression: use \p followed by {name}
rebus: see ?Unicode, ?unicode_property, ?unicode_general_category
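A minimal sketch of the \p{name} form written directly as a regular expression. It uses the script property \p{Greek} and the general category \p{L}; these names are chosen for illustration and may not be the exact class that greek_and_coptic() generates.

```r
library(stringr)

x <- "Normal(\u03BC = 0, \u03C3 = 1)"

# \p{Greek}: any character whose Unicode script is Greek
greek <- str_extract_all(x, "\\p{Greek}")[[1]]

# \p{L}: any character in the general category "letter"
n_letters <- str_count(x, "\\p{L}")
```

Here `greek` holds the two Greek letters μ and σ, and `n_letters` counts them together with the six letters of "Normal".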

SLIDE 21

Let’s practice!