Example exploration Volcanic eruptions R.W. Oldford Global - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Example exploration

Volcanic eruptions R.W. Oldford

slide-5
SLIDE 5

Global Volcanism

Problem: Volcanoes are a natural phenomenon caused by ruptures in the crust of planets like Earth, which have a molten core. When they erupt, they spew molten material (lava), rocks, hot gases, and ash. They can change the planet, from affecting humans and their activities to altering land mass, weather, and even climate.

Interest lies in better understanding volcanism on Earth (and possibly on other Earth-like planets). By studying known volcanoes, their eruptions, and their physical characteristics, we hope to uncover interesting patterns in volcanic activity that will help us better understand volcanoes both as a natural phenomenon and as an influence on the planet (including us).

Plan: Volcanoes that have affected humans in the past will have been mentioned in the historical records. In addition, geologists have now been studying volcanoes for some time. It may even be possible to infer the characteristics of historical eruptions from physical measurements on site.

The plan would be to collect measurements on known volcanoes from sources that are as reliable as we can find, with as many physical measurements as possible. The aim is a much better understanding of volcanic activity from the available data on the physical characteristics of volcanoes.

Question: What are the target and study populations here? What are the units? That is, what would we be taking measurements on?

slide-7
SLIDE 7

Global Volcanism

Data The Smithsonian’s “Global Volcanism Program” (GVP) at http://volcano.si.edu/search_eruption.cfm contains several data sources on Volcanoes. The following files were downloaded March 28, 2017 from this site:

◮ GVP_Volcano_List.csv giving a list of volcanoes, their geology, and the nearby human population

◮ GVP_Eruption_Eruptions.csv identifying volcanoes and the magnitude of their eruptions

◮ GVP_Eruption_Events.csv containing information describing the types of eruptions

◮ GVP_Emission_Activity.csv and GVP_Emission_Details.csv giving information on the sulphur dioxide content of the emissions from many modern volcanic eruptions

◮ GVP_Volcano_List_References.csv and GVP_Eruption_References.csv containing information on literature sources for the volcanoes and their eruptions.

(Data source suggested from a poster by Kelly McConville: https://www.causeweb.org/cause/sites/default/files/ecots/ecots16/posters/Kelly_McConvilleAre_Volcanic_Eruptions_Increasing.pdf)

slide-9
SLIDE 9

Global Volcanism

Data Look first at the eruptions:

(eruptions <- read_csv("GVP_Eruption_Eruptions.csv"))
## # A tibble: 11,108 x 24
##    `Volcano Number` `Volcano Name` `Eruption Numbe~ `Eruption Categ~
##               <int> <chr>                     <int> <chr>
##  1           282090 Kirishimayama             22257 Confirmed Erupt~
##  2           352090 Sangay                    22259 Confirmed Erupt~
##  3           267020 Karangetang               22256 Confirmed Erupt~
##  4           283120 Kusatsu-Shira~            22258 Confirmed Erupt~
##  5           343100 San Miguel                22251 Confirmed Erupt~
##  6           273030 Mayon                     22250 Confirmed Erupt~
##  7           251002 Kadovar                   22246 Confirmed Erupt~
##  8           272020 Kanlaon                   22249 Confirmed Erupt~
##  9           264020 Agung                     22241 Confirmed Erupt~
## 10           261230 Dempo                     22248 Confirmed Erupt~
## # ... with 11,098 more rows, and 20 more variables: `Area of
## #   Activity` <chr>, VEI <int>, `VEI Modifier` <chr>, `Start Year
## #   Modifier` <chr>, `Start Year` <int>, `Start Year Uncertainty` <int>,
## #   `Start Month` <int>, `Start Day Modifier` <chr>, `Start Day` <int>,
## #   `Start Day Uncertainty` <int>, `Evidence Method (dating)` <chr>, `End
## #   Year Modifier` <chr>, `End Year` <int>, `End Year Uncertainty` <chr>,
## #   `End Month` <int>, `End Day Modifier` <chr>, `End Day` <int>, `End Day
## #   Uncertainty` <int>, Latitude <dbl>, Longitude <dbl>

slide-12
SLIDE 12

Global Volcanism - Data

Variates are

names(eruptions)
##  [1] "Volcano Number"           "Volcano Name"
##  [3] "Eruption Number"          "Eruption Category"
##  [5] "Area of Activity"         "VEI"
##  [7] "VEI Modifier"             "Start Year Modifier"
##  [9] "Start Year"               "Start Year Uncertainty"
## [11] "Start Month"              "Start Day Modifier"
## [13] "Start Day"                "Start Day Uncertainty"
## [15] "Evidence Method (dating)" "End Year Modifier"
## [17] "End Year"                 "End Year Uncertainty"
## [19] "End Month"                "End Day Modifier"
## [21] "End Day"                  "End Day Uncertainty"
## [23] "Latitude"                 "Longitude"

Which, if any of these, might be a primary key? Perhaps Eruption Number?

eruptions %>%
  count(`Eruption Number`) %>%
  filter(n > 1)
## # A tibble: 1 x 2
##   `Eruption Number`     n
##               <int> <int>
## 1             21100     3

Nope!

Important note: Variable names that contain spaces must be wrapped in backquotes, as in `Eruption Number`, and not in single or double quotes, as in 'Eruption Number' or "Eruption Number"! Try the above with "Eruption Number" (or even "fubar"!!).
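To see why the kind of quote matters, here is a minimal sketch on a small made-up table (`toy` and its values are hypothetical, not taken from the GVP file). With backquotes, `count()` groups by the column; with double quotes it groups by a constant string, so every row lands in a single group.

```r
library(dplyr)

# Hypothetical toy data standing in for the real eruptions table.
toy <- tibble(
  `Eruption Number` = c(1L, 2L, 2L, 3L),
  VEI               = c(2L, 3L, 3L, 1L)
)

# Backquotes refer to the column: one row per distinct eruption number.
by_column <- toy %>% count(`Eruption Number`)

# Double quotes give a constant string, so all rows fall into one group.
by_string <- toy %>% count("Eruption Number")

by_column
by_string
```

The second result has a single row whose count equals the number of rows in the table, which is why filtering on `n > 1` would then say nothing about duplicated eruption numbers.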

slide-17
SLIDE 17

Global Volcanism - Data

What is available about Eruption Number 21100?

eruptions %>%
  filter(`Eruption Number` == 21100) %>%
  select(`Volcano Name`, `Eruption Category`, `Latitude`, `Longitude`)
## # A tibble: 3 x 4
##   `Volcano Name`      `Eruption Category` Latitude Longitude
##   <chr>               <chr>                  <dbl>     <dbl>
## 1 Craters of the Moon Confirmed Eruption      43.4     -114.
## 2 Craters of the Moon Confirmed Eruption      43.4     -114.
## 3 Craters of the Moon Confirmed Eruption      43.4     -114.

Well, that’s interesting . . . ?

Turns out that:

“Craters of the Moon is a large lava flow field with cinder cones, spatter cones, lava tubes, volcanic bombs and tree molds. It is located along the north border of the Snake River Plain in Idaho. It was declared a national monument by President Calvin Coolidge in 1924. The monument contains 55 cones with lava flows and 14 fissures, many of which have spatter cones. . . .

The Great Rift is a line of cones and lava vents that runs for 13 miles (21 km) through the monument. Fissures from this rift are the vents for the youngest lavas in the area. The youngest Craters of the Moon lavas are approximately 1500 to 2000 years old. These lavas are basaltic.”

From http://volcano.oregonstate.edu/craters-moon

slide-20
SLIDE 20

Global Volcanism - Data

What about Volcano Number and Volcano Name?

eruptions %>%
  count(`Volcano Number`, `Volcano Name`) %>%
  filter(n > 1) %>%
  nrow() == 0
## [1] FALSE

No combination of the first 8 variables provides a primary key. However,

eruptions %>%
  count(`Eruption Number`, `Start Year`) %>%
  filter(n > 1) %>%
  nrow() == 0
## [1] TRUE

so the pair of Eruption Number and Start Year is a possibility.
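The same uniqueness check is repeated for each candidate set of columns, so it can be wrapped in a small helper. This is only a sketch: `is_candidate_key()` and the `toy` table are hypothetical names introduced for illustration, not part of the original slides or of dplyr.

```r
library(dplyr)

# Hypothetical helper: TRUE when the given columns identify every row uniquely.
is_candidate_key <- function(data, ...) {
  duplicates <- data %>%
    count(...) %>%      # one row per distinct combination of the columns
    filter(n > 1)       # keep only combinations that occur more than once
  nrow(duplicates) == 0
}

# Hypothetical toy data: eruption number 11 appears twice with different years.
toy <- tibble(
  `Eruption Number` = c(10L, 11L, 11L, 12L),
  `Start Year`      = c(1900L, 1950L, 1960L, 2000L)
)

is_candidate_key(toy, `Eruption Number`)
is_candidate_key(toy, `Eruption Number`, `Start Year`)
```

A result of TRUE only says that the columns are unique in the data at hand; whether they make sense as a key is a separate question about how the records were assigned.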

slide-22
SLIDE 22

Global Volcanism - Data

Look again at the variate names:

names(eruptions)
##  [1] "Volcano Number"           "Volcano Name"
##  [3] "Eruption Number"          "Eruption Category"
##  [5] "Area of Activity"         "VEI"
##  [7] "VEI Modifier"             "Start Year Modifier"
##  [9] "Start Year"               "Start Year Uncertainty"
## [11] "Start Month"              "Start Day Modifier"
## [13] "Start Day"                "Start Day Uncertainty"
## [15] "Evidence Method (dating)" "End Year Modifier"
## [17] "End Year"                 "End Year Uncertainty"
## [19] "End Month"                "End Day Modifier"
## [21] "End Day"                  "End Day Uncertainty"
## [23] "Latitude"                 "Longitude"

What does each of these record? Identity, location, time, plus some like Eruption Category, Area of Activity, VEI, VEI modifier, and Evidence Method (dating) which may require a little research.

slide-27
SLIDE 27

Global Volcanism - Data

For example, what is VEI? Volcanic Explosivity Index

Wikipedia:

“The Volcanic Explosivity Index (VEI) is a relative measure of the explosiveness of volcanic eruptions. It was devised by Chris Newhall of the United States Geological Survey and Stephen Self at the University of Hawaii in 1982.

“Volume of products, eruption cloud height, and qualitative observations (using terms ranging from “gentle” to “mega-colossal”) are used to determine the explosivity value.

“The scale is open-ended with the largest volcanoes in history given magnitude 8. A value of 0 is given for non-explosive eruptions, defined as less than 10,000 m³ (350,000 cu ft) of tephra ejected; and 8 representing a mega-colossal explosive eruption that can eject 1.0×10^12 m³ (240 cubic miles) of tephra and have a cloud column height of over 20 km (66,000 ft).

“The scale is logarithmic, with each interval on the scale representing a tenfold increase in observed ejecta criteria, with the exception of between VEI-0, VEI-1 and VEI-2.”

Which sure doesn’t sound well-defined. Sounds like it includes some subjective assessment.

slide-28
SLIDE 28

Global Volcanism - Data

The variate VEI gives a numerical value to how explosive an eruption is, and is related to an estimate of the volume of material ejected.

Roughly, it appears to be a logarithmic scale (base 10) with each value above 2 representing a tenfold increase in the volume of material ejected. Below 2, however, this is not the case. VEI of 0 is non-explosive (< 1 × 10^4 m³ of material ejected), VEI of 1 has between 10^4 m³ and 10^6 m³ of material ejected, and VEI of 2 has between 10^6 m³ and 10^7 m³. See https://geology.com/stories/13/volcanic-explosivity-index/ for detail.
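The thresholds just described can be written down as a small function. This is only a sketch of the rough reading given here, with a hypothetical name `vei_min_volume()`; it returns the lower bound, in cubic metres, of the ejected volume associated with each VEI value and is not an official definition.

```r
# Hypothetical helper: lower bound (in cubic metres) of ejected volume for a
# given VEI, following the rough thresholds described in the text:
#   VEI 0: below 1e4 (lower bound taken as 0)
#   VEI 1: from 1e4
#   VEI 2 and above: 10^(VEI + 4), i.e. tenfold steps starting at 1e6
vei_min_volume <- function(vei) {
  ifelse(vei == 0, 0,
         ifelse(vei == 1, 1e4, 10^(vei + 4)))
}

bounds <- vei_min_volume(0:8)
bounds

# Ratio between successive lower bounds from VEI 1 upward:
# the step from 1 to 2 is a factor of 100, every later step a factor of 10.
bounds[3:9] / bounds[2:8]
```

Under these assumptions the lower bound for VEI 8 comes out at 10^12 cubic metres, which agrees with the figure quoted from Wikipedia.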

Image source: http://volcanoes.usgs.gov/Products/Pglossary/vei.html

slide-31
SLIDE 31

eruptions - Data

A summary of some of the non-obvious (and non-numeric) variates:

summary(factor(eruptions$`Eruption Category`))
##   Confirmed Eruption Discredited Eruption   Uncertain Eruption
##                 9838                  164                 1106

summary(factor(eruptions$`Area of Activity`))[1:10]
##      Naka-dake          Bromo      Ngauruhoe    Mihara-yama         Ohachi
##            172             62             52             47             45
##  Anak Krakatau Central Crater          Okama        Tarumai   NE rift zone
##             42             39             37             35             32

summary(factor(eruptions$`Evidence Method (dating)`))[1:9]
##            Anthropology                   Ar/Ar        Dendrochronology
##                      38                      26                      15
##           Fission track Historical Observations          Hydration Rind
##                       3                    6326                      11
##             Hydrophonic                Ice Core            Lichenometry
##                      68                     116                       2

slide-32
SLIDE 32

Global Volcanism - Analysis

A summary of each numeric variate:

eruptions %>% select_if(is.numeric) %>% summary
##  Volcano Number   Eruption Number      VEI          Start Year
##  Min.   :210010   Min.   :10001   Min.   :0.00   Min.   :-10450
##  1st Qu.:263310   1st Qu.:12796   1st Qu.:1.00   1st Qu.:   650
##  Median :290056   Median :15604   Median :2.00   Median :  1846
##  Mean   :300374   Mean   :15614   Mean   :1.95   Mean   :   617
##  3rd Qu.:343030   3rd Qu.:18397   3rd Qu.:2.00   3rd Qu.:  1949
##  Max.   :600000   Max.   :22260   Max.   :7.00   Max.   :  2018
##                                   NA's   :2911   NA's   :1
##  Start Year Uncertainty  Start Month       Start Day
##  Min.   :    1.0         Min.   : 0.000   Min.   : 0.000
##  1st Qu.:   50.0         1st Qu.: 0.000   1st Qu.: 0.000
##  Median :  100.0         Median : 1.000   Median : 0.000
##  Mean   :  292.7         Mean   : 3.436   Mean   : 6.966
##  3rd Qu.:  200.0         3rd Qu.: 7.000   3rd Qu.:15.000
##  Max.   :14000.0         Max.   :12.000   Max.   :31.000
##  NA's   :9024            NA's   :184      NA's   :187
##  Start Day Uncertainty    End Year       End Month         End Day
##  Min.   :  1           Min.   :-475   Min.   : 0.000   Min.   : 0.00
##  1st Qu.: 15           1st Qu.:1894   1st Qu.: 3.000   1st Qu.: 3.00
##  Median : 15           Median :1956   Median : 6.000   Median :15.00
##  Mean   : 64           Mean   :1916   Mean   : 6.194   Mean   :13.25
##  3rd Qu.: 45           3rd Qu.:1991   3rd Qu.: 9.000   3rd Qu.:21.00
##  Max.   :730           Max.   :2018   Max.   :12.000   Max.   :31.00
##  NA's   :10247         NA's   :6846   NA's   :6849     NA's   :6852
##  End Day Uncertainty    Latitude         Longitude
##  Min.   :  1.00      Min.   :-77.530   Min.   :-179.97
##  1st Qu.:  5.00      1st Qu.: -6.102   1st Qu.: -77.66
##  Median : 15.00      Median : 18.130   Median :  55.71
##  Mean   : 23.88      Mean   : 16.848   Mean   :  31.48
##  3rd Qu.: 15.00      3rd Qu.: 40.821   3rd Qu.: 139.39
##  Max.   :365.00      Max.   : 85.608   Max.   : 179.58
##  NA's   :10416

slide-33
SLIDE 33

eruptions - Some exploratory analysis

In RStudio all contents can be seen using the View() function. This gives a spreadsheet view of the data that is interactive. For example, it allows sorting, searching, and filtering.

View(eruptions)

slide-34
SLIDE 34

eruptions - Some exploratory analysis

We might begin with how many eruptions have been recorded for each volcano:

eruptions %>%
  group_by(`Volcano Number`) %>%
  summarize(count = n()) %>%
  ggplot(aes(x = `Volcano Number`, y = count)) +
  geom_point(col = "steelblue", alpha = 0.5) +
  ggtitle("Number of Eruptions") -> gp
gp

[Figure: scatterplot titled "Number of Eruptions"; x-axis `Volcano Number`, y-axis count]

Some volcanoes have tens and even hundreds of recorded observations. There also seems to be a large gap in volcano numbers. Beyond this gap there are very few (possibly only 1?) volcano numbers. If there is only one, it is recording about 75 eruptions.
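The per-volcano tally behind this plot can also be inspected directly as a table. The sketch below uses a small made-up table (`toy` and its volcano numbers are hypothetical) and dplyr's `count()` with `sort = TRUE`, which gives the same counts as `group_by()` followed by `summarize(count = n())` but ordered from most to fewest eruptions.

```r
library(dplyr)

# Hypothetical toy data: one row per recorded eruption.
toy <- tibble(
  `Volcano Number` = c(210010L, 210010L, 263310L, 600000L, 600000L, 600000L)
)

# One row per volcano, most frequently erupting first.
per_volcano <- toy %>%
  count(`Volcano Number`, sort = TRUE, name = "count")

per_volcano
```

Sorting the counts makes it easy to read off which volcano numbers account for the largest points in the scatterplot.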

slide-37
SLIDE 37

eruptions - Some exploratory analysis

How many volcanoes are in the large volcano numbers?

eruptions %>%
  filter(`Volcano Number` > 500000) %>%
  group_by(`Volcano Number`) %>%
  summarize(count = n())
## # A tibble: 1 x 2
##   `Volcano Number` count
##              <int> <int>
## 1           600000    77

Just the one it seems (with 77 eruptions). And what do we know about it?

eruptions %>%
  filter(`Volcano Number` == 600000) %>%
  mutate(name = factor(`Volcano Name`),
         cat = factor(`Eruption Category`),
         evid = factor(`Evidence Method (dating)`)) %>%
  select(name, VEI, cat, evid) %>%
  summary()
##              name         VEI                      cat
##  Unknown Source:77   Min.   :6    Confirmed Eruption:77
##                      1st Qu.:6
##                      Median :6
##                      Mean   :6
##                      3rd Qu.:6
##                      Max.   :6
##                      NA's   :76
##                       evid
##  Dendrochronology       : 5
##  Historical Observations: 1
##  Ice Core               :71

slide-38
SLIDE 38

eruptions - Some exploratory analysis

On the curiously large gap in volcano numbers, with no numbers beginning with either 4 or 5:

From the website: "The International Association for Volcanology and Chemistry of Earth’s Interior (IAVCEI), The World Organization of Volcano Observatories (WOVO), and the Global Volcano Model (GVM) have sanctioned GVP to assign numbers and primary names to the world’s volcanoes. The purpose of the numbers is to prevent ambiguity regarding the name and location of volcanoes that may have non-unique names, or that are known by multiple names. The original VNums were based on a system developed in the 1950’s for the IAVCEI Catalog of Active Volcanoes of the World (CAVW). GVP policy had been to embed significant geographical, historical, and age information in the numbers. As a result GVP often changed VNums, most frequently to accommodate newly recognized volcanoes in a particular geographical region, which over time undermined the goal of preventing ambiguity. . . . numbers have been added for subfeatures associated with each volcano. None of the new numbers start with 0 or 1 to avoid confusion with the legacy system. While a connection remains to the older system, the geographic link to CAVW regions and subregions is no longer mandatory."

The website also says that the digits encode geographic information.

slide-39
SLIDE 39

eruptions - Some exploratory analysis

How does the number of eruptions behave over time?

eruptions %>%
  ggplot(aes(x = `Start Year`)) +
  geom_histogram(fill = "steelblue", col = "white")
## Warning: Removed 1 rows containing non-finite values (stat_bin).

[Figure: histogram of eruption counts; x-axis `Start Year`, y-axis count]

It looks as though the number of eruptions is increasing over time. This is more likely because of increased recording than because of a more volcanic world.

slide-40
SLIDE 40

eruptions - Some exploratory analysis

How does the number of eruptions behave over time? What if we separate the counts by Eruption Category?

eruptions %>%
  ggplot(aes(x = `Start Year`)) +
  geom_histogram(fill = "steelblue", col = "white") +
  facet_wrap(~`Eruption Category`)
## Warning: Removed 1 rows containing non-finite values (stat_bin).

[Figure: histograms of eruption counts by `Start Year`, one panel each for Confirmed Eruption, Discredited Eruption, and Uncertain Eruption]

Still looks like the number of eruptions over time is increasing, especially for confirmed eruptions.

slide-41
SLIDE 41

eruptions - Some exploratory analysis

Just look at confirmed eruptions, and see how they depend upon evidence.

eruptions %>%
  filter(`Eruption Category` == "Confirmed Eruption") %>%
  ggplot(aes(x = `Start Year`)) +
  geom_histogram(fill = "steelblue", col = "white") +
  facet_wrap(~ `Evidence Method (dating)`, scales = "free_y")

[Figure: histograms of confirmed eruption counts by `Start Year`, one panel per evidence method, each with its own y-axis scale. Panels: Anthropology, Ar/Ar, Dendrochronology, Fission track, Historical Observations, Hydration Rind, Hydrophonic, Ice Core, Lichenometry, Magnetism, Potassium-Argon, Radiocarbon (corrected), Radiocarbon (uncorrected), Seismicity, Surface Exposure, Tephrochronology, Thermoluminescence, Uncertain, Uranium-series, Varve Count, NA]

Even after separating out the different sources of evidence, it looks like the number of eruptions over time is increasing.

slide-42
SLIDE 42

eruptions - Some exploratory analysis

Restrict data to the confirmed eruptions.

eruptions %>%
  filter(`Eruption Category` == "Confirmed Eruption") -> confirmed_eruptions

Look at VEI, the “Volcanic Explosivity Index”, which measures how explosive an eruption is (and is related to the volume of material ejected).

confirmed_eruptions %>%
  ggplot(aes(x = VEI)) +
  geom_bar(fill = "steelblue") +
  facet_wrap(~`VEI Modifier`)
## Warning: Removed 2277 rows containing non-finite values (stat_count).

[Figure: bar charts of VEI counts, one panel per `VEI Modifier` value: C, P, NA, ?, ^, +; x-axis VEI, y-axis count]

It’s not clear from the data what each of these (non-NA) modifiers means: ? likely means uncertainty; either + or ^ might indicate that the value is a lower bound; and P and C remain a mystery (a guess, perhaps, is the type of material?). No information was found on the source website.

slide-43
SLIDE 43

eruptions - Some exploratory analysis

VEI over time (Start Year)

confirmed_eruptions %>%
  ggplot(aes(x = `Start Year`, y = VEI)) +
  geom_point(col = "steelblue", alpha = 0.5) +
  facet_wrap(~`VEI Modifier`)
## Warning: Removed 2277 rows containing missing values (geom_point).

[Figure: scatterplots of VEI against `Start Year`, one panel per `VEI Modifier` value: C, P, NA, ?, ^, +]

slide-44
SLIDE 44

eruptions - Some exploratory analysis

Average VEI over time (Start Year)

confirmed_eruptions %>%
  group_by(`Start Year`) %>%
  summarise(VEIAve = mean(VEI, na.rm = TRUE)) %>%
  ggplot(aes(x = `Start Year`, y = VEIAve)) +
  geom_point(col = "steelblue", alpha = 0.5)
## Warning: Removed 404 rows containing missing values (geom_point).

[Figure: scatterplot of VEIAve against `Start Year`]

Question: What might this plot suggest about the study population?

slide-45
SLIDE 45

eruptions - Some exploratory analysis

Average VEI over time (Era of 50 years)

confirmed_eruptions %>%
  mutate(Ranges = cut_interval(`Start Year`, length = 50),
         Era = as.numeric(Ranges)) %>%
  group_by(Era) %>%
  summarise(VEIAve = mean(VEI, na.rm = TRUE)) %>%
  ggplot(aes(x = Era, y = VEIAve)) +
  geom_point(col = "steelblue", alpha = 0.5) +
  geom_smooth()

[Figure: scatterplot of VEIAve against Era with a smooth curve added]

Question: What might this plot suggest about study error in this case? How might you reduce it?
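In the code for this plot, `Era` is the index of the 50-year bin (1, 2, 3, ...) rather than a calendar year. As an alternative sketch, not the method used on the slide, the start year of each 50-year era can be computed arithmetically so that the grouping variable stays on the calendar scale. The `toy` table and the column name `EraStart` are hypothetical.

```r
library(dplyr)

# Hypothetical toy data standing in for confirmed_eruptions.
toy <- tibble(
  `Start Year` = c(-475L, 1846L, 1883L, 1980L, 1991L),
  VEI          = c(NA, 2L, 6L, 5L, 6L)
)

by_era <- toy %>%
  # Round each year down to the start of its 50-year era.
  mutate(EraStart = floor(`Start Year` / 50) * 50) %>%
  group_by(EraStart) %>%
  summarise(VEIAve = mean(VEI, na.rm = TRUE),  # NaN when every VEI is missing
            n = n())                           # number of eruptions in the era

by_era
```

Keeping the count `n` alongside the average shows how many eruptions each era's mean is based on, which matters when early eras contain only a handful of records.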

slide-46
SLIDE 46

eruptions - Some exploratory analysis

Each eruption has information on when it started and when it ended (year, month, day, modifier, uncertainty). This should give us some information on each eruption’s duration and hence also of its importance. To determine durations, there are a number of functions in base R but here we will introduce some from the lubridate package.

library(lubridate)
confirmed_eruptions %>%
  mutate(start = ymd(paste(`Start Year`, `Start Month`, `Start Day`, sep = "-")),
         end = ymd(paste(`End Year`, `End Month`, `End Day`, sep = "-")),
         duration = end - start + 1) %>%
  ggplot(aes(VEI, duration)) +
  geom_point(col = "steelblue", alpha = 0.5)

slide-47
SLIDE 47

eruptions - Some exploratory analysis

## Warning: 5416 failed to parse.
## Warning: 6594 failed to parse.
## Warning: Removed 6731 rows containing missing values (geom_point).

[Figure: duration (in days) against VEI]

Note warnings – perhaps dates are not as clean as we might like.
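One likely source of these warnings is that paste() happily builds strings from missing or zero month and day values, which ymd() then cannot parse. A small sketch on made-up strings (the values below are assumptions chosen for illustration, not rows from the data):

```r
library(lubridate)

# Hypothetical pasted dates: a complete one, one with month and day recorded as 0,
# and one where month and day were NA before paste() turned them into text
toy_dates <- c(paste(1980, 5, 18, sep = "-"),
               paste(1980, 0, 0, sep = "-"),
               paste(1980, NA, NA, sep = "-"))
toy_dates

# Warnings are suppressed here only to keep the output short
parsed <- suppressWarnings(ymd(toy_dates))
is.na(parsed)
```

Only the complete date should parse; the other two become NA and would later be dropped by geom_point().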

slide-48
SLIDE 48

eruptions - Some exploratory analysis

Try just duration in years

confirmed_eruptions %>%
  mutate(start = `Start Year`,
         end = `End Year`,
         duration = end - start + 1) %>%
  ggplot(aes(VEI, duration)) +
  geom_point(col = "steelblue", alpha = 0.5)
## Warning: Removed 5968 rows containing missing values (geom_point).

[Figure: duration (in years) against VEI]

Some eruptions have lasted 100s of years!

slide-49
SLIDE 49

eruptions - Some exploratory analysis

Which volcanoes have been erupting for hundreds of years?

confirmed_eruptions %>%
  mutate(start = `Start Year`,
         end = `End Year`,
         duration = end - start + 1) %>%
  filter(duration > 100) %>%
  select(`Volcano Number`, `Volcano Name`, `Evidence Method (dating)`,
         VEI, start, end, duration, Longitude, Latitude) -> longEruptions
longEruptions
## # A tibble: 4 x 9
##   `Volcano Number` `Volcano Name` `Evidence Metho~   VEI start   end
##              <int> <chr>          <chr>            <int> <int> <int>
## 1           257100 Yasur          Historical Obse~     3  1774  2018
## 2           352090 Sangay         Historical Obse~     3  1728  1916
## 3           211040 Stromboli      Historical Obse~     3  1558  1857
## 4           384010 Fogo           Historical Obse~     1  1500  1761
## # ... with 3 more variables: duration <dbl>, Longitude <dbl>,
## #   Latitude <dbl>

slide-50
SLIDE 50

eruptions - Some exploratory analysis

Which volcanoes have been erupting for hundreds of years?

Volcano Number  Volcano Name  Evidence Method (dating)  VEI  start  end   duration  Longitude  Latitude
257100          Yasur         Historical Observations   3    1774   2018  245       169.447    -19.532
352090          Sangay        Historical Observations   3    1728   1916  189       -78.341
211040          Stromboli     Historical Observations   3    1558   1857  300       15.213     38.789
384010          Fogo          Historical Observations   1    1500   1761  262       -24.350    14.950

library(loon)
# from the maps package
world <- maps::map(fill = TRUE, col = "cornsilk", plot = FALSE)
p <- with(longEruptions,
          l_plot(Longitude, Latitude,
                 color = "red", glyph = "ccircle", size = 5 * VEI,
                 itemLabel = paste0(" ", `Volcano Name`, ", - No. ", `Volcano Number`, "\n",
                                    start, "-", end, "\n",
                                    duration, " years"),
                 showItemLabels = TRUE, showGuides = TRUE, showLabels = FALSE))
l_layer(p, world, color = "cornsilk", asSingleLayer = TRUE,
        label = "world map", index = "last")
l_scaleto_world(p)
plot(p)

slide-51
SLIDE 51

eruptions - Some exploratory analysis

Which volcanoes have been erupting for hundreds of years?

## loon layer "world map" of type polygons of plot .l0.plot
## [1] "layer0"

slide-52
SLIDE 52

eruptions - Some exploratory analysis

library(lubridate)
confirmed_eruptions %>%
  select(`Start Year`, `Start Month`, `Start Day`,
         `End Year`, `End Month`, `End Day`) %>%
  summary()
##    Start Year         Start Month       Start Day        End Year
##  Min.   :-10450.0   Min.   : 0.000   Min.   : 0.000   Min.   :-475
##  1st Qu.:   180.0   1st Qu.: 0.000   1st Qu.: 0.000   1st Qu.:1894
##  Median :  1833.0   Median : 1.000   Median : 0.000   Median :1954
##  Mean   :   472.5   Mean   : 3.405   Mean   : 7.007   Mean   :1915
##  3rd Qu.:  1947.0   3rd Qu.: 7.000   3rd Qu.:15.000   3rd Qu.:1990
##  Max.   :  2018.0   Max.   :12.000   Max.   :31.000   Max.   :2018
##  NA's   :1          NA's   :182      NA's   :183      NA's   :5897
##    End Month         End Day
##  Min.   : 0.000   Min.   : 0.00
##  1st Qu.: 3.000   1st Qu.: 4.00
##  Median : 6.000   Median :15.00
##  Mean   : 6.202   Mean   :13.33
##  3rd Qu.: 9.000   3rd Qu.:21.00
##  Max.   :12.000   Max.   :31.00
##  NA's   :5899     NA's   :5900

Lots of missing values. Would have to impute some values to continue.
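As one illustration of what such an imputation might look like, the sketch below replaces an unknown month or day (recorded as 0 or NA) by 1 before building a date with lubridate's make_date(). Both the toy table and the choice of 1 are assumptions made for the example; any such rule shifts the imputed dates and would bias durations computed from them.

```r
library(dplyr)
library(lubridate)

# Hypothetical records (not rows from the eruption data)
toy <- tibble(year  = c(1980L, 1991L, 2005L),
              month = c(5L, 0L, NA),
              day   = c(18L, 0L, NA))

toy_imputed <- toy %>%
  mutate(month_imp = if_else(is.na(month) | month == 0L, 1L, month),  # unknown month -> January
         day_imp   = if_else(is.na(day) | day == 0L, 1L, day),        # unknown day -> the 1st
         start     = make_date(year, month_imp, day_imp))
toy_imputed
```

Keeping the original month and day columns alongside the imputed ones makes it easy to flag which dates were imputed.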

slide-53
SLIDE 53

eruptions - Some exploratory analysis

Given the difficulties with older volcanic information in this data set, we might focus on the most recent and higher quality dates. The Global Volcanism Program points out that satellite coverage able to monitor volcanic gas emissions began in 1978. So we could focus on this range of dates.

library(lubridate)
confirmed_eruptions %>%
  filter(`Start Year` >= 1978) %>%
  mutate(start = ymd(paste(`Start Year`, `Start Month`, `Start Day`, sep = "-")),
         end = ymd(paste(`End Year`, `End Month`, `End Day`, sep = "-")),
         duration = end - start + 1) -> duration_eruptions
## Warning: 100 failed to parse.

slide-54
SLIDE 54

eruptions - Some exploratory analysis


tcuts <- 365.25 * c(1/12, 1, 5, 10, 20)  # time cuts for plot
ncuts <- length(tcuts)
cols <- colorRamp(c("green", "red"))((0:(ncuts - 1)) / (ncuts - 1))
cols <- rgb(cols / 255)
duration_eruptions %>%
  ggplot(aes(VEI, duration)) +
  geom_hline(yintercept = tcuts, lty = 1:ncuts, col = cols) +
  geom_point(col = "steelblue", alpha = 0.5)
## Warning: Removed 136 rows containing missing values (geom_point).

[Figure: duration (in days) against VEI, with horizontal lines at the time cuts]

slide-55
SLIDE 55

eruptions - Some exploratory analysis

How about the average VEI per year since 1978?

confirmed_eruptions %>%
  filter(`Start Year` >= 1978) %>%
  group_by(`Start Year`) %>%
  summarise(VEIAve = mean(VEI, na.rm = TRUE)) %>%
  ggplot(aes(x = `Start Year`, y = VEIAve)) +
  ylim(0, 4) +
  geom_point(col = "steelblue", alpha = 0.5) +
  geom_smooth(method = "loess")

[Figure: VEIAve against `Start Year`, 1978 onwards, with a loess smooth]

Comments? Note that by using Start Year we are only looking at new eruptions in this time period.

slide-56
SLIDE 56

eruptions - Some exploratory analysis

Number of new eruptions per year since 1978:

confirmed_eruptions %>%
  filter(`Start Year` >= 1978) %>%
  group_by(`Start Year`) %>%
  summarise(count = n()) %>%
  ggplot(aes(x = `Start Year`, y = count)) +
  ylim(0, 60) + xlim(1975, 2020) +
  geom_point(col = "steelblue", alpha = 0.5) +
  geom_smooth(method = "loess") +
  ggtitle("Number of new eruptions", subtitle = "World wide")

[Figure: "Number of new eruptions" (subtitle "World wide"): count against `Start Year`, with a loess smooth]

Comments?

slide-57
SLIDE 57

eruptions - Some exploratory analysis

In the plot of the number of new eruptions per year since 1978, the rightmost point is based on only a few months of data. We should have filtered `Start Year` < 2018 as well.

slide-58
SLIDE 58

eruptions - Some exploratory analysis

Alternatively: Number of new eruptions per year since 1950 and before 2018:

confirmed_eruptions %>%
  filter(`Start Year` >= 1950 & `Start Year` < 2018) %>%
  group_by(`Start Year`) %>%
  summarise(count = n()) %>%
  ggplot(aes(x = `Start Year`, y = count)) +
  ylim(0, 60) + xlim(1950, 2020) +
  geom_point(col = "steelblue", alpha = 0.5) +
  geom_smooth(method = "loess") +
  ggtitle("Number of new eruptions", subtitle = "World wide")

[Figure: "Number of new eruptions" (subtitle "World wide"): count against `Start Year`, 1950 to 2017, with a loess smooth]

Looks like about 30-40 new eruptions per year.

slide-59
SLIDE 59

eruptions - Some exploratory analysis

There are lots of other plots we might look at: duration of each eruption

confirmed_eruptions %>%
  filter(`End Year` >= 1978) %>%
  ggplot(aes(x = `Start Year`, y = `Eruption Number`)) +
  xlim(1970, 2020) +
  aes(xend = `End Year`, yend = `Eruption Number`) +
  geom_segment(col = "steelblue", alpha = 0.5) +
  geom_vline(xintercept = c(1978, 2018), col = "grey", alpha = 0.5) +
  ggtitle("Duration of eruptions", subtitle = "Ending in 1978 or later")

[Figure: "Duration of eruptions" (subtitle "Ending in 1978 or later"): one horizontal segment per eruption from `Start Year` to `End Year`, placed at its `Eruption Number`]

slide-60
SLIDE 60

eruptions - Some exploratory analysis

There are lots of other plots we might look at: VEI and duration of each eruption

confirmed_eruptions %>%
  filter(`End Year` >= 1978) %>%
  mutate(jitter_VEI = jitter(VEI, factor = 2)) %>%
  ggplot(aes(x = `Start Year`, y = `jitter_VEI`)) +
  xlim(1970, 2020) +
  aes(xend = `End Year`, yend = `jitter_VEI`) +
  geom_segment(col = "steelblue", alpha = 0.5) +
  geom_vline(xintercept = c(1978, 2018), col = "grey", alpha = 0.5) +
  ylab("VEI (jittered)") +
  ggtitle("Duration of eruptions", subtitle = "Ending in 1978 or later")

[Figure: "Duration of eruptions" (subtitle "Ending in 1978 or later"): one horizontal segment per eruption from `Start Year` to `End Year`, placed at its jittered VEI]

slide-61
SLIDE 61

eruptions - Some exploratory analysis

Interactive loon maps of all known locations:

# load the maps library before tidyverse, then access the map function
# from the maps package
world <- maps::map(fill = TRUE, col = "cornsilk", plot = FALSE)
library(loon)
confirmed_eruptions %>%
  group_by(`Volcano Number`) %>%
  summarize(longitude = first(Longitude),
            latitude = first(Latitude),
            count = n(),
            name = first(`Volcano Name`)) %>%
  select(longitude, latitude, count, name) %>%
  l_plot(linkingGroup = "Eruptions", glyph = "ocircle", color = "red",
         size = .$count, itemLabel = .$name, showItemLabels = TRUE,
         title = "Volcanos with confirmed eruptions",
         showGuides = TRUE, showLabels = FALSE) %T>%
  l_layer(world, color = "cornsilk", asSingleLayer = TRUE,
          label = "world map", index = "last") -> p
plot(p)

slide-62
SLIDE 62

eruptions - Some exploratory analysis

Interactive loon maps of all known locations:

slide-63
SLIDE 63

readr - importing “rectangular” data with read_csv()

The same Smithsonian “Global Volcanism Program” site at http://volcano.si.edu/search_eruption.cfm contains other .csv files on volcanic eruptions (downloaded at the same time).

# The events
events <- read_csv("GVP_Eruption_Events.csv")

slide-64
SLIDE 64

readr - importing “rectangular” data with read_csv()


# The volcanoes
(volcano <- read_csv("GVP_Volcano_List.csv"))
## # A tibble: 963 x 26
##    `Volcano Number` `Volcano Name` `Primary Volcan~ `Last Eruption ~
##               <int> <chr>          <chr>            <chr>
##  1           283001 Abu            Shield(s)        -6850
##  2           355096 Acamarachi     Stratovolcano    Unknown
##  3           342080 Acatenango     Stratovolcano(e~ 1972
##  4           213004 Acigol-Nevseh~ Caldera          -2080
##  5           321040 Adams          Stratovolcano    950
##  6           283170 Adatarayama    Stratovolcano(e~ 1996
##  7           221170 Adwa           Stratovolcano    Unknown
##  8           221110 Afdera         Stratovolcano    Unknown
##  9           284160 Agrigan        Stratovolcano    1917
## 10           342100 Agua           Stratovolcano    Unknown
## # ... with 953 more rows, and 22 more variables: Country <chr>,
## #   Region <chr>, Subregion <chr>, Latitude <dbl>, Longitude <dbl>,
## #   Elevation <int>, `Tectonic Settings` <chr>, `Evidence Category` <chr>,
## #   `Major Rock 1` <chr>, `Major Rock 2` <chr>, `Major Rock 3` <chr>,
## #   `Major Rock 4` <chr>, `Major Rock 5` <chr>, `Minor Rock 1` <chr>,
## #   `Minor Rock 2` <chr>, `Minor Rock 3` <chr>, `Minor Rock 4` <chr>,
## #   `Minor Rock 5` <chr>, `Population within 5 km` <int>, `Population
## #   within 10 km` <int>, `Population within 30 km` <int>, `Population
## #   within 100 km` <int>

slide-65
SLIDE 65

readr - importing “rectangular” data with read_csv()


# The emissions
(emissions <- read_csv("GVP_Emission_Activity.csv"))
## # A tibble: 206 x 9
##    `Volcano Number` `Volcano Name` Country `Emission ID` Method
##               <int> <chr>          <chr>           <int> <chr>
##  1           290360 Chikurachki    Russia            182 Satel~
##  2           266030 Soputan        Indone~           150 Satel~
##  3           342090 Fuego          Guatem~           283 Satel~
##  4           342090 Fuego          Guatem~           284 Satel~
##  5           266030 Soputan        Indone~           151 Satel~
##  6           344040 Telica         Nicara~           292 Satel~
##  7           282050 Kuchinoerabuj~ Japan             166 Satel~
##  8           233020 Fournaise, Pi~ France             71 Satel~
##  9           357120 Villarrica     Chile             328 Satel~
## 10           233020 Fournaise, Pi~ France             72 Satel~
## # ... with 196 more rows, and 4 more variables: `Start Date` <chr>, `End
## #   Date` <chr>, `Total SO2 Mass (kt)` <dbl>, `SO2 Altitude Range
## #   Start` <dbl>

slide-66
SLIDE 66

readr - importing “rectangular” data with read_csv()


# The emission details
(emissionDetails <- read_csv("GVP_Emission_Details.csv"))
## # A tibble: 395 x 12
##    `Volcano Number` `Volcano Name` Country `Emission ID` Method
##               <int> <chr>          <chr>           <int> <chr>
##  1           353060 Azul, Cerro    Ecuador           316 Satel~
##  2           360150 Soufriere St.~ Saint ~           349 Satel~
##  3           353050 Negra, Sierra  Ecuador           314 Satel~
##  4           311310 Makushin       United~           215 Satel~
##  5           321050 St. Helens     United~           233 Satel~
##  6           321050 St. Helens     United~           233 Satel~
##  7           321050 St. Helens     United~           234 Satel~
##  8           372070 Hekla          Iceland           351 Satel~
##  9           252120 Ulawun         Papua ~            91 Satel~
## 10           284170 Pagan          United~           172 Satel~
## # ... with 385 more rows, and 7 more variables: `Emission Detail
## #   ID` <int>, Platform <chr>, `Date Start` <int>, `Date End` <chr>, `SO2
## #   Mass (kt)` <dbl>, `Assumed SO2 Altitude` <dbl>, `SO2 Algorithm` <chr>

slide-67
SLIDE 67

A little data wrangling

When we have several tables like this we can see they may have data in common (e.g. Volcano Number), data that is peculiar to each table (e.g. Eruption Number, Emission ID, etc.), and data that may be the same but which appears in a different format (e.g. Start Date, Start Day, etc.). To analyse the data, we might want to build some new tables based on information from one or more tables. For example, the emissions data contains some new interesting information on the amount of sulphur dioxide (SO2) emitted by volcanoes as determined from satellites. It does not however contain the latitude and longitude of the volcanoes; this can be found from the eruptions data. We could begin by extracting this information from the confirmed_eruptions:

confirmed_eruptions %>%
  group_by(`Volcano Number`) %>%
  summarize(longitude = first(Longitude),
            latitude = first(Latitude),
            aveVEI = mean(VEI, na.rm = TRUE)) -> confirmed_eruptions_long_lat

We might like to merge this information with that on emissions.

slide-68
SLIDE 68

A little data wrangling - emissions

Note first that the beginning and end of each emission are given by a single variate each, namely Start Date and End Date. For example,

emissions[1, "Start Date"]
## # A tibble: 1 x 1
##   `Start Date`
##   <chr>
## 1 2016 Mar 30
emissions[1, "End Date"]
## # A tibble: 1 x 1
##   `End Date`
##   <chr>
## 1 2016 Mar 30

We might want to recast these in the same form as they were for the eruptions data. That is with a separate variate for each year, month, and day. To do this, we can use the separate() function from the tidyr package. Alternatively, we might unite the Start Year, Start Month, and Start Day variates into a single Start Date variate for the eruptions. This can be accomplished with the unite() function from tidyr (essentially the inverse of separate()).
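The unite() alternative mentioned above might look like the following sketch. The toy table is made up for illustration (it only borrows the column names used in the eruptions data), and "-" is one possible choice of separator:

```r
library(tidyr)
library(tibble)

# Hypothetical rows using the eruptions column names
toy_eruptions <- tibble(`Start Year`  = c(1999L, 2004L),
                        `Start Month` = c(7L, 11L),
                        `Start Day`   = c(2L, 23L))

# Paste the three columns into a single character column, dropping the originals
toy_united <- unite(toy_eruptions, "Start Date",
                    `Start Year`, `Start Month`, `Start Day`, sep = "-")
toy_united
```

The result is a character column; missing components in the real data would appear as the text "NA" inside the united string unless they are handled first.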

slide-69
SLIDE 69

A little data wrangling - emissions

Here we separate the start and end dates in emissions. First we need a helper function:

monthNumbers <- function(mons){
  monthNumber <- function(mon){(1:12)[month.abb == mon]}
  sapply(mons, monthNumber)
}
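A possible alternative to this helper, sketched here under the assumption that the month names are the English three-letter abbreviations in month.abb, is base R's match(). It is vectorised and returns NA for a name it does not recognise, whereas the sapply() version returns a list in that case because the unmatched element has length zero. The name month_numbers is introduced only for this example:

```r
# Vectorised lookup of month abbreviations; unknown names give NA
month_numbers <- function(mons) match(mons, month.abb)

month_numbers(c("Mar", "Jan", "Xyz"))
```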

Now we can use separate() to separate the dates

emissions %>%
  separate(`Start Date`,
           into = c("Start Year", "Start Month Name", "Start Day")) %>%
  separate(`End Date`,
           into = c("End Year", "End Month Name", "End Day")) %>%
  mutate(`Start Month` = monthNumbers(`Start Month Name`)) %>%
  mutate(`End Month` = monthNumbers(`End Month Name`)) %>%
  select(-`Start Month Name`, -`End Month Name`) -> emissions_separate_dates

slide-70
SLIDE 70

A little data wrangling - emissions

Which gives

emissions_separate_dates[1:5, c("Start Year", "Start Month", "Start Day")]
## # A tibble: 5 x 3
##   `Start Year` `Start Month` `Start Day`
##   <chr>                <int> <chr>
## 1 2016                     3 30
## 2 2016                     1 05
## 3 2016                     1 04
## 4 2016                     2 10
## 5 2016                     2 06

Warning: note the data types! We would need to do a little more work to force them all to be <int> as is the case in eruptions.

unite() could have similarly been used on eruptions to create start and end dates. We might also have turned both into a common date format using lubridate’s ymd() function. Note that for the eruptions, trouble might arise because of missing values (NAs).
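One way to do that extra work, sketched on a made-up two-row table rather than the real emissions data, is to ask separate() to convert the new columns itself with convert = TRUE. The year and day pieces then come back as integers, while the month name stays character:

```r
library(tidyr)
library(tibble)

# Hypothetical dates in the same "year month-name day" layout as the emissions table
toy <- tibble(`Start Date` = c("2016 Mar 30", "2016 Jan 05"))

toy_sep <- separate(toy, `Start Date`,
                    into = c("Start Year", "Start Month Name", "Start Day"),
                    sep = " ", convert = TRUE)
toy_sep
```

An explicit as.integer() inside mutate() would be an equally reasonable choice when the conversion should fail loudly on unexpected text.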

slide-71
SLIDE 71

A little data wrangling - emissions

Now we could join the information on longitude and latitude to the table just created as follows:

emissions_separate_dates %>%
  left_join(confirmed_eruptions_long_lat, by = "Volcano Number") -> emissions_long_lat
emissions_long_lat
## # A tibble: 206 x 16
##    `Volcano Number` `Volcano Name` Country `Emission ID` Method
##               <int> <chr>          <chr>           <int> <chr>
##  1           290360 Chikurachki    Russia            182 Satel~
##  2           266030 Soputan        Indone~           150 Satel~
##  3           342090 Fuego          Guatem~           283 Satel~
##  4           342090 Fuego          Guatem~           284 Satel~
##  5           266030 Soputan        Indone~           151 Satel~
##  6           344040 Telica         Nicara~           292 Satel~
##  7           282050 Kuchinoerabuj~ Japan             166 Satel~
##  8           233020 Fournaise, Pi~ France             71 Satel~
##  9           357120 Villarrica     Chile             328 Satel~
## 10           233020 Fournaise, Pi~ France             72 Satel~
## # ... with 196 more rows, and 11 more variables: `Start Year` <chr>,
## #   `Start Day` <chr>, `End Year` <chr>, `End Day` <chr>, `Total SO2 Mass
## #   (kt)` <dbl>, `SO2 Altitude Range Start` <dbl>, `Start Month` <int>,
## #   `End Month` <int>, longitude <dbl>, latitude <dbl>, aveVEI <dbl>

which now has the longitude, latitude, and historical average VEI, for the volcano associated with each emission.
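A left join keeps every emission even when its volcano has no row in the location table, in which case longitude and latitude are NA. A quick check for such rows could use anti_join(), as in this sketch on two made-up tables (the volcano numbers and values are hypothetical):

```r
library(dplyr)

# Hypothetical emissions and locations; volcano 3 has no location row
toy_emissions <- tibble(`Volcano Number` = c(1L, 2L, 3L),
                        so2 = c(10, 20, 30))
toy_locations <- tibble(`Volcano Number` = c(1L, 2L),
                        longitude = c(10.5, -70.2),
                        latitude  = c(40.1, -2.0))

joined    <- left_join(toy_emissions, toy_locations, by = "Volcano Number")
unmatched <- anti_join(toy_emissions, toy_locations, by = "Volcano Number")

joined     # all three emissions; the last has NA coordinates
unmatched  # only the emission whose volcano has no location
```

Running the same anti_join() on emissions_separate_dates and confirmed_eruptions_long_lat would show whether any emissions would be silently dropped from a map.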

slide-72
SLIDE 72

A little data wrangling - emissions

Allowing us to plot these emissions in a couple of loon plots

emissions_long_lat %>%
  mutate(SO2 = sqrt(`Total SO2 Mass (kt)`),
         name = `Volcano Name`,
         ID = `Emission ID`) %>%
  select(longitude, latitude, SO2, name) %>%
  l_plot(linkingGroup = "Emissions", glyph = "ocircle", color = "orange",
         size = .$SO2, itemLabel = .$name, showItemLabels = TRUE,
         title = "Emissions", showGuides = TRUE) %T>%
  l_layer(world, color = "cornsilk", asSingleLayer = TRUE,
          label = "world map", index = "last") -> p