Example exploration: Volcanic eruptions
R.W. Oldford
Global Volcanism
Problem
Volcanoes are a natural phenomenon caused by ruptures in the crust of planets like Earth, which have a molten core. When they erupt, they spew molten material (lava), rocks, hot gases, and ash. They can cause change in the planet, from affecting humans and their activities to causing changes in land mass, weather, and even climate. Interest lies in better understanding volcanism on Earth (and possibly other Earth-like planets). By studying known volcanoes, their eruptions, and physical characteristics, we hope to uncover interesting patterns in volcanic activity that will help us better understand them as a natural phenomenon and their impact on the planet (including us).
Plan
Volcanoes that have affected humans in the past will have been mentioned in the historical records. In addition, geologists have now been studying volcanoes for some time. It may even be possible to infer the characteristics of historical eruptions from physical measurements on site. The plan would be to collect as many physical measurements as possible on known volcanoes, from sources that are as reliable as possible. The aim is a much better understanding of volcanic activity from the available data on the physical characteristics of volcanoes.
Question: What are the target and study populations here? What are the units? That is, what would we be taking measurements on?
Global Volcanism
Data
The Smithsonian’s “Global Volcanism Program” (GVP) at http://volcano.si.edu/search_eruption.cfm contains several data sources on volcanoes. The following files were downloaded March 28, 2017 from this site:
◮ GVP_Volcano_List.csv giving a list of volcanoes, their geology, and the nearby human population
◮ GVP_Eruption_Eruptions.csv identifying volcanoes and the magnitude of their eruptions
◮ GVP_Eruption_Events.csv containing information describing the types of eruptions
◮ GVP_Emission_Activity.csv and GVP_Emission_Details.csv giving information on the sulphur dioxide content of the emissions from many modern volcanic eruptions
◮ GVP_Volcano_List_References.csv and GVP_Eruption_References.csv containing information on literature sources for the volcanoes and their eruptions.
(Data source suggested by a poster by Kelly McConville: https://www.causeweb.org/cause/sites/default/files/ecots/ecots16/posters/Kelly_McConvilleAre_Volcanic_Eruptions_Increasing.pdf)
Global Volcanism
Data
Look first at the eruptions:
library(tidyverse)  # provides read_csv(), the %>% pipe, and ggplot()
(eruptions <- read_csv("GVP_Eruption_Eruptions.csv"))
## # A tibble: 11,108 x 24
##    `Volcano Number` `Volcano Name` `Eruption Numbe~ `Eruption Categ~
##               <int> <chr>                     <int> <chr>
##  1           282090 Kirishimayama             22257 Confirmed Erupt~
##  2           352090 Sangay                    22259 Confirmed Erupt~
##  3           267020 Karangetang               22256 Confirmed Erupt~
##  4           283120 Kusatsu-Shira~            22258 Confirmed Erupt~
##  5           343100 San Miguel                22251 Confirmed Erupt~
##  6           273030 Mayon                     22250 Confirmed Erupt~
##  7           251002 Kadovar                   22246 Confirmed Erupt~
##  8           272020 Kanlaon                   22249 Confirmed Erupt~
##  9           264020 Agung                     22241 Confirmed Erupt~
## 10           261230 Dempo                     22248 Confirmed Erupt~
## # ... with 11,098 more rows, and 20 more variables: `Area of
## #   Activity` <chr>, VEI <int>, `VEI Modifier` <chr>, `Start Year
## #   Modifier` <chr>, `Start Year` <int>, `Start Year Uncertainty` <int>,
## #   `Start Month` <int>, `Start Day Modifier` <chr>, `Start Day` <int>,
## #   `Start Day Uncertainty` <int>, `Evidence Method (dating)` <chr>, `End
## #   Year Modifier` <chr>, `End Year` <int>, `End Year Uncertainty` <chr>,
## #   `End Month` <int>, `End Day Modifier` <chr>, `End Day` <int>, `End Day
## #   Uncertainty` <int>, Latitude <dbl>, Longitude <dbl>
Global Volcanism - Data
Variates are
names(eruptions)
##  [1] "Volcano Number"           "Volcano Name"
##  [3] "Eruption Number"          "Eruption Category"
##  [5] "Area of Activity"         "VEI"
##  [7] "VEI Modifier"             "Start Year Modifier"
##  [9] "Start Year"               "Start Year Uncertainty"
## [11] "Start Month"              "Start Day Modifier"
## [13] "Start Day"                "Start Day Uncertainty"
## [15] "Evidence Method (dating)" "End Year Modifier"
## [17] "End Year"                 "End Year Uncertainty"
## [19] "End Month"                "End Day Modifier"
## [21] "End Day"                  "End Day Uncertainty"
## [23] "Latitude"                 "Longitude"
Which, if any of these, might be a primary key? Perhaps Eruption Number?
eruptions %>%
  count(`Eruption Number`) %>%
  filter(n > 1)
## # A tibble: 1 x 2
##   `Eruption Number`     n
##               <int> <int>
## 1             21100     3
Nope!
Important note: Variable names are enclosed in backquotes, as in `Eruption Number`, and not in single or double quotes, as in ‘Eruption Number’ or “Eruption Number”! Try the above with “Eruption Number” (or even “fubar”!!).
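To illustrate the note on backquotes, here is a minimal base R sketch on a made-up three-row data frame (not the GVP file). It uses `with()`, which evaluates an expression with the columns of a data frame in scope, as a stand-in for how the column names are looked up in the pipelines above. The name `toy` and its values are assumptions for illustration only.

```r
# Made-up data: a column whose name contains a space.
# check.names = FALSE keeps the name exactly as written.
toy <- data.frame(`Eruption Number` = c(1, 1, 2), check.names = FALSE)

# Backquotes refer to the column itself ...
by_name <- with(toy, `Eruption Number`)

# ... whereas double quotes give nothing more than a character string.
by_string <- with(toy, "Eruption Number")

by_name
by_string
```

The quoted version never touches the data, which is why counting on it (or on any other string) cannot reveal duplicated values.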
Global Volcanism - Data
What is available about Eruption Number 21100?
eruptions %>%
  filter(`Eruption Number` == 21100) %>%
  select(`Volcano Name`, `Eruption Category`, `Latitude`, `Longitude`)
## # A tibble: 3 x 4
##   `Volcano Name`      `Eruption Category` Latitude Longitude
##   <chr>               <chr>                  <dbl>     <dbl>
## 1 Craters of the Moon Confirmed Eruption      43.4     -114.
## 2 Craters of the Moon Confirmed Eruption      43.4     -114.
## 3 Craters of the Moon Confirmed Eruption      43.4     -114.
Well, that’s interesting . . . ?
Turns out that:
“Craters of the Moon is a large lava flow field with cinder cones, spatter cones, lava tubes, volcanic bombs and tree molds. It is located along the north border of the Snake River Plain in Idaho. It was declared a national monument by President Calvin Coolidge in 1924. The monument contains 55 cones with lava flows and 14 fissures, many of which have spatter cones. . . .
The Great Rift is a line of cones and lava vents that runs for 13 miles (21 km) through the monument. Fissures from this rift are the vents for the youngest lavas in the area. The youngest Craters of the Moon lavas are approximately 1500 to 2000 years old. These lavas are basaltic.”
From http://volcano.oregonstate.edu/craters-moon
Global Volcanism - Data
What about Volcano Number and Volcano Name?
eruptions %>%
  count(`Volcano Number`, `Volcano Name`) %>%
  filter(n > 1) %>%
  nrow() == 0
## [1] FALSE
No combination of the first 8 variables provides a primary key. However,
eruptions %>%
  count(`Eruption Number`, `Start Year`) %>%
  filter(n > 1) %>%
  nrow() == 0
## [1] TRUE
so the pair Eruption Number and Start Year is a possibility.
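The checks above all follow one pattern: a set of columns is a candidate primary key if no combination of their values occurs more than once. A minimal base R sketch of that pattern follows; the helper `is_key()` and the toy values are hypothetical (they are not from the slides or the GVP data), and `anyDuplicated()` is used here in place of the `count()` and `filter()` pipeline.

```r
# Hypothetical helper: TRUE when the chosen columns identify rows uniquely.
is_key <- function(data, cols) {
  # anyDuplicated() returns 0 when no row of the sub-table is repeated
  anyDuplicated(data[, cols, drop = FALSE]) == 0L
}

# Made-up values that mimic the situation found above: one eruption number
# appears three times, but each time with a different start year.
toy <- data.frame(
  `Eruption Number` = c(1, 1, 1, 2),
  `Start Year`      = c(10, 20, 30, 40),
  check.names = FALSE
)

is_key(toy, "Eruption Number")
is_key(toy, c("Eruption Number", "Start Year"))
```

Passing a key check on the data at hand only shows that the columns happen to be unique in this file; whether they are meant to be a key is a question about how the data were recorded.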
Global Volcanism - Data
Look again at the variate names:
names(eruptions)
##  [1] "Volcano Number"           "Volcano Name"
##  [3] "Eruption Number"          "Eruption Category"
##  [5] "Area of Activity"         "VEI"
##  [7] "VEI Modifier"             "Start Year Modifier"
##  [9] "Start Year"               "Start Year Uncertainty"
## [11] "Start Month"              "Start Day Modifier"
## [13] "Start Day"                "Start Day Uncertainty"
## [15] "Evidence Method (dating)" "End Year Modifier"
## [17] "End Year"                 "End Year Uncertainty"
## [19] "End Month"                "End Day Modifier"
## [21] "End Day"                  "End Day Uncertainty"
## [23] "Latitude"                 "Longitude"
What does each of these record?
Identity, location, time, plus some like Eruption Category, Area of Activity, VEI, VEI Modifier, and Evidence Method (dating) which may require a little research.
Global Volcanism - Data
For example, what is VEI? Volcanic Explosivity Index
Wikipedia:
“The Volcanic Explosivity Index (VEI) is a relative measure of the explosiveness of volcanic eruptions. It was devised by Chris Newhall of the United States Geological Survey and Stephen Self at the University of Hawaii in 1982.
“Volume of products, eruption cloud height, and qualitative observations (using terms ranging from “gentle” to “mega-colossal”) are used to determine the explosivity value.
“The scale is open-ended with the largest volcanoes in history given magnitude 8. A value of 0 is given for non-explosive eruptions, defined as less than 10,000 m³ (350,000 cu ft) of tephra ejected; and 8 representing a mega-colossal explosive eruption that can eject 1.0×10¹² m³ (240 cubic miles) of tephra and have a cloud column height of over 20 km (66,000 ft).
“The scale is logarithmic, with each interval on the scale representing a tenfold increase in observed ejecta criteria, with the exception of between VEI-0, VEI-1 and VEI-2.”
Which sure doesn’t sound well-defined. Sounds like it includes some subjective assessment.
Global Volcanism - Data
The variate VEI gives a numerical value to how explosive an eruption is, and is related to an estimate of the volume of material ejected.
Roughly, it appears to be a logarithmic scale (base 10), with each value above 2 representing a tenfold increase in the volume of material ejected. Below 2, however, this is not the case: a VEI of 0 is non-explosive (< 1 × 10⁴ m³ of material ejected), a VEI of 1 has between 10⁴ m³ and 10⁶ m³ of material ejected, and a VEI of 2 has between 10⁶ m³ and 10⁷ m³. See https://geology.com/stories/13/volcanic-explosivity-index/ for detail.
Image source: http://volcanoes.usgs.gov/Products/Pglossary/vei.html
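The volume thresholds just described can be written down as a small lookup. This is only a sketch of the volume part of the scale: the helper name `vei_from_volume()` is hypothetical, the treatment of a volume exactly on a boundary (assigned to the higher class) is an assumption, and the real index also draws on cloud height and qualitative observations that are ignored here.

```r
# Lower volume bounds (cubic metres) for VEI 1 through 8, following the
# thresholds in the text: 1e4 and 1e6 for VEI 1 and 2, then a tenfold
# step per class. Anything below the first bound is VEI 0.
vei_lower_bounds <- c(1e4, 1e6, 1e7, 1e8, 1e9, 1e10, 1e11, 1e12)

# Hypothetical helper: count how many lower bounds a volume has reached.
vei_from_volume <- function(volume_m3) {
  findInterval(volume_m3, vei_lower_bounds)
}

# Illustrative volumes only, not measurements of any real eruption
vei_from_volume(c(5e3, 1e5, 5e6, 1e7, 1e12))
```

Writing the bounds out makes the irregular step visible: VEI 1 spans two orders of magnitude, while every later class spans one.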
eruptions - Data
A summary of some of the non-obvious (and non-numeric) variates:
summary(factor(eruptions$`Eruption Category`))
##   Confirmed Eruption Discredited Eruption   Uncertain Eruption
##                 9838                  164                 1106
summary(factor(eruptions$`Area of Activity`))[1:10]
##      Naka-dake          Bromo      Ngauruhoe    Mihara-yama         Ohachi
##            172             62             52             47             45
##  Anak Krakatau Central Crater          Okama        Tarumai   NE rift zone
##             42             39             37             35             32
summary(factor(eruptions$`Evidence Method (dating)`))[1:9]
##            Anthropology                   Ar/Ar        Dendrochronology
##                      38                      26                      15
##           Fission track Historical Observations          Hydration Rind
##                       3                    6326                      11
##             Hydrophonic                Ice Core            Lichenometry
##                      68                     116                       2
Global Volcanism - Analysis
A summary of each numeric variate:
eruptions %>% select_if(is.numeric) %>% summary
##  Volcano Number   Eruption Number      VEI          Start Year
##  Min.   :210010   Min.   :10001   Min.   :0.00   Min.   :-10450
##  1st Qu.:263310   1st Qu.:12796   1st Qu.:1.00   1st Qu.:   650
##  Median :290056   Median :15604   Median :2.00   Median :  1846
##  Mean   :300374   Mean   :15614   Mean   :1.95   Mean   :   617
##  3rd Qu.:343030   3rd Qu.:18397   3rd Qu.:2.00   3rd Qu.:  1949
##  Max.   :600000   Max.   :22260   Max.   :7.00   Max.   :  2018
##                                   NA's   :2911   NA's   :1
##  Start Year Uncertainty  Start Month       Start Day
##  Min.   :    1.0         Min.   : 0.000   Min.   : 0.000
##  1st Qu.:   50.0         1st Qu.: 0.000   1st Qu.: 0.000
##  Median :  100.0         Median : 1.000   Median : 0.000
##  Mean   :  292.7         Mean   : 3.436   Mean   : 6.966
##  3rd Qu.:  200.0         3rd Qu.: 7.000   3rd Qu.:15.000
##  Max.   :14000.0         Max.   :12.000   Max.   :31.000
##  NA's   :9024            NA's   :184      NA's   :187
##  Start Day Uncertainty    End Year      End Month         End Day
##  Min.   :  1           Min.   :-475   Min.   : 0.000   Min.   : 0.00
##  1st Qu.: 15           1st Qu.:1894   1st Qu.: 3.000   1st Qu.: 3.00
##  Median : 15           Median :1956   Median : 6.000   Median :15.00
##  Mean   : 64           Mean   :1916   Mean   : 6.194   Mean   :13.25
##  3rd Qu.: 45           3rd Qu.:1991   3rd Qu.: 9.000   3rd Qu.:21.00
##  Max.   :730           Max.   :2018   Max.   :12.000   Max.   :31.00
##  NA's   :10247         NA's   :6846   NA's   :6849     NA's   :6852
##  End Day Uncertainty    Latitude         Longitude
##  Min.   :  1.00      Min.   :-77.530   Min.   :-179.97
##  1st Qu.:  5.00      1st Qu.: -6.102   1st Qu.: -77.66
##  Median : 15.00      Median : 18.130   Median :  55.71
##  Mean   : 23.88      Mean   : 16.848   Mean   :  31.48
##  3rd Qu.: 15.00      3rd Qu.: 40.821   3rd Qu.: 139.39
##  Max.   :365.00      Max.   : 85.608   Max.   : 179.58
##  NA's   :10416
eruptions - Some exploratory analysis
In RStudio, the full contents can be seen using the View() function. This gives an interactive spreadsheet view of the data that allows, for example, sorting, searching, and filtering.
View(eruptions)
eruptions - Some exploratory analysis
We might begin with how many eruptions have been recorded for each volcano:
eruptions %>%
  group_by(`Volcano Number`) %>%
  summarize(count = n()) %>%
  ggplot(aes(x = `Volcano Number`, y = count)) +
  geom_point(col = "steelblue", alpha = 0.5) +
  ggtitle("Number of Eruptions") -> gp
gp
[Figure: "Number of Eruptions"; scatterplot; x-axis: Volcano Number, y-axis: count]
Some volcanoes have tens and even hundreds of recorded eruptions. There also seems to be a large gap in volcano numbers. Beyond this gap there are very few (possibly only 1?) volcano numbers. If there is only one, it records about 75 eruptions.
eruptions - Some exploratory analysis
How many volcanoes have large volcano numbers?
eruptions %>%
  filter(`Volcano Number` > 500000) %>%
  group_by(`Volcano Number`) %>%
  summarize(count = n())
## # A tibble: 1 x 2
##   `Volcano Number` count
##              <int> <int>
## 1           600000    77
Just the one it seems (with 77 eruptions). And what do we know about it?
eruptions %>%
  filter(`Volcano Number` == 600000) %>%
  mutate(name = factor(`Volcano Name`),
         cat = factor(`Eruption Category`),
         evid = factor(`Evidence Method (dating)`)) %>%
  select(name, VEI, cat, evid) %>%
  summary()
##              name         VEI                      cat
##  Unknown Source:77   Min.   :6   Confirmed Eruption:77
##                      1st Qu.:6
##                      Median :6
##                      Mean   :6
##                      3rd Qu.:6
##                      Max.   :6
##                      NA's   :76
##                       evid
##  Dendrochronology       : 5
##  Historical Observations: 1
##  Ice Core               :71
eruptions - Some exploratory analysis
On the curiously large gap in volcano numbers, with no numbers beginning with either 4 or 5.
From the website:
"The International Association for Volcanology and Chemistry of Earth’s Interior (IAVCEI), The World Organization of Volcano Observatories (WOVO), and the Global Volcano Model (GVM) have sanctioned GVP to assign numbers and primary names to the world’s volcanoes. The purpose of the numbers is to prevent ambiguity regarding the name and location of volcanoes that may have non-unique names, or that are known by multiple names. The original VNums were based on a system developed in the 1950’s for the IAVCEI Catalog of Active Volcanoes of the World (CAVW). GVP policy had been to embed significant geographical, historical, and age information in the numbers. As a result GVP often changed VNums, most frequently to accommodate newly recognized volcanoes in a particular geographical region, which over time undermined the goal of preventing ambiguity. . . . numbers have been added for subfeatures associated with each volcano. While a connection remains to the older system, the geographic link to CAVW regions and subregions is no longer mandatory. None of the new numbers start with 0 or 1 to avoid confusion with the legacy system."
The website also says that the digits encode geographic information.
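One way to look at the gap directly is to tabulate the leading digit of the volcano numbers. The sketch below does this in base R for only the fourteen six-digit volcano numbers that happen to be printed in the outputs on these slides, so it illustrates the idea rather than reproducing the result for the full file; the variable names are assumptions.

```r
# The volcano numbers visible in the printed outputs (not the whole data set)
vn <- c(282090, 352090, 267020, 283120, 343100, 273030, 251002,
        272020, 264020, 261230, 600000, 257100, 211040, 384010)

# For six-digit numbers, integer division by 100000 gives the leading digit
leading_digit <- vn %/% 100000

# Fixing the levels keeps the empty classes (4 and 5) in the table
digit_counts <- table(factor(leading_digit, levels = 2:6))
digit_counts
```

On the full data, the same tabulation could be applied to the `Volcano Number` column in place of `vn`.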
eruptions - Some exploratory analysis
How does the number of eruptions behave over time?
eruptions %>%
  ggplot(aes(x = `Start Year`)) +
  geom_histogram(fill = "steelblue", col = "white")
## Warning: Removed 1 rows containing non-finite values (stat_bin).
[Figure: histogram; x-axis: Start Year, y-axis: count]
Looks like the number of eruptions over time is increasing. Likely this reflects more complete recording in recent times rather than a more volcanic world.
eruptions - Some exploratory analysis
How does the number of eruptions behave over time? What if we separate the counts by Eruption Category?
eruptions %>%
  ggplot(aes(x = `Start Year`)) +
  geom_histogram(fill = "steelblue", col = "white") +
  facet_wrap(~`Eruption Category`)
## Warning: Removed 1 rows containing non-finite values (stat_bin).
[Figure: histograms in three panels (Confirmed Eruption, Discredited Eruption, Uncertain Eruption); x-axis: Start Year, y-axis: count]
Still looks like the number of eruptions over time is increasing, especially for confirmed eruptions.
eruptions - Some exploratory analysis
Just look at confirmed eruptions, and see how they depend upon evidence.
eruptions %>%
  filter(`Eruption Category` == "Confirmed Eruption") %>%
  ggplot(aes(x = `Start Year`)) +
  geom_histogram(fill = "steelblue", col = "white") +
  facet_wrap(~ `Evidence Method (dating)`, scales = "free_y")
[Figure: histograms, one panel per evidence method (Anthropology, Ar/Ar, Dendrochronology, Fission track, Historical Observations, Hydration Rind, Hydrophonic, Ice Core, Lichenometry, Magnetism, Potassium-Argon, Radiocarbon (corrected), Radiocarbon (uncorrected), Seismicity, Surface Exposure, Tephrochronology, Thermoluminescence, Uncertain, Uranium-series, Varve Count, NA), each with its own y scale; x-axis: Start Year, y-axis: count]
Even after separating out the different sources of evidence, it looks like the number of eruptions over time is increasing.
eruptions - Some exploratory analysis
Restrict data to the confirmed eruptions.
eruptions %>% filter(`Eruption Category` == "Confirmed Eruption") -> confirmed_eruptions
Look at VEI, the “Volcanic Explosivity Index”, which measures how explosive an eruption is (related to the volume of material ejected).
confirmed_eruptions %>%
  ggplot(aes(x = VEI)) +
  geom_bar(fill = "steelblue") +
  facet_wrap(~`VEI Modifier`)
## Warning: Removed 2277 rows containing non-finite values (stat_count).
[Figure: bar charts, one panel per VEI Modifier (?, ^, +, C, P, NA); x-axis: VEI, y-axis: count]
It’s not clear from the data what each of these (non-NA) modifiers means: ? likely indicates uncertainty; + and ^ might each indicate that the value is a lower bound; and P and C remain a mystery (a guess, perhaps, is the type of material?). No information was found on the source website.
eruptions - Some exploratory analysis
VEI over time (Start Year)
confirmed_eruptions %>%
  ggplot(aes(x = `Start Year`, y = VEI)) +
  geom_point(col = "steelblue", alpha = 0.5) +
  facet_wrap(~`VEI Modifier`)
## Warning: Removed 2277 rows containing missing values (geom_point).
[Figure: scatterplots, one panel per VEI Modifier (?, ^, +, C, P, NA); x-axis: Start Year, y-axis: VEI]
eruptions - Some exploratory analysis
Average VEI over time (Start Year)
confirmed_eruptions %>%
  group_by(`Start Year`) %>%
  summarise(VEIAve = mean(VEI, na.rm = TRUE)) %>%
  ggplot(aes(x = `Start Year`, y = VEIAve)) +
  geom_point(col = "steelblue", alpha = 0.5)
## Warning: Removed 404 rows containing missing values (geom_point).
[Figure: scatterplot; x-axis: Start Year, y-axis: VEIAve]
Question: What might this plot suggest about the study population?
eruptions - Some exploratory analysis
Average VEI over time (Era of 50 years)
confirmed_eruptions %>%
  mutate(Ranges = cut_interval(`Start Year`, length = 50),
         Era = as.numeric(Ranges)) %>%
  group_by(Era) %>%
  summarise(VEIAve = mean(VEI, na.rm = TRUE)) %>%
  ggplot(aes(x = Era, y = VEIAve)) +
  geom_point(col = "steelblue", alpha = 0.5) +
  geom_smooth()
[Figure: scatterplot with smooth; x-axis: Era, y-axis: VEIAve]
Question: What might this plot suggest about study error in this case? How might you reduce it?
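To make the era averaging concrete, here is a small base R sketch of the same idea on made-up values. It swaps in a simple floor-based binning for `cut_interval()`, so each era is labelled by the first year of its 50-year window rather than by an index; the data frame `toy` and its values are assumptions for illustration, not GVP records.

```r
# Made-up start years and VEI values (one VEI is missing)
toy <- data.frame(
  start_year = c(1500, 1520, 1558, 1728, 1774, 1780),
  VEI        = c(1, NA, 3, 3, 3, 2)
)

# Label each eruption by the first year of its 50-year era
era_start <- 50 * floor(toy$start_year / 50)

# Average VEI within each era, ignoring missing values
era_mean <- tapply(toy$VEI, era_start, mean, na.rm = TRUE)
era_mean
```

Each average is based on however many eruptions with a known VEI fall in that era, which can be very few; this is worth keeping in mind when reading the smoothed curve.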
eruptions - Some exploratory analysis
Each eruption has information on when it started and when it ended (year, month, day, modifier, uncertainty). This should give us some information on each eruption’s duration, and hence also on its importance. There are a number of functions in base R for determining durations, but here we introduce some from the lubridate package.
library(lubridate)
confirmed_eruptions %>%
  mutate(start = ymd(paste(`Start Year`, `Start Month`, `Start Day`, sep = "-")),
         end = ymd(paste(`End Year`, `End Month`, `End Day`, sep = "-")),
         duration = end - start + 1) %>%
  ggplot(aes(VEI, duration)) +
  geom_point(col = "steelblue", alpha = 0.5)
eruptions - Some exploratory analysis
## Warning: 5416 failed to parse.
## Warning: 6594 failed to parse.
## Warning: Removed 6731 rows containing missing values (geom_point).
[Figure: scatterplot; x-axis: VEI, y-axis: duration]
Note warnings – perhaps dates are not as clean as we might like.
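One plausible source of the parse failures is visible in the numeric summaries: months and days have a minimum of 0, and start years can be negative, none of which form a valid calendar date. The sketch below, in base R with `as.Date()` rather than lubridate's `ymd()`, builds a date only when all three parts are usable and returns NA otherwise. The helper `safe_date()` is hypothetical, and treating 0 as "unknown" is an assumption about how the file encodes missing parts.

```r
# Hypothetical helper (not from the slides): build a Date only when the
# year, month, and day are all present and at least 1; otherwise give NA.
safe_date <- function(year, month, day) {
  usable <- !is.na(year) & !is.na(month) & !is.na(day) &
    year >= 1 & month >= 1 & day >= 1
  out <- rep(as.Date(NA), length(year))
  out[usable] <- as.Date(
    sprintf("%04d-%02d-%02d", year[usable], month[usable], day[usable]),
    format = "%Y-%m-%d"
  )
  out
}

# Made-up values chosen to exercise each case: a complete date, a zero
# month and day, a negative year, and a zero day
d <- safe_date(year  = c(1991, 1500, -475, 2018),
               month = c(6, 0, 3, 1),
               day   = c(15, 0, 10, 0))
is.na(d)
```

Counting the NA results from such a helper would show how many eruptions lack a usable start or end date before any durations are computed.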
eruptions - Some exploratory analysis
Try just duration in years
confirmed_eruptions %>%
  mutate(start = `Start Year`,
         end = `End Year`,
         duration = end - start + 1) %>%
  ggplot(aes(VEI, duration)) +
  geom_point(col = "steelblue", alpha = 0.5)
## Warning: Removed 5968 rows containing missing values (geom_point).
[Figure: scatterplot; x-axis: VEI, y-axis: duration (years)]
Some eruptions have lasted hundreds of years!
eruptions - Some exploratory analysis
Which volcanoes have been erupting for hundreds of years?
confirmed_eruptions %>%
  mutate(start = `Start Year`,
         end = `End Year`,
         duration = end - start + 1) %>%
  filter(duration > 100) %>%
  select(`Volcano Number`, `Volcano Name`, `Evidence Method (dating)`,
         VEI, start, end, duration, Longitude, Latitude) -> longEruptions
longEruptions
## # A tibble: 4 x 9
##   `Volcano Number` `Volcano Name` `Evidence Metho~   VEI start   end
##              <int> <chr>          <chr>            <int> <int> <int>
## 1           257100 Yasur          Historical Obse~     3  1774  2018
## 2           352090 Sangay         Historical Obse~     3  1728  1916
## 3           211040 Stromboli      Historical Obse~     3  1558  1857
## 4           384010 Fogo           Historical Obse~     1  1500  1761
## # ... with 3 more variables: duration <dbl>, Longitude <dbl>,
## #   Latitude <dbl>
Volcano Number  Volcano Name  Evidence Method (dating)  VEI  start  end   duration  Longitude  Latitude
257100          Yasur         Historical Observations   3    1774   2018  245       169.447    -19.532
352090          Sangay        Historical Observations   3    1728   1916  189       -78.341
211040          Stromboli     Historical Observations   3    1558   1857  300       15.213     38.789
384010          Fogo          Historical Observations   1    1500   1761  262       -24.350    14.950

# from the maps package
world <- maps::map(fill = TRUE, col = "cornsilk", plot = FALSE)
p <- with(longEruptions,
          l_plot(Longitude, Latitude,
                 color = "red", glyph = "ccircle", size = 5 * VEI,
                 itemLabel = paste0(" ", `Volcano Name`, ", - No. ", `Volcano Number`, "\n",
                                    start, "-", end, "\n",
                                    duration, " years"),
                 showItemLabels = TRUE,
                 showGuides = TRUE,
                 showLabels = FALSE))
l_layer(p, world, color = "cornsilk", asSingleLayer = TRUE,
        label = "world map", index = "last")
l_scaleto_world(p)
plot(p)
## loon layer "world map" of type polygons of plot .l0.plot
## [1] "layer0"
library(lubridate)
confirmed_eruptions %>%
  select(`Start Year`, `Start Month`, `Start Day`,
         `End Year`, `End Month`, `End Day`) %>%
  summary()
##    Start Year         Start Month       Start Day         End Year
##  Min.   :-10450.0   Min.   : 0.000   Min.   : 0.000   Min.   :-475
##  1st Qu.:   180.0   1st Qu.: 0.000   1st Qu.: 0.000   1st Qu.:1894
##  Median :  1833.0   Median : 1.000   Median : 0.000   Median :1954
##  Mean   :   472.5   Mean   : 3.405   Mean   : 7.007   Mean   :1915
##  3rd Qu.:  1947.0   3rd Qu.: 7.000   3rd Qu.:15.000   3rd Qu.:1990
##  Max.   :  2018.0   Max.   :12.000   Max.   :31.000   Max.   :2018
##  NA's   :1          NA's   :182      NA's   :183      NA's   :5897
##    End Month         End Day
##  Min.   : 0.000   Min.   : 0.00
##  1st Qu.: 3.000   1st Qu.: 4.00
##  Median : 6.000   Median :15.00
##  Mean   : 6.202   Mean   :13.33
##  3rd Qu.: 9.000   3rd Qu.:21.00
##  Max.   :12.000   Max.   :31.00
##  NA's   :5899     NA's   :5900

Lots of missing values. Would have to impute some values to continue.
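One crude way to impute, sketched here on made-up rows, is to treat an unknown month or day (NA or 0) as 1 before building the date. The helper fill_unknown() and the toy tibble are hypothetical names introduced only for this illustration, and the choice of 1 is an assumption rather than a recommendation:

```r
library(dplyr)
library(lubridate)

# Hypothetical helper: treat a missing (NA) or zero component as 1
fill_unknown <- function(x) ifelse(is.na(x) | x == 0, 1, x)

# Made-up rows for illustration only; column names mimic the eruption data
toy <- tibble(
  `Start Year`  = c(1980, 1991, 2005),
  `Start Month` = c(5, 0, NA),
  `Start Day`   = c(18, 0, NA)
)

toy %>%
  mutate(start = make_date(`Start Year`,
                           fill_unknown(`Start Month`),
                           fill_unknown(`Start Day`))) -> toy_imputed

toy_imputed$start
```

Imputing the first of the month (or year) for both start and end dates will distort short durations, so any analysis of duration would need to keep track of which dates were imputed.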
Given the difficulties with older volcanic information in this data set, we might focus on the most recent and higher quality dates. The Global Volcanism Program points out that satellite coverage able to monitor volcanic gas emissions began in 1978. So we could focus on this range of dates.
library(lubridate)
confirmed_eruptions %>%
  filter(`Start Year` >= 1978) %>%
  mutate(start = ymd(paste(`Start Year`, `Start Month`, `Start Day`, sep = "-")),
         end = ymd(paste(`End Year`, `End Month`, `End Day`, sep = "-")),
         duration = end - start + 1
  ) -> duration_eruptions
## Warning: 100 failed to parse.
tcuts <- 365.25 * c(1/12, 1, 5, 10, 20) # time cuts for plot
ncuts <- length(tcuts)
cols <- colorRamp(c("green", "red") )((0:(ncuts-1))/(ncuts-1))
cols <- rgb(cols/255)
duration_eruptions %>%
  ggplot(aes(VEI, duration)) +
  geom_hline(yintercept = tcuts, lty = 1:ncuts, col = cols) +
  geom_point(col = "steelblue", alpha = 0.5)
## Warning: Removed 136 rows containing missing values (geom_point).
[Plot: duration in days (y-axis) versus VEI (x-axis), with horizontal lines at the time cuts]
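Subtracting two dates gives a duration in days, so the time cuts above are also in days: one month, then 1, 5, 10, and 20 years, using 365.25 days per year. A small sketch with made-up dates (not from the eruption data) makes the units explicit and converts a duration to years:

```r
# Time cuts in days, as defined above
tcuts <- 365.25 * c(1/12, 1, 5, 10, 20)
tcuts

# Made-up start and end dates for illustration only
start <- as.Date(c("1980-05-18", "1980-01-01"))
end   <- as.Date(c("1980-05-18", "1980-12-31"))

# Date subtraction yields a difftime in days; + 1 counts the start day itself
duration <- end - start + 1

# Convert to (approximate) years using the same 365.25-day year
duration_years <- as.numeric(duration, units = "days") / 365.25
duration_years
```

An eruption that starts and ends on the same day therefore has a duration of 1 day rather than 0.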
How about the average VEI per year since 1978?
confirmed_eruptions %>%
  filter(`Start Year` >= 1978) %>%
  group_by(`Start Year`) %>%
  summarise(VEIAve = mean(VEI, na.rm = TRUE)) %>%
  ggplot(aes(x = `Start Year`, y = VEIAve)) +
  ylim(0, 4) +
  geom_point(col = "steelblue", alpha = 0.5) +
  geom_smooth(method = "loess")
[Plot: VEIAve (y-axis) versus `Start Year` (x-axis), with loess smooth]
Comments? Note that by using Start Year we are only looking at new eruptions in this time period.
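To see the difference between new and ongoing eruptions, one could instead count, for each year, the eruptions whose start and end years bracket it. This is only a sketch on made-up years; the toy data frame and its column names are hypothetical, and eruptions with a missing end year are simply ignored in the count of active eruptions:

```r
# Made-up start/end years for illustration only (NA = end year not recorded)
toy <- data.frame(start = c(1978, 1979, 1979),
                  end   = c(1980, 1979, NA))

years <- 1978:1981

# New eruptions: those starting in each year
new_per_year <- sapply(years, function(y) sum(toy$start == y))

# Active eruptions: those with start <= year <= end (missing end years dropped)
active_per_year <- sapply(years, function(y) {
  sum(toy$start <= y & toy$end >= y, na.rm = TRUE)
})

data.frame(year = years, new = new_per_year, active = active_per_year)
```

In this toy example 1980 has no new eruptions but still has one active eruption, which is exactly the kind of activity a summary based on Start Year alone would miss.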
Number of new eruptions per year since 1978:
confirmed_eruptions %>%
  filter(`Start Year` >= 1978 ) %>%
  group_by(`Start Year`) %>%
  summarise(count = n()) %>%
  ggplot(aes(x = `Start Year`, y = count)) +
  ylim(0, 60) + xlim(1975, 2020) +
  geom_point(col = "steelblue", alpha = 0.5) +
  geom_smooth(method = "loess") +
  ggtitle("Number of new eruptions", subtitle = "World wide")
[Plot: "Number of new eruptions" (World wide): count (y-axis) versus `Start Year` (x-axis), with loess smooth]

Comments?
The rightmost point is based on only a few months of data. We should have filtered `Start Year` < 2018 as well.
Alternatively: Number of new eruptions per year since 1950 and before 2018:
confirmed_eruptions %>%
  filter(`Start Year` >= 1950 & `Start Year` < 2018) %>%
  group_by(`Start Year`) %>%
  summarise(count = n()) %>%
  ggplot(aes(x = `Start Year`, y = count)) +
  ylim(0, 60) + xlim(1950, 2020) +
  geom_point(col = "steelblue", alpha = 0.5) +
  geom_smooth(method = "loess") +
  ggtitle("Number of new eruptions", subtitle = "World wide")
[Plot: "Number of new eruptions" (World wide): count (y-axis) versus `Start Year` (x-axis), with loess smooth]

Looks like about 30-40 new eruptions per year.
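Rather than reading the typical count off the plot, the yearly counts could also be summarised numerically. The sketch below uses a tiny made-up tibble in place of confirmed_eruptions, so its numbers say nothing about real eruptions; only the pattern of calls is of interest:

```r
library(dplyr)

# Made-up start years for illustration only; in practice this would be
# confirmed_eruptions filtered to the years of interest
toy <- tibble(`Start Year` = c(1950, 1950, 1951, 1951, 1951, 1952))

toy %>%
  count(`Start Year`, name = "count") %>%   # one row per year
  summarise(mean_count   = mean(count),
            median_count = median(count),
            min_count    = min(count),
            max_count    = max(count)) -> yearly_summary

yearly_summary
```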
There are lots of other plots we might look at: duration of each eruption
confirmed_eruptions %>%
  filter(`End Year` >= 1978) %>%
  ggplot(aes(x = `Start Year`, y = `Eruption Number`)) +
  xlim(1970, 2020) +
  aes(xend = `End Year`, yend = `Eruption Number`) +
  geom_segment(col = "steelblue", alpha = 0.5) +
  geom_vline(xintercept = c(1978, 2018), col = "grey", alpha = 0.5) +
  ggtitle("Duration of eruptions", subtitle = "Ending in 1978 or later")
[Plot: "Duration of eruptions" (Ending in 1978 or later): one segment per eruption from `Start Year` to `End Year` (x-axis), positioned by `Eruption Number` (y-axis)]
There are lots of other plots we might look at: VEI and duration of each eruption
confirmed_eruptions %>%
  filter(`End Year` >= 1978) %>%
  mutate(jitter_VEI = jitter(VEI, factor = 2)) %>%
  ggplot(aes(x = `Start Year`, y = `jitter_VEI`)) +
  xlim(1970, 2020) +
  aes(xend = `End Year`, yend = `jitter_VEI`) +
  geom_segment(col = "steelblue", alpha = 0.5) +
  geom_vline(xintercept = c(1978, 2018), col = "grey", alpha = 0.5) +
  ylab("VEI (jittered)") +
  ggtitle("Duration of eruptions", subtitle = "Ending in 1978 or later")
[Plot: "Duration of eruptions" (Ending in 1978 or later): one segment per eruption from `Start Year` to `End Year` (x-axis), positioned by VEI (jittered) (y-axis)]
Interactive loon maps of all known locations:
# load the maps library before tidyverse, then access the map function
# from the maps package
world <- maps::map(fill=TRUE, col = "cornsilk", plot = FALSE)
library(loon)
confirmed_eruptions %>%
  group_by(`Volcano Number`) %>%
  summarize(longitude = first(Longitude),
            latitude = first(Latitude),
            count = n(),
            name = first(`Volcano Name`)) %>%
  select(longitude, latitude, count, name) %>%
  l_plot(linkingGroup = "Eruptions",
         glyph= "ocircle", color = "red", size = .$count,
         itemLabel = .$name, showItemLabels = TRUE,
         title = "Volcanos with confirmed eruptions",
         showGuides = TRUE,
         showLabels = FALSE) %T>%
  l_layer(world, color = "cornsilk", asSingleLayer = TRUE,
          label = "world map", index = "last") -> p
plot(p)
readr - importing “rectangular” data with read_csv()
The same Smithsonian “Global Volcanism Program” site at http://volcano.si.edu/search_eruption.cfm contains other .csv files on volcanic eruptions (downloaded at the same time).
# The events
events <- read_csv("GVP_Eruption_Events.csv")
# The volcanoes
(volcano <- read_csv("GVP_Volcano_List.csv"))
## # A tibble: 963 x 26
##    `Volcano Number` `Volcano Name` `Primary Volcan~ `Last Eruption ~
##               <int> <chr>          <chr>            <chr>
##  1           283001 Abu            Shield(s)        -6850
##  2           355096 Acamarachi     Stratovolcano    Unknown
##  3           342080 Acatenango     Stratovolcano(e~ 1972
##  4           213004 Acigol-Nevseh~ Caldera          -2080