Data Management Systems, Fall Semester 2020, Gustavo Alonso



SLIDE 1

Data Management Systems

Fall Semester 2020 Administrative Introduction

Administrative Introduction 1

Gustavo Alonso Institute of Computing Platforms Department of Computer Science ETH Zürich

SLIDE 2

About us

  • Prof. Gustavo Alonso
    • Data Management
    • Distributed Systems
    • Cloud and Data Center Architecture

  • Dario Korolija
    • Hardware Acceleration
  • Dimitris Koutsoukos
    • Data processing & ML
  • Michal Wawrzoniak
    • Systems & Serverless
  • Systems Group (systems.ethz.ch)
    • 5 faculty
    • 12 senior researchers
    • 32 PhD students
    • Covers a wide range of systems-related topics, including data processing and data management
    • Ample experience with database system design, including products and open source

SLIDE 3

About the course

  • Lectures (CAB G11)
    • Wednesdays 10:00 – 12:00
    • Fridays 08:00 – 09:00
  • Exercises (Zoom)
    • Fridays 09:00 – 10:00
  • Exam during the examination season (January)
    • Written (potentially on Moodle)
  • Web page: https://systems.ethz.ch/education/courses/2020-fall/data-management-systems.html
  • Moodle: https://moodle-app2.let.ethz.ch/course/view.php?id=13391
  • Stream/recording: https://video.ethz.ch/live/lectures.html

SLIDE 4

Course organization

  • Lectures will be held in the classroom, streamed live, and recorded
  • Exercise sessions will be held on Zoom and recorded
  • Occasional presentations/talks in HG E 1.2 (streamed and recorded)
  • Attendance in CAB G11:
    • Capacity is 95 seats
    • Enrollment exceeds capacity
    • We need to know who will be attending in person and who online
    • Please do not show up unannounced

SLIDE 5

Course Materials

  • Unfortunately, there is no textbook for what we are going to cover
    • Part of it can be found in standard database textbooks (next slide)
    • Part of it can be found online in manuals and product guides
    • Part of it is in articles and specialized books
  • Reading material and references will be provided in the slides and on the web page
  • The reading assignments, together with the slides and what is covered in the lectures, constitute the material that can be asked at the exam

SLIDE 6

References (database basics)

  • Check the Bachelor-level course (Data Modeling and Databases) as a general reference and for basic material:
    • http://www.ds3lab.com/dmdb-2020/
  • Database Systems: The Complete Book; Garcia-Molina, Ullman, Widom; Prentice Hall
  • Database System Concepts; Silberschatz, Korth, Sudarshan; McGraw-Hill

SLIDE 7

Exercises/Homework

  • Due to the safety measures in place and the possibility that access to ETH could be restricted at short notice, this semester the course will not have a project or practical component
  • We will focus instead on design and on the research literature
  • More emphasis on understanding the architectures and how everything fits together; more algorithms
  • To be fair and to avoid misunderstandings: the course will not necessarily be easier (reading and designing instead of programming)
  • We will regularly publish exercises and homework for you to test your knowledge and practice for the exam

SLIDE 8

Reading material

  • We will provide pointers to material that is either publicly accessible or accessible through the ETH library (you will need to access it from ETH's network)
  • Please read the material as we go; leaving everything for the end and trying to read it all just before the exam will not work
  • We are studying systems: there are many dependencies, with components that depend on other components and concepts. If you do not keep up with the course, within a couple of weeks it will become difficult to understand what we are talking about
  • Lectures will often refer back to previous lectures

SLIDE 9

Motivation for the course

  • Data has become a precious commodity
  • Data management has become a crucial component in IT
  • Data management concepts, and how to deal with large data collections in an efficient and effective manner, are fundamental knowledge that every computer scientist should have
  • The course will provide a broad perspective on data management systems, from traditional relational database engines to modern cloud data processing architectures

SLIDE 10

A brief history of data management

  • Tabulation and indexing of data have existed for many centuries
  • The modern era of data management started in the late 1960s and early 1970s with the relational model

Edgar F. Codd, "A Relational Model of Data for Large Shared Data Banks", CACM, Volume 13, Number 6, June 1970

SLIDE 11

Before the relational model

  • Initially tapes, which allowed only sequential access (what today we would call a scan)
  • Hard disks enabled random access, leading to new models (network and hierarchical)
  • Hierarchical: entities and relationships are organized as a tree. It is very inflexible, as it supports only one-to-many relationships
  • Network: entities and relationships are organized as a graph
  • Both were eventually replaced by the relational model in databases
  • The hierarchical model is still used in specialized systems and in some data representations (e.g., XML)
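To make the inflexibility of the hierarchical model concrete, here is a minimal toy sketch in Python. The student/course data and the names `Node`, `count_records`, and `courses_of` are hypothetical and do not model any real hierarchical DBMS. In a tree every record has exactly one parent, so a student enrolled in two courses must be stored twice; the relational version keeps each entity once and expresses the many-to-many relationship as a separate table that is joined at query time.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# Hierarchical model (toy): every record has exactly one parent, forming a tree.
@dataclass
class Node:
    name: str
    children: list[Node] = field(default_factory=list)


# Courses are parents of students (one-to-many). Alice takes two courses but
# cannot have two parents, so her record has to be duplicated.
db_tree = Node("root", [
    Node("Databases", [Node("Alice"), Node("Bob")]),
    Node("Networks", [Node("Alice")]),  # second copy of Alice's record
])


def count_records(node: Node, name: str) -> int:
    """Count how many copies of a record with the given name exist in the tree."""
    own = 1 if node.name == name else 0
    return own + sum(count_records(child, name) for child in node.children)


# Relational model (toy): entities and relationships are all tables (relations).
students = [(1, "Alice"), (2, "Bob")]
courses = [(10, "Databases"), (20, "Networks")]
enrolled = [(1, 10), (2, 10), (1, 20)]  # many-to-many as its own relation


def courses_of(student_name: str) -> list[str]:
    """Join students, enrolled, and courses to list one student's courses."""
    ids = {sid for sid, name in students if name == student_name}
    course_ids = {cid for sid, cid in enrolled if sid in ids}
    return sorted(title for cid, title in courses if cid in course_ids)


print(count_records(db_tree, "Alice"))  # 2
print(courses_of("Alice"))  # ['Databases', 'Networks']
```

The duplication in the tree is exactly what makes updates error-prone (changing Alice's data means finding every copy), whereas the relational layout stores each fact once.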

SLIDE 12

After relational model

  • A race to implement the relational model, which was clearly superior:
    • IBM
    • Oracle
    • Ingres (Postgres)
  • Many decades of effort optimizing, tuning, and extending relational database engines
  • In the 1980s, SQL becomes a standard, and databases start using the available networking to become distributed and parallel
  • In the 1990s: data warehouses, analytics, large-scale databases

SLIDE 13

In this century

  • 2000s: Internet era: stored procedures, cluster-scale databases, the start of MapReduce frameworks in the cloud, NoSQL systems to support Internet-scale workloads
  • 2010s: Back to basics: adding SQL and relational features to the systems of the 2000s (Google's Spanner, Spark, NoSQL), multi-core and main-memory databases, OLAP+OLTP systems; diversification: graph databases
  • 2020s: Hardware acceleration, cloud-native engines, very large scales, machine learning, …

SLIDE 14

History in perspective

  • Software systems have a very long life:
    • It is important to understand where they come from and what problem they were solving
    • They are adapted to new technologies and hardware as time goes on
  • We will cover not only the mechanisms but also the motivation behind the designs, the initial problem they addressed, and how they have changed as technology evolved
  • In computer science and IT, ideas tend to be recycled every X years instead of being forgotten …

SLIDE 15

What you need to know

  • Basic knowledge of databases
    • Data Modeling and Databases (bachelor course, D-INFK)
    • SQL
  • Basic knowledge of computer architecture and systems
    • Virtual memory
  • Basic programming
    • Data structures and algorithms
    • Systems programming

SLIDE 16

Objectives of the course

  • Cover all key aspects of data management systems:
    • Storage
    • Optimization
    • Architectures
    • Concurrency control
    • Algorithms and data structures
    • Modern approaches (data centers and cloud)
  • Provide a solid understanding of how systems work and of the design decisions behind the architectures
  • Provide the vocabulary and system understanding needed to engage in the design of data management systems

SLIDE 17

What you should be able to do at the end

  • Be familiar with the architecture of data management systems
  • Understand the trade-offs in the designs and what works when
  • Understand the workloads and applications
  • Understand the architectural differences between server-based and data center/cloud-based systems
  • Read product manuals and research papers describing the architecture of database systems with a solid understanding of what is being done and why
  • Be able to put data management in the context of large IT projects
