USC "Entrepreneurship" -- Information Modeling - - PowerPoint PPT Presentation

usc entrepreneurship information modeling
SMART_READER_LITE
LIVE PREVIEW

USC "Entrepreneurship" -- Information Modeling - - PowerPoint PPT Presentation

USC "Entrepreneurship" -- Information Modeling Instructor: Peter Baumann email: p.baumann@jacobs-university.de tel: -3178 office: room 60, Research 1 USC Entrepreneurship / Information Modeling (P. Baumann) Roadmap ! Data


slide-1
SLIDE 1

USC Entrepreneurship / Information Modeling (P. Baumann)

USC "Entrepreneurship"

  • - Information Modeling

Instructor: Peter Baumann email: p.baumann@jacobs-university.de tel:

  • 3178
  • ffice:

room 60, Research 1

slide-2
SLIDE 2

2 USC Entrepreneurship / Information Modeling (P. Baumann)

Roadmap

! Data Modeling ! Relational Databases ! OLAP & Data Warehousing

slide-3
SLIDE 3

3 USC Entrepreneurship / Information Modeling (P. Baumann)

Data Modeling

! So we have databases for maintaining & searching data ! ...but these need some structure

  • Ex: in a letter, distinguish name of sender / name of recipient

! What do we need to express?

  • "Sales rep X (phone Y, email Z) is contact person for customer C"
  • Hm... Objects would be fine to express, plus relationships between objects;

and we need properties

! Entity-Relationship Model (ER-M) [Peter Chen, 1970s]

  • Entities, relationships, attributes
slide-4
SLIDE 4

4 USC Entrepreneurship / Information Modeling (P. Baumann)

[Mia Mae] 123-456-XY 'Mia Mae' 2 [Mia Mae] 123-456-XY 'Mia Mae' 2 [Mia Mae] 123-456-XY 'Mia Mae' 2 [Mia Mae] 265-352-XY 'Mia Mae' 2

Entity-Relationship Model: Basics

! Entity: Real-world object distinguishable from other objects

  • Simple attribute values

(strings, numbers) Employees ssn name lot

! Entity set: collection of similar entities

  • All entities in an entity set

have same set of attributes

  • unique, identifying key

[John Doe] 314-159-XY 'John Doe' 5

slide-5
SLIDE 5

5 USC Entrepreneurship / Information Modeling (P. Baumann)

ER Model Basics (Contd.)

! Relationship: association among two or more entities

  • Ex: "Attishoo works_in Pharmacy_department"

! Relationship Set: Collection of similar relationships

  • each relationship instance in R involves entities e1∈E1, ..., en∈En

lot name Employees ssn since Works_In dname budget did Departments subor- dinate Reports_To lot name Employees super- visor ssn

slide-6
SLIDE 6

6 USC Entrepreneurship / Information Modeling (P. Baumann)

! A ISA B: every A entity is also a B entity ("A inherits from B")

  • A entities have attributes like B entities have,

plus maybe more

  • A is called subclass, B superclass

! Purpose:

information reuse (tidy up schema)

ISA (`is a’) Hierarchies

Contract_Emps hourly_wages ISA Hourly_Emps contractid hours_worked name ssn Employees lot

slide-7
SLIDE 7

7 USC Entrepreneurship / Information Modeling (P. Baumann)

Summary: ER

! ER model popular for database design ! …but even better: UML (Unified Modeling Language, www.uml.org)

  • More powerful, more exact, standardized, many tools
slide-8
SLIDE 8

8 USC Entrepreneurship / Information Modeling (P. Baumann)

Roadmap

! Data Modeling ! Relational Databases ! OLAP & Data Warehousing

slide-9
SLIDE 9

9 USC Entrepreneurship / Information Modeling (P. Baumann)

Relational Databases

! database = set of tables ("relations") ! Ex: Students table:

Students sid name login gpa attribute tuple

slide-10
SLIDE 10

10 USC Entrepreneurship / Information Modeling (P. Baumann)

  • SQL: „structured [english] query language“ [Codd 1970]
  • Can retrieve, insert, update, delete data
  • Result of SQL query is table again
  • standardised by ISO

! Queries can be written intuitively,

DBMS responsible for efficient evaluation

  • key: precise (mathematical) semantics

Querying Relational Databases

slide-11
SLIDE 11

11 USC Entrepreneurship / Information Modeling (P. Baumann)

sid name login gpa

  • 53666 Jones jones@cs 3.4

53688 Smith smith@eecs 3.2 53650 Smith smith@math 3.8 Students:

SQL @ Work

! "names of all students

with GPA less than 3.6"

! In SQL:

name

  • Jones

Smith

SELECT name FROM Students WHERE gpa < 3.6

slide-12
SLIDE 12

12 USC Entrepreneurship / Information Modeling (P. Baumann)

Wrap-Up: Databases & SQL

! "relational databases" + SQL dominate market

  • more natural than earlier, procedural query languages
  • Although looking innocent, SQL has rigid mathematical semantics definition

! Don't worry – normally, IT specialists will phrase your queries

  • also can map ER diagrams to table schemas
slide-13
SLIDE 13

13 USC Entrepreneurship / Information Modeling (P. Baumann)

Roadmap

! Data Modeling ! Relational Databases ! OLAP & Data Warehousing

slide-14
SLIDE 14

14 USC Entrepreneurship / Information Modeling (P. Baumann)

The Big Picture: DBs in the Enterprise

! IT to help knowledge worker (executive, manager, analyst)

to make faster & better decisions

  • Sales volumes by region and product category for the last year?
  • How did the share price of computer manufacturers correlate with quarterly profits over

the past 10 years?

  • Which orders should we fill to maximize revenues?
  • Will a 10% discount increase sales volume sufficiently?
  • Which of two new medications will result in the best outcome:

higher recovery rate & shorter hospital stay?

! → Decision Support Systems (DSS)

  • Data Warehousing, OLAP elements of DSS
slide-15
SLIDE 15

15 USC Entrepreneurship / Information Modeling (P. Baumann)

! Management problem:

dozens to hundreds of disparate databases – how to keep overview?

! Approach: database which is…

  • separate from organization’s operational databases
  • Primarily used for organizational decision support

! Def: Data Warehouse =

[W.H. Inmon, 1994]

  • subject-oriented, integrated, time-varying, non-volatile

Data Warehouse

slide-16
SLIDE 16

16 USC Entrepreneurship / Information Modeling (P. Baumann)

Data Warehouse: Role & Position

Customers Etc… Vendors Etc… Orders

Data Warehouse Enterprise “Database” Transactions

Copied, Organized, summarized

Data Mining Data Miners:

  • “Farmers” – they know
  • “Explorers” - unpredictable
slide-17
SLIDE 17

17 USC Entrepreneurship / Information Modeling (P. Baumann)

Data Warehouse: IT Perspective

slide-18
SLIDE 18

18 USC Entrepreneurship / Information Modeling (P. Baumann)

OLAP

! OLAP = "Online Analytical Processing" ! Goal: support ad-hoc querying for business analyst

  • familiar with spreadsheets

! → extend spreadsheet analysis model to work with warehouse data

  • Large data sets
  • Semantically enriched to understand business terms (e.g., time, geography)
  • Combined with reporting features

! Multidimensional view of data

slide-19
SLIDE 19

19 USC Entrepreneurship / Information Modeling (P. Baumann)

Multidimensional Data Model

! Data cube

= set of facts (points) in n-D space

! Space spanned by measures

= set of dimension axes

  • aka coordinate system

! dimension attributes

at different granularity

  • Hierarchy: street > county > city
  • Lattice:

date > month > year, date > week > year

10 47 30 12

Sales volume as a function of date, city, product

NY LA SF

City

3/1 3/2 3/3 3/4

Date

Juice Cola Milk Cream

Product

slide-20
SLIDE 20

20 USC Entrepreneurship / Information Modeling (P. Baumann)

Operations in n-D Data Model

! Aggregation (roll-up)

  • dimension reduction: e.g., total sales by city
  • summarization over aggregate hierarchy:

e.g., total sales by city and year -> total sales by region and by year

! Navigation to detailed data (drill-down)

  • (sales - expense) by city
  • top 3% of cities by average income

! Selection (slice) defines a subcube

  • e.g., sales where city = Palo Alto and date = 1/15/96

! Visualization Operations

  • e.g., Pivot
slide-21
SLIDE 21

21 USC Entrepreneurship / Information Modeling (P. Baumann)

Aggregation

Data Warehousing, Decision Support & OLAP Readings in Database Systems, 3rd Edition Stonebraker & Hellerstein, eds.

slide-22
SLIDE 22

22 USC Entrepreneurship / Information Modeling (P. Baumann)

Visualization by Spreadsheet

slide-23
SLIDE 23

23 USC Entrepreneurship / Information Modeling (P. Baumann)

Visualization By Graphics

slide-24
SLIDE 24

24 USC Entrepreneurship / Information Modeling (P. Baumann)

Pivot Tables: Handling n-D Spreadsheets

slide-25
SLIDE 25

25 USC Entrepreneurship / Information Modeling (P. Baumann)

Data Warehousing / OLAP Market

slide-26
SLIDE 26

26 USC Entrepreneurship / Information Modeling (P. Baumann)

Wrap-Up: Biz Information Modeling

! Entity-Relationship Model, UML

  • Describe structure of a miniworld ("universe of discourse")
  • Easy mapping to...

! (Relational) databases

  • Tables, SQL

! Data Warehouse

  • heterogeneous sources → subject-focused agglomeration

! OLAP

  • Multi-dimensional cubes → Decision Support