Big Data and Hadoop - Venkatesh Vinayakarao



SLIDE 1

Big Data and Hadoop

Venkatesh Vinayakarao

venkateshv@cmi.ac.in http://vvtesh.co.in Chennai Mathematical Institute

Data is the new oil. - Clive Humby, 2006.

SLIDE 2

What Comes Next?

byte kilobyte megabyte gigabyte ?? ??? ???? ?????

SLIDE 3

Sizes

Name        Size
Byte        8 bits
Kilobyte    1024 bytes
Megabyte    1024 kilobytes
Gigabyte    1024 megabytes
Terabyte    1024 gigabytes
Petabyte    1024 terabytes
Exabyte     1024 petabytes
Zettabyte   1024 exabytes
Yottabyte   1024 zettabytes
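The table uses binary multiples, so each unit is 1024 times the one before it. A minimal Python sketch of that arithmetic follows; the names UNITS and bytes_in are illustrative and do not come from the slides.

```python
# Unit names in the order given on the slide; each is 1024x the previous one.
UNITS = ["byte", "kilobyte", "megabyte", "gigabyte", "terabyte",
         "petabyte", "exabyte", "zettabyte", "yottabyte"]


def bytes_in(unit: str) -> int:
    """Return the number of bytes in one `unit`, using 1024-based steps."""
    return 1024 ** UNITS.index(unit.lower())


if __name__ == "__main__":
    for name in UNITS:
        print(f"1 {name} = {bytes_in(name):,} bytes")
```

Because every step is a factor of 1024 = 2^10, a yottabyte in this convention is 2^80 bytes.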

SLIDE 4

Data Growth


Mankind’s quest to digitize the world!

Size of the global datasphere*: 33 ZB (2018) → 175 ZB (2025)

*Source: https://www.seagate.com/files/www-content/our-story/trends/files/idc-seagate-dataage-whitepaper.pdf
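To get a feel for the two datasphere figures, one can compute the yearly growth rate they imply. The sketch below assumes smooth compound growth between 2018 and 2025, which is a simplification and not a claim made by the cited whitepaper; the function name is illustrative.

```python
def annual_growth_rate(start: float, end: float, years: int) -> float:
    """Compound annual growth rate that takes `start` to `end` in `years` years."""
    return (end / start) ** (1 / years) - 1


# 33 ZB in 2018 and 175 ZB in 2025, as quoted on the slide.
rate = annual_growth_rate(33, 175, 2025 - 2018)
print(f"Implied growth: {rate:.1%} per year")
```

Under that assumption the datasphere grows by about 27% per year.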

SLIDE 5

Evolution of Data and Computers

Data Storage

STaaS

Challenges

Von Neumann Architecture

SLIDE 6

Recap

Data Storage

STaaS

Data Processing

CPU Performance

GPU Performance

Supercomputers

SLIDE 7

Cloud Computing

So, we have the cloud. But how do we store and retrieve data? How do we process jobs?

SLIDE 8

Role of File Systems


File systems are key to handling data.

A variety of file systems exist: NTFS, FAT, DOS, CDFS, NFS, …

SLIDE 9

Distributed Systems

WORM (write once, read many) model.

Not designed for write-many (interactive) jobs.

Not designed for co-ordination jobs.

Not designed for small files.

SLIDE 10

Hadoop and Map Reduce

When not to use Hadoop?

  • No Interactive Jobs
  • No Jobs Requiring Co-ordination
  • No Small Files

Map-reduce Model

  • Map
  • Shuffle and Sort
  • Reduce

Hadoop Architecture
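The three stages of the map-reduce model (map, shuffle and sort, reduce) can be illustrated with a word count. This is a single-process Python sketch of the idea only; it does not use Hadoop's API, and all function names are illustrative.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_and_sort(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle and sort: group values by key and order the keys."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Run the three stages in sequence over the input lines."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_and_sort(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "big hadoop"]))
```

In a real cluster the map and reduce calls run on different machines and the shuffle moves data across the network; here everything runs in one process to keep the data flow visible.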

SLIDE 11

Map-Reduce Patterns

Summarization

Top 10

Counting

Filtering
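Each of the four patterns can be shown on a small in-memory dataset. The sketch below is plain Python, not Hadoop code; the records list and the function names are hypothetical and chosen only for illustration.

```python
import heapq
from collections import Counter

# Hypothetical (user, value) records standing in for a large input file.
records = [("alice", 30), ("bob", 12), ("alice", 5), ("carol", 48), ("bob", 7)]


def filtering(rows, min_value):
    """Filtering: keep only the rows whose value passes a predicate."""
    return [row for row in rows if row[1] >= min_value]


def counting(rows):
    """Counting: number of rows seen per key."""
    return Counter(user for user, _ in rows)


def summarization(rows):
    """Summarization: one aggregate (here, a sum) per key."""
    totals = {}
    for user, value in rows:
        totals[user] = totals.get(user, 0) + value
    return totals


def top_n(rows, n=10):
    """Top N: the n rows with the largest values."""
    return heapq.nlargest(n, rows, key=lambda row: row[1])


if __name__ == "__main__":
    print(filtering(records, 10))
    print(counting(records))
    print(summarization(records))
    print(top_n(records, 2))
```

In a map-reduce job, filtering typically needs only a map step, while counting, summarization, and top N also need a reduce step to combine per-key or per-partition results.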

SLIDE 12

NoSQL

Schema-based Relational Model - maintenance problems

Impedance Mismatch

Scale-up Challenges

CAP Theorem

Types of NoSQL datastores

  • Key-Value
  • Doc-based
  • Columnar DB
  • Graph DB

Key-value example (Redis):

    redis> GET nonexisting
    (nil)
    redis> SET mykey "Hello"
    "OK"
    redis> GET mykey
    "Hello"
    redis>
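The Redis session on this slide shows the core contract of a key-value store: a lookup of a missing key returns nothing, a write returns an acknowledgement, and a later lookup returns the stored value. The toy class below mimics that behaviour with a Python dict. It is not the Redis client API, and the class name KeyValueStore is hypothetical.

```python
from typing import Optional


class KeyValueStore:
    """In-memory toy store with GET/SET behaviour like the session on the slide."""

    def __init__(self) -> None:
        self._data: dict[str, str] = {}

    def set(self, key: str, value: str) -> str:
        """Store `value` under `key` and acknowledge the write."""
        self._data[key] = value
        return "OK"

    def get(self, key: str) -> Optional[str]:
        """Return the value for `key`, or None (Redis prints nil) if absent."""
        return self._data.get(key)


if __name__ == "__main__":
    store = KeyValueStore()
    print(store.get("nonexisting"))
    print(store.set("mykey", "Hello"))
    print(store.get("mykey"))
```

A real key-value datastore adds persistence, replication, and network access on top of this interface; the sketch only shows the access pattern.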

SLIDE 13

Web Services


Interoperability

CORBA

RMI

Evolution of Web and App Servers

Web Services with REST API

OAuth

Rate Limiting
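Rate limiting caps how many requests a client may make in a given time. The slide does not name an algorithm, so the sketch below picks the token bucket as one common choice; the class name, parameters, and injectable clock are assumptions made for illustration.

```python
import time
from typing import Callable


class TokenBucket:
    """Token-bucket limiter: `rate` tokens refill per second, up to `capacity`."""

    def __init__(self, rate: float, capacity: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock              # injectable so the limiter can be tested
        self.tokens = float(capacity)   # start with a full bucket
        self.last = clock()

    def allow(self) -> bool:
        """Return True and spend one token if the request may proceed."""
        now = self.clock()
        # Refill in proportion to the elapsed time, never above capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


if __name__ == "__main__":
    bucket = TokenBucket(rate=1, capacity=2)
    print([bucket.allow() for _ in range(3)])
```

A web service would usually keep one bucket per API key or client address and answer rejected requests with HTTP status 429 (Too Many Requests).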

SLIDE 14

Building Web Services


SLIDE 15

Thank You

Please remember to give elaborate course feedback. I take my course feedback seriously to improve teaching quality including but not limited to the content, presentation materials, and delivery.
