Venkatesh Vinayakarao (Vv)
Big Data and Hadoop
Venkatesh Vinayakarao
venkateshv@cmi.ac.in http://vvtesh.co.in Chennai Mathematical Institute
Data is the new oil. - Clive Humby, 2006.
What Comes Next?
Name        Size
Byte        8 bits
Kilobyte    1024 bytes
Megabyte    1024 kilobytes
Gigabyte    1024 megabytes
Terabyte    1024 gigabytes
Petabyte    1024 terabytes
Exabyte     1024 petabytes
Zettabyte   1024 exabytes
Yottabyte   1024 zettabytes
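Each row multiplies the previous unit by 1024. This is the binary convention: the IEC names kibibyte, mebibyte and so on denote exactly these values, while SI prefixes use 1000. A unit n steps above a byte is therefore 1024**n bytes. The following Python sketch converts a raw byte count into the largest fitting unit from the table. The function names unit_size and human_readable are hypothetical.

```python
# Unit names from the table, in increasing order; each step is a factor of 1024.
UNITS = ["bytes", "kilobytes", "megabytes", "gigabytes", "terabytes",
         "petabytes", "exabytes", "zettabytes", "yottabytes"]


def unit_size(name: str) -> int:
    """Return the number of bytes in one unit, e.g. 1 kilobyte = 1024 bytes."""
    return 1024 ** UNITS.index(name)


def human_readable(num_bytes: int) -> str:
    """Express num_bytes in the largest unit whose value is at least 1."""
    value = float(num_bytes)
    index = 0
    # Divide by 1024 until the value is below one step or no larger unit exists.
    while value >= 1024 and index < len(UNITS) - 1:
        value /= 1024
        index += 1
    return f"{value:.2f} {UNITS[index]}"


if __name__ == "__main__":
    print(unit_size("terabytes"))         # 1099511627776
    print(human_readable(3 * 1024 ** 5))  # 3.00 petabytes
```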
*Source: https://www.seagate.com/files/www-content/our-story/trends/files/idc-seagate-dataage-whitepaper.pdf
Challenges
Data Storage: STaaS (Storage as a Service)
Von Neumann Architecture
Data Processing: CPU performance, GPU performance, supercomputers
A variety of file systems exist: NTFS, FAT, DOS, CDFS, NFS, …
WORM (write once, read many) model:
Not designed for write-many (interactive) jobs.
Not designed for jobs requiring co-ordination.
Not designed for small files.
When not to use Hadoop?
No interactive jobs
No jobs requiring co-ordination
No small files
Hadoop Architecture
Map-reduce model: map, shuffle and sort, reduce
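As a rough illustration of the map-reduce model, the following single-process Python sketch counts words in three phases:

- a map step that emits (word, 1) pairs,
- a shuffle-and-sort step that groups values by key,
- a reduce step that combines each group.

It is a toy built on assumptions. Real Hadoop runs these phases across many machines through its own Java API, and every function name here is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit (word, 1) for every word in an input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_and_sort(pairs: Iterable[Tuple[str, int]]) -> List[Tuple[str, List[int]]]:
    """Shuffle and sort: group values by key, then order the groups by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key (here, by summing counts)."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # Map every input record, group the intermediate pairs, then reduce each group.
    mapped = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle_and_sort(mapped))
```

For example, word_count(["big data", "Big Hadoop"]) groups the two occurrences of "big" during shuffle and sort, so the reduce step reports a count of 2 for it.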
Summarization, top 10, counting, filtering
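Counting, top 10 and filtering can also be sketched in plain Python. In Hadoop each of these would run as a map-reduce job rather than in one process. The helper names below are hypothetical.

```python
import heapq
from collections import Counter
from typing import Callable, Iterable, List, Tuple


def count_records(keys: Iterable[str]) -> Counter:
    """Counting: number of occurrences of each key."""
    return Counter(keys)


def top_n(counts: Counter, n: int = 10) -> List[Tuple[str, int]]:
    """Top 10: the n keys with the largest counts, highest first."""
    # nlargest avoids a full sort when n is smaller than the number of keys.
    return heapq.nlargest(n, counts.items(), key=lambda item: item[1])


def filter_records(records: Iterable[str], keep: Callable[[str], bool]) -> List[str]:
    """Filtering: keep only the records that satisfy the predicate."""
    return [record for record in records if keep(record)]
```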
redis> GET nonexisting
(nil)
redis> SET mykey "Hello"
"OK"
redis> GET mykey
"Hello"
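The redis-cli session illustrates the key-value model. A value is stored and fetched by its key, and a missing key yields (nil). Below is a toy Python analogue built on a single in-memory dict. The class KeyValueStore is hypothetical and is not a Redis client.

```python
from typing import Dict, Optional


class KeyValueStore:
    """Toy in-memory key-value store mimicking Redis GET/SET semantics."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def set(self, key: str, value: str) -> str:
        # SET overwrites any existing value and acknowledges with "OK".
        self._data[key] = value
        return "OK"

    def get(self, key: str) -> Optional[str]:
        # GET on a missing key returns None, the analogue of Redis' (nil).
        return self._data.get(key)
```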
Types of NoSQL datastores
Limits of the relational model: schema-based (maintenance problems), impedance mismatch, scale-up challenges
CAP theorem
Key-value, document-based, columnar DB, graph DB
Interoperability: CORBA, RMI
Evolution of web and app servers
Web services with REST API
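Below is a minimal REST-style endpoint that uses only Python's standard-library http.server module. It assumes a hypothetical /items resource held in memory. Routing lives in a plain function, so it can be checked without starting a server.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Any, Tuple

# Hypothetical in-memory resource collection exposed at /items.
ITEMS = {"1": {"id": "1", "name": "hadoop"}}


def route(path: str) -> Tuple[int, Any]:
    """Map a GET path to (HTTP status, JSON-serializable body)."""
    parts = [p for p in path.split("/") if p]
    if parts == ["items"]:
        return 200, list(ITEMS.values())
    if len(parts) == 2 and parts[0] == "items" and parts[1] in ITEMS:
        return 200, ITEMS[parts[1]]
    return 404, {"error": "not found"}


class ItemHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        # Resolve the resource, then send it back as JSON with the matching status.
        status, body = route(self.path)
        payload = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


if __name__ == "__main__":
    # Serve on localhost port 8000 until interrupted.
    HTTPServer(("127.0.0.1", 8000), ItemHandler).serve_forever()
```

Running the file starts a server on port 8000. A GET to /items/1 returns that item as JSON, and unknown paths return 404.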
Rate Limiting
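The slide does not name a specific algorithm. One widely used choice is the token bucket, sketched here in Python for a single process. The class name, the capacity and refill_rate parameters, and the injectable clock are all assumptions made for illustration.

```python
import time
from typing import Callable


class TokenBucket:
    """Token-bucket rate limiter: bursts of up to `capacity` requests,
    sustained rate of `refill_rate` requests per second."""

    def __init__(self, capacity: float, refill_rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.clock = clock
        self.tokens = capacity  # start full so an initial burst is allowed
        self.last = clock()

    def allow(self) -> bool:
        """Consume one token if available; return whether the request may proceed."""
        now = self.clock()
        # Add tokens for the time elapsed since the last check, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A larger capacity tolerates bigger bursts, while refill_rate bounds the long-run request rate. Running across several servers would require keeping the bucket state in shared storage, which this sketch does not attempt.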