SLIDE 1

Cloud Computing for the Humanities

Graham Wilcock University of Helsinki

SLIDE 2

What is Cloud Computing?

“Run your app in the cloud”

Using somebody else’s computers

Computing resources on-demand

Like electricity, or pizza delivery

Platform-as-a-Service (PaaS)

Example: Google App Engine

2 Baltic HLT, Riga, 2010 Graham Wilcock

SLIDE 4

Google App Engine

“Run your web apps on Google’s infrastructure”

http://your-app-name.appspot.com

My web app is AELRED:

App Engine Language Resource Editions
First version: Jane Austen novels
http://aelred-austen.appspot.com

SLIDE 17

Key Ideas: Easy, Big, Free

Easy: use Python

NLTK Natural Language Toolkit
Django HTML Template Engine

Big: Google’s scalable infrastructure

BigTable non-relational datastore
MapReduce data-intensive processing

Free: App Engine has free quotas

Only pay if high demand for app

SLIDE 19

NLTK Natural Language Toolkit

Open source Python tools

Taggers, chunkers, parsers, classifiers ...

Many major corpora and resources

Brown Corpus, Penn Treebank, WordNet ...

Excellent free online textbook

Natural Language Processing with Python
Steven Bird, Ewan Klein, Edward Loper

SLIDE 20

NLTK and App Engine

App Engine code must be pure Python
Normal “import nltk” does not work

Some NLTK code is not pure Python
E.g. uses NumPy with C for speed

Use “import aelred” instead

Aelred code is pure Python
Other customization, e.g. tokenization
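
The pure-Python constraint can be illustrated with a tokenizer that depends only on the standard library. This is a minimal sketch, not the actual aelred code: the function name `tokenize` and the regular expression are assumptions made for illustration.

```python
import re

# Hypothetical pattern (an assumption, not taken from aelred):
# a word with optional internal apostrophes or hyphens, or one punctuation mark.
_TOKEN_RE = re.compile(r"\w+(?:['’-]\w+)*|[^\w\s]")


def tokenize(text):
    """Split text into word and punctuation tokens using only the standard library."""
    return _TOKEN_RE.findall(text)


tokens = tokenize("Mr. Darcy's pride")
```

Because the sketch imports nothing outside the standard library, it has no C extension dependencies of the kind that prevent the normal NLTK import from working on App Engine.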

SLIDE 22

Django Web App Framework

Open source Python

Model-View-Controller design pattern
Models defined easily by Python classes

HTML Template Engine

Web pages generated using contexts
Excellent “template inheritance” facility

Free online textbook

Django: The Book

SLIDE 23

Google BigTable Datastore

Non-relational database

Different thinking from SQL databases
Designed for massive scalability

My current way of using the datastore:

Serialize complex objects to YAML
Store/retrieve YAML as big text strings
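
The store-as-YAML-text approach can be sketched as follows. This is a rough illustration under stated assumptions: a plain dictionary (`fake_datastore`) stands in for the App Engine datastore, and the helper names `store` and `retrieve` and the sample `chapter` object are hypothetical. The YAML calls are PyYAML's `safe_dump` and `safe_load`. If PyYAML is not installed, the sketch falls back to the standard `json` module, on the grounds that JSON text is, for practical purposes, also valid YAML.

```python
try:
    import yaml  # PyYAML, a third-party package

    def to_text(obj):
        return yaml.safe_dump(obj, allow_unicode=True)

    def from_text(text):
        return yaml.safe_load(text)

except ImportError:
    # Fallback (an assumption made to keep the sketch runnable without PyYAML).
    import json

    def to_text(obj):
        return json.dumps(obj)

    def from_text(text):
        return json.loads(text)


# Hypothetical stand-in for the datastore: each key maps to one big text string.
fake_datastore = {}


def store(key, obj):
    """Serialize a nested object and save it as a single text value."""
    fake_datastore[key] = to_text(obj)


def retrieve(key):
    """Load the text value and rebuild the nested object."""
    return from_text(fake_datastore[key])


chapter = {
    "novel": "Emma",
    "chapter": 1,
    "sentences": [["Emma", "Woodhouse"], ["handsome", ",", "clever"]],
}
store("emma-1", chapter)
restored = retrieve("emma-1")
```

The datastore only ever sees one text property per entity, so the nested structure of the object does not need to be modelled in the datastore itself.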

SLIDE 24

MapReduce Algorithms

Data-intensive distributed processing

Different thinking from usual algorithms
Designed for massive scalability

My current way of using MapReduce:

Iterate over all entities in datastore
Delete entity, or update and save
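
The iterate-then-delete-or-update pattern can be sketched in plain Python. This is not the App Engine MapReduce API: the dictionary `entities` stands in for the datastore, and `mapper` and `run_over_all` are hypothetical names. It covers only the map side, where each entity is handled independently of the others, and has no reduce step.

```python
def mapper(entity):
    """Decide what to do with one entity: delete it, or update it and save it."""
    cleaned = entity["text"].strip()
    if not cleaned:
        return ("delete", entity)
    return ("put", dict(entity, text=cleaned))


def run_over_all(datastore, map_fn):
    """Apply map_fn to every entity in the stand-in datastore."""
    for key in list(datastore):  # copy the keys so deletion is safe while looping
        action, entity = map_fn(datastore[key])
        if action == "delete":
            del datastore[key]
        else:
            datastore[key] = entity


entities = {
    "s1": {"text": "  It is a truth universally acknowledged  "},
    "s2": {"text": "   "},
}
run_over_all(entities, mapper)
```

Since the mapper looks at one entity at a time and shares no state between calls, a real framework could split the entities across many machines, which is what makes the approach scale.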
