Apache Tika: What's new with 2.0?

SLIDE 1

Apache Tika What’s new with 2.0?

SLIDE 2

Nick Burch, CTO, Quanticate

SLIDE 3

Tika, in a nutshell

“small, yellow and leech-like, and probably the oddest thing in the Universe”

  • Like a Babel Fish for content!
  • Helps you work out what sort of thing your content (1s & 0s) is
  • Helps you extract the metadata from it, in a consistent way
  • Lets you get a plain text version of your content, eg for full text indexing
  • Provides a rich (XHTML) version too

SLIDE 4

Tika in the news

  • Panama Papers – Tika used to extract content from most of the files before indexing in Apache Solr
    https://source.opennews.org/en-US/articles/people-and-tech-behind-panama-papers/
  • MEMEX – DARPA funded project
    https://nakedsecurity.sophos.com/2015/02/16/memex-darpas-search-engine-for-the-dark-web/
  • http://openpreservation.org/blog/2016/10/04/apache-tikas-regression-corpus-tika-1302/

SLIDE 5

Tika at ApacheCon

  • Tim Allison, tomorrow (Thursday), 2.40pm
    Evaluating Text Extraction: Apache Tika's™ New Tika-Eval Module
  • Also related: David North (same time...)
    Apache POI: The Challenges and Rewards of a 15 Year Old Codebase
  • Several committers are around, come find us!
SLIDE 6

A bit of history

SLIDE 7

Before Tika

  • In the early 2000s, everyone was building a search engine / search system for their CMS / web spider / etc
  • Lucene mailing list and wiki had lots of code snippets for using libraries to extract text
  • Lots of bugs, people using old versions, people missing out on useful formats, confusion abounded
  • Handful of commercial libraries, generally expensive and aimed at large companies and/or computer forensics
  • Everyone was re-inventing the wheel, and doing it badly...
SLIDE 8

Tika's History (in brief)

  • The idea for Tika first came from the Apache Nutch project, who wanted to get useful things out of all the content they were spidering and indexing
  • The Apache Lucene project (which Nutch used) were also interested, as lots of people there had the same problems
  • Ideas and discussions started in 2006
  • Project founded in 2007, in the Apache Incubator
  • Initial contributions from Nutch, Lucene and Lius
  • Graduated in 2008, v1.0 in 2011
SLIDE 9

Tika Releases

[Chart: timeline of releases 0.1 to 0.10 and 1.0 to 1.14, axis running from 01/07 to 12/17]

SLIDE 10

A (brief) introduction to Tika

SLIDE 11

(Some) Supported Formats

  • HTML, XHTML, XML
  • Microsoft Office – Word, Excel, PowerPoint, Works, Publisher, Visio – Binary and OOXML formats
  • OpenDocument (OpenOffice)
  • iWorks – Keynote, Pages, Numbers
  • PDF, RTF, Plain Text, CHM Help
  • Compression / Archive – Zip, Tar, Ar, 7z, bz2, gz etc
  • Atom, RSS, ePub
  • Lots of Scientific formats
  • Audio – MP3, MP4, Vorbis, Opus, Speex, MIDI, Wav
  • Image – JPEG, TIFF, PNG, BMP, GIF, ICO
SLIDE 12

Detection

  • Work out what kind of file something is
  • Based on a mixture of things
  • Filename
  • Mime magic (first few hundred bytes)
  • Dedicated code (eg containers)
  • Some combination of all of these
  • Can be used standalone – what is this thing?
  • Can be combined with parsers – figure out what this is, then find a parser to work on it
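To make the “mixture of things” concrete, here is a toy Java sketch that checks a few magic byte signatures first and only falls back to the filename extension when none of them match. It is not Tika's detector: the class name ToyDetector, the three-entry magic table and the “magic beats filename” rule are all assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Toy detector (hypothetical): leading "magic" bytes first, then filename extension, else generic binary. */
final class ToyDetector {
    // A few well-known leading byte signatures, checked in order.
    private static final List<Map.Entry<byte[], String>> MAGIC = List.of(
            Map.entry("%PDF-".getBytes(StandardCharsets.US_ASCII), "application/pdf"),
            Map.entry(new byte[] {'P', 'K', 3, 4}, "application/zip"),
            Map.entry(new byte[] {(byte) 0x89, 'P', 'N', 'G'}, "image/png"));

    // Filename hints, only consulted when no magic pattern matches.
    private static final Map<String, String> EXTENSIONS = Map.of(
            "pdf", "application/pdf",
            "txt", "text/plain",
            "html", "text/html");

    static String detect(String filename, byte[] head) {
        for (Map.Entry<byte[], String> magic : MAGIC) {
            byte[] sig = magic.getKey();
            if (head.length >= sig.length && Arrays.equals(Arrays.copyOf(head, sig.length), sig)) {
                return magic.getValue();
            }
        }
        if (filename != null) {
            int dot = filename.lastIndexOf('.');
            if (dot >= 0) {
                String type = EXTENSIONS.get(filename.substring(dot + 1).toLowerCase(Locale.ROOT));
                if (type != null) return type;
            }
        }
        return "application/octet-stream"; // nothing matched: generic binary
    }

    public static void main(String[] args) {
        byte[] pdfHead = "%PDF-1.7".getBytes(StandardCharsets.US_ASCII);
        System.out.println(detect("misnamed.bin", pdfHead));
    }
}
```

A .docx starts with the same PK zip header, so this sketch can only ever call it application/zip. That is exactly the case the dedicated container detection code exists for.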

SLIDE 13

Metadata

  • Describes a file
  • eg Title, Author, Creation Date, Location
  • Tika provides a way to extract this (where present)
  • However, each file format tends to have its own kind of metadata, which can vary a lot
  • eg Author, Creator, Created By, First Author, Creator[0]
  • Tika tries to map file format specific metadata onto common, consistent metadata keys
  • “Give me the thing that most closely represents what Dublin Core defines as Creator”
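A minimal sketch of the “closest thing to Creator” idea, assuming a hypothetical, hand-written alias list in preference order. Tika's real mappings are done inside each parser onto its own metadata keys, so treat CreatorMapper and its key list as illustrative only.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Toy normaliser (hypothetical): find the value closest to Dublin Core "creator" among format-specific keys. */
final class CreatorMapper {
    // Hypothetical alias list, most preferred first; not Tika's real mapping tables.
    private static final List<String> CREATOR_KEYS =
            List.of("dc:creator", "Author", "Creator", "Created By", "First Author", "Creator[0]");

    static Optional<String> closestCreator(Map<String, String> formatSpecific) {
        for (String key : CREATOR_KEYS) {
            String value = formatSpecific.get(key);
            if (value != null && !value.isBlank()) {
                return Optional.of(value.trim());
            }
        }
        return Optional.empty(); // nothing creator-like in this file
    }

    public static void main(String[] args) {
        Map<String, String> raw = new LinkedHashMap<>();
        raw.put("Created By", "Jane Doe"); // as some format might label it
        System.out.println("dc:creator -> " + closestCreator(raw).orElse("(none)"));
    }
}
```

Callers then only ever ask for one key, whichever format or library the value came from.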

SLIDE 14

Plain Text

  • Most file formats include at least some text
  • For a plain text file, that's everything in it!
  • For others, it's only part
  • Lots of libraries out there which can extract text, but how you call them varies a lot
  • Tika wraps all that up for you, and gives consistency
  • Plain Text is ideal for things like Full Text Indexing, eg to feed into Solr, Lucene or Elasticsearch

SLIDE 15

XHTML

  • Structured Text extraction
  • Outputs SAX events for the tags and text of a file
  • This is actually the Tika default; Plain Text is implemented by only catching the text parts of the SAX output
  • Isn't supposed to be the “exact representation”
  • Aims to give meaningful, semantic but simple output
  • Can be used for basic previews
  • Can be used to filter, eg ignore header + footer then give remainder as plain text
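The “plain text is just the text parts of the SAX output” point can be shown with nothing but the JDK's SAX parser. This sketch keeps only characters() events, and can optionally skip blocks marked as header or footer. The class="header" / class="footer" convention and the PlainTextHandler name are assumptions for the example; in Tika itself this job is done by content handlers such as BodyContentHandler.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Keeps only character data from SAX events, optionally skipping header/footer blocks (hypothetical markup). */
final class PlainTextHandler extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private final boolean skipHeaderFooter;
    private int skipDepth = 0; // > 0 while inside a skipped element

    PlainTextHandler(boolean skipHeaderFooter) {
        this.skipHeaderFooter = skipHeaderFooter;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        String cls = atts.getValue("class");
        boolean skippable = "header".equals(cls) || "footer".equals(cls);
        if (skipDepth > 0 || (skipHeaderFooter && skippable)) {
            skipDepth++; // entering, or nested inside, a skipped block
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (skipDepth > 0) skipDepth--;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (skipDepth == 0) text.append(ch, start, length); // tags are ignored, text is kept
    }

    static String toPlainText(String xhtml, boolean skipHeaderFooter) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            PlainTextHandler handler = new PlainTextHandler(skipHeaderFooter);
            factory.newSAXParser().parse(new InputSource(new StringReader(xhtml)), handler);
            return handler.text.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XHTML", e);
        }
    }
}
```

The same XHTML stream serves both uses: with skipping off you get everything, with it on you get the body text only.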

SLIDE 16

Tika “Architecture”, in brief

  • Hide complexity
  • Hide differences
  • Identify, pick and use the “best” libraries and tools
  • Work with all the upstreams for you
  • Come “Batteries Included” where possible / not too big, “Batteries Nearby” otherwise
  • Try to avoid surprises
  • Support JVM + Non-JVM users as equals
  • Work to fix any of the above that we happen to miss!
SLIDE 17

What's New?

SLIDE 18

Formats and Parsers

SLIDE 19

Supported Formats

  • HTML
  • XML
  • Microsoft Office
  • Word
  • PowerPoint
  • Excel (2, 3, 4, 5, 97+)
  • Visio
  • Outlook
  • Pre-OOXML XML formats, Lock Files etc!
SLIDE 20

Supported Formats

  • Open Document Format (ODF)
  • iWorks, Word Perfect
  • PDF, RTF
  • ePUB
  • Fonts + Font Metrics
  • Tar, RAR, AR, CPIO, Zip, 7Zip, Gzip, BZip2, XZ and Pack200
  • Plain Text
  • RSS and Atom
SLIDE 21

Supported Formats

  • IPTC ANPA Newswire
  • CHM Help
  • Wav, MIDI
  • MP3, MP4 Audio
  • Ogg Vorbis, Speex, FLAC, Opus, Theora
  • PNG, JPG, JP2, JPX, BMP, TIFF, BPG, ICNS, PSD, PPM, WebP
  • FLV, MP4 Video – Metadata and video histograms
  • Java classes
SLIDE 22

Supported Formats

  • Source Code
  • Mbox, RFC822, Outlook PST, Outlook MSG, TNEF
  • DWG CAD
  • DIF, GDAL, ISO-19139, Grib, HDF, ISA-Tab, NetCDF, Matlab
  • Executables (Windows, Linux, Mac)
  • Pkcs7, Time Stamp Data Envelope TSD
  • SQLite, dBase DBF
  • Microsoft Access
SLIDE 23

OCR

SLIDE 24

OCR

  • What if you don't have a text file, but instead a photo of some text? Or a scan of some text?
  • OCR (Optical Character Recognition) to the rescue!
  • Tesseract is an Open Source OCR tool
  • Tika has a parser which can use Tesseract for found images
  • Tesseract is detected, and used if found on your path
  • Explicit path can be given, or can be disabled
  • TODO: Better combining of OCR + normal, or eg PDF only
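Finding Tesseract “on your path” boils down to scanning the directories in PATH for an executable. The sketch below does that with plain JDK calls; it illustrates the idea and is not the code Tika's Tesseract parser actually uses, and the ExecutableFinder name is hypothetical. Passing a configured directory instead of the PATH variable covers the “explicit path” case.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;
import java.util.Optional;

/** Sketch (hypothetical helper): search a PATH-style list of directories for an executable. */
final class ExecutableFinder {
    static Optional<Path> find(String name, String searchPath) {
        if (searchPath == null || searchPath.isEmpty()) return Optional.empty();
        boolean windows = System.getProperty("os.name", "")
                .toLowerCase(Locale.ROOT).startsWith("windows");
        String[] candidates = windows ? new String[] {name + ".exe", name} : new String[] {name};
        for (String dir : searchPath.split(File.pathSeparator)) {
            if (dir.isEmpty()) continue;
            for (String candidate : candidates) {
                try {
                    Path p = Paths.get(dir, candidate);
                    if (Files.isRegularFile(p) && Files.isExecutable(p)) return Optional.of(p);
                } catch (InvalidPathException ignored) {
                    // malformed PATH entry: skip it
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Optional<Path> tesseract = find("tesseract", System.getenv("PATH"));
        System.out.println(tesseract.map(p -> "OCR enabled, using " + p)
                .orElse("tesseract not found, OCR disabled"));
    }
}
```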
SLIDE 25

Container Formats

SLIDE 26

Databases

SLIDE 27

Databases

  • A surprising number of Database and “database” systems have a single-file mode
  • If there's a single file, and a suitable library or program, then Tika can get the data out!
  • Main ones so far are MS Access & SQLite
  • Panama Papers dump may inspire some more!
  • How best to represent the contents in XHTML?
  • One HTML table per database table is the best we have, so far...
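A sketch of the “one HTML table per database table” representation, assuming the rows have already been read (for example via JDBC) into lists of strings. TableToXhtml is a hypothetical name, and the markup choices (a caption for the table name, empty cells for NULLs) are assumptions rather than necessarily what Tika emits.

```java
import java.util.List;

/** Sketch (hypothetical): render one database table as one XHTML table, escaping cell text. */
final class TableToXhtml {
    static String escape(String s) {
        // '&' must be replaced first so the other entities are not double-escaped
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;").replace("\"", "&quot;");
    }

    static String render(String tableName, List<String> columns, List<List<String>> rows) {
        StringBuilder out = new StringBuilder("<table><caption>")
                .append(escape(tableName)).append("</caption>");
        out.append("<thead><tr>");
        for (String column : columns) out.append("<th>").append(escape(column)).append("</th>");
        out.append("</tr></thead><tbody>");
        for (List<String> row : rows) {
            out.append("<tr>");
            for (String cell : row) {
                // SQL NULL becomes an empty cell
                out.append("<td>").append(cell == null ? "" : escape(cell)).append("</td>");
            }
            out.append("</tr>");
        }
        return out.append("</tbody></table>").toString();
    }
}
```

Simple, and it streams well row by row, but it loses things like column types and relationships between tables, which is why it is only “the best we have, so far”.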

SLIDE 28

Tika Config XML

SLIDE 29

Tika Config XML

  • Using Config, you can specify what to use for: Parsers, Detectors, Translator, Service Loader + Warnings / Errors, Encoding Detectors, Mime Types
  • You can do it explicitly
  • You can do it implicitly (with defaults)
  • You can do “default except”
  • Tools available to dump out a running config as XML
  • Use the Tika App to see what you have + save it
SLIDE 30

Tika Config XML example

<?xml version="1.0" encoding="UTF-8"?>
<properties>
  <parsers>
    <!-- "Default except": all the default parsers, minus JPEG, PDF and the executable parser -->
    <parser class="org.apache.tika.parser.DefaultParser">
      <mime-exclude>image/jpeg</mime-exclude>
      <mime-exclude>application/pdf</mime-exclude>
      <parser-exclude class="org.apache.tika.parser.executable.ExecutableParser"/>
    </parser>
    <!-- Send PDFs to the EmptyParser, so they return no content -->
    <parser class="org.apache.tika.parser.EmptyParser">
      <mime>application/pdf</mime>
    </parser>
  </parsers>
</properties>

SLIDE 31

Embedded Resources

SLIDE 32

Tika App

SLIDE 33

Tika Server

SLIDE 34

OSGi

SLIDE 35

Tika Batch

SLIDE 36

Tika Batch

  • Easy way to run Tika against a very large number of documents, for testing and for bulk ingestion
  • Multi-threaded, but not yet Hadoop enabled, see https://wiki.apache.org/tika/TikaInHadoop for more on that
  • Output Text or XHTML, metadata, and optionally embedded documents
  • Records failures too, so you know where things go wrong
  • Sets up parent/child processes to robustly handle permanent hangs/OOMs
  • Optionally restart child every x mins to mitigate memory leaks.
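The parent/child idea can be sketched with java.lang.ProcessBuilder: the parent starts a child JVM, waits with a timeout, and kills the child if it hangs, so a stuck or crashed child never takes the parent down with it. ChildWatchdog is a hypothetical minimal version, not Tika Batch's actual code; a real setup would also tell the restarted child which file to skip or resume from.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

/** Sketch (hypothetical) of the parent side: the parent survives if the child hangs or dies. */
final class ChildWatchdog {
    static final int TIMED_OUT = -1;

    /** Runs the child command; returns its exit code, or TIMED_OUT if it had to be killed. */
    static int runChild(List<String> command, long timeout, TimeUnit unit) {
        try {
            Process child = new ProcessBuilder(command).inheritIO().start();
            if (!child.waitFor(timeout, unit)) {
                child.destroyForcibly(); // permanent hang: kill the child, not the parent
                child.waitFor();
                return TIMED_OUT;
            }
            // A non-zero exit (eg after an uncaught OutOfMemoryError) is reported, not fatal to the parent.
            return child.exitValue();
        } catch (IOException e) {
            throw new IllegalStateException("could not start child: " + command, e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for child", e);
        }
    }

    /** Builds a command that runs another JVM with the same java binary as this one. */
    static List<String> javaCommand(String... args) {
        List<String> cmd = new ArrayList<>();
        cmd.add(System.getProperty("java.home") + File.separator + "bin" + File.separator + "java");
        cmd.addAll(List.of(args));
        return cmd;
    }
}
```

The “restart every x mins” option fits the same shape: the parent passes a time limit, and the child exits cleanly when it reaches it so that a fresh JVM can take over.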
SLIDE 37

Tika Batch

  • Runs local directory to local directory, system agnostic
  • Output can be then imported into other systems
  • For ingesting, record common failures, import from directory
  • Or... For testing, import into Tika Eval
  • java -jar tika-app.jar -i <input_directory> -o <output_directory>
  • https://wiki.apache.org/tika/TikaBatchUsage
SLIDE 38

Named Entity Recognition

SLIDE 39

Grobid – Scientific Papers

SLIDE 40

Grobid – Scientific Papers

  • Grobid – GeneRation Of BIbliographic Data
  • NLP + NER + Machine Learning
  • Tool to identify metadata from scientific / technical papers, based on the textual content contained within
  • Works out what sections of text are, then maps to metadata
  • The Grobid dataset is a little big, so Tika doesn’t include it as standard; instead it calls out to Grobid via REST if configured
  • http://grobid.readthedocs.io/en/latest/Introduction/
  • https://wiki.apache.org/tika/GrobidJournalParser

SLIDE 41

Geo Entity Lookup

SLIDE 42

Geo Entity Lookup

  • Augmenting “This was written in Seville, Spain in November” with details of where that is (lat, long, country etc)
  • Apache Lucene Gazetteer provides fast lookup of place names to geographic details
  • Geonames.org dataset used to feed the Gazetteer
  • Apache OpenNLP identifies places in text to look up
  • Needs custom NLP model for place name identification
  • GeoTopicParser saves results as metadata, best & alternates
SLIDE 43

Image Object Recognition

SLIDE 44

Image Object Recognition

https://memex.jpl.nasa.gov/MFSEC17.pdf

SLIDE 45

“Text Searchable Video”

SLIDE 46

Text Searchable Video

  • Pooled Time Series Analysis
  • Allows you to find “similar” videos
  • Search for videos based on features of stills
  • https://memex.jpl.nasa.gov/ICMR17-oss.pdf
  • http://events.linuxfoundation.org/sites/events/files/slides/ACNA15_Mattmann_Tika_Video2.pdf

SLIDE 47

Apache cTAKES

SLIDE 48

Apache Camel

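SLIDE 49

Apache Camel Integration

  • Allows Parsing and Detection, from Camel 2.19.0 onwards

import org.apache.camel.builder.RouteBuilder;

class TikaRoutes extends RouteBuilder {
    @Override
    public void configure() {
        // Parsing a directory (noop=true leaves files in place, so both routes can read them)
        from("file:C:\\docs\\test?noop=true")
            .to("tika:parse")
            .to("log:org.apache.tika?showHeaders=true");
        // Detection on a directory
        from("file:C:\\docs\\test?noop=true")
            .to("tika:detect")
            .to("log:org.apache.tika?showHeaders=true");
    }
}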

SLIDE 50

Translation

SLIDE 51

Language Detection

SLIDE 52

Troubleshooting

SLIDE 53

Troubleshooting

  • Finally, we have a troubleshooting guide!

http://wiki.apache.org/tika/Troubleshooting%20Tika

  • Covers most of the major queries
  • Why wasn’t the right parser used
  • Why didn’t detection work
  • What parsers do I really have etc!
SLIDE 54

Parser Errors

  • As well as the troubleshooting guide, for users...

http://wiki.apache.org/tika/Troubleshooting%20Tika

  • We also have the “Errors and Exceptions” page, aimed more at people writing parsers
  • Tries to explain what a parser should be doing in various problem situations, what exceptions to give etc

http://wiki.apache.org/tika/ErrorsAndExceptions

SLIDE 55

What's New & Coming Soon?

SLIDE 56

Apache Tika 1.12 – 1.14

SLIDE 57

Tika 1.12

  • More consistent and better HTML between PPT and PPTX
  • NamedEntity Parser, using both OpenNLP and Stanford NER, outputting text and metadata
  • GeoTopic Parser speedup via the new Lucene Geo Gazetteer REST server
  • Pooled Time Series parser for video – motion properties from videos to text to allow comparisons
  • Bug fixes
SLIDE 58

Tika 1.13

  • Lots of library upgrades – Apache POI, Apache PDFBox 2.0, Apache SIS and half a dozen others!
  • Lots of new mimetypes and magic patterns, especially for scientific-related formats
  • NamedEntity Parser adds support for Python NLTK and MIT-NLP (MITRE)
  • Tika Config XML dumping moved to core, and the app can now dump your running config for you
  • Language Detectors more easily pluggable
  • Bug fixes
SLIDE 59

Tika 1.14

  • Embedded Document improvements and Macro extraction for MS Office formats
  • TensorFlow integration for image object identification
  • Tesseract OCR improvements (hOCR, full-page PDF)
  • Quite a few more mime types and magics
  • More library upgrades
  • Re-enabled fileUrl feature for Tika Server; it has to be turned on manually, and gives warnings about the security effects!
SLIDE 60

Apache Tika 1.15+

SLIDE 61

Tika 1.15+

  • Additional JPEG 2000 format support (JPX, JP2)
  • PDFBox 2.0 further updates
  • Several more older MS Office format variants supported
  • Word Perfect, WMF, EMF
  • Language Detector improvements – N-Gram, Optimaize Lang Detector, MIT Text.jl, pluggable and pickable
  • More NLP enhancement / augmentation
  • Metadata aliasing
  • Plus preparations for Tika 2
SLIDE 62

Image, Video, NER

  • Image recognition using TensorFlow: https://wiki.apache.org/tika/TikaAndVision / Paper: https://memex.jpl.nasa.gov/MFSEC17.pdf
  • Image Recognition using Deeplearning4j: https://wiki.apache.org/tika/TikaAndVisionDL4J
  • Sentiment Analysis using OpenNLP: https://github.com/apache/tika/pull/169
  • Video labeling using TensorFlow image rec: https://wiki.apache.org/tika/TikaAndVisionVideo
  • Named Entity Extraction using OpenNLP and CoreNLP: https://wiki.apache.org/tika/TikaAndNER
  • Image Captioning (Image-to-Text): https://github.com/apache/tika/pull/180

SLIDE 63

Tika 2.0

SLIDE 64

Why no Tika v2 yet?

  • Apache Tika 0.1 – December 2007
  • Apache Tika 1.0 – November 2011
  • Shouldn't we have had a v2 by now?
  • Discussions started several years ago, on the list
  • Plans for what we need have been on the wiki for ~1 year
  • Largely though, every time someone came up with a breaking feature for 2.0, a compatible way to do it was found!

SLIDE 65

Deprecated Parts

  • Various parts of Tika have been deprecated over the years
  • All of those will go!
  • Main ones that might bite you:
  • Parser parse with no ParseContext
  • Old style Metadata keys
SLIDE 66

Metadata Storage

  • Currently, Metadata in Tika is String Key/Value Lists
  • Many Metadata types have Properties, which provide typing, conversions, sanity checks etc
  • But all still stored as String Key + Value(s)
  • Some people think we need a richer storage model
  • Others want to keep it simple!
  • JSON, XML DOM, XMP being debated
  • Richer string keys also proposed
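To illustrate “typed Properties over String storage”, here is a deliberately simplified model. SimpleMetadata and IntProperty are hypothetical names, not Tika's Metadata and Property classes: values are always stored as strings, and the typed property only converts and sanity-checks on the way in and out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Simplified model (hypothetical): everything is stored as String key -> list of String values. */
final class SimpleMetadata {
    private final Map<String, List<String>> values = new LinkedHashMap<>();

    void add(String key, String value) {
        values.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    List<String> getAll(String key) {
        return values.getOrDefault(key, List.of());
    }

    String get(String key) {
        List<String> v = getAll(key);
        return v.isEmpty() ? null : v.get(0);
    }

    /** A typed view over a string key: converts and sanity-checks, but storage stays as strings. */
    static final class IntProperty {
        final String key;

        IntProperty(String key) {
            this.key = key;
        }

        void set(SimpleMetadata md, int value) {
            md.values.put(key, new ArrayList<>(List.of(Integer.toString(value))));
        }

        Integer get(SimpleMetadata md) {
            String raw = md.get(key);
            if (raw == null) return null;
            try {
                return Integer.valueOf(raw.trim());
            } catch (NumberFormatException e) {
                return null; // sanity check failed: the stored string is not an integer
            }
        }
    }
}
```

The richer storage models being debated (JSON, XML DOM, XMP) would replace the Map of string lists, while a typed layer like this could stay much the same.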
SLIDE 67

Metadata for Video etc

  • A video file might have 2 video streams, 4 audio streams, a metadata stream and some subtitles
  • Some of those you want to treat as embedded resources
  • Some of those “belong” together
  • How should we return the number of channels for the 1st audio stream in a video?
  • Should it change if there’s one or many?
SLIDE 68

Java Packaging of Tika

  • The Maven packages of Tika are:
  • Tika Core
  • Tika Parsers
  • Tika Bundle
  • Tika XMP
  • Tika Java 7
  • For just some parsers, in Tika 1.x, you need to exclude maven dependencies + re-test
  • In Tika 2, more fine-grained parser collections
SLIDE 69

Tika 2.x Parser Sets

  • Available today in Git on the 2.x branch
  • Advanced, CAD, Code, Crypto
  • Database, eBook, Journal
  • Mail, Multimedia, Office
  • Package, PDF, Scientific
  • Text, Web, XMP-Commons
  • May change some more, but broadly in place now
SLIDE 70

Logging, Config, Defaults

  • Logging – Moving to SLF4J
  • Aim is to have all of Tika use that, and have parsers configure it for the libraries they call
  • Config – ensure everything can be configured, and configured easily
  • Consistent Configuration – all in one place, common format
  • Defaults – Sensible, Documented, No Surprises
SLIDE 71

Fallback/Preference Parsers

  • If we have several parsers that can handle a format
  • Preferences?
  • If one fails, how about trying others?
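One possible shape for “if one fails, try others” is a chain that tries parsers in preference order. The TextParser interface and FallbackChain class here are hypothetical stand-ins, not Tika's Parser API, and returning a complete String sidesteps the streaming problem raised later under Content Handler Reset/Add.

```java
import java.util.List;

/** Hypothetical contract for this sketch only (not Tika's Parser interface). */
interface TextParser {
    String parse(byte[] content) throws Exception;
}

/** Tries parsers in preference order; the first success wins, failures fall through to the next. */
final class FallbackChain {
    static String parse(List<TextParser> inPreferenceOrder, byte[] content) {
        Exception lastFailure = null;
        for (TextParser parser : inPreferenceOrder) {
            try {
                return parser.parse(content);
            } catch (Exception e) {
                lastFailure = e; // remember why, then try the next parser
            }
        }
        throw new IllegalStateException("no parser could handle the content", lastFailure);
    }
}
```

Preferences are just the list order here; a config file could supply that order per mimetype.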
SLIDE 72

Multiple Parsers

  • If we have several parsers that can handle a format
  • What about running all of them?
  • eg extract image metadata
  • then OCR it
  • then try the regular image parser for more metadata
  • Or maybe for calling multiple different NER parsers
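For the “run all of them” case, here is a sketch that runs every parser and merges their metadata, letting earlier parsers win and later ones only fill gaps. MetadataSource and SupplementingRunner are hypothetical names, and “first one wins” is just one possible merge policy; another might keep every value.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical contract for the sketch: each "parser" returns metadata key/values. */
interface MetadataSource {
    Map<String, String> extract(byte[] content) throws Exception;
}

/** Runs every parser in order; earlier results win, later ones only fill gaps. */
final class SupplementingRunner {
    static Map<String, String> runAll(List<MetadataSource> parsers, byte[] content, List<String> failures) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (int i = 0; i < parsers.size(); i++) {
            try {
                parsers.get(i).extract(content).forEach(merged::putIfAbsent);
            } catch (Exception e) {
                failures.add("parser " + i + ": " + e.getMessage()); // record it, keep going with the rest
            }
        }
        return merged;
    }
}
```

In the image example, the list would be metadata extraction, then OCR, then the regular image parser.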
SLIDE 73

Parser Discovery/Loading?

  • Currently, Tika uses a Service Loader mechanism to find and load available Parsers (and Detectors+Translators)
  • This allows you to drop a new Tika parser jar onto the classpath, and have it automatically used
  • Also allows you to miss one or two jars out, and get no content back, with no warnings / errors...
  • You can set the Service Loader to Warn, or even Error
  • But most people don't, and it bites them!
  • Change the default in 2? Or change entirely how we do it?
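The same META-INF/services idea is available in the JDK as java.util.ServiceLoader. This sketch uses a hypothetical DocumentPlugin interface to show discovery plus a warn/error policy for the “nothing was found” case; the OnMissing enum is illustrative and not Tika's own setting.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

/** Hypothetical plugin interface, discovered via META-INF/services entries in jars on the classpath. */
interface DocumentPlugin {
    String supportedType();
}

/** Sketch: discover plugins with java.util.ServiceLoader, and complain if none are found. */
final class PluginDiscovery {
    enum OnMissing { IGNORE, WARN, ERROR }

    static List<DocumentPlugin> load(OnMissing policy) {
        List<DocumentPlugin> found = new ArrayList<>();
        for (DocumentPlugin plugin : ServiceLoader.load(DocumentPlugin.class)) {
            found.add(plugin);
        }
        if (found.isEmpty()) {
            String msg = "No DocumentPlugin implementations found on the classpath; is a jar missing?";
            if (policy == OnMissing.ERROR) throw new IllegalStateException(msg);
            if (policy == OnMissing.WARN) System.err.println("WARNING: " + msg);
        }
        return found;
    }
}
```

With IGNORE as the default you get exactly the silent “no content back” behaviour described above; defaulting to WARN is the cheap fix.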
SLIDE 74

What we still need help with...

SLIDE 75

Content Handler Reset/Add

  • Tika uses the SAX Content Handler interface for supplying plain text along with semantically meaningful XHTML
  • Streaming, write once
  • How does that work with multiple parsers?
  • How about if one parser fails and we want to try parsing with a different one?
  • How about if one parser works, then you want to run a second?
  • Language Detection / NER – how to mark up previous text?
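One possible answer to “what if a parser fails half way through?” is to buffer each attempt and only pass its output on once the parse succeeds. The sketch below does that with a SAX DefaultHandler; StreamingParser, BufferingHandler and RetryingExtractor are hypothetical names. The cost is that output is no longer streamed, which is exactly the trade-off this slide is asking about.

```java
import java.util.List;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical parser contract for this sketch: streams SAX text events into a handler. */
interface StreamingParser {
    void parse(String input, DefaultHandler handler) throws Exception;
}

/** Holds one attempt's text until we know the attempt succeeded. */
final class BufferingHandler extends DefaultHandler {
    final StringBuilder pending = new StringBuilder();

    @Override
    public void characters(char[] ch, int start, int length) {
        pending.append(ch, start, length);
    }
}

final class RetryingExtractor {
    static String extract(List<StreamingParser> parsers, String input) {
        for (StreamingParser parser : parsers) {
            BufferingHandler attempt = new BufferingHandler(); // fresh buffer per attempt
            try {
                parser.parse(input, attempt);
                return attempt.pending.toString(); // success: only now does text go downstream
            } catch (Exception e) {
                // failure: the partial text in attempt.pending is simply dropped
            }
        }
        throw new IllegalStateException("no parser succeeded");
    }
}
```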
SLIDE 76

Content Enhancement

  • How can we post-process the content to “enhance” it in various ways?
  • For example, how can we mark up parts of speech?
  • Pull out information into the Metadata?
  • Translate it, retaining the original positions?
  • For just some formats, or for all?
  • For just some documents in some formats?
  • While still keeping the Streaming SAX-like contract?
SLIDE 77

Metadata Standards

  • Currently, Tika works hard to map file-format-specific metadata onto general metadata standards
  • Means you don't have to know each standard in depth, can just say “give me the closest to dc:subject you have, no matter what file format or library it comes from”
  • What about non-file-format metadata, such as content metadata (Table of Contents, Author information etc)?
  • What about combining things?
SLIDE 78

Richer Metadata

  • See Metadata Storage slides!
SLIDE 79

Bonus! Apache Tika at Scale

SLIDE 80

Lots of Data is Junk

  • At scale, you're going to hit lots of edge cases
  • At scale, you're going to come across lots of junk or corrupted documents
  • 1% of a lot is still a lot...
  • 1% of the internet is a huge amount!
  • Bound to find files which are unusual or corrupted enough to be mis-identified

  • You need to plan for failures!
SLIDE 81

Unusual Types

  • If you're working on a big data scale, you're bound to come across lots of valid but unusual + unknown files
  • You're never going to be able to add support for all of them!
  • May be worth adding support for the more common “uncommon” unsupported types
  • Which means you'll need to track something about the files you couldn't understand
  • If Tika knows the mimetype but has no parser, just log the mimetype
  • If mimetype unknown, maybe log the first few bytes
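A small sketch of that logging advice: record just the mimetype when it is known, otherwise a hex dump of the first few bytes. It assumes, as Tika does, that application/octet-stream means “unknown”; UnparsedFileNote is a hypothetical name.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

/** Sketch (hypothetical): build a short log line about a file that could not be parsed, for later triage. */
final class UnparsedFileNote {
    static final String UNKNOWN = "application/octet-stream"; // generic "no idea" type

    static String describe(String detectedType, InputStream content, int maxBytes) {
        // Known type but no parser: the mimetype alone is enough to count and prioritise.
        if (detectedType != null && !UNKNOWN.equals(detectedType)) {
            return "no parser for " + detectedType;
        }
        try {
            byte[] head = content.readNBytes(maxBytes); // Java 11+
            StringBuilder hex = new StringBuilder();
            for (byte b : head) {
                if (hex.length() > 0) hex.append(' ');
                hex.append(String.format("%02x", b & 0xff));
            }
            return "unknown type, first " + head.length + " bytes: " + hex;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Grouping the logged prefixes later shows which “uncommon” formats turn up often enough to be worth a new magic pattern or parser.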
SLIDE 82

Failure at Scale

  • Tika will sometimes mis-identify something, so sometimes the wrong parser will run and object
  • Some files will cause parsers or their underlying libraries to do something silly, such as use lots of memory or get into loops with lots to do
  • Some files will cause parsers or their underlying libraries to OOM, get into an infinite loop, or do something else bad
  • If a file fails once, it will probably fail again, so blindly re-running that task won't help

SLIDE 83

Failure at Scale, continued

  • You'll need approaches that plan for failure
  • Consider what will happen if a file locks up your JVM, or kills it with an OOM
  • Forked Parser may be worth using
  • Running a separate Tika Server could be good
  • Depending on work needed, could have a smaller pool of Tika Server instances for big data code to call
  • Think about failure modes, then think about retries (or not)
  • Track common problems, report and fix them!
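Since a failing file will probably fail again, it helps to remember failures by content rather than by filename. This sketch keys failures on a SHA-256 of the bytes and counts them by reason for the “track common problems” step. FailureTracker is hypothetical, and in a real pipeline the map would need to live outside the JVM that might OOM.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

/** Sketch (hypothetical): remember which inputs already failed, by content hash, so they are not blindly retried. */
final class FailureTracker {
    private final Map<String, String> knownBad = new HashMap<>(); // hash -> reason

    static String sha256(byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(content);
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) sb.append(String.format("%02x", b & 0xff));
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required on every Java platform", e);
        }
    }

    /** True if this exact content has failed before, so we should skip (and report) it. */
    boolean shouldSkip(byte[] content) {
        return knownBad.containsKey(sha256(content));
    }

    void recordFailure(byte[] content, String reason) {
        knownBad.put(sha256(content), reason);
    }

    /** Failures grouped by reason, to find the most common problems worth reporting and fixing. */
    Map<String, Integer> countsByReason() {
        Map<String, Integer> counts = new HashMap<>();
        for (String reason : knownBad.values()) counts.merge(reason, 1, Integer::sum);
        return counts;
    }
}
```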
SLIDE 84

Tika Batch, Eval & Hadoop

Tomorrow – 2.40pm, Brickell!
SLIDE 85

Any Questions?

SLIDE 86

Nick Burch

@Gagravarr nick@apache.org