The Use of Search Engines for Massively Scalable Forensic Repositories (PowerPoint presentation)



SLIDE 1

The Use of Search Engines for Massively Scalable Forensic Repositories

www.cybertapllc.com/

John H. Ricketson
jricketson@cybertapllc.com
jricketson@dejavutechnologies.com
+1-978-692-7229

SLIDE 2

Who Is cybertap?

  • We provide a forensic platform for cyber investigations based on search engine technology
    – External Threats & Viruses
    – Hacking
    – Financial Fraud
    – Electronic Warfare
    – Security Event Analysis

  • Markets
    – Law Enforcement
    – Cyber Security, for both Government and Commercial Enterprises
    – Banking / Trading
    – Electronic Commerce
    – Call Centers

SLIDE 3

Forensic Evidence

  • Documents: Computer Forensics / eDiscovery / “data-at-rest”
    – Disk Scrubbing, Dead Boxes, etc.
    – Shared Repositories like Dropbox or SharePoint
  • Archives
    – Email
    – Instant Messaging
  • Web
    – Downloaded HTML pages
  • Financial Information
    – Invoices, Credit Cards, Private Information
    – Electronic Trades
  • Log Files from Network Devices
  • Cell Phone Call Records
  • Real-Time Network Transactions: “data-in-motion”
    – Packet Captures
    – Network Streams

SLIDE 4

Forensic Data

  • Archival in Nature
  • Non-Transactional
  • Contains Meta-data, Content & Extracted Intelligence
    – Meta-data
      • Document Attributes
        – Author, Dates, Printers, Macros, Document Edits, File References
      • Network Attributes
        – Addressing Endpoints, IDs, Domains, Protocol Headers
    – Content
      • Message Content
      • Body Content
      • Media Streams
    – Extracted Intelligence
      • Electronic Persona (epersona)
      • Geo-location
      • Correlations and links among heterogeneous data
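The three-part structure of forensic data on this slide (meta-data, content, extracted intelligence) can be made concrete as a record type. The sketch below is a minimal illustration under stated assumptions: the class name and the field prefixes (`meta_*`, `content`, `intel_*`) are hypothetical, not cybertap's schema. It shows one way to keep each group in separately named fields so a search engine can query meta-data and intelligence apart from content.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ForensicRecord:
    """Hypothetical record: meta-data, content and extracted intelligence kept apart."""

    doc_id: str
    metadata: Dict[str, str] = field(default_factory=dict)  # e.g. author, dates, IP endpoints
    content: str = ""  # message or body text
    intelligence: Dict[str, List[str]] = field(default_factory=dict)  # e.g. epersona, geo-location

    def to_fields(self) -> Dict[str, object]:
        """Flatten into one search document, prefixing each group so meta-data and
        intelligence can be queried separately from content."""
        fields: Dict[str, object] = {"id": self.doc_id, "content": self.content}
        for key, value in self.metadata.items():
            fields[f"meta_{key}"] = value
        for key, values in self.intelligence.items():
            fields[f"intel_{key}"] = list(values)
        return fields


# Hypothetical example values, for illustration only.
record = ForensicRecord(
    doc_id="mail-0001",
    metadata={"author": "alice", "src_ip": "10.0.0.1", "dst_ip": "10.0.0.2"},
    content="Shipment of nitrate confirmed.",
    intelligence={"epersona": ["alice@example.com"]},
)
print(record.to_fields())
```

Prefixing by group is only one convention; a real deployment would define these fields explicitly in the search engine's schema.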
SLIDE 5

Search Engines Provide

  • Non-transactional repository of archival data
  • Meta-data descriptors for network and document attributes
    – Delineating meta-data from content in search queries is crucial
      • Show me all email documents from this.IP address to that.IP
      • Show me all IM messages from this.ID to that.ID containing “nitrate”
  • The ability to search free-form on any meta-data or content item
  • Massively scalable forensic repositories
    – 10+ billion documents representing terabytes of data
  • Sub-second search times
  • Cross-reference and correlation of all data with a single search
  • Document parsing
  • Leverage these FREE, rich-functionality tools to build forensic repositories
    – http://lucene.apache.org/solr/
    – http://tika.apache.org/
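The two example questions on this slide can be sketched as Solr requests that keep meta-data and content apart, using Solr's standard `field:value` query syntax and its `q` and `fq` request parameters. This is a minimal sketch, not the presenter's implementation: the host, the core name `forensics`, and the field names `doc_type`, `src_ip`, `dst_ip`, `from_id`, `to_id` and `content` are all assumptions. It only builds request URLs and does not contact a server.

```python
from typing import Dict, List, Tuple
from urllib.parse import urlencode

SOLR_SELECT = "http://localhost:8983/solr/forensics/select"  # hypothetical host and core name


def phrase(value: str) -> str:
    """Quote a value as a Lucene phrase, escaping backslashes and double quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_query(meta: Dict[str, str], content_terms: str = "") -> List[Tuple[str, str]]:
    """Meta-data constraints become filter queries (fq); free text searches the content field."""
    params = [("q", f"content:({content_terms})" if content_terms else "*:*")]
    for fld, value in meta.items():
        params.append(("fq", f"{fld}:{phrase(value)}"))
    params.append(("wt", "json"))
    return params


# "Show me all email documents from this.IP address to that.IP"
email_q = build_query({"doc_type": "email", "src_ip": "10.0.0.1", "dst_ip": "10.0.0.2"})
# "Show me all IM messages from this.ID to that.ID containing 'nitrate'"
im_q = build_query({"doc_type": "im", "from_id": "user_a", "to_id": "user_b"}, "nitrate")

print(SOLR_SELECT + "?" + urlencode(email_q))
print(SOLR_SELECT + "?" + urlencode(im_q))
```

Putting meta-data constraints in `fq` rather than `q` is a deliberate choice: Solr caches filter queries independently and does not use them for relevance scoring, so only the content terms affect ranking. `content_terms` is inserted unescaped here, so user-supplied text would need escaping first.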

SLIDE 6

Tika Provides

  • Meta-data, MIME, Language & Content
    – HyperText Markup Language (html)
    – XML and derived formats
    – Microsoft Office document formats
    – OpenDocument Format
    – Portable Document Format (pdf)
    – Electronic Publication Format
    – Rich Text Format (rtf)
    – Compression and packaging formats (zip, tar, etc.)
    – Text formats
    – Audio formats
    – Image formats
    – Video formats
    – Java class files and archives
    – The mbox format
  • Identifies documents by content ONLY

File extension ≠ Content
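Identifying a document by content rather than by extension usually starts with "magic bytes", fixed signatures at the start of a file. The sketch below illustrates only that general idea with a handful of well-known signatures. It is not Tika's API or detection logic, which relies on a much larger MIME database and also inspects container formats (for example, to tell a .docx apart from a plain zip).

```python
from typing import Optional

# (magic prefix, MIME type): a small illustrative subset of well-known signatures.
SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"{\\rtf", "application/rtf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"\x1f\x8b", "application/gzip"),
    (b"\xca\xfe\xba\xbe", "application/java-vm"),
    (b"PK\x03\x04", "application/zip"),  # also the container for .docx, .odt, .epub and .jar
]


def sniff(data: bytes) -> Optional[str]:
    """Guess a MIME type from the leading bytes only; no file name is consulted."""
    for magic, mime in SIGNATURES:
        if data.startswith(magic):
            return mime
    return None


# A PDF saved under a misleading name is still recognised by its content.
disguised_name, disguised_bytes = "holiday.jpg", b"%PDF-1.4\n% ..."
print(disguised_name, "->", sniff(disguised_bytes))  # holiday.jpg -> application/pdf
```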

SLIDE 7

Apache Solr & Tika Open Source Projects

[Diagram: Documents → Tika parser → XML (Meta-data, Content) → Solr indexer → Index]
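The Tika-to-Solr flow on this slide ends with an update message sent to Solr. The sketch below builds a message in Solr's XML update format (`<add><doc><field name="...">`) from (id, meta-data, content) triples. The extracted values are stand-ins for parser output, and the field names (`id`, `meta_*`, `content`) are assumptions that would have to match the target schema. Solr can also run Tika itself through its ExtractingRequestHandler ("Solr Cell"); this sketch covers only the case where extraction happens outside Solr.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterable, Tuple


def to_solr_add(docs: Iterable[Tuple[str, Dict[str, str], str]]) -> str:
    """Build a Solr XML <add> message from (id, meta-data, content) triples."""
    add = ET.Element("add")
    for doc_id, metadata, content in docs:
        doc = ET.SubElement(add, "doc")
        ET.SubElement(doc, "field", name="id").text = doc_id
        for key, value in metadata.items():
            # Prefix meta-data fields so they stay distinguishable from content.
            ET.SubElement(doc, "field", name=f"meta_{key}").text = value
        ET.SubElement(doc, "field", name="content").text = content
    # ElementTree escapes &, < and > in text when serializing.
    return ET.tostring(add, encoding="unicode")


# Stand-in for what a parser such as Tika might extract from one document.
extracted = [("doc-42", {"content_type": "application/pdf", "author": "J. Doe"}, "Invoice total: $1,200")]
print(to_solr_add(extracted))
```

The resulting string would typically be POSTed to the core's `/update` handler with an XML content type, followed by a commit.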

SLIDE 8

Demonstration of Document Searches

  • Importing Documents
  • Importing Email
  • Searching both Meta-data & Content
SLIDE 9

Real-Time Network Packet Ingestion

[Diagram: Packets → Decompile → Tika parser (alongside Documents) → XML (Meta-data, Content) → Solr indexer → Index]
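Before packets can follow the same Tika/Solr path as documents, the "Decompile" step has to turn raw bytes into fields. This is a minimal sketch, assuming a single unfragmented IPv4/TCP packet with no link-layer header: it splits the IP and TCP headers into meta-data fields and treats the payload as content. The field names are hypothetical. Real captures also need Ethernet framing, IP defragmentation, TCP stream reassembly and protocol decoding (IMAP, HTTP, VoIP), none of which is shown, and checksums are not verified.

```python
import socket
import struct
from typing import Dict, Union


def decode_ipv4_tcp(packet: bytes) -> Dict[str, Union[str, int]]:
    """Split one raw IPv4 packet into meta-data fields, plus TCP ports and payload content."""
    (version_ihl, _tos, total_len, _ident, _frag, ttl, proto, _csum, src, dst) = struct.unpack(
        "!BBHHHBBH4s4s", packet[:20]
    )
    if version_ihl >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    packet = packet[:total_len]  # drop any trailing padding beyond the IP total length
    ihl = (version_ihl & 0x0F) * 4  # IP header length in bytes
    doc: Dict[str, Union[str, int]] = {
        "src_ip": socket.inet_ntoa(src),
        "dst_ip": socket.inet_ntoa(dst),
        "ttl": ttl,
        "protocol": proto,
    }
    if proto == 6:  # TCP
        tcp = packet[ihl:]
        src_port, dst_port, _seq, _ack, offset_byte = struct.unpack("!HHIIB", tcp[:13])
        data_offset = (offset_byte >> 4) * 4  # TCP header length in bytes
        doc["src_port"] = src_port
        doc["dst_port"] = dst_port
        # Payload becomes the searchable content; undecodable bytes are replaced.
        doc["content"] = tcp[data_offset:].decode("utf-8", errors="replace")
    return doc


# Build a sample packet by hand (checksums left at zero; they are not verified above).
payload = b"GET / HTTP/1.1\r\n"
tcp_hdr = struct.pack("!HHIIBBHHH", 51515, 80, 0, 0, 5 << 4, 0x18, 65535, 0, 0)
ip_hdr = struct.pack(
    "!BBHHHBBH4s4s", 0x45, 0, 20 + len(tcp_hdr) + len(payload), 1, 0, 64, 6, 0,
    socket.inet_aton("10.0.0.1"), socket.inet_aton("10.0.0.2"),
)
print(decode_ipv4_tcp(ip_hdr + tcp_hdr + payload))
```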

SLIDE 10

Open Document Extraction & Enrichment

  • VoIP telephone call processing (wav)
    – Including Speech-to-Text, Voice Identification, Voice Recognition
  • Video processing (video, flash)
    – Including OCR, Speech-to-Text, closed captioning, and multi-frame analysis
  • Image processing (jpeg, pdf, gif, etc.)
    – Including OCR, facial recognition, flesh tone detection, etc.
  • Natural Language Processing (text)
    – Including language translation, text entity extraction, proper name disambiguation, conversation summarization, and identifying people by writing style
  • Automated Zero-Day Malware / Virus Detection
  • Steganographic detection
  • Decryption
  • ePersona Background Checks
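Text entity extraction can be sketched as an enrichment step that adds new fields to an already-parsed document. The sketch below uses regular expressions for email addresses, IPv4 addresses and URLs; this is a deliberately simple substitute for the statistical NLP models that entity extraction normally implies, and the `entity_*` field names are hypothetical.

```python
import re
from typing import Dict, List

# Deliberately simple patterns; they stand in for, and do not replicate, model-based NLP extraction.
PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "ipv4": re.compile(r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"),
    "url": re.compile(r"https?://[^\s\"'<>]+"),
}


def enrich(doc: Dict[str, object]) -> Dict[str, object]:
    """Return a copy of doc with entity_<kind> fields pulled from its content."""
    text = str(doc.get("content", ""))
    enriched = dict(doc)
    for kind, pattern in PATTERNS.items():
        found: List[str] = sorted(set(pattern.findall(text)))
        if found:
            enriched[f"entity_{kind}"] = found
    return enriched


message = {"id": "im-7", "content": "Send it to bob@example.com from 10.0.0.1, details at http://example.com/x"}
print(enrich(message))
```

Because the extracted entities land in their own fields, they can be searched and correlated like any other meta-data, for example to link documents that mention the same address.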
SLIDE 11

Demonstration of Packet Searches

  • Importing Packets
  • Searching both Meta-data & Content
  • IMAP, HTTP, VoIP, Facebook
  • Reconstruction
  • Content Extraction
  • Electronic Persona Identification (epersona)

SLIDE 12