KNIME and the Web – Extract, Test, Automate (KNIME Spring Summit)

SLIDE 1

KNIME and the Web – Extract, Test, Automate

KNIME Spring Summit, Berlin, 25.02.2016
Philipp Katz

SLIDE 2

Our Background

  • Three former PhD students at TU Dresden (me, Klemens Muthmann, David Urbansky)
  • Computer Science, Information Extraction
  • After our PhDs, each of us founded a startup

CYFACE (fancy logo under construction)

SLIDE 3

Palladian Nodes

SLIDE 4

Palladian?

  • Java-based toolkit for information retrieval, started in 2009
  • Palladian KNIME nodes since 2011
  • Used in commercial and academic projects
  • Available from the KNIME Community Contributions download site

SLIDE 5

The Palladian Nodes

  • Text classification
  • Content extraction
  • Date extraction
  • Named entity recognition
  • Geo data extraction
  • Web page, image, news search
  • HTML, RSS, Atom parsing
  • Ranking value retrieval
  • Evaluation metrics
SLIDE 6

Access Web APIs

  • Web Searcher
  • Ranking Services
SLIDE 7

Text Classification

  • Very simple, one predictor, one learner
  • n-gram features and Naïve Bayes scoring
  • Optimized for large amounts of training data
  • Learner is now streamable; Predictor will follow soon
  • Competitive accuracy for many use cases
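How n-gram features and Naïve Bayes scoring fit together can be shown in a short Python sketch. It rests on assumptions made only for this illustration: character n-grams of length 3 to 5, add-one (Laplace) smoothing, and the hypothetical names `char_ngrams` and `NGramNaiveBayes`. None of this is Palladian's actual API or feature set. Because `learn` only increments counters for one example at a time, a learner of this shape can consume a table row by row without holding it in memory, which is the property that makes a learner streamable.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n_min=3, n_max=5):
    """Yield character n-grams of the lower-cased, space-padded text."""
    padded = f" {text.lower()} "
    for n in range(n_min, n_max + 1):
        for i in range(len(padded) - n + 1):
            yield padded[i:i + n]


class NGramNaiveBayes:
    """Hypothetical n-gram Naive Bayes classifier (not Palladian's implementation)."""

    def __init__(self):
        self.doc_counts = Counter()               # training documents per class
        self.ngram_counts = defaultdict(Counter)  # n-gram frequencies per class
        self.vocabulary = set()

    def learn(self, text, label):
        # Incremental update with a single example: no need to see the whole table.
        self.doc_counts[label] += 1
        for gram in char_ngrams(text):
            self.ngram_counts[label][gram] += 1
            self.vocabulary.add(gram)

    def scores(self, text):
        # Log prior plus the sum of Laplace-smoothed log likelihoods, per class.
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        grams = list(char_ngrams(text))
        result = {}
        for label, docs in self.doc_counts.items():
            counts = self.ngram_counts[label]
            total = sum(counts.values())
            log_score = math.log(docs / total_docs)
            for gram in grams:
                log_score += math.log((counts[gram] + 1) / (total + vocab_size))
            result[label] = log_score
        return result

    def predict(self, text):
        # Pick the class with the highest log score.
        scores = self.scores(text)
        return max(scores, key=scores.get)
```

For instance, after `learn("the weather is sunny and warm", "en")` and `learn("das Wetter ist sonnig und warm", "de")`, calling `predict("sunny weather")` favors `"en"`.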
SLIDE 8

Geographic Data

  • In the works for a while; added after last year's summit due to popular demand
  • New: Nodes for IP and address lookup
  • New: Use a local gazetteer as the source for the location extraction node

SLIDE 9

Geographic Data

  • Extract and disambiguate locations from unstructured text and visualize them on a map
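Gazetteer-based extraction with a simple disambiguation step can be sketched in Python as follows. This is not Palladian's algorithm. The tiny in-memory gazetteer (rounded, approximate coordinates and populations), the single-word matching, and the heuristic (prefer a candidate whose country already appears among the unambiguous mentions, then the larger population) are all assumptions made for illustration. `Place`, `build_index`, and `extract_locations` are hypothetical names.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Place:
    name: str
    country: str      # ISO 3166-1 alpha-2 code
    lat: float
    lon: float
    population: int


# Illustrative gazetteer with rounded, approximate values (not Palladian data).
GAZETTEER = [
    Place("Berlin", "DE", 52.52, 13.40, 3_600_000),
    Place("Dresden", "DE", 51.05, 13.74, 550_000),
    Place("Dallas", "US", 32.78, -96.80, 1_300_000),
    Place("Paris", "FR", 48.86, 2.35, 2_100_000),
    Place("Paris", "US", 33.66, -95.56, 25_000),
]


def build_index(places):
    """Map lower-cased names to all gazetteer entries sharing that name."""
    index = {}
    for place in places:
        index.setdefault(place.name.lower(), []).append(place)
    return index


def extract_locations(text, index):
    """Find gazetteer names in the text and resolve each mention to one place."""
    # Candidate mentions: capitalized single words that exist in the gazetteer.
    mentions = [w for w in re.findall(r"\b[A-Z][a-z]+\b", text) if w.lower() in index]
    # Countries of unambiguous mentions serve as context for ambiguous ones.
    context = {index[w.lower()][0].country for w in mentions if len(index[w.lower()]) == 1}
    resolved = []
    for word in mentions:
        candidates = index[word.lower()]
        # Prefer a candidate from a context country, then the larger population.
        best = max(candidates, key=lambda p: (p.country in context, p.population))
        resolved.append(best)
    return resolved
```

With this heuristic, "We drove from Dallas to Paris." resolves Paris to the US entry, while "From Dresden via Berlin to Paris." keeps the French one; the resulting `lat`/`lon` pairs are what a map view would plot.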

SLIDE 10

Geographic Data

  • Extract and disambiguate locations from unstructured text and visualize them on a map

SLIDE 11

Geographic Data

  • Extract and disambiguate locations from unstructured text and visualize them on a map

SLIDE 12

HTTP and HTML

  • New: Support for cookies, headers, and HTTP methods other than GET
  • New: Sending arbitrary byte-stream content and form-encoding of table data

  • New: OAuth signing for HTTP requests
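These request options can be illustrated with Python's standard library alone. The sketch below form-encodes one table row (column names become form keys, which is an assumption about how rows map to a request body), adds custom headers and a cookie, and signs the request with OAuth 1.0a HMAC-SHA1 as specified in RFC 5849. The slide does not say which OAuth version or signature method the nodes use, so this only illustrates the technique. The URL, the credentials, and the helper names `pct`, `oauth1_authorization`, and `build_post` are hypothetical.

```python
import base64
import hashlib
import hmac
import secrets
import time
import urllib.parse
import urllib.request


def pct(value):
    """RFC 3986 percent-encoding, as required for OAuth 1.0 signatures."""
    return urllib.parse.quote(str(value), safe="")


def oauth1_authorization(method, url, form, consumer_key, consumer_secret,
                         token="", token_secret="", nonce=None, timestamp=None):
    """Build an OAuth 1.0a (HMAC-SHA1) Authorization header value."""
    oauth = {
        "oauth_consumer_key": consumer_key,
        "oauth_nonce": nonce or secrets.token_hex(16),
        "oauth_signature_method": "HMAC-SHA1",
        "oauth_timestamp": str(timestamp if timestamp is not None else int(time.time())),
        "oauth_version": "1.0",
    }
    if token:
        oauth["oauth_token"] = token
    parts = urllib.parse.urlsplit(url)
    base_uri = f"{parts.scheme.lower()}://{parts.netloc.lower()}{parts.path or '/'}"
    # Query parameters, form body parameters, and oauth_* parameters are all signed.
    params = urllib.parse.parse_qsl(parts.query, keep_blank_values=True)
    params += list(form.items()) + list(oauth.items())
    normalized = "&".join(f"{k}={v}" for k, v in sorted((pct(k), pct(v)) for k, v in params))
    base_string = "&".join([method.upper(), pct(base_uri), pct(normalized)])
    key = f"{pct(consumer_secret)}&{pct(token_secret)}"
    digest = hmac.new(key.encode(), base_string.encode(), hashlib.sha1).digest()
    oauth["oauth_signature"] = base64.b64encode(digest).decode()
    return "OAuth " + ", ".join(f'{pct(k)}="{pct(v)}"' for k, v in sorted(oauth.items()))


def build_post(url, row, headers=None, cookies=None, authorization=None):
    """Prepare (but do not send) a form-encoded POST request for one table row."""
    body = urllib.parse.urlencode(row).encode("ascii")
    request = urllib.request.Request(url, data=body, method="POST")
    request.add_header("Content-Type", "application/x-www-form-urlencoded")
    for name, value in (headers or {}).items():
        request.add_header(name, value)
    if cookies:
        request.add_header("Cookie", "; ".join(f"{k}={v}" for k, v in cookies.items()))
    if authorization:
        request.add_header("Authorization", authorization)
    return request
```

Passing the prepared request to `urllib.request.urlopen` would send it. The signature has to cover the same form parameters that end up in the body, which is why `oauth1_authorization` takes the row as well.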
SLIDE 13
SLIDE 14

?

SLIDE 15

?

SLIDE 16

Selenium Nodes

SLIDE 17

Selenium?

  • “Selenium automates browsers.”
  • The Selenium Nodes let you simulate a real web browser in KNIME
  • Use a KNIME workflow to describe actions and extract all the data you need

SLIDE 18

Use Cases

  • Data extraction
  • Task automation
  • Web application testing

SLIDE 19

Browser Support

  • Local installations
  • Headless “browsers” (PhantomJS, jBrowserDriver)
  • Remotely running

SLIDE 20

Browser Support

  • Remotely running
  • Connect to Selenium servers or VMs on your local network to simulate a variety of operating systems or browsers
  • Use cloud services such as BrowserStack or SauceLabs, which provide ready-to-use Selenium instances (even for iOS and Android)
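Remote browsers, whether on a Selenium server in the local network or at a cloud provider, are driven over HTTP with the W3C WebDriver protocol, and every session begins with a New Session request that lists the desired capabilities. The Python sketch below only prepares such a request and reads the session id from a response; it sends nothing. The server address `http://localhost:4444` (4444 is the usual default port of a standalone Selenium server; older server versions expect a `/wd/hub` prefix) and the helper names `new_session_request` and `session_id` are assumptions. Cloud providers typically add their own vendor-prefixed capabilities, which are left out here.

```python
import json
import urllib.request


def new_session_request(server_url, browser_name, platform_name=None):
    """Prepare a W3C WebDriver 'New Session' request for a remote endpoint."""
    always_match = {"browserName": browser_name}
    if platform_name:
        always_match["platformName"] = platform_name
    payload = {"capabilities": {"alwaysMatch": always_match}}
    return urllib.request.Request(
        server_url.rstrip("/") + "/session",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


def session_id(response_body):
    """Read the session id from a W3C New Session response body."""
    return json.loads(response_body)["value"]["sessionId"]
```

Opening the prepared request with `urllib.request.urlopen` against a running server would return the JSON that `session_id` parses; every later command then addresses `/session/<id>/...`.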

SLIDE 21

Example Workflow

SLIDE 22

Example Workflow

SLIDE 23

Example Workflow

SLIDE 24

Example Workflow

SLIDE 25

Example Workflow

SLIDE 26

Node Overview

  • Configure, start, and quit web browsers
  • Navigate
  • Locate Elements (using attributes, XPath, or CSS)
  • Interact with Elements (click, input text, select, submit, …)
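Actions like navigating, locating, clicking, and typing correspond to commands of the W3C WebDriver protocol. The Python sketch below maps a few of them to their HTTP method, path, and JSON body for an existing session; it builds the commands without sending them. The helper names (`command`, `find_element`, `element_id`) are hypothetical, and the KNIME nodes most likely go through Selenium's client library rather than raw HTTP. The W3C locator strategies are CSS selector, link text, partial link text, tag name, and XPath; locating by an attribute such as `name` is commonly expressed as a CSS attribute selector.

```python
import json

# W3C WebDriver commands: action -> (HTTP method, path template).
COMMANDS = {
    "navigate": ("POST", "/session/{sid}/url"),
    "find_element": ("POST", "/session/{sid}/element"),
    "click": ("POST", "/session/{sid}/element/{eid}/click"),
    "send_keys": ("POST", "/session/{sid}/element/{eid}/value"),
    "quit": ("DELETE", "/session/{sid}"),
}

# Locator strategies defined by the W3C WebDriver specification.
LOCATOR_STRATEGIES = {"css selector", "link text", "partial link text", "tag name", "xpath"}

# Key under which a found element's reference is returned in the response.
ELEMENT_KEY = "element-6066-11e4-a52e-4f735466cecf"


def command(action, body=None, **ids):
    """Return (method, path, JSON body or None) for one WebDriver command."""
    method, template = COMMANDS[action]
    payload = json.dumps(body) if body is not None else None
    return method, template.format(**ids), payload


def find_element(sid, using, value):
    """Build a Find Element command after checking the locator strategy."""
    if using not in LOCATOR_STRATEGIES:
        raise ValueError(f"unsupported locator strategy: {using!r}")
    return command("find_element", {"using": using, "value": value}, sid=sid)


def element_id(response):
    """Extract the element reference from a decoded Find Element response."""
    return response["value"][ELEMENT_KEY]
```

A typical sequence would be `navigate` with `{"url": ...}`, `find_element` with an XPath or CSS selector, `send_keys` with `{"text": ...}` or `click` with `{}` on the returned element, and finally `quit`.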

SLIDE 27

Node Overview

  • Highlight elements
  • Take screenshots
  • Extract data (page source, text content, attributes, …)

  • Execute JavaScript
  • Execute Selenium script
  • Waiting and synchronization
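Waiting and synchronization usually comes down to polling a condition until it holds or a timeout expires, the idea behind Selenium's explicit waits (`WebDriverWait`). Below is a minimal, library-independent Python sketch; the function name `wait_until` and its default values are assumptions, and `clock` and `sleep` are injectable only to keep it easy to test.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5,
               clock=time.monotonic, sleep=time.sleep):
    """Poll condition() until it returns a truthy value or the timeout expires.

    Exceptions raised by the condition (for example "element not found yet")
    count as a failed attempt; the most recent one becomes the cause of the
    TimeoutError raised when time runs out.
    """
    deadline = clock() + timeout
    last_error = None
    while True:
        try:
            result = condition()
            if result:
                return result
        except Exception as exc:  # keep polling, but remember the latest failure
            last_error = exc
        if clock() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s") from last_error
        sleep(poll_interval)
```

Returning the condition's own truthy value (rather than just `True`) lets a caller wait for an element and receive it in the same step.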

SLIDE 28

Outlook

  • More sample workflows
  • Documentation, how-tos, …
  • Workflow import and export for Selenium scripts

SLIDE 29
SLIDE 30

Questions? Get in touch!

  • mail@seleniumnodes.com
  • KNIME forum