Advanced Agent Builder Martin Michalowski Announcements Homework - - PowerPoint PPT Presentation

SLIDE 1

Advanced Agent Builder

Martin Michalowski

SLIDE 2

Announcements

- Homework 3 posted
  - Read the submission instructions
  - Penalties from now on if submitted incorrectly
- DO NOT leave agents on lab computers
  - They will be deleted at the end of office hours
  - Export your agents to hand in

SLIDE 3
SLIDE 4

Advanced Agent Builder

- Troubleshooting connectors
- Training sample pages
- Extraction Rules
- Filtering
- URL deconstruction
- Miscellaneous

SLIDE 5

Advanced Agent Builder

- Troubleshooting connectors
- Training sample pages
- Extraction Rules
- Filtering
- URL deconstruction
- Miscellaneous

SLIDE 6

Troubleshooting connectors

- Look at the header being submitted (in Internet view)
  - Make sure your connector is sending the right values
  - Check for redirects
  - Check for sessions
  - Makes URL deconstruction easier (covered later)
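
Agent Builder shows the submitted header in Internet view; the minimal Python sketch below only illustrates what "the right values" look like for a connector that submits an HTML form by POST. It builds the request with the standard library's `urllib.request.Request` without sending it, then lists the method, URL, headers, and form-encoded body for comparison against what the tool shows. The URL `http://example.com/search` and the fields `q` and `page` are hypothetical placeholders, not part of Agent Builder.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_form_request(url: str, fields: dict) -> Request:
    """Build, but do not send, the POST request a form connector would submit."""
    body = urlencode(fields).encode("ascii")  # spaces become '+', pairs joined by '&'
    req = Request(url, data=body, method="POST")
    req.add_header("Content-Type", "application/x-www-form-urlencoded")
    return req


def describe_request(req: Request) -> str:
    """List method, URL, headers, and body, similar to a header viewer."""
    lines = [f"{req.get_method()} {req.full_url}"]
    for name, value in req.header_items():
        lines.append(f"{name}: {value}")
    lines.append("")  # blank line separates the headers from the body
    lines.append(req.data.decode("ascii") if req.data else "")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical search form; substitute the fields your connector fills in.
    request = build_form_request(
        "http://example.com/search", {"q": "agent builder", "page": "2"}
    )
    print(describe_request(request))
```

`urllib` normalizes header names with `str.capitalize()`, so the header is listed as `Content-type`; a mismatch in the body line (a missing field, a wrong value) is the kind of error worth looking for first.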

SLIDE 7
SLIDE 8

Advanced Agent Builder

- Troubleshooting connectors
- Training sample pages
- Extraction Rules
- Filtering
- URL deconstruction
- Miscellaneous

SLIDE 9

Training Sample Pages

- Adding additional pages, because:
  - You have already fetched all pages from the connectors
  - You need one more “representative” page
- Pages can be added from:
  - A local file
  - An existing connector

SLIDE 10
SLIDE 11
SLIDE 12

Training Sample Pages

- Setting validation
  - Checks that the extracted value is correct
  - Failed validation is the main source of agent errors
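
As a rough illustration of what a validation check does, here is a minimal Python sketch that assumes each attribute in the data schema is validated by a regular expression that must match the whole extracted value. The attribute names `title`, `price`, and `zip` and their patterns are hypothetical; Agent Builder's own validation settings are configured in the tool, not in code.

```python
import re

# Hypothetical data schema: one validation pattern per attribute.
VALIDATORS = {
    "title": re.compile(r"\S.*"),               # non-empty, starts with a non-space
    "price": re.compile(r"\$\d+(?:\.\d{2})?"),  # e.g. $19 or $19.99
    "zip": re.compile(r"\d{5}"),                # five-digit US ZIP code
}


def validate(record: dict) -> list:
    """Return the attributes whose extracted value does not fully match its pattern."""
    failures = []
    for attr, pattern in VALIDATORS.items():
        value = record.get(attr, "")  # a missing attribute is treated as empty
        if pattern.fullmatch(value) is None:
            failures.append(attr)
    return failures
```

Requiring a full match rather than a substring match is the stricter choice: an extraction rule that grabs extra surrounding text fails validation instead of slipping through.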

SLIDE 13
SLIDE 14

Training Sample Pages

- Post-processing
  - Done after extraction
  - Not part of validation
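
The ordering matters: validation judges the value exactly as extracted, and post-processing transforms it only afterwards. A minimal Python sketch of that ordering, assuming a hypothetical `price` attribute extracted as text such as `$1,299.00` and post-processed into a number:

```python
import re
from decimal import Decimal

# Hypothetical validation rule for a price attribute such as "$1,299.00".
PRICE_PATTERN = re.compile(r"\$\d+(?:,\d{3})*(?:\.\d{2})?")


def extract_price(raw: str) -> Decimal:
    """Validate the raw extracted text, then post-process it into a number."""
    # Validation: judged on the value exactly as it was extracted from the page.
    if PRICE_PATTERN.fullmatch(raw) is None:
        raise ValueError(f"validation failed for {raw!r}")
    # Post-processing: runs only after validation passes and cannot change its outcome.
    return Decimal(raw.lstrip("$").replace(",", ""))
```

Because the cleanup happens after the check, a post-processing step cannot rescue a value that failed validation; the extraction rule itself has to be fixed.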

SLIDE 15
SLIDE 16

Advanced Agent Builder

- Troubleshooting connectors
- Training sample pages
- Extraction Rules
- Filtering
- URL deconstruction
- Miscellaneous

SLIDE 17

Extraction Rules

- Manual classification is possible
  - Playing with rules
- Rules are created by example
  - If you use bad sample pages, the agent learns incorrect rules
- Rule locking
  - Useful when adding items after the rules have been learned
  - Do it at your own risk
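
The slides do not say which learning algorithm Agent Builder uses, so the sketch below swaps in a simple, textbook wrapper-induction technique to show why "rules by example" depend on the samples: an LR rule that learns a left delimiter (the longest common text just before each labeled value) and a right delimiter (the longest common text just after it). The page snippets and names are hypothetical.

```python
import os
from typing import List, Optional, Tuple

Rule = Tuple[str, str]  # (left delimiter, right delimiter)


def learn_lr_rule(examples: List[Tuple[str, str]]) -> Rule:
    """Learn an LR rule from (page, labeled value) pairs.

    left  = longest common suffix of the text before each value
    right = longest common prefix of the text after each value
    """
    befores, afters = [], []
    for page, value in examples:
        start = page.index(value)  # assumes the first occurrence is the labeled one
        befores.append(page[:start])
        afters.append(page[start + len(value):])
    # os.path.commonprefix compares character by character, so reversing the
    # strings turns it into a longest-common-suffix computation.
    left = os.path.commonprefix([s[::-1] for s in befores])[::-1]
    right = os.path.commonprefix(afters)
    return left, right


def apply_lr_rule(page: str, rule: Rule) -> Optional[str]:
    """Return the text between the first left delimiter and the next right delimiter."""
    left, right = rule
    start = page.find(left)
    if start < 0:
        return None
    start += len(left)
    end = page.find(right, start)
    return page[start:end] if end >= 0 else None
```

With unrepresentative samples the learned rule overfits: training on `<b>Name:</b> Alice<br>Phone: 555-1234<br>` and `<b>Name:</b> Bob<br>Phone: 555-9876<br>` yields the right delimiter `<br>Phone: 555-`, and that rule returns `None` on a page whose phone number starts with 310.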

SLIDE 18
SLIDE 19
SLIDE 20

Advanced Agent Builder

- Troubleshooting connectors
- Training sample pages
- Extraction Rules
- Filtering
- URL deconstruction
- Miscellaneous

SLIDE 21

Filtering

- Filter the value of an attribute in the data schema view
  - Filter all data or a list (you can't filter an individual item)
- Why would you want to filter data?
  - To limit the number of items returned
    - Example: Google results
  - To limit how many times you follow "next" links
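
Both reasons for filtering amount to caps on a paginated result list, for example keeping only the first few Google results instead of walking every results page. The Python sketch below assumes a hypothetical `fetch` function that returns the items on a page plus the URL behind its "next" link; in Agent Builder the equivalent limits are set as filters, not written as code.

```python
from typing import Callable, List, Optional, Tuple

# A fetched page: its extracted items and the URL of its "next" link (None on the last page).
Page = Tuple[List[str], Optional[str]]


def crawl(start_url: str, fetch: Callable[[str], Page],
          max_items: int, max_next: int) -> List[str]:
    """Collect items from a paginated list, following at most max_next
    "next" links and returning at most max_items items."""
    items: List[str] = []
    url: Optional[str] = start_url
    follows = 0
    while url is not None and len(items) < max_items:
        page_items, next_url = fetch(url)
        items.extend(page_items)
        if follows >= max_next:  # link budget used up: stop paginating
            break
        url = next_url
        follows += 1
    return items[:max_items]  # trim any overflow from the last page fetched
```

Checking the item count before each fetch means the agent stops requesting pages as soon as it has enough results, which saves requests as well as output.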

SLIDE 22
SLIDE 23
SLIDE 24

Advanced Agent Builder

- Troubleshooting connectors
- Training sample pages
- Extraction Rules
- Filtering
- URL deconstruction
- Miscellaneous

SLIDE 25

URL deconstruction

- When you submit a form, a website sometimes redirects to a different URL
  - Session id
  - Forms within tables (Homework 3?)
- What can we do?
  - Make an output connector that points to the correct URL
- Where can we find the pieces?
  - On the page
  - By capturing the header
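
A minimal Python sketch of the idea, assuming the redirect target was captured from the header and the current session id sits in a hidden form field on the page. The field name `sessionid`, the parameter `row`, and the example URL are hypothetical; real sites name these pieces differently.

```python
import re
from typing import Dict, Optional, Tuple
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def deconstruct(url: str) -> Tuple[str, Dict[str, str]]:
    """Split a captured URL into its base (scheme://host/path) and query parameters."""
    parts = urlsplit(url)
    base = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    return base, dict(parse_qsl(parts.query))


def session_id_from_page(html: str, field: str = "sessionid") -> Optional[str]:
    """Find a hidden input's value on the page (hypothetical field name)."""
    pattern = rf'name="{re.escape(field)}"\s+value="([^"]*)"'
    match = re.search(pattern, html)
    return match.group(1) if match else None


def rebuild(base: str, params: Dict[str, str]) -> str:
    """Reassemble the URL an output connector should point to."""
    query = urlencode(params)
    return f"{base}?{query}" if query else base
```

In use: deconstruct a captured URL such as `http://example.com/cgi-bin/results.jsp?sessionid=ABC123&row=4`, replace the stale session id with the one found on the current page, change whatever parameter selects the row you want, and rebuild. A regular expression is fragile against attribute reordering; it is used here only to keep the sketch short.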

SLIDE 26
SLIDE 27
SLIDE 28
SLIDE 29
SLIDE 30

Advanced Agent Builder

- Troubleshooting connectors
- Training sample pages
- Extraction Rules
- Filtering
- URL deconstruction
- Miscellaneous

SLIDE 31

Cloning a wrapper

SLIDE 32

Parameter list

SLIDE 33

Number of rows in a list

SLIDE 34

Questions?