Processing pipelines
ADVANCED NLP WITH SPACY
Ines Montani
spaCy core developer
What happens when you call nlp?
doc = nlp("This is a sentence.")
Built-in pipeline components

Name     Description              Creates
tagger   Part-of-speech tagger    Token.tag
parser   Dependency parser        Token.dep, Token.head, Doc.sents, Doc.noun_chunks
ner      Named entity recognizer  Doc.ents, Token.ent_iob, Token.ent_type
textcat  Text classifier          Doc.cats
Pipeline defined in order in the model's meta.json
Built-in components need binary data to make predictions
nlp.pipe_names : list of pipeline component names
print(nlp.pipe_names)

['tagger', 'parser', 'ner']
nlp.pipeline : list of (name, component) tuples
print(nlp.pipeline)

[('tagger', <spacy.pipeline.Tagger>),
 ('parser', <spacy.pipeline.DependencyParser>),
 ('ner', <spacy.pipeline.EntityRecognizer>)]
Custom pipeline components
Make a function execute automatically when you call nlp
Add your own metadata to documents and tokens
Update built-in attributes like doc.ents
Function that takes a doc, modifies it and returns it
Can be added using the nlp.add_pipe method
def custom_component(doc):
    # Do something to the doc here
    return doc

nlp.add_pipe(custom_component)
Argument  Description           Example
last      If True, add last     nlp.add_pipe(component, last=True)
first     If True, add first    nlp.add_pipe(component, first=True)
before    Add before component  nlp.add_pipe(component, before='ner')
after     Add after component   nlp.add_pipe(component, after='tagger')
# Create the nlp object
nlp = spacy.load('en_core_web_sm')

# Define a custom component
def custom_component(doc):
    # Print the doc's length
    print('Doc length:', len(doc))
    # Return the doc object
    return doc

# Add the component first in the pipeline
nlp.add_pipe(custom_component, first=True)

# Print the pipeline component names
print('Pipeline:', nlp.pipe_names)

Pipeline: ['custom_component', 'tagger', 'parser', 'ner']
# Create the nlp object
nlp = spacy.load('en_core_web_sm')

# Define a custom component
def custom_component(doc):
    # Print the doc's length
    print('Doc length:', len(doc))
    # Return the doc object
    return doc

# Add the component first in the pipeline
nlp.add_pipe(custom_component, first=True)

# Process a text
doc = nlp("Hello world!")

Doc length: 3
Extension attributes
Add custom metadata to documents, tokens and spans
Accessible via the ._ property
doc._.title = 'My document'
token._.is_color = True
span._.has_color = False
Registered on the global Doc, Token or Span using the set_extension method
# Import global classes
from spacy.tokens import Doc, Token, Span

# Set extensions on the Doc, Token and Span
Doc.set_extension('title', default=None)
Token.set_extension('is_color', default=False)
Span.set_extension('has_color', default=False)
Set a default value that can be overwritten
from spacy.tokens import Token

# Set extension on the Token with default value
Token.set_extension('is_color', default=False)

doc = nlp("The sky is blue.")

# Overwrite extension attribute value
doc[3]._.is_color = True
Define a getter and an optional setter function
Getter only called when you retrieve the attribute value
from spacy.tokens import Token

# Define getter function
def get_is_color(token):
    colors = ['red', 'yellow', 'blue']
    return token.text in colors

# Set extension on the Token with getter
Token.set_extension('is_color', getter=get_is_color)

doc = nlp("The sky is blue.")
print(doc[3]._.is_color, '-', doc[3].text)

True - blue
Span extensions should almost always use a getter
from spacy.tokens import Span

# Define getter function
def get_has_color(span):
    colors = ['red', 'yellow', 'blue']
    return any(token.text in colors for token in span)

# Set extension on the Span with getter
Span.set_extension('has_color', getter=get_has_color)

doc = nlp("The sky is blue.")
print(doc[1:4]._.has_color, '-', doc[1:4].text)
print(doc[0:2]._.has_color, '-', doc[0:2].text)

True - sky is blue
False - The sky
Assign a function that becomes available as an object method
Lets you pass arguments to the extension function
from spacy.tokens import Doc

# Define method with arguments
def has_token(doc, token_text):
    in_doc = token_text in [token.text for token in doc]
    return in_doc

# Set extension on the Doc with method
Doc.set_extension('has_token', method=has_token)

doc = nlp("The sky is blue.")
print(doc._.has_token('blue'), '- blue')
print(doc._.has_token('cloud'), '- cloud')

True - blue
False - cloud
Scaling and performance
Use nlp.pipe method
Processes texts as a stream, yields Doc objects
Much faster than calling nlp on each text

BAD:
docs = [nlp(text) for text in LOTS_OF_TEXTS]
GOOD:
docs = list(nlp.pipe(LOTS_OF_TEXTS))
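A runnable sketch of the streaming pattern, using a blank English pipeline so no model download is needed (the texts are placeholders):

```python
import spacy

nlp = spacy.blank("en")
texts = ["First text", "Second text", "Third text"]

# nlp.pipe processes the texts as a stream and yields Doc objects
docs = list(nlp.pipe(texts))
print([doc.text for doc in docs])  # ['First text', 'Second text', 'Third text']
```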
Setting as_tuples=True on nlp.pipe lets you pass in (text, context) tuples
Yields (doc, context) tuples
Useful for associating metadata with the doc
data = [
    ('This is a text', {'id': 1, 'page_number': 15}),
    ('And another text', {'id': 2, 'page_number': 16}),
]

for doc, context in nlp.pipe(data, as_tuples=True):
    print(doc.text, context['page_number'])

This is a text 15
And another text 16
from spacy.tokens import Doc

Doc.set_extension('id', default=None)
Doc.set_extension('page_number', default=None)

data = [
    ('This is a text', {'id': 1, 'page_number': 15}),
    ('And another text', {'id': 2, 'page_number': 16}),
]

for doc, context in nlp.pipe(data, as_tuples=True):
    doc._.id = context['id']
    doc._.page_number = context['page_number']
If you only need the tokenizer, don't run the whole pipeline!
Use nlp.make_doc to turn a text into a Doc object

BAD:
doc = nlp("Hello world!")
GOOD:
doc = nlp.make_doc("Hello world!")
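To illustrate, nlp.make_doc only runs the tokenizer, so it works even on a blank pipeline (a minimal sketch):

```python
import spacy

nlp = spacy.blank("en")

# make_doc tokenizes the text without running any pipeline components
doc = nlp.make_doc("Hello world!")
print([token.text for token in doc])  # ['Hello', 'world', '!']
```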
Use nlp.disable_pipes to temporarily disable one or more pipes
# Disable tagger and parser
with nlp.disable_pipes('tagger', 'parser'):
    # Process the text and print the entities
    doc = nlp(text)
    print(doc.ents)
Restores them after the with block