
Processing pipelines (Advanced NLP with spaCy) - Ines Montani - PowerPoint PPT Presentation



  1. Processing pipelines (ADVANCED NLP WITH SPACY) - Ines Montani, spaCy core developer

  2. What happens when you call nlp?

     doc = nlp("This is a sentence.")

  3. Built-in pipeline components

     Name    | Description              | Creates
     tagger  | Part-of-speech tagger    | Token.tag
     parser  | Dependency parser        | Token.dep, Token.head, Doc.sents, Doc.noun_chunks
     ner     | Named entity recognizer  | Doc.ents, Token.ent_iob, Token.ent_type
     textcat | Text classifier          | Doc.cats
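     To make the table above concrete, here is a minimal sketch of where those attributes end up on a processed Doc. It assumes the en_core_web_sm model used elsewhere in these slides is installed; the exact tags and entities depend on the model.

     import spacy

     # Assumes en_core_web_sm is installed; each built-in component
     # fills in the attributes listed in the table above
     nlp = spacy.load('en_core_web_sm')
     doc = nlp("Apple is looking at buying a U.K. startup.")

     print([(token.text, token.tag_) for token in doc])   # set by the tagger
     print([(token.text, token.dep_) for token in doc])   # set by the parser
     print([(ent.text, ent.label_) for ent in doc.ents])  # set by the ner component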

  4. Under the hood
     - Pipeline defined in the model's meta.json, in order
     - Built-in components need binary data to make predictions
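     As a small sketch of this, the loaded Language object exposes the model's meta.json as nlp.meta (assuming a packaged model such as en_core_web_sm); the exact keys depend on the model package.

     import spacy

     # The model's meta.json is available as nlp.meta after loading
     nlp = spacy.load('en_core_web_sm')
     print(nlp.meta['lang'], nlp.meta['name'])  # e.g. en core_web_sm
     print(nlp.meta['pipeline'])                # e.g. ['tagger', 'parser', 'ner']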

  5. Pipeline attributes
     - nlp.pipe_names: list of pipeline component names

     print(nlp.pipe_names)
     ['tagger', 'parser', 'ner']

     - nlp.pipeline: list of (name, component) tuples

     print(nlp.pipeline)
     [('tagger', <spacy.pipeline.Tagger>),
      ('parser', <spacy.pipeline.DependencyParser>),
      ('ner', <spacy.pipeline.EntityRecognizer>)]

  6. Let's practice!

  7. Custom pipeline components (ADVANCED NLP WITH SPACY) - Ines Montani, spaCy core developer

  8. Why custom components?
     - Make a function execute automatically when you call nlp
     - Add your own metadata to documents and tokens
     - Updating built-in attributes like doc.ents

  9. Anatomy of a component (1)
     - Function that takes a doc, modifies it and returns it
     - Can be added using the nlp.add_pipe method

     def custom_component(doc):
         # Do something to the doc here
         return doc

     nlp.add_pipe(custom_component)

  10. Anatomy of a component (2)

      def custom_component(doc):
          # Do something to the doc here
          return doc

      nlp.add_pipe(custom_component)

      Argument | Description          | Example
      last     | If True, add last    | nlp.add_pipe(component, last=True)
      first    | If True, add first   | nlp.add_pipe(component, first=True)
      before   | Add before component | nlp.add_pipe(component, before='ner')
      after    | Add after component  | nlp.add_pipe(component, after='tagger')
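      A short sketch of those placement arguments, using two hypothetical components (length_component and entity_component are illustration names, not part of spaCy) and the v2-style add_pipe used throughout these slides:

      import spacy

      nlp = spacy.load('en_core_web_sm')

      # Hypothetical components, defined only to illustrate placement
      def length_component(doc):
          print('Doc length:', len(doc))
          return doc

      def entity_component(doc):
          print('Entities:', [ent.text for ent in doc.ents])
          return doc

      # Add one component at the very start of the pipeline...
      nlp.add_pipe(length_component, first=True)
      # ...and one right after the named entity recognizer
      nlp.add_pipe(entity_component, after='ner')

      print(nlp.pipe_names)
      # ['length_component', 'tagger', 'parser', 'ner', 'entity_component']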

  11. Example: a simple component (1)

      # Create the nlp object
      nlp = spacy.load('en_core_web_sm')

      # Define a custom component
      def custom_component(doc):
          # Print the doc's length
          print('Doc length:', len(doc))
          # Return the doc object
          return doc

      # Add the component first in the pipeline
      nlp.add_pipe(custom_component, first=True)

      # Print the pipeline component names
      print('Pipeline:', nlp.pipe_names)

      Pipeline: ['custom_component', 'tagger', 'parser', 'ner']

  12. Example: a simple component (2)

      # Create the nlp object
      nlp = spacy.load('en_core_web_sm')

      # Define a custom component
      def custom_component(doc):
          # Print the doc's length
          print('Doc length:', len(doc))
          # Return the doc object
          return doc

      # Add the component first in the pipeline
      nlp.add_pipe(custom_component, first=True)

      # Process a text
      doc = nlp("Hello world!")

      Doc length: 3

  13. Let's practice!

  14. Extension attributes (ADVANCED NLP WITH SPACY) - Ines Montani, spaCy core developer

  15. Setting custom attributes
      - Add custom metadata to documents, tokens and spans
      - Accessible via the ._ property

      doc._.title = 'My document'
      token._.is_color = True
      span._.has_color = False

      - Registered on the global Doc, Token or Span using the set_extension method

      # Import global classes
      from spacy.tokens import Doc, Token, Span

      # Set extensions on the Doc, Token and Span
      Doc.set_extension('title', default=None)
      Token.set_extension('is_color', default=False)
      Span.set_extension('has_color', default=False)
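      Putting those pieces together, here is a minimal sketch of reading and overwriting a registered extension. It assumes a fresh session, since set_extension raises an error if the same name is registered twice.

      import spacy
      from spacy.tokens import Doc

      # Register once per session; re-registering the same name raises an error
      Doc.set_extension('title', default=None)

      nlp = spacy.load('en_core_web_sm')
      doc = nlp("The sky is blue.")

      print(doc._.title)           # None (the default)
      doc._.title = 'My document'  # overwrite for this doc only
      print(doc._.title)           # My document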

  16. Extension attribute types
      1. Attribute extensions
      2. Property extensions
      3. Method extensions

  17. Attribute extensions
      - Set a default value that can be overwritten

      from spacy.tokens import Token

      # Set extension on the Token with default value
      Token.set_extension('is_color', default=False)

      doc = nlp("The sky is blue.")

      # Overwrite extension attribute value
      doc[3]._.is_color = True

  18. Property extensions (1)
      - Define a getter and an optional setter function
      - Getter only called when you retrieve the attribute value

      from spacy.tokens import Token

      # Define getter function
      def get_is_color(token):
          colors = ['red', 'yellow', 'blue']
          return token.text in colors

      # Set extension on the Token with getter
      Token.set_extension('is_color', getter=get_is_color)

      doc = nlp("The sky is blue.")
      print(doc[3]._.is_color, '-', doc[3].text)

      True - blue

  19. Property extensions (2)
      - Span extensions should almost always use a getter

      from spacy.tokens import Span

      # Define getter function
      def get_has_color(span):
          colors = ['red', 'yellow', 'blue']
          return any(token.text in colors for token in span)

      # Set extension on the Span with getter
      Span.set_extension('has_color', getter=get_has_color)

      doc = nlp("The sky is blue.")
      print(doc[1:4]._.has_color, '-', doc[1:4].text)
      print(doc[0:2]._.has_color, '-', doc[0:2].text)

      True - sky is blue
      False - The sky

  20. Method extensions
      - Assign a function that becomes available as an object method
      - Lets you pass arguments to the extension function

      from spacy.tokens import Doc

      # Define method with arguments
      def has_token(doc, token_text):
          in_doc = token_text in [token.text for token in doc]
          return in_doc

      # Set extension on the Doc with method
      Doc.set_extension('has_token', method=has_token)

      doc = nlp("The sky is blue.")
      print(doc._.has_token('blue'), '- blue')
      print(doc._.has_token('cloud'), '- cloud')

      True - blue
      False - cloud

  21. Let's practice!

  22. Scaling and performance (ADVANCED NLP WITH SPACY) - Ines Montani, spaCy core developer

  23. Processing large volumes of text
      - Use the nlp.pipe method
      - Processes texts as a stream, yields Doc objects
      - Much faster than calling nlp on each text

      BAD:  docs = [nlp(text) for text in LOTS_OF_TEXTS]
      GOOD: docs = list(nlp.pipe(LOTS_OF_TEXTS))
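      A minimal sketch of the streaming pattern, with a hypothetical LOTS_OF_TEXTS list standing in for a real corpus (the batch_size argument is optional):

      import spacy

      nlp = spacy.load('en_core_web_sm')

      # Hypothetical stand-in for a large collection of texts
      LOTS_OF_TEXTS = ["First document.", "Second document.", "Third document."]

      # nlp.pipe yields Doc objects as a stream, so you can iterate lazily
      # instead of building the full list up front
      for doc in nlp.pipe(LOTS_OF_TEXTS, batch_size=50):
          print([token.text for token in doc])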

  24. Passing in context (1)
      - Setting as_tuples=True on nlp.pipe lets you pass in (text, context) tuples
      - Yields (doc, context) tuples
      - Useful for associating metadata with the doc

      data = [
          ('This is a text', {'id': 1, 'page_number': 15}),
          ('And another text', {'id': 2, 'page_number': 16}),
      ]

      for doc, context in nlp.pipe(data, as_tuples=True):
          print(doc.text, context['page_number'])

      This is a text 15
      And another text 16

  25. Passing in context (2)

      from spacy.tokens import Doc

      Doc.set_extension('id', default=None)
      Doc.set_extension('page_number', default=None)

      data = [
          ('This is a text', {'id': 1, 'page_number': 15}),
          ('And another text', {'id': 2, 'page_number': 16}),
      ]

      for doc, context in nlp.pipe(data, as_tuples=True):
          doc._.id = context['id']
          doc._.page_number = context['page_number']
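      As a follow-up sketch, the attached metadata then travels with each Doc and can be read back from the ._ extensions later. The data here is the same hypothetical example as above, kept self-contained:

      import spacy
      from spacy.tokens import Doc

      nlp = spacy.load('en_core_web_sm')
      Doc.set_extension('page_number', default=None)

      data = [
          ('This is a text', {'id': 1, 'page_number': 15}),
          ('And another text', {'id': 2, 'page_number': 16}),
      ]

      docs = []
      for doc, context in nlp.pipe(data, as_tuples=True):
          doc._.page_number = context['page_number']
          docs.append(doc)

      # The metadata is now part of the Doc objects themselves
      for doc in docs:
          print(doc.text, '-> page', doc._.page_number)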

  26. Using only the tokenizer
      - Don't run the whole pipeline!

  27. Using only the tokenizer (2)
      - Use nlp.make_doc to turn a text into a Doc object

      BAD:  doc = nlp("Hello world!")
      GOOD: doc = nlp.make_doc("Hello world!")
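      A quick sketch of the difference: a Doc created with nlp.make_doc only has tokens, so attributes set by later components stay empty.

      import spacy

      nlp = spacy.load('en_core_web_sm')

      # Only the tokenizer runs here; no tagging, parsing or entity recognition
      doc = nlp.make_doc("Hello world! Only the tokenizer ran.")
      print([token.text for token in doc])
      print(doc.ents)  # () because the ner component never ran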

  28. Disabling pipeline components
      - Use nlp.disable_pipes to temporarily disable one or more pipes

      # Disable tagger and parser
      with nlp.disable_pipes('tagger', 'parser'):
          # Process the text and print the entities
          doc = nlp(text)
          print(doc.ents)

      - Restores them after the with block
      - Only runs the remaining components
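      A small sketch showing that the disabled components come back after the with block (text is a hypothetical example string):

      import spacy

      nlp = spacy.load('en_core_web_sm')
      text = "Apple is looking at buying a U.K. startup."  # hypothetical example text

      with nlp.disable_pipes('tagger', 'parser'):
          print(nlp.pipe_names)  # ['ner'] -- only the remaining components run
          doc = nlp(text)
          print(doc.ents)

      print(nlp.pipe_names)      # ['tagger', 'parser', 'ner'] restored afterwards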

  29. Let's practice!
