
Introduction to spaCy: Advanced NLP with spaCy, Ines Montani



  1. Introduction to spaCy
     ADVANCED NLP WITH SPACY
     Ines Montani, spaCy core developer

  2. The nlp object

         # Import the English language class
         from spacy.lang.en import English

         # Create the nlp object
         nlp = English()

     The nlp object contains the processing pipeline and includes language-specific rules for tokenization etc.

  3. The Doc object

         # Created by processing a string of text with the nlp object
         doc = nlp("Hello world!")

         # Iterate over tokens in a Doc
         for token in doc:
             print(token.text)

     Output:

         Hello
         world
         !

  4. The Token object

         doc = nlp("Hello world!")

         # Index into the Doc to get a single Token
         token = doc[1]

         # Get the token text via the .text attribute
         print(token.text)

     Output:

         world

  5. The Span object

         doc = nlp("Hello world!")

         # A slice from the Doc is a Span object (the end index is exclusive)
         span = doc[1:3]

         # Get the span text via the .text attribute
         print(span.text)

     Output:

         world!

  6. Lexical attributes

         doc = nlp("It costs $5.")

         print('Index:   ', [token.i for token in doc])
         print('Text:    ', [token.text for token in doc])

         print('is_alpha:', [token.is_alpha for token in doc])
         print('is_punct:', [token.is_punct for token in doc])
         print('like_num:', [token.like_num for token in doc])

     Output:

         Index:    [0, 1, 2, 3, 4]
         Text:     ['It', 'costs', '$', '5', '.']
         is_alpha: [True, True, False, False, False]
         is_punct: [False, False, False, False, True]
         like_num: [False, False, False, True, False]
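The Doc, Token, Span and lexical-attribute slides can be tied together in one short sketch. It assumes only the tokenizer-only `English()` object from slide 2, so no trained model is required; the variable names are illustrative.

```python
from spacy.lang.en import English

# Tokenizer-only pipeline: no trained components are loaded
nlp = English()
doc = nlp("It costs $5.")

# Indexing gives a Token; slicing gives a Span (the end index is exclusive)
token = doc[1]
span = doc[2:4]

print(token.text, token.i)              # token text and its index in the Doc
print(span.text, span.start, span.end)  # span text and its token boundaries

# Lexical attributes depend only on the token text, not on its context
numbers = [t.text for t in doc if t.like_num]
print(numbers)

# A bare English() object has no pipeline components
print(nlp.pipe_names)
```

Because lexical attributes such as `like_num` need no context, they are available even without any trained components in the pipeline.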

  7. Let's practice!

  8. Statistical Models

  9. What are statistical models?
     Enable spaCy to predict linguistic attributes in context: part-of-speech tags, syntactic dependencies, named entities.
     Trained on labeled example texts.
     Can be updated with more examples to fine-tune predictions.

  10. Model Packages

          import spacy

          nlp = spacy.load('en_core_web_sm')

      A model package contains binary weights, the vocabulary, and meta information (language, pipeline).
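`spacy.load` fails if the model package is not installed, so a guarded load can be convenient. The following is a minimal sketch, not part of the slides: the helper name `load_model_or_none` is hypothetical, and it relies on `spacy.util.is_package` to check for the package first.

```python
import spacy
from spacy.util import is_package

MODEL_NAME = "en_core_web_sm"


def load_model_or_none(name):
    """Hypothetical helper: load a model package if installed, else return None."""
    if not is_package(name):
        print(f"Model {name!r} is not installed; try: python -m spacy download {name}")
        return None
    return spacy.load(name)


nlp = load_model_or_none(MODEL_NAME)
if nlp is not None:
    # Meta information shipped with the package
    print(nlp.meta["lang"], nlp.meta["name"])
    # Names of the pipeline components the package provides
    print(nlp.pipe_names)
```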

  11. Predicting Part-of-speech Tags

          import spacy

          # Load the small English model
          nlp = spacy.load('en_core_web_sm')

          # Process a text
          doc = nlp("She ate the pizza")

          # Iterate over the tokens
          for token in doc:
              # Print the text and the predicted part-of-speech tag
              print(token.text, token.pos_)

      Output:

          She PRON
          ate VERB
          the DET
          pizza NOUN

  12. Predicting Syntactic Dependencies

          for token in doc:
              print(token.text, token.pos_, token.dep_, token.head.text)

      Output:

          She PRON nsubj ate
          ate VERB ROOT ate
          the DET det pizza
          pizza NOUN dobj ate

  13. Label    Description             Example
      nsubj    nominal subject         She
      dobj     direct object           pizza
      det      determiner (article)    the
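The `dep_` and `head` attributes can be combined to pull out who did what to whom. The sketch below is an illustration only: instead of loading a trained model, it assumes spaCy v3 and builds the `Doc` by hand, copying the annotations from the slide output above.

```python
from spacy.lang.en import English
from spacy.tokens import Doc

nlp = English()

# Assumption: annotations are written by hand (copied from the slide output)
# rather than predicted, using the spaCy v3 Doc keyword arguments.
doc = Doc(
    nlp.vocab,
    words=["She", "ate", "the", "pizza"],
    pos=["PRON", "VERB", "DET", "NOUN"],
    heads=[1, 1, 3, 1],  # absolute index of each token's head; ROOT points to itself
    deps=["nsubj", "ROOT", "det", "dobj"],
)

# Walk the tokens and report the subject and object of the verb
for token in doc:
    if token.dep_ == "nsubj":
        print("subject:", token.text, "-> verb:", token.head.text)
    elif token.dep_ == "dobj":
        print("object:", token.text, "-> verb:", token.head.text)
```

With a trained model loaded, the same loop works unchanged on `nlp("She ate the pizza")`.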

  14. Predicting Named Entities

          # Process a text
          doc = nlp(u"Apple is looking at buying U.K. startup for $1 billion")

          # Iterate over the predicted entities
          for ent in doc.ents:
              # Print the entity text and its label
              print(ent.text, ent.label_)

      Output:

          Apple ORG
          U.K. GPE
          $1 billion MONEY

  15. Tip: the explain method
      Get quick definitions of the most common tags and labels.

          spacy.explain('GPE')
          'Countries, cities, states'

          spacy.explain('NNP')
          'noun, proper singular'

          spacy.explain('dobj')
          'direct object'

  16. Let's practice!

  17. Rule-based Matching

  18. Why not just regular expressions?
      Match on Doc objects, not just strings.
      Match on tokens and token attributes.
      Use the model's predictions.
      Example: "duck" (verb) vs. "duck" (noun).

  19. Match patterns
      Lists of dictionaries, one per token.

      Match exact token texts:

          [{'ORTH': 'iPhone'}, {'ORTH': 'X'}]

      Match lexical attributes:

          [{'LOWER': 'iphone'}, {'LOWER': 'x'}]

      Match any token attributes:

          [{'LEMMA': 'buy'}, {'POS': 'NOUN'}]
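The difference between `ORTH` (exact text) and `LOWER` (lowercased text) is easiest to see side by side. This is a minimal sketch, assuming spaCy v3 and a tokenizer-only `English()` pipeline; the helper `count_matches` and the example sentence are made up for illustration.

```python
from spacy.lang.en import English
from spacy.matcher import Matcher

nlp = English()
doc = nlp("iPhone X, IPHONE X and iphone x")


def count_matches(pattern):
    """Hypothetical helper: number of matches of a single pattern in doc."""
    matcher = Matcher(nlp.vocab)
    # spaCy v3 call form: the patterns are passed as a list
    matcher.add("PATTERN", [pattern])
    return len(matcher(doc))


# ORTH compares the exact token text, so only the first spelling matches
exact_count = count_matches([{"ORTH": "iPhone"}, {"ORTH": "X"}])

# LOWER compares the lowercased token text, so every spelling matches
lower_count = count_matches([{"LOWER": "iphone"}, {"LOWER": "x"}])

print(exact_count, lower_count)
```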

  20. Using the Matcher (1)

          import spacy

          # Import the Matcher
          from spacy.matcher import Matcher

          # Load a model and create the nlp object
          nlp = spacy.load('en_core_web_sm')

          # Initialize the matcher with the shared vocab
          matcher = Matcher(nlp.vocab)

          # Add the pattern to the matcher
          # (spaCy v2 signature; in spaCy v3 use matcher.add('IPHONE_PATTERN', [pattern]))
          pattern = [{'ORTH': 'iPhone'}, {'ORTH': 'X'}]
          matcher.add('IPHONE_PATTERN', None, pattern)

          # Process some text
          doc = nlp("New iPhone X release date leaked")

          # Call the matcher on the doc
          matches = matcher(doc)

  21. Using the Matcher (2)

          # Call the matcher on the doc
          doc = nlp("New iPhone X release date leaked")
          matches = matcher(doc)

          # Iterate over the matches
          for match_id, start, end in matches:
              # Get the matched span
              matched_span = doc[start:end]
              print(matched_span.text)

      Output:

          iPhone X

      match_id: hash value of the pattern name
      start: start index of matched span
      end: end index of matched span
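Since `match_id` is a hash, it can be turned back into the pattern name through the shared vocabulary's string store. A minimal sketch, assuming spaCy v3 (list-of-patterns form of `matcher.add`) and a tokenizer-only `English()` pipeline, which is enough for an `ORTH` pattern:

```python
from spacy.lang.en import English
from spacy.matcher import Matcher

nlp = English()
matcher = Matcher(nlp.vocab)

# spaCy v3 call form: the second argument is a list of patterns
matcher.add("IPHONE_PATTERN", [[{"ORTH": "iPhone"}, {"ORTH": "X"}]])

doc = nlp("New iPhone X release date leaked")

results = []
for match_id, start, end in matcher(doc):
    # Look up the pattern name for the hash in the string store
    name = nlp.vocab.strings[match_id]
    results.append((name, start, end, doc[start:end].text))

print(results)
```

Resolving the name this way is useful when several patterns are registered on the same matcher and each match has to be attributed to its rule.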

  22. Matching lexical attributes

          pattern = [
              {'IS_DIGIT': True},
              {'LOWER': 'fifa'},
              {'LOWER': 'world'},
              {'LOWER': 'cup'},
              {'IS_PUNCT': True}
          ]

          doc = nlp("2018 FIFA World Cup: France won!")

      Matches:

          2018 FIFA World Cup:

  23. Matching other token attributes

          pattern = [
              {'LEMMA': 'love', 'POS': 'VERB'},
              {'POS': 'NOUN'}
          ]

          doc = nlp("I loved dogs but now I love cats more.")

      Matches:

          loved dogs
          love cats

  24. Using operators and quantifiers (1)

          pattern = [
              {'LEMMA': 'buy'},
              {'POS': 'DET', 'OP': '?'},  # optional: match 0 or 1 times
              {'POS': 'NOUN'}
          ]

          doc = nlp("I bought a smartphone. Now I'm buying apps.")

      Matches:

          bought a smartphone
          buying apps

  25. Using operators and quantifiers (2)

      Operator       Description
      {'OP': '!'}    Negation: match 0 times
      {'OP': '?'}    Optional: match 0 or 1 times
      {'OP': '+'}    Match 1 or more times
      {'OP': '*'}    Match 0 or more times
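The optional operator can also be tried without a trained model by restricting the pattern to lexical attributes. This is a minimal sketch, assuming spaCy v3 and a tokenizer-only `English()` pipeline; the sentence and the pattern name `BUY_PHONE` are made up for illustration.

```python
from spacy.lang.en import English
from spacy.matcher import Matcher

nlp = English()
matcher = Matcher(nlp.vocab)

# "buy", then an optional "a", then "phone"
pattern = [
    {"LOWER": "buy"},
    {"LOWER": "a", "OP": "?"},  # optional: match 0 or 1 times
    {"LOWER": "phone"},
]
# spaCy v3 call form: the patterns are passed as a list
matcher.add("BUY_PHONE", [pattern])

doc = nlp("I will buy a phone. You buy phone cases.")

# Collect the text of every matched span
found = [doc[start:end].text for _, start, end in matcher(doc)]
print(found)
```

The same pattern covers both the variant with the article and the one without it, which would otherwise need two separate patterns.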

  26. Let's practice!
