Python basics for NLP Type : which python3 in the command prompt - PDF document

Programming environment  Is Python installed Python basics for NLP – Type : which python3 in the command prompt (applications->utilitaires->Terminal) Vincent Claveau – It should answer: /usr/bin/python3 IRISA-CNRS  Text file editor like emacs, gedit... – Create a new document named hello.py, save it as pure txt (not rtf, doc...) and type: #!/usr/bin/python3 #coding: utf-8 print('hello world\n') Python basics Programming environment Data structures  Scalar: var  Go in the directory where you saved the file – – Windows: cd Dekstop cd .. integer: var = 3 float: var = 7.456 – – Unix/Mac: ls pwd string: var = 'I love Python'  Make hello.py an executable – boolean: var = True var = False  List (table, array...) – Type chmod u+x hello.py in the command prompt t_misc = ['titi', 3.1415, var] t_misc.append('toto')  Run your program t_misc[0] = 'tutu'  Dictionary (associative array, hash...) – ./hello.py h_misc2 = { 'pi': 3.1415, 12:'December' } h_misc2['vincent'] = 'claveau' Python basics Python basics Programming structures Programming structures • • Conditional structures (note the indent) Loops if nb == 5: while nb < 100: … … elif line == “blabla\n” or 3 in t_prime: for i in range(0,len(t_word)): … … else for my_key in h_word2count: … …

Python basics Python basics • • Regular expressions (regex) About lists t_integer = [1,2,3,4,5,6,7,8,9,10] t_even = [ num for num in t_integer if num%2 == 0 ] m = re.search( '^(D|d)upon.', line) t_decreasing = sorted( t_integer, key=lambda a: -a) if m is not None: … t_word_count = [ ('toto',2), ('titi',32), ('tata',12) ] m = re.search( '^[^\t]*\t(.+)$', line) t_dec_pair = sorted( t_word_count, key=lambda a,b: (-b,a) ) if m is not None: h_count[ m.group(1) ] += 1 var = re.sub( '[ \t]{2,8}([A-Z])', '\t\g<1>', var) Python basics Python basics • • About dictionary functions def myScore( tf , df , N = 1): h_word2count = { 'toto': 2, 'titi': 32, 'tata': 12 } if df > 0: for (w,c) in h_word2count.items( ) : …. score = tf**2 * log( N / df ) else: h_doc2word2count = defaultdict(lambda: defaultdict(lambda: 0) ) print('something strange happens here\n') h_doc2word2count[ 'doc_450' ][ 'toto' ] += 1 score = 0 t_doc = h_doc2word2count.keys( ) return score t_count = h_doc2word2count[ 'doc_450' ].values( ) num = 540 for w in h_word2info: for (doc_id, freq) in h_word2info[w]: print myScore(freq, h_word2df[w],num) Python examples Python examples  Read an (existing) file my_file.txt  Explore a file : count lines fh = open('my_file.txt', 'r') line_nb = 0 your operations on file handle fh for line in open('mon_fichier.txt', 'r'): fh.close() line_nb += 1 or for line in codecs.open('my_file.txt', 'r', 'utf-8'): … or with open(...) as fh: print('There are' , line_nb , 'lines\n') for line in fh: ….  Write (create) a file output.txt fh = codecs.open('my_file.txt', 'w', 'utf-8') fh.write('I write strings !\n' + 'I convert if needed ' + str(10/3) + '\n') fh.close()

Python examples Python examples  Explore a file: put lines in an array – reverse  Explore a file : count lines with a dictionary – printing of the file frequency of beginning letter of line t_lines = [ ] h_line2count = defaultdict(lambda: 0) # better idea than { } for line in open( 'my_file.txt', 'r' ): re_letter = re.compile('^[^A-Za-z]*([A-Za-z])') t_lines.append( line.rstrip('\r\n') ) for line in open('my_file.txt', 'r' ) : ## h_line2count[ line.rstrip('\r\n') ] += 1 for i in range(len(t_lines)): m = re_letter.search(line) print t_lines[len(t_lines)-1-i] if m is not None: h_line2count[ m.group(1) ]+=1 print sorted(h_line2count.items(), key=lambda (c,v): (-v,c)) Python typical headers Python typical headers  Load common libraries  Read arguments from command line import argparse from __future__ import division parser = argparse.ArgumentParser() import os, subprocess, codecs, sys, glob, re, getopt, random, operator, pickle parser.add_argument("-t", "--train", dest="file_train", from math import * help="file containing the training docs", metavar="FILE") reload(sys) parser.add_argument("-s", "--stop", dest="file_stop", sys.setdefaultencoding('utf-8') default='common_words.total_fr.txt', help="FILE contains stop words", metavar="FILE") from collections import defaultdict parser.add_argument("-v", "--verbose", action="store_false", prg = sys.argv[0] dest="verbose", default=True, help="print status messages to stdout") def P(output=''): input(output+"\nDebug point; Press ENTER to continue") args = parser.parse_args() def Info(output=''): sys.stderr.write(output+'\n') NLTK Python exercises  Count the number of words in the PoS tagged  Natural Language Tool Kit corpus → ~4400 – Contains corpus, resources  Count how many times each word appears → – Contains basic tools: tagger, chunker... the:202  Use a local easy-install  Count the average occurrence of common nouns – easy-install --instal-dir monRepPython (NN, NNS, NNP) → 2.61099 –  Count word co-occurrences (2 words occurring Check your PYTHONPATH in a same sentence)  Install the needed data/tools import nltk nltk.download()

Python debugging Python debugging Compiling errors Runtime errors  First focus on the first syntax error  Common causes – syntax error at ./my_prg.py line 1209, near "}" – Uninitialised variables: h_occ['Le'], – common causes: wrong indent or paren ('(','{') mismatch t_count[502]  Then process the other errors as they – Division by 0, log 0 appears – Global symbol "df" unknown – Common causes : forgot to declare/initialize a variable (toto = 0)

Python basics for NLP Type : which python3 in the command prompt - PDF document

Programming environment Is Python installed Python basics for NLP Type : which python3 in the command prompt (applications->utilitaires->Terminal) Vincent Claveau It should answer: /usr/bin/python3 IRISA-CNRS Text file

Python for Data Science Overview of Python Why Python Installing Python Installing Python Modules

SI485i : NLP Missing Topics and the Future Who cares about NLP? NLP has expanded quickly

SI425 : NLP Missing Topics and the Future Who cares about NLP? NLP has expanded quickly

Python Tidbits Python created by that guy ---> Python is named after Monty Pythons

NLP: Two pictures Wordnet and Word Sense Problem NLP Disambiguation Semantics NLP Trinity

Recurrent Neural Networks Graham Neubig Site https://phontron.com/class/nn4nlp2017/ NLP and

Looping through Python data structures Justin Kiggins Product Manager DataCamp Python for

HPC Python Programming Ramses van Zon July 10, 2019 Ramses van Zon HPC Python Programming July

First Tool: Python! Introduction to python programming Gholamhossein Tavasoli @ ZNU First Tool:

Ontologies for NLP NLP for Ontologies FOIS 2014 - LogOnto Workshop on Logics and Ontologies for

Getting Started with Python The Python Interpreter A piece of software that executes

We already know Java. Why learn Python? Using Python to Implement Algorithms Python has far less

Computer Science Class XI ( As per CBSE Board) Visit : python.mykvs.in for regular updates

CMPT 120 Basics of Python Summer 2012 Instructor: Hassan Khosravi Python A simple

= Introduction to Computer Programming Python Basics CSCI-UA 2 High-level programming

NLP Programming Tutorial 0 - Programming Basics Graham Neubig Nara Institute of Science and

CS1150 Principles of Computer Science Introduction Yanyan Zhuang Department of Computer Science

Data Cleaning Duen Horng (Polo) Chau Assistant Professor Associate Director, MS Analytics

110 Announcements Announcements - Houses How-to use Zoom for Office-hours Video Posted on

Data Cleaning Mahdi Roozbahani Lecturer, Computational Science and Engineering, Georgia Tech

Coding with dunetpc and LArSoft: tips, tricks, gotchas, and debugging advice Tom Junk DUNE

Lecture #13: More Sequences and Strings Last modified: Tue Mar 18 16:17:54 2014 CS61A: Lecture

CSC 1010 Lecture 8 What do we know so far? Class lecture, lab, Rephactor, Quick Checks,

Automa'c, Efficient, and General Repair of So8ware Defects using

Python basics for NLP Type : which python3 in the command prompt - PDF document

Programming environment Is Python installed Python basics for NLP Type : which python3 in the command prompt (applications->utilitaires->Terminal) Vincent Claveau It should answer: /usr/bin/python3 IRISA-CNRS Text file

Python for Data Science Overview of Python Why Python Installing Python Installing Python Modules

SI485i : NLP Missing Topics and the Future Who cares about NLP? NLP has expanded quickly

SI425 : NLP Missing Topics and the Future Who cares about NLP? NLP has expanded quickly

Python Tidbits Python created by that guy ---&gt; Python is named after Monty Pythons

NLP: Two pictures Wordnet and Word Sense Problem NLP Disambiguation Semantics NLP Trinity

Recurrent Neural Networks Graham Neubig Site https://phontron.com/class/nn4nlp2017/ NLP and

Looping through Python data structures Justin Kiggins Product Manager DataCamp Python for

HPC Python Programming Ramses van Zon July 10, 2019 Ramses van Zon HPC Python Programming July

First Tool: Python! Introduction to python programming Gholamhossein Tavasoli @ ZNU First Tool:

Ontologies for NLP NLP for Ontologies FOIS 2014 - LogOnto Workshop on Logics and Ontologies for

Getting Started with Python The Python Interpreter A piece of software that executes

We already know Java. Why learn Python? Using Python to Implement Algorithms Python has far less

Computer Science Class XI ( As per CBSE Board) Visit : python.mykvs.in for regular updates

CMPT 120 Basics of Python Summer 2012 Instructor: Hassan Khosravi Python A simple

= Introduction to Computer Programming Python Basics CSCI-UA 2 High-level programming

NLP Programming Tutorial 0 - Programming Basics Graham Neubig Nara Institute of Science and

CS1150 Principles of Computer Science Introduction Yanyan Zhuang Department of Computer Science

Data Cleaning Duen Horng (Polo) Chau Assistant Professor Associate Director, MS Analytics

110 Announcements Announcements - Houses How-to use Zoom for Office-hours Video Posted on

Data Cleaning Mahdi Roozbahani Lecturer, Computational Science and Engineering, Georgia Tech

Coding with dunetpc and LArSoft: tips, tricks, gotchas, and debugging advice Tom Junk DUNE

Lecture #13: More Sequences and Strings Last modified: Tue Mar 18 16:17:54 2014 CS61A: Lecture

CSC 1010 Lecture 8 What do we know so far? Class lecture, lab, Rephactor, Quick Checks,

Automa'c, Efficient, and General Repair of So8ware Defects using

Python Tidbits Python created by that guy ---> Python is named after Monty Pythons