Analysis of Wikileaks Cables Using NLP Techniques
CS671: Natural Language Processing
Arpit Jain, Sugam Anand
Mentor: Dr. Amitabha Mukerjee
Why Wikileaks?
The Wikileaks embassy cables release covers a huge dataset of official documents: 251,287 cables from more than 250 US embassies and consulates worldwide.
The cables show the extent of US spying on its allies and the
UN; turning a blind eye to corruption and human rights abuse in "client states"; backroom deals with supposedly neutral countries; lobbying for US corporations; and the measures US diplomats take to advance those who have access to them.
Such a large, rich and structured dataset can be analyzed with natural language processing and information retrieval techniques.
Distribution of cables
http://wikileaks.org/cablegate.html
Structure of Cables
Each cable contains:
- Source: the embassy that sent the cable
- Destination: the target embassies
- Date: the sending date
- Body: the raw text
- Tags: meta information about the cable, e.g. classified, unclassified or secret
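The fields listed above map naturally onto a small record type. A minimal sketch, assuming each cable has already been parsed into plain values; the class name `Cable`, its field names, the helper `has_tag`, and the sample values are all illustrative and not taken from the original project:

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class Cable:
    """One embassy cable (hypothetical container; field names are assumptions)."""
    source: str                # embassy that sent the cable
    destinations: List[str]    # target embassies
    sent: date                 # sending date
    body: str                  # raw text of the cable
    tags: List[str] = field(default_factory=list)  # meta information, e.g. classification

    def has_tag(self, tag: str) -> bool:
        """Case-insensitive exact match against the cable's tags."""
        return tag.lower() in (t.lower() for t in self.tags)


# Made-up sample values, for illustration only.
cable = Cable(
    source="Embassy Islamabad",
    destinations=["Embassy New Delhi"],
    sent=date(2005, 10, 12),
    body="(raw cable text)",
    tags=["UNCLASSIFIED"],
)
```

Matching tags exactly rather than by substring avoids treating "UNCLASSIFIED" as a hit for "classified".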
Objective
Diplomats communicated about topics with reference to people, places and organizations.
Extract these entities from the Wikileaks cables.
Guess the topic under discussion.
Determine the opinion of the diplomats (and, by extension, of America) towards the topic.
Map these over a timeline.
Methodology
Get cables for multiple time periods for the given embassies.
Extract the entities using the NLTK named entity recognizer or the Stanford CoreNLP toolkit.
Score these entities by their frequency of occurrence across the cables in a particular time frame.
Guess the topics using a topic modelling approach such as LDA, PLSA or LSI.
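The frequency-based scoring step can be sketched as follows. This is only one plausible reading of the scoring, assuming the entities have already been extracted per cable and that a "time frame" is a calendar year; the function name `score_entities` and the toy input are hypothetical:

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

# (year, entities found in one cable); toy data, not taken from the real cables
CableEntities = Tuple[int, List[str]]


def score_entities(cables: Iterable[CableEntities]) -> Dict[int, Counter]:
    """Count, per year, how often each entity is mentioned across cables."""
    scores: Dict[int, Counter] = defaultdict(Counter)
    for year, entities in cables:
        # Normalise case so "Kashmir" and "kashmir" share one count.
        scores[year].update(e.lower() for e in entities)
    return dict(scores)


sample = [
    (2005, ["Kashmir", "Musharraf"]),
    (2005, ["Kashmir", "Balochistan"]),
    (2006, ["Musharraf"]),
]
scores = score_entities(sample)
top_2005 = scores[2005].most_common(1)  # highest-scoring entity for 2005
```

Raw counts favour entities that appear in every cable; a real pipeline might prefer document frequency or a TF-IDF style weight instead.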
Progress
For Iran RPO Dubai
- 3853 entities in total, such as 'IRIG', 'supreme leader Khamenei', 'Khatami', 'Mousavi', 'Islamic Revolution', 'Middle East'
For Islamabad
- 'Kashmir', 'Balochistan', 'Musharraf', 'North West Frontier Province'
For New Delhi
- 'PM Manmohan Singh', 'BJP', 'NSSP', 'Tsunami Relief'
LDA Results for Islamabad
Relief operation by UN
['0.211*"usaid/dart" + 0.178*"relief" + 0.115*"water" + 0.114*"earthquake" + 0.113*"shelter" + 0.112*"tents" + 0.103*"october" + 0.101*"u.n." + 0.097*"sanitation" + 0.095*"food"']
Existence of extremists in madrassa
["0.018*ssp + 0.016*( + 0.012*2005 + 0.010*groups + 0.010*domestic + 0.010*leaders + 0.010*extremist + 0.010*madrassa + 0.009*'s + 0.008*its", '0.000*rns. + 0.000*opened + 0.000*increase + 0.000*2005. + 0.000*receiving + 0.000*viable + 0.000*shows + 0.000*rebuilding + 0.000*e. + 0.000*jalil']
LDA Results for New Delhi
Nuclear Deal
['0.115*"saran" + 0.113*"bjp" + 0.109*"nuclear" + 0.107*"congress" + 0.105*"jaishankar" + 0.103*"king" + 0.099*"pakistan" + 0.097*"nssp" + 0.094*"nepal" + 0.080*"iraq"']
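Each topic above is printed as a string of `weight*term` pairs joined by ` + `. To compare or plot topics, it helps to turn such a string back into (term, weight) pairs. A minimal sketch, assuming that format holds for every topic; the helper name `parse_topic` is hypothetical, and the input is a shortened copy of the relief topic shown for Islamabad:

```python
from typing import List, Tuple


def parse_topic(topic: str) -> List[Tuple[str, float]]:
    """Split a 'w1*term1 + w2*term2 + ...' topic string into (term, weight) pairs."""
    pairs = []
    for chunk in topic.split(" + "):
        # Everything before the first '*' is the weight, the rest is the term.
        weight, _, term = chunk.strip().partition("*")
        # Terms may or may not be wrapped in double quotes.
        pairs.append((term.strip('"'), float(weight)))
    return pairs


relief = parse_topic('0.211*"usaid/dart" + 0.178*"relief" + 0.115*"water"')
for term, weight in relief:
    print(f"{term:12s} {weight:.3f}")
```

Parsed this way, near-empty topics such as the second Islamabad list (all weights 0.000) can be filtered out by checking whether the largest weight is zero.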