TERMINOLOGY SYSTEMATIZATION FOR CYBERSECURITY DOMAIN IN ITALIAN LANGUAGE
CLAUDIA LANZA - PHD STUDENT, UNIVERSITY OF CALABRIA AND VISITING AT UNIVERSITÉ DE NANTES C.LANZA@DIMES.UNICAL.IT BÉATRICE DAILLE – FULL PROFESSOR UNIVERSITÉ DE NANTES
Introduction
Objectives
Methodology outline
Domain
Corpus
First phase:
Second phase:
Term extraction software comparison
Future work
Claudia Lanza – PhD student, University of Calabria and visiting at Université de Nantes c.lanza@dimes.unical.it
Realization of the Italian thesaurus of Cybersecurity;
Portray a comparison between the software employed to configure a system of NLP rules to proceed with the
Set out a methodology that could automate, starting from the source corpus, the semantic updating for this
Document selection and analysis
Terminological extraction from documents (T2K)
First terminological list datasets filtered by frequency (TF/IDF) and accuracy
Collaboration with the experts
Cybersecurity domain
Establishment of the semantic relationships between terms
Mapping with the ICT Security vocabularies contained in the standards
Creation of the thesaurus as a means of semantic control
Software comparison: T2K, TermSuite, Pke Terms
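The frequency filtering step (TF/IDF) listed in the methodology can be illustrated with a minimal sketch. This is not T2K's internal implementation: the function names, the toy tokenized documents, and the threshold value are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return {term: highest TF-IDF score reached in any document}.

    `docs` is a list of token lists (hypothetical pre-tokenized corpus).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears
    df = Counter(term for doc in docs for term in set(doc))
    scores = {}
    for doc in docs:
        if not doc:
            continue  # skip empty documents to avoid division by zero
        counts = Counter(doc)
        for term, count in counts.items():
            tf = count / len(doc)
            idf = math.log(n_docs / df[term])
            scores[term] = max(scores.get(term, 0.0), tf * idf)
    return scores


def filter_candidates(docs, threshold):
    """Keep the terms whose best TF-IDF score reaches the threshold."""
    scores = tf_idf(docs)
    return sorted(term for term, score in scores.items() if score >= threshold)


# Toy corpus (invented for illustration, not taken from the source corpus)
docs = [
    ["phishing", "attacco", "rete", "phishing"],
    ["rete", "sicurezza", "attacco"],
    ["rete", "malware"],
]
print(filter_candidates(docs, threshold=0.3))
```

A term that occurs in every document, such as "rete" in the toy corpus, gets a score of zero and is filtered out, which is the behaviour one would want for domain-generic words.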
Multidisciplinarity: ICT and sub-areas (Audiovisual techniques; Computer software; Electronics;
Specificity: technicisms and standardized terms;
Cross-field: computer science field, legislative systems, regulations.
NIST 7298 r2 (2013); ISO 27000:2016
https://www.cybersecurityosservatorio.it/it/Services/thesaurus.jsp
FIRST PHASE: FIRST ITALIAN THESAURUS DRAFT FOR CYBERSECURITY
4 main categories, decided together with the experts in an abstraction information
245 candidate terms;
Semantic relationships decided according to the head-term based derived with T2K and
Scope notes (definitions of the terms) added in accordance with the co-occurrences in the
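The idea of deriving hierarchical relationships from head terms can be sketched as follows. This is only an illustration under stated assumptions, not the procedure used with T2K: it assumes that the head of an Italian noun phrase is its first token, the `Entry` class and function names are hypothetical, and the single-word candidate "attacco" is added for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """Hypothetical thesaurus record with the usual relation slots."""
    term: str
    broader: list = field(default_factory=list)   # BT (broader terms)
    narrower: list = field(default_factory=list)  # NT (narrower terms)
    scope_note: str = ""                          # SN (definition)


def head_of(term):
    # Assumption: the head of an Italian noun phrase is its first token
    return term.split()[0].lower()


def draft_hierarchy(candidates):
    """Link each multiword candidate to its head term when the head
    is itself among the candidates."""
    entries = {term: Entry(term) for term in candidates}
    for term in entries:
        head = head_of(term)
        if head != term and head in entries:
            entries[term].broader.append(head)
            entries[head].narrower.append(term)
    return entries


candidates = ["attacco", "attacco a forza bruto", "attacco bruto", "forza bruta"]
entries = draft_hierarchy(candidates)
for entry in entries.values():
    print(entry.term, "| BT:", entry.broader, "| NT:", entry.narrower)
```

Candidates whose head is not in the list (here "forza bruta") stay unlinked, which leaves the decision to the domain experts rather than creating entries automatically.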
T2K: used to extract the first terminological dataset (Italian native tool);
TermSuite: a software that helped enhance the structuring of the semantic relations by using variants (denominative, conceptual, linguistic);
Pke: the library that, with its models, T and Multipartite Rank, supported the definition of topical information about the domain. In detail, as regards Multipartite Rank, the first runs were tested on the four main macro-categories of the Italian thesaurus for Cybersecurity, validated by the group of experts: Cybersecurity, Cybercriminality, Cyberbullism, Cyber Defence.
Source | Cybersecurity | Cybercriminality | Cyberbullism | Cyber Defence
Legal/technical | Security; cyber attacks; personal data; system | Information; fight; networks; cybercrime | / | On-line information; security; cyber attacks; defence
Divulgative | Cyber security; security; networks; smart cars | Information; security; cyber criminality; networks | Prevention; educational system; schools; education | Attacks; cyber attacks; threats; networks
Term | T2K | TermSuite | Pke
Brute Force Attack | 2 occurrences. Terms: attacco a forza bruto; forza bruta | 1 occurrence. Term: attacco bruto | Not present, but see for Term: attack. Legal document: security; systems; cyber attacks; cybersecurity; personal data. Divulgative document: website; system; e-mail; virus; hacker ethics
Denominative variants:
Phishing | 176 occurrences. Terms: phishing techniques, phishing mail, spear phishing, phishing attacks, phishing website | 2 occurrences. Term: phishing | Not present
Conceptual variant (expansion):
Trojan Horses | Not present | 1 occurrence. Term: Trojan | Not present
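The comparison above can be summarized programmatically. The sketch below encodes the occurrence counts reported in the table and counts how many of the three sample terms each tool found. Encoding "Not present" as 0 is a simplifying assumption (for Pke it hides the related topical keyphrases), and the function names are hypothetical.

```python
# Occurrence counts as reported in the comparison table;
# "Not present" is encoded as 0 (simplifying assumption).
results = {
    "Brute Force Attack": {"T2K": 2, "TermSuite": 1, "Pke": 0},
    "Phishing": {"T2K": 176, "TermSuite": 2, "Pke": 0},
    "Trojan Horses": {"T2K": 0, "TermSuite": 1, "Pke": 0},
}


def coverage(results):
    """Return {tool: number of sample terms found at least once}."""
    found = {}
    for counts in results.values():
        for tool, occurrences in counts.items():
            found[tool] = found.get(tool, 0) + (1 if occurrences > 0 else 0)
    return found


def missing_terms(results, tool):
    """Return the sample terms that a given tool did not extract."""
    return [term for term, counts in results.items() if counts.get(tool, 0) == 0]


print(coverage(results))
print(missing_terms(results, "T2K"))
```

Counting presence rather than raw frequency keeps the tools comparable even though their occurrence counts differ by orders of magnitude.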
Enhancement of the organization of categories through Pke;
Detection of semantic relationships starting from corpus processing;
Checking of the terminological coverage with respect to term variation over time.
Claudia Lanza – PhD student, University of Calabria and visiting at Université de Nantes c.lanza@dimes.unical.it TIA 2019 – 1 July