SLIDE 1

TERMINOLOGY SYSTEMATIZATION FOR CYBERSECURITY DOMAIN IN ITALIAN LANGUAGE

CLAUDIA LANZA – PHD STUDENT, UNIVERSITY OF CALABRIA AND VISITING AT UNIVERSITÉ DE NANTES – C.LANZA@DIMES.UNICAL.IT

BÉATRICE DAILLE – FULL PROFESSOR, UNIVERSITÉ DE NANTES

SLIDE 2

OVERVIEW

Introduction

Objectives

Methodology outline

Domain

Corpus

First phase:

  • Mapping with standards
  • Experts of the domain
  • First Italian thesaurus draft for Cybersecurity

Second phase:

  • Term extraction software comparison
  • Candidate terms observation

Future works

Claudia Lanza – PhD student, University of Calabria and visiting at Université de Nantes c.lanza@dimes.unical.it

SLIDE 3

INTRODUCTION

Starting elements and the problems each one raises:

  • Language of the thesaurus (Italian): standardizing the terminology in Italian is difficult, because few Cybersecurity standards exist in Italian
  • Terminology extraction tools for Italian: the term extraction tools offer only a weak level of granularity for hierarchy and synonymy detection
  • Support of the domain experts: contrasting perspectives on how to adjust the terminological assets

SLIDE 4

INTRODUCTION

Some terms appear in their English form in both the English standards and the official Italian glossaries; they either cannot be translated into Italian or have become common-use terminology.

Translation from Standards EN-ITA

  • Phishing
  • Spam
  • Smishing
  • Cyber trolling

CYBER-DEFENCE CYBER INTELLIGENCE

SLIDE 5

OBJECTIVES

  • Realization of the Italian thesaurus of Cybersecurity;
  • Compare the software employed, in order to configure a system of NLP rules that enhances the selection of candidate terms;
  • Set out a methodology that could automate, starting from the source corpus, the semantic updating for this field of knowledge, by adjusting the terminological asset of the thesaurus to achieve adequate coverage of the Cybersecurity domain.

SLIDE 6

METHODOLOGY OUTLINE

  • Document selection and analysis
  • Terminological extraction from documents (T2K)
  • First terminological list datasets filtered by frequency (TF/IDF) and accuracy
  • Collaboration with the experts of the Cybersecurity domain
  • Establishment of the semantic relationships between terms
  • Mapping with the ICT Security vocabularies contained in the standards
  • Creation of the thesaurus as a means of semantic control
  • Software comparison: T2K, TermSuite, Pke
  • Terms observation
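
The frequency filtering step in the outline can be illustrated with a minimal sketch. It assumes each document has already been reduced to a list of candidate terms, uses a plain TF-IDF formula (relative term frequency times log of inverse document frequency), and keeps a term when its best score in any document reaches a threshold. The function names, the toy documents and the threshold are hypothetical; this is not the weighting that T2K itself applies.

```python
import math
from collections import Counter


def tfidf_scores(docs):
    """Return {term: best TF-IDF score across all documents}.

    docs is a list of documents, each given as a list of candidate terms.
    """
    n_docs = len(docs)
    # Document frequency: number of documents in which each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    scores = {}
    for doc in docs:
        counts = Counter(doc)
        for term, count in counts.items():
            tf = count / len(doc)
            idf = math.log(n_docs / df[term])
            scores[term] = max(scores.get(term, 0.0), tf * idf)
    return scores


def filter_candidates(docs, threshold):
    """Keep the terms whose best TF-IDF score reaches the threshold."""
    scores = tfidf_scores(docs)
    return sorted(term for term, score in scores.items() if score >= threshold)


# Toy corpus (hypothetical data, for illustration only).
docs = [
    ["phishing", "attacco informatico", "phishing", "sicurezza"],
    ["sicurezza", "dati personali", "sicurezza"],
    ["sicurezza", "rete", "attacco informatico"],
]

print(filter_candidates(docs, threshold=0.2))
```

In this toy corpus "sicurezza" occurs in every document, so its inverse document frequency is zero and it is filtered out, which shows why a frequency filter alone can discard terms that are central to the domain and why the expert review step matters.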

SLIDE 7

DOMAIN

  • Multidisciplinarity: ICT and its sub-areas (audiovisual techniques; computer software; electronics; etc.);
  • Specificity: technicisms and standardized terms;
  • Cross-field: computer science, legislative systems, regulations.

SLIDE 8

CORPUS

Criteria of selection:

  • Time range: documents from the most recent years, covering at least the last seven;
  • Language: only Italian documents were considered for the analysis;
  • Contexts: national, European and regional laws were analysed.

Portals: AltaLex, EUR-Lex, Guidelines (CERT), ENISA, etc.

The overall size of the corpus is 563 documents, containing 11 806 558 terms.

SLIDE 9

FIRST (AND SECOND) PHASE: MAPPING WITH STANDARDS

  • NIST 7298 r2 (2013)
  • ISO 27000:2016

Manually translated into Italian using the IATE database
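
One way to picture the mapping step is a normalized lookup of candidate terms against the glossary entries of each standard. The sketch below is an assumption about how such a check could be automated, not the procedure used in the project: the glossary names and their entries are placeholder data, not copied from NIST 7298 or ISO 27000, and the normalization (lowercasing, accent stripping, hyphen unification) is a hypothetical choice.

```python
import unicodedata


def normalize(term):
    """Lowercase, strip accents, and unify hyphens and whitespace."""
    decomposed = unicodedata.normalize("NFKD", term)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(no_accents.lower().replace("-", " ").split())


def map_to_standards(candidates, glossaries):
    """Return {candidate: sorted names of the glossaries that contain it}."""
    index = {}
    for name, terms in glossaries.items():
        for term in terms:
            index.setdefault(normalize(term), set()).add(name)
    return {c: sorted(index.get(normalize(c), ())) for c in candidates}


# Placeholder glossaries (hypothetical entries, for illustration only).
glossaries = {
    "glossary_A": ["Cyber-attack", "Phishing", "Integrità"],
    "glossary_B": ["cyber attack", "Risk"],
}
candidates = ["cyber attack", "integrita", "smishing"]

result = map_to_standards(candidates, glossaries)
for term, sources in result.items():
    print(term, "->", sources or "not covered by the standards")
```

Candidates with an empty result are the ones that would need manual translation or expert validation, since no standard covers them.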

SLIDE 10

https://www.cybersecurityosservatorio.it/it/Services/thesaurus.jsp

In the thesaurus

SLIDE 11

FIRST PHASE: EXPERTS OF THE DOMAIN

Informatics and Telematics Institute of CNR in Pisa (Tuscany)

SLIDE 12

FIRST PHASE: FIRST ITALIAN THESAURUS DRAFT FOR CYBERSECURITY

  • 4 main categories, decided together with the experts through an information abstraction process, according to their presence in the Cybersecurity taxonomies included in the gold standards and their frequency in the terminological lists: Cybersecurity, Cybercriminality, Cyberbullism, Cyber Defence;
  • 245 candidate terms;
  • Semantic relationships decided on a head-term basis, as derived with T2K, and with the approval of the experts;
  • Scope notes (definitions of the terms) added in accordance with the co-occurrences in the official sources of the corpus.
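
The elements listed above (top categories, semantic relationships, scope notes) can be pictured as a small data structure. This is a minimal sketch under assumptions: the class and field names are hypothetical, the relationship types follow the usual thesaurus convention of broader, narrower and related terms, and the placement of Phishing under Cybercriminality is invented for illustration rather than taken from the published thesaurus.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    term: str
    scope_note: str = ""
    broader: set = field(default_factory=set)
    narrower: set = field(default_factory=set)
    related: set = field(default_factory=set)


class Thesaurus:
    def __init__(self):
        self.entries = {}

    def add(self, term, scope_note=""):
        """Create the entry if needed and optionally set its scope note."""
        entry = self.entries.setdefault(term, Entry(term))
        if scope_note:
            entry.scope_note = scope_note
        return entry

    def add_hierarchy(self, broader, narrower):
        """Record a hierarchical link in both directions."""
        self.add(broader).narrower.add(narrower)
        self.add(narrower).broader.add(broader)

    def add_related(self, a, b):
        """Record a symmetric associative link."""
        self.add(a).related.add(b)
        self.add(b).related.add(a)


thesaurus = Thesaurus()
for category in ["Cybersecurity", "Cybercriminality", "Cyberbullism", "Cyber Defence"]:
    thesaurus.add(category)

# Hypothetical placement and links, for illustration only.
thesaurus.add_hierarchy("Cybercriminality", "Phishing")
thesaurus.add_related("Phishing", "Smishing")

print(sorted(thesaurus.entries["Cybercriminality"].narrower))
```

Storing each link in both directions keeps the hierarchy consistent when experts later revise a relationship, which is the kind of semantic control a thesaurus is meant to provide.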

SLIDE 13

FIRST PHASE: FIRST ITALIAN THESAURUS DRAFT FOR CYBERSECURITY

https://www.cybersecurityosservatorio.it/it/Services/thesaurus.jsp

SLIDE 14

SECOND PHASE: SOFTWARE COMPARISON

  • T2K (a native Italian tool) was used to extract the first terminological dataset;
  • TermSuite helped to enhance the structuring of the semantic relations by using variants (denominative, conceptual, linguistic);
  • Pke is the library whose models, TopicRank and Multipartite Rank, supported the definition of topical information about the domain. For Multipartite Rank in particular, the first runs were tested on the four main macro-categories of the Italian thesaurus for Cybersecurity, which were validated by the group of experts: Cybersecurity, Cybercriminality, Cyberbullism, Cyber Defence.

Terms oriented

Document oriented

Terms by source type and category:

Legal/technical sources

  • Cybersecurity: security; cyber attacks; personal data; system
  • Cybercriminality: information; fight; networks; cybercrime
  • Cyberbullism: /
  • Cyber Defence: on-line information; security; cyber attacks; defence

Divulgative sources

  • Cybersecurity: cyber security; security; networks; smart cars
  • Cybercriminality: information; security; cyber criminality; networks
  • Cyberbullism: prevention; educational system; schools; education
  • Cyber Defence: attacks; cyber attacks; threats; networks

SLIDE 15

CANDIDATE TERMS OBSERVATION

Results per term for T2K, TermSuite and Pke:

Brute Force Attack

  • T2K: 2 occurrences. Terms: attacco a forza bruto; forza bruta
  • TermSuite: 1 occurrence. Term: attacco bruto. Denominative variants: npna: attacco a forza bruto; npna: attacco di forza bruto
  • Pke: not present, but see the term "attack". Legal document: security; systems; cyber attacks; cybersecurity; personal data. Divulgative document: website; system; e-mail; virus; hacker ethics

Phishing

  • T2K: 176 occurrences. Terms: phishing techniques, phishing mail, spear phishing, phishing attacks, phishing website
  • TermSuite: 2 occurrences. Term: phishing. Conceptual variant (expansion): npn: messaggio di phishing
  • Pke: not present

Trojan Horses

  • T2K: not present
  • TermSuite: 1 occurrence. Term: Trojan
  • Pke: not present

New info from Pke: Threat Intelligence clustered as cyber counter-espionage; information systems; actions; intelligence
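
The variant labels reported for TermSuite (npna, npn) read as part-of-speech patterns. The sketch below assumes that n stands for noun, p for preposition and a for adjective, and shows how such a pattern could be matched over a tagged token sequence. The tokens are tagged by hand here, the tag names and helper functions are hypothetical, and this is not TermSuite's own matching engine.

```python
# Assumed mapping from part-of-speech tags to pattern letters.
TAG_LETTERS = {"NOUN": "n", "ADP": "p", "ADJ": "a"}


def tag_string(tagged):
    """Encode a tagged sentence as one letter per token ('x' for other tags)."""
    return "".join(TAG_LETTERS.get(tag, "x") for _, tag in tagged)


def find_pattern(tagged, pattern):
    """Return every token span whose tag sequence equals the pattern."""
    letters = tag_string(tagged)
    matches = []
    start = letters.find(pattern)
    while start != -1:
        words = [word for word, _ in tagged[start:start + len(pattern)]]
        matches.append(" ".join(words))
        start = letters.find(pattern, start + 1)
    return matches


# Hand-tagged example sentence (hypothetical; a real pipeline would use a tagger).
sentence = [
    ("un", "DET"), ("attacco", "NOUN"), ("a", "ADP"), ("forza", "NOUN"),
    ("bruto", "ADJ"), ("e", "CCONJ"), ("un", "DET"), ("messaggio", "NOUN"),
    ("di", "ADP"), ("phishing", "NOUN"),
]

print(find_pattern(sentence, "npna"))
print(find_pattern(sentence, "npn"))
```

Note that the shorter npn pattern also matches the first three tokens of an npna span ("attacco a forza"), so a real extractor would need a rule, such as preferring the longest match, to avoid proposing truncated candidates.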

SLIDE 16

FUTURE WORKS

  • Enhancement of the organization of categories through Pke;
  • Detection of semantic relationships starting from corpus processing;
  • Checking the terminological coverage with respect to term variation over time.

TIA 2019 – 1 July

SLIDE 17

THANK YOU FOR YOUR ATTENTION
