Information Extraction in Illicit Web Domains (presentation dated 2017/05/09)


Information Extraction in Illicit Web Domains
Date: 2017/05/09
Authors: Mayank Kejriwal, Pedro Szekely
Source: ACM WWW '17
Advisor: Jia-ling Koh
Speaker: Yi-hui Lee

Outline
• Introduction
• Approach
• Experiment
• Conclusion

Introduction
• Information Extraction:

Introduction (cont.)
• Information Extraction on the Dark Web (human trafficking):
  - ages (of human trafficking victims)
  - locations
  - prices of services
  - posting dates

Introduction (cont.)
• A high-level overview of the proposed information extraction approach:
  input: Dark Web
  -> step 1: preprocessing
  -> step 2: apply recognizers
  -> step 3: word representation learning
  -> step 4: supervised classifier
  -> output: annotated corpus
Approach
• Step 1. Preprocessing:
  - Readability Text Extractor (RTE): Mercury Web Parser
  - NLTK: RTE string output -> sentence tokenize -> word tokenize -> list of tokens
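The tokenization chain in Step 1 can be sketched roughly as follows. This is a standard-library stand-in, not the authors' code: the slide names NLTK, whose `sent_tokenize` and `word_tokenize` would normally fill these roles, and the helper names and regular expressions below are illustrative assumptions.

```python
import re

def sentence_tokenize(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def word_tokenize(sentence):
    """Naive word tokenizer: runs of word characters, or single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)

def preprocess(rte_output):
    """Turn the extracted page text (a string) into one flat list of tokens."""
    tokens = []
    for sentence in sentence_tokenize(rte_output):
        tokens.extend(word_tokenize(sentence))
    return tokens

tokens = preprocess("The cow is in the farm. I jumped over the farm.")
```

Real page text is far noisier than this example, which is one reason a trained tokenizer such as NLTK's is the more likely choice in practice.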
Approach (cont.)
• Step 2. Apply recognizers:
  - GeoNames-Cities
  - GeoNames-States
  - RegEx-Ages: uses regular expressions
  - Dictionary-Names: person names
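A minimal sketch of how such recognizers might mark candidate annotations in a token list. The age pattern and the tiny dictionaries are hypothetical placeholders (the slides do not give the actual expressions or word lists), and only single-token matches are handled.

```python
import re

# Hypothetical age pattern: any two-digit number is treated as a candidate age.
# The aim is high recall, so false positives are expected at this stage.
AGE_PATTERN = re.compile(r"^\d{2}$")

# Hypothetical, tiny stand-ins for the GeoNames and person-name dictionaries.
CITIES = {"seattle", "boston"}
NAMES = {"alice", "amber"}

def apply_recognizers(tokens):
    """Return (token_index, token, label) for every recognizer hit."""
    candidates = []
    for i, tok in enumerate(tokens):
        low = tok.lower()
        if AGE_PATTERN.match(tok):
            candidates.append((i, tok, "age"))
        if low in CITIES:
            candidates.append((i, tok, "city"))
        if low in NAMES:
            candidates.append((i, tok, "name"))
    return candidates

tokens = ["Amber", ",", "23", ",", "visiting", "Seattle", ",", "call", "after", "10"]
candidates = apply_recognizers(tokens)
```

Here "10" is flagged as an age even though it is a time of day. Sorting such false positives from true hits is the job left to the later classification step.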
Approach (cont.)
• Step 3. Word Representation learning:
  D1: The cow is in the farm.
  D2: I jumped over the farm.
  D3: I saw a cow in the farm.

            D1  D2  D3
  The        1   0   0
  cow        1   0   1
  jumped     0   1   0
  over       0   1   0
  the        1   1   1
  moon       0   0   0
  farm       1   1   1

  sim(cow, farm) = 2 / (sqrt(2) × sqrt(3)) ≈ 0.82
  sim(cow, moon) = 0
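The similarity values can be checked with a short calculation. The sketch assumes `sim` is cosine similarity (dot product divided by the product of the two vector norms) and adopts the convention that similarity with an all-zero vector is 0; both are assumptions, since the slide does not name the measure.

```python
import math

# Rows of the term-document table: occurrence of each word in D1, D2, D3.
occurrences = {
    "cow":  [1, 0, 1],
    "farm": [1, 1, 1],
    "moon": [0, 0, 0],
}

def cosine(u, v):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# 2 / (sqrt(2) * sqrt(3)), roughly 0.816
sim_cow_farm = cosine(occurrences["cow"], occurrences["farm"])
# "moon" never occurs, so the similarity falls back to 0.0
sim_cow_moon = cosine(occurrences["cow"], occurrences["moon"])
```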
Approach (cont.)
• Step 3. Word Representation learning:
  - Random Indexing [27]: each element of a word's vector is randomly assigned -1, 0, or +1
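A rough sketch of the random indexing idea, under stated assumptions: each word gets a sparse ternary index vector, and a word's context vector is the sum of the index vectors of its neighbours within a window. The dimension, number of non-zero entries, window size, and function names are all illustrative choices, not values taken from the slides or the paper.

```python
import random

def make_index_vector(dim, nonzeros, rng):
    """Sparse ternary vector: mostly 0, with a few randomly placed +1/-1 entries."""
    vec = [0] * dim
    for pos in rng.sample(range(dim), nonzeros):
        vec[pos] = rng.choice((-1, 1))
    return vec

def random_indexing(sentences, dim=20, nonzeros=4, window=2, seed=0):
    """Build one context vector per word by summing neighbours' index vectors."""
    rng = random.Random(seed)
    index, context = {}, {}
    for sent in sentences:
        for word in sent:
            if word not in index:
                index[word] = make_index_vector(dim, nonzeros, rng)
                context[word] = [0] * dim
    for sent in sentences:
        for i, word in enumerate(sent):
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    neighbour = index[sent[j]]
                    context[word] = [c + n for c, n in zip(context[word], neighbour)]
    return index, context

sentences = [
    ["the", "cow", "is", "in", "the", "farm"],
    ["i", "jumped", "over", "the", "farm"],
    ["i", "saw", "a", "cow", "in", "the", "farm"],
]
index, context = random_indexing(sentences)
```

Because the index vectors are fixed-size and random, the representation never needs the full term-document matrix, which keeps the method lightweight on large corpora.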
Approach (cont.)
• Step 4. Supervised Contextual Classifier: aggregate vectors -> l2-normalization
  Example sentence: I saw a cow jumped over the farm
    saw    = [1, 0, 0, …, 1, 0]
    a      = [1, 1, 1, …, 1, 1]
    cow    = [1, 0, 1, …, 0, 0]
    jumped = [0, 0, 0, …, 1, 1]
    over   = [0, 0, 1, …, 0, 1]
    aggregate        = [1, 0, 0, …, 1, 0, 1, 1, 1, …, 1, 1, ……, 0, 1]
    l2-normalization = [0.0001, 0, 0, …, 0.0001, 0, 0.0001, 0.0001, 0.0001, …, 0.0001, 0.0001, ……, 0, 0.0001]
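The aggregation and normalization step can be sketched as below. The sketch follows the slide's example, where the aggregate appears to be the concatenation of the context words' vectors; whether the original method concatenates or combines vectors some other way is not stated here, so treat that as an assumption. The toy vectors are hypothetical.

```python
import math

def aggregate(context_vectors):
    """Concatenate the word vectors of the context window, as in the slide's example."""
    features = []
    for vec in context_vectors:
        features.extend(vec)
    return features

def l2_normalize(vec):
    """Scale a vector to unit Euclidean length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        return list(vec)
    return [x / norm for x in vec]

# Toy 3-dimensional vectors (hypothetical values) for two context words.
window = [[1, 0, 0], [1, 1, 1]]
features = l2_normalize(aggregate(window))
```

After normalization every non-zero entry of a 0/1 vector has the same value, which is why the slide's normalized vector repeats a single small number.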
Approach (cont.)
• Step 4. Supervised Contextual Classifier:
  - Classifier: Random forest
Experiment
• Datasets and Ground-truths: research conducted in the DARPA MEMEX program
• Ground-truths:

Experiment (cont.)
• Baselines: Stanford Named Entity Recognition (NER) system

Experiment (cont.)
• Evaluation:

Experiment (cont.)
• Feature selection:
Conclusion
• We presented a lightweight, feature-agnostic Information Extraction approach that is suitable for illicit Web domains.
• Our approach relies on unsupervised derivation of word representations from an initial corpus, and the training of a supervised contextual classifier using external high-recall recognizers and a handful of manually verified annotations.
• Real-world settings: End Human Trafficking hackathon organized by the office of the District Attorney of New York
