The IMS Corpus WorkBench, Marco Baroni, University of Bologna

Oct 29, 2023

The IMS Corpus WorkBench
Marco Baroni
University of Bologna
Granada “Morphology and Corpora” Seminar

The IMS Corpus WorkBench
◮ Institut für Maschinelle Sprachverarbeitung of the University of Stuttgart
◮ Early to mid 90s: Oliver Christ
◮ Late 90s to 2005: Stefan Evert
◮ From 2006: open source project led by Stefan Evert, hosted on SourceForge
◮ http://www.ims.uni-stuttgart.de/projekte/CorpusWorkbench/
◮ http://cwb.sourceforge.net/

The CWB toolkit
◮ Toolkit of command-line programs
  ◮ Tools to encode/index a corpus
  ◮ Tools to explore a corpus (in particular cqp, the corpus query processor, for interactive exploration)
◮ Supported on most Unix platforms: Linux, Mac OS X, Solaris
◮ Programmatic interface for developing, e.g., a Web-based front end

Advantages over alternatives
◮ Alternatives: WordSketch Engine, Xaira, WordSmith...
◮ Only CWB satisfies all of the following requirements:
  ◮ Scaling up to very large corpora
  ◮ Flexible, annotation-aware queries
  ◮ Flexible input format
  ◮ Central storage of corpora
  ◮ Command-line interface for easy interaction with other tools
  ◮ Free and open source, with an active support and documentation community

Problems
◮ At the moment, corpora larger than about 400M tokens must be split into sub-corpora
◮ No standard Web interface supporting the full set (or even a sizable subset) of cqp options
◮ (Virtually) no query optimization: e.g., [pos="V.*" & lemma="dog"] will be much slower than the equivalent [lemma="dog" & pos="V.*"]
◮ Work is ongoing on the first two issues
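
Without query optimization, the order of conditions on a single token can matter. A toy model can illustrate why. It is an assumption made for illustration and does not describe how cqp evaluates queries internally. In the model, the first condition alone selects candidate positions through an inverted index, and each remaining condition is then tested on every candidate. The token data and function names below are hypothetical.

```python
import re
from collections import defaultdict


def build_index(tokens, attr):
    """Inverted index for one positional attribute: value -> list of positions."""
    index = defaultdict(list)
    for cpos, tok in enumerate(tokens):
        index[tok[attr]].append(cpos)
    return index


def evaluate_in_given_order(tokens, indexes, conditions):
    """Evaluate a conjunction of (attribute, predicate) pairs exactly as written.

    No reordering: the FIRST condition alone selects candidate positions via
    its index; every later condition is then tested on each candidate.
    Returns (matching positions, number of candidates examined).
    """
    first_attr, first_pred = conditions[0]
    candidates = sorted(
        cpos
        for value, positions in indexes[first_attr].items()
        if first_pred(value)
        for cpos in positions
    )
    matches = [
        cpos
        for cpos in candidates
        if all(pred(tokens[cpos][attr]) for attr, pred in conditions[1:])
    ]
    return matches, len(candidates)


def is_verb(value):
    # re.fullmatch: the pattern must match the whole value, as in CQP.
    return re.fullmatch(r"V.*", value) is not None


def is_dog(value):
    return value == "dog"


# Hypothetical toy corpus using the tag style of the slides.
tokens = [
    {"word": "The", "pos": "ART", "lemma": "the"},
    {"word": "dog", "pos": "NN", "lemma": "dog"},
    {"word": "barks", "pos": "VV", "lemma": "bark"},
    {"word": "They", "pos": "PP", "lemma": "they"},
    {"word": "dog", "pos": "VV", "lemma": "dog"},
    {"word": "him", "pos": "PP", "lemma": "he"},
    {"word": "runs", "pos": "VV", "lemma": "run"},
]
indexes = {attr: build_index(tokens, attr) for attr in ("pos", "lemma")}

verb_first = evaluate_in_given_order(tokens, indexes, [("pos", is_verb), ("lemma", is_dog)])
dog_first = evaluate_in_given_order(tokens, indexes, [("lemma", is_dog), ("pos", is_verb)])
print(verb_first)  # ([4], 3)
print(dog_first)   # ([4], 2)
```

Both orders find the same match, position 4. On this seven-token corpus the gap is only three candidates against two. In a real corpus, pos matching V.* covers a large share of all tokens while the lemma dog is rare. Under this model, putting the more selective condition first can therefore cut the work dramatically.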

Corpus representation
◮ Positional attributes: properties of words, e.g., pos and lemma
◮ Structural attributes: metadata and constituency information (e.g., texts with their titles, sentences, phrases)
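
The two kinds of attributes can be pictured with a small in-memory model. This is a sketch for intuition only and does not reflect CWB's on-disk format. A positional attribute holds one value per corpus position. A structural attribute is a list of regions, each spanning a range of positions and optionally carrying annotations such as title. The class and method names are hypothetical. The sketch also assumes that regions of one attribute are sorted and do not overlap.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class Corpus:
    """Toy model of positional and structural attributes (not CWB's file format)."""

    # Positional attributes: one value list per attribute, indexed by corpus position.
    p_attrs: dict = field(default_factory=dict)
    # Structural attributes: per attribute, a sorted list of non-overlapping
    # (start, end, annotations) regions; start and end are inclusive positions.
    s_attrs: dict = field(default_factory=dict)

    def token(self, cpos):
        """All positional attribute values at one corpus position."""
        return {name: values[cpos] for name, values in self.p_attrs.items()}

    def region(self, s_attr, cpos):
        """The region of s_attr that contains cpos, or None if there is none."""
        regions = self.s_attrs.get(s_attr, [])
        starts = [start for start, _end, _annots in regions]
        # Last region starting at or before cpos; it contains cpos if it ends late enough.
        i = bisect_right(starts, cpos) - 1
        if i >= 0 and regions[i][1] >= cpos:
            return regions[i]
        return None


# Possible input 4 from the slides, entered by hand.
corpus = Corpus(
    p_attrs={
        "word": ["The", "dog", "barks"],
        "pos": ["ART", "NN", "VV"],
        "lemma": ["the", "dog", "bark"],
    },
    s_attrs={
        "s": [(0, 2, {})],
        "text": [(0, 2, {"title": "poem", "author_sex": "m"})],
    },
)
print(corpus.token(1))                       # {'word': 'dog', 'pos': 'NN', 'lemma': 'dog'}
print(corpus.region("text", 2)[2]["title"])  # poem
```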

Possible input 1
The
dog
barks

Possible input 2
The	ART	the
dog	NN	dog
barks	VV	bark

Possible input 3
<s>
The	ART	the
dog	NN	dog
barks	VV	bark
</s>

Possible input 4
<text title="poem" author_sex="m">
<s>
The	ART	the
dog	NN	dog
barks	VV	bark
</s>
</text>

Possible input 5
<text title="poem" author_sex="m">
<s>
<np>
The	ART	the
dog	NN	dog
</np>
<vp>
barks	VV	bark
</vp>
</s>
</text>

Possible input 6
The	n
dog	y
barks	n

Possible input 7
...
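
The possible inputs above share CWB's one-token-per-line ("vertical") layout. Positional attributes sit in TAB-separated columns, and structural attributes are marked by XML-style tags on lines of their own. Below is a minimal parser for that layout. It is an illustration only and not a substitute for CWB's own encoding tools. The function name, the regular expressions, and the column name flag for the y/n column of input 6 are hypothetical. The sketch does not handle nested elements with the same name.

```python
import re

# An opening tag with optional name="value" annotations, e.g. <text title="poem">.
OPEN_TAG = re.compile(r'<(\w+)((?:\s+\w+="[^"]*")*)\s*>$')
CLOSE_TAG = re.compile(r"</(\w+)>$")
ANNOTATION = re.compile(r'(\w+)="([^"]*)"')


def parse_vertical(lines, p_attr_names):
    """Split vertical input into positional columns and structural regions.

    Returns (p_attrs, s_attrs): p_attrs maps each column name to its values;
    s_attrs maps each element name to (start, end, annotations) regions with
    inclusive corpus positions.
    """
    p_attrs = {name: [] for name in p_attr_names}
    s_attrs = {}
    open_elements = {}  # element name -> (start position, annotations)
    cpos = 0  # position of the next token
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue
        m = OPEN_TAG.match(line)
        if m:
            open_elements[m.group(1)] = (cpos, dict(ANNOTATION.findall(m.group(2))))
            continue
        m = CLOSE_TAG.match(line)
        if m:
            start, annotations = open_elements.pop(m.group(1))
            s_attrs.setdefault(m.group(1), []).append((start, cpos - 1, annotations))
            continue
        columns = line.split("\t")
        if len(columns) != len(p_attr_names):
            raise ValueError(f"expected {len(p_attr_names)} columns: {line!r}")
        for name, value in zip(p_attr_names, columns):
            p_attrs[name].append(value)
        cpos += 1
    return p_attrs, s_attrs


# Possible input 5 from the slides.
sample = [
    '<text title="poem" author_sex="m">',
    "<s>",
    "<np>",
    "The\tART\tthe",
    "dog\tNN\tdog",
    "</np>",
    "<vp>",
    "barks\tVV\tbark",
    "</vp>",
    "</s>",
    "</text>",
]
p_attrs, s_attrs = parse_vertical(sample, ["word", "pos", "lemma"])
print(p_attrs["lemma"])              # ['the', 'dog', 'bark']
print(s_attrs["np"], s_attrs["vp"])  # [(0, 1, {})] [(2, 2, {})]
```

The same function reads input 1 with the single column ["word"], and input 6 with ["word", "flag"].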

The IMS corpus creation pipeline
◮ Save corpus document(s) as plain text
◮ Tag and lemmatize with TreeTagger (http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/DecisionTreeTagger.html)
◮ Index with CWB
◮ Enjoy!
◮ Often, literally a matter of minutes
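
The step between tagging and indexing can be sketched as a small conversion. The sketch assumes the tagger writes one word<TAB>pos<TAB>lemma line per token. It also assumes the tagset marks sentence-final punctuation with a dedicated tag, set here to SENT, which should be adjusted to the parameter file actually used. The sketch wraps the tagged tokens in <text> and <s> elements so the result matches possible input 4. The resulting lines would then be written to a file and indexed with CWB's encoding tools, such as cwb-encode followed by cwb-makeall. The function name is hypothetical, and tokens are copied verbatim without XML escaping.

```python
from xml.sax.saxutils import quoteattr


def to_vertical(tagged_lines, text_attrs, sentence_end_tag="SENT"):
    """Wrap tagger output (word<TAB>pos<TAB>lemma per line) in <text> and <s> tags.

    sentence_end_tag is an assumption about the tagset: a token with this
    pos closes the current <s> region.
    """
    annotations = "".join(f" {name}={quoteattr(value)}" for name, value in text_attrs.items())
    out = [f"<text{annotations}>"]
    in_sentence = False
    for line in tagged_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        word, pos, lemma = line.split("\t")
        if not in_sentence:
            out.append("<s>")
            in_sentence = True
        out.append(f"{word}\t{pos}\t{lemma}")
        if pos == sentence_end_tag:
            out.append("</s>")
            in_sentence = False
    if in_sentence:  # close a final sentence that lacks end punctuation
        out.append("</s>")
    out.append("</text>")
    return out


tagged = ["The\tDT\tthe", "dog\tNN\tdog", "barks\tVVZ\tbark", ".\tSENT\t."]
for line in to_vertical(tagged, {"title": "poem", "author_sex": "m"}):
    print(line)
```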
