
SLIDE 1

Databases and Software Engineering (DBSE)

Seminar on Modern Software Engineering and Database Concepts

Gunter Saake, Jacob Krüger, David Broneske, Xiao Chen, Gabriel Campero Durand, Bala Gurumurthy, Sandro Schulze, Sabine Wehnert

Databases and Software Engineering Working Group

October 12, 2018

SLIDE 2


Organization: Course Formats

Proseminar, Bachelor, 3 CP. Deliverables:

20-minute talk
Evaluation of 2 other talks

Scientific Seminar, Bachelor, 3 CP. Deliverables:

20-minute talk
5-8 page written paper
Review of one other paper

Saake et al. Seminar on Modern Software Engineering and Database Concepts 2

SLIDE 3


Organization: Procedure

Kickoff session (today)

Meeting with a supervisor from the working group (depending on the topic)

Lectures on scientific writing and presenting, held on individual dates

Talks held as a block seminar; the date will be set at the end

Slides must be submitted one week before the talk → feedback from the supervisor

The written paper (Scientific Seminar only) must be submitted 2 weeks before the talk

SLIDE 4

What will we do?

Learn important “soft skills”, also known as key competencies

Practice presentation technique and style

Experience a conference atmosphere
Write a scientific paper
Work with the appropriate templates (recommendation: LaTeX)
Get acquainted with a new, exciting topic
Topics come from current research

→ Possible topic for a Bachelor's thesis / team project

SLIDE 5

Talk

20-minute talk
5-10 minutes of discussion/questions

Running over time: the speaker will be cut off

Finishing early: more questions (and possibly more criticism)

A computer will be provided; test your presentation before the session!

SLIDE 6

Talk: Evaluation

Talk Evaluation Form

Databases and Software Engineering Working Group

Title of the talk:

Scale: Very good . . . Neutral . . . Poor

Presentation   Rating: 1 2 3 4 5 6 7
  Demeanor, e.g.
    Charisma, dynamism
    Ability to motivate, persuasiveness
  Language and voice, e.g.
    Volume, modulation, clarity
    Speaking pace and fluency
  Facial expression and gestures, e.g.
    Posture, movements
    Expression, eye contact
  Professional impression, e.g.
    Competence, credibility
    Commitment

Slides   Rating: 1 2 3 4 5 6 7
  Design, e.g.
    Colors, fonts, formatting
    Clarity, ease of overview
    Amount of content per slide
  Visualizations (tables, graphics, ...), e.g.
    Explanatory power, clarity
    Labels
    Referencing concept

Talk concept   Rating: 1 2 3 4 5 6 7
  Outline, e.g.
    Division (introduction, main part, conclusion)
    The most important points are clear up front
  Argumentation, e.g.
    Coherence, common thread
    Conciseness, clarity, completeness

Content   Rating: 1 2 3 4 5 6 7
  Explanation of the context, e.g.
    Background and motivation (justification)
    Benefits and goals (significance)
  Explanation of the approach, e.g.
    Principle, tasks, results
    Problems, lessons learned

Comments:

SLIDE 7

Scientific Paper

SLIDE 8

Why write a paper?

Announce new achievements/experiences

  Publishing = the ultimate result of scientific work
  Research is never finished until it has been published

Inform others (e.g., the community) about your own work

  Recognition/attention
  Contacts, valuable collaboration
  Feedback

For you: → practice for the Bachelor's thesis

SLIDE 9

Paper: Evaluation

Evaluation Form for the Written Paper

Databases and Software Engineering Working Group

Title of the paper:
Author:
Reviewer:

Scale: Very good . . . Neutral . . . Poor

Title, Abstract, Introduction   Rating: 1 2 3 4 5 6 7
  Discussion points (e.g.):
    Suitable title
    Quality of the abstract
    Sufficient motivation
    Clear problem statement

Structure   Rating: 1 2 3 4 5 6 7
  Discussion points (e.g.):
    Common thread
    Sensible outline
    Headings that fit the content

Content   Rating: 1 2 3 4 5 6 7
  Discussion points (e.g.):
    Sufficient background
    No unnecessary information
    Referencing
    Plausible justifications
    Clarity of advantages/disadvantages
    Good/correct use of examples

SLIDE 10

Presentation of Topics

SLIDE 11

Code Optimizations (Broneske)

CPU “smaller than”-selection

    int pos = 0;
    for (int i = 0; i < array_size; ++i) {
        if (array[i] < comp_val)
            result[pos++] = i;
    }

GPU “smaller than”-selection

    int tid = threadIdx.x + blockIdx.x * blockDim.x;
    while (tid < array_size) {
        bitmask[tid] = (array[tid] < comparison_value);
        tid += blockDim.x * gridDim.x;
    }

1. B. Raducanu, P. Boncz, M. Zukowski. 2013. Micro Adaptivity in Vectorwise. SIGMOD
2. K. Datta, M. Murphy, V. Volkov, et al. 2008. Stencil Computation Optimization and Auto-tuning on State-of-the-Art Multicore Architectures. SC
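
The two selection variants on this slide can be compared on the CPU with a small sketch. This is an illustration under assumptions, not the presenters' code: the function names `select_less_than`, `select_less_than_bitmask`, and `bitmask_to_positions` are hypothetical, and the bitmask variant runs the per-thread GPU work sequentially instead of as a CUDA kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Branching CPU variant: writes the positions of qualifying elements into a
// preallocated result buffer and returns how many positions were written.
std::size_t select_less_than(const std::vector<int>& array, int comp_val,
                             std::vector<std::size_t>& result) {
    assert(result.size() >= array.size());  // worst case: every element qualifies
    std::size_t pos = 0;
    for (std::size_t i = 0; i < array.size(); ++i) {
        if (array[i] < comp_val) result[pos++] = i;
    }
    return pos;
}

// Bitmask variant: one flag per element, which is what each GPU thread writes
// in the kernel. Here the "threads" are simply iterated one after another.
std::vector<std::uint8_t> select_less_than_bitmask(const std::vector<int>& array,
                                                   int comp_val) {
    std::vector<std::uint8_t> bitmask(array.size());
    for (std::size_t tid = 0; tid < array.size(); ++tid) {
        bitmask[tid] = static_cast<std::uint8_t>(array[tid] < comp_val);
    }
    return bitmask;
}

// Turns a bitmask into a position list so both variants can be compared.
std::vector<std::size_t> bitmask_to_positions(const std::vector<std::uint8_t>& bitmask) {
    std::vector<std::size_t> positions;
    for (std::size_t i = 0; i < bitmask.size(); ++i) {
        if (bitmask[i]) positions.push_back(i);
    }
    return positions;
}
```

The position-list variant has a data-dependent branch and a serial dependency on `pos`, while the bitmask variant writes every output slot independently. That independence is what lets the GPU version assign elements to threads freely.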

SLIDE 12


Multi-Dimensional Index Structures for Main Memory (Broneske)

Main-memory databases are a hot research topic. Accelerated scans are currently in focus, but adapted classical index structures should not be disregarded. The question is: Which classical index structures make sense for main memory? Which adaptations of classical index structures are useful in main-memory databases?

1. Volker Gaede and Oliver Günther. 1998. Multidimensional access methods. ACM Computing Surveys
2. Kim, Changkyu, et al. 2010. FAST: Fast architecture sensitive tree search on modern CPUs and GPUs. SIGMOD

SLIDE 13


Hybrid Storage In Practice (Campero)

Several commercial systems nowadays offer hybrid storage. This commonly means that data can be stored simultaneously in columnar format and in traditional index structures. Recent evaluations confirm that such an approach benefits mixed workloads, provided that hybrid designs are properly recommended. In this seminar topic we will review recent results and study a prototypical hybrid design advisor. To close, we’ll analyze possible future directions.

1. Dziedzic, Adam, Jingjing Wang, Sudipto Das, Bolin Ding, Vivek R. Narasayya, and Manoj Syamala. 2018. Columnstore and B+ tree - Are Hybrid Physical Designs Important? SIGMOD
2. Abadi, Daniel, Peter Boncz, Stavros Harizopoulos, Stratos Idreos, and Samuel Madden. 2013. The design and implementation of modern column-oriented database systems. Foundations and Trends in Databases

SLIDE 14


Economic Games for Data Management (Campero)

In designing autonomous strategies for multi-user data systems, researchers have found that topics from Game Theory (like Pareto dominance) and from Economic Theory (like Supply/Demand modeling) can be useful to optimize the trade-offs in scheduling a common pool of resources across users with different workloads. In this seminar topic we’ll give a brief overview of the basics of both fields, and we’ll study some components of a novel data management system following these ideas (NashDB). Looking forward, we’ll consider how models from these fields could hold relevance for self-driving data management beyond the proposed system.

1. Marcus, Ryan, Olga Papaemmanouil, Sofiya Semenova, and Solomon Garber. 2018. NashDB: An End-to-End Economic Method for Elastic Database Fragmentation, Replication, and Provisioning. SIGMOD
2. Pentaris, Fragkiskos, and Yannis Ioannidis. 2006. Query optimization in distributed networks of autonomous database systems. ACM Transactions on Database Systems
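
To make the notion of Pareto dominance mentioned above concrete, here is a minimal sketch. It assumes each resource allocation is summarized as one utility value per user, with higher meaning better; the function name `pareto_dominates` and this representation are illustrative assumptions and are not taken from NashDB.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns true if allocation `a` Pareto-dominates allocation `b`:
// `a` is at least as good for every user and strictly better for at least one.
// Each vector holds one utility value per user (higher is better).
bool pareto_dominates(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());  // both allocations must cover the same users
    bool strictly_better = false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (a[i] < b[i]) return false;            // some user is worse off
        if (a[i] > b[i]) strictly_better = true;  // some user is better off
    }
    return strictly_better;
}
```

Two allocations that each favor a different user do not dominate one another. Such pairs are exactly the trade-offs a scheduler has to resolve by other means.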

SLIDE 15


Interactive Data Systems (Campero)

While traditional database systems are optimized for throughput or some analytic operations on computer servers, highly interactive data systems expect diverse devices with fast query interfaces (e.g. touchscreens or voice), and most optimizations must consider user interactions. In this seminar topic we will establish how these systems assist users in data exploration, and we’ll review the challenges they face, some proposed benchmarks, and some system designs. The focus of the research can be determined by the student.

1. Crotty, Andrew, Alex Galakatos, Emanuel Zgraggen, Carsten Binnig, and Tim Kraska. 2016. The case for interactive data exploration accelerators (IDEAs). HILDA
2. Jiang, Lilong, Protiva Rahman, and Arnab Nandi. 2018. Evaluating Interactive Data Systems: Workloads, Metrics, and Guidelines. SIGMOD
3. El-Hindi, Muhammad, Zheguang Zhao, Carsten Binnig, and Tim Kraska. 2016. VisTrees: Fast indexes for interactive data exploration. HILDA
4. de Paula R.A. 2018. Visualization Techniques. Encyclopedia of Big Data Technologies

SLIDE 16


In-Database Machine Learning (Campero)

Businesses have an ongoing interest in Machine Learning (ML), since it provides a means of extracting value from large amounts of data with limited effort. However, enterprise ML faces several practical challenges, among them the need to move data from the operational databases to the ML systems. In-database ML (IDBML) is a relatively novel approach that addresses this challenge. In this seminar topic we’ll list existing IDBML offerings, and we’ll study some underlying techniques such as leveraging UDFs or learning over joins.

1. Kumar, Arun, Matthias Boehm, and Jun Yang. 2017. Data management in machine learning: Challenges, techniques, and systems. SIGMOD
2. Hellerstein, Joseph M., Christopher Ré, Florian Schoppmann, Daisy Zhe Wang, Eugene Fratkin, Aleksander Gorajek, Kee Siong Ng et al. 2012. The MADlib analytics library: or MAD skills, the SQL. VLDB

SLIDE 17


Skew Handling for Parallel Entity Resolution (Chen)

Entity resolution (ER) is the process of identifying records that refer to the same real-world entity. It is a very time-consuming task. To improve its efficiency, blocking techniques and parallel computation should both be applied. To ease parallel computation, big data processing frameworks are used to implement a parallel entity resolution process. However, block sizes often vary considerably, which leads to an uneven workload among nodes under the default load balancing strategy of those big data processing frameworks. Hence, several skew handling strategies have been designed for parallel ER to balance the workload across nodes and reduce execution time.

1. Kolb, Lars, Andreas Thor, and Erhard Rahm. 2012. Load balancing for MapReduce-based entity resolution. ICDE
2. Yan, Wei, Yuan Xue, and Bradley Malin. 2013. Scalable load balancing for MapReduce-based record linkage. IPCCC
3. Mestre, Demetrio Gomes, and Carlos Eduardo Santos Pires. 2013. Improving load balancing for MapReduce-based entity matching. ISCC
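
Why block-size skew hurts can be seen with a small sketch: a block of n records needs n(n-1)/2 pairwise comparisons, so load grows quadratically with block size. The code below compares a size-oblivious round-robin assignment with a simple greedy heuristic. Both are illustrative assumptions (as are all function names) and are not the strategies proposed in the cited papers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Number of pairwise comparisons inside one block of n records.
std::uint64_t comparisons_in_block(std::uint64_t n) {
    return n < 2 ? 0 : n * (n - 1) / 2;
}

// Size-oblivious assignment: block i goes to node i % nodes.
std::vector<std::uint64_t> load_round_robin(const std::vector<std::uint64_t>& block_sizes,
                                            std::size_t nodes) {
    assert(nodes > 0);
    std::vector<std::uint64_t> load(nodes, 0);
    for (std::size_t i = 0; i < block_sizes.size(); ++i) {
        load[i % nodes] += comparisons_in_block(block_sizes[i]);
    }
    return load;
}

// Greedy heuristic: largest block first, always onto the least-loaded node.
std::vector<std::uint64_t> load_greedy(std::vector<std::uint64_t> block_sizes,
                                       std::size_t nodes) {
    assert(nodes > 0);
    std::sort(block_sizes.begin(), block_sizes.end(), std::greater<std::uint64_t>());
    std::vector<std::uint64_t> load(nodes, 0);
    for (std::uint64_t size : block_sizes) {
        auto least = std::min_element(load.begin(), load.end());
        *least += comparisons_in_block(size);
    }
    return load;
}
```

For block sizes 40, 30, 30, 10 on two nodes, round-robin yields loads of 1215 and 480 comparisons, while the greedy heuristic yields 825 and 870. Assigning whole blocks cannot help when a single block dominates the total work, because that block still lands on one node.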

SLIDE 18


Active Learning for Entity Resolution (Chen)

Learning-based entity resolution (ER) classifies record pairs into matches and non-matches using a classifier trained on a labeled training dataset. However, the training dataset has to be labeled by humans. To reduce the human labeling effort, active learning techniques have been proposed, which require much less training data to reach the same result quality as conventional machine learning methods. So far, different active learning approaches have been proposed for ER:

1. Sarawagi, Sunita, and Anuradha Bhamidipaty. 2002. Interactive deduplication using active learning. SIGKDD
2. Qian, Kun, Lucian Popa, and Prithviraj Sen. 2017. Active Learning for Large-Scale Entity Resolution. CIKM
3. De Freitas, Junio, et al. 2010. Active learning genetic programming for record deduplication. CEC

SLIDE 19


Query optimization in heterogeneous hardware systems (Gurumurthy)

Current-generation hardware is being exploited to process queries efficiently. However, no single standard query processing technique exists at the moment. In this topic, we explore the optimization strategies of different systems and summarize them to give a complete view of query processing in these environments.

1. Sebastian Breß, Max Heimel, Norbert Siegmund, Ladjel Bellatreche, Gunter Saake. 2014. GPU-accelerated Database Systems: Survey and Open Challenges. Transactions on Large-Scale Data- and Knowledge-Centered Systems
2. Tomas Karnagel, Dirk Habich, Wolfgang Lehner. 2015. Local vs. Global Optimization: Operator Placement Strategies in Heterogeneous Environments. EDBT/ICDT

SLIDE 20


Industrial Interests in Systematic Software Reuse (Krüger)

Systematic software reuse in terms of software product lines is often only introduced after a larger set of different variants has evolved. For varying reasons, including cost reduction, faster development, or improved management, these variants are merged and integrated into a platform (reverse engineering). While there are several case studies that report on the migration processes and experiences, we still need a detailed analysis of the actual industrial motivations that lead to the adoption of product lines. To this end, we aim to analyze several years of the SPLC industry track to identify industrial case studies that are concerned with such migrations and identify the motivations of the organizations.

1. Identified within 3 years of SPLC industry track papers

SLIDE 21

Automated Test Refactoring (Krüger)

Software is regularly updated or refactored, for example, to remove errors, introduce new features, or migrate towards a new technology. However, any change in the production software also means that the corresponding test cases may break or no longer be sufficient. The purpose of this survey is to identify and summarize existing techniques for automated test case refactoring, meaning techniques that track code changes and support developers in maintaining the test cases for these artifacts.

1. Peng-Hua Chu, Nien-Lin Hsueh, Hong-Hsiang Chen, and Chien-Hung Liu. 2012. A Test Case Refactoring Approach for Pattern-Based Software Development. Software Quality Journal
2. Arie van Deursen, Leon Moonen, Alex van den Bergh, and Gerard Kok. 2002. Extreme Programming Perspectives. Chapter Refactoring Test Code

SLIDE 22

Why do Developers use IDEs? (Krüger)

IDEs have emerged as the major tools used by developers to develop software. Arguably, they provide far more support than text editors or command line interpreters. However, several studies have investigated developers’ tool usage and suggest that they only use a minority of the IDEs’ functionalities. In this survey, we aim to summarize the findings reported by research in this direction. The main goal is to summarize what motivates developers to use an IDE and its functionalities. For instance, which refactorings are used, to what extent, and which are completely ignored in daily work?

1. Murphy, G. C., Kersten, M., Findlater, L. 2006. How are Java software developers using the Eclipse IDE? IEEE Software
2. Mohsen Vakilian, Nicholas Chen, Stas Negara, Balaji Ambresh Rajkumar, Brian P. Bailey, and Ralph E. Johnson. 2012. Use, disuse, and misuse of automated refactorings. ICSE

SLIDE 23

Comparing Data Stream Processing Architectures for Software Analytics (Schulze)

Software analytics (SA) is of increasing interest, as it helps to improve source code and development processes, and even supports decision-making. Such analytics rely heavily on large amounts of (heterogeneous) data that accumulate during software development. For such an amount of continuous data, stream processing architectures have been proposed. Task: Review data stream processing frameworks in the light of software analytics. Goal: Pros & cons of such frameworks for SA. Finally, a decision on which framework/engine fits best given the inherent requirements of software analytics.

1. Buse, R. P. L., Zimmermann, T. 2012. Information Needs for Software Development Analytics. ICSE
2. Menzies, T., Zimmermann, T. 2013. Software analytics: So what? IEEE Software
3. Gousios, G. 2018. Big Data Software Analytics with Apache Spark. ICSE
4. In-Stream Big Data Processing (online blog)
5. Curated dataset of stream processing engines & libraries (https://github.com/manuzhang/awesome-streaming)

SLIDE 24

Reasoning in Legal Ontologies (Wehnert)

Ontologies are conceptualizations of entities and relationships of a domain. They can be highly formalized and often provide reasoning / inference capabilities. The task is to find out which ontologies exist in the legal domain and which types of reasoning they support. Additionally, they shall be compared regarding their core components.

1. Wyner, Adam. 2008. An ontology in OWL for legal case-based reasoning. Artificial Intelligence and Law
2. Hoekstra, Rinke; Breuker, Joost; Di Bello, Marcello; Boer, Alexander et al. 2007. The LKIF Core Ontology of Basic Legal Concepts. LOAIT
3. Ajani, Gianmaria; Boella, Guido; Caro, Luigi Di; Robaldo, Livio; Humphreys, Llio; Praduroux, Sabrina; Rossi, Piercarlo; Violato, Andre. 2016. The European Taxonomy Syllabus: A multi-lingual, multi-level ontology framework to untangle the web of European legal terminology. Applied Ontology
4. El Ghosh, Mirna; Naja, H; Abdulrab, H; Khalil, M. 2016. Towards a middle-out approach for building legal domain reference ontology. International Journal of Knowledge Engineering

SLIDE 25


Next Steps

Registration by October 19, 2018, with name, student ID number (Matrikelnummer), and preferred topic, to jkrueger@ovgu.de

Topic assignment
Scheduling
