Seminar on Modern Software Engineering and Database Concepts

SLIDE 1

DBSE: Databases and Software Engineering

Seminar on Modern Software Engineering and Database Concepts

Gunter Saake, David Broneske, Gabriel Campero Durand, Bala Gurumurthy, Jacob Krüger, Sabine Wehnert, Roman Zoun

Databases and Software Engineering Workgroup

April 4, 2019

SLIDE 2

Organization: Course Types

Proseminar, Bachelor, 3 CP
Deliverables:

20-minute talk
Evaluation of 2 other talks

Scientific Seminar, Bachelor, 3 CP
Deliverables:

20-minute talk
Written paper of 5-8 pages
Review of one other paper

Saake et al. Seminar on Modern Software Engineering and Database Concepts 2

SLIDE 3

Organization: Procedure

Kick-off meeting (today)
Meeting with a supervisor from the workgroup (depending on the topic)
Lectures on scientific writing and presentations on individual dates
Talks held as a block seminar (the date will be set at the end)
Slides are submitted one week before the talk (for feedback)
The scientific paper (Scientific Seminar only) must be submitted 2 weeks before the talk

SLIDE 4

What Do We Do?

Learn important "soft skills" (key competencies)
Practice presentation technique and style
Experience "conference flair"
Write a scientific paper
Work with appropriate templates (recommendation: LaTeX)
Get acquainted with a new, exciting topic
Topics come from current research
→ Possible topic for a Bachelor thesis / team project

SLIDE 5

Talk

20-minute talk
5-10 minutes of discussion/questions
Running over time: the speaker is cut off
Finishing too early: more questions (possibly more criticism)
A computer is provided; test it before the session!

SLIDE 6

Talk: Evaluation

Evaluation Sheet: Talk

Databases and Software Engineering Workgroup

Title of the talk:

Scale: Very good . . . Neutral . . . Poor

Presentation
Rating: 1 2 3 4 5 6 7

Appearance, e.g.:
Charisma, dynamism
Ability to motivate, persuasiveness

Speech and voice, e.g.:
Volume, modulation, clarity
Speaking rate and fluency

Facial expressions and gestures, e.g.:
Posture, movements
Expression, line of sight

Professional impression, e.g.:
Competence, seriousness
Reliability

Slides
Rating: 1 2 3 4 5 6 7

Design, e.g.:
Colors, fonts, formatting
Clarity, clear layout

SLIDE 7

Scientific Paper

SLIDE 8

Why Write a Paper?

Announce new achievements/experiences

Publishing is the result of scientific work
Research is never finished as long as it has not been published

Inform others (e.g., the community) about your own work

Recognition/attention
Contacts, valuable cooperation/collaboration
Feedback

For you: → Practice for the Bachelor thesis

SLIDE 9

Paper: Evaluation

Evaluation Sheet: Scientific Paper

Databases and Software Engineering Workgroup

Title of the paper:
Author:
Reviewer:

Scale: Very good . . . Neutral . . . Poor

Title, Abstract, Introduction
Rating: 1 2 3 4 5 6 7
Discussion points (e.g.):

Suitable title
Quality of the abstract
Sufficient motivation
Clear problem statement

Structure
Rating: 1 2 3 4 5 6 7
Discussion points (e.g.):

Common thread (a coherent line of argument)

SLIDE 10

Presentation of Topics

SLIDE 11

Code Optimizations (Broneske)

CPU "smaller than" selection

    int pos = 0;
    for (int i = 0; i < array_size; ++i) {
        // store the position of every value below the comparison value
        if (array[i] < comp_val)
            result[pos++] = i;
    }

GPU "smaller than" selection

    int tid = threadIdx.x + blockIdx.x * blockDim.x;
    // each thread strides over the array by the total number of threads
    while (tid < array_size) {
        bitmask[tid] = (array[tid] < comparison_value);
        tid += blockDim.x * gridDim.x;
    }

1. B. Raducanu, P. Boncz, M. Zukowski. 2013. Micro Adaptivity in Vectorwise. SIGMOD
2. K. Datta, M. Murphy, V. Volkov, et al. 2008. Stencil Computation Optimization and Auto-tuning on State-of-the-Art Multicore Architectures. SC
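
To illustrate the kind of code variant this topic is about, the CPU selection shown on this slide can be written in a branching form and in a branch-free (predicated) form. The following is a minimal sketch and is not taken from the slides: the function names `select_less_branch` and `select_less_nobranch` and the use of `size_t` for positions are assumptions made for illustration. The branch-free variant assumes that `result` has room for `array_size` entries, because it writes a candidate position in every iteration.

```c
#include <stddef.h>

/* Branching variant (as on the slide): a position is written only
 * when the predicate holds. Returns the number of matches. */
size_t select_less_branch(const int *array, size_t array_size,
                          int comp_val, size_t *result) {
    size_t pos = 0;
    for (size_t i = 0; i < array_size; ++i) {
        if (array[i] < comp_val)
            result[pos++] = i;
    }
    return pos;
}

/* Branch-free variant: the position is always written, but the output
 * cursor only advances when the predicate holds, so a non-matching
 * entry is overwritten by the next candidate. */
size_t select_less_nobranch(const int *array, size_t array_size,
                            int comp_val, size_t *result) {
    size_t pos = 0;
    for (size_t i = 0; i < array_size; ++i) {
        result[pos] = i;
        pos += (size_t)(array[i] < comp_val);
    }
    return pos;
}
```

Both functions produce the same position list. Which one runs faster typically depends on the selectivity of the predicate and on the hardware, since the branch-free form trades unconditional writes for the absence of branch mispredictions.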

SLIDE 12

Multi-Dimensional Index Structures for Main Memory (Broneske)

Main-memory databases are a hot research topic. Accelerated scans are currently in focus, but adapted classical index structures should not be disregarded. The questions are: Which classical index structures make sense for main memory? Which adaptations of classical index structures are useful in main-memory databases?

1. Volker Gaede and Oliver Günther. 1998. Multidimensional access methods. ACM Computing Surveys
2. Kim, Changkyu, et al. 2010. FAST: Fast architecture sensitive tree search on modern CPUs and GPUs. SIGMOD

SLIDE 13

Machine Learning on Graph-Databases (Campero)

Graph databases are a special kind of general data management system optimized for network-oriented analytical queries and storage. They are mainly developed to support a specific representation of a graph, namely property graphs. However, recent trends require further features from these databases, either to support novel data representations (embeddings) or highly efficient feature engineering processes. In this seminar topic we aim to study some of these trends by considering one of two applications: machine learning on networks, or graph-based recommenders. For the chosen application, we carefully describe the domain, take a detailed look at a given example study, and outline the implications for system development.

1. Eksombatchai, Chantat, Pranav Jindal, Jerry Zitao Liu, Yuchen Liu, Rahul Sharma, Charles Sugnet, Mark Ulrich, and Jure Leskovec. 2018. Pixie: A system for recommending 3+ billion items to 200+ million users in real-time. WWW
2. Cao, Yixin, Xiang Wang, Xiangnan He, and Tat-Seng Chua. 2019. Unifying Knowledge Graph Learning and Recommendation: Towards a Better Understanding of User Preferences.
3. Hodler, Amy E., and Needham, Mark. 2019. Graph Algorithms.
4. Mutlu, Ece C., and Toktam A. Oghaz. 2019. Review on Graph Feature Learning and Feature Extraction Techniques for Link Prediction.

SLIDE 14

Learning to Hash (Campero)

High-dimensional data (e.g., images or latent representations) is becoming increasingly important for advanced analytical use cases in industry. However, working efficiently with such data requires clever hashing schemes that accelerate similarity searches through improved data organization. In this seminar topic we aim to create a taxonomy of hashing approaches for similarity search. We also propose to look closely at two approaches (one simple and one using supervised learning). As a bonus, we seek to report on available libraries and repositories that help in adopting these techniques for everyday data management.
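
As a rough idea of what a "simple" hashing scheme for similarity search can look like, the sketch below implements sign random projections, a data-independent scheme in which each hash bit records on which side of a hyperplane a vector lies. It is only an illustration under stated assumptions and not one of the approaches prescribed by the topic: the function names `srp_hash` and `hamming` are hypothetical, the hyperplanes are passed in by the caller (in practice they would be drawn at random), and at most 32 bits are supported.

```c
#include <stddef.h>
#include <stdint.h>

/* Sign-random-projection hash of vector v with `dim` components.
 * `planes` holds `bits` hyperplane normals, stored row by row.
 * Bit b of the code is 1 iff dot(planes[b], v) >= 0.
 * Assumption: bits <= 32. */
uint32_t srp_hash(const float *v, const float *planes,
                  size_t dim, size_t bits) {
    uint32_t code = 0;
    for (size_t b = 0; b < bits; ++b) {
        float dot = 0.0f;
        for (size_t d = 0; d < dim; ++d)
            dot += planes[b * dim + d] * v[d];
        if (dot >= 0.0f)
            code |= (uint32_t)1 << b;
    }
    return code;
}

/* Number of differing bits between two hash codes. */
unsigned hamming(uint32_t a, uint32_t b) {
    uint32_t x = a ^ b;
    unsigned n = 0;
    while (x) {
        x &= x - 1; /* clear the lowest set bit */
        ++n;
    }
    return n;
}
```

Vectors that point in similar directions tend to fall on the same side of most hyperplanes, so their codes differ in few bits; comparing short codes by Hamming distance can then stand in for comparing the original high-dimensional vectors. Learning-based approaches replace the random hyperplanes with hash functions fitted to the data.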

1. Pagh, R. 2018. Similarity Sketching. Encyclopedia of Big Data Technologies.
2. Wang, Jun, Wei Liu, Sanjiv Kumar, and Shih-Fu Chang. 2016. Learning to hash for indexing big data - A survey. Proceedings of the IEEE.

SLIDE 15

Evolution of column-oriented RDBMS operations in modern hardware perspective (Gurumurthy)

The current trend in RDBMS research is moving towards close-to-the-metal re-implementations of typical DBMS operations for the underlying hardware. With newer hardware features (such as multi-core and SIMD) and device architectures (GPUs, FPGAs) becoming available, research is being done on tuning the operations to the hardware. In this work, we survey the evolution of DBMS operations, using the availability of newer hardware as reference points. In the end, the work provides a view of the hardware landscape, the changes applied to DBMS operations, and the areas in which research is dense or sparse.

1. Sebastian Breß: GPU-Accelerated Database Systems: Survey and Open Challenges.
2. Peter Bakkum: Accelerating SQL database operations with CUDA.
3. Bingsheng He: Relational co-processing in graphics processors.
4. J. Zhou: Implementing Database Operations Using SIMD Instructions.

SLIDE 16

GPU Cache management techniques for data processing environment (Gurumurthy)

Because cache space on a GPU is limited, not all input data can be processed and stored on the GPU. As an alternative, it has been proposed to keep hot input data buffers on the GPU so that they can be processed further without transfer overhead. In this work, we look into the issue of caching on a GPU and list the possible alternatives. Since columns cannot be stored directly on a GPU, we look for alternative representations of the data that are still sufficient for performing database operations on them (such as bitmaps or position lists). Overall, the work presents the state-of-the-art techniques for intermediate representations of columns on a GPU as well as the buffer management techniques used for GPU caching.
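
The two intermediate representations named above, bitmaps and position lists, encode the same selection result in different ways. The sketch below shows how they relate, under stated assumptions: it is plain C running on the CPU rather than a GPU implementation, and the function names `build_bitmap` and `bitmap_to_positions` are hypothetical and do not come from the cited papers.

```c
#include <stddef.h>
#include <stdint.h>

/* Build a packed bitmap for the predicate column[i] < comp_val.
 * Bit (i % 8) of byte (i / 8) is set iff row i qualifies.
 * `bitmap` must provide (n + 7) / 8 bytes. */
void build_bitmap(const int *column, size_t n, int comp_val,
                  uint8_t *bitmap) {
    for (size_t i = 0; i < (n + 7) / 8; ++i)
        bitmap[i] = 0;
    for (size_t i = 0; i < n; ++i) {
        if (column[i] < comp_val)
            bitmap[i / 8] |= (uint8_t)(1u << (i % 8));
    }
}

/* Expand a bitmap over n rows into a list of qualifying row positions.
 * `positions` must have room for n entries in the worst case.
 * Returns the number of positions written. */
size_t bitmap_to_positions(const uint8_t *bitmap, size_t n,
                           uint32_t *positions) {
    size_t count = 0;
    for (size_t i = 0; i < n; ++i) {
        if (bitmap[i / 8] & (1u << (i % 8)))
            positions[count++] = (uint32_t)i;
    }
    return count;
}
```

A bitmap always costs one bit per row regardless of how many rows qualify, whereas a position list costs one integer per qualifying row. The bitmap is therefore the more compact form unless only a small fraction of rows qualifies, which matters when device memory is scarce.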

1. Holger Pirk: Waste Not... Efficient Co-Processing of Relational Data
2. Jiong He: In-cache query co-processing on coupled CPU-GPU architectures
3. Peter Bakkum: Efficient Data Management for GPU Databases
4. Guenther Schindler: Techniques for Caches in GPUs

SLIDE 17

Industrial Interests in Systematic Software Reuse (Krüger)

Systematic software reuse in terms of software product lines is often only introduced after a larger set of different variants has evolved. For varying reasons, including cost reduction, faster development, or improved management, these variants are merged and integrated into a platform (reverse engineering). While there are several case studies that report on the migration processes and experiences, we still need a detailed analysis of the actual industrial motivations that lead to the adoption of product lines. To this end, we aim to analyze several years of the SPLC industry track to identify industrial case studies that are concerned with such migrations and identify the motivations of the organizations.

1. Identified within 3 years of conference/journal papers
2. Rabiser, R., Schmid, K., Becker, M., Botterweck, G., Galster, M., Groher, I., Weyns, D. (2018). A study and comparison of industrial vs. academic software product line research published at SPLC. International Conference on Systems and Software Product Line. 14-24. ACM.

SLIDE 18

Automated Test Refactoring (Krüger)

Software is regularly updated or refactored, for example, to remove errors, introduce new features, or migrate towards a new technology. However, any change in the production software also means that the corresponding test cases may break or may no longer be sufficient. The purpose of this survey is to identify and summarize existing techniques for automated test case refactoring, meaning techniques that track code changes and support developers in maintaining the test cases for these artifacts.

1. Peng-Hua Chu, Nien-Lin Hsueh, Hong-Hsiang Chen, and Chien-Hung Liu. 2012. A Test Case Refactoring Approach for Pattern-Based Software Development. Software Quality Journal
2. Arie van Deursen, Leon Moonen, Alex van den Bergh, and Gerard Kok. 2002. Extreme Programming Perspectives. Chapter: Refactoring Test Code

SLIDE 19

How Do We Forget? (Krüger)

Understanding a program is an essential activity in software engineering, and the research area of program comprehension is extensively investigated. However, most studies are concerned with recovering an understanding of a program and with how to improve code design for this purpose. Such processes resemble the learning of artifacts. In contrast, the process of forgetting in software engineering is rarely investigated. With this project, we aim to provide an overview of existing studies that are concerned with forgetting in software engineering and with the factors that affect developers' memory.

1. Krüger, J., Wiemann, J., Fenske, W., Saake, G., Leich, T. (2018). Do you remember this source code? International Conference on Software Engineering. 764-775. IEEE.
2. Fritz, T., Murphy, G., Hill, E. 2007. Does a Programmer's Activity Indicate Knowledge of Code? Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on The Foundations of Software Engineering. ACM, 341-350.
3. Kang, K., Hahn, J. (2009). Learning and Forgetting Curves in Software Development: Does Type of Knowledge Matter? International Conference on Information Systems.

SLIDE 20

Legal Big Data: Rumor or Reality? (Wehnert)

Since the number of laws worldwide is constantly increasing, it is overwhelming for a single person to keep track of new or changed regulations. We are interested in whether this large amount of text is already a case for common Big Data applications and cloud computing. While it is quite common to use these methods for social network data that is generated every millisecond, such as tweets, new regulations take significantly longer to create, and we therefore assume that overall less text has to be analyzed in the legal domain. Are there already use cases for legal data streaming and distributed text processing, or any task justifying the term "Legal Big Data"?

1. Moses, Lyria Bennett, and Janet Chan. "Using big data for legal and law enforcement decisions: Testing the new tools." UNSWLJ 37 (2014): 643.
2. Sokolova, Marina. "Big text advantages and challenges: classification perspective." International Journal of Data Science and Analytics 5.1 (2018): 1-10.
3. Antoniou, Grigoris, George Baryannis, Sotiris Batsakis, Guido Governatori, Livio Robaldo, Giovanni Siragusa, and Ilias Tachmazidis. "Legal Reasoning and Big Data: Opportunities and Challenges." 2018.

SLIDE 21

Ensuring National Compliance to European Law with Text Analysis (Wehnert)

Nowadays, legal compliance systems need to monitor how national norms change in relation to each other. The European Union has its own legislation, which each member country subsequently turns into concrete national law. Unfortunately, there can be differences between national law and the European directives, which cause conflicts for individuals and companies operating in multiple countries. The aim of this work is to find state-of-the-art methods that use artificial intelligence to identify the similarities and differences between national laws and their European counterparts. What are the best approaches to find law violations?

1. Nanda, Rohan, Giovanni Siragusa, Luigi Di Caro, Guido Boella, Lorenzo Grossio, Marco Gerbaudo, and Francesco Costamagna. "Unsupervised and supervised text similarity systems for automated identification of national implementing measures of European directives." Artificial Intelligence and Law, 2018.
2. Cardellino C, Teruel M, Alemany LA, Villata S (2017) A low-cost, high-coverage legal named entity recognizer, classifier and linker. In: Proceedings of the 16th edition of the international conference on artificial intelligence and law. ACM, pp 9-18
3. Fjelstul, Joshua C., and Clifford J. Carrubba. "The politics of international oversight: Strategic monitoring and legal compliance in the European Union." American Political Science Review 112.3 (2018): 429-445.

SLIDE 22

Cloud-based Protein Identification (Zoun)

Mass spectrometers are devices that digitize real-world samples, with growing success on the market. The technology sequences proteins to identify protein biomarkers of biological environments, such as oceans, humans, or microbial communities, which are used in the research fields of proteomics, metaproteomics, and metabolomics. These biomarkers are similar to a fingerprint and can be used to identify the sample data. Due to rapid quality improvements, mass spectrometers produce ever-increasing amounts of data, resulting in terabytes of output from a single machine. The analysis step, so-called protein identification, is used to gain insights into the sample data. Protein identification is now a big data problem.

Task: Find protein identification solutions that use big data technology and map them to the big data landscape.

1. R. Millioni, C. Franchin, P. Tessari, R. Polati, D. Cecconi, and G. Arrigoni. Pros and cons of peptide isolectric focusing in shotgun proteomics. Journal of Chromatography A, 1293:1-9, 2013.
2. R. D. Bjornson, N. J. Carriero, C. Colangelo, M. Shifman, K.-H. Cheung, P. L. Miller, and K. Williams. X!!tandem, an improved method for running x!tandem in parallel on collections of commodity computers. Journal of Proteome Research, 7(1):293-299, 2008. PMID: 17902638

SLIDE 23

Next Steps

Registration by April 12, 2019, with name, matriculation number, and preferred topic, sent to jkrueger@ovgu.de (and your supervisor)
Topic assignment
Scheduling