EGI- FORUM Vilnius 2001 CORAL A Relational Abstraction Layer for - - PowerPoint PPT Presentation

egi forum vilnius 2001
SMART_READER_LITE
LIVE PREVIEW

EGI- FORUM Vilnius 2001 CORAL A Relational Abstraction Layer for - - PowerPoint PPT Presentation

EGI-InSPIRE EGI- FORUM Vilnius 2001 CORAL A Relational Abstraction Layer for C++ and Python applications Raffaello Trentadue On behalf of the IT-ES Persistency Framework group 1 4/12/11 www.egi.eu www.egi.eu EGI-InSPIRE RI-261323


slide-1
SLIDE 1

www.egi.eu EGI-InSPIRE RI-261323

EGI-InSPIRE

www.egi.eu EGI-InSPIRE RI-261323

EGI- FORUM Vilnius 2001

CORAL A Relational Abstraction Layer for C++ and Python applications

4/12/11 1

Raffaello Trentadue On behalf of the IT-ES Persistency Framework group

slide-2
SLIDE 2

www.egi.eu EGI-InSPIRE RI-261323

Outline

4/12/11 2

 The context where CORAL was designed and implemented

 CORAL INTRODUCTION  What’s CORAL  Why CORAL?  CORAL in practice  DB connection  Table Creation/Filling/Deletion  Query  CORAL STRUCTURE  CORAL back-ends  Why should a user use CORAL?  CONCLUSIONS

slide-3
SLIDE 3

www.egi.eu EGI-InSPIRE RI-261323

LHC challenge

 Enormous amount of data to be

processed and analyzed (hundred petabytes over the whole lifetime).  Impossible to implement a unique CERN analysis facility  Boost of the development of the grid computing infrastructure and technology.  A distributed analysis model implies an in-homogeneity in the data storage infrastructure across the different institutes and across the long LHC lifetime.  The computing infrastructure has to be easily maintainable and adaptable.

3 4/12/11

slide-4
SLIDE 4

www.egi.eu EGI-InSPIRE RI-261323

Persistency framework

In three of the experiments (ATLAS, CMS and LHCb), some types of data are stored and accessed using the software developed by the Persistency Framework (PF) within the Application Area (AA) of the LHC Computing Grid (LCG) to find common solutions for the LHC experiments.

 COOL provides specific software to handle the

time variation and versioning of conditions data.

 POOL is a generic hybrid store

for C++ objects, metadata catalogues and collections, using s t re a m i n g a n d re l a t i o n a l technologies.  CORAL is a generic abstraction layer with an SQL-free API for accessing relational databases.

4 4/12/11

slide-5
SLIDE 5

www.egi.eu EGI-InSPIRE RI-261323

CORAL: introduction

Common Relational Abstraction Layer (CORAL) is a C++ framework to access data in relational databases. The C++ API of CORAL consists of a set of SQL-free abstract interfaces that isolate the user code from the database implementation technology.

What’s CORAL ?

Python bindings of the API are also available.

5 4/12/11

slide-6
SLIDE 6

www.egi.eu EGI-InSPIRE RI-261323

Why CORAL?

Why CORAL ?

In the distributed data analysis model, there is an in- homogeneity in the data storage (database) infrastructure and security policies across the different institutes.  Users write the same code for all back-ends  A detailed knowledge of the many SQL flavors is not required  The SQL commands specific to each backend are executed by the relevant CORAL libraries, which are loaded at run-time by a special plugin infrastructure. CORAL provides a set of C++ libraries for several database back-ends:  local access to SQLite files;  direct client access to Oracle and MySQL servers;  read-only access to Oracle through the FroNTier/Squid

  • r CoralServer/CoralServerProxy

server/cache system.

6 4/12/11

slide-7
SLIDE 7

www.egi.eu EGI-InSPIRE RI-261323

CORAL Backends

7 4/12/11

slide-8
SLIDE 8

www.egi.eu EGI-InSPIRE RI-261323

CORAL Functionalities

– Database indirection – Connection pooling / sharing – Reconnection – Data Definition Language (DDL) operations

  • Creating, altering tables, views
  • Multicolumn keys, indices and constraints
  • BLOB, fixed or variable-size string variables

– Data Manipulation Languages (DML) operations

  • “Bulk” operations
  • Insert-select statements

– Queries

8 4/12/11

slide-9
SLIDE 9

www.egi.eu EGI-InSPIRE RI-261323

DB Connection

Using cx_Oracle python package:

conn = cx_Oracle.Connection("user/ password@server")

Using SQLPlus:

CONNECT <user_name>/<password>@<service_name>

Using MySQL:

mysql –u USERNAME -h DBSERVER -p DBNAME EXEC SQL CONNECT TO 'db' AS ’dbname' USER 'ident' USING 'pswd';

Too many styles!!!....and it is necessary to specify username and password ???!!!

9 4/12/11

slide-10
SLIDE 10

www.egi.eu EGI-InSPIRE RI-261323

DB connection in CORAL

  • 1. dbname = ”sqlite_file://coralData.db”
  • 2. dbname = “mysql://myhost/mydb”
  • 3. dbname = ”oracle://oradb_name/schema_name”
  • a. session = svc.connect(dbName, accessMode =

coral.access_Update)

  • b. session = svc.connect(’/my/database/schema’, accessMode =

coral.access_ReadOnly) Have no fear daddy! …. Here the way to deal with any database, without entering neither username

  • r password!!!!

10 4/12/11

slide-11
SLIDE 11

www.egi.eu EGI-InSPIRE RI-261323

DB replica lookup and authentication

<connectionlist> <connection name=”oracle://oradb_name/schema_name”> <parameter name=”user” value=”schema_reader” /> <parameter name=”password” value=”a_passwd” /> </connection> <connection name=”mysql://myhost/mydb”> <parameter name=”user” value=”db_user” /> <parameter name=”password” value=”my_passwd” /> </connection> </connectionlist> <servicelist> <logicalservice name=”/my/service”> <service name=”mysql://mydb/db” accessMode=”Update” authentication=”password”/> <service name=”oracle://db1/sch” accessMode=”Read” authentication=”certificate”/> <service name=”sqlite_file:data.db” accessMode=”Read” authentication=”password”/> </logicalservice> </servicelist>

11 4/12/11

slide-12
SLIDE 12

www.egi.eu EGI-InSPIRE RI-261323

Behind the scenes

  • ConnectionService internal operations:

– What are the available replicas for “/my/conditions/data” – The Lookup Service will reply “oracle://host/schema”, ... – The Relational Service will load the oracle plug-in” – The Authentication Service will return user name and password corresponding to “oracle://host/schema”. – The oracle plug-in will connect and authenticate

  • In case of failure the Connection Service tries the next replica
  • The user has a valid handle for performing DDL, DML, queries.

12 4/12/11

slide-13
SLIDE 13

www.egi.eu EGI-InSPIRE RI-261323

SQL-free C++ API

coral::ISchema& schema = session.nominalSchema(); coral::TableDescription tableDescription; tableDescription.setName( “T_t” ); tableDescription.insertColumn( “I”, “int” ); tableDescription.insertColumn( “X”, “double” ); schema.createTable( tableDescription); Schema.dropIfTableExists(“T_t”) CREATE TABLE “T_t” (I INTEGER,X BINARY_DOUBLE) IF EXISTS(SELECT TABLE_NAME FROM SCHEMA_NAME.TABLES WHERE TABLE_NAME = “T_t”) DROP TABLE “T_t” CREATE TABLE T_t(I INT,

  • X DOUBLE
  • PRECISION)

DROP TABLE [IF EXISTS] T_t

Oracle CORAL MySQL CORAL

13 4/12/11

slide-14
SLIDE 14

www.egi.eu EGI-InSPIRE RI-261323

How to fill a table

rowBuffer = coral.AttributeList() table.dataEditor().rowBuffer( rowBuffer ) rowBuffer["I"].setData(10) rowBuffer["X"].setData(10000000) INSERT INTO “T_t” (I, X) VALUES (10,10000000) INSERT INTO T_t (I,X) VALUES(10,10000000)

MySQL Orcale CORAL CORAL

14 4/12/11

slide-15
SLIDE 15

www.egi.eu EGI-InSPIRE RI-261323

Query and loop

while ( cursor.next() ): currentRow = cursor.currentRow() i = currentRow["I"].data() x = currentRow["X"].data() query = schema.tableHandle (tableName).newQuery() cursor = query.execute() SELECT * FROM T_t

Oracle/MySQL CORAL With CORAL it is easy to handle the loop on the data retrieved by the query.

15 4/12/11

slide-16
SLIDE 16

www.egi.eu EGI-InSPIRE RI-261323

Bulk operations

BLOB (Binary Large Object) is a large collection of binary data stored in a database table. The Oracle database enables use of bulk

  • peration in order to reduce time needed for

commands execution over a large amount of data.

data = open("res/london.jpg", "rb").read() blob = coral.Blob() cPickle.dump(data,blob,1) description.insertColumn( "photo", "blob") table = schema.createTable ( description ) rowBuffer = coral.AttributeList() table.dataEditor().rowBuffer ( rowBuffer ) bulkInserter = table.dataEditor ().bulkInsert( rowBuffer, 3 ) rowBuffer["photo"].setData(blob) bulkInserter.flush() del bulkInserter

4/12/11 16

slide-17
SLIDE 17

www.egi.eu EGI-InSPIRE RI-261323

Bind variables

BindVariableList = coral.AttributeList() BindVariableList.extend("idvalue", typeid(int) ) BindVariableList["idvalue"].data<int>()=1 query.setCondition( "ID > :idvalue", BindVariableList )

 Oracle first checks in the shared pool to see whether the query statement has been submied before.  Bind variables are «substitution» variables that are used in place of literals and that have the effect

  • f sending exactly the same SQL to Oracle every

time the query is executed.  Reducing the latch activity in the SGA

4/12/11 17

slide-18
SLIDE 18

www.egi.eu EGI-InSPIRE RI-261323

Oracle OCI Access

Te Oracle Call Interface (OCI) is an application programming interface (API) which allows developers to build applications using low-level C and C++ function calls to access an Oracle database server. Similarly, not only does OCI allow users to control all the aspects of SQL statement execution, but it also fully supports the data-types, calling conventions, syntax, and semantics of C and C++.

18

CORAL implements OCI internally and allows the user to avoid a complex OCI implementations.

4/12/11

slide-19
SLIDE 19

www.egi.eu EGI-InSPIRE RI-261323

CORAL Server

A middle-tier server introduced b e t w e e n C O R A L c l i e n t applications and the Oracle database servers.

Client applications connect to the “CORAL server” through a new dedicated CoralAccess plugin, while the CORAL server, itself a CORAL application, would connect to the Oracle database server inside the site firewall.

19 4/12/11

slide-20
SLIDE 20

www.egi.eu EGI-InSPIRE RI-261323

Frontier Access

The FronNTier client converts the SQL into an HTTP GET and sends it over the network to the ForNTier server. The FroNTier server unpacks the SQL request, sends it to the DB server, and retrieves the needed data. The data is optionally compressed, then packed into an HTTP formaed stream and sent back to the client. Squid proxy/caching servers between the FroNTier server and the client caches requested objects, significantly improving performance and greatly reducing the load on the central database.

20 4/12/11

slide-21
SLIDE 21

www.egi.eu EGI-InSPIRE RI-261323

CORAL/FRonTier Server advantages

  • The Oracle credentials could be stored on the server, which would use them for clients

authenticated and authorized using their proxy certificates. Access from several concurrent clients to the same Oracle schemas could be multiplexed through a single connection from the server to the database, reducing the load on the laer.

  • Client applications would also no longer depend on the Oracle client software, which is
  • nly needed in the server.
  • The model above has then been extended by introducing an additional “server proxy”

tier between the clients and the server, to provide data caching and further multiplexing for read-only use cases.

21 4/12/11

slide-22
SLIDE 22

www.egi.eu EGI-InSPIRE RI-261323

CORAL Impact

The CORAL framework is at the moment crucial for reading the condition data of almost all the LHC experiments. The reason of its success is in:

  • 1. its capacity to shield the user applications from the SQL dialects used to access any

relational database;

  • 2. its capacity to be integrated in a grid distributed analysis model:

 database indirection to replicas  support for Frontier/Coral server Furthermore, the user does not need to be an expert in all the possible database techniques, as they are already implemented in the backend accessing the plugins. These features allow CORAL to address the needs of any environment outside of High Energy Physics where safe access to a relational database is required, especially if in the context of a distributed analysis.

22 4/12/11

slide-23
SLIDE 23

www.egi.eu EGI-InSPIRE RI-261323

Conclusion

The software is by now mature in its development cycle, but a large development and support effort is still required for user support, service operation and maintenance tasks. Regular production releases are prepared whenever one of the LHC experiments demands it, leading to one release per month on average. The CORAL software is flexible and could be deployed in any environment where the access to relational databases is necessary, in particular in a distributed analysis. The user does not need almost any knowledge about SQL.

23 4/12/11

slide-24
SLIDE 24

www.egi.eu EGI-InSPIRE RI-261323

PF in the experiments

24 4/12/11

slide-25
SLIDE 25

www.egi.eu EGI-InSPIRE RI-261323

CORAL SERVER Deployment

25 4/12/11

slide-26
SLIDE 26

www.egi.eu EGI-InSPIRE RI-261323

CORAL packages

CoralBase Relational Access XML Authentication Service XML Lookup Service LFC Replica Service Connection Service Oracle Access MySQL Access SQLite Access Coral Access Frontier Access PyCoral Tests API are implemented as abstract classes Access point to CORAL CORAL backends CoralKernel CoralCommon

26 4/12/11

slide-27
SLIDE 27

www.egi.eu EGI-InSPIRE RI-261323

Attribute list

The main interface which is used for all the data exchange between a CORAL-based software component and a relational database is the AributeList class. It is a container of Aributes, which are named variables of simple types. The AributeList can therefore be thought of as a buffer holding the data of a table

  • row. It is used both for input and output operations.

4/12/11 27

slide-28
SLIDE 28

www.egi.eu EGI-InSPIRE RI-261323

Bulk operation details

SQL is the interface that enables access to data within a database. PL/SQL is a procedural expansion of SQL programming language, created by Oracle. It enables cyclic passing through records, processing exceptions, working with variables, parameters, collections, records, sequences, cursors etc. SQL sends inquiries to SQL engine which executes the inquiry and in some cases returns data to PL/SQL engine. This process is called context switching. Every switch of context between PL/SQL engine and SQL engine reduces performances. The best solution is use of bulk operations.

4/12/11 28