  1. My name is Rob Hooft, and I work for the Netherlands Bioinformatics Centre, NBIC for short. NBIC is a virtual organization: the community of the many organizations in the Netherlands that are doing bioinformatics. The organization has three target areas: research, education and support. I stand here as a representative of the "support" group. We in NBIC are collaborating with other technology providers in the Netherlands to set up DISC, which will form the Dutch node in ELIXIR, the European Life Science Data program. The goal of this talk is to explain to you why we have been looking for a single-sign-on solution, what we have done so far to implement it, and where we still see challenges.

  2. First I'd like to introduce you to the field of bioinformatics, and explain what we do as a virtual organization in this field. We define bioinformatics as anything that uses computer infrastructure to study the molecules of life. Application areas we deal with are health care, food and agriculture, and industrial production. Bioinformatics brings some specific challenges to IT!

  3. The first challenge is that we have to bridge a gap between the people who know how to implement computer solutions and the scientists, doctors, and laboratory workers who may have less of an affinity for technology, maybe even an aversion. We solve that problem like firemen passing buckets between the water source and the fire. So we do not have one man running from one side to the other (in our case: someone capable of doing both biology and computer science at top level), but we make sure that we have people in the middle who understand both the computer scientists and the biologists. You can imagine that developing tools for people who do not intend to be computer wizards brings its own challenges for usability and reliability. We heard this morning that even physicists cannot deal with X.509 certificates, so you can imagine where the limitation to 1-2% usage of the Grid by the life sciences comes from.

  4. A second challenge is that the data we are dealing with in bioinformatics is diverse: it contains, for example, background knowledge retrieved from the literature, image data, and measurements from blood samples. Typical traditional database designs that try to capture this diversity have hundreds of tables and thousands of columns. Semantic web technologies and NoSQL are being pioneered in this field as alternative solutions. Of course, we cannot let this advanced technology show through at the user interface.

  5. The third challenge is that the types of data and the methods to analyze them keep changing all the time. An example is the field of genomics. Since the first human genome was determined completely around the year 2000, we are now going through the third revolution of techniques used to determine this kind of information, each accompanied by its own computational challenges. Standards development for these rapidly changing data types is trailing years behind, which brings interesting difficulties for integrating data from different sources. New processing tools are developed all the time as the field progresses.

  6. This forms a nice segue to the next challenge in bioinformatics: the data volume is huge. Where the first human genome took 10 years and cost hundreds of millions of dollars, a new human genome can now be routinely determined in 2 weeks, costing about $5000. Genetics projects now routinely determine hundreds of these genomes and ask for hundreds of terabytes of storage each, with similarly large requirements for network and compute capacity. And this would be much larger if people kept the original raw measurements (high-resolution pictures). Genomics now produces more data worldwide than the Large Hadron Collider, and its growth still outpaces Moore's law every year.

  7. A last challenge, coming up in the medical branch of the life sciences, is that much of the data is patient-related and therefore subject to strict privacy regulations (which differ between countries).

  8. So, in this field, what do the people in "support" do? In the end, many of them program to fulfill our motto: "make other people's data work". Scientific programmers (as we call them) who work in bioinformatics support are, like the bioinformatics researchers, distributed over different labs in the country, at universities and companies. We also have people who are working for other virtual organizations and projects. Some of the programmers have training in bioinformatics, but we also have people trained in IT with an affinity for life sciences, and life scientists with an affinity for IT. By distributing the support people over the different locations, we keep them close to the experts and researchers, but also to the users of the technology.

  9. However, there is an extra challenge here: in contrast to researchers, who often do their work "alone" (PhD student or post-doc projects), the support people work on projects together. We need to make sure that this collaboration runs smoothly. To do this, we have set up task forces for a number of the subfields of bioinformatics, such as genomics. The group leaders (shown in green) in these task forces identify common problems in the field, and we make sure to solve them in the collaboration. Each task force has a technical project leader whose responsibility is to make sure the work can continue by organizing communications and logistics. The technical project leaders work together in the engineering team, where they minimize overlap between work that crosses task force boundaries and learn from each other's experience.

  10. The kinds of problems that are tackled in each task force are not new scientific problems. They are issues that have already been solved at least once somewhere in a research group and for which the solutions have been published, but for which the solution is not widely accessible to users. The task of people in bioinformatics support is to make tools accessible to a larger user community. The work involved ranges from improving the stability of the individual programs and augmenting the documentation, to making sure the individual tools can be used by people who are not experts in IT. We collect tools into systems that can run workflows, chaining tools together in flexible ways. One advantage of doing this development work together in groups is that all knowledge is shared between the participants at all times, so there is no single point of failure.

  11. I mentioned tools and workflow systems. Many of the things we develop end up being made available over the web, either as individual web services or in an aggregated workflow system. This is a major advantage for the users, who do not each need to install their own tools and also do not need to know the architecture of the IT infrastructure behind each tool. However, many of the tools need to identify the person who accesses the system, even in cases where privacy regulations do not apply. And thus we replace the nightmare of having to install all the different software tools that someone needs on a local computer system with a new nightmare of maintaining logins for all of the tools.

  12. It will be clear: we want to use a single-sign-on solution. With single-sign-on our users do not get worn-out fingers from typing their different passwords left and right, but can actually get their work done. And as an added advantage, the tools that we develop will feel a lot more consistent.

  13. This is where we met up with the experts from Surf, because we really do not want to do this all ourselves. Surf has a solution for single-sign-on that is called SurfConext. It is an implementation of the OpenConext framework, which will be presented in a lecture tomorrow. I will only mention here that OpenConext is a middleware framework based on two standards: SAML and OpenSocial. SAML, the Security Assertion Markup Language, allows implementation of single-sign-on for environments like ours, where people who are based in different organizations work together. OpenSocial is an API that allows embedding completely independent programs into a single web page, a portal. We have not yet looked at the OpenSocial component, but we are now implementing SurfConext SAML, and I will tell you some of the advantages that we have seen, and also some of the challenges we will still be facing in this implementation.

  14. The obvious advantage of SurfConext is of course for the users of our tools: single-sign-on is a big relief; it is the name of the technology and it is also in the title of this talk. Next to this obvious advantage for the user there are a few related ones: the user does not need to know who exactly provides which service, and since in SurfConext the organization needs to approve the services, the user has an easier job trusting the service providers, since they have already been screened by someone in the organization. But SurfConext has many advantages not only for the person using the software, but also for the people writing it.
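
For the people writing the software, the SAML part mentioned on slide 13 largely comes down to reading a statement about the user that an identity provider has issued. The following is a minimal Python sketch of that idea, not the SurfConext implementation: the sample assertion, the issuer URL, the user, and the choice of attributes are all made up for illustration, and the helper name `read_assertion` is hypothetical. It uses only the standard library and deliberately skips signature and validity checks, which a real service must perform (normally through a SAML library) before trusting any of these values.

```python
import xml.etree.ElementTree as ET

# XML namespace of SAML 2.0 assertions.
SAML_NS = {"saml": "urn:oasis:names:tc:SAML:2.0:assertion"}

# Hypothetical, unsigned assertion; all values are illustrative assumptions.
SAMPLE_ASSERTION = """\
<saml:Assertion xmlns:saml="urn:oasis:names:tc:SAML:2.0:assertion"
                ID="_example" Version="2.0" IssueInstant="2012-01-01T00:00:00Z">
  <saml:Issuer>https://idp.example.org</saml:Issuer>
  <saml:Subject>
    <saml:NameID>researcher@example.org</saml:NameID>
  </saml:Subject>
  <saml:AttributeStatement>
    <saml:Attribute Name="mail">
      <saml:AttributeValue>researcher@example.org</saml:AttributeValue>
    </saml:Attribute>
    <saml:Attribute Name="schacHomeOrganization">
      <saml:AttributeValue>example.org</saml:AttributeValue>
    </saml:Attribute>
  </saml:AttributeStatement>
</saml:Assertion>
"""


def read_assertion(xml_text):
    """Extract issuer, subject and attributes from a SAML 2.0 assertion.

    This only parses the XML. It does NOT verify the signature, the
    audience, or the validity period, so it is not safe for production use.
    """
    root = ET.fromstring(xml_text)
    issuer = root.findtext("saml:Issuer", namespaces=SAML_NS)
    name_id = root.findtext("saml:Subject/saml:NameID", namespaces=SAML_NS)
    attributes = {}
    for attr in root.iterfind("saml:AttributeStatement/saml:Attribute", SAML_NS):
        # An attribute may carry several values, so keep them as a list.
        values = [v.text for v in attr.iterfind("saml:AttributeValue", SAML_NS)]
        attributes[attr.get("Name")] = values
    return {"issuer": issuer, "name_id": name_id, "attributes": attributes}


if __name__ == "__main__":
    info = read_assertion(SAMPLE_ASSERTION)
    print(info["issuer"], info["name_id"], info["attributes"])
```

The point of the sketch is that each tool receives the same kind of statement from the same trusted place, so it no longer needs its own password database; it only needs to decide what an already identified user is allowed to do.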
