
Sep 03, 2022


Using ElasticSearch as a fast, flexible, and scalable solution to search occurrence records and checklists
Christian Gendreau, Canadensys
Marie-Elise Lecoq, GBIF France

Introduction
ElasticSearch is an open source, document-oriented, distributed search engine, built on top of Apache Lucene. (From the ElasticSearch GitHub page)

Setup
• Java 6 or higher
• Download: # wget …elasticsearch-0.90.5.zip
• Unzip

Configuration
• Name your cluster
• Replication and multiple shards are enabled by default
• Start: # bin/elasticsearch

Add data
Using the REST API:
$ curl -XPUT 'http://localhost:9200/twitter/tweet/1' -d '{
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elastic Search"
}'

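Readers who call the REST API from Java instead of curl can build the request above with the JDK's own HTTP client (java.net.http, Java 11+). This is a minimal sketch and is not part of the original talk. The class and method names are hypothetical. The Content-Type header is included because Elasticsearch 6.0 and later reject request bodies that lack one. Newer versions also remove mapping types such as tweet from the URL, so this path only matches the 0.90-era API shown in the slides.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper (not from the talk): rebuilds the slide's curl PUT in Java 11+.
class IndexRequestSketch {
    // Same document as the curl example.
    static final String TWEET_JSON = "{ \"user\" : \"kimchy\", "
            + "\"post_date\" : \"2009-11-15T14:12:12\", "
            + "\"message\" : \"trying out Elastic Search\" }";

    // Builds PUT {baseUrl}/{index}/{type}/{id}; nothing is sent at this point.
    static HttpRequest buildPut(String baseUrl, String index, String type, String id, String json) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/" + index + "/" + type + "/" + id))
                // Elasticsearch 6.0+ rejects request bodies without a JSON Content-Type.
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    // Sends the request to a running node and returns the raw JSON response body.
    static String send(HttpRequest request) throws IOException, InterruptedException {
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```

Against a local node, `IndexRequestSketch.send(IndexRequestSketch.buildPut("http://localhost:9200", "twitter", "tweet", "1", IndexRequestSketch.TWEET_JSON))` should return the indexing acknowledgement as JSON.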
Import data
Rivers:
• Document-based database (MongoDB)
• JDBC (relational database)
• Other data sources (Wikipedia, Twitter)

Mapping
• Schema-less
• Customize indexing
• Customize querying

Autocomplete
• edge-ngram analyzer
• wildcard query or prefix query: not a scalable solution
• completion suggester: experimental

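A toy in-memory inverted index can illustrate the scalability point. The sketch uses plain Java with no Elasticsearch involved, and all names are hypothetical. With edge n-grams, the prefix the user types is already an indexed term, so completion takes a single exact lookup. Without them, a prefix query has to expand the prefix into every matching term at query time. The sketch deliberately simplifies that expansion to a linear scan over all names.

```java
import java.util.*;

// Hypothetical model of edge-ngram autocompletion versus query-time prefix matching.
class AutocompleteSketch {
    // Edge n-grams of a term, as an edge-ngram filter would emit them at index time.
    static List<String> edgeNgrams(String term, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        for (int len = minGram; len <= Math.min(maxGram, term.length()); len++) {
            grams.add(term.substring(0, len));
        }
        return grams;
    }

    // Index time: every gram becomes a key pointing to the full names that produced it.
    static Map<String, Set<String>> buildIndex(Collection<String> names, int minGram, int maxGram) {
        Map<String, Set<String>> index = new HashMap<>();
        for (String name : names) {
            for (String gram : edgeNgrams(name.toLowerCase(Locale.ROOT), minGram, maxGram)) {
                index.computeIfAbsent(gram, k -> new TreeSet<>()).add(name);
            }
        }
        return index;
    }

    // Query time with edge n-grams: one exact key lookup, whatever the index size.
    static Set<String> lookup(Map<String, Set<String>> index, String prefix) {
        return index.getOrDefault(prefix.toLowerCase(Locale.ROOT), Collections.emptySet());
    }

    // Naive stand-in for a prefix query: the work grows with the number of names.
    static Set<String> prefixScan(Collection<String> names, String prefix) {
        String p = prefix.toLowerCase(Locale.ROOT);
        Set<String> hits = new TreeSet<>();
        for (String name : names) {
            if (name.toLowerCase(Locale.ROOT).startsWith(p)) {
                hits.add(name);
            }
        }
        return hits;
    }
}
```

The cost moves to index time. Each name adds up to maxGram - minGram + 1 extra terms, and prefixes shorter than minGram or longer than maxGram no longer match.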
ElasticSearch at Canadensys
Database of Vascular Plants of Canada (VASCAN)
data.canadensys.net/vascan

Our ElasticSearch index
Index structure for scientific names:
• autocompletion: edge_ngram filter
  o "carex" -> "ca", "car", "care", "carex"
• genus first letter: pattern_replace filter
  o "carex feta" -> "c. feta"
• epithet: path_hierarchy tokenizer
  o "carex feta" -> "feta"

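Plain string operations can mimic the genus and epithet transformations, which makes the expected tokens concrete. This sketch rests on assumptions. The slides do not give the actual pattern_replace regex or the tokenizer settings, so the regex below and the path_hierarchy settings (space as delimiter, reverse mode) are guesses chosen to reproduce the examples shown. The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical imitation of two analysis steps for scientific names.
class ScientificNameAnalysisSketch {
    // pattern_replace imitation: keep the genus initial, "carex feta" -> "c. feta".
    // Assumed regex: first run of letters followed by whitespace.
    static String abbreviateGenus(String name) {
        return name.replaceFirst("^(\\p{L})\\p{L}*\\s+", "$1. ");
    }

    // path_hierarchy imitation, assuming delimiter ' ' and reverse mode:
    // emits every trailing sub-path, "carex feta" -> ["carex feta", "feta"].
    static List<String> reversePathHierarchy(String name) {
        String[] parts = name.split(" ");
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < parts.length; i++) {
            tokens.add(String.join(" ", Arrays.copyOfRange(parts, i, parts.length)));
        }
        return tokens;
    }

    // The shortest trailing sub-path is the epithet, which only holds for binomials.
    static String epithet(String name) {
        List<String> tokens = reversePathHierarchy(name);
        return tokens.get(tokens.size() - 1);
    }
}
```

Reverse mode also emits the full name "carex feta" as a token. Only the last token corresponds to the slide's "feta" example.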
ElasticSearch at GBIF France
Data stored in ElasticSearch is updated whenever the MongoDB data changes.
The search engine queries ElasticSearch using filters such as taxon, date, place, dataset and geolocation.
Statistics are calculated using facets.

ElasticSearch at GBIF France
ElasticSearch - Solr
• Solr and ElasticSearch both try to solve the same problem, with few major differences
• Development setup and production deployment (replication / sharding) are easier with ElasticSearch
• Out of the box, ElasticSearch comes with a sensible Lucene configuration, and customization remains easy

Facets
• Similar to "GROUP BY" in SQL
• Mostly used to calculate statistics
• Example: curl -XGET [...] "facets" : { "dataset" : { "terms" : { "field" : "dataset", "order" : "term" …

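A terms facet on dataset behaves like `SELECT dataset, COUNT(*) ... GROUP BY dataset`. The sketch below reproduces that counting in plain Java over a hypothetical list of dataset values, sorted by term as `"order" : "term"` requests. Unlike the real facet, it returns every term instead of only the top `size` entries. Facets were later deprecated (1.0) and removed (2.0) in favor of aggregations, where the terms aggregation plays the same role.

```java
import java.util.List;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical illustration of what a terms facet computes; not Elasticsearch code.
class TermsFacetSketch {
    // Counts occurrences of each distinct value; TreeMap keeps the terms sorted.
    static TreeMap<String, Long> termsFacet(List<String> fieldValues) {
        return fieldValues.stream()
                .collect(Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()));
    }
}
```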
API and libraries
REST API
  o interoperability between different programming languages
  o HTTP requests
Java API
  o more efficient than the REST API because it uses a binary protocol
  o built-in marshaling (serialization of data sent over the network)

Query - RESTful API
Example:
$ curl localhost:9200/vascan/_search?pretty=1 -d '{"query":{ "match":{ "name" :{ "query":"carex" } } } }'

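The JDK HTTP client can send the same match query from Java without the Elasticsearch Java API. This is a sketch with hypothetical names. The body is assembled by string concatenation, and the escape helper is deliberately minimal: it handles only quotes and backslashes, where a real application would use a JSON library. The request uses POST because `_search` accepts it and `HttpRequest.Builder.GET()` takes no body.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper: builds the slide's match query against the vascan index.
class MatchQuerySketch {
    // Minimal JSON string escaping (backslashes and double quotes only).
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // {"query":{"match":{"<field>":{"query":"<text>"}}}}
    static String matchQueryBody(String field, String text) {
        return "{\"query\":{\"match\":{\"" + escape(field) + "\":{\"query\":\"" + escape(text) + "\"}}}}";
    }

    // POST {baseUrl}/{index}/_search?pretty=1 with the match query as body; nothing is sent here.
    static HttpRequest buildSearch(String baseUrl, String index, String field, String text) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/" + index + "/_search?pretty=1"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(matchQueryBody(field, text)))
                .build();
    }
}
```

`java.net.http.HttpClient.send` can then send the request, just as it would any other HttpRequest.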
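Query - Java API
Code example:
import org.elasticsearch.action.search.SearchRequestBuilder;
import org.elasticsearch.index.query.QueryBuilders;
...
// Bool query whose "should" clause matches the user's text against the vernacular name field
SearchRequestBuilder srb = client.prepareSearch(INDEX_NAME)
        .setQuery(QueryBuilders.boolQuery()
                .should(QueryBuilders.matchQuery("vernacular_name", text)))
        // setTypes is called on the request builder, not on the query builder
        .setTypes(VERNACULAR_TYPE);
...
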
Pitfalls
• Error reporting (index creation, river creation)
• Results may be hard to predict with complex queries
• Documentation
• Every mapping change requires reindexing the data

Future
• Scientific name analyzer
• Geospatial component

Thank you!
