Revealing Elasticsearch Implementation, Integration, and Execution - PowerPoint PPT Presentation



  1. Revealing Elasticsearch Implementation, Integration, and Execution

  2. Objective: Get access to a cluster, index documents, find them, and present them.

  3. Target audience
     ● Web developers
     ● Data scientists
     ● Report developers
     ● Technologists
     ● Infrastructure/DevOps

  4. What is Elasticsearch?
        ○ Written in Java
           ■ Open source
           ■ Cross platform
        ○ Based on Lucene and Apache Solr
        ○ Scaled, real-time search & analytics
        ○ Full RESTful API
        ○ Plugin ecosystem
        ○ SDKs for Java, .NET, many more
        ○ Eventually consistent

  5. An Elastic Timeline

  6. Elasticsearch History [timeline figure, 2010 to 2017; labels shown: 0.x, 1.x, $104M in funding, 2.x, 5.x, Elastic Cloud, Prelert]

  7. Getting Started

  8. Objective: All you need is an endpoint http://localhost:9200/_search
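As a minimal sketch of what "all you need is an endpoint" can look like in practice, the Python below builds the search URL with the standard library only. The host and port come from the slide; the helper name `search_url` and the `patients` index name used in the second call are illustrative assumptions.

```python
from urllib.parse import urljoin

BASE_URL = "http://localhost:9200/"  # default local endpoint shown on the slide


def search_url(index=None, base=BASE_URL):
    """Return the _search URL for the whole cluster or for a single index."""
    path = f"{index}/_search" if index else "_search"
    return urljoin(base, path)


print(search_url())            # cluster-wide search endpoint
print(search_url("patients"))  # search endpoint limited to one index
```

Sending a GET request to either URL (for example with curl or `urllib.request.urlopen`) needs a running node, so the sketch stops at constructing the address.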

  9. Getting Out of the Gates
     Option 1 (*nix)
     ● Apt-get the latest version of Elasticsearch (5.2.1) from elastic.co
     ● Run bin/elasticsearch
     ● Curl http://localhost:9200
     Option 2 (Windows)
     ● Download the latest version of Elasticsearch from elastic.co
     ● Run bin\elasticsearch.bat
     Option 3 (Cloud)
     ● Create a free account with Elastic.co
     ● Create a free account with Amazon Web Services
     ● Many other providers

  10. Cluster Overview

  11. Objective: Understand how data is stored and transactions are scaled

  12. Standard Configuration
     ● A typical production cluster will contain 3 nodes (installations)
        ○ Additional nodes can be brought online through discovery
     ● A typical node will contain 5 primary shards and 5 replica shards
     ● Data is replicated across all nodes, so loss of a node will not affect the cluster
     ● A master node is commonly specified to handle routing of requests
     ● Data is also serialized to disk and can be recovered

  13. Storage: A cluster of 3 nodes, each a machine with 32GB of RAM, has 32GB of cache.

  14. Questions?

  15. Indexing Data

  16. Objective: All you need is Postman

  17. Inverted Indexes
     Elasticsearch uses a structure called an inverted index
     ● Find all the unique words that appear in a document
     ● List the documents in which each word (token) appears
     Reduces total search size
     ● Find all documents in which a token exists
     ● Ranks documents based on occurrences
     Word stemming & casing
     ● Case is removed from tokens
     ● Stemming algorithm drops “ing”, “ly”, “s”, etc.
     Normalization
     ● All inverted indexes are normalized
     ● Custom analyzers can be applied to documents
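The idea on this slide can be sketched in a few lines of Python. This is a toy illustration, not how Lucene implements it: the suffix-stripping `normalize` function is a deliberately crude stand-in for a real stemmer, the function names are invented for this example, and ranking by occurrences is left out.

```python
from collections import defaultdict


def normalize(token):
    """Lower-case a token and strip a few common suffixes (toy stemmer)."""
    token = token.lower()
    for suffix in ("ing", "ly", "s"):
        # Only strip when a reasonably long stem would remain.
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def build_inverted_index(docs):
    """Map each normalized token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.split():
            index[normalize(word)].add(doc_id)
    return index


# Two tiny sample documents, invented for illustration.
docs = {
    1: "Searching documents quickly",
    2: "Quick search of one document",
}
index = build_inverted_index(docs)
print(sorted(index["search"]))    # ids of documents containing the token "search"
print(sorted(index["document"]))  # ids of documents containing the token "document"
```

Because both documents are normalized to the same tokens, a lookup is a single dictionary access rather than a scan of every document, which is the "reduces total search size" point above.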

  18. Mappings
     Elasticsearch Mapping
     Elasticsearch will attempt to “guess” type mappings as each document is indexed.
     Once created, mappings cannot be changed without re-creating the index.
     A custom mapping can be applied before indexing documents.
     Available types:
     ● Boolean
     ● Long
     ● Double
     ● Date
     ● String
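To make the "guessing" concrete, here is a simplified Python sketch that assigns one of the type names listed on the slide to each value of a JSON document. It is an illustration under assumptions, not Elasticsearch's actual dynamic-mapping logic: date detection is omitted, and the `object` and `unknown` labels and the function name `guess_type` are additions for this example.

```python
def guess_type(value):
    """Return a mapping type name for a JSON value (simplified guess)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, dict):
        return "object"  # assumption: nested JSON objects get their own label
    return "unknown"


# Field names and values taken from the patient example in this deck.
sample = {
    "age": 39,
    "height": 1.8288,
    "gender": "Male",
    "location": {"lat": 40.762446, "lon": -73.831653},
}
guessed = {field: guess_type(value) for field, value in sample.items()}
print(guessed)
```

A guess made from the first document sticks, which is why a custom mapping applied before indexing is the safer route when a field's type matters.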

  19. Analyzers
     ● None
     ● Standard
        ○ Splits the input text on word boundaries
        ○ Terms are lower cased
     ● Whitespace
        ○ Breaks text into terms whenever it encounters a whitespace character
     ● Simple
        ○ Breaks text into terms whenever it encounters a character which is not a letter
        ○ Terms are lower cased
     ● Language
        ○ 33+ languages supported
        ○ Stems words based on language
        ○ Removes language specific “stop” words
     ● Custom
        ○ E.g. Remove “stop” words using a language filter
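The differences between the whitespace, simple, and standard analyzers are easiest to see side by side. The Python below is a rough approximation written for this comparison: the function names are invented, the simple analyzer handles ASCII letters only, and the "standard-like" version uses a regular expression as a stand-in for real word-boundary detection.

```python
import re


def whitespace_analyzer(text):
    """Break text into terms at whitespace; case is preserved."""
    return text.split()


def simple_analyzer(text):
    """Break text at any non-letter character and lower-case the terms."""
    return [t.lower() for t in re.split(r"[^A-Za-z]+", text) if t]


def standard_like_analyzer(text):
    """Approximate word-boundary splitting: runs of word characters, lower-cased."""
    return [t.lower() for t in re.findall(r"\w+", text)]


text = "Take 2 Aspirin-tablets daily"
print(whitespace_analyzer(text))     # hyphenated word and case kept intact
print(simple_analyzer(text))         # the digit is dropped, terms lower-cased
print(standard_like_analyzer(text))  # the digit is kept, terms lower-cased
```

Whichever analyzer is applied at index time also determines which query terms can match, since the search runs against the stored tokens rather than the original text.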

  20. Patient Document Example
     ● JSON format (JavaScript Object Notation)
     ● Index by PUTting document to index endpoint
        ○ (PUT patients/patient/1)
        ○ Last item is unique key (1)
     ● Index operation automatically creates an index if it has not been created before
     ● Elasticsearch “guesses” types as they are posted
     ● Each indexed document is given a version number
     ● Index API optionally allows for optimistic concurrency control when the version parameter is specified
     ● Bulk-indexing supported (Bulk API)
     ● River plugins (Oracle, MSSQL, MySQL)

     {
       "patient": {
         "first_name": "John",
         "last_name": "Doe",
         "dob": 252507600000,
         "gender": "Male",
         "race": "White",
         "height": 1.8288,
         "weight": 90.7185,
         "eyes": "blue",
         "hair": "brown",
         "age": 39,
         "tobacco": "no",
         "location": {
           "lat": 40.762446,
           "lon": -73.831653
         },
         "conditions": [{
           "icd10": "M54.5",
           "description": "Low back pain"
         }, {
           "icd10": "Z91.018",
           "description": "Allergy to other foods"
         }],
         "medications": [{
           "name": "Aspirin",
           "dosage": 150,
           "units": "mg",
           "frequency": 8,
           "freq_units": "hours"
         }]
       }
     }
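For readers who prefer a script to Postman, the following sketch builds the same PUT request with Python's standard library. It only constructs the request and does not send it. The base URL assumes a local node, the helper name `build_index_request` is hypothetical, and the document is abbreviated to a few fields from the example above.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:9200"  # assumption: a node running locally


def build_index_request(index, doc_type, doc_id, document, base=BASE_URL):
    """Build (but do not send) a PUT request that indexes one document."""
    url = f"{base}/{index}/{doc_type}/{doc_id}"
    body = json.dumps(document).encode("utf-8")
    return Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Abbreviated version of the patient document from the slide.
patient = {"first_name": "John", "last_name": "Doe", "age": 39, "gender": "Male"}
request = build_index_request("patients", "patient", 1, patient)
print(request.get_method(), request.full_url)
```

With a node running, passing `request` to `urllib.request.urlopen` would submit the document; the last path segment (1) is the unique key described above.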

  21. Questions?

  22. Querying Documents

  23. Objective: All you need is JSON

  24. Query DSL (Domain-Specific Language)
     ● Leaf query clauses
        ○ Leaf query clauses look for a particular value in a particular field, such as the match, term or range queries. These queries can be used by themselves.
     ● Compound query clauses
        ○ Compound query clauses wrap other leaf or compound queries and are used to combine multiple queries in a logical fashion (such as the bool or dis_max query), or to alter their behavior (such as the constant_score query).

  25. Common Query Types
     ● Full Text
        ○ Match All
        ○ Query String
     ● Term
        ○ Term
        ○ Range
        ○ Exists
        ○ Regexp
        ○ Fuzzy
     ● Compound
        ○ Bool
        ○ Boosting
     ● Joining
        ○ Nested
     ● Geo
        ○ Geo Shape
        ○ Geo Distance
        ○ Geo Polygon
     ● Specialized
        ○ More Like This
        ○ Template
        ○ Script

  26. Sample Bool Query
     ● JSON format (JavaScript Object Notation)
     ● Search by performing GET against a specific index
        ○ GET /patients/_search
     ● This query returns all men between the ages of 30 and 50 who use aspirin

     {
       "query": {
         "bool": {
           "must": [{
             "match": {
               "medications.name": "Aspirin"
             }
           }],
           "filter": [{
             "term": {
               "gender": "Male"
             }
           }, {
             "range": {
               "age": {
                 "lte": 50,
                 "gte": 30
               }
             }
           }]
         }
       }
     }
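Since the query body is plain JSON, it can be assembled from ordinary data structures. The sketch below rebuilds the sample query as a Python dictionary with the search values turned into parameters; the function name `build_patient_query` and its parameter names are hypothetical.

```python
import json


def build_patient_query(medication, gender, min_age, max_age):
    """Build a bool query: a full-text match in `must`, exact conditions in `filter`."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {"medications.name": medication}}],
                "filter": [
                    {"term": {"gender": gender}},
                    {"range": {"age": {"gte": min_age, "lte": max_age}}},
                ],
            }
        }
    }


body = build_patient_query("Aspirin", "Male", 30, 50)
print(json.dumps(body, indent=2))  # JSON text ready to send as the request body
```

Clauses under `must` contribute to the relevance score, while clauses under `filter` only include or exclude documents, which is why the gender and age conditions sit in `filter`.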

  27. Query Result
     Query returns a formatted JSON result indicating the search metrics
     ● Took
        ○ Length of time in milliseconds the query took to execute and return
     ● Shards
        ○ Number of shards utilized in execution of the query
     ● Hits
        ○ Total and max score of all results
        ○ Hits[] is an array of resulting documents, which can be limited by size

     {
       "took": 1,
       "timed_out": false,
       "_shards": {
         "total": 5,
         "successful": 5,
         "failed": 0
       },
       "hits": {
         "total": 1,
         "max_score": 1.3862944,
         "hits": [{
           "_index": "patients",
           "_type": "patient",
           "_id": "1",
           "_score": 1.3862944,
           "_source": {
             "first_name": "John",
             "last_name": "Doe",
             "dob": 252507600000,
             . . .
           }
         }]
       }
     }
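A short Python sketch shows how a client might read these fields. The response text below copies the slide's example, with `_source` cut down to the three fields that are visible there; the function name `summarize` and the keys of the summary it returns are choices made for this illustration.

```python
import json

# Response copied from the slide; _source is limited to the fields shown there.
response_text = """
{
  "took": 1,
  "timed_out": false,
  "_shards": {"total": 5, "successful": 5, "failed": 0},
  "hits": {
    "total": 1,
    "max_score": 1.3862944,
    "hits": [{
      "_index": "patients", "_type": "patient", "_id": "1",
      "_score": 1.3862944,
      "_source": {"first_name": "John", "last_name": "Doe", "dob": 252507600000}
    }]
  }
}
"""


def summarize(response):
    """Pull the headline metrics and the matching names out of a search response."""
    hits = response["hits"]
    return {
        "took_ms": response["took"],
        "all_shards_ok": response["_shards"]["failed"] == 0,
        "total": hits["total"],
        "names": [
            f'{hit["_source"]["first_name"]} {hit["_source"]["last_name"]}'
            for hit in hits["hits"]
        ],
    }


summary = summarize(json.loads(response_text))
print(summary)
```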

  28. Aggregates
     An aggregation can be seen as a unit-of-work that builds analytic information over a set of documents.
     ● Bucketing
        ○ A family of aggregations that build buckets, where each bucket is associated with a key and a document criterion.
     ● Metric
        ○ Aggregations that keep track and compute metrics over a set of documents.
     ● Matrix
        ○ A family of aggregations that operate on multiple fields and produce a matrix result based on the values extracted from the requested document fields.
     ● Pipeline
        ○ Aggregations that aggregate the output of other aggregations and their associated metrics

     {
       "query": {
         "bool": {
           "must": [{
             "match": {
               "gender": "Male"
             }
           }]
         }
       },
       "aggs": {
         "medications": {
           "terms": {
             "field": "medications.name"
           }
         }
       }
     }

  29. Query Result
     A bucket aggregation finds all documents matching the query (in this case all males) and aggregates the results into key and doc_count fields.
     Only documents matching the initial query will be considered for aggregation.

     {
       . . .
       "aggregations": {
         "medications": {
           "doc_count_error_upper_bound": 0,
           "sum_other_doc_count": 0,
           "buckets": [{
             "key": "Aspirin",
             "doc_count": 2465
           }, {
             "key": "Omeprazole",
             "doc_count": 1824
           }, {
             "key": "Lisinopril",
             "doc_count": 1121
           }]
         }
       }
     }
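The bucketing behavior can be imitated locally to show where the key and doc_count values come from. This is a sketch over three invented sample documents, not the Elasticsearch implementation; the function name `terms_buckets` is hypothetical.

```python
from collections import Counter

# Hypothetical sample documents, invented for illustration only.
patients = [
    {"gender": "Male", "medications": [{"name": "Aspirin"}, {"name": "Lisinopril"}]},
    {"gender": "Male", "medications": [{"name": "Aspirin"}]},
    {"gender": "Female", "medications": [{"name": "Aspirin"}]},
]


def terms_buckets(docs, gender):
    """Count documents per medication name among the docs matching the gender."""
    counts = Counter()
    for doc in docs:
        if doc["gender"] != gender:
            continue  # documents outside the query are never aggregated
        # A set ensures each document is counted once per distinct name.
        for name in {m["name"] for m in doc["medications"]}:
            counts[name] += 1
    return [{"key": key, "doc_count": n} for key, n in counts.most_common()]


buckets = terms_buckets(patients, "Male")
print(buckets)
```

The third document also lists Aspirin, but it does not match the query, so it never reaches a bucket.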

  30. Statistical Aggregates
     The aggregations in this family compute metrics based on values extracted in one way or another from the documents that are being aggregated. The values are typically extracted from the fields of the document (using the field data), but can also be generated using scripts.

     {
       "query": {
         "bool": {
           "must": [{
             "match": {
               "gender": "Male"
             }
           }]
         }
       },
       "aggs": {
         "age_stats": {
           "extended_stats": {
             "field": "age"
           }
         }
       }
     }
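To give a sense of what a metric aggregation such as extended_stats computes, the sketch below derives the same kind of figures from a list of ages in plain Python. The ages are hypothetical, the variance is assumed to be the population variance, and the key names are intended to mirror the extended_stats output, so check them against the Elasticsearch documentation before relying on them.

```python
import math


def extended_stats(values):
    """Compute count, min, max, avg, sum, sum of squares, variance, std deviation."""
    count = len(values)
    total = sum(values)
    sum_of_squares = sum(v * v for v in values)
    avg = total / count
    # Population variance; clamp at zero to absorb floating-point rounding.
    variance = max(0.0, sum_of_squares / count - avg * avg)
    return {
        "count": count,
        "min": min(values),
        "max": max(values),
        "avg": avg,
        "sum": total,
        "sum_of_squares": sum_of_squares,
        "variance": variance,
        "std_deviation": math.sqrt(variance),
    }


ages = [30, 40, 50]  # hypothetical ages of the matching patients
stats = extended_stats(ages)
print(stats)
```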
