Logging with Log4j and Log Aggregation with Apache Flume
By Arivoli K. (MDS201903), Naveen Kumar Reddy (MDS201909), Saager Babu NG (MDS201917), Suman Polley (MDS201935), Avinash Kumar (MDS201907)
Overview: Why is logging necessary? Here comes Log4j.
2) Filter: The Filter object is used to analyze logging information and to decide whether or not that information should be logged.
3) ObjectRenderer: The ObjectRenderer object is specialized in providing a String representation of different objects passed to the logging framework.
4) LogManager: The LogManager object manages the logging framework.
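To make these objects concrete, here is a minimal Log4j 1.x usage sketch (the class and messages are our own illustration, not from the slides):

  import org.apache.log4j.BasicConfigurator;
  import org.apache.log4j.Logger;

  public class HelloLog4j {
      // Each class typically obtains its own named Logger
      private static final Logger logger = Logger.getLogger(HelloLog4j.class);

      public static void main(String[] args) {
          BasicConfigurator.configure(); // default console appender and layout
          logger.info("application started");
          logger.debug("fine-grained details, usually filtered out in production");
          logger.error("something went wrong", new RuntimeException("example"));
      }
  }

Behind these calls, the LogManager wires up the framework, and any configured Filters decide which messages actually reach the output.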
Flume's load-balancing Log4j appender can send events to multiple Flume agents, using a round-robin or random selection strategy. These Log4j appenders come bundled with Flume and don't require us to write any code, which is another reason Flume is so popular.
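For illustration, a log4j.properties sketch wiring an application to two Flume agents through the bundled load-balancing appender (hostnames and ports are hypothetical):

  log4j.rootLogger = INFO, flume
  log4j.appender.flume = org.apache.flume.clients.log4jappender.LoadBalancingLog4jAppender
  # Space-separated list of Flume Avro source endpoints
  log4j.appender.flume.Hosts = agent1.example.com:4141 agent2.example.com:4141
  # ROUND_ROBIN (the default) or RANDOM
  log4j.appender.flume.Selector = ROUND_ROBIN

Each agent named in Hosts must run an Avro source listening on the given port.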
PHILOSOPHY:
Flume acts as a buffer between source and destination. By balancing out any mismatch between the rate at which data is produced and the rate at which it can be consumed, Flume maintains a smooth flow of data.
The events are staged in a channel on each agent and are then delivered to the next agent or to the terminal repository (such as HDFS) in the flow. An event is removed from a channel only after it has been stored in the channel of the next agent or in the terminal repository. This hop-by-hop handoff ensures reliable data transfer and recoverability.
Solution:
By connecting multiple agents to each other, Flume creates a data pipeline. It is possible to scale down the number of servers that write to HDFS by adding intermediate tiers of Flume agents. This structure has its own problems: if the n-th tier must absorb the same aggregate volume as the (n-1)-th tier with fewer agents, the n-th tier can easily be overwhelmed. The web servers feed the outermost tier, the number of agents shrinks as the flow converges toward HDFS, and the load per agent is therefore greatest in the innermost tier.
A flow that uses the File Channel, or another durable channel, will resume processing events where it left off after a failure.
These buffers have a fixed capacity; once that capacity is full, they create back pressure on earlier points in the flow. If this pressure propagates all the way back to the source of the flow, Flume becomes unavailable and may lose data. Rule of thumb: channel capacity must be sized to sustain the worst-case (maximum) data ingestion rate.
What if a single node goes down? Adding another Flume agent balances the load and improves handling of downstream failures.
Flume started out as a real-time log aggregator, but it has evolved to handle many types of streaming data, and it now finds application beyond logging (for example, IoT and instant-messaging services).
NAME NODE CLOGGING
What if all the web servers collecting log data tried to connect to HDFS and write at the same time?
[Diagram: MapReduce, Spark, and Impala jobs all contacting a single Name Node at once]
An Event is the fundamental unit of data transported by flume from its point of origination to its final destination.
A Flume event is defined as a unit of data flow having a byte payload and an optional set of string attributes.
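Flume's client SDK exposes this unit directly. A small sketch using EventBuilder (the header names and payload are our own example):

  import java.nio.charset.StandardCharsets;
  import java.util.HashMap;
  import java.util.Map;
  import org.apache.flume.Event;
  import org.apache.flume.event.EventBuilder;

  public class EventSketch {
      public static void main(String[] args) {
          // Optional string attributes (headers) carried with the event
          Map<String, String> headers = new HashMap<>();
          headers.put("host", "web01.example.com"); // hypothetical header

          // Byte payload + headers = one Flume event
          Event event = EventBuilder.withBody(
                  "user logged in".getBytes(StandardCharsets.UTF_8), headers);

          System.out.println(new String(event.getBody(), StandardCharsets.UTF_8));
      }
  }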
Avro is a row-based data format and data serialization system, developed within Apache's Hadoop project and released in 2009. The data schema is stored as JSON in the file header, while the rest of the data is stored in a compact binary format. One strong point of Avro is its robust support for schema evolution. Row-based data formats are generally better for storing write-intensive data, because appending new records is easier. An Avro Object Container File consists of a file header followed by one or more data blocks.
The file header consists of:
- four bytes, ASCII 'O', 'b', 'j', followed by the format version number 1 (i.e., the byte sequence 0x4F 0x62 0x6A 0x01);
- file metadata, including the schema definition;
- a 16-byte, randomly generated sync marker for this file.
Ref: https://en.wikipedia.org/wiki/Apache_Avro
Client: an entity that generates events and sends them to one or more Agents.
Agent: a container for hosting sources, channels, sinks, and other components that enable the transportation of events from one place to another. It is a self-contained JVM process. Connecting multiple Flume agents to each other establishes a flow, and this flow moves the data. Each Flume agent has three components: the source, the channel, and the sink. The source is responsible for getting events into the Flume agent, while the sink is responsible for removing events from the agent and forwarding them to the next agent in the topology, or to HDFS. The channel is a buffer that stores the data the source has received until a sink has successfully written it out to the next hop or to its eventual destination.
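A minimal single-agent configuration sketch wiring the three components together (the netcat source and logger sink are stock Flume components; the names and port are our own choices):

  agent.sources  = src
  agent.channels = ch
  agent.sinks    = snk

  # Source: listens on a TCP port and turns each line into an event
  agent.sources.src.type = netcat
  agent.sources.src.bind = localhost
  agent.sources.src.port = 44444
  agent.sources.src.channels = ch

  # Channel: in-memory buffer between source and sink
  agent.channels.ch.type = memory

  # Sink: logs each event (useful for testing)
  agent.sinks.snk.type = logger
  agent.sinks.snk.channel = ch

Started with "flume-ng agent --name agent --conf-file agent.conf", this process is one complete hop.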
Source: an active component that receives events from a specialized location or mechanism and places them on one or more channels. Sources are active components that receive data from some other application that is producing the data. (There are sources that produce data themselves, but these are mostly used for testing.) Sources can listen on one or more network ports to receive data or can read data from the local file system. Each source must be connected to at least one channel; a source can also write to several channels, replicating the events to all or some of them based on some criteria. Flume's primary RPC source is the Avro Source: a highly scalable RPC server that accepts data into a Flume agent, either from another Flume agent's Avro Sink or from a client application that uses Flume's SDK to send data. Together, the Avro Source and the Avro Sink form Flume's internal communication mechanism between agents. With the scalability of the Avro Source, combined with channels that act as buffers, Flume agents can handle significant load spikes.
A source named usingFlumeSource of type avro, running in an agent started with the name usingFlume, would be configured with a file that looks like:
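The slide's original listing is not preserved in this extract; a sketch of such a configuration, assuming a memory channel and an arbitrary port, could look like:

  usingFlume.sources  = usingFlumeSource
  usingFlume.channels = memory

  usingFlume.sources.usingFlumeSource.type = avro
  usingFlume.sources.usingFlumeSource.channels = memory
  # Interface and port to listen on (values are our own assumption)
  usingFlume.sources.usingFlumeSource.bind = 0.0.0.0
  usingFlume.sources.usingFlumeSource.port = 4353

  usingFlume.channels.memory.type = memory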
Channels are passive components that buffer data that has been received by the agent but not yet written out to its destination.
Channels behave like queues, with sources writing to them and sinks reading from them. Multiple sources can safely write to the same channel, and multiple sinks can read from the same channel, though each sink can read from exactly one channel. Having a channel as a buffer between sources and sinks has several advantages: since writes happen at the tail of the buffer and reads at the head, sources and sinks can operate at different rates, and sources can absorb sudden spikes in load even when the sinks are unable to drain the channels immediately. Channels are transactional in nature. Each write to a channel and each read from a channel happens within the context of a transaction. Only once a write transaction is committed are the events from that transaction readable by any sink. Likewise, if a sink has successfully taken an event, that event is not available for other sinks to take unless the sink rolls back the transaction.
ALL OR NOTHING
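A sketch of what all-or-nothing means in code, using Flume's channel transaction API (the helper method is our own illustration):

  import org.apache.flume.Channel;
  import org.apache.flume.ChannelException;
  import org.apache.flume.Event;
  import org.apache.flume.Transaction;

  public class ChannelWriteSketch {
      // Puts one event into a channel inside a transaction. Until commit()
      // succeeds, no sink can see the event; on failure, rollback() ensures
      // the channel keeps nothing from this transaction.
      static void putEvent(Channel channel, Event event) {
          Transaction tx = channel.getTransaction();
          tx.begin();
          try {
              channel.put(event);
              tx.commit(); // only now does the event become visible to sinks
          } catch (ChannelException e) {
              tx.rollback(); // all or nothing: discard the partial write
              throw e;
          } finally {
              tx.close();
          }
      }
  }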
The following configuration shows a Memory Channel configured to hold up to 100,000 events, with each transaction being able to hold up to 1,000 events. The total memory occupied by all events in the channel can be a maximum of approximately 5 GB of space. Of this 5 GB, the channel considers 10% to be reserved for event headers (as defined by the byteCapacityBufferPercentage parameter), making 4.5 GB available for event bodies:
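A sketch of that configuration (the agent and channel names are our own; the parameters are the standard Memory Channel settings):

  agent.channels = memChannel
  agent.channels.memChannel.type = memory

  # Up to 100,000 events in the channel, up to 1,000 per transaction
  agent.channels.memChannel.capacity = 100000
  agent.channels.memChannel.transactionCapacity = 1000

  # Roughly 5 GB in total, with 10% reserved for event headers
  agent.channels.memChannel.byteCapacity = 5000000000
  agent.channels.memChannel.byteCapacityBufferPercentage = 10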
Sink: the component that removes data from a Flume agent and writes it to another agent, a data store, or some other system; it could be one of the sinks that come bundled with Flume or a custom sink. Sinks are the components in a Flume agent that keep draining the channel, so that the sources can continue receiving events and writing to the channel. Sinks continuously poll the channel for events and remove them in batches. These batches of events are either written out to a storage or indexing system, or sent to another Flume agent. Sinks are fully transactional: each sink starts a transaction with the channel before removing a batch of events from it. Once the batch has been successfully written out to storage or to the next Flume agent, the sink commits the transaction, and the channel removes those events from its own internal buffers.
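For example, an HDFS sink draining a channel in batches might be configured like this (the path and names are our own illustration):

  agent.sinks = hdfsSink
  agent.sinks.hdfsSink.type = hdfs
  agent.sinks.hdfsSink.channel = ch

  # Bucket events into date-based directories
  agent.sinks.hdfsSink.hdfs.path = /flume/events/%Y/%m/%d
  agent.sinks.hdfsSink.hdfs.useLocalTimeStamp = true
  agent.sinks.hdfsSink.hdfs.fileType = DataStream
  # Events taken from the channel per transaction
  agent.sinks.hdfsSink.hdfs.batchSize = 1000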
Multi-agent flow: the data passes through multiple agents, or hops. The sink of the previous agent and the source of the current hop must both be of type avro, with the sink pointing to the hostname (or IP address) and port of the source.
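A two-agent sketch of this hop (the agent names, hostname, and port are our own assumptions; the first agent's own source is omitted):

  # Agent "weblog" on the web server: its avro sink points at the collector
  weblog.channels = ch1
  weblog.sinks = avroSink
  weblog.channels.ch1.type = memory
  weblog.sinks.avroSink.type = avro
  weblog.sinks.avroSink.channel = ch1
  weblog.sinks.avroSink.hostname = collector.example.com
  weblog.sinks.avroSink.port = 4545

  # Agent "collector" on the next hop: its avro source listens on that port
  collector.sources = avroSrc
  collector.channels = ch2
  collector.channels.ch2.type = memory
  collector.sources.avroSrc.type = avro
  collector.sources.avroSrc.channels = ch2
  collector.sources.avroSrc.bind = 0.0.0.0
  collector.sources.avroSrc.port = 4545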
A typical log-aggregation scenario: multiple web servers produce log data, and logs collected from hundreds of web servers are sent to a dozen or so agents that write to the HDFS cluster. A number of first-tier agents are configured with an avro sink, all pointing to the avro source of a single agent. This source on the second-tier agent consolidates the received events into a single channel, which is consumed by a sink that writes to the final destination.
Multiplexing flow: Flume supports multiplexing an event flow to one or more destinations. This is achieved by defining a flow multiplexer that can replicate an event, or selectively route it, to one or more channels. For example, a source on agent "foo" can fan the flow out to three different channels. This fan-out can be replicating (every event is sent to all three channels) or multiplexing (an event is delivered to a subset of the channels based on the value of one of its headers).
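A multiplexing selector sketch in the style of the Flume User Guide (agent "foo"; the header name and mappings are our own assumptions):

  foo.sources  = r1
  foo.channels = c1 c2 c3

  # Route each event by the value of its "state" header
  foo.sources.r1.selector.type = multiplexing
  foo.sources.r1.selector.header = state
  foo.sources.r1.selector.mapping.CZ = c1
  foo.sources.r1.selector.mapping.US = c2 c3
  foo.sources.r1.selector.default = c3

  # For a replicating fan-out (the default), use instead:
  # foo.sources.r1.selector.type = replicating

Events whose "state" header matches no mapping fall through to the default channel.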
REFERENCES
1. "Using Flume" by Hari Shreedharan
2. Flume User Guide: https://flume.apache.org/releases/content/1.9.0/FlumeUserGuide.html
3. Real-Time Data Ingest into Hadoop Using Flume: https://www.youtube.com/watch?v=SR__hkCINNc