Where’s the fire? AKA: My site is down… now what? Kristen Pol



SLIDE 1

answers@hook42.com

Where’s the fire? AKA: My site is down… now what?

Kristen Pol

SLIDE 2

My name is Kristen.

Kristen Pol
Hook 42
CTO / Architect
Drupal for 12 years!
kristen@hook42.com
@kristen_pol

SLIDE 3

Who are you?

Drupal Newbie?
Drupal Intermediate?
Drupal Veteran?
Builder?
Themer?
Developer?
PM?
All the roles?

SLIDE 4

What are some website disasters?

Site down
Site very slow
Files directory deleted
Code deleted
Database deleted
Email not working
3rd party services not working

SLIDE 5

What are some causes?

Increased traffic
  Legitimate
  Nefarious
CDN/WAF
Hosting
  Router
  Network
  File system
  Security breach
Mail server
3rd party services
Application
  Slow queries
  Slow crons
  Hit edge case
  Insufficient caching
  Security breach
Human error
  Drop database or tables
  Remove code or files
  Delete via UI
  …

SLIDE 6

How can you handle website disasters?

✓ Planning
✓ Monitoring
✓ Diagnostics
✓ Support
✓ Recovery
✓ Prevention

SLIDE 7

Don’t panic!

SLIDE 8

PLANNING

SLIDE 9

What is disaster planning?

“A disaster recovery plan (DRP) is a documented process or set of procedures to recover and protect a business IT infrastructure in the event of a disaster.”

SLIDE 10

Create a process that works for you and your “client”.

Example:

✓ Check other websites
✓ Check status pages
✓ Run traceroute
✓ Email urgent@example.com
✓ Check urgent coverage calendar
✓ Ping developer(s) via chat, text, phone
✓ Open internal support ticket

SLIDE 11

Make sure to document and train devs how to…

✓ Access all the services
✓ Diagnose issues
✓ Open support tickets
✓ Deploy a hot fix
✓ Access backups
✓ Recover site from backups
✓ Log urgent issues

SLIDE 12

MONITORING

SLIDE 13

What is website monitoring?

“Website monitoring is the process of testing and verifying that end-users can interact with a website or web application as expected.”

SLIDE 14

Here are a few popular monitoring tools.

SLIDE 15

You can configure checks.

SLIDE 16

You can track uptime.

SLIDE 17

You can get alerts!
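
The three ideas on the monitoring slides (checks, uptime, alerts) fit together roughly as sketched below. This is not how any particular monitoring tool works internally; the class name `UptimeTracker` and the "alert after N consecutive failures" rule are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class UptimeTracker:
    """Keeps check results and decides when to alert (hypothetical helper)."""
    failures_before_alert: int = 3
    results: List[bool] = field(default_factory=list)

    def record(self, ok: bool) -> bool:
        """Store one check result; return True while the last N checks all failed."""
        self.results.append(ok)
        recent = self.results[-self.failures_before_alert:]
        return len(recent) == self.failures_before_alert and not any(recent)

    def uptime_percent(self) -> float:
        """Share of successful checks so far, as a percentage."""
        if not self.results:
            return 100.0
        return 100.0 * sum(self.results) / len(self.results)
```

Requiring several consecutive failures before alerting is one common way to avoid paging someone over a single dropped request.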

SLIDE 18

DIAGNOSTICS

SLIDE 19

What is diagnostics?

“Software diagnostics refers to concepts, techniques, and tools that allow for obtaining findings, conclusions, and evaluations about software systems.”

SLIDE 20

Here are some diagnostic tools.

Traceroutes
Status pages
Logs
Application Performance Management (APM) Software
Drupal modules

SLIDE 21

Traceroute shows round-trip times between you and the destination server.

Source: http://www.maxcdn.com/one/assets/post-images/trace.png

SLIDE 22

Here’s an example of a bad traceroute.
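
Spotting a bad traceroute by eye gets easier with practice, but the same check can be sketched in code. The example assumes the common Unix `traceroute` text layout (hop number, host, then round-trip times in ms, with `*` for unanswered probes). The 200 ms threshold and the function names are assumptions for illustration.

```python
import re
from typing import List, Optional, Tuple

HOP_RE = re.compile(r"^\s*(\d+)\s+(.*)$")      # hop number, then the rest
RTT_RE = re.compile(r"(\d+(?:\.\d+)?)\s*ms")   # each round-trip time


def parse_hops(output: str) -> List[Tuple[int, Optional[float]]]:
    """Return (hop number, worst round-trip time in ms, or None if no reply)."""
    hops = []
    for line in output.splitlines():
        match = HOP_RE.match(line)
        if not match:
            continue  # skips the "traceroute to ..." header line
        rtts = [float(value) for value in RTT_RE.findall(match.group(2))]
        hops.append((int(match.group(1)), max(rtts) if rtts else None))
    return hops


def suspicious_hops(output: str, max_ms: float = 200.0) -> List[int]:
    """Hops that never replied or whose worst time exceeds max_ms."""
    return [hop for hop, worst in parse_hops(output)
            if worst is None or worst > max_ms]
```

A hop that shows only `* * *` is not always a fault, since some routers simply do not answer probes. Treat the result as a hint about where to look, and attach the raw traceroute to any support ticket.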

SLIDE 23

Check service status pages.

Hosting
  Acquia
  Pantheon
  Platform.sh
  Blackmesh
  Rackspace
  …
Mail Services
  MailGun
  Mandrill
  SendGrid
CDN/WAF
  CloudFlare
  CloudFront
  EdgeCast
  Fastly
  MaxCDN
Others
  Analytics
  Marketing Automation
  …

Figure out which ones your site uses!

SLIDE 24

Check service status pages.

Many look similar. Some are location-based.
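
One reason many status pages look similar is that they run on the same hosted status-page products. As a sketch, assuming a provider uses Atlassian Statuspage (which serves a JSON summary at `/api/v2/status.json` with an `indicator` of `none`, `minor`, `major`, or `critical`), the summary can be read like this. Whether a given provider from the list above uses that product is an assumption you need to verify. The function name is illustrative.

```python
import json


def summarize_status(payload: str) -> str:
    """Turn a Statuspage-style status.json body into a one-line summary."""
    status = json.loads(payload).get("status", {})
    indicator = status.get("indicator", "unknown")
    description = status.get("description", "no description")
    # "none" means no incident is being reported; anything else needs a look.
    flag = "OK" if indicator == "none" else "CHECK"
    return f"{flag}: {description} ({indicator})"
```

Fetching the JSON is left out on purpose, because the URL depends on the provider.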

SLIDE 25

Check the server logs.

Acquia
  error.log
  php-errors.log
  drupal-watchdog.log
Pantheon
  nginx-error.log
  php-error.log

Server logs are hosting dependent.
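
When a PHP error log is large, a quick tally by severity shows whether fatal errors are involved. A minimal sketch, assuming the default PHP log line shape (`[date] PHP Fatal error:  message`). Your host may format lines differently, and the function name is illustrative.

```python
import re
from collections import Counter
from typing import Iterable

LEVEL_RE = re.compile(
    r"PHP (Fatal error|Parse error|Warning|Notice|Deprecated):"
)


def count_levels(lines: Iterable[str]) -> Counter:
    """Count log lines per PHP severity level; unrecognized lines are skipped."""
    counts: Counter = Counter()
    for line in lines:
        match = LEVEL_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Typical use would be `count_levels(open("php-errors.log"))`, with the file name taken from whatever your host provides.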

SLIDE 26

Check the Drupal logs.

Database Logging module (core)
File Logger module
Logging and Alerts module
Off-site logging via RabbitMQ Logs, Monolog, Logstash, etc.

Drupal logs depend on site configuration.

SLIDE 27

Here are a few APM tools.

SLIDE 28

You can analyze the app.

SLIDE 29

You can analyze the db.

SLIDE 30

You can analyze the db.

SLIDE 31

You can analyze the db.

SLIDE 32

And drill down into code.

SLIDE 33

And drill down into queries.

SLIDE 34

Drupal modules to help diagnose issues.

✓ Blame
✓ Hacked
✓ Security Review
✓ Logging and Alerts (emaillog)

SLIDE 35

SUPPORT

SLIDE 36

What is tech support?

“Technical support refers to a plethora of services by which enterprises provide assistance to users of technology products such as mobile phones, televisions, computers, software products or other electronic or mechanical goods.”

SLIDE 37

Opening a support ticket.

✓ First try to make sure it’s not the Drupal site that is the problem
✓ Determine where to open ticket(s)
✓ Is site down or severely impacted? Open emergency ticket!
✓ Be polite
✓ Thank them for their help

SLIDE 38

Give tech support what they need.

✓ Detailed explanation of problem
✓ Level of impact
✓ Traceroute(s)
✓ Location(s) (if relevant)
✓ Steps to reproduce
✓ Diagnostic data when available
✓ Actions taken to remedy (if any)

SLIDE 39

RECOVERY

SLIDE 40

What is disaster recovery?

“Disaster recovery involves a set of policies and procedures to enable the recovery or continuation of vital technology infrastructure and systems following a natural or human-induced disaster.”

SLIDE 41

How do you recover?

It depends!

SLIDE 42

Is it hackers?

Block IPs.
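
Before blocking anything you need to know which addresses are misbehaving. A minimal sketch, assuming an access log where the client address is the first whitespace-separated field (as in the common and combined log formats). The threshold and function name are illustrative. Behind a CDN or proxy the first field may be the proxy's address, so check your log format first.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def noisy_clients(lines: Iterable[str],
                  min_requests: int = 100) -> List[Tuple[str, int]]:
    """Return (address, request count) pairs at or above min_requests, busiest first."""
    counts = Counter(line.split()[0] for line in lines if line.strip())
    return [(ip, n) for ip, n in counts.most_common() if n >= min_requests]
```

The output is a list of candidates to review, not a block list. Legitimate crawlers and office networks can also generate a lot of requests.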

SLIDE 43

Is it hosting, CDN, 3rd party services, or too much good traffic?

Open support tickets.

SLIDE 44

Is it bad code or config?

Update and push hot fix.

SLIDE 45

Is it completely unfixable?

Recover from backups!

SLIDE 46

PREVENTION

SLIDE 47

What is prevention?

“Measures taken to detect, contain, and forestall events or circumstances which, if left unchecked, could result in a disaster.”

SLIDE 48

Some prevention tips…

✓ Managed hosting (if possible)
✓ Check automated daily backups
✓ Use code repository
✓ Track and tag releases
✓ Dev => Test => Live
✓ Test & backup before updating live!
✓ Monitor APM trends regularly
✓ Monitor long-term load time trends regularly

SLIDE 49

And more tips…

✓ Configure caching
✓ Spread out cron jobs
✓ Reduce number of modules
✓ Update core and modules regularly
✓ Proactively fix errors in logs
✓ Auto-block bad IP addresses
✓ Peer review code
✓ Limit access

SLIDE 50

Any questions?

SLIDE 51

THANKS!

Have more questions?

Email us at: answers@hook42.com

SLIDE 52

Join us for Sprints

First-Time Sprinter Workshop - 9am-12pm in Room 271-273
Mentored Core Sprint - 9am-6pm in Room 275-277
General Sprints - 9am-6pm in Room 278-282

Friday, May 13 at the Convention Center

SLIDE 53

So How Was It? Tell Us What You Think

Evaluate this session - https://events.drupal.org/neworleans2016/sessions/wheres-fire-aka-my-site-down-now-what

Thanks!