IPv6 Scanning: Smart address selection and comparison to legacy IP



SLIDE 1

Chair for Network Architectures and Services, Technische Universität München

IPv6 Scanning: Smart address selection and comparison to legacy IP

Intermediate talk
Sebastian Gebhard

Supervisors: Oliver Gasser, Quirin Scheitle
September 30, 2015
Chair for Network Architectures and Services, Department of Informatics, Technische Universität München

Sebastian Gebhard – IPv6 Scanning 1

SLIDE 2

◮ Motivation
◮ Approach
◮ Data sourcing
◮ Intermediate results
◮ Lessons learned so far
◮ Future work
◮ Time plan

SLIDE 4

Motivation

◮ Recent advances in scanning technology have made large-scale IPv4 scanning feasible
  ◮ zmap [4]: whole IPv4 address space in 4.5 minutes
◮ IPv6 address space is vastly larger
  ◮ Scanning the whole address space is not feasible

Proposed solution

◮ Smart address selection
  ◮ Choose addresses already seen in the network
  ◮ Close neighbours of known addresses
  ◮ Pattern recognition?
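The "close neighbours" idea can be illustrated with a minimal sketch. It assumes that a neighbour is an address within a small numeric distance of a known address and that candidates should stay inside the same /64; both the function name `close_neighbours` and the `radius` parameter are hypothetical and not taken from the talk.

```python
import ipaddress

IID_MASK = (1 << 64) - 1  # lower 64 bits: the interface ID


def close_neighbours(address: str, radius: int = 2) -> list[str]:
    """Return addresses within +/- radius of a known address, inside its /64."""
    known = int(ipaddress.IPv6Address(address))
    lowest = known & ~IID_MASK   # first address of the /64
    highest = known | IID_MASK   # last address of the /64
    neighbours = []
    for offset in range(-radius, radius + 1):
        candidate = known + offset
        # Skip the known address itself and anything outside the /64.
        if offset != 0 and lowest <= candidate <= highest:
            neighbours.append(str(ipaddress.IPv6Address(candidate)))
    return neighbours
```

For a gateway-style address such as `2001:4ca0:2001:10::2`, a radius of 1 yields the two adjacent addresses ending in `::1` and `::3`.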

SLIDE 5

Approach

◮ Data harvesting → hitlist generation
  ◮ Parsing data sources
  ◮ DNS resolving (for some data sources)
  ◮ Filtering and evaluating results
◮ Scanning targets
  ◮ traceroute
  ◮ Port scanning on ports 80 and 443
  ◮ Evaluation

SLIDE 6

Data sources

◮ Alexa Top 1 Million
◮ Rapid7 rDNS
◮ Rapid7 DNS ANY
◮ Caida DNS names
◮ DNS zone files

SLIDE 7

Data sources
Alexa Top 1 Million [5]

CSV list of the top 1 million most visited websites, published by Alexa Internet, Inc.

1,google.com
2,facebook.com
3,youtube.com

Figure: Alexa Top 1M Websites [5], entries 1 to 3

◮ Raw size
  ◮ File size: 22 MB, lines: 1,000,000
◮ Processing
  ◮ DNS lookup: AAAA record of every domain
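The processing step can be sketched as follows. This is only an illustration: it uses the operating system's resolver through `socket.getaddrinfo`, which is a stand-in for a dedicated bulk resolver and far too slow for a million domains. The function names `parse_alexa` and `lookup_aaaa` are hypothetical.

```python
import csv
import io
import socket


def parse_alexa(csv_text: str) -> list[str]:
    """Return the domain column from 'rank,domain' CSV lines."""
    return [row[1] for row in csv.reader(io.StringIO(csv_text)) if len(row) == 2]


def lookup_aaaa(domain: str) -> set[str]:
    """Resolve IPv6 addresses via the system resolver; empty set on failure."""
    try:
        infos = socket.getaddrinfo(domain, None, socket.AF_INET6)
    except (socket.gaierror, UnicodeError):
        # No AAAA record, resolver error, or a name that cannot be encoded.
        return set()
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for AF_INET6 the sockaddr starts with the address string.
    return {info[4][0] for info in infos}
```

Calling `lookup_aaaa` on every element of `parse_alexa(...)` would produce the raw address list for this source; results depend on the network and on the date of the lookup.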

SLIDE 8

Data sources
scans.io [1] - Rapid7 Reverse DNS

Reverse names for the whole IPv4 space (coverage ∼28%)

131.159.14.1,erdbeerschnitzel.net.in.tum.de
131.159.14.10,rack.net.in.tum.de
131.159.14.100,asgard.net.in.tum.de
131.159.14.101,ipmi.asgard.net.in.tum.de

Figure: Rapid7 Reverse DNS dataset excerpt [1]

◮ Raw size
  ◮ File size: 56 GB, lines: 1,216,518,754
◮ Processing
  ◮ DNS lookup: AAAA record of every hostname

SLIDE 9

Data sources
scans.io [1] - Rapid7 DNS ANY

DNS ANY lookup on names gathered from scans by Rapid7

internetscience.net.in.tum.de,aaaa,2001:4ca0:2001:13:216:3eff:fee1:6973
internetwin.tumblr.com,a,66.6.41.21
internetwin.tumblr.com,txt,|v=spf1 include:_spf.google.com include:sendgrid.net include:mail.zendesk.com -all
internet-science.eu,ns,lucifer.net.in.tum.de
internet-science.eu,ns,nimbus.net.in.tum.de
intranet.in.tum.de,cname,intranet.informatik.tu-muenchen.de

Figure: Rapid7 DNS ANY dataset excerpt [1]

◮ Raw size
  ◮ File size: 69 GB, lines: 1,435,173,425
◮ Processing
  ◮ Extract AAAA records
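Extracting the AAAA records from lines in the `name,type,value` layout of the excerpt might look like the sketch below. The three-field layout is read off the excerpt; the function name `extract_aaaa` is hypothetical.

```python
import ipaddress


def extract_aaaa(lines):
    """Yield IPv6 addresses from 'name,type,value' records of type aaaa."""
    for line in lines:
        # Split at most twice: TXT values may themselves contain commas.
        parts = line.rstrip("\n").split(",", 2)
        if len(parts) != 3 or parts[1].lower() != "aaaa":
            continue
        try:
            # Parsing also normalizes the textual form of the address.
            yield str(ipaddress.IPv6Address(parts[2].strip()))
        except ipaddress.AddressValueError:
            # Skip records whose value is not a valid IPv6 address.
            continue
```

Because the function is a generator, it can stream over a multi-gigabyte file without loading it into memory.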

SLIDE 10

Data sources
Caida DNS names [2]

IP addresses with a PTR record, collected by CAIDA's large-scale traceroute measurements on Archipelago (Ark)

1438578912 2001:4ca0:2001:10::1 st.gw.net.in.tum.de
1438578913 2001:4ca0:2001:10:eea8:6bff:fef4:b705 priwen.net.in.tum.de
1438578951 2001:4ca0:2001:10::2 st.scylla.net.in.tum.de
1438578955 2001:4ca0:2001:10::3 st.charybdis.net.in.tum.de

Figure: Caida DNS names dataset excerpt

◮ Raw size
  ◮ File size: 40 MB, lines: 618,480
◮ Processing
  ◮ cut -f2
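For readers who prefer to stay in one language, a rough Python counterpart of `cut -f2` is sketched below. Unlike `cut`, which splits on single tab characters, this version splits on any run of whitespace; that difference is an assumption made here for robustness. The name `second_field` is hypothetical.

```python
def second_field(lines):
    """Yield the second whitespace-separated field of each line."""
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            yield fields[1]
```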

SLIDE 11

Data sources
DNS zone files [6]

Every registered domain in the [biz, com, info, mobi, net, org, xxx] zones

WARRIOR4US.COM
GARYBETTUM.COM
SDRTCJ.COM
CAVALLOCREEKFARM.COM
TIESCOMMUNITY.COM

Figure: .com zone file excerpt [6]

◮ Raw size
  ◮ File size: 2.6 GB, lines: 151,881,502
◮ Processing
  ◮ DNS lookup: AAAA record of every hostname

SLIDE 12

Intermediate results

◮ Filtering
◮ Evaluation
◮ Results

SLIDE 14

Filtering

raw → sort -u → bogon? → IANAspecial? → pfx2as? → blacklist? → final

◮ sort -u: Remove duplicates
◮ bogon?: Remove bogons
◮ IANAspecial?: Remove special prefixes
◮ pfx2as?: Remove unannounced prefixes
◮ blacklist?: Remove blacklisted prefixes

Figure: Filter output on Rapid7 DNS ANY
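The filter chain can be sketched in a few lines. This is an illustration under stated assumptions, not the tooling used for the talk: the prefix lists below are tiny placeholders, the names `filter_hitlist` and `nets` are hypothetical, and addresses are sorted numerically rather than textually as `sort -u` would.

```python
import ipaddress


def filter_hitlist(raw, bogons, iana_special, announced, blacklist):
    """Run the filter stages in slide order; return (final, removed counts)."""

    def in_any(addr, prefixes):
        return any(addr in prefix for prefix in prefixes)

    # Stage 1: deduplicate, comparable to `sort -u`.
    unique = sorted({ipaddress.IPv6Address(a) for a in raw})
    removed = {"duplicates": len(raw) - len(unique)}

    # Stages 2-5: each predicate returns True if the address must be dropped.
    stages = [
        ("bogons", lambda a: in_any(a, bogons)),
        ("IANAspecial", lambda a: in_any(a, iana_special)),
        ("unannounced", lambda a: not in_any(a, announced)),
        ("blacklisted", lambda a: in_any(a, blacklist)),
    ]
    remaining = unique
    for name, reject in stages:
        kept = [a for a in remaining if not reject(a)]
        removed[name] = len(remaining) - len(kept)
        remaining = kept
    return [str(a) for a in remaining], removed


def nets(*cidrs):
    """Helper: parse CIDR strings into IPv6Network objects."""
    return [ipaddress.IPv6Network(c) for c in cidrs]


# Illustrative inputs only; real runs would load complete prefix lists.
raw = [
    "2001:4ca0:2001:10::1", "2001:4ca0:2001:10::1", "2001:db8::1",
    "fe80::1", "2a00:dead::1", "2001:4ca0:2001:10::2",
]
final, removed = filter_hitlist(
    raw,
    bogons=nets("2001:db8::/32"),
    iana_special=nets("fe80::/10"),
    announced=nets("2001:4ca0::/32"),
    blacklist=nets("2001:4ca0:2001:10::2/128"),
)
```

Keeping a per-stage count of removed addresses mirrors the "amount of data removed by filters" statistic in the evaluation report.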

SLIDE 18

Evaluation

Automated generation of a PDF report for single data sources using an IPython notebook

◮ Basic statistics
  ◮ Amount of data removed by filters
◮ Mapping found IPs to announced prefixes
  ◮ IPs per prefix
  ◮ IPs per AS
  ◮ AS / prefix with most hits
◮ Efficiency of data source
  ◮ Unique, real IP addresses per size of source file
  ◮ Possible SLAAC addresses
  ◮ Number and percentage of all prefixes / ASes found
◮ Hamming weight
  ◮ Calculate Hamming weight of the interface ID of each IP
  ◮ Compare Hamming weights
  ◮ Identify servers and clients?
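The Hamming-weight step can be sketched as below. The interface ID is taken to be the lower 64 bits of the address. The classification rule is an assumption: the slides only mention a limit of 22, so treating weights below the limit as "server" and weights at or above it as "client" is a guess at the intended direction and boundary. Both function names are hypothetical.

```python
import ipaddress

IID_MASK = (1 << 64) - 1  # lower 64 bits: the interface ID


def interface_id_hamming_weight(address: str) -> int:
    """Count the bits set to 1 in the interface ID of an IPv6 address."""
    iid = int(ipaddress.IPv6Address(address)) & IID_MASK
    return bin(iid).count("1")


def classify(address: str, limit: int = 22) -> str:
    """Assumed rule: few set bits suggest a manually numbered server."""
    if interface_id_hamming_weight(address) < limit:
        return "server"
    return "client"
```

With this rule, `2001:4ca0:2001:10::1` (weight 1) falls on the server side, while the EUI-64 style address `2001:4ca0:2001:10:eea8:6bff:fef4:b705` (weight 42) falls on the client side.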

SLIDE 19

Results for hitlist generation

              Alexa Top 1M            rDNS                  DNS ANY               Caida DNS names       zone files
size          22 MB                   56 GB                 69 GB                 40 MB                 2.6 GB
input lines   1M                      1.2B                  1.4B                  618k                  151M
raw           90,761 (9.076 %) [a]    1,023,950 (0.084 %)   9,769,653 (0.681 %)   618,480 (100.000 %)   4,762,297 (3.136 %)
final         43,822 (4.382 %)        462,185 (0.038 %)     1,440,984 (0.100 %)   102,580 (16.586 %)    430,689 (0.284 %)
ASes          1,424 (1,424 ppm) [b]   4,795 (4 ppm)         5,708 (4 ppm)         5,488 (8,873 ppm)     2,371 (16 ppm)
PFXes         1,695 (1,695 ppm)       6,749 (6 ppm)         8,506 (6 ppm)         9,269 (14,987 ppm)    2,995 (20 ppm)
AS coverage   13.984 %                47.088 %              56.054 %              53.894 %              23.284 %
PFX coverage  6.575 %                 26.178 %              32.993 %              35.953 %              11.617 %

Combined AS coverage: 7,331 (71.99 %)
Combined PFX coverage: 12,854 (49.86 %)

[a] Value in parentheses: efficiency = value / input lines
[b] ppm: parts per million = ASes / prefixes per 1 million input lines
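The derived columns follow directly from the two footnotes. A small sketch, using the Alexa and Caida figures from the table and the AS and prefix totals (10183 and 25781) given in the evaluation report in the backup slides; the function names are hypothetical.

```python
def efficiency_percent(value: int, input_lines: int) -> float:
    """Footnote a: share of input lines that produced a value, in percent."""
    return 100.0 * value / input_lines


def per_million(count: int, input_lines: int) -> float:
    """Footnote b: ASes or prefixes found per 1 million input lines."""
    return 1_000_000 * count / input_lines


def coverage_percent(found: int, total: int) -> float:
    """Share of all announced ASes or prefixes that were hit, in percent."""
    return 100.0 * found / total


# Alexa Top 1M: 43,822 final addresses from 1,000,000 input lines.
alexa_efficiency = round(efficiency_percent(43_822, 1_000_000), 3)
# Caida DNS names: 5,488 ASes from 618,480 input lines.
caida_as_ppm = round(per_million(5_488, 618_480))
# Alexa Top 1M: 1,424 of 10,183 ASes and 1,695 of 25,781 prefixes.
alexa_as_coverage = round(coverage_percent(1_424, 10_183), 3)
alexa_pfx_coverage = round(coverage_percent(1_695, 25_781), 3)
```

These reproduce the table entries 4.382 %, 8,873 ppm, 13.984 % and 6.575 %.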

SLIDE 21

Lessons learned

◮ Always sanitize your input data
  ◮ PTR records don’t have to contain a valid hostname. Example: 109.234.32.137, which resolves to
    \236\232\240-\234\240\229\241\229\23586.\240\244.
◮ People are using IPv6 over ISDN
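For the first lesson, a hostname check before any further DNS lookup could look like the sketch below. It applies the conventional letters-digits-hyphen rule per label; this is one possible sanitization, not necessarily the one used in the project, and the name `is_valid_hostname` is hypothetical. Names with underscores (such as `_spf.google.com`) are valid DNS names but are rejected here as hostnames.

```python
import re

# One label: 1-63 letters, digits or hyphens, not starting or ending with a hyphen.
_LABEL = re.compile(r"^(?!-)[A-Za-z0-9-]{1,63}(?<!-)$")


def is_valid_hostname(name: str) -> bool:
    """Return True if name looks like a syntactically valid hostname."""
    name = name.rstrip(".")  # tolerate a trailing root dot
    if not name or len(name) > 253:
        return False
    return all(_LABEL.match(label) for label in name.split("."))
```

The escaped PTR value from the slide fails the check because of the backslashes, while names such as `rack.net.in.tum.de` pass.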

SLIDE 22

Ongoing work

◮ Finalizing analysis of hitlists
  ◮ In-depth evaluation
  ◮ Compare reports
◮ Compare variations of data sources over time
◮ Tracerouting

Next steps

◮ Port scanning
◮ SSL scanning
◮ If IPv4 available: compare IPv4 and IPv6 traceroutes
◮ Evaluation of scan / traceroute results

SLIDE 23

Time plan

Figure: Gantt chart covering June 2015 to January 2016, with a TODAY marker. Chart labels, in order:

100% complete; Familiarization; Introduction Talk; 100% complete; Related work; 100% complete; Hitlist creation; Summer break; Software architecture; 60% complete; Hitlist evaluation; Midterm presentation; 30% complete; Scanning / Evaluation; 20% complete; Thesis writing; Thesis hand-in; Presentation writing; Final presentation

SLIDE 24

Bibliography

[1] Internet-Wide Scan Data Repository. https://scans.io.
[2] The CAIDA UCSD IPv6 DNS Names Dataset - 20150629 - 20150904. http://www.caida.org/data/active/ipv6_dnsnames_dataset.xml.
[3] Unbound - DNS Server. https://www.unbound.net/.
[4] D. Adrian, Z. Durumeric, G. Singh, and J. A. Halderman. Zippier ZMap: Internet-wide scanning at 10 Gbps. In Proceedings of the 8th USENIX Workshop on Offensive Technologies, 2014.
[5] Alexa Internet Inc. Top 1,000,000 sites (updated daily).
[6] PremiumDrops.com. Domain zone files. http://www.premiumdrops.com.

SLIDE 25

Backup slides

SLIDE 26

Large-scale DNS lookups

◮ “unbound is a validating, recursive, and caching DNS resolver” [3]
  ◮ validating and caching disabled for performance reasons
  ◮ up to 20k queries per second (qps)
◮ adns as bulk, asynchronous DNS resolver
  ◮ query 2000 hostnames at once → reduce timeout waiting
◮ Real world performance: 6k qps
  ◮ ∼ 2.5 days for Rapid7 rDNS dataset
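The runtime estimate can be checked with simple arithmetic. The sketch assumes a constant query rate and one query per input line; the function name is hypothetical.

```python
SECONDS_PER_DAY = 86_400


def lookup_duration_days(num_queries: int, queries_per_second: float) -> float:
    """Days needed to send num_queries at a constant query rate."""
    return num_queries / queries_per_second / SECONDS_PER_DAY


# Rapid7 rDNS: 1,216,518,754 lines at the observed 6k qps.
rdns_days_observed = lookup_duration_days(1_216_518_754, 6_000)
# Same dataset at the 20k qps upper bound quoted for unbound.
rdns_days_best_case = lookup_duration_days(1_216_518_754, 20_000)
```

At 6k qps this gives roughly 2.3 days, in line with the approximately 2.5 days on the slide; at 20k qps it would be about 0.7 days.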

SLIDE 27

Stateless Address Autoconfiguration - SLAAC

Figure: Percentage of (possible) SLAAC addresses per data source (Alexa, Rapid7 rDNS, Rapid7 DNS ANY, Caida DNS names, zone files)
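One common heuristic for spotting possible SLAAC addresses is to look for the `ff:fe` bytes that the modified EUI-64 format inserts in the middle of the interface ID. Whether the talk used exactly this test is an assumption; the function name `is_possible_slaac` is hypothetical. Addresses with privacy extensions or other random interface IDs are not detected this way, and a random ID can match by chance, hence "possible".

```python
import ipaddress

IID_MASK = (1 << 64) - 1  # lower 64 bits: the interface ID


def is_possible_slaac(address: str) -> bool:
    """Return True if the interface ID has 0xfffe in its middle 16 bits."""
    iid = int(ipaddress.IPv6Address(address)) & IID_MASK
    # Bytes 3 and 4 of the 8-byte interface ID occupy bits 39..24.
    return ((iid >> 24) & 0xFFFF) == 0xFFFE
```

Both `2001:4ca0:2001:10:eea8:6bff:fef4:b705` and `2001:4ca0:2001:13:216:3eff:fee1:6973` from the dataset excerpts match the pattern.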

SLIDE 28

Figure: Remaining percentage of IPs after filter steps (unique, no-fullbogons, no-ianaspecial, pfx2as) for Alexa, rDNS, DNS ANY, Caida, and zone files

Alexa and rDNS have the lowest amount of duplicate IPs. rDNS and DNS ANY have the highest fraction of bogon / special IPs.

SLIDE 29

classification

September 23, 2015

0.0.1 Sebastian Gebhard: IPv6 Scanning - Smart address selection and comparison to legacy IP

1 Hitlist evaluation Report

This is the hitlist evaluation report on file: 2015-09-01 Alexa Top1M AAAA-records.csv
Report generated on: 2015-09-23 17:19:14.161141
Using Caida Prefix2AS mapping from: ['20150907']
Total ASes: 10183
Total Prefixes: 25781

SLIDE 30

1.1 Filter statistics

Total IPv6 addresses collected: 90671
Total unique IPv6 addresses: 44320
Removed by prefilters:
  Bogon addresses: 210
  IANA special: 245
  Unannounced prefixes: 42
  Blacklisted IPs:
Remaining IPs for scanning 43822
finished without threading

1.1.1 How many ASes / How many prefixes with IPs?

ASes: 1424 of total ASes: 10183 --> 13.98%
Prefixes: 1695 of total prefixes 25781 --> 6.57%

1.1.2 Biggest ASes

AS: 13335 with 22986 discovered IPs

1.1.3 Prefixes with most hits

AS: 13335 665 Third Street Suite 207 with 22985 discovered IPs
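The per-AS counts in such a report depend on mapping each address to its announced prefix. A minimal sketch using longest-prefix matching is shown below; the table `PFX2AS` uses documentation prefixes and documentation AS numbers as placeholders, and the function names are hypothetical. A linear scan is used for clarity; with tens of thousands of prefixes a radix trie would be the more usual choice.

```python
import ipaddress
from collections import Counter

# Placeholder prefix-to-AS table (documentation prefix and AS numbers).
PFX2AS = {
    ipaddress.IPv6Network("2001:db8::/32"): 64496,
    ipaddress.IPv6Network("2001:db8:1::/48"): 64497,
}


def longest_prefix_match(address, pfx2as):
    """Return (prefix, asn) of the most specific covering prefix, or None."""
    addr = ipaddress.IPv6Address(address)
    best = None
    for prefix, asn in pfx2as.items():
        if addr in prefix and (best is None or prefix.prefixlen > best[0].prefixlen):
            best = (prefix, asn)
    return best


def ips_per_as(addresses, pfx2as):
    """Count addresses per origin AS; unannounced addresses are skipped."""
    counts = Counter()
    for address in addresses:
        match = longest_prefix_match(address, pfx2as)
        if match is not None:
            counts[match[1]] += 1
    return counts


hits = ips_per_as(
    ["2001:db8::1", "2001:db8:1::5", "2001:db8:1::6", "2a00::1"], PFX2AS
)
```

`hits.most_common(1)` then gives the AS with the most discovered addresses, the analogue of the "Biggest ASes" entry in the report.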

SLIDE 32

Todo: Remove the biggest prefix and plot again
Todo: Give information on the biggest prefixes
Todo:

Server: 38996
Clients: 4826
Based on a limit of Hamming weight 22