Astro-Informatics: Computation in the study of the Universe

SLIDE 1

Astro-Informatics: Computation in the study of the Universe

Bob Mann and Andy Lawrence
Institute for Astronomy, School of Physics (rgm@roe.ac.uk & al@roe.ac.uk)

SLIDE 2

Plan

  • Computational Astrophysics
    – N-body simulations of galaxy clustering
  • Astro-Informatics
    – Survey astronomy & the Virtual Observatory
  • Discussion
    – Astronomy and informatics

SLIDE 3

Plan

  • Computational Astrophysics
    – N-body simulations of galaxy clustering
  • Astro-Informatics
    – Survey astronomy & the Virtual Observatory
  • Discussion
    – Astronomy and informatics

SLIDE 4

Observing galaxy clustering

  • 1930s: Hubble
    – Galaxies aren’t uniformly distributed on sky
  • 1950s: Shane and Wirtanen
    – Map of galaxy distribution on the sky from counting 100,000 galaxies by eye (10 years!)
  • 1980s: CfA Redshift Survey
    – (Huchra, Geller, de Lapparent)
    – First sizeable 3D map of the local Universe
      • Measured rough distances to ~ 11,000 galaxies

SLIDE 5

1985: first CfA survey

  • Rich structure: walls, filaments, voids…
    – How to explain this richness of structure?

Figure: 3D map of a pyramidal slice of space, projected into 2D (scale label: ~ 500 million light years)

SLIDE 6

Modelling galaxy clustering

  • Physics simple in Cold Dark Matter model
    – Collisionless material moving under gravity
  • Apply perturbation theory to density field
    – Linear theory treatment simple, but…
    – Perturbations non-linear on scales of interest
      • Fourier modes couple, analytic methods fail
  • Need numerical simulations to model galaxy clustering into non-linear regime
    – Set up test masses and evolve under gravity: i.e. gravitational N-body simulations
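
The idea of "set up test masses and evolve under gravity" can be sketched in a few lines. What follows is a minimal illustration, not the code used in any of the simulations mentioned in this talk: it assumes direct O(N²) force summation, a Plummer softening length, a kick-drift-kick leapfrog integrator, and code units with G = 1. All names and parameter values (`SOFTENING`, `leapfrog_step`, the 8-particle setup) are hypothetical choices made for the example; production cosmological codes use tree or mesh methods and an expanding background.

```python
import random

G = 1.0            # gravitational constant in code units (assumed)
SOFTENING = 0.05   # Plummer softening length (assumed, avoids singular forces)

def accelerations(pos, mass):
    """Direct-summation gravitational acceleration on every particle, O(N^2)."""
    n = len(pos)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = [pos[j][k] - pos[i][k] for k in range(3)]
            r2 = d[0] ** 2 + d[1] ** 2 + d[2] ** 2 + SOFTENING ** 2
            inv_r3 = 1.0 / (r2 * r2 ** 0.5)
            for k in range(3):
                acc[i][k] += G * mass[j] * d[k] * inv_r3
    return acc

def leapfrog_step(pos, vel, mass, dt):
    """One kick-drift-kick leapfrog step; updates pos and vel in place."""
    acc = accelerations(pos, mass)
    for i in range(len(pos)):
        for k in range(3):
            vel[i][k] += 0.5 * dt * acc[i][k]   # first half kick
            pos[i][k] += dt * vel[i][k]         # drift
    acc = accelerations(pos, mass)
    for i in range(len(pos)):
        for k in range(3):
            vel[i][k] += 0.5 * dt * acc[i][k]   # second half kick

def total_momentum(vel, mass):
    """Vector sum of m*v over all particles."""
    return [sum(m * v[k] for m, v in zip(mass, vel)) for k in range(3)]

# Toy initial conditions: 8 equal-mass particles at rest in a unit cube.
random.seed(1)
n = 8
mass = [1.0 / n] * n
pos = [[random.random() for _ in range(3)] for _ in range(n)]
vel = [[0.0, 0.0, 0.0] for _ in range(n)]

p_start = total_momentum(vel, mass)
for _ in range(100):
    leapfrog_step(pos, vel, mass, dt=0.01)
p_end = total_momentum(vel, mass)
```

Because the pairwise forces are equal and opposite, total momentum should stay at its initial value up to rounding error, which makes `p_start` versus `p_end` a cheap sanity check on the integrator.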

SLIDE 7

Two decades of N-body simulations

  • 1985: Davis, Efstathiou, Frenk, White
    – (32)³ particles
    – < 10 particles per galaxy
    – Early success for Cold Dark Matter model
  • 2005: Virgo Consortium
    – Inc. John Peacock (IfA), plus EPCC
    – (2202)³ particles
    – ~ 1000 particles per galaxy
  • Mass resolution increased by a factor of ~ 10² and simulation volume by a factor of ~ 10³
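
As a rough cross-check of the figures on this slide, the growth in particle number can be compared with the quoted gains in mass resolution and volume. The sketch below assumes that, at fixed mean density, particle number scales as (simulated volume) × (particles per unit mass); that scaling is an assumption of this example, and the variable names are illustrative.

```python
n_1985 = 32 ** 3      # particle count of the 1985 run, as quoted on the slide
n_2005 = 2202 ** 3    # particle count of the 2005 run, as quoted on the slide

growth = n_2005 / n_1985

# "< 10" versus "~ 1000" particles per galaxy gives a factor of roughly 10^2.
mass_resolution_gain = 1000 / 10

# Under the scaling assumption above, the remainder is the gain in volume.
implied_volume_gain = growth / mass_resolution_gain

print(f"particles: {n_1985:,} -> {n_2005:,} (growth ~ {growth:.1e})")
print(f"implied volume gain ~ {implied_volume_gain:.1e}")
```

The particle count grows by a few times 10⁵, which matches the product of the quoted factors (~ 10² × ~ 10³) to within an order of magnitude.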

SLIDE 8

Theory v Observation

  • Theory: VIRGO
  • Observation: 2dFGRS
    – (inc. John Peacock)
    – ~ 250,000 galaxies
  • (SDSS: ~ 500,000 galaxies)

Quantitative clustering analysis reveals theory and observation in excellent agreement

SLIDE 9

Galaxy clustering summary

  • Cold Dark Matter model accounts for the observed clustering of galaxies
    – Major triumph of modern astronomy
  • Numerical simulations crucial, but this is astronomers using computers, not astronomers using computer science
    – Are there examples of real interaction between astronomy & computer science?
      • More interesting than just number-crunching?

SLIDE 10

Plan

  • Computational Astrophysics
    – N-body simulations of galaxy clustering
  • Astro-Informatics
    – Survey astronomy & the Virtual Observatory
  • Discussion
    – Astronomy and informatics

SLIDE 11

Observational Astronomy

  • Electromagnetic spectrum
  • Multiwavelength view of a spiral galaxy
    – Different angular resolution of instruments
    – Different physical emission mechanisms

Image panels: IRAS 25µ, DSS Optical, IRAS 100µ, NVSS 20cm, GB 6cm, ROSAT ~keV, WENSS 92cm

(M51 graphics from Jim Gray & Alex Szalay)

SLIDE 12

Changes in the way that we make observations

  • Old Style: Many small, specific programmes
    – Astronomer proposes observations, goes to telescope, brings data home on tape, analyses data, publishes paper, puts tape in desk drawer and forgets about it
  • New Style: Few large, multi-use surveys
    – Consortium designs survey to address many science goals, undertakes survey over several years, establishes database
    – many people do different science with same data from DB

SLIDE 13

Trends behind these changes

  • Instruments made easier to use & more effort put into data reduction software
    – Easier to use data from new instrument
    – Multiwavelength astronomy much easier
  • Instruments are more sensitive and have more detector elements
    – Can image large areas of sky quickly
    – Survey mode of observation more efficient

SLIDE 14

Very strong local interest

  • Wide Field Astronomy Unit
    – Part of the UoE Institute for Astronomy
    – Based at Royal Observatory Edinburgh, on Blackford Hill
  • Two strands to WFAU work
    – Curation of optical/near-infrared sky surveys
    – Helping build the global “Virtual Observatory”

SLIDE 15

The Virtual Observatory

  • Goals
    – Federate all the world’s astronomy data
    – Provide resources for exploitation of data
  • Challenges: sociological & technical
    – Heterogeneous, distributed datasets
      • Lack of global schema; metadata often poor
    – Legacy analysis codes in many languages
  • Solution
    – International collaboration
    – Architecture built on web services

SLIDE 16

Schematic Virtual Observatory

Diagram labels: Registry; DB1, DB2, DB3, DB4; User; Compute Resource

SLIDE 17

WFAU’s computational problems

  • Quality Control
  • Spatial Indexing
  • Analysis close to DB
  • Provenance
  • Lack of Global Schema
  • Query Language
  • Difficulty in Making Joins
  • Integration with the Literature

Individual sky survey archives: scale
Virtual Observatory: interoperability

SLIDE 18

Quality control: automated junk detection

  • SuperCOSMOS Sky Survey
    – Scans of photographic plates
    – ~ 1800 plates cover whole sky
    – Image analyser run over images
      • ~ 250,000 sources per plate
  • Classes of spurious source
    – Trails: satellites, aeroplanes,…
    – Diffraction effects around bright stars
  • How to find these spurious sources?

SLIDE 19

Quality control: automated junk detection (2)

  • Junk found in unusual configurations
    – Lines, circles: the eye spots them easily
    – but can’t eyeball thousands of plates!
  • Amos Storkey, Chris Williams, Nigel Hambly
    – Developed new generative method, based on unlikeliness of configurations

SLIDE 20

Analysing sky survey data

  • WFAU has multi-TB sky survey databases
  • Many analyses will use much of the data
    – e.g. finding one-in-a-million unusual objects
    – e.g. quantifying properties of populations
  • Users can’t download data to workstation
    – WFAU must provide analysis services on DB
  • Security issues if users upload their code
    – Application of mobile code security work?
    – discussion started with Don Sannella’s group

SLIDE 21

Difficulty of matching entries between sky survey databases

  • Angular resolution varies between datasets
  • Matching by spatial proximity is inadequate
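
A small sketch may help show why proximity alone is not enough. It assumes a hypothetical low-resolution radio source with a 10 arcsec positional uncertainty and three made-up optical positions; the catalogue entries, names and radius are invented for illustration and do not come from any survey discussed here. Angular separation is computed with the standard haversine formula.

```python
import math

def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))

def candidates_within(source, catalogue, radius_deg):
    """All catalogue entries within radius_deg of source, nearest first."""
    hits = []
    for name, ra, dec in catalogue:
        sep = angular_separation_deg(source[1], source[2], ra, dec)
        if sep <= radius_deg:
            hits.append((sep, name))
    return sorted(hits)

# Hypothetical data: (name, RA in degrees, Dec in degrees).
radio_source = ("radio-1", 150.0000, 2.0000)
optical_catalogue = [
    ("opt-a", 150.0010, 2.0005),
    ("opt-b", 149.9992, 1.9996),
    ("opt-c", 150.2000, 2.1000),
]

search_radius_deg = 10.0 / 3600.0   # assumed 10 arcsec positional uncertainty
matches = candidates_within(radio_source, optical_catalogue, search_radius_deg)
for sep, name in matches:
    print(f"{name}: {sep * 3600:.1f} arcsec")
```

Two optical sources fall inside the search radius, so "take the nearest" picks one of them without any evidence that it is the true counterpart. Resolving that ambiguity needs extra information about the source populations.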

SLIDE 22

Difficulty of matching entries between sky survey databases (2)

  • Probabilistic framework well established
    – But need to know properties of source populations
      • Often not the case
  • Learn the probabilities for matching different classes of source iteratively (EM algorithm)
  • Emma Taylor (PhD), with Amos Storkey & Chris Williams
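
To make "learn the probabilities iteratively" concrete, here is a toy EM fit. It is not the model developed in the project named above, whose details are not given on the slide. The sketch assumes a deliberately simple two-component mixture over candidate-pair separations: genuine matches follow a Rayleigh distribution (Gaussian positional errors of width `sigma`), while chance alignments have uniform surface density, giving p(r) = 2r / r_max². The data are synthetic, and every name and parameter value is a hypothetical choice for this example.

```python
import math
import random

def em_match_fraction(seps, r_max, n_iter=200):
    """Fit the two-component separation mixture by expectation-maximisation.

    Returns (fraction of genuine matches, sigma, per-pair match probabilities).
    """
    frac, sigma = 0.5, r_max / 4.0          # crude starting guesses
    weights = [frac] * len(seps)
    for _ in range(n_iter):
        # E-step: posterior probability that each pair is a genuine match.
        weights = []
        for r in seps:
            p_true = frac * (r / sigma ** 2) * math.exp(-r * r / (2 * sigma ** 2))
            p_chance = (1.0 - frac) * 2.0 * r / r_max ** 2
            total = p_true + p_chance
            weights.append(p_true / total if total > 0 else frac)
        # M-step: weighted maximum-likelihood updates of both parameters.
        w_sum = sum(weights)
        frac = w_sum / len(seps)
        sigma = math.sqrt(sum(w * r * r for w, r in zip(weights, seps)) / (2.0 * w_sum))
    return frac, sigma, weights

# Synthetic separations (arbitrary units) drawn from the assumed model.
random.seed(0)
TRUE_FRAC, TRUE_SIGMA, R_MAX, N_PAIRS = 0.6, 1.0, 10.0, 4000
seps = []
for _ in range(N_PAIRS):
    u = 1.0 - random.random()               # in (0, 1], so log(u) is finite
    if random.random() < TRUE_FRAC:
        seps.append(min(R_MAX, TRUE_SIGMA * math.sqrt(-2.0 * math.log(u))))
    else:
        seps.append(R_MAX * math.sqrt(u))

frac, sigma, weights = em_match_fraction(seps, R_MAX)
print(f"estimated match fraction {frac:.2f}, sigma {sigma:.2f}")
```

The point of the sketch is that neither the match fraction nor the error scale is supplied in advance: both are recovered from the separations themselves, and each pair ends up with a match probability rather than a hard yes/no. A realistic treatment would add further source properties, such as brightness or class, to the mixture.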

SLIDE 23

Difficulty of matching entries between sky survey databases (3)

  • Sophisticated matching algorithms are often computationally expensive
    – Want to cache matches for re-use
  • AstroDAS: Diego Prina Ricotti, Raj Bose
    – Distributed annotation server for astronomy

Diagram labels: Optical, X-ray, Radio, Annotation Server

SLIDE 24

Integrating the online literature into the VO

  • If we find an interesting object, we frequently want to ask questions like:
    – What’s known about this area of sky?
    – What’s known about objects like this?
    – Have objects like this been reported before?
  • Literature is too large to search manually
    – Can text mining techniques help?

SLIDE 25

Integrating the online literature into the VO (2)

  • AstroNER: Named Entity Recognition
    – Claire Grover, Ben Hachey et al.
  • Look at abstracts of journal articles related to spectroscopy of active galaxies
  • Try to identify nouns of four types
    – instrument-name, spectral-feature, source-type, source-name
  • Apply various techniques, using training data annotated by astro PhD students

SLIDE 26

SLIDE 27

Plan

  • Computational Astrophysics
    – N-body simulations of galaxy clustering
  • Astro-Informatics
    – Survey astronomy & the Virtual Observatory
  • Discussion
    – Astronomy and informatics

SLIDE 28

Two classes of research

  • Computational Astrophysics
    – Astronomers using computers to solve a specific problem in astrophysics
  • Astro-Informatics
    – Astronomers and computer scientists collaborating in the application of computational techniques to astronomy

SLIDE 29

c.f. distinction made by Jim Gray (Microsoft)

  • Comp-X
    – X-ologists using computers to solve a specific problem in X-ology
  • X-Info
    – X-ologists and computer scientists collaborating in the application of computational techniques to X-ology

SLIDE 30

Comp-X & X-info compared

  • Comp-X
    – Involves only X-ologists
    – Should be funded as X-ology research
  • X-Info
    – Requires X-ologists and computer scientists
      • How should this be funded? Can both sides be kept happy?
  • Comp-X/X-Info boundary is domain-specific
    – Particle physics is almost all Comp-X
    – Biology is mainly X-info: bioinformatics
    – Astronomy is a mixture of both

SLIDE 31

Can X-info work?

  • Example of successful X-info: PiCA group
    – Pittsburgh Computational Astrostatistics Group
    – Sustained collaboration: 1999 onwards
    – Astronomy, CS and statistics expertise
    – Focus on scalable data mining algorithms
  • Astro requirements drive research in both statistics and CS

Andrew Moore

SLIDE 32

Can X-info work here?

  • It is!...to some extent
    – as this lecture series illustrates
    – I’ve described several astro-info projects
  • How can we do X-info better?
  • Sustained interactions…
    – Understand areas of mutual interest
    – Give-and-take over individual projects
  • ..which require funding
    – e.g. cross-School PhD studentships

SLIDE 33

Summary & Conclusions

  • Astronomy relies on computation
    – On both theoretical and observational sides
    – In both Comp-X and X-info modes
  • Astronomy is a good “X” for X-info
    – Data: free, voluminous, no ethical issues
    – Needs storing, indexing, describing, mining…
  • Challenge: how to make X-info work well
    – Huge rewards for {X} and informatics