Blaise Source Code Editing System
Presenter: Danilo Gutierrez. Co-author: Sheila Deskins. Health and Retirement Study (HRS). The 11th International Blaise Conference, Annapolis, Maryland, September 2007


slide-1
SLIDE 1

Blaise Source Code Editing System

Presenter: Danilo Gutierrez
Co-author: Sheila Deskins
Health and Retirement Study (HRS)
The 11th International Blaise Conference
Annapolis, Maryland, September 2007

slide-2
SLIDE 2

Presentation Overview

  • How Big is Big?
  • What Does a Source Editor Do?
  • The System and Updating a New Language
  • Current Use & Future Plans
  • Questions

Survey Research Center • Institute for Social Research • University of Michigan

slide-3
SLIDE 3

How Big is Big?

slide-4
SLIDE 4

HRS CAI Size

  • Datamodel Source Code (.bla, .inc)

    – 175,624 Program lines
    – 61 Include files
    – 518 Procedures
    – 344 Blocks
    – 10 Tables

slide-5
SLIDE 5

HRS CAI Size

  • Fields

    – 5,818 Fields
    – 1,773 Auxfields
    – 1,691 Locals
    – 5,754 Parameters

slide-6
SLIDE 6

HRS CAI Size

  • Type Definitions

    – 8,962 USER-DEFINED
    – 1,390 ENUMERATED
    – 2,366 STRING
    – 481 RANGE
    – 366 OPEN
    – 309 INTEGER
    – 126 ARRAY
    – 141 SET
    – 11 REAL

slide-7
SLIDE 7

What does a “Source Editor” do?


slide-8
SLIDE 8

System Core

  • Parses source code files (.bla and .inc)
  • Merges in update information
  • Writes updated source code files

slide-9
SLIDE 9

Parsing - example

Statement:

Q1 (One) "Are you ready to answer questions?" : (y,n)

Parses into tokens of:

Q1    (One)    "Are you ready to answer questions?"    :    (    y    ,    n    )
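The parsing step above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' VB6 parser: the regular expression, the `tokenize` name, and the rule that a parenthesized name before the colon is kept whole as a tag (inferred from the spacing of the token list on the slide) are all assumptions.

```python
import re

# Assumed token classes: quoted text, identifiers, single-character punctuation.
TOKEN_RE = re.compile(r'"[^"]*"|[A-Za-z_]\w*|[():,/]')

def tokenize(statement):
    """Split one field statement into tokens (hypothetical helper)."""
    raw = TOKEN_RE.findall(statement)
    tokens, i, seen_colon = [], 0, False
    while i < len(raw):
        tok = raw[i]
        if tok == ":":
            seen_colon = True
        # Assumption: before the colon, "(" name ")" is a field tag kept as one token.
        is_tag = (not seen_colon and tok == "(" and i + 2 < len(raw)
                  and raw[i + 1].isidentifier() and raw[i + 2] == ")")
        if is_tag:
            tokens.append("(" + raw[i + 1] + ")")
            i += 3
        else:
            tokens.append(tok)
            i += 1
    return tokens

tokens = tokenize('Q1 (One) "Are you ready to answer questions?" : (y,n)')
print(tokens)
```

A real parser for an application of this size would also need to keep whitespace and comments as tokens, which this sketch discards.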

slide-10
SLIDE 10

Definition of Token

A token is part of a program statement consisting of characters identified as meaningful syntax.

slide-11
SLIDE 11

Merging

  • There are two inputs in merging

    – The user update request information
    – The parsed tokens from original source code

slide-12
SLIDE 12

User Request

User update request information:

fieldname: Q1
token type: descriptor
language: default
edit instruction: add (or delete)
update: "This is the new label"

slide-13
SLIDE 13

Merge results - example

Source code in tokens:

Q1 (One) "Are you ready to answer questions?" "This is the new label" : ( y , n )

Problem: the descriptor looks like question text!

slide-14
SLIDE 14

How to fix the problem?

  • Blaise Data Object (BDO) contains all possible parts of a Blaise syntax statement.

Blaise Syntax for Fields

Q [ Q1, [ ... ] ] [ ( Tag ) ] [ [ Lid ] "Text" ] [ ... ] [ / [ Lid ] "Description" ] [ ... ] : T

slide-15
SLIDE 15

Using the BDO

Syntax                          BDO with update
Q                               Q1
[ Q1, [ ... ] ]                 <blank>
[ ( Tag ) ]                     (One)
[ [ Lid ] "Text" ]              [ [ Lid ] "Are you ready to answer questions?" ]
[ ... ]                         [ ... ]
[ / [ Lid ] "Description" ]     [ / [ Lid ] "This is the new label" ]
[ ... ]                         [ ... ]
: T                             : T

slide-16
SLIDE 16

Edited Source Code

Q1 (One) "Are you ready to answer questions?"
/ "This is the new label" : (y , n)
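The idea behind the BDO can be illustrated with a small Python sketch: each optional part of the field syntax gets its own named slot, so an added descriptor lands in the description slot and is written back with its leading "/". The `FieldBDO` class, its attribute names, and the request dictionary keys are hypothetical; the actual BDO is held in database tables and handled by the VB.NET merger.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class FieldBDO:
    """Hypothetical slots mirroring: Q [(Tag)] ["Text"] [/ "Description"] : T"""
    name: str
    tag: Optional[str] = None
    text: Optional[str] = None
    description: Optional[str] = None
    type_spec: str = ""

def apply_update(bdo, request):
    """Merge one user update request into the matching slot."""
    if request["fieldname"] != bdo.name:
        raise KeyError("request is for a different field")
    if request["token_type"] != "descriptor":
        raise ValueError("this sketch only handles descriptor updates")
    if request["edit"] == "add":
        bdo.description = request["update"]
    elif request["edit"] == "delete":
        bdo.description = None
    else:
        raise ValueError("edit instruction must be 'add' or 'delete'")

def render(bdo):
    """Write the statement back out; empty slots are skipped."""
    parts = [bdo.name]
    if bdo.tag is not None:
        parts.append(f"({bdo.tag})")
    if bdo.text is not None:
        parts.append(f'"{bdo.text}"')
    if bdo.description is not None:
        parts.append(f'/ "{bdo.description}"')
    parts.append(f": {bdo.type_spec}")
    return " ".join(parts)

q1 = FieldBDO("Q1", tag="One", text="Are you ready to answer questions?", type_spec="(y, n)")
apply_update(q1, {"fieldname": "Q1", "token_type": "descriptor", "language": "default",
                  "edit": "add", "update": "This is the new label"})
edited = render(q1)
print(edited)
```

Because the slot, not the token position, decides where the update goes, the new label can no longer be mistaken for question text.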

slide-17
SLIDE 17

Writing

  • Writing is simpler when the database is already organized by the parsing and merging processes
  • Need to write out whitespace, comments, file names, etc.
  • Write Spanish language diacriticals
  • Write the same number of files as were parsed
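As a rough illustration of the writing step, the sketch below regroups stored tokens by their original file so that the same number of files comes out as went in, with whitespace and comments treated as ordinary tokens. The row layout, the file names, and the sample Spanish text are made up for the example, and the file encoding is left to the caller because the slides do not state which one the HRS source files use.

```python
from pathlib import Path

def regroup_by_file(token_rows):
    """token_rows: (file_name, token_text) pairs in original source order."""
    files = {}
    for file_name, text in token_rows:
        files.setdefault(file_name, []).append(text)
    # Joining without separators reproduces the stored whitespace exactly.
    return {name: "".join(parts) for name, parts in files.items()}

def write_files(files, out_dir, encoding):
    """Write one output file per parsed file; the encoding is the caller's choice."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, text in files.items():
        (out / name).write_text(text, encoding=encoding)

# Hypothetical token rows: comments and line breaks are tokens too.
rows = [
    ("main.bla", "{ header comment }\n"),
    ("main.bla", 'Q1 "¿Está listo?"'),
    ("main.bla", "\n"),
    ("secb.inc", "{ include file }\n"),
]
files = regroup_by_file(rows)
print(sorted(files))
```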

slide-18
SLIDE 18

System Considerations

slide-19
SLIDE 19

System Considerations

                                                      Editing requires   .BMI (BCP)
Parse Blaise source files (.bla, .inc) into tokens    Y                  Y
Keep comments                                         Y                  N
Keep include file structure                           Y                  N
Keep whitespace                                       Y                  N

slide-20
SLIDE 20

Why would we want a Source Editor System?

  • HRS is a longitudinal study.
  • It’s a ‘big’ application.
  • Most of the large-scale (bulk) changes have to do with fields. ‘Small’ changes usually involve a few hundred fields (10% = 581 changes).
  • There often are several ‘small’ changes that take place during the CAI preparation for a field period.

slide-21
SLIDE 21

What We Learned

  • For the 2004 Descriptor update task

    – The "few descriptors" mentioned turned out to be a change request for 2,700 descriptors
    – The merge key that was provided with the descriptor update request information was the DEP field name

slide-22
SLIDE 22

Translator Functions

  • Convert DEP field names to defined block and field name
  • Report duplicate requests for same defined block and field name

slide-23
SLIDE 23

Translator

  • Need to ‘translate’ DEP fieldname paths to defined block name.

Block Name (Def)    #    ind   DEP Path
BB                  41   1
BB_Born             42   1     SecB.Born
BB_ShowStateList    43   1     SecB.Born.B076_B003_
BB_ShowStateList    43   2     SecB.LivedArea.B078_B047_

slide-24
SLIDE 24

Translator

  • Several DEP field update requests can map to one defined field

Block (Defined)   Name (Defined)   Existing Descriptor   User Descriptor Request       DEP Field Name
BB_Marry          B066_            MARR YEAR BEG         FIRST MARRIAGE YEAR BEGAN     SecB.Marry[1].B066_
BB_Marry          B066_            MARR YEAR BEG         SECOND MARRIAGE YEAR BEGAN    SecB.Marry[2].B066_
BB_Marry          B066_            MARR YEAR BEG         THIRD MARRIAGE YEAR BEGAN     SecB.Marry[3].B066_
BB_Marry          B066_            MARR YEAR BEG         MARRIAGE YEAR BEGAN -4        SecB.Marry[4].B066_
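The two translator functions can be sketched in Python as follows. This is an assumption-laden illustration: the `BLOCK_MAP` lookup is a hypothetical stand-in for the block cross-reference shown in the tables above, and stripping array indices with a regular expression is one plausible way to collapse `SecB.Marry[1]` through `SecB.Marry[4]` onto the single defined block `BB_Marry`.

```python
import re
from collections import defaultdict

# Hypothetical lookup from DEP block path to defined block name.
BLOCK_MAP = {"SecB.Marry": "BB_Marry", "SecB.Born": "BB_Born"}

def translate(dep_name):
    """Convert a DEP field name to (defined block, defined field)."""
    path = re.sub(r"\[\d+\]", "", dep_name)        # drop array indices
    block_path, _, field_name = path.rpartition(".")
    return BLOCK_MAP[block_path], field_name

def find_duplicates(requests):
    """Group requests by defined field and keep groups with more than one request."""
    grouped = defaultdict(list)
    for dep_name, descriptor in requests:
        grouped[translate(dep_name)].append((dep_name, descriptor))
    return {key: reqs for key, reqs in grouped.items() if len(reqs) > 1}

requests = [
    ("SecB.Marry[1].B066_", "FIRST MARRIAGE YEAR BEGAN"),
    ("SecB.Marry[2].B066_", "SECOND MARRIAGE YEAR BEGAN"),
    ("SecB.Marry[3].B066_", "THIRD MARRIAGE YEAR BEGAN"),
    ("SecB.Marry[4].B066_", "MARRIAGE YEAR BEGAN -4"),
]
duplicates = find_duplicates(requests)
for key, reqs in duplicates.items():
    print(key, len(reqs), "requests")
```

Only one descriptor can be stored on the defined field, so a report like this lets the user decide which of the competing requests, if any, should be processed.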

slide-25
SLIDE 25

The System and Updating a New Language

slide-26
SLIDE 26

New Language Update

  • The early system that handled updating descriptors needed to be expanded to handle a new kind of ‘update’: the addition of a new language.

slide-27
SLIDE 27

System Conceptualization

  • Parser Application
  • DEP Field Name Translator
  • BDO Creation
  • Merger Application
  • Writer Application
  • User Interface

slide-28
SLIDE 28


slide-29
SLIDE 29

System Design Considerations

  • Encapsulated routines and procedures for each function
  • Reusable code versus ad hoc routines
  • Tokens described in more meaningful terms
  • Language order option
  • Parsing whitespace

slide-30
SLIDE 30

Language Re-order

Before           Quick            Best
CORE English     CORE English     CORE English
CORE Spanish     CORE Spanish     CORE Spanish
PROXY English    PROXY English    PROXY English
EXIT English     EXIT English     PROXY Spanish
EXIT Spanish     EXIT Spanish     EXIT English
MEDIA            MEDIA            EXIT Spanish
                 PROXY Spanish    MEDIA
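Because language texts are positional, re-ordering languages means re-ordering every statement's text list. The Python sketch below shows one way to do that under stated assumptions: the language lists are copied from the table above, while the `reorder_texts` helper and the convention of filling a not-yet-translated language with an empty string are illustrative choices, not the system's documented behavior.

```python
BEFORE = ["CORE English", "CORE Spanish", "PROXY English",
          "EXIT English", "EXIT Spanish", "MEDIA"]
QUICK = BEFORE + ["PROXY Spanish"]                 # new language appended at the end
BEST = ["CORE English", "CORE Spanish", "PROXY English", "PROXY Spanish",
        "EXIT English", "EXIT Spanish", "MEDIA"]   # new language placed beside its pair

def reorder_texts(texts, old_order, new_order):
    """Re-position one statement's language texts; missing languages become ""."""
    by_language = dict(zip(old_order, texts))
    return [by_language.get(lang, "") for lang in new_order]

texts = ["core-en", "core-es", "proxy-en", "exit-en", "exit-es", "media"]
quick = reorder_texts(texts, BEFORE, QUICK)
best = reorder_texts(texts, BEFORE, BEST)
print(quick)
print(best)
```

The "Quick" order leaves every existing text where it was, while the "Best" order shifts the EXIT and MEDIA texts one position to the right.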

slide-31
SLIDE 31

BDO and Language

  • Looking at all statements that can contain language.

    – Field question text
    – enumerated data type code descriptions, and
    – descriptors

  • Coming up with a new way to describe all involved tokens, and merge keys for the token types.

slide-32
SLIDE 32

Other considerations

  • Update request for enumerations (defined at the field)

    – Merge keys needed
    – BDO for an enumeration type statement

Fields
TP50 "What type of fish do you have?": (n "none", f "fresh water", s "salt water" )

slide-33
SLIDE 33

Other considerations

  • Update request for enumerations (defined as a type)

    – Merge keys needed
    – Look and find the type

TP60 "What type of mammals aside from dogs or cats do you have?" : TMammals

Blkname           Field Name   Type Name   Type No   Blk path   Blk No End   Blk NameEnd       Type NameEnd   TokenType
SE_illustration                tmammals    25        1..        1            SE_illustration   tmammals       TName
B2_Other          TP50                     40
B2_Other          TP60                     41        8.1 ..     1            SE_illustration   tmammals       FTName

slide-34
SLIDE 34

Other considerations

  • HRS doesn’t use LID – need to put in "" to keep correct relative positioning for languages

slide-35
SLIDE 35

Before

TPosition =
  (STANDING (1) "Standing" "Parado",
   SITTING (2) "Sitting" "Sentado",
   LYINGDOWN (3) "Lying down" "Acostado")

slide-36
SLIDE 36

After

TPosition =
  (STANDING (1) "Standing" "Parado" {SE Added text} "Standing" {LS} "Parado",
   SITTING (2) "Sitting" "Sentado" {SE Added text} "Sitting" {LS} "Sentado",
   LYINGDOWN (3) "Lying down" "Acostado" {SE Added text} "Lying down" {LS} "Acostado")

slide-37
SLIDE 37

Other considerations

  • Report when update request not found in the data
  • Report duplicate update request for the same defined field with option to process/not process those records

slide-38
SLIDE 38

Current Use & Future Plans

  • The time difference between machine versus manual work.
  • Is it worth the effort?

slide-39
SLIDE 39

Current System Performance

  • 8 hours - Parse 175,600 lines of code
  • 2 hours - Type Cross Reference
  • 6 hours - Create BDO Table
  • 6 hours - Merge 6,500 updates
  • 3 minutes - Write files

Total Time: 22 hours
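The 22-hour total is the sum of the listed steps, rounded to the hour. A short Python check of that arithmetic follows, together with two derived rates (seconds per merged update and lines parsed per hour) that do not appear on the slide and are shown only as a back-of-the-envelope illustration.

```python
from datetime import timedelta

# Step durations as listed on the slide.
steps = {
    "Parse 175,600 lines of code": timedelta(hours=8),
    "Type Cross Reference": timedelta(hours=2),
    "Create BDO Table": timedelta(hours=6),
    "Merge 6,500 updates": timedelta(hours=6),
    "Write files": timedelta(minutes=3),
}

total = sum(steps.values(), timedelta())
seconds_per_update = steps["Merge 6,500 updates"].total_seconds() / 6500
lines_per_hour = 175_600 / 8

print(total)                          # 22:03:00
print(round(seconds_per_update, 2))   # 3.32
print(lines_per_hour)                 # 21950.0
```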

slide-40
SLIDE 40

Other Uses

  • Analysis of Datamodel – being able to look at data within context of tokens
  • An alternative way of making updates, i.e. one doesn’t necessarily need to use the merger. The system allows for direct edits to the tables in the database.

slide-41
SLIDE 41

Tasks for the Current System

  • Strip out obsolete or dated comments from prior years.
  • Update tags. Modify tags to be more descriptive.
  • Update descriptors. Modify labels for data out.
  • Update data types. Modify field size, field ranges, etc.
  • Update language text. Text provided by another system such as a product from the HRS translation group.

slide-42
SLIDE 42

System Benefits

  • Time saving, resulting in faster turn-around of tasks.
  • Hundreds of changes can be made at one time.
  • More accurate placement of updates and therefore better quality.
  • May reduce repetitive-use injury.

slide-43
SLIDE 43

System Benefits (2)

  • Robust enough to handle applications as large as HRS.
  • Generic enough to handle other non-HRS Blaise CAI applications.
  • The application can add or re-order languages.
  • The application has features to help handle scale issues.

slide-44
SLIDE 44

Blaise Source Code Editing System

  • Acknowledgments

    – HRS Instrument Development Team

slide-45
SLIDE 45

Blaise Source Code Editing System

Questions?

Contact Information:

Danilo Gutierrez, danilog@umich.edu
Sheila Deskins, sld@umich.edu

slide-46
SLIDE 46

FAQ

  • Q: In the paper you didn’t implement a translator?
  • A: For the language update, we gave the user a table for fields and a table for enumerations, with the keys needed, so that for this system development round we wouldn’t have to develop the translator or worry about multiple update requests for the same defined field.
  • A: We’ll look at how we want to develop this implementation in the future.

slide-47
SLIDE 47

Question

  • Q: What technology did you use?
  • A:
  • Parser in VB6 (Access or SQLServer)
  • Merger, Writer in VB.NET and ADO.NET
  • Source Editor Database in SQLServer
  • Proof of concept in C# .NET

Miscellaneous:

  • Ad hoc routines in SAS
  • BCP