Document Engineering Bob Glushko (glushko@sims.berkeley.edu)

SLIDE 1

Document Engineering

Bob Glushko (glushko@sims.berkeley.edu) Syllabus 2004 (19 July 2004)

  • 1. Who Am I, and How Did I Get Here?

I'm an Adjunct Professor in the Graduate School of Information Management and Systems (SIMS) at UC Berkeley

During the 1990s I founded or co-founded three companies that did SGML electronic publishing, XML technology for e-commerce, B2B procurement and marketplaces

Came to Berkeley in January 2002 and have been working to systematize the various threads of XML and web services architecture, document analysis, data modeling, patterns, reuse, model-based applications into a discipline of Document Engineering

At SIMS I teach several courses and have several R&D efforts, not just with "students" but with campus IT professionals

We have established the Center for Document Engineering as a focal point, resource repository, technology transfer organization

These HTML slides are at www.sims.berkeley.edu/~glushko/syllabus20040719

Approximate PDF version is at www.sims.berkeley.edu/~glushko/Glushko-syllabus2004.pdf

  • 2. Plan for Today's Talk

Motivating Document Engineering

The Big Ideas of Document Engineering

Content (and,vs) structure (and,vs) presentation

Models of document types

A unified view of analysis and modeling

XML for encoding models

Model-based Applications

Sample Projects

  • 3. Motivating "Document Engineering"

Scenario:

Customer selects computer from catalog at Outpost.com

Customer pays with credit card

Page 1 of 17 Document Engineering 7/19/2004 file://C:\Documents%20and%20Settings\glushko\My%20Documents\SIMS%20Courses\le...

SLIDE 2

Computer arrives via express shipper two days later

How many parties are involved?

How do the participants coordinate their activities?

Drop Ship Retail (Customer View)

Drop Ship Retail (B2B View)

  • 4. The Document Exchange Pattern

Businesses have long dealt with each other by exchanging documents because it is a very natural thing to do

Halfat's tax receipt inscribed on a shard of pottery is certainly one of the oldest documents that record a business transaction (355 BCE)

The simplest case is "here's my catalog, do you want to buy anything?", with the exchanged document being "here's my order"

SLIDE 3

  • 5. What is Document Engineering?

A new discipline for specifying, designing, and implementing the electronic documents that request or provide interfaces to business processes, often via Web-based services

A synthesis of information and systems analysis, business process modeling, electronic publishing, and distributed computing

A set of courses taught at UC Berkeley

An upcoming book (co-authored with Tim McGrath, MIT Press, early 2005)

  • 6. "WellPoint to Pay $30 Million for Doctors' Computers"

Wall Street Journal, 1-15-04

In the US, more than 7000 deaths and 7% of annual hospital admissions are caused by adverse drug effects and medication errors

95% could be avoided if doctors used computerized order-entry systems for prescriptions

But only 5% of doctors and 19% of health care organizations are fully automated

WellPoint Health Networks is giving 20% of its physicians a hand-held device for creating prescriptions and computers for submitting reimbursement forms on the Internet

  • 7. "Portals to the Future"

CIO, 12-15-03

The college admissions process is data-intensive and expensive for schools to administer

Many schools, including UC Berkeley, have built home-grown online application systems

For-profit firms like XapCorp now host applications for hundreds of schools nationally

Some schools have announced that they'll require all students to apply on the Internet by 2005

  • 8. "Inventory Tool to Launch in Germany"

Wall Street Journal, 1-12-04

Metro (Germany's biggest retailer and 4th largest in the world) is rolling out an RFID (wireless) inventory tracking system with its 100 biggest suppliers, 10 warehouses, and 250 stores

Suppliers attach RFID tags to pallets and cases to improve inventory management

Each tag costs about 35 cents; at 5 cents they'll replace bar codes on every separate item

Wal-Mart has announced a similar mandate in the US

SLIDE 4

Along with price, lack of standards for data formats is also an issue

  • 9. "US to Require Advance Notice on Cargo Data"

Wall Street Journal, 11-20-03

US Dept. of Homeland Security now requiring advance electronic cargo manifests for all inbound and outbound freight

Currently only 4% of incoming cargo is inspected and in a "haphazard" way because manifests have been on paper and arrive with the cargo

"Reputable" shippers will be fast-tracked, but potentially dangerous shipments will be checked

Advance notice requirement depends on mode of transport: 24 hours before loading for ships, 4 hours before arrival for planes, 30 minutes before arrival for trucks

  • 10. "Returns are Early, But New Categories of 'Stores' On Amazon.com Show Promise"

New York Times, 12-22-03

In the last year Amazon.com has opened "stores" in a wide variety of categories

These stores feature goods from other retailers; Amazon is just taking the order

Amazon gets a commission of 7-15% on each sale

"Web services" for catalog management, shopping cart, personalization engine, etc., that Amazon developed for its own business are integrated into the suppliers' business systems

  • 11. What Are the Common Themes in These News Items?

Business processes coordinated / choreographed via the exchange of electronic documents

Standards / patterns for documents and business processes

Co-evolution of information technology and business processes

Creating new business value

  • 12. Document Exchange is the Mother of All Patterns

Document exchange is the "mother of all patterns" for business models, business processes, and business information

Business model patterns: marketplace, auction, supply chain, build to order, drop shipment, vendor managed inventory, etc.

Business process patterns: procurement, payment, shipment, reconciliation, etc.

Business information patterns: catalog, purchase order, invoice, etc. and the components they contain for party, time, location, measurement, etc.

SLIDE 5
  • 13. The Model Matrix

These models and patterns vary in their level of abstraction and in the granularity with which they view business, and it is helpful to arrange them in a single framework, which we call the "Model Matrix"

  • 14. The New Questions Aren't

Viewing business models as a network of document exchanges raises some questions:

What information is being exchanged?

How do we know what the document means?

Can the recipient process the document the way we expect?

Can we preserve our investments in older technologies for document exchange while taking advantage of new ones?

How do we preserve our investments in business processes and relationships while creating new ones?

These are not new questions, but document engineering gives us new approaches for answering them

  • 15. Three Types of Information In Documents

We need a vocabulary to classify different kinds of information that we find in documents and sets of data

Content – "what does it mean" information

Structure – "where is it" or "how it is organized or assembled" information; aggregates content information into more usable or reusable components

Presentation – "how does it look" or "how is it displayed" information

SLIDE 6

Even though presentation information is the least important, it is essential to analyze it carefully - primarily because of its correlations and relationships to structural and content information

These correlations and relationships follow characteristic patterns for different types of documents

  • 16. Models of Document Types

A model of a document type captures the distinctions between documents that make a difference

Similar types of content occur in many document models and there is often overlap in information and structural patterns

How is a catalog different from a brochure?

How is a purchase order different from an invoice?

How is a course catalog different from a schedule of classes?

Models of document types can be very specific ("purchase order for industrial chemicals when buyer and seller are in different countries") or very abstract ("fill-in-the-blank legal form for contract")

We can be more precise and define a model of a document type as the rules or constraints that distinguish one type from another

This expression of the model is conceptual and is independent of the syntax and technology in which document instances are ultimately implemented

  • 17. How XML Implements Models of Document Types

XML gives the idea of document type a more physical, formal foundation

XML has syntactic mechanisms that capture the conceptual distinctions between document types in terms of:

Elements and attributes used to encode their content

Rules that govern how elements and attributes are organized

Possible values for elements and attributes

These are the vocabulary and the grammar of the language defined by the document type
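A minimal sketch of these three mechanisms, written in Python with the standard-library `xml.etree.ElementTree` parser. The `Order` document type, its element and attribute names, and the `check` helper are all hypothetical illustrations, not taken from the slides; a real project would normally express the same rules in a schema language such as a DTD or XML Schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical model of a tiny "Order" document type (illustration only).
ORDER_MODEL = {
    # Grammar: the child elements an element must contain, in order.
    "children": {"Order": ["Buyer", "Item", "Quantity"]},
    # Vocabulary plus possible values: attributes and their allowed values.
    "attributes": {"Order": {"currency": {"USD", "EUR"}}},
    # Possible values for element content, as a predicate on the text.
    "text": {"Quantity": str.isdigit},
}


def check(xml_text, model=ORDER_MODEL):
    """Return a list of rule violations; an empty list means no rule was broken."""
    root = ET.fromstring(xml_text)
    problems = []
    for elem in root.iter():
        expected = model["children"].get(elem.tag)
        actual = [child.tag for child in elem]
        if expected is not None and actual != expected:
            problems.append(f"{elem.tag}: expected children {expected}, got {actual}")
        for name, allowed in model["attributes"].get(elem.tag, {}).items():
            if elem.get(name) not in allowed:
                problems.append(f"{elem.tag}/@{name}: {elem.get(name)!r} not allowed")
        rule = model["text"].get(elem.tag)
        if rule is not None and not rule((elem.text or "").strip()):
            problems.append(f"{elem.tag}: invalid content {elem.text!r}")
    return problems


GOOD_ORDER = (
    '<Order currency="USD"><Buyer>Ann</Buyer>'
    "<Item>Laptop</Item><Quantity>2</Quantity></Order>"
)
# Wrong child order, a currency outside the allowed set, a non-numeric quantity.
BAD_ORDER = (
    '<Order currency="GBP"><Item>Laptop</Item>'
    "<Buyer>Ann</Buyer><Quantity>two</Quantity></Order>"
)

if __name__ == "__main__":
    print(check(GOOD_ORDER))
    for problem in check(BAD_ORDER):
        print(problem)
```

Both instances are well-formed XML; only the model tells them apart, which is the sense in which the document type defines the language.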

  • 18. XML and Document Engineering

XML is a useful technology for Document Engineering, but using XML doesn't make you a document engineer

XML is just the syntax in which we encode document models... what really matters is how we modeled the documents

The best thing about XML is the ease with which you can create a new vocabulary for a particular type of document

The worst thing about XML is the same as the best thing – the ease with which you can create a new vocabulary

XML is NOT "self-describing"

SLIDE 7

There are often multiple vocabularies for the same or related domains and especially for the common information models that are used in more than one domain
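One way to see why tag names alone do not describe a document: the Python sketch below reads two invented vocabularies for the same order information. The element names and the `TO_CONCEPT` mapping are assumptions made up for illustration; the point is only that the mapping has to come from modeling work done outside the XML.

```python
import xml.etree.ElementTree as ET

# Two hypothetical vocabularies that encode the same purchase-order concepts.
VOCAB_A = "<PurchaseOrder><BuyerName>Ann</BuyerName><Qty>2</Qty></PurchaseOrder>"
VOCAB_B = "<Order><Customer>Ann</Customer><Quantity>2</Quantity></Order>"

# Knowledge from the conceptual model: which tags mean the same thing.
# Nothing in either XML document states this equivalence.
TO_CONCEPT = {
    "BuyerName": "buyer",
    "Customer": "buyer",
    "Qty": "quantity",
    "Quantity": "quantity",
}


def to_conceptual(xml_text, mapping=TO_CONCEPT):
    """Translate a document's child elements into shared concept names."""
    root = ET.fromstring(xml_text)
    return {mapping[child.tag]: child.text for child in root if child.tag in mapping}


if __name__ == "__main__":
    print(to_conceptual(VOCAB_A))
    print(to_conceptual(VOCAB_B))
```

Without `TO_CONCEPT`, a program has no basis for treating `Qty` and `Quantity` as the same component.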

  • 19. XML By Itself Guarantees Nothing
  • 20. The Document Type Spectrum

SLIDE 8

"Narrative" Document Types (or "Publications")

Examples: Novels, user guides, technical manuals, ...

"Transactional" Document Types

Examples: Orders, invoices, payment instructions, ...

In between are "Hybrids"

Examples: Dictionaries, catalogs, ...

  • 21. A Unified View of Analysis and Modeling

Document Engineering unifies different disciplines or methods of analysis that until now have had little intersection

Document-centric analysis – from text processing, publishing, hypertext systems

Data-centric analysis – from database systems design, computer science

Business process analysis – from business strategy, process design and re-engineering

  • 22. Document Analysis {and,or,vs} Data Modeling

Document Engineering harmonizes the terminology and emphasizes what they have in common rather than highlighting their differences

Identifying the presentational, content, and structural components and defining their relationships to each other

Identifying "good" content components

Designing, describing, and organizing components to facilitate their reuse

Assembling hierarchical document models that organize components according to the requirements of a specific context for information exchange

  • 23. {and,or,vs} Business Process Analysis

Business process analysis begins with an abstract or broadly scoped perspective on business activities

Emphasizes "Does this work from a business perspective?" – less emphasis on technology implementation or on the document exchanges needed to carry out the new business strategy

Inherently a "top down" approach that starts with business models and processes and gets to the "document payloads" only at the end

In contrast, the document analysis and data modeling approaches focus from the beginning on the structure and content of the "document payload" that will be exchanged – this "bottom up" approach emphasizes "Does this work from a technical perspective?"

  • 24. Meeting in the Middle

SLIDE 9

We need to achieve both business and technical interoperability – the former is necessary but insufficient for the latter

We need models of the desired business processes and the documents that they will produce and consume at the same level of detail and implementability

This is represented in the Model Matrix as "meeting in the middle"

  • 25. Getting to the Middle with Document Engineering

SLIDE 10

But this last depiction of the unified modeling approach in document engineering shows where we want to get but doesn't show how we can expect to get there

The document engineering approach is designed to "get to the middle" in a systematic way

  • 26. The Document Engineering Modeling Approach

SLIDE 11
  • 27. "Webifying" Existing Applications

Much IT effort today involves "Webifying" existing applications like these to make them more available and usable, and many of the steps are excellent candidates for web services

But too many of these applications are done as "one-offs"

little reuse of data models or processes across applications

applications that are coupled in unpredictable ways by shared data

business rules and workflow embedded into application logic

  • 28. Model-Based Applications

Use models of documents and processes as specifications for generating code or configuring an application

How much can we separate the context-specific semantics that are based on the model(s) from the generic functionality provided by the "platform" on which it is implemented?

Can we harmonize this "inside out" approach with the conventional "outside in" user interface design approach of iterative prototyping and usability evaluation in which document and process models are not explicitly considered?

  • 29. The Generic Application

SLIDE 12

An enormous number of applications follow the pattern of "forms moving around within and between organizations"

Starting with a data entry form or an application, information flows through a set of business processes and their associated document types

  • 30. The Generic Model-Driven Application With XML Components

The business documents (or forms) used by the application are composed in "building block" fashion from smaller semantic components

This component architecture facilitates the reuse of information between documents and the integration of the applications that produce and consume them
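A rough sketch of the "building block" idea, using Python dataclasses. The component and document names (`Party`, `LineItem`, `PurchaseOrder`, `Invoice`), their fields, and the sample data are assumptions chosen for illustration; the slides do not prescribe a particular set of components.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Party:
    """Reusable component: who is involved (hypothetical fields)."""
    name: str
    city: str


@dataclass(frozen=True)
class LineItem:
    """Reusable component: what is bought, with the price in cents."""
    description: str
    quantity: int
    unit_price_cents: int


@dataclass(frozen=True)
class PurchaseOrder:
    """Document type assembled from Party and LineItem components."""
    buyer: Party
    seller: Party
    items: Tuple[LineItem, ...]


@dataclass(frozen=True)
class Invoice:
    """A second document type built from the same components."""
    buyer: Party
    seller: Party
    items: Tuple[LineItem, ...]
    total_cents: int


def invoice_from_order(order: PurchaseOrder) -> Invoice:
    """Assemble an invoice by reusing the order's components unchanged."""
    total = sum(item.quantity * item.unit_price_cents for item in order.items)
    return Invoice(order.buyer, order.seller, order.items, total)


# Made-up sample data.
ORDER = PurchaseOrder(
    buyer=Party("Example Buyer", "Example City"),
    seller=Party("Example Seller", "Another City"),
    items=(LineItem("Laptop", 2, 120000),),
)
INVOICE = invoice_from_order(ORDER)
```

Because both document types share the same `Party` and `LineItem` definitions, the application that produces the order and the one that consumes the invoice agree on those parts without any translation.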

SLIDE 13
  • 31. Components Reused in Transactions

This component architecture also facilitates the assembly or collection of the information needed at each processing step or transaction

SLIDE 14
  • 32. Model-based User or Application Interfaces

This needed information can be obtained from a user in a form generated from the schema that defines it (or from an application)
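As an illustration of generating a form from a schema, here is a small Python sketch. The schema format (a list of field descriptions), the field names, and the `render_form` function are hypothetical; they stand in for whatever schema language and form generator a real model-based application would use.

```python
from html import escape

# Hypothetical schema: one entry per piece of information the step needs.
SCHEMA = [
    {"name": "buyer", "label": "Buyer", "type": "string", "required": True},
    {"name": "quantity", "label": "Quantity", "type": "integer", "required": True},
    {"name": "notes", "label": "Notes", "type": "string", "required": False},
]

# Assumed mapping from schema data types to HTML input types.
INPUT_TYPES = {"string": "text", "integer": "number"}


def render_form(schema):
    """Generate an HTML form with one input per schema field."""
    lines = ["<form>"]
    for field in schema:
        required = " required" if field["required"] else ""
        lines.append(
            f'  <label>{escape(field["label"])} '
            f'<input name="{escape(field["name"])}" '
            f'type="{INPUT_TYPES[field["type"]]}"{required}></label>'
        )
    lines.append("</form>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_form(SCHEMA))
```

Changing the schema changes the form, with no hand-written user interface code to keep in sync.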

SLIDE 15
  • 33. Same Patterns in Different Domain

These process and information patterns apply to many domains

SLIDE 16
  • 34. Courses

Document Engineering (Glushko) (spring)

XML and Related Technologies (Milowski) (aka "DE Lab") (spring)

XML Bootcamp (Glushko) (fall, 1-unit)

Web Services (NEW) (Blum) (fall 04)

Model-Based User Interfaces (NEW) (Glushko & Milowski) (fall 04)

"Services Science and Business Engineering" (NEW) (Glushko & Chesbrough) (spring 05)

  • 35. Sample Projects

The university environment is a perfect testbed for collaborative document engineering projects

Center in a Box (model-based web publishing framework)

SLIDE 17

Course Approval System (typical forms+workflow)

Event Calendar Network (reuse and syndication)

System Map (framework for data dictionary and visualization)

Rule Based Infrastructure (model-based access control for distributed applications)

Syllabus System ...
