 
              The Challenges Previous Measures Our Ideas Structural Information Gene Map Resistance Theory New Searching Structural Information Theory: Principles for Distinguishing Order From Disorder Angsheng Li Institute of Software Chinese Academy of Sciences The 3rd Workshop on Big Data and Computational Intelligence, Beijing, 29 - 31, July, 2016
Outline
1. The challenges
2. Previous measures
3. Our ideas
4. Structural information
5. Three-dimensional gene map
6. Resistance
7. Theory
8. Next-generation search engine

Shannon’s Information
Shannon, 1948: Given a distribution $p = (p_1, p_2, \ldots, p_n)$, Shannon’s information is

$$H(p) = -\sum_{i=1}^{n} p_i \cdot \log_2 p_i. \tag{1}$$

Here $p_i$ is the probability that item $i$ is chosen, and $-\log_2 p_i$ is the “self-information” of item $i$.
• Shannon’s information measures the uncertainty of a probability distribution.
• This metric and the associated notion of noise form the foundation of information theory and of information-theoretic study in all areas of current science.
• Shannon’s metric provides the foundation for the current generation of information technology.

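As a small illustration of formula (1), the sketch below evaluates Shannon's information for two hand-picked distributions. The function name `shannon_entropy` and the example distributions are illustrative choices, not part of the talk.

```python
from math import log2

def shannon_entropy(p):
    """Shannon information H(p) in bits; zero-probability items contribute nothing."""
    return sum(-x * log2(x) for x in p if x > 0)

# Illustrative distributions (assumed for this example only).
uniform = [0.25, 0.25, 0.25, 0.25]   # maximal uncertainty over four items
skewed = [0.5, 0.25, 0.25]           # one item is more likely than the others

print(shannon_entropy(uniform))  # 2 bits: each item needs a 2-bit code
print(shannon_entropy(skewed))   # 1.5 bits
```

The uniform distribution attains the maximum $\log_2 n$ for $n$ items, which is why it costs exactly two bits here.
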
Shannon’s Question, 1953
Shannon, 1953:
1. Is there a structural theory of information that supports communication network analysis?
2. What is the optimal communication network?
Shannon noticed that his theory fails to support the analysis of communication networks, for the following reason. Given a communication network G:
1. (De-structuring) Let p be a distribution computed from G, such as the degree distribution or the distance distribution. This step discards the interesting properties of G.
2. Define H(p) to be the information of G. This number H(p) tells us nothing about the interactions and communications occurring in G.
The question is hence: How do we measure the information embedded in a physical system?

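The de-structuring problem can be made concrete with a minimal sketch. It assumes that "degree distribution" means the fraction of vertices having each degree value; the two six-vertex trees and all function names are hypothetical examples chosen for illustration. The trees are structurally different, yet de-structuring assigns them the same number H(p).

```python
from collections import Counter
from math import log2

def shannon_entropy(p):
    """Shannon information H(p) in bits."""
    return sum(-x * log2(x) for x in p if x > 0)

def degrees(n, edges):
    """Degree of every vertex 0..n-1 in an undirected simple graph."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

def degree_distribution(n, edges):
    """De-structuring step: fraction of vertices having each degree value."""
    counts = Counter(degrees(n, edges))
    return [counts[d] / n for d in sorted(counts)]

def leaves_next_to_hub(n, edges):
    """Number of degree-1 neighbours of the maximum-degree vertex."""
    deg = degrees(n, edges)
    hub = deg.index(max(deg))
    return sum(1 for u, v in edges
               if (u == hub and deg[v] == 1) or (v == hub and deg[u] == 1))

# Two connected trees on 6 vertices with the same degree multiset {3,2,2,1,1,1}.
tree_a = [(0, 1), (1, 2), (2, 3), (3, 4), (1, 5)]  # extra leaf hangs off vertex 1
tree_b = [(0, 1), (1, 2), (2, 3), (3, 4), (2, 5)]  # extra leaf hangs off vertex 2

h_a = shannon_entropy(degree_distribution(6, tree_a))
h_b = shannon_entropy(degree_distribution(6, tree_b))
print(h_a, h_b)  # identical values
print(leaves_next_to_hub(6, tree_a), leaves_next_to_hub(6, tree_b))  # different
```

The second printed pair differs, so the trees are not isomorphic, even though H(p) cannot tell them apart.
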
Physical Systems
Given a physical system G, the information embedded in G should determine and decode the essential structure of G. For example, a car and a boat should have different essential structures, and each of these structures should be determined by the information embedded in the respective object.
Question: What is the essential structure of a physical system?

Evolving Network
Consider a network G that evolves in nature through two mechanisms:
1. The rules, regulations and laws of the objects
2. Perturbations by noise and random variation
In this case, the information embedded in G should determine and decode the structure of G that is formed by the rules, regulations and laws, with the noise and random variation occurring in G excluded.

Noisy Data
Given structured noisy data G, the information embedded in G should determine and decode the structure T of G that excludes the noise occurring in G.
Questions:
1. What is the principle for data mining?
2. What is the principle for establishing the theory of structures and algorithms of big data?

Dynamical Complexity of a Network
Given a network G, the dynamical complexity of G should measure the complexity of the interactions, operations and communications occurring in G. This is different from static complexity, such as the number of nodes or the number of edges.
What is the measure of the dynamical complexity of a network?

Natural Structure and Natural Rank
In Nature and Society, individuals form natural structures and follow some natural ranking. This differs from the ranking used by current-generation search engines based on PageRank.
• What is the natural rank?
• What is the next-generation search engine?

Analysis of Biological Structures
• The coding of genetic information by DNA
• The folded structure of proteins
• Explaining and predicting the structures

Principles of Big Data
• What is the principle for structuring unstructured data?
• What is the principle for extracting the order from structured noisy data?

Structural Information Is the Key
The solution to all the challenges mentioned above depends on a well-defined measure of structural information.

Brooks’s Comments, 2003
• We have no theory that gives us a metric for the information embedded in structures.
• The missing metric is the most fundamental gap in information science and computer science.
Brooks listed the quantification of structural information as the first of the three great challenges for half-century-old computer science.

Hartmanis’s Comments
In 2008, Juris Hartmanis commented that Shannon’s definition fails to analyse structures, and suggested to me the problem of giving a new definition of information.

Rashevsky, 1955
Given a connected graph G of n vertices, for every vertex i, let $n_i$ be the number of vertices in the orbit of vertex i (under automorphisms of G). Suppose that there are k orbits with numbers of vertices $n_1, n_2, \ldots, n_k$. Then let

$$p = \left( \frac{n_1}{n}, \frac{n_2}{n}, \ldots, \frac{n_k}{n} \right).$$

Define the entropy of G to be the Shannon entropy of p.

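For very small graphs the orbit-based entropy can be computed by brute force, as in the sketch below. It enumerates all vertex permutations, so it is only practical for a handful of vertices; the function names and the example graphs are assumptions made for illustration and are not from the talk.

```python
from itertools import permutations
from math import log2

def automorphisms(n, edges):
    """Yield every vertex permutation that maps each edge onto an edge."""
    edge_set = {frozenset(e) for e in edges}
    for perm in permutations(range(n)):
        if all(frozenset((perm[u], perm[v])) in edge_set for u, v in edges):
            yield perm

def orbit_sizes(n, edges):
    """Sizes n_1, ..., n_k of the vertex orbits under the automorphism group."""
    autos = list(automorphisms(n, edges))
    seen, sizes = set(), []
    for v in range(n):
        if v in seen:
            continue
        orbit = {perm[v] for perm in autos}  # images of v under all automorphisms
        seen |= orbit
        sizes.append(len(orbit))
    return sizes

def orbit_entropy(n, edges):
    """Shannon entropy of p = (n_1/n, ..., n_k/n)."""
    return sum(-(s / n) * log2(s / n) for s in orbit_sizes(n, edges))

path3 = [(0, 1), (1, 2)]                   # endpoints form one orbit, the middle another
cycle4 = [(0, 1), (1, 2), (2, 3), (3, 0)]  # vertex-transitive: a single orbit

print(orbit_sizes(3, path3), orbit_entropy(3, path3))
print(orbit_sizes(4, cycle4), orbit_entropy(4, cycle4))
```

A vertex-transitive graph such as the cycle has a single orbit, so this measure assigns it entropy zero regardless of its size.
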
Local Entropy Measures
Raychaudhury et al., 1984: Given a connected graph G, for vertices i, j, let d(i, j) be the distance between vertex i and vertex j. Define the entropy of G to be the Shannon entropy of the distribution of the distances {d(i, j)}.

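The slide does not spell out how the distances are turned into a distribution. The sketch below takes one possible reading, stated here as an assumption: the distribution is the relative frequency of each distance value over all unordered vertex pairs. Function names and example graphs are illustrative.

```python
from collections import Counter, deque
from math import log2

def bfs_distances(adj, source):
    """Hop distances from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def distance_entropy(n, edges):
    """Entropy of the distance-value frequencies (assumed reading; G must be connected)."""
    adj = {v: [] for v in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    counts = Counter()
    for s in range(n):
        dist = bfs_distances(adj, s)
        for t in range(s + 1, n):  # each unordered pair counted once
            counts[dist[t]] += 1
    total = sum(counts.values())
    return sum(-(c / total) * log2(c / total) for c in counts.values())

path4 = [(0, 1), (1, 2), (2, 3)]
complete4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]

print(distance_entropy(4, path4))      # distances 1, 2, 3 occur 3, 2, 1 times
print(distance_entropy(4, complete4))  # every pair is at distance 1
```
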
Gibbs Entropy
This measures the number of bits needed to determine a graph generated from some model.

Shannon’s Entropy for Graph Models
This measures the number of bits needed to describe a graph that is generated from a model.

Von Neumann Entropy
This is defined by the spectrum of the Laplacian of the graph, that is, by the distribution of the eigenvalues of the Laplacian. It is claimed to measure the complexity of quantum systems.

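The slide gives no formula. One common formulation, offered here as an assumption rather than as the definition the talk has in mind, normalizes the Laplacian $L$ by its trace so that its eigenvalues $\lambda_1, \ldots, \lambda_n$ form a probability distribution, and then takes the Shannon entropy of that distribution:

```latex
\rho = \frac{L}{\operatorname{tr}(L)}, \qquad
S(G) = -\operatorname{tr}\left(\rho \log_2 \rho\right)
     = -\sum_{i \,:\, \lambda_i > 0}
       \frac{\lambda_i}{\sum_{j} \lambda_j}
       \log_2 \frac{\lambda_i}{\sum_{j} \lambda_j}.
```

Since $\operatorname{tr}(L) = \sum_j \lambda_j$ equals the sum of the vertex degrees, the normalized eigenvalues are nonnegative and sum to one.
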
Hierarchical Thesis
• The natural structure of a physical system is a hierarchical structure.
• The natural structure of a network evolving in Nature and Society is a hierarchical structure.
• The true structure of structured noisy data is a hierarchical structure.

Decoding the Truth

Decoding ECC
Figure: Decoding an error-correcting code.

One-Dimensional Structural Information
Definition (One-dimensional structural information). Given a connected graph $G = (V, E)$ with n nodes and m edges, for each node $i \in \{1, 2, \ldots, n\}$, let $d_i$ be the degree of i in G, and let $p_i = \frac{d_i}{2m}$. We define the one-dimensional structural information, or positioning entropy, of G by using the entropy function H as follows:

$$H^1(G) = H(p) = H\left( \frac{d_1}{2m}, \ldots, \frac{d_n}{2m} \right) = -\sum_{i=1}^{n} \frac{d_i}{2m} \cdot \log_2 \frac{d_i}{2m}. \tag{2}$$

Intuition of $H^1(G)$
• It is the Shannon information for graphs.
• It is the number of bits required to determine the code of the node that is reached by a random walk in G.

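The definition of the one-dimensional structural information translates directly into a few lines of code. The sketch below is an illustration under assumed names (`one_dim_structural_information`, `cycle4`, `star4`), with vertices numbered from 0. The probabilities $d_i / 2m$ are the stationary distribution of the simple random walk on G, which is what links the formula to the random-walk intuition.

```python
from math import log2

def one_dim_structural_information(n, edges):
    """H^1(G) = -sum_i (d_i / 2m) * log2(d_i / 2m) for an undirected simple graph."""
    degree = [0] * n
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    two_m = 2 * len(edges)  # the sum of all degrees equals 2m
    return sum(-(d / two_m) * log2(d / two_m) for d in degree if d > 0)

cycle4 = [(0, 1), (1, 2), (2, 3), (3, 0)]  # every node has degree 2
star4 = [(0, 1), (0, 2), (0, 3)]           # centre has degree 3, leaves degree 1

print(one_dim_structural_information(4, cycle4))
print(one_dim_structural_information(4, star4))
```

On the cycle the walk is equally likely to be at any of the four nodes, so two bits are needed. On the star the walk sits at the centre half of the time, which lowers the value.
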