 
One-step-ahead Prediction vs Free-run Simulation (System Identification Procedure)
Nonlinear difference equation:
$$y[k] = F(y[k-1], y[k-2], y[k-3], u[k-1], u[k-2], u[k-3]; \Theta).$$
One-step-ahead Prediction; Free-run Simulation.
Antônio H. Ribeiro (UFMG) Recurrent Structures in System Identification July 19, 2017 14 / 56
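The difference between the two modes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `one_step_ahead`, `free_run` and the toy linear map `F_linear` are hypothetical and do not come from the slides, and the first three measured samples are used as initial conditions in both modes.

```python
def one_step_ahead(F, y, u, theta, n=3):
    """Predict y[k] from the *measured* past outputs y[k-1], ..., y[k-n]."""
    y_hat = list(y[:n])  # first n samples serve as initial conditions
    for k in range(n, len(y)):
        # slices are reversed so that index 0 holds the most recent sample
        y_hat.append(F(y[k - n:k][::-1], u[k - n:k][::-1], theta))
    return y_hat


def free_run(F, y, u, theta, n=3):
    """Simulate y[k] feeding back the model's *own* past outputs."""
    y_sim = list(y[:n])  # only the initial conditions come from measurements
    for k in range(n, len(y)):
        y_sim.append(F(y_sim[k - n:k][::-1], u[k - n:k][::-1], theta))
    return y_sim


def F_linear(y_past, u_past, theta):
    """Hypothetical F, used for illustration only."""
    return theta[0] * y_past[0] + theta[1] * u_past[0]


y_meas = [0.0, 0.0, 0.0, 5.0, 5.0, 5.0]
u_meas = [1.0] * 6
print(one_step_ahead(F_linear, y_meas, u_meas, (0.5, 1.0)))
print(free_run(F_linear, y_meas, u_meas, (0.5, 1.0)))
```

The only difference between the two functions is which sequence is sliced to build the regressors: the measured output or the simulated one.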
Parameter Estimation (Prediction Error Methods)
General framework:
Noise model ⇒ optimal predictor: $\hat{y}[k] = E\{y[k] \mid k-1\}$;
Compute errors: $e[k] = \hat{y}[k] - y[k]$;
Find the parameter vector $\Theta$ such that the sum of squared errors is minimized: $\min_{\Theta} \sum_k \|e[k]\|^2$.
Figure: Parameter estimation framework.
NARX Model (Prediction Error Methods)
NARX (Nonlinear AutoRegressive with eXogenous input) model.
True system, where $v[k]$ is white noise:
$$y[k] = F(y[k-1], y[k-2], y[k-3], u[k-1], u[k-2], u[k-3]; \Theta) + v[k].$$
Optimal predictor (one-step-ahead prediction):
$$\hat{y}_1[k] = F(y[k-1], y[k-2], y[k-3], u[k-1], u[k-2], u[k-3]; \Theta).$$
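Because the NARX predictor depends only on measured data, minimizing the one-step-ahead error is an ordinary regression problem. The sketch below assumes a hypothetical first-order structure that is linear in the parameters, $\hat{y}_1[k] = \theta_1 y[k-1] + \theta_2 u[k-1]$, so the minimizer follows from the 2x2 normal equations. The function name, the data-generating values and the noise level are illustrative assumptions, not taken from the slides.

```python
import random


def fit_first_order_arx(y, u):
    """Least-squares estimate of (theta1, theta2) in the hypothetical predictor
    y_hat1[k] = theta1 * y[k-1] + theta2 * u[k-1]."""
    a11 = a12 = a22 = b1 = b2 = 0.0
    for k in range(1, len(y)):
        a11 += y[k - 1] * y[k - 1]
        a12 += y[k - 1] * u[k - 1]
        a22 += u[k - 1] * u[k - 1]
        b1 += y[k - 1] * y[k]
        b2 += u[k - 1] * y[k]
    det = a11 * a22 - a12 * a12  # assumed nonzero (sufficiently rich input)
    theta1 = (a22 * b1 - a12 * b2) / det
    theta2 = (a11 * b2 - a12 * b1) / det
    return theta1, theta2


rng = random.Random(0)
u = [rng.gauss(0.0, 1.0) for _ in range(500)]
y = [0.0]
for k in range(1, len(u)):
    # assumed true system: theta = (0.5, 1.0) plus white equation noise v[k]
    y.append(0.5 * y[k - 1] + 1.0 * u[k - 1] + rng.gauss(0.0, 0.05))

theta_hat = fit_first_order_arx(y, u)
print(theta_hat)
```

For a neural-network $F$ the problem is no longer linear in $\Theta$, but the regressors are still fixed measured values, which is what keeps the predictor non-recurrent.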
Figure: NARX model prediction error.
NOE Model (Prediction Error Methods)
NOE (Nonlinear Output Error) model.
True system, where $w[k]$ is white noise:
$$y^*[k] = F(y^*[k-1], y^*[k-2], y^*[k-3], u[k-1], u[k-2], u[k-3]; \Theta),$$
$$y[k] = y^*[k] + w[k].$$
Optimal predictor (free-run simulation):
$$\hat{y}_s[k] = F(\hat{y}_s[k-1], \hat{y}_s[k-2], \hat{y}_s[k-3], u[k-1], u[k-2], u[k-3]; \Theta).$$
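A small numerical check helps to see why free-run simulation is the natural predictor for output-error noise. The sketch assumes a hypothetical first-order linear model in place of $F$ and assumes that the noise-free initial condition is known. Under those assumptions, the simulation error at the true parameters is exactly the measurement noise with the sign flipped.

```python
import random


def simulate(theta, u, y0):
    """Free-run simulation of the hypothetical model
    y*[k] = theta1 * y*[k-1] + theta2 * u[k-1]."""
    y_sim = [y0]
    for k in range(1, len(u)):
        y_sim.append(theta[0] * y_sim[k - 1] + theta[1] * u[k - 1])
    return y_sim


rng = random.Random(0)
theta_true = (0.5, 1.0)                      # assumed true parameters
u = [rng.gauss(0.0, 1.0) for _ in range(200)]
y_star = simulate(theta_true, u, y0=0.0)     # noise-free output y*[k]
w = [rng.gauss(0.0, 0.1) for _ in u]         # white output noise w[k]
y = [ys + wk for ys, wk in zip(y_star, w)]   # measured output y[k]

# Simulation error e_s[k] = y_hat_s[k] - y[k] at the true parameters.
y_hat_s = simulate(theta_true, u, y0=0.0)
e_s = [yh - yk for yh, yk in zip(y_hat_s, y)]
print(max(abs(e + wk) for e, wk in zip(e_s, w)))  # numerically zero
```

Feeding the noisy measurements back into the model, as a one-step-ahead predictor would, propagates $w[k]$ through the dynamics and the residual is no longer white.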
Figure: NOE model prediction error.
NARMAX Model (Prediction Error Methods)
NARMAX (Nonlinear AutoRegressive Moving Average with eXogenous input) model.
True system, where $v[k]$ is white noise:
$$y[k] = F(y[k-1], y[k-2], y[k-3], u[k-1], u[k-2], u[k-3], v[k-1], v[k-2], v[k-3]; \Theta) + v[k].$$
Optimal predictor, in which the unmeasured noise terms are replaced by past prediction residuals:
$$\hat{y}_v[k] = F(y[k-1], y[k-2], y[k-3], u[k-1], u[k-2], u[k-3], y[k-1] - \hat{y}_v[k-1], y[k-2] - \hat{y}_v[k-2], y[k-3] - \hat{y}_v[k-3]; \Theta).$$
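The recursion in the NARMAX predictor can be illustrated with a hypothetical linear structure with one lag of each signal, $y[k] = a\,y[k-1] + b\,u[k-1] + c\,v[k-1] + v[k]$. The names, parameter values and the zero initial prediction below are assumptions made for this sketch. At the true parameters, and with a consistent initial condition, the residuals reproduce the noise sequence.

```python
import random


def narmax_predict(theta, y, u):
    """One-step-ahead predictor for the hypothetical structure
    y[k] = a*y[k-1] + b*u[k-1] + c*v[k-1] + v[k],
    with v[k-1] replaced by the past residual y[k-1] - y_hat[k-1]."""
    a, b, c = theta
    y_hat = [0.0]  # assumed initial prediction
    for k in range(1, len(y)):
        residual = y[k - 1] - y_hat[k - 1]
        y_hat.append(a * y[k - 1] + b * u[k - 1] + c * residual)
    return y_hat


rng = random.Random(1)
theta_true = (0.5, 1.0, 0.3)  # assumed (a, b, c)
u = [rng.gauss(0.0, 1.0) for _ in range(100)]
v = [rng.gauss(0.0, 0.1) for _ in u]
y = [v[0]]  # zero state before k = 0, so y[0] is pure noise
for k in range(1, len(u)):
    y.append(0.5 * y[k - 1] + 1.0 * u[k - 1] + 0.3 * v[k - 1] + v[k])

y_hat = narmax_predict(theta_true, y, u)
residuals = [yk - yh for yk, yh in zip(y, y_hat)]
print(max(abs(r - vk) for r, vk in zip(residuals, v)))  # numerically zero
```

Each prediction depends on the previous prediction through the residual, so this predictor is recurrent even though it also uses measured outputs.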
Figure: NARMAX model prediction error.
Recurrent Structures in System Identification (Motivation for this Dissertation)
Figure: Prediction depends only on measured values.
Figure: Predictor has a recurrent structure.
Challenges: unboundedness; multiple minima.
Nonlinear Least Squares Problem (Nonlinear Least Squares)
Minimizing the sum of squared errors:
$$\min_{\Theta} V(\Theta) = \|e(\Theta)\|^2$$
Objective Function Derivatives (Nonlinear Least Squares)
Derivatives:
$$\nabla V(\Theta) = J(\Theta)^T e(\Theta),$$
$$\nabla^2 V(\Theta) = J^T(\Theta) J(\Theta) + \sum_{i=1}^{N_e} \nabla^2 e_i(\Theta)\, e_i(\Theta).$$
These expressions hold for the scaling $V(\Theta) = \tfrac{1}{2}\|e(\Theta)\|^2$; without the factor $\tfrac{1}{2}$ both are multiplied by 2, which does not change the minimizer.
Algorithms (Nonlinear Least Squares)
Iterative algorithms: starting at $\Theta^0$, update the solution:
$$\Theta^{n+1} = \Theta^n + \Delta\Theta^n$$
Gauss-Newton, with step length $\mu$, Hessian approximation $J^T J$ and gradient $J^T e$:
$$\Delta\Theta = -\mu \left[J^T(\Theta) J(\Theta)\right]^{-1} J(\Theta)^T e(\Theta)$$
Levenberg-Marquardt, with Hessian approximation $J^T J + \lambda D$ and gradient $J^T e$:
$$\Delta\Theta = -\left[J^T(\Theta) J(\Theta) + \lambda D\right]^{-1} J(\Theta)^T e(\Theta)$$
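Both updates can be written as one function for a two-parameter problem. The sketch below is an illustration only: it assumes $D = I$ (the slides do not specify $D$), solves the 2x2 system in closed form, and uses a hypothetical linear residual $e(\Theta) = A\Theta - b$ so that the result is easy to check by hand.

```python
def normal_matrices(J, e):
    """Return the entries of J^T J (2x2, symmetric) and J^T e (length 2)."""
    a11 = sum(r[0] * r[0] for r in J)
    a12 = sum(r[0] * r[1] for r in J)
    a22 = sum(r[1] * r[1] for r in J)
    g1 = sum(r[0] * ei for r, ei in zip(J, e))
    g2 = sum(r[1] * ei for r, ei in zip(J, e))
    return (a11, a12, a22), (g1, g2)


def step(J, e, mu=1.0, lam=0.0):
    """Delta = -mu * (J^T J + lam * I)^(-1) J^T e, with D = I assumed.

    lam = 0 gives a Gauss-Newton step with step length mu;
    lam > 0 with mu = 1 gives a Levenberg-Marquardt step.
    """
    (a11, a12, a22), (g1, g2) = normal_matrices(J, e)
    a11, a22 = a11 + lam, a22 + lam
    det = a11 * a22 - a12 * a12
    d1 = (a22 * g1 - a12 * g2) / det
    d2 = (a11 * g2 - a12 * g1) / det
    return (-mu * d1, -mu * d2)


# Hypothetical linear residual e(theta) = A @ theta - b, so J = A.
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [1.0, 2.0, 3.0]
theta0 = (0.0, 0.0)
e0 = [r[0] * theta0[0] + r[1] * theta0[1] - bi for r, bi in zip(A, b)]

gn = step(A, e0)            # Gauss-Newton step
lm = step(A, e0, lam=1.0)   # Levenberg-Marquardt step
print(gn, lm)
```

For a linear residual the Gauss-Newton step with $\mu = 1$ reaches the least-squares solution in one iteration, while $\lambda > 0$ shortens the step; this damping is what lets Levenberg-Marquardt fall back to a cautious update when the full step would increase the cost.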
“Parallel Training Considered Harmful?”
Parallel vs Series-parallel Training (“Parallel Training Considered Harmful?”)
Parallel training ⇒ NOE model;
Series-parallel training ⇒ NARX model.
Literature Review (“Parallel Training Considered Harmful?”)
Alleged advantages of series-parallel training. Series-parallel training is to be preferred [Narendra and Parthasarathy, 1990] because of:
1. Bounded signals;
2. Smaller computational cost;
3. Simulated output should tend to the real one, therefore the results should not be significantly different;
4. More accurate inputs to the neural network during training.
* Ribeiro, A. H., and Aguirre, L. A. (2017). "Parallel Training Considered Harmful?": Comparing Series-Parallel and Parallel Feedforward Network Training. arXiv preprint arXiv:1706.07119.
References (“Parallel Training Considered Harmful?”)
Narendra, K. S. and Parthasarathy, K. (1990). Identification and control of dynamical systems using neural networks. IEEE Transactions on Neural Networks, 1(1):4–27.
Zhang, D.-y., Sun, L.-p., and Cao, J. (2006). Modeling of temperature-humidity for wood drying based on time-delay neural network. Journal of Forestry Research, 17(2):141–144.
Singh, M., Singh, I., and Verma, A. (2013). Identification on non linear series-parallel model using neural network. MIT Int. J. Electr. Instrumen. Eng, 3(1):21–23.
Beale, M. H., Hagan, M. T., and Demuth, H. B. (2017). Neural network toolbox for use with MATLAB. Technical report, Mathworks.
Diaconescu, E. (2008). The use of NARX neural networks to predict chaotic time series. WSEAS Transactions on Computer Research, 3(3):182–191.
Saad, M., Bigras, P., Dessaint, L.-A., and Al-Haddad, K. (1994). Adaptive robot control using neural networks. IEEE Transactions on Industrial Electronics, 41(2):173–181.
Saggar, M., Meriçli, T., Andoni, S., and Miikkulainen, R. (2007). System identification for the Hodgkin-Huxley model using artificial neural networks. In Neural Networks, 2007. IJCNN 2007. International Joint Conference on, pages 2239–2244. IEEE.
Warwick, K. and Craddock, R. (1996). An introduction to radial basis functions for system identification. A comparison with other neural network methods. In Decision and Control, 1996., Proceedings of the 35th IEEE Conference on, volume 1, pages 464–469. IEEE.
Kamiński, W., Strumiłło, P., and Tomczak, E. (1996). Genetic algorithms and artificial neural networks for description of thermal deterioration processes. Drying Technology, 14(9):2117–2133.
Rahman, M. F., Devanathan, R., and Kuanyi, Z. (2000). Neural network approach for linearizing control of nonlinear process plants. IEEE Transactions on Industrial Electronics, 47(2):470–477.
Petrović, E., Ćojbašić, Ž., Ristić-Durrant, D., Nikolić, V., Ćirić, I., and Matić, S. (2013). Kalman filter and NARX neural network for robot vision based human tracking. Facta Universitatis, Series: Automatic Control And Robotics, 12(1):43–51.
Tijani, I. B., Akmeliawati, R., Legowo, A., and Budiyono, A. (2014). Nonlinear identification of a small scale unmanned helicopter using optimized NARX network with multiobjective differential evolution. Engineering Applications of Artificial Intelligence, 33:99–115.
Khan, E. A., Elgamal, M. A., and Shaarawy, S. M. (2015). Forecasting the number of muslim pilgrims using NARX neural networks with a comparison study with other modern methods. British Journal of Mathematics & Computer Science, 6(5):394.
Dynamic Systems Present During Identification (Parallel Training and Unbounded Signals)
The following dynamic systems are present during the system identification procedure:
1. True system;
2. Predictor;
3. Estimated model.
Feedforward Network (Neural Network Training)
Figure: Three-layer feedforward network.
Computational Cost per Stage (Complexity Analysis)
Table: Complexity analysis.
Stage (Levenberg-Marquardt) | Series-parallel | Parallel
Compute error vector $e$ | $O(N \cdot N_w)$ | $O(N \cdot N_w)$
Compute Jacobian matrix $J$ | $O(N \cdot N_w \cdot N_y)$ | $O(N \cdot N_\Theta \cdot N_y^2)$
Parameter update | $O(N \cdot N_\Theta^2 + N_\Theta^3)$ | $O(N \cdot N_\Theta^2 + N_\Theta^3)$
Parameter update formula:
$$\Delta\Theta = -\left[J^T J + \lambda D\right]^{-1} J^T e.$$
Relation: $N_y < N_y^2 < N_w \approx N_\Theta$.
Computer Generated Example (Comparing Parallel and Series-parallel Models)
Problem statement: generate data using the following system [Chen et al., 1990]:
$$y^*[k] = \left(0.8 - 0.5\exp(-y^*[k-1]^2)\right) y^*[k-1] - \left(0.3 + 0.9\exp(-y^*[k-1]^2)\right) y^*[k-2] + u[k-1] + 0.2\,u[k-2] + 0.1\,u[k-1]\,u[k-2] + v[k],$$
$$y[k] = y^*[k] + w[k].$$
10 nodes in the hidden layer;
800 samples for identification and 200 samples for validation;
Compare error in validation window.
Chen, S., Billings, S. A., and Grant, P. M. (1990). Non-linear system identification using neural networks. International Journal of Control, 51(6):1191–1214.
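The data-generation step can be sketched as follows. The slide does not state the input signal, the initial conditions or the random seed, so the zero-mean unit-variance Gaussian input, the zero initial conditions and the function name `chen_system` are assumptions of this sketch. The noise standard deviations `sigma_v` and `sigma_w` correspond to $v[k]$ and $w[k]$.

```python
import math
import random


def chen_system(n_samples, sigma_v=0.0, sigma_w=0.0, seed=0):
    """Generate (u, y_star, y) from the benchmark system of Chen et al. (1990)."""
    rng = random.Random(seed)
    u = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]  # assumed input signal
    y_star = [0.0, 0.0]                                  # assumed initial conditions
    for k in range(2, n_samples):
        g = math.exp(-y_star[k - 1] ** 2)
        y_star.append(
            (0.8 - 0.5 * g) * y_star[k - 1]
            - (0.3 + 0.9 * g) * y_star[k - 2]
            + u[k - 1] + 0.2 * u[k - 2] + 0.1 * u[k - 1] * u[k - 2]
            + rng.gauss(0.0, sigma_v)                    # equation noise v[k]
        )
    y = [ys + rng.gauss(0.0, sigma_w) for ys in y_star]  # output noise w[k]
    return u, y_star, y


u, y_star, y = chen_system(1000, sigma_v=0.1, sigma_w=0.1)
u_id, y_id = u[:800], y[:800]      # identification window
u_val, y_val = u[800:], y[800:]    # validation window
```

Setting `sigma_v = 0` gives pure output-error noise and `sigma_w = 0` gives pure equation noise, the two cases compared in the MSE experiment.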
Figure: MSE (mean square error) vs noise levels on the validation window for parallel training (•) and series-parallel training (×). Panels: (a) MSE vs $\sigma_w$ with $\sigma_v = 0$; (b) MSE vs $\sigma_v$ with $\sigma_w = 0$. Both axes range from 0.0 to 1.0.
Table: Running time.
N hidden | N | Parallel training | Series-parallel training
10 | 1000 samples | 3.7 s | 3.1 s
30 | 1000 samples | 6.4 s | 5.7 s
10 | 5000 samples | 14.6 s | 11.0 s
30 | 5000 samples | 18.5 s | 17.5 s
Figure: Sum of squared simulation errors $\|e_s\|^2$ per epoch for Levenberg-Marquardt (LM), conjugate gradient (CG), and BFGS. Vertical axis: $\|e_s\|^2$ on a logarithmic scale from $10^2$ to $10^4$; horizontal axis: epochs 0 to 100.
Optimization Methods and Unboundedness
Gradient Descent Applied to Linear System (Optimization Methods and Unboundedness)
First-order linear system:
$$\hat{y}[k] = \theta_1 \hat{y}[k-1] + \theta_2 u[k-1]$$
Figure: Set of parameters $(\theta_1, \theta_2)$ that yield a bounded solution $\hat{y}[k]$.
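A short simulation makes the boundedness region concrete. For a bounded input, the recursion stays bounded when $|\theta_1| < 1$ and grows geometrically when $|\theta_1| > 1$ (with $\theta_2 \neq 0$). The step input, the parameter values and the function name are assumptions chosen for illustration.

```python
def simulate_first_order(theta1, theta2, u, y0=0.0):
    """Free-run simulation of y_hat[k] = theta1 * y_hat[k-1] + theta2 * u[k-1]."""
    y_hat = [y0]
    for k in range(1, len(u) + 1):
        y_hat.append(theta1 * y_hat[k - 1] + theta2 * u[k - 1])
    return y_hat


u = [1.0] * 200  # bounded step input (assumed)
stable = simulate_first_order(0.9, 1.0, u)
unstable = simulate_first_order(1.1, 1.0, u)

# With theta1 = 0.9 the output approaches theta2 / (1 - theta1) = 10 from below.
print(max(abs(v) for v in stable))
# With theta1 = 1.1 the output grows geometrically with k.
print(max(abs(v) for v in unstable))
```

During parallel training an optimizer can step from the first region into the second, at which point the simulated output, and therefore the cost, becomes extremely large.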
Class of Algorithms that can cope with Unboundedness (Optimization Methods and Unboundedness)
Trust-region methods;
Levenberg-Marquardt;
Backtracking line search;
Pattern search.
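What these methods share is that a trial point is accepted only if it actually reduces the cost, so a step into the unbounded region is rejected and retried with a smaller size. The sketch below shows this with a simplified backtracking rule: it only requires a finite, decreasing cost rather than the usual sufficient-decrease (Armijo) condition. The first-order model, the data and the deliberately oversized search direction are assumptions for illustration.

```python
import math


def simulation_cost(theta, u, y):
    """Sum of squared free-run simulation errors for
    y_hat[k] = theta1 * y_hat[k-1] + theta2 * u[k-1], with y_hat[0] = 0."""
    y_hat, cost = 0.0, 0.0
    for k in range(1, len(y)):
        y_hat = theta[0] * y_hat + theta[1] * u[k - 1]
        err = y_hat - y[k]
        cost += err * err  # float multiplication overflows to inf, no exception
    return cost


def backtracking_update(cost, theta, direction, step=1.0, shrink=0.5, max_halvings=30):
    """Shrink the step until the trial cost is finite and lower than the current one."""
    v0 = cost(theta)
    for _ in range(max_halvings):
        candidate = [t + step * d for t, d in zip(theta, direction)]
        v = cost(candidate)
        if math.isfinite(v) and v < v0:
            return candidate
        step *= shrink
    return list(theta)  # no acceptable step found: keep the current point


# Data from an assumed true system with theta = (0.5, 1.0) and a step input.
u = [1.0] * 200
y = [0.0]
for k in range(1, len(u)):
    y.append(0.5 * y[k - 1] + 1.0 * u[k - 1])


def cost(theta):
    return simulation_cost(theta, u, y)


theta0 = [0.4, 1.0]
theta1 = backtracking_update(cost, theta0, direction=[10.0, 0.0])
print(theta1, cost(theta0), cost(theta1))
```

A fixed-step gradient method has no such safeguard: once an update lands outside the bounded region, the next gradient is computed from an exploding simulation.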
Multiple Shooting
Motivation (Shooting Methods for Parameter Estimation of Output Error Models)
Multiple shooting
Applications:
1. Boundary value problems;
2. ODE parameter estimation;
3. Optimal control.
Escape local minima;
Better numerical stability;
Can be implemented in parallel.
Ribeiro, A. H., and Aguirre, L. A. (2017). Shooting methods for Parameter Estimation of Output Error Models. IFAC World Congress (Toulouse, France, 2017).
Single Shooting (Shooting Methods for Parameter Estimation of Output Error Models)
Estimate the NOE model by solving:
$$\min_{\Theta} \|e_s\|^2$$
Figure: The initial conditions are represented with circles and subsequent simulated values with diamonds.