Here is a brief description of my research. All the references can be found on the publication page.

BAMCAFE: Bayesian machine learning advanced forecast ensemble method

Ensemble forecasting based on physics-informed models is one of the most widely used forecast approaches for complex turbulent systems. A major difficulty with such methods is model error, which is ubiquitous in practice. Data-driven machine learning (ML) forecasts can reduce the model error, but they often suffer from partial and noisy observations. Here, a simple but effective method, BAMCAFE, is developed, which combines an available imperfect physics-informed model with data assimilation (DA) to facilitate the ML ensemble forecast. The method also quantifies the forecast uncertainty in the form of non-Gaussian distributions represented by a Gaussian mixture. Forecasting the entire distribution with uncertainty quantification is of practical importance for many turbulence and geophysical applications.

Reference: Chen, Nan, and Yingda Li. BAMCAFE: A Bayesian machine learning advanced forecast ensemble method for complex turbulent systems with partial observations. Chaos: An Interdisciplinary Journal of Nonlinear Science 31.11 (2021): 113114.

A Multiscale Model for El Niño Complexity

El Niño-Southern Oscillation (ENSO) exhibits diverse characteristics in spatial pattern, peak intensity, and temporal evolution. Here we develop a three-region multiscale stochastic model to show that the observed ENSO complexity can be explained by combining intraseasonal, interannual, and decadal processes. The model starts with a deterministic three-region system for the interannual variability. Two stochastic processes, representing intraseasonal and decadal variations, are then incorporated. The model reproduces not only the general properties of the observed ENSO events but also their complexity in pattern (e.g., Central Pacific vs. Eastern Pacific events), intensity (e.g., the 10-20 year recurrence of extreme El Niños), and temporal evolution (e.g., more multi-year La Niñas than multi-year El Niños). While conventional conceptual models have typically been used to understand the dynamics behind the common properties of ENSO, this model offers a powerful tool to understand and predict ENSO complexity, which challenges our understanding of ENSO in the 21st century.


Modeling and Assimilating Sea Ice Using Sea Ice Floes

Sea ice is a complex medium composed of discrete interacting elements (floes) of various sizes and thicknesses, and at sufficiently small length scales it cannot be approximated as a continuous medium, as is routinely done at large scales. While Eulerian data assimilation is a relatively mature field, techniques for assimilating satellite-derived Lagrangian trajectories of sea ice floes remain poorly explored. In a series of works, we developed simple discrete element method (DEM) models (and used more complicated versions from our collaborators), together with new efficient Lagrangian data assimilation schemes for recovering the unobserved ocean field, dynamically interpolating missing floe trajectories, estimating parameters, and building superfloe parameterizations of sea ice. This work should be of great interest not only to the sea ice community but also to many other areas of computational and applied mathematics.


Chen, Nan, Shubin Fu, and Georgy Manucharyan. "Lagrangian Data Assimilation and Parameter Estimation of an Idealized Sea Ice Discrete Element Model." Journal of Advances in Modeling Earth Systems, 13.10 (2021): e2021MS002513.

Chen, Nan, Shubin Fu, and Georgy E. Manucharyan. "An Efficient and Statistically Accurate Lagrangian Data Assimilation Algorithm with Applications to Discrete Element Sea Ice Models." Journal of Computational Physics, (2022), published online.

Chen, Nan, Quanling Deng, and Samuel N. Stechmann. "Lagrangian Data Assimilation and Uncertainty Quantification for Sea Ice Floes with an Efficient Physics-Constrained Superfloe Parameterization." SIAM/ASA Journal of Uncertainty Quantification, (2021), under revision.

Intracounty modeling of COVID-19 infection with human mobility

The COVID-19 pandemic is a global threat presenting health, economic, and social challenges that continue to escalate. Metapopulation epidemic modeling studies in the susceptible-exposed-infectious-removed (SEIR) style have played important roles in informing public health policymaking to mitigate the spread of COVID-19. These models typically rely on a key assumption of population homogeneity. This assumption certainly cannot be expected to hold in real situations; various geographic, socioeconomic, and cultural environments affect the behaviors that drive the spread of COVID-19 in different communities. Moreover, variation in intracounty environments creates spatial heterogeneity of transmission across regions. To address this issue, we develop a human mobility flow-augmented stochastic SEIR-style epidemic modeling framework with the ability to distinguish different regions and their corresponding behaviors. This modeling framework is then combined with data assimilation and machine learning techniques to reconstruct the historical growth trajectories of COVID-19 confirmed cases in two counties in Wisconsin. The associations between the spread of COVID-19 and business foot traffic, race and ethnicity, and age structure are then investigated. The results reveal that, in a college town (Dane County), the most important heterogeneity is age structure, while in a large city area (Milwaukee County), racial and ethnic heterogeneity becomes more apparent. Scenario studies further indicate a strong response of the spread rate to various reopening policies, which suggests that policymakers may need to take these heterogeneities into account very carefully when designing policies for mitigating the ongoing spread of COVID-19 and for reopening.

(I want to highlight that the stochastic parameterization tools I have used and developed in many other works play a crucial role in this model!)
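To make the metapopulation idea concrete, here is a minimal sketch, assuming a generic two-region chain-binomial stochastic SEIR model in which a hypothetical mobility matrix `M` (fraction of time residents of region i spend in region j) mixes infectious pressure across regions. The function name, rates, populations, and matrix are illustrative placeholders; this is not the model of Hou et al., which additionally uses data assimilation, machine learning, and real foot-traffic data.

```python
import numpy as np

def seir_mobility(N, M, beta, sigma, gamma, E0, days, seed=0):
    """Chain-binomial stochastic SEIR on regions coupled by a mobility matrix.

    N[i]: residents of region i; M[i, j]: fraction of time residents of i spend
    in region j (rows sum to 1); beta[j]: transmission rate in region j (1/day);
    sigma, gamma: E->I and I->R rates (1/day). All inputs are hypothetical.
    """
    rng = np.random.default_rng(seed)
    S, E = N - E0, E0.copy()
    I, R = np.zeros_like(N), np.zeros_like(N)
    history = []
    for _ in range(days):
        present = M.T @ N                    # people physically present in each region
        infectious_present = M.T @ I         # infectious people present in each region
        # Force of infection on residents of i, weighted by where they spend time
        lam = M @ (beta * infectious_present / present)
        new_E = rng.binomial(S, 1.0 - np.exp(-lam))
        new_I = rng.binomial(E, 1.0 - np.exp(-sigma))
        new_R = rng.binomial(I, 1.0 - np.exp(-gamma))
        S, E = S - new_E, E + new_E - new_I
        I, R = I + new_I - new_R, R + new_R
        history.append(np.stack([S, E, I, R]))
    return np.array(history)   # shape (days, 4, n_regions)

N = np.array([50_000, 200_000])
M = np.array([[0.8, 0.2],
              [0.1, 0.9]])
hist = seir_mobility(N, M, beta=np.array([0.4, 0.3]), sigma=1 / 5, gamma=1 / 7,
                     E0=np.array([10, 0]), days=120)
print(N - hist[-1, 0])   # cumulative infections per region after 120 days
```

Region-specific behavior enters only through `beta` and `M`, so scenario studies (e.g., a reopening policy) amount to changing those arrays over time.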


Hou, Xiao, et al. "Intracounty modeling of COVID-19 infection with human mobility: Assessing spatial heterogeneity with business traffic, age, and race." Proceedings of the National Academy of Sciences 118.24 (2021).

Efficient statistically accurate algorithms for solving high-dimensional Fokker-Planck equation

Solving the Fokker-Planck equation for high-dimensional complex turbulent dynamical systems with highly intermittent non-Gaussian features is an important practical problem. We have developed efficient statistically accurate algorithms for computing both the transient and the equilibrium solutions of the Fokker-Planck equations associated with high-dimensional nonlinear dynamical systems with conditional Gaussian structures. These systems are highly nonlinear and have strong non-Gaussian features, such as intermittency and rare/extreme events. The algorithms adopt a hybrid strategy: an extremely efficient parametric method based on data assimilation in the large-dimensional part of the phase space is combined with a kernel method in the small-dimensional part.
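A minimal sketch of the hybrid strategy, assuming a hypothetical two-variable conditional Gaussian model in which `u1` stands for the small-dimensional variables (handled by a kernel density estimate) and `u2` for the large-dimensional ones (handled parametrically). Both are scalars here, so this only shows the mechanics, not the gain in high dimensions. The model coefficients, sample size `L`, Euler time stepping, and Silverman bandwidth are illustrative choices, not those of the published algorithms.

```python
import numpy as np

# Hypothetical conditional Gaussian toy model (u1 plays the role of u_I, u2 of u_II):
#   du1 = (-u1 + u2) dt + S1 dW1            (linear in u2)
#   du2 = -(1 + 0.5 u1^2) u2 dt + S2 dW2    (linear in u2, coefficient depends on u1)
S1, S2 = 0.5, 1.0

def simulate_and_filter(L=200, t_end=5.0, dt=1e-3, seed=0):
    """L Monte Carlo paths; along each, integrate the closed-form conditional
    mean mu and variance R of u2 given the path of u1 (Euler discretization)."""
    rng = np.random.default_rng(seed)
    u1, u2 = np.zeros(L), np.zeros(L)
    mu, R = np.zeros(L), np.full(L, 0.1)
    for _ in range(int(t_end / dt)):
        a1 = -(1.0 + 0.5 * u1**2)
        du1 = (-u1 + u2) * dt + S1 * np.sqrt(dt) * rng.standard_normal(L)
        du2 = a1 * u2 * dt + S2 * np.sqrt(dt) * rng.standard_normal(L)
        innovation = du1 - (-u1 + mu) * dt           # observed increment minus its prediction
        mu = mu + a1 * mu * dt + (R / S1**2) * innovation
        R = R + (2 * a1 * R + S2**2 - R**2 / S1**2) * dt
        u1, u2 = u1 + du1, u2 + du2
    return u1, mu, R

def kde_pdf(samples, x):
    """Kernel part: Gaussian KDE (Silverman bandwidth) for the small-dimensional u1."""
    h = 1.06 * samples.std() * len(samples) ** (-0.2)
    z = (x[:, None] - samples[None, :]) / h
    return np.exp(-0.5 * z**2).sum(axis=1) / (len(samples) * h * np.sqrt(2 * np.pi))

def mixture_pdf(mu, R, y):
    """Parametric part: average of the L conditional Gaussians N(mu_i, R_i) for u2."""
    z2 = (y[:, None] - mu[None, :]) ** 2 / R[None, :]
    return (np.exp(-0.5 * z2) / np.sqrt(2 * np.pi * R[None, :])).mean(axis=1)

u1, mu, R = simulate_and_filter()
grid = np.linspace(-4.0, 4.0, 801)
p_u1, p_u2 = kde_pdf(u1, grid), mixture_pdf(mu, R, grid)
```

Each sample contributes a full conditional Gaussian to the `u2` density rather than a single point, which is why a modest `L` can already give a smooth estimate there.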

Both numerical tests and rigorous analysis demonstrate that the efficient statistically accurate algorithms are able to overcome the curse of dimensionality. It is also shown with mathematical rigor that the algorithms are robust over long times, provided that the system is controllable and stochastically stable.

The simplest version of our method can handle systems of dimension O(10) using only L = O(100) sample trajectories (left panel below). With a judicious block decomposition (and statistical symmetry, if applicable), we are able to extend the method to systems of much larger dimension, e.g., O(1000) or more (right panel below).

These algorithms are very useful for studying prediction, extreme events, and causality.





Simple stochastic dynamical models capturing statistical diversity of El Niño Southern Oscillation

The El Niño Southern Oscillation (ENSO) has a significant impact on global climate and seasonal prediction. It is also related to global warming.

ENSO consists of a cycle of anomalously warm El Niño conditions and cold La Niña conditions, with considerable irregularity in amplitude, duration, temporal evolution, and spatial structure. The traditional El Niño involves anomalously warm sea surface temperature (SST) in the equatorial eastern Pacific Ocean. In recent decades, the central Pacific (CP) El Niño, which is characterized by warm SST anomalies confined to the central Pacific, has been frequently observed.

Figure on the right: El Niño and global warming. Is the global warming hiatus due to the CP El Niño?

In a series of PNAS papers, we developed a simple modeling framework that captures El Niño diversity. The framework includes a dynamical model and a stochastic parameterization of the wind bursts, including both easterly and westerly wind bursts. The noise is state-dependent (multiplicative), which allows for asymmetry and non-Gaussianity in the ENSO signals. Seasonal synchronization is also included in the model.
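To illustrate why state-dependent noise produces asymmetric, non-Gaussian statistics, the sketch below compares a toy scalar SDE for a hypothetical SST anomaly `T` driven by additive noise with the same SDE driven by noise whose amplitude grows for warm anomalies. It is not the PNAS model (there is no ocean-atmosphere coupling, wind-burst process, or seasonal cycle here); the tanh-shaped amplitude and all parameters are assumptions.

```python
import numpy as np

def simulate(sigma_fn, lam=1.0, n_paths=4000, t_end=20.0, dt=0.01, seed=1):
    """Euler-Maruyama (Ito) ensemble for the toy SDE dT = -lam*T dt + sigma_fn(T) dW."""
    rng = np.random.default_rng(seed)
    T = np.zeros(n_paths)
    for _ in range(int(t_end / dt)):
        T = T - lam * T * dt + sigma_fn(T) * np.sqrt(dt) * rng.standard_normal(n_paths)
    return T

def skewness(x):
    z = x - x.mean()
    return (z**3).mean() / (z**2).mean() ** 1.5

# Additive noise: constant amplitude
additive = simulate(lambda T: 0.5 * np.ones_like(T))
# State-dependent noise: stronger bursts when the anomaly is warm (T > 0), weaker when cold
multiplicative = simulate(lambda T: 0.5 * (1.0 + 0.8 * np.tanh(T)))
print(skewness(additive), skewness(multiplicative))
```

With additive noise the sample skewness stays close to zero, while the state-dependent amplitude yields a clearly positive skewness, i.e., a longer warm tail, even though the deterministic part is the same linear damping.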

- The model simulation is compared with observations via reanalysis data. The non-Gaussian statistics in nature are well captured by the simple model.

- The mechanisms of each type of El Niño, as well as La Niña, are studied using the model.

- The observed episode during the 1990s is also successfully reproduced, where a series of 5-year CP El Niños is followed by a super El Niño and then a La Niña.

- The model is applied to understand and predict the 2014-2016 delayed super El Niño.

(Model simulations. Left: Starting from the same initial conditions, stochastic wind bursts lead to (A) a delayed super El Niño, (B) a direct super El Niño, and (C) a moderate El Niño. Here (A) and (B) mimic the 2014-2016 and 1997-1998 events, respectively. Middle: Ensemble forecast tests. Starting from the same condition favorable for El Niño, about 20% of the events become delayed ones. Right: El Niño diversity. The observed episode during the 1990s is recovered, where a series of 5-year CP El Niños is followed by a super El Niño and then a La Niña.)





Predicting the large-scale Madden-Julian Oscillation (MJO) through physics-constrained low-order nonlinear stochastic models

The dominant mode of tropical intraseasonal variability is the Madden-Julian Oscillation (MJO), a slow-moving, planetary-scale envelope of convection that typically propagates eastward from the Indian Ocean through the Western Pacific. The MJO affects tropical precipitation, the frequency of tropical cyclones, and extratropical weather patterns. Understanding and predicting the MJO is a central problem in contemporary meteorology with large societal impacts.

The prediction of large-scale MJO is achieved in two steps:

Step 1. A recently developed nonlinear time series technique, Nonlinear Laplacian Spectral Analysis (NLSA), is applied to cloudiness data (with ~50,000 spatial dimensions and ~70,000 points in time) to define two spatial modes associated with the boreal winter MJO. NLSA requires no ad hoc detrending or spatiotemporal filtering of the full data set and captures both intermittency and low-frequency variability. The resulting time series for the two spatial modes of the MJO are highly intermittent, with large variations in amplitude from year to year. The two large-scale MJO-like cloud patterns coincide in time with the two boreal winter MJOs observed during TOGA-COARE in 1992-1993 (see the movie below).
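The two ingredients at the heart of this step, lagged embedding and eigenfunctions of a kernel-based Markov operator on the embedded data, can be sketched as follows. This is a simplified, diffusion-map-style stand-in: it uses a plain Gaussian kernel with a median bandwidth rather than the phase-velocity-scaled kernel of NLSA, and the synthetic traveling-wave data and function names are hypothetical.

```python
import numpy as np

def lagged_embedding(X, q):
    """Delay-embed snapshots X (shape n_time x n_space) with q lags."""
    n = X.shape[0] - q + 1
    return np.hstack([X[i:i + n] for i in range(q)])

def kernel_eigenfunctions(Y, n_modes=3):
    """Leading eigenpairs of the Markov-normalized Gaussian kernel on the rows of Y."""
    sq = np.sum(Y**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * Y @ Y.T, 0.0)
    K = np.exp(-d2 / np.median(d2))            # bandwidth: median squared distance
    deg = K.sum(axis=1)
    S = K / np.sqrt(np.outer(deg, deg))        # symmetric matrix similar to D^-1 K
    vals, vecs = np.linalg.eigh(S)
    order = np.argsort(vals)[::-1][:n_modes]
    phi = vecs[:, order] / np.sqrt(deg)[:, None]   # right eigenvectors of D^-1 K
    return vals[order], phi

# Hypothetical data: a noisy eastward-propagating wave on 20 grid points
rng = np.random.default_rng(0)
t = 0.1 * np.arange(600)
x = np.linspace(0.0, 2 * np.pi, 20, endpoint=False)
X = np.cos(x[None, :] - t[:, None]) + 0.3 * rng.standard_normal((600, 20))
vals, phi = kernel_eigenfunctions(lagged_embedding(X, q=10))
# phi[:, 0] is constant; phi[:, 1] and phi[:, 2] are expected to form an oscillating pair
```

A dense n-by-n kernel is fine for a few hundred samples; at the scale of the cloudiness data (~70,000 time points), a sparse nearest-neighbour kernel would be needed.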

Step 2. Physics-constrained nonlinear low-order stochastic models are developed.


The model contains two observed MJO variables and two hidden variables that characterize the strong intermittency and random phases of the MJO indices. The model involves correlated multiplicative noise defined through energy-conserving nonlinear interactions. The model simulations capture the non-Gaussian features of the observations almost perfectly.
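The structure described above can be written out as in the sketch below, which assumes the commonly used form of this model class: observed variables `u1`, `u2` and hidden variables `v` (modulating the amplitude, i.e., intermittency) and `w` (modulating the phase), where `v` and `w` enter the `u` equations multiplicatively. The parameter values are hypothetical placeholders, not fitted values. The quadratic terms satisfy `x @ nonlinear(x) == 0`, which is the energy-conserving property mentioned in the text.

```python
import numpy as np

# Hypothetical parameter values (placeholders, not fitted to MJO indices)
P = dict(d_u=0.9, d_v=0.5, d_w=0.5, gamma=1.5, a=2.0, w_hat=0.0,
         s_u=0.3, s_v=1.0, s_w=0.5)

def nonlinear(x, p=P):
    """Quadratic interactions: v modulates the amplitude and w the phase of (u1, u2)."""
    u1, u2, v, w = x
    g = p["gamma"]
    return np.array([g * v * u1 - w * u2,
                     g * v * u2 + w * u1,
                     -g * (u1**2 + u2**2),
                     0.0])

def linear(x, p=P):
    """Damping, mean rotation frequency a, and constant forcing w_hat."""
    u1, u2, v, w = x
    return np.array([-p["d_u"] * u1 - p["a"] * u2,
                     -p["d_u"] * u2 + p["a"] * u1,
                     -p["d_v"] * v,
                     -p["d_w"] * w + p["w_hat"]])

def simulate(n_steps=20_000, dt=1e-3, seed=0, p=P):
    """Euler-Maruyama integration; additive white noise enters every variable."""
    rng = np.random.default_rng(seed)
    sig = np.array([p["s_u"], p["s_u"], p["s_v"], p["s_w"]])
    noise = rng.standard_normal((n_steps, 4))
    x = np.array([1.0, 0.0, 0.0, 0.0])
    out = np.empty((n_steps, 4))
    for n in range(n_steps):
        x = x + (linear(x, p) + nonlinear(x, p)) * dt + sig * np.sqrt(dt) * noise[n]
        out[n] = x
    return out

traj = simulate()
```

Because the nonlinear terms do no net work, the energy budget is set by damping and forcing alone, which keeps long simulations bounded. Given the observed `u1`, `u2`, the equations are linear in `v` and `w`, which is the kind of special structure that makes efficient data assimilation of the hidden variables possible.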

The special structure of the model allows an efficient data assimilation algorithm to determine the initial values of the two hidden variables, which facilitates the ensemble prediction scheme. The skillful prediction results extend the forecast range achievable with low-order models and determine the predictability limits of the MJO indices. In addition to the ensemble mean prediction, the ensemble spread is an accurate indicator of forecast uncertainty at long lead times.

The framework has also been applied to predicting the large-scale features of the monsoon. Recently, we also developed an effective and practical spatiotemporal reconstruction algorithm, which overcomes a difficulty shared by most data decomposition techniques based on lagged embedding: they require extra information from the future, beyond the predicted range of the indices. The predicted spatiotemporal patterns often have skill comparable to that of the indices.






Noisy Lagrangian tracers in filtering geophysical flows

Lagrangian tracers are drifters and floaters that follow the motion of fluid parcels. Data assimilation with Lagrangian tracers is an important inverse problem that aims at recovering the underlying velocity field from tracer observations. Combining the information in the underlying dynamics with the observations serves to reduce error and uncertainty.

Due to the complexity and highly nonlinear nature of Lagrangian data assimilation, there had been little systematic analysis based on rigorous theory. Recently, we developed an analytically tractable nonlinear filtering framework for Lagrangian data assimilation, which allows the study of random incompressible/compressible flow fields with full mathematical rigor.

We aim at answering the following questions:

1. What is the information gain as a function of the number of tracers?
2. How to design cheap practical strategies for multiscale and turbulent systems?
3. How to quantify the model error in various practical reduced filters?

Despite the inherent nonlinearity in the measurements, we derive exact closed analytic formulas for the optimal filter of the velocity field. In addition to proving a mean field limit at long times, we show with rigorous mathematical theory that reducing the uncertainty by a fixed amount requires an exponential increase in the number of tracers, which indicates a practical information barrier.
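A one-dimensional periodic toy version of this setting (all parameters and names hypothetical) shows why the problem is tractable: the tracer equation dx_l = u(x_l, t) dt + σ_x dB_l is nonlinear in the positions but linear in the Fourier coefficients of u, so the posterior of those coefficients given the tracer paths is Gaussian and follows closed-form Kalman-Bucy-type equations. The 1D flow here is compressible, and the sketch only illustrates how the posterior covariance shrinks as tracers are added; it does not reproduce the scaling results.

```python
import numpy as np

K = 3                        # hypothetical number of Fourier wavenumbers
D, S_U, S_X = 0.5, 0.5, 0.2  # damping, velocity-mode noise, tracer noise (hypothetical)

def features(x):
    """Row l is [cos(k x_l), sin(k x_l)] for k = 1..K, so u(x_l) = features(x) @ u_hat."""
    k = np.arange(1, K + 1)
    return np.hstack([np.cos(np.outer(x, k)), np.sin(np.outer(x, k))])

def lagrangian_filter(n_tracers, t_end=10.0, dt=1e-3, seed=0):
    """Kalman-Bucy filter for the Fourier coefficients given tracer paths (Euler steps)."""
    rng = np.random.default_rng(seed)
    n = 2 * K
    u_hat = S_U / np.sqrt(2 * D) * rng.standard_normal(n)   # true coefficients, drawn from equilibrium
    x = rng.uniform(0.0, 2 * np.pi, n_tracers)              # tracer positions
    mu, R = np.zeros(n), (S_U**2 / (2 * D)) * np.eye(n)      # prior = equilibrium statistics
    for _ in range(int(t_end / dt)):
        H = features(x)
        dx = H @ u_hat * dt + S_X * np.sqrt(dt) * rng.standard_normal(n_tracers)
        gain = R @ H.T / S_X**2
        mu = mu - D * mu * dt + gain @ (dx - H @ mu * dt)
        R = R + (-2 * D * R + S_U**2 * np.eye(n) - gain @ H @ R) * dt
        u_hat = u_hat - D * u_hat * dt + S_U * np.sqrt(dt) * rng.standard_normal(n)
        x = (x + dx) % (2 * np.pi)
    return mu, R, u_hat

for L in (2, 8, 32):
    mu, R, u_true = lagrangian_filter(L)
    print(L, np.trace(R))    # posterior uncertainty of the velocity field
```

Quadrupling the number of tracers reduces the posterior trace by noticeably less than a factor of four, a small-scale glimpse of the diminishing returns discussed above.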

We also studied rotating shallow water models with multiscale features, where the slow modes represent random incompressible geostrophically balanced (GB) flows and the fast modes represent random rotating compressible gravity waves. Several computationally efficient reduced filters, motivated by mode reduction and 3D-VAR, are designed. Rigorous mathematical theory and numerical simulations show that all the filters have comparable skill in recovering the GB modes in the geophysical regime with small Rossby number.

We further studied the even more complicated situation with nonlinear coupling between GB and gravity modes. An information-theoretic framework is applied to quantify the uncertainty and model error in various reduced filters.






Data assimilation, state estimation and predicting conditional Gaussian systems with model error

Turbulent dynamical systems are ubiquitous in many disciplines of science and engineering. They are characterized by a high-dimensional phase space and a high-dimensional space of instabilities with positive Lyapunov exponents. Both understanding complex turbulent systems and improving initializations for prediction require filtering/data assimilation for accurate state estimation from noisy partial observations. Since the filtering skill for turbulent signals from nature is often limited by errors due to imperfect forecast models, coping with model error is an important issue. Many turbulent dynamical systems can be summarized as conditional Gaussian systems. Here is the general framework of conditional Gaussian turbulent systems:
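In the form commonly used for this class, the state is split into u_I and u_II, the dynamics are linear in u_II with coefficients that may depend nonlinearly on u_I, and, given a Gaussian initial distribution for u_II, the conditional statistics follow closed equations. The display below assumes for simplicity that the noises driving u_I and u_II are independent.

```latex
\begin{aligned}
d\mathbf{u}_{I} &= \big[\mathbf{A}_0(t,\mathbf{u}_I) + \mathbf{A}_1(t,\mathbf{u}_I)\,\mathbf{u}_{II}\big]\,dt + \boldsymbol{\Sigma}_I(t,\mathbf{u}_I)\,d\mathbf{W}_I,\\
d\mathbf{u}_{II} &= \big[\mathbf{a}_0(t,\mathbf{u}_I) + \mathbf{a}_1(t,\mathbf{u}_I)\,\mathbf{u}_{II}\big]\,dt + \boldsymbol{\Sigma}_{II}(t,\mathbf{u}_I)\,d\mathbf{W}_{II},\\[4pt]
p\big(\mathbf{u}_{II}(t)\,\big|\,\mathbf{u}_I(s),\ s\le t\big) &= \mathcal{N}\big(\bar{\mathbf{u}}_{II}(t),\ \mathbf{R}_{II}(t)\big),\\
d\bar{\mathbf{u}}_{II} &= (\mathbf{a}_0 + \mathbf{a}_1\bar{\mathbf{u}}_{II})\,dt + \mathbf{R}_{II}\mathbf{A}_1^{\top}(\boldsymbol{\Sigma}_I\boldsymbol{\Sigma}_I^{\top})^{-1}\big[d\mathbf{u}_I - (\mathbf{A}_0 + \mathbf{A}_1\bar{\mathbf{u}}_{II})\,dt\big],\\
d\mathbf{R}_{II} &= \big[\mathbf{a}_1\mathbf{R}_{II} + \mathbf{R}_{II}\mathbf{a}_1^{\top} + \boldsymbol{\Sigma}_{II}\boldsymbol{\Sigma}_{II}^{\top} - \mathbf{R}_{II}\mathbf{A}_1^{\top}(\boldsymbol{\Sigma}_I\boldsymbol{\Sigma}_I^{\top})^{-1}\mathbf{A}_1\mathbf{R}_{II}\big]\,dt.
\end{aligned}
```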

Despite the conditional Gaussianity, the coupled systems remain highly nonlinear and are able to capture the strong non-Gaussian features observed in nature [6]. One desirable feature of conditional Gaussian systems is that the conditional distribution of u_II given the trajectory of u_I has a closed analytical form.

We studied the model error in filtering conditional Gaussian systems. We showed that including energy-conserving nonlinear interactions in designing filters is necessary, even in the parameterizations of unresolved variables. We also justified the practical strategy of noise inflation and suggested avoiding underdispersion. In addition, by regarding u_II as parameters, the conditional Gaussian system can be used to understand the error in parameter estimation with rigorous mathematical justification. We showed that the parameter estimation skill can be greatly improved by using stochastic parameterized equations, especially when the system loses observability.
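A minimal sketch of "regarding u_II as parameters", assuming a hypothetical scalar model du = -θ u dt + σ dW with unknown damping θ: setting dθ = 0 turns (u, θ) into a conditional Gaussian system with u_I = u and u_II = θ, so the closed-form filter returns the posterior of θ directly from one observed path. The function name, prior, and parameter values are illustrative.

```python
import numpy as np

def estimate_theta(theta_true=1.0, sigma=0.5, t_end=500.0, dt=0.01, seed=3):
    """Posterior N(mu, R) of the unknown damping theta in du = -theta*u dt + sigma dW.

    theta is treated as u_II with d(theta) = 0, so the conditional Gaussian filter
    (A0 = 0, A1 = -u, a0 = a1 = 0, no noise in the theta equation) is exact up to
    the Euler time discretization used here.
    """
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(int(t_end / dt))
    u, mu, R = 0.0, 0.0, 1.0            # start at u = 0 with prior theta ~ N(0, 1)
    for z in noise:
        du = -theta_true * u * dt + sigma * np.sqrt(dt) * z
        A1 = -u                          # the u-equation drift is A1 * theta
        mu += R * A1 / sigma**2 * (du - A1 * mu * dt)
        R -= (R * A1) ** 2 / sigma**2 * dt
        u += du
    return mu, R

mu, R = estimate_theta()
print(mu, np.sqrt(R))    # posterior mean and standard deviation of theta
```

Replacing dθ = 0 by a stochastic equation for θ (a stochastic parameterized equation) would only change the drift and noise terms in the `mu` and `R` updates.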







Data assimilation of spatially extended stochastic systems with model error

Many turbulent dynamical models involve spatially extended structures, i.e., PDEs. Data assimilation of such spatially extended systems with noisy partial observations is a central topic for understanding and predicting nature. We proposed a nonlinear data assimilation framework and applied it to the stochastic skeleton model of the MJO.

The stochastic skeleton model is a stochastic PDE and a suitable model for representing the MJO. Efficient data assimilation with noisy partial observations paves the way for real-time prediction of the MJO in the future.

In this framework:

1. A discrete Fourier transform is applied to the stochastic skeleton model. The result is a high-dimensional nonlinear filtering problem.

2. With judicious model errors incorporated, the coupled system belongs to the conditional Gaussian framework, which has exact solutions and is computationally affordable. The judicious model errors include artificial damping and noise inflation in the forecast model, which greatly reduce the accumulation of observational noise.

3. The model parameters are systematically calibrated via a cheap single-column model, which is a practical strategy.

4. An effectively balanced reduced filter involving a simple fast-wave averaging is utilized to deal with the fast oscillation modes. It significantly improves the filtering skill and enhances the total computational efficiency.

The filters succeed in recovering the MJO and other large-scale structures with realistic features. The reduced filter also greatly improves the skill of filtering small-scale fast oscillating waves.

This framework can be applied to filter and predict other spatially extended systems.






Parameter estimation, uncertainty quantification and prediction for non-Gaussian turbulent models

We developed a Markov chain Monte Carlo (MCMC) algorithm that incorporates Bayesian inference with data augmentation, which samples the missing path. A novel pre-estimation of the hidden processes greatly enhances the efficiency. The algorithm was applied to a reduced model with only partial observations. The model is able to describe nature with hidden instability and highly non-Gaussian statistics. Equipped with the estimated parameters, the model succeeds in predicting the extreme events.
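The data augmentation idea can be sketched with a minimal Gibbs sampler on a hypothetical linear-Gaussian toy problem: a hidden AR(1) path x is observed with noise, and the sampler alternates between (i) drawing the whole missing path given the parameter φ by forward filtering, backward sampling and (ii) drawing φ given the path from its conjugate Gaussian conditional. The published algorithm targets nonlinear, non-Gaussian models and adds a pre-estimation of the hidden processes; neither is reproduced here, and all names and values are illustrative.

```python
import numpy as np

Q, R_OBS, P0 = 0.5, 0.5, 1.0    # known noise variances and prior variance of x_0 (hypothetical)

def ffbs(y, phi, rng):
    """Sample the hidden path x given y and phi: Kalman filter forward, sample backward."""
    T = len(y)
    m, P = np.empty(T), np.empty(T)
    m_pred, P_pred = 0.0, P0
    for t in range(T):
        gain = P_pred / (P_pred + R_OBS)
        m[t] = m_pred + gain * (y[t] - m_pred)
        P[t] = (1.0 - gain) * P_pred
        m_pred, P_pred = phi * m[t], phi**2 * P[t] + Q
    z = rng.standard_normal(T)
    x = np.empty(T)
    x[-1] = m[-1] + np.sqrt(P[-1]) * z[-1]
    for t in range(T - 2, -1, -1):
        J = P[t] * phi / (phi**2 * P[t] + Q)
        var = P[t] - J * phi * P[t]
        x[t] = m[t] + J * (x[t + 1] - phi * m[t]) + np.sqrt(var) * z[t]
    return x

def gibbs(y, n_iter=400, burn=100, seed=0):
    """Alternate path sampling (data augmentation) and conjugate updates of phi."""
    rng = np.random.default_rng(seed)
    phi, draws = 0.0, []
    for it in range(n_iter):
        x = ffbs(y, phi, rng)
        prec = np.sum(x[:-1] ** 2) / Q + 1.0 / 10.0      # N(0, 10) prior on phi
        mean = np.sum(x[:-1] * x[1:]) / Q / prec
        phi = mean + rng.standard_normal() / np.sqrt(prec)
        if it >= burn:
            draws.append(phi)
    return np.array(draws)

# Hypothetical synthetic data with phi_true = 0.8
rng = np.random.default_rng(42)
x_true = np.zeros(300)
for t in range(1, 300):
    x_true[t] = 0.8 * x_true[t - 1] + np.sqrt(Q) * rng.standard_normal()
y = x_true + np.sqrt(R_OBS) * rng.standard_normal(300)
draws = gibbs(y)
print(draws.mean(), draws.std())
```

Sampling the complete missing path makes the parameter update a simple fully observed regression, which is the main appeal of data augmentation.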

We also developed a class of statistically exactly solvable non-Gaussian test models, where a generalized Feynman-Kac formulation reduces the exact behavior of conditional statistical moments to the solution of inhomogeneous Fokker-Planck equations modified by linear lower order coupling and source terms. This procedure is applied to test models with hidden instabilities and combined with information theory to address two important issues in contemporary statistical prediction of turbulent systems: coarse-grained perfect model ensemble prediction and improving long range forecasting in imperfect models.






Groundwater systems, karst aquifers

Karst aquifers are among the most important types of groundwater systems. They are mostly made up of a porous medium, referred to as the matrix, that contains a network of fissures and conduits, which are the major underground highways for water transport. The matrix stores water, while in the conduits the flow is free. Although fissures and conduits occupy less space than the matrix, they play an essential role in the transport of fluid and contaminants in karst aquifers.

Based on asymptotic analysis, we explored the coupling of the Stokes and Darcy systems with different choices for the interface conditions, such as Beavers-Joseph, Beavers-Joseph-Saffman-Jones, zero tangential velocity and free-slip interface condition in the one-dimensional and quasi-two-dimensional (periodic) cases.
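For reference, in the simplest flat-interface setting these interface conditions are often written for the tangential Stokes (conduit) velocity u_τ, with n the unit normal pointing from the matrix into the conduit, K the permeability, α an empirical slip coefficient, and u_{D,τ} the tangential Darcy velocity. These are schematic textbook forms; sign and scaling conventions vary between references.

```latex
\begin{aligned}
&\text{Beavers-Joseph:} && \frac{\partial u_\tau}{\partial n} = \frac{\alpha}{\sqrt{K}}\,\big(u_\tau - u_{D,\tau}\big),\\
&\text{Beavers-Joseph-Saffman-Jones:} && \frac{\partial u_\tau}{\partial n} = \frac{\alpha}{\sqrt{K}}\,u_\tau,\\
&\text{zero tangential velocity:} && u_\tau = 0,\\
&\text{free slip:} && \frac{\partial u_\tau}{\partial n} = 0.
\end{aligned}
```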

We investigated the validity of the popular coupled-continuum pipe-flow (CCPF) model for flow in a karst aquifer. The (Navier) Stokes-Darcy model is used as the "true model" for calibrating the exchange coefficient in the CCPF model. We found that there is an almost universal choice for a nearly optimal exchange coefficient such that the relative error is below one percent. Our numerics suggest that the nearly optimal choice of the exchange coefficient should be sufficiently large instead of being a small quantity that is proportional to the hydraulic conductivity.

We also studied two inverse problems for the coupled continuum pipe flow (CCPF) model, which describes fluid flows in karst aquifers. After generalizing the well-posedness of the forward problem to the case of an anisotropic, space-dependent exchange rate, we established the uniqueness of this parameter from measurements of the Cauchy data. The uniqueness of the conduit geometry given the Cauchy data is verified as well. These results enhance the practicality of the CCPF model.






Earlier on, I did a little theoretical work on SDEs and PDEs. Going back even further ... I also did numerical simulations for some mathematical biology problems. I'm trying to apply the mathematical tools my collaborators and I have developed over the years to all kinds of complex dynamical systems. See my publication list for the related papers.


I acknowledge all my collaborators. They are all very smart people. I really enjoy working with them.