DFG-NCN Project: Integration of Text Mining Methods with Multivariate Time Series Analysis
The research for this project is carried out jointly by Anna Staszewska-Bystrova (University of Lodz) and Peter Winker (University of Giessen). It is funded within the framework of a cooperation between DFG (Deutsche Forschungsgemeinschaft) and NCN (Narodowe Centrum Nauki).
The project contributes to the development of methods for the joint modelling of multivariate economic time series and indicators derived from collections of texts by means of topic modelling. While methods that include variables obtained from text mining in econometric models are rapidly gaining popularity, relatively few studies investigate the statistical properties of such analyses. The specific goals of the project are related to
1. comparing alternative topic modelling procedures,
2. developing methods for embedding textual indicators in time series models, and
3. applying these methods using trends in topics estimated on the basis of Polish and German scientific publications and real economic indicators.
Different topic modelling methods will be compared with respect to their sensitivity to parameter settings, their robustness to variations of the (textual) sample, and the uncertainty related both to sampling variability and to the stochastic nature of the algorithms. To perform this task, methods for comparing the results of topic modelling across samples will be developed.
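As an illustration of what comparing topic modelling results across samples can involve, the following minimal sketch pairs the topics of two runs by the cosine similarity of their word distributions. It is not the method developed in the project: the greedy matching rule, the function name match_topics and the toy matrices are assumptions made purely for illustration.

```python
import numpy as np

def match_topics(topics_a, topics_b):
    """Greedily pair rows of two topic-word matrices by cosine similarity.

    Each row is one topic's weights over a shared vocabulary.
    Returns a sorted list of (index_in_a, index_in_b, similarity).
    """
    a = topics_a / np.linalg.norm(topics_a, axis=1, keepdims=True)
    b = topics_b / np.linalg.norm(topics_b, axis=1, keepdims=True)
    sim = a @ b.T                      # cosine similarity of every topic pair
    work = sim.copy()
    pairs = []
    for _ in range(min(sim.shape)):
        # Take the most similar pair that has not been matched yet.
        i, j = np.unravel_index(np.argmax(work), work.shape)
        pairs.append((int(i), int(j), float(sim[i, j])))
        work[i, :] = -np.inf           # topic i of run A is now used
        work[:, j] = -np.inf           # topic j of run B is now used
    return sorted(pairs)

# Toy data (hypothetical): the second run finds the same two topics
# in reversed order and with slightly perturbed weights.
run_1 = np.array([[0.7, 0.2, 0.1, 0.0],
                  [0.0, 0.1, 0.2, 0.7]])
run_2 = run_1[::-1] + 0.01
matches = match_topics(run_1, run_2)
```

Low similarities among the matched pairs would indicate that the topics are not stable across the two runs or samples. An optimal assignment could replace the greedy rule when the number of topics is large.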
To support conclusions about the use of textual indicators, such as trends in topics, in econometric models, several tasks will be performed. First, different methods for deriving aggregate trends in topics will be compared. The main interest lies in the information content of the estimated trends and in the statistical properties of the estimators, e.g. those related to topic weights. In the next step, the consequences of including such indicators in vector autoregressive models will be studied. In particular, we aim to analyze estimation uncertainty by means of appropriately constructed joint confidence bands.
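To make the idea of alternative aggregate trends concrete, the sketch below computes two simple candidates from a document-topic weight matrix: the mean topic weight per period and the share of documents in which a topic is dominant. Both definitions, the function name topic_trends and the toy data are assumptions chosen for illustration, not the estimators studied in the project.

```python
import numpy as np

def topic_trends(doc_topic, periods):
    """Aggregate document-level topic weights into per-period trends.

    doc_topic: (n_docs, n_topics) array of topic weights per document.
    periods:   (n_docs,) array with the period label of each document.
    Returns the sorted period labels and two (n_periods, n_topics) arrays.
    """
    labels = np.unique(periods)
    n_topics = doc_topic.shape[1]
    mean_weight = np.zeros((len(labels), n_topics))
    dominant_share = np.zeros((len(labels), n_topics))
    dominant = doc_topic.argmax(axis=1)    # most important topic per document
    for row, label in enumerate(labels):
        mask = periods == label
        # Variant 1: average topic weight over the documents of the period.
        mean_weight[row] = doc_topic[mask].mean(axis=0)
        # Variant 2: fraction of documents whose dominant topic is k.
        counts = np.bincount(dominant[mask], minlength=n_topics)
        dominant_share[row] = counts / mask.sum()
    return labels, mean_weight, dominant_share

# Toy data (hypothetical): four documents, two topics, two years.
doc_topic = np.array([[0.8, 0.2],
                      [0.6, 0.4],
                      [0.3, 0.7],
                      [0.1, 0.9]])
periods = np.array([2001, 2001, 2002, 2002])
labels, mean_weight, dominant_share = topic_trends(doc_topic, periods)
```

The two variants can differ noticeably: the dominant-topic share discards all information about how strongly a topic is represented within a document, which is one reason why the information content of such trends needs to be compared.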
A proof of concept will be given by constructing vector autoregressive models that combine related real economic indicators with topic trends derived from corpora of scientific economic publications for Poland and Germany. The relationships between those two groups of variables will be studied using impulse response analysis.
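The mechanics of such an analysis can be sketched on simulated data. The example below estimates a VAR(1) by ordinary least squares and computes non-orthogonalised impulse responses as powers of the estimated coefficient matrix. The lag order, the simulated series, the coefficient values and the function names are all assumptions for illustration; the confidence bands mentioned above are not constructed here.

```python
import numpy as np

def fit_var1(y):
    """OLS estimate of the VAR(1) model y_t = c + A y_{t-1} + u_t.

    y: (T, k) array of observations. Returns (c, A).
    """
    x = np.column_stack([np.ones(len(y) - 1), y[:-1]])   # regressors [1, y_{t-1}]
    coef, *_ = np.linalg.lstsq(x, y[1:], rcond=None)     # shape (k + 1, k)
    return coef[0], coef[1:].T

def impulse_responses(a, horizons):
    """Non-orthogonalised responses Phi_h = A^h for h = 0, ..., horizons.

    Entry [h, i, j] is the response of variable i after h periods
    to a unit shock in variable j.
    """
    return np.stack([np.linalg.matrix_power(a, h) for h in range(horizons + 1)])

# Simulated stand-in for the data (hypothetical values): variable 0 could be
# a real indicator and variable 1 a topic trend.
rng = np.random.default_rng(0)
a_true = np.array([[0.5, 0.2],
                   [0.0, 0.4]])
y = np.zeros((2000, 2))
for t in range(1, len(y)):
    y[t] = a_true @ y[t - 1] + rng.normal(size=2)

c_hat, a_hat = fit_var1(y)
irf = impulse_responses(a_hat, horizons=8)
```

In this setup, irf[:, 0, 1] traces how the first variable reacts over time to a unit shock in the second one. In an applied study, the lag order would be selected from the data and the shocks would typically be orthogonalised or otherwise identified before interpretation.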