
MBA Thesis: Research on High-Dimensional Nonlinear Mixed Frequency Data Models and Applications

Updated: 2020/8/5 (posted from Zhejiang)


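摘要 (Abstract)

In the era of big data, rapid advances in information science and computer technology have made it easy to collect and store massive multi-source data. Multi-source data fusion modeling frequently encounters mixed frequency time series observed at different frequencies, which raises the questions of how to characterize the influence patterns among mixed frequency data and how to forecast accurately. Because traditional regression models are built on same-frequency data, mixed frequency data usually have to be converted to a common frequency first, which discards a large amount of high frequency information. The mixed data sampling (MIDAS, MIxed DAta Sampling) model makes it possible to model the original mixed frequency variables directly. Furthermore, as research problems in economics and management grow more complex, many mixed frequency data analysis problems have emerged, such as reverse, high-dimensional, and nonlinear ones, that existing mixed frequency data models cannot solve effectively. Solving the technical difficulties in modeling such data and extending mixed frequency data analysis methods has important theoretical significance and practical value for promoting the use of mixed frequency data models and for exploring complex influence patterns in economics and management.

On this basis, this dissertation takes "research on high-dimensional nonlinear mixed frequency data models and applications" as its subject. Drawing on statistics, economics, finance, and management, and combining theoretical analysis, numerical simulation, and applied research, it extends existing mixed frequency data analysis methods from "forward" to "reverse", from "low-dimensional" to "high-dimensional", and from "linear" to "nonlinear". It establishes a reverse restricted mixed frequency data model, a group penalized mixed frequency data model, and a neural network mixed frequency data model, respectively, and applies these models to practical problems in economics and management. The specific work and main innovations are as follows:

(1) A reverse restricted mixed data sampling model (RR-MIDAS, Reverse Restricted MIDAS) is established to solve the reverse mixed frequency problem of forecasting high frequency variables with low frequency information. It is not limited by the frequency mismatch and applies to more general mixed frequency settings. First, drawing on the weight restriction idea of the MIDAS model and the periodic structure theory of the RU-MIDAS (Reverse Unrestricted MIDAS) model, the complete steps of building the RR-MIDAS model are given: frequency alignment, periodic processing, weight restriction, parameter estimation, and multi-step-ahead forecasting. Second, Monte Carlo simulations examine the effectiveness of the RR-MIDAS model by comparing its fit and forecasts with those of the RU-MIDAS model and the HF model under small, medium, and large frequency mismatches; the results show that the RR-MIDAS model has the best forecasting performance. Finally, the RR-MIDAS model is applied to forecasting market interest rates in China and the United States, which likewise confirms its good forecasting ability and its capacity to reflect real-time dynamic relationships among variables.

(2) A group penalized (forward/reverse) unrestricted mixed data sampling model (GP-(R)U-MIDAS, Group Penalized (Reverse) Unrestricted MIDAS) is established to solve mixed frequency data analysis problems with high-dimensional features. It accounts for the group effects produced by frequency alignment and multi-order lag operations, and it can carry out mixed frequency data analysis, dimension reduction, parameter estimation, and key variable identification while improving both interpretability and forecasting ability. First, group penalty functions such as group LASSO, group SCAD, and group MCP are introduced into the (R)U-MIDAS framework to build the GP-(R)U-MIDAS model, and the complete modeling process is given: model setup, parameter estimation, group variable selection, and multi-step-ahead forecasting. Second, Monte Carlo simulations examine the effectiveness of the GP-(R)U-MIDAS model by comparing its variable selection, fit, and forecasts with those of the P-(R)U-MIDAS, FC-(R)U-MIDAS, and (R)U-MIDAS models under different variable mechanisms and different frequency mismatches; the results show that it is significantly better than the other models when group effects are present. Finally, the GP-U-MIDAS and GP-RU-MIDAS models are applied to quarterly GDP forecasting and to asset pricing, respectively, which likewise confirms that the GP-(R)U-MIDAS model forecasts significantly better than the comparison models and can identify key influencing factors while exploring the mechanisms among variables.

(3) A neural network (restricted/unrestricted) mixed data sampling model (ANN-(U-)MIDAS) is established to explore potential nonlinear influence patterns in raw mixed frequency data. It makes full use of the useful high frequency information and of the data-driven, adaptive learning ability of machine learning. First, the (U-)MIDAS approach is introduced into the ANN framework to build the ANN-(U-)MIDAS model, and the complete modeling process is given, including model setup, parameter estimation, and multi-step-ahead forecasting. Second, Monte Carlo simulations examine the effectiveness of the ANN-(U-)MIDAS model by comparing its fit and forecasting ability with those of the benchmark ANN and (U-)MIDAS models; the ANN-(U-)MIDAS model performs best. Finally, the ANN-(U-)MIDAS model is applied to forecasting the monthly inflation rate with low frequency macroeconomic variables and high frequency financial market information; the empirical results confirm that it has the best fit and forecasting performance and can effectively handle nonlinear mixed frequency data problems.

For the reverse, high-dimensional, and nonlinear mixed frequency data analysis problems arising in economics and management, this dissertation builds on existing research to extend the classical (U-)MIDAS model in meaningful ways. It establishes a series of new mixed frequency data analysis models and methods that enrich the theory of mixed frequency data and broaden the tools available for applied work. It also selects common problems in economics and management and studies them in a mixed frequency data framework, aiming to improve the interpretability and forecasting accuracy of the results, and thereby to help policymakers and investors grasp market trends in a timely way, understand market mechanisms in depth, and ultimately improve macro-prudential supervision, investment decisions, and management.

Keywords: mixed frequency data; high-dimensional variables; nonlinear patterns; MIDAS model; RR-MIDAS model; GP-U-MIDAS model; GP-RU-MIDAS model; ANN-U-MIDAS model; ANN-MIDAS model

ABSTRACT

In the era of big data, innovations in information science and computer technology have made the collection and storage of large datasets possible. In multi-source information fusion modeling, one is often confronted with time series that are sampled or observed at different frequencies, which raises the problem of how to explore complex patterns among mixed frequency data and make accurate forecasts. As a typical regression model involves data sampled at the same frequency, the common solution in such cases is to convert mixed frequency data to a single frequency. In the process, a lot of high frequency information may be discarded. As an alternative, the mixed data sampling (MIDAS) model makes it possible to directly accommodate variables sampled at different frequencies. Furthermore, with the increasing complexity of economic and financial research, there are many unexplored mixed frequency data analysis problems, such as reverse, high-dimensional, and nonlinear patterns, which cannot be solved effectively with existing MIDAS-based methods. Solving the technical problems that arise in the modeling process and extending mixed data analysis models can be extremely helpful in promoting the application of mixed frequency data models and exploring complex patterns in economic management. It is of great theoretical and practical importance.

To this end, this dissertation selects the subject of “research on high-dimensional nonlinear mixed data sampling model with applications”.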
By integrating statistics, economics, finance, and management, and combining theoretical analysis, numerical simulation, and applied research, this dissertation extends forward, low-dimensional, and linear mixed frequency data analysis methods to reverse, high-dimensional, and nonlinear cases, and constructs a reverse restricted mixed data sampling model, a group penalized mixed data sampling model, and an artificial neural network mixed data sampling model, respectively. These models are then applied to problems in economic management. The detailed research and main innovations of this dissertation are as follows:

(1) Construct a novel reverse restricted mixed data sampling (RR-MIDAS) model, which allows us to forecast high frequency variables using low frequency information. The RR-MIDAS model is applicable to more general mixed frequency data, without any limit on the frequency mismatch. First, borrowing the ideas of parameter restrictions in MIDAS and periodic structures in RU-MIDAS, we provide a procedure for RR-MIDAS regressions including frequency alignment, periodic processing, weight restriction, parameter estimation, and multi-step forecasting. Second, the efficacy of the RR-MIDAS model is illustrated through Monte Carlo simulations. We consider small, medium, and large frequency mismatches and compare RR-MIDAS with competing models including RU-MIDAS and the HF model; the numerical results show that RR-MIDAS consistently outperforms the other models in terms of predictive ability. Finally, the good performance of the RR-MIDAS model is demonstrated in a real-world application to forecasting Chinese and US market interest rates, where it is also able to capture the dynamic relationships among variables.
(2) Construct a novel group penalized (reverse) unrestricted mixed data sampling (GP-(R)U-MIDAS) model, which allows us to identify important variables at the group level in high-dimensional mixed frequency data analysis and takes into account the grouping structures produced by frequency alignment and multiple lag operations. The GP-(R)U-MIDAS model handles mixed data analysis, dimension reduction, parameter estimation, and key variable identification, and it enhances both interpretability and predictive ability. First, we introduce the group LASSO, group SCAD, and group MCP penalty functions into the (R)U-MIDAS regression framework and propose the GP-(R)U-MIDAS model. We also provide detailed procedures for model setup, parameter estimation, group variable selection, and multi-step forecasting. Second, the efficacy of the GP-(R)U-MIDAS model is illustrated through Monte Carlo simulations. We consider different variable mechanisms and different frequency mismatches, and compare the GP-(R)U-MIDAS model with competing models including P-(R)U-MIDAS, FC-(R)U-MIDAS, and (R)U-MIDAS in terms of variable selection, goodness of fit, and prediction accuracy. The numerical results show that the GP-(R)U-MIDAS model is significantly superior to the other models when group effects exist. Finally, the superiority of the GP-(R)U-MIDAS model is also illustrated in real-world applications to quarterly GDP growth forecasting and asset pricing. The empirical results show that the GP-(R)U-MIDAS model outperforms the other competing models and is able to explore the influencing mechanisms and select crucial factors.

(3) Construct a novel artificial neural network (unrestricted) mixed data sampling (ANN-(U-)MIDAS) model, which allows us to explore the potential nonlinear patterns hidden in raw mixed frequency data.
The ANN-(U-)MIDAS model makes full use of the useful high frequency information and of the data-driven, adaptive learning ability of machine learning. First, we introduce the (U-)MIDAS approach into the ANN framework and propose the ANN-(U-)MIDAS model. We also provide detailed procedures including model setup, parameter estimation, and multi-step forecasting. Second, we conduct extensive Monte Carlo simulations to illustrate the efficacy of the ANN-(U-)MIDAS model, comparing it with competing models including the ANN and (U-)MIDAS models in terms of goodness of fit and predictive ability. The numerical results show that the ANN-(U-)MIDAS model outperforms the other models. Finally, the good performance of the ANN-(U-)MIDAS model, in terms of fitting and forecasting, is also demonstrated in a real-world application to monthly inflation forecasting using both low frequency macroeconomic variables and high frequency financial market information. The results verify that ANN-(U-)MIDAS is an efficient tool for handling nonlinear mixed frequency data.

In summary, considering the emerging problems of reverse, high-dimensional, and nonlinear mixed frequency data analysis in the field of economic management, and building on previous research, this dissertation extends the classical (U-)MIDAS approach to develop a series of new mixed frequency data analysis models, which enrich both the theory and the applied toolkit of mixed frequency data. Moreover, this dissertation selects common problems in the field of economic management, studies them in the mixed frequency data framework, and focuses on improving interpretability and prediction accuracy.
This will help policymakers and investors keep abreast of changing market trends and gain a deeper understanding of the market mechanism, and thereby improve macro-prudential regulation, investment decision-making, and management.

KEYWORDS: Mixed frequency data; High dimensionality; Nonlinear pattern; MIDAS; RR-MIDAS; GP-U-MIDAS; GP-RU-MIDAS; ANN-U-MIDAS; ANN-MIDAS

Contents

Chapter 1 Introduction
  1.1 Background and significance of the topic
    1.1.1 Background of the topic
    1.1.2 Research questions
    1.1.3 Research significance
  1.2 State of research in China and abroad
    1.2.1 Forward mixed frequency data analysis methods
    1.2.2 Reverse mixed frequency data analysis methods
    1.2.3 High-dimensional mixed frequency data analysis methods
    1.2.4 Nonlinear mixed frequency data analysis methods
  1.3 Main innovations and structure
    1.3.1 Main innovations
    1.3.2 Structure
Chapter 2 Mixed frequency data models and research progress
  2.1 Restricted mixed frequency data models
    2.1.1 MIDAS model
    2.1.2 Extensions
    2.1.3 Model evaluation
  2.2 Unrestricted mixed frequency data models
    2.2.1 U-MIDAS model
    2.2.2 Extensions
    2.2.3 Model evaluation
  2.3 Reverse unrestricted mixed frequency data models
    2.3.1 RU-MIDAS model
    2.3.2 Model evaluation
  2.4 Chapter summary
Chapter 3 Reverse restricted mixed frequency data model and applications
  3.1 Problem statement
  3.2 RR-MIDAS model construction
    3.2.1 Frequency alignment
    3.2.2 Periodic processing
    3.2.3 Polynomial weight restriction
    3.2.4 Nonlinear least squares estimation
    3.2.5 Multi-step-ahead forecasting
  3.3 Numerical simulation
    3.3.1 Data generation
    3.3.2 Experimental design
    3.3.3 Model comparison
    3.3.4 Discussion of results
  3.4 Applications
    3.4.1 Forecasting Chinese market interest rates
    3.4.2 Forecasting US market interest rates
  3.5 Chapter summary
Chapter 4 Group penalized mixed frequency data models and applications
  4.1 Problem statement
  4.2 GP-(R)U-MIDAS model construction
    4.2.1 GP-U-MIDAS model construction
    4.2.2 GP-RU-MIDAS model construction
  4.3 Numerical simulation
    4.3.1 Data generation
    4.3.2 Experimental design
    4.3.3 Model comparison
  4.4 Applications
    4.4.1 The relationship between the macroeconomy and gross domestic product
    4.4.2 High-dimensional mixed frequency risk factors and asset pricing
  4.5 Chapter summary
Chapter 5 Neural network mixed frequency data models and applications
  5.1 Problem statement
  5.2 ANN-(U-)MIDAS model construction
    5.2.1 ANN-U-MIDAS model setup
    5.2.2 ANN-MIDAS model setup
    5.2.3 Gradient descent estimation
    5.2.4 Multi-step-ahead forecasting
    5.2.5 Model selection
  5.3 Numerical simulation
    5.3.1 Data generation
    5.3.2 Experimental design
    5.3.3 Model comparison
  5.4 Applications
    5.4.1 Research background
    5.4.2 Data description
    5.4.3 Model comparison
    5.4.4 Relative importance analysis
    5.4.5 Sensitivity analysis
  5.5 Chapter summary
Chapter 6 Conclusions and outlook
  6.1 Research summary
    6.1.1 Research results
    6.1.2 Research significance
  6.2 Research outlook
    6.2.1 Inequality problems
    6.2.2 Density forecasting problems
    6.2.3 Multi-source data fusion
References

List of Figures

Figure 1.1 Structure and content framework of the dissertation
Figure 2.1 Two-parameter exponential Almon polynomial weights
Figure 2.2 Two-parameter Beta polynomial weights
Figure 3.1 Relative MAE and RMSE across periods for m=3
Figure 3.2 Relative MAE and RMSE across periods for m=12
Figure 3.3 Relative MAE and RMSE across periods for m=66
Figure 3.4 Mixed frequency time series
Figure 3.5 In-sample fit comparison across models (relative to the RW model)
Figure 3.6 Out-of-sample forecast comparison across models (relative to the RW model)
Figure 3.7 Estimated parameters of overnight SHIBOR at different lags in each period
Figure 3.8 Estimated parameters of Bond at different lags in each period
Figure 3.9 In-sample fit comparison across models (7-day SHIBOR replacing overnight SHIBOR)
Figure 3.10 Out-of-sample forecast comparison across models (7-day SHIBOR replacing overnight SHIBOR)
Figure 3.11 In-sample fit comparison across models (M1 replacing M0)
Figure 3.12 Out-of-sample forecast comparison across models (M1 replacing M0)
Figure 3.13 In-sample fit comparison across models (M2 replacing M0)
Figure 3.14 Out-of-sample forecast comparison across models (M2 replacing M0)
Figure 3.15 In-sample fit comparison across models (PPI replacing CPI)
Figure 3.16 Out-of-sample forecast comparison across models (PPI replacing CPI)
Figure 3.17 In-sample fit comparison across models (Trade replacing IP)
Figure 3.18 Out-of-sample forecast comparison across models (Trade replacing IP)
Figure 3.19 Real-time released data and ordinary economic data
Figure 3.20 MAE and RMSE ratios across forecast periods
Figure 3.21 Daily interest rate forecasts of each model
Figure 3.22 Coefficient estimates for the lagged interest rate, inflation rate, and GDP in the RR-MIDAS model
Figure 4.1 Exponential Almon polynomial weight distributions in Scenarios 1 to 3
Figure 4.2 Regression coefficients of all explanatory variables in Scenario 2 with m=3 and l=2
Figure 4.3 Entry of group variables during cross-validation of the G-LASSO-U-MIDAS model
Figure 4.4 Entry of group variables during cross-validation of the G-SCAD-U-MIDAS model
Figure 4.5 Entry of group variables during cross-validation of the G-MCP-U-MIDAS model
Figure 4.6 Cross-validation results of the G-LASSO-U-MIDAS model
Figure 4.7 Cross-validation results of the G-SCAD-U-MIDAS model
Figure 4.8 Cross-validation results of the G-MCP-U-MIDAS model
Figure 4.9 Rolling forecast errors of the GP-U-MIDAS model
Figure 4.10 Time series cross-validation results of the G-LASSO-RU-MIDAS model
Figure 4.11 Time series cross-validation results of the G-SCAD-RU-MIDAS model
Figure 4.12 Time series cross-validation results of the G-MCP-RU-MIDAS model
Figure 4.13 Relative importance of different risk factors
Figure 5.1 Schematic of the ANN-U-MIDAS model
Figure 5.2 Schematic of the ANN-MIDAS model
Figure 5.3 Time series plot of the response variable in Scheme 1
Figure 5.4 Time series plot of the response variable in Scheme 2
Figure 5.5 Time series plot of the response variable in Scheme 3
Figure 5.6 Time series plot of the inflation rate
Figure 5.7 Relative importance of the explanatory variables in inflation forecasting
Figure 5.8 Sensitivity analysis for the explanatory variables in inflation forecasting

List of Tables

Table 3.1 DM test results for the forecasting ability of each model
Table 3.2 Optimal lag order selection for each model
Table 3.3 Parameter estimates of the RR-MIDAS model
Table 3.4 Descriptive statistics of the data
Table 3.5 Parameter estimates of the RR-MIDAS model
Table 4.1 (Group) variable misselection rates
Table 4.2 In-sample RMSE of each model
Table 4.3 Out-of-sample RMSE of each model
Table 4.4 Descriptive statistics of the variables
Table 4.5 Comparison of fit and forecast results across models
Table 4.6 Group variable selection and parameter estimates of the GP-U-MIDAS model
Table 4.7 Group variable selection of the G-LASSO-U-MIDAS model in rolling forecasts
Table 4.8 Group variable selection of the G-SCAD-U-MIDAS model in rolling forecasts
Table 4.9 Group variable selection of the G-MCP-U-MIDAS model in rolling forecasts
Table 4.10 Risk factors and their descriptive statistics
Table 4.11 Risk factor selection of the GP-RU-MIDAS model
Table 4.12 Comparison of fit and forecasting ability across models in different periods
Table 4.13 DM test results for the forecasting ability of each model in different periods
Table 5.1 Optimal parameter values determined by time series cross-validation
Table 5.2 RMSE results for m=3 in Scheme 1
Table 5.3 RMSE results for m=3 in Scheme 2
Table 5.4 RMSE results for m=3 in Scheme 3
Table 5.5 RMSE results for m=12 in Scheme 1
Table 5.6 RMSE results for m=12 in Scheme 2
Table 5.7 RMSE results for m=12 in Scheme 3
Table 5.8 RMSE results for m=22 in Scheme 1
Table 5.9 RMSE results for m=22 in Scheme 2
Table 5.10 RMSE results for m=22 in Scheme 3
Table 5.11 Descriptive statistics of each variable
Table 5.12 Optimal parameter values selected by time series cross-validation
Table 5.13 Comparison of in-sample and out-of-sample RMSE values
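
Illustrative sketch

The abstract names frequency alignment and weight restriction as steps of the restricted MIDAS-type models, and the front matter mentions two-parameter exponential Almon polynomial weights. As an illustration only, the sketch below shows one common way to write these two steps in Python with NumPy. The function names `exp_almon_weights` and `align_high_freq`, the parameter names, and the toy data are assumptions made for this example; they are not taken from the dissertation, whose exact specification is not reproduced in this text.

```python
import numpy as np


def exp_almon_weights(theta1, theta2, n_lags):
    """Two-parameter exponential Almon weights, normalized to sum to 1.

    w_k is proportional to exp(theta1 * k + theta2 * k**2), k = 1..n_lags.
    (Hypothetical helper for illustration.)
    """
    k = np.arange(1, n_lags + 1, dtype=float)
    z = theta1 * k + theta2 * k**2
    z = z - z.max()          # shift for numerical stability; ratios are unchanged
    w = np.exp(z)
    return w / w.sum()


def align_high_freq(x_high, m, n_lags):
    """Frequency alignment for a frequency ratio m.

    Row t collects the n_lags most recent high frequency observations
    available at the end of low frequency period t, most recent first.
    Periods without enough history are skipped.
    (Hypothetical helper for illustration.)
    """
    x_high = np.asarray(x_high, dtype=float)
    n_low = len(x_high) // m
    rows = []
    for t in range(1, n_low + 1):
        end = t * m          # one past the last high frequency index of period t
        if end < n_lags:
            continue
        rows.append(x_high[end - n_lags:end][::-1])
    return np.array(rows)


# Toy data (assumed): 12 high frequency points, 3 per low frequency period.
x_high = np.arange(12.0)
X = align_high_freq(x_high, m=3, n_lags=4)   # first row is [5, 4, 3, 2]
w = exp_almon_weights(0.0, -0.1, n_lags=4)   # weights decay with the lag
x_agg = X @ w                                # one weighted regressor per period
```

With `theta1 = theta2 = 0` the weights are all equal, so the weighted regressor reduces to a simple average of the lags. In a restricted MIDAS-type regression the two weight parameters would typically be estimated together with the slope, for example by nonlinear least squares, whereas an unrestricted variant would use the columns of `X` directly as separate regressors.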