
MBA Master's Thesis: Research on A-Share Investment Strategies Based on Ensemble Learning Algorithms (PDF)

Updated: 2022/2/13 (published in Guangdong)


Text description
Abstract

With rapid economic growth and the continuing reform of China's capital market, the A-share market is expected to become more open and transparent, and the influence of policy-driven trading should gradually decline. By the end of 2019, the combined market capitalization of the Shanghai and Shenzhen exchanges exceeded 60 trillion yuan, with more than 3,800 listed companies and more than 150 million investors.

The A-share market is growing quickly and generates a huge volume of financial data every trading day. The market is volatile and investors' attention is limited, so it is impossible to study every investment target one by one. With the rapid development of computer science, quantitative investment has emerged. Quantitative investment is the process of designing and implementing trading strategies with computer programs, based on mathematics, finance, statistics, or machine learning algorithms. The stock market fluctuates frequently and stock prices are affected by many factors, so traditional linear models often predict poorly. With the development of artificial intelligence, machine learning algorithms are increasingly applied to quantitative trading. Ensemble learning is a classical machine learning approach that usually has advantages over a single algorithm.

This thesis studies A-share investment strategies based on ensemble learning algorithms. The main work includes extracting stock data from the JoinQuant quantitative trading platform and selecting 50 variables that commonly affect stock prices to build a feature factor library. Using Python, the Random Forest and AdaBoost algorithms are each combined with the feature factor library to build ensemble learning stock selection models. The sample data are divided into a training set and a validation set; the models are trained in-sample, their parameters are tuned by cross-validation, and they are finally tested on out-of-sample data. Through comparative tests on the CSI 300, the CSI 500, and all A-shares, the model parameters are optimized and the models are evaluated on the final test results. The 10 stocks with the highest predicted probability of rising are selected to form an investment portfolio, which is then backtested and its performance analyzed.

The thesis analyzes and presents the theory of the Random Forest and AdaBoost algorithms and evaluates the models using accuracy, AUC, and other metrics. The analysis finds that stock selection models built with these two algorithms are applicable to the A-share market: the selected portfolio achieved good returns over the test period and outperformed the broad market index, which offers a reference for investors and quantitative trading enthusiasts.
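The workflow summarized in the abstract (in-sample training, cross-validated parameter tuning, out-of-sample testing, then picking the 10 stocks with the highest predicted probability of rising) might be sketched as follows with scikit-learn. This is a minimal illustration on synthetic data, not the thesis's code: the random 50-factor matrix, the rule that generates the labels, the 80/20 split, and the small parameter grids are all assumptions made here for the example, whereas the thesis draws its data from JoinQuant.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the factor library: 600 samples x 50 factors (assumption).
rng = np.random.default_rng(0)
n_samples, n_factors = 600, 50
X = rng.normal(size=(n_samples, n_factors))

# Hypothetical label: 1 if the stock "rises", driven by three factors plus noise.
signal = X[:, 0] + 0.5 * X[:, 1] - 0.5 * X[:, 2]
y = (signal + rng.normal(size=n_samples) > 0).astype(int)

# First 80% is used in-sample, last 20% is held out as out-of-sample data.
split = int(0.8 * n_samples)
X_in, y_in = X[:split], y[:split]
X_out, y_out = X[split:], y[split:]

# Illustrative parameter grids; the thesis's actual grids are not shown here.
candidates = {
    "RandomForest": (
        RandomForestClassifier(random_state=0),
        {"n_estimators": [50, 100], "max_depth": [3, 5]},
    ),
    "AdaBoost": (
        AdaBoostClassifier(random_state=0),
        {"n_estimators": [50, 100], "learning_rate": [0.5, 1.0]},
    ),
}

results = {}
for name, (model, grid) in candidates.items():
    # 3-fold cross-validation on the in-sample data, scored by AUC.
    search = GridSearchCV(model, grid, scoring="roc_auc", cv=3)
    search.fit(X_in, y_in)

    # Predicted probability of the "up" class on the out-of-sample data.
    prob_up = search.predict_proba(X_out)[:, 1]
    # Indices of the 10 out-of-sample rows with the highest predicted probability.
    top10 = np.argsort(prob_up)[::-1][:10]

    results[name] = {
        "best_params": search.best_params_,
        "accuracy": accuracy_score(y_out, search.predict(X_out)),
        "auc": roc_auc_score(y_out, prob_up),
        "top10": top10,
    }
    print(name, results[name]["best_params"],
          f"acc={results[name]['accuracy']:.3f}",
          f"auc={results[name]['auc']:.3f}")
```

In a real backtest the rows would be stock-period observations ordered by date, and the top-10 selection would be repeated at each rebalancing date rather than once over the whole held-out block.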
Keywords: Random Forest, AdaBoost, Ensemble Learning, Investment Strategy, Quantitative Trading

Contents

Chapter 1 Introduction
1.1 Research background and significance
1.1.1 Research background
1.1.2 Research significance
1.2 Research status in China and abroad
1.2.1 Research status in China
1.2.2 Research status abroad
1.3 Research objectives and content
1.3.1 Research objectives
1.3.2 Research content
Chapter 2 Overview of quantitative investment
2.1 Introduction to quantitative trading
2.1.1 Quantitative stock selection and market timing
2.1.2 Algorithmic trading and statistical arbitrage
2.2 Development of quantitative trading
2.2.1 Main development history
2.2.2 Common classic strategies
2.3 Advantages and disadvantages of quantitative trading
2.3.1 Main advantages
2.3.2 Main disadvantages
Chapter 3 Ensemble learning theory and research methods
3.1 Introduction to machine learning
3.1.1 Introduction to data mining
3.1.2 Machine learning algorithms
3.2 Decision tree classification algorithms
3.2.1 The ID3 algorithm
3.2.2 The C4.5 algorithm
3.2.3 The CART algorithm
3.2.4 Decision tree pruning
3.3 Ensemble learning theory
3.3.1 Introduction to ensemble learning
3.3.2 Parallel ensemble method: Random Forest
3.3.3 Sequential ensemble method: AdaBoost
3.4 Research workflow and technical tools
3.4.1 The JoinQuant platform
3.4.2 The Python programming language
3.4.3 Research workflow
Chapter 4 Data processing and the feature factor library
4.1 Sample data acquisition
4.1.1 Building the stock pool
4.1.2 Determining the backtest period
4.2 Feature and label extraction
4.3 Feature factor preprocessing
4.3.1 Median-based outlier removal
4.3.2 Missing value handling
4.3.3 Industry and market-cap neutralization
4.3.4 Feature factor standardization
4.4 Constructing the training and cross-validation sets
4.4.1 Stock selection within the CSI 300
4.4.2 Stock selection within the CSI 500
4.4.3 Stock selection across the whole A-share market
4.5 Dimensionality reduction and feature factor selection
4.5.1 Main dimensionality reduction methods
4.5.2 Important feature selection
4.5.3 Feature proportion selection
Chapter 5 Building and running the ensemble learning models
5.1 Implementing the models
5.1.1 Implementing the Random Forest model
5.1.2 Implementing the AdaBoost model
5.2 In-sample training
5.2.1 Random Forest training
5.2.2 AdaBoost training
5.3 Main model evaluation metrics
5.3.1 Significance of model evaluation
5.3.2 Model performance evaluation
5.4 Parameter tuning by cross-validation
5.4.1 Random Forest tuning
5.4.2 AdaBoost tuning
5.5 Out-of-sample testing
5.5.1 Random Forest test
5.5.2 AdaBoost test
5.5.3 Comparison of AdaBoost and Random Forest
5.5.4 Factor correlation analysis
Chapter 6 Analysis of the portfolio selected by the models
6.1 Portfolio overview
6.2 Portfolio analysis
6.2.1 Portfolio stock selection
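
The four preprocessing steps listed under section 4.3 of the contents (median-based outlier removal, missing value handling, industry and market-cap neutralization, standardization) could look roughly like the sketch below for a single factor on a single date. The thesis body is not shown on this page, so every specific choice here is an assumption based on common practice rather than the author's method: clipping at 3 scaled MADs, filling gaps with the cross-sectional median, neutralizing by taking ordinary least squares residuals against log market cap and industry dummies, and the synthetic data with made-up industry names.

```python
import numpy as np
import pandas as pd


def winsorize_mad(s: pd.Series, n: float = 3.0) -> pd.Series:
    """Clip values lying more than n scaled MADs from the median."""
    med = s.median()
    mad = (s - med).abs().median()
    # 1.4826 rescales the MAD to match the standard deviation under normality.
    bound = n * 1.4826 * mad
    return s.clip(lower=med - bound, upper=med + bound)


def neutralize(factor: pd.Series, log_cap: pd.Series, industry: pd.Series) -> pd.Series:
    """Return OLS residuals of the factor on log market cap and industry dummies."""
    dummies = pd.get_dummies(industry, dtype=float)
    # The full set of industry dummies already spans the intercept.
    design = np.column_stack([log_cap.to_numpy(), dummies.to_numpy()])
    beta, *_ = np.linalg.lstsq(design, factor.to_numpy(), rcond=None)
    return pd.Series(factor.to_numpy() - design @ beta, index=factor.index)


def zscore(s: pd.Series) -> pd.Series:
    """Standardize to zero mean and unit (population) standard deviation."""
    return (s - s.mean()) / s.std(ddof=0)


# Synthetic cross-section of 200 stocks (assumption, for illustration only).
rng = np.random.default_rng(1)
n = 200
df = pd.DataFrame({
    "factor": rng.normal(size=n),
    "log_cap": rng.normal(loc=23.0, scale=1.0, size=n),
    "industry": rng.choice(["bank", "tech", "pharma"], size=n),
})
df.loc[0, "factor"] = 50.0    # an extreme value
df.loc[1, "factor"] = np.nan  # a missing value

clipped = winsorize_mad(df["factor"])                         # 4.3.1
filled = clipped.fillna(clipped.median())                     # 4.3.2
neutral = neutralize(filled, df["log_cap"], df["industry"])   # 4.3.3
processed = zscore(neutral)                                   # 4.3.4

print(processed.describe())
```

After these steps the factor has no missing values, zero mean, unit variance, and no linear relationship with log market cap or industry membership within the cross-section.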