
MBA Thesis: Application of Random Forest in Quantitative Stock Selection with Technical Indicators

Updated: 2019/10/17 (posted from Jilin)


Application of Random Forest in Quantitative
Stocks Selection of Technical Indicators
A Master Thesis Submitted to
University of Electronic Science and Technology of China
Discipline: Master of Business Administration
Author: Wu Weixing
Supervisor: Tian Yixiang
School: School of Management and Economics
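Abstract (Chinese version)
With the rapid growth of China's economy, the total market capitalization of the A-share
market had reached more than 60 trillion yuan by December 19, 2017, of which the free-float
capitalization exceeded 44 trillion yuan. More than 3,400 stocks are listed on the A-share
market, and they generate a large volume of financial data every day. An investor's attention
is limited: an investor who relies on fundamental analysis cannot analyze the financial data
of all 3,400-plus stocks, and likewise cannot review the charts and technical indicators of
all of them every day. Advances in electronics and information technology have made computers
and the Internet commonplace and have brought quantitative investment to investors'
attention. Quantitative investment is the process of turning an investor's investment ideas
into programs using modern finance, mathematics, statistics, computer science, and related
disciplines. The stock market is widely regarded as a noisy, complex system [35]; many
factors affect stock prices, and most of them act non-linearly, so traditional linear models
handle such problems poorly. In 2017, large-cap stocks outperformed in the A-share market,
with the "Beautiful 50" (漂亮50) and "white horse" blue-chip stocks rallying strongly, yet
many quantitative funds delivered mediocre returns, which put the once-popular quantitative
funds to the test. The reason is that the 2017 market regime caused some factors to fail, or
that the selected factors could not identify the regime. Over the past two years artificial
intelligence has attracted intense interest, machine learning algorithms have matured for
large-scale data mining, and many of these algorithms can handle non-linear problems. This
thesis therefore applies the random forest algorithm from machine learning to a data set
built purely from technical indicators in order to construct a quantitative stock-selection
model.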
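The work in this thesis is as follows. Technical indicator data such as KDJ, MACD, RSI, ROC,
and Bollinger Bands are extracted from the Wind terminal to build 22 technical factors. On
the Anaconda Navigator platform, Python is used to combine these technical indicators with
the random forest algorithm in the machine learning module sklearn, yielding a multi-factor
stock-selection model. Taking the first day of each backtest year as the reference date, the
preceding two years of data serve as the training data for the random forest model, and
10-fold cross-validation is run on the training set. Grid search is used to optimize the
parameters, and the configurable parameters of the model are tested and analyzed. Each week,
the technical factor data are fed to the random forest for prediction, and the 10, 20, 30,
or 40 stocks with the highest predicted probability of rising are selected as the investment
portfolio. Because the ChiNext board was only launched in 2010, the constituents of the
CSI 500 index from 2010 to 2017 were ultimately chosen as the research sample.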
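The training and selection steps described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's actual code: the data are synthetic stand-ins for the 22 factors, `X_train` plays the role of the two years preceding a backtest year, the label convention (1 = the stock rose the following week) is assumed, and the parameter grid, `fit_forest`, and `pick_top_n` are hypothetical names and values chosen for the example.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV


def fit_forest(X_train, y_train, seed=0):
    """Tune a random forest by grid search scored with 10-fold cross-validation."""
    # Hypothetical grid: the abstract does not list the values actually searched.
    param_grid = {"n_estimators": [20, 50], "max_depth": [3, 5]}
    search = GridSearchCV(
        RandomForestClassifier(random_state=seed),
        param_grid,
        cv=10,
        scoring="accuracy",
    )
    search.fit(X_train, y_train)
    return search.best_estimator_


def pick_top_n(model, X_week, n):
    """Return the index labels of the n stocks with the highest predicted P(up)."""
    up_col = list(model.classes_).index(1)  # column of the "rose" class
    prob_up = pd.Series(model.predict_proba(X_week)[:, up_col], index=X_week.index)
    return prob_up.nlargest(n).index.tolist()


# Synthetic stand-in data: 22 factor columns, label 1 = price rose (assumed).
rng = np.random.default_rng(0)
factor_names = [f"factor_{i}" for i in range(22)]
X_train = pd.DataFrame(rng.normal(size=(400, 22)), columns=factor_names)
y_train = (X_train["factor_0"] + rng.normal(scale=0.5, size=400) > 0).astype(int)

# One week's factor values for 60 hypothetical stocks.
stock_ids = [f"stock_{i:03d}" for i in range(60)]
X_week = pd.DataFrame(rng.normal(size=(60, 22)), columns=factor_names, index=stock_ids)

model = fit_forest(X_train, y_train)
portfolio = pick_top_n(model, X_week, n=10)  # repeat with n = 20, 30, 40
```

In a backtest, `fit_forest` would presumably be rerun at the start of each backtest year on the preceding two years of data, while `pick_top_n` would be called once per week on that week's factor values.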
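Backtesting shows that a model built in this way suits China's A-share market: the selected
portfolios achieved some returns from 2012 to 2017, and their cumulative return outperformed
the CSI 500 index. The results offer some reference value for stock investment enthusiasts.
The thesis also covers technical analysis theory and technical indicators, as well as
background on random forests.
Keywords: quantitative stock selection, technical indicators, machine learning, random forest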
ABSTRACT
With the rapid development of the Chinese economy, the A-share market has grown
substantially. By December 19, 2017, its total market value had reached more than
¥60 trillion, and its circulating market value exceeded ¥44 trillion. The number of listed
stocks exceeded 3,400, and a large amount of financial data is produced every day. An
investor's energy is limited. Faced with this much data and more than 3,400 stocks, an
investor using fundamental analysis cannot analyze every company's financial data, and
selecting stocks by reviewing every chart and technical indicator is equally impractical.
The development of electronic science and information technology has promoted the
popularization of computers and the Internet and has brought quantitative investment to the
attention of investors. Quantitative investment is the process of using modern finance,
mathematics, statistics, computer science, and other disciplines to turn investors'
investment ideas into programs. It is well known that the stock market is a complex system
with a great deal of noise. Many factors affect stock prices, and most of these factors are
non-linear, so traditional linear models cannot solve this kind of problem very well. In the
A-share market in 2017, for example, large-cap stocks performed better, and the
"Beautiful 50" and "white horse" stocks rallied throughout the year. However, the
performance of many quantitative funds was flat, which put the once-popular quantitative
funds to the test. The reason is that the market conditions of 2017 caused some factors to
fail, or the selected factors could not recognize the market regime. In the past two years,
artificial intelligence has been in the spotlight, and machine learning algorithms have been
maturing in big data mining. Many machine learning algorithms can solve non-linear problems.
Therefore, this thesis uses the random forest algorithm from machine learning to analyze a
data set built purely from technical indicators and constructs a quantitative stock
selection model.
The research work in this thesis is as follows: extract technical indicator data such as
KDJ, MACD, RSI, ROC, and Bollinger Bands from the Wind terminal and establish 22 technical
factors; use the Python language on the Anaconda Navigator platform to combine the technical
indicators with the random forest algorithm in the machine learning module (sklearn) and
construct a multi-factor stock selection model. Using the start date of each backtest year
as the benchmark, take the preceding two years of data as training data for the random
forest model, and perform 10-fold cross-validation on the training data set; use grid search
to optimize the parameters.
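To make the notion of a technical factor concrete, the sketch below computes three of the indicators named above (ROC, RSI, and Bollinger Bands) from a closing-price series with pandas. It is only an illustration: the thesis takes its indicator data from the Wind terminal, the window lengths (12, 14, 20) and the band width `k=2.0` are common defaults assumed here, the RSI uses simple moving averages rather than Wilder's smoothing, and the price series is synthetic. KDJ and MACD are omitted for brevity.

```python
import numpy as np
import pandas as pd


def roc(close, window=12):
    """Rate of change: percentage price change over `window` periods."""
    return (close / close.shift(window) - 1.0) * 100.0


def rsi(close, window=14):
    """Relative strength index from simple averages of gains and losses."""
    delta = close.diff()
    gain = delta.clip(lower=0).rolling(window).mean()
    loss = (-delta.clip(upper=0)).rolling(window).mean()
    # Equivalent to 100 - 100 / (1 + gain / loss), without dividing by loss.
    return 100.0 * gain / (gain + loss)


def bollinger(close, window=20, k=2.0):
    """Bollinger Bands: rolling mean plus/minus k rolling standard deviations."""
    mid = close.rolling(window).mean()
    std = close.rolling(window).std()
    return pd.DataFrame(
        {"boll_mid": mid, "boll_upper": mid + k * std, "boll_lower": mid - k * std}
    )


# Synthetic closing prices (a random walk around 100), for illustration only.
rng = np.random.default_rng(0)
close = pd.Series(100 + rng.normal(size=120).cumsum(), name="close")

# One column per factor; the early rows are NaN until each window fills.
factors = pd.concat(
    [roc(close).rename("roc_12"), rsi(close).rename("rsi_14"), bollinger(close)],
    axis=1,
)
```

Each column of `factors` would correspond to one input feature of the model; the thesis builds 22 such factors in total.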