Research on Identifying Online False Comments on the Internet Based on Co-training

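Master's thesis, Beijing University of Chemical Technology

Abstract
With the rapid growth of the Internet, online shopping has become an
indispensable part of people's work and daily life. Before buying online,
people deliberately consult earlier users' comments to inform their purchasing
decisions. Because these comments can strongly influence consumers' willingness
to buy, and because posting comments on a shopping platform is neither difficult
nor costly, some merchants deliberately hire a "water army" (paid posters) to
add positive comments to their own products, or even negative comments to
competitors' products. Such fabricated comments mislead consumers and disrupt
the normal order of online commerce, so they need to be identified and removed.
However, false comments are well concealed: identifying them manually is
inefficient and costly, and accuracy is hard to guarantee.

To address these problems, this thesis surveys techniques for analyzing false
comments on the Internet and studies false comments on e-commerce platforms
through research and experiments. Because machine learning techniques have
particular advantages for this type of problem, the thesis proposes two false
comment identification methods based on co-training. The first, the CoSpa model,
co-trains on the words in a comment together with probabilistic context-free
grammar (PCFG) rules. The second, the CoFea model, uses information entropy to
divide the words in a comment evenly into different feature sets and co-trains
on those sets. Experiments on a standard false comment dataset show that both
CoSpa and CoFea achieve better identification accuracy than the SVM classifier
used as a baseline. Once iterative learning stabilizes, CoSpa-U (90%
identification accuracy) is about 5 percentage points more accurate than CoSpa-C
(85%), and both exceed the SVM baseline (75-80%). CoFea-T (83%) is about 3
percentage points more accurate than CoFea-S (80%), and both exceed the SVM
baseline (75-80%). Comparing the two methods, CoSpa has relatively higher
identification accuracy, while CoFea runs faster. This research offers a
promising approach to the problem of false comments in online e-commerce, and
applying its results can help eliminate the negative impact of such comments on
the order of online commerce.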

Keywords: false comment, co-training, CoSpa model, CoFea model
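
The co-training idea behind both models can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the `TinyNB` classifier, the `co_train` function, the toy reviews, and the two term-based views are all hypothetical. CoSpa pairs terms with PCFG rules as its two views, which is not reproduced here. In each round, two classifiers trained on different feature views each move their most confident unlabeled example per class into the shared labeled set.

```python
import math
from collections import Counter


class TinyNB:
    """Multinomial naive Bayes with add-one smoothing over one feature view."""

    def __init__(self, view):
        self.view = view  # set of terms this classifier is allowed to see

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.prior = Counter(labels)
        self.counts = {c: Counter() for c in self.classes}
        for tokens, c in zip(docs, labels):
            self.counts[c].update(t for t in tokens if t in self.view)
        return self

    def predict(self, tokens):
        """Return (label, margin); margin is the gap between the top two log-scores."""
        n = sum(self.prior.values())
        scores = {}
        for c in self.classes:
            total = sum(self.counts[c].values()) + len(self.view)
            s = math.log(self.prior[c] / n)
            for t in tokens:
                if t in self.view:
                    s += math.log((self.counts[c][t] + 1) / total)
            scores[c] = s
        ranked = sorted(scores, key=scores.get, reverse=True)
        margin = scores[ranked[0]] - scores[ranked[1]] if len(ranked) > 1 else 0.0
        return ranked[0], margin


def co_train(labeled, unlabeled, view_a, view_b, rounds=5):
    """Grow the labeled set with pseudo-labels from two view-specific classifiers."""
    labeled = list(labeled)   # (tokens, label) pairs
    pool = list(unlabeled)    # token lists without labels
    for _ in range(rounds):
        if not pool:
            break
        docs, labels = zip(*labeled)
        classifiers = [TinyNB(view_a).fit(docs, labels),
                       TinyNB(view_b).fit(docs, labels)]
        for clf in classifiers:
            for c in clf.classes:
                # Pool documents this classifier assigns to class c, with margins.
                scored = [(clf.predict(d)[1], i) for i, d in enumerate(pool)
                          if clf.predict(d)[0] == c]
                if scored:
                    _, best = max(scored, key=lambda pair: pair[0])
                    labeled.append((pool.pop(best), c))
    docs, labels = zip(*labeled)
    return TinyNB(view_a).fit(docs, labels), TinyNB(view_b).fit(docs, labels), labeled


# Toy data: two labeled reviews and four unlabeled ones (all invented).
labeled = [(["amazing", "luxury", "best", "stay"], "fake"),
           (["room", "clean", "quiet", "staff"], "truthful")]
unlabeled = [["amazing", "best", "experience"], ["clean", "staff", "helpful"],
             ["luxury", "stay", "amazing"], ["quiet", "room", "clean"]]
view_a = {"amazing", "luxury", "room", "clean"}
view_b = {"best", "stay", "quiet", "staff"}

clf_a, clf_b, final = co_train(labeled, unlabeled, view_a, view_b)
for tokens, label in final:
    print(label, tokens)
```

Adding one example per class from each classifier keeps the pseudo-labeled set balanced, which is the usual reason co-training adds a fixed number of examples per class in each round.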
AN ANALYSIS OF INTERNET ONLINE FALSE COMMENT
BASED ON CO-TRAINING
ABSTRACT
With the rapid development of the Internet, electronic commerce has become
part of the fabric of people’s lives. People deliberately consult previous
users’ comments before making purchasing decisions, and potential consumers may
be influenced by existing users’ opinions accordingly. In addition, such
opinions are easy to publish on a shopping platform at very little cost.
This is one reason why some sellers hire an “Online Water Army” to post
positive comments promoting their own products, or negative comments defaming
competitors’ products. These fabricated comments can jeopardize the order of
Internet commerce by misleading consumers, so it is necessary to identify and
remove them. At the same time, a person without enough experience or time can
hardly judge the authenticity of a single comment. With human labor, high cost
is as big a problem as low accuracy.
This paper studies false comment recognition techniques and examines a
gold-standard set of false comments through both observation and experiments.
Machine learning has inherent advantages in solving the above problems. This
paper proposes two novel approaches based on a machine learning model named
co-training. One approach, named the CoSpa model, trains the classifier with
terms and PCFG rules; the other, named the CoFea model, trains with terms only,
grouped equally by entropy. The experiments show that the new approaches have
certain advantages over the popular methods commonly used at present. The
identification accuracy of the CoSpa-U algorithm is 90%, and that of the CoSpa-C
algorithm is 85%. Both strategies of the CoSpa algorithm outperform the
reference SVM classifier, which has an identification accuracy of 75-80%. The
identification accuracy of the CoFea-T algorithm is 83%, slightly higher than
that of the CoFea-S algorithm, which is 80%. Both strategies of the CoFea
algorithm outperform the SVM classifier, which has an identification accuracy of
75-80%. In general, the results show that the CoSpa algorithm produces higher
identification accuracy than the CoFea algorithm, while the CoFea algorithm
requires less running time. This research provides a promising idea and method
for solving the problem of false comments on the Internet.
KEY WORDS: false comment, Co-training, CoSpa model, CoFea model
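
The entropy-based grouping that CoFea relies on can be illustrated with a small sketch. The abstract does not give the exact procedure, so everything here is an assumption: each term's entropy is taken over the class labels of the documents that contain it, and terms are dealt alternately, in entropy order, into two feature sets so that each set receives a similar mix of informative and uninformative terms. The names `term_entropy` and `split_views` and the toy reviews are hypothetical.

```python
import math
from collections import Counter, defaultdict


def term_entropy(docs, labels):
    """Shannon entropy (in bits) of the class distribution of each term."""
    counts = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        for term in set(tokens):  # count each term once per document
            counts[term][label] += 1
    entropy = {}
    for term, by_class in counts.items():
        total = sum(by_class.values())
        entropy[term] = sum(
            -(n / total) * math.log2(n / total) for n in by_class.values()
        )
    return entropy


def split_views(entropy):
    """Deal terms, sorted by entropy (ties broken alphabetically), into two sets."""
    ranked = sorted(entropy, key=lambda t: (entropy[t], t))
    return set(ranked[0::2]), set(ranked[1::2])


# Toy corpus (invented): two "fake" and two "truthful" reviews.
docs = [
    ["great", "hotel", "amazing", "stay"],
    ["amazing", "great", "luxury", "experience"],
    ["room", "small", "hotel", "clean"],
    ["clean", "room", "stay", "quiet"],
]
labels = ["fake", "fake", "truthful", "truthful"]

h = term_entropy(docs, labels)
view_a, view_b = split_views(h)
print(sorted(view_a))
print(sorted(view_b))
```

A term that appears only in one class has entropy 0, while a term spread evenly over both classes has entropy 1 bit. Interleaving the sorted list is one simple way to give both feature sets a comparable share of each kind.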

Contents
Chapter 1 Introduction
1.1 Research Background
1.2 Research Objects and Significance
1.2.1 Research Objects
1.2.2 Research Significance
1.3 Review of Related Research at Home and Abroad
1.3.1 Related Research on False Comments
1.3.2 Related Research on the Co-training Model
1.4 Contributions of This Paper
Chapter 2 Introduction to Related Techniques
2.1 Information Entropy
2.2 PCFG
2.3 TF-IDF
2.4 Naive Bayes
2.5 SVM
2.6 Co-training Model
2.7 Related Test and Validation Techniques
2.7.1 Wilcoxon's Signed Rank Test
2.7.2 Mann-Whitney U Test
2.7.3 10-fold Cross-validation
2.7.4 Classification Accuracy
Chapter 3 Model and Method for Online False Comment Identification
3.1 CoSpa Model
3.2 CoFea Model
Chapter 4 Experiments
4.1 Dataset Features
4.2 Experiments
4.2.1 Experiment Based on CoSpa Model
4.2.2 Experiment Based on CoFea Model
4.3 Experiment Results
4.3.1 Experiment Results Based on CoSpa Model
4.3.2 Experiment Results Based on CoFea Model
4.4 Discussions
Chapter 5 Conclusions
5.1 Conclusion of This Study
5.2 Outlook of Related Techniques
References
Research Achievements and Academic Papers Published
Acknowledgement
Brief Introduction of Author and Research Supervisor

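Chapter 1 Introduction
1.1 Research Background
The rapid development of Internet technology has greatly changed how people
live, work, and study. One of its most common applications is online shopping.
Since the 1990s, online shopping has spread worldwide, and today it has become a
way of life woven into residents' daily routines [1]. According to the market
survey statistics in the China E-commerce Report (2016), released at the 7th
APEC E-Commerce Business Alliance Forum, China's online shopping transaction
volume reached 5.01 trillion yuan (RMB) in 2016, accounting for 14.8% of total
retail sales of consumer goods, a year-on-year increase of about 30.7%.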

Table 1 Transaction scale of China's online retail market, 2011-2016