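Abstract: |
To help readers quickly and effectively find the books they need among massive library holdings, data mining is applied to the library management system by combining the Apriori association analysis algorithm with block-wise processing under the MapReduce framework. However, Apriori must scan the database many times and generates a large number of candidate sets, which poses a serious challenge to processing speed on the Hadoop platform. To address this limitation of the traditional Apriori algorithm, this paper proposes using the Spark platform, which is built on in-memory computing and resilient distributed datasets (RDDs), to recommend books to readers and guide their borrowing behavior. |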
Keywords: Apriori association rules; Spark platform; book-borrowing behavior patterns; frequent itemsets |
DOI:10.13610/j.cnki.1672-352x.20180825.002 |
|
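Funding: Supported by the Young Scientists Fund of the National Natural Science Foundation of China (31601741) and the Key Natural Science Research Project of Anhui Higher Education Institutions (KJ2016A221). |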
|
Research on an association model for library book-lending data based on Spark
GAO Qijuan,LIU Kai,CHEN Jia |
(School of Information and Computer Science, Anhui Agricultural University, Hefei 230036;Modern Educational Technology Center of Anhui Agricultural University, Hefei 230036;High-standard clustering Service Center Department of China Telecom Co., Ltd., Wuhu 241003) |
Abstract: |
To help readers quickly find the books they need among a vast collection of resources, we applied data mining to the library management system by combining the Apriori algorithm with block-wise data processing under the MapReduce framework. However, this approach must scan the database many times and generates a large number of candidate sets, which poses a serious challenge to processing speed on the Hadoop platform. To address this limitation of the traditional Apriori algorithm, we propose a method based on in-memory computing and resilient distributed datasets (RDDs) on the Spark platform to recommend books to readers and guide their borrowing behavior. |
Key words: Apriori association rules; Spark platform; book-borrowing behavior model; frequent itemsets |