
A Pure Python Reimplementation of the Core Parts of AlphaGo

2018-3-30 13:00 | Source: Internet



Summary

Reposted from: 爱可可-爱生活

This is a pure Python implementation of the essential parts of AlphaGo.
The logic / control flow of AlphaGo itself is not very complicated and is replicated here. The secret sauce of AlphaGo is in its various neural networks.
(As I understand it) AlphaGo uses three neural networks during play.

The first NN is a slow but accurate policy network. It is trained to predict human moves (~57% accuracy) and outputs a list of plausible moves, each with a probability attached. This first NN is used to seed the Monte Carlo tree search (MCTS) with plausible moves. It is slow partly because of its size, and partly because its inputs are various expensive-to-compute properties of the Go board (liberty counts, ataris, ladder status, etc.).

The second NN is a smaller, faster, but less accurate (~24% accuracy) policy network that doesn't use computed properties as input. Once a leaf node of the current MCTS tree is reached, this faster network plays the position out to the end with vaguely plausible moves, and the end position is scored.

The third NN is a value network: it outputs an estimate of the expected game outcome for that board (roughly, how likely the current player is to win), without attempting to play anything out. The result of the Monte Carlo playout using NN #2 and the value estimate from NN #3 are averaged, and this average is recorded as the approximate result for that MCTS node.
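The leaf evaluation described above can be sketched as follows. This is only an illustration under stated assumptions, not MuGo's actual code: a toy take-away game stands in for Go, and `first_legal_move` and `constant_value` are hypothetical stand-ins for NN #2 and NN #3. All function names are invented for this sketch.

```python
from typing import Callable, List, Tuple

# Toy stand-in for a Go position: (stones_left, player_to_move).
State = Tuple[int, int]


def legal_moves(state: State) -> List[int]:
    """A player may take 1 or 2 stones, never more than remain."""
    stones, _ = state
    return [m for m in (1, 2) if m <= stones]


def apply_move(state: State, move: int) -> State:
    """Remove the stones and pass the turn to the other player."""
    stones, player = state
    return (stones - move, 1 - player)


def final_score(state: State) -> float:
    """+1.0 if player 0 took the last stone, otherwise -1.0."""
    _, player_to_move = state
    last_mover = 1 - player_to_move
    return 1.0 if last_mover == 0 else -1.0


def rollout(state: State, fast_policy: Callable[[State, List[int]], int]) -> float:
    """Play the position out to the end with the fast policy, then score it."""
    while legal_moves(state):
        state = apply_move(state, fast_policy(state, legal_moves(state)))
    return final_score(state)


def evaluate_leaf(state: State, value_fn, fast_policy, mix: float = 0.5) -> float:
    """Blend the value estimate with one rollout; mix=0.5 is a plain average."""
    return (1.0 - mix) * value_fn(state) + mix * rollout(state, fast_policy)


# Hypothetical stand-ins for the two networks (illustration only).
def first_legal_move(state: State, moves: List[int]) -> int:
    return moves[0]


def constant_value(state: State) -> float:
    return 0.2


leaf_value = evaluate_leaf((3, 0), constant_value, first_legal_move)
```

Keeping `mix` as a parameter makes it easy to try weightings other than the plain average the text describes.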

Using the priors from NN #1 and the accumulating results of MCTS, a new path is chosen for further Monte Carlo exploration.
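One common way to combine priors with accumulated search results is a PUCT-style rule: each child's score is its mean value so far plus an exploration bonus that grows with its prior and shrinks as it is visited. The sketch below is a minimal version under my own assumptions. The `Node` class, the `c_puct` default, and the `+ 1` under the square root (so that priors break ties before any visit) are illustrative choices, not taken from MuGo. For simplicity, values are assumed to be from the perspective of the player choosing the move.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    prior: float                 # probability assigned by the policy network (NN #1)
    visits: int = 0              # number of evaluations backed up through this node
    value_sum: float = 0.0       # sum of those evaluation results
    children: dict = field(default_factory=dict)  # move -> Node

    @property
    def mean_value(self) -> float:
        return self.value_sum / self.visits if self.visits else 0.0


def select_child(node: Node, c_puct: float = 5.0):
    """Return the (move, child) pair with the highest PUCT-style score."""
    total_visits = sum(child.visits for child in node.children.values())

    def score(child: Node) -> float:
        exploration = c_puct * child.prior * math.sqrt(total_visits + 1) / (1 + child.visits)
        return child.mean_value + exploration

    return max(node.children.items(), key=lambda item: score(item[1]))


def backup(path, value: float) -> None:
    """Record one evaluation result on every node along the chosen path."""
    for node in path:
        node.visits += 1
        node.value_sum += value


root = Node(prior=1.0, children={"a": Node(prior=0.7), "b": Node(prior=0.3)})
first_move, first_child = select_child(root)  # nothing visited yet: highest prior wins
backup([root, first_child], value=-1.0)       # pretend the evaluation came back as a loss
second_move, _ = select_child(root)           # the poor result shifts the choice
```

In this toy run the first selection follows the prior, and a single bad result for that move is enough to send the next selection down the other branch.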

Link:
https://github.com/brilee/MuGo

Original post:
http://weibo.com/1402400261/EpP8YmzoQ?from=page_1005051402400261_profile
