[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"$faCQw2R-ygd-4zlSL7ra58yeJvQSV29soiSv0FA_wT58":3},{"answer":4,"createTime":5,"id":6,"options":7,"origin":12,"question":15,"related":16,"source":24,"type":44},[],"2024-11-25 08:15:23",999757176,[8,9,10,11],"\u003Cimg src=\"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002Fef44ff31cc6dc1652dfe3220bd87286d.jpg\">","\u003Cimg src=\"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002F11859f378500544dcb5038a169f9e605.jpg\">","\u003Cimg src=\"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002F22bbccb46dfdcb434edd5e654fb95087.jpg\">","\u003Cimg src=\"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002F203751151e20e8c52e9b45acfdb1ea2f.jpg\">",{"courseImg":13,"courseName":14},"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002Fcf3bb414b5ea2367f316b2d3561124c7.jpg","[共享课]人工智能","在强化学习值函数近似中,时序差分方法对参数的更新公式是( )",[17,26,36,45,54,62,71,76,79,88],{"answer":18,"createTime":5,"id":19,"options":20,"question":23,"source":24,"type":25},[],999757106,[21,22],"对","错","深度优先搜索的空间复杂度更小,而广度优先算法的时间复杂度更小,而且更健壮.( )","v2",3,{"answer":27,"createTime":5,"id":28,"options":29,"question":34,"source":24,"type":35},[],999757107,[30,31,32,33],"BFS","DFS","UCS","无","若一搜索树的树高有限且所有单步损耗均非负,则为每条边增加一正损耗c&gt;0,以下树搜索算法中( )所得搜索路径保持不变",1,{"answer":37,"createTime":5,"id":38,"options":39,"question":43,"source":24,"type":44},[],999757110,[40,41,42],"\u003Cimg src=\"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002F97b167f3818a90dea33605a6ed34d7a7.png\">","\u003Cimg src=\"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002Fcedeec654add2b9a6a5a787694ce6f00.png\">","\u003Cimg src=\"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002Fb6e7c89a3f5b337c14d00444d8e0b40d.png\">","使用\u003Cimg src=\"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002F29311cf92ac2797226068a7e6ae0bde8.png\">-贪心Q-learning算法得到的最优策略是( )",0,{"answer":46,"createTime":5,"id":47,"options":48,"question":53,"source":24,"type":44},[],999757113,[49,50,51,52],"存在着传递性,对于智能体如果A&gt;B, B&gt;C,则A&gt;C","存在有序性,若A&gt;B, B&gt;A,则A~B","行为的效用值不一定是最大化的","存在一个实值函数U,使得\u003Cimg src=\"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002Fe3883e6d09bc4f75777862ec536d84d5.png\"> \u003Cimg src=\"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002F79bee72768ba32b40111de4601c48f69.png\">","理性的倾向选择不满足的条件是( )",{"answer":55,"createTime":5,"id":56,"options":57,"question":61,"source":24,"type":44},[],999757118,[58,59,60],"-1","-2","\u003Cimg src=\"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002F0667569d70d702a708ffd70eafae0159.png\">","一个MDP问题中有A,B,C这三个状态,智能体可以执行的动作是向右(\u003Cimg src=\"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002F314a42688fdce41a09ed9f49b8584a7e.png\">),转移模型如下.我们据此完成无限次迭代的Q-learning.若衰减因子为1,学习率为1,则\u003Cimg src=\"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002F1689b9d180a8ea9f0638df278b32f729.png\">( )\u003Cimg src=\"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002F03a647b1f55e3d7f0768ba11068dbf8f.png\">",{"answer":63,"createTime":5,"id":64,"options":65,"question":70,"source":24,"type":35},[],999757157,[66,67,68,69],"值迭代方法","状态迭代方法","策略迭代方法","回报迭代方法","在有模型的强化学习中,属于动态规划求解的是( )",{"answer":72,"createTime":5,"id":73,"options":74,"question":75,"source":24,"type":25},[],999757162,[21,22],"似然权重、马尔可夫蒙特卡洛方法这样的随机近似技术,可以对网络的真实后验概率进行合理估计,并能够比精确算法处理规模大得多的网络.( )",{"answer":77,"createTime":5,"id":6,"options":78,"question":15,"source":24,"type":44},[],[8,9,10,11],{"answer":80,"createTime":5,"id":81,"options":82,"question":87,"source":24,"type":35},[],999757192,[83,84,85,86],"搜索过程中必须记住从目标返回的路径","是一种在图中寻找路径的方法","图的每个节点对应一个状态,每条连线对应一个操作符","搜索过程中必须记住哪些点走过了","下列关于图搜索策略说法正确的是( )",{"answer":89,"createTime":5,"id":90,"options":91,"question":96,"source":24,"type":35},[],999757198,[92,93,94,95],"降低内存消耗","计算更加复杂","减少对经验的需求","增加对采样的需求","强化学习中,泛化表示的特点有( )"]