[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"$fQORdQhs_DPOTLvZDtBbFcBV9zHc_YFc1wIg5WF6Bl-g":3},{"answer":4,"createTime":5,"id":6,"options":7,"origin":11,"question":14,"related":15,"source":25,"type":35},[],"2025-05-11 08:18:03",1060768923,[8,9,10],"-1","-2","\u003Cimg src=\"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002F0667569d70d702a708ffd70eafae0159.png\">",{"courseImg":12,"courseName":13},"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002Fcf3bb414b5ea2367f316b2d3561124c7.jpg","[Shared Course] Artificial Intelligence","An MDP has three states A, B, and C, and the only action available to the agent is to move right (\u003Cimg src=\"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002F314a42688fdce41a09ed9f49b8584a7e.png\">). The transition model is shown below. Q-learning is run on this model for infinitely many iterations. If the discount factor is 1 and the learning rate is 1, then \u003Cimg src=\"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002F1689b9d180a8ea9f0638df278b32f729.png\">( )\u003Cimg src=\"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002F03a647b1f55e3d7f0768ba11068dbf8f.png\">",[16,27,36,39,48,56,61,70,79,84],{"answer":17,"createTime":5,"id":18,"options":19,"question":24,"source":25,"type":26},[],1060768783,[20,21,22,23],"BFS","DFS","UCS","None","If a search tree has finite height and all step costs are non-negative, then after adding a positive cost c&gt;0 to every edge, the search path returned by ( ) among the following tree search algorithms remains unchanged","v2",1,{"answer":28,"createTime":5,"id":29,"options":30,"question":34,"source":25,"type":35},[],1060768829,[31,32,33],"\u003Cimg src=\"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002F97b167f3818a90dea33605a6ed34d7a7.png\">","\u003Cimg src=\"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002Fcedeec654add2b9a6a5a787694ce6f00.png\">","\u003Cimg src=\"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002Fb6e7c89a3f5b337c14d00444d8e0b40d.png\">","The optimal policy obtained by the \u003Cimg src=\"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002F29311cf92ac2797226068a7e6ae0bde8.png\">-greedy Q-learning algorithm is ( 
)",0,{"answer":37,"createTime":5,"id":6,"options":38,"question":14,"source":25,"type":35},[],[8,9,10],{"answer":40,"createTime":5,"id":41,"options":42,"question":47,"source":25,"type":26},[],1060769090,[43,44,45,46],"\u003Cimg src=\"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002F9b90370e5ec69b2b59be48507b6e3572.png\">","\u003Cimg src=\"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002F1d4d44977437618fde6664aceef8a95d.png\">","\u003Cimg src=\"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002Ff33327ccda535f9d90f8b9f6c47ef6d7.png\">","\u003Cimg src=\"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002F13d08f20bd847d79a33137bb55671741.png\">","Which of the following formulas are correct? ( )",{"answer":49,"createTime":5,"id":50,"options":51,"question":54,"source":25,"type":55},[],1060769135,[52,53],"True","False","Model-based reinforcement learning involves purely offline computation, whereas model-free reinforcement learning requires online interaction with the environment. ( )",3,{"answer":57,"createTime":5,"id":58,"options":59,"question":60,"source":25,"type":55},[],1060769164,[52,53],"Breadth-first search finds the path with the fewest steps and also guarantees that the path has minimum cost. ( )",{"answer":62,"createTime":5,"id":63,"options":64,"question":69,"source":25,"type":26},[],1060769641,[65,66,67,68],"Value iteration","State iteration","Policy iteration","Return iteration","In model-based reinforcement learning, the methods that solve the problem by dynamic programming are ( )",{"answer":71,"createTime":5,"id":72,"options":73,"question":78,"source":25,"type":35},[],1060769751,[74,75,76,77],"\u003Cimg src=\"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002F337926b18a7ceaabdfad5b2639b7f157.jpg\">","\u003Cimg src=\"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002F4380e14a56df3bb7de25cefb3358a2f9.jpg\">","\u003Cimg src=\"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002Fc960c31c4270d294fcb0f674bb6fc0af.jpg\">","\u003Cimg src=\"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002F5f869f433b69cf3430fc9bb56d268ccd.jpg\">","In value function approximation for reinforcement learning, the parameter update formula of the Monte Carlo method is ( )",{"answer":80,"createTime":5,"id":81,"options":82,"question":83,"source":25,"type":55},[],1060769796,[52,53],"Greedy search always finds the optimal solution, because it always generates and expands nodes in the direction that moves closer to the goal state. ( 
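)",{"answer":85,"createTime":5,"id":86,"options":87,"question":92,"source":25,"type":26},[],1060770085,[88,89,90,91],"Breadth-first search expands the earliest-generated nodes first","Depth-first search expands the earliest-generated nodes first","Depth-first search expands the most recently generated nodes first","Breadth-first search expands the most recently generated nodes first","The difference between breadth-first search and depth-first search is ( )"]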