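[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"$fPJOph5RFvZBOHzHHkU5vh6C-e3X53Il3inrmyq4RGi8":3},{"answer":4,"createTime":5,"id":6,"options":7,"origin":12,"question":15,"related":16,"source":26,"type":27},[],"2024-11-25 08:25:44",999757205,[8,9,10,11],"Lowest cost","Smallest depth","Greatest depth","Highest cost",{"courseImg":13,"courseName":14},"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002Fcf3bb414b5ea2367f316b2d3561124c7.jpg","[Shared Course] Artificial Intelligence","In uniform-cost search, the node with ( ) is always selected for expansion",[17,28,38,46,55,60,69,72,81,90],{"answer":18,"createTime":5,"id":19,"options":20,"question":25,"source":26,"type":27},[],999757103,[21,22,23,24],"\u003Cimg src=\"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002F2805907a1e7b9a0547b332877297e4ae.png\">","\u003Cimg src=\"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002F8a95a2284af97c60bbac298893a22bf8.png\">","\u003Cimg src=\"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002F4ba696f660ebea22ad4fedd4feffe342.png\">","\u003Cimg src=\"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002Fe00b40e10d3a0eef658f14bc64d5a6c0.png\">","A generalization of Q-learning. Suppose an MDP has state space S, action space A, reward function R(s, a, s'), and discount factor \u003Cimg src=\"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002F6494510fd9def2a2b5ff2ece65f0aa59.png\">. Our ultimate goal is to learn a policy that a robot can use in the real world. However, we only have access to data from simulation software, not from the real robot. The simulator is built on the transition model \u003Cimg src=\"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002F11957ae58492f97996dbe380ad9ef63e.png\">, which differs from the real robot's transition model \u003Cimg src=\"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002Fd445a3eb7e7399e60d0b14c9be70fd97.png\">. Without changing the simulator, we want to use samples drawn from the simulator to learn the Q-values of the real robot. The Q-learning update can be written as: \u003Cimg src=\"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002F064d3e5c1d221745132312e5fea65740.png\"> Assuming the samples are drawn from the simulator, the Q-value update that learns the real-world Q-values is: ( )","v2",0,{"answer":29,"createTime":5,"id":30,"options":31,"question":36,"source":26,"type":37},[],999757116,[32,33,34,35],"BFS","DFS","UCS","None","If a search tree has finite height and all step costs are non-negative, then after multiplying the cost of every edge by a positive constant w&gt;0, the search path returned by ( ) among the following tree search algorithms remains unchanged",1,{"answer":39,"createTime":5,"id":40,"options":41,"question":44,"source":26,"type":45},[],999757128,[42,43],"True","False","Model-based reinforcement learning involves purely offline computation, whereas model-free reinforcement learning requires online interaction with the environment. ( )",3,{"answer":47,"createTime":5,"id":48,"options":49,"question":54,"source":26,"type":37},[],999757155,[50,51,52,53],"h(x) is the estimated cost of the optimal path from node x to the goal node","h(x) is the actual cost from node x to the goal node","g(x) is the actual cost from the initial node to node x","g(x) is the estimated cost of the optimal path from the initial node to node x","For g(x) and h(x) in the evaluation function, the correct descriptions below are ( )",{"answer":56,"createTime":5,"id":57,"options":58,"question":59,"source":26,"type":45},[],999757175,[42,43],"Greedy search is guaranteed to find the optimal solution, because it always generates and expands nodes in the direction that moves closer to the goal state. ( )",{"answer":61,"createTime":5,"id":62,"options":63,"question":68,"source":26,"type":37},[],999757195,[64,65,66,67],"The defining feature of breadth-first search is that nodes generated first are expanded first","The defining feature of depth-first search is that nodes generated first are expanded first","The defining feature of depth-first search is that the most recently generated node is expanded first","The defining feature of breadth-first search is that the most recently generated node is expanded first","The differences between breadth-first search and depth-first search are ( )",{"answer":70,"createTime":5,"id":6,"options":71,"question":15,"source":26,"type":27},[],[8,9,10,11],{"answer":73,"createTime":5,"id":74,"options":75,"question":80,"source":26,"type":37},[],999757207,[76,77,78,79],"Value iteration started from random initial values can converge to \u003Cimg src=\"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002Feb83dde30e38dd8e2deec363d6351a90.png\">, where \u003Cimg src=\"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002Fe82100bc61ebe5e50da86f6d69771a87.png\"> is the optimal policy","Q-learning uses an approximation of the optimal action-value function as its learning target, independent of the behavior policy, and is therefore off-policy","With a deterministic transition model, Q-learning can converge to the optimal policy without exploration","In an MDP, a large discount factor (close to 1) means the agent places more weight on long-term return","Among the following statements about MDPs and RL, the correct ones are ( )",{"answer":82,"createTime":5,"id":83,"options":84,"question":89,"source":26,"type":27},[],999757213,[85,86,87,88],"The action-path costs of the goal states are all the same","A constraint satisfaction problem has an optimal solution","During search, backtracking occurs because some conflict prevents the search from proceeding","Forward checking is a method that removes infeasible values in advance","Regarding constraint satisfaction problems, the incorrect statement below is ( )",{"answer":91,"createTime":5,"id":92,"options":93,"question":94,"source":26,"type":45},[],999757224,[42,43],"A negative living reward can always be expressed using a discount factor less than 1. ( )"]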