[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"$fiPQHPSJnoM_K4eWcBN8v88GBMtS5p84B-l7t3GhCrDo":3},{"answer":4,"createTime":5,"id":6,"options":7,"origin":10,"question":13,"related":14,"source":24,"type":57},[],"2025-05-11 08:24:24",1060770964,[8,9],"True","False",{"courseImg":11,"courseName":12},"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002Fcf3bb414b5ea2367f316b2d3561124c7.jpg","[Shared Course] Artificial Intelligence","In the alpha-beta pruning algorithm, a MAX node can be pruned when its utility value is lower than the current alpha. ( )",[15,26,36,45,54,58,67,76,85,94],{"answer":16,"createTime":5,"id":17,"options":18,"question":23,"source":24,"type":25},[],1060769551,[19,20,21,22],"h(x)&le;h*(x)","h(x)&ne;h*(x)","h(x)&ge;h*(x)","h(x)&gt;h*(x)","Nodes in the OPEN list are ordered by the evaluation function f(x)=g(x)+h(x) (where g(x) is the actual cost already paid from the initial node to node x, and h(x) is the estimated cost of the optimal path from node x to the goal node), and the heuristic function is required to satisfy ( ); a state-space graph search algorithm of this kind is called the A* algorithm","v2",0,{"answer":27,"createTime":5,"id":28,"options":29,"question":34,"source":24,"type":35},[],1060769566,[30,31,32,33],"Monte Carlo method","Gradient descent","Newton's method","Temporal-difference method","In approximate policy evaluation for reinforcement learning, the methods used to compute the true value include ( )",1,{"answer":37,"createTime":5,"id":38,"options":39,"question":44,"source":24,"type":25},[],1060769817,[40,41,42,43],"\u003Cimg src=\"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002Fef44ff31cc6dc1652dfe3220bd87286d.jpg\">","\u003Cimg src=\"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002F11859f378500544dcb5038a169f9e605.jpg\">","\u003Cimg src=\"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002F22bbccb46dfdcb434edd5e654fb95087.jpg\">","\u003Cimg src=\"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002F203751151e20e8c52e9b45acfdb1ea2f.jpg\">","In value function approximation for reinforcement learning, the parameter update formula of the temporal-difference method is ( )",{"answer":46,"createTime":5,"id":47,"options":48,"question":53,"source":24,"type":35},[],1060770584,[49,50,51,52],"Value iteration started from random initial values converges to \u003Cimg src=\"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002Feb83dde30e38dd8e2deec363d6351a90.png\">, where \u003Cimg 
src=\"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002Fe82100bc61ebe5e50da86f6d69771a87.png\"> is the optimal policy","Q-learning uses an approximation of the optimal action-value function as its learning target, independent of the behavior policy, so it is off-policy","With a deterministic transition model, Q-learning converges to the optimal policy without any exploration","In an MDP, a larger discount factor (close to 1) means the agent places more weight on long-term return","Which of the following statements about MDPs and RL are correct? ( )",{"answer":55,"createTime":5,"id":6,"options":56,"question":13,"source":24,"type":57},[],[8,9],3,{"answer":59,"createTime":5,"id":60,"options":61,"question":66,"source":24,"type":25},[],1060771340,[62,63,64,65],"O(1)","O(\u003Cimg src=\"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002F5d860c080d2fcc6eda26b5d5e1923198.png\">)","O(\u003Cimg src=\"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002F097d5da3cee5a216801e06108e13a10d.png\">)","O(\u003Cimg src=\"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002F990554dd6b8fa36eb1b1ce788547a4e4.png\">)","If a backtracking search algorithm runs arc-consistency checking and applies MRV and LCV to select variables and values, what is the maximum number of times the algorithm may need to backtrack? ( )",{"answer":68,"createTime":5,"id":69,"options":70,"question":75,"source":24,"type":35},[],1060772253,[71,72,73,74],"\u003Cimg src=\"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002F7c25348ca03061b47c343421cb77300c.png\">","\u003Cimg src=\"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002F2547f9854c2507259552ba69a9dfecf4.png\">","\u003Cimg src=\"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002F2d8a1ecf160a1cd7cb65c8ef813a949a.png\">","\u003Cimg src=\"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002F4978dff94f32aeec97f23ad87e2868fa.png\">","If x and y are conditionally independent given z, which of the following formulas are correct? ( )",{"answer":77,"createTime":5,"id":78,"options":79,"question":84,"source":24,"type":25},[],1060772455,[80,81,82,83],"Assume a Markov decision problem (MDP) has finitely many states; then for \u003Cimg src=\"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002F48d35d2a0e26d154fdc0d252397f4ab9.png\">, if we change only the reward function R, the optimal policy remains unchanged","Assume a Markov decision problem (MDP) has finitely many states; if the discount factor \u003Cimg src=\"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002Fa1b2ddee715b730a9ff177ab1b2358ea.png\"> satisfies \u003Cimg 
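src=\"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002F6d93e4897668e26e38f5cb3b8fcae955.png\">, then value iteration is guaranteed to converge","Assume a Markov decision problem (MDP) has finitely many states; the policy found by value iteration is better than the policy found by policy iteration","If the only difference between two MDPs is the value of the discount factor, then they must have the same optimal policy","Which of the following statements about Markov decision problems (MDPs) is correct? ( )",{"answer":86,"createTime":5,"id":87,"options":88,"question":93,"source":24,"type":35},[],1060772489,[89,90,91,92],"If the initial probabilities are P(V0 = a) = P(V0 = b) = P(V0 = c) = 1\u002F3, then the final fixed point is P(Vn = a) = P(Vn = b) = P(Vn = c) = 1\u002F3","If the initial probabilities are P(V0 = a) = P(V0 = b) = P(V0 = c) = 1\u002F3, then the final fixed point is P(Vn = a) = P(Vn = b) = 1\u002F4, P(Vn = c) = 1\u002F2","If the initial probability is P(V0 = b) = 1.0, then the final fixed point is P(Vn = a) = P(Vn = b) = P(Vn = c) = 1\u002F3","If the initial probability is P(V0 = b) = 1.0, then the final fixed point is P(Vn = a) = P(Vn = b) = 1\u002F2, P(Vn = c) = 0","For the Markov model and corresponding transition probabilities shown below, which of the following statements are correct? ( )\u003Cimg src=\"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002F40c5f342e02caeb9582bad624e91aa71.png\">\u003Cimg src=\"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002F2b3df65e315e5a36c1a83237b5603d4d.png\">",{"answer":95,"createTime":5,"id":96,"options":97,"question":102,"source":24,"type":25},[],1060772595,[98,99,100,101],"0","10%","20%","80%","In the stochastic grid game shown in the figure below, the outcomes of the agent's actions are uncertain: with 80% probability the agent moves as planned, in the direction of the action; with 20% probability it moves in a direction perpendicular to the intended one. When the agent is at position (3,1) as shown and executes the up action, the probability that it ends up at position (2,1) is ( )\u003Cimg src=\"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002F6df9431af351a5d03b6a0d67d09f3f67.png\">"]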