[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"$fHY1Oho1sgCgR0JtLiBDbn8z7cjOB-ums9L-ei8Xs_t0":3},{"answer":4,"createTime":5,"id":6,"options":7,"origin":11,"question":14,"related":15,"source":23,"type":43},[],"2024-11-25 08:15:23",999757200,[8,9,10],"In model-based reinforcement learning, the transition model and the reward model are explicitly known, whereas in model-free reinforcement learning both are unknown","In model-based reinforcement learning we perform exploration, whereas in model-free reinforcement learning we do not","Model-based reinforcement learning is off-policy, whereas model-free reinforcement learning is on-policy",{"courseImg":12,"courseName":13},"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002Fcf3bb414b5ea2367f316b2d3561124c7.jpg","[Shared Course] Artificial Intelligence","Which of the following statements is correct? ( )",[16,25,35,44,53,61,70,75,84,93],{"answer":17,"createTime":5,"id":18,"options":19,"question":22,"source":23,"type":24},[],999757106,[20,21],"True","False","Depth-first search has lower space complexity, while breadth-first search has lower time complexity and is also more robust. ( )","v2",3,{"answer":26,"createTime":5,"id":27,"options":28,"question":33,"source":23,"type":34},[],999757107,[29,30,31,32],"BFS","DFS","UCS","None","If a search tree has finite height and all step costs are non-negative, then after a positive cost c&gt;0 is added to every edge, the search path returned by which of the following tree search algorithms ( ) remains unchanged?",1,{"answer":36,"createTime":5,"id":37,"options":38,"question":42,"source":23,"type":43},[],999757110,[39,40,41],"\u003Cimg src=\"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002F97b167f3818a90dea33605a6ed34d7a7.png\">","\u003Cimg src=\"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002Fcedeec654add2b9a6a5a787694ce6f00.png\">","\u003Cimg src=\"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002Fb6e7c89a3f5b337c14d00444d8e0b40d.png\">","The optimal policy obtained with the \u003Cimg src=\"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002F29311cf92ac2797226068a7e6ae0bde8.png\">-greedy Q-learning algorithm is ( )",0,{"answer":45,"createTime":5,"id":46,"options":47,"question":52,"source":23,"type":43},[],999757113,[48,49,50,51],"Transitivity holds: for an agent, if A&gt;B and B&gt;C, then A&gt;C","Orderability holds: if A&gt;B and B&gt;A, then A~B","The utility of an action is not necessarily maximized","There exists a real-valued function U such that \u003Cimg src=\"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002Fe3883e6d09bc4f75777862ec536d84d5.png\"> \u003Cimg src=\"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002F79bee72768ba32b40111de4601c48f69.png\">","Which of the following conditions is NOT satisfied by rational preferences? ( )",
{"answer":54,"createTime":5,"id":55,"options":56,"question":60,"source":23,"type":43},[],999757118,[57,58,59],"-1","-2","\u003Cimg src=\"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002F0667569d70d702a708ffd70eafae0159.png\">","An MDP has three states, A, B, and C, and the action available to the agent is moving right (\u003Cimg src=\"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002F314a42688fdce41a09ed9f49b8584a7e.png\">). The transition model is shown below. Based on it, we run Q-learning for infinitely many iterations. If the discount factor is 1 and the learning rate is 1, then \u003Cimg src=\"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002F1689b9d180a8ea9f0638df278b32f729.png\">( )\u003Cimg src=\"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002F03a647b1f55e3d7f0768ba11068dbf8f.png\">",{"answer":62,"createTime":5,"id":63,"options":64,"question":69,"source":23,"type":34},[],999757157,[65,66,67,68],"Value iteration","State iteration","Policy iteration","Return iteration","In model-based reinforcement learning, which of the following are dynamic programming solution methods? ( )",{"answer":71,"createTime":5,"id":72,"options":73,"question":74,"source":23,"type":24},[],999757162,[20,21],"Stochastic approximation techniques such as likelihood weighting and Markov chain Monte Carlo can produce reasonable estimates of a network's true posterior probabilities and can handle much larger networks than exact algorithms. ( )",{"answer":76,"createTime":5,"id":77,"options":78,"question":83,"source":23,"type":43},[],999757176,[79,80,81,82],"\u003Cimg src=\"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002Fef44ff31cc6dc1652dfe3220bd87286d.jpg\">","\u003Cimg src=\"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002F11859f378500544dcb5038a169f9e605.jpg\">","\u003Cimg src=\"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002F22bbccb46dfdcb434edd5e654fb95087.jpg\">","\u003Cimg src=\"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002F203751151e20e8c52e9b45acfdb1ea2f.jpg\">","In value function approximation for reinforcement learning, the parameter update rule of the temporal-difference method is ( )",{"answer":85,"createTime":5,"id":86,"options":87,"question":92,"source":23,"type":34},[],999757192,[88,89,90,91],"The path back from the goal must be remembered during the search","It is a method for finding paths in a graph","Each node of the graph corresponds to a state, and each edge corresponds to an operator","The search must remember which nodes have already been visited","Which of the following statements about graph search strategies are correct? ( )",
{"answer":94,"createTime":5,"id":95,"options":96,"question":101,"source":23,"type":34},[],999757198,[97,98,99,100],"Reduces memory consumption","Makes computation more complex","Reduces the need for experience","Increases the need for sampling","In reinforcement learning, the characteristics of generalized representations include ( )"]