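[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"$f33VYCqQ7lm6sLxQ9Nt2PnOOnXeXWeidoOt8_dIyrGGc":3},{"answer":4,"createTime":5,"id":6,"options":7,"origin":12,"question":15,"related":16,"source":20,"type":21},[],"2025-05-11 08:21:53",1060768724,[8,9,10,11],"\u003Cimg src=\"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002F2805907a1e7b9a0547b332877297e4ae.png\">","\u003Cimg src=\"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002F8a95a2284af97c60bbac298893a22bf8.png\">","\u003Cimg src=\"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002F4ba696f660ebea22ad4fedd4feffe342.png\">","\u003Cimg src=\"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002Fe00b40e10d3a0eef658f14bc64d5a6c0.png\">",{"courseImg":13,"courseName":14},"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002Fcf3bb414b5ea2367f316b2d3561124c7.jpg","[Shared Course] Artificial Intelligence","A generalization of Q-learning. Suppose an MDP has state space S, action space A, reward function R(s, a, s'), and discount factor \u003Cimg src=\"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002F6494510fd9def2a2b5ff2ece65f0aa59.png\">. Our ultimate goal is to learn a policy that a robot can use in the real world. However, we only have access to data from a simulator, not from the real robot. The simulator is built on the transition model \u003Cimg src=\"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002F11957ae58492f97996dbe380ad9ef63e.png\">, which differs from the real robot's transition model \u003Cimg src=\"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002Fd445a3eb7e7399e60d0b14c9be70fd97.png\">. Without modifying the simulator, we want to use samples drawn from it to learn the Q-values of the real robot. The Q-learning update can be written as: \u003Cimg src=\"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002F064d3e5c1d221745132312e5fea65740.png\"> Assuming the samples are drawn from the simulator, the Q-value update that learns the real-world Q-values is: ( )",[17,22,32,41,46,51,60,69,78,87],{"answer":18,"createTime":5,"id":6,"options":19,"question":15,"source":20,"type":21},[],[8,9,10,11],"v2",0,{"answer":23,"createTime":5,"id":24,"options":25,"question":30,"source":20,"type":31},[],1060769606,[26,27,28,29],"h(x) is the estimated cost of the optimal path from node x to the goal node","h(x) is the actual cost from node x to the goal node","g(x) is the actual cost from the initial node to node x","g(x) is the estimated cost of the optimal path from the initial node to node x","In the evaluation function, which of the following descriptions of g(x) and h(x) are correct? ( )",1,{"answer":33,"createTime":34,"id":35,"options":36,"question":39,"source":20,"type":40},[],"2025-05-11 08:21:54",1060769704,[37,38],"True","False","Stochastic approximation techniques such as likelihood weighting and Markov chain Monte Carlo can give reasonable estimates of the true posterior probabilities in a network, and can handle much larger networks than exact algorithms. ( )",3,{"answer":42,"createTime":5,"id":43,"options":44,"question":45,"source":20,"type":40},[],1060770838,[37,38],"When there are many state variables, particle filtering, an efficient exact inference algorithm, can be used. ( )",{"answer":47,"createTime":34,"id":48,"options":49,"question":50,"source":20,"type":40},[],1060770859,[37,38],"A negative living reward can always be represented by a discount factor less than 1. ( )",{"answer":52,"createTime":5,"id":53,"options":54,"question":59,"source":20,"type":31},[],1060770879,[55,56,57,58],"A transition model describing how the state evolves","A sensor model describing the observation process","A probability model of the distribution of the observation variables","The joint probability distribution of the state variables","A temporal probability model includes ( )",{"answer":61,"createTime":5,"id":62,"options":63,"question":68,"source":20,"type":21},[],1060770925,[64,65,66,67],"Directed cyclic graph","Directed acyclic graph","Undirected cyclic graph","Undirected acyclic graph","A Bayesian network is a ( )",{"answer":70,"createTime":5,"id":71,"options":72,"question":77,"source":20,"type":21},[],1060771314,[73,74,75,76],"Direct sampling samples parent nodes first, then child variables","Rejection sampling is suited to computing conditional probabilities; during generation it rejects samples that are inconsistent with the evidence variables","Likelihood weighting fixes the evidence variables and determines the weight as the product of the conditional probabilities of the non-evidence variables given their parents","Gibbs sampling is a special form of Markov chain Monte Carlo","Which of the following statements about sampling algorithms is incorrect? ( )",{"answer":79,"createTime":34,"id":80,"options":81,"question":86,"source":20,"type":31},[],1060771432,[82,83,84,85],"Removing Y does not change the posterior probabilities of the other unobserved variables","If Y has no children, removing Y does not change the posterior probabilities of the other variables. Otherwise, removing Y affects the posterior probabilities of Y's descendants","After removing Y, rejection sampling can still be used","After removing Y, likelihood weighting can still be used","Y is an unobserved variable in a Bayesian network, and every variable in its Markov blanket MB(Y) is observed. Which of the following statements are correct? ( )",{"answer":88,"createTime":5,"id":89,"options":90,"question":95,"source":20,"type":21},[],1060771493,[91,92,93,94],"The initial factor containing G is P(G|B,C), which has dimension 3 and 8 entries","Eliminating B first produces the factor of largest dimension, f(A,F,G,C)","To make the first generated factor as small as possible in dimension, we can eliminate D or F first","F,B,C,G,A is one of the optimal elimination orderings","In the Bayesian network shown in the figure, every variable takes values in {-1,0,1} and the query is P(D|e=0). Which of the following statements is correct? ( ).\u003Cimg src=\"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002F17b45af3a92960fbbfe74cd168b50908.png\">"]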