[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"$f-w6cIsMHVUgSqU4S8G1UdsodhuDtTqzbSTj9vOFZhhQ":3},{"answer":4,"createTime":5,"id":6,"options":7,"origin":10,"question":16,"related":17,"source":23,"type":24},[],"2024-10-25 12:41:08",999757523,[8,9],"对","错",{"courseId":11,"courseImg":12,"courseName":13,"workId":14,"workName":15},"1000000860","https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002Fcf3bb414b5ea2367f316b2d3561124c7.jpg","[共享课]人工智能","59023989","第五章单元测试","当在一个MDP中只执行有限数量的步骤时,最优策略是平稳的.平稳的策略是指在给定状态下采取相同操作的策略,与智能体处于该状态的时间无关.( )",[18,25,30,35,40],{"answer":19,"createTime":5,"id":20,"options":21,"question":22,"source":23,"type":24},[],999757507,[8,9],"假设马尔可夫决策问题(MDP)的状态是有限的,则对于\u003Cimg src=\"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002F5b0e53fe3fdd717a53e7667ea982209c.png\">,如果我们只改变奖励函数R,最优策略会保持不变.( )","v2",3,{"answer":26,"createTime":5,"id":27,"options":28,"question":29,"source":23,"type":24},[],999757513,[8,9],"假设马尔可夫决策问题(MDP)的状态是有限的,若衰减因子\u003Cimg src=\"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002F22c8c0aa4b236c64c83dc83a0e71fe55.png\">满足\u003Cimg src=\"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002F91aa50d39ebdc1676c11481226187853.png\">,则值迭代一定会收敛.( )",{"answer":31,"createTime":5,"id":32,"options":33,"question":34,"source":23,"type":24},[],999757514,[8,9],"假设马尔可夫决策问题(MDP)的状态是有限的,通过值迭代找到的策略优于通过策略迭代找到的策略.( )",{"answer":36,"createTime":5,"id":37,"options":38,"question":39,"source":23,"type":24},[],999757520,[8,9],"如果两个MDP之间的唯一差异是衰减因子的值,那么它们一定拥有相同的最优策略.( )",{"answer":41,"createTime":5,"id":6,"options":42,"question":16,"source":23,"type":24},[],[8,9]]