[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"$f3PdRNGu-Wt7TqkHmdcZQ22rXKp3H-ROJn0hzT40058Y":3},{"answer":4,"createTime":5,"id":6,"options":7,"origin":12,"question":18,"related":19,"source":29,"type":30},[],"2024-05-09 08:56:22",986807594,[8,9,10,11],"Evaluative learning, labeled-data learning, end-to-end learning","Labeled-data learning, end-to-end learning, end-to-end learning","Evaluative learning, end-to-end learning, end-to-end learning","Unlabeled learning, labeled-data learning, end-to-end learning",{"courseId":13,"courseImg":14,"courseName":15,"workId":16,"workName":17},"1000076607","https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002Fd7ea7086329261ddbccb0e9e00c955b1.jpg","[Smart Shared Course] Introduction to Artificial Intelligence","57250204","Chapter 7 Unit Test","Which of the following descriptions of reinforcement learning, supervised learning, and deep convolutional neural network learning is correct? ( )",[20,31,40,49,58],{"answer":21,"createTime":5,"id":22,"options":23,"question":28,"source":29,"type":30},[],986807312,[24,25,26,27],"Policy optimization","Value function","Action-value function","Sampling function","Within the scope of this chapter, the statement &quot;in state \u003Cimg src=\"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002F5e543256c480ac577d30f76f9120eb74.webp\">, the expected return obtained in the future after acting according to some policy&quot; describes the ( B ) of state \u003Cimg src=\"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002F5e543256c480ac577d30f76f9120eb74.webp\">; the statement &quot;in state \u003Cimg src=\"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002F5e543256c480ac577d30f76f9120eb74.webp\">, the expected return obtained in the future after taking action \u003Cimg src=\"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002F5e543256c480ac577d30f76f9120eb74.webp\"> according to some policy&quot; describes the ( ) of state \u003Cimg src=\"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002F5e543256c480ac577d30f76f9120eb74.webp\">","v2",0,{"answer":32,"createTime":5,"id":33,"options":34,"question":39,"source":29,"type":30},[],986807439,[35,36,37,38],"Feedback","Action","Terminal state","Transition probability matrix","Compared with a Markov reward process, which new element does a Markov decision process introduce? ( )",{"answer":41,"createTime":5,"id":42,"options":43,"question":48,"source":29,"type":30},[],986807486,[44,45,46,47],"\u003Cimg src=\"https:\u002F\u002Ftihai-oss-cloud.itihey.com\u002Fimg\u002F5e543256c480ac577d30f76f9120eb74.webp\">-greedy policy","Monte Carlo sampling","Dynamic programming","Bellman equation","In reinforcement learning, which mechanism gives reinforcement learning the ability to balance exploitation and exploration? ( )",{"answer":50,"createTime":5,"id":51,"options":52,"question":57,"source":29,"type":30},[],986807542,[53,54,55,56],"Value function computation and action-value function computation","Dynamic programming and Q-Learning","Greedy policy optimization and Q-learning","Policy optimization and policy evaluation","In reinforcement learning, which two steps are iterated to learn the optimal policy? ( )",{"answer":59,"createTime":5,"id":6,"options":60,"question":18,"source":29,"type":30},[],[8,9,10,11]]