sarsa

Web  撒尔沙 (Sarsa); 沙士 (sarsaparilla soft drink); 洋菝葜 (sarsaparilla)

Bilingual example sentences

  1. SARSA(λ) Algorithm of Reinforcement Learning Based on States Clustering
    一种基于状态聚类的SARSA(λ)强化学习算法
  2. Learning in this method is divided into two processes: state-space learning, which uses the K-means clustering algorithm to adaptively discretize the continuous state space, and policy learning, which uses the Sarsa algorithm with replacing eligibility traces to find the optimal policy.
    该方法的学习过程分为两部分:对连续状态空间进行自适应离散化的状态空间学习,使用K-均值聚类算法;寻找最优策略的策略学习,使用替代合适迹Sarsa学习算法。
  3. Reinforcement learning and two classes of learning algorithms, Q-learning and SARSA learning, are introduced. A class of state discretization methods based on RBF functions is proposed for reinforcement learning, and preliminary empirical results are presented to compare the performance of the new method.
    介绍了激励学习和两类学习算法:Q学习和SARSA学习,提出一类基于RBF函数的特征状态离散化方法,并对该方法进行了初步的实验比较。
  4. Sarsa Reinforcement Learning Algorithm Based on Neural Networks
    基于神经网络的Sarsa强化学习算法
  5. Based on eligibility trace theory, a delayed fast reinforcement learning algorithm DFSARSA(λ) is proposed in this paper.
    在对资格迹理论研究的基础上,提出了一种延迟快速强化学习算法DFSARSA(λ)(延迟快速SARSA(λ)算法)。
  6. Based on the factored representation of a state, a new SARSA(λ) algorithm is proposed.
    基于状态的因素化表达,提出了一个新的SARSA(λ)激励学习算法。
  7. Moreover, when the players have a 360-degree view and the problem is not too large, the policy learned by Actor-Critic is better than that learned by Sarsa(λ), a value-based reinforcement learning method.
    对于小的问题,球员在360度视角下,通过Actor-Critic强化学习方法得到的策略比基于值函数强化学习方法Sarsa(λ)得到的策略要好。
  8. Conventional reinforcement learning methods such as Q-learning, TD learning, and Sarsa learning share a common characteristic: they estimate only the value function, and action selection is determined entirely by the value function estimate.
    强化学习中常用算法如Q-学习、TD学习、Sarsa学习的一个共同特点是仅对值函数进行估计,动作选择策略则由值函数的估计完全确定。