Multi-agent decision problems in unknown environments are common. In such problems the agents usually hold different decision powers and face some form of the prisoner's dilemma. A general solution to this kind of complex decision problem is for the agents to cooperate and play a joint action. The asymmetric Nash bargaining solution is an attractive approach to such cooperative games among players with different powers. In this paper, a new multi-agent learning algorithm based on the asymmetric Nash bargaining solution is presented. Simulations are performed on a testbed of stochastic games. The experimental results demonstrate that the algorithm is fast and converges to a Pareto-optimal solution. Compared with learning algorithms based on non-cooperative equilibria, this approach is faster and avoids the troublesome problem of equilibrium selection.
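
To make the central concept concrete, the following is a minimal sketch of the asymmetric Nash bargaining solution on a one-shot matrix game. It is not the learning algorithm presented in this paper. It assumes the standard textbook definition, in which the chosen outcome maximizes the weighted product of each player's gain over a disagreement point, $\prod_i (u_i - d_i)^{w_i}$, and it simplifies further by searching only over pure joint actions. The function name `asymmetric_nash_bargaining`, the payoff tables, the disagreement points, and the weights are all hypothetical values chosen for illustration.

```python
def asymmetric_nash_bargaining(payoffs, disagreement, weights):
    """Pick the pure joint action maximizing prod_i (u_i - d_i) ** w_i.

    payoffs:      dict mapping a joint action to a tuple of per-agent payoffs
    disagreement: tuple of payoffs each agent gets if bargaining fails
    weights:      tuple of positive bargaining powers, one per agent
    All three inputs are illustrative assumptions, not taken from the paper.
    """
    best_action, best_value = None, float("-inf")
    for action, utilities in payoffs.items():
        gains = [u - d for u, d in zip(utilities, disagreement)]
        # Skip outcomes that leave some agent worse off than disagreement.
        if any(g < 0 for g in gains):
            continue
        value = 1.0
        for g, w in zip(gains, weights):
            value *= g ** w
        if value > best_value:
            best_action, best_value = action, value
    return best_action, best_value


# Hypothetical prisoner's dilemma; mutual defection is the disagreement point.
pd_payoffs = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}
pd_action, pd_value = asymmetric_nash_bargaining(pd_payoffs, (1, 1), (0.5, 0.5))

# Hypothetical game in which the bargaining powers change the outcome.
split_payoffs = {"a": (6, 1), "b": (3, 3), "c": (1, 6)}
equal_action, _ = asymmetric_nash_bargaining(split_payoffs, (0, 0), (0.5, 0.5))
skewed_action, _ = asymmetric_nash_bargaining(split_payoffs, (0, 0), (0.9, 0.1))
```

In the prisoner's dilemma example the sketch selects mutual cooperation, the Pareto-optimal joint action that individually rational play fails to reach. In the second example, equal weights select the balanced outcome `b`, while giving the first agent a weight of 0.9 shifts the selection to `a`, which illustrates how different powers enter the solution.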