Millimeter-wave (mmW) spectrum is a major candidate to support the high data rates of 5G systems. However, due to the directionality of mmW communication systems, misalignments between the transmit and receive beams occur frequently, which makes link maintenance particularly challenging and motivates the need for fast and efficient beam tracking. In this paper, we propose a multi-armed bandit framework, called MAMBA, for beam tracking in mmW systems. We develop a reinforcement learning algorithm, called adaptive Thompson sampling (ATS), that MAMBA uses to select appropriate beams and the transmission rates along these beams. ATS starts from prior beam-quality information collected during initial access and updates it whenever ACK/NACK feedback is received from the user. The beam and rate for the next downlink transmission are then selected based on the updated posterior distributions. Because it is model-free, ATS can accurately estimate the best beam/rate pair without making assumptions about temporal channel variations and/or user mobility. We conduct extensive experiments in the 28 GHz band using a 4x8 phased-array antenna to validate the efficiency of ATS, and show that it improves link throughput by up to 182% compared to the beam management scheme proposed for 5G.
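To make the posterior-update loop described above concrete, the following is a minimal sketch of textbook Beta-Bernoulli Thompson sampling over (beam, rate) pairs. It is not the ATS algorithm itself: the adaptive component of ATS is not specified in this abstract and is not reproduced here. The class name `BeamRateThompsonSampler`, the `prior` argument (a hypothetical stand-in for beam-quality information from initial access), the expected-throughput score (rate times sampled ACK probability), and the simulated ACK probabilities are all assumptions made for illustration.

```python
import random


class BeamRateThompsonSampler:
    """Beta-Bernoulli Thompson sampling over (beam, rate) arms (illustrative only)."""

    def __init__(self, beams, rates, prior=None, seed=None):
        self.rng = random.Random(seed)
        self.arms = [(b, r) for b in beams for r in rates]
        prior = prior or {}
        # One Beta(alpha, beta) posterior per arm; Beta(1, 1) is the uniform default.
        # `prior` is a hypothetical hook for initial-access beam-quality information.
        self.params = {arm: list(prior.get(arm, (1.0, 1.0))) for arm in self.arms}

    def select(self):
        """Sample an ACK probability per arm and pick the best sampled throughput."""
        def score(arm):
            alpha, beta = self.params[arm]
            return arm[1] * self.rng.betavariate(alpha, beta)
        return max(self.arms, key=score)

    def update(self, arm, ack):
        """Update the chosen arm's posterior: ACK counts as success, NACK as failure."""
        if ack:
            self.params[arm][0] += 1.0
        else:
            self.params[arm][1] += 1.0


def simulate(true_success, steps=2000, seed=0):
    """Run the sampler against fixed, made-up ACK probabilities.

    `true_success` maps every (beam, rate) combination to an ACK probability.
    """
    beams = sorted({b for b, _ in true_success})
    rates = sorted({r for _, r in true_success})
    sampler = BeamRateThompsonSampler(beams, rates, seed=seed)
    env = random.Random(seed + 1)  # separate generator for the simulated channel
    counts = {arm: 0 for arm in sampler.arms}
    for _ in range(steps):
        arm = sampler.select()
        ack = env.random() < true_success[arm]
        sampler.update(arm, ack)
        counts[arm] += 1
    return sampler, counts


if __name__ == "__main__":
    # Made-up ACK probabilities for 3 beams and 2 rates (arbitrary rate units).
    truth = {
        (0, 1.0): 0.90, (0, 2.0): 0.20,
        (1, 1.0): 0.95, (1, 2.0): 0.90,
        (2, 1.0): 0.50, (2, 2.0): 0.10,
    }
    _, counts = simulate(truth)
    for arm, n in sorted(counts.items()):
        print(arm, n)
```

In this toy setting the channel is stationary, so the posteriors simply concentrate on the pair with the highest expected throughput. Tracking a moving user would additionally require some way to discount stale feedback, which is the kind of adaptation this sketch deliberately leaves out.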