The use of highly directional antennas in millimeter wave (mmWave) cellular networks necessitates precise beam alignment between a base station (BS) and user equipment (UE). Achieving this alignment requires beam sweeping over a large number of directions and causes high initial access (IA) delay. Intuitively, this delay can be lowered by using wider beams, since fewer directions need to be swept. However, wider beams provide lower antenna gain, which weakens the received signal and raises the misdetection probability; this in turn increases the IA delay, because more rounds of beam sweeping are required to discover a UE. In this paper, we propose a multi-armed bandit approach to beamwidth optimization in 5G New Radio (NR) mmWave cellular networks. We aim to find the beamwidths at the BS and the UE that minimize the beam sweeping delay for a successful IA. We first formulate the beamwidth optimization problem by analyzing the interplay among beamwidth, beam sweeping overhead, and misdetection probability. We then propose a two-stage solution framework. In the first stage, an initial solution for the BS beamwidth and the optimal UE beamwidth are derived. In the second stage, each BS learns its optimal beamwidth by solving a multi-armed bandit problem with a Thompson sampling-based algorithm. Our extensive simulation results show that the proposed algorithms can decrease the IA delay by more than 50% compared to traditional fixed-beamwidth schemes.
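To make the second-stage idea concrete, the following is a minimal sketch of Thompson sampling over a discrete set of candidate BS beamwidths. It is not the algorithm proposed in this paper. It assumes a toy model in which each sweep round either detects the UE or not (a Bernoulli outcome), the number of swept directions is the sector angle divided by the beamwidth, and the expected delay is proportional to the number of directions divided by the detection probability. The function names, the candidate beamwidths, and the detection probabilities are all hypothetical and chosen only for illustration.

```python
import math
import random


def sweep_directions(beamwidth_deg, sector_deg=360.0):
    """Number of directions swept per round (assumed: sector / beamwidth)."""
    return math.ceil(sector_deg / beamwidth_deg)


def thompson_beamwidth(beamwidths, detect_prob, rounds=2000, seed=0):
    """Toy Thompson sampling over candidate beamwidths (the arms).

    detect_prob[i] is the hypothetical true per-round detection probability
    of arm i; the learner does not see it and only observes Bernoulli
    detection outcomes. Returns how many times each arm was selected.
    """
    rng = random.Random(seed)
    n = len(beamwidths)
    alpha = [1.0] * n  # Beta posterior parameter: 1 + observed detections
    beta = [1.0] * n   # Beta posterior parameter: 1 + observed misdetections
    pulls = [0] * n

    for _ in range(rounds):
        # Draw one plausible detection probability per arm from its posterior.
        samples = [rng.betavariate(a, b) for a, b in zip(alpha, beta)]
        # Assumed delay proxy: directions per round times expected rounds (1/p).
        costs = [
            sweep_directions(w) / max(p, 1e-9)
            for w, p in zip(beamwidths, samples)
        ]
        # Act greedily with respect to the sampled model.
        arm = min(range(n), key=costs.__getitem__)

        # Simulate one sweep round with the chosen beamwidth.
        detected = rng.random() < detect_prob[arm]
        if detected:
            alpha[arm] += 1.0
        else:
            beta[arm] += 1.0
        pulls[arm] += 1

    return pulls


if __name__ == "__main__":
    # Hypothetical candidates: wider beams sweep fewer directions
    # but are assumed to detect the UE less reliably.
    candidates = [5.0, 15.0, 45.0, 90.0]
    true_detect_prob = [0.99, 0.9, 0.6, 0.05]
    counts = thompson_beamwidth(candidates, true_detect_prob)
    for width, count in zip(candidates, counts):
        print(f"beamwidth {width:5.1f} deg selected {count} times")
```

Under these assumed numbers, the 45 degree beam has the lowest delay proxy (8 directions at detection probability 0.6), so the sampler should concentrate its selections there. This illustrates the trade-off between sweeping overhead and misdetection described above: neither the narrowest nor the widest beam is the best choice.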