Empowered by recent developments in Machine Learning (ML), signatureless ML-based malware detectors show promising performance in identifying unseen malware variants and zero-days without requiring expensive dynamic malware analysis. However, it has recently been shown that ML-based malware detectors are vulnerable to adversarial malware attacks, in which an attacker modifies a known malware executable to trick the detector into recognizing the modified variant as benign. Adversarial malware example generation has become an emerging area in adversarial ML that studies the creation of functionality-preserving adversarial malware variants. Advancements in this area have led to a never-ending game between the adversary and the defender. While the area has attracted much attention in the security community, a large body of these studies focuses only on attack methods against ML-based malware detectors. There has been little work on understanding how these adversarial variants can be systematically used by the defender to strengthen the robustness of these detectors and stay ahead of the adversary. Recent efforts have led to the emergence of adversarial learning. In this work, we propose a simple wargame approach to empirically conduct the adversarial minimax optimization underlying adversarial learning, in order to improve the robustness of ML-based malware detectors. Our proposed approach employs adversarial malware variants generated by a reinforcement learning-based adversarial attack policy in a minimax game that alternates between strengthening the attack policy and improving the detectors' robustness. We evaluated the effectiveness of our approach on a testbed with 33.2 GB of working malware collected from VirusTotal. 
Surprisingly, despite its sub-optimal nature, our method enhanced the robustness of three known open-source ML-based malware detectors (LGBM, MalConv, and NonNeg) against adversarial malware variants by 4, 7, and 11 times, respectively.