As the real-world applications (image segmentation, speech recognition, machine translation, etc.) are increasingly adopting Deep Neural Networks (DNNs), DNN's vulnerabilities in a malicious environment have become an increasingly important research topic in adversarial machine learning. Adversarial machine learning (AML) focuses on exploring vulnerabilities and defensive techniques for machine learning models. Recent work has shown that most adversarial audio generation methods fail to consider audios' temporal dependency (TD) (i.e., adversarial audios exhibit weaker TD than benign audios). As a result, the adversarial audios are easily detectable by examining their TD. Therefore, one area of interest in the audio AML community is to develop a novel attack that evades a TD-based detection model. In this contribution, we revisit the LSTM model for audio transcription and propose a new audio attack algorithm that evades the TD-based detection by explicitly controlling the TD in generated adversarial audios. The experimental results show that the detectability of our adversarial audio is significantly reduced compared to the state-of-the-art audio attack algorithms. Furthermore, experiments also show that our adversarial audios remain nearly indistinguishable from benign audios with only negligible perturbation magnitude.