Feature selection is an important step in data analysis to address the curse of dimensionality. Such dimensionality reduction techniques are particularly important when if a classification is required and the model scales in polynomial time with the size of the feature (e.g., some applications include genomics, life sciences, cyber-security, etc.). Feature selection is the process of finding the minimum subset of features that allows for the maximum predictive power. Many of the state-of-the-art information-theoretic feature selection approaches use a greedy forward search; however, there are concerns with the search in regards to the efficiency and optimality. A unified framework was recently presented for information-theoretic feature selection that tied together many of the works in over the past twenty years. The work showed that joint mutual information maximization (JMI) is generally the best options; however, the complexity of greedy search for JMI scales quadratically and it is infeasible on high dimensional datasets. In this contribution, we propose a fast approximation of JMI based on information theory. Our approach takes advantage of decomposing the calculations within JMI to speed up a typical greedy search. We benchmarked the proposed approach against JMI on several UCI datasets, and we demonstrate that the proposed approach returns feature sets that are highly consistent with JMI, while decreasing the run time required to perform feature selection.