Autism spectrum disorder (ASD) affects 1 in 59 children in the US and costs the US economy $66 billion annually. The Centers for Disease Control and Prevention (CDC) has collected a large set of electronic health records (EHRs) as part of surveillance in the US. In Arizona, the dataset contains 4,480 EHRs with 10 million free-text tokens collected over ten years, including detailed descriptions of children with ASD-like behaviors. Although knowledge about ASD and its diagnostic criteria have evolved, the data collected in earlier years have not been re-evaluated. To leverage these data more efficiently and to uncover causes of the increase in ASD prevalence observed in epidemiological surveillance, we use Latent Dirichlet Allocation (LDA) to analyze the content of the text data automatically. Preliminary results suggest that LDA can model topics in EHR content and reveal variations in content that are consistent with changes in the data collection effort.
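To make the topic-modeling step concrete, the following is a minimal sketch of LDA fitted by collapsed Gibbs sampling in plain Python. It is not the pipeline used in this work: the sampler, the hyperparameters `alpha` and `beta`, the number of topics, the function names `fit_lda` and `top_words`, and the toy token lists are all illustrative assumptions, and none of the tokens come from the surveillance records.

```python
import random


def fit_lda(docs, num_topics, iterations=100, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA with collapsed Gibbs sampling on pre-tokenized documents.

    Returns the vocabulary, per-document topic counts, and per-topic word counts.
    alpha and beta are symmetric Dirichlet priors (assumed values, not tuned).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assignments = []

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        doc_assignments = []
        for w in doc:
            z = rng.randrange(num_topics)
            doc_assignments.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(doc_assignments)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                z = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][v] -= 1
                topic_total[z] -= 1
                # Conditional probability of each topic given all other tokens.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][v] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                # Record the newly sampled assignment.
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][v] += 1
                topic_total[z] += 1

    return vocab, doc_topic, topic_word


def top_words(vocab, topic_word, n=3):
    """Return the n highest-count words for each topic."""
    result = []
    for counts in topic_word:
        ranked = sorted(range(len(vocab)), key=lambda v: counts[v], reverse=True)
        result.append([vocab[v] for v in ranked[:n]])
    return result


# Invented toy documents, for illustration only (not EHR content).
docs = [
    "speech delay language words speech".split(),
    "language delay words speech therapy".split(),
    "repetitive behavior routine lining toys".split(),
    "routine repetitive toys behavior lining".split(),
]

vocab, doc_topic, topic_word = fit_lda(docs, num_topics=2)
for k, words in enumerate(top_words(vocab, topic_word)):
    print(f"topic {k}: {words}")
```

Each row of `doc_topic` gives the number of tokens in a record assigned to each topic, so comparing those rows across collection years is one simple way to look for the kind of content variation described above. On a corpus of realistic size, an established library implementation would be a more practical choice than this sketch.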