Bacteria and their viruses (phage) are fundamental drivers of many ecosystem processes including global biogeochemistry and horizontal gene transfer. While databases and resources for studying function in uncultured bacterial communities are relatively advanced, many fewer exist for their viral counterparts. The issue is largely technical in that the majority (often 90%) of viral sequences are functionally 'unknown' making viruses a virtually untapped resource of functional and physiological information. Here, we provide a community resource that organizes this unknown sequence space into 27 K high confidence protein clusters using 32 viral metagenomes from four biogeographic regions in the Pacific Ocean that vary by season, depth, and proximity to land, and include some of the first deep pelagic ocean viral metagenomes. These protein clusters more than double currently available viral protein clusters, including those from environmental datasets. Further, a protein cluster guided analysis of functional diversity revealed that richness decreased (i) from deep to surface waters, (ii) from winter to summer, (iii) and with distance from shore in surface waters only. These data provide a framework from which to draw on for future metadata-enabled functional inquiries of the vast viral unknown.
ASJC Scopus subject areas
- Biochemistry, Genetics and Molecular Biology(all)
- Agricultural and Biological Sciences(all)