Understanding the genetic basis of HIV-1 drug resistance is essential to developing new antiretroviral drugs and optimizing the use of existing drugs. This understanding, however, is hampered by the large numbers of mutation patterns associated with cross-resistance within each antiretroviral drug class. We used five statistical learning methods (decision trees, neural networks, support vector regression, least-squares regression, and least angle regression) to relate HIV-1 protease and reverse transcriptase mutations to in vitro susceptibility to 16 antiretroviral drugs. Learning methods were trained and tested on a public data set of genotype-phenotype correlations by 5-fold cross-validation. For each learning method, four mutation sets were used as input features: a complete set of all mutations in > or =2 sequences in the data set, the 30 most common data set mutations, an expert panel mutation set, and a set of nonpolymorphic treatment-selected mutations from a public database linking protease and reverse transcriptase sequences to antiretroviral drug exposure. The nonpolymorphic treatment-selected mutations led to the best predictions: 80.1% accuracy at classifying sequences as susceptible, low/intermediate resistant, or highly resistant. Least angle regression predicted susceptibility significantly better than other methods when using the complete set of mutations. The three regression methods provided consistent estimates of the quantitative effect of mutations on drug susceptibility, identifying nearly all previously reported genotype-phenotype associations and providing strong statistical support for many new associations. Mutation regression coefficients showed that, within a drug class, cross-resistance patterns differ for different mutation subsets and that cross-resistance has been underestimated.
View details for DOI 10.1073/pnas.0607274103
View details for Web of Science ID 000242249400053
View details for PubMedID 17065321
View details for PubMedCentralID PMC1622926