Identifying Vulnerable GitHub Repositories and Users in Scientific Cyberinfrastructure: An Unsupervised Graph Embedding Approach

Ben Lazarine, Sagar Samtani, Mark Patton, Hongyi Zhu, Steven Ullman, Benjamin Ampel, Hsinchun Chen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

17 Scopus citations

Abstract

The scientific cyberinfrastructure community relies heavily on public internet-based systems (e.g., GitHub) to share resources and collaborate. GitHub is one of the most powerful and popular systems for open source collaboration, allowing users to share and work on projects in a public space for accelerated development and deployment. Monitoring GitHub for exposed vulnerabilities can reduce financial costs and prevent misuse of and attacks on cyberinfrastructure. Vulnerability scanners that can interface with GitHub directly can be leveraged to conduct such monitoring. This research aims to proactively identify vulnerable communities within scientific cyberinfrastructure. We use social network analysis to construct graphs representing the relationships amongst users and repositories. We leverage prevailing unsupervised graph embedding algorithms to generate graph embeddings that capture the network attributes and nodal features of our repository and user graphs. This enables the clustering of public cyberinfrastructure repositories and users that have similar network attributes and vulnerabilities. Our results indicate that major scientific cyberinfrastructures have vulnerabilities pertaining to secret leakage and insecure coding practices for high-impact genomics research. These results can help organizations address their vulnerable repositories and users in a targeted manner.
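The pipeline outlined in the abstract (graph construction, node embedding, clustering) can be illustrated with a minimal sketch. It is not the authors' method: the contribution pairs, repository names, and scanner counts are hypothetical toy data, and the embedding step is a deliberately simple stand-in (each repository's own finding counts concatenated with the mean of its neighbors' counts) rather than one of the unsupervised graph embedding algorithms the paper refers to. Clustering uses a plain k-means with a deterministic initialization, also an assumption made for the example.

```python
import math
from collections import defaultdict

# Hypothetical toy data: (user, repository) contribution pairs.
contributions = [
    ("alice", "genome-tools"), ("bob", "genome-tools"),
    ("bob", "seq-pipeline"), ("carol", "seq-pipeline"),
    ("dave", "hpc-scheduler"), ("erin", "hpc-scheduler"),
    ("erin", "cluster-config"),
]

# Hypothetical scanner output per repository:
# [leaked-secret findings, insecure-code findings]
findings = {
    "genome-tools": [3.0, 1.0],
    "seq-pipeline": [2.0, 2.0],
    "hpc-scheduler": [0.0, 5.0],
    "cluster-config": [0.0, 4.0],
}


def build_repo_graph(pairs):
    """Link two repositories when at least one user contributes to both."""
    repos_by_user = defaultdict(set)
    for user, repo in pairs:
        repos_by_user[user].add(repo)
    neighbors = defaultdict(set)
    for repos in repos_by_user.values():
        for repo in repos:
            neighbors[repo] |= repos - {repo}
    return neighbors


def embed(neighbors, features):
    """Stand-in embedding: own features + mean of neighbor features."""
    vectors = {}
    for repo, own in features.items():
        nbrs = neighbors.get(repo, set())
        if nbrs:
            mean = [sum(features[n][i] for n in nbrs) / len(nbrs)
                    for i in range(len(own))]
        else:
            mean = [0.0] * len(own)
        vectors[repo] = own + mean
    return vectors


def kmeans(vectors, k, iterations=10):
    """Plain k-means; the first k names (sorted) seed the centroids."""
    names = sorted(vectors)
    centroids = [list(vectors[n]) for n in names[:k]]
    labels = {}
    for _ in range(iterations):
        # Assign every node to its nearest centroid.
        for n in names:
            labels[n] = min(range(k),
                            key=lambda c: math.dist(vectors[n], centroids[c]))
        # Recompute each centroid as the mean of its members.
        for c in range(k):
            members = [vectors[n] for n in names if labels[n] == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    return labels


graph = build_repo_graph(contributions)
vectors = embed(graph, findings)
labels = kmeans(vectors, k=2)
for repo in sorted(labels):
    print(repo, "-> cluster", labels[repo])
```

On this toy input the two repositories that share a contributor and have mostly secret-leakage findings fall into one cluster, and the two with mostly insecure-code findings fall into the other. A real analysis would replace the stand-in embedding with a learned one and would need a principled choice of the number of clusters.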

Original language: English (US)
Title of host publication: Proceedings - 2020 IEEE International Conference on Intelligence and Security Informatics, ISI 2020
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781728188003
State: Published - Nov 9 2020
Event: 18th IEEE International Conference on Intelligence and Security Informatics, ISI 2020 - Virtual, Arlington, United States
Duration: Nov 9 2020 to Nov 10 2020

Publication series

Name: Proceedings - 2020 IEEE International Conference on Intelligence and Security Informatics, ISI 2020

Conference

Conference: 18th IEEE International Conference on Intelligence and Security Informatics, ISI 2020
Country/Territory: United States
City: Virtual, Arlington
Period: 11/9/20 to 11/10/20

Keywords

  • GitHub
  • graph embedding
  • scientific cyberinfrastructure
  • vulnerability scanning

ASJC Scopus subject areas

  • Information Systems and Management
  • Safety, Risk, Reliability and Quality
  • Computer Networks and Communications
  • Computer Vision and Pattern Recognition
  • Information Systems
