@article{65dc755e2d8c47139bf0828cfce87d60,
title = "Machine learning techniques for chemical and type analysis of ocean oil samples via handheld spectrophotometer device",
abstract = "We designed and constructed a handheld, sturdy fluorescence spectrometry device for identifying samples from ocean oil spills. Two large training databases of autofluorescence spectra from raw oil samples (538 samples/1614 spectra and 767 samples/2301 spectra) were cross validated using support vector machine (SVM) to identify oil type and SARA (saturate, aromatic, resin, and asphaltene) contents. The device's performance was then validated on an independent set of 79 ocean oil samples, which were added to and then collected from ocean water during outdoor exposure to hot, humid weather to represent an actual oil spill. It successfully classified oil types with 92%–100% sensitivity and specificity and F1 scores of 85.7–100%. Further classification of light fuel oils into marine gas oil (MGO)-like and Bunker A (BA)-like categories was successful with the training set (raw oil samples), while less successful with the independent validation set (ocean oil samples). SARA content classification models performed well in training for the saturate (80.8% accuracy) and asphaltene (90.7%) contents. The developed training model was validated using ocean oil samples, and the resulting accuracies were 62.0% (saturate) and 93.7% (asphaltene). These results indicate the difficulties in classifying volatile light fuel oils with a low molecular weight that have experienced weathering effects, while high molecular weight compounds and general oil type can be analyzed.",
keywords = "Asphaltene, Fluorescence spectroscopy, Oil spill, Saturate, Support vector machine",
author = "Katelyn Sosnowski and Andrew Loh and Zubler, {Alanna V.} and Hasina Shir and Ha, {Sung Yong} and Yim, {Un Hyuk} and Yoon, {Jeong Yeol}",
note = "Funding Information: In particular, low-cost, portable spectroscopy devices would be ideal for on-site characterization of oil samples so that samples do not need to be sent back to the laboratory for testing. Furthermore, machine learning methods for classification and regression analyses have the potential to transform portable, low-cost sensors by giving them robust classification and prediction capabilities. For example, a new sample's type and properties that have not yet been tested can be quickly identified after the models are trained on large databases of similar data. In addition, machine learning-based classification can analyze complex samples (such as oil spill samples) through a fingerprinting approach that considers numerous variables all at once. Previously, our group demonstrated a handheld prototype device that uses spectroscopy to identify the type of oil in a sample based on its autofluorescence under UV excitation (Bills et al., 2020). This prototype device was trained using principal component analysis (PCA) for feature reduction and support vector machine (SVM) for classification. All of these features were embedded into the Raspberry Pi-powered device, thus providing a promising platform for automatic data analysis and display of the result to the user. We have significantly improved the device hardware to withstand on-site conditions in the current work. This sturdy device also has improved capabilities for predicting saturate and asphaltene contents from samples and sub-classifying the light fuel oils into two additional categories (marine gas oil and Bunker A). Rather than identifying crude oil, which can be present in large publicized spills, it focuses on predicting oil types that are under-investigated yet often present in smaller oil spills, such as fuel and lubricant oils (Shankar et al., 2020). 
Unlike in our previous work, the new algorithms were validated on an independent set of ocean oil samples created using ocean water off the coast of Geoje-si near Busan, Republic of Korea, under exposure to hot, humid weather. In addition, SARA content estimation has been attempted using SVM, with significantly improved accuracy compared to linear regression models performed previously. These measures ensure that our device is ready to be used for rapid, on-site analysis of mysterious oil spills or leakages without requiring complex laboratory equipment so that effective cleanup strategies can be implemented as quickly as possible. The device is handheld and sturdy with numerous user-friendly features. It also alerts about samples that may be contaminated, improperly diluted, or otherwise challenging to classify. The overall procedure of this machine learning-based classification is graphically illustrated in Fig. 1. The use of this new device is also shown in Supplementary Video S1. K.S. acknowledges the Computational and Mathematical Modeling of Biomedical Systems Training Grant from the National Institute of General Medical Sciences (NIGMS), U.S. National Institutes of Health (NIH), grant number GM132008. This research was a part of the project titled “Development of Advanced Oil Fingerprinting System (PN67490)” funded by the Korea Coast Guard, Republic of Korea. The authors also acknowledge Monitoring of Source and Behavior of the Particulate Matter at Busan Seaport Area (NRF-2019 2019M1A2A210395512) funded by the Ministry of Education through the National Research Foundation, Korea. Finally, the authors wish to thank Alexander S. Day at the University of Arizona for computational methodology guidance and assistance with using MIT App Inventor, Kenneth Schackart at the University of Arizona for computational methodology guidance, and Dr. Sangsik Kim at the University of Arizona for helping with data collection. Publisher Copyright: {\textcopyright} 2022 The Authors",
year = "2022",
month = may,
doi = "10.1016/j.biosx.2022.100128",
language = "English (US)",
volume = "10",
journal = "Biosensors and Bioelectronics: X",
issn = "2590-1370",
publisher = "Elsevier Ltd",
}