TY - GEN
T1 - Potential of Interpreter Specialization for Data Analysis
AU - He, Wei
AU - Strout, Michelle Mills
N1 - Publisher Copyright: © 2021, Springer Nature Switzerland AG.
PY - 2021
Y1 - 2021
N2 - Scientists frequently implement data analyses in high-level programming languages such as Python, Perl, Lua, and R. Many of these languages are inefficient due to the overhead of being dynamically typed and interpreted. In this paper, we report the potential performance improvement of domain-specific interpreter specialization for data analysis workloads and evaluate how the characteristics of data analysis workloads affect the specialization, both positively and negatively. Assisted by compilers, we specialize the Lua and CPython interpreters at source-level using the script being interpreted and the data types during the interpretation as invariants for five common tasks from real data analysis workloads. Through experiments, we measure 9.0–39.6% performance improvement for Lua and 11.0–17.2% performance improvement for CPython for benchmarks that perform data loading, histogram computation, data filtering, data transformation, and dataset shuffle. This specialization does not include misspeculation checks of data types at possible type conversion code that may be necessary for other workloads. We report the details of our evaluation and present a semi-automatic method for specializing the interpreters.
AB - Scientists frequently implement data analyses in high-level programming languages such as Python, Perl, Lua, and R. Many of these languages are inefficient due to the overhead of being dynamically typed and interpreted. In this paper, we report the potential performance improvement of domain-specific interpreter specialization for data analysis workloads and evaluate how the characteristics of data analysis workloads affect the specialization, both positively and negatively. Assisted by compilers, we specialize the Lua and CPython interpreters at source-level using the script being interpreted and the data types during the interpretation as invariants for five common tasks from real data analysis workloads. Through experiments, we measure 9.0–39.6% performance improvement for Lua and 11.0–17.2% performance improvement for CPython for benchmarks that perform data loading, histogram computation, data filtering, data transformation, and dataset shuffle. This specialization does not include misspeculation checks of data types at possible type conversion code that may be necessary for other workloads. We report the details of our evaluation and present a semi-automatic method for specializing the interpreters.
KW - Compiler-assisted specialization
KW - Interpreter specialization
KW - Profile-based optimization
UR - http://www.scopus.com/inward/record.url?scp=85119846524&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85119846524&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-90539-2_14
DO - 10.1007/978-3-030-90539-2_14
M3 - Conference contribution
AN - SCOPUS:85119846524
SN - 9783030905385
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 212
EP - 225
BT - High Performance Computing - ISC High Performance Digital 2021 International Workshops, 2021, Revised Selected Papers
A2 - Jagode, Heike
A2 - Anzt, Hartwig
A2 - Ltaief, Hatem
A2 - Luszczek, Piotr
PB - Springer Science and Business Media Deutschland GmbH
T2 - International Conference on High Performance Computing, ISC High Performance 2021
Y2 - 24 June 2021 through 2 July 2021
ER -