Greg Pauloski
Computer Scientist // Software Engineer
Hello there! I am a fourth-year Ph.D. student in Computer Science at the University of Chicago interested in high-performance computing and deep learning frameworks. I am a member of Globus Labs where I am co-advised by Ian Foster and Kyle Chard. I completed my Bachelor's in Computer Science at the University of Texas at Austin and previously worked at Apple, Google, and the Texas Advanced Computing Center.
RESEARCH
Scalable Deep Learning: We are exploring new techniques for improving deep learning training time and scalability by (1) exploiting scalable algorithms for second-order information approximation; (2) developing methods for adapting to different computer hardware by tuning computation and communication to maximize training speed; and (3) exploring compression techniques to reduce communication overheads.
Workflow Systems: Modern computational science experiments are increasingly written as a coupled set of many distinct software components coordinated by a central workflow system. We are designing new programming models that decouple communication from application design to enable multiple data movement methods depending on where data are moved, what data are moved, or when they are moved.
Scientific Language Models: We are building large (billion+ parameter) transformer-based language models on broad scientific literature to automate knowledge extraction. We are evaluating the training methods for these models to quantify the impact of training corpus size, model size, and pretraining time on downstream performance, and we are investigating better methods for assessing the quality of the trained models.
PROJECTS
Check out all of my projects on GitHub.
ProxyStore: Pass-by-reference semantics for distributed Python applications
[Code]
K-FAC: Distributed PyTorch K-FAC gradient preconditioner
[Code]
LLM Training: Tools and scripts for large language model training
[Code]
Colmena: A framework for steering large campaigns of simulations on HPC
[Code]
3pseatBot: A hobby Discord bot
[Code]
SELECTED PUBLICATIONS
Ordered by most recent.
TaPS: A Performance Evaluation Suite for Task-based Execution Frameworks [Sep 2024] |
J. Gregory Pauloski, Valerie Hayot-Sasson, Maxime Gonthier, Nathaniel Hudson, Haochen Pan, Sicheng Zhou, Ian Foster, Kyle Chard |
eScience 2024 |
TLDR | PDF | Website | Code | Preprint | BibTex |
TLDR: Task-based execution frameworks, such as parallel programming libraries, computational workflow systems, and function-as-a-service platforms, enable the composition of distinct tasks into a single, unified application designed to achieve a computational goal. Research into these task executors has accelerated as computational sciences increasingly need to take advantage of parallel compute and/or heterogeneous hardware. However, the lack of evaluation standards makes it challenging to compare and contrast novel systems against existing implementations. Here, we introduce TaPS, the Task Performance Suite, to support continued research in parallel task executor frameworks. TaPS provides (1) a unified, modular interface for writing and evaluating applications using arbitrary execution frameworks and data management systems and (2) an initial set of reference synthetic and real-world science applications.
|
@misc{pauloski2024taps, author = {J. Gregory Pauloski and Valerie Hayot-Sasson and Maxime Gonthier and Nathaniel Hudson and Haochen Pan and Sicheng Zhou and Ian Foster and Kyle Chard}, title = {{TaPS: A Performance Evaluation Suite for Task-based Execution Frameworks}}, archiveprefix = {arXiv}, eprint = {2408.07236}, primaryclass = {cs.DC}, url = {https://arxiv.org/abs/2408.07236}, year = {2024} } |
Object Proxy Patterns for Accelerating Distributed Applications [Jul 2024] |
J. Gregory Pauloski, Valerie Hayot-Sasson, Logan Ward, Alexander Brace, André Bauer, Kyle Chard, Ian Foster |
arXiv Preprint |
TLDR | PDF | Website | Code | Preprint | BibTex |
TLDR: In prior work, we demonstrated the transparent object proxy, which provides wide-area references that can resolve to data regardless of location, as an effective low-level building block for data flow optimization in distributed application design. Here we propose three high-level proxy-based programming patterns (distributed futures, streaming, and ownership) that make the power of the proxy pattern usable for more complex and dynamic distributed program structures. We motivate these patterns via careful review of application requirements and describe implementations of each pattern. We evaluate our implementations through a suite of benchmarks and by applying them in three substantial scientific applications, in which we demonstrate substantial improvements in runtime, throughput, and memory usage.
|
@misc{pauloski2024proxystore, author = {J. Gregory Pauloski and Valerie Hayot-Sasson and Logan Ward and Alexander Brace and André Bauer and Kyle Chard and Ian Foster}, title = {{Object Proxy Patterns for Accelerating Distributed Applications}}, archiveprefix = {arXiv}, eprint = {2407.01764}, primaryclass = {cs.DC}, url = {https://arxiv.org/abs/2407.01764}, year = {2024} } |
Accelerating Communications in Federated Applications with Transparent Object Proxies [Nov 2023] |
J. Gregory Pauloski, Valerie Hayot-Sasson, Logan Ward, Nathaniel Hudson, Charlie Sabino, Matt Baughman, Kyle Chard, Ian Foster |
SC 2023 |
TLDR | PDF | Website | Code | Poster | Slides | Publication | BibTex |
TLDR: We describe ProxyStore, a system that decouples control flow from data flow by extending the pass-by-reference model to distributed applications using object proxies that act as wide-area object references with just-in-time resolution. This proxy model enables data producers to communicate data unilaterally, transparently, and efficiently to both local and remote consumers. We demonstrate the benefits of this model with synthetic benchmarks and real-world scientific applications, running across various computing platforms.
|
@inproceedings{pauloski2023proxystore, author = {Pauloski, J. Gregory and Hayot-Sasson, Valerie and Ward, Logan and Hudson, Nathaniel and Sabino, Charlie and Baughman, Matt and Chard, Kyle and Foster, Ian}, title = {{Accelerating Communications in Federated Applications with Transparent Object Proxies}}, address = {New York, NY, USA}, articleno = {59}, booktitle = {Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis}, doi = {10.1145/3581784.3607047}, isbn = {9798400701092}, location = {Denver, CO, USA}, numpages = {15}, publisher = {Association for Computing Machinery}, series = {SC '23}, url = {https://doi.org/10.1145/3581784.3607047}, year = {2023} } |
Deep Neural Network Training With Distributed K-FAC [Mar 2022] |
J. Gregory Pauloski, Lei Huang, Weijia Xu, Kyle Chard, Ian Foster, Zhao Zhang |
TPDS 2022 |
TLDR | PDF | Code | Publication | BibTex |
TLDR: We extend our SC 2020 paper to evaluate the convergence and scaling properties of our K-FAC gradient preconditioner for image classification, object detection, and language modeling applications. In all applications, our implementation converges to baseline performance targets in 9–25% less time than the standard first-order optimizers on GPU clusters across a variety of scales.
|
@article{pauloski2022kfac, author = {Pauloski, J. Gregory and Huang, Lei and Xu, Weijia and Chard, Kyle and Foster, Ian T. and Zhang, Zhao}, title = {{Deep Neural Network Training With Distributed K-FAC}}, doi = {10.1109/TPDS.2022.3161187}, journal = {IEEE Transactions on Parallel and Distributed Systems}, number = {12}, pages = {3616-3627}, volume = {33}, year = {2022} } |
KAISA: An Adaptive Second-Order Optimizer Framework for Deep Neural Networks [Nov 2021] |
J. Gregory Pauloski, Qi Huang, Lei Huang, Shivaram Venkataraman, Kyle Chard, Ian Foster, Zhao Zhang |
SC 2021 |
TLDR | PDF | Code | Slides | Publication | BibTex |
TLDR: We present KAISA, a K-FAC-enabled, Adaptable, Improved, and ScAlable second-order optimizer framework that adapts the memory footprint, communication, and computation given specific models and hardware to improve performance and increase scalability. Compared to the original optimizers, KAISA converges 18.1–36.3% faster across applications with the same global batch size.
|
@inproceedings{pauloski2021kaisa, author = {Pauloski, J. Gregory and Huang, Qi and Huang, Lei and Venkataraman, Shivaram and Chard, Kyle and Foster, Ian and Zhang, Zhao}, title = {{KAISA: An Adaptive Second-Order Optimizer Framework for Deep Neural Networks}}, abstract = {Kronecker-factored Approximate Curvature (K-FAC) has recently been shown to converge faster in deep neural network (DNN) training than stochastic gradient descent (SGD); however, K-FAC's larger memory footprint hinders its applicability to large models. We present KAISA, a K-FAC-enabled, Adaptable, Improved, and ScAlable second-order optimizer framework that adapts the memory footprint, communication, and computation given specific models and hardware to improve performance and increase scalability. We quantify the tradeoffs between memory and communication cost and evaluate KAISA on large models, including ResNet-50, Mask R-CNN, U-Net, and BERT, on up to 128 NVIDIA A100 GPUs. Compared to the original optimizers, KAISA converges 18.1--36.3% faster across applications with the same global batch size. Under a fixed memory budget, KAISA converges 32.5% and 41.6% faster in ResNet-50 and BERT-Large, respectively. KAISA can balance memory and communication to achieve scaling efficiency equal to or better than the baseline optimizers.}, address = {New York, NY, USA}, articleno = {13}, booktitle = {Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis}, doi = {10.1145/3458817.3476152}, isbn = {9781450384421}, keywords = {second-order optimization, machine learning, distributed computing, K-FAC, data-parallel algorithms}, location = {St. Louis, Missouri}, numpages = {14}, publisher = {Association for Computing Machinery}, series = {SC '21}, url = {https://doi.org/10.1145/3458817.3476152}, year = {2021} } |
Convolutional Neural Network Training with Distributed K-FAC [Nov 2020] |
J. Gregory Pauloski, Zhao Zhang, Lei Huang, Weijia Xu, Ian Foster |
SC 2020 |
TLDR | PDF | Code | Slides | Publication | BibTex |
TLDR: We study optimization techniques such as layer-wise distribution strategies, inverse-free second-order gradient evaluation, and dynamic K-FAC update decoupling to reduce training time while preserving convergence. Our distributed optimizer design trains ResNet-50 18–25% faster than SGD.
|
@inproceedings{pauloski2020kfac, author = {Pauloski, J. Gregory and Zhang, Zhao and Huang, Lei and Xu, Weijia and Foster, Ian T.}, title = {{Convolutional Neural Network Training with Distributed K-FAC}}, abstract = {Training neural networks with many processors can reduce time-to-solution; however, it is challenging to maintain convergence and efficiency at large scales. The Kronecker-factored Approximate Curvature (K-FAC) was recently proposed as an approximation of the Fisher Information Matrix that can be used in natural gradient optimizers. We investigate here a scalable K-FAC design and its applicability in convolutional neural network (CNN) training at scale. We study optimization techniques such as layer-wise distribution strategies, inverse-free second-order gradient evaluation, and dynamic K-FAC update decoupling to reduce training time while preserving convergence. We use residual neural networks (ResNet) applied to the CIFAR-10 and ImageNet-1k datasets to evaluate the correctness and scalability of our K-FAC gradient preconditioner. With ResNet-50 on the ImageNet-1k dataset, our distributed K-FAC implementation converges to the 75.9% MLPerf baseline in 18--25% less time than does the classic stochastic gradient descent (SGD) optimizer across scales on a GPU cluster.}, articleno = {94}, booktitle = {Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis}, doi = {10.5555/3433701.3433826}, isbn = {9781728199986}, keywords = {optimization methods, neural networks, high performance computing, scalability}, location = {Atlanta, Georgia}, numpages = {14}, publisher = {IEEE Press}, series = {SC '20}, year = {2020} } |
PUBLICATIONS
Ordered by most recent. Bibtex file available for download here.
Employing Artificial Intelligence to Steer Exascale Workflows with Colmena [Aug 2024] |
Logan Ward, J. Gregory Pauloski, Valerie Hayot-Sasson, Yadu Babuji, Alexander Brace, Ryan Chard, Kyle Chard, Rajeev Thakur, Ian Foster |
IJHPCA 2024 |
TLDR | PDF | Website | Code | Preprint | BibTex |
TLDR: We created Colmena to leverage the massive parallelism of a supercomputer by using Artificial Intelligence (AI) to learn from and adapt a workflow as it executes. Colmena allows scientists to define how their application should respond to events (e.g., task completion) as a series of cooperative agents. In this paper, we describe the design of Colmena, the challenges we overcame while deploying applications on exascale systems, and the science workflows we have enhanced through interweaving AI.
|
@misc{ward2024colmena, author = {Logan Ward and J. Gregory Pauloski and Valerie Hayot-Sasson and Yadu Babuji and Alexander Brace and Ryan Chard and Kyle Chard and Rajeev Thakur and Ian Foster}, title = {{Employing Artificial Intelligence to Steer Exascale Workflows with Colmena}}, archiveprefix = {arXiv}, eprint = {2408.14434}, primaryclass = {cs.DC}, url = {https://arxiv.org/abs/2408.14434}, year = {2024} } |
Trillion Parameter AI Serving Infrastructure for Scientific Discovery: A Survey and Vision [Dec 2023] |
Nathaniel C. Hudson, J. Gregory Pauloski, Matt Baughman, Alok Kamatar, Mansi Sakarvadia, Logan Ward, Ryan Chard, André Bauer, Maksim Levental, Wenyi Wang, Will Engler, Owen Price Skelly, Ben Blaiszik, Rick Stevens, Kyle Chard, Ian Foster |
BDCAT 2023 |
TLDR | PDF | Publication | BibTex |
TLDR: Deep learning methods are transforming research, enabling new techniques, and ultimately leading to new discoveries. As the demand for more capable AI models continues to grow, we are now entering an era of Trillion Parameter Models (TPM), or models with more than a trillion parameters, such as Huawei's PanGu-Σ. We describe a vision for the ecosystem of TPM users and providers that caters to the specific needs of the scientific community. We then outline the significant technical challenges and open problems in system design for serving TPMs to enable scientific research and discovery. Specifically, we describe the requirements of a comprehensive software stack and interfaces to support the diverse and flexible requirements of researchers.
|
@inproceedings{hudson2023trillion, author = {Hudson, Nathaniel C and Pauloski, J. Gregory and Baughman, Matt and Kamatar, Alok and Sakarvadia, Mansi and Ward, Logan and Chard, Ryan and Bauer, Andr\'{e} and Levental, Maksim and Wang, Wenyi and Engler, Will and Price Skelly, Owen and Blaiszik, Ben and Stevens, Rick and Chard, Kyle and Foster, Ian}, title = {{Trillion Parameter AI Serving Infrastructure for Scientific Discovery: A Survey and Vision}}, abstract = {Deep learning methods are transforming research, enabling new techniques, and ultimately leading to new discoveries. As the demand for more capable AI models continues to grow, we are now entering an era of Trillion Parameter Models (TPM), or models with more than a trillion parameters---such as Huawei's PanGu-Σ. We describe a vision for the ecosystem of TPM users and providers that caters to the specific needs of the scientific community. We then outline the significant technical challenges and open problems in system design for serving TPMs to enable scientific research and discovery. Specifically, we describe the requirements of a comprehensive software stack and interfaces to support the diverse and flexible requirements of researchers.}, address = {New York, NY, USA}, articleno = {15}, booktitle = {Proceedings of the IEEE/ACM 10th International Conference on Big Data Computing, Applications and Technologies}, doi = {10.1145/3632366.3632396}, isbn = {9798400704734}, keywords = {artificial intelligence, grid computing, deep learning applications, systems design, survey}, location = {Taormina (Messina), Italy}, numpages = {10}, publisher = {Association for Computing Machinery}, series = {BDCAT '23}, url = {https://doi.org/10.1145/3632366.3632396}, year = {2024} } |
DeepSpeed4Science Initiative: Enabling Large-Scale Scientific Discovery through Sophisticated AI System Technologies [Nov 2023] |
Collaboration between Microsoft, Rutgers University, University of Sydney, Columbia University, Harvard University, Argonne National Laboratory, University of Chicago, Oak Ridge National Laboratory, Brookhaven National Laboratory, Princeton University, AMD, and NVIDIA |
arXiv Preprint |
TLDR | PDF | Website | Preprint | BibTex |
TLDR: We present the DeepSpeed4Science initiative which aims to build unique capabilities through AI system technology innovations to help domain experts to unlock today's biggest science mysteries. By leveraging DeepSpeed's current technology pillars (training, inference and compression) as base technology enablers, DeepSpeed4Science will create a new set of AI system technologies tailored for accelerating scientific discoveries by addressing their unique complexity beyond the common technical approaches used for accelerating generic large language models.
|
@misc{song2023deepspeed4science, author = {Shuaiwen Leon Song and Bonnie Kruft and Minjia Zhang and Conglong Li and Shiyang Chen and Chengming Zhang and Masahiro Tanaka and Xiaoxia Wu and Jeff Rasley and Ammar Ahmad Awan and Connor Holmes and Martin Cai and Adam Ghanem and Zhongzhu Zhou and Yuxiong He and Pete Luferenko and Divya Kumar and Jonathan Weyn and Ruixiong Zhang and Sylwester Klocek and Volodymyr Vragov and Mohammed AlQuraishi and Gustaf Ahdritz and Christina Floristean and Cristina Negri and Rao Kotamarthi and Venkatram Vishwanath and Arvind Ramanathan and Sam Foreman and Kyle Hippe and Troy Arcomano and Romit Maulik and Maxim Zvyagin and Alexander Brace and Bin Zhang and Cindy Orozco Bohorquez and Austin Clyde and Bharat Kale and Danilo Perez-Rivera and Heng Ma and Carla M. Mann and Michael Irvin and J. Gregory Pauloski and Logan Ward and Valerie Hayot and Murali Emani and Zhen Xie and Diangen Lin and Maulik Shukla and Ian Foster and James J. Davis and Michael E. Papka and Thomas Brettin and Prasanna Balaprakash and Gina Tourassi and John Gounley and Heidi Hanson and Thomas E Potok and Massimiliano Lupo Pasini and Kate Evans and Dan Lu and Dalton Lunga and Junqi Yin and Sajal Dash and Feiyi Wang and Mallikarjun Shankar and Isaac Lyngaas and Xiao Wang and Guojing Cong and Pei Zhang and Ming Fan and Siyan Liu and Adolfy Hoisie and Shinjae Yoo and Yihui Ren and William Tang and Kyle Felker and Alexey Svyatkovskiy and Hang Liu and Ashwin Aji and Angela Dalton and Michael Schulte and Karl Schulz and Yuntian Deng and Weili Nie and Josh Romero and Christian Dallago and Arash Vahdat and Chaowei Xiao and Thomas Gibbs and Anima Anandkumar and Rick Stevens}, title = {{DeepSpeed4Science Initiative: Enabling Large-Scale Scientific Discovery through Sophisticated AI System Technologies}}, archiveprefix = {arXiv}, eprint = {2310.04610}, primaryclass = {cs.AI}, year = {2023} } |
The Diminishing Returns of Masked Language Models to Science [May 2023] |
Zhi Hong, Aswathy Ajith, J. Gregory Pauloski, Eamon Duede, Kyle Chard, Ian Foster |
Findings of the Association for Computational Linguistics: ACL 2023 |
TLDR | PDF | Website | Publication | BibTex |
TLDR: We use 14 domain-specific transformer-based models (including ScholarBERT, a new 770M-parameter science-focused masked language model pretrained on up to 225B tokens) to evaluate the impact of training data, model size, pretraining and finetuning time on 12 downstream scientific tasks. Interestingly, we find that increasing model size, training data, or compute time does not always lead to measurable improvements for scientific information extraction tasks.
|
@inproceedings{hong2023scholarbert, author = {Hong, Zhi and Ajith, Aswathy and Pauloski, J. Gregory and Duede, Eamon and Chard, Kyle and Foster, Ian}, title = {{The Diminishing Returns of Masked Language Models to Science}}, abstract = {Transformer-based masked language models such as BERT, trained on general corpora, have shown impressive performance on downstream tasks. It has also been demonstrated that the downstream task performance of such models can be improved by pretraining larger models for longer on more data. In this work, we empirically evaluate the extent to which these results extend to tasks in science. We use 14 domain-specific transformer-based models (including ScholarBERT, a new 770M-parameter science-focused masked language model pretrained on up to 225B tokens) to evaluate the impact of training data, model size, pretraining and finetuning time on 12 downstream scientific tasks. Interestingly, we find that increasing model size, training data, or compute time does not always lead to significant improvements (i.e., {\textgreater}1{\%} F1), if any, in scientific information extraction tasks. We offer possible explanations for this surprising result.}, address = {Toronto, Canada}, booktitle = {Findings of the Association for Computational Linguistics: ACL 2023}, doi = {10.18653/v1/2023.findings-acl.82}, editor = {Rogers, Anna and Boyd-Graber, Jordan and Okazaki, Naoaki}, month = {July}, pages = {1270--1283}, publisher = {Association for Computational Linguistics}, url = {https://aclanthology.org/2023.findings-acl.82}, year = {2023} } |
Cloud Services Enable Efficient AI-Guided Simulation Workflows across Heterogeneous Resources [Mar 2023] |
Logan Ward, J. Gregory Pauloski, Valerie Hayot-Sasson, Ryan Chard, Yadu Babuji, Ganesh Sivaraman, Sutanay Choudhury, Kyle Chard, Rajeev Thakur, Ian Foster |
HCW @ IPDPS 2023 |
TLDR | PDF | Code | Publication | BibTex |
TLDR: We describe our experiences in building and deploying AI-driven workflows across multiple computing sites without networking hassles and without losing performance using Colmena, Globus, FuncX, and ProxyStore.
|
@misc{ward2023colmena, author = {Ward, Logan and Pauloski, J. Gregory and Hayot-Sasson, Valerie and Chard, Ryan and Babuji, Yadu and Sivaraman, Ganesh and Choudhury, Sutanay and Chard, Kyle and Thakur, Rajeev and Foster, Ian}, title = {{Cloud Services Enable Efficient AI-Guided Simulation Workflows across Heterogeneous Resources}}, copyright = {arXiv.org perpetual, non-exclusive license}, doi = {10.48550/ARXIV.2303.08803}, keywords = {Distributed, Parallel, and Cluster Computing (cs.DC), Artificial Intelligence (cs.AI), FOS: Computer and information sciences, FOS: Computer and information sciences}, publisher = {arXiv}, url = {https://arxiv.org/abs/2303.08803}, year = {2023} } |
GenSLMs: Genome-scale Language Models Reveal SARS-CoV-2 Evolutionary Dynamics [Oct 2022] |
Maxim Zvyagin, Alexander Brace, Kyle Hippe, Yuntian Deng, Bin Zhang, Cindy Orozco Bohorquez, Austin Clyde, Bharat Kale, Danilo Perez-Rivera, Heng Ma, Carla M Mann, Michael Irvin, J Gregory Pauloski, Logan Ward, Valerie Hayot-Sasson, Murali Emani, Sam Foreman, Zhen Xie, Diangen Lin, Maulik Shukla, Weili Nie, Josh Romero, Christian Dallago, Arash Vahdat, Chaowei Xiao, Thomas Gibbs, Ian Foster, James J Davis, Michael E Papka, Thomas Brettin, Rick Stevens, Anima Anandkumar, Venkatram Vishwanath, Arvind Ramanathan |
IJHPCA — ACM Gordon Bell Special Prize for COVID-19 Research |
TLDR | PDF | Code | Publication | BibTex |
TLDR: We build genome-scale language models (GenSLMs) which can learn the evolutionary landscape of SARS-CoV-2 genomes. By pretraining on over 110 million prokaryotic gene sequences, and then finetuning a SARS-CoV-2 specific model on 1.5 million genomes, we show that GenSLM can accurately and rapidly identify variants of concern.
|
@article{zvyagin2022genslm, author = {Zvyagin, Maxim and Brace, Alexander and Hippe, Kyle and Deng, Yuntian and Zhang, Bin and Orozco Bohorquez, Cindy and Clyde, Austin and Kale, Bharat and Perez-Rivera, Danilo and Ma, Heng and Mann, Carla M. and Irvin, Michael and Pauloski, J. Gregory and Ward, Logan and Hayot, Valerie and Emani, Murali and Foreman, Sam and Xie, Zhen and Lin, Diangen and Shukla, Maulik and Nie, Weili and Romero, Josh and Dallago, Christian and Vahdat, Arash and Xiao, Chaowei and Gibbs, Thomas and Foster, Ian and Davis, James J. and Papka, Michael E. and Brettin, Thomas and Stevens, Rick and Anandkumar, Anima and Vishwanath, Venkatram and Ramanathan, Arvind}, title = {{GenSLMs: Genome-scale language models reveal SARS-CoV-2 evolutionary dynamics}}, abstract = {Our work seeks to transform how new and emergent variants of pandemic causing viruses, specially SARS-CoV-2, are identified and classified. By adapting large language models (LLMs) for genomic data, we build genome-scale language models (GenSLMs) which can learn the evolutionary landscape of SARS-CoV-2 genomes. By pretraining on over 110 million prokaryotic gene sequences, and then finetuning a SARS-CoV-2 specific model on 1.5 million genomes, we show that GenSLM can accurately and rapidly identify variants of concern. Thus, to our knowledge, GenSLM represents one of the first whole genome scale foundation models which can generalize to other prediction tasks. We demonstrate the scaling of GenSLMs on both GPU-based supercomputers and AI-hardware accelerators, achieving over 1.54 zettaflops in training runs. 
We present initial scientific insights gleaned from examining GenSLMs in tracking the evolutionary dynamics of SARS-CoV-2, noting that its full potential on large biological data is yet to be realized.}, doi = {10.1101/2022.10.10.511571}, elocation-id = {2022.10.10.511571}, eprint = {https://www.biorxiv.org/content/early/2022/10/11/2022.10.10.511571.full.pdf}, journal = {bioRxiv}, publisher = {Cold Spring Harbor Laboratory}, url = {https://www.biorxiv.org/content/early/2022/10/11/2022.10.10.511571}, year = {2022} } |
Colmena: Scalable Machine-Learning-Based Steering of Ensemble Simulations for High Performance Computing [Nov 2021] |
Logan Ward, Ganesh Sivaraman, J. Gregory Pauloski, Yadu Babuji, Ryan Chard, Naveen Dandu, Paul C. Redfern, Rajeev S. Assary, Kyle Chard, Larry A. Curtiss, Rajeev Thakur, Ian Foster |
MLHPC @ SC 2021 |
TLDR | PDF | Website | Code | Publication | BibTex |
TLDR: We present Colmena, an open-source Python framework that allows users to steer massive computational campaigns by providing just the implementations of individual tasks plus the logic used to choose which tasks to execute when. We describe the design of Colmena and illustrate its capabilities by applying it to electrolyte design, where it both scales to 65536 CPUs and accelerates the discovery rate for high-performance molecules by a factor of 100 over unguided searches.
|
@inproceedings{ward2021colmena, author = {Ward, Logan and Sivaraman, Ganesh and Pauloski, J. Gregory and Babuji, Yadu and Chard, Ryan and Dandu, Naveen and Redfern, Paul C. and Assary, Rajeev S. and Chard, Kyle and Curtiss, Larry A. and Thakur, Rajeev and Foster, Ian}, title = {{Colmena: Scalable Machine-Learning-Based Steering of Ensemble Simulations for High Performance Computing}}, booktitle = {2021 IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments (MLHPC)}, doi = {10.1109/MLHPC54614.2021.00007}, number = {}, pages = {9-20}, volume = {}, year = {2021} } |
Models and Processes to Extract Drug-like Molecules From Natural Language Text [Aug 2021] |
Zhi Hong, J. Gregory Pauloski, Logan Ward, Kyle Chard, Ben Blaiszik, Ian Foster |
Frontiers in Molecular Biosciences |
TLDR | PDF | Publication | BibTex |
TLDR: We present (1) an iterative model-in-the-loop method that makes judicious use of scarce human expertise in generating training data for an NER model and (2) the application and evaluation of this method to identifying drug-like molecules in the COVID-19 Open Research Dataset Challenge (CORD-19) corpus of 198,875 papers.
|
@article{hong2021moleculesnlp, author = {Hong, Zhi and Pauloski, J. Gregory and Ward, Logan and Chard, Kyle and Blaiszik, Ben and Foster, Ian}, title = {{Models and Processes to Extract Drug-like Molecules From Natural Language Text}}, abstract = {Researchers worldwide are seeking to repurpose existing drugs or discover new drugs to counter the disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). A promising source of candidates for such studies is molecules that have been reported in the scientific literature to be drug-like in the context of viral research. However, this literature is too large for human review and features unusual vocabularies for which existing named entity recognition (NER) models are ineffective. We report here on a project that leverages both human and artificial intelligence to detect references to such molecules in free text. We present 1) an iterative model-in-the-loop method that makes judicious use of scarce human expertise in generating training data for an NER model, and 2) the application and evaluation of this method to the problem of identifying drug-like molecules in the COVID-19 Open Research Dataset Challenge (CORD-19) corpus of 198,875 papers. We show that by repeatedly presenting human labelers only with samples for which an evolving NER model is uncertain, our human-machine hybrid pipeline requires only modest amounts of non-expert human labeling time (tens of hours to label 1778 samples) to generate an NER model with an F-1 score of 80.5%—on par with that of non-expert humans—and when applied to CORD’19, identifies 10,912 putative drug-like molecules. 
This enriched the computational screening team’s targets by 3,591 molecules, of which 18 ranked in the top 0.1% of all 6.6 million molecules screened for docking against the 3CLPro protein.}, doi = {10.3389/fmolb.2021.636077}, issn = {2296-889X}, journal = {Frontiers in Molecular Biosciences}, pages = {826}, url = {https://www.frontiersin.org/article/10.3389/fmolb.2021.636077}, volume = {8}, year = {2021} } |
Efficient I/O for Neural Network Training with Compressed Data [May 2020] |
Zhao Zhang, Lei Huang, J. Gregory Pauloski, Ian Foster |
IPDPS 2020 |
TLDR | PDF | Code | Publication | BibTex |
TLDR: We investigate the tradeoff between runtime overhead and data compression ratio on real-world deep learning training datasets and applications. We show that storage can be reduced by 2–13x with minimal additional runtime overhead.
|
@inproceedings{zhang2020compressed, author = {Z. {Zhang} and L. {Huang} and J. G. {Pauloski} and I. T. {Foster}}, title = {{Efficient I/O for Neural Network Training with Compressed Data}}, booktitle = {2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS)}, doi = {10.1109/IPDPS47924.2020.00050}, number = {}, pages = {409-418}, volume = {}, year = {2020} } |
Aggregating Local Storage for Scalable Deep Learning I/O [Dec 2019] |
Zhao Zhang, Lei Huang, J. Gregory Pauloski, Ian Foster |
DLS 2019 |
TLDR | PDF | Code | Publication | BibTex |
TLDR: We develop a user-level transient object store that provides low-latency, scalable, POSIX-compliant file access for deep learning training.
|
@inproceedings{zhang2019aggregating, author = {Z. {Zhang} and L. {Huang} and J. G. {Pauloski} and I. {Foster}}, title = {{Aggregating Local Storage for Scalable Deep Learning I/O}}, booktitle = {2019 IEEE/ACM Third Workshop on Deep Learning on Supercomputers (DLS)}, doi = {10.1109/DLS49591.2019.00014}, number = {}, pages = {69-75}, volume = {}, year = {2019} } |
Glioma Segmentation and a Simple Accurate Model for Overall Survival Prediction [Nov 2018] |
Evan Gates, J. Gregory Pauloski, Dawid Schellingerhout, David Fuentes |
BrainLes 2018 |
TLDR | PDF | Publication | BibTex |
TLDR: We develop a multi-stage pipeline for accurate patient survival prediction from brain tumor MRI scans. We segment tumor subvolumes using a multi-scale convolutional network, extract intensity and shape features, then use an ensemble of machine learning models to predict patient outcomes.
|
@inproceedings{gates2019glioma, author = {Gates, Evan and Pauloski, J. Gregory and Schellingerhout, Dawid and Fuentes, David}, title = {{Glioma Segmentation and a Simple Accurate Model for Overall Survival Prediction}}, abstract = {Brain tumor segmentation is a challenging task necessary for quantitative tumor analysis and diagnosis. We apply a multi-scale convolutional neural network based on the DeepMedic to segment glioma subvolumes provided in the 2018 MICCAI Brain Tumor Segmentation Challenge. We go on to extract intensity and shape features from the images and cross-validate machine learning models to predict overall survival. Using only the mean FLAIR intensity, nonenhancing tumor volume, and patient age we are able to predict patient overall survival with reasonable accuracy.}, address = {Cham}, booktitle = {Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries}, editor = {Crimi, Alessandro and Bakas, Spyridon and Kuijf, Hugo and Keyvan, Farahani and Reyes, Mauricio and van Walsum, Theo}, isbn = {978-3-030-11726-9}, pages = {476--484}, publisher = {Springer International Publishing}, year = {2019} } |
PRESENTATIONS
Ordered by most recent.
Accelerating Communications in Federated Applications with Transparent Object Proxies [Nov 2023] |
SC 2023 |
Slides |
ProxyStore: Decoupling Control and Data Flow in Workflows [Oct 2023] |
ParslFest 2023 |
Slides | Video |
Accelerating Communications in Federated Applications with Transparent Object Proxies [Apr 2023] |
Greater Chicago Area Systems Research Workshop (GCASR) 2023 |
Poster |
ProxyStore: a Data Fabric for Parsl and FuncX [Sep 2022] |
ParslFest 2022 |
Slides | Video |
Scalable Deep Neural Network Training with Distributed K-FAC [Mar 2022] |
Master's Presentation @ UChicago |
Slides |
KAISA: An Adaptive Second-Order Optimizer Framework for Deep Neural Networks [Nov 2021] |
SC 2021 |
Slides |
Convolutional Neural Network Training with Distributed K-FAC [Nov 2020] |
SC 2020 |
Slides |
Optimizing Deep Learning Methods for Image Segmentation with Distributed Training [Sep 2018] |
TACCSTER 2018 |
Poster |