Greg Pauloski

Computer Scientist // Software Engineer

email | github | linkedin | scholar

Hello there! I recently completed my Ph.D. in Computer Science with Globus Labs at the University of Chicago, where I was co-advised by Ian Foster and Kyle Chard. My research spans high-performance computing, distributed systems, and deep learning frameworks. I completed my Bachelors in Computer Science at the University of Texas at Austin and previously worked at Apple, Google, and the Texas Advanced Computing Center.

RESEARCH

Distributed Systems: We are designing new programming paradigms which decouple communication from application design to enable multiple data movement methods depending on where data are moved, what are moved, or when they are moved. We are using these paradigms to build scalable scientific workflows, federated function-as-a-service platforms, and multi-agent systems.

Scalable Deep Learning: We are exploring new techniques for improving deep learning training time and scalability by (1) exploiting scalable algorithms for second-order information approximation; (2) developing methods for adapting to different computer hardware by tuning computation and communication to maximize training speed; (3) exploring compression techniques to reduce communication overheads; and (4) enabling complex, hierarchical federated learning across diverse ecosystems of hardware.

AI for Science: We are (1) training large (billion+ parameter) transformer-based language models on broad scientific literature to automate knowledge extraction; (2) developing frameworks for coupling AI and simulations on exascale supercomputers; and (3) building innovative and large-scale solutions to scientific challenges in genome evolution, next-generation battery design, and carbon capture.

DISSERTATIONS

Programming the Continuum: Towards Better Techniques for Developing Distributed Science Applications [Jun 2025]

Scalable Deep Neural Network Training with Distributed K-FAC [Mar 2022]

Abstract | Committee | PDF | Slides | Masters Thesis

PROJECTS

Check out all of my projects on GitHub.

ProxyStore: Pass-by-reference semantics for distributed Python applications [Code]

K-FAC: Distributed PyTorch K-FAC gradient preconditioner [Code]

Academy: Build and deploy stateful agents across federated resources [Code]

TaPS: Benchmarking suite for distributed/parallel task executors [Code]

LLM Training: Tools and scripts for large language model training [Code]

Colmena: Steering large campaigns of simulations on HPC with AI [Code]

3pseatBot: A hobby Discord bot [Code]

SELECTED PUBLICATIONS

Ordered by most recent.

Empowering Scientific Workflows with Federated Agents [May 2025]

J. Gregory Pauloski, Yadu Babuji, Ryan Chard, Mansi Sakarvadia, Kyle Chard, Ian Foster

arXiv Preprint

Object Proxy Patterns for Accelerating Distributed Applications [Dec 2024]

J. Gregory Pauloski, Valerie Hayot-Sasson, Logan Ward, Alexander Brace, André Bauer, Kyle Chard, Ian Foster

TPDS 2024

TaPS: A Performance Evaluation Suite for Task-based Execution Frameworks [Sep 2024]

J. Gregory Pauloski, Valerie Hayot-Sasson, Maxime Gonthier, Nathaniel Hudson, Haochen Pan, Sicheng Zhou, Ian Foster, Kyle Chard

eScience 2024 — Best Paper

Accelerating Communications in Federated Applications with Transparent Object Proxies [Nov 2023]

J. Gregory Pauloski, Valerie Hayot-Sasson, Logan Ward, Nathaniel Hudson, Charlie Sabino, Matt Baughman, Kyle Chard, Ian Foster

SC 2023

Deep Neural Network Training With Distributed K-FAC [Mar 2022]

J. Gregory Pauloski, Lei Huang, Weijia Xu, Kyle Chard, Ian Foster, Zhao Zhang

TPDS 2022

TLDR | PDF | Code | Publication | BibTex

KAISA: An Adaptive Second-Order Optimizer Framework for Deep Neural Networks [Nov 2021]

J. Gregory Pauloski, Qi Huang, Lei Huang, Shivaram Venkataraman, Kyle Chard, Ian Foster, Zhao Zhang

SC 2021

Convolutional Neural Network Training with Distributed K-FAC [Nov 2020]

J. Gregory Pauloski, Zhao Zhang, Lei Huang, Weijia Xu, Ian Foster

SC 2020

ALL PUBLICATIONS

Ordered by most recent and grouped by topic. Bibtex file available for download here.

DISTRIBUTED SYSTEMS

May 2025	Empowering Scientific Workflows with Federated Agents
	TLDR \| PDF \| Authors \| Website \| Code \| Preprint \| BibTex \| arXiv Preprint
	TLDR: Agentic systems, in which diverse agents cooperate to tackle challenging problems, are exploding in popularity in the AI community. However, the agentic frameworks used to build these systems have not previously enabled use with research cyberinfrastructure. We introduce Academy, a modular and extensible middleware designed to deploy autonomous agents across the federated research ecosystem, including HPC systems, experimental facilities, and data repositories. To meet the demands of scientific computing, Academy supports asynchronous execution, heterogeneous resources, high-throughput data flows, and dynamic resource availability. It provides abstractions for expressing stateful agents, managing inter-agent coordination, and integrating computation with experimental control.
	Authors: J. Gregory Pauloski, Yadu Babuji, Ryan Chard, Mansi Sakarvadia, Kyle Chard, Ian Foster
	@misc{pauloski2025academy, title = {Empowering {S}cientific {W}orkflows with {F}ederated {A}gents}, author = {J. Gregory Pauloski and Yadu Babuji and Ryan Chard and Mansi Sakarvadia and Kyle Chard and Ian Foster}, archiveprefix = {arXiv}, eprint = {2505.05428}, primaryclass = {cs.MA}, url = {https://arxiv.org/abs/2505.05428}, year = {2025} }
May 2025	AERO: An autonomous platform for continuous research
	TLDR \| PDF \| Authors \| Code \| Preprint \| BibTex \| arXiv Preprint
	TLDR: The COVID-19 pandemic highlighted the need for new data infrastructure, as epidemiologists and public health workers raced to harness rapidly evolving data, analytics, and infrastructure in support of cross-sector investigations. To meet this need, we developed AERO, an automated research and data sharing platform for continuous, distributed, and multi-disciplinary collaboration. AERO supports the automatic ingestion, validation, and transformation of monitored data into a form suitable for analysis; the automated execution of analyses on this data; and the sharing of data among different entities. AERO leverages capabilities provided by the Globus platform and GitHub for automation, distributed execution, data sharing, and authentication.
	Authors: Valérie Hayot-Sasson, Abby Stevens, Nicholson Collier, Sudershan Sridhar, Kyle Conroy, J Gregory Pauloski, Yadu Babuji, Maxime Gonthier, Nathaniel Hudson, Dante D Sanchez-Gallegos, Ian Foster, Jonathan Ozik, Kyle Chard
	@misc{hayotsasson2025aero, title = {{AERO}: {A}n autonomous platform for continuous research}, author = {Valérie Hayot-Sasson and Abby Stevens and Nicholson Collier and Sudershan Sridhar and Kyle Conroy and J. Gregory Pauloski and Yadu Babuji and Maxime Gonthier and Nathaniel Hudson and Dante D. Sanchez-Gallegos and Ian Foster and Jonathan Ozik and Kyle Chard}, archiveprefix = {arXiv}, eprint = {2505.18408}, primaryclass = {cs.CE}, url = {https://arxiv.org/abs/2505.18408}, year = {2025} }
May 2025	D-Rex: Heterogeneity-Aware Reliability Framework and Adaptive Algorithms for Distributed Storage
	TLDR \| PDF \| Authors \| Code \| Publication \| BibTex \| ICS 2025
	TLDR: As data volumes grow rapidly, we need distributed storage systems. But storage devices often differ in size, speed, and reliability, which makes managing them hard. Erasure coding—a method that protects against data loss by spreading data across devices—is widely used, but it's computationally expensive and typically doesn’t account for differences in device performance. In this work, we address how to use erasure coding efficiently on diverse storage systems. Our goal is to store more data, reduce encoding/decoding time, and meet reliability needs for each piece of data. We introduce two adaptive scheduling algorithms—D-Rex LB and D-Rex SC—that choose the right coding settings and assign data to devices based on their capabilities. D-Rex SC offers better storage use and speed but is more computationally demanding; D-Rex LB is faster with slightly lower performance. In tests, our methods stored 45% more data on average without hurting I/O performance compared to leading approaches.
	Authors: Maxime Gonthier, Dante D Sanchez-Gallegos, Haochen Pan, Bogdan Nicolae, Sicheng Zhou, Hai Duc Nguyen, Valerie Hayot-Sasson, J Gregory Pauloski, Jesus Carretero, Kyle Chard, Ian Foster
	@inproceedings{gonthier2025drex, title = {D-{R}ex: {H}eterogeneity-{A}ware {R}eliability {F}ramework and {A}daptive {A}lgorithms for {D}istributed {S}torage}, author = {Gonthier, Maxime and Sanchez-Gallegos, Dante D and Pan, Haochen and Nicolae, Bogdan and Zhou, Sicheng and Nguyen, Hai Duc and Hayot-Sasson, Valerie and Pauloski, J Gregory and Carretero, Jesus and Chard, Kyle and Foster, Ian}, address = {Salt Lake City, United States}, booktitle = {{ICS'25: The 2025 ACM International Conference on Supercomputing}}, doi = {10.1145/3721145.3730412}, month = {June}, organization = {{ACM}}, pdf = {https://hal.science/hal-05097054v1/file/D_rex_paper.pdf}, url = {https://hal.science/hal-05097054}, year = {2025} }
Mar 2025	WRATH: Workload Resilience Across Task Hierarchies in Task-based Parallel Programming Frameworks
	TLDR \| PDF \| Authors \| Preprint \| BibTex \| CCGrid 2025
	TLDR: Failures in Task-based Parallel Programming (TBPP) can severely degrade performance and result in incomplete or incorrect outcomes. Existing failure-handling approaches, including reactive, proactive, and resilient methods such as retry and checkpointing mechanisms, often apply uniform retry mechanisms regardless of the root cause of failures, failing to account for the unique characteristics of TBPP frameworks such as heterogeneous resource availability and task-level failures. We describe WRATH, a novel systematic approach that categorizes failures based on the unique layered structure of TBPP frameworks and defines specific responses to address failures at different layers. WRATH combines a distributed monitoring system and a resilient module to collaboratively address different types of failures in real time. The monitoring system captures execution and resource information, reports failures, and profiles tasks across different layers of TBPP frameworks. The resilient module then categorizes failures and responds with appropriate actions, such as hierarchically retrying failed tasks on suitable resources.
	Authors: Sicheng Zhou, Zhuozhao Li, Valerie Hayot-Sasson, Haochen Pan, Maxime Gonthier, J. Gregory Pauloski, Ryan Chard, Kyle Chard, Ian Foster
	@misc{zhou2025wrath, title = {{WRATH}: {W}orkload {R}esilience {A}cross {T}ask {H}ierarchies in {T}ask-based {P}arallel {P}rogramming {F}rameworks}, author = {Sicheng Zhou and Zhuozhao Li and Valérie Hayot-Sasson and Haochen Pan and Maxime Gonthier and J. Gregory Pauloski and Ryan Chard and Kyle Chard and Ian Foster}, archiveprefix = {arXiv}, eprint = {2503.12752}, primaryclass = {cs.DC}, url = {https://arxiv.org/abs/2503.12752}, year = {2025} }
Dec 2024	Object Proxy Patterns for Accelerating Distributed Applications
	TLDR \| PDF \| Authors \| Website \| Code \| Publication \| BibTex \| TPDS 2024
	TLDR: In prior work, we demonstrated the transparent object proxy, which provides wide-area references that can resolve to data regardless of location, as an effective low-level building block for data flow optimization in distributed application design. Here we propose three high-level proxy-based programming patterns---distributed futures, streaming, and ownership---that make the power of the proxy pattern usable for more complex and dynamic distributed program structures. We motivate these patterns via careful review of application requirements and describe implementations of each pattern. We evaluate our implementations through a suite of benchmarks and by applying them in three substantial scientific applications, in which we demonstrate substantial improvements in runtime, throughput, and memory usage.
	Authors: J. Gregory Pauloski, Valerie Hayot-Sasson, Logan Ward, Alexander Brace, André Bauer, Kyle Chard, Ian Foster
	@article{pauloski2024proxystore, title = {Object {P}roxy {P}atterns for {A}ccelerating {D}istributed {A}pplications}, author = {Pauloski, J. Gregory and Hayot-Sasson, Valerie and Ward, Logan and Brace, Alexander and Bauer, André and Chard, Kyle and Foster, Ian}, doi = {10.1109/TPDS.2024.3511347}, journal = {IEEE Transactions on Parallel and Distributed Systems}, pages = {1-13}, year = {2024} }
Nov 2024	Establishing a High-Per. and Productive Ecosystem for Dist. Execution of Python Functions Using Globus Compute
	TLDR \| PDF \| Authors \| Website \| Code \| Slides \| BibTex \| HUST @ SC 2024
	TLDR: The research computing ecosystem is increasingly heterogeneous and diverse. Democratizing access to these essential resources is critical for accelerating research progress. However, the gap between a high-level workload, such as Python in a Jupyter notebook, and the resources and interfaces exposed by HPC systems is significant. Users must securely authenticate, manage network connections, deploy and manage software, provision and configure nodes, and manage workload execution. Globus Compute reduces these barriers by providing a managed, fire-and-forget model that enables execution of Python functions across any resource to which a user has access. However, while Globus Compute has relieved users from many of the challenges of remote computing, we have observed some inefficiencies that remain in terms of use. For example, many users wrap external applications, such as C/C++, Fortran, and even MPI applications, in Python functions and users must deploy many endpoints on a single computer to exploit different configurations. We describe enhancements to Globus Compute to address these barriers: an asynchronous, future-based executor interface for submitting and monitoring tasks, shell and MPI-based function types, and a multi-user endpoint that can be deployed by administrators and used by authorized users.
	Authors: (Alphabetical) Rachana Ananthakrishnan, Yadu Babuji, Josh Bryan, Kyle Chard, Ryan Chard, Ben Clifford, Ian Foster, Lev Gorenstein, Kevin Hunter Kesling, Chris Janidlo, Daniel Katz, Reid Mello, J. Gregory Pauloski, Lei Wang
	@inproceedings{ananthakrishnan2024compute, title = {Establishing a {H}igh-{P}erformance and {P}roductive {E}cosystem for {D}istributed {E}xecution of {P}ython {F}unctions {U}sing {G}lobus {C}ompute}, author = {Rachana Ananthakrishnan and Yadu Babuji and Josh Bryan and Kyle Chard and Ryan Chard and Ben Clifford and Ian Foster and Lev Gorenstein and Kevin Hunter Kesling and Chris Janidlo and Daniel Katz and Reid Mello and J. Gregory Pauloski and Lei Wang}, booktitle = {IEEE/ACM International Workshop on HPC User Support Tools (HUST)}, doi = {10.1109/SCW63240.2024.00083}, year = {2024} }
Oct 2024	Accelerating Python Applications with Dask and ProxyStore
	TLDR \| PDF \| Authors \| Code \| Slides \| Preprint \| BibTex \| arXiv Preprint & HPPSS 2024 Demo
	TLDR: Applications are increasingly written as dynamic workflows underpinned by an execution framework that manages asynchronous computations across distributed hardware. However, execution frameworks typically offer one-size-fits-all solutions for data flow management, which can restrict performance and scalability. ProxyStore, a middleware layer that optimizes data flow via an advanced pass-by-reference paradigm, has shown to be an effective mechanism for addressing these limitations. Here, we investigate integrating ProxyStore with Dask Distributed, one of the most popular libraries for distributed computing in Python, with the goal of supporting scalable and portable scientific workflows. Dask provides an easy-to-use and flexible framework, but is less optimized for scaling certain data-intensive workflows. We investigate these limitations and detail the technical contributions necessary to develop a robust solution for distributed applications and demonstrate improved performance on synthetic benchmarks and real applications.
	Authors: J. Gregory Pauloski, Klaudiusz Rydzy, Valerie Hayot-Sasson, Ian Foster, Kyle Chard
	@misc{pauloski2024accelerating, title = {Accelerating {P}ython {A}pplications with {D}ask and {ProxyStore}}, author = {J. Gregory Pauloski and Klaudiusz Rydzy and Valerie Hayot-Sasson and Ian Foster and Kyle Chard}, archiveprefix = {arXiv}, eprint = {2410.12092}, primaryclass = {cs.DC}, url = {https://arxiv.org/abs/2410.12092}, year = {2024} }
Sep 2024	TaPS: A Performance Evaluation Suite for Task-based Execution Frameworks
	TLDR \| PDF \| Authors \| Website \| Code \| Slides \| Publication \| BibTex \| eScience 2024 — Best Paper
	TLDR: Task-based execution frameworks, such as parallel programming libraries, computational workflow systems, and function-as-a-service platforms, enable the composition of distinct tasks into a single, unified application designed to achieve a computational goal. Research into these task executors has accelerated as computational sciences increasingly need to take advantage of parallel compute and/or heterogeneous hardware. However, the lack of evaluation standards makes it challenging to compare and contrast novel systems against existing implementations. Here, we introduce TaPS, the Task Performance Suite, to support continued research in parallel task executor frameworks. TaPS provides (1) a unified, modular interface for writing and evaluating applications using arbitrary execution frameworks and data management systems and (2) an initial set of reference synthetic and real-world science applications.
	Authors: J. Gregory Pauloski, Valerie Hayot-Sasson, Maxime Gonthier, Nathaniel Hudson, Haochen Pan, Sicheng Zhou, Ian Foster, Kyle Chard
	@inproceedings{pauloski2024taps, title = {{TaPS}: {A} {P}erformance {E}valuation {S}uite for {T}ask-based {E}xecution {F}rameworks}, author = {Pauloski, J. Gregory and Hayot-Sasson, Valerie and Gonthier, Maxime and Hudson, Nathaniel and Pan, Haochen and Zhou, Sicheng and Foster, Ian and Chard, Kyle}, address = {New York, NY, USA}, booktitle = {IEEE 20th International Conference on e-Science}, doi = {10.1109/e-Science62913.2024.10678702}, pages = {1-10}, publisher = {IEEE}, year = {2024} }
Sep 2024	An Empirical Investigation of Container Building [...] to Reduce Cold Starts in Sci. Computing Serverless Functions
	TLDR \| PDF \| Authors \| Publication \| BibTex \| eScience 2024
	TLDR: Serverless platforms dynamically create execution environments, often using containers. The cost to create and deploy these environments is known as "cold start" latency, and this cost can be particularly detrimental to scientific computing workloads characterized by sporadic and dynamic demands. We investigate methods to mitigate cold start issues in scientific computing applications by pre-installing Python packages in container images. Using data from Globus Compute and Binder, we empirically analyze cold start behavior and evaluate four strategies for building containers, including fully pre-built environments and dynamic, on-demand installations. Our results show that pre-installing all packages reduces initial cold start time but requires significant storage. Conversely, dynamic installation offers lower storage requirements but incurs repetitive delays. Additionally, we implemented a simulator and assessed the impact of different warm times, finding that moderate warm times significantly reduce cold starts without the excessive overhead of maintaining always-hot states.
	Authors: André Bauer, Maxime Gonthier, Haochen Pan, Ryan Chard, Daniel Grzenda, Martin Straesser, J. Gregory Pauloski, Alok Kamatar, Matt Baughman, Nathaniel Hudson, Ian Foster, Kyle Chard
	@inproceedings{bauer2024containers, title = {An {E}mpirical {I}nvestigation of {C}ontainer {B}uilding {S}trategies and {W}arm {T}imes to {R}educe {C}old {S}tarts in {S}cientific {C}omputing {S}erverless {F}unctions}, author = {Bauer, André and Gonthier, Maxime and Pan, Haochen and Chard, Ryan and Grzenda, Daniel and Straesser, Martin and Pauloski, J. Gregory and Kamatar, Alok and Baughman, Matt and Hudson, Nathaniel and Foster, Ian and Chard, Kyle}, booktitle = {IEEE 20th International Conference on e-Science (e-Science)}, doi = {10.1109/e-Science62913.2024.10678668}, pages = {1-10}, year = {2024} }
Nov 2023	Accelerating Communications in Federated Applications with Transparent Object Proxies
	TLDR \| PDF \| Authors \| Website \| Code \| Poster \| Slides \| Publication \| BibTex \| SC 2023
	TLDR: We describe ProxyStore, a system that decouples control flow from data flow by extending the pass-by-reference model to distributed applications using object proxies that act as wide-area object references with just-in-time resolution. This proxy model enables data producers to communicate data unilaterally, transparently, and efficiently to both local and remote consumers. We demonstrate the benefits of this model with synthetic benchmarks and real-world scientific applications, running across various computing platforms.
	Authors: J. Gregory Pauloski, Valerie Hayot-Sasson, Logan Ward, Nathaniel Hudson, Charlie Sabino, Matt Baughman, Kyle Chard, Ian Foster
	@inproceedings{pauloski2023proxystore, title = {Accelerating {C}ommunications in {F}ederated {A}pplications with {T}ransparent {O}bject {P}roxies}, author = {Pauloski, J. Gregory and Hayot-Sasson, Valerie and Ward, Logan and Hudson, Nathaniel and Sabino, Charlie and Baughman, Matt and Chard, Kyle and Foster, Ian}, address = {New York, NY, USA}, articleno = {59}, booktitle = {Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis}, doi = {10.1145/3581784.3607047}, isbn = {9798400701092}, location = {Denver, CO, USA}, numpages = {15}, publisher = {Association for Computing Machinery}, series = {SC '23}, url = {https://doi.org/10.1145/3581784.3607047}, year = {2023} }

SCALABLE DEEP LEARNING

Feb 2025	COMPSO: Optimizing Gradient Compression for Distributed Training with Second-Order Optimizers
	TLDR \| PDF \| Authors \| Publication \| BibTex \| PPoPP 2025
	TLDR: Gradient compression, a technique commonly used to accelerate communication for first-order approaches, often results in low communication reduction ratios, decreased model accuracy, and/or high compression overhead when applied to second-order methods. We introduce a novel gradient compression method for second-order optimizers called COMPSO, which reduces communication costs while preserving the advantages of second-order optimization. COMPSO employs stochastic rounding to maintain accuracy and filters out minor gradients to improve compression ratios. We develop GPU optimizations to minimize compression overhead and performance modeling to ensure end-to-end performance gains across various systems. Evaluation of COMPSO on different DNN models shows that it achieves a compression ratio of 22.1x, reduces communication time by 14.2x, and improves overall performance by 1.9x, all without any drop in model accuracy..
	Authors: Baixi Sun, eijin Liu, J. Gregory Pauloski, Jiannan Tian, Jinda Jia, Daoce Wang, Boyuan Zhang, Mingkai Zheng, Sheng Di, Sian Jin, Zhao Zhang, Xiaodong Yu, Kamil A. Iskra, Pete Beckman, Guangming Tan, Dingwen Tao
	@inproceedings{sun2025compso, title = {{COMPSO}: {O}ptimizing {G}radient {C}ompression for {D}istributed {T}raining with {S}econd-{O}rder {O}ptimizers}, author = {Sun, Baixi and Liu, Weijin and Pauloski, J. Gregory and Tian, Jiannan and Jia, Jinda and Wang, Daoce and Zhang, Boyuan and Zheng, Mingkai and Di, Sheng and Jin, Sian and Zhang, Zhao and Yu, Xiaodong and Iskra, Kamil A. and Beckman, Pete and Tan, Guangming and Tao, Dingwen}, address = {New York, NY, USA}, booktitle = {Proceedings of the 30th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming}, doi = {10.1145/3710848.3710852}, isbn = {9798400714436}, location = {Las Vegas, NV, USA}, numpages = {13}, pages = {212–224}, publisher = {Association for Computing Machinery}, series = {PPoPP '25}, url = {https://doi.org/10.1145/3710848.3710852}, year = {2025} }
Sep 2024	Flight: A FaaS-Based Framework for Complex and Hierarchical Federated Learning
	TLDR \| PDF \| Authors \| Code \| Preprint \| BibTex \| arXiv Preprint
	TLDR: Federated Learning (FL) is a decentralized machine learning paradigm where models are trained on distributed devices and are aggregated at a central server. Existing FL frameworks assume simple two-tier network topologies where end devices are directly connected to the aggregation server. While this is a practical mental model, it does not exploit the inherent topology of real-world distributed systems like the Internet-of-Things. We present Flight, a novel FL framework that supports complex hierarchical multi-tier topologies, asynchronous aggregation, and decouples the control plane from the data plane. We compare the performance of Flight against Flower, a state-of-the-art FL framework. Our results show that Flight scales beyond Flower, supporting up to 2048 simultaneous devices, and reduces FL makespan across several models. Finally, we show that Flight's hierarchical FL model can reduce communication overheads by more than 60%.
	Authors: Nathaniel Hudson, Valerie Hayot-Sasson, Yadu Babuji, Matt Baughman, J. Gregory Pauloski, Ryan Chard, Ian Foster, Kyle Chard
	@misc{hudson2024flight, title = {Flight: {A} {FaaS}-{B}ased {F}ramework for {C}omplex and {H}ierarchical {F}ederated {L}earning}, author = {Nathaniel Hudson and Valerie Hayot-Sasson and Yadu Babuji and Matt Baughman and J. Gregory Pauloski and Ryan Chard and Ian Foster and Kyle Chard}, archiveprefix = {arXiv}, eprint = {2409.16495}, primaryclass = {cs.LG}, url = {https://arxiv.org/abs/2409.16495}, year = {2024} }
Dec 2023	Trillion Parameter AI Serving Infrastructure for Scientific Discovery: A Survey and Vision
	TLDR \| PDF \| Authors \| Publication \| BibTex \| BDCAT 2023
	TLDR: Deep learning methods are transforming research, enabling new techniques, and ultimately leading to new discoveries. As the demand for more capable AI models continues to grow, we are now entering an era of Trillion Parameter Models (TPM), or models with more than a trillion parameters---such as Huawei's PanGu-Σ. We describe a vision for the ecosystem of TPM users and providers that caters to the specific needs of the scientific community. We then outline the significant technical challenges and open problems in system design for serving TPMs to enable scientific research and discovery. Specifically, we describe the requirements of a comprehensive software stack and interfaces to support the diverse and flexible requirements of researchers.
	Authors: Nathaniel C. Hudson, J. Gregory Pauloski, Matt Baughman, Alok Kamatar, Mansi Sakarvadia, Logan Ward, Ryan Chard, André Bauer, Maksim Levental, Wenyi Wang, Will Engler, Owen Price Skelly, Ben Blaiszik, Rick Stevens, Kyle Chard, Ian Foster
	@inproceedings{hudson2023trillion, title = {Trillion {P}arameter {AI} {S}erving {I}nfrastructure for {S}cientific {D}iscovery: {A} {S}urvey and {V}ision}, author = {Hudson, Nathaniel C and Pauloski, J. Gregory and Baughman, Matt and Kamatar, Alok and Sakarvadia, Mansi and Ward, Logan and Chard, Ryan and Bauer, Andr\'{e} and Levental, Maksim and Wang, Wenyi and Engler, Will and Price Skelly, Owen and Blaiszik, Ben and Stevens, Rick and Chard, Kyle and Foster, Ian}, address = {New York, NY, USA}, articleno = {15}, booktitle = {Proceedings of the IEEE/ACM 10th International Conference on Big Data Computing, Applications and Technologies}, doi = {10.1145/3632366.3632396}, isbn = {9798400704734}, location = {Taormina (Messina), Italy}, numpages = {10}, publisher = {Association for Computing Machinery}, series = {BDCAT '23}, url = {https://doi.org/10.1145/3632366.3632396}, year = {2024} }
Mar 2022	Deep Neural Network Training With Distributed K-FAC
	TLDR \| PDF \| Authors \| Code \| Publication \| BibTex \| TPDS 2022
	TLDR: We extend our SC 2020 paper to evaluate the convergence and scaling properties of our K-FAC gradient preconditioner, for image classification, object detection, and language modeling applications. In all applications, our implementation converges to baseline performance targets in 9—25% less time than the standard first-order optimizers on GPU clusters across a variety of scales.
	Authors: J. Gregory Pauloski, Lei Huang, Weijia Xu, Kyle Chard, Ian Foster, Zhao Zhang
	@article{pauloski2022kfac, title = {Deep {N}eural {N}etwork {T}raining {W}ith {D}istributed {K}-{FAC}}, author = {Pauloski, J. Gregory and Huang, Lei and Xu, Weijia and Chard, Kyle and Foster, Ian T. and Zhang, Zhao}, doi = {10.1109/TPDS.2022.3161187}, journal = {IEEE Transactions on Parallel and Distributed Systems}, number = {12}, pages = {3616-3627}, volume = {33}, year = {2022} }
Nov 2021	KAISA: An Adaptive Second-Order Optimizer Framework for Deep Neural Networks
	TLDR \| PDF \| Authors \| Code \| Slides \| Publication \| BibTex \| SC 2021
	TLDR: We present KAISA, a K-FAC-enabled, Adaptable, Improved, and ScAlable second-order optimizer framework that adapts the memory footprint, communication, and computation given specific models and hardware to improve performance and increase scalability. Compared to the original optimizers, KAISA converges 18.1—36.3% faster across applications with the same global batch size.
	Authors: J. Gregory Pauloski, Qi Huang, Lei Huang, Shivaram Venkataraman, Kyle Chard, Ian Foster, Zhao Zhang
	@inproceedings{pauloski2021kaisa, title = {{KAISA}: {A}n {A}daptive {S}econd-{O}rder {O}ptimizer {F}ramework for {D}eep {N}eural {N}etworks}, author = {Pauloski, J. Gregory and Huang, Qi and Huang, Lei and Venkataraman, Shivaram and Chard, Kyle and Foster, Ian and Zhang, Zhao}, address = {New York, NY, USA}, articleno = {13}, booktitle = {Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis}, doi = {10.1145/3458817.3476152}, isbn = {9781450384421}, location = {St. Louis, Missouri}, numpages = {14}, publisher = {Association for Computing Machinery}, series = {SC '21}, url = {https://doi.org/10.1145/3458817.3476152}, year = {2021} }
Nov 2020	Convolutional Neural Network Training with Distributed K-FAC
	TLDR \| PDF \| Authors \| Code \| Slides \| Publication \| BibTex \| SC 2020
	TLDR: We study optimization techniques such as layer-wise distribution strategies, inverse-free second-order gradient evaluation, and dynamic K-FAC update decoupling to reduce training time while preserving convergence. Our distributed optimizer design trains Resnet-50 18—25% faster than SGD.
	Authors: J. Gregory Pauloski, Zhao Zhang, Lei Huang, Weijia Xu, Ian Foster
	@inproceedings{pauloski2020kfac, title = {Convolutional {N}eural {N}etwork {T}raining with {D}istributed {K}-{FAC}}, author = {Pauloski, J. Gregory and Zhang, Zhao and Huang, Lei and Xu, Weijia and Foster, Ian T.}, articleno = {94}, booktitle = {Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis}, doi = {10.5555/3433701.3433826}, isbn = {9781728199986}, location = {Atlanta, Georgia}, numpages = {14}, publisher = {IEEE Press}, series = {SC '20}, year = {2020} }
May 2020	Efficient I/O for Neural Network Training with Compressed Data
	TLDR \| PDF \| Authors \| Code \| Publication \| BibTex \| IPDPS 2020
	TLDR: We investigate the tradeoff between runtime overhead and data compression ratio on real-world deep learning training datasets and applications. We show that storage can be reduced by 2—13x with minimal additional runtime overhead.
	Authors: Zhao Zhang, Lei Huang, J. Gregory Pauloski, Ian Foster
	@inproceedings{zhang2020compressed, title = {Efficient {I/O} for {N}eural {N}etwork {T}raining with {C}ompressed {D}ata}, author = {Z. {Zhang} and L. {Huang} and J. G. {Pauloski} and I. T. {Foster}}, booktitle = {IEEE International Parallel and Distributed Processing Symposium (IPDPS)}, doi = {10.1109/IPDPS47924.2020.00050}, pages = {409-418}, year = {2020} }
Dec 2019	Aggregating Local Storage for Scalable Deep Learning I/O
	TLDR \| PDF \| Authors \| Code \| Publication \| BibTex \| DLS 2019
	TLDR: We develop a a user-level transient object store that provides low-latency and scalable POSIX-compliant file access for scalable deep learning training.
	Authors: Zhao Zhang, Lei Huang, J. Gregory Pauloski, Ian Foster
	@inproceedings{zhang2019aggregating, title = {Aggregating {L}ocal {S}torage for {S}calable {D}eep {L}earning {I/O}}, author = {Z. {Zhang} and L. {Huang} and J. G. {Pauloski} and I. {Foster}}, booktitle = {IEEE/ACM Third Workshop on Deep Learning on Supercomputers (DLS)}, doi = {10.1109/DLS49591.2019.00014}, pages = {69-75}, year = {2019} }

AI FOR SCIENCE

May 2025	HiPerRAG: High-Performance Retrieval Augmented Generation for Scientific Insights
	TLDR \| PDF \| Authors \| Code \| Preprint \| BibTex \| PASC25
	TLDR: The volume of scientific literature is growing exponentially, leading to underutilized discoveries, duplicated efforts, and limited cross-disciplinary collaboration. Retrieval Augmented Generation (RAG) offers a way to assist scientists by improving the factuality of Large Language Models (LLMs) in processing this influx of information. However, scaling RAG to handle millions of articles introduces significant challenges, including the high computational costs associated with parsing documents and embedding scientific knowledge, as well as the algorithmic complexity of aligning these representations with the nuanced semantics of scientific content. We introduce HiPerRAG, a RAG workflow powered by high performance computing (HPC) to index and retrieve knowledge from more than 3.6 million scientific articles. At its core are Oreo, a high-throughput model for multimodal document parsing, and ColTrast, a query-aware encoder fine-tuning algorithm that enhances retrieval accuracy by using contrastive learning and late-interaction techniques. HiPerRAG delivers robust performance on existing scientific question answering benchmarks and two new benchmarks introduced in this work, achieving 90% accuracy on SciQ and 76% on PubMedQA-outperforming both domain-specific models like PubMedGPT and commercial LLMs such as GPT-4.
	Authors: Ozan Gokdemir, Carlo Siebenschuh, Alexander Brace, Azton Wells, Brian Hsu, Kyle Hippe, Priyanka V. Setty, Aswathy Ajith, J. Gregory Pauloski, Varuni Sastry, Sam Foreman, Huihuo Zheng, Heng Ma, Bharat Kale, Nicholas Chia, Thomas Gibbs, Michael E. Papka, Thomas Brettin, Francis J. Alexander, Anima Anandkumar, Ian Foster, Rick Stevens, Venkatram Vishwanath, Arvind Ramanathan
	@article{gokdemir2025hiperrag, title = {{HiPerRAG}: {H}igh-{P}erformance {R}etrieval {A}ugmented {G}eneration for {S}cientific {I}nsights}, author = {Ozan Gokdemir, Carlo Siebenschuh, Alexander Brace, Azton Wells, Brian Hsu, Kyle Hippe, Priyanka V. Setty, Aswathy Ajith, J. Gregory Pauloski, Varuni Sastry, Sam Foreman, Huihuo Zheng, Heng Ma, Bharat Kale, Nicholas Chia, Thomas Gibbs, Michael E. Papka, Thomas Brettin, Francis J. Alexander, Anima Anandkumar, Ian Foster, Rick Stevens, Venkatram Vishwanath, Arvind Ramanathan}, journal = {arXiv preprint arXiv:2505.04846}, year = {2025} }
Jan 2025	MOFA: Discovering Materials for Carbon Capture with a GenAI- and Simulation-Based Workflow
	TLDR \| PDF \| Authors \| Code \| Preprint \| BibTex \| arXiv Preprint
	TLDR: We present MOFA, an open-source generative AI (GenAI) plus simulation workflow for high-throughput generation of metal-organic frameworks (MOFs) on large-scale high-performance computing (HPC) systems. MOFA addresses key challenges in integrating GPU-accelerated computing for GPU-intensive GenAI tasks, including distributed training and inference, alongside CPU- and GPU-optimized tasks for screening and filtering AI-generated MOFs using molecular dynamics, density functional theory, and Monte Carlo simulations. These heterogeneous tasks are unified within an online learning framework that optimizes the utilization of available CPU and GPU resources across HPC systems. Performance metrics from a 450-node (14,400 AMD Zen 3 CPUs + 1800 NVIDIA A100 GPUs) supercomputer run demonstrate that MOFA achieves high-throughput generation of novel MOF structures, with CO2 adsorption capacities ranking among the top 10 in the hypothetical MOF (hMOF) dataset. Furthermore, the production of high-quality MOFs exhibits a linear relationship with the number of nodes utilized. The modular architecture of MOFA will facilitate its integration into other scientific applications that dynamically combine GenAI with large-scale simulations.
	Authors: Xiaoli Yan, Nathaniel Hudson, Hyun Park, Daniel Grzenda, J. Gregory Pauloski, Marcus Schwarting, Haochen Pan, Hassan Harb, Samuel Foreman, Chris Knight, Tom Gibbs, Kyle Chard, Santanu Chaudhuri, Emad Tajkhorshid, Ian Foster, Mohamad Moosavi, Logan Ward, E. A. Huerta
	@misc{yan2025mofa, title = {{MOFA}: {D}iscovering {M}aterials for {C}arbon {C}apture with a {GenAI}- and {S}imulation-{B}ased {W}orkflow}, author = {Xiaoli Yan and Nathaniel Hudson and Hyun Park and Daniel Grzenda and J. Gregory Pauloski and Marcus Schwarting and Haochen Pan and Hassan Harb and Samuel Foreman and Chris Knight and Tom Gibbs and Kyle Chard and Santanu Chaudhuri and Emad Tajkhorshid and Ian Foster and Mohamad Moosavi and Logan Ward and E. A. Huerta}, archiveprefix = {arXiv}, eprint = {2501.10651}, primaryclass = {cs.DC}, url = {https://arxiv.org/abs/2501.10651}, year = {2025} }
Oct 2024	Employing Artificial Intelligence to Steer Exascale Workflows with Colmena
	TLDR \| PDF \| Authors \| Website \| Code \| Publication \| BibTex \| IJHPCA 2024
	TLDR: We created Colmena to leverage the massive parallelism of a supercomputer by using Artificial Intelligence (AI) to learn from and adapt a workflow as it executes. Colmena allows scientists to define how their application should respond to events (e.g., task completion) as a series of cooperative agents. In this paper, we describe the design of Colmena, the challenges we overcame while deploying applications on exascale systems, and the science workflows we have enhanced through interweaving AI.
	Authors: Logan Ward, J. Gregory Pauloski, Valerie Hayot-Sasson, Yadu Babuji, Alexander Brace, Ryan Chard, Kyle Chard, Rajeev Thakur, Ian Foster
	@article{ward2024colmena, title = {Employing {A}rtificial {I}ntelligence to {S}teer {E}xascale {W}orkflows with {C}olmena}, author = {Logan Ward and J. Gregory Pauloski and Valerie Hayot-Sasson and Yadu Babuji and Alexander Brace and Ryan Chard and Kyle Chard and Rajeev Thakur and Ian Foster}, doi = {10.1177/10943420241288242}, eprint = {https://doi.org/10.1177/10943420241288242}, journal = {The International Journal of High Performance Computing Applications}, number = {0}, pages = {10943420241288242}, url = {https://doi.org/10.1177/10943420241288242}, volume = {0}, year = {0} }
Nov 2023	GenSLMs: Genome-scale Language Models Reveal SARS-CoV-2 Evolutionary Dynamics
	TLDR \| PDF \| Authors \| Code \| Publication \| BibTex \| IJHPCA — ACM Gordon Bell Special Prize for COVID-19 Research
	TLDR: We build genome-scale language models (GenSLMs) which can learn the evolutionary landscape of SARS-CoV-2 genomes. By pretraining on over 110 million prokaryotic gene sequences, and then finetuning a SARS-CoV-2 specific model on 1.5 million genomes, we show that GenSLM can accurately and rapidly identify variants of concern.
	Authors: Maxim Zvyagin, Alexander Brace, Kyle Hippe, Yuntian Deng, Bin Zhang, Cindy Orozco Bohorquez, Austin Clyde, Bharat Kale, Danilo Perez-Rivera, Heng Ma, Carla M Mann, Michael Irvin, J Gregory Pauloski, Logan Ward, Valerie Hayot-Sasson, Murali Emani, Sam Foreman, Zhen Xie, Diangen Lin, Maulik Shukla, Weili Nie, Josh Romero, Christian Dallago, Arash Vahdat, Chaowei Xiao, Thomas Gibbs, Ian Foster, James J Davis, Michael E Papka, Thomas Brettin, Rick Stevens, Anima Anandkumar, Venkatram Vishwanath, Arvind Ramanathan
	@article{zvyagin2023genslms, title = {{GenSLMs}: {G}enome-scale language models reveal {SARS}-{CoV}-2 evolutionary dynamics}, author = {Zvyagin, Maxim and Brace, Alexander and Hippe, Kyle and Deng, Yuntian and Zhang, Bin and Bohorquez, Cindy Orozco and Clyde, Austin and Kale, Bharat and Perez-Rivera, Danilo and Ma, Heng and others}, journal = {The International Journal of High Performance Computing Applications}, number = {6}, pages = {683--705}, publisher = {SAGE Publications Sage UK: London, England}, volume = {37}, year = {2023} }
Nov 2023	DeepSpeed4Science Initiative: Enabling Large-Scale Scientific Discovery through Sophisticated AI System Technologies
	TLDR \| PDF \| Authors \| Website \| Preprint \| BibTex \| arXiv Preprint
	TLDR: We present the DeepSpeed4Science initiative which aims to build unique capabilities through AI system technology innovations to help domain experts to unlock today's biggest science mysteries. By leveraging DeepSpeed's current technology pillars (training, inference and compression) as base technology enablers, DeepSpeed4Science will create a new set of AI system technologies tailored for accelerating scientific discoveries by addressing their unique complexity beyond the common technical approaches used for accelerating generic large language models.
	Authors: Collaboration between Microsoft, Rutgers University, University of Sydney, Columbia University, Harvard University, Argonne National Laboratory, University of Chicago, Oak Ridge National Laboratory, Brookhaven National Laboratory, Princeton University, AMD, and NVIDIA
	@misc{song2023deepspeed4science, title = {{DeepSpeed4Science} {I}nitiative: {E}nabling {L}arge-{S}cale {S}cientific {D}iscovery through {S}ophisticated {AI} {S}ystem {T}echnologies}, author = {Shuaiwen Leon Song and Bonnie Kruft and Minjia Zhang and Conglong Li and Shiyang Chen and Chengming Zhang and Masahiro Tanaka and Xiaoxia Wu and Jeff Rasley and Ammar Ahmad Awan and Connor Holmes and Martin Cai and Adam Ghanem and Zhongzhu Zhou and Yuxiong He and Pete Luferenko and Divya Kumar and Jonathan Weyn and Ruixiong Zhang and Sylwester Klocek and Volodymyr Vragov and Mohammed AlQuraishi and Gustaf Ahdritz and Christina Floristean and Cristina Negri and Rao Kotamarthi and Venkatram Vishwanath and Arvind Ramanathan and Sam Foreman and Kyle Hippe and Troy Arcomano and Romit Maulik and Maxim Zvyagin and Alexander Brace and Bin Zhang and Cindy Orozco Bohorquez and Austin Clyde and Bharat Kale and Danilo Perez-Rivera and Heng Ma and Carla M. Mann and Michael Irvin and J. Gregory Pauloski and Logan Ward and Valerie Hayot and Murali Emani and Zhen Xie and Diangen Lin and Maulik Shukla and Ian Foster and James J. Davis and Michael E. Papka and Thomas Brettin and Prasanna Balaprakash and Gina Tourassi and John Gounley and Heidi Hanson and Thomas E Potok and Massimiliano Lupo Pasini and Kate Evans and Dan Lu and Dalton Lunga and Junqi Yin and Sajal Dash and Feiyi Wang and Mallikarjun Shankar and Isaac Lyngaas and Xiao Wang and Guojing Cong and Pei Zhang and Ming Fan and Siyan Liu and Adolfy Hoisie and Shinjae Yoo and Yihui Ren and William Tang and Kyle Felker and Alexey Svyatkovskiy and Hang Liu and Ashwin Aji and Angela Dalton and Michael Schulte and Karl Schulz and Yuntian Deng and Weili Nie and Josh Romero and Christian Dallago and Arash Vahdat and Chaowei Xiao and Thomas Gibbs and Anima Anandkumar and Rick Stevens}, archiveprefix = {arXiv}, eprint = {2310.04610}, primaryclass = {cs.AI}, year = {2023} }
May 2023	The Diminishing Returns of Masked Language Models to Science
	TLDR \| PDF \| Authors \| Website \| Publication \| BibTex \| Findings of the Association for Computational Linguistics: ACL 2023
	TLDR: We use 14 domain-specific transformer based models (including ScholarBERT, a new 770M-parameter science-focused masked language model pretrained on up to 225B tokens) to evaluate the impact of training data, model size, pretraining and finetuning time on 12 downstream scientific tasks. Interestingly, we find that increasing model sizes, training data, or compute time does not always lead to measurable improvements for scientific information extraction tasks.
	Authors: Zhi Hong, Aswathy Ajith, J. Gregory Pauloski, Eamon Duede, Kyle Chard, Ian Foster
	@inproceedings{hong2023scholarbert, title = {The {D}iminishing {R}eturns of {M}asked {L}anguage {M}odels to {S}cience}, author = {Hong, Zhi and Ajith, Aswathy and Pauloski, J. Gregory and Duede, Eamon and Chard, Kyle and Foster, Ian}, address = {Toronto, Canada}, booktitle = {Findings of the Association for Computational Linguistics: ACL 2023}, doi = {10.18653/v1/2023.findings-acl.82}, editor = {Rogers, Anna and Boyd-Graber, Jordan and Okazaki, Naoaki}, month = {July}, pages = {1270--1283}, publisher = {Association for Computational Linguistics}, url = {https://aclanthology.org/2023.findings-acl.82}, year = {2023} }
Mar 2023	Cloud Services Enable Efficient AI-Guided Simulation Workflows across Heterogeneous Resources
	TLDR \| PDF \| Authors \| Code \| Publication \| BibTex \| HCW @ IPDPS 2023
	TLDR: We describe our experiences in building and deploying AI driven workflows across multiple computing sites without networking hassles and without losing performance using Colmena, Globus, FuncX, and ProxyStore.
	Authors: Logan Ward, J. Gregory Pauloski, Valerie Hayot-Sasson, Ryan Chard, Yadu Babuji, Ganesh Sivaraman, Sutanay Choudhury, Kyle Chard, Rajeev Thakur, Ian Foster
	@inproceedings{ward2023colmena, title = {Cloud {S}ervices {E}nable {E}fficient {AI}-{G}uided {S}imulation {W}orkflows across {H}eterogeneous {R}esources}, author = {Ward, Logan and Pauloski, J. Gregory and Hayot-Sasson, Valerie and Chard, Ryan and Babuji, Yadu and Sivaraman, Ganesh and Choudhury, Sutanay and Chard, Kyle and Thakur, Rajeev and Foster, Ian}, booktitle = {IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)}, doi = {10.1109/IPDPSW59300.2023.00018}, pages = {32-41}, year = {2023} }
Nov 2021	Colmena: Scalable Machine-Learning-Based Steering of Ensemble Simulations for High Performance Computing
	TLDR \| PDF \| Authors \| Website \| Code \| Publication \| BibTex \| MLHPC @ SC 2021
	TLDR: We present Colmena, an open-source Python framework that allows users to steer massive computational campaigns by providing just the implementations of individual tasks plus the logic used to choose which tasks to execute when. We describe the design of Colmena and illustrate its capabilities by applying it to electrolyte design, where it both scales to 65536 CPUs and accelerates the discovery rate for high-performance molecules by a factor of 100 over unguided searches.
	Authors: Logan Ward, Ganesh Sivaraman, J. Gregory Pauloski, Yadu Babuji, Ryan Chard, Naveen Dandu, Paul C. Redfern, Rajeev S. Assary, Kyle Chard, Larry A. Curtiss, Rajeev Thakur, Ian Foster
	@inproceedings{ward2021colmena, title = {Colmena: {S}calable {M}achine-{L}earning-{B}ased {S}teering of {E}nsemble {S}imulations for {H}igh {P}erformance {C}omputing}, author = {Ward, Logan and Sivaraman, Ganesh and Pauloski, J. Gregory and Babuji, Yadu and Chard, Ryan and Dandu, Naveen and Redfern, Paul C. and Assary, Rajeev S. and Chard, Kyle and Curtiss, Larry A. and Thakur, Rajeev and Foster, Ian}, booktitle = {IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments (MLHPC)}, doi = {10.1109/MLHPC54614.2021.00007}, pages = {9-20}, year = {2021} }
Aug 2021	Models and Processes to Extract Drug-like Molecules From Natural Language Text
	TLDR \| PDF \| Authors \| Publication \| BibTex \| Frontiers in Molecular Biosciences
	TLDR: We present (1) an iterative model-in-the-loop method that makes judicious use of scarce human expertise in generating training data for an NER model and (2) the application and evaluation of this method to identifying drug-like molecules in the COVID-19 Open Research Dataset Challenge (CORD-19) corpus of 198,875 papers.
	Authors: Zhi Hong, J. Gregory Pauloski, Logan Ward, Kyle Chard, Ben Blaiszik, Ian Foster
	@article{hong2021moleculesnlp, title = {Models and {P}rocesses to {E}xtract {D}rug-like {M}olecules {F}rom {N}atural {L}anguage {T}ext}, author = {Hong, Zhi and Pauloski, J. Gregory and Ward, Logan and Chard, Kyle and Blaiszik, Ben and Foster, Ian}, doi = {10.3389/fmolb.2021.636077}, issn = {2296-889X}, journal = {Frontiers in Molecular Biosciences}, pages = {826}, url = {https://www.frontiersin.org/article/10.3389/fmolb.2021.636077}, volume = {8}, year = {2021} }
Nov 2018	Glioma Segmentation and a Simple Accurate Model for Overall Survival Prediction
	TLDR \| PDF \| Authors \| Publication \| BibTex \| BrainLes 2018
	TLDR: We develop a multi-stage pipeline for accurate patient survival prediction from brain tumor MRI scans. We segment tumor subvolumes using a multi-scale convolutional network, extract intensity and shape features, then use an ensemble of machine learning models to predict patient outcomes.
	Authors: Evan Gates, J. Gregory Pauloski, Dawid Schellingerhout, David Fuentes
	@inproceedings{gates2019glioma, title = {Glioma {S}egmentation and a {S}imple {A}ccurate {M}odel for {O}verall {S}urvival {P}rediction}, author = {Gates, Evan and Pauloski, J. Gregory and Schellingerhout, Dawid and Fuentes, David}, address = {Cham}, booktitle = {Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries}, editor = {Crimi, Alessandro and Bakas, Spyridon and Kuijf, Hugo and Keyvan, Farahani and Reyes, Mauricio and van Walsum, Theo}, isbn = {978-3-030-11726-9}, pages = {476--484}, publisher = {Springer International Publishing}, year = {2019} }

PRESENTATIONS

Ordered by most recent.

May 2025	Empowering Scientific Workflows with Federated Agents
	Poster \| Greater Chicago Area Systems Research Workshop (GCASR)
May 2025	Empowering Scientific Workflows with Federated Agents
	Slides \| Joint Laboratory for Extreme-Computing Workshop (JLESC)
Nov 2024	Accelerating Python Applications with Dask and ProxyStore
	Slides \| SC24 Workshop on High Performance Python for Science at Scale
Nov 2024	Accelerating Communications in High-Performance Scientific Workflows
	Slides \| Poster \| Doctoral Showcase at Supercomputing 2024
Nov 2024	Distributed Execution of Python Functions Using Globus Compute
	Slides \| SC24 Workshop on HPC User Support Tools
Sep 2024	TaPS: A Performance Evaluation Suite for Task-based Execution Frameworks
	Slides \| Video \| ParslFest
Sep 2024	TaPS: A Performance Evaluation Suite for Task-based Execution Frameworks
	Slides \| IEEE International Conference on eScience (eScience)
Nov 2023	Accelerating Communications in Federated Applications with Transparent Object Proxies
	Slides \| Supercomputing 2023
Oct 2023	ProxyStore: Decoupling Control and Data Flow in Workflows
	Slides \| Video \| ParslFest
Apr 2023	Accelerating Communications in Federated Applications with Transparent Object Proxies
	Poster \| Greater Chicago Area Systems Research Workshop (GCASR)
Sep 2022	ProxyStore: a Data Fabric for Parsl and FuncX
	Slides \| Video \| ParslFest
Nov 2021	KAISA: An Adaptive Second-Order Optimizer Framework for Deep Neural Networks
	Slides \| Supercomputing 2021
Nov 2020	Convolutional Neural Network Training with Distributed K-FAC
	Slides \| Supercomputing 2020
Sep 2018	Optimizing Deep Learning Methods for Image Segmentation with Distributed Training
	Poster \| TACC Symposium for Texas Researchers (TACCSTER)

Greg Pauloski

science RESEARCH link

book DISSERTATIONS link

engineering PROJECTS link

star SELECTED PUBLICATIONS link

article ALL PUBLICATIONS link

co_present PRESENTATIONS link