IRHPIT: Institute for Reliable High Performance Information Technology

Projects

Data Intensive Super Computing for Science (DISC)

This project explores how well data-intensive computing programming and runtime paradigms, such as MapReduce and other graph-based models, apply to scientific applications.

Lead: Julio Lopez, CMU
Project Mentors: Gary Grider, James Nunez, John Bent, Salman Habib

Publications

In Search of an API for Scalable File Systems: Under the table or above it? Swapnil Patil, Garth A. Gibson, Gregory R. Ganger, Julio Lopez, Milo Polte, Wittawat Tantisiriroj, and Lin Xiao. USENIX HotCloud Workshop 2009. June 2009, San Diego, CA.
Abstract / PDF [260K]

Introducing Map-Reduce to High End Computing. Grant Mackey, Saba Sehrish, Julio Lopez, John Bent, Salman Habib, Jun Wang. University of Central Florida, Carnegie Mellon University, and Los Alamos National Laboratory.
Paper (PDF) / Slides

File System and File Structure

The goal is to move HPC file system storage toward supporting file formats within the file system itself. Many file types, such as a single file written by N processes with small strided writes, might be served well by special handling at the file system level. Decades ago, file types and access methods were supported within a single file system: the IBM MVS storage systems allowed for many different file types, including partitioned data sets, indexed sequential, virtual sequential, and sequential, to name a few. Storage for modern HPC systems may benefit from a new parallel, scalable version of file types. Much research remains to be done to determine how useful this concept is and how it would work with modern supercomputers and future HPC languages and operating environments.
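To make the "N processes to 1 file with small strided writes" pattern concrete, the following is a minimal sketch of the byte offsets each process would touch. The function name, the record size, and the process count are hypothetical values chosen for illustration; they do not come from any of the projects or systems listed here.

```python
def strided_offsets(rank, num_procs, record_size, num_records):
    """Byte offsets one process writes in an N-to-1 strided pattern.

    Assumption for illustration: every process writes fixed-size records,
    and records from all processes are interleaved round-robin in one file.
    """
    stride = num_procs * record_size  # distance between a rank's consecutive records
    return [rank * record_size + i * stride for i in range(num_records)]


# Hypothetical parameters: 4 processes, 47-byte records, 3 records each.
for rank in range(4):
    print(rank, strided_offsets(rank, num_procs=4, record_size=47, num_records=3))
```

Each process issues many small writes at widely separated offsets, and neighboring bytes in the file belong to different processes. That interleaving is the kind of access pattern the paragraph above suggests a file system could handle specially.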

Lead: Garth Gibson, CMU
Student: Milo Polte, CMU
Mentors: Gary Grider, James Nunez, John Bent

Publications

PLFS: A Checkpoint Filesystem for Parallel Applications. John Bent, Garth Gibson, Gary Grider, Ben McClelland, Paul Nowoczynski, James Nunez, Milo Polte, Meghan Wingate. LANL Technical Release LA-UR 09-02117, April 2009.
Abstract / PDF [415K]

Fast Log-based Concurrent Writing of Checkpoints. Milo Polte, Jiri Simsa, Wittawat Tantisiriroj, Garth Gibson, Shobhit Dayal, Mikhail Chainani, Dilip Kumar Uppugandla. Proceedings of the 3rd Petascale Data Storage Workshop held in conjunction with Supercomputing '08, November 17, 2008, Austin, TX.
Abstract / PDF [262K]

GIGA+: Scalable Directories for Shared File Systems. Swapnil V. Patil, Garth A. Gibson, Sam Lang, Milo Polte. Proceedings of the 2nd International Petascale Data Storage Workshop (PDSW '07) held in conjunction with Supercomputing '07. November 11, 2007, Reno, NV.
Abstract / PDF

Comparing Performance of Solid State Devices and Mechanical Disks. Milo Polte, Jiri Simsa, Garth Gibson. Proceedings of the 3rd Petascale Data Storage Workshop held in conjunction with Supercomputing '08, November 17, 2008, Austin, TX.
Abstract / PDF [99K]

Large Scale Parallel Tools: OpenSpeedShop Frameworks Plugins

The purpose of this project is to study the OpenSpeedShop tool framework and identify the constraints it imposes on tool plugins, drawing on the experience of OpenSpeedShop developers at LANL. The project aims to understand the rationale behind these constraints in terms of framework quality attributes such as performance and extensibility, and will explore potential design alternatives and their tradeoffs with respect to those attributes. The OpenSpeedShop project will use the insights from this research to refactor the OpenSpeedShop plugin architecture, improving usability and making it easier to incorporate collection tools into the infrastructure. This will make OpenSpeedShop a more effective tool suite for understanding applications and adapting them to take better advantage of new architectures.

Lead: Jonathan Aldrich, CMU
Student: Ciera Jaspan, CMU
Mentors: Dave Montoya, Steve Painter

Publications

Checking Framework Interactions with Relationships. Ciera Jaspan and Jonathan Aldrich. In Proceedings of the European Conference on Object Oriented Programming (ECOOP ’09), July 2009.
PDF [550K]

Checking Framework Interactions with Relationships. Ciera Jaspan. Proceedings of the OOPSLA Doctoral Symposium, Nashville, USA, 2008. Winner ACM SIGPLAN John Vlissides Award.
PDF

TCP/IP Networking Incast

This project analyzes one important barrier to high-performance storage over TCP/IP: the Incast problem. TCP Incast is a catastrophic TCP throughput collapse that occurs as the number of storage servers sending data to a client increases past the ability of an Ethernet switch to buffer packets.
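The relationship between sender count and switch buffering can be illustrated with a deliberately simplified sketch. It assumes that all servers send a fixed-size burst at the same instant and that the switch port drains nothing during the burst; the function name and all parameter values are hypothetical, and this is not the model used in the publications below.

```python
def dropped_packets(num_servers, burst_packets, buffer_packets):
    """Packets dropped at one switch output port in a toy incast model.

    Assumptions for illustration: every server sends `burst_packets`
    simultaneously, and the port buffer holds at most `buffer_packets`
    with no draining while the burst arrives.
    """
    offered = num_servers * burst_packets
    return max(0, offered - buffer_packets)


# Hypothetical values: 8-packet bursts into a 64-packet port buffer.
for n in (2, 4, 8, 16):
    print(n, dropped_packets(n, burst_packets=8, buffer_packets=64))
```

In this toy model nothing is dropped until the combined burst exceeds the buffer, after which losses grow with every added server. Real behavior also depends on TCP's loss recovery, which this sketch does not attempt to model.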

Leads: Garth Gibson, CMU; Greg Ganger, CMU; David Andersen, CMU
Mentors: HB Chen, Andrew Shewmaker, Parks Fields

Publications

Solving TCP Incast in Cluster Storage Systems. Vijay Vasudevan, Hiral Shah, Amar Phanishayee, Elie Krevat, David Andersen, Greg Ganger, Garth Gibson. FAST 2009 Work in Progress Report. 7th USENIX Conference on File and Storage Technologies. Feb 24-27, 2009, San Francisco, CA.
PDF [70K]

Measurement and Analysis of TCP Throughput Collapse in Cluster-based Storage Systems. Amar Phanishayee, Elie Krevat, Vijay Vasudevan, David G. Andersen, Gregory R. Ganger, Garth A. Gibson, Srinivasan Seshan. 6th USENIX Conference on File and Storage Technologies (FAST '08). Feb. 26-29, 2008. San Jose, CA. Supersedes Carnegie Mellon University Parallel Data Lab Technical Report CMU-PDL-07-105, September 2007.
Abstract / PDF [374K]

On Application-level Approaches to Avoiding TCP Throughput Collapse in Cluster-Based Storage Systems. E. Krevat, V. Vasudevan, A. Phanishayee, D. Andersen, G. Ganger, G. Gibson, S. Seshan. Proceedings of the 2nd International Petascale Data Storage Workshop (PDSW '07) held in conjunction with Supercomputing '07. November 11, 2007, Reno, NV.
Abstract / PDF [124K]
