







|

|
This section provides access to publications
associated with my research projects.
A list of paper titles organized by subject follows.
To see the abstract
(and citation) for a paper, click on its title.
Most abstracts are accompanied by an icon. Clicking that icon
retrieves the PostScript, PDF, or DOC version
of the paper for viewing and optional printing.
|
|
Books & Book Chapters
|
|
|
Published Papers
|
|
A Case Study on Grid Performance Modeling,
November, 2006.
Initial Starting Point Analysis for K-Means
Clustering: A Case Study,
March, 2006.
Adaptive Automatic Grid Reconfiguration
Using Workload Phase Identification,
December, 2005.
Comparison of Protein Structures
by Transformation into Dihedral Angle Sequences,
August, 1996.
BioSCAN: A Dynamically Reconfigurable
Systolic Array for Biosequence Analysis, June, 1996.
BioSCAN: A Network Sharable Computational
Resource for Searching Biosequence Databases, March, 1996.
Rapid Protein Structure Classification
using One-dimensional Structure Profiles on the BioSCAN Parallel
Computer, October, 1995.
A Scalable Systolic Multiprocessor System for
Analysis of Biological Sequences, March, 1993.
|
|
Technical Notes
|
|
The Resilient Earth: Science, Global Warming, and the Future of Humanity,
Doug L. Hoffman and Allen Simmons,
in press, expected in 2008.
ABSTRACT:
A million years after the birth of our sun, the violent explosion of a nearby supernova nearly ended life on Earth before it began. Over the next four and a half billion years, forces of nature shaped our planet and the life it harbored. Barely surviving the traumatic birth of the Moon, buffeted by supernovae, and bombarded by asteroids, the resilient Earth endured. And despite planet-freezing ice ages, devastating mass extinctions, and ever changing climate, life not only survived, it thrived. Today, we are told all life on Earth is threatened by a new peril--human-caused global warming. The Resilient Earth presents the science behind global warming for a general audience, separating fact from fiction and truth from exaggeration.
|
Performance Modeling of Enterprise Grid Systems,
D. L. Hoffman, A. Apon, L. Dowdy, B. Lu, et al., in Data Engineering: Mining, Information, and Intelligence, T. Talley, J. Talburt, and Y. Chan, Eds., Springer,
expected in 2008.
ABSTRACT:
Modeling has long been recognized as an invaluable tool for predicting the performance behavior of computer systems. Modeling software, both commercial and open source, is widely used as a guide for the development of new systems and the upgrading of existing ones. Unfortunately, no set of comprehensive tools exists for modeling complex distributed computing environments such as the ones found in emerging grid deployments. This chapter addresses concepts, methodologies, and tools that are useful when designing, implementing, and tuning performance in grid and cluster environments.
|
A Case Study on Grid Performance Modeling,
B. Lu, A. Apon, L. Dowdy, F. Robinson, D. Hoffman, and D. Brewer,
International Conference on Parallel and Distributed Computing Systems,
November 13, 2006.
ABSTRACT:
The purpose of this case study is to develop a performance
model for an enterprise grid for performance management
and capacity planning. The target environment includes
grid applications such as health-care and financial services
where the data is located primarily within the resources of a
worldwide corporation. The approach is to build a discrete
event simulation model for a representative work-flow grid.
Five work-flow classes, found using a customized k-means
clustering algorithm, characterize the workload of the grid.
Analyzing the gap between the simulation and measurement
data validates the model. The case study demonstrates
that the simulation model can be used to predict the
grid system performance given a workload forecast. The
model is also used to evaluate alternative scheduling strategies.
The simulation model is flexible and easily incorporates
several system details.
|
Initial Starting Point Analysis for K-Means Clustering: A Case Study,
F. Robinson, A. Apon, D. Brewer, L. Dowdy, D. Hoffman, B. Lu,
Proceedings of ALAR 2006 Conference on Applied Research in Information
Technology, March, 2006.
ABSTRACT:
Workload characterization is an important part of systems performance modeling. Clustering is a method used to find classes of jobs within workloads. K-Means is one of the most popular clustering algorithms. Initial starting point values are needed as input parameters when performing k-means clustering. This paper shows that the results of running the k-means algorithm on the same workload will vary depending on the values chosen as initial starting points. Fourteen methods of composing initial starting point values are compared in a case study. The results indicate that a synthetic method, scrambled midpoints, is an effective starting point method for k-means clustering.
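The sensitivity to starting points described in this abstract can be illustrated with a minimal sketch. It is not the paper's code or workload: the one-dimensional data, the function names `kmeans_1d` and `sse`, and both sets of starting values are made up for illustration, and the scrambled midpoints method is not shown because its definition is not given here.

```python
def kmeans_1d(points, starts, max_iterations=100):
    """Lloyd-style k-means on 1-D data, beginning from the given starting centers."""
    centers = [float(s) for s in starts]
    for _ in range(max_iterations):
        # Assign every point to its nearest current center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster (an empty cluster keeps its center).
        updated = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
        if updated == centers:
            break
        centers = updated
    return sorted(centers)


def sse(points, centers):
    """Sum of squared distances from each point to its nearest center."""
    return sum(min((p - c) ** 2 for c in centers) for p in points)


# Hypothetical workload: three well-separated groups of values.
points = [0, 1, 2, 10, 11, 12, 20, 21, 22]

spread_starts = kmeans_1d(points, [1, 11, 21])   # one start in each group
crowded_starts = kmeans_1d(points, [0, 1, 2])    # all starts in the first group

print(spread_starts, sse(points, spread_starts))    # [1.0, 11.0, 21.0] 6.0
print(crowded_starts, sse(points, crowded_starts))  # [0.0, 1.5, 16.0] 154.5
```

Both runs use the same data and the same number of clusters, yet they converge to different centers, and the crowded starting values leave two groups merged into one cluster.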
|
Adaptive Automatic Grid Reconfiguration Using Workload Phase
Identification,
B. Lu, M. Tinker, A. Apon, D. Hoffman, and L. Dowdy,
Proceedings of EScience 2005, December, 2005.
ABSTRACT:
The purpose of this study is to develop an adaptive model
of a very large scale data processing and storage environment.
The target environment includes grid applications
such as health-care and finance in which the data may be
located primarily within the resources of a worldwide corporation.
The approach is to use phase identification techniques
that can detect over-utilized grid resources, and then
to make dynamic decisions to reassign additional resources
to that portion of the application processing. Two phase
identification techniques are proposed, a variation technique
and a real-time threshold-based technique. The techniques
are validated with a simulation model and a case
study using measured data from a production grid environment.
The case study demonstrates that phase identification
techniques can be used as the intelligent component of
a reactive mechanism for a grid to adapt to changing environmental
conditions by dynamic automatic reconfiguration.
Results show that threshold-based phase identification
techniques combined with dynamic resource allocation capabilities
are effective in alleviating performance hot spots
and improving response time in a large-scale data grid.
|
Comparison of Protein Structures by Transformation
into Dihedral Angle Sequences,
D. L. Hoffman, PhD dissertation, University of North
Carolina at Chapel Hill, 1996.
ABSTRACT:
Proteins are large complex organic molecules that are essential to the
existence of life. Decades of study have revealed that proteins having
different sequences of amino acids can possess very similar
three-dimensional structures. To date, protein structure comparison
methods have been accurate but costly in terms of computer time. This
dissertation presents a new method for comparing protein structures using
dihedral transformations. Atomic XYZ coordinates are transformed into a
sequence of dihedral angles, which is then transformed into a sequence of
dihedral sectors. Alignment of two sequences of dihedral
sectors reveals similarities between the original protein
structures. Experiments have shown that this method detects structural
similarities between sequences with less than 20% amino acid sequence
identity, finding structural similarities that would not have been
detected using amino acid alignment techniques. Comparisons can be
performed in seconds that had previously taken minutes or hours.
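The first transformation step, from XYZ coordinates to a sequence of dihedral angles, can be sketched as follows. This is an illustrative sketch using the standard four-point dihedral formula, not the dissertation's code. The helper names, the sign convention, and the sample coordinates are assumptions, and the choice of which atoms to use is not specified in this abstract.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_degrees(p0, p1, p2, p3):
    """Signed dihedral angle, in degrees, defined by four consecutive points."""
    u1, u2, u3 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1, n2 = cross(u1, u2), cross(u2, u3)   # normals of the two planes
    y = math.sqrt(dot(u2, u2)) * dot(u1, n2)
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def dihedral_sequence(points):
    """Slide a four-point window along a chain of XYZ coordinates."""
    return [dihedral_degrees(*points[i:i + 4]) for i in range(len(points) - 3)]


# Hypothetical chain of five points; real input would be atomic coordinates.
chain = [(1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1), (1, 1, 1)]
angles = dihedral_sequence(chain)
print(angles)
```

A chain of n points yields n - 3 angles, so a three-dimensional structure is reduced to a one-dimensional sequence that sequence alignment methods can process.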
|
BioSCAN: A Dynamically Reconfigurable Systolic Array for
Biosequence Analysis,
Raj K. Singh, W. D. Dettloff, V. L. Chi, D. L. Hoffman, S. G. Tell,
C. T. White, S. F. Altschul, and B. W. Erickson, Proc. of CERCS96,
National Science Foundation, Arlington, VA, June 22-24, 1996.
ABSTRACT:
We describe the design, implementation, and deployment via the
Internet of BioSCAN, an application-specific computer system for
the rapid determination of statistically significant alignments
of biopolymer (DNA, RNA, protein) sequences. BioSCAN continues
to outperform other systems designed to perform this basic task
of molecular biology, which continues to grow in magnitude and
importance. The BioSCAN system is hosted by a general-purpose
workstation containing a special-purpose hardware engine that
accelerates the core algorithm for comparing two biosequences.
Careful partitioning of the computational tasks between hardware
and software provides not only high performance but also
programmability. The BioSCAN system can compare a sequence of
up to 12,992 characters with an arbitrarily large database
containing arbitrarily long sequences at a rate of 2 million
database characters per second. This rate is nearly 1,000 times
greater than the rate achieved by a state-of-the-art workstation
using software alone. This network-sharable computational
resource is accessible interactively via the World Wide Web
using Mosaic, Netscape or other client software.
|
BioSCAN: A Network-Sharable Computational Resource for Searching
Biosequence Databases,
Raj K. Singh, D. L. Hoffman, S. G. Tell, and C. T. White,
Computer Applications in the Biosciences, Vol. 12, No. 3, 1996,
pp. 191-196.
ABSTRACT:
We describe a network sharable, interactive computational tool for
rapid and sensitive search and analysis of biomolecular sequence
databases such as GenBank, GenPept, Protein Identification Resource,
and SWISS-PROT. The resource is accessible via the World Wide Web
using popular client software such as Mosaic and Netscape. The client
software is freely available on a number of computing platforms including
Macintosh, IBM-PC, and Unix workstations.
|
Rapid Protein Structure Classification using One-dimensional Structure
Profiles on the BioSCAN Parallel Computer,
D. L. Hoffman, S. Laiter, Raj K. Singh, I. I. Vaisman, and A. Tropsha,
Computer Applications in the Biosciences, Vol. 11, No. 6, 1995, pp. 675-679.
ABSTRACT:
Rapid growth of protein structure databases in recent years requires an
effective approach for objective comparison and classification of
deposited protein structures. We describe a novel method for structure
comparison and classification based on the alignment of one-dimensional
structure profiles. These profiles are obtained by calculating
the OCCO pseudodihedral angles (formed by O-C-C-O atoms of carbonyl
groups of consecutive amino acid residues) from protein three-dimensional
coordinates. These angle measurements are then converted into a 24-letter
alphabet, and the protein structures are represented by sequences of letters
from this alphabet. The BioSCAN parallel computer, designed for primary
sequence alignment, is used to rapidly align and classify these
one-dimensional structure profiles. We have developed and implemented a
weighted scoring matrix to identify structural classes based on commonly
found structural motifs. The results of our experiments are in good
agreement with the traditional protein structure classification schemes.
One-dimensional structure profiles significantly improve the efficiency of
structure comparison and classification.
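The conversion of angle measurements into a 24-letter alphabet can be sketched as below. The abstract gives only the alphabet size. The equal 15-degree bins, the letters A through X, and the function names are assumptions made for illustration and may differ from the encoding used in the paper.

```python
import string

ALPHABET = string.ascii_uppercase[:24]   # 'A'..'X'; the letter choice is an assumption
BIN_WIDTH = 360.0 / len(ALPHABET)        # 15 degrees per letter, assuming equal bins


def angle_to_letter(angle_degrees):
    """Map an angle in degrees to one letter of the 24-letter alphabet."""
    shifted = (angle_degrees + 180.0) % 360.0          # move into [0, 360)
    index = min(int(shifted // BIN_WIDTH), len(ALPHABET) - 1)
    return ALPHABET[index]


def encode_profile(angles):
    """Turn a sequence of angles into a one-dimensional structure profile string."""
    return "".join(angle_to_letter(a) for a in angles)


# Hypothetical angle measurements, not data from the paper.
profile = encode_profile([-180.0, 0.0, 90.0, 179.9])
print(profile)  # AMSX
```

Once structures are written as strings over a small alphabet, they can be submitted to hardware built for primary sequence alignment, which is how the abstract describes the use of BioSCAN.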
|
A Scalable Systolic Multiprocessor for Analysis of Biological
Sequences,
Raj K. Singh, S. G. Tell, C. T. White, D. L. Hoffman, V. L. Chi, and
B. W. Erickson, Proc. of the Symposium on Integrated Systems,
Seattle, WA, March 3-5, 1993, MIT Press, Cambridge, MA, pp. 168-182.
ABSTRACT:
The design and implementation of an application-specific, fault-tolerant,
and scalable multiprocessor system called BioSCAN (Biological Sequence
Comparative Analysis Node) are described. Discussed are system
partitioning and integration, functional decomposition between hardware
and software, the algorithm and its implementation in VLSI, the early
results of using the system, and comparison with other hardware and
software solutions for biological sequence analysis.
|
Design of the BioSCAN Server Software,
D. L. Hoffman, Department of Computer Science, University of North
Carolina, Chapel Hill, NC. TR93-049, 1993.
ABSTRACT:
This paper is an exploration of the design goals for the Biological
Sequence Comparative Analysis Node (BioSCAN) network server software and of
the impact that these goals had on the overall structure and implementation
of that software. The primary audience for this paper consists of computer
scientists and computational biologists involved in developing similar
server software. Biologists who are users of the BioSCAN computational
node and have a desire for deeper understanding of how the server functions
will also find this paper useful. It is assumed that the reader is
familiar with UNIX and with basic networking concepts. The
peculiarities of implementing a network server for a batch resource will be
identified and the solutions chosen by the BioSCAN design team explained.
Research for the BioSCAN project, including the design of the server
software, was supported in part by NSF grant MIP-9024585.
|
A Comparison of the BioSCAN Algorithm on Multiple
Architectures,
D. L. Hoffman, Department of Computer Science, University of North
Carolina, Chapel Hill, NC. TR93-050, 1993.
ABSTRACT:
This paper compares the performance characteristics of the BioSCAN
biological sequence matching algorithm on several different computer
architectures. The architectures examined are a conventional RISC
general purpose uni-processor, a vector oriented "supercomputer", and a
Single Instruction, Multiple Data (SIMD) massively parallel computer.
These architectures are represented by the following hardware platforms: a
Sun 490 RISC, a Convex 240, and a MasPar MP-1. The performance of these
three platforms is compared with that of the custom built BioSCAN hardware.
|
|