
This section provides access to publications associated with my research projects. A list of paper titles organized by subject follows. To see the abstract (and citation) for a paper, click on its title.

With most abstracts there is an icon. Clicking that icon retrieves the PostScript, PDF, or DOC form of the paper for viewing and optional printing.

Books & Book Chapters
> The Resilient Earth: Science, Global Warming, and the Future of Humanity, 2008 (in press).

> Performance Modeling of Enterprise Grid Systems chapter in Data Engineering: Mining, Information, and Intelligence, 2008 (in press).

Published Papers
> A Case Study on Grid Performance Modeling, November, 2006.

> Initial Starting Point Analysis for K-Means Clustering: A Case Study, March, 2006.

> Adaptive Automatic Grid Reconfiguration Using Workload Phase Identification, December, 2005.

> Comparison of Protein Structures by Transformation into Dihedral Angle Sequences, August, 1996.

> BioSCAN: A Dynamically Reconfigurable Systolic Array for Biosequence Analysis, June, 1996.

> BioSCAN: A Network Sharable Computational Resource for Searching Biosequence Databases, March, 1996.

> Rapid Protein Structure Classification using One-dimensional Structure Profiles on the BioSCAN Parallel Computer, October, 1995.

> A Scalable Systolic Multiprocessor System for Analysis of Biological Sequences, March, 1993.

Technical Notes
> Design of the BioSCAN Server Software, April, 1993.

> A Comparison of the BioSCAN Algorithm on Multiple Architectures, May, 1993.

> A Computer Architecture for Fast Approximate Pattern Matching, April, 1993.

> UnCvL: The University of North Carolina C Vector Library, May, 1993.


The Resilient Earth: Science, Global Warming, and the Future of Humanity, Doug L. Hoffman and Allen Simmons, in press, expected in 2008.

ABSTRACT:

A million years after the birth of our sun, the violent explosion of a nearby supernova nearly ended life on Earth before it began. Over the next four and a half billion years, forces of nature shaped our planet and the life it harbored. Barely surviving the traumatic birth of the Moon, buffeted by supernovae, and bombarded by asteroids, the resilient Earth endured. And despite planet-freezing ice ages, devastating mass extinctions, and ever changing climate, life not only survived, it thrived. Today, we are told all life on Earth is threatened by a new peril--human-caused global warming. The Resilient Earth presents the science behind global warming for a general audience, separating fact from fiction and truth from exaggeration.


Performance Modeling of Enterprise Grid Systems, D. L. Hoffman, A. Apon, L. Dowdy, B. Lu, et al., in Data Engineering: Mining, Information, and Intelligence, T. Talley, J. Talburt, and Y. Chan, Eds., Springer, expected in 2008.

ABSTRACT:

Modeling has long been recognized as an invaluable tool for predicting the performance behavior of computer systems. Modeling software, both commercial and open source, is widely used as a guide for the development of new systems and the upgrading of existing ones. Unfortunately, no set of comprehensive tools exists for modeling complex distributed computing environments such as the ones found in emerging grid deployments. This chapter addresses concepts, methodologies, and tools that are useful when designing, implementing, and tuning the performance in grid and cluster environments.


A Case Study on Grid Performance Modeling, B. Lu, A. Apon, L. Dowdy, F. Robinson, D. Hoffman, and D. Brewer, International Conference on Parallel and Distributed Computing Systems, November 13, 2006.

ABSTRACT:

The purpose of this case study is to develop a performance model for an enterprise grid for performance management and capacity planning. The target environment includes grid applications such as health-care and financial services where the data is located primarily within the resources of a worldwide corporation. The approach is to build a discrete event simulation model for a representative work-flow grid. Five work-flow classes, found using a customized k-means clustering algorithm, characterize the workload of the grid. Analyzing the gap between the simulation and measurement data validates the model. The case study demonstrates that the simulation model can be used to predict the grid system performance given a workload forecast. The model is also used to evaluate alternative scheduling strategies. The simulation model is flexible and easily incorporates several system details.


Initial Starting Point Analysis for K-Means Clustering: A Case Study, F. Robinson, A. Apon, D. Brewer, L. Dowdy, D. Hoffman, B. Lu, Proceedings of ALAR 2006 Conference on Applied Research in Information Technology, March, 2006.

ABSTRACT:

Workload characterization is an important part of systems performance modeling. Clustering is a method used to find classes of jobs within workloads. K-Means is one of the most popular clustering algorithms. Initial starting point values are needed as input parameters when performing k-means clustering. This paper shows that the results of running the k-means algorithm on the same workload will vary depending on the values chosen as initial starting points. Fourteen methods of composing initial starting point values are compared in a case study. The results indicate that a synthetic method, scrambled midpoints, is an effective starting point method for k-means clustering.
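
The dependence on starting points can be seen in a small sketch. This is not the code or the data from the paper, and it does not implement the scrambled midpoints method. It is a plain textbook k-means on made-up one-dimensional values, and the names kmeans_1d, sse, and points are hypothetical.

```python
def kmeans_1d(points, centers, iterations=100):
    """Textbook k-means on 1-D data, started from the given centers."""
    centers = list(centers)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster.
        # An empty cluster keeps its previous center.
        new_centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # converged: no center moved
        centers = new_centers
    return sorted(centers)


def sse(points, centers):
    """Sum of squared distances from each point to its nearest center."""
    return sum(min((p - c) ** 2 for c in centers) for p in points)


# Made-up workload values with three obvious groups (illustration only).
points = [0, 1, 10, 11, 20, 21]

spread_start = kmeans_1d(points, [0, 10, 20])   # one start per group
bunched_start = kmeans_1d(points, [0, 1, 2])    # all starts in one group

print(spread_start, sse(points, spread_start))
print(bunched_start, sse(points, bunched_start))
```

Both runs converge, but to different sets of centers with very different squared error, which is the effect the abstract describes.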


Adaptive Automatic Grid Reconfiguration Using Workload Phase Identification, B. Lu, M. Tinker, A. Apon, D. Hoffman, and L. Dowdy, Proceedings of EScience 2005, December, 2005.

ABSTRACT:

The purpose of this study is to develop an adaptive model of a very large scale data processing and storage environment. The target environment includes grid applications such as health-care and finance in which the data may be located primarily within the resources of a worldwide corporation. The approach is to use phase identification techniques that can detect over-utilized grid resources, and then to make dynamic decisions to reassign additional resources to that portion of the application processing. Two phase identification techniques are proposed, a variation technique and a real-time threshold-based technique. The techniques are validated with a simulation model and a case study using measured data from a production grid environment. The case study demonstrates that phase identification techniques can be used as the intelligent component of a reactive mechanism for a grid to adapt to changing environmental conditions by dynamic automatic reconfiguration. Results show that threshold based phase identifying techniques combined with dynamic resource allocation capabilities are effective in alleviating performance hot spots and improving response time in a large scale data grid.
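
A threshold-based detector of the general kind mentioned above might look like the following sketch. It is an illustration under assumptions, not the technique from the paper: the function name find_hot_phases, the 0.8 threshold, the minimum run length of three samples, and the sample values are all hypothetical.

```python
def find_hot_phases(utilization, threshold=0.8, min_len=3):
    """Return (start, end) index pairs, end exclusive, for every run of at
    least min_len consecutive samples whose utilization exceeds threshold."""
    phases = []
    start = None  # index where the current above-threshold run began
    for i, u in enumerate(utilization):
        if u > threshold:
            if start is None:
                start = i
        else:
            # The run (if any) ended at i; keep it only if long enough.
            if start is not None and i - start >= min_len:
                phases.append((start, i))
            start = None
    # Close a run that lasts until the final sample.
    if start is not None and len(utilization) - start >= min_len:
        phases.append((start, len(utilization)))
    return phases


# Hypothetical utilization samples for one grid resource.
samples = [0.3, 0.9, 0.4, 0.85, 0.9, 0.95, 0.5, 0.9, 0.9, 0.9]
hot = find_hot_phases(samples)
print(hot)
```

Requiring a minimum run length is one simple way to ignore single-sample spikes; a scheduler could reassign resources only for the ranges returned.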


Comparison of Protein Structures by Transformation into Dihedral Angle Sequences, D. L. Hoffman, PhD dissertation, University of North Carolina at Chapel Hill, 1996.

ABSTRACT:

Proteins are large complex organic molecules that are essential to the existence of life. Decades of study have revealed that proteins having different sequences of amino acids can possess very similar three-dimensional structures. To date, protein structure comparison methods have been accurate but costly in terms of computer time. This dissertation presents a new method for comparing protein structures using dihedral transformations. Atomic XYZ coordinates are transformed into a sequence of dihedral angles, which is then transformed into a sequence of dihedral sectors. Alignment of two sequences of dihedral sectors reveals similarities between the original protein structures. Experiments have shown that this method detects structural similarities between sequences with less than 20% amino acid sequence identity, finding structural similarities that would not have been detected using amino acid alignment techniques. Comparisons can be performed in seconds that had previously taken minutes or hours.
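
The first transformation, from XYZ coordinates to a dihedral angle, can be sketched with the standard atan2 formulation for four points. This is a generic geometry example, not the dissertation's code: the function name dihedral_degrees and the sample coordinates are hypothetical, and the choice of which atoms to use and how angles are grouped into sectors is not shown.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral_degrees(p0, p1, p2, p3):
    """Signed dihedral angle, in degrees, defined by four 3-D points.

    The angle is measured between the plane through p0, p1, p2 and the
    plane through p1, p2, p3, and lies in the range [-180, 180].
    """
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the first plane
    n2 = _cross(b2, b3)  # normal of the second plane
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Made-up coordinates: the first three points are fixed and only the
# fourth one moves around the central p1-p2 axis.
p0, p1, p2 = (1, 0, 0), (0, 0, 0), (0, 0, 1)
print(dihedral_degrees(p0, p1, p2, (1, 0, 1)))   # fourth point eclipses the first
print(dihedral_degrees(p0, p1, p2, (0, 1, 1)))   # quarter turn
print(dihedral_degrees(p0, p1, p2, (-1, 0, 1)))  # fourth point opposite the first
```

Applying such a function along a chain of atoms turns a list of coordinates into a one-dimensional sequence of angles, which is the form the abstract says is then aligned.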


BioSCAN: A Dynamically Reconfigurable Systolic Array for Biosequence Analysis, Raj K. Singh, W. D. Dettloff, V. L. Chi, D. L. Hoffman, S. G. Tell, C. T. White, S. F. Altschul, and B. W. Erickson, Proc. of CERCS96, National Science Foundation, Arlington, VA, June 22-24, 1996.

ABSTRACT:

We describe the design, implementation, and deployment via the Internet of BioSCAN, an application-specific computer system for the rapid determination of statistically significant alignments of biopolymer (DNA, RNA, protein) sequences. BioSCAN continues to outperform other systems designed to perform this basic task of molecular biology, which continues to grow in magnitude and importance. The BioSCAN system is hosted by a general-purpose workstation containing a special-purpose hardware engine that accelerates the core algorithm for comparing two biosequences. Careful partitioning of the computational tasks between hardware and software provides not only high performance but also programmability. The BioSCAN system can compare a sequence of up to 12,992 characters with an arbitrarily large database containing arbitrarily long sequences at a rate of 2 million database characters per second. This rate is nearly 1,000 times greater than the rate achieved by a state-of-the-art workstation using software alone. This network-sharable computational resource is accessible interactively via the World Wide Web using Mosaic, Netscape or other client software.


BioSCAN: A Network-Sharable Computational Resource for Searching Biosequence Databases, Raj K. Singh, D. L. Hoffman, S. G. Tell, and C. T. White, Computer Applications in the Biosciences, Vol. 12, No. 3, 1996, pp. 191-196.

ABSTRACT:

We describe a network sharable, interactive computational tool for rapid and sensitive search and analysis of biomolecular sequence databases such as GenBank, GenPept, Protein Identification Resource, and SWISS-PROT. The resource is accessible via the World Wide Web using popular client software such as Mosaic and Netscape. The client software is freely available on a number of computing platforms including Macintosh, IBM-PC, and Unix workstations.


Rapid Protein Structure Classification using One-dimensional Structure Profiles on the BioSCAN Parallel Computer, D. L. Hoffman, S. Laiter, Raj K. Singh, I. I. Vaisman, and A. Tropsha, Computer Applications in the Biosciences, Vol. 11, No. 6, 1995, pp. 675-679.

ABSTRACT:

The rapid growth of protein structure databases in recent years requires an effective approach for objective comparison and classification of deposited protein structures. We describe a novel method for structure comparison and classification based on the alignment of one-dimensional structure profiles. These profiles are obtained by calculating the OCCO pseudodihedral angles (formed by O-C-C-O atoms of carbonyl groups of consecutive amino acid residues) from protein three-dimensional coordinates. These angle measurements are then converted into a 24-letter alphabet, and the protein structures are represented by sequences of letters from this alphabet. The BioSCAN parallel computer, designed for primary sequence alignment, is used to rapidly align and classify these one-dimensional structure profiles. We have developed and implemented a weighted scoring matrix to identify structural classes based on commonly found structural motifs. The results of our experiments are in good agreement with the traditional protein structure classification schemes. One-dimensional structure profiles significantly improve efficiency of structure comparison and classification.
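
The conversion from angles to letters can be illustrated as follows. The abstract states only that a 24-letter alphabet is used, so the equal 15-degree bins, the letters A through X, and the name encode_profile are assumptions made for this sketch, not details taken from the paper.

```python
import string

# Assumed alphabet: the first 24 uppercase letters, 'A' through 'X'.
ALPHABET = string.ascii_uppercase[:24]


def encode_profile(angles_deg, alphabet=ALPHABET):
    """Map each angle in degrees to one letter, using equal-width bins.

    With 24 letters each bin covers 15 degrees (an assumption): angles in
    [0, 15) become 'A', [15, 30) become 'B', and so on around the circle.
    """
    width = 360.0 / len(alphabet)
    letters = []
    for angle in angles_deg:
        # Wrap negative angles into [0, 360) before binning; the final
        # modulo guards against rounding up to exactly 360.
        index = int((angle % 360.0) // width) % len(alphabet)
        letters.append(alphabet[index])
    return "".join(letters)


# Hypothetical angle measurements for four consecutive residues.
profile = encode_profile([0, 20, -10, 179])
print(profile)
```

Once structures are written as strings like this, sequence alignment hardware or software built for amino acid sequences can be reused to compare them, which is the idea the abstract relies on.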


A Scalable Systolic Multiprocessor for Analysis of Biological Sequences, Raj K. Singh, S. G. Tell, C. T. White, D. L. Hoffman, V. L. Chi, and B. W. Erickson, Proc. of the Symposium on Integrated Systems, Seattle, WA, March 3-5, 1993, MIT Press, Cambridge, MA, pp. 168-182.

ABSTRACT:

The design and implementation of an application-specific, fault-tolerant, and scalable multiprocessor system called BioSCAN (Biological Sequence Comparative Analysis Node) are described. Discussed are system partitioning and integration, functional decomposition between hardware and software, the algorithm and its implementation in VLSI, the early results of using the system, and comparison with other hardware and software solutions for biological sequence analysis.


Design of the BioSCAN Server Software, D. L. Hoffman, Department of Computer Science, University of North Carolina, Chapel Hill, NC. TR93-049, 1993.

ABSTRACT:

This paper is an exploration of the design goals for the Biological Sequence Comparative Analysis Node (BioSCAN) network server software and of the impact that these goals had on the overall structure and implementation of that software. The primary audience for this paper consists of computer scientists and computational biologists involved in developing similar server software. Biologists who are users of the BioSCAN computational node and have a desire for deeper understanding of how the server functions will also find this paper useful. It is assumed that the reader is familiar with UNIX and with basic networking concepts. The peculiarities of implementing a network server for a batch resource will be identified and the solutions chosen by the BioSCAN design team explained. Research for the BioSCAN project, including the design of the server software, was supported in part by NSF grant MIP-9024585.


A Comparison of the BioSCAN Algorithm on Multiple Architectures, D. L. Hoffman, Department of Computer Science, University of North Carolina, Chapel Hill, NC. TR93-050, 1993.

ABSTRACT:

This paper compares the performance characteristics of the BioSCAN biological sequence matching algorithm on several different computer architectures. The architectures examined are a conventional RISC general purpose uni-processor, a vector oriented "supercomputer", and a Single Instruction Multi Data (SIMD) massively parallel computer. These architectures are represented by the following hardware platforms: a Sun 490 RISC, a Convex 240, and a MasPar MP-1. The performance of these three platforms is compared with that of the custom built BioSCAN hardware.


A Computer Architecture for Fast Approximate Pattern Matching, R. E. Faith and D. L. Hoffman, Department of Computer Science, University of North Carolina, Chapel Hill, NC. TR93-051, 1993.

ABSTRACT:


UnCvL: The University of North Carolina C Vector Library, R. E. Faith, D. L. Hoffman, and D. G. Stahl, Department of Computer Science, University of North Carolina, Chapel Hill, NC. TR93-063, 1993.

ABSTRACT:


Copyright © 1999, 2007, Doug L. Hoffman, all rights reserved
