What is TechGrid?
TechGrid is the union of over 600 networked Texas Tech University machines: Windows XP lab desktops, Windows XP office machines, and Linux servers.
What is a Grid?
A Grid is a dynamic network of computing resources that work together as a single, uniform operating environment. It can span locations and administrative domains, and can flexibly support dynamically changing organizations and computing requirements. In order to accomplish this, Grids must:
- Provide access to resources needed by users, including processing, data, and applications.
- Provide a security infrastructure sufficient to support Grid-wide sharing of resources, yet allow for local, diverse, and changing security policies.
- Be resilient in the face of individual system failure or changes in the composition of the Grid.
- Incorporate heterogeneous hardware, operating systems, and system configurations.
How has the concept of a Grid evolved?
The term "Grid computing" originated in the early 1990s as a metaphor for making computer power as easy to access as an electric power grid. The first applications focused on aggregating or “harvesting” the unused processing power of desktop computers. An example is SETI@home, which aggregated PC processing power to run a compute-intensive application. Comprehensive Grids, however, go beyond PC cycle aggregation to include a range of hardware and operating systems. They also go beyond processing power to provide access to data and applications. As a result, they provide the kind of industrial-strength support that large organizations need, without requiring them to purchase more supercomputers.
What type of Grid is TechGrid?
As the figure below shows, TechGrid is a campus-wide Grid.
How does the Grid work?
The Grid uses Grid middleware to distribute a compute job among the compute nodes within the Grid, enabling distributed computing. TechGrid's Grid middleware is Condor.
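To make the idea of distributing work concrete, here is a minimal Python sketch of a dispatcher that queues the independent tasks of a job, hands each one to the next compute node in turn, and requeues a task if its node fails (the resilience requirement listed under "What is a Grid?"). It is a toy model under stated assumptions, not how Condor works internally: Condor chooses machines by policy-based matchmaking rather than round-robin, and the `dispatch` function, task ids, node names, and simulated failure are all hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple


def dispatch(tasks: Iterable[str], nodes: List[str],
             fails_once: Set[Tuple[str, str]] = frozenset()) -> Dict[str, str]:
    """Toy dispatcher: hand queued tasks to compute nodes in turn.

    fails_once holds (node, task) pairs whose first attempt fails, to
    simulate an individual machine dropping out of the Grid.
    Returns a mapping from each task to the node that completed it.
    """
    if not nodes:
        raise ValueError("need at least one compute node")
    queue = deque(tasks)      # pending tasks, first in, first out
    already_failed = set()    # simulated failures that have happened
    completed = {}
    turn = 0                  # round-robin pointer into nodes
    while queue:
        task = queue.popleft()
        node = nodes[turn % len(nodes)]
        turn += 1
        if (node, task) in fails_once and (node, task) not in already_failed:
            already_failed.add((node, task))
            queue.append(task)    # requeue so another node retries it
            continue
        completed[task] = node
    return completed
```

For example, `dispatch(["t0", "t1", "t2", "t3"], ["lab-01", "lab-02"], fails_once={("lab-02", "t1")})` completes every task: `t1` fails on `lab-02`, goes back to the end of the queue, and is finished later by `lab-01`.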
Where is TechGrid now?
TechGrid’s compute nodes are located in the ATLC, the High Performance Computing Center at Reese Center, the Computer Science Department, the Business Building, the North Computing Center, and the Math Building. Currently, TechGrid is made up of 600 compute nodes spanning several domains and various operating systems.
What is Condor?
Condor is a specialized workload management system for compute-intensive jobs. Like other full-featured batch systems, Condor provides a job queuing mechanism, scheduling policy, priority scheme, resource monitoring, and resource management. Users submit their serial or parallel jobs to Condor; Condor places them into a queue, chooses when and where to run them based on a policy, carefully monitors their progress, and ultimately informs the user upon completion. While providing functionality similar to that of a more traditional batch queuing system, Condor's novel architecture allows it to succeed in areas where traditional scheduling systems fail.
Condor can be used to manage a cluster of dedicated compute nodes (such as a "Beowulf" cluster). In addition, unique mechanisms enable Condor to effectively harness wasted CPU power from otherwise idle desktop workstations. For instance, Condor can be configured to use only desktop machines whose keyboard and mouse are idle. Should Condor detect that a machine is no longer available (for example, because a key press is detected), in many circumstances Condor can transparently produce a checkpoint and migrate the job to a different machine that would otherwise be idle.
Condor does not require a shared file system across machines: if no shared file system is available, Condor can transfer the job's data files on behalf of the user, or it may be able to transparently redirect all of the job's I/O requests back to the submit machine. As a result, Condor can be used to seamlessly combine all of an organization's computational power into one resource.
The ClassAd mechanism in Condor provides an extremely flexible and expressive framework for matching resource requests (jobs) with resource offers (machines). Jobs can easily state both job requirements and job preferences. Likewise, machines can specify requirements and preferences about the jobs they are willing to run. These requirements and preferences can be described in powerful expressions, allowing Condor to adapt to nearly any desired policy.
Condor can be used to build Grid-style computing environments that cross administrative boundaries. Condor's "flocking" technology allows multiple Condor compute installations to work together.
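Flocking can be pictured with a small Python model: a job submitted to the local pool runs there if a local slot is free; otherwise it is offered to other Condor pools in a fixed order. This is a hedged sketch only. In real Condor the list of remote pools comes from configuration (the `FLOCK_TO` setting), and each placement still goes through matchmaking with that pool; the `Pool` class, its `free_slots` counter, the pool names, and `place_job` are hypothetical simplifications.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Pool:
    """Hypothetical model of one Condor installation (pool)."""
    name: str
    free_slots: int

    def try_run(self) -> bool:
        """Claim a free slot if one is available."""
        if self.free_slots > 0:
            self.free_slots -= 1
            return True
        return False


def place_job(local: Pool, flock_to: List[Pool]) -> Optional[str]:
    """Run locally if possible, else flock to remote pools in order.

    Returns the name of the pool that accepted the job, or None if every
    pool is full (the job would then stay idle in the local queue).
    """
    for pool in [local, *flock_to]:
        if pool.try_run():
            return pool.name
    return None
```

With `techgrid = Pool("techgrid", 1)` and `remote = [Pool("partner-a", 0), Pool("partner-b", 2)]`, four successive calls to `place_job(techgrid, remote)` return `"techgrid"`, `"partner-b"`, `"partner-b"`, and `None`: the local pool is always tried first, and the full pool `partner-a` is skipped.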
Condor incorporates many of the emerging Grid-based computing methodologies and protocols. For instance, Condor-G is fully interoperable with resources managed by Globus.
Condor is the product of the Condor Research Project at the University of Wisconsin-Madison (UW-Madison), and it was first installed as a production system in the UW-Madison Department of Computer Sciences nearly 15 years ago.
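The two-sided ClassAd matching described in this section can be illustrated with a short Python sketch. Each job and each machine carries attributes, a requirements predicate, and a rank score; a match needs both sides' requirements to accept each other, and among matches the job's rank picks the winner. The attribute names `OpSys`, `Memory`, and `KeyboardIdle` mirror ones used in Condor ClassAds, but the dictionaries, lambdas, `make_ad`, `best_match`, machine names, and the 15-minute idle threshold are assumptions for illustration; real ClassAds are written in Condor's own expression language, not Python. The machine policy also models the idle-desktop rule: a workstation accepts jobs only after its keyboard and mouse have been idle long enough.

```python
from typing import Callable, Dict, List, Optional

Attrs = Dict[str, object]
# A predicate or score sees (my attributes, the other party's attributes),
# loosely mirroring MY. and TARGET. references in a real ClassAd.
Expr = Callable[[Attrs, Attrs], object]


def make_ad(attrs: Attrs, requirements: Expr,
            rank: Expr = lambda my, target: 0) -> dict:
    """Bundle attributes with a requirements predicate and a rank score."""
    return {"attrs": attrs, "requirements": requirements, "rank": rank}


def matches(job: dict, machine: dict) -> bool:
    """Two-sided match: each side's requirements must accept the other."""
    return bool(job["requirements"](job["attrs"], machine["attrs"])
                and machine["requirements"](machine["attrs"], job["attrs"]))


def best_match(job: dict, machines: List[dict]) -> Optional[str]:
    """Name of the compatible machine with the highest job rank, or None."""
    candidates = [m for m in machines if matches(job, m)]
    if not candidates:
        return None
    best = max(candidates, key=lambda m: job["rank"](job["attrs"], m["attrs"]))
    return best["attrs"]["Name"]


# Hypothetical desktop policy: run jobs only after 15 idle minutes.
IDLE_THRESHOLD = 15 * 60  # seconds (assumed value)


def idle_desktop(my: Attrs, target: Attrs) -> bool:
    return my["KeyboardIdle"] >= IDLE_THRESHOLD


machines = [
    make_ad({"Name": "lab-01", "OpSys": "WINDOWS", "Memory": 2048,
             "KeyboardIdle": 3600}, idle_desktop),
    make_ad({"Name": "lab-02", "OpSys": "WINDOWS", "Memory": 4096,
             "KeyboardIdle": 30}, idle_desktop),
    make_ad({"Name": "srv-01", "OpSys": "LINUX", "Memory": 8192,
             "KeyboardIdle": 0}, lambda my, target: True),
]

# Hypothetical job: needs Windows and at least 1024 MB, prefers more memory.
job = make_ad({"Owner": "user1"},
              requirements=lambda my, target: (target["OpSys"] == "WINDOWS"
                                               and target["Memory"] >= 1024),
              rank=lambda my, target: target["Memory"])
```

With these definitions, `best_match(job, machines)` returns `"lab-01"`: `lab-02` has more memory but refuses the job because its keyboard was used 30 seconds ago, and `srv-01` fails the job's own requirements. Once `lab-02` has been idle past the threshold, the job's rank prefers it.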