A Placement & Routing Optimizer for Garp

based on

Total Wire Length Minimization


Eylon Caspi *** CS 294-7 *** U.C.Berkeley



CONTENTS




ABSTRACT

A one-dimensional placement and routing optimizer is developed for the Garp Gate Array RISC Processor, using a greedy algorithm to minimize total vertical wire length. The performance of the optimizer with respect to wire allocation is explored and found to be good for logic designs with reasonably sparse inter-row communication. The relevance of the optimizer with respect to retiming is also explored.

INTRODUCTION

It is common practice for modern reconfigurable arrays to introduce hierarchy into interconnect resources. For large enough arrays, connecting all compute blocks is not only impractical with respect to circuit area, but also wasteful, since most applications do not need such complete interconnect. Hence reconfigurable devices typically provide a restricted set of interconnect with some form of hierarchy, for instance wires spanning a power of 2 in inter-block distance. In such environments, long-range interconnect may be significantly more expensive (ex., slower, narrower) than local interconnect. Because local interconnect exists in short supply, logic placement and routing tools face a complicated optimization task of maximizing the locality of logic block communication.

This paper explores one such an optimization scheme for the Garp Gate Array RISC Processor [HAWA97]. Garp, a Berkeley research effort, is a reconfigurable device which includes a conventional MIPS processor core and coarse-grained FPGA on the same die. Garp's reconfigurable array is designed to support wide-bit arithmetic and logic operations, typically using rows (or parts thereof) of logic blocks as bit slices. Hence Garp's horizontal interconnect is plentiful to support bit shifting, while its vertical interconnect is restricted to a hierarchy of lengths. All vertical wires span some power of two in inter-row distance, the shortest wires spanning 4 rows, and the longest spanning the entire height of the array.

Figure 1 In Garp's sequential timing model, "long" vertical wires which span 16 or more rows require an entire cycle for transmission and buffering. "Short" vertical wires which span 8 or 4 rows are free of such restrictions and may comprise data paths which pass through several logic blocks in one cycle. The start- and end- points of wires are staggered throughout the array as shown in Figure 1 (the configuration shown is available identically in each column of logic blocks). Note that Garp's interconnect is not hierarchic in the sense of HFPGAs [AGLE94], as it does not geometrically partition the array into distinct localized parts.

Span-4 wires are the most plentiful and originate from every row. Longer wires are fewer in quantity for any given span and originate only from every 4th, 8th, 16th row, etc. Hence it should be the objective of any logic designer or automated placement/routing tool to use the fast and plentiful span-4 wires whenever possible, or the fast span-8 wires when available. Alternatively, one may avoid vertical wires altogether and communicate via horizontal wires from any row to the next row below it, provided those wires are not already needed for a shifting operation. The use of span-16 or longer vertical wires incurs the penalty of requiring a full cycle and a buffer register at the destination.

Two interrelated problems arise in logic synthesis for an interconnect hierarchy such as Garp's. During logic retiming, it is important to know whether a communications line will be mapped to a long vertical wire, necessitating a cycle delay and a buffer. Once timing has been fixed, there remains the problem of allocating short and long wires from a physically available configuration (as in Fig.1) such that Garp's interconnect timing restrictions are not violated. A circular dependency arises, since in order to minimize data-path delay, the retiming mechanism needs access to interconnect information that is traditionally handled by a placement and routing mechanism.

At present, the suite of automated synthesis tools for Garp is such the logic mapping, retiming, and placement facility (Gama, the Garp Mapper) is entirely separate from the final routing tool (gatoconfig, the Garp Configurator). The logic and placement phase, either a manual design or using Gama, specifies vertical wiring requirements by source row, destination row, and a "short" or "long" attribute. It remains the job of the Configurator program to allocate wires from the physically available pool, given a specified row placement. Because Gama has no detailed model of vertical wire positioning, and because there do not exist wires of arbitrary length between arbitrary rows, Gama defaults to specifying a long wire for any communication that spans more than 5 rows. Although such restrained use of short wires makes the Configurator's job rather easy, the Configurator can do no more than report an error if it fails to allocate enough wires for the given row placement.

The disparity between the mapping, placement, and routing phases can be alleviated by a row-placement engine which places communicating rows as close as possible, so as to use the shortest wires possible. For optimal mapping, such a placement engine should have complete knowledge of a device's physical wire availability, and could give feedback to a logic retiming facility in some kind of iterative mapping optimization. In practice, however, such an iterative process may be too slow for comfort, necessitating some simplifications. One approach is to simplify the placement engine's knowledge of wiring resources so that it can give faster feedback to the retiming facility. Alternatively, one could implement the placement engine as a post-pass optimizer, to affect only placement for a design with fixed timing. The latter method is explored in this paper, with the optimizer being coded into the Configurator.

The objective of the post-pass placement/routing optimizer presented herein is to assist the Configurator's wire allocation phase, namely by moving the rows of a configuration until a suitable set of short wires can be assigned to it. Ideally, such an optimizer would free the mapping and retiming mechanism from worrying about the endpoints of wires, and enable it to confidently specify a short wire for communications spanning 8 (or possibly more) logic rows. In addition, analysis of the optimizer's actions and performance can help determine the potential value of implementing a similar placement engine to interact directly with Gama's retiming facility. If enough of Gama's needlessly long wires can be converted to short wires, then it would be valuable to add a similar optimizer into Gama.


FORMULATION

A precise formulation of the placement optimization problem must capture the intent that communication between rows is to be "localized" to use the shortest wires possible. In the case of Garp, the problem is thankfully reduced to a one-dimensional case, as only vertical placement of rows is in question. There has been much success in the literature for placement using min-cut partitioning techniques [FIMA82], [HABO96]. Because Garp's interconnect structure does not naturally partition the array into parts, it is not clear that partitioning -- including the common recursive formulation of bipartitioning -- would effect appropriate uniform localization throughout the array.

Instead, we formulate the placement optimization problem as finding a series of row motions to minimize total vertical wire length. This formulation captures in a very intuitive manner the intent of localization for Garp - to use the shortest wire possible for any communication. The formulation also supports the conversion of length-1 vertical wires into horizontal wires between rows, an optimization which essentially frees up a span-4 vertical wire. Comparing this formulation of the placement optimization problem to min-cut techniques would be an interesting endeavor, though it is beyond the scope of this paper.

The optimization problem uses several simplifying assumptions. First, the array configuration is collapsed into a single representative row. This simplification is appropriate for a one-dimensional optimization problem, where most of the logic blocks on a row are identically- routed as bit slices of a wide-bit operation. This simplification, however, is inaccurate for the case where different columns of a row drive unrelated vertical wires in different directions. Because in Garp a logic block can drive only one vertical wire at a time, the unrelated wires appear in the collapsed model as a single wire emanating in both directions from its source. Although this representative wire will effect valid length minimization for all its constituent wires, its incorrect length could make it seem as an illegally- long short wire. Hence at present, the optimizer ignores such cases and does not guarantee that wires designated "short" will in fact remain span-8 or shorter through row motions.

A second simplification to the optimizer is that it essentially ignores the physical start- and end- points of Garp's wires. The total wire length to be minimized is taken to be the sum of spans of all vertical communication lines, irrespective of the endpoints of physically available wires. This idealization, as well as the simplification described in the previous paragraph, gives the optimizer a much-needed freedom in moving rows of an array configuration through physically illegal stages. However, these simplifications have the undesirable effect of allowing the optimizer to reach a minimum-length configuration that still fails wire allocation. A possible work-around is to attempt wire allocation after every row motion, though this can severely slow the minimization process. In the analyses of this paper, we always allow the optimizer to complete length minimization before attempting to allocate wires. The analyses are pessimistic in that the optimizer may reach an illegal final configuration after missing a legal intermediate configuration, but the analyses capture the minimum performance to be expected from the optimizer.


ALGORITHM

Given a cost function such as total vertical wire length, there are a variety of approaches to minimizing cost. The analogous graph problem defines vertices for rows, directed edges for vertical communication lines, and edges weights by inter-row distance. Row motions correspond to transformations of edge weights which do not affect connectivity. However, the function to be minimized is non-trivial (not simply the sum of edge weights), and the author is not aware of any polynomial time solutions. Polynomial time approximations for cost minimization include gradient descent methods, greedy heuristic methods, and simulated annealing [KIGE83].

The algorithm devised for this paper is a greedy one of order approximately O(N^4) in the number of rows. The cost function to be minimized is the straightforward sum of (idealized) vertical wire lengths. It may be a good idea to weight the length of designated "short" wires more than "long" wires, since the availability of short wires is crucial for timing. However, no weighting is done in this particular implementation, which (as noted above) largely ignores the difference between short and long wires.

The basis of the greedy algorithm is to move contiguous blocks of rows so as to minimize total wire length. The notion of moving blocks of rows is an attempt to capture the natural spatial localization of logic computation. An array configuration tends to have subcomputation regions with dense local communication but only sparse communication to other blocks (in the extreme, a memory access in the Garp array requires no vertical i/o, so a subcomputation may have zero connections to the rest of the array). Hence there can arise cases where moving a block of rows will reduce total wire length when moving a single row will only increase it.

Figure 2 The greedy algorithm attempts to move increasingly large blocks of rows. At each step, the algorithm performs the best possible row-block motion with the current block size, i.e. the motion which reduces total wire length most drastically. The algorithm continues using the same block size until no length-reducing motions are possible, at which time it tries the next largest block size. Note that it is not necessary to check block sizes larger than half the array's rows, since a motion of any block larger than half corresponds precisely to a motion in the opposite direction of a block smaller than half. Figure 2 depicts the downward motion of a block in blue and the corresponding upward motion of a block in pink. Note that the motion effectively switches the relative positions of the two blocks.

Pseudo-code for the greedy algorithm follows:

  for blockSize = 1 .. (number of rows) / 2
  {
    repeat until exhaust length-reducing motions at this block size
    {
      for sourceRow = 1 .. (number of rows)
      {
        for destRow = 1 .. (number of rows)
        {
          evaluate the change in total wire length
          for moving blockSize rows from sourceRow to destRow
        }
      }
      perform best length-reducing motion discovered above
    }
  }
  
Evaluating the change in total wire length for a given row motion (in the innermost loop) requires examining the length change of every vertical wire which passes the blocks being switched. Since each logic cell in Garp can drive only one vertical wire at a time, the evaluation of a motion can be performed in linear time in the number of rows, by maintaining and perturbing a list of vertical wire endpoints. Specifically, the evaluation is linear in the number of vertical wires used, which may be significantly smaller than the number of rows. The cost of the repeat-until loop is not obvious, though experience suggests that it is smaller than linear and is quite small. Thus the algorithm run-time is between O(N^4) and O(N^5) in the number of rows.

The number of iterations in the loop structure is actually lower than the pseudo-code suggests, because not all motions are legal. For instance, the destination row cannot be inside the source block of rows being moved. Also, a row that receives input via horizontal wires from the row above it (a fused row) may not be separated from that row by any motion.

Note that this greedy algorithm is not guaranteed to find a global wire-length minimum. The local minimum to which it settles is sensitive to the choice of "best row motion" among motions with equal reductions in wire length.


RESULTS - Optimization of Gama Output

We test the performance of the placement and routing optimizer on array configurations produced by Gama from three well-known programs: espresso 2.3, eqntott 9, and gzip 1.2.4. The first two are SPEC integer benchmarks, while the latter is popular tool for compression/decompression. These programs represent a good mix of general purpose applications, featuring boolean logic, addition, loops structures, and memory accesses. The Suif C compiler and Gama are used to extract 139 computational kernels from the C source code of these 3 programs, and to convert them into ".ga" format Garp array descriptions. The processing done herein includes logic mapping from C source to Garp logic blocks, as well as placement, but not wire allocation. The array descriptions are subsequently passed through the optimizing Garp Configurator for verification and wire allocation.

Of the 139 computational kernels, we find that only 4 array configurations fail wire allocation without the optimizer, and only 1 fails after optimization. Thus we see that the optimizer finds little use in improving the mappability of Gama output. We do nevertheless wish to study the performance of the optimizer in reducing total wire length, as well as in reducing long wires to potentially short ones. Of the 139 kernels, we analyze the 30 largest array configurations, from 14 to 35 rows long, plus one 94 row design.

Figure 3 Reductions in total wire length of up to 82% are seen in the 30 largest configurations, and up to 100% in the smaller configurations. Note that 100% reduction means that all vertical wire requirements were converted to use horizontal wires between adjacent rows. The optimizer typically needs less than 10 row-block motions to settle on a minimum total wire length, moving mostly single rows at a time, but occasionally moves 10 or more rows in a single block. The scatter plot of Figure 3 reveals an approximately linear relationship between array configuration size and the number of motions performed during optimization. This finding indicates that the repeat-until loop of the pseudo-code requires far less than linear time, and that the optimizer run-time is nearly O(N^4) in the number of rows.

A sample run of the optimizer is shown below, with an ASCII representation of vertical wiring requirements between rows. Rows are listed in physical order, labeled by their original numerical position. Note that all single fanout wires are reduced to horizontal wires, and that the long, high-fanout wires are all reduced to short-wire lengths. A symbol legend for the ASCII graph follows:

+ signal source
| span of wire
L signal destination, reached by long wire
S signal destination, reached by short wire
^ signal destination, reached by horizontal wire

design is initially mappable
5 of 33 rows fused (15.15%)
initial totalWireLength:  50
v-wire graph:
   0:  +       L             
   1:  ^       |             
   2:          |             
   3:          |             
   4:          L             
   5:          |             
   6:          |             
   7:          |             
   8:    +     |             
   9:    ^ +   |             
  10:      |   |             
  11:      L + |             
  12:      | ^ |             
  13:      |   |             
  14:      |   |             
  15:      |   |             
  16:      |   |             
  17:      |   |   L         
  18:      |   L   |         
  19:      |   | L |         
  20:      |   + | |         
  21:      |   L | |         
  22:      |     + |         
  23:      |       +         
  24:      |         +       
  25:      |         ^ +     
  26:      |           ^     
  27:      |             +   
  28:      |             ^   
  29:      L                 
  30:                        
  31:                      + 
  32:                      ^ 
performing motion: { srcRow=17 numRowsToMove=1 destRow=24 } saving 7
performing motion: { srcRow=18 numRowsToMove=1 destRow=22 } saving 4
performing motion: { srcRow=13 numRowsToMove=1 destRow=0 } saving 2
performing motion: { srcRow=14 numRowsToMove=1 destRow=0 } saving 2
performing motion: { srcRow=15 numRowsToMove=1 destRow=0 } saving 2
performing motion: { srcRow=16 numRowsToMove=1 destRow=0 } saving 2
performing motion: { srcRow=6 numRowsToMove=1 destRow=0 } saving 1
performing motion: { srcRow=11 numRowsToMove=1 destRow=0 } saving 1
performing motion: { srcRow=17 numRowsToMove=1 destRow=6 } saving 1
performing motion: { srcRow=18 numRowsToMove=1 destRow=6 } saving 1
performing motion: { srcRow=19 numRowsToMove=1 destRow=6 } saving 8
performing motion: { srcRow=11 numRowsToMove=2 destRow=6 } saving 2
performing motion: { srcRow=20 numRowsToMove=2 destRow=0 } saving 2
performing motion: { srcRow=22 numRowsToMove=2 destRow=0 } saving 2
performing motion: { srcRow=24 numRowsToMove=3 destRow=0 } saving 3
performing motion: { srcRow=27 numRowsToMove=4 destRow=22 } saving 1
v-wire graph:
  24:                +       
  25:                ^ +     
  26:                  ^     
  23:              +         
  17:              ^         
  22:            +           
  19:            ^           
   7:                        
   2:                        
  16:                        
  15:                        
  14:                        
  13:                        
   3:                        
   4:          L             
  21:          L             
  20:          +             
  18:          L             
   0:  +       L             
   1:  ^                     
   5:                        
   6:                        
  27:                    +   
  28:                    ^   
  29:      L                 
  30:      |                 
   8:    + |                 
   9:    ^ +                 
  10:      |                 
  11:      L +               
  12:        ^               
  31:                      + 
  32:                      ^ 
final   totalWireLength:  9  (saved 41, 82.00%)
optimized design is mappable

In order to study how the optimizer behaves for different kinds of array configurations, we identify several characteristic parameters:

- Configuration size: Number of rows
- Wire density: Percentage of rows which output to a vertical wire.
Array configurations with high wire density are more prone to running out of wires during wire allocation.
- Fused row density: Percentage of rows receiving input via horizontal wires from the row above (hence "fused" to that row).
A high fused row density may hinder optimizer performance, since there are fewer possible row motions.

Figure 4
Figures 4(a) - 4(c)

Figures 4(a) - 4(c) scatter-plot the percent reduction in total wire length versus each of the above three parameters. We notice a slight positive correlation between configuration size and percent length reduction. This point suggests that larger configurations have more needlessly long wires, perhaps due to long lines between distant computational blocks. Automated row shuffling that would not be intuitive to a manual designer or simple placement tool can have a big effect in such cases. Also, there appears to be a weak negative correlation between fused row density and percent length reduction. As expected, when more rows are fused by horizontal wires and cannot be separated, fewer row motions are possible, and the optimizer cannot do as good a job. Interestingly, there does not appear to be any meaningful correlation between wire density and percent length reduction.

Figure 5
Figures 5(a) - 5(c)

Figures 5(a) - 5(c) scatter-plot the percentage of vertical wires shrunk to a shorter type versus the above three parameters. These plots depict wires shrunk to fit in span-8 and span-4 wires, as well as wires shrunk to length 1 (span 2). Although there are no length-1 vertical wires, most length-1 connections are in fact converted into horizontal wires, saving the use of one vertical wire. Wires shrunk down to span-8 are particularly interesting because they represent long wires that might have been retimed to be short. We see that up to 50% of a configuration's wires can be shrunk from long to span-8, and some may be shrunk further. If these long wires are in the critical path of a logic computation, then they could be retimed as short wires to save cycles.

The correlations in figures 5(a) - 5(c) are not so clear. There appears to be a weak negative correlation in all three cases for wires shrunk to length-1. That is to say, more wires can be shrunk to length-1 in smaller as well as sparser configurations. A similar very weak correlation seems to exist for wires reduced to span-8. These trends suggest that the optimizer's effectiveness is limited for designs with dense communication (high wire density, high fused row density).


EXTRAPOLATION - Optimization of Random Configurations

In the above analysis, we found that nearly all array configurations produced by Gama are mappable, i.e. pass wire allocation. Thus the question remains unanswered: how well does wire length minimization fare in making unwirable placements legally wirable? In the above analysis we saw a weak trend to suggested that this optimization is less effective as inter-row communication grows denser. Yet Gama does not provide enough data points to verify these notions. In this section, we generate pseudo-random array configurations with denser communications in order to evaluate the optimizer.

The pseudo-random array configurations are characterized by four parameters pertaining to inter-row communication:

- Wire density: Percentage of rows which output to a vertical wire
- Exponential fanout distribution: Most vertical wires in Gama-emitted array configurations fan out to one or two destination rows. A few long wires fan out to 3, 4, or more rows. We model fanout by an exponential random variable with a settable exponential parameter "fe". Thus the probability density for fanout is given by: p(x)==Cexp{-fe*x} for fanout x>0 (disallow x=0; C is a normalizing constant). The mean fanout for such a distribution is (1 + 1/fe). A fanout exponent fe=2 (mean fanout 1.5) models the fanout of Gama configurations fairly well.
- Exponential wire length distribution: Most vertical wires in Gama-emitted array configurations travel only a few rows to their destination, while a few long wires may span the entire array. We model inter-row communications distance by an exponential random variable with a settable exponential parameter "we". Thus the probability density for inter-row communications distance is given by: p(x)==Cexp{-we*x} for distance x>0 (disallow x=0; C is a normalizing constant). An expression for mean wire length is difficult to derive from such a distribution of inter-row communications distances, but we may estimate it at 2(1 + 1/fe). Note that a wire length exponent of 0 signifies a uniform distribution of wire lengths over all possible lengths.
- Fused row density: The percentage of rows receiving input from the row above via horizontal wire, so that they cannot be separated from that row by row motions. The Gama-emitted designs of the previous section have a mean fused row density of approximately 30%.

To generate realistic pseudo-random array configurations, we fix the fanout exponent and fused row density to match those of Gama-emitted designs. We can then generate configurations of varying communication densities by varying the wire density and wire length exponent. To test for mappability of such designs, we generate them with all vertical wires designated as short. Thus we can easily tell when wires are too long or too numerous to pass wire allocation, both before and after the optimization. All designs are made 32 rows long, as this is the size of the Garp implementation currently proposed for fabrication.

Figure 6
Figures 6(a), 6(b): fused row density 30%, fanout exponent 2

Figures 6(a) and 6(b) scatter-plot the results of varying wire density and wire length exponent with other parameters fixed to match Gama output: a fanout exponent of 2, and fused row density of 30%. Figure 6(a) plots the percentage of pseudo-random designs that pass wire-allocation before optimization is applied. Notice that designs with low enough wire density and short enough wires (i.e. high wire length exponent) have 100% success in initial mapping. Figure 6(b) plots the percentage of initially unmappable designs which were converted by the optimizer into valid, wirable configurations. We see that the optimizer succeeds in correcting designs with reasonably small wire densities (mostly below 50%) but fails for denser designs. This is precisely the trend suggested at the end of the previous section.

Figure 7
Figures 7(a), 7(b): fused row density 30%, fanout exponent 1

To make further use of the parametric pseudo-random array configuration model, we examine the optimizer's performance in the presence of higher fanout. By reducing the fanout exponent at 1, we increase the mean fanout to 2 and allow a wider variance of fanouts. Figures 7(a) and 7(b) plot the mappability results for this case in a manner analogous to figure 6. We see in Figure 7(a) that the percentage of initially mappable (wirable) designs drops off earlier than before in the presence of high wire densities and longer wires (smaller wire length exponent). The added fanout clearly makes it more difficult to allocate wires for these designs. In Figure 7(b), we see that the optimizer does not fare as well as before in recovering unwirable designs -- it succeeds only for sparse designs with wire density under 30%, and even there, no more than 60% of unwirable designs can be made wirable. In addition, we see some negative values in this graph, indicating that the optimization process made an initially mappable design unwirable in the end.

Figure 8
Figures 8(a), 8(b): fused row density 60%, fanout exponent 2

Another point to explore in the parametric model is the effect of higher fused row densities. We are likely to find higher usage of horizontal wires for shifting in arithmetic applications, especially in managing the partial products of a multiplier. Gama currently does not support general purpose multipliers, so such data points are missing from the analysis of the previous section. Figures 8(a) and 8(b) scatter-plot wirability results with a fused row density of 60% and fanout exponent of 2. Here we find that the optimizer can recover unwirable designs at all wire densities, though with a limited success rate of 40%-80%. Thus in the presence of many horizontally fused rows, it is something of a gamble whether or not the optimizer can emit a wirable design.


CONCLUSION

A one-dimensional row placement and routing optimizer based on minimizing total vertical wire length has been presented. We find that the optimizer is effective in assisting a wire-allocation routing phase only for logic designs with reasonably low communication requirements. The optimizer fails at making an unwirable design wirable as the density and fanout of logic outputs grows. Optimizer performance also degrades in the presence of horizontally-fused rows which restrict its selection of row motions. These results are somewhat pessimistic since the optimizer was allowed to skip over intermediate wirable array configurations in its search for a minimum-wire-length configuration. It is likely that the limited benefits of the optimizer to improving wirability stem directly from its ignorance of Garp's vertical wire positioning. The minimization of total wire length may be better suited to work in combination with other post-pass mechanisms which have a more specific model of Garp wire availability.

The analysis of Gama-emitted configurations suggests that there may be some benefit for retiming in the use of a length-minimizing placement engine early in the logic mapping phase. Such an engine could reduce critical path delays and buffers associated with needlessly long wires. The reliability of this conclusion is not so clear, since the optimizer used in this paper is coded into a late-phase Configurator and has no knowledge of the retiming decisions made by Gama.


ACKNOWLEDGEMENTS

Thanks to John Hauser for continued guidance on Garp and for providing source code to the Garp Configurator. Thanks to Timothy Callahan for enhancements to Gama and for providing benchmark test cases. Thanks to Andre' DeHon for guidance in the formulation of this project.


REFERENCES

[AGLE94]
Aditya A. Agarwal and David Lewis. Routing Architectures for Hierarchical Field Programmable Gate Arrays. In Proceedings 1994 IEEE International Conference on Computer Design, pp. 475--478. IEEE, October 1994.
[FIMA82]
C. M. Fiduccia and R. M. Mattheyses. A Linear Time Heuristic for Improving Network Partitions. In Proceedings of the 19th Design Automation Conference, pp. 175--181, 1982.
[HABO96]
Scott Hauck and Gaetano Borriello, "An Evaluation of Bipartitioning Techniques", 1996. [HTML]
[KIGE83]
S. Kirkpatrik, C. D. Gellatt, Jr., and M. P. Vecchi. "Optimization by Simulated Annealing." Science, 220(4598):671--680, May 1983.
[HAWA97]
J. Hauser and J. Wawrzynek. "Garp: A MIPS Processor with a Reconfigurable Coprocessor." In Proceedings of the IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM '97, April 16-18), pp. 24-33, 1997. [HTML]
[HAUS97]
J. Hauser. The Garp Architecture. U.C.Berkeley, 1997.



This report is part of a class project for Andre' DeHon's course: " CS 294-7 Reconfigurable Computing", spring 1997, Department of Computer Science, U.C.Berkeley.

Comments to Eylon Caspi.
May 6, 1997