Clock-Tree Synthesis

In synchronous designs, data transfer between functional elements is synchronized by clock signals. In a top-level digital design, you will have one or more clock sources, such as PLLs or oscillators within the chip. You may also have an external clock source connected through an IO. For a digital-only block, a clock pin serves as the clock source for that block. Clock balancing is important for meeting the design constraints, and clock tree synthesis (CTS) is done after placement to achieve the performance goals.
After placement, the positions of all cells, both macros and standard cells, are known. However, the clock is still ideal. (For simplicity, we will assume a single clock for the whole design.) At this stage, buffer insertion, gate sizing, and other optimization techniques are applied to the data paths, but the clock net is left unchanged.
A single clock net connects all the synchronous elements in the design, however many there are.
This is how your design’s clock network looks at this point.
Figure: clock net before CTS
This is definitely not something we want. Think just about the load on that one clock net: no driver can drive that many flops! But for a synchronising signal like the clock, load or fanout is not the only concern. We also want a “balanced” tree, that is, the skew of the clock tree should be as close to zero as possible. After clock tree synthesis, the clock net will be buffered as below.
Figure: clock net after CTS
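To get a feel for why a single net cannot drive every flop, here is a rough Python sketch that estimates how many buffers a balanced tree needs when each driver is limited to a fixed fanout. The function name, the fanout limit of 20, and the sink count of 10,000 are made-up illustrative values; a real CTS tool also weighs wire load, slew, and placement.

```python
def buffer_tree_shape(num_sinks, max_fanout):
    """Return the buffer count per level (leaf level first) for a balanced
    tree in which no driver sees more than max_fanout loads."""
    if num_sinks < 1 or max_fanout < 2:
        raise ValueError("need at least one sink and a fanout limit of 2 or more")
    levels = []
    loads = num_sinks
    # Keep adding a level of buffers until the clock source itself
    # can drive whatever is left.
    while loads > max_fanout:
        buffers = -(-loads // max_fanout)  # ceiling division
        levels.append(buffers)
        loads = buffers
    return levels


# Hypothetical design: 10,000 flops, each driver limited to 20 loads.
levels = buffer_tree_shape(num_sinks=10_000, max_fanout=20)
print(levels)       # [500, 25, 2]
print(sum(levels))  # 527 buffers in total
```

Under these assumptions the clock source drives 2 buffers, which drive 25, which drive 500 leaf buffers, each feeding 20 flops.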
The main concerns in CTS are:
  1. Skew – One of the major goals of CTS is to reduce clock skew.
    Let us look at some definitions before we go into clock skew.
    • Clock Source
      Clock sources may be external or internal to your chip/block. But for CTS, what we are concerned about is the point from which clock propagation starts for the digital circuitry. This can be an IO port, the output of a PLL or oscillator, or even the output of a gate further down the line (e.g. a mux output). A clock source for CTS may also be specified using the ‘create_generated_clock’ command. This defines an internally generated clock for which you want to build a separate tree, with its own skew, timing and inter-clock relations.
      You specify the clock source(s) using the command create_clock.
    • Clock Sinks
      Sinks, or clock stop points, are nodes which receive the clock. The default sinks are the clock pins of your synchronous elements, such as flip-flops.
    Now let us define skew as the maximum difference among the delays from the clock source to the clock sinks.
    Figure: clock skew
    In the picture above, the delay to each clock sink is given. The skew in this case is the difference between the maximum delay and the minimum delay.
    Skew = 20 ns - 5 ns = 15 ns
    The goal of clock tree synthesis is to bring the skew in the design close to zero, i.e. every clock sink should receive the clock at nearly the same time.
  2. Power – The clock is a major power consumer in your design. Clock power consumption depends on switching activity and wire length. Switching activity is high, since the clock toggles constantly. Clock gating is a common technique for reducing clock power by shutting off the clock to unused sinks. Clock gating per se is not done in layout; it should be incorporated in the design. However, clock tree synthesis tools can recognise the clock gates and also perform power-aware CTS.
    Figure: clock gating
    In the picture above, FF1 gets the ungated clock CLK, and FF2 and any subsequent flops get a gated clock. This clock is turned on only when the signal EN is asserted. (See ICG cells.)
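The skew definition from item 1 can be written as a small Python sketch. The sink names and the two middle delay values are hypothetical; only the 20 ns and 5 ns extremes come from the example above.

```python
def clock_skew(sink_delays_ns):
    """Skew = largest source-to-sink delay minus the smallest."""
    delays = list(sink_delays_ns.values())
    return max(delays) - min(delays)


# Hypothetical source-to-sink delays in ns, keyed by made-up sink pin names.
delays = {"FF1/CK": 5, "FF2/CK": 12, "FF3/CK": 20, "FF4/CK": 9}
print(clock_skew(delays))  # 15
```

A perfectly balanced tree would give every sink the same delay, so the function would return 0.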
Make sure that you specify the clock as propagated at the CTS stage, i.e. instead of an ideal delay for the clock, the tool now calculates the actual clock delay. This in turn gives you a more realistic timing report for the design. You can propagate the clock using the command set_propagated_clock [all_clocks].
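To see why a propagated clock gives a more realistic report, consider the rough Python sketch below. It uses the textbook setup check for a single-cycle flop-to-flop path. All numbers and names are hypothetical, and a real timer also accounts for clock uncertainty, derating, and other effects that are left out here.

```python
def setup_slack(period, launch_latency, capture_latency, data_delay, setup_time):
    """Setup slack for a single-cycle flop-to-flop path, all values in ns."""
    arrival = launch_latency + data_delay                # data reaches the capture flop
    required = period + capture_latency - setup_time     # latest allowed arrival
    return required - arrival


# Hypothetical path: 10 ns clock period, 8.5 ns data delay, 0.5 ns setup time.
period, data_delay, setup_time = 10.0, 8.5, 0.5

# Ideal clock: the clock reaches both flops with zero delay.
ideal = setup_slack(period, 0.0, 0.0, data_delay, setup_time)

# Propagated clock: hypothetical post-CTS delays of 2.0 ns to the launch
# flop and 0.5 ns to the capture flop.
propagated = setup_slack(period, 2.0, 0.5, data_delay, setup_time)

print(ideal)       # 1.0
print(propagated)  # -0.5
```

With these assumed numbers, the path that appears to meet timing with an ideal clock fails once the actual clock delays are taken into account.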

3 comments:

  1. Hello sir, What is insertion delay?

  2. Question 1: Suppose a design has two scenarios:
    A. Skew is 200 ps and insertion delay is 600 ps
    B. Skew is 300 ps and insertion delay is 500 ps
    In both cases setup and hold are met and the targeted skew is also met. Which one would you prefer, and why?

    Replies
    1. B looks to be better in terms of power. But this cannot be quantified unless we know the clock structure (there can be a lot of un-common clock tree structures as well, giving rise to more buffers).
