Archive for the ‘Verification Planning’ Category

Serious Challenges to Verification Teams

Posted on August 17th, 2010 by admin

What do you see as a serious challenge to verification? In a recent survey, we asked 24 processor verification engineers what they thought were the most daunting problems facing them today. Do you agree with these results? Why not contribute to the discussion and let your thoughts be known?

[click image to enlarge]

Challenges-in-Processor-Verification

Power7 Verification: It’s Not Rocket Science (It’s More Advanced)

Posted on May 26th, 2010 by admin

By Hemendra Talesara

Complexity

In his recent presentation discussing verification of the Power7 processor, John Ludden of IBM opened with a quote from an IBM exec more than a decade ago. “it’s not rocket science”- a perception held by some members of the management and design communities at that time.

However, designs have become a whole lot more complex over time. The Power7 processor at 45nm has 1.2B transistors on a 567 sq. mm die, supporting 8 cores with 4 threads each, an on-chip eDRAM, 3 levels of caches and 2 DDR memory controllers. Yet as verification complexity multiplies in this multi-threaded design, it’s very helpful to have some of the more advanced tools and methodology at your disposal.

Tools and Methodology

Fortunately for Ludden and the Power7 team, IBM has invested in verification technology for years (in spite the quote from the exec). The company continues to develop and rely on in-house tools for many of the advanced verification technologies for processor-specific testing. These include the test-bench, multi-thread test generators, hardware accelerators, formal and semi-formal tools, micro-architecture checkers (API based), cache coherency checkers and coverage tools. Exercisers
originally developed for post-silicon validation were used to exploit the hardware acceleration platform. Forty-five thousand coverage points were organized to assist with big picture and were used to re-direct the test generator and exercisers for accelerators.

To support corner case testing for events that occur rarely, especially in multi-threaded scenarios, software irritator threads were used. These irritators are capable of creating the worst possible contentions. Through their application, twenty-three high quality bugs were revealed hiding in the corners.

A methodical application of these tools and technology clearly captured and advanced the industry best practices.

Designing for Verification

Designing for Verification was an important element in managing the overall risk to verification time line. IBM minimized the risks by maintaining a tight interaction between the specification and verification teams during the design phase and allowing the verification team to maintain architectural changes. “Chicken switches” were placed in silicon that allowed verification team to back-off an area considered risky or possible of otherwise compromising the verification effort. These switches provide workarounds, with some small impact on performance but no functional change, for accessing difficult to verify micro architectural features. Hardware irritators were also used to enable stress testing of corner cases in both pre-silicon and post-silicon testing.

Conclusion

The Power7 draws many architectural features from the Power5 and 6 designs, although it is a much more complex and powerful processor with a much shorter verification cycle. Ludden and the Power7 team accomplished this remarkable feat with a lot of foresight in planning, metrics collection and careful execution. Tight interlocking between metrics collected and verification plan was key part of tracking mechanism and functional closure. This project should serve as an example of how to plan for and manage risks in a complex verification project.

Kudos to John and the IBM team. His full presentation can be downloaded here.

Intelligent Testbench vs. Random Test Generator

Posted on March 11th, 2010 by admin

By Melanie Typaldos

The idea of an intelligent testbench has long been of interest in functional processor verification, although it has always seemed to fall short of expectations when it comes down to just what degree of “intelligence” is really involved. Throughout this document, we will present the argument that a sophisticated, well-evolved, dynamic random test generator, when used as part of a complete verification plan, can be of more value than marketing-driven intelligent verification products.

Defining Intelligent Testbench

An accurate definition of “intelligent testbench” is difficult to find, so let’s begin with that offered by Wikipedia. The intelligent testbench is described as something that “uses information derived from the design and existing test description to automatically update the test description to target design functionality not verified, or covered by the existing tests”[1]. This implies that a feedback loop exists which is capable of creating new test sequences based on what has and has not yet been tested. Other than this closed loop system, the concept is very similar to that of a random test generator.

Not all Test Generators are Created Equally

I remember at one time, we were speaking with a potential client who said something along the lines of “I don’t know why you people want so much money for this RAVEN thing. I can just get a co-op to write one in a week.” He wasn’t wrong in his assumption that a relatively unskilled engineer could conceivably write a generator in a short amount of time. However, the old adage remains that you get what you pay for. A generator of this sort would be incapable of effectively verifying a design of any complexity whatsoever. This is analogous to running every instruction followed by every other instruction and calling verification complete. There are no standard methodologies for constructing test generators, and each one will have different methods for achieving randomness, different capabilities in pipeline exploration, varying abilities in multi-processor testing, etc.

Functional Coverage

Many intelligent testbench products claim to automatically create test sequences based on pre-designated coverage points. However, the belief that hitting every coverage point means that your design is verified is a big mistake. By the very fact that coverage points are singular points in a vast space, they cannot cover the entire design. Engineers can work hundreds of hours writing more coverage points, but it will never be enough to completely verify the design. Because of this, our test generator uses templates (created by engineers) that automatically create sequences to hit not only the coverage point, but also other behaviors around that point.

Figure 1. Abbreviated Flow of Randomness

Random Stimulus Compared to Feedback Looping

RAVEN is very good at what it does; it is designed to hit both simple and complex behaviors randomly with little direction from human users. For instance, if we’re looking at instruction A followed by instruction B with operands X, Y and Z, then we’re going to hit that randomly with ease. Constrained-random templates can replace 95-98% of all directed test sequences. It’s only a matter of time and simulation power applied before we randomly hit all of these simple scenarios and many of the complex ones as well.

The whole point here is that coverage points that are easy to direct (i.e. via feedback), will have already been covered by virtue of random testing. Highly complex behaviors and difficult-to-reach corner cases require a significant degree of architectural knowledge, and they are too difficult and too architecturally dependent to effectively be covered by a piece of software. If an effective feedback loop could be done with good logic or programming skills, we would have already done it.

“At a recent ASIC verification panel discussion at DesignCon, a question was asked about intelligent testbenches — something promised for a long time but not really delivered by the EDA companies. One of the panel members from a design company responded, and said that if you ever tell his engineers that his testbenches are not intelligent then he would be very upset. I am sorry but I have to break the news to him. Testbenches are dumb!” – Brian Bailey [2]

The Promise of Eliminating Redundant Testing

Eliminating “redundant testing” via software is a dangerous thing to do. Suppose that we have two similar sequences of 20 instructions, but the second test has one instruction that is different. Are those tests redundant? They seem like it, and they have a lot in common. But depending on what those instructions are, what registers they use, what the pipeline looks like and whether they took exceptions, that one instruction can be the one that makes the difference. So it’s dangerous for a piece of software to make the supposition that this test is redundant. It could very well be that this next instruction could be the one that reveals an error.

When I was working at a major processor company several years ago, we found a case in silicon where the processor would hang for seemingly no reason. What we found was that an illegal access to a register was causing the error approximately 1000 instructions before the hang would occur. This taught me that sometimes even the designer is not aware of the conditions that can lead to a bug. Designers may know their own block, but the interactions between the blocks can be very complex, and oftentimes this can be confusing even for experienced engineers. So I think that it’s really dangerous to assume that you can get rid of redundant tests in this manner.

But this is not to say that ineffective tests should be continually simulated. Test templates should remain in the suite only as long as they continue to uncover errors. When it no longer seems like it’s finding bugs, that template should be archived and replaced by another. But having a piece of software decide that for you is not a good thing.

Verification Planning for ARM Architectures

Posted on October 8th, 2009 by admin

By Melanie Typaldos and Tim Short

The early stages of functional verification present several unique challenges that must be overcome to achieve first silicon on schedule. This article identifies several common problems that Obsidian recently encountered in the verification of a new ARM based processor design and discusses how the RAVEN random test generator was used in overcoming these obstacles.

Eliminating Bottlenecks

Many organization want to begin verification as soon as the first logical unit of RTL becomes available. However, the tool-chain (i.e. assembler, linker, compiler, etc.) often lags behind RTL development and architectural changes, creating a bottleneck in verification. RAVEN allows verification teams to completely bypass this problem and begin identifying functional errors before the tool-chain is implemented. Customers using RAVEN can have reported large productivity gains over directed testing, especially in the beginning and intermediate stages of verification – hitting 95-98% of all required coverage points using random tests.

Verifying Intermediate RTL

Let’s assume that the ALU unit of your design is ready for testing at an early stage, but the MMU, floating-point and load/store units are not. Some teams might begin by testing each instruction in order, varying only one or two operands for the sake of expedience. However, this creates a problem. Following this methodology, only a small portion of the RTL will be tested and the entire design will be exposed to errors that should have been corrected early on. With the same investment in time, RAVEN can produce significantly better results with only slight modifications to the default template.

Start Test
Random Instruction (ALU)………………..min=10     max=20
End Test

RAVEN template files are used to designate which behaviors should be tested. At the most basic level, these templates provide the ability to interchange instruction groups, instruction quantity, and selection of operands. More advanced configurations allow for the biasing of certain exceptions and addressing/operand modes. This ability is important in the sense that it allows additional behaviors to be added incrementally as bugs are discovered and new areas of the RTL become available in emerging designs.

In the case of our ALU example, the Random Instruction control can be directed to use only instructions in the ALU group and can even disable the selection of individual instructions within that group. Available instructions can then be tested with an almost infinite number of parameter configurations. As further instruction groups and processor features become available (e.g. integer divide, MMU, load/store) RAVEN templates can be automatically updated to include these new behaviors. Simply re-generating the same templates then allows for the creation of thousands of new tests.

Planning for Bugs in Your Testing Methodology

As you’re creating your first tests, it’s important to think about how long they’ll take to simulate and debug. Smart design teams take into account where they are in the development cycle and the amount of processing power available for simulation when creating tests. This way tests can be run and completed within a targeted amount of time.

When a generated test hits a bug, it’s also much easier to isolate and identify the error if it is detected in a test with ten instructions rather than one with a thousand instructions. Even if you hit a bug in only one out of every thousand tests of ten instructions, this is still beneficial when compared to the time required to debug large tests.

Once you’re able to generate 10-20K short tests without finding errors, then it’s time to move up to tests of a hundred instructions and eventually on to tests of a thousand instructions. The reasoning here is that the more rare your bugs then become, the more dependent they will be on processor state, and longer tests will be more useful in this respect. Changing the quantity of instructions in a RAVEN template is as easy as adding extra zeros and regenerating.

Once bugs are found, RAVEN can be configured to continue testing around the error by disabling access to problematic behaviors in the RTL. Most internally developed generators don’t have this fine-grained level of control. It’s far easier for RAVEN to hit new behaviors while avoiding others that aren’t yet ready to be tested than for most other EDA tools on the market.

Processor Verification Methodology

Posted on January 3rd, 2009 by admin

Obsidianʼs coverage-driven verification methodology utilizes functional coverage goals in conjunction with the RAVEN® random test generator to achieve optimal results. RAVEN may be used in a simulation, emulation or post-silicon environment to generate vast amounts of valid and interesting stimuli. Tests are typically generated using the customerʼs Instruction Set Simulator (ISS) with results compared against the RTL.

The Importance of Random Test Generation

Random test generators offer the advantage of automatically hitting coverage points without the need to directly author each test sequence. Modern generators automatically create valid sequences of processor instructions with associated data values, initial values for multilevel caches and initial processor register values. This saves time normally wasted on repetitive register setups and generating invalid test sequences.

Hitting Coverage Goals

Coverage metrics are the dominant method for measuring verification progress in the industry today. Coverage points are normally designated by design engineers according to functional definitions and risk assessment. Obsidianʼs verification methodology begins by generating large numbers of purely random tests. Coverage monitors report the points hit by these tests and are used to identify regions which require further testing.

Closing Coverage Gaps

As the number of coverage points hit by random tests begins to decline, generator templates are updated and constrained toward specific regions of the verification space. Some coverage points may require a long series of events before the targeted behavior takes place. In this case, directed tests and directed templates are used. Directed tests are used to target the most difficult coverage points while directed templates are written to allow as much random behavior around the coverage point as possible.

The RAVEN software retains knowledge of how to reach key verification coverage points including:

  • Every instruction executed following every other instruction
  • FPU overflows and underflows
  • All paths of state transitions throughout the memory hierarchy
  • All possible load / store orderings and conflicts
  • Using instructions to create all possible exceptions
  • Accessing all possible paging modes and physical memory size boundaries
  • Memory sharing with multiple bus masters or multiprocessor tests

Review – Stop Writing Assertions! Creating Efficient Verification Methodologies

Posted on August 28th, 2008 by admin

Introduction

August drew our largest DVClub event yet in Silicon Valley with over 140 attendees coming out to listen to David Whipp of NVIDIA talk about his ideas on redefining how verification gets accomplished. If you missed his presentation, be sure to have a look his paper entitled “Stop Writing Assertions! Creating Efficient Verification Methodologies”.

In his presentation, Whipp makes some clever assertions and points out that “Verification is 70% of the problem” when it comes to chip design. Although this is widely accepted to be true, few verification engineers have done much to change this over the years. Of course, there are companies that offer various products, services, and methodologies to aid in the daunting challenge of verification, but few have sought to break down the verification process into smaller, more manageable components such as Whipp has done here.

Statement of the Problem

In the figure below, Whipp explains the typical workflow of the verification process. That is, a “big paper spec” is first written. Although it quickly becomes outdated, and hopelessly remains so, the spec is deemed vitally important as the team attempts to make it match the actual design. Furthermore, the verification department is tasked with solving all problems shown below in the red boxes. They must write the testbench, checkers, derive useful tests and so on.

Figure 1 – Bad (Traditional) Hardware Development Flow

Common Hardware Development Flow

A Possible Solution

Whipp’s proposed solution for this problem is to reconsider the way in which we approach verification. Rather than writing an enormously bloated paper spec that will be inevitably deemed obsolete, he states that it may be more beneficial to use an executable spec. My first thought is that this is cheating, but the more that I consider the details of doing this, I think that he may have something here. As you can see below, the paper spec does still exist, but it has been drastically reduced in size for manageability.

Figure 2 – Revised Flow

Better Hardware Development Flow

The red boxes above remain the responsibility of the verification team; however, the design team now steps in to handle the green boxes on the left. The remaining white boxes in the middle become shared tasks that both teams must work together on.

Going from 70% to 30%

How can we then reasonably expect to get all of the work done by doing less? The first step is to drop the detailed spec. This frees up the design team to do other things. In the traditional verification model, the natural language spec is never really maintained, and DV engineers struggle to write checkers based on outdated specs, which is inherently problematic. By contrast, Whipp’s new model uses a very short spec. Under this system, architects build a set of models at different levels of abstraction, creating and using them as executable models.

Overworking the Design Team?

The new executable spec will include multiple models at different levels of abstractions including ISSs, thread-based models, and structural models.

The architecture team must then:

  • build models
  • make sure that they are correct
  • connect them to standard interface

But will dropping the Big-Spec even out the workflow given these added tasks? It’s difficult to say and may depend on many other factors, but my first impression is that the architecture team should be more productive at these tasks. If not, all of this added work may create a backlash within some departments, and this could possibly become a difficult challenge to overcome. Design departments may or may not be equipped to undertake the added workload, and it is possible that personnel may need to be shifted between teams to match the newly distributed workflow. But if all of these things can be satisfied, this plan possesses the potential to streamline the verification process and reduce time-to-market figures for the entire project.

Conclusion: ESL, Triage, and Debug

When comparing figures 1 and 2 above, it seems that a lot of the traditional verification effort is shifted to the architecture team and labeled “ESL”. I have to admit, I’ve been hot and cold on ESL. It has the potential to be a really great idea with tremendous vision if implemented correctly, but the engineer in me has a hard time pinning down what exactly ESL is, and ambiguity has a way of making people nervous.

On the typical “BAD” flow, 70% of the boxes fall under DV. On the new and improved flow, it is only 30%. The first major change is ESL. ESL is one of those great things that you can stick anything into. In this case it means that the architects build a useful high level model of the design that can be used by the entire team. At Nvidia, this takes the form of the de-emphasizing the English specification and creating a transaction level specification that IS the spec. In my experience it’s usually the case that most groups end up using the C model as the specification as the design documentation wilts throughout the project. The difference in this case is that using the C model is the goal from the beginning. Assertions are then added to the transaction model as part of the QA cycle. Next, a structural C model is created that models the implementation and allows the assertions to be reused by adoption of common interface points. The end result is a C transaction model and a C implementation model that can be reused in DV with all the architects’ assertions. This is the most interesting implementation of ESL that I’ve heard in a while.