Posts Tagged ‘random test generators’

Intelligent Testbench vs. Random Test Generator

Posted on March 11th, 2010 by admin

By Melanie Typaldos

The idea of an intelligent testbench has long been of interest in functional processor verification, although it has always seemed to fall short of expectations when it comes down to just what degree of “intelligence” is really involved. Throughout this document, we will present the argument that a sophisticated, well-evolved, dynamic random test generator, when used as part of a complete verification plan, can be of more value than marketing-driven intelligent verification products.

Defining Intelligent Testbench

An accurate definition of “intelligent testbench” is difficult to find, so let’s begin with that offered by Wikipedia. The intelligent testbench is described as something that “uses information derived from the design and existing test description to automatically update the test description to target design functionality not verified, or covered by the existing tests”[1]. This implies that a feedback loop exists which is capable of creating new test sequences based on what has and has not yet been tested. Other than this closed loop system, the concept is very similar to that of a random test generator.
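As a rough illustration of that closed loop, the sketch below steers generation toward whatever has not yet been covered. It is a toy model and not any product's algorithm: the five-instruction ISA, the coverage model (ordered instruction pairs), and the function name `closed_loop_generate` are all assumptions made for the example.

```python
import random

INSTRUCTIONS = ["add", "sub", "load", "store", "branch"]  # hypothetical ISA


def closed_loop_generate(max_tests, seed=0):
    """Generate two-instruction tests, steering toward uncovered pairs."""
    rng = random.Random(seed)
    # Coverage model (an assumption): every ordered pair of instructions.
    uncovered = {(a, b) for a in INSTRUCTIONS for b in INSTRUCTIONS}
    tests = []
    while uncovered and len(tests) < max_tests:
        # Feedback step: choose the next test only from what is NOT yet covered.
        pair = rng.choice(sorted(uncovered))
        tests.append(pair)
        uncovered.discard(pair)
    return tests, uncovered


tests, uncovered = closed_loop_generate(max_tests=100)
print(len(tests), len(uncovered))
```

In this toy model the loop closes coverage in exactly as many tests as there are coverage points, but only for points that someone has already defined.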

Not All Test Generators Are Created Equal

I remember at one time, we were speaking with a potential client who said something along the lines of “I don’t know why you people want so much money for this RAVEN thing. I can just get a co-op to write one in a week.” He wasn’t wrong in his assumption that a relatively unskilled engineer could conceivably write a generator in a short amount of time. However, the old adage remains that you get what you pay for. A generator of this sort would be incapable of effectively verifying a design of any complexity whatsoever. This is analogous to running every instruction followed by every other instruction and calling verification complete. There are no standard methodologies for constructing test generators, and each one will have different methods for achieving randomness, different capabilities in pipeline exploration, varying abilities in multi-processor testing, etc.

Functional Coverage

Many intelligent testbench products claim to automatically create test sequences based on pre-designated coverage points. However, the belief that hitting every coverage point means that your design is verified is a big mistake. By the very fact that coverage points are singular points in a vast space, they cannot cover the entire design. Engineers can work hundreds of hours writing more coverage points, but it will never be enough to completely verify the design. Because of this, our test generator uses templates (created by engineers) that automatically create sequences to hit not only the coverage point, but also other behaviors around that point.
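The idea of a template that hits a coverage point together with the behavior around it can be sketched as follows. This is only an illustration, not RAVEN's template format: the instruction list, the `expand_template` function, and the padding scheme are hypothetical.

```python
import random

INSTRUCTIONS = ["add", "sub", "mul", "load", "store", "branch"]  # hypothetical


def expand_template(target, rng, max_padding=3):
    """Return a sequence that contains `target` (a tuple of instructions)
    surrounded by a random number of randomly chosen instructions."""
    before = [rng.choice(INSTRUCTIONS) for _ in range(rng.randint(0, max_padding))]
    after = [rng.choice(INSTRUCTIONS) for _ in range(rng.randint(0, max_padding))]
    return before + list(target) + after


rng = random.Random(1)
sequences = [expand_template(("load", "store"), rng) for _ in range(5)]
for seq in sequences:
    print(" ; ".join(seq))
```

Every generated sequence still contains the targeted load/store pair, but each run surrounds it with a different neighborhood of instructions.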

Figure 1. Abbreviated Flow of Randomness

Random Stimulus Compared to Feedback Looping

RAVEN is very good at what it does; it is designed to hit both simple and complex behaviors randomly with little direction from human users. For instance, if we’re looking at instruction A followed by instruction B with operands X, Y and Z, then we’re going to hit that randomly with ease. Constrained-random templates can replace 95-98% of all directed test sequences. It’s only a matter of time and simulation power applied before we randomly hit all of these simple scenarios and many of the complex ones as well.

The whole point here is that coverage points that are easy to direct (i.e., via feedback) will have already been covered by virtue of random testing. Highly complex behaviors and difficult-to-reach corner cases require a significant degree of architectural knowledge, and they are too difficult and too architecturally dependent to be covered effectively by a piece of software. If an effective feedback loop could be built with good logic or programming skills alone, we would have already built it.

“At a recent ASIC verification panel discussion at DesignCon, a question was asked about intelligent testbenches — something promised for a long time but not really delivered by the EDA companies. One of the panel members from a design company responded, and said that if you ever tell his engineers that his testbenches are not intelligent then he would be very upset. I am sorry but I have to break the news to him. Testbenches are dumb!” – Brian Bailey [2]

The Promise of Eliminating Redundant Testing

Eliminating “redundant testing” via software is a dangerous thing to do. Suppose that we have two similar sequences of 20 instructions, but the second test has one instruction that is different. Are those tests redundant? They seem like it, and they have a lot in common. But depending on what those instructions are, what registers they use, what the pipeline looks like and whether they took exceptions, that one instruction can be the one that makes the difference. So it’s dangerous for a piece of software to make the supposition that this test is redundant. It could very well be that this next instruction could be the one that reveals an error.
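The danger can be made concrete with a naive similarity filter. The sketch below is an assumption about how such a pruning rule might look, not a description of any real product: the instruction text, the 0.9 threshold, and the `similarity` helper are all hypothetical.

```python
from difflib import SequenceMatcher


def similarity(test_a, test_b):
    """Ratio in [0, 1] of matching instructions between two sequences."""
    return SequenceMatcher(None, test_a, test_b).ratio()


base = [f"add r{i}, r{i}, r1" for i in range(2, 22)]  # 20 instructions
variant = list(base)
variant[10] = "mtspr ctrl, r12"  # hypothetical illegal register access

score = similarity(base, variant)
looks_redundant = score >= 0.9   # hypothetical pruning threshold
print(f"{score:.2f}", looks_redundant)
```

The two tests score 0.95 and would be pruned as duplicates, even though the one differing instruction may be exactly the one that exposes a bug.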

When I was working at a major processor company several years ago, we found a case in silicon where the processor would hang for seemingly no reason. What we found was that an illegal access to a register was causing the error approximately 1000 instructions before the hang would occur. This taught me that sometimes even the designer is not aware of the conditions that can lead to a bug. Designers may know their own block, but the interactions between the blocks can be very complex, and oftentimes this can be confusing even for experienced engineers. So I think that it’s really dangerous to assume that you can get rid of redundant tests in this manner.

But this is not to say that ineffective tests should be simulated indefinitely. Test templates should remain in the suite only as long as they continue to uncover errors. When a template no longer seems to be finding bugs, it should be archived and replaced by another. But having a piece of software make that decision for you is not a good thing.

Using RAVEN to Generate Self-Checking Tests in Post Silicon Validation

Posted on October 2nd, 2009 by admin

By Tim Short and Melanie Typaldos, Obsidian Software

Sure, self-checking code can be used with directed tests. But it’s time consuming, cumbersome and there’s no randomization. RAVEN has several features that make it work really well in a silicon validation environment for creating self-checking tests.

Inherent Self-Checking

Let’s take them one by one. If you leave RAVEN undirected, which is usually what you want to do, then the generated code is intermixed with lots of jumps that depend on previously generated values. Since tests generated by RAVEN can go anywhere and do anything, it’s not uncommon for complex tests to end with a jump off of some value, leading to another instruction sequence. So RAVEN tests are inherently self-checking.

Inside of a random test, you’ll usually have lots of jumps. If any of the calculations leading to those jumps change, then you’ll jump off into an area that you don’t want to be fairly quickly because some calculated value went wrong. Depending on your architecture, what some companies will do is to preload the memory area so that undefined instructions will cause traps resulting in a fail. Now you have to go into your silicon and figure out how you got to this point, but at least you know that you have a failure. Directed tests won’t have the same results as RAVEN because the engineers writing them won’t jump off of their results into strange places. It’s too hard for humans to use the results of calculations as jump targets, but for RAVEN it’s actually quite easy.
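A toy model shows why a computed jump acts as a check. Nothing here reflects a real architecture: the 64-entry memory, the `trap` marker standing in for an undefined-instruction encoding, and the `run` function are assumptions for illustration.

```python
MEM_SIZE = 64
TRAP = "trap"  # stands in for an undefined-instruction encoding


def run(computed_value):
    """Jump to an address derived from a computed value; report where we land."""
    memory = [TRAP] * MEM_SIZE          # preload the whole area with traps
    memory[42] = "pass"                 # the only valid landing point
    target = computed_value % MEM_SIZE  # the jump target depends on the result
    return memory[target]


print(run(6 * 7))      # correct result: lands on the valid code
print(run(6 * 7 + 1))  # result off by one: lands in the trap area
```

A single wrong bit in the calculation sends execution into the preloaded trap area, so the failure is flagged without any explicit compare instruction.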

Configurable Self-Checking Features

The second thing that users can do is to turn on RAVEN’s self-checking feature, which includes a number of options for how you want to do self-checking. This feature tells RAVEN to insert self-checking code, much as a programmer writing a directed test would do to validate something, like a series of registers. Since RAVEN knows when registers are updated, we can tell it to check all registers to make sure that their information is valid. Alternatively, we can check only the registers that were written or read. We can do this check after a preset number of instructions, at the end of the sequence, or it can be randomized to occur between a certain number of instructions.
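The placement options described above can be sketched as a small post-processing step. This is not RAVEN's interface: the `insert_checks` function, its mode names, and the `check` pseudo-instruction are hypothetical, and the sketch only covers the "check registers that were written" variant.

```python
import random


def insert_checks(instructions, mode, interval=4, span=(2, 6), seed=0):
    """Insert 'check' lines into a list of (dest_register, text) instructions.

    mode is "end", "interval" (every `interval` instructions) or
    "random" (gap drawn uniformly from `span` after each check).
    """
    rng = random.Random(seed)

    def next_gap():
        return interval if mode == "interval" else rng.randint(*span)

    out, written, since, gap = [], set(), 0, next_gap()
    for dest, text in instructions:
        out.append(text)
        written.add(dest)
        since += 1
        if mode != "end" and since == gap:
            # Check only the registers written since the previous check.
            out.append("check " + ",".join(sorted(written)))
            written, since, gap = set(), 0, next_gap()
    if written:  # verify whatever is still unchecked at the end of the sequence
        out.append("check " + ",".join(sorted(written)))
    return out


program = [(f"r{i % 3}", f"add r{i % 3}, r4, r5") for i in range(8)]
for line in insert_checks(program, "interval"):
    print(line)
```

Switching `mode` to `"end"` or `"random"` moves the checks without touching the generated instructions themselves.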

Adapting Self-Checking Tests to Fit the Hardware Environment

For many customers, the actual test or hardware that they are operating in is embedded in an SoC with some test mode that allows them to bring signals out. But their environment is very different from the environment of their simulation world. In simulation, they may have a large amount of memory to use for testing. In this new environment though, they may want to restrict the tests to use only the on-board device memory. This might be only a very small amount, let’s say somewhere between 256K and 2MB.

Because RAVEN has configuration files that describe the environment that the chip is in, you can move tests originally written for a 4GB address space into 1MB of memory. RAVEN can then re-generate all of the tasks from your templates, forcing them into that much more constrained memory area. Now you can take your whole suite, and probably with some exceptions, regenerate all of your tests toward your real hardware platform and even re-simulate them again in their original environment with slight modifications to mimic what the hardware will look like. If all of these tests can be successfully run at full speed, then there is a high degree of confidence that the model is accurately reflected in the design and that there won’t be hidden problems in the silicon.
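The effect of swapping the environment description can be sketched as follows. The dictionary format, the base address of the on-board memory, and the `generate_loads` function are hypothetical and are not RAVEN's actual configuration syntax.

```python
import random

# Hypothetical environment descriptions; only the sizes come from the text.
SIM_CONFIG = {"base": 0x0000_0000, "size": 4 * 1024**3}      # 4GB in simulation
SILICON_CONFIG = {"base": 0x2000_0000, "size": 1 * 1024**2}  # 1MB on the device


def generate_loads(config, count, seed):
    """Generate word-aligned load addresses inside the configured region."""
    rng = random.Random(seed)
    words = config["size"] // 4
    return [config["base"] + 4 * rng.randrange(words) for _ in range(count)]


sim_addrs = generate_loads(SIM_CONFIG, 5, seed=7)
hw_addrs = generate_loads(SILICON_CONFIG, 5, seed=7)
print([hex(a) for a in hw_addrs])
```

The generation logic stays the same; only the configuration changes, which is what makes regenerating a whole suite for the hardware platform practical.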

Ability to Verify RTL and Instruction Set Simulator Agreement

Another, more comprehensive method of self-checking in the RTL environment is the intermediate state information provided by RAVEN. Our test output files contain information about all updates to registers and memory that occur as a result of instruction execution. The testbench can be instrumented so that there are checkers that watch registers and memory to make sure that they progress through values predicted by simulation. This allows the testbench to detect the discrepancy in the exact instruction that caused it or within just a few cycles of that instruction, greatly reducing the time required for a verification engineer to isolate the problem.
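A checker of this kind boils down to comparing two update streams and stopping at the first mismatch. In the sketch below, the record layout `(instruction_index, resource, value)` and the function `first_divergence` are assumptions; an actual test output file and testbench monitor will look different.

```python
from itertools import zip_longest


def first_divergence(predicted, observed):
    """Return (position, predicted_update, observed_update) for the first
    mismatch between the two update streams, or None if they agree."""
    for position, (want, got) in enumerate(zip_longest(predicted, observed)):
        if want != got:
            return position, want, got
    return None


# Each update is (instruction_index, resource, value).
predicted = [(0, "r1", 5), (1, "r2", 9), (2, "mem[0x100]", 9)]
observed = [(0, "r1", 5), (1, "r2", 8), (2, "mem[0x100]", 8)]
print(first_divergence(predicted, observed))
```

The mismatch is reported at instruction 1, where `r2` first goes wrong, rather than at the later store that merely propagates the bad value.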

The Evolution of Processor Test Generation Technology

Posted on February 12th, 2009 by admin

Eric Hennenhoefer and Melanie Typaldos

Abstract

Random test generation (RTG) technology is the backbone of modern processor functional verification strategies. These programs create pseudo-random assembly level tests based on some level of user preferences. The resulting tests are used throughout the verification process from early RTL bring-up to the final steps of massive regression and sometimes even in post silicon hardware testing.

This paper provides an overview of the evolution of RTG across multiple generations of test generation technology including table-based generators, static test generators, dynamic test generators, knowledge based generators, and commercial grade knowledge based test generation systems. Also outlined are several usage models to help determine the right technology or combination of technologies for a given project based on the complexity of the verification challenge and life expectancy of the processor architecture.

Introduction

Random test generators are the backbone of modern functional verification methodologies for processors. Recent papers claim to have implemented random test generators in as little as a few weeks [1], whereas experts in the field spend millions [2] and employ large research groups with the charter to maintain and enhance RTG technology. In reality, both approaches produce test generators, but the level of robustness and the types of end users vary radically between these cases.

This paper provides an overview of various test generation technologies available for creating pseudo random tests for the functional verification of processors. Each technology or method will be analyzed based on creation costs and the ability of the test generator to meet the needs of modern processor development teams. The paper also presents a methodology for determining the optimal technology for a new processor based on the complexity of the ISA and the implementation.

Finally an overview of the necessary technology and process needed to deploy Obsidian Software’s commercial test generator, RAVEN®, will be reviewed.

Motivation

The goal of verification teams is to achieve bug free first silicon on schedule. The cost of failure is extremely high due to the high costs of mask sets and the loss of market window implied in a missed schedule.

A typical processor pre-silicon functional verification process involves running a large number of assembly level tests in RTL simulation. The more random tests that are run in RTL pre-silicon, the greater the chance the DV team has of finding all the bugs. The concept of massive regression combined with automated results checking and coverage results is the foundation of modern processor verification.

There are two primary problem spaces to which RTG can be applied. The first is enabling the massive regression system and the second is providing a way of building on existing knowledge to increase the productivity of the stimulus creation process.

Taxonomy of Test Generators

There is a vast landscape of test generators used in the industry today. These range from simple scripts and parameterized macros that can be created in a matter of weeks to full featured systems used by cutting edge processor verification teams. The following sections will provide an overview of the major types of generators. This outline will be presented in historical ordering. Table 1 provides a summary of existing RTG technologies.

Table 1: Overview of Verification Methods

Technology | Description
Directed ASM tests | DV engineers write tests by hand.
Table-based Generators | Simple tables of macros, instructions, and operands mixed with random parameters.
Static Generators | Tables are combined with complex procedural code.
Dynamic Generators | Uses the state of the ISA to create more robust tests.
Knowledge-based Generators | Dynamic test generators that allow testing knowledge capture for future projects. Multiple solvers are used to handle complex constraints and pipeline allocation. Typically GUI front end if usage extends beyond author.
Commercial Test Generators | Complex, comprehensive test generators available from 3rd parties.

In many cases, a processor design team will select a simple test generator for the first project and gradually evolve it into a more advanced form. This evolution stems from several causes. The verification effort may be underestimated or minimized during the justification phase of a new product. During development of later revisions, the verification phase estimate can be more realistic since it is based on knowledge gained in earlier revisions. Earlier designs tend to be simpler in structure with later designs adding more features and more complexity. Simple designs can often be verified using less sophisticated technology while verification of later designs may prove to be too complex for the simple approach. Products that go through several revisions and enhancements are likely to be those that have proven successful in the market and these tend to have better funding for both design and verification.

Directed ASM Tests

Description

Hand crafted assembly language tests.

Pros

  • Easy to construct
  • Simple to debug
  • Requires DV engineers to learn the ISA

Cons

  • Very time consuming
  • Requires all DV engineers to learn the ISA and the software development environment
  • Can be difficult to coordinate test creation to ensure coverage, especially of corner cases.
  • Engineers’ knowledge of the design can bias test creation.

Usage

  • New ISA features
  • Early bring up
  • DV engineer training
  • Targets known holes in test generation

Enables massive test generation: No
Enables knowledge capture: Very low
Coverage productivity gain: None

Table-based Generators

Description

A simple test generator constructed from data tables. These generators have little to no logic in them; everything is in the tables or macros. Examples include the UMA DGL macro generator and the Specman CPU examples.

Pros

  • Can be developed in about a month
  • Can achieve high encoding coverage of very regular data path instructions
  • Tables and macros are easy to understand without requiring a full knowledge of the ISA

Cons

  • Quickly breaks down in complex ISA areas
  • Macros have low randomness
  • Large numbers of macros may be required to hit interesting corner cases.

Usage

  • Table-based generators are a temporary solution until more robust generators are available
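A table-based generator in its simplest form can be sketched in a few lines, which is also why it can be built so quickly. The three-opcode ISA, the register list, and the `generate` function below are hypothetical.

```python
import random

# Everything the generator "knows" lives in these tables (hypothetical ISA).
OPCODES = {"add": 3, "sub": 3, "mov": 2}  # mnemonic -> number of operands
REGISTERS = ["r0", "r1", "r2", "r3"]


def generate(count, seed=0):
    """Pick opcodes and operands at random from the tables; no other logic."""
    rng = random.Random(seed)
    lines = []
    for _ in range(count):
        op = rng.choice(sorted(OPCODES))
        operands = [rng.choice(REGISTERS) for _ in range(OPCODES[op])]
        lines.append(f"{op} " + ", ".join(operands))
    return lines


for line in generate(5):
    print(line)
```

This works for regular data-path instructions, but the tables have no way to express branches with valid targets, memory accesses with legal addresses, or anything else that depends on machine state.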

Static Generators

Description

Static generators are similar to table-based generators, with the difference being that the majority of the instruction, operand, and data selection is now in complex procedural code.

Pros

  • Increased randomness over table-based generators
  • Better support for control flow instructions

Cons

  • Lacks state information
  • Insertion of helper instructions decreases randomness
  • Macros are still required to reach difficult cases

Usage

  • Medium complexity projects along with directed tests

Enables massive test generation: Moderate, scales better on simple ISAs
Enables knowledge capture: Low, source code modifications required
Coverage productivity gain: Moderate

Dynamic Generators

Description

Test generator that uses the current state of the processor.

Pros

  • Thorough test generation for complex ISAs
  • Creates dense, interesting tests

Cons

  • Slower test generation
  • Significant engineer investment to create

Usage

  • Medium to high complexity designs.
  • Some directed tests still required.

Enables massive test generation: Moderate to high
Enables knowledge capture: Moderate
Coverage productivity gain: Moderate
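The difference that state awareness makes can be sketched with a single load instruction. The register model, the address range, and the `generate_load` function are assumptions chosen for illustration and do not describe any particular generator.

```python
import random

VALID = range(0x1000, 0x2000, 4)  # hypothetical mapped, word-aligned addresses


def generate_load(state, rng):
    """Emit a load, reusing a register that already holds a valid address."""
    usable = sorted(reg for reg, value in state.items() if value in VALID)
    if usable:
        base = rng.choice(usable)
        return [f"load r9, ({base})"]  # no set-up code needed
    # Fallback, which is what a static generator must always do:
    # insert a helper instruction that forces a known value.
    state["r1"] = rng.choice(VALID)
    return [f"li r1, {state['r1']:#x}", "load r9, (r1)"]


rng = random.Random(0)
tracked = {"r1": 0, "r2": 0x1004}  # r2 happens to hold a valid address
print(generate_load(tracked, rng))
untracked = {"r1": 3}              # nothing usable: helper instruction required
print(generate_load(untracked, rng))
```

Because the generator tracks register contents, it can use whatever earlier random instructions left behind, which avoids the helper instructions that reduce randomness in static generators.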

Knowledge-based Generators

Description

Dynamic test generators combined with advanced constraint and pipeline solvers. These generators separate generic testing knowledge from ISA-specific information. The ISA-specific portion of the test generator can be enhanced or swapped for use in future projects.

Pros

  • Generic, reusable, constraint solvers
  • Scalable testing knowledge
  • Pipeline resource solvers

Cons

  • Development costs are high, comparable to a compiler development

Usage

  • Knowledge-based generators are capable of providing high-quality verification tests for all designs.
  • Usage is often decreased if the tool lacks documentation, user interfaces, and an EDA support team.

Enables massive test generation: High
Enables knowledge capture: High
Coverage productivity gain: High

Commercial Test Generators

Description

Complex, comprehensive generators that abstract as much testing knowledge as possible into reusable cores. ISA-specific information is added on top of a proven base.

Pros

  • The reusable core of the generator has been used on multiple projects and is fully debugged
  • A full-featured test generator is available early since only the ISA specific layer needs to be added
  • The vendor provides a tool development team that is experienced with multiple architectures and verification strategies as well as with the development of the tool itself
  • A full-featured, user friendly GUI usually provided
  • User manuals explaining the complete functionality of the tool are provided
  • Vendor supplies support personnel for learning and debugging
  • Improvements made to the core for other architectures or customers are included in updates

Cons

  • In a large company, the internal tools development group may resent the use of an outside vendor
  • Knowledge of the ISA must be made available to the tool developer
  • Acceptance and deployment to multiple groups may be an issue.

Usage

Commercial test generators are full-featured generators that provide tests for designs of all levels. The verification phase of the design is expedited since the tool is available early in development.

Enables massive test generation: High
Enables knowledge capture: High
Coverage productivity gain: High

Determining the Best Technology for a Project

The choice of the correct verification tool for a new design project will depend on many factors, among which are:

  • The level of experience of the design and verification teams
  • The complexity of the ISA or processor
  • The project schedule
  • The tools currently in use within the company
  • The level of staffing for verification, design and tool development
  • Funding available for verification

Several areas of microprocessor design are increasing the complexity of the average design project. Microprocessors themselves have become increasingly complex, incorporating advanced memory management units, floating point, multimedia instructions, SIMD, VLIW and multi-tasking and multi-processing. In addition, processors are tending toward becoming microcontrollers or SoCs with the inclusion of DSP, DMA and other functions previously relegated to board components.

The more complex the design, the more complex and comprehensive the tool required for verification. Even on simpler designs, it may be advantageous to select a more thorough verification approach to avoid having to revamp or recreate a verification environment for potentially more complex follow-on projects.

Table 1 provides an overview of the functions provided by each type of generator: directed ASM, table-based, static, dynamic, knowledge-based and commercial. Comparing the required testing parameters against this table may help determine the correct tool.

The RAVEN® Generator

For many projects, the complexity of the design and the need for fast verification preclude the development of an RTG from scratch. Design teams may attempt to improve an existing test generator, moving it from one of the simpler types to a more sophisticated knowledge-based generator. However, this is a major effort that could draw essential resources from the design team or require skills that are not available. This is especially true since the development of such an advanced tool requires experienced engineers who have knowledge of processor architecture, verification and high level software development. In addition, the end product of such an effort is likely to be difficult to use with little documentation.

At this point, a commercial RTG becomes attractive. The RAVEN® generator developed by Obsidian Software is often a better solution than that described above. RAVEN® is a full-featured RTG composed of three main elements: 1) the core, 2) the ISA layer and 3) the GUI.

The RAVEN® core includes constraint solvers, a simulator interface, multi-processor support, and test output functions. The ISA layer includes information specific to the particular architecture under test. This layer is modified to support new ISAs. The GUI provides an easy-to-use interface to all of the configuration features of the generator. The generator itself can be run from the GUI or from the command line. The features provided by RAVEN® are outlined under the Commercial column in Table 2.

Table 2: Feature Comparison

Feature  Table  Static  Dynamic  Knowledge  Commercial

Random Instructions X X X X X
Random Data X X X X X
Parameterized Macros X X X X X X
Support for complex ISA X X X X
Control flow generation X X X X
Macros X X X X
Loops with low randomness X X X X
Link with ISS X X X
State set up code not needed; enhances randomness X X X
State aware generation X X X
Integrated self-checking code X X X
Loops with moderate randomness X X X
Support for dynamic resource scheduling X X
Constraint Engine X X
Architectural pipeline solver X X
Complex Interrupt & exception support X X
Full Paging X X
Cache controls X X
Biases / API exposed to the user X X
Reusable generation core X
Separate architecture layer for each ISA X
Loops with full randomness X
Multiprocessor / Multithreaded X
Detailed instruction testing knowledge X
User accessible settings to control test intent X
User accessible parameters for instruction and operand selection X
Iterators to support common testing methods; every instruction followed by every other instruction X
GUI X
Manual X
Tutorials X
Generator development system including regression X
Commercial support X

RAVEN® can generate significant tests for a new platform within a few weeks. Advanced features such as interrupt control and task switching may require more time to implement, depending on the similarity of the new features to those already supported in RAVEN®. However, even while this development is proceeding, the full power of the core functions is available. In addition, the GUI is available and generally complete from initial deployment, although some user-configuration parameters may be added as architectural support becomes more complete. For example, controls for the generation of task switches or exceptions may be modified to reflect specific architectures.

RAVEN® currently supports multiple architectures such as ARM, MIPS, PowerPC, and proprietary DSPs. Additionally, Obsidian can port RAVEN® to fit almost any proprietary architecture. Porting RAVEN® typically requires between 3 and 9 months, depending on ISA complexity and similarities to existing architectures.

Once the initial version of RAVEN® has been deployed, verification engineers can immediately begin generating tests. The help provided in the tool and the manual are usually sufficient for engineers to understand how to use the tool. Obsidian also offers training classes and on-site support.

Conclusions

Random test generators have a long development history, most of it confined within the individual companies where the tool was to be used. As designs become more complex and design cycles shorter, verification has become a bottleneck in product development, typically taking 50-70% of the total product development time. Internal development of tools often saps resources from other areas of development. Commercial RTGs have recently become available and can be quickly ported to new architectures. These provide a basis of fully tested core functions upon which to build the architecture specific support.

References

[1] Glassner, C. “Random Test Generation for a RISC Processor Core” Verifyer, June 2001

[2] Aharon, A., D. Goodman, M. Levinger, Y. Lichtenstein, Y. Malka, C. Metzger, M. Molcho, G. Shurek “Test Program Generation for Functional Verification of PowerPC Processors in IBM,” Proceedings of the 32nd ACM/IEEE Design Automation Conference, 1995