Posts Tagged ‘random verification’

Intelligent Testbench vs. Random Test Generator

Posted on March 11th, 2010 by admin

By Melanie Typaldos

The idea of an intelligent testbench has long been of interest in functional processor verification, although it has always seemed to fall short of expectations when it comes down to just what degree of “intelligence” is really involved. Throughout this document, we will present the argument that a sophisticated, well-evolved, dynamic random test generator, when used as part of a complete verification plan, can be of more value than marketing-driven intelligent verification products.

Defining Intelligent Testbench

An accurate definition of “intelligent testbench” is difficult to find, so let’s begin with that offered by Wikipedia. The intelligent testbench is described as something that “uses information derived from the design and existing test description to automatically update the test description to target design functionality not verified, or covered by the existing tests”[1]. This implies that a feedback loop exists which is capable of creating new test sequences based on what has and has not yet been tested. Other than this closed loop system, the concept is very similar to that of a random test generator.

Not all Test Generators are Created Equally

I remember at one time, we were speaking with a potential client who said something along the lines of “I don’t know why you people want so much money for this RAVEN thing. I can just get a co-op to write one in a week.” He wasn’t wrong in his assumption that a relatively unskilled engineer could conceivably write a generator in a short amount of time. However, the old adage remains that you get what you pay for. A generator of this sort would be incapable of effectively verifying a design of any complexity whatsoever. This is analogous to running every instruction followed by every other instruction and calling verification complete. There are no standard methodologies for constructing test generators, and each one will have different methods for achieving randomness, different capabilities in pipeline exploration, varying abilities in multi-processor testing, etc.

Functional Coverage

Many intelligent testbench products claim to automatically create test sequences based on pre-designated coverage points. However, the belief that hitting every coverage point means that your design is verified is a big mistake. By the very fact that coverage points are singular points in a vast space, they cannot cover the entire design. Engineers can work hundreds of hours writing more coverage points, but it will never be enough to completely verify the design. Because of this, our test generator uses templates (created by engineers) that automatically create sequences to hit not only the coverage point, but also other behaviors around that point.

Figure 1. Abbreviated Flow of Randomness

Random Stimulus Compared to Feedback Looping

RAVEN is very good at what it does; it is designed to hit both simple and complex behaviors randomly with little direction from human users. For instance, if we’re looking at instruction A followed by instruction B with operands X, Y and Z, then we’re going to hit that randomly with ease. Constrained-random templates can replace 95-98% of all directed test sequences. It’s only a matter of time and simulation power applied before we randomly hit all of these simple scenarios and many of the complex ones as well.

The whole point here is that coverage points that are easy to direct (i.e. via feedback), will have already been covered by virtue of random testing. Highly complex behaviors and difficult-to-reach corner cases require a significant degree of architectural knowledge, and they are too difficult and too architecturally dependent to effectively be covered by a piece of software. If an effective feedback loop could be done with good logic or programming skills, we would have already done it.

“At a recent ASIC verification panel discussion at DesignCon, a question was asked about intelligent testbenches — something promised for a long time but not really delivered by the EDA companies. One of the panel members from a design company responded, and said that if you ever tell his engineers that his testbenches are not intelligent then he would be very upset. I am sorry but I have to break the news to him. Testbenches are dumb!” – Brian Bailey [2]

The Promise of Eliminating Redundant Testing

Eliminating “redundant testing” via software is a dangerous thing to do. Suppose that we have two similar sequences of 20 instructions, but the second test has one instruction that is different. Are those tests redundant? They seem like it, and they have a lot in common. But depending on what those instructions are, what registers they use, what the pipeline looks like and whether they took exceptions, that one instruction can be the one that makes the difference. So it’s dangerous for a piece of software to make the supposition that this test is redundant. It could very well be that this next instruction could be the one that reveals an error.

When I was working at a major processor company several years ago, we found a case in silicon where the processor would hang for seemingly no reason. What we found was that an illegal access to a register was causing the error approximately 1000 instructions before the hang would occur. This taught me that sometimes even the designer is not aware of the conditions that can lead to a bug. Designers may know their own block, but the interactions between the blocks can be very complex, and oftentimes this can be confusing even for experienced engineers. So I think that it’s really dangerous to assume that you can get rid of redundant tests in this manner.

But this is not to say that ineffective tests should be continually simulated. Test templates should remain in the suite only as long as they continue to uncover errors. When it no longer seems like it’s finding bugs, that template should be archived and replaced by another. But having a piece of software decide that for you is not a good thing.

Multicore Verification of an ARM Processor

Posted on November 10th, 2009 by admin

The goal of multi-processor verification, simply put, is to find errors that emerge when individual CPUs interact with each other. This typically involves behaviors encountered when sharing various resources such as caches, registers and main memory. Multi-processor (MP) errors are among the most difficult to debug in all of functional verification, especially given the complexity modern MP systems. This document serves to outline Obsidian’s random testing methodology for verifying caches. Although written specifically for the ARM architecture, the underlying methodology of this paper may be applied generally to encompass other designs as well.

Don’t Begin Multi-Processor Testing Before You’re Ready

One of the core principles of Obsidian’s verification methodology is to catch every possible bug in the simplest possible configuration. That stands to reason that the single processor configuration needs to be thoroughly verified with a high degree of confidence before MP verification begins. The reason for this is simple – if the same bugs can be found in a simpler processor configuration, much less time will go to waste. Multiplied hundreds of times over, this savings in time can result in a time-to-market improvement of weeks or even months.

Beginning Multi-Processor Verification with Non-Sharing Tests

The first stage of MP testing that we implement is non-sharing. The goal at this stage is to find and correct errors that occur with the least amount of sharing as possible to simplify debugging and save time troubleshooting complex scenarios.

In the initial stages of MP verification, it’s best to divide the memory up into large partitions that are each allocated to a single processor core. Some sharing will always exist between the CPUs in spite of restrictions. This is due to the way that multiple CPUs behave upon reset (i.e. beginning the execution stream at the same reset address) and because complex MP systems typically require each CPU to be aware of their own CPU number (i.e. CPU0, CPU1, etc.).

Ideally, sharing should be minimized as much as possible at this stage of verification, although it is impossible to eliminate it altogether. If the architecture permits, it’s also beneficial to start with paging disabled to simplify debugging. This eliminates the possibility of errors where one physical address is mapped to multiple pages by more than one CPU.

With these restrictions imposed, large numbers of random test sequences are generated and debugged as needed. Once errors become extremely sparse in this mode of testing, the memory is divided up into cache lines which are each assigned to allow accesses by only a particular CPU. A more complex environment is then created in which more complex bugs can be found, such as paging errors. As before, copious amounts of random test sequences are generated and more errors will be found.

Some verification teams will completely forgo the large partition phase of false-sharing and start by immediately dividing the memory into cache lines. While this does work, it also tends to complicate debugging and forces the verification team being to prematurely deal with paging issues while debugging simple errors.

False-Sharing Multi-Processor Tests

The false-sharing arrangement is similar to the second stage of non-sharing in that the memory is divided up into cache lines. In this mode, multiple CPUs are allowed to hit the same cache lines, but not the exact same physical addresses. What this does is allow the cache accesses to come very close to each other without hitting the same spots. Several cache problems can be found and corrected using this method.

Debugging caches in this way greatly simplifies the process of testing actual shared memory and better limits the occurrence of simple errors in complex scenarios. At every stage of the process, it is important to perform cacheable and non-cacheable testing as there may be bugs lurking in both of these areas.

True-Sharing Multi-Processor Tests

True-sharing tests are more complex as they allow multiple CPUs to access the same physical memory addresses. In both deterministic and in non-deterministic modes, tests generated by RAVEN are guaranteed to execute correctly regardless of timing issues including fast or slow memory or insertion of external interrupts.

Deterministic True-Sharing

As a simplified example of deterministic true-sharing, let us assume that we are verifying a dual CPU configuration. We begin by dividing up the execution stream using semaphores. Each divided zone may contain different numbers of instructions for each CPU with varying amounts of memory accesses, exceptions, etc. Because of this, some CPUs may complete their zones more quickly than others. During the same semaphore, the CPUs are unaware of values written by the other. Not until the semaphore has been completed will the CPUs sync up again and be aware of the other’s transactions. Once this happens, CPU0 can determine that it is legal to write to a line that CPU1 completed in the last semaphore. Upon completion of all zones, final values are compared, checking for consistency and errors. Although our rules are much more complicated than this, the example should elaborate on the kind of errors that can occur in MP verification. Because our tool implements complex rules, the test generated can determine if memory location will be deterministic or non-deterministic based on execution divided by semaphores. This is how we do deterministic sharing and actual reads and writes to the same physical memory addresses. This can find a lot of bugs.

Non-Deterministic True-Sharing

Non-deterministic true-sharing is the most complex mode of sharing and should always be conducted as the final stage of MP verification. In this method, we don’t use semaphores to divide up the execution stream, but instead allow CPUs to read addresses that are being written by other CPUs. If two CPUs write to the same address, then a read of this address will result in an unknown value. If this value was intended for use in address generation or arithmetic calculation, then the result of these tasks will be unknown, and verification becomes much more difficult. Non-deterministic registers then cannot be used as sources, but only destinations until they take on deterministic values.

If non-deterministic transactions set flags, then the flag register must immediately be re-written. This is especially important in ARM processors where every instruction is conditionally executed. If something is done that results in a flag, the next instruction must be set to always execute regardless of flag values and perform the additional step of re-writing the flags. We have settings and rules that determine what percentage of registers are allowed to become non-deterministic based on their type (i.e FPR, GPR, etc.). So we do have some pretty complicated algorithms for handling determinism and non-determinism and determining legality of actions.

Non-deterministic tests become much more difficult to debug because you will have a list of possible values for each access and the errors can be timing dependent. These areas are important to cover, but they should be done last as the difficulty in debugging errors of this type is extremely high.

Verifying the Snooping Mechanism

One of the most critical areas in MP verification is cache coherence. Errors of this type can be detrimental to practical operation of the design. For example, it is possible to have errors in which data integrity is compromised due to multiple CPUs accessing shared memory.

To further illustrate an error of this type, let us assume that we are verifying two CPUs that both share a single L1 cache. At some point the L1 cache will have an image of data that resides in the main memory. If both CPUs write to the same cache line of the L1, then it is necessary to verify that the correct values get written back to main memory.

Adding an L2 memory to a situation such as this makes matters even more complicated. In this case, cache lines in the L1 will be an image taken the L2, which itself contains portions of data in the main memory. However, there is now the additional problem of L1 data being written back to main memory without crossing the L2. This creates a corresponding line of L2 cache that must now be invalidated to avoid errors.

Further conflicts can occur if a CPU flushes a cache line in the L1 that another CPU has written to, was going to write to, or is in the process of writing to. Fox example, suppose that we have a four CPU situation with two L1 caches, each being shared by a pair of CPUs and an L2 that is shared by all four CPUs. This is a common paradigm in MP designs. In this situation, it is physically possible for CPUs 0 /1 to write to the same line of memory as CPUs 2/3, creating another situation that has to be resolved. Each cluster of CPUs must then ask the other if for permission to read/write to main memory to avoid conflicts.

Verification of Shared Registers

One of the great things about having a random test generator is that it can effectively deal with obscure cases and propagate them across multiple CPUs. In the case of shared registers, we can divide them up into an arrangement similar to our true sharing example or allow only one CPU to handle all the shared registers. We can also use semaphores to do time/execution division of accesses and do non-deterministic testing, although this depends on the type of register. Although shared registers present some unique challenges, they also allow for some different opportunities to take advantage of.

These are all complicated issues which may be further compounded by timing issues. Errors such as these are what MP verification is really about and where the bulk of the errors can be found.

Verification Planning for ARM Architectures

Posted on October 8th, 2009 by admin

By Melanie Typaldos and Tim Short

The early stages of functional verification present several unique challenges that must be overcome to achieve first silicon on schedule. This article identifies several common problems that Obsidian recently encountered in the verification of a new ARM based processor design and discusses how the RAVEN random test generator was used in overcoming these obstacles.

Eliminating Bottlenecks

Many organization want to begin verification as soon as the first logical unit of RTL becomes available. However, the tool-chain (i.e. assembler, linker, compiler, etc.) often lags behind RTL development and architectural changes, creating a bottleneck in verification. RAVEN allows verification teams to completely bypass this problem and begin identifying functional errors before the tool-chain is implemented. Customers using RAVEN can have reported large productivity gains over directed testing, especially in the beginning and intermediate stages of verification – hitting 95-98% of all required coverage points using random tests.

Verifying Intermediate RTL

Let’s assume that the ALU unit of your design is ready for testing at an early stage, but the MMU, floating-point and load/store units are not. Some teams might begin by testing each instruction in order, varying only one or two operands for the sake of expedience. However, this creates a problem. Following this methodology, only a small portion of the RTL will be tested and the entire design will be exposed to errors that should have been corrected early on. With the same investment in time, RAVEN can produce significantly better results with only slight modifications to the default template.

Start Test
Random Instruction (ALU)………………..min=10     max=20
End Test

RAVEN template files are used to designate which behaviors should be tested. At the most basic level, these templates provide the ability to interchange instruction groups, instruction quantity, and selection of operands. More advanced configurations allow for the biasing of certain exceptions and addressing/operand modes. This ability is important in the sense that it allows additional behaviors to be added incrementally as bugs are discovered and new areas of the RTL become available in emerging designs.

In the case of our ALU example, the Random Instruction control can be directed to use only instructions in the ALU group and can even disable the selection of individual instructions within that group. Available instructions can then be tested with an almost infinite number of parameter configurations. As further instruction groups and processor features become available (e.g. integer divide, MMU, load/store) RAVEN templates can be automatically updated to include these new behaviors. Simply re-generating the same templates then allows for the creation of thousands of new tests.

Planning for Bugs in Your Testing Methodology

As you’re creating your first tests, it’s important to think about how long they’ll take to simulate and debug. Smart design teams take into account where they are in the development cycle and the amount of processing power available for simulation when creating tests. This way tests can be run and completed within a targeted amount of time.

When a generated test hits a bug, it’s also much easier to isolate and identify the error if it is detected in a test with ten instructions rather than one with a thousand instructions. Even if you hit a bug in only one out of every thousand tests of ten instructions, this is still beneficial when compared to the time required to debug large tests.

Once you’re able to generate 10-20K short tests without finding errors, then it’s time to move up to tests of a hundred instructions and eventually on to tests of a thousand instructions. The reasoning here is that the more rare your bugs then become, the more dependent they will be on processor state, and longer tests will be more useful in this respect. Changing the quantity of instructions in a RAVEN template is as easy as adding extra zeros and regenerating.

Once bugs are found, RAVEN can be configured to continue testing around the error by disabling access to problematic behaviors in the RTL. Most internally developed generators don’t have this fine-grained level of control. It’s far easier for RAVEN to hit new behaviors while avoiding others that aren’t yet ready to be tested than for most other EDA tools on the market.

Using RAVEN to Generate Self-Checking Tests in Post Silicon Validation

Posted on October 2nd, 2009 by admin

By Tim Short and Melanie Typaldos, Obsidian Software

Sure, self-checking code can be used with directed tests. But it’s time consuming, cumbersome and there’s no randomization. RAVEN has several features that make it work really well in a silicon validation environment for creating self-checking tests.

Inherent Self-Checking

Taking them one by one. If you’re leaving RAVEN undirected, which is usually what you want to do, then what gets intermixed are lots of jumps that depend on the previously generated values. Since tests generated by RAVEN can go anywhere and do anything, it’s not uncommon for complex tests to end with a jump off of some value, leading to another instruction sequence. So, inherently RAVEN tests are self checking.

Inside of a random test, you’ll usually have lots of jumps. If any of the calculations leading to those jumps change, then you’ll jump off into an area that you don’t want to be fairly quickly because some calculated value went wrong. Depending on your architecture, what some companies will do is to preload the memory area so that undefined instructions will cause traps resulting in a fail. Now you have to go into your silicon and figure out how you got to this point, but at least you know that you have a failure. Directed tests won’t have the same results as RAVEN because the engineers writing them won’t jump off of their results into strange places. It’s too hard for humans to use the results of calculations as jump targets, but for RAVEN it’s actually quite easy.

Configurable Self-Checking Features

The second thing that users can do is to turn on RAVEN’s self-checking feature, which includes a number of options for how you want to do self-checking. This feature tells RAVEN to insert self-checking code, much like a directed programmer would do to validate something, like a series registers. Since RAVEN knows when registers are updated, we can tell it to check all registers to make sure that their information is valid. Alternatively, we can check only the registers that were written or read. We can do this check after a preset number of instructions, at the end of the sequence, or it can be randomized to occur between a certain number of instructions.

Adapting Self-Checking Tests to Fit the Hardware Environment

For many customers, the actual test or hardware that they are operating in is embedded in an SoC with some test mode that allows them to bring signals out. But their environment is very different from the environment of their simulation world. In simulation, they may have a large amount of memory to use for testing. In this new environment though, they may want to restrict the tests to use only the on-board device memory. This might be a only very small amount, let’s say somewhere between 256K and 2MB.

Because RAVEN has configuration files that describe the environment that the chip is in, you can move tests originally written for a 4GB address space into 1MB of memory. RAVEN can then re-generate all of the tasks from your templates, forcing them into that much more constrained memory area. Now you can take your whole suite, and probably with some exceptions, regenerate all of your tests toward your real hardware platform and even re-simulate them again in their original environment with slight modifications to mimic what the hardware will look like. If all of these tests can be successfully run at full speed, then there is a high degree of confidence that the model is accurately reflected in the design and that there won’t be hidden problems in the silicon.

Ability to Verify RTL and Instruction Set Simulator Agreement

Another, more comprehensive method of self-checking in the RTL environment is the intermediate state information provided by RAVEN. Our test output files contain information about all updates to registers and memory that occur as a result of instruction execution. The testbench can be instrumented so that there are checkers that watch registers and memory to make sure that they progress through values predicted by simulation. This allows the testbench to detect the discrepancy in the exact instruction that caused it or within just a few cycles of that instruction, greatly reducing the time required for a verification engineer to isolate the problem.

Uncovering Processor Design Errors Spanning Page Boundaries

Posted on September 18th, 2009 by admin

Functional errors are not always easy to find in processor designs, especially when they require a specific series of events to occur before their presence is revealed. Obsidian’s verification engineers recently discovered an error of this type spanning page boundaries in a multi-core ARM design.

When a load/store instruction crosses a page boundary, it is difficult to create all possible combinations of exceptions for both halves of the instruction. For example, if 2 possible exceptions exist then there will be 16 possible combinations of exceptions for 2 halves of the access. Because of this, it can be very hard to reach these exceptions, even with directed testing.

Obsidian’s RAVEN technology addresses this issue with its random methodology. User configurable biases may be used to direct the generator into areas where data access instructions may cross page-boundaries. Difficult to reach errors such as these can be uncovered with minimal effort and without diverting your greatest resource, the time of your most experienced engineers.

Knowing When Verification is Complete

Posted on March 27th, 2009 by admin

Introduction

This article presents an overview of functional design verification using a coverage driven methodology while attempting to answer the question of how much testing is enough. The part being verified in this case will be a general purpose microprocessor, such as those found in mobile computing devices. Note that an approach of this magnitude is not always required. Designs with very limited instruction sets or highly restricted functionalities may be satisfied by simply writing directed assembly code tests to verify their intended functionalities.

Comparison of Simple and Complex Architectures

Figure 1 depicts a simple architecture as compared to a complex one. Note that the number of corner cases and unpredictability of the verification space increases as the architecture gains complexity. Thus, the complexity of the architecture determines how much testing will need to be accomplished to properly verify the component’s function.

Figure 1. Comparison of Verification Spaces

Measuring Verification Progress

Coverage metrics are the dominant method for measuring verification progress in the industry today. Coverage points are normally designated by the design engineers looking at the logic of their block and by verification or system engineers looking at the functional definition of the part. Both of these are critical insights into the required verification coverage of the design.

Coverage points, indicated by the red dots in Figure 2, are deliberately chosen with respect to placement and density according to design knowledge and risk assessment.

Figure 2. Distribution of Coverage Points

Distribution of Coverage Points

Directed Testing VS Random Methodology

Some organizations will use only directed tests to hit coverage points. Because directed tests are by their very nature highly targeted and relatively inflexible, this results in much of the design not being tested as is shown by the ratio of red to gray in Figure 2. In addition to the low overall coverage that results from this approach, creation of directed tests is time consuming, requiring approxomately 20-30 minutes per test, and requires highly skilled engineers. In this approach, testbench checkers that detect hits to coverage points are often overlooked with the assumptions that the engineers writing the tests know how to hit the required coverage points and that human errors will not a significant problem. As the design changes over the course of RTL development, directed tests may lose track of their targeted coverage points. Without coverage monitors, these types of errors will not be detected and the design will not be as thoroughly verified as it appears to be on paper.

Using a Random Test Generator to Close Coverage

As processor designs became more complex, the need to hit more coverage points became apparent. Once the grid has been established, large numbers of purely random tests may be incorporated to begin closing coverage. Some of these tests may hit points on the coverage grid while others will not.

Figure 3. Intersection of Coverage Grid and Pure Random

Intersection of Coverage Grid and Pure Random

Hitting Corner Cases

Approaching the problem of hitting coverage points from a random test generator viewpoint, a single engineer begins by writing a few generator templates and then generates tests using those templates. The generated tests are then run on a testbench which incorporates coverage monitors. The coverage monitors report all coverage points that are hit by the tests. As long as tests generated from the templates continue to hit new coverage points, the templates are kept in the nightly suite. As the rate of hitting new coverage points declines, new generator templates are created to target coverage holes. This approach requires skilled engineers to write checkers for the testbench but less skilled engineers to run the test generator.

Directed-random templates are created around points not hit by the purely random templates. We now begin to see the coverage grid closing more tightly (around 95%), and the verification process comes closer to completion.

Figure 4. Coverage Grid, Directed Random and Pure Random

Coverage Grid, Directed Random and Pure Random

Hitting Corner Cases

Not all coverage points will be hit by fully random or directed random templates. Some coverage points require a long series of events before the targeted behavior takes place. In this case, there are two possible approaches: write directed tests and write directed templates. Directed tests can get to these most difficult coverage points more quickly but prove only one or a few cases around that point. Directed templates take more time to create but can be written to allow as much random behavior around the coverage point as possible.

Figure 5. Review Templates and Relax Restrictions

Review Templates and Relax Restrictions

Finally, existing tests are reviewed, and as much directed behavior as possible is removed before the tests are run again. Coverage then reaches full closure, and these tests are run until the schedule no longer permits.

Random Test Generator Taxonomy

Posted on February 27th, 2009 by admin

There is a vast landscape of test generators used in the industry today. These range from simple scripts and parameterized macros that can be created in a matter of weeks to full featured systems used by cutting edge processor verification teams.

In many cases, a processor design team will write a simple test generator for the first phase of a project and gradually evolve it into a more advanced form as the architecture matures. This continual evolution of test generator technology stems from several causes:

• Earlier designs tend to be simpler with later revisions adding more features and complexity.

• Later designs may prove complex enough to require a new approach.

• The verification effort may initially be underestimated.

• Estimates become more realistic over time as they become based on knowledge gained in earlier revisions.

• Products that go through several revisions and enhancements are likely to be those that have proven successful in the market and these tend to have better funding for both design and verification.

Table Based Generators

Table based test generators are the simplest possible generators. Creation of such generators can be accomplished relatively quickly, and maintenance requirements are often low. These generators work by capturing ISA knowledge and storing it in a central table for later use. Because of their simplistic nature, table based generators may be used by less skilled personnel to create tests. There is a drawback to these generators however, as their implementation is generally restricted to simple architectures. Usage on more complex designs may result in an inability to reach corner-cases or create complex scenarios. Table based generators may also generate invalid tests at times.

Static Generators

Static generators are similar to table based generators with the exception that the majority of the instruction, operand and data selection reside in complex procedural code. Static generators are capable of producing more random behavior than table based generators, but still have trouble hitting many corner-cases. In addition, the skill level required to create and maintain such a tool rises sharply once this level of sophistication is reached.

Dynamic Generators

Dynamic generators incorporate significant knowledge about the architecture being tested. They enhance the ability of less-skilled users to generate complex tests that can hit hard-to-reach corner cases without stumbling on subtle programming pitfalls. This added knowledge, flexibility and ease-of-use is reflected in a more complex generator and consequently the cost of creating and maintaining the generator are greater than for table-based or static generators.

Comparison of Various Aspects of Random Test Generators

Obsidian Software’s RAVEN is a dynamic random test generator that has been developed and maintained by processor verification experts since 1997. During this time RAVEN has been used to verify dozens of processor implementations by design teams across the globe.

The Evolution of Processor Test Generation Technology

Posted on February 12th, 2009 by admin

Eric Hennenhoefer and Melanie Typaldos
PDF Version

Abstract

Random test generation (RTG) technology is the backbone of modern processor functional verification strategies. These programs create pseudo-random assembly level tests based on some level of user preferences. The resulting tests are used throughout the verification process from early RTL bring-up to the final steps of massive regression and sometimes even in post silicon hardware testing.

This paper provides an overview of the evolution of RTG across multiple generations of test generation technology including table-based generators, static test generators, dynamic test generators, knowledge based generators, and commercial grade knowledge based test generation systems. Also outlined are several usage models to help determine the right technology or combination of technologies for a given project based on the complexity of the verification challenge and life expectancy of the processor architecture.

Introduction

Random test generators are the backbone of modern functional verification methodologies for processors. Recent papers claim to have implemented random test generators in as little as a few weeks [1], whereas, experts in the field spend millions [2] and employ large research groups with the charter to maintain and enhance RTG technology. In reality, both approaches produce test generators but the level of robustness and the types of end users vary radically between these cases.

This paper provides an overview of various test generation technologies available for creating pseudo random tests for the functional verification of processors. Each technology or method will be analyzed based on creation costs and the ability of the test generator to meet the needs of modern processor development teams. The paper also presents a methodology for determining the optimal technology for a new processor based on the complexity of the ISA and the implementation.

Finally an overview of the necessary technology and process needed to deploy Obsidian Software’s commercial test generator, RAVEN®, will be reviewed.

Motivation

The goal of verification teams is to achieve bug free first silicon on schedule. The cost of failure is extremely high due to the high costs of mask sets and the loss of market window implied in a missed schedule.

A typical processor pre-silicon functional verification process involves running a large number of assembly level tests in RTL simulation. The more random tests that are run in RTL pre-silicon, the greater the chance the DV team has of finding all the bugs. The concept of massive regression combined with automated results checking and coverage results is the foundation of modern processor verification.

There are two primary problem spaces to which RTG can be applied. The first is enabling the massive regression system and the second is providing a way of building on existing knowledge to increase the productivity of the stimulus creation process.

Taxonomy of Test Generators

There is a vast landscape of test generators used in the industry today. These range from simple scripts and parameterized macros that can be created in a matter of weeks to full featured systems used by cutting edge processor verification teams. The following sections will provide an overview of the major types of generators. This outline will be presented in historical ordering. Table 1 provides a summary of existing RTG technologies.

Table 1: Overview of Verification Methods

Technology Description
Directed ASM tests DV engineers write tests by hand.
Table-based Generators Simple tables of macros, instructions, and operands mixed with random parameters.
Static Generators Tables are combined with complex procedural code.
Dynamic Generators Uses the state of the ISA to create more robust tests.
Knowledge-based Generators Dynamic test generators that allow testing knowledge capture for future projects. Multiple solvers are used to handle complex constraints and pipeline allocation. Typically GUI front end if usage extends beyond author.
Commercial Test Generators Complex, comprehensive test generators available from 3rd parties

In many cases, a processor design team will select a simple test generator for the first project and gradually evolve it into a more advanced form. This evolution stems from several causes. The verification effort may be underestimated or minimized during the justification phase of a new product. During development of later revisions, the verification phase estimate can be more realistic since it is based on knowledge gained in earlier revisions. Earlier designs tend to be simpler in structure with later designs adding more features and more complexity. Simple designs can often be verified using less sophisticated technology while verification of later designs may prove to be too complex for the simple approach. Products that go through several revisions and enhancements are likely to be those that have proven successful in the market and these tend to have better funding for both design and verification.

Directed ASM Tests

Description

Hand crafted assembly language tests.

Pros

  • Easy to construct
  • Simple to debug
  • Requires DV engineers to learn the ISA

Cons

  • Very time consuming
  • Requires all DV engineers to learn the ISA and the software development environment
  • Can be difficult to coordinate test creation to ensure coverage, especially of corner cases.
  • Engineers’ knowledge of the design can bias test creation.

Usage

  • New ISA features
  • Early bring up
  • DV engineer training
  • Targets known holes in test generation
Enables massive test generation No
Enables knowledge capture Very low
Coverage productivity gain None

Table based Generators

Description

A simple test generator constructed from data tables. These generators have little to no logic in them; everything is in the tables or macros. Examples are UMA DGL macro generator and Specman CPU examples.

Pros

  • Can be developed in about a month
  • Can achieve high encoding coverage of very regular data path instructions
  • Tables and macros are easy to understand without requiring a full knowledge of the ISA

Cons

  • Quickly breaks down in complex ISA areas
  • Macros have low randomness
  • Large numbers of macros may be required to hit interesting corner cases.

Usage

  • Table based generators are a temporary solution until more robust generators are available

Static Generators

Description

Static generators are similar to table based generators with the difference being that the majority of the instruction, operand, and data selection is now in complex procedural code

Pros

  • Increased randomness over table based generators
  • Better support for control flow instructions

Cons

  • Lacks state information
  • Insertion of helper instructions decreases randomness
  • Macros are still required to reach difficult cases

Usage

  • Medium complexity projects along with directed tests
Enables massive test generation Moderate, scales better on simple ISAs
Enables knowledge capture Low, source code modifications required
Coverage productivity gain Moderate

Dynamic Generators

Description

Test generator that uses the current state of the processor.

Pros

  • Thorough test generation for complex ISAs
  • Creates dense, interesting tests

Cons

  • Slower test generation
  • Significant engineer investment to create

Usage

  • Medium to high complexity designs.
  • Some directed tests still required.
Enables massive test generation Moderate to high
Enables knowledge capture Moderate
Coverage productivity gain Moderate

Knowledge-based Generators

Description

Dynamic test generators combined with advanced constraint and pipeline solvers. These generators separate generic testing knowledge from ISA specific information. The ISA specific fraction of the test generator can be enhanced or swapped for use in future projects

Pros

  • Generic, reusable, constraint solvers
  • Scalable testing knowledge
  • Pipeline resource solvers

Cons

  • Development costs are high, comparable to a compiler development

Usage

  • Knowledge-based generators are capable of providing high-quality verification tests for all designs.
  • Usage is often decreased if the tool lacks documentation, user interfaces, and an EDA support team.
Enables massive test generation High
Enables knowledge capture High
Coverage productivity gain High

Commercial Test Generators

Description

Complex, comprehensive generators that abstract as much testing knowledge as possible into reusable cores. ISA specific information is added on top of a proven base

Pros

  • The reusable core of the generator has been used on multiple projects and is fully debugged
  • A full-featured test generator is available early since only the ISA specific layer needs to be added
  • The vendor provides a tool development team that is experienced with multiple architectures and verification strategies as well as with the development of the tool itself
  • A full-featured, user friendly GUI usually provided
  • User manuals explaining the complete functionality of the tool are provided
  • Vendor supplies support personnel for learning and debugging
  • Improvements made to the core for other architectures or customers are included in updates

Cons

  • In a large company, the internal tools development group may resent the use of an outside vendor
  • Knowledge of the ISA must be made available to the tool developer
  • Acceptance and deployment to multiple groups may be an issue.

Usage

Commercial test generators are full featured generators that provide tests for designs of all levels. The verification phase of the design is expedited since the tool is available early in development

Enables massive test generation High
Enables knowledge capture High
Coverage productivity gain High

Determining the Best Technology for a Project

The choice of the correct verification tool for a new design project will depend on many factors, among which are:

  • The level of experience of the design and verification teams
  • The complexity of the ISA or processor
  • The project schedule
  • The tools currently in use within the company
  • The level of staffing for verification, design and tool development
  • Funding available for verification

Several areas of microprocessor design are increasing the complexity of the average design project. Microprocessors themselves have become increasingly complex, incorporating advanced memory management units, floating point, multimedia instructions, SIMD, VLIW and multi-tasking and multi-processing. In addition, processors are tending toward becoming microcontrollers or SoCs with the inclusion of DSP, DMA and other functions previously relegated to board components.

The more complex the design, the more complex and comprehensive tool is required for verification. Even on simpler designs, it may be advantageous to select a more thorough verification approach to avoid having to revamp or recreate a verification environment for potentially more complex follow-on projects.

Table 1 provides an overview of the functions provided by each type of generator, directed ASM, table-based, static, dynamic, knowledge-based and commercial. A comparison of the required testing parameters may be used along with this table may help determine the correct tool.

The RAVEN® Generator

For many projects, the complexity of the design and the need for fast verification preclude the development of an RTG from scratch. Design teams may attempt to improve an existing test generator, moving it from one of the simpler types to a more sophisticated knowledge-based generator. However, this is a major effort that could draw essential resources from the design team or require skills that are not available. This is especially true since the development of such an advanced tool requires experienced engineers who have knowledge of processor architecture, verification and high level software development. In addition, the end product of such an effort is likely to be difficult to use with little documentation.

At this point, a commercial RTG becomes attractive. The RAVEN® generator developed by Obsidian Software is often a better solution than that described above. RAVEN® is a full-featured RTG composed of three main elements: 1) the core, 2) the ISA layer and 3) the GUI.

The RAVEN® core includes constraint solvers, a simulator interface, multi-processor support, and test output functions. The ISA layer includes information specific to the particular architecture under test. This layer is modified to support new ISAs. The GUI provides an easy-to-use interface to all of the configuration features of the generator. The generator itself can be run from the GUI or from the command line. The features provided by RAVEN® are outlined under the Commercial column in Table 2.

Table 2: Feature Comparison

Table

Static

Dynamic

Knowledge

Commercial

Random Instructions X X X X X
Random Data X X X X X
Parameterized Macros X X X X X X
Support for complex ISA X X X X
Control flow generation X X X X
Macros X X X X
Loops with low randomness X X X X
Link with ISS X X X
State set up code not needed; enhances randomness X X X
State aware generation X X X
Integrated self-checking code X X X
Loops with moderate randomness X X X
Support for dynamic resource scheduling X X
Constraint Engine X X
Architectural pipeline solver X X
Complex Interrupt & exception support X X
Full Paging X X
Cache controls X X
Biases / API exposed to the user X X
Reusable generation core X
Separate architecture layer for each ISA X
Loops with full randomness X
Multiprocessor / Multithreaded X
Detailed instruction testing knowledge X
User accessible settings to control test intent X
User accessible parameters for instruction and operand selection X
Iterators to support common testing methods; every instruction followed by every other instruction X
GUI X
Manual X
Tutorials X
Generator development system including regression X
Commercial support X

RAVEN® can generate significant tests for a new platform within a few weeks. Advanced features such as interrupt control and task switching may require more time to implement. Depending on the similarity of the new features to those already supported in RAVEN®. However, even while this development is proceeding, the full power of the core functions is available. In addition, the GUI is available and generally complete from initial deployment, although some user-configuration parameters may be added as architectural support becomes more complete – for example controls for the generation of task switches or exceptions may be modified to reflect specific architectures.

RAVEN® currently supports multiple architectures such as ARM, MIPS, PowerPC, and proprietary DSPs. Additionally, Obsidian can port RAVEN® to fit almost any proprietary architecture. Porting RAVEN® typically requires between 3-9 months depending on ISA complexity and similarities to existing architectures.

Once the initial version of RAVEN® has been deployed, verification engineers can immediately begin generating tests. The help provided in the tool and the manual are usually sufficient for engineers to understand how to use the tool. Obsidian also offers training classes and on-site support.

Conclusions

Random test generators have a long development history, most of it confined within the individual companies where the tool was to be used. As designs become more complex and design cycles shorter, verification has become a bottleneck in product development, typically taking 50-70% of the total product development time. Internal development of tools often saps resources from other areas of development. Commercial RTGs have recently become available and can be quickly ported to new architectures. These provide a basis of fully tested core functions upon which to build the architecture specific support.

References

[1] Glassner, C. “Random Test Generation for a RISC Processor Core” Verifyer, June 2001

[2] Aharon, A., D. Goodman, M. Levinger, Y. Lichtenstein, Y. Malka, C. Metzger, M. Molcho, G. Shurek “Test Program Generation for Functional Verificaton of PowerPC Procesors in IBM,” Proceedings of 32nd ACM/IEE Design Automation Conference, 1995

Random Test Generation for Multi-Processor Systems

Posted on February 12th, 2009 by admin

By Becky Cavanaugh and Melanie Typaldos
PDF Version

Abstract

Multi-processor systems have presented a challenge to the development of random test generation tools. This challenge is inherent in the added complexity of interactions between multiple processors. Controlling these interactions while providing substantial random behavior can be accomplished through a variety of methods. In this paper we will discuss some of the issues and trade-offs in time, complexity and random behavior for different approaches to multi-processor random test generators. The paper will focus on Obsidian Software’s RAVEN tool and highlight design decisions made during the development of the multi-processor version of this tool.

1. Memory Sharing Modes

The basic difficulty in multi-processor random test generation and verification revolves around the issue of memory accesses. Multi-processing is traditionally divided into categories based on the types of sharing that are allowed. These divisions are also useful during verification since correct behavior at each stage in the verification process is prerequisite to correct behavior at the next more advanced stage. It is also clear that verification and debug of simpler sharing modes allows for faster isolation of potential processor design flaws.

1.1 Non-sharing

This is the simplest possible memory division for multiprocessor systems. In this mode, multiple processors exist in the same system, but memory is divided in such a way that no two processors access a memory address that is in the same block of memory. Blocks are typically defined in such a way that cache lines are isolated, preventing interactions between the caches of the various processors. This eliminates cache coherency issues from the verification problem space.

1.2 False sharing

At this level, multiple processors can access memory in the same memory block (or cache line) as another processor, but no two processors will access the exact same addresses. This allows for some checking of cache coherency but without the introduction of nondeterministic behavior or the need for semaphores. These tests can be run quickly and debugged easily.

1.3 Deterministic True Sharing

This mode is implemented through the use of semaphores. Memory accesses by multiple processors to the same address are allowed but are controlled such that processor state is deterministic at each semaphore boundary. The nature of the access, whether it is a read or write, is key to this mechanism. This level of multiprocessing allows for testing of most multiprocessor issues including cache coherency and is easier to detect and isolate failures than non-deterministic truesharing (true-sharing without semaphores). Crossmodifying code can be supported at this level of sharing.

1.4 Non-deterministic True Sharing

In this mode, processors read and write to identical memory locations in such a way that the values read from memory and propagated into the registers cannot be determined by simulation in a non-cycle accurate test bench. Even given such a test bench, stochastic variables may influence the behavior of the system so that results vary with each execution. This makes this level of verification much more difficult but the processors will be fully exercised. Nevertheless, non-deterministic behavior must be limited or controlled in some fashion otherwise the test becomes meaningless.

1.5 Multiple Sharing Modes

In general, processors in a multi-processor test do not need to be restricted to the same type of sharing. For example an eight processor system might contain one processor in non-sharing mode, three in false sharing mode, two in deterministic true sharing and two in nondeterministic true sharing. During generation of each processor, the sharing allowed for all other processors would need to be taken into consideration and accesses correctly restricted. For example, a deterministic truesharing processor would be allowed to hit in the same cache line with a false sharing processor but not at the same address – even though the current processor allows true sharing.

Allowing independent sharing modes complicates and slows test generation and does not add to either ease or completeness of testing.

2. Processor Generation

Two approaches can be taken to processor generation, either round-robin generation or sequential generation. In two processor round robin generation, for example, test generation starts with CPU1 which generates a single instruction then continues to CPU2 which also generates a single instruction. Generation would then return to CPU1 for its second instruction and so on.

In sequential generation, the entire instruction sequence for each processor is generated in turn. Each processor has knowledge only of previously generated processors. For example, in a four CPU test, CPU1 would have no knowledge of any other processors’ accesses during generation whereas CPU3 would have knowledge of CPU1 and CPU2 but not of CPU4.

These two approaches are approximately equivalent. In the round robin mode, the higher level of communication during generation results in increased overhead for all processors for all instructions, and becomes increasingly more severe as the number of instructions increases. In the sequential generation mode, the first CPU generates as easily as in a single CPU test. Each addition CPU encounters more restrictions on memory usage and this results in subsequent slowdown.

The sequential mode generation method has the potential to allow expansion of existing MP tests or the replacement of later CPUs with newly generated CPUs. For example, a two CPU test could be expanded to four CPUs without the need to regenerate CPUs 1 and 2. Or CPUs 3 and 4 in a four CPU test could be replaced with newly generated CPUs. This flexibility results from the fact that each CPU generation depends only on those processors generated earlier in the sequence and not on higher numbered CPUs.

The RAVEN test generator implements sequential mode generation. The decision to implement this was based largely on ease of programming. The current RAVEN implementation does not support the addition of or replacement of CPUs, although this could be added with simple changes to the user interface. This is facilitated by RAVEN’s use of a central multi-processor test configuration file (the .mp file) and individual processor test configuration files. The .mp file contains information about the expected number of processors, the names of the configuration files for each of the processors and the memory map, as discussed in the next section. Processor configuration files can be modified or substituted as long as they satisfy a few constraints, such as compatible sharing modes.

3. Division of Memory

In multi-processor test generation, the memory usage of each processor must be controlled in a way that meets the sharing requirements of the test. There are two possible approaches to this. In the first approach, all or most memory is assumed to be shared. In this paper this method will be referred to as On-the-Fly Memory Division. In the second approach, memory is divided into regions that are allocated to one or more processors. This method will be referred to as Regional Memory Division.

3.1 On-the-Fly Memory Division

On-the-fly memory division needs to address two main issues: 1) the need for large blocks of data to support some data structures and 2) the need to mark critical data as non-sharable. In all randomly generated tests there are some data blocks – such as system tables – that exist as large, contiguous, memory blocks. A processor may create such a data structure and yet not initialize all memory within the structure. An example of this would be a page table where not all entries in the table are accessed. Processors generated later in the sequence can potentially use these “holes” in memory for their own smaller data structures. This behavior is acceptable as long as it meets sharing constraints. However, the more memory accesses per processor and the more processors, the fewer large blocks of unaccessed memory will remain. This can result in significant slow-down of processors generated late in the sequence for sequential generation or for instructions generated late in the test for round-robin generation. Some data structures can be very large and the inability to find a free memory block may cause the test to fail to generate. Again, an example of this is a page table. When the table is created for the first page, it is not easy to predict or restrict the generation of accesses to memory in other pages in the same page table. If the generator is unable to find a large enough block of free memory to contain the entire table, it may either fail to generate or suffer the considerable overhead of checking each access to ensure that the page table entry can be written.

Data in page tables, interrupt tables, task state segments and other system tables cannot be allowed to become non-deterministic or the behavior of the processor – and the test – will become widely divergent depending on the exact sequence of accesses by the different processors to the critical memory region. This issue arises in nondeterministic true sharing tests. Because of this unpredictable behavior, there must be a method of marking memory blocks as reserved for a given processor.

Marking critical memory blocks as non-shared is not a difficult procedure. However, each memory write by any processor needs to check sharing status for all accessed memory. This results in additional overhead, slowing generation, and does not provide for interesting behavior.

3.2 Regional Memory Division

In regional memory division, memory must be allocated in a way that allows for the appropriate mode of sharing, allows processors to have private memory and prevents processors from randomly writing over each other’s code. This is the method implemented in RAVEN.

RAVEN handles these issues through an extension of its memory map construct. In a single processor test, RAVEN allows for the division of memory into named regions. These regions are collectively referred to as the memory map. Each memory region can be configured to support instructions, data or both, along with other relevant attributes such as caching constraints and paging attributes. Regions defined in the memory map direct the test’s memory usage.

This same construct has been expanded to support multiprocessor systems. In addition to the normal region attributes, MP memory maps also specify sharing characteristics. The only memory sharing characteristic currently maintained in the .mp file is the list of processors that can use each of the defined regions. A region can be defined for use by a single processor or by any subset of processors. For MP tests, the memory map is maintained in a file used by all processors rather than having a copy of the map in the configuration file for each processor. This prevents the use of conflicting memory maps.

The generator uses the defined memory regions when memory addresses are needed during random generation. Using user configurable biases, the generator first determines if a shared access is desired. If no sharing is selected for the current access, a memory region defined as non-shared is used. If sharing is selected, the generator queries the settings to select the processor to attempt to share memory with. A memory region is then chosen that is shared both by the current CPU and the “target” CPU.

Once the memory region for the access has been determined, the generator determines the type of sharing desired. In the current RAVEN implementation this is based solely on the configuration of the current CPU. To support different sharing modes in the different processors, the currently generating CPU would need to know the sharing configuration of the previously generated processors.

4. Shared Memory Usage Information

In order for a sequentially generated processor in a multiprocessor test to correctly and actively generate accesses to shared memory, knowledge of previous processors’ memory accesses must be forwarded to the currently generating processor. The exact nature of the information required is dependent on the type of sharing.

For non-sharing multi-processor tests, only memory regions designated as non-shared will be used by any processor. Since this memory cannot be used by another processor, no data concerning memory accesses needs to be passed forward.

For false sharing multi-processor tests, the exact addresses used by a processor must be passed forward to later-generated processors. The nature of these accesses, whether they are reads or writes, is not important in this case. Since no address used by the current processor will be used by another processor, data integrity is not an issue. The currently generating processor uses the passed forward list of addresses to attempt to generate accesses to similar addresses, resulting in hits to the same cache line.

In deterministic true sharing, a complex set of rules must be followed regarding memory accesses in order to ensure that non-deterministic data is not accessed. In these tests, each semaphore creates a “zone” separating instructions executed before the semaphore code is reached from those that are executed after the semaphore check is passed. Each memory access is then associated with the zone during which it occurred. Note that a zone is not a memory or time division but a division based on code execution. In addition, the access must be marked as a read, a write, a read followed by a write or a write followed by a read. This data must be passed forward to later generated processors so that only deterministic accesses will be made to shared memory. This is discussed in detail in a later section.

In non-deterministic true sharing, all reads from shared memory are assumed to be non-deterministic. The nature of the accesses of previous processors is therefore not used in the generation of future processors. For this case, all that needs to be passed forward are the addresses of the accesses.

At the end of each processor generation, RAVEN creates a memory access map. This map is a list of every address used during the test. If this processor is not the first one generated, the accesses for this processor are merged with those from previous processors into a single access map. The map is written at the end of the test output file. This implementation requires the generator to read the test output file for the most recently generated CPU before beginning generation of the current CPU. Communication of accessed addresses forward to later-generated processors could also be accomplished by having the generator store this information in some internal data structure. Although this method might be marginally faster, RAVEN did not choose this implementation for two reasons. Firstly, this would restrict the use of previously generated CPU tests as a starting point. Secondly, the data present in the memory map for each processor is useful for the verification engineer during debugging.

5. Test Generation

5.1 Generation of Non-sharing Tests

For regional memory division generators, non-sharing multi-processor tests, while useful during verification, do not present unique problems for the generator. The single constraint imposed on the generation of processors in these tests is that they do not use any memory region that is marked as shared.

For on-the-fly memory division generators, each memory access would need to check memory at adjacent addresses, depending on the cache line size, to determine if another processor has already made an access to this cache line. If no such access has been make, the generator can use the address. If another processor has accessed the same cache line, the address would be rejected.

5.2 Generation of False Sharing Tests

The goal of false sharing multi-processor tests is to cause multiple processors to access memory addresses near addresses accessed by another processor. This should result in the loading and flushing of cache lines and, depending on processor architecture, communication between the various processor caches.

For on-the-fly memory division, false sharing is much like non-sharing. The same type of memory check is required but only the exact addresses accessed by the current processor would be prohibited from having been accessed by previously generated processors.

In regional memory division generators, processors generated for false-sharing tests have relatively few restrictions. In RAVEN, a user configurable parameter is provided that determines the percentage of memory accesses that will hit in a shared region. When an access is selected as “shared”, the generator always attempts to hit near another processor’s access. Because of constraints in the register and memory contents and the nature of the instruction being generated, this may not always be possible.

Because RAVEN uses sequential processor generation, each processor can only attempt to hit addresses near a lower-numbered CPU. This is not a restriction however since processors generated later will attempt to hit near all CPUs with equal probability. For example, in a two CPU test, during CPU1 generation, the generator will not be able to find any shared locations that hit within the same cache line as another processor. However, CPU2 will attempt to cause cache line hits near CPU1 accesses. Since CPU2 has knowledge of all of CPU1’s accesses, some CPU2 accesses will occur both before and after accesses made by CPU1.

5.3 Generation of Deterministic True Sharing Tests

In deterministic true sharing, code execution is divided into sections called zones. Each zone begins at a semaphore, or at the beginning of the test, and ends at a semaphore, or at the end of the test. If there are n semaphores there are therefore n+1 zones. Each processor executes the instruction stream within a zone and then enters the semaphore code sequence. In the semaphore code, the processor decrements the semaphore count using an atomic instruction, checks the value of the semaphore count, and loops until the count reaches zero. Therefore all memory accesses made in a zone by all processors are guaranteed to have taken place before the first instruction in the next zone is executed.

Deterministic true sharing can be implemented using either the sequential generation or round robin generation methods. Certain rules need to be applied to the memory accesses to prevent non-deterministic behavior. These rules differ slightly between the two methods of generation.

5.3.1 Sequential generation

The memory access rules that apply to overlapping shared memory accesses for sequential generation are as follows:

  1. 1.  A processor may read an address if:
    • No previously generated processor has written to this address OR
    • In the last zone where this address was written, only a single processor performed a write OR
    • The current processor has written to this address in this zone and no other processor has performed a write to this address in this zone
  2. A processor may write an address if:
    • No previously generated processor performs a read to this address in this zone AND
    • No previously generated processor performs a read to this address in a following zone unless there is an intervening write from a previously generated processorThese rules ensure that non-deterministic data is not read by any processor. Because the processors are generated sequentially, the data read by early processors must not be modified by later-generated processors.

      These early processors were generated without knowledge of potential writes by later processors. It is possible to modify these rules to gain some greater flexibility. For example, a new rule could be inserted before the previously stated rule 2 as follows.

  3. A processor may write an address if:
    • The data to be written is identical to the current data in memoryWhile this might appear to provide added randomness to the test, it does not enhance the verification processes. If the only data that can be written is the data that is already there, it is unlikely that a behavioral error can be detected.

5.3.2 Round-robin generation

The rules for round-robin generation are similar to those for sequential generation, however some restrictions are relaxed. The rules for round-robin generation are shown below.

  1. A processor may read an address if:
    • No other processor has written to this address in this zone AND
    • In the last zone where this address was written, only a single processor performed a write OR
    • The current processor has written to this address in this zone and no other processor has performed a write to this address in this zone
  2. A processor may write an address if:
    • No other processor performs a read to this address in this zoneThese rules demonstrate the strength of round-robin generation over sequential generation. The relaxed rules allow a round-robin generated test to contain more shared accesses than a similar sequentially generated test.

5.4 Generation of Non-deterministic True Sharing Tests

In non-deterministic true sharing, there is no knowledge about the order of execution of instructions in different processors. Deterministic true sharing provides this information through the use of semaphores. After each semaphore, the generator is guaranteed that all instructions in previous zones for all processors have been executed; this includes all memory reads and writes. Removing the semaphores removes this information. Any processor that reads a data address written by another processor cannot determine if the read is done before or after the write.

When non-deterministic data is read into a register, the register must be marked as non-deterministic. This means that future instructions that use this register as a source will have unpredictable behavior. Non-deterministic behavior may also be propagated into the processor’s status or flags registers, depending on the nature of the instruction that uses the non-deterministic data. The generator may mark the entire flags register as nondeterministic or may mark individual bits as nondeterministic. Marking the entire register requires that the generator have knowledge of which instructions modify any flag bits. Marking the individual bits requires the generator to have knowledge of all flag bits that are potentially modified by every instruction. Since most instructions that operate off flags do so using a limited set of flag bits, tracking the individual bits results in greater flexibility in instruction choice.

5.4.1 Determining non-determinism

For regional memory division, all reads from shared memory regions are assumed to be non-deterministic. Whether using sequential or round-robin generation, it is not possible to tell at the time an instruction is generated whether another processor will write to a particular memory address. In the round-robin case, even though the processor tests are generated using a stepping process, it cannot be assumed that execution of the processors in a cycle-accurate simulator or in hardware will exhibit this same lock-step behavior.

On-the-fly memory division generation always assumes that all non-reserved memory accesses are shared. In nondeterministic true sharing, all of these accesses must be assumed to be non-deterministic.

In sequential generation, only the last processor to generate could know whether there are writes by other processors to a given address because only this processor has full knowledge of the memory usage of all of the other processors. This information could be used to limit the marking of memory or registers as non-deterministic. There are three reasons why this is probably not worthwhile. Firstly, any generation modification would apply to only one processor. Secondly, this would prevent the expansion of the MP test to a larger number of processors while using the previously generated processor tests. Thirdly, the generator is attempting to share as much as possible resulting in few non-shared accesses.

5.4.2 Restrictions on non-deterministic registers

Registers marked as containing non-deterministic data can propagate non-determinism through the processor. For example, an ADD instruction with one deterministic register operand and one non-deterministic register operand can result in the propagation of non-deterministic source data to the destination register or memory location. This same instruction can also cause certain processor flags, such as those indicating equality, negative values or zeros, to become non-deterministic. In order to allow reasonable processor execution and random behavior, such propagation must be allowed.

Restrictions apply to the use of registers containing nondeterministic data under two circumstances: 1) where the non-deterministic nature of the data may affect the instruction stream and 2) where the non-deterministic data would be used in the generation of a data address.

Case (1) can result from exceptions, which may be allowed to a limited extent. It may be that not all exception handlers return to the excepting instruction. Where the exception handler does return to the excepting instruction, the potential for an exception may be allowed, providing that execution of the exception handler does not result in unacceptable non-deterministic behavior. In the case where the exception does not return to the excepting instruction there are two cases. If the processor has a fixed instruction length, such exceptions may be allowed. If the processor has a variable instruction length it is frequently impossible, or requires an unacceptably large handler, to determine the length of the excepting instruction. In this case, the exception handler normally jumps to a known safe location – a location known to be past the end of the longest instruction that could have caused the exception. This potentially results in a gap in the addresses of instructions in the main code stream. However, if the exception is not taken, the next instruction will fall immediately after the current instruction with no gap. This means that the two potential code execution paths can result in the processor executing at an address that is not aligned with the initial generated code.

Case (1) may also result from conditional jumps based either on non-deterministic flag values or on the values of non-deterministic registers. These instructions can be restricted to execute only when the flags or data are deterministic. Alternatively, a non-conditional jump to the same address as the conditional jump could be inserted directly after a non-deterministic conditional jump. This would result in the possible execution of one extra instruction before both potential streams converged. However allowing an exception on the conditional branch or the non-conditional branch instruction would result in a much more complex problem space and should be avoided.

Case (2) allows for the use of non-deterministic values for the generation of data addresses. This means that the address to read or write will not be known. This may be acceptable if all potential addresses are acceptable. This would mean that the address is not being used for a system function – for example a system table fetch or write – and that no potential address falls within a reserved, non-shared region.

Although all of this support is theoretically possible, RAVEN does not support any of the exotic cases presented as case (1) and case (2) above. Supporting these conditions would result in a greatly complicated generator and much slower generation. Instead, RAVEN inhibits the use of non-deterministic registers in address generation, prohibits the use of non-deterministic flags for conditional jumps and disallows instructions that may potentially cause exceptions based on non-deterministic data. Although this limits some of the random behavior of the test, the faster generation time and ease of code maintenance are seen as offsetting benefits.

5.4.3 Restoring deterministic values

Using some algorithm, the generator must restore registers to a deterministic state. This can be done periodically – every n instructions – or based on the current processor condition – how many registers contain non-deterministic data or which registers contain nondeterministic data. In RAVEN restoration of deterministic values is based on user configurable settings that govern the percentage of registers of different classes that are allowed to contain non-deterministic data. As each class of registers passes the allowed ratio of nondeterministic to deterministic registers, a macro is executed that loads the registers using immediate values, if possible, or reserved non-shared memory locations.

Since all memory within shared regions is assumed to be non-deterministic at all times, memory is not restored to deterministic values.

6. Conclusions

Although this paper has covered many issues concerning random test generation of multi-processor systems, there are many areas that remain uncovered. The approach taken by RAVEN is a single example of a successful system. There are many other possible approaches, each with it’s own strengths and weaknesses. Some of these may be more appropriate to one processor or system than the RAVEN approach. However RAVEN strikes a good compromise between performance, random behavior and flexibility.