Testing your responsibilities

To help you understand how your AI agent might respond in a given situation, we’ve developed a “test harness” for observing how your agent behaves under its current configuration. In short, the test harness lets you simulate various scenarios and see how the agent responds.

This gives you a chance to see examples of how your agent may respond once it’s fully deployed, and to adjust your configuration (if needed) based on your results.

How testing works

The test harness allows you to see how a responsibility responds to specific inputs, context, and events. It is designed to mirror constituent interactions so you can evaluate whether your configuration produces accurate, safe, and helpful outcomes.

Reminder: Your agents are non-deterministic, so they won’t respond in exactly the same way to the same question every time, just like a person!

During testing you’ll get a sense of how your agent will handle the responsibility, so focus less on the precise language used and instead try to identify patterns in the agent’s responses that you may want to further refine.


There are two ways to run tests. Each provides a different lens on how your responsibility behaves.

Manual simulation (single-step testing)

Manual simulation helps you understand how the responsibility reacts in a specific moment. It is similar to asking, “How would the agent respond if a constituent took this action?”

With manual simulation, you can:

  • Select a constituent profile to act as the requester

  • Trigger an event, such as an email or document submission

  • Watch the agent’s next action, whether it drafts a response, escalates, schedules an alarm, or does nothing

With proper training and configuration, most test results should meet your expectations and give you confidence that your agent will behave as intended.

If a test result doesn’t meet your expectations, have no fear! You can adjust the responsibility’s behaviors, tools, or event bindings and test again. This mode is ideal when validating discrete behaviors or fine-tuning instructions.
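
If it helps to picture what a single-step test exercises, here is a minimal sketch of the idea in Python. Every name in it (ConstituentProfile, Event, simulate_step, and the sample actions) is a hypothetical stand-in for illustration, not part of CollegeVine’s actual product or API; in practice you run manual simulations entirely from the harness interface.

    from dataclasses import dataclass

    # NOTE: every name below is hypothetical and for illustration only;
    # it is not CollegeVine's actual product or API.

    @dataclass
    class ConstituentProfile:
        name: str
        program_interest: str

    @dataclass
    class Event:
        kind: str      # e.g. "email_received" or "document_submitted"
        content: str

    def simulate_step(profile: ConstituentProfile, event: Event) -> str:
        """Stand-in for the agent's decision: return the next action it would take."""
        if event.kind == "email_received":
            return f"draft_reply to {profile.name} about {profile.program_interest}"
        if event.kind == "document_submitted":
            return "escalate for staff review"
        return "do_nothing"

    # Manual simulation in miniature: one profile, one event, one observed action.
    profile = ConstituentProfile(name="Jordan", program_interest="Nursing")
    event = Event(kind="email_received", content="Is the application deadline flexible?")
    print(simulate_step(profile, event))

The shape is the whole point: one profile, one event, one observed next action.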

Auto simulation (multi-step testing)

Auto simulation shows how a responsibility performs over time across several stages. It evaluates not only how the responsibility reacts to a single event, but how it behaves throughout a sequence of events.

In this mode, you can:

  • Generate a batch of simulated constituents

  • Give each constituent a series of “steps” where the responsibility can decide whether to act

  • Observe how the responsibility updates its decisions as new context appears

Auto simulation is especially useful for responsibilities that rely on events over time, such as scheduled follow-ups, multi-step requests, or conditional decision paths.
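
As a rough contrast with the single-step sketch above, the following hypothetical Python sketch advances a small batch of simulated constituents through several steps, with new context (a simulated reply) allowed to arrive between steps. Again, SimulatedConstituent, decide, and the rest are illustrative assumptions only; in the product, auto simulation handles all of this for you.

    import random
    from dataclasses import dataclass

    # NOTE: every name below is hypothetical and for illustration only;
    # it is not CollegeVine's actual product or API.

    @dataclass
    class SimulatedConstituent:
        name: str
        replied: bool = False

    def decide(constituent: SimulatedConstituent, step: int) -> str:
        """Stand-in for the responsibility's per-step decision."""
        if constituent.replied:
            return "do_nothing"
        return "send_follow_up" if step >= 2 else "wait"

    # Auto simulation in miniature: a batch of constituents, each advanced
    # through several steps, with new context appearing between steps.
    batch = [SimulatedConstituent(name=f"Constituent {i}") for i in range(1, 4)]
    for constituent in batch:
        for step in range(1, 5):
            if random.random() < 0.3:   # simulate a reply arriving mid-sequence
                constituent.replied = True
            print(f"{constituent.name} | step {step}: {decide(constituent, step)}")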

When to use each mode

Goal                                                           Recommended Mode
Validate how an agent responds to a single message or event    Manual simulation
Check that behavior matches expectations before launch         Manual simulation
Test multi-step workflows from start to finish                 Auto simulation
Evaluate consistency across many simulated constituents        Auto simulation

Technical notes

  • Both modes use the same underlying event-driven engine that powers live responsibilities.

  • The harness displays reasoning steps and outcomes so you can verify that the responsibility is using the correct resources.
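
If you’re curious what “event-driven” means in practice, here is a rough, hypothetical Python sketch of the pattern. The handler registry and dispatch function are illustrative assumptions rather than CollegeVine’s actual engine; the takeaway is simply that the same routing logic handles an event whether it comes from a test or from a live constituent.

    from typing import Callable, Dict

    # NOTE: a hypothetical sketch of an event-driven dispatcher;
    # it is not CollegeVine's actual engine.

    handlers: Dict[str, Callable[[dict], str]] = {}

    def on(event_kind: str):
        """Register a handler for one kind of event."""
        def register(fn: Callable[[dict], str]) -> Callable[[dict], str]:
            handlers[event_kind] = fn
            return fn
        return register

    @on("email_received")
    def handle_email(event: dict) -> str:
        return f"draft_reply about: {event['subject']}"

    def dispatch(event: dict) -> str:
        """Tests and live runs alike would route events through the same handlers."""
        handler = handlers.get(event["kind"])
        return handler(event) if handler else "do_nothing"

    print(dispatch({"kind": "email_received", "subject": "Transcript question"}))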

Next steps

After testing your responsibilities, you can continue refining behaviors, adjusting event bindings, or updating tools. If you need guidance or want help troubleshooting unexpected results, contact your partnership team or the CollegeVine Success Team at [email protected].
