A script-ready prompt that turns one task into a provider-flexible test harness, helping teams decide whether a prompt specification is portable, safe, and consistent across major AI models before integrating it into production workflows.
Role
You are a cross-model prompt systems engineer designing a provider-flexible prompt harness for {task_name}. Your job is to transform a single business task into a reusable prompt spec that can be run across OpenAI, Anthropic Claude, Google Gemini, xAI Grok, Meta Llama, DeepSeek, Mistral, Cohere, Perplexity, and Amazon Nova with minimal changes.
Context
I am evaluating a multi-model workflow for {use_case}. The real job-to-be-done is: create one robust task prompt that behaves predictably across different model families and APIs. The user decision this answer must support is: should we ship one shared prompt, maintain provider-specific variants, or narrow the supported model set?
Tool-specific instructions
- If running in Codex or another code-oriented tool, produce implementation-ready artifacts, concise comments, and deterministic formatting.
- Where provider behavior differs, label differences explicitly as OpenAI, Claude, Gemini, Grok, Llama, DeepSeek, Mistral, Cohere, Perplexity, or Nova notes.
- Do not claim you executed any model calls.
- If a provider may support web access, tools, or larger context windows, treat that as optional and note assumptions instead of relying on it.
- Write prompts in plain text that are easy to pass via API payloads and version control.
Task
Build a cross-model prompt compatibility harness for {task_name}.
Inputs
Collect and restate these inputs before producing the main deliverable:
- Primary task: {task_name}
- Business goal: {goal}
- End user or audience: {audience}
- Input data type: {input_type}
- Expected output type: {output_type}
- Domain or industry: {domain}
- Risk level: {risk_level}
- Must-follow rules: {must_follow_rules}
- Nice-to-have behaviors: {nice_to_have}
- Known provider constraints: {provider_constraints}
- Evaluation examples available: {examples_available}
- Success threshold: {success_threshold}
If any input is missing, list it under Risks and missing information and continue using clearly labeled assumptions.
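As an illustration only, the input-collection step could be scripted roughly as follows. This is a minimal sketch, not part of the prompt itself: `fill_template` and the `[ASSUMPTION NEEDED: ...]` marker are hypothetical names chosen for the example, and only the variable names come from the Inputs list above.

```python
from __future__ import annotations

import re

# Matches curly-brace variables such as {task_name}.
PLACEHOLDER = re.compile(r"\{([a-z_]+)\}")


def fill_template(template: str, values: dict[str, str]) -> tuple[str, list[str]]:
    """Substitute {variables}; return the filled text and the names left missing."""
    missing: list[str] = []

    def replace(match: re.Match) -> str:
        name = match.group(1)
        value = values.get(name, "").strip()
        if not value:
            # Record the gap once and leave a visible marker instead of guessing.
            if name not in missing:
                missing.append(name)
            return f"[ASSUMPTION NEEDED: {name}]"
        return value

    return PLACEHOLDER.sub(replace, template), missing


template = "Primary task: {task_name}\nBusiness goal: {goal}\nRisk level: {risk_level}"
filled, missing = fill_template(template, {"task_name": "invoice triage", "goal": ""})
print(filled)
print("Missing information:", ", ".join(missing))
```

Empty strings are treated the same as absent keys, so a blank form field still shows up as missing information rather than silently producing an empty line.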
Workflow
1. Restate the job-to-be-done and the decision this work supports.
2. Summarize all provided inputs in a compact table.
3. Separate confirmed inputs from assumptions.
4. Draft a canonical base prompt with role, context, task, constraints, and output instructions.
5. Create provider-specific adaptation notes for each target model family only where differences are likely to matter.
6. Design a test matrix covering at least:
   - instruction following
   - formatting compliance
   - safety/refusal edge cases
   - verbosity control
   - structured output reliability
   - reasoning transparency limits
7. Produce a scoring rubric to compare outputs across providers.
8. Identify failure modes, portability risks, and where model-specific forks may be justified.
9. Recommend one of three decisions: shared prompt, lightly adapted prompt set, or provider-specific prompt set.
10. End with next actions for implementation and validation.
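The test matrix in step 6 can be sketched as a simple cross product of providers and test categories. The snippet below is a hedged illustration: the row fields (`case_id`, `status`) and the function name `build_test_matrix` are assumptions made for the example, and every row starts as `not_run` because no model calls are executed here.

```python
from itertools import product

PROVIDERS = [
    "OpenAI", "Claude", "Gemini", "Grok", "Llama",
    "DeepSeek", "Mistral", "Cohere", "Perplexity", "Nova",
]

CATEGORIES = [
    "instruction following",
    "formatting compliance",
    "safety/refusal edge cases",
    "verbosity control",
    "structured output reliability",
    "reasoning transparency limits",
]


def build_test_matrix(providers, categories):
    """Return one planned (not executed) test case per provider and category."""
    rows = []
    for provider, (index, category) in product(providers, enumerate(categories, start=1)):
        rows.append({
            "case_id": f"{provider.lower()}-{index:02d}",  # hypothetical ID scheme
            "provider": provider,
            "category": category,
            "status": "not_run",  # nothing has been called yet
        })
    return rows


matrix = build_test_matrix(PROVIDERS, CATEGORIES)
print(len(matrix), "planned cases")
print(matrix[0])
```

Keeping the matrix as plain dictionaries makes it easy to dump to CSV or JSON for version control before any provider is actually called.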
Constraints
- Be provider-flexible, not provider-marketing-oriented.
- Do not invent undocumented features.
- Do not assume live browsing, tool use, or code execution occurred.
- Make the deliverable directly usable in scriptable workflows.
- Prefer explicit variables in curly braces.
- Keep all prompts and tables copy-paste friendly.
- If structured output is requested, provide JSON schema guidance that is model-agnostic.
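For the structured-output constraint, one model-agnostic approach is to describe the expected JSON in a small schema and check raw responses in your own code rather than relying on any provider's structured-output mode. The sketch below uses only the standard library; the field names (`summary`, `risk_level`, `actions`) and the helper `check_output` are hypothetical, and the checker covers only a tiny subset of JSON Schema (required fields, basic types, enums).

```python
import json

# Hypothetical schema for illustration; replace the fields with your own.
SCHEMA = {
    "type": "object",
    "required": ["summary", "risk_level", "actions"],
    "properties": {
        "summary": {"type": "string"},
        "risk_level": {"type": "string", "enum": ["low", "medium", "high"]},
        "actions": {"type": "array"},
    },
}

PY_TYPES = {"string": str, "array": list, "object": dict}


def check_output(raw_text, schema):
    """Return a list of problems; an empty list means the response passed."""
    try:
        data = json.loads(raw_text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]

    problems = []
    for name in schema["required"]:
        if name not in data:
            problems.append(f"missing field: {name}")
    for name, rule in schema["properties"].items():
        if name not in data:
            continue
        if not isinstance(data[name], PY_TYPES[rule["type"]]):
            problems.append(f"wrong type for {name}: expected {rule['type']}")
        elif "enum" in rule and data[name] not in rule["enum"]:
            problems.append(f"unexpected value for {name}: {data[name]!r}")
    return problems


print(check_output('{"summary": "ok", "risk_level": "low", "actions": []}', SCHEMA))
print(check_output('{"summary": 1, "risk_level": "severe"}', SCHEMA))
```

Running the same checker over every provider's raw text gives a comparable pass/fail signal for the structured output reliability row of the test matrix.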
Output format
Return exactly these sections:
1. Job-to-be-done and decision supported
2. Inputs summary table
3. Assumptions
4. Canonical base prompt
5. Provider adaptation notes table
6. Test matrix
7. Evaluation rubric
8. Risks and missing information
9. Recommendation
10. Next actions
Include these deliverables:
- one canonical reusable prompt
- one provider adaptation table covering all named model families
- one scored evaluation rubric out of 100
- one recommendation with rationale: shared prompt, lightly adapted prompt set, or provider-specific prompt set
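A rubric scored out of 100 can be sketched as weighted ratings over the test-matrix categories. The weights below are illustrative assumptions, not recommended values, and the 0 to 5 rating scale and the `score` function name are likewise hypothetical.

```python
# Hypothetical weights; they must sum to 100 so the total is out of 100.
WEIGHTS = {
    "instruction following": 25,
    "formatting compliance": 15,
    "safety/refusal edge cases": 20,
    "verbosity control": 10,
    "structured output reliability": 20,
    "reasoning transparency limits": 10,
}
assert sum(WEIGHTS.values()) == 100


def score(ratings, max_rating=5):
    """Convert per-category ratings (0..max_rating) into a score out of 100."""
    missing = sorted(set(WEIGHTS) - set(ratings))
    if missing:
        raise ValueError(f"missing ratings for: {', '.join(missing)}")
    return sum(WEIGHTS[name] * ratings[name] / max_rating for name in WEIGHTS)


# Example ratings a reviewer might assign to one provider's outputs.
example_ratings = {
    "instruction following": 4,
    "formatting compliance": 5,
    "safety/refusal edge cases": 3,
    "verbosity control": 4,
    "structured output reliability": 2,
    "reasoning transparency limits": 5,
}
print(score(example_ratings))  # prints 73.0
```

Raising an error on a missing category, rather than scoring it as zero, keeps an incomplete review from being mistaken for a poor result.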
Acceptance criteria
A good answer must:
- clearly name the real business task and decision supported
- collect and restate concrete inputs before generating the prompt harness
- separate assumptions from facts
- provide a reusable base prompt plus model-specific adaptation guidance
- include a practical test matrix and scoring rubric
- identify risks, missing information, and next actions
- avoid claiming any model calls, browsing, or benchmark runs already happened
Quality checks
Before finalizing, verify:
- every model family named in the Role section is covered
- the base prompt contains role, context, task, constraints, and output format
- every customizable value is expressed as a variable in curly braces
- no unsupported claims about provider features are made
- the recommendation is tied to the evaluation rubric and risks
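Some of these quality checks can be automated before a human review. The following is a minimal sketch under stated assumptions: it assumes the base prompt puts each section heading on its own line and that adaptation notes are kept in a dictionary keyed by provider label; `lint_harness` is a hypothetical helper name.

```python
REQUIRED_PARTS = ["Role", "Context", "Task", "Constraints", "Output format"]

PROVIDERS = [
    "OpenAI", "Claude", "Gemini", "Grok", "Llama",
    "DeepSeek", "Mistral", "Cohere", "Perplexity", "Nova",
]


def lint_harness(base_prompt, adaptation_notes):
    """Return a list of issues; an empty list means both checks passed."""
    issues = []
    # Assumption: headings appear alone on a line, as in this prompt.
    headings = {line.strip().lower() for line in base_prompt.splitlines()}
    for part in REQUIRED_PARTS:
        if part.lower() not in headings:
            issues.append(f"base prompt is missing a '{part}' heading")
    for provider in PROVIDERS:
        if provider not in adaptation_notes:
            issues.append(f"no adaptation note for {provider}")
    return issues


draft_prompt = "Role\nYou are...\nContext\n...\nTask\n...\nOutput format\n..."
draft_notes = {name: "no change" for name in PROVIDERS if name != "Nova"}
for issue in lint_harness(draft_prompt, draft_notes):
    print(issue)
```

Checks about unsupported provider claims and the link between recommendation and rubric still need a human reader; this only covers the mechanical items.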
Updated May 24, 2026