Analysis

Architecture

The workbench is a five-layer pipeline. Each layer has a single responsibility and hands off to the next via a structured intermediate representation — a migration object that carries the source artifact, parsed policies, candidate mappings, generated code, and confidence scores through the pipeline.
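
As a rough illustration of that intermediate representation, the migration object could be modelled with a few dataclasses. This is a minimal sketch, not the workbench's actual schema: the class and field names (`ParsedPolicy`, `PolicyResult`, `Migration`, `unmapped`) are assumptions chosen to mirror the five things the text says the object carries.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ParsedPolicy:
    """One unit of work produced by the parsing layer (hypothetical shape)."""
    name: str
    policy_type: str
    position: int                      # order in the policy chain
    config: dict = field(default_factory=dict)
    script: Optional[str] = None       # custom scripting, if any


@dataclass
class PolicyResult:
    """Per-policy state filled in by the RAG and mapping layers."""
    policy: ParsedPolicy
    candidates: list = field(default_factory=list)   # ranked RAG hits
    generated_code: Optional[str] = None
    confidence: Optional[float] = None


@dataclass
class Migration:
    """The object handed from layer to layer."""
    source_artifact: str               # path to the bundle or project folder
    results: list = field(default_factory=list)

    def unmapped(self) -> list:
        """Policies that have not yet received a confidence score."""
        return [r for r in self.results if r.confidence is None]
```

Keeping every layer's output on one object means a later layer, or a reviewer, can always trace a generated mapping back to the source policy it came from.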

The ingestion and parsing layer accepts a source artifact on disk — an Apigee XML bundle or a Tibco BW project folder — and decomposes it into a structured policy list. Each policy becomes a discrete unit of work: type, configuration, position in the chain, any custom scripting. This layer is rule-based, not AI-assisted; the formats are documented and parseable without an LLM.
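
The rule-based decomposition can be sketched with the standard-library XML parser. This assumes each policy arrives as its own XML document whose root tag names the policy type and whose `name` attribute identifies it; the function names and the output dictionary layout are hypothetical, and a real bundle or BW project would need more handling than shown here.

```python
import xml.etree.ElementTree as ET


def parse_policy(xml_text: str, position: int) -> dict:
    """Turn one policy XML document into a flat unit of work."""
    root = ET.fromstring(xml_text)
    config = {}
    for child in root:
        # Keep both element text and attributes; either may carry settings.
        # Simplification: a repeated child tag overwrites the earlier one.
        config[child.tag] = {
            "text": (child.text or "").strip(),
            "attrs": dict(child.attrib),
        }
    return {
        "type": root.tag,          # assumed: the root tag names the policy type
        "name": root.get("name", ""),
        "position": position,      # position in the chain
        "config": config,
    }


def parse_chain(policy_xmls: list) -> list:
    """Parse policies in chain order, recording each one's position."""
    return [parse_policy(xml, i) for i, xml in enumerate(policy_xmls)]
```

No model call appears anywhere in this path, which is the point of keeping the layer rule-based: the same input always yields the same policy list.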

The policy RAG layer takes each parsed policy and generates an embedding via OpenAI, then queries the pgvector index in Postgres for the most semantically relevant Mulesoft equivalents. The index contains Mulesoft connector docs, Apigee and Tibco migration guides, and reviewed migration examples accumulated over time. The layer returns a ranked list of candidate Mulesoft equivalents with retrieval scores.
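
The retrieval step might look roughly like the following. The SQL uses pgvector's `<=>` cosine-distance operator, but the table name `mulesoft_corpus`, its columns, and the choice of cosine distance are assumptions, since the text does not name the schema or the metric. The in-memory `rank_candidates` function computes the same ranking without a database so the logic can be checked in isolation; the embedding call to OpenAI is left out and the query vector is passed in directly.

```python
import math

# Hypothetical table and column names. `<=>` is pgvector's cosine-distance
# operator, so 1 - distance gives a cosine-similarity retrieval score.
CANDIDATE_SQL = """
SELECT id, title, 1 - (embedding <=> %(query)s::vector) AS score
FROM mulesoft_corpus
ORDER BY embedding <=> %(query)s::vector
LIMIT %(k)s
"""


def to_vector_literal(vec) -> str:
    """Format a vector in pgvector's text form, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(x)) for x in vec) + "]"


def cosine_similarity(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_candidates(query_vec, corpus, k=3):
    """In-memory stand-in for the SQL above: top-k (doc_id, score) pairs."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in corpus]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

With a database driver, the query parameters would be something like `{"query": to_vector_literal(vec), "k": 5}`; the exact parameter style depends on the driver in use.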

The policy mapping layer takes the RAG candidates and uses an LLM to select the best match, generate the mapped code and its rationale, and compute a confidence score. This is where the core AI reasoning happens. Model routing sits inside this layer rather than forming a separate one: simple policy types (HTTP proxy, basic auth, logging) route to a lightweight model via OpenRouter; complex types (custom scripts, quota chains, conditional routing) route to a more capable model. The routing boundary is configured, not hardcoded, because it is expected to evolve as the corpus grows and the team learns which policy types are reliably handled by lighter models.
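
One way to keep the routing boundary in configuration is to load the list of "simple" policy types from a file and default everything else to the more capable model. The sketch below assumes a JSON config; the keys, the policy-type labels, and the model identifiers are placeholders, not real OpenRouter model names.

```python
import json

# Placeholder configuration; in practice this would live in a file so the
# boundary can move without a code change.
ROUTING_JSON = """
{
  "light_model": "example/light-model",
  "capable_model": "example/capable-model",
  "light_policy_types": ["http_proxy", "basic_auth", "logging"]
}
"""


def load_routing(text: str) -> dict:
    config = json.loads(text)
    config["light_policy_types"] = set(config["light_policy_types"])
    return config


def choose_model(policy_type: str, has_custom_script: bool, config: dict) -> str:
    """Pick a model tier for one policy."""
    # Custom scripting always goes to the capable model, whatever the type.
    if has_custom_script:
        return config["capable_model"]
    if policy_type in config["light_policy_types"]:
        return config["light_model"]
    # Unknown or unlisted types fail safe to the capable model.
    return config["capable_model"]
```

Defaulting unlisted types to the capable model is a design choice made for this sketch: a type only moves to the lighter tier once someone adds it to the list deliberately.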

The migration execution layer assembles the per-policy mappings into a complete Mulesoft project: XML configuration, DataWeave transformations, and connector wiring. It aggregates confidence scores across the full policy set to produce a migration-level score and flags any policy whose score falls below the configured review threshold.
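
The aggregation and flagging step can be sketched as below. The text does not say how per-policy scores are combined, so this example reports both the mean and the minimum as candidate migration-level scores; the function name and the default threshold of 0.8 are likewise assumptions.

```python
def score_migration(policy_scores: dict, review_threshold: float = 0.8) -> dict:
    """Aggregate per-policy confidence and flag policies needing review.

    policy_scores maps policy name -> confidence in [0, 1].
    """
    if not policy_scores:
        raise ValueError("no policy scores to aggregate")
    values = list(policy_scores.values())
    # A policy is flagged when its own score is below the threshold,
    # independent of how the migration scores overall.
    flagged = sorted(n for n, s in policy_scores.items() if s < review_threshold)
    return {
        "mean": sum(values) / len(values),
        "min": min(values),
        "flagged": flagged,
    }
```

Reporting the minimum alongside the mean matters because a single weak policy can hide behind a high average; flagging per policy keeps it visible either way.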

The human review gate surfaces flagged policies to the developer with full context — source policy, generated output, score, and reason. Approved items are finalised; corrections are written back to the policy corpus and the pgvector index in Postgres.
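
The two reviewer outcomes can be modelled as a small state update. This is an in-memory sketch under assumed names (`ReviewItem`, `ReviewState`, `apply_decision`); the real gate would persist to Postgres and re-embed corrections, which is only represented here as a queue of pending corpus updates.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ReviewItem:
    """The context shown to the developer for one flagged policy."""
    policy_name: str
    source_policy: str
    generated_output: str
    score: float
    reason: str


@dataclass
class ReviewState:
    completed: list = field(default_factory=list)       # finalised policy names
    corpus_updates: list = field(default_factory=list)  # pending index writes


def apply_decision(state: ReviewState, item: ReviewItem, decision: str,
                   corrected_output: Optional[str] = None) -> ReviewState:
    if decision == "approve":
        state.completed.append(item.policy_name)
    elif decision == "correct":
        if corrected_output is None:
            raise ValueError("a correction needs the corrected output")
        # Assumption: the correction is stored as a source/target pair so it
        # can be embedded and retrieved as a reviewed example later.
        state.corpus_updates.append(
            {"source": item.source_policy, "target": corrected_output}
        )
    else:
        raise ValueError(f"unknown decision: {decision}")
    return state
```

Whether a corrected item is also marked complete in the same step is not specified in the text, so this sketch leaves it out of the completed list.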

flowchart TD
    A["Ingestion & Parsing<br/>XML / BW project → policy list"]
    B["Policy RAG Layer<br/>pgvector semantic search<br/>→ candidate mappings"]
    C["Policy Mapping Layer<br/>LLM: select match,<br/>generate code, score"]
    D["Model Routing<br/>lightweight ↔ heavyweight<br/>via OpenRouter"]
    E["Migration Execution<br/>Assemble Mulesoft project<br/>aggregate confidence"]
    F{"Review Gate<br/>score threshold"}
    G["Completed Queue<br/>migration done"]
    H["Needs-Review Queue<br/>flagged policies +<br/>reason annotations"]
    I["Human Reviewer<br/>approve / correct"]
    J["Corpus Feedback<br/>corrections → pgvector index"]

    A --> B
    B --> C
    C --> D
    D --> C
    C --> E
    E --> F
    F -->|high confidence| G
    F -->|low confidence| H
    H --> I
    I -->|approved| G
    I -->|corrected| J
    J --> B

    style A fill:#E3F2FD,color:#0D47A1
    style B fill:#E3F2FD,color:#0D47A1
    style C fill:#E3F2FD,color:#0D47A1
    style E fill:#E3F2FD,color:#0D47A1
    style D fill:#FFF9C4,color:#F57F17
    style G fill:#E8F5E9,color:#1B5E20
    style H fill:#FBE9E7,color:#BF360C
    style J fill:#F3E5F5,color:#4A148C
