SWF
Visual analysis tool for the interpretation of geometric symbologies and wave frequencies.
As a software architect, I faced the challenge of building a system capable of interpreting complex visual information (specifically, geometric symbologies and wave frequencies in the context of Dakila technologies) and cross-referencing it against a highly specialized knowledge base. The core difficulty was processing unstructured visual artifacts and correlating them accurately with extensive documentary records, without the language models hallucinating.
To solve this, I designed a progressive web application (PWA) that acts as a bridge between computer vision and Retrieval-Augmented Generation (RAG). The system is not a simple chatbot; it is an inference engine that orchestrates multimodal models to extract characteristics from images (geometric patterns, lines, curves, colors) and uses those attributes as search vectors against a specialized vector database.
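To make the "attributes as search vectors" idea concrete, here is a minimal sketch of how extracted features could be flattened into a single query string before embedding. The `VisualFeatures` shape and the `featuresToQuery` helper are hypothetical names chosen for illustration; the actual schema returned by the multimodal model is not shown in this write-up.

```typescript
// Hypothetical shape for the attributes the multimodal model extracts.
// The field names mirror the categories listed above; they are assumptions.
interface VisualFeatures {
  patterns: string[];
  lines: string[];
  curves: string[];
  colors: string[];
}

// Flatten the features into one normalized string that an embedding model
// could turn into a search vector. Empty categories are skipped.
function featuresToQuery(features: VisualFeatures, userContext: string = ""): string {
  const parts: string[] = [];

  const add = (label: string, values: string[]): void => {
    const cleaned = values
      .map((v) => v.trim().toLowerCase())
      // Drop blanks and duplicates so repeated detections do not skew the query.
      .filter((v, i, all) => v.length > 0 && all.indexOf(v) === i);
    if (cleaned.length > 0) {
      parts.push(`${label}: ${cleaned.join(", ")}`);
    }
  };

  add("patterns", features.patterns);
  add("lines", features.lines);
  add("curves", features.curves);
  add("colors", features.colors);

  const context = userContext.trim();
  if (context.length > 0) {
    parts.push(`context: ${context}`);
  }
  return parts.join("; ");
}
```

Normalizing and deduplicating before embedding keeps the query stable across runs in which the model reports the same feature with different casing or spacing.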
Main Operating Flow
From an operational perspective, the system executes a deterministic sequence in three main phases:
- Analytical Ingestion: The user provides an image and, optionally, textual context. The system processes and temporarily stores the visual file using Vercel Blob, preparing it for multimodal analysis.
- Extraction and Vector Search (RAG): The AI agent (powered by Mastra and foundation models) executes an initial visual analysis to extract metadata from the image. Immediately after, the system vectorizes these findings and queries the internal knowledge base. This step is what anchors any subsequent statements in the official literature.
- Synthesis and State Return: The engine consolidates the visual findings with the retrieved records, generating a structured response. The conversation maintains state throughout the session using a dedicated memory system, allowing follow-up iterations on the same visual artifact.
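The retrieval step in the second phase can be illustrated with an in-memory stand-in for the vector store. This is a sketch under stated assumptions, not the project's code: in the real system the LibSQL vector store performs the similarity search, and the `topKChunks` helper, the `k` default, and the `minScore` threshold are hypothetical.

```typescript
// Mirrors the knowledge chunk described in the data model; the in-memory
// array below stands in for the vector store.
interface KnowledgeChunk {
  chunkId: string;
  content: string;
  source: string;
  embedding: number[];
}

interface ScoredChunk {
  chunk: KnowledgeChunk;
  score: number;
}

// Cosine similarity between two vectors of equal dimension.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Embedding dimensions differ");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction, so treat it as unrelated to everything.
  if (normA === 0 || normB === 0) {
    return 0;
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k most similar chunks whose score reaches minScore.
// The threshold is an assumed guard: an empty result signals that the
// agent has nothing in the knowledge base to ground its answer on.
function topKChunks(
  query: number[],
  chunks: KnowledgeChunk[],
  k: number = 3,
  minScore: number = 0.5
): ScoredChunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.embedding) }))
    .filter((scored) => scored.score >= minScore)
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

An empty result is useful information in its own right: the synthesis phase can then state that no supporting record was found instead of producing an unsupported answer.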
Architectural Dissection
To build this system, I opted for a decoupled modular architecture, prioritizing performance and a clear separation of responsibilities between the presentation layer and cognitive orchestration.
General Architecture
The project follows a Decoupled Service-Oriented Architecture pattern within an Astro-based ecosystem. The underlying technical reason for this decision is to isolate the computational complexity of the AI agent from the user interface rendering, guaranteeing scalability and maintainability.
- Presentation Layer (Frontend): Built with Astro and React. I opted for Astro because of its “islands” philosophy, which allows hydrating complex interactions (such as 3D visualizers or the chat interface) only when necessary. The interface is orchestrated by high-level components that delegate rendering to specialized subcomponents.
- Orchestration and API Layer: Exposed through secure API routes acting as middleware. This layer handles file uploads, real-time distributed configuration verification, and streaming request delegation to the underlying cognitive engine.
- Cognitive Engine (Agentic Layer): Represents the core of the domain. I configured an autonomous Agent equipped with vector query tools and multimodal capabilities, completely encapsulating the inference logic and RAG interaction.
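The gatekeeping role of the API layer can be sketched as a pure validation function that runs before anything is uploaded or delegated to the agent. Everything here is an assumption made for illustration: the `validateUpload` name, the allowed MIME types, the size limit, and the `analysisEnabled` flag (standing in for the value read during configuration verification) do not come from the project.

```typescript
// Minimal description of an uploaded file, decoupled from any framework type.
interface UploadCandidate {
  name: string;
  mimeType: string;
  sizeBytes: number;
}

type ValidationResult =
  | { ok: true }
  | { ok: false; status: number; error: string };

// Assumed limits; the real route may use different values.
const ALLOWED_MIME_TYPES = ["image/png", "image/jpeg", "image/webp"];
const MAX_SIZE_BYTES = 5 * 1024 * 1024;

// Decide whether a request may proceed to upload and analysis.
// analysisEnabled represents a flag read from distributed configuration.
function validateUpload(
  file: UploadCandidate | null,
  analysisEnabled: boolean
): ValidationResult {
  if (!analysisEnabled) {
    return { ok: false, status: 503, error: "Analysis is temporarily disabled" };
  }
  if (file === null) {
    return { ok: false, status: 400, error: "No image provided" };
  }
  if (ALLOWED_MIME_TYPES.indexOf(file.mimeType) === -1) {
    return { ok: false, status: 415, error: "Unsupported image type: " + file.mimeType };
  }
  if (file.sizeBytes > MAX_SIZE_BYTES) {
    return { ok: false, status: 413, error: "Image exceeds the size limit" };
  }
  return { ok: true };
}
```

Keeping this check free of framework types means it can be unit-tested without spinning up the route, and rejected requests never reach storage or the model.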
graph TD
%% Architecture Diagram
User([User]) --> |Uploads Image / Message| UI[UI Presentation Layer\nReact / Astro]
UI --> |FormData| API[API Middleware\n/api/analyze.ts]
subgraph Edge Infrastructure
API --> |Upload| BlobStore[(Vercel Blob)]
API --> |State Verification| EdgeConfig[(Edge Config)]
end
API --> |Stream Request\nContext + Image URL| Agent[Cognitive Engine\nMastra Agent]
subgraph Cognitive Layer
Agent --> |Entity Extraction| LLM[Google Gemini Multimodal]
Agent --> |Vector Query| VectorTool[RAG Search Tool]
VectorTool --> |Embeddings Text-004| VectorDB[(LibSQL Vector Store)]
Agent --> |Context Management| Memory[(LibSQL Store\nMastra Memory)]
end
LLM --> |Synthesized Response| Agent
Agent --> |Server-Sent Events| API
API --> |Stream| UI
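The Server-Sent Events hop in the diagram relies on a simple text framing: each event is a set of `field: value` lines terminated by a blank line. The sketch below shows that framing in isolation; `toSseFrame` and `parseSseData` are hypothetical helper names, and the project may well rely on its framework's streaming utilities instead.

```typescript
// Encode one payload as a Server-Sent Events frame.
// Multi-line payloads are split into several "data:" lines, as the format
// requires. The event name must not contain line breaks.
function toSseFrame(data: string, event?: string): string {
  const lines: string[] = [];
  if (event !== undefined) {
    lines.push("event: " + event);
  }
  for (const line of data.split(/\r?\n/)) {
    lines.push("data: " + line);
  }
  // A blank line marks the end of the event.
  return lines.join("\n") + "\n\n";
}

// Recover the payload from a single frame, which is roughly what a
// client-side reader does after buffering up to the blank line.
function parseSseData(frame: string): string {
  return frame
    .split("\n")
    .filter((line) => line.indexOf("data: ") === 0)
    .map((line) => line.slice("data: ".length))
    .join("\n");
}
```

Splitting on line breaks matters because a raw newline inside a single `data:` line would end the field early and corrupt the streamed text.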
Data Modeling and State Management
The application demands strict control over the conversational state and the processed artifacts. To achieve this, I implemented a persistence model based on execution threads.
- Frontend State Management: Handled centrally through React hooks and context containers, maintaining a strict unidirectional flow of data to the rendering components (chat, interactive visualizer, visual feedback).
- Backend State Management: The agentic framework uses transactional memory modules to autonomously persist the message history in an embedded database, linking each message to its thread through unique identifiers.
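The unidirectional flow on the frontend can be illustrated with a plain reducer: components dispatch actions, and the only way state changes is by producing a new state object. The action names and the `ChatState` shape below are assumptions made for this sketch, not the project's actual types.

```typescript
type Role = "user" | "assistant";

interface ChatMessage {
  id: string;
  role: Role;
  content: string;
}

interface ChatState {
  threadId: string | null;
  messages: ChatMessage[];
  isStreaming: boolean;
}

// Hypothetical action set covering a thread start and a streamed reply.
type ChatAction =
  | { type: "thread/started"; threadId: string }
  | { type: "message/added"; message: ChatMessage }
  | { type: "stream/chunk"; messageId: string; text: string }
  | { type: "stream/finished" };

const initialChatState: ChatState = {
  threadId: null,
  messages: [],
  isStreaming: false,
};

// Pure reducer: never mutates the previous state, always returns a new one.
function chatReducer(state: ChatState, action: ChatAction): ChatState {
  switch (action.type) {
    case "thread/started":
      return { threadId: action.threadId, messages: [], isStreaming: false };
    case "message/added":
      return { ...state, messages: [...state.messages, action.message] };
    case "stream/chunk":
      // Append the streamed text to the message it belongs to.
      return {
        ...state,
        isStreaming: true,
        messages: state.messages.map((m) =>
          m.id === action.messageId ? { ...m, content: m.content + action.text } : m
        ),
      };
    case "stream/finished":
      return { ...state, isStreaming: false };
    default:
      return state;
  }
}
```

In a React component, a reducer like this would typically be passed to `useReducer`, with the resulting state handed down to the chat and visualizer components as props or through context.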
erDiagram
%% Data Model
THREAD {
string threadId PK
datetime createdAt
}
MESSAGE {
string messageId PK
string role "user | assistant"
text content
string threadId FK
}
RESOURCE {
string resourceId PK
string publicUrl "Blob Storage Access URL"
}
KNOWLEDGE_CHUNK {
string chunkId PK
vector embedding "Dimension: 768"
text content
string source
}
THREAD ||--o{ MESSAGE : contains
THREAD ||--o| RESOURCE : contextualizes
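The relationships in the diagram above translate directly into types. The sketch below mirrors the entities as TypeScript interfaces and adds a hypothetical `InMemoryThreadStore` to show the two cardinalities (many messages per thread, at most one resource per thread). It is an illustration only: in the project, persistence is delegated to the agent framework's memory backed by LibSQL, and the diagram does not specify how a resource is keyed to its thread, so the lookup by `threadId` is an assumption.

```typescript
interface Thread {
  threadId: string;
  createdAt: Date;
}

interface Message {
  messageId: string;
  role: "user" | "assistant";
  content: string;
  threadId: string;
}

interface Resource {
  resourceId: string;
  publicUrl: string;
}

// Hypothetical in-memory stand-in for the persistence layer.
class InMemoryThreadStore {
  private threads: { [threadId: string]: Thread } = Object.create(null);
  private resources: { [threadId: string]: Resource } = Object.create(null);
  private messages: Message[] = [];

  createThread(threadId: string, createdAt: Date = new Date()): Thread {
    const thread: Thread = { threadId, createdAt };
    this.threads[threadId] = thread;
    return thread;
  }

  // THREAD ||--o{ MESSAGE: a message must belong to an existing thread.
  appendMessage(message: Message): void {
    this.requireThread(message.threadId);
    this.messages.push(message);
  }

  // THREAD ||--o| RESOURCE: attaching again replaces the previous resource.
  attachResource(threadId: string, resource: Resource): void {
    this.requireThread(threadId);
    this.resources[threadId] = resource;
  }

  history(threadId: string): Message[] {
    return this.messages.filter((m) => m.threadId === threadId);
  }

  resourceFor(threadId: string): Resource | null {
    return threadId in this.resources ? this.resources[threadId] : null;
  }

  private requireThread(threadId: string): void {
    if (!(threadId in this.threads)) {
      throw new Error("Unknown thread: " + threadId);
    }
  }
}
```

Follow-up questions about the same image then amount to reading `history(threadId)` and `resourceFor(threadId)` before each new agent call.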
Technology Stack
The selection of tools in this project represents a meticulous balance between theoretical innovation, development speed, and execution efficiency.
| Layer / Domain | Technology | Technical Justification and Role |
|---|---|---|
| Core Framework | Astro + React 19 | Astro provides efficient routing and an optimized rendering model. React manages the reactive state in interactive islands. |
| Styles and Interface | Tailwind CSS v4, shadcn/ui, Framer Motion | Utility-first design system, accessible components without coupling, and high-performance declarative animations. |
| 3D Rendering | react-three-fiber, @react-three/postprocessing | Declarative abstraction of WebGL to render complex scenes and visual effects linked to the analyzed features. |
| Cognitive Engine | Mastra Framework (@mastra/core) | Orchestrator of AI agents and workflows. Defines the tools schema, memory management, and base analysis instructions. |
| AI Models | Google Gemini (2.0 Flash / 3.0 Pro) | Foundation models responsible for visual analysis, high-dimensional embedding generation, and natural language synthesis. |
| Inference and RAG | Mistral AI (OCR) | Document ingestion: optical character recognition that structures PDFs before vectorization. |
| Data Persistence | LibSQL / Vercel Edge Config / Vercel Blob | LibSQL operates dually as a transactional and vector engine. Edge Config and Blob manage distributed configurations and static binaries. |
Architectural Impact
This design yields a cohesive system in which multimodal analysis and retrieval work together, with response fidelity resting on the Retrieval-Augmented Generation layer. By moving the inference logic into an autonomous agent environment and abstracting storage behind embedded, edge-native databases, I isolated the presentation layer from the computational bottlenecks typical of generative AI systems. The result is a system that scales cleanly and predictably while keeping every data flow architecturally consistent.