Autonomous AI Agent for End-to-End Component Data Extraction

1. Objective

  • Streamline: complex, error-prone manual data entry
  • Reallocate: engineering talent to high-value innovation
  • Automate: component administration, end to end

[Figure: Objective (2025 GTC poster)]

2. Challenges

  • Interpreting ambiguous engineering terminology
  • Requiring specialized engineering knowledge
  • Handling inconsistent specification formats
  • Parsing complex engineering graphics

3. Our Solution

3.1 Solution Overview
  • Developed multi-phase data processing to enhance engineering graphic recognition
  • Implemented AI-driven engineering data collection and analysis
  • Integrated on-premise and public cloud resources
3.2 Intelligent Processing Pipeline

Transforms unstructured engineering content into structured, multimodal data

[Figure: Intelligent Processing Pipeline (2025 GTC poster)]

3.2.1 Data Preprocessing
  • Unifies text, tables, and images into a single, context-rich JSON document for the LLM
  • Applies OCR to extract and map image text to its corresponding parent page
  • Loads customizable, component-specific attribute lists for direct extraction
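
The preprocessing steps above can be sketched as follows. This is a minimal illustration, not the poster's actual implementation: the `PageRecord` schema, the `build_document` function, and the sample datasheet values are all hypothetical, and OCR is assumed to have already produced `(page_number, text)` pairs.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class PageRecord:
    """Everything extracted from one datasheet page (hypothetical schema)."""
    page: int
    text: str = ""
    tables: list = field(default_factory=list)      # each table is a list of rows
    image_text: list = field(default_factory=list)  # OCR strings from images on this page


def build_document(text_blocks, tables, ocr_results, attributes):
    """Merge per-page text, tables, and OCR output into one JSON-ready dict.

    Every input is a list of (page_number, payload) pairs, so OCR text is
    attached to the page that its source image came from.
    """
    pages = {}

    def record(page_number):
        return pages.setdefault(page_number, PageRecord(page=page_number))

    for page_number, text in text_blocks:
        rec = record(page_number)
        rec.text = (rec.text + "\n" + text).strip()
    for page_number, rows in tables:
        record(page_number).tables.append(rows)
    for page_number, ocr_text in ocr_results:
        record(page_number).image_text.append(ocr_text)

    return {
        "attributes": list(attributes),  # component-specific attribute list
        "pages": [asdict(pages[n]) for n in sorted(pages)],
    }


# Made-up sample data for a capacitor datasheet.
doc = build_document(
    text_blocks=[(1, "Multilayer ceramic capacitor, 10 uF")],
    tables=[(1, [["Parameter", "Value"], ["Rated voltage", "16 V"]])],
    ocr_results=[(2, "L = 3.2 mm, W = 1.6 mm")],
    attributes=["capacitance", "rated_voltage", "length", "width"],
)
print(json.dumps(doc, indent=2))
```

Keying every record by page number is what lets a later retrieval step hand the LLM a whole page, including text recovered from its drawings.
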
3.2.2 Retrieval Augmentation
  • Implements Hybrid Search (keyword + vector) for high-precision retrieval
  • Utilizes Parent Page Retrieval to preserve critical context
  • Employs a cross-encoder reranker to optimize semantic similarity and result relevance
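
One way the three retrieval steps could fit together is sketched below. Several parts are assumptions rather than details from the poster: the two rankings are fused with reciprocal rank fusion, a character-trigram counter stands in for a real embedding model, and `rerank` accepts any scoring function where a production system would call a cross-encoder. The chunk and page data are made up.

```python
import math
import re
from collections import Counter


def tokens(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_score(query, text):
    """Fraction of query terms that appear in the text."""
    q = set(tokens(query))
    return len(q & set(tokens(text))) / len(q) if q else 0.0


def embed(text):
    """Toy embedding: character-trigram counts (stand-in for a real model)."""
    s = text.lower()
    return Counter(s[i:i + 3] for i in range(len(s) - 2))


def cosine(a, b):
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def hybrid_search(query, chunks, top_n=3, k=60):
    """Fuse a keyword ranking and a vector ranking with reciprocal rank fusion."""
    query_vec = embed(query)
    by_keyword = sorted(chunks, key=lambda c: keyword_score(query, c["text"]), reverse=True)
    by_vector = sorted(chunks, key=lambda c: cosine(query_vec, embed(c["text"])), reverse=True)
    fused = Counter()
    for ranking in (by_keyword, by_vector):
        for rank, chunk in enumerate(ranking, start=1):
            fused[chunk["id"]] += 1.0 / (k + rank)
    return sorted(chunks, key=lambda c: fused[c["id"]], reverse=True)[:top_n]


def rerank(query, hits, score_fn):
    """Reorder hits by a (query, text) scorer; a cross-encoder would plug in here."""
    return sorted(hits, key=lambda c: score_fn(query, c["text"]), reverse=True)


def parent_pages(hits, pages):
    """Return the full parent page of each hit, in hit order, without duplicates."""
    ordered = []
    for chunk in hits:
        if chunk["page"] not in ordered:
            ordered.append(chunk["page"])
    return [(number, pages[number]) for number in ordered]


pages = {
    1: "Rated voltage: 16 V DC. Operating temperature range: -55 to 125 C.",
    2: "Package dimensions: length 3.2 mm, width 1.6 mm.",
}
chunks = [
    {"id": "a", "page": 1, "text": "Rated voltage: 16 V DC"},
    {"id": "b", "page": 2, "text": "Package dimensions: length 3.2 mm, width 1.6 mm"},
    {"id": "c", "page": 1, "text": "Operating temperature range: -55 to 125 C"},
]
hits = rerank("rated voltage", hybrid_search("rated voltage", chunks, top_n=2), keyword_score)
context = parent_pages(hits[:1], pages)
```

Searching over small chunks and then returning the whole parent page keeps matching precise while still giving the LLM the surrounding table rows and notes.
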
3.2.3 Context Engineering
  • Utilizes one universal prompt to handle all component types
  • Embeds engineering terminology intent directly into the model
  • Pinpoints correct attributes by component model name (especially dimensions)
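
A universal prompt of this kind might look like the sketch below. The template wording, the `build_prompt` helper, the glossary entries, and the model name `EXAMPLE-3216` are all hypothetical. Supplying terminology through a glossary section of the prompt is also an assumption, since the poster does not say how terminology intent is embedded.

```python
# Hypothetical template: one prompt shared by every component type.
UNIVERSAL_PROMPT = """You extract component attributes from datasheet pages.
Component type: {component_type}
Model name: {model_name}

Terminology notes:
{glossary}

Return a JSON object with exactly these keys (use null when a value is absent):
{attributes}

Only report values that apply to model {model_name}. Datasheets often list
several models in one table, especially for dimensions.

Datasheet context:
{context}
"""


def build_prompt(component_type, model_name, attributes, glossary, context):
    """Fill the shared template with component-specific details."""
    glossary_lines = "\n".join(f"- {term}: {meaning}" for term, meaning in glossary.items())
    attribute_lines = "\n".join(f"- {name}" for name in attributes)
    return UNIVERSAL_PROMPT.format(
        component_type=component_type,
        model_name=model_name,
        glossary=glossary_lines,
        attributes=attribute_lines,
        context=context,
    )


prompt = build_prompt(
    component_type="capacitor",
    model_name="EXAMPLE-3216",  # made-up model name
    attributes=["rated_voltage", "length", "width"],
    glossary={"WV": "working (rated) voltage", "L": "package length in mm"},
    context="EXAMPLE-3216: WV 16 V, L 3.2, W 1.6",
)
```

Only the filled-in fields change between component types, so prompt maintenance stays in one place.
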
3.3 Data-Oriented Hybrid Cloud
  • On-premise AI platform powered by OCP Grand Teton servers (8x NVIDIA H100 GPUs)
  • Software stack built on NVIDIA AI Enterprise, running NVIDIA Dynamo
  • Intelligently escalates high-demand workloads to public LLMs (GPT, Gemini, Claude)
  • Architecture balances data security, cost, and peak performance by distributing workloads between on-premise and public cloud resources
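
The escalation logic could be sketched as a small routing function. The poster does not describe the actual policy, so everything here is an assumption: the `Job` fields, the token-based capacity measure, and the rule that confidential documents never leave the on-premise platform.

```python
from dataclasses import dataclass


@dataclass
class Job:
    job_id: str
    confidential: bool  # assumed flag: must this document stay on-premise?
    est_tokens: int     # assumed size estimate for the LLM request


def route(job, queued_tokens, capacity_tokens):
    """Pick a backend for one job.

    Assumed policy: confidential jobs always stay on-premise; other jobs
    stay on-premise while capacity remains and escalate to a public LLM
    only when the on-premise queue would overflow.
    """
    if job.confidential:
        return "on_prem"
    if queued_tokens + job.est_tokens <= capacity_tokens:
        return "on_prem"
    return "public_cloud"


# Made-up load figures.
quiet = route(Job("j1", confidential=False, est_tokens=4_000),
              queued_tokens=10_000, capacity_tokens=100_000)
peak = route(Job("j2", confidential=False, est_tokens=4_000),
             queued_tokens=98_000, capacity_tokens=100_000)
secret = route(Job("j3", confidential=True, est_tokens=4_000),
               queued_tokens=98_000, capacity_tokens=100_000)
```

Checking confidentiality before load means a traffic spike can raise cost or latency but cannot send sensitive data off-site.
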

[Figure: Data-Oriented Hybrid Cloud (2025 GTC poster)]

4. Key Achievements

  • Efficiency: 83% reduction in component processing time (2 hours to 20 minutes)
  • Autonomy: Fully autonomous, end-to-end AI agent for component database automation
  • Scalability: Scalable engineering-document pipeline for knowledge distillation
  • Reliability: Hybrid infrastructure for non-stop service
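
The efficiency figure follows directly from the two quoted processing times, as this short check shows (the helper name `percent_reduction` is illustrative only):

```python
def percent_reduction(before_minutes, after_minutes):
    """Percentage by which processing time dropped."""
    return 100.0 * (before_minutes - after_minutes) / before_minutes


reduction = percent_reduction(120, 20)  # 2 hours -> 20 minutes
print(f"{reduction:.1f}% reduction")    # prints "83.3% reduction"
```
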

 

[Figure: Key Achievements (2025 GTC poster)]
