#ML, #internship

Electrochemical Multimodal LLM

Published Sep 1, 2025 by Noah Cauchi

Graph peak detection comparison between GPT-4.1 mini and my model


Democratizing Electrochemical Analysis with Vision-Language AI - From 5% to 95% Accuracy

Timeline: 10 weeks
Role: Applied ML Intern
Team: Fischell Institute of Biomedical Devices

🎯 Project Overview

What: Fine-tuned Qwen2.5-VL to analyze electrochemical graphs from cyclic voltammetry (CV) and differential pulse voltammetry (DPV), extracting quantitative features and providing scientific interpretations for integration into autonomous laboratory systems.
Why: Electrochemistry offers fast, accurate chemical measurements, which makes it attractive across research and industry. However, interpreting the data requires specialized, nuanced analysis, and that creates a major bottleneck. This model removes that bottleneck, so both human researchers and agentic systems can use electrochemistry to its full potential.
Impact: Achieved 95% accuracy on peak detection and electrochemical question-answering, creating a robust tool for autonomous experimental workflows and data interpretation pipelines.

Quantified Performance


🔧 Technical Implementation

Key Technologies & Tools

  • Model & Framework: Fine-tuned the multimodal Qwen2.5-VL model using the ms-swift framework
  • Training Techniques: Supervised fine-tuning (SFT), LoRA for compute efficiency, and full ViT tuning for visual understanding
  • Infrastructure: Remote hardware with >600 TFLOPS compute capacity
  • Data Pipeline: Python scripts and web tools using templating, augmentation, and dataset mixing strategies
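
To make the "LoRA for compute efficiency" point concrete, here is a minimal NumPy sketch of the low-rank update idea. It is not the ms-swift implementation and not the configuration used in this project; the layer size, rank, and alpha below are assumed values chosen only for illustration.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha):
    """Apply a frozen weight W plus a low-rank update scaled by alpha / r."""
    r = A.shape[0]
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

rng = np.random.default_rng(0)
d_in, d_out, r = 1024, 1024, 8          # hypothetical layer size and rank

W = rng.normal(size=(d_out, d_in))      # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01   # trainable, small random init
B = np.zeros((d_out, r))                # trainable, zero init so the update starts at zero

x = rng.normal(size=(2, d_in))
y = lora_forward(x, W, A, B, alpha=16)

full_params = W.size
lora_params = A.size + B.size
print(f"trainable fraction: {lora_params / full_params:.4f}")
```

With these assumed sizes, only the two small matrices A and B are trained, which is a small fraction of the parameters in the full weight matrix.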
Data Pipeline

Technical Approach

Developed a vision-language model that bridges electrochemical graph analysis and scientific reasoning. The model handles peak detection, curve comparison, anomaly detection, and concentration trends. It processes visual features through fine-tuned encoders and retains chemical knowledge in its language components.

Built comprehensive infrastructure including:

Evaluation Tool


🚧 Challenges & Problem-Solving

Challenge: Limited raw electrochemical data

Solution: Built a data pipeline that combines vision-language templating, automated augmentation, and synthetic generation. Developed web applications for efficient expert annotation and quality control. Scaled from limited raw data to 25,000 high-quality training samples through strategic augmentation while maintaining scientific validity.
Skills Demonstrated: Data engineering, web development, quality assurance systems
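
As a rough illustration of how templating and label-preserving augmentation can multiply a small dataset, here is a minimal sketch. The question templates, the curve dictionary layout, and the gain/offset ranges are all hypothetical and are not taken from the actual pipeline; rendering the curve to an image is left out.

```python
import random

# Hypothetical question templates; the real pipeline's wording is not shown in this post.
QUESTION_TEMPLATES = [
    "What is the potential of the main peak in this {technique} scan?",
    "At what voltage does the {technique} curve reach its peak current?",
]

def augment_curve(potentials, currents, rng, gain_range=(0.8, 1.2), offset_range=(-0.5, 0.5)):
    """Scale and shift the currents; a positive gain and a constant offset keep the peak position."""
    gain = rng.uniform(*gain_range)
    offset = rng.uniform(*offset_range)
    return list(potentials), [gain * c + offset for c in currents]

def make_sample(curve, rng):
    """Build one question/answer pair from an augmented copy of a curve."""
    potentials, currents = augment_curve(curve["potentials"], curve["currents"], rng)
    peak_idx = max(range(len(currents)), key=currents.__getitem__)
    question = rng.choice(QUESTION_TEMPLATES).format(technique=curve["technique"])
    answer = f"The main peak is at {potentials[peak_idx]:.2f} V."
    return {"potentials": potentials, "currents": currents,
            "question": question, "answer": answer}

rng = random.Random(42)
curve = {
    "technique": "DPV",
    "potentials": [-0.2, -0.1, 0.0, 0.1, 0.2, 0.3],
    "currents": [1.0, 2.0, 5.0, 9.0, 4.0, 2.0],
}
samples = [make_sample(curve, rng) for _ in range(5)]
print(samples[0]["question"], samples[0]["answer"])
```

Restricting the augmentation to transforms that cannot move the peak is one simple way to keep the labels scientifically valid; noisier augmentations would need the labels to be re-checked.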

Challenge: Ensuring reliable integration with autonomous systems

Solution: Created structured output formats with confidence scores and error bounds. Implemented response parsing that extracts numerical values, peak coordinates, and interpretations in machine-readable formats for downstream agents.
Skills Demonstrated: API design, system integration, reliability engineering
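
The parsing step can be sketched as follows. The JSON schema here (the field names peaks, potential_v, current_ua, confidence, and interpretation) is an assumption made for illustration, not the project's actual output format.

```python
import json
from dataclasses import dataclass

@dataclass
class Peak:
    potential_v: float
    current_ua: float
    confidence: float

def parse_response(text):
    """Extract and validate the JSON object embedded in a model response.

    Raises ValueError if no object is found, the JSON is malformed,
    a peak field is missing, or a confidence is outside [0, 1].
    """
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in response")
    try:
        payload = json.loads(text[start:end + 1])
    except json.JSONDecodeError as exc:
        raise ValueError(f"malformed JSON: {exc}") from exc

    peaks = []
    for item in payload.get("peaks", []):
        try:
            peak = Peak(float(item["potential_v"]),
                        float(item["current_ua"]),
                        float(item["confidence"]))
        except (KeyError, TypeError) as exc:
            raise ValueError(f"invalid peak entry: {item!r}") from exc
        if not 0.0 <= peak.confidence <= 1.0:
            raise ValueError(f"confidence out of range: {peak.confidence}")
        peaks.append(peak)
    return peaks, str(payload.get("interpretation", ""))

# Hypothetical model response with free text around the JSON payload.
response = (
    'Here is the analysis: {"peaks": [{"potential_v": 0.21, "current_ua": 12.5, '
    '"confidence": 0.93}], "interpretation": "Single oxidation peak."}'
)
peaks, interpretation = parse_response(response)
print(peaks, interpretation)
```

Failing loudly on malformed output, rather than guessing, lets a downstream agent retry or escalate instead of acting on a bad number.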

Challenge: Understanding and improving model failures

Solution: Built a comprehensive evaluation suite that visualizes model responses, tracks failure modes across different electrochemical scenarios, and identifies systematic errors. This enables rapid iteration by pinpointing exactly where and why the model struggles.
Skills Demonstrated: Error analysis, visualization tools, debugging complex systems
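
One way to track failure modes by scenario is sketched below. The matching tolerance, the record layout, and the scenario labels are assumptions for illustration; the actual evaluation suite may score responses differently.

```python
from collections import defaultdict

def match_peaks(predicted, expected, tol_v=0.02):
    """Greedily match predicted to expected peak potentials within tol_v volts.

    Returns (matched, missed, spurious) counts.
    """
    remaining = list(predicted)
    matched = 0
    for target in expected:
        best = min(remaining, key=lambda p: abs(p - target), default=None)
        if best is not None and abs(best - target) <= tol_v:
            remaining.remove(best)
            matched += 1
    return matched, len(expected) - matched, len(remaining)

def summarize(records, tol_v=0.02):
    """Aggregate counts per scenario; a record is correct only with no missed or spurious peaks."""
    stats = defaultdict(lambda: {"total": 0, "correct": 0, "missed": 0, "spurious": 0})
    for rec in records:
        _, missed, spurious = match_peaks(rec["predicted"], rec["expected"], tol_v)
        s = stats[rec["scenario"]]
        s["total"] += 1
        s["correct"] += int(missed == 0 and spurious == 0)
        s["missed"] += missed
        s["spurious"] += spurious
    return dict(stats)

# Hypothetical evaluation records: peak potentials in volts.
records = [
    {"scenario": "CV", "predicted": [0.21, -0.15], "expected": [0.20, -0.14]},
    {"scenario": "CV", "predicted": [0.30], "expected": [0.20, -0.14]},
    {"scenario": "DPV", "predicted": [0.05], "expected": [0.05]},
]
summary = summarize(records)
for scenario, s in summary.items():
    print(scenario, s)
```

Separating missed peaks from spurious ones shows whether the model tends to overlook features or to hallucinate them, which points to different fixes in the training data.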

Some Analysis Metrics

📊 Results & Impact

What’s Next

Expanding integration with autonomous lab systems for closed-loop experimentation. The evaluation suite continues to guide improvements, particularly for edge cases in complex multi-electron processes.


💡 Key Takeaways

Technical Skills Developed:

Data Pipeline Innovations:

Evaluation Framework:




*****

© 2025, Noah Cauchi