Matthew Fitzgerald

ML Engineer | LLM Fine-Tuning & Real-Time Inference | MLOps

End-to-end ML engineering: model development, real-time inference, and the platform layer that makes it reliable at scale.

Education

Florida Tech - B.S. in Computer Science, 2021-2024

Work Experience

ML Ops Engineer at Cognitive Network Solutions - February 2025 - November 2025

  • Designed and deployed multi-cloud infrastructure (GCP + Azure) with Terraform, provisioning scalable GPU node pools, secure storage, service accounts, cloud authentication, and VPC networking
  • Built and maintained Kubernetes platforms with Helm deployments, Ingress controllers, and namespace isolation for reproducible service delivery
  • Engineered CI/CD pipelines with GitLab runners, automating builds, scans, and deployments while embedding secrets management and least-privilege IAM practices
  • Implemented GPU-accelerated ML workflows (TensorFlow, PyTorch, MLflow) for inference and reinforcement learning, with auto-scaling to optimize costs
  • Developed databases secured with role-based access controls and integrated them into microservices
  • Established monitoring and observability stacks (Prometheus, logging, health probes) to support proactive debugging, system reliability, and performance tuning

Junior Fullstack Engineer at EarthCam - February 2026 - May 2026

  • Rebuilt and optimized a core customer-facing component from the ground up, making load times 3–5x faster
  • Integrated data querying APIs across the full application, improving data resolution and reducing fetch overhead
  • Delivered new pages and feature work across the stack in TypeScript and Node.js

Software Engineer, Intern at Dfinitiv.io - Summers 2023 and 2024

  • Engineered secure, cloud-native pipelines on AWS and GCP to automate ingestion and curation of digital media assets, reducing processing time by over 60%
  • Built and maintained asset metadata databases in PostgreSQL and MongoDB, enabling fast, reliable querying across thousands of records
  • Built and deployed applications and microservices using boto3, google-cloud-storage, psycopg2, and pymongo, ensuring scalability and portability

About Me

I'm an ML Engineer with a strong foundation in the infrastructure and platform layer that makes machine learning viable at scale.

At Cognitive Network Solutions, I designed and deployed multi-cloud infrastructure across GCP and Azure, built Kubernetes platforms for reproducible ML service delivery, and engineered CI/CD pipelines with embedded security and least-privilege IAM practices from the ground up. I worked hands-on with GPU-accelerated ML workflows using TensorFlow, PyTorch, and MLflow, building the deployment and observability infrastructure that keeps inference systems reliable in production.

I'm currently expanding deeper into the modeling side: fine-tuning LLMs for domain-specific tasks, building real-time inference pipelines with drift detection, and developing end-to-end ML systems from training through production serving. My infrastructure background means I can own the full lifecycle, not just the model.

Previously at Dfinitiv, I built cloud-native data pipelines on AWS and GCP to automate media asset workflows, cutting processing time by over 60%.

I'm drawn to ML engineering because the best systems require both: models that work and infrastructure that keeps them working.

Skills

Python, Go, TypeScript, Bash / Shell, MLflow, PyTorch, TensorFlow, Hugging Face, scikit-learn, MLX, LLM Fine-Tuning, Pandas, NumPy, Kubernetes, Docker, Helm, Terraform, GCP (GKE / Cloud Run), Azure (AKS / ACR), AWS (EC2 / Lambda / S3), GitLab CI/CD, GitHub Actions, Prometheus / Grafana, IAM & Secrets Management, Kafka, Event Streaming, Distributed Systems, Container Orchestration, PostgreSQL, MongoDB, Redis, BigQuery, ETL Pipelines, FastAPI, Flask, gRPC, RESTful APIs, Infrastructure as Code, CI/CD Automation, System Observability, Linux Systems, Git

Projects

LLM Fine-Tuning Pipeline

Fine-tuned Mistral-7B on financial news with LoRA adapters, served as a streaming inference API with SSE token delivery and an automated ROUGE-L evaluation gate.

Libraries Used: MLX, mlx-lm, FastAPI, LoRA, Hugging Face

View Repository
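
For readers unfamiliar with a ROUGE-L evaluation gate, the idea can be sketched roughly as follows. This is an illustrative, dependency-free approximation and not the repository's code: the function names, the whitespace tokenization, the use of plain F1, and the 0.3 threshold are all assumptions made for the example.

```python
from typing import List, Sequence, Tuple


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence, using a rolling DP row."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """ROUGE-L as an F1 score over lowercased, whitespace-split tokens (assumed tokenization)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


def passes_gate(pairs: List[Tuple[str, str]], threshold: float = 0.3) -> Tuple[bool, float]:
    """Return (passed, mean score) for (candidate, reference) pairs.

    The 0.3 default is a hypothetical threshold chosen for illustration.
    """
    if not pairs:
        raise ValueError("at least one (candidate, reference) pair is required")
    scores = [rouge_l_f1(c, r) for c, r in pairs]
    mean_score = sum(scores) / len(scores)
    return mean_score >= threshold, mean_score
```

In a pipeline, a gate like this would typically run on a held-out set after fine-tuning and block promotion of the adapter when the mean score falls below the threshold.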

ML Platform

Feature store and model registry backed by PostgreSQL and Redis, with scheduled feature pipelines, data quality validation, and MLflow experiment tracking.

Libraries Used: FastAPI, PostgreSQL, Redis, MLflow, APScheduler, SQLAlchemy

View Repository

ML Drift Monitor

Real-time drift detection service using ADWIN on live inference streams, with scheduled retraining, webhook alerting, and automatic model promotion via the registry.

Libraries Used: FastAPI, River, MLflow, APScheduler, PostgreSQL

View Repository

RAG Pipeline

Hybrid retrieval pipeline combining BM25 keyword search with dense vector retrieval, cross-encoder reranking, SSE token streaming, and per-sentence citation tracking.

Libraries Used: FastAPI, ChromaDB, BM25, sentence-transformers, mlx-lm

View Repository
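
One common way to combine a BM25 ranking with a dense-vector ranking is reciprocal rank fusion (RRF). The sketch below assumes RRF purely for illustration; the project description does not say which fusion method the pipeline uses, and the function name and the constant k=60 are choices made for this example.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document ids into a single ranking.

    Each document scores 1 / (k + rank) for every list it appears in,
    with ranks starting at 1. Ties are broken by document id so the
    output is deterministic.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical document ids from the two retrievers.
bm25_hits = ["d1", "d2", "d3"]
dense_hits = ["d3", "d1", "d4"]
fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
```

Documents that both retrievers rank highly rise to the top, and the fused list is what a cross-encoder reranker would then rescore.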

LLM Agent

ReAct-style agent with tools spanning a RAG pipeline, feature store, and drift monitor, with multi-turn conversation memory and live SSE streaming of each reasoning step.

Libraries Used: FastAPI, mlx-lm, SQLAlchemy, PostgreSQL, httpx

View Repository

LLM Guardrails

Security proxy layer in front of the LLM agent enforcing semantic injection detection, PII scrubbing, per-client rate limit tiers, replay protection, and full audit logging.

Libraries Used: FastAPI, Redis, PostgreSQL, sentence-transformers, SQLAlchemy

View Repository
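
Per-client rate limit tiers are often implemented with a token bucket per client. The following is a minimal in-memory sketch under that assumption; the project itself lists Redis, and the class names, tier values, and explicit `now` argument here are hypothetical and exist only to keep the example deterministic.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Hypothetical tiers: name -> (bucket capacity, tokens refilled per second).
TIERS: Dict[str, Tuple[float, float]] = {"free": (2.0, 1.0), "pro": (20.0, 10.0)}


@dataclass
class TokenBucket:
    capacity: float
    refill_rate: float
    tokens: float
    updated_at: float

    def allow(self, now: float) -> bool:
        """Refill based on elapsed time, then spend one token if available."""
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.updated_at = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class TieredRateLimiter:
    """Keeps one bucket per client, sized by the client's tier."""

    def __init__(self, tiers: Dict[str, Tuple[float, float]]) -> None:
        self.tiers = tiers
        self.buckets: Dict[str, TokenBucket] = {}

    def allow(self, client_id: str, tier: str, now: float) -> bool:
        if client_id not in self.buckets:
            capacity, rate = self.tiers[tier]
            # New clients start with a full bucket.
            self.buckets[client_id] = TokenBucket(capacity, rate, capacity, now)
        return self.buckets[client_id].allow(now)
```

In a multi-instance deployment the bucket state would need to live in a shared store and be updated atomically, which an in-process dictionary like this one does not provide.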