Matthew Fitzgerald

ML Infrastructure Engineer | MLOps & Platform Engineering | Cloud-Native Distributed Systems

Building the infrastructure that makes ML systems reliable, scalable, and production-ready.

Education

Florida Tech - B.S. in Computer Science, 2021-2024

Work Experience

MLOps Engineer at Cognitive Network Solutions - February 2025 - November 2025

  • Designed and deployed multi-cloud infrastructure (GCP + Azure) with Terraform, enabling scalable GPU node pools, secure storage, service accounts, cloud authentication, and VPC networking
  • Built and maintained Kubernetes platforms with Helm deployments, Ingress controllers, and namespace isolation for reproducible service delivery
  • Engineered CI/CD pipelines with GitLab runners, automating builds, scans, and deployments while embedding secrets management and least-privilege IAM practices
  • Implemented GPU-accelerated ML workflows (TensorFlow, PyTorch, MLflow) for inference and reinforcement learning, with auto-scaling to optimize costs
  • Developed databases secured with role-based access control and integrated them into microservices
  • Established monitoring and observability stacks (Prometheus, logging, health probes) to support proactive debugging, system reliability, and performance tuning

Junior Fullstack Engineer at EarthCam - February 2026 - May 2026

  • Optimized a core customer-facing component from the ground up, achieving a 3–5x improvement in load time
  • Integrated data querying APIs across the full application, improving data resolution and reducing fetch overhead
  • Delivered new pages and feature work across the stack in TypeScript and Node.js

Software Engineering Intern at Dfinitiv.io - Summers 2023 and 2024

  • Engineered secure, cloud-native pipelines on AWS and GCP to automate ingestion and curation of digital media assets, reducing processing time by over 60%
  • Built and maintained asset metadata databases in PostgreSQL and MongoDB, enabling fast, reliable querying across thousands of records
  • Built and deployed applications and microservices using boto3, google-cloud-storage, psycopg2, and pymongo, designed for scalability and portability

About Me

I'm an ML Infrastructure Engineer specializing in the platform and DevOps layer that makes machine learning viable at scale. At Cognitive Network Solutions, I designed and deployed multi-cloud infrastructure across GCP and Azure, built Kubernetes platforms for reproducible ML service delivery, and engineered CI/CD pipelines that embedded security and least-privilege IAM practices from the ground up.

I've worked hands-on with GPU-accelerated ML workflows using TensorFlow, PyTorch, and MLflow, building the deployment and observability infrastructure that keeps inference systems reliable in production. My focus has consistently been on the operational layer: provisioning cloud resources with Terraform, enforcing access controls, and establishing monitoring stacks that surface issues before they become incidents.

Previously at Dfinitiv, I built cloud-native data pipelines on AWS and GCP to automate media asset workflows, cutting processing time by over 60%.

I'm drawn to the infrastructure side of ML because it's where reliability is actually built. Models are only as good as the systems that serve them.

Skills

  • Languages: Python, Go, TypeScript, Bash / Shell
  • ML & Data: MLflow, PyTorch, TensorFlow, Hugging Face, Pandas, NumPy
  • Infrastructure & Containers: Kubernetes, Docker, Helm, Terraform, Container Orchestration, Infrastructure as Code
  • Cloud: GCP (GKE / Cloud Run), Azure (AKS / ACR), AWS (EC2 / Lambda / S3)
  • CI/CD & Version Control: GitLab CI/CD, GitHub Actions, CI/CD Automation, Git
  • Observability & Security: Prometheus / Grafana, System Observability, IAM & Secrets Management
  • Databases & Streaming: PostgreSQL, MongoDB, Redis, BigQuery, ETL Pipelines, Kafka Event Streaming
  • APIs & Backend: FastAPI, Flask, gRPC, RESTful APIs
  • Systems: Distributed Systems, Linux Systems

Projects

Playing Card Recognition

Image recognition system that identifies playing cards in real time and exposes predictions through a simple inference interface.

Wildlife Recognition Model

Deployed image classification model that identifies wildlife in photos, with a clean demo interface.

System Metrics Pipeline

Backend service that collects and exposes live CPU and memory metrics through an API and monitoring dashboard.

Travel Offers Scraper

Data ingestion tool that continuously collects airline promotions and exposes structured results for downstream systems.

Stock Price Tracker

Full-stack application that fetches real-time stock data and displays price changes through a simple web interface.

Recipe Finder Web App

Full-stack web app that allows users to submit ingredients and receive dynamically ranked recipes through a clean UI.