Amir Karimi

Infrastructure & DevOps Consultant for cloud-native teams
20 years of engineering experience with a deep focus on cloud infrastructure, DevOps, and Site Reliability Engineering, building scalable and highly available systems for clients ranging from early-stage startups to large enterprises.

Services


Cloud Infrastructure & Infrastructure as Code

Design and implement scalable, secure, and reproducible cloud infrastructure using Infrastructure as Code tools such as Terraform and AWS CDK. Establish well-structured AWS accounts, networking, and environments that teams can evolve safely.

Site Reliability & Observability

Establish SRE practices, define SLOs and error budgets, and build observability into systems through metrics, logging, alerting, and dashboards. Improve incident response and reduce mean time to recovery.

CI/CD & Release Automation

Build and optimize continuous integration and delivery pipelines that make releases fast, repeatable, and low-risk. Automate testing, builds, and deployments so teams can ship confidently and frequently.

Cloud Migration & Cost Optimization

Plan and execute migrations to the cloud with minimal disruption, including multi-region disaster recovery. Review existing infrastructure to improve scalability, reliability, and cost efficiency.

Platform Engineering & Developer Enablement

Build internal platforms, shared infrastructure, and tooling that let product teams self-serve and move quickly. Provide technical guidance on platform and cloud infrastructure across multiple teams.

Architecture Review & Scalability

Analyze system architectures and prepare them for high-throughput launches through capacity planning, performance testing, and chaos engineering. Identify and remove reliability and scalability bottlenecks.

Skills


DevOps
SRE
System Design
Distributed Systems
Serverless
AWS
CDK
Terraform
Docker
Kubernetes
GitHub Actions
Linux
ETL
Python
Go
Scala
TypeScript

Positions


Fractional Head of Engineering | Self Employed
Jun 2023
Vancouver, Canada

I help startups build and operate reliable cloud infrastructure, often as a fractional infrastructure or DevOps lead:

  • Designing cloud architectures and Infrastructure as Code foundations that small teams can own and evolve.
  • Setting up CI/CD pipelines, deployment automation, and observability so teams can ship quickly and safely.
  • Bridging the gap between product teams and infrastructure work, and mentoring in-house engineers on DevOps and SRE practices.

See the Featured Projects section for details on completed projects.

Software Engineer | Amazon
Nov 2022 - Jun 2023
Vancouver, Canada
  • One of the main contributors to re-architecting and improving large-scale data pipelines that process petabytes of data daily, reducing the number of jobs from around 15 to 1.
  • Became a project lead within two months of joining, driving the infrastructure and platform decisions for the team.
  • Worked within strict privacy and compliance constraints while keeping data infrastructure scalable and maintainable.
Tech Lead / Principal Software Architect | Acceptto
May 2019 - Nov 2022
Vancouver, Canada
  • Increased the availability and scalability of the company's core services by leading the migration to AWS and adding multi-region disaster recovery support, implemented with Terraform.
  • Led the redesign of the next generation of Acceptto's SSO microservice to make it horizontally scalable and highly available in the cloud.
  • Built a horizontally scalable, full-duplex communication service for the customer directory agent, reducing customer onboarding time from days to hours.
  • Promoted to lead the core engineering team of 5 engineers, owning architecture and infrastructure decisions.
  • Acceptto was acquired by SecureAuth in Nov 2021.
Site Reliability Engineer | Disney Streaming Services
Mar 2018 - Mar 2019
Manchester, UK
  • One of the main contributors to establishing SRE practices within Disney Streaming Services as the 4th member of the newly formed SRE team.
  • Increased service availability by implementing transparent cross-region replication for AWS Kinesis resources (Kinesis did not offer such a capability at the time).
  • Helped three different teams within a year prepare for large launches handling thousands of requests per second through architecture reviews, high-throughput performance testing, chaos engineering, and reliability tooling.
Software Engineer | Disney Streaming Services
Apr 2017 - Mar 2018
Manchester, UK
  • Developed scalable microservices capable of handling tens of thousands of requests per second on AWS, using Scala, Play, DynamoDB, AWS Lambda, Kinesis, SQS, and S3.
  • Contributed to the design and implementation, including the AWS infrastructure, of a new subscription system for BAMTech Media, later used in Disney+.
  • The original company name was Cake Solutions. It was acquired by BAMTech and then Disney in 2017.
Independent Software Engineer / Consultant | Self Employed
Jan 2006 - Mar 2017
Tehran, Iran
  • Designed, built, and operated more than 40 custom software projects, including highly scalable web applications and the infrastructure they ran on.
  • Built and deployed systems serving tens of thousands of users per day on modest hardware by leveraging non-blocking IO and async programming.
  • Worked directly with clients ranging from corporations to individual entrepreneurs, as a contractor and partner.
Software Developer | Behsad
Feb 2003 - Dec 2005
Arak, Iran
  • Software developer in a small team building desktop and database applications using C++, C#, .NET, and MSSQL.
  • Mentored and led a team of junior developers.

Featured Projects


SiriusXM commerce nextgen platform infrastructure
Aug 2023
  • Designed and implemented platform infrastructure supporting 13 microservices in the commerce domain.
  • Provided technical guidance on platform and cloud infrastructure to 5 teams comprising over 20 members, enabling seamless adoption and scalability.
  • Led and mentored a team of two DevOps engineers, fostering skill development and efficient delivery.

CDK, TypeScript, Python, Scala, GitHub Actions, AWS

Re-architecting the SSO and migrating to the cloud
Nov 2020 - Jun 2021

Led the redesign and implementation of the next generation of Acceptto's SSO microservice to make it horizontally scalable and highly available. The old SSO module was a web server that had to be run and maintained on the customer site. The new version was designed to run in the cloud and connect to the customer's user directory using an agent. This project reduced customer onboarding time from days to hours.

AWS, Terraform, Ruby, VueJS, JavaScript

AWS migration and multi-region disaster recovery
Jan 2020 - Nov 2020

Increased the availability and scalability of Acceptto's core services by leading the migration to AWS and adding multi-region disaster recovery support. The entire infrastructure was defined and managed as code using Terraform, making environments reproducible and the platform resilient to regional outages.

AWS, Terraform, Docker, Ruby

LDAP Agent and Switchboard
Sep 2020 - Feb 2021

Created a solution for Acceptto that allowed their cloud backend to communicate securely with customers' Active Directory. I managed the project (team of two) and contributed to the development directly. It was delivered in two phases and the final architecture removed the need for firewall configurations and load balancing on the customer side.

Go, Ruby, VueJS, TypeScript, JavaScript

Re-architecture of the ML big-data pipelines
Dec 2022 - Jun 2023

One of the two main contributors to re-designing and implementing a new version of a data pipeline that simplified the architecture and enabled faster evolution. The pipeline handles petabytes of data daily for a strategic Amazon Advertising product. The team leveraged Iceberg and other technologies and refactored existing Spark jobs into a cleaner design, reducing the number of jobs from 15 to 1 for a module.

Python, Scala, AWS, Spark/PySpark, ETL, PyTorch

Launch SRE
Apr 2018 - Apr 2019

As the 4th member of the newly founded SRE team at Disney Streaming Services, helped three different development teams within a year prepare for large launches that required handling thousands of requests per second. This involved reviewing the architecture, building high-throughput performance tests, chaos engineering, and building the tools and processes needed to improve reliability.

Python, Go, Scala, Kafka, Ruby

Subscription system
Apr 2017 - Apr 2018

As part of a team of four, contributed directly to the design and implementation of a new subscription system for BAMTech Media, as well as its AWS infrastructure. The system was later used in Disney+ and handled tens of thousands of requests per second.

Scala, AWS, Docker

Pluggable Authentication Module (PAM) to add MFA support to Linux services
Jan 2014 - Mar 2014

Implemented a Pluggable Authentication Module (PAM) that integrates with a multi-factor authentication service. It allows adding MFA to all Linux services supporting PAM, such as SSH or local user logins.

C/C++, Linux

Education


BSc in Computer Science | Azad
2005 - 2007
Arak, Iran
  • Created a full-featured messenger system from scratch, including a custom binary protocol written in C++.
  • Gave a few talks about Computer Networks & Socket Programming.
AEng in Computer Software | Elmi-Karbordi
2003 - 2005
Arak, Iran
  • Developed a boot loader as a first step toward a basic operating system. Learned to work with IO, direct screen memory access, and memory management (using a linked-list data structure), and re-implemented some functions of the C standard library from scratch.
  • Was selected as one of the top 3 students to participate in the ACM Asian regional contest.
  • Was the youngest student attending this university, at age 17.

Certifications


AWS Certified Developer | AWS
Jun 2017

License: F4YBGRB2KNV11PWB

Stripe Certified Professional Developer | Stripe
Jul 2023

Credential ID: 76849733