Mihai-Valentin Curelea — DevOps Engineer
Most organisations have a system that works until it doesn't. The people who understood why it was built that way have left, the architecture has accumulated decisions that made sense at the time, and nobody has looked at the whole thing end to end in years. That's usually when I get called.
I come in, map what actually exists - not what the diagrams say - find where the real breaking points are, and give leadership a clear picture of what needs to change and why. Sometimes I make those changes myself. Sometimes with the team. It depends on what the situation needs.
Most recently, I cut cloud costs by 25% for a Fortune 500 company. At Meta, I scaled the self-healing infrastructure platform for the whole Facebook fleet as it grew from 4 to 18 data centers globally. I've done this type of work at Meta, Datadog, and other Fortune 500 companies. I co-authored a research paper on AI-based root cause analysis presented at ACM SIGMETRICS. I've built open-source infrastructure used in 100K+ projects.
I can walk into a system nobody fully understands and tell you, with precision, what's holding it together and what's about to break. If your delivery is slower than it should be, your costs are skyrocketing while growth has stagnated, your AI investments aren't paying off, or you're about to make a significant architecture decision and want someone who has seen how these go wrong - that's the conversation I'm useful for.
Remote only. If that's your situation, send me a message.
Stackforce AI infers this person is a SaaS and B2B Infrastructure Specialist with extensive experience in cloud observability and system reliability.
Experience: 15 yrs 6 mos
Skills
- Cloud Infrastructure
- Solution Architecture
- Site Reliability Engineering
- Cloud Observability
- Service Level Objectives Management
- Infrastructure Scalability
- Infrastructure Provisioning
- Root Cause Analysis
- Web Development
- Fullstack Development
- JavaScript Development
- Open Source Development
Career Highlights
- Reduced cloud costs by 25% for a Fortune 500 company.
- Scaled Facebook's self-healing infrastructure from 4 to 18 data centers.
- Co-authored a research paper on AI-based root cause analysis.
Work Experience
Remote Work
Principal Software Engineer & AWS Solutions Architect (3 yrs 9 mos)
Datadog
Senior Site Reliability Engineer (1 yr 3 mos)
Tech Lead, Production Engineer (Site Reliability Engineer / Cloud Engineer) (8 mos)
Tech Lead, Production Engineer (Site Reliability Engineer / Cloud Engineer) (1 yr)
Tech Lead, Production Engineer (Site Reliability Engineer / Cloud Engineer) (3 yrs 4 mos)
1&1 Internet, Inc.
Fullstack Software Architect (NodeJS) (2 yrs 4 mos)
Senior PHP/JavaScript developer (3 yrs 9 mos)
Adobe
Senior JavaScript & NodeJS Software Engineer (5 mos)
Open Source
Author & Lead Developer (1 mo)
Hippotomate - Supercharge your automated testing development & debugging
Author of Open Source App (1 mo)
Image Pro WordPress Plugin
Author of Open Source WordPress Plugin (3 yrs)
RCS & RDS
Web developer (1 yr 4 mos)
MultiACT Media
Web developer (2 yrs 1 mo)
vWorker
Freelancer on vWorker (ex RentAcoder) (1 yr)
Education
Machine Learning at Stanford University
Bachelor of Engineering at University POLITEHNICA of Bucharest