David Ponessa

SRE (Site Reliability Engineer)

Amsterdam, North Holland, Netherlands · 19 yrs 5 mos experience

Most Likely To Switch: Highly Stable

Key Highlights

Expert in Site Reliability Engineering and DevOps practices.
Proficient in large-scale data processing and cloud infrastructure.
Strong background in Linux systems administration and automation.

Stackforce AI infers this person is a SaaS Infrastructure Engineer with a strong focus on Site Reliability Engineering.

Skills

Core Skills

Site Reliability Engineering, Apache Flink

Other Skills

AIX, AWS, Amazon Web Services (AWS), Ansible, Apache Kafka, Backup Management, Bash scripting, Big Data Processing, CI/CD, Cloud Computing, DNS, Docker, Domain Name System (DNS), Elastic Stack (ELK), ElasticSearch

About

I have been working as a Linux systems administrator for a long time, and I quickly moved to become a DevOps Engineer/SRE, striving to design systems that implement high levels of observability, scalability and high availability. I have become comfortable coding in Java, and also use Python sometimes as a Swiss Army knife. I prefer all things open-source, always. The best solutions should be secure, elegant, and simple. Horizontal scaling should always be preferred, and development and testing environments must be dynamic in nature. Every IT solution needs to be prepared to be reinvented in the not-so-long term. I've always been a physics/astronomy nerd, and I also meet the "geek" classification quite easily. I've discovered throughout my professional career that I have a talent for managing small groups of people, and in my experience, no team that qualifies as "great" grows much bigger than a dozen or so people.

Experience

Booking.com

3 roles

Senior Site Reliability Engineer

Promoted

Aug 2022 – Present · 3 yrs 7 mos

Responsible for architecture, design, development and operations of a large-scale data processing platform built to deliver security as a service. This involves handling large volumes of data in flight and at rest (gigabytes per second in flight, petabytes at rest) and intertwined systems that perform data filtering, transformations, async and sync data enrichment, and aggregations at processing time, using Apache Flink, Apache Kafka, Elasticsearch-Logstash-Kibana, and HDFS/GCS/S3. In doing so I work extensively with Java in Flink and in predecessor custom code for consumer/producer applications, and I develop a framework that lets end users (data analysts, data scientists, etc.) implement transformations and aggregations without needing to write code.
I am also responsible for ensuring observability and reliability of the entire service and its components, using a variety of automation tools such as Puppet, Ansible, Terraform, GitLab, and custom scripting in shell or Python, as well as for designing CI/CD pipelines to improve velocity, and for the overall design for fast, idempotent, reliable delivery.
The underlying platform consists of a mix of public cloud and SaaS, with most of the actual processing layer running on Kubernetes, where orchestration is a mix of Kubernetes (deployments, services, CRDs), Helm and in-house orchestration solutions.

Apache Flink, Site Reliability Engineering, Java, Elasticsearch, Ansible, Terraform, +2 more

Site Reliability Engineer

Feb 2021 – Aug 2022 · 1 yr 6 mos

Responsible for architecture, design, development and operations of a large-scale data processing platform built to deliver security as a service. This involves handling large volumes of data in flight and at rest (gigabytes per second in flight, petabytes at rest) and intertwined systems that perform data filtering, transformations, async and sync data enrichment, and aggregations at processing time, using Apache Flink, Apache Kafka, Elasticsearch-Logstash-Kibana, and HDFS/GCS/S3. In doing so I work extensively with Java in Flink and in predecessor custom code for consumer/producer applications, and I develop a framework that lets end users (data analysts, data scientists, etc.) implement transformations and aggregations without needing to write code.
I am also responsible for ensuring observability and reliability of the entire service and its components, using a variety of automation tools such as Puppet, Ansible, Terraform, GitLab, and custom scripting in shell or Python, as well as for designing CI/CD pipelines to improve velocity, and for the overall design for fast, idempotent, reliable delivery.
The underlying platform consists of a mix of public cloud (Dataflow, BigQuery) and bare metal (petabyte-scale data in Elasticsearch plus indexers), with most of the actual processing layer running on Kubernetes, where orchestration is a mix of Kubernetes (deployments, services, CRDs), Helm and in-house orchestration solutions.

Apache Flink, Site Reliability Engineering, Java, Elasticsearch, Ansible, Terraform, +2 more

Linux Systems Engineer

May 2018 – Feb 2021 · 2 yrs 9 mos

Responsible for architecture, design, development and operations of a large-scale data processing platform built to deliver security as a service. This involves handling large volumes of data in flight and at rest (gigabytes per second in flight, petabytes at rest) and intertwined systems that perform data filtering, transformations, async and sync data enrichment, and aggregations at processing time, using Apache Flink, Apache Kafka, Elasticsearch-Logstash-Kibana, and HDFS/GCS/S3. In doing so I work extensively with Java in Flink and in predecessor custom code for consumer/producer applications, and I develop a framework that lets end users (data analysts, data scientists, etc.) implement transformations and aggregations without needing to write code.
I am also responsible for ensuring observability and reliability of the entire service and its components, using a variety of automation tools such as Puppet, Ansible, Terraform, GitLab, and custom scripting in shell or Python, as well as for designing CI/CD pipelines to improve velocity, and for the overall design for fast, idempotent, reliable delivery.
The underlying platform consists of a mix of public cloud (Dataflow, BigQuery) and bare metal (petabyte-scale data in Elasticsearch plus indexers), with most of the actual processing layer running on Kubernetes, where orchestration is a mix of Kubernetes (deployments, services, CRDs), Helm and in-house orchestration solutions.

Apache Flink, Site Reliability Engineering, Java, Elasticsearch, Ansible, Terraform, +2 more

Doctor.com

DevOps

Mar 2017 – Apr 2018 · 1 yr 1 mo · Argentina

Manage the entire IT infrastructure supporting Doctor.com services in a full DevOps position.
Full lifecycle management of EC2, RDS, Elasticsearch and ECS instances to support all applications, mostly LAMP stack, in separate environments.
Architecture, design and implementation of self-managed Kubernetes to start migrating to microservices.
Design and implementation of CI/CD pipelines for the complete business logic, using Jenkins.

Site Reliability Engineering, Kubernetes, CI/CD, Jenkins, AWS, Elasticsearch

Atos

Technical Supervisor

Jul 2015 – Apr 2018 · 2 yrs 9 mos

Senior Linux and Solaris systems administration, with a focus on: storage management, security compliance, deployment of new systems through kickstart/jumpstart technology (Cobbler/Foreman), backup management and configuration, networking (TCP/IP, NFS, clustering), incident management, change management and RCA investigation (ITIL process), architecture testing and planning for general *nix platforms, procedure design, testing and implementation with a focus on developing proven, supported work instructions for continuous improvement, disaster recovery support planning and execution, private cloud experience and management, and virtualization (VMware/KVM/Xen).
Developed knowledge in AWS, Puppet, Docker and up-and-coming DevOps technologies. Built infrastructure solutions based on Red Hat Enterprise Virtualization, Ansible, Satellite 6/Foreman.
Writing of Ansible roles for config automation and deployment.
Managed and monitored all installed systems and infrastructure to ensure the highest level of availability.
Installed, configured, tested and maintained operating systems, application software and system management tools.
Defined enterprise processes and best practices and tailored enterprise processes for applications.
Monitored and tested application performance to identify potential bottlenecks, developed solutions, and collaborated with developers on solution implementation.
Wrote and maintained custom scripts to improve system efficiency and performance.
Designed and implemented system security and data assurance.
Provided 2nd and 3rd level technical support and troubleshooting to internal and external clients.
Created ample procedure documentation for newly adopted technologies.

Site Reliability Engineering, Linux, Solaris, Ansible, Puppet, Docker

ACS, a Xerox Company

Infrastructure Analyst Senior

Nov 2011 – Jun 2015 · 3 yrs 7 mos · Argentina

Linux and Solaris Systems Administration Senior
Storage Management
Security Management
Server building
Print queue administration
Monitoring/Auditing/Performance Tools installation and configuration
Backup management and configuration
Networking management
Incident management
Change Management
RCA investigation
Architecture testing and planning for general *nix platforms
Procedure design, testing and implementation with a focus on developing proven, supported work instructions for the Business As Usual process
Disaster Recovery support, planning and execution
Simple cloud management
VMware operation

Linux, Solaris, Networking, Backup Management, Security Management, Site Reliability Engineering

IBM Global Business Services

Unix Systems Administrator

Jun 2010 – Nov 2011 · 1 yr 5 mos · Argentina

System Administration of HP-UX, AIX, Solaris and Linux OSes, with focus on monitoring and administration software tools (TIVOLI, IBM Director, HP-SIM, Sun MC and others).
Tasks performed:
Linux (RHEL, SLES, Debian), Solaris 8/9/10, AIX, HP-UX administration.
Tivoli Endpoint and Tivoli Management Region administration.
Tivoli Usage and Accounting Manager (TUAM) administration.
Tivoli Security Compliance Manager (TSCM) administration.
Tivoli Application Dependency Discovery Manager (TADDM) administration.
Server Resource Monitoring (SRM) administration.
IBM Systems Director 6.2 administration.
HP Systems Insight Manager administration.
Sun Management Center administration.
Project deployment and management.

Linux, Solaris, Networking, Backup Management, Security Management, Site Reliability Engineering

HP Enterprise Services

MidRange Coverage Systems Administrator

Sep 2009 – Jun 2010 · 9 mos · Argentina

First-level support of the midrange server spectrum and application facilities. Management and support of 9000+ *nix and Wintel servers.
Daily work included:
Unicenter Operation.
Linux (RHEL) and Solaris administration.
Network troubleshooting.
Incident Management.
Change Installation and Management.
Technical Team Management.

Linux, Solaris, HP-UX, AIX, Monitoring Tools

Universidad Nacional de Córdoba

Teacher Assistant - 2nd category

Mar 2007 – Feb 2009 · 1 yr 11 mos

Assistant teacher in Mathematical Analysis and General Physics.
Academic consultant and lab assistant.
Taught the following subjects:
Calculus and Algebra
Classical Mechanics
Thermodynamics
Electrodynamics

Linux, Networking, Web Server Administration

Grupo de Teoría de la Materia Condensada - FaMAF - UNC

Systems and Network Administrator

Sep 2006 – Sep 2009 · 3 yrs

Maintenance and administration of the group's computing systems, workstations and network.
These included:
Architecture design.
Installation, administration and maintenance of GNU/Linux (Red Hat and Debian).
Storage administration and planning.
Backup administration.
Network security and administration.
Web Server deployment and administration.
Mail Server deployment and administration.
Performance and capacity planning, as well as enhancement.