This position has been filled.
Senior Performance Engineer - Search Platform Team
- Belarus, Minsk
- Poland, Warsaw
The Platform team at Lucidworks builds the foundation of our cloud-native microservices architecture orchestrated by Kubernetes. The Platform team owns the design and implementation of our API gateway, security, cloud ops, workflows and job scheduling, Apache Spark integration, messaging framework (Apache Pulsar), and ML model ops / serving infrastructure (Seldon Core / Argo). Underlying all of these capabilities, we support a service mesh built on Spring Cloud that provides security, observability, resiliency, and reliability under massive load.
To be successful in this role, you need a DevOps mindset coupled with a solid foundation in designing and building microservices. We also want you to bring strong but weakly held opinions about architecture and design decisions. You'll be expected to write code, lots of it. We support first-class product features written in Java, Python, Scala, and Go; you don't need to know all of these, but you must be proficient in at least two. You should be passionate about Kubernetes and cloud-native technologies in general.
"Every millisecond counts" should be your daily mantra!
Flexibility is a must! In any given sprint, you may improve our service discovery mechanism, help design a performance / load test, solve a customer scalability problem, or fix a performance issue with ML model serving. Our team's motto is "fast-paced without cutting corners," so you should be comfortable with the team moving fast around you and with you. This role reports directly to the Chief Architect.
Responsibilities:
- Implement robust network resiliency strategies between microservices
- Implement highly scalable load balancing strategies in Kubernetes capable of handling tens of thousands of requests per second
- Implement distributed tracing and metrics to understand the behavior of our service mesh under high load in cloud environments
- Use Java profilers to find inefficiencies in Java code; implement improvements to the code when you find issues
- Tune JVM performance, including GC tuning
- Automate performance and load test frameworks using Gatling and Spark
- Work closely with our Site Reliability Engineering (SRE) team to drive engineering improvements to better support operations, performance, and scalability
Required Skills & Qualifications:
- BS in computer science or a similar field; Master's degree or higher preferred
- Mastery of Spring Boot, Git, Gradle, Jenkins, Bash, Python, SQL, Gatling, and Java
- A minimum of 5 years of experience with large-scale distributed systems
- Expert-level understanding of Java concurrency, data structures, and distributed computing
- Experience with Spring Cloud, Ribbon, OpenFeign, Hystrix, or similar load-balancing / network resiliency libraries
- A minimum of 3 years of experience using Java profilers to find and fix code inefficiencies (YourKit preferred)
- Solid understanding of Kubernetes, Helm, and Docker; interest in and experience with Istio highly valued
- Experience with Prometheus and Grafana highly desired
- Resourcefulness – willing to jump in, work with both opportunity and constraint, and leverage existing resources to accomplish goals
- Team player – confident collaborating with a diverse community of people and personalities across geographies, backgrounds, and professional abilities
- Strong interpersonal skills and strong written and verbal communication skills
- Empathy and care for all stakeholders of Lucidworks, including employees, executives, partners, and guests