Who are we?
Trumid is a financial technology company building tomorrow’s credit trading network. We combine agile technology, intuitive design, and deep market expertise to deliver differentiated trading solutions. With an expanding network of more than 600 leading buy side and sell side institutions, Trumid is optimizing the fixed income trading experience.
Our business model thrives on participation and connection, and so does our company culture. We believe in collaborative innovation and solving for fun: working together to achieve common goals and finding joy in pushing into unexplored areas and new ways of thinking. Learn more at www.trumid.com.
What will you do in this role?
The Senior Site Reliability Engineer (SRE) will bring deep expertise in designing and supporting highly scalable, highly available infrastructure and applications in Kubernetes, and in promoting microservice design patterns in complex cloud environments. This role will serve as a subject matter expert on all aspects of our containerized deployments, including deployment, configuration, scaling, and upgrades. The ideal candidate will be passionate about mentoring other team members and departments on the adoption of new technologies and design principles, as well as promoting DevOps culture and collaboration. This role will also work closely with other teams to ensure deployments are successful in both production and non-production environments.
- Troubleshoot complicated, cross-platform issues involving the OS, AWS, networking and databases
- Work closely with Development, QA and Production Support teams to make sure releases are on time and successful
- Ensure the reliability and security of the infrastructure while building proactive, dynamic monitoring, alerting and metrics solutions so that each environment meets its SLA requirements
- Build infrastructure in both AWS and GCP using Terraform
- Minimize or eliminate manual hand-offs and link all automated workflows
- Support the Kubernetes application/infrastructure in both production and non-production environments
- Establish and test disaster recovery policies and procedures
- Own the resiliency and scalability of the infrastructure
- Track and apply all required patches
- Create and maintain technical documentation
Skills & Qualifications
- BA in Computer Science or Information Systems, or a combination of education and related work experience
- 2+ years of Site Reliability Engineering (SRE) experience
- 2+ years of DevOps experience
- 2+ years of Kubernetes experience
- 2+ years of cloud platform experience (AWS and GCP)
- 2+ years of production infrastructure experience
- Proven experience automating routine, repeatable tasks
- Strong sense of ownership, ability to work independently and proven track record of driving products and changes
- Strong experience in production support and operations
- Strong experience in monitoring application / infrastructure performance and availability while creating metrics for management use
- Strong experience in Terraform, Jenkins, Bazel, Linux, Docker, Helm, Elasticsearch, Prometheus
- Strong automation and problem-solving skills, and the ability to follow through to completion
- Ability to wear multiple hats and multitask effectively in a fast-paced environment
- Capable of working both independently and as part of a group
Just a few perks our employees enjoy!
- Highly competitive compensation
- Fully paid medical, dental and vision coverage
- Team-oriented and collaborative company culture
- Flexibility for in-office and work-from-home arrangements
Trumid is an equal opportunity employer.
In compliance with the New York City Pay Transparency Law, the base salary range for this role in New York City is between $175,000 and $250,000. This range does not include discretionary bonus or other forms of compensation or benefits offered in connection with this job. Several factors are considered when determining a candidate’s compensation.