Remote site reliability engineer jobs

We, at Turing, are looking for site reliability engineers who will be responsible for automating solutions including capacity and performance planning, managing risks, disaster response, and on-call monitoring. Here’s your chance to work with elite U.S. companies and collaborate with top professionals across the globe.

Find remote software jobs with hundreds of Turing clients

Job description

Job responsibilities

  • Build software applications to help operations and support teams
  • Gather and analyze metrics to help in performance tuning and troubleshooting errors
  • Contribute to system design consulting, platform management, and capacity planning
  • Develop sustainable systems and services with automation and uplifts
  • Improve feature development speed and system reliability through optimization of on-call processes
  • Prepare documentation of historical knowledge concerning software development, support, IT operations, and on-call duties
  • Monitor application performance and keep the sites up and running

Minimum requirements

  • Bachelor’s/Master’s degree in Engineering, Computer Science, or IT (or equivalent experience)
  • At least 3 years of experience as a site reliability engineer (rare exceptions for highly skilled engineers)
  • Proficient understanding of operating systems (Linux/Windows)
  • Expert knowledge of DevOps concepts and best practices
  • Expertise in CI/CD implementation
  • Hands-on experience in troubleshooting issues
  • Knowledge of one or more high-level programming languages like Python, Java, JavaScript, C/C++, Ruby, etc.
  • Experience with distributed storage technologies and dynamic resource management frameworks
  • Fluent in English to communicate effectively
  • Ability to work full-time (40 hours/week) with a 4-hour overlap with US time zones

Preferred skills

  • Working knowledge of code versioning tools such as Git
  • Proactivity in finding issues, performance bottlenecks, and areas for improvement
  • Passion for automation, coding skills, and software-centric mindset
  • Understanding of distributed computing, cloud-native applications, application monitoring, and database management
  • Excellent organizational and interpersonal skills

Interested in this job?

Apply to Turing today.

Apply now

Why join Turing?

  1. Elite US Jobs

    Turing’s developers earn better than market pay in most countries, working with top US companies.

  2. Career Growth

    Grow rapidly by working on challenging technical and business problems on the latest technologies.

  3. Developer success support

    While matched, enjoy 24/7 developer success support.

Developers Turing

Read Turing.com reviews from developers across the world and learn what it’s like working with top U.S. companies.
4.65 out of 5
based on developer reviews as of June 2024

How to become a Turing developer?

Work with the best software companies in just 4 easy steps
  1. Create your profile

    Fill in your basic details - Name, location, skills, salary, & experience.

  2. Take our tests and interviews

    Solve questions and appear for a technical interview.

  3. Receive job offers

    Get matched with the best US and Silicon Valley companies.

  4. Start working on your dream job

    Once you join Turing, you’ll never have to apply for another job.

How to become a Site Reliability engineer?

As software development became faster and more complex, traditional software teams had trouble keeping up. To help with the transition of workflows from development to production applications, they introduced DevOps.

However, it became increasingly apparent that this system needed greater reliability and performance in order to stay competitive. This is where the field of site reliability engineering comes into play.

Site reliability engineering blends software engineering practices with information technology (IT) engineering practices to create highly reliable systems. Site reliability engineers are responsible for ensuring the reliability of all aspects of the full stack, from the front-end, customer-facing applications all the way through to the database and hardware infrastructure.

What is the scope of Site Reliability engineering?

The role of site reliability engineer (SRE) is ideal for keeping up with the newest developments in the DevOps world and for expanding your knowledge and skills in high-demand areas such as infrastructure automation, release engineering, and continuous delivery. As an SRE, you’ll be creative, stimulated, and technically challenged every day.

Site reliability engineers are crucial to most organizations. These professionals are in high demand at successful tech companies that have large data centers and complex technical challenges. They can also be inspirational from both a financial and workplace culture perspective. Google considers them scarce resources.

What are the roles and responsibilities of a Site Reliability engineer?

Site reliability engineering (SRE) refers to software engineering approaches used by organizations to manage their IT operations. SRE teams use software tools to automate operations and solve problems in a timely manner.

Site reliability engineers (SREs) are software engineers who also have Unix systems administration and networking experience. They need polished programming skills because they regularly use automation to reduce manual work and increase reliability.

SRE transfers the tedious work traditionally done by hand in operations teams to engineers who use automation and software to optimize processes.

Site reliability engineers typically split their time roughly evenly between development work and operations duties, such as responding to outages and incidents and being on call.

The roles and responsibilities of a site reliability engineer include:

  • Building software to help operations and support teams
  • Conducting post-incident reviews
  • Documenting knowledge to ensure a seamless flow of information between teams
  • Implementing strategies to increase system reliability and performance through on-call rotation
  • Resolving support escalations
  • Applying software engineering practices to develop and implement services that improve IT and support teams
  • Optimizing the Software Development Life Cycle (SDLC) to boost service reliability

How to become a Site Reliability engineer?

You can become a site reliability engineer by meeting the following requirements:

  1. Bachelor's degree: Most employers expect a Bachelor’s or Master’s degree (or equivalent experience). A degree helps with growth in the software field and makes the technical aspects of the job easier to understand.
  2. 2+ years of experience in an operations or software engineering role: Previous experience as a software engineer gives you an advantage over other candidates when applying for SRE positions.
  3. Required skills: You must have the following technical skills:
    • Experience with cloud-based, continuous-deployment software development lifecycles
    • Expertise in infrastructure automation technologies

Along with technical skills you must have a strong foundation of non-technical skills as well. What you need:

  • Excellent verbal and written communication skills
  • Strong problem-solving skills
  • Passion and curiosity for technology
  • Keenness to provide support for teams or customers.

Now let us discuss the skills and methods you will need to learn to become a successful site reliability engineer.

Interested in remote Site Reliability jobs?

Become a Turing developer!

Apply now

Skills required to become a Site Reliability engineer

Fundamental skills are important in helping you land high-paying site reliability engineer jobs. Here is what you need to know!

1. DevOps

DevOps refers to a set of practices that promote better collaboration and widespread automation of the processes happening between operational and development teams. It can be extended to other business units as well.

DevOps is a new cultural movement combining software development, operations, and engineering. It stimulates the adoption of agile practices that are continuous in nature and enable continuous delivery of small batches to customers.

2. Python

Python is easy to learn. It is a high-level, dynamically typed, interpreted language, which makes debugging relatively painless and helps programmers rapidly develop working application prototypes. This has earned Python a reputation as a language well suited to rapid prototyping and scripting. Because Python runs on all major operating systems, it is a good choice for programmers who do not want to spend time writing separate programs for different platforms.

3. Go

Go was created for applications relating to network infrastructure and was intended as an alternative to languages such as Java and C++. It is commonly used in cloud-based and server-side (web) applications, as well as in DevOps and site reliability automation. It also sees some use in microcontroller programming, robotics, games, artificial intelligence, and data science.

4. CI/CD

Continuous integration/continuous delivery (CI/CD) is a software development process in which code is automatically built and tested as new code is added. CI/CD can improve the effectiveness of a software team by reducing the risk of errors or defects and enabling automated deployments, freeing up time spent manually building, testing, or releasing software.

CI/CD introduces automated processes to integrate code and test in a continuous manner with delivery and deployment, replacing error-prone manual processes. CI/CD is supported by teams working together in an agile way, either with DevOps or SRE practices.
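To make the idea concrete, here is a minimal sketch in Python of the gating behaviour described above: stages run in order, and a failure stops the pipeline before deployment. The names `Stage`, `run_pipeline`, and the stage names are hypothetical and chosen for illustration only; real pipelines are normally defined in a CI service's own configuration format rather than hand-written like this.

```python
from typing import Callable, List, Tuple

# A stage is a (name, step) pair; the step returns True on success.
# These names are hypothetical, not part of any CI product's API.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> Tuple[bool, List[str]]:
    """Run stages in order and stop at the first one that fails."""
    completed: List[str] = []
    for name, step in stages:
        if not step():
            # A failing stage blocks everything after it, including deployment.
            return False, completed
        completed.append(name)
    return True, completed


# Stand-in stages: real ones would invoke a build tool, a test runner, and so on.
stages = [
    ("build", lambda: True),
    ("test", lambda: False),  # simulate a failing test suite
    ("deploy", lambda: True),
]

succeeded, completed = run_pipeline(stages)
print(succeeded, completed)
```

Because the simulated test stage fails, the deploy stage never runs, which is the property that lets a team merge small changes frequently without shipping broken builds.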

5. Version control

Version control systems (also called revision control systems) help software developers keep track of changes to application code and coordinate work on a single program by more than one person. Version control systems such as Git support branches, in which a developer works on a separate copy of the project and modifies one or more files without affecting the main line of development until the changes are merged.

6. NoSQL databases

NoSQL databases are a class of database management systems (DBMSs) that do not rely on the traditional relational database management system (RDBMS) structure. NoSQL databases are purpose-built for specific data models, have flexible schemas for building modern applications, and are widely recognized for their ease of development and performance at scale. These databases use various data models for accessing and managing data, which makes them optimized specifically for applications that require large data volume, low latency, and flexible data models.
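The "flexible schema" idea can be illustrated with a toy, in-memory stand-in for a document store. This is only a sketch under stated assumptions: `DocumentCollection`, `insert`, and `find` are hypothetical names invented here and do not correspond to any real database's API, and nothing is persisted or indexed.

```python
from typing import Any, Dict, List


class DocumentCollection:
    """Toy in-memory document collection (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self._docs: List[Dict[str, Any]] = []

    def insert(self, doc: Dict[str, Any]) -> None:
        # No schema is enforced: each document may carry its own set of fields.
        self._docs.append(dict(doc))

    def find(self, **criteria: Any) -> List[Dict[str, Any]]:
        # Return the documents whose fields match every given key/value pair.
        return [
            doc for doc in self._docs
            if all(doc.get(key) == value for key, value in criteria.items())
        ]


services = DocumentCollection()
services.insert({"name": "api", "region": "us-east", "replicas": 3})
services.insert({"name": "worker", "region": "us-east", "queue": "jobs"})
services.insert({"name": "cache", "region": "eu-west"})

print([doc["name"] for doc in services.find(region="us-east")])
```

The three documents share a collection even though their fields differ, whereas a relational table would require every row to fit one predefined set of columns.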

How to get remote Site Reliability engineer jobs?

Developers are a lot like athletes. In order to excel at their craft, they have to practice effectively and consistently. They also need to work hard enough that their skills grow gradually over time. In that regard, there are two major factors that developers must focus on for that progress to happen: the support of someone more experienced, and effective practice techniques. As a developer, it's vital to know how much to practice, so make sure there is someone on hand who will help you out and keep an eye out for any signs of burnout.

Turing offers the best remote site reliability engineer jobs to suit your career trajectory. Grow rapidly by working on challenging technical and business problems on the latest technologies. Join a network of the world's best developers and get full-time, long-term remote site reliability engineer jobs with better compensation and career growth.

Why become a Site Reliability engineer at Turing?

  • Elite US jobs
  • Career growth
  • Exclusive developer community
  • Once you join Turing, you’ll never have to apply for another job
  • Work from the comfort of your home
  • Great compensation

How much does Turing pay their Site Reliability engineers?

At Turing, every site reliability engineer is allowed to set their rate. However, Turing will recommend a salary at which we know we can find a fruitful and long-term opportunity for you. Our recommendations are based on our assessment of market conditions and the demand that we see from our customers.

Frequently Asked Questions

Turing is an AGI infrastructure company specializing in post-training large language models (LLMs) to enhance advanced reasoning, problem-solving, and cognitive tasks. Founded in 2018, Turing leverages the expertise of its globally distributed technical, business, and research experts to help Fortune 500 companies deploy customized AI solutions that transform operations and accelerate growth. As a leader in the AGI ecosystem, Turing partners with top AI labs and enterprises to deliver cutting-edge innovations in generative AI, making it a critical player in shaping the future of artificial intelligence.

After uploading your resume, you will have to go through the three tests -- seniority assessment, tech stack test, and live coding challenge. Once you clear these tests, you are eligible to apply to a wide range of jobs available based on your skills.

No, you don't need to pay any taxes in the U.S. However, you might need to pay taxes according to your country’s tax laws. Also, your bank might charge you a small amount as a transaction fee.

We, at Turing, hire remote developers for over 100 skills like React/Node, Python, Angular, Swift, React Native, Android, Java, Rails, Golang, PHP, Vue, among several others. We also hire engineers based on tech roles and seniority.

Communication is crucial for success while working with American clients. We prefer candidates with a B1 level of English, i.e., those who have the necessary fluency to communicate without effort with our clients and native speakers.

Currently, we have openings only for the developers because of the volume of job demands from our clients. But in the future, we might expand to other roles too. Do check out our careers page periodically to see if we could offer a position that suits your skills and experience.

Our unique differentiation lies in the combination of our core business model and values. To advance AGI, Turing offers temporary contract opportunities. Most AI Consultant contracts last up to 3 months, with the possibility of monthly extensions—subject to your interest, availability, and client demand—up to a maximum of 10 continuous months. For our Turing Intelligence business, we provide full-time, long-term project engagements.

No, the service is absolutely free for software developers who sign up.

Ideally, a remote developer needs to have at least 3 years of relevant experience to get hired by Turing, but at the same time, we don't say no to exceptional developers. Take our test to find out if we could offer something exciting for you.

Leadership

In a nutshell, Turing aims to make the world flat for opportunity. Turing is the brainchild of serial AI entrepreneurs Jonathan and Vijay, whose previous, successfully acquired AI firm was powered by exceptional remote talent. Turing’s backers also include high-profile investors such as Facebook's first CTO (Adam D'Angelo), executives from Google, Amazon, and Twitter, and Foundation Capital.

Equal Opportunity Policy

Turing is an equal opportunity employer. Turing prohibits discrimination and harassment of any type and affords equal employment opportunities to employees and applicants without regard to race, color, religion, sex, sexual orientation, gender identity or expression, age, disability status, protected veteran status, or any other characteristic protected by law.

Explore remote developer jobs

AI Quality Analyst - Portuguese (Portugal)

About Turing:
Based in San Francisco, California, Turing is the world’s leading research accelerator for frontier AI labs and a trusted partner for global enterprises deploying advanced AI systems. Turing supports customers in two ways: first, by accelerating frontier research with high-quality data, advanced training pipelines, plus top AI researchers who specialize in coding, reasoning, STEM, multilinguality, multimodality, and agents; and second, by applying that expertise to help enterprises transform AI from proof of concept into proprietary intelligence with systems that perform reliably, deliver measurable impact, and drive lasting results on the P&L.

Role Overview:

As an AI Quality Analyst, you will evaluate a new personalization feature for Gemini. You will assess how well the model uses information from your past Gemini conversations, Gmail, Google Search, and YouTube activity to make responses more relevant and helpful. This role requires a unique blend of creativity and analytical rigor. You will actively design prompts from the perspective of your own personal experiences. You will then use your analytical skills to assess the quality of the model's personalized responses, evaluating dimensions like Grounding, Integration, and Helpfulness.


Key Qualifications

  • Portuguese Proficiency: Ability to read and write in Portuguese with a high degree of comp, as Portuguese is the focus language for this project.
  • Personal Account Usage: Willingness to use your primary personal Google account (not a testing account) and enable personal data sources for a genuine assessment.
  • Schedule Flexibility: Full-time availability in your local time zone is required. We are staffing a global, 24-hour operations team.
  • Exceptional Analytical Thinking: Demonstrated ability to evaluate nuanced and ambiguous AI responses, specifically assessing personalization quality.
  • Creative Prompt Engineering: Experience in designing creative, multi-turn starting prompts based on personal context to thoroughly test the model's capabilities.
  • Strong Evaluation Acumen: Understanding of personalization concepts, including the ability to identify incorrect personalization, poor inferences, and forced connections.
  • Meticulous Attention to Detail: The ability to review Side-by-Side (SxS) model responses and spot subtle differences in naturalness and overnarrating.
  • Excellent Written Communication: Superior ability to write clear, concise, and structured rationales for model rankings, explicitly referencing specific turn numbers.
  • Feedback: Ability to provide constructive feedback and detailed annotations.
  • Communication: Excellent communication and collaboration skills.
  • Independence: Self-motivated and able to work independently in a remote setting.
  • Technical Setup: Desktop/Laptop set up with a good internet connection.


Description:

In this role, you will be part of a dynamic team focused on evaluating the quality of personalized AI interactions. Your day-to-day work will involve:

  • Designing and executing multi-turn conversational prompts (typically 1-5 turns) that require the AI to utilize your personal information and experiences.
  • Evaluating model responses based on your intent from the starting prompt, checking if the personalization was appropriately applied.
  • Analyzing responses for Grounding issues, ensuring claims about you are supported by evidence and not flawed inferences or hallucinations.
  • Assessing Integration quality to ensure personal data is woven naturally into the response without robotic "overnarrating".
  • Rigorously evaluating and stack-ranking two model responses side-by-side (SxS) to determine which is overall more helpful, easy to use, and enjoyable.
  • Writing clear, defensible rationales for your comparisons, explicitly referencing where issues or positive aspects occurred in the conversation.
  • Extracting and verifying "Debug Info" from the model to confirm that chat summaries and data sources were properly utilized.
  • Maintaining strict data hygiene by deleting evaluation conversations to prevent them from polluting your future chat history.


Education & Experience

  • BS/BA degree or equivalent experience in a relevant field (e.g., Policy, Law, Ethics, Linguistics, Journalism, Computer Science, or a related analytical field).
  • Experience in data annotation, AI quality evaluation, content moderation, or a related role is strongly preferred.

Offer Details:

  • Commitments Required: at least 4 hours per day and up to 40 hours per week, with 4 hours of overlap with PST.
  • Engagement type: Contractor
  • Engagement Length: 3 months
  • Our offered rate for this project is $15 per hour.

Evaluation Process:

  • Shortlisted candidates will be sent a Job Interest Form.
  • After the profile review, an assessment will be shared, which must be completed within 24 hours.
  • Based on the assessment outcomes, shortlisted candidates will be contacted to discuss the pre‑onboarding requirements.
Software
10K+ employees
Domain-Specific Languages
AI Engineer

About Turing


Based in San Francisco, California, Turing is the world’s leading research accelerator for frontier AI labs and a trusted partner for global enterprises deploying advanced AI systems. Turing supports customers in two ways: first, by accelerating frontier research with high-quality data, advanced training pipelines, plus top AI researchers who specialize in coding, reasoning, STEM, multilinguality, multimodality, and agents; and second, by applying that expertise to help enterprises transform AI from proof of concept into proprietary intelligence with systems that perform reliably, deliver measurable impact, and drive lasting results on the P&L.


Role Overview


We are looking for an AI/ML Engineer specializing in LLM post-training and reinforcement learning workflows. The role focuses on fine-tuning open-weight models, building reward systems, and improving model performance through scalable training, evaluation, and data curation.


What does day-to-day life look like?

  • Design and execute fine-tuning pipelines for open-weight models (Qwen, Llama, Mistral families) using SFT → DPO → GRPO progressions on tool-use and agentic data.
  • Implement and tune LoRA / QLoRA adapters for parameter-efficient fine-tuning; understand when full fine-tuning vs PEFT is the right call.
  • Build reward functions and verifiers for RL training, including programmatic verifiers, LLM-as-judge rubrics, and state-transition checks against gym environments.
  • Generate, curate, and filter RL tool-use training data: golden trajectories, preference pairs, on-policy rollouts, and rejection-sampled completions.
  • Run distributed training on multi-GPU setups; manage inference at scale with vLLM (including extended-context configurations via YaRN / RoPE scaling).
  • Diagnose failure modes: reward hacking, distribution collapse, KL blow-up, tool-selection errors vs state-transition errors, format drift.
  • Define and track evaluation metrics (pass@k, pass^k, trajectory-level scoring, rubric-based vs binary scoring) and own model-quality reporting against benchmarks.
  • Partner with annotation, eval, and client teams to translate data-quality signals into training improvements.

Requirements

  • 3+ years of hands-on ML engineering experience, with at least 1 year specifically on LLM post-training.
  • Demonstrated production or research experience with at least three of: SFT, LoRA/QLoRA, DPO, PPO, GRPO, RLHF.
  • Strong PyTorch fundamentals; working familiarity with Hugging Face TRL, Accelerate, DeepSpeed or FSDP, and vLLM.
  • Experience designing reward signals or verifiers for RL training, not just running training scripts.
  • Solid grasp of tokenization, attention, chat templates, tool-calling formats (OpenAI/Anthropic-style), and common failure modes in agent training.
  • Comfort with Python, distributed training, GPU profiling, and reading research papers and turning them into working code.

Strongly Preferred:


  • Experience training tool-use or agentic models (function calling, multi-step tool selection, planner-executor patterns).
  • Experience with synthetic data generation pipelines and rejection sampling.
  • Familiarity with MCP, LangChain/LangGraph, or similar agent frameworks.
  • Exposure to evals at scale: building harnesses, designing rubrics, dealing with judge variance and reward hacking.
  • Cloud/infra: RunPod, AWS, GCP; container workflows; long-context inference tuning.


Perks of Freelancing With Turing

  • Work in a fully remote environment.
  • Opportunity to work on cutting-edge AI projects with leading LLM companies.

Offer Details

  • Commitments Required: 40 hours per week with overlap of 4 hours with PST. 
  • Engagement Type: Contractor assignment (no medical/paid leave)
  • Duration of contract: 2 months (expected start date is next week)
  • Location: India, Pakistan, Bangladesh, Brazil

Evaluation Process

  • 2 rounds of Technical Interview (90 mins)
1-10 employees
Python, Machine Learning

Apply for the best jobs

View more openings
Turing books $87M at a $1.1B valuation to help source, hire and manage engineers remotely
Turing named one of America's Best Startup Employers for 2022 by Forbes
Ranked no. 1 in The Information’s "50 Most Promising Startups of 2021" in the B2B category
Turing named to Fast Company's World's Most Innovative Companies 2021 for placing remote devs at top firms via AI-powered vetting
Turing helps entrepreneurs tap into the global talent pool to hire elite, pre-vetted remote engineers at the push of a button

Work with the world's top companies

Create your profile, pass Turing Tests, and get job offers in as little as 2 weeks.