Recur is hiring a

Senior Scientific Data Engineer, Data Platform

London, England; Milton Park, England
Posted 2 months ago

Job Description

Your work will change lives. Including your own. 

Recursion is decoding biology to industrialize drug discovery. We are looking for a Senior Scientific Data Engineer. As part of a team, you will own a suite of business-critical data products, including our Structure-Activity Relationship (SAR) data mart.

This is a high-impact role requiring a strong synthesis of robust software engineering capabilities and deep drug discovery domain expertise. You will take ownership of the data architecture responsible for ingesting, standardizing, and serving both public and proprietary datasets. These systems directly power our competitor intelligence, chemical tractability assessments, and compound design models.

Please note: This is a specialized Data Engineering position focused strictly on data infrastructure and product ownership. While your work will directly enable our machine learning and predictive modeling efforts, the responsibilities do not encompass building or training models. This opportunity is ideally suited for engineers dedicated to architecting complex scientific data systems, rather than data scientists seeking modeling-focused roles.

The Systems You Will Own

You will join the Data Platform team and maintain an ecosystem of ~100 ingested datasets, while taking specific ownership of high-value products including:

  • Flagship SAR Data Mart: A unified bioactivity warehouse merging commercial and public (e.g., ChEMBL) databases with internal assay data.
  • Commercial Vendor Data Mart: A massive catalog of purchasable compounds used to guide our internal compound design tools and tractability assessments.
  • Biomedical Knowledge Graph: The critical data feeds and infrastructure that power our semantic graph and associated AI agents, linking targets, diseases, and compounds.
  • Chemical Synthesis Data: The foundational dataset of chemical reactions used for training retrosynthesis models and tractability prediction.
  • Patent Intelligence System: A pipeline transforming patent feeds and competitor data into actionable intelligence.
  • Compound Standardization Registry: A large-scale chemical structure warehouse ensuring consistency across billions of compounds (similar to UniChem).

What You’ll Do

  • Pipeline Ownership at Scale: Act as a key owner for our core bioactivity pipeline, processing 75M+ records and managing ~100 distinct data feeds. You will navigate intricate logic and orchestration, including managing 4,000+ lines of SQL across 20+ transformation steps.
  • Scientific Data Standardization: Resolve ambiguity by reconciling heterogeneous data formats from diverse commercial and public sources. You will design and implement logic to standardize chemical structures (SMILES, InChI, tautomers), biological targets (UniProt mapping, gene families, species homology), and assay data (IC50/Ki normalization, unit conversion).
  • Engineer for Distributed Compute: Optimize tasks using Python and Snowpark for heavy-lifting operations, such as large-scale text mining (extracting dose/concentration from unstructured text) and molecular property calculation.
  • Drive Data Quality: Implement rigorous data quality frameworks (DQF) to handle the nuance of biological data, ensuring our downstream models are trained on clean, semantic-aware data.
  • Cross-Functional Consulting: Interface directly with discovery scientists to understand their diverse data needs and translate complex scientific requirements into robust engineering solutions.

The Experience You’ll Need

  1. Core Engineering:
  • Advanced SQL & Warehousing: Deep expertise in modern cloud data warehousing (e.g. Snowflake, BigQuery). You should be comfortable with complex window functions, CTEs, and schema design for multi-layer environments.
  • Python & Distributed Compute: Strong proficiency in Python for data processing. Experience with data warehouses is a huge plus, but general distributed processing experience is also valuable.
  • Orchestration: Experience managing complex DAGs and asynchronous task coordination (e.g. Prefect, Argo Workflows).
  2. Domain Expertise:
  • Medicinal Chemistry Context: You understand how chemistry is represented in data (SMILES, scaffolds) and the nuance of bioactivity measurements (potency vs. efficacy, IC50 vs. pXC50).
  • Biological Context: Familiarity with gene/protein families, species homology, and target nomenclature (e.g., how similar genes appear in different species).
  • Assay Knowledge: Ability to distinguish between assay types (e.g., binding, functional), formats, and the units/measurements associated with them. Ideally familiar with ontologies (e.g., BioAssay Ontology, cell line taxonomies).
  • Data Landscape: Knowledge about public drug discovery datasets and how they can be used to support the drug discovery pipeline.
  3. Nice-to-Haves:
  • Experience with chemical toolkits (e.g. OpenEye or RDKit).
  • Experience using text mining or LLMs for structured data extraction from scientific text.

Working Location & Compensation:

This position can be based at either our London or Milton Park office. Please note that we are a hybrid environment and ask that employees spend 50% of their time in the office.

At Recursion, we believe that every employee should be compensated fairly. Based on the skill and level of experience required for this role, the estimated current annual base range for this role is £75,900 - £101,900. You will also be eligible for an annual bonus and equity compensation, as well as a comprehensive benefits package.

The Values We Hope You Share:

  • We act boldly with integrity. We are unconstrained in our thinking, take calculated risks, and push boundaries, but never at the expense of ethics, science, or trust. 
  • We care deeply and engage directly. Caring means holding a deep sense of responsibility and respect - showing up, speaking honestly, and taking action.
  • We learn actively and adapt rapidly. Progress comes from doing. We experiment, test, and refine, embracing iteration over perfection.
  • We move with urgency because patients are waiting. Speed isn’t about rushing but about moving the needle every day.
  • We take ownership and accountability. Through ownership and accountability, we enable trust and autonomy—leaders take accountability for decisive action, and teams own outcomes together. 
  • We are One Recursion. True cross-functional collaboration is about trust, clarity, humility, and impact. Through sharing, we can be greater than the sum of our individual capabilities.

Our values underpin the employee experience at Recursion. They are the character and personality of the company demonstrated through how we communicate, support one another, spend our time, make decisions, and celebrate collectively.

More About Recursion

Recursion (NASDAQ: RXRX) is a clinical stage TechBio company leading the space by decoding biology to radically improve lives. Enabling its mission is the Recursion OS, a platform built across diverse technologies that continuously generate one of the world’s largest proprietary biological and chemical datasets. Recursion leverages sophisticated machine-learning algorithms to distill from its dataset a collection of trillions of searchable relationships across biology and chemistry unconstrained by human bias. By commanding massive experimental scale — up to millions of wet lab experiments weekly — and massive computational scale — owning and operating one of the most powerful supercomputers in the world, Recursion is uniting technology, biology and chemistry to advance the future of medicine.

Recursion is headquartered in Salt Lake City, where it is a founding member of BioHive, the Utah life sciences industry collective. Recursion also has offices in Toronto, Montréal, New York, London, Oxford area, and the San Francisco Bay area. Learn more at www.Recursion.com, or connect on X (formerly Twitter) and LinkedIn.

Recursion is an Equal Opportunity Employer.  All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, age, disability, veteran status, or any other characteristic protected under applicable federal, state, local, or provincial human rights legislation. 

Accommodations are available on request for candidates taking part in all aspects of the selection process.


Recruitment & Staffing Agencies: Recursion Pharmaceuticals and its affiliate companies do not accept resumes from any source other than candidates. The submission of resumes by recruitment or staffing agencies to Recursion or its employees is strictly prohibited unless contacted directly by Recursion’s internal Talent Acquisition team. Any resume submitted by an agency in the absence of a signed agreement will automatically become the property of Recursion, and Recursion will not owe any referral or other fees. Our team will communicate directly with candidates who are not represented by an agent or intermediary unless otherwise agreed to prior to interviewing for the job.