Datasets

Open Source Data for Drug Discovery

Download, analyze, and build upon our growing library of free biological datasets to advance your AI models and traditional drug discovery efforts.

We release curated datasets from our automated high-throughput platforms to accelerate drug discovery across multiple therapeutic modalities.

Functional Genomics

  • Chemical and genetic perturbation responses with multi-omic readouts

  • High-throughput transcriptomics using DRUG-seq (more than 10,000 genes per sample)

  • High-content imaging with cellular morphology measurements

  • Data from primary cells, iPSC-derived cells, and standard cell lines

Antibody Developability

  • Comprehensive biophysical characterization across multiple assays

  • Hydrophobicity, aggregation, stability, and self-association measurements

  • Benchmark antibodies with diverse properties

Functional Genomics

GDPx3

  • Release date: April 2025

  • Cell types: A549 cells (human non-small cell lung carcinoma epithelial cell line), aortic smooth muscle cells, dermal fibroblasts, and aortic endothelial cells

  • Perturbations: 48 compounds, 4 concentrations, 2 time points, 4 replicates

  • Readout: High-content imaging (Cell Painting) with 2200 x 2200 image dimensions

  • Plate density: 384-well plate

  • Available dataset size: 220 GB

Download GDPx3

GDPx2

  • Release date: December 2024

  • Cell types: Human melanocytes, aortic smooth muscle cells, dermal fibroblasts, and skeletal muscle myoblasts

  • Perturbations: 85 compounds, 6 concentrations, 4 replicates

  • Readout: Transcriptomic (DRUG-seq) with a sequencing depth of 2M reads

  • Plate density: 384-well plate

  • Available dataset size: ~200 GB per cell type

Download GDPx2

GDPx1

  • Release date: September 2024

  • Cell types: A549 cells (human non-small cell lung carcinoma epithelial cell line)

  • Perturbations: 1,264 compounds, 2 concentrations, 4 replicates

  • Readout: Transcriptomic (DRUG-seq) with a sequencing depth of 2M reads

  • Plate density: 384-well plate

  • Available dataset size: 60 GB

Download GDPx1
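As a rough sense of scale, the perturbation designs listed above can be multiplied out into well counts. The sketch below assumes each design is fully crossed (every compound at every concentration, time point, and replicate) and ignores control wells and any multiplication by cell type, none of which the listing specifies. The function names are illustrative only.

```python
import math

WELLS_PER_PLATE = 384  # plate density stated for all three datasets


def treated_wells(compounds, concentrations, replicates, time_points=1):
    """Wells needed for a fully crossed design (assumption: controls not counted)."""
    return compounds * concentrations * time_points * replicates


def min_plates(wells, wells_per_plate=WELLS_PER_PLATE):
    """Lower bound on plate count if every well held a treated sample."""
    return math.ceil(wells / wells_per_plate)


designs = {
    "GDPx3": treated_wells(48, 4, 4, time_points=2),
    "GDPx2": treated_wells(85, 6, 4),
    "GDPx1": treated_wells(1264, 2, 4),
}

for name, wells in designs.items():
    print(f"{name}: {wells} treated wells, at least {min_plates(wells)} plates")
```

Under these assumptions the designs come to 1,536, 2,040, and 10,112 treated wells respectively. Real plate counts would be higher once controls and multiple cell types are included.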

Antibody Developability

GDPa1

Download GDPa1

PREPRINT

Read the preprint with method details, results summary, and discussion

DATA PACKAGE

  • 246 IgGs were characterized by the biophysical assays listed below.

  • Readouts are reported for every antibody that was tested in a given assay and passed that assay's quality control criteria.

  • Data are provided in a tidy data format.
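A tidy layout keeps one measurement per row, so antibody and assay combinations that were not tested or did not pass quality control are simply absent. The sketch below uses only the Python standard library to pivot such rows into one record per antibody. It assumes one value per antibody and assay (for example, after averaging replicates). The column names (`antibody`, `assay`, `value`) and the sample values are hypothetical, not the headers or data in the GDPa1 file.

```python
from collections import defaultdict

# Hypothetical tidy rows: one measurement per row (made-up values).
tidy_rows = [
    {"antibody": "mAb-001", "assay": "HIC", "value": 2.5},
    {"antibody": "mAb-001", "assay": "SEC", "value": 98.0},
    {"antibody": "mAb-002", "assay": "HIC", "value": 3.1},
    # mAb-002 has no SEC row, e.g. not tested or did not pass QC.
]


def pivot_wide(rows):
    """Return {antibody: {assay: value}}, with None for missing combinations."""
    assays = sorted({r["assay"] for r in rows})
    # Each new antibody starts with every assay set to None.
    wide = defaultdict(lambda: dict.fromkeys(assays))
    for r in rows:
        wide[r["antibody"]][r["assay"]] = r["value"]
    return dict(wide)


wide = pivot_wide(tidy_rows)
for antibody, readouts in wide.items():
    print(antibody, readouts)
```

Keeping missing combinations as explicit `None` values makes it easy to tell "not measured" apart from a measured value of zero.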

ANTIBODY PRODUCTION

Antibodies were expressed in HEK293F cells and purified by Protein A chromatography before developability assessment in all assays. Antibodies tested by DLS-kD underwent an additional SEC polishing step. A smaller subset of antibodies (20 IgGs) was produced in ExpiCHO cells and purified by Protein A chromatography.

DEVELOPABILITY ASSAYS

  1. Titer by Valita

  2. Purity by rCE-SDS

  3. Aggregation by SEC

  4. Thermostability by nanoDSF and DSF

  5. Colloidal stability by SMAC

  6. Hydrophobicity by HIC

  7. Heparin binding by HAC

  8. Self-association by AC-SINS

  9. Self-association by DLS-kD

  10. Polyreactivity by bead-based method against CHO SMP and ovalbumin

DATASHEETS IN DATA FILE

  • Definitions of column headers in other datasheets

  • Antibody sequences

  • Assay data in “tidy data” format with one row per replicate

  • Assay data summary statistics with average, standard deviation, and number of replicates for each assay

  • Data for nanoDSF vs. DSF with the same ramp rate in “tidy data” format

  • Previously published literature data, which the associated preprint compares with GDPa1 data
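To illustrate how a one-row-per-replicate sheet relates to a summary sheet, the sketch below groups replicate rows by antibody and assay and computes the mean, standard deviation, and replicate count. The row layout and values are hypothetical, and the choice of sample standard deviation is an assumption; the GDPa1 file may use different headers or a different convention.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical replicate-level rows: (antibody, assay, value), made-up values.
replicate_rows = [
    ("mAb-001", "HIC", 2.4),
    ("mAb-001", "HIC", 2.6),
    ("mAb-001", "HIC", 2.5),
    ("mAb-002", "HIC", 3.1),
]


def summarize(rows):
    """Group by (antibody, assay); compute mean, sample SD, and replicate count."""
    groups = defaultdict(list)
    for antibody, assay, value in rows:
        groups[(antibody, assay)].append(value)
    summary = {}
    for key, values in groups.items():
        summary[key] = {
            "mean": mean(values),
            # Sample SD is undefined for a single replicate, so report None.
            "sd": stdev(values) if len(values) > 1 else None,
            "n": len(values),
        }
    return summary


summary = summarize(replicate_rows)
for (antibody, assay), stats in summary.items():
    print(antibody, assay, stats)
```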

Tell us what data you need. We’ll generate a large antibody developability dataset to power your AI model.

Need data? Let's talk.
