Fragalysis User Guide
Fragalysis is a web-based platform for the visualisation, comparison, and analysis of fragment-bound protein crystal structures, assay measurements, and follow-up virtual ligand screens. It can effectively be divided into:
Experimental fragment screening data processed via XChemAlign and uploaded to Fragalysis, and affinity data collected using the Creoptix WAVEsystem and fit using either the vendor’s software or SensoFit. These data can be curated and downloaded via the “left-hand side” (LHS) of Fragalysis.
Computed follow-up designs from virtual compound sets uploaded to Fragalysis, curated and downloaded via the “right-hand side” (RHS) of Fragalysis.
Getting started
Fragalysis can be used to explore data in a number of ways:
Experimental Structures (LHS)
Computed Structures (RHS)
Jupyter Notebooks
Programming Interface (API)
The Fragalysis “viewer” interface
The Fragalysis viewer has been customised for fragment screening workflows, is fully interactive and runs directly in your browser. When opening a target, you will be presented with an interface that allows you to interact with and curate data:
Share/snapshot: This allows you to create and share a permanent link to your exact Fragalysis state.
Tags: This is how you control which hits are visible, by site and other categories.
LHS / Hits: Here you can navigate all the hits and add visualisations to them (the Tags panel also belongs to the LHS).
The visualisation buttons are also shared with virtual hits (RHS) and work as follows:
All: Ligand (CPK), protein side chains (lines), and interactions
Ligand: Ligand (CPK)
Protein: Protein side chains (lines)
Interactions: Interactions
Surface: Electrostatic surface of the protein
Electron Density: Experimental electron density
Vectors: Possible vectors for elaboration
Controlling the 3D viewer
Fragalysis uses NGL viewer under the hood to visualise 3D models, inspect binding sites, and compare multiple structures at once. It can be easily controlled with mouse and keyboard inputs:
| Key / Mouse action | Effect |
|---|---|
| scroll | Zoom scene |
| scroll + Ctrl | Move near clipping plane |
| scroll + Shift | Move near clipping plane and far fog |
| scroll + Alt | Change isolevel of isosurfaces |
| drag right | Pan / translate scene |
| drag middle | Zoom scene |
| drag left | Rotate scene |
| drag + Shift + right | Zoom scene |
| drag left + right | Zoom scene |
| drag + Ctrl + right | Pan / translate hovered component |
| drag + Ctrl + left | Rotate hovered component |
| click pick (middle) | Auto view picked component element |
| hover pick | Show tooltip for hovered component element |
| i | Toggle stage spinning |
| k | Toggle stage rocking |
| p | Pause all stage animations |
| r | Reset stage auto view |
Geometric filtering
Geometric filtering allows you to limit hits based on their position in 3D space. When you click any structure in the NGL Viewer, a green semi-transparent sphere will appear. After clicking Apply in the Radius selection dialog, only hits that intersect with the sphere will be shown in the hit navigator.
This feature is ON by default and can be toggled in the Advanced Search dialog. If you turn it off, any existing geometric filtering is cleared and no spatial filtering will be applied.
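The idea behind the sphere filter can be sketched in a few lines of Python. This is only an illustration, not the Fragalysis implementation: it assumes a hit "intersects" the sphere when at least one of its atoms lies within the radius of the sphere centre, and the function name, hit names, and coordinates are all made up.

```python
import math


def hits_in_sphere(hits, centre, radius):
    """Return the names of hits with at least one atom inside the sphere.

    ASSUMPTION: "intersects" means any atom within `radius` of `centre`;
    the real viewer may use a different criterion.
    """
    selected = []
    for name, atoms in hits.items():
        if any(math.dist(atom, centre) <= radius for atom in atoms):
            selected.append(name)
    return selected


# Made-up atom coordinates (in Å), for illustration only.
hits = {
    "hit_a": [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)],
    "hit_b": [(10.0, 0.0, 0.0), (11.5, 0.0, 0.0)],
    "hit_c": [(4.0, 3.0, 0.0)],
}
print(hits_in_sphere(hits, centre=(0.0, 0.0, 0.0), radius=5.0))
```

With a 5 Å sphere at the origin, `hit_b` is excluded because both of its atoms are more than 5 Å from the centre.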
Browsing experimental data (LHS)
The left-hand-side (LHS) user interface of Fragalysis allows you to select experimental data for display, download, and computation.
There are three panels: Tag Details, Hit Navigator, and Snapshot.
Tag Details
Tags are user-defined labels used to organise, filter, and annotate structures and fragments within Fragalysis. They provide a flexible way to group related data and share interpretation without modifying the underlying experimental data. This section details how tags can be used to filter experimental data. To add and edit tags see the curating experimental (LHS) data page.
All tags assigned to left-hand side data can be managed in this panel:
Select tags to show datasets assigned to that tag in the Hit navigator. The union / intersection toggle at the top of the page can be used to determine the behaviour when multiple tags are selected:
Union: Display datasets that are tagged with at least one of the selected tags
Intersection: Display datasets that are tagged with all of the selected tags
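The difference between the two modes maps directly onto set operations. The sketch below is illustrative only; the function name, dataset codes, and tag names are hypothetical and not taken from Fragalysis.

```python
def filter_by_tags(dataset_tags, selected_tags, mode="union"):
    """Return dataset names matching the selected tags.

    mode="union": the dataset carries at least one selected tag.
    mode="intersection": the dataset carries every selected tag.
    """
    selected = set(selected_tags)
    if mode == "union":
        matches = lambda tags: bool(tags & selected)
    elif mode == "intersection":
        matches = lambda tags: selected <= tags
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return [name for name, tags in dataset_tags.items() if matches(set(tags))]


# Hypothetical datasets and tags, for illustration only.
dataset_tags = {
    "A0310a": {"Site 1", "Reviewed"},
    "A0420a": {"Site 1"},
    "A0555a": {"Site 2", "Reviewed"},
}
print(filter_by_tags(dataset_tags, ["Site 1", "Reviewed"], mode="union"))
print(filter_by_tags(dataset_tags, ["Site 1", "Reviewed"], mode="intersection"))
```

In this toy example, union keeps all three datasets, while intersection keeps only the one tagged with both "Site 1" and "Reviewed".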
| Control | Description |
|---|---|
| SHOW UNTAGGED HITS | Displays only datasets that do not have any tags assigned. Useful for identifying new or unreviewed fragment hits. |
| SHOW ALL HITS | Displays all datasets, ignoring the current tag selection. Overrides any active tag filters. |
| SELECT ALL TAGS | Selects all available tags in the tag list. Does not automatically select hits in the hit navigator. |
| SELECT HITS (per tag) | Activates selection checkboxes for all datasets associated with the chosen tag. Useful for bulk operations on tagged datasets. |
Snapshots
Snapshots are saved views of the current analysis state, allowing you to quickly return to a specific set of selected hits and visualisation settings, and making it easy to share or revisit a particular analysis.
Creating direct URLs to specific views
To link to specific datasets within a target, the following syntax is supported:
Specifying the target and proposal
The following URL takes you to the target with:
name: A71EV2A
target access string (tas): lb32627-66
https://fragalysis.diamond.ac.uk/viewer/react/preview/target/A71EV2A/tas/lb32627-66
Using the direct URL syntax
You can also create URLs that display specific datasets. To use this functionality you have to use this base URL including the direct command:
https://fragalysis.diamond.ac.uk/viewer/react/preview/direct/
Examples
Showing observations with ligands where the compound alias contains the substring ASAP:
target/A71EV2A: specifies the target name
tas/lb32627-66: specifies the target access string
compound/ASAP/L: shows the ligand (L) representation for all compound aliases containing ASAP
Showing observations with ligands where the compound alias is exactly ASAP-0016733-001:
target/A71EV2A: specifies the target name
tas/lb32627-66: specifies the target access string
compound/ASAP-0016733-001/L/exact: shows the ligand (L) representation for compound aliases that exactly match ASAP-0016733-001
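A small helper can assemble such URLs from their parts. This is a sketch under one assumption that the text above does not state outright: that the segments are appended to the direct base URL in the order listed (target, tas, compound). The function name `direct_url` is hypothetical.

```python
from urllib.parse import quote

# Base URL for the "direct" command, as given in this guide.
BASE = "https://fragalysis.diamond.ac.uk/viewer/react/preview/direct/"


def direct_url(target, tas, compound, representation="L", exact=False):
    """Build a direct URL for one compound alias query.

    ASSUMPTION: path segments follow the base URL in the order
    target / tas / compound, as in the examples listed above.
    """
    parts = ["target", target, "tas", tas, "compound", compound, representation]
    if exact:
        parts.append("exact")
    # Percent-encode each segment so unusual characters cannot break the path.
    return BASE + "/".join(quote(part, safe="") for part in parts)


print(direct_url("A71EV2A", "lb32627-66", "ASAP"))
print(direct_url("A71EV2A", "lb32627-66", "ASAP-0016733-001", exact=True))
```

If a generated link does not open the expected view, check the segment order against a URL that is known to work for your target.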
Downloading experimental data (LHS)
You can download experimental structures and affinity data (if present) directly from the Fragalysis UI. At the top of the Fragalysis viewer interface, you will see a download button:
This will open the download interface. By default, the download will select “All structures”, “Incremental”, “Single SDF of all ligands” and “Computed compound sets”, but there are various selections that allow you to customise your download:
| Option | What it does |
|---|---|
| Subset selection | |
| All structures | Downloads every aligned structure available for the target |
| Structures displayed in the 3D display | Downloads only aligned structures currently visible in the viewer |
| Structures selected in the Hit Navigator | Downloads only aligned structures you’ve explicitly selected in the hit navigator |
| Structures associated with the active tags | Filters aligned structures based on active annotation tags and downloads those |
| Map files (re-aligned to reference) | |
| PanDDA Event maps | PanDDA output highlighting ligand-binding events; best for detecting signal over noise |
| Conventional inspection maps | 2Fo-Fc electron density maps used for model building and validation |
| Conventional residual maps | Fo-Fc difference maps showing unmodelled or incorrectly modelled density |
| Transformations applied for alignments | Alignment matrices used to superpose structures/maps onto a reference frame |
| Crystallographic files | |
| Coordinate files (not re-aligned) | Atomic coordinates in their original reference frame (not aligned) |
| Reflections and map coefficients | Structure factor data and map coefficients used for map calculation and refinement |
| Ligand definitions and geometry restraints | Restraints and chemical definitions needed for ligand refinement |
| Real-space map files | Maps in real-space format (large files; often unnecessary unless specifically needed) |
| Version of data stored in permalink | |
| Incremental (always up-to-date) | Link always reflects the latest dataset as new structures are added |
| Preserved (snapshot) | Fixed dataset frozen at the current state; reproducible and unchanging |
| Other | |
| Single SDF of all ligands | One file containing all ligand structures |
| Computed compound sets | Includes computed ligand sets |
| SoakDB CSV and SQLite files | Metadata database containing experiment details, useful for large-scale analysis |
After selecting what files you want, select “Prepare download” to zip your files. Once this is complete (be patient, this can take a few minutes) the “Download” and “Copy permalink” buttons will no longer be greyed out, and a green “Download is ready!” indicator will appear, allowing you to commence the download. These download options, as well as the others available are explained here:
| Option | What it does | When to use it |
|---|---|---|
| Prepare download | Packages your selected files into a | Always use this option if downloading a |
| Copy permalink (prepare download first) | Copies a persistent URL that encodes all your current selections | Sharing datasets or saving your exact selection for later |
| Download (prepare download first) | Once prepared, this immediately downloads the dataset with your current selections | Use when you’re ready to download the data locally |
| (For coders) Copy JSON for API call | Copies a structured JSON representation of your selection for programmatic access | Scripting workflows, automation, or pipeline integration |
| Show Examples | Opens example usage GitHub page | If you need useful example / template code. |
Affinity Data
If a data release includes affinity data, the download will contain an affinity_files subdirectory inside the extra_files directory, typically arranged as follows:
all_affinity_data.csv
all_affinity_data.sdf
creoptix_raw_data.zip
README.md
README.pdf
sensofit_package_data.zip
sensofit_walkthough.ipynb
All affinity data
The all_affinity_data.csv file gathers all the values from manual kinetics evaluations using Creoptix software. Below is a description of all the fields in the CSV:
“Run date”: date when the experiment was performed
“Cycle number”: ID of the cycle (defined by Creoptix during the experiment)
“Protein concentration (μg/mL)”: concentration in μg/mL the protein was captured at
“Channel”: which channel the signal comes from (formatted as Ch Y-X, where X is the ID of reference channel and Y is the ID of the active channel)
“Sample type”: whether the analyte was a control or a sample
“ASAP IDs”: ASAP ID of the analyte
“OpenBind IDs”: OpenBind ID of the analyte
“SMILES”: CxSMILES of the analyte with enhanced stereochemistry
“Sample concentration (M)”: concentration in M of the analyte used for the experiment
“ka (M-1s-1)”: association, or on-rate, constant in M-1.s-1 estimated by Creoptix
“ka error (%)”: the 95% confidence interval error of ka expressed as a percentage of the estimated ka value
“kd (s-1)”: dissociation, or off-rate, constant in s-1 estimated by Creoptix
“kd error (%)”: the 95% confidence interval error of kd expressed as a percentage of the estimated kd value
“KD (M)”: equilibrium dissociation (binding affinity) constant in M ($K_D = k_d / k_a$)
“Rmax (pg/mm2)”: maximum signal response of the sensorgram in pg/mm2 for the analyte (estimated by Creoptix)
“Sqrt(Chi2)”: square root of Chi2 (goodness-of-fit metric, lower = better/closer to 1 = better?)
“Comments”: comments of trained experimentalist on the sensorgram/fit for the analyte
“Used in analysis”: boolean flag indicating whether the data passed all curation criteria (see the table below) and was therefore used in the ML analysis (True), or failed (False):
| Criteria | Values |
|---|---|
| Removing boundary fits (i.e. where Creoptix couldn’t fit) | |
| Removing large CI errors | |
| Removing “bad” goodness-of-fit metrics | |
| Removing low maximum signal response estimation | |
Compounds that passed all criteria but had no associated structures were not used in analysis and were flagged “False”.
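As a rough illustration of working with these fields, the sketch below keeps only rows flagged as used in analysis and recomputes $K_D = k_d / k_a$ from the rate constants. It assumes the CSV column headers match the field names listed above exactly and that the flag is stored as the text True/False; check both against your own file. The sample rows are invented and are not real measurements.

```python
import csv
import io


def load_curated(handle):
    """Return curated rows, with KD recomputed as kd / ka.

    ASSUMPTION: column headers match the field names in this guide and the
    "Used in analysis" flag is written as the text "True" or "False".
    """
    curated = []
    for row in csv.DictReader(handle):
        if row["Used in analysis"].strip().lower() != "true":
            continue
        ka = float(row["ka (M-1s-1)"])
        kd = float(row["kd (s-1)"])
        curated.append({"id": row["ASAP IDs"], "ka": ka, "kd": kd, "KD": kd / ka})
    return curated


# Invented example rows; real files contain many more columns.
sample = io.StringIO(
    "ASAP IDs,ka (M-1s-1),kd (s-1),Used in analysis\n"
    "EXAMPLE-1,1.0e5,0.5,True\n"
    "EXAMPLE-2,2.0e4,1.0,False\n"
)
rows = load_curated(sample)
print(rows)
```

For a real download, replace the in-memory sample with `open("all_affinity_data.csv", newline="")`.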
The all_affinity_data.sdf file is an SDF version of the CSV file generated using RDKit's Chem.SDWriter. It contains exactly the same information as the CSV file.
Creoptix raw data
The creoptix_raw_data.zip archive contains all 4 Creoptix experiments used to generate the data. They can be read using Creoptix software or SensoFit (more details about SensoFit below).
SensoFit package data
sensofit_package_data.zip is the compressed “package” of the 4 Creoptix experiments, exported into a more accessible format using SensoFit's export function (please refer to the GitHub repo). For the first OpenBind data release, this is the input used by an additional file, sensofit_walkthough.ipynb. This is a Jupyter Notebook that walks you through the affinity data of the first OpenBind data release using our open-source Python tool SensoFit.
To use this notebook, please follow the steps below.
Clone the GitHub repo, and cd to the root:
git clone https://github.com/xchem/sensofit
cd sensofit
Create a new conda environment, activate it, then install the package:
conda create -n sensofit python=3.11
conda activate sensofit
pip install -e .
cd to the affinity_files directory of the download and start JupyterLab (change the path below to the actual location of your affinity_files directory):
cd /path/to/data-release/download/extra_files/affinity_files/
jupyter lab
A browser tab should open with the default JupyterLab home page. sensofit_walkthough.ipynb should be available in the file browser on the left. Double-click on the file and the notebook should open. You can run all the cells at once, or follow the instructions in the notebook cell by cell.
Interpreting the download
A Fragalysis download will contain a minimum of 2 directories, aligned_files and crystallographic_files. The download will typically include the additional directories extra_files, scripts and yaml_files, as well as some additional files in the top-level directory.
Two important top-level files are metadata.csv and smiles.smi. These are both plain-text files. metadata.csv contains information about the context of each ligand and may provide a convenient way to browse through SMILES, site labels and PDB codes for each ligand. smiles.smi contains a list of all SMILES strings that you have downloaded, separated by commas. [target-name]_combined.sdf may also be present; it contains all the ligand SDF files in a single SDF file.
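After unzipping, a quick check of the layout described above can catch incomplete downloads. The following is a minimal sketch: the directory and file names come from this guide, while the function name and the returned dictionary keys are purely illustrative.

```python
from pathlib import Path

# Directory names as described in this guide.
REQUIRED_DIRS = ("aligned_files", "crystallographic_files")
OPTIONAL_DIRS = ("extra_files", "scripts", "yaml_files")


def summarise_download(root):
    """Check the expected layout of an unzipped download and summarise it."""
    root = Path(root)
    missing = [name for name in REQUIRED_DIRS if not (root / name).is_dir()]
    if missing:
        raise FileNotFoundError(f"missing required directories: {missing}")
    return {
        # Optional directories that happen to be present.
        "optional": [name for name in OPTIONAL_DIRS if (root / name).is_dir()],
        # One subdirectory per downloaded dataset.
        "datasets": sorted(
            path.name for path in (root / "aligned_files").iterdir() if path.is_dir()
        ),
        "has_metadata": (root / "metadata.csv").is_file(),
        "has_smiles": (root / "smiles.smi").is_file(),
    }
```

Call it as `summarise_download("/path/to/unzipped/download")`, replacing the placeholder path with your own.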
Aligned directory
The aligned directory contains a subdirectory for each dataset that was selected for downloading, aligned to a common reference through XChemAlign processing, as they appear in the viewer interface. Depending on your selection of options when downloading the data, the following file suffixes may be present:
⚠️ IMPORTANT
.ccp4 maps are optimised to work with NGL viewer.
If viewing in PyMOL or COOT, files that align with the XCA aligned model have the suffix _crystallographic.ccp4.
| File pattern | Description |
|---|---|
| | Full atomic model. Protein, ligand, and water/ion/buffer molecules |
| | Protein model only. Ligand and water/ion/buffer molecules removed |
| | Water/ion/buffer molecules only. Protein and ligand molecules removed |
| | Protein and solvent/ion/buffer molecules. Ligand molecules removed |
| | PanDDA event electron density map cut to around 12 Å around the ligand |
| | 2mFo-DFc σA-weighted map cut to around 12 Å around the ligand |
| | mFo-DFc σA-weighted difference map cut to around 12 Å around the ligand |
| | PanDDA event electron density map cut to around 12 Å around the ligand |
| | 2mFo-DFc σA-weighted map cut to around 12 Å around the ligand |
| | mFo-DFc σA-weighted difference map cut to around 12 Å around the ligand |
| | Ligand structure in PDB format |
| | Ligand structure in SDF format |
| | Ligand structure in SMILES format |
Crystallographic directory
The crystallographic_files directory contains versions of the data found in the aligned folder prior to XChemAlign processing. Depending on your selection of options when downloading the data, the following file suffixes may be present:
| File pattern | Description |
|---|---|
| | Full atomic model. Protein, ligand(s), and solvent/ion/buffer molecules |
| | Reflection data corresponding to the PDB file |
| | Ligand structure in CIF format |
Extra files
If affinity data is available for a target, an affinity_files subdirectory containing all the affinity data will be available. A README inside this subdirectory will explain every file in detail.
If the SoakDB CSV and/or SQLite option(s) have been selected, their corresponding files can be found in this directory.
Beyond this, any other files in the extra_files directory will have been added by the uploader of the data and therefore have no defined structure. We cannot predict what these files contain, but we hope that the uploader will have provided a README describing them.
Some examples of extra files:
| File pattern | Description |
|---|---|
| | Subdirectory containing affinity data for specific datasets |
| | SoakDB file in SQLite format |
| | SoakDB file in CSV format |
| | Target sequence in FASTA format |
Curating experimental data (LHS)
XChemAlign transforms crystallographic data into a biological reference frame. This involves matching ligand neighbourhoods across crystal forms, assemblies, and chains onto appropriate reference structures. This process generates various sites, which make their way into Fragalysis via tags. All tag information is also included in the metadata.csv in any Fragalysis download.
Indicating merging hypotheses
For Fast Forward Fragments it is required to create one Curator Tag for each group of fragments that you wish to explore merging. These can often just be all hits in the pockets of interest.
Indicating experimental / model quality
The experimental / model quality can be indicated using the traffic light system:
Each observation will have a Main Status. Your project should decide who has the final say on this; typically there is one main data owner / structural biologist. All other members are recommended to add only Peer Reviews, which carry both a status and a comment.
Uploading assay measurements or computed scores (LHS)
Fragalysis supports annotation of experimental data with text or numeric scores that are linked either to compound codes or observation short codes.
Warning
Do not upload any assay data to a public target that is confidential! Measurements against compounds that do not (yet) have structures will still be accessible to authorised API users.
Creating the assay data CSV
Create a CSV with:
one identifier column, containing either compound codes or observation short codes
as many text/numeric columns as you want
The data type of columns can optionally be specified by an additional row containing text, int, or float.
Please note that CDD data can be exported as a CSV and can often be uploaded with minimal manual modification.
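The CSV layout described above can be generated with the standard library. This sketch makes two assumptions that the guide does not confirm: that the optional type row sits directly beneath the header row, and that the identifier column's cell in that row is given as text. The function name, column names, and values are hypothetical.

```python
import csv
import io

ALLOWED_TYPES = {"text", "int", "float"}


def build_assay_csv(id_column, columns, rows, with_types=True):
    """Return CSV text with one identifier column plus typed data columns.

    columns: mapping of column name -> "text" | "int" | "float".
    rows: iterable of (identifier, {column name: value}) pairs.
    """
    unknown = set(columns.values()) - ALLOWED_TYPES
    if unknown:
        raise ValueError(f"unsupported column types: {sorted(unknown)}")
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([id_column, *columns])
    if with_types:
        # ASSUMPTION: the type row follows the header, and the identifier
        # column is declared as text. Verify against the upload form.
        writer.writerow(["text", *columns.values()])
    for identifier, values in rows:
        writer.writerow([identifier, *(values[name] for name in columns)])
    return buffer.getvalue()


# Hypothetical column names and values, for illustration only.
print(build_assay_csv(
    "observation",
    {"IC50_uM": "float", "note": "text"},
    [("A0310a", {"IC50_uM": 12.5, "note": "example"})],
))
```

Pass `with_types=False` to omit the optional type row.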
Uploading
Log in and open your target of interest
Select Assay data upload from the menu
Complete the form:
Modifying data type of existing data column
Use the /api/activity_data_curation/ endpoint to change data types of previously uploaded scores
Browsing virtual compound sets (RHS)
Overview of the RHS interface
The right hand side (RHS) is where follow-up designs and their virtual hits are navigated. Follow-up designs are grouped into compound sets, corresponding to each SDF that was uploaded (See Uploading compound sets to the RHS).
Inspirations
The F button on each compound can be used to bring up a modal with the experimental hits used as inspirations / references for the compound design. The same LHS visualisation buttons are available to superimpose the inspiration hit with the follow-up design. When an experimental dataset is displayed, all virtual designs referencing that ligand will have their F icon active.
Sorting and filtering
Clicking on the filter icon allows you to sort and filter the compounds by properties present in the uploaded set. Typically you will find scores such as energy_score representing computed binding energy, distance_score representing RMS distance to the fragment inspirations, and score_inspiration, which may indicate how well the fragment references have been recapitulated:
Curating virtual compound sets (RHS)
Colours / painting
You can paint compounds with colours that can be renamed, e.g. “Yes”, “No”, “Maybe”:
These labels will be assigned to compounds in your session and can be downloaded as a CSV in the “selected compounds” tab
Warning
The state of the Fragalysis RHS does not persist when you refresh or otherwise leave the page. To export a copy of your curations, remember to download a CSV:
Arrows
Use these arrows to quickly apply the current visualisations to adjacent compounds
This works best when the inspirations modal is open, and the inspiration hits and current compound are shown as ligands
Exporting curations
Once you have painted compounds you can export a CSV which can be used to share your curations/review with others:
Compounds from different sets can be viewed together in the “selected compounds” tab.
Uploading virtual compound sets (RHS)
In order to disseminate non-experimental structures/ligands with Fragalysis, they can be uploaded using the “RHS upload” option in the “Hamburger menu”, which takes you to the viewer/upload_cset endpoint:
Supported data format
To upload a compound set to the RHS of Fragalysis an SD file (SDF) must be prepared.
Header molecule
Fragalysis requires a header molecule that defines properties for the whole compound set. The molecule and coordinates of the header molecule are completely ignored; however, there are required properties:
| Property | Value |
|---|---|
| | |
| | Reference URL for the algorithm / dataset |
| | Compound set submitter’s name |
| | Compound set submitter’s email |
| | Compound set submitter’s institution |
| | Date associated with the data (ISO 8601) |
| | Algorithm / method name for this compound set |
Additionally, if you want to include extra text or numerical properties for ligands in this set, you will have to include each property in the header as well, with a description as its value. For example, if you want to include an energy_score property with each ligand, you will need to include it as a property on the header too, with a text description:
| Property | Value |
|---|---|
| | |
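Before uploading, it can help to check that the header molecule carries the expected properties and that every ligand property is also described in the header. The sketch below uses plain text parsing rather than a cheminformatics toolkit. The list of required property names is an assumption, inferred from the example header later in this guide (ref_url, submitter_name, submitter_email, submitter_institution, generation_date, method). It also assumes that records are separated by `$$$$` lines and that the first record is the header. The function names are hypothetical.

```python
import re

# ASSUMPTION: names inferred from the example header in this guide.
REQUIRED_HEADER_PROPS = (
    "ref_url",
    "submitter_name",
    "submitter_email",
    "submitter_institution",
    "generation_date",
    "method",
)

# Matches SDF data-item header lines such as "> <ref_url> (1)".
TAG = re.compile(r"^>\s*<([^>]+)>", re.MULTILINE)


def check_header(sdf_text):
    """Return (missing header properties, ligand properties absent from header)."""
    records = [record for record in sdf_text.split("$$$$") if record.strip()]
    if not records:
        raise ValueError("no molecules found")
    header_props = set(TAG.findall(records[0]))
    missing = [name for name in REQUIRED_HEADER_PROPS if name not in header_props]
    undescribed = set()
    for record in records[1:]:
        undescribed |= set(TAG.findall(record)) - header_props
    return missing, sorted(undescribed)
```

Usage: `missing, undescribed = check_header(open("compound_set.sdf").read())`, where the file name is a placeholder. Two empty lists mean the header passes this simplified check, which is not a substitute for the validation Fragalysis performs on upload.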
An example header molecule is provided below.
Ligands
The required properties for each non-header molecule are different:
| Property | Value |
|---|---|
| | compound name |
| | Reference protein (Fragalysis observation short-code, e.g. A0310a) |
| | Reference datasets that inspired this molecule/pose (Fragalysis observation short-code, e.g. A0310a) |
Ligands and proteins (SDF + ZIP of PDBs)
If you have computed custom protein conformations associated with these ligands they can be provided in the upload form as a separate ZIP archive. In this case, your ref_pdb values for each ligand should be the name of the relevant PDB file.
Example Header
ver_1.2
RDKit 3D
14 15 0 0 0 0 0 0 0 0999 V2000
-3.4503 1.0190 -1.1743 C 0 0 0 0 0 0 0 0 0 0 0 0
-2.2533 1.0671 -0.5344 N 0 0 0 0 0 0 0 0 0 0 0 0
-2.1679 -0.0620 0.1865 C 0 0 0 0 0 0 0 0 0 0 0 0
-1.3036 -0.8455 1.1366 C 0 0 0 0 0 0 0 0 0 0 0 0
-0.4390 -1.7388 0.2452 C 0 0 0 0 0 0 0 0 0 0 0 0
0.3763 -0.9521 -0.6603 N 0 0 0 0 0 0 0 0 0 0 0 0
1.4334 -0.1564 -0.1409 C 0 0 0 0 0 0 0 0 0 0 0 0
2.0843 -0.7615 0.8099 N 0 0 0 0 0 0 0 0 0 0 0 0
3.2028 -0.1250 1.4766 C 0 0 0 0 0 0 0 0 0 0 0 0
4.1795 0.3255 0.4069 C 0 0 0 0 0 0 0 0 0 0 0 0
3.7544 1.5811 -0.2821 C 0 0 0 0 0 0 0 0 0 0 0 0
1.9712 1.4810 -0.5890 S 0 0 0 0 0 0 0 0 0 0 0 0
-3.3092 -0.7524 -0.0399 N 0 0 0 0 0 0 0 0 0 0 0 0
-4.0785 -0.0801 -0.8706 O 0 0 0 0 0 0 0 0 0 0 0 0
1 2 2 0
2 3 1 0
3 4 1 0
4 5 1 0
5 6 1 0
6 7 1 0
7 8 2 0
8 9 1 0
9 10 1 0
10 11 1 0
11 12 1 0
3 13 2 0
13 14 1 0
14 1 1 0
12 7 1 0
M END
> <ref_url> (1)
https://github.com/mwinokan/BulkDock
> <submitter_name> (1)
Max Winokan
> <submitter_email> (1)
max.winokan@diamond.ac.uk
> <submitter_institution> (1)
DLS
> <generation_date> (1)
2024-12-02
> <method> (1)
Knitwork_CavB_impure
> <SLURM_JOB_ID> (1)
SLURM_JOB_ID
> <SLURM_JOB_NAME> (1)
SLURM_JOB_NAME
> <csv_name> (1)
csv_name
> <scratch_subdir> (1)
scratch_subdir
> <fragmenstein_runtime> (1)
fragmenstein_runtime
> <fragmenstein_outcome> (1)
fragmenstein_outcome
> <fragmenstein_mode> (1)
fragmenstein_mode
> <fragmenstein_error> (1)
fragmenstein_error
> <exports> (1)
exports
> <HIPPO Pose ID> (1)
HIPPO Pose ID
> <HIPPO Compound ID> (1)
HIPPO Compound ID
> <smiles> (1)
smiles
> <ref_pdb> (1)
protein reference
> <ref_mols> (1)
fragment inspirations
> <original ID> (1)
original ID
> <compound inchikey> (1)
compound inchikey
