Vogue is Clinical Genomics' solution for capturing data from various places in the data flow and trending that data over longer periods of time.
Installation¶
git clone https://github.com/Clinical-Genomics/vogue.git
cd vogue
pip install -e .
Release model¶
Vogue development is organised around a flexible Git "Release Flow" branching model. In short, we make releases in release branches, which correspond to stable versions of Vogue.
Steps to make a new release:¶
- Create a release branch from master named `version_X.X.X`
- Update the change log with the new version
- Make a PR to master:
  - Name the PR `release version X.X.X`
  - Justify whether it is a patch/minor/major version bump
  - Paste the latest changelog into the text body
  - Get it approved and merge it to master. **Don't delete the release branch!**
- Use `bumpversion` to change the version accordingly: `bumpversion major`, `bumpversion minor` or `bumpversion patch`
- Do `git push` and `git push --tags`
- Make a new release:
  - Name the tag version `vX.X.X`
  - Set the target to the release branch
  - Write a descriptive title
  - Paste the latest changelog into the text body
  - Release!
Deploying to production¶
Use the update-vogue-prod.sh
script to update production on both Hasta and Clinical-db. Please follow the development guide and the ``servers`` repo when doing so. It is also important to keep those involved informed.
Front End¶
All views in vogue should be self-explanatory; no further documentation should be needed to interpret the content of the web pages.
Back End¶
The trending database is a MongoDB database consisting of the following collections:
- sample - holds LIMS-specific data at sample level. The anchoring identifiers are LIMS sample ids.
- sample_analysis - holds data from different pipelines at sample level. The anchoring identifiers are LIMS sample ids.
- flowcell - holds LIMS-specific data at run level. The anchoring identifiers are flowcell ids.
- application_tag - holds application-tag-specific data. The anchoring identifiers are application tags.
The load command of each collection is described below.
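To illustrate the idea, the documents and the add/update ("upsert") behaviour of the load commands can be sketched with plain dictionaries. The field names below are hypothetical examples, not the actual schema:

```python
# Illustrative only: hypothetical document shapes for the trending
# database collections. The real schemas live in the vogue models.
sample_doc = {
    "_id": "ACC1234A1",          # LIMS sample id (anchoring identifier)
    "received_date": "2019-05-10",
    "application_tag": "MELPCFR030",
}

apptag_doc = {
    "_id": "MELPCFR030",         # application tag (anchoring identifier)
    "category": "wgs",
}

# A load command typically upserts on the anchoring identifier:
def upsert(collection: dict, doc: dict) -> None:
    """Insert the document, or update the existing one with the same _id."""
    existing = collection.get(doc["_id"], {})
    existing.update(doc)
    collection[doc["_id"]] = existing

samples: dict = {}
upsert(samples, sample_doc)
# A later load adds new fields without discarding the old ones:
upsert(samples, {"_id": "ACC1234A1", "delivered_date": "2019-06-01"})
```

In the real adapter, `VogueAdapter.add_or_update_document` performs the equivalent upsert against MongoDB.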
Data Flow¶
CLI¶
The CLI has two base commands - load and run. The load command is for loading various data into the trending database, and the run command is for running the web application.
Load sample¶
Usage: vogue load sample [OPTIONS]
Read and load LIMS data for one or all samples. When loading many
samples, the options -f, -n, -d delimit the subset of samples to
load.
Options:
-s, --sample-lims-id TEXT Input sample lims id
-m, --many Load all LIMS samples if no other options are
selected
--dry-run Load from sample or not. (dry-run)
-f, --load-from TEXT load from this sample LIMS id. Use if load all
broke. Start where it ended
-n, --new Use this flag if you only want to load samples
that do not exist in the database
-d, --date TEXT Update only samples delivered after date
--help Show this message and exit.
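How the subset options might combine can be sketched in plain Python (the sample records and the function are made up for illustration; this is not vogue's actual implementation):

```python
from datetime import date

# Hypothetical LIMS sample records; `delivered` is the delivery date.
lims_samples = [
    {"id": "ACC0001A1", "delivered": date(2019, 1, 10)},
    {"id": "ACC0002A1", "delivered": date(2019, 3, 5)},
    {"id": "ACC0003A1", "delivered": date(2019, 6, 20)},
]
existing_in_db = {"ACC0001A1"}

def select_samples(samples, load_from=None, new=False, after=None):
    """Mimic the -f/-n/-d options: start from a given sample id,
    keep only samples not yet in the database, or keep only samples
    delivered after a date."""
    selected = samples
    if load_from:  # -f: resume from this sample id
        ids = [s["id"] for s in selected]
        selected = selected[ids.index(load_from):]
    if new:        # -n: only samples missing from the database
        selected = [s for s in selected if s["id"] not in existing_in_db]
    if after:      # -d: only samples delivered after the date
        selected = [s for s in selected if s["delivered"] > after]
    return [s["id"] for s in selected]
```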
Load analysis¶
Usage: vogue load analysis [OPTIONS]
Read and load analysis results. These are either QC or analysis output
files.
The inputs are a unique ID and an analysis config file (JSON/YAML) which
includes analysis results matching the analysis model. Analysis types
recognize the following keys in the input file:
QC: multiqc_picard_dups, multiqc_picard_HsMetrics, multiqc_picard_AlignmentSummaryMetrics, multiqc_picard_insertSize
microsalt: blast_pubmlst, quast_assembly, blast_resfinder_resistence, picard_markduplicate, microsalt_samtools_stats
Options:
-s, --sample-id TEXT Input sample id. [required]
-a, --analysis-config PATH Input config file. Accepted format: JSON,
YAML [required]
-t, --analysis-type [QC|microsalt|all]
Type of analysis results to load.
-c, --analysis-case TEXT The case that this sample belongs to. It can
be specified multiple times. [required]
-w, --analysis-workflow TEXT Analysis workflow used. [required]
--workflow-version TEXT Version of the analysis workflow used. [required]
--is-case Specify this flag if input json is case
level.
--case-analysis-type [multiqc] Specify the type for the case analysis. i.e.
if it is multiqc output, then choose multiqc
--dry Load from sample or not. (dry-run)
--help Show this message and exit.
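Filtering an input config against the recognized keys listed above can be sketched as follows (the key sets are copied from the help text; the function itself is illustrative, not vogue's actual code):

```python
# Recognized result keys per analysis type, as listed in the help text.
VALID_KEYS = {
    "QC": [
        "multiqc_picard_dups",
        "multiqc_picard_HsMetrics",
        "multiqc_picard_AlignmentSummaryMetrics",
        "multiqc_picard_insertSize",
    ],
    "microsalt": [
        "blast_pubmlst",
        "quast_assembly",
        "blast_resfinder_resistence",
        "picard_markduplicate",
        "microsalt_samtools_stats",
    ],
}

def extract_recognized(analysis_results: dict, analysis_type: str) -> dict:
    """Keep only the keys recognized for the given analysis type.
    A simplified sketch, not vogue's actual extract_valid_analysis."""
    valid = VALID_KEYS.get(analysis_type, [])
    return {k: v for k, v in analysis_results.items() if k in valid}
```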
Load flowcell¶
Usage: vogue load flowcell [OPTIONS]
Read and load LIMS data for one or all runs
Options:
-r, --run-id TEXT Run id for the run. Eg: 190510_A00689_0032_BHJLW2DSXX
-a, --all-runs Loads all flowcells found in LIMS.
--dry Load from flowcell or not. (dry-run)
--help Show this message and exit.
Load apptag¶
Usage: vogue load apptag [OPTIONS] APPLICATION_TAGS
Reads a JSON string with application tags. Eg: '[{"tag":"MELPCFR030",
"category":"wgs",...},...]'
Options:
--help Show this message and exit.
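Since the argument is a JSON string, parsing it is a plain `json.loads` call. Below, the example from the help text with the elided fields dropped:

```python
import json

# Example APPLICATION_TAGS argument (elided fields from the help text removed).
application_tags = '[{"tag": "MELPCFR030", "category": "wgs"}]'

# Parse the JSON string into a list of application-tag dicts.
parsed = json.loads(application_tags)
```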
Run¶
Usage: vogue run [OPTIONS]
Run a local development server.
This server is for development purposes only. It does not provide the
stability, security, or performance of production WSGI servers.
The reloader and debugger are enabled by default if FLASK_ENV=development
or FLASK_DEBUG=1.
Options:
-h, --host TEXT The interface to bind to.
-p, --port INTEGER The port to bind to.
--cert PATH Specify a certificate file to use HTTPS.
--key FILE The key file to use when specifying a
certificate.
--reload / --no-reload Enable or disable the reloader. By default
the reloader is active if debug is enabled.
--debugger / --no-debugger Enable or disable the debugger. By default
the debugger is active if debug is enabled.
--eager-loading / --lazy-loader
Enable or disable eager loading. By default
eager loading is enabled if the reloader is
disabled.
--with-threads / --without-threads
Enable or disable multithreading.
--help Show this message and exit.
vogue¶
vogue package¶
Subpackages¶
vogue.adapter package¶
class vogue.adapter.plugin.VogueAdapter(client=None, db_name=None)
Bases: mongo_adapter.adapter.MongoAdapter
- add_or_update_bioinfo_processed(analysis_result: dict): Add or update an analysis in the processed bioinfo collection.
- add_or_update_bioinfo_raw(analysis_result: dict): Add or update an analysis in the unprocessed (raw) bioinfo collection.
- add_or_update_bioinfo_samples(analysis_result: dict): Add or update a bioinfo analysis for sample-level results.
- add_or_update_document(document_news: dict, collection): Add or update a document in the database.
- bioinfo_samples_aggregate(pipe: list): Run an aggregation on the sample analysis collection.
- find_samples(query: dict) → list: Find samples in the sample collection based on a query.
- genotype_analysis_aggregate(pipe: list): Run an aggregation on the genotype analysis collection.
- get_all_reagent_label_names_grouped_by_category(): Get all reagent label names grouped by category from the reagent_label_category collection.
- get_category(app_tag): Get the category for an application tag from the application tag collection.
- get_reagent_label_categories(): Get all categories from the label_category_collection.
- get_reagent_label_category(reagent_label): Get the category for a reagent label.
vogue.build package¶
- vogue.build.bioinfo_analysis.build_analysis(analysis_dict: dict, analysis_type: str, valid_analysis: list, current_analysis: dict, process_case=False, cleanup=False): Builds an analysis dictionary based on the input analysis_dict and prepares a mongo_doc. If process_case is False, no keys in analysis_dict are validated and the data is only loaded into bioinfo_raw. If process_case is True, valid keys are extracted from analysis_dict.
- vogue.build.bioinfo_analysis.build_bioinfo_sample(analysis_dict: dict, sample_id: str, process_case=False): Builds a sample analysis from analysis_dict, where analysis_dict is a processed dictionary, i.e. from bioinfo_processed.
- vogue.build.bioinfo_analysis.build_mongo_case(analysis_dict: dict, case_analysis: dict, processed=False): Builds a mongo case document dictionary.
- vogue.build.bioinfo_analysis.build_processed_case(analysis_dict: dict, analysis_type: str, valid_analysis: list, cleanup=False): Builds an analysis dict from input information provided by the user.
  - Input: analysis_dict: a dictionary of bioinfo stats to be prepared for the bioinfo_processed collection; analysis_type: the analysis type to be extracted from analysis_dict; valid_analysis: a list of valid analyses to be found within analysis_dict; cleanup: flag to clean up unwanted keys from analysis_dict using info from valid_analysis and analysis_type.
  - Output: case_analysis: a dictionary with information about the workflow, case_analysis_type (e.g. multiqc), workflow version, and date added.
- vogue.build.bioinfo_analysis.build_unprocessed_case(analysis_dict: dict): Prepares a case analysis dictionary.
- vogue.build.bioinfo_analysis.extract_valid_analysis(analysis_dict: dict, analysis_type: str, valid_analysis: list): Extracts an analysis dictionary based on the input analysis_dict, removing analysis entries that are not part of the matching model. analysis_type is a single key matching ANALYSIS_SETS's first-level keys.
  - Input: analysis_dict: a dictionary of bioinfo analysis stats; analysis_type: a string of the analysis type, provided by the user; valid_analysis: a list of analyses to be extracted from analysis_dict.
  - Output: analysis: a dictionary with valid_analysis entries as keys, extracted from analysis_dict.
- vogue.build.bioinfo_analysis.get_common_keys(valid_analysis: list, analysis_type: str): Matches a list of values with keys from a MODEL dictionary. Input: valid_analysis as a list. Output: analysis_common_keys as a list.
- vogue.build.bioinfo_analysis.update_mongo_doc_case(mongo_doc: dict, analysis_dict: dict, new_analysis: dict): Adds or updates a mongo document for case data within the processed or raw bioinfo collection.
  - Parameters: mongo_doc: an existing analysis retrieved from MongoDB; analysis_dict: a dictionary parsed from the CLI; new_analysis: a new analysis dictionary to be loaded into MongoDB.
  - Returns: an updated mongo_doc.
vogue.commands package¶
CLI for handling bioinfo collections: addition and update.
- Functionality to add or update to the processed bioinfo collection
- Add or update bioinfo results in the bioinfo raw collection
- Add or update analysis results for samples from bioinfo_processed into the bioinfo_sample collection
Module with CLI commands for vogue. The CLI is intended for development/testing purposes only. To run in a production setting, please refer to the documentation for suggestions.
vogue.constants package¶
vogue.load package¶
Will go through all application tags in json_list and add/update them to the trending database.
Parameters: adapter (adapter.VogueAdapter); json_list (list(dict)): [{'tag': 'MELPCFR030', 'category': 'wgs', ...}, ...]
- vogue.load.flowcell.load_all(adapter, lims): Load all LIMS flowcells into the database.
- vogue.load.reagent_label.load_all(adapter, lims): Load reagent_labels from all LIMS flowcells into the database.
- vogue.load.sample.load_all(adapter, lims, start_sample=None): Load all LIMS samples into the database.
vogue.parse package¶
- vogue.parse.build.flowcell.filter_none(mongo_dict): Filters out None and NaN values from a dict.
- vogue.parse.build.flowcell.run_data(run): Gets run info from the lanes in a LIMS sequencing process and reformats the data to be part of a document in the flowcell collection.
  - Parameters: run (Process): LIMS Process instance of sequencing type.
  - Returns: lane_data (dict): run info per lane, e.g. {'Lane 1': {'% Aligned R2': 0.94, '% Bases >=Q30 R1': 90.67, '% Bases >=Q30 R2': 88.84, ...}, 'Lane 2': {'% Aligned R2': 0.92, '% Bases >=Q30 R1': 91.67, '% Bases >=Q30 R2': 83.84, ...}}; avg_data (dict): average run info over all lanes, e.g. {'% Phasing R2': 0.09, '% Bases >=Q30': 89.755, ...}
- vogue.parse.build.reagent_label.filter_none(mongo_dict): Filters out None and NaN values from a dict.
- vogue.parse.build.reagent_label.get_define_step_data(pool): Searches the artifact history for the define steps.
  - Input: pool: LIMS artifact, input to the bcl step.
  - Returns: define_step_outputs (dict) with: sample ids as keys and target reads (udf 'Reads to sequence (M)') as values; define_step: LIMS process, the define step; flowcell_target_reads: int, the sum of the udf 'Reads to sequence (M)' from all output artifacts in the step.
- vogue.parse.build.reagent_label.reagent_label_data(bcl_step): Takes as input a bcl conversion and demultiplexing step, then goes back in the artifact history to the previous Define step. Both step types exist in the NovaSeq workflow. From the output artifacts of the bcl step, the following are calculated: index_total_reads: the sum of '# Reads' from all artifacts with a specific index; flowcell_total_reads: the sum of '# Reads' from all output artifacts. From the output artifacts of the define step, index_target_reads and flowcell_target_reads are fetched: index_target_reads: fetched from the 'Reads to sequence (M)' udf of the output artifact with a specific index; flowcell_target_reads: the sum of the 'Reads to sequence (M)' udf of all output artifacts.
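A filter_none-style helper, as described above, might look like this minimal sketch (illustrative, not the actual implementation):

```python
import math

def filter_none(mongo_dict: dict) -> dict:
    """Drop entries whose value is None or NaN before building a
    MongoDB document."""
    return {
        key: value
        for key, value in mongo_dict.items()
        if value is not None
        and not (isinstance(value, float) and math.isnan(value))
    }
```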
- vogue.parse.build.sample.datetime2date(date: datetime.datetime) → datetime.date: Converts a datetime.datetime to a datetime.date.
- vogue.parse.build.sample.get_concentration_and_nr_defrosts(application_tag: str, lims_id: str, lims: genologics.lims.Lims) → dict: Gets the concentration and number of defrosts for wgs Illumina PCR-free samples. Finds the latest artifact that passed through a concentration_step and gets its concentration_udf (concentration). Goes back in history to the latest lot_nr_step and gets the lot_nr_udf from that step (lotnr). Finds all steps where the lot_nr was used (all_defrosts). Picks out the steps performed before our lot_nr_step (defrosts_before_this_process) and counts them (nr_defrosts).
- vogue.parse.build.sample.get_final_conc_and_amount_dna(application_tag: str, lims_id: str, lims: genologics.lims.Lims) → dict: Finds the latest artifact that passed through a concentration_step and gets its concentration, then goes back in history to the latest amount_step and gets the amount.
- vogue.parse.build.sample.get_latest_input_artifact(process_type: str, lims_id: str, lims: genologics.lims.Lims) → genologics.entities.Artifact: Returns the input artifact related to lims_id and the step that was run latest.
- vogue.parse.build.sample.get_library_size(sample_id: str, lims: genologics.lims.Lims, size_steps: List[str], workflow: str) → int: Gets the udf Size (bp), which is in fact set on the aggregate QC library validation step.
- vogue.parse.build.sample.get_microbial_library_concentration(application_tag: str, lims_id: str, lims: genologics.lims.Lims) → float: Checks only samples with a microbial application tag; gets the concentration_udf from the concentration_step.
- vogue.parse.build.sample.get_number_of_days(first_date: datetime.datetime, second_date: datetime.datetime) → int: Gets the number of days between two time stamps.
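A helper such as get_number_of_days reduces to simple datetime arithmetic; a sketch of the behaviour described above (illustrative, not the actual implementation):

```python
from datetime import datetime

def get_number_of_days(first_date: datetime, second_date: datetime) -> int:
    """Number of whole days between two timestamps."""
    return (second_date - first_date).days

# E.g. days between a received date and a delivered date:
days = get_number_of_days(datetime(2019, 5, 10), datetime(2019, 6, 1))
```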
class vogue.parse.build.sample_analysis.Mip_dna(case)
Bases: object
Class to prepare mip case_analysis results for mip_dna results in the sample_analysis collection.
- vogue.parse.build.sample_analysis.get_latest_analysis(case, analysis_type): Gets the latest analysis of analysis_type from one case.
- vogue.parse.load.bioinfo_analysis.inspect_analysis_result(analysis_dict: dict): Takes an input analysis_dict dictionary and validates its entries. Checks that there are at least two keys in the analysis_dict dictionary; if there are fewer than two, or the key doesn't exist, the file is disqualified and False is returned.
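The validation rule described above (at least two keys, otherwise the file is disqualified) can be sketched as follows (illustrative, not vogue's actual code):

```python
def inspect_analysis_result(analysis_dict: dict) -> bool:
    """Disqualify an analysis config unless it is a dict with at
    least two keys."""
    return isinstance(analysis_dict, dict) and len(analysis_dict) >= 2
```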
vogue.server package¶
vogue.tools package¶
- vogue.tools.cli_utils.add_doc(docstring): A decorator for adding a docstring. Taken shamelessly from stackexchange.
- vogue.tools.cli_utils.concat_dict_keys(my_dict: dict, key_name='', out_key_list=[]): Recursively creates a list of key:key1,key2 entries from a nested dictionary.
- vogue.tools.cli_utils.convert_defaultdict_to_regular_dict(inputdict: dict): Recursively converts a defaultdict to a dict.
- vogue.tools.cli_utils.dict_replace_dot(obj): Recursively replaces all dots in json.load keys.
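MongoDB field names may not contain dots, so keys like "picard.dups" need rewriting before a parsed JSON document can be inserted. A sketch of such a helper (illustrative; it assumes dots become underscores, while the actual replacement character may differ):

```python
def dict_replace_dot(obj):
    """Recursively replace '.' in dict keys, e.g. before loading
    parsed JSON into MongoDB (field names may not contain dots)."""
    if isinstance(obj, dict):
        return {
            key.replace(".", "_"): dict_replace_dot(value)
            for key, value in obj.items()
        }
    if isinstance(obj, list):
        return [dict_replace_dot(item) for item in obj]
    return obj

cleaned = dict_replace_dot({"picard.dups": {"rate.pct": 0.1}, "ok": [1, 2]})
```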
Submodules¶
vogue.exceptions module¶
exception vogue.exceptions.VogueRestError(message: str, code: Optional[int] = None)
Bases: vogue.exceptions.VogueError
Module contents¶
Build Doc¶
If you'd like to build the Sphinx documentation locally, follow the steps explained below. Tested on Conda 4.6.X.
- Create a conda environment:
conda create -n vogue_doc -c bioconda -c conda-forge python=3.6 pip
conda activate vogue_doc
- Install Sphinx and extensions:
cd docs
pip install -r requirements.txt -r ../requirements-dev.txt -r ../requirements.txt
- Build docs:
sphinx-apidoc -o source/ ../vogue
sphinx-build -T -E -b html -d _build/doctrees-readthedocs -D language=en . _build/html
- View docs (with
open
or a similar command for your OS):
open _build/html/index.html