NAME
pheno-ranker: A script that performs semantic similarity analysis on PXF/BFF data structures and beyond (JSON|YAML)
SYNOPSIS
pheno-ranker -r <individuals.json> -t <patient.json> [-options]
Arguments:
* Cohort mode:
-r, --reference <file> JSON/YAML BFF/PXF file(s) (array/object), supports .gz
* Patient mode:
-t, --target <file> JSON/YAML BFF/PXF file (object or single-object array), supports .gz
Options:
-age Include age-related variables; excludes agent-like terms (BFF/PXF-only) [>no-age|age]
-a, --align [path/basename] Write alignment file(s). If not specified, default filenames are used [default: alignment.*]
-append-prefixes <prefixes> Prefixes for primary_key when #cohorts >= 2 [default: C]
-config <file> YAML config file to modify default parameters [default: share/conf/config.yaml]
-cytoscape-json [file] Writes an undirected graph in Cytoscape-compatible JSON [default: graph.json]
-e, --export [path/basename] Export miscellaneous JSON files. If not specified, default filenames are used [default: export.*]
-exclude-terms <terms> Exclude BFF/PXF terms (e.g., --exclude-terms sex id) or column names in JSON-derived from CSV
-graph-stats [file] Generates a text file with key graph metrics, for use with <-cytoscape-json> [default: graph_stats.txt]
-graph-min-weight <number> Keep graph edges with weight greater than or equal to this value
-graph-max-weight <number> Keep graph edges with weight less than or equal to this value
-include-hpo-ascendants Include ascendant terms from the Human Phenotype Ontology (HPO)
-include-terms <terms> Include BFF/PXF terms (e.g., --include-terms diseases) or column names in JSON-derived from CSV
-max-matrix-records-in-ram <number> In cohort mode, set max records before switching to RAM-efficient mode [default: 5000]
-matrix-format <format> Matrix output format in cohort mode [>dense|mtx]
-max-number-vars <number> Maximum number of variables for binary string [default: 10000]
-max-out <number> Print only N comparisons [default: 50]
-o, --out-file <file> Output file path [default: -r matrix.txt | -t rank.txt]
-poi, --patients-of-interest <id_list> Export JSON files for the selected individual IDs during a dry-run
-poi-out-dir <directory> Directory for JSON files (used with --poi)
-prp, --precomputed-ref-prefix [path/basename] Use precomputed data for the reference cohort(s). No need to use -r
-retain-excluded-phenotypicFeatures Retains features set to "excluded": true by appending '_excluded' to their IDs
-similarity-metric-cohort <metric> Similarity metric for cohort mode [>hamming|jaccard]
-sort-by <metric> Sort by Hamming distance or Jaccard index [>hamming|jaccard]
-w, --weights <file> YAML file with weights
Generic Options:
-debug <level> Print debugging (from 1 to 5, 5 being the maximum)
-h, --help Brief help message
-log Save log file [default: pheno-ranker-log.json]
-man Full documentation
-no-color Toggle color output [>color|no-color]
-v, --verbose Verbosity on
-V, --version Print version
SUMMARY
Pheno-Ranker is a lightweight, easy-to-install tool for performing semantic similarity analysis on phenotypic data in JSON/YAML formats, including Beacon v2 Models and Phenopackets v2. It also supports pre-processed CSV files prepared using the included csv2pheno-ranker utility.
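The options above refer to a binary string per individual and to two metrics, Hamming distance and Jaccard index. The Python sketch below only illustrates what those two metrics measure on toy binary strings; it is not Pheno-Ranker's own code. The function names and the toy strings are hypothetical, and the score returned for two all-zero strings is an assumption.

```python
def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length binary strings differ."""
    if len(a) != len(b):
        raise ValueError("binary strings must have the same length")
    return sum(x != y for x, y in zip(a, b))


def jaccard_index(a: str, b: str) -> float:
    """Positions set to '1' in both strings divided by positions set in either."""
    if len(a) != len(b):
        raise ValueError("binary strings must have the same length")
    both = sum(x == "1" and y == "1" for x, y in zip(a, b))
    either = sum(x == "1" or y == "1" for x, y in zip(a, b))
    # Assumption: two all-zero strings score 0.0 (conventions differ here).
    return both / either if either else 0.0


# Toy strings, one character per variable (1 = present, 0 = absent)
reference = "110010"
target = "100011"
print(hamming_distance(reference, target))  # 2
print(jaccard_index(reference, target))     # 0.5
```

A lower Hamming distance and a higher Jaccard index both indicate more similar individuals, which is why the two metrics sort in opposite directions.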
INSTALLATION
If you plan to use only the pheno-ranker CLI, we recommend installing it via CPAN. See details below.
Non-containerized
The Perl command-line interface is tested on Linux, macOS, and Windows via GitHub Actions. The commands below focus on Debian-based Linux systems, where Perl 5 is typically available by default and extra CPAN modules can be installed with cpanminus. On Windows, use Docker, WSL, or a Perl environment such as Strawberry Perl.
Method 1: From CPAN
First install system level dependencies:
sudo apt-get install cpanminus libperl-dev gcc make
We will install Pheno-Ranker and its dependencies in ~/perl5:
cpanm --local-lib=~/perl5 local::lib && eval $(perl -I ~/perl5/lib/perl5/ -Mlocal::lib)
cpanm --notest Pheno::Ranker
pheno-ranker --help
To ensure Perl recognizes your local modules every time you start a new terminal, you should type:
echo 'eval $(perl -I ~/perl5/lib/perl5/ -Mlocal::lib)' >> ~/.bashrc
To update to the newest version:
cpanm Pheno::Ranker
Method 2: From CPAN in a CONDA environment
Please follow these instructions.
Method 3: From GitHub
To clone the repository for the first time:
git clone https://github.com/cnag-biomedical-informatics/pheno-ranker.git
cd pheno-ranker
To update an existing clone, navigate to the repository folder and run:
git pull
Install system level dependencies:
sudo apt-get install cpanminus libperl-dev
Now choose one of the two options below:
Option 1: Install the dependencies system-wide with sudo (they're harmless to your system):
cpanm --notest --sudo --installdeps .
bin/pheno-ranker --help
Option 2: Install the dependencies at ~/perl5:
cpanm --local-lib=~/perl5 local::lib && eval $(perl -I ~/perl5/lib/perl5/ -Mlocal::lib)
cpanm --notest --installdeps .
bin/pheno-ranker --help
To ensure Perl recognizes your local modules every time you start a new terminal, you should type:
echo 'eval $(perl -I ~/perl5/lib/perl5/ -Mlocal::lib)' >> ~/.bashrc
Optional: If you want to use utils/barcode or utils/bff_pxf_plot:
sudo apt-get install python3-pip libzbar0
pip3 install -r requirements.txt
Containerized
Method 4: From Docker Hub
(Estimated Time: Approximately 10 seconds)
Download the latest version of the Docker image (supports both amd64 and arm64 architectures) from Docker Hub by executing:
docker pull manuelrueda/pheno-ranker:latest
docker image tag manuelrueda/pheno-ranker:latest cnag/pheno-ranker:latest
See additional instructions below.
Method 5: With Dockerfile
(Estimated Time: Approximately 1 minute)
Please download the Dockerfile from the repo:
wget https://raw.githubusercontent.com/cnag-biomedical-informatics/pheno-ranker/main/Dockerfile
And then run:
# Docker Version 19.03 and Above (Supports buildx)
docker buildx build -t cnag/pheno-ranker:latest .
# Docker Version Older than 19.03 (Does Not Support buildx)
docker build -t cnag/pheno-ranker:latest .
Additional instructions for Methods 4 and 5
To run the container (detached) execute:
docker run -tid -e USERNAME=root --name pheno-ranker cnag/pheno-ranker:latest
To enter:
docker exec -ti pheno-ranker bash
The command-line executable can be found at:
/usr/share/pheno-ranker/bin/pheno-ranker
The default container user is root, but you can also run the container as UID 1000 (dockeruser):
docker run --user 1000 -tid --name pheno-ranker cnag/pheno-ranker:latest
Mounting volumes
Docker containers are fully isolated. If you need to mount a volume to the container, use the following syntax (-v host:container). Find an example below (note that you need to change the paths to match yours):
docker run -tid --volume /media/mrueda/4TBT/data:/data --name pheno-ranker-mount cnag/pheno-ranker:latest
Then you can do something like this:
# First, create an alias to simplify invocation (from the host)
alias pheno-ranker='docker exec -ti pheno-ranker-mount /usr/share/pheno-ranker/bin/pheno-ranker'
# Now use the alias to run the command (note the -o flag, which specifies the output file path)
pheno-ranker -r /data/individuals.json -o /data/matrix.txt
System requirements
- OS/ARCH supported: linux/amd64 and linux/arm64.
- Ideally a Debian-based distribution (Ubuntu or Mint), but any other (e.g., CentOS, OpenSUSE) should do as well (untested).
The Perl CLI is also tested on macOS and Windows; container images are Linux-based.
* Perl 5 (>= 5.26 core; installed by default in most Linux distributions). Check the version with "perl -v".
* >= 4GB of RAM
* 1 core
* At least 16GB HDD
HOW TO RUN PHENO-RANKER
To run pheno-ranker you need one or more PXF/BFF files in JSON or YAML format. The reference cohort must be a JSON array in which each individual's data are consolidated in one object.
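Before a long run it can help to confirm that a reference file has the shape described here. The Python sketch below checks that parsed JSON is an array whose elements are all objects. The function name count_individuals and the toy records are hypothetical, and the check covers only the array form (the option list also mentions a single object for -r).

```python
import json


def count_individuals(cohort) -> int:
    """Return the number of individuals if `cohort` is an array of objects."""
    if not isinstance(cohort, list):
        raise ValueError("reference cohort is not a JSON array")
    for position, individual in enumerate(cohort):
        if not isinstance(individual, dict):
            raise ValueError(f"element {position} is not a JSON object")
    return len(cohort)


# Toy cohort: one object per individual (real BFF/PXF records hold many more terms)
cohort = json.loads('[{"id": "P1"}, {"id": "P2"}]')
print(count_individuals(cohort))  # 2
```

For a file on disk, the same function can be applied to the result of json.load on an open file handle.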
You can download examples from this location.
There are two modes of operation:
- Cohort mode:
    Intra-cohort: With --r argument and 1 cohort.
    Inter-cohort: With --r and multiple cohort files. It can be used in combination with --append-prefixes to add prefixes to each individual id.
- Patient mode:
    With -r reference cohort(s) and --t patient data.
Examples:
$ bin/pheno-ranker -r phenopackets.json # intra-cohort
$ bin/pheno-ranker -r phenopackets.yaml -o my_matrix.txt # intra-cohort
$ bin/pheno-ranker -r phenopackets.json -w weights.yaml --exclude-terms sex ethnicity exposures # intra-cohort with weights
$ $path/pheno-ranker -r individuals.json others.yaml --append-prefixes CANCER CONTROL # inter-cohort
$ $path/pheno-ranker -r individuals.json -t patient.yaml -max-out 100 # patient mode
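When the tool is driven from a script instead of typed by hand, the same command lines can be assembled programmatically. The Python sketch below builds an argument list using only flags shown in the examples above (-r, -t, -o, --append-prefixes). The helper build_command and its parameter names are hypothetical, and the executable path is an assumption to adapt to your installation.

```python
import shlex


def build_command(references, target=None, out_file=None, prefixes=None,
                  executable="bin/pheno-ranker"):
    """Assemble a pheno-ranker argument list for cohort or patient mode."""
    if not references:
        raise ValueError("at least one reference file is required")
    command = [executable, "-r", *references]
    if prefixes:
        # One prefix per cohort file (inter-cohort mode)
        command += ["--append-prefixes", *prefixes]
    if target:
        # Supplying a target switches to patient mode
        command += ["-t", target]
    if out_file:
        command += ["-o", out_file]
    return command


inter_cohort = build_command(["individuals.json", "others.yaml"],
                             prefixes=["CANCER", "CONTROL"])
print(shlex.join(inter_cohort))
# bin/pheno-ranker -r individuals.json others.yaml --append-prefixes CANCER CONTROL
```

The resulting list can be passed to subprocess.run(command, check=True), which avoids shell quoting problems with file names.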
COMMON ERRORS AND SOLUTIONS
* Error message: R plotting
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
line 1 did not have X elements
Calls: as.matrix -> read.table -> scan
Execution halted
Solution: Make sure that the values of your primary key (e.g., "id") do not contain spaces (e.g., "my fav id" must be "my_fav_id")
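The primary-key fix for the R plotting error can be applied to the input before running the tool. The Python sketch below replaces whitespace in each record's primary key with underscores, following the "my fav id" to "my_fav_id" example. The helper sanitize_ids is hypothetical, and it assumes the primary key is a top-level string field named "id".

```python
import re


def sanitize_ids(individuals, key="id"):
    """Return copies of the records with whitespace in `key` replaced by '_'."""
    cleaned = []
    for record in individuals:
        fixed = dict(record)  # shallow copy; the input list is left untouched
        if isinstance(fixed.get(key), str):
            fixed[key] = re.sub(r"\s+", "_", fixed[key].strip())
        cleaned.append(fixed)
    return cleaned


records = [{"id": "my fav id"}, {"id": "P2"}]
print([r["id"] for r in sanitize_ids(records)])  # ['my_fav_id', 'P2']
```

Note that this renaming could make two distinct IDs collide (for example "a b" and "a_b"), so it is worth checking that the IDs are still unique afterwards.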
CITATION
The author requests that any published work that uses Pheno-Ranker cite the following reference:
Leist, I.C. et al., (2024). Pheno-Ranker: a toolkit for comparison of phenotypic data stored in GA4GH standards and beyond. BMC Bioinformatics. DOI: 10.1186/s12859-024-05993-2
AUTHOR
Written by Manuel Rueda, PhD. Info about CNAG can be found at https://www.cnag.eu.
COPYRIGHT AND LICENSE
This Perl file is copyrighted. See the LICENSE file included in this distribution.