The pyranges1 module
The pyranges1 module exposes the class PyRanges (omitted from this page)
as well as a number of functions for reading data from commonly used file formats.
It also provides a pyranges1.options interface to
configure how PyRanges objects are represented, and a pyranges1.example_data object used in tests and documentation.
- pyranges1.concat(grs: Iterable, *args, **kwargs) T
Concatenate PyRanges.
- Parameters:
grs (iterable of PyRanges) – PyRanges to concatenate.
args – Arguments passed to pandas.concat.
kwargs – Keyword arguments passed to pandas.concat.
- Return type:
pyranges.PyRanges
Examples
>>> import pyranges1 as pr
>>> gr1 = pr.example_data.f2
>>> gr2 = pr.example_data.f1
>>> pr.concat([gr1, gr2])
index | Chromosome Start End Name Score Strand
int64 | category int64 int64 str int64 category
------- --- ------------ ------- ------- --------- ------- ----------
0 | chr1 1 2 a 0 +
1 | chr1 6 7 b 0 -
0 | chr1 3 6 interval1 0 +
1 | chr1 5 7 interval2 0 -
2 | chr1 8 9 interval3 0 +
PyRanges with 5 rows, 6 columns, and 1 index columns (with 2 index duplicates).
Contains 1 chromosomes and 2 strands.
>>> pr.concat([gr1, gr2.remove_strand()])
index | Chromosome Start End Name Score Strand
int64 | category int64 int64 str int64 category
------- --- ------------ ------- ------- --------- ------- ----------
0 | chr1 1 2 a 0 +
1 | chr1 6 7 b 0 -
0 | chr1 3 6 interval1 0 nan
1 | chr1 5 7 interval2 0 nan
2 | chr1 8 9 interval3 0 nan
PyRanges with 5 rows, 6 columns, and 1 index columns (with 2 index duplicates).
Contains 1 chromosomes and 2 strands (including non-genomic strands: nan).
>>> r = pr.RangeFrame(gr1)
>>> pr.concat([r, gr1])
Traceback (most recent call last):
...
ValueError: Can only concatenate RangeFrames of the same type. Got: PyRanges, RangeFrame
>>> import pandas as pd
>>> pd.testing.assert_frame_equal(pr.concat([r]), r) # would throw if they were not equal
- pyranges1.count_overlaps(grs: dict[str, PyRanges], features: PyRanges | None = None, strand_behavior: Literal['auto', 'same', 'opposite', 'ignore'] = 'auto', by: list[str] | None = None) PyRanges
Count overlaps in multiple PyRanges.
- Parameters:
grs (dict of PyRanges) – The PyRanges to use as queries.
features (PyRanges, default None) – The PyRanges to use as subject in the query. If None, the PyRanges themselves are used as a query.
strand_behavior ({“auto”, “same”, “opposite”, “ignore”}, default “auto”) – Whether to compare PyRanges on the same strand, the opposite strand, or ignore strand information. The default, “auto”, means use “same” if both PyRanges are stranded, otherwise ignore the strand information.
how ({None, “all”, “containment”, “first”}, default None, i.e. all) – What intervals to report. By default reports all overlapping intervals. “containment” reports intervals where the overlapping is contained within it.
by (list of str, default None) – Columns to group by.
Examples
>>> import pyranges1 as pr
>>> a = '''Chromosome Start End
... chr1 6 12
... chr1 10 20
... chr1 22 27
... chr1 24 30'''
>>> b = '''Chromosome Start End
... chr1 12 32
... chr1 14 30'''
>>> c = '''Chromosome Start End
... chr1 8 15
... chr1 10 14
... chr1 32 34'''
>>> grs = {n: pr.from_string(s) for n, s in zip(["a", "b", "c"], [a, b, c])}
>>> for k, v in grs.items():
...     print("Name: " + k)
...     print(v)
Name: a
index | Chromosome Start End
int64 | str int64 int64
------- --- ------------ ------- -------
0 | chr1 6 12
1 | chr1 10 20
2 | chr1 22 27
3 | chr1 24 30
PyRanges with 4 rows, 3 columns, and 1 index columns.
Contains 1 chromosomes.
Name: b
index | Chromosome Start End
int64 | str int64 int64
------- --- ------------ ------- -------
0 | chr1 12 32
1 | chr1 14 30
PyRanges with 2 rows, 3 columns, and 1 index columns.
Contains 1 chromosomes.
Name: c
index | Chromosome Start End
int64 | str int64 int64
------- --- ------------ ------- -------
0 | chr1 8 15
1 | chr1 10 14
2 | chr1 32 34
PyRanges with 3 rows, 3 columns, and 1 index columns.
Contains 1 chromosomes.
>>> pr.count_overlaps(grs)
index | Chromosome Start End a b c
int64 | str int64 int64 int64 int64 int64
------- --- ------------ ------- ------- ------- ------- -------
0 | chr1 6 8 1 0 0
1 | chr1 8 10 1 0 1
2 | chr1 10 12 2 0 2
3 | chr1 12 14 1 1 2
... | ... ... ... ... ... ...
8 | chr1 24 27 2 2 0
9 | chr1 27 30 1 2 0
10 | chr1 30 32 0 1 0
11 | chr1 32 34 0 0 1
PyRanges with 12 rows, 6 columns, and 1 index columns.
Contains 1 chromosomes.
>>> gr = pr.PyRanges({"Chromosome": ["chr1"] * 2, "Start": [0, 25], "End": [40, 35]}).tile_ranges(10)
>>> gr
index | Chromosome Start End
int64 | str int64 int64
------- --- ------------ ------- -------
0 | chr1 0 10
0 | chr1 10 20
0 | chr1 20 30
0 | chr1 30 40
1 | chr1 20 30
1 | chr1 30 40
PyRanges with 6 rows, 3 columns, and 1 index columns (with 4 index duplicates).
Contains 1 chromosomes.
>>> pr.count_overlaps(grs, gr)
index | Chromosome Start End a b c
int64 | str int64 int64 int64 int64 int64
------- --- ------------ ------- ------- ------- ------- -------
0 | chr1 0 10 1 0 1
0 | chr1 10 20 2 2 2
0 | chr1 20 30 2 2 0
0 | chr1 30 40 0 1 1
1 | chr1 20 30 2 2 0
1 | chr1 30 40 0 1 1
PyRanges with 6 rows, 6 columns, and 1 index columns (with 4 index duplicates).
Contains 1 chromosomes.
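The counts in the last example can be checked by hand with a brute-force sketch in plain Python. This is not how pyranges1 computes overlaps; `count_overlaps_naive` is a hypothetical helper, it handles a single chromosome only, ignores strand, and assumes 0-based half-open intervals.

```python
# Illustrative only: a brute-force count on one chromosome, not the
# library's implementation. Intervals are 0-based, half-open (start, end).

def count_overlaps_naive(queries, features):
    """For each feature, count the intervals of every named query set that overlap it."""
    counts = []
    for f_start, f_end in features:
        row = {}
        for name, intervals in queries.items():
            # Two half-open intervals overlap when each starts before the other ends.
            row[name] = sum(
                1 for start, end in intervals if start < f_end and end > f_start
            )
        counts.append(row)
    return counts


# Same intervals as the strings a, b and c in the example above.
queries = {
    "a": [(6, 12), (10, 20), (22, 27), (24, 30)],
    "b": [(12, 32), (14, 30)],
    "c": [(8, 15), (10, 14), (32, 34)],
}
# The four distinct 10 bp tiles used as features above.
features = [(0, 10), (10, 20), (20, 30), (30, 40)]

for feature, row in zip(features, count_overlaps_naive(queries, features)):
    print(feature, row)
```

Note that an interval ending at 10 does not overlap a tile starting at 10, which is why tile (0, 10) counts only one interval from a.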
- pyranges1.from_string(s: str) PyRanges
Create a PyRanges from a multiline string.
- Parameters:
s (str) – String with data.
Examples
>>> import pyranges1 as pr
>>> s = '''Chromosome Start End Strand
... chr1 246719402 246719502 +
... chr5 15400908 15401008 +
... chr9 68366534 68366634 +
... chr14 79220091 79220191 +
... chr14 103456471 103456571 -'''
>>> pr.from_string(s)
index | Chromosome Start End Strand
int64 | str int64 int64 str
------- --- ------------ --------- --------- --------
0 | chr1 246719402 246719502 +
1 | chr5 15400908 15401008 +
2 | chr9 68366534 68366634 +
3 | chr14 79220091 79220191 +
4 | chr14 103456471 103456571 -
PyRanges with 5 rows, 4 columns, and 1 index columns.
Contains 4 chromosomes and 2 strands.
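To make the expected input layout concrete, here is a minimal plain-Python sketch of turning such a string into columns. It assumes the first line is a header and that fields are whitespace-separated, as in the example above; `parse_table` is a hypothetical helper and says nothing about how from_string is actually implemented.

```python
def parse_table(s):
    """Split a whitespace-delimited table with a header line into a dict of columns."""
    lines = [line.split() for line in s.strip().splitlines()]
    header, rows = lines[0], lines[1:]
    columns = {name: [] for name in header}
    for row in rows:
        for name, value in zip(header, row):
            # Convert purely numeric fields to int; keep everything else as text.
            is_int = value.lstrip("-").isdigit()
            columns[name].append(int(value) if is_int else value)
    return columns


s = """Chromosome Start End Strand
chr1 246719402 246719502 +
chr5 15400908 15401008 +
chr9 68366534 68366634 +
chr14 79220091 79220191 +
chr14 103456471 103456571 -"""

table = parse_table(s)
print(table["Chromosome"])
print(table["Start"])
```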
- pyranges1.random(n: int = 1000, length: int = 100, chromsizes: dict[str, int] | DataFrame | None = None, seed: int | None = None, *, strand: bool = True) pr.PyRanges
Return PyRanges with random intervals.
- Parameters:
n (int, default 1000) – Number of intervals.
length (int, default 100) – Length of intervals.
chromsizes (dict or DataFrame, default None, i.e. use "hg19") – Draw intervals from within these bounds.
strand (bool, default True) – Data should have strand.
seed (int, default None) – Seed for random number generator.
Examples
>>> import pyranges1 as pr
>>> pr.random(seed=12345)
index | Chromosome Start End Strand
int64 | str int64 int64 str
------- --- ------------ --------- --------- --------
0 | chr4 36129012 36129112 +
1 | chr5 177668498 177668598 -
2 | chr15 1902279 1902379 +
3 | chr11 23816613 23816713 +
... | ... ... ... ...
996 | chr2 155410960 155411060 -
997 | chr6 80054552 80054652 +
998 | chrX 66474125 66474225 +
999 | chr7 69941721 69941821 -
PyRanges with 1000 rows, 4 columns, and 1 index columns.
Contains 24 chromosomes and 2 strands.
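The idea of drawing fixed-length intervals "within these bounds" can be sketched with the standard library. The sampling scheme below (chromosomes weighted by their number of valid start positions) is an assumption made for illustration; pyranges1 may sample differently, so this sketch will not reproduce the values shown above. `random_intervals` and the chromosome sizes are hypothetical.

```python
import random


def random_intervals(n, length, chromsizes, seed=None, strand=True):
    """Draw n intervals of a fixed length that fit inside the given chromosomes."""
    rng = random.Random(seed)  # a seeded generator makes the draw reproducible
    # Keep only chromosomes long enough to hold one interval.
    eligible = {c: size for c, size in chromsizes.items() if size >= length}
    names = list(eligible)
    # Assumption: weight each chromosome by its number of valid start positions.
    weights = [eligible[c] - length + 1 for c in names]
    rows = []
    for _ in range(n):
        chrom = rng.choices(names, weights=weights, k=1)[0]
        start = rng.randint(0, eligible[chrom] - length)  # both bounds inclusive
        row = (chrom, start, start + length)
        if strand:
            row += (rng.choice("+-"),)
        rows.append(row)
    return rows


# Made-up chromosome sizes, for illustration only.
sizes = {"chrA": 1000, "chrB": 500}
intervals = random_intervals(5, 100, sizes, seed=12345)
for row in intervals:
    print(row)
```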
- pyranges1.read_bam(f: str | Path, /, mapq: int = 0, required_flag: int = 0, filter_flag: int = 1540, *, sparse: bool = True) PyRanges
Return bam file as PyRanges.
- Parameters:
f (str) – Path to bam file
sparse (bool, default True) – Whether to return only the columns Chromosome, Start, End, Strand, Flag. Set to False to return also columns QueryStart, QueryEnd, QuerySequence, Name, Cigar, Quality (more time consuming).
mapq (int, default 0) – Minimum mapping quality score.
required_flag (int, default 0) – Flags which must be present for the interval to be read.
filter_flag (int, default 1540) – Ignore reads with these flags. The default, 1540, excludes reads that are unmapped, that failed vendor or platform quality checks, or that are PCR or optical duplicates.
- Return type:
PyRanges
Notes
This functionality requires the library bamread. It can be installed with pip install bamread or conda install -c bioconda bamread.
Examples
>>> import pyranges1 as pr
>>> path = pr.example_data.files["smaller.bam"]
>>> pr.read_bam(path)
index | Chromosome Start End Strand Flag
int64 | category int64 int64 category uint16
------- --- ------------ -------- -------- ---------- --------
0 | chr1 887771 887796 - 16
1 | chr1 994660 994685 - 16
2 | chr1 1041102 1041127 + 0
3 | chr1 1770383 1770408 - 16
... | ... ... ... ... ...
96 | chr1 18800901 18800926 + 0
97 | chr1 18800901 18800926 + 0
98 | chr1 18855123 18855148 - 16
99 | chr1 19373470 19373495 + 0
PyRanges with 100 rows, 5 columns, and 1 index columns.
Contains 1 chromosomes and 2 strands.
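The default filter_flag of 1540 is the sum of three SAM flag bits (4 + 512 + 1024). The sketch below decodes a flag and applies the three filters in the way samtools-style tools usually do; that read_bam combines mapq, required_flag and filter_flag exactly like this is an assumption, and `decode_flag` and `keep_read` are hypothetical helpers.

```python
# Flag bit values as defined in the SAM specification.
SAM_FLAGS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """List the names of all bits set in a SAM flag."""
    return [name for bit, name in SAM_FLAGS.items() if flag & bit]


def keep_read(flag, read_mapq, mapq=0, required_flag=0, filter_flag=1540):
    """Assumed semantics: all required bits set, no filtered bit set, mapq high enough."""
    has_required = (flag & required_flag) == required_flag
    has_filtered = (flag & filter_flag) != 0
    return has_required and not has_filtered and read_mapq >= mapq


print(decode_flag(1540))
print(keep_read(16, 30))    # reverse-strand read, no filtered bit set
print(keep_read(1040, 30))  # 16 + 1024: reverse-strand duplicate
```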
- pyranges1.read_bed(f: Path, /, nrows: int | None = None) PyRanges
Return bed file as PyRanges.
This is a reader for files that follow the bed format. They can have from 3 to 12 columns, which are named as follows:
Chromosome Start End Name Score Strand ThickStart ThickEnd ItemRGB BlockCount BlockSizes BlockStarts
- Parameters:
f (str) – Path to bed file
nrows (Optional int, default None) – Number of rows to return.
Notes
If you just want to create a PyRanges from a tab-delimited bed-like file, use pr.PyRanges(pandas.read_table(f)) instead.
- Return type:
PyRanges
Examples
>>> import pyranges1 as pr
>>> path = pr.example_data.files["aorta.bed"]
>>> pr.read_bed(path, nrows=5)
index | Chromosome Start End Name Score Strand
int64 | category int64 int64 str int64 category
------- --- ------------ ------- ------- -------- ------- ----------
0 | chr1 9916 10115 H3K27me3 5 -
1 | chr1 9939 10138 H3K27me3 7 +
2 | chr1 9951 10150 H3K27me3 8 -
3 | chr1 9953 10152 H3K27me3 5 +
4 | chr1 9978 10177 H3K27me3 7 -
PyRanges with 5 rows, 6 columns, and 1 index columns.
Contains 1 chromosomes and 2 strands.
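The column naming described above amounts to taking the first n of the twelve standard names, where n is the number of fields in the file. A minimal sketch, assuming tab-separated fields; `bed_column_names` is a hypothetical helper, not part of pyranges1.

```python
# The twelve names listed above, in bed column order.
BED_COLUMNS = [
    "Chromosome", "Start", "End", "Name", "Score", "Strand",
    "ThickStart", "ThickEnd", "ItemRGB", "BlockCount", "BlockSizes", "BlockStarts",
]


def bed_column_names(line):
    """Return the column names for a bed line, based on its number of fields."""
    n_fields = len(line.rstrip("\n").split("\t"))
    if not 3 <= n_fields <= len(BED_COLUMNS):
        raise ValueError(f"expected 3 to 12 fields, got {n_fields}")
    return BED_COLUMNS[:n_fields]


# First record of the example output above, written as a tab-separated line.
print(bed_column_names("chr1\t9916\t10115\tH3K27me3\t5\t-"))
```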
- pyranges1.read_bigwig(f: str | Path) PyRanges
Read bigwig files into a PyRanges.
- Parameters:
f (str) – Path to bw file.
- Return type:
PyRanges
Note
This function requires the library pyBigWig, which can be installed with pip install pyBigWig.
Examples
>>> import pyranges1 as pr
>>> path = pr.example_data.files["bigwig.bw"]
>>> pr.read_bigwig(path)
index | Chromosome Start End Value
int64 | str int64 int64 float64
------- --- ------------ ------- ------- ---------
0 | 1 0 1 0.1
1 | 1 1 2 0.2
2 | 1 2 3 0.3
3 | 1 100 150 1.4
4 | 1 150 151 1.5
5 | 10 200 300 2
PyRanges with 6 rows, 4 columns, and 1 index columns.
Contains 2 chromosomes.
- pyranges1.read_gff(f: str | Path, /, *, nrows: int | None = None, duplicate_attr: bool = False, ignore_bad: bool = False) PyRanges
Read files in the Gene Transfer Format.
- Parameters:
f (str) – Path to GTF file.
nrows (int, default None) – Number of rows to read. Default None, i.e. all.
duplicate_attr (bool, default False) – Whether to handle (potential) duplicate attributes or just keep the last one.
ignore_bad (bool, default False) – Whether to ignore bad lines or raise an error.
- Return type:
PyRanges
Note
The GTF format encodes both Start and End as 1-based included. PyRanges encodes intervals as 0-based, Start included and End excluded.
See also
pyranges1.read_gff3 : read files in the General Feature Format
Examples
>>> import pyranges1 as pr
>>> from tempfile import NamedTemporaryFile
>>> contents = ['#!genome-build GRCh38.p10']
>>> contents.append('1\thavana\tgene\t11869\t14409\t.\t+\t.\tgene_id "ENSG00000223972"; gene_version "5"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene";')
>>> contents.append('1\thavana\ttranscript\t11869\t14409\t.\t+\t.\tgene_id "ENSG00000223972"; gene_version "5"; group_by "ENST00000456328"; transcript_version "2"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene"; transcript_name "DDX11L1-202"; transcript_source "havana"; transcript_biotype "processed_transcript"; tag "basic"; transcript_support_level "1";')
>>> f = NamedTemporaryFile("w")
>>> _bytes_written = f.write("\n".join(contents))
>>> f.flush()
>>> pr.read_gtf(f.name)
index | Chromosome Source Feature Start End Score Strand Frame gene_id ...
int64 | category category category int64 int64 str category category str ...
------- --- ------------ ---------- ---------- ------- ------- ------- ---------- ---------- --------------- -----
0 | 1 havana gene 11868 14409 . + . ENSG00000223972 ...
1 | 1 havana transcript 11868 14409 . + . ENSG00000223972 ...
PyRanges with 2 rows, 20 columns, and 1 index columns. (11 columns not shown: "gene_version", "gene_name", "gene_source", ...).
Contains 1 chromosomes and 1 strands.
- pyranges1.read_gff3(f: str | Path, nrows: int | None = None) PyRanges
Read files in the General Feature Format into a PyRanges.
- Parameters:
f (str) – Path to GFF file.
nrows (int, default None) – Number of rows to read. Default None, i.e. all.
- Return type:
PyRanges
Notes
The gff3 format encodes both Start and End as 1-based included. PyRanges instead encodes intervals as 0-based, Start included and End excluded.
See also
pyranges1.read_gtf : read files in the Gene Transfer Format
- pyranges1.read_gtf(f: str | Path, /, *, nrows: int | None = None, duplicate_attr: bool = False, ignore_bad: bool = False) PyRanges
Read files in the Gene Transfer Format.
- Parameters:
f (str) – Path to GTF file.
nrows (int, default None) – Number of rows to read. Default None, i.e. all.
duplicate_attr (bool, default False) – Whether to handle (potential) duplicate attributes or just keep the last one.
ignore_bad (bool, default False) – Whether to ignore bad lines or raise an error.
- Return type:
PyRanges
Note
The GTF format encodes both Start and End as 1-based included. PyRanges encodes intervals as 0-based, Start included and End excluded.
See also
pyranges1.read_gff3 : read files in the General Feature Format
Examples
>>> import pyranges1 as pr
>>> from tempfile import NamedTemporaryFile
>>> contents = ['#!genome-build GRCh38.p10']
>>> contents.append('1\thavana\tgene\t11869\t14409\t.\t+\t.\tgene_id "ENSG00000223972"; gene_version "5"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene";')
>>> contents.append('1\thavana\ttranscript\t11869\t14409\t.\t+\t.\tgene_id "ENSG00000223972"; gene_version "5"; group_by "ENST00000456328"; transcript_version "2"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene"; transcript_name "DDX11L1-202"; transcript_source "havana"; transcript_biotype "processed_transcript"; tag "basic"; transcript_support_level "1";')
>>> f = NamedTemporaryFile("w")
>>> _bytes_written = f.write("\n".join(contents))
>>> f.flush()
>>> pr.read_gtf(f.name)
index | Chromosome Source Feature Start End Score Strand Frame gene_id ...
int64 | category category category int64 int64 str category category str ...
------- --- ------------ ---------- ---------- ------- ------- ------- ---------- ---------- --------------- -----
0 | 1 havana gene 11868 14409 . + . ENSG00000223972 ...
1 | 1 havana transcript 11868 14409 . + . ENSG00000223972 ...
PyRanges with 2 rows, 20 columns, and 1 index columns. (11 columns not shown: "gene_version", "gene_name", "gene_source", ...).
Contains 1 chromosomes and 1 strands.
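The coordinate shift in the note above explains why the gene at 11869 to 14409 in the file appears as Start 11868, End 14409 in the output: only Start is decremented. A minimal plain-Python sketch of that conversion and of attribute parsing follows. `parse_gtf_line` and `parse_gtf_attributes` are hypothetical helpers; in particular, joining duplicate attribute values with a comma when duplicate_attr is set is an assumption, not the documented behavior of read_gtf.

```python
def parse_gtf_attributes(field, duplicate_attr=False):
    """Parse 'key "value";' pairs from the ninth GTF column.

    Simplification: values that themselves contain ';' are not supported.
    """
    attributes = {}
    for item in field.strip().split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        value = value.strip().strip('"')
        if duplicate_attr and key in attributes:
            # Assumption for illustration: join repeated values with a comma.
            attributes[key] += "," + value
        else:
            attributes[key] = value  # without duplicate handling, the last one wins
    return attributes


def parse_gtf_line(line, duplicate_attr=False):
    """Turn one GTF record into a dict with 0-based, half-open coordinates."""
    fields = line.rstrip("\n").split("\t")
    chrom, source, feature, start, end, score, strand, frame, attrs = fields
    record = {
        "Chromosome": chrom,
        "Source": source,
        "Feature": feature,
        "Start": int(start) - 1,  # 1-based inclusive -> 0-based inclusive
        "End": int(end),          # 1-based inclusive == 0-based exclusive
        "Score": score,
        "Strand": strand,
        "Frame": frame,
    }
    record.update(parse_gtf_attributes(attrs, duplicate_attr))
    return record


# The gene record from the example above.
line = ('1\thavana\tgene\t11869\t14409\t.\t+\t.\tgene_id "ENSG00000223972"; '
        'gene_version "5"; gene_name "DDX11L1"; gene_source "havana"; '
        'gene_biotype "transcribed_unprocessed_pseudogene";')
record = parse_gtf_line(line)
print(record["Start"], record["End"], record["gene_id"])
```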
- pyranges1.tile_genome(chromsizes: PyRanges | pd.DataFrame | dict[str | int, int], tile_size: int, *, full_last_tile: bool = False) PyRanges
Create a tiled genome.
Split the genome into adjacent non-overlapping tiles of a given size.
- Parameters:
chromsizes (dict, DataFrame or PyRanges) – Dict, DataFrame or PyRanges describing the lengths of the chromosomes.
tile_size (int) – Length of the tiles.
full_last_tile (bool, default False) – If True, do not truncate the last tile at the end of the chromosome. Use this to ensure that all tiles have the same size.
See also
pyranges.PyRanges.tile_ranges : split intervals into adjacent non-overlapping tiles.
Examples
>>> import pyranges1 as pr
>>> chromsizes = pr.example_data.chromsizes
>>> chromsizes
index | Chromosome Start End
int64 | category int64 int64
------- --- ------------ ------- ---------
0 | chr1 0 249250621
1 | chr2 0 243199373
2 | chr3 0 198022430
3 | chr4 0 191154276
... | ... ... ...
21 | chr19 0 59128983
22 | chr22 0 51304566
23 | chr21 0 48129895
24 | chrM 0 16571
PyRanges with 25 rows, 3 columns, and 1 index columns.
Contains 25 chromosomes.
>>> pr.tile_genome(chromsizes,int(1e6))
index | Chromosome Start End
int64 | category int64 int64
------- --- ------------ -------- --------
0 | chr1 0 1000000
1 | chr1 1000000 2000000
2 | chr1 2000000 3000000
3 | chr1 3000000 4000000
... | ... ... ...
3110 | chrY 56000000 57000000
3111 | chrY 57000000 58000000
3112 | chrY 58000000 59000000
3113 | chrY 59000000 59373566
PyRanges with 3114 rows, 3 columns, and 1 index columns.
Contains 25 chromosomes.
>>> pr.tile_genome(chromsizes,int(1e6), full_last_tile=True)
index | Chromosome Start End
int64 | category int64 int64
------- --- ------------ -------- --------
0 | chr1 0 1000000
1 | chr1 1000000 2000000
2 | chr1 2000000 3000000
3 | chr1 3000000 4000000
... | ... ... ...
3110 | chrY 56000000 57000000
3111 | chrY 57000000 58000000
3112 | chrY 58000000 59000000
3113 | chrY 59000000 60000000
PyRanges with 3114 rows, 3 columns, and 1 index columns.
Contains 25 chromosomes.
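The effect of full_last_tile on the final chrY tile above (59373566 versus 60000000) can be reproduced with a short plain-Python sketch. `tile_chromosomes` is a hypothetical helper that takes a dict of chromosome lengths; it illustrates the tiling logic and is not the library's implementation.

```python
def tile_chromosomes(chromsizes, tile_size, full_last_tile=False):
    """Split each chromosome into adjacent, non-overlapping (chrom, start, end) tiles."""
    tiles = []
    for chrom, size in chromsizes.items():
        for start in range(0, size, tile_size):
            end = start + tile_size
            if not full_last_tile:
                # Truncate the last tile at the end of the chromosome.
                end = min(end, size)
            tiles.append((chrom, start, end))
    return tiles


# chrY length taken from the truncated last tile in the example above.
chromsizes = {"chrY": 59373566}
truncated = tile_chromosomes(chromsizes, 1_000_000)
full = tile_chromosomes(chromsizes, 1_000_000, full_last_tile=True)
print(len(truncated), truncated[-1])
print(len(full), full[-1])
```

Both calls produce the same number of tiles; only the end of the last tile on each chromosome differs.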
pyranges1.options
The pyranges1.options object is used to configure aspects of how PyRanges is represented.
Below are the methods available on this object.
- class pyranges1.core.options.PyRangesOptions
Bases: object
- display_options() str
Return a representation of the current options and their values.
Examples
# In the below, the console width has been set to 120 so that the doctests will return the same result no matter
# the console width.
>>> import pyranges1 as pr
>>> print(pr.options.display_options())
max_rows_to_show : 8 (the max number of rows to show in PyRanges repr)
max_column_names_to_show : 3 (how many columns listed in PyRanges repr when not all fit the screen width)
console_width : 120 (console width, affecting PyRanges representation (None for auto))
html_max_cols : 20 (max number of columns to show as HTML (e.g. Jupyter), others are hidden)
html_max_rows : None (max n. of rows shown as HTML (e.g. Jupyter). If undefined, max_rows_to_show is used)
- get_option(name: str) int
Get the value of an option.
- Parameters:
name (str) – The name of the option to get.
- Returns:
The value of the option.
- Return type:
int
Examples
>>> import pyranges1 as pr
>>> pr.options.get_option("max_rows_to_show")
8
- reset_options() None
Reset all options to their default values.
Examples
>>> import pyranges1 as pr
>>> pr.options.get_option('max_rows_to_show')
8
>>> pr.options.set_option('max_rows_to_show', 10)
>>> pr.options.get_option('max_rows_to_show')
10
>>> pr.options.set_option('console_width', 120)
>>> pr.options.get_option('console_width')
120
>>> pr.options.reset_options()
>>> pr.options.get_option('max_rows_to_show')
8
- set_option(name: str, value: int) None
Set an option to a new value.
Run pyranges1.options.display_options() to see available options and their current values.
- Parameters:
name (str) – The name of the option to set.
value (int) – The value to set the option to.
Examples
>>> import pyranges1 as pr
>>> pr.options.set_option('max_rows_to_show', 8)
pyranges1.example_data
The pyranges1.example_data object contains example data used in tests and documentation.
Printing it shows an overview of available data:
>>> import pyranges1 as pr
>>> pr.example_data
Available example data:
-----------------------
example_data.chipseq : Example ChIP-seq data.
example_data.chipseq_background : Example ChIP-seq data.
example_data.chromsizes : Example chromsizes data (hg19).
example_data.ensembl_gtf : Example gtf file from Ensembl.
example_data.f1 : Example bed file.
example_data.f2 : Example bed file.
example_data.aorta : Example ChIP-seq data.
example_data.aorta2 : Example ChIP-seq data.
example_data.ncbi_gff : Example NCBI GFF data.
example_data.ncbi_fasta : Example NCBI fasta.
example_data.files : A dict of basenames to file paths of available data.
Most of the data is in the form of PyRanges objects:
>>> pr.example_data.chipseq
index | Chromosome Start End Name Score Strand
int64 | category int64 int64 str int64 category
------- --- ------------ --------- --------- -------- ------- ----------
0 | chr8 28510032 28510057 U0 0 -
1 | chr7 107153363 107153388 U0 0 -
2 | chr5 135821802 135821827 U0 0 -
3 | chr14 19418999 19419024 U0 0 -
... | ... ... ... ... ... ...
16 | chr9 120803448 120803473 U0 0 +
17 | chr6 89296757 89296782 U0 0 -
18 | chr1 194245558 194245583 U0 0 +
19 | chr8 57916061 57916086 U0 0 +
PyRanges with 20 rows, 6 columns, and 1 index columns.
Contains 15 chromosomes and 2 strands.
pyranges1.assistant
The pyranges1.assistant object is a helper for using an AI-based assistant to write pyranges1 code:
>>> pr.assistant
Utilities to instruct a AI coding assistant for pyranges1 prompts.
- Get a prompt to copy-paste into an AI assistant to prime it for pyranges1 coding tasks:
>>> import pyranges1 as pr
>>> pr.assistant.prompt()
- Make a file with pyranges1 documentation to upload to the AI assistant:
>>> pr.assistant.export_docs("pr_docs.txt")