The pyranges1 module

The pyranges1 module exposes the class PyRanges (omitted on this page) as well as a number of functions for reading data from commonly used file formats. It also provides a pyranges1.options interface to configure how PyRanges objects are represented, and a pyranges1.example_data object used in tests and documentation.

pyranges1.concat(grs: Iterable, *args, **kwargs) → T

Concatenate PyRanges.

Parameters:

grs (iterable of PyRanges) – PyRanges to concatenate.
args – Arguments passed to pandas.concat.
kwargs – Keyword arguments passed to pandas.concat.

Return type:

pyranges1.PyRanges

Examples

>>> import pyranges1 as pr
>>> gr1 = pr.example_data.f2
>>> gr2 = pr.example_data.f1
>>> pr.concat([gr1, gr2])
  index  |    Chromosome      Start      End  Name         Score  Strand
  int64  |    category        int64    int64  str          int64  category
-------  ---  ------------  -------  -------  ---------  -------  ----------
      0  |    chr1                1        2  a                0  +
      1  |    chr1                6        7  b                0  -
      0  |    chr1                3        6  interval1        0  +
      1  |    chr1                5        7  interval2        0  -
      2  |    chr1                8        9  interval3        0  +
PyRanges with 5 rows, 6 columns, and 1 index columns (with 2 index duplicates).
Contains 1 chromosomes and 2 strands.

>>> pr.concat([gr1, gr2.remove_strand()])
  index  |    Chromosome      Start      End  Name         Score  Strand
  int64  |    category        int64    int64  str          int64  category
-------  ---  ------------  -------  -------  ---------  -------  ----------
      0  |    chr1                1        2  a                0  +
      1  |    chr1                6        7  b                0  -
      0  |    chr1                3        6  interval1        0  nan
      1  |    chr1                5        7  interval2        0  nan
      2  |    chr1                8        9  interval3        0  nan
PyRanges with 5 rows, 6 columns, and 1 index columns (with 2 index duplicates).
Contains 1 chromosomes and 2 strands (including non-genomic strands: nan).

>>> r = pr.RangeFrame(gr1)
>>> pr.concat([r, gr1])
Traceback (most recent call last):
...
ValueError: Can only concatenate RangeFrames of the same type. Got: PyRanges, RangeFrame

>>> import pandas as pd
>>> pd.testing.assert_frame_equal(pr.concat([r]), r)  # raises if they are not equal

pyranges1.count_overlaps(grs: dict[str, PyRanges], features: PyRanges | None = None, strand_behavior: Literal['auto', 'same', 'opposite', 'ignore'] = 'auto', by: list[str] | None = None) → PyRanges

Count overlaps in multiple pyranges.

Parameters:

grs (dict of PyRanges) – The PyRanges to use as queries.
features (PyRanges, default None) – The PyRanges to use as subject in the query. If None, the PyRanges themselves are used as a query.
strand_behavior ({"auto", "same", "opposite", "ignore"}, default "auto") – Whether to compare PyRanges on the same strand, on the opposite strand, or to ignore strand information. The default, "auto", means use "same" if both PyRanges are stranded, and otherwise ignore the strand information.
how ({None, "all", "containment", "first"}, default None, i.e. all) – What intervals to report. By default reports all overlapping intervals. "containment" reports intervals where the overlapping is contained within it.
by (list of str, default None) – Columns to group by.
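The counting rule can be illustrated without the library. The following is a minimal pure-Python sketch, not the pyranges1 implementation: it assumes a single chromosome, ignores strand, and treats intervals as 0-based half-open (Start included, End excluded). The helper names `overlaps` and `count_overlaps_sketch` are hypothetical.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """Half-open intervals overlap when each one starts before the other ends."""
    return a_start < b_end and b_start < a_end


def count_overlaps_sketch(grs, features):
    """For each named interval list, count how many intervals hit each feature."""
    counts = {}
    for name, intervals in grs.items():
        counts[name] = [
            sum(overlaps(f_start, f_end, start, end) for start, end in intervals)
            for f_start, f_end in features
        ]
    return counts


# Plain (Start, End) tuples standing in for three PyRanges on chr1.
grs = {
    "a": [(6, 12), (10, 20), (22, 27), (24, 30)],
    "b": [(12, 32), (14, 30)],
    "c": [(8, 15), (10, 14), (32, 34)],
}
# Four 10-bp tiles used as features.
features = [(0, 10), (10, 20), (20, 30), (30, 40)]

counts = count_overlaps_sketch(grs, features)
print(counts)
# {'a': [1, 2, 2, 0], 'b': [0, 2, 2, 1], 'c': [1, 2, 0, 1]}
```

Because End is excluded, an interval ending at 10 does not overlap a feature starting at 10.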

Examples

>>> import pyranges1 as pr
>>> a = '''Chromosome Start End
... chr1    6    12
... chr1    10    20
... chr1    22    27
... chr1    24    30'''

>>> b = '''Chromosome Start End
... chr1    12    32
... chr1    14    30'''

>>> c = '''Chromosome Start End
... chr1    8    15
... chr1    10    14
... chr1    32    34'''

>>> grs = {n: pr.from_string(s) for n, s in zip(["a", "b", "c"], [a, b, c])}
>>> for k, v in grs.items():
...     print("Name: " + k)
...     print(v)
Name: a
  index  |    Chromosome      Start      End
  int64  |    str             int64    int64
-------  ---  ------------  -------  -------
      0  |    chr1                6       12
      1  |    chr1               10       20
      2  |    chr1               22       27
      3  |    chr1               24       30
PyRanges with 4 rows, 3 columns, and 1 index columns.
Contains 1 chromosomes.
Name: b
  index  |    Chromosome      Start      End
  int64  |    str             int64    int64
-------  ---  ------------  -------  -------
      0  |    chr1               12       32
      1  |    chr1               14       30
PyRanges with 2 rows, 3 columns, and 1 index columns.
Contains 1 chromosomes.
Name: c
  index  |    Chromosome      Start      End
  int64  |    str             int64    int64
-------  ---  ------------  -------  -------
      0  |    chr1                8       15
      1  |    chr1               10       14
      2  |    chr1               32       34
PyRanges with 3 rows, 3 columns, and 1 index columns.
Contains 1 chromosomes.

>>> pr.count_overlaps(grs)
index    |    Chromosome    Start    End      a        b        c
int64    |    str           int64    int64    int64    int64    int64
-------  ---  ------------  -------  -------  -------  -------  -------
0        |    chr1          6        8        1        0        0
1        |    chr1          8        10       1        0        1
2        |    chr1          10       12       2        0        2
3        |    chr1          12       14       1        1        2
...      |    ...           ...      ...      ...      ...      ...
8        |    chr1          24       27       2        2        0
9        |    chr1          27       30       1        2        0
10       |    chr1          30       32       0        1        0
11       |    chr1          32       34       0        0        1
PyRanges with 12 rows, 6 columns, and 1 index columns.
Contains 1 chromosomes.

>>> gr = pr.PyRanges({"Chromosome": ["chr1"] * 2, "Start": [0, 25], "End": [40, 35]}).tile_ranges(10)
>>> gr
  index  |    Chromosome      Start      End
  int64  |    str             int64    int64
-------  ---  ------------  -------  -------
      0  |    chr1                0       10
      0  |    chr1               10       20
      0  |    chr1               20       30
      0  |    chr1               30       40
      1  |    chr1               20       30
      1  |    chr1               30       40
PyRanges with 6 rows, 3 columns, and 1 index columns (with 4 index duplicates).
Contains 1 chromosomes.

>>> pr.count_overlaps(grs, gr)
  index  |    Chromosome      Start      End        a        b        c
  int64  |    str             int64    int64    int64    int64    int64
-------  ---  ------------  -------  -------  -------  -------  -------
      0  |    chr1                0       10        1        0        1
      0  |    chr1               10       20        2        2        2
      0  |    chr1               20       30        2        2        0
      0  |    chr1               30       40        0        1        1
      1  |    chr1               20       30        2        2        0
      1  |    chr1               30       40        0        1        1
PyRanges with 6 rows, 6 columns, and 1 index columns (with 4 index duplicates).
Contains 1 chromosomes.

pyranges1.from_string(s: str) → PyRanges

Create a PyRanges from multiline string.

Parameters:

s (str) – String with data.
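To make the expected input shape concrete, here is a rough standard-library sketch of parsing such a whitespace-delimited string into columns. It is an illustration only and makes no claim about how pyranges1 parses the string internally; `parse_interval_table` is a hypothetical helper.

```python
def parse_interval_table(s):
    """Split a whitespace-delimited table (header on the first line) into columns."""
    lines = [line.split() for line in s.strip().splitlines() if line.strip()]
    header, rows = lines[0], lines[1:]
    table = {name: [row[i] for row in rows] for i, name in enumerate(header)}
    # Coordinates are integers; all other columns are kept as strings.
    for col in ("Start", "End"):
        if col in table:
            table[col] = [int(value) for value in table[col]]
    return table


s = """Chromosome      Start        End Strand
chr1  246719402  246719502      +
chr14  103456471  103456571      -"""

table = parse_interval_table(s)
print(table["Chromosome"], table["Start"], table["End"], table["Strand"])
```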

Examples

>>> import pyranges1 as pr
>>> s = '''Chromosome      Start        End Strand
... chr1  246719402  246719502      +
... chr5   15400908   15401008      +
... chr9   68366534   68366634      +
... chr14   79220091   79220191      +
... chr14  103456471  103456571      -'''

>>> pr.from_string(s)
  index  |    Chromosome        Start        End  Strand
  int64  |    str               int64      int64  str
-------  ---  ------------  ---------  ---------  --------
      0  |    chr1          246719402  246719502  +
      1  |    chr5           15400908   15401008  +
      2  |    chr9           68366534   68366634  +
      3  |    chr14          79220091   79220191  +
      4  |    chr14         103456471  103456571  -
PyRanges with 5 rows, 4 columns, and 1 index columns.
Contains 4 chromosomes and 2 strands.

pyranges1.random(n: int = 1000, length: int = 100, chromsizes: dict[str, int] | DataFrame | None = None, seed: int | None = None, *, strand: bool = True) → pr.PyRanges

Return PyRanges with random intervals.

Parameters:

n (int, default 1000) – Number of intervals.
length (int, default 100) – Length of intervals.
chromsizes (dict or DataFrame, default None, i.e. use "hg19") – Draw intervals from within these bounds.
strand (bool, default True) – Data should have strand.
seed (int, default None) – Seed for random number generator.
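The parameters above can be pictured with a small standard-library sketch. This is not the pyranges1 implementation and will not reproduce its output: the helper `random_intervals` is hypothetical, and both the sampling of chromosomes in proportion to their length and the tiny chromsizes dict are assumptions made for illustration.

```python
import random


def random_intervals(n, length, chromsizes, seed=None, strand=True):
    """Draw n fixed-length intervals that lie fully inside the given chromosomes."""
    rng = random.Random(seed)  # a seeded generator makes the draw reproducible
    chroms = list(chromsizes)
    weights = [chromsizes[c] for c in chroms]  # assumption: longer chromosomes are drawn more often
    rows = []
    for _ in range(n):
        chrom = rng.choices(chroms, weights=weights, k=1)[0]
        # Latest valid start leaves room for a full-length interval.
        start = rng.randrange(0, chromsizes[chrom] - length + 1)
        row = (chrom, start, start + length)
        if strand:
            row += (rng.choice("+-"),)
        rows.append(row)
    return rows


sizes = {"chr1": 1000, "chr2": 500}  # made-up sizes, not hg19
rows = random_intervals(50, 100, sizes, seed=12345)
print(rows[:3])
```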

Examples

>>> import pyranges1 as pr
>>> pr.random(seed=12345)
index    |    Chromosome    Start      End        Strand
int64    |    str           int64      int64      str
-------  ---  ------------  ---------  ---------  --------
0        |    chr4          36129012   36129112   +
1        |    chr5          177668498  177668598  -
2        |    chr15         1902279    1902379    +
3        |    chr11         23816613   23816713   +
...      |    ...           ...        ...        ...
996      |    chr2          155410960  155411060  -
997      |    chr6          80054552   80054652   +
998      |    chrX          66474125   66474225   +
999      |    chr7          69941721   69941821   -
PyRanges with 1000 rows, 4 columns, and 1 index columns.
Contains 24 chromosomes and 2 strands.

pyranges1.read_bam(f: str | Path, /, mapq: int = 0, required_flag: int = 0, filter_flag: int = 1540, *, sparse: bool = True) → PyRanges

Return bam file as PyRanges.

Parameters:

f (str) – Path to bam file
sparse (bool, default True) – Whether to return only the columns Chromosome, Start, End, Strand, Flag. Set to False to also return the columns QueryStart, QueryEnd, QuerySequence, Name, Cigar, Quality (more time consuming).
mapq (int, default 0) – Minimum mapping quality score.
required_flag (int, default 0) – Flags which must be present for the interval to be read.
filter_flag (int, default 1540) – Ignore reads with these flags. The default, 1540, excludes reads that are unmapped, that failed vendor or platform quality checks, or that are PCR or optical duplicates.

Return type:

PyRanges

Notes

This functionality requires the library bamread. It can be installed with pip install bamread or conda install -c bioconda bamread.
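The default filter_flag of 1540 is the sum of three SAM flag bits. The sketch below shows the arithmetic and a plausible way such a filter is applied. The bit values come from the SAM specification; the helpers `is_filtered` and `has_required` are hypothetical, and the exact test that bamread performs is an assumption here.

```python
# SAM flag bits, as defined in the SAM specification.
UNMAPPED = 0x4      # 4: read is unmapped
QC_FAIL = 0x200     # 512: read failed vendor or platform quality checks
DUPLICATE = 0x400   # 1024: read is a PCR or optical duplicate

DEFAULT_FILTER_FLAG = UNMAPPED | QC_FAIL | DUPLICATE  # 4 + 512 + 1024 = 1540


def is_filtered(flag, filter_flag=DEFAULT_FILTER_FLAG):
    """Assumed rule: drop the read if it carries any bit set in filter_flag."""
    return (flag & filter_flag) != 0


def has_required(flag, required_flag=0):
    """Assumed rule: keep the read only if every bit in required_flag is set."""
    return (flag & required_flag) == required_flag


print(DEFAULT_FILTER_FLAG)      # 1540
print(is_filtered(16))          # False: reverse-strand read, none of the filtered bits
print(is_filtered(16 | 1024))   # True: reverse-strand duplicate
```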

Examples

>>> import pyranges1 as pr
>>> path = pr.example_data.files["smaller.bam"]
>>> pr.read_bam(path)
index    |    Chromosome    Start     End       Strand      Flag
int64    |    category      int64     int64     category    uint16
-------  ---  ------------  --------  --------  ----------  --------
0        |    chr1          887771    887796    -           16
1        |    chr1          994660    994685    -           16
2        |    chr1          1041102   1041127   +           0
3        |    chr1          1770383   1770408   -           16
...      |    ...           ...       ...       ...         ...
96       |    chr1          18800901  18800926  +           0
97       |    chr1          18800901  18800926  +           0
98       |    chr1          18855123  18855148  -           16
99       |    chr1          19373470  19373495  +           0
PyRanges with 100 rows, 5 columns, and 1 index columns.
Contains 1 chromosomes and 2 strands.

pyranges1.read_bed(f: Path, /, nrows: int | None = None) → PyRanges

Return bed file as PyRanges.

This is a reader for files that follow the bed format. They can have from 3 to 12 columns, which are named as follows:

Chromosome Start End Name Score Strand ThickStart ThickEnd ItemRGB BlockCount BlockSizes BlockStarts
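As a small illustration of this naming scheme, the sketch below returns the names a file with a given number of columns would receive. It is not taken from the pyranges1 source; `bed_column_names` is a hypothetical helper.

```python
BED_COLUMNS = (
    "Chromosome Start End Name Score Strand "
    "ThickStart ThickEnd ItemRGB BlockCount BlockSizes BlockStarts"
).split()


def bed_column_names(n_columns):
    """Return the names for the first n_columns of a bed file (3 to 12 columns)."""
    if not 3 <= n_columns <= len(BED_COLUMNS):
        raise ValueError(f"bed files have 3 to 12 columns, got {n_columns}")
    return BED_COLUMNS[:n_columns]


print(bed_column_names(6))
# ['Chromosome', 'Start', 'End', 'Name', 'Score', 'Strand']
```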

Parameters:

f (str) – Path to bed file
nrows (Optional int, default None) – Number of rows to return.

Notes

If you just want to create a PyRanges from a tab-delimited bed-like file, use pr.PyRanges(pandas.read_table(f)) instead.

Return type:

PyRanges

Examples

>>> import pyranges1 as pr
>>> path = pr.example_data.files["aorta.bed"]
>>> pr.read_bed(path, nrows=5)
  index  |    Chromosome      Start      End  Name        Score  Strand
  int64  |    category        int64    int64  str         int64  category
-------  ---  ------------  -------  -------  --------  -------  ----------
      0  |    chr1             9916    10115  H3K27me3        5  -
      1  |    chr1             9939    10138  H3K27me3        7  +
      2  |    chr1             9951    10150  H3K27me3        8  -
      3  |    chr1             9953    10152  H3K27me3        5  +
      4  |    chr1             9978    10177  H3K27me3        7  -
PyRanges with 5 rows, 6 columns, and 1 index columns.
Contains 1 chromosomes and 2 strands.

pyranges1.read_bigwig(f: str | Path) → PyRanges

Read bigwig files into a PyRanges.

Parameters:

f (str) – Path to bw file.

Return type:

PyRanges

Note

This function requires the library pyBigWig. It can be installed with pip install pyBigWig.

Examples

>>> import pyranges1 as pr
>>> path = pr.example_data.files["bigwig.bw"]
>>> pr.read_bigwig(path)
  index  |      Chromosome    Start      End      Value
  int64  |             str    int64    int64    float64
-------  ---  ------------  -------  -------  ---------
      0  |               1        0        1        0.1
      1  |               1        1        2        0.2
      2  |               1        2        3        0.3
      3  |               1      100      150        1.4
      4  |               1      150      151        1.5
      5  |              10      200      300        2
PyRanges with 6 rows, 4 columns, and 1 index columns.
Contains 2 chromosomes.

pyranges1.read_gff(f: str | Path, /, *, nrows: int | None = None, duplicate_attr: bool = False, ignore_bad: bool = False) → PyRanges

Read files in the Gene Transfer Format.

Parameters:

f (str) – Path to GTF file.
nrows (int, default None) – Number of rows to read. Default None, i.e. all.
duplicate_attr (bool, default False) – Whether to handle (potential) duplicate attributes or just keep last one.
ignore_bad (bool, default False) – Whether to ignore bad lines or raise an error.

Return type:

PyRanges

Note

The GTF format encodes both Start and End as 1-based included. PyRanges encodes intervals as 0-based, Start included and End excluded.
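The coordinate conversion in the note above amounts to subtracting 1 from Start and leaving End unchanged. A minimal sketch follows; the function names are hypothetical and not part of pyranges1.

```python
def gtf_to_pyranges_coords(start, end):
    """1-based, both ends included -> 0-based, Start included and End excluded."""
    return start - 1, end


def pyranges_to_gtf_coords(start, end):
    """Inverse conversion: 0-based half-open -> 1-based inclusive."""
    return start + 1, end


start, end = gtf_to_pyranges_coords(11869, 14409)
print(start, end)   # 11868 14409
print(end - start)  # 2541, the interval length, same as 14409 - 11869 + 1
```

With half-open coordinates the length of an interval is simply End minus Start.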

See also

pyranges1.read_gff3: read files in the General Feature Format

Examples

>>> import pyranges1 as pr
>>> from tempfile import NamedTemporaryFile
>>> contents = ['#!genome-build GRCh38.p10']
>>> contents.append('1\thavana\tgene\t11869\t14409\t.\t+\t.\tgene_id "ENSG00000223972"; gene_version "5"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene";')
>>> contents.append('1\thavana\ttranscript\t11869\t14409\t.\t+\t.\tgene_id "ENSG00000223972"; gene_version "5"; group_by "ENST00000456328"; transcript_version "2"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene"; transcript_name "DDX11L1-202"; transcript_source "havana"; transcript_biotype "processed_transcript"; tag "basic"; transcript_support_level "1";')
>>> f = NamedTemporaryFile("w")
>>> _bytes_written = f.write("\n".join(contents))
>>> f.flush()
>>> pr.read_gtf(f.name)  
  index  |      Chromosome  Source      Feature       Start      End  Score    Strand      Frame       gene_id          ...
  int64  |        category  category    category      int64    int64  str      category    category    str              ...
-------  ---  ------------  ----------  ----------  -------  -------  -------  ----------  ----------  ---------------  -----
      0  |               1  havana      gene          11868    14409  .        +           .           ENSG00000223972  ...
      1  |               1  havana      transcript    11868    14409  .        +           .           ENSG00000223972  ...
PyRanges with 2 rows, 20 columns, and 1 index columns. (11 columns not shown: "gene_version", "gene_name", "gene_source", ...).
Contains 1 chromosomes and 1 strands.

pyranges1.read_gff3(f: str | Path, nrows: int | None = None) → PyRanges

Read files in the General Feature Format into a PyRanges.

Parameters:

f (str) – Path to GFF file.
nrows (int, default None) – Number of rows to read. Default None, i.e. all.

Return type:

PyRanges

Notes

The GFF3 format encodes both Start and End as 1-based included. PyRanges instead encodes intervals as 0-based, Start included and End excluded.

See also

pyranges1.read_gtf: read files in the Gene Transfer Format

pyranges1.read_gtf(f: str | Path, /, *, nrows: int | None = None, duplicate_attr: bool = False, ignore_bad: bool = False) → PyRanges

Read files in the Gene Transfer Format.

Parameters:

f (str) – Path to GTF file.
nrows (int, default None) – Number of rows to read. Default None, i.e. all.
duplicate_attr (bool, default False) – Whether to handle (potential) duplicate attributes or just keep last one.
ignore_bad (bool, default False) – Whether to ignore bad lines or raise an error.

Return type:

PyRanges

Note

The GTF format encodes both Start and End as 1-based included. PyRanges encodes intervals as 0-based, Start included and End excluded.

See also

pyranges1.read_gff3: read files in the General Feature Format

Examples

>>> import pyranges1 as pr
>>> from tempfile import NamedTemporaryFile
>>> contents = ['#!genome-build GRCh38.p10']
>>> contents.append('1\thavana\tgene\t11869\t14409\t.\t+\t.\tgene_id "ENSG00000223972"; gene_version "5"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene";')
>>> contents.append('1\thavana\ttranscript\t11869\t14409\t.\t+\t.\tgene_id "ENSG00000223972"; gene_version "5"; group_by "ENST00000456328"; transcript_version "2"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene"; transcript_name "DDX11L1-202"; transcript_source "havana"; transcript_biotype "processed_transcript"; tag "basic"; transcript_support_level "1";')
>>> f = NamedTemporaryFile("w")
>>> _bytes_written = f.write("\n".join(contents))
>>> f.flush()
>>> pr.read_gtf(f.name)  
  index  |      Chromosome  Source      Feature       Start      End  Score    Strand      Frame       gene_id          ...
  int64  |        category  category    category      int64    int64  str      category    category    str              ...
-------  ---  ------------  ----------  ----------  -------  -------  -------  ----------  ----------  ---------------  -----
      0  |               1  havana      gene          11868    14409  .        +           .           ENSG00000223972  ...
      1  |               1  havana      transcript    11868    14409  .        +           .           ENSG00000223972  ...
PyRanges with 2 rows, 20 columns, and 1 index columns. (11 columns not shown: "gene_version", "gene_name", "gene_source", ...).
Contains 1 chromosomes and 1 strands.

pyranges1.tile_genome(chromsizes: PyRanges | pd.DataFrame | dict[str | int, int], tile_size: int, *, full_last_tile: bool = False) → PyRanges

Create a tiled genome.

Split the genome into adjacent non-overlapping tiles of a given size.

Parameters:

chromsizes (dict, DataFrame or PyRanges) – Dict, DataFrame or PyRanges describing the lengths of the chromosomes.
tile_size (int) – Length of the tiles.
full_last_tile (bool, default False) – Do not truncate the last tile to the end of the chromosome. Use to ensure size is consistent for all tiles.
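The effect of tile_size and full_last_tile can be sketched for a single chromosome in plain Python. This is an illustration of the tiling rule described above, not the pyranges1 implementation; `tile_chromosome` is a hypothetical helper.

```python
def tile_chromosome(length, tile_size, full_last_tile=False):
    """Split [0, length) into adjacent, non-overlapping (Start, End) tiles."""
    tiles = []
    for start in range(0, length, tile_size):
        end = start + tile_size
        if not full_last_tile:
            end = min(end, length)  # truncate the last tile at the chromosome end
        tiles.append((start, end))
    return tiles


print(tile_chromosome(25, 10))
# [(0, 10), (10, 20), (20, 25)]
print(tile_chromosome(25, 10, full_last_tile=True))
# [(0, 10), (10, 20), (20, 30)]
```

With full_last_tile=True every tile has the same size, at the cost of the last tile extending past the end of the chromosome.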

See also

pyranges1.PyRanges.tile_ranges: split intervals into adjacent non-overlapping tiles.

Examples

>>> import pyranges1 as pr
>>> chromsizes = pr.example_data.chromsizes
>>> chromsizes
index    |    Chromosome    Start    End
int64    |    category      int64    int64
-------  ---  ------------  -------  ---------
0        |    chr1          0        249250621
1        |    chr2          0        243199373
2        |    chr3          0        198022430
3        |    chr4          0        191154276
...      |    ...           ...      ...
21       |    chr19         0        59128983
22       |    chr22         0        51304566
23       |    chr21         0        48129895
24       |    chrM          0        16571
PyRanges with 25 rows, 3 columns, and 1 index columns.
Contains 25 chromosomes.

>>> pr.tile_genome(chromsizes,int(1e6))
index    |    Chromosome    Start     End
int64    |    category      int64     int64
-------  ---  ------------  --------  --------
0        |    chr1          0         1000000
1        |    chr1          1000000   2000000
2        |    chr1          2000000   3000000
3        |    chr1          3000000   4000000
...      |    ...           ...       ...
3110     |    chrY          56000000  57000000
3111     |    chrY          57000000  58000000
3112     |    chrY          58000000  59000000
3113     |    chrY          59000000  59373566
PyRanges with 3114 rows, 3 columns, and 1 index columns.
Contains 25 chromosomes.

>>> pr.tile_genome(chromsizes,int(1e6), full_last_tile=True)
index    |    Chromosome    Start     End
int64    |    category      int64     int64
-------  ---  ------------  --------  --------
0        |    chr1          0         1000000
1        |    chr1          1000000   2000000
2        |    chr1          2000000   3000000
3        |    chr1          3000000   4000000
...      |    ...           ...       ...
3110     |    chrY          56000000  57000000
3111     |    chrY          57000000  58000000
3112     |    chrY          58000000  59000000
3113     |    chrY          59000000  60000000
PyRanges with 3114 rows, 3 columns, and 1 index columns.
Contains 25 chromosomes.

pyranges1.options

The pyranges1.options object is used to configure aspects of how PyRanges is represented. Below are the methods available on this object.

class pyranges1.core.options.PyRangesOptions

Bases: object

display_options() → str

Return a representation of the current options and their values.

Examples

# In the below, the console width has been set to 120 so that the doctests will return the same result no matter
# the console width.
>>> import pyranges1 as pr
>>> print(pr.options.display_options())
max_rows_to_show : 8 (the max number of rows to show in PyRanges repr)
max_column_names_to_show : 3 (how many columns listed in PyRanges repr when not all fit the screen width)
console_width : 120 (console width, affecting PyRanges representation (None for auto))
html_max_cols : 20 (max number of columns to show as HTML (e.g. Jupyter), others are hidden)
html_max_rows : None (max n. of rows shown as HTML (e.g. Jupyter). If undefined, max_rows_to_show is used)

get_option(name: str) → int

Get the value of an option.

Parameters:

name (str) – The name of the option to get.

Returns:

The value of the option.

Return type:

int

Examples

>>> import pyranges1 as pr
>>> pr.options.get_option("max_rows_to_show")
8

reset_options() → None

Reset all options to their default values.

Examples

>>> import pyranges1 as pr
>>> pr.options.get_option('max_rows_to_show')
8

>>> pr.options.set_option('max_rows_to_show', 10)
>>> pr.options.get_option('max_rows_to_show')
10

>>> pr.options.set_option('console_width', 120)
>>> pr.options.get_option('console_width')
120

>>> pr.options.reset_options()
>>> pr.options.get_option('max_rows_to_show')
8

set_option(name: str, value: int) → None

Set an option to a new value.

Run pyranges1.options.display_options() to see available options and their current values.

Parameters:

name (str) – The name of the option to set.
value (int) – The value to set the option to.

Examples

>>> import pyranges1 as pr
>>> pr.options.set_option('max_rows_to_show', 8)

pyranges1.example_data

The pyranges1.example_data object contains example data used in tests and documentation. Printing it shows an overview of available data:

>>> import pyranges1 as pr
>>> pr.example_data
Available example data:
-----------------------
example_data.chipseq            : Example ChIP-seq data.
example_data.chipseq_background : Example ChIP-seq data.
example_data.chromsizes         : Example chromsizes data (hg19).
example_data.ensembl_gtf        : Example gtf file from Ensembl.
example_data.f1                 : Example bed file.
example_data.f2                 : Example bed file.
example_data.aorta              : Example ChIP-seq data.
example_data.aorta2             : Example ChIP-seq data.
example_data.ncbi_gff           : Example NCBI GFF data.
example_data.ncbi_fasta         : Example NCBI fasta.
example_data.files              : A dict of basenames to file paths of available data.

Most of the data is in the form of PyRanges objects:

>>> pr.example_data.chipseq
index    |    Chromosome    Start      End        Name      Score    Strand
int64    |    category      int64      int64      str       int64    category
-------  ---  ------------  ---------  ---------  --------  -------  ----------
0        |    chr8          28510032   28510057   U0        0        -
1        |    chr7          107153363  107153388  U0        0        -
2        |    chr5          135821802  135821827  U0        0        -
3        |    chr14         19418999   19419024   U0        0        -
...      |    ...           ...        ...        ...       ...      ...
16       |    chr9          120803448  120803473  U0        0        +
17       |    chr6          89296757   89296782   U0        0        -
18       |    chr1          194245558  194245583  U0        0        +
19       |    chr8          57916061   57916086   U0        0        +
PyRanges with 20 rows, 6 columns, and 1 index columns.
Contains 15 chromosomes and 2 strands.

pyranges1.assistant

The pyranges1.assistant object helps you use an AI-based coding assistant with pyranges1:

>>> pr.assistant
Utilities to instruct a AI coding assistant for pyranges1 prompts.
Get a prompt to copy-paste into an AI assistant to prime it for pyranges1 coding tasks:
>>> import pyranges1 as pr
>>> pr.assistant.prompt()
Make a file with pyranges1 documentation to upload to the AI assistant:
>>> pr.assistant.export_docs("pr_docs.txt")