Download the Swarup snATAC-seq data to a directory, for example “/snRNAseq_Swarup/Data/snATAC-seq/”. The data used here were downloaded from https://www.synapse.org/#!Synapse:syn22130833 and include a count matrix file from https://www.synapse.org/#!Synapse:syn22130800, a .csv metadata file from https://www.synapse.org/#!Synapse:syn22130802, a .csv peak file from https://www.synapse.org/#!Synapse:syn24978763, a .csv barcode file from https://www.synapse.org/#!Synapse:syn24978672, and a large fragment file from https://www.synapse.org/#!Synapse:syn25745094. This dataset was published in: Morabito S, Miyoshi E, Michael N, Shahin S, Head E, and Swarup V, Single-nuclei epigenomic and transcriptomic landscape in Alzheimer’s disease, Nature Genetics, 2021, https://doi.org/10.1038/s41588-021-00894-z.
Load necessary libraries and set up input and output locations.
suppressPackageStartupMessages({
library(Signac)
library(Seurat)
library(Rsamtools)
library(GenomeInfoDb)
library(EnsDb.Hsapiens.v86)
library(ggplot2)
library(cowplot)
library(Matrix)
library(SeuratWrappers)
library(patchwork)
library(monocle3)
library(cicero)
})
set.seed(124)
indir <- 'Data/snATAC-seq/'
outdir <- "Output/snATAC-seq/"
if(!file.exists(outdir)) dir.create(outdir, recursive = TRUE)
## Warning in dir.create(outdir, recursive = TRUE): 'Output\snATAC-seq' already exists
We use Signac to pre-process the chromatin data. Signac uses information from two related input files, both of which are created by Cell Ranger:
Peak/cell matrix. This is analogous to the gene expression count matrix used to analyze single-cell RNA-seq. However, instead of genes, each row of the matrix represents a region of the genome (a ‘peak’) that is predicted to represent a region of open chromatin. Each value in the matrix represents the number of Tn5 cut sites for each single barcode (i.e. cell) that map within each peak. You can find more detail on the 10x Genomics website.
Fragment file. This represents a full list of all unique fragments across all single cells. It is a substantially larger file, is slower to work with, and is stored on disk (instead of in memory). However, the advantage of retaining this file is that it contains all fragments associated with each single cell, as opposed to only reads that map to peaks.
Let us read in the count matrix and the sample metadata.
snATAC.Counts <- readMM("Data/snATAC-seq/snATAC_counts.mtx")
peaks <- read.csv("Data/snATAC-seq/peaks.csv", header=F,stringsAsFactors=F)
barcodes <- read.csv("Data/snATAC-seq/barcodes_atac.csv", header=F,stringsAsFactors=F)
colnames(snATAC.Counts) <- barcodes[,1]
rownames(snATAC.Counts) <- peaks[,1]
snATAC.Counts <- snATAC.Counts * 1  # coerce to a numeric sparse matrix
metadata <- read.csv("Data/snATAC-seq/snATAC_metadta.csv", header=T,row.names=1)
head(metadata, 4)
## Sample.ID Batch Sex Age Diagnosis UMAP_1 UMAP_2 cluster celltype
## AAACGAAAGAAACGCC-13 Sample-96 1 M 79 Control 4.653650 -2.973710 ODC.f ODC
## AAACGAAAGAAATGGG-11 Sample-101 2 F 74 Control 6.781233 -2.420259 ODC.k ODC
## AAACGAAAGAAATGGG-5 Sample-37 3 F 87 AD -2.631755 -6.038205 PER.END.a PER.END
## AAACGAAAGAAATTCG-13 Sample-96 1 M 79 Control -7.145253 2.478923 EX.b EX
To speed up processing, we use only one AD sample (Sample-47) and one control sample (Sample-101) in the analysis.
AD.metadata <- metadata[metadata[,1] == "Sample-47", ]
CTRL.metadata <- metadata[metadata[,1] == "Sample-101", ]
CTRL.Counts <- snATAC.Counts[, rownames(CTRL.metadata)]
AD.Counts <- snATAC.Counts[, rownames(AD.metadata)]
Next, extract fragments for Sample-47 and Sample-101 and write them into separate fragment files “Sample47_fragments.tsv” and “Sample101_fragments.tsv”. Note that the control sample is Sample-101 and the AD sample is Sample-47.
fragpath <- "Data/snATAC-seq/fragments.tsv.gz"
CTRL_fragment_file <- paste0(outdir, "Sample101_fragments.tsv")
if (!file.exists(CTRL_fragment_file))
FilterCells(fragpath, cells = rownames(CTRL.metadata), CTRL_fragment_file)
AD_fragment_file <- paste0(outdir, "Sample47_fragments.tsv")
if (!file.exists(AD_fragment_file))
FilterCells(fragpath, cells = rownames(AD.metadata), AD_fragment_file)
Obtain gene annotations from EnsDb.Hsapiens.v86
suppressWarnings(suppressMessages(annotation <- GetGRangesFromEnsDb(ensdb = EnsDb.Hsapiens.v86)))
Change chromosome style
genome(annotation) <- "hg38"
seqlevelsStyle(annotation) <- "UCSC"
# Rename chromosomes to change them to UCSC style
ChrList <- paste0("chr", seqlevels(annotation))
ChrList[25] <- "chrM"
annotation <- renameSeqlevels(annotation, ChrList)
Create scATAC-seq objects for control and AD
CTRL <- CreateChromatinAssay(counts = CTRL.Counts, sep = c(":", "-"), min.cells=10,
fragments = CTRL_fragment_file, annotation = annotation)
## Computing hash
GetTSSPositions(Annotation(CTRL))
## GRanges object with 19932 ranges and 3 metadata columns:
## seqnames ranges strand | gene_id gene_biotype gene_name
## <Rle> <IRanges> <Rle> | <character> <character> <character>
## [1] chrX 276322 + | ENSG00000182378 protein_coding PLCXD1
## [2] chrX 318819 - | ENSG00000178605 protein_coding GTPBP6
## [3] chrX 386955 - | ENSG00000167393 protein_coding PPP2R3B
## [4] chrX 624344 + | ENSG00000185960 protein_coding SHOX
## [5] chrX 1212750 - | ENSG00000205755 protein_coding CRLF2
## ... ... ... ... . ... ... ...
## [19928] chrM 10470 + | ENSG00000212907 protein_coding MT-ND4L
## [19929] chrM 10760 + | ENSG00000198886 protein_coding MT-ND4
## [19930] chrM 12337 + | ENSG00000198786 protein_coding MT-ND5
## [19931] chrM 14673 - | ENSG00000198695 protein_coding MT-ND6
## [19932] chrM 14747 + | ENSG00000198727 protein_coding MT-CYB
## -------
## seqinfo: 25 sequences from an unspecified genome; no seqlengths
CTRL.atac <- CreateSeuratObject(counts = CTRL, assay = "peaks", meta.data = CTRL.metadata, annotation = annotation)
AD <- CreateChromatinAssay(counts = AD.Counts, sep = c(":", "-"), min.cells=10,
fragments = AD_fragment_file, annotation = annotation)
## Computing hash
GetTSSPositions(Annotation(AD))
## GRanges object with 19932 ranges and 3 metadata columns:
## seqnames ranges strand | gene_id gene_biotype gene_name
## <Rle> <IRanges> <Rle> | <character> <character> <character>
## [1] chrX 276322 + | ENSG00000182378 protein_coding PLCXD1
## [2] chrX 318819 - | ENSG00000178605 protein_coding GTPBP6
## [3] chrX 386955 - | ENSG00000167393 protein_coding PPP2R3B
## [4] chrX 624344 + | ENSG00000185960 protein_coding SHOX
## [5] chrX 1212750 - | ENSG00000205755 protein_coding CRLF2
## ... ... ... ... . ... ... ...
## [19928] chrM 10470 + | ENSG00000212907 protein_coding MT-ND4L
## [19929] chrM 10760 + | ENSG00000198886 protein_coding MT-ND4
## [19930] chrM 12337 + | ENSG00000198786 protein_coding MT-ND5
## [19931] chrM 14673 - | ENSG00000198695 protein_coding MT-ND6
## [19932] chrM 14747 + | ENSG00000198727 protein_coding MT-CYB
## -------
## seqinfo: 25 sequences from an unspecified genome; no seqlengths
AD.atac <- CreateSeuratObject(counts = AD, assay = "peaks", meta.data = AD.metadata, annotation = annotation)
Compute QC metrics for each object
CTRL.atac <- TSSEnrichment(CTRL.atac)
## Extracting TSS positions
## Extracting fragments at TSSs
##
## Computing TSS enrichment score
CTRL.atac <- NucleosomeSignal(object = CTRL.atac)
AD.atac <- TSSEnrichment(AD.atac)
## Extracting TSS positions
## Extracting fragments at TSSs
##
## Computing TSS enrichment score
AD.atac <- NucleosomeSignal(object = AD.atac)
Visualization of QC metrics
p1 <- VlnPlot(
object = CTRL.atac,
features = c('nCount_peaks','nucleosome_signal', 'TSS.enrichment'),
pt.size = 0.2,
ncol = 3) + NoLegend()
p2 <- VlnPlot(
object = AD.atac,
features = c('nCount_peaks','nucleosome_signal', 'TSS.enrichment'),
pt.size = 0.2,
ncol = 3) + NoLegend()
print(p1 / p2)
The enrichment of Tn5 integration events at transcription start sites (TSSs) can also be an important quality control metric to assess the targeting of Tn5 in ATAC-seq experiments. The ENCODE consortium defined a TSS enrichment score as the number of Tn5 integration sites around the TSS normalized to the number of Tn5 integration sites in flanking regions. See the ENCODE documentation for more information about the TSS enrichment score (https://www.encodeproject.org/data-standards/terms/). We can calculate the TSS enrichment score for each cell using the TSSEnrichment function in Signac.
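As a toy illustration of the ENCODE-style definition above (a hand-made sketch with invented counts, not Signac’s actual implementation), the score can be computed from per-bin Tn5 insertion counts in a window around a TSS, normalized to the outermost flanking bins:

```r
# Hypothetical per-bin insertion counts in a window centered on a TSS;
# background is taken from the outermost (flanking) bins, per the ENCODE definition.
insertions <- c(rep(2, 5), 4, 8, 20, 8, 4, rep(2, 5))
flank_mean <- mean(insertions[c(1, length(insertions))])  # flanking background rate
enrichment <- insertions / flank_mean                     # per-bin enrichment profile
tss_score  <- max(enrichment)                             # score at the TSS-centered bin
tss_score
## 10 in this toy example: the TSS bin has 10x the background insertion rate
```

Cells with a low score (e.g. below 1 in the filtering step below) show little enrichment at TSSs and are likely low quality.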
Create a GRanges object with TSS positions
gene.ranges <- genes(EnsDb.Hsapiens.v86)
gene.ranges <- gene.ranges[gene.ranges$gene_biotype == 'protein_coding', ]
tss.ranges <- GRanges(
seqnames = seqnames(gene.ranges),
ranges = IRanges(start = start(gene.ranges), width = 2),
strand = strand(gene.ranges))
seqlevelsStyle(tss.ranges) <- 'UCSC'
tss.ranges <- keepStandardChromosomes(tss.ranges, pruning.mode = 'coarse')
CTRL.atac$high.tss <- ifelse(CTRL.atac$TSS.enrichment > 1, 'High', 'Low')
AD.atac$high.tss <- ifelse(AD.atac$TSS.enrichment > 1, 'High', 'Low')
Finally we remove cells that are outliers for these QC metrics.
CTRL.atac <- subset(x = CTRL.atac,
subset = nCount_peaks < 30000 & nCount_peaks > 1000 &
nucleosome_signal < 5 & TSS.enrichment > 1)
AD.atac <- subset(x = AD.atac,
subset = nCount_peaks < 30000 & nCount_peaks > 1000 &
nucleosome_signal < 5 & TSS.enrichment > 1)
Normalization: Signac performs term frequency-inverse document frequency (TF-IDF) normalization. This is a two-step procedure that normalizes both across cells, to correct for differences in cellular sequencing depth, and across peaks, to give higher values to rarer peaks.
Feature selection: The largely binary nature of scATAC-seq data makes it challenging to perform ‘variable’ feature selection, as we do for scRNA-seq. Instead, we can choose to use only the top n% of features (peaks) for dimensional reduction, or remove features present in fewer than n cells, with the FindTopFeatures function. Here, we will use the top 90% of features, though we note that we see very similar results when using only a small subset of features (try setting min.cutoff to ‘q75’ to use the top 25% of all peaks), with faster runtimes. Features used for dimensional reduction are automatically set as VariableFeatures for the Seurat object by this function.
Dimensional reduction: We next run a singular value decomposition (SVD) on the TF-IDF normalized matrix, using the features (peaks) selected above. This returns a low-dimensional representation of the object (for users who are more familiar with scRNA-seq, you can think of this as analogous to the output of PCA).
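To make the TF-IDF-then-SVD pipeline concrete, here is a minimal base-R sketch on a toy peak-by-cell matrix. It follows a commonly used log(TF-IDF) variant and is only an approximation of what RunTFIDF() and RunSVD() compute internally; the matrix values are invented.

```r
# Toy 3-peak x 3-cell count matrix (invented values)
counts <- matrix(c(2, 0, 1,
                   0, 3, 0,
                   1, 1, 4), nrow = 3, byrow = TRUE,
                 dimnames = list(paste0("peak", 1:3), paste0("cell", 1:3)))
tf  <- t(t(counts) / colSums(counts))      # term frequency: depth-normalize each cell
idf <- ncol(counts) / rowSums(counts > 0)  # inverse document frequency: up-weight rare peaks
tfidf <- log1p(tf * idf * 1e4)             # log-scaled TF-IDF matrix
lsi <- svd(tfidf)                          # singular value decomposition (LSI)
embeddings <- t(tfidf) %*% lsi$u           # low-dimensional cell embeddings (cells x components)
dim(embeddings)                            # 3 cells x 3 components
```

In practice the first LSI component often tracks sequencing depth, which is why the workflow below uses dims = 2:30 and drops component 1.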
CTRL.atac <- RunTFIDF(CTRL.atac)
## Performing TF-IDF normalization
CTRL.atac <- FindTopFeatures(CTRL.atac, min.cutoff = 'q10')
CTRL.atac <- RunSVD(object = CTRL.atac, assay = 'peaks', reduction.key = 'LSI_', reduction.name = 'lsi')
## Running SVD
## Scaling cell embeddings
CTRL.atac <- RunUMAP(object = CTRL.atac, reduction = 'lsi', dims = 2:30)
## Warning: The default method for RunUMAP has changed from calling Python UMAP via reticulate to the R-native UWOT using the cosine metric
## To use Python UMAP via reticulate, set umap.method to 'umap-learn' and metric to 'correlation'
## This message will be shown once per session
## 08:14:18 UMAP embedding parameters a = 0.9922 b = 1.112
## 08:14:18 Read 6377 rows and found 29 numeric columns
## 08:14:18 Using Annoy for neighbor search, n_neighbors = 30
## 08:14:18 Building Annoy index with metric = cosine, n_trees = 50
## 0% 10 20 30 40 50 60 70 80 90 100%
## [----|----|----|----|----|----|----|----|----|----|
## **************************************************|
## 08:14:19 Writing NN index file to temp file C:\Users\zhoux05\AppData\Local\Temp\Rtmp2nEliP\filea99c426e6682
## 08:14:19 Searching Annoy index using 1 thread, search_k = 3000
## 08:14:20 Annoy recall = 100%
## 08:14:21 Commencing smooth kNN distance calibration using 1 thread
## 08:14:24 Initializing from normalized Laplacian + noise
## 08:14:24 Commencing optimization for 500 epochs, with 289210 positive edges
## 08:14:42 Optimization finished
AD.atac <- RunTFIDF(AD.atac)
## Performing TF-IDF normalization
AD.atac <- FindTopFeatures(AD.atac, min.cutoff = 'q10')
AD.atac <- RunSVD(object = AD.atac, assay = 'peaks', reduction.key = 'LSI_', reduction.name = 'lsi')
## Running SVD
## Scaling cell embeddings
AD.atac <- RunUMAP(object = AD.atac, reduction = 'lsi', dims = 2:30)
## 08:15:16 UMAP embedding parameters a = 0.9922 b = 1.112
## 08:15:16 Read 5953 rows and found 29 numeric columns
## 08:15:16 Using Annoy for neighbor search, n_neighbors = 30
## 08:15:16 Building Annoy index with metric = cosine, n_trees = 50
## 0% 10 20 30 40 50 60 70 80 90 100%
## [----|----|----|----|----|----|----|----|----|----|
## **************************************************|
## 08:15:17 Writing NN index file to temp file C:\Users\zhoux05\AppData\Local\Temp\Rtmp2nEliP\filea99c46044fe3
## 08:15:17 Searching Annoy index using 1 thread, search_k = 3000
## 08:15:18 Annoy recall = 100%
## 08:15:19 Commencing smooth kNN distance calibration using 1 thread
## 08:15:21 Initializing from normalized Laplacian + noise
## 08:15:21 Commencing optimization for 500 epochs, with 260834 positive edges
## 08:15:38 Optimization finished
# Combine AD and control object for integration
AD_CTRL.combined <- merge(CTRL.atac, AD.atac)
# Re-run normalization and linear dimensional reduction on the combined data, then cluster the cells
AD_CTRL.combined <- RunTFIDF(AD_CTRL.combined)
## Performing TF-IDF normalization
AD_CTRL.combined <- FindTopFeatures(AD_CTRL.combined, min.cutoff = 'q10')
AD_CTRL.combined <- RunSVD(object = AD_CTRL.combined, assay = 'peaks', reduction.key = 'LSI_', reduction.name = 'lsi')
## Running SVD
## Scaling cell embeddings
AD_CTRL.combined <- RunUMAP(object = AD_CTRL.combined, reduction = 'lsi', dims = 2:30)
## 08:16:45 UMAP embedding parameters a = 0.9922 b = 1.112
## 08:16:45 Read 12330 rows and found 29 numeric columns
## 08:16:45 Using Annoy for neighbor search, n_neighbors = 30
## 08:16:45 Building Annoy index with metric = cosine, n_trees = 50
## 0% 10 20 30 40 50 60 70 80 90 100%
## [----|----|----|----|----|----|----|----|----|----|
## **************************************************|
## 08:16:47 Writing NN index file to temp file C:\Users\zhoux05\AppData\Local\Temp\Rtmp2nEliP\filea99c48c14193
## 08:16:47 Searching Annoy index using 1 thread, search_k = 3000
## 08:16:50 Annoy recall = 100%
## 08:16:51 Commencing smooth kNN distance calibration using 1 thread
## 08:16:54 Initializing from normalized Laplacian + noise
## 08:16:54 Commencing optimization for 200 epochs, with 535006 positive edges
## 08:17:07 Optimization finished
AD_CTRL.combined <- FindNeighbors(AD_CTRL.combined, reduction = "lsi", dims = 2:50)
## Computing nearest neighbor graph
## Computing SNN
AD_CTRL.combined <- FindClusters(AD_CTRL.combined, resolution = 0.5, algorithm = 3)
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 12330
## Number of edges: 635980
##
## Running smart local moving algorithm...
## Maximum modularity in 10 random starts: 0.9086
## Number of communities: 15
## Elapsed time: 9 seconds
## 4 singletons identified. 11 final clusters.
Visualization of cell clusters
p1 <- DimPlot(AD_CTRL.combined, reduction = "umap", group.by = "Diagnosis", pt.size=0.05, cols = c("deepskyblue","red"))
p2 <- DimPlot(AD_CTRL.combined, reduction = "umap", label = TRUE)
print(plot_grid(p1, p2))
To visualize the two conditions side-by-side, we can use the split.by argument to show each condition colored by cluster.
p1 <- DimPlot(AD_CTRL.combined, reduction = "umap", split.by = "Diagnosis", label = TRUE)
print(p1)
To find integration anchors between the two datasets, we need to project them into a shared low-dimensional space. To do this, we’ll use reciprocal LSI projection (projecting each dataset into the other’s LSI space) by setting reduction=“rlsi”. For more information about the data integration methods in Seurat, see our recent paper and the Seurat website. Rather than integrating the normalized data matrix, as is typically done for scRNA-seq data, we’ll integrate the low-dimensional cell embeddings (the LSI coordinates) across the datasets using the IntegrateEmbeddings() function. This is much better suited to scATAC-seq data, as we typically have a very sparse matrix with a large number of features. Note that this requires that we first compute an uncorrected LSI embedding using the merged dataset (as we did above).
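The core idea behind projecting one dataset into another’s LSI space can be sketched in a few lines of base R (this is a simplified illustration with simulated counts, not Seurat’s actual rlsi implementation): learn an SVD basis from one dataset, then project the cells of the other onto that same basis so both sets of cells live in a comparable low-dimensional space.

```r
set.seed(1)
ref   <- matrix(rpois(200, 2), nrow = 10)  # 10 peaks x 20 reference cells (simulated)
query <- matrix(rpois(100, 2), nrow = 10)  # 10 peaks x 10 query cells (simulated)
dec <- svd(ref, nu = 5, nv = 5)            # SVD basis learned from the reference
ref_emb   <- t(ref)   %*% dec$u            # reference cells in the reference LSI space
query_emb <- t(query) %*% dec$u            # query cells projected onto the same basis
dim(query_emb)                             # 10 query cells x 5 components
```

Reciprocal projection repeats this in both directions, and anchors are then found between cells that are mutual nearest neighbors in the shared space.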
# find integration anchors
integration.anchors <- FindIntegrationAnchors(object.list = list(CTRL.atac, AD.atac),
anchor.features = rownames(CTRL.atac), reduction = "rlsi", dims = 2:30)
## Computing within dataset neighborhoods
## Finding all pairwise anchors
## Warning: No filtering performed if passing to data rather than counts
## Projecting new data onto SVD
## Projecting new data onto SVD
## Finding neighborhoods
## Finding anchors
## Found 1224 anchors
# integrate LSI embeddings
AD_CTRL.integrated <- IntegrateEmbeddings(anchorset = integration.anchors,reductions = AD_CTRL.combined[["lsi"]],
new.reduction.name = "integrated_lsi",dims.to.integrate = 1:30)
## Merging dataset 2 into 1
## Extracting anchors for merged samples
## Finding integration vectors
## Finding integration vector weights
## Integrating data
# create a new UMAP using the integrated embeddings, re-cluster cells
AD_CTRL.integrated <- RunUMAP(AD_CTRL.integrated, reduction = "integrated_lsi", dims = 2:30)
## 08:18:24 UMAP embedding parameters a = 0.9922 b = 1.112
## 08:18:24 Read 12330 rows and found 29 numeric columns
## 08:18:24 Using Annoy for neighbor search, n_neighbors = 30
## 08:18:24 Building Annoy index with metric = cosine, n_trees = 50
## 0% 10 20 30 40 50 60 70 80 90 100%
## [----|----|----|----|----|----|----|----|----|----|
## **************************************************|
## 08:18:26 Writing NN index file to temp file C:\Users\zhoux05\AppData\Local\Temp\Rtmp2nEliP\filea99c2d024d38
## 08:18:26 Searching Annoy index using 1 thread, search_k = 3000
## 08:18:29 Annoy recall = 100%
## 08:18:31 Commencing smooth kNN distance calibration using 1 thread
## 08:18:33 Initializing from normalized Laplacian + noise
## 08:18:33 Commencing optimization for 200 epochs, with 539742 positive edges
## 08:18:46 Optimization finished
AD_CTRL.integrated <- FindNeighbors(AD_CTRL.integrated, reduction = "integrated_lsi", dims = 2:50)
## Computing nearest neighbor graph
## Computing SNN
AD_CTRL.integrated <- FindClusters(AD_CTRL.integrated, resolution = 0.5, algorithm = 3)
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 12330
## Number of edges: 641729
##
## Running smart local moving algorithm...
## Maximum modularity in 10 random starts: 0.9045
## Number of communities: 17
## Elapsed time: 10 seconds
## 6 singletons identified. 11 final clusters.
Visualization of integrated cell clusters
p1 <- DimPlot(AD_CTRL.integrated, reduction = "umap", group.by = "Diagnosis", pt.size=0.05, cols = c("deepskyblue","red"))
p2 <- DimPlot(AD_CTRL.integrated, reduction = "umap", label = TRUE)
print(plot_grid(p1, p2))
Visualize the integrated cell clusters side-by-side
p1 <- DimPlot(AD_CTRL.integrated, reduction = "umap", split.by = "Diagnosis", label = TRUE)
print(p1)
Add the gene activity matrix to the Seurat object as a new assay, and normalize it
gene.activities <- GeneActivity(AD_CTRL.integrated)
## Extracting gene coordinates
## Extracting reads overlapping genomic regions
## Extracting reads overlapping genomic regions
# add to the Seurat object as a new assay
AD_CTRL.integrated[['gene.activities']] <- CreateAssayObject(counts = gene.activities)
## Warning: Keys should be one or more alphanumeric characters followed by an underscore, setting key from gene.activities_ to geneactivities_
AD_CTRL.integrated <- NormalizeData(object = AD_CTRL.integrated, assay = 'gene.activities',
normalization.method = 'LogNormalize', scale.factor = median(AD_CTRL.integrated$nCount_gene.activities))
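To show what the NormalizeData() call above computes, here is a hand-rolled LogNormalize on a toy gene-activity matrix (invented values): depth-normalize each cell, multiply by a scale factor (here the median total count per cell, as used above), then apply log1p.

```r
# Toy 2-gene x 3-cell activity matrix (invented values)
activity <- matrix(c(4, 0, 6,
                     2, 8, 2), nrow = 2, byrow = TRUE,
                   dimnames = list(c("geneA", "geneB"), paste0("cell", 1:3)))
scale.factor <- median(colSums(activity))  # median nCount across cells
normalized <- log1p(t(t(activity) / colSums(activity)) * scale.factor)
normalized["geneA", "cell2"]               # 0: zero counts stay zero after log1p
```

Using the median count as the scale factor keeps the normalized gene-activity values on a scale comparable to a typical cell’s depth.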
Plot gene activities for the cell marker genes
DefaultAssay(AD_CTRL.integrated) <- 'gene.activities'
p1 <- FeaturePlot(object = AD_CTRL.integrated,
features = c('GFAP', 'AQP4', 'SLC1A2', 'SNAP25','SATB2','GAD1','MOBP','MBP','MOG'),
pt.size = 0.1,max.cutoff = 'q95',ncol = 3) #
print(p1)
Identify markers for each cluster. We identify gene-activity markers for each cluster that are conserved across conditions; cell types can then be identified using these conserved markers.
DefaultAssay(AD_CTRL.integrated) <- 'gene.activities'
uniq.clusters <- unique(Idents(AD_CTRL.integrated))
for (i in uniq.clusters) {
CLS_peaks <- FindConservedMarkers(AD_CTRL.integrated, ident.1 = i, min.cells.group=1,
grouping.var = "Diagnosis", assay="gene.activities", verbose = FALSE)
write.table(data.frame(Gene=rownames(CLS_peaks),CLS_peaks),file=paste("Cluster",i,"AD_CTRL.Conserved.Gene_Activity.txt",sep="."),
sep="\t",row.names=F)
}
Rename Cell Types
AD_CTRL.integrated <- RenameIdents(AD_CTRL.integrated, `0` = "Oligodendrocyte 1", `1` = "Excitatory neuron 1",
`2` = "Astrocyte 1",`3` = "Microglia", `4` = "Excitatory neuron 2", `5` = "Inhibitory neuron 1",
`6` = "Astrocyte 2", `7` = "Inhibitory neuron 2", `8` = "Pericyte", `9` = "Oligodendrocyte 2",
`10` = "Excitatory neuron 3")
# assign cell type for each cell
AD_CTRL.integrated$CellType <- Idents(AD_CTRL.integrated)
AD_CTRL.integrated$celltype.Diag <- paste(Idents(AD_CTRL.integrated), AD_CTRL.integrated$Diagnosis, sep = "_")
Visualization with Cell type annotation
p1 <- DimPlot(AD_CTRL.integrated, label = FALSE)
print(p1)
Create the Cicero object to construct co-accessible networks. We can find cis-co-accessible networks (CCANs) using Cicero. The Cicero developers have developed a separate branch of the package that works with a Monocle 3 CellDataSet object. We will first make sure this branch is installed, then convert our Seurat object to CellDataSet format.
DefaultAssay(AD_CTRL.integrated) <- 'peaks'
AD_CTRL.cds <- as.cell_data_set(x = AD_CTRL.integrated)
## Warning: Monocle 3 trajectories require cluster partitions, which Seurat does not calculate. Please run 'cluster_cells' on your
## cell_data_set object
# AD_CTRL.cds <- cluster_cells(AD_CTRL.cds, resolution=1e-5)
# remove objects to release memory
rm(AD_CTRL.combined,snATAC.Counts,AD.Counts,AD.atac,AD,CTRL,integration.anchors,gene.activities,CTRL.atac,CTRL.Counts,peaks)
AD_CTRL.cicero <- make_cicero_cds(AD_CTRL.cds, reduced_coordinates = reducedDims(AD_CTRL.cds)$UMAP)
## Overlap QC metrics:
## Cells per bin: 50
## Maximum shared cells bin-bin: 44
## Mean shared cells bin-bin: 0.203288856584117
## Median shared cells bin-bin: 0
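The binning that make_cicero_cds() reports above can be sketched roughly in base R (a simplified illustration of the idea, with simulated data, not Cicero’s actual code): sample cells in the reduced space and aggregate each sampled cell with its k nearest neighbors into pseudo-bulk bins that are much less sparse than single cells.

```r
set.seed(1)
umap   <- matrix(rnorm(40), ncol = 2)   # 20 cells x 2 UMAP coordinates (simulated)
counts <- matrix(rpois(100, 1), nrow = 5)  # 5 peaks x 20 cells (simulated)
k <- 5
centers <- sample(nrow(umap), 4)        # pick a few cells as bin centers
bins <- sapply(centers, function(i) {
  # Euclidean distance from center i to every cell in the reduced space
  d  <- sqrt(rowSums((umap - matrix(umap[i, ], nrow(umap), 2, byrow = TRUE))^2))
  nn <- order(d)[seq_len(k)]            # the center cell plus its nearest neighbors
  rowSums(counts[, nn])                 # aggregate peak counts over the bin
})
dim(bins)                               # 5 peaks x 4 aggregated bins
```

Cicero then estimates co-accessibility between peak pairs from these aggregated bins rather than from sparse single-cell profiles.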
We’ll demonstrate running Cicero here using just one chromosome to save some time, but the same workflow can be applied to find CCANs for the whole genome. Here we demonstrate the most basic workflow for running Cicero. This workflow can be broken down into several steps, each with parameters that can be changed from their defaults to fine-tune the Cicero algorithm depending on your data.
# get the chromosome sizes from the gene annotation
genome <- seqlengths(annotation)
# keep a single chromosome (the second seqlevel) to speed up this demonstration
genome <- genome[2]
# convert the chromosome size to a data frame for run_cicero
genome.df <- data.frame("chromosome" = names(genome), "length" = genome)
# run cicero and find cis-co-accessible networks (CCANs)
conns <- run_cicero(AD_CTRL.cicero, genomic_coords = genome.df, sample_num = 3)
## [1] "Starting Cicero"
## [1] "Calculating distance_parameter value"
## [1] "Running models"
## [1] "Assembling connections"
## [1] "Successful cicero models: 250"
## [1] "Other models: "
##
## Zero or one element in range
## 8
## [1] "Models with errors: 0"
## [1] "Done"
ccans <- generate_ccans(conns)
## [1] "Coaccessibility cutoff used: 0.1"
Add links to the integrated Seurat object. We can add the co-accessible links found by Cicero to the ChromatinAssay object in Seurat. Using the ConnectionsToLinks() function in Signac, we can convert the outputs of Cicero to the format needed for the links slot in the ChromatinAssay, and add them to the object using the Links<- assignment function.
links <- ConnectionsToLinks(conns = conns, ccans = ccans)
Links(AD_CTRL.integrated) <- links
p1 <- CoveragePlot(AD_CTRL.integrated, region = "chr20-10198829-10327420")
print(p1)
sessionInfo()
## R version 4.1.1 (2021-08-10)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 17134)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C LC_TIME=English_United States.1252
##
## attached base packages:
## [1] grid stats4 parallel stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] cicero_1.3.4.11 Gviz_1.36.2 monocle3_1.0.0 SingleCellExperiment_1.14.1
## [5] SummarizedExperiment_1.22.0 MatrixGenerics_1.4.3 matrixStats_0.61.0 patchwork_1.1.1
## [9] SeuratWrappers_0.3.0 Matrix_1.3-4 cowplot_1.1.1 ggplot2_3.3.5
## [13] EnsDb.Hsapiens.v86_2.99.0 ensembldb_2.16.4 AnnotationFilter_1.16.0 GenomicFeatures_1.44.0
## [17] AnnotationDbi_1.54.1 Biobase_2.52.0 Rsamtools_2.8.0 Biostrings_2.60.2
## [21] XVector_0.32.0 GenomicRanges_1.44.0 GenomeInfoDb_1.28.1 IRanges_2.26.0
## [25] S4Vectors_0.30.0 BiocGenerics_0.38.0 SeuratObject_4.0.2 Seurat_4.0.3
## [29] Signac_1.3.0 rmarkdown_2.9
##
## loaded via a namespace (and not attached):
## [1] rappdirs_0.3.3 SnowballC_0.7.0 rtracklayer_1.52.0 scattermore_0.7 R.methodsS3_1.8.1
## [6] tidyr_1.1.3 bit64_4.0.5 knitr_1.33 multcomp_1.4-17 irlba_2.3.3
## [11] DelayedArray_0.18.0 R.utils_2.11.0 data.table_1.14.0 rpart_4.1-15 KEGGREST_1.32.0
## [16] RCurl_1.98-1.3 generics_0.1.0 metap_1.4 TH.data_1.1-0 RSQLite_2.2.7
## [21] RANN_2.6.1 VGAM_1.1-5 future_1.21.0 bit_4.0.4 mutoss_0.1-12
## [26] spatstat.data_2.1-0 xml2_1.3.2 httpuv_1.6.1 assertthat_0.2.1 viridis_0.6.1
## [31] xfun_0.25 hms_1.1.0 jquerylib_0.1.4 evaluate_0.14 promises_1.2.0.1
## [36] fansi_0.5.0 restfulr_0.0.13 progress_1.2.2 dbplyr_2.1.1 tmvnsim_1.0-2
## [41] igraph_1.2.6 DBI_1.1.1 htmlwidgets_1.5.3 sparsesvd_0.2 spatstat.geom_2.2-2
## [46] purrr_0.3.4 ellipsis_0.3.2 RSpectra_0.16-0 dplyr_1.0.7 backports_1.4.0
## [51] biomaRt_2.48.2 deldir_0.2-10 vctrs_0.3.8 remotes_2.3.0 ROCR_1.0-11
## [56] abind_1.4-5 cachem_1.0.5 withr_2.4.2 ggforce_0.3.3 BSgenome_1.60.0
## [61] checkmate_2.0.0 sctransform_0.3.2 GenomicAlignments_1.28.0 prettyunits_1.1.1 mnormt_2.0.2
## [66] goftest_1.2-2 cluster_2.1.2 lazyeval_0.2.2 crayon_1.4.1 pkgconfig_2.0.3
## [71] slam_0.1-48 labeling_0.4.2 tweenr_1.0.2 nlme_3.1-152 ProtGenerics_1.24.0
## [76] nnet_7.3-16 rlang_0.4.11 globals_0.14.0 lifecycle_1.0.0 miniUI_0.1.1.1
## [81] sandwich_3.0-1 filelock_1.0.2 BiocFileCache_2.0.0 mathjaxr_1.4-0 rsvd_1.0.5
## [86] dichromat_2.0-0 polyclip_1.10-0 lmtest_0.9-38 ggseqlogo_0.1 zoo_1.8-9
## [91] base64enc_0.1-3 ggridges_0.5.3 png_0.1-7 viridisLite_0.4.0 rjson_0.2.20
## [96] bitops_1.0-7 R.oo_1.24.0 KernSmooth_2.23-20 blob_1.2.2 stringr_1.4.0
## [101] parallelly_1.27.0 jpeg_0.1-9 scales_1.1.1 memoise_2.0.0 magrittr_2.0.1
## [106] plyr_1.8.6 ica_1.0-2 zlibbioc_1.38.0 compiler_4.1.1 BiocIO_1.2.0
## [111] RColorBrewer_1.1-2 plotrix_3.8-2 fitdistrplus_1.1-5 listenv_0.8.0 pbapply_1.4-3
## [116] htmlTable_2.3.0 Formula_1.2-4 MASS_7.3-54 mgcv_1.8-36 tidyselect_1.1.1
## [121] stringi_1.7.3 highr_0.9 yaml_2.2.1 latticeExtra_0.6-29 ggrepel_0.9.1
## [126] sass_0.4.0 VariantAnnotation_1.38.0 fastmatch_1.1-3 tools_4.1.1 future.apply_1.8.1
## [131] rstudioapi_0.13 foreign_0.8-81 lsa_0.73.2 gridExtra_2.3 farver_2.1.0
## [136] Rtsne_0.15 digest_0.6.27 BiocManager_1.30.16 FNN_1.1.3 shiny_1.6.0
## [141] qlcMatrix_0.9.7 Rcpp_1.0.7 later_1.2.0 RcppAnnoy_0.0.19 httr_1.4.2
## [146] biovizBase_1.40.0 Rdpack_2.1.2 colorspace_2.0-2 XML_3.99-0.6 tensor_1.5
## [151] reticulate_1.20 splines_4.1.1 uwot_0.1.10 RcppRoll_0.3.0 sn_2.0.1
## [156] spatstat.utils_2.2-0 multtest_2.48.0 plotly_4.9.4.1 xtable_1.8-4 jsonlite_1.7.2
## [161] glasso_1.11 R6_2.5.0 TFisher_0.2.0 Hmisc_4.6-0 pillar_1.6.2
## [166] htmltools_0.5.1.1 mime_0.11 glue_1.4.2 fastmap_1.1.0 BiocParallel_1.26.1
## [171] codetools_0.2-18 mvtnorm_1.1-3 utf8_1.2.2 lattice_0.20-44 bslib_0.2.5.1
## [176] spatstat.sparse_2.0-0 tibble_3.1.3 numDeriv_2016.8-1.1 curl_4.3.2 leiden_0.3.9
## [181] limma_3.48.3 survival_3.2-12 docopt_0.7.1 munsell_0.5.0 GenomeInfoDbData_1.2.6
## [186] reshape2_1.4.4 gtable_0.3.0 rbibutils_2.2.4 spatstat.core_2.3-0