scmli

Single cell mutant library inspection

Background

To construct a whole-genome random mutation library efficiently, reliably, and economically, we developed a specialized experimental design and a corresponding analysis workflow.
One batch consists of 24 plates with 24 wells each, and every well represents an individual mutation experiment. For each plate and each well, we added a unique barcode to the gRNA PCR products and then performed next-generation sequencing. In parallel, we performed whole-genome sequencing on samples from each plate.
By identifying the gRNAs present in both a plate and a well, we could determine the specific gRNA in each well and assess mutations at the targeted locations, thereby validating our results.
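
The plate/well lookup described above can be pictured as a set intersection. The sketch below is only an illustration, not the logic of locate_gRNA.py: it assumes the gRNAs detected for each plate barcode and each well barcode have already been collected into sets, and the function name `locate_grnas`, the plate labels, and the well labels are all hypothetical.

```python
# Hypothetical sketch: assumes one set of detected gene IDs per plate barcode
# and one per well barcode. The real pipeline's data structures may differ.
def locate_grnas(plate_hits, well_hits):
    """Map each (plate, well) pair to the gRNAs seen under both barcodes."""
    located = {}
    for plate, plate_set in plate_hits.items():
        for well, well_set in well_hits.items():
            shared = plate_set & well_set  # gRNAs present in both pools
            if shared:
                located[(plate, well)] = sorted(shared)
    return located


# Made-up labels; the gene IDs are taken from the examples in this README.
plate_hits = {"P01": {"NO01G00240", "NO03G04750"}, "P02": {"NO08G01490"}}
well_hits = {"A1": {"NO01G00240"}, "B3": {"NO08G01490", "NO03G04750"}}

result = locate_grnas(plate_hits, well_hits)
for (plate, well), genes in sorted(result.items()):
    print(plate, well, genes)
```

A pair with more than one shared gRNA would be ambiguous under this scheme and would need further inspection.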

Install

git clone https://github.com/gongyh/scmli.git

Software Requirements

Python3 (3.9)
Biopython (1.79) (python package)
pandas (1.4.2) (python package)
lxml (4.9.1) (python package)
argparse (python package; part of the standard library since Python 3.2, so a separate install is normally not needed)
fastqc (0.11.9)
trim-galore>=0.6.0 (0.6.7)
r-base (3.6.1)
r-ggplot2 (3.3.5)
bcftools (1.15)
snippy (4.6.0 modified)

The tested versions are given in parentheses.

You can install these dependencies using Conda (Miniconda3):

conda install -c bioconda fastqc trim-galore pandas biopython lxml r-base r-ggplot2 bcftools snippy

Usage

gRNA model

scmli searches for reads that contain a target gRNA sequence. It first uses the fixed sequence (all sequenced bases before the gRNA in forward reads, without the adapter) to filter valid reads, then looks up the gene-specific gRNA sequence in the gRNAs library file. Each library sequence contains a universal part and a gene-specific part; the two numbers passed to --number (a b) locate the gene-specific part within each library sequence.
gRNAs_library.csv:
NO01G00240,ccgggtccgattcccggtgcctgcaGAGTGTGGTGGAATTTGCCGgttttagagctagaaatagcaagttaaaataag
NO01G00250,ccgggtccgattcccggtgcctgcaACACGATAGTCAAGACGCTGgttttagagctagaaatagcaagttaaaataag
...... , ......

Required: reads (FASTQ files), fixed sequence (string), gRNAs library (.csv)
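
To make the roles of the fixed sequence and the two position numbers concrete, here is a minimal sketch of the lookup. It is not the scmli implementation: `load_library` and `classify_read` are hypothetical helpers, and the sketch assumes that the full library sequence follows the fixed sequence directly in the read, so the gene-specific part sits at the same offsets (default 25 to 45) after the fixed sequence.

```python
import csv
import io


def load_library(handle, start=25, end=45):
    """Return {gene-specific sequence: gene_id} from a two-column CSV."""
    library = {}
    for row in csv.reader(handle):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        gene_id, full_seq = row[0].strip(), row[1].strip()
        # Python slices are 0-based and end-exclusive: [25:45] is bases 26..45.
        library[full_seq[start:end].upper()] = gene_id
    return library


def classify_read(read, fixed_seq, library, start=25, end=45):
    """Return a gene_id, "unknown", or None when the fixed sequence is absent."""
    read = read.upper()
    pos = read.find(fixed_seq.upper())
    if pos == -1:
        return None  # not a valid read
    grna_begin = pos + len(fixed_seq)  # assumption: gRNA starts right here
    candidate = read[grna_begin + start:grna_begin + end]
    return library.get(candidate, "unknown")


# Two library entries copied from the example above.
LIBRARY_CSV = (
    "NO01G00240,ccgggtccgattcccggtgcctgcaGAGTGTGGTGGAATTTGCCG"
    "gttttagagctagaaatagcaagttaaaataag\n"
    "NO01G00250,ccgggtccgattcccggtgcctgcaACACGATAGTCAAGACGCTG"
    "gttttagagctagaaatagcaagttaaaataag\n"
)
FIXED = "GGTAGAATTGGTCGTTGCCATCGACCAGGC"  # fixed sequence from the Test section

library = load_library(io.StringIO(LIBRARY_CSV))
full_seq = LIBRARY_CSV.splitlines()[0].split(",")[1]
mock_read = FIXED + full_seq  # synthetic read built for illustration only
print(classify_read(mock_read, FIXED, library))
```

If the reads place the gene-specific part somewhere else relative to the fixed sequence, only the offset arithmetic in `classify_read` needs to change.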

variant model

Call variants from the reads, then filter and summarize the variants that fall in the target region.
We changed some parameters in snippy; copy lib/snippy to path/bin/snippy to use the modified version.

Required: reads (FASTQ files), target (BED file), reference (.gbk)
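
The filtering step can be illustrated with a small stand-alone sketch. This is not how scmli, snippy, or bcftools perform the filtering; `parse_bed` and `variants_in_regions` are hypothetical helpers, and the contig name and coordinates are made up. The one detail worth showing is the coordinate conversion: BED intervals are 0-based and end-exclusive, while VCF positions are 1-based.

```python
def parse_bed(lines):
    """Parse BED lines into (chrom, start, end) tuples (0-based, end-exclusive)."""
    regions = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        regions.append((chrom, int(start), int(end)))
    return regions


def variants_in_regions(vcf_lines, regions):
    """Keep VCF records whose 1-based POS falls inside any BED region."""
    kept = []
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and VCF headers
        fields = line.rstrip("\n").split("\t")
        chrom, pos = fields[0], int(fields[1])
        # BED [start, end) covers 1-based positions start+1 .. end.
        if any(c == chrom and s < pos <= e for c, s, e in regions):
            kept.append(line.rstrip("\n"))
    return kept


# Made-up target region and records, for illustration only.
bed = ["chr1\t100\t200"]
vcf = [
    "#CHROM\tPOS\tID\tREF\tALT",
    "\t".join(["chr1", "100", ".", "A", "G"]),  # just before the region
    "\t".join(["chr1", "101", ".", "C", "T"]),  # first base of the region
    "\t".join(["chr1", "200", ".", "G", "A"]),  # last base of the region
    "\t".join(["chr1", "201", ".", "T", "C"]),  # just after the region
]

kept = variants_in_regions(vcf, parse_bed(bed))
for record in kept:
    print(record)
```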

Arguments

gRNA model

required arguments:
  -l LIB                            gRNAs library file
  -s SEQ                            All sequencing bases before gRNAs in forward reads without adapter
  -r1 READ1                         Read1 fastq file
  -r2 READ2                         Read2 fastq file

optional arguments:
  -h, --help                        Show help message and exit
  -t NUMBER                         Number of threads, default = 8
  --number NUMBER NUMBER            Start and end of the gene-specific region in each gRNA,
                                    default='25 45', i.e. from the 26th to the 45th base
  -n OUTPUT_NAME                    Prefix of output files, default = "my_project"
  -o OUTPUT_DIR                     Directory of output files, default = "output"
  --FASTQC_PATH                     PATH to fastqc
  --TRIM_GALORE_PATH                PATH to trim-galore

variant model

required arguments:
  -r1 READ1                         Read1 fastq file
  -r2 READ2                         Read2 fastq file
  --ref REF                         Reference file (.gbk)
  --target TARGET                   Target regions for variants (BED file)
  --dtarget TARGET2                 Smaller, selected target regions (BED file)

optional arguments:
  -h, --help                        show this help message and exit
  -t THREADS                        Number of threads
  -n OUTNAME                        Prefix of output files, default='my_project'
  -o OUTDIR                         Directory of output files, default='output'

locate_gRNA.py

Script for identifying gRNA results in experimental protocols.

Test

cd scmli
python3 scmli.py gRNA \
  -l test/NoIMET1_gRNAs.csv \
  -s GGTAGAATTGGTCGTTGCCATCGACCAGGC \
  -r1 test/test_R1.fq.gz \
  -r2 test/test_R2.fq.gz

python3 scmli.py variant \
  -r1 test/test_R1.fq.gz \
  -r2 test/test_R2.fq.gz \
  --ref test/genes.gbk \
  --target test/targets.bed \
  --dtarget test/filter.bed

Results

gRNA model

file_fastqc.html/zip: Quality control results(raw data)
file_val_1/2_fastqc.html/zip: Quality control results(clean data)
file_trimming_report.txt: Trim results
my_project.counts: Raw count result
my_project.percentage: Detailed count result

gene_id	sequence	counts	percentage	percentage_gRNAs	accumulative_unknow_percentage
NO12G02480	TCTATCTCAACAGCCACCCG	17	0.037707	0.040775	0.0
NO03G04750	ACTTCCTGGTCCTCCCACGA	17	0.037707	0.040775	0.0
NO08G01490	TGCCTCAGGAGGGATGATCG	16	0.035489	0.040775	0.0
NO02G03790	GAGAACTTTTCATCCTCGCG	16	0.035489	0.040775	0.0
.......	.......	.......	......	......	......

my_project.stats: Statistical result

Key	Value
raw_reads(paired)	50000
all_reads(clean reads,paired)	49947
valid_reads	45085
unknow_reads	3393
gRNAs_reads	41692
all_kinds	12649
lib_kinds	9709
unknow_kinds	2940
gRNAs_kinds	9368
all/raw_reads_percent	0.99894
valid/all_reads_percentage	0.902657
gRNAs/valid_reads_percentage	0.924742
gRNAs_coverage	0.964878
gRNAs_average_all	4.29416
gRNAs_average_detected	4.45047
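
The ratio rows in this table can be reproduced from the count rows. The formulas below are inferred from the sample values shown here, not taken from the scmli source, so treat them as a reading aid rather than a specification.

```python
# Count rows copied from the example my_project.stats above.
stats = {
    "raw_reads": 50000,
    "all_reads": 49947,
    "valid_reads": 45085,
    "unknow_reads": 3393,
    "gRNAs_reads": 41692,
    "all_kinds": 12649,
    "lib_kinds": 9709,
    "unknow_kinds": 2940,
    "gRNAs_kinds": 9368,
}

# Inferred relationships between the count rows.
assert stats["valid_reads"] == stats["unknow_reads"] + stats["gRNAs_reads"]
assert stats["all_kinds"] == stats["lib_kinds"] + stats["unknow_kinds"]

# Inferred formulas for the ratio rows.
derived = {
    "all/raw_reads_percent": stats["all_reads"] / stats["raw_reads"],
    "valid/all_reads_percentage": stats["valid_reads"] / stats["all_reads"],
    "gRNAs/valid_reads_percentage": stats["gRNAs_reads"] / stats["valid_reads"],
    "gRNAs_coverage": stats["gRNAs_kinds"] / stats["lib_kinds"],
    "gRNAs_average_all": stats["gRNAs_reads"] / stats["lib_kinds"],
    "gRNAs_average_detected": stats["gRNAs_reads"] / stats["gRNAs_kinds"],
}

for key, value in derived.items():
    print(f"{key}\t{value:.6g}")
```

Under this reading, gRNAs_coverage is the fraction of library gRNAs detected at least once, and the two averages are reads per gRNA over the whole library and over the detected gRNAs, respectively.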

unknow.seq: List of unknown sequences
my_project.log: Process log
reads.plot: Counts of the different kinds of reads
frequency.plot: Frequency of all gRNAs
frequency_detected.plot: Frequency of detected gRNAs
frequency_distribution.plot: Distribution of frequencies across all gRNAs
frequency_distribution_detected.plot: Distribution of frequencies across detected gRNAs
accumulative_unknow_percentage.plot: Cumulative percentage of unknown sequences

variant model

my_project_snippy_hq.vcf: Variant calling results
my_project_snippy_hq.gids: Gene IDs of the variants
target2_variant.txt: Variant information in the target region

License

MIT
