Assembly P2


---
layout: post
title: Assembly P2
date: '2021-10-11'
categories: Analysis P2
tags: bioinformatics
---

  1. Assembly

Error-correct each set of reads with SPAdes, then assemble the corrected reads with MEGAHIT

#!/bin/bash 
#SBATCH --mail-user=esogin@ucmerced.edu  
#SBATCH --mail-type=ALL  
#SBATCH --nodes=1  
#SBATCH --ntasks=24
#SBATCH --partition std.q 
#SBATCH --mem=250G 
#SBATCH --time=24:00:00
#SBATCH --job-name=error-correction
#SBATCH --export=ALL

	
cd /home/esogin/scratch/prj001-comp-gene-chr/02_Analysis/
conda activate megahit

##----------01 Assemble Data --------------##
echo "--------Assemble Data--------"

#mkdir /home/esogin/scratch/prj001-comp-gene-chr/02_Analysis/02_assembly -p;
cd /home/esogin/scratch/prj001-comp-gene-chr/02_Analysis/01_subsampled/;

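# error-correct in SPAdes, then assemble the corrected reads in MEGAHIT
for i in *fq.gz; do
	sample="${i%.fq.gz}"

	# SPAdes takes -m as an integer number of GB (no unit suffix)
	spades.py -o ../02_assembly/spades."$sample" --12 "$i" -t 12 -m 250 --phred-offset 33 --only-error-correction

	# assemble in megahit; SPAdes writes corrected reads to the corrected/ subdirectory of its output
	# each sample gets its own -o directory because megahit stops if the output directory already exists
	# -t is kept within the 24 tasks requested from SLURM
	megahit -1 ../02_assembly/spades."$sample"/corrected/highfreq_kmers_1.00.0_0.cor.fastq.gz \
		-2 ../02_assembly/spades."$sample"/corrected/highfreq_kmers_2.00.0_0.cor.fastq.gz \
		-r ../02_assembly/spades."$sample"/corrected/highfreq_kmers__unpaired.00.0_0.cor.fastq.gz \
		-t 24 -o ../02_assembly/megahit."$sample" --out-prefix "$sample" --k-min 21 --k-max 151 --k-step 10
done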

##----------02 Copy results back to /home/data --------------##
rsync -r /home/esogin/scratch/prj001-comp-gene-chr/ /home/esogin/data/prj001-comp-gene-chr/

echo "clean up scratch"

uptime
hostname
date
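
The loop in the script derives each sample name from the input file with the `${i%.fq.gz}` expansion and then hands the SPAdes-corrected reads straight to MEGAHIT. A minimal sketch of how that step could be guarded is below, so that MEGAHIT only runs when the corrected files are actually there. The helper names `sample_name` and `all_nonempty` are hypothetical and not part of the original job script.

```shell
#!/bin/bash
# Hypothetical helpers (assumptions for illustration, not from the original script).

# Derive the sample name the same way the loop does: strip the .fq.gz suffix.
# Any leading directory is dropped first so full paths also work.
sample_name() {
    local file="${1##*/}"
    printf '%s\n' "${file%.fq.gz}"
}

# Return 0 only if every file given as an argument exists and is non-empty.
# The first missing or empty file is reported on stderr.
all_nonempty() {
    local f
    for f in "$@"; do
        [[ -s "$f" ]] || { echo "missing or empty: $f" >&2; return 1; }
    done
}
```

Inside the loop, something like `all_nonempty "$r1" "$r2" && megahit ...` (with `r1` and `r2` set to the corrected read paths) would skip the assembly for any sample whose error correction did not finish.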

Update on results: SPAdes keeps failing during error correction, likely because of memory limitations.
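
If memory is the limit, one thing to try is giving SPAdes a `-m` value slightly below what SLURM actually allocated, so the job is not killed at the boundary. The sketch below converts an allocation in MB to a whole number of GB with some headroom. The helper name `spades_mem_gb` and the 10 percent default are assumptions for illustration, not a SPAdes recommendation, and it assumes the allocation is available in MB (which is how SLURM typically reports it in `SLURM_MEM_PER_NODE`).

```shell
#!/bin/bash
# Hypothetical helper (assumption for illustration, not from the original script).
# Usage: spades_mem_gb <allocation in MB> [headroom percent, default 10]
# Prints a whole number of GB suitable for the SPAdes -m option.
spades_mem_gb() {
    local mem_mb="$1"
    local headroom_pct="${2:-10}"
    # Integer arithmetic: drop the headroom first, then convert MB to GB.
    echo $(( mem_mb * (100 - headroom_pct) / 100 / 1024 ))
}
```

For the 250G request in this job (256000 MB), `spades_mem_gb 256000` prints `225`, which could then be passed as `-m "$(spades_mem_gb "$SLURM_MEM_PER_NODE")"`. Whether this is enough to get error correction through is untested here.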

Written on October 11, 2021