Rebinning Sediment Metagenomes - Profiling the database
##Objective Today’s goal is set up a script to profile the contigs database using the BAM files. This is an important step in the binning process as it provides coverage statistics needed to accurately place contigs into bins.
##1. Set up and check workspace
Log into cologe:
cologn
ssh 8100:localhost:8100 gc-node-2
conda activate anvio-7
cd /opt/extern/bremen/symbiosis/sogin/prj001-comp-gene-chr/
Re-display contigs database stats:
anvi-display-contigs-stats contigs.db
We have something like 12 M contigs!
##2. Set up profiling qsub, bash script
if the bam files are not indexed, you need to add this part to the script:
anvi-init-bam SAMPLE-01-RAW.bam -o SAMPLE-01.bam
This will sort and index your BAM file for sample 1, you can write a loop to repeat this for each BAM file.
otherwise, move on to anvi-profile comand:
anvi-profile -i SAMPLE-01.bam -c contigs.db --output-dir profiles --sample-name NEW-NAME
Important: New Name can not have - in name file, only _, letters etc.
Names | New Name |
---|---|
3847_A_sorted.bam.bai | Out_A |
3847_B_sorted.bam.bai | Out_B |
3847_C_sorted.bam.bai | Out_C |
3847_D_sorted.bam.bai | Edge_D |
3847_E_sorted.bam.bai | Edge_E |
3847_F_sorted.bam.bai | Edge_F |
3847_G_sorted.bam.bai | In_G |
3847_H_sorted.bam.bai | In_H |
3847_I_sorted.bam.bai | In_I |
Trouble shooting tips:
- can’t allocate memory with running on 48 cores (T set to 24), set T to way less (try, T=4) but ask for 48 cores.
#!/bin/bash
#
#$ -cwd
#$ -j y
#$ -S /bin/bash
#$ -pe smp 48
#$ -V
#$ -q main.q@@himem
#make scratch directory; change to directory
mkdir /scratch/sogin/tmp.$JOB_ID/ -p;
cd /scratch/sogin/tmp.$JOB_ID/
#copy over data
rsync -r /opt/extern/bremen/symbiosis/sogin/prj001-comp-gene-chr/ /scratch/sogin/tmp.$JOB_ID/
conda activate anvio-7
cd /scratch/sogin/tmp.$JOB_ID/02_Analysis
anvi-profile -i ../01_Data/bam/3847_A_sorted.bam -c contigs.db --sample-name Out_A -T 4 -W
rsync -r /scratch/sogin/tmp.$JOB_ID/02_Analysis/ /opt/extern/bremen/symbiosis/sogin/prj001-comp-gene-chr/02_Analysis/
## CLEAN UP ##
rm /scratch/sogin/tmp.$JOB_ID/ -R;
echo "job finished: "
date