[Genomics] Software for identifying centromeres and telomeres

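I have been off form lately. Twice in a row I only thought a task halfway through, so doing it amounted to not doing it at all, and both times I ended up with the worst outcome I had imagined. Getting the best result means redoing the work, and time spent on an unusable result is time wasted. I was multitasking and threw away a whole afternoon today; the busier I get, the more mistakes I make. Life is hard. Genome analysis is highly case-specific, not an assembly line: the faster you go, the more problems and the more rework you get. On a single project I ended up running gene prediction twice and Hi-C twice. Stay strong!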

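(1) Telomere detection software (being updated)

0. Telomere database

https://telomerase.asu.edu/sequences_telomere.html (I can no longer open it here, though it still opens from outside the firewall. Plants generally have a 7-base repeat such as AAACCCT, and telomere repeats can differ between chromosomes, as in the ginseng T2T assembly: https://doi.org/10.1093/hr/uhae107)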
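Since a telomere is a short tandem repeat (often a 7-base unit such as AAACCCT in plants), a rough way to guess the unit for a new species is to count k-mers at a chromosome end and merge all rotations and the reverse complement into one canonical form. The sketch below only illustrates that idea; the function names and the default `k=7` are my own assumptions, and it is not how tidk or any other tool listed here is implemented.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical_unit(unit: str) -> str:
    """Smallest string among all rotations of the unit and of its reverse complement."""
    candidates = []
    for s in (unit, reverse_complement(unit)):
        candidates.extend(s[i:] + s[:i] for i in range(len(s)))
    return min(candidates)


def top_repeat_unit(end_seq: str, k: int = 7):
    """Return (canonical k-mer, count) for the most frequent k-mer in end_seq.

    Returns ("", 0) if the sequence holds no k-mer made only of A/C/G/T.
    """
    seq = end_seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):  # skip k-mers containing N or other symbols
            counts[canonical_unit(kmer)] += 1
    if not counts:
        return ("", 0)
    return counts.most_common(1)[0]


if __name__ == "__main__":
    # Toy chromosome end: 20 copies of CCCTAAA followed by unrelated sequence.
    print(top_repeat_unit("CCCTAAA" * 20 + "ACGTTGCA"))
```

Because rotations and both strands collapse to one key, CCCTAAA, TTTAGGG and AAACCCT are all reported as the same unit, which makes results from the two chromosome ends directly comparable.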

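| Software | Download | Time | Need | Note |
| --- | --- | --- | --- | --- |
| FindTelomeres | https://github.com/JanaSperschneider/FindTelomeres | | Runs on the genome alone; a genome plus gff3 mode is possible if the script is modified to accept the gff input | The repeat unit can be changed by editing the script |
| tidk (telomeric-identifier) | https://github.com/tolkit/telomeric-identifier | | Genome | |
| quarTeT | https://github.com/aaranyue/quarTeT http://www.atcgn.com:8080/quarTeT/home.html | 2023 | Genome; calls tidk (telomeric-identifier); can fill gaps and identify telomeres and centromeres | https://doi.org/10.1093/hr/uhad127 |
| Search by telomere sequence | | | | Search with seqtk for, e.g., CCCATTT at the 5′ end and TTTAGGG at the 3′ end |
| VGP | https://github.com/VGP/vgp-assembly | | | Blueberry T2T, HR: https://doi.org/10.1093/hr/uhad209 |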

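In summary

1. The telomere sequences at the two ends of a chromosome should be reverse complements of each other. Watch for this at the chromosome ends when curating the Hi-C scaffolding: in practice, a very short terminal piece can get attached in the wrong orientation, so extract roughly 50 kb from the head and the tail of each chromosome and inspect it.

2. Telomere regions are long in some species and short in others, and their assembled length also depends somewhat on assembly quality. Telomeres of kb scale or longer are generally regarded as good, although I have not seen a formal standard.

3. Different chromosomes can carry different telomere repeats.

4. Some species have unusual telomeres.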
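The check in point 1 can be sketched in a few lines: take the first and last 50 kb of a chromosome, count the repeat unit and its reverse complement in each window, and see whether the two ends are dominated by opposite orientations. This is a minimal sketch under my own assumptions: the function names, the `min_copies=10` cut-off and the default motif AAACCCT are illustrative, and the sequence is assumed to be already loaded as a Python string.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def telomere_end_report(chrom_seq: str, motif: str = "AAACCCT", window: int = 50_000) -> dict:
    """Count the motif and its reverse complement in the head and tail windows."""
    motif = motif.upper()
    rc = reverse_complement(motif)
    head = chrom_seq[:window].upper()
    tail = chrom_seq[-window:].upper()
    # str.count counts non-overlapping matches, which suits a tandem array.
    return {
        "head_forward": head.count(motif),
        "head_revcomp": head.count(rc),
        "tail_forward": tail.count(motif),
        "tail_revcomp": tail.count(rc),
    }


def ends_look_consistent(report: dict, min_copies: int = 10) -> bool:
    """True if one end is dominated by the motif and the other by its reverse complement."""
    head_fwd = report["head_forward"] >= min_copies and report["head_forward"] > report["head_revcomp"]
    head_rev = report["head_revcomp"] >= min_copies and report["head_revcomp"] > report["head_forward"]
    tail_fwd = report["tail_forward"] >= min_copies and report["tail_forward"] > report["tail_revcomp"]
    tail_rev = report["tail_revcomp"] >= min_copies and report["tail_revcomp"] > report["tail_forward"]
    return (head_fwd and tail_rev) or (head_rev and tail_fwd)


if __name__ == "__main__":
    # Toy chromosome: C-rich repeats at the head, G-rich repeats at the tail.
    toy = "CCCTAAA" * 30 + "ACGT" * 100 + "TTTAGGG" * 30
    report = telomere_end_report(toy, window=300)
    print(report, ends_look_consistent(report))
```

If both ends show the same orientation, or one end shows almost no copies, that end is worth a second look in the Hi-C heatmap before concluding anything about the assembly.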

Telomere extension

1. Telomere extensions were conducted using minimap2 (v2.24), medaka consensus (v1.7.2; https://github.com/nanoporetech/medaka) and blastn (v2.11.0+; ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/). Source: https://www.nature.com/articles/s41597-025-04793-4

2. teloclip: https://github.com/Adamtaranto/teloclip
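As far as I understand it, read-based telomere extension relies on long reads whose alignments are soft-clipped past a contig end, because the clipped part may hold telomere sequence missing from the assembly. The sketch below shows only that selection step, using the SAM POS and CIGAR fields; the function names and the `min_clip=50` threshold are my own assumptions, hard clips are ignored, and this is not teloclip's actual code.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar: str):
    """Split a CIGAR string into (length, operation) pairs."""
    return [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]


def reference_span(ops) -> int:
    """Number of reference bases covered by the alignment."""
    return sum(n for n, op in ops if op in "MDN=X")


def overhang(pos: int, cigar: str, contig_len: int, min_clip: int = 50):
    """Return 'left', 'right' or None for one alignment.

    pos is the 1-based leftmost aligned reference position (SAM POS).
    An alignment overhangs when its soft clip would extend beyond the contig end.
    """
    ops = parse_cigar(cigar)
    if not ops:
        return None
    first_n, first_op = ops[0]
    last_n, last_op = ops[-1]
    if first_op == "S" and first_n >= min_clip and pos - first_n < 1:
        return "left"
    end = pos + reference_span(ops) - 1  # 1-based last aligned reference position
    if last_op == "S" and last_n >= min_clip and end + last_n > contig_len:
        return "right"
    return None


if __name__ == "__main__":
    print(overhang(1, "120S500M", 10_000))
    print(overhang(9_501, "500M80S", 10_000))
    print(overhang(5_000, "100S500M", 10_000))
```

Reads selected this way would still need their clipped sequence checked for the telomere repeat before being used to extend a contig.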

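(2) Centromere detection software (to be updated)

1. Centromics:

https://github.com/ShuaiNIEgithub/Centromics. Needs the genome; does not need an annotation file or IGV.

2. Telomeres_and_Centromeres

https://github.com/Immortal2333/Telomeres_and_Centromeres. This one requires inspecting the results and annotation file in IGV.

3. HiCAT:

https://github.com/865699871/HiCAT. Needs a reference centromere region and the genome.

4. quarTeT

https://github.com/aaranyue/quarTeT or http://www.atcgn.com:8080/quarTeT/home.html. Calls tidk (telomeric-identifier); can fill gaps and identify telomeres and centromeres.

5. Tandem Repeats Finder (TRF)

https://link.zhihu.com/?target=http%3A//tandem.bu.edu/trf/trf.html Papers generally do not seem to describe exactly how they used it.

6. TRASH

https://github.com/vlothec/TRASH

7. CentIER

On 8 August 2024, Pan Weihua's team at the Agricultural Genomics Institute, Chinese Academy of Agricultural Sciences, published CentIER in Plant Communications. Its accuracy metrics are reported to be more than 20% higher than those of comparable tools. The source code and test files can be downloaded from GitHub (https://github.com/simon19891216/CentIER/releases/tag/CentIERv2.0). Users who have trouble reaching GitHub can use https://gitee.com/SimonX19891216/CentIER instead.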

(3) Software usage

1. Centromics installation and usage

git clone --recurse-submodules https://github.com/zhangrengang/Centromics

cd Centromics

# install

conda env create -f Centromics.yaml

conda activate RepCent

./install.sh

# start

cd example_data

# long reads

centromics -l hifi.fq.gz -g ref.fa

# long reads + HiC data + ChIP data

centromics -l hifi.fq.gz -g ref.fa -pre hifi -chip chip.bam -hic merged_nodups.hic

centromics -l ont*.fq.gz -g ref.fa -pre ont -chip chip.bam -hic merged_nodups.hic

centromics -l ccs.fq.gz/ont.fq.gz/ccs.fq -g genome.fa -pre out -outdir ./ -tmpdir {}.tmp -ncpu 10 -min_ratio 0.03

/share/nas1/yangp/01.software/anaconda3/envs/RepCent/bin/centromics -h

usage: centromics [-h] [-g FILE] -l FILE [FILE ...] [-hic FILE] [-chip FILE] [-pre STR] [-o DIR]
                  [-tmpdir DIR] [-subsample_x INT] [-subsample_n INT] [-trf_opts STR]
                  [-min_cov FLOAT] [-min_len INT] [-min_monomer_len INT] [-clust_opts STR]
                  [-min_ratio FLOAT] [-window_size INT] [-chr_prefix STR] [-p INT] [-cleanup]
                  [-overwrite] [-v]

Cluster Repeat Sequences.

optional arguments:
  -h, --help            show this help message and exit

Input:
  -g FILE, -genome FILE
                        Genome FASTA file
  -l FILE [FILE ...], -long FILE [FILE ...]
                        Long whole-genome-shotgun reads such as PacBio CCS/CLR or ONT reads in fastq
                        or fasta format [required]
  -hic FILE             Hi-C data alignments by juicer
  -chip FILE            ChIP data alignments in bam format (sorted)

Output:
  -pre STR, -prefix STR
                        Prefix for output [default=centomics]
  -o DIR, -outdir DIR   Output directory [default=cent-output]
  -tmpdir DIR           Temporary directory [default=tmp]

Kmer matrix:
  -subsample_x INT      Subsample long reads up to X depth (prior to `-subsample_n`) [default=5]
  -subsample_n INT      Subsample long reads up to N reads [default=100000]
  -trf_opts STR         TRF options to identify tandem repeats on a read [default='1 1 2 80 5 200
                        2000 -d -h']
  -min_cov FLOAT        Minimum coverage of tandem repeats for a read [default=0.9]
  -min_len INT          Minimum length of tandem repeats for a read [default=100]
  -min_monomer_len INT  Minimum monomer length of a tandem repeat [default=1]
  -clust_opts STR       REPclust options to cluster tandem repeat units [default='-m jaccard -k 15
                        -c 0.2 -x 2 -I 2']
  -min_ratio FLOAT      Minimum relative mass ratio to filter tandem repeats [default=0.1]

Circos:
  Options for circos plot

  -window_size INT      Window size (bp) for circos plot [default=50000]
  -chr_prefix STR       match chromosome to only plot chromosomes [default="chr[\dXYZW]+"]

Other options:
  -p INT, -ncpu INT     Maximum number of processors to use [default=160]
  -cleanup              Remove the temporary directory [default=False]
  -overwrite            Overwrite even if check point files existed [default=False]
  -v, -version          show program's version number and exit

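Result files (if ONT data are available, prefer them; otherwise use CCS data):

*.candidate_peaks.bed # candidate centromere regions

out.circos_legend.pdf # which TRF type each colour in out.circos.png represents

out.circos_legend.txt # meaning of the two rings in out.circos.png

out.circos.pdf # density plot of the different TRF types

out.circos.png

Centromics.txt # counts of the different TRF types

out.trf.count # counts of the different TRF types per bin

data/genome_karyotype.txt # karyotype file

data/tr_density.txt # plotting data for the circos plot: counts of the different TRF types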

https://github.com/zhangrengang/Centromics/issues/6

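Literature sources:

Centromere:

Note: the methods all work on much the same principle. Centromere identification is essentially built on TRF results, and telomere identification comes down to finding the repeat unit, so the tools differ only in detail.

Observed results: there is currently no direct definition of telomere or centromere completeness. In my tests the assembled telomere length depends on the amount of data: the deeper the sequencing, the better, and the longer the assembled telomere may be, with the head and tail being reverse complements. Centromeres probably rely more on experimental evidence and published results, and should be validated by combining the output of several tools with the Hi-C heatmap.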
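To make the "centromere calls are built on TRF results" point concrete, here is a minimal sketch that bins tandem-repeat intervals into fixed windows and reports the densest window per chromosome. The input is assumed to be a list of (chrom, start, end) tuples in 0-based half-open coordinates already parsed from a TRF-style result; the function names are my own, overlapping intervals are counted twice, and none of the tools above is claimed to work exactly this way.

```python
from collections import defaultdict


def repeat_density(intervals, window: int = 50_000):
    """Sum tandem-repeat bases per window.

    intervals: iterable of (chrom, start, end), 0-based half-open.
    Returns {chrom: {window_index: covered_bp}}.
    """
    density = defaultdict(lambda: defaultdict(int))
    for chrom, start, end in intervals:
        pos = start
        while pos < end:
            idx = pos // window
            window_end = (idx + 1) * window
            covered = min(end, window_end) - pos  # part of the interval inside this window
            density[chrom][idx] += covered
            pos += covered
    return density


def densest_window(density, window: int = 50_000):
    """Return {chrom: (window_start, window_end, covered_fraction)} for the densest window."""
    result = {}
    for chrom, bins in density.items():
        idx, bp = max(bins.items(), key=lambda kv: kv[1])
        result[chrom] = (idx * window, (idx + 1) * window, bp / window)
    return result


if __name__ == "__main__":
    toy = [("chr1", 10, 20), ("chr1", 95, 130), ("chr1", 140, 180)]
    print(densest_window(repeat_density(toy, window=100), window=100))
```

A single densest window is only a starting candidate; as noted above, it should be compared with the output of several tools and with the Hi-C heatmap before being called a centromere.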
