Synteny Pattern #707

Open
tamim07 opened this issue Sep 17, 2024 · 15 comments

@tamim07

tamim07 commented Sep 17, 2024

Hi, I am trying to identify the syntenic pattern among three genomes: A, a diploid (2x); B, a hexaploid (6x); and C, an allododecaploid (12x). With the parameters --no_strip_names --cscore=.99, I get A-B = 1:4 instead of 1:3, A-C = 1:6, and B-C = 2:2 instead of 2:3. I have attached the figures, and here are my example commands:

python -m jcvi.compara.catalog ortholog A B --no_strip_names --cscore=.99

python -m jcvi.compara.synteny depth --histogram A.B.anchors

A-B.pdf
A-C.depth.pdf
A-C.pdf
B_C.depth.pdf
B_C.pdf
A-B.depth.pdf

Could you please help to get the right pattern? Thanks

@tanghaibao
Owner

@tamim07

Seems complicated. Please also send results with --cscore=0.7 and I'll take a look.

@tamim07
Author

tamim07 commented Sep 17, 2024

Thanks for your quick reply. This becomes more complicated when running with the default settings: A-B = 1:4 (instead of 1:3), A-C = 1:7 (instead of 1:6) and B-C = 4:6 (instead of 2:3). However, I have managed to generate microsynteny with i=3 and i=6 for the B and C blocks, using A as the reference. Thanks

B-C_default.pdf
Microsynteny_B_i3_C_i6_blocks.pdf
A-B_default.depth.pdf
A-B_default.pdf
A-C_default.depth.pdf
A-C_default.pdf
B-C_default.depth.pdf

@tanghaibao
Owner

@tamim07

Thanks.
Somehow the A-B_default.pdf is the same as A-B.pdf (look at the number of gene pairs at the top). Same with A-C_default.pdf vs A-C.pdf. B-C is good.

@tamim07
Author

tamim07 commented Sep 18, 2024

Yes, it is. Any suggestions from your end? Any parameters to play with? Thanks

@tanghaibao
Owner

tanghaibao commented Sep 18, 2024

Can you double check A-B_default and A-C_default? I find it unusual to have exactly the same gene pairs when filtering with 0.99 vs 0.7 (default).

Please note that jcvi caches its results, so you may need to delete the .last.filtered file and then apply the cscore cutoff again to generate new results. Otherwise, it just reuses the cached results from the old cscore.
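
A minimal sketch of the cache cleanup described above, assuming the cached files sit directly in the working directory and end in `.last.filtered`. The helper name `remove_cached_filtered` is hypothetical and is not part of jcvi; it only lists or deletes files.

```python
from pathlib import Path


def remove_cached_filtered(workdir, dry_run=True):
    """Return the *.last.filtered files in workdir; delete them unless dry_run.

    Hypothetical helper, not a jcvi function. It assumes the cached
    filtered alignments are named like A.B.last.filtered.
    """
    cached = sorted(Path(workdir).glob("*.last.filtered"))
    if not dry_run:
        for path in cached:
            path.unlink()  # force the next run to re-apply the cscore cutoff
    return cached


if __name__ == "__main__":
    # List what would be removed in the current directory without deleting.
    for path in remove_cached_filtered(".", dry_run=True):
        print("would remove", path)
```

Running each cscore setting in its own folder, as suggested further down in this thread, avoids the need for any cleanup at all.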

@tanghaibao
Owner

tanghaibao commented Sep 18, 2024

From the results you sent me:

  1. Roughly A:B:C = 1:4:6, which means that A is 2x, B is 8x, C is 12x. (For this pattern, you use the default cscore setting.)
  2. The 0.99 cscore instead shows only reciprocal best matches. This is useful for checking which subgenomes are closer. The reason B-C is so different from B-C_default is that they share some subgenomes. Just as an example, if 8x is AAAAAABB and 12x is AAAAAACCCCCC (here the A/B/C are subgenomes), then the pattern will look very different at 0.99, but will still show the expected pattern with the default.
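
The arithmetic behind the 1:4:6 reading can be sketched as follows. This is only an illustration and assumes that synteny depth scales linearly with ploidy and that A is a true diploid; the function name `implied_ploidy` is hypothetical.

```python
def implied_ploidy(depth_ratio, reference_ploidy=2):
    """Scale a synteny depth ratio to ploidy levels.

    depth_ratio: depths relative to each other, reference genome first,
                 e.g. (1, 4, 6) for A:B:C.
    reference_ploidy: assumed ploidy of the first genome (2 for a diploid).
    """
    reference_depth = depth_ratio[0]
    return [reference_ploidy * depth / reference_depth for depth in depth_ratio]


if __name__ == "__main__":
    # A:B:C = 1:4:6 with A assumed diploid
    print(implied_ploidy((1, 4, 6)))
```

Under these assumptions a hexaploid B would show a depth of 3 against A, which is why the observed depth of 4 points to 8x instead.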

My suggestions:

  1. Make sure you run the default cscore and 0.99 separately. See my comments above; it is better to use separate folders to avoid getting confused by cached results. It's unlikely that the two settings give identical results, as the current files suggest.
  2. Once we have the full results (3 from default and 3 from 0.99, with different results), we'll then try to interpret.

@tamim07
Author

tamim07 commented Sep 19, 2024

Thanks for your suggestion. I have rerun the program. The default results show A:B = 1:4 (39,769 pairs), A:C = 1:7 (47,263 pairs) and B:C = 4:6 (121,709 pairs). On the other hand, the cs0.99 results show A:B = 1:4 (20,213 pairs), A:C = 1:6 (19,697 pairs) and B:C = 2:2 (32,289 pairs).
The results differ from the previous analysis, probably because a different version of the A genome was used.

A_B_default.pdf
A-C_cs0.99.depth.pdf
A-C_cs0.99.pdf
A-C_default.depth.pdf
A-C_default.pdf
B-C_cs0.99.depth.pdf
B-C_cs0.99.pdf
B-C_default.depth.pdf
B-C_default.pdf
A_B_cs0.99.depth.pdf
A_B_cs0.99.pdf
A_B_default.depth.pdf

I do appreciate your help. Thanks

@tanghaibao
Owner

@tamim07

The 1:4:6 pattern is very clear - which means A:B:C=2x:8x:12x. Also the 4x and 6x regions are equally distant from the diploid. We can tell because the 1:4 and 1:6 patterns remain even when filtering down to reciprocal best (cscore=0.99).

At least this part is clear.

What's interesting is the pattern between B:C. You see B:C=4:6 in the default setting, which agrees with the paragraph above, that is fine. However, when filtering down to reciprocal best, some of the region pairs are closer among the 4x6 grid. This suggests that there are some shared sub-genomes in B and C.

This becomes a more complicated research problem and goes beyond what I can help here. What I can suggest as next step is that you should calculate Ks for all gene pairs, plot the distributions of the Ks values. More importantly, with the Ks values you can repeat the dot plot again, with different colors of Ks, using the following command.

python -m jcvi.graphics.dotplot B-C_default.ks

This visualizes the anchor file as a dotplot. B-C_default.ks contains two columns
giving the gene pairs, followed by an optional third column (e.g. the Ks value).
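
As a rough sketch of how such a file could be assembled, the snippet below replaces the third column of each anchor line with a Ks value. It assumes anchor lines are whitespace-separated gene pairs with block separator lines starting with `#`, and that Ks values are already available in a dictionary keyed by gene pair. The function name `attach_ks` and the dictionary layout are hypothetical; how the Ks values are computed is outside this sketch.

```python
def attach_ks(anchor_lines, ks_by_pair):
    """Build 'geneA<TAB>geneB<TAB>Ks' lines from anchor lines.

    anchor_lines: iterable of lines from an anchors file.
    ks_by_pair:   dict mapping (geneA, geneB) to a Ks float (assumed input).
    Pairs without a Ks estimate are dropped; separator lines are kept.
    """
    out = []
    for line in anchor_lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue
        if line.startswith("#"):
            out.append(line)  # keep block separators unchanged
            continue
        fields = line.split()
        gene_a, gene_b = fields[0], fields[1]
        ks = ks_by_pair.get((gene_a, gene_b))
        if ks is None:
            continue  # no Ks estimate for this pair
        out.append(f"{gene_a}\t{gene_b}\t{ks:.4f}")
    return out


if __name__ == "__main__":
    anchors = ["###", "a1\tb1\t500", "a2\tb2\t300"]
    print("\n".join(attach_ks(anchors, {("a1", "b1"): 0.25})))
```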

@tamim07
Author

tamim07 commented Sep 24, 2024

Hi there,
Thanks for your suggestion. Here is the kaks plot for B-C.

B_C_default_kaks.pdf

This puts me in a challenging situation. The literature indicates that B is hexaploid (6x), and C is an allododecaploid (12x). C is the result of genome duplication of D (6x), which is a hybrid of B (6x) and E (6x). I know it sounds complicated. I’ve included a reference figure where C represents S. anglica (12x), B is S. alterniflora (6x), and A is the distantly related species O. thomaeum (2x). I'm struggling to piece together how to solve this.

https://www.sciencedirect.com/science/article/pii/S1055790317301628#b0355

I am wondering whether I can make a plot using the quota option:
python -m jcvi.compara.synteny depth --histogram --xmax 3 --quota 1:3 A.B.anchors

Would you please share your thoughts?

Finally, I want to show a figure like this.

blocks.pdf

I do appreciate your help. Thanks

@tanghaibao
Owner

I do not see the color differences in the ks plot sent. It looks like we need to add the option to show the colormap.

python -m jcvi.graphics.dotplot B-C_default.ks --cmaptext "Ks"

I understand your situation. You can always screen with quota in all the comparisons:

python -m jcvi.compara.catalog ortholog A B --quota 1:3

However, this will leave some chromosomes in B unaccounted for. Try this command and see for yourself. We can review the A:B situation later, once you have the Ks colors overlaid on the dotplot (above), but at this point it does look like 1:4, which puts B at 8x.

You have the freedom to do what you want. In my experience it pays to verify independently: check genome sizes, chromosome karyotype, or flow cytometry. Sometimes prior knowledge may not be accurate.

@tamim07
Author

tamim07 commented Sep 24, 2024

Hi,
I got this error:
python -m jcvi.graphics.dotplot --qbed Salterniflora.bed --sbed Sanglica.bed Salterniflora_Sanglica_kaks --cmaptext "ks"

[09/24/24 19:12:02] DEBUG Load file Salterniflora.bed base.py:36
[09/24/24 19:12:03] DEBUG Load file Sanglica.bed base.py:36
[09/24/24 19:12:04] DEBUG Capping values within [0.0, 2.0] dotplot.py:286
Traceback (most recent call last):
  File "/gpfs/home/evz22uzu/.conda/envs/jcvi_new/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/gpfs/home/evz22uzu/.conda/envs/jcvi_new/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/gpfs/home/evz22uzu/.conda/envs/jcvi_new/lib/python3.8/site-packages/jcvi/graphics/dotplot.py", line 549, in
    dotplot_main(sys.argv[1:])
  File "/gpfs/home/evz22uzu/.conda/envs/jcvi_new/lib/python3.8/site-packages/jcvi/graphics/dotplot.py", line 518, in dotplot_main
    dotplot(
  File "/gpfs/home/evz22uzu/.conda/envs/jcvi_new/lib/python3.8/site-packages/jcvi/graphics/dotplot.py", line 332, in dotplot
    x, y, c = zip(*data)
ValueError: not enough values to unpack (expected 3, got 0)
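
This message is what Python raises when `zip(*data)` is unpacked into three names while `data` is empty, so no gene pair reached the plotting step. One plausible cause, which is an assumption and not confirmed in this thread, is that the gene names in the pair file do not match the names in the BED files. A minimal check, using hypothetical helper names and assuming the gene name is the fourth BED column, could look like this:

```python
def bed_gene_names(bed_lines):
    """Collect gene names from BED lines (name assumed in the 4th column)."""
    names = set()
    for line in bed_lines:
        fields = line.split()
        if len(fields) >= 4 and not line.startswith("#"):
            names.add(fields[3])
    return names


def count_plottable_pairs(pair_lines, query_genes, subject_genes):
    """Return (total pairs, pairs whose two genes are both found in the BEDs)."""
    total = usable = 0
    for line in pair_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and block separators
        fields = line.split()
        if len(fields) < 2:
            continue
        total += 1
        if fields[0] in query_genes and fields[1] in subject_genes:
            usable += 1
    return total, usable


if __name__ == "__main__":
    query = bed_gene_names(["chr1\t0\t100\tg1\t0\t+"])
    subject = bed_gene_names(["chrA\t0\t100\th1\t0\t+"])
    print(count_plottable_pairs(["g1\th1\t0.1", "g2\th1\t0.3"], query, subject))
```

If the second number comes back as zero, the names in the pair file and the BED files are worth comparing by eye before rerunning the dotplot.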


Applying the quota at the alignment stage did not change anything. However, applying the quota at the histogram stage showed the 1:3 pattern. Here are the images.

A_B_quota1_3.pdf
A_B_quota1_3_in_depth_stage.pdf
A-B_quota1_3.depth.pdf

@tanghaibao
Owner

tanghaibao commented Sep 24, 2024

@tamim07

You are getting deep into the JCVI package ;-) so it can get frustrating to figure out how to use these functions.

  1. For ks calculation, it's possible to do it with JCVI, however unfortunately I have not added a tutorial yet. You need to compute these somehow and add a 3rd column with the Ks values in the .anchors file. Then with --cmaptext option you'll be able to see different colors from recent or ancient gene pairs based on Ks.

  2. After you run with --quota 1:4, look for a file that looks like A.B.lifted.1x4.anchors and redo the dotplot:

python -m jcvi.graphics.dotplot A.B.lifted.1x4.anchors

@tamim07
Author

tamim07 commented Sep 30, 2024

Apologies for the delayed response. Here are the two figures:

A-B.lifted.1x4.pdf
A-B_kaks.pdf

I am wondering: some papers use only blocks with more than 5 genes. Can the pattern be changed by reducing the number of CDS and/or the number of blocks?

Thanks

@tanghaibao
Owner

@tamim07

Thank you.

Would you please post a histogram of Ks, and set an appropriate --vmax (for example --vmax=0.5) in the dotplot command so we can see the color difference better? The default plot, as you sent it, ranged from 0 to 2, which can't distinguish old and new events very well.
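
A minimal sketch of the Ks histogram requested above, written in plain Python so it has no plotting dependency. The function name `ks_histogram`, the bin width, and the choice to drop values above the cutoff are all assumptions for illustration; the cutoff mirrors the --vmax=0.5 example.

```python
def ks_histogram(ks_values, bin_width=0.1, vmax=0.5):
    """Count Ks values per bin over [0, vmax]; values outside are ignored."""
    n_bins = max(1, round(vmax / bin_width))
    counts = [0] * n_bins
    for ks in ks_values:
        if ks < 0 or ks > vmax:
            continue
        # Values equal to vmax fall into the last bin.
        index = min(int(ks / bin_width), n_bins - 1)
        counts[index] += 1
    return counts


if __name__ == "__main__":
    values = [0.01, 0.02, 0.12, 0.49, 0.9]
    for i, count in enumerate(ks_histogram(values)):
        # Print a simple text bar per bin, labeled by the bin's lower edge.
        print(f"{i * 0.1:.1f}+ {'#' * count}")
```

Separate peaks in such a histogram would correspond to gene pairs of different ages, which is what the Ks coloring on the dotplot is meant to show.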

@tamim07
Author

tamim07 commented Oct 5, 2024 via email
