Population Genetics Activity

Overview

This activity will focus on the location and amount of genetic

The Data

library( tidyverse )
library( gstudio )
data( arapat )

The data for this activity are included in the gstudio library. They represent a set of nuclear co-dominant loci (named LTRS, WNT, EN, EF, ZMP, AML, ATPS, MP20) assayed for 363 individuals and grouped by 3 partitioning factors (Species, Cluster, and Population).

summary( arapat )
      Species      Cluster      Population        ID         Latitude       Longitude          LTRS          WNT     
 Cape     : 75   CBP-C :150   32     : 19   101_10A:  1   Min.   :23.08   Min.   :-114.3   01:01 :147   03:03  :108  
 Mainland : 36   NBP-C : 84   75     : 11   101_1A :  1   1st Qu.:24.59   1st Qu.:-113.0   01:02 : 86   01:01  : 82  
 Peninsula:252   SBP-C : 18   Const  : 11   101_2A :  1   Median :26.25   Median :-111.5   02:02 :130   01:03  : 77  
                 SCBP-A: 75   12     : 10   101_3A :  1   Mean   :26.25   Mean   :-111.7                02:02  : 62  
                 SON-B : 36   153    : 10   101_4A :  1   3rd Qu.:27.53   3rd Qu.:-110.5                03:04  :  8  
                              157    : 10   101_5A :  1   Max.   :29.33   Max.   :-109.1                (Other): 15  
                              (Other):292   (Other):357                                                 NA's   : 11  
       EN           EF          ZMP           AML           ATPS          MP20    
 01:01  :225   01:01 :219   01:01 : 46   08:08  : 51   05:05  :155   05:07  : 64  
 01:02  : 52   01:02 : 52   01:02 : 51   07:07  : 42   03:03  : 69   07:07  : 53  
 02:02  : 38   02:02 : 90   02:02 :233   07:08  : 42   09:09  : 66   18:18  : 52  
 03:03  : 22   NA's  :  2   NA's  : 33   04:04  : 41   02:02  : 30   05:05  : 48  
 01:03  :  7                             07:09  : 22   07:09  : 14   05:06  : 22  
 (Other): 16                             (Other):142   08:08  :  9   (Other):119  
 NA's   :  3                             NA's   : 23   (Other): 20   NA's   :  5  
  1. Create all potential genotypes de novo for a locus with 3 alleles.

  2. At the ATPS locus, there are several genotypes that are observed only once in the entire data set. What are these genotypes and which populations are they found in?

  3. Look at the composition of Populations in the arapat data set with particular attention to Species. This species is a parasite on a limited habitat resource. Ecologically, what is happening here?

table( arapat$Species, arapat$Population ) -> t
t
           
            101 102 12 153 156 157 159 160 161 162 163 164 165 166 168 169 171 173 175 177 32 48 51 58 64 73 75 77 84 88 89  9
  Cape        0   0  0   0   6   8   0   0   0   0   3   2   0   2   0   0   0   0   0   0  0 10  0  0  0  8 10  1  0  0  0  0
  Mainland    9   8  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0 19  0  0  0  0  0  0  0  0  0  0  0
  Peninsula   0   0 10  10   0   2   9  10  10  10   7   8  10   8  10  10  10  10   7  10  0  0  7  9  5  2  1  9  9 10 10  9
           
            93 98 Aqu Const ESan Mat SFr
  Cape       0  3   4     8    6   4   0
  Mainland   0  0   0     0    0   0   0
  Peninsula 10  1   4     3    2   1   9

There are only 2 populations (48 and 156) in which the Cape species occurs in isolation; in every other population where Cape is present, individuals designated as Cape and as Peninsula co-occur. This parasite lives on a restricted habitat, so this may be a case of competitive exclusion playing out on the limited resource.

  4. One way to see if there is any indication that the Peninsula and Cape species are interbreeding is to look for alleles that are unique to one group and absent from the other. A stronger inference can be drawn if those private alleles are found in individuals from the same physical location. For the sites where both the Peninsula and Cape taxonomic groups co-exist, is there any evidence of alleles in one species but not in the other?

First I would grab the populations where they co-occur. I can do this either programmatically or by reading them off the table from the previous question. Then I filter the data down to just those populations.
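The programmatic route can be sketched in base R as follows. This is a minimal illustration, not the activity's own code: `toy` is a made-up stand-in for `arapat` with only the `Species` and `Population` columns, and `co_occurring` is a name introduced here. With the real data, the same two steps should apply to `arapat$Species` and `arapat$Population`.

```r
# Hypothetical toy data standing in for arapat (values invented for illustration).
toy <- data.frame(
  Species    = c("Cape", "Cape", "Peninsula", "Cape", "Peninsula", "Mainland"),
  Population = c("48",   "75",   "75",        "Aqu",  "Aqu",       "32"),
  stringsAsFactors = TRUE
)

# Species-by-Population counts, the same kind of table used for question 3.
counts <- table(toy$Species, toy$Population)

# A population is sympatric if both Cape and Peninsula have at least one individual.
both <- counts["Cape", ] > 0 & counts["Peninsula", ] > 0
co_occurring <- names(both)[both]
co_occurring
```

The resulting character vector can then be passed to `filter( Population %in% co_occurring )` instead of typing the population names by hand.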

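arapat %>%
  filter( Population %in% c("Mat","ESan","Const","Aqu","98","77","75","73","166","164","163","157") ) %>%
  droplevels() %>%   # I use this to reconfigure the levels on factors to only those present
  select(-Cluster, -ID, -Latitude, -Longitude ) -> sympatry
summary( sympatry )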
      Species     Population     LTRS         WNT          EN          EF         ZMP          AML         ATPS         MP20   
 Cape     :59   75     :11   01:01 :12   02:02  :47   01:01 :54   01:01 :82   01:01 : 3   04:04  :28   02:02 : 2   18:18  :42  
 Peninsula:48   Const  :11   01:02 :21   03:03  :30   01:02 :27   01:02 :10   01:02 : 7   05:05  :18   03:03 :55   05:05  :11  
                157    :10   02:02 :74   01:01  :11   02:02 :24   02:02 :15   02:02 :90   06:06  :15   03:06 : 2   11:11  :11  
                163    :10               01:03  : 6   NA's  : 2               NA's  : 7   07:08  : 9   05:05 :37   05:06  : 6  
                164    :10               01:02  : 4                                       08:08  : 9   05:07 : 1   17:17  : 6  
                166    :10               (Other): 6                                       (Other):22   08:08 : 9   (Other):29  
                (Other):45               NA's   : 3                                       NA's   : 6   09:09 : 1   NA's   : 2  

Then I’d split them by species and look at the genetic data.

sympatry %>%
  filter( Species == "Cape") -> cape
summary( cape )
      Species     Population     LTRS        WNT          EN          EF         ZMP         AML         ATPS        MP20   
 Cape     :59   75     :10   01:01 : 2   01:01 : 7   01:01 : 8   01:01 :55   02:02 :54   03:03 : 3   02:02 : 1   17:17 : 6  
 Peninsula: 0   157    : 8   01:02 : 8   01:02 : 4   01:02 :27   01:02 : 3   NA's  : 5   03:04 : 4   03:03 :55   17:18 : 6  
                73     : 8   02:02 :49   02:02 :47   02:02 :24   02:02 : 1               03:05 : 3   03:06 : 2   18:18 :42  
                Const  : 8               NA's  : 1                                       04:04 :28   09:09 : 1   18:19 : 4  
                ESan   : 6                                                               05:05 :17               19:19 : 1  
                Aqu    : 4                                                               NA's  : 4                          
                (Other):15                                                                                                  
sympatry %>%
  filter( Species == "Peninsula" ) -> peninsula
summary( peninsula )
      Species     Population     LTRS        WNT          EN          EF         ZMP          AML         ATPS         MP20   
 Cape     : 0   77     :9    01:01 :10   01:01 : 4   01:01 :46   01:01 :27   01:01 : 3   06:06  :15   02:02 : 1   05:05  :11  
 Peninsula:48   164    :8    01:02 :13   01:03 : 6   NA's  : 2   01:02 : 7   01:02 : 7   07:08  : 9   05:05 :37   11:11  :11  
                166    :8    02:02 :25   01:04 : 3               02:02 :14   02:02 :36   08:08  : 9   05:07 : 1   05:06  : 6  
                163    :7                03:03 :30                           NA's  : 2   07:07  : 6   08:08 : 9   06:06  : 5  
                Aqu    :4                03:04 : 1                                       06:07  : 2               10:11  : 5  
                Const  :3                04:04 : 2                                       (Other): 5               (Other): 8  
                (Other):9                NA's  : 2                                       NA's   : 2               NA's   : 2  

Now, look at the output. At WNT, the Cape group has only alleles 1 and 2, whereas the Peninsula group has alleles 1, 3, and 4. In fact, most of the Peninsula samples (30 of 48) are 03:03 homozygotes, a genotype that does not occur in the sympatric Cape samples. Here are the populations in which these homozygotes exist.

peninsula %>%
  filter( WNT == locus("03:03") ) %>%
  select( Population, WNT )

Repeat with the other loci and you’ll see a lot of evidence that there are private alleles in one species that do not occur in the other.
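
Reading private alleles off a summary by eye gets tedious across eight loci, so the comparison can also be done with set operations. The sketch below is base R only and works on the WNT genotype labels copied from the two summaries above. The helper `alleles_in()` is a hypothetical name introduced here, and the sketch assumes genotypes are available as "a:b" character strings; whether `as.character()` on a gstudio locus column gives that form is an assumption I have not checked.

```r
# Genotype labels copied from the WNT columns of the cape and peninsula summaries.
cape_wnt      <- c("01:01", "01:02", "02:02")
peninsula_wnt <- c("01:01", "01:03", "01:04", "03:03", "03:04", "04:04")

# Hypothetical helper: reduce "a:b" genotype labels to the set of distinct alleles.
alleles_in <- function(genotypes) {
  genotypes <- genotypes[!is.na(genotypes)]          # ignore missing genotypes
  sort(unique(unlist(strsplit(genotypes, ":", fixed = TRUE))))
}

cape_alleles      <- alleles_in(cape_wnt)
peninsula_alleles <- alleles_in(peninsula_wnt)

# Alleles seen in one group but not the other, and alleles seen in both.
private_to_cape      <- setdiff(cape_alleles, peninsula_alleles)
private_to_peninsula <- setdiff(peninsula_alleles, cape_alleles)
shared               <- intersect(cape_alleles, peninsula_alleles)
```

For WNT this gives allele 02 as private to Cape, alleles 03 and 04 as private to Peninsula, and allele 01 as shared. The same three lines can be repeated for each of the other loci.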
