Lettuce SFP Chip Project





The lettuce Affymetrix GeneChip© contains 6,553,600 cells or features, each with a size of 5 µm × 5 µm, organized in 2,560 rows (Y-coordinate) and 2,560 columns (X-coordinate). Table 1 shows how many of these features actually contain probes (Original Probeset Assembly).

The probes on the lettuce chip can be grouped into six classes (Table 1). Only the number of TI probes differs between the original probeset assembly and the native-alien probeset assembly; the numbers for the other five classes are the same. The native-alien probeset assembly takes into account the native-alien situation of the probes: each unigene is represented by both the native and the alien probes that match it.

Affymetrix supplied ~11 M potential probes, from which we had to select ~6.4 M; i.e. we had to discard ~4.6 M probes. Some of the 25-mer probes were 100% identical in different unigenes, so we decided to remove multiple copies of identical probes. As a consequence, such a probe is included in one probeset but missing from the other probesets whose unigenes it also matches. When we analyze for SPPs we want a complete tiling path to work with, so we reconstructed the original situation by assigning these probes to multiple probesets, which resulted in the Native-Alien Probeset Assembly.
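The reconstruction described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the pipeline that was actually used: the function name `native_alien_assembly`, the dictionary of unigene sequences, and the toy sequences are all hypothetical, and strand orientation is ignored.

```python
from collections import defaultdict

PROBE_LENGTH = 25  # probes on the chip are 25-mers


def native_alien_assembly(unigenes, synthesized_probes):
    """Assign every synthesized probe to each unigene that contains it.

    unigenes: dict mapping a unigene ID to its nucleotide sequence.
    synthesized_probes: set of 25-mer sequences present on the chip.
    Returns a dict mapping unigene ID -> list of (start, probe) tuples.
    """
    assembly = defaultdict(list)
    for unigene_id, sequence in unigenes.items():
        # Slide a 25-nt window along the unigene and keep windows on the chip.
        for start in range(len(sequence) - PROBE_LENGTH + 1):
            window = sequence[start:start + PROBE_LENGTH]
            if window in synthesized_probes:
                assembly[unigene_id].append((start, window))
    return dict(assembly)


# Toy data (hypothetical): two unigenes share one identical 25-mer.
shared = "ATGGCGTACCTGAAGCTTGGATCCA"
unigenes = {
    "unigene_A": "TTTT" + shared + "GGGG",
    "unigene_B": "CC" + shared + "AA",
}
probes_on_chip = {shared}  # synthesized only once after duplicate removal

assembly = native_alien_assembly(unigenes, probes_on_chip)
print(assembly)
```

In this toy case the single synthesized probe ends up in the probesets of both unigenes, which is why the probe count of the native-alien assembly is larger than that of the original assembly.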

The links in Table 1 lead to more information on this page about each of the probe classes. Detailed characteristics for individual, or subsets of, probes from the lettuce chip can be retrieved on the Probe Data page.


Table 1. Different classes of probes, with their numbers, on the Lettuce Chip.

Probe class                             Original Probeset Assembly   Native-Alien Probeset Assembly
TI: Tiling Probes                       6,410,923                    6,901,326
EX: Expression Probes                   4,719                        4,719
TR: Technical Replicate Probes          16,900                       16,900
AF: Affymetrix Control Probes           2,484                        2,484
AG: Affymetrix AntiGenomic Probes       33,886                       33,886
QC: Affymetrix B2 Oligo Grid Probes     13,567                       13,567
Total                                   6,482,479                    6,972,882
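As a quick consistency check, the totals in Table 1 and the share of chip features that carry a probe can be recomputed from the numbers given on this page. The variable names below are chosen for illustration only.

```python
# Probe counts per class, copied from Table 1.
shared_classes = {"EX": 4_719, "TR": 16_900, "AF": 2_484, "AG": 33_886, "QC": 13_567}
ti_probes = {"original": 6_410_923, "native_alien": 6_901_326}

# Total per assembly = TI probes + the five classes that do not change.
totals = {name: ti + sum(shared_classes.values()) for name, ti in ti_probes.items()}

# Physical features on the chip: 2,560 rows x 2,560 columns.
features = 2_560 * 2_560
occupied_fraction = totals["original"] / features

print(totals)
print(f"{features:,} features, {occupied_fraction:.1%} carry a probe")
```

The native-alien total is larger than the number of physical features because, in that assembly, a probe that matches several unigenes is counted once for each probeset it is assigned to.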




Tiling (TI) probes:

The majority of the probes on the Lettuce Chip belong to the class of TI probes. These are the probes that are used in the SPP detection analysis. Besides the original probeset assembly of the lettuce chip there is the native-alien probeset assembly, which is currently used for SPP detection. The TI probe class is the only one of the six classes that is affected by the choice of probeset assembly. Table 2 shows the differences between the two probeset assemblies, as well as the distribution of probes according to GC content. For the SPP analysis we currently use only the probes with a GC content ranging from 5 to 19 (out of 25 nucleotides).
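The GC-content filter mentioned above can be sketched in a few lines. The function names and the example sequences are hypothetical, and the sketch assumes that the range 5 to 19 is inclusive at both ends.

```python
def gc_count(probe):
    """Number of G or C nucleotides in a probe sequence."""
    probe = probe.upper()
    return probe.count("G") + probe.count("C")


def usable_for_spp(probe, low=5, high=19):
    """True if the GC count lies in the range used for SPP analysis.

    Assumption: both bounds are inclusive.
    """
    return low <= gc_count(probe) <= high


# Hypothetical 25-mers for illustration only.
examples = [
    "AT" * 12 + "A",              # GC count 0
    "ATGGCGTACCTGAAGCTTGGATCCA",  # GC count 13
    "GC" * 12 + "G",              # GC count 25
]
kept = [p for p in examples if usable_for_spp(p)]
print(kept)
```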


The CLS_S3 EST assembly was used to design the probes for the lettuce GeneChip. The fasta sequences for all the 36,962 sequences that were used for chip design (TI probes) are available in two files. The larger file contains 35,778 CLS_S3 EST assembly sequences, GenBank sequences and genomic sequences from our laboratory (see footnote 2 of Table 2). The smaller file contains the 1,184 duplicated sequences (see footnote 2 of Table 2) on the GeneChip. The fasta headers for the sequences in this smaller file contain the original CLS_S3 EST assembly identifier as well.

AffyLettChip_TI_ProbeSet_NativeAlien_35778.fasta.zip (8.4 MB)
AffyLettChip_TI_ProbeSet_Duplicates_1184.fasta.zip (0.4 MB)
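Once unzipped, the two files can be read with any FASTA parser. A minimal sketch using only the standard library is shown below; the record names in the demo are made up, and no assumption is made about the exact header layout of the real files.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Hypothetical two-record example standing in for an unzipped file.
demo = io.StringIO(">seq_one\nACGT\nACGT\n>seq_two\nGGCC\n")
records = list(read_fasta(demo))
print(len(records), records)
```

If the files match the description above, counting the records in the two files should give 35,778 and 1,184 sequences, which add up to the 36,962 sequences used for chip design.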


Table 2. Statistics TI probes on the Lettuce Chip.

                                  Original Probeset Assembly   Native-Alien Probeset Assembly 1
Tiling (TI) probes 2              6,410,923                    6,901,326
Probesets with TI probes 2        36,918 (list)                36,962 (list)
TI probes/probeset (avg)          173.7                        186.7
TI probes - GC content 1/25       0                            0
TI probes - GC content 2/25       0                            0
TI probes - GC content 3/25       0                            0
TI probes - GC content 4/25       4                            4
TI probes - GC content 5/25       554                          601
TI probes - GC content 6/25       15,504                       16,481
TI probes - GC content 7/25       149,202                      159,280
TI probes - GC content 8/25       567,250                      607,829
TI probes - GC content 9/25       1,068,802                    1,148,539
TI probes - GC content 10/25      1,336,920                    1,439,557
TI probes - GC content 11/25      1,255,426                    1,353,697
TI probes - GC content 12/25      934,553                      1,008,554
TI probes - GC content 13/25      574,696                      619,720
TI probes - GC content 14/25      303,144                      326,606
TI probes - GC content 15/25      136,960                      147,416
TI probes - GC content 16/25      51,633                       55,487
TI probes - GC content 17/25      14,342                       15,417
TI probes - GC content 18/25      1,820                        1,991
TI probes - GC content 19/25      95                           123
TI probes - GC content 20/25      14                           20
TI probes - GC content 21/25      4                            4
TI probes - GC content 22/25      0                            0
TI probes - GC content 23/25      0                            0
TI probes - GC content 24/25      0                            0
TI probes - GC content 25/25      0                            0

1 The Native-Alien Probeset Assembly contains 44 more probesets than the Original Probeset Assembly. This means that all the probes in these probesets have a perfect match with another probeset. During the filtering of duplicate probes in the chip design process these were discarded for synthesis on the chip, and thus, the corresponding contigs were not represented on the chip. However, using the Native-Alien Probeset Assembly allows us to bring back these contigs. These 44 contigs may represent the same biological gene, but were not assembled well during EST clustering. Alternatively, these 44 contigs may be paralogs.
The Native-Alien Probeset Assembly contains 1,281 probesets with a total of 12,337 twenty-five bp sequences represented by more than one probe; i.e. multiple probes with the exact same nucleotide sequence were synthesized on the chip.

2 During chip design 46,270 TI probes, in 1,184 probesets, in regions with putative SNPs were duplicated. The duplicates have identical probeset IDs except for the first two letters: LS instead of whatever the original first two letters were. The numbers in Table 2 include the duplicated 'LS' probes.
Besides the unigenes from the EST assembly, we added 93 GenBank sequences (L. sativa) and 57 genomic sequences from our laboratory (L. sativa and L. serriola) to the chip.
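The naming rule for the duplicated probesets in footnote 2 can be expressed as a one-line helper. The function name and the example ID are hypothetical; the only rule taken from this page is that the first two letters of the probeset ID are replaced by LS.

```python
def ls_duplicate_id(probeset_id):
    """Return the ID of the duplicated probeset: first two letters become 'LS'."""
    return "LS" + probeset_id[2:]


# Hypothetical probeset ID, used only to show the transformation.
example_id = "XY_example_0001"
print(ls_duplicate_id(example_id))

# Average number of duplicated TI probes per duplicated probeset (footnote 2).
avg_duplicated = 46_270 / 1_184
print(f"{avg_duplicated:.1f}")
```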

Figure 1 shows the distribution of the number of probes per unigene for the original probeset assembly and the native-alien probeset assembly. For visualization purposes the X-axis was cut off at 750 probes per unigene; the original probeset assembly and the native-alien probeset assembly contain 14 and 20 unigenes, respectively, with more probes than that. The maximum number of probes per unigene is 2,337 for the original probeset assembly and 2,443 for the native-alien probeset assembly. For the average number of probes per unigene see Table 2.


Figure 1. Distribution of the number of probes per unigene for the original probeset assembly (red) and the native-alien probeset assembly (blue).



Figure 2 shows the percentage of probes per GC content for the original probeset assembly and the native-alien probeset assembly. The actual numbers can be found in Table 2. The GC content of the probe is incorporated in the SPP detection algorithm, as the probe's hybridization signal is evaluated based on the AntiGenomic probes with the same GC content.


Figure 2. Percentage of probes per GC content for the original probeset assembly (red) and the native-alien probeset assembly (blue).
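The percentages plotted in Figure 2 can be recomputed from the counts in Table 2. The sketch below also counts how many TI probes fall in the GC range of 5 to 19 that is currently used for SPP analysis; the helper names are illustrative and the range is assumed to be inclusive.

```python
# TI probe counts per GC content (1..25 of 25 nucleotides), copied from Table 2.
original = [0, 0, 0, 4, 554, 15_504, 149_202, 567_250, 1_068_802, 1_336_920,
            1_255_426, 934_553, 574_696, 303_144, 136_960, 51_633, 14_342,
            1_820, 95, 14, 4, 0, 0, 0, 0]
native_alien = [0, 0, 0, 4, 601, 16_481, 159_280, 607_829, 1_148_539, 1_439_557,
                1_353_697, 1_008_554, 619_720, 326_606, 147_416, 55_487, 15_417,
                1_991, 123, 20, 4, 0, 0, 0, 0]


def percentages(counts):
    """Percentage of probes at each GC content."""
    total = sum(counts)
    return [100 * c / total for c in counts]


def count_in_range(counts, low=5, high=19):
    """Probes with GC content between low and high (assumed inclusive)."""
    return sum(counts[low - 1:high])


for name, counts in (("original", original), ("native-alien", native_alien)):
    pct = percentages(counts)
    peak = pct.index(max(pct)) + 1  # list index 0 corresponds to GC content 1
    print(f"{name}: peak at GC {peak}/25, "
          f"{count_in_range(counts):,} of {sum(counts):,} probes with GC 5-19")
```

Only a few dozen TI probes in either assembly fall outside the 5 to 19 range, so this filter removes very little data.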



Expression (EX) probes:

Besides the TI probes, we designed some probes using the expression analysis probe design criteria. There are 4,719 of these EX probes, in 1,734 of the TI probesets, and two additional probesets. We currently do not use the data from the EX probes.



Technical Replicate (TR) probes:

We chose 10 lettuce genes, with low, moderate and high expression in most tissues, to design control blocks across the lettuce chip. The chip is arranged in a 13 by 13 grid, so we replicated the control block 169 times. Each block contains 10 probes for each of the 10 genes. The following tables (Tables 3, 4, 5) show the characteristics of these control blocks and probes.


Table 3. Control block layout.
Probe\Gene 1 2 3 4 5 6 7 8 9 10
1 GI-1 UBC9-1 SAG101-1 PDS-1 EDS1-1 UBC10-1 UBQ4-1 GAPC-1 Adh1-1 CHS-1
2 GI-2 UBC9-2 SAG101-2 PDS-2 EDS1-2 UBC10-2 UBQ4-2 GAPC-2 Adh1-2 CHS-2
3 GI-3 UBC9-3 SAG101-3 PDS-3 EDS1-3 UBC10-3 UBQ4-3 GAPC-3 Adh1-3 CHS-3
4 GI-4 UBC9-4 SAG101-4 PDS-4 EDS1-4 UBC10-4 UBQ4-4 GAPC-4 Adh1-4 CHS-4
5 GI-5 UBC9-5 SAG101-5 PDS-5 EDS1-5 UBC10-5 UBQ4-5 GAPC-5 Adh1-5 CHS-5
6 GI-6 UBC9-6 SAG101-6 PDS-6 EDS1-6 UBC10-6 UBQ4-6 GAPC-6 Adh1-6 CHS-6
7 GI-7 UBC9-7 SAG101-7 PDS-7 EDS1-7 UBC10-7 UBQ4-7 GAPC-7 Adh1-7 CHS-7
8 GI-8 UBC9-8 SAG101-8 PDS-8 EDS1-8 UBC10-8 UBQ4-8 GAPC-8 Adh1-8 CHS-8
9 GI-9 UBC9-9 SAG101-9 PDS-9 EDS1-9 UBC10-9 UBQ4-9 GAPC-9 Adh1-9 CHS-9
10 GI-10 UBC9-10 SAG101-10 PDS-10 EDS1-10 UBC10-10 UBQ4-10 GAPC-10 Adh1-10 CHS-10
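The layout in Table 3 and the TR probe count in Table 1 follow directly from the description above (10 genes, 10 probes per gene, one block in each cell of the 13 by 13 grid). A small sketch, with illustrative variable names:

```python
GENES = ["GI", "UBC9", "SAG101", "PDS", "EDS1", "UBC10", "UBQ4", "GAPC", "Adh1", "CHS"]
PROBES_PER_GENE = 10
GRID_SIDE = 13  # the chip is arranged in a 13 by 13 grid

# One control block: rows are probe numbers, columns are genes (as in Table 3).
block = [[f"{gene}-{probe}" for gene in GENES]
         for probe in range(1, PROBES_PER_GENE + 1)]

probes_per_block = sum(len(row) for row in block)
total_tr_probes = probes_per_block * GRID_SIDE ** 2

for row in block:
    print(" ".join(row))
print(total_tr_probes)
```

The resulting total matches the 16,900 TR probes listed in Table 1.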


Table 4. Control block gene ID, gene code, contig corresponding to gene, and Affymetrix quality score for each probe.
Affy quality score per probe column (Probe-01 to Probe-10): 0.25 0.3 0.325 0.35 0.375 0.4 0.425 0.45 0.5 0.6
Gene GeneCode Contig Probe-01 Probe-02 Probe-03 Probe-04 Probe-05 Probe-06 Probe-07 Probe-08 Probe-09 Probe-10
1 GI CLS_S3_Contig10058 CLS_S3_Contig10058-1203 CLS_S3_Contig10058-535 CLS_S3_Contig10058-1369 CLS_S3_Contig10058-607 CLS_S3_Contig10058-357 CLS_S3_Contig10058-775 CLS_S3_Contig10058-627 CLS_S3_Contig10058-91 CLS_S3_Contig10058-331 CLS_S3_Contig10058-793
2 UBC9 CLS_S3_Contig10134 CLS_S3_Contig10134-345 CLS_S3_Contig10134-87 CLS_S3_Contig10134-349 CLS_S3_Contig10134-511 CLS_S3_Contig10134-677 CLS_S3_Contig10134-339 CLS_S3_Contig10134-289 CLS_S3_Contig10134-635 CLS_S3_Contig10134-553 CLS_S3_Contig10134-473
3 SAG101 CLS_S3_Contig11326 CLS_S3_Contig11326-217 CLS_S3_Contig11326-645 CLS_S3_Contig11326-193 CLS_S3_Contig11326-381 CLS_S3_Contig11326-603 CLS_S3_Contig11326-197 CLS_S3_Contig11326-569 CLS_S3_Contig11326-201 CLS_S3_Contig11326-165 CLS_S3_Contig11326-321
4 PDS CLS_S3_Contig1921 CLS_S3_Contig1921-545 CLS_S3_Contig1921-175 CLS_S3_Contig1921-543 CLS_S3_Contig1921-585 CLS_S3_Contig1921-539 CLS_S3_Contig1921-89 CLS_S3_Contig1921-541 CLS_S3_Contig1921-817 CLS_S3_Contig1921-101 CLS_S3_Contig1921-109
5 EDS1 CLS_S3_Contig2153 CLS_S3_Contig2153-431 CLS_S3_Contig2153-669 CLS_S3_Contig2153-305 CLS_S3_Contig2153-393 CLS_S3_Contig2153-821 CLS_S3_Contig2153-749 CLS_S3_Contig2153-89 CLS_S3_Contig2153-753 CLS_S3_Contig2153-595 CLS_S3_Contig2153-225
6 UBC10 CLS_S3_Contig3655 CLS_S3_Contig3655-607 CLS_S3_Contig3655-441 CLS_S3_Contig3655-561 CLS_S3_Contig3655-609 CLS_S3_Contig3655-417 CLS_S3_Contig3655-281 CLS_S3_Contig3655-583 CLS_S3_Contig3655-289 CLS_S3_Contig3655-499 CLS_S3_Contig3655-491
7 UBQ4 CLS_S3_Contig3847 CLS_S3_Contig3847-137 CLS_S3_Contig3847-685 CLS_S3_Contig3847-133 CLS_S3_Contig3847-731 CLS_S3_Contig3847-689 CLS_S3_Contig3847-683 CLS_S3_Contig3847-129 CLS_S3_Contig3847-659 CLS_S3_Contig3847-721 CLS_S3_Contig3847-651
8 GAPC CLS_S3_Contig4694 CLS_S3_Contig4694-365 CLS_S3_Contig4694-371 CLS_S3_Contig4694-1069 CLS_S3_Contig4694-491 CLS_S3_Contig4694-135 CLS_S3_Contig4694-873 CLS_S3_Contig4694-91 CLS_S3_Contig4694-279 CLS_S3_Contig4694-965 CLS_S3_Contig4694-987
9 Adh1 CLS_S3_Contig8861 CLS_S3_Contig8861-983 CLS_S3_Contig8861-459 CLS_S3_Contig8861-847 CLS_S3_Contig8861-817 CLS_S3_Contig8861-505 CLS_S3_Contig8861-755 CLS_S3_Contig8861-535 CLS_S3_Contig8861-365 CLS_S3_Contig8861-217 CLS_S3_Contig8861-233
10 CHS CLSM3570.b1_D05.ab1 CLSM3570.b1_D05.ab1-359 CLSM3570.b1_D05.ab1-411 CLSM3570.b1_D05.ab1-273 CLSM3570.b1_D05.ab1-289 CLSM3570.b1_D05.ab1-311 CLSM3570.b1_D05.ab1-291 CLSM3570.b1_D05.ab1-99 CLSM3570.b1_D05.ab1-547 CLSM3570.b1_D05.ab1-569 CLSM3570.b1_D05.ab1-677


Table 5. Control block GC content of all probes.
Gene\Probe 1 2 3 4 5 6 7 8 9 10
1 11 8 13 8 9 11 8 12 14 12
2 8 10 9 9 11 10 10 9 11 13
3 9 8 8 10 10 10 10 10 12 12
4 7 9 8 11 9 10 9 13 13 14
5 18 9 8 9 10 9 9 10 12 15
6 11 15 9 11 8 10 11 11 10 12
7 12 11 12 11 12 10 12 15 15 12
8 16 15 8 10 12 10 10 11 13 14
9 7 7 11 9 8 9 11 13 12 11
10 16 13 8 8 9 7 8 12 11 15



Affymetrix control (AF) probes:

The lettuce chip contains the standard Affymetrix bacterial control probesets, including bioB, bioC and bioD from E. coli, cre from bacteriophage P1, and dap, lys, phe, thr and trp from B. subtilis. In total there are 81 probesets with 2,484 probes (1,251 Perfect Match (PM) probes and 1,233 Mismatch (MM) probes). These are the only probesets on the lettuce chip that use the PM-MM system (see also the AntiGenomic (AG) probes).



AntiGenomic (AG) probes:

Instead of the PM-MM system for background hybridization control, the lettuce chip uses a set of AntiGenomic probes (designed by Affymetrix). These probes are used in the SPP detection algorithm to determine whether a TI probe hybridizes at a high enough level compared to the AG probes with the same GC content. The 33,886 AG probes consist of 16,926 so-called RandomGC probes and 16,960 AdditionalGC probes. Table 6 shows the distribution of the AG probes according to GC content.


Table 6. Distribution of the AG probes according to GC content.

GC content (out of 25 nucleotides)   # probes
1                                    0
2                                    0
3                                    50
4                                    644
5                                    1,406
6                                    1,746
7                                    1,828
8                                    1,880
9                                    1,918
10                                   1,904
11                                   1,920
12                                   1,946
13                                   1,936
14                                   1,920
15                                   1,898
16                                   1,926
17                                   1,884
18                                   1,824
19                                   1,698
20                                   1,626
21                                   1,394
22                                   1,170
23                                   814
24                                   536
25                                   18
Total                                33,886
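The idea of judging a TI probe against the AG probes with the same GC content can be illustrated with the sketch below. It is not the SPP detection algorithm itself, which is not specified on this page: the cutoff of mean plus two standard deviations, the function names, and the signal values are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean, stdev


def background_thresholds(ag_probes, n_sd=2.0):
    """Per-GC signal threshold derived from AG probe signals.

    ag_probes: iterable of (gc_count, signal) pairs.
    n_sd: ASSUMED cutoff, in standard deviations above the AG mean.
    """
    by_gc = defaultdict(list)
    for gc, signal in ag_probes:
        by_gc[gc].append(signal)
    # stdev needs at least two values, so smaller groups are skipped.
    return {gc: mean(values) + n_sd * stdev(values)
            for gc, values in by_gc.items() if len(values) >= 2}


def above_background(gc, signal, thresholds):
    """True if a TI probe's signal exceeds the threshold for its GC content."""
    return gc in thresholds and signal > thresholds[gc]


# Made-up signals for illustration only.
ag = [(10, 100.0), (10, 120.0), (10, 110.0), (12, 200.0), (12, 220.0)]
thresholds = background_thresholds(ag)

print(above_background(10, 150.0, thresholds))
print(above_background(12, 150.0, thresholds))
```

With these made-up values, the same TI signal passes the threshold at GC content 10 but not at GC content 12, which shows why the comparison is made per GC class rather than against one chip-wide background.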



Affymetrix B2 Oligo Grid (QC) probes:

The lettuce chip contains 13,567 B2 oligo probes, which are placed along the edges of the chip, internally and in the corners. These probes are used by the GCOS software to align the scanner.





Contact information
Send Email to Hamid Ashrafi
Authors: Hans van Leeuwen, Hamid Ashrafi, Sebastian Reyes Chin-Wo


Last modified: February 06, 2016. 14:45:13 pm PST