The lettuce Affymetrix GeneChip® contains 6,553,600 cells, or features, each 5 µm × 5 µm in size, arranged in 2,560 rows (Y-coordinate) and 2,560 columns (X-coordinate). Table 1 shows how many of these features actually contain probes (Original Probeset Assembly).
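As a minimal sketch of this layout, the Python below converts between (X, Y) coordinates and a linear feature index and checks the feature count. The row-major, 0-based indexing is an assumption made for illustration, not necessarily the convention used in the chip's data files, and the function names are hypothetical.

```python
# Lettuce GeneChip layout as stated in the text:
# 2,560 columns (X) x 2,560 rows (Y), each feature 5 um x 5 um.
N_COLS = 2560
N_ROWS = 2560
FEATURE_SIZE_UM = 5


def feature_index(x: int, y: int) -> int:
    """Return a 0-based linear index for the feature at column x, row y.

    Assumes row-major order (all X positions of row 0, then row 1, ...)
    and 0-based coordinates; this is an illustrative convention only.
    """
    if not (0 <= x < N_COLS and 0 <= y < N_ROWS):
        raise ValueError(f"coordinate out of range: ({x}, {y})")
    return y * N_COLS + x


def feature_xy(index: int) -> tuple[int, int]:
    """Inverse of feature_index: map a linear index back to (x, y)."""
    if not (0 <= index < N_COLS * N_ROWS):
        raise ValueError(f"index out of range: {index}")
    y, x = divmod(index, N_COLS)
    return x, y


total_features = N_COLS * N_ROWS             # 6,553,600 features
side_mm = N_COLS * FEATURE_SIZE_UM / 1000    # 12.8 mm of features per side
print(total_features, side_mm)               # prints: 6553600 12.8
```

The 12.8 mm figure covers only the feature area and ignores any margin around it.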
The probes on the lettuce chip can be grouped into six classes (Table 1). The number of TI probes differs between the original probeset assembly and the native-alien probeset assembly, whereas the numbers of probes in the other five classes are the same in both. The native-alien probeset assembly takes the native-alien status of the probes into account: each unigene is represented by both its native probes (synthesized for that unigene) and the alien probes (identical probes synthesized for another unigene) that match it.
Of the ~11 M potential probes we received from Affymetrix, we had to select ~6.4 M, i.e. discard ~4.6 M. Some of the 25-mer probes were 100% identical across different unigenes, so we decided to keep only one copy of each identical probe. As a result, such a probe was included in one probeset but missing from the other probesets it also matched. For SPP analysis we need a complete tiling path, so we restored the original situation by assigning these probes to every probeset they match; the result is the Native-Alien Probeset Assembly.
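The difference between the two assemblies can be illustrated with a toy model. This sketch assumes a hypothetical rule (the first probeset encountered keeps a shared probe), uses hypothetical function names, and lets short strings stand in for 25-mers. The real selection of ~6.4 M probes involved more than duplicate removal, which is ignored here.

```python
from collections import defaultdict


def original_assembly(designed: dict[str, list[str]]) -> dict[str, list[str]]:
    """Keep one physical copy of each identical probe sequence.

    Hypothetical rule: the first probeset (in dict order) containing a
    sequence keeps it. Probesets whose probes were all taken by earlier
    probesets end up empty, i.e. unrepresented on the chip.
    """
    seen: set[str] = set()
    kept: dict[str, list[str]] = {}
    for probeset, probes in designed.items():
        kept[probeset] = []
        for seq in probes:
            if seq not in seen:
                seen.add(seq)
                kept[probeset].append(seq)
    return kept


def native_alien_assembly(
    designed: dict[str, list[str]], synthesized: dict[str, list[str]]
) -> dict[str, list[tuple[str, str]]]:
    """Assign every synthesized probe to every probeset whose tiling path
    contains its sequence, labelled 'native' (synthesized for this probeset)
    or 'alien' (synthesized for another probeset with the same sequence)."""
    owner = {seq: ps for ps, probes in synthesized.items() for seq in probes}
    result: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for probeset, probes in designed.items():
        for seq in probes:
            if seq in owner:  # only sequences that were actually synthesized
                label = "native" if owner[seq] == probeset else "alien"
                result[probeset].append((seq, label))
    return dict(result)


# Toy data: short strings stand in for 25-mers.
designed = {
    "ctgA": ["AAAC", "CCCG", "GGGT"],
    "ctgB": ["CCCG", "TTTA"],
    "ctgC": ["AAAC", "GGGT"],  # every probe also occurs in ctgA
}
orig = original_assembly(designed)
na = native_alien_assembly(designed, orig)
print(orig["ctgC"])  # []
print(na["ctgC"])    # [('AAAC', 'alien'), ('GGGT', 'alien')]
```

In this toy case ctgC mirrors the situation described in footnote 1 of Table 2: it has no probes of its own in the original assembly but is restored, through alien probes, in the native-alien assembly.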
The links in Table 1 lead to more information on this page about each probe class. Detailed characteristics of individual probes, or of subsets of probes, from the lettuce chip can be retrieved on the Probe Data page.
Table 1. Different classes of probes, with their numbers, on the Lettuce Chip.
Tiling (TI) probes:
The majority of the probes on the Lettuce Chip belong to the class of TI probes. These are the probes used in the SPP detection analysis. Two probeset assemblies exist for the lettuce chip: the original probeset assembly and the native-alien probeset assembly, which is currently used for SPP detection. The TI probe class is the only one of the six classes affected by the choice of probeset assembly. Table 2 shows the differences between the two probeset assemblies, as well as the distribution of probes according to GC content. For the SPP analysis we currently use only probes with a GC content of 5 to 19 (i.e. 5 to 19 G or C bases per 25-mer).
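A minimal sketch of this GC filter, assuming that GC content is the number of G or C bases in the 25-mer and that both bounds are inclusive; the function names are hypothetical.

```python
def gc_count(probe: str) -> int:
    """Number of G or C bases in a probe (0 to 25 for a 25-mer)."""
    seq = probe.upper()
    return seq.count("G") + seq.count("C")


def usable_for_spp(probe: str, low: int = 5, high: int = 19) -> bool:
    """True if the probe's GC count lies in [low, high] (inclusive bounds assumed)."""
    return low <= gc_count(probe) <= high


probes = [
    "ACGT" * 6 + "A",  # 12 G/C bases -> kept
    "A" * 24 + "C",    # 1 G/C base   -> excluded
    "GC" * 12 + "G",   # 25 G/C bases -> excluded
]
kept = [p for p in probes if usable_for_spp(p)]
print([gc_count(p) for p in probes])  # [12, 1, 25]
```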
The CLS_S3 EST assembly was used to design the probes for the lettuce GeneChip. The FASTA sequences of all 36,962 sequences used for chip design (TI probes) are available in two files. The larger file contains 35,778 sequences: CLS_S3 EST assembly unigenes, GenBank sequences and genomic sequences from our laboratory (see footnote 2 of Table 2). The smaller file contains the 1,184 sequences that were duplicated on the GeneChip (see footnote 2 of Table 2); the FASTA headers in this file also contain the original CLS_S3 EST assembly identifier.
AffyLettChip_TI_ProbeSet_NativeAlien_35778.fasta.zip (8.4 MB)
AffyLettChip_TI_ProbeSet_Duplicates_1184.fasta.zip (0.4 MB)
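The two archives can be read with Python's standard library alone. This sketch assumes each zip contains plain-text FASTA file(s) and keys records by the full header line; the function name is hypothetical.

```python
import io
import zipfile


def read_fasta_from_zip(archive) -> dict[str, str]:
    """Return {header: sequence} for all FASTA records inside a zip archive.

    `archive` may be a path or a binary file-like object. The header is the
    full text after '>'; records with identical headers would overwrite
    each other. Directory entries and macOS metadata entries are skipped.
    """
    records: dict[str, str] = {}
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            if name.endswith("/") or name.startswith("__MACOSX/"):
                continue
            with zf.open(name) as raw:
                header, chunks = None, []
                for line in io.TextIOWrapper(raw, encoding="utf-8"):
                    line = line.strip()
                    if not line:
                        continue
                    if line.startswith(">"):
                        if header is not None:
                            records[header] = "".join(chunks)
                        header, chunks = line[1:], []
                    else:
                        chunks.append(line)
                if header is not None:
                    records[header] = "".join(chunks)
    return records
```

For example, `read_fasta_from_zip("AffyLettChip_TI_ProbeSet_Duplicates_1184.fasta.zip")` would be expected to return 1,184 records if the archive holds exactly the sequences described above.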
Table 2. Statistics of the TI probes on the Lettuce Chip.
1 The Native-Alien Probeset Assembly contains 44 more probesets than the Original Probeset Assembly. Every probe in these 44 probesets is a perfect match to a probe in another probeset. When duplicate probes were filtered during chip design, these probes were therefore not selected for synthesis, so the corresponding contigs were not represented on the chip. Using the Native-Alien Probeset Assembly allows us to bring these contigs back. The 44 contigs may represent the same biological genes as the contigs they share probes with but were not assembled correctly during EST clustering; alternatively, they may be paralogs.
The Native-Alien Probeset Assembly contains 1,281 probesets with a total of 12,337 sequences (25-mers), each represented by more than one probe; i.e. multiple probes with exactly the same nucleotide sequence were synthesized on the chip.
2 During chip design, 46,270 TI probes in 1,184 probesets, located in regions with putative SNPs, were duplicated. Each duplicate probeset has the same ID as its original except for the first two letters, which are 'LS' instead of the original first two letters. The numbers in Table 2 include the duplicated 'LS' probes.
Besides the unigenes from the EST assembly, we added 93 GenBank sequences (L. sativa) and 57 genomic sequences from our laboratory (L. sativa and L. serriola) to the chip.
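The FASTA headers of the duplicates file are the authoritative link between an 'LS' probeset and its original. As a fallback based only on the naming rule in footnote 2, the sketch below groups probeset IDs by everything after the first two characters. It assumes no original ID starts with 'LS'; the example IDs and the function name are invented.

```python
from collections import defaultdict


def map_ls_duplicates(probeset_ids: list[str]) -> dict[str, list[str]]:
    """Map each 'LS' probeset ID to the non-'LS' IDs that share everything
    after the first two characters. Assumes original IDs never start with 'LS'."""
    by_suffix: dict[str, list[str]] = defaultdict(list)
    for pid in probeset_ids:
        if not pid.startswith("LS"):
            by_suffix[pid[2:]].append(pid)
    return {
        pid: sorted(by_suffix.get(pid[2:], []))
        for pid in probeset_ids
        if pid.startswith("LS")
    }


# Hypothetical IDs, for illustration only.
ids = ["CLX00042", "LSX00042", "QGB00007", "LSB00007", "CLX00099"]
print(map_ls_duplicates(ids))
# {'LSX00042': ['CLX00042'], 'LSB00007': ['QGB00007']}
```

An empty list, or more than one candidate, indicates that the header-based mapping should be used instead.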
Figure 1 shows the distribution of the number of probes per unigene for the original and the native-alien probeset assemblies. For visualization purposes, the X-axis was cut off at 750 probes per unigene; the original and the native-alien probeset assemblies contain 14 and 20 unigenes, respectively, with more probes than that, and these are not shown. The maximum number of probes per unigene is 2,337 for the original probeset assembly and 2,443 for the native-alien probeset assembly. For the average number of probes per unigene, see Table 2.
Figure 1. Distribution of the number of probes per unigene for the original probeset assembly (red) and the native-alien probeset assembly (blue).
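The summary values quoted for Figure 1 (unigenes beyond the 750-probe cut-off, maximum and mean) can be computed from a probes-per-unigene table along these lines. The input format, function names and toy counts are hypothetical.

```python
def probe_count_summary(probes_per_unigene: dict[str, int], cutoff: int = 750) -> dict:
    """Summary statistics for a {unigene: number_of_probes} table."""
    counts = list(probes_per_unigene.values())
    return {
        "unigenes": len(counts),
        "mean": sum(counts) / len(counts),
        "max": max(counts),
        "above_cutoff": sum(1 for n in counts if n > cutoff),  # not plotted
    }


def clipped_for_plot(probes_per_unigene: dict[str, int], cutoff: int = 750) -> list[int]:
    """Counts that fall within the plotted range (up to the cut-off)."""
    return [n for n in probes_per_unigene.values() if n <= cutoff]


toy = {"u1": 12, "u2": 800, "u3": 2000, "u4": 40}  # invented counts
print(probe_count_summary(toy))
# {'unigenes': 4, 'mean': 713.0, 'max': 2000, 'above_cutoff': 2}
```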
Figure 2 shows the percentage of probes per GC content for the original and the native-alien probeset assemblies; the actual numbers are given in Table 2. The GC content of each probe is incorporated in the SPP detection algorithm, because a probe's hybridization signal is evaluated against the AntiGenomic (AG) probes with the same GC content.
Figure 2. Percentage of probes per GC content for the original probeset assembly (red) and the native-alien probeset assembly (blue).
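The percentages plotted in Figure 2 can be derived from raw probe sequences as sketched below, again assuming GC content is counted as the number of G or C bases per probe. The function name and toy sequences are invented.

```python
from collections import Counter


def gc_percentages(probes: list[str]) -> dict[int, float]:
    """Percentage of probes for each GC count (number of G or C bases)."""
    counts = Counter(p.upper().count("G") + p.upper().count("C") for p in probes)
    total = len(probes)
    return {gc: 100.0 * n / total for gc, n in sorted(counts.items())}


toy_probes = ["GGAA", "GCAA", "ATAT", "CCCA"]  # invented, not 25-mers
print(gc_percentages(toy_probes))  # {0: 25.0, 2: 50.0, 3: 25.0}
```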
Expression (EX) probes:
In addition to the TI probes, we designed some probes using the expression analysis probe design criteria. There are 4,719 of these EX probes, located in 1,734 of the TI probesets and in two additional probesets. We currently do not use the data from the EX probes.
Technical Replicate (TR) probes:
We chose 10 lettuce genes with low, moderate or high expression in most tissues to design control blocks distributed across the lettuce chip. The chip is arranged in a 13 × 13 grid, so the control block was replicated 169 times. Each block contains 10 probes for each of the 10 genes. Tables 3, 4 and 5 show the characteristics of these control blocks and probes.
Table 3. Control block layout.
Table 4. Control block gene ID, gene code, contig corresponding to gene, and Affymetrix quality score for each probe.
Table 5. Control block GC content of all probes.
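Assuming every one of the 169 blocks is complete, the replication scheme implies 16,900 TR features in total, with each of the 100 distinct TR probes present 169 times. The enumeration below uses a hypothetical labelling (block row, block column, gene, probe); the physical arrangement within a block is the one given in Table 3.

```python
from itertools import product

GRID_ROWS, GRID_COLS = 13, 13   # chip divided into a 13 x 13 grid
GENES_PER_BLOCK = 10            # 10 control genes
PROBES_PER_GENE = 10            # 10 probes per gene in each block

blocks = GRID_ROWS * GRID_COLS                          # 169 control blocks
probes_per_block = GENES_PER_BLOCK * PROBES_PER_GENE    # 100 probes per block
total_tr_features = blocks * probes_per_block           # 16,900 TR features


def control_layout():
    """Yield (block_row, block_col, gene, probe) for every TR feature.

    The 0-based gene/probe labels are hypothetical; the actual positions
    inside a block are given in Table 3.
    """
    yield from product(range(GRID_ROWS), range(GRID_COLS),
                       range(GENES_PER_BLOCK), range(PROBES_PER_GENE))


layout = list(control_layout())
# Each (gene, probe) combination occurs once per block: 169 technical replicates.
replicates_of_first = sum(1 for _, _, g, p in layout if (g, p) == (0, 0))
print(blocks, probes_per_block, total_tr_features, replicates_of_first)  # 169 100 16900 169
```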
Affymetrix control (AF) probes:
The lettuce chip contains the standard Affymetrix bacterial control probesets, including bioB, bioC and bioD from E. coli, cre from bacteriophage P1, and dap, lys, phe, thr and trp from B. subtilis. In total there are 81 probesets with 2,484 probes (1,251 Perfect Match (PM) probes and 1,233 Mismatch (MM) probes). These are the only probesets on the lettuce chip that use the PM-MM system (see also the AntiGenomic (AG) probes).
AntiGenomic (AG) probes:
Instead of the PM-MM system for background hybridization control, the lettuce chip uses a set of AntiGenomic (AG) probes designed by Affymetrix. The SPP detection algorithm uses these probes to determine whether a TI probe hybridizes at a high enough level compared with the AG probes of the same GC content. The 33,886 AG probes comprise 16,926 so-called RandomGC probes and 16,960 AdditionalGC probes. Table 6 shows the distribution of the AG probes according to GC content.
Table 6. Distribution of the AG probes according to GC content.
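The exact criterion used by the SPP detection algorithm is not described on this page. The sketch below only illustrates the idea of comparing each TI probe with the AG probes of the same GC content, using a hypothetical rule: a TI probe counts as hybridizing if its intensity exceeds the 95th percentile of AG intensities in its GC bin. The percentile, function names and data are assumptions.

```python
from collections import defaultdict


def ag_thresholds(ag_probes, percentile: int = 95) -> dict[int, float]:
    """Per-GC-bin threshold from AG probe intensities.

    ag_probes: iterable of (gc_count, intensity).
    Uses a simple rank-based percentile: the value at sorted position
    (percentile * n) // 100, capped at the last element.
    """
    by_gc: dict[int, list[float]] = defaultdict(list)
    for gc, intensity in ag_probes:
        by_gc[gc].append(intensity)
    thresholds = {}
    for gc, values in by_gc.items():
        values.sort()
        k = min(len(values) - 1, (percentile * len(values)) // 100)
        thresholds[gc] = values[k]
    return thresholds


def above_background(ti_probes, thresholds: dict[int, float]) -> list[str]:
    """IDs of TI probes brighter than the AG threshold for their GC count.

    ti_probes: iterable of (probe_id, gc_count, intensity). Probes whose GC
    count has no AG bin are skipped.
    """
    return [pid for pid, gc, x in ti_probes
            if gc in thresholds and x > thresholds[gc]]


# Invented intensities for two GC bins.
ag = [(10, float(v)) for v in range(1, 21)] + [(12, float(v)) for v in range(100, 120)]
ti = [("p1", 10, 25.0), ("p2", 10, 15.0), ("p3", 12, 150.0),
      ("p4", 12, 110.0), ("p5", 7, 1000.0)]
thr = ag_thresholds(ag)
print(thr)                        # {10: 20.0, 12: 119.0}
print(above_background(ti, thr))  # ['p1', 'p3']
```

Grouping by GC count before thresholding is the point of the AG design: probes with more G/C bases tend to give brighter background signal, so each TI probe is judged only against AG probes of its own composition.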
Affymetrix B2 Oligo Grid (QC) probes:
The lettuce chip contains 13,567 B2 oligo probes, located along the edges of the array, internally and in the corners. The GCOS software uses these probes to align the feature grid on the scanned image.