asein 1-23 peptide ([M + H]+ = 2764.55) and bovine insulin ([M + H]+ = 5730.61), achieving a peptide mass accuracy better than 75 ppm. For peptide analysis, mass spectra were acquired in positive reflector ion mode using Delayed Extraction (DE) technology. In reflector mode, resolution at full width at half maximum (FWHM) was normally ≥5000. Raw data were analyzed with the software supplied by the manufacturer.
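Mass accuracy in ppm is the deviation of a measured mass from its theoretical value, divided by the theoretical value and scaled by 10^6. The following is a minimal Python sketch of that arithmetic; the insulin [M + H]+ value comes from the text, while the "measured" reading is a hypothetical number used only for illustration.

```python
def ppm_error(measured: float, theoretical: float) -> float:
    """Signed mass error in parts per million (ppm)."""
    return (measured - theoretical) / theoretical * 1e6


# Calibrant [M + H]+ value quoted in the text.
INSULIN_MH = 5730.61

if __name__ == "__main__":
    measured_insulin = 5730.90  # hypothetical reading, not a value from this study
    err = ppm_error(measured_insulin, INSULIN_MH)
    print(f"error = {err:.1f} ppm; within 75 ppm: {abs(err) < 75}")
```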
2.10. Peptide Recognition
Signals in the mass spectra were assigned to the corresponding tryptic peptides on the basis of the molecular masses expected from the ewe α-La and β-Lg sequences and of the enzyme specificity, using several bioinformatics tools such as the GPmaw 5.0 software (Lighthouse data, Odense, Denmark) and other online resources. The peptide assignments were further confirmed by MS/MS fragmentation data.
2.11. Data Analysis
The GENEPOP program was used to estimate gene and haplotype frequencies both in each flock and in the whole sample.
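GENEPOP performed the actual estimates. Purely to illustrate the gene-counting principle behind haplotype frequencies within one flock, here is a sketch that assumes every individual's diplotype is known without ambiguity. It is not GENEPOP's implementation, and the flock counts are hypothetical.

```python
from collections import Counter


def haplotype_frequencies(diplotypes: dict[tuple[str, str], int]) -> dict[str, float]:
    """Haplotype frequencies by direct gene counting.

    Each individual carries two haplotypes, so a diplotype observed n times
    adds n copies of each of its two haplotypes (2n copies if homozygous).
    """
    copies = Counter()
    for (h1, h2), n in diplotypes.items():
        copies[h1] += n
        copies[h2] += n
    total = sum(copies.values())  # equals 2 x number of individuals
    return {hap: count / total for hap, count in copies.items()}


if __name__ == "__main__":
    # Hypothetical flock of 40 ewes (illustrative counts, not data from this study).
    flock = {("Y", "Y"): 10, ("Y", "H"): 20, ("H", "H"): 8, ("Y", "HY"): 2}
    print(haplotype_frequencies(flock))
```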
The COGNOSAG nomenclature guidelines were followed throughout this paper for naming loci. Thus, locus names were written as a combination of Latin letters and Arabic numerals indicating the gene position in the cluster, with the numerals placed immediately after the gene stem symbol and no space between letters and numbers.
3. RESULTS AND DISCUSSION
Following the scientific method, the results of the investigation can be described in four steps. First, the observation and description of a phenomenon, namely the presence of four quantitative patterns of the β-LGA and β-LGB morphs, observed after phenotyping all samples by PAGIF and RP-HPLC. Second, once the two molecular species had been analytically identified, the theoretical tools for understanding the phenomenon had to be chosen in order to formulate a hypothesis on the presence of CNVs at the BLG locus. Third, on the basis of this hypothesis, the existing gene and genotype arrangements had to be inferred and the expected results compared with the observations. Finally, based on population data, Hardy-Weinberg equilibrium and other population parameters were tested to evaluate whether the numbers of the suggested genotype arrangements are consistent with the expected probabilities. The relevant points of each step are described in the following subsections.
3.1. β-LG Polymorphism
In Figure 1, the common BLGAB phenotype, characterized by a 50/50 ratio, is compared by RP-HPLC analysis with the three phenotypes exhibiting aberrant expression ratios. As previously described, the six samples showing an aberrant phenotype were analyzed together with two normal ones, in order to check whether the molecular species in the aberrant HPLC fractions could be ascribed to already known β-LG variants or to new ones.
The protein components were assigned according to their chromatographic order of elution, and the identification was confirmed by ESI-MS (Table 1). Consistent with the relative hydrophobicity of the whey proteins, the elution pattern showed the α-LA peak, preceded by a minor amount of glycosylated α-LA and followed first by β-LGB and then by β-LGA.
The average peak abundances of the major whey proteins, measured by integration of the UV chromatograms at λ = 220 nm, are reported in Table 2. The percentage values do not include the contribution of glycosylated α-LA. The relative amount of the two β-LG variants, expressed as the β-LGB/β-LGA ratio and averaged over 96 normally polymorphic samples, was around 50/50. Unexpectedly, extremely disproportionate β-LGB/β-LGA ratios were observed for samples II and III (Figure 1), reaching the outermost values of 86.6/13.4 and 93.6/6.43 (Table 2), respectively.
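The percentages in Table 2 are relative peak areas. A minimal sketch of that normalization follows, assuming that all components have the same UV response at 220 nm (the text does not discuss response factors, so this is an assumption). The dictionary keys and peak areas are hypothetical.

```python
def whey_profile(areas: dict[str, float]) -> dict[str, float]:
    """Percentages of the major whey proteins from integrated peak areas.

    Hypothetical keys: 'glyco_aLA', 'aLA', 'bLGB', 'bLGA'. The glycosylated
    alpha-LA peak is excluded from the total, as stated in the text.
    """
    major = {k: v for k, v in areas.items() if k != "glyco_aLA"}
    total = sum(major.values())
    profile = {k: 100.0 * v / total for k, v in major.items()}
    blg = major["bLGB"] + major["bLGA"]
    profile["B_share_of_bLG"] = 100.0 * major["bLGB"] / blg  # B part of the B/A ratio
    profile["bLG_to_aLA"] = blg / major["aLA"]               # beta-LG/alpha-LA ratio
    return profile


if __name__ == "__main__":
    # Hypothetical peak areas (arbitrary units), not data from this study.
    print(whey_profile({"glyco_aLA": 5.0, "aLA": 25.0, "bLGB": 60.0, "bLGA": 15.0}))
```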
Based on these observations, attention was also focused on the four samples characterized by a ratio of about 1.3 (around 56/44); the chromatogram of one of them (sample IV) is shown in Figure 1.
The HPLC-ESI MS analysis of samples II and III confirmed that the very low-abundance peaks occurring at the retention time of β-LGA contained a protein with the expected MW of β-LGA.
Table 1. Molecular weights (MW) of α-LA, β-LGA and β-LGB obtained by averaging the experimental measurements (m) of the eight analyzed ovine milk samples, compared with the respective theoretical values (Th); the signals in the mass spectra were associated with the corresponding tryptic peptides based on the expected molecular mass from the ewe α-La and β-Lg sequences.
Table 2. Section a: β-lactoglobulin output of the BLG genes (β-LGA = A and β-LGB = B) based on the area percentage of the corresponding peak in the HPLC chromatogram; the third column gives the β-LGA/β-LGB ratio values evaluated in ovine β-LG polymorphic samples. Section b: total percentage values (%) of β-LG and α-LA and their ratio.
To confirm the assignment by MALDI-TOF MS peptide mass mapping, the proteins were isolated by a separate RP-HPLC run, reduced, alkylated and digested with trypsin. As an example, Figure 2 shows the MALDI spectra of the tryptic digests of the two β-LG variants isolated from sample II. The assignment of the peptide ion signals, reported in Figure 2, confirmed the β-LGA identity, which differs from the B variant only by the single amino acid substitution His20→Tyr20.
The amino acid substitution was evident from the mass shift of the signal at m/z 2682.4 in the spectrum of variant B, replaced by m/z 2707.2 in the spectrum of variant A (Figure 2). Section (a) of Table 2 gives the output of the BLG genes inferred from the RP-HPLC chromatographic peaks, while section (b) shows the β-LG percentage values relative to α-LA evaluated in ovine β-LG polymorphic milk samples.
3.2. Differential Expression of Ovine β-LG Variants in Heterozygous Individuals
The results recorded in the β-LG polymorphic individuals might be explained by a non-allelic polymorphism at the BLG locus. In particular, they suggest one triplicated and two duplicated haplotypes. The pattern of quantitative variation indicates a normal condition of a duplicated gene arrangement, composed of two copies of the same gene (BLGA or BLGB), and a rare triplicated one in which a third gene, BLGA, follows a pair of BLGB genes.
Table 3 shows the assignment of the relative position of each gene in the ovine BLG haplotypes, based on the model of expression gradient recorded in the α-globin gene cluster [31,32]. The regulation of expression in multigene families appears to be consistent with the hypotheses of the b-transcriptional regulation of the BLG gene and of the presence of the activator protein-2 transcription factor as a modulator of β-LG gene expression [33,34]. The proposed assessment of BLG haplotypes is shown in Figure 3.
Figure 2. MALDI spectra of the tryptic digests of the two β-LG variants isolated from sample II, compared with one another. The assignment of the peptide ion signals confirmed the β-LGA identity, which differs from the β-LGB variant only by the single amino acid substitution His20→Tyr20. The substitution was evident from the mass shift of the base peak at m/z 2682.4 in the spectrum of variant B, replaced by m/z 2707.2 in the spectrum of variant A.
Table 3. BLG haplotypes and diplotypes based on their expression determined by the relative protein proportions.
Moreover, examination of the data in Section (b) of Table 2 revealed an apparent surplus of protein in the triplicated BLG haplotypes, which may suggest over-expression related to the increased gene dosage. In fact, there is evidence that increased copy numbers of a particular gene enable the synthesis of additional protein [32,35,36]. The association between the number of BLG genes and the β-Lg/α-La ratio was therefore checked by regression analysis. The two variables seem to be positively and linearly correlated, although, owing to the small number of samples with supernumerary BLG genes, the regression coefficient does not appear to be significant.
Generally speaking, nomenclature is a necessity in organizing information about biodiversity; in the case of CNVs involving functional genes, the problem concerns not so much the single gene arrangements (haplotypes) as the diplotypes, which risk becoming very confusing. In our previous experience with α-globin gene arrangements, for the sake of simplicity, the haplotypes were named according to the single-letter amino acid (a.a.) code of the point mutation characterizing the genes in the haplotype. Similarly, the duplicated BLG arrangements might be tentatively named Y and H: the former is composed of two copies of the BLGA gene, encoding β-LGA, which carries tyrosine (Y) at position 20, while the latter is composed of two BLGB genes with histidine (H) at codon 20. Accordingly, the triplicated haplotype, composed of 2BLGB + 1BLGA, was named HY. Figure 3 represents the three suggested haplotypes, while the diplotypes found are listed in Tables 3 and 4.
Figure 3. BLG haplotypes as proposed according to the quantitative polymorphism recorded and to the model of expression gradient assessed in the α-globin gene cluster. The gene efficiency (%) is indicated in parentheses. β-lactoglobulin genes are indicated by the usual name of the gene products. β-lactoglobulin haplotypes are indicated by the capital letter corresponding to the relevant point mutation of the genes in the cluster (H = His20; Y = Tyr20).
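The naming scheme can be expressed as a small rule over the ordered gene arrangement. The sketch below uses a hypothetical rule (take the codon-20 one-letter code of each gene and collapse runs of identical consecutive codes) that reproduces the three names Y, H and HY; it is an illustration, not a rule stated by COGNOSAG or by the text.

```python
# Residue at position 20 encoded by each BLG gene (from the text):
# BLGA -> beta-LGA with Tyr20 (Y); BLGB -> beta-LGB with His20 (H).
RESIDUE_20 = {"BLGA": "Y", "BLGB": "H"}


def haplotype_name(genes: list[str]) -> str:
    """Name a haplotype from its ordered genes (upstream gene first).

    Hypothetical rule: map each gene to its one-letter codon-20 code and
    collapse runs of identical consecutive codes, so BLGB + BLGB gives 'H'.
    """
    letters: list[str] = []
    for gene in genes:
        code = RESIDUE_20[gene]
        if not letters or letters[-1] != code:
            letters.append(code)
    return "".join(letters)


if __name__ == "__main__":
    for arrangement in (["BLGA", "BLGA"], ["BLGB", "BLGB"], ["BLGB", "BLGB", "BLGA"]):
        print(" + ".join(arrangement), "->", haplotype_name(arrangement))
```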
3.4. BLG Haplotype Frequency
Genotype frequencies at the BLG system in the different flocks and in the whole sample are shown in Table 4. All the flocks were polymorphic for the BLG system, although differences in genotype and haplotype frequencies between flocks were observed. The BLGY haplotype generally predominated, with an overall frequency of about 0.59. Remarkable differences in the distribution of the three haplotypes were also evident; for example, the HY frequency ranged from 0.0 to 0.09 in different flocks. In the overall sample, 37% of individuals were homozygous for haplotype H and 15% were homozygous for haplotype Y; of the remaining 51% of individuals, 47% exhibited β-LGA and β-LGB in a 1/1 ratio, while 3% showed a different quantitative polymorphism (Table 4). The H/H and Y/Y homozygotes expressed 100% β-LGB and 100% β-LGA, respectively, while the Y/H heterozygotes expressed 50% β-LGA and 50% β-LGB. In the triplicated haplotype, based on the expression gradient [27,32], the IBLG, IIBLG and IIIBLG genes are expected to account for 30%, 12% and 6% of total β-LG expression, respectively (Figure 3). Triplication implies that in the HY haplotype β-LGB is encoded by the upstream genes (IBLG and IIBLG) and β-LGA by the downstream gene (IIIBLG). The diplotype arrangements therefore account for the observed quantitative polymorphism (Table 3).
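The expected β-LGA/β-LGB proportions of each diplotype follow from summing the contributions of its two haplotypes. The sketch below encodes the gene efficiencies given for the HY haplotype (30% + 12% β-LGB, 6% β-LGA) and assumes, as implied by the homozygote and Y/H profiles, that each duplicated haplotype supplies 50% in total. It also assumes that the sum is rescaled to 100% of β-LG; both assumptions are ours, not statements from the text.

```python
# Share (%) of total beta-LG contributed by each haplotype, split by variant.
# Y and H: assumed 50% in total (both copies encode the same variant).
# HY: IBLG 30% + IIBLG 12% (beta-LGB) and IIIBLG 6% (beta-LGA), from the text.
HAPLOTYPE_OUTPUT = {
    "Y": {"A": 50.0},
    "H": {"B": 50.0},
    "HY": {"B": 30.0 + 12.0, "A": 6.0},
}


def expected_profile(h1: str, h2: str) -> dict[str, float]:
    """Expected beta-LGA/beta-LGB percentages for diplotype h1/h2.

    The two haplotype contributions are summed and rescaled to 100%
    (assumption: observed ratios are expressed relative to total beta-LG).
    """
    raw = {"A": 0.0, "B": 0.0}
    for hap in (h1, h2):
        for variant, share in HAPLOTYPE_OUTPUT[hap].items():
            raw[variant] += share
    total = raw["A"] + raw["B"]
    return {variant: 100.0 * amount / total for variant, amount in raw.items()}


if __name__ == "__main__":
    for dip in (("Y", "Y"), ("H", "H"), ("Y", "H"), ("Y", "HY"), ("H", "HY"), ("HY", "HY")):
        p = expected_profile(*dip)
        print(f"{dip[0]}/{dip[1]}: B {p['B']:.1f}%  A {p['A']:.1f}%")
```

Under these assumptions the sketch gives B/A values of 87.5/12.5 for HY/HY, about 93.9/6.1 for H/HY and about 42.9/57.1 for Y/HY, which can be set against the ratios reported in Table 2.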
As to the qualitative polymorphism, that is, the occurrence of the β-LG A and B variants in the sheep examined in this study, the results, obtained with a convenience sampling method, are not necessarily representative of the examined breeds. Nevertheless, they are similar to those reported for Mediterranean dairy sheep populations. In particular, the Valle del Belice frequency values agree with Giaccone et al., and Comisana exhibits values similar to those reported by Chiofalo et al.
Comparing the data in Tables 4 and 5, some discrepancies between the observed and expected diplotype frequencies may be noticed. In particular, in both the Comisana and the Valle del Belice samples only the Y/HY diplotype was found, the deviation from Hardy-Weinberg equilibrium presumably being caused by the non-random mating practiced under field conditions. Moreover, given the HY haplotype frequency, no HY/HY homozygotes would be expected. Conversely, the HY/HY homozygote in the Laticauda sample was found together with an H/HY heterozygote in the same small flock, suggesting that this local concentration of the haplotype might result from the combined effects of non-random mating and inbreeding.
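The expected diplotype frequencies in Table 5 follow from the haplotype frequencies under Hardy-Weinberg assumptions: p_i^2 for homozygotes and 2 p_i p_j for heterozygotes. A minimal sketch of that calculation, with a Pearson chi-square statistic, is shown below; the haplotype frequencies in the usage are hypothetical. With a rare haplotype such as HY, expected counts are tiny, so the chi-square approximation is unreliable and exact tests, such as those available in GENEPOP, are preferable.

```python
from collections import Counter
from itertools import combinations_with_replacement


def hwe_expected(freqs: dict[str, float]) -> dict[tuple[str, str], float]:
    """Expected diplotype frequencies under Hardy-Weinberg equilibrium.

    Homozygotes: p_i ** 2; heterozygotes: 2 * p_i * p_j. Keys are sorted pairs.
    """
    expected = {}
    for h1, h2 in combinations_with_replacement(sorted(freqs), 2):
        p, q = freqs[h1], freqs[h2]
        expected[(h1, h2)] = p * p if h1 == h2 else 2 * p * q
    return expected


def hwe_chi_square(observed: dict[tuple[str, str], int], freqs: dict[str, float]) -> float:
    """Pearson chi-square of observed diplotype counts against HWE expectations."""
    obs = Counter()
    for pair, count in observed.items():
        obs[tuple(sorted(pair))] += count  # treat Y/H and H/Y as the same diplotype
    n = sum(obs.values())
    stat = 0.0
    for pair, f in hwe_expected(freqs).items():
        e = n * f
        if e > 0:
            stat += (obs[pair] - e) ** 2 / e
    return stat


if __name__ == "__main__":
    # Hypothetical haplotype frequencies and flock size, not data from this study.
    freqs = {"Y": 0.60, "H": 0.35, "HY": 0.05}
    for pair, f in hwe_expected(freqs).items():
        print("/".join(pair), f"expected count in 30 ewes: {30 * f:.2f}")
```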
Table 4. Diplotype frequency values recorded in southern Italian sheep. The last row shows the overall weighted frequency (All W) of each diplotype.
Table 5. Haplotype frequency values recorded in southern Italian sheep and diplotype expected frequencies calculated on the basis of the observed haplotype frequencies. The last row shows overall weighted frequency (All W) of each haplotype or diplotype.
In the introductory section, based on experimental and theoretical considerations, we formulated the hypothesis that the BLG locus is most likely duplicated. Taken together, the results of the investigation on ovine β-LG qualitative and quantitative polymorphism seem to confirm this assumption.
However, in our previous investigations on the α-globin gene family in buffalo, cattle and horse [41-43], the proteomic approach allowed us to detect a range of quantitative phenotypes associated with differently duplicated gene arrangements. Conversely, the phenotypic data in this investigation support BLG duplication only in the case of a strong disproportion between the A and B morphs. The duplicated BLG in sheep might then be a tandemly duplicated pair of the same BLG gene, either A or B, resulting in a 1:1 ratio (the same as for alleles), which may be why BLG has always been regarded as a single-copy gene. The findings on quantitative polymorphism, which almost overlap the expression pattern of the α-globin triplicated haplotypes, therefore open new vistas on the BLG genetic system. In particular, we infer that the HY triplicated haplotype does exist and that it is also spread across different unrelated populations.
The results presented in this work and the related inferences are in line with recent genomic sequence data providing substantial evidence for the abundance of duplicated genes in all organisms examined. Duplication events produce additional copies of genomic information, possibly including one or more genes. Gene duplication generates functional redundancy. The presence of duplicate genes may also confer a selective advantage simply because extra amounts of protein or RNA products are provided. This applies primarily to strongly expressed genes encoding high-demand products, and BLG may be considered one of them. An analysis of mammalian gene sequences suggested that gene duplication is as important as redundant metabolic networks. Hypotheses explaining how two paralogous genes maintain the same function after duplication generally invoke a history of concerted evolution, mediated by gene conversion and/or unequal crossing over [46-48]. However, the population genetic analysis by Hurst and Smith also suggested that the conditions under which gene conversion is selectively favored are relatively restrictive. In line with these conclusions, Nei and co-workers [50,51], re-examining several large gene families previously thought to be under concerted evolution, suggested that purifying selection is much more important than gene conversion in maintaining the common functions of these duplicated genes. To date, most mammals have been shown to possess a tandemly duplicated pair of adult α-globin genes that have identical coding sequences and therefore encode identical polypeptides [52,53].
In conclusion, while not underestimating the key importance of molecular genetic approaches in detecting and confirming CNVs (whose methods have recently been reviewed), phenotypic characterization by advanced proteomic techniques proves to be a valuable complementary or alternative approach for inferring information about copy number polymorphisms.