US20080233565A1

US20080233565A1 - PKHDL1, a homolog of the autosomal recessive kidney disease gene

Info

Publication number: US20080233565A1
Application number: US11/202,548
Authority: US
Inventors: Peter C. Harris; Marie C. Hogan; Christopher J. Ward; Matthew D. Griffin
Original assignee: Mayo Foundation for Medical Education and Research
Current assignee: Mayo Foundation for Medical Education and Research
Priority date: 2003-02-12
Filing date: 2005-08-12
Publication date: 2008-09-25
Also published as: WO2004072268A3; WO2004072268A2; US20110256558A1; US20100273190A1; US8323916B2

Abstract

Nucleic acids encoding fibrocystin-L polypeptides and fibrocystin-L polypeptides are provided. Antibodies against the polypeptides, vectors and host cells containing the nucleic acids, methods for using the nucleic acids and polypeptides, and compositions and articles of manufacture also are provided.

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation-in-part and claims benefit under 35 U.S.C. § 119(a) of International Application No. PCT/US2004/004300 having an International Filing Date of Feb. 12, 2004, which published in English as International Publication Number WO 2004/072268, and which claims priority to U.S. Provisional Application Ser. No. 60/446,860, filed Feb. 12, 2003.

STATEMENT AS TO FEDERALLY SPONSORED RESEARCH

Funding for the work described herein was provided in part by the federal government: National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) grants DK58816, DK59597, and DK59505. The federal government may have certain rights in the invention.

TECHNICAL FIELD

This invention relates to PKHDL1, a homolog of the autosomal recessive kidney disease gene, and more particularly, to PKHDL1 nucleic acids and polypeptides, and variants thereof.

BACKGROUND

Autosomal recessive polycystic kidney disease (ARPKD) is an important renal cause of perinatal death and of childhood renal failure. The disease typically presents in the neonatal period with greatly enlarged kidneys due to fusiform dilation of collecting ducts; congenital hepatic fibrosis is often a major complication in older patients. Progress toward understanding this complex disorder has recently been made through the identification of the disease-causing gene, PKHD1, in chromosome region 6p12. PKHD1 is a very large gene (~470 kb) containing 67 exons and an open reading frame (ORF) of 12,222 bp. PKHD1 has a tissue-specific expression pattern, with the highest levels in fetal and adult kidney and lower levels in liver, pancreas and lung. The murine ortholog, Pkhd1, has recently been described.
A notable feature of both the human and murine genes is that multiple different splice forms may be generated. Visualization of PKHD1 transcripts by Northern analysis has proved difficult, often yielding a smear of products. This smear may reflect multiple splice forms, unusual sensitivity of the transcript to degradation, or a combination of these factors. In situ hybridization of the murine transcript showed expression in the developing kidney and mature collecting ducts, plus the ductal plate and bile ducts in the liver. Other sites of expression during development detected by in situ analysis were large vessels, testis, sympathetic ganglia, pancreas and trachea, with evidence that some sites may express specific splice forms.
The PKHD1-encoded protein, fibrocystin, is large (4074 aa) and predicted to be an integral membrane protein with a large extracellular region and a short cytoplasmic tail. Fibrocystin is not closely related to any other characterized protein, although it contains multiple copies of a defined domain and has regions of homology to other proteins; it seems to represent the founder member of a new protein family. The only well-characterized domain in fibrocystin is the TIG/IPT (immunoglobulin-like fold shared by plexins and transcription factors) that is also found in the hepatocyte growth factor receptor (HGFR), plexins and other receptor molecules. Although fibrocystin has many more copies of this domain than these other proteins, the presence of the TIG domain, along with the structure of the protein, suggested that it may also act as a receptor.

SUMMARY

The invention is based on the identification, cloning, and sequence analysis of PKHDL1 and Pkhdl1, human and murine homologs, respectively, of the ARPKD gene PKHD1. The PKHD1 homologs encode fibrocystin-L, a large receptor protein (approximately 466 kDa) that contains a signal peptide, a single transmembrane domain, and a short cytoplasmic tail. Fibrocystin-L has low, but highly significant, homology to fibrocystin over the entire length of the protein, except the extreme C-terminal region containing the predicted transmembrane domain and cytoplasmic tail. This level of homology is greater than that seen between polycystin homologs, establishing the fibrocystins as a new protein family. PKHDL1 expression is up-regulated specifically in T lymphocytes and may have a role in cellular immunity. PKHDL1 expression also is up-regulated in endometrial cancer and other cancers, including breast, ovarian, and colon cancers.
In one aspect, the invention features an isolated nucleic acid that includes a sequence encoding a fibrocystin-L polypeptide. The fibrocystin-L polypeptide can be encoded by SEQ ID NO:1 or SEQ ID NO:2, and can include the amino acid sequence of SEQ ID NO:3 or SEQ ID NO:4. The fibrocystin-L polypeptide can include an amino acid sequence variant at a position selected from the group consisting of: position 702, position 1192, position 1199, position 1223, position 1514, position 1607, position 1638, position 3050, position 3607, and position 4220 of SEQ ID NO:3. For example, the amino acid sequence variant can be selected from the group consisting of: Pro at position 702, Ala at position 1192, Ser at position 1199, Val at position 1223, Ser at position 1514, Ile at position 1607, Cys at position 1638, Gln at position 3050, Glu at position 3607, and Ile at position 4220. The isolated nucleic acid can include a sequence variant with respect to SEQ ID NO:1, e.g., a sequence variant at a position selected from the group consisting of: position 1227, position 1404, position 1920, position 1965, position 2105, position 3574, position 3599, position 3668, position 4540, position 4819, position 4913, position 6621, position 9084, position 9150, position 10821, and position 12658 of SEQ ID NO:1. The sequence variant can be selected from the group consisting of: A at position 1227, T at position 1404, C at position 1920, G at position 1965, C at position 2105, G at position 3574, C at position 3599, T at position 3668, A at position 4540, A at position 4819, G at position 4913, G at position 6621, T at position 9084, G at position 9150, A at position 10821, and A at position 12658.
In another aspect, the invention features an isolated nucleic acid encoding a fibrocystin-L polypeptide, wherein the nucleic acid includes at least 300 contiguous nucleotides of SEQ ID NO:1 or a sequence variant thereof. The invention also features a vector that includes such isolated nucleic acids and host cells including the vector.
The invention also features an isolated nucleic acid 10 to 1700 nucleotides in length, the nucleic acid including a sequence, the sequence including one or more sequence variants relative to the sequence of SEQ ID NO:1, wherein the sequence is at least 80% identical over its length to the corresponding sequence in SEQ ID NO:1. The sequence variant can be at a position selected from the group consisting of: position 1227, position 1404, position 1920, position 1965, position 2105, position 3574, position 3599, position 3668, position 4540, position 4819, position 4913, position 6621, position 9084, position 9150, position 10821, and position 12658 of SEQ ID NO:1.
In yet another aspect, the invention features a plurality of oligonucleotide primer pairs (e.g., at least three, 13, 16, or 23 primer pairs), wherein each primer is 10 to 50 nucleotides in length, and wherein each primer pair, in the presence of mammalian genomic DNA and under polymerase chain reaction conditions, produces a nucleic acid product corresponding to a region of a PKHDL1 nucleic acid molecule, wherein the product is 30 to 1700 nucleotides in length. The nucleic acid product can include a nucleotide sequence variant relative to SEQ ID NO:1.
The invention also features a composition that includes a first oligonucleotide primer and a second oligonucleotide primer, wherein the first oligonucleotide primer and the second oligonucleotide primer are each 10 to 50 nucleotides in length, and wherein the first and second primers, in the presence of mammalian genomic DNA and under polymerase chain reaction conditions, produce a nucleic acid product corresponding to a region of a PKHDL1 nucleic acid molecule, wherein the product is 30 to 1700 nucleotides in length. The nucleic acid product can include a nucleotide sequence variant relative to SEQ ID NO:1. Isolated nucleic acids that include the nucleotide sequence of SEQ ID NO:1 or SEQ ID NO:2, or the complement of SEQ ID NO:1 or SEQ ID NO:2, also are featured.
In yet another aspect, the invention features an antibody having specific binding affinity for a fibrocystin-L polypeptide. Such antibodies can be used to detect fibrocystin-L in biological samples.
The invention also features a method for determining if a subject has altered cellular immunity. The method includes providing a nucleic acid sample (e.g., genomic DNA) from the subject, and determining whether the nucleic acid sample contains one or more sequence variants within the PKHDL1 gene of the subject relative to a wild-type PKHDL1 gene, wherein the presence of the one or more sequence variants is associated with altered cellular immunity in the subject. The determining step can be performed by denaturing high performance liquid chromatography or direct sequencing. The variant can be at position 2105, position 3574, position 3599, position 3668, position 4540, position 4913, or position 9150 of SEQ ID NO:1, or other positions. The method further can include identifying the sequence variant by DNA sequencing.
In yet another aspect, the invention features an article of manufacture that includes a substrate, wherein the substrate includes a population of isolated nucleic acid molecules, wherein each nucleic acid molecule is 10 to 1000 nucleotides in length, wherein each nucleic acid molecule includes a different nucleotide sequence variant relative to the sequence of SEQ ID NO:1, and wherein each nucleic acid molecule is at least 80% identical over its length to the corresponding sequence in SEQ ID NO:1.
The invention also features a method for monitoring the immune response of a patient after vaccination. The method includes a) providing a biological sample from the patient after vaccination; b) determining the number of fibrocystin-L expressing T-cells in the biological sample; and c) comparing the number of fibrocystin-L expressing T-cells to a baseline number of fibrocystin-L expressing T-cells before vaccination.
The invention also features a method for detecting endometrial cancer. The method includes detecting the level of fibrocystin-L expression in a biological sample (e.g., endometrial tissue sample) from a patient. An increase in the level of fibrocystin-L in the sample relative to the level in a corresponding control sample is indicative of the presence of endometrial cancer. Fibrocystin-L expression can be assessed by detecting the polypeptide (e.g., by immunohistochemistry, western blotting, or an ELISA) or by analysis of mRNA levels.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although methods and materials similar or equivalent to those described herein can be used to practice the invention, suitable methods and materials are described below. All publications, patent applications, patents, and other references mentioned herein are incorporated by reference in their entirety. In case of conflict, the present specification, including definitions, will control. In addition, the materials, methods, and examples are illustrative only and not intended to be limiting.
Other features, objects, and advantages of the invention will be apparent from the description and drawings, and from the claims.

DESCRIPTION OF DRAWINGS

FIG. 1 depicts expression analysis of PKHDL1/Pkhdl1. FIG. 1A is an autoradiograph of a murine multiple-tissue (adult) Northern blot, hybridized with Pkhdl1 and showing weak smears in most lanes. FIGS. 1B-1D are RT-PCR analyses of (B) newborn and adult murine tissues with Pkhdl1 and a β-actin control, showing widespread low-level expression of Pkhdl1; (C) human cell lines (SW13, adrenal carcinoma; ACHN, renal adenocarcinoma; HEK293, embryonic kidney; HT29, colonic adenocarcinoma; G-CCM, astrocytoma; Fib, skin fibroblasts; G-401, renal Wilms' tumor; Hep 3B, hepatoma; HeLa, cervical carcinoma; K562, erythroleukemia; lymph, EBV-transformed B-lymphocytes; and MCF-7, breast cancer) with PKHDL1 and GAPDH; and (D) various murine leukocyte populations, as indicated, with 30 and 40 cycles of PCR for Pkhdl1 and a β-actin control. NK=natural killer cells and DCs=dendritic cells.

FIG. 2A is a Clustal W alignment of human fibrocystin (fib) (SEQ ID NO: 5) and human fibrocystin-L (fibL) (SEQ ID NO: 3) from their N-termini to positions 3849 aa and 4185 aa, respectively. Black boxes show identities and shaded boxes similarities. Conserved domains (TIG; thin lines) and defined regions of homology (TMEM; thick lines) are indicated by a dashed line for fibrocystin-L and a solid line for fibrocystin. A “♦” indicates the start and stop of the dashed line. The predicted signal peptide cleavage sites are indicated with arrowheads.

FIG. 2B is an alignment of the 14 TIG domains of fibrocystin-L (fibL) (SEQ ID NO: 6-19) compared to fibrocystin (fib) TIG 5 (SEQ ID NO: 20), HGFR (murine) TIG 1 (SEQ ID NO: 21), plexin (murine) TIG 2 (SEQ ID NO: 22) and a receptor TIG consensus (SEQ ID NO: 23).

FIG. 2C is an alignment of the TMEM-A (A) and -B (B) regions of fibrocystin (fib) (SEQ ID NOS: 24 and 25) and fibrocystin-L (fibL) (SEQ ID NOS: 26 and 27) to a hypothetical protein from the bacterium Chloroflexus aurantiacus (chlor) (SEQ ID NO: 28) and human proteins of unknown function, TMEM2 (TMEM) (SEQ ID NO: 29) and XP051860 (51860) (SEQ ID NO: 30). These human proteins are gapped by the removal of 165 aa and 251 aa, respectively, as shown.

FIG. 3 is a diagram comparing the proposed structures of fibrocystin and fibrocystin-L.

FIG. 4 is the nucleotide sequence of the PKHDL1 cDNA (SEQ ID NO:1).

FIG. 5 is the nucleotide sequence of the Pkhdl1 cDNA (SEQ ID NO:2).

FIG. 6 is the amino acid sequence of human fibrocystin-L polypeptide (SEQ ID NO:3).

FIG. 7 is the amino acid sequence of murine fibrocystin-L polypeptide (SEQ ID NO:4).

FIG. 8A is a schematic of the N-terminal fusion expression construct of fibrocystin-L (371 amino acids of fibrocystin-L).

FIG. 8B is a schematic of the PET43 vector.

FIG. 8C is a schematic of the PK-FLAG tagged fibrocystin-L expressed in PEAK cells.

FIG. 8D is a schematic of the antibody-screening strategy using PEAK cell membranes by western blot.

DETAILED DESCRIPTION

In general, the invention features PKHDL1 nucleic acids and polypeptides. As used herein, "PKHDL1 nucleic acids" refers to both the human PKHDL1 gene and the murine Pkhdl1 gene. PKHDL1 nucleic acids can encode fibrocystin-L polypeptides. Identification of fibrocystin-L should greatly aid understanding of the structure and function of the fibrocystin protein family. Furthermore, PKHDL1 nucleic acids may have a role in cellular immunity, because they can be specifically up-regulated in T lymphocytes following activation. They may also have a role in cancers such as endometrial cancer: expressed PKHDL1 nucleic acids are overrepresented in endometrial adenocarcinomas, and fibrocystin-L expression is up-regulated in endometrial cancer relative to normal endometrial tissue.

1. Isolated PKHDL1 Nucleic Acid Molecules

As used herein, the term “nucleic acid” refers to both RNA and DNA, including cDNA, genomic DNA, and synthetic (e.g., chemically synthesized) DNA. The nucleic acid can be double-stranded or single-stranded (i.e., a sense or an antisense single strand). As used herein, “isolated nucleic acid” refers to a nucleic acid that is separated from other nucleic acid molecules that are present in a mammalian genome, including nucleic acids that normally flank one or both sides of the nucleic acid in a mammalian genome (e.g., nucleic acids that flank a PKHDL1 gene). The term “isolated” as used herein with respect to nucleic acids also includes any non-naturally-occurring nucleic acid sequence, since such non-naturally-occurring sequences are not found in nature and do not have immediately contiguous sequences in a naturally-occurring genome.
An isolated nucleic acid can be, for example, a DNA molecule, provided one of the nucleic acid sequences normally found immediately flanking that DNA molecule in a naturally-occurring genome is removed or absent. Thus, an isolated nucleic acid includes, without limitation, a DNA molecule that exists as a separate molecule (e.g., a chemically synthesized nucleic acid, or a cDNA or genomic DNA fragment produced by PCR or restriction endonuclease treatment) independent of other sequences as well as DNA that is incorporated into a vector, an autonomously replicating plasmid, a virus (e.g., a retrovirus, lentivirus, adenovirus, or herpes virus), or into the genomic DNA of a prokaryote or eukaryote. In addition, an isolated nucleic acid can include an engineered nucleic acid such as a DNA molecule that is part of a hybrid or fusion nucleic acid. A nucleic acid existing among hundreds to millions of other nucleic acids within, for example, cDNA libraries or genomic libraries, or gel slices containing a genomic DNA restriction digest, is not to be considered an isolated nucleic acid.
Isolated PKHDL1 nucleic acid molecules are at least 10 nucleotides in length (e.g., 10, 20, 50, 100, 200, 300, 400, 500, 1000, or more nucleotides in length). As described in the Examples (below), the full-length human PKHDL1 transcript contains 78 exons and is 13,081 nucleotides in length, with a coding region of 12,729 nucleotides (FIG. 4; SEQ ID NO:1). The full-length murine transcript has a coding region of 12,747 nucleotides (FIG. 5; SEQ ID NO:2). A PKHDL1 nucleic acid molecule is not required to contain all of the coding region listed in SEQ ID NO:1 or 2 or all of the exons; in fact, a PKHDL1 nucleic acid molecule can contain as little as a single exon (as listed in Table 2, for example) or a portion of a single exon (e.g., 10 nucleotides from a single exon). In some embodiments, the PKHDL1 transcript is alternatively spliced, which can remove a portion of an exon, a single exon, or multiple exons from the transcript. Nucleic acid molecules that are less than full-length can be useful, for example, for diagnostic purposes.
Nucleic acid molecules of the invention may have sequences identical to those found in SEQ ID NO:1 or SEQ ID NO:2. Nucleic acid molecules also can have sequences identical to those found in the complement of SEQ ID NO:1 or SEQ ID NO:2. Alternatively, the sequence of a PKHDL1 nucleic acid molecule may contain one or more variants relative to the sequences set forth in SEQ ID NO:1 or SEQ ID NO:2, or the complement of SEQ ID NO:1 or SEQ ID NO:2. As used herein, a “sequence variant” refers to any mutation that results in a difference between nucleotides at one or more positions within the nucleic acid sequence of a particular nucleic acid molecule and the nucleotides at the same positions within the corresponding wild-type sequence set forth in SEQ ID NO:1 or SEQ ID NO:2. Nucleotides are referred to herein by the standard one-letter designation (A, C, G, or T). Sequence variants can be found in coding and non-coding regions, including exons, introns, promoters, and untranslated sequences. The presence of one or more sequence variants in the PKHDL1 nucleic acid sequence of a subject can be detected as set forth below in subsection 8.
Sequence variants can be, for example, deletions, insertions, or substitutions at one or more nucleotide positions (e.g., 1, 2, 3, 10, or more than 10 positions), provided that the nucleic acid is at least 80% identical (e.g., 80%, 85%, 90%, 95%, or 99% identical) over its length to the corresponding region of the wild-type sequences set forth in SEQ ID NO:1 or SEQ ID NO:2. The human and murine coding regions are 84.1% identical. Percent sequence identity is calculated by determining the number of matched positions in aligned nucleic acid sequences, dividing the number of matched positions by the total number of aligned nucleotides, and multiplying by 100. A matched position refers to a position in which identical nucleotides occur at the same position in aligned nucleic acid sequences. Percent sequence identity also can be determined for any amino acid sequence. To determine percent sequence identity, a target nucleic acid or amino acid sequence is compared to the identified nucleic acid or amino acid sequence using the BLAST 2 Sequences (Bl2seq) program from the stand-alone version of BLASTZ containing BLASTN version 2.0.14 and BLASTP version 2.0.14. This stand-alone version of BLASTZ can be obtained from the State University of New York—Old Westbury campus library as well as at Fish & Richardson's web site (world wide web at fr.com/blast) or the U.S. government's National Center for Biotechnology Information web site (world wide web at ncbi.nlm.nih.gov/blast/executables). Instructions explaining how to use the Bl2seq program can be found in the readme file accompanying BLASTZ.
Bl2seq performs a comparison between two sequences using either the BLASTN or BLASTP algorithm. BLASTN is used to compare nucleic acid sequences, while BLASTP is used to compare amino acid sequences. To compare two nucleic acid sequences, the options are set as follows: -i is set to a file containing the first nucleic acid sequence to be compared (e.g., C:\seq1.txt); -j is set to a file containing the second nucleic acid sequence to be compared (e.g., C:\seq2.txt); -p is set to blastn; -o is set to any desired file name (e.g., C:\output.txt); -q is set to -1; -r is set to 2; and all other options are left at their default setting. The following command will generate an output file containing a comparison between two sequences: C:\Bl2seq -i c:\seq1.txt -j c:\seq2.txt -p blastn -o c:\output.txt -q -1 -r 2. If the target sequence shares homology with any portion of the identified sequence, then the designated output file will present those regions of homology as aligned sequences. If the target sequence does not share homology with any portion of the identified sequence, then the designated output file will not present aligned sequences.
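For readers scripting this comparison, the options listed above can be assembled programmatically. The sketch below is illustrative only: it assumes the legacy NCBI `bl2seq` executable is installed and reachable on the PATH under that name, and the helper name `bl2seq_args` is hypothetical. In legacy bl2seq, -q is the penalty for a nucleotide mismatch and -r the reward for a nucleotide match. The function only builds the argument list; it does not run BLAST.

```python
import shlex


def bl2seq_args(seq1_path: str, seq2_path: str, out_path: str) -> list[str]:
    """Build a legacy bl2seq command for a nucleotide-nucleotide comparison.

    Mirrors the options given in the text; every other option is left at
    its default simply by not passing it. Assumes the executable is named
    "bl2seq" (hypothetical install location).
    """
    return [
        "bl2seq",
        "-i", seq1_path,   # file with the first nucleic acid sequence
        "-j", seq2_path,   # file with the second nucleic acid sequence
        "-p", "blastn",    # nucleotide comparison program
        "-o", out_path,    # output file
        "-q", "-1",        # penalty for a nucleotide mismatch
        "-r", "2",         # reward for a nucleotide match
    ]


cmd = bl2seq_args("seq1.txt", "seq2.txt", "output.txt")
print(shlex.join(cmd))
```

The list could then be passed to `subprocess.run(cmd, check=True)`; passing a list rather than a single command string avoids shell-quoting problems with paths such as `C:\seq1.txt`.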
Once aligned, a length is determined by counting the number of consecutive nucleotides from the target sequence presented in alignment with sequence from the identified sequence starting with any matched position and ending with any other matched position. A matched position is any position where an identical nucleotide is presented in both the target and identified sequence. Gaps presented in the target sequence are not counted since gaps are not nucleotides. Likewise, gaps presented in the identified sequence are not counted since target sequence nucleotides are counted, not nucleotides from the identified sequence.
The percent identity over a particular length is determined by counting the number of matched positions over that length and dividing that number by the length followed by multiplying the resulting value by 100. For example, if (1) a 1000 nucleotide target sequence is compared to the sequence set forth in SEQ ID NO:1, (2) the Bl2seq program presents 200 nucleotides from the target sequence aligned with a region of the sequence set forth in SEQ ID NO:1 where the first and last nucleotides of that 200 nucleotide region are matches, and (3) the number of matches over those 200 aligned nucleotides is 180, then the 1000 nucleotide target sequence contains a length of 200 and a percent identity over that length of 90 (i.e., 180÷200×100=90).
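The counting rules above can be expressed as a short function over one gapped pairwise alignment. This is a sketch, not the Bl2seq implementation: it assumes the alignment is already available as two equal-length strings with `-` marking gaps (parsing Bl2seq output is not shown), and the function name `identity_over_length` is hypothetical.

```python
def identity_over_length(target_aln: str, ident_aln: str) -> tuple[int, float]:
    """Return (length, percent identity) for one gapped pairwise alignment.

    target_aln / ident_aln: equal-length alignment rows, '-' marks a gap.
    - The aligned region runs from the first matched column to the last.
    - Length counts target nucleotides in that region: columns where the
      target has a gap are skipped; columns where only the identified
      sequence has a gap still count, because a target nucleotide is there.
    - Percent identity = matches / length * 100 (not rounded here).
    """
    if len(target_aln) != len(ident_aln):
        raise ValueError("alignment rows must have the same length")
    matched = [
        i for i, (t, s) in enumerate(zip(target_aln, ident_aln))
        if t != "-" and t.upper() == s.upper()
    ]
    if not matched:
        return 0, 0.0
    first, last = matched[0], matched[-1]
    length = sum(1 for t in target_aln[first:last + 1] if t != "-")
    return length, len(matched) * 100 / length


# Toy alignment: one mismatch (column 3) and one gap in each row (columns 7 and 9).
print(identity_over_length("ACGTTACGA-T", "ACGATAC-AGT"))  # (10, 80.0)
```

Applied to a 200-column alignment with 180 matches whose first and last columns are matches, the function returns a length of 200 and a percent identity of 90.0, as in the worked example.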
It will be appreciated that different regions within a single nucleic acid target sequence that aligns with an identified sequence can each have their own percent identity. It is noted that the percent identity value is rounded to the nearest tenth. For example, 78.11, 78.12, 78.13, and 78.14 are rounded down to 78.1, while 78.15, 78.16, 78.17, 78.18, and 78.19 are rounded up to 78.2. It also is noted that the length value will always be an integer.
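Python's built-in `round()` rounds halves to the nearest even digit and operates on binary floats, so it does not reliably reproduce the half-up rule stated above. A minimal sketch using the standard `decimal` module (the helper name `round_identity` is hypothetical):

```python
from decimal import Decimal, ROUND_HALF_UP


def round_identity(value) -> Decimal:
    """Round a percent identity to the nearest tenth, rounding halves up.

    Converting through str() keeps 78.15 as the decimal 78.15 rather than
    the nearest binary float, so the half-up rule applies to the value as
    written.
    """
    return Decimal(str(value)).quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


for v in (78.11, 78.14, 78.15, 78.19):
    print(v, "->", round_identity(v))
```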
Sequence variants that are deletions or insertions can create frame-shifts within the coding region that alter the amino acid sequence of the encoded polypeptide, and thus can affect its structure and function.
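Whether an insertion or deletion shifts the frame depends only on its length modulo 3. The sketch below uses hypothetical helper names and a made-up 12-nucleotide sequence (not PKHDL1 sequence) to show the check and how a single-nucleotide deletion changes every downstream codon:

```python
def is_frameshift(indel_length: int) -> bool:
    """True if inserting or deleting this many nucleotides in a coding
    region shifts the downstream reading frame (length not divisible by 3)."""
    if indel_length <= 0:
        raise ValueError("indel length must be a positive number of nucleotides")
    return indel_length % 3 != 0


def codons(seq: str) -> list[str]:
    """Split a sequence into complete codons, reading from its first base."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


ref = "ATGGCTGAAACC"        # made-up example sequence
mut = ref[:4] + ref[5:]     # delete the single base at index 4
print(is_frameshift(1), codons(ref), codons(mut))
```

An in-frame indel (a multiple of 3 nucleotides) preserves the downstream frame but still adds or removes whole codons, and hence amino acids.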
Substitutions include silent mutations that do not affect the amino acid sequence of the encoded polypeptide, missense mutations that alter the amino acid sequence of the encoded polypeptide, and nonsense mutations that prematurely terminate and therefore truncate the encoded polypeptide. Non-limiting examples of silent mutations are included in Table 5 (e.g., A substituted for G at position 1227 of SEQ ID NO:1, T substituted for C at position 1404 of SEQ ID NO:1, C substituted for T at position 1920 of SEQ ID NO:1, G substituted for A at position 1965 of SEQ ID NO:1, G substituted for C at position 6621 of SEQ ID NO:1, T substituted for A at position 9084 of SEQ ID NO:1). Non-limiting examples of missense mutations are included in Table 5 (e.g., G substituted for A at position 490, C substituted for A at position 2105 of SEQ ID NO:1, G substituted for A at position 3574 of SEQ ID NO:1, C substituted for T at position 3599 of SEQ ID NO:1, T substituted for G at position 3668 of SEQ ID NO:1, A substituted for C at position 4540 of SEQ ID NO:1, A substituted for G at position 4819 of SEQ ID NO:1, G substituted for A at position 4913 of SEQ ID NO:1, G substituted for C at position 9150 of SEQ ID NO:1, A substituted for C at position 10821 of SEQ ID NO:1, and A substituted for G at position 12658 of SEQ ID NO:1).
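The nucleotide positions in SEQ ID NO:1 and the amino acid positions in SEQ ID NO:3 can be related by simple codon arithmetic. The sketch below assumes that position 1 of SEQ ID NO:1 is the first nucleotide of the ATG start codon; that numbering convention is an assumption here and should be confirmed against FIG. 4. Under it, position 2105 falls in codon 702, position 4540 in codon 1514 and position 12658 in codon 4220, matching paired nucleotide and amino acid variants listed in the Summary, and each silent substitution listed above falls at the third position of its codon. The helper name `codon_of` is hypothetical.

```python
def codon_of(nt_pos: int) -> tuple[int, int]:
    """Map a 1-based coding-sequence position to (codon number, position in codon).

    Assumes nucleotide 1 is the first base of the ATG start codon.
    """
    if nt_pos < 1:
        raise ValueError("positions are 1-based")
    return (nt_pos - 1) // 3 + 1, (nt_pos - 1) % 3 + 1


silent = [1227, 1404, 1920, 1965, 6621, 9084]   # silent examples from the text
for pos in (2105, 4540, 12658, *silent):
    print(pos, codon_of(pos))
```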
Deletion, insertion, and substitution sequence variants can create or destroy splice sites and thus alter the splicing of a PKHDL1 transcript, such that the encoded polypeptide contains a deletion or insertion relative to the polypeptide encoded by the corresponding wild-type nucleic acid sequences set forth in SEQ ID NO:1 or SEQ ID NO:2. Sequence variants that affect splice sites of PKHDL1 nucleic acid molecules can result in fibrocystin-L polypeptides that lack the amino acids encoded by particular exons or portions thereof.
Certain sequence variants described herein may be associated with ARPKD. Such sequence variants typically result in a change in the encoded polypeptide that can have a dramatic effect on its function. These changes can include, for example, a truncation, a frame-shifting alteration, or a substitution at a highly conserved position. Conserved positions can be identified by inspecting a nucleotide or amino acid sequence alignment of related nucleic acids or polypeptides from different species (e.g., the sequence alignments shown in FIGS. 2B and 2C). For example, the non-conservative substitution of proline for glutamine at amino acid 702 may be associated with ARPKD. In some ARPKD patients, the same ARPKD-associated sequence variant can be found on both alleles. In other patients, a combination of different ARPKD-associated sequence variants can be found on separate alleles of an ARPKD gene.
Other sequence variants described herein include polymorphisms that occur within a normal population and typically are not associated with ARPKD. Sequence variants of this type can be, for example, nucleotide substitutions (e.g., silent mutations) that do not alter the amino acid sequence of the encoded fibrocystin-L polypeptide, or alterations that alter the amino acid sequence but that do not affect the overall function of the polypeptide. With respect to SEQ ID NO:1, sequence variants that are polymorphisms can include, for example, an A at position 4540 of SEQ ID NO:1 or an A at position 12658 of SEQ ID NO:1.
2. Production of Isolated PKHDL1 Nucleic Acid Molecules
Isolated nucleic acid molecules of the invention can be produced by standard techniques, including, without limitation, common molecular cloning and chemical nucleic acid synthesis techniques. For example, polymerase chain reaction (PCR) techniques can be used to obtain an isolated PKHDL1 nucleic acid molecule. PCR refers to a procedure or technique in which target nucleic acids are enzymatically amplified. Sequence information from the ends of the region of interest or beyond typically is employed to design oligonucleotide primers that are identical in sequence to opposite strands of the template to be amplified. PCR can be used to amplify specific sequences from DNA as well as RNA, including sequences from total genomic DNA or total cellular RNA. Primers are typically 14 to 40 nucleotides in length, but can range from 10 nucleotides to hundreds of nucleotides in length. General PCR techniques are described, for example, in PCR Primer: A Laboratory Manual, Ed. by Dieffenbach, C. and Dveksler, G, Cold Spring Harbor Laboratory Press, 1995.
When using RNA as a source of template, reverse transcriptase can be used to synthesize complementary DNA (cDNA) strands. Ligase chain reaction, strand displacement amplification, self-sustained sequence replication or nucleic acid sequence-based amplification also can be used to obtain isolated nucleic acids. See, for example, Lewis (1992) Genetic Engineering News 12(9): 1; Guatelli et al. (1990) Proc. Natl. Acad. Sci. USA 87:1874-1878; and Weiss (1991) Science 254:1292-1293.
In one embodiment, a primer is a single-stranded or double-stranded oligonucleotide that typically is 10 to 50 nucleotides in length, and when combined with mammalian genomic DNA and subjected to PCR conditions, is capable of being extended to produce a nucleic acid product corresponding to a region of a PKHDL1 nucleic acid molecule. Typically, a PKHDL1 PCR product is 30 to 1700 nucleotides in length (e.g., 30, 35, 50, 100, 250, 500, 1000, 1500, or 1650 nucleotides in length). Specific regions of mammalian DNA can be amplified (i.e., replicated such that multiple exact copies are produced) when a pair of oligonucleotide primers is used in the same PCR reaction, wherein one primer contains a nucleotide sequence from the coding strand of a PKHDL1 nucleic acid and the other primer contains a nucleotide sequence from the non-coding strand of a PKHDL1 nucleic acid. The "coding strand" of a nucleic acid is the nontranscribed strand, which has the same nucleotide sequence as the specified RNA transcript (with the exception that the RNA transcript contains uracil in place of thymine), while the "non-coding strand" of a nucleic acid is the strand that serves as the template for transcription.
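The primer-pair arrangement described above (one primer from the coding strand, the other from the non-coding strand) can be sketched as follows. The helpers are hypothetical and deliberately naive: they take primers verbatim from the ends of the chosen region and ignore practical design criteria such as melting temperature, GC content, and secondary structure. The 10 to 50 nucleotide primer limit and 30 to 1700 nucleotide product limit are the ranges given in the text; the template is made-up sequence.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A<->T, C<->G)."""
    return seq.translate(_COMPLEMENT)[::-1]


def primer_pair(coding: str, start: int, end: int, primer_len: int = 20) -> tuple[str, str, int]:
    """Naive primer pair flanking coding[start:end] (0-based, end exclusive).

    forward: first primer_len bases of the region, from the coding strand.
    reverse: reverse complement of the last primer_len bases, i.e. a
             non-coding-strand sequence written 5'->3'.
    Returns (forward, reverse, product_length).
    """
    if not 10 <= primer_len <= 50:
        raise ValueError("primers are typically 10 to 50 nucleotides long")
    if not 0 <= start < end <= len(coding):
        raise ValueError("region must lie within the sequence")
    product_len = end - start
    if not 30 <= product_len <= 1700:
        raise ValueError("product should be 30 to 1700 nucleotides long")
    if product_len < 2 * primer_len:
        raise ValueError("region too short to hold both primers")
    region = coding[start:end]
    return region[:primer_len], reverse_complement(region[-primer_len:]), product_len


template = "ACGTTGCA" * 10              # made-up 80-nt template
fwd, rev, size = primer_pair(template, 0, 60, primer_len=15)
print(fwd, rev, size)
```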
A single PCR reaction mixture may contain one pair of oligonucleotide primers. Alternatively, a single reaction mixture may contain a plurality of oligonucleotide primer pairs, in which case multiple PCR products can be generated. Each primer pair can amplify, for example, one exon or a portion of one exon. Intron sequences also can be amplified.
Oligonucleotide primers can be incorporated into compositions. Typically, a composition of the invention will contain a first oligonucleotide primer and a second oligonucleotide primer, each 10 to 50 nucleotides in length, which can be combined with genomic DNA from a mammal and subjected to PCR conditions as set out below, to produce a nucleic acid product that corresponds to a region of a PKHDL1 nucleic acid molecule. A composition also may contain buffers and other reagents necessary for PCR (e.g., DNA polymerase or nucleotides). Furthermore, a composition may contain one or more additional pairs of oligonucleotide primers (e.g., 3, 13, 16, or 23 primer pairs), such that multiple nucleic acid products can be generated.
Specific PCR conditions typically are defined by the concentration of salts (e.g., MgCl₂) in the reaction buffer, and by the temperatures utilized for melting, annealing, and extension. Specific concentrations or amounts of primers, templates, deoxynucleotides (dNTPs), and DNA polymerase also may be set out. For example, PCR conditions with a buffer containing 2.5 mM MgCl₂, and melting, annealing, and extension temperatures of 94° C., 44-65° C., and 72° C., respectively, are particularly useful. Under such conditions, a PCR sample can include, for example, 60 ng genomic DNA, 8 pmol each primer, 200 μM dNTPs, 1 U DNA polymerase (e.g., AmpliTaq Gold), and the appropriate amount of buffer as specified by the manufacturer of the polymerase (e.g., 1×AmpliTaq Gold buffer). Denaturation, annealing, and extension each may be carried out for 30 seconds per cycle, with a total of 25 to 35 cycles, for example. An initial denaturation step (e.g., 94° C. for 2 minutes) and a final elongation step (e.g., 72° C. for 10 minutes) also may be useful.
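The cycling conditions above can be written out as a thermocycler program. This Python sketch uses only the temperatures, hold times, and ranges stated in the text; the 55° C. annealing temperature and 30 cycles are illustrative defaults chosen from within those ranges, the names `Step`, `pcr_program`, and `total_hold_seconds` are hypothetical, and ramp times between temperatures are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One thermocycler hold: a label, a temperature (deg C), and a duration."""
    label: str
    temp_c: float
    seconds: int


def pcr_program(annealing_c: float = 55.0, cycles: int = 30):
    """Return (step, repeats) pairs for the cycling conditions described above.

    The defaults are illustrative; the text gives 44-65 C for annealing and
    25 to 35 cycles.
    """
    if not 44 <= annealing_c <= 65:
        raise ValueError("annealing temperature outside 44-65 C")
    if not 25 <= cycles <= 35:
        raise ValueError("cycle number outside 25-35")
    return [
        (Step("initial denaturation", 94, 120), 1),
        (Step("denaturation", 94, 30), cycles),
        (Step("annealing", annealing_c, 30), cycles),
        (Step("extension", 72, 30), cycles),
        (Step("final elongation", 72, 600), 1),
    ]


def total_hold_seconds(program) -> int:
    """Sum the hold times of a program; ramping between temperatures is ignored."""
    return sum(step.seconds * repeats for step, repeats in program)


if __name__ == "__main__":
    program = pcr_program(annealing_c=58.0, cycles=30)
    for step, repeats in program:
        print(f"{step.label}: {step.temp_c} C for {step.seconds} s x {repeats}")
    print(total_hold_seconds(program) / 60, "minutes of hold time")
```

With 30 cycles, the holds alone add up to 2 minutes of initial denaturation, 45 minutes of cycling, and 10 minutes of final elongation.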
Isolated nucleic acids of the invention also can be chemically synthesized, either as a single nucleic acid molecule (e.g., using automated DNA synthesis in the 3′ to 5′ direction using phosphoramidite technology) or as a series of oligonucleotides. For example, one or more pairs of long oligonucleotides (e.g., >100 nucleotides) can be synthesized that contain the desired sequence, with each pair containing a short segment of complementarity (e.g., about 15 nucleotides) such that a duplex is formed when the oligonucleotide pair is annealed. DNA polymerase is used to extend the oligonucleotides, resulting in a single, double-stranded nucleic acid molecule per oligonucleotide pair, which then can be ligated into a vector.
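The pairwise assembly described above can be illustrated with a short Python sketch. It assumes a target sequence whose length plus the overlap is even, splits it into one forward and one reverse oligonucleotide whose 3′ ends share `overlap` complementary nucleotides (15 by default, following the example above), and then simulates the polymerase fill-in. The names `design_overlap_pair` and `fill_in` are hypothetical, and the sketch ignores synthesis errors, secondary structure, and annealing temperature.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def design_overlap_pair(target: str, overlap: int = 15):
    """Return (forward, reverse) oligos whose 3' ends overlap by `overlap` bases."""
    if not 0 < overlap < len(target) or (len(target) + overlap) % 2:
        raise ValueError("need 0 < overlap < len(target) and len(target) + overlap even")
    oligo_len = (len(target) + overlap) // 2
    forward = target[:oligo_len]                                     # top strand, 5' part
    reverse = reverse_complement(target[len(target) - oligo_len:])  # bottom strand, 3' part
    return forward, reverse


def fill_in(forward: str, reverse: str, overlap: int) -> str:
    """Anneal the pair, extend both 3' ends, and return the top strand of the duplex."""
    bottom_as_top = reverse_complement(reverse)
    # The 3'-terminal `overlap` bases of each oligo must be complementary.
    if forward[-overlap:] != bottom_as_top[:overlap]:
        raise ValueError("oligonucleotides do not anneal over the expected overlap")
    return forward + bottom_as_top[overlap:]
```

For a 201-nucleotide target with a 15-nucleotide overlap, each oligonucleotide is 108 nucleotides long, consistent with the ">100 nucleotides" example above.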
Isolated nucleic acids of the invention also can be obtained by mutagenesis. For example, the reference sequence depicted in FIG. 4 or 5 can be mutated using standard techniques including oligonucleotide-directed mutagenesis and site-directed mutagenesis through PCR. See, Short Protocols in Molecular Biology, Chapter 8, Greene Publishing Associates and John Wiley & Sons, Edited by Ausubel et al., 1992. Examples of positions that can be modified are described above and in Table 5, as well as in the alignments of FIGS. 2B and 2C.
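For oligonucleotide-directed mutagenesis, a mutagenic primer is usually the reference sequence with the desired change placed near its center. The Python sketch below assumes a 1-based nucleotide position on the reference sequence and a hypothetical default of 12 flanking bases on each side; the helper name `mutagenic_primer` is illustrative, and practical designs also weigh melting temperature and GC content. The substituted base is written in lowercase so that it stands out.

```python
def mutagenic_primer(reference: str, position: int, new_base: str, flank: int = 12) -> str:
    """Return a primer carrying a single-base substitution at a 1-based position.

    Up to `flank` reference bases are kept on each side of the change; near
    either end of the reference the primer is simply truncated.
    """
    i = position - 1
    if not 0 <= i < len(reference):
        raise ValueError("position lies outside the reference sequence")
    new_base = new_base.upper()
    if new_base not in "ACGT" or len(new_base) != 1:
        raise ValueError("new_base must be one of A, C, G, T")
    if reference[i].upper() == new_base:
        raise ValueError("new_base matches the reference; no substitution")
    lo = max(0, i - flank)
    hi = min(len(reference), i + flank + 1)
    return reference[lo:i] + new_base.lower() + reference[i + 1:hi]
```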

3. Vectors and Host Cells

The invention also provides vectors containing nucleic acids such as those described above. As used herein, a “vector” is a replicon, such as a plasmid, phage, or cosmid, into which another DNA segment may be inserted so as to bring about the replication of the inserted segment. The vectors of the invention can be expression vectors. An “expression vector” is a vector that includes one or more expression control sequences, and an “expression control sequence” is a DNA sequence that controls and regulates the transcription and/or translation of another DNA sequence.
In the expression vectors of the invention, the nucleic acid is operably linked to one or more expression control sequences. As used herein, “operably linked” means incorporated into a genetic construct so that expression control sequences effectively control expression of a coding sequence of interest. Examples of expression control sequences include promoters, enhancers, and transcription terminating regions. A promoter is an expression control sequence composed of a region of a DNA molecule, typically within 100 nucleotides upstream of the point at which transcription starts (generally near the initiation site for RNA polymerase II). To bring a coding sequence under the control of a promoter, it is necessary to position the translation initiation site of the translational reading frame of the polypeptide between one and about fifty nucleotides downstream of the promoter. Suitable promoters can be tissue-specific (i.e., capable of directing expression of a nucleic acid preferentially in a particular cell type).
Non-limiting examples of tissue-specific promoters include the lymphoid-specific promoters (Calame and Eaton (1988), Adv. Immunol., 43:235-257) and T-cell specific promoters (Winoto and Baltimore (1989), EMBO J., 8:729-733). Enhancers provide expression specificity in terms of time, location, and level. Unlike promoters, enhancers can function when located at various distances from the transcription site. An enhancer also can be located downstream from the transcription initiation site. A coding sequence is “operably linked” and “under the control” of expression control sequences in a cell when RNA polymerase is able to transcribe the coding sequence into mRNA, which then can be translated into the protein encoded by the coding sequence.
Suitable expression vectors include, without limitation, plasmids and viral vectors derived from, for example, bacteriophage, baculoviruses, tobacco mosaic virus, herpes viruses, cytomegalovirus, retroviruses, vaccinia viruses, adenoviruses, and adeno-associated viruses. Numerous vectors and expression systems are commercially available from such corporations as Novagen (Madison, Wis.), Clontech (Palo Alto, Calif.), Stratagene (La Jolla, Calif.), and Invitrogen/Life Technologies (Carlsbad, Calif.). For example, the pET-43a⁺ vector from Novagen can be used.
An expression vector can include a tag sequence designed to facilitate subsequent manipulation of the expressed nucleic acid sequence (e.g., purification or localization). Tag sequences, such as green fluorescent protein (GFP), glutathione S-transferase (GST), polyhistidine, c-myc, hemagglutinin, or Flag™ tag (Kodak, New Haven, Conn.) sequences typically are expressed as a fusion with the encoded polypeptide. Such tags can be inserted anywhere within the polypeptide including at either the carboxyl or amino terminus.
The invention also provides host cells containing vectors of the invention. The term “host cell” is intended to include prokaryotic and eukaryotic cells into which a recombinant expression vector can be introduced. As used herein, “transformed” and “transfected” encompass the introduction of a nucleic acid molecule (e.g., a vector) into a cell by one of a number of techniques. Although not limited to a particular technique, a number of these techniques are well established within the art. Prokaryotic cells can be transformed with nucleic acids by, for example, electroporation or calcium chloride mediated transformation. Nucleic acids can be transfected into mammalian cells by techniques including, for example, calcium phosphate co-precipitation, DEAE-dextran-mediated transfection, lipofection, electroporation, or microinjection. Suitable methods for transforming and transfecting host cells are found in Sambrook et al., Molecular Cloning: A Laboratory Manual (2nd edition), Cold Spring Harbor Laboratory, New York (1989), and reagents for transformation and/or transfection are commercially available (e.g., Lipofectin (Invitrogen/Life Technologies); FuGENE (Roche, Indianapolis, Ind.); and SuperFect (Qiagen, Valencia, Calif.)).
In one embodiment, host cells are T-cells that have been isolated from a subject. Such cells can be manipulated ex vivo (e.g., by transfecting with a vector that encodes a fibrocystin-L polypeptide as described above) then re-introduced into the subject to augment the subject's immune responses to, for example, an infectious disease or cancer, or as adjuvant therapy following an allogeneic bone marrow transplant (i.e., donor lymphocyte infusion). In other embodiments, the host cells are PEAK cells, a human embryonic kidney (HEK)-293 derivative selected for high transfection frequency (Edge Biosystems, Gaithersburg, Md.).

4. Fibrocystin-L Polypeptides

The invention provides purified fibrocystin-L polypeptides that are encoded by the PKHDL1 nucleic acid molecules of the invention. A “polypeptide” refers to a chain of at least 10 amino acid residues (e.g., 10, 20, 50, 75, 100, 200, or more than 200 residues), regardless of post-translational modification (e.g., phosphorylation or glycosylation). Typically, a fibrocystin-L polypeptide of the invention is capable of eliciting a fibrocystin-L-specific antibody response (i.e., is able to act as an immunogen that induces the production of antibodies capable of specific binding to fibrocystin-L).
A fibrocystin-L polypeptide may have an amino acid sequence that is identical to that of SEQ ID NO:3 or SEQ ID NO:4. Alternatively, a fibrocystin-L polypeptide can include an amino acid sequence variant. As used herein, an amino acid sequence variant refers to a deletion, insertion, or substitution at one or more amino acid positions (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more than 10 positions), provided that the polypeptide has an amino acid sequence that is at least 80% identical (e.g., 80%, 85%, 90%, 95%, or 99% identical) over its length to the corresponding region of the sequences set forth in SEQ ID NO:3 or SEQ ID NO:4.
Percent sequence identity is calculated by determining the number of matched positions in aligned amino acid sequences, dividing the number of matched positions by the total number of aligned amino acids, and multiplying by 100. The percent identity between amino acid sequences therefore is calculated in a manner analogous to the method for calculating the identity between nucleic acid sequences, using the Bl2seq program from the stand-alone version of BLASTZ containing BLASTN version 2.0.14 and BLASTP version 2.0.14; see subsection 1, above. A matched position refers to a position in which identical residues occur at the same position in aligned amino acid sequences. To compare two amino acid sequences, the options of Bl2seq are set as follows: -i is set to a file containing the first amino acid sequence to be compared (e.g., C:\seq1.txt); -j is set to a file containing the second amino acid sequence to be compared (e.g., C:\seq2.txt); -p is set to blastp; -o is set to any desired file name (e.g., C:\output.txt); and all other options are left at their default setting. The following command will generate an output file containing a comparison between two amino acid sequences: C:\Bl2seq -i c:\seq1.txt -j c:\seq2.txt -p blastp -o c:\output.txt. If the target sequence shares homology with any portion of the identified sequence, then the designated output file will present those regions of homology as aligned sequences. If the target sequence does not share homology with any portion of the identified sequence, then the designated output file will not present aligned sequences.
Once aligned, a length is determined by counting the number of consecutive amino acid residues from the target sequence presented in alignment with sequence from the identified sequence starting with any matched position and ending with any other matched position. A matched position is any position where an identical amino acid residue is presented in both the target and identified sequence. Gaps presented in the target sequence are not counted since gaps are not amino acid residues. Likewise, gaps presented in the identified sequence are not counted since target sequence amino acid residues are counted, not amino acid residues from the identified sequence.
The percent identity over a particular length is determined by counting the number of matched positions over that length and dividing that number by the length followed by multiplying the resulting value by 100. For example, if (1) a 1000 amino acid target sequence is compared to the sequence set forth in SEQ ID NO:3, (2) the Bl2seq program presents 200 amino acids from the target sequence aligned with a region of the sequence set forth in SEQ ID NO:3 where the first and last amino acids of that 200 amino acid region are matches, and (3) the number of matches over those 200 aligned amino acids is 180, then the 1000 amino acid target sequence contains a length of 200 and a percent identity over that length of 90 (i.e. 180÷200×100=90). As described for aligned nucleic acids in subsection 1, different regions within a single amino acid target sequence that aligns with an identified sequence can each have their own percent identity. It also is noted that the percent identity value is rounded to the nearest tenth, and the length value will always be an integer.
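The length and percent-identity rules above can be expressed as a short Python sketch. It assumes the aligned region has already been extracted from the Bl2seq output as two equal-length strings with `-` marking gaps (parsing that output is not shown), and it measures the length from the first to the last matched position in the supplied block. The function name `length_and_identity` is hypothetical.

```python
def length_and_identity(target_aln: str, identified_aln: str):
    """Return (length, percent identity) for one aligned block.

    length: target residues from the first to the last matched position,
            not counting gap characters in the target.
    percent identity: matches / length * 100, rounded to the nearest tenth
            (Python's round() resolves exact ties to the even digit).
    """
    if len(target_aln) != len(identified_aln):
        raise ValueError("aligned strings must be the same length")
    matched = [
        k for k, (a, b) in enumerate(zip(target_aln, identified_aln))
        if a == b and a != "-"
    ]
    if not matched:
        return 0, 0.0
    first, last = matched[0], matched[-1]
    # Count target residues only; gaps in the identified sequence still count
    # because the target has a residue at that column.
    length = sum(1 for k in range(first, last + 1) if target_aln[k] != "-")
    return length, round(len(matched) / length * 100, 1)
```

The worked example above (180 matches across a 200-residue aligned region whose first and last positions are matches) returns a length of 200 and a percent identity of 90.0.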
The deletion of amino acids from a fibrocystin-L polypeptide or the insertion of amino acids into a fibrocystin-L polypeptide can significantly affect the structure of the polypeptide. A deletion can result in a fibrocystin-L polypeptide that is truncated. Amino acids also may be deleted from a fibrocystin-L polypeptide as a result of altered splicing (see subsection 1, above).
Amino acid substitutions may be conservative or non-conservative. Conservative amino acid substitutions replace an amino acid with an amino acid of the same class, whereas non-conservative amino acid substitutions replace an amino acid with an amino acid of a different class. Conservative amino acid substitutions typically have little effect on the structure or function of a polypeptide. Examples of conservative substitutions include amino acid substitutions within the following groups: glycine and alanine; valine, isoleucine, and leucine; aspartic acid and glutamic acid; asparagine, glutamine, serine, and threonine; lysine, histidine, and arginine; and phenylalanine and tyrosine. Conservative substitutions within a fibrocystin-L polypeptide can include, for example, Ile substituted for Val at amino acid position 1607 of SEQ ID NO:3, Glu substituted for Asp at amino acid position 3607 of SEQ ID NO:3, and Ile substituted for Val at amino acid position 4220 of SEQ ID NO:3.
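The grouping used above to define conservative substitutions can be encoded directly. In this Python sketch, `is_conservative` is a hypothetical helper that takes one-letter residue codes; residues not listed in any group (e.g., Cys, Met, Pro, Trp) are treated as non-conservative whenever they are involved, which is an assumption of the sketch rather than something the text states.

```python
# Groups of interchangeable residues, as listed in the text (one-letter codes).
CONSERVATIVE_GROUPS = [
    set("GA"),    # glycine, alanine
    set("VIL"),   # valine, isoleucine, leucine
    set("DE"),    # aspartic acid, glutamic acid
    set("NQST"),  # asparagine, glutamine, serine, threonine
    set("KHR"),   # lysine, histidine, arginine
    set("FY"),    # phenylalanine, tyrosine
]


def is_conservative(original: str, replacement: str) -> bool:
    """True if both residues fall in the same group listed above."""
    original, replacement = original.upper(), replacement.upper()
    if original == replacement:
        raise ValueError("not a substitution")
    return any(original in g and replacement in g for g in CONSERVATIVE_GROUPS)
```

Applied to the examples in the text, Val to Ile and Asp to Glu come out conservative, while Gln to Pro, Thr to Ala, Leu to Ser, Gly to Val, Arg to Ser, Tyr to Cys, and His to Gln come out non-conservative.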
Non-conservative substitutions may result in a substantial change in the hydrophobicity of the polypeptide or in the bulk of a residue side chain. In addition, non-conservative substitutions may make a substantial change in the charge of the polypeptide, such as reducing electropositive charges or introducing electronegative charges. Examples of non-conservative substitutions include a basic amino acid for a non-polar amino acid, or a polar amino acid for an acidic amino acid. Non-conservative substitutions within a fibrocystin-L polypeptide can include, for example, Pro substituted for Gln at amino acid position 702 of SEQ ID NO:3, Ala substituted for Thr at amino acid position 1192 of SEQ ID NO:3, Ser substituted for Leu at amino acid position 1199 of SEQ ID NO:3, Val substituted for Gly at position 1223 of SEQ ID NO:3, Ser substituted for Arg at position 1514 of SEQ ID NO:3, Cys substituted for Tyr at position 1638 of SEQ ID NO:3, or Gln substituted for His at position 3050 of SEQ ID NO:1.
The term “purified” as used herein with reference to a polypeptide refers to a polypeptide that either has no naturally occurring counterpart (e.g., a peptidomimetic), has been chemically synthesized and is thus uncontaminated by other polypeptides, or has been separated or purified from other cellular components by which it is naturally accompanied (e.g., other cellular proteins, polynucleotides, or cellular components). Typically, the polypeptide is considered “purified” when it is at least 70% (e.g., 70%, 80%, 90%, 95%, or 99%), by dry weight, free from the proteins and naturally occurring organic molecules with which it naturally associates.
Fibrocystin-L polypeptides typically contain multiple functional domains (e.g., two or more regions that are responsible for a specific function of the polypeptide). A fibrocystin-L polypeptide may contain one or more transmembrane (TM) domains, such that part of the polypeptide is cytoplasmic and part is extracellular. Such a domain can be located, for example, between amino acid residues 4213 and 4235 of SEQ ID NO:3, such that the full-length fibrocystin-L polypeptide has a large N-terminal extracellular region and a short, 8 amino acid C-terminal cytoplasmic tail. To facilitate insertion of the polypeptide into the cellular membrane, a fibrocystin-L polypeptide also may include a hydrophobic signal peptide (e.g., the 20 amino acid residues at the N-terminus). Additionally, a fibrocystin-L polypeptide can contain one or more (e.g., 3, 11, or 14) TIG/IPT domains (Transcription-associated ImmunoGlobulin domain/Immunoglobulin-like fold shared by Plexins and Transcription factors; referred to hereafter as TIG domains), similar to those found in fibrocystin, the hepatocyte growth factor receptor, plexins, and the macrophage-stimulating protein receptor. TIG domains can be located anywhere within the polypeptide, although localization within the N-terminal 50% of a fibrocystin-L polypeptide is particularly common. Fibrocystin-L polypeptides also can have one or more TMEM2 regions of homology (e.g., residues 2180-2375 or 3032-3376 of SEQ ID NO:3). Furthermore, a fibrocystin-L polypeptide can contain one or more sites for N-glycosylation (e.g., 56 N-glycosylation sites in the N-terminal region). A fibrocystin-L polypeptide also may contain sites (e.g., in the C-terminal tail) for phosphorylation by protein kinase A and/or protein kinase C.
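The text does not say how the N-glycosylation sites were identified. One common approach, assumed here, is to scan for the N-X-S/T sequon, where X is any residue except proline; a sequon marks a potential site only, not a confirmed glycosylation. A minimal Python sketch with the hypothetical helper `n_glycosylation_sequons`:

```python
import re

# Asn, then any residue except Pro, then Ser or Thr. The lookahead lets
# overlapping sequons (e.g. "NNST") both be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def n_glycosylation_sequons(protein: str) -> list[int]:
    """Return 1-based positions of the Asn in each N-X-S/T sequon (X != Pro)."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]
```

Restricting the scan to the extracellular region (for example, residues preceding the TM domain at 4213-4235 of SEQ ID NO:3) would be a natural refinement, since only lumenal or extracellular sequons are accessible to the glycosylation machinery.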

5. Production of Fibrocystin-L Polypeptides

Fibrocystin-L polypeptides can be produced by a number of methods, many of which are well known in the art. By way of example and not limitation, fibrocystin-L polypeptides can be obtained by extraction from a natural source (e.g., from isolated cells, tissues or bodily fluids), by expression of a recombinant nucleic acid encoding the polypeptide, or by chemical synthesis.
Fibrocystin-L polypeptides of the invention can be produced by, for example, standard recombinant technology, using expression vectors encoding fibrocystin-L polypeptides. The resulting fibrocystin-L polypeptides then can be purified. Expression systems that can be used for small or large scale production of fibrocystin-L polypeptides include, without limitation, microorganisms such as bacteria (e.g., E. coli and B. subtilis) transformed with recombinant bacteriophage DNA, plasmid DNA, or cosmid DNA expression vectors containing the nucleic acid molecules of the invention; yeast (e.g., S. cerevisiae) transformed with recombinant yeast expression vectors containing the nucleic acid molecules of the invention; insect cell systems infected with recombinant virus expression vectors (e.g., baculovirus) containing the nucleic acid molecules of the invention; plant cell systems infected with recombinant virus expression vectors (e.g., tobacco mosaic virus) or transformed with recombinant plasmid expression vectors (e.g., Ti plasmid) containing the nucleic acid molecules of the invention; or mammalian cell systems (e.g., primary cells or immortalized cell lines such as COS cells, Chinese hamster ovary cells, HeLa cells, HEK-293 cells, PEAK cells, and 3T3 L1 cells) harboring recombinant expression constructs containing promoters derived from the genome of mammalian cells (e.g., the metallothionein promoter) or from mammalian viruses (e.g., the adenovirus late promoter and the cytomegalovirus promoter), along with the nucleic acids of the invention.
Suitable methods for purifying the polypeptides of the invention can include, for example, affinity chromatography, immunoprecipitation, size exclusion chromatography, and ion exchange chromatography. See, for example, Flohe et al. (1970) Biochim. Biophys. Acta. 220:469-476, or Tilgmann et al. (1990) FEBS 264:95-99. The extent of purification can be measured by any appropriate method, including but not limited to: column chromatography, polyacrylamide gel electrophoresis, or high-performance liquid chromatography. Fibrocystin-L polypeptides also can be “engineered” to contain a tag sequence described herein that allows the polypeptide to be purified (e.g., captured onto an affinity matrix). Immunoaffinity chromatography also can be used to purify fibrocystin-L polypeptides.

6. Anti-Fibrocystin-L Antibodies

The invention also provides antibodies having specific binding affinity for fibrocystin-L polypeptides. Such antibodies can be useful for diagnostic purposes (e.g., an antibody that recognizes a specific fibrocystin-L variant could be used to determine whether a subject's cellular immunity is compromised or to detect endometrial cancer). An antibody having specific binding affinity for a fibrocystin-L polypeptide also can be used to prevent the development of an autoimmune disease in a subject at risk for autoimmunity or to treat an autoimmune disease (e.g., thyroiditis, inflammatory bowel disease, asthma, rheumatoid arthritis, systemic lupus erythematosus (SLE), or type I diabetes) in a subject. For example, an antibody such as a monoclonal antibody can be administered to a subject who has antibodies against pancreatic islet antigens and a family history of type I diabetes, to prevent the development of diabetes. An antibody having specific binding affinity for a fibrocystin-L polypeptide also can be administered to a subject to prevent or treat rejection of an organ or tissue transplant.
An “antibody” or “antibodies” includes intact molecules as well as fragments thereof that are capable of binding to an epitope of a fibrocystin-L polypeptide. The term “epitope” refers to an antigenic determinant on an antigen to which an antibody binds. Epitopes usually consist of chemically active surface groupings of molecules such as amino acids or sugar side chains, and typically have specific three-dimensional structural characteristics, as well as specific charge characteristics. Epitopes generally have at least five contiguous amino acids. The terms “antibody” and “antibodies” include polyclonal antibodies, monoclonal antibodies, humanized or chimeric antibodies, single chain Fv antibody fragments, Fab fragments, and F(ab′)₂ fragments. Suitable “antibody” or “antibodies” can be of any isotype. Polyclonal antibodies are heterogeneous populations of antibody molecules that are specific for a particular antigen, while monoclonal antibodies are homogeneous populations of antibodies to a particular epitope contained within an antigen. Monoclonal antibodies are particularly useful.
In general, a fibrocystin-L polypeptide is produced as described above, i.e., recombinantly, by chemical synthesis, or by purification of the native protein, and then used to immunize animals. Various host animals including, for example, rabbits, chickens, mice, guinea pigs, and rats, can be immunized by injection of the protein of interest. Depending on the host species, adjuvants can be used to increase the immunological response and include Freund's adjuvant (complete and/or incomplete), mineral gels such as aluminum hydroxide, surface-active substances such as lysolecithin, pluronic polyols, polyanions, peptides, oil emulsions, keyhole limpet hemocyanin, and dinitrophenol. Polyclonal antibodies are contained in the sera of the immunized animals. Monoclonal antibodies can be prepared using standard hybridoma technology. In particular, monoclonal antibodies can be obtained by any technique that provides for the production of antibody molecules by continuous cell lines in culture as described, for example, by Kohler et al. (1975) Nature 256:495-497, the human B-cell hybridoma technique of Kosbor et al. (1983) Immunology Today 4:72, and Cote et al. (1983) Proc. Natl. Acad. Sci. USA 80:2026-2030, and the EBV-hybridoma technique of Cole et al., Monoclonal Antibodies and Cancer Therapy, Alan R. Liss, Inc. pp. 77-96 (1983). Such antibodies can be of any immunoglobulin class including IgM, IgG, IgE, IgA, IgD, and any subclass thereof. The hybridoma producing the monoclonal antibodies of the invention can be cultivated in vitro or in vivo.
A chimeric antibody is a molecule in which different portions are derived from different animal species, such as those having a variable region derived from a mouse monoclonal antibody and a human immunoglobulin constant region. Chimeric antibodies can be produced through standard techniques.
Antibody fragments that have specific binding affinity for fibrocystin-L polypeptides can be generated by known techniques. Such antibody fragments include, but are not limited to, F(ab′)₂ fragments that can be produced by pepsin digestion of an antibody molecule, and Fab fragments that can be generated by reducing the disulfide bridges of F(ab′)₂ fragments. Alternatively, Fab expression libraries can be constructed. See, for example, Huse et al. (1989) Science 246:1275-1281. Single chain Fv antibody fragments are formed by linking the heavy and light chain fragments of the Fv region via an amino acid bridge (e.g., 15 to 18 amino acids), resulting in a single chain polypeptide. Single chain Fv antibody fragments can be produced through standard techniques, such as those disclosed in U.S. Pat. No. 4,946,778.
Once produced, antibodies or fragments thereof can be tested for recognition of a fibrocystin-L polypeptide by standard immunoassay methods including, for example, enzyme-linked immunosorbent assay (ELISA) or radioimmunoassay (RIA). See, Short Protocols in Molecular Biology, eds. Ausubel et al., Greene Publishing Associates and John Wiley & Sons (1992). Suitable antibodies typically have equal binding affinities for recombinant and native proteins. As described herein, monoclonal antibodies FibLA-2, FibLA-10.1, FibLA-10.2, FibLA-4.1, FibLA-4.2, FibLA-11.1, FibLA-11.2, FibLA-11.3, FibLA-13.1, and FibLA-13.2 are useful for detecting fibrocystin-L in tissues and cell lines.

7. Methods for Using PKHDL1 Nucleic Acid Molecules, Fibrocystin-L Polypeptides, and Fibrocystin-L Expressing Cells

As described herein, fibrocystin-L is widely expressed at a low level across many human tissues, including kidney (proximal and distal tubules), fallopian tube, thyroid, liver, adrenal cortex, gallbladder, testis, breast, and spleen. It is strongly expressed in the fallopian tube, which is known to be heavily ciliated. It also appears to be up-regulated in endometrial adenocarcinomas compared with levels detected in normal endometrium. Immunostaining of endometrium and fallopian tubes indicated that the protein has an apical distribution, is present on the apical cilia of the epithelial cells, and appears to mislocalize apically in some cancers. Other hormone-dependent cancers (e.g., breast and ovarian) also had up-regulated levels of fibrocystin-L, as did some lung and colon cancers. Elsewhere, the tissue distribution of fibrocystin-L seems to correlate with known ciliated epithelial surfaces (e.g., thyroid, kidney, lung, adrenal cortex, and pituitary). The findings of localized up-regulation (shown, for example, by increased immunoreactivity, by relative protein expression levels on western blots, and by confocal microscopy), together with the abundance of PKHDL1 cDNAs in endometrial cancer, demonstrate a connection between this gene and endometrial carcinogenesis and other cancers.
Fibrocystin-L also is expressed in both human and mouse T-lymphocytes (T-cells) and is up-regulated following human and mouse T-cell activation. The term “activated T-cells” refers to cells that have been recently stimulated; such cells accumulate at sites of ongoing immune activity and are usually depleted once a disease process has been eliminated. In autoimmune diseases or rejection of a transplant, inappropriate accumulation and persistence of activated T-cells can result in tissue injury. Fibrocystin-L also is present in higher amounts in CD4⁺ (helper) T-cells than in CD8⁺ (killer) T-cells, and in higher amounts in memory T-cells than in naïve helper T-cells. Naïve T-cells have not been previously activated, require a strong stimulus to undergo activation, typically require a period of days to become fully activated, and do not circulate through most organs. Memory T-cells are long-lasting T-cells that have been activated previously, are capable of rapid re-activation upon receipt of a new stimulus (e.g., re-exposure to the same infectious agent), and circulate through most organs and tissues. The generation of memory T-cells is essential for successful vaccination against many infections.
As such, fibrocystin-L may play a role in the function of memory and activated T-cells. The genetic programs that are initiated by T-cell interactions with antigen-presenting cells (APCs) bearing cognate antigen regulate a host of specialized functions including T-cell proliferation, cytokine production, migration patterns, cytotoxicity, and cell survival. Coordination of these functions is essential for elimination of infection, immune surveillance against neoplasia, and generation of memory responses. Furthermore, aberrant T-cell activation underlies the pathogenesis of autoimmunity and rejection of transplanted organs and tissues. Fibrocystin-L may play a role in the regulation of the T-cell-APC interface structure or in adhesion and migration patterns.
Thus, detecting PKHDL1 nucleic acids or nucleic acid sequence variants thereof, or fibrocystin-L polypeptides can be useful for characterizing immune responses in subjects, or for diagnosing, preventing, or treating autoimmune disease, transplant rejection, infectious disease, or cancer (e.g., endometrial, breast, ovarian, lung, or colon cancer). For example, PKHDL1 mutation and polymorphism analysis can be used to identify individuals at greater or lesser risk for, e.g., immune-mediated diseases or cancer.
PKHDL1 nucleic acids or nucleic acid sequence variants thereof, or fibrocystin-L polypeptides also can be detected as a marker of endometrial cancer and other cancers, including breast, ovarian, lung, and colon cancer, or to grade endometrial or other cancers. For example, monoclonal antibody FibLA-2, FibLA-10.1, FibLA-10.2, FibLA-4.1, FibLA-4.2, FibLA-11.1, FibLA-11.2, FibLA-11.3, FibLA-13.1, or FibLA-13.2 can be used to detect fibrocystin-L in a biological sample such as a tissue sample from the endometrium, breast, ovary, lung, or colon. In some embodiments, endometrial or other cancers can be graded by detecting the level of fibrocystin-L expression in tumor tissues using, for example, immunohistochemistry, western blotting, or an ELISA to detect the polypeptide, or by analysis of mRNA levels. In other embodiments, endometrial or other cancers can be graded by detecting the level of secreted fibrocystin-L or fragments thereof in blood serum, urine, or other bodily fluid. Fibrocystin-L can be detected in combination with another marker for a particular cancer (e.g., PTEN or p53) or in combination with determining the status of a hormone receptor (e.g., estrogen or progesterone receptor status).
Furthermore, fibrocystin-L expressing T-cells can be detected and/or quantified in the blood to measure, for example, immune responses to vaccination or the nature and severity of autoimmune disease or transplant rejection. In general, the number of fibrocystin-L expressing T-cells in the blood will be increased after vaccination when compared with the baseline number of fibrocystin-L expressing T-cells in the blood before vaccination. Similarly, with respect to autoimmune disease or transplant rejection, the number of fibrocystin-L expressing T-cells will be increased relative to a baseline number of fibrocystin-L expressing T-cells in a control population (e.g., control subjects without autoimmune disease or subjects that have not undergone a transplant). Fibrocystin-L expressing T-cells also can be detected and/or quantified in tissue biopsy specimens to assess the nature and severity of autoimmune disease, transplant rejection, or a cancer-specific cellular immune response. Standard techniques can be used to detect and/or quantitate the number of fibrocystin-L expressing T-cells that are present in a biological sample (e.g., a blood or tissue sample).

8. Methods of Detecting Sequence Variants

Methods of the invention can be utilized to determine whether the PKHDL1 gene of a subject contains a sequence variant or combination of sequence variants. Furthermore, methods of the invention can be used to determine whether both PKHDL1 alleles of a subject contain sequence variants (either the same sequence variant(s) on both alleles or separate sequence variants on each allele), or whether only a single allele of a subject contains sequence variants.
Sequence variants within a PKHDL1 nucleic acid can be detected by a number of methods, for example: sequencing of exons, introns, or untranslated sequences; denaturing high performance liquid chromatography (DHPLC; Underhill et al. (1997) Genome Res. 7:996-1005); allele-specific hybridization (Stoneking et al. (1991) Am. J. Hum. Genet. 48:370-382; and Prince et al. (2001) Genome Res. 11(1):152-162); allele-specific restriction digests; mutation-specific polymerase chain reactions; single-strand conformation polymorphism detection (Schafer et al. (1998) Nat. Biotechnol. 16:33-39); infrared matrix-assisted laser desorption/ionization mass spectrometry (WO 99/57318); and combinations of such methods.
Genomic DNA generally is used in the analysis of PKHDL1 sequence variants. Genomic DNA typically is extracted from a biological sample such as a peripheral blood sample, but can be extracted from other biological samples, including tissues (e.g., mucosal scrapings of the lining of the mouth, or renal or hepatic tissue). Routine methods can be used to extract genomic DNA from a blood or tissue sample, including, for example, phenol extraction. Alternatively, genomic DNA can be extracted with kits such as the QIAamp® Tissue Kit (Qiagen, Chatsworth, Calif.), the Wizard® Genomic DNA purification kit (Promega, Madison, Wis.), the Puregene DNA Isolation System (Gentra Systems, Inc., Minneapolis, Minn.), and the A.S.A.P.™ Genomic DNA isolation kit (Boehringer Mannheim, Indianapolis, Ind.).
Typically, an amplification step is performed before proceeding with the detection method. For example, exons or introns of the PKHDL1 gene can be amplified and then directly sequenced. Dye primer sequencing can be used to increase the accuracy of detecting heterozygous samples.
PKHDL1 sequence variants can be detected by, for example, DHPLC analysis of PKHDL1 nucleic acid molecules. Genomic DNA can be isolated from a subject (e.g., a human, a mouse, or a rat), and sequences from one or more regions of an ARPKD gene can be amplified (e.g., by PCR) using specific pairs of oligonucleotide primers (e.g., as described above in subsection 2). After amplification, PCR products can be denatured and reannealed, such that an allele containing a PKHDL1 sequence variant can reanneal with a wild-type allele to form a heteroduplex (i.e., a double-stranded nucleic acid with a mismatch at one or more positions). The reannealed products then can be subjected to DHPLC, which detects heteroduplexes based on their altered melting temperatures, as compared to homoduplexes that do not contain mismatches. Samples containing heteroduplexes can be sequenced by standard methods to specifically identify the variant nucleotides. Examples of specific sequence variants are provided in Table 5 below.
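Why DHPLC detects a heterozygous substitution can be sketched in a few lines: after denaturation and reannealing of an equimolar mix of two alleles, each sense strand pairs with either antisense strand, so half of the duplexes carry a mismatch. The sketch below assumes same-length alleles (a substitution), random strand pairing, and hypothetical 8-nt sequences; it does not model melting behavior or column chemistry.

```python
from itertools import product

PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def antisense(seq: str) -> str:
    """Complementary strand, written 3'->5' so that it aligns base-by-base under `seq`."""
    return "".join(PAIR[b] for b in seq)


def mismatches(sense: str, anti: str) -> list[int]:
    """1-based positions at which the two strands cannot form a Watson-Crick pair."""
    return [i + 1 for i, (s, a) in enumerate(zip(sense, anti)) if PAIR[s] != a]


def reanneal(allele_a: str, allele_b: str) -> dict[tuple[str, str], list[int]]:
    """Mismatch positions for every sense/antisense pairing after reannealing."""
    alleles = {"A": allele_a, "B": allele_b}
    return {
        (s_name, a_name): mismatches(s_seq, antisense(a_seq))
        for (s_name, s_seq), (a_name, a_seq) in product(alleles.items(), repeat=2)
    }


duplexes = reanneal("ACGTTGCA", "ACGATGCA")  # hypothetical wild type vs. T->A at position 4
hetero = [pair for pair, mm in duplexes.items() if mm]
print(hetero)                       # [('A', 'B'), ('B', 'A')]
print(len(hetero) / len(duplexes))  # 0.5
```

A homozygous variant sample reannealed on its own gives only homoduplexes (`reanneal(x, x)` reports no mismatches), which is why such samples are commonly mixed with wild-type DNA before DHPLC.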
Allele-specific hybridization also can be used to detect PKHDL1 nucleotide sequence variants, including complete haplotypes of a mammal. In practice, samples of DNA or RNA from one or more mammals are amplified using pairs of primers, and the resulting amplification products are immobilized on a substrate (e.g., in discrete regions). Hybridization conditions are selected such that a nucleic acid probe will specifically bind to the sequence of interest, e.g., the PKHDL1 nucleic acid molecule containing a particular sequence variant. Such hybridizations typically are performed under high stringency, because some nucleotide sequence variants differ by only a single nucleotide. High-stringency conditions can include the use of low-ionic-strength solutions and high temperatures for washing. For example, nucleic acid molecules can be hybridized at 42° C. in 2×SSC (0.3 M NaCl/0.03 M sodium citrate/0.1% sodium dodecyl sulfate (SDS)) and washed in 0.1×SSC (0.015 M NaCl/0.0015 M sodium citrate), 0.1% SDS at 65° C. Hybridization conditions can be adjusted to account for unique features of the nucleic acid molecule, including length and sequence composition. Probes can be labeled (e.g., fluorescently) to facilitate detection. In some embodiments, one of the primers used in the amplification reaction is biotinylated (e.g., at the 5′ end of the reverse primer), and the resulting biotinylated amplification product is immobilized on an avidin- or streptavidin-coated substrate.
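The salt concentrations quoted above follow from 1×SSC being 0.15 M NaCl/0.015 M sodium citrate. The sketch below scales them and, to illustrate why probe length and base composition matter, adds one commonly cited empirical estimate of duplex melting temperature, Tm ≈ 81.5 + 16.6·log10[Na+] + 0.41·(%GC) − 600/N. That formula, its constants (which vary between sources), the treatment of sodium citrate as trisodium citrate, and the probe sequence are assumptions for illustration; the estimate ignores SDS, formamide, and mismatches.

```python
import math

NACL_1X = 0.15      # mol/L NaCl in 1x SSC
CITRATE_1X = 0.015  # mol/L sodium citrate in 1x SSC


def ssc(strength: float) -> tuple[float, float]:
    """(NaCl, sodium citrate) molarities for an n x SSC solution."""
    return NACL_1X * strength, CITRATE_1X * strength


def sodium_molarity(strength: float) -> float:
    """Total Na+ (mol/L), assuming trisodium citrate (three Na+ per citrate)."""
    nacl, citrate = ssc(strength)
    return nacl + 3 * citrate


def approx_tm(probe: str, strength: float) -> float:
    """Rough duplex Tm in deg C from one empirical formula (an assumption, see text)."""
    probe = probe.upper()
    gc_percent = 100 * (probe.count("G") + probe.count("C")) / len(probe)
    return (81.5 + 16.6 * math.log10(sodium_molarity(strength))
            + 0.41 * gc_percent - 600 / len(probe))


probe = "GACCTGAAGTTCCAGGCTAA"  # hypothetical 20-mer
print(ssc(2), ssc(0.1))
print(round(approx_tm(probe, 2), 1), round(approx_tm(probe, 0.1), 1))
```

The lower salt of the 0.1×SSC wash lowers the estimated Tm, which is what makes the wash more stringent than the 2×SSC hybridization.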
Allele-specific restriction digests can be performed in the following manner. For PKHDL1 nucleotide sequence variants that introduce a restriction site, restriction digest with the particular restriction enzyme can differentiate the alleles. For PKHDL1 sequence variants that do not alter a common restriction site, mutagenic primers can be designed that introduce a restriction site when the variant allele is present or when the wild type allele is present. A portion of a PKHDL1 nucleic acid can be amplified using the mutagenic primer and a wild type primer, followed by digestion with the appropriate restriction endonuclease.
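A minimal sketch of how a site-creating variant changes the digest pattern, assuming complete digestion, a palindromic recognition site searched on the top strand only, and hypothetical 14-nt amplicons. EcoRI (G^AATTC, cut after the first base) is used purely as an example enzyme; the helper names are illustrative.

```python
def find_sites(seq: str, site: str) -> list[int]:
    """0-based start positions of every (possibly overlapping) occurrence of `site`."""
    hits, pos = [], seq.find(site)
    while pos != -1:
        hits.append(pos)
        pos = seq.find(site, pos + 1)
    return hits


def digest(seq: str, site: str, cut_offset: int) -> list[int]:
    """Fragment lengths after complete digestion; `cut_offset` is the top-strand cut
    position within the site (1 for EcoRI, G^AATTC)."""
    cuts = [p + cut_offset for p in find_sites(seq, site)]
    bounds = [0, *cuts, len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


wild_type = "AAAAGAATACAAAA"  # hypothetical, no EcoRI site
variant = "AAAAGAATTCAAAA"    # hypothetical A->T change creates GAATTC
print(digest(wild_type, "GAATTC", 1))  # [14]
print(digest(variant, "GAATTC", 1))    # [5, 9]
```

A heterozygous sample would show all three fragments (14, 5, and 9 bp), whereas a homozygote shows only one of the two patterns.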
Certain sequence variants, such as insertions or deletions of one or more nucleotides, change the size of the DNA fragment encompassing the variant. The insertion or deletion of nucleotides can be assessed by amplifying the region encompassing the sequence variant and determining the size of the amplified products in comparison with size standards. For example, a region of a PKHDL1 nucleic acid can be amplified using primers flanking the sequence variant. One of the primers typically is labeled, for example with a fluorescent moiety, to facilitate sizing. The amplified products can be electrophoresed through acrylamide gels together with a set of size standards labeled with a fluorescent moiety that differs from the one on the primer.
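Sizing against co-electrophoresed standards can be sketched as linear interpolation between the two standards that flank the sample peak; the difference from the expected wild-type product size then gives the apparent insertion or deletion. The migration values, standard sizes, and 180 bp wild-type size below are hypothetical, and dedicated fragment-analysis software generally uses more elaborate sizing models (for example, the Local Southern method).

```python
from bisect import bisect_left


def estimate_size(peak: float, standards: list[tuple[float, int]]) -> float:
    """Interpolate a fragment size (bp) from (migration value, size) pairs sorted by migration."""
    xs = [m for m, _ in standards]
    if not xs[0] <= peak <= xs[-1]:
        raise ValueError("peak lies outside the size-standard range")
    i = max(1, bisect_left(xs, peak))  # index of the standard just above the peak
    (x0, y0), (x1, y1) = standards[i - 1], standards[i]
    return y0 + (y1 - y0) * (peak - x0) / (x1 - x0)


def indel_length(observed_bp: float, wild_type_bp: int) -> int:
    """Positive for an insertion, negative for a deletion (nearest whole base)."""
    return round(observed_bp - wild_type_bp)


ladder = [(1000.0, 150), (1200.0, 200), (1400.0, 250)]  # hypothetical standards
size = estimate_size(1108.0, ladder)
print(size, indel_length(size, 180))  # 177.0 -3
```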
Other methods also can be used to detect sequence variants. For example, conventional and field-inversion electrophoresis are known in the art to be useful for visualizing basepair changes. Furthermore, Southern blotting and hybridization can be utilized to detect larger rearrangements such as deletions and insertions.
The association of certain sequence variants with susceptibility to ARPKD or a diagnosis of ARPKD can be determined. An ARPKD-associated (or disease-associated) sequence variant is a sequence variant or combination of sequence variants within the PKHDL1 gene of a subject that is correlated with the presence of ARPKD in that subject.
Sequence variants associated with the presence of ARPKD in a subject can include, for example, mutations that will result in truncation of a fibrocystin-L polypeptide or a substantial in-frame alteration within a PKHDL1 transcript from the subject, missense or small in-frame mutations found within a nucleic acid sample of a subject and not found at a significant level in the normal population, and mutations that segregate in ARPKD families in a fashion known in the art to be consistent with autosomal recessive inheritance. Other sequence variants may be identified that are not individually disease-associated, but which may be associated with ARPKD when combined with one or more additional sequence variants. Still other sequence variants can be identified that are simply polymorphisms within the normal population, and which are not associated with ARPKD.
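The first criterion above, whether a change truncates fibrocystin-L or alters the transcript in frame, can be screened mechanically from the coding sequence. This is a minimal sketch with hypothetical helper names and a hypothetical 12-nt coding sequence numbered from the A of the ATG; it does not address population frequency, segregation, or splicing effects, which the text also treats as criteria.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def length_change_effect(ref_len: int, alt_len: int) -> str:
    """Classify a change by its net length difference in the coding sequence."""
    delta = alt_len - ref_len
    if delta == 0:
        return "same length"
    return "in-frame" if delta % 3 == 0 else "frameshift"


def creates_stop(cds: str, pos: int, alt: str) -> bool:
    """True if substituting `alt` at 1-based coding position `pos` yields a new stop codon."""
    i = pos - 1
    codon_start = i - i % 3
    codon = cds[codon_start:codon_start + 3]
    offset = i - codon_start
    mutated = codon[:offset] + alt + codon[offset + 1:]
    return mutated in STOP_CODONS and codon not in STOP_CODONS


cds = "ATGCAGTGGAAA"  # hypothetical: Met-Gln-Trp-Lys
print(creates_stop(cds, 4, "T"), creates_stop(cds, 9, "A"), creates_stop(cds, 5, "T"))
# True True False
print(length_change_effect(5, 4), length_change_effect(6, 3))  # frameshift in-frame
```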

9. Articles of Manufacture

PKHDL1 nucleic acid molecules (e.g., oligonucleotide primer pairs and probes) of the invention can be combined with packaging material and sold as kits for determining whether a subject has altered cellular immunity or is susceptible to developing ARPKD, for diagnosing ARPKD in a patient, or for detecting endometrial cancer, based on the detection of PKHDL1 gene expression or of sequence variants within the PKHDL1 gene of the subject. Components and methods for producing articles of manufacture such as kits are well known. An article of manufacture may include one pair of PKHDL1 oligonucleotide primers or a plurality of oligonucleotide primer pairs (e.g., 2, 3, 4, 10, or more than 10 primer pairs). In addition, the article of manufacture may include buffers or other solutions, or any other components necessary to assess whether the PKHDL1 gene of a subject contains one or more variants. Instructions describing how the PKHDL1 primer pairs are useful for detecting sequence variants within a PKHDL1 gene also can be included in such kits.
In other embodiments, articles of manufacture include populations of isolated PKHDL1 nucleic acid molecules immobilized on a substrate. Suitable substrates provide a base for the immobilization of the nucleic acids, and in some embodiments, allow immobilization of nucleic acids into discrete regions. In embodiments in which the substrate includes a plurality of discrete regions, different populations of isolated nucleic acids can be immobilized in each discrete region. Thus, each discrete region of the substrate can include a PKHDL1 nucleic acid molecule containing a different sequence variant (e.g., the sequence variants of Table 5). Such articles of manufacture can include two or more nucleic acid molecules with different sequence variants, or can include nucleic acid molecules with all of the sequence variants known for PKHDL1.
Suitable substrates can be of any shape or form and can be constructed from, for example, glass, silicon, metal, plastic, cellulose or a composite. For example, a suitable substrate can include a multiwell plate or membrane, a glass slide, a chip, or polystyrene or magnetic beads. Nucleic acid molecules or polypeptides can be synthesized in situ, immobilized directly on the substrate, or immobilized via a linker, including by covalent, ionic, or physical linkage. Linkers for immobilizing nucleic acids and polypeptides, including reversible or cleavable linkers, are known in the art. See, for example, U.S. Pat. No. 5,451,683 and WO98/20019. Immobilized nucleic acid molecules typically are about 20 nucleotides in length, but can vary from about 10 nucleotides to about 1000 or more nucleotides in length.
In practice, a sample of DNA or RNA from a subject is amplified, the amplification product is hybridized to an article of manufacture containing populations of isolated nucleic acid molecules in discrete regions, and hybridization can be detected. Typically, the amplified product is labeled to facilitate detection of hybridization. See, for example, Hacia et al. (1996) Nature Genet., 14:441-447; and U.S. Pat. Nos. 5,770,722 and 5,733,729.
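For an array carrying a wild-type and a variant probe for the same position, one simple way to turn the two hybridization signals into a genotype call is to threshold the variant-allele fraction. The thresholds, the minimum-signal cutoff, and the assumption that inputs are background-corrected are all hypothetical; in practice they would be calibrated against samples of known genotype.

```python
def call_genotype(wt_signal: float, var_signal: float,
                  min_total: float = 100.0,
                  het_band: tuple[float, float] = (0.25, 0.75)) -> str:
    """Hypothetical two-probe genotype call from background-corrected signals."""
    total = wt_signal + var_signal
    if total < min_total:
        return "no call"  # too little signal to interpret
    variant_fraction = var_signal / total
    low, high = het_band
    if variant_fraction < low:
        return "homozygous wild type"
    if variant_fraction > high:
        return "homozygous variant"
    return "heterozygous"


for wt, var in [(900, 100), (520, 480), (40, 960), (10, 20)]:
    print(wt, var, call_genotype(wt, var))
```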
The invention will be further described in the following examples, which do not limit the scope of the invention described in the claims.

EXAMPLES

Materials and Methods

Preparation of Resting and Stimulated Immune Cell Sub-Populations
Spleens, thymuses, and subcutaneous lymph nodes were dissected from B6 and BALB/c mice under sterile conditions. Cell suspensions were prepared by disrupting the organs in DMEM/10% FCS and passing them through 45 μm nylon mesh. For spleen and thymus suspensions, erythrocytes were lysed by a 5 min incubation in ACK buffer (0.15 M NH₄Cl, 1.0 mM KHCO₃, 0.1 mM Na₂EDTA, pH 7.2). For flow cytometric sorting, cells were surface-stained with a panel of fluorochrome-labeled monoclonal antibodies (BD Pharmingen, San Diego, Calif.) by incubation in DMEM/10% FCS at 4° C. for 1 hour. Labeled cell suspensions were washed, re-suspended in DMEM/10% FCS at 4 to 8×10⁶ cells/ml, and flow sorted using a FACS Vantage sorter (Becton Dickinson Immunocytometry Systems, San Jose, Calif.). The antibody combinations and sorted populations were as follows. Splenic T-cell sub-populations: anti-mouse CD4-FITC (RM4-4) and anti-mouse CD8α-PE (53-6.7); sorted populations: CD4^+ve/CD8^−ve (CD4+ T-cells) and CD4^−ve/CD8^+ve (CD8+ T-cells). Splenic dendritic cell sub-populations: anti-mouse CD11c-FITC (HL3) and anti-mouse CD8α-PE; sorted populations: CD11c^+ve/CD8^−ve (myeloid) and CD11c^+ve/CD8^+ve (lymphoid). Thymocyte sub-populations: anti-mouse CD4-FITC and anti-mouse CD8α-PE; sorted populations: CD4^−ve/CD8^−ve, CD4^+ve/CD8^+ve, CD4^+ve/CD8^−ve, and CD4^−ve/CD8^+ve. Splenic NK and NKT cells: anti-mouse CD3ε-FITC (145-2C11) and anti-NK1.1-PE (PK136); sorted populations: NK1.1^+ve/CD3ε^−ve (NK cells) and NK1.1^+ve/CD3ε^+ve (NKT cells). Splenic B-cell sub-populations: anti-mouse IgD-FITC (11-26c.2a) and anti-mouse CD19-PE (1D3); sorted populations: CD19^+ve/IgD^+ve (naïve B-cells) and CD19^+ve/IgD^−ve (memory B-cells). An aliquot of the memory B-cells was stimulated for 96 hours (activated memory B-cells) with plate-bound goat anti-mouse IgG (ICN Biomedicals Inc., Aurora, Ohio), 25 ng/ml lipopolysaccharide (LPS; Sigma Aldrich, St. Louis, Mo.), and 2.5 μg/ml purified anti-mouse CD40 (HM40-3, BD Pharmingen).
Murine CD4^+ve and CD8^+ve lymph node T-cells were purified by nylon wool column and complement-mediated depletion and then either activated for 72 hours in tissue culture plates coated with a combination of hamster anti-mouse CD3ε (145-2C11) and hamster anti-mouse CD28 (PV-1), as described by Griffin M. D., et al. (2000), J. Immunol., 164, 4433-42, or stimulated for 96 hours by co-culture with irradiated allogeneic (B6) bone marrow-derived dendritic cells (allo-stimulated T-cells).
Murine peritoneal inflammatory macrophages were generated from B6 mice by intraperitoneal injection (1 ml/animal) of sterile 3% thioglycollate (Becton Dickinson Microbiology Systems, Cockeysville, Md.). After 7 days, the animals were sacrificed and cells extracted by peritoneal lavage using sterile DMEM/10% FCS, washed and an aliquot retained for RNA preparation (fresh peritoneal inflammatory cells). The remainder of the cells were re-suspended in DMEM/10% FCS and allowed to adhere to tissue culture flasks at 37° C. for 2 hours. Non-adherent cells were removed by washing with sterile PBS and individual cell layers were exposed for 1 hour at 37° C. to PBS alone (unstimulated macrophages) or to PBS containing LPS 2 μg/ml (stimulated macrophages). These solutions then were removed, the cell layers washed with PBS, and culture medium re-applied for 24 hours.
Human lymphocytes were isolated from whole blood using 54% Percoll (Amersham Biosciences, Piscataway, N.J.); the resulting cells were washed in PBS, pelleted, and incubated in RPMI/10% FCS with concanavalin A (10 μg/ml) for 72 hours at 37° C.
RNA Analysis
RNA was isolated from snap-frozen human tissues (adrenal, breast, colon, heart, kidney, liver, lung, pancreas, placenta, and uterus, obtained as surgical waste), mouse tissues, cell lines, and the leukocyte populations described above (see FIG. 1) using the Trizol method (Invitrogen, Carlsbad, Calif.) or the NucleoSpin column system (BD Biosciences, San Jose, Calif.). Isolated RNA (1-5 μg) was used to make cDNA with the Clontech Powerscript™ Reverse Transcriptase cDNA Synthesis Kit (BD Biosciences, San Jose, Calif.) and 250 μg random primers (Invitrogen, Carlsbad, Calif.). PKHDL1 (Pkhdl1) expression was analyzed by RT-PCR using 50 ng cDNA, and loading was equalized by amplification of the control gene β-actin (mouse: F 5′-CTGGCACCACACCTTCTACAATGAGCTG-3′ (SEQ ID NO: 31); R 5′-GCACAGCTTCTCTTTGATGTCACGCACGATTTC-3′ (SEQ ID NO: 32)) or GAPDH (human: F 5′-GACCACAGTCCATGCCATCACT-3′ (SEQ ID NO: 33); R 5′-TCCACCACCCTGTTGCTGTA-3′ (SEQ ID NO: 34); 453 bp product). For analysis of murine Pkhdl1, a 258 bp region from exons 76-77 (nt 12429-12686; F 5′-TCCATTTAGCACCTGTTGGGC-3′ (SEQ ID NO: 35); R 5′-AGTCTTCCTACAAGGCACGCTG-3′ (SEQ ID NO: 36)) was amplified; for human PKHDL1, a 247 bp region from exons 35-36 (nt 4232-4478; F 5′-CACCAGTCCTAATGTGTCTGTGG-3′ (SEQ ID NO: 37); R 5′-TGGAGAAAAATGGAGTGAGCCTC-3′ (SEQ ID NO: 38)) was assayed. Each reaction contained 0.125 U AmpliTaq Gold (Applied Biosystems, Foster City, Calif.) in the supplied buffer, 2.0 mM MgCl₂, 0.2 mM each nucleotide, 4 μM each primer, and 50 ng cDNA. Cycling conditions were 4 min at 94° C.; 30-40 cycles of 60 s at 94° C., 30 s at 56-64° C., and 30-60 s at 72° C.; and a final 10 min at 72° C. The products were electrophoresed through 2.0% agarose gels and visualized by ethidium bromide staining. Multi-Tissue Northern Blots (BD Biosciences, San Jose, Calif.) were hybridized and washed using standard methods with the following probes: human PKHDL1, exons 1-12, 979 bp (nt 9-986), and exons 38-41, 1248 bp (nt 5065-6312); and murine Pkhdl1, exons 32-38, 1048 bp (nt 3788-4835).
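The amplicon and probe sizes quoted above follow from inclusive transcript coordinates (length = end − start + 1), and each reverse primer is written 5′-3′ on the opposite strand, so its reverse complement is what appears at the 3′ end of the top-strand product. A minimal sketch of both checks; the helper names are illustrative.

```python
def region_length(start: int, end: int) -> int:
    """Length of a region given inclusive 1-based start and end coordinates."""
    if end < start:
        raise ValueError("end precedes start")
    return end - start + 1


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (case preserved)."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


print(region_length(12429, 12686))  # 258 (murine Pkhdl1, exons 76-77)
print(region_length(4232, 4478))    # 247 (human PKHDL1, exons 35-36)
print(reverse_complement("TGGAGAAAAATGGAGTGAGCCTC"))  # SEQ ID NO: 38 read on the top strand
```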
Cloning PKHDL1
The positions of likely human PKHDL1 exons were determined by comparing genomic DNA (AC021001) to the mouse D86 cDNA sequence, PKHD1, and GenomeScan putative genes (Hs 8205 30 21 2; Hs 8205 30 23 1 and Hs 8205 30 23 2), and by using the NIX suite of programs (world wide web at hgmp.mrc.ac.uk). The transcript was amplified from human lung or adrenal cDNA as 16 overlapping fragments, with primers positioned in the most strongly predicted exons (see Table 1) and the PCR conditions described above. The mouse ORF from the cDNA D86 to the 3′UTR has been cloned as 14 cDNA clones (Table 2). Fragments were cloned into specific restriction sites of pZERO using amplification primers with matching sites, or unmodified product was cloned into terminal transferase-treated (New England Biolabs, Beverly, Mass.), EcoRV-cut vector using the Rapid DNA Ligation Kit (Roche Applied Science, Indianapolis, Ind.), and grown in E. coli XL-2MRF′ (Stratagene, La Jolla, Calif.). The 5′ and 3′ regions were amplified and cloned using RACE strategies with the SMART RACE cDNA Amplification Kit (BD Biosciences, San Jose, Calif.). For the 5′ RACE, human adrenal RNA was reverse transcribed with Powerscript RT and amplified with the 5′ RACE CDS and SMART-II primers using touchdown PCR and nested gene-specific primers. At the 3′ end, cDNA synthesis was primed with the tailed and anchored oligo(dT) primer, 3′-CDS, and the product was amplified as above with nested gene-specific primers. Products were sequenced using the Big-Dye Terminator Kit (Applied Biosystems, Foster City, Calif.) and analyzed on ABI377 sequencers. The sequence was assembled into a contig using the Sequencher 4.1.2 program.

TABLE 1

Details of PKHDL1 cDNA clones

Clone	Size (bp)	Exons	Position in coding region	Comments

MH1	338	1-3	−104:234	5′ RACE
MH2	978	1-12	9:986
MH3	1291	11-20	852:2143
MH4	568	20-23	2086:2653
MH5	1097	23-31	2588:3684
MH6	934	29-36	3488:4421
MH7	925	35-38	4217:5141
MH8	1249	38-41	5065:6312
MH9	1233	40-48	6046:7278
MH10	1185	47-49	7124:8308
MH11	702	49-51	7980:8681
MH12	1158	50-59	8586:9743
MH13	705	58-63	9618:10322
MH14	1311	61-70	10051:11362
MH15	1038	70-75	11256:12293
MH16	961	73-78	12017:3′ 339	3′ RACE

TABLE 2

Details of Pkhdl1 cDNA clones

Clone	Size (bp)	Exons	Position in coding region (nt)	Comments

D86	5773	1-38	1:5773	Alternative 3′UTR
MS1	563	38-41	5618:6180	62 bp into IVS 38
MS2	690	39-43	5921:6610
MS3	606	42-47	6495:7100
MS4	640	46-49	6903:7542
MS5	1005	48-49	7343:8347
MS6	609	49-52	8244:8852
MS7	846	51-57	8698:9543
MS8	683	56-62	9452:10134
MS9	673	61-66	10010:10682
MS10	583	66-69	10603:11185
MS11	656	69-73	11114:11769
MS12	302	71-73	11509:11810
MS13	943	73-77	11771:12713
MS14	284	77-78	12423:3′UTR, 56
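The clones in Table 2 are meant to overlap so that they can be assembled into one contig. A short sketch that checks each listed size against its coordinates and looks for breaks in overlap; the data are the Table 2 entries from D86 through MS13 (MS14 is omitted because its end lies in the 3′UTR), and the helper name is illustrative.

```python
# (clone, size in bp, start, end) from Table 2, D86 through MS13
CLONES = [
    ("D86", 5773, 1, 5773), ("MS1", 563, 5618, 6180), ("MS2", 690, 5921, 6610),
    ("MS3", 606, 6495, 7100), ("MS4", 640, 6903, 7542), ("MS5", 1005, 7343, 8347),
    ("MS6", 609, 8244, 8852), ("MS7", 846, 8698, 9543), ("MS8", 683, 9452, 10134),
    ("MS9", 673, 10010, 10682), ("MS10", 583, 10603, 11185), ("MS11", 656, 11114, 11769),
    ("MS12", 302, 11509, 11810), ("MS13", 943, 11771, 12713),
]


def check_tiling(clones):
    """Return a list of problems: size/coordinate mismatches and breaks in overlap."""
    problems = []
    for name, size, start, end in clones:
        if end - start + 1 != size:
            problems.append(f"{name}: listed {size} bp, coordinates give {end - start + 1} bp")
    reach, reach_name = None, None  # furthest end covered so far, and the clone reaching it
    for name, _, start, end in sorted(clones, key=lambda clone: clone[2]):
        if reach is not None and start > reach:
            problems.append(f"no overlap between {reach_name} and {name}")
        if reach is None or end > reach:
            reach, reach_name = end, name
    return problems


print(check_tiling(CLONES))  # []
```

Tracking the furthest end reached, not just the previous clone's end, keeps a short clone nested inside a longer one from being reported as a false gap.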

Mutation Analysis of PKHDL1
All patients in the study gave informed consent, and the project was approved by the Institutional Review Board at Mayo Clinic. Five previously described ARPKD patients from families M36, P244, M51, M52, and M55 (Ward C. J., et al. (2002), Nature Genet., 30, 259-269), plus two additional typical ARPKD patients, were screened for PKHDL1 mutations; in these patients either no PKHD1 mutation had been identified (n=4) or only a single missense change had been detected (M52, M52 and M55). An additional thirteen typical ARPKD patients with no detected PKHD1 mutation were screened for changes in thirty-four PKHDL1 exons, but because no likely disease-causing mutations were identified, screening of the gene was not completed.
To screen for PKHDL1 mutations, all 78 coding exons were amplified from genomic DNA as 85 fragments of 150-350 bp. Primers were typically positioned in the intron ~20 bp from the intron/exon boundary (see Table 3 for the primer sequences). Each 25 μl reaction contained genomic DNA (60 ng), primers (8 pmol each), dNTPs (200 μM each), MgCl₂ (2.5 mM), and 1 U of Taq Gold in the supplied buffer (Applied Biosystems, Foster City, Calif.). The PCR program was 120 s at 94° C.; 35-40 cycles of 30 s at 94° C., 30 s at 50-63° C., and 30 s at 72° C.; and 10 min at 72° C. The PCR products were treated to form heteroduplexes and analyzed for base-pair changes by DHPLC on the WAVE Fragment Analysis System (Transgenomic, Omaha, Nebr.), as previously described (Ward C. J., et al. (2002) supra); DHPLC conditions are also given in Table 3. Briefly, crude PCR product (300-500 ng) was injected into the chromatographic column (DNASep Cartridge, Transgenomic, Omaha, Nebr.) and eluted using calculated and empirically determined optimal conditions. Samples showing an abnormal chromatogram were further characterized by direct sequencing as previously described (Ward C. J., et al. (2002) supra). Potentially pathogenic changes were validated by DHPLC analysis of 25-100 normal controls (50-200 chromosomes).
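What screening 25-100 controls (50-200 chromosomes) can and cannot exclude can be made concrete with a binomial model, assuming independently sampled chromosomes: a variant at allele frequency q is seen at least once with probability 1 − (1 − q)^(2n), and if it is never seen, an exact one-sided 95% upper bound on q is 1 − 0.05^(1/(2n)), roughly 3/(2n). These calculations are an illustrative addition, not part of the original analysis, and the function names are hypothetical.

```python
def detection_probability(allele_freq: float, n_controls: int) -> float:
    """P(variant seen on at least one of 2n control chromosomes)."""
    return 1 - (1 - allele_freq) ** (2 * n_controls)


def upper_bound_if_absent(n_controls: int, confidence: float = 0.95) -> float:
    """One-sided upper confidence bound on allele frequency when no carrier is seen."""
    return 1 - (1 - confidence) ** (1 / (2 * n_controls))


for n in (25, 100):
    print(n, round(detection_probability(0.01, n), 3), round(upper_bound_if_absent(n), 4))
```

Under this model, absence from 100 controls argues against a common polymorphism but cannot exclude a variant present at around 1% in the population.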

TABLE 3

Details of PCR amplicons and DHPLC conditions to analyze the PKHDL1 gene

Exon frag.	Size (bp)	PCR annealing temp (° C.)	Forward Primer Sequence (5′-3′)	SEQ ID NO	Reverse Primer Sequence (5′-3′)	SEQ ID NO	DHPLC Temp (° C.)	DHPLC Initial % buffer B

1	152	57	GCACCAACTCCGCAGAAC	39	TGTCTACGCGGGCCTCTCCTGCTTG	40	63	47

2	302	48	GGCAGAGCCAAAAATAAAACCTG	41	ATAGGCTTGAAAATACCTCAACC	42	51	54

3	242	43	CTCTGAAGATAGAATACC	43	TCTGATATGATGAAAATG	44	53	52

4	311	48	GAATGGAACTTAACTAGACATCAG	45	GCAGGAAAGAAAGCAAGATCAAC	46	56	55

5	159	43	AACTAGAAACAAAACAGAG	47	ATTATTTACCATGAAACC	48	53	48

6	217	42	CTTACACAGAATCTTTTTG	49	TTTAATATCATTGGACCC	50	53	51

7	191	44	TTTAAGGGAACCAGTGAG	51	ATCTGCTATTTGTTTTTG	52	54	50

8	191	45	TGAGTAGTTTTTAAGAGAAAC	53	TCTGTGAGCATTATGAAG	54	54	50

9	198	43	ATTTTGTACTTTTTCTCTG	55	TATTTACCCTGTTGAATC	56	53	50

10	193	45	CTGAAGTTAAAAAAATGTTC	57	TTATACACTGCCTTCCCACCACCC	58	53, 55	51

11	223	46	AAAAATCAGGTATTTGGG	59	AGAATTTGTGGTAATAAAG	60	52, 56	51

12	198	46	GATACACTGATGTGATTTG	61	TGTGAATATAGAAATGGC	62	56	50

13	345	45	TTATGATCCTGATGAAAG	63	ACAGTATCAAATAGTATTCTG	64	57	55

14	179	44	ATAAATGTTGAAAAGGTC	65	TAATGACAGGAAAAAGCC	66	54	49

15	259	49	CTGGAAAAAAGTTATATTCATTAG	67	ATTGAGATCCTGCCTTCG	68	54, 57	53

16	249	43	TTGAATAGCTGAATTATG	69	ACAGAAACAGTATCTCCC	70	51	52

17	251	46	GAAAGATTTCAACTTTTC	71	GCACAGGATATACATTTG	72	56	53

18	243	46	AGTTAATGTCTACAAATTC	73	AGAGGGTGCAGGGAAAATGC	74	56	52

19	204	44	CGAGTGTTCTAACTTTTC	75	AGGTCAGTTTCACAGTTC	76	52, 54	50

20	243	45	TTGGGGGAAAACCAGAAC	77	AGATTCAATTAGCATATC	78	55	52

21	336	45	TCTCTGTGTTCTGGTACG	79	CCCAAGAAATGGATTTGTCTTATC	80	52	55

22	326	46	GCTTTCTAAAGTGTATTTGC	81	TGTTTCTATCCATACTGC	82	52, 55	55

23	343	46	TTACATGGCAAAAACCAC	83	TAGTTAGCTGTCTTTTCC	84	52	55

24	300	46	ACATGAGGCTCATTTATG	85	TGTGTGTGCGTATACACC	86	54, 55	54

25	276	51	AAGATTACAGGCGTGAAC	87	GCACATAGAAGAAAAGAG	88	62	53

26	297	44	GACTTTTATTCACCTTTG	89	GTCTTTAACATATTACAAAC	90	54	54

27	221	47	CTTTTGTTAAAACCTATTC	91	CTTTCACACCCAGTATAG	92	57	51

28	244	49	TGTCTGATTTCATAACAACAGG	93	CCTTTGATTCCACTTTATCTTAGAG	94	55	50

29	226	49	CATCTTTTTCTTTTTTTCAC	95	CATGCAATTTTCTCTCTG	96	58	52

30	204	44	TATAGTTGAACTGTTTTG	97	AGGAGGAAAAAGTGACTG	98	55	50

31	215	44	GCATTTCTGTATCTCAAC	99	TGACCAATCTTATTGAAG	100	54	51

32	322	47	AAGATGAGAGATGAATTG	101	AACTCCATCAAGTTTATG	102	54	55

33	212	45	GAAGCTCATTGAAAAATC	103	GATAATCACTTTCCTATG	104	55	51

34	204	45	ATTTGACAAAATGTTTGC	105	TCAGGTTTCAGTGCTTCC	106	55	50

35	313	47	ATTGCCATGTTGTCAAAG	107	CATTTAGGAAAAAGTGAC	108	52, 57	55

36	250	46	GTACAATCTCATTTTATG	109	TATCACATACACCCTGGG	110	57	53

37	325	45	AAACAGTTATCATTTTGG	111	CATATAATAGAAGTACAAAG	112	56	55

38a	349	50	ACTGGAGGTATGTATTGACTTG	113	TGACCCATAAGGACTTTTACAC	114	56	55

38b	347	50	AAAAGGCTCTGGATTTGC	115	CATTAACCTCTATTGCTCTGAAC	116	58	55

38c	334	50	GGTTTGGGGACTGTTTTG	117	AGTAGACTTCATTGGGGTTG	118	58	55

38d	350	50	GGAAATGGCTTCTATCCAG	119	AGGTATTAAGTGTAAGTGGGAAC	120	52, 56	56

39	472	48	AGACTGTAGGGTATATTGTAGTC	121	GAAACAAAATATCTGCAGGTTC	122	52	55

40	350	47	ATCAAAAGAGATTCAGTTGC	123	CTGCCATTACTTTTTCTGAC	124	52	55

41	281	55	GGAGGTTTTGGAAATGAATCAG	125	TGGAAATGCACAATGATGCGTG	126	60	54

42	275	52	AAAGGGTTTGACAGTGTGATCTAG	127	ATGCTGGTTTTCTATTGCTGTG	128	56	53

43	253	52	CGAATGAAAAACTCTGGTAAAATCC	129	TCAGGCAGAGTCCAATGAACAG	130	55	53

44	268	46	TGTAATGAATAATTTAATAGGTAAC	131	AAGATAAACTTAGGAGAGGTTG	132	53	53

45	223	54	TGGATTTGGGGTTTTAATTTTC	133	GAGTCTTCCTCTACCAACTCCC	134	60	51

46	231	49	AGTTCTCAATAACAAATCAAAC	135	CTTTTCTAAATACACATCATTAAG	136	57	52

47	344	48	TACCAAAACAATATGTTATGTC	137	GCATGATTATACCAACCACGAG	138	56	55

48	241	50	TCTTCAATATAAGAGGATTCCG	139	TAACCTTGAGCAAACCACTGTG	140	55	52

49a	304	50	CTAAATAACTGTGATTTCTGGG	141	GAAGACTGGTACTTTGCTGTAC	142	56	54

49b	303	53	CCAGTATAACTTGGCAGTATTTG	143	GTACAAGATCCCGTTTGCATGG	144	58	54

49c	333	53	ATTTCCCCATGCAAACGGGATC	145	CAGAAGAGACAGTCAAGCCTTC	146	59	55

49d	300	41	GGTTCTCCCATTTAGTGAAGGC	147	CAATTCAATTCTGTGCTAACAC	148	53, 58	54

50	314	50	CCTTTTTTATGTTTCTTAATGTG	149	ATGATGACAAAAGTTTAGGAAG	150	52, 58	55

51	269	46	GGAGGAGTTTATTAGAGG	151	ATGTAGGCTGTGTTTGGG	152	54	53

52	299	47	TAAATCTTAACATAATATAGGGG	153	TTAGATAAACTATCATTTCTGCC	154	52, 53	54

53	254	50	TTTGGTCACTATGTTCATTTAAC	155	AGATATTGAAGGGTATCAACTAC	156	57	51

54	240	48	CATTTTTTTTCTTCTCTACCATG	157	ACATTTCATTCATTTGTGTTTAC	158	55	52

55	281	47	AAGTTGTAGTTTATGGATTATG	159	TGCTTCTTTCTTATTATTTGAG	160	51, 56	54

56	272	49	GGGTGGATTTTTTTTCCTGGTC	161	AACTGATATGTACTTTAGTGCC	162	55	53

57	227	48	TTTATACTAGCACCTAACTCAG	163	CCACTGTGTATATTCATTTTCC	164	57	52

58	300	47	CATAATTGCCAATGAGATATAC	165	GTAAATGTGAATCTTTCAACAC	166	53	54

59	284	49	CTTCTCAGCATTGGCAATAATC	167	GAGCTGACTACATATAGATGAG	168	55	54

60	250	47	CAAAAATGTTTTATTCCAACTG	169	AAGATGTGGCTATTTAGAAGTC	170	52	53

61	283	47	TGAGTATTGATTATTGATAAAGG	171	CCACAGGATGTGTAATTTGAACC	172	52	54

62	270	48	GCAAATTGACTTATGTTTTTTGGGG	173	CATTCACTCCTTTAGTTAGCTC	174	52	53

63	208	48	GTGTATTGTCATATACTTACTCTCG	175	CTAGTTTTAGCGATTCCTGG	176	55	51

64	243	47	TTCTCTGGTTCTATATTTCC	177	CAGGTTACATAATACTAAGGAC	178	54	52

65	305	51	TTTGGACATGCTGGGATTATGG	179	TTCAGAAATTCCACCCTTCTCC	180	55	54

66	300	51	CCCATGTTTTCTTTTAGTAAGAGC	181	ATGAGCTGAAGCAAAGGTAGGC	182	56	54

67	301	48	TGAACTCACTGCTGCTCATCGG	183	TATCCTCTACATATTCTTTACAG	184	56	55

68	302	48	GGCAGAATGTGCATTAAATCTG	185	GGAGGAAGTGAGAATGAAAAAC	186	52	54

69	327	51	CAAGTGTATTCATATTGCTCTCTAG	187	GCCTAATGACAGATTAAGCAAG	188	57	55

70	333	48	CTAGCATAACAAGAAATAGATG	189	AATTTATGAGATGGCTTCATGC	190	53	55

71	301	53	GGAGTATGCACTTTCATTTTGC	191	ATGAGCTGTAAGGCTGACAATG	192	58	54

72	258	47	ATATTGAAGGACGGTTTAAGTG	193	TAAGTACATTTTCCATGTGTAC	194	54	53

73	409	52	CTGTGATGTTCTGGCTTTTTTC	195	ATTGCATTCCTCCATCTCAAAC	196	55	57

74	327	48	AGAATGCTAAAGTGAAAAACTC	197	GTTTTGAAATAGAAACAGAGAG	198	54	55

75	276	52	CTGCTGAGTGTAGTTTATCATG	199	GAGTGAAACTGGCTCATCCTTC	200	56, 58	53

76	267	48	TTTAAAAGCATGGAAACAGGAC	201	TATAATTGTCTCTATTTATGGC	202	54, 55	53

77	367	51	AGGAAATCAAACACTATGATGC	203	GATATCATGCACAAGAGCTGTG	204	56	56

78	342	46	TATGCTATTTCTACTTAAAAATTG	205	TTTGTTGGTACAATAACTTAGAGG	206	52	55
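For a rough sense of the primer melting temperatures behind Table 3, the Wallace rule (Tm ≈ 2 °C per A/T plus 4 °C per G/C) can be applied to the listed primers. The rule is a crude approximation intended for short oligonucleotides and is used here only as an illustration; the source does not state how the Table 3 annealing temperatures were chosen.

```python
def wallace_tm(primer: str) -> int:
    """Wallace-rule Tm (deg C): 2 per A/T plus 4 per G/C."""
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc


# Forward primers for exon fragments 1 and 3 (SEQ ID NOs: 39 and 43)
for name, primer in [("exon 1 F", "GCACCAACTCCGCAGAAC"), ("exon 3 F", "CTCTGAAGATAGAATACC")]:
    print(name, len(primer), wallace_tm(primer))
```

These give 58 °C and 50 °C against listed annealing temperatures of 57 °C and 43 °C, a reminder that such estimates only bracket the empirically optimized conditions.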

Sequence Analysis
The intron/exon structure of PKHDL1 was determined by comparison with genomic sequence using MacVector 7.0 and SIM4 (pbil.univ-lyon1.fr/sim4.html). The sequence of the murine Pkhdl1 transcript was determined by comparison of human PKHDL1 sequence and the Pkhdl1 cDNA clone, D86, to mouse genomic sequence using MacVector. The genomic sequence was used as the authentic sequence for the human and murine transcripts and the numbering of the transcript was from the start codon.
BLAST was used to screen for homologous sequences in the GenBank database. Comparisons between orthologs, and between fibrocystin and fibrocystin-L, were made with BLAST2 and the Pustell protein/DNA sequence alignment tool of MacVector. Protein domains were defined using the Pfam database (pfam.wustl.edu). Protein topology was analyzed with the programs SOSUI (sosui.proteome.bio.tuat.ac.jp/sosuiframe0.html), TMHMM (v2.0) (cbs.dtu.dk/services/TMHMM-2.0), and SignalP (v2.0) (cbs.dtu.dk/services/SignalP/). Potential N-glycosylation and phosphorylation sites were identified with MacVector, and alignments were made with the ClustalW (v1.4) program within MacVector.
GenBank Accession Numbers
Fibrocystin (human) AAL74290; (mouse) AAN05018; D86 (mouse), NP_619615; DKFZp586C1021, XP_488-444; PKHDL1 cDNA clone ADBBEB10 (5′), AV706327; PKHDL1 genomic sequence (human) RP11-419L20, AC021001; (mouse) NW_000106 (44,650K-44,800K). Fibrocystin-L related proteins: HGFR (mouse), NP_032617; plexin 1 (mouse), NP_032901; TMEM2 (human), NP_037522; XP_051860 (human); hypothetical protein from C. aurantiacus, ZP_00018581; and hypothetical protein from T. tengcongensis, NP_621862. GenomeScan and other predicted proteins similar to fibrocystin-L: Hs8 8205 30 32 2; Hs 8205 30231; Hs 8205 30 232 (human); LOC271264 (mouse), XP_194970. Fugu PKHDL1 genomic sequence, Scaffold 2621, CAAB01002621. PKHDL1 and Pkhdl1 cDNA sequences, AY219181 and AY219182, respectively.

Example 1

Identification and Cloning of PKHDL1 and Pkhdl1

To identify the human ortholog of D86, the 1945 aa protein sequence was analyzed against the human genomic sequence by BLAST. The likely human ortholog was identified by a strong hit in genomic sequence from chromosome region 8q23 in the BAC clone RP11-419L20. Comparison of the genomic sequence to full-length fibrocystin using Pustell protein/DNA sequence alignment and BLAST showed that the homology on chromosome 8 extended over most of the length of the disease related protein, covering at least 150 kb of genomic DNA. This region not only contained homology to D86 but also matched the previously described cDNA, DKFZp586C1021, that is similar to the 3′ region of PKHD1, indicating that these cDNAs are part of the same large gene.
To clone the full-length human D86 ortholog (the PKHD-like 1 gene, PKHDL1), an RT-PCR exon-linking approach was used with primers located in exons strongly predicted by GenomeScan and NIX analysis, and by homology with fibrocystin. The full-length transcript was cloned as 16 overlapping fragments (see Table 1 for details), and the 5′ and 3′ ends of the mRNA were identified and cloned by RACE strategies (as described above). RNA from human lung and adrenal was used for the RT-PCR, and all products were cloned and sequenced. There was some evidence of alternative splicing, but in each case sequence from the largest amplified fragment was assembled into a contig containing an ORF of 12,729 bp. PKHDL1 has a 5′ untranslated region (UTR) of 104 bp, and the putative start codon is the first in-frame ATG in the sequence. The start codon does not strongly match the Kozak consensus, but overall 5 of the 13 consensus positions, including +4 and −2, match. The 3′ UTR is 248 bp and has a typical polyadenylation signal 21 bp upstream of the polyA addition site. The total transcript is 13,081 bp.
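The 5-of-13 comparison can be sketched as position-by-position matching against a 13-nt Kozak consensus spanning positions −9 to +4. The consensus string used here, GCCGCCRCCATGG (R = A or G), is the commonly cited vertebrate consensus and is an assumption about how the positions were counted; the PKHDL1 start-codon context is not reproduced in this section, so the inputs below are hypothetical.

```python
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT"}
KOZAK = "GCCGCCRCCATGG"  # positions -9..-1, the ATG (+1..+3), and +4


def kozak_matches(context: str) -> int:
    """Number of the 13 consensus positions matched by a -9..+4 sequence window."""
    if len(context) != len(KOZAK):
        raise ValueError("context must span positions -9 to +4 (13 nt)")
    return sum(base in IUPAC[code] for base, code in zip(context.upper(), KOZAK))


print(kozak_matches("GCCGCCACCATGG"))  # 13: perfect consensus
print(kozak_matches("TTTTTTTTTATGA"))  # 3: only the ATG itself matches
```

Under this assumption the ATG accounts for three of the 13 positions, so a count of 5 means only two positions outside the start codon match, consistent with the +4 and −2 positions named in the text.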
Comparison of the PKHDL1 transcript to the genomic sequence showed that the gene contains 78 exons (see Table 4 for details) and the total genomic size of the gene is 167,918 bp.
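The amino acid columns of Table 4 follow from the nucleotide columns: coding nucleotide n lies in codon ⌈n/3⌉, and a codon split across an exon junction is counted in both exons, which is why consecutive rows share an amino acid number (25 for exons 1 and 2, for example). A minimal consistency check using a few rows of Table 4 (exon 1 is skipped because its listed size includes the 5′ UTR); helper names are illustrative.

```python
def codon_span(cds_start: int, cds_end: int) -> tuple[int, int]:
    """Codon numbers containing the first and last coding nucleotides (1-based, from the A of ATG)."""
    return (cds_start + 2) // 3, (cds_end + 2) // 3


def row_is_consistent(size: int, nt_start: int, nt_end: int, aa_start: int, aa_end: int) -> bool:
    """Check a Table 4 row: exon size matches its coordinates and codons match the aa range."""
    return nt_end - nt_start + 1 == size and codon_span(nt_start, nt_end) == (aa_start, aa_end)


# (exon, size, nt start, nt end, aa start, aa end) from Table 4
rows = [
    (2, 90, 74, 163, 25, 55),
    (13, 269, 1013, 1281, 338, 427),
    (38, 985, 4792, 5776, 1598, 1926),
    (49, 1030, 7384, 8413, 2462, 2805),
]
for exon, *values in rows:
    print(exon, row_is_consistent(*values))
```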
Two splice donor sites (for IVS 8 and IVS 67) have the non-canonical GC dinucleotide rather than the typical GT. As is often the case for the ~0.5% of splice donor sites that have a GC, the rest of the donor sequence at both sites closely matches the splice-site consensus. The transcriptional start of PKHDL1 is associated with a CpG island.
Many PKHDL1 exons were identified by gene prediction programs; however, GenomeScan defined the human and murine genes as three and two different genes, respectively (see Methods for details). Of the 78 PKHDL1 exons, 53 were predicted correctly, 3 had one different splice junction (one associated with the GC splice donor), and 22 were not predicted. A further 6 exons were predicted that were not found in the final transcript. Therefore, although these prediction programs are helpful for identifying exons, RT-PCR and sequencing were required to define the most likely gene sequence. In this case, the availability of the murine D86 cDNA of Pkhdl1 (exons 1-38) and the human cDNA DKFZp586C1021 (exons 69-78) helped determine the structure of the gene.
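The prediction figures above can be summarized with the exon-level sensitivity and specificity measures commonly used to benchmark gene finders: sensitivity is the fraction of real exons predicted exactly, and specificity is the fraction of predicted exons that are exactly right. Counting the 3 exons with one different splice junction as predicted but not correct is an assumption of this sketch, and the function name is illustrative.

```python
def exon_accuracy(correct: int, partial: int, missed: int, spurious: int) -> tuple[float, float]:
    """Exon-level (sensitivity, specificity); `partial` exons count as predicted but wrong."""
    actual = correct + partial + missed        # exons in the final transcript
    predicted = correct + partial + spurious   # exons the program proposed
    return correct / actual, correct / predicted


sn, sp = exon_accuracy(correct=53, partial=3, missed=22, spurious=6)
print(f"sensitivity {sn:.2f}, specificity {sp:.2f}")  # sensitivity 0.68, specificity 0.85
```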

TABLE 4

Intron/exon structure of human and murine PKHDL1

Exon number	Exon size (nt)^▴	Coding region position^□ (nt)	Coding region position^□ (aa)	Human IVS size (nt)	Mouse IVS size (nt)

1	177*	1-73	1-25	1893	1767
2	90	74-163	25-55	16733	6795
3	145	164-308	55-103	948	916
4	109	309-417	103-139	1498	?
5	58	418-475	140-159	1409	1994
6	94	476-569	159-190	2866	4111
7	54	570-623	190-208	528	491
8	74	624-697	208-233	1299^†	980
9	43	698-740	233-247	3920	4065
10	71	741-811	247-271	1541	1440
11	111	812-922	271-308	2321	1926
12	90	923-1012	308-338	1527	2256
13	269	1013-1281	338-427	1152	464
14	92	1282-1373	428-458	2965	1695
15	160	1374-1533	458-511	281	377
16	136	1534-1669	512-557	1204	647
17	144	1670-1813	557-605	1570	606
18	158	1814-1971	605-657	1658	1281
19	114	1972-2085	658-695	2286	1661
20	150	2086-2235	696-745	1006	580
21	125	2236-2360	746-787	5551	1080
22	164	2361-2524	787-842	1257	173
23	173	2525-2697	842-899	4394	5891
24	148	2698-2845	900-949	1769	1093
25	155	2846-3000	949-1000	2183	1908
26	123	3001-3123	1001-1041	469	1128
27	106	3124-3229	1042-1077	3086	1900
28	111	3230-3340	1077-1114	1973	1467
29	165	3341-3505	1114-1169	983	869
30	122	3506-3627	1169-1209	1864	1753
31	133	3628-3760	1210-1254	440	764
32	196 (193)	3761-3956	1254-1319	1617	1516
33	143	3957-4099	1319-1367	422	602
34	105	4100-4204	1367-1402	627	639
35	189	4205-4393	1402-1465	750	739
36	171	4394-4564	1465-1522	559	312
37	227	4565-4791	1522-1597	758	731
38	985	4792-5776	1598-1926	2497	2813
39	249	5777-6025	1926-2009	946	637
40	150	6026-6175	2009-2059	1487	1821
41	175	6176-6350	2059-2117	974	921
42	157	6351-6507	2117-2169	437	370
43	157	6508-6664	2170-2222	1292	1402
44	80	6665-6744	2222-2248	476	670
45	130	6745-6874	2249-2292	1409	1073
46	130	6875-7004	2292-2335	3203	2392
47	242	7005-7246	2335-2416	1935	1750
48	137	7247-7383	2416-2461	2307	980
49	1030	7384-8413	2462-2805	1332	?
50	192	8414-8605	2805-2869	8348	3611
51	152	8606-8757	2869-2919	1238	1191
52	160	8758-8917	2920-2973	557	727
53	172	8918-9089	2973-3030	2154	970
54	89	9090-9178	3030-3060	351	498
55	149	9179-9327	3060-3109	1293	1728
56	130	9328-9457	3110-3153	1424	1253
57	119	9458-9576	3153-3192	1938	728
58	130	9577-9706	3193-3236	1474	1504
59	174	9707-9880	3236-3294	3130	1839
60	104	9881-9984	3294-3328	916	1864
61	130	9985-10114	3329-3372	771	1099
62	122	10115-10236	3372-3412	1666	375
63	91	10237-10327	3413-3443	3167	3522
64	149	10328-10476	3443-3492	82	83
65	123	10477-10599	3493-3533	1189	480
66	112	10600-10711	3534-3571	81	80
67	117	10712-10828	3571-3610	5555^†	3776^†
68	166	10829-10994	3610-3665	3170	3067
69	233	10995-11227	3665-3743	201	190
70	168	11228-11395	3743-3799	2512	2503
71	158	11396-11553	3799-3851	4235	1416
72	136	11554-11689	3852-3897	2861	2838
73	342	11690-12031	3897-4011	3677	2164
74	152	12032-12183	4011-4061	406	2434
75	147	12184-12330	4062-4110	342	283
76	154	12331-12484	4111-4162	3397	1837
77	237 (258)	12485-12721	4162-4241	3059	4026
78	254^‡ (209)^▪	12722-12729	4241-4243

^▴Mouse size in brackets if different from human.
*Human 5′ UTR = 104 nt; mouse 5′ UTR not determined.
^†Atypical GC splice donor.
^□Positions are the same in human and mouse up to exon 32; from exon 32 to exon 77 the mouse position is 1 codon less, and in exons 77-78 it is 6 codons more.
^‡Human 3′ UTR = 248 nt.
^▪Mouse 3′ UTR = 201 nt.
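Each amino acid interval in Table 4 follows from its nucleotide interval. The Python sketch below assumes 1-based coding coordinates starting at the A of the initiating ATG, so codon n occupies nucleotides 3n-2 to 3n; `codon_span` is a hypothetical helper, not a published tool. A codon split across an exon boundary (for example codon 55, shared by exons 2 and 3) is listed for both exons, as in the table.

```python
def codon_span(nt_start: int, nt_end: int) -> tuple[int, int]:
    """Return the first and last codon touched by a coding-nucleotide interval.

    Assumes 1-based coding coordinates, so codon n spans nucleotides
    3n-2..3n and nucleotide p lies in codon (p + 2) // 3.
    """
    if nt_start < 1 or nt_end < nt_start:
        raise ValueError("expected 1 <= nt_start <= nt_end")
    return (nt_start + 2) // 3, (nt_end + 2) // 3


# A few rows transcribed from Table 4: exon -> (coding nt start, coding nt end)
ROWS = {2: (74, 163), 5: (418, 475), 75: (12184, 12330), 76: (12331, 12484)}

for exon, (start, end) in ROWS.items():
    size = end - start + 1          # coding nucleotides in this exon
    print(exon, size, codon_span(start, end))
# 2 90 (25, 55)
# 5 58 (140, 159)
# 75 147 (4062, 4110)
# 76 154 (4111, 4162)
```

Applying the same helper to every row is a quick way to catch transcription slips in a table of this size.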

To confirm the structure of PKHDL1 and determine whether the D86 cDNA contained the entire mouse ortholog, the sequence of the mouse transcript was determined. The human transcript was compared to murine genomic sequence by BLAST, and the NW000106 mouse supercontig was identified, indicating that Pkhdl1 is located in mouse chromosome region 15B3. Strong similarity between PKHDL1 and the murine genomic sequence (together with the D86 cDNA) enabled the full-length Pkhdl1 ORF of 12,747 bp to be predicted. The mouse ORF from the D86 cDNA to the 3′ UTR also was cloned as 14 RT-PCR fragments (see Table 2), confirming the structure of the gene. The intron/exon structure of Pkhdl1 is the same as that of its human counterpart, with 78 exons, and all exon sizes except those of exons 1, 32, 77 and 78 are the same in human and mouse (see Table 4 for details). The atypical GC splice donor of exon 67 is conserved in the mouse, but the exon 8 donor is GT in this organism. The murine Pkhdl1 gene is also associated with a CpG island.

Example 2

Tissue Expression of PKHDL1

Initially, human and mouse PKHDL1 cDNAs were hybridized to multiple tissue Northern blots, but no clear bands were visualized; only faint smears were seen in many lanes (FIG. 1 a). The difficulty of visualizing PKHDL1 as a specific transcript by Northern blotting appeared similar to that seen with PKHD1. In the case of PKHDL1, the problem of resolving the transcript as a discrete band was compounded by its low level of expression. The faint smears on the Northern blot may reflect particular sensitivity of this transcript to degradation or, as for PKHD1, the presence of multiple alternatively spliced transcripts. Therefore, RT-PCR was used to examine the tissue expression of this gene.
Analysis of adult human material showed expression in most tissues. A fuller range of tissues could be analyzed in mouse, and this also showed low expression levels, with the product detected in most newborn and adult tissues after many PCR cycles (see FIG. 1 b). PKHDL1/Pkhdl1 therefore appeared to be expressed at a low level in most tissue types. This widespread low-level expression may reflect the association of a CpG island with the promoter of this gene, as CpG islands are often associated with widely expressed genes.
There was evidence of possible alternative splicing of PKHDL1, as several RT-PCR reactions generated more than one product. GC splice donors, such as those found at two PKHDL1 exon junctions, are also often associated with alternative splicing. Furthermore, one fully sequenced human adrenal gland cDNA clone, ADBBEB10, extends exon 18 by a further 886 bp and leads to a novel 3′ UTR with an atypical ATTAAA polyadenylation signal shortly before the site of polyA addition. D86, the originally described transcript of the 5′ part of Pkhdl1, has an extension of exon 38 into IVS38, producing a 3′ UTR of 62 bp, although no clear polyadenylation site is present in this sequence. It therefore seems likely that PKHDL1, like PKHD1, generates many alternatively spliced transcripts, some predicted to produce secreted proteins (such as ADBBEB10 and D86), as well as the membrane-bound form encoded by the longest ORF. There are significant regions of breakdown in homology between fibrocystin and fibrocystin-L (see FIG. 2 a), and alternative splice forms of either protein may match the other homolog better.

Example 3

Are Mutations to PKHDL1 Associated with ARPKD?

Previous mutation analysis of PKHD1 identified a population of clinically well-characterized ARPKD patients in whom no mutation was found. The homology of PKHDL1 to PKHD1 and its expression in kidney and liver suggested that it could be a candidate second ARPKD gene. To test this hypothesis, ARPKD patients from seven families without definite PKHD1 mutations were screened for base-pair changes throughout the gene, and the gene was partially screened in a further 13 PKHD1-mutation-negative ARPKD patients (see Methods for details). The 78 PKHDL1 coding exons and flanking intronic sequences were amplified from genomic DNA and analyzed for base-pair mismatches by denaturing high-performance liquid chromatography (DHPLC; see Methods for details). This analysis revealed 17 exonic changes (see Table 5 for details), including changes that are silent at the amino acid level as well as conservative and non-conservative substitutions. No nonsense or deletion/insertion mutations were found. Seven non-conservative changes were screened in normal controls; five of these were found in that population (see Table 5), but two, Q/P702 and L/S1199, were not detected. Analysis to determine whether these two substitutions segregated with the disease was uninformative. The L/S1199 change, however, is probably not ARPKD-associated, as the family in which it was detected (M52) also has the PKHD1 substitution T36M. Although the significance of T36M was initially unclear, finding this change in other ARPKD families showed that M52 is a typical (PKHD1-mutated) ARPKD family. Q/P702 is conserved in the mouse ortholog but not in fibrocystin (where the corresponding residue is aspartic acid), and the pathogenic significance of this change remains unclear.
In summary, analysis of PKHDL1 in a group of ARPKD patients without detected PKHD1 mutations revealed a number of missense changes but no inactivating mutations (see Table 5). Although one non-conservative change was found only in an ARPKD family, overall the data did not provide compelling evidence associating this gene with ARPKD; this possibility, however, cannot be entirely excluded. The lack of association of PKHDL1 with ARPKD is consistent with the major sites of expression of this gene being in blood cell lineages.

TABLE 5

Detected variants in PKHDL1

Designation	Amino acid position/nucleotide change	Exon	Allele frequency, ARPKD population	Allele frequency, normal controls

I/V164	490A/G	6	3/14
1227G/A	K409	13	3/14
1404C/T	Y468	15	2/14
1920T/C	N640	18	1/14
1965A/G	E655	18	1/14
Q/P702	2105A/C	20	1/14	0/200
T/A1192	3574A/G	30	4/14	13/100
L/S1199	3599T/C	30	1/14	0/100
G/V1223	3668G/T	31	1/14	1/200
R/S1514*	4540C/A	36	4/14	17/100
V/I1607	4819G/A	38	1/14
Y/C1638	4913A/G	38	3/14	12/100
6621C/G	L2207	43	3/14
9084A/T	T3028	53	3/14
H/Q3050	9150C/G	54	4/14	14/64
D/E3607	10821C/A	67	1/14
V/I4220*	12658G/A	77	1/14

*Polymorphic change detected in cDNA sequence
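The nucleotide notation in Table 5 can be unpacked mechanically. The Python sketch below uses hypothetical helpers (not from the original study) that parse entries such as "2105A/C" into a coding position, reference base and alternate base, and report the affected codon and the position within that codon; it assumes the same 1-based coding coordinates as Table 4. A second helper converts allele counts such as "13/100" into frequencies.

```python
import re

# Nucleotide notation used in Table 5, e.g. "2105A/C" (position, reference/alternate base).
NT_CHANGE = re.compile(r"^(\d+)([ACGT])/([ACGT])$")


def locate_variant(notation: str) -> dict:
    """Parse a Table 5 nucleotide change and locate it within its codon."""
    match = NT_CHANGE.match(notation)
    if match is None:
        raise ValueError(f"unrecognized variant notation: {notation!r}")
    position = int(match.group(1))
    codon_index, offset = divmod(position - 1, 3)   # 0-based codon and offset
    return {
        "position": position,
        "ref": match.group(2),
        "alt": match.group(3),
        "codon": codon_index + 1,
        "codon_position": offset + 1,
    }


def allele_frequency(count: str) -> float:
    """Convert an allele count such as '13/100' into a frequency."""
    carriers, total = (int(x) for x in count.split("/"))
    return carriers / total


print(locate_variant("2105A/C"))   # Q/P702: second base of codon 702
print(allele_frequency("13/100"))  # 0.13
```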

Example 4

Cellular Expression of PKHDL1

To determine the cell types that express PKHDL1, tissue-specific cell lines were analyzed by RT-PCR (FIG. 1 c). The highest levels of expression were in K562, an erythroleukemia cell line, and in stimulated T-cells; the only other expressing cells were EBV-transformed lymphoblasts. Expression limited to blood-derived cells is consistent with the database description of D86 as a lymphocyte-secreted protein. Furthermore, the described murine Pkhdl1 ESTs originate from thymus (n=5) and lymph node (n=2), as well as adrenal (n=2).
Detection of the PKHDL1 transcript in organs composed entirely of immune cell subtypes (spleen and thymus), as well as in activated T-cells and B lymphoblasts, suggested that PKHDL1 expression may be important within cells of the immune system. To determine whether expression is confined to specific immune cell populations or to states of immune activity, flow cytometric sorting of cells from murine lymphoid organs and in vitro activation protocols were carried out (see Methods for details). RT-PCR analysis of RNA isolated from these cells detected Pkhdl1 at low cycle number only in activated bulk T-cells and in purified CD4^+ve (helper) and CD8^+ve (cytotoxic) T-cells (FIG. 1 d). Strong in vitro stimulation of B-cells and inflammatory macrophages did not result in high-level expression. At high cycle number, expression was also detectable in CD4^+ve CD8^+ve (double-positive) thymocytes, resting naïve and memory B-cells, unstimulated and stimulated peritoneal macrophages, resting CD4^+ve and CD8^+ve T-cells, NKT-cells, and both CD8^+ve and CD8^−ve dendritic cells.
In summary, analysis of highly purified cell populations from murine lymphoid organs indicated that fibrocystin-L is up-regulated in T-cells following activation and, therefore, may serve a specific function in cellular immunity. Increased expression of mRNA in purified helper (CD4^+ve) and cytotoxic (CD8^+ve) T-cells was detected following activating stimuli delivered by lectins, allogeneic antigen presenting cells (APCs), and immobilized antibodies to the T-cell receptor and the co-stimulatory receptor CD28. In contrast, strong activation stimuli failed to induce up-regulation of Pkhdl1 in memory phenotype B-cells and inflammatory macrophages.

Example 5

The Structure of Fibrocystin-L

Analysis of the longest ORF of the PKHDL1 sequence allowed the structure of the corresponding protein, termed fibrocystin-like (fibrocystin-L), to be determined. Fibrocystin-L was predicted to be larger than fibrocystin, with 4243 aa and a calculated unglycosylated molecular mass of 466 kDa. A signal peptide was predicted at the N-terminal end, with cleavage after the sequence CAA (see FIG. 2 a). Analysis of likely transmembrane regions in fibrocystin-L gave conflicting results, but the most likely structure (predicted by SOSUI) has a single transmembrane domain, from 4213 aa to 4235 aa, leaving a short cytoplasmic tail of 8 aa. The predicted topology was therefore similar to that of fibrocystin, with a large (4212 aa) extracellular region and a single transmembrane pass (FIG. 3). The extracellular region contains 56 putative N-linked glycosylation sites, indicating that this region may be highly glycosylated. A single potential protein kinase C phosphorylation site is found in the C-terminal tail at position 4239.
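The topology figures follow directly from the residue numbers above. The Python sketch below reproduces them; the mean residue mass of ~110 Da is a rule-of-thumb assumption, so its estimate (~467 kDa) only approximates the 466 kDa calculated from the actual sequence. The 4212 aa extracellular count runs from residue 1 and therefore includes the signal peptide.

```python
# Topology arithmetic for fibrocystin-L, using the residue numbers given in the text.
LENGTH_AA = 4243                  # full-length predicted protein
TM_START, TM_END = 4213, 4235     # SOSUI transmembrane segment (1-based, inclusive)
AVG_RESIDUE_DA = 110.0            # assumption: rule-of-thumb mean residue mass

tm_length = TM_END - TM_START + 1        # residues spanning the membrane
cytoplasmic_tail = LENGTH_AA - TM_END    # residues after the TM segment
extracellular = TM_START - 1             # residues before the TM segment (includes signal peptide)
approx_mass_kda = LENGTH_AA * AVG_RESIDUE_DA / 1000

print(tm_length, cytoplasmic_tail, extracellular)  # 23 8 4212
print(round(approx_mass_kda))                      # 467
```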
The protein most similar to fibrocystin-L is fibrocystin, which is homologous from the N-terminal end to 4185 aa (see FIG. 2 a). In this region, the two proteins show 25.0% identity and 41.5% similarity. There is no significant homology between the two proteins in the transmembrane or short cytoplasmic regions. As with fibrocystin, the most clearly recognized protein domain in fibrocystin-L is the TIG/IPT domain. Analysis by Pfam indicated that fibrocystin-L contains 14 copies of this immunoglobulin-like fold: three immediately after the signal peptide and the remaining 11 in tandem from 1067 aa to 2177 aa, with a gap between 1470 aa and 1566 aa; almost one-third of the protein is in TIG domains (FIGS. 2 a and 3). FIG. 2 b shows that all the TIG domains closely match the TIG consensus and are similar to the corresponding domains of fibrocystin, the HGFR and members of the plexin family. In all the fibrocystin-L TIG domains apart from TIG10, the cysteine residues that stabilize the domain by forming a disulfide bond are present.
A second region of significant homology is to a protein of unknown function, TMEM2, and two related proteins. This region of homology also was noted with fibrocystin. Two TMEM regions of homology with the fibrocystins have been defined: TMEM domain-A (2180 aa-2375 aa) and -B (3032 aa-3376 aa; FIGS. 2 a and c). TMEM-B extends further N-terminally than the area of homology previously described with fibrocystin. Interestingly, this homology is not only with the previously described proteins, TMEM2 and XP051860, but also with a newly described hypothetical protein from the thermophilic, filamentous, photosynthetic bacterium Chloroflexus aurantiacus. Indeed, the bacterial protein is more clearly related to the TMEM domains of fibrocystin and fibrocystin-L than either of the other human proteins, as it does not need to be gapped to align with them (FIG. 2 c).
In summary, the description of a second member of the fibrocystin protein family has helped to refine the likely structure of these proteins. A notable difference between fibrocystin-L and fibrocystin is the length of the predicted cytoplasmic tail: in fibrocystin-L it is only 8 aa, whereas in fibrocystin it is 192 aa and contains several possible PKA, PKC and casein kinase phosphorylation sites. Although the short human fibrocystin-L tail has a single potential PKC site, this is not conserved in the mouse. Fibrocystin-L is predicted to have 14 TIG domains, far more than the seven predicted in fibrocystin by Pfam and other programs. Inspection of sequence alignments of the two proteins suggests that fibrocystin may have further TIG-like domains C-terminal to those defined previously (see FIGS. 2 a and 3). A second important region of homology, present twice in the fibrocystins, is with TMEM2 and related proteins. TMEM2, XP051860 and the newly described sequence from the filamentous bacterium C. aurantiacus have a single copy of the TMEM repeat; in the first two proteins it is interrupted by additional sequence (see FIG. 2 c). In the TMEM2, XP051860 and C. aurantiacus proteins, as in the fibrocystins, this region is predicted to be extracellular. As C. aurantiacus is the only sequenced prokaryote with this protein domain, this may be an example of horizontal gene transfer.

Example 6

Fibrocystin-L Orthologs and Homologs

Murine fibrocystin-L is predicted to have 4249 aa, with an overall identity of 81.9% and similarity of 90.0% to the human protein. This is higher than the corresponding figures of 72.6% and 83.1% for human and murine fibrocystin. Murine fibrocystin-L is predicted to have a signal peptide with cleavage at the position corresponding to that in the human protein and a similarly located single transmembrane domain, leaving a 6-residue C-terminal tail. Murine fibrocystin-L is also predicted to have 14 TIG domains and similar TMEM homology. Fifty-six N-linked glycosylation sites are predicted in the extracellular region of the protein, but the C-terminal tail does not contain a PKC site.
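Identity figures such as 81.9% depend on how an alignment is counted. The Python sketch below computes percent identity from a pre-computed pairwise alignment under the stated assumption that columns containing a gap are excluded from the denominator; BLAST and other programs may use different denominators, and similarity additionally requires a substitution matrix (for example BLOSUM62), which is not modeled here. The sequences are toy examples, not fibrocystin-L.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identical residues over columns where neither sequence has a gap.

    Assumption: gapped columns are excluded from the denominator; alignment
    programs differ on this choice, so results are only comparable when the
    same convention is used throughout.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not columns:
        return 0.0
    identical = sum(1 for a, b in columns if a == b)
    return 100.0 * identical / len(columns)


# Toy alignment (hypothetical sequences)
print(round(percent_identity("MKT-LLV", "MRTALLI"), 1))  # 66.7
```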
BLAST analysis for related proteins in other species for which the complete genomic sequence is available showed a fibrocystin-L ortholog in the fish Takifugu rubripes (Fugu). Strong similarity is seen with several predicted proteins and the corresponding genome sequence, Scaffold 2621. The Fugu PKHDL1 ortholog is encoded by a genomic region of ~30 kb. Interestingly, analysis of Fugu genomic sequence with fibrocystin identified only the PKHDL1 ortholog, and with a much lower score and E value than with fibrocystin-L. This indicates that Fugu has only a PKHDL1 ortholog and no PKHD1 ortholog. Analysis of available genomic sequence from other eukaryotes and prokaryotes revealed no clear full-length orthologs of PKHD1 or PKHDL1. However, other significant regions of homology were detected in these species. The strongest homology was with the TMEM domain in C. aurantiacus, as described above. The next most significant region was in a conserved hypothetical protein from the bacterium Thermoanaerobacter tengcongensis that has 11 TIG domains, two from 246 aa to 332 aa and nine tandemly arranged from 580 aa. This protein also has a fibronectin type III domain and a signal peptide, indicating that it is a secreted protein. Other high-scoring homologies were with other TIG domain proteins, most notably plexin-like proteins, which typically have four such domains.

Example 7

EST Analysis

Expression data for fibrocystin-L were examined in human and mouse EST libraries (Table 6) using the NCBI UniGene database on the World Wide Web at ncbi.nlm.nih.gov/UniGene/. PKHDL1 was overrepresented in human ESTs originating from endometrial adenocarcinoma (16/58 clones; 27.6%). Of these, 8/16 (50%) were from well-differentiated, 6/16 (37.5%) from moderately differentiated and 2/16 (12.5%) from poorly differentiated endometrial adenocarcinoma. A number of other epithelial cancers also show upregulated expression of fibrocystin-L. The mouse ESTs occurred most commonly in thymus and pituitary gland (Rathke's pouch).
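The percentages in this paragraph follow directly from the clone counts in Table 6. A short Python sketch reproducing them (the helper name `percent` is illustrative):

```python
def percent(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Endometrial adenocarcinoma ESTs as a share of all human PKHDL1 ESTs
print(percent(16, 58))   # 27.6

# Differentiation grade among the 16 adenocarcinoma ESTs
for grade, n in [("well", 8), ("moderate", 6), ("poor", 2)]:
    print(grade, percent(n, 16))
# well 50.0
# moderate 37.5
# poor 12.5
```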

TABLE 6

Tissue distribution of ESTs for PKHDL1 in human and mouse in the UniGene database

Human	Number	Mouse	Number

Uterus (adenocarcinoma)	16	Thymus	5
Other	8	Pituitary - Rathke's Pouch	4
Germinal B cell center; lymph node	7	Embryo	3
Liver	4	Lymph node	2
Brain	3	Mixed	2
Pancreas	3	Brain	1
Adrenal	2	Whole mouse	1
Muscle	2	One cell embryo	1
Mixed	2	Urinary bladder	1
Pooled glandular	2	Spleen	1
Head and neck	1	Colon tumor	1
Fetal	1
Pooled organ	1
Mixed embryo	1
Vascular	1
Adipose	1
Larynx	1
Lung	1
Pituitary	1
Total Human ESTs	58	Total Mouse ESTs	21

RT-PCR of human and mouse tissues and cell lines demonstrated PKHDL1 expression in kidney, adrenal, brain, liver, lung, placenta, ovary, and tonsil. A limited number of human cancer tissues were also examined: by semi-quantitative RT-PCR using GAPDH as a control, PKHDL1 expression was 1.7-fold higher in endometrial cancer than in normal endometrium.
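Semi-quantitative RT-PCR fold changes are commonly computed by normalizing the target signal to a reference gene within each sample. The Python sketch below assumes that the 1.7× figure was derived in this way from band intensities normalized to GAPDH; the function name and intensity values are hypothetical, chosen only to illustrate the calculation, and are not measured data.

```python
def normalized_fold_change(target_case: float, ref_case: float,
                           target_control: float, ref_control: float) -> float:
    """Fold change of a target relative to control after reference normalization.

    Each target signal is divided by the reference (e.g. GAPDH) signal from the
    same sample, then the normalized case value is divided by the normalized
    control value.
    """
    if min(ref_case, ref_control, target_control) <= 0:
        raise ValueError("reference and control signals must be positive")
    return (target_case / ref_case) / (target_control / ref_control)


# Hypothetical band intensities (arbitrary units), not measured data
fold = normalized_fold_change(target_case=3400, ref_case=10000,
                              target_control=2000, ref_control=10000)
print(round(fold, 2))  # 1.7
```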

Example 8

Manufacture and Characterization of Fibrocystin-L Antibodies

A NusA/(His)₆-tagged N-terminal fragment of human fibrocystin-L was expressed as follows and used to immunize mice. A fragment of PKHDL1 encoding the third TIG domain of fibrocystin-L (amino acids 288-656) was amplified from EST clone ADBBEB10 (human adrenal tissue), which contains the 5′ sequence of PKHDL1, and cloned into pZERO. It was then subcloned into a modified pET-43a⁺ (Novagen) vector with a tobacco etch virus (TEV) protease site between the NusA/His₆ ORF and the PKHDL1 region. See FIGS. 8A and 8B. The (His)₆NusA-fibrocystin-L fusion protein was produced in E. coli AD494 and DE3 strains; soluble protein fractions were purified by metal affinity chromatography on an imidazole gradient followed by a round of size exclusion chromatography on a Superdex 200 column. The calculated molecular weight of the fusion protein (102.86 kDa, including NusA) was confirmed by SDS-PAGE of IPTG-induced bacterial cell lysates.
The PKHDL1 cDNA described above also was used to transfect PEAK cells (Edge Biosystems). Cells were transfected with the plasmid construct using Lipofectamine® (Invitrogen), and stable transfectants were selected with puromycin (2 μg/mL). This construct contains a C-terminal Pk tag (a 14-amino-acid sequence derived from the P and V proteins of the paramyxovirus simian virus 5, SV5) and an N-terminal FLAG tag after the signal peptide. The cDNA was prepared using an adaptor primer to destroy a BamHI site and generate a SapI site, allowing the PKHDL1 coding sequence to be subcloned with SapI/NotI ends into a PEAK plasmid (a modified pIRES PURO/PEAK 10 plasmid).
BALB/c ByJ mice were immunized with a subcutaneous injection of the purified recombinant (His)₆NusA-tagged fibrocystin-L (uncleaved) protein (see FIG. 8C) emulsified in complete Freund's adjuvant (Difco) after a protein concentration step by centrifugation and buffer exchange to PBS (Millipore; Amicon Ultra centrifugal filter). Twenty-eight days after immunization, mice were screened for their immune responsiveness to the antigen.
Antiserum from one mouse detected a strong band of the protein, as confirmed by western blot and by overlay of tagged protein constructs. The identity of the prokaryotically expressed protein was confirmed by mass spectrometry analysis of the purified protein excised from an acrylamide gel. Spleens were removed from immune-responsive mice three days after an intravenous boost with antigen.
Single cell suspensions were prepared, and red blood cells were removed by lysis with ammonium chloride potassium (ACK) buffer. Lymphocytes and F/O myeloma cells (a non-secreting myeloma line derived from SP2/0 BALB/c myeloma cells) were mixed at a ratio of 2:1 and centrifuged to form a cell pellet. The pellet was resuspended in 1 mL of a 50% solution of polyethylene glycol 1540 (Baker) in RPMI, incubated at 37° C. for 90 seconds, washed and resuspended in fresh Iscove's Modified Dulbecco's Medium (IMDM) with 10% fetal bovine serum (Hyclone). Aliquots of 100 μl were then added to each well of five microtiter plates (Costar 3595). Twenty-four hours later, 100 μl of IMDM culture medium in each microtiter well was replaced with fresh medium supplemented with 1M hypoxanthine, 4 mM aminopterin and 0.16M thymidine (HAT). Every 3-4 days thereafter, 100 μl of culture medium was replaced, successively over a period of approximately 14 days, with fresh medium containing HAT, then HT, and finally complete medium without HAT. Upon reaching 75% confluence, the culture supernatants were tested for the presence of fibrocystin-L antibody. The hybridomas of interest were cloned in limiting dilution cultures at 1 cell per microtiter well and later subcloned at 0.3 cells per microtiter well. BALB/c spleen cells served as feeder layers for fusions (5×10⁴ per well), cloning, and subcloning (3×10⁶ per well). Positive subclones were isotyped and cryopreserved.
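Limiting-dilution cloning is usually reasoned about with the Poisson distribution of cells among wells. Assuming cells are seeded independently and do not clump, the Python sketch below estimates how many wells receive cells and how likely an occupied well is to be monoclonal at the 1 and 0.3 cells-per-well densities used here. This is a standard textbook model, not a calculation reported in the source, and `poisson_wells` is a hypothetical helper.

```python
import math


def poisson_wells(mean_cells_per_well: float) -> dict:
    """Expected well occupancy for limiting-dilution plating.

    Assumes cells are distributed independently (Poisson), with no clumping.
    """
    lam = mean_cells_per_well
    if lam <= 0:
        raise ValueError("mean cells per well must be positive")
    p_empty = math.exp(-lam)
    p_single = lam * math.exp(-lam)
    p_occupied = 1.0 - p_empty
    return {
        "empty": p_empty,
        "single": p_single,
        "occupied": p_occupied,
        # chance that an occupied well was seeded by exactly one cell
        "single_given_occupied": p_single / p_occupied,
    }


for lam in (1.0, 0.3):
    wells = poisson_wells(lam)
    print(lam, round(wells["occupied"], 3), round(wells["single_given_occupied"], 3))
# 1.0 0.632 0.582
# 0.3 0.259 0.857
```

Under this model, lowering the density to 0.3 cells per well raises the expected fraction of monoclonal occupied wells from about 58% to about 86%, at the cost of many more empty wells, which is the usual rationale for subcloning at low density.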
Twenty-three reactive clones from 960 supernatants tested were detected in an initial screen using an Immunetics Miniblotter 28® system for western blotting and membrane preparations from transfected PEAK cells stably expressing the fibrocystin-L partial construct (confirmed by the detection of a GFP tag by immunofluorescence microscopy). See FIGS. 8C and 8D. Simultaneous western lanes were run using primary antibodies to FLAG and PK tags as positive controls. The reaction was detected with goat anti-mouse IgG secondary antibodies conjugated to horseradish peroxidase (Dako), and developed using a chemiluminescent substrate with the results being recorded on film.
Initial positivity was obtained by western blot for 14/960 hybridoma supernatants. Seven cloning plates subsequently showed band reactivity (from 192 wells), and monoclonal antibodies were generated (de St. Groth (1980) J. Immunol. Methods 35:1-21). Positive reactors underwent additional rounds of re-screening by western blot using the same system. All positive supernatants detected a single prominent reactive band migrating at ~75 kDa, consistent with the size of the expressed tagged protein construct; no other reactive bands were detected in these lysates. Final repeated screening of the chosen subcloned antibody supernatants yielded 10 monoclonal antibodies (Table 7).

TABLE 7

Monoclonal antibodies generated to fibrocystin-L and isotype details

	Name	Isotype

	FibLA-2	IgG3 κ
	FibLA-10.1	IgG1 κ
	FibLA-10.2	IgG1 κ
	FibLA-4.1	IgG1 λ
	FibLA-4.2	IgG1 λ
	FibLA-11.1	IgG1 λ
	FibLA-11.2	IgG1 λ
	FibLA-11.3	IgG1 λ
	FibLA-13.1	IgG1 λ
	FibLA-13.2	IgG1 λ

To confirm that these antibodies were sensitive and specific, each antibody was used to detect endogenous human fibrocystin-L in various tissues by Western blotting. Peroxidase conjugated secondary antibody (goat anti-mouse IgG) was used to detect the primary antibodies. Each antibody was able to detect endogenous fibrocystin-L.
Endometrial carcinoma tissue showed the highest level of fibrocystin-L protein expression, followed by moderate expression in activated human T lymphocytes, kidney, spleen and the erythroleukemia cell line K562. A high molecular weight protein was detected at ~500 kDa, with a second, smaller protein detected in some tissues; this smaller product was clearly present in kidney, endometrium and tonsillar tissue. Endogenous fibrocystin-L also was detected by Western blotting of mouse tissue membrane preparations with the FibLA-4.1 and FibLA-4.2 antibodies; the strongest levels were detected in kidney, followed by lung, thymus, liver, lymph node, spleen and endometrium.

Example 8

Tissue Microarray Studies

Tissue microarray blocks, including 60 normal human tissue sections, were made from formalin-fixed, paraffin-embedded archival tissue. The expression of fibrocystin-L was studied in the tissue microarray blocks using the FibLA-4.1 and FibLA-11.3 antibodies and compared with staining using an isotype control antibody. Staining intensity was graded on a scale of 0-4. Fibrocystin-L was expressed at a low level across many tissues (Table 8). In the normal tissue dataset, staining was most intense in fallopian tube epithelium; control slides stained with the IgG1 isotype control antibody were negative. Staining was also very prominent in thyroid epithelium (apical). Other sites with prominent staining were liver parenchymal cells, particularly around the central veins (control liver sections were negative); the adrenal cortex, in all three layers (zona fasciculata, zona reticularis and zona glomerulosa); gallbladder endothelium (apical and cytoplasmic); testis Leydig cells; breast duct epithelium; and spleen red pulp. Follicles, the sites of B cells, were negative.
Cilia are reported to occur at several of these unusual sites of staining. Cells of the testicular interstitium in fertile men (myoid cells, fibroblast-like cells and Leydig cells) have one cilium per cell in the 9+0 microtubule configuration (Takayama (1981) Int. J. Androl. 4:246-256). In thyroid follicular cells, cilia are thought to be in the 9+2 microtubule configuration. Although the number of cilia per cell seen in scanning electron microscopy studies has been debated, the most recent studies have determined that one 9+2 cilium is present on each follicular thyroid cell, extending into the follicular lumen, and abnormal secondary cilia have been observed in studies of follicular carcinoma (Martin (1988) Virchows Arch. B Cell Pathol. Incl. Mol. Pathol. 55:159-166). Single cilia are reported in the cells of rat cortex and medulla (Wheatley (1967) J. Anat. 101:223-237). Ward et al. have also previously detected polycystin-1 in Leydig cells of the testis and in the adrenal cortex (Ward et al. (1996) Proc. Natl. Acad. Sci. USA 93:1524-1528).

TABLE 8

Fibrocystin-L expression in normal human tissues

Tissue	FibLA-11.3	Intensity	Extent	FibLA-4.1	Intensity	Extent

Thyroid	1	2	3	1	1	2
Esophagus LS	1	1	3	0	0	0
Esophagus XS	1	0	0	1	0	0
Stomach Mucosa LS	1	1	2	1	0	0
Stomach Muscularis	1	0	0	1	1	3
Small Bowel	1	0	0	1	0	0
Sigmoid Colon Mucosa	1	0	0	1	0	0
Sigmoid Colon Muscularis	1	0	0	0	0	0
Colon Submucosa	0	0	0	1	1	1
Tonsil	1	1	3	1	2	3
Adrenal	1	2	3	1	1	3
Pancreas	1	1	1	1	1	2
Ureter	1	1	3	1	1	4
Bladder Wall	1	0	0	1	0	0
Bladder Mucosa	1	0	0	0	0	0
Kidney	1	2	3	1	2	3
Lymph Node (colon/rectum)	1	1	1	1	0	0
Spleen	1	2	3	1	2	2
Thymus, preinvoluted	1	0	0	1	0	0
Thymus, involuted	1	0	0	1	0	0
Liver	1	2	4	1	2	4
Skeletal Muscle	1	1	1	1	1	1
Lung (bronchioles)	1	1	1	1	0	0
Heart (epicardium)	1	0	0	0	0	0
Heart (myocardium)	1	1	1	1	0	0
Fallopian tube	1	3	4	1	3	4
Endometrium	1	1	2	1	0	0
Breast Stroma	1	0	0	1	0	0
Breast Ducts	1	1	4	1	0	0
Hypodermis	1	0	0	0	0	0
Skin (thin)	1	0	0	1	0	0
Hippocampus	1	1	4	1	0	0
Cervix	1	1	3	1	0	0
Placenta	1	1	4	1	1	3
Testis	1	2	1	1	2	1
Gall Bladder	1	2	4	1	1	4
Spleen	1	2	2	1	2	2
Prostate	1	2	3	1	1	2
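Table 8 can be queried programmatically to pick out the most strongly stained tissues. The Python sketch below uses a hand-transcribed subset of rows and assumes that tissues are ranked first by intensity and then by extent; this is one reasonable convention for illustration, not the scoring rule used in the study, and `rank_tissues` is a hypothetical helper.

```python
# A subset of rows transcribed from Table 8:
# tissue -> (FibLA-11.3 intensity, extent, FibLA-4.1 intensity, extent)
TABLE8 = {
    "Thyroid": (2, 3, 1, 2),
    "Tonsil": (1, 3, 2, 3),
    "Adrenal": (2, 3, 1, 3),
    "Kidney": (2, 3, 2, 3),
    "Liver": (2, 4, 2, 4),
    "Fallopian tube": (3, 4, 3, 4),
    "Gall Bladder": (2, 4, 1, 4),
    "Small Bowel": (0, 0, 0, 0),
}


def rank_tissues(table: dict, antibody: str = "11.3") -> list:
    """Sort tissues by (intensity, extent) for one antibody, strongest first.

    Python's sort is stable, so ties keep their order of appearance in the table.
    """
    offset = {"11.3": 0, "4.1": 2}[antibody]
    return sorted(table, key=lambda t: table[t][offset:offset + 2], reverse=True)


print(rank_tissues(TABLE8)[:3])          # ['Fallopian tube', 'Liver', 'Gall Bladder']
print(rank_tissues(TABLE8, "4.1")[0])    # Fallopian tube
```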

Example 9

Fibrocystin-L Expression in Kidney

Kidney tissue was examined by western blot and immunohistochemistry. Staining appeared predominantly in proximal tubules, with some distal tubule staining (cytoplasmic in both cases). A granular cytoplasmic staining pattern in distal tubules was also detected, but this may have been artifactual. No staining was detected with an isotype control IgG1 primary antibody tested simultaneously. Positively staining proteinaceous casts also were found in medullary collecting ducts of normal kidney. Fibrocystin-L expression levels were much lower than fibrocystin levels detected by immunohistochemistry.

Example 10

Studies of Fibrocystin-L in Endometrium

As there was an overrepresentation of PKHDL1 cDNAs in endometrial carcinoma tissue, and since female reproductive tract epithelium (normal fallopian tube) appeared to have highest expression levels of the protein, human endometrial carcinoma sections were examined in greater detail using additional tissue microarrays. Custom-made high quality microarrays were obtained through the Mayo Clinic tissue array core facility and Mayo Clinic Endometrial Cancer Working group as follows.
Five-micron sections were made from formalin-fixed, paraffin-embedded archival tissue sampled with 1-mm punches, mounted on glass slides and dewaxed. Endogenous peroxidase activity was blocked with 0.03% hydrogen peroxide, all sections were subjected to heat-induced epitope retrieval by steaming in EDTA for 40 minutes, and nonspecific binding sites were blocked with 5% bovine serum albumin (Sigma) in PBS (pH 7.5). Immunohistochemical staining was performed on a Dako Autostainer (DakoCytomation, Carpinteria, Calif.) using the Dako EnVision+ System-HRP with 3,3′-diaminobenzidine (DAB) (Dako; Code K4006) and a primary monoclonal antibody against fibrocystin-L (FibLA-4.1 and FibLA-11.3 were found to give consistent results for immunohistochemistry and western blotting). This system uses a two-step staining procedure that employs an HRP-labeled polymer conjugated to secondary antibodies. Because the labeled polymer contains no avidin or biotin, nonspecific staining from endogenous avidin-biotin activity, which is often problematic in kidney and liver tissue, was avoided. The immunostaining protocol was optimized for human endometrium tissue. After deparaffinization in xylene, slides were rehydrated through a graded alcohol series and placed in TBS-Tween 0.1%, followed by a 5-minute block of endogenous peroxidase with the Dako EnVision+ horseradish peroxidase kit. Antigen retrieval was then performed by steaming in 1 mM EDTA, pH 8.0, for 40 minutes, followed by rinsing and washing in TBS-Tween 0.1%. The primary antibody (FibLA-4.1 at a 1:50 dilution in fish blocking buffer) or a mouse IgG1 isotype control (1:500 dilution; R&D Systems; Catalog MAB002) was incubated on the tissue in a humidified container for 30 minutes, followed by a 5-minute wash in TBS-Tween 0.1% on a horizontal shaker. The secondary antibody was then added and incubated for a further 30 minutes.
After a further 5-minute wash in TBS-Tween 0.1%, peroxidase activity was visualized by incubating with DAB at room temperature for 10 minutes, followed by a rinse in H₂O. The slides then were counterstained with 1:20 hematoxylin for two minutes at room temperature.
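The working dilutions above (1:50 for the primary antibody, 1:500 for the isotype control) translate directly into pipetting volumes. The following is a minimal sketch that assumes "1:50" means 1 part antibody stock in 50 parts final volume, which is a common immunohistochemistry convention but is not stated in the protocol. The function names and the 1 mL working volume are hypothetical.

```python
def stock_volume_ul(final_volume_ul: float, dilution_factor: float) -> float:
    """Volume of antibody stock for a 1:dilution_factor working solution.

    Assumption: "1:50" means 1 part stock in 50 parts FINAL volume,
    so stock volume = final volume / 50.
    """
    if dilution_factor < 1:
        raise ValueError("dilution factor must be >= 1")
    return final_volume_ul / dilution_factor


def diluent_volume_ul(final_volume_ul: float, dilution_factor: float) -> float:
    """Volume of diluent (e.g. fish blocking buffer) to add to the stock."""
    return final_volume_ul - stock_volume_ul(final_volume_ul, dilution_factor)


if __name__ == "__main__":
    # Hypothetical 1 mL (1000 uL) working volume for each reagent.
    for name, factor in [("anti-fibrocystin-L clone 4.1", 50),
                         ("mouse IgG1 isotype control", 500)]:
        print(f"{name}: {stock_volume_ul(1000, factor):.1f} uL stock + "
              f"{diluent_volume_ul(1000, factor):.1f} uL diluent")
```

If a laboratory reads "1:50" as 1 part stock plus 50 parts diluent, the factor should be replaced by 51 in the calculation.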
Stained slides were scanned. The digitized images, together with a grid overlay linked to an Excel database of specimen information and pathologists' annotations, were viewed from a desktop PC on the Bacus Webslide® Browser software system (World wide web at bacuslabs.com). An experienced pathologist graded fibrocystin-L staining on the normal human tissue slides relative to epithelial staining intensity in the corresponding negative control slide. The reviewer assigned a score of 0 for no staining, 1+ for weak staining, 2+ for moderate staining, and 3+ for strong staining. Eight slides also were evaluated, each containing human endometrial tissue, carcinoma of the endometrium from the same patients, and other miscellaneous cancer tissues from the same index cases. Endometrial cancer slides are currently being graded by an experienced pathologist with specific expertise in endometrial pathology. A preliminary analysis of staining in these tissues was performed using a 0-3 grading scale for immunoreactivity. Of a total of 191 endometrial carcinoma cases, 58 were excluded from the paired analysis for one of two reasons: either the index normal or the endometrial carcinoma tissue punch was absent, or the tissue section available was insufficient for quantitation.
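The grading and exclusion steps can be sketched as a small data-handling routine. The text refers to "median" and "maximal" staining intensity per tissue, but it does not describe how multiple punches were combined. This sketch therefore assumes that each case may carry several 0 to 3+ punch scores per tissue type. The record layout and the names `summarize` and `paired_cases` are hypothetical.

```python
from statistics import median
from typing import Dict, List, Optional, Tuple

# Ordinal immunoreactivity scale assigned by the reviewer.
SCORE = {"0": 0, "1+": 1, "2+": 2, "3+": 3}

Summary = Tuple[float, int]  # (median grade, maximum grade)


def summarize(punches: List[str]) -> Optional[Summary]:
    """Median and maximum grade for one tissue; None if nothing was scorable."""
    grades = [SCORE[p] for p in punches]
    if not grades:
        return None
    return median(grades), max(grades)


def paired_cases(
    cases: Dict[str, Dict[str, List[str]]]
) -> Dict[str, Tuple[Summary, Summary]]:
    """Keep only cases with scorable normal AND carcinoma tissue.

    Cases missing either tissue (absent punch or insufficient section)
    are dropped, mirroring the exclusion rule described in the text.
    """
    paired = {}
    for case_id, tissues in cases.items():
        normal = summarize(tissues.get("normal", []))
        cancer = summarize(tissues.get("cancer", []))
        if normal is not None and cancer is not None:
            paired[case_id] = (normal, cancer)
    return paired


if __name__ == "__main__":
    # Hypothetical records; case "B" lacks a normal punch and is excluded.
    demo = {
        "A": {"normal": ["1+", "2+"], "cancer": ["3+", "2+", "3+"]},
        "B": {"normal": [], "cancer": ["2+"]},
        "C": {"normal": ["0"], "cancer": ["1+"]},
    }
    for case_id, (normal, cancer) in paired_cases(demo).items():
        print(case_id, "normal (median, max):", normal, "cancer:", cancer)
```

Applied to the cohort described above, this filtering step would leave 191 - 58 = 133 carcinoma cases for the paired comparison.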
The microarrays from this cohort of 521 cancer patients incorporated normal and endometrial cancer tissues, as well as additional sections of synchronous cancers. There was widespread low-intensity epithelial expression in the normal tissues but marked upregulation of the protein in endometrial cancer tissue. Staining also was localized to the ciliated endometrial epithelium and was seen in several ovarian cancers. Fibrocystin-L immunoreactivity was variable within normal endometrium, with the majority of cases showing weak fibrocystin-L epithelial expression. A low level of stromal immunoreactivity also was seen in the majority of normal and endometrial cancer specimens.
Staining was cytoplasmic in normal endometrial epithelium. In many of the cases with upregulated expression, high levels of fibrocystin-L appeared to be localized to the apical membranes of the endometrial cancer tissue. Staining in fallopian tube carcinomas also was examined (n=2), but its intensity appeared lower than the levels detected in normal fallopian tube epithelium.

Example 11

Analysis of Fibrocystin-L in Endometrial Cancer Microarray Cohort

Of the 521 patient samples in the dataset, 135 patients had staining data from both normal and cancer tissues. Based on median staining intensity, 86 of 133 (65%) patient sections showed upregulation of the protein compared with normal endometrium from the same patients. The minority showed either unchanged expression (43/133; 32%) or downregulation (4/133; 3%). Paired analysis using the median staining intensity of normal and tumor tissues demonstrated statistically significant upregulation of fibrocystin-L in the cancers (P<0.0001; Wilcoxon signed rank test; n=135). A similar analysis using maximal staining intensity also demonstrated a statistically significant difference in staining intensity between the two groups (P<0.0001; t test). Contingency table analysis using maximal grade intensity likewise demonstrated a statistically significant difference (P<0.0027; Pearson test; see Table 9).
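The paired comparison above has two parts: classifying each case as up, unchanged, or down, and running a Wilcoxon signed-rank test on the normal/tumor pairs. Both can be sketched in plain Python. The patent does not say which software or options were used. This sketch assumes zero differences are dropped before ranking, uses the large-sample normal approximation with a tie correction, and applies no continuity correction. Statistical packages differ on these choices (for example, `scipy.stats.wilcoxon` exposes `zero_method` and `correction` options), so P values may differ slightly. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def classify_pairs(normal: Sequence[float], tumor: Sequence[float]) -> Tuple[int, int, int]:
    """Count cases with tumor > normal (up), tumor == normal, and tumor < normal (down)."""
    up = sum(t > n for n, t in zip(normal, tumor))
    same = sum(t == n for n, t in zip(normal, tumor))
    down = sum(t < n for n, t in zip(normal, tumor))
    return up, same, down


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def wilcoxon_signed_rank(normal: Sequence[float], tumor: Sequence[float]) -> Tuple[float, float, float]:
    """Two-sided signed-rank test on tumor - normal; returns (W+, z, p)."""
    diffs = [t - n for n, t in zip(normal, tumor) if t != n]  # drop zero differences
    m = len(diffs)
    if m == 0:
        raise ValueError("all paired differences are zero")
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = m * (m + 1) / 4
    var = m * (m + 1) * (2 * m + 1) / 24
    # Tie correction: subtract sum(t^3 - t)/48 over groups of tied |differences|.
    groups: dict = {}
    for d in diffs:
        groups[abs(d)] = groups.get(abs(d), 0) + 1
    var -= sum(t ** 3 - t for t in groups.values()) / 48
    z = (w_plus - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return w_plus, z, p


if __name__ == "__main__":
    # Hypothetical 0-3 grades for five patients (not data from the study).
    normal = [0, 1, 1, 2, 0]
    tumor = [2, 1, 2, 1, 3]
    print("up/unchanged/down:", classify_pairs(normal, tumor))
    print("W+ = %.1f, z = %.3f, p = %.3f" % wilcoxon_signed_rank(normal, tumor))
```

With only five hypothetical pairs, the normal approximation is purely illustrative. It is better suited to a cohort the size of the 135 paired cases reported here.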
There was no correlation between fibrocystin-L expression level (high, medium or low) and either survival (P=0.67; log rank test) or recurrence rates (P=0.96, NS; log rank test). No correlation was observed between fibrocystin-L staining intensity and histologic grade (n=178; P=0.10; Pearson), cancer stage, body mass index, age, depth of myometrial invasion, vaginal recurrence, or hematogenous spread. There was a nonsignificant trend toward an association between fibrocystin-L staining intensity and nodal invasion (P=0.19; NS).

TABLE 9

Contingency table analysis of maximum grade of staining intensity in
cancer of endometrium (rows, Y axis) by maximum grade observed in normal
tissue sections (columns, X axis)

Each cell gives: count; % of total; column %; row %.

Cancer grade	Normal grade 0	Normal grade 1	Normal grade 2	Normal grade 3	Total (%)

0	20; 14.81; 27.78; 86.96	3; 2.22; 6.25; 13.04	0; 0; 0; 0	0; 0; 0; 0	23 (17.04)
1	16; 11.85; 22.22; 41.03	21; 15.56; 43.75; 53.85	1; 0.74; 7.69; 2.56	1; 0.74; 50.00; 2.56	39 (28.89)
2	16; 11.85; 22.22; 41.03	16; 11.85; 33.33; 41.03	6; 4.44; 46.15; 15.38	1; 0.74; 50.00; 2.56	39 (28.89)
3	20; 14.81; 27.78; 58.82	8; 5.93; 16.67; 23.53	6; 4.44; 46.15; 17.65	0; 0; 0; 0	34 (25.19)
Total (%)	72 (53.33)	48 (35.56)	13 (9.63)	2 (1.48)	135
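The contingency-table comparison cited in the text can be reproduced from the Table 9 counts with a Pearson chi-square test of independence. The following is a stdlib-only sketch; the chi-square tail probability is computed from the series for the regularized incomplete gamma function rather than from a statistics library. On these counts, the statistic should come out at roughly 25.25 on 9 degrees of freedom, with an upper-tail P between 0.0025 and 0.005, which is in line with the value reported above. The function names are hypothetical.

```python
import math
from typing import List, Tuple

# Table 9 counts: rows = maximum grade in endometrial cancer (0-3),
# columns = maximum grade in matched normal tissue (0-3).
TABLE9 = [
    [20, 3, 0, 0],
    [16, 21, 1, 1],
    [16, 16, 6, 1],
    [20, 8, 6, 0],
]


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail chi-square probability, 1 - P(df/2, x/2), via the gamma series."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    for n in range(1, 10_000):
        term *= z / (a + n)
        total += term
        if term < total * 1e-15:
            break
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def pearson_chi2(table: List[List[int]]) -> Tuple[float, int, float]:
    """Pearson chi-square test of independence for an r x c table of counts."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, chi2_sf(stat, df)


if __name__ == "__main__":
    stat, df, p = pearson_chi2(TABLE9)
    print(f"chi-square = {stat:.2f}, df = {df}, P = {p:.4f}")
```

Several expected counts in the normal grade 2 and grade 3 columns are well below 5, so the chi-square approximation is only a rough guide for this table.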

Example 12

Expression of Fibrocystin-L in Other Human Cancers

Immunohistochemistry also was used to examine the extent of fibrocystin-L staining in the 109 cases of synchronous cancers from this patient cohort (Table 10). A subset of the breast, ovarian and colon cancers appeared to show substantial fibrocystin-L staining. Eleven of twenty-two (50%) ovarian cancers and five of thirty (20%) breast cancers had grade 2 or 3 staining intensity.

TABLE 10

Analysis of fibrocystin-L immunostaining in synchronous/metachronous
cancers studied in patients with endometrial carcinoma

Tissue	Number	No. (%) with grade 2 or 3 staining intensity

Breast	30	5/30	(20%)
Ovary	22	11/22	(50%)
Colon	20	3/20	(15%)
Lymphoma	6	0/6	(0%)
Lung	5	2/5	(40%)
Thyroid	5	2/5	(40%)
Melanoma	2	1/2	(50%)
Omentum	4	1/4	(25%)
Bladder	2	0/2	(0%)
Skin/Vulva	2	0/2	(0%)
Skin (scalp)	1	0/1	(0%)
Mouth/Tongue	2	1/2	(50%)
Myeloma	1	1/1	(100%)
Renal clear cell	1	1/1	(100%)
Stomach	2	0/2	(0%)
GIST	1	0/1	(0%)
Perineal	1	0/1	(0%)
Appendix	1	1/1	(100%)
Cecum	1	0/1	(0%)

GIST; gastrointestinal stromal tumor
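The totals and percentages in Table 10 are simple ratios that can be rechecked from the x/n counts. The following minimal sketch tallies the "Number" column, which should sum to the 109 synchronous cancers mentioned in the text, and recomputes a percentage from its positive count and denominator. It also rejects a positive count that exceeds its denominator. The dictionary and `percent_positive` are hypothetical names used for illustration.

```python
# Number of synchronous/metachronous cancers per tissue ("Number" column of Table 10).
TABLE10_N = {
    "Breast": 30, "Ovary": 22, "Colon": 20, "Lymphoma": 6, "Lung": 5,
    "Thyroid": 5, "Melanoma": 2, "Omentum": 4, "Bladder": 2, "Skin/Vulva": 2,
    "Skin (scalp)": 1, "Mouth/Tongue": 2, "Myeloma": 1, "Renal clear cell": 1,
    "Stomach": 2, "GIST": 1, "Perineal": 1, "Appendix": 1, "Cecum": 1,
}


def percent_positive(positive: int, total: int) -> float:
    """Percentage of cases with grade 2 or 3 staining, rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= positive <= total:
        raise ValueError(f"{positive}/{total} is not a valid count")
    return round(100 * positive / total, 1)


if __name__ == "__main__":
    print("total cases:", sum(TABLE10_N.values()))
    for tissue, positive in [("Ovary", 11), ("Colon", 3), ("Lung", 2)]:
        n = TABLE10_N[tissue]
        print(f"{tissue}: {positive}/{n} = {percent_positive(positive, n)}%")
```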

Expression of fibrocystin-L also was assessed in K562 erythroleukemia cells using the FibLA-4.1 monoclonal antibody. Confocal and immunofluorescence microscopy images of fixed, permeabilized K562 cells grown in culture showed abundant fibrocystin-L in the cytoplasm. The protein also was detectable by western blot in membrane preparations from this cell line.

Other Embodiments

It is to be understood that while the invention has been described in conjunction with the detailed description thereof, the foregoing description is intended to illustrate and not limit the scope of the invention, which is defined by the scope of the appended claims. Other aspects, advantages, and modifications are within the scope of the following claims.

Claims

1. An isolated nucleic acid comprising a sequence encoding a fibrocystin-L polypeptide.

2. The isolated nucleic acid of claim 1, wherein said fibrocystin-L polypeptide is encoded by SEQ ID NO:1.

3. The isolated nucleic acid of claim 1, wherein said fibrocystin-L polypeptide is encoded by SEQ ID NO:2.

4. The isolated nucleic acid of claim 1, wherein said fibrocystin-L polypeptide comprises the amino acid sequence of SEQ ID NO:3.

5. The isolated nucleic acid of claim 1, wherein said fibrocystin-L polypeptide comprises the amino acid sequence of SEQ ID NO:4.

6. The isolated nucleic acid of claim 1, wherein said fibrocystin-L polypeptide comprises an amino acid sequence variant at a position selected from the group consisting of: position 702, position 1192, position 1199, position 1223, position 1514, position 1607, position 1638, position 3050, position 3607, and position 4220 of SEQ ID NO:3.

7. The isolated nucleic acid of claim 6, wherein said amino acid sequence variant is selected from the group consisting of: Pro at position 702, Ala at position 1192, Ser at position 1199, Val at position 1223, Ser at position 1514, Ile at position 1607, Cys at position 1638, Gln at position 3050, Glu at position 3607, and Ile at position 4220.

8. The isolated nucleic acid of claim 1, wherein said sequence comprises a sequence variant with respect to SEQ ID NO:1.

9. The isolated nucleic acid of claim 8, wherein said sequence variant is at a position selected from the group consisting of: position 1227, position 1404, position 1920, position 1965, position 2105, position 3574, position 3599, position 3668, position 4540, position 4819, position 4913, position 6621, position 9084, position 9150, position 10821, and position 12658 of SEQ ID NO:1.

10. The isolated nucleic acid of claim 9, wherein said sequence variant is selected from the group consisting of: A at position 1227, T at position 1404, C at position 1920, G at position 1965, C at position 2105, G at position 3574, C at position 3599, T at position 3668, A at position 4540, A at position 4819, G at position 4913, G at position 6621, T at position 9084, G at position 9150, A at position 10821, and A at position 12658.

11. An isolated nucleic acid encoding a fibrocystin polypeptide, wherein said nucleic acid comprises at least 300 contiguous nucleotides of SEQ ID NO:1 or a sequence variant thereof.

12. A vector comprising the isolated nucleic acid of claim 11.

13. Host cells comprising the vector of claim 12.

14. An isolated nucleic acid 10 to 1700 nucleotides in length, said nucleic acid comprising a sequence, said sequence comprising one or more sequence variants relative to the sequence of SEQ ID NO:1, and wherein said sequence is at least 80% identical over its length to the corresponding sequence in SEQ ID NO:1.

15. The isolated nucleic acid of claim 14, wherein said sequence variant is at a position selected from the group consisting of: position 1227, position 1404, position 1920, position 1965, position 2105, position 3574, position 3599, position 3668, position 4540, position 4819, position 4913, position 6621, position 9084, position 9150, position 10821, and position 12658 of SEQ ID NO:1.

16. A plurality of oligonucleotide primer pairs, wherein each primer is 10 to 50 nucleotides in length, and wherein each said primer pair, in the presence of mammalian genomic DNA and under polymerase chain reaction conditions, produces a nucleic acid product corresponding to a region of a PKHDL1 nucleic acid molecule, wherein said product is 30 to 1700 nucleotides in length.

17. The plurality of primer pairs of claim 16, wherein said nucleic acid product comprises a nucleotide sequence variant relative to SEQ ID NO:1.

18. The plurality of primer pairs of claim 16, wherein said plurality comprises at least three primer pairs.

19. The plurality of primer pairs of claim 16, wherein said plurality comprises at least thirteen primer pairs.

20. The plurality of primer pairs of claim 16, wherein said plurality comprises at least sixteen primer pairs.

21. The plurality of primer pairs of claim 16, wherein said plurality comprises at least twenty-three primer pairs.

22. A composition comprising a first oligonucleotide primer and a second oligonucleotide primer, wherein said first oligonucleotide primer and said second oligonucleotide primer are each 10 to 50 nucleotides in length, and wherein said first and second primers, in the presence of mammalian genomic DNA and under polymerase chain reaction conditions, produce a nucleic acid product corresponding to a region of a PKHDL1 nucleic acid molecule, wherein said product is 30 to 1700 nucleotides in length.

23. The composition of claim 22, wherein said nucleic acid product comprises a nucleotide sequence variant relative to SEQ ID NO:1.

24. An isolated nucleic acid comprising the nucleotide sequence of SEQ ID NO:1 or SEQ ID NO:2, or the complement of SEQ ID NO:1 or SEQ ID NO:2.

25. An antibody having specific binding affinity for a fibrocystin-L polypeptide.

26. A method for determining if a subject has altered cellular immunity, said method comprising providing a nucleic acid sample from said subject, and determining whether said nucleic acid sample contains one or more sequence variants within the PKHDL1 gene of said subject relative to a wild-type PKHDL1 gene, wherein the presence of said one or more sequence variants is associated with altered cellular immunity in said subject.

27. The method of claim 26, wherein said nucleic acid sample is genomic DNA.

28. The method of claim 26, wherein said determining step is performed by denaturing high performance liquid chromatography or direct sequencing.

29. The method of claim 26, wherein said variant is at position 2105, position 3574, position 3599, position 3668, position 4540, position 4913, or position 9150 of SEQ ID NO:1.

30. The method of claim 26, further comprising identifying said sequence variant by DNA sequencing.

31. An article of manufacture comprising a substrate, wherein said substrate comprises a population of isolated nucleic acid molecules, wherein each said nucleic acid molecule is 10 to 1000 nucleotides in length, wherein each said nucleic acid molecule comprises a different nucleotide sequence variant relative to the sequence of SEQ ID NO:1, and wherein said nucleic acid molecule is at least 80% identical over its length to the corresponding sequence in SEQ ID NO:1.

32. A method for monitoring the immune response of a patient after vaccination, said method comprising a) providing a biological sample from said patient after vaccination; b) determining the number of fibrocystin-L expressing T-cells in said biological sample; and c) comparing the number of fibrocystin-L expressing T-cells to a baseline number of fibrocystin-L expressing T-cells before vaccination.