WO2004001034A2 - Regulation of gene expression using intron 1 of the ly-6 gene superfamily - Google Patents

Regulation of gene expression using intron 1 of the ly-6 gene superfamily

Publication number
WO2004001034A2
Authority
WO
WIPO (PCT)
Prior art keywords
sequence
gene
nucleic acid
intron
human
Prior art date
Application number
PCT/GB2003/002652
Other languages
French (fr)
Other versions
WO2004001034A3 (en
Inventor
Begona Aguado
Robert Duncan Campbell
Meera Mallya
Original Assignee
Medical Research Council
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Medical Research Council filed Critical Medical Research Council
Priority to AU2003277979A priority Critical patent/AU2003277979A1/en
Publication of WO2004001034A2 publication Critical patent/WO2004001034A2/en
Publication of WO2004001034A3 publication Critical patent/WO2004001034A3/en

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00: Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09: Recombinant DNA-technology
    • C12N15/63: Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/79: Vectors or expression systems specially adapted for eukaryotic hosts
    • C12N15/85: Vectors or expression systems specially adapted for eukaryotic hosts for animal cells
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2840/00: Vectors comprising a special translation-regulating system
    • C12N2840/44: Vectors comprising a special translation-regulating system being a specific part of the splice mechanism, e.g. donor, acceptor

Definitions

  • the present invention relates to isolated nucleic acid molecules, vectors and host cells comprising said molecules, and in vivo and in vitro methods of regulating the expression of a gene.
  • the Lymphocyte Antigen 6 (LY-6) protein domain is approximately 80 amino acids long, characterized by a conserved pattern of 8-10 cysteine residues that have a defined pattern of disulphide bonding.
  • Most members of the LY-6 superfamily are extracellular GPI (glycosyl phosphatidylinositol) anchored proteins, such as CD59, the urokinase plasminogen activator (uPA) receptor (uPAR) and sperm acrosomal protein (SP-10) (reviewed by Palfree, 1996 Tissue Antigens 48, 71-79).
  • LY-6 superfamily members have been identified in a number of different organisms, including a wide variety of mammalian species such as human, mouse, rat, fox and baboon; amphibian species such as newt and frog; and in invertebrates such as squid and C. elegans (Chou et al, 2001 Genetics 157, 211-224).
  • The functions of LY-6 superfamily members in mammals are largely unknown, except for CD59, which is an inhibitor of the complement cascade, inhibiting the formation of the membrane attack complex (Davies et al, 1989 J. Exp. Med. 170, 637-654), and uPAR, which plays an important role in proteolysis of extracellular matrix proteins (Tarui et al, 2001 J. Biol. Chem. 276, 3983-90).
  • Genes of the LY-6 superfamily are frequently found in clusters. On human chromosome 8 (8q24-qter), a cluster of five LY-6 family members has been identified, (Brakenhoff et al, 1995 J. Cell Biol.
  • Human E48 and mouse ThB, human RIG-E and mouse TSA-1/Sca-2, and human and mouse Ly-6h are thought to be orthologous genes.
  • the human orthologues of the remaining members of the murine cluster have not yet been identified.
  • the members of this murine cluster have been well studied, and a possible role for these LY-6 family members in T cell activation, differentiation and maturation has been suggested.
  • Ly-6c can regulate endothelial adhesion and homing of T cells by activating integrin dependent pathways (Hanninen et al, 1997 Proc. Natl. Acad. Sci. USA 94, 6898-6903), and Ly-6A has a role in mediating cell-cell adhesion (Bamezai et al, 1995 Proc. Natl. Acad. Sci. USA 92, 4294-98).
  • a novel LY-6 cluster has been described in the class III region of the Major Histocompatibility Complex (MHC) (Ribas et al, 1999 J. Immunol. 163, 278-87; The MHC Consortium 1999 Nature 401, 921-923).
  • the human MHC is located at chromosome 6p21.3, and is ~4 Mb in length. It consists of three regions: class I and class II, which flank a central class III region.
  • the complete sequence of the human MHC has been determined and 224 genes have been identified so far, of which approximately half are predicted to be pseudogenes. Of the expressed genes, ~40% have an immune related function.
  • the central class III region is ~0.8 Mb in length and contains 59-63 genes and 0-2 pseudogenes, depending on the haplotype. Of the predicted genes, at least 24 (41%) have a definite or potential role in the immune system.
  • the mouse MHC class III region, located on chromosome 17, has also been completely sequenced in the inbred laboratory mouse 129/Sv strain (Lee Rowen, http://clrroma.mbt.washington.edu/msg_www/PROJECTS/mmhc.html), showing complete conservation of all class III genes as well as conservation of gene order.
  • the LY6G6C, LY6G6D, LY6G6E, LY6G5C and LY6G5B genes are predicted to be members of the LY-6 superfamily of proteins based on translations of the predicted gene sequences.
  • eukaryotic genes, unlike most prokaryotic genes, are typically interrupted by introns. These are non-coding sequences which are initially represented in transcripts but which are generally removed by "splicing" to leave a mature mRNA molecule containing "exons" (i.e. the coding sequences which are translated into a polypeptide).
  • NMD Nonsense-mediated decay
  • the invention provides an isolated nucleic acid molecule which comprises at least part of the sequence of intron 1 of a human LY-6 superfamily gene or of a homologous gene from an animal, or a corresponding sequence.
  • the present inventors have surprisingly found that inclusion of such a sequence in an RNA transcript markedly increases the stability of the transcript.
  • the term "at least part of the sequence of intron 1" as used herein is intended to refer to a portion of at least 10 contiguous bases of the intron 1 sequence, preferably at least 20, more preferably at least 30, and most preferably at least 40 contiguous bases.
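The contiguous-base thresholds in this definition lend themselves to a simple programmatic check. The following Python sketch is offered only as an illustration: it finds the longest run of bases shared by a candidate molecule and an intron 1 sequence and reports which preference tier (10, 20, 30 or 40 bases) that run reaches. The function names and the toy sequences are hypothetical and are not taken from the disclosure; no real intron 1 sequence is reproduced.

```python
def longest_shared_run(candidate: str, intron: str) -> int:
    """Length of the longest contiguous stretch present in both sequences."""
    best = 0
    prev = [0] * (len(intron) + 1)
    for base_c in candidate:
        cur = [0] * (len(intron) + 1)
        for j, base_i in enumerate(intron, start=1):
            if base_c == base_i:
                # Extend the run that ended at the previous base of both strings.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def preference_tier(run_length: int):
    """Map a shared run length onto the tiers named in the definition."""
    tiers = ((40, "most preferred"), (30, "more preferred"),
             (20, "preferred"), (10, "minimum"))
    for threshold, label in tiers:
        if run_length >= threshold:
            return label
    return None  # fewer than 10 contiguous bases: outside the definition


# Hypothetical toy data, for illustration only.
toy_intron = "ACGTACGTTAGCCGATAGGCTA"
toy_candidate = "TTTT" + toy_intron[3:15] + "GGGG"
run = longest_shared_run(toy_candidate, toy_intron)
print(run, preference_tier(run))
```

A dynamic-programming scan is used here for clarity rather than speed; for sequences of intron length it is more than adequate.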
  • the isolated nucleic acid molecule of the invention may comprise other portions of a human LY-6 superfamily gene, such as portions of other introns or other 5' portions of the gene such as portions of the first and/or second exons, but specifically does not encompass nucleotide sequences which include the whole of a LY-6 superfamily gene.
  • substantially the only LY-6 superfamily gene sequence present in the isolated nucleic acid molecule of the invention is representative of, derived from, or identical to, part or all of the intron 1 sequence.
  • an LY-6 superfamily gene is considered as one which codes for an LY6 domain-containing polypeptide
  • a homologous gene is one which is obtainable from an animal (conveniently a mammalian) and, when compared using a commercially available sequence alignment program, (such as Clustal/BLAST) exhibits at least 25% sequence identity (preferably at least 40%, more preferably at least 50% and most preferably at least 60%) with a human LY-6 superfamily gene.
  • a "corresponding sequence” is a sequence of 10-150 nucleotides, preferably 20-100 nucleotides, more preferably 30-100 nucleotides, and most preferably 40-100 nucleotides, and which, when compared using a commercially available sequence alignment program (such as Clustal/BLAST) exhibits at least 25% sequence identity (preferably at least 40%, more preferably at least 50% and most preferably at least 60%) with intron 1 of LY6G6D or LY6G5B.
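The two criteria in this definition (a length of 10-150 nucleotides and at least 25% identity) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the two sequences are assumed to have been aligned already (for example by an alignment program) and padded with "-" gap characters to equal length, and identity is taken as matching columns divided by total alignment columns. Alignment programs differ in how they define percent identity, so the denominator is an assumption, as are the function names.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over all columns of a pre-computed pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(1 for x, y in zip(aligned_a, aligned_b)
                  if x == y and x != "-")
    return 100.0 * matches / len(aligned_a)


def is_corresponding_sequence(aligned_candidate: str, aligned_intron: str,
                              min_identity: float = 25.0) -> bool:
    """Apply the length window (10-150 nt) and the identity threshold."""
    ungapped_length = len(aligned_candidate.replace("-", ""))
    if not 10 <= ungapped_length <= 150:
        return False
    return percent_identity(aligned_candidate, aligned_intron) >= min_identity


# Hypothetical toy alignment, for illustration only.
print(percent_identity("ACGTACGTAC", "ACGTTCGTAC"))
print(is_corresponding_sequence("ACGTACGTAC", "ACGTTCGTAC"))
```

Raising `min_identity` to 40, 50 or 60 gives the progressively preferred thresholds named in the definition.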
  • An LY-6 domain may be defined as one which has a conserved pattern of disulphide bridges (as illustrated schematically in Figure 5) formed by 10 cysteine residues in which:
  • Other characteristics which will normally (but not invariably) be found in a LY-6 domain are (i) a typical length of 75-85 amino acid residues and (ii) the presence of a glycosyl phosphatidylinositol ("GPI") membrane anchor "recognition" sequence.
  • genes included in the LY-6 superfamily include the following:
  • RIG-E, PSCA, ThB, TSA-1/Sca2, and all LY-6 genes (such as LY-6A to I etc.).
  • For brevity an LY-6 superfamily gene will be referred to hereafter as a "LY-6 gene" and the term "LY-6 gene" should therefore be construed accordingly unless the context dictates otherwise.
  • the intron 1 portion of any LY-6 gene or animal homologue thereof can readily be identified, with the benefit of the present disclosure, using standard techniques and sequence comparisons within the ambit of those skilled in the art.
  • the isolated nucleic acid molecule may contain a sequence which deviates slightly from a base sequence identical to that of intron 1 of a LY-6 gene or mammalian homologue thereof. For example a number of bases may be substituted, inserted, inverted, deleted (or any combination thereof) without substantially disrupting the functionality thereof. Indeed, some changes to a naturally-occurring intron 1 sequence may be desirable in order to optimise the NMD-resistance phenomenon observed by the present inventors.
  • the part or whole of the intron 1 sequence present in the isolated molecule of the invention will be at least 80% identical with a naturally-occurring intron 1 sequence, preferably at least 85% identical, more preferably at least 90% and most preferably at least 95% identical.
  • the isolated nucleic acid molecule of the invention will comprise all or part of intron 1 of the LY6G6D or LY6G5B genes, or have the aforementioned degree of sequence identity with intron 1 of those genes.
  • the inventors have identified a 45 nucleotide consensus sequence which appears to be conserved to some extent within the LY-6 superfamily. This sequence is shown in Figure 8A and is labelled as "multilevel consensus sequence". Variants of the consensus sequence may comprise one or more of the alternative bases shown in Figure 8A immediately below the consensus sequence.
  • the isolated nucleic acid molecule of the invention may conveniently comprise the multilevel consensus sequence or one of its variants. Initial analysis by the inventors has indicated that the consensus sequence may be present in an intron in either orientation (see Figure 8B).
  • the isolated nucleic acid molecule of the invention may be prepared by any conventional means, such as cloning of restriction digest fragments containing relevant portions of a LY-6 gene or mammalian homologue thereof, by PCR or de novo synthesis in vitro or any other known method.
  • the isolated molecule may be single stranded or, more preferably, double stranded.
  • the isolated molecule will typically comprise or consist of DNA, but may additionally or alternatively comprise or consist of RNA, PNA, LNA ("locked" nucleic acid) or other nucleic acid analogues e.g. containing modified bases such as inosine, hypoxanthine and the like and/or containing modifications to the sugar/phosphate backbone.
  • the isolated nucleic acid molecule of the invention is effective, when placed in operable combination with another transcribable nucleic acid sequence, to confer a degree of resistance to NMD in respect of transcripts of the other transcribable nucleic acid sequence.
  • the isolated sequence may be inserted, preferably, after the first exon of a gene to be transcribed, or in the 5' UTR. Less preferably the sequence may be inserted elsewhere (e.g. in the 3' UTR).
  • the invention thus provides a nucleic acid construct comprising the isolated nucleic acid molecule of the first aspect.
  • the construct will typically be in the form of a replicable vector, such as a plasmid, cosmid, yeast artificial chromosome, replicable viral genome or the like.
  • the construct will typically additionally comprise a further transcribable sequence, which will normally be positioned in operable combination with the molecule of the first aspect.
  • the construct will typically comprise one or more promoters operable in a eukaryotic, typically mammalian (and preferably human) cell.
  • promoters may be promoters e.g. from human genes, or viral promoters. Numerous examples of each type are known and available to those skilled in the art.
  • a nucleic acid construct may comprise the nucleic acid molecule of the first aspect in either orientation - since only two orientations are possible it is a simple task for those skilled in the art to determine the optimum orientation. (The inventors have made some preliminary findings that suggest that, at least in some situations, orientation may be significant.) In addition it may be that more than one copy of the molecule of the first aspect may be included in the construct (e.g. two, three or more copies).
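Because only two orientations are possible, and because more than one copy may be included, the candidate arrangements are easy to enumerate in silico before any cloning is attempted. The Python sketch below shows one way this might be done: it generates the reverse complement of an element and assembles adjacent copies in a chosen pattern of orientations. The function names, the "+"/"-" notation and the toy element are hypothetical conveniences, not part of the disclosure.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the sequence as read 5'->3' on the opposite strand."""
    return seq.translate(_COMPLEMENT)[::-1]


def both_orientations(element: str) -> dict:
    """The only two orientations in which an element can be inserted."""
    return {"forward": element, "reverse": reverse_complement(element)}


def tandem_copies(element: str, orientations: str) -> str:
    """Join adjacent copies; '+' keeps the element, '-' inverts it.

    Example: orientations="+-+" gives three copies, the middle one inverted.
    """
    parts = []
    for symbol in orientations:
        if symbol == "+":
            parts.append(element)
        elif symbol == "-":
            parts.append(reverse_complement(element))
        else:
            raise ValueError("orientations may contain only '+' and '-'")
    return "".join(parts)


# Hypothetical toy element, for illustration only.
print(both_orientations("AACGT"))
print(tandem_copies("AAC", "+-"))
```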
  • optimum results might be achieved by replacing every intron of a gene with a molecule in accordance with the first aspect. These may be adjacent to each other in the construct or separated. If present in a plurality of copies in a single construct, they may be all in the same orientation or in a variety of orientations. Again, it is routine for the person skilled in the art, with the benefit of the present disclosure, to perform the necessary experiments to identify the optimum copy number and arrangement within the construct.
  • the invention provides a method of regulating the expression of a nucleic acid sequence, the method comprising the step of placing the nucleic acid sequence (the expression of which is to be regulated) in operable combination with a regulatory nucleic acid sequence, which regulatory sequence comprises a molecule in accordance with the first aspect of the invention.
  • the invention may allow for either up-regulation of the expression of a gene (especially a poorly-expressed gene) or for the down-regulation of the expression of a gene, as explained below.
  • a regulatory nucleic acid sequence which confers a degree of resistance to NMD, as defined in the third aspect of the invention above.
  • this will be achieved by incorporating part or all of the isolated sequence of the invention into the transcribed portion of the nucleic acid sequence to be expressed, by means of recombinant DNA technology.
  • this must be performed in such a way as to avoid inducing a frameshift or a premature 'stop' codon, which would lead to production of nonfunctional polypeptides or prematurely terminated transcripts.
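The requirement to avoid a frameshift or a premature stop codon can be screened for computationally before a construct is made. The Python sketch below is a minimal illustration, assuming that the input is the coding sequence expected in the mature transcript (start codon to stop codon, DNA alphabet) and that the standard stop codons TAA, TAG and TGA apply. The function name and the toy sequences are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def reading_frame_problems(cds: str) -> list:
    """List the reading-frame defects found in a coding sequence.

    An empty list means the sequence starts with ATG, has a length that is a
    multiple of three, ends with a stop codon and has no earlier in-frame stop.
    """
    cds = cds.upper()
    problems = []
    if len(cds) % 3 != 0:
        problems.append("length is not a multiple of three (possible frameshift)")
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    if not codons or codons[0] != "ATG":
        problems.append("does not start with ATG")
    for index, codon in enumerate(codons[:-1], start=1):
        if codon in STOP_CODONS:
            problems.append(f"premature stop codon at codon {index}")
            break
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("no in-frame stop codon at the end")
    return problems


# Hypothetical toy sequences, for illustration only.
print(reading_frame_problems("ATGGCCTAA"))
print(reading_frame_problems("ATGTAAGCCTAA"))
```

Such a check covers only the two defects named in the text; it says nothing about splicing efficiency or expression level, which would still need to be determined experimentally.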
  • NMD is known to be a very widespread phenomenon, having been documented in many different mammalian cells, in plant cells (Isshiki et al, 2001 Plant Physiol 125, 1388-1395) and in yeast cells. It is possible therefore that the present method of protecting nucleic acids in general, and transcripts in particular, should be generally applicable in all eukaryotic cells.
  • the method of the third aspect of the invention may be performed in vitro or in vivo.
  • Methods of introducing nucleic acids into cells and tissues in culture in vitro are well-known to those skilled in the art. Such methods include transfection, transformation, transduction, electroporation etc. Any suitable method may be employed. Where the method is performed in vivo, it will generally be necessary to introduce into a human or other mammalian subject a suitable vector (such as a safe viral vector, e.g. a retrovirus, adenovirus, or vaccinia virus).
  • the invention also provides an embodiment in which the molecule of the first aspect of the invention can be utilised to cause inhibition of expression of a particular selected endogenous target gene or genes: by introducing into a cell or tissue (in vitro or in vivo) a vector or other nucleic acid directing the expression of one or more mis-spliced, or truncated, or otherwise defective transcript variants of a target gene, it may be possible to inhibit the expression of the endogenous target gene, the defective variants being protected against NMD by inclusion therein of a NMD-resistance conferring sequence (e.g. a part or whole of intron 1 of a Ly-6 gene or mammalian homologue thereof) in accordance with the first aspect of the invention.
  • target genes might be suitable candidates for inhibition of expression including in particular oncogenes or any other genes the products of which are known or believed to be associated with a pathological condition.
  • target genes include, for instance: Glutathione S-transferase pi, expression of which is down-regulated in patients with Barrett's esophagus and esophageal adenocarcinoma (see Brabender et al, J. Gastrointest. Surg. 2002 6, 359-367); comparative gene identification (CGI) 94, which is down-regulated at the mRNA level in the hippocampus of early stage Alzheimer's disease brains (Heese et al, Eur. J. Neurosci.
  • Bak, a member of the Bcl-2 family of proteins, is overexpressed at the mRNA level in coeliac disease patients (Chernavsky et al, Autoimmunity 2002, 35, 39-37); and Apolipoprotein L1 (apoL1), apoL2 and apoL4 are all up-regulated at the mRNA level in schizophrenia (Mimmack et al, Proc. Natl. Acad. Sci. USA 2002 99, 4680-5).
  • the method of regulating the expression of a gene or other nucleic acid sequence in accordance with the invention, has the advantage of conferring very sequence-specific regulation, in that the expression of unrelated genes should be unaffected.
  • in order to perform the method of the invention it will typically be necessary to engineer the sequence to be transcribed so as to include a regulatory sequence corresponding to the isolated molecule of the first aspect of the invention.
  • the sequence will be inserted towards the 5' end of the sequence to be transcribed, for example at or near the first intronic portion of the sequence to be transcribed or near the 5' UTR.
  • the regulatory sequence may be inserted, for example, as an addition to an endogenous first intronic portion or may be used to replace part or all of the endogenous first intronic portion or part of the 5' UTR.
  • This may be achieved by the use of naked RNA, especially where the RNA sequence involved is very short and especially when the method is being performed in vitro.
  • it may be more convenient to introduce into the cell a DNA vector which causes the synthesis of appropriate RNA transcripts.
  • Methods of introducing DNA into a host cell are well known to those skilled in the art and include transduction, transfection, transformation, electroporation and "biolistic" methods.
  • the invention also provides for the use of a recombinant nucleic acid molecule (whether as part of a vector or otherwise) comprising the isolated molecule of the first aspect of the invention, in the preparation of a medicament to regulate expression of a target gene in a eukaryotic (typically mammalian, preferably human) subject.
  • the invention provides a pharmaceutical composition comprising a recombinant nucleic acid molecule (whether as part of a vector or otherwise) and a physiologically acceptable excipient, carrier or diluent, wherein the recombinant nucleic acid molecule comprises the isolated molecule of the first aspect of the invention.
  • Physiologically acceptable excipients, carriers or diluents include sterile saline solution, phosphate buffered saline, and the like.
  • the pharmaceutical composition may typically be administered by injection, which may be intravenous, subcutaneous, intramuscular etc, but may alternatively be administered by any safe and convenient route (e.g. intranasal, oral, rectal etc).
  • Effective amounts of the composition to be delivered can readily be ascertained, with the benefit of the present disclosure, by routine trial and error. Typically, an effective dose of the pharmaceutical composition will comprise between 1 µg and 1 mg of nucleic acid per kg of body weight of the subject in question.
  • the invention provides a host cell transformed with a nucleic acid molecule in accordance with the first aspect of the invention or a nucleic acid construct in accordance with the second aspect of the invention.
  • the host cell may be a prokaryotic cell (e.g. bacterial cell) used for the purpose of replicating the nucleic acid molecule/construct.
  • the host cell may be any eukaryotic cell e.g. mammalian cell, yeast cell or plant cell.
  • the host cell may be a cell in which the expression of an endogenous gene is being regulated.
  • Figure 1 is a schematic representation of that part of the MHC class III region containing the cluster of LY-6 superfamily genes.
  • the five members of the LY-6 superfamily are indicated by filled boxes and in bold text.
  • Arrows below genes indicate direction of transcription. Numbers along the bottom line indicate the approximate distance in kilobases (kb) from the class II region of the MHC (HLA DRA gene).
  • Figure 2 is a schematic representation of the intron/exon structure, encompassing the ATG to the stop codon, of the five LY-6 superfamily members in the MHC Class III region. Numbers indicate size in base pairs (bp). Protein domains are illustrated as follows: diagonal line shading indicates signal peptide, C indicates a cysteine residue in the LY-6 domain and wavy line shading indicates the C-terminal hydrophobic region. Horizontal line shading indicates an unknown domain (LY6G5b only).
  • Figures 3(a)-(f) are composite figures showing the results of RT-PCR analysis of human cell lines or tissues.
  • Part (i) of each figure is a picture of gel analysis of RT-PCR products from a variety of human cell lines (from left to right; K562, U937, HL60, Molt4, Jurkat, Raji, 143B and HeLa). Marker sizes (in Kilobases) are shown to the left of part (i) of each figure.
  • Part (ii) of each figure is a picture of gel analysis of RT-PCR products from a variety of human tissues (from left to right: fetal liver, lung, kidney, spleen and brain; and adult liver, lung and kidney).
  • Part (iii) of each figure is a schematic representation of the RT-PCR products detected.
  • Figures (a)-(f) relate, respectively, to LY6G6C, LY6G6D, LY6G6E, LY6G5CA, LY6G5CB and LY6G5B.
  • Protein coding domains are illustrated as follows: diagonal line shading indicates signal peptide, C indicates LY-6 domain and wavy line shading indicates C-terminal hydrophobic region. Horizontal line shading indicates unknown domain (LY6G5B), and the dotted area indicates the alternatively spliced part of LY6G6D. An X indicates a premature stop codon. Arrows indicate the position on the gels of the predicted product.
  • Figures 4(a)-4(e) are similar to Figure 3.
  • Part (i) of each figure is a picture of gel analysis of RT-PCR products of various mouse cell lines (from left to right: L929, RAW264, WEHI-3B, WEHI-231 and EL4); part (ii) of each figure is a picture of gel analysis of RT-PCR products of various mouse tissues (from left to right: lung, kidney, brain).
  • Part (iii) of each figure is a picture of gel analysis of RT-PCR products from (left to right) mouse liver, mouse spleen and tissue from mouse embryos at 12.5 and 15.5 days post-conception, respectively.
  • Part (iv) of each figure is a schematic representation of the RT-PCR products detected (legend as for Figure 3).
  • Figures 4(a)-(e) relate, respectively, to Ly6G6c, Ly6G6d, Ly6G6e, Ly6G5c and Ly6G5b.
  • Figure 5 is a schematic illustration of the conserved pattern of disulphide bridges found in LY-6 superfamily protein domains.
  • Figures 6a-g and 7a-f are photographs of gel electrophoretic analysis of various samples, indicating that transcripts of LY6G6D and LY6G5B are not subject to NMD.
  • Figures 8A-B are graphical representations of the output of the MEME computer program analysis of various of the retained introns. The order of the introns in Figure 8A is the same as that in Figure 8B.
  • Figure 8A shows the consensus sequence of the retained introns ("multilevel consensus sequence") and the degree of conservation.
  • Figure 8B shows the location and orientation of the motif in the intron.
  • Figure 9A illustrates, schematically, various plasmid constructs employed by the inventors in performing luciferase assays.
  • Figure 9B is a bar chart showing the results (average RRR values i.e. "relative response ratio"), expressed as percentage maximal response, of luciferase assays performed using various constructs.
  • the typical Ly-6 protein domain is approximately 80 amino acids long, characterised by a conserved pattern of 8-10 cysteine residues that have a defined pattern of disulphide bonding (see Figure 5).
  • the consensus sequence of the Ly-6 domain extends from the first cysteine to the sixth, and there is a generally conserved spacing pattern as follows: [EQR]-C-[LIVMFYAH]-x-C-x(5,8)-C-x(3,8)-[EDNQSTV]-C-{C}-x(5)-C-x(12,24)-C (Prosite: http://www.protomap.cs.huji.ac.il/Amino/Prosite/ByFamily/LY6_UPAR).
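The spacing pattern quoted above is written in PROSITE syntax, in which "x" means any residue, square brackets list permitted residues, braces list excluded residues (so the element between the fourth and fifth cysteines is read here as {C}, "any residue except cysteine") and parentheses give repeat counts. As an illustration, the Python sketch below converts such a pattern into a regular expression and applies it to a protein sequence. The converter handles only the syntax elements that occur in this pattern, the function names are hypothetical, and the test peptide is synthetic rather than a real LY-6 protein.

```python
import re

LY6_PATTERN = ("[EQR]-C-[LIVMFYAH]-x-C-x(5,8)-C-x(3,8)-"
               "[EDNQSTV]-C-{C}-x(5)-C-x(12,24)-C")


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE pattern into Python regex syntax."""
    pieces = []
    for element in pattern.rstrip(".").split("-"):
        # Separate the residue specification from an optional (n) or (n,m).
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = match.groups()
        if core == "x":
            core = "."                          # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"      # excluded residues
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        pieces.append(core)
    return "".join(pieces)


def find_ly6_motif(protein: str):
    """Return the first motif match in a protein sequence, or None."""
    return re.search(prosite_to_regex(LY6_PATTERN), protein.upper())


# Synthetic peptide built to satisfy the pattern; not a real protein.
synthetic = ("EC" + "L" + "A" + "C" + "A" * 5 + "C" + "A" * 3 + "E" + "C"
             + "G" + "A" * 5 + "C" + "A" * 12 + "C")
print(prosite_to_regex(LY6_PATTERN))
print(find_ly6_motif(synthetic) is not None)
```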
  • the MHC class III region contains a cluster of five genes coding for potential LY-6 superfamily members, lying ~100-150 kb centromeric of the TNF cluster (Fig. 1).
  • the exon/intron structure of these five human LY-6 superfamily members was previously partially annotated (Ribas et al, 1999; and the MHC Consortium 1999, both cited above).
  • the human LY6G6C, LY6G6D and LY6G6E genes were predicted to have three coding exons (Accession no. AF129756) (Fig. 2).
  • Human LY6G6D was noted as potentially having an alternative splice acceptor site upstream of the exon III splice site (Fig.
  • For LY6G6E, a complete Ly-6 protein could be encoded if a non-consensus splice site is used (37523GT instead of 37522AG in AF129756) at the 5' end of exon III.
  • the LY6G5C gene in humans was predicted to contain four exons (AF129756), and does not contain a C terminal hydrophobic domain (Fig. 2).
  • We found an ATG codon ~50 bp upstream of the 5' splice site of exon II, and this extended exon II would encode a signal peptide similar to exon I of the other LY-6 genes.
  • Another potential methionine can also be found 9 bp further upstream of it, but it was not considered in this study.
  • We predicted the mouse Ly6G5c gene structure by comparison of the human CDS (LY6G5CA) with the mouse genomic sequence.
  • In mouse, the equivalent of exon II of human LY6G5CA lacks the splice acceptor site at the 5' end of the predicted exon II, and we found an ATG codon ~50 bp upstream as a potential start codon (as for LY6G5CB) (Fig. 2).
  • mouse Ly6G5c only contains three exons and is, therefore, more similar to the predicted LY6G5CB form found in humans.
  • There is sequence similarity at both the protein (66% identity) and DNA (76% identity) level between exon I of the human LY6G5CB variant and the analogous exon I of mouse Ly6G5c.
  • the annotated LY6G5B gene in humans (AJ245417) consisted of only two exons, comprising half the LY-6 domain and the C-terminal region, with a long 5' UTR.
  • this annotation was based on a mis-spliced mRNA that retains an intron, and we propose a new intron/exon structure (Fig. 2).
  • Ly6G5b was not annotated, so by comparing the new predicted human CDS with mouse genomic sequence, we predicted the mouse Ly6G5b gene structure (Fig. 2).
  • Exon III in both LY6G5B and Ly6G5b is longer than the equivalent exons in the other Ly-6 genes in this cluster, encoding an extra domain with no known function (Fig. 2).
  • the five MHC class III region LY-6 superfamily members are predicted to have similar exon/intron structures (Fig. 2).
  • the predicted intron sizes of this cluster of LY-6 genes vary considerably from gene to gene (Fig. 2).
  • the sizes of predicted exons are more conserved, with exon I (signal peptide) being 52-58 bp in length except in LY6G5C (112-121 bp).
  • the length of the predicted exon II (first half of Ly-6 domain) would also be conserved being 111-138 bp for each gene except LY6G5C, where it is longer at 165-168 bp.
  • the predicted length of exon III (Ly-6 domain and C-terminal hydrophobic domain) would also be conserved for the LY6G6C, LY6G6D, and LY6G6E genes, being 200-224 bp.
  • the predicted exon III of LY6G5C (exon IV in LY6G5CA) is shorter, being 164 bp and lacks the C-terminal hydrophobic domain, and in LY6G5B, the predicted exon III is much longer (398-419 bp) due to an extra domain (Fig. 2).
  • the predicted coding sequence of each MHC class III region LY-6 gene was compared with the human and mouse EST databases at the National Center for Biotechnology Information (NCBI) (http://www.ncbi.nlm.nih.gov/BLAST), and the matching EST clones identified.
  • LY6G6C, LY6G6E, and LY6G5B genes also showed similarity to ESTs from bull and pig.
  • the IMAGE (Lennon et al, 1996 Genomics 33, 151-152) clones were obtained from the MRC UK HGMP Resource Centre (http://www.hgmp.mrc.ac.uk).
  • the clones were confirmed by full sequence analysis, using either gene-specific primers (Tables 1A and 1B) or the vector primers [M13 forward primer (5'-GTAAAACGACGGCCAGT-3') and either the T3 reverse primer (5'-ATTAACCCTCACTAAAG-3') or the PT7T3D reverse primer (5'-TAGGGAATTTGGCCCTCGAG-3')] using the BigDye Terminator sequencing kit (ABI).
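Basic properties of the three vector primers quoted above can be tabulated with a few lines of Python. The sketch below computes length, GC content and a rough melting temperature using the Wallace rule (2 °C per A or T, 4 °C per G or C), which is only a coarse estimate for short oligonucleotides. The source states no annealing or melting temperatures, so the use of the Wallace rule here is an assumption made purely for illustration, as are the function names.

```python
VECTOR_PRIMERS = {
    "M13 forward": "GTAAAACGACGGCCAGT",
    "T3 reverse": "ATTAACCCTCACTAAAG",
    "PT7T3D reverse": "TAGGGAATTTGGCCCTCGAG",
}


def gc_content(primer: str) -> float:
    """Fraction of bases that are G or C."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer: str) -> int:
    """Rough melting temperature in degrees C: 2*(A+T) + 4*(G+C)."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


for name, sequence in VECTOR_PRIMERS.items():
    print(f"{name}: {len(sequence)} nt, "
          f"GC {gc_content(sequence):.0%}, Tm ~{wallace_tm(sequence)} C")
```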
  • the sequences obtained were compared against the NCBI non-redundant (nr) database, and the published genomic sequences [Accession No.: AF129756 (human) and AF109905 and AF109719 (mouse)].
  • RNA isolation from the human cell lines K562 (erythroleukemia), U937 (monocyte), HL60 (monocyte), Molt4 (T cell), Jurkat (T cell), Raji (B cell), 143B (fibroblast) and HeLa (epithelial) was performed as described previously (Aguado et al, 1999 Biochem. J. 341, 679-689). Briefly, total RNA was extracted from cells (RNAzol™ B method), and polyA+ RNA (Pharmacia Quick Prep kit) was extracted from the total RNA. Total RNA from human tissues was obtained from Stratagene.
  • Mouse polyA + RNA was obtained using a Quickprep Micro mRNA Purification kit (Amersham Pharmacia Biotech), using either ⁇ 5xl0 7 cells from the cell lines L929 (fibroblast), RAW-264 (macrophage), WEHI-3B (monocyte), WEHI-231 (B cell), and EL4 (T cell lymphoblast) or " O.lg tissue from an adult mouse (129 strain), following the manufacturer's instructions.
  • Oligo-dT primed cDNA synthesis was carried out using the Reverse Transcription System (Promega) using approximately l g RNA (total or poly A + RNA) in a 20 ⁇ l reaction volume following the manufacturer's instructions. Nested PCR was then performed using gene specific primers (Table 1) to obtain the full-length cDNA from the predicted start codon (ATG) to the stop codon. Primer sequences were based on both the genomic DNA sequence (Lee Rowen, see Accession numbers: AF129756, AF109905 and AF109719) and the published gene predictions, except for the LY6G5C and LY6G5B genes.
  • For the first round PCR, primers aligning in the 5' and 3' UTRs were used (5UTR1 and 3UTR1). For the second round of PCR, primers aligning with the potential start codon (RT2) and the predicted stop codon (RT1) were used.
  • For human LY6G5C, two different PCRs were performed. In one case the forward primer G5CRT3 (RT3), aligning with the annotated exon I of LY6G5C (LY6G5CA form), and the primer G5CA5UTR1, aligning in the potential 5' UTR, were used.
  • In the other case, the forward primer G5CRT2, aligning with the ATG codon of a new predicted exon I of LY6G5C (LY6G5CB form), and the primer G5CB5UTR1, aligning in the potential 5' UTR, were used.
  • the primers for LY6G5B align with a new predicted LY6G5B ATG codon (RT2) and the stop codon (RT1), while the UTR primers align just 5' of the new predicted LY6G5B start codon (G5B5UTR1) and 3' of the stop codon (G5B3UTR1).
  • the first round PCR product was cleaned (Qiagen PCR columns) and 3 µl of a 1 in 10 (v/v) dilution was then used for the second round PCR reaction.
  • For human tissue, 1.5 µl of a 1 in 3 (v/v) dilution of the first round PCR reaction was used in the second round reaction.
  • the conditions were as follows: 95°C for 2 min followed by 35 cycles of 95°C for 45s, 60°C for 30s, 72°C for 1 min, followed by 72°C for 5 min.
  • 0.5 µl of the first round PCR reaction was used in the second round reaction.
  • For mouse cell line RNA, the PCR cycling was as for the human RNA.
  • both rounds of PCR were as follows:
  • PCR reactions contained 2 mM MgCl2, 0.2 mM dNTPs, 0.4 µM each primer and 0.75 U Taq polymerase (Promega) in a total volume of 25 µl.
  • As a negative control, the PCR was performed without the addition of a cDNA template.
  • the products obtained were then isolated either by gel extraction (Qiagen) or using Qiaquick PCR columns (Qiagen), and either sequenced directly with gene-specific primers, or else cloned into the pGEM-T vector (Promega) for sequencing.
  • pGEM clones were sequenced with vector primers [M13F (described above) and SP6 (5'-TAGGTGACACTATAGAATAC-3')]. All the sequence analysis was performed as described above. The sequences obtained were compared to the published sequences using the BLAST algorithm at NCBI.
  • Human LY6G6D, in contrast, showed multiple bands by RT-PCR, with different expression patterns in different cell lines and tissues (Fig. 3). These bands were cloned into pGEM and the different products sequenced.
  • This band corresponds to a 493 bp mis-spliced form of LY6G6D, which introduces a stop codon before the Ly-6 domain.
  • This splice form does contain a long in-frame translation with no significant homology to sequences in the non-redundant (nr) database.
  • the upper band (501 bp) contained the entire intron between exons II and III, introducing premature stop codons.
  • No EST clones were available for LY6G6E.
  • For LY6G5C, two sets of RT-PCR were performed, one for LY6G5CA and the other for LY6G5CB.
  • the RT-PCR produced an intense ~550 bp band in the Molt4 and Jurkat cell lines and a much fainter band in Raji and HeLa, and a ~650 bp product in fetal kidney (Fig. 3 and Table 4A).
  • the expected size is 447 bp and upon sequencing, we found the ~550 bp band to be a splice variant of LY6G5CA (557 bp), matching part of the sequence in human IMAGE clones 1504964, 3476011 and 3368609 (Table 2, Fig. 3 and Table 4A).
  • the 1428 bp band present in fetal spleen and all cell lines was a mis-spliced form of LY6G5CB, introducing premature stop codons (Fig. 3 and Table 4A).
  • the 900 bp band produced in fetal brain was a non-specific amplification product.
  • For LY6G5C, a total of eight splice forms were found by EST analysis, of which many showed similarity to the 557 bp form of LY6G5CA (Table 2), and these all introduce premature stop codons or have ORFs with no significant similarity to sequences in the nr database.
  • For LY6G5B, RT-PCR analysis showed two bands, one at ~600 bp (expected size) and another at ~750 bp, in all cell lines (Fig. 3). In tissues, human LY6G5B had a more restricted pattern of expression, with the expected band of ~600 bp present only in fetal brain and adult lung, and faintly in fetal liver and fetal kidney (Fig. 3). All other tissues produced the band at ~750 bp, and in some, we saw a band corresponding to genomic size (~1 kb) (Fig. 3 and Table 4A). When sequenced, the lower band (606 bp) was the expected product.
  • the upper band (753 bp) introduced premature stop codons, and corresponds to the cDNA previously described [22] and found in IMAGE clone 2139582 (Table 2).
  • Mouse Ly6G6d, in contrast, showed different expression by RT-PCR in different cell lines and tissues (Fig. 4).
  • a product of ~400 bp was expected, but all cell lines, as well as lung and kidney, showed only a ~500 bp product. This was a 505 bp mis-spliced product that contained the intron between exons I and II and changed the reading frame of the protein and introduced stop codons (Fig. 4 and Table 4B).
  • Adult brain and both fetal samples showed bands at ~400 bp and ~500 bp, with the band of 408 bp being the product of the expected sequence (Table 4B).
  • RT-PCR of Ly6G6e showed three bands.
  • the expected ~400 bp band and a faint additional higher band of ~500 bp were seen in all cell lines (Fig. 4 and Table 4B).
  • the lower band contained an additional 18 bp 5' of exon III compared to the predicted sequence (405 bp in total), utilizing a splice site upstream of the predicted one (similar to what was found in humans), thus introducing six extra amino acid residues and maintaining the translation frame. In humans, this introduces a stop codon, but due to base differences, no stop codon is introduced in mouse.
  • the upper faint band of 501 bp is a splice form that contains the intron between exons II and III but maintains the reading frame and introduces an additional 32 amino acid residues with no significant similarity to sequences in the nr database.
  • the third band of ~1 kb represents genomic DNA contamination of the RNA (or totally unspliced mRNA) (Fig. 4).
  • RT-PCR produced only one band of ~450 bp (expected size) in all the cell lines, adult brain and both the fetal samples (Fig. 4 and Table 4B). When sequenced, this band was the product of the expected sequence (450 bp).
  • the ~3.6 kb band present in some samples represents the genomic product or unspliced mRNA.
  • RT-PCR analysis of Ly6g5b showed different expression in different cell lines with only WEHI-3B showing a product of the expected size (~600 bp).
  • a band of ~700 bp was present in all the cell lines (Fig. 4 and Table 4B).
  • the lower band (585 bp) was the product of the expected sequence and the upper band (681 bp) proved to contain the entire intron between exons I and II, introducing premature stop codons.
  • the expected Ly6G5b product (585 bp) was present only in lung and faintly in kidney (Table 4B).
  • tissue samples also showed a ~700 bp band, which was found to be the 681 bp mis-spliced form found in the cell lines and was also detected in liver, brain and both embryo samples.
  • the 400 bp band seen in 12.5dpc embryo is a non-specific product.
  • the ~1 kb band present in all tissue samples, and in L929 and RAW264, represents the genomic product or unspliced mRNA.
  • the MHC region is extremely highly conserved between human and mouse and this has helped us to elucidate the exon/intron structure of the mouse MHC class III region Ly-6 family members where these were previously not predicted by computer programs. From comparisons between genes and between species, it can be seen that the Ly6G6c, Ly6G6d and Ly6G6e genes show a conserved pattern of exon structure and protein domains, while the Ly6G5c and Ly6G5b genes have different patterns of exon structure and different N and C terminal protein domains. Mouse and human exon-intron structures are very conserved for each member, and interestingly intron lengths are also relatively similar (though generally shorter in mouse), except for Ly6G5c, where the intron sizes vary considerably between both species.
  • Ly-6 superfamily members such as the murine cluster on mouse chromosome 15
  • Ly-6 MHC class III region genes it was not possible to obtain a correctly spliced full-length IMAGE cDNA clone.
  • Ly6G6c two proper full-length clones were available for Ly6G6c, covering both 5' and 3' UTRs. We observed a big difference in the availability of clones per gene between species.
  • Ly6G5c which has fourteen IMAGE clones in humans and none in mouse, possibly indicative of differential expression in mouse and humans, or of the smaller number of ESTs currently available from mouse.
  • the information from dbEST is not sufficient as, for Ly6G6c and Ly6G5c, the splice variants were only found when the IMAGE clones were totally sequenced.
  • There is a wide tissue distribution of ESTs in both human and mouse, with the genes seemingly expressed in both immune-related and non-immune-related tissues.
  • Comparison of the seven alternatively spliced forms found in the EST clones with the fourteen alternative splice forms found by direct cDNA analysis showed that only three were found by both methods.
  • mis-spliced forms in both human and mouse normally retain the same intron (usually the first one), or part of the intron (also the first one).
  • These intronic sequences produce changes in the reading frame of the protein, and/or introduce stop codons, except for the 677 bp splice form of LY6G5CA found in human fetal kidney, and the 501 bp splice form of mouse Ly6G6e.
  • the only cases that retain the most 3' intron are the very poorly expressed 723 bp form of mouse G6c (in some polyA+ RNA samples and both total RNA samples), and the 501 bp form of the potential human pseudogene LY6G6E.
  • Nonsense-mediated decay (NMD) is a quality-control mechanism that checks newly made messages and rapidly degrades those containing premature stop codons. Most NMD is cytoplasmic, but a fraction seems to be nuclear (Maquat & Carmichael 2001 Cell 104, 173-176). Pre-mRNA processing is thought to occur co-transcriptionally, with the mRNA cap structure promoting removal of the first intron (Proudfoot et al, 2002 Cell 108, 501-512), and splicing and polyadenylation factors found coincident with RNA polymerase II staining. If the mis-spliced products we have identified were splicing intermediates, they should be quickly degraded.
  • mis-spliced forms especially those containing the first intron
  • stability of the mis-spliced forms based on the presence and abundance of the RT-PCR products.
  • This together with the retention of the first intron (or part of it), suggests that our mis-spliced forms are real transcripts and further that they might have a potential regulatory function.
  • These transcripts could indicate the presence of a regulatory mechanism whereby the mis-spliced forms compete in some way with the correctly spliced forms for interaction with the translation and export machinery, adding an additional layer of control to the expression of these genes.
  • mouse Ly-6C can have different functions depending on the cell type in which it is expressed (Yamanouchi et al, 1998 Eur. J. Immunol. 28, 696-707), and there is also evidence to suggest that transcription initiation and splicing of Ly-6 superfamily members can be altered when cells are stimulated with cytokines (Khodadoust et al, 1999 J. Immunol. 163, 811-819).
  • mis-splicing of the MHC class III region LY-6 genes may act as a regulatory mechanism for gene expression at the transcriptional level, and that different cytokines could act at another level of control on splicing, initiation of transcription, or the expression in different cell types of these LY-6 genes. It would be interesting to perform similar analyses with the other LY-6 superfamily members in the genome, to elucidate whether all follow the same exon/intron structure pattern, and more interestingly whether they present differential splicing to the same degree as this MHC class III region LY-6 cluster.
  • the inventors wished to establish the subcellular location of the Ly-6 transcripts.
  • Subcellular localisation of the transcripts was analysed by cell fractionation followed by RT-PCR.
  • Cell fractionation was performed using a modification of the RNeasy mini kit (Qiagen) protocol.
  • Cell membranes were lysed by incubation in a lysis buffer (10 mM Tris-HCl, 100 mM NaCl, 1.5 mM MgCl2, 0.5% (v/v) NP-40, 1 mM DTT, 1000 U/ml RNasin) for 5 minutes on ice. Samples were then centrifuged for 2 minutes at 600g to pellet nuclei.
  • ⁇ -actin, ⁇ -globin and GAPDH transcripts were used as a measure of abundance and RNA quality.
  • C-myc transcripts were used as a control for unstable mRNA transcripts (Rodgers et al, 2002 RNA 8, 1526-1537).
  • the intron of c-myc was used as a control for genomic contamination of the cytoplasmic RNA fraction.
  • a low salt buffer (10 mM Tris-HCl, 100 mM NaCl, 1.5 mM MgCl2, 0.5% (v/v) NP-40, 1 mM DTT, 1000 U/ml RNasin)
  • the RNeasy kit (Qiagen) was used for RNA isolation.
  • An equivalent amount of the RNA obtained from each fraction was used for oligo-dT primed cDNA synthesis, which was performed using the Reverse Transcription System (Promega) in a 20 µl reaction volume following the manufacturer's instructions, with 1 µl of cDNA being used in each PCR reaction.
  • RT-PCR reactions contained 2 mM MgCl2, O. ⁇ mM dNTPs, 0.4 µM each primer and 0.75 U Taq polymerase in a 25 µl reaction volume.
  • the PCR conditions were as follows: 95°C for 2 min followed by 35 cycles of 95°C for 45s, 60°C for 30s, 72°C for 30s, followed by 72°C for 5 min.
  • the primers used amplified across the retained intron. While RT-PCR is not fully quantitative, it was possible to measure levels of each splice form relative to other conditions, therefore giving a qualitative result, especially as only one round of PCR was performed.
  • Figure 6 comprises a series of photographs showing gel electrophoretic separation and analysis of the RT-PCR products from samples using primers specific for (a) Actin or ⁇ - globin genes; (b) GAPDH; (c) C-myc; (d) C-myc intron; (e) and (f) LY6G6D and LY6G5B in either K562 or Raji cells respectively.
  • the order of samples between the marker ladders is nuclear fraction time 0, 1, 2, 4 hrs; cytoplasmic fraction 0, 1, 2, 4 hrs; negative control.
  • RT-PCR products of mis-spliced transcripts are indicated by arrows.
  • the stability of the actin, ⁇ -globin and GAPDH transcripts indicated that the RNA was not degraded by the treatment or the RNA isolation procedure, while the decay of the c-myc transcript indicated that the actinomycin D treatment was effective.
  • the lack of c-myc intronic product in the cytoplasmic fraction showed there was minimal genomic contamination of the cytoplasmic fraction.
  • the inventors also decided to investigate the effect of translation on stability of the transcripts, as translation of mRNA has been shown to be required for NMD to become effective (see Hilleren & Parker 1999 Annu. Rev. Genet. 33, 229-260 and Buhler et al, 2002 EMBO Reports 3, 646-651), probably for recognition of premature termination codons.
  • K562 cells were treated with 300 µg/ml of either cycloheximide (“CHX”) or puromycin (“PUR”), both translational inhibitors.
  • Cycloheximide inhibits the peptidyl transferase on the large subunit of the eukaryotic ribosome, while puromycin is a tRNA analogue that causes premature chain termination. If the stability of a mis-spliced transcript is greatly increased relative to that of the correctly spliced form (observed as an increase in PCR product), following inhibition of translation, it indicates that the transcript is subject to NMD.
  • Figure 7 is similar to Figure 6 and comprises a series of photographs showing gel electrophoretic separation and analysis of RT-PCR products from samples using primers specific for: 7(a) actin or ⁇ -globin; 7(b) GAPDH; 7(c) C-myc; 7(d) C-myc intron; 7(e) and (f) LY6G6D or LY6G5B in K562 cells treated with CHX (Fig. 7e) or PUR (Fig. 7f).
  • Ly6G6d and Ly6G5b genes are not closely related. Additionally, this weak motif is not present due to a bias in base composition, as scrambled sequence with an identical base composition found this motif with a much-reduced probability (P value of 8.4e-5).
  • transcripts are cytoplasmic and are not subject to NMD, and could therefore potentially have a regulatory function physiologically. They could therefore potentially be used to stabilise transcripts for therapeutic purposes by insertion e.g. in the 5' or 3' UTRs of genes of interest, or else as an intron to create "competing" mRNA species to reduce production, export or translation of the correctly spliced form of the gene of interest.
  • intronic sequences to stabilise a luciferase gene reporter construct was examined.
  • the intronic sequences were inserted into either the 5' or 3' UTR of the luciferase gene in the pGL3 construct and their effect on luciferase gene expression measured using luminescence.
  • the inventors employed a luciferase assay to try and determine whether these intronic sequences were capable of stabilising other gene transcripts.
  • the method was adapted from that described by Kubota et al (2000, Oncogene 19, 4773-4786) based on the Dual luciferase system (Promega) and cell lysates.
  • the inventors used instead the Dual-Glo luciferase assay system (Promega). These two methods are very similar except that the Dual-Glo system gives up to two hours of stability after the reagents are added in order to be able to read entire plates, rather than the rapidly decreasing activity of the normal luciferin as used in the dual luciferase system.
  • the other advantage of the Dual-Glo method is that there is no need for cell lysates; instead the reagent can be added directly onto the cells before the luminescence is read.
  • Hek293T cells (human embryonic kidney cells)
  • 200ng DNA comprising both the renilla construct and a firefly luciferase construct
  • transfection reagent (2a) The 25ul of transfection reagent (2a) was added to each well, swirled and incubated for
  • NB- volumes and weights are given per well and are usually done in triplicate; therefore multiply all by 3 and aliquot 25ul into each of three wells.
  • Dual-glo luciferase buffer was added to dual-glo luciferase reagent (as per manufacturer's instructions), mixed well, and 1 volume (i.e. 75ul) was aliquotted into each well
  • vector constructs used in this assay system were as follows (Figure 9A):
  • rLUC vector containing the renilla luciferase ORF (control for transfection efficiency)
  • pGL3 vector containing the firefly luciferase open reading frame (ORF) (control for firefly luciferase readings)
  • VEGFA 3' UTR (known to cause instability under normal oxygen conditions (Claffey et al, 1998 Mol. Biol. Cell. 9, 469-481)) inserted 3' of the firefly luciferase ORF (in the pGL3 vector).
  • pGL3 vector containing 150bp of the c-myc intron (part of the same intron that was used as a control for genomic DNA contamination of the cytoplasmic RNA fraction) inserted either 5' of the firefly luciferase ORF (chn construct) or 3' of the firefly luciferase ORF (cpe construct)
  • pGL3 vector containing the human LY6G6D first intron (91bp) inserted either 5' of the firefly luciferase ORF (dhn construct) or 3' of the firefly luciferase ORF (dpe construct)
  • pGL3 vector containing the human LY6G5B first intron (147bp) inserted either 5' of the firefly luciferase ORF (bh constructs) or 3' of the firefly luciferase ORF (bpe construct)
  • the rLUC vector was co-transfected with one of the pGL3 constructs.
  • Constructs containing intronic sequence were made by PCR amplifying the intron from genomic DNA using gene specific primers containing different restriction sites.
  • for c-myc and LY6G6D, the HindIII and NcoI sites in the pGL3 vector were used to clone the fragment at the 5' end of the luciferase ORF, while for LY6G5B only the HindIII site was used as the insert contained an NcoI site.
  • the PstI and EcoRV sites were used to clone the fragments at the 3' end of the luciferase ORF.
  • the c-myc intron was chosen as a control because it was known not to be retained in the c-myc mRNA transcript detected in the RNA stability experiments. Only part of the intron was used to act as a control to see the effect of inserting a ~150 bp fragment of DNA on the stability/expression of the luciferase gene product. Experiments were performed in triplicate to ensure reproducibility, while the luciferase controls (the rLUC and pGL3 vector co-transfections) were performed in triplicate three times. The RRR is calculated based on an average of all 9 control experiments (which is why each individual triplicate varies slightly from the 100% normalised value (Figure 9B)).
  • the VEGFA construct was used as a control for destabilisation of the luciferase expression and therefore activity (decrease to ~30% of normal (Figure 9B)).
  • the results show that when the intronic sequence or part of it is inserted 3' of the luciferase ORF there is some stabilisation, but as this is seen with the cpe control construct as well as the Ly-6 constructs (dpe and bpe constructs), this shows this effect is not gene specific (Figure 9B).
  • the bh6 and bh11 clones are cloned in the "correct" orientation, in the same orientation as would be found in the mis-spliced mRNA species, while the other bh constructs contain the intronic sequence cloned in the opposite orientation.
  • These "opposite" orientation clones show a marked decrease in stability and luciferase expression, indicating that the stabilising effect is orientation-dependent.
  • nucleic acids comprising a regulatory sequence in accordance with the invention.
  • Delivery of the corrective nucleic acid to cells in vitro or in vivo may be classified as adopting either a viral or a non-viral approach.
  • Non-viral approaches would include: 1) direct transfer of naked DNA molecules, 2) transfer of DNA complexed with cationic lipids, 3) transfer of particles comprising DNA condensed with cationic polymers or liposomes, or 4) physical methods involving needle-free injectors and electroporation.
  • Viral vectors would include: 1) Retroviral vectors (including lentivirus), 2) adenoviruses and adeno-associated viruses, 3) herpes simplex, and 4) poxviruses, such as vaccinia virus.
  • the DNA will be delivered to primary cell types (or cell lines) that are defective in the expression of a particular target gene or that are over expressing a target gene. These cell types (or cell lines) may be blood cells, fibroblast cells, or tissue specific cells of any type.
  • HeLa cells (epithelial) grown in Dulbecco's modified Eagle's medium (DMEM) with 10% (v/v) foetal bovine serum (FBS) and 100 IU/ml penicillin and 100 µg/ml streptomycin. Trypsinise.
  • the gene, or splice form, of interest would be expressed in a retroviral system, such as RetroXpress System (Clontech) following the manufacturer's instructions. Essentially the gene of interest will be cloned into the multiple cloning sites of the RetroXpress vectors.
  • the retroviral vector carrying the gene of interest would be transfected into the RetroPack PT67 packaging cell line leading to production of retroviral particles (10^5-10^6 ffu/ml) followed by antibiotic selection and isolation of clones and production of high-titer clones (10^6-10^7 ffu/ml).
  • the packaged virus will be used to infect target cells with high efficiency and expression of the target gene would be monitored by PCR. For gene transfer protocols in humans, examples followed in trials can be found at www4.od.nih.gov/oba.
  • a tumour nodule will be removed from a patient and the lymphocytes infiltrating into this cancer will be cultured and expanded in vitro following standard conditions. These lymphocytes will be incubated with a retroviral vector (containing the genes of interest) once they have reached the log phase of growth. These cells will be expanded in culture until approximately 4x10^10 cells are obtained. These lymphocytes will be cryopreserved in aliquots and tested to ensure that they are free of replication competent virus and that they express the gene of interest. These lymphocytes will be administered to the cancer patient. After administration, samples of tumour biopsy material would be tested for the presence of the gene of interest and what effect it had on the tumour.
  • LY6G6C (AJ315533, AJ315534); LY6G6D (AJ315535, AJ315536, AJ315537); LY6G6E
  • Ly6g6c (AJ315546, AJ315547); Ly6g6d (AJ315548, AJ315549); Ly6g6e (AJ315550,
  • LY6G6C H03135 151586 placenta EII, EIII, 3'UTR
  • LY6G6C R25237 132305 placenta EIII, 3'UTR
  • LY6G6C AI127339 1708589 foetal heart new EI (in intron EI-EII), EII, EIII, 3'UTR
  • Ly6g6c W12301 314768 total foetus part 5'UTR, EI, EII, EIII, 3'UTR 381 bp
  • Ly6g6c AW213479 2259239 embryo part 5'UTR, EI, EII, EIII, 3'UTR 381 bp
  • Table 4A Summary of human RT-PCR analysis.
  • Table 4B Summary of mouse RT-PCR analysis.
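
By way of illustration only, the band sizes reported in the mouse RT-PCR analysis above can be checked for their likely effect on the reading frame. The following Python sketch compares an observed product size with the expected size and reports whether the extra sequence is a multiple of three bases. The function name `insertion_effect` is hypothetical, and pairing the 501 bp Ly6G6e form with the 405 bp form is an assumption made for the example.

```python
def insertion_effect(expected_bp: int, observed_bp: int) -> dict:
    """Compare an observed RT-PCR product size with the expected product size."""
    extra = observed_bp - expected_bp
    in_frame = extra % 3 == 0  # a multiple of 3 keeps the downstream reading frame
    return {
        "extra_bp": extra,
        "in_frame": in_frame,
        "extra_codons": extra // 3 if in_frame else None,
    }


# Product sizes quoted in the mouse RT-PCR results (expected, observed).
cases = {
    "Ly6G6e, 501 bp form vs 405 bp form": (405, 501),
    "Ly6G6d, 505 bp form vs 408 bp form": (408, 505),
    "Ly6g5b, 681 bp form vs 585 bp form": (585, 681),
}

for label, (expected, observed) in cases.items():
    print(label, insertion_effect(expected, observed))
```

An in-frame length does not by itself exclude stop codons within the retained sequence: the 681 bp Ly6g5b form retains a multiple of three bases yet is reported to introduce premature stop codons, so the sequence itself still has to be inspected.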

Abstract

Disclosed is an isolated nucleic acid molecule which comprises at least part of the sequence of intron 1 of a human LY-6 superfamily gene or of a homologous gene from an animal or a corresponding sequence, and a method of regulating the expression of a gene using the aforementioned molecule.

Description

Title: Improvements in or Relating to Regulation of Gene Expression
Field of the Invention
The present invention relates to isolated nucleic acid molecules, vectors and host cells comprising said molecules, and in vivo and in vitro methods of regulating the expression of a gene.
Background of the Invention
The Lymphocyte Antigen 6 (LY-6) protein domain is approximately 80 amino acids long, characterized by a conserved pattern of 8-10 cysteine residues that have a defined pattern of disulphide bonding. Most members of the LY-6 superfamily are extracellular GPI (glycosyl phosphatidylinositol) anchored proteins, such as CD59, the urokinase plasminogen activator (uPA) receptor (uPAR) and sperm acrosomal protein (SP-10) (reviewed by Palfree, 1996 Tissue 48, 71-79).
Some secreted members of the superfamily lacking the GPI anchor, such as the snake neurotoxins (Fleming et al, 1993 J. Immunol. 150, 5379-90), and human SLURP-1 (secreted LY-6/uPAR related protein 1) (Andermann et al, 1999 Protein Sci. 8, 810-819) have also been described. LY-6 superfamily members have been identified in a number of different organisms, including a wide variety of mammalian species such as human, mouse, rat, fox and baboon; amphibian species such as newt and frog; and in invertebrates such as squid and C. elegans (Chou et al, 2001 Genetics 157, 211-224). The biological function of the LY-6 superfamily members in mammals is not known, except for CD59, which is an inhibitor of the complement cascade, inhibiting the formation of the membrane attack complex (Davies et al, 1989 J. Exp. Med. 170, 637-654), and uPAR, which plays an important role in proteolysis of extracellular matrix proteins (Tarui et al, 2001 J. Biol. Chem. 276, 3983-90). Genes of the LY-6 superfamily are frequently found in clusters. On human chromosome 8 (8q24-qter), a cluster of five LY-6 family members has been identified, (Brakenhoff et al, 1995 J. Cell Biol. 129, 1677-89) containing E48, RIG-E (Retinoic acid induced gene E) (Mao et al, 1996 Proc. Natl. Acad. Sci. USA 93, 5910-14), PSCA (Prostate Stem Cell Antigen) (Reiter et al, 1998 Proc. Natl. Acad. Sci. USA 95, 1735-40), LY-6H (Apostolopoulos et al, 1999 Immunogenetics 49, 987-990) and SLURP-1. On mouse chromosome 15 there is a cluster of nine Ly-6 family members, containing ThB (Gumley et al, 1992 J. Immunol. 149, 2615-18), TSA-1/Sca2 (MacNeil et al, 1993 J. Immunol. 151, 6913-23), Ly-6a/e, Ly-6c, Ly-6f, Ly-6g, (Gumley et al, 1995, Immunol. Cell Biol. 73, 277-96) Ly-6h (Horie et al, 1998 Genomics 53, 365-368), Ly-6i (Pflugh et al, 2000 J. Immunol. 165, 313-321) and Ly-6m (Patterson et al, 2000 Blood 95, 3125-32). This mouse region is syntenic to human chromosome 8q24. 
Human E48 and mouse ThB, human RIG-E and mouse TSA-1/Sca-2, and human and mouse Ly-6h are thought to be orthologous genes. The human orthologues of the remaining members of the murine cluster have not yet been identified. The members of this murine cluster have been well studied, and a possible role for these LY-6 family members in T cell activation, differentiation and maturation has been suggested. In addition it has been shown that Ly-6c can regulate endothelial adhesion and homing of T cells by activating integrin dependent pathways (Hanninen et al, 1997 Proc. Natl. Acad. Sci. USA 94, 6898-6903) and that Ly-6A has a role in mediating cell-cell adhesion (Bamezai et al, 1995 Proc. Natl. Acad. Sci. USA 92, 4294-98).
A novel LY-6 cluster has been described in the class III region of the Major Histocompatibility Complex (MHC) (Ribas et al, 1999 J. Immunol. 163, 278-87; The MHC Consortium 1999 Nature 401, 921-923). The human MHC is located at chromosome 6p21.3, and is ~4Mb in length. It consists of three regions: class I and class II, which flank a central class III region. The complete sequence of the human MHC has been determined and 224 genes have been identified so far, of which approximately half are predicted to be pseudogenes. Of the expressed genes, ~40% have an immune related function. Many of the genes located in the class I and class II regions code for the highly polymorphic cell surface proteins involved in antigen presentation to T cells during an immune response. Interspersed between these genes are other genes that encode proteins with a variety of different functions in the immune and inflammatory responses such as the TAP and LMP genes. The central class III region is ~0.8Mb in length and contains 59-63 genes and 0-2 pseudogenes, depending on the haplotype. Of the predicted genes, at least 24 (41%) have a definite or potential role in the immune system. The mouse MHC class III region, located at chromosome 17, has also been completely sequenced in the inbred laboratory mouse 129/Sv strain (Lee Rowen, http://clrroma.mbt.washington.edu/msg_www/PROJECTS/mmhc.html), showing a complete conservation of all class III genes as well as conservation of gene order.
The LY6G6C, LY6G6D, LY6G6E, LY6G5C and LY6G5B genes are predicted to be members of the LY-6 superfamily of proteins based on translations of the predicted gene sequences.
One of the fundamental distinctions between the structure of prokaryotic and eukaryotic genes is the existence in the latter of introns. These are non-coding sequences which are initially represented in transcripts but which are generally removed by "splicing" to leave a mature mRNA molecule containing "exons" (i.e. the coding sequences which are translated into a polypeptide).
Occasionally, erroneous splicing events take place, resulting in the formation of an mRNA molecule which may retain intronic sequences. Translation of such erroneously-spliced mRNA molecules, retaining intronic sequences, will not usually result in the formation of a functional polypeptide, since the retained intronic portions will generally cause a shift of the reading frame (leading to synthesis of a nonsense polypeptide) and/or include a premature stop codon and so cause premature termination of translation (leading to synthesis of a truncated polypeptide).
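The effect of a retained intron on the reading frame can be illustrated with a minimal Python sketch. The sequences below are toy examples invented for illustration (they are not taken from any LY-6 gene), and the function name `first_stop_codon` is likewise an assumption of this sketch.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}

def first_stop_codon(cds: str) -> Optional[int]:
    """Return the 0-based nucleotide index of the first in-frame stop codon,
    reading from position 0, or None if no in-frame stop codon is present."""
    cds = cds.upper()
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i
    return None

# Hypothetical toy sequences, for illustration only.
exon1 = "ATGGCTCTG"        # ATG GCT CTG
exon2 = "GGAGCCTGCTAA"     # GGA GCC TGC TAA (normal stop codon at the 3' end)
intron = "GTGAGTTGAAG"     # 11 nt; contains an in-frame TGA when retained

spliced = exon1 + exon2            # correctly spliced transcript
retained = exon1 + intron + exon2  # transcript retaining the intron

normal_stop = first_stop_codon(spliced)
premature_stop = first_stop_codon(retained)
```

In this toy case the correctly spliced sequence is read through to its final codon, whereas the intron-retaining sequence terminates inside the retained intron, which is the kind of transcript that NMD would normally be expected to target.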
There exists in eukaryotic cells a mechanism to degrade many erroneously-spliced RNA variants, which mechanism is referred to as "Nonsense-mediated decay" (NMD). This process of NMD recognises mRNAs containing premature stop codons and rapidly degrades these mRNAs.
The content of the publications cited in this specification is incorporated herein by reference.
Summary of the Invention
In a first aspect the invention provides an isolated nucleic acid molecule which comprises at least part of the sequence of intron 1 of a human LY-6 superfamily gene or of a homologous gene from an animal, or a corresponding sequence. The present inventors have surprisingly found that inclusion of such a sequence in an RNA transcript markedly increases the stability of the transcript.
The term "at least part of the sequence of intron 1" as used herein is intended to refer to a portion of at least 10 contiguous bases of the intron 1 sequence, preferably at least 20, more preferably at least 30, and most preferably at least 40 contiguous bases. The isolated nucleic acid molecule of the invention may comprise other portions of a human LY-6 superfamily gene, such as portions of other introns or other 5' portions of the gene such as portions of the first and/or second exons, but specifically does not encompass nucleotide sequences which include the whole of a LY-6 superfamily gene. In particular embodiments substantially the only LY-6 superfamily gene sequence present in the isolated nucleic acid molecule of the invention is representative of, derived from, or identical to, part or all of the intron 1 sequence.
For present purposes, an LY-6 superfamily gene is considered as one which codes for an LY6 domain-containing polypeptide, and a homologous gene is one which is obtainable from an animal (conveniently a mammal) and, when compared using a commercially available sequence alignment program (such as Clustal/BLAST), exhibits at least 25% sequence identity (preferably at least 40%, more preferably at least 50% and most preferably at least 60%) with a human LY-6 superfamily gene. Homologues of Ly-6 genes have been found in many different animals. For present purposes, a "corresponding sequence" is a sequence of 10-150 nucleotides, preferably 20-100 nucleotides, more preferably 30-100 nucleotides, and most preferably 40-100 nucleotides, and which, when compared using a commercially available sequence alignment program (such as Clustal/BLAST), exhibits at least 25% sequence identity (preferably at least 40%, more preferably at least 50% and most preferably at least 60%) with intron 1 of LY6G6D or LY6G5B.
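The length and identity thresholds given for a "corresponding sequence" can be sketched in Python as follows. This is only an illustration under stated assumptions: the two sequences are taken to be already aligned (for example by Clustal or BLAST) with "-" marking gaps, and identity is counted over all alignment columns, which is one of several conventions and may differ from the figure a given alignment program reports. The function names are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns (columns that are gaps in
    both sequences are ignored). Assumes equal-length, pre-aligned input."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if not (x == "-" and y == "-")
    ]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * matches / len(columns)

def is_corresponding_sequence(aligned_candidate: str,
                              aligned_intron: str,
                              min_identity: float = 25.0) -> bool:
    """Apply the broadest thresholds from the text: 10-150 nucleotides
    and at least 25% identity with the reference intron 1 sequence."""
    length = len(aligned_candidate.replace("-", ""))
    identity = percent_identity(aligned_candidate, aligned_intron)
    return 10 <= length <= 150 and identity >= min_identity
```

The preferred, narrower ranges (for example 40-100 nucleotides and at least 60% identity) could be checked by passing a higher `min_identity` and adjusting the length bounds.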
An LY-6 domain may be defined as one which has a conserved pattern of disulphide bridges (as illustrated schematically in Figure 5) formed by 10 cysteine residues in which:
C1 forms a bridge with C5
C2 forms a bridge with C3
C4 forms a bridge with C6
C7 forms a bridge with C8 and
C9 forms a bridge with C10
Other characteristics which will normally (but not invariably) be found in a LY-6 domain are - (i) a typical length of 75-85 amino acid residues and (ii) the presence of a glycosyl phosphatidylinositol ("GPI") membrane anchor "recognition" sequence.
Examples of genes included in the LY-6 superfamily include the following:
LY6G5B, LY6G5C, LY6G6C, LY6G6D, LY6G6E, CD59, uPAR, SP10, SLURP-1, E48,
RIG-E, PSCA, ThB, TSA-1/Sca2, and all LY-6 genes (such as LY-6A to I etc.).
For brevity an LY-6 superfamily gene will be referred to hereafter as a "LY-6 gene" and the term "LY-6 gene" should therefore be construed accordingly unless the context dictates otherwise. The intron 1 portion of any LY-6 gene or animal homologue thereof can readily be identified, with the benefit of the present disclosure, using standard techniques and sequence comparisons within the ambit of those skilled in the art.
It will be understood by those skilled in the art that the isolated nucleic acid molecule may contain a sequence which deviates slightly from a base sequence identical to that of intron 1 of a LY-6 gene or mammalian homologue thereof. For example a number of bases may be substituted, inserted, inverted, deleted (or any combination thereof) without substantially disrupting the functionality thereof. Indeed, some changes to a naturally-occurring intron 1 sequence may be desirable in order to optimise the NMD-resistance phenomenon observed by the present inventors. Typically the part or whole of the intron 1 sequence present in the isolated molecule of the invention will be at least 80% identical with a naturally-occurring intron 1 sequence, preferably at least 85% identical, more preferably at least 90% and most preferably at least 95% identical. Preferably the isolated nucleic acid molecule of the invention will comprise all or part of intron 1 of the LY6G6D or LY6G5B genes, or have the aforementioned degree of sequence identity with intron 1 of those genes.
In particular the inventors have identified a 45 nucleotide consensus sequence which appears to be conserved to some extent within the LY-6 superfamily. This sequence is shown in Figure 8A and is labelled as "multilevel consensus sequence". Variants of the consensus sequence may comprise one or more of the alternative bases shown in Figure 8A immediately below the consensus sequence. The isolated nucleic acid molecule of the invention may conveniently comprise the multilevel consensus sequence or one of its variants. Initial analysis by the inventors has indicated that the consensus sequence may be present in an intron in either orientation (see Figure 8B).
The isolated nucleic acid molecule of the invention may be prepared by any conventional means, such as cloning of restriction digest fragments containing relevant portions of a LY-6 gene or mammalian homologue thereof, by PCR or de novo synthesis in vitro or any other known method. The isolated molecule may be single stranded or, more preferably, double stranded. The isolated molecule will typically comprise or consist of DNA, but may additionally or alternatively comprise or consist of RNA, PNA, LNA ("locked" nucleic acid) or other nucleic acid analogues e.g. containing modified bases such as inosine, hypoxanthine and the like and/or containing modifications to the sugar/phosphate backbone.
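A consensus sequence with alternative bases at some positions, which may occur in either orientation, can be searched for along the following lines. This is a hedged Python sketch only: the four-position motif used here is a hypothetical toy, not the 45 nucleotide consensus of Figure 8A, and the function names are assumptions of this example rather than part of the MEME program.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def find_motif(seq: str, motif: list) -> list:
    """Scan both strands for a motif given as a list of sets, one set of
    allowed bases per position. Returns (start, strand) tuples, where start
    is the 0-based position on the forward strand of the matched window."""
    seq = seq.upper()
    n = len(motif)
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for i in range(len(s) - n + 1):
            if all(s[i + j] in allowed for j, allowed in enumerate(motif)):
                # Convert reverse-strand coordinates back to the forward strand.
                start = i if strand == "+" else len(s) - n - i
                hits.append((start, strand))
    return hits

# Hypothetical toy motif: C, T, then G or C, then A.
TOY_MOTIF = [{"C"}, {"T"}, {"G", "C"}, {"A"}]

forward_hits = find_motif("AACTGAAA", TOY_MOTIF)
reverse_hits = find_motif("GGTCAGGG", TOY_MOTIF)
```

Representing each position as a set of permitted bases mirrors the way the alternative bases are listed beneath the consensus, and scanning the reverse complement accounts for a motif present in the opposite orientation.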
The inventors consider that inclusion of at least part of the intron 1 sequence of a LY-6 gene or other sequence in accordance with the first aspect of the invention confers a degree of resistance to the mechanism of nonsense-mediated decay (NMD) which in mammalian cells normally degrades nucleic acid containing premature stop codons. Accordingly, in preferred embodiments, the isolated nucleic acid molecule of the invention is effective, when placed in operable combination with another transcribable nucleic acid sequence, to confer a degree of resistance to NMD in respect of transcripts of the other transcribable nucleic acid sequence. The isolated sequence may be inserted, preferably, after the first exon of a gene to be transcribed, or in the 5' UTR. Less preferably the sequence may be inserted elsewhere (e.g. in the 3' UTR).
In a second aspect the invention thus provides a nucleic acid construct comprising the isolated nucleic acid molecule of the first aspect. The construct will typically be in the form of a replicable vector, such as a plasmid, cosmid, yeast artificial chromosome, replicable viral genome or the like. Such vectors are well known to those skilled in the art and a huge range is readily available.
The construct will typically additionally comprise a further transcribable sequence, which will normally be positioned in operable combination with the molecule of the first aspect.
The construct will typically comprise one or more promoters operable in a eukaryotic, typically mammalian (and preferably human) cell. Such promoters may be promoters e.g. from human genes, or viral promoters. Numerous examples of each type are known and available to those skilled in the art. A nucleic acid construct may comprise the nucleic acid molecule of the first aspect in either orientation - since only two orientations are possible it is a simple task for those skilled in the art to determine the optimum orientation. (The inventors have made some preliminary findings that suggest that, at least in some situations, orientation may be significant.) In addition it may be that more than one copy of the molecule of the first aspect may be included in the construct (e.g. two, three or more copies). For example, it is possible that optimum results might be achieved by replacing every intron of a gene with a molecule in accordance with the first aspect. These may be adjacent to each other in the construct or separated. If present in a plurality of copies in a single construct, they may be all in the same orientation or in a variety of orientations. Again, it is routine for the person skilled in the art, with the benefit of the present disclosure, to perform the necessary experiments to identify the optimum copy number and arrangement within the construct.
Other features which may be conveniently varied include: the separation of the intronic sequence of the first aspect from the first exon of the sequence to be transcribed; and the sequence of the intervening bases. At its simplest, the person skilled in the art could simply reproduce the circumstances of an LY-6 gene, but switching the LY-6 gene exons for those of the relevant gene whose expression is to be regulated.
By incorporating or otherwise operably associating a molecule in accordance with the first aspect of the invention with a nucleotide sequence of a gene to be expressed, it should prove possible to regulate the expression of the gene in a eukaryotic cell. Thus in a third aspect the invention provides a method of regulating the expression of a nucleic acid sequence, the method comprising the step of placing the nucleic acid sequence (the expression of which is to be regulated) in operable combination with a regulatory nucleic acid sequence, which regulatory sequence comprises a molecule in accordance with the first aspect of the invention. The invention may allow for either up-regulation of the expression of a gene (especially a poorly-expressed gene) or for the down-regulation of the expression of a gene, as explained below. In order to improve the expression of a poorly-expressed sequence, it may be desirable to associate the sequence with a regulatory nucleic acid sequence which confers a degree of resistance to NMD, as defined in the third aspect of the invention above. Typically this will be achieved by incorporating part or all of the isolated sequence of the invention into the transcribed portion of the nucleic acid sequence to be expressed, by means of recombinant DNA technology. Clearly this must be performed in such a way as to avoid inducing a frameshift or a premature 'stop' codon, which would lead to production of nonfunctional polypeptides or prematurely terminated transcripts. NMD is known to be a very widespread phenomenon, having been documented in many different mammalian cells, in plant cells (Isshiki et al, 2001 Plant Physiol 125, 1388-1395) and in yeast cells. It is possible therefore that the present method of protecting nucleic acids in general, and transcripts in particular, should be generally applicable in all eukaryotic cells.
The method of the third aspect of the invention may be performed in vitro or in vivo. Methods of introducing nucleic acids into cells and tissues in culture in vitro are well-known to those skilled in the art. Such methods include transfection, transformation, transduction, electroporation etc. Any suitable method may be employed. Where the method is performed in vivo, it will generally be necessary to introduce into a human or other mammalian subject a suitable vector (such as a safe viral vector, e.g. a retrovirus, adenovirus, or vaccinia virus).
The invention also provides an embodiment in which the molecule of the first aspect of the invention can be utilised to cause inhibition of expression of a particular selected endogenous target gene or genes: by introducing into a cell or tissue (in vitro or in vivo) a vector or other nucleic acid directing the expression of one or more mis-spliced, or truncated, or otherwise defective transcript variants of a target gene, it may be possible to inhibit the expression of the endogenous target gene, the defective variants being protected against NMD by inclusion therein of a NMD-resistance conferring sequence (e.g. a part or whole of intron 1 of a Ly-6 gene or mammalian homologue thereof) in accordance with the first aspect of the invention. A whole range of target genes might be suitable candidates for inhibition of expression including in particular oncogenes or any other genes the products of which are known or believed to be associated with a pathological condition. Other examples of possible target genes include, for instance: Glutathione S-transferase pi, expression of which is down-regulated in patients with Barrett's esophagus and esophageal adenocarcinoma (see Brabender et al, J. Gastrointest. Surg. 2002 6, 359-367); comparative gene identification (CGI) 94, which is down-regulated at the mRNA level in the hippocampus of early stage Alzheimer's disease brains (Heese et al, Eur. J. Neurosci. 2002 15, 79-86); Bak, a member of the Bcl-2 family of proteins, which is overexpressed at the mRNA level in coeliac disease patients (Chernavsky et al, Autoimmunity 2002, 35, 39-37); and Apolipoprotein L1 (apoL1), apoL2 and apoL4, which are all up-regulated at the mRNA level in schizophrenia (Mimmack et al, Proc. Natl. Acad. Sci. USA 2002 99, 4680-5).
In inhibitory embodiments of the invention (i.e. to inhibit expression), frameshifts and/or premature stop codons are to be encouraged. Without wishing to be bound by any particular theory, the inventors believe that such defective transcripts (and their products) compete with the full length transcripts of the target gene for other cell components e.g. export or secretory receptors, translation machinery etc.
The method of regulating the expression of a gene or other nucleic acid sequence, in accordance with the invention, has the advantage of conferring very sequence-specific regulation, in that the expression of unrelated genes should be unaffected.
In order to perform the method of the invention it will typically be necessary to engineer the sequence to be transcribed so as to include a regulatory sequence corresponding to the isolated molecule of the first aspect of the invention. Typically (but not necessarily) the sequence will be inserted towards the 5' end of the sequence to be transcribed, for example at or near the first intronic portion of the sequence to be transcribed or near the 5' UTR. The regulatory sequence may be inserted, for example, as an addition to an endogenous first intronic portion or may be used to replace part or all of the endogenous first intronic portion or part of the 5' UTR.
In order to affect the expression of a gene or other transcribable sequence, in accordance with the invention, it is necessary to bring about the presence in a cell or cells of multiple copies of an RNA sequence corresponding either to a full-length effective transcript (in the case of up-regulation) or truncated or otherwise defective transcripts (in the case of down-regulation). This may be achieved by the use of naked RNA, especially where the RNA sequence involved is very short and especially when the method is being performed in vitro. In other embodiments it may be more convenient to introduce into the cell a DNA vector which causes the synthesis of appropriate RNA transcripts. Methods of introducing DNA into a host cell are well known to those skilled in the art and include transduction, transfection, transformation, electroporation and "biolistic" methods.
In another aspect the invention also provides for the use of a recombinant nucleic acid molecule (whether as part of a vector or otherwise) comprising the isolated molecule of the first aspect of the invention, in the preparation of a medicament to regulate expression of a target gene in a eukaryotic (typically mammalian, preferably human) subject.
In yet another aspect the invention provides a pharmaceutical composition comprising a recombinant nucleic acid molecule (whether as part of a vector or otherwise) and a physiologically acceptable excipient, carrier or diluent, wherein the recombinant nucleic acid molecule comprises the isolated molecule of the first aspect of the invention.
Physiologically acceptable excipients, carriers or diluents include sterile saline solution, phosphate buffered saline, and the like. The pharmaceutical composition may typically be administered by injection, which may be intravenous, subcutaneous, intramuscular etc, but may alternatively be administered by any safe and convenient route (e.g. intranasal, oral, rectal etc). Effective amounts of the composition to be delivered can readily be ascertained, with the benefit of the present disclosure, by routine trial and error. Typically an effective dose of the pharmaceutical composition will comprise between 1μg and 1mg of nucleic acid per kg of body weight of the subject in question.
In still another embodiment the invention provides a host cell transformed with a nucleic acid molecule in accordance with the first aspect of the invention or a nucleic acid construct in accordance with the second aspect of the invention.
The host cell may be a prokaryotic cell (e.g. bacterial cell) used for the purpose of replicating the nucleic acid molecule/construct. Alternatively, the host cell may be any eukaryotic cell e.g. mammalian cell, yeast cell or plant cell. In particular the host cell may be a cell in which the expression of an endogenous gene is being regulated.
The invention will now be further described by way of illustrative example and with reference to the accompanying drawings, in which:
Figure 1 is a schematic representation of that part of the MHC class III region containing the cluster of LY-6 superfamily genes. The five members of the LY-6 superfamily are indicated by filled boxes and in bold text. Arrows below genes indicate direction of transcription. Numbers along the bottom line indicate the approximate distance in kilobases (kb) from the class II region of the MHC (HLA DRA gene).
Figure 2 is a schematic representation of the intron/exon structure, encompassing the ATG to the stop codon, of the five LY-6 superfamily members in the MHC Class III region. Numbers indicate size in base pairs (bp). Protein domains are illustrated as follows: diagonal line shading indicates signal peptide, C indicates a cysteine residue in the LY-6 domain and wavy line shading indicates the C-terminal hydrophobic region. Horizontal line shading indicates an unknown domain (LY6G5b only).
Figures 3(a)-(f) are composite figures showing the results of RT-PCR analysis of human cell lines or tissues. Part (i) of each figure is a picture of gel analysis of RT-PCR products from a variety of human cell lines (from left to right; K562, U937, HL60, Molt4, Jurkat, Raji, 143B and HeLa). Marker sizes (in Kilobases) are shown to the left of part (i) of each figure. Part (ii) of each figure is a picture of gel analysis of RT-PCR products from a variety of human tissues (from left to right: fetal liver, lung, kidney, spleen and brain; and adult liver, lung and kidney). Part (iii) of each figure is a schematic representation of the RT-PCR products detected. Figures (a)-(f) relate, respectively, to LY6G6C, LY6G6D, LY6G6E, LY6G5CA, LY6G5CB and LY6G5B.
Boxes indicate predicted exons, and the lines underneath indicate the splice forms found. Protein coding domains are illustrated as follows: diagonal line shading indicates signal peptide, C indicates LY-6 domain and wavy line shading indicates C-terminal hydrophobic region. Horizontal line shading indicates unknown domain (LY6G5B), and the dotted area indicates the alternatively spliced part of LY6G6D. An X indicates a premature stop codon. Arrows indicate the position on the gels of the predicted product.
Figures 4(a)-4(e) are similar to Figure 3. Part (i) of each figure is a picture of gel analysis of RT-PCR products of various mouse cell lines (from left to right: L929, RAW264, WEHI-3B, WEHI-231 and EL4); part (ii) of each figure is a picture of gel analysis of RT-PCR products of various mouse tissues (from left to right: lung, kidney, brain). Part (iii) of each figure is a picture of gel analysis of RT-PCR products from (left to right) mouse liver, mouse spleen and tissue from mouse embryos at 12.5 and 15.5 days post-conception, respectively. Part (iv) of each figure is a schematic representation of the RT-PCR products detected (legend as for Figure 3). Figures 4(a)-(e) relate, respectively, to Ly6G6c, Ly6G6d, Ly6G6e, Ly6G5c and Ly6G5b.
Figure 5 is a schematic illustration of the conserved pattern of disulphide bridges found in LY-6 superfamily protein domains.
Figures 6a-g and 7a-f are photographs of gel electrophoretic analysis of various samples, indicating that transcripts of LY6GB and LY6G5B are not subject to NMD.
Figures 8A-B are graphical representations of the output of the MEME computer program analysis of various of the retained introns. The order of the introns in Figure 8A is the same as that in Figure 8B. Figure 8A shows the consensus sequence of the retained introns ("multilevel consensus sequence") and the degree of conservation. Figure 8B shows the location and orientation of the motif in the intron.
Figure 9A illustrates, schematically, various plasmid constructs employed by the inventors in performing luciferase assays, and Figure 9B is a bar chart showing the results (average RRR values i.e. "relative response ratio"), expressed as percentage maximal response, of luciferase assays performed using various constructs.
EXAMPLES
Example 1
The typical Ly-6 protein domain is approximately 80 amino acids long, characterised by a conserved pattern of 8-10 cysteine residues that have a defined pattern of disulphide bonding (see Figure 5). The consensus sequence of the Ly-6 domain extends from the first cysteine to the sixth, and there is a generally conserved spacing pattern as follows: [EQR]-C-[LIVMFYAH]-x-C-x(5,8)-C-x(3,8)-[EDNQSTV]-C-{C}-x(5)-C-x(12,24)-C (Prosite: http://www.protomap.cs.huji.ac.il/Amino/Prosite/ByFamily/LY6_UPAR).
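The spacing pattern above is written in PROSITE syntax, in which "x" denotes any residue, square brackets list permitted residues, curly brackets list excluded residues, and parentheses give repeat counts. A minimal Python sketch of converting such a pattern to a regular expression is given below. The function name and the toy peptide are assumptions made for illustration; the peptide is synthetic and is not a real Ly-6 protein sequence.

```python
import re

LY6_PATTERN = ("[EQR]-C-[LIVMFYAH]-x-C-x(5,8)-C-x(3,8)-"
               "[EDNQSTV]-C-{C}-x(5)-C-x(12,24)-C")

_ELEMENT = re.compile(
    r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?"
)

def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE pattern into Python regex syntax."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = m.groups()
        if core == "x":
            core = "."                       # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"   # excluded residues
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)

LY6_REGEX = prosite_to_regex(LY6_PATTERN)

# Synthetic peptide built to satisfy the minimum spacings (illustration only).
toy_peptide = ("ECLAC" + "A" * 5 + "C" + "A" * 3 + "EC" + "A" +
               "A" * 5 + "C" + "A" * 12 + "C")
toy_match = re.search(LY6_REGEX, toy_peptide)
```

A predicted protein translation could be screened in the same way with `re.search(LY6_REGEX, protein_sequence)`, bearing in mind that a pattern match is only suggestive of an Ly-6 domain.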
There is also a conserved section at the C-terminal end of the Ly-6 protein domain, with the following amino acid pattern: C-C (or C-Y or Y-C) - — D- [hydrophobic residue]-C-N.
Exon-Intron structure
The MHC class III region contains a cluster of five genes coding for potential LY-6 superfamily members, lying ~100-150 kb centromeric of the TNF cluster (Fig. 1). The exon/intron structure of these five human LY-6 superfamily members was previously partially annotated (Ribas et al, 1999; and the MHC Consortium 1999, both cited above). The human LY6G6C, LY6G6D and LY6G6E genes were predicted to have three coding exons (Accession no. AF129756) (Fig. 2). Human LY6G6D was noted as potentially having an alternative splice acceptor site upstream of the exon III splice site (Fig. 2), that causes a frameshift and loses the last four conserved cysteine residues (see AF129756). Translation of the annotated exon III of human LY6G6E also produces a frameshift such that it also loses the last five conserved cysteine residues and no longer encodes an LY-6 protein. However, in LY6G6E, a complete Ly-6 protein could be encoded if a non-consensus splice site is used (37523GT instead of 37522AG in AF129756) at the 5' end of exon III. Recently, an Ensembl (http://www.ensembl.org) annotation was made, in which the human LY6G6E gene is predicted to contain four exons instead of three to avoid the frameshift or premature stop codons. In mouse, Ly6G6c and Ly6G6d were also predicted to contain three exons, with a similar intron-exon structure as found in humans (AF109905) (Fig. 2), with no alternative splice form of Ly6G6d. Mouse Ly6G6e was not annotated and we predicted the gene intron-exon structure based on comparisons of the potential human coding sequence (CDS) with the mouse genomic sequence. Interestingly, in mouse a consensus splice site could be used at the 5' end of Ly6G6e exon III (at the equivalent position to that of the non-consensus splice site in humans) and the Ly-6 translation frame maintained.
The LY6G5C gene in humans was predicted to contain four exons (AF129756), and does not contain a C terminal hydrophobic domain (Fig. 2). We found an ATG codon ~50 bp upstream of the 5' splice site of exon II, and this extended exon II would encode a signal peptide similar to exon I of the other LY-6 genes. Another potential methionine can also be found 9 bp further upstream of it, but it was not considered in this study. This LY6G5C gene, with three exons, we named LY6G5CB and the annotated LY6G5C, with four exons, we named LY6G5CA. We defined the mouse Ly6G5c gene structure by comparisons of the human CDS (LY6G5CA) with the mouse genomic sequence. Interestingly, in mouse the equivalent exon II of human LY6G5CA lacks the splice acceptor site at the 5' end of the predicted exon II, and we found an ATG codon ~50 bp upstream as a potential start codon (as for LY6G5CB) (Fig. 2). This indicates that mouse Ly6G5c only contains three exons and is, therefore, more similar to the predicted LY6G5CB form found in humans. In addition, there is high sequence similarity at both the protein (66% identity) and DNA (76% identity) level, between exon I of the human LY6G5CB variant and the analogous exon I of mouse Ly6G5c.
The annotated LY6G5B gene in humans (AJ245417) consisted of only two exons, comprising half the LY-6 domain and the C-terminal region, with a long 5' UTR. However, we have found that this annotation was based on a mis-spliced mRNA that retains an intron, and we propose a new intron/exon structure (Fig. 2). In mouse, Ly6G5b was not annotated, so by comparing the new predicted human CDS with mouse genomic sequence, we predicted the mouse Ly6G5b gene structure (Fig. 2). Exon III in both LY6G5B and Ly6G5b is longer than the equivalent exons in the other Ly-6 genes in this cluster, encoding an extra domain with no known function (Fig. 2).
In general, the five MHC class III region LY-6 superfamily members are predicted to have similar exon/intron structures (Fig. 2). The predicted intron sizes of this cluster of LY-6 genes vary considerably from gene to gene (Fig. 2). However, the sizes of predicted exons are more conserved, with exon I (signal peptide) being 52-58 bp in length except in LY6G5C (112-121 bp). The length of the predicted exon II (first half of Ly-6 domain) would also be conserved being 111-138 bp for each gene except LY6G5C, where it is longer at 165-168 bp. The predicted length of exon III (Ly-6 domain and C-terminal hydrophobic domain) would also be conserved for the LY6G6C, LY6G6D, and LY6G6E genes, being 200-224 bp. The predicted exon III of LY6G5C (exon IV in LY6G5CA) is shorter, being 164 bp and lacks the C-terminal hydrophobic domain, and in LY6G5B, the predicted exon III is much longer (398-419 bp) due to an extra domain (Fig. 2). It is interesting to note that while human LY6G6D has only eight conserved cysteine residues, mouse Ly6g6d has 10 cysteine residues, and that human LY6G5C has nine conserved cysteine residues, while mouse Ly6g5c has 10 cysteine residues.
Example 2
EST and cDNA analysis (human)
In order to obtain full-length cDNAs for mRNA characterization and to try to confirm our annotations, we searched EST databases and performed RT-PCR analysis. The five LY-6 MHC genes showed similarity to human and mouse IMAGE clones, identified by searching EST databases.
In particular, the predicted coding sequence of each MHC class III region LY-6 gene (LY6G6C, LY6G6D, LY6G6E, LY6G5C and LY6G5B) was compared with the human and mouse EST databases at the National Center for Biotechnology Information (NCBI) (http://www.ncbi.nlm.nih.gov/BLAST), and the matching EST clones identified. (In addition the LY6G6C, LY6G6E, and LY6G5B genes also showed similarity to ESTs from bull and pig.) The IMAGE (Lennon et al, 1996 Genomics 33, 151-152) clones were obtained from the MRC UK HGMP Resource Centre (http://www.hgmp.mrc.ac.uk). The clones were confirmed by full sequence analysis, using either gene-specific primers (Tables 1A and 1B) or the vector primers [M13 forward primer (5'-GTAAAACGACGGCCAGT-3') and either the T3 reverse primer (5'-ATTAACCCTCACTAAAG-3') or the PT7T3D reverse primer (5'-TAGGGAATTTGGCCCTCGAG-3')] using the BigDye Terminator sequencing kit (ABI). The sequences obtained were compared against the NCBI non-redundant (nr) database, and the published genomic sequences [Accession No.: AF129756 (human) and AF109905 and AF109719 (mouse)].
Sequencing of clones extended the EST database sequence by between ~0.1 kb and ~2 kb. After sequencing, we found that some of the clones did not correspond to the EST sequence in the database, and for these, only the published EST database sequence is shown (Tables 2 and 3). The sequences of the human IMAGE clones supported prediction of the intron/exon structure, or gave an indication of alternative splice forms.
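Basic properties of the vector primers quoted above can be checked with a short Python sketch. The melting temperature estimate uses the simple Wallace rule (2 degrees per A or T plus 4 degrees per G or C), which is a rough approximation for short oligonucleotides and is an assumption of this example, not a calculation reported in the specification. The function and dictionary names are hypothetical.

```python
# Vector primer sequences as quoted in the text (5' to 3').
PRIMERS = {
    "M13 forward": "GTAAAACGACGGCCAGT",
    "T3 reverse": "ATTAACCCTCACTAAAG",
    "PT7T3D reverse": "TAGGGAATTTGGCCCTCGAG",
}

def gc_fraction(primer: str) -> float:
    """Fraction of G and C bases in the primer."""
    p = primer.upper()
    return (p.count("G") + p.count("C")) / len(p)

def wallace_tm(primer: str) -> int:
    """Approximate melting temperature in degrees C by the Wallace rule:
    Tm = 2 * (A + T) + 4 * (G + C). Rough estimate for short oligos only."""
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc

# Summarise length, GC fraction and estimated Tm for each primer.
summary = {
    name: (len(seq), gc_fraction(seq), wallace_tm(seq))
    for name, seq in PRIMERS.items()
}
```

More accurate estimates would require nearest-neighbour thermodynamic parameters and the salt and primer concentrations of the actual reaction.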
RNA extraction
The RNA isolation (polyA+ RNA) from the human cell lines K562 (erythroleukemia), U937 (monocyte), HL60 (monocyte), Molt4 (T cell), Jurkat (T cell), Raji (B cell), 143B (fibroblast) and HeLa (epithelial) was performed as described previously (Aguado et al, 1999 Biochem. J. 341, 679-689). Briefly, total RNA was extracted from cells (RNAzol™B method), and polyA+ RNA (Pharmacia Quick Prep kit) was extracted from the total RNA. Total RNA from human tissues was obtained from Stratagene.
Mouse polyA+ RNA was obtained using a Quickprep Micro mRNA Purification kit (Amersham Pharmacia Biotech), using either ~5x10^7 cells from the cell lines L929 (fibroblast), RAW-264 (macrophage), WEHI-3B (monocyte), WEHI-231 (B cell), and EL4 (T cell lymphoblast) or ~0.1g tissue from an adult mouse (129 strain), following the manufacturer's instructions. Total fetal RNA [12.5 and 15.5 days post conception (dpc) embryos] from the F1 generation of a CBA/Ca x C57 BL/10 cross, was a kind gift from R. Elaswarapu and E. Anderson (MRC UK HGMP Resource Centre).
cDNA Analysis
Oligo-dT primed cDNA synthesis was carried out using the Reverse Transcription System (Promega) using approximately 1μg RNA (total or polyA+ RNA) in a 20μl reaction volume following the manufacturer's instructions. Nested PCR was then performed using gene specific primers (Table 1) to obtain the full-length cDNA from the predicted start codon (ATG) to the stop codon. Primer sequences were based on both the genomic DNA sequence (Lee Rowen, see Accession numbers: AF129756, AF109905 and AF109719) and the published gene predictions, except for the LY6G5C and LY6G5B genes.
For the first round PCR, primers aligning in the 5' and 3' UTRs were used (5UTR1 and 3UTR1). For the second round of PCR, primers aligning with the potential start codon (RT2) and the predicted stop codon (RT1) were used. For human LY6G5C, two different PCRs were performed. In one case the forward primer G5CRT3 (RT3), aligning with the annotated exon I of LY6G5C (LY6G5CA form), and the primer G5CA5UTR1, aligning in the potential 5' UTR, were used. In the other case, the forward primer G5CRT2 (RT2), aligning with the ATG codon of a new predicted exon I of LY6G5C (LY6G5CB form), and the primer G5CB5UTR1, aligning in the potential 5' UTR, were used. The primers for LY6G5B align with a new predicted LY6G5B ATG codon (RT2) and the stop codon (RT1), while the UTR primers align just 5' of the new predicted LY6G5B start codon (G5B5UTR1) and 3' of the stop codon (G5B3UTR1). In each first round PCR reaction 1μl of cDNA was used. For the human cell lines, the first round PCR product was cleaned (Qiagen PCR columns) and 3μl of a 1 in 10 (v/v) dilution was then used for the second round PCR reaction. For human tissue 1.5μl of a 1 in 3 (v/v) dilution of the first round PCR reaction was used in the second round reaction. In both rounds of PCR the conditions were as follows: 95° for 2 min followed by 35 cycles of 95° for 45s, 60° for 30s, 72° for 1 min, followed by 72° for 5 min. For all mouse samples, 0.5μl of the first round PCR reaction was used in the second round reaction. For mouse cell line RNA, the PCR cycling was as for the human RNA. However, for mouse tissues, both rounds of PCR were as follows:
95°C for 2 min followed by 25 cycles of 95°C for 45s, 60°C for 30s, 72°C for 3 min, then 72°C for 10 min. All the PCR reactions contained 2mM MgCl2, 0.2mM dNTPs, 0.4μM each primer and 0.75U Taq polymerase (Promega) in a total volume of 25μl. As a negative control, the PCR was performed without the addition of a cDNA template. The products obtained were then isolated either by gel extraction (Qiagen) or using Qiaquick PCR columns (Qiagen), and either sequenced directly with gene-specific primers, or else cloned into the pGEM-T vector (Promega) for sequencing. pGEM clones were sequenced with vector primers [M13F (described above) and SP6 (5'-TAGGTGACACTATAGAATAC-3')]. All the sequence analysis was performed as described above. The sequences obtained were compared to the published sequences using the BLAST algorithm at NCBI.
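The reaction composition stated above can be turned into pipetting volumes with the usual C1*V1 = C2*V2 dilution relation. The following is a minimal sketch only: the final concentrations and the 25 μl volume come from the text, whereas every stock concentration (25 mM MgCl2, 10 mM dNTPs, 10 μM primers, 5 U/μl Taq) is an assumption made for illustration and is not stated in this document.

```python
# Final concentrations are taken from the text; the stock concentrations
# below are ASSUMED for illustration and must be replaced by the real stocks.
REACTION_VOLUME_UL = 25.0

# name: (assumed stock concentration, final concentration), same unit per pair
COMPONENTS = {
    "MgCl2 (mM)": (25.0, 2.0),
    "dNTPs (mM)": (10.0, 0.2),
    "forward primer (uM)": (10.0, 0.4),
    "reverse primer (uM)": (10.0, 0.4),
    # 0.75 U per reaction expressed as U/ul in the final volume
    "Taq polymerase (U/ul)": (5.0, 0.75 / REACTION_VOLUME_UL),
}


def volume_needed(stock, final, total_volume_ul=REACTION_VOLUME_UL):
    """Volume of stock (ul) that gives `final` in the total volume (C1*V1 = C2*V2)."""
    return final * total_volume_ul / stock


def master_mix(components=COMPONENTS, total_volume_ul=REACTION_VOLUME_UL):
    """Return the volume (ul) of each stock for one reaction, rounded to 3 decimals."""
    return {
        name: round(volume_needed(stock, final, total_volume_ul), 3)
        for name, (stock, final) in components.items()
    }


if __name__ == "__main__":
    for name, volume in master_mix().items():
        print(f"{name}: {volume} ul")
```

The remaining volume would be made up of buffer, template and water; those quantities are not given here and are therefore left out of the sketch.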
By RT-PCR analysis, human LY6G6C showed the expected ~400 bp band in all cell lines and an additional higher band of ~500 bp (Fig. 3 and Table 4A). In human tissues, both bands were present with similar intensity, except in fetal liver and adult lung, where the ~500 bp band was much fainter, and in adult liver, where it was absent (Fig. 3 and Table 4A). Direct sequencing of these cDNAs indicated that the lower band (378 bp) was the expected product and that the upper band (489 bp) contained part of the intron between exons I and II, which introduces premature stop codons. This is a different splice variant to that seen in human IMAGE clone 1708589 (Table 2), where the differential splicing at the beginning of exon II removes the first two conserved cysteine residues while maintaining the Ly-6 protein frame (Table 2). One consistent synonymous single nucleotide polymorphism (SNP) appeared in all RT-PCR samples sequenced that changed a T to a C at position 30838 in AF129756, but the amino acid (R 81) is unchanged.
Human LY6G6D, in contrast, showed multiple bands by RT-PCR, with different expression patterns in different cell lines and tissues (Fig. 3). These bands were cloned into pGEM and the different products sequenced. We found the expected 402 bp cDNA product in the cell lines U937, HL60, Raji, 143B and HeLa; in fetal lung, kidney and brain as well as adult lung; and faintly in fetal liver and spleen (Fig. 3 and Table 4A). All cell lines (except HeLa), fetal kidney and brain, and adult lung showed an additional band at ~500 bp (Fig. 3 and Table 4A). This band corresponds to a 493 bp mis-spliced form of LY6G6D, which introduces a stop codon before the Ly-6 domain. U937 and all fetal tissues, except brain, also had an additional 523 bp product (not possible to observe on the gel) similar to IMAGE clone 927511 (Table 2). This changes the frame of the protein, losing the four C-terminal conserved cysteine residues in the Ly-6 domain. This splice form does contain a long in-frame translation with no significant homology to sequences in the non-redundant (nr) database. Further cloning and sequence analysis indicated that the band at ~450 bp present in U937, Molt4, and Jurkat, and in fetal liver and spleen, is likely to be a heteroduplex formed between the correctly spliced product (402 bp) and the mis-spliced product of 493 bp. Adult liver and kidney showed no product (Fig. 3 and Table 4A).
When we performed RT-PCR of LY6G6E, we observed only genomic sized products in all cell lines using primers to amplify the whole gene (G6ERT2 and G6ERT1) (data not shown), or using primers amplifying just exons II and III (G6ERT3 and G6ERT1) (Fig. 3 and Table 4A). However, in fetal kidney and fetal brain tissue we found three bands at ~400 bp, ~500 bp and ~1.5 kb. The cDNA expected was 378 bp in size and the sequence of the ~400 bp band showed that this band contained an extra 21 bp (399 bp in total) at the beginning of exon III, such that when translated, a stop codon was introduced. The upper band (501 bp) contained the entire intron between exons II and III, introducing premature stop codons. The 1.5 kb product, containing both introns, represents genomic DNA contamination of the RNA (or totally unspliced mRNA) (Fig. 3 and Table 4A). No EST clones were available for LY6G6E. These findings, together with the lack of an "in-frame" consensus splice site at the 5' end of exon III, could suggest that human LY6G6E is a pseudogene.
For LY6G5C, two sets of RT-PCR were performed, one for LY6G5CA and the other for LY6G5CB. For the LY6G5CA form, the RT-PCR produced an intense ~550 bp band in the Molt4 and Jurkat cell lines and a much fainter band in Raji and HeLa, and a ~650 bp product in fetal kidney (Fig. 3 and Table 4A). The expected size is 447 bp and upon sequencing, we found the ~550 bp band to be a splice variant of LY6G5CA (557 bp), matching part of the sequence in human IMAGE clones 1504964, 3476011 and 3368609 (Table 2, Fig. 3 and Table 4A). This splice form changes the reading frame of the protein and introduces stop codons. The 677 bp product (Fig. 3) was different to that seen in the cell lines or IMAGE clones and encoded an ORF that contains the Ly-6 domain in-frame with no obvious signal peptide, and a methionine residue without an upstream stop codon, suggesting that this may be part of a longer gene product. Upstream of the 5' end of the RT-PCR product, if the genomic sequence is translated, a methionine residue is present and a stop codon is present in-frame upstream of the methionine. Therefore, we performed 5' RACE to identify the 5' end of the transcript.
To amplify the 5' end of the mRNA species, we performed 5' RACE (rapid amplification of cDNA ends) on 1 μg total RNA from fetal kidney using the SMART RACE cDNA amplification kit (Clontech), following the manufacturer's instructions and using the reverse transcriptase enzyme 'Superscript II' (Gibco). In the first round reaction we used 1.25 μl cDNA and in the second round 0.5 μl of first round product. We used the gene-specific primers G5C3UTR1 (1st round) and G5CRT1 (2nd round), and the Universal primer mix (UPM) (1st round) and Nested universal primer (NUP) (2nd round), in a 25 μl reaction volume. The PCR buffer and cycling conditions were the same as described above for RT-PCR on human RNA. The products obtained were gel isolated (Qiagen) and either sequenced directly, or else cloned into the pGEM-T vector (Promega) for sequencing. We sequenced pGEM clones with the primers M13F and SP6 and performed all the sequence analysis and comparisons as described above. Only one splice product of LY6G5C (same as IMAGE clone 1504964) could be detected, but no extension at the 5' end of the 677 bp transcript could be found (data not shown).
For the LY6G5CB form, the RT-PCR produced a range of products with very different expression in different cell lines and very restricted tissue expression (Fig. 3 and Table 4A). By direct sequencing, only Molt4 and HeLa had a product of the expected sequence (444 bp), as well as fetal and adult lung, in which tissues it was the only product. The 538 bp product present in U937, HL60, Molt4, Jurkat and 143B changes the translation frame and introduces a stop codon before the Ly-6 domain. The 1428 bp band present in fetal spleen and all cell lines (except 143B and HeLa) was a mis-spliced form of LY6G5CB, introducing premature stop codons (Fig. 3 and Table 4A). The 900 bp band produced in fetal brain was a non-specific amplification product.
For LY6G5C, a total of eight splice forms were found by EST analysis, many of which showed similarity to the 557 bp form of LY6G5CA (Table 2), and these all introduce premature stop codons or have ORFs with no significant similarity to sequences in the nr database.
For LY6G5B, RT-PCR analysis showed two bands, one at ~600 bp (expected size) and another at ~750 bp, in all cell lines (Fig. 3). In tissues, human LY6G5B had a more restricted pattern of expression, with the expected band of ~600 bp present only in fetal brain and adult lung, and faintly in fetal liver and fetal kidney (Fig. 3). All other tissues produced the band at ~750 bp, and in some, we saw a band corresponding to genomic size (~1 kb) (Fig. 3 and Table 4A). When sequenced, the lower band (606 bp) was the expected product. The upper band (753 bp) introduced premature stop codons, and corresponds to the cDNA previously described [22] and found in IMAGE clone 2139582 (Table 2). There is one consistent non-synonymous, non-conservative SNP present in all RT-PCR samples sequenced, which changes an A to a G at position 78092 in AF129756. This causes an amino acid change at position 102, from an asparagine residue (N 102) to an aspartic acid residue (D 102).
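The product sizes reported above for the human genes can be checked against the statements about reading frame with simple arithmetic: extra sequence whose length is not a multiple of three necessarily shifts the frame, whereas extra sequence that is a multiple of three keeps the frame but may still contain stop codons. The sketch below uses only sizes quoted in the text. Treating each variant as the expected product plus a simple insertion is an assumption, and the list is a subset chosen for illustration.

```python
# (gene or form, expected product in bp, variant product in bp), as quoted in the text
HUMAN_VARIANTS = [
    ("LY6G6C", 378, 489),
    ("LY6G6D", 402, 493),
    ("LY6G6D", 402, 523),
    ("LY6G6E", 378, 399),
    ("LY6G6E", 378, 501),
    ("LY6G5CA", 447, 557),
    ("LY6G5CB", 444, 538),
    ("LY6G5B", 606, 753),
]


def extra_sequence(expected_bp, variant_bp):
    """Return (extra length in bp, True if that length keeps the reading frame)."""
    extra = variant_bp - expected_bp
    return extra, extra % 3 == 0


if __name__ == "__main__":
    for gene, expected, variant in HUMAN_VARIANTS:
        extra, in_frame = extra_sequence(expected, variant)
        label = "frame kept" if in_frame else "frameshift"
        print(f"{gene}: {variant} bp = {expected} bp + {extra} bp ({label})")
```

On this simple model the 493 bp LY6G6D, 523 bp LY6G6D, 557 bp LY6G5CA and 538 bp LY6G5CB forms come out as frameshifts, in line with the text. The 91 bp and 147 bp differences for LY6G6D and LY6G5B equal the first-intron lengths quoted for the luciferase constructs in Example 7.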
Example 3 - EST and cDNA analysis (mouse)
In mouse, all IMAGE clones confirmed our annotation, though full-length clones were only available for Ly6G6c (Table 3). The RT-PCR results for Ly6G6c showed two products of ~400 bp and ~700 bp (Fig. 4 and Table 4B). Upon sequencing, the strong, widely expressed lower band (381 bp) was the expected product and the very faint upper band (723 bp), which retains the 3' most intron and has a more restricted expression pattern, introduced premature stop codons in the protein translation (Fig. 4). The top band found in tissues represents genomic DNA contamination or totally unspliced RNA.
Mouse Ly6G6d, in contrast, showed different expression by RT-PCR in different cell lines and tissues (Fig. 4). A product of ~400 bp was expected, but all cell lines, as well as lung and kidney, showed only a ~500 bp product. This was a 505 bp mis-spliced product that contained the intron between exons I and II, changed the reading frame of the protein and introduced stop codons (Fig. 4 and Table 4B). Adult brain and both fetal samples showed bands at ~400 bp and ~500 bp, with the band of 408 bp being the product of the expected sequence (Table 4B).
RT-PCR of Ly6G6e showed three bands. The expected ~400 bp band and a faint additional higher band of ~500 bp were seen in all cell lines (Fig. 4 and Table 4B). We also found both these bands in all tissues except brain and spleen. Upon direct sequencing, the lower band contained an additional 18 bp 5' of exon III compared to the predicted sequence (405 bp in total), utilizing a splice site upstream of the predicted one (similar to what was found in humans), thus introducing six extra amino acid residues and maintaining the translation frame. In humans, this introduces a stop codon, but due to base differences, no stop codon is introduced in mouse. The upper faint band of 501 bp is a splice form that contains the intron between exons II and III but maintains the reading frame and introduces an additional 32 amino acid residues with no significant similarity to sequences in the nr database. The third band of ~1 kb represents genomic DNA contamination of the RNA (or totally unspliced mRNA) (Fig. 4).
For Ly6G5c, RT-PCR produced only one band of ~450 bp (expected size) in all the cell lines, adult brain and both the fetal samples (Fig. 4 and Table 4B). When sequenced, this band was the product of the expected sequence (450 bp). The ~3.6 kb band present in some samples represents the genomic product or unspliced mRNA.
RT-PCR analysis of Ly6G5b, in contrast, showed different expression in different cell lines, with only WEHI-3B showing a product of the expected size (~600 bp). A band of ~700 bp was present in all the cell lines (Fig. 4 and Table 4B). When sequenced, it was found that the lower band (585 bp) was the product of the expected sequence and the upper band (681 bp) proved to contain the entire intron between exons I and II, introducing premature stop codons. In tissues, after sequencing all bands, we could see that the expected Ly6G5b product (585 bp) was present only in lung and faintly in kidney (Table 4B). These two tissues also showed a ~700 bp band, which was found to be the 681 bp mis-spliced form found in the cell lines and was also detected in liver, brain and both embryo samples. The 400 bp band seen in 12.5dpc embryo is a non-specific product. The ~1 kb band present in all tissue samples, and in L929 and RAW264, represents the genomic product or unspliced mRNA.
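Whether in-frame extra sequence (such as the exon III extension of Ly6G6e or a retained intron) introduces a premature stop codon depends on the actual bases, which is why the same splice event can truncate the human protein but not the mouse one. The following sketch scans a reading frame for the first stop codon. The two short sequences are hypothetical toy examples that differ at one base; they are not the real LY6G6E or Ly6G6e sequences.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_in_frame_stop(seq, frame=0):
    """Return the 0-based index of the first in-frame stop codon, or None."""
    seq = seq.upper()
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i
    return None


# HYPOTHETICAL toy sequences (21 nt each), differing at a single base
human_like = "ATGGCTCCATGACTGGTGTGC"  # fourth codon is TGA (stop)
mouse_like = "ATGGCTCCATGGCTGGTGTGC"  # fourth codon is TGG (Trp)

if __name__ == "__main__":
    print("human-like:", first_in_frame_stop(human_like))
    print("mouse-like:", first_in_frame_stop(mouse_like))
```

A single base difference is enough to remove the stop codon while leaving the frame intact, which is the situation the text describes for the mouse gene.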
The MHC region is extremely highly conserved between human and mouse and this has helped us to elucidate the exon/intron structure of the mouse MHC class III region Ly-6 family members where these were previously not predicted by computer programs. From comparisons between genes and between species, it can be seen that the Ly6G6c, Ly6G6d and Ly6G6e genes show a conserved pattern of exon structure and protein domains, while the Ly6G5c and Ly6G5b genes have different patterns of exon structure and different N and C terminal protein domains. Mouse and human exon-intron structures are highly conserved for each member, and interestingly intron lengths are also relatively similar (though generally shorter in mouse), except for Ly6G5c, where the intron sizes vary considerably between the two species. It is interesting to note that other Ly-6 superfamily members, such as the murine cluster on mouse chromosome 15, also have three coding exons, with similar exon structures and protein domains to those found in this study. For the human LY-6 MHC class III region genes, it was not possible to obtain a correctly spliced full-length IMAGE cDNA clone. In mouse, however, two proper full-length clones were available for Ly6G6c, covering both 5' and 3' UTRs. We observed a big difference in the availability of clones per gene between species. The most striking difference is for Ly6G5c, which has fourteen IMAGE clones in humans and none in mouse, possibly indicative of differential expression in mouse and humans, or of the smaller number of ESTs currently available from mouse. In addition, we have found that the information from dbEST is not sufficient as, for Ly6G6c and Ly6G5c, the splice variants were only found when the IMAGE clones were totally sequenced. There is a wide tissue distribution of ESTs in both human and mouse, with the genes seemingly expressed in both immune-related and non-immune-related tissues.
Comparison of the seven alternatively spliced forms found in the EST clones with the fourteen alternative splice forms found by direct cDNA analysis showed that only three were found by both methods.
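As a worked check of the overlap stated above, and assuming that each splice form is counted once per method, writing E for the set of forms seen in EST clones and C for the set seen by direct cDNA analysis gives the number of distinct forms observed by either method:

```latex
|E \cup C| = |E| + |C| - |E \cap C| = 7 + 14 - 3 = 18
```

On that assumption, four forms were seen only in EST clones and eleven only by direct cDNA analysis.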
Direct cDNA analysis gives a more detailed look at gene expression at the mRNA level and we have observed that all the LY-6 genes in the human MHC class III region, and all but one in the mouse, are differentially spliced. It is interesting to note the different expression levels of the different splice forms, and while RT-PCR is only a semi-quantitative method, in some cases we can observe a clear difference in the band intensity, indicating a clear over-expression of that product, which is sometimes a mis-spliced form, as in the case of human LY6G5B and mouse Ly6G6d.
It is particularly interesting to note that, in this study, the mis-spliced forms in both human and mouse normally retain the same intron (usually the first one), or part of the intron (also the first one). These intronic sequences produce changes in the reading frame of the protein, and/or introduce stop codons, except for the 677 bp splice form of LY6G5CA found in human fetal kidney, and the 501 bp splice form of mouse Ly6G6e. It should be noted that the only cases that retain the most 3' intron are the very poorly expressed 723 bp form of mouse Ly6G6c (in some poly A+ RNA samples and both total RNA samples), and the 501 bp form of the potential human pseudogene LY6G6E. We have used both poly A+ RNA (all cell lines and adult mouse tissues) and total RNA (human and embryonic mouse tissues), and we have found that all the mis-spliced forms present in total RNA are also present in the poly A+ RNA. The exceptions to this are the human LY6G6E gene transcripts, but this gene seems to be a pseudogene. We have not studied cytoplasmic RNA, but recent evidence has suggested that translation can occur in the nucleus, with between 10-15% of proteins translated there (Iborra et al, 2001 Science 293, 1139-1142), indicating the importance of analyzing total cellular RNA.
Nonsense-mediated decay (NMD) is a quality-control mechanism that checks newly made messages and rapidly degrades those containing premature termination codons. Most NMD is cytoplasmic, but a fraction seems to be nuclear (Maquat & Carmichael 2001 Cell 104, 173-176). Pre-mRNA processing is thought to occur co-transcriptionally, with the mRNA cap structure promoting removal of the first intron (Proudfoot et al, 2002 Cell 108, 501-512), and splicing and polyadenylation factors found coincident with RNA polymerase II staining. If the mis-spliced products we have identified were splicing intermediates, they should be quickly degraded. However, we have found that the mis-spliced forms (especially those containing the first intron) are stable relative to the correctly spliced forms, based on the presence and abundance of the RT-PCR products. This, together with the retention of the first intron (or part of it), suggests that our mis-spliced forms are real transcripts and further that they might have a potential regulatory function. These transcripts could indicate the presence of a regulatory mechanism whereby the mis-spliced forms compete in some way with the correctly spliced forms for interaction with the translation and export machinery, adding an additional layer of control to the expression of these genes.
It has been previously suggested that mouse Ly-6C can have different functions depending on the cell type in which it is expressed (Yamanouchi et al, 1998 Eur. J. Immunol. 28, 696-707), and there is also evidence to suggest that transcription initiation and splicing of Ly-6 superfamily members can be altered when cells are stimulated with cytokines (Khodadoust et al, 1999 J. Immunol. 163, 811-819). It is possible that mis-splicing of the MHC class III region LY-6 genes, and retention of introns or intron fragments, may act as a regulatory mechanism for gene expression at the transcriptional level, and that different cytokines could act at another level of control on splicing, initiation of transcription, or the expression in different cell types of these LY-6 genes. It would be interesting to perform similar analyses with the other LY-6 superfamily members in the genome, to elucidate whether all follow the same exon/intron structure pattern, and more interestingly whether they present differential splicing to the same degree as this MHC class III region LY-6 cluster.
Example 4
In view of the surprising findings described in the preceding examples, the inventors wished to establish the subcellular location of the Ly-6 transcripts.
Subcellular localisation of the transcripts (nucleus or cytoplasm) was analysed by cell fractionation followed by RT-PCR. Cell fractionation was performed using a modification of the RNeasy mini kit (Qiagen) protocol. Cell membranes were lysed by incubation in a lysis buffer (10 mM Tris-HCl, 100 mM NaCl, 1.5 mM MgCl2, 0.5% (v/v) NP-40, 1 mM DTT, 1000 U/ml RNasin) for 5 minutes on ice. Samples were then centrifuged for 2 minutes at 600g to pellet nuclei. The supernatant, containing the cytoplasmic fraction, was removed to a separate tube and treated according to the RNeasy protocol for cytoplasmic RNA extraction. Nuclear RNA was extracted from the pelleted nuclei using the RNeasy kit protocol for total RNA extraction.
These experiments (data omitted for brevity) confirmed that both the correctly spliced transcripts and the mis-spliced variants were present in the cytoplasmic fraction. Accordingly, subsequent experiments (described below) were performed to examine the stability of the transcripts in the cytoplasm.
Example 5 - NMD susceptibility analysis
A number of experiments were performed to investigate the susceptibility of various transcripts to Nonsense-Mediated Decay. Actinomycin D binds tightly to double stranded DNA, and thus blocks transcription, probably by interfering with the progression of the RNA polymerase along the DNA template. Correctly spliced mRNAs should, according to accepted theory, be more stable after termination of transcription than those containing premature termination codons (PTCs), as mRNAs with PTCs (the mis-spliced forms) should be degraded by NMD. Therefore, the mis-spliced form should decay faster. In fact, contrary to this conventional expectation, and in agreement with the inventors' previous findings, the mis-spliced variants were generally as stable as, or more stable than, the correctly spliced transcripts.
The experiments were performed as follows:
Cells were treated with 5μg/ml actinomycin D to block de novo transcription, in order to measure mRNA decay rates and to determine whether there were differences between the splice forms of one gene, and between the different genes. K562 (undifferentiated erythroleukaemia) and Raji (B cell) cells were used and the mRNA levels were measured using one round of RT-PCR at various time points (Time 0, 1 hour, 2 hours and 4 hours) after addition of actinomycin D.
For these experiments, the LY6G6D and LY6G5B genes were selected because:
a) they are differentially expressed in K562 and Raji cells (as observed by RT-PCR), which allowed investigation as to whether there were cell line specific differences in stability (see Figure 3); and
b) it was easy to distinguish the different splice forms for each gene, due to clear size differences between the different splice forms (see Figure 3).
Various other genes were used as controls:
a) β-actin, β-globin and GAPDH transcripts were used as a measure of abundance and RNA quality.
b) c-myc transcripts were used as a control for unstable mRNA transcripts (Rodgers et al, 2002 RNA 8, 1526-1537).
c) The intron of c-myc was used as a control for genomic contamination of the cytoplasmic RNA fraction.
At each time point 5 x 10^6 cells were used, and in all cases cells were fractionated in a low salt buffer (10 mM Tris-HCl, 100 mM NaCl, 1.5 mM MgCl2, 0.5% (v/v) NP-40, 1 mM DTT, 1000 U/ml RNasin) and the RNeasy kit (Qiagen) was used for RNA isolation. An equivalent amount of the RNA obtained from each fraction was used for oligo-dT primed cDNA synthesis, which was performed using the Reverse Transcription System (Promega) in a 20 μl reaction volume following the manufacturer's instructions, with 1 μl of cDNA being used in each PCR reaction. All RT-PCR reactions contained 2 mM MgCl2, 0.8 mM dNTPs, 0.4 μM each primer and 0.75 U Taq polymerase in a 25 μl reaction volume. The PCR conditions were as follows: 95°C for 2 min followed by 35 cycles of 95°C for 45s, 60°C for 30s, 72°C for 30s, followed by 72°C for 5 min. The primers used amplified across the retained intron. While RT-PCR is not fully quantitative, it was possible to measure levels of each splice form relative to other conditions, therefore giving a qualitative result, especially as only one round of PCR was performed.
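For planning purposes, the programmed hold time of each thermal cycling protocol given in this document can be compared with a short calculation. This is a sketch only: it sums the stated hold times and ignores ramping between temperatures, which depends on the instrument and is not specified here. The dictionary keys are descriptive labels chosen for this example.

```python
def program_seconds(initial_s, cycles, steps_s, final_s):
    """Total programmed hold time in seconds (ramp times are NOT included)."""
    return initial_s + cycles * sum(steps_s) + final_s


# (initial denaturation, number of cycles, (denature, anneal, extend), final extension)
PROGRAMS = {
    "nested RT-PCR, human samples and mouse cell lines": (120, 35, (45, 30, 60), 300),
    "nested RT-PCR, mouse tissues": (120, 25, (45, 30, 180), 600),
    "single-round RT-PCR, inhibitor time courses": (120, 35, (45, 30, 30), 300),
}

if __name__ == "__main__":
    for name, (initial, cycles, steps, final) in PROGRAMS.items():
        total = program_seconds(initial, cycles, steps, final)
        print(f"{name}: {total} s ({total / 60:.2f} min)")
```

The mouse tissue programme is the longest despite having fewer cycles, because of its 3 min extension step.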
Typical results are shown in Figure 6.
Figure 6 comprises a series of photographs showing gel electrophoretic separation and analysis of the RT-PCR products from samples using primers specific for (a) Actin or β-globin genes; (b) GAPDH; (c) C-myc; (d) C-myc intron; (e) and (f) LY6G6D and LY6G5B in either K562 or Raji cells respectively. In each instance the order of samples between the marker ladders (from left to right) is nuclear fraction time 0, 1, 2, 4 hrs; cytoplasmic fraction 0, 1, 2, 4 hrs; negative control. RT-PCR products of mis-spliced transcripts are indicated by arrows.
Treatment with actinomycin D indicated that the mis-spliced forms were stable in the cytoplasm, though there did appear to be cell-specific differences in stability for the LY6G6D gene, which seemed unstable in the K562 cell line but stable in the Raji cell line. Interestingly, the mis-spliced form of the LY6G5B gene was also more abundant in the cytoplasmic fraction of Raji cells, relative to the correctly spliced form, compared to K562 cells. There also appeared to be no differences in stability between the mis-spliced and correctly spliced forms of LY6G5B as there was no obvious decay of either form after transcription was stopped. The stability of the actin, β-globin and GAPDH transcripts indicated that the RNA was not degraded by the treatment or the RNA isolation procedure, while the decay of the c-myc transcript indicated that the actinomycin D treatment was effective. The lack of c-myc intronic product in the cytoplasmic fraction showed there was minimal genomic contamination of the cytoplasmic fraction.
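If band intensities from such a time course were quantified by densitometry, an apparent half-life could be estimated by fitting first-order decay to the 0, 1, 2 and 4 hour time points. The sketch below is illustrative only: the intensity values are hypothetical, no densitometry is reported in this document, and end-point RT-PCR is at best semi-quantitative, so any such figure would be a rough comparison between transcripts rather than a true decay constant.

```python
import math


def fit_half_life(times_h, intensities):
    """Least-squares fit of ln(intensity) = ln(I0) - k*t; returns half-life in hours.

    Returns math.inf when the fitted slope shows no decay.
    """
    logs = [math.log(value) for value in intensities]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, logs))
    k = -sxy / sxx  # first-order decay constant per hour
    if k <= 0:
        return math.inf
    return math.log(2) / k


TIME_POINTS_H = [0, 1, 2, 4]  # time points used in the text

# HYPOTHETICAL band intensities in arbitrary units
unstable_like = [100.0, 50.0, 25.0, 6.25]  # halves every hour
stable_like = [100.0, 100.0, 100.0, 100.0]  # no visible decay

if __name__ == "__main__":
    print("unstable-like half-life (h):", fit_half_life(TIME_POINTS_H, unstable_like))
    print("stable-like half-life (h):", fit_half_life(TIME_POINTS_H, stable_like))
```

A transcript behaving like the c-myc control would give a short half-life, whereas a transcript showing no obvious decay over four hours would give a very large or infinite value.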
The inventors also decided to investigate the effect of translation on stability of the transcripts, as translation of mRNA has been shown to be required for NMD to become effective (see Hilleren & Parker 1999 Annu. Rev. Genet. 33, 229-260 and Buhler et al, 2002 EMBO Reports 3, 646-651), probably for recognition of premature termination codons.
To this end, K562 cells were treated with 300μg/ml of either cycloheximide ("CHX") or puromycin ("PUR"), both translational inhibitors.
Cycloheximide inhibits the peptidyl transferase on the large subunit of the eukaryotic ribosome, while puromycin is a tRNA analogue that causes premature chain termination. If the stability of a mis-spliced transcript is greatly increased relative to that of the correctly spliced form (observed as an increase in PCR product), following inhibition of translation, it indicates that the transcript is subject to NMD.
K562 cells were treated with the relevant inhibitor, and mRNA levels were measured in cytoplasmic or nuclear fractions (prepared as before) by RT-PCR after 0, 1, 2 or 4 hrs of inhibitor treatment. The results are shown in Figure 7. Figure 7 is similar to Figure 6 and comprises a series of photographs showing gel electrophoretic separation and analysis of RT-PCR products from samples using primers specific for: 7(a) actin or β-globin; 7(b) GAPDH; 7(c) C-myc; 7(d) C-myc intron; 7(e) and (f) LY6G6D or LY6G5B in K562 cells treated with CHX (Fig. 7e) or PUR (Fig. 7f). In each instance the order of the samples between the ladder markers is as described for Figure 6. RT-PCR products of mis-spliced transcripts are arrowed. In Figures 7(a)-(d), the cells were treated with either CHX or PUR: similar results were obtained in either case.
Neither CHX nor PUR treatment resulted in any increase of stability of the mis-spliced form of LY6G5B, relative to the correctly-spliced form, indicating that the mis-spliced transcript is not subject to NMD.
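The criterion applied above (an NMD-sensitive transcript should accumulate relative to the correctly spliced form once translation is blocked) can be expressed as a ratio test. The following sketch assumes quantified band intensities, which are hypothetical here, and uses an arbitrary two-fold cut-off that is an assumption of this example rather than a value from this document.

```python
def misspliced_ratio(mis_spliced, correct):
    """Ratio of mis-spliced to correctly spliced signal at each time point."""
    return [m / c for m, c in zip(mis_spliced, correct)]


def fold_change(ratios):
    """Change in the ratio between the first and the last time point."""
    return ratios[-1] / ratios[0]


def looks_nmd_sensitive(mis_spliced, correct, threshold=2.0):
    """True if the ratio rises at least `threshold`-fold (ASSUMED cut-off)."""
    return fold_change(misspliced_ratio(mis_spliced, correct)) >= threshold


# HYPOTHETICAL intensities at 0, 1, 2 and 4 h of inhibitor treatment
unchanged_mis, unchanged_ok = [40.0, 42.0, 41.0, 40.0], [80.0, 82.0, 80.0, 79.0]
rising_mis, rising_ok = [10.0, 20.0, 40.0, 80.0], [80.0, 80.0, 80.0, 80.0]

if __name__ == "__main__":
    print("unchanged ratio:", looks_nmd_sensitive(unchanged_mis, unchanged_ok))
    print("rising ratio:", looks_nmd_sensitive(rising_mis, rising_ok))
```

The first hypothetical data set corresponds to the pattern described for LY6G5B, where no relative increase was seen; the second shows what an NMD-sensitive transcript would be expected to look like.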
Example 6 - Computer analysis
The presence of common key sequences in the introns of the LY6G5B and LY6G6D genes was investigated, as well as their presence in other genes, to determine whether any potential motif was restricted to these genes, to this gene family, or whether it is found more widely in the genome. The program MEME was used for this analysis (http://meme.sdsc.edu/meme/website/intro.html).
The MEME sequence analysis indicated that there was potentially a weakly conserved sequence motif (Figure 8A) in the experimentally confirmed retained introns of human and mouse Ly6G6d and Ly6G5b as well as in the predicted rat and pig orthologues (defined using the publicly available genomic sequences) (P values of between 7.08e-14 and 8.91e-18; Figure 8), whereas no clear motif could be found (above that found by chance) when intron 2 of these family members was included in the analysis. This indicates that this "retained-intron-specific" motif does not occur due to sequence conservation of this genomic region between species, and is not a general feature of this cluster of genes. By phylogenetic analysis, it was also possible to conclude that the Ly6G6d and Ly6G5b genes are not closely related. Additionally, this weak motif is not present due to a bias in base composition, as scrambled sequence with an identical base composition found this motif with a much-reduced probability (P value of 8.4e-5).
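A scrambled control of the kind mentioned above can be generated by shuffling the bases of each intron, which preserves base composition exactly while destroying any positional motif. The sketch below shows one simple way to do this; it is not necessarily the procedure the inventors used, and the example sequence is a hypothetical placeholder rather than a real intron.

```python
import random
from collections import Counter


def scramble(seq, seed=None):
    """Return a random permutation of `seq` with identical base composition."""
    rng = random.Random(seed)  # seeded generator makes the control reproducible
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)


# HYPOTHETICAL placeholder sequence, not a real Ly-6 intron
example_intron = "GTAAGTCCCTGGGAGGCTCTGACCTTTCCCACAG"

if __name__ == "__main__":
    control = scramble(example_intron, seed=1)
    print(control)
    print(Counter(control) == Counter(example_intron))
```

A mononucleotide shuffle of this kind keeps the single-base composition but not dinucleotide frequencies; a control that also preserves dinucleotide content would need a different shuffling method.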
The results from this analysis indicate that these transcripts are cytoplasmic and are not subject to NMD, and could therefore potentially have a regulatory function physiologically. They could therefore potentially be used to stabilise transcripts for therapeutic purposes by insertion e.g. in the 5' or 3' UTRs of genes of interest, or else as an intron to create "competing" mRNA species to reduce production, export or translation of the correctly spliced form of the gene of interest.
Example 7 - Luciferase Assay
The ability of these intronic sequences to stabilise a luciferase gene reporter construct was examined. The intronic sequences were inserted into either the 5' or 3' UTR of the luciferase gene in the pGL3 construct and their effect on luciferase gene expression measured using luminescence.
The inventors employed a luciferase assay to try to determine whether these intronic sequences were capable of stabilising other gene transcripts. The method was adapted from that described by Kubota et al (2000, Oncogene 19, 4773-4786) based on the Dual luciferase system (Promega) and cell lysates. The inventors used instead the Dual-Glo luciferase assay system (Promega). These two methods are very similar except that the Dual-Glo system gives up to two hours of stability after the reagents are added in order to be able to read entire plates, rather than the rapidly decreasing activity of the normal luciferin as used in the dual luciferase system. The other advantage of the Dual-Glo method is that there is no need for cell lysates; instead the reagent can be added directly onto the cells before the luminescence is read.
In detail, the method used was as follows:
1- 25,000 Hek293T cells (human embryonic kidney cells) were seeded into each well of a 96 well plate.
2- 24 hours later, the cells were co-transfected with 200ng DNA, comprising both the renilla construct and a firefly luciferase construct, using Polyfect transfection reagent (Qiagen).
2a - 25ul DMEM (no serum and no antibiotic), 25ng renilla construct, 175ng firefly luciferase construct and 0.8ul of PolyFect reagent were mixed in a tube and incubated at room temperature for 10 mins.
2b - During the incubation, the media was removed from cells, and 50ul of fresh DMEM (with serum and antibiotics) was added.
2c - The 25ul of transfection reagent (2a) was added to each well, swirled and incubated for 48 hours.
NB- volumes and weights are given per well and are usually done in triplicate; therefore multiply all by 3 and aliquot 25ul into each of three wells.
3- 48 hours post transfection, cells, dual-Glo buffers, and dual-glo reagents were brought to room temperature.
4- Dual-glo luciferase buffer was added to dual-glo luciferase reagent (as per manufacturer's instructions), mixed well, and 1 volume (i.e. 75ul) was aliquotted into each well
5- Cells were incubated at room temp for 10-30mins and then the firefly luminescence was read on an alpha-fusion Universal microplate analyser (Packard/Perkin Elmer).
6- Dual-glo stop and glo reagent was added to buffer as per manufacturer's instructions, mixed and 1 volume (i.e. 75ul) was aliquotted into each well
7- Cells were incubated for the same amount of time as was done for the first readings, and then luminescence was read using the same voltage and gain settings as used for the firefly readings.
8- The readings were corrected based on transfection efficiency (renilla readings) and were normalised compared to control readings (pGL3 and rLUC control transfections).
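The correction in step 8 can be sketched as follows. The text states only that firefly readings were corrected using the renilla readings and normalised to the control transfections; the exact formula below (firefly divided by renilla, expressed as a percentage of the mean control ratio) is an assumption, and all readings are hypothetical values in arbitrary luminescence units.

```python
from statistics import mean


def normalised_activity(firefly, renilla, control_ratios):
    """Firefly/renilla ratio as a percentage of the mean control ratio.

    ASSUMED formula: the document does not spell out the calculation.
    """
    ratio = firefly / renilla
    return 100.0 * ratio / mean(control_ratios)


# HYPOTHETICAL control wells (pGL3 + rLUC), as (firefly, renilla) readings
control_wells = [(1000.0, 100.0), (1100.0, 100.0), (900.0, 100.0)]
control_ratios = [firefly / renilla for firefly, renilla in control_wells]

if __name__ == "__main__":
    # HYPOTHETICAL test well
    print(normalised_activity(300.0, 100.0, control_ratios))
```

Dividing by the renilla signal removes well-to-well differences in transfection efficiency before the comparison with the control is made.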
The vector constructs used in this assay system were as follows (Figure 9A):
rLUC vector containing the renilla luciferase ORF (control for transfection efficiency)
pGL3 vector containing the firefly luciferase open reading frame (ORF) (control for firefly luciferase readings)
VEGFAgl31900 construct containing ~1.9kb of the Vascular Endothelial Growth Factor A (VEGFA) 3' UTR (known to cause instability under normal oxygen conditions (Claffey et al, 1998 Mol. Biol. Cell. 9, 469-481)) inserted 3' of the firefly luciferase ORF (in the pGL3 vector)
pGL3 vector containing 150bp of the c-myc intron (part of the same intron that was used as a control for genomic DNA contamination of the cytoplasmic RNA fraction) inserted either 5' of the firefly luciferase ORF (chn construct) or 3' of the firefly luciferase ORF (cpe construct)
pGL3 vector containing the human LY6G6D first intron (91bp) inserted either 5' of the firefly luciferase ORF (dhn construct) or 3' of the firefly luciferase ORF (dpe construct)
pGL3 vector containing the human LY6G5B first intron (147bp) inserted either 5' of the firefly luciferase ORF (bh constructs) or 3' of the firefly luciferase ORF (bpe construct)
In each case the rLUC vector was co-transfected with one of the pGL3 constructs. Constructs containing intronic sequence were made by PCR amplifying the intron from genomic DNA using gene-specific primers containing different restriction sites. For c-myc and LY6G6D the HindIII and NcoI sites in the pGL3 vector were used to clone the fragment at the 5' end of the luciferase ORF, while for LY6G5B only the HindIII site was used as the insert contained an NcoI site. To clone the fragments at the 3' end of the luciferase ORF, the PstI and EcoRV sites were used.
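The cloning constraint noted above (an enzyme cannot be used if its recognition site also occurs inside the insert) can be checked with a simple scan. The recognition sequences below are the standard ones for the four enzymes named in the text; the insert is a hypothetical toy sequence, not the real LY6G5B intron.

```python
# Standard recognition sequences; all four are palindromic, so scanning
# one strand is sufficient.
SITES = {
    "HindIII": "AAGCTT",
    "NcoI": "CCATGG",
    "PstI": "CTGCAG",
    "EcoRV": "GATATC",
}


def internal_sites(insert, sites=SITES):
    """Map each enzyme to the 0-based positions of its site within the insert."""
    insert = insert.upper()
    return {
        name: [
            i
            for i in range(len(insert) - len(site) + 1)
            if insert[i:i + len(site)] == site
        ]
        for name, site in sites.items()
    }


def usable_enzymes(insert, candidates):
    """Enzymes from `candidates` that do not cut inside the insert."""
    found = internal_sites(insert)
    return [enzyme for enzyme in candidates if not found[enzyme]]


# HYPOTHETICAL insert containing an internal NcoI site
toy_insert = "GTAAGTCCATGGTTTCAG"

if __name__ == "__main__":
    print(internal_sites(toy_insert))
    print(usable_enzymes(toy_insert, ["HindIII", "NcoI"]))
```

With an internal NcoI site only HindIII remains usable for the 5' position, so the fragment can ligate in either direction; this is consistent with the LY6G5B intron being recovered in both orientations.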
The c-myc intron was chosen as a control because it was known not to be retained in the c-myc mRNA transcript detected in the RNA stability experiments. Only part of the intron was used, to act as a control for the effect of inserting a ~150bp fragment of DNA on the stability/expression of the luciferase gene product. Experiments were performed in triplicate to ensure reproducibility, while the luciferase controls (the rLUC and pGL3 vector co-transfections) were performed in triplicate three times. The RRR is calculated based on an average of all 9 control experiments (which is why each individual triplicate varies slightly from the 100% normalised value (Figure 9B)). The VEGFA construct was used as a control for destabilisation of luciferase expression and therefore activity (decrease to ~30% of normal (Figure 9B)). The results show that when the intronic sequence, or part of it, is inserted 3' of the luciferase ORF there is some stabilisation, but as this is seen with the cpe control construct as well as the Ly-6 constructs (dpe and bpe constructs), the effect is not gene specific (Figure 9B). However, when these intronic sequences are inserted 5' of the luciferase ORF, a stabilisation effect can be seen for the Ly-6 constructs (dhn, bh6 and bh11 constructs), while the c-myc construct shows a destabilising effect (chn construct) (Figure 5B). As the stabilising effects are well above the normalised value and above that of c-myc, this suggests that the effect is specific. Interestingly, the effect of orientation of the intronic sequence in the construct can also be seen for the LY6G5B intron, as the intronic sequence was cloned in both orientations. The bh6 and bh11 clones are cloned in the "correct" orientation, that is, the same orientation as would be found in the mis-spliced mRNA species, while the other bh constructs contain the intronic sequence cloned in the opposite orientation. These "opposite" orientation clones show a marked decrease in stability and luciferase expression, indicating that the stabilising effect is orientation-dependent.
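A fragment cloned through a single restriction site can ligate in either orientation. In the "opposite" orientation the transcribed strand carries the reverse complement of the intronic sequence rather than the sequence itself. The sketch below illustrates that relationship only; the toy sequence and the function name are hypothetical and are not the LY6G5B intron.

```python
# Translation table mapping each base to its complement (case preserved).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


# Hypothetical toy insert in the "correct" orientation.
toy = "GTAAGTCCTTAG"
opposite = reverse_complement(toy)
print(opposite)  # CTAAGGACTTAC
```

Sequencing across the insert junction and comparing the read against both strings is one way to assign a clone to the "correct" or "opposite" group.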
Example 8
Delivery of nucleic acids comprising a regulatory sequence in accordance with the invention.
Delivery of the corrective nucleic acid to cells in vitro or in vivo may be classified as adopting either a viral or a non-viral approach.
Non-viral approaches would include: 1) direct transfer of naked DNA molecules, 2) transfer of DNA complexed with cationic lipids, 3) transfer of particles comprising DNA condensed with cationic polymers or liposomes, or 4) physical methods involving needle-free injectors and electroporation.
Viral vectors would include: 1) retroviral vectors (including lentivirus), 2) adenoviruses and adeno-associated viruses, 3) herpes simplex, and 4) poxviruses, such as vaccinia virus. For a review of viral vectors, see Breyer et al. 2002 "Development and use of viral vectors for gene transfer: lessons from their applications in gene therapy" Applied Genomics and Proteomics 1 (1) 45-63. The DNA will be delivered to primary cell types (or cell lines) that are defective in the expression of a particular target gene or that are overexpressing a target gene. These cell types (or cell lines) may be blood cells, fibroblast cells, or tissue specific cells of any type.
An example of a non-viral in vitro approach for delivery to HeLa cells is described below.
HeLa cells: HeLa cells (epithelial) are grown in Dulbecco's modified Eagle's medium (DMEM) with 10% (v/v) foetal bovine serum (FBS), 100 IU/ml penicillin and 100 μg/ml streptomycin, then trypsinised.
Cells are seeded in a 24 well plate at ~40,000 cells/cm2 in 1ml of DMEM with FBS, then transfected with 0.6 μg DNA (pcDNA3 containing the gene of interest) using the DEAE-dextran method as follows: cells are washed twice with DMEM (no serum), then 0.5ml of transfection mix is added. The transfection mix would be: 0.5ml DMEM, 5 μl chloroquine (5.15mg/ml in phosphate buffered saline (PBS)), 2 μl DEAE-dextran (100mg/ml in PBS). Cells should be incubated with the DNA for 3 hours, then shocked with 10% (v/v) DMSO for 2 minutes. After the cells have been washed with PBS, 600 μl of DMEM with 10% (v/v) FBS is added and the cells are incubated at 37°C for 2-4 days before harvesting/treatment, to check by PCR for up- or down-regulation of expression of the gene of interest. In order to make stably transfected cell lines, G418 will be added to the cells, and after two weeks of selection, clones expressing the gene of interest will be monitored by PCR.
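For a whole plate, the per-well quantities above can be scaled as in the following sketch. Several points are assumptions rather than statements from the text: the chloroquine and DEAE-dextran volumes are read as 5 μl and 2 μl per 0.5ml of DMEM, the 0.6 μg of DNA is taken to be added to each well's mix, the growth area of one well is taken as 1.9 cm2 (a typical figure for 24 well plates), and the optional pipetting overage is a convenience that the protocol does not mention. Function and key names are hypothetical.

```python
def cells_per_well(density_per_cm2, well_area_cm2):
    """Number of cells to seed in one well at the given density."""
    return density_per_cm2 * well_area_cm2


def transfection_mix(n_wells, overage=0.0):
    """Scale the per-well DEAE-dextran transfection mix to n_wells.

    overage is an optional fractional excess (e.g. 0.1 for 10%); assumed, not from the text.
    """
    per_well = {
        "DMEM_ul": 500.0,         # 0.5 ml serum-free DMEM
        "chloroquine_ul": 5.0,    # 5.15 mg/ml stock in PBS
        "DEAE_dextran_ul": 2.0,   # 100 mg/ml stock in PBS
        "DNA_ug": 0.6,            # plasmid DNA, assumed to be part of the mix
    }
    factor = n_wells * (1.0 + overage)
    return {name: amount * factor for name, amount in per_well.items()}


# Assumed growth area of one well of a 24 well plate, in cm2.
WELL_AREA_CM2 = 1.9

print(round(cells_per_well(40_000, WELL_AREA_CM2)))  # cells to seed per well
print(transfection_mix(24))                           # totals for a full plate
```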
An example of a viral in vitro approach might be as follows:
The gene, or splice form, of interest would be expressed in a retroviral system, such as the RetroXpress System (Clontech), following the manufacturer's instructions. Essentially, the gene of interest will be cloned into the multiple cloning site of a RetroXpress vector. The retroviral vector carrying the gene of interest would be transfected into the RetroPack PT67 packaging cell line, leading to production of retroviral particles (10^5-10^6 ffu/ml), followed by antibiotic selection, isolation of clones and production of high-titre clones (10^6-10^7 ffu/ml). The packaged virus will be used to infect target cells with high efficiency and expression of the target gene would be monitored by PCR. For gene transfer protocols in humans, examples followed in trials can be found at www4.od.nih.gov/oba.
As an example:
A tumour nodule will be removed from a patient and the lymphocytes infiltrating this cancer will be cultured and expanded in vitro following standard conditions. These lymphocytes will be incubated with a retroviral vector (containing the genes of interest) once they have reached log phase growth. The cells will be expanded in culture until approximately 4x10^10 cells are obtained. These lymphocytes will be cryopreserved in aliquots and tested to ensure that they are free of replication competent virus and that they express the gene of interest. The lymphocytes will then be administered to the cancer patient. After administration, samples of tumour biopsy material would be tested for the presence of the gene of interest and for its effect on the tumour.
Accession numbers (submitted to EMBL):
Human:
LY6G6C (AJ315533, AJ315534); LY6G6D (AJ315535, AJ315536, AJ315537); LY6G6E (AJ315538, AJ315539); LY6G5CA (AJ315540, AJ315541); LY6G5CB (AJ315542, AJ315543, AJ315544); LY6G5B (AJ315545, AJ245417 (updated)).
Mouse:
Ly6g6c (AJ315546, AJ315547); Ly6g6d (AJ315548, AJ315549); Ly6g6e (AJ315550, AJ315551); Ly6g5c (AJ315552); Ly6g5b (AJ315553, AJ315554).
Table 1. Primers used in the cDNA analysis of the human (top) and mouse (bottom) MHC class III region Ly-6 superfamily members.
Gene    Primer name   Primer sequence
Human, 1st round
LY6G6C  F G6C5UTR1    5'-GCCACCATTCTCAGGACCCTCG-3'
        R G6C3UTR1    5'-GGGAGAGAGGCTCAGGCCAAGGC-3'
LY6G6D  F G6D5UTR1    5'-GGTCCTGACACGGGCAGACTGC-3'
        R G6D3UTR1    5'-GTTCCCTTCTCTACTCCTACTCCCC-3'
LY6G6E  F G6E5UTR1    5'-CAGATTCCTGAGCCTGTGTCTGG-3'
        R G6E3UTR1    5'-CCTGGGTTGGGGGCTGCAGATCC-3'
LY6G5C  F G5CA5UTR1   5'-GCAGTTCCAGGGGATGATCAGC-3'
        F G5CB5UTR1   5'-GAGGCCAGCAGGCAGTCATGC-3'
        R G5C3UTR1    5'-GGTGGAATTTTACACCAAAGTTTGC-3'
LY6G5B  F G5B5UTR1    5'-CTCAGGAACTGCCCATCTCCCCAG-3'
        R G5B3UTR1    5'-GGGTGTTACAGAAAGATATTTGCAC-3'
Human, 2nd round
LY6G6C  F G6CRT2      5'-ATGAAAGCCCTTATGCTGCTCACC-3'
        R G6CRT1      5'-GCCAATGGAATGAGTCTCAGTGC-3'
LY6G6D  F G6DRT2      5'-ATGAAACCCCAGTTTGTTGGG-3'
        R G6DRT1      5'-CTATCCGCTCCACAGTCCTGG-3'
LY6G6E  F G6ERT3      5'-GTCTCACCATGTCCCCTGCCC-3'
        F G6ERT2      5'-ATGGGCACCTCCAGCATCTTCC-3'
        R G6ERT1      5'-GGAAGCTAGTGGAGGAGGTGTCC-3'
LY6G5C  F G5CRT3      5'-ATGCAGACCTTCCCAGTTGC-3'
        F G5CRT2      5'-ATGGCAGGGCCTGCAGGGAGC-3'
        R G5CRT1      5'-CACTAAGGAGTATAGAGCCCTCTG-3'
LY6G5B  F G5BRT2      5'-ATGAAGGTCCATATGCTTGTAGG-3'
        R G5BRT1      5'-TCAGGAAGGGTGAGGTGTCAAG-3'
Mouse, 1st round
Ly6g6c  F MG6C5UTR1   5'-CTGTGGACATTCTCAGGACCCTC-3'
        R MG6C3UTR1   5'-CTATAGCAGGTGGTAAGATTGTAGC-3'
Ly6g6d  F MG6D5UTR1   5'-GGAGAGGCACTACAGGCATAAACG-3'
        R MG6D3UTR1   5'-CCTGCCCTCGGTCCCACCCTCC-3'
Ly6g6e  F MG6E5UTR1   5'-CCAGACTGCTGAGCTGTACCTGC-3'
        R MG6E3UTR1   5'-GTGACACTGGAATCCTGGGTTGAC-3'
Ly6g5c  F MG5C5UTR1   5'-CAGGCTGGAGGCTGGAGAGAAGC-3'
        R MG5C3UTR1   5'-GGAGGTTAGCTGGCAGGACGTGG-3'
Ly6g5b  F MG5B5UTR1   5'-GTTGCTCCTCTCCCCCAAATTCC-3'
        R MG5B3UTR1   5'-ACAGAGCAGAATCTAGGAAGGGG-3'
Mouse, 2nd round
Ly6g6c  F MG6CRT2     5'-ATGAAACACCTCCTGTTGCTCACC-3'
        R MG6CRT1     5'-GGAGCTAGTCTCAATGCAATAACC-3'
Ly6g6d  F MG6DRT2     5'-ATGAACTCCCAGTTGGTCGGGATC-3'
        R MG6DRT1     5'-CTACAGTCCTGGCAAGAGCCAGG-3'
Ly6g6e  F MG6ERT2     5'-ATGGGCCCGTCCAGCGCCTTCC-3'
        R MG6ERT1     5'-CCGTGGACGCATGGGCTAGAGG-3'
Ly6g5c  F MG5CRT2     5'-ATGCTTTTTATGGCAGGCCCTGC-3'
        R MG5CRT1     5'-GGTTCCTGCTGTTCTAGTGCATGC-3'
Ly6g5b  F MG5BRT2     5'-ATGAGGGCCCGCGTGCTTGTAGG-3'
        R MG5BRT1     5'-CATCAGGGTCTATCCCAGGGAAGG-3'
Table 2. Summary of human EST data. (db) indicates sequence data from dbEST (www.ncbi.nlm.nih.gov). All other results are from direct sequence analysis. Alternatively spliced clones are shown in bold. RT-PCR indicates to which RT-PCR product the EST clone sequence corresponds.
Gene    Acc. No.  Image ID  Tissue                  Coverage (db and/or direct seq)                  RT-PCR
LY6G6C  H03135    151586    placenta                EII, EIII, 3'UTR
LY6G6C  R25237    132305    placenta                EIII, 3'UTR
LY6G6C  AI127339  1708589   foetal heart            new EI (in intron EI-EII), EII, EIII, 3'UTR
LY6G6D  AI800033  2321242   prostate                EIII, 3'UTR
LY6G6D  AA535815  927511    colon                   part EI, EII, Alt. EIII                          523 bp
LY6G6E  NO EST CLONES
LY6G5C  AI201375  1755487   testis                  EIV, 3'UTR (db)
LY6G5C  AA702487  448136    foetal liver + spleen   EIV, 3'UTR (db)
LY6G5C  AA179679  609340    foetal retina           part EIV, 3'UTR
LY6G5C  N49842    282450    MS lesions              part intron EIII-EIV, EIV, 3'UTR
LY6G5C  AA731226  1251855   germinal centre B cell  part EIII, EIV, 3'UTR
LY6G5C  AI160056  1708976   foetal heart            part EIII, EIV, 3'UTR
LY6G5C  AA868740  146064    pooled organs           part EIII, EIV, 3'UTR
LY6G5C  AW001934  2513938   foetal thymus           part of intron EI-EII, EIII, EIV, 3'UTR
LY6G5C  AA905388  1504964   pooled organs           new part of intron EI-EII, EIII, EIV, 3'UTR      557 bp
LY6G5C  BF056626  3476011   fibrothecoma (ovary)    new part of intron EI-EII, EIII, EIV, 3'UTR      557 bp
LY6G5C  BG150413  3368609   carcinoid lung          new part of intron EI-EII, EIII, EIV, 3'UTR      557 bp
LY6G5C  AA773839  1048297   pooled tissues          new part of intron EI-EII
LY6G5C  AA773720  1048434   pooled tissues          EIII, part of intron EIII-EIV, EIV, 3'UTR
LY6G5C  AA812816  1377114   testis                  part EIII, part of intron EIII-EIV, EIV, 3'UTR
LY6G5B  AI446559  2139582   stomach                 part 5'UTR, EI, intron, EII, EIII, 3'UTR         753 bp
LY6G5B  R79468    146288    placenta                part EIII, 3'UTR
Table 3. Summary of mouse EST data. (db) indicates sequence data from dbEST (www.ncbi.nlm.nih.gov). All other data are from direct sequence analysis. RT-PCR indicates to which RT-PCR product the EST clone sequence corresponds.
Gene    Acc. No.  Image ID  Tissue        Coverage (db and/or direct seq)      RT-PCR
Ly6g6c  AA930440  1150527   skin          part 5'UTR, EI, EII, EIII (db)
Ly6g6c  AA500454  918663    skin          part 5'UTR, EI, EII, EIII (db)
Ly6g6c  AI151937  1885608   embryo        EI, EII (db)
Ly6g6c  W12301    314768    total foetus  part 5'UTR, EI, EII, EIII, 3'UTR     381 bp
Ly6g6c  W10502    313190    total foetus  EIII, part 3'UTR
Ly6g6c  AW213479  2259239   embryo        part 5'UTR, EI, EII, EIII, 3'UTR     381 bp
Ly6g6d  AA794551  1196599   skin          part EI, EII, EIII                   408 bp
Ly6g6e  AA591211  990135    blastocyst    EIII, 3'UTR
Ly6g5c  NO EST CLONES
Ly6g5b  AI591806  1140329   T-cell        part EIII, 3'UTR
Table 4A. Summary of human RT-PCR analysis.
Cell line/tissue  LY6G6C  LY6G6D  LY6G6E (c)  LY6G5CA  LY6G5CB  LY6G5B
K562 378, 489 493 1472(g) 1428 606, 753
U937 378, 489 402, (493), 523 1472(g) 538, 1428 606, 753
HL60 378, 489 402, 493 1472(g) 538, 1428 606, 753
Molt4 378, 489 493 1472(g) 557 444, 538, 1428 606, 753
Jurkat 378, 489 493 1472(g) 557 538, 1428 606, 753
Raji 378, 489 402, 493 1472(g) (557) 1428 606, 753
143B (378), 489 402, (493) 1472(g) 538 606, 753
HeLa 378, 489 402 1472(g) (557) 444 606, 753
Foetal liver 378, 489 (402), 523 1472(g) (606), 753
Foetal lung 378, 489 402, 523 1472(g) 444 753
Foetal kidney 378, 489 402, 493, 523 399, 501, 1472(g) 677 (606), 753
Foetal spleen 378, 489 523 1472(g) 1428 753
Foetal brain 378, 489 402, 493 399, 501 606, 753
Adult liver 378 1472(g) 753
Adult lung 378, 489 402, 493 1472(g) 444 606, 753
Adult kidney 378, 489 1472(g) 753
a Numbers in bold indicate product of expected sequence.
b Numbers in brackets indicate decreased expression in that cell line/tissue.
c (g) indicates genomic product.
Table 4B. Summary of mouse RT-PCR analysis.
Cell line/tissue  Ly6g6c  Ly6g6d  Ly6g6e  Ly6g5c  Ly6g5b
L929 381, (723) 505 405, 501 450 681
RAW 264 381, (723) 505 405, 501 450 681
WEHI-3B 381, 723 505 405, 501 450 585, 681
WEHI-231 381, (723) 505 (405), 501 450 681
EL4 381, (723) 505 405, 501 450 681
Lung (129) 381, 723 505 405, 501 585, 681
Kidney (129) 381, 723 505 405, 501 (585), 681
Brain (129) 381 408 505 450 681
Liver (129) 381 405, 501 681
Spleen (129) 381
Embryo (12.5dpc) 381, 723 408 505 405, 501 450 681
Embryo (15.5dpc) 381, 723 408 505 405, 501 450 681
a Numbers in bold indicate product of expected sequence.
b Numbers in brackets indicate decreased expression in that cell line/tissue.

Claims

1. An isolated nucleic acid molecule which comprises at least part of the sequence of intron 1 of a human LY-6 superfamily gene or of a homologous gene from an animal, or a corresponding sequence.
2. A sequence according to claim 1, comprising at least 10 contiguous bases of the sequence of intron 1 of a human LY-6 superfamily gene or of a homologous gene from an animal.
3. A sequence according to claim 1, comprising at least 20 contiguous bases of the sequence of intron 1 of a human LY-6 superfamily gene or of a homologous gene from an animal.
4. A sequence according to claim 1, comprising at least 30 contiguous bases of the sequence of intron 1 of a human LY-6 superfamily gene or of a homologous gene from an animal.
5. A sequence according to claim 1, comprising at least 40 contiguous bases of the sequence of intron 1 of a human LY-6 superfamily gene or of a homologous gene from an animal.
6. A sequence according to any one of the preceding claims, comprising at least part of intron 1 of a gene selected from the group consisting of: all LY-6 genes, CD59, uPAR, SP10, SLURP-1, E48, RIG-E, PSCA, ThB and TSA-1/Sca2.
7. A sequence according to any one of the preceding claims, comprising all or part of the consensus sequence:
AGGAGGGAGCCAGGGGGTCTCTCACACACGCCCGGCGCCAGCTGT GA T A T CAAT C T GGAG GA TACGATTCC CG C T TT C
8. A nucleic acid construct comprising a sequence according to any one of the preceding claims.
9. A nucleic acid construct according to claim 8, comprising a transcribable sequence in operable combination with a sequence in accordance with any one of claims 1-7.
10. A replicable vector according to claim 8 or 9.
11. A method of regulating the expression of a nucleic acid sequence, the method comprising the step of placing the nucleic acid sequence, the expression of which is to be regulated, in operable combination with a regulatory nucleic acid sequence, which regulatory sequence comprises a molecule in accordance with the first aspect of the invention.
12. A method according to claim 11, comprising use of a nucleic acid construct according to any one of claims 8, 9 or 10.
13. A method of inhibiting the expression of a selected gene endogenous to a target cell, the method comprising the step of introducing into the target cell a nucleic acid molecule which, when present in the target cell, directs the expression of a mis-spliced, truncated or otherwise defective transcript variant of the endogenous gene, which variant is protected against nonsense mediated decay by the inclusion therein of a nucleic acid sequence in accordance with any one of claims 1-7.
14. Use of a nucleic acid molecule in accordance with any one of claims 1-10, in the preparation of a medicament to regulate the expression of a gene in a subject.
15. A pharmaceutical composition comprising a recombinant nucleic acid molecule in accordance with any one of claims 1-10, and a physiologically acceptable excipient, carrier or diluent.
16. A host cell transformed with a nucleic acid molecule in accordance with any one of claims 1-10.
17. A nucleic acid molecule substantially as hereinbefore described and with reference to the accompanying drawings.
18. A method of regulating the expression of a gene substantially as hereinbefore described and with reference to the accompanying drawings.
PCT/GB2003/002652 2002-06-24 2003-06-20 Regulation of gene expression using intron 1 of the ly-6 gene superfamily WO2004001034A2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2003277979A AU2003277979A1 (en) 2002-06-24 2003-06-20 Regulation of gene expression using intron 1 of the ly-6 gene superfamily

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB0214524A GB0214524D0 (en) 2002-06-24 2002-06-24 Improvements in or relating to regulation of gene expression
GB0214524.1 2002-06-24

Publications (2)

Publication Number Publication Date
WO2004001034A2 true WO2004001034A2 (en) 2003-12-31
WO2004001034A3 WO2004001034A3 (en) 2004-04-01

Family

ID=9939160

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB2003/002652 WO2004001034A2 (en) 2002-06-24 2003-06-20 Regulation of gene expression using intron 1 of the ly-6 gene superfamily

Country Status (3)

Country Link
AU (1) AU2003277979A1 (en)
GB (1) GB0214524D0 (en)
WO (1) WO2004001034A2 (en)

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
CASEY J R ET AL: "The structure of the urokinase-type plasminogen activator receptor gene" BLOOD, W.B. SAUNDERS, PHILADELPHIA, VA, US, vol. 84, no. 4, 1994, pages 1151-1156, XP002231478 ISSN: 0006-4971 *
GUMLEY T P ET AL: "SEQUENCE AND STRUCTURE OF THE MOUSE THB GENE" IMMUNOGENETICS, SPRINGER VERLAG, BERLIN, DE, vol. 42, no. 3, 1995, pages 221-224, XP002920550 ISSN: 0093-7711 *
HORIE M ET AL: "Isolation and Characterization of a New Member of the HumanLy6Gene Family(LY6H)" GENOMICS, ACADEMIC PRESS, SAN DIEGO, US, vol. 53, no. 3, 1 November 1998 (1998-11-01), pages 365-368, XP004448990 ISSN: 0888-7543 *
MALLYA M ET AL: "transcriptional analysis of a novel cluster of LY-6 family members in the human and mouse major histocompatibility complex: five genes with many splice forms" GENOMICS, vol. 80, no. 1, 1 July 2002 (2002-07-01), pages 113-123, XP002266012 *
SHAN X ET AL: "characterization and mapping to human chromosome 8q24.3 of Ly-6-related gene 9804 encoding apparent homologue of mouse tsa-1" THE AMERICAN ASSOCIATION OF IMMUNOLOGISTS, 1998, pages 197-208, XP002266013 *
TEMERINAC SNEZANA ET AL: "Cloning of PRV-1, a novel member of the uPAR receptor superfamily, which is overexpressed in polycythemia rubra vera" BLOOD, W.B.SAUNDERS COMPAGNY, ORLANDO, FL, US, vol. 95, no. 8, 15 April 2000 (2000-04-15), pages 2569-2576, XP002159981 ISSN: 0006-4971 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7521943B2 (en) 2005-01-23 2009-04-21 Serconet, Ltd. Device, method and system for estimating the termination to a wired transmission-line based on determination of characteristic impedance
US7919970B2 (en) 2005-01-23 2011-04-05 Mosaid Technologies Incorporated Device, method and system for estimating the termination to a wired transmission-line based on determination of characteristic impedance
US8391470B2 (en) 2005-01-23 2013-03-05 Mosaid Technologies Incorporated Device, method and system for estimating the termination to a wired transmission-line based on determination of characteristic impedance

Also Published As

Publication number Publication date
WO2004001034A3 (en) 2004-04-01
AU2003277979A8 (en) 2004-01-06
AU2003277979A1 (en) 2004-01-06
GB0214524D0 (en) 2002-08-07

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ OM PH PL PT RO RU SC SD SE SG SK SL TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase in:

Ref country code: JP

WWW Wipo information: withdrawn in national office

Country of ref document: JP