
C6ORF47

From Wikipedia, the free encyclopedia

Gene

C6orf47
Identifiers
Aliases: C6orf47, D6S53E, G4, NG34, chromosome 6 open reading frame 47
External IDs: MGI: 90673; HomoloGene: 75155; GeneCards: C6orf47; OMA: C6orf47 - orthologs
Orthologs
Species | Human | Mouse
RefSeq (mRNA) | NM_021184 | NM_033477
RefSeq (protein) | NP_067007 | NP_258438
Location (UCSC) | Chr 6: 31.66 – 31.66 Mb | Chr 17: 35.35 – 35.35 Mb
PubMed search | [3] | [4]

General Information


In humans, chromosome 6 open reading frame 47 (C6ORF47) is a single-exon gene that spans 2,481 nucleotides and encodes a 294-amino-acid protein.[5][6]

Location


In humans, this gene is located on the minus strand at 6p21.33.[7]

Gene Expression

C6ORF47 is expressed ubiquitously across human tissues, and the gene is over-expressed in the colon, urinary bladder, ovary, and pancreas.[7] NCBI GEO Profiles shows that C6ORF47 RNA levels range from low in most tissues to high in a few, notably the salivary gland and cerebellum.[8][9]

C6ORF47 mRNA expression was examined using microarray data from a study by Boström et al. on macrophages from four healthy donors. The goal of that study was to investigate whether hypoxia influences the accumulation of lipids in macrophages, which would help explain whether the lipid-loaded macrophages in atherosclerotic lesions arise because of the hypoxic regions there. Human macrophages exposed to hypoxia for 24 hours showed increased formation of cytosolic lipid droplets and increased triglyceride accumulation, indicating that hypoxic regions in atherosclerotic lesions could contribute to the formation of lipid-loaded macrophages. In this dataset, C6ORF47 expression was almost six times greater under non-hypoxic conditions than under hypoxic conditions. Because cells under hypoxia are expected to keep mostly essential processes active, this decrease suggests that C6ORF47 likely contributes neither to lipid accumulation nor to an essential process.[10]

Transcription Factors


Below is a short list of transcription factors predicted to bind the promoter region, which here comprises the 5' UTR and the 500 nucleotides upstream of it. Bioline[11] software was used to obtain the double-stranded DNA sequence, and the UCSC Genome Browser[12] was used to identify the transcription factors and binding sites listed in the table below.

Transcription Factor | Generalized Function
KLF17, Krüppel-like factor 17 | Regulates gene expression, influencing cell differentiation and development.
PROX1, Prospero homeobox 1 | Regulates lymphatic development, cell differentiation, and organogenesis processes.
WT1, Wilms' tumor 1 | Regulates kidney development, cell growth, and tissue differentiation processes.
GATA1, GATA binding protein 1 | Controls red blood cell development and regulates hematopoiesis processes.
THRB, Thyroid hormone receptor beta | Regulates thyroid hormone signaling, influencing metabolism and growth regulation.
ZNF454, Zinc finger protein 454 | Regulates gene expression, potentially influencing cell differentiation and development.
SP9, Specificity protein 9 | Regulates cartilage development and skeletal patterning during embryogenesis.
EGR3, Early growth response 3 | Regulates gene expression involved in neuronal activity and immune response.
SOX4, SRY-box transcription factor 4 | Regulates cell fate, development, and differentiation in multiple tissues.
EBF1, Early B-cell factor 1 | Regulates B cell differentiation and immune system development.
ZNF669, Zinc finger protein 669 | Regulates gene expression, potentially involved in development and differentiation.
KLF1, Krüppel-like factor 1 | Regulates red blood cell development and hemoglobin expression.
STAT3, Signal transducer and activator of transcription 3 | Regulates immune response, cell survival, and inflammation processes.
ZIC3, Zinc finger of the cerebellum 3 | Regulates brain and heart development, influencing neuronal patterning and function.
NHLH2, Nescient helix-loop-helix 2 | Regulates neural differentiation and development, influencing nervous system patterning.
ZNF454, Zinc finger protein 454 | Involved in transcriptional regulation, potentially affecting gene expression and development.
EBF2, Early B-cell factor 2 | Regulates adipocyte differentiation and energy metabolism, influencing fat tissue development.
ZNF42, Zinc finger protein 42 | Involved in regulating gene expression and cellular differentiation processes.
ERF::FIGLA, ETS2 repressor factor and factor of germline alpha | Transcription factor complex that regulates ovarian development and folliculogenesis.

Single-Nucleotide-Polymorphisms (SNPs)

SNP | Position | Base Change | Amino Acid Change | Mutation Type | Significance | Clinical Significance
rs963273525 | Amino acid 1 | T to C | Met to Val | Missense | In the start codon (CDS) | N/A
rs1800736098 | Base pair 8 | C to A | N/A | Transversion | Transcription factor binding site (NHLH2) in the 5' UTR, conserved in all orthologs tested | N/A
rs1296872402 | Base pair 2425 | T to G | N/A | Transversion | PolyA signal (3' UTR), conserved in all orthologs tested | N/A

The table above lists three SNPs that occur within the CDS, 5' UTR, and 3' UTR. These SNPs were found using Variation Viewer[13] and were chosen because of their location within the C6ORF47 gene. Variation Viewer showed no pathogenic SNPs, only large deletions that span many genes.
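The "Transversion" label in the table follows the usual rule that a substitution between a purine and a pyrimidine is a transversion, while a purine-to-purine or pyrimidine-to-pyrimidine change is a transition. A minimal sketch of that rule is shown below; the function name `classify_substitution` is a hypothetical helper for illustration and is not part of Variation Viewer.

```python
# Purines and pyrimidines among the four DNA bases.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref: str, alt: str) -> str:
    """Classify a single-base substitution as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError("bases must be one of A, C, G, T")
    if ref == alt:
        raise ValueError("reference and alternate base are identical")
    # Same chemical class on both sides means a transition.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


# Base changes taken from the SNP table above.
for ref, alt in [("T", "C"), ("C", "A"), ("T", "G")]:
    print(ref, "to", alt, "is a", classify_substitution(ref, alt))
```

By this rule the C to A and T to G changes are transversions, as the table states, while the T to C change in the start codon is a transition at the nucleotide level; the table describes that SNP by its effect on the protein (missense) instead.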

Protein


Basic Information

  • The encoded protein has a molecular weight of 31,579 daltons (~31.6 kDa).[6]
  • EMBL-EBI SAPS[14] calculated an isoelectric point of 5.95 for the human C6ORF47 protein.
  • The abundance of the C6ORF47 protein is slightly above the median for proteins in the human body.[15]
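As a rough plausibility check, 31,579 Da over 294 residues is about 107.4 Da per residue, which is in the usual range for proteins. The sketch below shows how such a mass is commonly estimated from a sequence: sum the average residue masses and add one water molecule. The residue masses are standard average values; the function name and the example peptides are hypothetical, and the C6ORF47 sequence itself is not reproduced here.

```python
# Average residue masses in daltons (amino acid mass minus one water).
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # one water is added back for the free N and C termini


def molecular_weight(sequence: str) -> float:
    """Estimate the average molecular weight (Da) of an unmodified protein."""
    seq = sequence.upper()
    unknown = set(seq) - set(AVERAGE_RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residues: {sorted(unknown)}")
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in seq) + WATER_MASS


# Hypothetical short peptide, used only to show the call.
print(round(molecular_weight("MGLP"), 2))
# Mass per residue implied by the figures quoted above.
print(round(31579 / 294, 1))
```

This estimate ignores post-translational modifications, so it is only a first approximation for a protein that carries phosphate or other groups.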

Family


The C6ORF47 protein is grouped with the major histocompatibility complex (MHC) proteins. The MHC is a region on the short arm of chromosome 6, at 6p21.3, that spans 3.6 megabases.[16] The general function of MHC molecules is to bind peptide fragments derived from pathogens and display them on the cell surface for recognition by T cells.[17] C6ORF47 is considered an MHC class III protein.[18] MHC class III proteins are poorly defined structurally and functionally. The MHC class III region contains genes for cytokines and heat shock proteins, and genes encoded in the telomeric part of the region appear to be involved in specific and global inflammatory responses.[19]

Primary


Human C6ORF47 mRNA encodes a 294-amino-acid protein. SAPS showed that the protein is enriched in leucine, proline, and glycine compared to other human proteins.[14] It also showed a significantly lower amount of isoleucine, as well as lower amounts of valine, tyrosine, threonine, phenylalanine, and asparagine, than is typical of human proteins. Motif Scan identified a basic leucine zipper, in which leucine residues repeat every seven amino acids (shown in blue text in the conceptual translation), and this motif is conserved in mammalian orthologs of the C6ORF47 protein.[20]

Conceptual Translation of human C6ORF47 gene.
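The two observations above, residue enrichment and leucines spaced seven residues apart, can both be checked directly on a protein sequence. The sketch below is a simplified illustration and not the method used by SAPS or Motif Scan; the function names, the `min_repeats` threshold, and the test sequence are assumptions made for the example.

```python
from collections import Counter


def composition_percent(sequence: str) -> dict:
    """Return the percentage of each residue type in the sequence."""
    seq = sequence.upper()
    counts = Counter(seq)
    return {aa: 100 * n / len(seq) for aa, n in counts.items()}


def leucine_heptads(sequence: str, min_repeats: int = 4) -> list:
    """Find runs where leucine recurs at every 7th position.

    Returns (start, repeats) tuples; start is the 1-based position of the
    first leucine in the run.
    """
    seq = sequence.upper()
    runs = []
    for i, aa in enumerate(seq):
        if aa != "L":
            continue
        # Skip leucines that only continue a run begun 7 residues earlier.
        if i >= 7 and seq[i - 7] == "L":
            continue
        repeats, j = 0, i
        while j < len(seq) and seq[j] == "L":
            repeats += 1
            j += 7
        if repeats >= min_repeats:
            runs.append((i + 1, repeats))
    return runs


# Hypothetical sequence with four leucines spaced seven residues apart.
toy = "AAA" + "LAAAAAA" * 4 + "AAA"
print(leucine_heptads(toy))
print(composition_percent("LLPG"))
```

A real zipper search would also check the basic region that precedes the heptad repeats, which this sketch leaves out.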

Secondary


PredictProtein[21] predicted that the secondary structure of the human C6ORF47 protein was 35.4% helix, 2.4% strand, and 62.2% loop.

Tertiary

I-TASSER predicted tertiary structure for human C6ORF47 protein.[22]

The PSORT II prediction tool[23] predicted three transmembrane segments, at amino acids 182-198, 222-238, and 246-262 of the human C6ORF47 protein.

All of the mammalian orthologs examined, except the platypus, show transmembrane regions at similar amino acid positions (see the ortholog table for the mammalian orthologs used).

Because C6ORF47 orthologs outside mammals are mostly much shorter than the mammalian sequences, the predicted cleavage site is usually slightly higher, while the transmembrane segments vary with the length of the protein sequence. One or two transmembrane segments were found in reptiles, one of the two amphibians, and one fish ortholog, but three transmembrane segments is by far the most common arrangement among orthologs.

PSORT II[23] predicted that the C6ORF47 protein is localized to the endoplasmic reticulum (55.6%). DeepLoc[24] further supports this localization, giving a probability of about 86.12% for the endoplasmic reticulum. It also supports the earlier predictions by PSORT II and SOSUI that the human C6ORF47 protein is a transmembrane protein (93.6% probability).

Post-Translational Modifications


Phosphorylation has been experimentally confirmed at amino acids 34, 35, 71, and 90 of the human C6ORF47 protein, according to NCBI.[6] Sites 34 and 35 are predicted to be phosphorylated by casein kinase II.[20]

Endoplasmic Reticulum (ER) signals ensure the protein remains in the endoplasmic reticulum, aiding proper folding, quality control, and trafficking.

Sumoylation attaches SUMO proteins to targets, regulating nuclear transport, transcription, DNA repair, and protein stability. Sumoylation sites were predicted at amino acids 75, 114, and 147.[25]

O-linked β-N-acetylglucosamine (O-GlcNAc) modifies serine and threonine residues, dynamically regulating signaling, transcription, and protein-protein interactions. An O-GlcNAc site was predicted at amino acid 60.[26]

Human C6ORF47 protein with annotated domains, transmembrane regions, and post-translational modifications. P=experimentally proven phosphorylation sites, Pre. P=predicted phosphorylation sites that showed a likelihood of above 0.5, Pre. S= predicted sumoylation sites, TM= transmembrane segments, and Pre-O-GlcNAc= predicted O-linked β-N-acetylglucosamine.[27]

Interactions


FGFR3: An interaction between C6ORF47 and FGFR3 was detected by a two-hybrid assay with a medium average detection confidence. The interaction is recorded in the BioGRID interaction database and comes from a large-scale dataset dated August 2022, in which this interaction was scored individually and all other interactions were scored globally.[7][28]

Fibroblast growth factor receptor 3 (FGFR3) is a member of the fibroblast growth factor receptor family, whose members share similar structures and functions. FGFR3 spans the cell membrane, with one end remaining inside the cell and the other end projecting from the outer surface of the cell.[29] This positioning lets FGFR3 interact with growth factors outside the cell and receive signals that regulate growth and development.[29] FGFR3 plays an important role in cartilage development in the growth plate. It is a tyrosine-protein kinase that acts as a cell-surface receptor for fibroblast growth factors and plays an essential role in cell proliferation, angiogenesis, differentiation, and apoptosis.[30]

Homology


Orthologs


The C6ORF47 gene is estimated to have first appeared approximately 563 million years ago (MYA) in lampreys. C6ORF47 was found in ray-finned fish (Actinopterygii), cartilaginous fish, lampreys, and lobe-finned fish (Sarcopterygii), but not in hagfish, suggesting that the gene may have been inserted in lampreys. C6ORF47 is restricted to vertebrates, with no trace of it before vertebrates; lampreys (563 MYA) are the most distant lineage in which it was found. The C6ORF47 gene has evolved quite rapidly: slightly more slowly than fibrinogen alpha and much faster than cytochrome c. The orthologs used for this comparison were human, house mouse, African bush elephant, koala, painted turtle, eastern brown snake, Iberian ribbed newt, West African lungfish, zebrafish, seven-gill sharpnose shark, and sea lamprey (see the time-calibrated date of divergence diagram).

Time-calibrated comparative date of divergence diagram comparing the evolution of C6ORF47, cytochrome c, and fibrinogen alpha. Orthologs used for this diagram: human, house mouse, African bush elephant, koala, painted turtle, eastern brown snake, Iberian ribbed newt, West African lungfish, zebrafish, seven-gill sharpnose shark, and sea lamprey.


A global alignment of the human C6ORF47 protein with the seven-gill sharpnose shark C6ORF47 protein showed two large gaps, at amino acids 44-62 and 153-173 of the human protein. These gaps were present in all vertebrate lineages until rodents and rabbits. A second global alignment, of the human C6ORF47 protein with the Pacific pocket mouse (rodent) C6ORF47 protein, shows that these gaps are no longer present, indicating possible insertions at these positions in mammals. The Pacific pocket mouse C6ORF47 protein was one of the least related rodent sequences in the ortholog table, yet neither of the two large gaps appeared when it was aligned with the human C6ORF47 protein sequence.[31]






Ortholog Table for C6ORF47 Protein

Group | Genus and Species | Common Name | Taxonomic Order | Date of Divergence (MYA) | Accession #[32] | Sequence Length (aa) | Identity (%) | Similarity (%) | Gaps (%)
Mammals | Homo sapiens | Humans | Primates | 0 | NP_067007 | 294 | 100 | 100 | 0
Mammals | Perognathus longimembris pacificus | Pacific Pocket Mouse | Rodentia | 87 | XP_048204128 | 293 | 79.3 | 84 | 0.3
Mammals | Mus musculus | House Mouse | Rodentia | 87 | NP_258438 | 293 | 75.9 | 81.3 | 0.3
Mammals | Loxodonta africana | African Bush Elephant | Proboscidea | 99 | XP_003422325 | 297 | 74.5 | 81.5 | 1.7
Mammals | Phascolarctos cinereus | Koala | Diprotodontia | 160 | XP_020829739 | 302 | 54 | 63.8 | 10.8
Mammals | Vombatus ursinus | Common Wombat | Diprotodontia | 160 | XP_027732497 | 300 | 55 | 66.1 | 6.5
Mammals | Ornithorhynchus anatinus | Platypus | Monotremata | 180 | XP_028911230 | 241 | 42.3 | 50.2 | 31.2
Reptiles | Chrysemys picta bellii | Painted Turtle | Testudines | 319 | XP_005289373 | 199 | 23.4 | 29.4 | 52
Reptiles | Terrapene triunguis | Three-toed Box Turtle | Testudines | 319 | XP_024079724 | 174 | 20.4 | 25.1 | 59.9
Reptiles | Anolis sagrei | Brown Anole | Squamata | 319 | XP_060615449 | 217 | 25.9 | 34.5 | 36.7
Reptiles | Pseudonaja textilis | Eastern Brown Snake | Squamata | 319 | XP_026575869 | 212 | 27.4 | 36.6 | 38.9
Amphibians | Xenopus laevis | African Clawed Frog | Anura | 352 | XP_018088740 | 224 | 21.4 | 30.3 | 39.6
Amphibians | Pleurodeles waltl | Iberian Ribbed Newt | Urodela | 352 | KAJ1134448 | 268 | 27.1 | 32.9 | 36.2
Fish | Protopterus annectens | West African Lungfish | Lepidosireniformes | 408 | XP_043939206 | 289 | 27.7 | 39.2 | 24.4
Fish | Misgurnus anguillicaudatus | Pond Loach | Cypriniformes | 429 | XP_055075080 | 302 | 25.4 | 39 | 27.7
Fish | Cirrhinus molitorella | Mud Carp | Cypriniformes | 429 | KAK2887169 | 311 | 24 | 38 | 23.1
Fish | Danio rerio | Zebrafish | Cypriniformes | 429 | NP_001410332 | 315 | 22.5 | 35.7 | 28.9
Fish | Carcharodon carcharias | Great White Shark | Lamniformes | 462 | XP_041069364 | 250 | 22.5 | 30.4 | 40.9
Fish | Heptranchias perlo | Seven-gill Sharpnose Shark | Hexanchiformes | 462 | XP_067830079 | 249 | 25.5 | 37.6 | 27.1
Fish | Lethenteron reissneri | Far Eastern Brook Lamprey | Petromyzontiformes | 563 | XP_061406601 | 217 | 22.1 | 29.5 | 36.2
Fish | Petromyzon marinus | Sea Lamprey | Petromyzontiformes | 563 | XP_032814877 | 215 | 22.7 | 30.4 | 35.3

The table above lists 20 orthologs of the C6ORF47 protein. It includes a couple of orthologs from each major class of vertebrates except Aves (Agnatha, Chondrichthyes, Osteichthyes, Amphibia, Reptilia, Mammalia), because the C6ORF47 gene is conserved in vertebrates. The identity, similarity, and gap values compare the amino acid sequence of each ortholog with that of the human C6ORF47 protein.
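Identity and gap percentages of this kind are usually reported relative to the full length of the pairwise alignment, gap columns included. The sketch below shows that arithmetic on two already-aligned sequences. It is an illustration under that assumption, not a reimplementation of EMBOSS Needle; similarity is omitted because it depends on a substitution matrix, and the function name and toy sequences are hypothetical.

```python
def alignment_stats(aligned_a: str, aligned_b: str) -> dict:
    """Compute percent identity and percent gaps for two aligned sequences.

    Both strings must have the same length and use '-' for gap positions.
    Percentages are relative to the total alignment length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    length = len(aligned_a)
    identical = gaps = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            gaps += 1          # a gap in either sequence counts as a gap column
        elif a == b:
            identical += 1     # same residue in both sequences
    return {
        "identity": 100 * identical / length,
        "gaps": 100 * gaps / length,
    }


# Hypothetical 7-column alignment: 4 identical columns and 2 gap columns.
stats = alignment_stats("MKT-LLV", "MRTALL-")
print({name: round(value, 1) for name, value in stats.items()})
```

Because the denominator is the alignment length, a short ortholog aligned to the 294-residue human protein accumulates a high gap percentage and a correspondingly low identity, which matches the pattern seen for the reptile sequences in the table.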

The C6ORF47 ortholog phylogenetic tree is limited to vertebrates because it is only conserved back to vertebrates.
Abbreviation (From MYA Youngest to Oldest) Common Name
Hsa Humans
Mum House Mouse
Phc Koala
Ora Platypus
Heb (319 MYA) Bynoe's Gecko
Ans Brown Anole
Pst Eastern Brown Snake
Xel African Clawed Frog
Plw Iberian ribbed newt
Pra West African Lungfish
Mia Pond Loach
Cim Mud Carp
Dar Zebra fish
Cac Great White Shark
Hst Seven-gill Sharpnose Shark
Ebl Far Eastern Brook Lamprey
Sel Sea Lamprey

Paralogs


No paralogs of the C6ORF47 gene were found in humans.[7][33]

Conserved Regions


The promoter region contains many stretches of nucleotides that are conserved across mammalian orthologs, including transcription factor binding sites for SP9 (at least one site, just upstream of the 5' UTR), NHLH2 and ERF::FIGLA (just after the start of transcription), ZNF454 (about 20 nucleotides downstream of the previous site), EBF1 and EBF2 (about 330 base pairs downstream of the transcription start), NR5A2, ZNF423, and STAT3 (all about 120 base pairs downstream of the previous site), and ZNF42 (overlapping the start of the coding sequence).

Multiple sequence alignments with C6ORF47 orthologs showed that there were many amino acids on the C-terminal side of the protein that are conserved while there is much less conservation in the N-terminal side. This is likely due to the protein containing a large disordered region on the N-terminal side.

The 3' UTR was found to contain nine conserved miRNA target sites, which are listed in the table below.

Conserved sites in the 3' UTR
miRNA | Position in the UTR | Seed match
hsa-miR-125b-5p | 85-92 | 8mer
hsa-miR-4319 | 85-92 | 8mer
hsa-miR-125a-5p | 85-92 | 8mer
hsa-miR-138-5p | 204-210 | 7mer-m8
hsa-miR-24-3p | 438-445 | 8mer
hsa-miR-137 | 677-684 | 8mer
hsa-miR-325-3p | 679-685 | 7mer-1A
hsa-miR-140-5p | 714-720 | 7mer-1A
hsa-miR-142-3p.1 | 716-722 | 7mer-1A
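The seed match labels in the table follow a common convention for miRNA target sites: an 8mer pairs with miRNA nucleotides 2-8 and has an adenine opposite nucleotide 1, a 7mer-m8 pairs with nucleotides 2-8 only, and a 7mer-1A pairs with nucleotides 2-7 and has the adenine opposite nucleotide 1. The sketch below classifies sites under that convention. The function names and the miRNA and UTR sequences are made up for illustration, and the source does not state which tool produced the table.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def classify_seed_sites(mirna: str, utr: str) -> list:
    """Find 8mer, 7mer-m8 and 7mer-1A sites for one miRNA in a UTR.

    Returns (start, end, site_type) tuples with 1-based inclusive positions.
    """
    mirna = mirna.upper().replace("T", "U")
    utr = utr.upper().replace("T", "U")
    core = reverse_complement(mirna[1:7])   # pairs with miRNA nucleotides 2-7
    match8 = COMPLEMENT[mirna[7]]           # pairs with miRNA nucleotide 8
    sites = []
    start = utr.find(core)
    while start != -1:
        end = start + 6                     # index just past the 6-nt core
        has_m8 = start > 0 and utr[start - 1] == match8
        has_a1 = end < len(utr) and utr[end] == "A"
        if has_m8 and has_a1:
            sites.append((start, start + 7, "8mer"))
        elif has_m8:
            sites.append((start, start + 6, "7mer-m8"))
        elif has_a1:
            sites.append((start + 1, start + 7, "7mer-1A"))
        start = utr.find(core, start + 1)
    return sites


# Made-up miRNA and UTR containing one site of each type.
toy_mirna = "UACGUACGGAUUCAGCUAGCU"
toy_utr = "GGGCGUACGUAGGGCGUACGUGGGUGUACGUAGG"
print(classify_seed_sites(toy_mirna, toy_utr))
```

The position spans this produces have the same widths as those in the table: eight nucleotides for an 8mer (for example 85-92) and seven for the two 7mer types (for example 204-210).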

References

  1. ^ a b c GRCh38: Ensembl release 89: ENSG00000203623, ENSG00000235360, ENSG00000228177, ENSG00000228435, ENSG00000204439, ENSG00000226103, ENSG00000226531 - Ensembl, May 2017
  2. ^ a b c GRCm38: Ensembl release 89: ENSMUSG00000043311 - Ensembl, May 2017
  3. ^ "Human PubMed Reference:". National Center for Biotechnology Information, U.S. National Library of Medicine.
  4. ^ "Mouse PubMed Reference:". National Center for Biotechnology Information, U.S. National Library of Medicine.
  5. ^ "Homo sapiens chromosome 6 open reading frame 47 (C6orf47), mRNA". NCBI. 2024-04-04.
  6. ^ a b c "uncharacterized protein C6orf47 [Homo sapiens]". NCBI Protein. NCBI. Retrieved 26 September 2024.
  7. ^ a b c d "C6orf47 Gene - Chromosome 6 Open Reading Frame 47". Gene Card The Human Gene Database. Weizmann Institute of Science. Retrieved 26 September 2024.
  8. ^ "GDS1096 / 204968_at". www.ncbi.nlm.nih.gov. Retrieved 2024-12-15.
  9. ^ "GDS1096 / 204968_at". www.ncbi.nlm.nih.gov. Retrieved 2024-12-05.
  10. ^ Boström, Pontus; Magnusson, Björn; Svensson, Per-Arne; Wiklund, Olov; Borén, Jan; Carlsson, Lena M. S.; Ståhlman, Marcus; Olofsson, Sven-Olof; Hultén, Lillemor Mattsson (August 2006). "Hypoxia converts human macrophages into triglyceride-loaded foam cells". Arteriosclerosis, Thrombosis, and Vascular Biology. 26 (8): 1871–1876. doi:10.1161/01.ATV.0000229665.78997.0b. ISSN 1524-4636. PMID 16741148.
  11. ^ "Six-Frame Translation". www.bioline.com. Retrieved 2024-12-05.
  12. ^ "UCSC Genome Browser Home". genome.ucsc.edu. Retrieved 2024-12-05.
  13. ^ "Variation Viewer". www.ncbi.nlm.nih.gov. Retrieved 2024-12-13.
  14. ^ a b "SAPS". www.ebi.ac.uk. Retrieved 2024-12-05.
  15. ^ "PaxDb: Protein Abundance Database". pax-db.org. Retrieved 2024-12-14.
  16. ^ Mungall, A. J.; Palmer, S. A.; Sims, S. K.; Edwards, C. A.; Ashurst, J. L.; Wilming, L.; Jones, M. C.; Horton, R.; Hunt, S. E.; Scott, C. E.; Gilbert, J. G. R.; Clamp, M. E.; Bethel, G.; Milne, S.; Ainscough, R. (October 2003). "The DNA sequence and analysis of human chromosome 6". Nature. 425 (6960): 805–811. doi:10.1038/nature02055. ISSN 1476-4687.
  17. ^ Charles A Janeway, Jr; Travers, Paul; Walport, Mark; Shlomchik, Mark J. (2001), "The major histocompatibility complex and its functions", Immunobiology: The Immune System in Health and Disease. 5th edition, Garland Science, retrieved 2024-10-16
  18. ^ Lehner, Ben; Semple, Jennifer I; Brown, Stephanie E; Counsell, Damian; Campbell, R. Duncan; Sanderson, Christopher M (2004-01-01). "Analysis of a high-throughput yeast two-hybrid system and its use to predict the function of intracellular proteins encoded within the human MHC class III region". Genomics. 83 (1): 153–167. doi:10.1016/S0888-7543(03)00235-0. ISSN 0888-7543.
  19. ^ Gruen, J R; Weissman, S M (2001-08-01). "Human MHC class III and IV genes and disease associations". Frontiers in bioscience. 6: D960–72. doi:10.2741/gruen. ISSN 1093-9946. PMID 11487469.
  20. ^ a b "Motif Scan". myhits.sib.swiss. Retrieved 2024-12-05.
  21. ^ "PredictProtein - Protein Sequence Analysis, Prediction of Structural and Functional Features". predictprotein.org. Retrieved 2024-12-05.
  22. ^ "I-TASSER results". seq2fun.dcmb.med.umich.edu. Retrieved 2024-12-13.
  23. ^ a b "PSORT II Prediction". psort.hgc.jp. Retrieved 2024-12-05.
  24. ^ "DeepLoc 2.1 - DTU Health Tech - Bioinformatic Services". services.healthtech.dtu.dk. Retrieved 2024-12-05.
  25. ^ "GPS-SUMO: Prediction of SUMOylation Sites & SUMO-interacting Motifs". sumo.biocuckoo.cn. Retrieved 2024-12-14.
  26. ^ "YinOYang 1.2 - DTU Health Tech - Bioinformatic Services". services.healthtech.dtu.dk. Retrieved 2024-12-14.
  27. ^ "IBS 2.0: Illustrator for Biological Sequences". ibs.renlab.org. Retrieved 2024-12-04.
  28. ^ "STRING: functional protein association networks". string-db.org. Retrieved 2024-12-05.
  29. ^ a b "FGFR3 gene: MedlinePlus Genetics". medlineplus.gov. Retrieved 2024-12-13.
  30. ^ "STRING: functional protein association networks". string-db.org. Retrieved 2024-12-10.
  31. ^ "Emboss Needle". www.ebi.ac.uk. Retrieved 2024-12-15.
  32. ^ "uncharacterized protein C6orf47 [Homo sapiens]". NCBI Protein. NCBI. Retrieved 26 September 2024.
  33. ^ "Protein BLAST: search protein databases using a protein query". blast.ncbi.nlm.nih.gov. Retrieved 2024-10-16.