Sequence Baby: Prenatal & Neonatal Genetic Analysis Resource: 2010

Tuesday, November 30, 2010

The role of Oxytocin in Childhood Memories

An interesting article in Science Daily today. It is not directly related to prenatal diagnostics, but it will interest anyone in the field.

http://www.sciencedaily.com/releases/2010/11/101129152433.htm?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+sciencedaily+%28ScienceDaily%3A+Latest+Science+News%29&utm_content=Google+Feedfetcher

Researchers have found that the naturally occurring hormone and neurotransmitter oxytocin intensifies men's memories of their mother's affections during childhood. The study was published November 29 in Proceedings of the National Academy of Sciences.

Researchers at the Seaver Autism Center for Research and Treatment at Mount Sinai School of Medicine wanted to determine whether oxytocin, a hormone and neurotransmitter that is known to regulate attachment and social memory in animals, is also involved in human attachment memories. They conducted a randomized, double-blind, placebo-controlled, cross-over trial, giving 31 healthy adult men oxytocin or a placebo delivered nasally on two occasions. Prior to administering the drug/placebo, the researchers measured the men's attachment style. About 90 minutes after administering the oxytocin or the placebo, the researchers assessed participants' recollection of their mother's care and closeness in childhood.

Monday, October 11, 2010

Lana & Dave Asprey's Better Baby Book is coming to stores near you!

Dave Asprey and his wife Lana, a Karolinska Institute-trained MD, launched the BetterBabyBook.com website to promote their book on prenatal, perinatal and neonatal baby nutrition, exercise and education to nurture better, smarter and autism-free kids. They also launched betterbabyblog.com.
I am waiting eagerly for this book to come out.

Monday, July 12, 2010

Daily Telegraph & Foxnews articles on cost-effective DS diagnostics

An interesting article appeared in the UK's Daily Telegraph on June 30 on Down syndrome (DS) diagnostics using fetal DNA. They believe the test could cost as little as 30 GBP... Unlikely, but...
Fox News published a follow-up story based on the Telegraph article. Now they believe the test will cost $36, and they have started talking about pro-life concerns....

Original Daily Telegraph article (http://www.telegraph.co.uk/health/healthnews/7862624/Blood-test-for-Downs-syndrome.html):

Researchers hope it will provide a better alternative to invasive tests which give an accurate result, but raise the risk of the mother suffering a miscarriage.

They hope to have the test available within four years and have suggested it may eventually cost as little as £30 per patient.

The new test works by extracting the DNA of the foetus from the mother's blood and screening it for Down's syndrome and other abnormalities.

At present, pregnant women are given the odds on whether they are carrying a child with Down's syndrome, and if they want to know for certain they have to undergo one of two invasive processes; either amniocentesis or chorionic villus sampling. The first involves taking a sample of fluid from around the foetus and can, in some cases, cause a miscarriage even if the woman is carrying a healthy foetus. The second requires taking a fragment of the placenta.

The new test involves the same equipment needed for amniocentesis testing, but uses blood instead of amniotic fluid and is not invasive.

So far, researchers have been able to prove the technique works in principle and have described the results as “promising”. They hope to use the same method to detect other abnormalities in an unborn child’s DNA such as Edwards’ syndrome, which causes structural malformations in the foetus, and Patau’s syndrome, which can result in severe physical and mental impairment and is often fatal.

It could also be used to screen for muscular dystrophy and haemophilia.

Research on the new test began in 2009 and is ongoing. To date it has involved 21 women who have had pregnancy terminations or pre-natal diagnosis and screening procedures. Dr Suzanna Frints, of Maastricht University Medical Centre in the Netherlands, who carried out the research, said she hoped all women in the world would eventually be offered the test.

She said the next phase of development would need to involve more women to establish the accuracy of the test.

“Although we need to test and refine this technique further our results so far are promising,” she said.

“When we succeed in developing the procedure for use in maternal blood we will be able to offer a safe, cheap, fast, reliable and accurate non-invasive test, which will be of immediate benefit to pregnant women, young and old, all over the world.”

Down’s syndrome is a genetic abnormality that affects around one in 1,000 babies born in the UK – about 750 babies a year – and is the most common cause of learning disability.

Prof Stephen Robson, spokesman for the Royal College of Obstetricians and Gynaecologists, said there was an ‘enormous research effort’ going into finding the ‘holy grail’ of a non-invasive test for Down’s syndrome.

Ultrasound scans of the baby at 12 or 13 weeks and again at around 20 weeks will still be necessary to detect other abnormalities in the foetus as well as check age and growth.

Fox News follow-up (http://www.foxnews.com/printer_friendly_story/0,3566,595705,00.html):

A simple blood test may one day become a safer alternative for checking if an unborn baby has Down syndrome or other disorders, the Daily Telegraph reported.

The test, which takes a blood sample from a pregnant woman to examine the DNA of the fetus, would cost as little as $36, and could be available within four years, according to the report.

It would provide an inexpensive and much less invasive way to detect many genetic abnormalities in fetuses, but it also raises concerns among pro-life advocates who say it could result in more abortions.

“If it might more conclusively prevent false positives, it might have some benefit, but it will also likely lead to more abortions of children with disabilities,” Mailee Smith, staff counsel for Americans United for Life, told FoxNews.com.

The tests currently used to determine if an unborn child has Down syndrome are both quite invasive. One is an amniocentesis, where doctors extract amniotic fluid from around the fetus. The other is a procedure known as chorionic villus sampling, which involves the removal of a small piece of placenta tissue. Researchers hope the new test will become a safer alternative to the current procedures, which are highly accurate, but raise the mother’s risk of suffering a miscarriage.

Dr. Brian Skotko, a physician at the Children's Hospital Boston who is on the board of directors of the National Down Syndrome Society, told FoxNews.com that many doctors aren't adequately trained to counsel women on having children with Down syndrome, and worse, some who diagnose an expectant couple's child with Down syndrome encourage them to terminate the pregnancy.

"The age is swiftly coming where not all possible technologic advances may bring welcomed change. Parents who have children with Down syndrome have already found much richness in life with an extra chromosome," Skotko wrote in an article published in the BMJ in October 2009.

Dr. Suzanna Frints, of Maastricht University Medical Center in the Netherlands, began the ongoing research with her team in 2009, and claims that their technique is 80 percent reliable. Her team has proven their technique works by using the mother’s blood to identify the Y chromosome from the fetus.

Twenty-one women who have either had abortions or undergone amniocentesis or another prenatal screening procedure have participated in the research. But to establish the accuracy of the test, Frints said the next phase of development would need to involve more women.

Frints described the results as “promising,” and hopes that their technique will be able to screen for other abnormalities, like muscular dystrophy, hemophilia, Edwards syndrome and Patau syndrome.

“When we succeed in developing the procedure for use in maternal blood, we will be able to offer a safe, cheap, fast, reliable and accurate non-invasive test, which will be of immediate benefit to pregnant women, young and old, all over the world,” Frints said.

Professor Stephen Robson, spokesman for the Royal College of Obstetricians and Gynaecologists, considers a non-invasive test for detecting Down syndrome the “holy grail” and said there was an “enormous research effort” behind it.

Down syndrome is a genetic abnormality that affects around 1 in 800 babies born in the U.S., and is the most common genetic cause of severe learning disability.

Wednesday, May 12, 2010

Rapid Prenatal Test for Alpha-Thalassemia

An interesting article in Science Daily today on alpha-thalassemia screening. We will read the original paper and report on the assay methods they are using.

ScienceDaily (May 11, 2010) — Researchers from Mahidol University in Thailand have developed a rapid, high-throughput screening method for prevention and control of the blood disease thalassemia.

Their report appears in the May 2010 issue of The Journal of Molecular Diagnostics.

α-Thalassemia is a blood disease caused by a genetic defect in the production of a component of hemoglobin. This disease is more prevalent in areas that either were previously or are currently endemic for malaria, including the Mediterranean and South Asia. Carriers of mutations in α-thalassemia may have some degree of protection against malaria, but children of parents who both carry the mutation α-thalassemia-1 may develop Hb Bart's hydrops fetalis, which results in fetal death in utero or soon after birth.

Prenatal screening and genetic counseling are essential for prevention and control of α-thalassemia. The current diagnostic assay is both labor-intensive and time-consuming. Therefore, researchers led by Dr. Saovaros Svasti of Mahidol University developed a novel, rapid, and reliable assay for the diagnosis of α-thalassemias. This assay has high sensitivity and specificity, rapid turnaround time, and a decreased risk of contamination between samples.

Munkongdee et al. suggest that this technique will "allow [for] high throughput screening suitable for prevention and control of thalassemia in the Southeast Asia population."

This study was supported in part by Vejdusit Foundation.

Story Source:

Adapted from materials provided by American Journal of Pathology, via EurekAlert!, a service of AAAS.

Journal Reference:

1. Munkongdee T, Vattanaviboon P, Thummarati P, Sewamart P, Winichagoon P, Fucharoen S, Svasti S. Rapid diagnosis of α-thalassemia by melting curve analysis. Journal of Molecular Diagnostics, 2010; 12: 354-358 DOI: 10.2353/jmoldx.2010.090136
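The journal reference names the method: melting curve analysis. We have not yet gone through the paper's protocol, so what follows is only a generic sketch of how a melting peak is usually read, not the authors' assay. As the instrument heats a PCR product, fluorescence from a double-stranded-DNA dye (or probe) drops as the strands separate; the melting temperature (Tm) is taken where -dF/dT peaks, and different alleles can be told apart by their distinct Tm values. The function names `melting_peak` and `simulated_curve` and the simulated sigmoid curves are illustrative assumptions.

```python
import math


def melting_peak(temps, fluorescence):
    """Return the temperature at which -dF/dT is largest (the apparent Tm).

    temps must be sorted ascending; the derivative is estimated with a
    central finite difference, so the first and last points are skipped.
    """
    best_temp, best_rate = None, -math.inf
    for i in range(1, len(temps) - 1):
        dfdt = (fluorescence[i + 1] - fluorescence[i - 1]) / (temps[i + 1] - temps[i - 1])
        if -dfdt > best_rate:  # fluorescence falls during melting, so track -dF/dT
            best_rate, best_temp = -dfdt, temps[i]
    return best_temp


def simulated_curve(tm, width=0.5, start=75.0, stop=95.0, step=0.1):
    """Hypothetical sigmoid melting curve centred on tm (illustration only)."""
    n = int(round((stop - start) / step)) + 1
    temps = [start + i * step for i in range(n)]
    fluor = [1.0 / (1.0 + math.exp((t - tm) / width)) for t in temps]
    return temps, fluor


if __name__ == "__main__":
    # Two hypothetical amplicons that melt at different temperatures
    for label, tm in [("allele 1", 83.0), ("allele 2", 86.5)]:
        temps, fluor = simulated_curve(tm)
        print(label, round(melting_peak(temps, fluor), 1))
```

Real instruments also smooth the raw signal and normalize each curve before taking the derivative; the sketch skips that to keep the core idea visible.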

Sunday, May 9, 2010

Diseases that may result from competition between the mother and the baby

A very interesting article in Science Daily today titled: "Gender Specific Disease Risks Start in the Womb":

Pregnancy places competing demands on a mother's physiology: her body wants to produce a strong, healthy baby, but not at the expense of her own health. Some of the genes that she passes on to her child therefore try to protect her own body from excessive demands from her child. However, the so-called "imprinted genes" inherited from the father do not show the same restraint -- their goal is to get as many resources for the fetus as possible.

Evidence that this battle of the imprinted genes might be at the root of later life disease processes will be presented at the International Conference The Power of Programming in Munich on 6 to 8 May, organized by the EC-funded Early Nutrition Programming Project (EARNEST).

"The imprinted genes derived from the father are greedy whilst those from the mother are conservative in their needs to ensure future reproductive success," said Dr. Miguel Constancia from the University of Cambridge, England. "We have found evidence that imprinted genes play important roles in the control of endocrine functions of the placenta. These placental adaptations have marked effects on nutrient delivery to the fetus, resulting in the programming of homeostatic mechanisms with metabolic consequences extending to adulthood, for example for type 2 diabetes susceptibility."

There is evidence that some programming effects are different in male and female offspring. Dr. Rachel Dakin from the University of Edinburgh, Scotland, shows how maternal obesity is associated with sex-specific programming effects in young adult mice. Female offspring of obese mothers had raised blood insulin levels, whilst male offspring did not. Male offspring did have alterations in the expression of liver genes important in lipid and glucocorticoid metabolism.

Professor Claudine Junien from the Institut National de Recherche Agronomique (INRA) in France says: "For me a gene, a cell and even a sex does not think and has no intelligent design. Instead it reacts to diverse environments and situations according to what its build-up can afford, pushing in one direction or another (or several at a time). The limits to which it can go without going awry or dying have been established progressively throughout the slow and long process of evolution, with different genetic backgrounds throughout the world depending on the diversity of experiences over the ages. We have data showing that gene expression and DNA methylation are sexually dimorphic in male and female placentae under normal/control conditions. Surprisingly, in stressful conditions, such as a high fat diet or low calorie diet, or maternal overweight/obesity -- the male and female placentae do not use the same strategies: they use different gene pathways and networks to cope with the stress. Does this directly lead to different outcomes? It may lead to sex-dependent differences in the outcome of programming with long lasting effects. Alternatively, it may be that metaphorically speaking males climb the mountain taking the north face while females take the south face -- but they ultimately reach the same peak after using these different paths."

Professor Ricardo Closa Monasterolo from the University Rovira i Virgili of Tarragona, Spain, presents work suggesting that infant boys and girls might respond differently to lower- or higher-protein diets. Females given higher-protein formula milk had higher IGF-1 levels than males, whilst males showed higher C-peptide/creatinine levels than females. The significance of lower or higher protein diets has also been examined in the EU Childhood Obesity Project (CHOP), co-ordinated by Professor Berthold Koletzko of Ludwig-Maximilians-Universität (LMU) in Munich. Starting in 1990, over 1,000 infants were followed.

The first results show that, after 2 years, the infants fed a formula milk with a lower protein content -- closer to the composition of breast milk -- weighed significantly less than those on higher protein formula, with their weights being more similar to those of breast fed infants. These differences emerged by 6 months of age and persisted, even after the intervention ceased and the children went onto similar diets. The researchers predict that these low protein induced differences in early growth would reduce obesity at 14 to 16 years of age by 13%.

Koletzko, who is also the co-ordinator of the EARNEST project, said, "This is a new and exciting area of research which suggests that some of the differences in disease risk seen in men and women in later life might be explained by different responses to programming effects in early life."

Source: http://www.sciencedaily.com/releases/2010/05/100506205427.htm?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+sciencedaily+%28ScienceDaily%3A+Latest+Science+News%29&utm_content=Google+Feedfetcher

Happy Mother's Day!

On a note remotely related to sequencing and prenatal diagnostics, we would like to wish all of the mothers and soon-to-be mothers a very Happy Mother's Day!

Unfortunately, many countries disagree on the calendar date for this celebration, which goes back many centuries and can be traced to ancient Greece, and have conveniently aligned it with their respective religious holidays.

The following countries celebrate Mother's Day on the second Sunday of May:

* Australia

* Austria

* Bahamas

* Barbados

* Bangladesh

* Belgium

* Bermuda

* Brazil

* Canada

* Chile

* China

* Colombia

* Cuba

* Croatia

* Czech Republic

* Denmark

* Ecuador

* Finland

* Germany

* Greece

* Hong Kong

* Iceland

* Italy

* India

* Ireland

* Jamaica

* Japan

* Latvia

* Malta

* Mexico

* Netherlands

* New Zealand

* Peru

* Philippines

* Puerto Rico

* Singapore

* South Africa

* Switzerland

* Taiwan

* Turkey

* Uruguay

* UK

* USA

* Zimbabwe

Thursday, May 6, 2010

SEQUENOM TO LAUNCH SEQUENCING-BASED TEST FOR DOWN SYNDROME (T21) SOON!

Big news today:

Sequenom (Nasdaq: SQNM) is planning to launch its non-invasive Down syndrome test as an LDT (laboratory-developed test) at the end of 2011, and likely later as an FDA-regulated test.

So the 35+ women will not need to stay on birth control for much longer!

Let's hope that some socially-responsible investors will help finance the company soon so it starts working on more tests.
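How can sequencing spot an extra chromosome 21 in a blood sample that is mostly the mother's own DNA? The press release does not describe Sequenom's algorithm, but published academic work on sequencing cell-free DNA has used a simple read-counting idea, and the sketch below assumes that approach. If the fetus contributes a fraction f of the cell-free DNA and carries three copies of chromosome 21 instead of two, the chromosome 21 dosage becomes ((1 - f) * 2 + f * 3) / 2 = 1 + f/2 of normal, so at f = 10% the share of reads mapping to chromosome 21 rises by roughly 5%. The sample's chromosome 21 read fraction is then compared with euploid reference pregnancies as a z-score. All read counts, function names and the cut-off of 3 are hypothetical, not the company's parameters.

```python
from statistics import mean, stdev


def chr21_fraction(read_counts: dict[str, int]) -> float:
    """Share of uniquely mapped reads that fall on chromosome 21."""
    return read_counts["chr21"] / sum(read_counts.values())


def chr21_z_score(sample: dict[str, int], references: list[dict[str, int]]) -> float:
    """Number of reference standard deviations the sample's chr21 share sits above the euploid mean."""
    ref_fractions = [chr21_fraction(r) for r in references]
    return (chr21_fraction(sample) - mean(ref_fractions)) / stdev(ref_fractions)


def looks_trisomic(sample: dict[str, int], references: list[dict[str, int]], threshold: float = 3.0) -> bool:
    """Flag the sample when its z-score exceeds the (assumed) cut-off."""
    return chr21_z_score(sample, references) > threshold


# Hypothetical read counts from five euploid reference pregnancies (1,000,000 reads each)
references = [
    {"chr21": 13000, "other": 987000},
    {"chr21": 13100, "other": 986900},
    {"chr21": 12900, "other": 987100},
    {"chr21": 13050, "other": 986950},
    {"chr21": 12950, "other": 987050},
]
# Hypothetical test samples: 10% fetal fraction with trisomy 21 adds ~5% to the chr21 share
trisomic_sample = {"chr21": 13650, "other": 986350}
euploid_sample = {"chr21": 13020, "other": 986980}

if __name__ == "__main__":
    for name, sample in [("trisomic", trisomic_sample), ("euploid", euploid_sample)]:
        z = chr21_z_score(sample, references)
        print(name, round(z, 1), looks_trisomic(sample, references))
```

With these made-up numbers the trisomic sample lands far above the cut-off while the euploid one stays near zero. Published methods typically add steps this sketch omits, such as GC-content correction and much larger reference sets, and a low fetal fraction shrinks the signal, which is one reason large validation studies are needed.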

Here is the press release from the Sequenom.com website:

Sequenom Reports First Quarter 2010 Financial Results
-Company Unveils Trisomy21 Test Development & Launch Timeline-
-First Quarter Revenues Grow 22% Year-Over-Year to $10.6 Million-
SAN DIEGO, May 6, 2010 /PRNewswire via COMTEX/ -- Sequenom, Inc. (Nasdaq: SQNM) today reported its financial results for the first quarter ended March 31, 2010.
First Quarter Results

Total revenue for the first quarter of 2010 grew 22% to $10.6 million, compared with $8.7 million for the first quarter of 2009. The increase in revenue was due to higher systems and consumables sales over the same period last year.
Net loss for the first quarter of 2010 was $16.9 million, or $0.27 per share, compared with $17.5 million, or $0.29 per share, for the first quarter of 2009.
Net cash used in operating activities was $13.4 million for the first quarter of 2010.

Gross margin in the first quarter of 2010 was 50.5% compared with 60.6% for the first quarter of 2009, reflecting increased costs associated with the start-up of the diagnostics business and changes in the mix of products sold in the genetic analysis business. The overall gross margin included a 61% margin generated by the genetic analysis segment which was offset by the negative margin generated from the company's molecular diagnostics segment.
Research and development (R&D) expenses were $11.2 million for the first quarter of 2010, compared with $8.8 million for the same period in the prior year. The increase was primarily related to clinical sample acquisition costs associated with the company's Trisomy 21 (T21) program and a licensing payment for certain intellectual property rights for age-related macular degeneration (AMD) related genetic variants.
Selling, general and administration expenses of $11.1 million for the first quarter of 2010 decreased from $14.3 million for the first quarter of 2009. The decrease was primarily due to decreased legal fees associated with litigation and lower share based compensation expense.
Total costs and expenses for the first quarter of 2010 were $27.5 million, compared with $26.5 million for the comparable quarter in 2009. For the three months ended March 31, 2010 and 2009, the company recorded $2.4 million and $3.0 million, respectively, of stock-based compensation expense.
Cash, Cash Equivalents and Available for Sale Securities
As of March 31, 2010, Sequenom had total cash and short- and long-term marketable securities of $29.2 million, and $8.6 million in accounts receivable.
"The Sequenom team is optimistic about the future opportunities that lie ahead for our genetic analysis and molecular diagnostics businesses," stated Harry F. Hixson, Jr., Ph.D., chairman and chief executive officer. "We are pleased to announce timelines for the development and clinical testing of our T21 test, which we expect will be of interest to physicians, patients and investors. Successfully meeting our T21 test development milestones, advancing our AMD test development program, and seeking partnering opportunities for some of our unfunded projects will be a major focus for Sequenom during 2010."
Paul V. Maier, interim chief financial officer, stated, "Overall first quarter financial results met our expectations. Our achievement of more than $10 million of revenue in a quarter that historically has lower revenues is a good indicator that we can deliver growth in 2010. As a result, we believe that orders for the MassARRAY system and related consumables will provide a concrete foundation for the company as we continue to develop our molecular diagnostics capabilities."
Recent and Upcoming Business Highlights
T21 (Down syndrome) Update - The company remains committed to the development, validation and launch of a noninvasive T21 test. Following extensive scientific experimentation, the company has decided to proceed with a purely DNA based method for the detection of the T21 aneuploidy using massively parallel sequencing. Taken together the R&D and clinical sample collection costs required for a key T21 test validation study represent the single largest investment the company will make in 2010.
The company has established a number of program milestones to measure progress in the development of its noninvasive T21 test. Each milestone will be dependent upon the successful completion of preceding milestones:

* The company anticipates optimization of a DNA sequencing-based test to be completed by the end of the third quarter of 2010.

* By the end of 2010 our third party sample collection sites expect to have collected a sufficient number of blood samples from high risk pregnancies in order to provide the requisite number of T21 and euploid samples that will enable our planned blinded clinical studies. These blinded studies represent the pivotal validation studies to support launch of a noninvasive T21 test.

* The company anticipates that Sequenom CMM, its CLIA laboratory, will start accessioning and testing these T21 and euploid samples during the fourth quarter of 2010.

* To support the launch of a laboratory developed test (LDT) by the end of the fourth quarter of 2011, the company plans to complete testing of the validation samples during the second quarter of 2011.

* Data analysis for the blinded validation studies, manuscript preparation, journal submission by our academic clinical partners and peer-review are expected to be completed by the end of 2011.

* Following acceptance by a peer-reviewed journal the company plans to launch its T21 test as a LDT before the end of 2011.

* The company plans to complete the appropriate studies and documentation necessary to file for a premarket approval (PMA) for the T21 test by the end of 2012.

Expanding Diagnostic Opportunities - In order to meet anticipated demand for the LDTs the company is developing, Sequenom plans to open a second CLIA-certified laboratory in its San Diego facility. It is anticipated that this lab will be operational early in the fourth quarter of 2010.
In February 2010 the company entered into a license agreement with Optherion, under which Sequenom was granted an exclusive, worldwide, royalty-bearing license to know-how and a consolidated portfolio of issued and pending patent rights relating to age-related macular degeneration diagnostics. This portfolio had been assembled by Optherion from a number of prominent academic institutions. The licensed patent portfolio includes 17 issued or allowed United States and foreign patents, and 68 pending United States and foreign patent applications. The license agreement covers extensive intellectual property rights for significant AMD related genetic variants. The company expects to launch an AMD LDT during the first half of 2011.
Commercial Launch - On April 19, 2010, the company announced the availability of its next generation MassARRAY platform, the MassARRAY Analyzer 4. This new high performance nucleic acid analysis platform has been designed to meet customer demand for a bench top instrument with greater flexibility across multiple applications, improved reliability and faster performance. With the capability for quantitative gene expression analysis, epigenetic nucleic acid methylation analysis as well as high-throughput genotyping and SNP fine mapping applications, the MassARRAY Analyzer 4 is designed to empower the basic and translational research community to advance findings from basic genetic and biomarker studies toward clinical utility in diagnosis, prognosis and monitoring of diseases. The MassARRAY Analyzer 4 system will be initially offered for research-use-only and, subject to FDA clearance, will be released to CLIA certified laboratories for the generation and implementation of LDTs. For more information on the MassARRAY Analyzer 4 see http://www.massarrayanalyzer.com.
The company launched the SensiGene(TM) RHD Genotyping and the Fetal Sex Determination tests in February 2010. Both of these new tests detect and analyze circulating cell-free fetal (ccff) DNA. The SensiGene RHD Genotyping test examines multiple regions of the gene that is known to be the most common genetic basis of RhD negative phenotypes. In addition to quality control metrics to ensure accuracy both tests also utilize a fetal identifier control assay to verify the presence of fetal DNA, in particular for RhD negative, female fetuses.
Litigation Update - On May 3, 2010, the U.S. District Court for the Southern District of California entered an order granting final approval of a stipulation of settlement reached in the class action securities lawsuits related to alleged violations of federal securities laws consolidated under the caption In re Sequenom Inc. Securities Litigation. Even though the settlement has received final approval from the court, the court's approval may be subject to appeal and will not become effective until the time for appeals has lapsed without any appeal. If the settlement becomes effective, Sequenom will issue approximately 6.8 million shares of its common stock in connection with the settlement.
On May 6, 2010, Sequenom and the individual defendants entered into a stipulation of settlement that will resolve all of the pending derivative actions. The stipulation of settlement remains subject to approval by the U.S. District Court and the Superior Court of California. Subject to final approval of the stipulation of settlement, in exchange for a release of all claims by the plaintiffs and a dismissal of the derivative actions, Sequenom has agreed (i) to adopt or continue certain corporate governance measures and (ii) to pay the plaintiffs' attorneys a total of $2.5 million. A significant portion of the attorneys' fees is to be paid by Sequenom's insurance carriers. Sequenom has the right to elect to issue up to 200,000 shares of its common stock to pay its portion of the attorneys' fees.
In October 2009, plaintiff Xenomics, Inc. (now known as TrovaGene) filed a complaint in the Supreme Court of the State of New York naming Sequenom as the defendant alleging that Sequenom had breached the license agreement entered into by the parties on October 29, 2008, which provides Sequenom with exclusively licensed patent rights for the use of fetal nucleic acids obtained from maternal urine, and that the plaintiff has suffered damages as a result. In December 2009, Sequenom removed the case to the U.S. District Court for the Southern District of New York. On May 4, 2010, the district court granted Sequenom's motion to dismiss the action because the license agreement specifically provides that if TrovaGene seeks to resolve a dispute arising under the agreement, it must do so by commencing an arbitration in San Diego. TrovaGene has not notified Sequenom whether it would appeal the dismissal or commence arbitration proceedings in San Diego.

Monday, May 3, 2010

Pompe’s Disease & Harrison Ford's Extraordinary Measures Movie

If you are going to watch just one movie this year (and that includes the movies from last year), don’t get hooked on Watchmen, Avatar, Up in the Air or other great blockbusters. These movies are awesome, but you won’t come out of them with anything of value. The one movie to watch this year is Harrison Ford’s “Extraordinary Measures”.

Most critics give this movie low ratings, just two or three stars out of five, and dwell on Ford’s acting and the film’s playing on public sentiment. But the critics miss two things:

1. The movie shows how devastating Pompe disease is and how important prenatal diagnostics are in our lives.

2. It shows an example of a successful biotech startup and of technology progressing from academia into the biotech industry, then into biopharma and then into the clinic. The movie is based on the real story of John Crowley, the father of two children with Pompe disease, who started a biotech company focused on developing an enzyme replacement therapy. The company was later acquired by Genzyme, which developed a working treatment.

Regardless of how bad the acting is, this movie brings value.

Pompe disease occurs in about 1 in 40,000 births. But since most patients don’t survive the first year after birth, and the treatment costs $300,000 per year for the rest of the patient’s life, there are only about 5,000 to 10,000 sufferers of this disease.

It is extremely important to do genetic testing before conception and at the pre-implantation or prenatal level.

When both parents are carriers of a defective acid alpha-glucosidase (GAA) gene, each pregnancy has a one-in-four chance of producing a child with Pompe disease and a two-in-four (one-in-two) chance of producing a carrier of the defective gene. It is possible to avoid this by genetic analysis of both parents before conception.
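The one-in-four and two-in-four figures follow from simple autosomal recessive (Mendelian) arithmetic: each carrier parent passes on either the working or the defective copy of the gene with equal probability. A minimal Punnett-square sketch, where the letters 'A' (working copy) and 'a' (defective copy) and the function name `offspring_odds` are arbitrary labels chosen for illustration:

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def offspring_odds(parent1: str, parent2: str) -> dict[str, Fraction]:
    """Enumerate the Punnett square for one gene.

    Each parent is a two-letter genotype: 'A' = working allele, 'a' = defective allele.
    Every pairing of one allele from each parent is equally likely.
    """
    # sorted() puts 'A' before 'a', so "aA" and "Aa" count as the same genotype
    counts = Counter("".join(sorted(a + b)) for a, b in product(parent1, parent2))
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


LABELS = {"AA": "unaffected, not a carrier", "Aa": "carrier", "aa": "affected (Pompe disease)"}

if __name__ == "__main__":
    # Two carrier parents
    for genotype, p in sorted(offspring_odds("Aa", "Aa").items()):
        print(f"{LABELS[genotype]}: {p}")
```

For two carrier parents this gives 1/4 unaffected non-carrier, 1/2 carrier and 1/4 affected, matching the figures above. Calling `offspring_odds("Aa", "AA")` shows why testing both parents matters: with only one carrier parent no child is affected, although half are carriers.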

The image to the left is taken from the Tribune website.

There should be more movies like “Extraordinary Measures”. Movie companies are too focused on creating war movies, dirty-humor comedies and bloody vampire fairy tales, and they ignore scripts that could positively influence the younger generation. There should be more movies made about genetic analysis and prenatal diagnostics.

One such script could cover the story of Sequenom (Nasdaq: SQNM) and the colorful biography of its founder, Dr. Charles Cantor. I am sure that in a few years this company will also launch a Pompe disease test. Currently, they are focusing on Down syndrome.

Saturday, May 1, 2010

Blood-sucking animals can exchange genetic material with mammals

An interesting article in Science Daily today (based on a very credible publication in Nature). It looks like blood-sucking animals can exchange genetic material with their hosts. Scary stuff.
If you think about it, anything that integrates into your genome (including your own transposons) can disrupt some of the cancer-fighting genes. So, ultimately, mosquito bites may lead to cancer.

Well, one thing is clear. You should avoid mosquito bites during pregnancy just to be on the safe side.

Here is the article:
Scientists Uncover Transfer of Genetic Material Between Blood-Sucking Insect and Mammals
ScienceDaily (Apr. 30, 2010) — Researchers at The University of Texas at Arlington have found the first solid evidence of horizontal DNA transfer, the movement of genetic material among non-mating species, between parasitic invertebrates and some of their vertebrate hosts.

The findings are published in the April 28 issue of the journal Nature, one of the world's foremost scientific journals.
Genome biologist Cédric Feschotte and postdoctoral researchers Clément Gilbert and Sarah Schaack found evidence of horizontal transfer of transposons from a South American blood-sucking bug and a pond snail to their hosts. A transposon is a segment of DNA that can replicate itself and move around to different positions within the genome. Transposons can cause mutations, change the amount of DNA in the cell and dramatically influence the structure and function of the genomes where they reside.
"Since these bugs frequently feed on humans, it is conceivable that bugs and humans may have exchanged DNA through the mechanism we uncovered. Detecting recent transfers to humans would require examining people that have been exposed to the bugs for thousands of years, such as native South American populations," Feschotte said.
Data on the insect and the snail provide strong evidence for the previously hypothesized role of host-parasite interactions in facilitating horizontal transfer of genetic material. Additionally, the large amount of DNA generated by the horizontally transferred transposons supports the idea that the exchange of genetic material between hosts and parasites influences their genomic evolution.
"It's not a smoking gun, but it is as close to it as you can get," Feschotte said.
The infected blood-sucking triatomine spreads Chagas disease by passing trypanosomes (parasitic protozoa) to its host. Researchers found the bug shared transposon DNA with some hosts, namely the opossum and the squirrel monkey. The transposons found in the insect are 98 percent identical to those of its mammal hosts.
The researchers also identified members of what Feschotte calls space invader transposons in the genome of Lymnaea stagnalis, a pond snail that acts as an intermediate host for trematode worms, a parasite to a wide range of mammals.
The long-held theory is that mammals obtain genes vertically, or handed down from parents to offspring. Bacteria receive their genes vertically and also horizontally, passed from one unrelated individual to another or even between different species. Such lateral gene transfers are frequent in bacteria and essential for rapid adaptation to environmental and physiological challenges, such as exposure to antibiotics.
Until recently, it was not known that horizontal transfer could propel the evolution of complex multicellular organisms like mammals. In 2008, Feschotte and his colleagues published the first unequivocal evidence of horizontal DNA transfer.
Millions of years ago, transposons jumped sideways into several mammalian species. The transposon integrated itself into the chromosomes of germ cells, ensuring it would be passed on to future generations. Thus, parts of those mammals' DNA did not descend from their common ancestors, but were acquired laterally from another species.
The actual means by which transposons can spread across widely diverse species has remained a mystery.
"When you are trying to understand something that occurred thousands or millions of years ago, it is not possible to set up a laboratory experiment to replicate what happened in nature," Feschotte said.
Instead, the researchers made their discovery using computer programs designed to compare the distribution of mobile genetic elements among the 102 animals for which entire genome sequences are currently available. Paul J. Brindley of George Washington University Medical Center in Washington, D.C., contributed tissues and DNA used to confirm experimentally the computational predictions of Feschotte's team.
When the human genome was sequenced a decade ago, researchers found that nearly half of the human genome is derived from transposons, so this new knowledge has important ramifications for understanding the genetics of humans and other mammals.
Feschotte's research is representative of the cutting edge research that is propelling UT Arlington on its mission of becoming a nationally recognized research institution.

Story Source:
Adapted from materials provided by University of Texas at Arlington.

Journal Reference:
1. Clément Gilbert, Sarah Schaack, John K. Pace II, Paul J. Brindley, Cédric Feschotte. A role for host-parasite interactions in the horizontal transfer of transposons across phyla. Nature, 2010; 464 (7293): 1347 DOI: 10.1038/nature08939

New Recommendations for Down Syndrome: Screening Should Be Offered to All Pregnant Women (2007)

The ACOG recommendation is more than three years old, but it is as current as it can be: every pregnant woman should be offered testing for Down syndrome.

ACOG NEWS RELEASE

New Recommendations for Down Syndrome: Screening Should Be Offered to All Pregnant Women

Washington, DC -- All pregnant women, regardless of their age, should be offered screening for Down syndrome, according to a new Practice Bulletin issued today by The American College of Obstetricians and Gynecologists (ACOG). Previously, women were automatically offered genetic counseling and diagnostic testing for Down syndrome by amniocentesis or chorionic villus sampling (CVS) if they were 35 years and older.

The new ACOG guidelines recommend that all pregnant women consider less invasive screening options for assessing their risk for Down syndrome, a common disorder that is caused by an extra chromosome and can result in congenital heart defects and mental retardation. Screening for Down syndrome should occur before the 20th week of pregnancy.

"This new recommendation says that the maternal age of 35 should no longer be used by itself as a cut-off to determine who is offered screening versus who is offered invasive diagnostic testing," noted Deborah Driscoll, MD, a lead author of the document and vice chair of ACOG's Committee on Practice Bulletins-Obstetrics, which developed the Practice Bulletin with ACOG's Committee on Genetics and the Society for Maternal-Fetal Medicine.

ACOG also advises that all pregnant women, regardless of their age, should have the option of diagnostic testing. ACOG recognizes that a woman's decision to have an amniocentesis or CVS is based on many factors, such as a family or personal history of birth defects, the risk that the fetus will have a chromosome abnormality or an inherited condition, and the risk of pregnancy loss from an invasive procedure.

According to the new guidelines, the goal is to offer screening tests with high detection rates and low false positive rates that also provide patients with diagnostic testing options if the screening test indicates that the patient is at an increased risk for having a child with Down syndrome. Because many different screening strategies are currently available, the document provides ob-gyns with some suggested screening strategies that they can choose to offer in their practice to best meet the needs of their patients. The guidelines discuss the advantages and disadvantages of each screening test and some of the factors that determine which screening test should be offered, including gestational age at first prenatal visit, number of fetuses, previous obstetrical and family history, and availability of various screening tests.

The following ACOG recommendations are based on good and consistent scientific evidence:

* First-trimester screening using both nuchal translucency (NT), an ultrasound exam that measures the thickness at the back of the neck of the fetus, and a blood test is an effective screening test in the general population and is more effective than NT alone.
* Women found to be at increased risk of having a baby with Down syndrome with first-trimester screening should be offered genetic counseling and the option of CVS or mid-trimester amniocentesis.
* Specific training, standardization, use of appropriate ultrasound equipment, and ongoing quality assessment are important to achieve optimal NT measurement for Down syndrome risk assessment, and this procedure should be limited to centers and individuals meeting these criteria.
* Neural tube defect screening should be offered in the mid-trimester to women who elect only first-trimester screening for Down syndrome.

Practice Bulletin #77, "Screening for Fetal Chromosomal Abnormalities," is published in the January 2007 issue of Obstetrics & Gynecology.

Source: The American College of Obstetricians and Gynecologists is the national medical organization representing over 51,000 members who provide health care for women.
http://www.acog.org/from_home/publications/press_releases/nr01-02-07-1.cfm

Tuesday, April 20, 2010

Top prenatal diagnostics research projects financed by the National Institutes of Health

Here is a list of top projects related to prenatal screening and diagnostics financed by the National Institutes of Health (NIH) over the past 25 years.
Of course, the Human Genome Project and other sequencing/interpretation projects made major contributions to the field; however, it looks like the NIH is overlooking prenatal diagnostics as a research area.

Project Title	Organization	Year	Category	Total Funding
HIGH-RESOLUTION GENOME SCANS IN CONGENITAL HEART DISEASE	COLUMBIA UNIVERSITY HEALTH SCIENCES	2009	Cardiovascular; Clinical Research; Genetics; Heart Disease; Human Genome; Pediatric; Perinatal Period - Conditions Originating in Perinatal Period	$786,447
HIGH-RESOLUTION GENOME SCANS IN CONGENITAL HEART DISEASE	COLUMBIA UNIVERSITY HEALTH SCIENCES	2008	Biotechnology; Cardiovascular; Clinical Research; Genetics; Heart Disease; Human Genome; Pediatric; Perinatal Period - Conditions Originating in Perinatal Period	$749,823
FETO-MATERNAL DNA/RNA TRAFFICKING: BIOLOGY AND APPLICATION	TUFTS MEDICAL CENTER	2009	Clinical Research; Conditions affecting unborn children; Genetics; Pediatric; Perinatal Period - Conditions Originating in Perinatal Period	$342,125
FETO-MATERNAL DNA/RNA TRAFFICKING: BIOLOGY AND APPLICATION	TUFTS MEDICAL CENTER	2010	Clinical Research; Conditions affecting unborn children; Genetics; Pediatric; Perinatal Period - Conditions Originating in Perinatal Period	$338,704
A TOOL FOR ANALYSIS OF GENE-SPECIFIC DNA METHYLATION IN CLINICAL SAMPLES	NORTHWESTERN UNIVERSITY	2009	Breast Cancer; Cancer; Genetic Testing; Genetics	$305,000
PATHOGENESIS/TREATMENT-INHERITED CHOLESTEROL DEFICIENCY	CHILDREN'S HOSPITAL & RES CTR AT OAKLAND	2009	Brain Disorders; Genetics; Mental Retardation (Intellectual and Developmental Disabilities (IDD)); Neurosciences; Pediatric	$265,111
PATHOGENESIS/TREATMENT-INHERITED CHOLESTEROL DEFICIENCY	CHILDREN'S HOSPITAL & RES CTR AT OAKLAND	2008	Behavioral and Social Science; Brain Disorders; Genetics; Mental Retardation (Intellectual and Developmental Disabilities (IDD)); Neurosciences; Pediatric	$265,111
PATHOGENESIS/TREATMENT-INHERITED CHOLESTEROL DEFICIENCY	CHILDREN'S HOSPITAL & RES CTR AT OAKLAND	2010	Brain Disorders; Genetics; Mental Retardation (Intellectual and Developmental Disabilities (IDD)); Neurosciences; Pediatric	$262,460
BARCODED HYDROGEL MICROPARTICLES AND SCANNER FOR MULTIPLEXED BIOMOLECULE ASSAYS	MASSACHUSETTS INSTITUTE OF TECHNOLOGY	2009	Bioengineering; Biotechnology	$225,634
HIGH THROUGHPUT RARE CELL ISOLATION SYSTEM	FLUXION BIOSCIENCES, INC.	2009	Bioengineering; Biotechnology; Cancer	$203,579
WASHINGTON OBSTETRIC-FETAL PHARMACOLOGY RESEARCH UNIT	GEORGETOWN UNIVERSITY	2008	Behavioral and Social Science; Bioengineering; Biotechnology; Brain Disorders; Cancer; Cardiovascular; Clinical Research; Clinical Trials; Conditions affecting unborn children; Diabetes; Diagnostic Radiology; Epilepsy; Genetics; Health Services; Heart Disease; Infant Mortality/ (LBW); Infectious Diseases; Mental Health; Mental Retardation (Intellectual and Developmental Disabilities (IDD)); Mind and Body; Neurodegenerative; Neurosciences; Nutrition; Pediatric; Perinatal - Birth - Preterm (LBW); Perinatal Period - Conditions Originating in Perinatal Period; Prevention; Sudden Infant Death Syndrome	$135,000
FETO-MATERNAL DNA/RNA TRAFFICKING: BIOLOGY AND APPLICATION	TUFTS MEDICAL CENTER	2009	Clinical Research; Conditions affecting unborn children; Genetics; Pediatric; Pediatric Research Initiative; Perinatal Period - Conditions Originating in Perinatal Period	$7,950
RARE CELL ANALYSIS BY MULTI-SPECTRAL FLOW IMAGING	AMNIS CORPORATION	2002	Commercial project
NEW DIAGNOSTIC TEST FOR MUSCULAR DYSTROPHY	POLYCLONAL SERA LABS, INC.	1989	No NIH Category available.
MUTATIONAL ANALYSIS OF PEROXISOME BIOGENESIS DISORDERS	UNIVERSITY OF SOUTHERN CALIFORNIA	2006	No NIH Category available.
MUTATIONAL ANALYSIS OF PEROXISOME BIOGENESIS DISORDERS	UNIVERSITY OF SOUTHERN CALIFORNIA	2005	No NIH Category available.
MICROFABRICATED DEPOSITION TOOLS FOR CREATING NANOARRAYS	BIOFORCE NANOSCIENCES, INC.	2004	No NIH Category available.
MICROFABRICATED DEPOSITION TOOLS FOR CREATING NANOARRAYS	BIOFORCE NANOSCIENCES, INC.	2003	No NIH Category available.
MOLECULAR MECHANISMS OF BONE FORMATION	FORSYTH INSTITUTE	2006	No NIH Category available.
MOLECULAR MECHANISMS OF BONE FORMATION	FORSYTH INSTITUTE	2005	No NIH Category available.
MOLECULAR MECHANISMS OF BONE FORMATION	FORSYTH INSTITUTE	2004	No NIH Category available.
MOLECULAR MECHANISMS OF BONE FORMATION	FORSYTH INSTITUTE	2003	No NIH Category available.
MOLECULAR MECHANISMS OF BONE FORMATION	FORSYTH INSTITUTE	2002	No NIH Category available.
COMPREHENSIVE NON-INVASIVE PRENATAL DIAGNOSTICS	REPROGENETIC RESEARCH, LLC	2004	No NIH Category available.
CLINICAL AND GENETIC STUDIES OF NETHERTON SYNDROME	THOMAS JEFFERSON UNIVERSITY	2002	No NIH Category available.
CLINICAL AND GENETIC STUDIES OF NETHERTON SYNDROME	THOMAS JEFFERSON UNIVERSITY	2001	No NIH Category available.
CLINICAL AND GENETIC STUDIES OF NETHERTON SYNDROME	THOMAS JEFFERSON UNIVERSITY	2000	No NIH Category available.

Source: National Institutes of Health (search for grants containing "prenatal diagnostics" and "prenatal screening").

Monday, April 19, 2010

Sequenom Center for Molecular Medicine - potential to replace amniocentesis with safe non-invasive tests

Sequenom, Inc., a genetic analysis company started in 1996, holds the key to delivering highly accurate non-invasive prenatal diagnostics that could eliminate the need for amniocentesis, CVS and cordocentesis, invasive procedures that can result in spontaneous abortion or pregnancy complications.

Sequenom has already launched cystic fibrosis, Rhesus D and fetal sex determination tests at its Center for Molecular Medicine in Michigan. The Down syndrome test is expected to hit the market this year. After that, it may become possible to diagnose more than 1,400 diseases prenatally.
Women worldwide will welcome such a development, because it will give them peace of mind and confidence during pregnancy with no side effects.

Link: http://www.scmmlab.com/

Friday, April 16, 2010

Non-invasive Down's Syndrome Test - European Patent Office Application

Here is an application filed with the European Patent Office describing a method for T21 screening using whole-genome sequencing: http://v3.espacenet.com/publicationDetails/description;jsessionid=74B1390CA928ECC89E2BEC1273A1C54D.espacenet_levelx_prod_3?CC=WO&NR=2010033578A2&KC=A2&FT=D&date=20100325&DB=&locale=

It was published on the 25th of March 2010:

NONINVASIVE DIAGNOSIS OF FETAL ANEUPLOIDY BY SEQUENCING
Inventors: Hei-Mun Christina Fan, Stephen R. Quake
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims priority from U.S. Provisional Patent Application No. 61/098,758, filed on September 20, 2008, which is hereby incorporated by reference in its entirety.
STATEMENT OF GOVERNMENTAL SUPPORT
This invention was made with U.S. Government support under NIH Director's Pioneer Award DP1 OD000251. The U.S. Government has certain rights in this invention.
REFERENCE TO SEQUENCE LISTING, COMPUTER PROGRAM, OR COMPACT DISK
Applicants assert that the text copy of the Sequence Listing is identical to the Sequence Listing in computer readable form found on the accompanying computer file. Applicants incorporate the contents of the sequence listing by reference in its entirety.
BACKGROUND OF THE INVENTION
Field of the Invention
The present invention relates to the field of molecular diagnostics, and more particularly to the field of prenatal genetic diagnosis.
Related Art
Presented below is background information on certain aspects of the present invention as they may relate to technical features referred to in the detailed description, but not necessarily described in detail. That is, certain components of the present invention may be described in greater detail in the materials discussed below. The discussion below should not be construed as an admission as to the relevance of the information to the claimed invention or the prior art effect of the material described.
Fetal aneuploidy and other chromosomal aberrations affect 9 out of 1000 live births (1). The gold standard for diagnosing chromosomal abnormalities is karyotyping of fetal cells obtained via invasive procedures such as chorionic villus sampling and amniocentesis. These procedures impose small but potentially significant risks to both the fetus and the mother (2).
Non-invasive screening tests for fetal aneuploidy using maternal serum markers and ultrasound are available but have limited reliability (3-5). There is therefore a desire to develop non-invasive genetic tests for fetal chromosomal abnormalities.
Since the discovery of intact fetal cells in maternal blood, there has been intense interest in trying to use them as a diagnostic window into fetal genetics (6-9). While this has not yet moved into practical application (10), the later discovery that significant amounts of cell-free fetal nucleic acids also exist in maternal circulation has led to the development of new non-invasive prenatal genetic tests for a variety of traits (11, 12). However, measuring aneuploidy remains challenging due to the high background of maternal DNA; fetal DNA often constitutes <10% of total DNA in maternal cell-free plasma (13).
Recently developed methods for aneuploidy detection focus on allelic variation between the mother and the fetus. Lo et al. demonstrated that allelic ratios of placental specific mRNA in maternal plasma could be used to detect trisomy 21 in certain populations (14).
Similarly, they also showed the use of allelic ratios of imprinted genes in maternal plasma DNA to diagnose trisomy 18 (15). Dhallan et al. used fetal specific alleles in maternal plasma DNA to detect trisomy 21 (16). However, these methods are limited to specific populations because they depend on the presence of genetic polymorphisms at specific loci. We and others argued that it should be possible in principle to use digital PCR to create a universal, polymorphism independent test for fetal aneuploidy using maternal plasma DNA (17-19).
An alternative method to achieve digital quantification of DNA is direct shotgun sequencing followed by mapping to the chromosome of origin and enumeration of fragments per chromosome. Recent advances in DNA sequencing technology allow massively parallel sequencing (20), producing tens of millions of short sequence tags in a single run and enabling a deeper sampling than can be achieved by digital PCR. As is known in the art, the term "sequence tag" refers to a relatively short (e.g., 15-100) nucleic acid sequence that can be used to identify a certain larger sequence, e.g., be mapped to a chromosome or genomic region or gene. These can be ESTs or expressed sequence tags obtained from mRNA.
Specific Patents and Publications
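The counting step described above (map each short read to its chromosome of origin, then enumerate fragments per chromosome) can be sketched in a few lines of Python. This is a minimal illustration, not the applicants' software: it assumes reads were already aligned upstream, and the function name `tag_fractions` and the toy read labels are hypothetical.

```python
from collections import Counter


def tag_fractions(mapped_chromosomes):
    """Return the fraction of sequence tags assigned to each chromosome.

    `mapped_chromosomes` holds one chromosome label (e.g. "chr21") per tag
    that mapped to a unique location; filtering of unmapped or multiply
    mapped reads is assumed to have happened upstream.
    """
    counts = Counter(mapped_chromosomes)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no mapped sequence tags")
    return {chrom: n / total for chrom, n in counts.items()}


# Toy input standing in for the output of an aligner (made-up labels).
reads = ["chr1", "chr1", "chr1", "chr21", "chr2", "chr2", "chr21", "chrX"]
fractions = tag_fractions(reads)
print(fractions["chr21"])  # 0.25
```

With tens of millions of tags instead of eight, these per-chromosome fractions become precise enough to compare between samples.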
Science 309:1476 (2 Sept. 2005) News Focus "An Earlier Look at Baby's Genes" describes attempts to develop tests for Down Syndrome using maternal blood. Early attempts to detect Down Syndrome using fetal cells from maternal blood were called "just modestly encouraging." The report also describes work by Dennis Lo to detect the Rh gene in a fetus where it is absent in the mother. Other mutations passed on from the father have reportedly been detected as well, such as cystic fibrosis, beta-thalassemia, a type of dwarfism and Huntington's disease. However, these results have not always been reproducible.
Venter et al., "The sequence of the human genome," Science, 2001 Feb 16;291(5507):1304-51 discloses the sequence of the human genome, which information is publicly available from NCBI. Another reference genomic sequence is a current NCBI build as obtained from the UCSC genome gateway.
Wheeler et al., "The complete genome of an individual by massively parallel DNA sequencing," Nature, 2008 Apr 17;452(7189):872-6 discloses the DNA sequence of a diploid genome of a single individual, James D. Watson, sequenced to 7.4-fold redundancy in two months using massively parallel sequencing in picolitre-size reaction vessels. Comparison of the sequence to the reference genome led to the identification of 3.3 million single nucleotide polymorphisms, of which 10,654 cause amino-acid substitution within the coding sequence.
Quake et al., US 2007/0202525 entitled "Non-invasive fetal genetic screening by digital analysis," published August 30, 2007, discloses a process in which maternal blood containing fetal DNA is diluted to a nominal value of approximately 0.5 genome equivalent of DNA per reaction sample.
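Why dilute to about 0.5 genome equivalent per reaction? Under the usual assumption that template molecules are spread among reactions according to a Poisson distribution (an assumption of this sketch, not a statement quoted from the cited application), most reactions then hold zero or one copy, so counting positive reactions approximates counting molecules. The helper names below are hypothetical.

```python
import math


def poisson_occupancy(lam):
    """Share of reactions holding 0, >=1 and >=2 template copies when copies
    are Poisson-distributed with mean `lam` per reaction."""
    p_empty = math.exp(-lam)
    p_positive = 1.0 - p_empty
    p_multi = p_positive - lam * math.exp(-lam)  # subtract the exactly-one-copy share
    return p_empty, p_positive, p_multi


def estimate_lambda(n_negative, n_total):
    """Invert the zero term of the Poisson distribution: lam = -ln(share of empty reactions)."""
    return -math.log(n_negative / n_total)


empty, positive, multi = poisson_occupancy(0.5)
print(f"empty {empty:.3f}, positive {positive:.3f}, more than one copy {multi:.3f}")
# empty 0.607, positive 0.393, more than one copy 0.090
```

At 0.5 copies per reaction, about 39% of reactions are positive and only about 9% carry more than one copy; `estimate_lambda` shows the standard correction for those multi-copy reactions.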
Chiu et al., "Noninvasive prenatal diagnosis of fetal chromosomal aneuploidy by massively parallel genomic DNA sequencing of DNA in maternal plasma," Proc. Natl. Acad. Sci. 105(51):20458-20463 (December 23, 2008) discloses a method for determining fetal aneuploidy using massively parallel sequencing. Disease status determination (aneuploidy) was made by calculating a "z score." Z scores were compared with reference values, from a population restricted to euploid male fetuses. The authors noted in passing that G/C content affected the coefficient of variation.
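A z score measures how many standard deviations a sample's chromosome 21 representation lies above the mean of a euploid reference group. The sketch below assumes the reference values are percentages of tags mapping to chromosome 21; the data are made up, and the cutoff of 3 is used here only for illustration.

```python
import statistics


def z_score(value, reference_values):
    """Standard score of `value` against a reference population of euploid cases."""
    mean_ref = statistics.mean(reference_values)
    sd_ref = statistics.stdev(reference_values)  # sample standard deviation
    return (value - mean_ref) / sd_ref


# Made-up percentages of plasma DNA tags mapping to chromosome 21.
euploid_refs = [1.340, 1.350, 1.345, 1.340, 1.350]
test_sample = 1.420
z = z_score(test_sample, euploid_refs)
print(round(z, 1), "-> flagged" if z > 3 else "-> not flagged")
```

Restricting the reference group to one fetal sex, as Chiu et al. did, keeps sex-chromosome differences from inflating the reference spread.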
Lo et al., "Diagnosing Fetal Chromosomal Aneuploidy Using Massively Parallel Genomic Sequencing," US 2009/0029377, published January 29, 2009, discloses a method in which respective amounts of a clinically-relevant chromosome and of background chromosomes are determined from results of massively parallel sequencing. It was found that the percentage representation of sequences mapped to chromosome 21 is higher in a pregnant woman carrying a trisomy 21 fetus when compared with a pregnant woman carrying a normal fetus. For the four pregnant women each carrying a euploid fetus, a mean of 1.345% of their plasma DNA sequences were aligned to chromosome 21.
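To see why this difference is small, one can estimate how far a fetal trisomy should move chromosome 21's share under a simple two-component mixture model (an assumption of this sketch: maternal DNA carries two copies, fetal DNA three, and sequencing or mapping bias is ignored). Starting from the 1.345% euploid mean quoted above, a 10% fetal fraction raises the share by only about 5% in relative terms, to roughly 1.41%. The function name is hypothetical.

```python
def expected_trisomy_share(euploid_share, fetal_fraction):
    """Expected share of all tags mapping to a chromosome that is trisomic in
    the fetus, under a simple maternal/fetal mixture model with no bias."""
    extra = euploid_share * fetal_fraction / 2.0  # surplus from the third fetal copy
    return (euploid_share + extra) / (1.0 + extra)


euploid = 0.01345  # 1.345%, the euploid mean quoted above
for f in (0.05, 0.10, 0.20):
    share = expected_trisomy_share(euploid, f)
    print(f"fetal fraction {f:.0%}: chr21 share {share:.4%}")
```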
Lo et al., "Determining a Nucleic Acid Sequence Imbalance," US 2009/0087847, published April 2, 2009, discloses a method for determining whether a nucleic acid sequence imbalance exists, such as an aneuploidy, the method comprising deriving a first cutoff value from an average concentration of a reference nucleic acid sequence in each of a plurality of reactions, wherein the reference nucleic acid sequence is either the clinically relevant nucleic acid sequence or the background nucleic acid sequence; comparing the parameter to the first cutoff value; and based on the comparison, determining a classification of whether a nucleic acid sequence imbalance exists.
BRIEF SUMMARY OF THE INVENTION
The following brief summary is not intended to include all features and aspects of the present invention, nor does it imply that the invention must include all features and aspects discussed in this summary.
The present invention comprises a method for analyzing a maternal sample, e.g., from peripheral blood. It is not invasive into the fetal space, as is amniocentesis or chorionic villus sampling. In the preferred method, fetal DNA which is present in the maternal plasma is used. The fetal DNA is in one aspect of the invention enriched due to the bias in the method towards shorter DNA fragments, which tend to be fetal DNA. The method is independent of any sequence difference between the maternal and fetal genome. The DNA obtained, preferably from a peripheral blood draw, is a mixture of fetal and maternal DNA. The DNA obtained is at least partially sequenced, in a method which gives a large number of short reads. These short reads act as sequence tags, in that a significant fraction of the reads are sufficiently unique to be mapped to specific chromosomes or chromosomal locations known to exist in the human genome. They are mapped exactly, or may be mapped with one mismatch, as in the examples below. By counting the number of sequence tags mapped to each chromosome (1-22, X and Y), the over- or under-representation of any chromosome or chromosome portion in the mixed DNA contributed by an aneuploid fetus can be detected.
This method does not require the sequence differentiation of fetal versus maternal DNA, because the summed contribution of both maternal and fetal sequences in a particular chromosome or chromosome portion will be different as between an intact, diploid chromosome and an aberrant chromosome, i.e., with an extra copy, missing portion or the like. In other words, the method does not rely on a priori sequence information that would distinguish fetal DNA from maternal DNA. The abnormal distribution of a fetal chromosome or portion of a chromosome (i.e., a gross deletion or insertion) may be determined in the present method by enumeration of sequence tags as mapped to different chromosomes. The median count of autosomal values (i.e., number of sequence tags per autosome) is used as a normalization constant to account for differences in the total number of sequence tags, allowing comparison between samples and between chromosomes. The term "chromosome portion" is used herein to denote either an entire chromosome or a significant fragment of a chromosome. For example, moderate Down syndrome has been associated with partial trisomy 21q22.2->qter. By analyzing sequence tag density in predefined subsections of chromosomes (e.g., 10 to 100 kb windows), a normalization constant can be calculated, and chromosomal subsections quantified (e.g., 21q22.2). With large enough sequence tag counts, the present method can be applied to arbitrarily small fractions of fetal DNA. It has been demonstrated to be accurate down to 6% fetal DNA concentration. Exemplified below is the successful use of shotgun sequencing and mapping of DNA to detect fetal trisomy 21 (Down syndrome), trisomy 18 (Edwards syndrome), and trisomy 13 (Patau syndrome), carried out non-invasively using cell-free fetal DNA in maternal plasma. This forms the basis of a universal, polymorphism-independent non-invasive diagnostic test for fetal aneuploidy.
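The claim that large enough tag counts make arbitrarily small fetal fractions detectable follows from counting statistics. The sketch below assumes, for illustration only, that the only noise is binomial sampling of chromosome 21 tags (real runs add G/C and mapping bias, so treat the result as a lower bound) and uses the 1.345% euploid chromosome 21 share quoted earlier in this post. To first order, a trisomy shifts the expected chromosome 21 count by N·p·f/2. The required number of tags therefore grows as 1/f²: halving the fetal fraction quadruples the sequencing needed. `tags_needed` is a hypothetical helper.

```python
import math


def tags_needed(euploid_share, fetal_fraction, z=3.0):
    """Total tags N at which the expected trisomy shift in chromosome counts
    equals `z` binomial standard deviations (counting noise only).

    shift = N * p * f / 2, sd = sqrt(N * p * (1 - p))
    =>  N = z^2 * (1 - p) / (p * (f / 2)^2)
    """
    p, half_f = euploid_share, fetal_fraction / 2.0
    return math.ceil(z ** 2 * (1.0 - p) / (p * half_f ** 2))


for f in (0.12, 0.06, 0.03):
    print(f"fetal fraction {f:.0%}: about {tags_needed(0.01345, f):,} tags")
```

At the 6% fetal DNA concentration mentioned above, this idealized estimate comes to roughly three quarters of a million tags, well within a single massively parallel run.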
The sequence data also allowed us to characterize plasma DNA in unprecedented detail, suggesting that it is enriched for nucleosome bound fragments. The method may also be employed so that the sequence data obtained may be further analyzed to obtain information regarding polymorphisms and mutations.
Thus, the present invention comprises, in certain aspects, a method of testing for an abnormal distribution of a specified chromosome portion in a mixed sample of normally and abnormally distributed chromosome portions obtained from a single subject, such as a mixture of fetal and maternal DNA in a maternal plasma sample. One carries out sequence determinations on the DNA fragments in the sample, obtaining sequences from multiple chromosome portions of the mixed sample to obtain a number of sequence tags of sufficient length of determined sequence to be assigned to a chromosome location within a genome and of sufficient number to reflect abnormal distribution. Using a reference sequence, one assigns the sequence tags to their corresponding chromosomes including at least the specified chromosome by comparing the sequence to reference genomic sequence. Often there will be on the order of millions of short sequence tags that are assigned to certain chromosomes, and, importantly, certain positions along the chromosomes. One then may determine a first number of sequence tags mapped to at least one normally distributed chromosome portion and a second number of sequence tags mapped to the specified chromosome portion, both chromosomes being in one mixed sample. The present method also involves correcting for nonuniform distribution of sequence tags to different chromosomal portions. This is explained in detail below, where a number of windows of defined length are created along a chromosome, the windows being on the order of kilobases in length, whereby a number of sequence tags will fall into many of the windows and the windows covering each entire chromosome in question, with exceptions for non-informative regions, e.g., centromere regions and repetitive regions. Various average numbers, i.e., median values, are calculated for different windows and compared.
By counting sequence tags within a series of predefined windows of equal lengths along different chromosomes, more robust and statistically significant results may be obtained. The present method also involves calculating a differential between the first number and the second number which is determinative of whether or not the abnormal distribution exists.
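The window-counting and median comparison described above can be sketched as follows. This is one plausible reading of the text, not the applicants' code: the function names are hypothetical, skipping empty windows is a crude stand-in for masking centromeres and repeats, and the 50 kb default is an arbitrary choice within the 10-100 kb range the application mentions.

```python
import statistics


def window_counts(tag_positions, chrom_length, window_size=50_000):
    """Count tags in contiguous, non-overlapping windows along one chromosome."""
    n_windows = -(-chrom_length // window_size)  # ceiling division
    counts = [0] * n_windows
    for pos in tag_positions:
        if not 0 <= pos < chrom_length:
            raise ValueError(f"position {pos} outside chromosome")
        counts[pos // window_size] += 1
    return counts


def median_window_count(counts, min_count=1):
    """Median tags per window, skipping windows below `min_count`
    (a crude stand-in for masking non-informative regions)."""
    informative = [c for c in counts if c >= min_count]
    return statistics.median(informative)


def differential(target_median, reference_medians):
    """Target chromosome's median window count divided by the median of the
    reference chromosomes' medians; about 1.0 when the target is diploid."""
    return target_median / statistics.median(reference_medians)


# Toy example with 10-bp windows on a 40-bp "chromosome".
counts = window_counts([1, 2, 11, 12, 13, 25, 31, 32], chrom_length=40, window_size=10)
print(counts, median_window_count(counts))  # [2, 3, 1, 2] 2.0
```

Using medians rather than means keeps a few anomalous windows (for example, near a repeat) from dominating the per-chromosome value.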
In certain aspects, the present invention may comprise a computer programmed to analyze sequence data obtained from a mixture of maternal and fetal chromosomal DNA. Each autosome (chr. 1-22) is computationally segmented into contiguous, non-overlapping windows. (A sliding window could also be used.) Each window is of sufficient length to contain a significant number of reads (sequence tags, having about 20-100 bp of sequence) while still yielding a large number of windows per chromosome. Typically, a window will be between 10 kb and 100 kb, more typically between 40 and 60 kb. There would, then, for example, be approximately between 3,000 and 100,000 windows per chromosome. Windows may vary widely in the number of sequence tags that they contain, based on location (e.g., near a centromere or repetitive region) or G/C content, as explained below. The median (i.e., middle value in the set) count per window for each chromosome is selected; then the median of the autosomal values is used to account for differences in the total number of sequence tags obtained for different chromosomes and to distinguish interchromosomal variation due to sequencing bias from variation due to aneuploidy. This mapping method may also be applied to discern partial deletions or insertions in a chromosome. The present method also provides a way to correct for bias resulting from G/C content. For example, the Solexa sequencing method was found to produce more sequence tags from fragments with increased G/C content. This bias can be corrected by assigning a weight to each sequence tag based on the G/C content of the window in which the read falls. The window for G/C calculation is preferably smaller than the window for sequence tag density calculation.
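The paragraph says each tag is weighted by the G/C content of the window it falls in, but this excerpt does not give the weighting formula. The sketch below uses one common scheme as an assumption: each G/C bin gets weight = overall mean tags per window divided by the mean tags per window in that bin, so over-sampled G/C-rich bins are down-weighted. All names and the 20-bin choice are hypothetical.

```python
import statistics
from collections import defaultdict


def gc_bin(gc, n_bins=20):
    """Map a G/C fraction in [0, 1] to one of `n_bins` equal-width bins."""
    return min(int(gc * n_bins), n_bins - 1)


def gc_weights(window_gc, window_counts, n_bins=20):
    """Weight per G/C bin = overall mean tags per window / mean tags per
    window in that bin (assumed scheme, not taken from the application)."""
    overall = statistics.mean(window_counts)
    by_bin = defaultdict(list)
    for gc, count in zip(window_gc, window_counts):
        by_bin[gc_bin(gc, n_bins)].append(count)
    weights = {}
    for b, cs in by_bin.items():
        bin_mean = statistics.mean(cs)
        if bin_mean > 0:
            weights[b] = overall / bin_mean
    return weights


def weighted_tag_count(tag_gc_values, weights, n_bins=20):
    """Sum per-tag weights; each tag's weight comes from the G/C content of the
    (smaller) window it falls in. Bins without a weight default to 1.0."""
    return sum(weights.get(gc_bin(gc, n_bins), 1.0) for gc in tag_gc_values)


# Toy data: G/C-rich windows yield twice as many tags as G/C-poor ones.
weights = gc_weights([0.33, 0.34, 0.57, 0.58], [10, 10, 20, 20])
print(weights)  # {6: 1.5, 11: 0.75}
print(weighted_tag_count([0.33] * 10, weights), weighted_tag_count([0.57] * 20, weights))  # 15.0 15.0
```

After weighting, the G/C-poor and G/C-rich toy windows contribute equally, which is the intended effect of the correction.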
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 is a scatter plot showing sequence tag densities from eighteen samples, having five different genotypes, as indicated in the figure legend. Fetal aneuploidy is detectable by the over-representation of the affected chromosome in maternal blood. Figure 1A shows sequence tag density relative to the corresponding value of a genomic DNA control; chromosomes are ordered by increasing G/C content. The samples shown are, as indicated: plasma from a woman bearing a T21 fetus; plasma from a woman bearing a T18 fetus; plasma from a normal adult male; plasma from a woman bearing a normal fetus; and plasma from a woman bearing a T13 fetus. Sequence tag densities vary more with increasing chromosomal G/C content. Figure 1B is a detail from Fig. 1A, showing chromosome 21 sequence tag density relative to the median chromosome 21 sequence tag density of the normal cases. Note that the values of 3 disomy 21 cases overlap at 1.0. The dashed line represents the upper boundary of the 99% confidence interval constructed from all disomy 21 samples. The chromosomes are listed in Figure 1A in order of G/C content, from low to high. This figure suggests that, as a reference chromosome in the mixed sample, one would prefer a chromosome with a mid-level G/C content, since the data there are more tightly grouped. That is, chromosomes 18, 8, 2, 7, 12, 21 (except in suspected Down syndrome), 14, 9, and 11 may be used as the nominal diploid chromosome when looking for a trisomy. Figure 1B represents an enlargement of the chromosome 21 data.
Figure 2 is a scatter plot graph showing fetal DNA fraction and gestational age. The fraction of fetal DNA in maternal plasma correlates with gestational age. Fetal DNA fraction was estimated in three different ways: (1) from the additional amount of chromosome 13, 18, and 21 sequences for T13, T18, and T21 cases, respectively; (2) from the depletion in the amount of chromosome X sequences for male cases; and (3) from the amount of chromosome Y sequences present for male cases. The horizontal dashed line represents the estimated minimum fetal DNA fraction required for the detection of aneuploidy. For each sample, the values of fetal DNA fraction calculated from the data of different chromosomes were averaged. There is a statistically significant correlation between the average fetal DNA fraction and gestational age (p=0.0051). The dashed line represents the simple linear regression line between the average fetal DNA fraction and gestational age. The R² value represents the square of the correlation coefficient. Figure 2 suggests that the present method may be employed at a very early stage of pregnancy. The data were obtained from the 10-week stage and later because that is the earliest stage at which chorionic villus sampling is done. (Amniocentesis is done later.) From the level of the confidence interval, one would expect to obtain meaningful data as early as 4 weeks gestational age, or possibly earlier.
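These three estimates follow from simple copy-number arithmetic, assuming a fetal DNA fraction f, densities normalized as for Figure 1B, and a male fetus for the X and Y estimates. A trisomic fetal chromosome contributes three copies against the mother's two, so its relative density is ((1 - f)·2 + f·3)/2 = 1 + f/2; a male fetus carries one X against the mother's two, giving 1 - f/2; and only the fetus contributes Y, so the Y density is f times that of an adult male. The Python sketch below inverts these relations; the function names and input values are hypothetical.

```python
from statistics import mean

def fetal_fraction_from_trisomy(rel_density):
    """rel_density: trisomic chromosome density / median density of disomic cases (= 1 + f/2)."""
    return 2.0 * (rel_density - 1.0)

def fetal_fraction_from_x(rel_x_density):
    """rel_x_density: chrX density of a male pregnancy / median chrX density of female pregnancies (= 1 - f/2)."""
    return 2.0 * (1.0 - rel_x_density)

def fetal_fraction_from_y(y_density_pregnancy, y_density_adult_male):
    """Only the male fetus carries chrY, so the pregnancy's Y density is f times an adult male's."""
    return y_density_pregnancy / y_density_adult_male


# Hypothetical male pregnancy: both estimates come out near 0.10 and are averaged.
estimates = [fetal_fraction_from_x(0.95), fetal_fraction_from_y(0.05, 0.5)]
print(round(mean(estimates), 3))  # 0.1
```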
Figure 3 is a histogram showing the size distribution of maternal and fetal DNA in maternal plasma. It shows the size distribution of total and chromosome Y-specific fragments obtained from 454 sequencing of maternal plasma DNA from a normal male pregnancy. The distribution is normalized to sum to 1. The numbers of total reads and reads mapped to the Y chromosome are 144,992 and 178, respectively. Inset: cumulative fetal DNA fraction as a function of sequenced fragment size. The error bars correspond to the standard error of the fraction, estimated assuming that the counts of sequenced fragments follow Poisson statistics.
Figure 4 is a pair of line graphs showing the distribution of sequence tags around transcription start sites (TSS) of RefSeq genes on all autosomes and chromosome X, from a plasma DNA sample of a normal male pregnancy (top, Fig. 4A) and a randomly sheared genomic DNA control (bottom, Fig. 4B). The number of tags within each 5 bp window was counted within a ±1000 bp region around each TSS, taking into account the strand to which each sequence tag mapped. The counts from all transcription start sites for each 5 bp window were summed and normalized to the median count among the 400 windows. A moving average was used to smooth the data. A peak in the sense strand represents the beginning of a nucleosome, while a peak in the anti-sense strand represents the end of a nucleosome. In the plasma DNA sample shown here, five well-positioned nucleosomes are observed downstream of transcription start sites and are represented as grey ovals. The number within each oval represents the distance in base pairs between adjacent peaks in the sense and anti-sense strands, corresponding to the size of the inferred nucleosome. No obvious pattern is observed for the genomic DNA control.
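The profile in Figure 4 can be approximated with the Python sketch below. It rests on stated assumptions: TSS and tag strands are given as '+'/'-', each tag is represented by the coordinate of its mapped 5' end, offsets are oriented along the direction of transcription, each strand's profile is normalized to its own median window count, and the moving-average width is a hypothetical choice the text does not specify.

```python
from bisect import bisect_left, bisect_right
from statistics import median

def tss_strand_counts(tss_list, tags, flank=1000, bin_size=5):
    """Sum tag counts in bin_size windows within +/- flank bp of each TSS.

    tss_list: (position, strand) pairs, strand '+' or '-'.
    tags: (position, strand) pairs giving the mapped 5' end of each tag.
    A tag counts as "sense" if it maps to the same strand as the gene.
    """
    n_bins = 2 * flank // bin_size  # 400 windows for +/-1000 bp in 5 bp steps
    by_strand = {s: sorted(p for p, t in tags if t == s) for s in "+-"}
    sense, antisense = [0] * n_bins, [0] * n_bins
    for tss, gene_strand in tss_list:
        for tag_strand, positions in by_strand.items():
            lo = bisect_left(positions, tss - flank)
            hi = bisect_right(positions, tss + flank)
            target = sense if tag_strand == gene_strand else antisense
            for pos in positions[lo:hi]:
                # Orient offsets so that positive values lie downstream of the TSS.
                offset = pos - tss if gene_strand == "+" else tss - pos
                if -flank <= offset < flank:
                    target[(offset + flank) // bin_size] += 1
    return sense, antisense

def normalize_and_smooth(profile, width=5):
    """Divide by the median window count, then apply a centered moving average."""
    m = median(profile) or 1  # avoid dividing by zero when most windows are empty
    scaled = [c / m for c in profile]
    half = width // 2
    out = []
    for i in range(len(scaled)):
        window = scaled[max(0, i - half): i + half + 1]  # truncated at the edges
        out.append(sum(window) / len(window))
    return out
```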
Figure 5A is a scatter plot graph showing the mean sequence tag density for each chromosome across all samples, including cell-free plasma DNA from pregnant women and a male donor, as well as genomic DNA control from the male donor. The exceptions are chromosomes 13, 18, and 21, for which cell-free DNA samples from women carrying aneuploid fetuses are excluded. The error bars represent standard deviation. The chromosomes are ordered by their G/C content. The G/C content of each chromosome relative to the genome-wide value (41%) is also plotted. Figure 5B is a scatter plot of mean sequence tag density for each chromosome versus the G/C content of the chromosome. The correlation coefficient is 0.927, and the correlation is statistically significant (p < 10^-9).
Figure 5C is a scatter plot of the standard deviation of sequence tag density of each chromosome versus the G/C content of the chromosome. The correlation coefficient between the standard deviation of sequence tag density and the absolute deviation of chromosomal G/C content from the genome-wide G/C content is 0.963, and the correlation is statistically significant (p < 10^-12).
Figure 6 is a scatter plot graph showing percent difference of chromosome X sequence tag density of all samples as compared to the median chromosome X sequence tag density of all female pregnancies. All male pregnancies show under-representation of chromosome X.
Figure 7 is a scatter plot graph showing a comparison of the estimation of fetal DNA fraction for cell-free DNA samples from 12 male pregnancies using sequencing data from chromosomes X and Y. The dashed line represents a simple linear regression line, with a slope of 0.85. The R² value represents the square of the correlation coefficient. There is a statistically significant correlation between fetal DNA fraction estimated from chromosomes X and Y (p=0.0015).
Figure 8 is a line graph showing the length distribution, at 1 bp resolution, of sequenced fragments from a maternal cell-free plasma DNA sample of a normal male pregnancy. Sequencing was done on the 454/Roche platform. Reads with at least 90% of their length mapping to the human genome with at least 90% accuracy are retained, totaling 144,992 reads. The y axis represents the number of reads obtained. The median length is 177 bp, while the mean length is 180 bp.
Figure 9 is a schematic illustrating how sequence tag distribution is used to detect the over- and under-representation of any chromosome, i.e., a trisomy (over-representation) or a missing chromosome (typically an X or Y chromosome, since missing autosomes are generally lethal). As shown in the left panels A and C, one first plots the number of reads in each window against the chromosome coordinate of that window, i.e., the position of the reads along the chromosome. Chromosome 1 (panel A), for example, can be seen to have about 2.8 × 10^8 bp, which is divided into 50 kb windows. These values are replotted (panels B and D) to show the distribution of the number of sequence tags per 50 kb window. The term "bin" is equivalent to a window. From this analysis, one can determine a median number of reads M for each chromosome, which, for purposes of illustration, may be observed along the x axis at the approximate center of the distribution; M is higher if more sequence tags are attributable to that chromosome. For chromosome 1, illustrated in panels A and B, one obtains a median M1. By taking the median of the M values of all 22 autosomes, one obtains a normalization constant N that can be used to correct for differences in the number of sequences obtained in different runs, as can be seen in Table 1. Thus, the normalized sequence tag density for chromosome 1 would be M1/N; for chromosome 22 it would be M22/N.
Close examination of panel A, for example, would show that toward the zero end of the chromosome, this procedure obtained about 175 reads per 50 kb window. In the middle, near the centromere, there were no reads, because this portion of the chromosome is poorly defined in the human reference genome.
That is, in the left panels (A and C), one plots the distribution of reads along each chromosome, i.e., the number of reads within each non-overlapping 50 kb window as a function of chromosomal position. Then, one determines the distribution of the number of sequence tags per 50 kb window and obtains the median number of sequence tags per window for each autosome and for chromosome X (chr 1 [top] and chr 22 [bottom] are illustrated here as examples). These medians are referred to as M. The median of the 22 values of M (from all autosomes, chromosomes 1 through 22) is used as the normalization constant N. The normalized sequence tag density of each chromosome is M/N (e.g., chr 1: M1/N; chr 22: M22/N). Such normalization is necessary to compare different patient samples, since the total number of sequence tags (and thus the sequence tag density) differs from sample to sample (the total number of sequence tags fluctuates between ~8 and ~12 million). The analysis thus flows from the frequency of reads per coordinate (A and C), to the number of reads per window (B and D), to a combination of all chromosomes.
Figure 10 is a scatter plot graph showing data from different samples, as in Figure 1, except that the G/C sampling bias has been removed.
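The M/N normalization can be written in a few lines of Python. This sketch assumes per-window tag counts are already available for each chromosome, keyed by hypothetical names such as "chr1"; the text does not say whether empty windows (such as those near the centromere) are excluded before taking the median, so they are kept here.

```python
from statistics import median

AUTOSOMES = [f"chr{i}" for i in range(1, 23)]

def normalized_tag_densities(window_counts):
    """window_counts: dict mapping chromosome name -> list of tag counts per 50 kb window.

    M[chrom] is the median window count of each chromosome; N is the median of the
    22 autosomal M values; the normalized sequence tag density is M[chrom] / N.
    """
    M = {chrom: median(counts) for chrom, counts in window_counts.items()}
    N = median(M[chrom] for chrom in AUTOSOMES)
    return {chrom: m / N for chrom, m in M.items()}


# Toy sample: 21 autosomes at 100 tags/window, chr21 at 150, chrX at 50 (hypothetical numbers).
toy = {chrom: [100] * 10 for chrom in AUTOSOMES}
toy["chr21"] = [150] * 10
toy["chrX"] = [50] * 10
densities = normalized_tag_densities(toy)
print(densities["chr1"], densities["chr21"], densities["chrX"])  # 1.0 1.5 0.5
```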
Figure 11 is a scatter plot graph showing the weight given to sequence tags according to their percentage of G/C content, with lower weight given to tags with a higher G/C content. G/C content ranges from about 30% to about 70%; the weight can range over a factor of about 3.
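The text does not give the formula behind these weights. One simple scheme consistent with the description, offered as an assumption rather than the method actually used, bins small G/C windows by G/C fraction and sets each bin's weight to the genome-wide mean tag count per window divided by the mean tag count of windows in that bin; each tag then contributes its bin's weight instead of 1. The function names and the 1% bin width are hypothetical.

```python
from collections import defaultdict

def gc_bin_weights(window_gc, window_counts, bin_width=0.01):
    """Hypothetical G/C correction weights.

    window_gc: G/C fraction (0-1) of each small G/C window; window_counts: tags in each.
    weight(bin) = genome-wide mean tags per window / mean tags per window in that bin,
    so over-sampled (typically high-G/C) bins receive weights below 1.
    """
    totals, sizes = defaultdict(int), defaultdict(int)
    for gc, count in zip(window_gc, window_counts):
        b = round(gc / bin_width)  # integer G/C bin index
        totals[b] += count
        sizes[b] += 1
    overall_mean = sum(window_counts) / len(window_counts)
    return {b: (overall_mean / (totals[b] / sizes[b]) if totals[b] else 1.0) for b in sizes}

def weighted_tag_count(tag_gc_values, weights, bin_width=0.01):
    """Sum of per-tag weights, looked up by the G/C bin of the window each tag falls in."""
    return sum(weights.get(round(gc / bin_width), 1.0) for gc in tag_gc_values)


w = gc_bin_weights([0.40, 0.40, 0.60, 0.60], [10, 10, 30, 30])
print(w[40], round(w[60], 3))  # 2.0 0.667
```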
Figure 12 is a scatter plot graph illustrating results for selected patients, as indicated on the x axis, and, for each patient, the representation of each chromosome on the y axis, expressed as a t statistic, with zero indicating no deviation from the expected representation.
Figure 13 is a scatter plot graph showing the minimum fetal DNA percentage at which over- or under-representation of a chromosome could be detected with a 99.9% confidence level, for chromosomes 21, 18, 13, and X, and for all other chromosomes.
Figure 14 is a scatter plot graph showing a linear relationship between log10 of the minimum fetal DNA percentage needed and log10 of the number of reads.
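Why such a log-log plot should be linear can be seen from counting statistics. Assuming the only noise is Poisson sampling of tags (reference variability and residual G/C bias are ignored), a chromosome receiving a share p of N mapped tags has expected count λ = pN with standard deviation √λ; a fetal trisomy at fetal fraction f raises the count by λf/2, and requiring that excess to exceed z√λ gives f_min = 2z/√(pN). Under this assumption the expected slope of log10 f_min versus log10 N is -1/2; the text itself does not report the fitted slope. The function name is hypothetical, and the share 0.0132 (about 66,000 of 5 million tags on chromosome 21, as reported in this document) is only an example.

```python
import math
from statistics import NormalDist

def min_detectable_fetal_fraction(total_reads, chrom_share, confidence=0.999):
    """Poisson-limited estimate of the smallest detectable fetal fraction for a trisomy.

    Expected tags on the chromosome: lam = total_reads * chrom_share.
    A fetal trisomy adds lam * f / 2 tags; requiring that excess to exceed z * sqrt(lam)
    gives f_min = 2 * z / sqrt(lam).
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided: over- or under-representation
    lam = total_reads * chrom_share
    return 2 * z / math.sqrt(lam)


# On a log-log plot the relation is a straight line with slope -1/2.
f1 = min_detectable_fetal_fraction(1e6, 0.0132)
f2 = min_detectable_fetal_fraction(1e8, 0.0132)
slope = (math.log10(f2) - math.log10(f1)) / (math.log10(1e8) - math.log10(1e6))
print(round(slope, 6))  # -0.5
```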
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
Overview
Definitions
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by those of ordinary skill in the art to which this invention belongs. Although any methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, the preferred methods and materials are described. Generally, nomenclatures utilized in connection with, and techniques of, cell and molecular biology and chemistry are those well known and commonly used in the art. Certain experimental techniques, not specifically defined, are generally performed according to conventional methods well known in the art and as described in various general and more specific references that are cited and discussed throughout the present specification. For purposes of clarity, the following terms are defined below.
"Sequence tag density" means the normalized value of sequence tags for a defined window of a sequence on a chromosome (in a preferred embodiment the window is about 50 kb), where the sequence tag density is used for comparing different samples and for subsequent analysis. A "sequence tag" is a DNA sequence of sufficient length that it may be assigned specifically to one of chromosomes 1-22, X or Y. It need not be, but may be, non-repetitive within a single chromosome. A small degree of mismatch (0-1 mismatches) may be allowed to account for minor polymorphisms that may exist between the reference genome and the individual genomes (maternal and fetal) being mapped. The value of the sequence tag density is normalized within a sample. This can be done by counting the number of tags falling within each window on a chromosome; obtaining, for each chromosome, the median of these per-window tag counts; obtaining the median of all of the autosomal values; and using this value as a normalization constant to account for differences in the total number of sequence tags obtained for different samples. A sequence tag density calculated in this way would ideally be about 1 for a disomic chromosome. As further described below, sequence tag densities can vary according to sequencing artifacts, most notably G/C bias; this is corrected as described. This method does not require the use of an external standard, but rather provides an internal reference derived from all of the sequence tags (genomic sequences), which may be, for example, a single chromosome or a value calculated from all autosomes.
"T21" means trisomy 21.
"T18" means trisomy 18.
"T13" means trisomy 13.
"Aneuploidy" is used in a general sense to mean the presence or absence of an entire chromosome, as well as the presence of partial chromosomal duplications or deletions of kilobase or greater size, as opposed to genetic mutations or polymorphisms where sequence differences exist.
"Massively parallel sequencing" means techniques for sequencing millions of fragments of nucleic acids, e.g., using attachment of randomly fragmented genomic DNA to a planar, optically transparent surface and solid phase amplification to create a high density sequencing flow cell with millions of clusters, each containing ~1,000 copies of template per sq. cm. These templates are sequenced using four-color DNA sequencing-by-synthesis technology. See products offered by Illumina, Inc., San Diego, California. In the present work, sequences were obtained, as described below, with an Illumina/Solexa 1G Genome Analyzer. The Solexa/Illumina method referred to below relies on the attachment of randomly fragmented genomic DNA to a planar, optically transparent surface. In the present case, the plasma DNA does not need to be sheared. Attached DNA fragments are extended and bridge amplified to create an ultra-high density sequencing flow cell with at least 50 million clusters, each containing ~1,000 copies of the same template. These templates are sequenced using a robust four-color DNA sequencing-by-synthesis technology that employs reversible terminators with removable fluorescent dyes. This novel approach ensures high accuracy and true base-by-base sequencing, eliminating sequence-context specific errors and enabling sequencing through homopolymers and repetitive sequences.
High-sensitivity fluorescence detection is achieved using laser excitation and total internal reflection optics. Short sequence reads are aligned against a reference genome and genetic differences are called using specially developed data analysis pipeline software.
Copies of the protocol for whole genome sequencing using Solexa technology may be found in the BioTechniques® Protocol Guide 2007, published December 2006, p. 29, www.biotechniques.com/default.asp?page=protocol&subsection=article_display&id=112378. Solexa's oligonucleotide adapters are ligated onto the fragments, yielding a fully representative genomic library of DNA templates without cloning. Single molecule clonal amplification involves six steps: template hybridization, template amplification, linearization, blocking 3' ends, denaturation, and primer hybridization. Solexa's Sequencing-by-Synthesis utilizes four proprietary nucleotides possessing reversible fluorophore and termination properties. Each sequencing cycle occurs in the presence of all four nucleotides.
The sequencing used here is preferably carried out without a preamplification or cloning step, but may be combined with amplification-based methods in a microfluidic chip having reaction chambers for both PCR and microscopic template-based sequencing. Only about 30 bp of random sequence information is needed to identify a sequence as belonging to a specific human chromosome. Longer sequences can uniquely identify more particular targets. In the present case, a large number of 25 bp reads were obtained; although only about 50% of them mapped uniquely, the large number of reads still provided sufficient sequence tag representation.
Further description of a massively parallel sequencing method, employing the 454 method referenced below, is found in Rogers and Venter, "Genomics: Massively parallel sequencing," Nature, 437, 326-327 (15 September 2005). As described there, Rothberg and colleagues (Margulies, M. et al. Nature 437, 376-380 (2005)) have developed a highly parallel system capable of sequencing 25 million bases in a four-hour period, about 100 times faster than the current state-of-the-art Sanger sequencing and capillary-based electrophoresis platform. The method could potentially allow one individual to prepare and sequence an entire genome in a few days. The complexity of the system lies primarily in the sample preparation and in the microfabricated, massively parallel platform, which contains 1.6 million picoliter-sized reactors in a 6.4-cm² slide. Sample preparation starts with fragmentation of the genomic DNA, followed by the attachment of adaptor sequences to the ends of the DNA pieces. The adaptors allow the DNA fragments to bind to tiny beads (around 28 μm in diameter). This is done under conditions that allow only one piece of DNA to bind to each bead. The beads are encased in droplets of oil that contain all of the reactants needed to amplify the DNA using a standard tool called the polymerase chain reaction. The oil droplets form part of an emulsion so that each bead is kept apart from its neighbor, ensuring the amplification is uncontaminated. Each bead ends up with roughly 10 million copies of its initial DNA fragment. To perform the sequencing reaction, the DNA-template-carrying beads are loaded into the picoliter reactor wells, each well having space for just one bead. The technique uses a sequencing-by-synthesis method developed by Uhlen and colleagues, in which DNA complementary to each template strand is synthesized. The nucleotide bases used for sequencing release a chemical group as the base forms a bond with the growing DNA chain, and this group drives a light-emitting reaction in the presence of specific enzymes and luciferin. Sequential washes of each of the four possible nucleotides are run over the plate, and a detector senses which of the wells emit light with each wash to determine the sequence of the growing strand. This method has been adopted commercially by 454 Life Sciences.
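The wash-and-detect cycle described above can be illustrated with an idealized base caller. This Python sketch assumes a repeating flow order of T, A, C, G (a hypothetical default here) and a noise-free light signal proportional to the number of identical bases incorporated in each flow; production base callers also model signal decay and phasing errors.

```python
def decode_flowgram(signals, flow_order="TACG"):
    """Idealized pyrosequencing base caller.

    Nucleotides are washed over the wells in a fixed repeating order; the light emitted
    in each flow is taken to be proportional to the number of identical bases incorporated,
    so rounding the signal gives the homopolymer length for that flow.
    """
    bases = []
    for i, signal in enumerate(signals):
        nucleotide = flow_order[i % len(flow_order)]
        bases.append(nucleotide * max(int(round(signal)), 0))  # negative noise -> no base
    return "".join(bases)


print(decode_flowgram([1.02, 0.05, 2.1, 0.9, 0.0, 1.1]))  # TCCGA
```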
Further examples of massively parallel sequencing are given in US 20070224613 by Strathmann, published September 27, 2007, entitled "Massively Multiplexed Sequencing." Also, for a further description of massively parallel sequencing, see US 2003/0022207 to Balasubramanian, et al., published January 30, 2003, entitled "Arrayed polynucleotides and their use in genome analysis."
General description of method and materials
Overview
Non-invasive prenatal diagnosis of aneuploidy has been a challenging problem because fetal DNA constitutes a small percentage of total DNA in maternal blood (13) and intact fetal cells are even rarer (6, 7, 9, 31, 32). In this study, we showed the successful development of a truly universal, polymorphism-independent non-invasive test for fetal aneuploidy. By directly sequencing maternal plasma DNA, we could detect fetal trisomy 21 as early as the 14th week of gestation. Using cell-free DNA instead of intact cells avoids complexities associated with microchimerism and foreign cells that might have colonized the mother; these cells occur at such low numbers that their contribution to the cell-free DNA is negligible (33, 34). Furthermore, there is evidence that cell-free fetal DNA clears from the blood to undetectable levels within a few hours of delivery and therefore is not carried forward from one pregnancy to the next (35-37).
Rare forms of aneuploidy caused by unbalanced translocations and partial duplication of a chromosome are in principle detectable by the approach of shotgun sequencing, since the density of sequence tags in the triplicated region of the chromosome would be higher than the rest of the chromosome. Detecting incomplete aneuploidy caused by mosaicism is also possible in principle but may be more challenging, since it depends not only on the concentration of fetal DNA in maternal plasma but also the degree of fetal mosaicism. Further studies are required to determine the effectiveness of shotgun sequencing in detecting these rare forms of aneuploidy.
The present method is applicable to large chromosomal deletions, such as 5p- Syndrome (five p minus), also known as Cat Cry Syndrome or Cri du Chat Syndrome. 5p- Syndrome is characterized at birth by a high-pitched cry, low birth weight, poor muscle tone, microcephaly, and potential medical complications. Other disorders similarly amenable to the present methods are p-, monosomy 9p (otherwise known as Alfi's Syndrome or 9p-), 22q11.2 deletion syndrome, Emanuel Syndrome (also known in the medical literature as Supernumerary Der(22) Syndrome), trisomy 22, unbalanced 11/22 translocation or partial trisomy 11/22, microdeletion and microduplication at 16p11.2 (which is associated with autism), and other deletions or imbalances, including those that are presently unknown.
An advantage of using direct sequencing to measure aneuploidy non-invasively is that it is able to make full use of the sample, while PCR-based methods analyze only a few targeted sequences. In this study, we obtained on average 5 million reads per sample in a single run, of which ~66,000 mapped to chromosome 21. Since those 5 million reads represent only a portion of one human genome, in principle less than one genomic equivalent of DNA is sufficient for the detection of aneuploidy using direct sequencing. In practice, a larger amount of DNA was used since there is sample loss during sequencing library preparation, but it may be possible to further reduce the amount of blood required for analysis.
Mapping shotgun sequence information (i.e., sequence information from a fragment whose physical genomic position is unknown) can be done in a number of ways, which involve alignment of the obtained sequence with a matching sequence in a reference genome. See, Li et al., "Mapping short DNA sequencing reads and calling variants using mapping quality score," Genome Res., 2008 Aug 19. [Epub ahead of print].
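As a toy illustration of mapping with a uniqueness requirement (the criterion used in this work was unique mapping with at most one mismatch), the Python sketch below scans both strands of a short reference by brute force. It is not the aligner used in this study; real tools index the reference to handle billions of bases, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def within_mismatches(a, b, max_mismatches):
    """True if equal-length strings a and b differ at no more than max_mismatches positions."""
    mismatches = 0
    for x, y in zip(a, b):
        if x != y:
            mismatches += 1
            if mismatches > max_mismatches:
                return False
    return True

def map_read_uniquely(read, reference, max_mismatches=1):
    """Return (position, strand) if the read aligns to exactly one place, else None.

    Brute-force scan of both strands for illustration only.
    """
    k = len(read)
    hits = []
    for strand, query in (("+", read), ("-", reverse_complement(read))):
        for i in range(len(reference) - k + 1):
            if within_mismatches(query, reference[i:i + k], max_mismatches):
                hits.append((i, strand))
    return hits[0] if len(hits) == 1 else None


ref = "A" * 10 + "GATTACAGG" + "A" * 10
print(map_read_uniquely("GATTACTGG", ref))  # (10, '+'): one mismatch, unique hit
print(map_read_uniquely("AAAAAAAAA", ref))  # None: maps to many places
```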
We observed that certain chromosomes show large sample-to-sample variations in the counts of sequenced fragments, and that this depends strongly on the G/C content (Figure 1A). It is unclear at this point whether this stems from PCR artifacts during sequencing library preparation or cluster generation, from the sequencing process itself, or from a true biological effect relating to chromatin structure. We strongly suspect that it is an artifact, since we also observe G/C bias in the genomic DNA control, and such bias on the Solexa sequencing platform has recently been reported (38, 39). It has a practical consequence, since the sensitivity of aneuploidy detection will vary from chromosome to chromosome; fortunately, the chromosomes involved in the most common human aneuploidies (13, 18, and 21) have low variation and therefore high detection sensitivity. Both this problem and the sample volume limitations may possibly be resolved by the use of single molecule sequencing technologies, which do not require the use of PCR for library preparation (40).
Plasma DNA samples used in this study were obtained about 15 to 30 minutes after amniocentesis or chorionic villus sampling. Since these invasive procedures disrupt the interface between the placenta and maternal circulation, there has been discussion of whether the amount of fetal DNA in maternal blood might increase following invasive procedures. Neither of the studies to date has observed a significant effect (41, 42).
Our results support this conclusion, since using the digital PCR assay we estimated that fetal DNA constituted less than or equal to 10% of total cell-free DNA in the majority of our maternal plasma samples. This is within the range of previously reported values in maternal plasma samples obtained prior to invasive procedures (13). It would be valuable to have a direct measurement addressing this point in a future study.
The average fetal DNA fraction estimated from sequencing data is higher than the values estimated from digital PCR data by an average factor of two (p<0.005, paired t-test on all male pregnancies that have a complete set of data). One possible explanation for this is that the PCR step during Solexa library preparation preferentially amplifies shorter fragments, which others have found to be enriched for fetal DNA (22, 23). Our own measurements of length distribution on one sample do not support this explanation, but neither can we reject it at this point. It should also be pointed out that, using the sequence tags, we find some variation in fetal fraction even within the same sample, depending on which chromosome we use to make the calculation (Figure 7, Table 1). This is most likely due to artifacts and errors in the sequencing and mapping processes, which are substantial: recall that only half of the sequence tags map to the human genome with one error or less. Finally, it is also possible that the PCR measurements are biased, since they sample only a tiny fraction of the fetal genome.
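For reference, the paired t statistic used in such a comparison is the mean of the per-sample differences divided by its standard error. The Python sketch below computes it with the standard library; the input values are hypothetical, and a p-value additionally requires the t distribution, e.g., via `scipy.stats.ttest_rel`.

```python
import math
from statistics import mean, stdev

def paired_t_statistic(x, y):
    """Paired t statistic and degrees of freedom for matched measurements x[i], y[i]."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))  # mean difference / its standard error
    return t, n - 1


# Hypothetical fetal DNA percentages for three pregnancies (sequencing vs digital PCR).
t, df = paired_t_statistic([20, 40, 60], [10, 20, 30])
print(round(t, 3), df)  # 3.464 2
```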
Our sequencing data suggest that the majority of cell-free plasma DNA is of apoptotic origin and shares features of nucleosomal DNA. Since nucleosome occupancy throughout the eukaryotic genome is not necessarily uniform and depends on factors such as function, expression, or sequence of the region (30, 43), the representation of sequences from different loci in cell-free maternal plasma may not be equal, as one usually expects in genomic DNA extracted from intact cells. Thus, the quantity of a particular locus may not be representative of the quantity of the entire chromosome and care must be taken when one designs assays for measuring gene dosage in cell-free maternal plasma DNA that target only a few loci.
Historically, due to risks associated with chorionic villus sampling and amniocentesis, invasive diagnosis of fetal aneuploidy was primarily offered to women who were considered at risk of carrying an aneuploid fetus based on evaluation of risk factors such as maternal age, levels of serum markers, and ultrasonographic findings. Recently, an American College of Obstetricians and Gynecologists (ACOG) Practice Bulletin recommended that "invasive diagnostic testing for aneuploidy should be available to all women, regardless of maternal age" and that "pretest counseling should include a discussion of the risks and benefits of invasive testing compared with screening tests" (2). A noninvasive genetic test based on the results described here and in future large-scale studies would presumably carry the best of both worlds: minimal risk to the fetus while providing true genetic information. The costs of the assay are already fairly low; the sequencing cost per sample as of this writing is about $700 and the cost of sequencing is expected to continue to drop dramatically in the near future.
Shotgun sequencing can potentially reveal many more previously unknown features of cell-free nucleic acids such as plasma mRNA distributions, as well as epigenetic features of plasma DNA such as DNA methylation and histone modification, in fields including perinatology, oncology and transplantation, thereby improving our understanding of the basic biology of pregnancy, early human development and disease.
Sequencing Methods
Commercially available sequencing equipment was used in the present illustrative examples, namely the Solexa/Illumina sequencing platform and the 454/Roche platform. It will be apparent to those skilled in the art that a number of different sequencing methods and variations can be used. One sequencing method that can be used to advantage in the present methods involves paired end sequencing. Fluorescently labeled sequencing primers could be used to simultaneously sequence both strands of a dsDNA template, as described, e.g., in Wiemann et al. (Anal. Biochem. 224: 117 [1995]; Anal. Biochem. 234: 166 [1996]). Recent examples of this technique have demonstrated multiplex co-sequencing using the four-color dye terminator reaction chemistry pioneered by Prober et al. (Science 238: 336 [1987]).
Solexa/Illumina offers a "Paired End Module" for its Genome Analyzer. Using this module, after the Genome Analyzer has completed the first sequencing read, the Paired-End Module directs the resynthesis of the original templates and the second round of cluster generation. The Paired-End Module is connected to the Genome Analyzer through a single fluidic connection. In addition, 454 has developed a protocol to generate a library of Paired End reads. These Paired End reads are approximately 84-nucleotide DNA fragments that have a 44-mer adaptor sequence in the middle flanked by a 20-mer sequence on each side. The two flanking 20-mers are segments of DNA that were originally located approximately 2.5 kb apart in the genome of interest.
By using paired end reads in the present method, one may obtain more sequence information from a given plasma DNA fragment, and, significantly, one may also obtain sequence information from both ends of the fragment. The fragment is mapped to the human genome as explained here elsewhere. After mapping both ends, one may deduce the length of the starting fragment. Since fetal DNA is known to be shorter than maternal DNA fragments circulating in plasma, one may use this information about the length of the DNA fragment to effectively increase the weight given to sequences obtained from shorter (e.g., about 300 bp or less) DNA fragments. Methods for weighting are given below.
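Deducing fragment length from a mapped pair, and using it to up-weight short fragments, might look like the Python sketch below. The coordinate convention, the 300 bp cutoff default, and the weight of 2.0 are hypothetical illustrations; the weighting methods actually contemplated are described separately in this document.

```python
def fragment_length(left_read_start, right_read_end):
    """Length of the original plasma DNA fragment from a properly paired, same-chromosome mapping.

    left_read_start: 0-based start of the leftmost read;
    right_read_end: 0-based exclusive end of the rightmost read.
    """
    return right_read_end - left_read_start

def length_weight(length, short_cutoff=300, short_weight=2.0):
    """Hypothetical step weighting that up-weights short (fetal-enriched) fragments."""
    return short_weight if length <= short_cutoff else 1.0

def weighted_fragment_count(pairs):
    """Weighted count of fragments in a window; pairs are (left_start, right_end) tuples."""
    return sum(length_weight(fragment_length(s, e)) for s, e in pairs)


print(weighted_fragment_count([(1_000, 1_170), (5_000, 5_450)]))  # 3.0
```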
Another method for increasing sensitivity to fetal DNA is to focus on certain regions within the human genome. One may use sequencing methods that select a priori sequences which map to the chromosomes of interest (as described elsewhere herein, such as 18, 21, 13, X and Y). One may also choose to focus, using this method, on partial chromosomal deletions, such as 22q11 deletion syndrome. Other microdeletions and microduplications are set forth in Table 1 of US 2005/0181410, published Aug. 18, 2005 under the title "Methods and apparatuses for achieving precision genetic diagnosis."
In sequencing selected subsequences, one may employ sequence-based methodologies such as sequencing by array, or capture beads with specific genomic sequences used as capture probes. The use of a sequencing array can be implemented as described in Chetverin et al., "Oligonucleotide arrays: new concepts and possibilities," Biotechnology (N Y). 1994 Nov;12(11):1093-9, as well as Rothberg, US 2002/0012930 A1 entitled "Method of Sequencing a Nucleic Acid," and Reeve et al., "Sequencing by Hybridization," US 6,399,364. In these methods, the target nucleic acid to be sequenced may be genomic DNA, cDNA or RNA. The sample is rendered single stranded and captured under hybridizing conditions with a number of single stranded probes which are catalogued by bar coding or by physical separation in an array. Emulsion PCR, as used in the 454 system, the SOLiD system, and Polonator (Dover Systems) and others may also be used, where capture is directed to specific target sequences, e.g., genome sequences mapping uniquely to chromosome 21 or other chromosome of interest, or to a chromosome region such as 15q11 (Prader-Willi syndrome), or excessive CGG repeats in the FMR1 gene (fragile X syndrome).
The subsequencing method is in one aspect contrary to conventional massively parallel sequencing methodologies, which seek to obtain all of the sequence information in a sample. This alternative method selectively ignores certain sequence information by using a sequencing method which selectively captures sample molecules containing certain predefined sequences. One may also use the sequencing steps exactly as exemplified, but in mapping the sequence fragments obtained, give greater weight to sequences which map to areas known to be more reliable in their coverage, such as exons. Otherwise, the method proceeds as described below, where one obtains a large number of sequence reads from one or more reference chromosomes, which are compared to a large number of reads obtained from a chromosome of interest, after accounting for variations arising from chromosomal length, G/C content, repeat sequences and the like.
One may also focus on certain regions within the human genome according to the present methods in order to identify partial monosomies and partial trisomies. As described below, the present methods involve analyzing sequence data in a defined chromosomal sliding "window," such as contiguous, non-overlapping 50kb regions spread across a chromosome. Partial trisomies of 13q, 8p (8p23.1), 7q, distal 6p, 5p, 3q (3q25.1), 2q, 1q (1q42.1 and 1q21-qter) and Xp, as well as partial monosomy 4q35.1, have been reported, among others. For example, partial duplication of the long arm of chromosome 18 can result in Edwards syndrome when the duplication spans 18q21.1-qter (see Mewar et al., "Clinical and molecular evaluation of four patients with partial duplications of the long arm of chromosome 18," Am J Hum Genet. 1993 Dec;53(6):1269-78).
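The window-based scan described above can be sketched in Python. This is a minimal illustration, not the method's actual implementation. It assumes 0-based read start positions on one chromosome and a reference sample counted in the same windows. The ratio threshold of 1.1 and the function names are hypothetical. A real analysis would test each window against a disomic reference distribution rather than apply a fixed cutoff.

```python
from statistics import median

def window_counts(read_starts, chrom_length, window=50_000):
    """Count read start positions (0-based) in contiguous, non-overlapping windows."""
    counts = [0] * ((chrom_length + window - 1) // window)
    for pos in read_starts:
        if 0 <= pos < chrom_length:  # ignore positions outside the chromosome
            counts[pos // window] += 1
    return counts

def flag_windows(sample_counts, reference_counts, threshold=1.1):
    """Return indices of windows whose sample/reference ratio, rescaled by the
    chromosome-wide median ratio, exceeds a hypothetical threshold."""
    # Windows with no reference reads cannot be compared and are marked None.
    ratios = [s / r if r > 0 else None for s, r in zip(sample_counts, reference_counts)]
    typical = median(x for x in ratios if x is not None)
    return [i for i, x in enumerate(ratios) if x is not None and x / typical > threshold]
```

Rescaling by the median ratio keeps differences in total read number between the sample and the reference from being mistaken for a regional gain.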
Shotgun Sequencing of Cell-free Plasma DNA
Cell-free plasma DNA from 18 pregnant women and a male donor, as well as whole-blood genomic DNA from the same male donor, was sequenced on the Solexa/Illumina platform. We obtained on average ~10 million 25bp sequence tags per sample. About 50% (i.e., ~5 million) of the reads mapped uniquely to the human genome with at most 1 mismatch, covering ~4% of the entire genome. On average, ~154,000, ~135,000 and ~66,000 sequence tags mapped to chromosomes 13, 18 and 21, respectively. The number of sequence tags for each sample is detailed in the following Table 1 and Table 2.
Table 1.

Table 2.

"Volume of plasma" is the volume used for sequencing library creation (ml). "Amount of DNA in plasma" is given in cell equivalents per ml of plasma*. "Approximate amount of input DNA" is the amount used for sequencing library construction (ng).
*As quantified by digital PCR with EIF2C1 Taqman Assay, converting from copies to ng assuming 6.6pg/cell equivalent.
† For 454 sequencing, this number represents the number of reads with at least 90% accuracy and 90% coverage when mapped to hg18.
‡ Insufficient material was available for quantifying fetal DNA % with digital PCR for these samples (either no sample remained for analysis or sampling was insufficient).
§ Sequenced on the Solexa/Illumina platform; ¶ sequenced on the 454/Roche platform.
‖ Sample P13 was the first to be analyzed by shotgun sequencing. It was a normal fetus, and its chromosome values were clearly disomic. However, there were some irregularities with this sample, and it was not included in further analysis.

Several observations pointed to a technical problem:
- It was sequenced on a different Solexa instrument from the rest of the samples in this study.
- It was sequenced alongside a number of samples of unknown origin.
- Its G/C content was lower than that of the human genome, while the G/C content of all other samples was higher.
- It had the lowest number of reads and the smallest number of reads that mapped successfully to the human genome.
- It appeared to be an outlier in sequence tag density for most chromosomes.
- The fetal DNA fraction calculated from chromosome X was not well defined.

For these reasons we suspect that the irregularities are due to technical problems with the sequencing process. In Table 1 and Table 2, each sample represents a different patient, e.g., P1 in the first row. The total number of sequence tags varied but was frequently in the 10 million range with the Solexa technology. The 454 technology used for P25 and P13 gave a lower number of reads.
We observed a non-uniform distribution of sequence tags across each chromosome.
This pattern of intra-chromosomal variation was common to all samples, including randomly sheared genomic DNA, indicating that the observed variation was most probably due to sequencing artifacts. We applied an arbitrary sliding window of 50kb across each chromosome and counted the number of tags falling within each window. The window size can be varied. With larger numbers of reads, a smaller window (e.g., 10kb) gives a more detailed picture of a chromosome. With smaller numbers of reads, a larger window (e.g., 100kb) may still be used and will detect gross chromosomal deletions, omissions or duplications. The median count per 50kb window was selected for each chromosome. The median of the autosomal values (i.e., of the 22 autosomes) was used as a normalization constant to account for differences in the total number of sequence tags obtained for different samples. The inter-chromosomal variation within each sample was also consistent among all samples (including the genomic DNA control). The mean sequence tag density of each chromosome correlates with the G/C content of the chromosome (p < 10^-9) (Figures 5A, 5B). The standard deviation of sequence tag density for each chromosome also correlates with the absolute deviation of the chromosome's G/C content from the genome-wide G/C content (p < 10^-12) (Figures 5A, 5C). The G/C content of sequenced tags from all samples (including the genomic DNA control) was on average 10% higher than the value for the sequenced human genome (41%) (21) (Table 2). This suggests a strong G/C bias stemming from the sequencing process. To remove this bias, we plotted in Figure 1A the sequence tag density for each chromosome (ordered by increasing G/C content) relative to the corresponding value of the genomic DNA control.
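The density and normalization steps above can be sketched as follows. This is a minimal illustration that assumes per-window tag counts are already available for each chromosome, keyed by names such as "chr21". The function names are hypothetical. The final step mirrors plotting each value relative to the genomic DNA control.

```python
from statistics import median

AUTOSOMES = [f"chr{i}" for i in range(1, 23)]

def chromosome_densities(window_counts_by_chrom):
    """Median tag count per window ('sequence tag density') for each chromosome."""
    return {chrom: median(counts) for chrom, counts in window_counts_by_chrom.items()}

def normalize_to_autosomes(densities):
    """Divide every density by the median autosomal density; chrY is left out."""
    constant = median(densities[c] for c in AUTOSOMES if c in densities)
    return {c: d / constant for c, d in densities.items() if c != "chrY"}

def relative_to_control(sample_norm, control_norm):
    """Express each normalized density relative to a genomic DNA control,
    skipping chromosomes the control lacks or where its value is zero."""
    return {c: v / control_norm[c] for c, v in sample_norm.items() if control_norm.get(c)}
```

Taking the median rather than the mean per chromosome limits the influence of the under- or over-represented windows described above.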
Detection of Fetal Aneuploidy
The distribution of chromosome 21 sequence tag density for all 9 T21 pregnancies is clearly separated from that of pregnancies bearing disomy 21 fetuses (p < 10^-5, Student's t-test) (Figures 1A and 1B). The coverage of chromosome 21 in T21 cases is about 4-18% higher (average ~11%) than in disomy 21 cases. The sequence tag density of chromosome 21 in T21 cases should be (1 + ε/2) times that of disomy 21 pregnancies, where ε is the fraction of total plasma DNA originating from the fetus. This increase in chromosome 21 coverage therefore corresponds to a fetal DNA fraction of ~8-35% (average ~23%) (Table 1, Figure 2). We constructed a 99% confidence interval of the distribution of chromosome 21 sequence tag density for disomy 21 pregnancies. The values for all 9 T21 cases lie above the upper boundary of the confidence interval, and those for all 9 disomy 21 cases lie below it (Figure 1B). If the upper bound of the confidence interval were used as a threshold for detecting T21, the minimum fetal DNA fraction that could be detected would be ~2%.
Plasma DNA of pregnant women carrying T18 fetuses (2 cases) and a T13 fetus (1 case) was also sequenced directly. Over-representation was observed for chromosome 18 and chromosome 13 in the T18 and T13 cases, respectively (Figure 1A). There were not enough positive samples to measure a representative distribution, but it is encouraging that all three positives are outliers from the distribution of disomy values. The T18 cases are large outliers and clearly statistically significant (p < 10^-7), while the statistical significance of the single T13 case is marginal (p < 0.05). The fetal DNA fraction was also calculated from the over-represented chromosome as described above (Figure 2, Table 1).
Fetal DNA Fraction in Maternal Plasma
Using digital Taqman PCR for a single locus on chromosome 1, we estimated the average cell-free DNA concentration in the sequenced maternal plasma samples to be ~360 cell equivalents/ml of plasma (range: 57 to 761 cell equivalents/ml plasma) (Table 1). This is in rough accordance with previously reported values (13). The cohort included 12 male pregnancies (6 normal cases, 4 T21 cases, 1 T18 case and 1 T13 case) and 6 female pregnancies (5 T21 cases and 1 T18 case). DYS14, a multi-copy locus on chromosome Y, was detectable in maternal plasma by real-time PCR in all of the male pregnancies but not in any of the female pregnancies (data not shown).

The fraction of fetal DNA in maternal cell-free plasma DNA is usually determined with quantitative real-time PCR (13, 22, 23). The amount of a fetal-specific locus (such as the SRY locus on chromosome Y in male pregnancies) is compared with that of a locus on an autosome common to both the mother and the fetus. We applied a similar duplex assay on a digital PCR platform (see Methods) to compare the counts of the SRY locus and a locus on chromosome 1 in male pregnancies. The SRY locus was not detectable in any plasma DNA sample from female pregnancies. With digital PCR, we found that for the majority of samples fetal DNA constituted ≤10% of total DNA in maternal plasma (Table 2), agreeing with previously reported values (13).
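The table notes convert cell equivalents to mass at 6.6 pg per cell equivalent. The sketch below applies that arithmetic to estimate the DNA recovered from a given plasma volume. It assumes the concentration is already expressed in cell equivalents per ml. The function names and the 2 ml volume in the test are illustrative, not values from the study.

```python
PG_PER_CELL_EQUIVALENT = 6.6  # conversion factor stated in the table notes

def cell_equivalents_to_ng(cell_equivalents):
    """Convert a quantity in cell equivalents to nanograms of DNA."""
    return cell_equivalents * PG_PER_CELL_EQUIVALENT / 1000.0  # pg -> ng

def input_dna_ng(concentration_ce_per_ml, plasma_ml):
    """Approximate DNA mass (ng) recovered from a plasma volume (ml)."""
    return cell_equivalents_to_ng(concentration_ce_per_ml * plasma_ml)
```

At the cohort average of ~360 cell equivalents/ml, 2 ml of plasma corresponds to roughly 4.75 ng of DNA. That is within the ~1 to 8 ng input range reported for library preparation.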
The percentage of fetal DNA among total cell-free DNA in maternal plasma can also be calculated from the density of sequence tags on the sex chromosomes in male pregnancies. We compared the sequence tag density of chromosome Y in plasma DNA from male pregnancies with that of adult male plasma DNA. On this basis we estimated the fetal DNA percentage to be on average ~19% (range: 4-44%) for all male pregnancies (Table 2, above, Figure 2). Human males have one fewer X chromosome than human females. The sequence tag density of chromosome X in male pregnancies should therefore be (1 - ε/2) times that of female pregnancies, where ε is the fetal DNA fraction. We indeed observed under-representation of chromosome X in male pregnancies compared with female pregnancies (Figure 5). Based on the chromosome X data, we estimated the fetal DNA percentage to be on average ~19% (range: 8-40%) for all male pregnancies (Table 2, above, Figure 2). The fetal DNA percentages estimated from chromosomes X and Y for each male pregnancy correlated with each other (p = 0.0015) (Figure 7).
In Figure 2 we plotted the fetal DNA fraction against gestational age. The fraction was calculated from the over-representation of the trisomic chromosome in aneuploid pregnancies, and from the under-representation of chromosome X and the presence of chromosome Y in male pregnancies. The average fetal DNA fraction for each sample correlates with gestational age (p = 0.0051), a trend that has also been reported previously (13).
Size Distribution of Cell-Free Plasma DNA
We analyzed the sequencing libraries with a commercial lab-on-a-chip capillary electrophoresis system. There is striking consistency in the peak fragment size, and in the distribution around the peak, across all plasma DNA samples, including those from the pregnant women and the male donor. The peak fragment size was on average 261bp (range: 256-264bp). Subtracting the total length of the Solexa adaptors (92bp) from 261bp gives 169bp as the actual peak fragment size. This size corresponds to the length of DNA wrapped in a chromatosome, which is a nucleosome bound to an H1 histone (24). Because the library preparation includes an 18-cycle PCR, there is a concern that the distribution might be biased. To verify that the size distribution observed in the electropherograms is not an artifact of PCR, we also sequenced cell-free plasma DNA from a pregnant woman carrying a male fetus on the 454 platform. The sample preparation for this system uses emulsion PCR, which does not require competitive amplification of the sequencing library and creates a product that is largely independent of amplification efficiency. The size distribution of the reads mapped to unique locations in the human genome resembled those of the Solexa sequencing libraries. After subtracting the length of the 454 universal adaptors, it showed a predominant peak at 176bp (Figure 3 and Figure 8). These findings suggest that the majority of cell-free DNA in plasma is derived from apoptotic cells, in accordance with previous findings (22, 23, 25, 26).
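The adaptor subtraction above can be sketched as taking the most common library fragment length and removing the adaptor length. The sketch assumes the fragment lengths are available as a list of integers. This is a simplification, because an electropherogram is a continuous trace rather than a list of lengths. The function name is hypothetical; only the 92bp adaptor length comes from the text.

```python
from collections import Counter

SOLEXA_ADAPTOR_BP = 92  # total Solexa adaptor length given in the text

def peak_insert_size(library_lengths, adaptor_bp=SOLEXA_ADAPTOR_BP):
    """Modal library fragment length minus the adaptor length."""
    # most_common(1) returns [(length, count)] for the most frequent length
    mode_length, _ = Counter(library_lengths).most_common(1)[0]
    return mode_length - adaptor_bp
```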
Of particular interest is the size distribution of maternal and fetal DNA in maternal cell-free plasma. Two groups have previously shown that the majority of fetal DNA falls within the size range of mono-nucleosomes (<200-300bp), while maternal DNA is longer. 454 sequencing has a targeted read length of 250bp. We therefore interpreted the small peak at around 250bp (Figure 3 and Figure 8) as reflecting the instrument's limit in sequencing higher-molecular-weight fragments. We plotted the distribution of all reads and of those mapped to the Y chromosome (Figure 3). We observed a slight depletion of Y-chromosome reads at the higher end of the distribution: reads <220bp constitute 94% of Y-chromosome reads and 87% of total reads. Our results are not in complete agreement with previous findings, in that we do not see as dramatic an enrichment of fetal DNA at short lengths (22, 23). Future studies will be needed to resolve this point and to eliminate any potential residual bias in the 454 sample preparation process. It is worth noting, however, that sequencing individual plasma samples allows one to measure the distribution of length enrichment across many individual patients. Previous approaches measured only the average length enrichment of pooled patient samples.
Cell-Free Plasma DNA Shares Features of Nucleosomal DNA
Since the size distribution of cell-free plasma DNA suggested that plasma DNA is mainly of apoptotic origin, we investigated whether features of nucleosomal DNA and nucleosome positioning are found in plasma DNA. One such feature is nucleosome positioning around transcription start sites. Experimental data from yeast and human have suggested that nucleosomes are depleted in promoters upstream of transcription start sites and are well positioned near transcription start sites (27-30). We applied a 5bp window spanning ±1000bp of the transcription start sites of all RefSeq genes and counted the number of tags mapping to the sense and antisense strands within each window. A peak on the sense strand represents the beginning of a nucleosome, while a peak on the antisense strand represents its end. After smoothing, at least 3 well-positioned nucleosomes downstream of transcription start sites could be detected for most plasma DNA samples, and up to 5 in some cases. This is in rough accordance with the results of Schones et al. (27) (Figure 4). We applied the same analysis to sequence tags of randomly sheared genomic DNA and observed no obvious pattern in tag localization, although the density of tags was higher at the transcription start site (Figure 4).
Correction for sequencing bias
Figures 10 and 12 show results that may be obtained when sequence tag numbers are treated statistically based on data from the reference human genome. For example, sequence tags from fragments with higher G/C content may be overrepresented and suggest an aneuploidy where none exists. The sequence tag itself may not be informative, since ordinarily only a small portion of the fragment is sequenced, whereas it is the overall G/C content of the fragment that causes the bias. Thus there is provided a method, described in detail in Examples 8 and 10, for correcting this bias. This method may facilitate analysis of samples that otherwise would not produce statistically significant results.

The method corrects G/C bias of sequence reads from massively parallel sequencing of a genome. It comprises dividing the genome into a number of windows within each chromosome and calculating the G/C content of each window. These windows need not be the same as the windows used for calculating sequence tag density; they may be on the order of 10kb-30kb in length, for example. One then determines the relationship between sequence coverage and G/C content by counting the reads in each window and pairing that count with the window's G/C content, which is known from the human genome reference sequence. Windows with no reads or no G/C content are ignored. One then assigns a weight to the number of reads in each window (i.e., the number of sequence tags assigned to that window) based on its G/C content. G/C content ranges that yield more reads receive lower weights.
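The weighting step can be sketched as below. The text does not fix the weighting function here; it is detailed in Examples 8 and 10. This sketch therefore uses one simple choice, which is an assumption: each window's weight is the overall mean reads per window divided by the mean reads per window in its G/C bin. The 1% bin width and the function names are also hypothetical.

```python
from collections import defaultdict
from statistics import mean

def gc_weights(windows, bin_width=0.01):
    """windows: iterable of (gc_fraction, read_count) pairs, one per genomic window.
    Returns {gc_bin: weight}; weight = overall mean reads per window divided by
    the mean reads per window in that G/C bin, so bins that attract more reads
    get proportionally smaller weights (an assumed scheme, not the patent's)."""
    reads_by_bin = defaultdict(list)
    for gc, reads in windows:
        if reads > 0 and gc > 0:  # ignore windows with no reads or no G/C content
            reads_by_bin[round(gc / bin_width)].append(reads)
    overall = mean(r for counts in reads_by_bin.values() for r in counts)
    return {b: overall / mean(counts) for b, counts in reads_by_bin.items()}

def corrected_count(gc, reads, weights, bin_width=0.01):
    """Apply the weight of the window's G/C bin to its raw read count."""
    return reads * weights[round(gc / bin_width)]
```

After weighting, windows in over-sequenced high-G/C bins and under-sequenced low-G/C bins are pulled toward the same expected count. This is the effect the correction aims for.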
EXAMPLES
The examples below describe the direct sequencing of cell-free DNA from the plasma of pregnant women with high-throughput shotgun sequencing technology, obtaining on average 5 million sequence tags per patient sample. The sequences obtained were mapped to specific chromosomal locations. This enabled us to measure the over- and under-representation of chromosomes from an aneuploid fetus. The sequencing approach is polymorphism-independent and therefore universally applicable to the non-invasive detection of fetal aneuploidy. Using this method we successfully identified all 9 cases of trisomy 21 (Down syndrome), 2 cases of trisomy 18 and 1 case of trisomy 13 in a cohort of 18 normal and aneuploid pregnancies; trisomy was detected at gestational ages as early as the 14th week. Direct sequencing also allowed us to study the characteristics of cell-free plasma DNA, and we found evidence that this DNA is enriched for sequences from nucleosomes.
EXAMPLE 1: Subject Enrollment
The study was approved by the Institutional Review Board of Stanford University. Pregnant women at risk for fetal aneuploidy were recruited at the Lucile Packard Children's Hospital Perinatal Diagnostic Center of Stanford University from April 2007 to May 2008. Informed consent was obtained from each participant prior to the blood draw. Blood was collected 15 to 30 minutes after amniocentesis or chorionic villus sampling, except for 1 sample that was collected during the third trimester. Karyotype analysis was performed on the amniocentesis or chorionic villus samples to confirm fetal karyotype. Nine trisomy 21 (T21), 2 trisomy 18 (T18), 1 trisomy 13 (T13) and 6 normal singleton pregnancies were included in this study. The gestational age of the subjects at the time of blood draw ranged from 10 to 35 weeks (Table 1). A blood sample from a male donor was obtained from the Stanford Blood Center.
EXAMPLE 2: Sample Processing and DNA Quantification
Between 7 and 15ml of peripheral blood was drawn from each subject and donor and collected in EDTA tubes. Blood was centrifuged at 1600g for 10 minutes. Plasma was transferred to microcentrifuge tubes and centrifuged at 16000g for 10 minutes to remove residual cells. Both centrifugation steps were performed within 24 hours of blood collection. Cell-free plasma was stored at -80 °C until further processing and was frozen and thawed only once before DNA extraction. DNA was extracted from cell-free plasma using the QIAamp DNA Micro Kit (Qiagen) or the NucleoSpin Plasma Kit (Macherey-Nagel) according to the manufacturers' instructions. Genomic DNA was extracted from 200µl of whole blood from the donors using the QIAamp DNA Blood Mini Kit (Qiagen).

Microfluidic digital PCR (Fluidigm) was used to quantify the amounts of total and fetal DNA, using Taqman assays targeting the EIF2C1 locus on chromosome 1 and the SRY locus on chromosome Y, respectively:
- EIF2C1 (amplicon size: 81bp). Forward: 5' GTTCGGCTTTCACCAGTCT 3' (SEQ ID NO: 1); Reverse: 5' CTCCATAGCTCTCCCCACTC 3' (SEQ ID NO: 2); Probe: 5' HEX-GCCCTGCCATGTGGAAGAT-BHQ1 3' (SEQ ID NO: 3).
- SRY (amplicon size: 84bp). Forward: 5' CGCTTAACATAGCAGAAGCA 3' (SEQ ID NO: 4); Reverse: 5' AGTTTCGAACTCTGGCACCT 3' (SEQ ID NO: 5); Probe: 5' FAM-TGTCGCACTCTCCTTGTTTTTGACA-BHQ1 3' (SEQ ID NO: 6).

A Taqman assay targeting DYS14, a multi-copy locus on chromosome Y, was used for the initial determination of fetal sex from cell-free plasma DNA with traditional real-time PCR:
- DYS14 (amplicon size: 84bp). Forward: 5' ATCGTCCATTTCCAGAATCA 3' (SEQ ID NO: 6); Reverse: 5' GTTGACAGCCGTGGAATC 3' (SEQ ID NO: 7); Probe: 5' FAM-TGCCACAGACTGAACTGAATGATTTTC-BHQ1 3' (SEQ ID NO: 8).

PCR reactions were performed with 1x iQ Supermix (Bio-Rad), 0.1% Tween-20 (microfluidic digital PCR only), 300nM primers and 150nM probes. The PCR thermal cycling protocol was 95 °C for 10 min, followed by 40 cycles of 95 °C for 15s and 60 °C for 1 min. Primers and probes were purchased from IDT.
EXAMPLE 3: Sequencing
A total of 19 cell-free plasma DNA samples (18 from pregnant women and 1 from a male blood donor) and a genomic DNA sample from whole blood of the same male donor were sequenced on the Solexa/Illumina platform. Approximately 1 to 8ng of DNA fragments extracted from 1.3 to 5.6ml of cell-free plasma was used for sequencing library preparation (Table 1). Library preparation was carried out according to the manufacturer's protocol with slight modifications. Because cell-free plasma DNA is naturally fragmented, no further fragmentation by nebulization or sonication was performed on plasma DNA samples.
Genomic DNA from the male donor's whole blood was sonicated (Misonix XL-2020; 24 cycles of 30s sonication and 90s pause), yielding fragments between 50 and 400bp in size, with a peak at 150bp. Approximately 2ng of the sonicated genomic DNA was used for library preparation. Briefly, DNA samples were blunt-ended and ligated to universal adaptors. The amount of adaptors used for ligation was 500 times less than specified in the manufacturer's protocol. Eighteen cycles of PCR were performed to enrich for fragments with adaptors, using primers complementary to the adaptors. The size distributions of the sequencing libraries were analyzed with the DNA 1000 Kit on the 2100 Bioanalyzer (Agilent), and the libraries were quantified with microfluidic digital PCR (Fluidigm). The libraries were then sequenced on the Solexa 1G Genome Analyzer according to the manufacturer's instructions.

Cell-free plasma DNA from a pregnant woman carrying a normal male fetus was also sequenced on the 454/Roche platform. DNA fragments extracted from 5.6ml of cell-free plasma (equivalent to ~4.9ng of DNA) were used for sequencing library preparation. The sequencing library was prepared according to the manufacturer's protocol, with two exceptions: no nebulization was performed on the sample, and quantification was done with microfluidic digital PCR instead of capillary electrophoresis. The library was then sequenced on the 454 Genome Sequencer FLX System according to the manufacturer's instructions.
Electropherograms were obtained for the Solexa sequencing libraries prepared from cell-free plasma DNA from the 18 pregnant women and the male donor. The Solexa library prepared from sonicated whole-blood genomic DNA of the male donor was also examined. The libraries prepared from cell-free DNA all had peaks at 261bp on average (range: 256-264bp). After removal of the Solexa universal adaptor (92bp), the actual peak size of DNA fragments in plasma DNA is ~168bp, which corresponds to the size of a chromatosome.
EXAMPLE 4: Data Analysis
Shotgun Sequence Analysis
Solexa sequencing produced 36 to 50bp reads. The first 25bp of each read was mapped to human genome build 36 (hg18) using ELAND from the Solexa data analysis pipeline. Reads that mapped uniquely to the human genome with at most 1 mismatch were retained for analysis. To compare the coverage of different chromosomes, a sliding window of 50kb was applied across each chromosome, excluding regions of assembly gaps and microsatellites. The number of sequence tags falling within each window was counted, and the median value was chosen to represent the chromosome. Because the total number of sequence tags differed between samples, for each sample we normalized the sequence tag density of each chromosome (except chromosome Y) to the median sequence tag density among autosomes. The normalized values were used for comparison among samples in subsequent analysis.

We estimated the fetal DNA fraction from chromosome 21 for T21 cases, from chromosome 18 for T18 cases, from chromosome 13 for the T13 case, and from chromosomes X and Y for male pregnancies:
- For chromosomes 21, 18 and 13, the fetal DNA fraction was estimated as 2(x - 1). Here x was the ratio of the over-represented chromosome's sequence tag density in each trisomy case to the median sequence tag density of that chromosome in all disomy cases.
- For chromosome X, the fetal DNA fraction was estimated as 2(1 - x). Here x was the ratio of the chromosome X sequence tag density of each male pregnancy to the median chromosome X sequence tag density of all female pregnancies.
- For chromosome Y, the fetal DNA fraction was estimated as the ratio of the chromosome Y sequence tag density of each male pregnancy to that of the male donor's plasma DNA. Because a small number of chromosome Y sequences were detected in female pregnancies, we only considered sequence tags falling within transcribed regions on chromosome Y. We also subtracted the median number of tags in female pregnancies from all samples; this amounted to a correction of a few percent.
The width of the 99% confidence interval was calculated for all disomy 21 pregnancies as t·s/√N. Here N is the number of disomy 21 pregnancies, t is the t-statistic corresponding to α = 0.005 with N - 1 degrees of freedom, and s is the standard deviation. A confidence interval gives an estimated range of values that is likely to include an unknown population parameter, the estimated range being calculated from a given set of sample data. (Definition taken from Valerie J. Easton and John H. McColl's Statistics Glossary v1.1.)
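The threshold described above can be sketched as follows. The t·s/√N term is applied as the distance from the mean of the disomy densities to the upper bound. The critical value is passed in rather than computed so that the sketch needs only the standard library. It can be taken from a t table or from scipy.stats.t.ppf(0.995, N - 1). The function names are hypothetical.

```python
import math
from statistics import mean, stdev

def ci99_upper_bound(disomy_densities, t_crit):
    """Upper bound of the 99% confidence interval: mean + t*s/sqrt(N).

    t_crit is the t-statistic for alpha = 0.005 with N - 1 degrees of freedom."""
    n = len(disomy_densities)
    half_width = t_crit * stdev(disomy_densities) / math.sqrt(n)
    return mean(disomy_densities) + half_width

def exceeds_threshold(density, disomy_densities, t_crit):
    """True if a sample's chromosome density lies above the upper bound."""
    return density > ci99_upper_bound(disomy_densities, t_crit)
```

For example, with three disomy values of 0.98, 1.00 and 1.02, and t = 9.925 for 2 degrees of freedom, the upper bound is about 1.115.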
To investigate the distribution of sequence tags around transcription start sites, a sliding window of 5bp was applied from -1000bp to +1000bp of the transcription start sites of all RefSeq genes on all chromosomes except chromosome Y. The number of sequence tags mapped to the sense and antisense strands within each window was counted. A moving average with a window of 10 data points was used to smooth the data. All analyses were done with Matlab.
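The analysis was done in Matlab. The Python sketch below only illustrates the binning and smoothing steps. It assumes each tag is given as a (position, strand) pair and each TSS as a (position, gene_strand) pair. It also assumes that "sense" means the tag lies on the same strand as the gene and that offsets are measured in the direction of transcription. These conventions and the function names are assumptions.

```python
def tss_profile(tags, tss_sites, flank=1000, step=5):
    """Count tag positions in step-bp windows from -flank to +flank around each
    TSS, separately for sense (same strand as the gene) and antisense tags.

    tags: list of (position, strand) with strand "+" or "-"
    tss_sites: list of (position, gene_strand)"""
    n_bins = 2 * flank // step
    sense, antisense = [0] * n_bins, [0] * n_bins
    for tss, gene_strand in tss_sites:
        for pos, strand in tags:  # brute-force scan; adequate for a sketch
            # Measure the offset in the direction of transcription.
            offset = pos - tss if gene_strand == "+" else tss - pos
            if -flank <= offset < flank:
                b = (offset + flank) // step
                if strand == gene_strand:
                    sense[b] += 1
                else:
                    antisense[b] += 1
    return sense, antisense

def moving_average(values, width=10):
    """Unweighted moving average over `width` consecutive data points."""
    return [sum(values[i:i + width]) / width for i in range(len(values) - width + 1)]
```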
We selected for analysis the sequence tags that mapped uniquely to the human genome with at most 1 mismatch (on average ~5 million), and examined the distribution of reads across each chromosome. This distribution was non-uniform, possibly because of technical artifacts. We therefore divided each chromosome into non-overlapping windows of fixed width (in this analysis, 50kb), skipping regions of genome assembly gaps and regions with known microsatellite repeats. Each window should be wide enough to contain a sufficient number of sequence tags, yet narrow enough that there are enough windows to form a distribution. With increasing sequencing depth (i.e., an increasing total number of sequence tags), the window width can be reduced. The number of sequence tags in each window was counted, and the distribution of the number of sequence tags per 50kb was examined for each chromosome. The median number of sequence tags per 50kb (the 'sequence tag density') was chosen for each chromosome to suppress the effects of any under- or over-represented regions within the chromosome. The total number of sequence tags obtained differed between samples. To compare among samples, we therefore normalized each chromosomal sequence tag density value (except chromosome Y) by the median sequence tag density among all autosomes (non-sex chromosomes).
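The window layout with excluded regions can be sketched as below. The sketch assumes that assembly gaps and microsatellites are supplied as 0-based, half-open intervals for one chromosome, for example parsed from UCSC Genome Browser tables. Two details are assumptions, since the text does not specify them: any window overlapping an excluded interval is dropped entirely, and a shorter final window is kept. The function name is hypothetical.

```python
def usable_windows(chrom_length, excluded, window=50_000):
    """Yield (start, end) for non-overlapping windows that do not overlap any
    excluded interval. All intervals are 0-based, half-open [start, end)."""
    for start in range(0, chrom_length, window):
        end = min(start + window, chrom_length)  # last window may be shorter
        # Two half-open intervals overlap when each starts before the other ends.
        if not any(s < end and start < e for s, e in excluded):
            yield start, end
```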
For the 454/Roche data, reads were aligned to human genome build 36 (hg18; see http://genome.ucsc.edu/cgi-bin/hgGateway) using the 454 Reference Mapper. Reads with accuracy greater than or equal to 90% and coverage (i.e., fraction of the read mapped) greater than or equal to 90% were retained for analysis. To study the size distribution of total and fetal DNA, the number of retained reads falling within each 10bp window between 50bp and 330bp was counted. The number of reads falling within different size ranges may thus be studied, i.e., reads of 50-60bp, 60-70bp, 70-80bp and so on, up to about 320-330bp, which is around the maximum read length obtained.
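The 10bp binning above, and the "<220bp" comparison made in the size-distribution discussion, can be sketched as follows. The sketch assumes a list of integer read lengths for reads that already passed the accuracy and coverage filters. The bins are half-open, so 60bp falls in the 60-70bp bin. The function names are hypothetical.

```python
def length_histogram(read_lengths, lo=50, hi=330, bin_bp=10):
    """Count reads in half-open bins [lo, lo+bin_bp), ..., [hi-bin_bp, hi)."""
    counts = [0] * ((hi - lo) // bin_bp)
    for length in read_lengths:
        if lo <= length < hi:
            counts[(length - lo) // bin_bp] += 1
    return counts

def fraction_below(read_lengths, cutoff=220):
    """Fraction of reads shorter than `cutoff` bp."""
    return sum(1 for length in read_lengths if length < cutoff) / len(read_lengths)
```

Applying fraction_below separately to all reads and to chromosome Y reads gives the kind of comparison reported earlier (87% vs. 94%).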
EXAMPLE 5: Genome Data Retrieval
Information on G/C content, the locations of transcription start sites of RefSeq genes, and the locations of assembly gaps and microsatellites was obtained from the UCSC Genome Browser.
EXAMPLE 6: Nucleosome Enrichment
The distribution of sequence tags around the transcription start sites (TSS) of RefSeq genes was analyzed (data not shown). The plots were similar to Figure 4, each representing the distribution for one plasma DNA or gDNA sample. Data were obtained from three different sequencing runs:
- P1, P6, P52, P53, P26, P40 and P42 were sequenced together.
- Male genomic DNA, male plasma DNA, P2, P7, P14, P19 and P31 were sequenced together.
- P17, P20, P23, P57, P59 and P64 were sequenced together.

The second batch of samples suffered greater G/C bias, as observed from inter- and intra-chromosomal variation. Their distributions around the TSS showed a similar trend, with more tags at the TSS, but this trend was not as prominent as in the distributions of samples sequenced in the other runs. Nonetheless, at least 3 well-positioned nucleosomes were detectable downstream of transcription start sites for most plasma DNA samples. This suggests that cell-free plasma DNA shares features of nucleosomal DNA, a piece of evidence that this DNA is of apoptotic origin.
EXAMPLE 7: Calculating fetal DNA fraction in maternal plasma of male pregnancies
i. With Digital PCR Taqman Assays
Digital PCR is the amplification of single DNA molecules. The DNA sample is diluted and distributed across multiple compartments such that on average there is less than 1 copy of DNA per compartment. A compartment displaying fluorescence at the end of the PCR indicates the presence of at least one DNA molecule.
Assay for Total DNA: EIF2C1 (Chromosome 1)
Assay for Fetal DNA: SRY (Chromosome Y)
The count of positive compartments from the microfluidic digital PCR chip for each assay is converted to the most probable count according to the method described in the supporting information of the following reference: Warren L, Bryder D, Weissman IL, Quake SR (2006) Transcription factor profiling in individual hematopoietic progenitors by digital RT-PCR. Proc Natl Acad Sci USA 103:17807-12.
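The conversion to molecule counts and the fetal fraction formula ε = (SRY count) / (EIF2C1 count / 2) can be sketched as follows. The count conversion uses the standard Poisson correction, λ = -ln(1 - k/n) molecules per compartment for k positive compartments out of n. This is a stand-in and may differ from the estimator in the cited supporting information. The function names and compartment numbers in the test are hypothetical.

```python
import math

def most_probable_count(positive, total):
    """Estimate the number of molecules loaded from the number of fluorescent
    compartments, assuming molecules are Poisson-distributed across compartments.
    (Standard Poisson correction; a stand-in for the cited method.)"""
    if not 0 <= positive < total:
        raise ValueError("need 0 <= positive < total; a saturated panel cannot be corrected")
    return -total * math.log(1.0 - positive / total)

def fetal_fraction_dpcr(sry_count, eif2c1_count):
    """SRY has one copy per male fetal genome; EIF2C1 has two copies per genome."""
    return sry_count / (eif2c1_count / 2.0)
```

The correction matters most at high occupancy: 500 positive compartments out of 1000 correspond to about 693 molecules, because some compartments received more than one.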
Fetal DNA fraction: $\varepsilon = \dfrac{\text{SRY count}}{\text{EIF2C1 count}/2}$
ii. With Sequence Tags
From ChrX:
Let the fetal DNA fraction be $\varepsilon$.
Male pregnancies, ChrX sequence tag density (fetal and maternal):
$$2(1-\varepsilon) + \varepsilon = 2 - \varepsilon$$
Female pregnancies, ChrX sequence tag density (fetal and maternal):
$$2(1-\varepsilon) + 2\varepsilon = 2$$
Let x be the ratio of ChrX sequence tag density of male to female pregnancies. In this study, the denominator of this ratio is taken to be the median sequence tag density of all female pregnancies.
Thus, the fetal DNA fraction is $\varepsilon = 2(1 - x)$.
From ChrY:
Fetal DNA fraction: $\varepsilon = \dfrac{\text{sequence tag density of ChrY in maternal plasma}}{\text{sequence tag density of ChrY in male plasma}}$
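The chromosome X and chromosome Y estimators can be sketched as below. The chromosome X inputs are assumed to be autosome-normalized densities. The chromosome Y inputs are assumed to be tag counts restricted to transcribed regions. Following the data-analysis description, the median chromosome Y count seen in female pregnancies is subtracted from every sample. The function names are hypothetical.

```python
from statistics import median

def fetal_fraction_chrx(male_chrx_density, female_chrx_densities):
    """epsilon = 2(1 - x), with x = male chrX density / median female chrX density."""
    x = male_chrx_density / median(female_chrx_densities)
    return 2.0 * (1.0 - x)

def fetal_fraction_chry(maternal_chry, adult_male_chry, female_chry_background):
    """epsilon = chrY tag density in maternal plasma / chrY tag density in adult
    male plasma, after subtracting the median female-pregnancy chrY background
    from both values."""
    background = median(female_chry_background)
    return (maternal_chry - background) / (adult_male_chry - background)
```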
Note that these derivations assume that the total number of sequence tags obtained is the same for all samples. In reality, the total number of sequence tags differs between samples. We have taken these differences into account when estimating the fetal DNA fraction by normalizing the sequence tag density of each chromosome to the median of the autosomal sequence tag densities for each sample.
Calculating fetal DNA fraction in maternal plasma of aneuploid (trisomy) pregnancies:
Let the fetal DNA fraction be $\varepsilon$.
Trisomic pregnancies, trisomic chromosome sequence counts (fetal and maternal):
$$2(1-\varepsilon) + 3\varepsilon = 2 + \varepsilon$$
Disomic pregnancies, trisomic chromosome sequence counts (fetal and maternal):
$$2(1-\varepsilon) + 2\varepsilon = 2$$
Let x be the ratio of trisomic chromosome sequence counts (or sequence tag density) of trisomic to disomic pregnancies. In this study, the denominator of this ratio is taken to be the median sequence tag density of all disomic pregnancies.
Thus, the fetal DNA fraction is $\varepsilon = 2(x - 1)$.
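The trisomy estimator can be written directly from the derivation above. The sketch assumes autosome-normalized densities for the chromosome of interest, taking x as the case's density divided by the median over disomic pregnancies. The helper expected_ratio encodes the forward relation x = 1 + ε/2 and is included only as a consistency check. Both function names are hypothetical.

```python
from statistics import median

def expected_ratio(epsilon):
    """Expected trisomic/disomic density ratio for fetal fraction epsilon: 1 + epsilon/2."""
    return 1.0 + epsilon / 2.0

def fetal_fraction_trisomy(trisomic_density, disomic_densities):
    """epsilon = 2(x - 1), with x = the case's density of the trisomic chromosome
    divided by the median density of that chromosome in disomic pregnancies."""
    x = trisomic_density / median(disomic_densities)
    return 2.0 * (x - 1.0)
```

For instance, an 11% increase in chromosome 21 density (x = 1.11) corresponds to a fetal DNA fraction of 0.22.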
EXAMPLE 8: Correction of sequence tag density bias resulting from G/C or A/T content among different chromosomes in a sample
This example shows a refinement of the results that identify sequences mapping to different chromosomes and allow the count of different chromosomes, or regions thereof, to be determined. That is, the results shown in Figure 1A may be corrected to eliminate the variation in sequence tag density seen for chromosomes with higher G/C content, shown towards the right of the figure. This spread of values results from sequencing bias in the method used, in which the number of reads obtained depends on G/C content. The results of the method of this example are shown in Figure 10, an overlay of the results from a number of different samples, as indicated in the legend. The sequence tag density values in Figures 1 and 10 were normalized to those of a male genomic DNA control. This is because the density values are not always 1 for all chromosomes (even after G/C correction), although they are consistent across samples. For example, after G/C correction, values from all samples for chr19 cluster around 0.8 (not shown). Plotting each value relative to the male gDNA control adjusts the data to a nominal value of 1, so that the values for all chromosomes cluster around 1.
Outlying chromosome sequence tag densities appear significantly above the median sequence tag density; disomic chromosomes cluster about a line at a density value of about 1. As can be seen there, the results from chromosome 19 (far right, highest in G/C content), for example, now show a value similar to that of the other disomic chromosomes when disomic. The variation between chromosomes with low and high G/C content is thus eliminated from the data to be examined, and samples that previously could not be interpreted unambiguously (such as P13 in the present study) now may be. Because G/C content and A/T content are complementary (together they make up 100% of the sequence), the present method corrects for both. Either G/C bias or A/T bias can result from different sequencing methods. For example, others have reported that the Solexa method yields a higher number of reads from sequences with high G/C content. See Dohm et al., "Substantial biases in ultra-short read data sets from high-throughput DNA sequencing," Nuc. Acids Res. 36(16), e105; doi:10.1093/nar/gkn425. The procedure of the present example consists of the following steps:
a. Calculate the G/C content of the human genome. Calculate the G/C content of every non-overlapping 20kb window of each chromosome of the human genome (hg18) using the hgGcPercent utility from the UCSC Genome Browser's "kent source tree," which contains various utility programs and is available to the public under license. The output file contains the coordinates of each 20kb bin and the corresponding G/C content. It was found that a large number of reads were obtained at higher G/C ranges (about 55-70%) and very few reads were obtained at lower G/C percentages, with essentially none below about 30% G/C (data not shown). Because the actual length of a sequenced DNA fragment is not known (only the first 25bp of one end of each piece of DNA on the flow cell was sequenced), and it is the G/C content of the entire piece of DNA that contributes to sequencing bias, an arbitrary window of known human genomic DNA sequence is chosen for determining the G/C content of different reads. We chose a 20kb window to examine the relationship between number of reads and G/C content. The window can be much smaller, e.g., 10kb or 5kb, but a size of 20kb makes computation easier.
b. Calculate the relationship between sequence coverage and G/C content, and assign a weight to each read according to G/C content. For each sample, the number of reads per 20kb bin is counted and plotted against G/C content. The average number of reads is calculated for every 0.1% of G/C content, ignoring bins with no reads, bins with zero G/C percent, and bins with over-abundant reads. The weight for a particular G/C percent is the reciprocal of the ratio of the average number of reads at that G/C percent to the global median number of reads. Each read is then assigned a weight according to the G/C percent of the 20kb window it falls into.
c. Investigate the distribution of reads across each autosome and chromosome X. In this step, the number of reads, both unweighted and weighted, in each non-overlapping 50kb window is recorded. For counting, we chose a 50kb window in order to obtain a reasonable number of reads per window and a reasonable number of windows per chromosome for examining the distributions. Window size may be selected based on the number of reads obtained in a given experiment and may vary over a wide range; for example, windows of 30kb to 100kb may be used. Known microsatellite regions are ignored. Figure 11 shows the results for chr1 of sample P7 and illustrates the weights assigned to different G/C contents in this step (c): reads with higher G/C content are over-represented relative to the average and are therefore given less weight.
d. Investigate the distribution of reads across chrY. Chromosome Y is treated separately because it is short and has many repeats; even sequence data from a female genome will map in part to chromosome Y, owing to sequencing and alignment errors. The number of chrY reads in transcribed regions, after applying weights to the reads on chrY, is calculated and used to estimate the percentage of fetal DNA in the sample.
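Steps a to c above can be sketched in Python as below. This is an illustrative sketch, not the authors' MATLAB code or the hgGcPercent program: all function names are hypothetical, the handling of N bases, the `max_reads` over-abundance cutoff and the fallback weight of 1.0 are assumptions, and step d is omitted because the source does not give its formula.

```python
from collections import defaultdict
from statistics import mean, median


def gc_percent_windows(seq, window=20_000):
    """Step a: G/C percent of each non-overlapping window of one chromosome.

    Returns (start, end, gc_percent) tuples with 0-based, half-open coordinates.
    Assumption: non-ACGT bases (e.g. N) are excluded from the denominator, and
    a window with no A/C/G/T bases gets gc_percent None.
    """
    seq = seq.upper()
    out = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        called = gc + chunk.count("A") + chunk.count("T")
        out.append((start, start + len(chunk), 100.0 * gc / called if called else None))
    return out


def gc_weights(bins, max_reads=None):
    """Step b: weight per 0.1% G/C bin from (gc_percent, read_count) per window.

    Windows with no reads, zero or unknown G/C, or more reads than the
    hypothetical max_reads cutoff are ignored. Keys are tenths of a percent
    (412 means 41.2%).
    """
    kept = [(gc, n) for gc, n in bins
            if n > 0 and gc and (max_reads is None or n <= max_reads)]
    global_median = median(n for _, n in kept)
    by_gc = defaultdict(list)
    for gc, n in kept:
        by_gc[round(gc * 10)].append(n)
    # weight = 1 / (average reads at this G/C / global median)
    return {key: global_median / mean(counts) for key, counts in by_gc.items()}


def read_weights(positions, gc_windows, table, window=20_000):
    """Step b (cont.): give each read the weight of the window it falls in.

    Assumption: reads in windows whose G/C bin has no weight keep weight 1.0.
    """
    weights = []
    for pos in positions:
        gc = gc_windows[pos // window][2]
        key = None if gc is None else round(gc * 10)
        weights.append(table.get(key, 1.0))
    return weights


def window_counts(positions, weights, chrom_length, window=50_000, masked=()):
    """Step c: unweighted and weighted read counts per non-overlapping window.

    positions are 0-based read coordinates; masked holds half-open (start, end)
    intervals to skip, e.g. known microsatellite regions.
    """
    n_windows = -(-chrom_length // window)  # ceiling division
    raw = [0] * n_windows
    weighted = [0.0] * n_windows
    for pos, w in zip(positions, weights):
        if any(s <= pos < e for s, e in masked):
            continue
        raw[pos // window] += 1
        weighted[pos // window] += w
    return raw, weighted
```

Keying the weight table on integer tenths of a percent, rather than on rounded floats, avoids lookups failing because of floating-point representation.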
EXAMPLE 9: Comparing different patient samples using statistical analyses (t statistic)
This example shows another refinement of the results obtained using the previous examples. In this case, multiple patient samples are analyzed in a single process. Figure 12 illustrates the results of an analysis of patients P13, P19, P31, P23, P26, P40, P42, P1, P2, P6, P7, P14, P17, P20, P52, P53, P57, P59 and P64, with their respective karyotypes indicated as in Table 1, above. The dotted line shows the 99% confidence interval, so outliers may be quickly identified. Looking below the line, male fetuses show under-representation of chromosome X (solid triangles). An exception is P19, for which it is believed there were not enough total reads for this analysis. Looking above the line, the trisomy 21 patients (solid circles) are P1, P2, P6, P7, P14, P17, P20, P52 and P53. P57 and P59 have trisomy 18 (open diamonds) and P64 has trisomy 13 (star). This method may be presented as the following three-step process:
Step 1: Calculate a t statistic for each chromosome relative to every other chromosome in a sample. Each t statistic indicates the value of each chromosome's median relative to the other chromosomes, taking into account the number of reads mapped to each chromosome (since the variation of the median scales with the number of reads). As described above, the present analyses yielded about 5 million reads per sample. Although one may obtain 3-10 million reads per sample, these are short reads, typically only about 20-100 bp, so one has actually sequenced only, for example, about 300 million of the 3 billion bp in the human genome. Statistical methods are therefore used in which the sample is small and the standard deviation of the population (3 billion bp, or 47 million bp for chromosome 21) is unknown and must be estimated from the sample's reads in order to determine the significance of a numerical variation. One way to do this is to use Student's t-distribution, which may be used in place of the normal distribution expected for a larger sample. The t statistic is computed from the sample data and evaluated against the t-distribution; the formula used for this calculation is given below. Other t-tests can also be used with the methods presented here.
Step 2: Calculate the average t statistic matrix by averaging the values from all samples with disomic chromosomes. Each patient's sample data are placed in a t matrix whose rows are chr1 to chr22 and whose columns are also chr1 to chr22. Each cell holds the t value from comparing the chromosomes in the corresponding row and column (e.g., position (2,1) in the matrix is the t value when testing chr2 against chr1). The diagonal of the matrix is 0, and the matrix is symmetric up to sign (swapping the two chromosomes negates t). The number of reads mapping to a chromosome is thus compared individually to each of chr1-22.
Step 3: Subtract the average t statistic matrix from the t statistic matrix of each patient sample. For each chromosome, the median of the differences in t statistic is selected as the representative value. The t statistic threshold used for 99% confidence with a large number of samples is 3.09, and any chromosome with a representative t statistic outside the range -3.09 to 3.09 is determined to be non-disomic.
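The three steps above can be sketched in Python roughly as follows. This is a hypothetical illustration, not the authors' implementation: it uses the large-sample two-sample t statistic on per-window read counts (means rather than medians), represents matrices as nested dicts keyed by chromosome name, and leaves the self-comparison out of the Step 3 median, which is an assumption.

```python
import math
from statistics import mean, median, stdev


def t_stat(a, b):
    """Two-sample t statistic for per-window read counts of two chromosomes:
    (mean(a) - mean(b)) / sqrt(s_a^2/n_a + s_b^2/n_b)."""
    return (mean(a) - mean(b)) / math.sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))


def t_matrix(sample):
    """Step 1: sample maps chromosome name -> list of per-window read counts.
    Entry [i][j] is t(i vs j); the diagonal is 0 and [j][i] == -[i][j]."""
    chroms = list(sample)
    return {i: {j: 0.0 if i == j else t_stat(sample[i], sample[j]) for j in chroms}
            for i in chroms}


def average_matrix(matrices):
    """Step 2: element-wise mean of t matrices from disomic reference samples."""
    chroms = list(matrices[0])
    return {i: {j: mean(m[i][j] for m in matrices) for j in chroms} for i in chroms}


def call_non_disomic(sample, reference_avg, threshold=3.09):
    """Step 3: median of (sample t - average t) per chromosome, then threshold.

    Returns {chromosome: (representative_t, is_non_disomic)}.
    """
    m = t_matrix(sample)
    calls = {}
    for i in m:
        diffs = [m[i][j] - reference_avg[i][j] for j in m[i] if j != i]
        rep = median(diffs)
        calls[i] = (rep, abs(rep) > threshold)
    return calls
```

Taking the median over all 21 comparisons is what keeps one aneuploid chromosome from dragging every other chromosome's representative value past the threshold.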
EXAMPLE 10: Calculation of required number of sequence reads after G/C bias correction
In this example, a method is presented that was used to calculate the minimum concentration of fetal DNA in a sample that would be needed to detect an aneuploidy, given the number of reads obtained for the chromosome in question (all chromosomes except chromosome Y). Figure 13 and Figure 14 show results obtained from 19 patient plasma DNA samples, 1 donor plasma DNA sample, and duplicate runs of a donor gDNA sample. Figure 13 indicates that the minimum fetal DNA % at which over-representation of chr21 can be detected at the best sampling rate (~70k reads mapped to chr21) is ~6% (indicated by solid lines in Fig. 13). The lines are drawn between about $0.7 \times 10^5$ reads and 6% fetal DNA concentration. With higher numbers of reads (not exemplified here), the required fetal DNA percentage can be expected to drop, probably to about 4%.
In Figure 14, the data from Figure 13 are presented on logarithmic scales. This shows that the minimum required fetal DNA concentration follows a straight line with a slope of about -0.5, i.e., it scales inversely with the square root of the number of reads. These calculations were carried out as follows:
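To make the square-root relationship concrete, here is a small Python sketch that extrapolates from the reported operating point (~70k chr21 reads at ~6% fetal DNA). It assumes the slope is exactly -0.5 everywhere, which is an idealization; the function names are hypothetical, and the output is an illustration of the scaling, not a result from the study.

```python
def required_reads(target_pct, ref_pct=6.0, ref_reads=70_000):
    """Reads needed to reach target_pct, assuming the minimum detectable fetal
    DNA % scales exactly as 1/sqrt(reads) from the reference point."""
    return ref_reads * (ref_pct / target_pct) ** 2


def min_fetal_pct(reads, ref_pct=6.0, ref_reads=70_000):
    """Inverse of required_reads under the same 1/sqrt(reads) assumption."""
    return ref_pct * (ref_reads / reads) ** 0.5
```

Under this idealization, lowering the detection limit from 6% to 4% takes (6/4)^2 = 2.25 times as many chr21 reads, i.e., about 157,500.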
For large n (n > 30), the t statistic is
$$t = \frac{\bar{y}_2 - \bar{y}_1}{\sqrt{s_1^2/n_1 + s_2^2/n_2}},$$
where $\bar{y}_2 - \bar{y}_1$ is the difference in means (or amount of over- or under-representation of a particular chromosome) to be measured; $s$ is the standard deviation of the number of reads per 50kb window in a particular chromosome; and $n$ is the number of samples (i.e., the number of 50kb windows per chromosome). Since the number of 50kb windows per chromosome is fixed, $n_1 = n_2 = n$. If we assume that $s_1 \approx s_2$, then
$$\bar{y}_2 - \bar{y}_1 \approx t\sqrt{\tfrac{2}{n}}\,s_1 = \sqrt{2} \times (\text{half width of the confidence interval})$$
at the confidence level governed by the value of t. Thus,
$$\frac{\bar{y}_2}{\bar{y}_1} - 1 \approx t\sqrt{\tfrac{2}{n}}\,\frac{s_1}{\bar{y}_1}.$$
For every chromosome in every sample, we can calculate the value $t\sqrt{2/n}\,s_1/\bar{y}_1$, which corresponds to the minimum over- or under-representation ($= \bar{y}_2/\bar{y}_1 - 1$) that can be resolved with the confidence level governed by the value of t. Note that $2(\bar{y}_2/\bar{y}_1 - 1) \times 100\%$ corresponds to the minimum fetal DNA % at which any over- or under-representation of chromosomes can be detected. We expect the number of reads mapped to each chromosome to play a role in determining the standard deviation $s_1$, since for a Poisson distribution the standard deviation equals the square root of the mean. By plotting $2(\bar{y}_2/\bar{y}_1 - 1) \times 100\%$ vs. the number of reads mapped to each chromosome in all the samples, we can evaluate the minimum fetal DNA % at which any over- or under-representation of chromosomes can be detected given the current sampling rate.
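The quantity plotted here, 2 × t × sqrt(2/n) × s/ȳ × 100%, can be computed per chromosome with a few lines of Python. This sketch is hypothetical: it takes one chromosome's per-50kb read counts and a t value chosen by the caller, and it uses the mean and sample standard deviation of those counts, following the derivation's assumption that both groups share the same s and n.

```python
import math
from statistics import mean, stdev


def min_detectable_fetal_pct(window_counts, t):
    """Minimum fetal DNA % at which over- or under-representation of this
    chromosome can be resolved: 2 * t * sqrt(2/n) * s / mean * 100."""
    n = len(window_counts)              # number of 50 kb windows
    y_bar = mean(window_counts)         # mean reads per window
    s = stdev(window_counts)            # sample standard deviation
    min_ratio_minus_one = t * math.sqrt(2.0 / n) * s / y_bar
    return 2.0 * min_ratio_minus_one * 100.0
```

Under Poisson counting, s is about sqrt(ȳ), so the result shrinks roughly as 1/sqrt(total reads on the chromosome).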
After correction of G/C bias, the number of reads per 50kb window for all chromosomes (except chromosome Y) is normally distributed. However, we observed outliers in some chromosomes (e.g., a sub-region in chromosome 9 has near-zero representation, and a sub-region in chromosome 20 near the centromere has unusually high representation) that affect the calculation of the standard deviation and the mean. We therefore chose to calculate the confidence interval of the median instead of the mean, to avoid the effect of outliers on the confidence interval. We do not expect the confidence intervals of the median and the mean to be very different once the small number of outliers has been removed. The 99.9% confidence interval of the median for each chromosome is estimated by bootstrapping 5000 samples from the 50kb read distribution data using the percentile method. The half width of the confidence interval is estimated as 0.5 × the width of the confidence interval. We plot 2 × (half width of the confidence interval of the median)/median × 100% vs. the number of reads mapped to each chromosome for all samples.
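A percentile-method bootstrap of the median, as described in the preceding paragraph, might look like this in Python. It is a sketch under stated assumptions, not the authors' MATLAB code: the function names and fixed seed are hypothetical, and the nearest-rank rule for picking the 0.05th and 99.95th percentiles of the 5000 bootstrap medians is one of several reasonable conventions.

```python
import random
from statistics import median


def bootstrap_median_ci(data, n_boot=5000, level=0.999, seed=0):
    """Percentile-method bootstrap confidence interval for the median.

    Resamples `data` with replacement n_boot times, sorts the resampled
    medians and reads off the (1-level)/2 and 1-(1-level)/2 percentiles.
    """
    rng = random.Random(seed)
    n = len(data)
    boot = sorted(median(rng.choices(data, k=n)) for _ in range(n_boot))
    alpha = (1.0 - level) / 2.0
    lo = boot[round(alpha * (n_boot - 1))]
    hi = boot[round((1.0 - alpha) * (n_boot - 1))]
    return lo, hi


def min_fetal_pct_from_median_ci(data, **kwargs):
    """2 * (half width of the median's CI) / median * 100, as plotted."""
    lo, hi = bootstrap_median_ci(data, **kwargs)
    half_width = 0.5 * (hi - lo)
    return 2.0 * half_width / median(data) * 100.0
```

Working with the median keeps isolated outlier windows, such as the chromosome 9 and chromosome 20 sub-regions noted above, from inflating the interval.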
Bootstrap resampling and other computer-implemented calculations described here were carried out in MATLAB®, available from The MathWorks, Natick, MA.

CONCLUSION
The above specific description is meant to exemplify and illustrate the invention and should not be seen as limiting the scope of the invention, which is defined by the literal and equivalent scope of the appended claims. Any patents or publications mentioned in this specification are intended to convey details of methods and materials useful in carrying out certain aspects of the invention which may not be explicitly set out but which would be understood by workers in the field. Such patents or publications are hereby incorporated by reference to the same extent as if each was specifically and individually incorporated by reference, as needed for the purpose of describing and enabling the method or material referred to.
