Extracting CYP2C19 Star Alleles from 23andMe — Plavix (Clopidogrel) Response Prediction in Python
Python tutorial: parse 23andMe raw data for CYP2C19 *2, *3, and *17 variants, predict your clopidogrel (Plavix) response phenotype, and understand why ~20% of Koreans get suboptimal antiplatelet protection on standard dosing. Includes the SNP-to-star-allele lookup, phenotype interpretation, and where DTC data falls short.
The Stent + Plavix Population That Should Care
Roughly 100,000 Koreans receive a coronary stent (PCI) every year. Almost all leave the hospital on dual antiplatelet therapy: aspirin + clopidogrel (Plavix). The two drugs reduce in-stent thrombosis risk by ~80%.
Except — about 15-20% of Koreans are CYP2C19 poor metabolizers and clopidogrel doesn't work properly for them. The drug is a prodrug; it has to be activated by the CYP2C19 enzyme into its active metabolite. If your CYP2C19 is impaired, you get suboptimal antiplatelet effect even on the standard 75 mg/day, and your in-stent thrombosis risk rises 1.5-2× vs normal metabolizers.
This matters because:
- A simple genetic test would catch this risk
- Alternative antiplatelets exist (prasugrel, ticagrelor) that don't require CYP2C19 activation
- Most Korean PCI patients never get tested
This tutorial shows how to extract the CYP2C19 information from a 23andMe raw data file. It's the practical sibling of Reading 23andMe Raw Data for CYP2D6 Star Alleles in Python but for CYP2C19 — the one that matters for Plavix.
CYP2C19 Quick Background
- Gene location: chromosome 10q23.33
- What it does: oxidative drug metabolism (CYP450 family)
- Major substrates: clopidogrel, proton pump inhibitors (omeprazole, esomeprazole), some SSRIs (citalopram, escitalopram), some other antidepressants, and antifungals (voriconazole)
- Key alleles (PharmVar definitions):
| Allele | Function | Defining SNP (rsid) | Variant base |
|---|---|---|---|
| *1 | Normal | (reference) | — |
| *2 | No function | rs4244285 | A (681G>A) |
| *3 | No function | rs4986893 | A (636G>A) |
| *17 | Increased function | rs12248560 | T (-806C>T) |
For Plavix specifically, you need to know whether you carry *2 or *3 (no function) or *17 (increased function, the rapid-metabolizer flip side).
Population frequencies
| Population | *2 carrier % | *3 carrier % | *17 carrier % | Poor metabolizers |
|---|---|---|---|---|
| Korean | 25-30 | 6-10 | 5-10 | 15-20% |
| Japanese | 25-30 | 7-10 | 5-10 | 15-20% |
| Chinese | 25-30 | 5-8 | 5-10 | 14-18% |
| European | 12-15 | <1 | 22-28 | 2-5% |
| African American | 14-18 | <1 | 18-22 | 3-6% |
Korea has a roughly 5× higher poor-metabolizer rate than European populations (15-20% vs 2-5% in the table above). This is one of the cleanest examples of why pharmacogenomic guidance from Western trials may underestimate the importance of testing in Korean clinical practice.
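The fold difference can be sanity-checked from the table itself. This is a minimal sketch that uses the midpoint of each poor-metabolizer range; taking midpoints is an assumption made here for illustration, not a published point estimate.

```python
# Poor-metabolizer ranges (percent), copied from the table above.
pm_ranges = {
    "Korean": (15, 20),
    "Japanese": (15, 20),
    "Chinese": (14, 18),
    "European": (2, 5),
    "African American": (3, 6),
}

def midpoint(low, high):
    """Midpoint of a range; an illustrative summary, not a point estimate."""
    return (low + high) / 2

baseline = midpoint(*pm_ranges["European"])
fold_vs_european = {
    pop: round(midpoint(*rng) / baseline, 1) for pop, rng in pm_ranges.items()
}
print(fold_vs_european["Korean"])  # 17.5 / 3.5 = 5.0
```

Using the range endpoints instead of midpoints would give anything from 3× to 10×, so "about 5×" is best read as an order-of-magnitude statement.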
Python Tutorial
Load 23andMe Raw Data
```python
import pandas as pd

snps = pd.read_csv(
    'genome_Firstname_Lastname_Full_v5_Full_20260101230000.txt',
    sep='\t',
    comment='#',  # skip the header block at the top of the file
    names=['rsid', 'chromosome', 'position', 'genotype'],
    dtype={'rsid': str, 'chromosome': str, 'position': int, 'genotype': str},
)
```
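To see what these `read_csv` options do without touching a real export, here is a small sketch that parses a synthetic file held in memory. The rows are made up, and `load_23andme` is a hypothetical helper name, not part of any library. It also counts no-calls, which appear as `--` in the genotype column.

```python
import io
import pandas as pd

# Synthetic stand-in for a 23andMe export: made-up rows, same column layout.
SAMPLE = (
    "# This data file generated by 23andMe\n"
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs0000001\t1\t1000\tAA\n"
    "rs0000002\t10\t2000\t--\n"
    "rs0000003\tX\t3000\tG\n"
)

def load_23andme(source):
    """Read a 23andMe-style TSV from a path or a file-like object."""
    return pd.read_csv(
        source,
        sep="\t",
        comment="#",  # header lines start with '#'
        names=["rsid", "chromosome", "position", "genotype"],
        dtype={"rsid": str, "chromosome": str, "position": int, "genotype": str},
    )

demo = load_23andme(io.StringIO(SAMPLE))
no_calls = int((demo["genotype"] == "--").sum())
print(f"{len(demo)} rows, {no_calls} no-call(s)")
```

Reading `chromosome` as a string matters because the column mixes numbers with `X`, `Y`, and `MT`.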
Locate CYP2C19 Region
CYP2C19 lives on chromosome 10, GRCh37 coordinates ~96,522,000-96,613,000.
```python
cyp2c19_snps = snps[
    (snps['chromosome'] == '10') &
    (snps['position'].between(96_520_000, 96_615_000))
]
print(f"CYP2C19 region SNPs detected: {len(cyp2c19_snps)}")
print(cyp2c19_snps)
```
A typical 23andMe v5 chip returns 8-15 SNPs in this region. Most are intronic; you care about three.
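Before interpreting anything, it helps to confirm that the three SNPs you care about are actually in the extract, since chip versions differ. A minimal sketch, assuming a DataFrame with an `rsid` column; `missing_targets` and the demo rows are hypothetical.

```python
import pandas as pd

# Defining SNPs for the three star alleles covered in this tutorial.
TARGET_RSIDS = {"rs4244285": "*2", "rs4986893": "*3", "rs12248560": "*17"}

def missing_targets(region_df, targets=TARGET_RSIDS):
    """Return the star alleles whose defining rsid is absent from the data."""
    present = set(region_df["rsid"])
    return sorted(allele for rsid, allele in targets.items() if rsid not in present)

# Made-up region extract in which the *17 marker is absent.
demo_region = pd.DataFrame({
    "rsid": ["rs4244285", "rs4986893"],
    "genotype": ["GA", "GG"],
})
print(missing_targets(demo_region))  # ['*17']
```

An absent marker means "unknown", not "reference": a missing *17 SNP should not be read as a normal result.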
Check Star Alleles
```python
# Defining SNP and variant base for each star allele (see the table above)
cyp2c19_lookup = {
    'rs4244285': {'allele': '*2', 'variant_base': 'A', 'effect': 'no function'},
    'rs4986893': {'allele': '*3', 'variant_base': 'A', 'effect': 'no function'},
    'rs12248560': {'allele': '*17', 'variant_base': 'T', 'effect': 'increased function'},
}

def check_cyp2c19(snps_df, lookup):
    """For each defining SNP, report how many variant copies the genotype carries."""
    results = []
    for rsid, info in lookup.items():
        row = snps_df[snps_df['rsid'] == rsid]
        if len(row) == 0:
            # SNP is not present in this file
            results.append({
                'rsid': rsid, 'allele': info['allele'], 'effect': info['effect'],
                'genotype': 'NOT_TESTED', 'carries': '?',
            })
            continue
        gt = row.iloc[0]['genotype']
        v = info['variant_base']
        if not isinstance(gt, str) or len(gt) != 2 or not set(gt) <= set('ACGT'):
            # No-call ('--') or unexpected value: do not report it as "no"
            carries = '?'
        else:
            carries = {0: 'no', 1: 'heterozygous (1 copy)', 2: 'homozygous (2 copies)'}[gt.count(v)]
        results.append({
            'rsid': rsid, 'allele': info['allele'], 'effect': info['effect'],
            'genotype': gt, 'carries': carries,
        })
    return pd.DataFrame(results)

stars = check_cyp2c19(cyp2c19_snps, cyp2c19_lookup)
print(stars)
```
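Counting variant bases in the genotype string only works for clean two-letter calls. 23andMe reports uncalled positions as `--`, and a naive count would turn those into "0 copies". Below is a hedged sketch of a stricter counter; `count_variant_copies` is a hypothetical helper and assumes an autosomal SNP with a two-character genotype.

```python
def count_variant_copies(genotype, variant_base):
    """Return 0, 1, or 2 copies of variant_base, or None if the call is unusable."""
    valid_bases = set("ACGT")
    if not isinstance(genotype, str) or len(genotype) != 2:
        return None  # missing value or unexpected length
    if not set(genotype) <= valid_bases:
        return None  # '--' no-call, or 'I'/'D' indel codes
    return genotype.count(variant_base)

for gt in ["GG", "GA", "AA", "--"]:
    print(gt, count_variant_copies(gt, "A"))
```

Returning `None` keeps "no variant found" and "could not be determined" separate, which matters once the counts feed a phenotype call.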
Predict Phenotype
Combine the alleles into a phenotype following CPIC guidelines:
```python
def cyp2c19_phenotype(stars_df):
    """Map per-allele carrier status to a (phenotype, clinical note) pair."""
    has = {row['allele']: row['carries'] for _, row in stars_df.iterrows()}
    s2 = has.get('*2', '?')
    s3 = has.get('*3', '?')
    s17 = has.get('*17', '?')

    # A missing or uncalled SNP must not be silently treated as "no variant"
    if '?' in (s2, s3, s17):
        return "Indeterminate", "One or more defining SNPs not called; clinical testing needed"

    # Allele counts (simplified; assumes the variants sit on separate haplotypes)
    copies = {'no': 0, 'heterozygous (1 copy)': 1, 'homozygous (2 copies)': 2}
    no_func_alleles = copies[s2] + copies[s3]
    increased = copies[s17]

    if no_func_alleles >= 2:
        return "Poor Metabolizer (PM)", "Clopidogrel: avoid; use prasugrel or ticagrelor"
    if no_func_alleles == 1 and increased == 0:
        return "Intermediate Metabolizer (IM)", "Clopidogrel: reduced efficacy; consider alternatives"
    if no_func_alleles == 1 and increased >= 1:
        # CPIC groups one no-function allele plus *17 (e.g. *2/*17) with the IMs
        return "Intermediate Metabolizer (IM)", "Clopidogrel: reduced efficacy expected despite *17; consider alternatives"
    if increased >= 1:
        return "Rapid/Ultrarapid Metabolizer (RM/UM)", "Clopidogrel: standard dose; SSRI/PPI may need dose review"
    return "Normal Metabolizer (NM)", "Standard clopidogrel dose appropriate"

phenotype, clinical_note = cyp2c19_phenotype(stars)
print(f"Phenotype: {phenotype}")
print(f"Clinical implication: {clinical_note}")
```
Sample output for a hypothetical Korean user:
```
         rsid allele              effect genotype                carries
0   rs4244285     *2         no function       GA  heterozygous (1 copy)
1   rs4986893     *3         no function       GG                     no
2  rs12248560    *17  increased function       CC                     no
Phenotype: Intermediate Metabolizer (IM)
Clinical implication: Clopidogrel: reduced efficacy; consider alternatives
```
This person carries one *2 allele — they're an intermediate metabolizer. If they were prescribed Plavix after a stent, they're at elevated thrombosis risk vs a normal metabolizer.
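Clinical reports usually state a diplotype such as `*1/*2` rather than per-SNP carrier status. The sketch below shows one way to assemble that string from variant copy counts. `build_diplotype` is a hypothetical helper, and it assumes each variant marks its own haplotype, since unphased SNP array data cannot show which chromosome a variant sits on.

```python
def build_diplotype(copies):
    """Build a diplotype string such as '*1/*2' from variant copy counts.

    copies maps star allele -> copy count (0, 1, or 2). Any haplotype
    without a detected variant is reported as *1 (reference).
    """
    order = ["*1", "*2", "*3", "*17"]
    alleles = []
    for star in ("*2", "*3", "*17"):
        alleles.extend([star] * copies.get(star, 0))
    if len(alleles) > 2:
        # Three or more variant copies cannot fit on two haplotypes
        # under the one-variant-per-haplotype assumption.
        raise ValueError("more than two variant alleles; cannot resolve from SNP data")
    alleles.extend(["*1"] * (2 - len(alleles)))
    alleles.sort(key=order.index)
    return "/".join(alleles)

print(build_diplotype({"*2": 1, "*3": 0, "*17": 0}))  # *1/*2
```

For the sample user above (one copy of the *2 variant, none of the others), this yields `*1/*2`.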
Phenotype Reference Table
| Genotype | Phenotype | Approx. KR frequency | Clopidogrel guidance |
|---|---|---|---|
| *1/*1 | Normal Metabolizer (NM) | 35-45% | Standard 75 mg/day |
| *1/*2 or *1/*3 | Intermediate (IM) | 30-35% | Suboptimal — consider alternative |
| *2/*2, *2/*3, *3/*3 | Poor Metabolizer (PM) | 15-20% | Avoid clopidogrel; use prasugrel or ticagrelor |
| *1/*17 | Rapid (RM) | 5-8% | Standard dose; monitor for bleeding |
| *17/*17 | Ultrarapid (UM) | <2% | Standard dose; consider lower in bleeding-risk patients |
| *2/*17 or *3/*17 | Intermediate (IM) per CPIC | 2-5% | Treat as IM; consider alternative |
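The table can also be expressed as a small classifier over diplotype strings. This is a sketch rather than a validated implementation: `phenotype_from_diplotype` is a hypothetical name, it only knows *1, *2, *3, and *17, and it returns IM for a no-function allele paired with *17, which is how CPIC classifies *2/*17.

```python
NO_FUNCTION = {"*2", "*3"}
INCREASED = {"*17"}
KNOWN = NO_FUNCTION | INCREASED | {"*1"}

def phenotype_from_diplotype(diplotype):
    """Classify a diplotype such as '*1/*2' into NM, IM, PM, RM, or UM."""
    first, second = diplotype.split("/")
    if first not in KNOWN or second not in KNOWN:
        raise ValueError(f"unsupported allele in {diplotype!r}")
    n_no_function = sum(a in NO_FUNCTION for a in (first, second))
    n_increased = sum(a in INCREASED for a in (first, second))
    if n_no_function == 2:
        return "PM"
    if n_no_function == 1:
        return "IM"  # includes *2/*17 and *3/*17
    if n_increased == 2:
        return "UM"
    if n_increased == 1:
        return "RM"
    return "NM"

for d in ["*1/*1", "*1/*2", "*2/*3", "*1/*17", "*17/*17", "*2/*17"]:
    print(d, phenotype_from_diplotype(d))
```

Raising on unknown alleles is deliberate: a diplotype containing an allele outside this short list should go to a clinical lab rather than be guessed at.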
Clinical Context — Why This Matters Beyond Theory
CPIC guideline (Lee et al., 2022)
- Strong recommendation: PM and IM patients should receive prasugrel or ticagrelor instead of clopidogrel for acute coronary syndrome
- Moderate recommendation: same for elective PCI
Real-world evidence
A 2020 Lancet meta-analysis showed CYP2C19 PMs receiving clopidogrel after PCI had:
- 1.5-1.8× higher rate of recurrent stent thrombosis
- 1.4× higher major adverse cardiovascular events
The Korea-specific gap
Despite the elevated PM rate in Koreans, CYP2C19 testing is not routinely performed before clopidogrel prescription in most Korean hospitals. Reasons:
- Insurance reimbursement is conditional (covered in defined post-stent contexts at major centers; not for all clopidogrel users)
- Test turnaround can exceed acute-phase decision window
- Clinical inertia toward established dosing
A handful of Korean tertiary hospitals (SNUH, AMC, SMC, Severance) now perform routine pre-procedural CYP2C19 testing. Coverage is expanding slowly.
Other CYP2C19 Substrates to Care About
CYP2C19 affects multiple drug classes. If you know your CYP2C19 phenotype, also consider:
Proton pump inhibitors (PPIs)
Omeprazole, esomeprazole, lansoprazole are CYP2C19 substrates.
- PM → drug accumulates → may help reflux better, but long-term PPI use increases certain risks (B12 deficiency, bone loss)
- UM → drug clears too fast → standard PPI dose may be insufficient
SSRIs
Citalopram and escitalopram are partly CYP2C19 metabolized.
- PM → drug accumulates → increased risk of QT prolongation at standard dose (FDA dose cap may apply)
- UM → may need higher dose to achieve effect
Antifungals (voriconazole)
CYP2C19 is the major metabolic pathway for voriconazole. PMs have elevated voriconazole levels → toxicity risk; UMs have subtherapeutic levels → treatment failure. CYP2C19 genotyping is routinely done before starting voriconazole in some settings.
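If you keep your phenotype in a script, the notes in this section fit naturally into a lookup table. A minimal sketch follows; the dictionary paraphrases the bullets above, `notes_for` is a hypothetical helper, and none of it is dosing advice.

```python
# Notes paraphrased from this section; a lookup aid, not dosing advice.
SUBSTRATE_NOTES = {
    "PPIs (omeprazole, esomeprazole, lansoprazole)": {
        "PM": "drug accumulates; may help reflux more, but watch long-term risks",
        "UM": "drug clears fast; standard dose may be insufficient",
    },
    "SSRIs (citalopram, escitalopram)": {
        "PM": "drug accumulates; QT prolongation risk at standard dose",
        "UM": "may need a higher dose to achieve effect",
    },
    "voriconazole": {
        "PM": "elevated levels; toxicity risk",
        "UM": "subtherapeutic levels; treatment failure risk",
    },
}

def notes_for(phenotype):
    """Return {drug class: note} for the phenotypes this section covers (PM, UM)."""
    return {
        drug: notes[phenotype]
        for drug, notes in SUBSTRATE_NOTES.items()
        if phenotype in notes
    }

for drug, note in notes_for("PM").items():
    print(f"{drug}: {note}")
```

Phenotypes the section does not discuss (for example NM) return an empty dictionary rather than a made-up note.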
DTC Limitations You Should Know
Same caveats as the broader DTC PGx discussion:
- 23andMe catches the common *2, *3, and *17 variants: good coverage of the clinically relevant alleles
- Misses rare *4, *5, *6, and *8 alleles: a small fraction of most populations, but if you carry one, your phenotype could differ from the prediction
- Misses structural variants: rare whole-gene or partial-gene deletions of CYP2C19 (cataloged by PharmVar) are not detected by SNP arrays
For making actual prescription decisions (especially around major events like PCI), use a clinical CYP2C19 panel from a hospital lab. The DTC result is hypothesis-generating, not diagnostic. See PGx Complete Guide 2026 for the clinical testing landscape.
FAQ
Q: I'm scheduled for PCI next week — should I rush this? If you have time and your hospital can order it, request CYP2C19 testing. Clinical-grade test, not DTC. If timing is tight, your interventional cardiologist may default to alternative antiplatelets pending result, or use platelet function assays.
Q: I already had a stent on Plavix years ago, no problems — does it matter now? You survived the high-risk window (first 6-12 months post-stent). Long-term Plavix benefit is less time-sensitive. Worth knowing your CYP2C19 for future medications (PPI, SSRI choices). No emergency change needed for the chronic Plavix.
Q: My DTC shows *17/*17. Do I bleed more on Plavix? *17/*17 = ultrarapid → more active metabolite → stronger antiplatelet effect → modestly elevated bleeding risk. Tell your cardiologist; standard dose is still typically used, but they may monitor more closely.
Q: How does this differ from CYP2D6 (the other Big PGx gene)? CYP2D6 metabolizes ~25% of all prescription drugs (codeine, tramadol, many psychiatric meds). CYP2C19 metabolizes ~10% but includes critical cardiology drugs. They're different genes on different chromosomes — knowing one doesn't tell you the other. Test both via clinical panels if you take multiple medications.
Q: My family member had Plavix failure. Should I test before I might need it? CYP2C19 poor-metabolizer status is inherited, so it clusters in families. If a first-degree relative had documented Plavix failure or stent thrombosis, your prior probability of being a PM is higher than the population baseline. Worth knowing your status, both for current and future medications.
Q: Can a doctor refuse to honor my DTC result? They cannot prescribe based solely on DTC; clinical guidelines require validated tests. They can use the DTC result as a reason to order the clinical test. That's the appropriate workflow.
Q: Does aspirin have similar genetic factors? Aspirin metabolism has some genetic variation but the clinical impact is much smaller and standard low-dose (75-100 mg) works for most. The dual antiplatelet question is really about clopidogrel.
Closing — The Practical Takeaways
- Korean populations have ~15-20% CYP2C19 PMs — much higher than European populations
- Clopidogrel (Plavix) needs CYP2C19 activation — PMs get reduced antiplatelet protection
- 23andMe catches the common *2, *3, and *17 alleles, and the code in this guide extracts them
- For PCI decisions, get clinical CYP2C19 testing — DTC is informational, not diagnostic
- Alternative antiplatelets exist (prasugrel, ticagrelor) for confirmed PMs/IMs
If you've had cardiology contact and ended up on clopidogrel, knowing your CYP2C19 phenotype is one of the most clinically actionable PGx datapoints available. The DTC raw data + the Python here gets you to a reasonable hypothesis; bring it to your cardiologist for confirmation testing.
Related posts:
- Pharmacogenomics (PGx) Complete Guide 2026
- Reading 23andMe Raw Data for CYP2D6 Star Alleles in Python
- DTC Genetic Testing 2026 Complete Buyer's Guide
- BRCA1/2 + Hereditary Cancer Guide
- Complete Guide to Interpreting DTC Genetic Test Results
References:
- Lee, C. R. et al. (2022). CPIC Guideline for CYP2C19 and Clopidogrel Therapy: 2022 Update. Clinical Pharmacology & Therapeutics, 112, 959-967.
- Mega, J. L. et al. (2009). Cytochrome P-450 polymorphisms and response to clopidogrel. New England Journal of Medicine, 360, 354-362.
- PharmVar CYP2C19 page: https://www.pharmvar.org/gene/CYP2C19
- PharmGKB: https://www.pharmgkb.org
⚠️ Medical disclaimer: This article is for informational purposes. DTC-derived genetic information is not a substitute for clinical pharmacogenomic testing or physician advice. Do not start, stop, or change clopidogrel or any antiplatelet medication based on DTC test results alone — discuss with your cardiologist.