biopython/biopython

EMBL DT lines are being ignored

Open

#1016 opened on Dec 1, 2016

Labels: Enhancement, From Redmine, help wanted

Description

Originally filed by Dr. Wim De Smet http://lmg.ugent.be/users/dr-wim-de-smet as part of RedMine issue 3074, https://redmine.open-bio.org/issues/3074

The EMBL parser currently ignores DT (date) lines. You can see this by running the EMBL scanner directly with debugging turned on:

$ python -c "from Bio.GenBank.Scanner import EmblScanner; import sys; print(next(EmblScanner(debug=2).parse_records(sys.stdin)))" < A04195.imgt
Found the start of a record:
ID   A04195 IMGT/LIGM annotation : by annotators; RNA; SYN; 51 BP.

Found feature table
Ignoring EMBL header line:
DT   15-MAY-1995 (Rel. 2, arrived in LIGM-DB )
Ignoring EMBL header line:
DT   20-APR-1999 (Rel. 11, Last updated, Version 4)
Ignoring EMBL header line:
FH   Key                 Location/Qualifiers
Found end of features
ID: A04195
Name: A04195
Description: Artificial Ig lambda-chain mRNA ; RNA; rearranged configuration; Ig-Light-Lambda; regular.
Database cross-references: EMBL:A04195
Number of features: 2
/taxonomy=['other sequences', 'artificial sequences']
/references=[Reference(title=';', ...)]
/accessions=['A04195']
/molecule_type=RNA
/data_file_division=SYN
/keywords=['antigen receptor', 'Immunoglobulin superfamily (IgSF)', 'Immunoglobulin (IG)', 'IG-Light', 'IG-Light-Lambda', 'cDNA', 'rearranged']
/organism=synthetic construct
Seq('GATTGATCAATGCAGGCTGTTATGACTCAGGAATCTGCACTCACCACATCA', IUPACAmbiguousDNA())
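
As a rough illustration of what handling these lines could involve, here is a minimal sketch that pulls the date and the parenthesised comment out of DT lines shaped like the two in the debug output above. The helper name `parse_dt_line` and the regular expression are hypothetical and not part of Biopython; the pattern only assumes the `DD-MON-YYYY (comment)` layout seen in this IMGT file, and other EMBL flavours may differ.

```python
import re
from datetime import date

# Month abbreviations as they appear in the DT lines above (assumed upper case).
MONTHS = {
    name: number
    for number, name in enumerate(
        ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
         "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"],
        start=1,
    )
}

# Hypothetical pattern: "DT", whitespace, DD-MON-YYYY, optional "(comment)".
DT_PATTERN = re.compile(r"^DT\s+(\d{1,2})-([A-Z]{3})-(\d{4})\s*(?:\((.*)\))?\s*$")


def parse_dt_line(line):
    """Return (date, comment) for a DT line, or None if the line does not match.

    Hypothetical helper for illustration only; not a Biopython function.
    """
    match = DT_PATTERN.match(line.rstrip("\r\n"))
    if match is None:
        return None
    day, month, year, comment = match.groups()
    if month not in MONTHS:
        return None
    return date(int(year), MONTHS[month], int(day)), (comment or "").strip()


if __name__ == "__main__":
    for text in (
        "DT   15-MAY-1995 (Rel. 2, arrived in LIGM-DB )",
        "DT   20-APR-1999 (Rel. 11, Last updated, Version 4)",
        "FH   Key                 Location/Qualifiers",
    ):
        print(parse_dt_line(text))
```

How the parsed dates should then be exposed on the record (for example, as annotations) is left open here.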
