15/12/2008
Raw-data project
- The Saga
this summer's development
- The Result
Readers for Analyst & Xcalibur
The Saga
The source: the vendors' APIs
The interface: OLE/COM
The software: development of a reader
– VB6; VBScript
Many iterations
CSV export; graph tests in R
pseudo-mzXML export; tests in SmileMS; direct cross-checks against Analyst
– Boost serialisation
– Google ProtoBuf serialisation
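One of the early iterations exported spectra as CSV so they could be plotted in R. Below is a minimal sketch of such an export. It assumes a hypothetical `Peak`/`Spectrum` model and a long layout with one row per peak (`spectrum,rt,moz,intensity`). The column layout actually used during the project is not documented here.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>   // std::ostringstream, handy for capturing the export
#include <string>
#include <vector>

// Hypothetical minimal data model (not the Captain-bol classes).
struct Peak {
    double moz;        // m/z
    double intensity;
};

struct Spectrum {
    double retentionTime;     // scale not fixed by the source; usually seconds
    std::vector<Peak> peaks;
};

// Write one CSV row per peak: spectrum index, retention time, m/z, intensity.
// A long layout like this loads directly into R with read.csv().
void exportCsv(const std::vector<Spectrum> &spectra, std::ostream &out) {
    out << "spectrum,rt,moz,intensity\n";
    for (std::size_t i = 0; i < spectra.size(); ++i) {
        for (const Peak &p : spectra[i].peaks) {
            out << i << ',' << spectra[i].retentionTime << ','
                << p.moz << ',' << p.intensity << '\n';
        }
    }
}
```

The long layout repeats the retention time on every row. In exchange, R can filter or plot it without reshaping.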
– loads only the global structure
– no peaks / no spectrum contents
– completely ignores the collection
– each spectrum has a "refid": a 64-bit field
– readers store a unique, opaque identifier for the spectrum in it
– with the "refid" parameter, a specific spectrum can be read
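The refid is opaque, and each reader decides what to put in its 64 bits. The sample protobuf dump in this presentation shows the refids 316 and 12884902204. Since 12884902204 = 3 × 2^32 + 316, those values are consistent with a 32/32 split, but nothing in the source confirms that layout.

As an illustration only, here is a sketch of such a hypothetical packing. The `hi` and `lo` names are invented for this example: `hi` holds some reader-specific index and `lo` holds a scan number.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical refid layout: high 32 bits = reader-specific index
// (for example an experiment or period number), low 32 bits = scan number.
// Readers are free to choose any other encoding; consumers must treat the
// value as opaque and only hand it back to the reader that produced it.
struct RefIdParts {
    std::uint32_t hi;
    std::uint32_t lo;
};

std::uint64_t packRefId(std::uint32_t hi, std::uint32_t lo) {
    return (static_cast<std::uint64_t>(hi) << 32) | lo;
}

RefIdParts unpackRefId(std::uint64_t refid) {
    return RefIdParts{static_cast<std::uint32_t>(refid >> 32),
                      static_cast<std::uint32_t>(refid & 0xFFFFFFFFu)};
}
```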
– No details in the spectra, but
– "refid": a unique identifier for each spectrum
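Together, the lightweight structure and the refid allow a two-pass workflow. The client loads the skeleton first, then fetches full spectra only for the refids it actually needs. Below is a minimal sketch of a caching client. It assumes a hypothetical `fetch` callback that stands in for calling the reader with the refid parameter (for example, running the reader again and parsing its output).

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <utility>
#include <vector>

// Hypothetical full spectrum content returned by the reader.
struct Peaks {
    std::vector<double> moz;
    std::vector<double> intensity;
};

// Caches spectra fetched by refid, so each one is read from the raw file at most once.
class LazySpectra {
public:
    using Fetch = std::function<Peaks(std::int64_t refid)>;

    explicit LazySpectra(Fetch fetch) : fetch_(std::move(fetch)) {}

    const Peaks &get(std::int64_t refid) {
        auto it = cache_.find(refid);
        if (it == cache_.end()) {
            // Not cached yet: ask the reader for this one spectrum.
            it = cache_.emplace(refid, fetch_(refid)).first;
        }
        return it->second;   // std::map references stay valid after later inserts
    }

private:
    Fetch fetch_;
    std::map<std::int64_t, Peaks> cache_;
};
```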
/***************
 *             *
 *  GET DATA   *
 *             *
 ***************/
bol::RunLcmsms *run;
{
    arthur reader;

    // init the reader, and open the file
    reader.initfile(ifname.c_str());

    // get information about the sample
    if (vm.count("sample-info")) {
        reader.samplelist(NULL);
        dhUninitialize(TRUE);
        exit(0);
    } else {
        if (vm.count("runname"))
            reader.samplelist(&runname);
        else
            reader.samplelist();
    }

    // read the file
    reader.dosample(allatonce, skipms1, skipms2, dropms1, dropms2,
                    reflist.size() ? &reflist : NULL);

    // create links between fragments and precursors
    reader.buildptr();

    // get data
    run = reader.getRun();
}
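The `buildptr()` step links each fragment (MS2) spectrum to the MS1 spectrum its precursor was selected from. Its implementation is not shown in this presentation. Below is a minimal sketch of the idea using hypothetical, simplified structs. It assumes each MS2 spectrum records the 1-based scan number of its parent scan, and that an index from scan number to MS1 spectrum can be built.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified spectra (not the bol::RunLcmsms classes).
struct Ms1 {
    unsigned scan;          // 1-based scan number, as in Xcalibur
};

struct Ms2 {
    unsigned scan;
    unsigned parentScan;    // scan the precursor was selected from; 0 = unknown
    const Ms1 *parent;      // filled in by buildLinks()
};

// Resolve parentScan into a direct pointer; returns how many links were made.
std::size_t buildLinks(const std::vector<Ms1> &ms1, std::vector<Ms2> &ms2) {
    std::unordered_map<unsigned, const Ms1 *> byScan;
    for (const Ms1 &s : ms1)
        byScan[s.scan] = &s;

    std::size_t linked = 0;
    for (Ms2 &s : ms2) {
        auto it = byScan.find(s.parentScan);
        s.parent = (it != byScan.end()) ? it->second : nullptr;
        if (s.parent)
            ++linked;
    }
    return linked;
}
```

The pointers point into the `ms1` vector, so that vector must not be resized after the links are built.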
/**
 * this function takes care of the first steps of handling raw files:
 *  - getting the necessary OLE/COM object
 *  - setting up a RunLcmsms object to hold the results
 *
 * @param ifname Filename to use
 * @param ptr    Write to a specific RunLcmsms instance. Otherwise, if NULL, we allocate one.
 */
void arthur::initfile(const char *ifname, RunLcmsms *ptr)
{
    // reset the tables
    // as the spectra are all represented using USHORT in Xcalibur,
    // I don't expect to have more than 64k spectra to handle
    parentmap.clear();  parentmap.resize(USHRT_MAX, 0);
    parentsp1.clear();  parentsp1.resize(USHRT_MAX, NULL);
    parentsp2.clear();  parentsp2.resize(USHRT_MAX, NULL);
    extractmap.clear(); extractmap.resize(USHRT_MAX, false);

    // *** EXPORT CAPTAINBOL
    // allocate an instance if needed...
    run = ptr ? ptr : new RunLcmsms();

    //
    // Open the file
    //

    /**
     * -=- !!!XDK!!! -=-
     * XDK presents an API for accessing RAW files, which consists of a RAW-file reader (XRaw)
     * that presents the data as structured content.
     * See "Raw File Hierarchy" in the XDK documentation:
     *   C:/Xcalibur/help/xdkhelp/rawfile_hierarchy.htm
     * (unlike, for example, the Analyst Cookbook, which presents an API more oriented toward
     * piloting Analyst and presents data in a way that closely maps to the various phases of the pipeline)
     *
     * In Xcalibur, each RAW file contains one and only one sample, and all data pertaining to it:
     * the sample is the first-class citizen and everything is structured around it.
     *
     * Note about classes:
     *  - every count is 1-based and everything is counted consistently using 1...n like in Basic
     *    (not 0...n-1 like in C)
     *  - the classes aren't grouped in a common namespace like "Xcalibur.{class}.1" or "XDK.{class}.1";
     *    instead the classes use the same name for the namespace: "{class}.{class}.1"
     *  - several classes have "~Read" equivalents, which are read-only.
     */
    //DISPATCH_OBJ(raw);
    try {
        dhCheck( dhCreateObject(L"XRaw.XRaw.1", NULL, &raw) );
    } catch (const string &errstr) {
        cerr << "Fatal error - can't create the main XCalibur object (XRaw)" << endl
             << endl
             << "Is the library correctly registered?" << endl
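The tables reset in `initfile()` appear to be indexed by spectrum (scan) number, and the XDK counts from 1. A table indexed directly by a 1-based number needs one slot more than the largest value. For example, a vector resized to `USHRT_MAX` holds indices 0 to 65534, so scan 65535 would fall outside it. The excerpt does not show whether the reader indexes directly or subtracts 1.

Below is a small wrapper (hypothetical, not part of the reader) that makes the 1-based convention explicit.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Table addressed with 1-based numbers (1..max), the way the XDK counts scans.
// Slot 0 is never used, so the scan number can be used as an index directly.
// Note: instantiate with char rather than bool, because std::vector<bool>
// returns proxy objects instead of bool&.
template <typename T>
class OneBasedTable {
public:
    explicit OneBasedTable(std::size_t maxNumber, const T &init = T())
        : data_(maxNumber + 1, init) {}

    T &operator[](std::size_t number) {
        if (number == 0 || number >= data_.size())
            throw std::out_of_range("scan number outside 1..max");
        return data_[number];
    }

    std::size_t maxNumber() const { return data_.size() - 1; }

private:
    std::vector<T> data_;
};
```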
            // TODO we can access the parent's TIC using parent.Scan.Header.Tic
            cout << endl;
        } END_WITH_THROW(parent);
    } END_WITH_THROW(parentcol);
} catch (const string &err) {
    // cout << "\tno parents";
    cerr << "Warning, can't get parent information !!!" << endl << err << endl;
}

///
/// At THIS point, we have gathered a lot of information. We can start building a spectrum
///

// *** EXPORT CAPTAINBOL
// No parent? Use the info from the Filter
if (mass == 0. && curfilt)
    mass = curfilt->Parent();

// Charge mapper
ChargeMapper chm;
if (charge >= 1)
    chm.addCharge(charge);

// spectrum !
// note: in standard C++ a plain "new" throws std::bad_alloc instead of returning NULL;
// this check only helps with older compilers whose "new" returns NULL
SpectrumLcmsms *sp = new SpectrumLcmsms(mass, chm.charges());
if (sp == NULL) {
    cerr << "Critical Out of memory !!!" << endl;
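The `ChargeMapper` collects candidate charge states for the spectrum. The protobuf schema stores them in `chargemask`, where 0 means unknown, bit 1 stands for 1+, bit 2 for 2+, and so on.

Below is a minimal sketch of that encoding. It uses a hypothetical `ChargeMask` class, not the real `ChargeMapper`. It assumes charge n maps to `1u << (n - 1)`, because the schema comment does not say whether its bit numbering starts at 0 or 1.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical charge-state bitmask: 0 = unknown charge,
// otherwise value 1 = 1+, value 2 = 2+, value 4 = 3+, ...
class ChargeMask {
public:
    // Like the "if (charge >= 1)" guard: zero or negative charges are ignored.
    void addCharge(int charge) {
        if (charge >= 1 && charge <= 32)
            mask_ |= std::uint32_t(1) << (charge - 1);
    }

    std::uint32_t mask() const { return mask_; }

    // Decode the mask back into the list of charges, lowest first.
    std::vector<int> charges() const {
        std::vector<int> out;
        for (int c = 1; c <= 32; ++c)
            if (mask_ & (std::uint32_t(1) << (c - 1)))
                out.push_back(c);
        return out;
    }

private:
    std::uint32_t mask_ = 0;
};
```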
– ... Damage: CUDA / C++, by DrYak
– ... SmileMS: Java, by Mylonas & Mauron
– Java ProteoLib: Java, by Masselot, Nikitin & Müller
– Xenobol (merlinbol & freudbol): COM / C++, by DrYak
– Captain-bol: Boost / C++, by Masselot, Collinge & DrYak
– Serialisation: Google ProtoBuf
00000000  0a 19 43 61 6e 6e 61 62 69 73 20 31 30 30 20 75  |..Cannabis 100 u|
00000010  67 2f 6c 20 4d 52 4d 20 70 6f 73 12 c8 05 0a 9e  |g/l MRM pos.....|
00000020  01 20 2b 4d 52 4d 20 28 32 39 34 20 70 61 69 72  |. +MRM (294 pair|
00000030  73 29 3a 20 45 78 70 20 31 2c 20 30 2e 30 32 39  |s): Exp 1, 0.029|
00000040  20 6d 69 6e 20 66 72 6f 6d 20 53 61 6d 70 6c 65  | min from Sample|
00000050  20 31 20 28 43 61 6e 6e 61 62 69 73 20 31 30 30  | 1 (Cannabis 100|
00000060  20 75 67 2f 6c 20 4d 52 4d 20 70 6f 73 29 20 6f  | ug/l MRM pos) o|
00000070  66 20 63 3a 2f 44 6f 63 75 6d 65 6e 74 73 20 61  |f c:/Documents a|
00000080  6e 64 20 53 65 74 74 69 6e 67 73 2f 70 68 65 6e  |nd Settings/phen|
00000090  79 78 61 64 6d 69 6e 2f 44 65 73 6b 74 6f 70 2f  |yxadmin/Desktop/|
000000a0  79 61 6b 2f 74 6f 78 69 63 6f 2f 61 2e 77 69 66  |yak/toxico/a.wif|
000000b0  66 20 28 54 75 72 62 6f 20 53 70 72 61 79 29 11  |f (Turbo Spray).|
000000c0  00 00 00 40 89 41 fc 3f 32 12 09 00 00 00 00 00  |...@.A.?2.......|
000000d0  b0 74 40 19 00 00 00 00 00 00 69 40 32 12 09 00  |.t@.......i@2...|
000000e0  00 00 a0 99 b1 75 40 19 00 00 00 00 00 00 69 40  |.....u@.......i@|
000000f0  32 12 09 00 00 00 a0 99 61 78 40 19 00 00 00 00  |2.......ax@.....|
00000100  00 00 79 40 32 12 09 00 00 00 a0 99 91 7b 40 19  |..y@2........{@.|
00000110  00 00 00 00 00 00 69 40 32 12 09 00 00 00 40 33  |......i@2.....@3|
00000120  b3 7e 40 19 00 00 00 00 00 00 69 40 32 12 09 00  |.~@.......i@2...|
00000130  00 00 40 33 a3 67 40 19 00 00 00 00 00 c0 82 40  |..@3.g@........@|
00000140  32 12 09 00 00 00 00 00 60 68 40 19 00 00 00 00  |2.......`h@.....|
00000150  00 00 69 40 32 12 09 00 00 00 40 33 23 69 40 19  |..i@2.....@3#i@.|
00000160  00 00 00 00 00 00 89 40 32 12 09 00 00 00 a0 99  |.......@2.......|
00000170  d9 69 40 19 00 00 00 00 00 00 79 40 32 12 09 00  |.i@.......y@2...|
00000180  00 00 40 33 23 6a 40 19 00 00 00 00 00 00 69 40  |..@3#j@.......i@|
00000190  32 12 09 00 00 00 40 33 43 6b 40 19 00 00 00 00  |2.....@3Ck@.....|
000001a0  00 00 69 40 32 12 09 00 00 00 40 33 e3 6c 40 19  |..i@2.....@3.l@.|
000001b0  00 00 00 00 00 00 79 40 32 12 09 00 00 00 40 33  |......y@2.....@3|
000001c0  43 6d 40 19 00 00 00 00 00 00 69 40 32 12 09 00  |Cm@.......i@2...|
000001d0  00 00 40 33 a3 6d 40 19 00 00 00 00 00 00 69 40  |..@3.m@.......i@|
000001e0  32 12 09 00 00 00 00 00 c0 6f 40 19 00 00 00 00  |2........o@.....|
000001f0  00 00 69 40 32 12 09 00 00 00 a0 99 01 70 40 19  |..i@2........p@.|
00000200  00 00 00 00 00 00 69 40 32 12 09 00 00 00 a0 99  |......i@2.......|
00000210  a1 71 40 19 00 00 00 00 00 00 69 40 32 12 09 00  |.q@.......i@2...|
00000220  00 00 40 33 23 72 40 19 00 00 00 00 00 00 79 40  |..@3#r@.......y@|
00000230  32 12 09 00 00 00 a0 99 31 72 40 19 00 00 00 00  |2.......1r@.....|
00000240  00 00 69 40 32 12 09 00 00 00 a0 99 c1 72 40 19  |..i@2........r@.|
00000250  00 00 00 00 00 00 69 40 32 12 09 00 00 00 40 33  |......i@2.....@3|
00000260  c3 72 40 19 00 00 00 00 00 00 69 40 32 12 09 00  |.r@.......i@2...|
00000270  00 00 a0 99 d1 72 40 19 00 00 00 00 00 00 69 40  |.....r@.......i@|
00000280  32 12 09 00 00 00 a0 99 d1 73 40 19 00 00 00 00  |2........s@.....|
00000290  00 00 69 40 32 12 09 00 00 00 a0 99 51 74 40 19  |..i@2.......Qt@.|
000002a0  00 00 00 00 00 00 79 40 32 12 09 00 00 00 40 33  |......y@2.....@3|
000002b0  e3 74 40 19 00 00 00 00 00 00 69 40 32 12 09 00  |.t@.......i@2...|
000002c0  00 00 a0 99 41 78 40 19 00 00 00 00 00 00 69 40  |....Ax@.......i@|
000002d0  32 12 09 00 00 00 40 33 b3 79 40 19 00 00 00 00  |2.....@3.y@.....|
000002e0  00 00 69 40 38 00 12 e0 02 0a 9e 01 20 2b 4d 52  |..i@8....... +MR|
000002f0  4d 20 28 32 39 34 20 70 61 69 72 73 29 3a 20 45  |M (294 pairs): E|
00000300  78 70 20 31 2c 20 30 2e 31 32 35 20 6d 69 6e 20  |xp 1, 0.125 min |

description: "Cannabis 100 ug/l MRM pos"
ms1 {
  description: " +MRM (294 pairs): Exp 1, 0.029 min from (...)"
  retentiontime: 1.7660000324249268
  peak { moz: 331 intensity: 200 }
  peak { moz: 347.10000610351562 intensity: 200 }
  : : :
  peak { moz: 314.10000610351562 intensity: 200 }
  refid: 316
}
ms2 {
  description: " +EPI (201.10) Charge (+0) CE (10) FT (50): Exp 2, (...)"
  retentiontime: 3.0769999027252197
  parent { moz: 201.10000610351562 intensity: 800 }
  reference { peak: 7 spectrum: 0 }
  peak { moz: 90.839996337890625 intensity: 33333.33203125 }
  : : :
  peak { moz: 506.03997802734375 intensity: 11111.111328125 }
  refid: 12884902204
  collisionenergy: 10
}
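The hexdump can be read by hand using the protobuf wire format. Each field starts with a varint key equal to `(field_number << 3) | wire_type`. Length-delimited fields (wire type 2) carry a varint length, and `double` values (wire type 1) are 8 little-endian bytes. Applied to the dump above:

- `0a 19` is field 1 (apparently the run description) with 25 bytes.
- `12 c8 05` is field 2 (`ms1`) with a varint length of 712.
- Inside each `32 12` (field 6, `peak`, 18 bytes), `09` and `19` introduce fields 1 and 3 as 64-bit values.

Below is a small decoding sketch (helper names are illustrative) that checks these readings against bytes copied from the dump.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Read a base-128 varint starting at pos; advances pos past it.
std::uint64_t readVarint(const std::vector<std::uint8_t> &buf, std::size_t &pos) {
    std::uint64_t value = 0;
    int shift = 0;
    while (pos < buf.size() && shift < 64) {
        std::uint8_t byte = buf[pos++];
        value |= std::uint64_t(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0)
            break;              // high bit clear: last byte of the varint
        shift += 7;
    }
    return value;
}

// Read an 8-byte little-endian IEEE-754 double (wire type 1); advances pos by 8.
double readDouble(const std::vector<std::uint8_t> &buf, std::size_t &pos) {
    if (pos + 8 > buf.size())
        throw std::out_of_range("truncated double");
    std::uint64_t bits = 0;
    for (int i = 7; i >= 0; --i)          // byte at pos is the least significant
        bits = (bits << 8) | buf[pos + i];
    pos += 8;
    double d;
    std::memcpy(&d, &bits, sizeof d);
    return d;
}

// Split a field key into its field number and wire type.
struct Key {
    std::uint32_t field;
    std::uint32_t wireType;
};

Key splitKey(std::uint64_t key) {
    return Key{static_cast<std::uint32_t>(key >> 3),
               static_cast<std::uint32_t>(key & 0x7)};
}
```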
package bol.protobuf;

// option optimize_for = SPEED;

message RunLcmsms {
    message Spectrum {
        message Peak {
            1; // M/Z
            chargemask = 2 [default = 0]; // bitmask 0 = ?
                                          // nonzero = bit1:1+, bit2:2+, etc.
            3 [default = 0];
            4 [default = 0]; // m/z of the selective product ion obtained after transition in MRM and SRM reactions
        }
        message Reference {
            1 [default = -1];
            2 [default = -1]; // when spectrum is a MS^2 or MS^3
                              // reference is the index inside the MS or MS^2
                              // where the parent was selected from
                              // set to -1 when not available
        }
        [default = -1]; // some unique identifier for the spectrum
                        // used when communicating with a reader
                        // for example: to ask again for more information about a specific spectrum
        // these are bytes: we can't promise that all readers will follow UTF-8 or ASCII
        // scale not specified (usually seconds)
        // retention time is position on the chromatogram
        // in case of MS2 spectra: the energy used to fragment
        // TODO MS^3 and up may have several
        3; // when spectrum is a MS^2 or MS^3, parent is the parent mass
        repeated double extraMoZ = 5; // currently unused
        repeated Peak peak = 6;
    }
    repeated Spectrum ms1 = 2; // MS
    repeated Spectrum ms2 = 3; // MSMS
}
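The schema's field numbers determine the bytes on disk. The sample dump suggests that `moz` is field 1 and `intensity` is field 3 of `Peak`, both stored as 64-bit doubles. Under that assumption, a hand-written encoder reproduces the `32 12 09 … 19 …` pattern seen in the dump: field 6 (`peak`), 18 bytes long.

This is only an illustration of the wire format, with helper names made up for the example. Real code would use the classes generated by `protoc`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Append a base-128 varint.
void putVarint(Bytes &out, std::uint64_t v) {
    while (v >= 0x80) {
        out.push_back(static_cast<std::uint8_t>(v | 0x80));
        v >>= 7;
    }
    out.push_back(static_cast<std::uint8_t>(v));
}

// Append a double field: key (field << 3 | wire type 1), then 8 little-endian bytes.
void putDouble(Bytes &out, std::uint32_t field, double value) {
    putVarint(out, (std::uint64_t(field) << 3) | 1);
    std::uint64_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    for (int i = 0; i < 8; ++i)
        out.push_back(static_cast<std::uint8_t>(bits >> (8 * i)));
}

// Encode one Peak as it appears inside a Spectrum: field 6, wire type 2.
// Assumption: moz = field 1, intensity = field 3 (inferred from the dump).
Bytes encodePeakField(double moz, double intensity) {
    Bytes body;
    putDouble(body, 1, moz);
    putDouble(body, 3, intensity);

    Bytes out;
    putVarint(out, (6u << 3) | 2);   // key 0x32
    putVarint(out, body.size());     // length 0x12 = 18 for two doubles
    out.insert(out.end(), body.begin(), body.end());
    return out;
}
```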
import com.genebio.bol.protobuf;

import java.io.FileInputStream;
import java.io.IOException;

public class Tester {
    /**
     * @param args path to a BOL protocol buffer file
     */
    public static void main(String[] args) throws IOException {
        System.out.println("Murf ?");
        if (args.length != 1) {
            System.err.println("Usage: {BOL_PROTOBUF_FILE}");
            System.exit(-1);
        }

        // Read the existing run from the protocol buffer
        // (generated class names follow the .proto message names: RunLcmsms, Spectrum)
        FileInputStream in = new FileInputStream(args[0]);
        protobuf.RunLcmsms run;
        try {
            run = protobuf.RunLcmsms.parseFrom(in);
        } finally {
            in.close();
        }

        System.out.println(new StringBuilder()
            .append("Sample is ").append(run.getDescription().toStringUtf8())
            .append(" and has ").append(run.getMs1Count()).append(" MS spectra")
            .append(" and ").append(run.getMs2Count()).append(" MS2 spectra"));

        for (protobuf.RunLcmsms.Spectrum e : run.getMs1List()) {
            System.out.println(new StringBuilder()
                .append("\tspectrum ")
                .append(e.getDescription().toStringUtf8()));
        }
    }
}
Not advisable to tackle this alone, starting from the APIs
merlinbol and freudbol
→ support comes for free
→ can take advantage of the functions offered by Bol & JPL
→ requires a strategy that splits the work into parts
http://code.google.com/p/captain-bol/
http://javaprotlib.sourceforge.net/
http://code.google.com/p/protobuf/
http://code.google.com/apis/protocolbuffers/docs/tutorials.html
http://code.google.com/apis/protocolbuffers/docs/reference/java-generated.html
http://disphelper.sourceforge.net/
Available on request from Applied Biosystems: http://www.appliedbiosystems.com/support/contact/
Partial version available on the Xcalibur installation CD