Using batch creator script

From EcoliWiki

This is a tutorial for the Batch Creator Script, which makes annotating and creating batch files for microarray data much faster.

How the script works

The script works with 3 files: the template file, the data file, and the completed batch file. The template file contains any information that shows up in every row of the final batch file, and the data file contains the information that is unique to each final batch file. When you run the script, the template and data files are consolidated (with priority given to the data file) to create the final batch file. If you want to learn more about each file and what it should contain, scroll to the bottom of this tutorial.
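The consolidation step can be pictured with a short Python sketch. This is not the Perl script itself, only an illustration of the idea; the function names, the miniature example files, and the rule that an empty data cell falls back to the template value are all assumptions.

```python
def read_rows(text):
    """Split tab-delimited text into rows of cells, skipping blank lines."""
    return [line.split("\t") for line in text.splitlines() if line.strip()]


def merge_batch(template_text, data_text):
    """Return the batch rows: one heading row plus one row per data row."""
    template_rows = read_rows(template_text)
    data_rows = read_rows(data_text)
    headings = template_rows[0]
    template_values = dict(zip(headings, template_rows[1]))
    batch = [headings]
    for row in data_rows[1:]:
        data_values = dict(zip(data_rows[0], row))
        # Assumption: an empty data cell falls back to the template value;
        # a filled data cell always wins over the template.
        batch.append(
            [data_values.get(h) or template_values.get(h, "") for h in headings]
        )
    return batch


# Made-up miniature files that use three of the real column headings.
template_text = (
    "Slide Name\tExperimenter\tPrint Name\n"
    "\tDSIEGELE\tAffymetrix Ecoli_ASv2\n"
)
data_text = (
    "Slide Name\tExperimenter\tPrint Name\n"
    "slide_1\t\t\n"
    "slide_2\tSOMEONE_ELSE\t\n"
)

for row in merge_batch(template_text, data_text):
    print("\t".join(row))
```

In this example the first data row picks up the experimenter from the template, while the second keeps its own value because the data file has priority.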


Rules

  • All of these files are tab-delimited .txt files. If you create your data file in Excel, be sure to save it as a tab-delimited text file.
  • Information in the data file will overwrite the information in the template.
  • The first row in both the data and template files must contain the headings found in the final batch file.
  • The template and data files must each have 2 rows (a heading row and one row of information), though the data file can have more rows depending on the number of experiments in the data set.
  • Be cautious when typing the GSE# in the command line. This number is used to name the final batch file and to fill in some of the column information.
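If you want to check your files against these rules before running the script, a small Python sketch along these lines may help. It is a hypothetical helper, not part of the Batch Creator Script, and it assumes the heading rows of the two files should be identical.

```python
def check_batch_inputs(template_text, data_text):
    """Return a list of rule violations (an empty list means no problems found)."""
    template_rows = [
        line.split("\t") for line in template_text.splitlines() if line.strip()
    ]
    data_rows = [
        line.split("\t") for line in data_text.splitlines() if line.strip()
    ]
    problems = []
    if len(template_rows) != 2:
        problems.append("template file should have exactly 2 rows")
    if len(data_rows) < 2:
        problems.append("data file needs a heading row and at least 1 data row")
    # Assumption: both heading rows must match exactly.
    if template_rows and data_rows and template_rows[0] != data_rows[0]:
        problems.append("heading rows of the template and data file differ")
    for name, rows in (("template", template_rows), ("data", data_rows)):
        # A heading row with a single cell usually means no tabs were found.
        if rows and len(rows[0]) < 2:
            problems.append(name + " file does not look tab-delimited")
    return problems


# Made-up example: a data file saved as comma-separated instead of tab-delimited.
for problem in check_batch_inputs("A\tB\n1\t2\n", "A,B\nx,y\n"):
    print(problem)
```

A comma-separated export from Excel is the most likely thing this catches, since the whole heading row then arrives as one cell.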


How to execute the script

You can find the most up-to-date script on our servers (locations listed below) or see it here.

Tetramer: ul/SMD/Batchcreator/
Hexamer: cd ../../Volumes/home/shared/SMD/Batchcreator/


This is what you will type in the command line. These instructions are valid if you are in the folder "SMD/amanda/folder_name" (or in the folder that contains the "filename_batch.txt" file) and the scripts are in "SMD/Batchcreator".

For batch files from GEO

perl [script] -t [template file] -d [data file] -n [GSE number]
perl ../../Batchcreator/batchcreator.pl -t ../../Batchcreator/ASv2_template.txt -d GSE4511.txt -n 4511

perl [script] -t [template file] -d [data file] -n [GSE number]
perl ../../Batchcreator/batchcreator.pl -t ../../Batchcreator/E_coli_2_template.txt -d GSE4511.txt -n 4511


For batch files from ArrayExpress - MEXP Files

perl [script] -t [template file] -d [data file] -n [MEXP number]
perl ../../Batchcreator/batchcreator_MEXP.pl -t ../../Batchcreator/ASv2_template.txt -d E-MEXP-910.txt -n 910

perl [script] -t [template file] -d [data file] -n [MEXP number]
perl ../../Batchcreator/batchcreator_MEXP.pl -t ../../Batchcreator/E_coli_2_template.txt -d E-MEXP-910.txt -n 910
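Because a mistyped accession number ends up in the name of the batch file, it can help to assemble the command from one number instead of typing it twice. The Python sketch below only builds the command shown in the examples above; the function name is hypothetical, and it assumes your data file is named after the accession (GSE####.txt or E-MEXP-###.txt) as in those examples.

```python
def build_command(source, number, template, script_dir="../../Batchcreator"):
    """Build the perl command line as a list of arguments."""
    if source == "GEO":
        script, data_file = "batchcreator.pl", "GSE{}.txt".format(number)
    elif source == "MEXP":
        script, data_file = "batchcreator_MEXP.pl", "E-MEXP-{}.txt".format(number)
    else:
        raise ValueError("source must be 'GEO' or 'MEXP'")
    return [
        "perl",
        script_dir + "/" + script,
        "-t", script_dir + "/" + template,
        "-d", data_file,
        "-n", str(number),
    ]


# Print the commands so they can be checked before being pasted into the shell.
print(" ".join(build_command("GEO", 4511, "ASv2_template.txt")))
print(" ".join(build_command("MEXP", 910, "E_coli_2_template.txt")))
```

The list form can also be handed to Python's subprocess.run if you prefer to launch the script from Python rather than copy the printed line.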

More Information about each type of file

Template file

The template file contains only 2 rows. The first row contains the heading for each column (from "Result_Set_Name" to "PROBE_SET_ALGORITHM"). The second row contains the "template" information: fill in any column whose value never changes for that template type. If you are loading multiple array types, you can make a different template for each.

  • An example of this is the "Experimenter" column.
    • The name of the experimenter will be the same for every row in the final batch file. You only need to write the name of the experimenter in the second row of the template, even if the final batch file will contain more than 2 rows.

Columns filled in my template for ASv2 arrays:

<protect>

Column  Heading Name               Template Value
D       Print Name                 Affymetrix Ecoli_ASv2
H       Exp File Location          affy2_generic.EXP
K       SINGLE SCAN FILE LOCATION  noimage.DAT
N       Normalization Type         Computed
P       Experimenter               DSIEGELE
T       Individual User            DSIEGELE
U       PROBE_SET_ALGORITHM        Affymetrix MAS 5

</protect>
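To see how these column letters translate into the second row of a tab-delimited template, here is a small Python sketch. It assumes the file has 21 columns (A to U), which is inferred from "PROBE_SET_ALGORITHM" being both the last heading and column U; the function names are hypothetical, and the heading row is not generated because only some of the headings are listed here.

```python
# Template values for ASv2 arrays, keyed by spreadsheet column letter.
TEMPLATE_VALUES = {
    "D": "Affymetrix Ecoli_ASv2",
    "H": "affy2_generic.EXP",
    "K": "noimage.DAT",
    "N": "Computed",
    "P": "DSIEGELE",
    "T": "DSIEGELE",
    "U": "Affymetrix MAS 5",
}


def column_index(letter):
    """Convert a single column letter (A, B, C, ...) to a zero-based index."""
    return ord(letter.upper()) - ord("A")


def template_value_row(values, last_column="U"):
    """Build the tab-delimited second row, leaving unlisted columns empty."""
    row = [""] * (column_index(last_column) + 1)
    for letter, value in values.items():
        row[column_index(letter)] = value
    return "\t".join(row)


print(repr(template_value_row(TEMPLATE_VALUES)))
```

The empty cells are the columns that the data file is expected to supply.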

Data File

The data file contains any information that changes from batch file to batch file. It can have as many rows as needed, but the first row must always contain the headings ("Result_Set_Name" to "PROBE_SET_ALGORITHM").

NOTE: Information in the data file takes priority over information in the template file. If a column has a value in both files, the value from the data file is used in the final batch file.

Columns you need to fill in:

  • Column E - Experiment Category
  • Column F - Experiment SubCategory
  • Column G - Slide Name
  • Column Q - Experiment Date
  • Column R - Experiment Description


Column G (Slide Name) can be easily obtained without manually typing in the names.

Go to the folder celtotxt#### and run:
ls *.CEL | perl -pe 's/\.CEL$//g'
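If you would rather collect the names from Python (for example to write them straight into the data file), the sketch below does the same job as the one-liner above. The function name is hypothetical, and on case-insensitive file systems the pattern may also match files ending in ".cel".

```python
from pathlib import Path


def slide_names(folder):
    """Return the names of the .CEL files in a folder, without the extension."""
    return sorted(path.stem for path in Path(folder).glob("*.CEL"))


# Print one slide name per line for the current folder, ready to paste into Column G.
for name in slide_names("."):
    print(name)
```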