Chromatin immunoprecipitation (ChIP) followed by high-throughput sequencing (ChIP-seq) is a powerful method for determining how transcription factors and other chromatin-associated proteins interact with DNA to regulate gene transcription. A single ChIP-seq experiment produces large amounts of highly reproducible data. The challenge is to extract biological knowledge from these data through the careful application of appropriate bioinformatics tools.
We have developed a set of software applications for performing common ChIP-seq data analysis tasks across the whole genome, including positional correlation analysis, peak detection, and genome partitioning into signal-rich and signal-poor regions.
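To make the idea of positional correlation concrete, here is a minimal sketch in C. It is not the algorithm of any particular ChIP-Seq program. It assumes, for illustration only, that reference and target features on a single chromosome have been reduced to a hypothetical `Tag` record holding a position and a tag count, with both arrays sorted by position. For each reference feature it adds the counts of target features found at each relative offset within `-range..+range`, producing a histogram of target signal around the reference sites.

```c
#include <stddef.h>

/* Hypothetical minimal record: position and tag count on one chromosome. */
typedef struct {
    long pos;
    int count;
} Tag;

/*
 * For every reference feature, add the tag counts of target features found
 * at relative offsets -range..+range. Each reference feature is counted once,
 * regardless of its own tag count (a simplifying assumption).
 * hist must have 2*range+1 zero-initialised slots; hist[range] is offset 0.
 * Both arrays must be sorted by ascending position.
 */
void positional_correlation(const Tag *ref, size_t nref,
                            const Tag *tgt, size_t ntgt,
                            long range, long *hist)
{
    size_t start = 0;
    for (size_t i = 0; i < nref; i++) {
        long lo = ref[i].pos - range;
        long hi = ref[i].pos + range;
        /* Targets left of this window cannot match any later reference. */
        while (start < ntgt && tgt[start].pos < lo)
            start++;
        for (size_t j = start; j < ntgt && tgt[j].pos <= hi; j++)
            hist[tgt[j].pos - ref[i].pos + range] += tgt[j].count;
    }
}
```

Because both inputs are sorted, the window start only moves forward, so the scan stays close to linear in the input size when the window is small. This is one reason sorted input is convenient for whole-genome processing.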
The ChIP-Seq tools are implemented as stand-alone C programs and include the following:
The ChIP-Seq tools have been designed to be simple, fast and highly modular. Each program carries out a well-defined data processing procedure and can therefore fit into a pipeline framework.
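As an illustration of this modular, one-task-per-program design, the following sketch shows what a single pipeline stage might look like. The function `filter_by_count` is hypothetical and is not one of the distributed programs. It copies SGA lines from an input stream to an output stream, keeping only lines whose fifth tab-separated field is at least `min_count`; treating the fifth field as a tag count is an assumption made for this example.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical pipeline stage: copy SGA lines from `in` to `out`, keeping
 * only those whose fifth tab-separated field (assumed here to be the tag
 * count) is at least `min_count`. Returns the number of lines kept.
 * Assumes each line fits in the buffer.
 */
long filter_by_count(FILE *in, FILE *out, long min_count)
{
    char line[4096];
    long kept = 0;
    while (fgets(line, sizeof line, in) != NULL) {
        /* Walk to the start of the fifth field without modifying the line. */
        const char *p = line;
        int field = 1;
        while (field < 5 && (p = strchr(p, '\t')) != NULL) {
            p++;
            field++;
        }
        if (p == NULL)
            continue; /* malformed line: fewer than five fields */
        if (strtol(p, NULL, 10) >= min_count) {
            fputs(line, out);
            kept++;
        }
    }
    return kept;
}
```

Called with `stdin` and `stdout`, a stage like this reads one stream and writes another, so several such programs can be chained with ordinary UNIX pipes.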
As an internal working format, the ChIP-Seq programs use a compact format called SGA (Simplified Genome Annotation). SGA files are line-oriented, tab-delimited text files with the following five mandatory fields:
Any number of additional fields containing application-specific information may follow the mandatory ones.
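A minimal sketch of reading one SGA line in C follows. The field order used here (sequence name, feature, position, strand, tag count) is an assumption for illustration; check the distribution's documentation for the authoritative definition. The `SgaRecord` type and `parse_sga_line` function are hypothetical names. Any additional fields after the five mandatory ones are simply ignored.

```c
#include <stdio.h>

/* Hypothetical in-memory record; the field order is an assumption. */
typedef struct {
    char seq[64];     /* sequence name, e.g. a chromosome identifier */
    char feature[64]; /* feature name */
    long pos;         /* position on the sequence */
    char strand;      /* '+', '-' or '0' (no orientation) */
    int count;        /* tag count */
} SgaRecord;

/*
 * Parse the five mandatory tab-delimited fields of one line.
 * Trailing application-specific fields are ignored.
 * Returns 1 on success, 0 if the line is malformed.
 */
int parse_sga_line(const char *line, SgaRecord *rec)
{
    int n = sscanf(line, "%63[^\t]\t%63[^\t]\t%ld\t%c\t%d",
                   rec->seq, rec->feature, &rec->pos,
                   &rec->strand, &rec->count);
    if (n != 5)
        return 0;
    return rec->strand == '+' || rec->strand == '-' || rec->strand == '0';
}
```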
The ChIP-Seq programs require SGA files to be sorted by sequence name, position, and strand. Note that SGA is a generic format that can be used to represent other genome annotations, e.g. the locations of transcription start sites (TSS) or cross-genome conservation scores. Features without an orientation are assigned a strand value of 0.
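The required ordering can be expressed as a `qsort` comparator, sketched below under two labelled assumptions: sequence names are compared in plain byte order with `strcmp` (so `chr10` sorts before `chr2`), and strand characters are compared by byte value (`+` before `-` before `0`). The actual programs may expect a different chromosome or strand order. `SgaKey`, `sga_key_cmp` and `sga_is_sorted` are hypothetical names.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical sort key extracted from an SGA record. */
typedef struct {
    const char *seq; /* sequence name */
    long pos;        /* position */
    char strand;     /* '+', '-' or '0' */
} SgaKey;

/*
 * qsort comparator: sequence name, then position, then strand.
 * Names use strcmp byte order; strands compare by byte value
 * ('+' < '-' < '0'). Both orderings are illustrative assumptions.
 */
int sga_key_cmp(const void *a, const void *b)
{
    const SgaKey *x = a, *y = b;
    int c = strcmp(x->seq, y->seq);
    if (c != 0)
        return c;
    if (x->pos != y->pos)
        return x->pos < y->pos ? -1 : 1;
    return (x->strand > y->strand) - (x->strand < y->strand);
}

/* Returns 1 if the keys are already in the required order, 0 otherwise. */
int sga_is_sorted(const SgaKey *keys, size_t n)
{
    for (size_t i = 1; i < n; i++)
        if (sga_key_cmp(&keys[i - 1], &keys[i]) > 0)
            return 0;
    return 1;
}
```

A check such as `sga_is_sorted` lets a program reject unsorted input up front instead of silently producing wrong results from a single forward scan.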
The programs are fast: they can process an entire SGA-formatted data file, which may be several hundred megabytes in size, in a few minutes, making them suitable for high-throughput genomic data analysis.
The programs are documented by UNIX-style man pages and a README file that explains the installation procedure. The current distribution also contains a number of auxiliary Perl scripts for reformatting and other pre- and post-processing tasks.