User Guide Documentation

This page briefly describes the programs, software, webservers, standalone tools and configuration files included in the OSDDlinux package. It explains the basic architecture of the package, the major directories and subdirectories of OSDDlinux, and the main paths and functions of these programs.


  • Overview
  • System Configuration & configuration files
  • Getting Started with OSDDlinux / System requirements
  • Major parts of OSDDlinux
  • Web server
  • Standalone
  • Galaxy
  • Software Applications
  • Users

    Overview

    OSDD Forum is an initiative with a vision to provide affordable healthcare to the developing world. As a part of OSDD, OSDDlinux has been initiated to provide computational resources to the scientific community in the field of computer-aided drug design. One of the major challenges for bioinformaticians is to understand and solve the problems encountered by biologists in their day-to-day work. At the same time, solutions should be user-friendly, so that a person with little knowledge of computational output can understand and use them. Though we have tried our best to help biologists, our programs and services are still far from perfect. Our webservers perform very well for single-sequence queries or for a small number of sequences, but they cannot run predictions for a whole genome or proteome, because computation time and bandwidth are limited. Moreover, many users want to run prediction models on their local machines to protect the security and privacy of their datasets and results. To meet these requirements, our group is releasing this software package, which collects and integrates the computer programs developed in our group over the years.

    System Configuration and configuration files

    GPSR is the home directory of the OSDDlinux package. This section describes the major directories and subdirectories within the gpsr directory, in hierarchical order. The OSDDlinux package has three main parts, which contain all the programs and webservers: (1) Webservers, (2) Standalone and (3) Galaxy. The major directories and subdirectories that make this software functional are listed below.

    S/No.  Brief description

    1.  webserver (/gpsr/webserver): This is the main directory containing all the files required to build the web interface of any software. It has two main components: (1) cgibin and (2) cgidocs.

        cgibin (/gpsr/webserver/cgibin): This subdirectory holds all the executable programs of each prediction method. It contains the main scripts that run the algorithm and produce the output of the method.

        cgidocs (/gpsr/webserver/cgidocs): This subdirectory provides the tools (e.g. the html submission page) that pass input data to the executable programs in cgibin. The results are also displayed by the contents of this subdirectory. Every webserver requires a temporary directory to store input data, intermediate results and final prediction output. All such information is stored in a subdirectory of the tmp folder of cgidocs, e.g. /gpsr/webserver/cgidocs/tmp/ctlpred.

    2.  standalone (/gpsr/standalone): All the major prediction models of the standalone package, with their main executable files, are provided in this directory. There is a separate subdirectory for each prediction method.

    3.  galaxy (/gpsr/galaxy): The GPSR programs implemented in Galaxy, together with their .XML files, have to be placed in this directory.

    4.  base (/gpsr/base): The whole GPSR package content is kept in this directory. It contains the following major subdirectories:

        bin (/gpsr/base/bin): all basic command-line executable scripts are placed here.

        includes (/gpsr/base/includes): contains the base_env file, which holds the environment variables of many software packages used by GPSR programs (e.g. perl, svm-light, MEME, MAST, HMMER etc.). The executables of these programs are kept in /gpsr/local/bin, e.g. /gpsr/local/bin/svm_classify.

        src (/gpsr/base/src): the source code of all GPSR programs is kept in this subdirectory. If a user wants to add a new program to GPSR, the source code of that program must be kept in this directory.

        All the programs are installed and compiled by executing the install.pl program (/gpsr/base/install.pl). If a new program has been added to the GPSR package, execute the update.pl program (/gpsr/base/update.pl) to install and compile it.

    5.  bin (/gpsr/bin): the repository of many UNIX system commands, e.g. ls, grep, cut etc. Many of these system commands are used in the source code of GPSR programs.

    6.  data (/gpsr/data): Some prediction methods in the GPSR package require their own databases. The BLAST tools (e.g. Blastpgp) take an input sequence and search for homologous sequences in databases such as SwissProt, PDB or any other database. All these databases are placed in the subdirectory /gpsr/data/blastdata. Many webservers use their own databases, which are kept in a subdirectory named after the tool, e.g. the database used by the hivcopred webserver is placed in /gpsr/data/blastdata/hivcopred.

    7.  examples (/gpsr/examples): To help users understand how the standalone versions of the GPSR programs work, many example files have been placed in this subdirectory. These are mainly example input and output files for many of the programs.

    8.  local (/gpsr/local): this directory holds the interpreters, compilers and related software for programming languages and databases, e.g. Perl, Python, C, PHP, MySQL etc. Various machine-learning software, e.g. SVMlight, svm601_classify, Weka etc., is also kept in this directory. All these programs are placed in the bin subdirectory (/gpsr/local/bin). The Apache HTTP server is kept in a separate subdirectory, /gpsr/local/apache.

    9.  software (/gpsr/software): Many software packages that are regularly used in bioinformatics and chemoinformatics have been placed in this subdirectory. These include babel, Psipred, Ipknot etc.

    10. temp (/gpsr/temp): The GPSR programs in the Standalone version require a temporary directory for data processing and result output. This folder serves as that temporary directory. The subdirectories created in it are deleted once a GPSR program running in Standalone mode has processed its data and produced its results.
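
    The layout described above can be checked with a short script. The following is only a minimal sketch in Python: the directory names are taken from this section, while the function names (missing_directories, webserver_tmp_dir, blast_data_dir) are hypothetical helpers written for this illustration and are not part of OSDDlinux.

```python
from pathlib import Path

# Directories named in this section, relative to the GPSR home directory.
GPSR_LAYOUT = [
    "webserver/cgibin",
    "webserver/cgidocs/tmp",
    "standalone",
    "galaxy",
    "base/bin",
    "base/includes",
    "base/src",
    "bin",
    "data/blastdata",
    "examples",
    "local/bin",
    "local/apache",
    "software",
    "temp",
]


def missing_directories(root="/gpsr"):
    """Return the layout entries that do not exist under root (hypothetical helper)."""
    root = Path(root)
    return [rel for rel in GPSR_LAYOUT if not (root / rel).is_dir()]


def webserver_tmp_dir(name, root="/gpsr"):
    """Temporary directory of one webserver, e.g. .../cgidocs/tmp/ctlpred."""
    return Path(root) / "webserver" / "cgidocs" / "tmp" / name


def blast_data_dir(name, root="/gpsr"):
    """Tool-specific database directory, e.g. .../blastdata/hivcopred."""
    return Path(root) / "data" / "blastdata" / name


if __name__ == "__main__":
    # Report every expected directory that is absent on this machine.
    for rel in missing_directories():
        print("missing:", rel)
```

    Passing a different root (for example a mounted image) lets the same check run against a copy of the package that is not installed at /gpsr.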


    Getting Started with OSDDlinux / System requirements

    OSDDlinux is built on a Linux operating system, currently Ubuntu, and has its basic features. It can be used after installation on the local system, or mounted from a peripheral storage device, e.g. CD/DVD, USB memory stick etc. The user can download the ISO image file from the OSDDlinux download page and install or mount it on the local machine. Windows/Mac users can install OSDDlinux in a virtual machine, e.g. VMware or VirtualBox. Because it can be mounted, OSDDlinux is convenient to use, portable and largely independent of the local machine.

    Major parts of OSDDlinux

    The OSDDlinux package has three major parts: Webservers, Standalone and Galaxy. Each of these modules contains numerous bioinformatics/chemoinformatics software packages that may be used in the drug discovery process. The modules are outlined below:

    Web server

    In this part, all the webservers developed for various biological processes and drug design have been implemented. The user can install and run all the webservers on the local machine. A dedicated Apache server is provided on the OSDDlinux CD to launch these webservers. The main advantage of the webservers is that they offer a graphical interface and are easy to operate. Since both the input and the output produced by the models are visible graphically, this mode is highly user-friendly. The user can also mirror our original web pages to provide the web service to the community.

    Standalone

    Most bioinformatics webservers that are important for the drug design process are provided over the web. Many research groups are reluctant to use these services because doing so requires sharing their datasets over the internet. To avoid this problem, we encourage people to use the Standalone version of OSDDlinux, in which all the webservers are provided as command-line programs. The user can run each program with a single command and gets the output on the local system, without any dependency on internet-based webservers. Each program is supplied with the command for running it.

    The standalone programs are the most useful variants of the bioinformatics and chemoinformatics tools for regular users who need to analyze bulk data on a routine basis. The programs run on the command line, which makes it convenient to provide input, and they write output in easily readable file formats.
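
    Because each standalone program runs from a single command, bulk data can be processed by calling that command once per input file. The sketch below, in Python, shows one way to do this. It is an illustration under stated assumptions: the helper names (build_command, run_batch) are hypothetical, and the command template must be filled in from the command supplied with the program in question, since no program names or options are assumed here.

```python
import subprocess
from pathlib import Path


def build_command(template, input_path, output_path):
    """Fill the {input} and {output} placeholders of a command template."""
    return [
        part.format(input=input_path, output=output_path) for part in template
    ]


def run_batch(template, input_files, output_dir):
    """Run the command once per input file and return {input file: exit code}."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    results = {}
    for input_path in map(Path, input_files):
        # One output file per input, named after the input file.
        output_path = output_dir / (input_path.stem + ".out")
        command = build_command(template, input_path, output_path)
        completed = subprocess.run(command)
        results[str(input_path)] = completed.returncode
    return results
```

    A template is a list such as ["PROGRAM", "{input}", "{output}"], where PROGRAM and the argument order are placeholders to be replaced by the actual command of the chosen program. A non-zero exit code in the returned dictionary marks an input file whose run failed.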

    Galaxy

    Galaxy is an independent portal that incorporates a number of programs and runs them either independently or as a workflow. A webserver in Galaxy can be followed from the first stage of data upload to the last stage of result output, and the user can view all the intermediate stages leading to the final result. This feature lets the user check the proper functioning of any webserver from the first stage to the end.

    This platform is used through a graphical interface rather than a command-line terminal. It targets users who are not Unix-savvy but wish to analyze large datasets locally. Considering the long-term advantages of the Galaxy platform, which can be used both online and offline on a local machine, OSDDlinux also provides Galaxy-compatible versions of the bioinformatics and chemoinformatics tools.
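
    A Galaxy-compatible tool is described to Galaxy by an XML file that names the command to run and its inputs and outputs. The Python sketch below generates a minimal file of this kind using general Galaxy tool conventions (tool, command, inputs/param and outputs/data elements). It is an assumption-laden illustration: the tool id, the program name example_program and the version number are hypothetical, and the wrappers actually shipped with OSDDlinux may look different.

```python
import xml.etree.ElementTree as ET


def make_tool_xml(tool_id, name, command_line,
                  input_format="fasta", output_format="txt"):
    """Build a minimal Galaxy tool definition and return it as a string."""
    # The version number is arbitrary and only illustrative.
    tool = ET.Element("tool", id=tool_id, name=name, version="0.1.0")
    # The command may refer to the parameters below as $input and $output.
    ET.SubElement(tool, "command").text = command_line
    inputs = ET.SubElement(tool, "inputs")
    ET.SubElement(inputs, "param", name="input", type="data",
                  format=input_format, label="Input file")
    outputs = ET.SubElement(tool, "outputs")
    ET.SubElement(outputs, "data", name="output", format=output_format)
    ET.SubElement(tool, "help").text = "Hypothetical example wrapper."
    return ET.tostring(tool, encoding="unicode")


if __name__ == "__main__":
    # example_program is a placeholder, not a real OSDDlinux command.
    print(make_tool_xml("example_tool", "Example tool",
                        "example_program '$input' '$output'"))
```

    Galaxy substitutes $input and $output with the paths of the uploaded dataset and the result dataset when the tool runs, which is what makes every stage of a run visible in the portal.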

    Software Applications

    OSDDlinux promotes not only usage but also tool development in the research community, as it provides access to the source code of the tools it hosts. The source code can be retrieved, improved and then integrated back into OSDDlinux by the user-developer for the benefit of the community. Another aspect is the provision of a login account for the user. This is an incentive for users who wish to participate actively in bioinformatics tool development but are limited by scarce resources such as memory space and open source software.

    Users

    Being flexible and rich in information, OSDDlinux is intended to serve at least five types of users.

  • 1. Occasional users who need bioinformatics and chemoinformatics tools from time to time but wish to keep the tool package separate from their system. For such cases, OSDDlinux has been made bootable and operable from an external memory device, e.g. CD/DVD or USB.

  • 2. Regular users for whom integrating the tool package with their local system is a big advantage for analyzing bulk data, especially without depending on the internet.

  • 3. Windows users who wish to use bioinformatics and chemoinformatics tools on Linux alongside their Windows system. OSDDlinux can be run in a virtual machine using VMware, VirtualBox, etc. alongside the activities on Windows.

  • 4. Regular Ubuntu users already working in the Unix environment who wish to install the OSDDlinux bioinformatics and chemoinformatics tools in their own Ubuntu version. The informatics tools in OSDDlinux are available for download from the OSDDlinux portal.

  • 5. Developers who would like to enhance or customize OSDDlinux for their own needs or for new applications. They are supported through access to the source code.