RedOak: k-mer Index Structure

RedOak stands for Reference-free[d] Optimized approach by k-mers. It is alignment-free and reference-free software for indexing a large collection of similar genomes. RedOak can also be applied to reads from unassembled genomes.

Documentation

Usage: redoak [options]

  1. Genome files:

    --genome, -g [infos:]<files> Index the genome described by the given argument. The argument is at least one file (or a comma-separated list of files), optionally compressed with gzip. Detailed information and examples on how to use this parameter are given at the end of this section.

    --list, -l <file> Use the content of the given (raw formatted) <file> to load genomes.

    Example:

    redoak --list my_genomes.lst
    

    --config, -c <file> Use the content of the given (YAML formatted) <file> to load genomes and initialize settings.

    Example:

    redoak --config my_yaml.cfg
    
  2. Change k-mer and prefix size

    --kmer, -k <int> Set the value of parameter k to the given value.

    --prefix-size, -p <int> Set the value of parameter k1 to the given value.

    Example:

    redoak --kmer 30          # or: redoak -k 30
    redoak --prefix-size 12   # or: redoak -p 12
    
  3. Optional usage:

    --query, -Q <sequence> Search for the given <sequence> in the index and print the genomes that contain it.

    --shell, -S Launch a shell to interact with the current index. Under MPI (which is the normal behaviour), the process with rank 0 creates input/output/error files to handle I/O operations.

    --output, -o <pattern>

    --verbose, -v Verbose output (can be provided many times to increase verbosity).

    --quiet, -q Run silently.

    Verbosity levels:

    -v verbose

    -vv more verbose

    -vvv even more verbose

    Examples:

    redoak --config my_yaml.cfg --query 'ATAACGAGGGATGCTGGGTAAAATGCAAAGCTAG'
    redoak --shell
    redoak --config my_yaml.cfg --output /tmp/outPutRedOak
    
  4. Other usage:

    --help, -h Print usage and exit.

    --version, -V Print the version information and exit.
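To make the two size parameters above concrete: k is the length of the indexed words (k-mers), and k1 is the length of the prefix taken from each of them. The sketch below only illustrates these two lengths on a toy sequence. The function name split_kmers and its output format are hypothetical, and nothing here describes how RedOak stores k-mers internally.

```bash
# Illustration only, not part of RedOak: print every k-mer of a sequence,
# split into its prefix (first k1 characters) and its suffix (the rest).
split_kmers() {
  local seq=$1 k=$2 k1=$3 i kmer
  for ((i = 0; i + k <= ${#seq}; i++)); do
    kmer=${seq:i:k}
    printf '%s %s\n' "${kmer:0:k1}" "${kmer:k1}"
  done
}

# Toy values (k = 4, k1 = 2) chosen to keep the output short.
split_kmers ACGTAC 4 2
```

A sequence of length n contains n - k + 1 k-mers, so a sequence shorter than k produces no output.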

Installation

Requirements

RedOak requires:

  • A modern, C++11 ready compiler such as g++ version 4.9 or higher or clang version 3.2 or higher.
  • A 64-bit operating system. Mac OS X and Linux are currently supported.
  • OpenMPI
  • libGkArrays-MPI
  • Doxygen (recommended but not mandatory)
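Before building, it can help to check that the command-line tools are reachable. The following is a minimal sketch. The helper name missing_tools is hypothetical, and the tool names passed to it (g++, mpirun, make, git) are assumptions about a typical setup, so adjust them to your compiler and MPI distribution. It does not check library versions or development headers.

```bash
# Print the name of each given command that cannot be found in PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Assumed tool names; replace g++ with clang++ if that is your compiler.
missing=$(missing_tools g++ mpirun make git)
if [ -n "$missing" ]; then
  # Unquoted on purpose so the names are joined on a single line.
  echo "Missing tools:" $missing
else
  echo "All listed tools were found."
fi
```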

Single user installation

To download and install RedOak (and libGkArrays-MPI) into a user-local directory (e.g., ${HOME}/local_install), use the following commands:

libGkArrays-MPI

RedOak uses the libGkArrays-MPI library, which requires the zlib and OpenMPI development files to be already installed on your system (on Debian/Ubuntu, these are the packages zlib1g-dev and libopenmpi-dev).

First, clone the libGkArrays-MPI repository (including its submodules).

git clone --recurse-submodules https://gitlab.info-ufr.univ-montp2.fr/doccy/libGkArrays-MPI.git

Once cloned, go to the newly created directory and artificially restore the relative order of creation/modification dates for some files. Creation and last-modification dates are not preserved by the git clone operation, which quite often leads to an infinite loop or an error during the build.

cd libGkArrays-MPI
touch configure.ac lib*/configure.ac aclocal.m4 lib*/aclocal.m4 Makefile.am */Makefile.am */*/Makefile.am
touch configure lib*/configure Makefile.in */Makefile.in */*/Makefile.in
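The touch commands above work because make decides what to regenerate by comparing modification times. The sketch below checks one such pair after the fact. The helper name not_older_than is hypothetical, and the check covers only configure and configure.ac, not every file touched above.

```bash
# Succeeds when generated file $1 exists and is not older than its source $2.
not_older_than() {
  [ -e "$1" ] && [ -e "$2" ] && [ ! "$2" -nt "$1" ]
}

# Run from the top of the cloned repository.
if not_older_than configure configure.ac; then
  echo "configure is up to date with respect to configure.ac"
else
  echo "configure looks older than configure.ac (or a file is missing)"
fi
```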

Now, run the configure script, build the library and the programs and install them.

./configure --prefix=${HOME}/local_install
make
make install

Alternatively, to keep the build files in a separate, dedicated directory, run the following instead of the previous ./configure command, then run make and make install from that directory:

mkdir build
cd build
../configure --prefix=${HOME}/local_install

RedOak

The procedure is very similar to the one described above.

First, clone the RedOak repository:

git clone https://gitlab.southgreen.fr/GenomeHarvest/RedOak.git

Once cloned, go to the newly created directory and artificially restore the relative order of creation/modification dates for some files (see explanation in previous section).

cd RedOak
touch configure.ac aclocal.m4 Makefile.am */Makefile.am
touch configure Makefile.in */Makefile.in

Now, run the configure script, build the library and the programs, and install them (the installation path of the libGkArrays-MPI library must be specified when running the configure script).

./configure --with-libGkArraysMPI-prefix=${HOME}/local_install --prefix=$HOME/local_install
make
make install

Alternatively, to keep the build files in a separate, dedicated directory, run the following instead of the previous ./configure command, then run make and make install from that directory:

mkdir build
cd build
../configure --with-libGkArraysMPI-prefix=${HOME}/local_install --prefix=${HOME}/local_install
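After a single-user installation, the shell may not yet know where the new files are. The following is a minimal sketch, assuming the default Autotools layout (programs in bin/ and libraries in lib/ under the prefix) and a Linux dynamic loader that honours LD_LIBRARY_PATH. The variable name PREFIX is only illustrative, and setting the library path may turn out to be unnecessary on your system.

```bash
# Assumption: the same prefix as in the configure commands above.
PREFIX="${HOME}/local_install"

# Make the installed programs and shared libraries visible to this shell.
export PATH="${PREFIX}/bin:${PATH}"
export LD_LIBRARY_PATH="${PREFIX}/lib${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"

# Print the resolved location of redoak, or a hint if it is not found.
command -v redoak || echo "redoak not found in PATH"
```

To make the setting persistent, the two export lines can be added to your shell start-up file.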

Uninstall

To remove RedOak and libGkArrays-MPI from your system use the following commands:

cd RedOak && make uninstall
cd libGkArrays-MPI && make uninstall

System installation

To install RedOak (and libGkArrays-MPI) globally on your system, follow the same procedure as described above, but omit the options passed to the configure scripts (for both RedOak and libGkArrays-MPI) and run the make install commands as superuser.

Getting Started

You can test the installation using the following command.

mpirun -n 10 redoak ../test/test-0025_reads.fastq 25
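The test command above starts 10 MPI processes, and some MPI setups refuse to start more processes than there are available cores. As a hedged sketch, the lines below cap the process count at the number of online cores and print the resulting command rather than running it. The variable names cores and np are illustrative, and the redoak arguments are copied unchanged from the test command.

```bash
# Number of online CPU cores; falls back to 1 if getconf cannot report it.
cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)

# Use at most 10 processes, the count used in the test command.
np=$(( cores < 10 ? cores : 10 ))

# Print the command instead of running it, so it can be inspected first.
echo "mpirun -n ${np} redoak ../test/test-0025_reads.fastq 25"
```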

SGE makeRedOakQsubYaml.sh script:

./resources/makeRedOakQsubYaml.sh

SGE command example obtained with the makeRedOakQsubYaml.sh script:


Bug Reporting

While we use an extensive set of unit tests and test coverage tools, you might still find bugs in the library. We encourage you to report any problems with the library via the project's GitLab issue tracker.

Licensing

Copyright © 2017-2020 -- LIRMM / CNRS / UM / CIRAD / INRA (Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier / Centre National de la Recherche Scientifique / Université de Montpellier / Centre de coopération Internationale en Recherche Agronomique pour le Développement / Institut National de la Recherche Agronomique)


This software is a computer program whose purpose is to index a large collection of similar genomes.

This software is governed by the CeCILL license under French law and abiding by the rules of distribution of free software. You can use, modify and/or redistribute the software under the terms of the CeCILL license as circulated by CEA, CNRS and INRIA at the following URL "http://www.cecill.info".

As a counterpart to the access to the source code and rights to copy, modify and redistribute granted by the license, users are provided only with a limited warranty and the software's author, the holder of the economic rights, and the successive licensors have only limited liability.

In this respect, the user's attention is drawn to the risks associated with loading, using, modifying and/or developing or reproducing the software by the user in light of its specific status of free software, that may mean that it is complicated to manipulate, and that also therefore means that it is reserved for developers and experienced professionals having in-depth computer knowledge. Users are therefore encouraged to load and test the software's suitability as regards their requirements in conditions enabling the security of their systems and/or data to be ensured and, more generally, to use and operate it in the same conditions as regards security.

The fact that you are presently reading this means that you have had knowledge of the CeCILL license and that you accept its terms.


Authors:

Programmers:

Contact: