Naveenbioinformatics.co.in

Recent Articles & Updates in Bioinformatics

2015-01-10

DOCKERCON EU: BREAKOUTS FROM DAY 1

Introduction to Bioinformatics

METAINTER: meta-analysis of multiple regression models

Course: Pattern Recognition (4th edition)

ADME SARfari: Comparative Genomics of Drug Metabolising Systems

Balti and Bioinformatics On Air: 21st January 2015

Introduction to Bioinformatics using NGS data

T-Bioinfo Bioinformatics platform - Grosmannia clavigera

Global Bioinformatics Market to Push Past US$9 Billion by 2020

Cross Border Collaborations to Nurture Bioinformatics Research

Bioinformatics: 25 Years of Integrating the Biological Sciences

Introductory Bioinformatics (second course)

A bioinformatics case study with insulin

IM-TORNADO: A Tool for Comparison of 16S Reads

Virophage Genomes Discovered from Yellowstone Lake Metagenomes

Java 8 For Bioinformatics

BIOINFORMATICS SERVICES ON THE CLOUD

DOCKERCON EU: BREAKOUTS FROM DAY 1

Here are all the videos and slides from the breakout sessions that took place on the first day of DockerCon Europe. From original Docker use cases in bioinformatics and radio astronomy to more classic use cases in continuous delivery, these videos include a wealth of Docker insights, tips and tricks.

Evaluating and ranking genome assemblers by Michael Barton

The Tale of a Docker-based Continuous Delivery Pipeline by Rafe Colton

Continuous Delivery leveraging on Docker CaaS by Adrien Blind

Docker in a big company? by Damien Duportal

Migrating a large code-base to Docker containers by Doug Johnson and Jonathan Lonzinski

Enable Fig to deploy to multiple Docker servers by Willy Kuo

Opinionated containers and the future of game servers by Brendan Fosberry

Python, Docker and Radio Astronomy by Gijs Melenaar

Read More

Introduction to Bioinformatics

Discover how bioinformatics is becoming increasingly important to contemporary healthcare research and delivery. Learn about the principles and practices of bioinformatics, the challenges it faces and the problems it can help to solve.

Read More

METAINTER: meta-analysis of multiple regression models

Meta-analysis of summary statistics is an essential approach to guarantee the success of genome-wide association studies (GWAS). Application of the fixed or random effects model to single-marker association tests is a standard practice. More complex methods of meta-analysis involving multiple parameters have not been used frequently, a gap that could be explained by the lack of a respective meta-analysis pipeline. Meta-analysis based on combining p-values can be applied to any association test.

However, to be powerful, meta-analysis methods for high-dimensional models should incorporate additional information such as study-specific properties of parameter estimates, their effect directions, standard errors and covariance structure.
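
The two standard single-marker approaches mentioned above can be sketched in a few lines. This is only an illustration, not METAINTER's own method (which targets multi-parameter models): the function names and the toy numbers are assumptions made for the example, and the chi-square tail probability uses the closed form that holds for an even number of degrees of freedom.

```python
import math

def fixed_effect(betas, ses):
    """Inverse-variance weighted fixed-effect estimate and its standard error."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    return beta, math.sqrt(1.0 / total)

def fisher_combined_p(pvalues):
    """Fisher's method: -2 * sum(ln p) follows a chi-square with 2k d.o.f. under the null."""
    k = len(pvalues)
    half_stat = -sum(math.log(p) for p in pvalues)  # test statistic divided by 2
    # Closed-form chi-square survival function for 2k degrees of freedom:
    # exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total

# Hypothetical per-study effect estimates and standard errors for one marker
beta, se = fixed_effect([0.10, 0.30], [0.10, 0.10])
print(f"pooled beta = {beta:.3f}, se = {se:.3f}")
print(f"combined p  = {fisher_combined_p([0.05, 0.05]):.4f}")
```

The fixed-effect estimate uses effect directions and standard errors, whereas the p-value combination discards both, which is the loss of information the excerpt points to.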

Read More

Course: Pattern Recognition (4th edition)

Description

Many problems in bioinformatics require classification: prediction of the class to which a certain object (e.g. a gene, protein, cell or patient) belongs. This calls for algorithms that can assign the most likely label (discrete output) to an object, given one or more measurements on that object. For most interesting problems, the underlying physics are too complex to explicitly formulate such an algorithm. In such cases, a machine learning approach is taken: an algorithm is constructed, with parameters that are tuned based on an available dataset of training examples. The algorithm should predict the labels for these examples as well as possible, yet still generalize, i.e. perform well on objects not seen before. Some examples of classification problems in bioinformatics are gene finding (sequence in, gene presence out), diagnostics (microarray data in, diagnosis out) and data integration (measurements in, probability of interaction out).
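
As a rough illustration of the idea of tuning parameters on training examples and then labelling unseen objects, here is a minimal nearest-centroid classifier. It is a sketch chosen for brevity, not a method taken from the course material; the function names, the two "expression" measurements and the labels are all made up for the example.

```python
from collections import defaultdict

def fit_centroids(samples, labels):
    """Learn one mean vector (centroid) per class from the training examples."""
    sums = {}
    counts = defaultdict(int)
    for x, y in zip(samples, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
        for j, value in enumerate(x):
            sums[y][j] += value
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}

def predict(centroids, x):
    """Assign the label whose centroid is closest in squared Euclidean distance."""
    def sqdist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda y: sqdist(centroids[y]))

# Hypothetical training set: two measurements per patient, one label each
train_x = [[1.0, 1.2], [0.8, 1.0], [3.0, 3.1], [3.2, 2.9]]
train_y = ["healthy", "healthy", "disease", "disease"]

model = fit_centroids(train_x, train_y)
print(predict(model, [2.9, 3.0]))  # an object not seen during training
```

The centroids are the tuned parameters here; generalization would normally be checked on held-out data rather than on a single new point.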

Date

Next occasion: March 23-27, 2015. VU University, Amsterdam, the Netherlands
Last occasion: 21 Jan – 25 Jan 2013, Amsterdam

Goal

After having followed this course, a student should have an overview of basic pattern recognition techniques and be able to recognize what method is most applicable to classification problems (s)he encounters in bioinformatics applications.

Read More

ADME SARfari: Comparative Genomics of Drug Metabolising Systems

ADME SARfari is a freely available web resource that enables comparative analyses of drug-disposition genes. It does so by integrating a number of publicly available data sources and then providing specific analysis and predictive tools for drug metabolism researchers. The data include the interactions of small molecules with ADME (Absorption, Distribution, Metabolism and Excretion) proteins responsible for the metabolism and transport of molecules; available pharmacokinetic (PK) data; protein sequences of ADME-related molecular targets for pre-clinical model species and human; alignments of the orthologues, including information on known SNPs (single nucleotide polymorphisms); and information on the tissue distribution of these proteins. In addition, in silico models have been developed that enable users to predict which ADME-relevant protein targets a novel compound is likely to interact with.

Availability: https://www.ebi.ac.uk/chembl/admesarfari

Contact: jpo@ebi.ac.uk

Read More

Balti and Bioinformatics On Air: 21st January 2015

The plan this year for the triumphant Balti and Bioinformatics series is to alternate between virtual, "on-air" meetings (where, sadly, you will need to provide your own balti curry) and real-life ones, which will mainly be held in Birmingham but may be in other places in England or Wales.

Balti and Bioinformatics On-Air

This meeting's theme is open data and reproducible bioinformatics.

Please register over at the Google Hangout page: https://plus.google.com/events/cbtuikle0h2619obgjrgfu74424

Wednesday 21st January, 4pm GMT (=11am EST, =8am PST, 00:00 China)

Read More

Introduction to Bioinformatics using NGS data

Course content

In collaboration with BILS, SciLifeLab will organize the course Introduction to Bioinformatics using NGS data. The course will provide an introduction to a wide range of analytical techniques for massively parallel sequencing, including basic Linux commands. We will pair lectures on the theory of analysis algorithms with practical computational exercises demonstrating the use of common tools for analyzing data from each of several common sequencing study designs.

Important dates

Application open: December 16

Application deadline: January 18

Confirmation to accepted students: January 21

Responsible teachers: Manfred Grabherr, Bengt Persson

If you don’t receive information according to the dates above, contact eva.molin@scilifelab.uu.se

Read More

T-Bioinfo Bioinformatics platform - Grosmannia clavigera

NGS data analysis of Grosmannia clavigera on the big data analysis platform developed at the Tauber Bioinformatics Institute in Haifa, Israel.

Global Bioinformatics Market to Push Past US$9 Billion by 2020

Transparency Market Research has released a new report on the global bioinformatics market, titled ‘Global Bioinformatics Market (By Platforms, Tools and Services and By Applications: Preventive Medicine, Molecular Medicine, Gene Therapy, Drug Development and Others) - Industry Analysis, Size, Share, Growth, Trends and Forecast, 2014 - 2020’. The report estimates that the bioinformatics market, valued at US$2.3 billion in 2012, will reach more than US$9 billion by the end of the report’s forecast period, growing at a healthy CAGR throughout. The fragmented global bioinformatics market is segmented by type of platform, content management tools, services, and geographical distribution.
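
The excerpt does not state the CAGR itself, but a rough figure can be back-calculated from the two values it quotes. The sketch below assumes the standard compound-growth formula, takes US$9 billion as a lower bound for the 2020 value, and assumes the growth runs over the eight years from 2012 to 2020; the report's own figure may be defined differently.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate as a fraction (0.10 means 10% per year)."""
    return (end_value / start_value) ** (1.0 / years) - 1.0

# Values quoted in the excerpt, in billions of US dollars.
# Assumption: 9.0 is a lower bound ("more than US$9 billion") reached in 2020.
implied = cagr(2.3, 9.0, 2020 - 2012)
print(f"Implied CAGR: {implied:.1%}")
```

Under these assumptions the implied rate comes out between 18% and 19% per year.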

Browse Report:

http://www.transparencymarketresearch.com/bioinformatics-market.html

According to platform, the bioinformatics market is divided into four categories: sequence manipulation, sequence analysis, sequence alignment, and structural analysis. There are two types of content management tools: general knowledge management tools and specific content management tools. According to the type of service provided, the global bioinformatics market is divided into four categories: data analysis services, database and management services, sequencing services, and others. By geography, the global bioinformatics market is divided into four regional markets: North America, Europe, Asia Pacific, and Rest of World.

Read More

Cross Border Collaborations to Nurture Bioinformatics Research

Rising overseas expansions and cross-border collaborations in the bioinformatics field have given new dimensions to the industry. A number of international alliances are bridging bioinformatics research gaps between different nations. Exponential growth in bioinformatics trade and the sharing of research results has given a massive thrust to the market.

In their latest research study, “Global Bioinformatics Market Outlook 2019”, which runs to over 140 pages, RNCOS analysts found that the global bioinformatics market reached around US$3.7 billion in 2013 and is expected to grow at a CAGR of around 19% during 2015-2019. The report is the outcome of in-depth research and comprehensive analysis of the bioinformatics market, its trends and its future opportunities, covering a wide-ranging examination of the bioinformatics space.

Read More

Bioinformatics: 25 Years of Integrating the Biological Sciences

The 26th Presidential Faculty Lecture was given by Jason Moore, BS, MA, MS, PhD, Third Century Professor, Professor of Genetics and Community and Family Medicine at the Geisel School of Medicine at Dartmouth.

Introductory Bioinformatics (second course)

The course sets out to introduce an extensive range of computing facilities vital for molecular biological research. This will be achieved primarily through "hands-on" exercises based around an investigation of a well-documented human disease. The course will demonstrate how information can be obtained both by analysis of raw sequence data and by interrogation of information resources.

The last day of this course will be dedicated to a soft introduction to Next Generation Sequencing (NGS) data analysis.

Objectives

The course is a user course, so its prime objective is to teach how to use the various tools. However, where it is useful, the operation of the programs will be discussed as far as is required. Participants will learn how to set up the programs in an informed fashion and how to fully understand the output generated. On completion of this 4-day training, they will also know how to implement this methodology elsewhere, using public domain software and data resources.

Read More

A bioinformatics case study with insulin

Blink is a database of protein BLAST search results. Using Blink can save you lots of time because it organizes BLAST results from all the organisms in the non-redundant protein sequence database, but getting to Blink can be tricky because it’s a little hard to find.

Why is this sequence in the NCBI database if it’s misidentified?

The presence of the cow insulin sequence from jack beans illustrates an important point about the NCBI database. It’s an archive. Sequences get entered that aren’t always right and they can persist.

Read More

IM-TORNADO: A Tool for Comparison of 16S Reads

16S rDNA hypervariable tag sequencing has become the de facto method for accessing microbial diversity. Illumina paired-end sequencing, which produces two separate reads for each DNA fragment, has become the platform of choice for this application. However, when the two reads do not overlap, existing computational pipelines analyze the data from each read separately and underutilize the information contained in the paired-end reads.

Availability and Implementation

IM-TORNADO is freely available at http://sourceforge.net/projects/imtornado and produces BIOM format output for cross compatibility with other pipelines such as QIIME, mothur, and phyloseq.

Read More

Virophage Genomes Discovered from Yellowstone Lake Metagenomes

Virophages are a unique group of circular double-stranded DNA viruses that are considered parasites of giant DNA viruses, which in turn are known to infect eukaryotic hosts. In this study, the genomes of three novel Yellowstone Lake virophages (YSLVs), namely YSLV5, YSLV6, and YSLV7, were identified from Yellowstone Lake through metagenomic analyses. The relative abundances of these three novel virophages and of the previously identified Yellowstone Lake virophages YSLV1 to -4 were determined in different locations of the lake, revealing that most of the sampled locations, including both mesophilic and thermophilic habitats, had multiple virophage genotypes.

IMPORTANCE

This study discovered novel virophages present within the Yellowstone Lake ecosystem using a conserved major capsid protein as a phylogenetic anchor for assembly of sequence reads from Yellowstone Lake metagenomic samples. The three novel virophage genomes (YSLV5 to -7) were completed by identifying specific environmental samples containing these respective virophages, and closing gaps by targeted PCR and sequencing.

Read More

Java 8 For Bioinformatics

Benefits of using Java for Bioinformatics

Performance: Very early releases of Java earned it (quite rightly) a horrible reputation for performance. However, the modern Java Virtual Machine is extremely fast. In particular, the HotSpot JVM comes with a Just In Time (JIT) compiler, which compiles byte code to native code on the fly when it detects there may be a performance benefit to doing so. Because a lot of the processing we do in bioinformatic analysis is highly repetitive, our work benefits hugely from this.

Multithreading: Java has a high-level abstraction for multithreading that transparently supports multiple processors. Since Java 5, there have been libraries to support blocking queues and executors, and since Java 7 there are libraries supporting fork-join functionality. These add to the performance benefits noted above and make it relatively easy to exploit parallelization in Java. Java 8 offers some new APIs that make this easier still.

Robustness: The experience in our lab is that, while much of our code is written for “one-off” execution, there are data structures we commonly want to reuse. Java is primarily designed as a language to support reusable, robust components. We have, in common with many other labs, I suspect, developed an in-house set of libraries to manage these, and have developed simple class structures to represent genes, exons, genomes, etc., as well as some high-performing memory maps of FASTA files. We also use Picard, which provides the functionality of samtools in a Java API. GATK also provides similar reusable libraries.

Familiarity: Java has been around for over 18 years now and has been highly influential in the development of more recent languages. It’s virtually impossible to hire a programmer who hasn’t had some exposure to Java, and so it’s relatively easy to bring new lab members up to speed on existing in-house code.

Platform-Independence: Unlike commercial organizations, academic organizations usually allow employees a large amount of freedom to choose a computational platform on which to work. Since Java runs on all major systems, it makes a good choice for in-house code, as that code does not depend on an investigator's choice of platform. Many UI-based informatics tools (including FastQC, IGV, IPA, and many others) take advantage of this. Bioinformatics is a field where most practitioners are, by nature, computer-savvy, so the problems associated with installing and maintaining JVMs are not an issue in this environment.

Read More

BIOINFORMATICS SERVICES ON THE CLOUD

Cloud computing has been most influential in high-throughput sequence data analysis. With the volume of data multiplying every year, it is a daunting task for small and large laboratories alike to maintain and process data for these sequence analyses. Hadoop has been used successfully in bioinformatics because it meets the essential needs of biological data analysis. Hadoop consists of two parts: MapReduce and the Hadoop Distributed File System (HDFS). Employing these two parts, Hadoop can solve large data problems by using technology infrastructure more efficiently. Cloud-based analysis compares favorably with local computational clusters in both performance and cost, showing that cloud computing technologies might be a viable option to facilitate large-scale translational research in genomic medicine.

The traditional approach in bioinformatics was to download databases and software and then analyze the data at hand with the locally installed software. Cloud utilization in bioinformatics can vary depending on the needs of the task.
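
To give a feel for the MapReduce half of Hadoop, the sketch below counts k-mers in a handful of reads using explicit map, shuffle and reduce steps. It runs in a single Python process and does not use Hadoop's API or HDFS; the function names and the toy reads are assumptions made for illustration only.

```python
from collections import defaultdict

def map_kmers(read, k):
    """Map step: emit a (k-mer, 1) pair for every window of length k in a read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k], 1

def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_counts(groups):
    """Reduce step: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}

def count_kmers(reads, k):
    pairs = (pair for read in reads for pair in map_kmers(read, k))
    return reduce_counts(shuffle(pairs))

# Hypothetical reads; a real job would stream them from distributed storage
print(count_kmers(["ACGT", "CGTA"], 3))
```

In a real cluster the map and reduce steps run on many machines at once, and the framework handles the shuffle and the movement of data between them.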

Read More

References:

http://blog.docker.com/2015/01/dockercon-eu-breakouts-from-day-1/

http://www.genomicseducation.org.uk/courses/an-introduction-to-bioinformatics/

http://bioinformatics.oxfordjournals.org/content/31/2/151.short?rss=1

http://biosb.nl/education/course-portfolio/pattern-recognition/

http://bioinformatics.bioinformatics.btv010.full.pdf

http://nickloman.github.io/balti/2015/01/09/balti-and-bioinformatics-on-air-21st-january-2015/

http://www.scilifelab.se/events/introduction-to-bioinformatics-using-ngs-data-2-hp-2/

https://www.youtube.com/watch?v=Vc1iKGNYfBA&feature=youtu.be&a

http://www.sys-con.com/node/3272898

http://business.wesrch.com/paper-details/press-paper-BU187Q87UEOEN-cross-border-collaborations-to-nurture-bioinformatics-research

https://www.youtube.com/watch?v=wlecxwOa4pY

http://gtpb.igc.gulbenkian.pt/bicourses/IB14S/

http://scienceblogs.com/digitalbio/2014/12/27/and-the-plant-goes-moo-a-bioinformatics-case-study-with-insulin/

http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0114804

http://jvi.asm.org/content/89/2/1278.abstract

http://www.marshall.edu/genomicjava/2014/03/17/40/

http://www.nalashaa.com/bioinformatics-services-cloud/