2015-06-24

Here is a list of free Python packages that are useful for data analysis and data mining. Some of them are also useful for text analytics and text mining.

scikit-learn
www.github.com/scikit-learn/scikit-learn

scikit-learn is a Python package for machine learning built on top of SciPy. It offers many classification, regression and clustering algorithms, including support vector machines, logistic regression, naive Bayes, random forests, gradient boosting, k-means and DBSCAN. It is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy.

Pylearn2
www.github.com/lisa-lab/pylearn2

Pylearn2 is a machine learning research library for Python, built on top of Theano.

NuPIC
www.github.com/numenta/nupic

NuPIC stands for Numenta Platform for Intelligent Computing. It is a machine intelligence platform that implements the HTM (Hierarchical Temporal Memory) learning algorithms. HTM is a computational theory of the neocortex. At the core of HTM are time-based continuous learning algorithms that store and recall spatial and temporal patterns. NuPIC is mostly used for problems such as anomaly detection and prediction on streaming data sources.

Nilearn
www.github.com/nilearn/nilearn

Nilearn is a Python package for fast and easy statistical learning on neuroimaging data. It uses scikit-learn for multivariate statistics, with applications such as predictive modeling, classification, decoding and connectivity analysis.

PyBrain
www.github.com/pybrain/pybrain

PyBrain stands for Python-Based Reinforcement Learning, Artificial Intelligence and Neural Network Library. It is a modular machine learning library that aims to offer flexible, easy-to-use algorithms.

Pattern
www.github.com/clips/pattern

Pattern is one of the few Python modules that are good at web mining. It has tools for natural language processing (NLP), network analysis and data mining. It also provides classification techniques such as KNN, perceptron and support vector machines (SVM).

Fuel
www.github.com/mila-udem/fuel

Fuel provides your machine learning models with the data they need to learn. It has interfaces to common datasets such as MNIST and CIFAR-10 (image datasets) and Google’s One Billion Words (text). It gives you the ability to iterate over your data in a variety of ways, such as in minibatches with shuffled or sequential examples.
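
The idea of iterating over data in minibatches, either sequentially or shuffled, can be sketched in plain Python. This is only an illustration of the concept and does not use Fuel's API; the function name iterate_minibatches and its parameters are made up for this example.

```python
import random


def iterate_minibatches(examples, batch_size, shuffle=False, seed=None):
    """Yield lists of at most batch_size examples.

    Hypothetical helper for illustration only, not part of Fuel.
    """
    indices = list(range(len(examples)))
    if shuffle:
        # A seeded generator keeps the shuffled order reproducible.
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        yield [examples[i] for i in indices[start:start + batch_size]]


data = list(range(10))
sequential_batches = list(iterate_minibatches(data, batch_size=4))
shuffled_batches = list(iterate_minibatches(data, batch_size=4, shuffle=True, seed=0))
```

Both calls visit every example exactly once per pass; only the order differs, and the last batch is smaller when the dataset size is not a multiple of the batch size.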

Bob
www.github.com/idiap/bob

Bob is a free signal-processing and machine learning toolbox. The toolbox is written in a mix of Python and C++ and is designed to be efficient and to reduce development time. It is composed of a reasonably large number of packages that implement tools for image, audio and video processing, machine learning and pattern recognition.

skdata
www.github.com/jaberg/skdata

Skdata is a library of data sets for machine learning and statistics. This module provides standardized Python access to toy problems as well as popular computer vision and natural language processing data sets.

MILK
www.github.com/luispedro/milk

Milk is a machine learning toolkit in Python. Its focus is on supervised classification with several classifiers available: SVMs, k-NN, random forests and decision trees. It also performs feature selection. These classifiers can be combined in many ways to form different classification systems. For unsupervised learning, Milk supports k-means clustering and affinity propagation.
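
To give a feel for one of the classifiers named above, here is a minimal k-NN sketch in plain Python. It does not use Milk's API; the function knn_predict and the toy data are assumptions made for this example.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify query by majority vote among its k nearest training points.

    Hypothetical helper for illustration only, not part of Milk.
    """
    neighbours = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], query),  # Euclidean distance
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


# Toy data: two well-separated groups of 2D points.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["a", "a", "a", "b", "b", "b"]

near_origin = knn_predict(points, labels, (0.5, 0.5))
near_far_group = knn_predict(points, labels, (5.5, 5.5))
```

A real toolkit adds what this sketch leaves out, such as feature normalization, faster neighbour search and model selection.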

IEPY
www.github.com/machinalis/iepy

IEPY is an open source tool for Information Extraction focused on Relation Extraction.

It is aimed at users who need to perform Information Extraction on a large dataset and at scientists who want to experiment with new IE algorithms.

Quepy
www.github.com/machinalis/quepy

Quepy is a Python framework that transforms natural language questions into queries in a database query language. It can be easily customized for different kinds of natural language questions and database queries, so with little coding you can build your own system for natural language access to your database.

Currently Quepy supports the SPARQL and MQL query languages, with plans to extend it to other database query languages.
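
The general idea of mapping a question shape to a query template can be sketched with regular expressions from the standard library. This is a toy illustration and not Quepy's API; the PATTERNS table, the question_to_sparql function and the query templates are all assumptions made for this example. The generated queries omit PREFIX declarations and do no escaping.

```python
import re

# Hypothetical pattern table: each entry pairs a question shape
# with a SPARQL template. Doubled braces are literal braces.
PATTERNS = [
    (
        re.compile(r"^what is (?:an? |the )?(?P<thing>.+?)\??$", re.IGNORECASE),
        'SELECT ?definition WHERE {{ ?x rdfs:label "{thing}"@en . '
        "?x rdfs:comment ?definition . }}",
    ),
    (
        re.compile(r"^who is (?P<thing>.+?)\??$", re.IGNORECASE),
        'SELECT ?person WHERE {{ ?person rdfs:label "{thing}"@en . '
        "?person rdf:type foaf:Person . }}",
    ),
]


def question_to_sparql(question):
    """Return a SPARQL string for the first matching pattern, else None."""
    for pattern, template in PATTERNS:
        match = pattern.match(question.strip())
        if match:
            return template.format(**match.groupdict())
    return None


query = question_to_sparql("What is a blowtorch?")
unknown = question_to_sparql("Tell me a joke")
```

Supporting a new kind of question then means adding one more pattern and template pair, which is the sort of customization described above.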

Hebel
www.github.com/hannes-brt/hebel

Hebel is a library for deep learning with neural networks in Python using GPU acceleration with CUDA through PyCUDA. It implements the most important types of neural network models and offers a variety of different activation functions and training methods such as momentum, Nesterov momentum, dropout, and early stopping.
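
The difference between classical momentum and Nesterov momentum, two of the training methods mentioned above, can be shown on a one-dimensional toy problem. This is a plain Python sketch and not Hebel's API; the function minimize, its parameters and the objective are assumptions made for this example.

```python
def minimize(grad, x0, lr=0.1, momentum=0.9, nesterov=False, steps=300):
    """Gradient descent with classical or Nesterov momentum.

    Hypothetical helper for illustration only, not part of Hebel.
    """
    x, velocity = x0, 0.0
    for _ in range(steps):
        # Nesterov evaluates the gradient at the look-ahead point,
        # classical momentum at the current point.
        point = x + momentum * velocity if nesterov else x
        velocity = momentum * velocity - lr * grad(point)
        x += velocity
    return x


def grad(x):
    """Gradient of the toy objective f(x) = (x - 3)^2."""
    return 2.0 * (x - 3.0)


x_classical = minimize(grad, x0=0.0)
x_nesterov = minimize(grad, x0=0.0, nesterov=True)
```

Both variants approach the minimum at x = 3 on this problem; the only change between them is where the gradient is evaluated.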

mlxtend
www.github.com/rasbt/mlxtend

It's a library of useful tools and extensions for day-to-day data science tasks.

nolearn
www.github.com/dnouri/nolearn

This package contains a number of utility modules that are helpful with machine learning tasks. Most of the modules work together with scikit-learn; others are more generally useful.

Ramp
www.github.com/kvh/ramp

Ramp is a Python library for rapid prototyping of machine learning solutions. It’s a lightweight, pandas-based machine learning framework that plugs into existing Python machine learning and statistics tools (scikit-learn, rpy2, etc.). Ramp provides a simple, declarative syntax for exploring features, algorithms and transformations quickly and efficiently.

Feature Forge
www.github.com/machinalis/featureforge

A set of tools for creating and testing machine learning features, with a scikit-learn compatible API.

This library provides a set of tools that can be useful in many machine learning applications (classification, clustering, regression, etc.), and particularly helpful if you use scikit-learn (although this can work if you have a different algorithm).

REP
www.github.com/yandex/rep

REP is an environment for conducting data-driven research in a consistent and reproducible way. It has a unified classifier wrapper for a variety of implementations such as TMVA, Sklearn, XGBoost and uBoost. It can train classifiers in parallel on a cluster. It supports interactive plots.

Python Machine Learning Samples
www.github.com/awslabs/machine-learning-samples

A collection of sample applications built using Amazon Machine Learning.

Python-ELM
www.github.com/dclambert/Python-ELM

This is an implementation of the Extreme Learning Machine in Python, based on scikit-learn.
