top button
Flag Notify
    Connect to us
      Site Registration

Site Registration

Small Introduction about mlpack?

0 votes
482 views

What is Mlpack Library?

mlpack is a C++ machine learning library with emphasis on scalability, speed, and ease-of-use. Its aim is to make machine learning possible for novice users by means of a simple, consistent API, while simultaneously exploiting C++ language features to provide maximum performance and maximum flexibility for expert users. 

This is done by providing a set of command-line executables which can be used as black boxes, and a modular C++ API for expert users and researchers to easily make changes to the internals of the algorithms.

As a result of this approach, mlpack outperforms competing machine learning libraries by large margins; see the BigLearning workshop paper and the benchmarks for details.

mlpack is developed by contributors from around the world. It is released free of charge, under the 3-clause BSD License (more information). (Versions older than 1.0.12 were released under the GNU Lesser General Public License: LGPL, version 3.)

mlpack was originally presented at the BigLearning workshop of NIPS 2011 [pdf] and later published in the Journal of Machine Learning Research [pdf], with version 3 being published in the Journal of Open Source Software [pdf]. Please cite mlpack in your work using this citation.

mlpack bindings for R are provided by the RcppMLPACK project.

Currently mlpack supports the following algorithms:

  • Collaborative Filtering
  • Decision stumps (one-level decision trees)
  • Density Estimation Trees
  • Euclidean Minimum Spanning Trees
  • Gaussian Mixture Models (GMMs)
  • Hidden Markov Models (HMMs)
  • Kernel Principal Component Analysis (KPCA)
  • K-Means Clustering
  • Least-Angle Regression (LARS/LASSO)
  • Linear Regression
  • Local Coordinate Coding
  • Locality-Sensitive Hashing (LSH)
  • Logistic regression
  • Max-Kernel Search
  • Naive Bayes Classifier
  • Nearest neighbor search with dual-tree algorithms
  • Neighbourhood Components Analysis (NCA)
  • Non-negative Matrix Factorization (NMF)
  • Principal Components Analysis (PCA)
  • Independent component analysis (ICA)
  • Rank-Approximate Nearest Neighbor (RANN)
  • Simple Least-Squares Linear Regression (and Ridge Regression)
  • Sparse Coding, Sparse dictionary learning

For more detail visit here - http://mlpack.org/docs.html

Video for Mlpack

posted Nov 26, 2018 by anonymous

  Promote This Article
Facebook Share Button Twitter Share Button LinkedIn Share Button


Related Articles

What is Recommender System?

A Recommender System predicts the likelihood that a user would prefer an item. Based on previous user interaction with the data source that the system takes the information from (besides the data from other users, or historical trends), the system is capable of recommending an item to a user. Think about the fact that Amazon recommends you books that they think you could like; Amazon might be making effective use of a Recommender System behind the curtains. This simple definition, allows us to think in a diverse set of applications where Recommender Systems might be useful. Applications such as documents, movies, music, romantic partners, or who to follow on Twitter, are pervasive and widely known in the world of Information Retrieval.

Recommender systems are among the most popular applications of data science today. They are used to predict the "rating" or "preference" that a user would give to an item. Almost every major tech company has applied them in some form or the other: Amazon uses it to suggest products to customers, YouTube uses it to decide which video to play next on autoplay, and Facebook uses it to recommend pages to like and people to follow. What's more, for some companies -think Netflix and Spotify-, the business model and its success revolves around the potency of their recommendations. In fact, Netflix even offered a million dollars in 2009 to anyone who could improve its system by 10%.

Broadly, recommender systems can be classified into 3 types:

  1. Simple recommenders: offer generalized recommendations to every user, based on movie popularity and/or genre. The basic idea behind this system is that movies that are more popular and critically acclaimed will have a higher probability of being liked by the average audience. IMDB Top 250 is an example of this system.
  2. Content-based recommenders: suggest similar items based on a particular item. This system uses item metadata, such as genre, director, description, actors, etc. for movies, to make these recommendations. The general idea behind these recommender systems is that if a person liked a particular item, he or she will also like an item that is similar to it.
  3. Collaborative filtering engines: these systems try to predict the rating or preference that a user would give an item-based on past ratings and preferences of other users. Collaborative filters do not require item metadata like its content-based counterparts.

Video for Recommender System 

https://www.youtube.com/watch?v=1JRrCEgiyHM

READ MORE

What is MLlib?

MLlib (Spark) is Apache Spark’s machine learning library. Its goal is to make practical machine learning scalable and easy. It consists of common learning algorithms and utilities, including classification, regression, clustering, collaborative filtering, dimensionality reduction, as well as lower-level optimization primitives and higher-level pipeline APIs.

Main Benefits

  • Ease of Use
  • Performance
  • Runs Everywhere

MLlib contains many algorithms and utilities.

ML algorithms include:

  • Classification: logistic regression, naive Bayes,...
  • Regression: generalized linear regression, survival regression,...
  • Decision trees, random forests, and gradient-boosted trees
  • Recommendation: alternating least squares (ALS)
  • Clustering: K-means, Gaussian mixtures (GMMs),...
  • Topic modeling: latent Dirichlet allocation (LDA)
  • Frequent itemsets, association rules, and sequential pattern mining

Spark revolves around the concept of a resilient distributed dataset (RDD), which is a fault-tolerant collection of elements that can be operated on in parallel. There are two ways to create RDDs: parallelizing an existing collection in your driver program, or referencing a dataset in an external storage system, such as a shared filesystem, HDFS, HBase, or any data source offering a Hadoop InputFormat.

Video for MLib 

https://www.youtube.com/watch?v=HaNoUnrQWd0

READ MORE

What is Lasagne?

Lasagne is a lightweight library to build and train neural networks in Theano.

Features:

  • Supports feed-forward networks such as Convolutional Neural Networks (CNNs), recurrent networks including Long Short-Term Memory (LSTM), and any combination thereof
  • Allows architectures of multiple inputs and multiple outputs, including auxiliary classifiers
  • Many optimization methods including Nesterov momentum, RMSprop and ADAM
  • Freely definable cost function and no need to derive gradients due to Theano's symbolic differentiation
  • Transparent support of CPUs and GPUs due to Theano's expression compiler

Main Principles

  • Simplicity: Be easy to use, easy to understand and easy to extend, to facilitate use in research
  • Transparency: Do not hide Theano behind abstractions, directly process and return Theano expressions or Python / numpy data types
  • Modularity: Allow all parts (layers, regularizers, optimizers, ...) to be used independently of Lasagne
  • Pragmatism: Make common use cases easy, do not overrate uncommon cases
  • Restraint: Do not obstruct users with features they decide not to use
  • Focus: "Do one thing and do it well"

How to Install

pip install -r https://raw.githubusercontent.com/Lasagne/Lasagne/master/requirements.txt
pip install https://github.com/Lasagne/Lasagne/archive/master.zip

Video for Lasagne

https://www.youtube.com/watch?v=t22HUAnefhw
 

READ MORE

What is Keras?

Keras is a high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano. It was developed with a focus on enabling fast experimentation. Being able to go from idea to result with the least possible delay is key to doing good research.

Features

  • Allows for easy and fast prototyping (through user friendliness, modularity, and extensibility).
  • Supports both convolutional networks and recurrent networks, as well as combinations of the two.
  • Runs seamlessly on CPU and GPU.

Main Benefits

  • User friendliness. Keras is an API designed for human beings, not machines. It puts user experience front and center. Keras follows best practices for reducing cognitive load: it offers consistent & simple APIs, it minimizes the number of user actions required for common use cases, and it provides clear and actionable feedback upon user error.
  • Modularity. A model is understood as a sequence or a graph of standalone, fully-configurable modules that can be plugged together with as few restrictions as possible. In particular, neural layers, cost functions, optimizers, initialization schemes, activation functions, regularization schemes are all standalone modules that you can combine to create new models.
  • Easy extensibility. New modules are simple to add (as new classes and functions), and existing modules provide ample examples. To be able to easily create new modules allows for total expressiveness, making Keras suitable for advanced research.
  • Work with Python. No separate models configuration files in a declarative format. Models are described in Python code, which is compact, easier to debug, and allows for ease of extensibility

Python Install

pip install keras

Video for Keras

https://www.youtube.com/watch?v=4-gQBRAoVAA 

READ MORE

What is MLlib?

MLlib stands for Machine Learning Library (MLlib)

MLlib is Spark’s scalable machine learning library consisting of common learning algorithms and utilities, including classification, regression, clustering, collaborative filtering, dimensionality reduction, as well as underlying optimization primitives, as outlined below:

  • Data types
  • Basic statistics
  • Classification and regression
  • Collaborative filtering
  • Clustering
  • Dimensionality reduction
  • Feature extraction and transformation
  • Optimization

Spark Core is the foundation of the overall project. It provides distributed task dispatching, scheduling, and basic I/O functionalities, exposed through an application programming interface  centered on the RDD abstraction  This interface mirrors a functional/higher-order model of programming: a "driver" program invokes parallel operations such as map, filter or reduce on an RDD by passing a function to Spark, which then schedules the function's execution in parallel on the cluster.

These operations, and additional ones such as joins, take RDDs as input and produce new RDDs. RDDs are immutable and their operations are lazy; fault-tolerance is achieved by keeping track of the "lineage" of each RDD so that it can be reconstructed in the case of data loss. RDDs can contain any type of Python, Java, or Scala objects.​

The Video for MLlib Spark

https://www.youtube.com/watch?v=qKYpMPPL-fo

READ MORE

What is Light GBM?

Light GBM is a fast, distributed, high-performance gradient boosting framework based on decision tree algorithm, used for ranking, classification and many other machine learning tasks.

Since it is based on decision tree algorithms, it splits the tree leaf wise with the best fit whereas other boosting algorithms split the tree depth wise or level wise rather than leaf-wise. So when growing on the same leaf in Light GBM, the leaf-wise algorithm can reduce more loss than the level-wise algorithm and hence results in much better accuracy which can rarely be achieved by any of the existing boosting algorithms. Also, it is surprisingly very fast, hence the word ‘Light’.

Diagram

Features

  • Faster training speed and higher efficiency
  • Lower memory usage
  • Better accuracy than any other boosting algorithm
  • Compatibility with Large Datasets
  • Parallel learning supported.

Video for Light GBM

https://www.youtube.com/watch?v=swoGdqGSn-c 

READ MORE

What is TensorFlow?

TensorFlow™ is an open source software library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) communicated between them. 

The flexible architecture allows you to deploy computation to one or more CPUs or GPUs in a desktop, server, or mobile device with a single API. 

TensorFlow was originally developed by researchers and engineers working on the Google Brain Team within Google's Machine Intelligence research organization for the purposes of conducting machine learning and deep neural networks research, but the system is general enough to be applicable in a wide variety of other domains as well.

An open-source software library for Machine Intelligence

TensorFlow is cross-platform. It runs on nearly everything: GPUs and CPUs—including mobile and embedded platforms—and even tensor processing units (TPUs), which are specialized hardware to do tensor math on. They aren't widely available yet, but we have recently launched an alpha program.

TensorFlow's high-level APIs, in conjunction with computation graphs, enable a rich and flexible development environment and powerful production capabilities in the same framework.

Advantages

  • It's portable, as the graph can be executed immediately or saved to use later, and it can run on multiple platforms: CPUs, GPUs, TPUs, mobile, embedded. Also, it can be deployed to production without having to depend on any of the code that built the graph, only the runtime necessary to execute it.
  • It's transformable and optimizable, as the graph can be transformed to produce a more optimal version for a given platform. Also, memory or compute optimizations can be performed and trade-offs made between them. This is useful, for example, in supporting faster mobile inference after training on larger machines.
  • Support for distributed execution

Video for TensorFlow

https://www.youtube.com/watch?v=oZikw5k_2FM

READ MORE
...