
Small Introduction About Kafka?


What is Kafka?

Kafka is used for real-time streams of data: to collect big data, to do real-time analysis, or both. Kafka is used with in-memory microservices to provide durability, and it can be used to feed events to CEP (complex event processing) systems and IoT/IFTTT-style automation systems.

Kafka is often used in real-time streaming data architectures to provide real-time analytics. Because Kafka is a fast, scalable, durable, and fault-tolerant publish-subscribe messaging system, it is used in cases where JMS, RabbitMQ, and AMQP may not even be considered due to volume and responsiveness.

Kafka has higher throughput, reliability, and replication characteristics than these systems, which makes it applicable to things like tracking service calls (every call is tracked) or tracking IoT sensor data, where a traditional message-oriented middleware (MOM) might not be considered.

Kafka can work with Flume/Flafka, Spark Streaming, Storm, HBase, Flink, and Spark for real-time ingestion, analysis, and processing of streaming data. Kafka can feed data streams into Hadoop big data lakes. Kafka brokers support massive message streams for low-latency follow-up analysis in Hadoop or Spark.

Kafka has operational simplicity. Kafka is easy to set up and use, and it is easy to reason about how Kafka works. However, the main reason Kafka is very popular is its excellent performance. It has other characteristics as well, but so do other messaging systems.

Kafka performs well and is stable. It provides reliable durability, a flexible publish-subscribe/queue model that scales well with any number of consumer groups, robust replication, tunable consistency guarantees for producers, and preserved ordering at the shard level (the Kafka topic partition).
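To make partition-level ordering and independent consumer groups more concrete, here is a minimal in-memory sketch. It is not Kafka's client API: the class TopicSketch, its send and poll methods, and the CRC32-based key hashing are all assumptions made for illustration (real Kafka clients use their own partitioner and talk to brokers over the network).

```python
import zlib
from collections import defaultdict


class TopicSketch:
    """Toy in-memory stand-in for a partitioned topic (hypothetical, not Kafka's API)."""

    def __init__(self, num_partitions):
        # One append-only list per partition.
        self.partitions = [[] for _ in range(num_partitions)]
        # Next offset to read, tracked per (consumer group, partition).
        self.offsets = defaultdict(int)

    def send(self, key, value):
        # The same key always hashes to the same partition,
        # so records for one key keep their relative order.
        p = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[p].append((key, value))
        return p

    def poll(self, group, partition):
        # Each group has its own offsets, so groups read independently.
        start = self.offsets[(group, partition)]
        records = self.partitions[partition][start:]
        self.offsets[(group, partition)] = len(self.partitions[partition])
        return records


def read_all(topic, group):
    """Drain every partition once on behalf of one consumer group."""
    out = []
    for p in range(len(topic.partitions)):
        out.extend(topic.poll(group, p))
    return out


topic = TopicSketch(num_partitions=3)
for i in range(4):
    topic.send("sensor-a", i)
    topic.send("sensor-b", i * 10)

analytics = read_all(topic, "analytics")  # first group sees every record
audit = read_all(topic, "audit")          # second group sees them again
```

Ordering is only guaranteed within a partition, which is why the sketch routes by key: all "sensor-a" records land in one partition and are read back in the order they were sent, while a second consumer group replays the same records without affecting the first.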

In addition, Kafka works well with systems that have data streams to process, and it enables those systems to aggregate, transform, and load data into other stores. But none of those characteristics would matter if Kafka were slow. The most important reason Kafka is popular is its exceptional performance.


posted Dec 26, 2017 by Madhavi Latha



Related Articles

What is Apache SINGA?

SINGA is an Apache Incubating project for developing an open source machine learning library. It provides a flexible architecture for scalable distributed training, is extensible to run over a wide range of hardware, and has a focus on health-care applications.

SINGA was initiated by the DB System Group at National University of Singapore in 2014, in collaboration with the database group of Zhejiang University.

SINGA is a general distributed deep learning platform for training big deep learning models over large datasets. It is designed with an intuitive programming model based on the layer abstraction. A variety of popular deep learning models are supported, namely feed-forward models including convolutional neural networks (CNN), energy models like restricted Boltzmann machine (RBM), and recurrent neural networks (RNN). Many built-in layers are provided for users. SINGA architecture is sufficiently flexible to run synchronous, asynchronous and hybrid training frameworks. SINGA also supports different neural net partitioning schemes to parallelize the training of large models, namely partitioning on batch dimension, feature dimension or hybrid partitioning.
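The partitioning schemes mentioned above can be pictured with plain Python lists. This is only a sketch of how a mini-batch might be sliced; the helper names split_rows and split_columns are hypothetical and are not part of SINGA's API.

```python
def split_rows(batch, workers):
    """Batch-dimension partitioning: each worker gets a subset of the samples."""
    step = -(-len(batch) // workers)  # ceiling division
    return [batch[i:i + step] for i in range(0, len(batch), step)]


def split_columns(batch, workers):
    """Feature-dimension partitioning: each worker gets a slice of every sample."""
    width = len(batch[0])
    step = -(-width // workers)  # ceiling division
    return [[row[j:j + step] for row in batch] for j in range(0, width, step)]


# A mini-batch of 4 samples with 4 features each.
batch = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]

by_batch = split_rows(batch, 2)       # 2 workers, 2 full samples each
by_feature = split_columns(batch, 2)  # 2 workers, 2 features of every sample
# Hybrid partitioning: split on the batch dimension, then on features.
hybrid = [split_columns(part, 2) for part in by_batch]
```

Batch partitioning keeps each sample whole on one worker, while feature partitioning spreads every sample across workers; the hybrid case combines the two.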

Beyond scalability, a second goal is to make SINGA easy to use. It is non-trivial for programmers to develop and train models with deep and complex model structures. Distributed training further increases the burden on programmers (e.g., data and model partitioning and network communication). Hence it is essential to provide an easy-to-use programming model so that users can implement their deep learning models and algorithms without much awareness of the underlying distributed platform.

Video for Apache SINGA

https://www.youtube.com/watch?v=tkk5tOiWeEQ


What is aiohttp?
aiohttp is an asynchronous HTTP client/server framework for asyncio and Python.

Features:

  • Supports both the client and server side of the HTTP protocol.
  • Supports both client and server WebSockets out of the box and avoids callback hell.
  • Provides a web server with middlewares and pluggable routing.

Commands

pip install aiohttp

You may want to install the optional cchardet library as a faster replacement for chardet:

pip install cchardet

To speed up DNS resolution in the client API, you may install aiodns as well. This option is highly recommended:

pip install aiodns

Example

import asyncio

import aiohttp

async def fetch(session, url):
    # Issue a GET request and return the decoded response body.
    async with session.get(url) as response:
        return await response.text()

async def main():
    # Reuse a single ClientSession for all requests.
    async with aiohttp.ClientSession() as session:
        html = await fetch(session, 'http://python.org')
        print(html)

if __name__ == '__main__':
    # asyncio.run() creates and closes the event loop (Python 3.7+).
    asyncio.run(main())
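
The example above covers the client side only. For the server side with middleware and routing, listed under Features, a minimal sketch might look like the following. The handler, the middleware, the X-Sketch header, and the make_app helper are illustrative names, not anything prescribed by aiohttp.

```python
from aiohttp import web


@web.middleware
async def tag_middleware(request, handler):
    # Runs around every handler; here it only adds a response header.
    response = await handler(request)
    response.headers["X-Sketch"] = "1"
    return response


async def hello(request):
    # match_info holds the variable parts of the matched route.
    name = request.match_info.get("name", "world")
    return web.Response(text=f"Hello, {name}")


def make_app():
    # Build the application with one middleware and two GET routes.
    app = web.Application(middlewares=[tag_middleware])
    app.add_routes([
        web.get("/", hello),
        web.get("/{name}", hello),
    ])
    return app


# To serve the app locally, call: web.run_app(make_app())
```

Building the application in a function keeps it easy to reuse in tests; web.run_app blocks until the server is stopped, so it is left as a comment here.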

Video for aiohttp

https://www.youtube.com/watch?v=Z784Mwm4VBg

What is Seaborn?
Seaborn is a Python data visualization library based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics.

Features

  • A dataset-oriented API for examining relationships between multiple variables
  • Specialized support for using categorical variables to show observations or aggregate statistics
  • Options for visualizing univariate or bivariate distributions and for comparing them between subsets of data
  • Automatic estimation and plotting of linear regression models for different kinds of dependent variables
  • Convenient views onto the overall structure of complex datasets
  • High-level abstractions for structuring multi-plot grids that let you easily build complex visualizations
  • Concise control over matplotlib figure styling with several built-in themes
  • Tools for choosing color palettes that faithfully reveal patterns in your data

Seaborn aims to make visualization a central part of exploring and understanding data. Its dataset-oriented plotting functions operate on dataframes and arrays containing whole datasets and internally perform the necessary semantic mapping and statistical aggregation to produce informative plots.

Example Code

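import seaborn as sns

# Apply seaborn's default theme to matplotlib.
sns.set()
# Load the example "tips" dataset (downloaded on first use).
tips = sns.load_dataset("tips")
# Scatter plot of tip vs. total bill, one column of plots per meal time.
sns.relplot(x="total_bill", y="tip", col="time",
            hue="smoker", style="smoker", size="size",
            data=tips)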

Video for Seaborn
https://www.youtube.com/watch?v=eMkEL7gdVV0


What is Mlpack Library?

mlpack is a C++ machine learning library with emphasis on scalability, speed, and ease-of-use. Its aim is to make machine learning possible for novice users by means of a simple, consistent API, while simultaneously exploiting C++ language features to provide maximum performance and maximum flexibility for expert users. 

This is done by providing a set of command-line executables which can be used as black boxes, and a modular C++ API for expert users and researchers to easily make changes to the internals of the algorithms.

As a result of this approach, mlpack outperforms competing machine learning libraries by large margins; see the BigLearning workshop paper and the benchmarks for details.

mlpack is developed by contributors from around the world. It is released free of charge, under the 3-clause BSD License. (Versions older than 1.0.12 were released under the GNU Lesser General Public License: LGPL, version 3.)

mlpack was originally presented at the BigLearning workshop of NIPS 2011 and later published in the Journal of Machine Learning Research, with version 3 being published in the Journal of Open Source Software. Please cite mlpack in your work.

mlpack bindings for R are provided by the RcppMLPACK project.

Currently mlpack supports the following algorithms:

  • Collaborative Filtering
  • Decision stumps (one-level decision trees)
  • Density Estimation Trees
  • Euclidean Minimum Spanning Trees
  • Gaussian Mixture Models (GMMs)
  • Hidden Markov Models (HMMs)
  • Kernel Principal Component Analysis (KPCA)
  • K-Means Clustering
  • Least-Angle Regression (LARS/LASSO)
  • Linear Regression
  • Local Coordinate Coding
  • Locality-Sensitive Hashing (LSH)
  • Logistic regression
  • Max-Kernel Search
  • Naive Bayes Classifier
  • Nearest neighbor search with dual-tree algorithms
  • Neighbourhood Components Analysis (NCA)
  • Non-negative Matrix Factorization (NMF)
  • Principal Components Analysis (PCA)
  • Independent component analysis (ICA)
  • Rank-Approximate Nearest Neighbor (RANN)
  • Simple Least-Squares Linear Regression (and Ridge Regression)
  • Sparse Coding, Sparse dictionary learning

For more detail, see http://mlpack.org/docs.html

Video for Mlpack

https://www.youtube.com/watch?v=yQtp3gf5wtY

...