How to Add Functionality to Ruby Classes with Decorators



Decorators let us add behavior to an object at runtime without affecting other objects of the same class. They are useful when you need to dynamically add and remove responsibilities from an object. The decorator pattern is a helpful alternative to creating subclasses: it adds functionality to a class while keeping the public API consistent. Let's look at an example to understand why decorators matter in Ruby.

Consider a Tattoo class with a price method that returns 300.

class Tattoo
  def price
    300
  end
end

Now we want to add colour as an extra feature, which increases the price by 150.

The simplest way is to create a TattooWithColour subclass whose price method returns 450.

class TattooWithColour < Tattoo
  def price
    450
  end
end

Next, we need to represent a big tattoo that adds 200 to the price of our tattoos. We can represent this using a BigTattoo subclass of Tattoo.

class BigTattoo < Tattoo
  def price
    500
  end
end

We could also have bigger tattoo sizes that add further to the price of BigTattoo. If these tattoo types could also come with colours, we would need BigTattooWithColour and BiggerTattooWithColour subclasses as well.

With this approach we end up with a total of six classes. Double that number if you want to represent these combinations with extra designs on the tattoo.
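This class explosion is exactly what decorators avoid. Below is a minimal sketch of one common approach using Ruby's SimpleDelegator; the prices mirror the example above, though the full article may implement its decorators differently.

require 'delegate'

# Each decorator wraps any tattoo-like object (such as the Tattoo class above)
# and adds its own price on top of the wrapped object's price.
class ColourDecorator < SimpleDelegator
  def price
    __getobj__.price + 150
  end
end

class BigDecorator < SimpleDelegator
  def price
    __getobj__.price + 200
  end
end

# Features are combined at runtime instead of defining a subclass per combination.
tattoo = BigDecorator.new(ColourDecorator.new(Tattoo.new))
tattoo.price # => 650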

Read full article at RailsCarma Blog. 


Components of Hadoop



The previous article gave you an overview of Hadoop and its two components, HDFS and the MapReduce framework. This article gives a brief explanation of the HDFS architecture and how it functions.

HDFS:

The Hadoop Distributed File System (HDFS) is self-healing, high-bandwidth clustered storage. HDFS has a master/slave architecture. An HDFS cluster consists of a single NameNode, a master server that manages the file system namespace and regulates access to files by clients. In addition there are a number of DataNodes, usually one per node in the cluster, which manage the storage attached to the nodes they run on.

HDFS exposes a file system namespace and allows user data to be stored in files. Internally, a file is split into one or more blocks, and these blocks are stored on a set of DataNodes. The NameNode executes file system namespace operations such as opening, closing and renaming files and directories.

It also determines the mapping of blocks to DataNodes. The DataNodes are responsible for serving read and write requests from the file system's clients. The DataNodes also perform block creation, deletion and replication upon instruction from the NameNode.

[Figure: HDFS architecture]

MapReduce:

The other core component of Hadoop is MapReduce: distributed, fault-tolerant resource management and scheduling coupled with a scalable data-processing programming abstraction.

It is a parallel data-processing framework used to get data out of the various files and DataNodes available in the system. The first step is to push the data onto the different servers, where the files get replicated; in short, this step stores the data.

In the second step, once the data is stored, the code is pushed onto the Hadoop cluster via the NameNode and distributed to the different DataNodes, which become the compute nodes; the end user then receives the final output.

MapReduce in Hadoop is not a single function: several stages are involved, such as record reader, map, combiner, partitioner, shuffle and sort, and reduce, which finally produce the output. The framework splits the input data set into independent chunks which are processed by the map tasks in a completely parallel manner.

The framework sorts the outputs of the maps, which are then fed as input to the reduce tasks. Typically both the input and the output of the job are stored in a file system. The framework also takes care of scheduling tasks, monitoring them and re-executing failed tasks.

MapReduce key-value pairs:

Mappers and reducers always use key-value pairs as input and output. A reducer reduces values per key only. A mapper or reducer may emit zero, one or more key-value pairs for every input. Mappers and reducers may emit any arbitrary keys or values, not just subsets or transformations of those in the input.

Example:

def map(key, value, context)
  # Split the line on whitespace, strip non-word characters and downcase.
  value.to_s.split.each do |word|
    word.gsub!(/\W/, '')
    word.downcase!
    unless word.empty?
      # Emit each word with a count of 1.
      context.write(Hadoop::Io::Text.new(word), Hadoop::Io::IntWritable.new(1))
    end
  end
end

def reduce(key, values, context)
  # Sum all the counts emitted for this key (word).
  sum = 0
  values.each { |value| sum += value.get }
  context.write(key, Hadoop::Io::IntWritable.new(sum))
end

The map method splits on whitespace, removes all non-word characters and downcases each word, then emits the word with a value of one. The reduce method iterates over the values, adds up the counts, and writes out the input key together with the sum.

Input file: Hello World Bye World

Output file:

Bye 1
Hello 1
World 2


This concludes the briefing on the components of Hadoop, their architecture and functioning, and the steps involved in the different processes happening in both parts of the system.

Like a coin with two faces, Hadoop also has its pros and cons, which are discussed below. Complete knowledge of any concept is only possible once you know its merits and demerits.

To acquire complete knowledge of Hadoop, keep following the upcoming posts on the blog.

 The Two Faces of Hadoop

Pros:

  • Hadoop is a platform that provides both distributed storage and computational capabilities.
  • Hadoop is extremely scalable. In fact, it was first conceived to fix a scalability issue in Nutch: start at 1 TB on 3 nodes and grow to petabytes across thousands of nodes.
  • One of the major components of Hadoop is HDFS (the storage component), which is optimized for high throughput.
  • HDFS uses large block sizes, which helps it work best when manipulating large files (gigabytes, petabytes and beyond).
  • Scalability and availability are the distinguishing features of HDFS, achieved through data replication and a fault-tolerant design.
  • HDFS can replicate files a specified number of times (the default is 3 replicas), is tolerant of software and hardware failure, and automatically re-replicates data blocks from nodes that have failed.
  • Hadoop uses the MapReduce framework, a batch-based, distributed computing framework that allows work to be parallelized over a large amount of data.
  • MapReduce lets developers focus on addressing business needs rather than getting involved in distributed-system complications.
  • To achieve parallel and faster execution of a job, MapReduce decomposes it into map and reduce tasks and schedules them for remote execution on the slave or data nodes of the Hadoop cluster.
  • Hadoop can also work with MapReduce jobs written in other languages; this is called streaming.
  • Well suited to analyzing Big Data.
  • Amazon's S3 can be the ultimate source of truth, with HDFS being ephemeral. You don't have to worry about reliability; Amazon S3 takes care of that for you. It also means you don't need a high replication factor in HDFS.
  • You can take advantage of archiving features like Glacier.
  • You pay for compute only when you need it. It is well known that most Hadoop installations struggle to hit even 40% utilization [3],[4]. If your utilization is low, spinning up clusters on demand may be a winner for you.
  • Another key point is that your workloads may have spikes (say at the end of the week or month) or may grow every month. You can launch larger clusters when you need them and stick with smaller ones otherwise.
  • You don't have to provision for peak workload all the time. Similarly, you don't need to plan your hardware 2-3 years upfront, as is common practice with in-house clusters. You can pay as you go and grow as you please, which considerably reduces the risk involved in Big Data projects.
  • Your administration costs can be significantly lower, reducing your TCO.
  • No up-front equipment costs. You can spin up as many nodes as you like, for as long as you need them, then shut them down. It is getting easier to run Hadoop this way.
  • Economics: cost per TB at a fraction of traditional options.
  • Flexibility: store any data, run any analysis.

Cons:

  • Hadoop uses HDFS and MapReduce, and both of their master processes are single points of failure, although active work is going on towards highly available versions.
  • Until the Hadoop 2.x release, HDFS and MapReduce use single-master models, which can result in single points of failure.
  • Security is also a major concern: Hadoop does offer a security model, but it is disabled by default because of its high complexity.
  • Hadoop does not offer storage or network-level encryption, which is a very big concern for government-sector application data.
  • HDFS is inefficient at handling small files, and it lacks transparent compression. HDFS is not designed to work well with random reads over small files because it is optimized for sustained throughput.
  • MapReduce is a batch-based architecture, which means it does not lend itself to use cases that need real-time data access.
  • MapReduce is a shared-nothing architecture, so tasks that require global synchronization or sharing of mutable data are not a good fit, which can pose challenges for some algorithms.
  • S3 is not very fast, and vanilla Apache Hadoop's S3 performance is not great. We, at Qubole, have done some work on Hadoop's performance with the S3 filesystem.
  • S3, of course, comes with its own storage cost.
  • If you want to keep the machines (or data) around for a long time, it is not as economical a solution as a physical cluster.

Here ends the briefing on Big Data and Hadoop, their various systems, and their pros and cons. We hope you now have an overview of the concepts of Big Data and Hadoop.

Source: RailsCarma

 

Scaling Applications with Multiple Database Connections


Business requirements keep changing day by day, and we keep optimizing or scaling our applications based on usage and on new feature additions or removals. Overall, agile development adds new challenges every now and then.

Applications that depend on databases can be scaled by separating the database layer and scaling it independently. The Ops team takes care of such infrastructure changes based on the application deployment architecture.

As programmers, we can configure our application to work with multiple databases. In this document we explain how to achieve this in a Rails application.

There are three different ways to connect an extra database to an application:

  1. Set-up database.yml
  2. Direct connection
  3. Writing in module

1. Set-up database.yml:

As we know, database.yml has three database connections by default, one each for development, test and production. We can connect another database in all three environments by adding the configuration shown below.

other_development:
  adapter: adapter_name (mysql2, postgresql, oracle, mssql, etc.)
  database: database_name_development
  username: user_name
  password: ******

other_test:
  adapter: adapter_name (mysql2, postgresql, oracle, mssql, etc.)
  database: database_name_test
  username: user_name
  password: ******

other_production:
  adapter: adapter_name (mysql2, postgresql, oracle, mssql, etc.)
  database: database_name_production
  username: user_name
  password: ******

After setting up database.yml, we can connect to the extra database in two ways, depending on the case:

  • Known database structure
  • Unknown database structure

Known database structure:

If we are aware of the database structure, we can create a model for each table and establish the connection in the model.

Example:

class OtherTable < ActiveRecord::Base
  self.abstract_class = true
  establish_connection "other_#{Rails.env}"
end

This connection can also be inherited by another model:

class Astronaut < OtherTable
  has_many :missions
  has_many :shuttles, through: :missions
end
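As a quick usage sketch (the name column here is hypothetical), queries on Astronaut now run against the secondary database rather than the primary one:

# Astronaut inherits OtherTable's connection, so this reads from the
# database configured under other_development / other_test / other_production.
Astronaut.limit(10).pluck(:name)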

Unknown database structure:

When we don't know the database structure, we can write a single abstract model and establish the connection there, then perform CRUD operations based on dynamic parameters.

Example:

class ExternalDatabaseConnection < ActiveRecord::Base
  self.abstract_class = true # this class doesn't have a table
  establish_connection(:database_name)
end
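For example (a rough sketch with hypothetical table and column names), we can run raw SQL through this connection or build a model class for a table whose name is only known at runtime:

# Raw SQL through the external connection.
rows = ExternalDatabaseConnection.connection.select_all("SELECT * FROM customers LIMIT 10")

# Or define an anonymous model for a dynamically chosen table.
external_model = Class.new(ExternalDatabaseConnection) do
  self.table_name = "customers" # could come from request parameters
end
external_model.where(active: true).count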

  2. Direct connection:

If the second database is not very important and is used in only one or two places, we can directly call ActiveRecord::Base.establish_connection with credentials and interact with that database.

Example:

ActiveRecord::Base.establish_connection(
  adapter:  "adapter_name",
  host:     "localhost",
  username: "user_name",
  password: "*********",
  database: "database_name"
)

  3. Writing in a module:

We can also establish the connection from a module and include it in models, as shown below.

Example:

module SecondDatabaseMixin
  extend ActiveSupport::Concern

  included { establish_connection "other_#{Rails.env}" }
end
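A model then opts into the second database simply by including the mixin (LegacyUser is a hypothetical model name):

class LegacyUser < ActiveRecord::Base
  include SecondDatabaseMixin # establishes the other_* connection for this model
end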

External database connection:

The database to be connected can exist on any server. If it is not on the same server, we can set the host to the IP address of the server where it exists.

Example:

adapter:  adapter_name (mysql2, postgresql, oracle, mssql, etc.)
host:     external_db_server_ip (e.g. 192.168.1.1)
username: user_name
password: *******
database: db_name

Note: there are also a few gems available for this, such as magic_multi_connections and db-charmer.

Pros and cons:

 Pros

  • Useful when the application has multiple clients and each wants a separate database for their customers.
  • Helps with per-client backups.
  • The other database may also be used by another application, possibly with a different adapter.
  • When users report that access is slow, it is easy to tell which database is causing the trouble.

Cons

  • Not worth the overhead if the application is simple with few users.
  • Extra code maintenance whenever the database structure changes.

Source: RailsCarma

A Simple Way To Increase The Performance Of Your Rails App


Developing an application and optimizing its performance should go hand in hand, but mostly, due to deadlines and the pressure to complete all the features in a project, the scope for optimization shrinks and optimization is left for the end (which is not a good practice). Other factors such as lack of experience and substandard coding also lead to a decrease in performance.

There are many ways to improve performance while coding your application. It is a vast topic and subjective to the application you are developing, so here we will discuss small changes that can be made which will improve performance.

The main areas to concentrate on to improve your app's performance while developing:

  1. Database optimization and Query optimization
  2. JavaScript, CSS optimization

Let's concentrate on database optimization. Here are a few quick tips (again, these are basic tweaks which we feel should be applied, and they are subjective to applications and developers):

  • Maintain proper indexing on the required tables in the DB (don't overdo indexing, as that may also lower performance; decide this on a case-by-case basis).
  • Maintaining proper relations and associations between models is also a major factor affecting app performance, and proper use of associations will improve it.
  • Fetch only when it is required and only what is required, and reuse the data fetched from the DB as much as possible.
  • Optimize queries by limiting the data fetched, and fetch data in batches when dealing with large amounts of data.
  • Database caching can be used to reduce response time and the number of queries. We can achieve it with memcached via the dalli gem.
  • Do not write queries inside loops; it is the biggest "don't" while coding. If it is already done, find a way to rewrite that part of the code and avoid calling a query in a loop (see the sketch after this list).
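As a minimal sketch of two of these tips, eager loading to avoid N+1 queries and batching large reads (Post and Comment are hypothetical models used only for illustration):

# N+1: one query for the posts, then one extra query per post for its comments.
Post.limit(50).each { |post| puts post.comments.size }

# Eager loading: the comments are fetched in a single additional query.
Post.includes(:comments).limit(50).each { |post| puts post.comments.size }

# Batching: process a large table without loading every row into memory at once.
Post.find_in_batches(batch_size: 1000) do |batch|
  batch.each { |post| post.touch } # touch stands in for whatever work is needed
end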

These are a few points that can be looked into for optimizing a Rails application. We would also recommend the Bullet gem in development, which is very useful for reducing N+1 queries in an application.

The project is available on GitHub: http://github.com/flyerhzm/bullet
If you have any other pointers to add to this, feel free to comment. We shall take them while writing our next set of articles.

Source: RailsCarma

An Introduction to Rails API


API stands for Application Programming Interface. An API lets one application interact with any number of other applications, written in the same or different languages, to access its data and functionality.
Creating an API application makes a web application more scalable. It also enables easy integration with cross-domain applications and languages, for example:
• iOS apps
• Android apps
• Node.js applications
• AngularJS applications

There are two ways to achieve this in Rails.

1. We can easily create a new API-only application using a gem called rails-api, which makes controllers inherit from ActionController::API instead of ActionController::Base and skips view generation. It also helps to configure the middleware stack.

2. If the application has already been created, we have to inherit from ActionController::API manually.
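A minimal sketch of the manual approach (assuming Rails 5+ or the rails-api gem, both of which provide ActionController::API; the Api::BaseController name is just a convention used here):

# app/controllers/api/base_controller.rb
module Api
  # Inheriting from ActionController::API skips the view layer and the
  # browser-oriented middleware that ActionController::Base carries.
  class BaseController < ActionController::API
  end
end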

Basic Flow

[Figure: basic API request/response flow]

Versioning APIs
Once the application is set up, we can create controllers under a controllers/v1 folder, which helps with easy maintenance of versions and with releasing new versions of the API.

In these controllers we can write CRUD or other functionality that can be called via curl or as API requests from a front-end application. GET, POST, DELETE and PATCH requests return responses in JSON or XML format, which is human readable. The JSON data can then be read and displayed by the front-end application.
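As an illustration, a versioned controller and its routes might look like the sketch below (Product and its attributes are hypothetical, and the api/v1 namespace is one possible layout):

# config/routes.rb
namespace :api do
  namespace :v1 do
    resources :products, only: [:index, :create]
  end
end

# app/controllers/api/v1/products_controller.rb
module Api
  module V1
    class ProductsController < Api::BaseController
      # GET /api/v1/products
      def index
        render json: Product.all
      end

      # POST /api/v1/products
      def create
        product = Product.new(product_params)
        if product.save
          render json: product, status: :created
        else
          render json: { errors: product.errors.full_messages }, status: :unprocessable_entity
        end
      end

      private

      def product_params
        params.require(:product).permit(:name, :price) # hypothetical attributes
      end
    end
  end
end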

Security
We can secure the API by passing a token, generated for each user, along with the user's email in an API header. This ensures that only authenticated users can access and modify data through the API.

Using these we can authenticate the user and secure the application. Depending on the data sent, and whether it matches what the application holds, we can send the proper responses back to the front-end application.
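One possible implementation of this token check (a sketch only; the header names, the User lookup and the auth_token column are assumptions, not something the article prescribes) is a before_action in the API base controller:

module Api
  class BaseController < ActionController::API
    before_action :authenticate_request!

    private

    # Expects the client to send the user's email and token in request headers.
    def authenticate_request!
      email = request.headers["X-User-Email"]
      token = request.headers["X-Auth-Token"]
      @current_user = User.find_by(email: email, auth_token: token)
      render json: { error: "Not authorized" }, status: :unauthorized unless @current_user
    end
  end
end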

These are a few basic aspects which can be implemented using Rails to create a robust API architecture.


The Tool For Processing Big Data – Hadoop


In our previous blog we learned that the platform that processes and organizes Big Data is Hadoop. Here we will learn more about Hadoop, a core platform for structuring Big Data that solves the problem of utilizing it for analytic purposes. It is an Open Source software framework for distributed storage and distributed processing of Big Data on clusters of commodity hardware.

Main characteristics of Hadoop:

  • Highly scalable (scaled out)
  • Commodity hardware based
  • Open Source, low acquisition and storage costs

Hadoop is basically divided into two parts: HDFS and the MapReduce framework. A Hadoop cluster is specially designed for storing and analyzing huge amounts of unstructured data. The workload is distributed across multiple cluster nodes that process the data in parallel.

History of Hadoop

Doug Cutting is the brains behind Hadoop, which has its origins in Apache Nutch. Nutch, started in 2002, is itself an Open Source web search engine. Google published the paper that introduced MapReduce to the world, and by early 2005 the Nutch developers had a working MapReduce implementation in Nutch.

In February 2006, Hadoop was spun out of Nutch as an independent project. In January 2008, Hadoop became a top-level project at Apache, and by this time major companies like Yahoo and Facebook had started using Hadoop.

HDFS is the first aspect of Hadoop and MapReduce is the second. HDFS has an architecture that helps it organize the data and make it available for processing.

To get into the details of HDFS, its architecture, functioning and several other concepts, keep an eye on the blogs that will be published in the coming days.

Source : RailsCarma