Monday, May 30, 2016

How Data-Driven Companies Perform by Making Use of Big Data

How Data-Driven Companies Perform

The second question doubters may pose is this: "Where's the evidence that using big data intelligently will improve business performance?" The business press is rife with anecdotes and case studies that supposedly demonstrate the value of being data-driven. But the truth, we realized recently, is that no one was tackling that question rigorously. To address this embarrassing gap, we led a team at the MIT Center for Digital Business, working in partnership with McKinsey's business technology office and with our colleague Lorin Hitt at Wharton and the MIT doctoral student Heekyung Kim. We set out to test the hypothesis that data-driven companies would be better performers. We conducted structured interviews with executives at 330 public North American companies about their organizational and technology management practices, and gathered performance data from their annual reports and independent sources.

Not everybody was embracing data-driven decision making. Indeed, we found a wide range of attitudes and approaches in every industry. But across all the analyses we conducted, one relationship stood out: The more companies characterized themselves as data-driven, the better they performed on objective measures of financial and operational results. In particular, companies in the top third of their industry in the use of data-driven decision making were, on average, 5% more productive and 6% more profitable than their competitors. This performance difference remained robust after accounting for the contributions of labor, capital, purchased services, and traditional IT investment. It was statistically significant and economically important, and it was reflected in measurable increases in stock market valuations.

Big Data: The New Revolution in Management

What's New Here?

Business executives sometimes ask us, "Isn't 'big data' just another way of saying 'analytics'?" It's true that they're related: The big data movement, like analytics before it, seeks to glean intelligence from data and translate that into business advantage. However, there are three key differences:


Volume. As of 2012, about 2.5 exabytes of data are created each day, and that number is doubling every 40 months or so. More data cross the internet every second than were stored in the entire internet just 20 years ago. This gives companies an opportunity to work with many petabytes of data in a single data set, and not just from the internet. For instance, it is estimated that Walmart collects more than 2.5 petabytes of data every hour from its customer transactions. A petabyte is one quadrillion bytes, or the equivalent of about 20 million filing cabinets' worth of text. An exabyte is 1,000 times that amount, or one billion gigabytes.
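The unit arithmetic above is easy to sanity-check in a few lines (assuming decimal SI units; the Walmart figure is simply the estimate quoted above):

```python
# Back-of-the-envelope check of the storage units quoted above.
# Assumption: decimal (SI) units, i.e. 1 PB = 10**15 bytes.
PETABYTE = 10**15          # one quadrillion bytes
EXABYTE = 1000 * PETABYTE  # one billion gigabytes

GIGABYTE = 10**9
print(EXABYTE // GIGABYTE)            # gigabytes per exabyte: one billion
# Walmart's ~2.5 PB per hour implies roughly this much per day:
print(2.5 * PETABYTE * 24 / EXABYTE)  # 0.06 exabytes/day
```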


Velocity. For many applications, the speed of data creation is even more important than the volume. Real-time or nearly real-time information makes it possible for a company to be much more agile than its competitors. For instance, our colleague Alex "Sandy" Pentland and his group at the MIT Media Lab used location data from mobile phones to infer how many people were in Macy's parking lots on Black Friday, the start of the Christmas shopping season in the United States. This made it possible to estimate the retailer's sales on that critical day even before Macy's itself had recorded those sales. Rapid insights like that can provide an obvious competitive advantage to Wall Street analysts and Main Street managers.


Variety. Big data takes the form of messages, updates, and images posted to social networks; readings from sensors; GPS signals from cell phones; and more. Many of the most important sources of big data are relatively new. The huge amounts of information from social networks, for example, are only as old as the networks themselves; Facebook was launched in 2004, Twitter in 2006. The same holds for the smartphones and other mobile devices that now provide enormous streams of data tied to people, activities, and locations. Because these devices are ubiquitous, it's easy to forget that the iPhone was unveiled only five years ago, and the iPad in 2010. Thus the structured databases that stored most corporate information until recently are ill suited to storing and processing big data. At the same time, the steadily declining costs of all the elements of computing (storage, memory, processing, bandwidth, and so on) mean that previously expensive data-intensive approaches are quickly becoming economical.

As more business activity is digitized, new sources of information and ever-cheaper equipment combine to bring us into a new era: one in which large amounts of digital information exist on virtually any topic of interest to a business. Mobile phones, online shopping, social networks, electronic communication, GPS, and instrumented machinery all produce torrents of data as a by-product of their ordinary operations. Each of us is now a walking data generator. The data available are often unstructured (not organized in a database) and unwieldy, but there's a huge amount of signal in the noise, simply waiting to be released. Analytics brought rigorous techniques to decision making; big data is at once simpler and more powerful. As Google's director of research, Peter Norvig, puts it: "We don't have better algorithms. We just have more data."

Sunday, May 29, 2016

Big Data, Analytics and KPIs in the E-commerce and Retail Industry

How to convert lookers to bookers…

How to create unique and compelling digital experiences that impact likelihood of purchase or likelihood of return.

Which offers may result in higher "take rates"

The change in consumer behavior and expectations that e-commerce, mobile, and social media are causing is hugely significant – big data and predictive analytics will separate brand/retail winners from losers. This won't happen overnight, but the change is certain.

The retail industry makes up a sizable part of the world economy (6-7%) and covers a broad ecosystem – e-commerce, apparel, department stores, discount drugstores, discount retailers, electronics, home improvement, specialty grocery, specialty retailers, and consumer product goods suppliers.

Retail increasingly resembles a barbell – a brand-oriented cluster at the high end, a thin middle, and a price-sensitive cluster at the low end. The consumerization of technology is putting even more downward pricing pressure on an already competitive "middle" retail environment. The pressure is coming from e-commerce and new "point, scan, and compare" technologies that give shoppers decision-making tools: powerful pricing, promotion, and product information, often in real time. Applications on iPhone and Android, like RedLaser, can scan barcodes and provide immediate price, product, and cross-retailer comparisons. They can even direct you to the nearest retailer who can give you free shipping (total-cost-of-purchase optimization). This will lead to further margin erosion for retailers that compete on price (a sizable chunk of the market in the U.S., Europe, and Asia).

Data analytics is not new for retailers. Point-of-sale transactional data obtained from barcodes first appeared in the 1970s. A pack of Wrigley's chewing gum was the first item scanned using a Universal Product Code (UPC), in a Marsh Supermarket in Troy, Ohio in 1974. Since then, retailers have been applying analytics to get considerably smarter and speed up the entire business value chain.

Recent use cases of retail analytics include:

Corporate Express uses its market basket application to track the baskets of the majority of its online customers, and can identify which SKUs are the most profitable or poorest performers when purchased in combinations.
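The SKU-combination idea can be sketched in a few lines. This is a toy illustration with invented basket data and margins, not Corporate Express's actual method:

```python
from collections import Counter
from itertools import combinations

# Hypothetical data: each basket is the set of SKUs in one order,
# paired with that order's margin. Names and numbers are invented.
baskets = [
    ({"pen", "paper"}, 4.0),
    ({"pen", "paper", "stapler"}, 9.5),
    ({"paper", "toner"}, 12.0),
    ({"pen", "paper"}, 3.5),
]

pair_margin = Counter()
pair_count = Counter()
for skus, margin in baskets:
    # Credit each SKU pair in the basket with the basket's margin.
    for pair in combinations(sorted(skus), 2):
        pair_margin[pair] += margin
        pair_count[pair] += 1

# Average margin when a SKU pair appears together in a basket.
avg = {p: pair_margin[p] / pair_count[p] for p in pair_count}
best = max(avg, key=avg.get)
print(best, round(avg[best], 2))  # ('paper', 'toner') 12.0
```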

Liz Claiborne analyzes financial, sales, and inventory data across 30,000 points of sale worldwide — department stores, specialty stores, and catalogs/Internet — to uncover revenue optimization opportunities.

Limited Brands now uses MicroStrategy for customer analytics, financial analysis, inventory management, sales analysis, and category management reporting across the majority of its brands (including Victoria's Secret, Express, Bath and Body Works, and Henri Bendel).

Netflix is a heavy user of big data… The company issued a $1M analytics challenge in October 2006: come up with recommendation software that could do a better job of predicting the movies customers would like than Netflix's in-house software, Cinematch. To qualify for the prize, entries had to be at least 10 percent better than Cinematch. (For more detail see notes at end.)

BestBuy had a clickstream analysis problem — 3.5 billion records, 71 million unique cookies, 1.7 million targeted ads required daily. How to make sense of this data? They hired Razorfish to implement an analytical solution on Amazon Web Services and Elastic MapReduce. The solution was a 100-node cluster on demand; processing time was reduced from 2+ days to 8 hours.
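The MapReduce pattern behind a clickstream job like this can be illustrated in plain Python. This is a toy single-process sketch of the map/shuffle/reduce phases, not the actual Razorfish pipeline:

```python
from collections import defaultdict

# Toy clickstream rows: (cookie_id, page). The map step emits key/value
# pairs; the shuffle step groups by key; the reduce step aggregates.
clicks = [
    ("c1", "home"), ("c1", "tv"), ("c2", "home"),
    ("c2", "tv"), ("c2", "checkout"), ("c3", "home"),
]

def map_phase(rows):
    for cookie, _page in rows:
        yield cookie, 1  # emit one count per click

def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    return {key: sum(values) for key, values in groups.items()}

clicks_per_cookie = reduce_phase(shuffle(map_phase(clicks)))
print(clicks_per_cookie)  # {'c1': 2, 'c2': 3, 'c3': 1}
```

On a real cluster the map and reduce phases run in parallel across many nodes, which is where the 2+ days to 8 hours speedup comes from.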

In a business with razor-thin margins, small, mid-size, and large retailers are constantly searching for a performance-improvement edge. Retailers are racing to get better at slicing and dicing the reams of data they collect from multiple channels – online, stores, call centers, catalogs, and mobile devices. They are using business analytics to gain insight into retail KPIs (outcome measures) and metrics (operational activities that serve as leading or lagging indicators). The goal is to drive significant top-line and bottom-line improvements through better marketing decisions, more timely responses to information requests, cost-reduction initiatives, enhanced employee productivity, and better service to customers.

Types of Big Data Analytics in Retail

We analyzed typical retailers and found some interesting patterns in how raw data is being converted into signals and actions. Most retail analytics and big data projects tend to cluster around the following areas:

Customer Analytics and KPIs

Merchandising KPIs — Eliminate Stockouts and Overstocks

Store Operations Analytics and KPIs

Vendor and SKU Management Scorecards and KPIs

Marketing Analytics and KPIs

Returns, Fraud and Loss Prevention Analytics and KPIs

It's important to distinguish between retail KPIs and metrics. We see KPIs as clusters of metrics that in aggregate provide a report card. Metrics, on the other hand, are essentially snapshots of an activity. Think of KPIs as overall health versus metrics as taking a temperature or measuring shortness of breath.

KPIs resemble latent variables (as opposed to observable variables). KPIs are variables that are not directly observed but are instead inferred (through a mathematical model) from other variables that are observed (directly measured).
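One way to picture a KPI as a latent score inferred from observed metrics is a simple weighted roll-up. The metric names and weights below are illustrative assumptions, not an industry-standard model:

```python
# A KPI as a latent score inferred from observed metrics via a simple
# weighted model. Weights and metric names are invented for illustration.
weights = {"conversion_rate": 0.5, "items_per_basket": 0.3, "repeat_rate": 0.2}

def health_kpi(metrics):
    """Roll observed metrics (each normalized to 0..1) into one score."""
    return sum(weights[name] * value for name, value in metrics.items())

store_metrics = {"conversion_rate": 0.8, "items_per_basket": 0.5, "repeat_rate": 0.6}
print(round(health_kpi(store_metrics), 2))  # 0.67
```

Real latent-variable models (factor analysis, structural equation models) estimate the weights from data rather than fixing them by hand.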

Customer Analytics and KPIs

Objective: Understand Your Most Valuable Customers. Target Them to Maximize Profits and Loyalty

Every retailer is focused on understanding and leveraging all customer touch points, from granular in-store transactional data to online store and web search information to mobile data, in order to most effectively target customers, improve repeat purchase, and drive cross-sell. All to maximize profits and customer loyalty.

By providing best-in-class customer analytics, retail business analytics enables retailers to integrate and analyze the constantly expanding universe of data to obtain meaningful insights about customer behavior, buying patterns, and consumer preferences.

Area: Customer and Market Basket Analytics and KPIs – understanding and leveraging all customer touch points

Types of analytical insights:

Behavior profiling

Propensity modeling

Affinity tracking

Attach Rates

Basket Register Correlation

Brand Switching

Core Item Frequency

Core Items

Customer Loyalty

Demographic Baskets

In-Basket Price

Items per Basket

Revenue Contribution

Customer Penetration

What Can You Do With Customer Analytics?

The most important assets of every retailer are its customers. The challenge is to look for ways to enhance the value from each relationship.

Discover who your customers are. Identify their unique characteristics, and learn what products they buy and why.

Customer journey analysis, expectation and sentiment tracking.

Track the impact of promotions on customer baskets as they move through your stores, and gain a holistic view of all of their behaviors.

Tap into their transactional data to connect the dots between customers, stores, products, and promotions.

Move beyond basic segmentation, personas, and list pulls to create targeted microsegments – highly refined groups of customers based on custom-defined attributes.

Once you understand your customers at this level, you can apply effective tactics to individualized customer segments and surround them with a holistic shopping experience. Enable business users in marketing and merchandising to seamlessly integrate insight with promotion planning, weekly flyers, planograms, store assortments, and more. Boost offer performance. Drive bigger baskets. The final objective: maximize sales, customer profitability, and customer loyalty.
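A minimal sketch of microsegmentation in the RFM (recency/frequency/monetary) spirit. The customers, dates, and cut-offs below are invented for illustration; real segmentation models are far richer:

```python
from datetime import date

# Hypothetical customer records: last order date, order count, total spend.
today = date(2016, 5, 29)
customers = {
    "alice": {"last_order": date(2016, 5, 20), "orders": 12, "spend": 840.0},
    "bob":   {"last_order": date(2015, 11, 2), "orders": 2,  "spend": 55.0},
}

def segment(c):
    """Assign a microsegment from recency and frequency (arbitrary cut-offs)."""
    recency = (today - c["last_order"]).days
    if recency <= 30 and c["orders"] >= 10:
        return "loyal-active"
    if recency > 180:
        return "lapsed"
    return "occasional"

print({name: segment(c) for name, c in customers.items()})
# {'alice': 'loyal-active', 'bob': 'lapsed'}
```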

Merchandising KPIs

Objective: Significantly Reduce Costs, Eliminate the Cost of Stock-Outs and Overstocks, and Make Robust, Fast Decisions

Retailers face enormous pressure to balance inventory levels with in-stock positions while maintaining assortments that are relevant and fresh. Buyers and inventory managers need to be constantly on the go, whether in team meetings, traveling abroad on a buying trip, or working hand in hand with vendors.

Whether at their desk or on the go, this job requires mobility and constant connectivity to respond to consumer demand, seasonal trends, and fashion styles, all to ensure that customers get the right products.

Inventory Analytics and KPIs — Eliminate Stockouts and Overstocks

Department Contribution

Hot Item Report

In Stock %

Stock Turns

Linear Feet

Markdown %

Seasonal Buying

Sell Through

Weeks of Supply

The merchandise analysis scenarios are designed to improve inventory decisions. Users can quickly accelerate shipments by evaluating top-selling items, make markdown decisions based on seasonal sell-through, eliminate shipments for bottom-selling items, and communicate more effectively with vendors.
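Two of the merchandising measures named above can be computed directly. A sketch with hypothetical numbers; the formulas are the common textbook definitions, and individual retailers vary in how they define them:

```python
# Sell-through % = units sold / (units sold + units on hand).
# Weeks of supply = units on hand / average weekly unit sales.
def sell_through(units_sold, on_hand):
    return 100.0 * units_sold / (units_sold + on_hand)

def weeks_of_supply(on_hand, avg_weekly_sales):
    return on_hand / avg_weekly_sales

print(round(sell_through(units_sold=300, on_hand=200), 1))  # 60.0
print(weeks_of_supply(on_hand=200, avg_weekly_sales=50))    # 4.0
```

A high sell-through suggests accelerating shipments; a long weeks-of-supply on a seasonal item suggests a markdown before the season changes over.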

Store Operations Analytics and KPIs

Objective: Keep Store Managers on the Selling Floor, Not Behind a Desk. Give Store Operations the Right Information at the Right Time to Make the Right Decisions.

Store managers are more effective when they are out on the selling floor, not in a back office. The old paradigm of paper reporting, or even web/desktop reporting, is now becoming obsolete. Mobility enables field managers in stores to make real-time decisions regarding inventory, promotions, pricing, staffing, and store layout.

Store managers can now run daily analyses of store performance, aggregating store sales, SKU sales, and sales per employee. At some advanced retailers, the data is refreshed in real time so they can manage just-in-time replenishment more intelligently.

Mobile store operations reporting keeps store managers where they should be: in the store, on the selling floor. Store managers can see, at regular intervals throughout the day, how their sales are performing compared with last year, giving them the information and tools to improve performance.

Area: Store Operations Analytics and KPIs

Types of analytical insights:

Asset Turnover

Competitor Stores

Comp Store Sales

Front Store Sales

Inventory Turnover

Labor Cost Analysis

Register Usage Analysis

Sales and Margin

Sales/Square Foot

Store activity information delivers detail on a store's performance: Units per Transaction, Average Transactions (Units and $), Conversion Rates, and Sales per Payroll Hour are all provided in the mobile BI application.
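Those store-level measures are simple ratios over daily totals. A sketch with invented numbers:

```python
# The store-operations measures named above, from hypothetical daily totals.
def store_ops_kpis(units, transactions, visitors, payroll_hours, sales):
    return {
        "units_per_transaction": units / transactions,
        "avg_transaction_value": sales / transactions,
        "conversion_rate": transactions / visitors,
        "sales_per_payroll_hour": sales / payroll_hours,
    }

kpis = store_ops_kpis(units=450, transactions=150, visitors=600,
                      payroll_hours=80, sales=7500.0)
print(kpis["units_per_transaction"])  # 3.0
print(kpis["conversion_rate"])        # 0.25
```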

Relevant, filtered information delivered when and where it is needed helps store personnel identify anomalies in their operations and drive fast action in the areas of labor scheduling, customer service, inventory presentation, promotional signage, potential inventory shortages caused by local market conditions, and more.

Vendor and SKU Management Scorecards and KPIs

Objective: Analyze Vendor Performance, Drive Improvement, and Strengthen Negotiations. Improve Performance Across the Supply Chain

Managing vendors in today's environment requires you to know what is happening in the supply chain in real time, enabling you to make fast decisions to improve performance and to strengthen your negotiations on cost and quality.

Area: Category Management KPIs

Types of analytical insights:

Assortment Optimization

Category Contribution

Category Scorecards

Item Definitions

Market Comparison

Pricing Modules

Segment Analysis

Season Changeover

Share of Shelf

SKU Rationalization

Vendor Performance

A comprehensive vendor scorecard provides the right information to take the right action at the right time. A scorecard helps jointly improve performance by providing a common view of the following:

How are my products/categories performing?

Where do I have issues with product sales, returns, or excess inventory?

How am I performing operationally in delivering product?

What is the true profit contribution of each product?

What are the results of my promotional efforts?

Hadoop vs. Data Warehousing

Many Hadoop experts believe an integrated data warehouse (IDW) is simply a huge pile of data. But data volume has nothing to do with what makes a data warehouse. An IDW is a design pattern, an architecture for an analytics environment. First defined by Barry Devlin in 1988, the architecture quickly was called into question as implementers built huge databases with simple designs and small databases with complex designs.
“Subject oriented” means the IDW is a digital reflection of the business. Subject areas contain tabular data about customers, inventory, financials, sales, suppliers, accounts, etc. The IDW contains many subject areas, each of which is 250 to 5,000 relational tables. Having many subject areas enables cross-organizational analysis – often called the 360-degree view. The IDW can answer thousands of routine, ad hoc, and complex questions.

In contrast, a data mart deploys a small fraction of one or two subject areas (i.e., a few tables). With only a few tables, data marts answer far fewer questions and are poor at handling ad hoc requests from executives.

Integration in a data warehouse has many aspects. First is the standardization of data types. This means account balances contain only valid numbers, date fields have only valid dates, and so on. Integration also means rationalizing data from multiple operational applications. For example, say four corporate applications have Bill Franks, William Franks, W. J. Franks, and Frank Williams all at the same street address. Data-integration tools figure out which is the best data to put in the IDW. Data cleansing corrects messed-up data. For example, repairs are needed when “123 Oak St., Atlanta” is in the street address but the city field is blank. Data integration performs dozens of tasks to improve the quality and validity of the data. Coupled with subject areas, this is called “a single version of the truth.”
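One hedged way to illustrate the name-rationalization step in the "Bill Franks" example is fuzzy string matching. Real data-integration tools use far richer rules (address parsing, phonetic matching, survivorship logic); the similarity threshold below is an arbitrary assumption:

```python
from difflib import SequenceMatcher

# Candidate records at the same street address, per the example above.
records = ["Bill Franks", "William Franks", "W. J. Franks", "Frank Williams"]

def similar(a, b, threshold=0.55):
    """Crude similarity test; threshold chosen for illustration only."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

# Which records look like variants of the first one?
matches = [r for r in records[1:] if similar(records[0], r)]
print(matches)  # ['William Franks', 'W. J. Franks']
```

Note that "Frank Williams" falls below the threshold: despite sharing tokens, it is plausibly a different person, which is exactly the judgment a data-integration tool has to make.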

Does Hadoop Have What it Takes?

Hadoop was built around the schema-on-read approach, in which data is parsed, reformatted, and cleansed at runtime by a manually written program. However, Hadoop (and Hive) have limited to no ability to ensure valid dates and numeric account balances. In contrast, relational database management systems (RDBMS) ensure that input records conform to the database design, called the schema. According to Dr. Michael Stonebraker, "This is the best way to keep an application from adding 'garbage' to a data set."
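Schema-on-read can be shown in miniature: the raw line is stored as-is, and parsing plus validation happen only when a job reads it, so garbage survives ingestion. A warehouse's schema-on-write would instead reject the bad record at load time. The file layout here is invented for illustration:

```python
from datetime import datetime

# Raw lines as they would sit in a flat file: date|account|balance.
raw_lines = [
    "2014-03-01|acct-17|250.00",
    "2014-03-02|acct-17|not-a-number",  # garbage survives ingestion
]

def read_with_schema(line):
    """Apply the schema at read time; bad records surface only now."""
    day, account, balance = line.split("|")
    try:
        return datetime.strptime(day, "%Y-%m-%d").date(), account, float(balance)
    except ValueError:
        return None  # discovered at read time, not at load time

parsed = [read_with_schema(l) for l in raw_lines]
print([p is not None for p in parsed])  # [True, False]
```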

The current rage in the Hadoop community is SQL-on-Hadoop. Those who have committed to open-source Apache are playing catch-up to databases by adding SQL language features. SQL-on-Hadoop offerings are a subset of the ANSI 1992 SQL language, meaning they lack features found in the SQL 1999, 2003, 2006, 2008, and 2011 standards. Consequently, the business user's ability to perform self-service reporting and analysis is throttled. This, in turn, throws a significant labor cost back into IT to develop reports in Java.

Moreover, the lack of a database foundation also prevents SQL-on-Hadoop from achieving fast performance. Missing from Hadoop are robust indexing strategies, in-database operators, advanced memory management, concurrency, and dynamic workload management.

A persistent – sometimes angry – complaint from Hadoop experts is the poor performance in big table joins, which the SQL-on-Hadoop tools don't fix. Remember those subject areas above? Some subject areas have two to 10 tables in the 50-1,000 terabyte range. Even with a mature analytic database, it is a challenging problem to optimize queries that join 50TB with 500TB, sort it, and do it fast. Fortunately, RDBMS vendors have been optimizing the RDBMS and cost-based optimizers since the 1980s. A few Apache Hadoop committers are currently reinventing this wheel, proposing to release an immature optimizer later in 2014. Again, self-service business user query and reporting suffers.

Hadoop, therefore, does not have what it takes to be a data warehouse. It is, however, nipping at the heels of data marts.

How Many Warehouses Has Hadoop Replaced?

As far as we know, Hadoop has never replaced a data warehouse, although I've witnessed a few failed attempts. Instead, Hadoop has been able to peel off a few workloads from an IDW. Moving low-value data and workloads to Hadoop is not widespread, but neither is it rare.

One workload often offloaded is extract-transform-load (ETL). Technically, Hadoop is not an ETL solution. It's a middleware infrastructure for parallelism. Hadoop requires hand coding of ETL transformations, which is expensive, particularly when maintenance costs pile up in the years to come. Simple RDBMS tasks like referential integrity checks and match-key lookups don't exist in Hadoop or Hive. Hadoop does not provide typical ETL-subsystem features out of the box, such as:

Hundreds of built-in data type conversions, transformers, lookup matching, and aggregations

Robust metadata, data lineage, and data modeling capabilities

Data quality and profiling subsystems

Workflow management, i.e., a GUI for developing ETL scripts and handling errors

Fine-grained, role-based security

Since migrations often come with million-dollar price tags, there is no stampede of ETL migrations to Hadoop. Many organizations keep the low-value ETL workload in the IDW because:

The IDW works (it ain't broke, don't fix it)

Years of business logic must be recoded, debugged, and verified in Hadoop (risk)

There are higher-business-value Hadoop projects to implement (ROI)

Nevertheless, some ETL workload migrations are justified. When they happen, the IDW resources freed up are quickly consumed by business users.

Similarly, Hadoop provides a parallel platform for analytics, but it doesn't provide the analytics. Hadoop downloads do not include report development tools, dashboards, OLAP cubes, hundreds of statistical functions, time-series analysis, predictive analytics, optimization, and other analytics. These must be hand coded, or purchased elsewhere and integrated into projects.

Hadoop Was Never Free

Where does this leave the poor CIO who is still under pressure? According to Phil Russom of The Data Warehousing Institute: "Hadoop is not free, as many people have mistakenly said about it. A number of Hadoop users speaking at recent TDWI conferences have explained that Hadoop incurs substantial payroll costs due to its intensive hand coding, typically done by highly paid personnel."

This reflects the general consensus in the industry, which is that Hadoop is far from free. The $1,000/terabyte hardware costs are hype to begin with, and traditional vendors are closing in on Hadoop's hardware price advantage anyway. Moreover, some SQL-on-Hadoop offerings are separately priced as open-source vendors seek revenue. If you want Hadoop to be fast and functional, well, that part is moving away from free and toward becoming a proprietary, priced database.

Hadoop Jumps in the Lake

Mark Madsen, President of Third Nature, gives some direction on Hadoop's benefits: "Some of the workloads, particularly when large data volumes are involved, require new storage layers in the data architecture and new processing engines. These are the problems Hadoop and alternative processing engines are prepared to solve."

Hadoop defines a new market, called the data lake. Data lake workloads include the following:

Many data centers have 50 million to 150 million files. Organizing this into a coherent infrastructure, knowing where everything is, its age, its value, and its upstream/downstream uses, is a formidable task. The data lake concept is uniquely positioned to solve this.

Hadoop can run parallel queries over flat files. This allows it to do basic operational reporting on data in its original form.

Hadoop excels as an archival subsystem. Using low-cost disk storage, Hadoop can compress and hold onto data in its raw form for years. This avoids the problem of deteriorating magnetic tapes and current software versions that can't read the tape they produced eight years earlier. A close cousin to archiving is backup-to-disk. Again, magnetic tape is the competitor.

Hadoop is ideal for temporary data that will be used for a month or two and then discarded. There are many urgent projects that need data for a short time and then never again. Using Hadoop avoids the long process of getting data through committees and into the data warehouse.

Hadoop, most notably YARN from Hortonworks, is providing the first cluster operating system. This is amazing stuff. YARN improves Hadoop cluster management but does not change Hadoop's position vis-à-vis the data warehouse.

Wednesday, May 11, 2016


Here is how retailers can adopt big data into existing businesses.
  1. Personalization
    “Respect the shopper” is the tagline of eCommerce businesses from the last couple of years. Collect data such as customer demographics, order history, behavioral patterns, etc. from various channels and apply learning for a more privileged customer experience with personalized products, offers, and so on.
  2. Dynamic Pricing and Ad Targeting
    Dynamic pricing in real-time is where online retail is headed. Dynamic pricing is necessary in online retail, as shoppers are highly price-sensitive and markets are hyper-competitive. Dynamic and relevant ad presentation increases customer engagement and conversion rates. Learn from the purchases, conversion rates, demand, and traffic to cut or boost prices. Amazon dynamically re-prices products every 10 minutes to stay competitive in the market.
  3. Supply Chain Management
    Big data allows retailers to determine patterns that can be useful for forecasting any potential hiccups or disruptions to supply. For instance, any changes to purchase patterns can be captured in real-time and necessary steps can be taken immediately on inventory, warehouse, distribution, etc.
  4. Customer Service
    Excellent customer service is critical for increasing sales. Big data can help consolidate the various communication channel data — including phone calls, emails, and live chat — for the identification of critical customer issues, time taken to resolve issues, etc., thereby finding and implementing efficiencies.
  5. Predictive Analytics
    Big data allows businesses to gain in-depth insights into customer, operational, and business data, ultimately allowing them to whittle everything down to a much more predictable and manageable format.
  6. Complex Event Processing (CEP)
    Big data and CEP can help in-store and online retailers take real-time decisions based on the data in motion. Retailers can learn about events such as items being added to carts/wishlists in real-time and act upon them.
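The dynamic-pricing idea in point 2 above can be sketched as a toy repricing rule: undercut the lowest competitor slightly, but never fall below a floor margin. The costs, prices, and thresholds are invented, and a production repricer (like the Amazon behavior described) is vastly more sophisticated:

```python
# Toy repricing rule: undercut the cheapest competitor by a penny,
# but never price below cost plus a minimum margin. All numbers invented.
def reprice(our_cost, competitor_prices, min_margin=0.05, undercut=0.01):
    floor = our_cost * (1 + min_margin)          # lowest acceptable price
    target = min(competitor_prices) - undercut   # just under the cheapest rival
    return round(max(target, floor), 2)

print(reprice(our_cost=80.0, competitor_prices=[99.99, 104.50]))  # 99.98
print(reprice(our_cost=80.0, competitor_prices=[81.00]))          # 84.0 (floor wins)
```

Run on a schedule (Amazon's cadence is reportedly minutes) against a feed of competitor prices, a rule like this is the simplest form of dynamic pricing.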

Monday, May 9, 2016

The 4 Vs of Big Data and the Evolution of Big Data

The 4 Vs of big data

Big data is usually characterized along four dimensions: volume, velocity, variety, and veracity.

Volume: Machine-generated data is produced in much larger quantities than traditional data.

Velocity: This refers to the speed at which data is generated and processed.

Variety: This refers to the wide range of input data formats and sources, which in turn generates a large amount of data as output.

Veracity: This refers to uncertainty about the data's quality and trustworthiness ("data in doubt").

Evolution of Big Data

1990s: OLAP and data warehousing
2000s: Batch processing and distributed data stores (big data)
2010s: Real-time processing and single view (big data)

Thursday, May 5, 2016

Database Management Systems Used in Big Data Analytics

All major relational database management system vendors are adding features to address big data analytics from a relational point of view. The two most significant architectural developments have been the takeover of the high end of the market by massively parallel processing (MPP), and the growing adoption of columnar storage. When MPP and columnar storage techniques are combined, a number of the system requirements listed below can start to be addressed, including:

1. Supporting exabytes (thousands of petabytes) of data distributed across thousands of geographically dispersed processors, with subsecond response time for highly constrained standard SQL queries
2. Updating data in place at full load speeds
3. Being configurable without a single point of failure
4. Failing over and continuing processing when processing nodes fail
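The columnar-storage advantage is easy to picture in miniature: an aggregate over one column touches only that column's values, not every field of every row. A toy illustration (data invented; real column stores add compression and vectorized execution on top of the layout):

```python
# Row layout: one dict per record, as a row store would keep it.
rows = [
    {"sku": "A1", "store": 7, "units": 3},
    {"sku": "B2", "store": 7, "units": 5},
    {"sku": "A1", "store": 9, "units": 2},
]

# Column layout: one contiguous list per column.
columns = {key: [r[key] for r in rows] for key in rows[0]}

# SELECT SUM(units): scans a single column, not every field of every row.
print(sum(columns["units"]))  # 10
```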

Additionally, RDBMS vendors are adding some complex user-defined functions (UDFs) to their syntax, but the kind of general-purpose procedural-language computing required by big data analytics is not yet satisfied in relational environments.
In a similar vein, RDBMS vendors are allowing complex data structures to be stored. These kinds of embedded complex data structures have been known as "blobs." It is important to understand that relational databases have a hard time providing general support for interpreting blobs.
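To make the UDF idea concrete, here is a hedged sketch using SQLite, the small relational engine bundled with Python. The `margin_pct` function and the `sales` table are invented examples; production MPP databases expose similar but vendor-specific UDF mechanisms.

```python
# Registering a scalar user-defined function (UDF) with SQLite so that
# SQL queries can call custom procedural logic by name.

import sqlite3

def margin_pct(revenue, cost):
    """Custom business logic that is not a built-in SQL function."""
    return round(100.0 * (revenue - cost) / revenue, 2)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (revenue REAL, cost REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [(200.0, 150.0), (80.0, 60.0)])

# Expose the Python function to the SQL layer: name, arg count, callable.
conn.create_function("margin_pct", 2, margin_pct)

for (pct,) in conn.execute("SELECT margin_pct(revenue, cost) FROM sales"):
    print(pct)  # 25.0 for both rows
```

This is the scalar-UDF case; the general-purpose procedural computing the text refers to (iterative algorithms, arbitrary data structures) goes well beyond what such per-row functions can express.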

(Blobs are related to the "data bags" discussed in The Evolving Role of the Enterprise Data Warehouse in the Era of Big Data Analytics.)
MPP implementations have never satisfactorily addressed the "big join," where a dimension table must be joined to a trillion-row fact table without resorting to clustered storage. The big-join crisis occurs when an ad hoc constraint is placed on the dimension table. Since the matching dimension keys are scattered randomly across the separate segments of the trillion-row fact table, it is very hard to avoid a lengthy step of downloading the very large dimension table to every one of the fact table's storage partitions. To be fair, the MapReduce/Hadoop architecture has not been able to address the big-join problem either.
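The mechanics of that broadcast step can be illustrated with a toy example. All tables and keys below are invented; the point is that because fact rows are partitioned independently of the dimension, every partition needs its own copy of the (filtered) dimension table before the join can run locally.

```python
# Toy illustration of the "big join": fact rows are partitioned by a key
# unrelated to the dimension, so matching dimension keys appear in every
# partition and the filtered dimension must be shipped to each one.

dimension = {101: "electronics", 102: "garden", 103: "toys"}  # dim_key -> attr

# Fact table segments as they sit on separate nodes: (fact_id, dim_key).
fact_partitions = [
    [(1, 101), (2, 103)],
    [(3, 102), (4, 101)],
    [(5, 103), (6, 102)],
]

# Ad hoc constraint placed on the dimension, e.g. WHERE attr <> 'garden'.
filtered_dim = {k: v for k, v in dimension.items() if v != "garden"}

# Broadcast step: every partition receives a full copy of filtered_dim.
results = []
for partition in fact_partitions:
    local_dim = dict(filtered_dim)  # the copy shipped to this node
    for fact_id, dim_key in partition:
        if dim_key in local_dim:
            results.append((fact_id, local_dim[dim_key]))

print(sorted(results))
```

With a three-row dimension the copy is trivial; with a very large dimension and thousands of partitions, this shipping step dominates the query, which is the crisis described above.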
 The standard RDBMS architecture for implementing an enterprise data warehouse based on dimensional modeling principles is simple and well understood.

Wednesday, May 4, 2016


The basic requirements of a big data analytics system are:
1. A clear mission statement
2. Identified use-case requirements
3. Easy scaling to terabytes of data
4. Distribution across thousands of processors
5. Heterogeneous hardware and data sources
6. Fast response times
7. Embedded user-defined functions (UDFs)
8. UDFs that can be implemented for a wide variety of industry requirements
9. UDF libraries that can be assembled across the system
10. UDFs that execute within a few seconds
11. Support for a wide variety of data types
12. Loading data so it is ready for analysis
13. Integrating data from multiple sources at very high rates
14. Loading data before discovering its structure
15. Updating data in place at full load speeds
16. Joining billion-row dimension tables to fact tables
17. Scheduling and executing complex workflows
18. Supporting extreme, mixed workloads

Monday, May 2, 2016


Big data is structured, semi-structured, unstructured, and raw data in many different formats, and it looks totally different from the numbers and text we have stored in our data warehouses for almost three decades. Much big data cannot be analyzed with anything that looks like SQL or the other DML languages available. Most important, big data is a paradigm shift in how we think about data assets: where we collect them, how we analyze them, and how we monetize the insights from the analysis.
The big data revolution is about finding new value within and outside conventional data sources. A new approach is needed because the software and hardware environments of the past have not been able to capture, manage, or process the new forms of data within reasonable cost or processing times.
Before proceeding further, every organization needs to understand the importance of its data and the role that big data analytics can play.

Data is an asset on the balance sheet of the organization

Enterprises are slowly coming to understand that data itself is an asset that should appear on the balance sheet in the same way that traditional assets from the manufacturing age, such as equipment and land, have always appeared. There are several ways to determine the value of a data asset, including: the cost to produce the data; the cost to replace the data if it is lost; the revenue or profit opportunity provided by the data; the revenue or profit lost if the data falls into competitors' hands; and the legal exposure from fines and lawsuits if the data is exposed to the wrong parties.
But more important than the data itself, enterprises understand that data can be monetized; many e-commerce applications already do this. When an e-commerce site detects an increase in favorable click-throughs from an experimental ad treatment, that insight can be taken to the bottom line immediately. This direct cause and effect is easily understood by management, and an analytic research group that consistently demonstrates such insights is viewed as a strategic resource by the highest levels of management. This growing business awareness of the value of data-driven insights is rapidly spreading from the e-commerce world to virtually every business segment.
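The ad-experiment example can be reduced to a back-of-the-envelope calculation. All counts below are invented for illustration; the point is simply how a click-through lift is measured before it is taken to the bottom line.

```python
# Back-of-the-envelope sketch of the ad experiment described above:
# measure the click-through lift of an experimental treatment over the
# control. Every number here is invented for illustration.

def click_through_rate(clicks, impressions):
    return clicks / impressions

control = click_through_rate(clicks=450, impressions=50_000)    # 0.90%
treatment = click_through_rate(clicks=630, impressions=50_000)  # 1.26%

lift = (treatment - control) / control
print(f"relative lift: {lift:.0%}")  # relative lift: 40%
```

In practice the analytic group would also test whether such a lift is statistically significant before rolling the treatment out, but the monetization logic is this simple ratio.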