Thesis Proposal


A Study of MongoDB and Oracle 11g R2 in an Ecommerce Environment

Aaron Ploetz
Research Methodology Summary
MCT 624 - Thesis Fundamentals
Thesis Advisor - Darl Kuhn
Regis University

Thesis Statement

NoSQL databases have attained success in large-scale, niche internet implementations, but have yet to see widespread acceptance in ecommerce. This study will compare MongoDB and Oracle 11g to identify performance patterns in ecommerce scenarios. The goal of this study is to ascertain a set of conditions that indicate whether an ecommerce database project would be better served by a NoSQL database or by a traditional Relational Database Management System (RDBMS).


Scope

While the original thought was to compare NoSQL databases to Relational Database Management Systems (RDBMS) in general, that scope has been determined to be too broad. NoSQL databases vary dramatically from one another architecturally. It would be erroneous to assume that results of experiments done with MongoDB would carry over to Cassandra or CouchDB. Testing multiple NoSQL databases would be challenging and time-consuming; therefore, a single NoSQL database had to be chosen.


MongoDB has been chosen as the NoSQL subject for this study for several reasons. First, it has been advocated for use in ecommerce by several authors, including Kyle Banker, author of “MongoDB in Action”. Steve Francia and Dwight Merriman also describe MongoDB as “well suited” for ecommerce (Francia & Merriman, 2011). Additionally, MongoDB's reputed ability to scale horizontally and its flexible schema make it a good candidate for a web product database.


Oracle 11g was chosen to represent the RDBMS side of this study. Oracle is widely considered to be the front-runner in the current RDBMS market, with a 48% market share (Mullins, 2011). Its status as an industry leader makes it the most attractive option.


The study was originally to focus on “traditional data processing environments.” That description was too vague to support a valuable study and had to be refined. Given the author's experience and qualifications, the use cases for this study were chosen from the ecommerce industry.


Significance

This study is significant for two reasons. First, NoSQL databases have proven to be a viable solution for some unique scaling problems, yet the idea of implementing a NoSQL solution seems premature to many information technology organizations. Experienced professionals will often decide to “live with” a process that is lengthy or slow because their existing database cannot scale horizontally.


Second, NoSQL is a known “buzzword”, which leads people to seek it out even when it may not be the best solution. Some early adopters do not really understand NoSQL technology and run into issues trying to implement a relational model with it (Banker, 2010). It is the author’s opinion that there is a high degree of confusion surrounding the appropriate use of NoSQL databases.


Research Methodologies

A testing framework (to be developed) will simulate an ecommerce website. As part of the framework, a series of common ecommerce functions will be written to run concurrently across multiple threads.
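As a rough illustration of what such a multi-threaded test driver might look like, the following Python sketch times a batch of operations issued from a thread pool. Everything named here is an assumption made for illustration: `InMemoryStore` is a hypothetical stand-in for a real database client (it is neither MongoDB nor Oracle), and `timed`, `run_concurrently`, and `summarize` are hypothetical helper names, not part of the planned framework.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class InMemoryStore:
    """Hypothetical stand-in for a database client; NOT MongoDB or Oracle."""

    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()

    def create(self, key, value):
        with self._lock:
            self._rows[key] = value

    def read(self, key):
        with self._lock:
            return self._rows.get(key)


def timed(operation, *args):
    """Run one operation and return its wall-clock latency in seconds."""
    start = time.perf_counter()
    operation(*args)
    return time.perf_counter() - start


def run_concurrently(operation, arg_list, n_threads=4):
    """Issue one call per entry in arg_list across n_threads workers."""
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        futures = [pool.submit(timed, operation, *args) for args in arg_list]
        return [f.result() for f in futures]


def summarize(latencies):
    """Reduce raw latencies to the statistics a test run might record."""
    ordered = sorted(latencies)
    return {
        "count": len(ordered),
        "mean_s": sum(ordered) / len(ordered),
        "max_s": ordered[-1],
    }


if __name__ == "__main__":
    store = InMemoryStore()
    customers = [(i, {"name": f"customer-{i}"}) for i in range(100)]
    print("create:", summarize(run_concurrently(store.create, customers)))
    print("read:  ", summarize(run_concurrently(store.read, [(i,) for i in range(100)])))
```

In an actual experiment the stand-in store would be replaced by calls to each database's own driver, with the same driver code and thread count used for both so that the latencies are comparable.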


Test data will be generated to simulate customers, customer addresses, and products. The data will then be loaded into each database (MongoDB and Oracle 11g), so that each holds the same product, customer, and address data. The database instances will run on Linux machines with identical hardware configurations. Next, a series of experiments will be run using the aforementioned ecommerce functions of the testing framework. Statistics will be tracked for the performance of CRUD-based transactions typical of an ecommerce website.
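One way to guarantee that both databases receive identical data is to generate it once from a fixed random seed and then derive each database's shape from that single source. The sketch below assumes a hypothetical customer layout (the field names, street list, and the functions `generate_customers` and `to_relational_rows` are illustrative, not the study's actual schema): nested records suit a document store, and the flattened tuples suit relational tables.

```python
import random


def generate_customers(n, seed=42):
    """Build n synthetic customers, each with one or two addresses."""
    rng = random.Random(seed)  # fixed seed so every run yields the same data
    streets = ["Main St", "Oak Ave", "Pine Rd"]  # illustrative values only
    customers = []
    for cust_id in range(1, n + 1):
        addresses = [
            {
                "street": f"{rng.randint(1, 999)} {rng.choice(streets)}",
                "zip": f"{rng.randint(10000, 99999)}",
            }
            for _ in range(rng.randint(1, 2))
        ]
        customers.append(
            {
                "customer_id": cust_id,
                "name": f"Customer {cust_id}",
                "addresses": addresses,  # nested: document-store shape
            }
        )
    return customers


def to_relational_rows(customers):
    """Flatten nested customers into (customer rows, address rows)."""
    customer_rows, address_rows = [], []
    for c in customers:
        customer_rows.append((c["customer_id"], c["name"]))
        for a in c["addresses"]:
            # customer_id acts as the foreign key back to the customer row
            address_rows.append((c["customer_id"], a["street"], a["zip"]))
    return customer_rows, address_rows


if __name__ == "__main__":
    docs = generate_customers(3)
    rows, addrs = to_relational_rows(docs)
    print(len(docs), "documents;", len(rows), "customer rows;", len(addrs), "address rows")
```

Because both shapes come from the same generated list, any difference observed between the two databases can be attributed to the databases rather than to the data.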


Some of the experiments will focus on testing the ACID (atomicity, consistency, isolation, and durability) properties of each database. It is expected that all of the ACID tests run against the Oracle 11g instance will succeed. The results of the MongoDB tests, however, will be recorded and scrutinized for adherence to ACID properties.
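To make concrete what an atomicity experiment might assert, here is a minimal sketch. It uses Python's built-in `sqlite3` module purely as a stand-in so that the example runs anywhere; it is not the Oracle 11g or MongoDB test itself. The `products` and `orders` tables, the functions `make_store`, `place_order`, and `snapshot`, and the simulated failure are all hypothetical.

```python
import sqlite3


def make_store(initial_stock=10):
    """Create an in-memory stand-in database with one product in stock."""
    conn = sqlite3.connect(":memory:")
    with conn:
        conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, stock INTEGER NOT NULL)")
        conn.execute(
            "CREATE TABLE orders (id INTEGER PRIMARY KEY, product_id INTEGER, quantity INTEGER)"
        )
        conn.execute("INSERT INTO products (id, stock) VALUES (1, ?)", (initial_stock,))
    return conn


def place_order(conn, product_id, quantity, fail_midway=False):
    """Decrement stock and record the order as a single transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE products SET stock = stock - ? WHERE id = ?",
                (quantity, product_id),
            )
            if fail_midway:
                # Simulated failure between the two writes.
                raise RuntimeError("simulated crash")
            conn.execute(
                "INSERT INTO orders (product_id, quantity) VALUES (?, ?)",
                (product_id, quantity),
            )
        return True
    except RuntimeError:
        return False


def snapshot(conn):
    """Return (remaining stock, total quantity ordered) for product 1."""
    stock = conn.execute("SELECT stock FROM products WHERE id = 1").fetchone()[0]
    ordered = conn.execute("SELECT COALESCE(SUM(quantity), 0) FROM orders").fetchone()[0]
    return stock, ordered


if __name__ == "__main__":
    conn = make_store(10)
    place_order(conn, 1, 3)
    place_order(conn, 1, 2, fail_midway=True)
    stock, ordered = snapshot(conn)
    # Atomicity holds if the failed order left no partial write behind.
    print("atomic:", stock + ordered == 10)
```

The invariant checked at the end (remaining stock plus ordered quantity equals the starting stock) is the kind of condition that could be evaluated against each database after injecting failures partway through a multi-step operation.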


Success Criteria

Once the data has been recorded, select variables (including but not limited to column size, number of rows indexed, and size of database) will be analyzed with a correlational approach. The presence of meaningful correlation coefficients will help in deriving conclusions for this study.
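The proposal does not name a specific coefficient; assuming the common Pearson coefficient, the calculation could be sketched as follows. The function name `pearson_r` is hypothetical, and the numbers in the usage example are placeholders chosen to be perfectly linear, not measurements from this study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples.

    r = sum((x - mean_x) * (y - mean_y)) / sqrt(sum((x - mean_x)^2) * sum((y - mean_y)^2))
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length, at least 2 points each")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Placeholder values only: rows indexed vs. a made-up latency figure.
    rows_indexed = [1000, 2000, 3000, 4000, 5000]
    latency_ms = [1.0, 2.0, 3.0, 4.0, 5.0]
    print(round(pearson_r(rows_indexed, latency_ms), 6))
```

A coefficient near +1 or -1 would suggest a strong linear relationship between a variable and the measured performance, while a value near 0 would suggest none; what counts as "meaningful" would still need to be defined for the study.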


A deliverable of this study will be a list of specific, concrete instances where a NoSQL database is a better choice for an ecommerce back-end. “Better” in this case means a favorable trade-off between performance and ACID compliance.


Project Plan


Final draft of thesis

Task # | Task Name | Deliverable | Completion Date | Dependency #

Phase: Planning / Preliminary Research
1 | Thesis proposal | Final draft of proposal, initial draft of project plan | 02/17/2012 | -
2 | Advisor approval | MCT-624 grade of "PASS" | 02/27/2012 | 1
3 | Proposal bibliography | Annotated bibliography of works read thus far | 02/26/2012 | -

Phase: Secondary Research
4 | Articulate the context | Refined research question | 03/05/2012 | 2
5 | Prepare for the search | Context list | 03/09/2012 | 4
6 | Conduct the search | List of cited works | 07/14/2012 | 5
7 | Obtain materials cited |  | 07/14/2012 | 6
8 | Evaluation of materials |  | 07/17/2012 | 7
9 | Critical analysis of source materials | Annotated bibliography | 09/09/2012 | 8

Phase: Primary Research
10 | Build Data Tools to generate test data | Generated test data | 02/26/2012 | -
11 | Build Data Loaders | Data loaded into Oracle and MongoDB | 05/09/2012 | 10
12 | Design relational model for Oracle 11g instance | DDL statements | 03/04/2012 | 10
13 | Build testing framework | Completed testing software | 07/14/2012 | 11
14 | Build Ubuntu Linux machine(s) with install of MongoDB | A running machine with DB instance | 08/04/2012 | -
15 | Build Ubuntu Linux machine(s) with install of Oracle 11g | A running machine with DB instance | 08/09/2012 | -
16 | Execute tests for customer maintenance | Performance and transaction data | 09/08/2012 | 10,11,12,13
17 | Execute tests for product maintenance | Performance and transaction data | 09/08/2012 | 10,11,12,13
18 | Execute tests for customer orders | Performance and transaction data | 09/08/2012 | 10,11,12,13
19 | Analyze transaction statistics | Conclusions drawn from data | 01/23/2013 | 16,17,18

Phase: Thesis Composition
20 | Introduction | Section detailing the identification of the problem | 09/29/2012 | 19
21 | Methodology | Description of research methodologies used | 10/05/2012 | 20
22 | Results and Evaluation | Sections describing the meanings inferred from data | 01/09/2013 | 21
23 | Discussion | Conclusions of study will be presented | 01/23/2013 | 22
24 | Annotated Bibliography | List of previous works which influence this study | 01/01/2013 | 23
25 | Revise Intro | Make revisions to introduction | 01/09/2013 | 24

Phase: Thesis Refinement and Conclusion
26 | Initial Draft of Thesis | First draft presented to advisor (and others) | 03/09/2013 | 25
27 | Revise Thesis Draft | Apply suggested revisions | 05/11/2013 | 26
28 | Draft of Thesis | Draft presented to advisor | 06/29/2013 | 27
29 | Final draft, submission, and presentation |  | 07/17/2013 | 28


References


Banker, K. (2010), “MongoDB and E-commerce” (retrieved from: http://kylebanker.com/blog/2010/04/30/mongodb-and-ecommerce/)


Francia, S., Merriman, D. (2011), “MongoDB: Use Cases”, 10gen Inc. (retrieved from: http://www.mongodb.org/display/DOCS/Use+Cases)


Leedy, P. D., Ormrod, J. E. (2010), “Practical Research: Planning and Design” (9th Edition), Pearson Education, Boston, MA (pp. 121-124)


Mullins, C. (2011), “The Database Report – July 2011”, The Database Administration Newsletter (retrieved from: http://www.tdan.com/view-articles/15299)

