Delivery in day(s): 5
Unit 33 Data Analysis and Design Sample Assignment
1.1-Critically compare different data models and schemas.
Data Models tells us how to create logical design of database. There are three major types of data models:
- Relational Model –In this model data is organized in form of tables or relations which are related to each other.
Consider an example, where there is a company with various departments. Each department has certain manager and employees. Every entity is related to each other.
- Hierarchical Model - In this model, each entity has one parent or root but may have one or several children or child entities.
- Network Model – In this model, all entities are organized in form of graph. Some entities can be accessed from more than one path.
1.2- Compare database systems with file based systems and also discuss the benefits and limitations of different database technologies with reference to above scenario
File systems and database management systems (DBMS) are two ways of managing data. DBMS is a computer based system to record persistent data.
- In database management system, data is stored in large databases, while in file system, data is stored in form of files.
- In file system, tasks like storage, retrieval, and search are done manually while in DBMS such tasks are automated, usually done by help of some tool.
- Manual process of executing tasks invites problems like data integrity, data inconsistency and data security while such problems can be avoided in DBMS.
- In DBMS certain control mechanisms are used and reading line by line is not required, whereas this is not the case with file systems.
- We can avoid unauthorized access to database in DBMS whereas in file systems this is not possible.
- With DBMS we can make use of backup and recovery, whereas in file systems, data once lost cannot be recovered back.
- In DBMS, multiple users can access the data at one time, whereas with file systems this is not possible. File systems are basically designed for single user to access it at a time.
- DBMS avoids data duplication. If a set of data is required by multiple web application, it is made available, whereas in file system this is not possible, one program is not readable by another program.
- DBMS coordinates both logical and physical access to data, while file system only provides physical access to data.
1.3- Analyze different approaches to database design.
Organizations these day’s follow four major techniques of designing databases. These are:
- Top-Down Approach – This starts with a general idea of what is required for the system and moves towards more specific details. This process involves identifying various entities and their attributes.
- Bottom-Up Approach–This approach is just reverse of top down approach. It starts from more specific details and moves down to general details. In this approach, attributes are identified prior to entities.
- Centralized Approach–This approach is useful in small organizations, where there are less number of entities and interactions between entities and attributes is easy. This approach is even considered in some large organizations which follow a relatively simple database structure. Normally it has a single database administrator or a small website design team that designs the entire database for the organization. Problem or specification is defined by this person or design team, a conceptual design is created, this conceptual design is verified by user views and system processes and data constraints are defined, to ensure that developed design complies with organization’s requirements.
- Decentralized Approach–This approach is useful when the system has large number of entities and complex relationships exist among them. A team of carefully selected designers is employed for large or complex projects, to accomplish the job.
2.1 - Discuss the principles of normalization and steps followed to achieve normal forms
Database normalization is a technique to arrange, manage or organize the data (fields and tables) in a relational database. This process is followed to minimize redundancy and to ensure that data is logically stored.
If our database is not normalized, it becomes difficult to maintain and update database without data loss. We might face three kinds of anomalies if database is not normalized. These are:
Consider table Manager:
Suppose for table Manager, for a new entry we have manager_id, manager_name, manager_address, but the newly recruited person has not been assigned any department yet, still then we have to insert NULL there, leading to insertion anomaly.
In above table, if for ‘01’ manager_id there is only one department assigned, but if he temporarily drops it, then we might have to delete entire record corresponding to it, rather than only one column.
To update the address of the manager, who occurs more than once in a table, we need to update address in all entries, else data will become inconsistent. E.g.: In above table, Manager_name = Neil occurs more than once, so manager_address entries needs to be changed two times in the table.
Therefore, to remove such anomalies, we need to normalize our database tables.
There are 4 major types of Normal Forms:
First Normal Form (1NF):According to this rule, no two groups of data or records in table must contain same information. E.g.:
Manager Table: (Before normalization)
Manager Table: (After Normalization)
After normalizing manager table, each row has unique data, but data redundancy increases.
Second Normal Form (2NF): There must not be any partial dependencies, according to this rule. The concatenation must depend on primary key.
Manager table after 2NF:
Third Normal Form (3NF): According to this rule, all the transitive dependencies must be removed from the table and every non-prime attribute must be dependent on primary key.
New tables after 3NF:
Boycc Codd Normal Form (BCNF):A table is said to be in BCNF, if it is in 3NF and if it does not have multiple overlapping candidate keys.
2.2 – ER Diagram for English Premier League with all entity sets, attributes and relationships and cardinality constraints
2.3 –Relational Data Model
2.4 – Create tables using SQL (DDL) commands
2.5 – Screenshots from SQL Server Management Studio (IDE)
Creation of Table -Manager:
Creation of Table -Players:
Creation of table – Coach:
Creation of Table – refree:
Creation of Table – Team:
Creation of Table – Game:
Update table with Primary Key Constraint:
Insertion of Data into Table:
Updation of Data into table:
3.1 - Benefits of using manipulation and query tools in a relational database system
SQL DDL commands – DDL stands for data definition language. These commands help us to define the database structure.
Create Table statement, Drop table statement and Alter table statement are three major commands which help us to define structure of database and schema.
Create Table statement: This statement helps us to create table in database. E.g.:
Alter table statement: This statement helps us change a value in a database table. E.g.:
Drop table statement: This statement helps us to delete a database table along with its schema. E.g.:
SQL DML commands – DML stands for data manipulation language. These help us to manage data with schema objects.
Insert statement, update statement and delete statement are three major commands which help us to manipulate with the date in the database tables.
Insert Statement: It helps us to insert data or records in database table.
E.g.–To insert a record of data into table manager we write following query in SQL Server Management Studio IDE:
Delete Statement: It helps us to delete data from database table.
E.g. – To delete a record of data from table manager, we write following query in SQL Server Management Studio IDE:
where manager_id = 2;
Update Statement : This statement helps us to modify data from database table.
Eg – To update a record of data from table manager, we have to write following query in SQL Server Management Studio IDE:
set manager_fname ='Edwin'where manager_id ='1';
By above query, the record with manager_id = 1 will update, manager_fname to Edwin.
3.2 – Populate tables with some data and answer queries –
- List the managers of each team.
- Manager_fname, Manager_lname from manager
On manager.manager_id = team.manager_id;
- Output the full name of the top scorer in the league with the number of goals scored.
Selecttop 1 players. player_fname, players.player_lname, players.score from players
On players.game_id = team.game_id
Orderby team.score desc
- Output the average number of goals per game of all games played.
- AVG(score)as'Average Score'from game
- List all of the games that were played in a particular ‘city’.
Select game_id from Game where city = ‘Manchester’;
- List all players who have played a game that was refereed by a particular referee (choose any referee).
Select players.player_fname, players.player_lname from players, game, team, refree
Where game.refree_id = team.refree_id and
players.player_id = team.player_id and
team.refree_id = refree.refree_id and
- Create a database viewfor your system to produce a list of fixtures yet to be played along with the date of each game.
Createview fixture as
3.3 - Critically evaluate the validity of the data extracted using the above queries and comment on the design process followed to ensure that meaningful data is extracted through the use of query tools.
All the queries mentioned above have been executed using SQL Server Management Studio IDE, and are thus successful in retrieving relevant information. By making use of SQL DDL (Data definition language) and DML (Data Manipulation Language), meaningful data can be fetch in IDE.
4.1 - Critically review and test the relational database system designed for the given scenario and thus provide a documentation supporting implementation and testing of the relational database system developed.
There are six entities in this given scenario based upon which I have designed the relational database for the system. Manager, Players, Coaches, Refree, Team, Game. I have developed the relational database for this system by defining tables with primary key and foreign key constraints. Each entity and relationship has been converted into table, and attributes into corresponding fields.
The database thus developed has important data which must be tested from time to time for any discrepancy. In day to day basis, databases are accessed by thousands of users in an organization. These databases even have important functionality like stored procedures, stored functions, triggers, views, instances, queries etc. There must be a complete regression test suite, to be run over the database periodically. There are number of reasons why we must test our database on regular basis:
- Database is an important asset, by which hundreds or thousands of users are associated. If there is any loss of data, people associated with it will be affected and they may have to incur huge loss.
- Databases of large organizations incorporate mission critical data in them, which contains their business functionality, so it is essential to secure it from unidentified access and any unforeseen disasters.
- Testing tells us if there are any defects in the system, and whether we must take any steps to remove those errors.
- If we make changes or modifications in our database, then sometimes it may result in some errors like overlapping, redundancy or missing data. Regression testing helps us detect those errors.
4.2 - Create brief user documentation for the relational database system developed.
According to given scenario, English premier league has managers, players, and coaches, refree. Teams participate in English premier league and play games. Team can be either home team or away team. The players who play the game can be goalkeeper, midfielder, defender, and striker.
The entire database developed for this scenario is normalized up to 3nf (3rd normal form) and thus there is no transitive dependency or redundancy.
In this project, I have created 6 database tables: Manager, Players, Coach, Refree, Team, and Game.
Description of tables is as follows:
- Manager : Manager_id (pk), manager_fname, manager_lname, manager_dob, manager_pob
- Players : Player_id(pk), player_fname, player_lname, player_dob, player_pob, score, game_id(fk)
- Coach : Coach_id(pk), coach_fname, coach_lname
- Refree : Refree_id(pk), refree_fname, refree_lname
- Team : Team_id(pk), team_name, score, city, manager_id(fk), coach_id(fk), player_id(fk), score, refree_id(fk)
- Game: Game_id(pk), team_id(fk), refree_id(fk), date, score
Where pk represents primary key and fk represents foreign key.
4.3 - By giving a tabular V&V document, explain how verification and validation has been addressed.
We have V-Model in software development, popularly known as verification and validation model. It is executed in sequential manner and each phase must be completed before the next phase begins. These tasks are performed to identify how consumers or users perceives the final product or software. Is the final product satisfy the quality standards and is it good to be used.
It is the process in software development life cycle to check if we are in the right track of developing the product or software.
It is the process in software development life cycle to check whether the final product meets business needs.
This process involves checking the intermediary documents like ER Diagrams, database design, website design, test cases, test plan, design documents, traceability matrix.
This process involves checking that all the design documents and requirement specification reflect the requirement and are fit to be used. When placed in intended environment, it is to be checked that product fulfills its intended use.
Verification is done before software execution
Validation involves software execution
This process involves all techniques of static testing.
This process involves all techniques of dynamic testing.
Examples of verification are : Reviews, inspection, walkthrough
Examples of validation are : All types of testing like Regression testing, System testing, User Acceptance testing, smoke testing etc.
In this project, all design documents like ER Diagrams and requirement specifications have been verified and all system requirements have been addressed therein.
After system completion, it will undergo various types of testing to check if it is fit to be used.
4.4 - Explain control mechanisms and show how these techniques have been used in developing your system.
Control mechanisms are methods or process to define and manage variables in a desirable way. For example a test manager at deployment site might install a variety of control mechanisms to help them monitor various testing activities in software testing life cycle.
We need to apply control mechanisms over our system to ensure that a high standard of quality is met. To keep the entire process on track, it is important to have quality control. It helps us to figure out the problem, fix it and also helps us in judging the effectiveness of implemented solution.
The entire process of quality control must go on smoothly. A control feature must be devised, to manage major portion of work in standardized manner.
If in the entire process of controlling the system work flow, we are not able to achieve any task by straight method, we need to devise some alternative control mechanisms, which comply with the standardized methods.
In the process of controlling, one must also know how to react in case of defects. The areas of system where defects are most likely to occur must be monitored carefully, so that the defects can be caught and fixed there and then, rather than continuing the work flow further. Defects must be caught and treated in their nascent stage or must be reduced to near zero, so that we can say that six sigma is attained.
- ER Diagram in DBMS [Online]. [Accessed on 17 November 2014]. Available on world wide web: <http://www.studytonight.com/dbms/er-diagram.php>
- Structured Query Language/Data Manipulation Language [Online]. [Accessed on 18 November 2014]. Available on world wide web <http://en.wikibooks.org/wiki/Structured_Query_Language/Data_Manipulation_Language>
- Database Testing [Online]. [Accessed on 18 November 2014]. Available on world wide web <http://www.agiledata.org/essays/databaseTesting.html>
- Why to test a database [Online]. [Accessed on 18 November 2014]. Available on world wide web <http://www.softwaretestinggenius.com/why-should-we-test-a-relational-database>
- What is verification and validation [Online]. [Accessed on 18 November 2014]. Available on world wide web <http://www.softwaretestinghelp.com/what-is-verification-and-validation/>
- Six Sigma control phase [Online]. [Accessed on 18 November 2014]. Available on world wide web <http://www.managementstudyguide.com/six-sigma-control-phase.htm>
- Normalization of Database [Online]. [Accessed on 18 November 2014]. Available on world wide web <http://www.studytonight.com/dbms/database-normalization.php>
- Database design strategies [Online]. [Accessed on 18 November 2014]. Available on world wide web <http://databasemanagement.wikia.com/wiki/Database_Design_Strategies>
- Difference between DBMS and File System [Online]. [Accessed on 18 November 2014]. Available on world wide web <http://www.differencebetween.com/difference-between-dbms-and-vs-file-system/>
- Difference between file processing system and database management system [Online]. [Accessed on 18 November 2014]. Available on world wide web <http://studychacha.com/discuss/135910-difference-between-file-processing-system-database-management-system.html>