IST459: Notes: The Database Environment
Topic: The Database Environment
Table of Contents
Topic: The Database Environment
Learning Objectives
Part 1: Databases: It’s all about the data or *is* it?
Data
Information
Metadata
Data Management
Part 2: Databases vs. DBMS
Database
So, what features does the DBMS bring to the database party?
What are the benefits and drawbacks of the DBMS?
Wow the DBMS does so much. It slices. It dices. It does Windows!
Data Models: Degrees or “Layers” of Data Abstraction
Learning Unit 01 Notes – The Database Environment
Learning Objectives
In this learning unit we will learn the fundamental concepts which will lay the foundation for the rest of the course. Some of these objectives will be covered in this document, others in the class lecture, assigned readings, and labs.
· Concretize the concepts of data, information, data management and metadata
· Explain what a database is and why databases are important
· Describe a database management system
· Differentiate between the DBMS and a database
· Describe the different data models and abstraction layers
· Explain the similarities and differences among DBMS products
· Explain DBMS history and modern uses
· Describe how data is physically stored in primary and secondary storage
Part 1: Databases: It’s all about the data or *is* it?
Are databases really all about the data? Well, not really. As you will see, data are just one piece of the puzzle. And to truly differentiate between what a database is and what it’s not, you must first have a clear understanding of these four fundamentals: Data, Information, Metadata, and Data Management.
Data
What is data? Data is a generic label for the attributes, facts, figures, measurements or characteristics that describe real-world or supernatural objects or entities. These entities are typically people, places, things, events or ideas that we care to store data about for a specific application or purpose. Data can be very useful, or it can cause challenges which lead to bad decision-making and high data management costs. For data to be useful it needs to have four characteristics, summed up as ARTC, pronounced “artsy”:
· Accurate - correctly represent an actual entity attribute
· Relevant - germane or pertinent to the entity being described
· Timely - within the timeframe for when it is most useable
· Contextual - able to be associated with other data
Computer systems and software help us keep our data ARTC. For example, before the era of those great technological advances known as mobile phones and caller ID, people actually had to write names and phone numbers down on paper in an address book. (I know it’s hard to believe, but true!) Storing, organizing and retrieving information from these archaic address books was quite a challenge.
Notice I said retrieving information and not retrieving data? It is quite common for people to use the terms data and information interchangeably despite there being a fundamental difference between the two concepts. BTW - It is our civic duty as informationists to politely correct our mothers, fathers, neighbors, and postal carriers whenever the terms data and information are bastardized.
Data are raw, unprocessed facts. By itself data has no meaning and no structure. For example, a series of digits such as 4439686 is just data. When data takes on meaning, because of some form of context, we call it:
Information
Information is interpreted or processed data. It is the result of someone or something (like a computer) finding use for data. Whenever someone or something gains knowledge from data, that data is information. If I told you that the data from the previous paragraph is my office phone number 443-9686, for example, then the data now has meaning in context, so it is information.
Try to think of information as data that has been processed via context and/or manipulated in such a way that the result is more useable for making better decisions. Remember, data by itself is useless. It’s the context that gives it meaning, and hence makes it information. If I handed you an unlabeled CD-R disc, what would you know about it? Not much. You know there are bits and bytes on it, but that’s about it. The contents of that CD-R disc are data. If you pop that CD-R into a player and it starts playing Barry Manilow’s greatest hits, well, now you’ve got some sweet-sounding information!
Here’s another, more systematic way to think of information. I’m sure somewhere along the line in your academic careers you learned about the Information Processing Cycle (IPC). The IPC is the world’s most generic data-flow diagram (DFD):
Figure 1: The information processing cycle, IPC
The input into a process is always data and the output of that process is always information in the context of the process. Since the output of one process can be the input of another, information can be data; it truly is about the context! Take this DFD for checking out a shopping cart for an e-commerce website, such as Amazon.com:
Figure 2: A DFD for checking out an E-shopping cart.
The middle arrow in this diagram is information from the first process and data to the second process!
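To make the “information can be data” idea concrete, here is a minimal Python sketch of two chained processes. It is only loosely inspired by the shopping-cart example: the function names, the prices, and the 8% tax rate are all made up for illustration and are not taken from the figure.

```python
# Hypothetical two-step checkout; every name and number here is an assumption.

def total_cart(prices):
    """Process 1: raw item prices (data) go in, a subtotal (information) comes out."""
    return round(sum(prices), 2)

def apply_tax(subtotal, tax_rate):
    """Process 2: the subtotal arrives as data, the amount due leaves as information."""
    return round(subtotal * (1 + tax_rate), 2)

cart_prices = [19.99, 5.00, 12.50]        # data for process 1
subtotal = total_cart(cart_prices)        # information from process 1 ...
amount_due = apply_tax(subtotal, 0.08)    # ... which is plain old data to process 2

print(subtotal, amount_due)
```

The variable subtotal plays the role of the middle arrow: it is the output of one process and the input of the next.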
The human brain is a powerful and efficient information processor, constantly placing information in context for us, almost unconsciously. Once we learn the context behind the data, it is really difficult to think about it in any other way. For example, consider this data: $5,000 | | 911. It’s kind of hard to look at these and not process them as 5 thousand dollars, your instructor’s email address, and the phone number for emergencies. You interpret them incorrectly as information, even though they are actually data. Why? Because your mind has already learned the context!
Metadata
As I said earlier, data itself has no meaning or structure, but on the other hand, I’m sure you’ve seen structured data before. When I last represented my office phone number, I placed a hyphen between the 3rd and 4th digits, like this: 443-9686. What does that hyphen tell us about the data? What if I represented the data this way: $4,439,686? Does our knowledge of the $ symbol change our intrinsic interpretation of the data? Local US phone numbers are always 7 digits long. The $ symbol means currency. These are all data descriptors or “data about data” - they’re metadata!
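Here is a small Python sketch of that idea: the same raw digits read under two different pieces of metadata. The function names are hypothetical; only the digits and the two formats come from the paragraph above.

```python
raw = "4439686"  # the raw digits; no meaning on their own

def as_local_phone(digits):
    """Apply 'local US phone number' metadata: exactly 7 digits, shown as NNN-NNNN."""
    if len(digits) != 7 or not digits.isdigit():
        raise ValueError("a local US phone number needs exactly 7 digits")
    return f"{digits[:3]}-{digits[3:]}"

def as_currency(digits):
    """Apply 'whole-dollar currency' metadata: $ sign plus thousands separators."""
    return f"${int(digits):,}"

print(as_local_phone(raw))  # 443-9686
print(as_currency(raw))     # $4,439,686
```

Nothing about the data changed between the two calls; only the description of the data did.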
Here are some things that metadata describes:
· Data name - What name or label do we put on the data? What do we call the data? E.g. that’s a phone number.
· Data definition - How do we describe what the data is used for? What are some of its exceptions or issues? E.g. Phone numbers are used to call people.
· Data type - What are the allowable characters that can be used? E.g. Integers? Dates? Currency? Text?
· Length - How many characters are allowed? E.g. 7? 10? Between 7 and 10?
· Location - Where is the data allowed to live? What is its source? E.g. phone numbers are local to my mobile phone.
· Constraints - Which specific characters or strings of characters are allowed? Does the data have to exist in one location in order to be used in another? E.g. an employee’s hourly wage must be larger than or equal to the minimum wage.
· Ownership - Who or what applications are allowed access to the data? E.g. only accessible by me.
Metadata is an important concept since all databases use structured data to organize and categorize data, and that structure is metadata. Going back to the cell phone address book feature example from earlier, you can enter the contact name, phone number, email, select an icon for the number, etc. The contacts themselves represent data, but they are structured into the categories of name, phone, and email. The categories are the metadata, and the actual names, phone numbers, and emails themselves are the data.
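One way to picture this is to write the metadata down separately from the data. The Python sketch below is just an illustration under my own assumptions: the FieldSpec class, the field definitions, and the length limits are invented here and do not describe any real phone’s contact format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FieldSpec:
    """Metadata for one category of contact data (hypothetical structure)."""
    name: str         # data name
    definition: str   # data definition
    data_type: type   # data type
    max_length: int   # length

# The metadata: it describes contacts but holds no contact data itself.
CONTACT_FIELDS = [
    FieldSpec("name", "Who the contact is", str, 50),
    FieldSpec("phone", "Number used to call the contact", str, 10),
    FieldSpec("email", "Address used to email the contact", str, 100),
]

def validate(contact):
    """Check one contact (the data) against the metadata; return any problems."""
    problems = []
    for spec in CONTACT_FIELDS:
        value = contact.get(spec.name)
        if value is None:
            problems.append(f"{spec.name}: missing")
        elif not isinstance(value, spec.data_type):
            problems.append(f"{spec.name}: expected {spec.data_type.__name__}")
        elif len(value) > spec.max_length:
            problems.append(f"{spec.name}: longer than {spec.max_length} characters")
    return problems

# The data: one contact, structured by the categories above.
print(validate({"name": "Ann Fudge", "phone": "4439686", "email": "ann@example.com"}))
```

A contact that follows the metadata produces an empty list of problems; one with a 12-digit phone number or no email would not.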
Data Management
You’ve got data and information. You can structure it with metadata. But what good is data if you cannot read or manipulate it? Data management is the process of storing, maintaining, and retrieving data. Yes, it is a process, and the details of that process depend on the data and its structure (a.k.a. the metadata). How do you enter a new contact into your mobile phone, for example? Is it the same procedure for every mobile phone, or is it easier on some phones than on others? Does every mobile phone ask for the same data (i.e. is it structured with the same metadata)?
There are 4 data management activities, cutely known as the “CRUD” operations:
· Create - adding new data
· Read - retrieving information
· Update - modifying existing data
· Delete - removing data
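As a rough sketch, the four operations can be written as four tiny Python functions over an in-memory address book. The function names and the contact “Mike” are placeholders I made up; a real phone or DBMS does far more than this.

```python
contacts = {}  # in-memory address book: contact name -> phone number

def create(name, phone):
    """Create: add a new contact; refuse to overwrite an existing one."""
    if name in contacts:
        raise KeyError(f"{name} already exists")
    contacts[name] = phone

def read(name):
    """Read: retrieve a contact's phone number, or None if there is no such contact."""
    return contacts.get(name)

def update(name, phone):
    """Update: change the phone number of an existing contact."""
    if name not in contacts:
        raise KeyError(f"{name} does not exist")
    contacts[name] = phone

def delete(name):
    """Delete: remove a contact if present."""
    contacts.pop(name, None)

create("Mike", "443-9686")
print(read("Mike"))          # 443-9686
update("Mike", "443-0000")
print(read("Mike"))          # 443-0000
delete("Mike")
print(read("Mike"))          # None
```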
If we go back to the old address book example, people were responsible for their own data management under this scenario. If someone’s phone number changed, you simply crossed it out with a pen and wrote in a new one. If you ran out of room on one page, you flipped it over and used the next page. And forget keeping things in alphabetical order in a PNP (pen-and-paper) address book. Over time, the data in your address book got messy, making the “R” in CRUD quite difficult!
Figure 3: Paper makes for ineffective data management.
Today, computers assist with the data management activities greatly. We enter the data, and then technology will capture, organize, sort and filter the data into useful information. For example, most popular mobile phones of today have a Facebook phonebook feature. This feature reads your http://www.facebook.com friend list, and for any of your friends with phone numbers listed in their profile, their name, profile picture and phone numbers are added to your phonebook. Neato!
Figure 4: Technology trivializes data management.
Part 2: Databases vs. DBMS
At this point you might be wondering: Are you going to define database or what? I already did. I just took my own sweet time. :-) I’ll also discuss the differences between a database and a DBMS, as well as give you the current lay of the DBMS land.
Database
A database is an organized collection of data and metadata, managed over a period of time. The data are what we’re mainly interested in, so that we may retrieve information, typically via query (where we ask a question of the data or perform a read in the CRUD operations). However, the metadata is also important as it helps describe and structure the data, making it convenient to query in the first place. For example, you might search your mobile phone contact list for last names beginning with “F”. If your database is not structured by last name (using metadata) it would be very difficult to query the data in this manner. Metadata helps us determine what data is there to query in the first place.
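To see why the last-name structure matters, here is a minimal sketch using SQLite through Python’s built-in sqlite3 module. The table name, column names, and sample contacts are all assumptions for illustration; your phone almost certainly stores its contacts differently.

```python
import sqlite3

# Throwaway in-memory database; the schema below is the metadata.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contact (first_name TEXT, last_name TEXT, phone TEXT)")

# Made-up sample data.
conn.executemany(
    "INSERT INTO contact VALUES (?, ?, ?)",
    [
        ("Ann", "Fudge", "555-0101"),
        ("Bob", "Smith", "555-0102"),
        ("Cal", "Ford", "555-0103"),
    ],
)

# The query: last names beginning with "F". It is only possible to ask this
# because last_name exists as its own labeled column.
rows = conn.execute(
    "SELECT last_name, first_name FROM contact "
    "WHERE last_name LIKE ? ORDER BY last_name",
    ("F%",),
).fetchall()

print(rows)  # [('Ford', 'Cal'), ('Fudge', 'Ann')]
```

If the names were stored as one unlabeled blob of text per contact, there would be no last_name to filter or sort on.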
Databases are not one-time deals, and over time the data management activities (CRUD) are used to manipulate the data within the database. Data within databases are persistent; they stick around in the database for as long as they’re relevant, and hence as long as we want or need them to.
So, to put it all together every database has:
· Data: raw, unprocessed facts
· Metadata for structuring, constraining, and describing the data
· Data management activities for performing the CRUD operations, which in turn...
· Help keep the data ARTC and allow us to retrieve information from it.
Figure 5: Putting it all together - a picture’s worth 1,000 words. Well, at least 8 in this case :-)
When most of you think of the term database you’re more than likely envisioning a computerized database implemented using software designed for that specific purpose - some sort of application with fancy entry screens and pretty reports cobbled together in Microsoft Access or FileMaker, for instance. Software of this ilk is known as a database management system (DBMS). However, it is important to realize that databases existed long before the computer was ever conceptualized. Of the databases that exist today, some are computerized, some are not. Some use a DBMS; some don’t. What do you think file cabinets were used for back in the day? :-)
IMPORTANT: A database does not have to be computerized or digital. A database management system is computer software which facilitates the use of databases.
So, what features does the DBMS bring to the database party?
Again, I’d like to reiterate that anyone can make a computerized database using only Notepad, or better yet, a spreadsheet. Of course by the same logic you can also dig a 3 ft deep hole with a spoon. The DBMS is software specifically suited to the task of database management, including the storage and retrieval of data, rules for defining metadata, and of course the simplification of the data management (CRUD) tasks. Yes, the DBMS is to databases what Photoshop, the GIMP, or Flickr is to digital images, or better yet what plumbing is to civilization!
When you design a database using a DBMS, you get a whole lot more, such as these features of the modern DBMS:
· Robust metadata implementation. Metadata can be defined to mimic actual business rules, perform calculations, control how data is entered, and automatically change or delete data to maintain data integrity. For example, if a contact is removed from the database, that contact would also be removed from any of their contact groups. Metadata management is one of the most significant advantages the DBMS brings to the table.