SQL & Databases: The Basics

by Nayeli Jaime

One of the most fundamental tools that a data analyst can have in their arsenal is SQL, also known as Structured Query Language. SQL allows users to interact with relational databases. More specifically, it is a programming language that allows you to retrieve, manipulate, and organize data that lives in a relational database. SQL allows you to communicate with a database using something called a query. A query is essentially the question you ask the database, and then SQL pulls the data based on the conditions that were established in the query.

Within a relational database, there are a series of tables that are related to each other using a set of “key” fields. These “keys” are known as Primary Keys or Foreign Keys. These “keys” act as a unique identifier for a specific table by uniquely identifying a row in the table. A Primary Key acts as the primary unique identifier to quickly parse data within a given table. While a Foreign Key is used to link two tables together, referring back to the Primary Key of another table. The table containing the Foreign Key is referred to as the child table, and the table containing the Primary Key is referred to as the referenced or parent table. Tables in a database can only have one Primary Key and every table should have its own Primary Key. Due to the unique nature of Primary and Foreign keys, the possibility of duplication of records within a relational database is close to none.

However, when pulling in data with a query, you can extract duplicated data. That’s why it’s important to think deeply about how to structure your query. Which brings us to the basic recipe of a query:

  1. SELECT: A statement used to determine what data will be retrieved from the database. Here is where you can add any field names or aggregation (of fields).
  2. FROM: A clause that defines what table to retrieve the data from.
  3. WHERE: A clause used to filter data according to the conditions set. This allows you to pull only records related to the criteria set. However, you cannot filter using WHERE with an aggregation, which creates the need for HAVING.
  4. GROUP BY: A clause that allows users to split data into pieces or buckets.
  5. HAVING: A clause that allows you to filter buckets of data; similar to a WHERE clause, but allows users to filter using an aggregation function.
  6. ORDER BY: A clause used to sort the data after a query has been executed.

The order of operations in SQL is very important, thus, this basic “recipe” for a successful query needs to be followed in order to run a query successfully. And while this is the basic structure of a SQL query, the data being retrieved is actually being pulled in the following order:

FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY

This structure for a query falls under a sub-category of SQL known as DQL, also known as, Data Query Language. DQL statements are used to perform queries on data that live within a schema object. A schema object(s) are the basic data storage structure(s) that exist within a database, such as tables, views, procedures, etc. The other 3 major categories of SQL are DDL, DCL, and DML. DDL, also known as Data Definition Language, consists of SQL commands that can be used to define or modify a database schema. It handles the description of the database schema and is used to create and modify the structure of a database object. DCL, also known as Data Control Language, deals primarily with the rights, permissions, security, and controls of a database system. Finally, DML, also known as Data Manipulation Language, is the SQL programming language used for adding/inserting, deleting, and modifying/updating data in a database. All 4 of these sub-categories of SQL have their own commands (similar to the above) that are used to perform their roles:

https://cybercoastal.com/types-of-sql-commands-sql-for-beginners/

It’s important to familiarize yourself with what words/statements actually change a database, rather than just allowing you to retrieve data. The “basic recipe” we have above does not change a database, only pulls data from one.

Congrats on learning some SQL basics!