When SQL is used to select information from tables, the result often contains redundant data: rows that are exact duplicates of one another. To exclude such rows, use the distinct argument in the Select clause. In this article, we will look at examples of using this argument, as well as situations in which it is better not to use it.
Before moving on to specific examples, let's create the two tables we will need.
Imagine that our database stores information about wallpaper in two tables. The first is the Oboi (wallpaper) table, with the fields id (unique identifier), type (type of wallpaper: paper, vinyl, etc.), color, struct (texture) and price. The second is the Ostatki (leftovers) table, with the fields id_oboi (a reference to the unique identifier in the Oboi table) and count (the number of rolls in the warehouse).
Now fill the tables with data. Add nine records to the Oboi table:
Oboi (wallpaper)
id | type | color | struct | price |
1 | Paper | Multicolor | Embossed | 56.9 |
2 | Double-layer paper | Beige | Smooth | 114.8 |
3 | Vinyl | Orange | Embossed | 504 |
4 | Non-woven | Beige | Embossed | 1020.9 |
5 | Double-layer paper | Beige | Smooth | 150.6 |
6 | Paper | Multicolor | Smooth | 95.4 |
7 | Vinyl | Brown | Smooth | 372 |
8 | Non-woven | White | Embossed | 980.1 |
9 | Fabric | Pink | Smooth | 1166.5 |
The Ostatki (leftovers) table also contains nine records:
Ostatki (leftovers)
id_oboi | count |
1 | 8 |
2 | 12 |
3 | 24 |
4 | 9 |
5 | 16 |
6 | 7 |
7 | 24 |
8 | 32 |
9 | 11 |
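If you want to follow along, the two tables can be created and filled roughly as shown below. This is a minimal sketch that assumes Python's built-in sqlite3 module, with an in-memory SQLite database standing in for the DBMS. The column types (INTEGER, TEXT, REAL), the REFERENCES clause and the names OBOI_ROWS and OSTATKI_ROWS are our own illustrative choices and are not part of the original schema.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the article's DBMS.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Oboi (
        id     INTEGER PRIMARY KEY,  -- unique identifier
        type   TEXT,                 -- paper, vinyl, etc.
        color  TEXT,
        struct TEXT,                 -- embossed or smooth
        price  REAL
    );
    CREATE TABLE Ostatki (
        id_oboi INTEGER REFERENCES Oboi(id),  -- which wallpaper this stock row is for
        count   INTEGER                       -- rolls in the warehouse
    );
""")

# Data from the two tables above (illustrative names).
OBOI_ROWS = [
    (1, "Paper", "Multicolor", "Embossed", 56.9),
    (2, "Double-layer paper", "Beige", "Smooth", 114.8),
    (3, "Vinyl", "Orange", "Embossed", 504.0),
    (4, "Non-woven", "Beige", "Embossed", 1020.9),
    (5, "Double-layer paper", "Beige", "Smooth", 150.6),
    (6, "Paper", "Multicolor", "Smooth", 95.4),
    (7, "Vinyl", "Brown", "Smooth", 372.0),
    (8, "Non-woven", "White", "Embossed", 980.1),
    (9, "Fabric", "Pink", "Smooth", 1166.5),
]
OSTATKI_ROWS = [(1, 8), (2, 12), (3, 24), (4, 9), (5, 16), (6, 7), (7, 24), (8, 32), (9, 11)]

conn.executemany("INSERT INTO Oboi VALUES (?, ?, ?, ?, ?)", OBOI_ROWS)
conn.executemany("INSERT INTO Ostatki VALUES (?, ?)", OSTATKI_ROWS)
conn.commit()

# Sanity check: both tables hold nine records.
oboi_total = conn.execute("SELECT count(*) FROM Oboi").fetchone()[0]
ostatki_total = conn.execute("SELECT count(*) FROM Ostatki").fetchone()[0]
print(oboi_total, ostatki_total)  # 9 9
```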
Let's start by describing how to use distinct in SQL.
The distinct argument must be placed immediately after the Select keyword. It applies to all the columns listed in the Select clause at once, because it removes rows that are identical in every selected column. So it is enough to write "select distinct" once at the start of the query. The only exception is the use of distinct inside aggregate functions, which we will consider a little later.
Keep in mind that most DBMSs will reject a query like this:
SELECT distinct Ostatki.count, distinct Oboi.* FROM Oboi INNER JOIN Ostatki ON Oboi.id = Ostatki.id_oboi
Here the argument is specified more than once. A query that uses distinct only once, but places it before the second, third or any later selected column, fails in the same way. In both cases you will get a syntax error.
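For comparison, the sketch below runs the join with distinct written once, directly after Select, and then tries the two misplaced forms described above. It assumes Python's sqlite3 module and an in-memory SQLite database. The exact error text differs between DBMSs, and the helper names (JOIN, bad_queries) are illustrative.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the DBMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Oboi (id INTEGER PRIMARY KEY, type TEXT, color TEXT, struct TEXT, price REAL)")
conn.execute("CREATE TABLE Ostatki (id_oboi INTEGER, count INTEGER)")
conn.executemany("INSERT INTO Oboi VALUES (?, ?, ?, ?, ?)", [
    (1, "Paper", "Multicolor", "Embossed", 56.9),
    (2, "Double-layer paper", "Beige", "Smooth", 114.8),
    (3, "Vinyl", "Orange", "Embossed", 504.0),
    (4, "Non-woven", "Beige", "Embossed", 1020.9),
    (5, "Double-layer paper", "Beige", "Smooth", 150.6),
    (6, "Paper", "Multicolor", "Smooth", 95.4),
    (7, "Vinyl", "Brown", "Smooth", 372.0),
    (8, "Non-woven", "White", "Embossed", 980.1),
    (9, "Fabric", "Pink", "Smooth", 1166.5),
])
conn.executemany("INSERT INTO Ostatki VALUES (?, ?)",
                 [(1, 8), (2, 12), (3, 24), (4, 9), (5, 16), (6, 7), (7, 24), (8, 32), (9, 11)])

JOIN = "FROM Oboi INNER JOIN Ostatki ON Oboi.id = Ostatki.id_oboi"

# Correct: a single DISTINCT right after SELECT; it applies to the whole selected row.
rows = conn.execute(f"SELECT DISTINCT Ostatki.count, Oboi.* {JOIN}").fetchall()

# Incorrect placements: both are rejected by the parser.
bad_queries = [
    f"SELECT DISTINCT Ostatki.count, DISTINCT Oboi.* {JOIN}",  # distinct used twice
    f"SELECT Ostatki.count, DISTINCT Oboi.* {JOIN}",           # distinct before a later column
]
rejected = 0
for sql in bad_queries:
    try:
        conn.execute(sql)
    except sqlite3.OperationalError as exc:
        rejected += 1
        print(exc)  # a syntax error near DISTINCT

print(len(rows), rejected)  # 9 2
```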
Obviously, if tables are well designed and correctly filled, a single table cannot contain completely identical rows. In our case, the unique id alone guarantees this. Therefore, running a "Select distinct *" query against a single table is practically pointless.
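A quick way to see this, again assuming Python's sqlite3 module and an in-memory SQLite database: because every row has its own id, Select distinct * returns exactly the same nine rows as a plain Select *.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the DBMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Oboi (id INTEGER PRIMARY KEY, type TEXT, color TEXT, struct TEXT, price REAL)")
conn.executemany("INSERT INTO Oboi VALUES (?, ?, ?, ?, ?)", [
    (1, "Paper", "Multicolor", "Embossed", 56.9),
    (2, "Double-layer paper", "Beige", "Smooth", 114.8),
    (3, "Vinyl", "Orange", "Embossed", 504.0),
    (4, "Non-woven", "Beige", "Embossed", 1020.9),
    (5, "Double-layer paper", "Beige", "Smooth", 150.6),
    (6, "Paper", "Multicolor", "Smooth", 95.4),
    (7, "Vinyl", "Brown", "Smooth", 372.0),
    (8, "Non-woven", "White", "Embossed", 980.1),
    (9, "Fabric", "Pink", "Smooth", 1166.5),
])

all_rows = conn.execute("SELECT * FROM Oboi ORDER BY id").fetchall()
distinct_rows = conn.execute("SELECT DISTINCT * FROM Oboi ORDER BY id").fetchall()

# The unique id makes every row different, so DISTINCT removes nothing.
print(all_rows == distinct_rows, len(distinct_rows))  # True 9
```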
Now imagine that we need to find out what types of wallpaper we have. For convenience, we sort the list by type:
SELECT Oboi.type FROM Oboi order by type
And we get the result:
type |
Double-layer paper |
Double-layer paper |
Fabric |
Non-woven |
Non-woven |
Paper |
Paper |
Vinyl |
Vinyl |
As you can see, the result contains duplicate rows. If we add distinct to the Select clause:
SELECT distinct Oboi.type FROM Oboi order by type
then we get the result without repetitions:
type |
Double-layer paper |
Fabric |
Non-woven |
Paper |
Vinyl |
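Both queries can be reproduced with a small sketch that assumes Python's sqlite3 module and an in-memory SQLite database. SQLite sorts text with its default binary collation, so the order here is plain alphabetical. Another DBMS or collation may order the types differently.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the DBMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Oboi (id INTEGER PRIMARY KEY, type TEXT, color TEXT, struct TEXT, price REAL)")
conn.executemany("INSERT INTO Oboi VALUES (?, ?, ?, ?, ?)", [
    (1, "Paper", "Multicolor", "Embossed", 56.9),
    (2, "Double-layer paper", "Beige", "Smooth", 114.8),
    (3, "Vinyl", "Orange", "Embossed", 504.0),
    (4, "Non-woven", "Beige", "Embossed", 1020.9),
    (5, "Double-layer paper", "Beige", "Smooth", 150.6),
    (6, "Paper", "Multicolor", "Smooth", 95.4),
    (7, "Vinyl", "Brown", "Smooth", 372.0),
    (8, "Non-woven", "White", "Embossed", 980.1),
    (9, "Fabric", "Pink", "Smooth", 1166.5),
])

# Without DISTINCT: one output row per table row, duplicates included.
plain = [r[0] for r in conn.execute("SELECT Oboi.type FROM Oboi ORDER BY type")]
# With DISTINCT: each type appears once.
unique = [r[0] for r in conn.execute("SELECT DISTINCT Oboi.type FROM Oboi ORDER BY type")]

print(len(plain))  # 9
print(unique)      # ['Double-layer paper', 'Fabric', 'Non-woven', 'Paper', 'Vinyl']
```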
Thus, if the data has been entered into the table correctly, we can answer a customer's call or question right away. For example, we can tell them that liquid or acrylic wallpaper is not available in the store. A store's assortment usually includes well over a hundred kinds of wallpaper, so scanning a list of non-unique types would be quite laborious.
The distinct argument can be used with any aggregate function. For Min and Max, however, it has no effect. When calculating a sum or an average, it is hard to imagine a situation in which repeated values should not be counted.
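The claim about Min, Max and averages can be checked on the count column of the Ostatki table. This sketch assumes Python's sqlite3 module and an in-memory SQLite database (SQLite accepts distinct in any single-argument aggregate function). The variable names are illustrative.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the DBMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Ostatki (id_oboi INTEGER, count INTEGER)")
conn.executemany("INSERT INTO Ostatki VALUES (?, ?)",
                 [(1, 8), (2, 12), (3, 24), (4, 9), (5, 16), (6, 7), (7, 24), (8, 32), (9, 11)])

# Min and Max depend only on which values occur, so dropping repeats changes nothing.
mins = conn.execute("SELECT min(Ostatki.count), min(DISTINCT Ostatki.count) FROM Ostatki").fetchone()
maxs = conn.execute("SELECT max(Ostatki.count), max(DISTINCT Ostatki.count) FROM Ostatki").fetchone()

# Avg does change: the repeated 24 is counted only once in both the sum and the row count.
avgs = conn.execute("SELECT avg(Ostatki.count), avg(DISTINCT Ostatki.count) FROM Ostatki").fetchone()

print(mins, maxs)  # (7, 7) (32, 32)
print(avgs)        # about 15.89 (143 / 9) versus 14.875 (119 / 8)
```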
Let's say we want to know how full our warehouse is, so we run a query that calculates the total number of rolls in stock:
SELECT sum(Ostatki.count) FROM Ostatki
The query returns 143. If we change it to:
SELECT sum(distinct Ostatki.count) FROM Ostatki
then we get only 119. This is because wallpaper items 3 and 7 are in stock in the same quantity (24 rolls each), and distinct counts that value only once. Obviously, this answer is wrong.
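This sketch shows the two sums and the repeated value that causes the difference. It assumes Python's sqlite3 module and an in-memory SQLite database, and the variable names are illustrative.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the DBMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Ostatki (id_oboi INTEGER, count INTEGER)")
conn.executemany("INSERT INTO Ostatki VALUES (?, ?)",
                 [(1, 8), (2, 12), (3, 24), (4, 9), (5, 16), (6, 7), (7, 24), (8, 32), (9, 11)])

total, total_distinct = conn.execute(
    "SELECT sum(Ostatki.count), sum(DISTINCT Ostatki.count) FROM Ostatki"
).fetchone()

# Which stock quantities occur more than once? Those are the values DISTINCT drops.
repeats = conn.execute(
    "SELECT Ostatki.count, count(*) FROM Ostatki GROUP BY Ostatki.count HAVING count(*) > 1"
).fetchall()

print(total, total_distinct)  # 143 119
print(repeats)                # [(24, 2)]
```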
Most often, distinct is used with the Count function. For example, we can easily find out how many different types of wallpaper we have:
SELECT count(distinct Oboi.type) FROM Oboi
The result is 5: ordinary and double-layer paper, vinyl, fabric and non-woven. Surely everyone has seen an advertisement like "Only we have more than 20 kinds of wallpaper!". This means that the store has not just a couple of dozen rolls of everything, but wallpaper of many different modern types.
Interestingly, a single query can contain several Count functions, with or without the distinct argument. This is the only situation in which distinct can appear more than once in a Select.
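For example, one query can count distinct types, distinct colors and all rows at once. This sketch assumes Python's sqlite3 module and an in-memory SQLite database. It also confirms the result of 5 types mentioned above.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the DBMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Oboi (id INTEGER PRIMARY KEY, type TEXT, color TEXT, struct TEXT, price REAL)")
conn.executemany("INSERT INTO Oboi VALUES (?, ?, ?, ?, ?)", [
    (1, "Paper", "Multicolor", "Embossed", 56.9),
    (2, "Double-layer paper", "Beige", "Smooth", 114.8),
    (3, "Vinyl", "Orange", "Embossed", 504.0),
    (4, "Non-woven", "Beige", "Embossed", 1020.9),
    (5, "Double-layer paper", "Beige", "Smooth", 150.6),
    (6, "Paper", "Multicolor", "Smooth", 95.4),
    (7, "Vinyl", "Brown", "Smooth", 372.0),
    (8, "Non-woven", "White", "Embossed", 980.1),
    (9, "Fabric", "Pink", "Smooth", 1166.5),
])

# Two DISTINCTs in one SELECT are fine here, because each sits inside its own count().
n_types, n_colors, n_rows = conn.execute(
    "SELECT count(DISTINCT Oboi.type), count(DISTINCT Oboi.color), count(*) FROM Oboi"
).fetchone()

print(n_types, n_colors, n_rows)  # 5 6 9
```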
There are also situations in which the distinct argument should not be used at all.
Suppose your boss asks you to list the wallpaper you have, showing only two columns: type and color. Out of habit, you add the distinct argument:
SELECT distinct Oboi.type, Oboi.color FROM Oboi ORDER BY type, color
As a result, you lose some of the data:
type | color |
Double-layer paper | Beige |
Fabric | Pink |
Non-woven | Beige |
Non-woven | White |
Paper | Multicolor |
Vinyl | Brown |
Vinyl | Orange |
It may look as if we have only one item each of ordinary paper and double-layer paper wallpaper. In fact, even our small table contains two items of each (the result without distinct):
type | color |
Double-layer paper | Beige |
Double-layer paper | Beige |
Fabric | Pink |
Non-woven | Beige |
Non-woven | White |
Paper | Multicolor |
Paper | Multicolor |
Vinyl | Brown |
Vinyl | Orange |
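You can detect this loss in code by comparing the two results. This sketch assumes Python's sqlite3 module and an in-memory SQLite database. Here, collections.Counter serves only to list the (type, color) pairs that distinct collapsed, and the variable names are illustrative.

```python
import sqlite3
from collections import Counter

# Assumption: in-memory SQLite stands in for the DBMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Oboi (id INTEGER PRIMARY KEY, type TEXT, color TEXT, struct TEXT, price REAL)")
conn.executemany("INSERT INTO Oboi VALUES (?, ?, ?, ?, ?)", [
    (1, "Paper", "Multicolor", "Embossed", 56.9),
    (2, "Double-layer paper", "Beige", "Smooth", 114.8),
    (3, "Vinyl", "Orange", "Embossed", 504.0),
    (4, "Non-woven", "Beige", "Embossed", 1020.9),
    (5, "Double-layer paper", "Beige", "Smooth", 150.6),
    (6, "Paper", "Multicolor", "Smooth", 95.4),
    (7, "Vinyl", "Brown", "Smooth", 372.0),
    (8, "Non-woven", "White", "Embossed", 980.1),
    (9, "Fabric", "Pink", "Smooth", 1166.5),
])

with_distinct = conn.execute(
    "SELECT DISTINCT Oboi.type, Oboi.color FROM Oboi ORDER BY type, color"
).fetchall()
without_distinct = conn.execute(
    "SELECT Oboi.type, Oboi.color FROM Oboi ORDER BY type, color"
).fetchall()

# Pairs that occur more than once: these items "disappear" under DISTINCT.
collapsed = [pair for pair, n in Counter(without_distinct).items() if n > 1]

print(len(with_distinct), len(without_distinct))  # 7 9
print(collapsed)  # [('Double-layer paper', 'Beige'), ('Paper', 'Multicolor')]
```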
Therefore, as with any query, be careful with the distinct argument. Decide whether to apply it based on the task at hand.
The opposite of the distinct argument is the All argument. With All, duplicate rows are kept. But since the DBMS returns all rows by default anyway, All is more of an explicit specifier than an argument with a real effect.
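This sketch shows that Select all and a plain Select return the same rows, while Select distinct does not. It assumes Python's sqlite3 module and an in-memory SQLite database.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the DBMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Oboi (id INTEGER PRIMARY KEY, type TEXT, color TEXT, struct TEXT, price REAL)")
conn.executemany("INSERT INTO Oboi VALUES (?, ?, ?, ?, ?)", [
    (1, "Paper", "Multicolor", "Embossed", 56.9),
    (2, "Double-layer paper", "Beige", "Smooth", 114.8),
    (3, "Vinyl", "Orange", "Embossed", 504.0),
    (4, "Non-woven", "Beige", "Embossed", 1020.9),
    (5, "Double-layer paper", "Beige", "Smooth", 150.6),
    (6, "Paper", "Multicolor", "Smooth", 95.4),
    (7, "Vinyl", "Brown", "Smooth", 372.0),
    (8, "Non-woven", "White", "Embossed", 980.1),
    (9, "Fabric", "Pink", "Smooth", 1166.5),
])

default_rows = conn.execute("SELECT Oboi.type FROM Oboi ORDER BY id").fetchall()
all_rows = conn.execute("SELECT ALL Oboi.type FROM Oboi ORDER BY id").fetchall()  # ALL = the default
distinct_rows = conn.execute("SELECT DISTINCT Oboi.type FROM Oboi").fetchall()

print(default_rows == all_rows)               # True
print(len(all_rows), len(distinct_rows))      # 9 5
```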