If you don’t know it, it’s only temporary state…
I’ve read many headings on Linkedin and other job related, IT-related sites that agreed on one topic. If you are working with data (anyhow) or even want to have analytical role, the first thing you have to learn and master is SQL.
Some people spell it ‘sequel’, some spell it by a letter… which doesn’t really matter. More important is that this language work with the base concept of data analytics, databases. Important!!! SQL is not programming language, it is Structured Query Language, and it means that you cannot write program with it but rather prepare “data background” for programs using data queries.
Some of you would say, “Hey, but there are procedures, triggers and other stuff which can be used to perform very difficult and complex tasks”. Of course, just like you can write website in Notepad, animate in Excel and many other weird things using tools and concepts that are not designed to this purpose. And finally I could admit that there is possibility to make analytical job without even touching SQL, but not for long.
SQL as a language have four main so-called subsets:
- DML — data manipulation language — you can manipulate specific records of data. Its commands are: INSERT, UPDATE, DELETE.
- DDL — data definition language — you can manipulate whole structures of data as tables or databases. Commands: CREATE, DROP, ALTER.
- DCL — data control language — you can control users and grant them specific level of privileges. Thats why some users could clear the table, and other not so responsible, should have only access to commands of fourth subset, not to destroy anything. Commands: GRANT, REVOKE, DENY.
- DQL — data query language. May be very small, because has only one command (SELECT), but usual analyst is using this part of SQL.
So do you need to know every single subset? From my rather short career in data (about 5y) I would say, that it depends in what kind of department you work and what are your collegues competencies. If department have data engineers or ETL specialists, probably DQL will be just enough. But on the other hand, there are teams that have all team of all-embracing individuals. And sometimes these guys just want to test something on database designed by them. Don’t do it at home…
Or rather exactly do it at home, because some RDBMS can be installed locally on your Windows or Mac. And it can be great opportunity to exercise SQL, but also build “data base” for your web app, machine learning models etc. I already make some attempts to SQL Server Express, MySQL and MariaDB. So called “NO-SQL” databases are still ahead of me in means of make it and play with it.
As I worked with Tableau I used Tableau to construct complex queries to optimize refreshing times. In R eRa, almost 90% of tasks started as:
data = dbGetQuery(conn, “SELECT ……”).
Different RDBMS have so called flavours and database specific functions, syntax. If you want to master them all be prepared for long time learning. Some people use only a half or even less commands available and doing great job.
And at the end. Do you know what is the smallest correct command which you can use in query?
“SELECT 5” which gives you only number five in results. And the longest… sky is your limit (and computer performance).
In the next post I’ll present you basic elements of SQL language and later some complex stuff to work with JSON’s and other weird things.