Posts

SQL Analytic Functions

Window functions play an important role in analytic queries. In this post I explore some of the window functions using Postgres 13.4. Window function defaults differ between Postgres and Redshift, although the results are the same: the OVER () default for Redshift is (ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING), whereas Postgres's default is (RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW); a sketch at the end of this excerpt illustrates the difference. At the time of writing, Redshift hasn't implemented RANGE, so I haven't mixed the two databases and have stuck to Postgres. This post was created using a Jupyter Notebook and Jupyter magics; for example, a Jupyter SQL magic result set was converted to a Pandas data frame and plotted as a graph.

- Setup Jupyter Notebook and Postgres
- Window functions
- RANGE and ROWS functions
- Other important functions
- How to create a table with random data
- Generate
- Create pglia database using external SQLs

Setup Jupyter Notebook and Postgres

If you want to experiment with t…
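To make the default-frame difference concrete, here is a minimal sketch of my own (not from the original post), run against a hypothetical sales(sale_date, amount) table in Postgres:

```sql
-- Illustrative sketch (not from the original post); `sales` is a
-- hypothetical table with columns sale_date and amount.
SELECT
    sale_date,
    amount,
    -- Postgres default frame: RANGE BETWEEN UNBOUNDED PRECEDING AND
    -- CURRENT ROW, so peer rows (ties on sale_date) share one running total.
    SUM(amount) OVER (ORDER BY sale_date) AS running_range,
    -- Explicit ROWS frame: every physical row gets its own running total,
    -- even when sale_date values tie.
    SUM(amount) OVER (
        ORDER BY sale_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS running_rows
FROM sales;
```

The two columns only diverge when sale_date has duplicates, which is exactly why mixing the Postgres and Redshift defaults silently changes results.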

VSCode for Haskell Development

Haskell is a purely functional programming language. The VSCode extension Haskell for Visual Studio Code has been configured to develop Haskell code remotely in a Docker container. This blog explains how to use the VSCode editor to develop Haskell programs. The Features section summarises, in a table, the features that HLS supports; many of these are standard LSP features.

- Install using docker
- VSCode development
- Complete Docker image for Haskell
- Features

Install using docker

Create the following Dockerfile[1]:

```dockerfile
FROM ubuntu:20.04

ARG USERNAME=ojitha
ARG USER_UID=1000
ARG USER_GID=$USER_UID

# Create the user
RUN groupadd --gid $USER_GID $USERNAME \
    && useradd --uid $USER_UID --gid $USER_GID -m $USERNAME \
    && apt-get update \
    && apt-get install -y sudo \
    && echo $USERNAME ALL=\(root\) NOPASSWD:ALL > /etc/sudoers.d/$USERNAME \
    && chmod 0440 /etc/sudoers.d/$USERNAME \
    && apt-get install buil…
```
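Once the image builds and the container is running, a quick way to confirm the toolchain works end to end is to compile and run a trivial module. This is a minimal sketch of my own, not code from the original post:

```haskell
-- Minimal sanity check (illustrative, not from the original post):
-- compile with `ghc Main.hs` inside the container, then run ./Main.
module Main where

main :: IO ()
main = putStrLn "Haskell toolchain is working"
```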

PySpark relational operations

In this blog I explore the PySpark DataFrame API to create SQL JOIN and windowing functionality, using the MovieLens[1] dataset to explain. In addition to SQL JOIN and CASE, advanced topics such as SQL Window and Pivot will be discussed here. In my last blog post[2], I explained how to use the Hive metastore, which is more oriented towards using SQL only. For the full list, see the Spark SQL documentation[3].

- Create Dataframes
- JOIN
- UDF and explode function
- Windowing
- CASE
- Pivoting

First, start the Spark session as follows. I've changed the logging level as well.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("Relational operations").getOrCreate()
sc = spark.sparkContext
# set the log level
sc.setLogLevel('ERROR')
```

```
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
```
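As a starting point for the JOIN section, here is a minimal sketch of my own (not from the original post) joining two tiny MovieLens-style DataFrames; the column names (movieId, title, userId, rating) follow MovieLens conventions, but the rows are made up:

```python
# Illustrative sketch (not from the original post); reuses the `spark`
# session created above. Rows are invented; columns follow MovieLens naming.
movies = spark.createDataFrame(
    [(1, "Toy Story"), (2, "Jumanji")], ["movieId", "title"])
ratings = spark.createDataFrame(
    [(1, 1, 4.0), (2, 1, 5.0), (3, 2, 3.5)], ["userId", "movieId", "rating"])

# Inner join on the shared movieId key, then average the rating per title.
joined = ratings.join(movies, on="movieId", how="inner")
joined.groupBy("title").avg("rating").show()
```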