Ojitha Hewa Kumanayaka
Lua filters for Pandoc
Lua filters used with Pandoc 3.6.3. This post has solutions for: creating a glossary for an EPUB 3 book, and GitHub-style alerts.
AWS PITR Explained
PITR stands for Point-in-Time Recovery, which is a feature offered by several AWS services to provide continuous data protection and the ability to restore data to a specific point in time.
Spark - create database and table
This is a short note on creating a Hive metastore using Spark 3.3.1.
Semantic search with ELSER in Elasticsearch
Elastic Learned Sparse EncodeR (ELSER) is a retrieval model trained by Elastic that enables you to perform semantic search to retrieve more relevant search results. The steps are: (1) install ELSER v2, only once (DevOps will do this for you); (2) create a source index where you can insert all your documents; (3) create a target index; (4) create an ingestion pipeline; (5) run the reindex process to create embeddings; (6) you are ready to do semantic search using text expansion queries. I created this blog post on Docker to demonstrate the Linux-optimised ELSER v2. The Elasticsearch version is 8.11.1.
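The steps above can be sketched as the JSON request bodies you would send to Elasticsearch. This is a minimal sketch, not the code from the post: the field names `content` and `content_embedding` and the helper function names are hypothetical, and the request shapes follow the Elasticsearch 8.11 ELSER v2 documentation as I understand it, so check them against your cluster version.

```python
import json

# Model ID of the Linux-optimised ELSER v2 build (assumed to be already deployed).
MODEL_ID = ".elser_model_2_linux-x86_64"
SOURCE_FIELD = "content"            # hypothetical text field in the source index
TARGET_FIELD = "content_embedding"  # hypothetical field that holds the ELSER tokens


def target_index_mapping():
    """Body for creating the target index (PUT <target-index>)."""
    return {
        "mappings": {
            "properties": {
                TARGET_FIELD: {"type": "sparse_vector"},
                SOURCE_FIELD: {"type": "text"},
            }
        }
    }


def ingest_pipeline():
    """Body for the ingestion pipeline (PUT _ingest/pipeline/<pipeline-id>)."""
    return {
        "processors": [
            {
                "inference": {
                    "model_id": MODEL_ID,
                    "input_output": [
                        {"input_field": SOURCE_FIELD, "output_field": TARGET_FIELD}
                    ],
                }
            }
        ]
    }


def reindex_body(source_index, dest_index, pipeline_id):
    """Body for POST _reindex: copies documents through the pipeline."""
    return {
        "source": {"index": source_index},
        "dest": {"index": dest_index, "pipeline": pipeline_id},
    }


def semantic_query(text):
    """Body for GET <target-index>/_search using a text expansion query."""
    return {
        "query": {
            "text_expansion": {
                TARGET_FIELD: {"model_id": MODEL_ID, "model_text": text}
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(semantic_query("how to restore a database"), indent=2))
```

Each function only builds a dictionary, so the bodies can be inspected or pasted into Kibana Dev Tools before anything is sent to the cluster.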
Kafka PySpark streaming example
The diagram shows that the Kafka producer reads from Wikimedia and writes to the Kafka topic. Then the Spark Kafka consumer pulls the data from the Kafka topic and writes the stream batches to disk.
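The consumer side of that flow might look like the following PySpark Structured Streaming sketch. It is an illustration under assumptions, not the code from the post: the function names are hypothetical, the caller supplies an existing `SparkSession` with the Kafka connector package available, and the JSON file sink is my choice for "writes to disk".

```python
def kafka_source_options(bootstrap_servers, topic):
    """Options for Spark's Kafka source; the values are supplied by the caller."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        "startingOffsets": "earliest",  # read the topic from the beginning
    }


def start_stream(spark, options, out_dir, checkpoint_dir):
    """Read the Kafka topic as a stream and write each micro-batch to disk.

    `spark` is an existing SparkSession created by the caller.
    """
    raw = spark.readStream.format("kafka").options(**options).load()
    # Kafka delivers the payload as bytes; cast it to a string column.
    events = raw.selectExpr("CAST(value AS STRING) AS value")
    return (
        events.writeStream.format("json")
        .option("path", out_dir)
        .option("checkpointLocation", checkpoint_dir)
        .start()
    )
```

A call such as `start_stream(spark, kafka_source_options("localhost:9092", "wikimedia"), "out/", "chk/")` returns a streaming query handle; the broker address, topic name and directories here are placeholders.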
Terraform For each iteration
This post explains Terraform's for_each looping technique. In this example, 3 buckets are created to demonstrate the looping idea. In the first step, we will create these 3 buckets, starting from 0.
Spark to create a table in AWS Redshift
In this post, Spark reads the data from a CSV file into a DataFrame and saves that DataFrame as a Redshift table. In addition, I’ve explained how to create a table in Postgres, use Jupyter magics and plot a diagram.
Spark Streaming Basics
This is a very basic example created to explain Spark streaming. Spark runs locally on the AWS Glue container.
Spark Kafka Docker Configuration
This is the continuation of Spark Streaming Basics, where I explained a basic stream example that runs on only one AWS Glue container. There, the stream producer was Netcat and the sink was a text file. In this post, the stream producer is still Netcat, but the sink is Kafka. Both Kafka and Spark run in Docker containers.
Introduction to Lambda Calculus
This is a short description of lambda calculus. Lambda calculus is a minimal programming language built on variable substitution and a single function definition scheme. Haskell, which I will explore here, is a functional programming language based on lambda calculus. I have already explained how to use VSCode for Haskell development to support the code listed here.
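To give a feel for how little lambda calculus needs, here is a small sketch of Church numerals, written in Python rather than Haskell. It uses only single-argument functions and function application; the names `zero`, `succ`, `add` and `to_int` are my own and do not come from the post.

```python
# A Church numeral n is a function that applies f to x exactly n times.
zero = lambda f: lambda x: x
succ = lambda n: lambda f: lambda x: f(n(f)(x))
add = lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x))


def to_int(n):
    """Convert a Church numeral to a Python int by counting applications."""
    return n(lambda k: k + 1)(0)


one = succ(zero)
two = succ(one)
three = add(one)(two)

if __name__ == "__main__":
    print(to_int(three))
```

Applying a numeral to a function is plain substitution of arguments into function bodies, which is the only evaluation rule the calculus has.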