
An Ideal Summer Internship @ GoIbibo

Hello everyone! It's been almost three weeks since my internship at GoIbibo ended, and I thought, what could be better than writing a blog post to preserve the good memories?

So here it is, a full description of how I spent my summer hacking on some cool projects at GoIbibo!

Day 1:

I was excited to work at a company that had grown so rapidly in the span of a few years! This was my first internship at a well-established start-up, and I was hoping to make the best of it. HR introduced me to my mentor, Bala Phani Chand (Principal Software Engineer at GoIbibo). Phani told me that I would be part of the Control team, which is responsible for backend development and consists of some of the brilliant minds of GoIbibo! I couldn't wait to get started on my project! 😀

My First Project:

Before getting into the technical details of my project, let me ask you: how many times have you been disappointed by a sudden change in the price of a flight while navigating from the search results page to the flight booking page? I bet you thought it was a technique for attracting customers, but it is not! GoIbibo used to cache the results for a fixed, short duration of time to avoid unnecessary hits to the API. So GoIbibo decided it was time to fix this problem in such a way that the customers are happy and they are happy too! 🙂 My job was to devise an intelligent algorithm that would reduce the number of such invalidations, especially during sales and other special occasions when demand is high. Sounds interesting, doesn't it?

The Birth of CacheBot:

I began researching the various factors that lead to an increase in the number of invalidations and the relationships between these factors. I learnt machine learning algorithms and implemented them to predict the time for which the data should be cached. The accuracy obtained was around 80%, which wasn't quite satisfactory. I then implemented multi-layer neural networks, which failed to improve the accuracy and also couldn't handle cases where a sale was on. I constantly discussed the results of my research with Phani and Mr Neeraj Koul (Director at GoIbibo), and they would always help me figure out a better approach to the problem. After a whole month of experimenting with various algorithms, I succeeded in developing a CacheBot which determines the time for which new data should be cached, based on past data (analysed using machine learning algorithms) and current-scenario analysis. The accuracy obtained was around 95%, and the best part was that it could handle sales as well as any sudden change in demand! 😀

GoIbibo 24 hours Hackathon:

GoIbibo conducted an awesome hackathon in which only its employees and interns were allowed to participate! We had access to unlimited food and drinks! 😀 All the teams coded through the night, competing with one another, and I was amazed at the products each team built in a single night! By the way, my team included Neha Jagadish and Pawan Sharma (two of the finest backend developers from the Flights team) and Rupesh Singh (a talented JS developer). We developed a price tracker that instantly notifies customers when the price of a flight they are tracking changes! We managed to come up with a constant-time solution for it using some simple hacks on database signals! 🙂

Here is a group photo of all the GoIbibo developers who participated in the hackathon! 🙂 These guys were still bubbling with enthusiasm even after spending the whole night coding! 😀 Kudos to the awesome GoIbibo Devs! (Y)

[Group photo of the GoIbibo developers who participated in the hackathon]

Project 2:

My second project was another interesting and challenging problem! 🙂 The aim of this project was to write a distributed cache library in Go which stores data as key-value pairs in the memory of each node, thereby avoiding the network latency generally incurred when talking to Memcached or Redis. But you must be wondering, why write it in Go instead of Python? The major reason we used Go is its excellent concurrency support: goroutines and channels are greatly useful for building low-latency networked code. I was mentored by some of the most intelligent minds of GoIbibo – Mr Jyotiswarup Raiturkar, Head Architect of GoIbibo, Neeraj Koul and Harshad Patil (Senior Software Engineer).

GoCache:

This project required several design discussions and constant revision of the design! I encountered many challenging problems while implementing this library, which I would discuss with Mr Jyotiswarup, Mr Neeraj and Harshad, and they would help me figure out efficient solutions to them. Although I was an intern, I was free to put forward my ideas or challenge theirs without having to think twice! 🙂

By the end of my internship, I had implemented a distributed peer-to-peer network in which each node has its own BusSocket that can listen for peer nodes, dial peer nodes, receive messages from peers and evaluate them one at a time, and send messages to peers concurrently. Akansha Gupta wrote an API for the Ledisdb-Goleveldb cache, which I used for maintaining storage for each node. GoLeveldb is super fast, supports key expiry and allows concurrent reads and writes within a single process, which is why we chose it for our library. I also implemented automatic redialling, which keeps retrying until the connection is re-established whenever a connection is lost or an EOF is received on it. Each node sends an acknowledgement to its peer after evaluating a received message; if a node fails to receive an acknowledgement for a message it sent within the timeout period, it breaks the connection and redials the peer. Harshad implemented a stale-data error acknowledgement, which notifies the sender that the data it sent is stale and updates the sender's database with the most recent data. We also implemented a solution for the case where a node's peer goes down for a while and its database is outdated by the time it comes back up.

On testing the code, we found that the network could send and simultaneously receive 1 million messages to and from its peers in 3-4 minutes. Yay! 🙂

My mentor Harshad has done a great job refactoring the code, optimizing it and encapsulating it. The code has now been open-sourced and pushed to the GoIbibo GitHub. Anyone and everyone is welcome to contribute to it. 🙂

Link to Harshad’s GoCache Library:  https://github.com/goibibo/stash

Link to my library which is still under development: https://github.com/elitalobo/DistributedCache

Overview of my internship experience:

I enjoyed every bit of it, so much that I wished I never had to return to college! 😀 I made really great friends and we had a ton of fun together! 🙂 The timings were flexible, I had all the freedom to voice my ideas, and they were always considered! And the best part of the internship is that I learnt a lot from my projects and my awesome mentors, which I doubt I could have learnt elsewhere! 🙂 Summer well spent! 🙂

PS: The projects I worked on haven't yet gone into production, as GoIbibo is rewriting the entire backend in Go.

My Internship Experience @GoIbibo – Part 2

Hi everyone, this is the continuation of the previous post. In this post, I will give you a brief insight into my second project, which was written in Go!

Project 2:

The goal of this project was to write a distributed cache library in Go which stores data as key-value pairs in the memory of each node, thereby avoiding the network latency generally incurred when talking to Memcached or Redis. The major reason we used Go is its excellent concurrency support: goroutines and channels are greatly useful for building low-latency networked code. I was mentored by Mr Jyotiswarup Raiturkar, Head Architect of GoIbibo, Neeraj Koul and Harshad Patil.

Week 1:

Learnt Go and networking in Go from various resources. I was then assigned my first task: implement a basic peer-to-peer network with the following functions (a minimal sketch follows the list).

1. A Set function which stores a key-value pair in the node's own database and also sends the pair to its peers.

2. A Get function which retrieves the value of the specified key if it exists in the node's database.

3. A Delete function which deletes the key from the node's database and tells its peers to delete the same key.
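Here is a minimal sketch of what that first task boils down to. The type and method names are mine, not the actual library's API, and peer propagation is shown as direct map writes instead of real network calls.

package cache

// Node is a toy stand-in for a cache node; the real library sends
// Set/Delete operations to peers over the network instead of touching
// their maps directly.
type Node struct {
    store map[string]string // this node's own in-memory key-value store
    peers []*Node           // directly connected peer nodes
}

// NewNode creates a node with an empty store.
func NewNode() *Node {
    return &Node{store: make(map[string]string)}
}

// Set stores the pair locally and propagates it to every peer.
func (n *Node) Set(key, value string) {
    n.store[key] = value
    for _, p := range n.peers {
        p.store[key] = value
    }
}

// Get retrieves the value for key from the node's own store, if present.
func (n *Node) Get(key string) (string, bool) {
    v, ok := n.store[key]
    return v, ok
}

// Delete removes the key locally and tells every peer to delete it too.
func (n *Node) Delete(key string) {
    delete(n.store, key)
    for _, p := range n.peers {
        delete(p.store, key)
    }
}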

Week 2:

Implemented the peer-to-peer network with the above functionality. Akansha Gupta wrote an API for the Ledisdb-Goleveldb cache, which I used for maintaining storage for each node.

Week 3:

Implemented a BusSocket for each node, which can listen for peer nodes, dial peer nodes, receive messages from peers and evaluate them one at a time, and send messages to peers concurrently. A rough sketch of its shape is below.
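Very roughly, and with names I have made up purely for illustration (the real library's types differ), the BusSocket looks something like this:

package cache

import "net"

// Message is the unit exchanged between peers (operation, key, value).
type Message struct {
    Op, Key, Value string
}

// BusSocket is a sketch of the per-node socket described above.
type BusSocket struct {
    addr     string              // address this node listens on
    peers    map[string]net.Conn // open connections to peer nodes
    incoming chan Message        // received messages, evaluated one at a time
}

// Listen accepts connections from peers and feeds their messages into
// the incoming channel (net.Listen plus an Accept loop in the real code).
func (b *BusSocket) Listen() error { return nil }

// Dial opens a connection to a peer so that messages can be sent to it.
func (b *BusSocket) Dial(peerAddr string) error { return nil }

// Send writes a message to one peer; sends to different peers run in
// their own goroutines, which is what makes sending concurrent.
func (b *BusSocket) Send(peerAddr string, m Message) error { return nil }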

Week 4:

Optimized and encapsulated the code. Implemented automatic redialling, which keeps retrying until the connection is re-established whenever a connection is lost or an EOF is received on it. Each node sends an acknowledgement to its peer after evaluating a received message; if a node fails to receive an acknowledgement for a message it sent within the timeout period, it breaks the connection and redials the peer (see the sketch below). On testing the code, we found that the network could send and simultaneously receive 1 million messages to and from its peers in 3-4 minutes. 🙂 Harshad implemented a stale-data error acknowledgement, which notifies the sender that the data it sent is stale and updates the sender's database with the most recent data.
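The acknowledgement-and-redial behaviour can be pictured with a small sketch like the following (again, illustrative names and types, not the library's own):

package cache

import (
    "errors"
    "net"
    "time"
)

// sendWithAck writes a message to a peer and waits for its acknowledgement.
// If the ack does not arrive within the timeout, the caller is expected to
// break the connection and redial the peer.
func sendWithAck(conn net.Conn, msg []byte, acks <-chan struct{}, timeout time.Duration) error {
    if _, err := conn.Write(msg); err != nil {
        return err // connection lost or EOF: caller redials
    }
    select {
    case <-acks:
        return nil // peer evaluated the message and acknowledged it
    case <-time.After(timeout):
        return errors.New("ack timeout: break the connection and redial")
    }
}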

After returning to my college:

I handled the case where a node's peer goes down for a while and its database may be outdated by the time it becomes active again. My mentor Harshad had implemented mqueues, which store the latest m messages (the value of m is configurable) to be sent to a peer in a queue (one per peer) while that peer is down. When the peer comes up again, the node dials it and starts sending it the messages from the queue. One major limitation of the mqueue was its size: older messages were lost once the queue exceeded its maximum size m. I eliminated this limitation by maintaining a cache for each peer of the node. Each time the queue exceeds its maximum size, the oldest messages are flushed into the cache maintained for that particular peer. We also set an expiry on the flushed messages, so that messages older than one week are automatically expired from the cache. We used goleveldb as the storage since it allows concurrent reads and writes, which makes it super fast. When the peer comes up again and dials the node, the node iterates over all the messages in the cache (the messages the peer has missed) and sends them to the peer, and then sends any remaining messages from the queue. A minimal sketch of the overflow idea follows.
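This sketch assumes a hypothetical per-peer cache type with a Set(key, value, ttl) method backed by goleveldb; the real code's names and structure differ.

package cache

import "time"

// peerCache stands in for the goleveldb-backed store kept for each peer.
type peerCache interface {
    Set(key string, value []byte, ttl time.Duration)
}

type queuedMessage struct {
    id   string
    data []byte
}

// MQueue keeps the latest max messages destined for one peer; when the
// queue is full, the oldest message is flushed to that peer's cache with
// a one-week expiry instead of being dropped.
type MQueue struct {
    max      int
    messages []queuedMessage
    cache    peerCache
}

func (q *MQueue) Push(m queuedMessage) {
    if len(q.messages) >= q.max {
        oldest := q.messages[0]
        q.messages = q.messages[1:]
        q.cache.Set(oldest.id, oldest.data, 7*24*time.Hour)
    }
    q.messages = append(q.messages, m)
}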

My mentor Harshad has done a great job refactoring the code, optimizing it and encapsulating it. The code has now been open-sourced and pushed to the GoIbibo GitHub. Anyone and everyone is welcome to contribute to it. 🙂

Link to Harshad’s GoCache Library:  https://github.com/goibibo/stash

Link to my library which is still under development: https://github.com/elitalobo/DistributedCache

My Internship Experience @ GoIbibo – Part 1

Hello everyone,

This summer, I spent my time hacking away on some cool projects at GoIbibo. So here is a detailed description of my first project at GoIbibo!

PROJECT 1:

Getting into the technical details of my project: the primary aim was to reduce the number of invalidations that occur when customers find that flight prices have changed on navigating from the search results page to the booking page. These invalidations increase drastically during sales, and it was my responsibility to come up with an intelligent algorithm that would handle sales and reduce the number of invalidations occurring every day!

Week 1:

My mentor, Phani, gave me the invalidation logs, consisting of millions of invalidations, and I began my research. I plotted graphs for each route to find relationships between the various factors that might result in a sudden increase in invalidations, and tried to find a way to use this data to resolve the problem. I had a discussion with Mr Neeraj Koul (Director of the Control team) about what I had derived from the graphs. He helped me figure out more factors that could also result in increased invalidations.

Week 2:

From the graphs, I was able to derive the following conclusions.

1. There is a nearly linear relation between the time left until the journey date and the time it takes for the data obtained from a new invalidation to get invalidated again.

2. The time taken for this data to expire is also inversely proportional to demand.

3. Demand generally increases on weekends, but one can sometimes observe demand increasing on weekdays as well.

4. There is no relation between the price change and the time taken for the new data to get invalidated.

I realised that this is purely a machine learning problem and hence started learning machine learning concepts from Andrew Ng's machine learning course, while simultaneously implementing the algorithms to predict the time a new piece of data would take to get invalidated. I began with linear regression curves. I used SciPy (a Python scientific computing library) to determine the coefficients of various linear regression curves and checked the accuracy of the predictions. I implemented the gradient descent algorithm, which corrects the coefficients of the curves based on the mean squared error of the predicted values. I was able to achieve 80-90% accuracy using these algorithms. :/
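The actual implementation was in Python with SciPy, but the core idea of gradient descent is language-agnostic; here is a minimal, purely illustrative sketch in Go of fitting a single line y ≈ a·x + b by batch gradient descent.

package main

import "fmt"

// fitLine fits y ≈ a*x + b by batch gradient descent on the mean squared error.
func fitLine(xs, ys []float64, lr float64, iters int) (a, b float64) {
    n := float64(len(xs))
    for it := 0; it < iters; it++ {
        var gradA, gradB float64
        for i := range xs {
            err := (a*xs[i] + b) - ys[i] // prediction error for this sample
            gradA += err * xs[i]
            gradB += err
        }
        // step against the gradient of the mean squared error
        a -= lr * gradA / n
        b -= lr * gradB / n
    }
    return a, b
}

func main() {
    // toy data: time-to-invalidation grows roughly linearly with days-to-journey
    xs := []float64{1, 2, 3, 4, 5}
    ys := []float64{2.1, 3.9, 6.2, 8.1, 9.8}
    a, b := fitLine(xs, ys, 0.01, 10000)
    fmt.Printf("a=%.2f b=%.2f\n", a, b)
}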

WEEK 3:

Still not satisfied with the performance, I decided to dive into neural network concepts and see if I could improve it further. The first few days of the week went into understanding multi-layer neural networks and implementing them. Basically, a neural network is used to find a pattern between the attributes and the results when we are not sure what that relation looks like. In this case, we have several factors, like demand, which depend on other factors such as the time before the journey date and whether it is a weekday or a weekend, so neural networks seemed to be a promising approach to the problem. I implemented single-layer and multi-layer neural networks with gradient descent for error correction. I tried varying the number of neurons and other constants like the learning rate, but in vain; the best I could achieve was 80% accuracy. The performance also kept changing from run to run, since the weights between the neurons were initialized randomly. 😦

A few reasons why I chose not to go forward with the neural network are as follows:

1. Since the neural network is entirely responsible for deriving a pattern from past data, and the past data included sales data that could not be filtered out, the neural network generated a wrong pattern most of the time.

2. It is difficult to correct the generated pattern based on the errors, because error correction requires a large amount of training data and takes a lot of time (around 30 minutes).

3. The neural network took too much time to learn, and it required storing 360 coefficients for each route-airline combination (we currently have 40,000 such combinations), which seemed an unnecessary waste of memory.

4. If the number of nodes in each layer is not optimal, errors can arise due to over-fitting or under-fitting of the curve. It is not feasible to figure out an optimal number of nodes per layer for 40,000 curves which may differ from one another for various reasons.

5. The pattern generated from past data may differ from the current pattern due to reasons like a sudden change in demand, sales, etc. The neural network generates a curve without considering the fact that the time left before the journey date is directly proportional to the time it takes for new data to expire, and it may happen that the generated pattern does not follow this logic due to the errors introduced by sales.

WEEK 4:

With the help of a fellow employee, Akansha Verma, I started cross-checking these results with some machine learning tools which have the standard algorithms built in (just in case there was some fault in my implementation of the algorithms), and found that there was not much difference between the results.

I discussed this with Neeraj Sir, and we finally came to the conclusion that we would use polynomial regression to learn from past data and divide the data into 3 or 4 categories to improve accuracy. We also devised an algorithm that handles sales based on current-scenario analysis, independently of the machine learning algorithms. Basically, we determine true_expiry (the time for which the data is to be stored in the cache) from the polynomial regression curves generated from past data, while the logical expiry is determined from the time it took for the most recent data to get invalidated. The expiry in the Redis cache is set with respect to true_expiry.

Whenever an invalidation occurs, we check whether the time taken for the most recent data to get invalidated is much less than the calculated value. If yes, the logical expiry is set to half the time it took for the recent data to get invalidated; otherwise, the logical expiry is the same as true_expiry. A logical expiry that differs from true_expiry generally marks the beginning of a sale or a sudden increase in demand, so this ensures that sales are handled properly. When cached data is logically expired, we hit the API to get the new price. If the price is the same as the logically expired data in the cache, we reset the logical expiry to a new true_expiry calculated from the current time left until the journey date. So when a sale ends, the system goes back to using what it learnt from past data. A rough sketch of this decision logic follows.

With this, we were able to achieve an accuracy of 95% and above, with no negative values of expiry time. If, by any chance, the above algorithm generates a negative expiry time, it recalculates using default values for the coefficients of the polynomial regression curves. Thus an intelligent CacheBot was built, which considers past data analysis as well as current-scenario analysis and intelligently determines the time for which new data should be cached. 🙂
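To make the decision rule concrete, here is a rough sketch of the logic in Go. The actual CacheBot was written in Python, and the threshold and names here are illustrative, not the production values.

package cachebot

import "time"

// decideLogicalExpiry picks how long to trust newly fetched data.
// trueExpiry comes from the polynomial regression over past data;
// lastObserved is how long the most recent data actually survived before
// it was invalidated.
func decideLogicalExpiry(trueExpiry, lastObserved time.Duration) time.Duration {
    // If the most recent data expired much faster than predicted, assume a
    // sale or a sudden demand spike and cache for a much shorter time.
    // The "much less" threshold below is illustrative only.
    if lastObserved < trueExpiry/2 {
        return lastObserved / 2
    }
    // Otherwise trust the learned model.
    return trueExpiry
}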

WEEK 5:

The whole week was spent testing the performance of the algorithm devised above and fixing bugs in the code. 🙂

Concurrency in Go

Go is described as a concurrency-friendly language. The reason is that it provides simple syntax over two powerful mechanisms – goroutines and channels. Before we go ahead with understanding goroutines and channels in Go and how to use them, let me make it clear that concurrency does not imply parallelism. Concurrency is about dealing with multiple tasks at once, whereas parallelism is about doing multiple tasks simultaneously. Concurrency provides a way to structure a solution to a problem that may (but need not) be parallelizable.

Goroutines:                                                                                                                     

A goroutine is similar to a thread, but it is scheduled by the Go runtime rather than the OS. Code that runs in a goroutine can run concurrently with other code. Goroutines are cheap to create and have little overhead; many goroutines are multiplexed onto the same underlying OS threads. Goroutines are thus a means of creating a concurrent structure for a program, which may execute in parallel if the hardware allows it. Note that goroutines run in the same address space, so access to shared memory must be synchronized. Let's take a look at an example.


package main

import (
    "fmt"
    "time"
)

func say(s string) {
    for i := 0; i < 5; i++ {
        time.Sleep(10 * time.Millisecond)
        fmt.Println(s)
    }
}

func say1(s string) {
    for i := 0; i < 5; i++ {
        time.Sleep(10 * time.Millisecond)
        fmt.Println(s)
    }
}

func main() {
    go say("world")
    say1("hello")
}


go say("world") starts a new goroutine, whereas say1("hello") is executed in the current goroutine. The output of the above code is as follows:


hello
world
hello
world
hello
world
hello
world
hello
world


Creating goroutines is trivial, and they are so cheap that we can start many. However, concurrent code needs to be coordinated. Let us see why!


package main

import (
    "fmt"
    "time"
)

var counter = 5

func main() {
    for i := 0; i < 2; i++ {
        go increment()
    }
    time.Sleep(time.Millisecond * 10)
}

func increment() {
    counter++
    fmt.Println(counter)
}

What do you think the output will be? 6 and 7? When you run the above code, that is probably what you will get, but in fact the behaviour is undefined. This is because we have two goroutines trying to write to the same variable, counter, at the same time, or, in the worst case, one reading it while the other is writing to it. Could this be dangerous? Yes, absolutely. counter++ might look like a single line of code, but it actually breaks down into multiple machine instructions, the exact nature of which depends on the platform you are running on. A data race like this can corrupt data and cause arbitrary misbehaviour. Hence we need to synchronize the writes. There are various ways to do this, including using truly atomic operations that rely on special CPU instructions. The most common approach is to use a mutex. However, if we don't use mutexes wisely, we may end up with deadlocks.
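As a quick illustration, here is a minimal sketch of the same counter protected by a sync.Mutex, with a sync.WaitGroup replacing the sleep:

package main

import (
    "fmt"
    "sync"
)

var (
    counter = 5
    mu      sync.Mutex
    wg      sync.WaitGroup
)

func main() {
    for i := 0; i < 2; i++ {
        wg.Add(1)
        go increment()
    }
    wg.Wait() // wait for both goroutines instead of sleeping
    fmt.Println("final counter:", counter)
}

func increment() {
    defer wg.Done()
    mu.Lock() // only one goroutine may update counter at a time
    counter++
    mu.Unlock()
}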

Channels to the rescue!                                                   

A channel is a communication pipe between goroutines, used to pass data. In other words, a goroutine that has data can pass it to another goroutine via a channel. The result is that, at any point in time, only one goroutine has access to the data. Channels serve to synchronize the execution of concurrently running functions and provide a mechanism for communication between them by passing values of a specified type. A channel has several characteristics: the type of element you can send through it, its capacity (or buffer size), and the direction of communication, specified by the <- operator. You can allocate a channel using the built-in function make:


c := make(chan int)


A channel supports two operations: sending a value into the channel (c <- value) and receiving a value from the channel (value := <-c).

We can think of a channel as a conveyor belt or a pipe with a defined capacity. The number of items the channel (our conveyor belt) can hold matters: it indicates how many items can be in flight at a time. Even if the sender can produce many items, nothing moves if the receiver is not ready to accept them. If the capacity is not specified, the channel is unbuffered and behaves synchronously: once an item is placed on the channel, it has to be taken off before another can be put in its place, so the sender and the receiver each handle one item at a time and must wait for the other side to perform the corresponding receive or send. This allows goroutines to synchronize without explicit locks or condition variables. In short, on an unbuffered channel the sender blocks until the receiver has received the value; on a buffered channel the sender blocks only until the value has been copied into the buffer, and if the buffer is full, the sender waits until a receiver has taken a value out. With this understanding, let us look at some code.
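First, a tiny sketch contrasting an unbuffered channel with a buffered one (illustrative only):

package main

import "fmt"

func main() {
    // Buffered channel with capacity 2: both sends below return immediately
    // because there is room in the buffer.
    buffered := make(chan int, 2)
    buffered <- 1
    buffered <- 2
    fmt.Println(<-buffered, <-buffered)

    // Unbuffered channel: a send blocks until someone receives, so the
    // send has to happen in another goroutine.
    unbuffered := make(chan int)
    go func() { unbuffered <- 42 }()
    fmt.Println(<-unbuffered)
}

The make-and-pack example below puts the same ideas to work.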


package main

import (
    "fmt"
    "strconv"
    "time"
)

var i int

func makeItem(cs chan string) {
    i = i + 1
    itemName := "Item " + strconv.Itoa(i)
    fmt.Println("Making an item and sending …", itemName)
    cs <- itemName // send the item on the channel
}

func receiveAndPackItem(cs chan string) {
    s := <-cs // receive an item from the channel
    fmt.Println("Packing received item: ", s)
}

func main() {
    cs := make(chan string)
    for i := 0; i < 3; i++ {
        go makeItem(cs)
        go receiveAndPackItem(cs)

        // sleep for a while so that the program doesn't exit immediately
        // and the output is clear for illustration
        time.Sleep(1 * time.Second)
    }
}


Output is as follows:

Making an item and sending … Item 1
Packing received item: Item 1
Making an item and sending … Item 2
Packing received item: Item 2
Making an item and sending … Item 3
Packing received item: Item 3
So what happens when the above program is executed? As we can see in the for loop, on each iteration two new goroutines are created to execute makeItem and receiveAndPackItem, and both goroutines use the same channel. makeItem creates an item and sends it on the channel, and the receiver receives it from the channel and packs it. Since the channel is unbuffered and therefore synchronous, making an item and packing it happen before a new item is made and sent to the channel. Now let's play around with the code to understand it in more depth.

package main

import (
    "fmt"
    "strconv"
    "time"
)

func makeItem(cs chan string, count int) {
    for i := 1; i <= count; i++ {
        itemName := "Item " + strconv.Itoa(i)
        fmt.Println("Making Item:", itemName)
        cs <- itemName // send the item on the channel
    }
}

func receiveAndPackItem(cs chan string) {
    // range keeps receiving from the channel until it is closed
    for s := range cs {
        fmt.Println("Packing received item: ", s)
    }
}

func main() {
    cs := make(chan string)
    go makeItem(cs, 5)
    go receiveAndPackItem(cs)

    // sleep for a while so that the program doesn't exit immediately
    time.Sleep(3 * time.Second)
}


When we run the above code, receiveAndPackItem runs what is effectively an infinite for loop: it does not know how many items will be pushed onto the channel or when it should stop listening. A sender can close a channel to indicate that no more values will be sent. Receivers can test whether a channel has been closed by assigning a second parameter to the receive expression: v, ok := <-c

If ok is false, the channel has been closed and there are no more values to receive. Go also provides the range keyword, which, when used with a channel, keeps receiving values until the channel is closed (a small sketch of the v, ok pattern appears after the output below). The output of the above code is:

Making Item: Item 1
Packing received item: Item 1
Making Item: Item 2
Making Item: Item 3
Packing received item: Item 2
Packing received item: Item 3
Making Item: Item 4
Making Item: Item 5
Packing received item: Item 4
Packing received item: Item 5

It is important to understand that the output shown above is not an accurate reflection of the actual sending and receiving on the channel. The sending and receiving here are synchronous: one item is made at a time and packed immediately. However, due to the time lag between the print statements and the actual channel send and receive, the output appears to show an incorrect order. So what is really happening is:

Making Item: Item 1
Packing received item: Item 1
Making Item: Item 2
Packing received item: Item 2
Making Item: Item 3
Packing received item: Item 3
Making Item: Item 4
Packing received item: Item 4
Making Item: Item 5
Packing received item: Item 5
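As promised above, here is a small sketch of the v, ok pattern on a closed channel (an illustration, not code from the project):

package main

import "fmt"

func main() {
    c := make(chan int, 1)
    c <- 10
    close(c) // no more values will be sent

    v, ok := <-c // a value left in the buffer is still delivered after close
    fmt.Println(v, ok) // 10 true

    v, ok = <-c // channel is drained and closed
    fmt.Println(v, ok) // 0 false
}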

Select:

The select keyword, when used with multiple channels, checks which channel is ready and accordingly performs a send to or a receive from that channel. The case blocks within it can be sends or receives; a channel case is ready when the send or receive initiated with the <- operator can proceed. There can also be a default case, which is always ready. The algorithm select uses can be approximated as follows:
* check each of the case blocks
* if exactly one of them can send or receive, execute the code block corresponding to it
* if more than one can send or receive, randomly pick one of them and execute its code block
* if none of them is ready and there is no default block, wait
* if there is a default block and none of the other case blocks is ready, execute the default block immediately
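A quick illustration of the default case (a minimal sketch): since nothing is ever sent on the channel below, the default block runs straight away.

package main

import "fmt"

func main() {
    c := make(chan int)

    select {
    case v := <-c:
        fmt.Println("received", v)
    default:
        // no other case is ready, so default executes immediately
        fmt.Println("nothing ready")
    }
}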

Now let's take a look at this final code.


package main

import (
    "fmt"
    "strconv"
    "time"
)

func makeBeverage(cs chan string, flavor string, count int) {
    for i := 1; i <= count; i++ {
        beverageName := flavor + strconv.Itoa(i)
        cs <- beverageName // send the beverage on the channel
    }
    close(cs) // signal that no more beverages will be sent
}

func receiveAndPackBeverage(coffee_cs chan string, tea_cs chan string) {
    coffee_closed, tea_closed := false, false

    for {
        // if both channels are closed then we can stop
        if coffee_closed && tea_closed {
            return
        }
        fmt.Println("Waiting for a new beverage …")
        select {
        case beverageName, coffee_ok := <-coffee_cs:
            if !coffee_ok {
                coffee_closed = true
                fmt.Println(" … Coffee channel closed!")
            } else {
                fmt.Println("Received from Coffee channel. Now packing", beverageName)
            }
        case beverageName, tea_ok := <-tea_cs:
            if !tea_ok {
                tea_closed = true
                fmt.Println(" … Tea channel closed!")
            } else {
                fmt.Println("Received from Tea channel. Now packing", beverageName)
            }
        }
    }
}

func main() {
    coffee_cs := make(chan string)
    tea_cs := make(chan string)
    go makeBeverage(tea_cs, "tea", 3)
    go makeBeverage(coffee_cs, "coffee", 3)

    go receiveAndPackBeverage(coffee_cs, tea_cs)

    // sleep for a while so that the program doesn't exit immediately
    time.Sleep(2 * time.Second)
}


The output is as follows:

Waiting for a new beverage …
Received from Tea channel. Now packing tea1
Waiting for a new beverage …
Received from Coffee channel. Now packing coffee1
Waiting for a new beverage …
Received from Tea channel. Now packing tea2
Waiting for a new beverage …
Received from Coffee channel. Now packing coffee2
Waiting for a new beverage …
Received from Tea channel. Now packing tea3
Waiting for a new beverage …
Received from Coffee channel. Now packing coffee3
Waiting for a new beverage …
… Coffee channel closed!
Waiting for a new beverage …
… Tea channel closed!

This output makes sense with respect to the algorithm above. Whenever the tea/coffee channel is empty, it accepts another tea/coffee from the sender, and when a tea or coffee is sent on the tea/coffee channel, the channel becomes ready to deliver the data to a receiver. When there is no more tea/coffee to be sent on a channel, the channel is closed, coffee_ok/tea_ok becomes false, and receiveAndPackBeverage stops listening on that channel.

Resources: The Little Go Book, A Tour of Go

THE END

Django: South Migrations made easy!

Hi, everyone! This is a basic tutorial on South migrations, as explained by my mentor Mr Sayan Chaudhury.

What are Migrations and why do we need them?

Migrations are Django's way of propagating changes to your existing database schema (adding new fields, or modifying/deleting existing fields in a database model). We all know that syncdb effectively creates database tables from the models.py file in your Django app folder. But as your app evolves over time, you may need to add many new fields to your existing models, and syncdb cannot alter existing database tables! This is one major limitation of syncdb, and hence we generally don't use it more than once while developing a Django app. South is a Django project which provides easy-to-use, consistent and database-agnostic migrations that solve this problem.

How does South Migrations work?

First, to install and set up South, do the following:

  1. pip install south
  2. Add 'south' to your project's INSTALLED_APPS.
  3. Run 'syncdb' (before you create your own models). Note that the work of syncdb is done here and we are not going to use it again!
  4. Run 'manage.py migrate' to apply migrations.

Now let's create a new app in our project.

./manage.py startapp south_app

Add this app as well to your project’s INSTALLED_APPS and open south_app/models.py


from django.db import models

class UserProfile(models.Model):
    first_name = models.CharField(max_length=100)
    last_name = models.CharField(max_length=100)

Now you need to migrate these changes. This involves two steps: a schemamigration followed by a migrate. There are several ways of creating migrations; some are automatic and some are manual. The two basic automatic ways are --auto and --initial. When doing a schemamigration for an app for the first time, we use --initial, which creates the database table for us. Once the initial schemamigration has been done, we use --auto instead of --initial. Run the following command to create the migration.

./manage.py schemamigration south_app --initial

This command creates a 0001_initial.py file in the south_app/migrations/ directory.

This file contains a Migration class with two functions, namely forwards and backwards. The forwards function creates the UserProfile database table, and the backwards function deletes it. At the bottom of the file you will also find the current database schema of the app.


coder@coder:~/test/south_test$ ./manage.py schemamigration south_app --initial
Creating migrations directory at '/home/coder/test/south_test/south_app/migrations'...
Creating __init__.py in '/home/coder/test/south_test/south_app/migrations'...
+ Added model south_app.UserProfile
Created 0001_initial.py. You can now apply this migration with: ./manage.py migrate south_app


Run ./manage.py migrate south_app to apply the migration. This command runs the 0001_initial.py file, which creates the UserProfile database table and adds an entry to the SouthMigrationHistory table. The SouthMigrationHistory table keeps a record of all the migrations applied so far, with the filename, app name and timestamp as its attributes.


coder@coder:~/test/south_test$ ./manage.py migrate south_app
Running migrations for south_app:
 - Migrating forwards to 0001_initial.
 > south_app:0001_initial
 - Loading initial data for south_app.
Installed 0 object(s) from 0 fixture(s)


You can now add entries to the UserProfile table from the shell.

Now we will try adding new fields to UserProfile and modifying the existing fields.


from django.db import models

class UserProfile(models.Model):
    first_name = models.CharField(max_length=100, unique=True)
    last_name = models.CharField(max_length=100)
    is_student = models.BooleanField()


We have set first_name to be unique and added a boolean field, is_student.

Run ./manage.py schemamigration south_app --auto to create a migration for the above changes.

Note that we have replaced --initial with --auto, since this time we have modified an existing model rather than created a new one.


coder@coder:~/test/south_test$ ./manage.py schemamigration south_app --auto
+ Added field is_student on south_app.UserProfile
+ Added unique constraint for ['first_name'] on south_app.UserProfile
Created 0002_auto__add_field_userprofile_is_student__add_unique_userprofile_first_n.py. You can now apply this migration with: ./manage.py migrate south_app


Notice that a new file, 0002_auto__add_field_userprofile_is_student__add_unique_userprofile_first_n.py, has been created in the south_app/migrations directory. The forwards function in this file adds the is_student field and the unique constraint on first_name, and the backwards function deletes the is_student field and removes the unique constraint on first_name.

Now run ./manage.py migrate south_app to apply this migration. This will create a new entry in the SouthMigrationHistory table and run 0002_auto__add_field_userprofile_is_student__add_unique_userprofile_first_n.py, which makes the required changes to the UserProfile database table.

A few more things you need to know about South migrations…

  • If we make a few modifications to a model and migrate the changes, and then realize that these modifications need refinement, we can use --update. Consider the following UserProfile model.

from django.db import models

class UserProfile(models.Model):
    first_name = models.CharField(max_length=100, unique=True)
    last__name = models.CharField(max_length=100)
    is_student = models.CharField(max_length=100)


Suppose we have already run the schemamigration and migrate commands, and we now need to change last__name to last_name, and is_student should be a boolean field instead of a char field.

In this case, we can make the required changes in the model and run the following commands.

./manage.py schemamigration south_app --auto --update


coder@coder:~/test/south_test$ ./manage.py schemamigration south_app --auto --update
+ Added field is_student on south_app.UserProfile
+ Added unique constraint for ['first_name'] on south_app.UserProfile
Migration to be updated, 0002_auto__add_field_userprofile_is_student__add_unique_userprofile_first_n, is already applied, rolling it back now...
previous_migration: 0001_initial (applied: 2014-12-26 06:56:32.214512+00:00)
Running migrations for south_app:
 - Migrating backwards to just after 0001_initial.
 < south_app:0002_auto__add_field_userprofile_is_student__add_unique_userprofile_first_n
Updated 0002_auto__add_field_userprofile_is_student__add_unique_userprofile_first_n.py. You can now apply this migration with: ./manage.py migrate south_app


South rolls back the most recent migration (the one that included the mistakes) using its backwards function, and replaces it with a new migration that corrects them.

We can now run ./manage.py migrate south_app to apply the migration.

  • South also automatically detects ManyToMany fields; when you add the field, South will create the table the ManyToMany represents, and when you remove the field, the table will be deleted.
  • Sometimes it may happen that we add dumps as test data, which can result in an error that a particular field is not found. This occurs because the SouthMigrationHistory table has an entry showing that the field has been added, but the field does not actually exist in the database table. This issue can be solved by first commenting out that particular field in models.py and running schemamigration followed by a migrate command with --fake. This creates an entry in the SouthMigrationHistory table saying the field has been deleted, without actually touching the database table (since the field doesn't exist there anyway). Now uncomment the field in models.py and do a schemamigration followed by a migrate without --fake. This adds a new entry in the SouthMigrationHistory table saying a new field has been created, and it also adds the field to the database table.

Note that if we have done a series of schemamigrations without a migrate, then when we finally run migrate it will apply the files created in the migrations folder sequentially, in the order in which they were created.

  • Lastly, in order to list the migrations applied so far, run the following command.

./manage.py migrate --list

We get the following list of migrations, where (*) marks the ones that have been applied.


south_app
(*) 0001_initial
( ) 0002_auto__add_field_userprofile_is_student__add_unique_userprofile_first_n


Final GSOC Report: Revamping the UI of Gnome-Calculator and Implementing History-View

Hi everybody,

The amazing Google Summer of Code has finally come to an end, and I feel very lucky to have had the opportunity to be a part of it. I learned a lot of new things and had a lot of fun this summer doing what I like most! Let me summarize what I have implemented so far in Gnome-Calculator.

1. A History View which displays previous calculations for modification and reuse.

2. An explicit Keyboard Mode with resizing capability.

3. An improved user interface.

Although GSOC has come to an end, my journey as a GNOME Developer has just begun. I plan to continue contributing to Gnome as long as I can and also try to make Gnome-Calculator as advanced as possible.

You can find all the patches related to my GSOC project here.

I would like to sincerely thank my mentor Arth Patel, Garima Joshi and Michael Catanzaro for patiently answering the several queries I had during the GSOC period and for helping me complete my GSOC project successfully.

Cheers!

Final Week of GSOC : Revamping the UI of Gnome-Calculator and adding History-View

Hi Everybody,

We have introduced a new mode into Gnome-Calculator: Keyboard Mode. Keyboard Mode, as its name suggests, is designed for users who prefer to give input to the calculator only from the keyboard. This mode does not contain the button panel; instead, the space occupied by the button panel in the other modes is allocated to the History View. The mode also allows resizing of the main window, and on resizing the main window, the History View resizes accordingly.

The keyboard mode appears as follows.

[Screenshot: Keyboard Mode in Gnome-Calculator]

I have also added documentation for the History View as well as Keyboard Mode to the user documentation.