Your Typing Sucks More Than Your Vacuum Cleaner !!
Welcome back to another irrelevant and crazy project (* hey hey its important alright!! ). After the successful launch (* me believing it was a crazy hit) of the infamous T-Man (* your AI powered Tinder Wing-man), If you haven’t read it , go read it (* please me wants more views) T-Man.
I welcome you to Protyper.
(* In a dramatic scene)
Have you ever felt like your typing sucks more than your vacuum cleaner ? And if typing was a football match , you were Courtois (* messi nutmegging you so hard that you wanna cut off your legs ….. Visca El Barca ……. no offense … I can do this all day).
* wait wait I got one more
Does your typing sucks more than lana rhoades (* god isn't she beautiful ) in her movies ? Protyper is just the product for you. (* officer arrest me , cuz i am killing it!!! …. let that officer be lana please! )
Okay enough ! lets get technical now.
As usual I will be dividing the problem into several sub problems and try to solve them individually.
- Capturing screen
- Detecting characters within the captured image
- Controller to detect and respond to user input
- Creating notifications to show information
Tackling this problem is fairly easy. There are tons of tools in python to take screenshots. In this project we will be using pyautogui. As pyautogui lets you to capture a specific region of your screen.
Detecting characters withing the captured image
There are two options to tackle this problem. The hard option and the easy option. The hard option is that we could create our own neural network from scratch to detect character (*oh come on who does that). The easy option is to use some else’s model (* work smart not hard). So, in this project we will be using pytesseract and OpenCv to detect characters.
Controller to detect user input
Once again there are tons of tools that provides interfaces to listen and control mouse / keyboard events. In this project we will be using pyinput.
Creating notification to show information
Creating notification is not that hard. We could write our own code to access dbus to create notification or use library to do it for us. (* as you have already guessed I am lazy and will be using some library) In this project we will be using plyer to create OS level notifications.
As usual you can find all the implementation on my personal GitHub repository. Pro-Typer
- python 3.10 (*cuz why not !! … it came with switch statements yehhhhhh)
For debian system do
sudo apt install tesseract-ocr
Note: Do this if you do not want to add tessaract executable in you PATH manually
Setting Up Project
- Clone the repo
2. Go inside pro-typer
3. Install requirements
pip3 install -r requirements.txt
We will start by creating an abstract class namely Trigger, which will monitor all the key events.
........... bla bal bla ........ @abstractmethod
def on_keydown(self, key):
# do stuff
You can see that I have created an abstract method namely on_keydown which will let us write our business logic.
Next, we will be creating somewhat of a helper class which will store starting and ending coordinates of the region to be captured with respect to cursors position.
class ImageSize:............. bla bla bla ....................... def is_valid(self):
if self.start is None or self.end is None:
height, width = self.height_width
x_valid = self.start < self.end
y_valid = self.start > self.end
h_w_valid = height > 28 and width > 28
status = x_valid and y_valid and h_w_valid
if not status:
So, the main intuition behind valid region is that x coordinate of start must be less than x coordinate of end and y coordinate of start must be greater than y coordinate of end.
Screen shoot mechanism is simple. I wont be explaining how its done but you might find a weird keyword namely callbacks.
def __init__(self, callbacks=None):
self.image = None
self.callbacks = callbacks
self.image_size = ImageSize()
self.mouse_controller = Controller()
It is just a fancy way of passing function within a function. Whenever we take a screenshot all the callback functions are called by passing the captured image.
_ = None if self.callbacks is None else [calls(self.image) for calls in self.callbacks]
ImageToChar class inside OCR is just a simple abstract class to detect characters within the passed image and predict the detected character which will be stored in a variable called prediction.
To control all the activities we will be creating a class namely SysController , cuz why not. Adding Sys to a class name makes the class look more important than other classes (* Rip my logic).
The key map goes like.
Ctrl : If you press ctrl it takes a screenshot of your currently visible screen
Esc : If you press escape then it will terminate the program
Shift: If you press shift then it will capture the starting coordinates which is your pointer’s location in the screen.
Alt: If you press alt then it will capture the ending coordinates which is also your pointer’s location in the screen (*but different lol. I am very bad at explaining stuffs).
Down : And finally if you press your down arrow key then the script will type the captured characters for you.
Voilà you have a OCR on steroids now. Here are the few screenshot of the system.
Now you can type gazillion words in seconds. You can use this program to comment on my post (* me promoting myself in my own article)
Who is Sabin Sharma ?
If you are interested in automation you should check my other article Automating Tinder?! Perks Of Being A Programmer
If you are interested in Algorithms you should check my other articles, Molecular Dynamics Simulation of Hard Spheres — Priority Queue In Action With Java , Make Your Own AI Powered Game, NVIDIA Fan Controller For Linux(DIY).
And If you are interested in Cyber Security read my recent article Getting Your Hands Dirty: Exploiting Buffer Overflow Vulnerability In C