Instabase Challenge @ HackMIT
Prize: $1000 to split amongst the team - instabase.com/hackmit
Instabase is a data processing platform providing services such as OCR, text extraction, natural language processing, and text classification. We’re awarding $1,000 to the team that demonstrates the best use of our platform to solve a problem! More specifically, solve a problem using our platform to do:
- Optical Character Recognition
- Natural Language Processing
- Clustering and classification
- Information extraction and web scraping
- Python Scripting
Visit instabase.com/hackmit for more information about our API.
Getting Started
1. Create an account on Instabase
In order to create an account, go to our registration page and use the following token:
hack-mit-2019
2. Create a new workspace/repository
A repository/workplace is a place to hold all of your code. Create one by clicking “New Workspace” in the Workspaces menu.
You can also share workspaces by clicking the IB icon and clicking the share icon next to the workspace name.
Quick terminology / explanation
Instabase let’s you automate data processing by creating Flows. A Flow is a series of steps, most typically OCR -> Extract Information. Here is an example of an end-to-end solution.
To run a Flow, simply click Tools -> Run, and select an input folder with your documents.
This page has more information on a typical project setup.
Need Help?
We will be here all weekend to help with your projects! Come by our booth in the area, or contact our mentor at aaron@instabase.com
You can also text our mentor at (860) 805-0050
What can I do with Instabase?
Below is a list of just a few of the capabilities of Instabase. Follow the link for each one and copy the folder into your workspace/repository. Each project has a README detailing the capabilities and how to run that project.
For a full list of features and documentation, visit our documentation site.
To start using these projects, right click the folder and click “copy” to copy the code into your own workspace.
Extract Text from Images (OCR)
Instabase provides a process for extracting text from images, PDFs, scans, emails, and more, with a variety of document quality. The output can include character-level confidence and positional information as well. Instabase OCR can also handle document rotations.
Try it out by running the project flow.
Example output for a receipt can be found here (click View Text), with many more found here.
LUKES
LOCAL
Luke's Local
960 Cole Street
San Francisco, CA
Order# 275838
Station# POS2
Server: Rafael
Date: 11/7/17, 8:27 PM
Transaction: 10275838
Luke's Potato Salad - 8oz $2.99
Luke's Smoked Trout Salad $12.99
Subtotal: $15.98
Total Tax: $0.00
Total: $15.98
Paid With: VISA XXXX2278
Bill: JESSICA LIN
Total: $15.98
Extract Information from Documents/Images
Instabase provides tools and techniques for extracting information from documents. For instance, given the OCR and scan of the receipt example above, we can easily extract items such as name, total amount, subtotal, and more. Using an app called Refiner, we provide over 50 various functions for extracting information.
A list of all functions available are found here, including everything from regex-based extraction to advanced NLP techniques.
Try it out by running the project flow.
Classifying Text-based Documents
An automated process for classifying and extracting information from document is available on Instabase. We provide a few different classification techniques to get you started that work great on most documents, and allow you to create your own classifiers.
Full documentation can be found here, while an example project to get you started can be found here.
Try it out by running the project meta flow.
Natural Language Processing
Instabase provides tools for doing natural language processing, such as sentiment analysis, entity tagging, and entity-action analysis/topic classification.
Check out this Jupyter Notebook for an example, or checkout the Adverse News app.
Jupyter Notebooks
Instabase provides hosted Jupyter Notebooks that have access to Instabase functionality. Creating a notebook is as easy as clicking “New” > “New Notebook”.
Note: Sometimes renaming a notebook will say that it failed - clicking rename again will fix that issue.
Common FAQ
How do I use Instabase via API?
Most Instabase functionality is available through both UI and and API. In order to use Instabase via API (for instance, to run Flows, OCR, NLP, read and write files, etc…), create an access token by doing the following:
- Go to instabase.com
- Click the IB Icon (to access the main menu)
- Hover over “Help”
- Click “Developer SDK”
- Click “New Application” in the top right of the screen
- Enter a name and description for your project
- Copy the Access Token generated for your project
Details for API calls can be found in our documentation
The Access Token is passed via a header in your HTTP requests. For instance, in Python this might look like the following:
url = 'some/url/to/instabase/features'
headers = {
'Authorization': 'Bearer {0}'.format(my_access_token),
'Instabase-API-Args': json.dumps(my_api_arguments)
}
resp = requests.post(url, headers=headers).json()
Join Instabase!
Looking for an internship or full time opportunity? Check out our career page.
Curious about what our internships are like? Check out this blog post from one of our recent interns!