To-Do & Ideas

Here we list a few major projects that we plan to work on in the future and that you are very welcome to join. Please contact us if you are interested in cooperating, e.g. as part of your Master's, Bachelor's, or PhD thesis or an internship. If you don’t want to start with a big project, have a look at our Ticket System for smaller tasks.

Academic Recommender System
Mind Map Analysis
Usability Enhancement
Mobile Viewer (Android & iOS)
Online Viewer
Integrating Google Scholar
Bibliographic Metadata Spider
Academic PDF Spider
Open/LibreOffice Add-On
Header Extraction from Academic PDF Documents
Deduplication of Records in an Academic Literature Database
Citation Extraction from Academic PDF Documents

 

Academic Recommender System

Description: One of our main goals is to recommend relevant papers, journals, conferences, universities, etc. to our users. Relevant recommendations require detailed user models, and there are various approaches to building them (e.g. content-based and collaborative filtering). However, classic user modeling approaches usually model interests based on websites, emails, etc., not on mind maps as Docear uses them. Therefore, your task will be to find out which existing user modeling approaches are most suitable for mind maps and how they need to be adjusted to be most effective.

This is a major project that will require a lot of time. If you want to do it as part of an internship or a Bachelor's/Master's thesis, we can break it down into smaller parts. For instance:

  1. You could focus on applying standard user modeling approaches to mind maps. These would serve as a baseline for evaluating the effectiveness of the following approaches.
  2. You could focus on the suitability of the terms in a mind map for building a user model and matching them with items (e.g. academic articles)
  3. You could focus on the links and citations in a mind map for building a user model and matching them with items (e.g. academic articles)
  4. You could focus on collaborative filtering, i.e. determining similar users, and recommending items of these similar users
  5. You could focus on combining the previous approaches (of course only after they are implemented)
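To make option 2 more concrete, here is a minimal sketch of a content-based user model: the texts of a user's mind map nodes are turned into a term-frequency vector, and candidate articles are ranked by cosine similarity to it. The class and method names are hypothetical, the node texts are assumed to be already extracted from the mind map, and the tokenization is deliberately naive (no stemming, no stop-word removal, raw term frequencies instead of TF-IDF).

```java
import java.util.*;
import java.util.stream.*;

/** Hypothetical sketch: term-based user model from mind map node texts, matched via cosine similarity. */
class TermUserModel {
    // Lower-case, split on anything that is not a letter or digit, drop tokens of 1-2 characters.
    static Map<String, Integer> termFrequencies(Collection<String> texts) {
        Map<String, Integer> tf = new HashMap<>();
        for (String text : texts) {
            for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
                if (token.length() > 2) {
                    tf.merge(token, 1, Integer::sum);
                }
            }
        }
        return tf;
    }

    // Cosine similarity between two sparse term-frequency vectors (0 if either is empty).
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0);
        }
        double normA = Math.sqrt(a.values().stream().mapToDouble(v -> v * v).sum());
        double normB = Math.sqrt(b.values().stream().mapToDouble(v -> v * v).sum());
        return (normA == 0 || normB == 0) ? 0 : dot / (normA * normB);
    }

    // Rank candidate documents (id -> text) by similarity to the user's node texts; return the top k ids.
    static List<String> recommend(List<String> nodeTexts, Map<String, String> candidates, int k) {
        Map<String, Integer> user = termFrequencies(nodeTexts);
        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, String> c : candidates.entrySet()) {
            scores.put(c.getKey(), cosine(user, termFrequencies(List.of(c.getValue()))));
        }
        return scores.entrySet().stream()
                .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
                .limit(k)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```

Weighting terms by TF-IDF, or by where they occur in the mind map (e.g. node depth), would be obvious variations to evaluate against this baseline.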

Expected Results: A Java library that creates user models based on mind maps and matches the user models with appropriate items to recommend.

Expected Research Results (optional): An evaluation of how effectively your user modeling library works, based on click-through rates or a user study.

Required Knowledge: Java, Databases (MySQL and ideally Neo4j), Text Mining and Information Retrieval (recommended), Machine Learning (recommended), User Modeling (recommended)
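For option 4 in the list above (collaborative filtering), a minimal user-based sketch could represent each user by the set of item IDs found in their mind maps (for example linked or cited papers, which is an assumption here), measure user similarity with the Jaccard coefficient, and recommend items held by the most similar users. All names are hypothetical, and Jaccard similarity is just one simple choice among many.

```java
import java.util.*;
import java.util.stream.*;

/** Hypothetical sketch of user-based collaborative filtering over sets of item IDs. */
class UserBasedCF {
    // Jaccard similarity |A ∩ B| / |A ∪ B|; 0 when both sets are empty.
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) return 0;
        Set<String> inter = new HashSet<>(a);
        inter.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return (double) inter.size() / union.size();
    }

    // Recommend up to k items that the target's nearest neighbours have but the target does not.
    static List<String> recommend(String target, Map<String, Set<String>> itemsByUser,
                                  int neighbours, int k) {
        Set<String> own = itemsByUser.getOrDefault(target, Set.of());
        List<Map.Entry<String, Double>> nearest = itemsByUser.keySet().stream()
                .filter(u -> !u.equals(target))
                .map(u -> Map.entry(u, jaccard(own, itemsByUser.get(u))))
                .filter(e -> e.getValue() > 0)
                .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
                .limit(neighbours)
                .collect(Collectors.toList());
        // Score each unseen item by the summed similarity of the neighbours who have it.
        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, Double> n : nearest) {
            for (String item : itemsByUser.get(n.getKey())) {
                if (!own.contains(item)) {
                    scores.merge(item, n.getValue(), Double::sum);
                }
            }
        }
        return scores.entrySet().stream()
                .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
                .limit(k)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```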

 

Mind Map Analysis (Pure research project)

Description: Mind maps apparently differ from emails, academic papers, and web pages in their structure and in the way they are created. But what exactly are the differences? Your task will be to analyse the thousands of mind maps we have collected from our users and to find out what makes them unique. To do so, you will, among other things, compare mind maps with web pages and academic articles (it will be up to you which other documents you compare mind maps with and where to get them from). Your work will lay the groundwork for our academic recommender system (see above), because good mind-map-based recommendations require knowing the unique features of mind maps.

Expected Research Results: A study about the unique features of mind maps, compared to web pages, academic articles, and maybe emails, social tags and search queries.

Required Knowledge: A programming language to analyse the data (preferably Java, C/C++, or Python), XML, some MySQL, and ideally Neo4j
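As a starting point for such an analysis, the sketch below computes a few simple structural statistics of a single mind map. It assumes the FreeMind/Freeplane `.mm` XML format, in which each node is a `node` element whose text is stored in the `TEXT` attribute; nodes whose rich-text content is stored in child elements instead are counted but contribute no words. The class name and the chosen statistics are illustrative, not an established feature set.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical sketch: simple structural statistics of a FreeMind/Freeplane-style mind map. */
class MindMapStats {
    int nodes;    // total number of <node> elements
    int maxDepth; // depth of the deepest node (top-level node = 1)
    int leaves;   // nodes without child nodes
    int words;    // whitespace-separated words across all TEXT attributes

    static MindMapStats of(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a parsable mind map", e);
        }
        MindMapStats stats = new MindMapStats();
        // The <map> root element contains the central node(s).
        for (Element top : childNodes(doc.getDocumentElement())) {
            stats.visit(top, 1);
        }
        return stats;
    }

    private void visit(Element node, int depth) {
        nodes++;
        maxDepth = Math.max(maxDepth, depth);
        String text = node.getAttribute("TEXT").trim(); // "" if the attribute is missing
        if (!text.isEmpty()) {
            words += text.split("\\s+").length;
        }
        List<Element> children = childNodes(node);
        if (children.isEmpty()) {
            leaves++;
        }
        for (Element child : children) {
            visit(child, depth + 1);
        }
    }

    // Direct child elements named "node"; icons, attributes, rich content etc. are ignored.
    private static List<Element> childNodes(Element parent) {
        List<Element> result = new ArrayList<>();
        NodeList list = parent.getChildNodes();
        for (int i = 0; i < list.getLength(); i++) {
            Node n = list.item(i);
            if (n.getNodeType() == Node.ELEMENT_NODE && n.getNodeName().equals("node")) {
                result.add((Element) n);
            }
        }
        return result;
    }

    double meanWordsPerNode() {
        return nodes == 0 ? 0 : (double) words / nodes;
    }
}
```

Computing the same kind of statistics for web pages or academic articles (e.g. words per paragraph instead of words per node) would then allow a first quantitative comparison.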

 

Usability Enhancement

Description: Docear is great software with lots of unique features but, to be honest, it’s not very user-friendly. Your task is to improve Docear’s usability: find out what bothers users and how to make features more accessible and easier to use.

Expected Results: A new version of Docear that is easier to use than the current version

Expected Research Results (optional): A study to find out exactly how much better your new version is. You could either perform a user study (e.g. give users certain tasks and measure how long they need to complete them) or analyse usage behavior (we monitor which functions users use, how often, and for how long) and find out whether users of your new version use more features or need less time for certain tasks.

Required Knowledge: Excellent Java skills, knowledge of user interface design, OSGi (recommended), and it would certainly help if you had used Docear for a while and knew its workflow.
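If task-completion times are measured for the current and the new version, one standard way to compare them is Welch's t-test, which does not assume equal variances in the two groups. The sketch below computes the t statistic and the Welch-Satterthwaite degrees of freedom; turning these into a p-value requires the t distribution, which a library such as Apache Commons Math provides (its `TTest` class). The class and method names are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical sketch: compare task-completion times (e.g. in seconds) of two versions. */
class TaskTimeComparison {
    static double mean(double[] xs) {
        return Arrays.stream(xs).average().orElse(Double.NaN);
    }

    // Sample variance with Bessel's correction (n - 1); needs at least two measurements.
    static double variance(double[] xs) {
        double m = mean(xs);
        double sumSq = 0;
        for (double x : xs) {
            sumSq += (x - m) * (x - m);
        }
        return sumSq / (xs.length - 1);
    }

    // Welch's t = (mean_old - mean_new) / sqrt(var_old / n_old + var_new / n_new).
    static double welchT(double[] oldTimes, double[] newTimes) {
        double se = Math.sqrt(variance(oldTimes) / oldTimes.length
                + variance(newTimes) / newTimes.length);
        return (mean(oldTimes) - mean(newTimes)) / se;
    }

    // Welch-Satterthwaite approximation of the degrees of freedom.
    static double welchDf(double[] a, double[] b) {
        double va = variance(a) / a.length;
        double vb = variance(b) / b.length;
        return (va + vb) * (va + vb)
                / (va * va / (a.length - 1) + vb * vb / (b.length - 1));
    }
}
```

With the arguments ordered as above, a positive t means that users of the new version were faster on average.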

 

Mobile Viewer (Android & iOS)

Description: Currently, Docear is available as desktop software only. We would love to offer our users an iOS and/or Android app that allows them to view their data on their mobile phones. The viewer should be able to display mind maps, PDFs, and references that were backed up from the desktop application to Docear’s server. Of course, an editor would be even better than a viewer.

Expected Results: Mobile app to display Docear’s user data on mobile phones and tablets
Expected Research Results (optional): User study evaluating the effectiveness of the application (ideally in comparison with other mobile apps).

Required Knowledge: Java, mobile application development, REST Web Services, Jersey framework, XML

 

Online Viewer

Description: Currently, Docear is available as desktop software only. We want to provide a browser version to our users. Your task will be to develop an online viewer that can display the users’ mind maps and, ideally, their bibliographic data and PDFs. The viewer should also allow users to edit their mind maps.

Expected Results: A web front-end to display the data of Docear’s users.
Expected Research Results (optional): User study evaluating the effectiveness of the application (ideally in comparison with other applications).

Required Knowledge: HTML5, Java, JSP, PHP, MySQL, Flash, REST Web Services, Jersey framework, XML, JavaScript (recommended), WordPress (recommended)
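One simple way for a browser-based viewer to display a mind map is to render its node tree as nested HTML lists, which can then be styled or made interactive with JavaScript. The sketch below assumes the FreeMind/Freeplane `.mm` XML format (nodes as `node` elements with a `TEXT` attribute), rejects DOCTYPE declarations so that user-supplied XML cannot pull in external entities, and escapes node text so that it cannot inject markup. The class name is hypothetical, and how the XML is retrieved from Docear’s server is left out.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical sketch: render a FreeMind/Freeplane-style mind map as nested HTML lists. */
class MindMapHtml {
    static String toHtml(String xml) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Reject DOCTYPE declarations (protects against XML external entity attacks).
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a parsable mind map", e);
        }
        StringBuilder html = new StringBuilder();
        renderChildren(doc.getDocumentElement(), html);
        return html.toString();
    }

    // One <ul> per level that has child nodes; each <node> element becomes an <li>.
    private static void renderChildren(Element parent, StringBuilder html) {
        NodeList list = parent.getChildNodes();
        boolean opened = false;
        for (int i = 0; i < list.getLength(); i++) {
            Node n = list.item(i);
            if (n.getNodeType() != Node.ELEMENT_NODE || !n.getNodeName().equals("node")) {
                continue;
            }
            if (!opened) {
                html.append("<ul>");
                opened = true;
            }
            html.append("<li>").append(escape(((Element) n).getAttribute("TEXT")));
            renderChildren((Element) n, html);
            html.append("</li>");
        }
        if (opened) {
            html.append("</ul>");
        }
    }

    // Minimal HTML escaping of node text.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }
}
```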

 

Integrating Google Scholar

Description: Google Scholar is one of the major academic search engines, and we would like to integrate it into Docear so that our users can search Google Scholar directly from within Docear. Probably the easiest way would be to embed a web browser in Docear that automatically opens the Google Scholar web page. Ideally, you would adapt the browser so that it modifies the Google Scholar result page and adds a Docear download button, enabling users to download the PDFs linked on Google Scholar directly to their literature repository. However, we are open to hearing your ideas.

Expected Results: A Java library for Docear to search Google Scholar and ideally download linked PDFs directly to the literature repository.

Required Knowledge: Java, HTML, JavaScript, HTTP
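Two small building blocks of such an integration are sketched below: building a Google Scholar search URL (the search terms go in the `q` query parameter) and collecting links that end in `.pdf` from a result page. Google Scholar does not offer an official API and its markup can change, so the regular expression is only a placeholder for a proper HTML parser; many PDF links also do not end in `.pdf`. The class name is hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for a Google Scholar integration. */
class ScholarHelper {
    // Naive pattern: double-quoted href values ending in ".pdf" (case-insensitive).
    private static final Pattern PDF_LINK =
            Pattern.compile("href=\"([^\"]+\\.pdf)\"", Pattern.CASE_INSENSITIVE);

    // Search terms are URL-encoded into the "q" parameter (spaces become '+').
    static String searchUrl(String query) {
        return "https://scholar.google.com/scholar?q="
                + URLEncoder.encode(query, StandardCharsets.UTF_8);
    }

    // Collect all PDF link targets in document order.
    static List<String> pdfLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = PDF_LINK.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }
}
```

In an embedded browser, the links found this way are where a Docear download button could be attached.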

 

Bibliographic Metadata Spider

Description: Digital libraries and the Web contain millions, if not billions, of bibliographic records about academic articles (titles, authors, year, etc.). For Docear we need a database containing as much bibliographic data as possible. This database will form the basis for many of the other projects, for instance the Academic PDF Spider and the Academic Recommender System. A comprehensive database is also important for complementing bibliographic data extracted from PDFs, so that references can be created automatically in Docear. Your task will be to develop a spider that searches the Web and digital libraries for bibliographic data and adds it to our database.

Expected Results: A spider searching the Web and academic repositories for bibliographic information and inserting it into our database
Expected Research Results (optional): An evaluation of the performance and effectiveness of your spider

Required Knowledge: Java, HTML, MySQL, REST Web Services (recommended), Metadata standards such as Dublin Core (recommended)
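Many digital library landing pages embed bibliographic data in HTML `meta` tags, for example Dublin Core (`DC.title`, `DC.creator`, ...) or Highwire-style tags (`citation_title`, `citation_author`, ...), so a spider could harvest these as a first source of records. The sketch below is a minimal illustration with a hypothetical class name; its regular expression only handles `name` before `content` with double quotes, so a real spider should use an HTML parser.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: collect Dublin Core ("DC.*") and "citation_*" meta tags from an HTML page. */
class MetaTagExtractor {
    private static final Pattern META = Pattern.compile(
            "<meta\\s+name=\"((?:dc\\.|citation_)[^\"]+)\"\\s+content=\"([^\"]*)\"",
            Pattern.CASE_INSENSITIVE);

    // Field name (lower-cased) -> all values in page order; repeated tags such as authors are kept.
    static Map<String, List<String>> extract(String html) {
        Map<String, List<String>> fields = new LinkedHashMap<>();
        Matcher m = META.matcher(html);
        while (m.find()) {
            fields.computeIfAbsent(m.group(1).toLowerCase(Locale.ROOT), k -> new ArrayList<>())
                  .add(m.group(2));
        }
        return fields;
    }
}
```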

 

Academic PDF Spider

Description: As described above, we would like to offer our users recommendations of academic articles. Ideally, users should be able to download the recommended articles right away. Therefore, we need a spider that searches the Web for freely accessible PDFs of academic articles. The goal of this project is to write a spider that finds these PDFs and adds them to our database.

Expected Results: A Java-based Web spider that searches the Web for PDFs of academic articles and stores and indexes them
Expected Research Results (optional): An evaluation of how effectively your spider finds academic PDFs on the Web and how effectively it recognizes duplicates

Required Knowledge: Java and various Web technologies
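The core loop of such a spider could look like the following sketch: a breadth-first crawl from seed URLs, PDF recognition via the `%PDF-` magic bytes at the start of a file, and exact-duplicate detection via a SHA-256 hash of the file contents. Fetching is abstracted behind a hypothetical `Function<String, Page>` so that the logic can be tested without network access. Hashing only catches byte-identical copies; recognizing the same article in different PDF versions would require comparing content such as extracted titles.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.*;
import java.util.function.Function;

/** Hypothetical sketch of a PDF spider's core loop. */
class PdfSpider {
    /** What the (hypothetical) fetcher returns for a URL: raw bytes plus outgoing links. */
    record Page(byte[] body, List<String> links) {}

    final Map<String, String> pdfsByHash = new LinkedHashMap<>(); // SHA-256 hex -> first URL seen
    final List<String> duplicateUrls = new ArrayList<>();

    void crawl(List<String> seeds, Function<String, Page> fetch, int maxPages) {
        Deque<String> frontier = new ArrayDeque<>(seeds);
        Set<String> visited = new HashSet<>();
        while (!frontier.isEmpty() && visited.size() < maxPages) {
            String url = frontier.poll();
            if (!visited.add(url)) continue; // already crawled
            Page page = fetch.apply(url);
            if (page == null) continue;      // fetch failed
            if (isPdf(page.body())) {
                String hash = sha256(page.body());
                if (pdfsByHash.putIfAbsent(hash, url) != null) {
                    duplicateUrls.add(url);  // identical bytes already stored
                }
            } else {
                frontier.addAll(page.links()); // follow links only from non-PDF pages
            }
        }
    }

    // PDF files start with the magic bytes "%PDF-".
    static boolean isPdf(byte[] body) {
        byte[] magic = "%PDF-".getBytes(StandardCharsets.US_ASCII);
        return body.length >= magic.length
                && Arrays.equals(Arrays.copyOf(body, magic.length), magic);
    }

    static String sha256(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // every Java platform must support SHA-256
        }
    }
}
```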

 

Open/LibreOffice Add-On

Description: We are already developing an MS Word plugin to access Docear’s bibliographic data from MS Word. We would like to offer the same functionality to OpenOffice/LibreOffice users. Therefore, we need an add-on that allows users to access their bibliographic data to insert citations and create bibliographies in OpenOffice/LibreOffice.

Expected Results: Add-On for Open/LibreOffice to access Docear’s bibliographic database

Required Knowledge: Java, add-on development for Open/LibreOffice

 

Header Extraction from Academic PDF Documents

Description: Maintaining the bibliographic information of academic documents (e.g. authors, title, and journal) is a labor-intensive process. We want to automate this task by analyzing PDF files, extracting their header information, and completing the data with information from our bibliographic database. Your task will be to evaluate existing libraries for extracting header information from PDF files (including our own tool), select the most promising tool(s), enhance them, and integrate them into Docear. As PDFs usually contain only limited bibliographic information (mostly authors and title), your tool will send this data to Docear’s bibliographic Web Service, which then returns further bibliographic data.

Expected Results: A Java library that takes a PDF document as input and returns the PDF’s bibliographic data (title, author, journal, year, etc.) based on header information extracted from the PDF and additional data requested from Docear’s bibliographic database.
Expected Research Results (optional): Evaluation of your tool in comparison to existing tools

Required Knowledge: Java, C/C++, Machine Learning (recommended), REST Web Service (recommended)
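A simple heuristic for the title, shown here as a minimal sketch, is to take the text set in the largest font on the first page. `TextLine` is an assumed input type: a PDF library (e.g. Apache PDFBox) would have to supply the lines together with their font sizes. Sending the extracted title to the bibliographic Web Service is not shown, and the class name is hypothetical.

```java
import java.util.List;
import java.util.Optional;
import java.util.OptionalDouble;
import java.util.stream.Collectors;

/** Hypothetical sketch: pick the title as the largest-font text on the first page. */
class TitleHeuristic {
    /** Assumed input: one line of text with its page number (1-based) and font size. */
    record TextLine(int page, double fontSize, String text) {}

    static Optional<String> title(List<TextLine> lines) {
        List<TextLine> firstPage = lines.stream()
                .filter(l -> l.page() == 1 && !l.text().isBlank())
                .collect(Collectors.toList());
        OptionalDouble max = firstPage.stream().mapToDouble(TextLine::fontSize).max();
        if (max.isEmpty()) {
            return Optional.empty();
        }
        // Join all first-page lines set in the maximal size, since titles often wrap onto several lines.
        String title = firstPage.stream()
                .filter(l -> Math.abs(l.fontSize() - max.getAsDouble()) < 0.01)
                .map(l -> l.text().trim())
                .collect(Collectors.joining(" "));
        return Optional.of(title);
    }
}
```

Such a heuristic fails for layouts where, for example, a journal name is set larger than the title, which is one reason the description suggests evaluating existing (including machine-learning-based) tools.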

 

Deduplication of Records in an Academic Literature Database

Description: Providing good literature recommendations based on a large digital library requires high data quality. Being able to identify individual documents, authors, journals, and the like despite potential spelling variations is a key factor for achieving the required consistency of data in the library. For instance, if a client posts document data to the digital library, suitable algorithms must be in place to check whether the respective document, author, etc. is already part of the library. The goal of this project is to develop algorithms that identify and de-duplicate individual entities such as authors, documents, and journals, and to integrate them into Docear’s digital library.

Expected Results: A Java library that uses all available information about an entity to decide whether it is already part of the collection and should be merged with an existing record, or should be added as a new entry.
Expected Research Results (optional): Evaluation of the effectiveness of your tool

Required Knowledge: Java, C/C++, machine learning (recommended), REST Web Services (recommended)
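A minimal sketch of one building block, fuzzy title matching: titles are normalized (case, accents, punctuation), compared with the Levenshtein edit distance, and treated as duplicates above a similarity threshold. The description calls for using all available information about an entity, so a real matcher would combine such scores for authors, year, and venue; the class name and the 0.9 threshold are hypothetical placeholders.

```java
import java.text.Normalizer;
import java.util.Locale;

/** Hypothetical sketch: decide whether two titles refer to the same document despite spelling variations. */
class TitleMatcher {
    // Strip accents, lower-case, replace punctuation runs with a single space.
    static String normalize(String s) {
        String noAccents = Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
        return noAccents.toLowerCase(Locale.ROOT).replaceAll("[^\\p{L}\\p{N}]+", " ").trim();
    }

    // Dynamic-programming Levenshtein distance (insertions, deletions, substitutions), two rows of memory.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Similarity in [0, 1]; 1 means identical after normalization.
    static double similarity(String a, String b) {
        String x = normalize(a);
        String y = normalize(b);
        int maxLen = Math.max(x.length(), y.length());
        return maxLen == 0 ? 1.0 : 1.0 - (double) levenshtein(x, y) / maxLen;
    }

    // The 0.9 threshold is an arbitrary placeholder that would need tuning on labelled duplicate pairs.
    static boolean isDuplicate(String a, String b) {
        return similarity(a, b) >= 0.9;
    }
}
```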

 

Citation Extraction from Academic PDF Documents

Description: Citations within academic documents provide valuable information on related work. We intend to leverage this information in Docear to recommend further relevant documents to Docear users. To implement this functionality, we need to extract the respective citation information from documents in a structured format. The goal of this project is to develop a library that extracts citation information from PDF documents and integrates it into the Docear digital library. You do not have to develop everything from scratch but can use existing solutions such as ParsCit.

Expected Results: A Java library that takes a PDF document as input and returns a data structure containing citation metadata
Expected Research Results (optional): Evaluation of the effectiveness of your tool

Required Knowledge: Java, C/C++, Machine Learning (recommended)
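The data structure and a first, very naive parsing step could look like the sketch below: a reference section in numbered style (`[1] ...`, `[2] ...`) is split at the markers, and the first four-digit year in each entry is recorded. The names and the assumed reference style are illustrative only; ParsCit instead labels the fields of each reference string with a trained conditional random field, which is far more robust than this.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: a citation data structure and a naive splitter for numbered reference lists. */
class ReferenceSplitter {
    /** Minimal citation record; year stays null if none is found. */
    record Citation(String raw, Integer year) {}

    private static final Pattern MARKER = Pattern.compile("\\[(\\d+)\\]\\s*");
    private static final Pattern YEAR = Pattern.compile("\\b(19|20)\\d{2}\\b");

    static List<Citation> split(String refs) {
        // Record [markerStart, textStart] for every "[n]" marker.
        List<int[]> spans = new ArrayList<>();
        Matcher m = MARKER.matcher(refs);
        while (m.find()) {
            spans.add(new int[] {m.start(), m.end()});
        }
        List<Citation> result = new ArrayList<>();
        for (int i = 0; i < spans.size(); i++) {
            // Each entry runs until the next marker (or the end of the text).
            int end = i + 1 < spans.size() ? spans.get(i + 1)[0] : refs.length();
            String raw = refs.substring(spans.get(i)[1], end).trim();
            Matcher y = YEAR.matcher(raw);
            result.add(new Citation(raw, y.find() ? Integer.valueOf(y.group()) : null));
        }
        return result;
    }
}
```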