Data Analytics
08-741 Very Large Information Systems
This course studies the theory, design, and implementation of text-based information systems. The IR core components of the course include important retrieval models (Boolean, vector space, probabilistic, inference net, language modeling), clustering algorithms, automatic text categorization, and experimental evaluation. The course covers a variety of current research topics, including cross-lingual retrieval, document summarization, machine learning, and topic detection and tracking.
Prerequisites: None
Units: 12
Schedule: Fall semester
11-441 Search Engines and Web Mining
This course provides a comprehensive introduction to the theory and implementation of algorithms for organizing and searching large text collections. The first half of the course examines text search engines for enterprise and web environments; the open-source Indri search engine is used as a working example. The second half of the course explores text mining techniques such as recommender systems, clustering, and categorization. Programming assignments provide hands-on experience with document ranking, evaluation, and classification into browsing hierarchies, as well as other related topics.
Prerequisites: Programming and data-structures proficiency at the 15-211 course level or higher. An understanding of algorithms comparable to the CMU 15-451 course level or higher. An understanding of basic linear algebra, comparable to the CMU 21-241/21-341 level. An understanding of basic statistics, comparable to the CMU 36-202 course level or higher.
Units: 12
Schedule: Fall semester
11-741 Information Retrieval
This course studies the theory, design, and implementation of text-based information systems. The IR core components of the course include statistical characteristics of text, representation of information needs and documents, several important retrieval models (Boolean, vector space, probabilistic, inference net, language modeling), clustering algorithms, automatic text categorization, and experimental evaluation. The software architecture components include design and implementation of high-capacity text retrieval and text filtering systems. A variety of current research topics are also covered, including cross-lingual retrieval, document summarization, machine learning, topic detection and tracking, and multimedia retrieval.
Prerequisites: Solid programming skills.
Units: 12
Schedule: Fall & Spring semesters
15-826 Multimedia Databases and Data Mining
This course covers advanced algorithms for learning, analysis, data management, and visualization of large datasets. Topics include indexing for text and DNA databases, searching medical and multimedia databases by content, fundamental signal processing methods, compression, fractals in databases, data mining, privacy and security issues, rule discovery, and data visualization.
Prerequisites: An introductory database course, similar to CMU 15-415. Familiarity with B-trees and hashing.
Units: 12
Schedule: Fall semester